In the first part of this publication, we discussed the two most recent major international standardisation activities on the topic of optical machine authentication of travel documents: the ‘Best Practice Guidelines on Optical Machine Authentication’[1] and the Technical Guideline ‘BSI TR-03135 – Machine Authentication of MRTDs for Public Sector Applications’.[2][3] In this second part, we present the results of the research project ‘Applied Research on Optical Machine Authentication’ (AROMA) on the development of an evaluation method for optical inspection software products. The method applies the standards described in Part I by using the standardised XML schema for logging detailed optical machine authentication results provided in BSI TR-03135 v2.1.
Upon procurement of machine authentication systems, for example for automated border control gates, the question arises which product to select with respect to the machine-assisted inspection of travel documents. Manufacturers usually keep the document inspection processes within such devices confidential, and a systematic evaluation of the performance of such systems is difficult if not impossible due to the complex scenarios required for a thorough and well-founded conclusion. Hence, there is a clear need for a systematic methodology to evaluate the performance of software products that are used for the optical authentication of identity documents.
Research objectives
Determining a systematic methodology is the main objective of our ongoing research activity. Contrary to the previous project IDEAL (‘ISU-basierte Dokumenten-Echtheitsprüfung mittels Ausweis-Lesern’), on which we reported in 2013/2014,[4][5] in AROMA we aimed to use live data of ‘real’ documents presented by passengers of international flights at the Frankfurt Airport border control. The insights gained ‘in vitro’ during the laboratory-type evaluation in IDEAL were applied to an operational border control environment, moving towards an ‘in-situ’ evaluation scenario. With that in mind, a secondary objective of the project was to ‘measure’ the quality of the optical machine authentication process currently in operation at Frankfurt Airport.
Furthermore, while machine authentication results of the inspection systems at Frankfurt Airport are routinely logged using BSI TR-03135 v1.2 for monitoring purposes, the subsequent generation of BSI TR-03135 – version 2.1 – was applied to an operational environment for the first time within the AROMA project. This version includes expanded logging details, such as the results of individual optical check routines.
Evaluation procedure
The performance evaluation of the optical document inspection systems involved in AROMA took place between April 2016 and February 2017. For data acquisition, the images and inspection results (logged using BSI TR-03135 v1.2) from documents presented at four border control stations at Frankfurt Airport were transferred to an evaluation server (see Figure 1). After discarding all data sets that did not match the input criteria (see below), the images of the remaining data sets were submitted to the document inspection software products of three different vendors for an optical authenticity check. The corresponding test results were then recorded in the form of the extended logging according to BSI TR-03135 v2.1. During the project, a total of more than 214,000 resulting data sets were collected and processed by the three different products. However, due to late software updates, server downtimes and licensing issues, only those data sets recorded during a defined evaluation period were taken into account, leading to a subset of almost 31,000 evaluated data sets. Figure 2 gives a brief overview of the project timeline.
For practical reasons, the evaluation considered only eight identity documents as illustrated in Figure 3. These are:
- the 2005 German passport
- the 2010 German identity card
- the 2006 Swiss passport
- the 2006 British passport
- the 2010 British passport
- the 2015 British passport
- the 2010 Italian passport
- the 2006 French passport
Logging of test results according to BSI TR-03135 v2
It should be noted that the results achieved in this project are based directly on the detailed protocols of the optical check routines performed by the software products during the authentication process. This means that a meaningful analysis and comparison of their performance is only possible if the annotations of the check routines and the implementation of the XML schema according to BSI TR-03135 v2 for the logging of the test results are carried out correctly.
The project has shown that the implementation of BSI TR-03135 in version 2.1 is not trivial and that manufacturers need some help here. On the one hand, this concerns the creation of schema-compliant XML files, with which one manufacturer in particular experienced great difficulties (only about a quarter of the original data records could be used for an evaluation). On the other hand, this also concerns the correct assignment of the performed check routines to the generic identifiers given in BSI TR-03135. Despite the help, however, annotation errors occurred in the products of all three manufacturers (see Figure 4).
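To illustrate the kind of automated pre-check that could catch unusable log files early, the following is a minimal sketch only. It tests well-formedness and the presence of a few required elements; it does not perform a full validation against the XML schema of BSI TR-03135, which would require the official schema files and a schema-aware validator. The element names `DocumentModel` and `CheckRoutines` and the function `precheck_log` are hypothetical placeholders and are not taken from the Technical Guideline.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for illustration; the real names are
# defined by the XML schema of BSI TR-03135 and are NOT reproduced here.
REQUIRED_CHILDREN = ("DocumentModel", "CheckRoutines")


def precheck_log(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the pre-check passed.

    This is a structural pre-check only, not a schema validation.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The file is not even well-formed XML.
        return [f"not well-formed: {exc}"]

    problems = []
    for name in REQUIRED_CHILDREN:
        # find() looks for a direct child element with this tag.
        if root.find(name) is None:
            problems.append(f"missing element: {name}")
    return problems
```

A log file for which `precheck_log` returns a non-empty list could be reported back to the manufacturer before any evaluation is attempted.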
If implemented correctly, the extended logging of BSI TR-03135 v2 not only creates greater transparency with regard to the processes for identifying and verifying documents, but also enables detailed monitoring and feedback loops between manufacturers, operational management and other parties involved in the document inspection processes. Problematic check routines of individual document models, for instance, can now be identified directly, and root-cause analyses of such issues become feasible. Thus, it is strongly recommended to implement BSI TR-03135 in its current version whenever inspection systems for operational, high-volume border control are planned, modified or updated.
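How problematic check routines might be identified from such logs can be sketched as follows. This is an illustrative sketch, not the analysis used in AROMA: it assumes that each logged result has already been reduced to a tuple of document model, check routine identifier and pass/fail flag. The function names, the model labels and the routine identifiers in the usage example are hypothetical.

```python
from collections import Counter


def failure_rates(results, min_runs=1):
    """Compute the failure rate per (document model, check routine).

    results: iterable of (document_model, check_routine_id, passed) tuples.
    Pairs with fewer than min_runs logged runs are left out.
    """
    runs, fails = Counter(), Counter()
    for model, routine, passed in results:
        runs[(model, routine)] += 1
        if not passed:
            fails[(model, routine)] += 1
    return {key: fails[key] / n for key, n in runs.items() if n >= min_runs}


def problematic(results, threshold, min_runs=1):
    """Return the (model, routine) pairs whose failure rate exceeds threshold."""
    rates = failure_rates(results, min_runs)
    return sorted(key for key, rate in rates.items() if rate > threshold)


# Usage with made-up labels and results:
sample = [
    ("FRA-passport-2006", "UV_PATTERN", False),
    ("FRA-passport-2006", "UV_PATTERN", True),
    ("DEU-passport-2005", "IR_MRZ", True),
]
flagged = problematic(sample, threshold=0.25)
```

Because only genuine documents were evaluated, every failed check routine in such a tally corresponds to a false rejection, which is what makes a high failure rate a useful starting point for root-cause analysis.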
Optimisations within BSI TR-03135
The experience gained through AROMA led to valuable optimisations within BSI TR-03135, which found their way into the recently released version 2.3. For example, the catalogue of check routines in v2.1 could be extended with additional check routines already performed by the evaluated software products. However, significant potential for improvement remains, in particular with respect to harmonised logging of errors as well as the unambiguous mapping of check routines to the front/reverse in the case of double-sided documents.
Systematic limitations of the method
While we started this project with a rather comprehensive scope in mind, we successively had to reduce our initial expectations: in order to interpret and discuss the evaluation results correctly, it is imperative to be aware of some systematic limitations in the implementation of the project.
Genuine documents
Based on the settings applied to the input filter for the evaluation server, the ‘real’ documents used in the evaluation had to be genuine. Only the documents that were successfully authenticated both electronically and optically at the border control station were forwarded to the different software products. Therefore, only the performance regarding the identification and verification of genuine documents can be evaluated here, as the detection of fraudulent documents cannot be assessed due to the lack of corresponding data sets.
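The input criteria can be pictured as a simple filter. The sketch below is an assumption about how such a filter could look, not the implementation used on the evaluation server. The `DataSet` fields, the model labels and the evaluation period boundaries passed in by the caller are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical labels for the eight document models considered in the evaluation.
EVALUATED_MODELS = {
    "DEU-passport-2005", "DEU-idcard-2010", "CHE-passport-2006",
    "GBR-passport-2006", "GBR-passport-2010", "GBR-passport-2015",
    "ITA-passport-2010", "FRA-passport-2006",
}


@dataclass(frozen=True)
class DataSet:
    model: str           # document model identified at the border control station
    electronic_ok: bool  # electronic authentication succeeded at the station
    optical_ok: bool     # optical authentication succeeded at the station
    captured_on: date    # day on which the document was presented


def passes_input_filter(ds: DataSet, start: date, end: date) -> bool:
    """Keep only genuine documents of an evaluated model within the period."""
    return (
        ds.electronic_ok
        and ds.optical_ok
        and ds.model in EVALUATED_MODELS
        and start <= ds.captured_on <= end
    )
```

A data set that failed either authentication step at the border control station is dropped by this filter, which is exactly why no fraudulent documents reach the evaluated products.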
‘Native’ and ‘non-native’ images
It should be noted that the images used for the evaluation were taken from devices at the border control stations. These devices were manufactured by, and equipped with software from, vendor A, who also participated in the evaluation (albeit with its next-generation, fundamentally changed software version). In this context, it is crucial to understand that vendor A developed its software products precisely for the image material acquired with its own proprietary devices, and has therefore optimised its inspection software for the specific characteristics of these ‘native’ images. Conversely, this also means that the products from the other vendors, in particular vendor B’s inspection product, are optimised for other recording devices with different specific characteristics and hence have to deal with ‘non-native’ image material. The evaluation showed that these differences between the image acquisition devices, especially under UV and IR illumination, are significant and have a decisive influence on the results of individual optical tests, and thus on the overall result (see Figure 5). This influence of the origin of the image material represents the greatest challenge for the development of a suitable performance evaluation method and must always be kept in mind when interpreting the results of the evaluation method presented here: the systems dealing with ‘non-native’ image material may show a significantly different performance in real life when employed with their ‘native’ image-acquisition devices.
User guidance and operability
The method presented here does not provide any conclusions about the user guidance and operability of the evaluated software products. Furthermore, the detection and handling of defects and errors, such as reading processes interrupted by removing a document too early, cannot be assessed because the evaluation setup lacks any user interface or interaction.
Other components and software factors
The evaluation procedure only considers the software component of an inspection system. The quality of the optical and electronic components for image acquisition or chip access as well as other software factors (for example, interoperability, compatibility and stability) are beyond the scope of this evaluation.
Analysis of incoming image material from border control stations
During the evaluation phase, two screenings of the input image material were conducted in order to ensure that the images actually match the document models identified by the border control stations. In many aspects, these screenings were very helpful for the evaluation and are strongly recommended for future evaluations. On the one hand, data sets unsuitable for the evaluation due to operating or logging errors at the border control station could be identified and subsequently excluded from the evaluation by setting a corresponding input filter. On the other hand, these events provided valuable insights into the root causes of failing check routines and enabled a meaningful interpretation of the results. For example, the screening revealed that the evaluation of double-sided documents requires an adapted treatment compared to that of single-sided documents, as described in the next section. Finally, the analysis supported the assumption that the input image material can actually be treated as valid ground truth.
In addition to the described benefits for the purpose of evaluation, the temporary storage of image material would also be helpful in permanent feedback loops to be established between manufacturers of inspection systems, operators, document designers and issuers. For instance, by analysing the input material, deviations in the UV features of the French passport could be identified which have a direct impact on the overall inspection result (see Figure 6).
Double-sided documents
The evaluation showed that double-sided documents are a special case in the evaluation. The logging process and the user guidance of the inspection system at the border control station for double-sided documents can have a decisive influence on the evaluability of the data, since the incoming image material and the log files can vary significantly depending on the scenario. The following situations of imprecise user guidance or operator behaviour could lead to different input data for the evaluation:
- The front is placed first, then the reverse containing the MRZ.
- The reverse containing the MRZ is placed first, then the front.
- The front or reverse containing the MRZ is placed, after which the process is aborted by the operator.
- The front or reverse of, for example, an identity card is placed, followed by a different kind of document, such as a passport, although the system expects the other side of the identity card.
Although these variations are theoretically undesirable for an evaluation, they do represent reality at border control. They provide valuable insights into how robustly or flexibly software reacts to different input material and how reliably and transparently it handles such material in terms of feedback to the user and evaluation of the document.
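For the analysis of double-sided documents, the four situations listed above could be told apart automatically before the data sets are evaluated. The following sketch assumes that each data set can be reduced to an ordered list of placements, each giving the kind of document and the side shown; whether the log files allow this in practice depends on the inspection system. The names `Scenario` and `classify` and the side labels are hypothetical.

```python
from enum import Enum


class Scenario(Enum):
    FRONT_THEN_MRZ = "front first, then reverse containing the MRZ"
    MRZ_THEN_FRONT = "reverse containing the MRZ first, then front"
    ABORTED = "one side placed, then process aborted"
    DIFFERENT_DOCUMENT = "one side placed, then a different kind of document"
    OTHER = "not covered by the four situations"


def classify(captures):
    """Classify the placements recorded for one double-sided document.

    captures: list of (document_kind, side) tuples in placement order,
    where side is 'front' or 'mrz' (the reverse containing the MRZ).
    """
    if len(captures) == 1:
        # Only one side was ever placed: the process was aborted.
        return Scenario.ABORTED
    if len(captures) == 2:
        (kind1, side1), (kind2, side2) = captures
        if kind1 != kind2:
            return Scenario.DIFFERENT_DOCUMENT
        if (side1, side2) == ("front", "mrz"):
            return Scenario.FRONT_THEN_MRZ
        if (side1, side2) == ("mrz", "front"):
            return Scenario.MRZ_THEN_FRONT
    return Scenario.OTHER
```

Grouping the data sets by scenario in this way would allow the results of each situation to be interpreted separately rather than mixed into one figure.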
Suitability of AROMA for the systematic evaluation of optical authentication software
The evaluation approach presented here considers various aspects and indicators of the performance of software products for the optical authentication of documents. Due to the systematic limitations described above, not all aspects can be used to the same extent as reliable criteria for comparison or equal ranking of the performance of document inspection software.
Scope and spectrum of testing
Logging according to BSI TR-03135 v2 allows for gathering information about which check routines have been performed. As an initial step, the theoretical potential of the participating documents was screened (see Figure 7) and used as a basis for further analysis by comparing the screening results with the check routines actually performed by the different software products. Furthermore, through assigning the check routines to the various categories of security features, i.e. substrate, security printing or personalisation, it was determined to what extent these categories are covered by the check routines of a software product (see Figure 8). This aspect of the evaluation allows for a direct comparison between the tested software products and is not subject to any systematic limitations in its informative value. However, it should be noted that check routines sometimes cannot be unambiguously assigned to a single category of security features.
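The comparison between theoretical potential and performed check routines amounts to a set comparison per category. The sketch below illustrates the idea under the simplifying assumption that each check routine belongs to exactly one category, which, as noted above, does not always hold. The function names and the routine identifiers in the usage example are hypothetical.

```python
def category_coverage(potential, performed):
    """Share of the theoretical potential covered per category.

    potential: {category: set of routine ids theoretically possible}
    performed: set of routine ids a product actually logged
    """
    return {
        category: (len(routines & performed) / len(routines) if routines else 0.0)
        for category, routines in potential.items()
    }


def overall_coverage(potential, performed):
    """Share of all theoretically possible routines that were performed."""
    all_routines = set().union(*potential.values()) if potential else set()
    if not all_routines:
        return 0.0
    return len(all_routines & performed) / len(all_routines)


# Usage with made-up routine identifiers for one document model:
potential = {
    "substrate": {"S1", "S2"},
    "security printing": {"P1", "P2", "P3", "P4"},
    "personalisation": {"D1"},
}
performed = {"S1", "P1", "P2", "D1", "X9"}  # X9 lies outside the screened potential
per_category = category_coverage(potential, performed)
overall = overall_coverage(potential, performed)
```

Routines that a product performs but that are missing from the screened potential, such as `X9` here, do not raise the coverage; they would instead be candidates for extending the catalogue.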
Test duration
In principle, logging according to BSI TR-03135 v2 enables the evaluation of the duration of the optical inspection for a certain document model. This parameter is of particular interest for border control authorities – especially at busy airports – aiming at minimising inspection times due to high passenger volumes. However, due to the non-operative environment of the tested inspection software tools, the measured values could only be considered as rough estimates and are therefore not presented here.
Identification rates
Typically, the software products use algorithms to identify the document based on image information captured in white light. Nevertheless, manufacturers are free to use other spectral ranges to identify a document beyond doubt (for example: one vendor distinguishes between two models of the French passport based on images taken in UV light). Since the characteristics of the images in these spectral ranges are specific to the configuration of the acquisition devices used, comparability of identification rates between manufacturers in this evaluation is rather limited.
Verification rates
In order to verify a document, the software products run pattern recognition algorithms on images acquired under all illumination spectra. Hence, the limitations imposed by the origin of the image material described above are of the highest relevance when comparing verification rates between manufacturers. In fact, a direct comparison of the verification performance between the inspection software products is virtually impossible on the basis of this evaluation method, since it cannot be ruled out that those products dealing with ‘non-native’ image material in the evaluation would deliver significantly different results when employed with proprietary acquisition devices delivering ‘native’ image material. Thus, a statement that software ‘A’ performs better than software ‘B’ is impossible based on the evaluation results.
One option to enable a direct comparison between inspection software products would be a standardisation of the image acquisition devices. Such a standardisation could ensure that all manufacturers optimise their software for the same image material, and that additionally, all software products are interoperable with various hardware configurations. However, for this approach, it is crucial that these standardised acquisition devices are actually employed in the operative field. The use of standardised hardware for the sole purpose of an evaluation could potentially lead to performance results being way off the mark, similar to cases like ‘Dieselgate’, for example.
Robustness/flexibility of the test software
Real-life border control situations are subject to various influencing factors and disturbances, which – similar to the circumstances described above – can manifest themselves in input material deviating from well-defined laboratory conditions. However, as these represent real situations at border control, any findings on how inspection software deals with these cases can certainly be useful in an evaluation process. The reasons for such deviations may sometimes lead to conclusions and corresponding counteractions which are beyond the scope of AROMA.
Implementation of logging according to BSI TR-03135 v2
As mentioned above, the evaluation is based on the correct implementation of BSI TR-03135 v2.1. Hence, the conformance with BSI TR-03135 can be considered as a valuable criterion for an evaluation of document inspection software. The experiences regarding the logging aspects within the present evaluation have already been described above.
Comparison of the evaluation results with predecessor IDEAL
Since some of the vendors participating in this evaluation were already examined in the BKA research project IDEAL in 2013, it is interesting to look at the development of the number of generic check routines performed on different document models. Thus far, none of the manufacturers makes use of more than half of the theoretical potential of check routines of the document models. On the positive side, one of the vendors has significantly increased the overall scope of inspection, which is mostly caused by performing several individual tests in the areas of the facial image, the MRZ and the VIZ instead of a UV brightness check covering the entire data page. Another vendor, on the other hand, has reduced the scope of inspection compared to the IDEAL project, especially in the category of personalisation technology.
Preparation for the feedback loop for optical document inspection
As mentioned above, at the German border inspection posts machine authentication results are routinely logged in conformance with BSI TR-03135 v1.2 to establish a permanent feedback loop for operational monitoring purposes. The Federal Office for Information Security (BSI) has developed an evaluation system (data warehouse) on the basis of this logging, which enables comprehensive statistics of the documents checked, but can also provide practical feedback that facilitates, for example, the detection of new, previously unknown document models. The AROMA project has now shown that logging the additional elements described in BSI TR-03135 v2, such as the individual results of spectrally selective check routines, can provide further valuable information for a range of potential use cases:
- Border control authority: The existing evaluation possibilities (statistics, identification of newly introduced document series) for the border police could be extended by further aspects of quality assurance and maintenance, for example the detection of technical defects on the basis of systematically failing inspection routines. In addition, software updates could be tested under real-life conditions before they are put into operation.
- Inspection software manufacturer: Detailed logging of individual check routines enables the identification of unstable check routines requiring optimisation. With appropriate adjustments, the performance and reliability of the operative check software can be improved. Furthermore, novel check routines could be thoroughly tested using documents in circulation before going live.
- Producers/designers of security documents: The information about specific check routines failing for a document model may also point towards issues caused by variations during document manufacturing or by wear and tear in individual security features. These findings could then be utilised to aid the stabilisation or optimisation of the production processes of the security document itself.
The established mechanisms in the BSI are already suitable for the introduction of the format according to BSI TR-03135 v2. The positive experience within the AROMA project supports the activities for implementing this version in the next generation of the Integrated Border Control Application for the German Federal Police.
Outlook
The results and experiences gathered in the course of the AROMA project provided valuable insights into various aspects of state-of-the-art machine authentication processes far beyond the question of comparability:
- user guidance;
- handling of double-sided documents;
- unambiguous identification of document models despite variations in security features;
- conformance with logging formats;
- the great potential of screening events and feedback loops not only for operational monitoring purposes.
However, regarding the main objective of the project – the development of a systematic method for comparing the performance of software products for optical machine authentication of security documents – the issue of software optimisation for the image characteristics of specific acquisition devices limited the extent to which the performances of participating software products were comparable. In the case of verification rates, objective conclusions were even nearly impossible, due to the strong influence of the hardware configuration on this performance parameter. This evaluation method is therefore only of limited relevance when applied for the sole purpose of performance comparison.
In principle, this limitation could be lifted by employing standardised, independent image material. Even then, however, this approach would only yield adequate comparability and meaningful evaluation results if both the evaluation and the operational environments relied on the same origin of the image material and all tested products were optimised for these images. One way to achieve this in practice could be to standardise image acquisition devices with all software vendors optimising their check routine algorithms for these devices. On the other hand, however, such standardisation would be a major drag on innovation in inspection hardware, such as the use of novel illumination techniques for existing or future security features.
AROMA also showed that the focus on only three of the four categories described in EU Council Regulation 2252/2004[6] may be too short-sighted as, for example, a check of certain optical aspects of diffractive features (DOVIDs) was performed during the evaluation. This illustrates that security features primarily designed for visual inspection have gained increased attention in the past few years with respect to their use for machine authentication processes. In the upcoming third part of this publication, recent developments and activities by manufacturers of security features within this scope will be discussed.
Stay tuned, again…
Acknowledgements
BKA and secunet would like to thank the Federal Ministry of the Interior, Building and Community for providing the financial resources to conduct this project. In addition, BKA and secunet would like to thank their colleagues at the German Federal Police for their cooperation and support by connecting the border control stations at Frankfurt Airport and providing the evaluation server. Our thanks also go to our colleagues at the Federal Office for Information Security for their preparation and support in the quality assurance of the data.
References
- ICAO Technical Report (2018). Best Practice Guidelines for Optical Machine Authentication Part 1: Recommendations.
- Federal Office for Information Security (2017). BSI TR-03135 Machine Authentication of MRTDs for Public Sector Applications. [Accessed 21 November 2018].
- Weigand, C. and Schneider, U. (2018). Optical Machine Authentication of Security Documents – Part I: Recent International Impact. Keesing Journal of Documents & Identity, Vol. 57, pp. 24-31.
- Schneider, U. and Seidel, U. (2013). Current Aspects in Machine Authentication of Security Documents – Part I: Do we need optical document security? Keesing Journal of Documents & Identity, Vol. 41, pp. 3-10.
- Schneider, U. and Seidel, U. (2014). Current Aspects in Machine Authentication of Security Documents – Part II: Unused potential and the need for improvement? Keesing Journal of Documents & Identity, Vol. 43, pp. 3-12.
- Council of the European Union (2004). Council Regulation (EC) No 2252/2004 of 13 December 2004 on standards for security features and biometrics in passports and travel documents issued by Member States. [Accessed 21 November 2018].
Christian Weigand received his PhD in electronics and telecommunication from the University of Trondheim in 2012. In 2016, he joined the Forensic Science Institute of the German Bundeskriminalamt, where he focusses on the forensic analysis of barcodes in identity documents, machine authentication and platforms for information exchange.