ICAO specifies that when a name is “in a national language that does not use the Latin alphabet, a transliteration shall also be provided.” Although some Arabic-speaking nations have dual-script citizenship registers, generally Arabic-speaking people are free to choose their own transcription when applying for MRTDs. Thus as people travel they inevitably end up with multiple versions in Latin of their Arabic names. As this makes tracking criminals and terrorists extremely difficult, in 2007 Task Force 3 of ISO Working Group 3 was asked to consider the standardisation of the transliteration of Arabic names in MRTDs. Mike Ellis will attempt to give an outline of the issues, problems and solutions.
The standard for MRTDs, ICAO’s Doc 9303, specifies that when a name is “in a national language that does not use the Latin alphabet, a transliteration shall also be provided.”1 Generally, however, names written in Arabic script are not transliterated but transcribed into equivalent Latin characters. Transliteration converts text from one script into another with the aim of representing each character faithfully.
Transcription is a phonetic process – we write the name as it sounds. While this is a practical approach, it suffers from a number of drawbacks:
• The standard Arabic script consists of 32 consonants, 18 vowels and diphthongs and 3 other signs. In addition there are over 100 national characters in the Arabic script when used with non-Arabic languages, although some of these are obsolete and no longer in use. Some of the Arabic characters have equivalents in Latin, but others have none.
• Arabic and the other languages using the Arabic script are usually written using consonants alone. Thus the name محمد (Mohammed) as written consists of just four consonants, which may be approximated in Latin as ‘Mhmd’. The vowels are added at the discretion of the transcriber to achieve a phonetic equivalent.
• The Latin character used depends on the target language; for example, ‘ch’, ‘sh’ and ‘th’ are pronounced differently in English, French and German. Compare the English transcription ‘Omar Khayyam’ with the German transcription ‘Omar Chajjam’ for the name of the mathematician and poet.
Multiple Latin versions of Arabic names
A common Arabic name such as محمود عبدالرحيم has two components: the first, محمود, has five characters, which could be approximated as Mhmwd; the second, عبدالرحيم, has nine, which could be approximated as Abdalrhym. But because of the variation described above, the first name could be transcribed as ‘Mahmud’, ‘Mehmood’, ‘Mahmut’ or 54 other variations; the second name could be ‘Abd-al-Rahiim’, ‘Abdalraheem’, ‘Abd ar Raheem’ or 142 other variations. This one unique Arabic name could therefore give rise to as many as 8,265 (57 × 145) variations in Latin transcription.
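The total follows from pairing every spelling of the first component with every spelling of the second. A quick check in Python, using only the counts quoted in the text (the variable names are illustrative):

```python
# Counts taken from the text: three listed spellings plus the "other variations".
first_name_variants = 3 + 54     # 'Mahmud', 'Mehmood', 'Mahmut' + 54 others
second_name_variants = 3 + 142   # 'Abd-al-Rahiim', 'Abdalraheem', 'Abd ar Raheem' + 142 others

# Each first-name spelling can be combined with each second-name spelling.
total_variants = first_name_variants * second_name_variants

print(first_name_variants, second_name_variants, total_variants)  # 57 145 8265
```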
Some Arabic-speaking nations have avoided this problem by having dual-script citizenship registers – people have their name recorded in Arabic script and in one equivalent Latin form. But generally this does not occur, and Arabic-speaking people are free to choose their own transcription (or be given one by an official) when they apply for passports, visas and airline tickets. Thus as people travel they inevitably end up with multiple Latin versions of their Arabic names. This was of great concern to security agencies, because it made tracking criminals and terrorists extremely difficult or even impossible.
Time for standardisation
In 2007 Task Force 3 (TF3) of ISO [ISO/IEC JTC1/SC17] Working Group 3 was asked by ICAO’s New Technologies Working Group (NTWG) to consider the standardisation of the transliteration of Arabic names in MRTDs. TF3 considered the problem and decided that a true transliteration of the Arabic name was the only solution. That is, the original name in Arabic script is unique – there is only one name, and that must be preserved across the transliteration process.
Problems with preserving Arabic names in transliteration
The problem with trying to preserve the Arabic name across the transliteration process is that the Machine Readable Zone (MRZ) of the MRTD is limited to the Latin characters ‘A’ to ‘Z’ and the symbol ‘<’. In other words over 32 Arabic consonants had to be represented by only 26 Latin characters. The problem was further compounded by the fact that several Arabic characters are approximated by the same Latin character. For example, ه (heh) is a soft ‘H’, while ح (hah) is a hard ‘H’.
Constructing a transliteration scheme
As there are no existing transliteration schemes that use only ‘A’ to ‘Z’, we had to construct our own. The immediate problem was how to represent 39 basic Arabic characters of Modern Literary Arabic with just 26 Latin characters. We decided to use ‘X’ as an ‘escape’ character to get more combinations. This also enabled us to include the extra national characters that are used in some countries. ‘X’ already has a special function in the MRZ – a small number of European national characters have special transliterations using ‘X’ as a marker; for example, the character ‘Ñ’ can be transliterated into the MRZ as ‘NXX’. In addition, Arabic has no equivalent of ‘X’.
Our first draft matched the Arabic and Latin characters which had highly similar sounds. For example, ب (beh) becomes ‘B’; ت (teh) becomes ‘T’; ج (jeem) becomes ‘J’. For similar characters such as ه (heh) and ح (hah), we used the ‘X’ escape character to distinguish them: ه (heh) becomes ‘H’ and ح (hah) becomes ‘XH’. That accounted for many of the transliterations, but there remained some characters which required an escape character followed by two letters. For example, س (seen) becomes ‘S’; ص (sad) becomes ‘XSS’; and ش (sheen) becomes ‘XSH’. (If we had represented ص (sad) by ‘XS’, then ‘XSH’ could be reverse transliterated either as ‘XSH’ = ش (sheen) or as ‘XS’ + ‘H’ = ص (sad) + ه (heh), a confusion which we had to avoid.) There were some arbitrary choices: ع (ain), which does not have an exact Latin equivalent and is often transcribed as an apostrophe (‘), was transliterated as ‘E’.
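The ‘XS’ versus ‘XSS’ point can be checked mechanically by counting how many ways an MRZ string splits into valid codes. The sketch below is illustrative only: the function name and the four-code sets are assumptions chosen to mirror the example in the text, not the full table.

```python
def count_parses(text, codes):
    """Count the ways `text` can be split into a sequence of items from `codes`."""
    ways = [0] * (len(text) + 1)
    ways[0] = 1  # the empty prefix has exactly one parse
    for end in range(1, len(text) + 1):
        for code in codes:
            start = end - len(code)
            if start >= 0 and text[start:end] == code:
                ways[end] += ways[start]
    return ways[len(text)]

# Four codes only: seen, heh, sad, sheen.
rejected = {"S", "H", "XS", "XSH"}   # sad written as 'XS' (the idea that was dropped)
adopted = {"S", "H", "XSS", "XSH"}   # sad written as 'XSS' (the scheme described above)

print(count_parses("XSH", rejected))   # 2: sheen, or sad followed by heh
print(count_parses("XSH", adopted))    # 1: sheen only
print(count_parses("XSSH", adopted))   # 1: sad followed by heh
```

A count above 1 means the reverse transliteration is ambiguous, which is the confusion the longer ‘XSS’ code avoids.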
We also encountered two special cases: the ة (teh marbuta) and the ـّ (shadda).
• The ة (teh marbuta)
This is often transcribed as ‘H’ or ‘T’ or ‘TAN’, depending upon the context. Academics that we consulted recommended ‘XTA’. One of the things we did was to use a computer to transliterate 600 common Arabic names into their Latin transliterations using our scheme.
It became obvious that teh marbuta was often used at the end of names to feminise them, and ‘XTA’ looked odd in this context. Therefore we proposed that teh marbuta could be transliterated as ‘XTA’, or if at the end of a name as ‘XAH’. Thus فاطمة becomes ‘FAXTTMXAH’ (Fatimah).
• The ـّ (shadda)
This is the ‘doubling’ marker, which means that the character under this mark is doubled. For example, فوزيّ has a shadda over the final consonant ي (yeh) and becomes ‘FWZYY’ (Fawzi).
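The rules described so far (direct matches, ‘X’ escapes, teh marbuta and shadda) can be sketched as a small Arabic-to-MRZ routine. This is a minimal sketch under stated assumptions, not the published table: `ARABIC_TO_MRZ` covers only the letters whose codes are stated in this article or can be read off its worked examples, `to_mrz` is a hypothetical name, and name parts are joined with a space as in the article's examples.

```python
# Partial table: codes stated in the article or inferred from its examples.
ARABIC_TO_MRZ = {
    "\u0627": "A",    # alef
    "\u0628": "B",    # beh
    "\u062A": "T",    # teh
    "\u062C": "J",    # jeem
    "\u062D": "XH",   # hah
    "\u062F": "D",    # dal
    "\u0631": "R",    # reh
    "\u0632": "Z",    # zain
    "\u0633": "S",    # seen
    "\u0634": "XSH",  # sheen
    "\u0635": "XSS",  # sad
    "\u0636": "XDZ",  # dad
    "\u0637": "XTT",  # tah
    "\u0639": "E",    # ain
    "\u0641": "F",    # feh
    "\u0643": "K",    # kaf
    "\u0644": "L",    # lam
    "\u0645": "M",    # meem
    "\u0646": "N",    # noon
    "\u0647": "H",    # heh
    "\u0648": "W",    # waw
    "\u064A": "Y",    # yeh
}
TEH_MARBUTA = "\u0629"
SHADDA = "\u0651"


def to_mrz(name):
    """Transliterate an Arabic-script name using the partial table above."""
    words = []
    for word in name.split():
        codes = []
        for i, ch in enumerate(word):
            if ch == SHADDA:
                if not codes:
                    raise ValueError("shadda with no preceding letter")
                codes.append(codes[-1])  # doubling marker: repeat the previous code
            elif ch == TEH_MARBUTA:
                # 'XAH' at the end of a name part, 'XTA' elsewhere
                codes.append("XAH" if i == len(word) - 1 else "XTA")
            elif ch in ARABIC_TO_MRZ:
                codes.append(ARABIC_TO_MRZ[ch])
            else:
                raise ValueError(f"no code in this partial table for U+{ord(ch):04X}")
        words.append("".join(codes))
    return " ".join(words)


print(to_mrz("\u0641\u0627\u0637\u0645\u0629"))  # FAXTTMXAH (Fatimah)
print(to_mrz("\u0641\u0648\u0632\u064A\u0651"))  # FWZYY (Fawzi)
```

The names are written as Unicode escapes so that the example does not depend on how the editor renders right-to-left text.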
Preserving similarity to phonetic transcription
We did try to preserve through this whole process some similarity in the transliterated version to the phonetic transcription, but it proved too difficult in the end. For example, the name ابو بكر محمد بن زكريا الرازي (Abu Bakir Mohammed ibn Zakaria al Razi) becomes ‘ABW BKR MXHMD BN ZKRYA ALRAZY’, which is reasonably close; but فضّة (Fidda) becomes ‘FXDZXDZXAH’, which is not. Approximately 80% of names are recognisable with a little experience; all of course are recognisable with a lot of experience. The MRZ is primarily for machine reading, and it is more valuable to have the name in Arabic accurately recorded there. In effect, ‘FXDZXDZXAH’ can be easily processed by computer and always gives the Arabic name فضّة.
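The reverse direction, from MRZ back to Arabic script, can be sketched as a scan that treats ‘X’ as the start of an escaped code. This is a hedged illustration only: `MRZ_TO_ARABIC` is a partial table inferred from the examples in this article, `from_mrz` is a hypothetical name, and the sketch does not attempt to restore a shadda from a doubled code.

```python
# Partial reverse table, inferred from the article's worked examples.
MRZ_TO_ARABIC = {
    "A": "\u0627", "B": "\u0628", "D": "\u062F", "E": "\u0639",
    "F": "\u0641", "H": "\u0647", "L": "\u0644", "M": "\u0645",
    "R": "\u0631", "W": "\u0648", "Y": "\u064A",
    "XH": "\u062D",    # hah
    "XTT": "\u0637",   # tah
    "XDZ": "\u0636",   # dad
    "XTA": "\u0629",   # teh marbuta
    "XAH": "\u0629",   # teh marbuta at the end of a name
}


def from_mrz(text):
    """Convert an MRZ-style transliteration back to Arabic script."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == " ":
            out.append(" ")
            i += 1
            continue
        # An 'X' opens an escaped code: try the three-letter form, then two letters.
        sizes = (3, 2) if text[i] == "X" else (1,)
        for size in sizes:
            code = text[i:i + size]
            if code in MRZ_TO_ARABIC:
                out.append(MRZ_TO_ARABIC[code])
                i += size
                break
        else:
            raise ValueError(f"cannot decode {text[i:]!r} with this partial table")
    return "".join(out)


name = from_mrz("MXHMWD EBDALRXHYM")
print([f"{ord(ch):04X}" for ch in name if ch != " "])
```

Printing code points rather than the Arabic string keeps the output independent of right-to-left rendering.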
Benefits for Arabic-speaking countries
This leads to another important consideration: as the MRZ now contains the name in an exact Arabic transliteration, Arabic-speaking countries can use the MRZ to read passports in Arabic. That is, the name can appear on the computer screen in Arabic without any intermediate Latin stage, a stage which in the past would have given an inexact phonetic transcription. Thus the benefits of machine reading are extended to Arabic-speaking countries.
One common comment during the presentations that we made in the development stages of this project was that everything would be solved by adopting the e-Passport. Data Group 11 (DG11 – Additional Personal Details) on the chip can hold the name encoded using the Arabic tables of Unicode. Thus the name محمود عبدالرحيم is represented in Unicode by the sequence: 0645, 062D, 0645, 0648, 062F, 0639, 0628, 062F, 0627, 0644, 0631, 062D, 064A, 0645. However, this presupposes that every country introduces e-Passports, and that every country reads them. The MRZ still provides the globally interoperable component, and remains available if the chip fails. And in the end, having the MRZ in a form which is interchangeable with DG11 is valuable from a security standpoint.
Thus these three representations of the name are equivalent and interchangeable:
Arabic script: محمود عبدالرحيم
DG11: 0645, 062D, 0645, 0648, 062F, 0639, 0628, 062F, 0627, 0644, 0631, 062D, 064A, 0645
MRZ: MXHMWD EBDALRXHYM
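The DG11 line is a list of hexadecimal Unicode code points, so it can be turned into Arabic text and back with a few lines of Python. The sketch assumes only what the listing shows; it says nothing about how DG11 is actually encoded on the chip, and the function names are illustrative.

```python
DG11_CODE_POINTS = (
    "0645, 062D, 0645, 0648, 062F, "
    "0639, 0628, 062F, 0627, 0644, 0631, 062D, 064A, 0645"
)


def code_points_to_text(listing):
    """Turn a comma-separated list of hex code points into a string."""
    return "".join(chr(int(cp.strip(), 16)) for cp in listing.split(","))


def text_to_code_points(text):
    """Turn a string back into a comma-separated list of hex code points."""
    return ", ".join(f"{ord(ch):04X}" for ch in text)


name = code_points_to_text(DG11_CODE_POINTS)
print(len(name))                                       # 14 letters, no separator
print(text_to_code_points(name) == DG11_CODE_POINTS)   # True
```

Note that the listing contains no space character between the two name components, so the decoded string is 14 letters long.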
TF3 wrote a Technical Report describing the development of the transliteration process, and incorporating the final transliteration tables. To assist with the implementation, we included computer programs that can be used to transliterate from Arabic to MRZ and from MRZ back to Arabic. At the 2008 meeting of the ICAO Technical Advisory Group on MRTDs, TF3 presented the preliminary Technical Report on behalf of the ICAO New Technologies Working Group. The Technical Advisory Group (TAG) accepted the work but asked us to consider the implementation issues. In particular, what was to be done about the visual zone (VIZ), and would there be any problems for other passport users, such as airlines, railways, hotels and car hire companies?
Visual zone (VIZ)
We recommended that the form of the name in the VIZ be left to the decision of the issuing country. We expect that for the VIZ the current practice of printing the Latin transcription will be continued. Although it is true that there would no longer be a 1-to-1 match between the name in the VIZ and the MRZ (which ICAO’s Doc 9303 allows, see box 3), there were benefits in continuing this tradition in the VIZ. The bearer of the passport would be more comfortable with this, and it would have advantages in interactions with non-Arabic-speaking users of the passport, such as:
• In dealing with the bearer as a traveller, the airline and border control officials would be able to address them in a familiar form.
• If a traveller lost their passport in a non-Arabic-speaking environment, an announcement of their name over a public address system would still be possible.
Advanced Passenger Information (API)
Another issue was that of advanced passenger information (API). Should the bearer’s name as shown in the VIZ or MRZ be forwarded to immigration control authorities in the country of destination? It turns out that there was an IATA/CAWG ‘API Statement of Principles’ made at their Facilitation Division meeting in Cairo in 2004, which said that the required API data should be limited to the data contained in the MRZ of travel documents, or obtainable from existing government databases, such as those containing visa issuance information. Forwarding of the transliterated name in the MRZ form was exactly what we thought was the best situation.
In fact, the UN/EDIFACT standard for the international exchange of data, which is commonly used for API, has a definition for the PAXLST (passenger list message) data items which states that the name components for passenger clearance should be reported in the same manner as they appear in the MRZ of the ICAO standard travel document.2
There already exist substantial databases of names in alert lists – names of smugglers, illegal immigrants, terrorists, and so on. In the case of people with Arabic names, these are generally held in the transcribed Latin form. How could these names be reconciled with the new transliterated form? This issue is relatively easy to solve. There are compiled lists of phonetic variations of Arabic names obtainable from various sources. We in fact found the 57 variations of محمود (‘Mahmud’, ‘Mehmood’, ‘Mahmut’, etc.) on the internet. With the transliterated form we have the original Arabic form, so it is a much easier task to generate the variations and do comparisons. In time, we expect the transcribed phonetic variations will fall into disuse in databases as the new transliterated form takes over; the new form is after all the true Arabic version of the name.
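One way to picture the reconciliation is a lookup that maps each known Latin spelling to the single transliterated form. The sketch below is hypothetical: `VARIANTS` holds only the three spellings quoted in this article (a real list would hold all 57), and the function names are illustrative rather than part of any alert-list system.

```python
# Hypothetical variant list, keyed by the MRZ transliteration.
VARIANTS = {
    "MXHMWD": {"Mahmud", "Mehmood", "Mahmut"},
}


def build_index(variants):
    """Map each upper-cased Latin spelling to its transliterated form."""
    index = {}
    for transliterated, spellings in variants.items():
        for spelling in spellings:
            index[spelling.upper()] = transliterated
    return index


def same_name(alert_entry, mrz_name, index):
    """True if a transcribed alert-list entry corresponds to the MRZ name."""
    return index.get(alert_entry.upper()) == mrz_name


index = build_index(VARIANTS)
print(same_name("Mehmood", "MXHMWD", index))  # True
print(same_name("Mahmut", "MXHMWD", index))   # True
print(same_name("Omar", "MXHMWD", index))     # False
```

Because every variant resolves to one transliterated key, two alert-list entries spelled differently can be recognised as the same Arabic name.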
Passenger Name Record (PNR)
The remaining issue which caused us the most thought was the situation with the airline Passenger Name Record (PNR). This is used by airlines to interact with their customers, and in certain circumstances could be used for advanced passenger information. However, the fundamental stance of the TAG is that the passport is issued by governments for the identification of travellers and border control. While governments acknowledge that passports are used by other parties, the passport must fulfil its fundamental role. The PNR is primarily for airline use and its use must be separated from any border control requirement by governments. Thus, the PNR, derived from the VIZ, will be used to board passengers, track their luggage, and so on; the API data will be derived from the MRZ.
In 2011 TF3 presented the Technical Report to the TAG on behalf of the NTWG. The transliteration table for Arabic was approved for inclusion as recommended practice in Appendix 5 of ICAO’s Doc 9303. It is expected that it will appear in the next revision of ICAO’s Doc 9303, and in the meantime it will be posted at the ICAO MRTD website.
It is recognised that the implementation of the transliteration scheme may cause some short term difficulties, but the long-term benefits are substantial. Fundamentally, the proper treatment of names written in the Arabic script has to start somewhere.
1 ICAO Doc 9303, Part 1, Volume 1, Section IV, Paragraph 8.3.
2 UN/EDIFACT Implementation Guide V3.5, page 89.