Identity verification at passport control, in policing, and in retail stores is most often achieved by matching an individual’s face to a photographic identity document. Despite this, recent research has shown that unfamiliar face recognition is a difficult task, and one which is highly prone to error. In this article, David Robertson, Russ Middleton and Mike Burton outline evidence which establishes the difficulty faced by professionals in occupations which require accurate face recognition, and suggest new avenues of research which may reduce current levels of human error and have the potential to improve real‑world recognition performance.
National security officials, officers in the criminal justice system and retail staff frequently rely on face recognition to establish and authenticate the identity of an individual. At UK Border Control, officials work to ensure that only those passengers whose passport photo matches their face are allowed to enter the country. In the criminal justice system, police officers often utilise CCTV images as a means of identifying the perpetrator of a crime. In addition, cashiers in retail stores must examine face‑photo ID cards in order to prohibit the illegal sale of age‑restricted goods. Each of these occupations relies on the ability to detect correctly whether or not the face of an unfamiliar person matches a face photo on an ID card or an image still.
Our reliance on face recognition for identity verification may stem from the fact that in some instances we show a high level of expertise in this area. For example, we are able to recognise familiar faces across a large range of highly variable photos, apparently effortlessly. However, in a striking contrast, recent research has shown that we are surprisingly poor at recognising new instances of an unfamiliar person. This distinction has major implications for applied professions in which accurate unfamiliar face recognition is vital.
Unfamiliar face recognition: laboratory studies
Our ability to recognise familiar faces is certainly impressive. Figure 1 shows the same person photographed under a considerable range of visual conditions, with changes in viewing angle, physical appearance, lighting and camera. Nevertheless, familiar viewers find it easy to recognise this person. However, recent research has shown that this ability does not generalise to the identification of similar instances of unfamiliar people. For example, Burton et al developed the Glasgow Face Matching Test (GFMT), a psychometric test of unfamiliar face recognition ability, in which participants are required to decide whether a pair of face photos depicts two instances of the same person (taken seconds apart using different cameras) or two different people.1
This type of one‑to‑one matching task is the experimental analogue of the paired comparisons made by passport officers (face‑passport photo) and cashiers (face‑photo ID) on a daily basis. Despite the GFMT using high‑quality front‑facing images, error rates of between 15% and 20% are the norm, across hundreds of viewers tested.
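To make the notion of an error rate on a one‑to‑one matching task concrete, the following minimal sketch scores a set of same/different decisions. It is an illustration only, not the GFMT's published scoring procedure; the function name, the data layout and the example responses are all hypothetical.

```python
def score_matching_test(trials):
    """Score a one-to-one face matching test.

    trials: list of (is_same_person, responded_same) boolean pairs,
    one pair per trial (hypothetical data layout).
    """
    match_responses = [resp for same, resp in trials if same]
    mismatch_responses = [resp for same, resp in trials if not same]

    # A miss: the photos show the same person but the viewer said "different".
    misses = sum(1 for resp in match_responses if not resp)
    # A false accept: the photos show different people but the viewer said "same".
    false_accepts = sum(1 for resp in mismatch_responses if resp)

    return {
        "miss_rate": misses / len(match_responses),
        "false_accept_rate": false_accepts / len(mismatch_responses),
        "error_rate": (misses + false_accepts) / len(trials),
    }


# Hypothetical viewer: 10 match trials (2 missed), 10 mismatch trials (1 wrongly accepted).
trials = (
    [(True, True)] * 8
    + [(True, False)] * 2
    + [(False, False)] * 9
    + [(False, True)] * 1
)
result = score_matching_test(trials)
print(result)
```

Separating the two kinds of error matters in applied settings: a false accept at a border or checkout lets a mismatched document through, whereas a miss wrongly challenges a genuine bearer.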
In contrast to the one‑to‑one matching tested by the GFMT, Bruce et al had previously developed a task which modelled an old‑fashioned police line‑up scenario which involved a series of one‑to‑ten matching arrays.2 As seen in Figure 2, participants were presented with a single high‑quality front‑facing image still of a ‘suspect’ (taken from high‑quality video footage), below which was presented an array of ten face photos. Participants were required to decide whether or not the suspect was present in the array, and if so, to pick him out. Error rates on this task were very high, 30% on average, despite the photos being taken on the same day, in very similar pose and in optimal lighting conditions. Of course, with lower quality images, performance drops still further,3 but the key point is that it remains far from perfect, even with the highest quality images. In short, it is not the technology which limits performance on unfamiliar face matching; it is the properties of human vision.
All these studies examine people’s ability to match photos. However, it turns out that matching a photo to a live person is no easier. For example, Megreya & Burton showed that unfamiliar viewers performed no better on either one‑to‑one or one‑to‑ten matching when the target person was standing in front of them.4 Davis & Valentine showed that people were highly error‑prone when trying to match a live person to a full CCTV clip of that person, taken a short while earlier.5
Although these lab studies are informative (and have been used to inform theories of the perceptual processes involved in face recognition), they were all performed on non‑specialist viewers, typically students. In real settings, it is important to know whether people who carry out these tasks professionally are able to perform more accurately than untrained viewers. We come to these studies next.
Unfamiliar face recognition: studies on specialist face recognisers
Kemp et al provided one of the first real‑world demonstrations of unfamiliar face recognition performance.6 The study aimed to assess whether identity fraud could be reduced by including a face photo on one’s credit card. Supermarket cashiers were required to decide whether or not the face photo on a credit card matched the live face of the customer standing in front of them (half of the customers held genuine cards, half held fraudulent cards). The cashiers were aware of the purpose of the study and knew that their performance was being observed and recorded. Despite this, Kemp et al reported that fraudulent cards – cards which contained a photo of someone other than the bearer – were accepted as genuine on 50% of occasions. The authors concluded that the introduction of photo ID credit cards would do very little to improve the detection of fraud at the point of sale. More broadly, as mentioned above, cashiers are routinely required to check photo ID before selling cigarettes or alcohol. It is reasonable to assume that the level of error reported by Kemp et al would be similar for photo ID cards in this context.
Burton et al compared the performance of a group of university students and a group of 20 police officers with experience in forensic identification (13.5 years of service on average).7 As illustrated in Figure 3, participants were required to view low‑quality CCTV video clips of individuals entering a building, whom, they were told, they would later be asked to identify. The participants were then shown high‑quality face photos and asked to rate how confident they were that these individuals had been present in the video clips. Police officers showed very poor accuracy on this task, and in fact did no better than the students. Both groups were almost at chance, leading one to ask whether the videos were of such poor quality that the task was impossible. However, Burton et al also performed the same test on viewers who knew the people depicted in these videos. This group was almost perfectly accurate. Once again, we see a huge advantage for familiar over unfamiliar viewers – and a clear indication that, for this task, police officers were no better than any other unfamiliar viewer.
A recent study of unfamiliar face recognition in 30 Australian passport officers was conducted in collaboration with the Australian Department of Foreign Affairs. White et al asked the passport officers to decide whether a passport photo matched the face of a person standing in front of them.8 The study reported that the passport officials incorrectly accepted a fake passport photo as genuine on 14% of trials. Moreover, as shown in Figure 4, there was no relationship between employment duration/experience and accuracy on this task. In short, those who had 20 years’ experience were no more likely to be accurate than new recruits. As the figure shows, there was very wide variation between officers. In fact, this is the standard finding: some people are better at these face tasks than others. However, it does not seem to be the case that professional training necessarily leads to high performance levels. To put this finding into perspective, some of the world’s busiest airports handle over 200,000 people every day. It is clear that an average error rate of 14% corresponds to several thousand ID errors a day, an unacceptable security risk.
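The scale implied by these figures can be illustrated with a rough calculation. The sketch below is a back‑of‑envelope estimate, not an analysis from the study: the 14% rate and the 200,000 passengers a day come from the text above, while the function name and the smaller daily check volumes are hypothetical, included because the 14% was measured on non‑matching photos and the number of checks it applies to in practice is not known.

```python
def expected_errors(daily_checks, error_rate):
    """Expected number of wrong identity decisions per day.

    Assumes every check carries the same, independent chance of error
    (a simplifying assumption made for illustration).
    """
    return daily_checks * error_rate


ERROR_RATE = 0.14  # false-acceptance rate reported by White et al (from the text)

# Hypothetical numbers of checks per day to which the rate applies;
# 200,000 is the text's figure for daily passengers at the busiest airports.
for daily_checks in (20_000, 50_000, 200_000):
    errors = expected_errors(daily_checks, ERROR_RATE)
    print(f"{daily_checks:>7,} checks/day -> about {errors:,.0f} errors/day")
```

Even if the rate applied to only a tenth of the daily passenger volume, the expected number of errors would still run into the thousands.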
Automatic face recognition: airport e‑Gates
Although we have concentrated on human face recognition in this article, recent advances in technology have led to the installation of electronic facial recognition gates (e‑Gates) at UK airports. These machines scan a passenger’s face and attempt to match it to their passport photo. While in theory this should remove the level of human error reported above, these machines have not proven to be the security panacea that many assumed. Indeed, a recent UK Inspectorate of Borders report queried whether the e‑Gates were providing adequate security when, for example, a husband and wife were able to accidentally swap passports and still make it through the system.9 Despite the claims of suppliers, automatic recognition systems have not yet reached the levels of accuracy necessary to make them practical to use at airports.10,11
How to improve face photo identification
The preceding sections have shown that unfamiliar face recognition is a difficult task which is highly prone to error, regardless of whether one uses a live face or face photos, and whether one has a human or machine recognition system. Despite these findings, we maintain our reliance on photo ID in security, policing and retail contexts. If we are to persevere with this form of identity verification we must seek ways to improve human and machine performance. Recent advances in research, which we outline below, have begun to investigate ways in which unfamiliar face recognition can be improved.
The findings from the White et al passport office study showed that performance on the recognition task was not related to experience/years in employment.8 This suggests that some people are naturally good at this type of task, as can be seen in Figure 4. In future, an established psychometric test of face matching ability such as the GFMT could be used to assess and select the best candidates for positions in which accurate face recognition is vital.
A further study by White et al showed that performance on the GFMT (short version) was improved by 10% when trial‑by‑trial feedback was provided (i.e. the participant was informed whether they had made the correct judgement after each response).12 However, the most interesting finding was that the GFMT feedback training also led to a performance enhancement on an entirely novel set of naturally varying images. This is the first evidence that training performance on one set of images can lead to a generalisable improvement in unfamiliar face recognition.
Paired decision making
Dowsett & Burton have shown that performance on recognition tests can be improved when participants work together and come to a judgement in pairs.13 Across four experiments, the study tested unfamiliar face recognition individually (pre‑test), in pairs (paired‑test) and again individually (post‑test).
The authors report that both low‑performing and high‑performing participants were found to be more accurate when they made their judgements in pairs than in the individual pre‑test phase. Furthermore, those who started with low performance showed a lasting benefit of having worked in pairs, suggesting that this type of procedure may be a particularly effective training method.
One final method of improving unfamiliar face matching focuses on the ID document rather than the selection and training of the human recogniser. One often hears the phrase ‘your passport photo looks nothing like you’ and it seems clear that a single instance of a person can never form a true representation of their appearance. Our research suggests that the key to improving unfamiliar face recognition is learning how an individual varies across a naturally occurring set of instances. In other words, ‘familiarity’ is short‑hand for learning an individual’s idiosyncratic variation in appearance (see Burton, 2013).14 One method of achieving this for photo‑ID is to increase the number of photos of the bearer on the document. White et al reported that unfamiliar face matching performance significantly improved when mock ID cards contained arrays of 2, 3 or 4 photos.15 These findings suggest that a relatively small increase in the number of photos contained on an ID card could reduce the error found with single‑image identity documents (see Jenkins & Burton for the potential for ‘face averages’ to improve automatic face recognition systems16).
Unfamiliar face recognition is a difficult task which is highly prone to error, regardless of whether one uses a live face or face photos, or whether one has a human or machine recognition system. Given the importance placed on identity verification from photo ID in a variety of contexts, we must seek new ways to reduce human error if we are to improve security and cut fraud.
Answer to line-up in Figure 2: Suspect is not present.
1 Burton, A.M., White, D. and McNeill, A. (2010). The Glasgow Face Matching Test. Behavior Research Methods, 42, 286‑291.
2 Bruce, V., Henderson, Z., Greenwood, K., Hancock, P., Burton, A.M. and Miller, P. (1999). Verification of face identities from images captured on video. Journal of Experimental Psychology: Applied, 5, 339‑360.
3 Henderson, Z., Bruce, V. and Burton, A.M. (2001). Matching the faces of robbers captured on video. Applied Cognitive Psychology, 15, 445‑464.
4 Megreya, A.M. and Burton, A.M. (2008). Matching faces to photographs: Poor performance in eyewitness memory (without the memory). Journal of Experimental Psychology: Applied, 14, 364‑372.
5 Davis, J.P. and Valentine, T. (2009). CCTV on trial: Matching video images with the defendant in the dock. Applied Cognitive Psychology, 23(4), 482‑505.
6 Kemp, R.I., Towell, N. and Pike, G. (1997). When seeing should not be believing: Photographs, credit cards and fraud. Applied Cognitive Psychology, 11(3), 211‑222.
7 Burton, A.M., Wilson, S., Cowan, M. and Bruce, V. (1999). Face recognition in poor quality video: evidence from security surveillance. Psychological Science, 10, 243‑248.
8 White, D., Kemp, R.I., Jenkins, R., Matheson, M. and Burton, A.M. (2014). Passport Officers’ Errors in Face Matching. PLoS One, 9(8), e103510.
9 Vine, J. (2011). Inspection of Border Control Operations at Terminal 3, Heathrow Airport. http://icinspector.independent.gov.uk/inspections/inspection-reports/2012-inspection-reports/. Accessed on 7 December 2014.
10 Jenkins, R. and Burton, A.M. (2011). Stable face representations. Philosophical Transactions of the Royal Society of London, B, 366, 1671‑1683.
11 Jenkins, R. and White, D. (2009). Commercial face recognition doesn’t work. In Symposium on Bioinspired Learning and Intelligent Systems for Security (BLISS’09) (pp. 43‑48). IEEE.
12 White, D., Kemp, R.I., Jenkins, R. and Burton, A.M. (2014). Feedback training for facial image comparison. Psychonomic Bulletin & Review, 21, 100‑106.
13 Dowsett, A.J., and Burton, A.M. (2014, in press). Unfamiliar face matching: Pairs out‑perform individuals and provide a route to training. British Journal of Psychology. Pre‑publication version available online at: http://dx.doi.org/10.1111/bjop.12103. Accessed on: 7 December 2014.
14 Burton, A.M. (2013). Why has research in face recognition progressed so slowly? The importance of variability. Quarterly Journal of Experimental Psychology, 66(8), 1467‑1485.
15 White, D., Burton, A.M., Jenkins, R. and Kemp, R. (2014). Redesigning photo-ID to improve unfamiliar face matching performance. Journal of Experimental Psychology: Applied, 20, 166‑173.
16 Jenkins, R. and Burton, A.M. (2008). 100% accuracy in automatic face recognition. Science, 319, 435.