Evaluating Face Recognition Software’s Accuracy for Flight Boarding

This latest test concerns a specific application of one-to-many matching in airport transit settings, where travelers’ faces are matched against a database of individuals who are all expected to be present. In this scenario, only a few hundred passengers board a given flight. However, NIST also looked at whether the technology could be viable elsewhere in the airport, specifically in the security line, where perhaps 100 times as many people might be expected during a given time window. (The database was built from images used in previous FRVT studies, in which the subjects were not wearing face masks.)

As with previous studies, the team used software that developers voluntarily submitted to NIST for evaluation. This time, the team only looked at software that was designed to perform the one-to-many matching task, evaluating a total of 29 algorithms.

Among the report’s findings are: 

·  The seven top-performing algorithms can successfully identify at least 99.5% of passengers the first time around if the database contains one image of a passenger. With a single image of each individual in the database, the study shows that in as many as 428 of 567 simulated flight boarding processes, each flight carrying 420 passengers, the most accurate FR algorithm can identify passengers for boarding without any false negatives (a false negative meaning that the software fails to match two images of the same person). Stated as a rate, this corresponds to at least 99.87% of travelers being able to board successfully after presenting themselves one time to the camera. Six additional algorithms give better than 99.5% accuracy.

·  Performance improves dramatically if the database contains multiple images of a passenger. When the database gallery holds an average of six prior images of each passenger, all algorithms realize large gains: The most accurate algorithm will check the identities of passengers on 545 of 567 flights without any errors, and at least 18 developers’ algorithms identify more than 99.5% of travelers accurately with a single presentation to the camera.

·  Demographic differences in the dataset have little effect. The team explored differences in performance on male versus female subjects and also across national origin, which were the two identifiers the photos included. National origin can, but does not always, reflect racial background. Algorithms performed with high accuracy across all these variations. False negatives, though slightly more common for women, were rare in all cases. 
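
The flight counts and percentages in the findings above can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not NIST's evaluation code. The function names are hypothetical, and the independence model in clean_flight_probability is an assumption that the report summary does not make.

```python
# Figures quoted in the findings above.
FLIGHTS = 567                    # simulated flight boarding processes
PASSENGERS_PER_FLIGHT = 420      # passengers on each simulated flight
ERROR_FREE_ONE_IMAGE = 428       # flights with no false negatives, one gallery image
ERROR_FREE_SIX_IMAGES = 545      # flights with no errors, about six gallery images
SUCCESS_RATE_ONE_IMAGE = 0.9987  # "at least 99.87%" of travelers board on first try


def error_free_share(error_free, total=FLIGHTS):
    """Fraction of simulated flights boarded without a single miss."""
    return error_free / total


def expected_misses(success_rate, flights=FLIGHTS, per_flight=PASSENGERS_PER_FLIGHT):
    """Missed passengers implied by a per-passenger success rate."""
    return (1.0 - success_rate) * flights * per_flight


def clean_flight_probability(success_rate, per_flight=PASSENGERS_PER_FLIGHT):
    """Chance that one flight has zero misses.

    Assumption (not from the report): misses are independent across
    passengers and every passenger has the same success rate.
    """
    return success_rate ** per_flight


print(f"error-free share, one image:  {error_free_share(ERROR_FREE_ONE_IMAGE):.1%}")
print(f"error-free share, six images: {error_free_share(ERROR_FREE_SIX_IMAGES):.1%}")
print(f"implied misses at 99.87%:     {expected_misses(SUCCESS_RATE_ONE_IMAGE):.0f}")
print(f"clean flight if independent:  {clean_flight_probability(SUCCESS_RATE_ONE_IMAGE):.1%}")
```

Under the independence assumption, a 99.87% success rate would leave fewer flights error-free than the 428 of 567 reported. One plausible reading is that 99.87% is a lower bound, as the phrase "at least" suggests, or that misses are concentrated on a subset of flights. The sketch cannot distinguish between these explanations.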

Grother said that the study does not address an important factor: the sort of camera that an FR system uses. Because airport environments differ, and because the cameras themselves operate in different ways, the report offers some guidance for tests that an airline or immigration authority could run to complement the NIST test results. Such tests would provide accuracy estimates that reflect the actual equipment and the environment in which it is used.

“We do not focus on cameras, which are an influential variable,” he said. “We recommend that officials conduct the other tests we outline so as to refine their operations.”