Earlier this year, expert Ravi Das began a comprehensive series on modern-day physical and behavioral biometric technologies. As we resume the series, he introduces the science of voice recognition, including its fascinating history and the factors that can influence it.

Voice recognition is a biometric technology whose research and development dates all the way back to World War II. At that time, spectrographs showed that the intensity of the various sounds in a person’s voice varies across different frequency levels.

This led to the idea of using voice recognition to confirm the identity of a particular individual. Research and development into voice recognition continued well into the 1960s, when the voice spectrographs of the time began to use statistical modeling to create biometric templates, rather than relying on the traditional approaches.

This trend paved the way for automated voice recognition tools. In fact, the first known voice recognition system was called “Forensic Automatic Speaker Recognition,” or FASR for short.

In today’s biometric world, voice recognition can be considered both a behavioral and a physical biometric. The acoustic properties of a person’s voice are a direct function of the shape of the individual’s mouth, as well as the length and quality of the vocal cords (the physical component). At the same time, the template also captures behavioral data from the individual’s voice, including such variables as pitch, volume, and rhythm.
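Two of the behavioral variables named above, volume and pitch, can be measured directly from a digitized voice sample. The Python sketch below is a simplified illustration rather than the method of any particular product: it uses root-mean-square (RMS) amplitude as a volume proxy and picks the autocorrelation peak as a pitch estimate. The function names and the 50 to 500 Hz search range are assumptions chosen for the example.

```python
import math


def rms_volume(samples):
    """Volume proxy: root-mean-square amplitude of a frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate, min_hz=50.0, max_hz=500.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    Tries every lag (delay in samples) that corresponds to a pitch inside
    [min_hz, max_hz] and keeps the lag where the frame best matches a
    shifted copy of itself (highest autocorrelation).
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic stand-in for a voice: a 200 Hz tone sampled at 8 kHz for 50 ms.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
print(round(estimate_pitch(tone, rate), 1))  # 200.0
print(round(rms_volume(tone), 3))            # 0.354
```

A real voice is far less regular than a pure tone, so practical systems measure these values over many short frames and track how they change, which is also where rhythm comes in.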

How Voice Recognition Works

The first step in voice recognition is for an individual to produce an actual voice sample. Voice production is something we take for granted every day, but the actual process is complicated. Sound originates from the vocal cords, which have a gap between them. When we attempt to communicate, the muscles that control the vocal cords contract.

As a result, the gap narrows, and as we exhale, the breath passing through the gap creates sound. The unique patterns of an individual’s voice are then produced by the vocal tract, which consists of the laryngeal pharynx, oral pharynx, oral cavity, nasal pharynx, and nasal cavity. It is these unique patterns created by the vocal tract that voice recognition systems use.
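The division of labor described above, with the vocal cords supplying a periodic sound source and the vocal tract shaping it, is commonly approximated in speech processing by a source-filter model. The Python sketch below is a deliberately minimal version under stated assumptions: the vocal-cord source is an impulse train at a chosen pitch, and the whole vocal tract is reduced to a single two-pole resonator (one resonance, or formant). The function names, frequencies, and bandwidth are illustrative values, not measurements.

```python
import math


def glottal_source(f0, sample_rate, n_samples):
    """Source: one impulse per vocal-cord cycle at pitch f0 (Hz)."""
    period = round(sample_rate / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def vocal_tract_filter(source, formant_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonator that emphasizes energy near formant_hz.

    y[n] = x[n] + a1 * y[n-1] + a2 * y[n-2], with pole radius r < 1
    so the filter is stable.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2 * math.pi * formant_hz / sample_rate
    a1, a2 = 2 * r * math.cos(theta), -r * r
    output, y1, y2 = [], 0.0, 0.0
    for x in source:
        y = x + a1 * y1 + a2 * y2
        output.append(y)
        y1, y2 = y, y1  # shift the two-sample output history
    return output


# Hypothetical speaker: 120 Hz pitch, one vocal-tract resonance near 700 Hz.
rate = 8000
source = glottal_source(120, rate, 800)
voice = vocal_tract_filter(source, formant_hz=700, bandwidth_hz=100, sample_rate=rate)
```

Keeping the same source but moving the resonance changes the resulting waveform, which is the sense in which the vocal tract, and not only the vocal cords, shapes an individual’s voice.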

Even though people may sound alike to the human ear, everybody has, to some degree, a unique enunciation in their speech. To ensure a good-quality voice sample, the individual recites some sort of text, such as a verbal phrase, a series of numbers, or a passage of text, usually as prompted by the voice recognition system. The individual typically has to repeat this several times.
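The repeated recitations are what make a statistical template possible, in the spirit of the statistical modeling described earlier. As a hypothetical sketch (the feature choice and the mean-plus-standard-deviation template are assumptions, not the design of any specific system), each repetition can be reduced to a small feature vector, such as pitch in Hz and RMS volume, and the template can store the per-feature mean and spread:

```python
from statistics import mean, stdev


def enroll(feature_vectors):
    """Build a simple statistical voice template from repeated recitations.

    Each feature vector holds the same measurements (here, pitch in Hz and
    RMS volume) taken from one repetition of the enrollment phrase. The
    template keeps the mean and sample standard deviation of each feature.
    """
    if len(feature_vectors) < 2:
        raise ValueError("need at least two repetitions to estimate spread")
    columns = list(zip(*feature_vectors))  # one tuple per feature
    return {
        "mean": [mean(col) for col in columns],
        "stdev": [stdev(col) for col in columns],
    }


# Three hypothetical repetitions of the same passphrase: [pitch_hz, rms_volume]
template = enroll([[200.0, 0.30], [210.0, 0.34], [205.0, 0.32]])
print(template["mean"][0], template["stdev"][0])  # 205.0 5.0
```

Requiring at least two repetitions is what lets the template record how much a person’s own voice naturally varies, something a single sample cannot show.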

The most common devices used to capture an individual’s voice samples are computer microphones, cell (mobile) phones, and landline telephones. As a result, a key advantage of voice recognition is that it can leverage existing telephony technology, with minimal disruption to an organization’s business processes. In terms of noise, computer microphones and cell phones introduce the most, and landline telephones introduce the least.
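The noise comparison above can be made concrete with the signal-to-noise ratio (SNR), a standard measure expressed in decibels as 10·log10(signal power / noise power). The sketch assumes the capture channel provides a separate noise-only segment (for example, the silence before the caller speaks); that setup and the function names are illustrative assumptions.

```python
import math


def mean_power(samples):
    """Average power of a segment: the mean of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels; higher means a cleaner capture."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Hypothetical segments: speech samples with amplitude 1.0 vs. noise at 0.1.
speech = [1.0, -1.0] * 100
noise = [0.1, -0.1] * 100
print(round(snr_db(speech, noise), 1))  # 20.0
```

Each tenfold drop in noise amplitude adds 20 dB, so a channel that introduces less noise yields a higher SNR for the same speech.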

How Voice Recognition Can Be Used

Compared to other major biometric modalities (such as fingerprint recognition, iris recognition, and facial recognition), voice recognition has not been as widely deployed. However, it has found a home in some applications, such as the following:

  • Financial transactions: When you call your brokerage firm, more often than not you have to wait in a long queue to talk to a real, live person, or you may have to enter a PIN to confirm your identity. With voice recognition, all of this can be eliminated. For example, when you first call in, the system on the receiving end can automatically confirm your identity once you speak just a few passphrases. You can then proceed with whatever financial transaction you need to make.
  • Authentication for devices: Today’s smartphones increasingly rely on what is known as “multifactor authentication,” or MFA for short, where you confirm your identity through two or more differing layers of authentication. For example, Apple has been using Touch ID (for fingerprint recognition) and Face ID (for facial recognition), and has now started to introduce voice recognition as another means of authentication.
  • Microsoft is using it: Even Microsoft, the juggernaut of the software industry, has started using voice recognition in its own brands of tablets and wireless devices to confirm the identity of the user.
  • Artificial intelligence: Artificial intelligence (AI) and machine learning (ML) are all the rage today, driven largely by the adoption of ChatGPT, and experimentation with voice recognition technology is also taking place in this realm.
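The verification step in the financial transactions scenario can be sketched as comparing a caller’s fresh sample against the template stored at enrollment. In the hypothetical Python sketch below, the two features (pitch and RMS volume), the z-score distance, and the threshold of 2.0 are all illustrative assumptions; deployed systems use far richer features and scoring.

```python
import math


def verify(template, sample, threshold=2.0):
    """Accept a caller whose features sit close to the enrolled template.

    Each feature is converted to a z-score (distance from the enrolled mean
    in units of enrolled standard deviation) so that features in different
    units are comparable; the root-mean-square of the z-scores must not
    exceed the threshold.
    """
    z_scores = [
        (x - m) / max(s, 1e-9)  # floor avoids division by zero
        for x, m, s in zip(sample, template["mean"], template["stdev"])
    ]
    distance = math.sqrt(sum(z * z for z in z_scores) / len(z_scores))
    return distance <= threshold


# Hypothetical enrolled template: pitch 205 +/- 5 Hz, volume 0.32 +/- 0.02.
template = {"mean": [205.0, 0.32], "stdev": [5.0, 0.02]}
print(verify(template, [207.0, 0.33]))  # True  (close to enrollment)
print(verify(template, [260.0, 0.60]))  # False (likely a different speaker)
```

Raising the threshold lets more genuine callers through on a noisy line but also admits more impostors; where to set it is the usual trade-off between false rejections and false acceptances.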

Up Next: The next article in this series will cover the science of signature recognition as a form of biometric technology.



Keesing Technologies

Keesing Platform forms part of Keesing Technologies
The global market leader in banknote and ID document verification


Ravi Das is a Cybersecurity Consultant and Business Development Specialist. He also provides cybersecurity consulting through his private practice, RaviDas Tech, Inc., and holds the Certified in Cybersecurity (CC) certification from ISC2.
