Automatic Speech Recognition is Biased: Why we need human captioners in a digital world

Imagine a situation in which you have difficulty understanding or communicating with the people around you. Frustrated, you turn to technology to help get your message across. And then, your technology doesn’t understand you either. You feel a full spectrum of emotions, from embarrassment and frustration to inadequacy and isolation. 

Unfortunately, this scenario is all too common. An inherent bias exists in almost every form of automatic speech recognition (ASR) technology, from Siri and Alexa to the speech recognition systems used by tech giants such as Google, Zoom, and Microsoft. Studies have shown that ASR systems are more accurate for cisgender white speakers than for speakers of other genders and races. Older adults, people with disabilities, non-native speakers, and people with accents are also likely to experience higher error rates when using ASR systems.

Errors in captions are often confusing, and in certain situations they can even be life-threatening. But how does a machine become biased in the first place? In this blog post, our team explores how ASR bias arises, what its implications are, and why human captioners matter in a world that has gone digital.

How does automatic speech recognition become biased?

In captioning, ASR uses artificial intelligence (AI) to automatically transcribe speech into text that appears on the screen as people talk. Its output includes subtitles, live subtitles, closed captions, and automatic video captions.

ASR systems “learn” from the datasets that are fed into their models. A dataset could be, for example, hours of speech recordings or a collection of past company performance data. Because an ASR model can only learn from what it is given, bias can occur if the dataset lacks diversity. If a training dataset primarily uses voices from one gender or race, the model becomes skewed and no longer represents the entire population.

Recent research into the data used to train ASRs “has found a serious absence of recordings of African American speakers” and of speakers of other dialects. This leads to higher levels of inaccuracy for certain groups of people, often significantly impacting minority groups or those who have disabilities.
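For readers who want to see how such a gap can be measured, here is a minimal sketch in Python. It assumes word error rate (WER), a common way of scoring transcripts, as the accuracy metric; the studies mentioned above may have used other measures. The function names, group labels, and sample sentences are all invented for illustration and do not come from any real dataset or ASR product.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming Levenshtein distance computed over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word inserted in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """Aggregate WER per speaker group.

    `samples` is an iterable of (group, reference, hypothesis) tuples,
    where `reference` is what was actually said and `hypothesis` is the
    ASR output. The group labels are whatever the evaluator assigns.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Invented sample data, for illustration only.
samples = [
    ("group_a", "please schedule my appointment for friday",
     "please schedule my appointment for friday"),
    ("group_b", "please schedule my appointment for friday",
     "please school my point for friday"),
]
rates = wer_by_group(samples)
for group, rate in sorted(rates.items()):
    print(f"{group}: WER = {rate:.2f}")
```

Comparing the per-group rates side by side is what reveals a disparity: an overall average can look acceptable while one group of speakers sees far more errors than another.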

Implications of biased automatic speech recognition  

According to the 2022 State of Voice Technology Report, the majority of respondents said that “they view voice-enabled experiences as a critical part of their company’s future enterprise strategy.” ASR is no longer limited to simple voice commands. It now permeates hiring processes, healthcare treatment, interactive learning portals, and more. Because of the increasing prevalence of ASR in day-to-day activities, its inherent bias has become problematic for many individuals.  

Many larger companies use AI and ASR technology in their hiring processes and pre-interview assessments. If the ASR model has been trained on limited data that lacks diversity, or on data based on potentially biased past company decisions, it will likely exclude qualified individuals based on their speech patterns. As an Applied Linguistics article notes, “If ASRs in software like HireVue have been trained largely on white, middle-class, and likely cisgender male voices, then minoritized groups will be greatly disadvantaged, even if they have similar or better qualifications and even if they utilize SAE-aligned speech.”

Errors in ASR also have significant implications for the healthcare industry. Studies show that using speech recognition for medical dictation can result in numerous errors, up to three times more than human transcription. There are also concerns over new tools used to help treat mental health and memory disorders. These diagnostic tools rely on ASR, and if the models become biased, that could mean that “patients may receive less comprehensive care.”

Allison Koenecke, lead author of the article “Racial disparities in automated speech recognition,” also emphasizes that the quality of ASR technology significantly impacts people with disabilities, who are often among the most vulnerable users and who rely on ASR for voice recognition or talk-to-text features. She notes, “For someone who has a disability and is dependent on these technologies, being misunderstood could have serious consequences.”

Human captioners reduce bias

As AI and ASR models continue to grapple with biased datasets, human captioners have become more critical than ever. Many individuals rely on accurate and unbiased captioning for dictations, broadcasts, work, school, and events. It is unacceptable for captions to just be “good enough.” Professional captioning services bring greater inclusion and accessibility to schools, events, and the workplace, and help meet the needs of all users. The benefits of using human-transcribed and edited captions include the following:

  • A higher degree of accuracy with fewer errors
  • Synchronization with speech, with no delay
  • Proper chunking with punctuation and logical grammar breaks
  • Expanded audience reach
  • Captions that meet the needs of customers and employees
  • A better video-viewing experience
  • Enhanced accessibility for people with disabilities and ESL viewers
  • Support for live or virtual training, conferences, e-learning, podcasts, webinars, news conferences, and social media videos

Most importantly, live captioning services don’t discriminate based on gender, ethnicity, religion, or disability. 

Improve accessibility with Caption Pros

Communication access is a fundamental right, not a privilege. More inclusive access to information is vital for a fully functioning and successful society. At Caption Pros, our team of certified, professional human captioners provides accurate captioning services for many types of events. From public forums to business meetings and virtual conferences, we will help make your next event accessible to all participants. We provide:

  • On-site event captioning
  • Remote CART captioning
  • Instant transcription
  • Webcasting
  • Broadcast captioning

Learn more about our award-winning captioning services.