Bias in Automatic Speech Recognition
Technology has come a long way, making automatic captioning available across video conferencing software, social media platforms, and tech devices. But many users are aware of its pitfalls, including transcription delays and inaccuracies.
Automatic captioning works via automatic speech recognition (ASR) systems powered by artificial intelligence (AI). Recent studies show ASR technology misidentifies more words spoken by people who are Black, have accents, or speak languages other than English.
That’s problematic considering automatic speech recognition systems seem to be everywhere these days. The technology is at work when you use talk-to-text on your smartphone, ask Siri or Alexa for the weather forecast, or log on for a Zoom call with coworkers. These discrepancies make another strong case for using professional captioning services.
How does automatic speech recognition work?
When it comes to captioning, ASR uses AI technology to automatically transcribe the closed captions that appear on the screen as people talk. This includes subtitles, live subtitles, closed captions, and automatic video captions.
The same technology powers most tech devices. But researchers have uncovered a built-in bias in how accurately the words on the screen match what is actually spoken.
According to a study conducted by researchers at Stanford University and published in the journal Proceedings of the National Academy of Sciences, speech recognition systems from five of the world’s biggest tech companies (Amazon, Apple, Google, IBM, and Microsoft) understand some voices better than others. The systems transcribe speech from white users more accurately than speech from Black users.
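Accuracy gaps like these are typically quantified with word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the machine transcript into the reference transcript, divided by the length of the reference. A minimal sketch of the metric in Python (the sample transcripts below are invented for illustration, not drawn from the study):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (Levenshtein) divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-array dynamic programming over word sequences.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            if ref[i - 1] == hyp[j - 1]:
                d[j] = prev  # words match: no edit needed
            else:
                # substitution, deletion, or insertion
                d[j] = 1 + min(prev, d[j], d[j - 1])
            prev = cur
    return d[len(hyp)] / len(ref)

# Hypothetical transcripts, invented for illustration only.
reference = "the weather will be cloudy with a chance of rain"
asr_output = "the weather will be cloud with chance of rain"
print(word_error_rate(reference, asr_output))  # one substitution + one deletion over 10 words: 0.2
```

A WER of 0.2 means one word in five is wrong; the disparities researchers report mean that, for the same system, this number runs substantially higher for some groups of speakers than for others.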
Studies show inherent bias
“The Stanford study indicated that leading speech recognition systems could be flawed because companies are training the technology on data that is not as diverse as it could be — learning their task mostly from white people, and relatively few black people,” according to an article in The New York Times.
In a Slate article, “The Communities That Live Captioning Leaves Behind,” author Sarah Bunin Benor notes that auto-captions are often ineffective, botching words and leaving users confused. While the technology may work well for users who speak “standard” English, it does a disservice to many minority groups.
Benor, also a linguist, tested auto-captions on several platforms and found that transcription errors were not limited to her rabbi’s Hebrew words. Problems also occurred with many instances of language mixing, including Spanglish entertainment and informational videos with loanwords from Arabic, Punjabi, Vietnamese, and Amharic.
“As immigrant, indigenous, and religious groups conduct their activities online, millions of people are affected by the software’s shortcomings,” Benor writes. “This is clearly an issue of equity and inclusion, and tech companies like Facebook, Google, and Zoom must address it.”
Benor’s findings and the Stanford study highlight the pitfalls of relying solely on AI technology. They also underscore why it pays to hire professional human captioners.
Benefits of human-generated and edited captions:
- A higher degree of accuracy with fewer errors
- Synchronization with speech, with no delay
- Proper chunking, with punctuation and logical grammar breaks
- A wider audience for your content
- Captions that meet the needs of customers and employees
- A higher-quality video viewing experience
- Enhanced accessibility for people with disabilities and ESL viewers
- Suitability for live or virtual training, conferences, e-learning, podcasts, webinars, news conferences, and social media videos
Captioning for Events
Caption Pros provides professional, accurate captioning services for many types of events, including public and business meetings, virtual conferences, and audio and video files recorded before or after an event:
- On-site event captioning: Projects words on a screen by themselves or as open captions alongside video
- CART captioning: Streams words directly to the internet for instant reading on a mobile device, laptop, or tablet
- Instant transcription: Translates spoken words using computer-aided transcription; ideal for news conferences
- Webcasting: Provides real-time speech-to-text translation for webcasts, webinars, and social media live streams
- Broadcast captioning: Provides captioning for local and national news programs, conferences, government and city council meetings, and athletic events
Artificial intelligence will only continue to permeate daily life, but the technology has limitations. Captions generated by automatic speech recognition typically contain more errors, and their quality varies with the software and the device. Accessibility advocates argue that AI technology ignores linguistic and dialectal differences.
With heightened awareness around equity, inclusion, and accessibility, businesses, schools, and even big-tech giants have a renewed responsibility to meet the needs of all users. Live captioning services don’t discriminate based on ethnicity, religion, or disability.
Visit Caption Pros to learn more about our award-winning captioners, and make your next event accessible to all participants by hiring a professional captioner.