18 Nov 2020 David Titmus

Webinar Examines Whether ASR Captions Effectively Accommodate Deaf People

Though automatic speech recognition (ASR) technology has improved in recent years, it still doesn’t match the quality of captions provided by human captioners and speech-to-text professionals – and individuals who are deaf are paying the price.

That was among the takeaways from a webinar hosted by the National Deaf Center on Postsecondary Outcomes (NDC) that focused on whether captions created by ASR software effectively accommodate members of the deaf and hard-of-hearing community.

The webinar – presented by Zainab Alkebsi, policy counsel at the National Association of the Deaf, and Stephanie Zito, technical assistance coordinator at NDC – was aimed at educators and postsecondary professionals. It discussed the impact of auto captioning and ASR on effective communication with deaf students and staff, as well as the institutional responsibilities of colleges and universities.

“Although ASR technology has come a long way, it is not quite ready to be relied on for complex situations,” said Alkebsi.

She added that the “reliance on ASR as a way to accommodate deaf students actually jeopardizes effective communication, which is mandated by the Americans with Disabilities Act.”

Although the webinar was presented from a higher education/student perspective, the presenters noted that the information could be applied to other sectors as well.

Shortcomings and Pitfalls

Captions have expanded in popularity and reach over the past several years, and can now be found in the conference room and the classroom as often as in the living room.

Classroom accessibility, captions, and remote learning became hot topics this year as schools and colleges across the country explored (and continue to explore) new ways to teach students as the coronavirus pandemic forced many academic institutions to close classrooms and cancel in-person instruction.

As distance learning and virtual teaching became commonplace – 74% of the 100 largest school districts in the United States chose remote learning as their back-to-school instructional model, affecting over 9 million students – instructors had little choice but to rely on video instruction and online course materials.

And with this shift to online learning, there also has been an increase in the use of auto captioning technology.

NDC analyzed research on ASR’s impact on deaf students and determined that auto captioning and ASR have several shortcomings and pitfalls.

The NDC noted that, “to the untrained eye, ASR may seem ‘good enough’ when testing its application in a quiet office with a single speaker,” but when accented speakers, rapid-fire dialogue, group discussions, and audio distortion are introduced, the technology has yet to prove comparable to a trained human captioning professional.

“NDC did a review of existing literature on ASR, and the research shows that ASR often does not include proper grammar and punctuation, speaker identification, or speaker changes,” said Zito.

A slide from the National Deaf Center’s “Does Auto Captioning Effectively Accommodate Deaf People?” webinar shows examples of the ASR errors made in captioning the word “acetaminophen” and the abbreviation “APAP.”

She said technical vocabulary, jargon, and proper nouns are also often missing from ASR captions, and that the technology can have trouble differentiating between homophones (words that sound the same but have different meanings).

VITAC, a full-service captioning company and industry leader in captioning and accessible media solutions for more than three decades, believes in the essential human element in creating captions.

Speech automation certainly has a role in creating captions, but the programs need a human hand (and eye and ear and voice and intelligence) guiding and assisting them. The quality problems lie with “unassisted” captions, where no human is involved.

Though new technologies like ASR should be embraced, rolling them out without quality controls, testing, or adherence to the FCC caption quality best practices for accuracy, synchronicity, completeness, and placement is a disservice to caption viewers.

“Courts have made it clear that effective communication is very much a subjective experience,” said Alkebsi. “You must check with the student and find out what their subjective experience is. Let them tell you what their needs and preferences are. It is never okay to force ASR on a student. Often ASR is not perceived as effective communication by certain individuals because of its flaws.

“ASR services tout their supposedly low error rate without explaining exactly how that number came to be because not all errors are created equal.

“For instance, it could be just one word missing, but it is a critical word that changes the entire meaning of what is being taught. Such as the word ‘not.’ Or sometimes the ASR could be hard to follow along in terms of comprehension. It is very taxing to constantly be filling in the missing pieces. That puts the burden on the student, which should not be happening. With all this extra mental exercise, it often means a missed opportunity for the deaf or hard-of-hearing student to ask questions, and meaningfully engage in the class.

“Quality captions matter, and the quality of captions needs to be considered…auto captions may not be considered equal access under the law.”
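Alkebsi’s point about error rates can be made concrete. The sketch below is illustrative only – the word error rate (WER) metric is a standard way transcription accuracy is scored, but the example sentences and numbers are our own, not from the webinar. WER counts substitutions, deletions, and insertions against a reference transcript, so a caption that drops the single word “not” can score exactly the same as one with a harmless spelling slip:

```python
# Illustrative sketch: word error rate (WER) via word-level edit distance.
# Two hypothesis transcripts can share the same WER while differing
# enormously in meaning -- "not all errors are created equal."

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

reference = "the medication is not safe for daily use"
benign    = "the medication is not safe for dailly use"  # one spelling slip
critical  = "the medication is safe for daily use"       # drops "not"

print(wer(reference, benign))    # 0.125 -- one error in eight words
print(wer(reference, critical))  # 0.125 -- same score, meaning reversed
```

Both transcripts report a 12.5% error rate, yet the second reverses the instruction entirely, which is exactly why a headline accuracy number alone cannot establish effective communication.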