TDI’s recent Biennial Conference brought together speakers, educators, and accessibility leaders representing a variety of industry sectors, government agencies, and advocacy groups. The four-day virtual conference featured a broad range of interactive panels and discussions, including a “No More Craptions” session focused on captioned IP programming, captions created by automatic speech recognition (ASR) programs, and caption quality metrics.
The 21st Century Communications and Video Accessibility Act (CVAA) was designed to ensure that people with disabilities weren’t left behind as technology progressed in the digital age. Among other things, the law required that programming first captioned on broadcast TV also be captioned when delivered via IP, or online.
But as the popularity of online-only programming grows – content made specifically for streaming platforms like Netflix, Hulu, Apple TV, or YouTube, and not for traditional television – so, too, do the questions over whether that content must be captioned under Federal Communications Commission (FCC) rules, since it never first appeared on broadcast TV. (Many video platforms and streaming providers, however, have taken it upon themselves to require that their videos be delivered with captions and other accessibility options, like audio description.)
Karen Peltz Strauss, former deputy chief of the FCC’s Consumer and Governmental Affairs Bureau, said that advocates have asked the FCC to take action to address the virtual explosion of online video distributors and to exercise its authority to close some of the commission’s existing categorical exemptions, such as the one for online-only programming. (Other captioning exemptions include the lack of any requirement to caption TV commercials and a provision excusing new broadcast networks from providing captions for their first four years.)
Larry Walke, associate general counsel at the National Association of Broadcasters, said that a number of television stations across the country have tried using ASR systems to generate captions, and noted that many broadcasters who have used ASR reported no increase in viewer complaints compared with traditional human-generated captions.
“My impression…is that ASR is making a lot of progress in the last couple of years and, in certain respects, it seems to work very well,” said Walke. “And it can be valuable when it’s difficult to locate a live captioner on short notice, and it can allow more content to be captioned.”
Strauss, however, said that during her time with the FCC, there were periods when the commission did not receive many complaints about captioning and caption quality. Among the potential reasons for the lack of FCC complaints, panelists said, were viewer time constraints, difficulties in filing a complaint on the commission’s website, and the fact that many deaf and hard-of-hearing viewers don’t have the benefit of hearing the audio, so they won’t necessarily know when captions are inaccurate or incomplete.
“The lack of complaints is not a good indicator for the lack of quality,” Strauss said.
The panelists did note that strides are being made with ASR, and that the recognition software can work well with pre-recorded programming (albeit with a follow-up accuracy check). Concerns remain, however, over ASR’s use with live programming, including news and emergency reports, where accurate information is crucial.
Panelists noted some of the areas in which ASR has been shown to fall short. These include:
- Identifying and labeling speakers
- Captioning heavily accented speech
- Captioning multiple, simultaneous speakers
- Captioning with background noise
- Captioning sound effects
- Captioning names and proper nouns
- Captioning specialized technical terms and jargon
Panelists also discussed the need for the FCC to establish a new set of neutral metrics – applying to both ASR and human-generated captions – for measuring caption accuracy. The FCC’s current caption quality rules require that captions be accurate, complete, properly placed on the screen, and in sync with spoken words and sounds to the greatest extent possible.
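To make the idea of a technology-neutral accuracy metric concrete, one widely used baseline in speech research – not a measure adopted by the FCC or endorsed by the panel, but a standard starting point – is word error rate (WER), which compares a caption transcript against what was actually spoken. A minimal sketch, with hypothetical example text:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed as the Levenshtein edit distance between the two word
    sequences, divided by the length of the reference.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming table of edit distances over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical emergency-broadcast line with one ASR-style error:
spoken  = "severe weather warning for the tri county area"
caption = "severe weather warning for the try county area"
print(f"WER: {word_error_rate(spoken, caption):.2%}")  # prints "WER: 12.50%"
```

Note that WER alone captures only accuracy and completeness; it says nothing about placement, synchronization, speaker identification, or sound effects – the very areas panelists flagged as ASR weaknesses – so any metric the FCC adopted would need to go well beyond a simple error rate.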
Larry Goldberg, Head of Accessibility at Verizon Media, suggested that it might be time for a “Turing Test” for closed captioning, which would test a machine’s ability to produce captions indistinguishable from those of a human. The goal, he said, would be to establish a level playing field for human- and machine-generated captioning alike.
This is not the first time the FCC has been called upon to consider new metrics for measuring caption quality. A 2019 petition, filed on behalf of nearly a dozen deaf and hard-of-hearing, academic, and consumer groups, asked that the FCC issue a ruling explaining how the commission’s “best practices” for video programmers, caption vendors, and captioners applied to ASR. It requested that the FCC develop rules requiring live television programming to be captioned at a level that meets or exceeds technology-neutral metrics, guaranteeing that programs are accessible to those in the deaf and hard-of-hearing community.
Panel moderator and TDI board member Opeoluwa Sotonwa suggested that any metrics under consideration reflect the experience of caption consumers, not just that of industry experts.