How Are Realtime Captions Generated?

Oct 11 2012 David Titmus

The process used to deliver realtime captions begins at the fingertips of a court reporter whose specialized skills have been fine-tuned for this application. These court reporters are called realtime captioners.

Unlike a transcriptionist, who manually types each letter of a word on a standard computer keyboard, a realtime captioner uses a steno machine, which is designed quite differently from a standard keyboard. The standard computer keyboard consists of 26 letter keys, all of which can be shifted for capital letters, keys to type 31 marks of punctuation and/or symbols, 10 keys to type digits, and a variety of function keys. The steno machine consists of 22 keys and a number bar. Obviously, with only 22 keys, the operators of these machines are doing something different from what a typist does, and something that’s a little mysterious to the general public.

Understanding how the steno machine operates and how realtime captioners work will remove some of that mystery. Clearly, with only 22 keys, not all 26 letters of the alphabet are available on the steno keyboard, although some letters are there twice. No marks of punctuation or symbols are available, either, and there is no shift key for capitalizing letters. In fact, the keys themselves are unmarked.

The keyboard was designed so the operator can press more than one key at a time. Each key, or combination of keys, represents a sound, and all of the keys pressed within a single stroke represent one syllable of a word. Operators of steno machines learn a writing theory based on phonetics. Realtime captioners do not actually type or spell words on their steno machine. They write words phonetically based on the sounds that they hear, one syllable at a time. So, generally, if a word has one syllable, the captioner can write that word with one stroke of the keyboard. If a word contains multiple syllables, the steno outline will consist of multiple strokes of the keyboard. The ability to write in syllables, as opposed to typing individual letter strokes, is what enables captioners to write at speeds of 225 words per minute and above.
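To make the idea of a stroke concrete, here is a minimal sketch in Python of how several keys pressed together can be treated as one unit. The key order used below is the commonly documented English steno layout, and the hyphen that marks right-hand keys is a common notation; neither is stated in this post, so treat both as assumptions. The function name `chord` is hypothetical.

```python
# Assumed key banks (left hand, vowels plus asterisk, right hand).
# This layout is an assumption; the post does not list the keys.
LEFT = "STKPWHR"
MIDDLE = "AO*EU"
RIGHT = "FRPBLGTSDZ"


def chord(left="", middle="", right=""):
    """Combine keys pressed at the same time into one written stroke."""
    banks = ((left, LEFT), (middle, MIDDLE), (right, RIGHT))
    for keys, bank in banks:
        unknown = set(keys) - set(bank)
        if unknown:
            raise ValueError(f"keys {sorted(unknown)} are not in bank {bank}")

    def in_bank_order(keys, bank):
        # Keys are pressed together, so the order they were passed in is
        # irrelevant; the bank order decides how the stroke is written.
        return "".join(k for k in bank if k in keys)

    l, m, r = (in_bank_order(keys, bank) for keys, bank in banks)
    # With no middle key, a hyphen separates left-hand from right-hand keys
    # (assumed notation), so a repeated letter such as T stays unambiguous.
    separator = m if m else ("-" if r else "")
    return l + separator + r


# One stroke, three keys pressed at once: a one-syllable sound.
print(chord(left="K", middle="A", right="T"))
```

Because some letters appear in both the left and right banks, the sketch keeps the banks separate, which matches the post's remark that some letters are on the keyboard twice.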

Each captioner writes words based on the way they interpret the sounds they hear, so two people trained together in the same classroom won’t necessarily use matching steno outlines for the same word. To increase their speed, captioners also develop their own brief forms, or shortcuts, for multisyllabic words or phrases that are used repeatedly in their particular specialty of work, individualizing even further the way they write.

Since not all captioners write every word exactly the same way, each one of them develops dictionaries geared specifically to their personal writing style. These dictionaries are computerized and are used in conjunction with specialized software to electronically record and translate the steno strokes into English words. For a word to translate correctly, the captioner first must enter the strokes for that word into a dictionary.
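The dictionary lookup described above can be sketched in a few lines of Python. This is an illustration only, not how any particular captioning software works: the outlines are made up, the greedy longest-match strategy is an assumption, and the names `DICTIONARY` and `translate` are hypothetical.

```python
# Hypothetical personal dictionary: each outline is a tuple of strokes.
# The outlines are invented for illustration, not taken from a real theory.
DICTIONARY = {
    ("KAT",): "cat",
    ("KAP",): "cap",
    ("KAP", "SHUN"): "caption",
    ("REEL", "TAOIM"): "realtime",
}


def translate(strokes, dictionary):
    """Translate a list of strokes into text with greedy longest-match lookup."""
    longest = max((len(outline) for outline in dictionary), default=1)
    words = []
    i = 0
    while i < len(strokes):
        # Try the longest possible outline first, then shorter ones.
        for size in range(min(longest, len(strokes) - i), 0, -1):
            outline = tuple(strokes[i:i + size])
            if outline in dictionary:
                words.append(dictionary[outline])
                i += size
                break
        else:
            # No entry for this stroke: pass the raw stroke through, which is
            # why a word must be in the dictionary before it can translate.
            words.append(strokes[i])
            i += 1
    return " ".join(words)


print(translate(["REEL", "TAOIM", "KAP", "SHUN"], DICTIONARY))
print(translate(["KAP", "KAT", "SKWR"], DICTIONARY))
```

The second call shows both cases at once: the first two strokes are in the dictionary and translate to words, while the last stroke has no entry and comes out untranslated.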

Using their skills, their dictionaries and the software, captioners produce highly accurate captions. Some errors are still likely, but they should be minimal, and captioners do everything possible to reduce them: they research and build their dictionaries in preparation for each event or program, and afterward they review any errors that did occur in an effort to prevent that type of error in the future.

Stay tuned for more posts about our production processes!