>> Welcome to the "Bars and Tone" radio program, an in-depth look at the news and issues facing AHECTA members today. Now, here are your hosts, Hal Meeks and B.J. Attarian. >> All right, so, here's the deal. Our voiceover person, Megan, recorded a brand-new introduction that included you, Brandon, and forgot to save it. Now she is the Harley-Davidson intern girl riding across the country on the motorcycles. So, we can't get a new intro until she gets back here in the fall. So, you've got me, B.J., and Hal. Hey, Hal, how you doing? >> I'm doing great. >> And Brandon is here. It's the Fourth of July week. You have big plans for the Fourth of July? >> Oh, yeah, yeah. I'm going to Bellhaven, North Carolina, to watch the Fourth of July parade. It's a wonderful slice of Americana. I highly recommend it. >> It's for everyone who is in the vicinity of Bellhaven. What's the parade like? >> Pretty much anything that has wheels can go. And that includes riding tractors, ATVs, semis, tractors. >> Big Wheels? >> Yeah, tractors and just all kinds of stuff. It's pretty awesome. >> Cool. Enjoy the Fourth of July, whatever you're doing for the Fourth of July week. Today on the show, we have a great show for us here today. We're going to be talking captioning, all types of captioning -- captioning after the fact, captioning for the web, live captioning. Our guests include Daniell Krawczyk, president of Municipal Captioning, which is a live-caption aggregator. We've also got the vice president of product for rev.com. Mark Chen will be with us. And John Capobianco, chief marketing officer of Vitac, a leading company in the live-caption sector. So, let's get right into it. And joining us now is friend of the show, really, Daniell Krawczyk. He is the founder and president of Municipal Captioning, formerly of TelVue, then Tightrope. >> Yeah, and LiveU in between. >> That's right. And you were on the show, I think, for each one of those. >> I think I may have been. >> So, yeah. So, welcome back to the show again. >> Thank you. >> And talk about Municipal Captioning. What is the latest thing you're doing? >> Yeah, sure. So, last year, I was at the AHECTA conference, and I met a bunch of folks who were doing closed captioning through Georgia Tech, AMAC -- Accessibility Media Research Center -- and learned what was happening in the world of higher education, that due to ADA lawsuits, higher education was close-captioning everything -- all of the videos being distributed online, post-production-wise, but also pushed to close-caption all things that were happening live. And it made me realize that the world of cities and counties, public access, government access, that larger world I've been working with four years, was also going to need to be resolving this issue of providing effective communication to all the citizens through live, close-captioning of meetings, sports events, other live events, and then captioning of the other content of broadcasting. So, shortly after AHECTA, I left the job I was at, at Tightrope, and I started Municipal Captioning to help these community-television organizations, universities, different groups evaluate all the different options that are out there for captioning the content live or non-live and help them project the costs out for all the content that they have, compare three or more different options, and then be able to buy something that meets their needs. >> So, you don't actually do the captioning itself. >> That's right, yeah. 
Rather than be the person who is serving as a human professional captioner when there are many different services that provide that, or try to launch a new technology product that does it automatically with AI, I'm aggregating the needs of hundreds of different cities and pulling them together so that we can get better pricing from all the different solutions that are out there, and then helping cities combine the different elements -- hardware from here, software from there, maybe correction via this interface, correction on their own, or correction with a third party -- so that they can use the scale that they need, and the scale of all the other communities around them, to get something that fits their budget. >> So, why, if I am a company -- and you kind of hit on it there a little bit -- but if I'm a company, why would I come to you instead of going directly to the captioning source? >> Sure, sure. So, if someone's trying to figure out what they're going to do, the first thing that tends to be a problem is trying to find three or more different options so that they can get multiple quotes, compare, and see what the fit is. So, I make it a lot easier for them. Rather than starting from scratch, I can help them see what all the different combinations and live-hardware solutions look like -- whether it's an up-front-only cost or something you pay for by the hour -- and I can help them project out those costs. And then, this is a field where things are changing really quickly, so to presume that we can figure out what the perfect solution is right now and that it will still be the best solution for everybody in six months or a year is really unlikely. So, by working with me, they get to see what all the options are now, and then I'm going to keep them abreast of what all the options are as time goes on, so if in six months, a year, a year and a half, there's a better solution, they can easily switch without having to create a whole new contract. >> And I want to come back to that in a minute because I have a different question -- kind of what you're talking about there. But as you know, we're at a university here, and we are getting into live captioning. What are some of the big obstacles that you see for a university or a smaller municipality in getting into this live captioning? Because it's one thing if I want to go ahead and caption after the fact. >> Sure. >> Because there are a lot of opportunities for that. But actually to do the live-captioning part, that's another whole ballgame. >> Yeah, for sure. So, there's a couple of elements there. Obviously, you need hardware. You need equipment that's capturing the audio in real time and either feeding it to a human captioner, who is being paid by the hour, or feeding it into a machine-learning or artificial-intelligence system that is doing the speech recognition. So, you have the initial hardware that's sitting there in your broadcast path, taking the audio from the meeting or the game, and then you have what I call the engine. That engine could be a human engine. It could be a person who is typing furiously on their keyboard and swapping every hour and a half, two hours with another person for a long event. It could be a physical server that sits right next to that encoder and runs the software locally. It could be an engine in the cloud, so that the audio's going off and then coming back. So, those are the two main things. You need to have the encoder that's putting the closed captions into the signal, and you need to have the "engine" that's generating those captions. And there are a lot of barriers, because people have to figure out how they're going to pay for this, who's going to be doing the work, and what's going to be compatible.
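To make that encoder-plus-engine split concrete, here is a minimal sketch of the live path. Every name in it is a hypothetical stand-in for illustration, not any particular vendor's hardware or API:

```python
# A sketch of the live-captioning path described above: audio from the event
# feeds an "engine" (a human captioner or a speech-recognition service), and
# the resulting text is handed to the caption encoder sitting in the
# broadcast chain. All three functions are hypothetical placeholders.

def read_audio_chunks():
    """Stand-in for the capture hardware: yields short chunks of program audio."""
    ...

def transcribe(chunk):
    """Stand-in for the 'engine': a human keying text, a local server,
    or a cloud speech-recognition service."""
    ...

def send_to_encoder(text):
    """Stand-in for the encoder that embeds the captions (CEA-608/708 for
    broadcast, or cues for a webstream) into the outgoing signal."""
    ...

def run_live_captions():
    for chunk in read_audio_chunks():   # 1. capture audio in real time
        text = transcribe(chunk)        # 2. the engine turns speech into text
        if text:
            send_to_encoder(text)       # 3. the encoder puts it in the signal
```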
>> And it does sound like that could be quite a bit more expensive than if I was just doing an AI caption-after-the-fact. >> It can be. I will say, though, that there are so many options now that it's actually surprising: real-time AI captioning can be less expensive than a lot of the after-the-fact post-production solutions. So, traditional live human captioning tends to be over $100 an hour. There are some that are less, but for the most part, $150, $125, sometimes even more, is what you pay for real-time human captioning. But because there's such a range of artificial-intelligence solutions now, they tend to be a fraction of that, and some of them are even a fraction of the cost of the more expensive AI solutions. >> So, if I was a small school or a big school, how would I start this process of trying to research this? >> Sure. Again, this is mostly what I serve folks as -- the central research person to help them with it -- but if I wanted to give a couple pieces of advice for someone who wanted to do the research on their own, it would be to look at the various pieces of equipment and try to figure out the compatibility with the various engines. See what's flexible enough that if, six months or a year from now, the best technology for changing audio to text is different, you can still reuse the things you've already invested in -- or have you sunk costs into something that you can't reuse? >> Or they could come to you. >> Yeah, of course. >> And make it a lot easier, right? >> I'd be happy to walk them through what the different options are, what the different pricing models are, and figure out what's relevant. >> Okay, one thing I wanted to ask: when you're talking to your clients, are they primarily focusing on captioning for broadcast? >> Yeah. So, a lot of my clients both put their meetings and other content on a television channel, a cable-television-broadcast channel, and they stream it online. They have a webstream that can either be watched in the browser or be watched on people's over-the-top devices. So, when I talk about broadcast with my customers, it's usually both television and the web. >> Right. Okay, so, do you have these folks doing any post-production captioning, as well? >> Sure. So, there's both elements, right? Captioning it live doesn't give you the corrected version for post. So, I also have a database of solutions that are post. They range from ones that are entirely automated and don't have corrections to ones that do three layers, where you have the AI followed by two levels of human correction. And then there are even options that combine live with post, where AI generates captions in real time and then humans correct it to get it to perfect, or close to perfect, post-production quality.
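As a rough worked example of the cost projections Daniell describes, here is a back-of-the-envelope comparison using the ballpark rates from this conversation. The hours and the two AI rates are invented for illustration; none of these figures are real quotes:

```python
# Projecting annual live-captioning cost for a hypothetical organization.
# Rates are illustrative: ~$125/hour for live human captioning was mentioned
# above; the AI rates below are made-up placeholders at "a fraction of that."

HOURS_PER_WEEK = 6      # e.g., two meetings and one game per week
WEEKS_PER_YEAR = 50

rates_per_hour = {
    "human captioner": 125.00,
    "premium AI":       30.00,  # hypothetical
    "budget AI":         5.00,  # hypothetical
}

annual_hours = HOURS_PER_WEEK * WEEKS_PER_YEAR  # 300 hours/year
for option, rate in rates_per_hour.items():
    print(f"{option:>16}: ${rate * annual_hours:>10,.2f} per year")

#  human captioner: $ 37,500.00 per year
#       premium AI: $  9,000.00 per year
#        budget AI: $  1,500.00 per year
```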
>> So, you kind of hit on it a little bit earlier, talking about doing the research and seeing where we are in six months. Technology is changing so quickly. >> Oh, it's crazy. >> So, look into your crystal ball. Where do you think the market's going, and what do you see coming? >> Sure. So, I went to IBC last fall and NAB this spring. What I noticed already as a marked difference was the sheer number of things that were advertised as using AI -- not just for what I would think of as the first generation of transcribing purposes, but also using AI to do screen scraping. I saw products that will read all the lower thirds in your video, or even just the name badge in front of the speaker on the desk -- it can read that text and incorporate it into your search. I saw a lot of things coming out that incorporated real-time speech transcription so that you could do a better job of searching your giant video archive. So, I think what we're going to see is a lot of secondary services, tertiary services -- things that are built to help people deal with their thousands of hours of video in a more efficient way, now that they have searchable speech-to-text. >> I've seen some of those things, too, and Final Cut is actually incorporating some of that now, as well, where they're going out and looking at it -- not even with the metadata, but AI is determining what the metadata should be without anyone actually having to go in and enter it. And then it groups it. So, I can see that coming down the line. >> That's a really good point. So, I think it's shifting from something that has traditionally always happened after the production -- captioning was, like, the last step -- to where we're now starting to see tools that are built to be used inside the nonlinear editor, and then they advertise that it can help you with your editing because you can use the transcripts to find the things that people are saying. Some of the tools allow you to trim the text transcript, and then it gives you an edit list for your video. So, if we were editing this podcast, and I found the part where I said the wrong phrase, we could just delete that phrase I said incorrectly, and it would stitch the audio together so it didn't sound like I stumbled over myself.
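That transcript-driven editing idea reduces to simple bookkeeping once every word carries a timestamp: deleting words from the transcript yields the list of media segments to keep. A minimal sketch, with invented words and timings:

```python
# Transcript-driven editing: each transcript word has start/end times in
# seconds, so deleting words yields "keep" segments for the audio/video.
# The words and timestamps below are invented for illustration.

words = [
    ("we",     0.00, 0.20), ("said",   0.20, 0.45), ("the",  0.45, 0.55),
    ("wrong",  0.55, 0.90), ("phrase", 0.90, 1.40), ("here", 1.40, 1.70),
]

def cut_list(words, delete_indices):
    """Return (start, end) media segments to keep after deleting words."""
    segments, cursor = [], 0.0
    for i, (_, start, end) in enumerate(words):
        if i in delete_indices:
            if start > cursor:
                segments.append((cursor, start))  # keep audio up to the cut
            cursor = end                          # skip past the deleted word
    media_end = words[-1][2]
    if cursor < media_end:
        segments.append((cursor, media_end))      # keep the tail
    return segments

# Delete "the wrong phrase" (words 2-4); the editor stitches what remains.
print(cut_list(words, {2, 3, 4}))  # -> [(0.0, 0.45), (1.4, 1.7)]
```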
>> Amazing where it's going. So, if we want to find out more about Municipal Captioning, where do we go? >> All right, so, we have our website, obviously, MunicipalCaptioning.com. You can also find us on Facebook, and you can also e-mail me at DanK@MunicipalCaptioning.com. >> Okay, Daniell Krawczyk, founder and president of Municipal Captioning. Thanks for joining us here today. >> Thank you, guys. >> We're talking with Mark Chen, who's the vice president of product -- is that correct? >> Yes, VP of product. >> For Rev, which provides captioning and transcription services. In what instances are closed captions required? Do you have any thoughts on that? >> Yeah. The requirement for closed captions on video comes from two different sources, two different regulatory regimes. One is the FCC, and the other is the ADA. For purposes of educational content, it's primarily governed by the ADA, the Americans with Disabilities Act. It basically says that you have to accommodate students who might be deaf or hard of hearing. So, if you put any video online -- if you're capturing lectures and making them available online for other students -- and you potentially have students who are deaf or hard of hearing, you need to have those videos captioned, right? Provide an alternative so they can get the same value out of that video as hearing students. And the FCC steps in when content is put on television. Essentially anything that goes on television has to be captioned. There are some carve-outs -- if it's broadcast between 2:00 and 4:00 in the morning, for instance, or if it's in a foreign language, it doesn't have to be captioned. But if you're using public airwaves, basically you need to have your video captioned. And then, of course, that sort of becomes pervasive online, as well, because the FCC a couple of years ago said that if any content has ever been on television and is then put online, that content needs to be captioned, as well, right? So, any TV shows or movies that were broadcast on television and are now on Netflix -- well, those need to be captioned, right? If you have a talk show that's broadcast at 10:30 at night, but then you take a 5-minute clip of it and put it on Facebook -- well, that 5-minute clip was on television, and so it has to be captioned, as well. And so, basically, anything that you want to be shown to a larger audience and be accessible to deaf or hard-of-hearing people has to be captioned. >> Okay, so, you've touched on something here that I think is very important, which is that content that was originally broadcast or in some other medium has to be captioned when it's put online. What about content that is native to an online environment, that was never broadcast or anything like that? >> Yes, technically it doesn't have to be captioned. At least, the FCC has primarily steered clear of it. In fact, there was an interview a number of months ago -- I forget with whom -- that basically declared that Netflix originals, right, content that Netflix develops on its own and that only goes on Netflix, don't have to be captioned. So, from a regulatory perspective, you're not required to have it captioned. But from a business standpoint, sort of for customer satisfaction, most content owners are moving that way. I think interesting models for this would be the online-education platforms, like Craftsy, Pluralsight, Khan Academy, lynda.com -- all of those sites. I'm not sure how familiar your listeners are with those, but they're subscription sites for the most part, where you can go online and learn, right -- further your career, learn personal skills, et cetera. And because they charge for it, for the most part, customers are looking for a better, premium experience. So, if you go to those sites, all of their videos are captioned, because that's what customers are looking for. You know, 30% or more of all online-video viewers are playing video with captions turned on. Even though the total population of people with hearing difficulties is somewhere around 6%, a much larger share of the audience is actually getting value out of captions. >> Why is that? Why do you think people are actually using captioning? >> I think there's a wide variety of reasons. One big driver of it is mobile. When you are mobile, you're listening off headphones. You're on the move. So, even if you're watching, say, Netflix or Amazon Video, you might be wearing headphones, possibly low-quality ones. Another driver is watching with other people in the room. Some people may not be considered hard of hearing, but they have less-sensitive ears than their partners. They don't want to turn the volume way up. And sometimes, what I've heard from some viewers, it's when you're watching a show with accents, right?
People are watching, I don't know, "Game of Thrones" or something along those lines, and sometimes you want the captions just so you can understand what's being said, because it's spoken with a heavy accent. But going back to mobile, with Facebook or Instagram and auto-roll, video basically starts to play as soon as you scroll through your feed, so it's much more important to have captions. We heard from one content owner that their videos on Facebook are viewed three times as often with captions as without captions. And that's primarily because when Facebook uses auto-roll, it's automatically muted. Your videos play, but there's no sound -- just imagine all the people either on the bus or in their office setting discreetly scrolling through Facebook and watching a video. And because they play without sound, captions are critical to getting your content understood. >> That's a great answer. I think you've touched on some things that explain why captioning is relevant for people who are not hearing-disabled. One question I think that comes up a lot is how are subtitles different from captioning? Do you have some thoughts on that? >> Yeah, I have some thoughts. Unfortunately, there's not really an industry standard. Subtitles and captions -- those terms get used quite often interchangeably. For us at Rev, and I think it's probably the most common usage in the industry, captioning is putting words on screen in the same language that the content was originally recorded in, right? So, imagine English video, English content, with English words on screen -- and that can be in closed or open form -- whereas subtitles tend to refer to words that are in a different language, right? So, English video with French subtitles, or vice versa. >> Okay, so, what we're going to do now is we're going to switch gears a little bit. We're going to talk about your company and some things that you do. First of all, what media formats do you accept for captioning? >> We essentially accept any nonproprietary video format, right? So, .mp4, QuickTime movies, Windows Media files, even .avi files. Essentially, if you can open it up in any sort of video player, like VLC, we'll be able to caption it. On our side, what we do is transcode it all into a standard .mp4 format at a lower resolution, which makes it easier to move around. Some people will send us ProRes files that are gigabytes per hour, or per 10 minutes, which are just impossible to move around. So, yes, we take pretty much anything, as you can tell. >> So, that includes audio-file formats, like .mp3, as well, right? >> Yes -- .mp3, .wav, et cetera. With audio files, it gets a little bit trickier, because we do have some audio recorders, from, like, Olympus or Sony, that will record into their own formats, which are a little bit more problematic, but yes. >> Oh, yeah. Yeah, I'm familiar with Olympus. Yeah, they use a weird audio format. A lot of times, if the files are going to be really large, what we'll do is basically submit just the .mp3 -- the audio-only portion of the video -- and then afterwards, in post, we'll take the caption content that you provide and marry it back into the video. And that works fine. >> Yeah, that works fine for us, as well. I'd say more common is that people will create low-res proxies and send those over to us, because there are some cases where having the video in conjunction with the audio leads to better output. Like, you know somebody is off-screen, and you can refer to that person as being off-screen. It helps with speaker tracking a little better if you do have video. But .mp3, just audio, is fine, as well.
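For anyone making those low-res proxies or audio-only files themselves, a common approach is a quick ffmpeg transcode. A minimal sketch -- the filenames are placeholders, and the resolution and bitrate choices are arbitrary:

```
# Make a small H.264/AAC proxy of a large master file for upload:
ffmpeg -i master.mov -vf scale=-2:480 -c:v libx264 -crf 28 -c:a aac -b:a 96k proxy.mp4

# Or extract just the audio as an .mp3, as described above:
ffmpeg -i master.mov -vn -c:a libmp3lame -b:a 128k audio.mp3
```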
>> Okay, actually, that's a great point -- something I hadn't thought about. Let's say I submit a video file to you that's roughly an hour long. What should I expect in terms of the turnaround time? >> We'll get an hour-long caption file back to you within two days. That's our guarantee -- 48 hours. I'd say what's probably more typical, what you can expect, is about a day, and that turnaround time is highly dependent on length, right? At the end of the day, it takes time to go through a video, type it out, synchronize it, and then quality-check it. And so, if you were to send in something that's 30 minutes long, we guarantee 24 hours, and what's more typical is probably about 10. And we have a lot of clients who use us for shorter clips for social media, YouTube, et cetera, and 5-minute videos will get turned around in about an hour. >> Wow. That's fantastic. What is the minimum cost for captioning? >> The minimum cost is just a dollar. So, our pricing is pretty simple. It's $1 per minute of content. So, if you have a 30-minute video, it costs $30. If it's an hour, it'll cost $60, with a one-minute minimum. If somebody sends us 10 seconds of video, we'll charge them $1. >> What an outrage. [ Laughs ] >> [ Laughs ] We have had a lot of people ask for discounts, but, frankly, if people want to stitch five of those together into one video that's still under a minute long, it'll still be a dollar. So, that's fine. >> Yeah, I've told my students about that -- that they can submit their student projects to you guys, and they're typically about 3 minutes long, and then for $3, they can have captioning for their videos. They could also have a transcription, as well. >> Right, right. >> That's great. Your pricing is wonderful, and it's nice that it's easy to understand. How do you guys handle multiple speakers in, let's say, a video? If you've got, let's say, two or three people, how do you handle that? >> Yeah, our standard is to note that it is a different speaker by putting a dash in front of the dialogue block. That is customizable. In other words, some clients don't like having the dash. They think it distracts from the viewing experience, and so we can actually remove it. In other cases, you can have us add names, as well. We do all that on the back end, anyway. So, our transcriptionists are identifying speakers and noting them as such already. But if you don't want those, obviously we can relatively easily remove those.
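For reference, here is what that dash convention looks like in a plain .srt caption file -- the dialogue and the timings below are invented for illustration:

```
1
00:00:01,000 --> 00:00:03,200
- How do you handle multiple speakers?
- We put a dash before each new speaker.

2
00:00:03,400 --> 00:00:06,000
- Or, if the client prefers, we can
add speaker names instead of dashes.
```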
>> Okay, okay. Let's say I've got some video content that's got very specialized terminology -- let's say, for instance, medical terminology. Can you guys handle that? >> Yeah, I'd say for the most part, yes. You know, just to give you a little bit of context for how we do things, how Rev works behind the scenes: when a customer uploads a video and places an order, the video goes into our system, as I mentioned before. We do a few things to transcode it, clean it up, and get it to a format that's easy to ingest. Then, essentially, we make it available to our captioners, and it's first come, first served, right? They have access to all of the different projects or jobs that are available at any given time and how much, essentially, they'll get paid to do that work. And they can listen to clips. And so, as you might imagine, what ends up happening is that the people who are best suited to work on certain content, or most excited to caption a particular project, will claim those projects first, right? So, somebody who wants to learn about physics, or knows about physics, is most likely going to be the first person to claim a project, a video, that is a physics lecture. Or somebody who has experience doing medical transcription will claim medical jobs first. For everyone else, what we expect is that you will do the necessary research to look up terms that are new to you. For somebody who has experience, that's relatively easy, and for somebody who doesn't have that experience, they can do those projects, too, but they're still expected to look terms up, right, or to identify things. >> Right. >> Yeah, so, we can do it within limits. Basically, if it's an actual word, our transcriptionists and captioners will typically find it. >> What is your accuracy, in terms of your transcription? >> We guarantee 99% word-level accuracy, so that every audible word is captured properly. And then, as far as time alignment is concerned, we guarantee down to a hundred milliseconds -- the caption group, the block of text, will appear on screen within a hundred milliseconds of when it was actually spoken. >> That's fantastic. That's absolutely wonderful, because as you know, for captioning content, because of accessibility guidelines, it's very critical that you have a high degree of accuracy. And that's actually one of the problems that we see with machine transcription -- typically, the accuracy is not good enough. >> Yeah, it's very common -- particularly in the academic world, but sometimes even in broadcast, and certainly online, where the requirements are not so stringent in terms of accuracy -- for people to look for an automated solution because the cost is lower. Our costs are low, but there are automated options that are even lower, and the accuracy just isn't there, particularly in the words that matter. The speech-rec systems always seem to be able to get words like "the" and "and." It's the proper names of companies or individuals or products, et cetera, that they don't get correct. >> Okay, I've got one last question for you, and this has to do with a scenario that some people have probably experienced. Let's just say that you have someone who has a YouTube video, and they want it to be captioned. What would they actually do to use your service? >> Yeah, it's pretty easy, and there are two methods. The first method is probably the easiest. You just go to rev.com/caption, click "get started," and copy the link over from YouTube, right? So, you can go to YouTube, copy the link for your video, and literally paste it into the order form. We will automatically get it and detect the length, and you can add in your credit card and check out, and then you'll get it back in .srt format, which is a text file that you can then go back to YouTube and upload. That's probably the easiest way to get started.
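If you manage a channel and want to script that last upload step yourself, the YouTube Data API exposes a captions endpoint. A minimal sketch using the Google API Python client, assuming you have already completed the OAuth flow and hold credentials in `creds`; the video ID and filename are placeholders:

```python
# Upload an .srt caption track to one of your own YouTube videos.
# Assumes `creds` holds OAuth credentials with a YouTube scope.
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

youtube = build("youtube", "v3", credentials=creds)

request = youtube.captions().insert(
    part="snippet",
    body={
        "snippet": {
            "videoId": "YOUR_VIDEO_ID",  # placeholder
            "language": "en",
            "name": "English",           # track label shown to viewers
        }
    },
    media_body=MediaFileUpload("captions.srt"),
)
response = request.execute()
print("Uploaded caption track:", response["id"])
```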
I'd say most of our YouTube customers, particularly those who have ongoing needs and manage a channel, connect their YouTube account with Rev. So, the process is similar. It's just that when you go to rev.com/caption and you go to place an order, there's an option there to connect your YouTube account. And by clicking that, you basically log into YouTube and give Rev authorization to view your channel. And then you get a little list with thumbnails of all the videos that you've uploaded to your channel. You can check or uncheck the box for any videos you want or don't want captioned, and then you click "okay." Then the checkout process is the same. We charge you $1 per minute. We automatically detect all the lengths. But when we're done with the captions, instead of sending you an .srt file, we push those captions back to YouTube on your behalf, right? So, when they're done, you just go onto your YouTube video, and you'll automatically see the captions appear on your video without having to touch it. >> Okay. Well, you know, Mark, that's it. That's all I've got for you today. >> Great. Thank you so much. >> I really appreciate your time. Okay? >> No problem -- our pleasure. Clearly, we love talking about videos and captions. We're trying to drive down the cost as much as possible so that more people can have access to the technology, to text on their videos. So, happy to help. >> Okay. I was talking with Mark Chen from Rev, which provides transcription and captioning services. Mark, I really appreciate your time, and I hope you have a great day. >> Thank you, Hal. Thank you. >> Thank you so much for tuning into the "Bars and Tone" podcast. Today, we are talking about captioning, and I'm joined by a very special guest, the chief marketing officer of Vitac, John Capobianco. He is joining us today over the phone. It's a big company -- the biggest captioning company, the biggest accessibility company, in the country. You'll have seen some of their captions if you watched the recent Stanley Cup finals, "America's Got Talent," "The Tonight Show with Jimmy Fallon" -- all the things that they caption. They also do conferences, graduations, events, and sports, which will be a little bit more relevant to our listeners here in the education field. John, what else can you tell us about Vitac and what an average day is like? How much stuff are you captioning every single day? >> [ Chuckles ] Well, sometimes it kind of amazes people, just the volume of captioning we do on a daily basis. We do about 550,000 hours of captioning a year. That's a little bit more than a minute's worth of captioning for every second of every day, 24/7/365. >> Wow. >> So, 2 billion seconds of captioning on an annual basis -- just kind of an amazing thought when you consider that it's people that do this. There are people that are listening to whatever the event or the broadcast is, and they are transcribing that into the written word and transmitting it to -- it could be an event center, it could be a classroom, it could be the NBC News. And that gets put onto the screen. Now, most people think of captions as real-time, on the morning news and stuff like that, where you can see it, or for a sporting event, if you're in a restaurant or another establishment where it might be kind of noisy and they're kind enough to put the captions on so you can actually know what's going on when you can't hear it. So, the average day here is a lot of what we call realtime, which is live broadcast, and we're captioning those. And, again, that's true whether it's a lecture hall, or it's an event center and there's some baseball game going on, for instance, or it's a major event with a major corporation and the keynote speech is being transmitted in text, as well as through sound. So, a lot of people think about that, but there's also a lot of what we call "offline," which you might think of as prerecorded programs.
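Incidentally, the volume figures John just quoted check out as straightforward arithmetic:

```python
# Sanity-checking the quoted volume: 550,000 hours of captioning per year
# versus the number of seconds in a year.
captioned_seconds = 550_000 * 3600          # ~1.98 billion seconds captioned
seconds_in_year   = 365 * 24 * 3600         # 31,536,000 seconds in a year
print(captioned_seconds / seconds_in_year)  # ~62.8 -> a bit more than a
                                            # minute captioned per second
```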
So, those files are sent to us, as well. And you mentioned some of those in the open. A lot of the TV shows and those kinds of things are prerecorded programs, so they come in, and then they go to what's called "offline" -- that's what we call it, anyway. The captioners actually make a transcript of what's being said. They then take all that, and they time and place that verbiage. You'll notice the difference when you see captioning. If there's a lag between what somebody says and the words that are popping up, that's because it's being done in real time. It's got to move from the person who's speaking to the captioner's ears. It has to be transcribed. It has to be sent back. And typically, in that environment, it's going through a bunch of technology, like encoders and those kinds of things, that causes a little bit of delay. If the words come up right at the time that somebody is speaking, that's a prerecorded program. And if it's really done properly, it's timed and placed -- that is, the words are placed near the people that are speaking. If it's done properly, which we take great pride in, the captions don't cover anything important on the screen. By the way, those are also FCC standards. And we also include things that can be heard but are not necessarily the speakers. So, there's some description of what's going on -- you know, "dog barks," "clap," "music playing." You'll also see, if live captioning is being done properly, the words to songs that are being sung -- lyrics and those kinds of things. That's also a requirement. So, there's a lot of stuff that's going on. We also do up to 50 different languages in our multilanguage services. And we do multicasting, where we are putting together the same transmission in both English and Spanish simultaneously. And, by the way, that's done both in real time and in the offline. There's a lot of activity, with hundreds and hundreds of people online right now transcribing some audio that's going on and turning it into the written word, which benefits lots and lots of people. >> That's really amazing. And under ideal conditions, what are your live captioners -- like, what's their lag time from when they hear it to when they actually type it, minus all the encoders -- just from their ear to the type? >> Well, the actual lag that's introduced by the captioner is about a second or two. That's really all it is. The rest of that time is all technology delays. Encoders and those kinds of things cause additional delays. >> Right. >> But the captioner themselves -- and this is what's really strange, too; people don't think about this, and I'm very familiar with that, because I've only been in this business for about a year and a half, and before that I thought the TV did the captioning, just like everybody else. [ Laughs ] A normal typist can type at about -- a fast typist does what, 40 to 60 words a minute? Most people talk in normal, casual conversation at about 180 words a minute. The average broadcaster is at about 225 words a minute and usually ramps up to about 280, sometimes higher than that. These captioners keep up with that level. Our captioners think of a couple of hundred words a minute -- 200 words a minute -- as normal speaking, and that's how fast they translate this information from the spoken word to the written word. In our company, we mandate a minimum of 98% accuracy, and most of our people are well above that.
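Accuracy figures like that 98% are conventionally measured as word accuracy, i.e., one minus the word error rate (WER), computed as a word-level edit distance against a reference transcript. A minimal sketch, with a made-up reference and hypothesis:

```python
# Word error rate: minimum word-level edits (substitutions, insertions,
# deletions) to turn the caption output into the reference transcript,
# divided by the number of reference words. Example strings are made up.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to match the first i reference words to j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

ref = "the city council meeting will begin at seven"
hyp = "the city counsel meeting will begin at seven"
print(f"{(1 - wer(ref, hyp)) * 100:.1f}% word accuracy")  # -> 87.5%
```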
And that's just their normal daily work; this is what they do all day, every day. And they enjoy it. I was amazed a year and a half ago, when I came to the company and went out and met many, many -- you know, hundreds of -- captioners, because most of them work remotely, and they mostly work from home. It's a lifestyle business for them that they enjoy. And beyond the fact that they enjoy their work, they feel great pride in their delivery of service to the community, because there are 50 million deaf and hard-of-hearing people in the United States alone, and they rely on the captions not only to be included in society -- we call that accessibility -- but also, more importantly, in disaster preparedness and emergency situations, when it's the only way they can get information, because, of course, they can't hear it. You can add to that that there are 83 million millennials, and 58 million of them watch videos without sound, according to the projections that we've seen on places like Facebook, where 85% of all videos are watched with the sound off. That means there's another 58 million millennials that are receiving information, typically on their mobile handhelds, through videos, and if your video is not captioned, it's meaningless, because they're not getting anything out of it. And what's worse than that is if you let the machines caption it, and then you're the recipient of the stupid remarks that the machines make [Laughs] since they are generally in the 70%-correct range. Anyway, that's what we see with the ASR engines. Most of them don't work all that well. They're fine for some things. You know, Siri works, but how often does it get a word wrong? And the problem is, unless captions are at least 98% accurate, they don't work for people who can't hear. They could be funny, but they're not -- by the way, the deaf and hard of hearing don't think that's funny at all. >> Mm-hmm. >> But having it be highly accurate is not only part of the law -- it's the right thing to do. >> Right, and you just talked about something I was going to hit on there. You're doing all your captioning with human captioners. >> Correct. >> And we have this huge surge of AI and computer-generated captioning. But it's still not there yet, especially for things that are critical, like health and safety information, tornado warnings, weather information. You just really can't get anywhere close to that level, right? You still have to use humans. >> It's just not accurate enough. Listen -- I don't throw any cold water on new technologies that are coming along. We use some ASR here, too, because we have voice writers, as well as stenocaptioners, so they use interpretive language. But there's a person there, and if something goes wrong -- the problem with the automated engines is that nobody's monitoring them. Every word they produce is a guess, right? That's what it's doing. It's guessing: "I think it's this." If it gets it wrong, there's nobody there to correct it. The deaf and hard of hearing are used to this. Most people who aren't don't know it, but if you're watching captions and you see a dash followed by words, that means the word prior to the dash was an error. The dash means "I'm correcting the error," and the correction immediately follows. So, the fact that we have captioners associated with this means that even if it's ASR doing the work, we have humans actually overseeing what's going on.
When you try to use them without that -- listen, they're making great strides, and we're all proud of the work that's going on in ASR, but it can't caption the way a human can. It doesn't have the human intelligence behind it that the captioner does. So, typically, you see things like synchronization problems: the words come too fast, or they come too slow and then catch up. The accuracy and completeness can be way off. It's usually on things like proper nouns and foreign phrases that you can tell when people are using engines instead of humans. An engine works pretty well if you can feed it a script -- if people are working in scripted environments. The problem is, as soon as they go off script, you wind up seeing a bunch of blanks on the screen, because the ASR engine doesn't know what they said. And speaker accents can cause all kinds of problems with that. So, there's a lot to the human element. When you think of captions, you've got to think of them as a combination of art and science. The science part can be dealt with, but the human part is really important, because the recipient is a human, and what they're looking for is the human context and the punctuation and all the things that the machines still have problems with. Maybe someday they'll get to the right spot. I don't think that's going to be in my lifetime, but they continue to get better every day. We believe in human captioning because our job is not just the captions. It's the quality and service that we provide to the industry, not just the words themselves. >> Absolutely. That's a big thing here in the education field -- this huge accessibility push that's been going on more and more recently, especially as so much has moved over to digital and technology -- making sure that everybody on campus is included and everybody is able to get the information. Can you tell us a little bit about where we can expand this in the education field? >> Well, when we think about education, we've got to think about more than just accessibility for the deaf and hard of hearing, even though that's the primary mission that we have. We also need to think about English as a second language. We need to think about the benefit of the transcriptions that become available when you do captioning for sessions, whether they're training sessions, seminars, lectures, or whatever they are. Think about the fact that if you do realtime captioning for a lecture, let's say, not only do you make sure that the words are presented for those who speak English as a second language rather than a first language, but everybody has the transcript of that spoken session available. That's of great appeal, I believe, to the educational community, because it's effectively notes that everybody can use to better understand what happened. By the way, that's not confined to the education world, even though that's what we're talking about. Corporations are finding the same thing. We see a huge increase in corporations captioning their training sessions and keynote speeches and their seminars and their big meetings. Again, not only because they're presenting the information in another view -- that is, not just auditory but in the written word -- but because they also have the benefit of the transcripts that come from all of that, which I think is greatly important for the education community. >> Absolutely.
And we're almost out of time here with you today, but can you give us some information on how to get in contact with you, if someone's interested in reaching out? >> Well, I think the best way to contact us and find out more about us is just to go straight to our website. We take a lot of pride in what we put out there. It's vitac.com, and you can find out all about us. You can contact us, you can get a hold of us there, and you can see all the different things we do and all the different kinds of programs that we offer in the marketplace. And, by the way, just to make sure everybody knows this: getting captions on your files is, A, easy; B, quick; and C, not all that expensive, when you think about the quality and the value you get out of it. So, I just want to make sure that everybody knows that, and again, just come see us at vitac.com. We'd love to help you out. >> Thank you so much for your time, John. It was a very interesting interview, and I think it's going to be a huge benefit to our listeners. Thank you for joining us on this edition of "Bars and Tone." >> Great. Thank you very much. >> John Capobianco, thanks for joining us here today. Now, Hal, we've heard a lot of things here today, but when it gets right down to it, captioning shouldn't be something that's "Oh, my gosh, I have to go caption this stuff." It should be something that we want to do. >> Right. So, Mark Chen said something really important, which is that captioning is something that benefits all of us. We often think of captioning for Section 508 compliance, for accessibility, and also because of broadcast guidelines. But, really, when we're watching a video in a noisy environment and we turn captioning on, suddenly we're able to follow the video along. My dad, for instance, was hard of hearing. He was functionally deaf. He could follow conversation, but for him, captioning was a godsend. In fact, he sought out theaters that provided captioning equipment, which some actually do. So, he could go to a movie theater and follow along with the movie along with everyone else. So, when we think of captioning, we often think of captioning for a special case, but the reality is that captioning is something that impacts all of us. So, it's really something that you want to do for your own work, but you also want to be an advocate for other people, as well. >> And, you know, it's becoming easier and easier to do. Heck, you can just drop the files onto Vimeo or YouTube, and with Final Cut now, you can caption right inside the NLE. So, it's really becoming easier. It's becoming much more of a commodity for colleges, universities -- really, everybody -- to be able to do. >> Yes, actually, you know, one of the things that you can probably take from the conversation we've had today is that there are standards in place for doing captioning that are easy to follow, and now you've got multiple paths in terms of being able to get your captioning. You can do captioning yourself for short-form content. Certainly there are tools now -- like MovieCaptioner and tools like that -- that allow you to do it. But if that is a burdensome effort for you, there are, as you have heard, commercial services that can handle captioning for you, and in most cases fairly reasonably priced. The price of captioning in general has come down a whole lot. And, again, the technology is there now. One thing that we always have to keep in mind, though, is that machine transcription is still not quite there yet.
And so, while it can be useful for things such as keyword searching and stuff like that, we're not at a point where machine transcription gets you 100% accuracy. It's still just not quite there. >> Our thanks to John Capobianco. He is the chief marketing officer at Vitac, vitac.com. Mark Chen, rev.com. He is the vice president of product at Rev. And Daniell Krawczyk, the founder and president of Municipal Captioning. You can get to them at MunicipalCaptioning.com, or you can e-mail him. And it says DanK@MunicipalCaptioning.com, but the way I'm going to remember that is it also spells "dank." DanK@MunicipalCaptioning.com. Everybody have a great Fourth of July week. Any final thoughts? >> I have none, other than be sure that you grill and don't burn the hot dogs. >> All right. For Hal Meeks and Brandon Boucher, I'm B.J. Attarian. We will see you next time right here on the "Bars and Tone" podcast.