Rocky and I go around the world shooting videos. But it’s hard to consume the information in a video unless you have the 20 minutes to watch it, and Google can’t search the words spoken in my interviews. Today we’re going to see how SpeakerText solves the video/text conundrum.
SpeakerText is an online, on-demand, video-to-text platform. “What we’ve done is built this virtual assembly line that combines the power of crowdsourcing, with the parts of speech recognition and artificial intelligence that actually work,” says Matt Mireles, founder and CEO of SpeakerText. They do that by breaking the video’s sound into chunks and sending them off to sites like Mechanical Turk, where workers transcribe the audio in units just five or ten seconds long. Then they reassemble the parts, have editors check for errors, use phonetic speech recognition to timestamp each word, and apply natural language processing to figure out sentence boundaries. And that’s just the platform.
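To make the assembly line concrete, here is a minimal sketch in Python. This is my own illustration, not SpeakerText’s actual code: the chunk lengths are from the article, but the function names and the stand-in “worker” are hypothetical.

```python
# Hypothetical sketch of a SpeakerText-style pipeline: split audio into
# short chunks, have each chunk transcribed independently (here a stub
# standing in for a Mechanical Turk worker), then reassemble in order.

CHUNK_SECONDS = 5  # the article says workers get 5-10 second units

def split_into_chunks(duration_seconds, chunk_len=CHUNK_SECONDS):
    """Return (start, end) second-ranges covering the whole recording."""
    return [(t, min(t + chunk_len, duration_seconds))
            for t in range(0, duration_seconds, chunk_len)]

def transcribe_chunk(chunk):
    # Stand-in for a crowd worker: in the real pipeline each worker
    # hears only this short slice of audio, never the whole video.
    start, end = chunk
    return f"[words spoken {start}-{end}s]"

def transcribe(duration_seconds):
    chunks = split_into_chunks(duration_seconds)
    # Chunks are independent, so many workers can handle them at once.
    pieces = {c: transcribe_chunk(c) for c in chunks}
    # Reassemble the transcript in chronological order.
    return " ".join(pieces[c] for c in chunks)

print(transcribe(12))
```

The real system layers more on top of this skeleton (editor review, phonetic timestamping of each word, sentence-boundary detection), but the split/farm-out/reassemble shape is the core idea.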
Mireles points out that this process, which he calls “distributed human computation,” allows SpeakerText to transcribe a two-hour-long video just about as fast as one that’s only a minute long. “There are all these things where people are trying to create truly artificial intelligence, and it’s not there,” says Mireles. “It always reaches maybe 85 percent accuracy that is okay, but not complete. The way these problems actually get solved is by layering on the human component.”
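The speed claim follows from parallelism: with enough workers, every chunk is transcribed at the same time, so the wall-clock time is bounded by one chunk rather than by the length of the video. A toy simulation (my own, not SpeakerText’s code, with `time.sleep` standing in for a human transcribing a chunk):

```python
# Toy demonstration of why a two-hour video can finish nearly as fast as
# a one-minute one: all chunks are worked on simultaneously.
from concurrent.futures import ThreadPoolExecutor
import time

def worker_transcribes(chunk_id, seconds_per_chunk=0.1):
    time.sleep(seconds_per_chunk)  # stand-in for a human transcribing one chunk
    return f"chunk {chunk_id}"

def wall_clock(num_chunks):
    """Elapsed time to transcribe num_chunks chunks with one worker each."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        list(pool.map(worker_transcribes, range(num_chunks)))
    return time.perf_counter() - start

short = wall_clock(2)    # a short video: 2 chunks
long_ = wall_clock(40)   # a much longer video: 40 chunks
print(f"{short:.2f}s vs {long_:.2f}s")  # both close to one chunk's time
```

Run it and the two times come out roughly equal, even though the second job has twenty times the audio; that is the whole trick of the assembly line.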
SpeakerText web site: http://speakertext.com/
SpeakerText blog: http://blog.speakertext.com/
SpeakerText on Twitter: http://twitter.com/speakertext
Matt Mireles on Twitter: http://twitter.com/mattmireles
SpeakerText profile on Crunchbase: http://www.crunchbase.com/company/speakertext