SpeakerText: Extracting text from video, with speed and savvy

January 18, 2011 | Robert Scoble

Rocky and I go around the world shooting videos. But it’s hard to consume the information in a video unless you have the 20 minutes to watch it, and Google can’t search the words spoken in my interviews. Today we’re going to see how SpeakerText solves the video/text conundrum.

SpeakerText is an online, on-demand, video-to-text platform. “What we’ve done is built this virtual assembly line that combines the power of crowdsourcing, with the parts of speech recognition and artificial intelligence that actually work,” says Matt Mireles, founder and CEO of SpeakerText. They do that by breaking the video’s sound into chunks and sending those chunks off to sites like Mechanical Turk, where workers transcribe the audio in units just five or ten seconds long. Then they reassemble the parts, have editors check for errors, use phonetic speech recognition to timestamp each word, and use natural language processing to figure out sentence boundaries. And that’s just the platform.
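
To make the assembly line concrete, here is a minimal Python sketch of just the chunk-and-reassemble step. It is not SpeakerText's code: the `Chunk` class, the function names, and the 10-second default are all assumptions chosen for illustration, and the human transcription itself is simply represented by filling in each chunk's `text` field.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int      # position of the chunk in the original audio
    start: float    # start offset in seconds
    end: float      # end offset in seconds
    text: str = ""  # filled in later by a (hypothetical) human transcriber


def split_into_chunks(duration: float, chunk_seconds: float = 10.0) -> list[Chunk]:
    """Cut a recording of `duration` seconds into fixed-length windows."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks: list[Chunk] = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        chunks.append(Chunk(index=len(chunks), start=start, end=end))
        start = end
    return chunks


def reassemble(chunks: list[Chunk]) -> str:
    """Join transcribed chunks back together in their original order."""
    ordered = sorted(chunks, key=lambda c: c.index)
    return " ".join(c.text.strip() for c in ordered if c.text.strip())
```

Because every chunk carries its own index, the pieces can come back from workers in any order and still be stitched together correctly. The later steps the post describes (editing, timestamping, sentence detection) are outside this sketch.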

“On top of that, we’ve built this application layer,” explains Mireles. “We’ve built this widget that taps into the JavaScript API of a video player, any player, whether it’s Brightcove, Ooyala, YouTube, blip.tv, even self-hosted videos. As the video plays back, it highlights each sentence as a video plays, and scrolls through the transcript. You can click on the transcript, and it will jump to the exact moment you’re interested in.” That also allows you to find a great quote in a video and tweet a link to that exact moment.
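
The highlight-and-jump behavior comes down to mapping a playback time to a sentence, and a sentence back to a time. Here is a small Python sketch of that lookup, assuming the transcript is stored as a sorted list of sentence start times in seconds. The function names and the data layout are hypothetical; the post does not say how SpeakerText's widget stores its timestamps.

```python
from bisect import bisect_right


def sentence_at(start_times: list[float], playback_time: float) -> int:
    """Return the index of the sentence being spoken at `playback_time`.

    `start_times` must be sorted ascending. Returns -1 if playback has not
    yet reached the first sentence.
    """
    return bisect_right(start_times, playback_time) - 1


def seek_time_for(start_times: list[float], sentence_index: int) -> float:
    """Return the time to jump to when a reader clicks a sentence."""
    return start_times[sentence_index]
```

A player integration would call `sentence_at` on each time update to decide which sentence to highlight, and `seek_time_for` when the reader clicks the transcript. Binary search keeps the lookup fast even for long transcripts.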

Mireles points out that this process, which he calls “distributed human computation,” allows SpeakerText to transcribe a two-hour-long video just about as fast as one that’s only a minute long. “There are all these things where people are trying to create truly artificial intelligence, and it’s not there,” says Mireles. “It always reaches maybe 85 percent accuracy that is okay, but not complete. The way these problems actually get solved is by layering on the human component.”
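
Why would a two-hour video take about as long as a one-minute one? A back-of-the-envelope Python sketch shows the idea: if every chunk goes to a different worker at the same time, total time depends on how long one chunk takes, not on how many chunks there are. Every number below (10-second chunks, 60 seconds of work per chunk, the worker counts) is a made-up assumption for illustration, not a SpeakerText figure, and the model ignores editing and reassembly time.

```python
import math


def wall_clock_seconds(
    video_seconds: float,
    workers: int,
    chunk_seconds: float = 10.0,       # assumed chunk length
    seconds_per_chunk: float = 60.0,   # assumed time for one worker to do one chunk
) -> float:
    """Estimate transcription time when chunks are handled in parallel."""
    chunks = math.ceil(video_seconds / chunk_seconds)
    rounds = math.ceil(chunks / workers)  # how many batches the crowd must work through
    return rounds * seconds_per_chunk
```

Under these assumptions, `wall_clock_seconds(60, workers=6)` and `wall_clock_seconds(7200, workers=720)` both come out to 60 seconds, while a single worker handling the two-hour video alone would need 43,200 seconds (12 hours).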

More info:
SpeakerText web site: http://speakertext.com/
SpeakerText blog: http://blog.speakertext.com/
SpeakerText on Twitter: http://twitter.com/speakertext
Matt Mireles on Twitter: http://twitter.com/mattmireles
SpeakerText profile on Crunchbase: http://www.crunchbase.com/company/speakertext

Tom January 18, 2011 at 5:07 pm

Guess Nuance hasn’t yet moved into this market – running it through DragonSoft surely could be a 1st pass prior to putting it through a Mechanical Turk process?
I guess there’s also crowd sourcing it –

Why hasn’t someone made an audio CAPTCHA system, whereby it’s snippets of an interview – you have to ID the words, and then this gets double checked etc => free transcript.

Keith January 18, 2011 at 1:19 pm

Would have been a good idea to show the text from this video, no?
