Q: How does the site work?
A: The site itself is fairly simple: the program searches through a database of short video clips that are annotated with the word or words that Obama says in each clip. Once it’s found everything you typed in, it stitches the clips together using a popular, freely available video editing program.
The database includes phrases up to six words long, so if you put in a very common phrase like “the American dream,” the site will provide a single clip of that phrase rather than stitching together three different single-word clips. It also includes clips of individual letters and sounds, so if you put in a word that isn’t in the database, the site can “create” the word by stitching together the individual sounds that make it up.
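The lookup described above can be sketched roughly as follows. This is only an illustration, not the site's actual code: the greedy longest-phrase-first strategy, the function names (plan_clips, spell_out), the "sound:" key prefix, and the clip file names are all assumptions made for the example. In particular, spell_out simply uses one clip per letter as a stand-in for the pronunciation model mentioned later in the interview.

```python
MAX_PHRASE_WORDS = 6  # the interview says phrases run up to six words


def spell_out(word, clip_db):
    """Stand-in fallback (an assumption): one clip per letter, skipping
    letters that have no clip. The real site guesses a pronunciation."""
    return [clip_db["sound:" + ch] for ch in word if "sound:" + ch in clip_db]


def plan_clips(text, clip_db):
    """Return the list of clips to stitch, preferring the longest phrase."""
    words = text.lower().split()
    plan = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first.
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in clip_db:
                plan.append(clip_db[phrase])
                i += n
                break
        else:
            # Not even the single word is known: build it from sounds.
            plan.extend(spell_out(words[i], clip_db))
            i += 1
    return plan


# Hypothetical miniature database mapping annotations to clip files.
example_db = {
    "the american dream": "phrase_017.mp4",
    "the": "word_the.mp4",
    "is": "word_is.mp4",
    "sound:z": "snd_z.mp4",
    "sound:a": "snd_a.mp4",
    "sound:b": "snd_b.mp4",
}

print(plan_clips("The American dream is zab", example_db))
```

In this toy run, “the American dream” resolves to one phrase clip even though “the” also exists as a single-word clip, and the unknown word “zab” falls back to three sound clips.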
Q: What inspired you to create TalkObamaTo.Me?
A: Way back in the early days of YouTube, somebody posted a video of George W. Bush that was stitched together to make it look like he was singing “Sunday Bloody Sunday” by U2. Whoever created it had taken video of the State of the Union and meticulously cut up each sound and pasted them back together to create the song. I was in the middle of college and was taking a lot of linguistics and computer science classes, and I realized that much of that cut-and-paste process could probably be automated if you had enough video of a single person. And the one person who was all over digital video in those days was then-Senator Barack Obama.
Then I started a PhD program and of course didn’t have time to work on my project idea. But in the meantime, a good number of political remix artists popped up -- for example, BaracksDubs is a YouTube channel that stitches together video of Obama singing pop songs, and CassetteBoy is a digital artist who stitches together videos of politicians in order to make political statements. Their successes strengthened my resolve to make an automated video remixing program, and once I found the time I started work on TalkObamaTo.Me.
Q: Where did the videos come from?
A: Since President Obama entered office, the White House has posted a weekly video address to the nation, in which the President (or occasionally the Vice President or First Lady) speaks on some issue of national importance—it’s sort of like FDR’s “Fireside Chats,” updated for the YouTube age. These videos, and their transcripts, are freely available to the public.
I downloaded a few hundred Weekly Address videos from the White House website and used techniques from computational linguistics to match up the words in the transcript with the audio. I then chose the “best” clip for each word, and extracted these from the longer videos to create the database for TalkObamaTo.Me.
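As a rough sketch of the last step, suppose the alignment stage produces one record per spoken word with its source video and timestamps. The interview does not say how the “best” clip was judged, so the score field below is purely a hypothetical quality measure (higher is better), and the names AlignedWord and best_clips are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    word: str
    video: str    # source file the word was spoken in
    start: float  # start time in seconds
    end: float    # end time in seconds
    score: float  # hypothetical quality score; higher is better


def best_clips(alignments):
    """Keep the highest-scoring occurrence of each word.

    On a tie, the occurrence seen first is kept.
    """
    best = {}
    for a in alignments:
        key = a.word.lower()
        if key not in best or a.score > best[key].score:
            best[key] = a
    return best


# Made-up alignment records for illustration only.
records = [
    AlignedWord("hope", "address_a.mp4", 12.40, 12.81, 0.71),
    AlignedWord("Hope", "address_b.mp4", 95.02, 95.47, 0.93),
    AlignedWord("change", "address_a.mp4", 13.10, 13.66, 0.88),
]

chosen = best_clips(records)
print(chosen["hope"].video, chosen["hope"].start, chosen["hope"].end)
```

The chosen start and end times are what a later step would use to cut each short clip out of the longer video.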
Q: How long did the whole thing take to put together?
A: There were three main components: building the video database, writing the code to stitch the videos together, and building the machine learning model that guesses how to “pronounce” words that it’s never seen before. Each took a couple of months of on-and-off work. If I had been working on it full-time, the whole thing probably would have taken about three months.
Q: What was the reception like?
A: I posted the link on my personal Facebook and Twitter accounts on a Tuesday morning, and before the day was over someone had posted it on Hacker News, a major board where people post links to interesting tech projects and news. The next day, the link made the rounds on the tech blogs, and traffic took off. The site saw over 100,000 users in the first day, and about 650,000 in the first week.
Reception was largely positive—most people loved the idea, and I received dozens of congratulatory messages and requests for information about how the site works. Some people rightfully criticized how choppy and jumpy the resulting videos seem, but unfortunately that’s unavoidable. I did get some hate mail from a stranger calling me a “vapid time waster,” which gave me a good laugh.
Q: What are the implications and applications of this type of technology?
A: I think this project in particular is mostly useful as a toy that can be used to generate cute or funny videos (and, of course, since it’s on the Internet and since the site will accept any text input, plenty of videos include words that are not suitable for print).
The idea of creating a computer speech generator for a particular person isn’t new: I remember hearing that a group of engineers created a personal synthesizer machine for Roger Ebert after he could no longer speak due to surgery related to his cancer. But this technology is getting better and better: Amazon recently announced a program that can create a personalized speech synthesizer for anyone, using only a few minutes’ worth of speech. And, last year, a Stanford lab released a demo of a program that lets you superimpose your own mouth movements onto a video of someone else, allowing you to very realistically change what the other person appears to be saying.
Taken together, these voice and video technologies will be able to produce very lifelike impersonations of anyone you can find video and audio of. It will be sort of like Photoshop but for video of people speaking. Just as we currently have to be careful trusting what we see (and, especially in 2016, what we read), we might soon have to learn to be careful trusting what we watch.
Ed King is involved in psycholinguistics, computational modeling, and language and gender. He is the creator of TalkObamaTo.Me, a web-based video speech synthesizer of President Obama.