An MIT Algorithm Predicts the Future by Watching TV

Using neural networks and shows like “The Office” and “Big Bang Theory,” CSAIL’s system was able to predict how actors were about to greet each other.

The next time you catch your robot watching sitcoms, don't assume it's slacking off. It may be hard at work.

TV shows and video clips can help artificially intelligent systems learn about and anticipate human interactions, according to MIT's Computer Science and Artificial Intelligence Laboratory. Researchers created an algorithm that analyzes video, then uses what it learns to predict how humans will behave.

Six hundred hours of clips from shows like The Office and Big Bang Theory let the AI learn to identify high-fives, handshakes, hugs, and kisses. Then it learned what the moments leading up to those interactions looked like.

After the AI devoured all that video to train itself, the researchers fed the algorithm a single frame from a video it had not seen and tasked it with predicting what would happen next. The algorithm got it right about 43 percent of the time.

Humans nail the answer 71 percent of the time, but the researchers still think the AI did a great job, given its rudimentary education. "Even a toddler has much more life experience than this," says Carl Vondrick, the project's lead author. "I'm interested to see how much the algorithms improve if we train it on years of videos."

The AI doesn't understand what's happening in the scene in the same way a human does. It analyzes the composition and movement of pixels to identify patterns. "It drew its own conclusions in terms of correlations between the visuals and the eventual action," says Vondrick.

Vondrick was among three people who spent two years on the project. He says the efficient, self-reliant training could come in handy for more important things than watching reruns.

For example, an improved version of the system could have a future in hospitals and other places where it could help prevent injuries. He mentions smart cameras that could analyze video feeds and alert emergency responders if somebody is about to fall or something catastrophic is about to happen. Embedded in robots, such systems could even intervene in these situations themselves.