Lyrebird claims it can recreate any voice using just one minute of sample audio

The results aren’t 100 percent convincing, but it’s a sign of things to come

By James Vincent, a senior reporter who has covered AI, robotics, and more for eight years at The Verge.

Apr 24, 2017, 4:04 PM UTC

Artificial intelligence is making human speech as malleable and replicable as pixels. Today, a Canadian AI startup named Lyrebird unveiled its first product: a set of algorithms the company claims can clone anyone’s voice by listening to just a single minute of sample audio.

A few years ago this would have been impossible, but the analytic prowess of machine learning has proven to be a perfect fit for the idiosyncrasies of human speech. Using artificial intelligence, companies like Google have been able to create incredibly life-like synthesized voices, while Adobe has unveiled its own prototype software called Project VoCo that can edit human speech like Photoshop tweaks digital images.

But while Project VoCo requires at least 20 minutes of sample audio before it can mimic a voice, Lyrebird cuts this requirement down to just 60 seconds. The results certainly aren’t indistinguishable from human speech, but they’re impressive all the same, and will no doubt improve over time. Below you can hear the synthesized voices of Donald Trump, Barack Obama, and Hillary Clinton discussing the startup:

Lyrebird says its algorithms can also infuse the speech it creates with emotion, letting customers make voices sound angry, sympathetic, or stressed out. The resulting speech can be put to a wide range of uses, says Lyrebird, including “reading of audio books with famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, for animation movies or for video game studios.” It takes quite a bit of computing power to generate a voice-print, but once done, the speech is easy to make — Lyrebird can create one thousand sentences in less than half a second.

There are more troubling uses as well. We already know that synthetic voice generators can trick biometric software used to verify identity. And, given enough source material, AI programs can generate pretty convincing fake pictures and video of anyone you like. For example, this research from 2016 uses 3D mapping to turn videos of famous politicians, including George W. Bush and Vladimir Putin, into real-time “puppets” controlled by engineers. Combine this with a realistic voice synthesizer and you could have a Facebook video of Donald Trump announcing that the US is bombing North Korea going viral before you know it. That said, while Lyrebird does do a good Trump impression, its other voices are noticeably more robotic:

Lyrebird is aware of these problems, but its suggested fix feels far from adequate. In an “Ethics” section on the company’s website, Lyrebird’s founders (three students from the University of Montréal) acknowledge that their technology “raises important societal issues,” including bringing into question the veracity of audio recordings used in court. “This could potentially have dangerous consequences such as misleading diplomats, fraud, and more generally any other problem caused by stealing the identity of someone else,” they write.

Their solution is to release the technology publicly and make it “available to anyone.” That way, they say, the damage will be lessened because “everyone will soon be aware that such technology exists.” Speaking to The Verge, Alexandre de Brébisson of Lyrebird adds: “The situation is comparable to Photoshop. People are now aware that photos can be faked. I think in the future, audio recordings are going to become less and less reliable [as evidence].” However, de Brébisson concedes that even though Photoshop is now well known, people still fall for convincing fakes in the right context. The same would surely be true of voice synthesis.

For now, Lyrebird’s tech is still in development, and the company doesn’t want to discuss pricing. But de Brébisson says more than 6,000 individuals have signed up for early access to its APIs, and Lyrebird is working to improve its algorithms, including adding support for different languages like French. “This technology is going to happen,” says de Brébisson. “If it’s not us it’s going to be someone else.”

Update April 25th, 12:30PM ET: Updated with quotes from Lyrebird’s Alexandre de Brébisson.
