Meta continued its push into the increasingly crowded AI field on Friday, announcing the creation of a tool called Voicebox. It's an app for generating spoken dialogue with a variety of potential use cases—but it's also ripe for misuse, as Meta admits, which is exactly why the social media giant isn't releasing Voicebox to the public just yet.
Unlike previous voice generator platforms, Meta says Voicebox can perform speech generation tasks it was not specifically trained on. With text input and a brief audio clip for context, the AI tool can create a potentially believable chunk of new speech that sounds like whoever was featured in the source clip.
"Prior to Voicebox, generative AI for speech required specific training for each task using carefully prepared training data," Meta AI said. "Voicebox uses a new approach to learn just from raw audio and an accompanying transcription."
Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more.
More details on this work & examples ⬇️
— Meta AI (@MetaAI) June 16, 2023
Generative AI is a type of program that is capable of generating text, images, or other media in response to user prompts. Meta AI said that Voicebox can produce audio in six languages (English, French, German, Spanish, Polish, and Portuguese) and that its output sounds closer to how people speak naturally in the real world.
Meta suggests that the tool could help people converse across languages, or deliver more natural-sounding video game character dialogue. But Voicebox also looks like a faster and more economical way to create copycat "deepfake" dialogue, making it sound like someone (perhaps a public figure or celebrity) said something that they really didn't.
While it may be a breakthrough in AI development, Meta AI also acknowledged that potential for misuse, saying that the company has developed classifiers that distinguish between Voicebox-generated audio and human speech. Similar to spam filters, AI classifiers are programs that sort data into different groups or classes: in this case, human or AI-generated.
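To make the idea of a classifier concrete, here is a minimal sketch in Python. It is not Meta's classifier, whose design the company has not detailed here. The feature values, the two labels, and the nearest-centroid approach are all assumptions chosen for illustration. The toy program averages a few hand-made example points per class, then assigns a new sample to whichever class average it sits closest to.

```python
from math import dist

# Hypothetical, hand-made feature vectors (NOT real audio measurements).
# Each pair could stand for something like (pitch variability, background noise).
TRAINING = {
    "human": [(0.82, 0.40), (0.75, 0.55), (0.90, 0.35)],
    "ai-generated": [(0.30, 0.05), (0.25, 0.10), (0.35, 0.08)],
}


def centroid(points):
    """Average the points coordinate by coordinate."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


# One average point per class, computed once from the toy training data.
CENTROIDS = {label: centroid(points) for label, points in TRAINING.items()}


def classify(features):
    """Return the label whose centroid is closest to the given features."""
    return min(CENTROIDS, key=lambda label: dist(features, CENTROIDS[label]))


print(classify((0.80, 0.45)))  # human
print(classify((0.28, 0.07)))  # ai-generated
```

A real detector would work from features extracted from actual recordings and would almost certainly use a trained neural network rather than simple averages, but the sorting step is the same in spirit: measure the sample, then pick the best-matching class.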
Meta stressed the need for transparency in AI development in its blog post, saying that it is crucial to be open with the research community. However, the company also said that it is not making Voicebox publicly available for now because of the risk that the technology could be misused.
"There are many exciting use cases for generative speech models, but because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time," a Meta AI spokesperson told Decrypt in an email.
"While we believe it is important to be open with the AI community and to share our research to advance the state of the art in AI,” the spokesperson continued, “it’s also necessary to strike the right balance between openness with responsibility."
Instead of releasing the tool in a functional state, Meta has shared audio samples and a research paper to help fellow researchers understand its potential.
AI risks emerge
While artificial intelligence tools, specifically AI chatbots, have become more commonplace since the launch of OpenAI’s ChatGPT last November, rapid advances in artificial intelligence have global leaders sounding the alarm about the potential misuse of the technology.
On Monday, the UN Secretary-General reiterated the need to take the warnings about generative AI seriously.
"Alarm bells over the latest form of artificial intelligence, generative AI, are deafening, and they are loudest from the developers who designed it," UN Secretary-General António Guterres said at a press conference. "The scientists and experts have called on the world to act, declaring AI an existential threat to humanity on a par with the risk of nuclear war."
As concerning as the threat of global nuclear war may be, that likelihood remains in the realm of science fiction and Hollywood blockbusters. A more likely abuse of generative AI is scams that use deepfake images and voices to trick victims out of money or, as the UN said in a recent report, the fueling of hate and misinformation online.
A deepfake is an increasingly common type of video or audio content, created with artificial intelligence, that depicts false events in a way that can be very difficult to identify as fake.
In April, CNN reported that scammers used AI technology to clone the voice of an Arizona woman's 15-year-old daughter, claiming to have kidnapped the teenager and demanding ransom. And in March, an AI-generated image of former President Donald Trump being arrested went viral after being shared on social media.