By Jason Nelson
Facebook parent company Meta released the first demo of its new AI-powered audio generation platform, Audiobox, on Monday. The social media giant said Audiobox lets users create custom voices and sound effects using voice inputs and natural language prompts.
Audiobox, Meta said, builds on the technology developed for its Voicebox platform, introduced earlier this year, but surpasses Voicebox in quality and adds automatic watermarking for “responsible use.”
“Audiobox, the successor to Voicebox, is advancing generative AI for audio even further by unifying generation and editing capabilities for speech, sound effects (short, discrete sounds like a dog bark, car horn, a crack of thunder, etc.), and soundscapes, with a variety of input mechanisms to maximize controllability for each use case,” Meta’s Audiobox team said.
Audiobox, the team explained, uses “bespoke solvers,” which they claim make the generation process more than 25 times faster than previous models without any loss of performance.
In June, Meta announced Voicebox, a generative AI tool it said can produce audio in six languages: English, French, German, Spanish, Polish, and Portuguese, and can do so in a manner closer to how people naturally speak in the real world.
With concerns about AI-powered deepfakes rising at the time, Meta said it would not release Voicebox to the public, acknowledging the potential for misuse. To guard against the same risks with Audiobox, Meta built in watermarking.
“Recent advancement in quality and fidelity in the audio generative model has empowered novel applications and use [cases] on the model. However, at the same time, there are many people... raising concerns about the risks of misuse,” the Audiobox team said in its report. “Therefore, the ability to recognize which audio is generated or real is crucial to prevent the [misuse] of the technology and enable certain [platforms] to comply with their policy.”
“Both the Audiobox model and our interactive demo feature automatic audio watermarking so any audio created with Audiobox can be accurately traced to its origin,” Meta said. “Our watermarking method embeds a signal into the audio that’s imperceptible to the human ear but can be detected all the way down to the frame level using a model capable of finding AI-generated segments in [the] audio.”
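Meta has not published the watermarking algorithm itself, and its production system is presumably a learned model rather than anything simple. But the general idea the quote describes, an inaudible signal that a keyed detector can recover frame by frame, can be sketched with a classic spread-spectrum scheme; the key, frame size, and amplitude below are all assumptions made up for this demo:

```python
# Illustration only: Meta has not published Audiobox's watermarking algorithm,
# and its production method is almost certainly a learned model, not this simple.
# This classic spread-spectrum sketch just shows the general idea of an
# inaudible signal that a detector holding the right key can recover per frame.
import numpy as np

KEY = 1234      # assumed shared secret seeding the watermark pattern
FRAME = 1024    # samples per detection frame
ALPHA = 0.01    # watermark amplitude, small relative to the signal

def _pattern(n: int) -> np.ndarray:
    """Pseudorandom +/-1 sequence derived from the secret key."""
    return np.random.default_rng(KEY).choice([-1.0, 1.0], size=n)

def embed(audio: np.ndarray) -> np.ndarray:
    """Add the low-amplitude, key-derived pattern to the waveform."""
    return audio + ALPHA * _pattern(len(audio))

def detect(audio: np.ndarray) -> np.ndarray:
    """Correlate each frame against the known pattern: scores near ALPHA
    flag watermarked (generated) frames, scores near 0 flag clean ones."""
    pattern = _pattern(len(audio))
    n_frames = len(audio) // FRAME
    return np.array([
        np.dot(audio[i * FRAME:(i + 1) * FRAME],
               pattern[i * FRAME:(i + 1) * FRAME]) / FRAME
        for i in range(n_frames)
    ])

# One second of stand-in "speech" at 16 kHz: per-frame scores jump once marked.
clean = np.random.default_rng(0).normal(0.0, 0.1, 16000)
print(detect(clean).round(3))         # hovers around 0
print(detect(embed(clean)).round(3))  # hovers around ALPHA
```

Note that the detector only works with the secret key that generated the pattern, which is one reason provenance checks of this kind run on the platform side rather than in anyone's hands.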
“We design description-based and example-based prompting to enhance controllability and unify speech and sound generation paradigms,” the team said. “We allow transcript, vocal, and other audio styles to be controlled independently when generating speech.”
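Meta has not released a programmatic API for Audiobox (the demo is browser-based), but the idea of independent controls can be sketched with a hypothetical request structure; every field name below is invented for illustration:

```python
# Hypothetical request shapes (illustration only; Audiobox exposes a web demo,
# not this API). The point is that each aspect is a separate, independent control.
speech_request = {
    "transcript": "The storm rolled in just after midnight.",    # what is said
    "voice_example": "ref/narrator.wav",                         # example-based prompt: voice to imitate
    "style_description": "in a large cathedral, speaks slowly",  # description-based prompt: how it sounds
}

sound_request = {
    # Sound effects and soundscapes need only a description, no transcript.
    "style_description": "a crack of thunder, then heavy rain on a tin roof",
}
```

Swapping only `voice_example` would change who is speaking without touching what is said or the acoustic setting, which is what the team means by controlling each style independently.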
While it may be faster, Meta acknowledged that generative audio models like Audiobox are limited by the amount of labeled training data (in this case, sounds) fed into them, and it emphasized the importance of labeling that data correctly.
For example, the researchers said, labeling the barks of a chihuahua and a labrador by their specific breeds is preferable to simply labeling both as “dog barking.” Meta says the same applies to speech patterns like accents and regional dialects.
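In dataset terms, the difference is just caption granularity. A purely hypothetical pair of entries (file names and format invented for this example, not Meta's actual data) makes it concrete:

```python
# Hypothetical caption entries: the same clip labeled coarsely versus finely.
coarse = {"clip_017.wav": "dog barking"}
fine = {"clip_017.wav": "chihuahua barking, high-pitched, repeated twice"}
```

A model trained only on the coarse caption can ever produce just a generic bark; the fine-grained caption is what lets a prompt like “chihuahua barking” mean something.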
A Meta spokesperson declined to provide further comment.
Like Google, Microsoft, and Amazon, Meta has invested heavily in artificial intelligence. Earlier this month, Meta announced over 20 new AI-powered features coming to its suite of platforms, including Facebook, Instagram, and WhatsApp.
A proponent of responsible AI development, Meta recently partnered with IBM to launch the AI Alliance, a consortium of over 50 companies, universities, and think tanks focused on open-source AI innovation and development.
“The AI Alliance brings together researchers, developers, and companies to share tools and knowledge that can help us all make progress whether models are shared openly or not,” said Meta President of Global Affairs Nick Clegg. “We’re looking forward to working with partners to advance the state-of-the-art in AI and help everyone build responsibly.”
Edited by Ryan Ozawa.