Meta Is Training Its AI on the Bible and Other Religious Texts

For its artificial intelligence speech tool, Meta is processing over 4,000 languages with the aim of preservation.

By Jason Nelson


The parent company of Facebook and Instagram says it has developed AI-powered speech technology that can identify more than 4,000 spoken languages. Meta says the goal is to preserve the world's languages, and the tech giant is using the Bible and other religious texts to do it.

"Collecting audio data for thousands of languages was our first challenge because the largest existing speech datasets cover 100 languages at most," Meta said in a post announcing the project. "To overcome this, we turned to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research."

In an accompanying research paper by the Meta AI core team, the company says it obtained its data from the Bible, including original text and audio recordings from FaithComesByHearing.com, GoTo.Bible, and Bible.com.

The data includes recordings of Bible stories, evangelistic messages, scripture readings, and songs in more than 6,255 languages and dialects. Although most of the recordings feature male readers, Meta says its models perform equally well on female voices.

Meta said a dataset of New Testament readings covered more than 1,100 languages and provided an average of 32 hours of data per language.

According to Broward College's Lingua Language Center, there are over 7,100 living languages worldwide.

"Our consultations with Christian ethicists concluded that most Christians would not regard the New Testament, and translations thereof, as too sacred to be used in machine learning," the Meta AI team said, adding that the same is not true for all religious texts.

“There is also the risk of religious training data biasing the models with respect to a particular world view,” Meta AI said. “However, our analysis of the language generated by our models suggests that the language produced by the resulting speech recognition models exhibit only little bias compared to baseline models trained on other domains.”

After its metaverse ambitions fizzled earlier this year, Meta appears to have shifted its focus to artificial intelligence. Its recent projects include an AI tool that identifies and separates objects in images and an AI-powered tool that helps brands target users on Facebook and Instagram.

While the technology is still in its early stages, Meta says it is open-sourcing its models and code so that others can build on and improve its work.

"Many of the world’s languages are in danger of disappearing, and the limitations of current speech recognition and generation technology will only accelerate this trend," Meta said. "We want to make it easier for people to access information and use devices in their preferred language, and today we’re announcing a series of artificial intelligence models that could help them do just that."
