Meta Says Its AI Training Used Public Social Media Posts—Mostly

The social media giant tapped the massive cache of posts by Facebook and Instagram users, the “majority” of the content deemed publicly available.

By Jose Antonio Lanz

Oct 3, 2023


Image: Top_CNX / Shutterstock.com


Meta Platforms recently confirmed using public Facebook and Instagram posts to train its new Meta AI virtual assistant. The objective was clear: improving the performance of its artificial intelligence and machine learning systems by studying real user behaviors and preferences.

"We've tried to exclude datasets that have a heavy preponderance of personal information," Nick Clegg, Meta's President of Global Affairs, told Reuters. Clegg said the “vast majority” of the dataset was publicly available, but did not clarify whether the remaining material needed to complete the dataset was private, confidential data.

Clegg cited Microsoft-owned LinkedIn as an example of a social network whose content Meta avoided because of privacy concerns.

The emphasis on artificial intelligence was palpable at the company’s annual Connect conference last week. Meta introduced Meta AI, an AI assistant with a broad range of personalities and functions, poised to redefine interactions through voice, text, and gestures.

Another significant revelation was a collaboration with eyewear brand Ray-Ban, resulting in smart glasses integrated with Meta AI, along with other AI-powered tools that will be built into Meta’s social media apps.

Tech firms have long mined their own users’ data to feed recommendation engines. Beyond Meta, platforms like Spotify analyze listening habits to curate music suggestions, while Netflix uses viewing patterns to recommend shows and movies. Social platforms like Twitter and Instagram assess user interactions to personalize news feeds. When that data is processed with AI, the phrase “you are the product” rings truer than ever.

On the other side of the spectrum, some platforms have opted for a model that gives users control over their information. For example, an Amazon spokesperson told Decrypt that even though "[Amazon] has always believed that training Alexa with real-world requests is essential to delivering an experience to customers that's accurate and personalized and constantly getting better," its users have the option to prevent the company from training its model on their data.

OpenAI has also taken this approach. Artists who don’t want their work to be ingested for training the company’s models can opt out, but the process is so onerous that some consider it “enraging.” Artists must file a separate request for each of their works, so opting out is a significant burden for an artist like Greg Rutkowski, who has hundreds (or even thousands) of paintings.

The importance of user data in shaping these experiences cannot be overstated. Meta and its tech industry counterparts must balance groundbreaking technological strides against the sanctity of user privacy.

The next time you share a post or ask Alexa a question, there's a chance you're training the next generation of AI. So choose your words wisely!
