Leading generative AI startup Anthropic has declared that it will not use its clients’ data to train its Large Language Model (LLM), and that it will step in to defend users facing copyright claims.

Anthropic, founded by former researchers from OpenAI, updated its commercial Terms of Service to spell out these commitments. By carving out the private data of its own customers, Anthropic is clearly differentiating itself from rivals like OpenAI, Amazon, and Meta, which do leverage user content to improve their systems.

"Anthropic may not train models on customer content from paid services,” according to the updated terms, which adds that "as between the parties and to the extent permitted by applicable law, anthropic agrees that customer owns all outputs, and disclaims any rights it receives to the customer content under these terms."

The terms go on to say that “Anthropic does not anticipate obtaining any rights in customer content under these terms” and that they “do not grant either party any rights to the other’s content or intellectual property, by implication or otherwise.”

The updated legal document ostensibly provides protections and transparency for Anthropic's commercial clients. Clients own all outputs the AI generates, for example, avoiding potential IP disputes. Anthropic also commits to defending clients from copyright claims over any infringing content produced by Claude.

The policy aligns with Anthropic's stated mission of building AI that is helpful, honest, and harmless. As public skepticism grows over the ethics of generative AI, the company's commitment to addressing concerns like data privacy could give it a competitive edge.

User Data: The Lifeblood of LLMs

Large Language Models (LLMs) like GPT-4, LLaMA, or Anthropic's Claude are advanced AI systems that understand and generate human language by being trained on extensive text data. These models leverage deep learning techniques and neural networks to predict word sequences, understand context, and grasp the subtleties of language. During training, they continually refine their predictions, enhancing their ability to converse, compose text, or provide relevant information. The effectiveness of LLMs depends heavily on the diversity and volume of the data they are trained on, making them more accurate and contextually aware as they learn from varied language patterns, styles, and new information.
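To make "predicting word sequences" concrete, here is a minimal, hypothetical sketch of next-token-prediction training in PyTorch. The tiny `TinyLM` model, the random token batch, and all dimensions are illustrative stand-ins invented for this example; production LLMs like Claude use vastly larger transformer architectures and real text corpora.

```python
# Toy illustration of next-token-prediction training (hypothetical; not any vendor's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim = 100, 32  # tiny toy values for illustration

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # logits over the vocabulary at each position

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step: shift the sequence by one so each position predicts its successor.
tokens = torch.randint(0, vocab_size, (4, 16))  # stand-in for a batch of real text
logits = model(tokens[:, :-1])                  # predict token t+1 from tokens up to t
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```

Repeating this step over billions of tokens is what makes training data so valuable: every sequence the model sees nudges its predictions closer to real language use.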

And this is why users’ data is so valuable in training LLMs. Firstly, it ensures that the models stay current with the latest linguistic trends and user preferences (for example, understanding new slang). Secondly, it allows for personalization and better user engagement by adapting to individual user interactions and styles. However, this fuels an ethical debate, because AI companies don’t pay users for this crucial information, which is used to train models that earn them millions of dollars.

As reported by Decrypt, Meta recently revealed that it is training its upcoming Llama 3 LLM on users’ data, and that its new Emu models (which generate photos and videos from text prompts) were likewise trained on publicly available data uploaded by users to its social media platforms.

Besides that, Amazon revealed that its upcoming LLM, which will power an upgraded version of Alexa, is also being trained on users’ conversations and interactions. Users can opt out, though the default setting assumes they agree to share this information. “[Amazon] has always believed that training Alexa with real-world requests is essential to delivering an experience to customers that's accurate and personalized and constantly getting better,” an Amazon spokesperson told Decrypt. “But in tandem, we give customers control over whether their Alexa voice recordings are used to improve the service, and we always honor our customer preferences when we train our models.”

With tech giants racing to release the most advanced AI services, responsible data practices are key to earning public trust. Anthropic aims to lead by example in this regard. The ethical debate over gaining more powerful and convenient models at the expense of surrendering personal information is as prevalent today as it was decades ago when social media popularized the concept of users becoming the product in exchange for free services.

Edited by Ryan Ozawa.
