Leading generative AI startup Anthropic has declared that it will not use its clients’ data to train its Large Language Model (LLM), and that it will step in to defend users facing copyright claims.
Anthropic, founded by former OpenAI researchers, updated its commercial Terms of Service to spell out its ideals and intentions. By carving out an exception for its paying customers’ private data, Anthropic clearly differentiates itself from rivals like OpenAI, Amazon, and Meta, which do leverage user content to improve their systems.
"Anthropic may not train models on customer content from paid services,” according to the updated terms, which adds that "as between the parties and to the extent permitted by applicable law, anthropic agrees that customer owns all outputs, and disclaims any rights it receives to the customer content under these terms."
The terms go on to say that "Anthropic does not anticipate obtaining any rights in customer content under these terms” and that they “do not grant either party any rights to the other’s content or intellectual property, by implication or otherwise.”
The updated legal document ostensibly provides protections and transparency for Anthropic's commercial clients. Companies own all AI outputs generated, for example, avoiding potential IP disputes. Anthropic also commits to defending clients from copyright claims over any infringing content produced by Claude.
The policy aligns with Anthropic’s stated mission that AI should be helpful, harmless, and honest. As public skepticism grows over the ethics of generative AI, the company’s commitment to addressing concerns like data privacy could give it a competitive edge.
Large Language Models (LLMs) like GPT-4, Meta’s LLaMA, or Anthropic’s Claude are advanced AI systems that understand and generate human language by being trained on extensive text data. These models use deep learning techniques and neural networks to predict word sequences, understand context, and grasp the subtleties of language. During training, they continually refine their predictions, improving their ability to converse, compose text, and provide relevant information. The effectiveness of LLMs depends heavily on the diversity and volume of their training data, which makes them more accurate and contextually aware as they learn from varied language patterns, styles, and new information.
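To make that next-word prediction idea concrete, here is a minimal, hypothetical sketch in PyTorch: a tiny character-level model that learns to predict each character from the one before it. The corpus, model size, and hyperparameters are all invented for illustration; production LLMs use transformer architectures with long context windows and train on vastly larger datasets.

```python
# Minimal sketch of the core LLM training loop: a tiny character-level
# model learns to predict the next token in a text corpus. Illustrative
# only -- real LLMs are transformers trained on enormous datasets.
import torch
import torch.nn as nn

text = "the quick brown fox jumps over the lazy dog "  # toy "training data"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer token id
data = torch.tensor([stoi[ch] for ch in text])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # token embeddings
        self.head = nn.Linear(dim, vocab_size)       # scores for next char

    def forward(self, idx):
        return self.head(self.embed(idx))

model = TinyLM(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Training: predict token t+1 from token t, then nudge the weights toward
# better predictions -- the "continual refinement" described above.
inputs, targets = data[:-1], data[1:]
for step in range(200):
    logits = model(inputs)
    loss = loss_fn(logits, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")  # drops as predictions improve
```

The same loop, scaled up by many orders of magnitude in data and parameters, is why the quantity and freshness of training text matters so much, and why user-generated content is such a sought-after resource.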
And this is why users’ data is so valuable in training LLMs. First, it ensures that models stay current with the latest linguistic trends and user preferences (for example, understanding new slang). Second, it allows for personalization and better user engagement by adapting to individual users’ interactions and styles. This has sparked an ethical debate, however, because AI companies don’t pay users for this crucial information, even though it is used to train models that earn them millions of dollars.
As reported by Decrypt, Meta recently revealed that it is training its upcoming LLaMA 3 LLM on users’ data, and that its new Emu models, which generate photos and videos from text prompts, were also trained on publicly available data that users uploaded to its social media platforms.
Amazon, meanwhile, revealed that its upcoming LLM, which will power an upgraded version of Alexa, is also being trained on users’ conversations and interactions. Users can opt out, but the default setting assumes they agree to share this information.

“[Amazon] has always believed that training Alexa with real-world requests is essential to delivering an experience to customers that’s accurate and personalized and constantly getting better,” an Amazon spokesperson told Decrypt. “But in tandem, we give customers control over whether their Alexa voice recordings are used to improve the service, and we always honor our customer preferences when we train our models.”
With tech giants racing to release the most advanced AI services, responsible data practices are key to earning public trust, and Anthropic aims to lead by example in this regard. The debate over trading personal information for more powerful, more convenient models is as prevalent today as it was in the early days of social media, when users becoming the product in exchange for free services first became the norm.
Edited by Ryan Ozawa.