‘Not Telling The Full Story’: OpenAI Challenges NYT's Copyright Lawsuit Claims

OpenAI says the New York Times cherry-picked responses and manipulated prompts to make its chatbot “regurgitate” content.

By Jose Antonio Lanz

Jan 9, 2024


The New York Times tests blockchain to fight fake news. Image: Shutterstock


In response to a lawsuit filed by the New York Times, in which the news outlet accused OpenAI of using its news content to train its AI model, OpenAI has brought receipts. The leading AI developer leaned into its oft-declared commitment to the news industry, declaring, "We support journalism, partner with news organizations, and believe The New York Times lawsuit is without merit."

OpenAI also accused the New York Times of incomplete reporting, alleging that "the New York Times is not telling the full story." The company suggests that the examples used by the newspaper came from older articles that are widely available on third-party websites, and also hinted that the New York Times had crafted its AI prompts to generate the most damning evidence.

“It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate,” OpenAI said, implying that the New York Times acted in bad faith by providing unnatural prompts as evidence.

“Even when using such prompts, our models don’t typically behave the way the New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”

Prompt manipulation is a common technique in which users craft specific prompts to trick an AI model into producing output it was not designed to produce and would not generate under normal conditions.

OpenAI emphasized its collaboration with the news industry.

"We work hard in our technology design process to support news organizations," the company wrote, highlighting the deployment of AI tools that assist reporters and editors and the goal of mutual growth for both AI and journalism. OpenAI recently formed a partnership with Axel Springer—publisher of Rolling Stone—to provide more accurate news summaries.

Addressing the content "regurgitation" that the New York Times alleged, OpenAI acknowledged that it is a rare but real problem that the company is working to mitigate.

"Memorization is a rare failure of the learning process that we are continually making progress on," the company explained, before defending its training methods: "Training AI models using publicly available internet materials is fair use."

Even so, OpenAI acknowledged the validity of ethical considerations by providing an opt-out process for publishers.

AI training and content storage

The battle between content creators and AI companies seems to be a zero-sum game for now, because the dispute is rooted in the fundamental way that AI models are trained.

These models are developed using vast datasets comprising texts from various sources, including books, websites, and articles. Other models use paintings, illustrations, movies, voices, and songs, depending on what they are trained to create. These models are not designed to retain copies of specific articles or data, however. Instead, they analyze these materials to learn language patterns and structures.

This process is crucial to understanding the nature of the allegations and OpenAI's defense, and why AI trainers believe their businesses are using content in a fair manner—similar to how an art student studies another artist or art style to understand its characteristics.

However, creators—including the New York Times and best-selling authors—argue that companies like OpenAI are using their content in bad faith. They assert that their intellectual property is being exploited without permission or compensation, leading to AI-generated products that could potentially compete with and divert audiences from their original content.

The New York Times sued OpenAI, saying that the use of its content without explicit permission undercuts the value of original journalism, and emphasizing the potential negative impact on the production of independent journalism and its cost to society. And, it could be argued, no matter how elaborate the prompt is, if the model “regurgitated” any kind of copyrighted content, it is because that content was used in training.

Whether it was used fairly or unfairly is up to the courts to decide.

This case is part of a broader legal movement that could shape the future of AI, copyright laws, and journalism. As it unfolds, it will undoubtedly influence the conversation surrounding the integration of AI in content creation and the rights of intellectual property owners in the digital era.

Still, OpenAI doesn’t believe this is a zero-sum scenario. Despite criticizing the lawsuit’s key points, Altman’s company said it is ready to extend an olive branch and work toward a positive outcome.

“We are hopeful for a constructive partnership with the New York Times and respect its long history, which includes reporting the first working neural network over 60 years ago and championing First Amendment freedoms.”

Edited by Ryan Ozawa.
