OpenAI Trained AI Models on Copyrighted Work, Says NYT Lawsuit

Current methods for training AI on copyrighted work mean “less journalism will be produced, and the cost to society will be enormous,” says the NYT.

Dec 27, 2023


The New York Times has filed a lawsuit against OpenAI and Microsoft, alleging that millions of its articles were improperly used to train AI models that now compete directly with it in the information and news landscape.

The lawsuit says OpenAI was “using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.” This legal action highlights a growing concern over the use of copyrighted material in the development of artificial intelligence tools. If it succeeds in court, the lawsuit could shape the future of digital content and intellectual property rights.

“OpenAI and Microsoft have built a business valued into the tens of billions of dollars by taking the combined works of humanity without permission,” The New York Times says in the lawsuit. “In training their models, Defendants reproduced copyrighted material to exploit precisely what the Copyright Act was designed to protect: the elements of protectible expression within them, like the style, word choice, and arrangement and presentation of facts.”

Large language models (LLMs) like OpenAI's ChatGPT are at the heart of this controversy. LLMs are trained using vast datasets, including texts from books, websites, and articles, to understand and generate language in a human-like manner. They are not designed to store specific articles or data. Instead, they use that material to learn patterns and information structures. This training enables them to produce content across various topics and styles, entering domains traditionally reserved for human experts.

However, The New York Times argues that OpenAI gave its articles special weight when building its models. “While Defendants engaged in wide scale copying from many sources, they gave Times content particular emphasis when building their LLMs—revealing a preference that recognizes the value of those works,” the lawsuit says.

Given the millions of works OpenAI’s models were trained on, it is not surprising that this is not the first legal challenge for OpenAI or the broader generative AI industry. Recently, a group of renowned authors, including Pulitzer Prize winners Taylor Branch, Stacy Schiff, and Kai Bird, joined a lawsuit first filed by author Julian Sancton that accuses OpenAI of similarly using their works without permission. This suit underscores a growing trend of creatives and professionals pushing back against AI's unfettered access to their intellectual property.

The Value of Originality

Generative AI isn't limited to text. AI art has seen its share of controversies, with several lawsuits challenging the copyright implications of AI-generated works in fields like movies, music, and illustration. However, some of these cases have been dismissed, indicating a complex and evolving legal understanding of AI's creative capabilities and its relationship with existing copyright laws.

The New York Times’s lawsuit is particularly significant because it makes The Times the first major media organization to directly challenge tech giants over the alleged unauthorized use of its content. The suit does not specify a monetary sum but suggests that the infringement has caused substantial damages, warranting significant compensation and corrective action.

“Without the wide corpus of copyrighted material to feed off of, there would be no ChatGPT,” the lawsuit claims. “Defendants’ commercial success was possible only because they copied and digested the protected, copyrightable expression contained in billions of pages of actual text, across millions of copyrighted works—all without paying a penny to authors and rightsholders.”

The implications of this lawsuit extend to how AI companies might continue to access and use existing content. The legal challenge by The New York Times to OpenAI and Microsoft sets the stage for a broader conversation about the intersection of technology, law, and creative rights. The lawsuit underscores the concerns of content creators regarding the threat of AI-powered competition.

“If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill,” The Times argues in its lawsuit. “Less journalism will be produced, and the cost to society will be enormous.” For The Times, this means fewer journalists; for others, it means the end of a working society as we know it.

Edited by Stacy Elliott.
