The United States District Court for the District of Delaware rejected a fair use defense to the use of copyrighted works to train a natural language processing AI system, in a decision that could have significant implications for providers and users of AI products and services. The decision highlights the copyright infringement risks of using third-party materials to train, develop, improve or operate AI tools. However, this case is only one of many disputes among rights owners, model developers and others in the AI ecosystem, and the facts of each case and the particular technology at issue may limit how this decision affects other cases.
Background
Thomson Reuters sued Ross Intelligence (Ross), alleging that Ross infringed its copyrights by using Thomson Reuters’s Westlaw headnotes to train Ross’s new AI legal-research search engine. Thomson Reuters owns Westlaw, a prominent legal research platform that includes legal materials and editorial content. That content includes headnotes, which summarize key points of law and case holdings, and is organized using the “Key Number System.” Thomson Reuters claims copyright in both the headnotes and the Key Number System. Notably, Ross first sought to license Westlaw’s content to train its AI system, but Thomson Reuters refused because Ross was a direct competitor. Ross then turned to a legal analytics company, LegalEase, to compile AI training data in the form of “Bulk Memos,” which are lawyers’ compilations of legal questions paired with good and bad answers. To create the Bulk Memos, LegalEase gave each contributing lawyer a guide explaining how to draft those questions using Westlaw headnotes, although the same guide instructed that headnotes should not be copied and pasted directly into the questions. LegalEase then sold the Bulk Memos to Ross, which used them to train its AI search engine.
In 2023, Circuit Judge Stephanos Bibas, sitting by designation, initially denied most of Thomson Reuters’s summary judgment motions on copyright infringement and the fair use defense. After reviewing the case materials more closely, however, Judge Bibas decided to reconsider his earlier ruling and invited the parties to renew their summary judgment briefing. Ross raised the following defenses: (1) innocent infringement, (2) copyright misuse, (3) merger, (4) scènes à faire and (5) fair use. The court quickly dispensed with the first four defenses and focused primarily on fair use, discussed in more detail below.
Decision
The court granted partial summary judgment for Thomson Reuters on direct copyright infringement and on Ross’s fair use defense. It found that Ross infringed 2,243 Westlaw headnotes and upheld the validity of Thomson Reuters’s copyrights, except for any that may have expired. The court rejected Ross’s arguments that its use of the headnotes was innocent (noting that this defense is unavailable where the infringed work bears a copyright notice) or was excused by the doctrines of merger, scènes à faire or copyright misuse. The court also rejected Ross’s fair use defense, finding that two of the four fair use factors weighed against Ross: the commercial, non-transformative character of Ross’s use (factor one) and the harm to the market for Thomson Reuters’s headnotes and derivative products (factor four), as discussed in more detail below. The remaining issue for trial is the factual question of which headnotes are still protected by copyright.
Copyright infringement risks of AI training data
Notably, the decision addresses non-generative AI. Ross’s AI tool did not create new content; rather, it used existing content to return relevant judicial opinions in response to user questions. In this non-generative context, the court’s analysis of the four fair use factors came out in favor of Thomson Reuters:
Factor 1 favors Thomson Reuters: Ross’s use of AI training data was commercial, not transformative.
The court reasoned that using headnotes as AI training data is not a protected “transformative use,” even though the copying occurred at an intermediate stage of product development. The court distinguished earlier cases involving intermediate copying of software code, emphasizing that those cases involved functional elements of computer programs, whereas here the copyrighted material was used not out of necessity but “to make it easier to develop a competing legal research tool.” The court appeared to be heavily swayed by the competitive nature of Ross’s tool, even in its transformative use analysis.
Factor 2 favors Ross: The Westlaw headnotes and Key Number System are minimally creative.
In its copyright validity analysis, the court emphasized that although judicial opinions themselves, whether quoted or paraphrased, are not copyrightable, “distilling, synthesizing, or explaining part of an opinion,” as the Westlaw headnotes do, is creative enough to satisfy the originality requirement for copyright protection. In its fair use analysis, however, the court acknowledged that while Westlaw’s material has more than the minimal spark of originality required for a valid copyright, it is not especially creative and is far less creative than the work of a novelist or an artist drafting from scratch. The court ultimately decided this factor in Ross’s favor but noted that it carries relatively little weight in the overall fair use analysis.
Factor 3 favors Ross: Thomson Reuters’s AI training data was an insubstantial component of Ross’s output product.
In deciding this factor, the court considered whether Ross took the “heart” of Thomson Reuters’s work and, if so, how much the copied material was used in Ross’s final output product for the public. The court ultimately decided this factor in Ross’s favor because Ross did not make Westlaw headnotes available to the public in its AI search engine tool.
Factor 4 favors Thomson Reuters: Ross’s product could impact Thomson Reuters’s potential AI training data market.
Although, according to the court, Thomson Reuters does not yet have an actual market for AI training data, the court determined that Ross’s use of Westlaw headnotes could harm Thomson Reuters’s potential market for AI training data. In particular, the court found that Ross intended to compete with Westlaw by developing a market substitute. Thus, a fair use defense for using copyrighted material as AI training data may fail even where no established market for such data exists, so long as a potential market does.
Impact of decision
As the first decision to substantively address fair use in the context of AI training, the Thomson Reuters decision is significant. It signals that companies should approach the use of data for training purposes cautiously and adjust their risk analysis if they are relying on a fair use defense. However, it is equally important to recognize that this case arose in a specific context: Ross’s final output product was a legal-research search engine that uses artificial intelligence, not a generative AI tool. Assessing the risks of using data to train AI will therefore require a close understanding of the AI technology at issue and of how these arguments may play out in that particular context. For example, generative AI tools may have a more compelling argument for transformative use. The decision will nonetheless have implications for the rapidly evolving AI landscape, and pending and future cases will likely further address the use of copyrighted AI training data in developing other forms of AI products.
Actionable steps for businesses
Providers, users and distributors of AI tools should be aware of the potential risks and liabilities involved in using copyrighted materials – including content that may be factual with minimal creativity – to train, improve or operate their AI products or services. Legal teams should be proactive in analyzing potential liability and defenses, taking into account the particular nature of the data used as well as the nature and use of the model or AI tool. Companies looking to train or fine-tune AI tools should exercise caution and implement measures to minimize the risks of copyright infringement liability and to leverage AI tools in a compliant manner.
(1) Explore and clear alternative data sources for training. When possible, consider using works in the public domain or those that have clear license terms to reduce infringement risks.
(2) Due diligence: review and audit AI training data, sources and output. Identify and assess the copyright and ownership status of data and content used or planned for use with AI, and consider licenses for copyrighted data.
(3) Licensing protocols. Establish clear protocols for obtaining licenses for third-party content used with AI or for licensing your own AI outputs to others, including proper representations, audit rights and indemnifications.
(4) Monitoring developments. This case is set for trial on the remaining issues in March 2025, and Reed Smith will continue to follow the evolving landscape of copyright and AI as these cases move forward.
Client Alert 2025-072