The first two court decisions (Bartz v. Anthropic and Kadrey v. Meta) addressing the use of copyrighted works to train generative AI were released in late June in rapid-fire succession. Both came from judges within the same federal district, and both ruled, based on the records before them, that the use was fair. But despite reaching the same outcome, the two courts differed significantly in how they got there.
Given that these decisions may eventually be appealed, and that many lawsuits remain pending in other US jurisdictions, this is far from the final word on the issue, and I suspect it will be a while before any judicial consensus begins to emerge.
But it is worth noting that there are at least two points on which both district courts agree: (1) that the use of copyrighted works to train AI models is “transformative”, and (2) that courts should ignore harm to the market for licensing works under the fourth fair use factor. I previously wrote about why I think the courts are wrong on the first point, and today I want to write about why I think they are wrong on the second point.
Defining market harm
The fourth fair use factor, “undoubtedly the single most important element of fair use,” directs courts to consider “the effect of the use upon the potential market for or value of the copyrighted work.” It involves looking at both harm to the original (i.e., does the new work serve as a substitute for the original work) and harm to the market for derivative works. This is not a damages analysis in the sense of focusing only on measuring harm that has already occurred. It is instead meant to be forward-looking, considering not just the harm from the “particular actions of the alleged infringer,” but also the harm that would result if the conduct were to become widespread.
The plaintiffs in Bartz and Kadrey both advanced multiple theories of market harm resulting from the defendants’ copying of works to train their AI models. Both provided evidence of a market for licensing copyrighted works for AI training. But in both cases, the courts dismissed this evidence as legally irrelevant, a conclusion that was a shock to many, especially given that this market is estimated to be worth $2.5 billion today and expected to grow to $30 billion within the decade.
Bartz provided little explanation for its conclusion, saying only,
A market [for licensing copyrighted works for training] could develop. Even so, such a market for that use is not one the Copyright Act entitles Authors to exploit. None of the cases cited by Authors requires a different result. All contemplated losses of something the Copyright Act properly protected — not the kinds of fair uses for which a copyright owner cannot rightly expect to control.
Kadrey offered somewhat more explanation, saying,
But whether such a market [for licensing works to train AI] exists or is likely to develop is irrelevant, because this market is not one that the plaintiffs are legally entitled to monopolize. In every fair use case, the “plaintiff suffers a loss of a potential market if that potential [market] is defined as the theoretical market for licensing” the use at issue in the case. Therefore, to prevent the fourth factor analysis from becoming circular and favoring the rightsholder in every case, harm from the loss of fees paid to license a work for a transformative purpose is not cognizable.
In short, both courts decided copyright owners are not entitled to the market for licensing their works to AI developers for training, and thus courts should not consider that market when engaging in a fair use analysis.
No doubt, lawyers in these and other AI cases will argue that both courts seriously misread the case law. For example, it is notable that neither judge cites Campbell v. Acuff-Rose, the one Supreme Court decision that spoke directly to which markets a copyright owner is entitled to under the fourth fair use factor. It seems unreasonable to twist Campbell’s reluctance to infer harm to a potential licensing market that copyright owners are unlikely to develop into a rule that courts should ignore an actual licensing market in which copyright owners are actively engaged. They may also point out that both courts have adopted the type of categorical, bright-line rule that is disfavored in fair use, and that they have impermissibly collapsed the fair use analysis into a single inquiry of transformativeness, only a few years after Warhol warned against doing so. There are likely additional legal arguments that would persuade an appellate court to reverse these holdings.
When markets advance the goals of copyright
But it is also worth looking at this issue through the lens of whether recognition of a market under the fourth factor would advance the goals of copyright. That is, after all, the basic premise of copyright. Each of the four fair use factors must be considered independently in light of this goal. If licensing would serve the purpose of copyright better than unpermissioned and uncompensated use, shouldn’t that weigh against fair use?
It’s long been observed that a licensing market for criticism and parody would do little to promote the goals of copyright. Recognizing such a market might, for example, distort how critical uses are produced, which in turn would distort discourse in socially undesirable ways.
By contrast, a licensing market for AI training would promote the goals of copyright in at least four ways:
First, licensing creates incentives for high-quality data. A lot of work is needed to curate training datasets for specific purposes, clean and prepare the data to optimize model performance, and translate it into training-ready formats. With a market for licensing training datasets, publishers and other copyright owners will be motivated to make their own catalogs of works more attractive to AI developers. Third-party aggregators and platforms will also be incentivized to curate and provide high-quality datasets in a competitive marketplace. In the absence of a market, only the largest companies will have the time and money to do this work, and they will be disincentivized from sharing datasets to prevent competitors from free-riding off their investments, inhibiting the broadest dissemination of copyrighted works for training.
Second, licensing reinforces public access. If scraping publicly available content is allowed without compensation, publishers are given a clear incentive to move their works behind paywalls, authentication barriers, or technical protections. Ironically, in the name of free access for AI, fair use may end up reducing access for humans. But if publishers know that public access does not mean uncompensated use, they are more likely to keep their content openly available, confident that doing so will not render it a free resource for commercial appropriation.
Third, licensing is pro-competitive. It lowers barriers to entry and broadens access to high-quality datasets, which means more AI companies can compete, leading to increased innovation, product variety, and lower costs. Licensing enables specialization, allowing licensors to deepen their expertise and rewarding them in the market for increasing efficiencies in creating and providing training datasets. Contrary to claims that licensing would favor big companies over small ones, it is the world that ignores copyright that is a winner-take-all world, where the competitive edge goes only to the largest companies that can devote the most resources to collecting and building training datasets.
Fourth, licensing creates certainty. “Because copyright law ultimately serves the purpose of enriching the general public through access to creative works,” says the Supreme Court, “it is peculiarly important that the boundaries of copyright law be demarcated as clearly as possible.” The use of copyrighted works by AI developers is complex, far more complex than uses like criticism or parody, with multiple potential uses on the ingestion side and potential reproductions occurring on the output side. Even if courts begin to find that the actual training with copyrighted works is fair use, that still leaves substantial legal uncertainty around different factual scenarios, ancillary uses of works, and downstream uses.
The decision in Bartz is proof positive of this. Although the court held that the copies of works used to train Anthropic’s LLM were justified under fair use, it concluded that the downloading of pirate copies was not fair use, and any internal copying that was not part of the training process may still be infringing. Licensing can prospectively address these issues with far greater certainty and allow parties to allocate risks more efficiently, leading to increased investment and innovation.
Perhaps most importantly when it comes to the constitutional goals of copyright: licensing rewards authors and publishers. Creators of books, journalism, photography, and other expressive works are essential to the cultural and informational commons. Licensing allows them to share in the commercial success of AI, providing income streams that support continued creation. The public continues to benefit from their creative work, and the next generation of AI benefits from a renewable source of high-quality training materials.
Conclusion
Both Bartz and Kadrey have created a bright-line rule that transformativeness equals fair use. A categorical rule that excludes consideration of harm to markets for uses that a court determines are transformative effectively collapses the four-factor analysis into a single inquiry. If this rule stands, it would put fair use at odds with the goals of copyright, and turn it into a doctrine that exists for the private benefit of the largest commercial actors in the world at the expense of the public good.