ABSTRACT
Generative AI, machine learning, and other computational uses of copyrighted works pose profound conceptual questions for copyright law. This Article surveys multiple countries with different legal traditions and local conditions to explore how they have responded to these questions in relation to the use of copyrighted works for AI training without express permission from the relevant rightsholders. Our survey suggests an emerging international equilibrium in which jurisdictions from around the world have found ways to reconcile copyright law and AI training. In this equilibrium, countries recognize that text and data mining, computational data analysis, and AI training can be socially valuable and may not inherently prejudice the copyright holders' legitimate interests. Such uses should therefore be allowed without express authorization in some, but not all, circumstances.
We identify three forces driving toward this equilibrium: (1) the centrality of the idea-expression distinction in copyright law; (2) global competition in AI; and (3) the race to the middle among countries undertaking copyright law reforms. However, we also address factors that may upset this emerging equilibrium, including ongoing copyright litigation, partnerships, and licensing deals in the United States, as well as legislative and regulatory efforts in both the United States and the European Union, most notably the adoption of the EU Artificial Intelligence Act.
A key lesson of our multi-country survey is that, globally, the binary policy debate that assumes that text and data mining and AI training must be categorically condemned or applauded has been eclipsed by a more granular debate about the specific circumstances in which the unlicensed use of copyrighted works for AI training should be allowed or prohibited. Countries that have hesitated until now to modernize their copyright laws in the area of AI training have several templates open to them and little reason for hesitation.
INTRODUCTION
The arrival of generative artificial intelligence (AI) has set the world on fire. China, the European Union, the United States, and other countries are actively engaging in a race to advance and control cutting-edge AI technology, and generative AI in particular. Policymakers have called for hearings, listening sessions, and public comments to better understand the promises and perils of AI, while legislatures have introduced new regulations to govern this new technology. Meanwhile, copyright holders have filed individual and class action lawsuits against generative AI developers, claiming billions of dollars in damages and calling for the destruction of AI models. In turn, generative AI developers have begun to negotiate partnerships and licensing deals with publishing houses and media conglomerates.
At the international level, the World Intellectual Property Organization (WIPO) launched an initiative to address issues at the intersection of intellectual property and AI. The U.N. Secretary-General created the High-Level Advisory Body on Artificial Intelligence to align AI development more closely with the needs of humanity. Recent cross-border collaborations, including deals such as Microsoft's investment in French startup Mistral AI and Google and Amazon's investment in Anthropic, raise concerns about power and concentration in increasingly intertwined global technology markets.
Amid this great policy upheaval, we believe it is worth considering whether AI disruption has brought the world's major copyright systems closer together or driven them further apart. Countries have different legal traditions, economic conditions, technological capabilities, political systems, and cultural backgrounds. Their copyright laws understandably have varied origins, justifications, concepts, doctrines, and vocabularies. Meanwhile, issues relating to generative AI technology pose new conceptual challenges to copyright law. A case in point is the use of copyrighted works to train AI models without express permission from the relevant rightsholders.
This Article explores whether copyright law has converged or diverged globally in the area of AI training. Part I introduces the challenge posed by the "nonexpressive use" of copyrighted works. It explains why practices such as reverse engineering object code, checking student papers for plagiarism, scanning library books to build a search engine index, subjecting entire libraries of books to statistical analysis, and using text to train machine learning algorithms present something of a puzzle for copyright law. This Part explores how generative AI is similar to and different from prior nonexpressive uses.
Part II surveys three distinct groups of jurisdictions: (1) those with a fair use regime or its close variants; (2) those with express copyright exceptions for text and data mining (TDM) or computational data analysis; and (3) those actively pursuing AI development without updated copyright laws to facilitate AI training. This Part argues that, although the world has yet to achieve consensus on copyright and AI training, an international equilibrium has emerged. In this equilibrium, countries recognize that TDM, computational data analysis, and AI training can be socially valuable and may not inherently prejudice the copyright holders' legitimate interests. Such uses should therefore be allowed without express authorization in some, but not all, circumstances. On the issue of copyright exceptions for AI training, countries have opted for a middle path, instead of racing to the top or the bottom. Through a systematic analysis of exceptions across the world, we also show that some jurisdictions provide greater scope for training AI and machine learning models than others.
Part III identifies three contributing factors to the emergence of an international equilibrium on copyright and AI training: (1) the centrality of the idea-expression distinction in copyright law; (2) global competition in AI; and (3) the race to the middle among countries undertaking copyright law reforms. Part IV outlines uncertainties that may upset this emerging equilibrium. It discusses ongoing copyright litigation, partnerships, and licensing deals in the United States, as well as legislative and regulatory efforts in both the United States and the European Union, including the adoption of the EU Artificial Intelligence Act (EU AI Act). Part V concludes with six key lessons drawn from our multi-country survey.
I. THE NONEXPRESSIVE USE OF COPYRIGHTED WORKS
Copyright law was invented in response to the printing press. Although times have changed, the printing press remains our dominant metaphor for how copyright functions. Copyright provides an incentive to authors whose works would otherwise be freely copied upon first publication, and the reproduction of a work naturally serves as the locus of exchange between the author and the reader. But what if there is no reader?
In the 1990s, for the first time, we began to see economically significant acts of copying that had nothing to do with communication or transmission of the underlying expression. Some prominent examples include reverse engineering object code, running student papers through plagiarism-detection systems, scraping webpages to build a search engine index, doing likewise with library books, and conducting meta-analysis of entire libraries of books. These nonexpressive uses pose a difficult conceptual issue for copyright law. The centerpiece of copyright law is the exclusive right to reproduce the work, yet the purpose underpinning that right is to allow authors to control the communication of their original expression to the public while still allowing ideas, facts, abstractions, and artistic methods to be freely copied. Whether and how copyright law should allow for nonexpressive uses is a critically important question for a variety of technologies, most saliently machine learning and generative AI.
The term "artificial intelligence" broadly encompasses computer systems that replace the need for human perception or judgment in a particular domain. Most of what we now think of as AI is performed by some kind of machine learning. Machine learning models make predictions and classifications based on patterns distilled from the training data without being preprogrammed with any explicit theory. For instance, a machine learning model trained to distinguish between medical scans of patients with and without cancer can achieve high rates of accuracy without any understanding of human anatomy and without anyone suggesting which features to look for.
Section A discusses the concept of the nonexpressive use of copyrighted works as it applies to AI and machine learning. This section situates generative AI in a series of technologies that raised similar copyright questions in the past and shows how this new technology aligns with previous nonexpressive uses. Section B considers the potential distinctions between AI-related nonexpressive uses and earlier nonexpressive uses involving TDM. This section provides the background needed for an extended discussion of copyright exceptions for AI training from around the world, as well as the ongoing litigation, partnerships and licensing deals, and legislative and regulatory efforts involving generative AI developers.
A. Nonexpressive Use and Generative AI
"Generative AI" usually describes machine learning models that employ patterns derived from training data to create new digital artifacts that are a reasonable facsimile of human expression. As one of us testified in a U.S. Senate hearing two years ago:
Although we are still a long way from the science fiction version of artificial general intelligence that thinks, feels, and refuses to "open the pod bay doors", . . . [w]e now have large language models ("LLMs") that can pass the bar exam, carry on a conversation on almost any topic, create new music, and new visual art.
Foundation models, such as OpenAI's now-famous GPT series, Meta's Llama series, and Anthropic's Claude series, are at the heart of a global debate over AI regulation, including regulation at the intersection of copyright and AI.
Amid great optimism about our generative AI future are many fears: AI trained on a flawed society may exacerbate and perpetuate historical biases; AI tools may be misused to generate misinformation and biological weapons; generative AI may abruptly displace the need for human labor in fields ranging from legal practice to graphic arts; and, ultimately, today's generative AI models may be precursors to super-intelligent systems that are not aligned with the best interests of humanity. If concerns about AI's existential risks or optimistic projections of our AI-enabled future have any merit, one might ask why copyright issues feature so prominently in current policy discussions. The answer is that, while copyright may not be the most important issue in relation to generative AI, the copy-reliant nature of generative AI guarantees copyright law a seat at the table.
To appreciate the copyright issues at stake in current debates over generative AI, one must have at least a basic understanding of how models are trained. Our discussion focuses on text-based LLMs for narrative convenience. The first step in training an LLM is to collect and analyze a staggering quantity of training data. Presently, the majority of that data is scraped from the Internet and stored on servers where it can be analyzed, deduplicated, and filtered for toxic and inappropriate content. The desired training data is then converted into "tokens" represented by numbers that correspond to words, parts of words, and various punctuation marks.
There are two critical observations about this preliminary part of the training process. First, downloading copyrighted works from the Internet and storing them on a server for more than a transitory duration is clearly an act of reproduction under U.S. copyright law. Likewise, converting that text into numerically represented tokens is an act covered by the law because those numerical representations could in theory be easily reversed into something human-readable.
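The reversibility of tokenized text can be illustrated with a minimal sketch. It assumes a toy word-level tokenizer; production LLMs typically use subword schemes, and the function names `build_vocab`, `encode`, and `decode` are hypothetical, chosen only for illustration.

```python
import re

# Toy assumption: split text into words and single punctuation marks.
TOKEN_PATTERN = r"\w+|[^\w\s]"

def build_vocab(text):
    """Assign an integer ID to each distinct word or punctuation mark."""
    vocab = {}
    for piece in re.findall(TOKEN_PATTERN, text):
        if piece not in vocab:
            vocab[piece] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert text into a list of integer token IDs."""
    return [vocab[piece] for piece in re.findall(TOKEN_PATTERN, text)]

def decode(token_ids, vocab):
    """Map token IDs back to text (spacing is normalized, not preserved)."""
    inverse = {token_id: piece for piece, token_id in vocab.items()}
    return " ".join(inverse[token_id] for token_id in token_ids)

sample = "But what if there is no reader?"
vocab = build_vocab(sample)
token_ids = encode(sample, vocab)   # a list of integers, one per token
restored = decode(token_ids, vocab) # human-readable text again
print(token_ids)
print(restored)
```

In this sketch the list of numbers carries the same words in the same order as the source sentence, which is why the numeric form can be mapped back to readable text.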
Second, there is something special about these copies: no one is likely to ever see or read them. What then becomes of these unread invisible copies? Once training data has been appropriately filtered and tokenized, it can be used to train an AI model. A "large" language model typically comprises billions or trillions of parameters. At the beginning of training, these parameters are each assigned an arbitrary weight. During training, the model is exposed to sequences of tokens from the training data and instructed to predict the next most likely token. We can elide some fascinating computer science by simply summarizing that the relevant guesses are then evaluated, and the model weights are adjusted accordingly. The initial guesses are almost invariably gibberish, but after a great many rounds of training, the model begins to make better and better predictions. This training process culminates with a "pre-trained" model, which explains the acronym "GPT," which stands for "generative pre-trained transformer." Despite the astonishing abilities and apparent versatility of these models, a model like GPT-4o only does one thing: it predicts the next token in a sequence of tokens. However, when each new token becomes part of the background sequence for the next prediction, prediction becomes generation. After all, even Shakespeare wrote only one word at a time.
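The step from prediction to generation can be sketched in a few lines. This is a deliberately simplified stand-in: it assumes a count-based bigram table rather than the gradient-based weight adjustment used to train actual LLMs, and the names `train_bigram`, `predict_next`, and `generate` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count which token follows which (a stand-in for learned weights)."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, max_new_tokens):
    """Repeatedly predict one token and append it to the sequence."""
    sequence = [start]
    for _ in range(max_new_tokens):
        following = predict_next(counts, sequence[-1])
        if following is None:
            break
        sequence.append(following)
    return sequence

corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
story = generate(model, "the", 4)
print(" ".join(story))
```

Each call to `predict_next` yields only a single token; the loop in `generate` feeds that token back in as context for the next prediction, which is the sense in which prediction becomes generation.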
Generative AI models are not designed to memorize their training data. In general, these models are significantly smaller than the volume of information they are exposed to. This setup forces the model to learn at a somewhat abstract level, rather than through memorization. By deriving abstractions and heuristics from the training data, generative AI models improve at responding to new inputs, allowing them to navigate the latent space implied by the data, rather than simply remixing and reproducing the data as output. This design has important implications for copyright law.
To begin with, so long as the model has not memorized a particular work, it does not include a copy of that work. This fact alone does not address the issue of copying data used to train the model, but it is critical to understand that the model and the training data are quite separate things. Furthermore, if a generative AI model is not a copy of the relevant training data, the digital artifacts produced by that model will not be copies of the data. We do not deny the attenuated causal link between training data and model outputs. We simply note that if one created outputs through a process that did not involve creating invisible intermediate copies, those outputs would fail the traditional test of substantial similarity in copyright law. If someone asks ChatGPT to tell a bedtime story about a hard-boiled detective bear living in Helsinki, the resulting story will owe something to the works of Dashiell Hammett, but it will not be an infringing copy of The Maltese Falcon because literary style and genre are not protected by copyright. Nevertheless, some researchers have found generative AI models to memorize significant parts of particular works in the training data or learn enough about the subject matter protected by copyright law at a higher level of abstraction, such as copyrightable fictional characters. Such memorization suggests a real possibility of copyright infringement.
Machine learning and generative AI raise a vital copyright question for jurisdictions around the world: should it be permissible to create hidden intermediate copies of copyrighted works that no one will ever see or read, so that a statistical model can learn uncopyrightable and abstract features of those works? Intuitions about this question turn on the relative importance one assigns to the mere fact that a technical act of copying has occurred, as opposed to the purpose and effect of that technical reproduction. Even taking the view that copyright should permit some nonexpressive uses, subsidiary questions arise. These questions include whether it matters if uncopyrightable and abstract features derived through machine learning remain as abstract metadata or are used to produce new and different digital artifacts and, if so, whether generative AI deserves special rules that differ from those applied to other forms of machine learning and TDM.
B. Potential Distinctions Between AI Training and TDM
For a long time, and in many different contexts, researchers have used computational processes and statistical methods to discover new information and reveal patterns in unstructured text data, usually referred to as TDM or computational data analysis. The growing importance of TDM as a method of discovery in computer science, the life sciences, literary criticism, linguistics, history, and other disciplines has spurred many proposals for legislative change. Indeed, countries began addressing the copyright implications of nonexpressive use in relation to TDM in the late 2000s. At that time, the logic of viewing nonexpressive use as fair use under U.S. law was already apparent, but the case law was hardly as well defined as it is today.
Although the copyright implications of machine learning seemed indistinguishable from other forms of TDM prior to generative AI, there are notable differences between machine learning and generative AI. First, generative AI does not simply analyze training data to derive useful information; it can produce digital artifacts in the same form as its training data. Even if the outputs do not approach substantial similarity, the threshold for copyright infringement, they may nonetheless compete directly with the works used to train AI models or with the copyright holders of those works. In the United States, this prospect of indirect substitution could complicate the fair use analysis with respect to the fourth factor, "the effect of the use upon the potential market for or value of the copyrighted work." In jurisdictions whose copyright laws have incorporated the "three-step test," a framework laid out in key international intellectual property instruments for determining the acceptability of copyright limitations and exceptions, this prospect may also influence the assessment of whether the use would "unreasonably prejudice the legitimate interests of the right holder" (the last step of the three-step test).
Second, as noted above, some generative AI models have been shown to occasionally reproduce digital artifacts with more than a passing resemblance to particular works in their training data. On such occasions, any argument that the work reproduced was used in a transformative or nonexpressive fashion becomes strained.
Understanding these potential distinctions will be important as we explore the international copyright law developments in the area of AI training in the next Part. Copyright holders who filed lawsuits against generative AI developers, as well as their supportive policymakers and commentators, have also utilized these potential distinctions to support their arguments and to push for their preferred copyright law reforms.
II. GLOBAL RESPONSES TO THE CHALLENGE OF NONEXPRESSIVE USE
Having set out the challenge posed by the nonexpressive use of copyrighted works above, we now turn to how different jurisdictions have responded to the tensions copyright law has posed to TDM, machine learning, and generative AI. Necessarily, this Part paints with a broad brush to maintain our focus on cross-country comparisons. Section A examines those jurisdictions with a fair use regime or its close variants. Section B turns to jurisdictions with express copyright exceptions for TDM or computational data analysis. Section C covers countries actively pursuing AI development without updated copyright laws to facilitate AI training.
A. Fair Use and Close Variants
Most commonly associated with U.S. copyright law, the term "fair use" is generally used to refer to an open system of copyright limitations and exceptions. Such open-endedness is especially attractive for "creating a positive environment ... for innovation and investment in innovation," including in the AI sector. Many policymakers and commentators have also credited the fair use provision for the success of U.S. technology companies, such as Google and Facebook (now Meta). Today, a growing number of countries have adopted or considered the fair use regime or its close variants. This section first discusses the fair use regimes in the United States and Israel before turning briefly to other fair use regimes around the world.
1. United States and Israel
In the United States, the doctrine of fair use was originally judge-made but later codified in Section 107 of the 1976 Copyright Act. In Israel, that doctrine also emerged in case law before codification. In 2007, the country adopted a statutory provision that closely tracked the text and structure of the U.S. fair use provision. Because of the remarkable similarities between these two provisions, this section discusses them together.
Our view of American law is based on how courts have treated analogous technologies using the four fair use factors provided in Section 107, namely, (1) "the purpose and character of the use"; (2) "the nature of the copyrighted work"; (3) "the amount and substantiality of the portion used in relation to the copyrighted work as a whole"; and (4) "the effect of the use upon the potential market for or value of the copyrighted work." In contexts ranging from reverse engineering software to text mining millions of library books, U.S. courts have consistently held that technical acts of copying that do not communicate an author's original expression to a new audience constitute fair use. By contrast, courts have found no fair use in seemingly analogous cases where the challenged conduct appeared to have exceeded the limits of nonexpressive use and communicated significant expressive material to a new audience.
The Ministry of Justice of Israel reached similar conclusions in its advisory opinion entitled Uses of Copyrighted Materials for Machine Learning ("MOJ Opinion" or "Opinion"). Responding to the Ministry of Defense's query concerning the flagship government program for AI infrastructures that was not directed to any specific contested facts, the Opinion explains the necessity of copying large volumes of text and other copyrighted works for AI training. It further notes that in the ordinary course, "each individual work is a single component in an enormous dataset and holds an immaterial weight in the dataset." The Opinion presumes that the purpose of developing AI systems is not to produce digital artifacts that closely resemble the training data. Instead, it is premised on machine learning training qualifying as a nonexpressive use. Based on this premise, the Opinion concludes that "the fair use doctrine . . . typically permits the creation of [machine learning] datasets."
Allowing genuinely nonexpressive uses makes sense in terms of the four statutory fair use factors in both the United States and Israel. With respect to the first factor under U.S. copyright law, analyzing existing works to derive metadata or uncopyrightable abstractions and associations is "quintessentially transformative," and the use can be justified in terms of its "purpose and character." The mere fact that these abstractions and associations are used to generate new expressions should not diminish their transformative nature so long as those expressions are not substantially similar to the training data. After all, creating new expressions is the goal of copyright. In Israel, the MOJ Opinion agrees, noting that "such use is as transformative as it gets" so long as the system does not "produce outputs that would highly resemble their inputs."
Given that such nonexpressive uses are highly transformative, the commercial nature of much AI development is unlikely to weigh against fair use. In the United States, the Supreme Court's recent foray into fair use in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith drew attention to commerciality, but the majority in that decision reiterated the holding in Campbell v. Acuff-Rose Music, Inc. that commerciality "is to be weighed against the degree to which the use has a further purpose or different character" and that "the more transformative the new work, the less will be the significance of other factors, like commercialism, that may weigh against a finding of fair use."
The view in Israel is the same. The MOJ Opinion notes that "the paradigmatic case of [machine learning] dataset meets the [purpose and character] criterion . . . , given its societal value and transformative nature." The Opinion nonetheless cautions that commercial uses where the enterprise is not transformative are less likely to be fair use.
The highly transformative nature of nonexpressive AI training may further influence the analysis of the remaining factors. The second fair use factor, which directs courts to consider "the nature of the copyrighted work," has not been influential in fair use cases involving other nonexpressive uses. One of us has even suggested that the nature of the work is not truly a factor at all; it merely provides the factual context in which the other factors are evaluated. Consistent with U.S. case law, the MOJ Opinion in Israel places little weight on the difference between a more creative work and a more factual work in the machine learning context.
Under U.S. copyright law, the third factor, "the amount and substantiality of the portion used," favors defendants in nonexpressive use cases because the ultimate question is whether the amount of copying is reasonable in relation to a purpose favored by fair use. Our assessment of U.S. copyright law is echoed in the MOJ Opinion in Israel, which states that, although the works in the training data are usually copied in full, complete reproduction is necessary to extract unprotected elements like facts and ideas and to derive new observations from the training data. In both jurisdictions, making complete literal copies is reasonable as an intermediate technical step in an analytical process that does not lead to communicating the underlying original expression to a new audience.
Finally, based on the fourth factor in the U.S. fair use provision, using copyrighted works as training data for the development of AI models is unlikely to have a cognizable effect on the "potential market for or value of the copyrighted work," so long as the use is nonexpressive. This assessment may seem odd given the benefits commercial AI developers have derived from access to other people's works; but allowing such benefits tracks the fundamental commitment in copyright law to drawing a distinction between copyrightable expressions and uncopyrightable facts, ideas, styles, and abstractions. The "market" and "value" addressed in the fourth factor do not include a right to prevent quotations in critical book reviews or allegations of plagiarism, even if those quotations would cause economic harm. Moreover, courts have generally rebuffed circular arguments that copyright holders have a right to charge for nonexpressive uses because not being able to charge for such uses is a market harm under the fourth factor.
In Israel, the MOJ Opinion makes a strong but nuanced case for AI training in terms of the fourth factor. It recognizes the lack of a present market for training data at the scale required for LLMs but concedes that such a market, were it to develop, could modify the application of the fair use doctrine. Section IV.A.2 will discuss further whether recent licensing deals for access to training data may differentiate future generative AI cases from otherwise analogous cases such as Authors Guild, Inc. v. HathiTrust and Authors Guild v. Google, Inc. (Google Books).
Our reading of existing U.S. case law is that nonexpressive uses pose no direct threat of expressive substitution and should generally be considered harmless under the fourth factor. However, that factor leaves room for considerations beyond direct expressive substitution. Such considerations may include whether a defendant failed to adopt adequate security measures, accessed works by circumventing paywalls or disregarding robots.txt exclusions, and exploited caches of material on sites of known infringement. We note that each defendant in the nonexpressive use cases decided to date had lawful access to works copied. U.S. courts have not yet had a chance to consider scenarios where the works put to a nonexpressive purpose had first been copied unlawfully.
The MOJ Opinion in Israel does not directly address these questions, but it suggests that fair use would not protect copying for a machine learning application that was "designed to mimic the style [of] a single author," due to the lack of sufficient transformativeness and the fact that such copying would pose too great a prospect of market harm.
2. Other Jurisdictions
The United States and Israel are not alone in adopting fair use. Liberia, Malaysia, the Philippines, Singapore, South Korea, Sri Lanka, and Taiwan have made similar choices, and many of these jurisdictions did so at around the same time when Japan, the United Kingdom, and the European Union were crafting express copyright exceptions to allow for TDM, which the next Section will discuss.
Despite criticisms by some policymakers, commentators, and industry representatives that U.S. fair use jurisprudence is incoherent in theory and unpredictable in application, many jurisdictions have recognized that fair use has allowed U.S. copyright law to adapt more rapidly and effectively to the challenges of the digital age. Although not all of these jurisdictions ended up incorporating fair use or a similar open standard into their copyright regimes, all of them have recognized, in legislative proposals, consultation reports, or other documents, that the fair use doctrine has given the United States a distinct technological advantage in developing copy-reliant technologies such as Internet search.
B. Express Exceptions for TDM or Computational Data Analysis
Unlike the United States and other countries with a fair use regime or its close variants, many jurisdictions have a closed system of copyright limitations and exceptions.124 Under this arrangement, the copyright exception will only be available if the conduct at issue fits within a specified category, such as TDM or computational data analysis.125 That the exception is express does not mean that courts will not undertake a balancing exercise, such as analyzing the four factors found in a fair use regime126 or evaluating whether the use has "unreasonably prejudice[d] the legitimate interests of the right holder."127 As Michael Geist observes in regard to the distinction between fair use and an express copyright exception such as fair dealing: "The [fair dealing] model creates a two-stage analysis: first, whether the intended use qualifies for one of the permitted purposes, and second, whether the use itself meets the fairness criteria. By contrast, fair use raises only the second-stage analysis, since there are no statutory limitations on permitted purposes."128
1. Japan
Japan was the first country to create an express exception to reduce the tension between TDM and copyright protection. In July 2009, Japan amended its Copyright Act by adding an exception supporting TDM or computational data analysis.129 A decade later, Japan expanded that provision and combined it with Article 30-4 to cover the use of a copyrighted work "in any way and to the extent considered necessary" when "it is not [the user's] purpose to personally enjoy or cause another person to enjoy the thoughts or sentiments expressed in that work." Explicitly listed in the amended provision are three covered activities: (1) "testing [technology] to develop [it] or put[ting technology] into practical use"; (2) "data analysis"; and (3) "computer data processing."132 The new Article 30-4 recalls the German civil law concept of Freier Werkgenuss (free enjoyment of a copyrighted work).133
Because Article 30-4 of the Japanese Copyright Act was broadly drafted to cover any use that does not result in the enjoyment of a copyrighted work or facilitate such enjoyment, the provision is no longer a narrow exception for TDM or computational data analysis but a broad one for nonexpressive use.134 Even though the exception was drafted before the arrival of generative AI, it covers the use of copyrighted works for AI training.135 Applying to commercial and noncommercial uses alike, this sweeping exception is subject to a general limitation that the use must not "unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation,"136 language drawn from the last step of the three-step test mentioned above.137
In addition to Article 30-4, Japan introduced Article 47-5, which provides a new copyright exception for the "minor exploitation incidental to computerized data processing and the provision of the results thereof."138 Working in tandem, these two provisions gave Japan arguably the world's broadest exception for TDM or computational data analysis. It is therefore small wonder that Tatsuhiro Ueno notes that "Japan could be said to be a 'paradise' for machine learning and TDM."139
Notwithstanding this pro-AI development, policymakers and legislators have considered ways to tighten Article 30-4, due largely to concerns about the challenges posed by generative AI. For example, in May 2024, the Council for Cultural Affairs of Japan published a nonbinding report entitled General Understanding on AI and Copyright in Japan, which outlines circumstances in the AI context that would and would not constitute the nonenjoyment of "the thoughts or sentiments expressed in [the copyrighted] work," including circumstances that would result in the simultaneous enjoyment and nonenjoyment of that work. The document suggests that businesses could be held liable if they knowingly collect training data from infringing sources. It also points out that these businesses would "unreasonably prejudice the interests of the copyright holder" under Article 30-4 if they reproduce for AI training copyrighted databases that have available licenses and whose use has been restricted by technological protection measures (TPMs).
2. United Kingdom
The United Kingdom's TDM exception was inspired by the 2011 Hargreaves Review, which addressed the failure of UK intellectual property law to keep pace with technological advancements. The report specifically noted that the legal barriers to using TDM technologies could hinder scientific discovery and innovation and recommended that the United Kingdom facilitate access to TDM for noncommercial research by making it clear that such activity does not infringe copyright.
In May 2014, the United Kingdom enacted a narrow TDM exception. Section 29A of the Copyright, Designs and Patents Act 1988 ("CDPA") provides that it is not an infringement to copy a work so that "a person who has lawful access to the work may carry out a computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose." In addition to the limitations on "sole purpose," "lawful access," "research," and "non-commercial purpose," the provision is contingent on "sufficient acknowledgement" where practical and limitations on transfer of copies and subsequent dealing with copies, a requirement found in other UK fair dealing provisions. Within this narrow scope, however, those taking advantage of the UK TDM exception are immunized from contractual override; that is, private agreements not to engage in TDM research permitted under the law will have no legal effect.
In 2022, the United Kingdom expressed its intention to expand the TDM exception to cover commercial uses.151 That plan, however, was abandoned a year later.152
3. European Union
The Directive on the Harmonisation of Certain Aspects of Copyright and Related Rights in the Digital Single Market (DSM Directive) was adopted in April 2019 to modernize EU copyright law, promote cross-border access and market integration for digital goods and services, and balance the interests of various stakeholders in the digital economy.153 At the time of the passage of this Directive, one of its less prominent and least controversial aspects was the twin TDM provisions.154 In broad terms, Article 3 of the Directive establishes a far-reaching exception for TDM by "research organizations and cultural heritage institutions," whereas Article 4 provides a more qualified exception that is open to all.155 Article 3 allows, for instance, a researcher working at a university to reproduce and store lawfully accessed works for TDM carried out for "purposes of scientific research."156 Because Article 4 is not confined to "research organizations and cultural heritage institutions" or "purposes of scientific research," it is natural, if somewhat imprecise, to think of Article 3 as the not-for-profit, institutional, and research-focused exception and Article 4 as the commercial TDM exception.157 Although the DSM Directive does not mention machine learning explicitly, there is little doubt that the exceptions in Articles 3 and 4 extend to all forms of TDM, including machine learning and generative AI.158 A recent German copyright case involving the LAION image database used for training AI models confirms as much.159
Article 3 requires EU members to allow "for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access."160 The research exception in Article 3 is immune from contractual override161 and is only subject to the TPMs deployed by rightsholders to the extent necessary to "ensure the security and integrity of the networks and databases" hosting the protected works.162 Accordingly, the research exception is not subject to any express requirement to comply with rightsholder opt-outs. In contrast, the commercial TDM exception provided by Article 4 is not immune from contractual or technological override163 and is expressly subject to the condition that the relevant TDM use "has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online." Copyright holders therefore have the ability to opt out of commercial TDM.
The EU blueprints for research and commercial TDM exceptions also diverge somewhat on the retention and storage of copies made in the course of TDM analysis. Article 3 allows copies to "be retained for the purposes of scientific research, including for the verification of research results," so long as they are "stored with an appropriate level of security." By contrast, Article 4 states that the relevant "[r]eproductions and extractions . . . may be retained for as long as is necessary for the purposes of text and data mining."
Importantly, both Articles 3 and 4 are subject to a general safeguard, incorporated by reference from the EU Directive on the Harmonisation of Certain Aspects of Copyright and Related Rights in the Information Society, which incorporates the last two steps of the three-step test mentioned above. Article 5(5) of the Directive provides that copyright exceptions and limitations "shall only be applied in certain special cases which do not conflict with a normal exploitation of the work or other subject-matter and do not unreasonably prejudice the legitimate interests of the right holder."
4. Singapore
As part of broad-ranging amendments that became its Copyright Act 2021, Singapore introduced Section 244, an exception permitting making and retaining copies of lawfully accessed works for purposes of computational data analysis.171 Such analysis is defined broadly to include "using a computer program to identify, extract and analyse information or data from the work or recording" and "using the work or recording as an example of a type of information or data to improve the functioning of a computer program in relation to that type of information or data."172 Section 243 further provides as an illustration the use of "images to train a computer program to recognise images." With this illustration, it is beyond dispute that the new exception covers AI training.
The Singapore exception for computational data analysis applies to both commercial and noncommercial use. However, it is limited by a "lawful access" requirement174 and a duty to avoid training on infringing copies, including those found in a "flagrantly infringing online location."175 This exception does not allow for the circumvention of TPMs, which is prohibited under a separate provision in the Singapore Copyright Act.176 Like the UK TDM exception and Article 3 of the EU DSM Directive, the Singapore exception is immune from contractual override177 and is not subject to any obligation to respect opt-outs.
Finally, although we discuss Singapore alongside Japan, the United Kingdom, and the European Union in relation to their express exceptions for TDM or computational data analysis, it is worth recalling that Singapore also has a fair use regime.178 Having both provisions is both rare and important. It is rare because jurisdictions that have adopted an open-ended regime tend to rely on fair use to address new technological challenges posed to the copyright system. The existence of the two provisions, therefore, shows Singapore's determination to advance AI development regardless of the interpretation of the fair use provisions. Having both provisions is also important because if Singapore ends up interpreting the fair use provisions the same way as the United States in the context of nonexpressive use, Singapore will have an additional express exception to cover issues that may fall outside the scope of fair use.179
C. Lack of Dedicated Exceptions Despite Active AI Development
After covering jurisdictions with a fair use regime or its close variants and those with express copyright exceptions for TDM or computational data analysis, this section turns to countries that have been actively pursuing AI development but that have not yet updated their copyright laws to facilitate AI training. Due to its limited length, this section explores, in turn, only China and the United Arab Emirates (UAE). The discussion of these countries raises interesting questions about the role of law in AI development, as well as the growing determination of countries to compete in the global AI race, regardless of whether they have completed the needed copyright law reforms.
1. China
Like the European Union and the United States, China is a major technological power that has received considerable attention in the global AI debate, especially in relation to its technological rivalry with other major geopolitical powers.180 In the past decade, China has made considerable progress181 and begun seeking global AI leadership.182 AI has featured prominently in the country's strategic plans for economic, social, scientific, and technological developments.183 The State Council's Next-Generation Artificial Intelligence Development Plan outlines the target for China to become the world's major AI innovation center by 2030.184 China has also actively exported its model of technology development and regulation,185 which will have major implications for global AI development. According to the Artificial Intelligence Index Report 2024, produced by the Institute for Human-Centered Artificial Intelligence at Stanford University, China currently dominates the world in both AI patents and installations of industrial robots.186 In a recent patent landscape report, WIPO also lists China-based Tencent, Ping An, Baidu, and the Chinese Academy of Sciences as the world's organizational leaders in volume of generative AI patents between 2014 and 2023.187 For comparison, IBM ranks only fifth in the same category.188 As this Article entered production, China-based DeepSeek and its emerging competition with other major AI developers began garnering media and policy attention.189
On November 11, 2020, amid the COVID-19 pandemic, China adopted the Third Amendment to the Copyright Law.190 Entering into effect on June 1, 2021, this amendment provided a major overhaul of its copyright regime.191 Article 24 of the amended statute enumerates circumstances in which a copyrighted work may be used without authorization or remuneration.192 Although the old provision on copyright limitations and exceptions provided only a closed list193 (similar to provisions found in jurisdictions with express copyright exceptions for TDM or computational data analysis), the latest amendment transformed that list by adding Clause 13, which covers "[o]ther circumstances provided for by laws and administrative regulations."194 This change did not convert China to an open-ended regime like the U.S. fair use regime, but it is likely to promote AI training and development once the appropriate regulations have been introduced.
At the time of writing, China has not yet updated the Regulations for the Implementation of the Copyright Law, but it is anticipated to do so in the near future.195 Should a new copyright exception for TDM or computational data analysis, similar to one found in Japan, the United Kingdom, the European Union, or Singapore,196 be introduced through either the updated Implementing Regulations or a new set of AI-specific regulations, that exception could be read into Article 24(13) of the Copyright Law.
In July 2023, China adopted the Interim Measures for the Management of Generative Artificial Intelligence Services,197 pioneering legislation that aimed to give China an early-mover advantage similar to the EU AI Act.198 While the measures mention intellectual property rights,199 the language is vague and open to interpretation. Specifically, Article 4 of the Interim Measures states that "[t]he provision and use of generative AI services shall ... [r]espect intellectual property rights and commercial ethics [and] protect business secrets."200 Article 7(3) further stipulates: "Where intellectual property rights are involved, the intellectual property rights that are lawfully enjoyed by others must not be infringed [by the providers of generative AI services]."201 How these provisions are to be interpreted will, of course, depend on the Chinese courts' determination of the legality of the use of copyrighted works to train AI models. Should such use be deemed legal, Articles 4 and 7 will not provide additional protection to copyright holders.
The development of AI technology in China and the role copyright law plays in such development remain to be seen. It will be intriguing to observe how the country balances the usual tensions between exerting control over technology companies in a state-driven economy and providing them with the freedom to grow into the national champions that the country will need to stay ahead in global AI competition. Moreover, as the European Union, the United States, and other countries continue to adopt nationalist policies to compete with China, what policies these jurisdictions will adopt and how China will respond will create an additional layer of uncertainty.
2. United Arab Emirates
The UAE provides another interesting case study due to its substantial investment in AI technology and its mixed legal system featuring both civil law and Shari'a. Released in 2017, its national AI strategy makes clear its "vision to become one of the leading nations in AI by 2031."206
In May 2023, researchers at the Technology Innovation Institute in Abu Dhabi announced the launch of the open-source Falcon series of LLMs.207 Although models in that series are comparable to those released by OpenAI and Meta,208 the UAE Law on Copyright and Neighboring Rights does not contain a fair use provision or an express exception for TDM or computational data analysis. Instead, Article 22(1) allows for the "[r]eproduc[tion of] one single copy of the Work for purely personal use [and] for non-profit and nonprofessional purposes," with the exception for works of fine or applied arts, architectural works, and computer programs, applications, and databases.210 Like Japan's copyright exception for a non-enjoyment purpose,211 the entire Article 22 of the UAE Law on Copyright and Neighboring Rights, which includes other exceptions, is subject to the last two steps of the three-step test mentioned above.212
The open-source nature of the Falcon models and their association with a state university may qualify them as "non-profit and non-professional" within the meaning of Article 22(1). However, it is unclear whether this sub-provision fully covers the AI training involved in developing these models. If the UAE hopes that its investments in academic research on AI will lead to broader commercial applications, it may need to revise its copyright law.213 Reform in this area is particularly urgent considering that its close neighbor, Saudi Arabia, has similar ambitions in AI development.214
D. Affordances for Machine Learning and AI Training
The three previous sections have shown that the three types of jurisdictions surveyed in this Part have embraced, to varying degrees, TDM, computational data analysis, and other nonexpressive uses of copyrighted works. Their differing national preferences, in turn, translate into different legislation. This section provides a systematic analysis of some of the copyright exceptions discussed above to illustrate the impact of system design and implementation on the degree of affordance for machine learning and AI training.215 The analysis also identifies the implications of these laws for both rightsholders and AI developers.
Table 1 indicates the affordances individual jurisdictions provide for AI training by addressing four key questions: (1) whether the jurisdiction allows for noncommercial TDM; (2) whether it also allows for commercial TDM; (3) whether the relevant exception applies to all relevant rights in copyright law or only the reproduction right; and finally, (4) whether the TDM exception extends to specific applications, including machine learning and generative AI. Table 2 provides additional details on each jurisdiction, framed in terms of the ramifications of their copyright exceptions for rightsholders and AI developers. The tables omit China and the UAE due to difficulties in ascertaining how the relevant copyright exceptions apply in the AI context.
Combining these two tables gives us an overview of how the jurisdictions surveyed in this Article have addressed the copyright issues inherent in TDM, machine learning, and AI training. These jurisdictions have made different choices in design and implementation, especially in relation to issues such as lawful access requirements and how to address contractual and technological restrictions imposed by rightsholders. In jurisdictions where the legality of training AI models on copyrighted works without express authorization depends on the application of an open-ended legal standard such as fair use, there is still some uncertainty over how courts will rule on these questions. Nevertheless, we see three areas of broad agreement. First, each jurisdiction recognizes that, in some circumstances, TDM is socially valuable and does not inherently prejudice the copyright holders' legitimate interests. Second, the copying inherent in TDM research should be allowed without express authorization in most circumstances. Third, any right to engage in TDM should not be a blank check.
In virtually all jurisdictions, the scope of the relevant limitation or exception is subject to some kind of assessment of whether the unlicensed use would prejudice the copyright holder's legitimate interests. Although such assessment is not obvious from the copyright exceptions in the United Kingdom and Singapore, commentators take the view that courts will consider the impact of the unlicensed use on the copyright holder's legitimate interests.
In Singapore, for instance, the Copyright Act protects the rightsholders' interests by introducing a "lawful access" requirement and a duty to avoid training on infringing copies found in sites of flagrant copyright infringement. More importantly, although Section 244 applies quite broadly to computational data analysis, nothing in that exception appears to carry forward to downstream uses. Thus, if a company trains a music generation model in Singapore and that model tends to produce infringing derivative versions of the songs in its training data, those outputs would still be infringing.
Finally, although we have not observed a major disconnect between law on the books and law in action, readers should stay alert for this possibility. Assessing whether such a gap exists in the area of AI training is not always easy. To begin with, most of these exceptions have been adopted for only about a decade or less. It will therefore take some time before we have enough case law or other information to assess their application. A case in point is Section 29A of the CDPA, which has not yet been used despite its existence for more than a decade. Even if the exceptions were older, the low volume of copyright litigation in some of these jurisdictions might not generate sufficient cases to illustrate the laws' application. Indeed, a key criticism of the proposals to transplant the U.S. fair use provision has been the lack of precedent in the recipient jurisdictions to facilitate post-transplant interpretation.
III. FACTORS CONTRIBUTING TO CONVERGENCE
In view of the emerging international equilibrium on copyright and AI training documented above, this Part explores three factors that have contributed to such emergence: (1) the centrality of the idea-expression distinction in copyright law; (2) global competition in AI; and (3) the race to the middle among countries undertaking copyright law reforms.
A. Centrality of the Idea-Expression Distinction
Regardless of the divisions between common law and civil law traditions (and, by extension, the copyright and droit d'auteur systems), or the differences in economic conditions, technological capabilities, political systems, and cultural backgrounds, the idea-expression distinction is a key, basic principle of copyright law. Providing the internal logic of a copyright system and serving as the Grundnorm across a large number of jurisdictions, this distinction provides a powerful gravitational force pulling together different copyright exceptions in the area of AI training.
In the United States, Section 102(a) of the 1976 Copyright Act stipulates that "[c]opyright protection subsists . . . in original works of authorship fixed in any tangible medium of expression." Section 102(b) states further that copyright protection does not "extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery." As the United States Supreme Court reminds us, the uncopyrightable nature of facts and ideas is "the most fundamental axiom of copyright law," and the idea-expression distinction strikes "a definitional balance" between copyright law and the First Amendment.
In the United Kingdom, the dichotomy is less talismanic, yet it can still be easily found in case law and academic commentary. For example, in Hollinrake v. Trustwell, an 1894 copyright case before the Court of Appeal of England and Wales, Lord Justice Lindley declared: "Copyright . . . does not extend to ideas, or schemes, or systems, or methods; it is confined to their expression; and if their expression is not copied, the copyright is not infringed."
Courts in civil law jurisdictions may not invoke the idea-expression distinction by name as frequently as in the United States and other common law jurisdictions, but they arrive at the same conclusion. Instead of using case law precedents and engaging in a balancing exercise to separate the expression from the idea, civil law judges and commentators focus on the nature of the copyright interest and the object of copyright law. For instance, Japan's copyright exception for TDM and related uses focuses on whether the unlicensed use of a copyrighted work has affected the enjoyment of "the thoughts or sentiments expressed in" that work. Covering both common law and civil law jurisdictions, Article 1.2 of the EU Directive on the Legal Protection of Computer Programs also declares: "Protection in accordance with this Directive shall apply to the expression in any form of a computer program. Ideas and principles which underlie any element of a computer program, including those which underlie its interfaces, are not protected by copyright under this Directive."
At the global level, Article 9.2 of the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS Agreement) of the World Trade Organization (WTO) states: "Copyright protection shall extend to expressions and not to ideas, procedures, methods of operation or mathematical concepts as such." Originating from Japan's proposal for the provision on computer programs, the TRIPS provision "for the first time in an international agreement provides for a list of uncopyrightable subject matter."251 At the time of writing, more than 160 countries are WTO members and abide by the TRIPS Agreement,252 making the idea-expression distinction virtually universal around the world.253 Subsequent instruments, such as the WIPO Copyright Treaty, have similar formulations.254
The centrality and universality of the idea-expression distinction goes a long way to explaining the convergence demonstrated in Part II with respect to AI training. Consistent with the idea-expression distinction, copyright law should generally allow for substantial latitude for actions that technically amount to copying but fail to communicate an author's original expression to a new audience.255 Such latitude makes especially appropriate an exception for AI training, whether in the form of fair use, an express exception for TDM or computational data analysis, or something else. This approach is well supported by academic commentators. As one of us has advocated, courts should recognize a principle of nonexpressive use to resolve questions relating to copy-reliant technologies.256 Some commentators have gone even further. Oren Bracha argues that nonexpressive uses such as AI training are not cognizable as acts of copying in the first place, and thus there is no need to engage with questions relating to copyright limitations and exceptions.257
Notwithstanding the gravitational pull of the idea-expression distinction, some commentators have questioned whether the unlicensed use of copyrighted works to train AI models falls within the idea side of the idea-expression distinction and whether existing case law provides sufficient precedent to extend prior rulings to generative AI.258 The answers to these questions are further complicated by the fact that the trained AI models may produce outputs that compete with the works of existing copyright holders.259 Indeed, the relationship between the inputs and outputs in AI models has been the key focus of ongoing litigation, which Section IV.A.1 will discuss further.260
B. Global Competition in AI
In the past few years, policymakers, commentators, and the mainstream media have paid growing attention to the technology race between China, the European Union, and the United States. For example, in Digital Empires, Anu Bradford explores the ongoing rivalries between these major geopolitical powers over their varying models of technology regulation and their efforts to export those models.?°! Susan Aaronson outlines the negative implications of policies promoting Al nationalism, such as data protection and localization laws, export controls on semiconductor chips, and subsidization of cloud infrastructure and high-speed computing. Lee Kai-Fu, the former president of Google China, documents China's substantial engagement in the AI space and its active development of Al-driven products and services.
At the time of writing, U.S. companies dominate the Al race. Such dominance has attracted considerable attention from European and U.S. competition authorities.?·· As the European Parliament observes, U.S. companies accounted for about half of the total AI investments in 2023.?% The Artificial Intelligence Index Report 2024 also states: "The United States leads China, the EU, and the U.K. as the leading source of top AI models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European Union's 21 and China's 15.26 Although many factors shape the environment for AI development,' it is not difficult for countries around the world to notice that the United States has a highly hospitable copyright system for machine learning and Al training.
Since the arrival of generative Al, countries with varying geopolitical strengths and global competitiveness have become eager to join the AI race. Their motivation is partly the fear of falling behind and becoming dependent on foreign technology and partly the desire to reap the economic, scientific, and strategic benefits of domestic control of AI? In these circumstances, it is no surprise to find countries eager to emulate the U.S. approach to promote Al training and development. To a large extent, the affordances of the U.S. fair use regime have placed considerable pressure on these countries. Similar sentiments have been expressed in consultation documents released abroad concerning the role of the copyright system in developing the Internet and other digital technologies.?"°
As if the pressure to stay in the Al race were not acute enough, one cannot lose sight of the fact that AI models, once trained, are portable and can therefore be deployed in or accessed from other parts of the world. The territorial nature of copyright law?"! and the realities of АТ training mean that an AI developer can train its model in the United States, taking advantage of its fair use regime, before making that model available globally.?"" Even better, that developer may be able to do so without significant risks of copyright liability. Properly trained, an Al model is separate from the training data, and is therefore not a copy or derivative work of that data.?"· So long as that model consists purely of uncopyrightable abstractions, patterns, relationships, and information gleaned from the training data and does not embody the original expression residing in that data, the model does not infringe copyright-whether in the United States or other jurisdictions.
Moreover, AI developers in jurisdictions with more restrictive copyright laws can use models trained in the United States by either setting up a data pipeline with leading U.S. AI developers or hosting their models in the United States (or doing the same in other jurisdictions with similarly permissive laws). To enhance global competitiveness, these companies may eventually lobby their home governments to undertake copyright reform to provide greater support for AI training. To reduce regulatory arbitrage and to ensure that AI developments stay within the national borders, policymakers in jurisdictions with more restrictive copyright laws may also voluntarily advocate for similar reform.
Another factor that may affect global AI competition is the introduction of laws with extraterritorial effects, whether direct or indirect. A case in point is the EU AI Act, which Section IV.B will discuss further. Recital 21, which is nonoperative but provides interpretative guidance, states explicitly that "the rules established by this Regulation should apply to providers of AI systems in a non-discriminatory manner, irrespective of whether they are established within the Union or in a third country." Recital 106 states further that "[a]ny provider placing a general-purpose AI model on the Union market should comply with this [regulation], regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of those general-purpose AI models take place." There are significant problems with such extraterritorial regulation, however. First, it remains to be seen how well the AI Act will be enforced, both internally and externally. Second, and more importantly, it is unclear whether the training of AI models in an AI-friendly jurisdiction would necessarily constitute an infringing act under copyright law from a choice-of-law standpoint.
C. Race to the Middle
Commentators tend to analyze regulatory arbitrage in terms of a race to the top or the bottom, as opposed to what some have called a "race to the middle." In the intellectual property context, such analysis is more complicated. To begin with, it is difficult to determine when a race is to the top and when to the bottom. As Peter Jaszi reminds us, "one might say that one nation's 'piracy[]' is another man's 'technology transfer.'" The benefits of a specific copyright provision are often in the eyes of the beholder. A very broad copyright exception for AI training marks a race to the top for generative AI developers, but a race to the bottom for traditional copyright-focused industries such as media and publishing.
More importantly, countries do not always engage in a race to either extreme. Many have chosen to compromise or take the middle path. As one of us has shown earlier in an article examining the global transplantation of the U.S. fair use provision, many jurisdictions have struck compromises by adopting hybrid models. Those models allow these jurisdictions to retain part of the status quo while adding new fair use elements that would help the copyright system evolve without undergoing a dramatic paradigm shift.
In recent years, commentators have called for greater scholarly attention to this race to the middle. For instance, William Magnuson discusses how federalism practiced in the United States has enticed states to engage in such a race in the business and corporate governance contexts. As he explains, racing to the middle offers four primary benefits:
First, states benefit from a number of informational effects when they adopt legal regimes that are well established and widely prevalent. Second, corporations and firms have strong interests in seeking out, and locating themselves in, jurisdictions that have familiar legal structures. Third, adopting a widely prevalent legal structure provides a set of interoperability benefits for both corporations and firms. And fourth, the risk of federal intervention is lower when states have legal structures well within the norms of other state behavior.
More specifically in the intellectual property context, several reasons explain why countries chose to retain part of their preexisting copyright system despite their eagerness to transplant the U.S. fair use provision. For example, policymakers and legislators may "push for innovation in the legal system while at the same time demanding the retention of what they consider as the strengths of current law or what they perceive as an important local tradition." In doing so, they seek to "achieve the best of both worlds." These policymakers and legislators may also actively customize the transplanted regime to ensure the appropriateness and effectiveness of the transplanted laws. Finally, interest group politics and legislative inertia may force legislators to strike compromises by adopting hybrid models.
Because a copyright exception for TDM, computational data analysis, or AI training is a subset of a general copyright exception, akin to an open-ended fair use provision, we expect similar dynamics to play out when countries explore whether to introduce a new exception to facilitate AI training. For instance, policymakers and legislators may seek to "achieve the best of both worlds" by introducing a new exception for TDM, computational data analysis, or AI training without disturbing other provisions in the existing copyright system. They may also feel compelled to customize the transplanted exception based on local conditions and legal tradition. In addition, interest group politics, legislative inertia, and the legitimate interests of countervailing constituencies may induce them to strike compromises by taking the middle path.
D. Summary
The three contributing factors identified in this Part explain why copyright laws in the different jurisdictions surveyed in Part II are converging in the area of AI training. Although these jurisdictions have yet to achieve consensus, similar contributing factors could be at work in other jurisdictions, especially considering that the European Union and the United States have already adopted similar exceptions to promote AI training and development. Thus, we predict further global convergence in this area beyond the jurisdictions surveyed in this Article.
IV. UNCERTAINTIES THAT MAY UPSET THE EQUILIBRIUM
The previous Part identified factors that have contributed to the emerging international equilibrium on copyright and AI training. This Part turns to potential uncertainties that might upset this equilibrium in the future. Section A discusses ongoing copyright litigation, partnerships and licensing deals, and legislative and regulatory efforts in the United States. Section B explores developments in the European Union, with a focus on the EU AI Act.
A. United States
1. Ongoing Litigation
As discussed above, U.S. courts have held that the fair use doctrine justified nonexpressive uses by copy-reliant technologies such as reverse engineering, plagiarism detection, book and image search, and digital humanities research across millions of library books. The logical extension of these nonexpressive use cases is that the copying involved in training AI models will, in most ordinary circumstances, amount to fair use. Admittedly, our view is not shared by the plaintiffs in over two dozen individual and class action lawsuits now being adjudicated or pending in U.S. district courts. The plaintiffs include famous and unknown authors and artists, politicians, record labels, image rights aggregators, and news outlets. Collectively, these cases, which we will refer to as "generative AI cases" below, demand an end to AI training without express authorization, billions of dollars in damages, the destruction of AI models, or all or some of the above.
In the generative AI cases filed thus far, courts have already dismissed many of the more speculative liability theories asserted. Other dismissals are likely to follow. However, some generative AI cases may succeed, while others may settle. Because a systematic review of the merits of each lawsuit is beyond the scope of this Article, this section discusses only a few notable complaints.
The generative AI plaintiffs' most promising paths are fact specific. For example, the consolidated cases in Tremblay v. OpenAI, Inc. present a plausible argument under the fourth fair use factor that commercial AI developers undermine the basic incentive structure of copyright by training on sites of known infringement and thus bypassing the market for access without a compelling justification. The Tremblay plaintiffs alleged that OpenAI and other developers had obtained access to over 100,000 books and other works through shadow libraries, such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. This relatively novel argument is bolstered by its resonance with copyright laws in other jurisdictions. In the European Union, for instance, "lawful access" to the relevant copyrighted works is an essential condition under the TDM exceptions in the DSM Directive. Similarly, the exception for computational data analysis in Singapore is subject to both a "lawful access" requirement and a duty to avoid training on infringing copies found on sites of flagrant copyright infringement.
Even though the generative AI plaintiffs almost invariably argue that any unlicensed reproduction of copyrighted works is infringing regardless of the circumstances, recent complaints have devoted substantial attention to demonstrating extensive memorization. Rather than fighting against the weight of authority in the nonexpressive use cases mentioned above, some plaintiffs may be able to demonstrate that the extent of memorization in particular cases is so significant that the AI training in question does not qualify as a nonexpressive use. For example, the complaints in New York Times Co. v. Microsoft Corp. and Concord Music Group, Inc. v. Anthropic PBC are accompanied by impressive examples of memorization demonstrated through the reproduction of specific works. If these plaintiffs can show that the AI models at issue routinely and indiscriminately reproduce works in the training data and that such reproductions are more than theoretically accessible, they will have seriously undermined the defendants' fair use defenses. With the factual record at such an early stage, the outcomes of these cases are hard to predict. Yet even if some cases succeed, we are fairly confident that the generally AI-friendly orientation of the U.S. fair use regime will continue, due partly to nonexpressive use precedents set by cases such as HathiTrust and Google Books, and partly to the reluctance of American judges to throttle this American-led technology or send its development overseas.
2. Partnerships and Licensing Deals
Although commentators have paid considerable attention to litigation in the wake of the launch of ChatGPT and other generative AI tools, the past two years have seen AI developers, such as OpenAI and Google, actively developing partnerships and licensing deals with publishing houses and media conglomerates.
Consider OpenAI, for instance. In December 2023, this dominant generative AI developer announced its partnership with Axel Springer. In addition to authorizing training on copyrighted content, the partnership allows OpenAI to provide "summaries of selected global news content from Axel Springer's media brands," such as Politico and Business Insider. In exchange, OpenAI agrees to include in "ChatGPT's answers to user queries . . . attribution and links to the full articles for transparency and further information." A few months later, OpenAI announced another partnership "with international news organizations Le Monde and Prisa Media to bring French and Spanish news content to ChatGPT."313 In May 2024, OpenAI struck a deal with News Corp. that "could be worth more than $250 million over five years." Most recently, the magazine empire of Condé Nast announced a similar deal with OpenAI, which allows for the use of content from Vogue, The New Yorker, Vanity Fair, and Wired, among others.
While these alliances will help prevent lawsuits from licensing partners, it is possible that they will weaken the position of AI developers in ongoing and future fair use litigation. As noted above, courts have generally rejected circular and hypothetical claims of injury under the fourth fair use factor and have focused on lost licensing revenue "only when the use serves as a substitute for the original." However, in other cases, courts have considered actual and potential injury to licensing markets. In theory, empirical evidence of the existence of a viable market for AI training data could tip the balance of the fourth factor against the defendants in generative AI cases. In reality, however, the partnerships and licensing arrangements between AI developers and media conglomerates of which we are aware have yet to prove the viability of an opt-in model.
To begin with, the partnerships cover terms beyond AI training, such as the provision of summaries of news content and the inclusion of attribution and hyperlinks. Given the confidential terms of the relevant agreements, it is difficult, if not impossible, to determine which portion of the partnerships, if any, should be considered as the fee for licensing copyrighted content for AI training. More importantly, even though the reported licensing deals are worth hundreds of millions of dollars, the content they make available is a drop in the ocean compared to the scale of training data required for the current generation of LLMs. For example, Meta's Llama 3 was trained on over 15 trillion tokens collected from publicly available sources. Assuming that the New York Times print edition is roughly fifty pages per day, each page has 2000 words, and there are 1.3 tokens per word, the newspaper would generate roughly 0.91 million tokens per week. At that rate, it would take about 316,000 years to generate 15 trillion tokens. Even if small models could be trained exclusively on public domain and licensed data, the notion of rights clearance for training the current generation of leading-edge LLMs is a fantasy.
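The back-of-the-envelope estimate above can be checked with a short script. The sketch below is a minimal illustration that uses only the figures stated in the text (fifty pages per day, 2000 words per page, 1.3 tokens per word, and a 15-trillion-token target). The function names and the 365-day year are assumptions made here for illustration; they are not part of the original estimate.

```python
# Rough check of the token arithmetic in the text.
# Inputs are the figures stated in the paragraph above; the function names
# and the 365-day year are illustrative assumptions.

DAYS_PER_WEEK = 7


def tokens_per_week(pages_per_day: float, words_per_page: float,
                    tokens_per_word: float) -> float:
    """Tokens generated by one week of daily print editions."""
    return pages_per_day * words_per_page * tokens_per_word * DAYS_PER_WEEK


def years_to_reach(target_tokens: float, weekly_tokens: float,
                   days_per_year: float = 365) -> float:
    """Years of publishing needed to accumulate target_tokens."""
    daily_tokens = weekly_tokens / DAYS_PER_WEEK
    return target_tokens / daily_tokens / days_per_year


weekly = tokens_per_week(pages_per_day=50, words_per_page=2000,
                         tokens_per_word=1.3)
years = years_to_reach(target_tokens=15e12, weekly_tokens=weekly)

print(f"Tokens per week: {weekly / 1e6:.2f} million")
print(f"Years to reach 15 trillion tokens: {years:,.0f}")
```

Under these assumptions, the script reproduces the weekly figure of roughly 0.91 million tokens and a horizon on the order of 316,000 years. Changing any single input by a factor of ten still leaves the total tens of thousands of years away, which is the point of the comparison.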
The deals with media conglomerates show that some generative AI developers will pay for easy access to significant caches of high-quality training data that would otherwise be inaccessible due to paywalls and machine-readable exclusion headers. A developer that chose to ignore these safeguards would be taking risks under U.S. copyright law and could fall outside the protection of the EU DSM Directive. The content deals discussed above are not just about access and training: they appear to allow AI developers to cross the line from nonexpressive to expressive use. This may be particularly advantageous in the context of AI-enabled search, a use case that combines language models with traditional Internet search and often results in paraphrased answers that may, from a copyright law standpoint, be uncomfortably close to the original text.
Nevertheless, we can see that in the court of public opinion, the tens or hundreds of millions of dollars that AI developers (some valued at billions or trillions of dollars) are paying for content strengthen the argument that copyright holders are entitled to expect licenses for AI training. Accordingly, we view the above arrangements as a potential double-edged sword: they turn enemies into allies and provide a benchmark for future licensing negotiations to settle additional lawsuits, yet they may negatively impact the AI developers' litigation position and general public sentiment.
3. Legislative and Regulatory Efforts
During the Biden Administration, the U.S. Senate Judiciary Committee held ten public hearings on AI-related issues, covering intellectual property, human rights, regulatory issues, governance and oversight, journalism, criminal investigations and prosecutions, and deepfakes during political elections. Although Congress has yet to adopt any legislation addressing copyright law and AI training specifically, it is considering bills that will affect AI development, especially in relation to problems created by deepfakes. Many states have also adopted or considered new legislation to regulate digital replicas. All of this legislation may spill over into copyright law.
The ongoing legislative efforts have recently earned the support of the U.S. Copyright Office. As the Office declared in the first part of its study on copyright and AI, which focuses on digital replicas:
We recommend that Congress establish a federal right that protects all individuals during their lifetimes from the knowing distribution of unauthorized digital replicas. The right should be licensable, subject to guardrails, but not assignable, with effective remedies including monetary damages and injunctive relief. Traditional rules of secondary liability should apply, but with an appropriately conditioned safe harbor for [online service providers]. The law should contain explicit First Amendment accommodations. Finally, in recognition of well-developed state rights of publicity, we recommend against full preemption of state laws.
The Biden Administration also issued the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, which has since been revoked and replaced by a new executive order. Regulatory bodies such as the Federal Trade Commission (FTC) may also take action to address the varied challenges posed by AI developers and their technology. Although the FTC rarely interacts with copyright law (except to address the anti-competitive effects posed by companies with substantial copyright interests), the agency, in winter 2023, filed a somewhat controversial submission with the U.S. Copyright Office in response to the latter's request for comments on copyright and AI. As the Commission declared:
Conduct that may violate the copyright laws - such as training an AI tool on protected expression without the creator's consent or selling output generated from such an AI tool, including by mimicking the creator's writing style, vocal or instrumental performance, or likeness - may . . . constitute an unfair method of competition or an unfair or deceptive practice, especially when the copyright violation deceives consumers, exploits a creator's reputation or diminishes the value of her existing or future works, reveals private information, or otherwise causes substantial injury to consumers.
Practitioners and academic commentators alike have heavily criticized this comment. Although further discussion of the FTC's position and potential regulatory actions is outside the scope of this Article, and we have offered our critiques separately elsewhere, it is worth noting that regulatory efforts outside the intellectual property arena could deeply affect future AI development, including in the copyright area generally and in relation to AI training more specifically. With the arrival of the Trump Administration, some of the positions taken by government agencies under the previous administration have also changed.
B. European Union
The previous section focused on issues relating to litigation, licensing, and legislative and regulatory efforts in the United States. Many of these issues will equally affect the European Union. To avoid repetition, this section concentrates on the EU AI Act, which partially entered into force on August 1, 2024.
The legislative effort concerning the AI Act was launched in April 2021 to address potential risks arising with advances in AI technology. When this effort began, generative AI had yet to enter the regulatory picture. Nor was intellectual property protection the drafters' key focus. Instead, the regulation governs AI technology based on individual and societal risks. Halfway through the deliberation, intellectual property issues began to grab greater policy attention, due in no small part to ChatGPT's effect on public consciousness of generative AI. In the end, the AI Act included two key provisions that will have major impacts on both copyright holders and AI developers: Article 53 and Recital 106 (a nonoperative provision that provides interpretative guidance). These provisions, however, "do[] not apply to AI systems or AI models, including their output, specifically developed and put into service for the sole purpose of scientific research and development."
Article 53 raises two separate issues: transparency and remuneration. Article 53(1)(d) requires AI developers to "draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office." While greater disclosure of training materials can be beneficial, heavy disclosure obligations and cumbersome requirements may stifle future AI development. It remains to be seen how Article 53 applies in individual EU member states and, in particular, whether its application can strike a good balance between copyright protection and AI development.
The second issue concerns remuneration. Article 53(1)(c) requires "[p]roviders of general-purpose AI" to "put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of" the DSM Directive. The referenced provision states that the TDM exception "shall apply on condition that the use of [copyrighted] works and other subject matter . . . has not been expressly reserved by their rightsholders in an appropriate manner." Taken together, these two provisions enable copyright holders to opt out of AI training and demand compensation for the use of their works.
As Section III.B has noted, Recital 106 indicates the drafters' intent for the AI Act to have extraterritorial effects, similar to the General Data Protection Regulation (GDPR).350 While there is no denying the "Brussels effect" generated by the latter, it is unclear whether non-EU countries will feel the same pressure to adopt legislation consistent with the AI Act, especially considering that AI developers in Europe have a limited market share vis-à-vis their U.S. and Chinese counterparts. Policymakers and commentators have already raised concerns about powerful multinational companies pressuring EU regulators with threats to exit the European market. Given the dominance of U.S. AI companies, these regulators may face a tough balancing act: regulating multinational AI companies while keeping them and their technologies in the Union. Moreover, the enforcement and choice-of-law questions identified above may mute the extraterritorial effect of the AI Act. In view of these challenges and complications, we expect the actual extraterritorial effect to be more limited than the regulation suggests.
Finally, some commentators and organizations have called for greater use of collective management to address the unlicensed use of copyrighted works for training AI models. For instance, Martin Senftleben observes: "Providers of AI systems with the potential to serve as a substitute for human creations could be obliged to pay remuneration to collecting societies, which would then use the revenue to support human authors and their creative work." In relation to the potential for equitable remuneration under the EU AI Act and the opt-out arrangement facilitated by the DSM Directive, the European Copyright Society noted the usefulness "to investigate and identify the legal options left to Member States or adopted at the EU level to organise some other forms of appropriate compensation (such as a residual remuneration right or collective remuneration models existing in several Member States ...)." Similar proposals have surfaced across the Atlantic. It remains to be seen what role collecting societies will play in the debate on copyright and AI training, whether collective intermediation or remuneration will provide an expedient solution, and if the ongoing commentaries and proposals will rejuvenate the past discussion of statutory licenses and private levies, both of which were actively explored as solutions to address Internet file-sharing in the early 2000s.358
V. KEY LESSONS
Thus far, we have addressed the emergence of an international equilibrium on copyright and AI training, factors contributing to this emergence, and uncertainties that may upset the equilibrium. In this Part, we derive six key lessons from the multi-country survey of copyright law developments provided in Part II.
First, the survey has shown that countries around the world have actively embraced the use of copyrighted works to train AI models, owing to copyright law's focus on protecting expression, the countries' eagerness to remain globally competitive, and their preference for taking the middle path. Jurisdictions with different legal traditions, economic conditions, technological capabilities, political systems, and cultural backgrounds have found ways to reconcile copyright law and AI training. Although ongoing litigation, licensing deals, and legislative and regulatory efforts have created uncertainties, we predict further global convergence of copyright laws in the area of AI training. Gone is a binary categorical debate about the legality of using copyrighted works to train AI models. Emerging instead is a more granular debate about the specific circumstances in which the unlicensed use of copyrighted works for AI training should be allowed or prohibited. Thus, countries eager to update their copyright laws to facilitate AI training are advised to adopt some relevant exceptions, whether in the form of fair use, an express exception for TDM or computational data analysis, or something else. The more hesitation they have, the less globally competitive they will be.
Second, with copyright exceptions to facilitate AI training already in place in most major geopolitical powers and the emergence of a broader international equilibrium, major international disputes over those copyright exceptions are unlikely. When countries are actively fine-tuning their copyright systems to promote AI training and development, it is difficult to substantiate a violation of an international legal obligation. More importantly, without any significant disagreement between major geopolitical powers in the area of AI training, it is hard to envision any country taking action before an international adjudicatory body, such as the WTO Dispute Settlement Body. Indeed, because of the United States' undisputed leadership in using copyright limitations and exceptions to facilitate AI training and the development of other copy-reliant technologies, the chance of other countries being put on the United States' Section 301 Watch List or Priority Watch List for emulating the U.S. approach is very slim. Countries that have held back copyright reform in the area of AI training due to their fear of WTO litigation or trade retaliation should feel confident to move forward.
Third, many ways exist to introduce copyright limitations and exceptions to facilitate AI training. While increased globalization and the prevalence of international trade and intellectual property agreements have created the expectation of, if not a preference for, one-size-fits-all standards, no such standards exist at the intersection of copyright and AI training. Instead, some countries have embraced open-ended limitations and exceptions, while others have chosen to develop express exceptions for TDM or computational data analysis. The existence of these global variations provides a rare opportunity for policy and academic researchers to evaluate the affordances of different copyright legislation for machine learning and AI training.
Fourth, the continuous evolution of copyright limitations and exceptions for AI training will be affected by developments outside the intellectual property arena in addition to those within, such as copyright litigation, partnership and licensing deals, and legislative and regulatory efforts. Although intellectual property policymakers and commentators have paid considerable attention to the development of the EU AI Act, it is worth remembering that non-intellectual property issues provide the impetus behind the drafting of this regulation. Similarly, much of the emerging U.S. legislation has focused on issues ranging from consumer protection to political communication to national security. Most AI regulatory efforts supported by U.N. agencies and other international and regional bodies have also focused on issues outside the intellectual property domain.
Fifth, because generative AI is still in its nascent stage of development, it is too early to tell what will happen. Although we are comfortable predicting that copyright laws regarding AI training will converge in substance across the globe, we anticipate substantial variation in form. It is equally hard to predict the evolution of AI, including generative AI technology: the capabilities and affordances of this technology in 2025 are a far cry from what they were in 2022. Current trends such as agentification, small language models, and the use of synthetic data may further reduce the perceived conflict between copyright and AI.
Finally, this Article focuses primarily on the use of copyrighted works to train AI models. While this analysis could be extended to cover other areas at the intersection of copyright and AI, the law on the copyrightability of AI-generated works seems to have diverged globally. While the U.S. Copyright Office rejected a number of widely reported applications for registration of copyright in AI-generated works, Chinese courts and the Korea Copyright Commission have extended copyright protection to the same type of work.374 Although the U.S. Copyright Office has recently begun to register copyright in AI-generated works based on selection, coordination, and arrangement, it is too soon to tell whether our prediction of convergence in the area of AI training will extend equally to other issues at the intersection of copyright and AI.
CONCLUSION
It is logical to expect countries with different legal traditions, economic conditions, technological capabilities, political systems, and cultural backgrounds to take diverging approaches to copyright law. In the area of AI training, however, these divergent approaches have not appeared as many have expected. Even though laws in this area remain varied in form, they have converged globally in substance. Thus, despite the world's failure to achieve consensus on copyright and AI training, an international equilibrium has emerged.
Because AI technology will continue to evolve in the near future, sparking further legal, regulatory, technological, and business developments, it remains to be seen whether and how this equilibrium will be maintained. Regardless of the outcome, scrutinizing international copyright law developments in the area of AI training will deepen our understanding of how to better harness the copyright system to advance AI, including generative AI technology. Because some copyright legislation will provide greater affordances for machine learning and AI training than others, policymakers and commentators should pay greater attention to the legislation's relative strengths and weaknesses. Like the design of AI models and training processes, the design of the copyright system can play a very important role in the age of generative AI.
1 See infra text accompanying notes 66-72 (distinguishing between generative AI and text and data mining).
2 See discussion infra Section IV.A.3.
3 See discussion infra Section IV.A.1.
4 See discussion infra Section IV.A.2.
5 See Artificial Intelligence and Intellectual Property, WORLD INTELL. PROP. ORG., https://www.wipo.int/about-ip/en/frontier_technologies/ai_and_ip.html (last visited Mar. 6, 2025).
6 See generally U.N. Sec'y-Gen.'s High-Level Advisory Body on A.I., Governing AI for Humanity: Final Report (Sept. 2024) [hereinafter AI Advisory Body Final Report]; U.N. Sec'y-Gen.'s High-Level Advisory Body on A.I., Governing AI for Humanity: Interim Report (Dec. 2023).
7 See Martin Coulter & Foo Yun Chee, Microsoft's Deal with Mistral AI Faces EU Scrutiny, REUTERS (Feb. 27, 2024, 11:37 AM), https://www.reuters.com/technology/microsofts-deal-with-mistral-ai-faces-euscrutiny-2024-02-27/ (highlighting the anti-competitive concerns of Microsoft's investment in Mistral AI); Emilia David, Microsoft's Mistral Deal Beefs up Azure Without Spurning OpenAI, THE VERGE (Mar. 4, 2024, 12:36 PM), https://www.theverge.com/24087008/microsoft-mistral-openai-azure-europe (explaining how Microsoft's investment in Mistral AI "lets the company become a player in the European AI space by buying into a company that already has a presence in the region").
8 See Alex Hern, UK Regulator Looks at Google's Partnership with Anthropic, THE GUARDIAN (July 30, 2024, 10:53 AM), https://www.theguardian.com/technology/article/2024/jul/30/google-anthropic-partnershipcma-ai (reporting the British Competition and Markets Authority's investigation of Google's partnership with Anthropic); UK Starts Probe into Amazon's AI Partnership with Anthropic, REUTERS (Aug. 8, 2024, 1:29 PM) (reporting a similar investigation of Amazon's partnership with Anthropic).
9 See discussion infra Sections II.A-C.
10 See George C. Christie, Some Key Jurisprudential Issues of the Twenty-First Century, 8 TUL. J. INT'L & COMPAR. L. 217, 218-23 (2000) (discussing the different approaches to judicial interpretation by common law and civil law judges); Graeme B. Dinwoodie, International Intellectual Property Litigation: A Vehicle for Resurgent Comparativist Thought, 49 AM. J. COMPAR. L. 429, 436 (2001) ("[E]ven identical rules of law may lead to different results when applied in different social contexts by different tribunals."); Peter K. Yu, The Harmonization Game: What Basketball Can Teach About Intellectual Property and International Trade, 26 FORDHAM INT'L L.J. 218, 233-34 (2003) ("[F]oreign judges, in particular those who have been trained in civil law countries, tend to interpret laws differently, especially in areas where fundamental philosophical differences are involved." (footnote omitted)).
11 Some industry groups, policymakers, and commentators have referred to such training as "ingestion." See, e.g., THE AUTHORS GUILD, COMMENTS OF THE AUTHORS GUILD: ARTIFICIAL INTELLIGENCE AND COPYRIGHT 15 (2023), https://authorsguild.org/app/uploads/2023/10/Authors-Guild-Comments-AI-andCopyright-October-30-2023.pdf ("[T]raining [large language models], at this stage, requires ingestions of complete works."). However, that term is misleading. In most cases, training data influences the model without becoming part of the model. See PAMELA SAMUELSON, CHRISTOPHER JON SPRIGMAN & MATTHEW SAG, COMMENTS IN RESPONSE TO THE COPYRIGHT OFFICE'S NOTICE OF INQUIRY ON ARTIFICIAL INTELLIGENCE AND COPYRIGHT 7 (2023) [hereinafter SAMUELSON ET AL., USCO COMMENT], https://www.regulations.gov/comment/COLC-2023-0006-8854 [https://perma.cc/EP3E-EWYU].
12 One of us coined the term "nonexpressive use" in a 2009 law review article to describe this phenomenon. Matthew Sag, Copyright and Copy-Reliant Technology, 103 Nw. U. L. REV. 1607, 1608 (2009) [hereinafter Sag, Copy-Reliant Technology].
13 Regulation 2024/1689, 2024 O.J. (L 144) 1 [hereinafter EU AI Act].
14 See PAUL GOLDSTEIN, COPYRIGHT'S HIGHWAY: FROM GUTENBERG TO THE CELESTIAL JUKEBOX 31 (rev. ed. 2003); see also Peter K. Yu, Of Monks, Medieval Scribes, and Middlemen, 2006 MICH. ST. L. REV. 1, 10-14 (discussing the challenges posed by the invention of the printing press).
15 See Sony Comput. Ent., Inc. v. Connectix Corp., 203 F.3d 596, 598-99 (9th Cir. 2000); Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510, 1514 (9th Cir. 1992).
16 See A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 633-34 (4th Cir. 2009).
17 See Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146, 1155-56 (9th Cir. 2007); Kelly v. Arriba Soft Corp., 336 F.3d 811, 815 (9th Cir. 2003).
18 See Authors Guild, Inc. v. Google, Inc., 804 F.3d 202, 207 (2d Cir. 2015).
19 See Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 90 (2d Cir. 2014).
20 Compare 17 U.S.C. § 106(1), with id. § 102(b).
21 See Harry Surden, Artificial Intelligence and Law: An Overview, 35 GA. ST. U. L. REV. 1305, 1307 (2019). See generally STUART J. RUSSELL & PETER NORVIG, ARTIFICIAL INTELLIGENCE: A MODERN APPROACH (4th ed. 2021) (providing an authoritative discussion of the history and definitions of AI).
22 See Surden, supra note 21, at 1307-16.
23 For discussions of machine learning, see generally ETHEM ALPAYDIN, MACHINE LEARNING: THE NEW AI (2016); JOHN D. KELLEHER, DEEP LEARNING (2019).
24 See ERIC J. TOPOL, DEEP MEDICINE: HOW ARTIFICIAL INTELLIGENCE CAN MAKE HEALTHCARE HUMAN AGAIN 117-18 (2019) (discussing the impressive progress in algorithmic image processing); Jonathan Guo & Li Bin, The Application of Medical Artificial Intelligence Technology in Rural Areas of Developing Countries, 2 HEALTH EQUITY 174, 175 (2018) (noting research which showed that systems using deep convolutional neural networks were "able to classify skin cancer at a comparable level to dermatologists" and "could improve the speed, accuracy, and consistency of diagnosis [of breast cancer metastasis in lymph nodes], as well as reduce the false negative rate to a quarter of the rate experienced by human pathologists").
25 See Matthew Sag, Fairness and Fair Use in Generative AI, 92 FORDHAM L. REV. 1887, 1888-89 (2024) [hereinafter Sag, Fairness and Fair Use] (discussing LLMs and generative AI).
26 Hearing on Artificial Intelligence and Intellectual Property-Part II: Copyright Before the Subcomm. on Intell. Prop. of the U.S. Senate Comm. on the Judiciary, 118th Cong. 1 (2023) (footnote omitted) (statement of Matthew Sag, Professor of Law, Emory University School of Law).
27 See Sag, Fairness and Fair Use, supra note 25, at 1889-90 (noting the copyright questions raised by the arrival of "'Generative AI' systems, such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) language models and the Stable Diffusion and Midjourney text-to-image models").
28 See, e.g., Ben Packer, Yoni Halpern, Mario Guajardo-Céspedes & Margaret Mitchell, Text Embedding Models Contain Bias. Here's Why That Matters., GOOGLE FOR DEVELOPERS (Apr. 13, 2018), https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html (noting that natural language processing models exhibit gender stereotypes when trained on news articles).
29 See Christopher A. Mouton, Caleb Lucas & Ella Guest, The Operational Risks of AI in Large-Scale Biological Attacks: Results of a Red-Team Study, RAND (Jan. 25, 2024), https://www.rand.org/pubs/research_reports/RRA2977-2.html (finding that the existing generation of LLMs did not measurably change the operational risk of a biological weapon attack).
30 See Peter K. Yu, Artificial Intelligence, the Law-Machine Interface, and Fair Use Automation, 72 ALA. L. REV. 187, 189 n.8 (2020) (providing sources that discuss the impact of AI and robot lawyers on the legal profession); Fabrizio Dell'Acqua, Edward McFowland III, Ethan Mollick, Hila Lifshitz-Assaf, Katherine C. Kellogg et al., Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality 1 (Harv. Bus. Sch., Working Paper No. 24-013, 2023), https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf (finding that "AI capabilities cover an expanding, but uneven, set of knowledge work . . . [that can] displace human work"). See generally ERIK BRYNJOLFSSON & ANDREW MCAFEE, THE SECOND MACHINE AGE: WORK, PROGRESS, AND PROSPERITY IN A TIME OF BRILLIANT TECHNOLOGIES 138-46 (2014) (examining the transformative impacts of emerging digital technologies on jobs and the economy).
31 See generally STUART RUSSELL, HUMAN COMPATIBLE: ARTIFICIAL INTELLIGENCE AND THE PROBLEM OF CONTROL (2019) (discussing possible incompatibilities between AI and human values).
32 See Yonathan Arbel, Matthew Tokson & Albert Lin, Systemic Regulation of Artificial Intelligence, 56 ARIZ. ST. L.J. 545, 556-70 (2024) (arguing that AI has posed both immediate harms and long-term, existential risks). For a note of skepticism on those long-term, existential risks, see Timnit Gebru & Émile P. Torres, The TESCREAL Bundle: Eugenics and the Promise of Utopia Through Artificial General Intelligence, FIRST MONDAY, Apr. 2024, at 1, https://doi.org/10.5210/fm.v29i4.13636.
33 See generally Sag, Copy-Reliant Technology, supra note 12, at 1616-24, 1639-56 (providing case studies on copy-reliant technologies).
34 Not every generative AI model is an LLM, nor is every LLM confined to text inputs and outputs or dependent on the transformer architecture. The current leading foundation models, such as Google's Gemini, Meta's Llama-3, and OpenAI's GPT-4o, are multimodal models. See GPT-4, OPENAI, https://openai.com/index/gpt-4-research/ (last visited Feb. 6, 2025) ("GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) . . . ."); Introducing Meta Llama 3: The Most Capable Openly Available LLM to Date, META (Apr. 18, 2024), https://ai.meta.com/blog/meta-llama-3/ ("Our goal in the near future is to make Llama 3 multilingual and multimodal . . . .") [hereinafter Introducing Meta Llama 3].
35 See Melissa Heikkilä, OpenAI's Hunger for Data Is Coming Back to Bite It, MIT TECH. REV. (Apr. 19, 2023), https://www.technologyreview.com/2023/04/19/1071789/openais-hunger-for-data-is-coming-back-tobite-it/ ("OpenAI's GPT-2 model had a data set consisting of 40 gigabytes of text. GPT-3, which ChatGPT is based on, was trained on 570 GB of data. OpenAI has not shared how big the data set for its latest model, GPT-4, is."); see also infra text accompanying note 320.
36 See Introducing Meta Llama 3, supra note 34 (noting that Llama 3 training data was filtered through "a series of data-filtering pipelines [which] . . . include using heuristic filters, NSFW [Not Safe for Work] filters, semantic deduplication approaches, and text classifiers to predict data quality"); see also Sag, Fairness and Fair Use, supra note 25, at 1893 & n.28 (discussing the sound technical reasons for using locally stored copies of the training data).
37 See Lukas Selin, Demystifying Tokens in LLMs, TOKES COMPARE, https://tokescompare.io/demystifying-tokens-in-llms/ (last visited Mar. 6, 2025).
38 See Sag, Fairness and Fair Use, supra note 25, at 1893 n.29 ("[T]he reproduction right in § 106(1) is only triggered by the making of a copy or copies of the work and, to qualify as a 'copy' under the relevant definition in § 101, the embodiment of the work must be permanent or stable enough to be perceived, reproduced or communicated; and it must exist in that state for 'more than transitory duration.' . . . But the creation of semipermanent stored copies, which appears to be common practice in training LLMs, clearly does not result in such a temporary or transient copy.").
39 For something to constitute a "copy" of a work under the relevant definition in Section 101 of the Copyright Act, it must embody the work in a form that is permanent or stable enough to be perceived, reproduced, or communicated "either directly or with the aid of a machine or device." 17 U.S.C. § 101. Works transformed into tokens may appear to be unreadable, but the tokenization process can be easily reversed. Thus, these numerical representations can be read "with the aid of a machine or device" and meet the statutory definition of copies.
40 See Sag, Fairness and Fair Use, supra note 25, at 1893 n.28, 1907.
41 See EU AI Act, supra note 13, recital 98 (stating that "models with at least a billion of parameters and trained with a large amount of data using self-supervision at scale should be considered to display significant generality and to competently perform a wide range of distinctive tasks"); Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan et al., Language Models Are Few-Shot Learners 5 (July 22, 2022) (unpublished manuscript), https://arxiv.org/pdf/2005.14165.pdf (referring to GPT-3 as "a 175 billion parameter autoregressive language model").
42 See Sag, Fairness and Fair Use, supra note 25, at 1907.
43 See id.; Brown et al., supra note 41.
44 See Matthew Sag, Copyright Safety for Generative AI, 61 Hous. L. REV. 295, 313-21 (2023) [hereinafter Sag, Copyright Safety] (discussing the development of language models).
45 See Sag, Fairness and Fair Use, supra note 25, at 1907-08.
46 Billy Perrigo, The A to Z of Artificial Intelligence, TIME (Apr. 13, 2023, 1:02 PM), https://time.com/6271657/a-to-z-of-artificial-intelligence/.
47 See Hello GPT-4o, OPENAI (May 13, 2024), https://openai.com/index/hello-gpt-4o. GPT-4o is a model currently hosted by OpenAI and can be accessed through an application programming interface, versions of the ChatGPT app, or the ChatGPT website. See Sean Michael Kerner, GPT-4o Explained: Everything You Need to Know, TECHTARGET (July 19, 2024), https://www.techtarget.com/whatis/feature/GPT-40-explainedEverything-you-need-to-know.
48 See Sag, Fairness and Fair Use, supra note 25, at 1907-08.
49 For discussions of memorization in the generative AI context, see generally A. Feder Cooper & James Grimmelmann, The Files Are in the Computer: On Copyright, Memorization, and Generative AI, 100 CHI.-KENT L. REV. (forthcoming 2025); Sag, Copyright Safety, supra note 44, at 310-13, 326-37.
50 Compare Getting the Models, META, https://www.llama.com/docs/getting_the_models/meta/ (last visited Mar. 6, 2025) ("Llama 3.1: The 405B models require significant storage and computational resources, occupying approximately 750GB of disk storage space and necessitating two nodes on MP16 for inferencing."), with Introducing Meta Llama 3, supra note 34 ("Llama 3 uses a tokenizer with a vocabulary of 128K tokens that encodes language much more efficiently . . . .").
51 See Sag, Fairness and Fair Use, supra note 25, at 1908; see also Sag, Copyright Safety, supra note 44, at 343 ("The legal and ethical imperative is to train models that learn abstract and uncopyrightable latent features of the training data and that do not simply memorize a compressed version of the training data.").
52 For discussions of the latent space involving the training and design of AI systems, see generally BJ Ard, Copyright's Latent Space: Generative AI and the Limits of Fair Use, 110 CORNELL L. REV. 509 (2025); Ian Stenbit, François Chollet & Luke Wood, A Walk Through Latent Space with Stable Diffusion, KERAS (Sept. 28, 2022), https://keras.io/examples/generative/random_walks_with_stable_diffusion/.
53 See Sag, Copyright Safety, supra note 44, at 302. Nor can it be regarded as a derivative work "[i]n the absence of substantial similarity in expression between th[e] inputs and . . . outputs." Pamela Samuelson, Fair Use Defenses in Disruptive Technology Cases, 71 UCLA L. REV. 1484, 1553 (2024).
54 See Sag, Copyright Safety, supra note 44, at 313-25 (discussing this attenuated link).
55 Under the classic formulation in Arnstein v. Porter, substantial similarity is both a quantitative and qualitative inquiry. Arnstein v. Porter, 154 F.2d 464, 472-73 (2d Cir. 1946). The question is whether one work has taken "so much of what is pleasing [to a lay audience] that defendant wrongfully appropriated something which belongs to the plaintiff." Id. at 473.
56 See Tell Me a Bedtime Story About Bear Who Lives in Helsinki Who Is Also a Detective, in a Hard-Boiled Style., CHATGPT, https://chatgpt.com/share/976af48d-9d3f-485a-ad75-e30dc8c239ca (last visited Mar. 6, 2025).
57 DASHIELL HAMMETT, THE MALTESE FALCON (1930).
58 For discussions of copying style in the generative AI context, see generally Sean M. O'Connor, AI Replication of Musical Styles Points the Way to an Exclusive Rights Regime, in RESEARCH HANDBOOK ON INTELLECTUAL PROPERTY AND ARTIFICIAL INTELLIGENCE 565 (Ryan Abbott ed., 2022); Benjamin Sobel, Elements of Style: Copyright, Similarity, and Generative AI, 38 HARV. J.L. & TECH. (forthcoming 2025).
59 See Sag, Copyright Safety, supra note 44, at 310-13; Peter Henderson, Li Xuechen, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley et al., Foundation Models and Fair Use 22 (Mar. 29, 2023) (unpublished manuscript), https://arxiv.org/pdf/2303.15715.pdf.
60 See Sag, Copyright Safety, supra note 44, at 327-36 (explaining the "Snoopy Problem"). Some recent research suggests the possibility of finetuning models to effectively unlearn fictional characters, but the reliability, scalability, and potential drawbacks of such techniques are unclear at present. See, e.g., Ronen Eldan & Mark Russinovich, Who's Harry Potter? Approximate Unlearning in LLMs (Oct. 4, 2023) (unpublished manuscript), https://arxiv.org/abs/2310.02238.
61 See Matthew Sag, The New Legal Landscape for Text Mining and Machine Learning, 66 J. COPYRIGHT Soc'y U.S.A. 291, 295-301 (2019) [hereinafter Sag, New Legal Landscape] (providing examples).
62 See IAN HARGREAVES, DIGITAL OPPORTUNITY: A REVIEW OF INTELLECTUAL PROPERTY AND GROWTH 48 (2011).
63 See Sag, New Legal Landscape, supra note 61, at 312-13.
64 See Sag, Fairness and Fair Use, supra note 25, at 1903-06.
65 See Sag, Copyright Safety, supra note 44, at 307 (comparing logistic regression with and without machine learning).
66 See id. at 309-10 ("[A]lthough LLMs do not generally produce pseudo-expressive works that mimic their training data, they may do so under specific circumstances, particularly in the context of copyrightable characters and analogous situations.").
67 See Sag, Fairness and Fair Use, supra note 25, at 1919-20.
68 17 U.S.C. § 107; see also Sag, Fairness and Fair Use, supra note 25, at 1919-20.
69 A case in point is Japan's copyright exception for a non-enjoyment purpose. CHOSAKUKENHO [Japanese Copyright Act] 1970, art. 30-4 (Japan), https://www.cric.or.jp/english/clj/doc/20210624_law.pdf; see discussion infra Section IV.C.1 (discussing this provision); see also Peter K. Yu, Customizing Fair Use Transplants, LAWS, Mar. 2018, no. 9, at 6 [hereinafter Yu, Customizing Fair Use] (discussing jurisdictions that add the three-step test in their effort to transplant the U.S. fair use provision).
70 Derived from Article 9(2) of the Berne Convention for the Protection of Literary and Artistic Works (Berne Convention), Article 13 of the Agreement on Trade-Related Aspects of Intellectual Property Rights requires members of the World Trade Organization to "confine limitations or exceptions to exclusive rights to [1] certain special cases which [2] do not conflict with a normal exploitation of the work and [3] do not unreasonably prejudice the legitimate interests of the right holder." Berne Convention for the Protection of Literary and Artistic Works art. 9(2), Sept. 9, 1886, 828 U.N.T.S. 221 (last revised at Paris July 24, 1971) [hereinafter Berne Convention]; Agreement on Trade-Related Aspects of Intellectual Property Rights art. 13, Apr. 15, 1994, Marrakesh Agreement Establishing the World Trade Organization, Annex 1C, 1869 U.N.T.S. 299 [hereinafter TRIPS Agreement]; see also Peter K. Yu, The Confuzzling Rhetoric Against New Copyright Exceptions, in 1 KRITIKA: ESSAYS ON INTELLECTUAL PROPERTY 278, 289 (Peter Drahos, Gustavo Ghidini & Hanns Ullrich eds., 2015) (discussing the introduction of the three-step test into domestic copyright legislation). See generally MARTIN SENFTLEBEN, COPYRIGHT, LIMITATIONS, AND THE THREE-STEP TEST: AN ANALYSIS OF THE THREE-STEP TEST IN INTERNATIONAL AND EC COPYRIGHT LAW (2004) (providing a seminal study of the three-step test).
71 TRIPS Agreement, supra note 70, art. 13.
72 See infra text accompanying notes 307-308.
73 See 17 U.S.C. § 107 (codifying fair use).
74 See id. (determining the outcome based on the consideration of four nonexhaustive factors).
75 HARGREAVES, supra note 62, at 44; see also AUSTL. L. REFORM COMM'N, COPYRIGHT AND THE DIGITAL ECONOMY: FINAL REPORT 104-08 (2013) [hereinafter ALRC FINAL REPORT] (discussing how fair use can assist innovation); COPYRIGHT REV. COMM., MODERNISING COPYRIGHT 93 (2013) (Ir.) [hereinafter CRC FINAL REPORT] (noting that the adoption of the proposed fair use doctrine "will send important signals about the nature of the Irish innovation ecosystem, . . . provide the Irish economy with a competitive advantage in Europe, and . . . give Irish law a leadership position in EU copyright debates").
76 See HARGREAVES, supra note 62, at 44 (discussing the benefits of fair use to U.S. technology companies).
77 See Peter K. Yu, Fair Use and Its Global Paradigm Evolution, 2019 U. ILL. L. REV. 111, 128 [hereinafter Yu, Paradigm Evolution] ("Australia, Hong Kong, Ireland, Israel, Liberia, Malaysia, the Philippines, Singapore, South Korea, Sri Lanka, and Taiwan have already adopted or proposed to adopt the fair use regime or its close variants."); see also infra text accompanying note 118.
78 17 U.S.C. § 107. The U.S. fair use doctrine dates back almost 200 years. Folsom v. Marsh, 9 F. Cas. 342 (C.C.D. Mass. 1841) (No. 4901). However, it was not codified until 1976. See generally Matthew Sag, The Prehistory of Fair Use, 76 BROOK. L. REV. 1371 (2011) (tracing the origins of American fair use doctrine back to nineteenth-century English copyright cases on fair abridgment).
79 See Niva Elkin-Koren, The New Frontiers of User Rights, 32 AM. U. INT'L L. REV. 1, 18-19 (2016) (tracing the Israeli fair use doctrine to the 1993 Israeli Supreme Court decision of Geva v. Walt Disney Co.).
80 See § 19, Copyright Act, 5768-2007, LSI 34 (2007) (Isr.).
81 17 U.S.C. § 107.
82 See supra notes 15-19; see also Sag, New Legal Landscape, supra note 61, at 310-29 (discussing these cases). For application to generative AI, see generally Sag, Copyright Safety, supra note 44; Sag, Fairness and Fair Use, supra note 25.
83 See Fox News Network, LLC v. TVEyes, Inc., 883 F.3d 169, 173-74 (2d Cir. 2018); Associated Press v. Meltwater U.S. Holdings, Inc., 931 F. Supp. 2d 537, 541 (S.D.N.Y. 2013).
84 [ISR.] MINISTRY OF JUST., OPINION: USES OF COPYRIGHTED MATERIALS FOR MACHINE LEARNING (2022), https://www.gov.il/BlobFolder/legalinfo/machine-learning/he/18-12-2022.pdf [hereinafter MOJ OPINION]. While the Opinion is not binding on courts, it is expected to weigh heavily on their approach to cases involving alleged infringement through TDM. See Jonathan Band, Israel Ministry of Justice Issues Opinion Supporting the Use of Copyrighted Works for Machine Learning, DISRUPTIVE COMPETITION PROJECT (Jan. 19, 2023), https://project-disco.org/intellectual-property/011823-israel-ministry-of-justice-issues-opinionsupporting-the-use-of-copyrighted-works-for-machine-learning/.
85 See MOJ OPINION, supra note 84, at 9.
86 Id. at 7.
87 See id. at 8 ("Datasets that purposely comprise of a specific type of works (typically for the purpose of producing identical products) might be excluded from the Opinion . . ..").
88 Although the Opinion does not use the term "nonexpressive use," its analysis seems consistent with this principle.
89 1d. ate.
90 17 U.S.C. § 107; see also Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 97 (2d Cir. 2014) (citing Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994)).
91 See 17 U.S.C. § 102(a) (protecting "original works of authorship fixed in any tangible medium of expression").
92 MOJ OPINION, supra note 84, at 18.
93 Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023); see also Pamela Samuelson, Justifications for Fair Uses, 2025 WIS. L. REV. (forthcoming) (arguing that Andy Warhol Foundation has preserved the flexible and well-balanced standards for assessing fair use defenses that the Court first established in Campbell).
94 Campbell, 510 U.S. 569.
95 Andy Warhol Found., 598 U.S. at 510.
96 Id. at 571 (Kagan, J., dissenting) (quoting Campbell, 510 U.S. at 579).
97 See MOJ OPINION, supra note 84, at 10.
98 Id. at 19.
99 See id.
100 See SAMUELSON ET AL., USCO COMMENT, supra note 11, at 15; see also Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 98 (2d Cir. 2014) (holding that the second fair-use factor "may be of limited usefulness where . . . the creative work is being used for a transformative purpose" (quoting Cariou v. Prince, 714 F.3d 694, 710 (2d Cir. 2013)) (internal quotation marks omitted)).
101 See Sag, Fairness and Fair Use, supra note 25, at 1913.
102 See MOJ OPINION, supra note 84, at 19 (citing Israeli authorities and U.S. academic studies).
103 17 U.S.C. § 107.
104 See Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 586-87 (1994) ("[T]he extent of permissible copying varies with the purpose and character of the use.").
105 See MOJ OPINION, supra note 84, at 20.
106 See, e.g., Authors Guild, Inc. v. Google, Inc., 804 F.3d 202, 221 (2d Cir. 2015) ("Complete unchanged copying has repeatedly been found justified as fair use when the copying was reasonably appropriate to achieve the copier's transformative purpose and was done in such a manner that it did not offer a competing substitute for the original."); Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 98 (2d Cir. 2014) ("In order to enable the full-text search function, the [defendant] Libraries . . . created digital copies of all the books in their collections. Because it was reasonably necessary for the [HathiTrust Digital Library] to make use of the entirety of the works in order to enable the full-text search function, we do not believe the copying was excessive.").
107 17 U.S.C. § 107.
108 See id. § 102(b).
109 See Campbell, 510 U.S. at 591-92; see also A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 644 (4th Cir. 2009) ("Clearly no market substitute was created by iParadigms, whose archived student works do not supplant the plaintiffs' works in the 'paper mill' market so much as merely suppress demand for them, by keeping record of the fact that such works had been previously submitted . . . . In our view, then, any harm here is not of the kind protected against by copyright law.").
110 See, e.g., Cambridge Univ. Press v. Patton, 769 F.3d 1232, 1265 (11th Cir. 2014) (noting that "the reasoning is somewhat circular" when the failure to pay a potential licensing fee is used to disprove fair use).
111 See MOJ OPINION, supra note 84, at 20-22.
112 See Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 100 (2d Cir. 2014) ("Lost licensing revenue counts under Factor Four only when the use serves as a substitute for the original and the full-text-search use does not.").
113 See Authors Guild v. Google, Inc., 804 F.3d 202, 223 (2d Cir. 2015) (framing the question as "whether the copy brings to the marketplace a competing substitute for the original, or its derivative, so as to deprive the rights holder of significant revenues because of the likelihood that potential purchasers may opt to acquire the copy in preference to the original").
114 See Sag, Fairness and Fair Use, supra note 25, at 1916-21.
115 See id.
116 We code this as "unclear" in Table 2.
117 MOJ OPINION, supra note 84, at 39.
118 Yu, Paradigm Evolution, supra note 77, at 115-16. Despite technically retaining a "fair dealing" regime, Canada could be added to this club. Fair dealing regimes are typically regarded as inflexible and unresponsive to technological change. See id. at 126-27. Nevertheless, over the past two decades, Canadian "fair dealing" jurisprudence has embraced users' rights and become indistinguishable from the U.S. fair use regime in practice. For discussions of these similarities, see generally Michael Geist, Fairness Found: How Canada Quietly Shifted from Fair Dealing to Fair Use, in THE COPYRIGHT PENTALOGY: HOW THE SUPREME COURT OF CANADA SHOOK THE FOUNDATIONS OF CANADIAN COPYRIGHT LAW 157, 176 (Michael Geist ed., 2013) [hereinafter COPYRIGHT PENTALOGY]; Ariel Katz, Fair Use 2.0: The Rebirth of Fair Dealing in Canada, in COPYRIGHT PENTALOGY, supra, at 93, 95. See generally David Vaver, User Rights: Fair Use and Beyond, 69 J. COPYRIGHT SOC'Y U.S.A. 337 (2022) (discussing users' rights in Canada). Like Canada, Malaysia has a fair dealing regime that functions like a fair use regime. See Yu, Customizing Fair Use, supra note 69, at 5-7.
119 See discussion infra Section II.B.
120 See ALRC FINAL REPORT, supra note 75, at 115 ("The opponents of fair use have pointed to research indicating that the outcome of fair use cases is unpredictable."); Matthew Sag, Predicting Fair Use, 73 OHIO ST. L.J. 47, 48 n.1 (2012) (citing sources claiming that fair use is unpredictable). For contrary views, see generally Michael J. Madison, A Pattern-Oriented Approach to Fair Use, 45 WM. & MARY L. REV. 1525 (2004) (advancing a pattern-oriented approach to fair use decisions); Sag, Predicting Fair Use, supra (empirically assessing the predictability of fair use outcomes in litigation); Pamela Samuelson, Unbundling Fair Uses, 77 FORDHAM L. REV. 2537 (2009) (identifying common "policy-relevant clusters" of fair use cases that lend the doctrine coherence).
121 See Yu, Paradigm Evolution, supra note 77, at 128 ("Australia, Hong Kong, Ireland, Israel, Liberia, Malaysia, the Philippines, Singapore, South Korea, Sri Lanka, and Taiwan have already adopted or proposed to adopt the fair use regime or its close variants.").
122 Australia, Hong Kong, and Ireland seriously considered adopting a fair use standard but were unable to do so. See ALRC FINAL REPORT, supra note 75, at 123-60 (recommending the introduction of a fair use exception); CRC FINAL REPORT, supra note 75, at 93-94 (recommending the introduction of the fair use exception as a new Section 49A of the Irish Copyright and Related Rights Act); Legislative Council, Amendments to Be Moved by the Honourable CHAN Kam-Lam, SBS, JP 4 (2015) (H.K.), http://www.legco.gov.hk/yr15-16/english/counmtg/papers/cm20151209cb3-219-e.pdf (LC Paper No. CB(3) 219/15-16) (providing the text of the fair use proposal that was tabled for legislative debate in Hong Kong).
123 See supra text accompanying note 75.
124 See Yu, Paradigm Evolution, supra note 77, at 125 (noting that fair dealing regimes in the United Kingdom and many Commonwealth jurisdictions "promote[] a closed system of copyright limitations and exceptions"); see also Peter K. Yu, The Quest for a User-Friendly Copyright Regime in Hong Kong, 32 AM. U. INT'L L. REV. 283, 327 (2016) [hereinafter Yu, User-Friendly Copyright Regime] ("[A] better way to distinguish between fair dealing and fair use is to describe the former as a closed-ended, purpose-based regime and the latter as an open-ended, flexible regime.").
125 See Yu, User-Friendly Copyright Regime, supra note 124, at 327 (describing close-ended fair dealing regimes as "purpose-based").
126 See, e.g., Copyright Ordinance, (1997) Cap. 528, §§ 38, 41A, 54A (H.K.) (incorporating the fairness factors); Hubbard v. Vosper, [1972] 2 QB 84 (Eng.) (defining fair dealing by identifying factors that resemble those found in the U.S. fair use provision); see also Giuseppina D'Agostino, Healing Fair Dealing? A Comparative Copyright Analysis of Canada's Fair Dealing to U.K. Fair Dealing and U.S. Fair Use, 53 MCGILL L.J. 309, 342-43 (2008) (extracting from English copyright law the following fairness factors: nature of the work, how the work was obtained, amount taken, uses made, commercial benefit, motives for the dealing, consequences of the dealing, and purpose achieved by different means); Yu, User-Friendly Copyright Regime, supra note 124, at 323 ("[B]ecause of the common law tradition in those Commonwealth jurisdictions embracing the fair dealing model, the use of fairness factors often emerge through case law even when those factors have not been written into the statutory provisions.").
127 TRIPS Agreement, supra note 70, art. 13; see also Japanese Copyright Act art. 30-4 (including a general limitation that the use must not "unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation").
128 Geist, supra note 118, at 158.
129 See generally Tatsuhiro Ueno, The Flexible Copyright Exception for "Non-Enjoyment" Purposes, 70 GRUR INT'L 145 (2021) (discussing this provision and the 2018 amendment).
130 Japanese Copyright Act art. 30-4; see also He Tianxiang, Copyright Exceptions Reform and AI Data Analysis in China: A Modest Proposal, in ARTIFICIAL INTELLIGENCE AND INTELLECTUAL PROPERTY 196, 209-11 (Lee Jyh-An, Reto Hilty & Liu Kung-Chung eds., 2021) (comparing the old Articles 30-4 and 47-7 with the amended Article 30-4).
131 Japanese Copyright Act art. 30-4.
132 Id.
133 See Ueno, supra note 129, at 152 ("There seems to be a certain similarity between the concept of 'Freier Werkgenuss' and the theory behind Art. 30-4 of the Japanese Copyright Act . . . ."). In Germany, Section 24(1) of the Copyright Act provided that "[a]n independent work created in the free use of the work of another person may be published and exploited without the consent of the author of the work used." Urheberrechtsgesetz [UrhG] [Copyright Act], § 24(1) (repealed 2021) (Ger.). This provision has since been repealed. See Case C-476/17, Pelham GmbH v. Hütter, ECLI:EU:C:2019:624, ¶¶ 56-65 (July 29, 2019) (finding that the provision was inconsistent with Article 5 of the EU Directive on the Harmonisation of Certain Aspects of Copyright and Related Rights in the Information Society (InfoSoc Directive)).
134 Japanese Copyright Act art. 30-4.
135 Id.
136 Id. Tables 1 and 2, and the footnotes therein, address the additional details relating to this provision.
137 See supra text accompanying notes 69-71.
138 Japanese Copyright Act art. 47-5.
139 Ueno, supra note 129, at 149.
140 COUNCIL FOR CULTURAL AFFS., COPYRIGHT DIV., SUBCOMM. ON LEGAL SYS., GENERAL UNDERSTANDING ON AI AND COPYRIGHT IN JAPAN [AI と著作権に関する考え方について] (2024), available in Japanese at https://www.bunka.go.jp/seisaku/bunkashingikai/chosakuken/hoseido/r05_07/pdf/94024201_01.pdf; see also JAPAN COPYRIGHT OFF., GENERAL UNDERSTANDING ON AI AND COPYRIGHT IN JAPAN: OVERVIEW (2024), https://www.bunka.go.jp/english/policy/copyright/pdf/94055801_01.pdf (providing an overview of this report); Kenji Tosaki, Hiroki Tajima & Chie Komiya, Report on AI and Copyright Issues by Japanese Government, NAGASHIMA OHNO & TSUNEMATSU (Apr. 2024), https://www.noandt.com/en/publications/publication20240325-3 (discussing the report).
141 See JAPAN COPYRIGHT OFF., supra note 140, at 11.
142 See id. at 10.
143 HARGREAVES, supra note 62.
144 See id. at 48.
145 The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014, SI 2014/1372, art. 3(2) (UK).
146 Copyright, Designs and Patents Act 1988, c. 48, § 29A(1)(a) (UK).
147 UK copyright law does not allow for commercial TDM. However, commercialization of research outputs is permitted where the "original purpose of carrying out the text and data mining analysis is solely noncommercial." UK INTELL. PROP. OFF., EXCEPTIONS TO COPYRIGHT: RESEARCH 10 (2014), https://assets.publishing.service.gov.uk/media/5a7d678ee5274a02dcdf4502/Research.pdf.
148 Copyright, Designs and Patents Act 1988, c. 48, § 29A(1)(b), (2), (3) (UK). Both the person copying the copyrighted work and the person performing the TDM must have lawful access to the work. See id. § 29A(1).
149 See, e.g., id. § 29(1) (providing a fair dealing exception for research and private study); id. § 30(1), (2) (providing a fair dealing exception for criticism, review, quotation, and news reporting).
150 See id. § 29A(5).
151 See Alina Trapova & João Pedro Quintais, The UK Government Moves Forward with a Text and Data Mining Exception for All Purposes, KLUWER COPYRIGHT BLOG (Aug. 24, 2022), https://copyrightblog.kluweriplaw.com/2022/08/24/the-uk-government-moves-forward-with-a-text-and-datamining-exception-for-all-purposes.
152 See UK Withdraws Plans for Broader Text and Data Mining (TDM) Copyright and Database Right Exception, HERBERT SMITH FREEHILLS (Mar. 1, 2023), https://www.herbertsmithfreehills.com/notes/ip/202303/uk-withdraws-plans-for-broader-text-and-data-mining-tdm-copyright-and-database-right-exception.
153 Directive 2019/790, 2019 O.J. (L 130) 92 [hereinafter DSM Directive].
154 Other more controversial issues included Articles 15 and 17 of the Directive. See id. art. 15 (offering "[p]rotection of press publications concerning online uses"); id. art. 17 (regulating "[u]se of protected content by online content-sharing service providers," including filtering and licensing obligations).
155 Id. arts. 3, 4; see also João Pedro Quintais, What is a "Research Organisation" and Why it Matters: From Text and Data Mining to AI Research, 74 GRUR INT'L 397 (2025) (discussing the concept of "research organization" in the context of the DSM Directive). EU Directives are intended to take effect through national implementing legislation. Once the deadline of national implementation has passed, other EU members can challenge noncompliance with these directives before the Court of Justice of the European Union. The deadline for implementing the DSM Directive was June 7, 2021. DSM Directive, supra note 153, art. 29(1).
156 DSM Directive, supra note 153, art. 3.
157 The DSM Directive defines "research organisation" and "cultural heritage institution" to exclude primarily profit-motivated entities. Id. art. 2(1), (3).
158 This coverage was fairly clear from the definition of TDM in Article 2(2), which speaks to "any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations." Id. art. 2(2). It was made even clearer by the express linkage of Article 4(3) of the DSM Directive to the training of general-purpose AI models in Article 53(1)(c) of the EU AI Act. See id. art. 4(3); EU AI Act, supra note 13, art. 53(1)(c); see also infra text accompanying notes 347-349.
159 Landgericht Hamburg, Sept. 27, 2024, 310 O 227/23 (Ger.), https://pdfupload.io/docs/4bcc432c; see also Unofficial English Translation by ChatGPT, CHATGPT IS EATING THE WORLD (Sept. 30, 2024), https://chatgptiseatingtheworld.com/2024/09/28/unofficial-english-translation-of-german-courts-decisionkneschke-v-laion-under-tdm-exception/ (providing an unofficial English translation). In this case, the plaintiff photographer filed a copyright claim against the LAION (Large-Scale Artificial Intelligence Open Network) image database for making unlicensed use of his copyrighted works for training AI models. See id. The Hamburg Regional Court dismissed the case under Section 60d of the German Copyright Act (Urheberrechtsgesetz), which implemented Article 3 of the DSM Directive. Id.
160 DSM Directive, supra note 153, art. 3(1).
161 See id. art. 7(1).
162 Id. art. 3(3).
163 Article 4 is not listed in Article 7, which prohibits the contractual override of select provisions of the DSM Directive. Article 4 does not expressly address the issue of TPM, as the circumvention of these measures is subject to the EU InfoSoc Directive. See Directive 2001/29, art. 6(1), 2001 O.J. (L 167) 10 [hereinafter InfoSoc Directive].
164 DSM Directive, supra note 153, art. 4(3).
165 See Paul Keller & Zuzanna Warso, Defining Best Practices for Opting Out of ML Training, 5, OPEN FUTURE POL'Y BRIEF (Sept. 29, 2023), https://openfuture.eu/wp-content/uploads/2023/09/Best-practices_for_optout_ML_training.pdf (noting that "it is currently unclear how opt-outs from [machine learning] training based on the machine-readable reservation of rights provided for in Article 4 will work in practice, as there are currently no generally recognized standards or protocols for the machine-readable expression of the reservation").
166 DSM Directive, supra note 153, art. 3(2).
167 Id. art. 4(2).
168 See id. art. 7(2) ("Article 5(5) of Directive 2001/29/EC shall apply to the exceptions and limitations provided for under this Title.").
169 See supra text accompanying notes 69-71.
170 InfoSoc Directive, supra note 163, art. 5(5).
171 Copyright Act 2021 [Singapore Copyright Act] (2020 Rev Ed) § 244(1)-(2)(d), (3) (Sing.).
172 Id. § 243; see also MINISTRY OF L. & INTELL. PROP. OFF. OF SING., SINGAPORE COPYRIGHT REVIEW REPORT 32-34 (2019). Section 244 of the Singapore Copyright Act also provides for the limited communication of such works to the public for the purposes of verification or to enable "collaborative research or study relating to the purpose" of the original computational data analysis. Singapore Copyright Act § 244(2)(c)(ii). The Act further imposes an obligation to identify authors in such communication and confers other similar moral rights. Id. §§ 369-407.
173 Singapore Copyright Act § 243.
174 Id. § 244(2)(d).
175 Id. § 244(2)(e)(ii)(B). The Act defines a "flagrantly infringing online location" as "an online location that has been or is being used to flagrantly commit or facilitate rights infringements." Id. § 99(1). It also provides seven factors for determining whether an online location is flagrantly infringing. Id. § 99(2).
176 See id. § 425. Any work accessed through TPM circumvention would be unlawfully accessed and would therefore be excluded from Section 244's permissions.
177 See id. § 187(c). Evasion of this provision through choice of law is also prohibited. Id. § 188.
178 See discussion supra Section II.A.2.
179 See Peter K. Yu, The Future Path of Artificial Intelligence and Copyright Law in the Asian Pacific, 33 MICH. ST. INT'L L. REV. (forthcoming 2025) [hereinafter Yu, Future Path].
180 See generally ANU BRADFORD, DIGITAL EMPIRES: THE GLOBAL BATTLE TO REGULATE TECHNOLOGY 69-104 (2023) [hereinafter BRADFORD, DIGITAL EMPIRES] (exploring the ongoing rivalries between China, the European Union, and the United States over their varying models of technology regulation and their efforts to export those models).
181 See STANFORD UNIV., INST. FOR HUMAN-CENTERED A.I., ARTIFICIAL INTELLIGENCE INDEX REPORT 2024, at 19 (2024) [hereinafter AI INDEX 2024] ("In 2013, China's installations [of industrial robots] accounted for 20.8% of the global total, a share that rose to 52.4% by 2022.").
182 See EUR. PARLIAMENTARY RSCH. SERV., AI INVESTMENT: EU AND GLOBAL INDICATORS (2024), https://www.europarl.europa.eu/RegData/etudes/ATAG/2024/760392/EPRS_ATA(2024)760392_EN.pdf (noting that "[t]he US is leading private investment in AI (€62.5 billion) in 2023, followed by China (€7.3 billion)"); Paul Triolo & Kendra Schaefer, China's Generative AI Ecosystem in 2024: Rising Investment and Expectations, NAT'L BUREAU OF ASIAN RSCH. (June 27, 2024), https://www.nbr.org/publication/chinasgenerative-ai-ecosystem-in-2024-rising-investment-and-expectations/ (discussing China's environment for AI development).
183 See generally LEE KAI-FU, AI SUPERPOWERS: CHINA, SILICON VALLEY, AND THE NEW WORLD ORDER (2018) (documenting China's substantial engagement in the AI space and its active development of AI-driven products and services).
184 Guowuyuan Guanyu Yinfa Xinyidai Rengong Zhineng Fazhan Guihua De Tongzhi, Guofa [2017] Sanshiwu Hao (国务院关于印发新一代人工智能发展规划的通知, 国发 [2017] 35号) [Notice of the Next Generation Artificial Intelligence Development Plan, Notice No. 35 [2017]] (issued by the State Council, July 20, 2017).
185 See generally BRADFORD, DIGITAL EMPIRES, supra note 180, at 69-104 (discussing China's state-driven regulatory model). For discussions of China's Belt and Road Initiative in the intellectual property context, see generally Lee Jyh-An, The New Silk Road to Global IP Landscape, in LEGAL DIMENSIONS OF CHINA'S BELT AND ROAD INITIATIVE 417 (Lutz-Christian Wolff & Xi Chao eds., 2016); Peter K. Yu, Building Intellectual Property Infrastructure Along China's Belt and Road, 14 U. PA. ASIAN L. REV. 281 (2019); Peter K. Yu, China, "Belt and Road" and Intellectual Property Cooperation, 14 GLOB. TRADE & CUSTOMS J. 244 (2019); Zhang Hongzhou & Shaleen Khanal, To Win the Great AI Race, China Turns to Southeast Asia, ASIA POL'Y, Jan. 2024, at 21.
186 AI INDEX 2024, supra note 181, at 14, 19.
187 WORLD INTELL. PROP. ORG., GENERATIVE ARTIFICIAL INTELLIGENCE: PATENT LANDSCAPE REPORT 8 (2024).
188 Id.
189 See, e.g., David Goldman & Matt Egan, A Shocking Chinese AI Advancement Called DeepSeek Is Sending US Stocks Plunging, CNN (Jan. 27, 2025, 4:21 PM), https://www.cnn.com/2025/01/27/tech/deepseekstocks-ai-china/index.html (discussing the disruption caused by DeepSeek); Liu Tongliang, DeepSeek: How a Small Chinese AI Company Is Shaking up US Tech Heavyweights, THE CONVERSATION (Jan. 28, 2025, 1:13 AM), https://theconversation.com/deepseek-how-a-small-chinese-ai-company-is-shaking-up-us-techheavyweights-248434 (discussing the challenges DeepSeek has posed to its technology competitors in the United States and other parts of the world).
190 Zhonghua Renmin Gongheguo Zhuzuoquan Fa (中华人民共和国著作权法) [Copyright Law of the People's Republic of China] [2020 Chinese Copyright Law] (promulgated by the Standing Comm. Nat'l People's Cong., Sept. 7, 1990, amended Nov. 11, 2020, effective June 1, 2021), http://www.npc.gov.cn/englishnpc/c23934/202109/ac0f0804894b4f71949016957eec45a3.shtml.
191 For discussions of the Third Amendment, see generally Peter K. Yu, The Long and Winding Road to Effective Copyright Protection in China, 49 PEPP. L. REV. 681 (2022) [hereinafter Yu, Long and Winding Road]; Peter K. Yu, Third Amendment to the Chinese Copyright Law, 69 J. COPYRIGHT SOC'Y U.S.A. 5 (2022). See generally Symposium, Third Amendment to the Chinese Copyright Law, 69 J. COPYRIGHT SOC'Y U.S.A. 1 (2022) (collecting essays that closely examine this amendment).
192 2020 Chinese Copyright Law art. 24.
193 Zhonghua Renmin Gongheguo Zhuzuoquan Fa (中华人民共和国著作权法) [Copyright Law of the People's Republic of China] (promulgated by the Standing Comm. Nat'l People's Cong., Sept. 7, 1990, amended Oct. 27, 2001, effective Nov. 1, 2001), art. 22, http://www.asianlii.org/cn/legis/cen/laws/cloproc372/.
194 2020 Chinese Copyright Law art. 24(13). For discussions of this provision, see generally He Tianxiang, The Copyright Limitations of the 2020 Copyright Law of China: A Satisfactory Compromise?, 69 J. COPYRIGHT SOC'Y U.S.A. 107 (2022); Hua (Jerry) Jie, Copyright Exceptions for Text and Data Mining in China: Inspiration from Transformative Use, 69 J. COPYRIGHT SOC'Y U.S.A. 123 (2022).
195 See Yu, Long and Winding Road, supra note 191, at 721 (noting "the drafting of the pending implementing regulations").
196 See discussion supra Section III.C.
197 Shengcheng Shi Rengong Zhineng Fuwu Guanli Zhanxing Banfa (生成式人工智能服务管理暂行办法) [Interim Measures for the Management of Generative Artificial Intelligence Services] (promulgated by the Cyberspace Admin. of China, July 10, 2024, effective Aug. 15, 2024), https://www.chinalawtranslate.com/en/generative-ai-interim [hereinafter Interim Measures].
198 EU AI Act, supra note 13.
199 Interim Measures, supra note 197, arts. 4, 7.
200 Id. art. 4.
201 Id. art. 7.
202 For discussions of Chinese technology regulation in the area of generative AI, see generally ZENG JINGHAN, ARTIFICIAL INTELLIGENCE WITH CHINESE CHARACTERISTICS: NATIONAL STRATEGY, SECURITY AND AUTHORITARIAN GOVERNANCE (2022); ANGELA HUYUE ZHANG, HIGH WIRE: HOW CHINA REGULATES BIG TECH AND GOVERNS ITS ECONOMY 277-91 (2024); Cheng Jing & Zeng Jinghan, Shaping AI's Future? China in Global AI Governance, 32 J. CONTEMP. CHINA 794 (2023); Angela Huyue Zhang, The Promise and Perils of China's Regulation of Artificial Intelligence, 63 COLUM. J. TRANSNAT'L L. 1 (2025); Matt Sheehan, China's AI Regulations and How They Get Made, CARNEGIE ENDOWMENT FOR INT'L PEACE (July 20, 2023), https://carnegieendowment.org/research/2023/07/chinas-ai-regulations-and-how-they-get-made?lang=en; Triolo & Schaefer, supra note 182.
203 See Susan Ariel Aaronson, The Age of AI Nationalism and Its Effects (Ctr. for Int'l Governance Innovation, Paper No. 306, 2024).
204 See Maureen Farrell & Rob Copeland, Saudi Arabia Plans $40 Billion Push into Artificial Intelligence, N.Y. TIMES, Mar. 19, 2024, at B1.
205 For discussions of intellectual property protection and Shari'a, see generally Tabrez Ebrahim, Islamic Intellectual Property, 54 SETON HALL L. REV. 991 (2024); Tabrez Y. Ebrahim, Intellectual Property Through a Non-Western Lens: Patents in Islamic Law, 37 GA. ST. U. L. REV. 789 (2021).
206 (UAE) NAT'L PROG. FOR AI, UAE NATIONAL STRATEGY FOR ARTIFICIAL INTELLIGENCE 2031, at 7 (2018).
207 Press Release, Tech. Innovation Inst., UAE's Technology Innovation Institute Launches Open-Source "Falcon 40B" Large Language Model for Research & Commercial Utilization (May 25, 2023), https://www.tii.ac/news/uaes-technology-innovation-institute-launches-open-source-falcon-40b-largelanguage-model; see also Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru et al., The Falcon Series of Open Language Models (Nov. 28, 2023) (unpublished manuscript), https://arxiv.org/abs/2311.16867 (providing a technical report on Falcon models).
208 See Quentin Malartic, Nilabhra Roy Chowdhury, Ruxandra Cojocaru, Mugariya Farooq, Giulia Campesan et al., Falcon2-11B Technical Report (July 20, 2024) (unpublished manuscript), https://arxiv.org/abs/2407.14885 (providing a technical report on Falcon 2 models).
209 Federal Decree-Law No. (38) of 2021 on Copyrights and Neighboring Rights, https://www.moec.gov.ae/documents/20121/376326/copyright.pdf/1b4d5d16-8e3c-6012-afa8-56cd4eb008da.
210 Id. § 22(1).
211 See supra text accompanying notes 134-137.
212 Id.; see also supra text accompanying notes 69-71.
213 But cf. UK INTELL. PROP. OFF., supra note 147 ("[O]riginal purpose of carrying out the text and data mining analysis is solely non-commercial.").
214 See Adam Satariano & Paul Mozur, The Global Race to Control A.I., N.Y. TIMES (Aug. 14, 2024), https://www.nytimes.com/2024/08/14/briefing/ai-china-us-technology.html ("In Saudi Arabia, Crown Prince Mohammed bin Salman is pouring billions into A.I. development and striking deals with companies like Amazon, I.B.M. and Microsoft to make his country a major new hub.").
215 Affordances are defined as "relational properties of things in the environment whose meaning or significance is derived from their service to a given agent's needs or capabilities." Jake Goldenfein, Deirdre K. Mulligan, Helen Nissenbaum & Wendy Ju, Through the Handoff Lens: Competing Visions of Autonomous Futures, 35 BERKELEY TECH. L.J. 835, 846 (2020) (citing JAMES J. GIBSON, THE ECOLOGICAL APPROACH TO VISUAL PERCEPTION 127-28 (1979)).
216 DSM Directive, supra note 153, art. 4(3); see also supra text accompanying notes 163, 348-349.
217 DSM Directive, supra note 153, arts. 3, 4.
218 Id.
219 Council Directive 96/9, art. 7(1), 1996 O.J. (L 77) 20 (EC).
220 Singapore Copyright Act § 244(4); see also supra note 172.
221 See Peter K. Yu, Anticircumvention and Anti-Anticircumvention, 84 DENV. U. L. REV. 13, 34 n.99 (2006) ("A self-executing treaty is one that can be enforced in courts without prior implementing legislation. In jurisdictions where [treaties] are self-executing, courts will directly apply the treaties as if they are domestic laws").
222 See, e.g., Murray v. The Schooner Charming Betsy, 6 U.S. (2 Cranch) 64, 118 (1804) ("An act of Congress ought never to be construed to violate the law of nations, if any other possible construction remains.").
223 Copyright, Designs and Patents Act 1988, c. 48, § 29A (UK).
224 See Eleonora Rosati, No Step-Free Copyright Exceptions: The Role of the Three-Step in Defining Permitted Uses of Protected Content (Including TDM for AI-Training Purposes), 46 EUR. INTELL. PROP. REV. 262, 271 (2024) ("[L]ike most UK defences, s.29A CDPA is framed within fair dealing. Hence, a court tasked with determining whether the provision is applicable in the circumstances at hand will need to determine if the relevant conditions are satisfied, including having regard to the fairness of the dealing at hand.").
225 Singapore Copyright Act § 244. Accord Rosati, supra note 224, at 271 (holding a similar view).
226 17 U.S.C. § 107; § 19, Copyright Act, 5768-2007, LSI 34 (2007) (Isr.); see also supra text accompanying notes 69-71.
227 See JAPAN COPYRIGHT OFF., supra note 140, at 11; see also supra text accompanying notes 140-141.
228 Copyright, Designs and Patents Act 1988, c. 48, § 29(5) (UK); see also supra text accompanying note 150.
229 See Sag, Copy-Reliant Technology, supra note 12, at 1675.
230 § 19, Copyright Act, 5768-2007, LSI 34 (2007) (Isr.).
231 Copyright, Designs and Patents Act 1988, c. 48, § 29A(1)(b) (UK).
232 DSM Directive, supra note 153, art. 3(2).
233 Id. art. 3(3).
234 Authors Guild v. HathiTrust, 755 F.3d 87, 100-01 (2d Cir. 2014); Authors Guild v. Google, Inc., 804 F.3d 202, 227-28 (2d Cir. 2015).
235 See supra notes 223-225.
236 Singapore Copyright Act § 244(2)(d), (e)(ii)(B).
237 Id. § 244.
238 See Yu, Customizing Fair Use, supra note 69, at 10 (noting that a full understanding of the operation of a copyright statute will go beyond the analysis of statutory provisions and "will require follow-up studies on its utilization and interpretation by courts, law enforcement authorities, copyright holders and other parties"). See generally Roscoe Pound, Law in Books and Law in Action, 44 AM. L. REV. 12 (1910) (distinguishing between "law in books" and "law in action").
239 Rosati, supra note 224, at 270.
240 See AUSTL. L. REFORM COMM'N, COPYRIGHT AND THE DIGITAL ECONOMY: DISCUSSION PAPER 74-75 (2013) (noting that "it would take many years to develop case law-especially given that Australia is not as populous or litigious a society as the US"); Yu, User-Friendly Copyright Regime, supra note 124, at 334 ("[U]ntil the parties appear before a court, it is difficult to know for certain whether the conduct at issue is permissible. With limited case law, it may also be hard to predict the outcome of the case.").
241 As Claude Masouyé observed in WIPO's official guide to the Berne Convention, an international instrument with more than 180 state parties: "A fundamental point is that ideas, as such, are not protected by copyright. . . . [O]nce that idea has been elaborated and expressed, copyright protection exists for the words, notes, drawings, etc., in which it is clothed. In other words, it is the form of expression which is capable of protection and not the idea itself." CLAUDE MASOUYÉ, GUIDE TO THE BERNE CONVENTION FOR THE PROTECTION OF LITERARY AND ARTISTIC WORKS (PARIS ACT, 1971) 12 (1978); see also HANS KELSEN, PURE THEORY OF LAW 8-9 (Max Knight trans., 2d ed., Univ. of Cal. Press 1967) (introducing the concept of "Grundnorm").
242 17 U.S.C. § 102(a).
243 Id. § 102(b). The distinction is longstanding. See, e.g., Baker v. Selden, 101 U.S. 99, 104 (1880) (distinguishing between the protectable expression of a bookkeeping system and the unprotectable system itself).
244 Feist Publ'ns, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340, 353 (1991).
245 Golan v. Holder, 565 U.S. 302, 328-29 (2012); Eldred v. Ashcroft, 537 U.S. 186, 219 (2003); Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. 539, 556 (1985).
246 Cf. Designers Guild Ltd. v. Russell Williams (Textiles) Ltd. [2000] 1 WLR 2416 (Hoffman, L.J.) (noting that the idea-expression dichotomy "needs to be handled with care").
247 Hollinrake v. Truswell, (1894) 3 Ch. 420, 427 (CA), http://www.commonlii.org/uk/cases/UKLawRpCh/1894/158.html.
248 Japanese Copyright Act art. 30-4; see also discussion supra Section II.B.1.
249 Directive 2009/24, art. 1.2, 2009 O.J. (L 111) 16 (EC); see also THE WITTEM PROJECT, EUROPEAN COPYRIGHT CODE art. 1.1(3) (2010) (stating that ideas and theories "are not, in themselves, to be regarded as expressions within the field of literature, art or science within the meaning of" the European Copyright Code).
250 TRIPS Agreement, supra note 70, art. 9.2.
251 UNITED NATIONS CONF. ON TRADE & DEV.-INT'L CTR. FOR TRADE AND SUSTAINABLE DEV., RESOURCE BOOK ON TRIPS AND DEVELOPMENT 143 (2005) [hereinafter TRIPS RESOURCE BOOK].
252 Members and Observers, WORLD TRADE ORG., https://www.wto.org/english/thewto_e/whatis_e/tif_e/org6_e.htm (last visited Mar. 7, 2025).
253 See TRIPS RESOURCE BOOK, supra note 251, at 139 ("[T]he rule that copyright protection extends only to expressions and not to the underlying ideas is generally recognized in all countries."); see also Peter K. Yu, Clusters and Links in Asian Intellectual Property Law and Policy, in ROUTLEDGE HANDBOOK OF ASIAN LAW 147, 150-51 (Christoph Antons ed., 2017) (crediting the WTO and its TRIPS Agreement for being "[t]he primary driver of convergence of intellectual property laws in Asia").
254 See WIPO Copyright Treaty art. 2, Dec. 20, 1996, 2186 U.N.T.S. 121 ("Copyright protection extends to expressions and not to ideas, procedures, methods of operation or mathematical concepts as such.").
255 See Sag, Copy-Reliant Technology, supra note 12, at 1610.
256 See id. Other scholars have reached the same conclusion using different terminology. See, e.g., ABRAHAM DRASSINOWER, WHAT'S WRONG WITH COPYING? 88 (2015) ("[B]ecause the work is a communicative act, it cannot support entitlements in respect of merely technical or noncommunicative uses."); Maurizio Borghi & Stavroula Karapapa, Non-Display Uses of Copyright Works: Google Books and Beyond, 1 QUEEN MARY J. INTELL. PROP. 21, 21-22 (2011) (discussing "de-intellectualized" and "non-display uses"); Rossana Ducato & Alain Strowel, Ensuring Text and Data Mining: Remaining Issues with the EU Copyright Exceptions and Possible Ways Out, 43 EUR. INTELL. PROP. REV. 322, 334 (2021) (discussing the use of a copyrighted work "not ... as a work" in the TDM context).
257 See Oren Bracha, The Work of Copyright in the Age of Machine Production, 38 HARV. J.L. & TECH. 171, 181 (2024) ("Non-expressive training copies simply do not infringe from the outset, due to the most basic first principles of copyright that determine what subject matter lies within its domain in the first place."); see also DRASSINOWER, supra note 256, at 88 (making a similar argument).
258 See, e.g., Robert Brauneis, Copyright and the Training of Human Authors and Generative Machines, 48 COLUM. J.L. & ARTS 1 (2025); Jacqueline Charlesworth, Generative AI's Illusory Case for Fair Use, 27 VAND. J. ENT. & TECH. L. 323 (2025); Jane C. Ginsburg, Fair Use in the US Redux: Reformed or Still Deformed?, 2024 SING. J. LEGAL STUD. 52, 73-79; David W. Opderbeck, Copyright in AI Training Data: A Human-Centered Approach, 76 OKLA. L. REV. 951, 975-1009 (2024); Benjamin L.W. Sobel, Artificial Intelligence's Fair Use Crisis, 41 COLUM. J.L. & ARTS 45, 49-79 (2017); see also Mark A. Lemley & Bryan Casey, Fair Learning, 99 TEX. L. REV. 743, 746 (2021) ("Given the doctrinal uncertainty and the rapid development of [machine learning] technology, it is unclear whether machine copying will continue to be treated as fair use.").
259 See Sag, Copyright Safety, supra note 44, at 313-25 (discussing the attenuated link between training data and model output); Peter K. Yu, Artificial Intelligence, Autonomous Creation, and the Future Path of Copyright Law, 50 BYU L. REV. 753 (2025) [hereinafter Yu, Autonomous Creation] (discussing the potential doctrinal changes to address concerns about the unlicensed use of copyrighted works to train those AI systems that have the capacity to produce competing creative output); Pamela Samuelson, U.S. Copyright Office's Questions About Generative AI Generating More Questions than Answers, COMMC'NS ACM, Mar. 2024, at 25, 25-26 (summarizing the different submissions to the U.S. Copyright Office regarding the question of training data and generative AI output).
260 See discussion infra Section IV.A.1.
261 BRADFORD, DIGITAL EMPIRES, supra note 180.
262 See Aaronson, supra note 203.
263 See generally LEE, supra note 183.
264 See generally Daryl Lim & Peter K. Yu, The Antitrust-Copyright Interface in the Age of Generative Artificial Intelligence, 74 EMORY L.J. 847 (2025) (examining the changing antitrust-copyright interface and proposing reforms that would meet the needs and challenges posed by generative AI); Lina M. Khan, Opinion, We Must Regulate AI. Here's How, N.Y. TIMES (May 3, 2023), https://www.nytimes.com/2023/05/03/opinion/ai-lina-khan-ftc-technology.html (explaining the need to use antitrust law to promote competition in the AI space); Pamela Samuelson, Christopher Jon Sprigman & Matthew Sag, The FTC's Misguided Comments on Copyright Office Generative AI Questions, PATENTLY-O (Dec. 11, 2023), https://patentlyo.com/patent/2023/12/misguided-copyright-generative.html (criticizing the Federal Trade Commission's submission to the U.S. Copyright Office on AI and copyright).
265 See EUR. PARLIAMENTARY RSCH. SERV., supra note 182, at 1 ("The US is leading private investment in AI (€62.5 billion) in 2023, followed by China (€7.3 billion) . . . . The EU and the United Kingdom . . . together attracted €9 billion worth of private investment in 2023").
266 AI INDEX 2024, supra note 181, at 5.
267 See Satariano & Mozur, supra note 214 ("The U.S. has advantages other countries cannot yet match. American tech giants control the most powerful AI models and spend more than companies abroad to build them. Top engineers and developers still aspire to a career in Silicon Valley. Few regulations stand in the way of development. American firms have the easiest access to precious A.I. chips, mostly designed by Nvidia in California").
268 See id. (noting that countries such as France, India, Saudi Arabia, and the UAE have joined the AI race); Carys J. Craig, Canada's Changing AI-Copyright Policy Discourse: A Play in Three Parts?, KLUWER COPYRIGHT BLOG (Apr. 25, 2024), https://copyrightblog.kluweriplaw.com/2024/04/25/canadas-changing-aicopyright-policy-discourse-a-play-in-three-parts/ (discussing Canada's effort "to secure . . . world-leading AI advantage").
269 See Satariano & Mozur, supra note 214.
270 See, e.g., ANDREW GOWERS, GOWERS REVIEW OF INTELLECTUAL PROPERTY 68 (2006) (calling for amending Article 5 of the EU InfoSoc Directive "to allow for an exception for creative, transformative or derivative works, within the parameters of the Berne Three Step Test"); HARGREAVES, supra note 62, at 52 (extolling the benefits of fair use and describing it as "the big once and for all fix of the UK").
271 See Berne Convention, supra note 70, art. 5(3) ("Protection in the country of origin is governed by domestic law."); see also Peter K. Yu, A Spatial Critique of Intellectual Property Law and Policy, 74 WASH. & LEE L. REV. 2045, 2064-67 (2017) (discussing the territorial nature of intellectual property law).
272 See João Pedro Quintais, Generative AI, Copyright and the AI Act, 56 COMPUT. L. & SEC. REV., no. 106107 (2025) (discussing the problem posed by the territoriality principle to the enforcement of the EU AI Act).
273 See Sag, Copyright Safety, supra note 44, at 302.
274 See Katherine Lee, A. Feder Cooper & James Grimmelmann, Talkin' 'bout AI Generation: Copyright and the Generative-AI Supply Chain, 71 J. COPYRIGHT SOC'Y (forthcoming 2024) (discussing the complexities in the generative AI supply chain).
275 However, the policy space for such reform is limited, due to the anchoring effect of international copyright norms protecting the rightsholders' interests in controlling the communication of their original expressions and the possible stickiness of key human resources in AI development. Global competition will therefore lead to further convergence, rather than a race to the bottom in which countries lower their copyright standards to facilitate AI training.
276 EU AI Act, supra note 13.
277 See discussion infra Section IV.B.
278 Id. recital 21.
279 Id. recital 106.
280 See Alexander Peukert, Copyright in the Artificial Intelligence Act-A Primer, 73 GRUR INT'L 497, 508-09 (2024) (discussing the enforcement of the EU AI Act).
281 See id. at 505-06 (discussing the choice-of-law complications raised by the EU AI Act); see also Yu, Autonomous Creation, supra note 259, at 823-26 (offering the judicial application of choice-of-law principles as an option for AI-related copyright law reform).
282 For discussions of regulatory arbitrage in the intellectual property or cyberlaw context, see generally A. Michael Froomkin, The Internet as a Source of Regulatory Arbitrage, in BORDERS IN CYBERSPACE: INFORMATION POLICY AND THE GLOBAL INFORMATION INFRASTRUCTURE 129 (Brian Kahin & Charles Nesson eds., 1997); Pamela Samuelson, Intellectual Property Arbitrage: How Foreign Rules Can Affect Domestic Protections, 71 U. CHI. L. REV. 223 (2004).
283 E.g., Omri Ben-Shahar, An Ex-Ante View of the Battle of the Forms: Inducing Parties to Draft Reasonable Terms, 25 INT'L REV. L. & ECON. 350, 367 (2005); Stephen J. Choi, Law, Finance, and Path Dependence: Developing Strong Securities Markets, 80 TEX. L. REV. 1657, 1720 (2002); Jeffrey N. Gordon, Corporations, Markets, and Courts, 91 COLUM. L. REV. 1931, 1958 n.93 (1991); William Magnuson, The Race to the Middle, 95 NOTRE DAME L. REV. 1183, 1183 (2020).
284 Peter Jaszi, A Garland of Reflections on Three International Copyright Topics, 8 CARDOZO ARTS & ENT. L.J. 47, 63 (1989).
285 See Peter K. Yu, Intellectual Property and the Information Ecosystem, 2005 MICH. ST. L. REV. 1, 10 n.51 ("Although copyright holders often accuse of piracy those who make copies without their authorization, piracy is in the eyes of the beholder.").
286 See Yu, Paradigm Evolution, supra note 77, at 137.
287 See id. at 128-41 (noting that countries have engaged more in "paradigm evolution" than "paradigm shifts" in their efforts to transplant the U.S. fair use provision).
288 See Magnuson, supra note 283, at 1184-87.
289 Id. at 1201.
290 Yu, Paradigm Evolution, supra note 77, at 143.
291 Id.
292 See id. at 146-47.
293 See id. at 148-55.
294 Cf. id. at 143.
295 See supra notes 15-19.
296 See Sag, Copyright Safety, supra note 44, at 326-37 (discussing the limited circumstances in which the copying involved in training AI models may create problems for fair use).
297 See generally CHATGPT IS EATING THE WORLD, https://chatgptiseatingtheworld.com (last visited Mar. 7, 2025) (collecting and discussing these cases); DAIL-THE DATABASE OF AI LITIGATION, https://blogs.gwu.edu/law-eti/ai-litigation-database (last visited Mar. 7, 2025) (providing a database about ongoing and completed AI litigation).
298 See Pamela Samuelson, How to Think About Remedies in the Generative AI Copyright Cases, COMMC'NS ACM, July 2024, at 27.
299 See Andersen v. Stability AI Ltd., No. 23-CV-00201, 2024 WL 3823234, at *7 (N.D. Cal. Aug. 12, 2024) (dismissing claims under Section 1202(a) and (b)(1) of the U.S. Copyright Act); Doe 1 v. GitHub, Inc., No. 22-CV-06823, 2024 WL 235217, at *8-9 (N.D. Cal. Jan. 22, 2024) (dismissing claims under Section 1202(b)(1) and (3) of the U.S. Copyright Act and various state-law claims).
300 For example, the plaintiffs in Raw Story Media, Inc. v. OpenAI, Inc. and The Intercept Media Inc. v. OpenAI, Inc. relied exclusively on alleged violations of Section 1202 of the U.S. Copyright Act. Complaint at 9, Raw Story Media, Inc. v. OpenAI Inc., No. 1:24-CV-01514 (S.D.N.Y. Feb. 28, 2024), ECF No. 1, https://www.loevy.com/wp-content/uploads/2024/02/Raw-Story-v.-OpenAl-Complaint-Filed.pdf; Complaint at 10, The Intercept Media Inc. v. OpenAI, Inc., No. 1:24-CV-01515 (S.D.N.Y. Feb. 28, 2024), ECF No. 1, https://storage.courtlistener.com/recap/gov.uscourts.nysd.616536/gov.uscourts.nysd.616536.1.0 1.pdf; see also 17 U.S.C. § 1202 (protecting the integrity of copyright management information).
301 See, e.g., Thomson Reuters Enter. Ctr. GMBH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) (finding that the use of legal memoranda built from Westlaw headnotes to train AI search tools did not constitute fair use).
302 Complaint at 1, Tremblay v. OpenAI, Inc., No. 3:23-CV-03223 (N.D. Cal. June 28, 2023), ECF No. 1, https://storage.courtlistener.com/recap/gov.uscourts.cand.414822/gov.uscourts.cand.414822.1.0 1.pdf.
303 See Sag, Fairness and Fair Use, supra note 25, at 1917-18 (addressing this argument in greater detail).
304 Complaint at 6, Tremblay, No. 3:23-CV-03223.
305 DSM Directive, supra note 153, arts. 3(1), 4(1).
306 Singapore Copyright Act § 244(2)(d), (e)(ii)(B).
307 Complaint at 23-24, N.Y. Times Co. v. Microsoft Corp., No. 1:23-CV-11195 (S.D.N.Y. Dec. 27, 2023), ECF No. 1, https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf.
308 Concord Music Grp., Inc. v. Anthropic PBC, No. 3:24-CV-03811 (N.D. Cal. June 26, 2024) (transferred from M.D. Tenn. under Case No. 3:23-CV-01092).
309 See, e.g., Rajan Patel, An Expanded Partnership with Reddit, INSIDE GOOGLE (Feb. 22, 2024), https://blog.google/inside-google/company-announcements/expanded-reddit-partnership/ (announcing Google's agreement with Reddit); Press Release, OpenAI, Partnership with Axel Springer to Deepen Beneficial Use of AI in Journalism (Dec. 13, 2023), https://openai.com/index/axel-springer-partnership/ [hereinafter OpenAI Press Release] (announcing OpenAI's partnership with Axel Springer).
310 OpenAI Press Release, supra note 309.
311 Id.
312 Id.
313 Press Release, OpenAI, Global News Partnerships: Le Monde and Prisa Media (Mar. 13, 2024), https://openai.com/index/global-news-partnerships-le-monde-and-prisa-media/.
314 Alexandra Bruell, Sam Schechner & Deepa Seetharaman, OpenAI, WSJ Owner News Corp Strike Content Deal Valued at Over $250 Million, WALL ST. J. (May 22, 2024, 9:45 PM), https://www.wsj.com/business/media/openai-news-corp-strike-deal-23f186ba.
315 Caitlin Huston, Condé Nast Inks Multiyear OpenAI Deal for Its Magazine Brands, HOLLYWOOD REP. (Aug. 20, 2024, 11:13 AM), https://www.hollywoodreporter.com/business/business-news/conde-nast-inksmultiyear-openai-deal-for-its-magazine-brands-1235979339/.
316 See Nilay Patel, Why the Atlantic Signed a Deal with OpenAI, THE VERGE (July 11, 2024, 1:05 PM), https://www.theverge.com/2024/7/11/24196396/the-atlantic-openai-licensing-deal-ai-news-journalism-webfuture-decoder-podcasts (discussing the motivations behind The Atlantic's licensing agreement and quoting Atlantic CEO Nicholas Thompson as saying, "I believe that us doing this deal and the Wall Street Journal doing their deal helps The Times because it shows that there is a market for this stuff").
317 Authors Guild, Inc. v. Google, Inc. (Google Books), 804 F.3d 202, 224 (2d Cir. 2015); Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 100 (2d Cir. 2014). In Google Books, the Second Circuit went so far as to note that, even though the snippet function could cause some loss of sales, "the possibility, or even the probability or certainty, of some loss of sales does not suffice to make the copy an effectively competing substitute that would tilt the weighty fourth factor in favor of the rights holder in the original." Google Books, 804 F.3d at 224.
318 See Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. 539, 567 (1985) (describing Time Magazine's canceled serialization of President Gerald Ford's memoirs as the direct result of defendant's infringement); Am. Geophysical Union v. Texaco Inc., 60 F.3d 913, 931 (2d Cir. 1994) (finding the potential to license end-user photocopies through a collecting society a cognizable market harm).
319 See OpenAI Press Release, supra note 309.
320 Introducing Meta Llama 3, supra note 34.
321 This example is meant to illustrate the scale of material required for some AI training. We understand that the New York Times has a large back-catalogue of content and that not all AI models require trillions of tokens of text to train. Consistent with this ballpark estimate, our heroic research assistant estimated that there were roughly 107,475 words in the March 20, 2025, issue of the New York Times print edition by counting the number of rows in each section on each page of the newspaper and multiplying that by the number of words in the first row of that section.
322 See Sag, Fairness and Fair Use, supra note 25, at 1920.
323 See DSM Directive, supra note 153, art. 4(3).
324 See SearchGPT Prototype, OPENAI (July 25, 2024), https://openai.com/index/searchgpt-prototype/ (announcing the SearchGPT prototype).
325 See Complaint at 22, N.Y. Times Co. v. Microsoft Corp., No. 1:23-CV-11195 (S.D.N.Y. Dec. 27, 2023), ECF No. 1, https://storage.courtlistener.com/recap/gov.uscourts.nysd.612697/gov.uscourts.nysd.612697.1.0.pdf (complaining about "the ability to generate natural language summaries of search result contents, including hits on Times Works, that obviate the need to visit The Times's own websites" and noting that "[t]hese 'synthetic' search results purport to answer user queries directly and may include extensive paraphrases and direct quotes of Times reporting").
326 See Yu, Autonomous Creation, supra note 259, at 756 n.2 (collecting these Senate hearings).
327 See, e.g., No AI FRAUD Act, H.R. 6943, 118th Cong. (2024); NO FAKES Act of 2024, S. 4875, 118th Cong. (2024); The COPIED Act, S. 4674, 118th Cong. (2024); see also U.S. COPYRIGHT OFF., COPYRIGHT AND ARTIFICIAL INTELLIGENCE: PART 1: DIGITAL REPLICAS 24-28 (2024) [hereinafter DIGITAL REPLICAS STUDY], https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-1-Digital-Replicas-Report.pdf [https://perma.cc/YG9Z-B8UR] (discussing these proposed legislative bills and ongoing Congressional developments); AI INDEX 2024, supra note 181, at 6 ("In 2023, there were 25 AI-related regulations [in the United States], up from just one in 2016.").
328 See DIGITAL REPLICAS STUDY, supra note 327, at 15-16 (discussing new laws in Tennessee, Louisiana, and New York targeting the problems posed by digital replicas).
329 Id. at 57.
330 Exec. Order No. 14110, 88 Fed. Reg. 75191 (Oct. 30, 2023).
331 Exec. Order No. 14179, 90 Fed. Reg. 8741 (Jan. 23, 2025).
332 See FED. TRADE COMM'N, COMMENT ON ARTIFICIAL INTELLIGENCE AND COPYRIGHT 5-6 (2023), https://www.ftc.gov/system/files/ftc_gov/pdf/p241200 Вс comment to copyright office.pdf [https://perma.cc/TF2F-DMJV].
333 See Lim & Yu, supra note 264, at 882-83 (discussing these cases).
334 FED. TRADE COMM'N, supra note 332.
335 Id. at 5-6.
336 See Lim & Yu, supra note 264, at 884-901 (criticizing this comment and explaining why antitrust intervention in the area of AI training will be ill-advised); Samuelson et al., supra note 264 ("[W]hen the courts are still in the process of determining the law, the FTC should not be issuing statements that suggest that it has pre-judged the issue. The FTC has no authority to determine what is and what is not copyright infringement, or what is or is not fair use."); Nolan Goldberg & Michelle Ovanesian, FTC Appears to Expand AI Regulatory Role into Copyright Matters, MONDAQ (Nov. 23, 2023), https://www.mondaq.com/unitedstates/copyright/1393254/ftc-appears-to-expand-ai-regulatory-role-intocopyright-matters (observing that the FTC's comment suggests that the agency "will aggressively and proactively challenge alleged unfair practices involving artificial intelligence, even if that means stretching the meaning of 'unfair' to increase its jurisdiction over such matters").
337 See Lim & Yu, supra note 264; Samuelson et al., supra note 264.
338 See, e.g., Robert Levine, GEMA Sues OpenAI over Song Lyrics in a First for PROs, BILLBOARD (Nov. 13, 2024), https://www.billboard.com/pro/gema-sues-openai-song-lyrics-copyright-law-europe/ (reporting the copyright infringement lawsuit German performing rights organization GEMA filed against OpenAI).
339 AI Act Enters into Force, EUR. COMM'N (Aug. 1, 2024), https://commission.europa.eu/news/ai-actenters-force-2024-08-01 en. Article 53, which this section discusses, will go into effect on August 2, 2025. EU AI Act, supra note 13, art. 113(b).
340 See Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts, COM(2021) 206 final (Apr. 21, 2021). For overviews of the copyright provisions in the EU AI Act, see generally Peukert, supra note 280; Quintais, supra note 272.
341 See Peukert, supra note 280, at 497, 499 ("Copyright was not considered a major problem requiring regulatory intervention and was not even mentioned in the proposal. . . . [It] was a last-minute addition to an act with a very different subject matter and purpose."); Quintais, supra note 272 (noting that "[t]he AI Act is conceptually akin to a public law instrument designed through a product safety prism").
342 See EU AI Act, supra note 13, art. 6 (providing the classification rules for high-risk AI systems).
343 Id. recital 106, art. 53.
344 Id. art. 2(6).
345 Id. art. 53(1)(d).
346 See Sag, Copyright Safety, supra note 44, at 340-41 (arguing that "those who use copyrighted works as training data for LLMs should keep detailed records of the works and from where they were obtained").
347 EU AI Act, supra note 13, art. 53(1)(c).
348 DSM Directive, supra note 153, art. 4(3).
349 Note the potential parallel to the need to respect paywalls and comply with machine-readable exclusion headers under U.S. copyright law. See Sag, Fairness and Fair Use, supra note 25, at 1920.
350 Regulation 2016/679, 2016 O.J. (L 119) 1, recital 115.
351 See generally ANU BRADFORD, THE BRUSSELS EFFECT: HOW THE EUROPEAN UNION RULES THE WORLD (2020) (discussing the "Brussels effect," the ability to convert an EU rule into a global one).
352 See supra text accompanying notes 265-266.
353 See supra text accompanying notes 280-281.
354 Martin Senftleben, Generative AI and Author Remuneration, 54 INT'L REV. INTELL. PROP. & COMPETITION L. 1535, 1537 (2023).
355 EUR. COPYRIGHT SOC'Y, COPYRIGHT AND GENERATIVE AI: OPINION OF THE EUROPEAN COPYRIGHT SOCIETY 12 (2025), https://europeancopyrightsociety.org/wpcontent/uploads/2025/02/ecs opinion genai january2025.pdf.
356 See, e.g., Daniel J. Gervais, Noam Shemtov, Haralambos Marmanis & Catherine Zaller Rowland, The Heart of the Matter: Copyright, AI Training, and LLMs, 72 J. COPYRIGHT SOC'Y 482 (2024) (noting the important roles voluntary collective licensing can play in addressing the unauthorized use of copyrighted works for AI training); Adam Jaffe, Controlling the Use of Copyrighted Materials in Training, in U.S. COPYRIGHT OFF., IDENTIFYING THE ECONOMIC IMPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR COPYRIGHT POLICY CONTEXT AND DIRECTION FOR ECONOMIC RESEARCH 48-52 (Brent Lutes ed., 2025) (discussing the option of creating a new statutory blanket license to cover the use of copyrighted works for AI training).
357 See generally COLLECTIVE MANAGEMENT OF COPYRIGHT AND RELATED RIGHTS (Daniel Gervais & João Pedro Quintais eds., 4th ed. 2025) (discussing the roles played by collecting societies in the copyright system around the world and the strengths and weaknesses of collective copyright management).
358 See Peter K. Yu, P2P and the Future of Private Copying, 76 U. COLO. L. REV. 653, 704-15 (2005) (discussing and collecting the proposals for compulsory and voluntary collective licensing to address Internet file-sharing). To name a few, these proposals include WILLIAM W. FISHER III, PROMISES TO KEEP: TECHNOLOGY, LAW, AND THE FUTURE OF ENTERTAINMENT 199-258 (2004); Daniel Gervais, Application of an Extended Collective Licensing Regime in Canada: Principles and Issues Related to Implementation (2003), https://ssrn.com/abstract=1920391; Neil W. Netanel, Impose a Noncommercial Use Levy to Allow Free Peer-to-Peer File Sharing, 17 HARV. J.L. & TECH. 1 (2003); Fred von Lohmann, A Better Way Forward: Voluntary Collective Licensing of Music File Sharing, ELEC. FRONTIER FOUND. (Apr. 30, 2008), https://www.eff.org/files/eff-a-better-way-forward.pdf.
359 See discussion supra Part III.
360 See discussion supra Part IV.
361 Cf. Yu, User-Friendly Copyright Regime, supra note 124, at 309 ("[I]t is hard to imagine any country willing to challenge the U.S. fair use provision, including the transformative use doctrine, before the Dispute Settlement Body of the World Trade Organization .... Nor is a WTO panel likely to strike down this provision.").
362 Section 301 permits the U.S. President to investigate and impose sanctions on countries engaging in unfair trade practices that threaten the United States' economic interests, including the inadequate protection and enforcement of intellectual property rights. See 19 U.S.C. §§ 2411-2420. For discussions of the operation of the Section 301 process, see generally Joe Karaganis & Sean Flynn, Networked Governance and the USTR, in MEDIA PIRACY IN EMERGING ECONOMIES 75 (Joe Karaganis ed., 2011); Paul C.B. Liu, U.S. Industry's Influence on Intellectual Property Negotiations and Special 301 Actions, 13 UCLA PAC. BASIN L.J. 87 (1994).
363 See Yu, Customizing Fair Use, supra note 69, at 3 (noting the developing countries' "fear that the introduction of [copyright] limitations and exceptions could reduce foreign investment, invite WTO complaints, harm diplomatic relations with powerful countries or all of the above").
364 See discussion supra Sections II.A-C.
365 See discussion supra Sections II.A-B.
366 See discussion supra Section II.C.
367 See supra text accompanying notes 340-343.
368 See discussion supra Section IV.A.2.
369 See Comm. on Econ., Soc. & Cultural Rts., General Comment No. 25 (2020) on Science and Economic, Social and Cultural Rights (Article 15(1)(b), (2), (3) and (4) of the International Covenant on Economic, Social and Cultural Rights), ¶¶ 72-76, U.N. Doc. E/C.12/GC/25 (Apr. 30, 2020) (discussing the risks and promises of AI and other new emerging technologies); AI Advisory Body Final Report, supra note 6, at 31 (providing a list of "AI-related risks based on existing or potential vulnerability").
370 See Bill Gates, AI Is About to Completely Change How You Use Computers, GATES NOTES (Nov. 9, 2023), https://www.gatesnotes.com/Al-agents (discussing AI agents).
371 See Lim & Yu, supra note 264, at 915 (discussing these models); Peter K. Yu, Beyond Transparency and Accountability: Three Additional Features Algorithm Designers Should Build into Intelligent Platforms, 13 NE. U. L. REV. 263, 295 (2021) (discussing the use of "lean data," which is contrasted with big data).
372 See Press Release, Gartner Identifies Top Trends Shaping the Future of Data Science and Machine Learning (Aug. 1, 2023), https://www.gartner.com/en/newsroom/press-releases/2023-08-01-gartner-identifiestop-trends-shaping-future-of-data-science-and-machine-learning ("By 2024, Gartner predicts 60% of data for AI will be synthetic to simulate reality, future scenarios and derisk AI, up from 1% in 2021."). For discussions of synthetic data, see generally Michal S. Gal & Orla Lynskey, Synthetic Data: Legal Implications of the Data-Generation Revolution, 109 IOWA L. REV. 1087 (2024); Peter Lee, Synthetic Data and the Future of AI, 110 CORNELL L. REV. 1 (2025).
373 See Yu, Autonomous Creation, supra note 259, at 757 (collecting these cases); see also U.S. COPYRIGHT OFF., COMPENDIUM OF U.S. COPYRIGHT OFFICE PRACTICES § 313.2 (3d ed. 2021) (stating that the U.S. Copyright Office "will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author"); U.S. COPYRIGHT OFF., COPYRIGHT AND ARTIFICIAL INTELLIGENCE: PART 2: COPYRIGHTABILITY (2025) (providing the second report on copyright and AI, covering the copyrightability of AI-generated works).
374 See Limoumou Su Liumoumou Qinhai Zuopin Shumingquan, Xinxi Wangluo Chuanboquan Jiufen An [Li v. Liu], (2023) Jing 0491 Min Chu No. 11279 (Beijing Internet Ct. Nov. 27, 2023), translated at https://patentlyo.com/media/2023/12/Li-v-Liu-Beijing-Internet-Court-20231127-with-English-Translation.pdf; Shenzhen Tengxun Su Shanghai Yingxun Zhuzuoquan Qinquan An [Shenzhen Tencent Comput. Sys. Co. v. Shanghai Yingxun Tech. Co.] (2019) Yue 0305 Min Chu No. 14010 (Shenzhen Nanshan Dist. Ct. Dec. 24, 2019), https://www.chinajusticeobserver.com/law/x/2019-yue-0305-min-chu-14010/chn; Matthew Murphy, China's Second AI-Generated Image Copyright Infringement Case, HG.ORG, https://www.hg.org/legal-articles/china-ssecond-ai-generated-image-copyright-infringement-case-68497 (last visited Mar. 7, 2025) (reporting a November 2024 decision by the Changshu Municipal People's Court to recognize the copyrightability of an AI-generated artwork that "depicts a red heart reflected in water"); Edward Lee, South Korea Grants Copyright to AI Generated Work, "AI Suro's Wife" Film as Work Edited by Humans, CHATGPT IS EATING THE WORLD (Jan. 8, 2024), https://chatgptiseatingtheworld.com/2024/01/08/south-korea-grants-copyright-to-ai-generated-workai-suros-wife-film-as-work-edited-by-humans/ (reporting the copyright registration for AI-generated film AI Surobuin in South Korea); see also Yu, Future Path, supra note 179 (discussing these cases).
375 See, e.g., Edward Lee, AI-Generated Image Received Copyright Registration Based on "Selection, Coordination, and Arrangement." Yes, in the United States. How?, CHATGPT IS EATING THE WORLD (Feb. 11, 2025), https://chatgptiseatingtheworld.com/2025/02/11/ai-generated-image-received-copyright-registrationbased-on-selection-coordination-and-arrangement-yes-in-the-united-states-how/ (reporting the registration of a single AI-generated image entitled A Single Piece of American Cheese); Edward Lee, Copyright Office Registers Artwork Collage Consisting of AI-Generated Elements, CHATGPT IS EATING THE WORLD (Feb. 13, 2025), https://chatgptiseatingtheworld.com/2025/02/13/copyright-office-registers-artwork-collage-consisting-of-aigenerated-elements/ (reporting the registration of a visual collage entitled A Collection of Objects Which Do Not Exist); Edward Lee, US Copyright Office Allows Registration of AI-Generated Video Based on Editing of AI Generated Video, Music, CHATGPT IS EATING THE WORLD (Feb. 16, 2025), https://chatgptiseatingtheworld.com/2025/02/16/us-copyright-office-allows-registration-of-ai-generated-videobased-on-editing-of-ai-generated-video-music/ (reporting the registration of Film Clip for Song Just Like in a Movie (SNEAK PREVIEW), an AI-generated video with AI-generated music).
376 See Yu, Autonomous Creation, supra note 259, at 779, 791-92 (noting the global convergence of copyright law in AI training but global divergence regarding the copyrightability of AI-generated works).
Copyright Emory University, School of Law 2025