The CLEAR Act: U.S. Legislative Push for AI Training Data Transparency and Its Licensing Implications
Introduction
On 17 March 2026, the Copyright Labeling and Ethical AI Reporting Act (CLEAR Act) was introduced in the United States Congress. While much of the executive branch’s early attention to generative AI had centred on enforcement priorities and existing legal tools, the CLEAR Act represents a legislative attempt to address the foundational information asymmetry that has characterised the relationship between copyright owners and developers of large AI models. By proposing a mandatory notification regime requiring commercial AI developers to disclose to the Register of Copyrights the specific copyrighted materials used in training their models, the bill seeks to create a public, searchable clearinghouse that would empower individual creators to determine whether their works were ingested and to pursue licensing claims or other remedies accordingly.
The introduction of the CLEAR Act occurs against the backdrop of intense litigation and policy debate in the United States over whether and how existing copyright law applies to the ingestion of copyrighted works for AI training. Unlike the European Union’s approach under Article 53 of the AI Act, which imposes transparency obligations directly on providers placing models on the EU market, the CLEAR Act would operate through a centralised federal disclosure mechanism administered by the U.S. Copyright Office. This structural difference has significant implications for how licensing markets for training data might evolve in the United States and globally.
The Core Mechanism: Mandatory Notification and the Public Clearinghouse
At the heart of the CLEAR Act is a requirement that commercial entities developing or deploying AI models that were trained on copyrighted material must submit detailed notifications to the Register of Copyrights identifying the specific works or categories of works used. These notifications would be compiled into a publicly accessible, searchable database. The design is explicitly intended to overcome the current “black box” problem: creators currently have little practical way of knowing whether their books, articles, images, music, or code were included in the massive datasets used to train frontier models.
By creating this clearinghouse, the bill would fundamentally alter the information environment in which licensing negotiations occur. Rights holders would no longer need to rely solely on inference, leaks, or costly discovery in litigation to establish that their works were used. Instead, they could query the database, obtain direct evidence of ingestion, and approach AI developers with concrete claims for compensation, licensing terms, or, in appropriate cases, demands to cease use or remove the works from training corpora. This shift from opacity to mandated transparency is the bill’s central licensing-related innovation.
Implications for Retroactive and Prospective Licensing
If enacted, the CLEAR Act would likely stimulate both retroactive and prospective licensing activity. On the retroactive side, creators who discover that their works were used without prior authorisation would be better positioned to negotiate settlements or licences covering past use. The public nature of the clearinghouse would also facilitate collective or representative actions, potentially lowering the transaction costs that have so far made individual claims economically unviable for many creators. AI developers, for their part, would face increased pressure to regularise their historical data practices through licensing programmes or other arrangements.
For prospective licensing, the regime would create stronger incentives for AI companies to secure licences before or during the training process rather than relying on fair use arguments or post-hoc rationalisations. Knowing that their data usage will be publicly disclosed, developers would have powerful reasons to obtain clear contractual rights upfront, both to avoid future claims and to demonstrate good-faith compliance to regulators, investors, and the public. This dynamic could accelerate the development of standardised licensing frameworks, collective management solutions, and data marketplaces specifically tailored to AI training needs.
Interaction with U.S. Copyright Doctrine and Fair Use Debates
The CLEAR Act does not purport to resolve the underlying substantive question of whether training AI models on copyrighted works constitutes fair use under U.S. law. Pending cases before U.S. courts continue to test that issue. However, the bill would materially affect the practical context in which those debates play out. By mandating disclosure, it would make it more difficult for developers to maintain that their use was non-commercial, transformative in a way that does not substitute for the original, or otherwise favoured by the fair use factors, if the scale and commercial nature of the training activity become publicly visible through the clearinghouse.
Moreover, the existence of a formal disclosure regime could influence judicial and regulatory assessments of good faith. Developers who have complied with CLEAR Act notification requirements might argue that their transparency demonstrates respect for copyright interests, while those who fail to disclose or who provide incomplete information could face heightened scrutiny. In licensing negotiations, rights holders would likely cite the statutory disclosure obligation as evidence that the use was significant enough to warrant compensation, shifting the baseline expectations in commercial discussions.
Implementation Challenges and Constitutional Considerations
The CLEAR Act, as introduced, raises significant implementation questions. Defining the precise scope of what must be disclosed (individual works versus aggregated datasets, exact copies versus transformed representations, publicly available versus restricted content) will require careful regulatory design by the Copyright Office. There are also practical challenges in verifying the accuracy and completeness of notifications submitted by developers, particularly for models trained on web-scale data where provenance tracking may be imperfect. Enforcement mechanisms, penalties for non-compliance or false disclosure, and the treatment of trade secrets or confidential business information embedded in training methodologies will all need to be addressed.
Constitutional concerns, particularly under the First Amendment, have already been raised in policy discussions surrounding similar transparency proposals. Critics argue that compelled disclosure of training data details could chill protected speech or impose undue burdens on expressive activity. Proponents counter that the requirement is narrowly tailored to serve the important government interest in protecting copyright and promoting a functioning licensing market. How these arguments would fare in litigation, and whether the final version of any legislation would survive constitutional challenge, remains uncertain but will be central to the bill’s ultimate viability.
Comparative Perspective: CLEAR Act and the EU AI Act Approach
The CLEAR Act’s centralised, Copyright Office-administered clearinghouse model differs markedly from the EU’s approach under Article 53 of the AI Act, which requires providers to publish summaries directly and to maintain internal copyright compliance policies. The U.S. proposal places greater emphasis on a single public database as a tool for rights holder empowerment and potential collective action. This structural choice has implications for transaction costs, enforcement, and the development of licensing markets. A centralised clearinghouse could facilitate more systematic matching between creators and developers, while the EU’s decentralised publication model places greater onus on individual rights holders to monitor and assert claims across multiple providers’ disclosures.
If both regimes ultimately take effect, AI developers operating globally will face a complex compliance environment requiring different disclosure formats, levels of detail, and timing for different markets. This regulatory fragmentation could itself become a driver for more standardised, licence-based approaches to training data acquisition, as companies seek to minimise the risk of divergent or conflicting obligations across jurisdictions.
Conclusion
The introduction of the CLEAR Act on 17 March 2026 marks a significant legislative attempt to address one of the core obstacles to a functioning market for AI training data: the profound information asymmetry between creators and developers. By proposing a mandatory, public notification system administered through the Copyright Office, the bill would equip rights holders with the factual foundation needed to pursue licensing claims, whether retroactively for past use or prospectively for future training activities.
Whether the CLEAR Act ultimately becomes law in its current or a modified form remains to be seen. Its passage would not resolve the deeper doctrinal questions surrounding fair use and AI training. It would, however, materially change the practical environment in which those questions are litigated and negotiated. For the licensing community, the bill represents both an opportunity (the potential emergence of more transparent, better-informed, and better-compensated markets for training data) and a challenge (the need to develop new contractual frameworks, due diligence practices, and collective solutions capable of operating at the scale and speed demanded by frontier AI development). The coming legislative and regulatory process will determine whether this particular mechanism for transparency ultimately takes root in the U.S. copyright landscape.
Author: Amrita Pradhan. For any queries, please contact us at support@ipandlegalfilings.com or IP & Legal Filing.
References
- Copyright Labeling and Ethical AI Reporting Act (CLEAR Act), https://www.govinfo.gov/app/details/BILLS-119s3813is.
- Senate Bill S.3813, § 1.
- U.S. Copyright Act, 17 U.S.C. §§ 101–122, https://www.law.cornell.edu/uscode/text/17.
- U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training, https://www.copyright.gov/ai/.
- U.S. Copyright Office, Copyright and Artificial Intelligence Report Series, https://www.copyright.gov/ai/.
- U.S. Constitution, Amendment I (First Amendment), https://constitution.congress.gov.
- Authors Guild, policy submissions concerning generative AI training and creator compensation frameworks, https://authorsguild.org.
- Recording Industry Association of America (RIAA), policy statements concerning AI training transparency and copyright protection of creative works, https://www.riaa.com.
- World Intellectual Property Organization (WIPO), Generative Artificial Intelligence and Copyright (2024), https://www.wipo.int.
- Organisation for Economic Co-operation and Development (OECD), Copyright, Data Access and Artificial Intelligence (2024), https://www.oecd.org.
- European Union, Regulation (EU) 2024/1689 (Artificial Intelligence Act), Article 53, https://eur-lex.europa.eu/eli/reg/2024/1689/oj.
- Directive (EU) 2019/790 on Copyright in the Digital Single Market (DSM Directive), arts. 3–4, https://eur-lex.europa.eu/eli/dir/2019/790/oj.



