The AI tools generating music, images, and written content today were trained on millions of works created by human artists, musicians, photographers, writers, and developers. Most of those creators were never asked. Most were never paid.
Whether that is legal is now one of the most actively litigated questions in copyright law. What you can do about it follows from the answer.
Here is where things stand.
How AI training works and why your work is in it
Generative AI systems learn by processing enormous amounts of existing content. A music generation model is trained on recordings, compositions, and audio files. An image model is trained on photographs, illustrations, and digital art. A text model is trained on books, articles, websites, and code. The model extracts patterns from this material, then uses those patterns to generate new outputs on demand.
The content used in training was overwhelmingly scraped from the internet: from streaming platforms, social media, stock libraries, and any other publicly accessible source an AI company could reach. Datasets like LAION-5B (used to train image models) indexed billions of images paired with their captions. Training datasets for music models included recordings whose rights belong to record labels, independent artists, and publishing companies.
No license was requested. No royalty was paid. No opt-in was offered. The AI companies' position, stated openly, was that training on publicly available content is fair use, and therefore no permission is required.
That position is now being tested in court.
What the U.S. Copyright Office concluded
In May 2025, the U.S. Copyright Office released a pre-publication version of Part 3 of its Report on Copyright and Artificial Intelligence. Part 3 addresses the specific question of AI training data.
The Office's conclusion on fair use: after analyzing all four statutory factors, the analysis "generally favours copyright owners." Not AI companies.
That is not a court ruling. It is the Copyright Office's legal analysis and recommendation to Congress. It does not resolve the lawsuits currently in progress. But it signals where the federal government's copyright authority believes the law points.
The Office also analyzed whether voluntary licensing is feasible. It recommended letting licensing markets continue to develop without government intervention for now, and suggested that alternatives such as extended collective licensing be considered where those markets fail. Several legislative proposals are now pending, though none have passed as of May 2026.
What the courts have actually decided
Thomson Reuters v. Ross Intelligence
In February 2025, a Delaware federal court issued the first major ruling on AI training data and fair use. The case involved Ross Intelligence, which used Westlaw's legal content to train an AI legal research tool without a license.
The court ruled that Ross's unlicensed use of that content to train its tool was not fair use.
This was the first judicial decision to directly address AI training data as a copyright question. The court weighed all four fair use factors. Two favored Ross, but the two the court treated as most important, the purpose of the use and its effect on the market for the original, favored Thomson Reuters. The ruling is an important early precedent, though it is a single district court decision, not a Supreme Court ruling, and it does not bind other courts.
The music industry lawsuits: Suno and Udio
In June 2024, the major record labels, in suits coordinated by the Recording Industry Association of America, sued Suno and Udio, two AI music generation platforms, for copyright infringement. The suits alleged that the platforms trained their models on copyrighted recordings without a license.
The cases have since split:
Warner Music Group settled with Suno in November 2025. The settlement includes a licensing arrangement allowing Suno to use Warner's catalog to train its model, with compensation flowing back to rights holders. The specific terms are confidential, but the deal was reported as involving both a lump payment and an ongoing licensing structure.
UMG settled with Udio in October 2025. That settlement created what has been called the first major-label licensing template for AI music generation, a model other labels and platforms are watching closely.
Sony's claims against Udio remain ongoing as of May 2026.
UMG's case against Suno has not settled. As of April 2026, settlement talks had reached an impasse, and the case continues.
The pattern across these settlements is meaningful: major labels did not fight to a final court judgment. They moved to licensing. That is not a legal concession. It reflects commercial pragmatism. A licensing deal generates ongoing revenue. A final court win generates a one-time verdict, a slower appeals process, and an industry relationship that no longer exists.
What this means if you are an independent creator
The major labels have resources, lawyers, and leverage that individual creators do not. The settlements and licensing deals described above directly benefit artists signed to those labels and those whose work is administered by the labels' publishing arms. They do not automatically extend to independent musicians, self-releasing artists, visual artists, photographers, or writers.
Independent creators in this landscape face three realities:
Your work may already be in AI training datasets. If you have released music on any major streaming platform, published photography on any publicly accessible site, or posted visual art on social media, there is a realistic chance it was scraped. There is currently no comprehensive way to verify this.
Your options to remedy past training are limited. Even if your work was used without permission, pursuing a standalone copyright infringement claim against a large AI company as an individual creator is expensive and uncertain. The legal costs involved make it impractical for most independent creators to litigate alone.
Your position going forward is better than it was. The Thomson Reuters ruling, the USCO Part 3 report, and the licensing settlements all signal that the argument that AI training is automatically fair use is weakening. Platforms are beginning to develop opt-in licensing programs. Regulatory proposals are moving. The landscape is shifting toward a framework where creators receive compensation for training data use.
What you can do now
Register your copyright. Registration does not retroactively fix past unlicensed training. But it strengthens your position in any future dispute. For U.S. works, registration is required before you can file an infringement suit, and timely registration is a prerequisite for statutory damages and attorney's fees, the most significant remedies available under U.S. copyright law. If you have not registered your catalog, this is the time to do it.
Use available opt-out mechanisms. Some AI companies and platforms now offer opt-out controls for training data use. These are inconsistent, incomplete, and platform-specific. They typically only affect future training, not past scraping. But using them is better than not using them. Resources like the AIRights Guide document the current state of opt-out options by platform.
Monitor emerging licensing opportunities. Several organizations are developing collective licensing frameworks for AI training data. The blanket licensing infrastructure that the Music Modernization Act created for mechanical royalties is being considered as a potential model. If your work is distributed through a label or publisher, ask what your agreement says about AI training and whether they are part of any collective licensing arrangement.
Understand that this area is actively developing. New court decisions, legislative actions, and industry deals are changing the landscape. A position that is uncertain today may be clarified in the next six to twelve months.
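If you publish work on a website you control, one opt-out mechanism of the kind described above is a robots.txt file that asks specific AI crawlers not to fetch your pages. The sketch below is a minimal illustration in Python, not a definitive implementation. It assumes four crawler names that AI companies have published (GPTBot, CCBot, Google-Extended, ClaudeBot) and a hypothetical example.com address. The list is incomplete and changes over time, compliance by crawlers is voluntary, and, like other opt-outs, it only affects future crawling.

```python
from urllib.robotparser import RobotFileParser

# Crawler names published by some AI companies. This list is an assumption
# for illustration only: it is incomplete and will change over time.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "ClaudeBot"]


def build_robots_txt(crawlers):
    """Return robots.txt text asking each named crawler to stay off the whole site."""
    blocks = [f"User-agent: {name}\nDisallow: /" for name in crawlers]
    return "\n\n".join(blocks) + "\n"


def blocked_crawlers(robots_txt, crawlers, url="https://example.com/portfolio/"):
    """Return the crawlers that this robots.txt asks not to fetch `url`.

    The default URL is a hypothetical placeholder, not a real portfolio.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, url)]


robots_txt = build_robots_txt(AI_CRAWLERS)
print(robots_txt)  # text to save as /robots.txt at the root of your site
print(blocked_crawlers(robots_txt, AI_CRAWLERS))
```

The second function is a quick self-check: it reads the generated text the way a well-behaved crawler would and reports which of the named crawlers are asked to stay away. Crawlers not named in the file, such as ordinary search engine crawlers, are unaffected.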
If you believe your work is being used in AI training and you want to understand your specific options, that is a conversation worth having with an attorney who understands both copyright law and the current state of AI litigation.
For a broader overview of how copyright law applies to AI-generated content, including what you own when you create with AI tools, see our AI copyright guide for creators.
This article is for general information only — not legal advice.




