Your Music and Art in AI Training Data: What Creators Need to Know

TL;DR

AI companies scraped millions of songs, images, and written works to train their models, mostly without asking. Whether that is legal is now being decided in courts around the world. The U.S. Copyright Office concluded in 2025 that the fair use argument generally favors copyright owners, not AI companies. Major labels have already moved to licensing deals. Independent creators still have options, and knowing them matters.

Aghil Ebrahimi, Esq.
Licensed in California · Ontario · Quebec
~13 min read

The AI tools generating music, images, and written content today were trained on millions of works created by human artists, musicians, photographers, writers, and developers. Most of those creators were never asked. Most were never paid.

Whether that is legal is now one of the most actively litigated questions in copyright law. What you can do about it follows from the answer.

Here is where things stand.

How AI training works and why your work is in it

Generative AI systems learn by processing enormous amounts of existing content. A music generation model is trained on recordings, compositions, and audio files. An image model is trained on photographs, illustrations, and digital art. A text model is trained on books, articles, websites, and code. The model extracts patterns from this material, then uses those patterns to generate new outputs on demand.
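To make "extracts patterns, then generates new outputs" concrete, here is a toy sketch. It is not how any commercial system works: real generative models are neural networks trained on vastly more data. The function names and the three sample sentences are invented for illustration. It only shows the basic idea of counting patterns in existing text and reusing them to produce new text.

```python
from collections import Counter, defaultdict

def train_bigram_model(texts):
    """Count, for each word, which words follow it across the training texts."""
    follow_counts = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts

def generate(model, start_word, max_words=6):
    """Extend start_word by repeatedly picking the most frequent follower."""
    output = [start_word]
    while len(output) < max_words and output[-1] in model:
        next_word, _ = model[output[-1]].most_common(1)[0]
        output.append(next_word)
    return " ".join(output)

# Hypothetical "training data": three short lines standing in for scraped works.
training_texts = [
    "the river runs to the sea",
    "the river runs to the bridge",
    "the river sings",
]

model = train_bigram_model(training_texts)
print(generate(model, "the"))
```

Every word the toy model can produce came from the training lines, which is the core of the legal dispute: the model's abilities are built from the works it was trained on.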

The content used in training was overwhelmingly scraped from the internet: from streaming platforms, social media, stock libraries, and any other publicly accessible source an AI company could reach. Datasets like LAION-5B (used to train image models) contained billions of images. Training datasets for music models included recordings whose rights belong to record labels, independent artists, and publishing companies.

No license was requested. No royalty was paid. No opt-in was offered. The AI companies' position, stated openly, was that training on publicly available content is fair use, and therefore no permission is required.

That position is now being tested in court.

What the U.S. Copyright Office concluded

In May 2025, the U.S. Copyright Office released Part 3 of its Report on Copyright and Artificial Intelligence. Part 3 addresses the specific question of AI training data.

The Office's conclusion on fair use: after analyzing all four statutory factors, the analysis "generally favours copyright owners." Not AI companies.

That is not a court ruling. It is the Copyright Office's legal analysis and recommendation to Congress. It does not resolve the lawsuits currently in progress. But it signals where the federal government's copyright authority believes the law points.

The Office also analyzed whether voluntary licensing is feasible and recommended that Congress consider frameworks to require AI companies to license the content they use in training. Several legislative proposals are now pending, though none have passed as of May 2026.

What the courts have actually decided

Thomson Reuters v. Ross Intelligence

In February 2025, a Delaware federal court issued the first major ruling on AI training data and fair use. The case involved Ross Intelligence, which used Westlaw's legal content to train an AI legal research tool without a license.

The court ruled that Ross's unlicensed use of that copyrighted content to train its tool was not fair use.

This was the first judicial decision to directly address AI training data as a copyright question. The court applied all four fair use factors and found they did not support the AI company's position. The ruling established an important precedent, though it is one district court decision, not a Supreme Court ruling or binding across all circuits.

The music industry lawsuits: Suno and Udio

In June 2024, the major record labels, in suits coordinated by the Recording Industry Association of America, filed copyright infringement claims against Suno and Udio, two AI music generation platforms. The suits alleged that the platforms trained their models on copyrighted recordings without a license.

The cases have since split:

Warner Music Group settled with Suno in November 2025. The settlement includes a licensing arrangement allowing Suno to use Warner's catalog to train its model, with compensation flowing back to rights holders. The specific terms are confidential, but the deal was reported as involving both a lump-sum payment and an ongoing licensing structure.

UMG settled with Udio in October 2025. That settlement created what has been called the first major-label licensing template for AI music generation, a model other labels and platforms are watching closely.

Sony's claims against Udio remain ongoing as of May 2026.

UMG's claims against Suno have not settled. As of April 2026, settlement talks had reached an impasse, and the case continues.

The pattern across these settlements is meaningful: major labels did not fight to a final court judgment. They moved to licensing. That is not a legal concession. It reflects commercial pragmatism. A licensing deal generates ongoing revenue. A final court win generates a one-time verdict, a slower appeals process, and an industry relationship that no longer exists.

What this means if you are an independent creator

The major labels have resources, lawyers, and leverage that individual creators do not. The settlements and licensing deals described above directly benefit artists signed to those labels and those whose work is administered by the labels' publishing arms. They do not automatically extend to independent musicians, self-releasing artists, visual artists, photographers, or writers.

Independent creators in this landscape face three realities:

Your work may already be in AI training datasets. If you have released music on any major streaming platform, published photography on any publicly accessible site, or posted visual art on social media, there is a realistic chance it was scraped. There is currently no comprehensive way to verify this.

Your options to remedy past training are limited. Even if your work was used without permission, pursuing a standalone copyright infringement claim against a large AI company as an individual creator is expensive and uncertain. The legal costs involved make it impractical for most independent creators to litigate alone.

Your position going forward is better than it was. The Thomson Reuters ruling, the USCO Part 3 report, and the licensing settlements all signal that the argument that AI training is automatically fair use is weakening. Platforms are beginning to develop opt-in licensing programs. Regulatory proposals are moving. The landscape is shifting toward a framework where creators receive compensation for training data use.

What you can do now

Register your copyright. Registration does not retroactively fix past unlicensed training. But it strengthens your position in any future dispute and is a prerequisite for the most significant legal remedies available under U.S. copyright law. If you have not registered your catalog, this is the time to do it.

Use available opt-out mechanisms. Some AI companies and platforms now offer opt-out controls for training data use. These are inconsistent, incomplete, and platform-specific. They typically only affect future training, not past scraping. But using them is better than not using them. Resources like the AIRights Guide document the current state of opt-out options by platform.

Monitor emerging licensing opportunities. Several organizations are developing collective licensing frameworks for AI training data. The Music Modernization Act's infrastructure is being considered as a potential model. If your work is distributed through a label or publisher, ask what your agreement says about AI training and whether they are part of any collective licensing arrangement.

Understand that this area is actively developing. New court decisions, legislative actions, and industry deals are changing the landscape. A position that is uncertain today may be clarified in the next six to twelve months.

If you believe your work is being used in AI training and you want to understand your specific options, that is a conversation worth having with an attorney who understands both copyright law and the current state of AI litigation.

For a broader overview of how copyright law applies to AI-generated content, including what you own when you create with AI tools, see our AI copyright guide for creators.


Frequently asked questions

Can AI companies legally use my music without my permission to train their AI?

That question is being decided right now. AI companies have argued that training on publicly available content is fair use and requires no permission. The U.S. Copyright Office concluded in its May 2025 Part 3 Report on Generative AI Training that the fair use analysis generally favors copyright owners, not AI companies. The first major court ruling on this question, Thomson Reuters v. Ross Intelligence, found that AI training on copyrighted material is not fair use. Several major music copyright cases have settled through licensing deals rather than producing a final court ruling. The short answer: the law is unsettled, but the direction of legal authority is moving toward requiring permission.

Is AI training on copyrighted content considered fair use?

Not automatically. The fair use doctrine, codified at 17 U.S.C. § 107, requires a four-factor analysis, and no single factor is conclusive. The Copyright Office analyzed those factors in Part 3 of its AI report and concluded they generally favor copyright owners when AI companies use copyrighted works without a license for commercial AI training. One federal court, in Thomson Reuters v. Ross Intelligence, reached the same conclusion for AI training on legal text. Whether that reasoning will apply to music, visual art, and other content types is still being litigated.

What happened with the Suno and Udio lawsuits?

The major record labels, in suits coordinated by the Recording Industry Association of America, sued both platforms in June 2024. Warner Music Group settled with Suno in November 2025, establishing a licensed AI music deal with compensation for rights holders. Universal Music Group settled with Udio in October 2025, creating the first major-label licensing template for AI music generation. Sony's claims against Udio remain ongoing. Suno and UMG are still in dispute as of April 2026, with settlement talks at an impasse.

What did the court decide in Thomson Reuters v. Ross Intelligence?

In February 2025, a Delaware federal court ruled that Ross Intelligence's use of Westlaw's legal content to train an AI research tool was not fair use. The case docket and order are available on CourtListener (Thomson Reuters Enterprise Centre v. Ross Intelligence, D. Del.). This was the first major U.S. court ruling directly addressing AI training data as a copyright question. The court applied the four-factor fair use test and found it did not protect the AI company. This decision is persuasive authority, not binding across all courts, but it is the most significant judicial statement on AI training and fair use to date.

How do I know if my music or art was used to train an AI?

There is currently no reliable, comprehensive way to verify this for most creators. Some AI companies have published partial information about their training datasets, but disclosure is inconsistent and incomplete. Researchers and advocacy organizations have developed tools to detect when specific images appear in known training datasets, but music and text present different technical challenges. The honest answer: if your work is publicly accessible, it may have been scraped. You likely cannot confirm it without significant investigation.

Can I send a DMCA takedown to an AI company for training on my work?

The DMCA's takedown process targets infringing content that is stored and accessible: a video, an image, a file. AI training data is not typically stored and served in that way after the training process completes. DMCA takedowns are generally not the right tool for addressing past training data use. They may be relevant if an AI company is reproducing or distributing your specific work in its outputs, or if your work remains in an accessible dataset. This is a fact-specific question that depends on exactly what the company is doing with the content.

How do I opt my music or artwork out of AI training?

Opt-out options are available on some platforms but are inconsistent and limited in scope. Meta offers an opt-out for using your content to train its AI. Some AI music platforms allow catalog opt-outs through rights management systems. The AIRights Guide documents current opt-out options across major platforms. These opt-outs typically affect only future training. They do not remove your work from datasets that were already built. Using them is still worthwhile as a protective measure.
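For creators who run their own website, one widely used opt-out signal is a robots.txt file that asks specific AI crawlers not to fetch the site. The sketch below is illustrative only. The crawler names listed (GPTBot, CCBot, Google-Extended) are tokens their operators have publicly documented, but the list is an assumption that will go stale, and the helper function names are hypothetical. Robots.txt is a voluntary convention: it only affects crawlers that choose to honor it, only covers a site you control, and does nothing about copies of your work hosted elsewhere or already collected.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI-related crawler tokens (an assumption, not exhaustive).
# Check each operator's current documentation before relying on it.
AI_CRAWLER_TOKENS = ["GPTBot", "CCBot", "Google-Extended"]

def build_robots_txt(tokens):
    """Return robots.txt text that disallows every path for each listed crawler."""
    blocks = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(blocks) + "\n"

def is_blocked(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent is asked not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

robots_txt = build_robots_txt(AI_CRAWLER_TOKENS)
print(robots_txt)

# Sanity check with a placeholder URL: a listed AI crawler is blocked,
# while an unlisted crawler (hypothetical name) is unaffected.
print(is_blocked(robots_txt, "GPTBot", "https://example.com/music/track.mp3"))
print(is_blocked(robots_txt, "ExampleSearchBot", "https://example.com/music/track.mp3"))
```

The generated text would be saved as robots.txt at the root of your site. Like the platform opt-outs described above, it affects future crawling only.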

Can I sue an AI company for training on my work without permission?

You may have a claim, but the practical barriers for individual creators are significant. A standalone copyright infringement lawsuit against a large AI company involves substantial legal costs, years of litigation, and uncertain outcomes. Groups of creators have filed class action lawsuits, which spread those costs across many plaintiffs. If you believe your work was used and you want to evaluate your specific options, consult a copyright attorney. For most individual creators, collective action through industry organizations or class litigation is currently a more viable path than suing alone.

What did the U.S. Copyright Office say about AI training?

In Part 3 of its Report on Copyright and Artificial Intelligence, released in May 2025, the Copyright Office concluded that the use of copyrighted works in AI training may constitute prima facie copyright infringement. On the fair use question, the Office's analysis "generally favours copyright owners." The Office recommended that Congress consider voluntary licensing frameworks and potentially a statutory mechanism to ensure creators are compensated when their work is used in AI training. The report does not resolve pending litigation, but it is the clearest statement on the issue from the federal government's copyright authority.

Does registering my copyright help if my work was used to train an AI?

Registration does not retroactively remedy past unlicensed training. But it matters for your legal position. Copyright registration is required before you can file a lawsuit for infringement in U.S. federal court. Works registered before the infringement, or within three months of first publication, are eligible for statutory damages and attorney fees, which are the remedies that make litigation financially viable. If you have not registered your catalog, registering now strengthens your position for any future infringement, including claims related to AI use of your work.
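The timing rule described above can be sketched as a simple date check. This is a simplified illustration, not legal advice or a complete statement of the law: it covers only the two conditions named in this article (registered before the infringement began, or within three months of first publication), it ignores the separate treatment of unpublished works, and the function names and sample dates are hypothetical.

```python
from datetime import date
import calendar

def add_months(start, months):
    """Return the date `months` calendar months after `start`, clamped to month end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def eligible_for_statutory_damages(first_publication, registration, infringement_start):
    """Simplified timing check based on the two conditions described in the article."""
    registered_before_infringement = registration < infringement_start
    registered_within_three_months = registration <= add_months(first_publication, 3)
    return registered_before_infringement or registered_within_three_months

# Registered six weeks after release; infringement began in between.
print(eligible_for_statutory_damages(date(2024, 1, 15), date(2024, 3, 1), date(2024, 2, 1)))

# Registered years after release, and after the infringement began.
print(eligible_for_statutory_damages(date(2022, 6, 1), date(2025, 1, 10), date(2023, 9, 1)))

# Same late registration, but the infringement starts after it.
print(eligible_for_statutory_damages(date(2022, 6, 1), date(2025, 1, 10), date(2025, 6, 1)))
```

The second and third cases show why registering an older catalog still matters: it cannot reach infringement that already started, but it can cover infringement that begins afterward.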

What is the difference between AI training infringement and AI output infringement?

These are two separate legal questions. Training infringement is about whether copying your work into an AI training dataset, without permission, violates your copyright. Output infringement is about whether a specific AI-generated output reproduces or closely imitates your protected expression. You can have a training claim without an output claim, and an output claim without a training claim. The Suno and Udio suits focused primarily on training. Output claims require showing that a specific generated work is substantially similar to a specific copyrighted work you own.

Are AI companies going to be required to license creators' work in the future?

That is the direction legislative and regulatory pressure is moving. The U.S. Copyright Office recommended that Congress consider licensing frameworks. Several bills have been proposed. The EU's AI Act and related regulations impose disclosure requirements on AI training data. Major labels have established licensing templates through their settlements. Whether a mandatory U.S. licensing scheme passes and what it would look like remains to be seen. The realistic expectation for independent creators: some licensing framework will emerge, but the timeline and structure are uncertain.

This article is for general information only — not legal advice.

About the author

Aghil Ebrahimi, Esq.

Founder of StarGuard Law. Trilingual IP and technology attorney licensed in California, Ontario, and Quebec. Former touring artist and tech founder who now represents creators, founders, and agencies at the intersection of law, technology, and culture.
