Meta allegedly uses pirated content to train AI models

Mark Zuckerberg, CEO of Meta, has been accused of deliberately using copyrighted material from pirated books to train artificial intelligence (AI) models, according to new court documents.

Social Samosa

Meta has been accused of downloading materials from an online platform, Library Genesis (LibGen), which is the subject of a copyright infringement lawsuit, to train its artificial intelligence models. The accusation comes as part of a court filing in the ongoing case 'Richard Kadrey et al vs Meta Platforms,' in which authors, including novelist Richard Kadrey and comedian Sarah Silverman, claim that their copyrighted works were illegally used to train AI models.

The filing alleges that Meta downloaded documents from LibGen, a website known for distributing pirated books and other content, despite the ongoing legal challenges it faces from textbook publishers who accuse it of hosting and distributing stolen works. The plaintiffs claim that internal Meta documents reveal a debate over accessing LibGen, with some employees hesitant about using BitTorrent to download the content, before the download was approved by a senior figure within the company.

One document filed by the plaintiffs also claims that the company removed copyright notices from materials downloaded from LibGen, possibly to avoid revealing that its models were trained using copyrighted content. Another filing, submitted by Meta, contests these claims, arguing that the use of LibGen was already known and documented months prior.

The dispute centres on the plaintiffs’ attempt to introduce a new legal claim under the California Comprehensive Computer Data Access and Fraud Act, which criminalises the unauthorised access of computers or networks with fraudulent intent. Meta, however, argues that this additional claim is unwarranted.

Meta has also rejected claims that it 'distributed' content from LibGen, countering the plaintiffs' assertion that using BitTorrent to download the content constitutes sharing pirated material. The company sought to seal some of the filings, arguing they contained commercially sensitive information, but the court denied the request, with the judge noting that the company's desire to prevent publicity appeared to be the main motive.

In a document Meta wanted to seal, an employee acknowledged that media coverage of the use of LibGen could damage the company’s position with regulators. Meta's alleged use of LibGen reflects ongoing concerns over the ethical sourcing of training data for AI models, an issue that has sparked widespread debate in the tech industry.
