Meta Pirated 53 of My Books and Stories to Train Their AI


A fabulous analysis in The Atlantic by Alex Reisner, The Unbelievable Scale of AI’s Pirated-Books Problem, shows how employees at Meta, all the way up to “MZ” (presumably Zuckerberg), chose to steal 7.5 million books and 81 million research papers to train the company’s flagship AI model, Llama 3.

Based on recently released court documents, Reisner says Meta employees downloaded and used the Library Genesis (LibGen) database, one of the largest pirated libraries available online. 

Within the database are 53 of my own copyrighted works, including various editions of the books I’ve written, some co-authored books, a book where I wrote a foreword, and one story published in a magazine.

Move Fast and Steal Things

I know it seems like I’m beating up on Meta this month, having written Freedom of Speech? The Book Facebook Wants to Suppress just last week.

However, it’s a coincidence that the new book Careless People: A Cautionary Tale of Power, Greed, and Lost Idealism by former Facebook executive Sarah Wynn-Williams and the Atlantic article by Alex Reisner were both published within the last few days.

Reisner says Meta has argued in court “that it’s ‘fair use’ to train their generative-AI models on copyrighted work without a license, because LLMs ‘transform’ the original material into new work.” However, Reisner shares details from court documents showing that “Meta employees acknowledged in their internal communications that training Llama on LibGen presented a ‘medium-high legal risk,’ and discussed a variety of ‘mitigations’ to mask their activity.”

It’s not like they can’t afford to pay for data. Meta Platforms is one of the most valuable companies in the world. In 2024, its annual revenue was $164 billion and its net profit was $62 billion (both in US dollars).

AI is transforming the world

Yes, I am an enthusiastic and vocal supporter of the power of Artificial Intelligence to transform business and life.

I use AI every day in my work (such as the AI generated image to illustrate this post).

I’ve written often about AI and delivered talks to thousands of people on the topic. I’ve also invested in and advise AI companies. So yes, I am all in on AI.

At the same time, for more than twenty years I’ve pioneered the idea of using content as a form of marketing. I’ve advocated putting content out there for free to educate and inform potential customers, the media, investors, and others. I’ve published nearly 2,000 blog posts, have been a guest on hundreds of podcasts, and all of this is freely available. I’ve got a bunch of videos of my talks available for anyone to watch. I’ve published a bunch of free ebooks and hundreds of LinkedIn articles.

I’m totally cool with Generative AI tools training on all content that I put out there for free! Have at it.

However, Meta chose to rip off my paid content without my permission (or my publishers’ permission) and use it in ways I didn’t authorize.

Not cool, Zuck, not cool.

Images: Via ChatGPT from the prompt: To illustrate a blog post, I want an image of some hardcover books on a desktop or tabletop. Coming out of each book is a USB port with a cord that is connected to a computer.  Book database output via The Atlantic.
