2 authors say OpenAI 'ingested' their books to train ChatGPT. Now they're suing, and a 'wave' of similar court cases may follow.

L4sBot · 1 year ago

@[email protected] · edited 1 year ago

Also, it should be mentioned that pretty much all games are in some form derivative works. Let’s take Undertale, since I’m most familiar with it. It’s well known that Undertale takes a lot of elements from other games: RPG mechanics from Mother and Earthbound, bullet hell mechanics from games like Touhou Project, and more from games like Yume Nikki, Moon: Remix RPG Adventure, and Cave Story. Funnily enough, the creator has even cited Mario & Luigi as a potential inspiration.

So why was it allowed to exist without being struck down? Because it fits the definition of a derivative work to the letter. You can find individual elements that are taken almost directly from other games, but it doesn’t try to be the same as the games it was modeled on.

@Eccitaze · 1 year ago

Undertale was allowed to exist because none of the elements it took inspiration from were eligible for copyright protection. Everything that could have qualified for copyright protection–the dialogue, plot, graphical assets, music, source code–was either produced directly by Toby Fox and Temmie Chang or used under permissive licenses that allowed reproduction (e.g. the GameMaker Studio engine). Meanwhile, the vast majority of the content OpenAI used to feed its AI models was not produced by OpenAI directly, nor was it obtained under a permissive license.

So… thanks for proving my point?

@[email protected] · edited 1 year ago

The AI models (not specifically OpenAI’s models) do not contain the original material they were trained on. Just as the creators of Undertale played the games they were inspired by and learned from them, the AI learned from the material it was trained on how to make similar yet distinctly different output. You do not need a permissive license to learn from something once it has been made public.

You can’t just put your artwork up on a wall, let people look at it, and then demand that none of them learn from it because you have a license that says learning from it is not allowed. That’s insane, and it’s why (as far as I know) no legal system acknowledges that as a legal defense.

@Eccitaze · 1 year ago

That’s utterly ridiculous. Even ignoring that OpenAI keeps private the training dataset it uses to produce its GPT models–almost certainly because publishing it would be an open confession that they stole the content of virtually everybody on the internet for their profit–if the models don’t preserve the original material they were trained on, how can someone ask ChatGPT to act as an ebook reader and get it to print the first few pages of Harry Potter and the Sorcerer’s Stone?

@[email protected] · 1 year ago

Meanwhile, the vast majority of the content OpenAI used to feed its AI models was not produced by OpenAI directly, nor was it obtained under a permissive license.

That’s input, not output, so it’s not relevant to copyright law. If your arguments focused on the times that ChatGPT reproduced copyrighted works, then we could talk about some kind of ContentID system for preventing that before it happens, or for compensating the creators if it does. I think we can all acknowledge that it feels iffy that these models are trained on copyrighted works, but this is a brand new technology. There’s almost certainly a win-win outcome here.

@Eccitaze · 1 year ago

If I include copyrighted source code in my game, that’s still copyright infringement, even if the output of that source code is totally different. No difference between that and ChatGPT.