cfiesler https://cfiesler.tumblr.com/post/702901922538061824/elon-musk-did-not-create-an-ai-trained-on-your :
Hi, AI ethicist + fanfiction expert here. (This is one of those times where I feel uniquely qualified to comment on something…)
I’m seeing this weird game of telephone about the Sudowrite AI that I think started out pretty accurate, but now has become “Elon Musk created an AI that is stealing your fanfiction” (which frankly gives him far too much credit). I can probably say more about this, but here are a few things that I want to clarify for folks, which can be boiled down to “Elon Musk has nothing to do with this” and “this is nothing new”:
Elon Musk is not involved in any way with Sudowrite, as far as I can tell. Sudowrite does, however, use GPT-3, the widely used large language model created by OpenAI, which Elon Musk co-founded. He resigned from OpenAI's board in 2018, citing a conflict of interest due to Tesla’s AI development. It wasn’t until after he left that OpenAI went from being a non-profit to a capped-profit company. Elon Musk doesn’t have anything to do with OpenAI currently (and in fact just cut off their access to Twitter data), though I can’t find anything that confirms whether or not he might have shares in the company. I would also be shocked if Elon actually contributed anything but money to the development of GPT-3.
Based on Sudowrite’s description in their FAQ (https://www.sudowrite.com/FAQ), they are not collecting any training data themselves; they’re just using GPT-3 paired with their own proprietary narrative model. And GPT-3 is trained on datasets like Common Crawl and WebText, which can simplistically be described as “scraping the whole internet.” The same goes for OpenAI’s DALL-E art generator, which was also trained on data scraped from the web. So it’s not surprising that AO3 would be in that dataset, along with everything else (e.g., Tumblr posts, blogs, news articles, all the words people write online) that doesn’t use technical means to prohibit scraping.
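For anyone curious what “technical means to prohibit scraping” can look like in practice: one common mechanism is a robots.txt file, which well-behaved crawlers check before fetching pages. Here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The robots.txt contents and the example.org URL are made up for illustration (they are not any real site’s rules), and `can_fetch` is just a helper name chosen here; `CCBot` is the user agent that Common Crawl’s crawler identifies itself with.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: it asks Common Crawl's
# crawler (CCBot) to stay out, and allows every other crawler.
EXAMPLE_ROBOTS = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/works/12345"  # made-up URL
    print("CCBot allowed:", can_fetch(EXAMPLE_ROBOTS, "CCBot", page))
    print("Other bot allowed:", can_fetch(EXAMPLE_ROBOTS, "SomeOtherBot", page))
```

Worth keeping in mind that robots.txt is a request rather than an enforcement mechanism: it only works for crawlers that choose to honor it.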
OpenAI does make money now, including from companies like Sudowrite paying for access to GPT-3. And Sudowrite itself is a paid service. So yes, someone is profiting from its use (though returns to OpenAI’s first-round investors are capped at 100x their investment). I think that the conversations about art (whether visual or text) being used to train these models without the consent of the artist are important conversations to be having.
I think it’s possible that what OpenAI is doing is legal (i.e., not copyright infringement) for some of the same reasons that fanfiction is legal (or perhaps more accurately, for reasons that many for-profit remixes are found to be fair use), but I think whether it’s ETHICAL is a completely different question, and I’ve seen a huge amount of disagreement on this.
But the last thing I will say is that this is nothing new. GPT-3 has been around for years and it’s not even the first OpenAI product to have used content scraped from the web.