
Leaked Google document: “We Have No Moat, And Neither Does OpenAI”

The most interesting thing I've read recently about LLMs - a purportedly leaked document from a researcher at Google talking about the huge strategic impact open source models are having
simonwillison.net/2023/May/4/n

Jim Gardner

@simon Hey Simon, I’ve been holding off on using ChatGPT, Bard, etc., even though I think they could be useful. This is because I can see (especially with ChatGPT) the horrible unethical behaviour that the companies are using in their arms race to deploy deploy deploy. With all the talk in this leaked doc about open source alternatives, do you know of any LLMs that are “ethically sourced” and available for the average punter to use? I don’t want to be left behind :/

@jimgar the ethics of this stuff is incredibly complicated

I'm very optimistic about the models being trained on the RedPajama data - there's one out already and evidently more to follow very shortly simonwillison.net/tags/redpaja


Claude is one of the most promising closed alternatives to ChatGPT - Anthropic take an interesting approach to AI safety which they call "constitutional AI" anthropic.com/index/introducin


@simon thank you so much, l’ll give these a look. Everywhere I look in tech it’s one ethical nightmare after another 😵‍💫

@simon what's your take on the copyrighted material included in RedPajama through CommonCrawl? It seems to me that one could train a model on only text that has been shared freely and that might be more ethical. cc @jimgar

@resing @jimgar I'm not convinced it's possible to train a usable LLM without including copyrighted material in the raw pretraining data

As such, I personally think it's a necessary evil to avoid LLM technology becoming a monopoly of the organizations that are willing to train against crawler data

@simon @jimgar not sure I follow. Are you saying that crawler data, which includes copyrighted material, shouldn’t be used by commercial companies, and that LLMs are inherently flawed because of that? If so, I’m not saying you’re wrong, just trying to understand.

@simon @resing it *all* feels fundamentally wrong, so long as the results rely on indiscriminate harvesting of people’s work without permission. Literally the only compelling argument I have heard is the “necessary evil” Simon mentions - doing it anyway but making it open source. I just find it sad that this is the position we’re in at all, and worse, how little the majority of people seem to care about provenance and permissions full stop.

@jimgar @resing search engines work by indiscriminately harvesting people's work without their permission, and have done for decades

What's different here isn't how the things are built, it's what they can be used for

People mostly tolerated search engines because they saw them as useful - they helped people's work be found, they didn't (appear to) threaten their livelihoods

@jimgar @resing note that I'm not saying that search engines were morally/ethically pure here either!

The ethics around this are deeply complicated - there are no easy or obvious answers

@simon @jimgar the legal issue might be resolved soon. If @binarybits@instinctd.com is right, Stable Diffusion could lose the copyright lawsuit filed against them. I buy his argument in favor of that. If that's the case, LLMs trained only on sets that allow that use might really take off arstechnica.com/tech-policy/20
