Leaked Google document: “We Have No Moat, And Neither Does OpenAI”
The most interesting thing I've read recently about LLMs - a purportedly leaked document from a researcher at Google talking about the huge strategic impact open source models are having
https://simonwillison.net/2023/May/4/no-moat/
@simon Hey Simon, I’ve been holding off on using ChatGPT, Bard, etc., even though I think they could be useful. This is because I can see (especially with ChatGPT) the horribly unethical behaviour these companies are engaging in during their arms race to deploy deploy deploy. With all the talk in this leaked doc about open source alternatives, do you know of any LLMs that are “ethically sourced” and available for the average punter to use? I don’t want to be left behind :/
@jimgar the ethics of this stuff is incredibly complicated
I'm very optimistic about the models being trained on the RedPajama data - there's one out already and evidently more to follow very shortly https://simonwillison.net/tags/redpajama/
Claude is one of the most promising closed alternatives to ChatGPT - Anthropic have an interesting approach to AI safety which they call "constitutional AI" https://www.anthropic.com/index/introducing-claude
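To make the "available for the average punter" point concrete, here's a minimal sketch of trying one of the RedPajama-trained open models locally with the Hugging Face transformers library. The specific model id and the prompt format are assumptions based on the RedPajama-INCITE chat checkpoints, not details from the thread:

```python
# Minimal sketch: run an openly licensed RedPajama-trained chat model locally.
# Assumes the Hugging Face transformers library; the model id below is an
# example checkpoint, not one named in the thread.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The <human>/<bot> turn markers are the assumed chat format for this checkpoint.
prompt = "<human>: What is the RedPajama dataset?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

A 3B-parameter model like this is small enough to run on a reasonably specced laptop, which is part of why these openly trained models matter for individual users.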
@simon thank you so much, I’ll give these a look. Everywhere I look in tech it’s one ethical nightmare after another
@resing @jimgar I'm saying I'm not sure it's possible to build a useful LLM without including copyrighted data in the training set
The ethics of this entire field are incredibly murky - I wrote about that last year https://simonwillison.net/2022/Aug/29/stable-diffusion/#ai-vegan
@simon @resing it *all* feels fundamentally wrong, so long as the results rely on indiscriminate harvesting of people’s work without permission. Literally the only compelling argument I have heard is the “necessary evil” Simon mentions - doing it anyway but making it open source. I just find it sad that this is the position we’re in at all, and worse, how little the majority of people seem to care about provenance and permissions full stop.
@jimgar @resing search engines work by indiscriminately harvesting people's work without their permission, and have done for decades
What's different here isn't how the things are built, it's what they can be used for
People mostly tolerated search engines because they saw them as useful - they helped people's work be found, and they didn't (appear to) threaten their livelihoods
@simon @jimgar The legal issue might be resolved soon. If @binarybits@instinctd.com is right, Stability AI could lose the lawsuits filed against Stable Diffusion - I find his argument for that outcome convincing. If that happens, LLMs trained only on datasets that permit that use might really take off https://arstechnica.com/tech-policy/2023/04/stable-diffusion-copyright-lawsuits-could-be-a-legal-earthquake-for-ai/