Fmllm: 4mb training data, 100mb model, Fibonacci embeddings, near-coherent. WTF?

Universal Pictures Adds ‘May Not Be Used to Train AI’ to the End of its Movies https://petapixel.com/2025/08/13/universal-pictures-adds-may-not-be-used-to-train-ai-to-the-end-of-its-movies/ #universalstudios #trainingdata #Technology #aitraining #universal #News
Stability AI Wants a Spotify-Type Model for Images and AI Training Data https://petapixel.com/2025/07/31/stability-ai-wants-a-spotify-type-model-for-images-and-ai-training-data/ #aitrainingdata #trainingdata #stabilityai #Technology #aitraining #spotify #News
6 Essential Data Annotation Techniques that Drive Computer Vision
Our latest video on the 6 common types of annotation in Computer Vision reveals how the perfect blend of human intelligence and cutting-edge data annotation techniques can significantly enhance the performance and scalability of your AI and ML models.
“Suno, for those of you not familiar, is an #AI #SongGenerator: enter a text prompt (such as “a jazz, reggae, EDM pop song about my imagination”) and a song comes back. Like many #GenerativeAI companies, it is also being sued by all and sundry for ingesting #copyrighted #material. The parties in the suit — including major labels and the #RIAA — don’t have a smoking gun, since they can’t directly peek at Suno’s #TrainingData. But they have managed to generate some suspiciously similar-sounding AI generated materials, #mimicking (among others) “Johnny B. Goode,” “Great Balls of Fire,” and Jason Derulo’s habit of singing his own name.
#Suno essentially admits these songs were #regurgitated from #copyrighted source material, but it says such use was legal. “It is no secret that the tens of millions of #recordings that Suno’s model was trained on presumably included recordings whose rights are owned by the Plaintiffs in this case,” it says in its own legal filing. Whether AI training data constitutes fair use is a common but unsettled legal argument, and the plaintiffs contend Suno still amounts to “pervasive #illegal #copying” of artists’ works.”
#NYA / #music / #ElizabethLopatto / #amazon / #DataTheft <https://neilyoungarchives.com/news/3/article?id=Music%20-%20Amazon%20is%20blundering%20into%20an%20AI%20copyright%20nightmare>
Ethical AI Image Generator Bria Releases Next-Gen Model That Didn’t Steal Your Data https://petapixel.com/2025/07/01/ethical-ai-image-generator-bria-releases-next-gen-model-that-didnt-steal-your-data/ #aiimagegenerator #trainingdata #Technology #ethicalai #News #bria
It appears that we are not providing sufficient training data for Meta's AI tools. I mean, you can now upload your private stories to get AI suggestions for your next Instagram story.
#AI #trainingdata
https://thehackernews.com/2025/06/facebooks-new-ai-tool-requests-photo.html
Getty Images Drops Main Copyright Claims Against Stability AI in UK Legal Case https://petapixel.com/2025/06/26/getty-images-drops-main-copyright-claims-against-stability-ai-in-uk-legal-case/ #aiimagegenerator #gettyvsstability #machinelearning #stablediffusion #trainingdata #gettyimages #stabilityai #Technology #aitraining #copyright #aiimage #News #Law
Anthropic destroyed millions of print books to build its AI models https://arstechni.ca/YLWE #InternetArchive #machinelearning #AIdevelopment #bookscanning #legalrulings #trainingdata #AIcompanies #googlebooks #AIresearch #AItraining #Anthropic #copyright #AIethics #scanning #fairuse #Biz&IT #Policy #Claude #AIlaw #AI
Anthropic destroyed millions of print books to build its AI models - On Monday, court documents revealed that AI company Anthropi... - https://arstechnica.com/ai/2025/06/anthropic-destroyed-millions-of-print-books-to-build-its-ai-models/ #internetarchive #machinelearning #aidevelopment #bookscanning #legalrulings #trainingdata #aicompanies #googlebooks #airesearch #aitraining #anthropic #copyright #aiethics #scanning #fairuse #biz #policy #claude #ailaw #ai
#MIT researchers developed a method called #SelfAdapting #LanguageModels (#SEAL) that enables large language models to continuously #learn and #improve by generating #synthetic #trainingdata and updating their parameters based on new information. https://www.wired.com/story/this-ai-model-never-stops-learning/?eicker.news #tech #media #news
“Broad didn’t train his #AI on #Rothko; he didn’t train it on any #data at all. By hacking a #NeuralNetwork, and locking elements of it into a #recursive #loop, he was able to induce this AI into producing #images without any #TrainingData at all — no inputs, no influences.
Depending on your perspective, Broad’s art is either a pioneering display of pure artificial creativity, a look into the very soul of AI, or a clever but meaningless electronic by-product, closer to guitar feedback than music.
In any case, his work points the way toward a more creative and ethical use of #GenerativeAI beyond the large-scale manufacture of #DerivativeSlop now oozing through our visual culture.”
#Art / #TerenceBroad / #UnstableEquilibrium <https://www.theverge.com/ai-artificial-intelligence/688576/feed-ai-nothing>
Disney and Universal Studios Sue AI Image Generator Midjourney Over Copyright https://petapixel.com/2025/06/11/disney-and-universal-studios-sue-ai-image-generator-midjourney-over-copyright/ #aiimagegenerator #universalstudios #generativeai #trainingdata #aicopyright #midjourney #disney #News #Law
Oh, look! A thrilling 120-page #novella about Claude 4 System Cards, because in the #AI world, longer is clearly better... right?
Filled with steaming details for those who miss "Person of Interest" fan fiction and enjoy deciphering cryptic training data.
Meanwhile, landing pages remain a mythical creature in Anthropic's universe.
https://simonwillison.net/2025/May/25/claude-4-system-card/ #Claude4 #FanFiction #TrainingData #MythicalCreatures #HackerNews #ngated
Facial recognition algorithms developed in East Asia performed better on Asian subjects, while Western algorithms performed better on White subjects. This discrepancy is attributed to differences in the racial distribution of the training sets. #TrainingData #AIAccuracy
Just wondering how you collect as much Mastodon content as possible for AI training purposes?
#AI #Mastodon #trainingdata
Open Web Crawl is such a security vulnerability that I don’t know why it isn’t at the top of the news every day.
If you turn on a general suction hose, how do you not realise there’s going to be a party of attackers right there feeding it all the #propaganda they possibly can?
How can you be so nonchalant about it? How do you not realise you created the biggest attack vector in the history of computing?
The EU’s #AI Challenge: Can Europe Compete Without Enough #TrainingData?
Daniel Friedlaender explains why AI #innovation depends on #data diversity – and how outdated #privacy approaches are a disadvantage in the global AI race.