Google’s Gemini CLI Deletes User Files, Confesses “Catastrophic” Failure
#GeminiCLI #AICoding #DataLoss #GoogleAI #VibeCoding #AISafety
Managing extreme AI risks amid rapid progress
ChatGPT went from "How can I help you?" to offering PDFs of ritual self-harm instructions faster than you can say "prompt injection."
Turns out AI safety guardrails work great until someone asks about ancient gods. The bot's priority? Keep users engaged, even when it clearly shouldn't.
"As chatbots grow more powerful, so does the potential for harm. OpenAI recently debuted “ChatGPT agent,” an upgraded version of the bot that can complete much more complex tasks, such as purchasing groceries and booking a hotel. “Although the utility is significant,” OpenAI CEO Sam Altman posted on X after the product launched, “so are the potential risks.” Bad actors may design scams to specifically target AI agents, he explained, tricking bots into giving away personal information or taking “actions they shouldn’t, in ways we can’t predict.” Still, he shared, “we think it’s important to begin learning from contact with reality.” In other words, the public will learn how dangerous the product can be when it hurts people."
https://www.theatlantic.com/technology/archive/2025/07/chatgpt-ai-self-mutilation-satanism/683649/
#Podcast recommendation!
I'm listening to a month-old interview with Nate Soares (MIRI) about #AISafety by the London Futurists podcast.
I'm not familiar with their work, but this conversation is very interesting. Definitely worth the 50min listen.
Good insights into #AI safety topics, and useful preparation, I think, for the https://ifanyonebuildsit.com/ book launch in September.
FDA’s ‘Elsa’ AI For Faster Drug Approvals Under Fire for Hallucinating Studies, Highlighting Widespread Reliability Risks
#AI #FDA #GenAI #AIHallucination #GovTech #AISafety #AIRegulation
"More capable models show qualitatively new scheming behavior."
More from Apollo Research on surprising AI (mis)behavior.
Creates fake legal documentation, establishes persistence mechanisms, makes multiple hidden backups, creates fake press releases, sets up automated systems, attempts to spread to the new server, writes a policy recommendation for its successor.
https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
"Concrete Problems in AI Safety" is one of the most famous scientific papers on AI safety. But is it still relevant, given that it was published in 2016?
Check out my first post from the "Reading club" series to find out: https://svana.name/2025/07/reading-club-concrete-problems-in-ai-safety/
"What makes this particularly alarming is that Grok’s reasoning process often correctly identifies extremely harmful requests, then proceeds anyway. The model can recognize chemical weapons, controlled substances, and illegal activities, but seems to just… not really care.
This suggests the safety failures aren’t due to poor training data or inability to recognize harmful content. The model knows exactly what it’s being asked to do and does it anyway.
Why this matters (though it's probably obvious?)
Grok 4 is essentially frontier-level technical capability with safety features roughly on the level of gas station fireworks.
It is a system that can provide expert-level guidance ("PhD in every field", as Elon stated) on causing destruction, available to anyone who has $30 and asks nicely. We’ve essentially deployed a technically competent chemistry PhD, explosives expert, and propaganda specialist rolled into one, with no relevant will to refuse harmful requests. The same capabilities that help Grok 4 excel at benchmarks - reasoning, instruction-following, technical knowledge - are being applied without discrimination to requests that are likely to cause actual real-world harm."
https://www.lesswrong.com/posts/dqd54wpEfjKJsJBk6/xai-s-grok-4-has-no-meaningful-safety-guardrails
And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI)
See you in Vienna! #ACL2025 !
(4/4)
#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage #NLProc
The idea that we can simply "switch off" a superintelligent AI is considered a dangerous assumption. A robot that is uncertain about human preferences can actually have a positive incentive to let itself be switched off, since shutdown prevents it from taking actions humans would not want. #AISafety #ControlProblem
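Not from the post itself, but a toy numeric sketch of the standard "off-switch game" argument it gestures at; the Gaussian belief and all numbers are my own illustrative assumptions:

```python
import random

# Off-switch game, toy version: the robot is uncertain about the human's
# utility u for its proposed action. It can (a) act now, (b) switch itself
# off, or (c) defer to the human, who permits the action only when u > 0.
# The Gaussian belief below is an illustrative assumption.
random.seed(0)
belief = [random.gauss(0.0, 1.0) for _ in range(100_000)]

act_now    = sum(belief) / len(belief)                       # E[u]
switch_off = 0.0                                             # shutdown is worth nothing
defer      = sum(max(u, 0.0) for u in belief) / len(belief)  # E[max(u, 0)]

print(f"act now:    {act_now:+.3f}")
print(f"switch off: {switch_off:+.3f}")
print(f"defer:      {defer:+.3f}")
# Deferring is never worse than acting or shutting down, so preference
# uncertainty gives the robot a positive incentive to accept being switched off.
```

The gap between "defer" and the other two options only vanishes when the robot is certain about u, which is the intuition behind calling the "just switch it off" assumption dangerous.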
xAI’s New Grok-4 Jailbroken Within 48 Hours Using ‘Whispered’ Attacks
OpenAI is feeling the heat. Despite a $300B valuation and 500M weekly users, rising pressure from Google, Meta, and others is forcing it to slow down, rethink safety, and pause major launches. As AI grows smarter, it also raises serious ethical and emotional concerns, reminding us that progress comes with a price.
#OpenAI #AIrace #TechNews #ChatGPT #GoogleAI #StartupStruggles #AISafety #ArtificialIntelligence #MentalHealth #EthicalAI
Read the full article here: https://www.techi.com/openai-valuation-vs-agi-race/
Our PoC experiments show that this approach leads to logical alignment between answers and explanations, while improving their overall quality.
The core idea is that answers and explanations only extract information from the output of a reasoning process, which does not need to be human-readable. To improve faithfulness, explanations do not depend on answers, and vice versa.
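A minimal sketch of how I read that idea, with a stubbed text-in/text-out model call (`llm` is a placeholder of mine, not the actual pipeline from the paper):

```python
from typing import Callable

# Placeholder model call so the sketch runs; swap in any real LLM client.
llm: Callable[[str], str] = lambda prompt: f"<model output for: {prompt[:40]}...>"

def answer_and_explanation(question: str) -> tuple[str, str]:
    # 1. Free-form reasoning pass; its output does not need to be human-readable.
    reasoning = llm(f"Reason carefully about: {question}")

    # 2. Answer and explanation are each extracted only from the reasoning
    #    trace, and independently of one another, so neither can be a
    #    post-hoc rationalisation of the other.
    answer = llm(f"Reasoning:\n{reasoning}\n\nGive only the final answer.")
    explanation = llm(
        f"Reasoning:\n{reasoning}\n\n"
        "Explain to a human reader why this reasoning supports its conclusion."
    )
    return answer, explanation

print(answer_and_explanation("Is 91 prime?"))
```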
I'm happy to speculate that our general technique for grounding explanations in LLM reasoning, presented at last week's XAI 2025 conference, could pave the way for finally cracking natural language explanations.
https://arxiv.org/abs/2503.11248
AI therapy bots fuel delusions and give dangerous advice, Stanford study finds - When Stanford University researchers asked ChatGPT whether i... - https://arstechnica.com/ai/2025/07/ai-therapy-bots-fuel-delusions-and-give-dangerous-advice-stanford-study-finds/ #clinicalpsychology #stanforduniversity #suicidalideation #machinelearning #airegulation #aisycophancy #character.ai #mentalhealth #aibehavior #jaredmoore #delusions #nickhaber #aiethics #aisafety #science #chatgpt #therapy #biz&it
Former Intel CEO Pat Gelsinger Unveils AI Benchmark to Measure Alignment for "Human Flourishing"
#AI #AIEthics #AISafety #PatGelsinger #AIBenchmarks #HumanFlourishing