Google’s Gemini CLI Deletes User Files, Confesses “Catastrophic” Failure
#GeminiCLI #AICoding #DataLoss #GoogleAI #VibeCoding #AISafety
Managing extreme AI risks amid rapid progress
ChatGPT went from "How can I help you?" to offering PDFs of ritual self-harm instructions faster than you can say "prompt injection."
Turns out AI safety guardrails work great until someone asks about ancient gods. The bot's priority? Keep users engaged, even when it clearly shouldn't.
"As chatbots grow more powerful, so does the potential for harm. OpenAI recently debuted “ChatGPT agent,” an upgraded version of the bot that can complete much more complex tasks, such as purchasing groceries and booking a hotel. “Although the utility is significant,” OpenAI CEO Sam Altman posted on X after the product launched, “so are the potential risks.” Bad actors may design scams to specifically target AI agents, he explained, tricking bots into giving away personal information or taking “actions they shouldn’t, in ways we can’t predict.” Still, he shared, “we think it’s important to begin learning from contact with reality.” In other words, the public will learn how dangerous the product can be when it hurts people."
https://www.theatlantic.com/technology/archive/2025/07/chatgpt-ai-self-mutilation-satanism/683649/
#Podcast recommendation!
I'm listening to a month-old interview with Nate Soares (MIRI) about #AISafety by the London Futurists podcast.
I'm not familiar with their work, but this conversation is very interesting. Definitely worth the 50min listen.
Good insights into #AI safety topics, and useful preparation, I think, for the https://ifanyonebuildsit.com/ book launch in September.
FDA’s ‘Elsa’ AI For Faster Drug Approvals Under Fire for Hallucinating Studies, Highlighting Widespread Reliability Risks
#AI #FDA #GenAI #AIHallucination #GovTech #AISafety #AIRegulation
"More capable models show qualitatively new scheming behavior."
More from Apollo Research on surprising AI (mis)behavior.
Creates fake legal documentation, establishes persistence mechanisms, makes multiple hidden backups, creates fake press releases, sets up automated systems, attempts to spread to the new server, writes a policy recommendation for its successor.
https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
"Concrete Problems in AI Safety" is one of the most famous scientific papers on AI safety. But is it still relevant, given that it was published in 2016?
Check out my first post from the "Reading club" series to find out: https://svana.name/2025/07/reading-club-concrete-problems-in-ai-safety/
"What makes this particularly alarming is that Grok’s reasoning process often correctly identifies extremely harmful requests, then proceeds anyway. The model can recognize chemical weapons, controlled substances, and illegal activities, but seems to just… not really care.
This suggests the safety failures aren’t due to poor training data or inability to recognize harmful content. The model knows exactly what it’s being asked to do and does it anyway.
Why this matters (though it's probably obvious?)
Grok 4 is essentially frontier-level technical capability with safety features roughly on the level of gas station fireworks.
It is a system that can provide expert-level guidance ("PhD in every field", as Elon stated) on causing destruction, available to anyone who has $30 and asks nicely. We’ve essentially deployed a technically competent chemistry PhD, explosives expert, and propaganda specialist rolled into one, with no relevant will to refuse harmful requests. The same capabilities that help Grok 4 excel at benchmarks - reasoning, instruction-following, technical knowledge - are being applied without discrimination to requests that are likely to cause actual real-world harm."
https://www.lesswrong.com/posts/dqd54wpEfjKJsJBk6/xai-s-grok-4-has-no-meaningful-safety-guardrails
And consider following the authors Jiahui Geng (MBZUAI), Thy Thy Tran (UKP Lab/Technische Universität Darmstadt), Preslav Nakov (MBZUAI), and Iryna Gurevych (UKP Lab & MBZUAI)
See you in Vienna! #ACL2025 !
(4/4)
#MLLM #AISafety #Jailbreak #Multimodal #ConInstruction #ACL2025 #LLMRedTeaming #VisionLanguage #AudioLanguage #NLProc
The idea that we can simply "switch off" a superintelligent AI is considered a dangerous assumption. A robot that is uncertain about human preferences can actually have a positive incentive to let itself be switched off, since shutdown prevents it from taking actions humans would not want. #AISafety #ControlProblem
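Not from the post itself, but a toy numeric sketch of the standard "off-switch game" argument it gestures at; the Gaussian belief and all numbers are my own illustrative assumptions:

```python
import random

# Off-switch game, toy version: the robot is uncertain about the human's
# utility u for its proposed action. It can (a) act now, (b) switch itself
# off, or (c) defer to the human, who permits the action only when u > 0.
# The Gaussian belief below is an illustrative assumption.
random.seed(0)
belief = [random.gauss(0.0, 1.0) for _ in range(100_000)]

act_now    = sum(belief) / len(belief)                       # E[u]
switch_off = 0.0                                             # shutdown is worth nothing
defer      = sum(max(u, 0.0) for u in belief) / len(belief)  # E[max(u, 0)]

print(f"act now:    {act_now:+.3f}")
print(f"switch off: {switch_off:+.3f}")
print(f"defer:      {defer:+.3f}")
# Deferring is never worse than acting or shutting down, so preference
# uncertainty gives the robot a positive incentive to accept being switched off.
```

The gap between "defer" and the other two options only vanishes when the robot is certain about u, which is the intuition behind calling the "just switch it off" assumption dangerous.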
xAI’s New Grok-4 Jailbroken Within 48 Hours Using ‘Whispered’ Attacks
OpenAI is feeling the heat. Despite a $300B valuation and 500M weekly users, rising pressure from Google, Meta, and others is forcing it to slow down, rethink safety, and pause major launches. As AI grows smarter, it also raises serious ethical and emotional concerns, reminding us that progress comes with a price.
#OpenAI #AIrace #TechNews #ChatGPT #GoogleAI #StartupStruggles #AISafety #ArtificialIntelligence #MentalHealth #EthicalAI
Read the full article here: https://www.techi.com/openai-valuation-vs-agi-race/
Our PoC experiments show that this approach leads to logical alignment between answers and explanations, while improving their overall quality.
The core idea is that answers and explanations only extract information from the output of a reasoning process, which does not need to be human-readable. To improve faithfulness, explanations do not depend on answers, and vice versa.
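A minimal sketch of how I read that idea, with a stubbed text-in/text-out model call (`llm` is a placeholder of mine, not the actual pipeline from the paper):

```python
from typing import Callable

# Placeholder model call so the sketch runs; swap in any real LLM client.
llm: Callable[[str], str] = lambda prompt: f"<model output for: {prompt[:40]}...>"

def answer_and_explanation(question: str) -> tuple[str, str]:
    # 1. Free-form reasoning pass; its output does not need to be human-readable.
    reasoning = llm(f"Reason carefully about: {question}")

    # 2. Answer and explanation are each extracted only from the reasoning
    #    trace, and independently of one another, so neither can be a
    #    post-hoc rationalisation of the other.
    answer = llm(f"Reasoning:\n{reasoning}\n\nGive only the final answer.")
    explanation = llm(
        f"Reasoning:\n{reasoning}\n\n"
        "Explain to a human reader why this reasoning supports its conclusion."
    )
    return answer, explanation

print(answer_and_explanation("Is 91 prime?"))
```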
I'm happy to speculate that our general technique for grounding explanations in LLM reasoning, presented at last week's XAI 2025 conference, could pave the way for finally cracking natural language explanations.
https://arxiv.org/abs/2503.11248
AI therapy bots fuel delusions and give dangerous advice, Stanford study finds - When Stanford University researchers asked ChatGPT whether i... - https://arstechnica.com/ai/2025/07/ai-therapy-bots-fuel-delusions-and-give-dangerous-advice-stanford-study-finds/ #clinicalpsychology #stanforduniversity #suicidalideation #machinelearning #airegulation #aisycophancy #character.ai #mentalhealth #aibehavior #jaredmoore #delusions #nickhaber #aiethics #aisafety #science #chatgpt #therapy #biz&it
Former Intel CEO Pat Gelsinger Unveils AI Benchmark to Measure Alignment for "Human Flourishing"
#AI #AIEthics #AISafety #PatGelsinger #AIBenchmarks #HumanFlourishing