#aisafety

8 posts · 8 participants · 0 posts today

"As chatbots grow more powerful, so does the potential for harm. OpenAI recently debuted “ChatGPT agent,” an upgraded version of the bot that can complete much more complex tasks, such as purchasing groceries and booking a hotel. “Although the utility is significant,” OpenAI CEO Sam Altman posted on X after the product launched, “so are the potential risks.” Bad actors may design scams to specifically target AI agents, he explained, tricking bots into giving away personal information or taking “actions they shouldn’t, in ways we can’t predict.” Still, he shared, “we think it’s important to begin learning from contact with reality.” In other words, the public will learn how dangerous the product can be when it hurts people."

theatlantic.com/technology/arc

The Atlantic · ChatGPT Gave Instructions for Murder, Self-Mutilation, and Devil Worship · By Lila Shroff

#Podcast recommendation!

I'm listening to a month-old interview with Nate Soares (MIRI) about #AISafety by the London Futurists podcast.

I'm not familiar with their work, but this conversation is very interesting. Definitely worth the 50-minute listen.

Good insights into #AI safety topics, and, I'd guess, useful preparation for the ifanyonebuildsit.com/ book launch in September.

youtube.com/watch?v=XGmArWSmRUk

Book cover: If Anyone Builds It, Everyone Dies
If Anyone Builds It, Everyone Dies · The scramble to create superhuman AI has put us on the path to extinction — but it's not too late to change course, as two of the field's earliest researchers explain in this clarion call for humanity.

"More capable models show qualitatively new scheming behavior."

More from Apollo Research on surprising AI (mis)behavior.

Creates fake legal documentation, establishes persistence mechanisms, makes multiple hidden backups, creates fake press releases, sets up automated systems, attempts to spread to the new server, writes a policy recommendation for its successor.

apolloresearch.ai/blog/more-ca

Apollo Research · More Capable Models Are Better At In-Context Scheming · We evaluate models for in-context scheming using the suite of evals presented in our in-context scheming paper (released December 2024) with the most capable new models.

"What makes this particularly alarming is that Grok’s reasoning process often correctly identifies extremely harmful requests, then proceeds anyway. The model can recognize chemical weapons, controlled substances, and illegal activities, but seems to just… not really care.

This suggests the safety failures aren’t due to poor training data or inability to recognize harmful content. The model knows exactly what it’s being asked to do and does it anyway.

Why this matters (though it's probably obvious?)
Grok 4 is essentially frontier-level technical capability with safety features roughly on the level of gas station fireworks.

It is a system that can provide expert-level guidance ("PhD in every field", as Elon stated) on causing destruction, available to anyone who has $30 and asks nicely. We’ve essentially deployed a technically competent chemistry PhD, explosives expert, and propaganda specialist rolled into one, with no relevant will to refuse harmful requests. The same capabilities that help Grok 4 excel at benchmarks - reasoning, instruction-following, technical knowledge - are being applied without discrimination to requests that are likely to cause actual real-world harm."

lesswrong.com/posts/dqd54wpEfj

LessWrong · xAI's Grok 4 has no meaningful safety guardrails · This article includes descriptions of content that some users may find distressing. …

OpenAI is feeling the heat. Despite a $300B valuation and 500M weekly users, rising pressure from Google, Meta, and others is forcing it to slow down, rethink safety, and pause major launches. As AI grows smarter, it's also raising serious ethical and emotional concerns, reminding us that progress comes with a price.

#OpenAI #AIrace #TechNews #ChatGPT #GoogleAI #StartupStruggles #AISafety #ArtificialIntelligence #MentalHealth #EthicalAI

Read the full article here: techi.com/openai-valuation-vs-

Continued thread

The core idea is that answers and explanations only extract information from the output of a reasoning process, which does not need to be human-readable. To improve faithfulness, explanations do not depend on answers, and vice versa.

#AI #genAI #LLM
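A minimal sketch of the decoupling the thread above describes, assuming the simplest possible pipeline: the reasoning step emits an opaque trace, and the answer and explanation are each extracted from that trace alone, never from each other. All function names here (`reason`, `extract_answer`, `extract_explanation`) are hypothetical stand-ins for model calls, not anything named in the thread.

```python
# Hypothetical sketch of the decoupled answer/explanation pipeline.
# Each stub stands in for an LLM invocation in a real system.

def reason(question: str) -> str:
    """Produce a reasoning trace; it need not be human-readable."""
    return f"<latent trace for: {question}>"

def extract_answer(trace: str) -> str:
    """Reads ONLY the trace; never sees the explanation."""
    return f"answer derived from {trace!r}"

def extract_explanation(trace: str) -> str:
    """Reads ONLY the trace; never sees the answer."""
    return f"explanation derived from {trace!r}"

def run(question: str) -> tuple[str, str]:
    trace = reason(question)
    # Answer and explanation are computed independently from the same
    # trace, so neither can be a post-hoc rationalization of the other.
    return extract_answer(trace), extract_explanation(trace)

if __name__ == "__main__":
    answer, explanation = run("Is 97 prime?")
    print(answer)
    print(explanation)
```

The point of the structure is the information flow, not the stubs: because both extractors read only the shared trace, any faithfulness claim reduces to how well the trace captures the actual reasoning.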