A new study shows a simple way to make AI safer from bioweapon risks – with surprising results

Welcome to Eye on AI! In this edition … Teaching deep ignorance … Cohere's big fundraise and new hires … AI de-skilling … Anthropic acquires Humanloop co-founders … ChatGPT's market share.

What if preventing AI from helping someone build a biological weapon were as simple as never teaching it that knowledge in the first place?

That question had long intrigued Stella Biderman, executive director of the nonprofit AI research lab EleutherAI. In collaboration with the UK government's AI Security Institute and lead authors Kyle O'Brien and Stephen Casper, Biderman set out to find the answer – something that had never been explored publicly before.

In a new paper, Deep Ignorance, the researchers found that filtering risky information out of an AI model's training data from the start can bake in safeguards that are harder to tamper with – even in open-source models that anyone can download and adapt. Crucially, these protections did not hurt the model's overall performance.

To test the approach, the team trained versions of an open-source AI model on datasets scrubbed of certain "proxy" information – stand-ins for dangerous content, such as material related to bioweapons. The models trained on the cleaner data were less able to produce harmful information, while performing just as well on most other tasks.

In an X thread about the project, Casper said the goal was to make LLMs "not only safe off the shelf, but also resistant to harmful tampering." That is difficult because most safety efforts have focused on post-training adjustments – changes made after a model is built. Those fixes, such as fine-tuning a model's responses to avoid dangerous outputs, can work in the short term but are easier to undo and can sometimes unintentionally weaken the model. Pre-training filters aim to bake safety in from the start, so the model stays safe even if someone tries to tamper with it later.
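To make the contrast concrete, here is a minimal Python sketch of the data-side idea: screen each document with a cheap blocklist pass and an optional classifier pass before it ever reaches the training run. The term list, function names, and threshold below are illustrative assumptions for this newsletter, not the paper's or OpenAI's actual pipeline.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical stand-in terms; a real filter would target genuinely hazardous
# content (e.g., CBRN-related material) with a curated blocklist and/or a
# trained classifier, not a toy list like this.
PROXY_BLOCKLIST = {"proxy_term_a", "proxy_term_b"}


def looks_risky(text: str) -> bool:
    """Cheap first pass: does the document mention any blocklisted term?"""
    lowered = text.lower()
    return any(term in lowered for term in PROXY_BLOCKLIST)


def filter_corpus(
    docs: Iterable[str],
    classifier: Optional[Callable[[str], float]] = None,
    threshold: float = 0.5,
) -> Iterator[str]:
    """Yield only documents that pass the safety filter.

    `classifier` is an optional second-stage scorer (e.g., a small model that
    returns a risk probability); documents scoring above `threshold` are dropped.
    Everything that survives goes on to ordinary pre-training.
    """
    for doc in docs:
        if looks_risky(doc):
            continue  # dropped before training: the model never sees it
        if classifier is not None and classifier(doc) > threshold:
            continue
        yield doc


if __name__ == "__main__":
    raw_docs = [
        "A harmless article about gardening.",
        "A document mentioning proxy_term_a in passing.",
    ]
    print(list(filter_corpus(raw_docs)))  # only the harmless document survives
```

The design point is the one the paragraph above makes: anything removed at this stage never enters the model's weights, which is why the resulting safeguard is harder to undo with later fine-tuning than a post-hoc refusal policy.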

Biderman noted that this kind of work is rare in public research because it is expensive and time-consuming – a barrier for most academic and nonprofit groups. Private AI companies like OpenAI and Anthropic have the resources, she said, but avoid disclosing details of their pre-training for competitive reasons and out of concern over copyright risks.

"They could absolutely do it, and who knows if they do," she said. "They're incredibly secretive and tell you nothing." She pointed to OpenAI's own indications that it uses some filtering in its recently released open-weight model and in its proprietary GPT-4o.

In the company's model card for the open-weight model, OpenAI writes: "To improve the safety of the model, we filtered the data for harmful content in pre-training, especially around hazardous biosecurity knowledge, by reusing the CBRN filters from GPT-4o." In other words, the company applied the same screening process it used for GPT-4o to strip out potentially dangerous chemical, biological, radiological, and nuclear information before training.

For Biderman, Deep Ignorance is meant to go beyond what tech companies are willing to say publicly. "Having it out in public lets more people do better," she said. She added that she was partly motivated by the tech industry's refrain that its massive datasets cannot be documented or examined. "There's a narrative that OpenAI especially likes to tell, that the data is unfathomable, how could we possibly know what's in our data," she said. "It's something that has pissed me off for a long time. I think repeatedly demonstrating that this is false is important."

With that, here's the rest of the AI news.

Sharon Goldman
sharon.goldman@fortune.com
@Sharongoldman

Fortune on AI

GPT-5's model router sparked a user backlash against OpenAI, but it could be the future of AI – by Sharon Goldman

AI is already creating a unicorn boom: there are now 498 AI unicorns – and they're worth $2.7 trillion – by Julia Coacci

A flood of AI deepfakes is challenging the financial sector, with more than 70% of new sign-ups at some firms turning out to be fake – by Lionel Lim

AI in the news

Cohere raises $500 million, hires former Meta AI leader Joelle Pineau. Cohere announced today that it had raised $500 million in an oversubscribed funding round valuing the company at $6.8 billion, led by Inovia Capital and Radical Ventures with backing from AMD Ventures, NVIDIA, PSP Investments, Salesforce Ventures, and others. Cohere also announced that it had hired former Meta AI chief Joelle Pineau as chief AI officer and François Chadwick as chief financial officer. "Having Joelle and François join at the same time as we bring in this new funding is really a game changer," the company told Fortune. "The growth rate in 2025 has been absolutely incredible, with companies realizing that our approach to security is fundamentally unique – it's this superpower we have."

AI quickly eroded doctors' ability to spot cancer, study finds. According to Bloomberg, a new study in The Lancet Gastroenterology & Hepatology offers a cautionary tale about medical AI: it can boost performance, but it can also erode skills. Researchers found that doctors using AI to identify precancerous colon growths became so reliant on the tool that when it was taken away, their detection rates dropped roughly 20% below pre-AI levels. The randomized trial, conducted at four endoscopy centers in Poland, suggests that over-reliance on AI can leave clinicians "less motivated, less focused, and less responsible" when working without it. The findings come as health systems – including the UK's, which recently funded a major AI breast cancer screening trial – increasingly adopt AI to improve diagnostics.

Anthropic acquires the co-founders and most of the team behind Humanloop. TechCrunch reported that Anthropic has acquired the co-founders and most of the team behind Humanloop, a UK-based startup known for its enterprise-focused AI tools, including prompt management, model evaluation, and observability. About a dozen engineers and researchers – including CEO Raza Habib, CTO Peter Hayes, and CPO Jordan Burgess – will join Anthropic, though the deal did not include Humanloop's assets or IP. The hire strengthens Anthropic's enterprise push by adding experienced talent in building infrastructure that helps companies run safe and reliable AI at scale. Humanloop, founded in 2020, worked with customers like Duolingo, Gusto, and Vanta, and previously raised $7.91 million in seed funding from Y Combinator and Index Ventures.

AI calendar

September 8-10: Fortune Brainstorm Tech, Park City, Utah. Apply to attend.

October 6-10: World AI Week, Amsterdam

October 21-22: TEDAI San Francisco. Apply to attend.

December 2-7: NeurIPS, San Diego

December 8-9: Fortune Brainstorm AI San Francisco. Apply to attend.

Eye on AI numbers

78.5%

That is ChatGPT's share of the generative AI market today, according to Similarweb data. The rest of the field trails far behind: Gemini (8.7%), DeepSeek (4.1%), Grok (2.5%), Perplexity (1.9%), Claude (1.6%), and Copilot (1.2%).

Less than three years after its debut in November 2022, ChatGPT is also the fifth most-visited website in the world – and the fastest growing, with traffic up 134.9% year over year.

