OpenAI shared that it has created the Preparedness Framework to help track, evaluate, forecast, and protect against the risks associated with frontier models, the highly capable AI models that will exist in the future.
The Preparedness Framework is currently in beta, and it covers the actions OpenAI will take to safely develop and deploy frontier models.
First, OpenAI will run evaluations and develop scorecards for its models, which it will continuously update. These evaluations will push frontier models to their limits during training, and the results will help both assess risks and measure the effectiveness of proposed mitigations. “Our goal is to probe the specific edges of what’s unsafe to effectively mitigate the revealed risks,” OpenAI stated in a post.
These risks will be defined across four categories and four risk levels. The categories are cybersecurity, CBRN (chemical, biological, radiological, and nuclear) threats, persuasion, and model autonomy, and the risk levels are low, medium, high, and critical. Only models that earn a post-mitigation score of high or below can be developed further, and only models that score medium or below can actually be deployed.
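To make that gating rule concrete, here is a minimal sketch of how a post-mitigation scorecard could drive the two thresholds. The RiskLevel encoding, the example scorecard values, and the simplification that a model's overall score is its highest category score are illustrative assumptions, not OpenAI's actual implementation.

```python
from enum import IntEnum

class RiskLevel(IntEnum):
    # Hypothetical encoding of the framework's four risk levels
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# Hypothetical post-mitigation scorecard across the four tracked categories
scorecard = {
    "cybersecurity": RiskLevel.MEDIUM,
    "cbrn": RiskLevel.LOW,
    "persuasion": RiskLevel.MEDIUM,
    "model_autonomy": RiskLevel.HIGH,
}

# Assumption: the overall score is driven by the riskiest category
overall = max(scorecard.values())

# Only models scoring high or below may be developed further,
# and only models scoring medium or below may be deployed
can_develop = overall <= RiskLevel.HIGH
can_deploy = overall <= RiskLevel.MEDIUM

print(f"develop further: {can_develop}, deploy: {can_deploy}")
```

Under this hypothetical scorecard, the model could continue development, since its worst category is high, but it could not be deployed, since its overall score exceeds medium.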
It will also create new teams to implement the framework. The Preparedness team will do the technical work of examining the limits of frontier models, run evaluations, and synthesize reports, while the Safety Advisory Group will review those reports and present them to leadership and the Board of Directors.
The Preparedness team will also regularly conduct safety drills to stress-test the framework against the pressures of the business and its culture. In addition, the company will commission outside audits and will continually red-team its models.
And finally, it will use its knowledge and expertise to track misuse in the real world and work with external parties to reduce safety risks.
“We are investing in the design and execution of rigorous capability evaluations and forecasting to better detect emerging risks. In particular, we want to move the discussions of risks beyond hypothetical scenarios to concrete measurements and data-driven predictions. We also want to look beyond what’s happening today to anticipate what’s ahead. This is so critical to our mission that we are bringing our top technical talent to this work,” OpenAI wrote.