Introducing Superalignment

July 14, 2023 Ethics and responsibility

OpenAI has recently introduced a new project, Superalignment, aimed at the challenge of aligning superintelligence. OpenAI expects superintelligence to be the most impactful technology humanity has ever invented, one that could help solve many of the world’s most important problems. At the same time, its vast power could be very dangerous, potentially leading to the disempowerment or even extinction of humanity.

The Challenge of Superintelligence

The challenge lies in getting AI systems that are much smarter than humans to follow human intent. Current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems that are much smarter than us, so current alignment techniques will not scale to superintelligence. New scientific and technical breakthroughs are needed.

OpenAI’s Approach

OpenAI plans to build a roughly human-level automated alignment researcher. That system could then be scaled with vast amounts of compute to iteratively align superintelligence. To align this first automated alignment researcher, OpenAI will need to develop a scalable training method, validate the resulting model, and stress test the entire alignment pipeline.

The New Team

OpenAI is assembling a team of top machine learning researchers and engineers to work on this problem. It is dedicating 20% of the compute it has secured to date, over the next four years, to superintelligence alignment. The team’s main goal is to solve the core technical challenges of the problem within four years.

Call for Participation

OpenAI is inviting outstanding new researchers and engineers to join this effort. The team plans to share the results of its work broadly and views contributing to the alignment and safety of non-OpenAI models as an important part of that work. If you’ve been successful in machine learning but haven’t worked on alignment before, now is the time to make the switch. OpenAI believes this is a tractable machine learning problem, and you could make enormous contributions.