
Building global resilience to AI risks

Photo by Jakub Żerdzicki on Unsplash

Many philanthropists interested in reducing AI-related risks have focused on interventions that could help prevent the deployment of unsafe AI. Fewer funders are investing in resilience: defenses that help prepare us for scenarios after deployment, with the goal of preventing advanced AI from becoming an existential threat.

Funding opportunities that boost global resilience to AI risks, such as by developing better warning systems and incident response plans, can help fill a critical gap in our defenses. We recently made a grant from our Global Catastrophic Risks Fund that focuses on improving how the US government detects and responds to advanced AI threats, with the goal of building resilience and ensuring more robust AI safety policy.

In this blog post, we’ll:

  • Discuss the role building resilience plays in a “defense in depth” framework
  • Explore the ways we can invest in global resilience against AI risks
  • Explain the goals of our recent grant and the unique impact of the GCR Fund in this space

The defense in depth framework for AI risks

Imagine developing a strategy to protect a building from fire hazards. You might start by setting up preventative safety protocols to make a fire unlikely to happen, but that probably wouldn’t be the only step you would take. You’d also want to set up a smoke alarm system to warn you as early as possible when a fire begins, make sure fire extinguishers are easily accessible, and create an evacuation plan to minimize damage in the worst-case scenario.

This risk management approach, which layers multiple protections to create a reliable safety system, is called a “defense in depth” framework. Multi-layered frameworks like this are used in many fields, including nuclear safety and information security. The goal is to ensure that the system remains resilient even if any single layer of protection fails.

The “defense in depth” approach also applies to potentially existential risks, like misaligned AI. We recently published a deep-dive research report about the existential threats that could emerge from transformative AI, ranging from power-seeking AI systems to the misuse of AI for biological weapons. When it comes to mitigating AI risks and other extinction-level risks, a defense in depth framework involves three “defensive layers”:

  1. Prevention: Reducing the chance that a local catastrophe occurs, e.g., a dangerous AI system being deployed.
  2. Response: Reducing the likelihood that the catastrophe becomes global, e.g., a dangerous AI system gaining widespread power.
  3. Resilience: Reducing the likelihood that the global catastrophe leads to an existential catastrophe, e.g., a dangerous AI system threatening the destruction of humanity’s long-term potential.

Currently, a large proportion of funding and attention related to AI risk mitigation focuses on the prevention component, but all three layers are needed for a robust system. As companies and the world’s superpowers race to build and deploy increasingly advanced AI, the risk of unsafe deployment grows, and boosting global resilience to AI risks becomes increasingly important.

What could increased resilience look like?

Resilience measures kick in during the period between the deployment of a dangerous AI system and the point at which it becomes an existential threat. We think of resilience-building interventions as those that raise the takeover threshold: the point at which an advanced AI system has enough capability to disempower humanity.

We’ve identified several promising directions for increasing the takeover threshold, including:

  • Creating robust shutdown mechanisms: Developing improved capabilities to rapidly disconnect or deactivate dangerous AI systems if needed. This could involve both technical and organizational measures.
  • Hardening human defenses: Preparing key individuals and institutions to resist manipulation or coercion attempts by advanced AI systems. This might include specialized training programs to reduce susceptibility to bribery, blackmail, and other forms of manipulation.
  • Clarifying containment protocols: Ensuring there are clear plans in place for what to do if an AI system appears to be dangerous and capable of takeover.
  • Closing gaps in existing threat models: Addressing vulnerabilities related to pandemics, nuclear war, and other risks that an adversarial AI might exploit.
  • Restricting AI from critical domains: Prohibiting the use of advanced AI systems in high-stakes areas like nuclear weapons control or critical infrastructure.
  • Incentivizing cooperation: Exploring ways to incentivize even misaligned AI systems to cooperate with humans rather than pursuing takeover, perhaps by guaranteeing them some resources to reduce the likelihood that they’ll fully take control.
  • Increasing general disaster preparedness: Boosting overall societal resilience to a range of catastrophic scenarios, which would also help in AI-driven crises.

Why focus on building resilience?

There’s a compelling case that boosting global resilience is a potentially high-impact strategy for addressing AI risks. In particular, consider these four reasons:

  1. It serves as another layer of defense. Resilience interventions can provide value even after a dangerous AI system has been deployed, serving as a critical backup if preventative measures fail.
  2. It’s a threat-agnostic approach with likely positive externalities. Increasing global resilience can help mitigate multiple AI-related risks simultaneously, rather than just tackling one specific threat. This makes it a robust strategy given our uncertainty around how advanced AI might develop. Also, many resilience-boosting measures would have wider benefits by helping to protect us from other global catastrophic threats as well, such as biological catastrophes and nuclear risks.
  3. It’s a relatively neglected area within philanthropy. While much focus in AI safety has been on technical alignment research or governance, less attention has gone to resilience and recovery strategies. This means that philanthropists and other funders have the opportunity to make a high impact on the margins.
  4. It’s likely to be more tractable than prevention. Preventing dangerous AI systems from being deployed will become increasingly difficult as competitive pressures incentivize companies and states to race to develop more advanced AI. Further, full prevention requires deep technical expertise in AI, whereas resilience can build on existing crisis and risk management practices.

For these reasons, we believe building global resilience to AI risks is a particularly important, tractable, and neglected area for philanthropy.

Our recent grant to CNAS

The Center for a New American Security (CNAS) is a highly respected, non-partisan source of policy insights with a strong reputation for producing useful research, particularly on issues related to emerging technologies like AI. We recently made a $156,770 grant to CNAS through the Global Catastrophic Risks Fund, as part of an active grantmaking effort to improve how the US government detects and responds to advanced AI threats.

With this grant, CNAS will investigate ways to improve AI resilience mechanisms within the US government. They’ll research potential ways to improve public visibility into frontier AI development, identify and respond to early “AI warning shots,” and enhance public-private cooperation on AI incident containment and response. Based on this research, CNAS will develop actionable policy recommendations to share with US government policymakers.

Our hope is that this research will better prepare the US for post-deployment scenarios, and that it will pave the way for other governments to adopt similar policies.

Looking forward

Resilience is not our only strategy for addressing AI risks. In an ideal world, we’d prevent the race to build dangerous AI from happening in the first place. Other interventions, including technical research on AI alignment and improved governance frameworks, also have crucial roles to play in building a robust framework for AI policy.

Our research team functions as an internal think tank that develops detailed technical reports about biological risks, advanced AI, and other global catastrophic risks. All of our grantmaking is research-first, focusing on ways to identify and catalyze funding opportunities that can make a cost-effective impact in any given risk landscape.

Going forward, we’ll continue to track the impact of our grant to CNAS, which we’ll share in future updates. We also hope to identify and initiate new projects related to building resilience in the future.

If you want to support more active grantmaking to help defend the world against global risks like misaligned AI, consider donating to the GCR Fund.


About the author


Hannah Yang

Research Communicator

Hannah joined Founders Pledge as the Research Communicator in September 2024. After earning a BA in Economics from Yale, she began her career as a corporate strategy consultant at McKinsey, and then pivoted into pursuing a creative career as a speculative fiction author and content writer. Her interest in the intersection of writing, data science, and real-world impact led her to her work at Founders Pledge.