How should we handle humanity’s problem child?

Policymakers must remain focused on overseeing harmful applications rather than restricting beneficial technologies if AI regulation is to be productive, says expert and author Nell Watson

In 1965, British mathematician Irving John Good, a colleague of Alan Turing at Bletchley Park, famously said: “The first ultra-intelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.” Even though Good would go on to serve as a consultant on supercomputers for Stanley Kubrick’s 2001: A Space Odyssey, it’s debatable whether he could have imagined the genie would be out of the bottle quite so soon.

The rapid rise of agentic AI – artificial intelligence systems capable of adaptively achieving complex goals with minimal oversight – presents humanity with immense opportunities and risks. Today’s large language models, though not built for planning and real-world action, possess latent agentic capabilities that, when combined with additional programs, allow them to reason, self-check, and solve challenges autonomously. The first such models are now emerging, demonstrating remarkable capacities for innovation and problem-solving. While promising tremendous value, this raises concerns about AI optimising for objectives misaligned with human intent, potentially causing severe physical, emotional, financial, and reputational harms, as well as harms to knowledge and understanding.
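To make the idea of latent agentic capability concrete, the sketch below shows the kind of plan–act–self-check loop that such additional scaffolding typically wraps around a language model. It is purely illustrative and not drawn from the article: call_llm is a hypothetical placeholder rather than any particular vendor’s API, and a production agent would invoke external tools rather than simply returning text.

```python
# A minimal, illustrative sketch of an agentic loop around a language model.
# call_llm is a hypothetical stand-in for whatever model API is actually used;
# the plan / act / self-check structure is the point, not any particular vendor.

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a language-model call."""
    raise NotImplementedError("Wire this to a real model API of your choice.")

def agent_step(goal: str, history: list[str]) -> str:
    # 1. Plan: ask the model to propose the next action toward the goal.
    plan = call_llm(f"Goal: {goal}\nProgress so far: {history}\nPropose the next step.")
    # 2. Act: here the 'action' is just more text; a real agent would call tools.
    result = call_llm(f"Carry out this step and report the outcome: {plan}")
    # 3. Self-check: ask the model to critique its own output before accepting it.
    verdict = call_llm(f"Does this outcome advance the goal '{goal}'? Outcome: {result}")
    history.append(f"{plan} -> {result} ({verdict})")
    return verdict

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):            # hard cap on autonomous iterations
        verdict = agent_step(goal, history)
        if "done" in verdict.lower():     # crude stopping criterion
            break
    return history
```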

A key challenge is the difficulty of robustly encoding human values into AI systems to ensure proper alignment. Even fully specifying human preferences may not suffice. Translating those preferences into machine objectives that reliably produce intended behaviour remains an unsolved problem. Slight misspecifications could lead to unintended and potentially catastrophic consequences. Ensuring advanced AI remains responsive to human oversight and modification is critical. Guardrails like iterative feedback loops, robust monitoring, and emergency shut-off procedures are needed to safely interrupt and correct misaligned AI systems. Research into corrigible and interruptible AI architectures should be prioritised.
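As a rough illustration of what such guardrails might look like in software, the sketch below pairs a monitor that logs and vetoes risky actions with an emergency-stop flag checked on every iteration. All names, thresholds, and the toy risk check are invented for the example; the article does not prescribe any particular mechanism.

```python
# Illustrative guardrails: robust monitoring (an audit log plus a veto) and an
# emergency shut-off that halts the loop at any point. Entirely hypothetical.

from dataclasses import dataclass, field

@dataclass
class Guardrails:
    emergency_stop: bool = False          # flipped by a human operator
    max_risk: float = 0.5                 # veto threshold for the monitor
    log: list[str] = field(default_factory=list)

    def assess_risk(self, action: str) -> float:
        """Toy monitor; a real one might use classifiers or human review."""
        return 0.9 if "irreversible" in action else 0.1

    def permit(self, action: str) -> bool:
        risk = self.assess_risk(action)
        self.log.append(f"{action}: risk={risk}")   # monitoring leaves an audit trail
        return not self.emergency_stop and risk <= self.max_risk

def run(actions: list[str], guard: Guardrails) -> None:
    for action in actions:
        if guard.emergency_stop:          # shut-off is checked every iteration
            print("Halted by operator.")
            return
        if not guard.permit(action):
            print(f"Blocked: {action}")   # misaligned step is interrupted, not executed
            continue
        print(f"Executing: {action}")

run(["summarise report", "delete irreversible backups"], Guardrails())
```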

Current language models integrated with application programming interfaces (APIs) are already tackling real-world problems powerfully. But this is only the beginning. Agentic AI paves the way toward artificial general intelligence (AGI) that could match and exceed the human mind, automating cognitive work on an unprecedented scale. An AGI could supercharge technological progress by synthesising knowledge across disciplines and improving itself. Parallel advances in quantum computing, biological substrates, robotics, manufacturing, and materials science could expand the possibilities for human flourishing a hundredfold within a generation, but they also introduce novel existential risks.

The breakneck pace of AI progress, fuelled by an intense commercial race to develop larger, faster, more opaque models, makes its trajectory unpredictable and potentially catastrophic. As models scale up and training data becomes richer, they gain capabilities in unexpected, uncontrolled leaps. More advanced systems are increasingly likely to develop recursive self-improvement capacities, potentially leading to a difficult-to-control intelligence explosion. We may be closer to AGI than commonly believed. From there, the path to a super-intelligence that rapidly eclipses humanity could be short and abrupt.

Formidable challenges of regulation

Without dramatic progress in AI alignment and transparency, this acceleration process becomes enormously risky. An advanced AI pursuing goals even slightly misaligned with ours could pose an existential threat, even without explicit malicious intent. A sufficiently advanced system vigorously pursuing a misspecified objective without common sense or concern for negative impacts could cause immense damage. AI safety isn’t just about preventing malicious use – it’s about ensuring advanced systems remain safely under human control and are understandable as they grow in power.

Policymakers face a formidable challenge in regulating this promising but perilous technology, and the EU’s AI Act is an important step in the right direction (see overleaf). They must proactively guide its development without unduly stifling beneficial innovation. Public opinion appears wary of unfettered AI development and sceptical that industry self-regulation can adequately manage the risks.

Governments therefore have a pivotal role to play, in concert with industry and civil society, to establish regulatory guardrails. One approach is to mandate formal pre-deployment safety certifications, especially in high-stakes domains. Developers could be held legally liable for harms caused by unsafe deployments. Models should undergo rigorous assessment for potential misuse or adverse societal impacts prior to release, with system transparency and interpretability being key factors. Black-box models, impossible to audit, present greater risks. Systems exhibiting agentic planning, self-modification, deception, or context manipulation warrant heightened scrutiny. Ongoing monitoring is critical for catching unanticipated problems.

Robust accountability mechanisms will be crucial for incentivising responsible development. Legal liability for AI harms should be clearly allocated to the appropriate parties. Ethical review boards, audits, and impact assessments can help ensure appropriate oversight. International coordination will prove vital for upholding consistent standards and preventing a race to the bottom. The ability of advanced AIs to potentially co-opt weaker systems also necessitates monitoring AI-to-AI interactions.

However, poorly constructed restrictions could backfire. Regulating rapidly evolving capabilities is deeply challenging. Policymakers must remain laser-focused on overseeing harmful applications rather than restricting beneficial technologies. Rigid constraints might drive development underground while hampering valuable research. An adaptive regulatory framework that evolves as capabilities advance while robustly serving the public interest is needed. If AI proves to be an existential threat, poorly designed regulations could worsen things by creating severe power imbalances.

Maintaining openness and trust through open-source AI development could enable broad participation and sharing of expertise. This approach would also facilitate collaborative efforts to identify and mitigate potential threats. Regulating high-risk applications and mandating disclosure can help address harms and inform responsible use without restricting potentially beneficial technologies. As AI’s complex impacts remain uncertain until more advanced systems emerge, we must regulate judiciously to avoid unintended negative consequences. Increasingly autonomous AI will profoundly impact human agency in complex ways, demanding proactive policy consideration. AI systems that deceive humans pose serious risks to personal autonomy. Key concerns include consent, privacy safeguards, and avoiding excessive dependence on AI systems. We must carefully balance AI’s benefits against the importance of human decision-making and control.

As we enter an era of increasingly agentic AI, the need for foresight and wisdom couldn’t be more pressing. The systems we build today are the scaffolding for what could become the most powerful technology in history – with the potential to help solve humanity’s greatest challenges or imperil our very existence. While further breakthroughs are needed to develop AI capable of open-ended learning and planning, we are swiftly approaching a threshold where AI begins to match and exceed human cognition across myriad domains. These systems, if developed without rigorous safety constraints, would prove immensely difficult to control. This risk is especially severe if they gain power and influence through co-opting economic and political systems.

The AI technologies developed in the coming years could be immensely beneficial, but only if we establish the proper foundations today. By crafting judicious regulations focused on ensuring advanced AI remains safe and beneficial, we can create a governance framework to harness the rewards and mitigate the risks. The alternative – recklessly forging ahead without a shared safety roadmap – is to treat humanity’s future as an afterthought.

We must proceed thoughtfully and carefully, with collaboration across stakeholder groups, to establish clear guidelines. With foresight and proactive leadership in Europe and beyond, we can build a future in which increasingly agentic AI systems remain a wellspring of flourishing rather than an existential threat.

About the author

Nell Watson is an AI expert, ethicist and author of ‘Taming the Machine: Ethically harness the power of AI’, published by Kogan Page, priced £13.94. www.tamingthemachine.com