moral progress
I wrote this essay a year ago, but I still feel the same way, and I keep coming back to the thought that the work of aligning AI may not have a definite end.
May 17, 2023
I still grapple with how to understand and how best to communicate the intricacies of my work, while staying close to the system and to the moral responsibilities that come with it, but often these things are so complex that I can hardly articulate them in writing. I aim to establish my own moral compass to navigate the space between concerns about unintended consequences of the technologies I help create and an innate desire to push scientific progress and innovation forward.
I’m, for example, still unsettled by how my past choices in code shape the present, and how they reflect the circumstances under which they were made. The choices I made then, which seemed pragmatic, now seem imperfect or limited in perspective. We all wish we had done things differently, more perfectly. But as my friend says, the difference between software craft and programming is that you have to live with the consequences of your past decisions. And I think I have to make peace with that by realizing that as long as I’m guided by principles in the first place, the key is to let my perspectives and understanding evolve, not cling to past notions of perfection.
I also find myself caught between two contradictory worldviews: the cautious worldview of researchers worried about uncontrolled progress in AI, and the ambitious worldview of builders racing to pioneer new forms of human-technology interaction. At the same time, I find myself in the middle of two broader worlds: while one corner fights the physical war of the century, another builds the technology of gods. I’m a part of all of these worlds, and this is the human condition, so sublimely dichotomous in all of its contradiction. But there is an elegance to inhabiting contradiction: it inspires growth and intimates imperfection.
Some of my friends have expressed the view that launching any new AI capability is morally wrong because it could either outpace our ability to grapple with the consequences, or spur other actors to recklessly charge ahead in developing the same advances without proper safeguards. I acknowledge the dangers of powerful systems and resonated deeply with Andy Matuschak’s essay “Ethics of AI-based invention: a personal inquiry”. As he writes, “I’m worried about a rise in bewildering accidents and subtle injustices, as we hand ever more agency to inscrutable autonomous systems... These systems’ capabilities seem to be growing much more quickly than our ability to understand or cope with them.”
But I also think framing AI progress as a dichotomy between "AI safety" and "AI capabilities" can be counterproductive: it suggests that more advanced AI is necessarily less safe, or that safer AI is necessarily less capable, when the goal should be to integrate safety and alignment into ever more capable systems. Ultimately, is reducing hallucination a capability or a safety feature?
“We Must Declare Jihad Against AI” argues for guided progress that benefits society rather than for a complete ban or a development shutdown. A true "Butlerian Jihad" should not promote an anti-modern or anti-technological reactionary stance. Instead, it should adopt the cautious approach toward progress and cultural renewal that guided policymakers in earlier periods of US history but has been largely forgotten recently. Current regulatory proposals fail to adequately address the threat to jobs and economic security, but the recent Senate hearing makes me hopeful.
Some may see shutting down AI progress as the morally perfect outcome, but in this view “perfect” is an endpoint, a static notion. I think moral progress is an ongoing process guided by core principles. We can allow capabilities to develop responsibly and judiciously rather than not at all. Doing so in house, under controlled conditions and with the best safety research, allows us to reap the benefits of progress while mitigating risks. But I also think AI safety requires recognizing the complex realities of an imperfect world, responding with sensitivity and nuance rather than pure absolutism.
There is an alternative perspective that reaches a similar conclusion. In game theory, complex games can enter endless "cycles" where play continues indefinitely without resolution. It may not be possible to create a debater or an aligned AI system that is robust to all potential adversaries or misuses, no matter how much effort is expended. Instead, we may have to keep devoting resources to aligning AI without ever reaching a stable equilibrium: a continuous process of adjustment to changing capabilities and incentives rather than a fixed solution. The work of aligning AI, unfortunately, may not have a definite end.
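To make the idea of "cycles" concrete, here is a minimal sketch, assuming nothing more than a toy rock-paper-scissors game and naive best-response play (the game, the best_response helper, and the simulate loop are my own illustrative choices, not the formal setup behind AI debate): each player keeps switching to whatever beats the other's last move, so play loops indefinitely and never reaches a resting point.

```python
# Toy illustration (my addition): rock-paper-scissors has no pure-strategy
# equilibrium, so naive best-response play cycles forever instead of settling.

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # value beats key


def best_response(opponent_move: str) -> str:
    """The move that beats the opponent's last move."""
    return BEATS[opponent_move]


def simulate(rounds: int = 12) -> None:
    a, b = "rock", "scissors"  # arbitrary starting moves
    for t in range(rounds):
        # Each player myopically switches to whatever beats the other's last move.
        a, b = best_response(b), best_response(a)
        print(f"round {t + 1}: A plays {a}, B plays {b}")
    # The printout shows the same pattern repeating: no round is a stable
    # resting point, much like alignment work that keeps adjusting to new
    # capabilities and incentives without a final fixed solution.


if __name__ == "__main__":
    simulate()
```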
Totally agree with the takeaway here. This never-ending cycle of tweaking alignment is also inevitable because of the ever-evolving consciousness of humans, which likely doesn’t have an endpoint either. As humans tweak their own preferences and behaviors, value alignment with AI will need to be tweaked as well, both at an individual and a societal level. Free speech, expression, and informational exploration will be critical for human well-being and growth.
The end of alignment work comes when AI is decisively smarter than humans. At that point, it's out of our hands.