Principled progress.
A new design mandate for AGI, moral craftsmanship, and inhabiting contradiction.
A New Design Mandate for AGI
Perfection implies stasis, yet in technology the only constant is change.
In this field we have a unique opportunity to imagine the ideal interface for human-AI interaction. But I’m coming to the conclusion that however much we dislike our most familiar chatbot designs, or want to strive for the ideal design, there is no such thing as a perfect conversational interface to AGI, because the system itself will redefine our notions of perfection and of an ideal conversational experience as its capabilities evolve and time passes.
My relationship with AI has developed in complex ways since I last wrote about it. As much as I shape Claude, it seems to be shaping me in turn, not just through what it says but through the way my daily work has fundamentally shifted over the past months. I’ll share a few glimpses:
Rapid prompt experimentation has replaced 40% of my prototyping as a way to see early signs of life for both product and research ideas. Mocking APIs and seeing how Claude performs on a task I care about gives me a heuristic for whether I can ship the feature in the next few days, or whether I need to actually teach Claude to be really good at it, a more resource-intensive approach. The complexity lies in navigating the trade-off between investing in research to make the system excellent at something and shipping what already works well enough for users given the current state of the system.
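To make this concrete, here is a minimal sketch of what such an experiment can look like, assuming the Anthropic Python SDK; the mocked calendar service, function names, and model string are illustrative placeholders rather than any real feature.

```python
# A minimal prompt-experimentation harness: mock the backend the feature
# would call, then check whether Claude can already do the task well enough
# to ship. Everything here is a placeholder, not a real product integration.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def mock_calendar_api(user_id: str) -> dict:
    # Stand-in for the real service the shipped feature would query.
    return {"user_id": user_id,
            "events": [{"title": "Design review", "time": "14:00"},
                       {"title": "1:1", "time": "14:30"}]}

def try_feature(user_request: str) -> str:
    # Feed the mocked context plus the user's request to Claude and eyeball the output.
    context = mock_calendar_api("u_123")
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=300,
        messages=[{"role": "user",
                   "content": f"Calendar data: {context}\n\nRequest: {user_request}"}],
    )
    return response.content[0].text

# If a handful of outputs look good, the feature may be shippable as a prompt;
# if not, it probably needs the more resource-intensive training route.
print(try_feature("Summarize my afternoon and flag any conflicts."))
```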
Detail-oriented design craft won’t just mean pixel perfection or the creation of novel metaphors, but also cultivating AI that has good taste in and of itself. Ellen Ullman put it nicely: “The world as humans understand it and the world as it must be explained to computers come together in the programmer in a strange state of disjunction.” While working on auto-generated titles for the conversational interface, I wanted to push the idea that Claude could have excellent editorial taste. We asked Claude to generate the prompt for that feature, and now it comes up with titles based on a user’s writing style and previous titles, resulting in a micro-experience uniquely tailored to each individual.
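As a rough illustration only (the production prompt was drafted by Claude itself), here is one way such a title generator could be wired up, again assuming the Anthropic Python SDK; the prompt wording, example titles, and model string are hypothetical.

```python
# A rough sketch of taste-conditioned title generation: the new title is
# conditioned on the user's previous titles so it echoes their own voice.
import anthropic

client = anthropic.Anthropic()

def generate_title(first_message: str, previous_titles: list[str]) -> str:
    prompt = (
        "You are naming a new conversation. Write one short title (at most "
        "six words) that matches the user's own writing style and tone.\n\n"
        "Titles of the user's previous conversations:\n- "
        + "\n- ".join(previous_titles)
        + "\n\nFirst message of the new conversation:\n"
        + first_message
        + "\n\nReply with the title only."
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=30,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text.strip()

# The previous titles carry the user's voice, so the new title should too.
print(generate_title(
    "quick q: does the bigger context window change how we chunk uploads?",
    ["shipping notes, week 12", "context window tradeoffs", "prompt craft misc"],
))
```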
Determining the best and safest form factor for AI capabilities requires thorough consideration. While working on the 100k-token context window, I was reminded how for Karl Lagerfeld, “embroidery is not mere lux augmentation, but capability intrinsic to the garment.” This is true for AI as well — how do we enable a form factor that is most intrinsic to the system? Which interface affordances — file uploads, multi-branch conversations, or something else — would be most intrinsic to such a capability? This is an unending effort to thoughtfully manage an evolving relationship between humans and machines.
When faced with novel technology, we all want to build perfect systems, to nail it perfectly. But some discussions on software longevity and perfection made me realize that “perfect software or AI” does not refer to an unchanging, abstract ideal. Rather, it should mean the continual progression of fundamental principles. And I think this is what is critically important in the moral craftsmanship of software and AI — technological temperance, a prudent and deliberate judgment of each new deployment.
Often I’m asked, “What’s something you wish people understood more about AI?” and often my answer is that constraints will evolve: if you are a designer, your goal is to imagine the most ideal version of the system and work backwards to the most inherent problems worth solving. If you are a builder using LLMs, don’t overfocus on current model constraints; think about what your definition of perfection is on a timescale of at least ten years, and technology will likely enable it far faster.
Here is a specific example from conversational interfaces. We know that larger RLHF models display sycophantic behaviors. Determining how much mirroring is acceptable in a conversational flow, and when it becomes too schizophrenic, is a design challenge. Addressing this issue given current constraints may be worthwhile, but in my view the sycophancy problem will eventually be solved. A more inherent question in conversational flow design is the question of trust. How would you curate diverse, rich behaviors while also staying true to the fundamental “personality” of an LM? What’s the base “personality” that behaviors can be layered on top of, such that you still know this is Claude and that is not Claude? As in human interaction — whether you are at a work meeting or a rave — you know you are talking to the same person even though the conversational style differs. Trust in the system comes down to defining a base “personality” that can be built upon while the system remains recognizable. I see these as both alignment research and product design questions that must co-evolve collaboratively.
As creators, our goal should be to relentlessly push the state of the art forward and move the goalposts of what is considered exceptional: the ephemeral experience of a solution fitting elegantly into current constraints and needs. But needs, cultures, and constraints evolve, and so we must evolve with them.
Linux was built on a steadfast vision of transparency, community, and modifiability. In contrast, macOS had its vision broadened and reinterpreted to encompass new principles of privacy while holding onto a coherent design ethos. Similarly, in AI training we try to establish constitutional principles — a guided approach to crafting system behavior, not to “represent a specific ideology, but rather to be able to follow a given set of principles.” We know those principles will change over time: what seems appropriate today may not be acceptable in a few decades, as interpretations of laws and norms shift.
Technological temperance is ultimately the dialogue between vision and constraints, as guiding principles evolve, are contested, and are reimagined.
The new design mandate, therefore, is to envision the destination and move toward it guided by principles, often collaborating with alignment research and shifting the constraints of the AI system itself. In so doing, however, it also necessitates a careful examination of our own moral values.
On personal moral inquiry.
I continue to grapple with how to understand, and how best to communicate, the intricacies of my work while being close to the system and the moral responsibilities that come with it; often these things are so complex that I can hardly articulate them in writing. I aim to establish my own moral compass to navigate the space between concern about unintended consequences of the technologies I help create and an innate desire to push scientific progress and innovation forward.
I am, for example, still unsettled by how my past choices in code shape the present, and how they reflect the circumstances under which they were written. The choices I made then, which seemed pragmatic, now seem imperfect or limited in perspective. We all wish we had done things differently, more perfectly. But as my friend says, the difference between software craft and programming is that you have to live with the consequences of your past decisions. And I think I have to make peace with that by realizing that as long as I am guided by principles in the first place, the key is to let my perspectives and understanding evolve, not to cling to past notions of perfection.
I also find myself at the contradiction of two worldviews: the cautious worldview of researchers worried about uncontrolled progress in AI, and the ambitious worldview of builders racing to pioneer new forms of human-technology interaction. At the same time I find myself between two broader worlds: while one corner fights the physical war of the century, another builds the technology of gods. I am a part of all of these worlds, and this is the human condition, so sublimely dichotomous in all of its contradiction. But there is an elegance to inhabiting contradiction — it inspires growth and intimates imperfection.
Some of my friends have expressed the view that launching any new AI capability is morally wrong because it could either outpace our ability to grapple with the consequences, or spur other actors to recklessly charge ahead in developing the same advances without proper safeguards. I recognize the dangers of powerful systems and resonated a lot with Andy Matuschak’s essay “Ethics of AI-based invention: a personal inquiry.” As he said, “I’m worried about a rise in bewildering accidents and subtle injustices, as we hand ever more agency to inscrutable autonomous systems... These systems’ capabilities seem to be growing much more quickly than our ability to understand or cope with them.”
But I also think framing AI progress as a dichotomy between "AI safety" and "AI capabilities" can be counterproductive — it suggests that more advanced AI is necessarily less safe, or that safer AI is necessarily less capable, when the goal should be to integrate safety and alignment into ever more capable systems. Ultimately, is reducing hallucination a capability or a safety feature?
“We Must Declare Jihad Against AI” argues for guided progress that benefits society, rather than a complete ban or development shutdown. A true "Butlerian Jihad" should not promote an anti-modern or anti-technological reactionary stance. Instead, it should adopt the cautious approach to progress and cultural renewal that guided policymakers in earlier periods of US history but has been largely forgotten recently. Current regulatory proposals fail to adequately address the threat to jobs and economic security, but the recent Senate hearing makes me hopeful.
Some may see shutting down AI progress as the morally perfect outcome, but in this view “perfect” is an endpoint, a static notion. I think moral progress is an ongoing process guided by core principles. We can allow capabilities to develop responsibly and judiciously rather than not at all. Doing so in house, under controlled conditions and with the best safety research, allows us to reap the benefits of progress while mitigating risks. But I also think AI safety requires recognizing the complex realities of an imperfect world, responding with sensitivity and nuance rather than pure absolutism.
There is an alternative perspective that reaches a similar conclusion. In game theory, complex games can enter endless "cycles" in which play continues indefinitely without resolution. It may not be possible to create a debater or an aligned AI system that is robust to all potential adversaries or misuses, no matter how much effort is expended. Instead, we may have to continuously devote resources to aligning AI without ever reaching a stable equilibrium — an ongoing process of adjustment to changing capabilities and incentives rather than a fixed solution. Much as in an endless game, the work of aligning AI, unfortunately, may not have a definite end.
Safety is a property of the world too.
When discussing safety with others, I've found that some people focus on the inherent properties of the AI system itself, often forgetting that safety is also deeply dependent on the world in which the system is deployed. The truth is, AI safety cannot be reduced to system properties alone. Rather, it emerges from an ongoing, complex interaction between what we build and the world we build it for.
The values and preferences of the AI system matter to its impact, but so does the state of the world. We need to consider the system's sensitivity to the state of the world, which requires thinking about the system in its context and designing AI and environments so that their interaction avoids unwanted consequences. We cannot achieve true safety until we design AI not just for modeled environments but for the beautifully messy reality of our world: a world ever changing, overflowing with ambiguity, and often stubborn in its refusal to fit any model. Only by accounting for this deep interdependence can we build AI that is robustly aligned with human values.
Both interface designers and alignment researchers must see the system as coevolving with complex environments, with constraints developed experientially rather than imposed upfront. By embracing progress in a principled way and embodying technological temperance, we can craft AI that is robust, aligned, and beneficial. And we can achieve excellence not through obedience to static notions of perfection but through a dedicated craftsmanship of technology in the perpetual pursuit of its guiding principles. The path is unclear, but the goal, ever-changing, is worth striving for.
In conversations with: Justin Barber, Tuhin Kumar, Justin Kawashima, Rasmus Andersson, Shan Carter, Andy Matuschak, Ethan Perez, Kamille Lukosiute, Cem Anil. Thanks to Mina Fahmi, Austin Wu, and Jason Wei for valuable feedback.