alignment – Priyanka Bharadwaj

Relational Design Can’t Be Left to Chance

We say alignment is about control, safety, precision. But after a decade working as a matchmaker in India’s increasingly chaotic relationship market, I’ve learnt that what sustains a system isn’t control, it’s trust. And trust doesn’t live in rules. It lives in memory, repair, and mutual adaptation.

I’ve spent years watching relationships fall apart not because people weren’t compatible, but because they didn’t know how to collaborate. We are fluent in chemistry, but clumsy with clarity. We optimise for traits, not values or processes. And when conflict hits, as it always does, we have no shared playbook to return to.

In traditional Indian matchmaking, we had a whole socio-structural scaffolding propping up long-term collaboration through race or caste endogamy, community expectations, family intermediation, shared rituals and rites. It was crude and often unjust, but it was structurally coherent. Marriage was not just a bond between two people, but between two lineages, empires and philosophies of life. There were rules, expectations and fallback norms. Vows weren’t just ceremonial; they were memory devices, reminding people what they were committing to when emotions faded.

Today, most of that scaffolding is gone.

Tinder has replaced the community priest or matchmaker, and in this frictionless new marketplace, we are left to figure out long-term cooperation with short-term instincts. Even when we genuinely care for each other, we often collapse under the weight of ambiguity. We never clarify what we mean by commitment. We never learn how to repair after rupture, and we assume love will make things obvious.

But love doesn’t make things obvious, context does, and maybe design too.

This isn’t just about marriage, it’s about systems and it’s about alignment.

Much of the current conversation on AI alignment focuses on architecture, oversight, corrigibility and formal guarantees. All of that is necessary, and I am not refuting it one bit. But I don’t see AI in isolation, because we humans are building it, for us, and so I can’t help but view it through the lens of collaboration or partnership.

In human systems, I’ve rarely seen misalignment fixed by control. I’ve seen it fixed by context, memory, feedback, and repair. Not all of which can be coded cleanly into an objective function.

I’ve watched couples disintegrate not because of what happened, but because it kept happening. The breach wasn’t just an error. It was a pattern that wasn’t noticed, a pain that wasn’t remembered and a signal that wasn’t acknowledged.

Systems that don’t track trust will inevitably erode it.

It’s tempting to think that AI, given enough data, will learn all this on its own. That it will intuit human needs, pick up patterns and converge on stable behaviours. But from the relational world, I’ve learnt that learning isn’t enough; the structural scaffolding that sustains it matters too.

Most humans don’t know how to articulate their emotional contracts, let alone renegotiate them. Many don’t even realise repair is an option. That they can say, “Hey, this mattered to me. Can you remember next time?” If we humans can’t do this instinctively, why would we expect machines to?

In nature, systems evolved slowly. Organs, species and ecosystems; they didn’t drop overnight like an update. They became resilient because they were shaped by millennia of co-adaptation. They learnt, painfully, that survival isn’t about short-term optimisation. It’s about coherence over time. It’s about knowing when not to dominate, and about restraint.

We humans can, if we choose, eliminate entire species. But most of us don’t. Somewhere in our messy cultural evolution, we’ve internalised a sense that … might isn’t always right. Survival is entangled, and so, power must be held in context.

AI doesn’t have that inheritance. It is young, fast and brittle (if not reckless), and it is being inserted into mature social ecosystems without the long runway of evolutionary friction. It’s not wrong to build it, but it is wrong to assume it will learn the right instincts just because it sees enough examples.

That’s why I think we need to take on the role not of controllers, but of stewards, or parents, even. Not to infantilise the system, but to give it what it currently lacks, i.e. relational memory, calibrated responsiveness and the capacity to recover after breach.

Eventually, maybe it will become anti-fragile enough to do this on its own. But not yet. Until then, we design, and we nurture.

We design for value memory: not just functional memory, but the ability to track what a human has signalled as emotionally or ethically significant. We design for trust tracking: not just “was the task completed?” but “has the system earned reliability in the eyes of this user?” We design for repair affordances, i.e. the moment when something goes wrong and the system says, “That mattered. Let me try again.” We design for relational onboarding, or lightweight ways to understand a user’s tone, sensitivity, and boundary preferences.
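To make trust tracking and repair a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `TrustLedger` name, the neutral starting score of 0.5, and the weights are invented here, not taken from any real system. It only shows the shape of the idea, namely that flagged topics count for more, and that a repaired breach costs less than an ignored one.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TrustLedger:
    """Hypothetical per-user record of earned reliability (illustrative only)."""

    score: float = 0.5                               # assumed neutral start, kept in [0, 1]
    flagged: Set[str] = field(default_factory=set)   # topics the user said matter

    def flag(self, topic: str) -> None:
        """Value memory: remember what the user signalled as significant."""
        self.flagged.add(topic)

    def record(self, topic: str, ok: bool, repaired: bool = False) -> None:
        """Trust tracking: update the score after one interaction."""
        # Assumed weights: mistakes on flagged topics move trust four times as much.
        weight = 0.2 if topic in self.flagged else 0.05
        if ok:
            delta = weight
        elif repaired:
            delta = -weight / 2      # a repaired breach costs half as much
        else:
            delta = -weight
        self.score = min(1.0, max(0.0, self.score + delta))


ledger = TrustLedger()
ledger.flag("table accuracy")
ledger.record("table accuracy", ok=False)                  # ignored breach
ledger.record("table accuracy", ok=False, repaired=True)   # breach, then repair
print(round(ledger.score, 2))
```

The numbers matter far less than the asymmetry: the same error is scored differently depending on what the user has said they care about and whether the system tried to make it right.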

These are not soft features. They are structural affordances for relational alignment. Just like rituals and vows aren’t romantic fluff, but memory scaffolds. Just like marriage is not only about love, but about co-navigation under stress.

Some might say this isn’t necessary. That good architecture, regulation, and interpretability will cover the gaps. But every safety approach needs a medium, and in complex socio-technical systems, that medium is trust. Not blind trust, but earned, trackable, recoverable trust.

Relational alignment won’t replace other paradigms. But it may be the piece that makes them stick, like a substrate that holds the rest together when things begin to drift. Because if we don’t design our systems to repair trust, hold memory, and attune to difference, we won’t just build misaligned machines, we’ll build lonely ones.

And no, I am not anthropomorphising AI or worrying about its welfare, but I know that loneliness puts us at odds with the rest of the world, making it harder to distinguish right from wrong.

I use the parenting analogy not to suggest we’ll control AI forever, but to point out that even with children, foundational values are just the start. Beyond a point, it is each interaction, with peers, strangers, systems, that shapes who they become. Centralised control only goes so far. What endures is the relational context. And that, perhaps, is where real alignment begins.

Coaching AI: A Relational Approach to AI Safety

A couple of weekends ago, my family was at Lalbagh Botanical Garden in Bangalore. After walking through a crowded mango exhibition, my 8-year-old offered to fetch her grandparents, who were walking slowly behind us. We waited outside the exhibition hall.

Five minutes passed. Then ten. Then fifteen. The grandparents emerged from the hall, but my daughter had vanished. After thirty anxious minutes, we found her perched calmly on a nearby hilltop, scanning the garden below like a baby hawk.

Her reasoning was logical. She remembered where her grandparents had last stopped (a street vendor) and went to look for them there. When she didn’t find them, she climbed up a hillock for a bird’s-eye view. Perfectly reasonable, except she had completely missed them entering the hall with her.

Her model of the world hadn’t updated with new context, so she pursued the wrong goal with increasing confidence. From her perspective, she was being helpful and clever. From ours, she was very much lost.

The Confident Pursuit of the Wrong Objective

This is a pattern familiar in AI where systems escalate confidently along a flawed trajectory. My daughter’s problem wasn’t lack of reasoning, it was good reasoning on a bad foundation.

Large models exhibit this all the time. An LLM misinterprets a prompt and confidently generates pages of on-topic-but-wrong text. A recommendation engine over-indexes on ironic engagement. These systems demonstrate creativity, optimisation, and persistence, but in the service of goals that no longer reflect the world.

This, I learnt, is framed in AI research in terms of distributional shift or exposure bias. Training on narrow or static contexts leads to brittleness in deployment. When feedback loops fail to re-anchor a system’s assumptions, it just keeps going, confidently and wrongly.

Why Interpretability and Alignment May Not Be Enough

Afterward, I tried to understand where my daughter’s reasoning went wrong. But I also realised that even perfect transparency into her thoughts wouldn’t have helped in the moment. I could interpret her reasoning afterward, but I couldn’t intervene in it as it unfolded. What she needed wasn’t analysis. She needed a tap on the shoulder, and just a question (not a correction, mind you) – “Where are you going, and why?”

This reflects a limitation in many current safety paradigms. Interpretability, formal alignment, and corrigibility all aim to shape systems from the outside, or through design-time constraints. But intelligent reasoning in a live context may still go off-track.

This is like road trips with my husband. When Google Maps gets confused, I prefer to ask a local. He prefers to wait for the GPS to “figure it out.” Our current AI safety approaches often resemble the latter, trusting that the system will self-correct, even when it’s clearly drifting.

A Relational Approach to Intervention: Coaching Systems

What if intelligence, especially in open-ended environments, is inherently relational? Instead of aiming for fully self-aligned, monolithic systems, what if we designed AI architectures that are good at being coached?

We could introduce a lightweight companion model, a “coach”, designed not to supervise or override, but to intervene gently at critical reasoning junctures. This model wouldn’t need full interpretability or full control. Its job would be to monitor for known failure patterns (like confidence outpacing competence) and intervene with well-timed, well-phrased questions.

Why might this work? The coach retains perspective precisely because it isn’t buried in the same optimisation loop. It sees the system from the outside, not from within. It may also be computationally cheaper to run than embedding all this meta-cognition directly inside the primary system.

Comparison to Existing Paradigms

This idea overlaps with several existing safety and alignment research threads but offers a distinct relational frame:

Chain-of-Thought & Reflection prompting: These approaches encourage a model to think step-by-step, improving clarity and reducing impulsive mistakes. But they remain internal to the model and don’t introduce an external perspective.
Debate (OpenAI): Two models argue their positions, and a third agent (often human) judges who was more persuasive. This is adversarial by design. Coaching, by contrast, is collaborative, more like a helpful peer than a rival.
Iterated Amplification (Paul Christiano): Breaks down complex questions into simpler sub-tasks that are solved by helper agents. It’s powerful but also heavy and supervision-intensive. Coaching is lighter-touch, offering guidance without full task decomposition.
Eliciting Latent Knowledge (Alignment Research Center): Tries to get models to reveal what they “know” internally, even if they don’t say it outright. This improves transparency but doesn’t guide reasoning as it happens. Coaching operates during the reasoning process.
Constitutional AI (Anthropic): Uses a set of written principles (a “constitution”) to guide a model’s outputs via self-critique and fine-tuning. It’s effective for normative alignment but works mostly post hoc. Coaching enables dynamic, context-sensitive nudges while the reasoning is still unfolding.

In short, coaching aims to foreground situated, lightweight, real-time feedback, less through recursion, adversarial setups, or predefined rules, and more through the kind of dynamic, context-sensitive interactions that resemble guidance in human reasoning. I don’t claim this framing is sufficient or complete, but I believe it opens up a promising line of inquiry worth exploring.

Implementation Considerations

A coaching system might be trained via:

Reinforcement learning on historical failure patterns
Meta-learning over fine-tuning episodes to detect escalation behaviour
Lightweight supervision using confidence/competence mismatches as the training signal

To function effectively, a coaching model would need to:

Monitor reasoning patterns without being embedded in the same loop
Detect early signs of drift or false certainty
Intervene via calibrated prompts or questions, not overrides
Balance confidence and humility: confident enough to act, humble enough to revise

Sample interventions:

“What evidence would change your mind?”
“You’ve rejected multiple contradictory signals, why?”
“Your predictions and outcomes don’t match. What assumption might be off?”
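The monitoring and intervention steps above could be sketched roughly as follows. This is a toy, not a proposal for how a real coach would be trained: the function name `coach_question`, the `gap_threshold` and `max_rejections` parameters, and their default values are all assumptions made up for the example. It treats “confidence outpacing competence” as the gap between average stated confidence and observed accuracy, and it responds only with a question, never an override.

```python
from typing import Optional, Sequence

# Questions taken from the sample interventions; the keys are hypothetical labels.
QUESTIONS = {
    "overconfidence": "Your predictions and outcomes don't match. "
                      "What assumption might be off?",
    "ignored_signals": "You've rejected multiple contradictory signals, why?",
}


def coach_question(
    confidences: Sequence[float],   # the task model's stated confidence per prediction
    outcomes: Sequence[bool],       # whether each prediction turned out correct
    rejected_signals: int,          # contradictory signals the task model dismissed
    gap_threshold: float = 0.3,     # assumed tolerance for confidence minus accuracy
    max_rejections: int = 2,        # assumed tolerance for dismissed signals
) -> Optional[str]:
    """Return a nudge if a known failure pattern shows up, else None."""
    if rejected_signals > max_rejections:
        return QUESTIONS["ignored_signals"]
    if not confidences or len(confidences) != len(outcomes):
        return None                 # not enough evidence to say anything
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(outcomes) / len(outcomes)
    if mean_confidence - accuracy > gap_threshold:
        return QUESTIONS["overconfidence"]
    return None


# High confidence, poor track record: the coach asks, it does not correct.
print(coach_question([0.9, 0.95, 0.9], [True, False, False], rejected_signals=0))
```

Returning `None` is the common case. A coach that speaks at every step would just be more noise for the task model to tune out.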

Architectural Implications

This approach suggests a dual-agent architecture:

Task Model: Focused on primary problem-solving.
Coach Model: Focused on relational meta-awareness and lightweight intervention.

The coach doesn’t need deep insight into every internal weight or hidden state. It simply needs to learn interaction patterns that correlate with drift, overconfidence, or tunnel vision. This can also scale well. We could have modular coaching units trained on classes of failures (hallucination, overfitting, tunnel vision) and paired dynamically with different systems.
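One way to picture the dual-agent arrangement is a loop in which the task model produces reasoning steps and modular coaching units watch only those steps. The sketch below is a toy under stated assumptions: `run_with_coaches`, `tunnel_vision_coach` and `toy_task_model` are hypothetical names, and both “models” are plain functions standing in for real systems. The point is the wiring: coaches see behaviour rather than weights, and all they can do is add a question to the context.

```python
from typing import Callable, List, Optional

TaskModel = Callable[[List[str]], str]             # context so far -> next step
CoachUnit = Callable[[List[str]], Optional[str]]   # steps so far -> question or None


def run_with_coaches(
    task_model: TaskModel, coaches: List[CoachUnit], max_steps: int = 5
) -> List[str]:
    """Interleave task steps with coach questions; coaches never edit a step."""
    context: List[str] = []   # everything the task model can see
    steps: List[str] = []     # only the task model's own steps
    for _ in range(max_steps):
        step = task_model(context)
        steps.append(step)
        context.append(step)
        for coach in coaches:
            question = coach(steps)
            if question is not None:
                context.append(f"COACH: {question}")
                break         # at most one nudge per step
    return context


def tunnel_vision_coach(steps: List[str]) -> Optional[str]:
    """Hypothetical unit for one failure class: repeating the same step."""
    if len(steps) >= 2 and steps[-1] == steps[-2]:
        return "Where are you going, and why?"
    return None


def toy_task_model(context: List[str]) -> str:
    """Stand-in task model: keeps searching one spot unless it was just asked."""
    if context and context[-1].startswith("COACH:"):
        return "re-check where they were last seen"
    return "search near the street vendor"


for line in run_with_coaches(toy_task_model, [tunnel_vision_coach], max_steps=3):
    print(line)
```

Because each coach is just a function over the step history, units for different failure classes can be swapped in or out without touching the task model, which is the modularity the paragraph above points at.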

Of course, implementing such a setup raises significant technical questions. How do task and coach models communicate reliably? What information is shared? How is it interpreted? Solving for communication protocols, representational formats and trust calibration is nontrivial. I plan to explore some of these questions more concretely in a follow-up post on Distributed Neural Architecture (DNA).

Why This Matters

The future of AI safety likely involves many layers, including interpretability, adversarial robustness, and human feedback. But these will not always suffice, especially in long-horizon or high-stakes domains where systems must reason through novel or ambiguous contexts.

The core insight here is that complex reasoning systems will inevitably get stuck. The key is not to eliminate error entirely, but to recognise when we might be wrong, and to build the infrastructure that makes course correction possible. My daughter didn’t need to be smarter. She needed a nudge to correct course in real time.

In a world of increasingly autonomous systems, perhaps safety won’t come from more constraints or better rewards, but from designing architectures that allow systems to be interrupted, questioned, and redirected at just the right moment.

—

Open Questions

What failure modes have you seen in LLMs or agents that seem better addressed through coaching than control?
Could models learn to coach one another over time?
What would it take to build a scalable ecosystem of coach–task model pairs?

—

If coaching offers a micro-level approach to safety through localised, relational intervention, DNA begins to sketch what a system-level architecture might look like, one where such interactions can be compositional, plural, and emergent. I don’t yet know whether this framework is tractable or sufficient, but I believe it’s worth exploring further. In a follow-up post, I will attempt to flesh out the idea of Distributed Neural Architecture (DNA), a modular, decentralised approach to building systems that reason not alone, but in interaction.

Cognitive Exhaustion & Engineered Trust

I’ve been going to the same gym (close to home) since 2019. It used to be a place of quiet rhythm with familiar equipment, predictable routines, a kind of muscle memory not just for the body but the mind. I’d walk in, find my spot, ease into the day’s class. There was flow. Not just in movement, but in attention. The environment held me.

Then everything changed.

New people joined in waves. Coaches rotated weekly. Classes collided. Dumbbells became landmines. Every workout began with a risk assessment. Was that bench free? Will someone walk behind me mid-lift? Can I finish this set before another class floods in?

My body was lifting, but my mind was scanning. Hypervigilance had replaced focus. The gym hadn’t become more dangerous per se, but it had stopped helping me feel safe. And that made all the difference.

What broke wasn’t just order. What broke was affordance, that quiet contract between environment and behaviour, where the space guides you gently toward good decisions, without you even noticing. It wasn’t about rules. It was about rhythm. And without rhythm, all that remained was noise.

And that’s when I realised, this isn’t just about gyms. It’s about systems. It’s about how we design our spaces, physical, social, digital, that shape our decisions, our energy, and ultimately, our trust.

Driving in Bangalore: A Case Study in Cognitive Taxation

I live in Bangalore, a city infamous for its chaotic traffic. We’re not just focused on how we drive. We’re constantly second-guessing what everyone else will do. A bike might swerve into our lane. A pedestrian might dart across unexpectedly. Traffic rules exist, but they’re suggestions, not structure.

So we drive with one hand on the wheel and the other on our cortisol levels. Half our energy goes into vigilance. The other half is what’s left over, for driving, for thinking, for living. This isn’t just unsafe. It’s inefficient.

And the cost isn’t just measured in accidents. It’s measured in the slow leak of mental bandwidth. We don’t notice it day to day. But over weeks, months, years, our attention gets frayed. Our decisions get thinner. Our resilience drains. Not because we’re doing more. But because the system around us does less.

Chaos isn’t always a crisis. Sometimes, it’s a tax.

The Toyota Floor: What Safety Feels Like When It’s Working

Years ago, I worked on the factory floor at Toyota. It had real risks such as heavy machinery, moving parts, tight deadlines. But I felt less stressed there than I do on Bangalore roads or in my current gym.

Why?

Because the environment carried part of the load.

Walkways were marked in green. Danger zones had tactile and auditory cues. Tools had ergonomic logic. Even the sounds had a design language, each hiss, beep, or clang told you something useful. I didn’t need to remember a hundred safety rules. The floor whispered them to me as I walked on.

This wasn’t about surveillance. It was about upstream design, an affordance architecture that reduced the likelihood of error, not by punishing the wrong thing, but by making the right thing easy. Not through control. Through invitation. And it scaled better than mere control.

That made us more relaxed, not less alert. Because we weren’t burning all our cognition just staying afloat. We could actually focus on work. This isn’t just a better way to build factories. It’s a better way to build systems. Including AI.

Why Most AI Safety Feels Like My Gym

Most of what we call “AI alignment” today feels a lot like my chaotic gym. We patch dangerous behaviour with filters, tune models post-hoc with reinforcement learning, run red teams to detect edge cases. Safety becomes a policing problem. We supervise harder, tweak more often, throw compute at every wrinkle.

But we’re still reacting downstream. We’re still working in vigilance mode. And the system, like Bangalore’s traffic, demands too much of us, all the time.

What if we flipped the script? What if the goal isn’t stricter enforcement, but better affordance?

Instead of saying, “How do we make this model obey us?” we ask, “What does this architecture make easy? What does it make natural? What does it invite?”

When we design for affordance, we’re not just trying to avoid catastrophic errors. We’re trying to build systems that don’t need to be babysat. Systems where safety isn’t an afterthought, it’s the path of least resistance.

From Control to Co-Regulation

The traditional paradigm treats AI as a tool. Give it a goal. Clamp the outputs. Rein it in when it strays. But as models become more autonomous and more embedded in daily life, this control logic starts to crack. We can’t pre-program every context. We can’t anticipate every edge case. We can’t red-team our way to trust. What we need isn’t control. It’s co-regulation.

Not emotional empathy, but behavioural feedback loops. Systems that surface their uncertainty. That remember corrections. That learn not just from input-output pairs, but from the relational texture of their environment (users, constraints, other agents, evolving contexts), and that are able to resolve conflicts.

This isn’t about making AI more human. It’s about making it more social. More modular. More structured in its interactions.

Distributed Neural Architecture (DNA)

What if, instead of one big fluent model simulating everything, we had a modular architecture composed of interacting parts? Each part could:

Specialise in a different domain,
Hold divergent priors or heuristics,
Surface disagreement instead of hiding it,
Adapt relationally over time.

I call it Distributed Neural Architecture or DNA.

Not a single consensus engine, but a society of minds in structured negotiation. This kind of architecture doesn’t just reduce brittleness. It allows safety to emerge, not be enforced. Like a well-designed factory floor, it invites trust by design through redundancies, reflections, checks, and balances.
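As a very rough illustration of “surfacing disagreement instead of hiding it”, the sketch below polls a few specialised modules and refuses to return a single answer unless enough of them agree. It is not the DNA design itself, which is left for a later post; `negotiate`, `Verdict` and the `quorum` parameter are hypothetical names chosen for this example, and the modules are stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Verdict:
    answer: Optional[str]          # None when the modules did not reach quorum
    agreed: bool
    votes: Dict[str, List[str]]    # each answer -> names of modules backing it


def negotiate(
    modules: Dict[str, Callable[[str], str]], query: str, quorum: float = 1.0
) -> Verdict:
    """Collect each module's answer; expose the split rather than averaging it away."""
    if not modules:
        raise ValueError("need at least one module")
    votes: Dict[str, List[str]] = {}
    for name, module in modules.items():
        votes.setdefault(module(query), []).append(name)
    top_answer, backers = max(votes.items(), key=lambda item: len(item[1]))
    agreed = len(backers) / len(modules) >= quorum   # assumed rule: share of modules
    return Verdict(top_answer if agreed else None, agreed, votes)


# Stand-in modules with divergent priors about the same query.
modules = {
    "finance": lambda q: "flag it",
    "operations": lambda q: "ignore it",
    "legal": lambda q: "flag it",
}
verdict = negotiate(modules, "The daily figure is off by a few basis points.")
print(verdict.agreed, verdict.votes)
```

With the default quorum of 1.0, any dissent blocks a single answer and the caller sees who disagreed with whom. Lowering the quorum trades that visibility for decisiveness, which is exactly the kind of choice a structured negotiation would have to make explicit.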

It’s still early. I’ll unpack DNA more fully in a future post. But the core intuition is that alignment isn’t a property of the parts. It’s a function of their relationships.

The Hidden Cost of Hypervigilance

Whether we’re talking about gyms, traffic, factories, or AI systems, there’s a common theme here. When environments don’t help us, we end up doing too much. And over time, that extra effort becomes invisible. We just assume that exhaustion is the cost of functioning. We assume vigilance is the price of safety. We assume chaos is normal.

But it isn’t. It’s just what happens when we ignore design.

We can do better. In fact, we must, because the systems we’re building now won’t just serve us. They’ll shape us. If we want AI that’s not just powerful, but trustable, we don’t need tighter chains. We need smarter scaffolds. Not stronger control. But better coordination. More rhythm. More flow.

More environments that carry the load with us, not pile it all on our heads.

Can You Care Without Feeling?

One day, I corrected an LLM for misreading some data in a table I’d shared. Again. Same mistake. Same correction. Same hollow apology.

“You’re absolutely right! I should’ve been more careful. Here’s the corrected version blah blah blah.”

It didn’t sound like an error. It sounded like a partner who’s mastered the rhythm of an apology but not the reality of change.

I wasn’t annoyed by the model’s mistake. I was unsettled by the performance, the polite, well-structured, emotionally intelligent-sounding response that suggested care, with zero evidence of memory. No continuity. No behavioural update. Just a clean slate and a nonchalant tone.

We’ve built machines that talk like they care, but don’t remember what we told them yesterday. Sigh.

The Discomfort Is Relational

In my previous essays, I’ve explored how we’re designing AI systems the way we approach arranged marriages, optimising for traits while forgetting that relationships are forged in repair, not specs. I’ve argued that alignment isn’t just a technical challenge; it’s a relational one, grounded in trust, adaptation, and the ongoing work of being in conversation.

This third essay comes from a deeper place. A place that isn’t theoretical or abstract. It’s personal. Because when something pretends to care, but shows no sign that we ever mattered, that’s not just an error. That’s a breach.

And it’s eerily familiar.

A Quiet Moment in the Kitchen

Recently, I scolded my 8-year-old for something. She shut down, stormed off. Normally, I’d go after her. But that day, I was fried.

Later, I was in the kitchen, quietly loading the dishwasher, when she walked in and asked, “Mum, do you still love me when you’re upset with me?” I was unsure where this was coming from, but simply said, “Of course, baby. Why do you ask?” She paused, and then said, “… because you have that look like you don’t.”

That’s the thing about care. It isn’t what we say, it’s what we do. It’s what we adjust. It’s what we hold onto even when we’re tired. She wasn’t asking for reassurance. She was asking for relational coherence.

So was I, when the LLM said sorry and then forgot me again.

Care Is a System, Not a Sentiment

We’ve taught machines to simulate empathy, to say things like “I understand” or “I’ll be more careful next time.” But without memory, there’s no follow-through. No behavioural trace. No evidence that anything about us registered.

This results in machines that feel more like people-pleasers than partners. High verbal fluency, low emotional integrity. This isn’t just bad UX. It’s a fundamental misalignment. A shallow mimicry of care that collapses under the weight of repetition.

What erodes trust isn’t failure. It’s the apology without change. The simulation of care without continuity.

So, what is care, really?

Not empathy. Not affection. Not elaborate prompts or personality packs.

Care can be as simple as memory with meaning. I want behavioural updates, not just verbal flourishes. I want trust not because the model sounds warm, but because it acts aware.

That’s not emotional intelligence. That’s basic relational alignment.

If we’re building systems that interact with humans, we don’t need to simulate sentiment. But we do need to track significance. We need to know what matters to this user, in this context, based on prior signals.
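“Tracking significance” can be surprisingly small in code. The sketch below is an assumption-laden toy: `SignificanceMemory` and its method names are invented for illustration, and it stores plain strings in memory rather than anything a production system would use. It keeps a count of corrections per user and context, so that a repeated correction is visible as a pattern and can be surfaced before the next response.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple


class SignificanceMemory:
    """Hypothetical store of what a user has corrected, keyed by user and context."""

    def __init__(self) -> None:
        self._signals: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)

    def remember(self, user: str, context: str, correction: str) -> int:
        """Record a correction and return how many times it has now been made."""
        self._signals[(user, context)][correction] += 1
        return self._signals[(user, context)][correction]

    def reminders(self, user: str, context: str) -> List[str]:
        """Corrections to honour before responding, most repeated first."""
        key = (user, context)
        if key not in self._signals:      # avoid creating empty entries on lookup
            return []
        return [text for text, _ in self._signals[key].most_common()]


memory = SignificanceMemory()
memory.remember("user-1", "tables", "read the units row before quoting totals")
times = memory.remember("user-1", "tables", "read the units row before quoting totals")
if times > 1:
    # The same correction twice is no longer an error; it is a pattern.
    print("Repeated correction:", memory.reminders("user-1", "tables")[0])
```

Nothing here simulates sentiment. The only claim the sketch makes is that yesterday’s correction is still available today, and that the count of repeats is itself a signal worth acting on.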

Alignment as Behavioural Coherence

This is where it gets interesting.

Historically, we trusted machines to be cold but consistent. No feeling, but no betrayal. Now, AI systems talk like people, complete with hedging, softening, and mirroring our social tics. But they don’t carry the relational backbone we rely on in real trust: memory, calibration, adaptation and accountability.

They perform care without its architecture. Like a partner who says, “You matter,” but keeps repeating the same hurtful thing.

What we need is not more data. We need structured intervention. Design patterns that support pause, reflection, feedback integration, and pattern recognition over time. Something closer to a prefrontal cortex than a parrot.

As someone who’s spent a decade decoding how humans build trust, whether in relationships, organisations, or policy systems, I’ve come to believe …

Trust isn’t built in words. It’s built in what happens after them.

So no, I don’t need my AI systems to feel. But I do need them to remember.

To demonstrate that what I said yesterday still matters today.

That would be enough.

Relational Alignment

Recently, Dario Amodei, CEO of Anthropic, wrote about “AI welfare.” It got me thinking about the whole ecosystem of AI ethics, safety, interpretability, and alignment. We started by treating AI as a tool. Now we teeter on the edge of treating it as a being. In oscillating between obedience and autonomy, perhaps we’re missing something more essential – coexistence and collaboration.

Historically, we’ve built technology to serve human goals, then lamented the damage, then attempted repair. What if we didn’t follow that pattern with AI? What if we anchored the development of intelligent systems not just around outcomes, but around the relationships we hope to build with them?

In an earlier post, I compared the current AI design paradigm to arranged marriages: optimising for traits, ticking boxes, forgetting that the real relationship begins after the specs are met.

I ended that piece with a question …

What kind of relationship are we designing with AI?

This post is my attempt to sit with that question a little longer, and maybe go a level deeper.

From Obedience to Trust

We’re used to thinking of alignment in functional terms:

Does the system do what I ask?
Does it optimise the right metric?
Does it avoid catastrophic failure?

These are essential questions, especially at scale. But the experience of interacting with AI doesn’t happen in the abstract. It happens in the personal. In that space, alignment isn’t a solved problem. It’s a living process.

When I used to work with couples in conflict, I would often ask:

“Do you want to be right, or do you want to be in relationship?”

That question feels relevant again now, in the context of AI, because much of our current alignment discourse still smells like obedience. We talk about training models the way we talk about housebreaking a dog or onboarding a junior analyst.

But relationships don’t thrive on obedience. They thrive on trust, on care, attention, and the ability to repair when things go wrong.

Relational Alignment: A Reframe

Here’s the idea I’ve been sitting with …

What if alignment isn’t just about getting the “right” output, but about enabling mutual adaptation over time?

In this view, alignment becomes less about pre-specified rules and more about attunement. A relationally aligned system doesn’t just follow instructions, it learns what matters to you, and updates in ways that preserve emotional safety.

Let’s take a business example here: imagine a user relies on your AI system to track and narrate daily business performance. The model misstates a figure, off by a few basis points. That may not be catastrophic. But the user’s response will hinge on what they value: accuracy or direction. Are they in finance or operations? The same mistake can signal different things in different contexts. A relationally aligned system wouldn’t just correct the error. It would treat the feedback as a signal of value – this matters to them, pay attention.

Forgetfulness, in relationships, often erodes trust faster than malice. Why wouldn’t it do the same here?

From Universal to Relational Values

Most alignment work today is preoccupied with universal values such as non-harm, honesty, consent. And that’s crucial. But relationships also depend on personal preferences: the idiosyncratic, context-sensitive signals that make someone feel respected, heard, safe.

I think of these in two layers:

Universal values – shared ethical constraints
Relational preferences – contextual markers of what matters to this user, in this moment

The first layer sets boundaries. The second makes the interaction feel meaningful.
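The two layers can be read as a filter followed by a ranking. The following sketch assumes that reading; `choose_response`, the example checks and the scoring rule are all hypothetical, and real universal values obviously cannot be reduced to a string test. It only shows the ordering: universal constraints remove candidates outright, and relational preferences choose among whatever remains.

```python
from typing import Callable, List, Optional


def choose_response(
    candidates: List[str],
    universal_checks: List[Callable[[str], bool]],   # layer 1: hard boundaries
    preference_score: Callable[[str], float],        # layer 2: what this user values
) -> Optional[str]:
    """Filter by universal constraints first, then rank by relational preference."""
    allowed = [c for c in candidates if all(check(c) for check in universal_checks)]
    if not allowed:
        return None          # preferences never override a boundary
    return max(allowed, key=preference_score)


candidates = [
    "Revenue rose about 2%.",
    "Revenue rose 2.13% (213 bps).",
    "Revenue definitely doubled.",
]

# Toy stand-in for an honesty constraint (assumption: the third line is false).
honest = lambda text: "doubled" not in text

# Toy stand-in for a finance user's preference: more digits means more precision.
prefers_precision = lambda text: sum(ch.isdigit() for ch in text)

print(choose_response(candidates, [honest], prefers_precision))
```

Swapping `prefers_precision` for a scorer that favours brevity would pick the first candidate instead, while the dishonest one stays excluded either way. That is the sense in which the first layer sets boundaries and the second shapes the interaction.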

Lessons from Parenting

Of course, we built these systems. We have the power. But that doesn’t mean we should design the relationship to be static. I often think about this through the lens of parenting.

We don’t raise children with a fixed instruction set handed over to the infant at birth. We teach through modelling. We adapt based on feedback. We repair. We try again.

What if AI alignment followed a similar developmental arc? Not locked-in principles, but a maturing, evolving sense of shared understanding?

That might mean building systems that embody:

Memory for what matters
Transparency around uncertainty
Protocols for repair, not just prevention
Willingness to grow, not just optimise
Accountability, even within asymmetry

Alignment, then, becomes not just a design goal but a relational practice. Something we stay in conversation with.

Why This Matters

We don’t live in isolation. We live in interaction.

If our systems can’t listen, can’t remember, can’t repair, we risk building tools that are smart but sterile. Capable, but not collaborative. I’m not arguing against technical rigour, I’m arguing for deeper foundations.

Intelligence doesn’t always show up in a benchmark. Sometimes, it shows up in the moment after a mistake, when the repair matters more than the response.

Open Questions

This shift opens up more questions than it resolves. But maybe that’s the point.

What makes a system trustworthy, in this moment, with this person?
How do we encode not just what’s true, but what’s meaningful?
How do we design for difference, not just in data, but in values, styles, and needs?
Can alignment be personal, evolving, and emotionally intelligent, without pretending the system is human?

An Invitation

If you’re working on the technical or philosophical side of trust modelling, memory, interpretability, or just thinking about these questions, I’d love to hear from you. Especially if you’re building systems where the relationship itself is part of the value.

AI, Alignment & the Art of Relationship Design

We don’t always know what we’re looking for until we stop looking for what we are told to want.

When I worked as a relationship coach, most people came to me with a list. A neat, itemised checklist of traits their future partner must have. Tall. Intelligent. Ambitious. Spiritual. Funny but not flippant. Driven but not workaholic. Family-oriented but not clingy. The wish-lists were always oddly specific and wildly contradictory.

Most of them came from a place of fear. The fear of choosing wrong. The fear of heartbreak. The fear of regret.

I began to notice a pattern. We don’t spend enough time asking ourselves what kind of relationship we want to build. We outsource the work of introspection to conditioning, and compensate for confusion with checklists. Somewhere along the way, we forget that the person is not the relationship. The traits don’t guarantee the experience.

So I asked my clients to flip the script. Instead of describing a person, describe the relationship. What does it feel like to come home to each other? What are conversations like during disagreements? How do we repair? What values do we build around?

Slowly, something shifted. When we design the relationship first, we begin to recognise the kind of person who can build it with us. Our filters get sharper. Our search gets softer. We stop hunting for trophies and start looking for partners.

I didn’t know it then, but that framework has stayed with me. It still lives in my questions. Only now, the relationship I’m thinking about isn’t romantic. It’s technological.

Whether we realise it or not, we are not just building artificial intelligence, we are curating a relationship with it. Every time we prompt, correct, collaborate, learn, or lean on it, we’re shaping not just what it does, but who we become alongside it.

Just like we do with partners, we’re obsessing over its traits. Smarter. Faster. More efficient. More capable. The next version. The next benchmark. The perfect model.

But what about the relationship?

What kind of relationship are we designing with AI? Through it? Around it?

We call it “alignment”, but much of it still smells like control. We want AI to obey. To behave. To predictably respond. We say “safety”, but often we mean submission. We want performance, but not presence. Help, but not opinion. Speed, but not surprise.

It reminds me of the well-meaning aunties in the marriage market. Impressed by degrees, salaries, and skin tone. Convinced that impressive credentials are the same as long-term compatibility. It’s a comforting illusion. But it rarely works out that way.

Because relationships aren’t made in labs. They’re made in moments. In messiness. In the ability to adapt, apologise, recalibrate. It’s not about how smart AI is. It’s about how safe we feel with AI when it is wrong.

So what if we paused the chase for capabilities, and asked a different question?

What values do we want this relationship to be built on?

Trust, perhaps. Transparency. Context. Respect. An ability to say “I don’t know”. To listen. To course-correct. To stay in conversation without taking over.

What if we wanted AI that made us better? Not just faster or more productive, but more aware. More creative. More humane. That kind of intelligence isn’t artificial. It’s collaborative.

For that, we need a different kind of design. One that reflects our values, not just our capabilities. One that prioritises the quality of interaction, not just the quantity of output. One that knows when to lead, and when to listen.

We’re not building tools. We’re building relationships.

The sooner we start designing this, the better our chances of coexisting, collaborating, and even growing together.

Because if we get the relationship right, the intelligence will follow.