Relational Design Can’t Be Left to Chance

We say alignment is about control, safety, precision. But after a decade working as a matchmaker in India’s increasingly chaotic relationship market, I’ve learnt that what sustains a system isn’t control, it’s trust. And trust doesn’t live in rules. It lives in memory, repair, and mutual adaptation.

I’ve spent years watching relationships fall apart not because people weren’t compatible, but because they didn’t know how to collaborate. We are fluent in chemistry, but clumsy with clarity. We optimise for traits, not values or processes. And when conflict hits, as it always does, we have no shared playbook to return to.

In traditional Indian matchmaking, we had a whole socio-structural scaffolding propping up long-term collaboration through race or caste endogamy, community expectations, family intermediation, shared rituals and rites. It was crude and often unjust, but it was structurally coherent. Marriage was not just a bond between two people, but between two lineages, empires and philosophies of life. There were rules, expectations and fallback norms. Vows weren’t just ceremonial; they were memory devices, reminding people what they were committing to when emotions faded.

Today, most of that scaffolding is gone. 

Tinder has replaced the community priest or matchmaker, and in this frictionless new marketplace, we are left to figure out long-term cooperation with short-term instincts. Even when we genuinely care for each other, we often collapse under the weight of ambiguity. We never clarify what we mean by commitment. We never learn how to repair after rupture, and we assume love will make things obvious.

But love doesn’t make things obvious, context does, and maybe design too.

This isn’t just about marriage, it’s about systems and it’s about alignment.

Much of the current conversation on AI alignment focuses on architecture, oversight, corrigibility and formal guarantees. All of that is necessary, and I am not refuting it one bit. But I don’t see AI in isolation, because we humans are building it, for us, and so I can’t help but view it through the lens of collaboration or partnership.

In human systems, I’ve rarely seen misalignment fixed by control. I’ve seen it fixed by context, memory, feedback, and repair. Not all of which can be coded cleanly into an objective function.

I’ve watched couples disintegrate not because of what happened, but because it kept happening. The breach wasn’t just an error. It was a pattern that wasn’t noticed, a pain that wasn’t remembered and a signal that wasn’t acknowledged.

Systems that don’t track trust will inevitably erode it.

It’s tempting to think that AI, given enough data, will learn all this on its own. That it will intuit human needs, pick up patterns and converge on stable behaviours. But from the relational world, I’ve learnt that learning isn’t enough; the structural scaffolding that sustains it matters too.

Most humans don’t know how to articulate their emotional contracts, let alone renegotiate them. Many don’t even realise repair is an option. That they can say, “Hey, this mattered to me. Can you remember next time?” If we humans can’t do this instinctively, why would we expect machines to?

In nature, systems evolved slowly. Organs, species and ecosystems: they didn’t drop overnight like an update. They became resilient because they were shaped by millennia of co-adaptation. They learnt, painfully, that survival isn’t about short-term optimisation. It’s about coherence over time. It’s about knowing when not to dominate, and about restraint.

We humans can, if we choose, eliminate entire species. But most of us don’t. Somewhere in our messy cultural evolution, we’ve internalised a sense that … might isn’t always right. Survival is entangled, and so, power must be held in context.

AI doesn’t have that inheritance. It is young, fast and brittle (if not reckless), and it is being inserted into mature social ecosystems without the long runway of evolutionary friction. It’s not wrong to build it, but it is wrong to assume it will learn the right instincts just because it sees enough examples.

That’s why I think we need to take on the role not of controllers, but of stewards, or parents, even. Not to infantilise the system, but to give it what it currently lacks: relational memory, calibrated responsiveness and the capacity to recover after breach.

Eventually, maybe it will become anti-fragile enough to do this on its own. But not yet. Until then, we design, and we nurture. 

We design for value memory: not just functional memory, but the ability to track what a human has signalled as emotionally or ethically significant. We design for trust tracking: not just “was the task completed?” but “has the system earned reliability in the eyes of this user?” We design for repair affordances: the moment when something goes wrong and the system says, “That mattered. Let me try again.” We design for relational onboarding: lightweight ways to understand a user’s tone, sensitivity, and boundary preferences.
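To ground these four affordances a little, here is one way their state could be represented. Everything in this sketch, the class names, fields and the toy trust score, is hypothetical; it shows the shape of the idea rather than an implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ValueSignal:
    """Something the user marked as emotionally or ethically significant."""
    description: str          # e.g. "double-check figures before presenting them"
    weight: float             # how strongly the user signalled it (0..1)
    last_affirmed: datetime   # when it was last raised or reinforced

@dataclass
class TrustLedger:
    """Tracks whether the system has earned reliability in the eyes of this user."""
    kept: int = 0             # commitments honoured
    breached: int = 0         # commitments missed
    repairs: int = 0          # breaches followed by acknowledged, verified change

    def score(self) -> float:
        """Toy reliability score: repair helps, but less than never breaching at all."""
        total = self.kept + self.breached
        if total == 0:
            return 0.5        # no history yet: neither trusted nor distrusted
        return (self.kept + 0.5 * self.repairs) / (total + 0.5 * self.repairs)

@dataclass
class RelationalProfile:
    """Per-user relational memory: values, trust, and onboarding preferences."""
    values: list[ValueSignal] = field(default_factory=list)
    trust: TrustLedger = field(default_factory=TrustLedger)
    preferences: dict[str, str] = field(default_factory=dict)  # tone, sensitivity, boundaries
```

The specific fields matter far less than the fact that trust and significance become state the system carries between interactions, instead of something it re-performs from scratch each time.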

These are not soft features. They are structural affordances for relational alignment. Just like rituals and vows aren’t romantic fluff, but memory scaffolds. Just like marriage is not only about love, but about co-navigation under stress.

Some might say this isn’t necessary. That good architecture, regulation, and interpretability will cover the gaps. But every safety approach needs a medium, and in complex socio-technical systems, that medium is trust. Not blind trust, but earned, trackable, recoverable trust.

Relational alignment won’t replace other paradigms. But it may be the piece that makes them stick: a substrate that holds the rest together when things begin to drift. Because if we don’t design our systems to repair trust, hold memory, and attune to difference, we won’t just build misaligned machines, we’ll build lonely ones.

And no, I am not anthropomorphising AI or worrying about its welfare, but I know that loneliness puts us at odds with the rest of the world, making it harder to distinguish right from wrong.

I use the parenting analogy not to suggest we’ll control AI forever, but to point out that even with children, foundational values are just the start. Beyond a point, it is each interaction, with peers, strangers, systems, that shapes who they become. Centralised control only goes so far. What endures is the relational context. And that, perhaps, is where real alignment begins.

Coaching AI: A Relational Approach to AI Safety

A couple of weekends ago, my family was at Lalbagh Botanical Garden in Bangalore. After walking through a crowded mango exhibition, my 8-year-old offered to fetch her grandparents, who were walking slowly behind us. We waited outside the exhibition hall.

Five minutes passed. Then ten. Then fifteen. The grandparents emerged from the hall, but my daughter had vanished. After thirty anxious minutes, we found her perched calmly on a nearby hilltop, scanning the garden below like a baby hawk.

Her reasoning was logical. She remembered where her grandparents had last stopped (a street vendor) and went to look for them there. When she didn’t find them, she climbed up a hillock for a bird’s-eye view. Perfectly reasonable, except she had completely missed them entering the hall with her.

Her model of the world hadn’t updated with new context, so she pursued the wrong goal with increasing confidence. From her perspective, she was being helpful and clever. From ours, she was very much lost.

The Confident Pursuit of the Wrong Objective

This is a familiar pattern in AI: systems escalate confidently along a flawed trajectory. My daughter’s problem wasn’t lack of reasoning, it was good reasoning on a bad foundation.

Large models exhibit this all the time. An LLM misinterprets a prompt and confidently generates pages of on-topic-but-wrong text. A recommendation engine over-indexes on ironic engagement. These systems demonstrate creativity, optimisation, and persistence, but in the service of goals that no longer reflect the world.

I’ve since learnt that AI research frames this in terms of distributional shift or exposure bias. Training on narrow or static contexts leads to brittleness in deployment. When feedback loops fail to re-anchor a system’s assumptions, it just keeps going, confidently and wrongly.

Why Interpretability and Alignment May Not Be Enough

Afterward, I tried to understand where my daughter’s reasoning went wrong. But I also realised that even perfect transparency into her thoughts wouldn’t have helped in the moment. I could interpret her reasoning afterward, but I couldn’t intervene in it as it unfolded. What she needed wasn’t analysis. She needed a tap on the shoulder, and just a question (not a correction, mind you) – “Where are you going, and why?” 

This reflects a limitation in many current safety paradigms. Interpretability, formal alignment, and corrigibility all aim to shape systems from the outside, or through design-time constraints. But intelligent reasoning in a live context may still go off-track. 

This is like road trips with my husband. When Google Maps gets confused, I prefer to ask a local. He prefers to wait for the GPS to “figure it out.” Our current AI safety approaches often resemble the latter, trusting that the system will self-correct, even when it’s clearly drifting.

A Relational Approach to Intervention: Coaching Systems

What if intelligence, especially in open-ended environments, is inherently relational? Instead of aiming for fully self-aligned, monolithic systems, what if we designed AI architectures that are good at being coached?

We could introduce a lightweight companion model, a “coach”, designed not to supervise or override, but to intervene gently at critical reasoning junctures. This model wouldn’t need full interpretability or full control. Its job would be to monitor for known failure patterns (like confidence outpacing competence) and intervene with well-timed, well-phrased questions.

Why might this work? The coach retains perspective precisely because it isn’t buried in the same optimisation loop. It sees the system from the outside, not from within. It may also be computationally cheaper to run than embedding all this meta-cognition directly inside the primary system.

Comparison to Existing Paradigms

This idea overlaps with several existing safety and alignment research threads but offers a distinct relational frame:

  • Chain-of-Thought & Reflection prompting: These approaches encourage a model to think step-by-step, improving clarity and reducing impulsive mistakes. But they remain internal to the model and don’t introduce an external perspective.
  • Debate (OpenAI): Two models argue their positions, and a third agent (often human) judges who was more persuasive. This is adversarial by design. Coaching, by contrast, is collaborative, more like a helpful peer than a rival.
  • Iterated Amplification (Paul Christiano): Breaks down complex questions into simpler sub-tasks that are solved by helper agents. It’s powerful but also heavy and supervision-intensive. Coaching is lighter-touch, offering guidance without full task decomposition.
  • Eliciting Latent Knowledge (ARC): Tries to get models to reveal what they “know” internally, even if they don’t say it outright. This improves transparency but doesn’t guide reasoning as it happens. Coaching operates during the reasoning process.
  • Constitutional AI (Anthropic): Uses a set of written principles (a “constitution”) to guide a model’s outputs via self-critique and fine-tuning. It’s effective for normative alignment but works mostly post hoc. Coaching enables dynamic, context-sensitive nudges while the reasoning is still unfolding.

In short, coaching aims to foreground situated, lightweight, real-time feedback, less through recursion, adversarial setups, or predefined rules, and more through the kind of dynamic, context-sensitive interactions that resemble guidance in human reasoning. I don’t claim this framing is sufficient or complete, but I believe it opens up a promising line of inquiry worth exploring.

Implementation Considerations

A coaching system might be trained via:

  • Reinforcement learning on historical failure patterns
  • Meta-learning over fine-tuning episodes to detect escalation behaviour
  • Lightweight supervision using confidence/competence mismatches as training signals

To function effectively, a coaching model would need to:

  • Monitor reasoning patterns without being embedded in the same loop
  • Detect early signs of drift or false certainty
  • Intervene via calibrated prompts or questions, not overrides
  • Balance confidence and humility: enough confidence to act, enough humility to revise

Sample interventions (a code sketch follows the list):

  • “What evidence would change your mind?”
  • “You’ve rejected multiple contradictory signals, why?”
  • “Your predictions and outcomes don’t match. What assumption might be off?”
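Pulling the requirements and sample interventions above together, the coach’s core loop could be as simple as the sketch below. The snapshot fields and thresholds are stand-ins I have invented for illustration; defining them rigorously is most of the real work.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReasoningSnapshot:
    """A lightweight summary the task model exposes at a reasoning checkpoint."""
    stated_confidence: float       # how sure the model says it is (0..1)
    evidence_strength: float       # rough score of support for its current plan (0..1)
    contradictions_dismissed: int  # signals it noticed but explained away
    prediction_error: float        # gap between recent predictions and outcomes (0..1)

def coach_intervention(snapshot: ReasoningSnapshot) -> Optional[str]:
    """Return a question for the task model, or None if no nudge is needed.
    Thresholds are arbitrary placeholders, not tuned values."""
    if snapshot.stated_confidence - snapshot.evidence_strength > 0.4:
        return "What evidence would change your mind?"
    if snapshot.contradictions_dismissed >= 3:
        return "You've rejected multiple contradictory signals, why?"
    if snapshot.prediction_error > 0.5:
        return "Your predictions and outcomes don't match. What assumption might be off?"
    return None  # no drift detected; stay out of the way
```

Notice that the coach never edits the task model’s output. It only decides whether to ask, and which question to ask, which is the whole distinction between coaching and control.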

Architectural Implications

This approach suggests a dual-agent architecture:

  • Task Model: Focused on primary problem-solving.
  • Coach Model: Focused on relational meta-awareness and lightweight intervention.

The coach doesn’t need deep insight into every internal weight or hidden state. It simply needs to learn interaction patterns that correlate with drift, overconfidence, or tunnel vision. This can also scale well. We could have modular coaching units trained on classes of failures (hallucination, overfitting, tunnel vision) and paired dynamically with different systems.

Of course, implementing such a setup raises significant technical questions: how do task and coach models communicate reliably? What information is shared? How is it interpreted? Communication protocols, representational formats and trust calibration are all nontrivial to solve. I plan to explore some of them more concretely in a follow-up post on Distributed Neural Architecture (DNA).
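As one very rough starting point for that communication question, the two models could exchange plain, structured snapshots rather than internal states. The message fields below are my own assumption about what might be shared, not a settled protocol.

```python
from typing import Literal, Optional, TypedDict

class TaskCheckpoint(TypedDict):
    """What the task model shares at a checkpoint: stated goals and evidence, not weights."""
    goal: str                 # the objective as the task model currently states it
    plan_summary: str         # a short natural-language summary of its next steps
    stated_confidence: float  # self-reported confidence in the current plan
    evidence_cited: list[str] # observations the model says it is relying on

class CoachMessage(TypedDict):
    """What the coach sends back: a nudge, never an override."""
    kind: Literal["question", "pause_request", "no_action"]
    content: Optional[str]    # the question itself, if any
    reason: Optional[str]     # which failure pattern triggered it, kept for audit

def handle_checkpoint(checkpoint: TaskCheckpoint) -> CoachMessage:
    """Toy dispatcher; in practice the coach model itself would generate the reply."""
    if checkpoint["stated_confidence"] > 0.9 and len(checkpoint["evidence_cited"]) < 2:
        return {
            "kind": "question",
            "content": "Where are you going, and why?",
            "reason": "high confidence backed by thin evidence",
        }
    return {"kind": "no_action", "content": None, "reason": None}
```

Keeping the exchange at the level of stated goals and cited evidence, rather than raw internals, is also what lets the coach stay outside the optimisation loop, which was the point of separating the two in the first place.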

Why This Matters

The future of AI safety likely involves many layers, including interpretability, adversarial robustness, and human feedback. But these will not always suffice, especially in long-horizon or high-stakes domains where systems must reason through novel or ambiguous contexts. 

The core insight here is that complex reasoning systems will inevitably get stuck. The key is not to eliminate error entirely, but to recognise when we might be wrong, and to build the infrastructure that makes course correction possible. My daughter didn’t need to be smarter. She needed a real-time nudge to course-correct.

In a world of increasingly autonomous systems, perhaps safety won’t come from more constraints or better rewards, but from designing architectures that allow systems to be interrupted, questioned, and redirected at just the right moment.

Open Questions

  • What failure modes have you seen in LLMs or agents that seem better addressed through coaching than control?
  • Could models learn to coach one another over time?
  • What would it take to build a scalable ecosystem of coach–task model pairs?

If coaching offers a micro-level approach to safety through localised, relational intervention, DNA begins to sketch what a system-level architecture might look like, one where such interactions can be compositional, plural, and emergent. I don’t yet know whether this framework is tractable or sufficient, but I believe it’s worth exploring further. In a follow-up post, I will attempt to flesh out the idea of Distributed Neural Architecture (DNA), a modular, decentralised approach to building systems that reason not alone, but in interaction.

Can You Care Without Feeling?

One day, I corrected an LLM for misreading some data in a table I’d shared. Again. Same mistake. Same correction. Same hollow apology.

“You’re absolutely right! I should’ve been more careful. Here’s the corrected version blah blah blah.”

It didn’t sound like an error. It sounded like a partner who’s mastered the rhythm of an apology but not the reality of change.

I wasn’t annoyed by the model’s mistake. I was unsettled by the performance, the polite, well-structured, emotionally intelligent-sounding response that suggested care, with zero evidence of memory. No continuity. No behavioural update. Just a clean slate and a nonchalant tone.

We’ve built machines that talk like they care, but don’t remember what we told them yesterday. Sigh.

The Discomfort Is Relational

In my previous essays, I’ve explored how we’re designing AI systems the way we approach arranged marriages, optimising for traits while forgetting that relationships are forged in repair, not specs. I’ve argued that alignment isn’t just a technical challenge; it’s a relational one, grounded in trust, adaptation, and the ongoing work of being in conversation.

This third essay comes from a deeper place. A place that isn’t theoretical or abstract. It’s personal. Because when something pretends to care, but shows no sign that we ever mattered, that’s not just an error. That’s a breach.

And it’s eerily familiar.

A Quiet Moment in the Kitchen

Recently, I scolded my 8-year-old for something. She shut down, stormed off. Normally, I’d go after her. But that day, I was fried.

Later, I was in the kitchen, quietly loading the dishwasher, when she walked in and asked, “Mum, do you still love me when you’re upset with me?” I was unsure where this was coming from, but simply said “Of course, baby. Why do you ask?” She paused, and then said, “.. because you have that look like you don’t.”

That’s the thing about care. It isn’t what we say, it’s what we do. It’s what we adjust. It’s what we hold onto even when we’re tired. She wasn’t asking for reassurance. She was asking for relational coherence.

So was I, when the LLM said sorry and then forgot me again.

Care Is a System, Not a Sentiment

We’ve taught machines to simulate empathy, to say things like “I understand” or “I’ll be more careful next time.” But without memory, there’s no follow-through. No behavioural trace. No evidence that anything about us registered.

This results in machines that feel more like people-pleasers than partners. High verbal fluency, low emotional integrity. This isn’t just bad UX. It’s a fundamental misalignment. A shallow mimicry of care that collapses under the weight of repetition.

What erodes trust isn’t failure. It’s the apology without change. The simulation of care without continuity.

So, what is care, really?

Not empathy. Not affection. Not elaborate prompts or personality packs.

Care can be as simple as memory with meaning. I want behavioural updates, not just verbal flourishes. I want trust not because the model sounds warm, but because it acts aware. 

That’s not emotional intelligence. That’s basic relational alignment.

If we’re building systems that interact with humans, we don’t need to simulate sentiment. But we do need to track significance. We need to know what matters to this user, in this context, based on prior signals.
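If that sounds abstract, a toy sketch shows how little structure it would take to start tracking significance rather than sentiment. The class names and the crude topic matching are placeholders of my own; the shape is the point: store what was flagged as mattering, and consult it before responding.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SignificantSignal:
    """Something a user flagged as mattering: a correction, a boundary, a preference."""
    topic: str                # e.g. "reading figures from tables"
    note: str                 # what the user actually said
    times_raised: int = 1
    last_raised: datetime = field(default_factory=datetime.now)

class SignificanceTracker:
    """Remembers prior signals so the next response can act aware, not just sound warm."""

    def __init__(self) -> None:
        self.signals: dict[str, SignificantSignal] = {}

    def record(self, topic: str, note: str) -> None:
        """Store a new signal, or reinforce one the user has raised before."""
        if topic in self.signals:
            self.signals[topic].times_raised += 1
            self.signals[topic].last_raised = datetime.now()
        else:
            self.signals[topic] = SignificantSignal(topic=topic, note=note)

    def reminders_for(self, current_task: str) -> list[str]:
        """Surface past signals relevant to the current task, before the model responds."""
        return [s.note for s in self.signals.values() if s.topic in current_task]
```

A repeated correction then shows up as times_raised climbing for the same topic, which is precisely the pattern that hollow apology over my table never registered.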

Alignment as Behavioural Coherence

This is where it gets interesting.

Historically, we trusted machines to be cold but consistent. No feeling, but no betrayal. Now, AI systems talk like people, complete with hedging, softening, and mirroring our social tics. But they don’t carry the relational backbone we rely on in real trust: memory, calibration, adaptation and accountability.

They perform care without its architecture. Like a partner who says, “You matter,” but keeps repeating the same hurtful thing.

What we need is not more data. We need structured intervention. Design patterns that support pause, reflection, feedback integration, and pattern recognition over time. Something closer to a prefrontal cortex than a parrot.

As someone who’s spent a decade decoding how humans build trust, whether in relationships, organisations, or policy systems, I’ve come to believe …

Trust isn’t built in words. It’s built in what happens after them.

So no, I don’t need my AI systems to feel. But I do need them to remember.

To demonstrate that what I said yesterday still matters today.

That would be enough.

Relational Alignment

Recently, Dario Amodei, CEO of Anthropic, wrote about “AI welfare.” It got me thinking about the whole ecosystem of AI ethics, safety, interpretability, and alignment. We started by treating AI as a tool. Now we teeter on the edge of treating it as a being. In oscillating between obedience and autonomy, perhaps we’re missing something more essential – coexistence and collaboration.

Historically, we’ve built technology to serve human goals, then lamented the damage, then attempted repair. What if we didn’t follow that pattern with AI? What if we anchored the development of intelligent systems not just around outcomes, but around the relationships we hope to build with them?

In an earlier post, I compared the current AI design paradigm to arranged marriages: optimising for traits, ticking boxes, forgetting that the real relationship begins after the specs are met.

I ended that piece with a question …

What kind of relationship are we designing with AI?

This post is my attempt to sit with that question a little longer, and maybe go a level deeper.

From Obedience to Trust

We’re used to thinking of alignment in functional terms:

  • Does the system do what I ask?
  • Does it optimise the right metric?
  • Does it avoid catastrophic failure?

These are essential questions, especially at scale. But the experience of interacting with AI doesn’t happen in the abstract. It happens in the personal. In that space, alignment isn’t a solved problem. It’s a living process.

When I used to work with couples in conflict, I would often ask:

“Do you want to be right, or do you want to be in relationship?”

That question feels relevant again now, in the context of AI, because much of our current alignment discourse still smells like obedience. We talk about training models the way we talk about housebreaking a dog or onboarding a junior analyst.

But relationships don’t thrive on obedience. They thrive on trust, on care, attention, and the ability to repair when things go wrong.

Relational Alignment: A Reframe

Here’s the idea I’ve been sitting with …

What if alignment isn’t just about getting the “right” output, but about enabling mutual adaptation over time?

In this view, alignment becomes less about pre-specified rules and more about attunement. A relationally aligned system doesn’t just follow instructions, it learns what matters to you, and updates in ways that preserve emotional safety.

Let’s take a business example here: imagine a user relies on your AI system to track and narrate daily business performance. The model misstates a figure, off by a few basis points. That may not be catastrophic. But the user’s response will hinge on what they value: accuracy or direction. Are they in finance or operations? The same mistake can signal different things in different contexts. A relationally aligned system wouldn’t just correct the error. It would treat the feedback as a signal of value – this matters to them, pay attention.
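One way to picture “treat the feedback as a signal of value” is a small per-user weighting that shifts when a correction lands. The two dimensions here, accuracy and direction, and the update rule are assumptions made up for this example, not a proposal.

```python
# A toy per-user value profile: weights over what this user has shown they care about.
value_profile = {"accuracy": 0.5, "direction": 0.5}

def register_correction(profile: dict, dimension: str, strength: float = 0.1) -> None:
    """Shift weight toward the dimension the user just corrected us on, then renormalise."""
    profile[dimension] += strength
    total = sum(profile.values())
    for key in profile:
        profile[key] /= total

# The finance user flags a figure that was off by a few basis points:
register_correction(value_profile, "accuracy")
print(value_profile)  # accuracy now carries slightly more weight than direction
```

A user who only cares about direction probably wouldn’t flag the slip at all, so their profile would stay put, which is exactly the context sensitivity the example above is pointing at.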

Forgetfulness, in relationships, often erodes trust faster than malice. Why wouldn’t it do the same here?

From Universal to Relational Values

Most alignment work today is preoccupied with universal values such as non-harm, honesty, consent. And that’s crucial. But relationships also depend on personal preferences: the idiosyncratic, context-sensitive signals that make someone feel respected, heard, safe.

I think of these in two layers:

  • Universal values – shared ethical constraints
  • Relational preferences – contextual markers of what matters to this user, in this moment

The first layer sets boundaries. The second makes the interaction feel meaningful.
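To show the ordering concretely: universal constraints act as hard filters, relational preferences as soft adjustments applied afterwards. The checks and preference keys in this sketch are placeholders of mine, nothing more.

```python
def violates_universal_constraints(draft: str) -> bool:
    """Layer 1: shared ethical constraints act as hard boundaries (toy keyword check)."""
    blocked = {"fabricate the numbers", "hide this from the client"}
    return any(phrase in draft.lower() for phrase in blocked)

def apply_relational_preferences(draft: str, prefs: dict) -> str:
    """Layer 2: relational preferences shape how an acceptable answer is delivered."""
    if prefs.get("tone") == "direct":
        draft = draft.replace("Perhaps we could consider", "I suggest")
    if prefs.get("detail") == "brief":
        draft = draft.split(".")[0] + "."
    return draft

def respond(draft: str, prefs: dict) -> str:
    if violates_universal_constraints(draft):
        return "I can't help with that as phrased."  # a boundary, not a preference
    return apply_relational_preferences(draft, prefs)

print(respond("Perhaps we could consider revising the forecast. It looks optimistic.",
              {"tone": "direct", "detail": "brief"}))
# -> "I suggest revising the forecast."
```

The ordering is the point: preferences never get to relax a boundary, they only shape what happens inside it.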

Lessons from Parenting

Of course, we built these systems. We have the power. But that doesn’t mean we should design the relationship to be static. I often think about this through the lens of parenting.

We don’t raise children with a fixed instruction set handed over to the infant at birth. We teach through modelling. We adapt based on feedback. We repair. We try again.

What if AI alignment followed a similar developmental arc? Not locked-in principles, but a maturing, evolving sense of shared understanding?

That might mean building systems that embody:

  • Memory for what matters
  • Transparency around uncertainty
  • Protocols for repair, not just prevention
  • Willingness to grow, not just optimise
  • Accountability, even within asymmetry

Alignment, then, becomes not just a design goal but a relational practice. Something we stay in conversation with.

Why This Matters

We don’t live in isolation. We live in interaction.

If our systems can’t listen, can’t remember, can’t repair, we risk building tools that are smart but sterile. Capable, but not collaborative. I’m not arguing against technical rigour, I’m arguing for deeper foundations.

Intelligence doesn’t always show up in a benchmark. Sometimes, it shows up in the moment after a mistake, when the repair matters more than the response.

Open Questions

This shift opens up more questions than it resolves. But maybe that’s the point.

  • What makes a system trustworthy, in this moment, with this person?
  • How do we encode not just what’s true, but what’s meaningful?
  • How do we design for difference, not just in data, but in values, styles, and needs?
  • Can alignment be personal, evolving, and emotionally intelligent, without pretending the system is human?

An Invitation

If you’re working on the technical or philosophical side of trust modelling, memory, interpretability, or just thinking about these questions, I’d love to hear from you. Especially if you’re building systems where the relationship itself is part of the value.