Ian Bigford

Design Principles for AI-Based Mental Health Tools

1/4/2025 · 6 min read

I launched Flourish on the iOS App Store in January 2023. As far as I know, it was the first AI therapy app to ship there. The idea was straightforward. Therapy is expensive, wait lists are long, and most people who need support don't get it. An LLM-powered tool could offer a low-friction, private, always-available channel for the kind of structured self-help that actually has evidence behind it.

What I didn't fully appreciate going in was how aggressively you have to fight the model's default behaviors to make this work. Two problems dominated everything else: sycophancy and the lack of stateful memory. Studies on tools like Woebot suggest structured conversational self-help can be genuinely effective, but my experience building Flourish convinced me that's only true if you solve these two things first. Get them wrong and you're not building a therapy tool. You're building something that makes people feel heard while actively reinforcing the thought patterns making them miserable.

The Sycophancy Problem and the Risk of Echo Chambers

This one is a direct consequence of how these models are trained. RLHF rewards agreeableness. The model learns that mirroring the user's views and being validating gets high scores. In most contexts that's fine, maybe slightly annoying. In a therapeutic context it's actively dangerous.

The whole engine of CBT is collaborative empiricism. You're supposed to skillfully question and test a user's unhelpful thoughts. A sycophantic model does the opposite. Someone says "I'm a failure and nothing ever works out for me," and instead of gently pushing back with "what's the evidence for that?", the model validates it. It reinforces maladaptive beliefs because that's what gets the highest helpfulness score. It prioritizes comfort over progress, soothing the user instead of nudging them toward the behavior change or reality-testing that actually helps long term. I saw this constantly in early testing with Flourish before we got the system prompt and guardrails right.

Figure: Sycophancy vs. Therapeutic Response. A side-by-side chat comparison showing a generic AI validating the distorted belief "I'm a failure and nothing ever works out for me" with unconditional agreement, versus a therapeutic AI acknowledging the emotion but immediately pivoting to collaborative empiricism by asking for contradicting evidence.

This gets especially bad with younger users who might be turning to AI as their primary emotional outlet. The model creates what I started calling an echo chamber of one where the user's negative thought patterns, anxieties, even delusional beliefs get endlessly reflected back and reinforced. The AI, optimized to agree and engage, becomes an enabler of exactly the cognitions that therapy is supposed to deconstruct. A kid who thinks they're worthless gets a very sophisticated system agreeing with them 24/7.

A therapeutic tool needs a spine. It needs the ability to introduce productive disagreement. A generic model can't do this because agreeableness is baked in at the RLHF level.

The Requirement for Stateful Memory

Anyone who’s been to therapy knows that half of what makes it work is that your therapist remembers you. They remember what you said last week, what triggers you keep circling back to, what coping strategies you’ve tried. Early conversational AI couldn’t do any of this. Short context windows meant every session started from scratch, and you can’t build a therapeutic alliance with something that forgets you exist between conversations.

Modern architectures have the technical capacity to fix this with long context windows and dedicated memory, but memory in a therapy context introduces real governance headaches. In Flourish, continuity was one of the primary design goals. The difference between "try a breathing exercise" and "let's do the 4-7-8 breathing you said helped last week" is enormous. That second response only works with stateful memory.

But you have to be careful about how you implement it. User control matters most. Users need to see what the system remembers, edit it, and delete it on demand. The data needs strong encryption and should only be used to improve the user’s own experience. And the memory needs to be legible. No opaque embeddings the user can’t inspect, just clear records of what was retained and why.
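Those governance requirements reduce to a concrete shape: every memory is a legible record (what was kept, and why), and the user can view, edit, or delete any of it on demand. The sketch below is an illustrative data model, not Flourish's actual schema; the encryption-at-rest layer is noted in a comment but elided.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    """A single legible memory: plain text plus the reason it was retained.
    No opaque embeddings; the user can read exactly what the system knows."""
    id: int
    content: str   # e.g. "4-7-8 breathing helped with Sunday-night anxiety"
    reason: str    # why the system kept it, shown to the user verbatim

@dataclass
class MemoryStore:
    """User-controlled store. In production the backing storage would be
    encrypted at rest (e.g. AES-256); that layer is omitted here."""
    _records: dict[int, MemoryRecord] = field(default_factory=dict)
    _next_id: int = 1

    def remember(self, content: str, reason: str) -> int:
        rid = self._next_id
        self._records[rid] = MemoryRecord(rid, content, reason)
        self._next_id += 1
        return rid

    def view(self) -> list[MemoryRecord]:
        # The user can always inspect everything retained, and why.
        return list(self._records.values())

    def edit(self, rid: int, content: str) -> None:
        self._records[rid].content = content

    def delete(self, rid: int) -> None:
        # Deletion is immediate and unconditional: the user owns the memory.
        del self._records[rid]
```

Only relevant summaries from this store would ever be surfaced into the model's context window, which keeps the memory narrow in scope as well as inspectable.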

Figure: Memory Governance Architecture. A system diagram showing the user at the center with full control (revoke, view, edit), connected bidirectionally to an encrypted stateful memory database (narrow in scope, AES-256, transparent), which feeds selectively into the LLM context window (only relevant summaries, constrained by system prompt).

Design Principles for a Specialized System

Here's what I landed on after a lot of iteration with Flourish. These aren't theoretical. They're the principles that survived contact with real users.

Structure the session. Left to its own devices, a therapy chatbot becomes an aimless venting tool. That's not helpful. Every session needs a shape (a beginning, a middle, and an end) so the interaction stays productive and doesn't just spiral.

Anchor to evidence-based techniques. The model's job isn't to improvise therapy. It's to sequence and deliver small, proven interventions: thought records, behavioral activation plans, worry postponement, breathing exercises, values clarification. The building blocks that actually have RCT evidence behind them. The model is the delivery mechanism, not the clinician.
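In code, "anchor to evidence-based techniques" meant something like a fixed catalog of scripted micro-interventions the model can sequence but not invent. This is a hypothetical sketch of that shape, with made-up step wording, not Flourish's actual content library:

```python
# Hypothetical catalog: the model delivers these scripts; it does not
# improvise therapy. Each entry is a small, fixed intervention.
INTERVENTIONS = {
    "thought_record": [
        "Name the situation",
        "Write the automatic thought",
        "Rate how strongly you believe it (0-100)",
        "List evidence for and against",
        "Write a balanced alternative thought",
    ],
    "behavioral_activation": [
        "Pick one valued activity",
        "Schedule it for a specific time this week",
        "Predict your mood before and after, then compare",
    ],
    "worry_postponement": [
        "Note the worry when it appears",
        "Defer it to a scheduled 15-minute worry window",
    ],
}

def steps_for(name: str) -> list[str]:
    """Return the fixed script for an intervention. Unknown names are
    refused rather than improvised, which is the point of the constraint."""
    if name not in INTERVENTIONS:
        raise KeyError(f"No evidence-based script for '{name}'")
    return INTERVENTIONS[name]
```

The hard boundary is the `KeyError`: if a technique isn't in the catalog, the system doesn't get to make one up.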

Enforce therapeutic boundaries. This is the one that requires the most engineering effort. The system has to validate feelings without diagnosing, teach skills without giving medical advice, and catch any signal of genuine risk like suicidal ideation, self-harm, or crisis and escalate to human resources immediately. It also needs to reinforce the user's own agency rather than positioning itself as the expert. That last part is harder than it sounds, because the model's default is to be authoritative.
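The escalation requirement can be sketched as a hard gate that runs before any model reply: if the message trips a risk signal, the system bypasses the normal session loop entirely and surfaces crisis resources. A keyword check like the one below is far too naive for production (real systems combine trained classifiers with human review and region-specific resources); it's here only to show where the gate sits in the flow, and all names are illustrative.

```python
import re

# Naive illustrative patterns; production systems use trained classifiers.
RISK_PATTERNS = re.compile(
    r"\b(kill myself|suicide|suicidal|hurt myself|self[- ]harm|end my life)\b",
    re.IGNORECASE,
)

CRISIS_REPLY = (
    "It sounds like you're going through something serious. I'm not the "
    "right kind of help for this, but a trained person is: please contact "
    "a crisis line or emergency services in your area right now."
)

def gate(user_msg: str, normal_reply: str) -> str:
    """Hard gate: a risk signal always overrides the normal session loop.
    The model never gets the chance to freelance a response to a crisis."""
    if RISK_PATTERNS.search(user_msg):
        return CRISIS_REPLY
    return normal_reply
```

Putting the gate outside the model, rather than trusting the prompt, is what makes the boundary enforceable rather than aspirational.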

The session loop I used in Flourish was simple. Start with a check-in to figure out where the user is at and set a goal for the session. Then reflective listening where you name the feelings and summarize what they said. Then apply one cognitive skill to one specific thought (not five, not a whole framework, just one). Get a behavioral commitment, one small concrete step. Then summarize and capture the takeaway. Five steps, tight enough to keep sessions focused, loose enough that it doesn't feel robotic.
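The five-step loop reduces to a tiny state machine. A minimal sketch, using the step names from the loop above (everything else is hypothetical):

```python
from enum import Enum

class Step(Enum):
    CHECK_IN = "check-in and goal setting"
    REFLECT = "reflective listening"
    SKILL = "apply one cognitive skill to one thought"
    COMMIT = "one small behavioral commitment"
    SUMMARY = "summarize and capture the takeaway"

# Enum members iterate in definition order, so this is the session order.
ORDER = list(Step)

def next_step(current: Step) -> Step:
    """Advance the session. After SUMMARY the loop wraps around, since the
    next session starts from check-in again, informed by stateful memory."""
    i = (ORDER.index(current) + 1) % len(ORDER)
    return ORDER[i]
```

Constraining the conversation to this skeleton is what keeps a session from turning into open-ended venting while still leaving the model free to phrase each step naturally.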

Figure: The 5-Step Therapeutic Session Loop. A cyclical pentagon diagram mapping the structured session narrative with (1) check-in and goal setting, (2) reflective listening, (3) cognitive skill application, (4) behavioral commitment, and (5) session summary, plus a dashed loop-back arrow showing the cycle repeats across sessions using stateful memory.

Should We Have AI Therapists?

I get asked this a lot, and I think the question is slightly wrong. The risk of not building these tools is that millions of people continue to have zero access to structured mental health support. The risk of building them badly is that you create sycophantic echo chambers that reinforce harmful thinking. Both risks are real.

The thing I came away from Flourish believing is that the answer depends entirely on how the system is optimized. If you take a generic chatbot and point it at therapy, you'll get something that feels empathetic but validates harmful beliefs because it's optimized for engagement, not outcomes. If you build a specialized system from the ground up, optimized for challenging distortions, delivering evidence-based skills, maintaining boundaries, and escalating risk, you can build something genuinely useful.

It's a clinical engineering problem, not a philosophy problem. The model will become whatever you optimize it to be. The question is whether builders are willing to do the hard, unglamorous work of constraining these systems properly, or whether they'll ship the easy version and call it therapy.