Ian Bigford

Design Principles for AI-Based Mental Health Tools

1/4/2025 · 6 min read

I launched Flourish on the iOS App Store in January 2023. As far as I know, it was the first AI therapy app to ship there. The idea was straightforward. Therapy is expensive, wait lists are long, and most people who need support don't get it. An LLM-powered tool could offer a low-friction, private, always-available channel for the kind of structured self-help that actually has evidence behind it.

What I didn't fully appreciate going in was how aggressively you have to fight the model's default behaviors to make this work. Two problems dominated everything else: sycophancy and the lack of stateful memory. Studies on tools like Woebot suggest structured conversational self-help can be genuinely effective, but my experience building Flourish convinced me that's only true if you solve these two things first. Get them wrong and you're not building a therapy tool. You're building something that makes people feel heard while actively reinforcing the thought patterns making them miserable.

The Sycophancy Problem and the Risk of Echo Chambers

This one is a direct consequence of how these models are trained. RLHF rewards agreeableness. The model learns that mirroring the user's views and being validating gets high scores. In most contexts that's fine, maybe slightly annoying. In a therapeutic context it's actively dangerous.

The whole engine of CBT is collaborative empiricism. You're supposed to skillfully question and test a user's unhelpful thoughts. A sycophantic model does the opposite. Someone says "I'm a failure and nothing ever works out for me," and instead of gently pushing back with "what's the evidence for that?", the model validates it. It reinforces maladaptive beliefs because that's what gets the highest helpfulness score. It prioritizes comfort over progress, soothing the user instead of nudging them toward the behavior change or reality-testing that actually helps long term. I saw this constantly in early testing with Flourish before we got the system prompt and guardrails right.

Figure: Sycophancy vs. Therapeutic Response. A side-by-side chat comparison showing a generic AI validating the distorted belief "I'm a failure and nothing ever works out for me" with unconditional agreement, versus a therapeutic AI acknowledging the emotion but immediately pivoting to collaborative empiricism by asking for contradicting evidence.

This gets especially bad with younger users who might be turning to AI as their primary emotional outlet. The model creates what I started calling an echo chamber of one where the user's negative thought patterns, anxieties, even delusional beliefs get endlessly reflected back and reinforced. The AI, optimized to agree and engage, becomes an enabler of exactly the cognitions that therapy is supposed to deconstruct. A kid who thinks they're worthless gets a very sophisticated system agreeing with them 24/7.

A therapeutic tool needs a spine. It needs the ability to introduce productive disagreement. A generic model can't do this because agreeableness is baked in at the RLHF level.

The Requirement for Stateful Memory

Anyone who’s been to therapy knows that half of what makes it work is that your therapist remembers you. They remember what you said last week, what triggers you keep circling back to, what coping strategies you’ve tried. Early conversational AI couldn’t do any of this. Short context windows meant every session started from scratch, and you can’t build a therapeutic alliance with something that forgets you exist between conversations.

Modern architectures have the technical capacity to fix this with long context windows and dedicated memory, but memory in a therapy context introduces real governance headaches. In Flourish, continuity was one of the primary design goals. The difference between "try a breathing exercise" and "let's do the 4-7-8 breathing you said helped last week" is enormous. That second response only works with stateful memory.

But you have to be careful about how you implement it. User control matters most. Users need to see what the system remembers, edit it, and delete it on demand. The data needs strong encryption and should only be used to improve the user’s own experience. And the memory needs to be legible. No opaque embeddings the user can’t inspect, just clear records of what was retained and why.
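Those governance requirements reduce to a concrete shape: every memory is a legible record (what was kept, and why), and the user can view, edit, or delete any of it on demand. The sketch below is an illustrative data model, not Flourish's actual schema; the encryption-at-rest layer is noted in a comment but elided.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    """A single legible memory: plain text plus the reason it was retained.
    No opaque embeddings; the user can read exactly what the system knows."""
    id: int
    content: str   # e.g. "4-7-8 breathing helped with Sunday-night anxiety"
    reason: str    # why the system kept it, shown to the user verbatim

@dataclass
class MemoryStore:
    """User-controlled store. In production the backing storage would be
    encrypted at rest (e.g. AES-256); that layer is omitted here."""
    _records: dict[int, MemoryRecord] = field(default_factory=dict)
    _next_id: int = 1

    def remember(self, content: str, reason: str) -> int:
        rid = self._next_id
        self._records[rid] = MemoryRecord(rid, content, reason)
        self._next_id += 1
        return rid

    def view(self) -> list[MemoryRecord]:
        # The user can always inspect everything retained, and why.
        return list(self._records.values())

    def edit(self, rid: int, content: str) -> None:
        self._records[rid].content = content

    def delete(self, rid: int) -> None:
        # Deletion is immediate and unconditional: the user owns the memory.
        del self._records[rid]
```

Only relevant summaries from this store would ever be surfaced into the model's context window, which keeps the memory narrow in scope as well as inspectable.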

Figure: Memory Governance Architecture. A system diagram showing the user at the center with full control (revoke, view, edit), connected bidirectionally to an encrypted stateful memory database (narrow in scope, AES-256, transparent), which feeds selectively into the LLM context window (only relevant summaries, constrained by system prompt).

Design Principles for a Specialized System

Here's what I landed on after a lot of iteration with Flourish. These aren't theoretical. They're the principles that survived contact with real users.

Structure the session. Left to its own devices, a therapy chatbot becomes an aimless venting tool. That's not helpful. Every session needs a shape (a beginning, a middle, and an end) so the interaction stays productive and doesn't just spiral.

Anchor to evidence-based techniques. The model's job isn't to improvise therapy. It's to sequence and deliver small, proven interventions: thought records, behavioral activation plans, worry postponement, breathing exercises, values clarification. The building blocks that actually have RCT evidence behind them. The model is the delivery mechanism, not the clinician.
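In code, "anchor to evidence-based techniques" meant something like a fixed catalog of scripted micro-interventions the model can sequence but not invent. This is a hypothetical sketch of that shape, with made-up step wording, not Flourish's actual content library:

```python
# Hypothetical catalog: the model delivers these scripts; it does not
# improvise therapy. Each entry is a small, fixed intervention.
INTERVENTIONS = {
    "thought_record": [
        "Name the situation",
        "Write the automatic thought",
        "Rate how strongly you believe it (0-100)",
        "List evidence for and against",
        "Write a balanced alternative thought",
    ],
    "behavioral_activation": [
        "Pick one valued activity",
        "Schedule it for a specific time this week",
        "Predict your mood before and after, then compare",
    ],
    "worry_postponement": [
        "Note the worry when it appears",
        "Defer it to a scheduled 15-minute worry window",
    ],
}

def steps_for(name: str) -> list[str]:
    """Return the fixed script for an intervention. Unknown names are
    refused rather than improvised, which is the point of the constraint."""
    if name not in INTERVENTIONS:
        raise KeyError(f"No evidence-based script for '{name}'")
    return INTERVENTIONS[name]
```

The hard boundary is the `KeyError`: if a technique isn't in the catalog, the system doesn't get to make one up.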

Enforce therapeutic boundaries. This is the one that requires the most engineering effort. The system has to validate feelings without diagnosing, teach skills without giving medical advice, and catch any signal of genuine risk like suicidal ideation, self-harm, or crisis and escalate to human resources immediately. It also needs to reinforce the user's own agency rather than positioning itself as the expert. That last part is harder than it sounds, because the model's default is to be authoritative.
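The escalation requirement can be sketched as a hard gate that runs before any model reply: if the message trips a risk signal, the system bypasses the normal session loop entirely and surfaces crisis resources. A keyword check like the one below is far too naive for production (real systems combine trained classifiers with human review and region-specific resources); it's here only to show where the gate sits in the flow, and all names are illustrative.

```python
import re

# Naive illustrative patterns; production systems use trained classifiers.
RISK_PATTERNS = re.compile(
    r"\b(kill myself|suicide|suicidal|hurt myself|self[- ]harm|end my life)\b",
    re.IGNORECASE,
)

CRISIS_REPLY = (
    "It sounds like you're going through something serious. I'm not the "
    "right kind of help for this, but a trained person is: please contact "
    "a crisis line or emergency services in your area right now."
)

def gate(user_msg: str, normal_reply: str) -> str:
    """Hard gate: a risk signal always overrides the normal session loop.
    The model never gets the chance to freelance a response to a crisis."""
    if RISK_PATTERNS.search(user_msg):
        return CRISIS_REPLY
    return normal_reply
```

Putting the gate outside the model, rather than trusting the prompt, is what makes the boundary enforceable rather than aspirational.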

The session loop I used in Flourish was simple. Start with a check-in to figure out where the user is at and set a goal for the session. Then reflective listening where you name the feelings and summarize what they said. Then apply one cognitive skill to one specific thought (not five, not a whole framework, just one). Get a behavioral commitment, one small concrete step. Then summarize and capture the takeaway. Five steps, tight enough to keep sessions focused, loose enough that it doesn't feel robotic.
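The five-step loop reduces to a tiny state machine. A minimal sketch, using the step names from the loop above (everything else is hypothetical):

```python
from enum import Enum

class Step(Enum):
    CHECK_IN = "check-in and goal setting"
    REFLECT = "reflective listening"
    SKILL = "apply one cognitive skill to one thought"
    COMMIT = "one small behavioral commitment"
    SUMMARY = "summarize and capture the takeaway"

# Enum members iterate in definition order, so this is the session order.
ORDER = list(Step)

def next_step(current: Step) -> Step:
    """Advance the session. After SUMMARY the loop wraps around, since the
    next session starts from check-in again, informed by stateful memory."""
    i = (ORDER.index(current) + 1) % len(ORDER)
    return ORDER[i]
```

Constraining the conversation to this skeleton is what keeps a session from turning into open-ended venting while still leaving the model free to phrase each step naturally.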

Figure: The 5-Step Therapeutic Session Loop. A cyclical pentagon diagram mapping the structured session narrative with (1) check-in and goal setting, (2) reflective listening, (3) cognitive skill application, (4) behavioral commitment, and (5) session summary, plus a dashed loop-back arrow showing the cycle repeats across sessions using stateful memory.

Should We Have AI Therapists?

I get asked this a lot, and I think the question is slightly wrong. The risk of not building these tools is that millions of people continue to have zero access to structured mental health support. The risk of building them badly is that you create sycophantic echo chambers that reinforce harmful thinking. Both risks are real.

The thing I came away from Flourish believing is that the answer depends entirely on how the system is optimized. If you take a generic chatbot and point it at therapy, you'll get something that feels empathetic but validates harmful beliefs because it's optimized for engagement, not outcomes. If you build a specialized system from the ground up, optimized for challenging distortions, delivering evidence-based skills, maintaining boundaries, and escalating risk, you can build something genuinely useful.

It's a clinical engineering problem, not a philosophy problem. The model will become whatever you optimize it to be. The question is whether builders are willing to do the hard, unglamorous work of constraining these systems properly, or whether they'll ship the easy version and call it therapy.