Every other agent SDK treats your prompt as a function input. We build assistants with a theory of mind — a model of who you are, what you really asked for, and how you actually work.
That sounds philosophical. It isn't. It's the difference between an agent that quietly resets every turn, and one that picks up where you left off five minutes ago, three sessions ago, or last week.
"Our assistants have a World Model which enables Theory of Mind, tracked through a dedicated dialogue state and explicitly declared ambiguity."
Scroll down to see what that means in practice.
Twenty SDKs are competing on the same axis: faster tool calls, longer context, fancier reflection loops. They are all variations of the same loop. Read input. Pick a tool. Run. Repeat.
That loop has a ceiling. It can't tell when it's wrong, because it has no model of what "right" would look like. Every turn is a guess in a vacuum. Once the user says something even slightly ambiguous, the ceiling gets very low, very fast.
We took a different bet. The thing missing from "agents" isn't more tools or smarter prompts — it's a model of the person they're talking to, the conversation they're in, and the domain they're working in. That's the unlock. We call it Theory of Mind. Most other frameworks don't have it because they didn't think they needed it.
In cognitive science, theory of mind is your ability to model what another mind knows, wants, and is paying attention to. It's why you don't tell the same person the same joke twice.
For an agent, it's the same idea, made structural: an explicit model of you that survives between turns, between sessions, and across tasks. Not a context window. A model.
Most agents have working memory.
Ours have a model of you.
A theory of mind is a triangulation of three models: the person, the conversation, and the domain. Lose any of the three corners and the assistant becomes a chatbot again.
The person: preferences, prior work, the way you phrase things. Stored across sessions, retrieved when relevant. So when you say "the usual tone," it knows what you mean.
Internally: a layered memory made of a session scratchpad, account-level preferences, and slow-moving context about your team and your work.
The conversation: a live, structured representation of the current goal, the entities in play, and the questions still open. The agent doesn't reread the transcript every turn; it resumes from where the model says it is.
Internally: an explicit dialogue state and a coordinator that turns messy chat history into structure.
The domain: the agent knows the entities of your work and the actions that act on them. "Polish the intro" isn't keyword-matched against tool descriptions; it's grounded against a typed, addressable thing.
Internally: a per-domain World Model — Hugo's is posts and sections, Dana's is tables and metrics, Rowan's is candidates and listings.
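To make the three corners concrete, here is a minimal sketch of how they could be represented. It is an illustration under stated assumptions, not the SDK's actual API: every class, field, and method name below (`UserModel`, `DialogueState`, `WorldModel`, `recall`, `supports`) is hypothetical, and only the layers and entities come from the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class UserModel:
    """Layered memory: session scratchpad, account preferences, team context."""
    session_scratchpad: Dict[str, str] = field(default_factory=dict)
    account_preferences: Dict[str, str] = field(default_factory=dict)
    team_context: Dict[str, str] = field(default_factory=dict)

    def recall(self, key: str) -> Optional[str]:
        # Assumed lookup order: the most specific layer wins.
        for layer in (self.session_scratchpad,
                      self.account_preferences,
                      self.team_context):
            if key in layer:
                return layer[key]
        return None


@dataclass
class DialogueState:
    """Structured view of the conversation, so the transcript need not be reread."""
    goal: Optional[str] = None
    entities_in_play: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)


@dataclass
class WorldModel:
    """Per-domain entity types and the actions that apply to each."""
    actions_by_entity: Dict[str, Set[str]]

    def supports(self, entity_type: str, action: str) -> bool:
        return action in self.actions_by_entity.get(entity_type, set())


# Hypothetical blogging domain; "post" and "section" come from the text,
# the action names are invented for illustration.
blog_world = WorldModel({"post": {"draft", "publish"},
                         "section": {"polish", "rewrite"}})

user = UserModel(account_preferences={"tone": "dry, concise"})
user.session_scratchpad["tone"] = "playful"  # session-level override

state = DialogueState(goal="polish the intro",
                      entities_in_play=["section:intro"])
```

In this sketch, "polish the intro" resolves to a `section` entity that the World Model says can be polished, and `user.recall("tone")` returns the session override rather than the account default.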
A theory of mind is only useful if the agent knows what it doesn't know. Every turn, the agent rates how certain it is about each part of the model — and that rating decides whether to ask, confirm, or proceed. "Asking one good question" isn't a fallback. It's the feature.
This is the hidden axis running through all three models: belief plus uncertainty. It's the same move your friend makes when they say "wait, you mean X or Y?"
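The ask, confirm, or proceed decision can be sketched as a small policy over per-part certainty scores. This is one possible reading, not the actual implementation: the two thresholds, the `Move` enum, and the `choose_move` function are all assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Move(Enum):
    ASK = "ask"
    CONFIRM = "confirm"
    PROCEED = "proceed"


# Hypothetical thresholds; real values would need tuning.
ASK_BELOW = 0.5
CONFIRM_BELOW = 0.8


def choose_move(certainty_by_part: Dict[str, float]) -> Tuple[Move, Optional[str]]:
    """Choose the next move from the least certain part of the model.

    Returns the move and the part it concerns (None when proceeding).
    """
    if not certainty_by_part:
        # Nothing is known yet, so the only sensible move is to ask.
        return Move.ASK, None
    weakest = min(certainty_by_part, key=certainty_by_part.get)
    score = certainty_by_part[weakest]
    if score < ASK_BELOW:
        return Move.ASK, weakest
    if score < CONFIRM_BELOW:
        return Move.CONFIRM, weakest
    return Move.PROCEED, None
```

Keying the decision on the weakest part means a single uncertain element (say, which section "the intro" refers to) produces one targeted question instead of a guess.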
You can drop a thread, come back ten minutes later, and the agent knows where it was. No re-explaining. No "as I mentioned earlier."
The agent stops guessing. It asks one targeted question, holds the plan, and resumes when you answer. You spend less time undoing what it got wrong.
The three-model structure is domain-agnostic; only the World Model's contents are domain-specific. Swap the World Model and you've got a new assistant. That's why it's a factory, not a product.
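The factory idea can be sketched as one assistant type parameterized by its World Model. The names here (`WorldModel`, `Assistant`, `build_assistant`) are hypothetical, and the memory and dialogue-state fields are placeholders; only the assistant names and their entities are taken from this page.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class WorldModel:
    domain: str
    entity_types: Tuple[str, ...]


@dataclass
class Assistant:
    name: str
    world: WorldModel
    # Domain-independent parts, shown as empty placeholders.
    user_memory: dict = field(default_factory=dict)
    dialogue_state: dict = field(default_factory=dict)


def build_assistant(name: str, world: WorldModel) -> Assistant:
    """Same machinery every time; only the World Model differs."""
    return Assistant(name=name, world=world)


hugo = build_assistant("Hugo", WorldModel("blogging", ("post", "section")))
dana = build_assistant("Dana", WorldModel("data", ("table", "metric")))
rowan = build_assistant("Rowan", WorldModel("recruiting", ("candidate", "listing")))
```

All three assistants share one class; the only argument that changes between them is the World Model.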
We aren't building a smarter chatbot. We're building the layer under the chatbot — the part that knows you, knows the conversation, and knows the work.
If you've ever shipped an agent and watched users hit the same five failure modes, this is the layer that fixes them.
Hugo writes blog posts with you. Topic to publication, with your voice intact across the archive.
Dana lives in your data. Cleans, transforms, analyzes, reports, and remembers what's been answered.
Rowan is a recruiting partner. Listings, applications, candidates; patient with a long pipeline.
The onboarder. Asks a few questions and produces the next assistant for your domain.
The same prompt, given to a typical agent and to Hugo. Same words from the user. Different response — because one of them has a model of the room.
That's the entire pitch in 30 seconds. The first agent had a context window. The second had a model.