
Mutable Intelligence: Cognition in Code, Not Weights

Apollo-1 is the first foundation model for neuro-symbolic agents. Its reasoning was built through symbolic representation, not deep learning, shaped from the ground up to work on behalf of companies, not users.

Authors: Ohad Elhelo, Ori Cohen, Co-Founders


01. Summary

Apollo-1 is a neuro-symbolic foundation model. Neural modules handle language and perception. Symbolic modules handle state, logic, and execution. Together — in one reasoning loop, in one pass — they reason over both open-ended language and typed symbolic logic.

An Apollo-1 agent is not stored in weights. It is a program, written in a typed symbolic language designed for task-oriented reasoning. The runtime executes the program. The program is the agent.

Because cognition is now code, it is editable, version-controlled, auditable. The company writes its agents, holds them, and changes them as the business changes. And because code is searchable, the programs improve continuously, against their own traffic. The 2025 leap in coding-agent capability made this operational: coding agents became fluent enough to read, modify, verify, and propose Apollo-1 programs end-to-end. With that, the maintenance cost that had been the standing argument against neuro-symbolic AI at scale fell away.

Generative AI works for users — its principal is the developer, the employee, the individual. Apollo-1 works for companies. An Apollo-1 agent converses with users but must follow business rules. Booking, claims, returns, disputes, payments. The company is the principal.

A different kind of model, for a different kind of agent, running on a different reasoning framework. Cognition in code, not weights. 


02. Neuro-Symbolic AI

Neuro-symbolic AI is a framework for reasoning that composes neural perception with symbolic logic in a single model. Decisions are computed from typed symbolic state, not from token probabilities.

The architecture has been a research direction for decades. It produced no foundation model. Classical symbolic AI tried to encode meaning into its symbols, which forced ontologies to represent the world; the world did not fit. The structures did not compose across domains. The maintenance cost crushed every implementation.

Apollo-1’s symbols are procedural, not semantic. They carry roles, relations, state transitions, and predicates over state — the universal grammar of task-oriented dialogue. Content stays in the neural modules. The symbolic layer knows where a value sits in a program and what role it plays; it does not represent what the value means in the world. This separation is what made the architecture finite and computable for task-oriented dialogue. 

Apollo-1 is the first foundation model for neuro-symbolic AI. A single model, frozen, that generalizes the capability — reasoning over typed symbolic state, binding it to natural language — across every program written for it. Same model, different program, different agent. Improvements to the model propagate to every program built on it.

Three architectural consequences run through the rest of this paper: language and logic operate in one computation; the agent’s cognition becomes a program, separate from the model that runs it; and because the program is software, the agent improves through the same kind of loop that has driven a decade of machine learning — proposed change, validation, commit.


03. Two Kinds of Agents

Two distinct objects are emerging under the name “agent.” They are not variations of the same thing. They have different principals and different jobs.

Open-ended agents work for users. Coding assistants, personal AI, employee productivity tools. The user is the principal; flexibility is the point. Cognition lives at the model provider, and the user adapts to whatever the latest model does. The LLM is the right substrate.

Task-oriented agents work on behalf of companies. The agent that handles a claim, schedules a procedure, processes a return, files a dispute, authorizes a transfer, books a seat. These agents serve users; they represent the company. The company is the principal — the entity whose policies must be enforced, whose lawyers must approve, whose compliance team must audit, whose product team must change behavior when the business changes.

Task-oriented agents require three properties. An architecture has to provide all three.

Reasoning over both language and state, in one model. Users do not follow scripts — they ask unexpected questions, change their mind, go off on tangents. The agent has to reason over what they say. At the same time, the agent must evaluate conditions against state and produce guaranteed outcomes. The ticket cancels only when the passenger is Business Class and Platinum Elite. The payment processes only on explicit confirmation. The refund issues only on documented eligibility. Reasoning over open-ended language and reasoning formally over state have to happen inside the same model, not across two systems trading messages.

Mutable cognition. The agent’s behavior must move with the business. An airline’s cancellation policy changes; a bank’s dispute window changes; a hospital’s scheduling rules change. The agent has to move with them — without retraining, without a model release.

Intelligence in a program, not in weights. The agent has to be an artifact: something compliance can read, legal can sign on, engineering can version. Cognition in weights cannot be read, audited, or attributed to specific decisions. Without an artifact to point at, the agent’s behavior is no one’s responsibility, and no enterprise will deploy responsibility it cannot assign.


04. Why Current Approaches Struggle

Two architectures dominate task-oriented AI today: orchestration frameworks and function-calling LLM agents. Both are LLM architectures wrapped in different scaffolding. Both fail the three-property test for the same structural reason: rules are not a first-class object in either.

Orchestration Frameworks

Orchestration wraps an LLM in a workflow system: state machines, routing logic, branching conditions. The state machine reasons; the LLM converses. Two systems, two computations.

The systems do not share understanding. A user is mid-payment and asks “wait — what’s the cancellation policy before I pay?” No transition was coded for the digression. The system either breaks, gives a canned response, or forces the user back on script. You add a branch. Then users ask about refunds mid-payment. Or shipping. Real deployments accumulate hundreds of branches and still miss edge cases.

The alternative is handing off to the LLM, but the LLM has no model of the flow, the rules, or the accumulated state. It might process the payment without confirmation because it is predicting the next token, not reasoning from state. Conversation and reasoning end up inversely correlated: the tighter the state machine, the worse the user experience; the more the LLM is trusted, the less the behavior holds.

Function-Calling LLM Agents

Function-calling architectures take the opposite approach: give the LLM tools and let it decide when to invoke them. Conversation works. But the LLM’s decisions are sampled from a probability distribution, not computed from state. Prompting, fine-tuning, and output filtering reduce unwanted tool calls; they do not eliminate them. The LLM might call the refund function without verifying documentation. It might skip confirmation. It might invoke a tool with incorrect parameters.

Validation layers — checks that gate a tool call before it executes — help. They do not solve the structural problem: validation is reactive, arriving only after the agent has already decided to act, and the validation logic is written per tool, not derived from any shared model of the domain.
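A minimal sketch of that reactive pattern, under our own assumptions (the names and checks here are illustrative, not any framework's actual API):

```typescript
// A minimal sketch of the reactive pattern; names are illustrative.
type ToolCall = { tool: string; args: Record<string, unknown> };

// Stub standing in for real tool dispatch.
function execute(call: ToolCall): unknown {
  return { ok: true, call };
}

// The LLM has already sampled the decision to call the tool; the gate can
// only veto it after the fact, one hand-written check per tool.
function guardedExecute(call: ToolCall): unknown {
  if (call.tool === "issue_refund" && call.args["documentation_id"] == null) {
    return { blocked: true, reason: "missing documentation" };
  }
  return execute(call);
}
```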

What Rules Are Actually For

Both architectures treat rules as add-ons. In orchestration, a rule is a branch — part of a flow, not part of a model. In function-calling, a rule is a sentence in a system prompt — advisory, soft, forgettable.

Neither treatment captures what rules do in task-oriented cognition. Rules are not just enforcement. They are the structural layer that lets symbolic logic and neural reasoning coexist: when a rule is a symbolic predicate the runtime evaluates, the symbolic side holds the logic absolutely while the neural side handles whatever language comes at it. The rule does not constrain the conversation. It stabilizes the cognition. Without rules-as-structure, the model has nothing to hold, and conversation and reasoning revert to the inverse correlation orchestration and function-calling cannot escape.

Until neuro-symbolic AI, no architecture combined open-ended conversation with reliable enforcement in one model.


05. Origins

In 2017, we began encoding millions of real-user task-oriented conversations into structured data, with a workforce of 60,000 human agents. The insight was not data scale; it was what must be represented.

Task-oriented conversational AI requires two kinds of knowledge in tandem. Descriptive knowledge — entities, attributes, domain content. Procedural knowledge — roles, logic, flows, policies. Datasets are stateless. Logic requires explicit state. We began building a typed symbolic language to capture these recurring structures.

Around 2021, the leap in language models arrived. Modern LLMs replaced the pre-transformer foundations our neural stack had been built on. Language stopped being the bottleneck, and only then did the rest of the architecture become reachable.

Across every domain we tested — booking, scheduling, claims, disputes, renewals, authorizations — task-oriented dialogue followed the same procedural patterns: parameter extraction, intent identification, logic evaluation, policy enforcement, state-dependent branching. We built out the typed symbolic language across these structures, along with the Neuro-Symbolic Reasoner that computes next actions from that encoded state. The procedural logic inside the engine is not learned. It was constructed over years of dissecting conversations into their symbolic elements, with a reputation system that ranked contributions by peer review. Augmented Intelligence — our company’s name — is the term for the loop that produced it.

The second outside contribution arrived in mid-2025: coding agents capable of reading, modifying, and verifying structured codebases end-to-end. Cognition expressed as code is the end-state of a typed symbolic language — what the language was designed to enable. The maintenance cost of authoring and evolving programs in our language had been the standing argument against neuro-symbolic AI at scale. With coding agents at production capability, that argument closed.


06. Apollo-1

Apollo-1 is built on neuro-symbolic architecture. Its inputs are typed symbolic programs and natural-language messages. Its outputs are typed symbolic states and natural-language responses. Reasoning happens in one pass over a single representation: neural modules handle language and perception, symbolic modules handle state and logic, both operating together inside the same computational loop.

The same computation that writes the sentence checks the rule. There is no moment at which the model could choose to break a rule, because rules are part of the computation that produces the response, not a check around it.

What makes Apollo-1 distinct from any other foundation model is where its cognition lives. A language model’s cognition lives in weights — a parameter tensor whose behavior is implicit, produced by training, modifiable only by more training. Apollo-1’s cognition lives in code — a symbolic program whose behavior is explicit, produced by writing, modifiable by editing.

The runtime compiles and executes the program. The program is the agent. Two agents on Apollo-1 run the same runtime and different programs; what makes one a refund agent and another a claims agent is a file. The runtime is shared. The program is local. Apollo-1 is a foundation model in the strict sense: a single model, frozen, that generalizes the capability — reasoning over typed symbolic state, binding it to natural language — across every program written for it.

The Symbolic Language

Apollo-1’s symbolic substrate is a typed programming language for task-oriented reasoning — a grammar that covers a finite domain (task-oriented dialogue) and a runtime that compiles programs written in it directly.

The language has three properties.

It is typed. Every entity, parameter, rule, and tool has a type the runtime checks. A program that does not type-check does not run. Programs cannot reference an entity that is not declared, attach a rule to a tool that does not exist, or pass a value of the wrong shape.

It is finite in its procedural states. Across every domain we have studied, the same procedural structures appear under different content. The grammar covers their full set. Generalization works because the structures repeat: the runtime operates on intents, parameters, actions, and predicates that remain constant across instances, while the values populating them change turn to turn and domain to domain. Novel inputs are approximated to the finite set of procedural states at inference.

It is expressive over content. Any value can occupy any field, because the symbolic language describes procedural roles, not world meaning. The symbolic layer knows where a value sits in a program and what role it plays; the neural layer handles what the value means in the world.
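A minimal sketch of how these three properties can coexist, with hypothetical names of our own (not the language's actual grammar): the procedural roles form a closed, typed set, while the content values are open.

```typescript
// Illustrative only: hypothetical names, not Apollo-1's actual grammar.
// The procedural roles are a closed, typed set; the content values are open.
type ProceduralState =
  | { kind: "intent_identified"; intent: string }
  | { kind: "parameter_extracted"; param: string; value: string }
  | { kind: "predicate_evaluated"; rule: string; holds: boolean }
  | { kind: "action_planned"; tool: string; args: Record<string, string> };

// "Cancel a ticket" and "file a dispute" instantiate the same four kinds;
// only the values change, turn to turn and domain to domain.
const turn: ProceduralState[] = [
  { kind: "intent_identified", intent: "cancel_ticket" },
  { kind: "parameter_extracted", param: "fare_class", value: "Business" },
  { kind: "predicate_evaluated", rule: "cancellation_eligibility", holds: true },
  { kind: "action_planned", tool: "cancel_ticket", args: { pnr: "ABC123" } },
];
```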

An Apollo-1 agent’s program is a typed JSON codebase: five files at the root — agent.aui.json, entities.aui.json, parameters.aui.json, integrations.aui.json, rules.aui.json — and a tools/ directory with one file per tool. Each file is a structured symbolic object with typed fields. Together they describe what the agent does, what it knows, what it must enforce, and what it can call. The program is text, readable end-to-end by anyone who can open a file.
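As a sketch of what one file in the tools/ directory might look like (the field names here are ours, not the actual schema):

```typescript
// Hypothetical shape for a tools/ file; field names are illustrative.
interface ToolFile {
  name: string;
  description: string;
  parameters: { name: string; type: "string" | "number" | "boolean"; required: boolean }[];
  attachedRules: string[]; // must reference rule ids declared in rules.aui.json
}

// A program that attaches a rule that does not exist, or passes a value of
// the wrong shape, fails the type check and does not run.
const cancelTicket: ToolFile = {
  name: "cancel_ticket",
  description: "Cancel a ticket for an eligible passenger.",
  parameters: [
    { name: "pnr", type: "string", required: true },
    { name: "fare_class", type: "string", required: true },
  ],
  attachedRules: ["cancellation_eligibility", "explicit_confirmation"],
};
```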

The Runtime

Apollo-1’s runtime is a neuro-symbolic reasoner. At inference time, it takes a typed symbolic program and a natural-language message and produces agent behavior — turn by turn, against live state.

[Diagram: the Apollo-1 runtime]

A Domain-Agnostic Encoder parses the message into typed symbolic objects — entities, attributes, relational roles — forming the initial symbolic state. 

A Stateful Reasoning Loop iterates until turn completion: the Neuro-Symbolic State Machine maintains symbolic state (procedural progress and descriptive facts), the Symbolic Reasoning Engine computes next actions from state, the Neuro-Symbolic Planner compiles executable plans. 

A Domain-Agnostic Decoder generates natural language from the final state.

Within the loop, neural and symbolic modules operate jointly. The State Machine and Planner are neuro-symbolic by construction — symbolic in operation, neural in the bindings between symbolic objects and the specifics of a turn (forming an exact API query from a typed tool call, for example).

Perception is probabilistic; action selection is not. Given the same state, the runtime makes the same decisions, and every decision in a trace is reproducible from the state that produced it. End-to-end outputs are not deterministic — perception runs through the neural modules, and two phrasings of the same request can produce different initial states. Once a state is formed, the logic over it is fixed. Failures in perception surface as task failure, not as policy violation.

The Symbolic Reasoning Engine is a formal, rule-based engine. Its procedural logic is not learned. It is the reasoning we constructed over years of research and development.
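A schematic of that turn loop, with hypothetical signatures rather than Apollo-1's internals; the neural calls are marked in the comments.

```typescript
// Hypothetical signatures; a schematic of the loop, not Apollo-1's internals.
interface Program {}
interface SymbolicState {}
interface Action {}
interface Plan {}

declare function encode(msg: string): SymbolicState;                  // neural: perception
declare function nextActions(p: Program, s: SymbolicState): Action[]; // symbolic: deterministic
declare function plan(s: SymbolicState, a: Action[]): Plan;           // neuro-symbolic bindings
declare function apply(s: SymbolicState, pl: Plan): SymbolicState;    // execute, update state
declare function turnComplete(s: SymbolicState): boolean;
declare function decode(s: SymbolicState): string;                    // neural: generation

function runTurn(program: Program, message: string): string {
  let state = encode(message); // probabilistic: two phrasings can land here differently
  while (!turnComplete(state)) {
    const actions = nextActions(program, state); // same state in, same actions out
    state = apply(state, plan(state, actions));
  }
  return decode(state); // language out, computed from the final state
}
```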

Authoring

Apollo-1 agents are authored in two places: the CLI, by developers; and the Playground, by anyone working on an agent. Both edit the same files. Apollo-1 reads them at inference, regardless of where they were authored. The two surfaces are described in their own section.

In the CLI, programs are written by the developer’s coding agent. In the Playground, programs are written by the Agent Builder — Apollo-1’s authoring agent: a coding agent embedded in an authoring harness.

Apollo-1 provides the language, the runtime, the schema, the templates, and the documentation. The coding agent does the writing.

[Diagram: the Apollo-1 authoring loop]

The diagram traces Apollo-1’s authoring loop. The user prompt, together with the agent’s context — traces, conversations, documentation — feeds the Authoring Agent, which hypothesizes N candidate program changes. The Runtime compiles each candidate and runs M scenarios against it in parallel, returning M × N traces. The Authoring Agent scores the results. If the score is not 1, the loop continues, with the previous round’s context informing the next hypothesis. When the score reaches 1, the changes surface as suggested diffs for the user to review. The loop can be set to a fixed number of rounds or to run until all conversations pass.
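In code, the loop might look something like this; every name below is illustrative, and N is fixed at an arbitrary example value.

```typescript
// Sketch of the authoring loop in the diagram; all names are illustrative.
interface Context {}
interface Candidate {}
interface Scenario {}
interface Trace {}

declare function hypothesize(ctx: Context, n: number): Candidate[];           // N candidates
declare function compileAndRun(c: Candidate, scenarios: Scenario[]): Trace[]; // M traces each
declare function score(traces: Trace[]): number;                              // 1 = all pass
declare function withFeedback(ctx: Context, c: Candidate, t: Trace[]): Context;

function authoringLoop(ctx: Context, scenarios: Scenario[], maxRounds: number): Candidate | null {
  for (let round = 0; round < maxRounds; round++) {
    for (const c of hypothesize(ctx, 8)) {
      const traces = compileAndRun(c, scenarios);
      if (score(traces) === 1) return c;  // surfaced as suggested diffs for review
      ctx = withFeedback(ctx, c, traces); // this round's results inform the next hypothesis
    }
  }
  return null; // round budget exhausted without a passing candidate
}
```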

 

Augmented Intelligence (AUI) Inc. Patents Pending.


07. Properties

Many properties follow from cognition being code rather than weights. Three are central. Rules become a first-class object. Every decision is recorded in a trace anyone can read. And the program improves the way software improves — by proposed change, validation, and commit.

Rules as a First-Class Object

In Apollo-1, rules are typed symbolic predicates the runtime evaluates against state. The agent does not break rules for the same reason a compiler does not ignore types: the rule is part of the computation that produces the response, not a check around it.

In function-calling LLM agents, rules are sentences in a system prompt — advisory, soft, forgettable. In orchestration frameworks, rules are branches in a state machine — rigid, siloed, coded by hand. Apollo-1 makes rules first-class symbolic objects.

The structural payoff is wider than enforcement. When rules are part of the computation, they do not constrain the conversation; they stabilize the cognition. The symbolic side holds the logic absolutely — the predicate fires or it does not — and the neural side stays free to handle whatever language comes at it. This is what allows Apollo-1 to reason formally and openly in a single model.

When you define your tools, Apollo-1 generates an ontology — a typed representation of your entities, parameters, and relationships, shared across all your tools and interactions. From the ontology, you define the rules the agent must enforce. Apollo-1 supports a growing set of rule types, each evaluated symbolically at runtime: Policy Rules (unconditional enforcement), Confirmation Rules (explicit user consent before execution), Authentication Rules (identity verification before execution), Conditional Rules (rules that apply only when specific conditions are met), and Sequencing Rules (rules that enforce ordering). You author them in natural language or by uploading a document; Apollo-1 expresses them as symbolic predicates. They live in rules.aui.json as code.
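A hedged sketch of what entries in rules.aui.json might amount to, one shape per rule type; the field names are ours, not Apollo-1's schema.

```typescript
// Hypothetical shapes for the five rule types; not Apollo-1's actual schema.
type Rule =
  | { type: "policy"; id: string; predicate: string }                    // unconditional
  | { type: "confirmation"; id: string; beforeTool: string }             // explicit consent
  | { type: "authentication"; id: string; beforeTool: string }          // identity first
  | { type: "conditional"; id: string; when: string; predicate: string } // scoped
  | { type: "sequencing"; id: string; first: string; then: string };     // ordering

const disputeWindow: Rule = {
  type: "policy",
  id: "dispute_window",
  predicate: "today - txn.date <= 8 days", // evaluated symbolically at runtime
};
```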

Logic enforcement is formal; perception is not. When the Symbolic Reasoning Engine evaluates a rule, the evaluation is deterministic — if the predicate is today - txn.date <= 8 days and the transaction is 9 days old, the action is blocked. Every time. Perception remains probabilistic, handled by the neural modules. The system can misunderstand what is being requested. It cannot decide to skip a required step or forget a policy mid-conversation. Misclassification affects whether an action is attempted, not whether the rule is enforced.
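Worked through in ordinary code, the evaluation is just arithmetic over state; this is an illustration of the property, not the engine itself.

```typescript
// Illustrative arithmetic only: the point is that the outcome is a function
// of state, not a sample from a distribution.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function withinDisputeWindow(today: Date, txnDate: Date): boolean {
  const ageDays = (today.getTime() - txnDate.getTime()) / MS_PER_DAY;
  return ageDays <= 8;
}

// A 9-day-old transaction evaluates to false, so the action is blocked. Every time.
withinDisputeWindow(new Date("2026-02-10"), new Date("2026-02-01")); // -> false
```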

White-Box Traceability

Every turn produces a trace that records the symbolic computation in full: the user’s intent as parsed, the entities resolved, the tools considered, the rules evaluated, the predicates that fired, the parameters extracted, the decisions made and the reasons attached, the response generated. Each is a real object in the trace, addressable in code, comparable across turns.

A language model’s reasoning is a forward pass through a parameter tensor. What happens inside that pass is not addressable. You can ask the model why it did something and get an explanation, but the explanation is generated text, not a record of the computation.

Apollo-1’s reasoning is different. The trace is the computation, recorded as it happens. The runtime cannot decide one thing and trace another, because the trace is not produced separately from the decision — they are the same object. “Why did you block that?” has a literal answer the runtime can produce: this rule, this predicate, this state.
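One way to picture the trace is as a typed record per turn; the field names below are hypothetical, mirroring the list above.

```typescript
// Hypothetical trace shape mirroring the fields described above.
interface TurnTrace {
  intent: string;                                                  // as parsed
  entities: Record<string, string>;                                // as resolved
  toolsConsidered: string[];
  rulesEvaluated: { rule: string; predicate: string; fired: boolean }[];
  decisions: { action: string; reason: string }[];                 // reasons attached
  response: string;                                                // what was said
}

// "Why did you block that?" becomes a lookup over the computation itself,
// not a generated explanation.
function whyBlocked(t: TurnTrace): { rule: string; predicate: string; fired: boolean }[] {
  return t.rulesEvaluated.filter((r) => r.fired);
}
```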

Failures can be diagnosed to the exact point where perception diverged from intent — and fixed by modifying the program, under validation, rather than by retraining a model. Compliance teams can audit which rules fired against which states across every conversation. Operations teams can read why a tool was or was not called. Product teams can locate the exact predicate responsible for a behavior they want to change.

Schema Optimization

Cognition expressed as code inherits the properties of code: it can be searched, compared, versioned, improved. Learning in this paradigm did not have to be invented from scratch — it follows from cognition being code, and code being something you can search over.

The search is not gradient descent. Code is discrete; there is no slope to follow. Change one thing in a program and the behavior either changes or it doesn’t. But search-then-keep is older than gradient descent — genetic algorithms, program synthesis, reinforcement learning, and AlphaZero all produce real learning over discrete spaces. The principle that you can improve a system by proposing new versions, running them, and keeping the ones that score better has been established for a long time. What had not existed was a version of that loop for a foundation model running symbolic programs over open-ended language. Three pieces were missing: a symbolic language expressive enough to specify task-oriented reasoning directly, a runtime that executes it, and something that could propose new versions of programs in it — fast and cheap enough to run hundreds of candidates against a test set. Coding agents became that something. Apollo-1 runs the programs they write.

[Diagram: Apollo-1 schema optimization]

The diagram traces this loop end-to-end. The agent’s current traces on a held-out test set seed each round. The Authoring Agent hypothesizes N candidate program variants. The Runtime — the same Neuro-Symbolic Reasoner that handles inference in deployment — compiles each variant and runs N parallel calls, returning N program traces for scoring. Top-scored candidates pass to the Validation Gate, which rejects any variant that regresses on previously passing scenarios. Surviving variants seed the next hypothesis round. The loop iterates until improvement plateaus; the resulting program is surfaced for review.
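A sketch of that search-then-keep loop over programs, under our own assumptions; all names below are illustrative, and the variant count is an arbitrary example.

```typescript
// Search-then-keep over programs, per the diagram; all names are illustrative.
interface Program {}
interface Trace {}
interface Scenario {}

declare function proposeVariants(base: Program, traces: Trace[], n: number): Program[];
declare function runHeldOut(p: Program): Trace[];                     // same runtime as inference
declare function scoreTraces(t: Trace[]): number;
declare function regresses(p: Program, passing: Scenario[]): boolean; // validation gate

function optimize(base: Program, passing: Scenario[], rounds: number): Program {
  let best = base;
  let bestScore = scoreTraces(runHeldOut(best));
  for (let i = 0; i < rounds; i++) {
    for (const v of proposeVariants(best, runHeldOut(best), 16)) {
      const s = scoreTraces(runHeldOut(v));
      // Keep only variants that score better AND clear the validation gate.
      if (s > bestScore && !regresses(v, passing)) {
        best = v;
        bestScore = s;
      }
    }
  }
  return best; // surfaced for review, not auto-committed
}
```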

Language model agents improve when the model is updated. Apollo-1 agents can improve continuously, against their own traffic, at the site of deployment, by the company running the agent. Throughout, rules are evaluated symbolically the same way as at inference: a candidate program cannot improve a score by bypassing a rule, because the rule is part of the computation, not a check around it.


08. The CLI and the Playground

Apollo-1 agents are authored in two places. The CLI is the developer surface; the Playground is the working surface for everyone touching an agent — engineers, compliance officers, operations leads, product managers, customer-experience owners. Both edit the same typed JSON codebase. Both run against the same runtime.

The CLI

An Apollo-1 agent is a typed JSON codebase, edited in Cursor, VS Code, or any editor with full schema autocomplete. Version control is git. Diffs against rules.aui.json are real diffs.

Programs are written by coding agents in the developer’s terminal or IDE. The developer prompts a coding agent with what the agent should do; the coding agent generates the program against the symbolic language specification; the developer iterates. Apollo-1 provides the language, the runtime, the templates, and the documentation. Program creation is not gated. Anything that can read the spec and produce well-typed output can write an Apollo-1 agent.


Pull requests work as pull requests work. Tests against fixtures work as tests against fixtures work. An agent’s history is a commit log; an agent’s review is a code review; an agent’s rollback is a git revert.

The Playground

The Playground is the agent’s working surface. Engineers use it to inspect reasoning, watch rules fire, and iterate on a policy without leaving the browser. Stakeholders without code in their workflow use it to author and edit the program in natural language.

Two surfaces sit side by side. On the right, the agent — the code at runtime. The agent talks and acts; click any turn and the white-box trace opens — initial state, execution, rule evaluation, generation. On the left, the brain — the agent in code. The same program, addressable in three modes: Build (English, where the Agent Builder authors and modifies the program), View (the structured UI), and Code (the raw .aui.json files). Same files in all three modes. Same files Apollo-1 reads at inference.

In Build mode, you work with the Agent Builder — Apollo-1’s authoring agent. The Agent Builder is a coding agent embedded in a context-rich authoring harness: read-write access to the program, live access to the reasoning trace of every turn, schema validation that ensures proposed changes type-check, scenario evaluation against test cases, and a quality gate that decides what commits. The user describes intent. The Agent Builder reads the program, forms a hypothesis about what change would achieve the intent, proposes a diff. If the first attempt does not land, it iterates. Every edit is a diff against a real file. Every change is a version: auditable, revertable, attributable. The same version is viewable as a code diff in the Playground itself, appears as a diff in the engineer’s IDE, and — when pushed — is what Apollo-1 reads at inference.

We tested the full loop against a Credit Card Dispute agent configured with the rule “block disputes when a transaction is pending or already disputed.” A user requests a dispute on a $99.99 Apple iTunes charge that is still pending; the Symbolic Reasoning Engine evaluates the predicate against state, the action is blocked, the trace records every step. In the Agent Builder, an operations lead types “policy change: we now allow users to dispute pending transactions.” The Agent Builder locates the rule, produces a diff against rules.aui.json, updates the user-facing explanation so the agent’s spoken refusal matches the new policy, evaluates the change against scenarios, and pushes it live. In a fresh conversation, the agent files the same pending Apple iTunes dispute — blocked under the old rule, processed under the new one. The loop took under a minute, with no engineer involved and no redeployment, and every step is auditable.
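A hedged sketch of what the before and after of that rule might amount to; the fields are hypothetical, matching the scenario above rather than the actual file.

```typescript
// Hypothetical before/after for the dispute-eligibility rule; illustrative fields.
const before = {
  id: "dispute_eligibility",
  type: "policy",
  predicate: "txn.status != 'pending' && txn.status != 'disputed'",
  explanation: "Pending or already-disputed transactions cannot be disputed.",
};

const after = {
  id: "dispute_eligibility",
  type: "policy",
  predicate: "txn.status != 'disputed'", // pending transactions may now be disputed
  explanation: "A transaction that is already disputed cannot be disputed again.",
};
```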


09. What Apollo-1 Isn’t For

Apollo-1’s architecture makes deliberate trade-offs. By optimizing for task-oriented agents, it does not compete in other domains — by design.

Open-ended creative work. Creative writing, brainstorming, exploratory dialogue where variation creates value. Transformers remain the superior architecture. Apollo-1’s symbolic structures enforce consistency; creativity often requires the opposite.

Code generation. Apollo-1 can integrate with code execution tools, but its symbolic language is purpose-built for task execution, not software development.

Low-stakes, high-variation scenarios. Customer engagement campaigns, educational tutoring, entertainment chatbots — when conversational variety enhances user experience, probabilistic variation is preferable to formal enforcement.


10. General Availability

Ahead of general availability, Apollo-1 is already deployed at scale across dozens of enterprises in regulated and unregulated industries, including Fortune 500 companies. A strategic go-to-market partnership with Google is in place.

A Preview Playground is accessible here, featuring Apollo-1 agents across HR, IT, regulated industries, retail, automotive warranties, and more — domains where we have active early deployments, simulated for preview. Each agent runs from its program alone, viewable as code in the Playground or in a UI view for non-technical stakeholders. The Agent Builder is available on every one of them.

A technical paper — architectural specifications, formal proofs, procedural ontology samples, turn-closure semantics — will be released alongside GA.

General Availability: Q2 2026.

Apollo-1 integrates with existing generative AI workflows and adapts to any API or external system — no changes to endpoints, no data preprocessing. Native connectivity with Salesforce, HubSpot, Zendesk, and others. Full MCP support.

GA launches with:

  • The Conversational API — for task-oriented dialogue, with Messaging and System Prompt endpoints.
  • The Apollo-1 Playground and CLI.
  • Full documentation and toolkits.

Following in 2026:

  • The Workflow Automation API — a second API modality, for task-oriented workflows that do not originate in a user message.
  • Voice support.
  • Fully local agent development in the CLI — pushing the agent file as-is, without server-side compilation.

Apollo-1 improves on three axes. Its neural modules improve with every advance in low-latency LLMs. Its symbolic language evolves as we extend its coverage of task-oriented reasoning. And as coding-agent capability advances, so does the ease of building on, evolving, and learning from Apollo-1.


11. Conclusion

Open-ended agents work for users; their cognition lives at the model provider, and that suits the user.

Apollo-1 is the foundation model for agents that work on behalf of the entity the user is talking to — agents whose cognition the company writes, owns, and changes. Filing claims, opening disputes, processing returns, authorizing payments, completing bookings: these are the conversations that run the economy.

A different kind of model, for a different kind of agent, running on a different reasoning framework — the first every organization can call its own.
