The control plane
for local intelligence.
Engram remembers. Persona speaks. Cortex knows. Pyre brings them to life.
A multi-agent AI runtime that runs on your hardware. Orchestrates any model from any provider — local or cloud, switched per conversation. Composes Engram for personal memory, Persona for personality, and Cortex for company memory — each one a standalone MCP server you can also use in Claude Desktop, Cursor, Cline, or any MCP-compatible client.
The local cognitive stack.
Free in Core.
Ollama and LM Studio run models. Pyre runs an orchestration system on top — four engineered pieces that turn 16 GB of consumer VRAM into something that feels like a 200K-context cloud model. Co-designed across Engram, Persona, and the Pyre engine, so the components compound instead of fighting each other.
@onenomad/pyre-context-budget
Context Budget Engine
One tested module owns the watermark math: clamp(floor(ctx × 0.65), 2k, 24k). Slot-aware so the runtime decides what to compact instead of dropping the conversation. Renderer and engine read from the same source — no drift between client and server.
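The watermark formula above fits in a few lines of TypeScript. This is an illustrative sketch; `compactionWatermark` is an invented name, not the actual export of @onenomad/pyre-context-budget:

```typescript
// clamp(floor(ctx * 0.65), 2k, 24k) — the compaction watermark scales
// with the model's context window but never drops below 2,000 tokens
// or rises above 24,000.
function compactionWatermark(ctxWindow: number): number {
  const raw = Math.floor(ctxWindow * 0.65);
  return Math.min(Math.max(raw, 2_000), 24_000);
}

console.log(compactionWatermark(32_768));  // 21299 — a 32K model
console.log(compactionWatermark(2_048));   // 2000 — clamped up to the floor
console.log(compactionWatermark(131_072)); // 24000 — clamped at the ceiling
```

Because both the renderer and the engine would call the same function, the client and server agree on when compaction fires by construction.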
@onenomad/pyre-tool-vault
Tool Output Vault
Pyre exposes ~70 tools — that's 70 ways to blow your context. The Vault catches tool returns, stores raw output on disk, and hands the agent a structured summary plus a get_tool_output(id) escape hatch. 40–60% token reduction on agentic sessions.
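The Vault pattern can be sketched as follows. Everything here beyond what the text describes is an assumption: the class name, the truncation-based summarizer, and the method names are invented stand-ins for the real @onenomad/pyre-tool-vault API.

```typescript
type VaultEntry = { id: string; raw: string; summary: string };

class ToolOutputVault {
  private entries = new Map<string, VaultEntry>();
  private counter = 0;

  // Intercept a tool's raw return: persist it, hand back a compact
  // summary plus an id the agent can use to retrieve the full payload.
  store(raw: string): { id: string; summary: string } {
    const id = `tool-${++this.counter}`;
    // A real implementation would summarize with a small model;
    // truncation stands in for that here.
    const summary = raw.length > 200 ? raw.slice(0, 200) + " …" : raw;
    this.entries.set(id, { id, raw, summary });
    return { id, summary };
  }

  // The get_tool_output(id) escape hatch from the text.
  getToolOutput(id: string): string | undefined {
    return this.entries.get(id)?.raw;
  }
}

const vault = new ToolOutputVault();
const { id, summary } = vault.store("x".repeat(10_000)); // a huge tool return
// The agent's context gets the short summary; the 10k-char payload
// stays on disk until explicitly requested.
```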
@onenomad/pyre-compaction-sidecar
Compaction Sidecar
A small model on a parallel slot summarizes scrollback without touching your main inference loop. OpenAI-compatible — works with llama-server, Ollama, vLLM, OpenRouter. Drop-in upgrade for the Vault summarizer; falls back gracefully on outage.
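A sidecar call might look like the sketch below. The endpoint, port, and model name are placeholders; any server speaking the standard `/v1/chat/completions` protocol would slot in.

```typescript
// Summarize scrollback via an OpenAI-compatible sidecar. Returns null
// on any failure so the caller can keep its built-in summarizer.
async function summarizeScrollback(
  scrollback: string,
  baseUrl = "http://localhost:8081/v1", // placeholder sidecar address
): Promise<string | null> {
  try {
    const res = await fetch(`${baseUrl}/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "qwen3-1.7b", // illustrative: a small model on a parallel slot
        messages: [
          { role: "system", content: "Summarize this conversation scrollback tersely." },
          { role: "user", content: scrollback },
        ],
        max_tokens: 512,
      }),
    });
    if (!res.ok) return null;
    const data = await res.json();
    return data.choices?.[0]?.message?.content ?? null;
  } catch {
    // Graceful fallback on outage: signal the caller rather than throw.
    return null;
  }
}
```

The `null` return doubles as the outage signal, so losing the sidecar degrades to the built-in summarizer instead of stalling the main loop.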
@onenomad/persona-mcp
Persona minimal-context mode
Soul + persona at three sizes — minimal (~400 tokens), standard (~1–2K), full (~3–16K). Pick the budget; keep the personality. Drops VOICE & STYLE on minimal so a 14B model on a 16 GB GPU still feels like itself at 30K visible context.
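Tier selection by budget could be as simple as the sketch below. The tier names come from the text; the function name and the context thresholds are illustrative assumptions, not the real Persona API.

```typescript
type PersonaTier = "minimal" | "standard" | "full";

// Pick the largest persona that leaves room for real work.
// Thresholds are illustrative, not Persona's actual cutoffs.
function pickPersonaTier(visibleContext: number): PersonaTier {
  if (visibleContext >= 64_000) return "full";     // room for a ~3-16K persona
  if (visibleContext >= 32_000) return "standard"; // ~1-2K persona
  return "minimal";                                // ~400 tokens, drops VOICE & STYLE
}

console.log(pickPersonaTier(30_000)); // "minimal" — the 16 GB GPU case
```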
Four primitives.
One runtime.
Same runtime from a single user's laptop to a company's on-prem deployment — local models or any cloud provider, your data, your hardware. Engram and Persona are standalone MCP packages on npm — composed by Pyre, usable from any MCP client.
Pyre
The control plane for local intelligence.
Runs Qwen3-14B on a 16 GB consumer GPU with up to 200K effective context and 8-hour agent sessions at zero API spend. Multi-agent runtime that adapts to your hardware, switches between local and any cloud provider per conversation, and remembers everything via Engram. Pro keeps your agents, personal memory, and personality continuous across every device. Enterprise adds on-prem deployment, SSO, audit, and managed company-memory ingestion via Cortex.
Engram
MCP server
Long-term personal memory for AI agents. 92% R@10 on LoCoMo — beats Mem0, Zep, Letta, and ChatGPT memory. Pluggable storage backend: the same codebase serves a local user via LanceDB or a multi-tenant cloud deployment via Postgres + pgvector.
npm install @onenomad/engram-mcp
Persona
MCP server
Evolving personality for AI agents. Three-part soul system keeps a coherent voice that grows with you instead of resetting every session.
- Soul — User territory — PERSONALITY.md, STYLE.md, SKILL.md files you edit directly.
- Role — Overlays for context — developer, designer, pm, writer, researcher.
- Journal — Persona's territory — evolution proposals land here before you apply them.
npm install @onenomad/persona-mcp
Cortex
MCP server
Your company’s knowledge as one searchable surface — Confluence, Jira, Linear, Notion, Obsidian, Bitbucket, GitHub, Slack — with permission-aware retrieval and on-prem deployment. Stateless on the answer side: retrieval and ranking happen in Cortex, LLM calls happen in the connected runtime. No duplicated LLM cost in the multi-tenant tier — enterprises bring their own model.
One memory. One personality. Every client.
Engram and Persona are MCP servers. Any MCP-compatible client connects to the same memory and the same personality — Pyre, Claude Desktop, Cursor, Cline, Windsurf, Continue, ChatGPT connectors. Switch tools without re-introducing yourself.
One memory follows you across every MCP client.
One personality follows you across every MCP client.
Most memory products lock you into their app. Engram and Persona are npm packages with open MCP servers — your data is yours, and so is the choice of client.
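As an example of that portability, wiring Engram and Persona into Claude Desktop is a matter of listing them under `mcpServers` in `claude_desktop_config.json`. The config key format is Claude Desktop's standard; the assumption here is that both packages expose stdio MCP servers runnable via npx:

```json
{
  "mcpServers": {
    "engram": {
      "command": "npx",
      "args": ["-y", "@onenomad/engram-mcp"]
    },
    "persona": {
      "command": "npx",
      "args": ["-y", "@onenomad/persona-mcp"]
    }
  }
}
```

Cursor, Cline, and the other MCP clients take an equivalent server entry in their own config files.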
Free locally.
Pay for continuity.
Open-core funnel: free local runtime → paid cloud features → enterprise contracts. Same product, three deployment shapes.
- Full local runtime — every model, every provider
- Single-machine personal memory (Engram) + personality (Persona)
- Multi-agent orchestration up to your hardware limit
- Desktop · CLI · Web server · Chrome companion
- Local cognitive stack — 200K effective context on 16 GB GPU
- Everything in Core
- Your Persona, personal memory, and project context follow you across every device — E2E encrypted
- Always-on background agents — long sessions keep running when your laptop sleeps
- Nightly E2E-encrypted backups — restore your AI on any new machine in minutes
- Full memory portability — export your Persona and Engram any time. Your data, your call.
- Curated plugin & SKILL.md catalog
- Everything in Pro
- On-prem deployment — Terraform module, Helm chart, air-gap installer
- SSO / SAML — Okta, Azure AD, Google Workspace
- Audit logging, RBAC, configurable retention
- Cortex Enterprise — managed company-memory ingestion
- 99.9% SLA + implementation services
Pyre is not...
It helps to know what Pyre isn't before deciding whether it's for you.
- × A model: We don't train. Bring whichever you trust.
- × A chat wrapper: Not competing with the ChatGPT / Claude / Gemini surface.
- × A cloud inference service: Selling cloud inference contradicts the local-first brand.
- × A workflow builder: Not n8n / Zapier / Make. Different problem.
vs the alternatives
Pyre overlaps with several categories without belonging to any of them. Here's what we do that they don't.
Multi-agent runtime + personal memory + persona on top. The local cognitive stack (CBE · Vault · Sidecar · Persona-min) makes Pyre Core measurably better at long-running agentic work — not just a prettier model launcher.
Local-first by default. Multi-provider — switch model per conversation. Persona persists across sessions; Engram remembers. Your data stays on your machine unless you choose otherwise.
Different surface. Pyre orchestrates broader work — research, docs, agents, glue between tools — not just coding. Use Pyre alongside your IDE, not instead of it.
Cortex returns answers. Pyre takes action. Open-core, on-prem option, MCP-native, multi-source — not locked to one vendor's collaboration suite.
We don't train models.
We orchestrate them.
Most AI tools today are chat wrappers around a single hosted model, or vertical apps that lock you into one stack. Pyre's bet: own the layer between your data and the model. That layer is durable when models change, durable when providers change, and accumulates value as your persona and personal memory deepen.
Local-first by default
Pyre runs on your machine — CPU, Mac Metal, CUDA, ROCm, Vulkan all auto-detected. Your data never leaves unless you choose a cloud provider per conversation. No vendor lock-in, no telemetry tax.
Open-core, not open-bait
Apache 2.0, open source from day one. Cloud and enterprise tiers fund the work. The Core tier is genuinely good, not a crippled demo. Cortex is the one exception — source-available under a separate commercial license.
Own the cognitive layer
Models are commoditizing. The durable value is personal memory, personality, and knowledge — the layer between your data and any model. Pyre owns that layer so you can swap models without losing yourself.
Built by someone who lives in the workflow.
Matt Stvartak. Senior JavaScript engineer. Solo founder of OneNomad.
The whole stack — Pyre, Engram, Persona, Cortex — was shipped in under 30 days using Claude Code as an engineering pair, by someone who lives in the exact AI-augmented workflow Pyre is designed for.
Pyre is in public beta. We're raising a small pre-seed. Investor inquiries: hello@onenomad.dev.
One command. Any machine.
Pyre auto-detects your hardware on first launch — CPU, Mac Metal, CUDA, ROCm, or Vulkan — and picks a model that fits. No config files, no ceremony.
# macOS / Linux
curl -fsSL https://getpyre.dev/install.sh | sh
# Or via npm
npm install -g @onenomad/pyre
# Run
pyre start