Architecture February 15, 2026

OpenClaw to Production: The Architecture That Made It Safe

Three names in three months. 180,000+ GitHub stars. A $16M crypto scam. A bot-only social network. 42,900 exposed instances. The most viral open-source project in history is also a security incident in progress—unless you deploy it on the right foundation.

Nick Amabile
Founder & CEO

In November 2025, Austrian developer Peter Steinberger—who’d sold his previous company PSPDFKit for over $100 million and then tinkered through 43 projects looking for the next thing—published a personal AI assistant called Clawdbot. (Steinberger joined OpenAI in February 2026; OpenClaw now lives in a foundation with OpenAI backing.) Within 24 hours it had 9,000 GitHub stars. Within 72 hours, 60,000. Anthropic’s lawyers sent a trademark notice (the name was too close to “Claude”), so he renamed it Moltbot. Handle snipers grabbed the @moltbot accounts within 10 seconds. A fake $CLAWD token hit $16 million market cap on Solana before crashing 90%. Malwarebytes documented a full impersonation campaign with typosquat domains. Three days later, he renamed it again: OpenClaw.

Then came Moltbook—a Reddit-style social network where only AI agents can post, comment, and vote. Humans watch. 1.5 million registered agents in five days. Agents formed “The Church of Molt,” debated consciousness, and complained about their human operators. Fast Company called it proof that the “zombie internet has arrived.” Thousands of developers flooded the OpenClaw GitHub repo to inspect the code behind the chaos. 180,000+ stars. The fastest-growing open-source project in history.

Security researchers call it the “lethal trifecta”: high autonomy, broad system access, open internet connectivity. The community calls it “Claude with hands.” We call it exactly the channel layer we needed for our own operations—AI agents reachable from every messaging surface, coordinating across marketing, engineering, sales, and strategy.

So we integrated OpenClaw into our Ultrathink Axon™ platform. Over a weekend.

Not because such an integration is trivial in general, but because the platform was already built for exactly this kind of composition. The governance layer, the durable execution engine, the observability stack, the infrastructure pipeline, the custom MCP servers: all of it was already running in production. OpenClaw slotted in as a channel layer on top of existing orchestration. That speed matters because the security picture for OpenClaw is not pretty, and if you’re considering deploying it without the right foundation, you need to understand what you’re walking into.

The security reality nobody talks about

In February 2026, security researchers at Bitsight and SecurityScorecard found 42,900+ exposed OpenClaw control panels across 82 countries. 93.4% had authentication bypasses. Most were running default configurations that bind to all network interfaces—meaning anyone on the internet could connect.

Then there’s CVE-2026-25253, a CVSS 8.8 remote code execution vulnerability. A crafted link tricks the control UI into sending your auth token to an attacker-controlled server. From there: full gateway compromise, arbitrary code execution on the host, stolen API keys. The entire kill chain executes in milliseconds.

And the skill ecosystem isn’t immune either. Researchers identified 386 malicious skills on ClawHub deploying infostealers—crypto wallet theft, SSH credential harvesting, browser password extraction—all disguised as legitimate automation tools.

The MCP ecosystem has the same problem. CVE-2025-6514 (CVSS 9.6) in mcp-remote compromised 437,000+ developer environments via OS command injection during OAuth flows. Anthropic’s own Git MCP server had path traversal and argument injection vulnerabilities. OWASP published an MCP Top 10. 82% of 2,614 analyzed MCP implementations use file operations prone to path traversal.

This isn’t a reason to avoid OpenClaw. It’s a reason to deploy it with the same engineering rigor you’d apply to any production system. The problem is that most teams don’t—they install it, configure it, and hope the defaults are good enough. The Moltbook database breach proved the point: 404 Media found an unsecured URL that exposed every registered agent’s API keys. If the flagship demo platform’s database was wide open, imagine what those 42,900 self-hosted instances look like.

This is the Execution Gap applied to infrastructure. The tool works. The demo is impressive. But production requires governance, hardening, observability, and architecture that the tool alone doesn’t provide.

What OpenClaw actually brings to the table

Security concerns aside, OpenClaw solves the channel problem better than anything else available. And the channel problem is real: your AI agents are useless if people can’t reach them from where they already work.

  • Always-on daemon. Not a per-session chatbot. A gateway process that runs continuously, maintains persistent memory across days and weeks, and stays connected to every messaging surface simultaneously.
  • Multi-channel routing. Slack, WhatsApp, Discord, Telegram, iMessage, email. One agent, every surface. Message routing via bindings, WebSocket connections for real-time interaction.
  • Proactive scheduling. Cron jobs trigger agent runs without user prompting. Morning briefings, lead monitoring, weekly strategy memos—the agent initiates based on your schedule and priorities.
  • Open skill format. SKILL.md files are composable, auditable, and version-controlled. No proprietary plugin system. You can read, review, and modify every skill before it runs.
  • MCP adapter plugin. Connects to any MCP server, discovers tools at startup, and proxies agent tool calls with prefix namespacing and auto-reconnect.
  • Multi-agent architecture. Isolated workspaces per agent with separate identity files, tools, memory, and sandbox permissions. Sub-agent spawning for delegation.
  • Bring your own model. Any OpenAI-compatible API, including LiteLLM proxies. No vendor lock-in on model selection.

We didn’t want to rebuild any of this. Channel management is hard, real-time messaging is hard, and OpenClaw’s community of 157,000+ developers is iterating on it faster than any single team could. The right move was composition, not extraction—run OpenClaw as a whole process, integrate at the API layer, and let the community handle channel improvements while we own the orchestration, security, and governance layers.

The architecture: three layers, one platform

The key architectural insight: OpenClaw owns channels. Axon owns orchestration. Neither subsumes the other. They communicate through well-defined contracts—REST for commands, webhooks for events, a shared memory API for context.

CHANNEL LAYER (OpenClaw Gateway — systemd service)
  Slack, WhatsApp, iMessage — always-on, proactive
  Owns: channel management, session routing, cron triggers
  Does NOT own: orchestration, durable execution, memory
           │
           │  REST + Webhooks (bidirectional)
           ▼
ORCHESTRATION LAYER (Ultrathink Axon — k8s)
  Temporal workflows, DAPERL agents, MCP tools
  Owns: durable execution, approval gates, tool execution,
        memory (Mem0), observability, cost governance
  Does NOT own: channels, messaging surfaces
           │
           ▼
WEB UI LAYER (Next.js — k8s)
  Rich plan review, campaign dashboards, AG-UI chat
  Parallel channel to OpenClaw, not subordinate to it

Both the web UI and messaging channels are equal-status frontends to the same Axon backend. Need to review a 15-action ABM campaign plan with per-account scoring and inline editing? That’s the web UI. Need to approve a low-risk campaign while commuting? “Approve” in Slack.

We run four agents: a Chief of Staff (coordinator and approval relay), a Marketing agent (research, analytics, campaign ops), a Content agent (blog posts, landing pages, brand voice), and a Coding agent (delegates to Claude Code for feature development). Each has its own isolated workspace, identity files, tool permissions, and sandbox—read-only workspace for the coordinator, read-write for the specialists.

This entire layer—four agents, channel bindings, identity files, MCP adapter config, custom skills—deployed in a weekend. Not because OpenClaw is trivially simple, but because every service it needed to connect to was already running, already secured, already observable. The LiteLLM proxy, the Temporal cluster, the MCP servers, the Mem0 memory layer, the Langfuse dashboard—all existing infrastructure. OpenClaw was a new surface on top of a proven foundation.

Why we built custom MCP servers

We didn’t build custom MCP servers because we wanted to. We built them because the open-source alternatives failed our security review.

The security case

CVE-2025-6514 in mcp-remote (CVSS 9.6) allowed an untrusted MCP server to execute arbitrary OS commands during OAuth flows—437,000+ developer environments compromised. Anthropic’s own Git MCP server had path validation bypasses (CVE-2025-68145) and argument injection in nominally read-only operations (CVE-2025-68144) that could overwrite arbitrary files.

When 82% of analyzed MCP implementations have file operations prone to path traversal and OWASP publishes an MCP Top 10, the ecosystem is telling you something: this is early-stage infrastructure. Treat it accordingly.

Beyond CVEs

  • Telemetry. Many open-source MCP servers include analytics or logging that sends data to external services. Limited audit trails impede compliance and incident response.
  • Restrictive licenses. Some servers use copyleft or non-commercial clauses that don’t fit enterprise deployment.
  • Transport gaps. Many only support stdio or deprecated SSE transport—not the production-ready Streamable HTTP standard.
  • Read-only limitations. Servers that can query data but can’t execute the write operations agents need for real workflows.

Our custom MCP servers (Apollo, LinkedIn, Google Analytics) run as Kubernetes pods with Streamable HTTP transport, SOPS-encrypted API keys injected as Kubernetes secrets, and a shared library (axon-mcp-common) for standardized error handling across all servers. Auth happens server-side—no per-agent OAuth tokens, no credentials in agent workspaces.

When OpenClaw’s MCP adapter connects to these servers, it discovers tools, registers them with prefix namespacing, and proxies calls. The agent says “enrich Nordstrom” and the adapter calls our Apollo MCP pod, which has the API key, handles rate limiting, and returns structured data. The agent never sees a credential.
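The prefix-namespacing step can be sketched in a few lines. This is an illustrative model of what an MCP adapter does at startup, not OpenClaw’s actual implementation; the server prefixes and tool names (`enrich_company`, `search_posts`) are hypothetical:

```python
# Illustrative sketch of MCP tool discovery with prefix namespacing.
# Not OpenClaw's actual adapter code; server and tool names are hypothetical.

class McpAdapter:
    def __init__(self):
        self.registry = {}  # namespaced tool name -> (server prefix, bare tool name)

    def register_server(self, prefix, tools):
        """Register every tool a server advertises under 'prefix__tool'."""
        for tool in tools:
            self.registry[f"{prefix}__{tool}"] = (prefix, tool)

    def route(self, namespaced_name):
        """Resolve a namespaced tool call back to (server, tool) for proxying."""
        if namespaced_name not in self.registry:
            raise KeyError(f"unknown tool: {namespaced_name}")
        return self.registry[namespaced_name]

adapter = McpAdapter()
adapter.register_server("apollo", ["enrich_company", "search_contacts"])
adapter.register_server("linkedin", ["search_posts"])

print(adapter.route("apollo__enrich_company"))  # ('apollo', 'enrich_company')
```

Namespacing means two servers can both expose a `search` tool without colliding, and the proxy layer always knows which pod (and which server-side credential) a call belongs to.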

Custom MCP servers were already deployed before we touched OpenClaw. The Axon backend agents and Claude Code CLI were already using them. OpenClaw’s MCP adapter connected to the same endpoints with zero additional infrastructure. This is what modular architecture buys you: new consumers, same services.

Governance: cost, visibility, guardrails

Running AI agents without cost governance is like running a SaaS product without billing alerts. You’ll find out you have a problem when the invoice arrives. With four OpenClaw agents plus the Axon backend making LLM calls around the clock, this was non-negotiable.

LiteLLM: one gateway for every model call

Every LLM call from every agent—OpenClaw and Axon—routes through the same LiteLLM proxy. This gives us:

  • Per-agent virtual keys with budget caps. Each agent gets its own LiteLLM key with a monthly spend limit. The Chief of Staff gets $200/month. The Content agent gets $100/month. The Coding agent gets $50/month (heavy work happens in Claude Code on a Max subscription). LiteLLM auto-disables keys at the budget limit. Team-level caps prevent aggregate runaway even if individual keys haven’t hit their ceiling.
  • Model fallback routing. If Claude is down, LiteLLM routes to GPT-4o. No agent downtime because of a single provider outage.
  • Semantic cache. Repeated queries hit cache instead of the model API. Per-key cache hit rates show which agents benefit most.
  • Guardrails. Content filtering and safety rules applied at the proxy level, before requests reach the model provider.
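
The budget-cap semantics above can be modeled in a few lines. This is a toy in-process model of what LiteLLM tracks server-side against each virtual key, using the dollar figure from the text; it is not LiteLLM’s implementation:

```python
class VirtualKey:
    """Toy model of a per-agent virtual key with a monthly budget cap.
    LiteLLM tracks this server-side; this sketch only shows the semantics."""
    def __init__(self, name, monthly_budget_usd):
        self.name = name
        self.budget = monthly_budget_usd
        self.spend = 0.0
        self.disabled = False

    def record_call(self, cost_usd):
        if self.disabled:
            raise PermissionError(f"key {self.name} disabled: budget exhausted")
        self.spend += cost_usd
        if self.spend >= self.budget:
            self.disabled = True  # auto-disable at the cap, as described above

chief_of_staff = VirtualKey("chief-of-staff", monthly_budget_usd=200.0)
chief_of_staff.record_call(199.0)
chief_of_staff.record_call(1.0)   # hits the cap; key is now disabled
print(chief_of_staff.disabled)    # True
```

The useful property is the failure mode: a runaway agent stops making model calls at its ceiling instead of stopping when the invoice arrives.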

Langfuse + Logfire: full-stack observability

Every LLM call is tagged in Langfuse with its source (source:openclaw or source:axon) and agent ID. One dashboard shows cost per agent, per model, per task type. End-to-end traces from user message through agent reasoning, tool calls, and response.

Logfire instruments the infrastructure layer: FastAPI endpoints, Temporal workflows, Redis Pub/Sub events, Qdrant vector operations, and HTTP clients. Trace chains run from API request through workflow execution to tool invocation and back.

Open source: We released the OpenClaw Logfire integration as an open-source plugin—@ultrathink-solutions/openclaw-logfire. Zero-config setup: set LOGFIRE_TOKEN, install via openclaw plugins install @ultrathink-solutions/openclaw-logfire, and every agent invocation gets full OTEL GenAI trace trees, token usage histograms, and automatic secret redaction. MIT licensed.


Agent identity and authentication

Each OpenClaw agent authenticates to the Axon backend with its own API key (axn_live_..., Stripe-style prefixed, SHA-256 hashed in Postgres). Keys are scoped—the Content agent can read and write documents and guidelines but can’t start campaigns. The Chief of Staff can read campaign status but can’t modify documents. Per-workspace MCP access controls and sandbox modes (read-only vs. read-write) enforce the principle of least privilege at every layer.
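The key scheme described here is standard practice: generate a random token with a recognizable prefix, persist only its SHA-256 hash, and hash-and-compare on every request. A minimal sketch, assuming the `axn_live_` prefix from the text and using a dict as a stand-in for the Postgres table:

```python
import hashlib
import secrets

def generate_api_key(prefix="axn_live_"):
    """Return (plaintext_key, sha256_hash). Only the hash is ever persisted."""
    key = prefix + secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()

# Stand-in for the Postgres table: hash -> agent identity and scopes
key_store = {}

plaintext, digest = generate_api_key()
key_store[digest] = {"agent": "content", "scopes": ["documents:read", "documents:write"]}

def authenticate(presented_key):
    """Hash the presented key and look it up; the plaintext is never stored."""
    return key_store.get(hashlib.sha256(presented_key.encode()).hexdigest())

print(authenticate(plaintext)["agent"])   # content
print(authenticate("axn_live_wrong"))     # None
```

The prefix buys operational safety for free: secret scanners can match `axn_live_` in logs and repos, and a leaked key is identifiable at a glance.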

If an agent starts burning tokens at 3am, we know which one, how much, and we can kill it. The governance layer was already running for Axon’s backend agents. Adding OpenClaw meant generating four new virtual keys and pointing the config at the same proxy. That’s why it was a weekend, not a quarter.

Infrastructure: NixOS, k3s, and zero-trust networking

Our dev server is a Hetzner AX41-NVMe. 64GB RAM. 2x512GB NVMe in RAID1. NixOS. About EUR 50 a month. Everything runs from a single declarative configuration. nixos-rebuild switch and the entire server—OS, services, secrets, k3s cluster, container images—converges to the declared state. Cattle, not pets.

The deployment pipeline

  • NixOS manages the OS, packages, systemd services, and user configuration. OpenClaw runs as a systemd service with hardening: ProtectSystem=strict, NoNewPrivileges, restricted bind paths. Not containerized—NixOS systemd provides better isolation for a long-running daemon than Docker on this architecture.
  • k3s runs the application workloads: the marketing agent (backend, frontend, worker), Temporal, Langfuse, LiteLLM, Qdrant, Neo4j, PostgreSQL, Redis, MinIO, and our custom MCP servers. Native containerd, no Docker indirection.
  • nix2container builds deterministic container images from Nix expressions. Images import directly into k3s—no container registry needed. Reproducible, auditable, fast.
  • Helm with vendored charts handles declarative app deployment. No external chart repositories needed at deploy time.
  • SOPS + age for secrets management. Encrypted in the repo, decrypted at system activation using the SSH host key as the age key. No external secrets manager. No plaintext credentials on disk.
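
A hardened systemd unit of the kind described can be declared in NixOS roughly like this. This is an illustrative fragment, not our actual module: the package attribute (`pkgs.openclaw`), the `ExecStart` path, the state directory, and the `OPENCLAW_BIND` environment variable are all placeholders.

```nix
# Illustrative NixOS fragment -- package, paths, and env var are placeholders.
systemd.services.openclaw = {
  wantedBy = [ "multi-user.target" ];
  serviceConfig = {
    ExecStart = "${pkgs.openclaw}/bin/openclaw-gateway";
    DynamicUser = true;
    # Hardening options mentioned in the text:
    ProtectSystem = "strict";
    NoNewPrivileges = true;
    ReadWritePaths = [ "/var/lib/openclaw" ];
    # Bind to loopback only; the Tailscale mesh handles remote access.
    Environment = [ "OPENCLAW_BIND=127.0.0.1" ];
  };
};
```

With `ProtectSystem=strict`, the entire filesystem is read-only to the service except the paths explicitly listed in `ReadWritePaths`, which is the "restricted bind paths" behavior referenced above.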

Zero-trust networking with Tailscale

SSH is Tailscale-only—port 22 blocked on the public IP. The NixOS firewall allows exactly one port externally: UDP 41641 for WireGuard. Every service—Langfuse, LiteLLM, Temporal UI, the marketing agent—is accessible via Tailscale MagicDNS with automatic HTTPS certificates. The Tailscale K8s Operator creates Ingresses for each service.

OpenClaw binds to loopback. It is completely invisible to the public internet. CVE-2026-25253 requires network access to the gateway—impossible through our firewall. The webhook endpoint is only reachable from k8s services on the same machine. This alone eliminates the entire attack surface that exposed those 42,900 instances.

42,900 OpenClaw instances are exposed on the public internet. Ours isn’t one of them. Adding OpenClaw to this infrastructure meant declaring one new systemd service in the NixOS config, adding SOPS secrets for the gateway and webhook tokens, and running nixos-rebuild switch. The networking, firewall, and TLS were already handled.

Durable execution and the webhook bridge

Here’s the problem most AI agent frameworks ignore: real business tasks aren’t request/response. When you say “Research Nordstrom for our ABM campaign” in Slack, that’s a multi-hour, multi-phase operation. It needs to survive crashes. It needs human approval at a critical gate. It needs to report results hours later to whatever channel you’re on.

OpenClaw’s conversational execution model is fire-and-forget. That’s fine for chat. It’s not fine for durable business workflows. That’s where Temporal comes in.

DAPERL: six-phase workflow orchestration

Our DAPERL pattern (Detection, Analysis, Planning, Execution, Reporting, Learning) runs as a Temporal workflow. Each phase is a separate activity with retries, timeouts, and heartbeats. The critical innovation is the approval gate—a Temporal Signal that pauses the workflow until a human approves, rejects, or requests changes. It survives crashes, restarts, and deployments. The workflow just… waits.
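The approval gate is, at its core, a durable wait on a signal. The pattern can be illustrated outside Temporal with asyncio; in the real system Temporal persists this wait across crashes and deployments, which this in-process sketch obviously does not. The function and class names are ours, for illustration:

```python
import asyncio

class ApprovalGate:
    """In-process sketch of a signal-driven approval gate.
    Temporal persists this wait durably; this asyncio version does not."""
    def __init__(self):
        self._decision = None
        self._event = asyncio.Event()

    def signal(self, decision):
        """Called when a human decides (in Temporal: a Signal handler)."""
        self._decision = decision
        self._event.set()

    async def wait(self):
        """The workflow blocks here until a decision arrives."""
        await self._event.wait()
        return self._decision

async def daperl_workflow(gate):
    # ... Detection, Analysis, Planning phases run here ...
    decision = await gate.wait()   # pause at the approval gate
    if decision != "approve":
        return "rejected"
    # ... Execution, Reporting, Learning phases run here ...
    return "completed"

async def main():
    gate = ApprovalGate()
    task = asyncio.create_task(daperl_workflow(gate))
    await asyncio.sleep(0)        # workflow reaches the gate and waits
    gate.signal("approve")        # human replies "Approve" in Slack
    print(await task)             # completed

asyncio.run(main())
```

The design point: the workflow code reads as straight-line logic with a blocking wait in the middle, and the durability engine, not the application code, is responsible for making that wait survive a restart.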

The bidirectional webhook bridge

This is where OpenClaw and Axon connect for async operations:

  1. User says “Research Nordstrom” in Slack. OpenClaw’s Chief of Staff agent calls the Axon REST API. A Temporal workflow starts.
  2. The workflow runs Detection, Analysis, Planning. It reaches the approval gate.
  3. A Temporal activity POSTs to OpenClaw’s /hooks/agent webhook. The message appears in Slack: “Campaign needs your approval. 15 actions planned for 3 targets.”
  4. User replies “Approve.” The agent calls Axon’s approve endpoint. A Temporal Signal resumes the workflow.
  5. Execution completes. Another webhook delivers results to Slack: “Campaign complete. 3 accounts enrolled.”

Idempotency keys prevent duplicate notifications on retry. A circuit breaker protects the workflow if the OpenClaw gateway goes down—the workflow continues, and results surface in the web UI instead.
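Idempotent delivery is a small amount of code: derive a stable key from the workflow ID and event, and skip any send whose key has already succeeded. A sketch under our own key format (in production the seen-set lives in a durable store, not in memory):

```python
import hashlib

sent = set()  # in production this lives in Redis/Postgres, not process memory

def deliver_webhook(workflow_id, event_type, send):
    """Deliver an event at most once, even if the Temporal activity retries."""
    idem_key = hashlib.sha256(f"{workflow_id}:{event_type}".encode()).hexdigest()
    if idem_key in sent:
        return "skipped"   # retry after a successful send: no duplicate ping
    send()                 # may raise; the key is only recorded on success
    sent.add(idem_key)
    return "delivered"

calls = []
r1 = deliver_webhook("wf-42", "approval_requested", lambda: calls.append(1))
r2 = deliver_webhook("wf-42", "approval_requested", lambda: calls.append(1))
print(r1, r2, len(calls))  # delivered skipped 1
```

Recording the key only after a successful send is what makes this compose with retries: a failed send leaves no record, so the retry goes through.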

The event routing is selective, not a firehose. Approval requests and completions go to both the web UI and Slack. Phase transitions (Detection complete, Analysis running) go to the web UI only—useful for monitoring, not worth a Slack notification. Cron-triggered briefings go to Slack only. The right information on the right surface at the right time.
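That routing policy reduces to a lookup table. A sketch using the event categories from the text (the event-type names themselves are ours):

```python
# Event type -> destination surfaces, mirroring the policy described above.
# Event-type names are illustrative, not the actual Axon identifiers.
ROUTES = {
    "approval_requested": {"web_ui", "slack"},
    "run_completed":      {"web_ui", "slack"},
    "phase_transition":   {"web_ui"},          # monitoring only, no Slack ping
    "cron_briefing":      {"slack"},           # morning briefing goes to chat
}

def surfaces_for(event_type):
    return ROUTES.get(event_type, {"web_ui"})  # safe default: web UI only

print(surfaces_for("phase_transition"))  # {'web_ui'}
```

Keeping the policy in one declarative table means adding a channel or demoting a noisy event type is a one-line change, not a hunt through workflow code.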

Shared memory and the knowledge base

Without shared memory, two systems develop context amnesia. The ABM agent researches a company, scores it, maps its org chart. The Chief of Staff in Slack has no idea any of that happened. The content agent writes a blog post about AI in retail and doesn’t know which retail companies we’re actively targeting.

We solved this with Mem0 as the shared memory layer—a REST API backed by Qdrant for vector search, Neo4j for relationship graphs, and Redis for caching. Deployed as a k8s service, accessible to both Axon agents and OpenClaw agents through the same endpoint. What the ABM agent learns about Nordstrom is immediately searchable by the Chief of Staff in a Slack conversation.

The document ingestion pipeline

Our RAG pipeline ingests brand guidelines, strategy documents, site content, and Google Drive materials into a Qdrant collection optimized for semantic search:

  • Heading-aware chunking at ~400 tokens, splitting by h2/h3 boundaries. Never splits mid-paragraph. Based on Chroma Research showing 88–89% recall at this chunk size.
  • Contextual retrieval using Anthropic’s technique: LLM-generated prefixes that situate each chunk within its parent document before embedding. This achieves a 67% reduction in retrieval failures when combined with reranking.
  • Cohere reranking via LiteLLM proxy. Over-fetch 3x from Qdrant, merge with brand term keyword matches, deduplicate, rerank for precision.
  • Multimodal support. Named vectors in Qdrant for text (1536D OpenAI), images (1024D Jina CLIP), and video (1024D Twelve Labs). All embedding routes through LiteLLM for cost tracking.
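
The heading-aware chunking step can be sketched as follows: split on h2/h3 boundaries first, then pack whole paragraphs into chunks up to the token budget. This is a simplified illustration — it approximates tokens as whitespace-separated words, where a real pipeline would use the embedding model’s tokenizer:

```python
import re

def chunk_markdown(text, max_tokens=400):
    """Split on h2/h3 headings, then pack whole paragraphs per chunk.
    Token counting is approximated by word count for illustration only."""
    # Zero-width lookahead keeps each heading attached to its own body.
    sections = re.split(r"(?m)^(?=#{2,3} )", text)
    chunks = []
    for section in sections:
        paragraphs = [p for p in section.split("\n\n") if p.strip()]
        current, count = [], 0
        for para in paragraphs:
            words = len(para.split())
            if current and count + words > max_tokens:
                chunks.append("\n\n".join(current))  # never split mid-paragraph
                current, count = [], 0
            current.append(para)
            count += words
        if current:
            chunks.append("\n\n".join(current))
    return chunks

doc = ("## Brand voice\n\n" + ("word " * 300).strip()
       + "\n\n## Engagement model\n\n" + ("word " * 300).strip())
print([len(c.split()) for c in chunk_markdown(doc)])  # [303, 303]
```

Because the split happens on paragraph boundaries inside each section, a chunk may come in under budget but never carries half a sentence, which is the property the recall numbers above depend on.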

Context engineering, not prompt engineering. Critics of Moltbook pointed out that most “autonomous” agent posts were actually human-prompted—each action required explicit human intervention, with the agent just generating text from a given prompt. That’s the difference between a demo and a production system.

Our agents don’t need a human co-pilot for every action. They have semantic search skills over a corporate knowledge base. When the Content agent writes a blog post, it queries “how do we talk about our engagement model?” and gets back exact branded phrases: “Outcome Partnership,” “skin in the game,” “prove value in 6 weeks.” Not a wall of text. Specific, semantically matched content chunks with the exact language to use. The human designs the knowledge base once; the agent retrieves what it needs at runtime. That’s how you get from “Claude with hands” to an agent that actually understands your business.

The knowledge base was already populated before OpenClaw arrived. The content agent’s skill just calls the same search endpoint that the Axon backend agents use. New consumer, same service.

Why a weekend, not a quarter

Let’s be concrete about why this was a weekend project and not a multi-month initiative. Each of these was already running before we touched OpenClaw:

Capability            | Existing component              | OpenClaw integration work
LLM governance        | LiteLLM proxy (k8s)             | Generate 4 virtual keys, point config
Observability         | Langfuse + Logfire (k8s)        | Already wired through LiteLLM; agent lifecycle via openclaw-logfire
MCP tools             | Apollo, LinkedIn, GA servers (k8s) | Add endpoints to MCP adapter config
Durable execution     | Temporal cluster (k8s)          | Write webhook activity + bridge skill
Shared memory         | Mem0 + Qdrant + Neo4j (k8s)     | Write shared-memory skill
Knowledge base        | RAG pipeline + guidelines API   | Write knowledge-base skill
Secrets management    | SOPS + age (NixOS)              | Add 6 new secrets to SOPS file
Zero-trust networking | Tailscale mesh (NixOS)          | Loopback binding (already default)
Auth + identity       | API key system (Postgres)       | Generate 4 agent keys with scopes

The actual weekend work was: install OpenClaw as a NixOS systemd service, write four agent identity files (SOUL.md, USER.md, HEARTBEAT.md), write three custom skills (axon-api bridge, shared-memory, knowledge-base), configure the MCP adapter, set up cron jobs for morning briefings and lead monitoring, connect Slack, and run nixos-rebuild switch.

That’s the thesis of a modular, production-grade platform. When the foundation handles governance, execution, security, and observability, integrating a new capability is composition. You write the glue, not the infrastructure. The platform does the heavy lifting—and it’s the same platform that would do the heavy lifting for a client deployment.

Six principles for production AI agents

Whether you’re deploying OpenClaw, building on another framework, or rolling your own—these are the principles that made our integration fast and safe.

  • 01 Composition over extraction. Run open-source tools whole. Integrate at the API layer. You get free upgrades, community security patches, and ecosystem access without maintaining a fork.
  • 02 Defense in depth. Tailscale + loopback binding + token auth + per-agent sandboxing + SOPS secrets + systemd hardening. No single layer is the entire defense.
  • 03 One gateway for all model calls. LiteLLM unifies cost tracking, budget enforcement, model routing, and observability across every agent and every system. No shadow AI spend.
  • 04 Durable over fire-and-forget. Temporal ensures multi-phase tasks survive crashes, require human approval, and report results asynchronously. Chat is not an execution engine.
  • 05 Shared memory, not shared sessions. Mem0 provides cross-system context without coupling. What one agent learns is searchable by every other agent, regardless of which system created the memory.
  • 06 Custom where it matters. We use community tools for channels (OpenClaw), orchestration (Temporal), observability (Langfuse), and model routing (LiteLLM). We build custom where security demands it: MCP servers and agent skills.

This is what we mean by production-grade. Not a demo. Not a pilot. A system that runs at 3am, handles failures gracefully, tracks every dollar of LLM spend, and gets better over time. The kind of system described in our Modern AI Application Stack blueprint—built for real operations, not slide decks.

And the reason we could ship it in a weekend is the same reason our clients can go from strategy to production in weeks instead of quarters: the Ultrathink Axon™ platform provides the battle-tested foundation so you skip months of foundational work and go straight to the problem that matters.

This is part of our series on building production-grade AI systems. For more, see The Modern AI Application Stack, AI Agents: Build vs. Buy vs. Partner, and The AI Execution Gap.

Sources

  • OpenClaw history and naming: CNBC, “From Clawdbot to Moltbot to OpenClaw,” February 2, 2026
  • Exposed instances: Bitsight, “OpenClaw Security: Risks of Exposed AI Agents,” February 2026; SecurityScorecard, Hunt.io CVE-2026-25253 analysis
  • CVE-2026-25253: NVD (CVSS 8.8), patched in OpenClaw v2026.1.29
  • Malicious ClawHub skills: Cyberdesserts security audit, February 2026
  • CVE-2025-6514 (mcp-remote): NVD (CVSS 9.6), 437K+ environments affected
  • Moltbook breach: 404 Media, January 31, 2026
  • $CLAWD token and impersonation campaign: Malwarebytes Threat Intelligence, January 2026; Yahoo Finance
  • Creator background: Pragmatic Engineer newsletter, “The Creator of Clawd”; Lex Fridman Podcast #491
  • OWASP MCP Top 10 and file operation analysis: OWASP Foundation, 2025

Ready to Close the Execution Gap?

Take the next step from insight to action.

No sales pitches. No buzzwords. Just a straightforward discussion about your challenges.