ShurAI / Shur Creative Partners Technical Review

EDDIE Agent System
Codebase Audit

A comprehensive architecture, security, and ontology analysis of the E.D.D.I.E. autonomous agent system — with recommendations for integration into the ShurAI semantic infrastructure.

Date
March 9, 2026
Prepared for
Jonny Dubowsky
Re: Discussion with
Nicholas Crabill
Scope
Full codebase audit

Executive Summary

EDDIE is a 235-file, 39,636-line TypeScript monolith that wraps Claude Code CLI in a Telegram relay with 35+ autonomous subsystems. It is ambitious, well-intentioned engineering — but it reproduces every structural vulnerability documented in the OpenClaw security literature while adding layers of complexity that actively degrade the AI's reasoning capacity.

The core thesis: EDDIE spends more tokens describing itself to Claude than it spends on the user's actual problem. The system is a context window tax — and the tax funds no ontology, no knowledge graph, and no integration with the actual client intelligence system (ShurAI) that gives the work meaning.

Section 01

What EDDIE Actually Is

At its core, EDDIE is a message relay: Telegram message in → Claude Code CLI subprocess out. Everything else is middleware.

User (Telegram) GramIO bot auth check security scan Claude CLI subprocess
   [BACKGROUND] tag detection tmux job spawn poll for completion reply

The system wraps this simple relay in:

260+
Environment Variables
100+
Feature Flags
35+
Proactive Modules
2,966
Lines in capabilities.ts
50+
Telegram Commands
15+
External Services
3
Docker Containers
4
Persistence Layers

Plus: a full video production pipeline, multi-channel comms (Gmail, Calendar, Slack, WhatsApp, iMessage), eBPF kernel monitoring in a privileged container, phone calls via Twilio, and a community distribution bridge.

How much of this complexity serves the user's actual needs, and how much serves the system's self-narrative?

Section 02

The Context Window Tax

This is the most architecturally damaging finding. Every Claude invocation pays a token "tax" before any user content is processed.

Visual: Context Window Consumption

Interactive Relay (every Telegram message)
~3%
Available for actual reasoning
Background Job (every spawned task)
~10%
Available for actual reasoning
Heartbeat Tick (every 30-60 minutes)
~20-40% overhead
Available for actual reasoning

What Gets Injected

Interactive relay: EDDIE persona (~1,200 chars) + top 10 memory matches (~3,000 chars) + session state (~2,000 chars) = 2,000–5,000 tokens before the user's message.

Background jobs: PDAC protocol (~3,000 chars) + Brain Vault paths + briefing + project states + CLAUDE.md files + routing hints = 8,000–15,000 tokens before the task prompt.

Heartbeat ticks: Checklist + last 3 heartbeats + goals + 30 conversations + ALL project states + usage data + crons + queue stats = 10,000–30,000 tokens before any reasoning.

The system is paying Claude to read its own autobiography on every tick.

The Ontology Disconnect

None of this injected context comes from a knowledge graph. The capabilities registry is a 2,966-line keyword matcher — pattern matching, not understanding:

{
  id: "agent:cold-outreach-strategist",
  triggers: [
    { type: "keyword", patterns: ["outreach", "cold email", "prospecting"...], weight: 0.9 },
    { type: "project", patterns: ["leadgen"], weight: 1.0 }
  ]
}

It doesn't know the relationship between a client, a campaign, and a lead. It can't distinguish "outreach" meaning cold email vs. community engagement vs. PR. No semantic graph — just keywords and weights.

Section 03

Security Findings

Critical Plaintext API keys committed to the repository
.claude/settings.json

The following credentials are in plaintext in a version-controlled file:

  • Obsidian API key
  • Attio API key
  • n8n API key (full JWT with subject UUID)
  • Cloudflare API token
  • Google API key
  • PostgreSQL connection string with password
  • Instantly API bearer token
  • Postiz internal URL with API key in path

All of these keys should be considered compromised.

Critical Dashboard: unauthenticated RCE endpoint
src/dashboard/server.ts

Dashboard binds to 0.0.0.0 with Access-Control-Allow-Origin: *. Authentication is optional. The /chat/send endpoint creates and spawns jobs from web requests — an unauthenticated remote code execution endpoint.

Critical --dangerously-skip-permissions on every invocation
src/claude/relay.ts:99, src/jobs/tmux.ts:380

Every Claude CLI call disables all safety guardrails. Any prompt injection that survives the input scan gets arbitrary file read/write, command execution, and network access with no confirmation prompts.

High Prompt injection bypass: send it twice
src/security/scan.ts

High-severity injections are blocked — but the system includes a one-time override mechanism. Send the same injection again, and it passes through. Medium-severity injections pass with only a text disclaimer.

High Indirect injection via memory store
src/memory/context.ts, src/memory/store.ts

Arbitrary content stored as "facts" gets injected into all future system prompts via buildMemoryContext(). No sanitization. A malicious fact persists indefinitely and poisons every future Claude call.

This is the exact "sticky attack" vector from the Akamai analysis: "An instruction injected today could lie dormant in the agent's 'memory' and be triggered weeks later."

High Self-modification feedback loops
src/proactive/dream.ts, src/jobs/self-heal.ts, src/proactive/self-improve.ts

Dream cycle: Autonomously extracts "insights" → stores as facts → injected into future prompts → shapes future behavior. No human review gate.

Self-heal: Failed jobs spawn autonomous repair jobs with full system access.

Self-improve: Modifies its own prompting strategy based on rejection patterns.

This is exactly the behavior documented in OpenClaw Issue #24237: agents silently modifying their own configuration. Here, it's by design.

Medium Dead dependencies, disabled scans, env leakage
package.json, src/security/output-scan.ts, src/claude/run-prompt.ts

openai and twilio packages are declared but never imported. Output scan is disabled by default. safeEnv() passes all env vars except CLAUDECODE to subprocesses.

Section 04

Complexity Analysis

235
TypeScript Files
39,636
Lines of TypeScript
88+
Registered Capabilities
12
SQL Migrations

What Could Be 50 Lines

The core value proposition — relay Telegram messages to Claude Code — is roughly a 50-line script:

import { Bot } from "gramio";
const bot = new Bot(process.env.TELEGRAM_BOT_TOKEN);
bot.on("message", async (ctx) => {
  if (ctx.from.id !== Number(process.env.OWNER_ID)) return;
  const proc = Bun.spawn(["claude", "-p", ctx.text, "--print"]);
  const result = await new Response(proc.stdout).text();
  await ctx.reply(result);
});
bot.start();

Everything beyond this is middleware. The question is whether each layer earns its complexity.

What's Doing Real Work

Complexity Without Proportional Value

35+ proactive modules: Heartbeat, dream cycle, morning brief, daily brief, weekly review, monthly review, weekly content, 80/20 analysis, nightly orchestrate, self-improve, optimizer, challenge, brainstorm, vision, pillars, goals, observation log, revenue tracker, expense tracker, social snapshot, content brief, waiting-on tracker, rejection learning, outcome analysis, report synthesis, accounting triage, discoverability, context drift, video idea pipeline, dashboard generation, eBPF monitoring, anthropic monitoring, claude health, memory monitor...

Heimdall: A full content scanning/cleaning/packaging/sync system for community distribution. An entire product inside an assistant.

Video pipeline: News → script → ElevenLabs TTS → Remotion render → FFmpeg → Gemini QA → YouTube upload → analytics. A full media production system embedded in a chat bot.

eBPF kernel monitoring: A privileged Docker container running bpftrace programs for network, process, and memory kernel events. Infrastructure monitoring inside an assistant.

Section 05

The OpenClaw Parallel

The OpenClaw extraction document prepared for Limore identified structural vulnerabilities. Every one maps to EDDIE:

OpenClaw Vulnerability EDDIE Equivalent Status
Dashboard as attack surfaceDashboard on 0.0.0.0 with CORS * and optional authPresent
Each channel = attack vectorTelegram + Gmail + Calendar + Slack + WhatsApp + iMessage + YouTubePresent (7 channels)
SOUL.md/MEMORY.md as swappable Post-it notesBrain Vault state files + Supabase facts injected into system promptsPresent
Memory persistence enabling sticky attacksstoreFact() stores arbitrary unsanitized content foreverPresent
Agents silently mutating own config (#24237)Dream cycle, self-heal, self-improve, nightly orchestratePresent (by design)
--dangerously-skip-permissionsUsed on every Claude invocationPresent
No separation of data and instruction channelsMemory content, state files, user messages all concatenated into system promptPresent
MoltMatch incident (unauthorized real-world actions)Heartbeat can autonomously spawn tasks, make calls, send messagesPresent (by design)
Shadow AI (operating outside IAM)API keys in plaintext settings.json, no secret rotationPresent

The concierge metaphor holds: anyone — or anything — that can write to the Brain Vault state files or Supabase facts table has effectively swapped the Post-it notes.

Section 06

The Ontology Gap

What EDDIE Has: Keyword Matching

A weighted keyword matcher that cannot understand relationships, context, or meaning. It doesn't know that "reach out to the prospect from Tuesday's meeting" refers to a specific person. It can't connect content to strategy.

What EDDIE Has: Flat File Memory

Brain Vault is a folder structure, not a knowledge graph. "Semantic search" is embedding similarity over stored text fragments — useful for recall, but it has no understanding of relationships, types, hierarchies, or provenance.

What ShurAI Provides: Semantic Infrastructure

EDDIE asks "does this message contain the word 'outreach'?"
ShurAI asks "what is this message about, in the context of what we know about this client, this project, and this strategy?"

Section 07

The Alternative: What We Can Build

EDDIE Capability ShurAI Equivalent Advantage
Telegram relay to ClaudeSame (50 lines of code)Same functionality, 1/500th the codebase
Background jobs via tmuxClaude Code native task systemNo tmux dependency, built-in monitoring
Memory/context injectionLetta memory blocks + InfraNodus graph contextStructured, scoped, auditable — not flat text dumps
Capability routingOntology-driven routing via knowledge graphSemantic understanding, not keyword matching
Prompt injection defenseArchitectural separation of data/instruction channelsDefense by design, not by regex
Autonomous heartbeatScheduled tasks with explicit human approval gatesAutonomy with accountability
Multi-channel commsMCP server integrations (already available)Standard protocol, not custom bridges
Video pipelineSeparate service (not embedded in an assistant)Proper separation of concerns
Security scanningTrust-boundary architecture from the ground upNot bolted on after the fact

Architecture Principles

Context window is sacred real estate.

Never inject context that doesn't directly serve the current query. No persona monologues. No autobiographies.

Knowledge graph over keyword matching.

Route requests based on semantic understanding, not regex patterns against a 3,000-line static registry.

Separation of concerns.

A personal assistant is not a video production pipeline is not a kernel monitoring system. Each is its own service.

Defense by architecture, not by regex.

Separate data channels from instruction channels. Don't store unsanitized user content where it gets injected as system prompt.

Human gates for autonomous action.

The system should propose, not execute. Especially for actions that affect the real world.

Secrets never touch version control.

Environment variables, secret managers, or encrypted stores — never plaintext in committed files.

Section 08

Recommendations for Nicholas

This report is not an attack on the engineering effort. EDDIE represents significant technical ambition and many individual modules are well-built. The critique is architectural:

  1. The complexity is the vulnerability. 235 files and 35+ autonomous modules create an attack surface that no amount of input scanning can defend. Simplify ruthlessly.
  2. The context window tax is real. Measure how many tokens are consumed by system scaffolding vs. actual user content. Claude may be spending more time reading about itself than thinking about your problem.
  3. The memory system needs sanitization. Any content that enters the system prompt must be validated. The storeFact()buildMemoryContext() pipeline is a persistent injection vector.
  4. Rotate all credentials immediately. Every key in .claude/settings.json should be considered compromised. Move to environment variables or a secret manager.
  5. The dashboard needs authentication. Binding to 0.0.0.0 with Access-Control-Allow-Origin: * and optional auth is an open door.
  6. Consider what actually needs to be autonomous. Dream cycle, self-heal, self-improve, and nightly orchestrate create feedback loops that drift from intended behavior. Each needs a human approval gate.
  7. Connect to real ontology. The keyword-matching router will always be brittle. A knowledge graph that understands relationships between clients, projects, and capabilities would make routing intelligent rather than pattern-matched.

Appendix A: File-Level Security Findings

FileFindingSeverity
.claude/settings.jsonPlaintext API keys (8+ services)Critical
src/dashboard/server.ts0.0.0.0 bind, CORS *, optional auth, RCE via /chat/sendCritical
src/claude/relay.ts:99--dangerously-skip-permissions on all relay callsHigh
src/jobs/tmux.ts:380--dangerously-skip-permissions on all job spawnsHigh
src/security/scan.tsOverride mechanism: send injection twice to bypassHigh
src/memory/store.tsstoreFact() stores unsanitized contentHigh
src/memory/context.tsInjects stored facts verbatim into system promptHigh
src/proactive/dream.tsAuto-extracts and stores "insights" (self-reinforcing loop)Medium
src/jobs/self-heal.tsAutonomous repair jobs with full system accessMedium
src/security/output-scan.tsDisabled by defaultMedium
src/claude/run-prompt.tssafeEnv() passes all env vars except CLAUDECODEMedium
package.jsonDead dependencies: openai, twilioLow
package.json@types/bun pinned to latest (non-deterministic)Low

Appendix B: Context Window Token Estimates

PathFixed OverheadVariable OverheadTotal Before User Content
Interactive relay~1,200 chars~3,000–5,000 chars2,000–5,000 tokens
Background job~3,000 chars~5,000–12,000 chars8,000–15,000 tokens
Heartbeat tick~2,000 chars~8,000–28,000 chars10,000–30,000 tokens

Appendix C: Dependency Graph

EDDIE ├── Telegram API (GramIO) ─── inbound user interface ├── Claude Code CLI ─── LLM execution (subprocess) ├── Supabase ─── primary persistence (conversations, facts, jobs, embeddings) ├── PostgreSQL ─── local job state queries ├── Google Gemini ─── low-cost assessments, QA, embeddings ├── ElevenLabs ─── text-to-speech (video voiceovers) ├── Twilio ─── phone calls (webhook on port 8443) ├── YouTube Data API ─── upload, analytics, playlists ├── Gmail / Calendar / Drive ─── unified comms ├── Slack ─── bot + slash commands ├── iMessage ─── relay client ├── WhatsApp ─── bridge client ├── Ollama ─── local embeddings (optional) ├── Docker ─── container orchestration ├── eBPF ─── kernel monitoring (privileged) ├── Remotion + FFmpeg ─── video rendering + mixing ├── Playwright ─── browser automation └── 15+ MCP Servers ─── Attio, n8n, Instantly, etc.

Each connection is an authentication surface, a credential to store, a dependency to maintain, and a potential injection vector.