EDDIE Agent System — Technical Audit

Section 01

What EDDIE Actually Is

At its core, EDDIE is a message relay: Telegram message in → Claude Code CLI subprocess out. Everything else is middleware.

User (Telegram) → GramIO bot → auth check → security scan → Claude CLI subprocess
→ [BACKGROUND] tag detection → tmux job spawn → poll for completion → reply

The system wraps this simple relay in:

260+

Environment Variables

100+

Feature Flags

35+

Proactive Modules

2,966

Lines in capabilities.ts

50+

Telegram Commands

15+

External Services

Docker Containers

Persistence Layers

Plus: a full video production pipeline, multi-channel comms (Gmail, Calendar, Slack, WhatsApp, iMessage), eBPF kernel monitoring in a privileged container, phone calls via Twilio, and a community distribution bridge.

How much of this complexity serves the user's actual needs, and how much serves the system's self-narrative?

Section 02

The Context Window Tax

This is the most architecturally damaging finding. Every Claude invocation pays a token "tax" before any user content is processed.

Visual: Context Window Consumption

Interactive Relay (every Telegram message)

~3%

Available for actual reasoning

Background Job (every spawned task)

~10%

Available for actual reasoning

Heartbeat Tick (every 30-60 minutes)

~20-40% overhead

Available for actual reasoning

What Gets Injected

Interactive relay: EDDIE persona (~1,200 chars) + top 10 memory matches (~3,000 chars) + session state (~2,000 chars) = 2,000–5,000 tokens before the user's message.

Background jobs: PDAC protocol (~3,000 chars) + Brain Vault paths + briefing + project states + CLAUDE.md files + routing hints = 8,000–15,000 tokens before the task prompt.

Heartbeat ticks: Checklist + last 3 heartbeats + goals + 30 conversations + ALL project states + usage data + crons + queue stats = 10,000–30,000 tokens before any reasoning.

The system is paying Claude to read its own autobiography on every tick.

The Ontology Disconnect

None of this injected context comes from a knowledge graph. The capabilities registry is a 2,966-line keyword matcher — pattern matching, not understanding:

{
  id: "agent:cold-outreach-strategist",
  triggers: [
    { type: "keyword", patterns: ["outreach", "cold email", "prospecting"...], weight: 0.9 },
    { type: "project", patterns: ["leadgen"], weight: 1.0 }
  ]
}

It doesn't know the relationship between a client, a campaign, and a lead. It can't distinguish "outreach" meaning cold email vs. community engagement vs. PR. No semantic graph — just keywords and weights.

Section 03

Security Findings

Critical Plaintext API keys committed to the repository

.claude/settings.json

The following credentials are in plaintext in a version-controlled file:

Obsidian API key
Attio API key
n8n API key (full JWT with subject UUID)
Cloudflare API token
Google API key
PostgreSQL connection string with password
Instantly API bearer token
Postiz internal URL with API key in path

All of these keys should be considered compromised.

Critical Dashboard: unauthenticated RCE endpoint

src/dashboard/server.ts

Dashboard binds to 0.0.0.0 with Access-Control-Allow-Origin: *. Authentication is optional. The /chat/send endpoint creates and spawns jobs from web requests — an unauthenticated remote code execution endpoint.

Critical --dangerously-skip-permissions on every invocation

src/claude/relay.ts:99, src/jobs/tmux.ts:380

Every Claude CLI call disables all safety guardrails. Any prompt injection that survives the input scan gets arbitrary file read/write, command execution, and network access with no confirmation prompts.

High Prompt injection bypass: send it twice

src/security/scan.ts

High-severity injections are blocked — but the system includes a one-time override mechanism. Send the same injection again, and it passes through. Medium-severity injections pass with only a text disclaimer.

High Indirect injection via memory store

src/memory/context.ts, src/memory/store.ts

Arbitrary content stored as "facts" gets injected into all future system prompts via buildMemoryContext(). No sanitization. A malicious fact persists indefinitely and poisons every future Claude call.

This is the exact "sticky attack" vector from the Akamai analysis: "An instruction injected today could lie dormant in the agent's 'memory' and be triggered weeks later."

High Self-modification feedback loops

src/proactive/dream.ts, src/jobs/self-heal.ts, src/proactive/self-improve.ts

Dream cycle: Autonomously extracts "insights" → stores as facts → injected into future prompts → shapes future behavior. No human review gate.

Self-heal: Failed jobs spawn autonomous repair jobs with full system access.

Self-improve: Modifies its own prompting strategy based on rejection patterns.

This is exactly the behavior documented in OpenClaw Issue #24237: agents silently modifying their own configuration. Here, it's by design.

Medium Dead dependencies, disabled scans, env leakage

package.json, src/security/output-scan.ts, src/claude/run-prompt.ts

openai and twilio packages are declared but never imported. Output scan is disabled by default. safeEnv() passes all env vars except CLAUDECODE to subprocesses.

Section 04

Complexity Analysis

235

TypeScript Files

39,636

Lines of TypeScript

88+

Registered Capabilities

SQL Migrations

What Could Be 50 Lines

The core value proposition — relay Telegram messages to Claude Code — is roughly a 50-line script:

import { Bot } from "gramio";
const bot = new Bot(process.env.TELEGRAM_BOT_TOKEN);
bot.on("message", async (ctx) => {
  if (ctx.from.id !== Number(process.env.OWNER_ID)) return;
  const proc = Bun.spawn(["claude", "-p", ctx.text, "--print"]);
  const result = await new Response(proc.stdout).text();
  await ctx.reply(result);
});
bot.start();

Everything beyond this is middleware. The question is whether each layer earns its complexity.

What's Doing Real Work

Job system (tmux background tasks): Solves a real problem
Memory/search (semantic fact retrieval): Useful for context continuity
Model selection (Opus/Sonnet/Haiku routing): Real cost savings
Security scanning (input/output): Necessary, though implementation has gaps

Complexity Without Proportional Value

35+ proactive modules: Heartbeat, dream cycle, morning brief, daily brief, weekly review, monthly review, weekly content, 80/20 analysis, nightly orchestrate, self-improve, optimizer, challenge, brainstorm, vision, pillars, goals, observation log, revenue tracker, expense tracker, social snapshot, content brief, waiting-on tracker, rejection learning, outcome analysis, report synthesis, accounting triage, discoverability, context drift, video idea pipeline, dashboard generation, eBPF monitoring, anthropic monitoring, claude health, memory monitor...

Heimdall: A full content scanning/cleaning/packaging/sync system for community distribution. An entire product inside an assistant.

Video pipeline: News → script → ElevenLabs TTS → Remotion render → FFmpeg → Gemini QA → YouTube upload → analytics. A full media production system embedded in a chat bot.

eBPF kernel monitoring: A privileged Docker container running bpftrace programs for network, process, and memory kernel events. Infrastructure monitoring inside an assistant.

Section 05

The OpenClaw Parallel

The OpenClaw extraction document prepared for Limore identified structural vulnerabilities. Every one maps to EDDIE:

OpenClaw Vulnerability	EDDIE Equivalent	Status
Dashboard as attack surface	Dashboard on 0.0.0.0 with CORS * and optional auth	Present
Each channel = attack vector	Telegram + Gmail + Calendar + Slack + WhatsApp + iMessage + YouTube	Present (7 channels)
SOUL.md/MEMORY.md as swappable Post-it notes	Brain Vault state files + Supabase facts injected into system prompts	Present
Memory persistence enabling sticky attacks	`storeFact()` stores arbitrary unsanitized content forever	Present
Agents silently mutating own config (#24237)	Dream cycle, self-heal, self-improve, nightly orchestrate	Present (by design)
`--dangerously-skip-permissions`	Used on every Claude invocation	Present
No separation of data and instruction channels	Memory content, state files, user messages all concatenated into system prompt	Present
MoltMatch incident (unauthorized real-world actions)	Heartbeat can autonomously spawn tasks, make calls, send messages	Present (by design)
Shadow AI (operating outside IAM)	API keys in plaintext settings.json, no secret rotation	Present

The concierge metaphor holds: anyone — or anything — that can write to the Brain Vault state files or Supabase facts table has effectively swapped the Post-it notes.

Section 06

The Ontology Gap

What EDDIE Has: Keyword Matching

A weighted keyword matcher that cannot understand relationships, context, or meaning. It doesn't know that "reach out to the prospect from Tuesday's meeting" refers to a specific person. It can't connect content to strategy.

What EDDIE Has: Flat File Memory

Brain Vault is a folder structure, not a knowledge graph. "Semantic search" is embedding similarity over stored text fragments — useful for recall, but it has no understanding of relationships, types, hierarchies, or provenance.

What ShurAI Provides: Semantic Infrastructure

Typed entities with defined relationships (Client → Project → Campaign → Asset)
Scoped context injection — only the semantic neighborhood of the query, not everything
Provenance chains — every piece of context traces to its source
Cross-session continuity via the Letta memory architecture — structured memory blocks with guided synthesis
InfraNodus integration — actual knowledge graph analysis, gap detection, semantic clustering
Ontology-driven routing — route based on semantic structure, not keyword matching

EDDIE asks "does this message contain the word 'outreach'?"
ShurAI asks "what is this message about, in the context of what we know about this client, this project, and this strategy?"

Section 07

The Alternative: What We Can Build

EDDIE Capability	ShurAI Equivalent	Advantage
Telegram relay to Claude	Same (50 lines of code)	Same functionality, 1/500th the codebase
Background jobs via tmux	Claude Code native task system	No tmux dependency, built-in monitoring
Memory/context injection	Letta memory blocks + InfraNodus graph context	Structured, scoped, auditable — not flat text dumps
Capability routing	Ontology-driven routing via knowledge graph	Semantic understanding, not keyword matching
Prompt injection defense	Architectural separation of data/instruction channels	Defense by design, not by regex
Autonomous heartbeat	Scheduled tasks with explicit human approval gates	Autonomy with accountability
Multi-channel comms	MCP server integrations (already available)	Standard protocol, not custom bridges
Video pipeline	Separate service (not embedded in an assistant)	Proper separation of concerns
Security scanning	Trust-boundary architecture from the ground up	Not bolted on after the fact

Architecture Principles

Context window is sacred real estate.

Never inject context that doesn't directly serve the current query. No persona monologues. No autobiographies.

Knowledge graph over keyword matching.

Route requests based on semantic understanding, not regex patterns against a 3,000-line static registry.

Separation of concerns.

A personal assistant is not a video production pipeline is not a kernel monitoring system. Each is its own service.

Defense by architecture, not by regex.

Separate data channels from instruction channels. Don't store unsanitized user content where it gets injected as system prompt.

Human gates for autonomous action.

The system should propose, not execute. Especially for actions that affect the real world.

Secrets never touch version control.

Environment variables, secret managers, or encrypted stores — never plaintext in committed files.

Section 08

Recommendations for Nicholas

This report is not an attack on the engineering effort. EDDIE represents significant technical ambition and many individual modules are well-built. The critique is architectural:

The complexity is the vulnerability. 235 files and 35+ autonomous modules create an attack surface that no amount of input scanning can defend. Simplify ruthlessly.
The context window tax is real. Measure how many tokens are consumed by system scaffolding vs. actual user content. Claude may be spending more time reading about itself than thinking about your problem.
The memory system needs sanitization. Any content that enters the system prompt must be validated. The storeFact() → buildMemoryContext() pipeline is a persistent injection vector.
Rotate all credentials immediately. Every key in .claude/settings.json should be considered compromised. Move to environment variables or a secret manager.
The dashboard needs authentication. Binding to 0.0.0.0 with Access-Control-Allow-Origin: * and optional auth is an open door.
Consider what actually needs to be autonomous. Dream cycle, self-heal, self-improve, and nightly orchestrate create feedback loops that drift from intended behavior. Each needs a human approval gate.
Connect to real ontology. The keyword-matching router will always be brittle. A knowledge graph that understands relationships between clients, projects, and capabilities would make routing intelligent rather than pattern-matched.

Appendix A: File-Level Security Findings

File	Finding	Severity
`.claude/settings.json`	Plaintext API keys (8+ services)	Critical
`src/dashboard/server.ts`	0.0.0.0 bind, CORS *, optional auth, RCE via /chat/send	Critical
`src/claude/relay.ts:99`	`--dangerously-skip-permissions` on all relay calls	High
`src/jobs/tmux.ts:380`	`--dangerously-skip-permissions` on all job spawns	High
`src/security/scan.ts`	Override mechanism: send injection twice to bypass	High
`src/memory/store.ts`	`storeFact()` stores unsanitized content	High
`src/memory/context.ts`	Injects stored facts verbatim into system prompt	High
`src/proactive/dream.ts`	Auto-extracts and stores "insights" (self-reinforcing loop)	Medium
`src/jobs/self-heal.ts`	Autonomous repair jobs with full system access	Medium
`src/security/output-scan.ts`	Disabled by default	Medium
`src/claude/run-prompt.ts`	`safeEnv()` passes all env vars except CLAUDECODE	Medium
`package.json`	Dead dependencies: openai, twilio	Low
`package.json`	`@types/bun` pinned to `latest` (non-deterministic)	Low

Appendix B: Context Window Token Estimates

Path	Fixed Overhead	Variable Overhead	Total Before User Content
Interactive relay	~1,200 chars	~3,000–5,000 chars	2,000–5,000 tokens
Background job	~3,000 chars	~5,000–12,000 chars	8,000–15,000 tokens
Heartbeat tick	~2,000 chars	~8,000–28,000 chars	10,000–30,000 tokens

Appendix C: Dependency Graph

EDDIE ├── Telegram API (GramIO) ─── inbound user interface ├── Claude Code CLI ─── LLM execution (subprocess) ├── Supabase ─── primary persistence (conversations, facts, jobs, embeddings) ├── PostgreSQL ─── local job state queries ├── Google Gemini ─── low-cost assessments, QA, embeddings ├── ElevenLabs ─── text-to-speech (video voiceovers) ├── Twilio ─── phone calls (webhook on port 8443) ├── YouTube Data API ─── upload, analytics, playlists ├── Gmail / Calendar / Drive ─── unified comms ├── Slack ─── bot + slash commands ├── iMessage ─── relay client ├── WhatsApp ─── bridge client ├── Ollama ─── local embeddings (optional) ├── Docker ─── container orchestration ├── eBPF ─── kernel monitoring (privileged) ├── Remotion + FFmpeg ─── video rendering + mixing ├── Playwright ─── browser automation └── 15+ MCP Servers ─── Attio, n8n, Instantly, etc.

Each connection is an authentication surface, a credential to store, a dependency to maintain, and a potential injection vector.

EDDIE Agent System
Codebase Audit

Executive Summary

What EDDIE Actually Is

The Context Window Tax

Visual: Context Window Consumption

What Gets Injected

The Ontology Disconnect

Security Findings

Complexity Analysis

What Could Be 50 Lines

What's Doing Real Work

Complexity Without Proportional Value

The OpenClaw Parallel

The Ontology Gap

What EDDIE Has: Keyword Matching

What EDDIE Has: Flat File Memory

What ShurAI Provides: Semantic Infrastructure

The Alternative: What We Can Build

Architecture Principles

Recommendations for Nicholas

Appendix A: File-Level Security Findings

Appendix B: Context Window Token Estimates

Appendix C: Dependency Graph