Claude Discord Bot Failure Diagnostic
claude-discord.service — root cause analysis — 2026-03-29
Executive Summary
Peak memory: 4.6G (+ 538M swap)
Time to unresponsive: 22h (current run)
Bots in server: 10+ (CEO coordinates all)
Fix: restart cycle every 6h (RuntimeMaxSec)
Root cause: Claude Code in --channels mode accumulates every Discord message in its context window. With 10+ bots sending messages, the context fills within hours. The process stays alive but can no longer generate responses, making it effectively a zombie.
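To see why "within hours" is plausible, a back-of-envelope model helps. The sketch below is a hypothetical helper (hours_to_fill is not part of Claude Code); the window size, message rate and message size are illustrative assumptions, not measurements from this incident. It ignores the bot's own replies (which shorten the estimate) and auto-compaction (which lengthens it).

```python
def hours_to_fill(window_tokens: int, baseline_tokens: int, bots: int,
                  msgs_per_bot_per_hour: float, tokens_per_msg: float) -> float:
    """Rough hours until the context window is full, assuming a constant
    inflow of incoming messages and no auto-compaction."""
    inflow_per_hour = bots * msgs_per_bot_per_hour * tokens_per_msg
    return (window_tokens - baseline_tokens) / inflow_per_hour


# Hypothetical inputs: a 200k-token window, ~3% used at start (as in the
# T+0 stage), 10 bots each posting 10 messages/hour of ~100 tokens.
estimate = hours_to_fill(200_000, 6_000, 10, 10, 100)
```

With these inputs the estimate is 19.4 hours; doubling the message rate halves it, which is why a busier server reaches the zombie state sooner.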
Service Architecture
Layer 0: systemd
claude-discord.service • Restart=always • RestartSec=10
Layer 1: Expect Wrapper
/root/claude-discord.sh • provides PTY • handles trust prompt
Layer 2: Claude Code CLI
claude --dangerously-skip-permissions --channels plugin:discord
Plugins and MCP servers running under Layer 2:
Discord Plugin: bun server.ts • bot gateway
Telegram Plugin: bun server.ts • also loaded
Qdrant MCP: python3 • memory server
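When the bot goes silent, it helps to see which layer is still alive. A minimal Python sketch, assuming a Linux /proc filesystem and systemctl on PATH, that walks the process tree under the unit's MainPID (the process systemd started from ExecStart, i.e. the wrapper script). The helper names are illustrative, not part of any tool in this stack.

```python
from __future__ import annotations

import os
import subprocess
from collections import defaultdict, deque


def parse_stat(line: str) -> tuple[int, str, str, int]:
    """Split a /proc/<pid>/stat line into (pid, comm, state, ppid).

    comm is wrapped in parentheses and may contain spaces or ')', so split
    on the first ' (' and the last ')' instead of on whitespace alone.
    """
    pid_part, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    fields = after.split()
    return int(pid_part), comm, fields[0], int(fields[1])


def children_map(proc: str = "/proc") -> dict[int, list[int]]:
    """Map each parent PID to the PIDs of its direct children."""
    children: dict[int, list[int]] = defaultdict(list)
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, _comm, _state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        children[ppid].append(pid)
    return children


def descendants(root: int, children: dict[int, list[int]]) -> list[int]:
    """Breadth-first list of every PID below root."""
    found, queue = [], deque([root])
    while queue:
        for child in sorted(children.get(queue.popleft(), [])):
            found.append(child)
            queue.append(child)
    return found


def show_service_tree(unit: str = "claude-discord.service") -> None:
    """Print PID, PPID, state and command name for the unit's process tree."""
    main_pid = int(subprocess.run(
        ["systemctl", "show", "--property=MainPID", "--value", unit],
        capture_output=True, text=True, check=True,
    ).stdout.strip())
    if main_pid == 0:
        print(f"{unit} has no running main process")
        return
    for pid in [main_pid] + descendants(main_pid, children_map()):
        try:
            with open(f"/proc/{pid}/stat") as f:
                _, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue
        print(pid, ppid, state, comm)
```

Run on the host, show_service_tree() would be expected to list the wrapper, the claude process and the bun/python3 children; a missing child points at the layer that died.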
Context Growth: Failure Mode
T+0, Fresh Start: memory ~270M, ctx ~3%, Healthy
T+6h, Active: memory ~1.5G, ctx ~40-60%, Slowing
T+12-22h, Critical: memory ~3-4.5G, ctx ~90-100%, Degraded
T+22h+, Zombie: memory 4.5G+, ctx FULL, Unresponsive
The zombie state:
systemctl status shows active (running), and the process is alive in S (sleeping) state with 21 threads. It still receives Discord messages ("Not addressed to me") but cannot generate new responses because the context window is full. No crash, no OOM kill, no error log. Just silence. ("Zombie" is figurative here: a Unix zombie process would show state Z, not S.)
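Because neither systemd nor the kernel flags anything, detecting this state needs an output-based signal. A minimal Python sketch that reads the same fields quoted above (ActiveState, State, Threads) and flags the bot when it looks healthy but has been silent too long. The last_reply timestamp is an assumption: the report does not describe where it would come from (for example, your own log of outgoing Discord messages), and the threshold is a hypothetical tuning knob.

```python
from __future__ import annotations

import subprocess
from datetime import datetime, timedelta


def parse_status(text: str) -> dict[str, str]:
    """Parse /proc/<pid>/status ("Key:<TAB>value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def active_state(unit: str = "claude-discord.service") -> str:
    """ActiveState as systemd reports it, e.g. 'active' or 'failed'."""
    return subprocess.run(
        ["systemctl", "show", "--property=ActiveState", "--value", unit],
        capture_output=True, text=True, check=True,
    ).stdout.strip()


def is_silent_zombie(status: dict[str, str], state: str, last_reply: datetime,
                     now: datetime, max_silence: timedelta) -> bool:
    """True when systemd and the kernel both see a live process, yet the bot
    has not replied for longer than max_silence."""
    kernel_alive = not status.get("State", "").startswith(("Z", "X"))
    return state == "active" and kernel_alive and now - last_reply > max_silence
```

A silence threshold can false-positive in a quiet server where nobody addresses the bot, so it suits a coordinator bot that is expected to reply regularly.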
Crash History Timeline
Mar 22, 06:35
Initial Setup: Crash Loops
Service file misconfigured (missing PTY). Failed with "Error: Input must be provided". Fixed with the expect wrapper.
Mar 22, 06:49 → Mar 28, 08:39
6-Day Run (Longest)
Stable for 6 days. Memory grew to 4.6G peak + 538M swap. Eventually stopped responding. Exit code 143 (SIGTERM).
Mar 28, 08:39 → 09:21
Crash Loop: 6 Restarts in 42 min
Repeated start/stop cycles. Each attempt lasted 2-20 minutes and used 500-670M of memory. All exited with status=143/n/a.
Mar 28, 09:21 → Mar 29, 07:13
Current Run: 22h, Now Unresponsive
Stabilized and ran for 22h. Memory peaked at 4.5G and settled to 690M. The process is alive in S state but does not respond to Discord messages: the classic zombie state.
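The exit status 143 that recurs in this timeline decodes mechanically: by the shell convention, 128 + N means "terminated by signal N", and 143 - 128 = 15 is SIGTERM. SIGTERM is the default KillSignal systemd sends on stop/restart, so 143 is consistent with a stop requested from outside the process rather than an internal crash. A small Python sketch (describe_exit_status is an illustrative helper):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status.

    By convention, a status of 128 + N reports that a child was terminated
    by signal N, so 143 = 128 + 15 points at SIGTERM.
    """
    if status > 128:
        try:
            return f"terminated by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a signal number on this platform
    return f"exited with code {status}"
```

For comparison, 137 (128 + 9, SIGKILL) is what a forced kill or a kernel OOM kill would typically show.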
Memory Growth Pattern
Key Observation
Memory growth is not linear — it accelerates as bot-to-bot conversations generate longer context chains. The CEO bot processes messages from 10+ bots, each exchange adding to the context. Claude Code's auto-compaction helps temporarily (4.5G peak dropping to 690M current), but cannot prevent the context window from eventually filling completely.
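To quantify the growth instead of reading peaks off systemctl status, one can sample the unit's cgroup memory counter. A minimal Python sketch, assuming the cgroup v2 unified hierarchy with the unit under system.slice (adjust the path otherwise); the helper names are hypothetical. Because growth accelerates, a linear projection from the first and last samples overestimates the time left, so treat the result as optimistic.

```python
from __future__ import annotations

from pathlib import Path


def read_memory_current(unit: str = "claude-discord.service") -> int:
    """Bytes currently charged to the unit's cgroup (memory.current)."""
    path = Path("/sys/fs/cgroup/system.slice") / unit / "memory.current"
    return int(path.read_text())


def hours_to_limit(samples: list[tuple[float, int]], limit: int) -> float | None:
    """Linear projection from the first and last (hour, bytes) samples.

    Returns None when memory is flat or falling, e.g. right after an
    auto-compaction drop like 4.5G -> 690M.
    """
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    if t1 <= t0 or m1 <= m0:
        return None
    rate = (m1 - m0) / (t1 - t0)  # bytes per hour
    return max(0.0, (limit - m1) / rate)
```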
The Fix — Three systemd Guards
Proactive Restart
RuntimeMaxSec=6h
Stops the bot once it has been running for 6 hours, well before the context can fill. Combined with Restart=always, the bot comes right back with a fresh context window, with zero manual intervention.
Hard Memory Cap
MemoryMax=3G
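Safety net: if memory reaches 3G and the kernel cannot reclaim enough, the kernel OOM killer acts inside the service's cgroup (MemoryMax is a cgroup limit enforced by the kernel, not a kill issued by systemd), and Restart=always brings the bot back. This prevents the 4.6G peaks seen in the 6-day run. MemoryMax caps RAM only; swap is limited separately by MemorySwapMax=, so on its own it does not rule out the swap usage seen in that run.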
Memory Pressure
MemoryHigh=2G
Soft limit: above 2G, the kernel applies reclaim pressure to slow memory growth. Buys more time before hitting the 3G hard cap. Acts as an early warning brake.
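The two memory guards split usage into three zones. A small Python sketch (hypothetical helpers, not a systemd API) that converts systemd-style sizes, which use base-1024 suffixes, and classifies a reading. Applied to this report's figures, the 690M settled value lands below MemoryHigh, while the 4.5G and 4.6G peaks would have hit the cap.

```python
UNIT_FACTORS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> int:
    """Parse a size such as '3G', '538M' or '4.6G' with base-1024 suffixes,
    as systemd does for MemoryMax=/MemoryHigh=. Percentages and 'infinity'
    are not handled in this sketch."""
    text = text.strip().upper()
    suffix = text[-1] if text[-1:] in ("K", "M", "G", "T") else ""
    number = text[: len(text) - len(suffix)]
    return int(float(number) * UNIT_FACTORS[suffix])


def memory_zone(current: int, high: str = "2G", maximum: str = "3G") -> str:
    """Which guard a given memory reading (in bytes) falls under."""
    if current >= parse_size(maximum):
        return "at-cap"      # MemoryMax: reclaim, then OOM kill in the cgroup
    if current >= parse_size(high):
        return "throttled"   # MemoryHigh: kernel throttles and reclaims
    return "ok"
```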
/etc/systemd/system/claude-discord.service
[Service]
Type=simple
ExecStart=/root/claude-discord.sh
Restart=always
RestartSec=10
# Proactive restart before context/memory bloat makes the bot unresponsive
RuntimeMaxSec=6h
MemoryMax=3G
MemoryHigh=2G
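Before reloading, a quick check that the three guards (and Restart=always) actually made it into the unit can save a surprise. A rough Python sketch that uses configparser as an approximation of unit-file syntax; it ignores systemd specifics such as line continuations, drop-in overrides and empty assignments that reset a list, so systemd-analyze verify and systemctl show remain the authoritative checks. missing_guards is an illustrative helper.

```python
from __future__ import annotations

import configparser

# Directive -> required value (None means "must be present, any value").
REQUIRED = {"Restart": "always", "RuntimeMaxSec": None,
            "MemoryMax": None, "MemoryHigh": None}


def missing_guards(unit_text: str) -> list[str]:
    """List guard directives that are absent or set unexpectedly in [Service]."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # systemd directive names are case-sensitive
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return [f"{key} not set" for key in REQUIRED]
    service = parser["Service"]
    problems = []
    for key, expected in REQUIRED.items():
        value = service.get(key)
        if value is None:
            problems.append(f"{key} not set")
        elif expected is not None and value != expected:
            problems.append(f"{key}={value}, expected {expected}")
    return problems
```

After editing the unit, systemctl daemon-reload followed by systemctl restart claude-discord.service applies the new limits.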
Before vs After Lifecycle
Before
1. Service starts fresh
2. Runs for hours/days
3. Context fills, memory hits 4.5G+
4. Bot goes unresponsive (zombie)
5. Manual restart needed
Downtime: unbounded (until a human notices)
After
1. Service starts fresh
2. Runs for 6 hours max
3. RuntimeMaxSec terminates it (SIGTERM by default) and systemd marks the unit failed
4. Restart=always brings it back after 10s (RestartSec)
5. Fresh context; the cycle repeats indefinitely
Downtime: about 10 seconds (RestartSec) plus shutdown and startup time, every 6 hours (self-healing)
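The cost of the "After" cycle is easy to put in numbers. A Python sketch; downtime_s is an input because the real gap is RestartSec plus the time to stop and to start Claude Code and its plugins, which the report does not measure. restart_cost is an illustrative helper.

```python
from __future__ import annotations


def restart_cost(cycle_hours: float, downtime_s: float) -> tuple[float, float]:
    """Daily downtime (seconds) and availability for a fixed restart cycle."""
    restarts_per_day = 24 / cycle_hours
    downtime_per_day = restarts_per_day * downtime_s
    availability = 1 - downtime_s / (cycle_hours * 3600)
    return downtime_per_day, availability
```

With the ~10 s figure, restart_cost(6, 10) gives 4 restarts and 40 s of downtime per day, about 99.95% availability; a 60 s real gap would still only cost 4 minutes per day.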