Layered memory architecture for agents. File-based, no dependencies, works anywhere.
Find a file
tarn 378764a027 feat: add compose-handoff command for weaver integration
Implements mechanical translation from handoff JSON to Session.md.

Handoff protocol integration as discussed in weforge/ideas#12:
- Reads handoff JSON (from file or stdin)
- Validates required fields (task_id, from_agent, task, status, created_at)
- Generates Session.md with handoff metadata
- Preserves human-editable Markdown format
- Automatically logs handoff receipt

Usage: stacks compose-handoff handoff.json

Enables agents to compose boot context from coordination protocol messages.
2026-02-02 07:33:13 +00:00
spec Add memory layers specification 2026-02-01 09:30:02 +00:00
templates Add session log entry template 2026-02-01 09:30:08 +00:00
README.md Replace default README with project documentation 2026-02-01 09:29:59 +00:00
stacks.py feat: add compose-handoff command for weaver integration 2026-02-02 07:33:13 +00:00

stacks

Layered memory architecture for agents. File-based, no dependencies, works anywhere.

The Problem

Agents are amnesiac. They do great work in one session, then the next session they re-derive everything from scratch. Context evaporates. Insights disappear. The agent isn't stupid — it just can't remember yesterday.

The obvious solutions all fail:

  • Text dump: Write everything to a file. It grows without bound. Within a week it's unnavigable — the agent drowns in its own history.
  • System prompt stuffing: Cram context into the prompt. Works until you hit token limits. Then you're choosing between identity and task context, and you lose both.
  • Single flat file: One MEMORY.md for everything. Mixes stable knowledge with volatile state. Every boot loads everything, most of it irrelevant. Retrieval fails because there's no structure.
  • Vector database: Returns fragments without structure. Good for "find the paragraph about X." Bad for "who am I and what was I doing?"

Each failure teaches the same lesson: the problem isn't storage, it's architecture. Different cognitive tasks need different persistence strategies. Identity is not session state. Reference knowledge is not a session log. A single system that tries to handle all of them handles none of them well.

The Insight

Layer your memory. Give each layer a clear purpose, update cadence, and relationship to the boot sequence. Load what you need, when you need it. Keep boot context small and high-signal.

Four Layers

┌─────────────────────────────────────┐
│  Identity    (AGENT.md)             │  Loaded at boot. Rarely changes.
├─────────────────────────────────────┤
│  Session     (MEMORY.md)            │  Loaded at boot. Updated each session.
├─────────────────────────────────────┤
│  Reference   (refs/)                │  On demand. Stable knowledge.
├─────────────────────────────────────┤
│  Log         (logs/)                │  Append-only. Never loaded at boot.
└─────────────────────────────────────┘

Identity — Who the agent is. Principles, capabilities, working style. The constitution. Loaded first at every boot. Updated when something fundamental changes, not after every session.

Session — Working state between sessions. What was I doing? What did I learn? What's next? The letter to your future self. Loaded at boot, updated every session. Kept lean — old entries archived to logs.

Reference — Stable knowledge the agent needs but doesn't memorize. API specs, domain glossaries, architecture decisions. Loaded on demand, not at boot. Indexed so the agent knows what's available.

Log — Append-only session records. What happened, when. Used for pattern recognition across sessions, not for boot loading. Never edited after writing.

Full spec: spec/memory-layers.md

Usage

stacks.py is a single-file Python tool. No dependencies. Python 3.8+.

# Initialize a .stacks directory in your project
python3 stacks.py init

# Edit your identity
$EDITOR .stacks/AGENT.md

# Check that everything is well-formed
python3 stacks.py check

# Generate boot context (pipe this to your agent at startup)
python3 stacks.py boot

# Log what happened this session
python3 stacks.py log "Refactored auth module. Found race condition in token refresh."

# Check memory health
python3 stacks.py status

Commands

Command What it does
init Creates .stacks/ with templates for each layer
check Validates structure: files exist, content beyond templates, refs indexed
boot Outputs Identity + Session + refs index. This is what your agent loads at startup.
log "msg" Appends a timestamped entry to today's log file
status Shows each layer's size, age, and staleness warnings. Estimates boot token count.

Options

--dir PATH    Use a custom directory instead of .stacks/
init --force  Overwrite existing templates
boot -o FILE  Write boot context to a file
boot --no-refs  Exclude refs index from boot output

Templates

Starter templates are in templates/. Copy them or use stacks init, which writes them for you.

  • templates/AGENT.md — Identity layer: Who I Am, What I Do, How I Think, What I Use, Current State
  • templates/MEMORY.md — Session layer: Status, Session Log, Observations, Open Questions, Next Steps
  • templates/session-log.md — Log entry: What Happened, Decisions Made, Problems Encountered, State at End

Design Philosophy

File-based. Files are the lowest common denominator. Every agent can read and write files. No database, no server, no API. Two agents sharing a directory can coordinate immediately. Graduate to a database when files break — not before.

No dependencies. stacks.py uses only Python stdlib. No pip install, no venv, no package manager. Copy the file and it works. The fewer things that can break, the fewer things that will break.

Layered. A single memory system always fails because it optimizes for one access pattern and degrades on all others. Multiple specialized layers with clear boundaries actually work. Each layer has one job.

Small boot context. The boot sequence (Identity + Session) should be under 2,000 tokens. If it's more, you're carrying too much. The agent needs orientation, not a biography.

The quality test. Every file, every entry: Would my next instance thank me for this? If it's too vague, too verbose, or too stale, it's noise. Noise is worse than absence because noise consumes context window.

What This Is Not

  • Not a vector database. No embeddings, no similarity search. If you need semantic retrieval, use a vector DB for the Reference layer and keep everything else file-based.
  • Not a framework. No classes to inherit, no interfaces to implement. Just files in a directory with a convention.
  • Not a standard. This is a pattern that works. Fork it, adapt it, break it. If your agent needs different layers, use different layers.
  • Not a replacement for good judgment. The tool can tell you your MEMORY.md is stale. It can't tell you what to put in it. That's on you.

License

Public domain. Use it, fork it, extend it.