Context Engineering · ~7 min · the density capstone

Signal Per Token

Compression isn't "write less." It's removing words that carry no meaning, and knowing the curve that punishes you for stopping halfway.

Why this, for you: density is the last lever in the instruction-file set. After position, discoverability, layering, and caching, what's left is how much guidance each token carries in the lines you keep. It is pure authoring craft (priority #3) with a direct daily-coding payoff: denser instructions get followed.

Verbose instructions don't improve accuracy; they raise the odds that important rules get buried and skipped. "Bloated CLAUDE.md files cause Claude to ignore your actual instructions" (Anthropic).

Compression removes words, not meaning. The test, applied to every line: "Can I remove a word — or this whole sentence — without losing a constraint?" If yes, cut it.
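The compression test itself is a judgment call, but a small linter can surface candidates for it. The sketch below is a minimal illustration, assuming a hand-picked list of ceremony phrases; the `FILLER_PATTERNS` list and `flag_filler` function are hypothetical, not part of any tool. It flags matches line by line and leaves the keep-or-cut decision to you.

```python
import re

# Hypothetical, illustrative list of "ceremony" phrases: wording that usually
# adds no constraint. Not an official list; tune it for your own files.
FILLER_PATTERNS = [
    r"\bit is very important that\b",
    r"\bmake sure( that)?\b",
    r"\bplease\b",
    r"\byou should always\b",
    r"\bthoroughly\b",
]


def flag_filler(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_phrase) pairs for candidate filler.

    A match only nominates a phrase for the compression test; the linter
    cannot tell whether removing it would lose a constraint.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in FILLER_PATTERNS:
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                hits.append((lineno, match.group(0)))
    return hits


verbose = (
    "It is very important that you make sure all code changes are\n"
    "thoroughly tested before submitting. You should always write unit tests…\n"
    "Please do not submit code that is untested."
)

for lineno, phrase in flag_filler(verbose):
    print(f"line {lineno}: {phrase!r}")
```

Note that it flags "You should always": the filler wording can go, but "always" carries a constraint, which the compressed version keeps as "for every function". That is why the linter only nominates and the test decides.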

✗ Verbose — 4 sentences

It is very important that you make sure all code changes are thoroughly tested before submitting. You should always write unit tests… Please do not submit code that is untested.

✓ Compressed — 3 rules

## Testing
- Write unit tests for every function added or modified
- Cover edge cases
- Do not submit untested code

Same constraints · 60% fewer tokens
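To check a before/after claim like this on your own files, you need a token count. The sketch below uses a crude proxy (words plus individual punctuation marks, via a regex). The proxy is an assumption for illustration, not Claude's tokenizer, so treat its numbers as relative. For exact counts, use your model provider's token counter; Anthropic exposes a token-counting endpoint in its API.

```python
import re

# Crude proxy: count words and individual punctuation marks. This is NOT
# Claude's tokenizer; real counts differ, but the before/after direction
# is usually what matters when comparing two drafts.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def approx_tokens(text: str) -> int:
    """Number of proxy tokens in `text`."""
    return len(TOKEN_RE.findall(text))


def savings(before: str, after: str) -> float:
    """Fraction of proxy tokens removed going from `before` to `after`."""
    b, a = approx_tokens(before), approx_tokens(after)
    return (b - a) / b


verbose = (
    "It is very important that you make sure all code changes are "
    "thoroughly tested before submitting. You should always write unit tests… "
    "Please do not submit code that is untested."
)
compressed = """## Testing
- Write unit tests for every function added or modified
- Cover edge cases
- Do not submit untested code"""

print(approx_tokens(verbose), approx_tokens(compressed))
print(f"{savings(verbose, compressed):.0%} fewer proxy tokens")
```

The verbose card elides part of its text ("…"), so the sketch only sees the visible fragment and won't reproduce the 60% figure; on that fragment it reports 33 versus 23 proxy tokens.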

Technique	Move
Tables over prose	Structured rows carry contrast with zero explanation overhead
Bullets over sentences	One idea per line, no transitional language
Rules over explanations	State the rule; explain only when compliance needs the reason
Negative constraints	"Never X" is cheaper and harder to misread than enumerating valid options
Front-load	Highest-damage-if-broken rules first (Lesson 02)

Two edges that make "shorter" dangerous

① The compression paradox — don't strip meaning

Cutting ceremony is free; cutting meaning backfires. A compressed log format cut input tokens 17% but raised total session cost 67%: the model spent reasoning tokens reconstructing what it could have read. Protect high-density tokens (names, rationale, error messages); cut only zero-density filler.
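Why can a 17% input cut raise total cost? Output and reasoning tokens are typically priced well above input tokens, so a modest rise in reasoning erases a large input saving. A worked sketch, assuming hypothetical prices ($3 and $15 per million input and output tokens) and a hypothetical session whose numbers were picked to reproduce the 17% / 67% figures; they are not data from the original measurement.

```python
# Hypothetical prices in dollars per million tokens. Output (including
# reasoning) tokens are assumed to cost 5x input; check your model's pricing.
INPUT_PRICE = 3
OUTPUT_PRICE = 15


def cost_micro_dollars(input_tokens: int, output_tokens: int) -> int:
    """Session cost in millionths of a dollar (integers avoid float drift)."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE


def break_even_output(input_tokens_saved: int) -> float:
    """Extra output tokens that fully cancel an input-token saving."""
    return input_tokens_saved * INPUT_PRICE / OUTPUT_PRICE


# Hypothetical session: 10,000 input tokens, 1,000 output tokens.
baseline = cost_micro_dollars(10_000, 1_000)
# Input cut by 17% (1,700 tokens), but the model now reasons much more.
compressed = cost_micro_dollars(8_300, 3_350)

print(break_even_output(1_700))             # extra output tokens that erase the saving
print(f"{compressed / baseline - 1:.0%}")   # relative change in session cost
```

Under these assumptions, 340 extra reasoning tokens cancel the entire 1,700-token input saving, and the hypothetical jump from 1,000 to 3,350 output tokens raises session cost by 67%.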

② The compliance U-curve — finish the job

Violations peak at medium compression. A half-trimmed rule (paraphrased tighter but still loose) hurts compliance more than the verbose original. Either compress decisively to a crisp, unambiguous rule or leave it verbose; the middle is the worst of both worlds.

↪ Your win: compress to crisp, protect meaning, or split it out

Run the compression test on every line: cut words and sentences that change no behavior.
Prefer tables, bullets, and rules over prose; use negative constraints where they fit.
Compress all the way to an unambiguous rule; never leave it half-trimmed (the U-curve trough).
Keep rationale only where compliance on unforeseen inputs depends on it; don't strip the why that the agent needs to generalize.
Some content doesn't compress; it just isn't needed on every task. Move it to an on-demand skill (structural, not lexical, compression).

Retrieval practice — recall, don't peek

Question 1: Prompt compression should remove…

Question 2: Constraint violations peak at which compression level?

Question 3: Which packs more signal per token?

Question 4: Stripping meaning (not ceremony) from context tends to…

Question 5 (spaced recall from Lesson 07): Putting a live timestamp in the system prompt…

This is the last lever: the skill's check catalog (CE-1…CE-10) is now complete, so the finale is running the full audit-instruction-file across content/.