Part 3 · Loading & Economics

Context Engineering · ~6 min

Mind the Version Gap

The model writes against the API it learned, not the one you've installed. Tell it which version is real, and the failures stop.

Why this, for you: the agent's training froze a snapshot of every library's API. Your package-lock.json says otherwise. Close that gap and you get correct code on the first try in daily work (#1) — no more debugging calls that were renamed two minor versions ago.

A model that aces an isolated coding puzzle can still fail badly the moment that code has to run under a specific library version. The capability is there. The missing piece is context about your environment.

1 The gap is huge, and it's a context problem

On standard benchmarks (HumanEval+, MBPP) that test isolated functions with no version constraints, models score 80%+ Pass@1. Add the requirement that the code run under specific library versions and the same models collapse.

From 80%+ down to 13–28%

On the VersiBCB benchmark — code that must target specific library versions — Pass@1 drops to just 13–28%. The capability didn't change between benchmarks; the only thing added was version constraints. The deficit is context, not capability.

An independent benchmark, GitChameleon, confirms the pattern: enterprise models hit only 48–51% on version-conditioned Python tasks across 26 libraries. And in reproducibility studies, 31.7% of AI-generated code fails at runtime purely from environment mismatches.

2 Why it defaults to the wrong API

Models train on web-scale code corpora that contain far more examples of older API surfaces than current ones. So the model develops a systematic preference for deprecated patterns — and without environment context, it has no signal to deviate from the most common (often stale) version it saw.

This compounds in fast-moving libraries. ML frameworks — torch, transformers, tensorflow — show the steepest accuracy drops, because their API surfaces change across minor versions. Stable standard-library modules rarely trigger a mismatch.
# agent asked to write a Trainer config, no version given: args = TrainingArguments( output_dir="./results", evaluation_strategy="epoch", # deprecated in transformers v4.46+ ) # with pyproject.toml (transformers==4.47.0) in context: args = TrainingArguments( output_dir="./results", eval_strategy="epoch", # correct name for the installed version )

One renamed parameter. Trivial to fix once you see it — expensive to debug when nothing told the agent which version is live.

3 Close the gap: feed the environment in

The fix is to make the environment explicit context. Three moves, cheapest first:

Prefer migration over generation

Across the adaptation strategies tested, models are 2–3× better at adapting existing code to a target environment than generating version-correct code from scratch. When you can, hand the agent working code to migrate rather than asking it to invent the right calls.

And keep the loop honest: a runtime error trace like AttributeError: module 'torch' has no attribute 'compile' carries a version-specific signal. Feed it back in — it's a corrective the model can act on.

↪ Your win: code against the live spec, not the snapshot

Retrieval practice — recall, don't peek

Question 1Adding version constraints drops Pass@1 from 80%+ down to…

Question 2Models default to deprecated APIs mainly because…

Question 3The steepest version-gap accuracy drops show up in…

Question 4When the installed version postdates the training cutoff, you should…

Question 5 · spaced recall from Lesson 14A repository map gives the agent…

Ask me anything. Want the snippet that excerpts just the relevant dep lines from a lockfile into a prompt, or how versioned-doc fetch composes with the migration-over-generation rule? Next: Every Token Has a Cost — context as a finite budget you spend deliberately.
✎ Feedback