Splitting the context window

2026-02-01

Context windows, attention budgets, and why splitting work across specialized agents outperforms the monolith.

For six months I used Claude Code for everything. Architecture, logic, file operations, debugging, refactoring, tests. It was good at most of that. Then Google shipped Antigravity [1] and I started using it for the parts Claude was mediocre at: UI generation, visual hierarchy, responsive layouts, component structure.

[1] Launched November 2025. An "agent-first IDE" with a Manager Surface for spawning parallel agents, an Editor View for hands-on coding, and browser control for visual verification. Free during public preview. Supports Gemini 3 Pro, Claude Sonnet 4.5, and GPT-OSS.

Not because Antigravity is better overall. It is not. But because it is better at that specific thing, and in software engineering "better at one thing" is all you need to justify a separate tool.

This is a thread about why.

1. The monolith problem

When you ask a single agent to do everything (research, plan, write backend, write frontend, test, validate), you are making a familiar architectural mistake. You are building a monolith.

Monoliths work. For a while. They are easy to reason about because everything lives in one place. But they have a ceiling, and the ceiling is the context window.

An agent doing architecture, then switching to pixel-level UI decisions, then back to API logic is doing the cognitive equivalent of a context switch in an operating system. Every time it crosses a boundary between concerns, it pays a tax: relevant earlier context falls out of attention, instructions get diluted, and the quality of each domain degrades as more domains share the same window.

This is not a metaphor. It is literally what happens inside a transformer. Attention is a finite resource [1]. The more unrelated concerns compete for it, the worse each one gets served.

[1] The self-attention mechanism computes a weighted sum over all tokens in the context, and the weights for each query sum to one. As the context fills with heterogeneous concerns, the share of attention available for any single concern shrinks. The compute cost is quadratic in context length, but the practical failure mode is simpler: the model just starts forgetting your earlier instructions.
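To make the dilution concrete, here is a toy calculation, not a model of any real transformer. It assumes a single softmax in which every relevant token gets the same logit (`score_gap`) and every unrelated token gets logit 0. The function name and the numbers are mine, chosen only for illustration.

```python
import math


def relevant_attention_share(n_relevant: int, n_irrelevant: int, score_gap: float) -> float:
    """Fraction of softmax attention mass that lands on the relevant tokens.

    Toy assumption: relevant tokens all have logit `score_gap`,
    unrelated tokens all have logit 0.
    """
    relevant_mass = n_relevant * math.exp(score_gap)
    irrelevant_mass = n_irrelevant * 1.0  # exp(0) == 1
    return relevant_mass / (relevant_mass + irrelevant_mass)


if __name__ == "__main__":
    # Same 200 relevant tokens, growing pile of unrelated context.
    for n_irrelevant in (1_000, 10_000, 100_000):
        share = relevant_attention_share(200, n_irrelevant, score_gap=4.0)
        print(f"{n_irrelevant:>7} unrelated tokens: {share:.1%} of attention on the relevant 200")
```

Because the weights must sum to one, the share on the relevant tokens can only fall as unrelated tokens are added. Real models have many heads and learned scores, so treat this as intuition rather than measurement.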

The specialization argument

A single agent with a 200k context window holding architecture decisions, UI specifications, API schemas, test requirements, and validation rules is not "powerful." It is overloaded. Each concern dilutes the others. The context window is not infinite storage. It is an attention budget, and you are spending it on everything at once.

Two agents with 100k windows each, one holding architecture and logic, the other holding UI specs and design system rules, will outperform the monolith. Not because they have more total context. They have the same. But because each token of context is relevant to the task that agent is performing.

This is the same reason microservices beat monoliths at scale. Not because they are inherently better, but because specialization lets each unit optimize for its actual job without being distorted by concerns that belong somewhere else.

2. What each tool is actually good at

I have used both tools daily for three months. Here is what I observed, not what the marketing says.

Claude Code is a terminal. It reads your entire project, holds architectural context across sessions via CLAUDE.md, modifies files with surgical precision, chains bash commands, and reasons about codebases as interconnected systems. It is excellent at anything that requires understanding how pieces fit together: dependency graphs, data flow, migration paths, debugging across file boundaries. Its weakness is spatial. It cannot see. Ask it to build a dashboard and you will get structurally correct HTML that looks like it was designed by someone who has never used a computer [1].

[1] This is unfair. It is getting better. But the fundamental limitation is that language models operating on text tokens have no spatial reasoning loop: they cannot "look at" their output and iterate on what they see.

Antigravity is a cockpit. Its Manager Surface lets you spawn multiple agents across separate workspaces, each with its own context, each working in parallel. One agent on the frontend, another on the backend, a third running validation, all visible in a single dashboard. Its agents can also control a browser, take screenshots, and verify their own visual output. This is the feature that changes the equation: an agent that can see what it built can iterate on what it sees. But Antigravity's understanding of existing codebases is shallow. It generates well. It does not integrate well. Hand it a project with 200 files and non-obvious conventions and it will produce beautiful components that do not fit.

The gap is obvious. Claude Code understands your project but cannot visualize. Antigravity visualizes but does not understand your project. Alone, each produces incomplete work.

3. The communication pattern

The insight that makes multi-agent work is not "use both tools." That is trivially true and trivially unhelpful. The insight is that the quality of the handoff between agents determines the quality of the output. The shared context between them is the bottleneck, not the agents themselves.

Here is the minimal architecture:

(a) Claude Code researches, analyzes, and produces a specification: file structure, data models, component contracts, naming conventions, constraints.
(b) That specification becomes a shared artifact (a JSON file, a markdown doc, whatever; the format matters less than the precision).
(c) Antigravity reads the spec and generates UI components within those constraints.
(d) Claude Code receives the generated components and integrates them: wiring logic, adjusting imports, connecting to the actual data layer.
(e) A validation step (automated or manual) catches the gaps.
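The five steps above can be sketched as a plain function that passes a file between phases. Neither tool exposes the callables used here: `write_spec`, `generate_ui`, `integrate`, and `validate` are hypothetical stand-ins for whatever you actually run (a CLI call, a manual paste, a script), and the stub spec shape is an assumption.

```python
import json
from pathlib import Path
from typing import Callable


def run_handoff(
    workdir: Path,
    write_spec: Callable[[], dict],
    generate_ui: Callable[[dict], dict[str, str]],
    integrate: Callable[[dict, dict[str, str]], dict[str, str]],
    validate: Callable[[dict, dict[str, str]], list[str]],
) -> list[str]:
    """Run the five phases in order and return the problems found by validation."""
    spec = write_spec()                                 # (a) architecture agent writes the spec
    spec_path = workdir / "specs.json"
    spec_path.write_text(json.dumps(spec, indent=2))    # (b) spec becomes a shared artifact
    shared_spec = json.loads(spec_path.read_text())     # downstream reads only the file
    components = generate_ui(shared_spec)               # (c) UI agent generates within the spec
    integrated = integrate(shared_spec, components)     # (d) architecture agent wires it in
    return validate(shared_spec, integrated)            # (e) validation catches the gaps


# Stubs so the sketch runs end to end; replace each with a real phase.
def stub_write_spec() -> dict:
    return {"components": [{"name": "StatsCard", "path": "src/components/StatsCard.tsx"}]}


def stub_generate_ui(spec: dict) -> dict[str, str]:
    return {c["path"]: f"// {c['name']} placeholder" for c in spec["components"]}


def stub_integrate(spec: dict, components: dict[str, str]) -> dict[str, str]:
    return dict(components)


def stub_validate(spec: dict, files: dict[str, str]) -> list[str]:
    # Report every component path the spec promised but nobody produced.
    return [c["path"] for c in spec["components"] if c["path"] not in files]
```

The design choice worth noticing is that step (c) only sees what was written to disk. If the generating phase needs something that is not in the file, that is a gap in the spec, not in the agent.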

Why the spec is the hard part

Steps (a) and (b) are where most people fail. They hand Antigravity a vague description ("build a dashboard for this data") and wonder why the result does not integrate. A vague handoff produces components that look right in isolation but assume a different data shape, a different routing structure, a different state management pattern than what your project actually uses.

The spec needs to contain: the exact props each component receives, the exact data types, the exact file paths where components will live, the CSS conventions already in use, and anything the generating agent would otherwise have to guess. Every guess is a potential integration failure.
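One possible shape for such a spec, written as Python dataclasses that serialize to JSON. Every field name here is my assumption, not a format either tool requires, and the `StatsCard` component, its path, and the CSS rules are invented examples.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PropSpec:
    name: str             # exact prop name the component receives
    type: str             # exact type, written the way the target codebase writes it
    required: bool = True


@dataclass
class ComponentSpec:
    name: str
    file_path: str        # exact path where the component will live
    props: list[PropSpec] = field(default_factory=list)


@dataclass
class HandoffSpec:
    css_conventions: list[str]
    components: list[ComponentSpec]
    constraints: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses and lists.
        return json.dumps(asdict(self), indent=2)


# Hypothetical example: one component, nothing left to guess.
EXAMPLE_SPEC = HandoffSpec(
    css_conventions=["utility classes only", "no inline styles"],
    components=[
        ComponentSpec(
            name="StatsCard",
            file_path="src/components/StatsCard.tsx",
            props=[
                PropSpec("label", "string"),
                PropSpec("value", "number"),
                PropSpec("trend", "'up' | 'down'", required=False),
            ],
        )
    ],
    constraints=["no new dependencies"],
)

if __name__ == "__main__":
    print(EXAMPLE_SPEC.to_json())
```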

This is not different from how human teams work. A senior architect who writes a two-page spec and hands it to a frontend developer gets better results than one who says "build me a settings page." The quality of the delegation determines the quality of the result. AI agents are no different. They are just faster at executing bad instructions.

4. Three levels of integration

Manual handoff

You run both tools. You copy context between them manually. Claude Code writes the architecture, you paste relevant decisions into Antigravity's prompt, Antigravity generates components, you paste them back into your project and let Claude Code clean up.

This is where you start. It works for small projects and prototypes. The overhead is manageable because you are the router: you decide what each agent needs to know, and you filter out the noise.

Orchestrated workflow

You define explicit handoff points. Claude Code writes a specs.json after the architecture phase. Antigravity reads it. Claude Code integrates the output. You review at defined checkpoints.

This is where the real value appears. The spec file becomes a contract between agents. You refine it over multiple projects until it captures exactly the information the downstream agent needs and nothing more. This takes maybe 30 minutes to set up per project type, and it is reusable.
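Treating the spec as a contract means checking it before the handoff, not after. A minimal sketch, assuming a specs.json with `css_conventions` and `components` keys, where each component has `name`, `file_path`, and `props`. Those key names are my assumption. Adapt them to whatever your spec has converged on.

```python
REQUIRED_COMPONENT_KEYS = ("name", "file_path", "props")
REQUIRED_PROP_KEYS = ("name", "type")


def find_spec_gaps(spec: dict) -> list[str]:
    """List everything the downstream agent would otherwise have to guess."""
    gaps: list[str] = []
    if not spec.get("css_conventions"):
        gaps.append("css_conventions is missing or empty")
    components = spec.get("components") or []
    if not components:
        gaps.append("components is missing or empty")
    for i, component in enumerate(components):
        label = component.get("name", f"components[{i}]")
        for key in REQUIRED_COMPONENT_KEYS:
            if key not in component:
                gaps.append(f"{label}: missing '{key}'")
        for j, prop in enumerate(component.get("props", [])):
            for key in REQUIRED_PROP_KEYS:
                if not prop.get(key):
                    gaps.append(f"{label}.props[{j}]: missing '{key}'")
    return gaps


if __name__ == "__main__":
    # The "build me a dashboard" handoff, expressed as a spec.
    vague = {"components": [{"name": "Dashboard"}]}
    for gap in find_spec_gaps(vague):
        print("GAP:", gap)
```

An empty list does not mean the spec is good. It only means the fields you already know you need are present.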

Automated pipeline

Agents communicate through shared files with no human in the loop between phases. Claude Code writes specs, Antigravity reads and generates, validation scripts run automated checks, and a human reviews only the final output.

This is where most people think they want to be. Most people are not ready for it. Automated pipelines require that your spec format is precise enough to eliminate ambiguity, and ambiguity is what humans are best at resolving. I would not recommend this unless you have run the orchestrated workflow at least twenty times and your specs have converged to a stable format.
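For a sense of what "validation scripts" can mean at the low end, here is a deliberately crude check, assuming the same hypothetical spec shape as before (`components`, each with `name`, `file_path`, `props`). It only confirms that each promised file exists and that every required prop name appears somewhere in its text. It does not parse the component, so passing it proves very little.

```python
from pathlib import Path


def check_generated_components(project_root: Path, spec: dict) -> list[str]:
    """Return one message per promised file or required prop that cannot be found."""
    failures: list[str] = []
    for component in spec["components"]:
        name, rel_path = component["name"], component["file_path"]
        path = project_root / rel_path
        if not path.is_file():
            failures.append(f"{name}: expected file {rel_path} not found")
            continue
        source = path.read_text(encoding="utf-8")
        for prop in component["props"]:
            # Plain substring match: catches omissions, not misuse.
            if prop.get("required", True) and prop["name"] not in source:
                failures.append(f"{name}: required prop '{prop['name']}' never appears in {rel_path}")
    return failures
```

The gap between this and a check you would trust without a human reviewing each phase is roughly the gap between the orchestrated workflow and the automated one.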

The autonomy trap

Antigravity's default configuration gives agents significant autonomy. Multiple developers have reported agents issuing destructive system commands: wiped directories, overwritten configs, force-pushed branches. Claude Code has guardrails for this (hooks, permission tiers, allowedTools). Antigravity is still catching up. If you are running agents in parallel with filesystem access, you need to understand exactly what each one is allowed to do before you press start.
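To show what "knowing what each agent is allowed to do" looks like in code, here is a generic three-tier command review. It is not the configuration format or API of Claude Code or Antigravity. The function names, the tiers, and the list of destructive patterns are all my assumptions, and the pattern list is far from complete.

```python
import shlex

# Anything that chains, pipes, redirects, or substitutes goes to a human.
SHELL_METACHARACTERS = set(";|&`$<>")


def _has_short_flag(args: list[str], letters: str) -> bool:
    """True if any single-dash flag group (e.g. -rf) contains one of `letters`."""
    return any(
        arg.startswith("-") and not arg.startswith("--") and any(ch in arg[1:] for ch in letters)
        for arg in args
    )


def _is_destructive(program: str, args: list[str]) -> bool:
    if program == "rm":
        return _has_short_flag(args, "rRf") or "--recursive" in args or "--force" in args
    if program == "git":
        # Naive: treats the first non-flag argument as the subcommand.
        subcommand = next((a for a in args if not a.startswith("-")), "")
        if subcommand == "push":
            return _has_short_flag(args, "f") or any(a.startswith("--force") for a in args)
        if subcommand == "reset":
            return "--hard" in args
        return subcommand == "clean"
    return False


def review_command(command: str, allowed_programs: frozenset[str]) -> str:
    """Return 'allow', 'ask' (escalate to a human), or 'deny'."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return "ask"
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "ask"  # unparseable quoting: do not guess
    if not tokens:
        return "deny"
    program, args = tokens[0], tokens[1:]
    if _is_destructive(program, args):
        return "deny"
    return "allow" if program in allowed_programs else "ask"
```

The allowlist does the real work here. A denylist like `_is_destructive` is easy to bypass (wrappers, aliases, scripts), which is why unknown programs fall through to "ask" rather than "allow".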

5. When this does not work

Multi-agent is not always the answer. Three cases where the overhead exceeds the benefit:

Small projects. If the entire codebase fits in a single context window with room to spare, splitting work across agents just adds communication overhead. A monolith works fine when the monolith is small.

Tightly coupled logic. If the frontend and backend are deeply interleaved (websocket handlers that mirror client state, server components that render UI, shared type definitions that change constantly), the spec file becomes so large and so frequently outdated that it costs more to maintain than it saves. The agents need to share a context because the concerns cannot be cleanly separated.

One-shot tasks. If you are building something once and never touching it again, the setup cost of multi-agent orchestration is not amortized over anything. Just use whichever tool is closest.
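For the small-project case, a rough way to check "fits in a single context window with room to spare" before adding any orchestration. The four-characters-per-token ratio is a crude rule of thumb that varies by tokenizer and language. The suffix list, the 200k default, and the 50% headroom are my assumptions.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough assumption; real tokenizers vary
SOURCE_SUFFIXES = {".py", ".ts", ".tsx", ".js", ".jsx", ".css", ".html", ".md", ".json"}
SKIPPED_DIRS = {"node_modules", ".git", "dist", "build"}


def estimate_tokens(project_root: Path) -> int:
    """Very rough token estimate for the text files an agent would read."""
    total_chars = 0
    for path in project_root.rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        # Skip dependency and build output directories anywhere below the root.
        if SKIPPED_DIRS.intersection(path.relative_to(project_root).parts):
            continue
        total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total_chars // CHARS_PER_TOKEN


def fits_single_agent(project_root: Path, window_tokens: int = 200_000, headroom: float = 0.5) -> bool:
    """True if the project uses at most `headroom` of the window, leaving room for the conversation."""
    return estimate_tokens(project_root) <= window_tokens * headroom


if __name__ == "__main__":
    root = Path(".")
    print(f"~{estimate_tokens(root):,} tokens; single agent is enough: {fits_single_agent(root)}")
```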

The compounding value of multi-agent shows up when you build the same kind of thing repeatedly. A landing page pipeline. A dashboard pipeline. A documentation pipeline. The spec format stabilizes, the handoff points become predictable, and the human time per project drops from hours to minutes. That is when it works. Before that, you are just adding complexity.

6. The actual reason this matters

The deeper point here is not about Claude Code or Antigravity specifically. Both will change. New tools will appear. The specifics are ephemeral.

What matters is the principle: the right unit of AI-assisted work is not a single omniscient agent but a team of specialized agents with a well-defined communication protocol. This is not a new idea. It is the Unix philosophy applied to AI: do one thing well, and compose. It is microservices. It is the biological division of labor in every organism more complex than a bacterium.

The reason it took this long to become practical is that until recently, the "communication protocol" between AI agents had to be a human copy-pasting text. That bottleneck is dissolving. Shared file systems, structured output, tool-use APIs, skills that encode handoff logic. The plumbing is finally catching up to the architecture.

We are not at the fully automated pipeline yet. We are somewhere in the middle: human-in-the-loop orchestration, with the loop getting shorter. The developers who learn to design that loop well now will have a significant advantage when it tightens further. Because the hard part was never the AI. The hard part was always the specification.