Context Windows and Sub-Agents
Level up from basic prompts to understanding context - the BIG prompt that combines the system prompt, previous conversation, tool results, and project files (like CLAUDE.md) - and learn how sub-agents handle complex work without polluting the main conversation.
Learning Objectives
After completing this module, you will be able to:
- Understand context as “BIG prompt” = system + history + tool results + project files
- Recognize context pollution and how it degrades LLM performance
- Use sub-agents (like Plan mode) to isolate work and keep main context clean
- Understand that this is just HTTP requests to GPT-like APIs, not magic
- Debug context overflow issues and manage token costs effectively
Prerequisites
- Completion of 01_prompt
- You understand prompts are structured input, not conversation
- Ready to learn about context management complexity
Course Module Content
Understanding Context Windows
What is Context? (The BIG Prompt)
Context = The BIG prompt that includes:
- System Prompt: Claude Code’s internal instructions (50k+ tokens - fixed overhead)
- CLAUDE.md: Your project-specific rules and patterns (if you created one)
- Your Messages: All prompts you’ve sent this session
- Agent Responses: All previous LLM outputs
- Tool Results: File reads, bash outputs, search results, API calls
- Hidden Summaries: Compacted old messages when approaching token limit
- Sub-Agent Results: Summaries returned from specialized agents
Think of it this way: every tool call adds its output to your context window. Read a 10k-token file? +10k tokens. Run a command? +500 tokens. This accumulates until LLM performance degrades or costs explode.
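To build intuition for how quickly this accumulates, here is a minimal sketch. It uses the rough 4-characters-per-token heuristic and invented file sizes; it illustrates the accumulation, not how Claude Code actually counts tokens.

```typescript
// Rough token estimate: ~4 characters per token for English text and code.
const estimateTokens = (chars: number): number => Math.ceil(chars / 4);

// Hypothetical session: fixed system prompt plus a few tool results.
let contextTokens = 50_000; // system prompt overhead (per this module's estimate)

const toolResults = [
  { tool: "Read auth.ts", chars: 40_000 },    // ~10k tokens
  { tool: "Bash npm test", chars: 2_000 },    // ~500 tokens
  { tool: "Read PRP-003.md", chars: 60_000 }, // ~15k tokens
];

for (const result of toolResults) {
  contextTokens += estimateTokens(result.chars);
  console.log(`${result.tool}: context now ~${contextTokens.toLocaleString()} tokens`);
}
// Each tool result stays in the window until /clear, compaction, or the session ends.
```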
Context Overflow Management
```mermaid
sequenceDiagram
    participant User
    participant System
    participant Context as Context Window
    User->>Context: Messages accumulate
    Note over Context: 195,000 / 200,000 tokens
    Context->>System: WARNING Approaching limit!
    System->>System: Trigger compaction
    Note over System: Summarizing old messages:<br/>User asked about auth,<br/>I provided JWT examples
    System->>Context: Remove old verbose messages
    System->>Context: Add compact summary
    Note over Context: 55,000 tokens<br/>System prompt + recent messages + summary
    Context->>User: Ready for new messages
```
Token Economics
Token Calculation Basics
- 1 token ≈ 4 characters (English text, approximate)
- English: ~4 characters per token → 1,000 characters ≈ 250 tokens
- Russian: ~2.5 characters per token → 1,000 characters ≈ 400 tokens (~60% more expensive!)
- Chinese: ~1.5 characters per token → 1,000 characters ≈ 650 tokens (~160% more expensive!)
- Code: ~3.5 characters per token → 1,000 characters ≈ 285 tokens (slightly more token-hungry than English prose, because symbols and whitespace tokenize less efficiently)
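As a rough illustration, the ratios above can be turned into a small estimator. These are approximations, not real tokenizer output (exact counts depend on the model's tokenizer):

```typescript
// Approximate characters-per-token ratios from the list above.
const CHARS_PER_TOKEN = {
  english: 4,
  russian: 2.5,
  chinese: 1.5,
  code: 3.5,
} as const;

function estimateTokens(charCount: number, kind: keyof typeof CHARS_PER_TOKEN): number {
  return Math.ceil(charCount / CHARS_PER_TOKEN[kind]);
}

console.log(estimateTokens(1000, "english")); // ~250
console.log(estimateTokens(1000, "russian")); // ~400
console.log(estimateTokens(1000, "chinese")); // ~667 (roughly the ~650 above)
console.log(estimateTokens(1000, "code"));    // ~286
```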
Context Window Sizes (December 2024; pricing verified December 2025)
| Model | Context Window (tokens) | Input Cost | Output Cost |
|---|---|---|---|
| Claude Opus 4 | 200,000 | $15/1M tokens | $75/1M tokens |
| Claude Sonnet 4.5 | 200,000 | $3/1M tokens | $15/1M tokens |
| Claude Haiku 3.5 | 200,000 | $0.80/1M tokens | $4/1M tokens |
| GPT-4 Turbo | 128,000 | $10/1M tokens | $30/1M tokens |
| GPT-4o | 128,000 | $2.50/1M tokens | $10/1M tokens |
| Gemini 1.5 Pro | 2,000,000 | $1.25/1M tokens | $5/1M tokens |
| Gemini 1.5 Flash | 1,000,000 | $0.075/1M tokens | $0.30/1M tokens |
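To estimate a request's cost from this table, multiply input and output token counts by the per-million rates. A small sketch with the Claude prices hard-coded from the table (the model keys are just labels for this example; always verify current pricing):

```typescript
interface ModelPricing {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

// Prices copied from the table above (USD per 1M tokens).
const PRICING: Record<string, ModelPricing> = {
  "claude-sonnet-4.5": { inputPerMTok: 3, outputPerMTok: 15 },
  "claude-haiku-3.5":  { inputPerMTok: 0.8, outputPerMTok: 4 },
  "claude-opus-4":     { inputPerMTok: 15, outputPerMTok: 75 },
};

function requestCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  return (inputTokens / 1_000_000) * p.inputPerMTok +
         (outputTokens / 1_000_000) * p.outputPerMTok;
}

// e.g. a 65k-token context producing a 1k-token answer on Sonnet:
console.log(requestCost("claude-sonnet-4.5", 65_000, 1_000).toFixed(3)); // ≈ $0.210
```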
Claude Code Request-Response Flow
Visual Overview
```mermaid
sequenceDiagram
    participant U as User
    participant M as Main Agent
    participant T as Tools
    participant S as Sub-Agent (Plan)
    U->>M: analyse implementation details...
    Note over M: Context: 50k tokens
    M->>T: Read PRP-003 document
    T-->>M: 1132 lines returned
    Note over M: Context: 65k tokens
    M->>M: Task is complex, spawn sub-agent
    M->>S: Analyze TUI implementation
    Note over S: Fresh context: 20k tokens
    S->>T: Read src/tui/main.tsx
    T-->>S: 8k tokens
    S->>T: Read StatusBar.tsx
    T-->>S: 3k tokens
    S->>T: 18 more tool calls
    T-->>S: 49k more tokens
    Note over S: Context: 80k tokens
    S->>S: Analyze all data
    S-->>M: Return 500 token summary only
    Note over M: Context: 65.5k not 145k!
    M->>U: Present plan based on summary
    Note over U: Sees descriptive plan
    rect rgb(255, 235, 235)
        Note over U,M: Problem: Original prompt too vague
    end
```
Real Example Breakdown
Let’s trace through an actual Claude Code interaction:
USER Request
> analyse implementation details in our code according to
prp/PRPs/PRP-003-tui-implementation.md
Step 1: Initial Response + Hidden Command
What you see:
> I'll analyze the implementation details for PRP-003 TUI implementation.
Let me first read the PRP document and then examine the current codebase
to understand what has been implemented.
What happens in context:
- Agent decides it needs to read the file first
- Generates hidden tool call:
Read(PRPs/PRP-003-tui-implementation.md)
Context at this point:
[System Prompt: ~50k tokens]
[User Message: "analyse implementation..." ~15 tokens]
[Assistant Thinking: "I need to read PRP first..." ~50 tokens]
[Tool Call: Read(...) - waiting for result]
Total: ~50,065 tokens
Step 2: Tool Execution
What you see:
> Read(PRPs/PRP-003-tui-implementation.md)
⎿ Read 1132 lines
What happens:
- File content loaded into context (full 1132 lines)
- Context size increases significantly
- Claude now has the full requirements document
Context now:
[System Prompt: ~50k tokens]
[User Message: ~15 tokens]
[Assistant Thinking: ~50 tokens]
[Tool Result: File contents - ~15k tokens]
Total: ~65,065 tokens
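Under the hood this is just an HTTP request to the API: every earlier message and tool result is re-sent in the messages array on each turn. A simplified sketch of what the request body might look like at this point (field names follow Anthropic's Messages API; the exact payload Claude Code sends, including the model id and tool schemas, is an internal detail and is abbreviated here):

```typescript
// Simplified Messages API request body after Step 2.
const requestBody = {
  model: "claude-sonnet-4-5",
  max_tokens: 8192,
  system: "<Claude Code's internal system prompt, ~50k tokens>",
  messages: [
    {
      role: "user",
      content: "analyse implementation details in our code according to prp/PRPs/PRP-003-tui-implementation.md",
    },
    {
      role: "assistant",
      content: [
        { type: "text", text: "I'll analyze the implementation details for PRP-003..." },
        { type: "tool_use", id: "toolu_01", name: "Read", input: { file_path: "PRPs/PRP-003-tui-implementation.md" } },
      ],
    },
    {
      role: "user",
      content: [
        // The full file body rides along in this (and every later) request: the ~15k tokens.
        { type: "tool_result", tool_use_id: "toolu_01", content: "<all 1132 lines of the PRP>" },
      ],
    },
  ],
};
```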
Why this matters: Every tool result stays in context until:
- You manually clear with /clear
- Automatic compaction triggers near the token limit
- The session ends
Step 3: Agent Decides to Use Sub-Agent
What you see:
> Now I'll use the Task tool with a Plan subagent to analyze the TUI
implementation details comprehensively.
What happens:
- Main agent realizes task is complex (needs to read many files)
- Decides to spawn a sub-agent with independent context
- Sub-agent gets its own fresh context window (doesn’t inherit main agent’s 65k tokens)
Why sub-agents?
- Context isolation: Sub-agent doesn’t pollute main conversation
- Specialized instructions: Sub-agent can have different system prompt (e.g., “You are a code analysis expert”)
- Parallel work: Multiple sub-agents can work simultaneously
- Cost efficiency: Main agent doesn’t pay token cost for sub-agent’s exploration
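Conceptually, the isolation works like the sketch below. This is not Claude Code's actual implementation - runSubAgent, callModel, and the message shapes are invented for illustration - but it shows why only the summary, not the exploration, lands in the main context.

```typescript
// Conceptual sketch of context isolation - not Claude Code's real internals.
type Message = { role: "user" | "assistant" | "tool"; content: string };

interface AgentContext {
  systemPrompt: string;
  messages: Message[]; // grows with every prompt and tool result
}

// Stand-in for "ask the model with this context"; a real version would call the API.
async function callModel(ctx: AgentContext): Promise<string> {
  return `summary based on ${ctx.messages.length} messages`;
}

async function runSubAgent(task: string, toolResults: string[]): Promise<string> {
  // Fresh context: the sub-agent does NOT inherit the main agent's history.
  const ctx: AgentContext = {
    systemPrompt: "You are a code analysis expert.",
    messages: [{ role: "user", content: task }],
  };

  // Everything the sub-agent reads stays in ITS context only.
  for (const result of toolResults) {
    ctx.messages.push({ role: "tool", content: result });
  }

  // Only a short summary crosses back into the main conversation.
  return callModel(ctx);
}

// The main agent's context grows by one short message (~500 tokens), not ~80k:
// mainContext.messages.push({ role: "assistant", content: await runSubAgent(task, reads) });
```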
Step 4: Sub-Agent Execution
What you see:
> Plan(Analyze TUI implementation progress)
⎿ Done (20 tool uses • 80.1k tokens • 1m 4s)
What happens in sub-agent context:
Sub-agent receives:
[Sub-agent System Prompt: "You are a code analysis expert..." ~5k tokens]
[Task: "Analyze TUI implementation against PRP-003" ~100 tokens]
[PRP Document: ~15k tokens]
[Access to tools: Read, Bash, Search codebase]
Total start: ~20,100 tokens
Sub-agent executes 20 tool calls:
1. Read(src/tui/main.tsx) - loads main TUI file (~8k tokens)
2. Read(src/tui/components/StatusBar.tsx) - checks status bar (~3k tokens)
3. Read(src/tui/components/PRPList.tsx) - checks PRP list (~4k tokens)
4. Bash(grep -r "EventBus" src/) - searches for event integration (~2k tokens)
5. Read(src/tui/hooks/useKeyboard.tsx) - keyboard hooks (~3k tokens)
6-20. More file reads and searches (~40k tokens)
Sub-agent context grows to 80,100 tokens with all the code it read.
Sub-agent performs analysis and generates SHORT SUMMARY:
```json
{
"status": "65% complete",
"implemented": [
"Basic TUI layout with StatusBar and PRPList components",
"Static rendering of PRP information",
"Component structure following requirements"
],
"partially_implemented": [
"Keyboard navigation (Tab works, S/X shortcuts missing)",
"Debug mode (screen exists but no live data)"
],
"missing_features": [
"Real-time EventBus integration",
"Melody.json file generation for music sync",
"Complete keyboard navigation (S/X actions)",
"Live agent status updates"
],
"implementation_quality": "excellent",
"critical_blockers": 2,
"estimated_completion": "3-4 days"
}
```
IMPORTANT: Only this ~500 token summary (not full 80k context) returns to main agent!
Token savings:
- Without sub-agent: Main context would be 50k + 15 + 15k + 80k = 145,015 tokens
- With sub-agent: Main context stays at 50k + 15 + 15k + 500 = 65,515 tokens
- Savings: 79,500 tokens (~55% reduction)
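As a quick sanity check on the arithmetic (the per-item token counts are the estimates from this walkthrough):

```typescript
const systemPrompt = 50_000;
const userMessage = 15;
const prpDocument = 15_000;
const subAgentExploration = 80_000; // stays inside the sub-agent's own context
const summaryReturned = 500;

const withoutSubAgent = systemPrompt + userMessage + prpDocument + subAgentExploration; // 145,015
const withSubAgent = systemPrompt + userMessage + prpDocument + summaryReturned;        // 65,515

console.log(withoutSubAgent - withSubAgent); // 79,500 tokens saved (~55%)
```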
Context Pollution Control
The Problem
```mermaid
graph TD
    A["Start: 10k tokens"] -->|5 messages| B["45k tokens"]
    B -->|Read 10 files| C["125k tokens"]
    C -->|Tool executions| D["180k tokens"]
    D -->|More work| E["195k tokens<br/>WARNING LLM degrading"]
```
Solutions
Frequent clearing:
Task: "Implement authentication"
→ Work on it (multiple prompts, file reads)
→ Task complete ✓
→ /clear immediately (critical!)
Next task: "Add logging"
→ Fresh context, no pollution from auth work
→ Clean slate, optimal performance
Use sub-agents for isolation:
> Use code-analyzer sub-agent to find performance issues in src/,
then use optimizer sub-agent to fix them.
# Sub-agents work in parallel, isolated contexts
# Main conversation stays clean
# Only summaries return to main context
Optimization Strategies
1. Prompt Compression
[FAIL] Wasteful (127 tokens):
I would like you to please analyze the implementation details of our
codebase specifically looking at the terminal user interface components
and comparing them against the requirements document that I have which
is located at the path prp/PRPs/PRP-003-tui-implementation.md and then
provide me with a comprehensive breakdown of what has been completed
versus what still needs to be done and also identify any potential
issues or blockers that might prevent us from finishing this work.
[PASS] Efficient (31 tokens):
Analyze TUI implementation against prp/PRPs/PRP-003-tui-implementation.md.
Return: completion status, missing features, blockers.
Format: markdown table.
Savings: 96 tokens (~76% reduction)
2. Avoid Repeated File Reads
[FAIL] Bad (16k tokens wasted):
Message 1: "Analyze auth.ts"
→ Agent reads auth.ts (8k tokens added to context)
Message 2: "Now check if it handles errors properly"
→ Agent reads auth.ts AGAIN (now 16k tokens total!)
Message 3: "What about the types?"
→ Agent reads auth.ts THIRD TIME (24k tokens!)
[PASS] Good (12k tokens total):
"Analyze auth.ts and auth-types.ts together. Check:
1. Logic correctness
2. Error handling completeness
3. Type safety
Return findings as numbered list."
→ Agent reads both files once (12k tokens)
→ All analysis done in single pass
→ 50% token savings
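If you build your own tooling on top of the API, the same idea can be enforced programmatically. A hypothetical sketch of a per-session read cache (the file paths are invented; Claude Code itself does not expose this hook):

```typescript
import { readFileSync } from "node:fs";

// Hypothetical per-session cache so the same file is only added to context once.
const fileCache = new Map<string, string>();

function readForContext(path: string): { content: string; addedToContext: boolean } {
  if (fileCache.has(path)) {
    // Already in context - reference it instead of re-sending ~8k tokens.
    return { content: fileCache.get(path)!, addedToContext: false };
  }
  const content = readFileSync(path, "utf8");
  fileCache.set(path, content);
  return { content, addedToContext: true };
}

// The first read pays the token cost; later questions about auth.ts reuse it.
readForContext("src/auth/auth.ts");       // addedToContext: true  (~8k tokens)
readForContext("src/auth/auth.ts");       // addedToContext: false (0 extra tokens)
readForContext("src/auth/auth-types.ts"); // addedToContext: true  (~4k tokens)
```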
3. Use Artifacts for Large Outputs
When asking for large code generation:
"Generate a complete REST API with CRUD operations for User, Post, Comment models.
Include TypeScript types, validation, error handling, tests."
Claude creates artifact (separate context)
→ Your conversation context stays small (~1k tokens for request)
→ Artifact can be 50k+ tokens without polluting main context
→ Can regenerate artifact without affecting conversation
→ Can have multiple artifacts in same conversation
Troubleshooting Common Issues
Issue: “Claude keeps forgetting what we discussed”
Symptoms:
- Agent asks for information you already provided
- Agent repeats previous suggestions
- Agent doesn’t remember earlier decisions
Cause: Context overflow - old messages compacted/removed
Solutions:
- Check context size with Ctrl+O
- If > 150k tokens, use /compact or /clear
- Use /clear more frequently (after each major task)
- Break large tasks into smaller ones (each fits in one context)
- Use sub-agents for isolated work (analysis, exploration)
- Put critical information in the CLAUDE.md file (always included)
- Re-state critical constraints at the start of each prompt
Prevention:
```markdown
# Add to CLAUDE.md in project root:

## Critical Context (Always Remember)
- Authentication uses JWT tokens (not sessions)
- Database is PostgreSQL 14
- Use TypeScript strict mode
- All dates in ISO 8601 format
- Test coverage minimum: 80%
```
Issue: “Agent is doing things I didn’t ask for”
Symptoms:
- Agent modifies files you didn’t mention
- Agent changes code style unexpectedly
- Agent adds features you didn’t request
Cause: Vague prompt, agent filling in gaps with assumptions
Solutions:
- Be MORE specific in prompts
- List exact files to modify
- Specify what NOT to change
- Define success criteria clearly
- Use imperative instructions:
  [FAIL] "Improve the authentication code"
  [PASS] "In src/auth/login.ts, add email validation before password check. Do not modify password hashing logic. Do not change error message formats."
- Add explicit constraints:
  ```
  Constraints:
  - Modify ONLY src/auth/login.ts
  - Do not change function signatures
  - Do not add new dependencies
  - Do not modify test files
  ```
- Use Plan mode to review before execution:
  Shift+Tab   # Enter plan mode
  # Review the plan
  # If the plan includes unexpected changes, refine the prompt
Issue: “Agent gives wrong/outdated information”
Symptoms:
- Agent suggests deprecated APIs
- Agent references non-existent libraries
- Agent provides incorrect factual information
Cause: Hallucination (statistical prediction, not facts)
Solutions:
- Ask the agent to search documentation:
  [FAIL] "How do I use the Anthropic API?"
  [PASS] "Search the official Anthropic documentation for API authentication. Provide the current recommended approach."
- Provide reference material in context:
  "Using the authentication approach described in docs/api/auth.md, implement JWT token validation."
- Use RAG (Retrieval Augmented Generation) tools:
  - MCP servers for documentation
  - Vector databases for codebase search
  - Official docs as context
- Verify critical information independently:
- Check official docs manually
- Test the generated code
- Review diffs before committing
- Request citations:
"Explain JWT token expiration handling. Cite specific sections from RFC 7519 if possible."
Issue: “Token costs are higher than expected”
Symptoms:
- Bills higher than estimated
- Individual requests expensive
- Context growing unexpectedly
Causes:
- Context bloat (repeated file reads)
- Using non-English text (~1.6x tokens for Russian, ~2.6x for Chinese)
- Large outputs
- Not using prompt caching
- Wrong model selection (using Opus when Sonnet would suffice)
Solutions:
- Clear context frequently:
  /clear   # After each major task
  Ctrl+O   # Monitor context size
- Write prompts in English:
  [FAIL] Russian: 1000 chars = ~400 tokens = $0.0012 input (Sonnet)
  [PASS] English: 1000 chars = ~250 tokens = $0.00075 input (Sonnet)
  Savings: 37.5% per request
- Request concise outputs:
  "Summarize in <200 words"
  "Return only the function signature, not the full implementation"
  "List the top 5 issues, not all 47"
- Use sub-agents to isolate large analyses:
  ```
  "Use a sub-agent to analyze all 50 files, return only:
  - Total lines of code
  - Average complexity score
  - Top 3 refactoring priorities"
  ```
  → Sub-agent reads 200k tokens
  → Main agent receives a 100-token summary
  → 99.95% context savings
- Choose the right model:
  Simple tasks, linting, formatting: Haiku ($0.80/1M in)
  Code generation: Sonnet ($3/1M in)
  Complex architecture: Opus ($15/1M in)
  Example: fix 100 linting errors:
  Haiku: $0.08
  Sonnet: $0.30 (4x more expensive!)
  Opus: $1.50 (19x more expensive!)
- Enable prompt caching:
  - Use stable system prompts
  - Group related tasks
  - Caching saves 90% on repeated content
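As a rough sketch of what that 90% figure means in dollars - assuming Sonnet input pricing, a 50k-token stable system prompt that is cached after the first request, and ignoring the small cache-write premium that real prompt caching charges:

```typescript
const INPUT_PER_MTOK = 3;        // Sonnet input price, USD per 1M tokens
const CACHE_READ_DISCOUNT = 0.9; // cached tokens cost ~10% of the normal input rate

const systemPromptTokens = 50_000;
const requests = 10;

const withoutCaching = (systemPromptTokens * requests / 1_000_000) * INPUT_PER_MTOK;

// The first request pays full price; the next 9 read the system prompt from cache.
const cachedPortionSavings =
  (systemPromptTokens * (requests - 1) / 1_000_000) * INPUT_PER_MTOK * CACHE_READ_DISCOUNT;

console.log(withoutCaching.toFixed(2));       // $1.50 spent on the system prompt alone
console.log(cachedPortionSavings.toFixed(2)); // ≈ $1.22 saved (matches the worked example below)
```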
Issue: “Agent is too slow”
Symptoms:
- Responses taking 30+ seconds
- Multiple retries before success
- Timeouts on large tasks
Causes:
- Context exceeding 150k tokens
- Complex tool chains (10+ tool calls)
- Sub-agent spawning overhead
- Large file reads (100k+ tokens)
- Model choice (Opus slower than Sonnet)
Solutions:
- Reduce context size:
  - /clear before starting
  - Remove large tool results
  - Use /compact instead of keeping full history
- Optimize tool usage:
  - Batch file reads: "Read files A, B, C" (not separate requests)
  - Use grep/search instead of reading full files
  - Request specific functions, not entire files
- Use a faster model for simple tasks:
  Haiku: 2-3 seconds average
  Sonnet: 4-6 seconds average
  Opus: 8-12 seconds average
- Parallelize with sub-agents:
  [FAIL] Serial (slow):
  Read file → Analyze → Fix → Test → Commit
  Total: 60 seconds
  [PASS] Parallel (fast):
  Sub-agent 1: Analyze frontend files
  Sub-agent 2: Analyze backend files
  Sub-agent 3: Analyze database files
  Main agent: Merge results
  Total: 25 seconds
- Use headless mode for batch processing:
  # Process 100 files
  [FAIL] Interactive: 100 × 10s = 1000s (~17 minutes)
  [PASS] Headless batch: 100 files in 300s (5 minutes)
  claude -p "Fix linting errors in src/**/*.ts" --dangerously-skip-permissions
Practical Examples
Context Hygiene Checklist
Before a long task:
☐ Check current context size (Ctrl+O)
☐ Clear if > 100k tokens
☐ Consider using a sub-agent if the task needs many file reads

During the task:
☐ Monitor context growth
☐ Use /compact if approaching 150k
☐ Avoid re-reading the same files

After the task:
☐ /clear immediately
☐ Don't carry context into the next task
☐ Start fresh
Cost Calculation Example
Simple Q&A Session (Claude Sonnet 4.5)
Scenario: 10 back-and-forth exchanges
User inputs: 10 messages × 50 tokens each = 500 tokens
Claude responses: 10 messages × 150 tokens each = 1,500 tokens
System prompt: 50,000 tokens (included in every request)
Total input tokens per message: 50,000 + cumulative conversation
- Message 1: 50,000 + 50 = 50,050 input tokens
- Message 2: 50,000 + 50 + 150 + 50 = 50,250 input tokens
- Message 3: 50,000 + 400 + 50 = 50,450 input tokens
- ...
- Message 10: 50,000 + 1,800 + 50 = 51,850 input tokens
Total input tokens: 50,050 + 50,250 + ... ≈ 509,500 tokens
Total output tokens: 1,500 tokens
Cost = (509,500 × $3/1M) + (1,500 × $15/1M)
= $1.53 + $0.02
≈ $1.55 for the 10-message conversation
With prompt caching (90% discount on system prompt):
Cached: 50,000 tokens × 0.9 = 45,000 tokens saved per message
Savings: 45,000 × 9 messages × $3/1M = $1.215
New cost: $1.55 - $1.215 ≈ $0.33
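The same calculation as a runnable sketch, using the assumed per-message token counts above. The key point it demonstrates is that the entire history (system prompt included) is re-sent as input on every turn:

```typescript
const SYSTEM = 50_000, USER_MSG = 50, ASSISTANT_MSG = 150; // token assumptions from above
const INPUT_PER_MTOK = 3, OUTPUT_PER_MTOK = 15;            // Sonnet pricing

let history = 0;    // conversation tokens accumulated so far
let inputTotal = 0;

for (let i = 1; i <= 10; i++) {
  inputTotal += SYSTEM + history + USER_MSG; // full history is re-sent every turn
  history += USER_MSG + ASSISTANT_MSG;
}

const outputTotal = 10 * ASSISTANT_MSG; // 1,500 tokens
const cost =
  (inputTotal / 1_000_000) * INPUT_PER_MTOK +
  (outputTotal / 1_000_000) * OUTPUT_PER_MTOK;

console.log(inputTotal);      // 509,500 input tokens
console.log(cost.toFixed(2)); // ≈ $1.55 before prompt caching
```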
Key Takeaways
- Context management is critical for quality and cost
- Clear context after each task using /clear
- Use sub-agents for exploration that requires reading many files
- Monitor context size with Ctrl+O in Claude Code
- Optimize prompts to reduce token usage
- English text is ~40-60% cheaper than Russian or Chinese
- Claude Code provides specific tools (/clear, /compact, sub-agents) for context management
- Troubleshoot systematically using the patterns above
🔗 Next Steps
After mastering context management:
- Continue to 03_claudecode - Learn about Claude Code CLI and practice with specs
- Practice context optimization techniques in your own projects
- Review the Contributing Guidelines if you want to help improve this content
This module combines practical context management techniques with Claude API-specific implementation details. For more comprehensive information, see the original guide.