Advanced Agent Architectures
📄 PDF Version: Download the complete analysis (PDF)
In-depth analysis of production-ready agent architectures with performance comparisons, based on research by Sergei Parfenov.
Learning Objectives
After completing this module, you will be able to:
- Compare different agent architectures and their trade-offs
- Implement advanced patterns like ReAct, Self-Ask, and Tree of Thoughts
- Design multi-agent systems with coordination mechanisms
- Optimize agent performance for production workloads
- Understand performance characteristics and benchmarking results
Prerequisites
- Completion of 04_baseagents
- Experience with building simple agents
- Understanding of distributed systems concepts
Course Module Content
Overview of Agent Architectures
| Architecture | Key Idea |
|---|---|
| Basic Reflection | Iterative generation, critique, and revision without tools |
| Actor-Reflector | Iterative self-critique and improvement |
| Tree of Thoughts | Tree-based solution search |
| Plan-and-Execute | Explicit step-by-step planning |
| ReWOO | Separation of reasoning and observations |
| LLMCompiler | DAG orchestration and parallelism |
Architecture Details
Basic Reflection
Source: Reflexion: Language Agents with Verbal Reinforcement Learning (10/10/2023)
Description: Basic reflection generates a response iteratively: the model drafts a text, critiques it in a separate reflection step, and then revises the draft based on that critique.
Workflow:
- User request enters the system
- Initial response is generated using the first prompt (e.g., “write essay”)
- Response is passed to a second prompt for reflection (e.g., “evaluate essay”)
- Reflection stage generates critique and improvement ideas
- Reflection and initial response are passed back to the original prompt for revised draft generation
- Process repeats N times, then result is returned to user
Key Feature: Response quality improves through repeating cycles of generation and self-analysis without using external tools or explicit planning.
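The workflow above can be sketched as a short loop. This is a minimal illustration, not the author's implementation: the `LLM` type, the prompt wording, and the stub functions are assumptions made so the example runs without any API.

```python
from typing import Callable

# Assumption: an "LLM" is any function from prompt text to response text.
LLM = Callable[[str], str]

def basic_reflection(request: str, generate: LLM, reflect: LLM, rounds: int = 2) -> str:
    """Generate a draft, then critique and revise it `rounds` times."""
    draft = generate(request)
    for _ in range(rounds):
        critique = reflect(f"Evaluate this response to '{request}':\n{draft}")
        draft = generate(
            f"Request: {request}\nPrevious draft: {draft}\n"
            f"Critique: {critique}\nRevise the draft."
        )
    return draft

# Hypothetical stub models; replace them with real model calls.
calls: list[str] = []

def stub_generate(prompt: str) -> str:
    calls.append("generate")
    return f"draft v{calls.count('generate')}"

def stub_reflect(prompt: str) -> str:
    calls.append("reflect")
    return "needs more detail"

result = basic_reflection("write essay", stub_generate, stub_reflect, rounds=2)
print(result, calls)
```

Note that every revision prompt resends the previous draft and its critique, which is why token usage grows with the number of rounds and the length of the input.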
Actor-Reflector
Source: Reflexion: Language Agents with Verbal Reinforcement Learning (10/10/2023)
Workflow:
- User request enters the system
- Initial response is generated along with self-critique and suggested tool queries
- Suggested tool queries are executed (e.g., web search for additional information)
- Original response, reflection, and additional context from tools are passed to revision prompt
- Response is updated, new self-reflection is created, and new tool usage suggestions are made
- Process repeats N times until final response is returned to user
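A rough sketch of this loop follows. The `ActorOutput` structure, the `Actor` and `Tool` signatures, and the stubs are hypothetical names introduced for illustration; a real system would call a model and a search API instead.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ActorOutput:
    answer: str
    critique: str
    tool_queries: list[str] = field(default_factory=list)

# Assumed interfaces: the actor sees the request plus accumulated context.
Actor = Callable[[str, list[str]], ActorOutput]
Tool = Callable[[str], str]

def actor_reflector(request: str, actor: Actor, tool: Tool, rounds: int = 2) -> str:
    out = actor(request, [])  # initial answer, self-critique, suggested queries
    for _ in range(rounds):
        tool_context = [tool(q) for q in out.tool_queries]  # run suggested queries
        # Revise using the previous answer, its critique, and the tool results.
        out = actor(request, [out.answer, out.critique, *tool_context])
    return out.answer

# Hypothetical stubs so the sketch runs offline.
def stub_actor(request: str, context: list[str]) -> ActorOutput:
    facts = [c for c in context if c.startswith("result:")]
    return ActorOutput(
        answer=f"answer based on {len(facts)} tool result(s)",
        critique="could cite more sources",
        tool_queries=[f"{request} sources"],
    )

def stub_tool(query: str) -> str:
    return f"result:{query}"

final_answer = actor_reflector("marketing trends", stub_actor, stub_tool, rounds=2)
print(final_answer)
```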
Tree of Thoughts (ToT)
Source: Language Agent Tree Search Unifies Reasoning, Acting and Planning in Language Models (12/05/2023)
Following Language Agent Tree Search (LATS), this architecture uses LLMs as agents, value functions, and optimizers within a Monte Carlo Tree Search algorithm.
Workflow:
- User request enters the system
- Initial response is generated as the root node of the tree (either response or tool execution)
- Reflection prompt generates:
  - reflection on result
  - result evaluation
  - determination if solution is found
- N additional candidates are generated from the previous output and reflection, expanding the tree
- Reflection prompt evaluates and scores each new candidate
- Best “trajectory” scores are updated
- Next N candidates are generated from best child node, cycle repeats
- Process continues until sufficient score is reached or maximum search depth is hit
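The expand, score, and descend cycle above can be illustrated with a deliberately simplified search. This sketch is a greedy best-child descent, not full Monte Carlo Tree Search (it has no rollouts or score backpropagation), and the `expand` and `evaluate` callables are hypothetical stand-ins for the generation and reflection prompts.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    text: str
    score: float = 0.0
    depth: int = 0
    children: list["Node"] = field(default_factory=list)

def tree_search(
    request: str,
    expand: Callable[[str, int], list[str]],  # assumed: returns n candidate texts
    evaluate: Callable[[str], float],         # assumed: score in [0, 1]
    n_candidates: int = 3,
    max_depth: int = 3,
    target: float = 0.9,
) -> Node:
    root = Node(text=expand(request, 1)[0])   # initial response is the root
    root.score = evaluate(root.text)
    best = root
    # Stop when the score is sufficient or the depth limit is reached.
    while best.score < target and best.depth < max_depth:
        best.children = [
            Node(text=t, depth=best.depth + 1) for t in expand(best.text, n_candidates)
        ]
        for child in best.children:
            child.score = evaluate(child.text)
        best = max(best.children, key=lambda c: c.score)  # descend into best child
    return best

# Hypothetical deterministic stubs: candidates append a letter; "b" is rewarded.
def stub_expand(text: str, n: int) -> list[str]:
    return [text + suffix for suffix in "abc"[:n]]

def stub_evaluate(text: str) -> float:
    return min(1.0, 0.5 * text.count("b"))

best = tree_search("x", stub_expand, stub_evaluate)
print(best.text, best.score, best.depth)
```

Each level costs `n_candidates` generation calls plus as many evaluation calls along a single path; a search that keeps several branches open at once multiplies that cost per level.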
Plan-and-Execute
Source: Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (05/26/2023)
Workflow:
- User request enters the system
- Initial planning prompt forms step-by-step plan for request execution
- First plan step is passed to agent for generation or tool execution
- Original request, original plan, and previous step results are passed to re-planning prompt
- Re-planner either updates plan or returns result to user
- Updated step is passed back to agent
- Cycle repeats N times until re-planner considers response sufficient
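A minimal sketch of the plan, execute, and re-plan cycle is shown below. The three callables and the convention that the re-planner returns either an updated plan (a list) or a final answer (a string) are assumptions chosen for illustration.

```python
from typing import Callable, Union

Plan = list[str]

def plan_and_execute(
    request: str,
    planner: Callable[[str], Plan],
    executor: Callable[[str], str],
    # Assumed contract: returns an updated plan, or a string as the final answer.
    replanner: Callable[[str, Plan, list[tuple[str, str]]], Union[Plan, str]],
    max_steps: int = 5,
) -> str:
    plan = planner(request)
    results: list[tuple[str, str]] = []
    for _ in range(max_steps):
        if not plan:
            break
        step = plan[0]                       # execute only the first plan step
        results.append((step, executor(step)))
        decision = replanner(request, plan, results)
        if isinstance(decision, str):        # re-planner considers the answer sufficient
            return decision
        plan = decision                      # otherwise continue with the updated plan
    return "no answer within step budget"

# Hypothetical stubs so the sketch runs offline.
def stub_planner(request: str) -> Plan:
    return ["search trends", "summarize"]

def stub_executor(step: str) -> str:
    return f"done:{step}"

def stub_replanner(request: str, plan: Plan, results: list[tuple[str, str]]):
    remaining = plan[1:]
    if remaining:
        return remaining
    return "Final: " + "; ".join(output for _, output in results)

answer = plan_and_execute("marketing trends", stub_planner, stub_executor, stub_replanner)
print(answer)
```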
ReWOO - Reasoning Without Observation
Source: ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models (05/23/2023)
ReWOO combines a multi-step planner with variable substitution so that tools are used efficiently, which reduces token consumption and execution time.
Execution Flow:
- User request enters the system
- Planner generates plan as list of tasks with special placeholder variables
- Plan is parsed, and each step is executed by LLM agent
- Each step result is substituted into variables of next step and passed back to agent
- After all steps complete, plan and “evidence” from tool execution are passed to Solver prompt
- Solver generates and returns final response to user
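The variable substitution at the heart of this flow can be sketched as follows. The `#E1`-style placeholder naming, the tuple-based plan format, and the stub tools are assumptions for illustration; here the plan is written by hand, whereas in the architecture a planner prompt would produce it.

```python
import re
from typing import Callable

# Assumed plan format: (variable, tool name, tool input with optional placeholders).
Plan = list[tuple[str, str, str]]

def run_rewoo(
    request: str,
    plan: Plan,
    tools: dict[str, Callable[[str], str]],
    solver: Callable[[str, Plan, dict[str, str]], str],
) -> str:
    evidence: dict[str, str] = {}
    for var, tool_name, tool_input in plan:
        # Substitute earlier results into this step's input; unknown
        # placeholders are left untouched.
        resolved = re.sub(
            r"#E\d+", lambda m: evidence.get(m.group(0), m.group(0)), tool_input
        )
        evidence[var] = tools[tool_name](resolved)
    # The solver sees the whole plan and all evidence in a single call.
    return solver(request, plan, evidence)

# Hypothetical stub tools and solver.
tools = {
    "search": lambda q: f"results for {q}",
    "summarize": lambda text: text.upper(),
}
plan: Plan = [
    ("#E1", "search", "digital marketing trends"),
    ("#E2", "summarize", "Summarize: #E1"),
]
answer = run_rewoo(
    "trends?", plan, tools, lambda req, p, ev: f"{req} -> {ev['#E2']}"
)
print(answer)
```

Because the plan is fixed up front, the planner is called once rather than after every observation, which is where the efficiency gain described above comes from.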
LLMCompiler
Source: An LLM Compiler for Parallel Function Calling (02/06/2024)
LLMCompiler borrows the directed acyclic graph (DAG) from compiler design to automatically generate an optimized orchestration of parallel, ReAct-style function calls.
Key Ideas:
- Use of directed acyclic graph (DAG)
- Automatic optimization of tool call order
- ReAct-style execution
Workflow:
- User request enters the system
- Planner generates task list with placeholder variables for dependencies and “thought” strings for reasoning
- Task extraction module analyzes plan and determines inter-task dependencies
- Independent tasks are sent to executor in parallel
- Executor results return to task extraction module for dependency resolution
- Cycle repeats until plan is complete
- Full result is passed to Joiner prompt:
  - either final response is formed
  - or new “thought” is added and plan sent for re-planning
- If needed, plan continuation is created (not entirely new plan)
- Process repeats until Joiner determines sufficient information for user response
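The dependency-resolution and parallel-dispatch steps can be sketched with a small DAG scheduler. This is a simplification under stated assumptions: it runs tasks in waves using a thread pool, whereas a streaming scheduler would dispatch each task as soon as its own dependencies resolve. The `Task` structure and the stub tasks are hypothetical, and the Joiner and re-planning steps are omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable[[dict[str, str]], str]  # receives results of finished tasks
    deps: tuple[str, ...] = ()

def run_dag(tasks: list[Task]) -> tuple[dict[str, str], list[list[str]]]:
    done: dict[str, str] = {}
    waves: list[list[str]] = []           # records which tasks ran together
    pending = {t.name: t for t in tasks}
    with ThreadPoolExecutor() as pool:
        while pending:
            # A task is ready once all of its dependencies have results.
            ready = [t for t in pending.values() if all(d in done for d in t.deps)]
            if not ready:
                raise ValueError("cyclic or unsatisfied dependencies")
            snapshot = dict(done)
            outputs = list(pool.map(lambda t: t.run(snapshot), ready))
            for task, output in zip(ready, outputs):
                done[task.name] = output
                del pending[task.name]
            waves.append([t.name for t in ready])
    return done, waves

# Hypothetical tasks: two independent searches and one dependent merge.
tasks = [
    Task("a", lambda r: "seo trends"),
    Task("b", lambda r: "ads trends"),
    Task("c", lambda r: r["a"] + " + " + r["b"], deps=("a", "b")),
]
results, waves = run_dag(tasks)
print(results["c"], waves)
```

Tasks `a` and `b` have no dependencies, so they run in the same wave; `c` waits for both.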
Comparative Testing
Query:
Current trends in digital marketing for technology companies
Model: GPT-4-Turbo
| Architecture | Execution Time | Tokens |
|---|---|---|
| Basic Reflection | 118.99 s | 18,106 |
| Actor-Reflector | 69.04 s | 24,608 |
| Tree of Thoughts | 29.52 s | 8,493 |
| Plan-and-Execute | 24.72 s | 2,922 |
| ReWOO | 21.64 s | 5,828 |
| LLMCompiler | 11.29 s | 2,745 |
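To read the table as ratios rather than absolute values, the figures can be normalized against Basic Reflection. The snippet below only rearranges the numbers from the table; the choice of baseline is an assumption made for illustration.

```python
# (execution time in seconds, tokens) copied from the table above
results = {
    "Basic Reflection": (118.99, 18106),
    "Actor-Reflector": (69.04, 24608),
    "Tree of Thoughts": (29.52, 8493),
    "Plan-and-Execute": (24.72, 2922),
    "ReWOO": (21.64, 5828),
    "LLMCompiler": (11.29, 2745),
}

baseline_time, baseline_tokens = results["Basic Reflection"]
# Speedup > 1 means faster than the baseline.
speedup = {name: baseline_time / t for name, (t, _) in results.items()}
# Token ratio > 1 means more tokens than the baseline.
token_ratio = {name: tok / baseline_tokens for name, (_, tok) in results.items()}

fastest = min(results, key=lambda name: results[name][0])
fewest_tokens = min(results, key=lambda name: results[name][1])

for name in results:
    print(f"{name:18s} speedup {speedup[name]:5.2f}x, "
          f"tokens {token_ratio[name]:.0%} of baseline")
```

On this single query, LLMCompiler is both the fastest run and the one with the fewest tokens, while Actor-Reflector is the only architecture that used more tokens than the baseline.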
Performance Analysis
Basic Reflection
Forced revision rounds drive up the token count when the input is long, such as website text.
Tree of Thoughts (Language Agent Tree Search)
Surprisingly fast despite generating a large number of candidates.
The search can “run away” exponentially if the search depth is high.
Using an LLM as the evaluator can make results highly non-deterministic.
Plan-And-Execute
Adding an explicit planning stage makes the process slightly more efficient than sequential revisions.
It tends to reach good-enough answers faster by following its own plan.
Reasoning without Observation (ReWOO)
The optimization over plan-and-execute agents is evident here.
ReWOO used more tokens but less processing time (5,828 tokens in 21.64 s versus 2,922 tokens in 24.72 s).
LLMCompiler
Even deeper optimizations make the architecture very fast; however, it is not configured for full report generation.
Notes and Limitations
- This is not a scientific comparison and not a determination of the “best” agent architecture
- These agents are not configured for the same task, prompt, or tools
- Responses are hallucinated, not based on tool usage or external research
- Processing is slow, token usage is high
- Forced revisions increase the token count with long input data (e.g., website text)
- Tree search can grow exponentially at large depths
- Using an LLM as the evaluator can be non-deterministic
- A planning stage makes the process more efficient than a series of revisions
- ReWOO improves on plan-and-execute execution time despite using more tokens
- Additional optimizations make LLMCompiler very fast, but it is not oriented toward full report generation
Sources
- Reflexion: Language Agents with Verbal Reinforcement Learning
- Language Agent Tree Search Unifies Reasoning, Acting and Planning in Language Models
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
- An LLM Compiler for Parallel Function Calling
About the Author
Sergei Parfenov - LinkedIn Profile
Researcher and engineer in AI/machine learning. Holds a bachelor’s degree in Computer Information Systems (St. Petersburg State University of Economics and Finance) and completed CS231n: Deep Learning for Computer Vision at Stanford. Develops and implements production-ready machine learning/AI solutions that transform cutting-edge research into measurable business value.
🔗 Next Steps
After studying these architectures:
- Practice implementing basic versions of each pattern
- Benchmark performance for your specific use cases
- Combine patterns for hybrid solutions
- Consider production requirements (scalability, reliability, cost)
- Review the Contributing Guidelines if you want to help expand this content
This module is based on research and analysis by Sergei Parfenov. For the complete research paper and detailed analysis, see the PDF version.