Advanced Agent Architectures

📄 PDF Version: Download the complete analysis (PDF)

In-depth analysis of production-ready agent architectures with performance comparisons, based on research by Sergei Parfenov.

Learning Objectives

After completing this module, you will be able to:

  • Compare different agent architectures and their trade-offs
  • Implement advanced patterns such as Reflection, Tree of Thoughts, Plan-and-Execute, ReWOO, and LLMCompiler
  • Design multi-agent systems with coordination mechanisms
  • Optimize agent performance for production workloads
  • Understand performance characteristics and benchmarking results

Prerequisites

  • Completion of 04_baseagents
  • Experience with building simple agents
  • Understanding of distributed systems concepts

Course Module Content

Overview of Agent Architectures

| Architecture | Key Idea |
| --- | --- |
| Basic Reflection | Iterative generation and self-critique, without tools |
| Actor-Reflector | Iterative self-critique and improvement with tool calls |
| Tree of Thoughts | Tree-based solution search |
| Plan-and-Execute | Explicit step-by-step planning |
| ReWOO | Separation of reasoning from observations |
| LLMCompiler | DAG orchestration and parallelism |

Architecture Details

Basic Reflection

Based on: Reflexion: Language Agents with Verbal Reinforcement Learning (10/10/2023)

Description: Basic Reflection is an architecture in which the response is improved iteratively through repeated cycles of generation, self-analysis (reflection), and revision.

Workflow:

  1. User request enters the system
  2. Initial response is generated using the first prompt (e.g., “write an essay”)
  3. Response is passed to a second prompt for reflection (e.g., “evaluate the essay”)
  4. Reflection stage generates critique and improvement ideas
  5. Reflection and initial response are passed back to the original prompt for revised draft generation
  6. Process repeats N times, then result is returned to user

Key Feature: Response quality improves through repeated cycles of generation and self-analysis, without external tools or explicit planning.
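
A minimal Python sketch of this generate-reflect-revise loop. It assumes only a generic `llm(prompt) -> str` callable (any chat-completion client can be wrapped this way); the prompt wording and the `n_cycles` default are illustrative, not taken from the original study.

```python
from typing import Callable

def basic_reflection(llm: Callable[[str], str], request: str, n_cycles: int = 3) -> str:
    """Iteratively generate, critique, and revise a draft; no tools, no planning."""
    draft = llm(f"Write a response to the following request:\n{request}")
    for _ in range(n_cycles):
        # Reflection prompt: critique the current draft and suggest improvements.
        critique = llm(f"Evaluate this response and suggest concrete improvements:\n{draft}")
        # Revision prompt: fold the critique back into a new draft.
        draft = llm(
            "Revise the response using the critique below.\n"
            f"Request: {request}\nCurrent response: {draft}\nCritique: {critique}"
        )
    return draft
```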

Actor-Reflector

Source: Reflexion: Language Agents with Verbal Reinforcement Learning (10/10/2023)

Workflow:

  • User request enters the system
  • Initial response is generated along with self-critique and suggested tool queries
  • Suggested tool queries are executed (e.g., web search for additional information)
  • Original response, reflection, and additional context from tools are passed to revision prompt
  • Response is updated, new self-reflection is created, and new tool usage suggestions are made
  • Process repeats N times until final response is returned to user
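
A compact sketch of this loop, assuming a generic `llm(prompt) -> str` callable and a hypothetical `search(query) -> str` tool wrapper; the "query on the last line" convention used for extracting the suggested tool call is purely illustrative.

```python
from typing import Callable

def actor_reflector(
    llm: Callable[[str], str],
    search: Callable[[str], str],  # hypothetical tool, e.g. a web-search wrapper
    request: str,
    n_cycles: int = 2,
) -> str:
    """Reflexion-style loop: draft, self-critique plus a tool query, revise with evidence."""
    draft = llm(f"Answer the following request:\n{request}")
    for _ in range(n_cycles):
        # Ask for a critique plus one tool query, with the query on the last line.
        reflection = llm(
            "Critique the answer below and, on the last line, propose one web-search "
            f"query that would improve it.\nRequest: {request}\nAnswer: {draft}"
        )
        query = reflection.strip().splitlines()[-1]  # naive extraction of the suggested query
        evidence = search(query)
        # Revision prompt: original answer + reflection + tool evidence.
        draft = llm(
            "Revise the answer using the critique and the new evidence.\n"
            f"Request: {request}\nAnswer: {draft}\nCritique: {reflection}\nEvidence: {evidence}"
        )
    return draft
```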

Tree of Thoughts (ToT)

Source: Language Agent Tree Search Unifies Reasoning, Acting and Planning in Language Models (12/05/2023)

Uses LLMs as agents, value functions, and optimizers within a Monte Carlo Tree Search algorithm.

Workflow:

  1. User request enters the system
  2. Initial response is generated as the root node of the tree (either response or tool execution)
  3. Reflection prompt generates:
    • reflection on result
    • result evaluation
    • determination if solution is found
  4. Additional N candidates are generated considering previous output and reflection, tree expands
  5. Reflection prompt evaluates and scores each new candidate
  6. Best “trajectory” scores are updated
  7. Next N candidates are generated from best child node, cycle repeats
  8. Process continues until sufficient score is reached or maximum search depth is hit
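
The full algorithm is a Monte Carlo tree search with backpropagated value estimates; the sketch below is a deliberately simplified greedy variant that keeps only the best child at each depth. It assumes a generic `llm(prompt) -> str` callable, and the scoring prompt, naive score parsing, and `target_score` threshold are illustrative.

```python
from typing import Callable

def greedy_tree_search(
    llm: Callable[[str], str],
    request: str,
    n_children: int = 3,
    max_depth: int = 3,
    target_score: int = 8,
) -> str:
    """Expand the best node with n_children candidates per level, scored by the LLM itself."""
    def score(candidate: str) -> int:
        # LLM-as-evaluator: ask for a 1-10 rating and naively parse digits out of the reply.
        reply = llm(f"Rate this answer to '{request}' from 1 to 10. Reply with a number only:\n{candidate}")
        digits = "".join(ch for ch in reply if ch.isdigit())
        return int(digits) if digits else 0

    best = llm(f"Answer the following request:\n{request}")  # root node
    best_score = score(best)
    for _ in range(max_depth):
        if best_score >= target_score:  # sufficient score reached, stop searching
            break
        children = [
            llm(f"Propose an improved answer to '{request}', building on:\n{best}")
            for _ in range(n_children)
        ]
        child_scores = [score(c) for c in children]
        top = max(range(n_children), key=lambda i: child_scores[i])
        if child_scores[top] > best_score:
            best, best_score = children[top], child_scores[top]
    return best
```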

Plan-and-Execute

Source: Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (05/26/2023)

Workflow:

  1. User request enters the system
  2. Initial planning prompt forms step-by-step plan for request execution
  3. First plan step is passed to agent for generation or tool execution
  4. Original request, original plan, and previous step results are passed to re-planning prompt
  5. Re-planner either updates plan or returns result to user
  6. Updated step is passed back to agent
  7. Cycle repeats N times until re-planner considers response sufficient
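
A rough sketch of the plan / execute / re-plan cycle, again assuming a generic `llm(prompt) -> str` callable; the "FINAL:" stop convention for the re-planner is a toy assumption made for illustration.

```python
from typing import Callable

def plan_and_execute(llm: Callable[[str], str], request: str, max_rounds: int = 5) -> str:
    """Plan, execute the next step, then re-plan until the re-planner emits a final answer."""
    plan = llm(f"Write a numbered, step-by-step plan to answer:\n{request}")
    history = ""
    for _ in range(max_rounds):
        step = plan.strip().splitlines()[0]  # the next step is the first line of the current plan
        result = llm(f"Execute this step and report the result:\n{step}\nContext so far:\n{history}")
        history += f"\nStep: {step}\nResult: {result}"
        # Re-planner: either finish or return an updated plan of the remaining steps.
        decision = llm(
            "Given the request, the plan, and the results so far, either reply with "
            "'FINAL: <answer>' or with an updated plan of the remaining steps.\n"
            f"Request: {request}\nPlan:\n{plan}\nResults:{history}"
        )
        if decision.startswith("FINAL:"):
            return decision[len("FINAL:"):].strip()
        plan = decision
    # Fallback if the round budget runs out before the re-planner finishes.
    return llm(f"Summarize an answer to '{request}' from these results:{history}")
```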

ReWOO - Reasoning Without Observation

Source: ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models (05/23/2023)

Combines a multi-step planner with variable substitution for efficient tool use, reducing token consumption and execution time.

Execution Flow:

  1. User request enters the system
  2. Planner generates plan as list of tasks with special placeholder variables
  3. Plan is parsed, and each step is executed by LLM agent
  4. Each step result is substituted into variables of next step and passed back to agent
  5. After all steps complete, plan and “evidence” from tool execution are passed to Solver prompt
  6. Solver generates and returns final response to user
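
A condensed sketch of this flow. The `#E<n> = tool[input]` plan format mirrors the placeholder-variable idea described above, but the exact syntax, regex, and tool registry here are illustrative assumptions built on a generic `llm(prompt) -> str` callable.

```python
import re
from typing import Callable, Dict

def rewoo(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]], request: str) -> str:
    """Plan once with #E placeholders, execute each step with substitution, then solve once."""
    plan = llm(
        "Plan how to answer the request as lines of the form "
        "'#E<n> = <tool>[<input, which may reference earlier #E variables>]'. "
        f"Available tools: {sorted(tools)}.\nRequest: {request}"
    )
    evidence: Dict[str, str] = {}
    for line in plan.splitlines():
        match = re.match(r"(#E\d+)\s*=\s*(\w+)\[(.*)\]", line.strip())
        if not match:
            continue  # skip commentary lines the planner may emit
        var, tool, arg = match.groups()
        for name, value in evidence.items():  # substitute earlier evidence into this step
            arg = arg.replace(name, value)
        evidence[var] = tools.get(tool, llm)(arg)  # unknown tool names fall back to the LLM
    # Solver: a single final call that sees the plan and all collected evidence.
    return llm(f"Using the plan and evidence, answer the request.\n"
               f"Request: {request}\nPlan:\n{plan}\nEvidence: {evidence}")
```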

LLMCompiler

Source: An LLM Compiler for Parallel Function Calling (02/06/2024)

LLMCompiler borrows the directed acyclic graph (DAG) idea from compiler design to automatically generate an optimized orchestration of parallel, ReAct-style function calls.

Key Ideas:

  • Use of directed acyclic graph (DAG)
  • Automatic optimization of tool call order
  • ReAct-style execution

Workflow:

  1. User request enters the system
  2. Planner generates task list with placeholder variables for dependencies and “thought” strings for reasoning
  3. Task extraction module analyzes plan and determines inter-task dependencies
  4. Independent tasks are sent to executor in parallel
  5. Executor results return to task extraction module for dependency resolution
  6. Cycle repeats until plan is complete
  7. Full result is passed to Joiner prompt:
    • either final response is formed
    • or new “thought” is added and plan sent for re-planning
  8. If needed, plan continuation is created (not entirely new plan)
  9. Process repeats until Joiner determines sufficient information for user response
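
The sketch below shows only the DAG-execution part of this workflow: tasks whose dependencies are satisfied run in parallel, and a final "joiner" call merges their results. In the real architecture an LLM planner emits the task graph and can also request re-planning; here the graph is supplied by hand, and the `Task` and `run_dag` names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Task:
    """One node of the plan DAG: a prompt plus the ids of tasks it depends on."""
    id: str
    prompt: str
    deps: List[str] = field(default_factory=list)

def run_dag(llm: Callable[[str], str], tasks: List[Task], request: str) -> str:
    """Run independent tasks in parallel, resolve dependencies, then join the results."""
    results: Dict[str, str] = {}
    pending = {t.id: t for t in tasks}
    with ThreadPoolExecutor() as pool:
        while pending:
            # A task is ready once every dependency already has a result.
            ready = [t for t in pending.values() if all(d in results for d in t.deps)]
            if not ready:
                raise ValueError("dependency cycle or missing task in the plan")
            futures = {
                t.id: pool.submit(
                    llm, t.prompt + "".join(f"\n{d}: {results[d]}" for d in t.deps)
                )
                for t in ready
            }
            for task_id, future in futures.items():
                results[task_id] = future.result()
                del pending[task_id]
    # Joiner: merge all task results into a final response (a fuller version could
    # instead add a new "thought" and send the plan back for re-planning).
    return llm(f"Answer '{request}' using these task results:\n{results}")
```

For example, a plan for the benchmark query below might contain two independent search tasks and one summary task that depends on both; the two searches would then run concurrently.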

Comparative Testing

Query:

Current trends in digital marketing for technology companies

Model: GPT-4-Turbo

| Architecture | Execution Time | Tokens |
| --- | --- | --- |
| Basic Reflection | 118.99 s | 18,106 |
| Actor-Reflector | 69.04 s | 24,608 |
| Tree of Thoughts | 29.52 s | 8,493 |
| Plan-and-Execute | 24.72 s | 2,922 |
| ReWOO | 21.64 s | 5,828 |
| LLMCompiler | 11.29 s | 2,745 |

Performance Analysis

Basic Reflection

Forced rework causes the token count to grow with long input data, such as website text.

Tree of Thoughts

Surprisingly fast despite generating a large number of variants.

Can “run away” exponentially if the search depth is high.

Using an LLM as the evaluator can be very non-deterministic.

Plan-and-Execute

Adding an explicit planning stage makes the process slightly more efficient than a series of sequential revisions.

Tends to produce good-enough answers faster by following its own instructions.

ReWOO

The optimization of the plan-and-execute pattern is clearly visible here.

Uses more tokens than Plan-and-Execute, but less processing time.

LLMCompiler

Even deeper optimizations make this architecture very fast; however, it is not configured here for full report generation.


Notes and Limitations

  • This is not a scientific comparison and not a determination of the “best” agent architecture
  • The agents are not configured for the same task, prompt, or tools
  • Responses are hallucinated rather than grounded in tool usage or external research
  • Processing is slow and token usage is high

Sources

  • Reflexion: Language Agents with Verbal Reinforcement Learning (10/10/2023)
  • Language Agent Tree Search Unifies Reasoning, Acting and Planning in Language Models (12/05/2023)
  • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (05/26/2023)
  • ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models (05/23/2023)
  • An LLM Compiler for Parallel Function Calling (02/06/2024)

About the Author

Sergei Parfenov - LinkedIn Profile

Researcher and engineer in AI/machine learning. He holds a bachelor’s degree in Computer Information Systems (St. Petersburg State University of Economics and Finance) and completed CS231n: Deep Learning for Computer Vision at Stanford. He designs, builds, and deploys production-ready machine learning/AI solutions that turn cutting-edge research into measurable business value.


🔗 Next Steps

After studying these architectures:

  1. Practice implementing basic versions of each pattern
  2. Benchmark performance for your specific use cases
  3. Combine patterns for hybrid solutions
  4. Consider production requirements (scalability, reliability, cost)
  5. Review the Contributing Guidelines if you want to help expand this content

This module is based on research and analysis by Sergei Parfenov. For the complete research paper and detailed analysis, see the PDF version.