Building AI Agents from Scratch: A Complete Guide (2026)
Introduction: The Age of AI Agents is Here
In 2024, we witnessed a fundamental shift in how we interact with AI. We moved from asking AI questions to letting AI take actions. OpenAI’s o1 model brought advanced reasoning, Claude 3.5 Sonnet showcased unprecedented tool use, and frameworks like LangChain and LangGraph matured into production-ready platforms.
But here’s what changed everything: AI agents can now autonomously plan, execute, and iterate on complex tasks—from debugging your entire codebase to orchestrating multi-step workflows across different tools and APIs.
If you’ve been wondering how to build these autonomous systems yourself, you’re in the right place. In this comprehensive guide, we’ll go from zero to a fully functional AI agent that can reason, use tools, maintain memory, and handle real-world tasks.
What You’ll Learn
Conceptual Foundation: What AI agents really are (beyond the hype)
Architecture Patterns: ReAct, Plan-and-Execute, Reflection, and Multi-Agent systems
Hands-On Implementation: Build a working agent from scratch with Python
Production Considerations: Testing, monitoring, security, and scaling
Real-World Case Studies: How companies are deploying agents today
Prerequisites
Python 3.9+ and basic programming knowledge
LLM API access (OpenAI, Anthropic, or Google)
Basic understanding of REST APIs and async programming
Optional: Docker knowledge for deployment
Let’s dive in.
Part 1: Understanding AI Agents
What is an AI Agent?
An AI agent is an autonomous system that uses a Large Language Model (LLM) as its “brain” to perceive its environment, make decisions, take actions, and learn from feedback—all without continuous human intervention.
Think of it this way:
Traditional Chatbot:
User → Prompt → LLM → Response → Done
AI Agent:
User → Goal → Agent (loops):
1. Analyze current state
2. Plan next action
3. Execute with tools
4. Observe results
5. Decide: continue or finish
→ Final outcome
The Four Core Components
🧠 Brain (LLM)
The reasoning engine (GPT-4, Claude, Gemini)
Interprets goals, plans actions, processes results
Handles decision-making and natural language
🛠️ Tools (Functions)
External capabilities the agent can invoke
Examples: web search, code execution, API calls, database queries
Defined with clear descriptions for the LLM
💾 Memory (Context)
Short-term: Conversation history, immediate context
Long-term: Vector stores, knowledge bases, past experiences
Enables learning and personalization
🔄 Control Loop (Orchestration)
The execution framework
Manages reasoning → action → observation cycles
Handles errors, retries, and termination conditions
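To see how these four components fit together before reaching for any framework, here is a minimal sketch of the control loop. Everything in it (TOOLS, llm_decide) is a hypothetical stub standing in for real tools and a real LLM call; only the loop structure is the point.
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; a real agent would register web search,
# code execution, API calls, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "echo": lambda text: f"observed: {text}",
}

def llm_decide(goal: str, history: List[Tuple[dict, str]]) -> dict:
    # Stub "brain": a real agent would prompt an LLM with the goal and
    # history, then parse its decision.
    if history:
        return {"type": "finish", "answer": history[-1][1]}
    return {"type": "tool", "tool": "echo", "input": goal}

def run_agent_loop(goal: str, max_steps: int = 10) -> str:
    history: List[Tuple[dict, str]] = []          # short-term memory
    for _ in range(max_steps):
        decision = llm_decide(goal, history)      # 1-2. analyze state, plan action
        if decision["type"] == "finish":          # 5. decide: continue or finish
            return decision["answer"]
        tool = TOOLS[decision["tool"]]            # 3. execute with a tool
        observation = tool(decision["input"])
        history.append((decision, observation))   # 4. observe the result
    return "Stopped: max_steps reached"           # hard termination condition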
AI Agent vs Traditional Chatbot
Chatbot: one prompt in, one response out; no tools, no goal tracking, no state beyond the conversation.
Agent: pursues a goal across many steps; it plans, calls tools, observes results, keeps memory, and decides for itself when the task is done.
Real-World Use Cases
Here’s where AI agents are making an impact right now:
🔧 Software Development
Autonomous code generation and debugging
Test case creation and execution
Code review and refactoring suggestions
Example: Devin AI, GitHub Copilot Workspace
📊 Data Analysis
Automated data exploration and visualization
Generating insights from complex datasets
Building and executing SQL queries
Example: Julius AI, Continual
🎧 Customer Support
Multi-step problem resolution
Ticket routing and escalation
Knowledge base search and updates
Example: Sierra, Ada
⚙️ DevOps Automation
Infrastructure monitoring and remediation
Deployment orchestration
Log analysis and root cause detection
Example: Kubiya, Relay.app
🔬 Research & Synthesis
Literature review and summarization
Multi-source information gathering
Report generation with citations
Example: Elicit, Consensus
Part 2: Agent Architecture Patterns
Let’s explore the four foundational patterns that power modern AI agents.
Pattern 1: ReAct (Reasoning + Acting)
The most popular agent pattern, combining reasoning traces with action execution.
How it works:
Thought: Agent reasons about what to do
Action: Selects and executes a tool
Observation: Receives tool output
Repeat: Until task is complete
When to use:
Interactive tasks requiring iterative refinement
When you need visibility into agent’s reasoning
Tasks with multiple possible approaches
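At its core, ReAct is just a prompt format plus a parse-act-append loop. Part 3 builds a full version with LangChain; the stripped-down sketch below only illustrates the mechanics, with llm and TOOLS as hypothetical stubs.
import re

# Hypothetical stand-ins: a real implementation would prompt an actual
# LLM and register real tools.
TOOLS = {"search": lambda q: f"[search results for {q}]"}

def llm(prompt: str) -> str:
    # Stub LLM; here it always finishes so the sketch runs end to end.
    return "Thought: I have the answer\nFinal Answer: stub answer"

def react(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(prompt)                                  # Thought + Action or Answer
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()
        action = re.search(r"Action: (\w+)\nAction Input: (.+)", reply)
        if action is None:
            prompt += reply + "\n"                           # keep reasoning, try again
            continue
        tool_name, tool_input = action.groups()
        observation = TOOLS[tool_name](tool_input)           # execute the tool
        prompt += f"{reply}\nObservation: {observation}\n"   # feed result back in
    return "Stopped: max_steps reached"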
Example flow:
User: "What's the weather in the location where the Eiffel Tower is?"
Thought: I need to first find where the Eiffel Tower is located
Action: search("Eiffel Tower location")
Observation: The Eiffel Tower is in Paris, France
Thought: Now I can get the weather for Paris
Action: get_weather("Paris, France")
Observation: 15°C, Partly Cloudy
Thought: I have the answer
Final Answer: It's 15°C and partly cloudy in Paris, where the Eiffel Tower is located.Pattern 2: Plan-and-Execute
Strategic decomposition of complex tasks into sub-tasks before execution.
How it works:
Plan: Break down goal into sequential steps
Execute: Run each step with tools
Replanning: Adjust plan based on results
Completion: Aggregate outcomes
When to use:
Complex multi-step workflows
When task dependencies are clear
Need for parallelization
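Before the example, here is a minimal framework-free sketch of the pattern; plan() and execute_step() are hypothetical stubs standing in for LLM calls, shown only to make the control flow concrete.
from typing import List

def plan(goal: str) -> List[str]:
    # A real planner would ask the LLM to decompose the goal into steps.
    return [f"research: {goal}", f"summarize: {goal}"]

def execute_step(step: str) -> str:
    # A real executor would pick a tool and run it for this step.
    return f"completed {step}"

def plan_and_execute(goal: str) -> str:
    steps = plan(goal)                          # 1. Plan
    results = [execute_step(s) for s in steps]  # 2. Execute each step
    # 3. Replanning would go here if a result invalidates later steps
    return "\n".join(results)                   # 4. Aggregate outcomes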
Example flow:
User: "Analyze our competitor's latest product launch"
Plan:
1. Search for competitor's recent announcements
2. Gather product details and features
3. Analyze pricing and positioning
4. Compare with our offerings
5. Generate SWOT analysis
Execute each step with appropriate tools →
Synthesize final report
Pattern 3: Reflection
Self-critique and improvement through iterative refinement.
How it works:
Generate: Produce initial output
Reflect: Critique own work
Improve: Generate refined version
Repeat: Until quality threshold met
When to use:
Creative tasks (writing, design)
Quality-critical outputs
When “good enough” isn’t acceptable
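A minimal sketch of the loop, with generate, critique, and score as hypothetical placeholders for LLM calls:
def generate(task: str, feedback: str = "") -> str:
    # Stub generator; a real agent would call an LLM here.
    return f"[draft for {task}; feedback applied: {feedback or 'none'}]"

def critique(draft: str) -> str:
    # Stub critic; a real agent would ask the LLM to find weaknesses.
    return "tighten the opening and name a concrete benefit"

def score(draft: str) -> int:
    # Stub scorer; a real agent would grade the draft against a rubric.
    return 9

def reflect_until_good(task: str, threshold: int = 8, max_rounds: int = 3) -> str:
    draft = generate(task)                 # 1. Generate initial output
    for _ in range(max_rounds):
        if score(draft) >= threshold:      # 4. Stop once quality threshold met
            break
        feedback = critique(draft)         # 2. Reflect: critique own work
        draft = generate(task, feedback)   # 3. Improve with the feedback
    return draft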
Example flow:
Task: Write a product description
Draft 1: [Generated text]
Reflection: Too technical, lacks emotional appeal
Draft 2: [Improved version]
Reflection: Better, but missing key benefit
Draft 3: [Final polished version]
Quality Score: 9/10 → Accept
Pattern 4: Multi-Agent
Specialized agents collaborating on complex problems.
How it works:
Delegation: Manager agent assigns sub-tasks
Specialization: Each agent has specific expertise
Communication: Agents share findings
Synthesis: Coordinator combines results
When to use:
Highly complex problems
Need for domain expertise
Parallel execution benefits
Different reasoning approaches needed
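Here is a minimal sketch of manager-style delegation; each worker below is a stub standing in for a specialized, LLM-backed agent.
from typing import Callable, Dict

# Each "worker" is a hypothetical stand-in for a specialized agent.
WORKERS: Dict[str, Callable[[str], str]] = {
    "research": lambda task: f"[research notes for: {task}]",
    "code": lambda task: f"[implementation for: {task}]",
    "qa": lambda task: f"[test results for: {task}]",
}

def manager(goal: str) -> str:
    # 1. Delegation: the manager assigns sub-tasks to specialists
    subtasks = {name: goal for name in WORKERS}
    # 2-3. Specialization + communication: collect each worker's findings
    findings = {name: WORKERS[name](task) for name, task in subtasks.items()}
    # 4. Synthesis: the coordinator combines results into one answer
    return "\n".join(f"{name}: {out}" for name, out in findings.items())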
Example structure:
Manager Agent
├── Research Agent (web search, synthesis)
├── Code Agent (implementation, testing)
├── QA Agent (validation, edge cases)
└── Documentation Agent (writing, examples)
Part 3: Building Your First Agent (Hands-On)
Now let’s build a practical AI agent from scratch. We’ll create a Research Assistant that can search the web, analyze content, and generate reports.
Step 1: Environment Setup
First, install required packages:
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install langchain langchain-openai langchain-community
pip install duckduckgo-search python-dotenv requests beautifulsoup4
pip install faiss-cpu # For vector storage
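Optionally, capture the same dependencies in a requirements.txt (the Dockerfile in Part 5 expects one). This is an unpinned sketch; fastapi and uvicorn are only needed for the API server later, and exact version pins are up to you:
# requirements.txt (sketch; pin versions for production)
langchain
langchain-openai
langchain-community
duckduckgo-search
python-dotenv
faiss-cpu
requests
beautifulsoup4
fastapi      # Part 5 API server
uvicorn      # Part 5 API server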
Create .env file:
OPENAI_API_KEY=your_api_key_here
Step 2: Create Custom Tools
Tools are functions your agent can call. Let’s create three essential tools:
# tools.py
from langchain.tools import tool
from duckduckgo_search import DDGS
import requests
from bs4 import BeautifulSoup
from typing import Optional
@tool
def search_web(query: str) -> str:
"""
Search the web for information using DuckDuckGo.
Args:
query: The search query string
Returns:
Formatted search results with titles and snippets
"""
try:
with DDGS() as ddgs:
results = list(ddgs.text(query, max_results=5))
if not results:
return "No results found."
formatted_results = []
for i, result in enumerate(results, 1):
formatted_results.append(
f"{i}. {result['title']}\n"
f" {result['body']}\n"
f" URL: {result['href']}\n"
)
return "\n".join(formatted_results)
except Exception as e:
return f"Search error: {str(e)}"
@tool
def fetch_webpage_content(url: str) -> str:
"""
Fetch and extract main text content from a webpage.
Args:
url: The webpage URL to fetch
Returns:
Extracted text content (first 2000 chars)
"""
try:
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Remove script and style elements
for script in soup(["script", "style"]):
script.decompose()
# Get text
text = soup.get_text()
        # Collapse all runs of whitespace into single spaces
        text = " ".join(text.split())
# Return first 2000 characters
return text[:2000] + "..." if len(text) > 2000 else text
except Exception as e:
return f"Error fetching {url}: {str(e)}"
@tool
def calculate(expression: str) -> str:
"""
Safely evaluate mathematical expressions.
Args:
expression: Math expression (e.g., "2 + 2 * 3")
Returns:
Calculation result
"""
try:
# Safe evaluation - only allow basic math
allowed_chars = set("0123456789+-*/.()")
if not all(c in allowed_chars or c.isspace() for c in expression):
return "Invalid expression: only basic math operations allowed"
result = eval(expression, {"__builtins__": {}}, {})
return f"Result: {result}"
except Exception as e:
return f"Calculation error: {str(e)}"Step 3: Build the Agent
Now let’s create the agent with ReAct pattern:
# agent.py
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from tools import search_web, fetch_webpage_content, calculate
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize LLM
llm = ChatOpenAI(
model="gpt-4",
temperature=0, # Deterministic for consistency
api_key=os.getenv("OPENAI_API_KEY")
)
# Define tools
tools = [search_web, fetch_webpage_content, calculate]
# Create custom prompt template
template = """You are a helpful research assistant with access to various tools.
Answer the user's question as best you can. You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Important guidelines:
- Be thorough and use tools when needed
- Cite sources when providing information
- If you don't know something, search for it
- Break complex questions into steps
Begin!
Previous conversation:
{chat_history}
Question: {input}
Thought: {agent_scratchpad}
"""
prompt = PromptTemplate.from_template(template)
# Create the agent
agent = create_react_agent(
llm=llm,
tools=tools,
prompt=prompt
)
# Create memory (as plain text so it renders into the {chat_history} slot above)
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=False
)
# Create agent executor
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
memory=memory,
verbose=True, # Show reasoning steps
max_iterations=10, # Prevent infinite loops
handle_parsing_errors=True, # Graceful error handling
return_intermediate_steps=True
)
def run_agent(question: str):
"""
Run the agent with a question and return results.
"""
try:
result = agent_executor.invoke({"input": question})
return {
"answer": result["output"],
"steps": result.get("intermediate_steps", [])
}
except Exception as e:
return {
"answer": f"Error: {str(e)}",
"steps": []
}
# Example usage
if __name__ == "__main__":
# Simple test
question = "What are the latest developments in AI agents? Provide a summary with sources."
print(f"Question: {question}\n")
print("=" * 80)
result = run_agent(question)
print("\nFinal Answer:")
print(result["answer"])
print("\n" + "=" * 80)
print(f"Completed in {len(result['steps'])} steps")
Step 4: Add Memory for Context
Let’s enhance our agent with persistent memory:
# memory_agent.py
from langchain.memory import ConversationBufferWindowMemory
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from typing import List, Dict
class MemoryEnhancedAgent:
"""Agent with both short-term and long-term memory."""
def __init__(self, agent_executor):
self.agent_executor = agent_executor
        # Short-term memory: last 5 interactions (kept here so you can wire
        # it into the executor; run() below uses only long-term recall)
self.short_term = ConversationBufferWindowMemory(
k=5,
memory_key="chat_history",
return_messages=True
)
# Long-term memory: vector store for facts
self.embeddings = OpenAIEmbeddings()
self.long_term = None # Initialize on first use
self.knowledge_base = []
    def remember(self, text: str, metadata: Dict = None):
        """Store information in long-term memory."""
        metadata = metadata or {}
        self.knowledge_base.append({
            "text": text,
            "metadata": metadata
        })
        # Index only the new entry; re-adding every stored text here
        # would duplicate earlier documents in the vector store
        if self.long_term is None:
            self.long_term = FAISS.from_texts(
                [text],
                self.embeddings,
                metadatas=[metadata]
            )
        else:
            self.long_term.add_texts([text], metadatas=[metadata])
def recall(self, query: str, k: int = 3) -> List[str]:
"""Retrieve relevant information from long-term memory."""
if self.long_term is None:
return []
docs = self.long_term.similarity_search(query, k=k)
return [doc.page_content for doc in docs]
def run(self, question: str, use_memory: bool = True):
"""Run agent with memory context."""
# Get relevant context from long-term memory
context = ""
if use_memory and self.long_term:
relevant_info = self.recall(question)
if relevant_info:
context = f"\nRelevant context from past interactions:\n" + "\n".join(relevant_info)
# Add context to question
enhanced_question = question + context
# Run agent
result = self.agent_executor.invoke({"input": enhanced_question})
# Store interaction in memory
self.remember(
f"Q: {question}\nA: {result['output']}",
metadata={"type": "qa_pair"}
)
return result
# Example usage
if __name__ == "__main__":
from agent import agent_executor
memory_agent = MemoryEnhancedAgent(agent_executor)
# First interaction
result1 = memory_agent.run("What is LangChain?")
print(result1["output"])
# Second interaction - will remember previous context
result2 = memory_agent.run("How do I use it for building agents?")
print(result2["output"])
Step 5: Error Handling and Retries
Production agents need robust error handling:
# robust_agent.py
from typing import Any, Dict
import time
import logging
from functools import wraps
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def retry_with_backoff(max_retries: int = 3, backoff_factor: float = 2.0):
"""Decorator for retrying functions with exponential backoff."""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
retries = 0
while retries < max_retries:
try:
return func(*args, **kwargs)
except Exception as e:
retries += 1
if retries >= max_retries:
logger.error(f"Max retries reached for {func.__name__}: {e}")
raise
wait_time = backoff_factor ** retries
logger.warning(
f"Attempt {retries} failed for {func.__name__}: {e}. "
f"Retrying in {wait_time}s..."
)
time.sleep(wait_time)
return wrapper
return decorator
class RobustAgent:
"""Agent wrapper with error handling and monitoring."""
def __init__(self, agent_executor):
self.agent_executor = agent_executor
self.metrics = {
"total_runs": 0,
"successful_runs": 0,
"failed_runs": 0,
"total_tokens": 0,
"avg_response_time": 0
}
    @retry_with_backoff(max_retries=3)
    def _invoke(self, question: str, **kwargs) -> Dict[str, Any]:
        """Raw executor call, isolated so transient errors are actually
        retried (run() below swallows exceptions, which would otherwise
        defeat the decorator)."""
        return self.agent_executor.invoke({"input": question}, **kwargs)

    def run(self, question: str, **kwargs) -> Dict[str, Any]:
        """Run agent with monitoring and error handling."""
        start_time = time.time()
        self.metrics["total_runs"] += 1
        try:
            # Run agent (retried with backoff on transient failures)
            result = self._invoke(question, **kwargs)
# Update metrics
self.metrics["successful_runs"] += 1
response_time = time.time() - start_time
# Update average response time
n = self.metrics["successful_runs"]
current_avg = self.metrics["avg_response_time"]
self.metrics["avg_response_time"] = (
(current_avg * (n - 1) + response_time) / n
)
logger.info(f"Agent completed in {response_time:.2f}s")
return {
"success": True,
"output": result["output"],
"steps": result.get("intermediate_steps", []),
"metrics": {
"response_time": response_time,
"steps_count": len(result.get("intermediate_steps", []))
}
}
except Exception as e:
self.metrics["failed_runs"] += 1
logger.error(f"Agent failed: {str(e)}")
return {
"success": False,
"error": str(e),
"output": None,
"metrics": {
"response_time": time.time() - start_time
}
}
def get_metrics(self) -> Dict[str, Any]:
"""Return agent performance metrics."""
success_rate = (
self.metrics["successful_runs"] / self.metrics["total_runs"] * 100
if self.metrics["total_runs"] > 0 else 0
)
return {
**self.metrics,
"success_rate": f"{success_rate:.2f}%"
}
# Example usage
if __name__ == "__main__":
from agent import agent_executor
robust_agent = RobustAgent(agent_executor)
# Test with various questions
questions = [
"What is the capital of France?",
"Calculate 15 * 23 + 100",
"Search for latest AI developments and summarize"
]
for q in questions:
print(f"\nQuestion: {q}")
result = robust_agent.run(q)
if result["success"]:
print(f"Answer: {result['output']}")
print(f"Time: {result['metrics']['response_time']:.2f}s")
else:
print(f"Error: {result['error']}")
# Print overall metrics
print("\n" + "=" * 80)
print("Agent Performance Metrics:")
for key, value in robust_agent.get_metrics().items():
print(f" {key}: {value}")Part 4: Advanced Concepts
Token Usage Optimization
LLM calls are expensive. Here’s how to optimize:
# token_optimizer.py
from langchain.callbacks import get_openai_callback
class TokenOptimizer:
"""Monitor and optimize token usage."""
    def __init__(self, agent_executor):
        self.agent_executor = agent_executor
        self.total_tokens = 0
        self.total_cost = 0.0
        self.runs = 0  # used by get_stats() for avg_tokens_per_run
def run_with_tracking(self, question: str):
"""Run agent and track token usage."""
with get_openai_callback() as cb:
result = self.agent_executor.invoke({"input": question})
        self.total_tokens += cb.total_tokens
        self.total_cost += cb.total_cost
        self.runs += 1
return {
"output": result["output"],
"tokens_used": cb.total_tokens,
"cost": cb.total_cost,
"prompt_tokens": cb.prompt_tokens,
"completion_tokens": cb.completion_tokens
}
def get_stats(self):
"""Get cumulative usage statistics."""
return {
"total_tokens": self.total_tokens,
"total_cost": f"${self.total_cost:.4f}",
"avg_tokens_per_run": self.total_tokens / max(1, self.runs)
}
# Tips for reducing token usage:
# 1. Use shorter prompts
# 2. Implement caching for repeated queries
# 3. Use smaller models for simple tasks
# 4. Trim conversation history aggressively
# 5. Compress context before sending to LLM
Caching Strategies
Avoid redundant API calls:
# caching.py
import hashlib
from typing import Any
class AgentCache:
"""Cache agent responses for common queries."""
def __init__(self, max_size: int = 100):
self.cache = {}
self.max_size = max_size
self.hits = 0
self.misses = 0
def _hash_query(self, question: str) -> str:
"""Create hash of question for cache key."""
return hashlib.md5(question.encode()).hexdigest()
def get(self, question: str) -> Any:
"""Get cached response if exists."""
key = self._hash_query(question)
if key in self.cache:
self.hits += 1
return self.cache[key]
self.misses += 1
return None
def set(self, question: str, response: Any):
"""Cache a response."""
if len(self.cache) >= self.max_size:
# Remove oldest entry (FIFO)
self.cache.pop(next(iter(self.cache)))
key = self._hash_query(question)
self.cache[key] = response
def get_hit_rate(self) -> float:
"""Calculate cache hit rate."""
total = self.hits + self.misses
return (self.hits / total * 100) if total > 0 else 0
# Usage example
cache = AgentCache()
def cached_agent_run(question: str, agent_executor):
# Check cache first
cached = cache.get(question)
if cached:
print("✓ Cache hit!")
return cached
# Run agent
result = agent_executor.invoke({"input": question})
# Store in cache
cache.set(question, result)
return result
Part 5: Production Deployment
Containerization with Docker
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Environment variables (don't bake secrets into the image;
# pass OPENAI_API_KEY at runtime with docker run -e instead)
ENV PYTHONUNBUFFERED=1
# Run application
CMD ["python", "api_server.py"]
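To build and run the container (assuming the image name ai-agent is your choice, and the server listens on 8000 as in api_server.py below), pass the key at runtime rather than baking it into the image:
# Build the image
docker build -t ai-agent .
# Provide the API key at runtime
docker run -p 8000:8000 -e OPENAI_API_KEY=your_api_key_here ai-agent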
# api_server.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from agent import agent_executor
from robust_agent import RobustAgent
import uvicorn
app = FastAPI(title="AI Agent API")
robust_agent = RobustAgent(agent_executor)
class Query(BaseModel):
question: str
use_memory: bool = True
class Response(BaseModel):
success: bool
output: str
steps_count: int
response_time: float
@app.post("/query", response_model=Response)
async def query_agent(query: Query):
"""Run agent with user question."""
try:
result = robust_agent.run(query.question)
if not result["success"]:
raise HTTPException(status_code=500, detail=result["error"])
return Response(
success=True,
output=result["output"],
steps_count=result["metrics"]["steps_count"],
response_time=result["metrics"]["response_time"]
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.get("/health")
async def health_check():
"""Health check endpoint."""
return {"status": "healthy", "metrics": robust_agent.get_metrics()}
@app.get("/metrics")
async def get_metrics():
"""Get agent performance metrics."""
return robust_agent.get_metrics()
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
Monitoring with LangSmith
# monitoring.py
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_langsmith_key"
os.environ["LANGCHAIN_PROJECT"] = "ai-agent-production"
# Now all agent runs will be tracked in LangSmith
from agent import agent_executor
result = agent_executor.invoke({"input": "test question"})
# View traces at https://smith.langchain.com
Security Best Practices
# security.py
import os
import re
from typing import List
class SecurityValidator:
"""Validate inputs and outputs for security."""
# Dangerous patterns to block
DANGEROUS_PATTERNS = [
r"(?i)drop\s+table",
r"(?i)delete\s+from",
r"(?i)truncate\s+table",
r"(?i)<script",
r"(?i)javascript:",
r"(?i)eval\s*\(",
r"(?i)exec\s*\(",
]
@classmethod
def validate_input(cls, user_input: str) -> tuple[bool, str]:
"""Check if input contains dangerous patterns."""
for pattern in cls.DANGEROUS_PATTERNS:
if re.search(pattern, user_input):
return False, f"Input contains prohibited pattern: {pattern}"
# Check input length
if len(user_input) > 5000:
return False, "Input too long (max 5000 characters)"
return True, "OK"
@classmethod
def sanitize_output(cls, output: str) -> str:
"""Remove sensitive information from output."""
        # Redact long alphanumeric strings that may be keys or tokens
        # (crude: this also catches hashes and other long identifiers)
        output = re.sub(r'[A-Za-z0-9]{32,}', '[REDACTED]', output)
        # Redact absolute file paths (crude: this also matches URL paths)
        output = re.sub(r'/[a-zA-Z0-9/_\-\.]+', '[PATH]', output)
return output
# Usage in agent
def secure_agent_run(question: str, agent_executor):
# Validate input
is_valid, message = SecurityValidator.validate_input(question)
if not is_valid:
return {"error": message}
# Run agent
result = agent_executor.invoke({"input": question})
# Sanitize output
result["output"] = SecurityValidator.sanitize_output(result["output"])
return result
Part 6: Real-World Case Studies
Case Study 1: Customer Support Automation
Company: Mid-size SaaS company
Problem: 500+ support tickets daily, 2-hour average response time
Solution: AI agent with access to documentation, ticketing system, and user database
Results:
⚡ 80% reduction in response time (2h → 24min)
🎯 60% of tier-1 tickets fully automated
💰 $200K annual savings in support costs
😊 Customer satisfaction increased from 3.2 to 4.6/5
Technical Implementation:
# Support agent tools
@tool
def search_docs(query: str) -> str:
"""Search internal documentation."""
# Vector search in knowledge base
...
@tool
def get_user_info(email: str) -> str:
"""Get user account details."""
# Query user database
...
@tool
def create_ticket(title: str, description: str) -> str:
"""Create ticket for human agent."""
# Escalation for complex issues
...
Case Study 2: Code Review Assistant
Company: Tech startup with distributed team
Problem: Inconsistent code review quality, bottleneck for senior devs
Solution: AI agent analyzing PRs for bugs, style, and security issues
Results:
🐛 30% increase in bug detection pre-merge
⏱️ 45% reduction in review time
📚 Consistent enforcement of coding standards
🎓 Junior developers learning faster from feedback
Key Features:
Static analysis integration
Security vulnerability scanning
Performance impact estimation
Automated suggestions with explanations
Case Study 3: Data Analysis Pipeline
Company: E-commerce analytics team
Problem: 10+ hours weekly spent on repetitive data analysis tasks
Solution: Agent that queries databases, generates visualizations, and creates reports
Results:
⏰ 10 hours/week saved per analyst
📊 Daily automated reports instead of weekly
🔍 Proactive anomaly detection
💡 Insights delivered 5x faster
Implementation Highlight:
@tool
def query_database(sql: str) -> str:
"""Execute SQL query safely."""
# Validate SQL, run in read-only mode
...
@tool
def create_visualization(data: str, chart_type: str) -> str:
"""Generate chart from data."""
# Use plotly/matplotlib
...
@tool
def generate_insights(data: str) -> str:
"""Analyze data and find patterns."""
# Statistical analysis + LLM interpretation
...
Part 7: Framework Comparison
LangChain vs LangGraph vs CrewAI vs AutoGen
Recommendation:
Start with LangChain for learning and simple agents
Use LangGraph for complex, stateful workflows
Try CrewAI for role-based multi-agent systems
Consider AutoGen for code-heavy tasks
Part 8: Testing Your Agent
# test_agent.py
import pytest
from agent import run_agent
class TestAgent:
"""Test suite for AI agent."""
def test_basic_query(self):
"""Test simple question answering."""
result = run_agent("What is 2+2?")
assert "4" in result["answer"]
def test_web_search(self):
"""Test tool usage - web search."""
result = run_agent("What is the capital of Japan?")
assert "Tokyo" in result["answer"]
assert len(result["steps"]) > 0 # Agent used tools
def test_error_handling(self):
"""Test agent handles errors gracefully."""
result = run_agent("Search for @@@invalid###")
assert result["answer"] is not None # Should not crash
def test_multi_step_reasoning(self):
"""Test complex multi-step task."""
question = "Find the population of the country where Mount Fuji is located"
result = run_agent(question)
# Should involve multiple steps
assert len(result["steps"]) >= 2
assert "Japan" in result["answer"]
@pytest.mark.slow
def test_token_limit(self):
"""Ensure agent respects token limits."""
# Very long question
long_question = "Explain " + "AI " * 1000
result = run_agent(long_question)
# Should handle gracefully without hitting limits
assert "error" not in result["answer"].lower()
# Run with: pytest test_agent.py -v
Part 9: Common Pitfalls and Solutions
Pitfall 1: Infinite Loops
Problem: Agent keeps repeating the same action
Solution: Set max_iterations and add loop detection
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
max_iterations=10, # Stop after 10 steps
early_stopping_method="generate" # Force answer if stuck
)
Pitfall 2: Tool Hallucination
Problem: Agent claims to use tools that don’t exist
Solution: Clear tool descriptions and few-shot examples
@tool
def my_tool(param: str) -> str:
"""
VERY CLEAR DESCRIPTION of what this tool does.
Args:
param: Exact description of this parameter
Returns:
What this tool returns
Example:
Input: "example input"
Output: "example output"
"""
...
Pitfall 3: Cost Explosion
Problem: Agent makes too many LLM calls
Solution: Caching + smaller models for simple tasks
# Use GPT-3.5 for simple tasks, GPT-4 for complex
# (is_simple_query is a placeholder for your own routing heuristic)
if is_simple_query(question):
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
else:
llm = ChatOpenAI(model="gpt-4", temperature=0)
Pitfall 4: Context Window Overflow
Problem: Conversation history too long
Solution: Sliding window memory + summarization
from langchain.memory import ConversationSummaryBufferMemory
memory = ConversationSummaryBufferMemory(
llm=llm,
    max_token_limit=2000,  # Summarize history beyond the most recent ~2000 tokens
return_messages=True
)
Conclusion: From POC to Production
You’ve now learned how to:
✅ Understand agent architectures and patterns
✅ Build a working agent from scratch
✅ Add memory, error handling, and monitoring
✅ Deploy agents securely in production
✅ Optimize for cost and performance
Next Steps
Experiment: Modify the code, add new tools, try different LLMs
Specialize: Build an agent for YOUR specific use case
Scale: Deploy with Docker, add monitoring, handle errors
Share: Open-source your agent, write about learnings
Learn More: Join communities, read papers, follow developments
Resources
📚 Documentation:
🛠️ Tools & Frameworks:
LangSmith - Monitoring
LangServe - Deployment
Semantic Kernel - Alternative framework
👥 Communities:
📖 Further Reading:
“ReAct: Synergizing Reasoning and Acting in Language Models” (Paper)
“Reflexion: Language Agents with Verbal Reinforcement Learning” (Paper)
Andrew Ng’s AI Agents Course
Get the Code
All code from this tutorial is available on GitHub:
github.com/dailydevdotin/ai-agent-tutorial
⭐ Star the repo | 🍴 Fork it | 💬 Open issues
Join the Discussion
What will you build with AI agents? Have questions about the implementation? Share in the comments below!
👉 If you found this helpful, please share it with your network. 🐦 Tweet at me @dailydevdotin 💼 Connect on LinkedIn