Building AI Agents: Architecture, Implementation, and Best Practices
Comprehensive guide to building AI agents covering architecture patterns, tool integration, multi-agent systems, error handling, and production deployment strategies. Learn from real-world implementations and avoid common pitfalls.
Introduction
2025 has been declared the "year of the AI agent." What started as experimental chatbots has evolved into autonomous systems capable of planning, reasoning, using tools, and executing multi-step workflows without human intervention. The agentic AI market reached $10.41 billion in 2025, growing at 56.1% year-over-year, with 99% of enterprise developers now exploring or building AI agents.
But building production-ready AI agents requires more than chaining LLM calls together. This guide covers the architecture patterns, implementation strategies, and operational practices needed to deploy agents that are reliable, observable, and safe enough for enterprise use.
What Defines an AI Agent?
An AI agent is fundamentally different from a stateless LLM completion:
Traditional LLM:
- Single request/response cycle
- No memory between interactions
- No access to external tools
- No ability to plan multi-step actions
AI Agent:
- Persistent execution across multiple steps
- Maintains conversation and task context
- Can use tools (APIs, databases, file systems)
- Plans and reasons about how to accomplish goals
- Decides when tasks are complete
The key distinction: Agents have agency. They don't just answer questions—they decompose problems, choose tools, execute actions, evaluate results, and iterate until objectives are met.
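The contrast is visible directly in code. Below is a minimal sketch under assumed interfaces — `llm.complete`, `llm.decide`, and the `tools` map are stand-ins for illustration, not a real SDK:

```javascript
// Stateless completion: one call in, one answer out
async function answerOnce(llm, question) {
  return llm.complete(question);
}

// Agent: a loop that plans, acts via tools, and carries history forward
async function agentLoop(llm, tools, goal) {
  const history = [];
  while (true) {
    const decision = await llm.decide(goal, history); // reason about the next step
    if (decision.done) return decision.answer;        // the agent decides completion
    const result = await tools[decision.tool](decision.input); // execute a tool
    history.push({ decision, result });               // maintain task context
  }
}
```

Everything that follows in this guide is a refinement of that loop: how to prompt it, constrain it, observe it, and keep it from running away.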
Core Architecture Patterns
1. ReAct (Reasoning + Acting)
The most widely adopted pattern for AI agents. The agent alternates between reasoning about what to do next and taking actions.
Flow:
1. Thought: Analyze the current state and decide next action
2. Action: Execute a tool or API call
3. Observation: Receive results from the action
4. (Repeat until task complete)
Example Implementation:
class ReActAgent {
constructor(llm, tools) {
this.llm = llm;
this.tools = tools;
this.maxIterations = 10;
}
async run(task) {
const history = [];
let iteration = 0;
while (iteration < this.maxIterations) {
// Reasoning step
const prompt = this.buildPrompt(task, history);
const response = await this.llm.complete(prompt);
// Parse thought and action
const { thought, action, actionInput } = this.parseResponse(response);
history.push({ thought, action, actionInput });
// Check for completion
if (action === 'finish') {
return actionInput; // Final answer
}
// Execute action
const tool = this.tools.find(t => t.name === action);
if (!tool) {
throw new Error(`Unknown tool: ${action}`);
}
try {
const observation = await tool.execute(actionInput);
history.push({ observation });
} catch (error) {
history.push({ observation: `Error: ${error.message}` });
}
iteration++;
}
throw new Error('Agent exceeded maximum iterations');
}
buildPrompt(task, history) {
return `
You are an AI agent. Use tools to accomplish tasks.
Available tools:
${this.tools.map(t => `- ${t.name}: ${t.description}`).join('\n')}
Task: ${task}
${history.map(h => {
if (h.thought) return `Thought: ${h.thought}\nAction: ${h.action}\nAction Input: ${h.actionInput}`;
if (h.observation) return `Observation: ${h.observation}`;
}).join('\n')}
What is your next thought and action?
Format:
Thought: [your reasoning]
Action: [tool name or "finish"]
Action Input: [input for tool or final answer]
`.trim();
}
parseResponse(response) {
const thoughtMatch = response.match(/Thought: (.*?)(?=\n|$)/);
const actionMatch = response.match(/Action: (.*?)(?=\n|$)/);
const inputMatch = response.match(/Action Input: ([\s\S]*?)$/);
return {
thought: thoughtMatch?.[1]?.trim() || '',
action: actionMatch?.[1]?.trim() || '',
actionInput: inputMatch?.[1]?.trim() || ''
};
}
}
Strengths:
- Transparent reasoning (you can see the agent's thought process)
- Works with any LLM that can follow instructions
- Easy to debug when things go wrong
Weaknesses:
- Can get stuck in loops ("thought → same action → thought → same action")
- LLM output parsing is fragile (especially with weaker models)
- Wastes tokens on verbose reasoning
2. Tool-Augmented Generation
Instead of free-form reasoning, the agent gets structured tool definitions and returns JSON function calls.
Example with OpenAI Function Calling:
const tools = [
{
type: 'function',
function: {
name: 'search_database',
description: 'Search product database by query',
parameters: {
type: 'object',
properties: {
query: { type: 'string', description: 'Search query' },
limit: { type: 'number', description: 'Max results', default: 10 }
},
required: ['query']
}
}
},
{
type: 'function',
function: {
name: 'send_email',
description: 'Send email to customer',
parameters: {
type: 'object',
properties: {
to: { type: 'string', description: 'Recipient email' },
subject: { type: 'string' },
body: { type: 'string' }
},
required: ['to', 'subject', 'body']
}
}
}
];
async function runAgent(userMessage) {
const messages = [{ role: 'user', content: userMessage }];
while (true) {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages,
tools,
tool_choice: 'auto'
});
const message = response.choices[0].message;
messages.push(message);
// Agent finished
if (!message.tool_calls) {
return message.content;
}
// Execute tool calls
for (const toolCall of message.tool_calls) {
const args = JSON.parse(toolCall.function.arguments);
const result = await executeTool(toolCall.function.name, args);
messages.push({
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(result)
});
}
}
}
async function executeTool(name, args) {
switch (name) {
case 'search_database':
return await db.products.search(args.query, args.limit ?? 10); // schema defaults aren't auto-applied
case 'send_email':
return await sendEmail(args.to, args.subject, args.body);
default:
throw new Error(`Unknown tool: ${name}`);
}
}
Strengths:
- Structured, type-safe tool calls (no parsing)
- Native support in OpenAI, Anthropic, Google models
- Parallel tool execution (agent can call multiple tools simultaneously)
Weaknesses:
- Less transparency (no explicit reasoning)
- Requires models with function calling support
- Still possible to make wrong tool choices
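The `runAgent` loop above awaits each tool call in turn. Since the model can return several independent calls in one turn, they can be dispatched concurrently — a sketch that assumes the tools involved are side-effect-safe to run in parallel:

```javascript
// Run all tool calls from a single model turn concurrently.
// Promise.all preserves order, so the tool messages line up with the calls.
async function executeToolCallsInParallel(toolCalls, executeTool) {
  return Promise.all(
    toolCalls.map(async (toolCall) => {
      const args = JSON.parse(toolCall.function.arguments);
      const result = await executeTool(toolCall.function.name, args);
      return {
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result)
      };
    })
  );
}
```

Append the returned messages before the next model call, exactly as the sequential loop does.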
3. Multi-Agent Systems
For complex tasks, use specialized agents that collaborate:
class MultiAgentOrchestrator {
constructor() {
this.agents = {
researcher: new ResearchAgent(),
writer: new WriterAgent(),
editor: new EditorAgent()
};
}
async generateArticle(topic) {
// Step 1: Research agent gathers information
const research = await this.agents.researcher.run({
task: `Research ${topic} and gather key facts, statistics, and expert opinions`
});
// Step 2: Writer agent creates draft
const draft = await this.agents.writer.run({
task: `Write a 1000-word article about ${topic}`,
context: research
});
// Step 3: Editor agent refines
const final = await this.agents.editor.run({
task: 'Edit for clarity, accuracy, and engagement',
content: draft,
researchContext: research
});
return final;
}
}
When to use:
- Tasks requiring distinct expertise (research vs. writing vs. coding)
- Workflow has clear sequential or parallel stages
- Need specialization (one agent fine-tuned for SQL, another for API calls)
Frameworks:
- CrewAI: Specialized in multi-agent collaboration with role definitions
- AutoGen: Microsoft's framework for conversational multi-agent systems
- LangGraph: Stateful, graph-based agent orchestration
Tool Integration Best Practices
1. Define Clear Tool Contracts
const tools = [
{
name: 'get_weather',
description: 'Get current weather for a city. Use when user asks about weather conditions.',
parameters: {
city: {
type: 'string',
description: 'City name (e.g., "San Francisco" or "London, UK")',
required: true
},
units: {
type: 'string',
enum: ['celsius', 'fahrenheit'],
description: 'Temperature units',
default: 'celsius'
}
},
// Critical: Return value schema
returns: {
type: 'object',
properties: {
temperature: { type: 'number' },
condition: { type: 'string' },
humidity: { type: 'number' }
}
}
}
];
Why this matters:
- Helps LLM understand when and how to use each tool
- Enables validation of tool inputs before execution
- Documents expected outputs for chaining tools
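A minimal sketch of the validation step this enables — it assumes the simplified `parameters` shape used in the `get_weather` contract above, not the full JSON Schema dialect:

```javascript
// Validate tool arguments against the declared parameter contract.
// Returns an array of error strings; empty means the call is safe to execute.
function validateArgs(tool, args) {
  const errors = [];
  for (const [name, spec] of Object.entries(tool.parameters)) {
    const value = args[name];
    if (value === undefined) {
      if (spec.required) errors.push(`Missing required parameter: ${name}`);
      continue; // optional parameter omitted — nothing more to check
    }
    if (spec.type && typeof value !== spec.type) {
      errors.push(`Parameter ${name} should be ${spec.type}, got ${typeof value}`);
    }
    if (spec.enum && !spec.enum.includes(value)) {
      errors.push(`Parameter ${name} must be one of: ${spec.enum.join(', ')}`);
    }
  }
  return errors;
}
```

Returning the errors to the agent (rather than throwing) gives the model a chance to correct its own call on the next turn.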
2. Implement Tool Safety Guards
class SafeToolExecutor {
constructor(tools) {
this.tools = tools;
this.rateLimits = new Map();
this.dangerousTools = new Set(['delete_database', 'send_bulk_email']);
}
async execute(toolName, args, context) {
// Rate limiting
if (this.isRateLimited(toolName, context.userId)) {
throw new Error(`Rate limit exceeded for ${toolName}`);
}
// Require human approval for dangerous operations
if (this.dangerousTools.has(toolName)) {
if (!context.humanApproved) {
return {
requiresApproval: true,
message: `Tool ${toolName} requires human approval`,
pendingArgs: args
};
}
}
// Input validation
this.validateArgs(toolName, args);
// Execute with timeout
const tool = this.tools.find(t => t.name === toolName);
return await this.executeWithTimeout(tool, args, 30000);
}
async executeWithTimeout(tool, args, timeoutMs) {
return Promise.race([
tool.execute(args),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Tool execution timeout')), timeoutMs)
)
]);
}
}
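On the calling side, the `requiresApproval` result needs a handler that pauses for a human and retries with approval recorded. A sketch — `requestHumanApproval` is a stand-in for whatever approval UI, Slack prompt, or review queue you actually use:

```javascript
// Caller-side handling of the approval flow from SafeToolExecutor.
// requestHumanApproval(toolName, args) resolves to true/false.
async function executeWithApproval(executor, toolName, args, context, requestHumanApproval) {
  const result = await executor.execute(toolName, args, context);
  if (result && result.requiresApproval) {
    const approved = await requestHumanApproval(toolName, result.pendingArgs);
    if (!approved) {
      return { cancelled: true, message: `Human rejected ${toolName}` };
    }
    // Re-run with the approval recorded in the context
    return executor.execute(toolName, args, { ...context, humanApproved: true });
  }
  return result;
}
```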
3. Handle Tool Errors Gracefully
async function executeToolWithRecovery(tool, args, agent) {
try {
return await tool.execute(args);
} catch (error) {
// Log error
console.error(`Tool ${tool.name} failed:`, error);
// Return error to agent with suggestions
return {
error: error.message,
suggestion: getErrorRecoverySuggestion(error, tool)
};
}
}
function getErrorRecoverySuggestion(error, tool) {
if (error.message.includes('not found')) {
return `Try using ${tool.name} with a different query or check if the resource exists`;
}
if (error.message.includes('unauthorized')) {
return 'This operation requires authentication. Use the login tool first.';
}
if (error.message.includes('rate limit')) {
return 'Rate limit exceeded. Wait 60 seconds or use cached data instead.';
}
return 'An unexpected error occurred. Try a different approach.';
}
Memory and State Management
Agents need memory to maintain context across interactions:
Short-Term Memory (Conversation Context)
class ConversationMemory {
constructor(maxTokens = 4000) {
this.messages = [];
this.maxTokens = maxTokens;
}
addMessage(role, content) {
this.messages.push({ role, content, timestamp: Date.now() });
this.prune();
}
prune() {
// Estimate token count (rough: 4 chars = 1 token)
while (this.estimateTokens() > this.maxTokens && this.messages.length > 1) {
// Keep system message, remove oldest user/assistant messages
this.messages.splice(1, 1);
}
}
estimateTokens() {
return this.messages.reduce((sum, msg) =>
sum + Math.ceil(msg.content.length / 4), 0
);
}
getMessages() {
return this.messages;
}
}
Long-Term Memory (Vector Store)
import { Pinecone } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
class AgentMemory {
constructor() {
this.embeddings = new OpenAIEmbeddings();
this.pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
this.index = this.pinecone.index('agent-memory');
}
async remember(userId, content, metadata = {}) {
const vector = await this.embeddings.embedQuery(content);
await this.index.upsert([{
id: `${userId}-${Date.now()}`,
values: vector,
metadata: {
userId,
content,
timestamp: Date.now(),
...metadata
}
}]);
}
async recall(userId, query, topK = 5) {
const vector = await this.embeddings.embedQuery(query);
const results = await this.index.query({
vector,
topK,
filter: { userId },
includeMetadata: true
});
return results.matches.map(m => m.metadata.content);
}
}
// Usage in agent
async function runAgentWithMemory(userId, userMessage) {
const memory = new AgentMemory();
// Recall relevant past interactions
const context = await memory.recall(userId, userMessage);
const prompt = `
Previous relevant context:
${context.join('\n')}
Current user message: ${userMessage}
Respond appropriately using context from previous interactions.
`;
const response = await agent.run(prompt);
// Store this interaction for future recall
await memory.remember(userId, `User: ${userMessage}\nAssistant: ${response}`);
return response;
}
Production Deployment Considerations
1. Observability
import { Langfuse } from 'langfuse';
class ObservableAgent {
constructor(agent) {
this.agent = agent;
this.langfuse = new Langfuse({
publicKey: process.env.LANGFUSE_PUBLIC_KEY,
secretKey: process.env.LANGFUSE_SECRET_KEY
});
}
async run(task, userId) {
const trace = this.langfuse.trace({
name: 'agent-execution',
userId,
metadata: { task }
});
try {
const result = await this.agent.run(task, {
onThought: (thought) => {
trace.event({ name: 'thought', metadata: { thought } });
},
onAction: (action, input) => {
trace.event({ name: 'action', metadata: { action, input } });
},
onObservation: (observation) => {
trace.event({ name: 'observation', metadata: { observation } });
}
});
trace.update({ output: result, status: 'success' });
return result;
} catch (error) {
trace.update({ status: 'error', metadata: { error: error.message } });
throw error;
} finally {
await this.langfuse.shutdown();
}
}
}
2. Cost Controls
class CostAwareAgent {
constructor(agent, budget) {
this.agent = agent;
this.dailyBudget = budget;
this.usage = new Map(); // userId -> daily cost
}
async run(userId, task) {
const today = new Date().toISOString().split('T')[0];
const key = `${userId}-${today}`;
const currentCost = this.usage.get(key) || 0;
if (currentCost >= this.dailyBudget) {
throw new Error('Daily budget exceeded');
}
const startTokens = this.agent.getTokenCount();
const result = await this.agent.run(task);
const endTokens = this.agent.getTokenCount();
const tokensUsed = endTokens - startTokens;
const cost = this.calculateCost(tokensUsed);
this.usage.set(key, currentCost + cost);
return {
result,
tokensUsed,
cost,
remainingBudget: this.dailyBudget - (currentCost + cost)
};
}
calculateCost(tokens) {
// GPT-4 pricing: $0.03 per 1K input tokens, $0.06 per 1K output
// Simplified: assume 50/50 split
return (tokens / 1000) * 0.045;
}
}
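The diff-based counter above works, but chat APIs already return exact token counts with each response, and input and output tokens are priced differently. A variant that prices from the response's `usage` block — the rates here are illustrative, so check your provider's current pricing:

```javascript
// Illustrative per-1K-token rates; replace with current provider pricing
const PRICING = {
  'gpt-4': { input: 0.03, output: 0.06 }
};

// Price a single completion from the usage block the API returns
function costFromUsage(model, usage) {
  const rates = PRICING[model];
  if (!rates) throw new Error(`No pricing configured for ${model}`);
  return (usage.prompt_tokens / 1000) * rates.input +
         (usage.completion_tokens / 1000) * rates.output;
}
```

Summing `costFromUsage` over every LLM call in a run gives a precise per-task cost, which is usually more trustworthy than a 50/50 split assumption.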
3. Reliability Patterns
class ReliableAgent {
constructor(agent, options = {}) {
this.agent = agent;
this.maxRetries = options.maxRetries || 3;
this.backoffMs = options.backoffMs || 1000;
}
async run(task) {
let lastError;
for (let attempt = 0; attempt < this.maxRetries; attempt++) {
// Checkpoint system: save state before each attempt so the catch block can restore it
const checkpoint = this.agent.getState();
try {
const result = await this.executeWithTimeout(task, 120000);
// Validate result
if (!this.isValidResult(result)) {
throw new Error('Invalid agent output');
}
return result;
} catch (error) {
lastError = error;
console.warn(`Agent attempt ${attempt + 1} failed:`, error.message);
// Exponential backoff
if (attempt < this.maxRetries - 1) {
await this.sleep(this.backoffMs * Math.pow(2, attempt));
// Restore from last checkpoint
this.agent.restoreState(checkpoint);
}
}
}
throw new Error(`Agent failed after ${this.maxRetries} attempts: ${lastError.message}`);
}
async executeWithTimeout(task, timeoutMs) {
return Promise.race([
this.agent.run(task),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Agent timeout')), timeoutMs)
)
]);
}
isValidResult(result) {
return result !== null &&
result !== undefined &&
typeof result === 'string' &&
result.length > 0;
}
sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
Common Pitfalls and Solutions
1. Agent Gets Stuck in Loops
Problem: Agent repeatedly tries the same failed action.
Solution: Track action history and detect loops:
class LoopDetector {
constructor(windowSize = 3) {
this.history = [];
this.windowSize = windowSize;
}
addAction(action, input) {
this.history.push({ action, input: JSON.stringify(input) });
if (this.history.length > this.windowSize * 2) {
this.history.shift();
}
}
isInLoop() {
if (this.history.length < this.windowSize * 2) return false;
const recent = this.history.slice(-this.windowSize);
const previous = this.history.slice(-this.windowSize * 2, -this.windowSize);
return JSON.stringify(recent) === JSON.stringify(previous);
}
}
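Wiring the detector into an agent loop might look like this — a sketch where the repeated-action signal is fed back to the model as an observation rather than silently retried:

```javascript
// Record each action on the detector; if a loop is detected, push a corrective
// observation into the agent's history and signal the loop to break or re-plan.
function checkForLoop(detector, action, actionInput, history) {
  detector.addAction(action, actionInput);
  if (detector.isInLoop()) {
    history.push({
      observation: 'You are repeating the same action with the same input. ' +
        'Try a different tool or different input.'
    });
    return true;
  }
  return false;
}
```

Telling the model *why* it was interrupted tends to recover better than a bare retry, since the loop usually stems from the model not seeing that its approach failed.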
2. Hallucinated Tool Calls
Problem: Agent invents tools that don't exist.
Solution: Strict tool validation + error feedback:
async function executeTool(toolName, args, availableTools) {
const tool = availableTools.find(t => t.name === toolName);
if (!tool) {
return {
error: `Tool "${toolName}" does not exist.`,
availableTools: availableTools.map(t => t.name),
suggestion: `Did you mean one of: ${availableTools.map(t => t.name).join(', ')}?`
};
}
return await tool.execute(args);
}
3. Context Window Overflow
Problem: Agent runs out of context trying to track long conversations.
Solution: Summarization + selective context:
// Rough token estimate (~4 characters per token), matching ConversationMemory above
function estimateTokens(messages) {
return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}
async function manageContext(messages, llm) {
const tokenCount = estimateTokens(messages);
const maxTokens = 8000;
if (tokenCount > maxTokens * 0.8) {
// Summarize older messages
const toSummarize = messages.slice(0, -10);
const recent = messages.slice(-10);
const summary = await llm.complete(`
Summarize this conversation history concisely:
${toSummarize.map(m => `${m.role}: ${m.content}`).join('\n')}
`);
return [
{ role: 'system', content: `Previous conversation summary: ${summary}` },
...recent
];
}
return messages;
}
Conclusion
AI agents are no longer science fiction—they're powering customer service, code generation, data analysis, and workflow automation in production systems. But building reliable agents requires careful architecture:
- Choose the right pattern: ReAct for transparency, function calling for structure, multi-agent for complexity
- Design safe tools: Validate inputs, implement rate limiting, require human approval for destructive actions
- Manage state properly: Short-term memory for conversations, long-term vector storage for recall
- Prioritize observability: Log every thought, action, and observation for debugging
- Build in reliability: Retries, timeouts, loop detection, and graceful error handling
The agents you build today will be your team's autonomous coworkers tomorrow. Build them to be reliable, observable, and safe—because once they're running in production, you need to trust them to make the right decisions.
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.