Build Your First Agent
Deploy autonomous software that thinks—replace complex architectures with intelligent agents
Replace thousands of lines of if-else logic with autonomous agents that think, correlate, and act.
You'll build: A self-healing log intelligence system that detects anomalies, correlates patterns across services, and creates intelligent alerts—all without hardcoded rules.
What makes this different:
- Multiple reasoners in one agent calling each other to form complex workflows
- AI replaces complex rule engines and correlation logic
- Autonomous decision-making with conditional escalation
- Real APIs, not chatbots
The Autonomous Log Intelligence Agent
Create the Log Intelligence Agent
af init log-intelligence --language python
cd log-intelligence

Edit main.py to build an agent with multiple reasoners that call each other:
from agentfield import Agent
from pydantic import BaseModel
from typing import List

# Configure global default: use powerful model for complex reasoning
app = Agent(
    node_id="log-intelligence",
    default_model="gpt-4o"  # Complex reasoning by default
)

# Simple schemas: 1-3 fields only
class AnomalyResult(BaseModel):
    is_anomaly: bool
    confidence: float
    reasoning: str

class ImpactAssessment(BaseModel):
    severity: str  # low, medium, high, critical
    affected_services: List[str]

@app.reasoner()
async def detect_anomaly(logs: List[dict]) -> AnomalyResult:
    """
    Complex reasoning: Uses default model (gpt-4o) for pattern detection.
    Requires deep analysis of correlations and cascading failures.
    """
    log_summary = "\n".join([
        f"[{log['timestamp']}] {log['service']}: {log['level']} - {log['message']}"
        for log in logs
    ])

    # Uses default_model (gpt-4o) - no override needed for complex tasks
    return await app.ai(
        system="You are an expert at detecting anomalies in system logs. "
               "Look for unusual patterns, error spikes, cascading failures, "
               "or correlated issues across services.",
        user=f"Analyze these logs:\n\n{log_summary}",
        schema=AnomalyResult
    )

@app.reasoner()
async def assess_impact(logs: List[dict], anomaly: AnomalyResult) -> ImpactAssessment:
    """
    Simple classification: Override with faster model (gpt-4o-mini).
    Just needs to map severity levels - no complex reasoning required.
    """
    services = list(set(log['service'] for log in logs))
    error_logs = [log for log in logs if log['level'] in ['ERROR', 'CRITICAL']]

    # Override with faster model for simple classification
    return await app.ai(
        model="gpt-4o-mini",  # Simple task → fast model
        system="Classify severity as: low, medium, high, or critical. "
               "Based on error count and service criticality.",
        user=f"Services: {', '.join(services)}\nErrors: {len(error_logs)}\n"
             f"Anomaly: {anomaly.reasoning}",
        schema=ImpactAssessment
    )

if __name__ == "__main__":
    app.serve(port=8001)

Test Individual Reasoners
# Terminal 1: Start the agent
af run
# Terminal 2: Test anomaly detection
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.detect_anomaly \
-H "Content-Type: application/json" \
-d '{
  "input": {
    "logs": [
      {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
      {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "ERROR", "message": "Database connection pool exhausted"},
      {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
      {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"}
    ]
  }
}'

Response:

{
  "result": {
    "is_anomaly": true,
    "confidence": 0.92,
    "reasoning": "Cascading failure detected: user-service database exhaustion causing downstream failures in api-gateway and billing-service"
  }
}

You just built an AI-powered anomaly detector that replaces traditional rule-based monitoring.
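The same request can be scripted instead of typed into curl. A minimal Python client sketch, assuming the `requests` package is installed and the endpoint shape shown above; the helper names (`execute_url`, `execute_payload`) are illustrative, not part of the SDK:

```python
import json

BASE_URL = "http://localhost:8080"  # control-plane address from the curl example

def execute_url(reasoner: str) -> str:
    """Build the execute endpoint for a reasoner like 'log-intelligence.detect_anomaly'."""
    return f"{BASE_URL}/api/v1/execute/{reasoner}"

def execute_payload(logs: list) -> dict:
    """Wrap logs in the 'input' envelope the execute API expects."""
    return {"input": {"logs": logs}}

logs = [
    {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway",
     "level": "ERROR", "message": "Connection timeout to user-service"},
]

# With the agent running, send it with requests:
#   import requests
#   result = requests.post(execute_url("log-intelligence.detect_anomaly"),
#                          json=execute_payload(logs), timeout=30).json()["result"]
print(json.dumps(execute_payload(logs), indent=2))
```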
Model Selection Strategy
Key principle: Use powerful models for complex reasoning, fast models for simple tasks.
# Global default for the agent
app = Agent(
    node_id="log-intelligence",
    default_model="gpt-4o"  # Complex reasoning by default
)

# Complex reasoning → uses default (gpt-4o)
await app.ai(
    system="Detect complex patterns...",
    schema=AnomalyResult  # No model override
)

# Simple classification → override with fast model
await app.ai(
    model="gpt-4o-mini",  # Override for simple tasks
    system="Classify severity...",
    schema=ImpactAssessment
)

When to use each:
- gpt-4o (default): Pattern detection, correlation analysis, complex decisions
- gpt-4o-mini (override): Classification, formatting, simple lookups
- Result: ~3x cost savings on simple tasks, optimal performance on complex ones
Orchestrating Multiple Reasoners
Now add an orchestrator reasoner that calls the other reasoners to form a complete workflow. Same-agent calls use await directly:
@app.reasoner()
async def analyze_logs(logs: List[dict]) -> dict:
    """
    Orchestrator: Calls other reasoners in this agent to form a workflow.
    Same-agent calls use 'await function()' directly—no app.call needed.
    """
    # Step 1: Detect anomalies (call internal reasoner)
    anomaly = await detect_anomaly(logs)

    if not anomaly.is_anomaly:
        return {
            "status": "healthy",
            "message": "No anomalies detected",
            "confidence": anomaly.confidence
        }

    # Step 2: Assess impact (call another internal reasoner)
    impact = await assess_impact(logs, anomaly)

    # Step 3: Return comprehensive analysis
    return {
        "status": "anomaly_detected",
        "anomaly": {
            "is_anomaly": anomaly.is_anomaly,
            "confidence": anomaly.confidence,
            "reasoning": anomaly.reasoning
        },
        "impact": {
            "severity": impact.severity,
            "affected_services": impact.affected_services
        }
    }

Test the complete workflow:
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
-H "Content-Type: application/json" \
-d '{
  "input": {
    "logs": [
      {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
      {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "ERROR", "message": "Database connection pool exhausted"},
      {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
      {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"}
    ]
  }
}'

Response:

{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.92,
      "reasoning": "Cascading failure: user-service database exhaustion causing downstream failures"
    },
    "impact": {
      "severity": "critical",
      "affected_services": ["api-gateway", "user-service", "billing-service"]
    }
  }
}

What happened: The orchestrator called detect_anomaly() → then assess_impact() → returned the comprehensive analysis. All in one agent node.
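Because the healthy and anomaly paths return different shapes, callers should branch on the status field. A small sketch using the field names from the responses above (`summarize` is a hypothetical helper, not part of the SDK):

```python
def summarize(result: dict) -> str:
    """One-line summary of an analyze_logs result for downstream consumers."""
    if result["status"] == "healthy":
        # Healthy path carries only a message and a confidence score
        return f"healthy (confidence {result['confidence']:.2f})"
    # Anomaly path nests anomaly details and an impact assessment
    impact = result["impact"]
    services = ", ".join(impact["affected_services"])
    return f"{impact['severity']} anomaly affecting: {services}"

print(summarize({"status": "healthy", "message": "No anomalies detected", "confidence": 0.98}))
```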
Cross-Agent Communication
For critical issues, escalate to a separate Alert Manager agent using app.call():
Create the Alert Manager Agent
# In a new directory
af init alert-manager --language python
cd alert-manager

Edit main.py:
from agentfield import Agent
from pydantic import BaseModel
from typing import List

# Fast model for simple alert routing logic
app = Agent(
    node_id="alert-manager",
    default_model="gpt-4o-mini"  # Simple routing → fast model
)

class AlertDecision(BaseModel):
    should_alert: bool
    channels: List[str]  # slack, pagerduty, email
    priority: str

@app.reasoner()
async def create_alert(
    severity: str,
    affected_services: List[str],
    reasoning: str
) -> AlertDecision:
    """
    Simple routing logic: fast model is sufficient.
    Maps severity → channels based on clear rules.
    """
    return await app.ai(
        system="Decide alert channels based on severity. "
               "Critical → PagerDuty + Slack. High → Slack only. Medium/Low → no alert.",
        user=f"Severity: {severity}\nServices: {', '.join(affected_services)}\n{reasoning}",
        schema=AlertDecision
    )

if __name__ == "__main__":
    app.serve(port=8002)

Update Log Intelligence Agent for Conditional Escalation
Replace the analyze_logs orchestrator in log-intelligence/main.py with this version:
@app.reasoner()
async def analyze_logs(logs: List[dict]) -> dict:
    """Orchestrator with conditional cross-agent escalation."""
    # Step 1: Detect anomalies (internal call with await)
    anomaly = await detect_anomaly(logs)

    if not anomaly.is_anomaly:
        return {
            "status": "healthy",
            "message": "No anomalies detected",
            "confidence": anomaly.confidence
        }

    # Step 2: Assess impact (internal call with await)
    impact = await assess_impact(logs, anomaly)

    # Step 3: Conditional escalation to external agent (use app.call)
    alert_decision = None
    if impact.severity in ["critical", "high"]:
        alert_decision = await app.call(
            "alert-manager.create_alert",
            severity=impact.severity,
            affected_services=impact.affected_services,
            reasoning=anomaly.reasoning
        )

    return {
        "status": "anomaly_detected",
        "anomaly": {
            "is_anomaly": anomaly.is_anomaly,
            "confidence": anomaly.confidence,
            "reasoning": anomaly.reasoning
        },
        "impact": {
            "severity": impact.severity,
            "affected_services": impact.affected_services
        },
        "alert": alert_decision if alert_decision else {"status": "no_alert_needed"}
    }

Run the Complete System
# Terminal 1: Control plane
af server
# Terminal 2: Log Intelligence Agent
cd log-intelligence
af run
# Terminal 3: Alert Manager Agent
cd alert-manager
af run
# Terminal 4: Test the full workflow
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
-H "Content-Type: application/json" \
-d '{
  "input": {
    "logs": [
      {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
      {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "CRITICAL", "message": "Database connection pool exhausted"},
      {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
      {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"},
      {"timestamp": "2024-01-15T10:00:07Z", "service": "payment-service", "level": "ERROR", "message": "Cannot connect to billing-service"}
    ]
  }
}'

Full Response with Alert Escalation:
{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.95,
      "reasoning": "Cascading failure originating from user-service database exhaustion, propagating to api-gateway, billing-service, and payment-service"
    },
    "impact": {
      "severity": "critical",
      "affected_services": ["api-gateway", "user-service", "billing-service", "payment-service"]
    },
    "alert": {
      "should_alert": true,
      "channels": ["pagerduty", "slack"],
      "priority": "P1"
    }
  },
  "execution_id": "exec_abc123"
}

What happened:
- The analyze_logs() orchestrator received the logs
- Called detect_anomaly() (same agent, used await)
- Detected an anomaly → called assess_impact() (same agent, used await)
- Impact was "critical" → called alert-manager.create_alert (different agent, used app.call())
- The Alert Manager decided to alert via PagerDuty + Slack
- The complete result was returned with the alert decision
View the Workflow DAG:
Open http://localhost:8080 → Executions → Click on exec_abc123 → See the complete execution graph:
analyze_logs (log-intelligence)
├─ detect_anomaly (log-intelligence) [internal await]
├─ assess_impact (log-intelligence) [internal await]
└─ create_alert (alert-manager) [cross-agent app.call]

Why This Matters
Traditional Approach:
# Hundreds of lines of brittle rules
if error_count > THRESHOLD_1 and service in CRITICAL_SERVICES:
    if time_window < THRESHOLD_2:
        if error_rate > THRESHOLD_3:
            # Complex correlation logic...
            if cascading_detected():
                # Alert routing logic...
                if severity == "critical" and on_call_hours():
                    send_pagerduty()
                elif severity == "high":
                    send_slack()
                # ... more conditionals

Autonomous Approach:
# AI reasons about the situation
anomaly = await detect_anomaly(logs) # AI detects patterns
impact = await assess_impact(logs, anomaly) # AI assesses severity
alert = await app.call("alert-manager.create_alert", ...)  # AI routes alerts

Benefits:
- No hardcoded thresholds - AI adapts to patterns
- Intelligent correlation - AI understands cascading failures
- Context-aware - AI considers service dependencies
- Reduced false positives - AI filters noise
- Easier to evolve - adjust prompts instead of rewriting rule trees
Testing Different Scenarios
Scenario 1: No Anomaly (Healthy System)
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
-H "Content-Type: application/json" \
-d '{
  "input": {
    "logs": [
      {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "INFO", "message": "Request processed successfully"},
      {"timestamp": "2024-01-15T10:00:01Z", "service": "user-service", "level": "INFO", "message": "User authenticated"}
    ]
  }
}'

Response:

{
  "result": {
    "status": "healthy",
    "message": "No anomalies detected",
    "confidence": 0.98
  }
}

Scenario 2: Medium Severity (No Alert)
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
-H "Content-Type: application/json" \
-d '{
  "input": {
    "logs": [
      {"timestamp": "2024-01-15T10:00:00Z", "service": "cache-service", "level": "WARN", "message": "Cache miss rate elevated"},
      {"timestamp": "2024-01-15T10:00:02Z", "service": "cache-service", "level": "WARN", "message": "Eviction rate increasing"}
    ]
  }
}'

Response:

{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.78,
      "reasoning": "Cache performance degradation detected"
    },
    "impact": {
      "severity": "medium",
      "affected_services": ["cache-service"]
    },
    "alert": {
      "status": "no_alert_needed"
    }
  }
}

Note: Medium severity didn't trigger alert escalation—the orchestrator only calls the Alert Manager for high and critical severities.
Production Considerations
This is a learning example. Before production, consider:
What's Missing:
- Historical context - Anomaly detection needs baseline metrics, time-series analysis
- Error handling - Retry logic, timeout handling, fallback responses
- Rate limiting - Prevent API overload during log spikes
- Observability - Metrics, tracing, structured logging for the agent itself
- Schema validation - Stricter Pydantic validation with enums and constraints
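To close the error-handling gap, cross-agent calls can be wrapped in a retry helper. Here is a standard-library sketch; wrapping app.call this way is an assumption on my part, not a built-in AgentField feature, and the timeouts are illustrative:

```python
import asyncio

async def call_with_retry(fn, *args, attempts: int = 3,
                          base_delay: float = 0.5, timeout: float = 10.0, **kwargs):
    """Await fn with a timeout, retrying on failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(fn(*args, **kwargs), timeout=timeout)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the orchestrator
            await asyncio.sleep(base_delay * 2 ** attempt)

# Hypothetical usage inside analyze_logs:
#   alert_decision = await call_with_retry(
#       app.call, "alert-manager.create_alert",
#       severity=impact.severity,
#       affected_services=impact.affected_services,
#       reasoning=anomaly.reasoning,
#   )
```

A fallback response (for example, treating a failed escalation as "no_alert_needed" plus a logged warning) can go in the final `except` branch depending on how critical the alert path is.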
Quick Improvements:
from enum import Enum
from pydantic import BaseModel, Field
from typing import List

class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class ImpactAssessment(BaseModel):
    severity: Severity  # Enum instead of free-form string
    # Require at least one affected service (Pydantic v2; use min_items on v1)
    affected_services: List[str] = Field(..., min_length=1)

Why Model Selection Matters:
- gpt-4o for anomaly detection: ~$0.005/request
- gpt-4o-mini for classification: ~$0.0015/request
- At scale: 1M classifications/month ≈ $1,500 vs $5,000 (70% savings)
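A quick sanity check of those figures, using the illustrative per-request rates above (not official pricing):

```python
COST_COMPLEX = 0.005   # approximate cost per gpt-4o request (illustrative)
COST_SIMPLE = 0.0015   # approximate cost per gpt-4o-mini request (illustrative)
REQUESTS_PER_MONTH = 1_000_000

all_on_gpt4o = REQUESTS_PER_MONTH * COST_COMPLEX   # everything on the big model
routed_to_mini = REQUESTS_PER_MONTH * COST_SIMPLE  # simple tasks routed to mini
savings = 1 - routed_to_mini / all_on_gpt4o        # fraction saved on simple tasks

print(f"${all_on_gpt4o:,.0f} vs ${routed_to_mini:,.0f} ({savings:.0%} savings)")
```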
Next Steps
You just built autonomous software that:
- Uses multiple reasoners in one agent node
- Calls reasoners with await for internal workflows
- Uses app.call() for cross-agent communication
- Replaces complex rule engines with intelligence
Explore more:
- Local Development - Development environment setup
- Managed Platforms - Deploy to production
- Python SDK Reference - Complete API documentation
Build autonomous software, not rule engines. Start with intelligent reasoners, orchestrate with workflows, scale with multiple agents.