Build Your First Agent

Deploy autonomous software that thinks—replace complex architectures with intelligent agents

Replace thousands of lines of if-else logic with autonomous agents that think, correlate, and act.

You'll build: A self-healing log intelligence system that detects anomalies, correlates patterns across services, and creates intelligent alerts—all without hardcoded rules.

What makes this different:

  • Multiple reasoners in one agent calling each other to form complex workflows
  • AI replaces complex rule engines and correlation logic
  • Autonomous decision-making with conditional escalation
  • Real APIs, not chatbots

The Autonomous Log Intelligence Agent

Start the Control Plane

af server

Opens Web UI at http://localhost:8080

Create the Log Intelligence Agent

af init log-intelligence --language python
cd log-intelligence

Edit main.py to build an agent with multiple reasoners that call each other:

from agentfield import Agent
from pydantic import BaseModel
from typing import List

# Configure global default: use powerful model for complex reasoning
app = Agent(
    node_id="log-intelligence",
    default_model="gpt-4o"  # Complex reasoning by default
)

# Simple schemas: 1-3 fields only
class AnomalyResult(BaseModel):
    is_anomaly: bool
    confidence: float
    reasoning: str

class ImpactAssessment(BaseModel):
    severity: str  # low, medium, high, critical
    affected_services: List[str]

@app.reasoner()
async def detect_anomaly(logs: List[dict]) -> AnomalyResult:
    """
    Complex reasoning: Uses default model (gpt-4o) for pattern detection.
    Requires deep analysis of correlations and cascading failures.
    """
    log_summary = "\n".join([
        f"[{log['timestamp']}] {log['service']}: {log['level']} - {log['message']}"
        for log in logs
    ])

    # Uses default_model (gpt-4o) - no override needed for complex tasks
    return await app.ai(
        system="You are an expert at detecting anomalies in system logs. "
               "Look for unusual patterns, error spikes, cascading failures, "
               "or correlated issues across services.",
        user=f"Analyze these logs:\n\n{log_summary}",
        schema=AnomalyResult
    )

@app.reasoner()
async def assess_impact(logs: List[dict], anomaly: AnomalyResult) -> ImpactAssessment:
    """
    Simple classification: Override with faster model (gpt-4o-mini).
    Just needs to map severity levels - no complex reasoning required.
    """
    services = list(set(log['service'] for log in logs))
    error_logs = [log for log in logs if log['level'] in ['ERROR', 'CRITICAL']]

    # Override with faster model for simple classification
    return await app.ai(
        model="gpt-4o-mini",  # Simple task → fast model
        system="Classify severity as: low, medium, high, or critical. "
               "Based on error count and service criticality.",
        user=f"Services: {', '.join(services)}\nErrors: {len(error_logs)}\n"
             f"Anomaly: {anomaly.reasoning}",
        schema=ImpactAssessment
    )

if __name__ == "__main__":
    app.serve(port=8001)
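
The response schemas are plain Pydantic models, so you can sanity-check them without the agent runtime. A minimal standalone check (assumes Pydantic v2's `model_validate`; the payloads are illustrative):

```python
from pydantic import BaseModel, ValidationError

class AnomalyResult(BaseModel):
    is_anomaly: bool
    confidence: float
    reasoning: str

# A well-formed AI response parses into a typed object
result = AnomalyResult.model_validate({
    "is_anomaly": True,
    "confidence": 0.92,
    "reasoning": "Error spike across services",
})
assert result.confidence == 0.92

# A malformed response fails loudly instead of propagating bad data
try:
    AnomalyResult.model_validate({"is_anomaly": True})
    raise AssertionError("expected a validation error")
except ValidationError as err:
    missing = {e["loc"][0] for e in err.errors()}
    print(sorted(missing))  # ['confidence', 'reasoning']
```

This is why the reasoners declare `schema=`: structurally invalid model output is rejected at the boundary rather than flowing into downstream logic.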

Test Individual Reasoners

# Terminal 1: Start the agent
af run

# Terminal 2: Test anomaly detection
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.detect_anomaly \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "ERROR", "message": "Database connection pool exhausted"},
        {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"}
      ]
    }
  }'

Response:

{
  "result": {
    "is_anomaly": true,
    "confidence": 0.92,
    "reasoning": "Cascading failure detected: user-service database exhaustion causing downstream failures in api-gateway and billing-service"
  }
}

You just built an AI-powered anomaly detector that replaces traditional rule-based monitoring.


Model Selection Strategy

Key principle: Use powerful models for complex reasoning, fast models for simple tasks.

# Global default for the agent
app = Agent(
    node_id="log-intelligence",
    default_model="gpt-4o"  # Complex reasoning by default
)

# Complex reasoning → uses default (gpt-4o)
await app.ai(
    system="Detect complex patterns...",
    schema=AnomalyResult  # No model override
)

# Simple classification → override with fast model
await app.ai(
    model="gpt-4o-mini",  # Override for simple tasks
    system="Classify severity...",
    schema=ImpactAssessment
)

When to use each:

  • gpt-4o (default): Pattern detection, correlation analysis, complex decisions
  • gpt-4o-mini (override): Classification, formatting, simple lookups
  • Result: ~3x cost savings on simple tasks, optimal performance on complex ones
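
One way to keep this policy in a single place is a small routing helper. This is an illustrative sketch, not part of the SDK; the task names and the two-model split are assumptions:

```python
# Illustrative routing table: complex reasoning → default model, simple tasks → fast model
COMPLEX_MODEL = "gpt-4o"
FAST_MODEL = "gpt-4o-mini"

SIMPLE_TASKS = {"classification", "formatting", "lookup"}

def pick_model(task: str) -> str:
    """Return the cheaper model for well-bounded tasks, the default otherwise."""
    return FAST_MODEL if task in SIMPLE_TASKS else COMPLEX_MODEL

assert pick_model("classification") == "gpt-4o-mini"
assert pick_model("pattern-detection") == "gpt-4o"
```

Centralizing the choice makes it easy to audit which reasoners run on which model as the agent grows.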

Orchestrating Multiple Reasoners

Now add an orchestrator reasoner that calls the other reasoners to form a complete workflow. Same-agent calls use await directly:

@app.reasoner()
async def analyze_logs(logs: List[dict]) -> dict:
    """
    Orchestrator: Calls other reasoners in this agent to form a workflow.

    Same-agent calls use 'await function()' directly—no app.call needed.
    """

    # Step 1: Detect anomalies (call internal reasoner)
    anomaly = await detect_anomaly(logs)

    if not anomaly.is_anomaly:
        return {
            "status": "healthy",
            "message": "No anomalies detected",
            "confidence": anomaly.confidence
        }

    # Step 2: Assess impact (call another internal reasoner)
    impact = await assess_impact(logs, anomaly)

    # Step 3: Return comprehensive analysis
    return {
        "status": "anomaly_detected",
        "anomaly": {
            "is_anomaly": anomaly.is_anomaly,
            "confidence": anomaly.confidence,
            "reasoning": anomaly.reasoning
        },
        "impact": {
            "severity": impact.severity,
            "affected_services": impact.affected_services
        }
    }
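
Because reasoners are ordinary async functions, the orchestrator's branching can be exercised offline with stubs standing in for the AI calls. The stub logic below is invented for illustration; only the control flow mirrors the orchestrator:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AnomalyResult:
    is_anomaly: bool
    confidence: float
    reasoning: str

async def detect_anomaly(logs):
    # Stub for the AI call: flag any batch containing ERROR-level logs
    errors = [log for log in logs if log["level"] == "ERROR"]
    return AnomalyResult(bool(errors), 0.9 if errors else 0.99, "stub reasoning")

async def analyze_logs(logs):
    # Same control flow as the orchestrator: early return on healthy input
    anomaly = await detect_anomaly(logs)
    if not anomaly.is_anomaly:
        return {"status": "healthy", "confidence": anomaly.confidence}
    return {"status": "anomaly_detected", "confidence": anomaly.confidence}

healthy = asyncio.run(analyze_logs([{"level": "INFO"}]))
detected = asyncio.run(analyze_logs([{"level": "ERROR"}]))
assert healthy["status"] == "healthy"
assert detected["status"] == "anomaly_detected"
```

This keeps workflow tests fast and deterministic; the AI-dependent behavior is covered separately by the live curl tests.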

Test the complete workflow:

curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "ERROR", "message": "Database connection pool exhausted"},
        {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"}
      ]
    }
  }'

Response:

{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.92,
      "reasoning": "Cascading failure: user-service database exhaustion causing downstream failures"
    },
    "impact": {
      "severity": "critical",
      "affected_services": ["api-gateway", "user-service", "billing-service"]
    }
  }
}

What happened: The orchestrator called detect_anomaly(), then assess_impact(), and returned a comprehensive analysis, all within a single agent node.


Cross-Agent Communication

For critical issues, escalate to a separate Alert Manager agent using app.call():

Create the Alert Manager Agent

# In a new directory
af init alert-manager --language python
cd alert-manager

Edit main.py:

from agentfield import Agent
from pydantic import BaseModel
from typing import List

# Fast model for simple alert routing logic
app = Agent(
    node_id="alert-manager",
    default_model="gpt-4o-mini"  # Simple routing → fast model
)

class AlertDecision(BaseModel):
    should_alert: bool
    channels: List[str]  # slack, pagerduty, email
    priority: str

@app.reasoner()
async def create_alert(
    severity: str,
    affected_services: List[str],
    reasoning: str
) -> AlertDecision:
    """
    Simple routing logic: fast model is sufficient.
    Maps severity → channels based on clear rules.
    """
    return await app.ai(
        system="Decide alert channels based on severity. "
               "Critical → PagerDuty + Slack. High → Slack only. Medium/Low → no alert.",
        user=f"Severity: {severity}\nServices: {', '.join(affected_services)}\n{reasoning}",
        schema=AlertDecision
    )

if __name__ == "__main__":
    app.serve(port=8002)
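
The routing rules in the prompt are simple enough to spell out deterministically. The sketch below mirrors them in plain Python; it is a reference for checking the AI's decisions, not part of the agent:

```python
from typing import List, Tuple

def route_alert(severity: str) -> Tuple[bool, List[str]]:
    """Mirror of the prompt's rules: critical → PagerDuty + Slack, high → Slack, else no alert."""
    rules = {
        "critical": (True, ["pagerduty", "slack"]),
        "high": (True, ["slack"]),
    }
    return rules.get(severity, (False, []))

assert route_alert("critical") == (True, ["pagerduty", "slack"])
assert route_alert("high") == (True, ["slack"])
assert route_alert("medium") == (False, [])
```

Using a reasoner instead of this table buys flexibility (the AI can weigh service criticality and context), at the cost of determinism; comparing its output against the table is a cheap regression check.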

Update Log Intelligence Agent for Conditional Escalation

Add this to the log-intelligence/main.py orchestrator:

@app.reasoner()
async def analyze_logs(logs: List[dict]) -> dict:
    """Orchestrator with conditional cross-agent escalation."""

    # Step 1: Detect anomalies (internal call with await)
    anomaly = await detect_anomaly(logs)

    if not anomaly.is_anomaly:
        return {"status": "healthy", "message": "No anomalies detected"}

    # Step 2: Assess impact (internal call with await)
    impact = await assess_impact(logs, anomaly)

    # Step 3: Conditional escalation to external agent (use app.call)
    alert_decision = None
    if impact.severity in ["critical", "high"]:
        alert_decision = await app.call(
            "alert-manager.create_alert",
            severity=impact.severity,
            affected_services=impact.affected_services,
            reasoning=anomaly.reasoning
        )

    return {
        "status": "anomaly_detected",
        "anomaly": {
            "is_anomaly": anomaly.is_anomaly,
            "confidence": anomaly.confidence,
            "reasoning": anomaly.reasoning
        },
        "impact": {
            "severity": impact.severity,
            "affected_services": impact.affected_services
        },
        "alert": alert_decision if alert_decision else {"status": "no_alert_needed"}
    }
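
Cross-agent calls cross a process (and often a network) boundary, so it is worth bounding them. A defensive pattern, assuming app.call returns an ordinary awaitable; the timeout value and fallback payload here are illustrative:

```python
import asyncio

async def call_with_timeout(awaitable, timeout_s: float, fallback):
    """Await a cross-agent call, returning a fallback instead of failing the workflow."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback

async def slow_remote_call():
    await asyncio.sleep(1.0)  # Stand-in for a hung downstream agent
    return {"should_alert": True}

result = asyncio.run(
    call_with_timeout(slow_remote_call(), 0.05, {"status": "alert_timeout"})
)
assert result == {"status": "alert_timeout"}
```

Note that asyncio.wait_for cancels the awaitable on timeout, so the slow call does not keep running in the background.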

Run the Complete System

# Terminal 1: Control plane
af server

# Terminal 2: Log Intelligence Agent
cd log-intelligence
af run

# Terminal 3: Alert Manager Agent
cd alert-manager
af run

# Terminal 4: Test the full workflow
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "CRITICAL", "message": "Database connection pool exhausted"},
        {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"},
        {"timestamp": "2024-01-15T10:00:07Z", "service": "payment-service", "level": "ERROR", "message": "Cannot connect to billing-service"}
      ]
    }
  }'

Full Response with Alert Escalation:

{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.95,
      "reasoning": "Cascading failure originating from user-service database exhaustion, propagating to api-gateway, billing-service, and payment-service"
    },
    "impact": {
      "severity": "critical",
      "affected_services": ["api-gateway", "user-service", "billing-service", "payment-service"]
    },
    "alert": {
      "should_alert": true,
      "channels": ["pagerduty", "slack"],
      "priority": "P1"
    }
  },
  "execution_id": "exec_abc123"
}

What happened:

  1. analyze_logs() orchestrator received logs
  2. Called detect_anomaly() (same agent, used await)
  3. Detected anomaly → called assess_impact() (same agent, used await)
  4. Impact was "critical" → called alert-manager.create_alert (different agent, used app.call())
  5. Alert Manager decided to alert via PagerDuty + Slack
  6. Complete result returned with alert decision

View the Workflow DAG: Open http://localhost:8080 → Executions → Click on exec_abc123 → See the complete execution graph:

analyze_logs (log-intelligence)
  ├─ detect_anomaly (log-intelligence) [internal await]
  ├─ assess_impact (log-intelligence) [internal await]
  └─ create_alert (alert-manager) [cross-agent app.call]

Why This Matters

Traditional Approach:

# Hundreds of lines of brittle rules
if error_count > THRESHOLD_1 and service in CRITICAL_SERVICES:
    if time_window < THRESHOLD_2:
        if error_rate > THRESHOLD_3:
            # Complex correlation logic...
            if cascading_detected():
                # Alert routing logic...
                if severity == "critical" and on_call_hours():
                    send_pagerduty()
                elif severity == "high":
                    send_slack()
                # ... more conditionals

Autonomous Approach:

# AI reasons about the situation
anomaly = await detect_anomaly(logs)  # AI detects patterns
impact = await assess_impact(logs, anomaly)  # AI assesses severity
alert = await app.call("alert-manager.create_alert", ...)  # AI routes alerts

Benefits:

  • No hardcoded thresholds - AI adapts to patterns
  • Intelligent correlation - AI understands cascading failures
  • Context-aware - AI considers service dependencies
  • Reduced false positives - AI filters noise
  • Evolves automatically - AI learns from new patterns

Testing Different Scenarios

Scenario 1: No Anomaly (Healthy System)

curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "INFO", "message": "Request processed successfully"},
        {"timestamp": "2024-01-15T10:00:01Z", "service": "user-service", "level": "INFO", "message": "User authenticated"}
      ]
    }
  }'

Response:

{
  "result": {
    "status": "healthy",
    "message": "No anomalies detected",
    "confidence": 0.98
  }
}

Scenario 2: Medium Severity (No Alert)

curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "cache-service", "level": "WARN", "message": "Cache miss rate elevated"},
        {"timestamp": "2024-01-15T10:00:02Z", "service": "cache-service", "level": "WARN", "message": "Eviction rate increasing"}
      ]
    }
  }'

Response:

{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.78,
      "reasoning": "Cache performance degradation detected"
    },
    "impact": {
      "severity": "medium",
      "affected_services": ["cache-service"]
    },
    "alert": {
      "status": "no_alert_needed"
    }
  }
}

Note: Medium severity didn't trigger alert escalation. The orchestrator only calls the Alert Manager when the AI-assessed severity is high or critical.


Production Considerations

This is a learning example. Before production, consider:

What's Missing:

  1. Historical context - Anomaly detection needs baseline metrics, time-series analysis
  2. Error handling - Retry logic, timeout handling, fallback responses
  3. Rate limiting - Prevent API overload during log spikes
  4. Observability - Metrics, tracing, structured logging for the agent itself
  5. Schema validation - Stricter Pydantic validation with enums and constraints

Quick Improvements:

from enum import Enum
from typing import List

from pydantic import BaseModel, Field

class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class ImpactAssessment(BaseModel):
    severity: Severity  # Enum instead of a free-form string
    affected_services: List[str] = Field(min_length=1)  # Pydantic v2; use min_items=1 on v1

Why Model Selection Matters:

  • gpt-4o for anomaly detection: ~$0.005/request
  • gpt-4o-mini for classification: ~$0.0015/request
  • At scale: 1M classifications/month = $1,500 vs $5,000 (70% savings)
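
The arithmetic behind those figures, at the per-request rates quoted above:

```python
# Monthly cost at 1M requests, using the per-request rates above
requests = 1_000_000
cost_fast = 0.0015 * requests  # gpt-4o-mini
cost_big = 0.005 * requests    # gpt-4o
savings = (cost_big - cost_fast) / cost_big

assert round(cost_fast) == 1500
assert round(cost_big) == 5000
assert round(savings, 2) == 0.7  # 70% saved on requests routed to the fast model
```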

Next Steps

You just built autonomous software that:

  • Uses multiple reasoners in one agent node
  • Calls reasoners with await for internal workflows
  • Uses app.call() for cross-agent communication
  • Makes conditional decisions with AI
  • Replaces complex rule engines with intelligence

Build autonomous software, not rule engines. Start with intelligent reasoners, orchestrate with workflows, and scale with multiple agents.