Build Your First Agent

Deploy autonomous software that thinks—replace complex architectures with intelligent agents

Replace thousands of lines of if-else logic with autonomous agents that think, correlate, and act.

You'll build: A self-healing log intelligence system that detects anomalies, correlates patterns across services, and creates intelligent alerts—all without hardcoded rules.

What makes this different:

  • Multiple reasoners in one agent calling each other to form complex workflows
  • AI replaces complex rule engines and correlation logic
  • Autonomous decision-making with conditional escalation
  • Real APIs, not chatbots

The Autonomous Log Intelligence Agent

Start the Control Plane

af server

Opens Web UI at http://localhost:8080

Create the Log Intelligence Agent

af init log-intelligence --language python
cd log-intelligence

Edit main.py to build an agent with multiple reasoners that call each other:

from agentfield import Agent
from pydantic import BaseModel
from typing import List

# Configure global default: use powerful model for complex reasoning
app = Agent(
    node_id="log-intelligence",
    default_model="gpt-4o"  # Complex reasoning by default
)

# Simple schemas: 1-3 fields only
class AnomalyResult(BaseModel):
    is_anomaly: bool
    confidence: float
    reasoning: str

class ImpactAssessment(BaseModel):
    severity: str  # low, medium, high, critical
    affected_services: List[str]

@app.reasoner()
async def detect_anomaly(logs: List[dict]) -> AnomalyResult:
    """
    Complex reasoning: Uses default model (gpt-4o) for pattern detection.
    Requires deep analysis of correlations and cascading failures.
    """
    log_summary = "\n".join([
        f"[{log['timestamp']}] {log['service']}: {log['level']} - {log['message']}"
        for log in logs
    ])

    # Uses default_model (gpt-4o) - no override needed for complex tasks
    return await app.ai(
        system="You are an expert at detecting anomalies in system logs. "
               "Look for unusual patterns, error spikes, cascading failures, "
               "or correlated issues across services.",
        user=f"Analyze these logs:\n\n{log_summary}",
        schema=AnomalyResult
    )

@app.reasoner()
async def assess_impact(logs: List[dict], anomaly: AnomalyResult) -> ImpactAssessment:
    """
    Simple classification: Override with faster model (gpt-4o-mini).
    Just needs to map severity levels - no complex reasoning required.
    """
    services = list(set(log['service'] for log in logs))
    error_logs = [log for log in logs if log['level'] in ['ERROR', 'CRITICAL']]

    # Override with faster model for simple classification
    return await app.ai(
        model="gpt-4o-mini",  # Simple task → fast model
        system="Classify severity as: low, medium, high, or critical. "
               "Based on error count and service criticality.",
        user=f"Services: {', '.join(services)}\nErrors: {len(error_logs)}\n"
             f"Anomaly: {anomaly.reasoning}",
        schema=ImpactAssessment
    )

if __name__ == "__main__":
    app.serve(port=8001)
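
The response schemas are plain Pydantic models, so you can sanity-check them without the agent runtime. A minimal standalone check (assumes Pydantic v2's `model_validate`; the payloads are illustrative):

```python
from pydantic import BaseModel, ValidationError

class AnomalyResult(BaseModel):
    is_anomaly: bool
    confidence: float
    reasoning: str

# A well-formed AI response parses into a typed object
result = AnomalyResult.model_validate({
    "is_anomaly": True,
    "confidence": 0.92,
    "reasoning": "Error spike across services",
})
assert result.confidence == 0.92

# A malformed response fails loudly instead of propagating bad data
try:
    AnomalyResult.model_validate({"is_anomaly": True})
    raise AssertionError("expected a validation error")
except ValidationError as err:
    missing = {e["loc"][0] for e in err.errors()}
    print(sorted(missing))  # ['confidence', 'reasoning']
```

This is why the reasoners declare `schema=`: structurally invalid model output is rejected at the boundary rather than flowing into downstream logic.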

Test Individual Reasoners

# Terminal 1: Start the agent
af run

# Terminal 2: Test anomaly detection
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.detect_anomaly \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "ERROR", "message": "Database connection pool exhausted"},
        {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"}
      ]
    }
  }'

Response:

{
  "result": {
    "is_anomaly": true,
    "confidence": 0.92,
    "reasoning": "Cascading failure detected: user-service database exhaustion causing downstream failures in api-gateway and billing-service"
  }
}

You just built an AI-powered anomaly detector that replaces traditional rule-based monitoring.


Model Selection Strategy

Key principle: Use powerful models for complex reasoning, fast models for simple tasks.

# Global default for the agent
app = Agent(
    node_id="log-intelligence",
    default_model="gpt-4o"  # Complex reasoning by default
)

# Complex reasoning → uses default (gpt-4o)
await app.ai(
    system="Detect complex patterns...",
    schema=AnomalyResult  # No model override
)

# Simple classification → override with fast model
await app.ai(
    model="gpt-4o-mini",  # Override for simple tasks
    system="Classify severity...",
    schema=ImpactAssessment
)

When to use each:

  • gpt-4o (default): Pattern detection, correlation analysis, complex decisions
  • gpt-4o-mini (override): Classification, formatting, simple lookups
  • Result: ~3x cost savings on simple tasks, optimal performance on complex ones
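
One way to keep this policy in a single place is a small routing helper. This is an illustrative sketch, not part of the SDK; the task names and the two-model split are assumptions:

```python
# Illustrative routing table: complex reasoning → default model, simple tasks → fast model
COMPLEX_MODEL = "gpt-4o"
FAST_MODEL = "gpt-4o-mini"

SIMPLE_TASKS = {"classification", "formatting", "lookup"}

def pick_model(task: str) -> str:
    """Return the cheaper model for well-bounded tasks, the default otherwise."""
    return FAST_MODEL if task in SIMPLE_TASKS else COMPLEX_MODEL

assert pick_model("classification") == "gpt-4o-mini"
assert pick_model("pattern-detection") == "gpt-4o"
```

Centralizing the choice makes it easy to audit which reasoners run on which model as the agent grows.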

Orchestrating Multiple Reasoners

Now add an orchestrator reasoner that calls the other reasoners to form a complete workflow. Same-agent calls use await directly:

@app.reasoner()
async def analyze_logs(logs: List[dict]) -> dict:
    """
    Orchestrator: Calls other reasoners in this agent to form a workflow.

    Same-agent calls use 'await function()' directly—no app.call needed.
    """

    # Step 1: Detect anomalies (call internal reasoner)
    anomaly = await detect_anomaly(logs)

    if not anomaly.is_anomaly:
        return {
            "status": "healthy",
            "message": "No anomalies detected",
            "confidence": anomaly.confidence
        }

    # Step 2: Assess impact (call another internal reasoner)
    impact = await assess_impact(logs, anomaly)

    # Step 3: Return comprehensive analysis
    return {
        "status": "anomaly_detected",
        "anomaly": {
            "is_anomaly": anomaly.is_anomaly,
            "confidence": anomaly.confidence,
            "reasoning": anomaly.reasoning
        },
        "impact": {
            "severity": impact.severity,
            "affected_services": impact.affected_services
        }
    }
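
Because reasoners are ordinary async functions, the orchestrator's branching can be exercised offline with stubs standing in for the AI calls. The stub logic below is invented for illustration; only the control flow mirrors the orchestrator:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AnomalyResult:
    is_anomaly: bool
    confidence: float
    reasoning: str

async def detect_anomaly(logs):
    # Stub for the AI call: flag any batch containing ERROR-level logs
    errors = [log for log in logs if log["level"] == "ERROR"]
    return AnomalyResult(bool(errors), 0.9 if errors else 0.99, "stub reasoning")

async def analyze_logs(logs):
    # Same control flow as the orchestrator: early return on healthy input
    anomaly = await detect_anomaly(logs)
    if not anomaly.is_anomaly:
        return {"status": "healthy", "confidence": anomaly.confidence}
    return {"status": "anomaly_detected", "confidence": anomaly.confidence}

healthy = asyncio.run(analyze_logs([{"level": "INFO"}]))
detected = asyncio.run(analyze_logs([{"level": "ERROR"}]))
assert healthy["status"] == "healthy"
assert detected["status"] == "anomaly_detected"
```

This keeps workflow tests fast and deterministic; the AI-dependent behavior is covered separately by the live curl tests.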

Test the complete workflow:

curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "ERROR", "message": "Database connection pool exhausted"},
        {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"}
      ]
    }
  }'

Response:

{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.92,
      "reasoning": "Cascading failure: user-service database exhaustion causing downstream failures"
    },
    "impact": {
      "severity": "critical",
      "affected_services": ["api-gateway", "user-service", "billing-service"]
    }
  }
}

What happened: The orchestrator called detect_anomaly(), then assess_impact(), and returned a comprehensive analysis, all within a single agent node.


Cross-Agent Communication

For critical issues, escalate to a separate Alert Manager agent using app.call():

Create the Alert Manager Agent

# In a new directory
af init alert-manager --language python
cd alert-manager

Edit main.py:

from agentfield import Agent
from pydantic import BaseModel
from typing import List

# Fast model for simple alert routing logic
app = Agent(
    node_id="alert-manager",
    default_model="gpt-4o-mini"  # Simple routing → fast model
)

class AlertDecision(BaseModel):
    should_alert: bool
    channels: List[str]  # slack, pagerduty, email
    priority: str

@app.reasoner()
async def create_alert(
    severity: str,
    affected_services: List[str],
    reasoning: str
) -> AlertDecision:
    """
    Simple routing logic: fast model is sufficient.
    Maps severity → channels based on clear rules.
    """
    return await app.ai(
        system="Decide alert channels based on severity. "
               "Critical → PagerDuty + Slack. High → Slack only. Medium/Low → no alert.",
        user=f"Severity: {severity}\nServices: {', '.join(affected_services)}\n{reasoning}",
        schema=AlertDecision
    )

if __name__ == "__main__":
    app.serve(port=8002)
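
The routing rules in the prompt are simple enough to spell out deterministically. The sketch below mirrors them in plain Python; it is a reference for checking the AI's decisions, not part of the agent:

```python
from typing import List, Tuple

def route_alert(severity: str) -> Tuple[bool, List[str]]:
    """Mirror of the prompt's rules: critical → PagerDuty + Slack, high → Slack, else no alert."""
    rules = {
        "critical": (True, ["pagerduty", "slack"]),
        "high": (True, ["slack"]),
    }
    return rules.get(severity, (False, []))

assert route_alert("critical") == (True, ["pagerduty", "slack"])
assert route_alert("high") == (True, ["slack"])
assert route_alert("medium") == (False, [])
```

Using a reasoner instead of this table buys flexibility (the AI can weigh service criticality and context), at the cost of determinism; comparing its output against the table is a cheap regression check.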

Update Log Intelligence Agent for Conditional Escalation

Add this to the log-intelligence/main.py orchestrator:

@app.reasoner()
async def analyze_logs(logs: List[dict]) -> dict:
    """Orchestrator with conditional cross-agent escalation."""

    # Step 1: Detect anomalies (internal call with await)
    anomaly = await detect_anomaly(logs)

    if not anomaly.is_anomaly:
        return {"status": "healthy", "message": "No anomalies detected"}

    # Step 2: Assess impact (internal call with await)
    impact = await assess_impact(logs, anomaly)

    # Step 3: Conditional escalation to external agent (use app.call)
    alert_decision = None
    if impact.severity in ["critical", "high"]:
        alert_decision = await app.call(
            "alert-manager.create_alert",
            severity=impact.severity,
            affected_services=impact.affected_services,
            reasoning=anomaly.reasoning
        )

    return {
        "status": "anomaly_detected",
        "anomaly": {
            "is_anomaly": anomaly.is_anomaly,
            "confidence": anomaly.confidence,
            "reasoning": anomaly.reasoning
        },
        "impact": {
            "severity": impact.severity,
            "affected_services": impact.affected_services
        },
        "alert": alert_decision if alert_decision else {"status": "no_alert_needed"}
    }
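
Cross-agent calls cross a process (and often a network) boundary, so it is worth bounding them. A defensive pattern, assuming app.call returns an ordinary awaitable; the timeout value and fallback payload here are illustrative:

```python
import asyncio

async def call_with_timeout(awaitable, timeout_s: float, fallback):
    """Await a cross-agent call, returning a fallback instead of failing the workflow."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback

async def slow_remote_call():
    await asyncio.sleep(1.0)  # Stand-in for a hung downstream agent
    return {"should_alert": True}

result = asyncio.run(
    call_with_timeout(slow_remote_call(), 0.05, {"status": "alert_timeout"})
)
assert result == {"status": "alert_timeout"}
```

Note that asyncio.wait_for cancels the awaitable on timeout, so the slow call does not keep running in the background.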

Run the Complete System

# Terminal 1: Control plane
af server

# Terminal 2: Log Intelligence Agent
cd log-intelligence
af run

# Terminal 3: Alert Manager Agent
cd alert-manager
af run

# Terminal 4: Test the full workflow
curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:02Z", "service": "user-service", "level": "CRITICAL", "message": "Database connection pool exhausted"},
        {"timestamp": "2024-01-15T10:00:03Z", "service": "api-gateway", "level": "ERROR", "message": "Connection timeout to user-service"},
        {"timestamp": "2024-01-15T10:00:05Z", "service": "billing-service", "level": "ERROR", "message": "Failed to fetch user data"},
        {"timestamp": "2024-01-15T10:00:07Z", "service": "payment-service", "level": "ERROR", "message": "Cannot connect to billing-service"}
      ]
    }
  }'

Full Response with Alert Escalation:

{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.95,
      "reasoning": "Cascading failure originating from user-service database exhaustion, propagating to api-gateway, billing-service, and payment-service"
    },
    "impact": {
      "severity": "critical",
      "affected_services": ["api-gateway", "user-service", "billing-service", "payment-service"]
    },
    "alert": {
      "should_alert": true,
      "channels": ["pagerduty", "slack"],
      "priority": "P1"
    }
  },
  "execution_id": "exec_abc123"
}

What happened:

  1. analyze_logs() orchestrator received logs
  2. Called detect_anomaly() (same agent, used await)
  3. Detected anomaly → called assess_impact() (same agent, used await)
  4. Impact was "critical" → called alert-manager.create_alert (different agent, used app.call())
  5. Alert Manager decided to alert via PagerDuty + Slack
  6. Complete result returned with alert decision

View the Workflow DAG: Open http://localhost:8080 → Executions → Click on exec_abc123 → See the complete execution graph:

analyze_logs (log-intelligence)
  ├─ detect_anomaly (log-intelligence) [internal await]
  ├─ assess_impact (log-intelligence) [internal await]
  └─ create_alert (alert-manager) [cross-agent app.call]

Why This Matters

Traditional Approach:

# Hundreds of lines of brittle rules
if error_count > THRESHOLD_1 and service in CRITICAL_SERVICES:
    if time_window < THRESHOLD_2:
        if error_rate > THRESHOLD_3:
            # Complex correlation logic...
            if cascading_detected():
                # Alert routing logic...
                if severity == "critical" and on_call_hours():
                    send_pagerduty()
                elif severity == "high":
                    send_slack()
                # ... more conditionals

Autonomous Approach:

# AI reasons about the situation
anomaly = await detect_anomaly(logs)  # AI detects patterns
impact = await assess_impact(logs, anomaly)  # AI assesses severity
alert = await app.call("alert-manager.create_alert", ...)  # AI routes alerts

Benefits:

  • No hardcoded thresholds - AI adapts to patterns
  • Intelligent correlation - AI understands cascading failures
  • Context-aware - AI considers service dependencies
  • Reduced false positives - AI filters noise
  • Evolves automatically - AI learns from new patterns

Testing Different Scenarios

Scenario 1: No Anomaly (Healthy System)

curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "api-gateway", "level": "INFO", "message": "Request processed successfully"},
        {"timestamp": "2024-01-15T10:00:01Z", "service": "user-service", "level": "INFO", "message": "User authenticated"}
      ]
    }
  }'

Response:

{
  "result": {
    "status": "healthy",
    "message": "No anomalies detected",
    "confidence": 0.98
  }
}

Scenario 2: Medium Severity (No Alert)

curl -X POST http://localhost:8080/api/v1/execute/log-intelligence.analyze_logs \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "logs": [
        {"timestamp": "2024-01-15T10:00:00Z", "service": "cache-service", "level": "WARN", "message": "Cache miss rate elevated"},
        {"timestamp": "2024-01-15T10:00:02Z", "service": "cache-service", "level": "WARN", "message": "Eviction rate increasing"}
      ]
    }
  }'

Response:

{
  "result": {
    "status": "anomaly_detected",
    "anomaly": {
      "is_anomaly": true,
      "confidence": 0.78,
      "reasoning": "Cache performance degradation detected"
    },
    "impact": {
      "severity": "medium",
      "affected_services": ["cache-service"]
    },
    "alert": {
      "status": "no_alert_needed"
    }
  }
}

Note: Medium severity didn't trigger alert escalation. The orchestrator only calls the Alert Manager when the AI-assessed severity is high or critical.


Production Considerations

This is a learning example. Before production, consider:

What's Missing:

  1. Historical context - Anomaly detection needs baseline metrics, time-series analysis
  2. Error handling - Retry logic, timeout handling, fallback responses
  3. Rate limiting - Prevent API overload during log spikes
  4. Observability - Metrics, tracing, structured logging for the agent itself
  5. Schema validation - Stricter Pydantic validation with enums and constraints

Quick Improvements:

from enum import Enum
from typing import List

from pydantic import BaseModel, Field

class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class ImpactAssessment(BaseModel):
    severity: Severity  # Enum instead of a free-form string
    affected_services: List[str] = Field(min_length=1)  # Pydantic v2; use min_items=1 on v1

Why Model Selection Matters:

  • gpt-4o for anomaly detection: ~$0.005/request
  • gpt-4o-mini for classification: ~$0.0015/request
  • At scale: 1M classifications/month = $1,500 vs $5,000 (70% savings)
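
The arithmetic behind those figures, at the per-request rates quoted above:

```python
# Monthly cost at 1M requests, using the per-request rates above
requests = 1_000_000
cost_fast = 0.0015 * requests  # gpt-4o-mini
cost_big = 0.005 * requests    # gpt-4o
savings = (cost_big - cost_fast) / cost_big

assert round(cost_fast) == 1500
assert round(cost_big) == 5000
assert round(savings, 2) == 0.7  # 70% saved on requests routed to the fast model
```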

Next Steps

You just built autonomous software that:

  • Uses multiple reasoners in one agent node
  • Calls reasoners with await for internal workflows
  • Uses app.call() for cross-agent communication
  • Makes conditional decisions with AI
  • Replaces complex rule engines with intelligence

Build autonomous software, not rule engines. Start with intelligent reasoners, orchestrate with workflows, and scale with multiple agents.