Deployment Overview

Understanding Agentfield's deployment architecture and patterns

How to deploy Agentfield from localhost to enterprise scale

Traditional agent deployments require Redis for state, Kafka for async jobs, load balancers for routing, and service meshes for observability.

Agentfield deployments need PostgreSQL. That's it.

The architecture eliminates infrastructure complexity while enabling enterprise-scale patterns: horizontal scaling, independent agent deployment, and complete observability—all built into the control plane.


The Architecture Advantage

Stateless Control Plane

Written in Go. Shares nothing but Postgres. Scale from 1 to N instances behind a load balancer with zero coordination overhead. Each instance handles requests independently.
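
Because instances share nothing but Postgres, a fronting load balancer needs no session affinity. A minimal nginx sketch (the instance hostnames are assumptions, not Agentfield defaults):

```nginx
# Any instance can serve any request: no sticky sessions needed.
upstream agentfield_cp {
    server cp-1.internal:8080;
    server cp-2.internal:8080;
    server cp-3.internal:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://agentfield_cp;
    }
}
```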

Independent Agent Scaling

Each agent node is a separate service. Marketing team deploys their agents. Support team deploys theirs. No monolithic coordination. Control plane discovers and routes automatically.

Routing Through Control Plane

Agents never call each other directly. All app.call() requests route through the control plane, which enables automatic load balancing, complete DAG visibility, circuit breaking, and context propagation.
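
To see why central routing pays for itself, here is a toy model (deliberately not the Agentfield SDK; all names are illustrative) where load balancing and DAG tracing fall out of routing every call through one place:

```python
# Toy model of control-plane routing (not the Agentfield SDK).
# Because every call passes through one router, round-robin load
# balancing and call-graph (DAG) recording come for free.
class ControlPlane:
    def __init__(self):
        self.replicas = {}    # target name -> list of handlers (replicas)
        self.dag_edges = []   # recorded (caller, target) edges = the DAG
        self.counters = {}    # per-target round-robin position

    def register(self, name, handler):
        self.replicas.setdefault(name, []).append(handler)

    def call(self, caller, target, payload):
        self.dag_edges.append((caller, target))   # DAG visibility
        pool = self.replicas[target]
        i = self.counters.get(target, 0)          # round-robin balancing
        self.counters[target] = i + 1
        return pool[i % len(pool)](payload)

cp = ControlPlane()
cp.register("analytics.count", lambda p: {"n": len(p["items"])})
result = cp.call("support-agent", "analytics.count", {"items": [1, 2, 3]})
print(result)        # {'n': 3}
print(cp.dag_edges)  # [('support-agent', 'analytics.count')]
```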

Database-Backed Everything

Async execution queue: Postgres. Distributed locks: Postgres. Memory fabric: Postgres. Webhook state: Postgres. No Redis, Kafka, or RabbitMQ. One dependency.
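
As an illustration of the database-backed queue approach (the table and column names below are assumptions, not Agentfield's actual schema), plain Postgres supports safe multi-worker job claiming via `FOR UPDATE SKIP LOCKED`, the standard pattern for this:

```sql
-- Illustrative sketch: claim one pending job without blocking
-- other workers. SKIP LOCKED ensures concurrent workers never
-- grab the same row.
WITH next_job AS (
  SELECT id FROM async_jobs
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
UPDATE async_jobs j
SET status = 'running', started_at = now()
FROM next_job
WHERE j.id = next_job.id
RETURNING j.id, j.payload;
```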


Deployment Patterns

Choose based on your scale and team structure:

Local Development

When: Solo prototyping, rapid iteration
Database: BoltDB (embedded, zero-config)
Scale: Single machine

# Start control plane with embedded database
af server

# Run your agent in another terminal
cd my-agent
af run

What you get:

  • Zero infrastructure setup
  • Hot reload for agent code
  • Local web UI at http://localhost:8080
  • BoltDB auto-initializes in ~/.agentfield/

Limitations: Single writer (BoltDB constraint), no horizontal scaling


Docker Compose

When: Team development, CI/CD testing
Database: PostgreSQL (containerized)
Scale: Multiple agents, single control plane

# docker-compose.yml (available on GitHub)
services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: agentfield
      POSTGRES_USER: agentfield
      POSTGRES_PASSWORD: change-in-production

  control-plane:
    image: agentfield/control-plane:latest
    ports: ["8080:8080"]
    environment:
      AGENTFIELD_DATABASE_URL: postgres://agentfield:change-in-production@postgres:5432/agentfield
    depends_on: [postgres]

  agent-support:
    build: ./agents/support
    environment:
      AGENTFIELD_SERVER: http://control-plane:8080

  agent-analytics:
    build: ./agents/analytics
    environment:
      AGENTFIELD_SERVER: http://control-plane:8080

What you get:

  • Isolated environments per team
  • Multiple agent nodes in one stack
  • PostgreSQL persistence
  • Reproducible deployments

Official compose file: github.com/Agent-Field/agentfield/deployments/docker/docker-compose.yml


Managed Platforms

When: Production without Kubernetes complexity
Database: Managed Postgres (add-on)
Scale: Horizontal, auto-scaling

Supported platforms:

  • Railway (auto-detected via RAILWAY_ENVIRONMENT)
  • Render (auto-detected via RENDER_SERVICE_NAME)
  • Heroku (auto-detected via DYNO)
  • Fly.io (manual config)

# Railway example
railway init
railway add postgresql  # Managed Postgres add-on
railway up              # Deploys control plane

# Deploy agent (separate service)
cd agents/support
railway up

What you get:

  • Zero ops: platform handles scaling, SSL, monitoring
  • Persistent agent processes (not serverless cold starts)
  • Cost-effective for most workloads
  • Auto-detected by Agentfield SDK

Why not Lambda/Cloud Functions? Agents maintain HTTP servers and heartbeat connections. Managed platforms (Railway, Render) run persistent processes at lower cost than Lambda with provisioned concurrency.


Serverless (Lambda, Cloud Functions, Cloud Run)

When: Bursty workloads, pay-per-use agents, zero ops runtime
Database: Managed Postgres for the control plane (persistent state still lives there)
Scale: Auto-scales with function provider; cold starts apply

# 1) Build a handler with the SDK + an adapter for your platform's event shape
#    Python: handler = lambda event, ctx: app.handle_serverless(event, adapter)
#    TS: export const handler = agent.handler((event) => normalize(event))
#    Go: Handler() for HTTP, or HandleServerlessEvent(ctx, event, adapter) for raw payloads

# 2) Deploy the function (Lambda/Cloud Run/Cloud Functions)

# 3) Register the function URL with the control plane
af nodes register-serverless --url https://<function-url>

# 4) Invoke normally through the gateway
curl -X POST "$CP/api/v1/reasoners/<node_id>.<reasoner>" -d '{"input":{...}}'

What you get:

  • No heartbeats or always-on servers; control plane calls /discover then /execute on demand
  • Full workflow/run IDs propagated into serverless executions; adapters let you align any event blob
  • Cross-agent calls still route through the control plane (app.call("other-node.fn")) even in serverless
  • Great for spiky or low-duty-cycle agents; consider managed (non-serverless) for steady traffic

Kubernetes

When: Enterprise scale, multi-region, compliance
Database: Managed Postgres (AWS RDS, Cloud SQL, Azure) or operator
Scale: Unlimited horizontal, HPA-ready

# Control plane: Stateless deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentfield-control-plane
spec:
  replicas: 3  # Scale horizontally
  selector:
    matchLabels:
      app: agentfield-cp
  template:
    metadata:
      labels:
        app: agentfield-cp
    spec:
      containers:
      - name: control-plane
        image: agentfield/control-plane:latest
        env:
        - name: AGENTFIELD_DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: url
---
# Agent: Independent deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: support-agent
spec:
  replicas: 5  # Scale based on agent load
  selector:
    matchLabels:
      app: support-agent
  template:
    metadata:
      labels:
        app: support-agent
    spec:
      containers:
      - name: agent
        image: your-org/support-agent:latest
        env:
        - name: AGENTFIELD_SERVER
          value: "http://agentfield-cp-service:8080"

What you get:

  • Control plane scales independently of agents
  • Each agent type scales based on its load
  • Standard K8s patterns (HPA, ingress, service mesh)
  • Multi-region with shared Postgres



Database Setup

Agentfield uses your database as the single source of truth—all state, workflow executions, and agent registry data live there.

Local Development: BoltDB

af server  # Auto-creates ~/.agentfield/agentfield.bolt

Zero-config embedded database for solo development. Limitation: single process only.

Production: PostgreSQL

export AGENTFIELD_DATABASE_URL="postgres://user:pass@host:5432/agentfield"
af server  # Auto-runs migrations

Required for team development and production. Enables horizontal scaling and multi-writer support.

Managed Postgres providers:

  • AWS RDS, Google Cloud SQL, Azure Database
  • Railway, Render, Heroku (platform add-ons)

For advanced database tuning (connection pools, indexes, read replicas), see Database Performance Guide.


Environment Variables for Production

Control Plane

Essential:

# Storage
AGENTFIELD_STORAGE_MODE=postgres
AGENTFIELD_DATABASE_URL="postgres://user:pass@db.example.com:5432/agentfield?sslmode=require"

# Server
AGENTFIELD_PORT=8080
AGENTFIELD_MODE=cloud

# Performance
AGENTFIELD_STORAGE_POSTGRES_MAX_CONNECTIONS=100
AGENTFIELD_STORAGE_POSTGRES_MAX_IDLE_CONNECTIONS=25

# Security
AGENTFIELD_API_CORS_ALLOWED_ORIGINS=https://app.company.com,https://admin.company.com

Optional (with sensible defaults):

AGENTFIELD_UI_ENABLED=true
AGENTFIELD_STORAGE_POSTGRES_CONNECTION_TIMEOUT=60s
AGENTFIELD_STORAGE_POSTGRES_QUERY_TIMEOUT=60s

Python Agents

Essential:

# Control plane connectivity
AGENTFIELD_SERVER=https://agentfield.company.com
AGENT_CALLBACK_URL=http://agent-service:8000  # Critical for containers
PORT=8000

# AI providers
OPENAI_API_KEY=sk-proj-...
# OR
ANTHROPIC_API_KEY=sk-ant-...

Performance tuning:

# Multi-core scaling
UVICORN_WORKERS=4  # Set to (2 × CPU cores) + 1

# Async execution
AGENTFIELD_ASYNC_MAX_CONCURRENT_EXECUTIONS=2048
AGENTFIELD_ASYNC_CONNECTION_POOL_SIZE=128
AGENTFIELD_ASYNC_BATCH_SIZE=200

# Logging (set to INFO or WARNING in production)
AGENTFIELD_LOG_LEVEL=INFO
AGENTFIELD_LOG_TRUNCATE=500
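
The (2 × CPU cores) + 1 rule above can be computed at container start rather than hardcoded. A small sketch (note that os.cpu_count() may overstate the CPU quota inside containers, so an env override is kept):

```python
# Sketch: derive a worker count from available CPUs using the
# (2 x cores) + 1 rule, with UVICORN_WORKERS as an explicit override.
import os

def uvicorn_workers(default_cores: int = 1) -> int:
    cores = os.cpu_count() or default_cores
    return 2 * cores + 1

workers = int(os.environ.get("UVICORN_WORKERS", uvicorn_workers()))
print(workers)
```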

Go Agents

Essential:

# AI integration
OPENAI_API_KEY=sk-proj-...
# OR for OpenRouter (multi-provider access)
OPENROUTER_API_KEY=sk-or-v1-...

# Model selection
AI_MODEL=gpt-4o  # or anthropic/claude-3-5-sonnet for OpenRouter

See Environment Variables Reference for the complete list.


Production Checklist

Before deploying to production, verify:

Infrastructure

  • PostgreSQL 15+ provisioned (managed or self-hosted)
  • Database connection pooling configured (min 25 connections)
  • Postgres backups enabled (point-in-time recovery)
  • Load balancer in front of control plane (if running multiple instances)

Security

  • TLS enabled for control plane endpoints
  • Database credentials in secrets (not environment variables)
  • Agent callback URLs use HTTPS (not HTTP)
  • DID/VC feature enabled for audit trails (features.did_vc.enabled: true)
  • Webhook HMAC secrets configured for callback authentication
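
For the webhook HMAC item above, verification follows the generic pattern below (the hex-digest scheme and payload shown are assumptions, not Agentfield specifics): recompute the signature over the raw body and compare in constant time.

```python
# Generic webhook HMAC verification sketch. Always compare with
# hmac.compare_digest to avoid timing side channels.
import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = b"webhook-secret"
body = b'{"run_id": "abc", "status": "completed"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_webhook(body, sig, secret))         # True
print(verify_webhook(body, "deadbeef", secret))  # False
```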

Configuration

  • AGENTFIELD_DATABASE_URL set on control plane
  • AGENTFIELD_SERVER set on all agent nodes
  • AGENT_CALLBACK_URL explicitly set (don't rely on auto-detection in production)
  • Queue worker count tuned for workload (execution_queue.workers)
  • Execution retention configured (cleanup.retention_hours)

Observability

  • Metrics endpoint exposed (/api/v1/metrics)
  • Health check configured on load balancer (/health)
  • Agent heartbeat interval tuned (agent_registry.heartbeat_interval)
  • Workflow DAG visualization accessible (control plane UI)
  • Log aggregation configured (stdout/stderr to ELK, Datadog, etc.)

Testing

  • Load test async execution queue (verify backpressure)
  • Test control plane failover (kill instance, verify requests route to others)
  • Test agent scaling (verify load balancing across replicas)
  • Verify webhook delivery and retries
  • Test database failover (if using HA Postgres)



The architecture eliminates complexity. One database. Stateless services. Independent scaling. Deploy with confidence.