Documentation Chatbot

Production RAG system built like a FastAPI app—the same chatbot powering this documentation site

Real-world example: this chatbot powers the AgentField documentation website itself and is battle-tested in production.

Building production RAG systems typically requires setting up vector databases, embedding services, API gateways, and complex deployment pipelines. With AgentField, you just write functions—like building a FastAPI application.

Architecture Comparison

Traditional RAG systems require multiple services, complex setup, and manual infrastructure management:

An AgentField RAG system, by contrast, is just functions with built-in memory; no external services are needed.

Core Pattern: It's Just Functions

Memory Usage (No Setup)

Use app.memory directly—no vector database setup, no configuration:

from agentfield import Agent

app = Agent(node_id="documentation-chatbot")

# Get a memory scope - works immediately
global_memory = app.memory.global_scope

# Store vectors directly ("embedding" is a precomputed vector from
# whatever embedding model you use)
await global_memory.set_vector(
    key="doc:chunk-1",
    embedding=embedding,
    metadata={"text": "Documentation content...", "path": "docs/guide.md"}
)

# Search directly - no connection setup needed
results = await global_memory.similarity_search(
    query_embedding=query_embedding,  # embedding of the user's query
    top_k=10
)

Memory is distributed and works across agents automatically. No connection pooling, no retry logic—just use it.

Router Organization (FastAPI-style)

Organize code with AgentRouter—just like FastAPI:

from agentfield import Agent
from routers import qa_router, ingestion_router, retrieval_router

app = Agent(node_id="documentation-chatbot")

# Include routers - auto-generates REST APIs
app.include_router(qa_router)
app.include_router(ingestion_router)
app.include_router(retrieval_router)

# That's it - endpoints are ready:
# POST /api/v1/execute/documentation-chatbot.qa_answer_with_documents
# POST /api/v1/execute/documentation-chatbot.ingest_folder

Parallel Execution (Just asyncio)

Run multiple retrievals in parallel—standard Python patterns:

from agentfield import AgentRouter
import asyncio

retrieval_router = AgentRouter(tags=["retrieval"])

@retrieval_router.reasoner()
async def parallel_retrieve(queries: list[str], namespace: str = "documentation"):
    """Execute multiple queries in parallel for 3x speed improvement."""
    global_memory = retrieval_router.memory.global_scope

    # Create tasks for parallel execution
    tasks = [
        _retrieve_for_query(global_memory, query, namespace)
        for query in queries
    ]

    # Execute all queries concurrently
    all_results = await asyncio.gather(*tasks)

    # Merge and deduplicate
    return deduplicate_results(all_results)
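
The snippet above assumes two helpers: `_retrieve_for_query`, which wraps one `similarity_search` call per query, and `deduplicate_results`, which merges the per-query hits. A pure-Python sketch of the latter, assuming each hit is a dict with `key` and `score` fields (field names hypothetical), might look like:

```python
def deduplicate_results(all_results: list[list[dict]]) -> list[dict]:
    """Merge per-query result lists, keeping the best score per chunk key."""
    best: dict[str, dict] = {}
    for results in all_results:
        for result in results:
            key = result["key"]
            # Keep whichever query scored this chunk highest
            if key not in best or result["score"] > best[key]["score"]:
                best[key] = result
    # Return merged results ordered by descending score
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)
```

Because each parallel query can surface the same chunk, deduplicating by key before synthesis keeps the context window free of repeats.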

What You Don't Need

  • ❌ Vector database setup (Pinecone, Weaviate, Chroma)
  • ❌ Embedding service configuration
  • ❌ Connection pooling and retry logic
  • ❌ API gateway setup
  • ❌ Scaling configuration
  • ❌ Manual service discovery

What You Get

  • ✅ Production-ready REST APIs (auto-generated from reasoners)
  • ✅ Distributed memory (works across agents automatically)
  • ✅ Parallel execution (just use asyncio.gather)
  • ✅ Workflow tracking (automatic DAG visualization)
  • ✅ Inline citations (Perplexity-style [A][B] references)

Quick Usage

1. Ingest Documentation

# POST to /api/v1/execute/documentation-chatbot.ingest_folder
{
  "input": {
    "folder_path": "~/docs",
    "namespace": "product-docs",
    "chunk_size": 1200,
    "chunk_overlap": 250
  }
}

The ingestion reasoner chunks files, generates embeddings, and stores them in memory—all in one function.
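
The chunking step can be sketched as a plain function. Treating `chunk_size` and `chunk_overlap` as character counts is an assumption here; the real reasoner may count tokens instead:

```python
def chunk_text(text: str, chunk_size: int = 1200, chunk_overlap: int = 250) -> list[str]:
    """Split text into overlapping chunks; each chunk starts
    chunk_size - chunk_overlap characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which improves retrieval recall at the cost of some duplicate storage.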

2. Ask Questions

# POST to /api/v1/execute/documentation-chatbot.qa_answer_with_documents
{
  "input": {
    "question": "How does parallel execution work?",
    "namespace": "product-docs",
    "top_k": 6,
    "min_score": 0.35
  }
}

Response:

{
  "answer": "Parallel execution runs multiple queries concurrently using `asyncio.gather` [A]. This provides a 3x speed improvement over sequential retrieval [A].",
  "citations": [
    {
      "key": "A",
      "relative_path": "docs/core-concepts/parallel-execution.md",
      "start_line": 42,
      "end_line": 58,
      "score": 0.87
    }
  ],
  "confidence": "high"
}
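
The response shape can be modeled with stdlib dataclasses. Field names are taken from the example above; the exact schema the chatbot uses internally is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    key: str             # inline marker used in the answer, e.g. "A"
    relative_path: str   # source file the cited passage came from
    start_line: int
    end_line: int
    score: float         # similarity score of the cited chunk

@dataclass
class QAResponse:
    answer: str
    citations: list[Citation] = field(default_factory=list)
    confidence: str = "low"  # "low" | "medium" | "high"
```

Carrying file paths and line ranges in each citation is what lets the UI link an inline `[A]` marker back to the exact lines of documentation it came from.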

Architecture: 3-Reasoner System

The chatbot uses a simple 3-step pipeline:

  1. Query Planner - Generates diverse search queries from user question
  2. Parallel Retrievers - Executes all queries concurrently (3x faster)
  3. Self-Aware Synthesizer - Generates answer with inline citations and confidence assessment

The orchestrator wires the three steps together:

@qa_router.reasoner()
async def qa_answer_with_documents(question: str, namespace: str = "documentation"):
    """Main QA orchestrator - simple 3-step pipeline."""

    # Step 1: Plan queries
    plan = await plan_queries(question)

    # Step 2: Parallel retrieval (3x speed improvement)
    chunks = await parallel_retrieve(plan.queries, namespace)

    # Step 3: Synthesize answer with citations
    answer = await synthesize_answer(question, chunks)

    return answer

Each step is a reasoner—an AI-powered function that automatically becomes a REST API endpoint.
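
`plan_queries` and `synthesize_answer` are AI-powered reasoners in the real system. The plan's expected shape, plus a trivial deterministic stand-in that is handy for local testing, might look like this (all names hypothetical):

```python
from dataclasses import dataclass

@dataclass
class QueryPlan:
    queries: list[str]

def plan_queries_stub(question: str, max_queries: int = 3) -> QueryPlan:
    """Deterministic stand-in for the AI query planner: the original
    question plus simple keyword variants for broader recall."""
    keywords = " ".join(w for w in question.rstrip("?").split() if len(w) > 3)
    variants = [question, keywords, f"{keywords} example"]
    return QueryPlan(queries=variants[:max_queries])
```

The real planner produces genuinely diverse reformulations; the point of the stub is only that anything returning a `QueryPlan` can be swapped in while testing the retrieval and synthesis steps.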

Key Features

  • Parallel Retrieval: 3-5 queries executed concurrently for comprehensive coverage
  • Document-Aware: Retrieves full documentation pages, not just isolated chunks
  • Self-Aware Synthesis: Automatically assesses answer completeness and triggers refinement if needed
  • Inline Citations: Perplexity-style references [A][B] with file paths and line numbers
  • Two-Tier Storage: roughly 70% storage savings by storing each document once; chunks hold only a reference to it instead of duplicating text
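
The two-tier idea (the full document stored once, chunks holding only a parent reference plus character offsets) can be sketched with plain dicts; all key names here are hypothetical:

```python
def store_two_tier(store: dict, doc_key: str, text: str,
                   chunk_spans: list[tuple[int, int]]) -> None:
    """Store the full document once; each chunk records only its
    parent key and character offsets instead of duplicating text."""
    store[doc_key] = {"text": text}
    for i, (start, end) in enumerate(chunk_spans):
        store[f"{doc_key}:chunk-{i}"] = {"doc": doc_key, "start": start, "end": end}

def chunk_text_of(store: dict, chunk_key: str) -> str:
    """Resolve a chunk back to its text via the parent document."""
    chunk = store[chunk_key]
    return store[chunk["doc"]]["text"][chunk["start"]:chunk["end"]]
```

Because chunks overlap for retrieval quality, storing only offsets rather than the overlapping text itself is where the storage savings come from.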

Full Implementation

See the complete codebase on GitHub:

View on GitHub

Complete implementation with all routers, schemas, and utilities