# Documentation Chatbot

Production RAG built like a FastAPI app—no vector DB setup, no complex architecture.

**Real-world example:** This chatbot powers the AgentField documentation website itself. It's battle-tested in production.
Building production RAG systems typically requires setting up vector databases, embedding services, API gateways, and complex deployment pipelines. With AgentField, you just write functions—like building a FastAPI application.
## Architecture Comparison

Traditional RAG systems require multiple services, complex setup, and manual infrastructure management. An AgentField RAG system is just functions with built-in memory—no external services needed.
## Core Pattern: It's Just Functions

### Memory Usage (No Setup)

Use `app.memory` directly—no vector database setup, no configuration:
```python
from agentfield import Agent

app = Agent(node_id="documentation-chatbot")

# Get memory scope - works immediately
global_memory = app.memory.global_scope

# Store vectors directly
await global_memory.set_vector(
    key="doc:chunk-1",
    embedding=embedding,
    metadata={"text": "Documentation content...", "path": "docs/guide.md"}
)

# Search directly - no connection setup needed
results = await global_memory.similarity_search(
    query_embedding=query_embedding,
    top_k=10
)
```

Memory is distributed and works across agents automatically. No connection pooling, no retry logic—just use it.
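Conceptually, a similarity search ranks every stored embedding by its closeness to the query embedding and returns the top matches. A minimal pure-Python sketch of that ranking using cosine similarity (an illustration only, not AgentField's actual implementation):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(store: dict[str, list[float]], query: list[float], top_k: int) -> list[str]:
    """Rank stored embeddings by cosine similarity to the query, return top_k keys."""
    scored = sorted(
        ((cosine_similarity(vec, query), key) for key, vec in store.items()),
        reverse=True,
    )
    return [key for _, key in scored[:top_k]]

store = {
    "doc:chunk-1": [1.0, 0.0],
    "doc:chunk-2": [0.0, 1.0],
    "doc:chunk-3": [0.7, 0.7],
}
print(rank_by_similarity(store, [1.0, 0.1], top_k=2))
# → ['doc:chunk-1', 'doc:chunk-3']
```

Production vector search uses approximate-nearest-neighbor indexes rather than a full scan, but the scoring idea is the same.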
### Router Organization (FastAPI-style)

Organize code with `AgentRouter`—just like FastAPI:
```python
from agentfield import Agent
from routers import qa_router, ingestion_router, retrieval_router

app = Agent(node_id="documentation-chatbot")

# Include routers - auto-generates REST APIs
app.include_router(qa_router)
app.include_router(ingestion_router)
app.include_router(retrieval_router)

# That's it - endpoints are ready:
# POST /api/v1/execute/documentation-chatbot.qa_answer_with_documents
# POST /api/v1/execute/documentation-chatbot.ingest_folder
```

### Parallel Execution (Just asyncio)
Run multiple retrievals in parallel—standard Python patterns:
```python
from agentfield import AgentRouter
import asyncio

retrieval_router = AgentRouter(tags=["retrieval"])

@retrieval_router.reasoner()
async def parallel_retrieve(queries: list[str], namespace: str = "documentation"):
    """Execute multiple queries in parallel for a 3x speed improvement."""
    global_memory = retrieval_router.memory.global_scope

    # Create tasks for parallel execution
    tasks = [
        _retrieve_for_query(global_memory, query, namespace)
        for query in queries
    ]

    # Execute all queries concurrently
    all_results = await asyncio.gather(*tasks)

    # Merge and deduplicate
    return deduplicate_results(all_results)
```

## What You Don't Need
- ❌ Vector database setup (Pinecone, Weaviate, Chroma)
- ❌ Embedding service configuration
- ❌ Connection pooling and retry logic
- ❌ API gateway setup
- ❌ Scaling configuration
- ❌ Manual service discovery
## What You Get

- ✅ Production-ready REST APIs (auto-generated from reasoners)
- ✅ Distributed memory (works across agents automatically)
- ✅ Parallel execution (just use `asyncio.gather`)
- ✅ Workflow tracking (automatic DAG visualization)
- ✅ Inline citations (Perplexity-style `[A][B]` references)
## Quick Usage

### 1. Ingest Documentation
`POST /api/v1/execute/documentation-chatbot.ingest_folder`:

```json
{
  "input": {
    "folder_path": "~/docs",
    "namespace": "product-docs",
    "chunk_size": 1200,
    "chunk_overlap": 250
  }
}
```

The ingestion reasoner chunks files, generates embeddings, and stores them in memory—all in one function.
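The `chunk_size` and `chunk_overlap` parameters control how files are split: each chunk carries a slice of the previous one so context isn't lost at boundaries. A minimal character-based sketch of what these parameters mean (the actual ingestion reasoner may split on token or markdown boundaries instead):

```python
def chunk_text(text: str, chunk_size: int = 1200, chunk_overlap: int = 250) -> list[str]:
    """Split text into chunks of chunk_size characters, where consecutive
    chunks share chunk_overlap characters of context."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "".join(str(i % 10) for i in range(3000))
chunks = chunk_text(doc)
print(len(chunks))  # → 4
print(chunks[0][-250:] == chunks[1][:250])  # → True: overlap preserved
```

Overlap trades some storage for retrieval quality: a sentence that straddles a chunk boundary still appears whole in at least one chunk.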
### 2. Ask Questions
`POST /api/v1/execute/documentation-chatbot.qa_answer_with_documents`:

```json
{
  "input": {
    "question": "How does parallel execution work?",
    "namespace": "product-docs",
    "top_k": 6,
    "min_score": 0.35
  }
}
```

Response:

```json
{
  "answer": "Parallel execution runs multiple queries concurrently using `asyncio.gather` [A]. This provides a 3x speed improvement over sequential retrieval [A].",
  "citations": [
    {
      "key": "A",
      "relative_path": "docs/core-concepts/parallel-execution.md",
      "start_line": 42,
      "end_line": 58,
      "score": 0.87
    }
  ],
  "confidence": "high"
}
```

## Architecture: 3-Reasoner System
The chatbot uses a simple 3-step pipeline:
1. **Query Planner** - generates diverse search queries from the user question
2. **Parallel Retrievers** - execute all queries concurrently (3x faster)
3. **Self-Aware Synthesizer** - generates the answer with inline citations and a confidence assessment
```python
@qa_router.reasoner()
async def qa_answer_with_documents(question: str, namespace: str = "documentation"):
    """Main QA orchestrator - simple 3-step pipeline."""
    # Step 1: Plan queries
    plan = await plan_queries(question)

    # Step 2: Parallel retrieval (3x speed improvement)
    chunks = await parallel_retrieve(plan.queries, namespace)

    # Step 3: Synthesize answer with citations
    answer = await synthesize_answer(question, chunks)
    return answer
```

Each step is a reasoner—an AI-powered function that automatically becomes a REST API endpoint.
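Helpers like `deduplicate_results` in the parallel-retrieval example are plain Python rather than framework features. A hedged sketch of one possible implementation, assuming each result dict carries `key` and `score` fields (the real helper may differ):

```python
def deduplicate_results(all_results: list[list[dict]]) -> list[dict]:
    """Merge per-query result lists, keeping the highest-scoring hit for each
    key, and return hits sorted by score descending."""
    best: dict[str, dict] = {}
    for results in all_results:
        for hit in results:
            key = hit["key"]
            if key not in best or hit["score"] > best[key]["score"]:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)

# Two queries returned overlapping chunks; keep the best copy of each
merged = deduplicate_results([
    [{"key": "doc:chunk-1", "score": 0.9}, {"key": "doc:chunk-2", "score": 0.5}],
    [{"key": "doc:chunk-1", "score": 0.7}, {"key": "doc:chunk-3", "score": 0.8}],
])
print([h["key"] for h in merged])
# → ['doc:chunk-1', 'doc:chunk-3', 'doc:chunk-2']
```

Deduplicating by memory key keeps the chunk list compact before synthesis, so overlapping queries don't waste context-window space on repeats.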
## Key Features

- Parallel Retrieval: 3-5 queries executed concurrently for comprehensive coverage
- Document-Aware: retrieves full documentation pages, not just isolated chunks
- Self-Aware Synthesis: automatically assesses answer completeness and triggers refinement if needed
- Inline Citations: Perplexity-style `[A][B]` references with file paths and line numbers
- Two-Tier Storage: 70% storage savings by storing documents once; chunks reference them
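The inline-citation scheme labels each retrieved chunk with a letter key so the synthesized answer can reference it as `[A]`, `[B]`, and so on, as in the response example above. A minimal sketch of that labeling step (a hypothetical helper, not the chatbot's actual code):

```python
from string import ascii_uppercase

def assign_citation_keys(chunks: list[dict]) -> list[dict]:
    """Label retrieved chunks A, B, C, ... in rank order, producing the
    citation records the answer's inline [A][B] references point at."""
    return [
        {"key": ascii_uppercase[i], **chunk}
        for i, chunk in enumerate(chunks[:len(ascii_uppercase)])
    ]

citations = assign_citation_keys([
    {"relative_path": "docs/core-concepts/parallel-execution.md", "score": 0.87},
    {"relative_path": "docs/guide.md", "score": 0.64},
])
print([c["key"] for c in citations])
# → ['A', 'B']
```

Because the keys are assigned in rank order, `[A]` always points at the strongest supporting source for the answer.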
## Full Implementation
See the complete codebase on GitHub:
View on GitHub
Complete implementation with all routers, schemas, and utilities