Coding Agent Harness
Dispatch multi-turn coding tasks to external coding agents from within your Agentfield workflows
Under Active Development
The harness is being built as part of Epic #208. APIs and behavior described here reflect the design spec and may change before the stable release.
app.ai() is excellent at what it does: a single-turn call to a language model that returns structured output. You give it a prompt, it gives you back text or a typed Pydantic object. Fast, cheap, predictable.
But some tasks can't be solved in one turn.
Imagine you need an agent to find a bug in a codebase, trace it through three files, write a fix, run the test suite, and confirm the tests pass. That's not a single LLM call. That's a multi-turn session where the agent browses files, edits code, executes commands, reads output, and iterates. No amount of prompt engineering turns app.ai() into that.
That's what .harness() is for.
What .harness() Does
app.harness() dispatches a task to an external coding agent — Claude Code, Codex, Gemini CLI, or OpenCode — and waits for the result. The coding agent runs its full agentic loop: reading files, editing code, running tests, checking output, trying again if something fails. When it's done, .harness() returns a HarnessResult with the agent's output, metrics, and optionally a validated schema instance.
It's the bridge between your Agentfield workflow and the coding agents you already know.
```python
from agentfield import Agent, HarnessConfig

app = Agent(
    node_id="my-agent",
    harness_config=HarnessConfig(
        provider="claude-code",
        model="sonnet",
    ),
)

result = await app.harness("Fix the auth bug in src/auth.py")
print(result.text)
```

The coding agent handles everything in between: browsing the file, understanding the bug, writing the fix, running tests. Your code just dispatches the task and receives the result.
ai() vs harness()
These two methods solve different problems. Knowing when to use each is the key to building effective workflows.
| | app.ai() | app.harness() |
|---|---|---|
| Turns | Single turn | Multi-turn session |
| What it does | Calls an LLM, returns output | Runs a full coding agent loop |
| File access | No | Yes — reads, edits, creates files |
| Command execution | No | Yes — runs tests, builds, scripts |
| Iteration | No | Yes — agent retries until done |
| Latency | Seconds | Minutes |
| Cost | Low | Higher (many LLM calls internally) |
| Best for | Classification, extraction, summarization, routing | Bug fixes, refactors, code generation, test writing |
The mental model: app.ai() is a smart function call. app.harness() is hiring a contractor to do a job.
How It Works
You call app.harness() with a prompt
Your prompt describes the task in plain language. You can optionally pass a working directory, a Pydantic or Zod schema for structured output, tool permissions, and a cost cap.
```python
result = await app.harness(
    "Refactor the database layer to use async/await throughout",
    cwd="/my/project",
    max_turns=100,
    max_budget_usd=3.0,
)
```

Agentfield selects the provider and builds the execution context
The HarnessRunner resolves configuration (constructor defaults merged with per-call overrides), validates the provider, and prepares the prompt. If you passed a schema, it appends output requirements to the prompt instructing the agent to write a JSON file.
The coding agent executes
The external coding agent runs its full loop. It browses your codebase, edits files, runs commands, reads output, and iterates. This can take seconds or minutes depending on the task. The agent has access to the tools you've permitted: Read, Write, Edit, Bash, Glob, Grep.
Results come back as HarnessResult
When the agent finishes, .harness() returns a HarnessResult containing the agent's text output, execution metrics (cost, turns, duration), and — if you passed a schema — a validated typed instance.
```python
print(result.text)       # Agent's summary
print(result.cost_usd)   # What it cost
print(result.num_turns)  # How many iterations
print(result.parsed)     # Typed schema instance (if schema was passed)
```

The Provider Model
Agentfield supports four coding agent providers. You pick one when configuring your agent.
| Provider | Integration | Notes |
|---|---|---|
| claude-code | Native Python/TypeScript SDK | In-process, no binary dependency |
| codex | CLI subprocess (Python) / Native SDK (TypeScript) | OpenAI's coding agent |
| gemini | CLI subprocess | Gemini CLI with JSON output |
| opencode | CLI subprocess | Open-source coding agent |
Claude Code uses the claude_agent_sdk in Python and @anthropic-ai/claude-agent-sdk in TypeScript — running the agent in-process with no subprocess overhead. Codex, Gemini, and OpenCode run as CLI subprocesses, parsing their JSONL event streams.
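Parsing a JSONL event stream is conceptually simple: read stdout line by line, decode each line as JSON, and accumulate the events you care about. A hedged sketch of the idea; the `turn_completed` and `result` event names are made up for illustration, and each CLI defines its own event vocabulary:

```python
import json


def parse_events(jsonl_output: str) -> tuple[str, int]:
    """Illustrative sketch: extract (final_text, num_turns) from a JSONL event stream."""
    final_text = ""
    num_turns = 0
    for line in jsonl_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise on stdout
        if event.get("type") == "turn_completed":
            num_turns += 1
        elif event.get("type") == "result":
            final_text = event.get("text", "")
    return final_text, num_turns
```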
Each provider requires its own authentication setup. See Provider Requirements for installation and auth details.
Schema-Constrained Output
When you need structured data back from a coding task, pass a Pydantic model (Python) or Zod schema (TypeScript). The harness instructs the coding agent to write its output as JSON to a file in the working directory, then reads and validates it after the session completes.
```python
from pydantic import BaseModel

class RefactorResult(BaseModel):
    files_changed: list[str]
    summary: str
    tests_added: bool
    breaking_changes: bool

result = await app.harness(
    "Refactor the auth module to use the new token format",
    schema=RefactorResult,
    cwd="/my/project",
)

print(result.parsed.files_changed)  # ["src/auth.py", "tests/test_auth.py"]
print(result.parsed.tests_added)    # True
```

This approach works identically across all four providers — the agent writes a file, the harness reads it. No provider-specific schema flags, no token limit issues with large schemas.
If the output doesn't validate on the first read, the harness applies an escalating recovery strategy: cosmetic repair (stripping markdown fences, fixing trailing commas), a follow-up prompt in the same session, and finally a full retry. In practice, the agent almost always gets it right the first time.
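Cosmetic repair, the first recovery step, typically amounts to stripping non-JSON wrapping before re-parsing. A minimal sketch of the idea, not the harness's actual repair code:

```python
import json
import re


def repair_json(raw: str) -> dict:
    """Illustrative sketch: make almost-valid JSON output parseable."""
    text = raw.strip()
    # Strip a surrounding markdown code fence like ```json ... ```
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fence:
        text = fence.group(1)
    # Remove trailing commas before a closing brace or bracket
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)
```

Only if this fails does the harness escalate to the more expensive steps, which involve additional LLM calls.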
When to Use harness() vs ai()
Use app.ai() when:
- You need to classify, extract, or summarize content
- The task fits in a single prompt-response exchange
- You need fast, cheap, predictable output
- You're routing decisions or generating structured data for downstream logic
Use app.harness() when:
- The task requires reading multiple files to understand context
- You need the agent to edit code and verify the result
- The work involves running tests or build commands
- The task is open-ended enough that iteration is expected
- You're automating something a developer would do manually
A common pattern is using app.ai() to analyze and plan, then app.harness() to execute:
```python
@app.reasoner()
async def fix_github_issue(issue: dict) -> dict:
    # ai() to understand and plan (fast, cheap)
    plan = await app.ai(
        system="You are a senior engineer. Analyze this issue and describe the fix needed.",
        user=f"Issue: {issue['title']}\n\n{issue['body']}",
        schema=FixPlan,
    )

    # harness() to execute the plan (multi-turn, agentic)
    result = await app.harness(
        f"Implement this fix:\n\n{plan.description}",
        schema=FixResult,
        cwd=issue["repo_path"],
        max_turns=150,
    )
    return result.model_dump()
```

Quick Examples
```python
from agentfield import Agent, HarnessConfig
from pydantic import BaseModel

app = Agent(
    node_id="code-agent",
    harness_config=HarnessConfig(
        provider="claude-code",
        model="sonnet",
    ),
)

class BugFix(BaseModel):
    files_changed: list[str]
    summary: str
    tests_added: bool

@app.reasoner()
async def fix_issue(issue: dict) -> dict:
    fix = await app.harness(
        f"Fix: {issue['title']}\n\n{issue['description']}",
        schema=BugFix,
        cwd=issue["repo_path"],
        max_turns=100,
        tools=["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
    )
    return fix.model_dump()
```

```typescript
import { Agent } from "@agentfield/sdk";
import { z } from "zod";

const agent = new Agent({
  nodeId: "code-agent",
  harnessConfig: {
    provider: "claude-code",
    model: "sonnet",
  },
});

const BugFix = z.object({
  filesChanged: z.array(z.string()),
  summary: z.string(),
  testsAdded: z.boolean(),
});

agent.reasoner("fixIssue", async (issue: Record<string, string>) => {
  const fix = await agent.harness(
    `Fix: ${issue.title}\n\n${issue.description}`,
    {
      schema: BugFix,
      cwd: issue.repoPath,
      maxTurns: 100,
    }
  );
  return fix.parsed;
});
```

Related
- Python SDK: app.harness() — Full parameter reference and return types
- TypeScript SDK: agent.harness() — TypeScript API reference
- Provider Requirements — Installation, authentication, and per-provider configuration