Generate Media

Generate images, video, and audio with AI

Agentfield provides unified methods for generating images, video, and audio, and for transcribing speech. Each method routes the request to the correct provider based on the model name's prefix.

Setup

Set your API keys:

# For DALL-E and OpenAI TTS
export OPENAI_API_KEY="sk-..."

# For Fal.ai (Flux images, video, Whisper)
export FAL_KEY="..."

Or configure in code:

from agentfield import Agent, AIConfig

app = Agent(
    node_id="media-agent",
    ai_config=AIConfig(
        fal_api_key="...",  # Optional - falls back to FAL_KEY env var
        video_model="fal-ai/minimax-video/image-to-video"  # Default video model
    )
)
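The comment on fal_api_key says the key falls back to the FAL_KEY environment variable. The sketch below shows that precedence: an explicit config value wins, and the environment is checked only when no value is configured. It uses a hypothetical helper, resolve_fal_key. It is not Agentfield's internal code, and raising an error when neither source is set is an assumption about how a missing key should be reported.

```python
import os
from typing import Optional


def resolve_fal_key(configured: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to FAL_KEY."""
    # A non-empty key passed in code (e.g. AIConfig(fal_api_key=...)) wins.
    if configured:
        return configured
    # Otherwise fall back to the FAL_KEY environment variable.
    env_key = os.environ.get("FAL_KEY")
    if env_key:
        return env_key
    # Neither source is set: fail early with an actionable message.
    raise RuntimeError("No Fal.ai key found: pass fal_api_key or set FAL_KEY")
```

Calling such a check once at startup makes a deployment fail fast on a missing key, so the error does not surface later on the first media request.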

Generate Images

# Fal.ai - Flux (fast, high quality)
result = await app.ai_with_vision(
    "A cyberpunk city at night",
    model="fal-ai/flux/schnell"  # Fast
)
result.images[0].save("city.png")

# DALL-E 3 (via LiteLLM)
result = await app.ai_with_vision(
    "A serene mountain landscape",
    model="dall-e-3",
    size="1792x1024",
    quality="hd"
)

# OpenRouter
result = await app.ai_with_vision(
    "Abstract art",
    model="openrouter/google/gemini-2.5-flash-image-preview"
)
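The size and quality arguments in the DALL-E 3 example are assumed to be forwarded unchanged to OpenAI through LiteLLM. OpenAI's DALL-E 3 accepts only the sizes 1024x1024, 1792x1024, and 1024x1792, and the quality values standard and hd. A local check can therefore catch a typo before a paid request is sent. The check_dalle3_params helper below is hypothetical and is not part of Agentfield.

```python
# Sizes and quality levels OpenAI documents for DALL-E 3.
DALLE3_SIZES = frozenset({"1024x1024", "1792x1024", "1024x1792"})
DALLE3_QUALITIES = frozenset({"standard", "hd"})


def check_dalle3_params(size: str = "1024x1024", quality: str = "standard") -> None:
    """Hypothetical pre-flight check: raise ValueError if DALL-E 3 would reject the input."""
    if size not in DALLE3_SIZES:
        raise ValueError(
            f"unsupported DALL-E 3 size {size!r}; use one of {sorted(DALLE3_SIZES)}"
        )
    if quality not in DALLE3_QUALITIES:
        raise ValueError(
            f"unsupported DALL-E 3 quality {quality!r}; use 'standard' or 'hd'"
        )
```

For example, call check_dalle3_params("1792x1024", "hd") just before the ai_with_vision call shown above.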

Generate Audio (TTS)

# OpenAI TTS
result = await app.ai_with_audio(
    "Hello, welcome to the presentation.",
    voice="nova",  # alloy, echo, fable, onyx, nova, shimmer
    model="tts-1-hd"
)
result.audio.save("greeting.mp3")
result.audio.play()  # Requires pygame
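The voice comment above lists six OpenAI TTS voices. A small check such as the hypothetical normalize_voice below catches a misspelled voice name before any request is sent. The tuple mirrors that comment only. OpenAI may offer additional voices for some models, so treat the list as an assumption and extend it as needed.

```python
# Voices listed in the TTS example above (assumption: extend if your model supports more).
OPENAI_TTS_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def normalize_voice(voice: str) -> str:
    """Hypothetical helper: trim and lower-case a voice name, then validate it."""
    name = voice.strip().lower()
    if name not in OPENAI_TTS_VOICES:
        raise ValueError(
            f"unknown voice {voice!r}; expected one of {', '.join(OPENAI_TTS_VOICES)}"
        )
    return name
```

You could then pass voice=normalize_voice(user_choice) to ai_with_audio so that user input such as " Nova " still resolves to a valid voice.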

Generate Video

# Image-to-video (default model)
result = await app.ai_generate_video(
    "Camera slowly zooms in on the landscape",
    image_url="https://example.com/image.jpg"
)
result.files[0].save("video.mp4")

# Text-to-video
result = await app.ai_generate_video(
    "A cat playing with yarn",
    model="fal-ai/kling-video/v1/standard"
)
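The first video call relies on the default model set through AIConfig(video_model=...), while the second call overrides that default. The hypothetical resolve_video_model below sketches this precedence. It also adds a heuristic that is an assumption, not documented Agentfield behavior: a model ID containing image-to-video, such as the default one, is treated as requiring an image_url.

```python
from typing import Optional

# Default value taken from the AIConfig example in Setup.
DEFAULT_VIDEO_MODEL = "fal-ai/minimax-video/image-to-video"


def resolve_video_model(
    model: Optional[str],
    image_url: Optional[str],
    default: str = DEFAULT_VIDEO_MODEL,
) -> str:
    """Hypothetical helper: a per-call model overrides the configured default."""
    chosen = model or default
    # Heuristic (assumption): Fal.ai image-to-video model IDs contain "image-to-video".
    if "image-to-video" in chosen and not image_url:
        raise ValueError(f"{chosen} expects an image_url")
    return chosen
```

With no model and no image_url, this sketch raises an error instead of silently sending a text-only prompt to an image-to-video model.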

Transcribe Audio (STT)

# Basic transcription
result = await app.ai_transcribe_audio(
    "https://example.com/recording.mp3"
)
print(result.text)

# Faster transcription with a language hint
result = await app.ai_transcribe_audio(
    "https://example.com/spanish.mp3",
    model="fal-ai/wizper",  # 2x faster than Whisper
    language="es"  # Spanish
)
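The language hint in the second call is a short language code ("es" for Spanish). The hypothetical normalize_language_hint helper below reduces inputs such as "es-ES", "ES", or "pt_BR" to a two-letter lowercase code before the call. It assumes two things the example suggests but does not state: the parameter expects two-letter ISO 639-1 codes, and omitting the hint (None) leaves language detection to the model.

```python
import re
from typing import Optional

# Two lowercase ASCII letters, e.g. "es" or "en".
_LANG_CODE = re.compile(r"^[a-z]{2}$")


def normalize_language_hint(language: Optional[str]) -> Optional[str]:
    """Hypothetical helper: reduce a locale tag to a two-letter code; None passes through."""
    if language is None:
        return None
    # Keep only the primary subtag: "es-ES" -> "es", "pt_BR" -> "pt".
    code = language.strip().lower().replace("_", "-").split("-")[0]
    if not _LANG_CODE.match(code):
        raise ValueError(f"expected a two-letter language code, got {language!r}")
    return code
```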

Provider Routing

Methods automatically route to providers based on model prefix:

Model prefix   Provider     Methods
fal-ai/        Fal.ai       Image, Video, Audio, Transcription
openrouter/    OpenRouter   Image
(default)      LiteLLM      Image, Audio (TTS)
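The routing rules in the table above can be expressed as a prefix lookup with a LiteLLM fallback. The route function and its method names ("image", "video", "audio", "transcription") are hypothetical. They only mirror the table and are not Agentfield's actual dispatcher.

```python
from typing import FrozenSet, Tuple

# (prefix, provider, supported methods), mirroring the routing table.
ROUTES: Tuple[Tuple[str, str, FrozenSet[str]], ...] = (
    ("fal-ai/", "Fal.ai", frozenset({"image", "video", "audio", "transcription"})),
    ("openrouter/", "OpenRouter", frozenset({"image"})),
)
# Models without a recognized prefix go to LiteLLM (image generation and TTS audio).
DEFAULT_ROUTE: Tuple[str, FrozenSet[str]] = ("LiteLLM", frozenset({"image", "audio"}))


def route(model: str, method: str) -> str:
    """Return the provider for `model`, or raise if that provider cannot serve `method`."""
    for prefix, provider, methods in ROUTES:
        if model.startswith(prefix):
            break
    else:
        # No prefix matched: fall back to LiteLLM.
        provider, methods = DEFAULT_ROUTE
    if method not in methods:
        raise ValueError(f"{provider} does not support {method!r} (model {model!r})")
    return provider
```

Keeping the rules in a single tuple means a new provider needs only one new row. The for/else construct falls through to LiteLLM only when no prefix matches.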