The unified inference API for generative video,
image, and audio. Sub-second cold starts, 100+ models.
Video generation from text or image
High-quality video synthesis
Speech-to-text transcription
Voice synthesis and cloning
Infer cut our inference costs by 60% while improving latency. The unified API means we ship features 3× faster.
Models are always warm. No waiting for containers to spin up.
From 1 to 10 million calls. No provisioning, no config — just hit the API.
SOC2 compliant. Multi-region redundancy. Real-time monitoring.
Flux, SDXL, Runway, Whisper, ElevenLabs. Same API shape.
No commitments. Transparent per-inference pricing.
Inference runs closest to your users. US, EU, APAC.
Always-warm replicas, co-located weights, and a streaming protocol that returns tokens as they're decoded — not after the full response lands. No cold starts, no queueing, no babysitting.
→ PERF PANEL · FLUX-SCHNELL · 1024×1024 · STEPS=4
→ WINDOW · ROLLING LAST 24 HOURS · ALL REGIONS
→ MEASURED · END-TO-END (API CALL → FINAL BYTE)
Type-safe SDKs. OpenAPI spec. Streaming. Webhooks. Everything you need to ship fast — and nothing you don't.
import infer

# Initialize with your API key
client = infer.Client()

# Generate with any of 100+ models
result = client.run("flux-schnell", {
    "prompt": "A futuristic city at sunset",
    "width": 1024,
    "height": 1024
})

# That's it. No infra, no queues.
print(result.url)
import Infer from "@infer/sdk";

// Initialize with your API key
const client = new Infer();

// Generate with any of 100+ models
const result = await client.run("flux-schnell", {
  prompt: "A futuristic city at sunset",
  width: 1024,
  height: 1024,
});

// Type-safe. Streaming-ready.
console.log(result.url);
package main

import (
	"fmt"

	"github.com/infer/sdk-go"
)

func main() {
	client := infer.NewClient()

	// Run any of 100+ models
	result, _ := client.Run("flux-schnell", infer.Params{
		Prompt: "A futuristic city at sunset",
		Width:  1024,
		Height: 1024,
	})
	fmt.Println(result.URL)
}
# Works with any HTTP client
curl https://api.infer.sh/v1/run \
  -H "Authorization: Bearer $INFER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-schnell",
    "prompt": "A futuristic city at sunset",
    "width": 1024,
    "height": 1024
  }'

# Response streams back over HTTP/2
# X-Infer-Latency: 1243ms
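Because responses stream back as they're decoded, clients can consume events incrementally instead of buffering the full body. Here is a minimal sketch of that pattern in Python; the wire format shown (one JSON object per line, with a final `done` event carrying the result URL) is an assumption for illustration, not Infer's documented protocol:

```python
import io
import json

def iter_stream_events(stream):
    """Yield decoded JSON events from a newline-delimited byte stream.

    Assumed event shape for this sketch: {"token": ...} for partial
    output, then {"done": true, "url": ...} as the final event.
    """
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        yield json.loads(line)

# Simulated streamed response body: events arrive as they're decoded.
body = io.BytesIO(
    b'{"token": "A"}\n'
    b'{"token": " futuristic"}\n'
    b'{"token": " city"}\n'
    b'{"done": true, "url": "https://cdn.infer.sh/out.png"}\n'
)

for event in iter_stream_events(body):
    if event.get("done"):
        print("result:", event["url"])
    else:
        print("token:", event["token"])
```

In a real client the `io.BytesIO` stand-in would be the open HTTP response body, read line by line so each token can be rendered the moment it lands.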
Reserved capacity, private endpoints, compliance packages — available as an add-on for teams operating at serious scale. Not included in Pro; talk to us and we'll scope what you need.
Marey is Moonvalley's foundational video model — delivering director-grade cinematic video from text, image, and pose input. It powers work at some of the biggest names in Hollywood.
To hit feature-film SLAs at production scale, Moonvalley's platform runs on Infer — tapping reserved inference capacity, private model hosting, and sub-200ms edge routing across four regions.
“Marey has to run at the quality bar of a film set — and on the timelines of one. Infer lets us push Hollywood-grade video through the API without thinking about the infrastructure underneath.”
The security-review and procurement questions, handled up front.