AI Image & Video Models
Every frontier model in one canvas. Generate, edit, animate, and compare without juggling subscriptions.
10 models
Image models
Photorealism, typography, product shots, and editorial stills. Start with the models most likely to fit the brief.
Grok Imagine Image Quality
Grok Imagine Image Quality is xAI's recommended higher-quality image model and the replacement for the retiring Pro tier. On Vofy, it supports prompt-based creation, image edits, broad style transfer, and multi-turn refinement at up to 2K, with up to 10 outputs per run.

GPT Image 2
GPT Image 2 is OpenAI's state-of-the-art model for fast, high-quality image generation and editing. OpenAI positions it as a major step forward in instruction following, dense text rendering, multilingual layouts, stylistic fidelity, flexible sizing, and world knowledge.
Nano Banana 2
Nano Banana 2 combines Pro-level image quality with Gemini Flash speed. It offers advanced world knowledge, subject consistency across up to 5 characters, precise text rendering and translation, and output from 512px up to 4K, all backed by real-time web search.
Seedream 5.0 Lite
Seedream 5.0 Lite is ByteDance's latest AI image creation model and the first to integrate real-time web search during generation. It draws on live web information to keep results current, with upgraded intelligence for parsing complex instructions and visual content, broader world knowledge, stronger cross-image consistency, and higher-quality enterprise-grade scene generation.
Grok Imagine Image Pro
Grok Imagine Image Pro is the legacy Pro image model in xAI's Grok Imagine family. xAI is retiring this model on May 15, 2026, and Vofy now directs new high-quality Grok Imagine image workflows to Grok Imagine Image Quality.
GPT Image 1.5
GPT Image 1.5 is OpenAI's flagship image generation model, a creative studio in your pocket. It delivers precise edits that keep lighting, composition, and likeness intact, creative transformations from photo to movie poster or painting, stronger instruction following, denser text rendering, and 4x faster generation.
13 models
Video models
From quick social cuts to multi-shot scenes, use leading video models without leaving your project.
Kling 2.6
Kling 2.6 is a balanced Kling video model on Vofy for short clips, motion-controlled video, interpolation, and optional audio workflows. The current Vofy setup supports text-to-video, image-to-video, interpolation, and motion control at 720p or 1080p.
Sora 2 Pro
Sora 2 Pro is OpenAI's higher-quality Sora 2 tier on Vofy for longer, more polished AI video. The current Vofy setup supports text-to-video and image-to-video in 16:9 or 9:16, with 720p, 1024p, and 1080p outputs from 4 to 20 seconds.
Veo 3.1 Lite
Veo 3.1 Lite is Google's lower-cost Veo video model on Vofy for high-volume short-form generation. In the current Vofy setup, it supports text-to-video, image-to-video, and interpolation in 16:9 or 9:16, with 720p at 4, 6, or 8 seconds and 1080p at 8 seconds.
Kling 3.0
Kling 3.0 is Kuaishou's video generation family combining Video 3.0, Video 3.0 Omni, and Motion Control 3.0. Generate up to 15-second clips at 1080p with multi-shot storytelling, frame interpolation, lip-sync, and audio-aware workflows.
Seedance 2.0
Seedance 2.0 is ByteDance's multimodal AI video model on Vofy for reference-driven creation and video editing workflows. In the current Vofy setup, it supports text-to-video, image-to-video, interpolation, reference-image, multimodal-reference, video-to-video, and video-extension generation at 480p or 720p for 4 to 15 seconds, with optional audio and web-search controls.
Seedance 2.0 Fast
Seedance 2.0 Fast is ByteDance's faster, lower-cost Seedance 2.0 tier on Vofy. It keeps the same broad workflow family as Seedance 2.0, including text-to-video, image-to-video, interpolation, reference images, multimodal references, video-to-video, and video extension at 480p or 720p for 4 to 15 seconds.
Built for creators and creative teams
From daily social posts to campaign visuals, use effects and workflows that move from idea to shareable asset fast.
Marketing teams
Launch creative and product photography in minutes.
Indie filmmakers
Storyboards and B-roll with motion control.
E-commerce
On-model try-on and lifestyle shots at catalog scale.
Social creators
Short-form video and daily posts across every format.
Designers
Concept art, moodboards, and surgical inpaint edits.
Agencies
Client pitches with motion-ready, brand-safe frames.
How it works
Start with an effect, add your own idea or media, then generate something ready to share.
Choose an effect
Start with a fresh preset for a video, image, edit, or social visual.
Add your media or prompt
Upload a reference, write a short idea, or adjust the model settings.
Generate and share
Create variations, refine the result, and save something ready for friends, followers, or your next post.
FAQ
Which AI image model is best for product photos?
GPT Image and Nano Banana Pro are strong defaults for photorealism, clean composition, and text-heavy layouts. Seedream is useful when you need fast lifestyle or product variations.
Which AI video model should I use for short films?
Sora 2 Pro is a strong choice for narrative scenes with audio. Kling is useful for multi-shot sequences, while Veo and Seedance are practical for polished image-to-video motion.
Can I switch models in the middle of a project?
Yes. Vofy keeps your prompts, references, and project context together so you can compare models without rebuilding the brief.
Do all models support reference images?
Support varies by model. Image models commonly support reference-based edits, and several video models support image-to-video workflows. Each model detail page lists the relevant inputs.
How are credits charged across models?
Credits depend on model tier, resolution, duration, and generation mode. The pricing page explains the credit cost for common image and video workflows.
How quickly do new models arrive on Vofy?
Vofy is designed to add new frontier model releases quickly, so teams can try new OpenAI, Google, xAI, ByteDance, and Kuaishou models from the same workspace.
One subscription, every frontier model
Switch between image and video models from a single workspace, then keep creating in Studio.