AI Image & Video Models

Every frontier model in one canvas. Generate, edit, animate, and compare without juggling subscriptions.

10 models

Image models

Photorealism, typography, product shots, and editorial stills. Start with the models most likely to fit the brief.

Grok Imagine Image Quality

Grok Imagine Image Quality is the higher-quality image model that xAI recommends as the replacement for the retiring Pro tier. On Vofy, it supports prompt-based creation, image edits, broad style transfer, and multi-turn refinement at up to 2K resolution, with up to 10 outputs per run.

GPT Image 2

GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. OpenAI positions it as a major step forward in instruction following, dense text rendering, multilingual layouts, stylistic fidelity, flexible sizing, and stronger world knowledge.

Nano Banana 2

Nano Banana 2 combines Pro-level image quality with Gemini Flash speed. It offers advanced world knowledge, subject consistency across up to 5 characters, precise text rendering and translation, and output resolutions from 512px up to 4K, all backed by real-time web search.

Seedream 5.0 Lite

Seedream 5.0 Lite is ByteDance's latest AI image creation model and the first to integrate real-time web search during generation. It draws on live web information to keep results current, and it adds better parsing of complex instructions and visual content, broader world knowledge, stronger cross-image consistency, and higher-quality enterprise-grade scene generation.

Grok Imagine Image Pro

Grok Imagine Image Pro is the legacy Pro image model in xAI's Grok Imagine family. xAI is retiring this model on May 15, 2026, and Vofy now directs new high-quality Grok Imagine image workflows to Grok Imagine Image Quality.

GPT Image 1.5

GPT Image 1.5 is OpenAI's flagship image generation model, pitched as a creative studio in your pocket. It offers precise edits that keep lighting, composition, and likeness intact; creative transformations from photo to movie poster or painting; stronger instruction following; denser text rendering; and up to 4x faster generation.

13 models

Video models

From quick social cuts to multi-shot scenes, use leading video models without leaving your project.

Kling 2.6

Kling 2.6 is a balanced Kling video model on Vofy for short clips, motion-controlled video, interpolation, and optional audio workflows. The current Vofy setup supports text-to-video, image-to-video, interpolation, and motion control at 720p or 1080p.

Sora 2 Pro

Sora 2 Pro is OpenAI's higher-quality Sora 2 tier on Vofy for longer, more polished AI video. The current Vofy setup supports text-to-video and image-to-video in 16:9 or 9:16, with 720p, 1024p, and 1080p outputs from 4 to 20 seconds.

Veo 3.1 Lite

Veo 3.1 Lite is Google's lower-cost Veo video model on Vofy for high-volume short-form generation. In the current Vofy setup it supports text-to-video, image-to-video, and interpolation in 16:9 or 9:16, with 720p at 4, 6, or 8 seconds and 1080p at 8 seconds.

Kling 3.0

Kling 3.0 is Kuaishou's video generation family combining Video 3.0, Video 3.0 Omni, and Motion Control 3.0. Generate up to 15-second clips at 1080p with multi-shot storytelling, frame interpolation, lip-sync, and audio-aware workflows.

Seedance 2.0

Seedance 2.0 is ByteDance's multimodal AI video model on Vofy for reference-driven creation and video editing workflows. In the current Vofy setup it supports text-to-video, image-to-video, interpolation, reference-image, multimodal-reference, video-to-video, and video-extension generation at 480p or 720p for 4 to 15 seconds, with optional audio and web-search controls.

Seedance 2.0 Fast

Seedance 2.0 Fast is ByteDance's faster, lower-cost Seedance 2.0 tier on Vofy. It keeps the same broad workflow family as Seedance 2.0, including text-to-video, image-to-video, interpolation, reference images, multimodal references, video-to-video, and video extension at 480p or 720p for 4 to 15 seconds.

How it works

Start with an effect, add your own idea or media, then generate something ready to share.

Step 01

Choose an effect

Start with a fresh preset for a video, image, edit, or social visual.

Step 02

Add your media or prompt

Upload a reference, write a short idea, or adjust the model settings.

Step 03

Generate and share

Create variations, refine the result, and save something ready for friends, followers, or your next post.

FAQ

Which AI image model is best for product photos?

GPT Image and Nano Banana Pro are strong defaults for photorealism, clean composition, and text-heavy layouts. Seedream is useful when you need fast lifestyle or product variations.

Which AI video model should I use for short films?

Sora 2 Pro is a strong choice for narrative scenes with audio. Kling is useful for multi-shot sequences, while Veo and Seedance are practical for polished image-to-video motion.

Can I switch models in the middle of a project?

Yes. Vofy keeps your prompts, references, and project context together so you can compare models without rebuilding the brief.

Do all models support reference images?

Support varies by model. Image models commonly support reference-based edits, and several video models support image-to-video workflows. Each model detail page lists the relevant inputs.

How are credits charged across models?

Credits depend on model tier, resolution, duration, and generation mode. The pricing page explains the credit cost for common image and video workflows.

How quickly do new models arrive on Vofy?

Vofy is designed to add new frontier model releases quickly, so teams can try new OpenAI, Google, xAI, ByteDance, and Kuaishou models from the same workspace.

One subscription, every frontier model

Switch between image and video models from a single workspace, then keep creating in Studio.