Google Unveils Veo 3.1 to Rival OpenAI’s Sora 2 with Realistic AI Videos and Sound

19h05 ▪ 6 min read ▪ by Mikaia A.

The AI race is reaching a peak. With every announcement, a new model emerges, more daring, more immersive, more… expensive. In this battle of innovations, Google did not want to remain a spectator. With the release of Veo 3.1, it unveils a video model equipped with sound, dialogue, and new editing capabilities. Facing the viral popularity of Sora 2, the Mountain View firm plays a different card: narrative precision and creative control.

Two androids representing Google and OpenAI face off in a futuristic city, separated by an orange energy sphere.

In brief

  • Veo 3.1 integrates audio, dialogue, and sound effects to enrich AI-generated scenes.
  • The tool targets serious creators, with editing options and professional formats.
  • Three key modules: image composition, creative transitions, and smooth clip extension.
  • Google’s AI favors visual coherence, sometimes at the expense of action speed.

Technological Duel: Google takes on the queens of AI video

When OpenAI, valued at $500 billion without an IPO, launched Sora 2 on September 30, the success was immediate. The app was downloaded more than one million times in just five days, climbing to the top of the App Store. Its approach? A “TikTok-ized” interface, designed for sharing and remixing.

Google did not choose this path. With Veo 3.1, the goal is clear: to address creators, not influencers. The model generates 1080p videos in horizontal or vertical format, with ambient sound, synchronized voices, and realistic effects. Available through Flow, Vertex AI, and the Gemini API, it comes in two tiers: a fast version at $0.15 per second and a standard version at $0.40 per second.
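To make the two per-second prices concrete, here is a minimal Python sketch that estimates what a clip would cost at each tier. The function name `estimate_cost` and the tier labels `"fast"` and `"standard"` are illustrative assumptions, not part of any Google API, and actual billing may differ from this simple multiplication.

```python
from decimal import Decimal

# Per-second prices quoted in the article, in US dollars.
# The tier keys are hypothetical labels chosen for this sketch.
PRICE_PER_SECOND = {
    "fast": Decimal("0.15"),
    "standard": Decimal("0.40"),
}


def estimate_cost(duration_seconds, tier="standard"):
    """Return the estimated cost in dollars of generating one clip."""
    if tier not in PRICE_PER_SECOND:
        raise ValueError(f"unknown tier: {tier!r}")
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Decimal avoids floating-point rounding surprises with money.
    return PRICE_PER_SECOND[tier] * Decimal(str(duration_seconds))


if __name__ == "__main__":
    for tier in PRICE_PER_SECOND:
        print(f"60 s, {tier}: ${estimate_cost(60, tier):.2f}")
```

At these rates, a one-minute clip would come to $9.00 on the fast tier and $24.00 on the standard tier.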

The firm emphasizes the audio capabilities, now present in all modules. It promises unprecedented rendering: according to Google, Veo 3.1’s lip synchronization surpasses that of all other models.

Where Sora favors visual dynamism, Veo chooses coherence. Movements are slower, but elements remain stable. It is the price of precision. This positioning contrasts with the ambitions of Meta or Luma Labs, which focus more on speed and the wow effect.

Stories that speak: Google’s AI wants to be a storyteller

One of Veo 3.1’s major bets is narrative immersion. The addition of sound allows Google to take a step forward: no longer just illustrating, but telling stories with images and voices. Three features stand out:

  • Ingredients to Video: you combine several reference images, and the AI generates a scene with objects and characters;
  • Frames to Video: you provide a starting image and an ending one, and the AI produces a coherent transition;
  • Extend: the AI extends a clip by generating the continuation from the last second.

The tool can also add or remove elements while accounting for shadows and lighting. This level of detail is the strength of the approach: a film studio within an artificial intelligence interface.

But not everything is perfect. When instructions stray too far from visual logic, the AI goes off track. Some scenes jump from one shot to another, lose characters, or change atmosphere completely. It remains a technology under development.

As Google explained on its official blog:

“We’re also introducing Veo 3.1, which brings richer audio, more narrative control, and enhanced realism that captures true-to-life textures.”

Veo 3.1 does not want to entertain: it wants to move its audience. And this is probably where it differs radically from its competitors.

Demanding UX, stunning result: when artificial intelligence becomes a creative tool

The user experience provided by Veo 3.1 is not that of a social network. It is not a product to consume, but a tool to master. Creators must learn to speak the language of AI. A poorly written prompt, or one that strays too far from the reference images, can produce an incoherent result.

Some tips are already circulating among users. One is to generate a faithful initial image with Seedream before importing it into Veo. Another is to write audio-aware prompts that explicitly mention the desired sounds.
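The audio-aware tip can be illustrated with a small Python helper that assembles a prompt from a scene description, a list of sounds, and an optional line of dialogue. This is only a sketch: the function `build_audio_aware_prompt` and the "Audio:" and "says:" layout are assumptions made for illustration, not a prompt format documented by Google.

```python
def build_audio_aware_prompt(scene, sounds=(), dialogue=None):
    """Assemble a text prompt that names the desired sounds explicitly.

    scene:    plain description of what should be visible.
    sounds:   iterable of ambient sounds or effects to mention.
    dialogue: optional (speaker, line) tuple for a spoken line.
    The layout is a hypothetical convention, not an official format.
    """
    # Normalize the scene so it ends with exactly one period.
    parts = [scene.strip().rstrip(".") + "."]
    if sounds:
        parts.append("Audio: " + ", ".join(sounds) + ".")
    if dialogue:
        speaker, line = dialogue
        parts.append(f'{speaker} says: "{line}"')
    return " ".join(parts)


if __name__ == "__main__":
    print(build_audio_aware_prompt(
        "A fisherman mends a net on a foggy pier",
        sounds=["gulls", "lapping waves"],
        dialogue=("The fisherman", "Storm's coming."),
    ))
```

Keeping the visual description, the sounds, and the dialogue as separate arguments makes it easy to vary one of them while holding the others fixed when comparing results.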

Here are some concrete figures:

  • Veo has generated more than 275 million videos since the launch of Flow;
  • Three creative modules are available: Ingredients, Frames, Extend;
  • Usage costs as little as half that of Sora 2 Pro;
  • Videos can last up to one minute, with integrated sound;
  • Only three models handle spoken voices: Sora, Grok, and now Veo.

The tool is not easily tamed. But once understood, it delivers videos of rare realism, with accurate intonations and credible characters. It just requires patience, skill… and some credits.

Google no longer hides its ambition to dominate generative AI. Veo 3.1 shows that the firm does not just want to follow. It wants to set the tempo. And as if to confirm that ambition, one of its AI systems has reportedly just solved a math problem considered impossible. The message is clear: the AI giant is just starting to speak.

Mikaia A.

The blockchain and crypto revolution is underway! And the day its impact is felt in the most vulnerable economy in this world, against all hope, I will say that I had something to do with it.

DISCLAIMER

The views, thoughts, and opinions expressed in this article belong solely to the author, and should not be taken as investment advice. Do your own research before taking any investment decisions.