On Tuesday, Google launched Veo 3, a new AI video synthesis model that can do something no major AI video generator has been able to do before: create a synchronized audio track. From 2022 to 2024, we saw early steps in AI video generation, but each video was silent and usually very short in duration. Now you can hear voices, dialog, and sound effects in eight-second high-definition video clips.
Shortly after the new launch, people began asking the most obvious benchmarking question: How good is Veo 3 at faking Oscar-winning actor Will Smith eating spaghetti?
First, a brief recap. The spaghetti benchmark in AI video traces its origins back to March 2023, when we first covered an early example of horrific AI-generated video using an open source video synthesis model called ModelScope. The spaghetti example later became well-known enough that Smith parodied it almost a year later in February 2024.
Here's what the original viral video looked like:
One thing people forget is that at the time, the Smith example wasn't the product of the best AI video generator out there. A video synthesis model called Gen-2 from Runway had already achieved superior results (though it was not yet publicly accessible). But the ModelScope result was funny and weird enough to stick in people's memories as an early poor example of video synthesis, handy for future comparisons as AI models progressed.
AI app developer Javi Lopez first came to the rescue for curious spaghetti fans earlier this week with Veo 3, performing the Smith test and posting the results on X. But as you'll notice below when you watch, the soundtrack has a curious quality: The faux Smith appears to be crunching on the spaghetti.
On X, Javi Lopez ran “Will Smith eating spaghetti” in Google’s Veo 3 AI video generator and received this result.
It's a glitch in Veo 3's experimental ability to apply sound effects to video, likely because the training data used to create Google's AI models featured many examples of chewing mouths with crunching sound effects. Generative AI models are pattern-matching prediction machines, and they need to be shown enough examples of various types of media to generate convincing new outputs. If a concept is over-represented or under-represented in the training data, you may see unusual generation results, such as jabberwockies.