We've come a long way from Adobe Flash and animated JibJab e-cards.
Two decades on, and people with computers and a little free time can create high-quality animations (of both real people and illustrations) with just a few clicks and zero knowledge of digital editing.
That is the pitch, at least, of "Animate Anyone," an AI model introduced by the AI research team of Alibaba, a Chinese multinational technology company specializing in e-commerce and retail technology. Videos of the technology at work, which its creators say can animate any photo with remarkable consistency and control, have captured the imagination of millions.
Alibaba says Animate Anyone can transform photos into videos “as controlled by desired pose sequences and achieving temporal continuity,” AI avatar startup MyCompanions explained on Twitter. “Less glitches and no extra fingers, pretty cool!”
The team adds that the technology opens the door to new use cases among influencers, including AI-generated clothes and a market for mass-produced yet personalized videos.
The model's GitHub page was deluged with requests for access to the source code. In response, the team has reassured the public that it will make the demo and code available at a yet-unspecified date.
“Thank you all for your incredible support and interest in our project,” the team said in the project’s latest GitHub update. “We want to assure you that we are actively working on preparing the demo and code for public release.”
The statement drew over 240 likes in less than one day.
If the video demonstration is accurate, Animate Anyone can create clear, temporally stable video while maintaining the appearance of the reference character. This appears to be the result of integrating diffusion models into a novel framework built around ReferenceNet, a component that merges detailed features via spatial attention.
To accomplish this, it takes the reference image, moves its parts to follow the desired pose, and then fills in the remaining gaps to give the illusion of consistent movement in each frame of the generated video. The so-called OpenPose sequence results in near-flawless animation.
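For readers curious what "merging features via spatial attention" might look like in practice, here is a toy sketch in plain Python. It is not Alibaba's code, which had not been released at the time of writing. The function names (`attend`, `merge_reference`, `animate`) and the tiny two-number "features" are hypothetical, chosen only to illustrate the general idea of each frame location looking up detail from the reference image, once per pose in the sequence.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def merge_reference(frame_tokens, reference_tokens):
    """Let every frame location attend over frame and reference locations."""
    context = frame_tokens + reference_tokens
    return [attend(token, context, context) for token in frame_tokens]


def animate(reference_tokens, pose_sequence):
    """Produce one merged frame per pose, all tied to the same reference."""
    return [merge_reference(pose_tokens, reference_tokens)
            for pose_tokens in pose_sequence]


# Hypothetical toy data: two reference locations, two single-location poses.
reference = [[1.0, 0.0], [0.0, 1.0]]
poses = [[[0.5, 0.5]], [[1.0, 0.0]]]
frames = animate(reference, poses)
```

Because every frame draws on the same reference features, the character's appearance stays anchored from one frame to the next, which is the intuition behind the consistency the demo videos show. A real system would operate on learned image features inside a diffusion model rather than hand-written numbers.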
Animate Anyone is also drawing favorable comparisons to other popular animation tools like AnimateDiff, Warpfusion, Deforum, and ebSynth. These existing tools often fall short in generating consistent frames, making it easy to identify their videos as AI-generated. In contrast, Animate Anyone produces more refined output, where frames are consistent and the animation is almost indistinguishable from reality.
The Animate Anyone team has not responded to a request for comment from Decrypt.
Amid the frenzy, however, a similar model named MagicAnimate has also emerged as a solid competitor. Recently made available for local testing, MagicAnimate takes a slightly different approach to the animation process. While not as popular, its release offers an alternative for those keen on exploring the realm of AI-driven animation more fully.
Animate Anyone also uses a diffusion model, but with a focus on frame-consistent and controllable animation from images. MagicAnimate's differentiator, by contrast, is enhancing temporal consistency and identity preservation. Its unique appearance encoder and video fusion technique reportedly lead to smoother transitions in long video animations and better detail preservation across frames.
However, while MagicAnimate excels in temporal coherence and per-frame quality, it does not seem to be as accurate as its competitor.
Former Meta AI researcher Alex Carliera had the opportunity to test MagicAnimate, and while he called it “a great first step for consistent video generation from a single image,” he noted that the generations were not 100% accurate to the reference image, with the body deformed in some frames.
So if you can't dance and feel left out of the latest TikTok choreography, perhaps Animate Anyone and MagicAnimate can be your ticket to viral success.
Edited by Ryan Ozawa.