Chinese Tech Giant Alibaba Shows Off AI That Can ‘Animate Anyone’

If you can't dance, Animate Anyone can make it happen with just one still photo. And MagicAnimate is gaining fast.

By Jose Antonio Lanz

Dec 5, 2023

We've come a long way from Adobe Flash and animated JibJab e-cards.

Two decades on, anyone with a computer and a little free time can create high-quality animations (of both real people and illustrations) with just a few clicks and zero knowledge of digital editing.

That is the pitch, at least, of "Animate Anyone," an AI model introduced by the AI research team of Alibaba, a Chinese multinational technology company specializing in e-commerce and retail technology. Video of the model at work, which its creators say can animate any photo with remarkable consistency and control, has captured the imagination of millions.

Alibaba says Animate Anyone can transform photos into videos “as controlled by desired pose sequences and achieving temporal continuity,” explained AI avatar startup MyCompanions on Twitter. “Less glitches and no extra fingers—pretty cool!”

The startup adds that this technology opens the door to new use cases among influencers, such as AI-generated clothes and a market for mass-produced yet personalized videos.

Short form videos from a single photo? We'll be able to do this for all our influencers soon!

Based on the latest cutting edge research from the Alibaba group, this is nearly here. Why is this tech important? How can influencers best use this tech?

Thread below 👇 pic.twitter.com/C4QCJCeEXP

— MyCompanions (@MyCompanionsAI) December 3, 2023

The model's GitHub page was deluged with requests for access to the source code. In response, the team has reassured the public that it will make the demo and code available at a yet-unspecified date.

“Thank you all for your incredible support and interest in our project,” the team said in the project’s latest GitHub update. “We want to assure you that we are actively working on preparing the demo and code for public release.”

The statement drew over 240 likes in less than one day.

If the video demonstration is accurate, Animate Anyone can create clear, temporally stable video while maintaining the appearance of the reference character. This appears to result from combining diffusion models with a novel network called ReferenceNet, which merges detailed features from the reference image via spatial attention.

To accomplish this, it takes the reference image, moves the parts to follow the desired pose, and then fills in the gaps to give the illusion of consistent movement in each frame of the generated video. The so-called OpenPose sequence results in near-flawless animation.
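For readers curious what merging features "via spatial attention" could look like, here is a minimal Python sketch. It is an illustration only, not Alibaba's code, which has not been released. The function name `merge_reference_features`, the array shapes, and the omission of learned projection weights are all assumptions made to keep the idea small: each location in the frame being generated looks at every location in both itself and the reference image, and pulls in the features that match best.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def merge_reference_features(target, reference):
    """Hypothetical helper: blend reference-image features into frame features.

    target, reference: arrays of shape (positions, channels), where each row
    is the feature vector at one spatial location. A real model would apply
    learned query/key/value projections; they are left out here (assumption).
    """
    _, channels = target.shape
    # Let each frame location attend over frame AND reference locations.
    tokens = np.concatenate([target, reference], axis=0)
    scores = target @ tokens.T / np.sqrt(channels)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Output keeps the frame's spatial layout: (positions, channels).
    return weights @ tokens


rng = np.random.default_rng(0)
target = rng.normal(size=(16, 8))     # e.g. a 4x4 feature map, 8 channels
reference = rng.normal(size=(16, 8))  # features from the still photo
merged = merge_reference_features(target, reference)
print(merged.shape)
```

The output has the same layout as the frame features, so a step like this could in principle be repeated at several layers of a diffusion model without changing the rest of the network.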

Animate Anyone is also drawing favorable comparisons to other popular animation tools like AnimateDiff, Warpfusion, Deforum, and ebSynth. These existing tools often fall short in generating consistent frames, making it easy to identify videos as being AI-generated. In contrast, Animate Anyone produces more refined output, where frames are consistent and the animation is almost indistinguishable from reality.

The Animate Anyone team has not responded to a request for comment from Decrypt.

Amid the frenzy, however, a similar model named MagicAnimate has also emerged as a solid competitor. Recently made available for local testing, MagicAnimate takes a slightly different approach to the animation process. While not as popular, its release offers an alternative for those keen on exploring the realm of AI-driven animation more fully.

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model with @Gradio demo

local demo: https://t.co/ScsEU6oG64

This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion… pic.twitter.com/JCOr0yCRZs

— AK (@_akhaliq) December 4, 2023

In contrast with Animate Anyone, which also uses a diffusion model but focuses on frame-consistent and controllable animation from images, MagicAnimate's differentiator is enhanced temporal consistency and identity preservation. Its unique appearance encoder and video fusion technique reportedly lead to smoother transitions in long video animations and better detail preservation across frames.

MagicAnimate excels in temporal coherence and per-frame quality, but it does not seem to be as accurate as its competitor.

Former Meta AI researcher Alex Carlier had the opportunity to test MagicAnimate, and while he called it “a great first step for consistent video generation from a single image,” he noted that the generations were not 100% accurate versus the reference image, deforming the body in some frames.

I tested the ControlNet for video (MagicAnimate) and here are is my opinion: it works great but has some flaws.

- the identity of the motion video leaks to the resulting video (and deforms body shape)
- bad hands and face (unsurprisingly!)

But a great first step for consistent… https://t.co/zY9tZZ6MaK pic.twitter.com/J9XELE5NGT

— Alex Carlier (@alexcarliera) December 4, 2023

So if you can't dance and feel left out of the latest TikTok choreography, perhaps Animate Anyone and MagicAnimate can be your ticket to viral success.

Edited by Ryan Ozawa.
