Artificial intelligence (AI) image generation technology is accelerating rapidly—in more ways than one. Recent advancements have catapulted the industry from steady progress to relentless breakthroughs, now promising the advent of real-time, high-fidelity image creation.
It's not that these tools were slow—one minute is not too long to wait to "make it more." But users still demand more: more realism, more versatility, more variety, and more speed. And on that latter point, researchers are gladly delivering.
Stability AI has unveiled SDXL Turbo, which may represent a monumental leap in AI image generation. We don't say that lightly: the recently announced model can generate images in about one second, versus the 30 to 60-plus seconds typical generators need. It is almost, if not effectively, real-time AI image generation.
SDXL Turbo is different from all previous Stable Diffusion models. Its Adversarial Diffusion Distillation (ADD) technique dramatically reduces the number of steps required to generate high-quality images: as few as a single step, where conventional models typically need anywhere from 30 to 100. "ADD is the first method to unlock single-step, real-time image synthesis with foundation models," Stability AI claims in a research paper.
SDXL Turbo employs a hybrid of adversarial training and score distillation, optimizing the generative process and ensuring that images are produced rapidly while maintaining high fidelity.
As a result, SDXL Turbo can produce complex, high-resolution images almost instantly. The approach also renews attention on generative adversarial networks (GANs), which were largely sidelined after diffusion models started to dominate the scene.
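For those curious what single-step generation looks like in practice, here is a minimal sketch using Hugging Face's diffusers library, assuming the publicly released SDXL Turbo weights on Hugging Face and a CUDA GPU; the prompt and output file name are placeholders:

```python
# Minimal sketch: single-step text-to-image with SDXL Turbo via diffusers.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

# ADD-distilled models are trained to work without classifier-free guidance,
# so guidance_scale is set to 0.0 and a single denoising step is enough.
image = pipe(
    prompt="a cinematic photo of a lighthouse at dusk",  # placeholder prompt
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("turbo.png")
```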
If you don’t want to say goodbye to your “legacy” Stable Diffusion models, however, researchers have a solution for you.
Accompanying SDXL Turbo's advancements are Latent Consistency Models (LCMs) and LCM-LoRA, each contributing uniquely to the field.
LCMs, as presented in their dedicated research paper, stand out for their ability to generate high-resolution images by operating efficiently within the latent space of pre-trained autoencoders like Stable Diffusion. LCMs aim to enhance the speed of image generation without a significant loss in quality, focusing on high-resolution outputs. Utilizing a one-stage guided distillation method, LCMs transform pre-trained diffusion models into rapid image generators, skipping unnecessary steps.
In practical terms, users don't need to change their workflow: just download an LCM model and load it like a normal SDXL checkpoint, then dial the step count down to the minimum. The model will produce good images in about four steps and a couple of seconds, instead of grinding through 25, 50, or 75 steps per image, as the sketch below shows.
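Here is a minimal sketch of that four-step workflow with diffusers, assuming the distilled LCM UNet for SDXL that the Latent Consistency team published on Hugging Face; the prompt is a placeholder:

```python
# Sketch: four-step generation with a full LCM checkpoint for SDXL.
import torch
from diffusers import DiffusionPipeline, LCMScheduler, UNet2DConditionModel

# Swap in the distilled LCM UNet, keeping the rest of the SDXL pipeline.
unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl", torch_dtype=torch.float16, variant="fp16"
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Four steps instead of 25-75; the scheduler is the LCM-specific one.
image = pipe(
    prompt="a watercolor painting of a mountain village",  # placeholder prompt
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
image.save("lcm.png")
```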
There are already great models with their own LCM versions for you to try. We recommend Hephaistos_NextGENXL for its versatility, but there are many great models available for testing.
Released in tandem with LCMs, LCM-LoRA offers a universal acceleration module that can be integrated into various Stable-Diffusion models. "LCM-LoRA can be viewed as a plug-in neural PF-ODE solver with strong generalization abilities," the research paper reads.
LCM-LoRA is designed to boost the efficiency of existing Stable Diffusion models, making them faster and more versatile. It employs LoRA (Low-Rank Adaptation) to update pre-trained weight matrices, reducing computational load and memory requirements.
With LCM-LoRA, standard Stable Diffusion models get a huge boost in image generation speed, making them highly effective across tasks. Users don't even need to download a new model: just activate the LCM-LoRA and generate images as fast as a native LCM model would.
LCM-LoRAs can be downloaded for SD 1.5 and SDXL here.
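In practice, plugging the adapter into stock SDXL takes a few lines with diffusers. This sketch assumes the published latent-consistency/lcm-lora-sdxl weights; the prompt is a placeholder:

```python
# Sketch: accelerating an unmodified SDXL base model with the LCM-LoRA adapter.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")  # activate the adapter
pipe.to("cuda")

# The adapter lets the unmodified base model converge in roughly four steps;
# LCM-LoRA works best with a very low guidance scale.
image = pipe(
    prompt="an isometric render of a cozy game room",  # placeholder prompt
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("lcm_lora.png")
```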
Despite these technological leaps, there is still a balance to strike between speed and image quality. Rapid-generation tools like SDXL Turbo and LCM-LoRA expedite the creative process, but at the expense of some image fidelity: an image generated with 50 steps and a good model will generally show more detail than one generated in five steps with a good LCM model.
However, this trade-off is mitigated by their utility in typical workflows where numerous images are generated to find the perfect one. Subsequent iterations with tools like image-to-image or inpaint can enhance details in these first-cut images, making up for any initial loss in quality. A properly edited image generated with one of these fast technologies can be as good as an image generated by a normal Stable Diffusion model.
Fasten your seatbelts because the AI image generation space is shifting into overdrive—and few people crave speed more than AI fanboys.