Stability AI, the company behind the wildly popular Stable Diffusion image generator, has just lobbed another grenade into the hotly competitive AI arena.
Stability’s brand new Stable Cascade, powered by the new, open-source Würstchen architecture, provides a highly efficient and modular approach to text-to-image generation, balancing quality, speed, and adaptability.
The company claims the model achieves a compression factor unlike anything previously seen in traditional Stable Diffusion models, and that it can produce images with greater resolution and detail, comparable to modern generators like SDXL or Midjourney (which typically work at 1024x1024 resolution).
Image: Stability AI
Stable Cascade adopts a three-stage process, distinct from the traditional Stable Diffusion pipeline:
Image: Stability AI
In other words, it does what its name suggests. It begins with a text-driven generator that churns out a tiny, compressed snapshot of the image; a second stage inflates that snapshot into a more detailed representation; and a third stage decodes it into the high-quality, full-resolution image that finally meets your eyes.
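The data flow of those three stages can be sketched in a few lines. To be clear, this is a toy illustration of the cascade idea only: the shapes and the stage functions below are stand-ins invented for this sketch, not the real Würstchen models.

```python
import numpy as np

# Toy sketch of the cascade's data flow. Shapes and stage functions are
# purely illustrative stand-ins, NOT the real Würstchen models.

def stage_c(prompt: str) -> np.ndarray:
    """Text-conditioned generator: prompt -> tiny compressed latent 'snapshot'."""
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return rng.standard_normal((16, 16, 16))  # (channels, height, width)

def stage_b(latent: np.ndarray) -> np.ndarray:
    """Inflates the tiny latent into a more detailed intermediate latent."""
    # Nearest-neighbour upsample stands in for the real diffusion upscaler.
    return latent.repeat(8, axis=1).repeat(8, axis=2)  # -> (16, 128, 128)

def stage_a(latent: np.ndarray) -> np.ndarray:
    """Decodes the detailed latent into full-resolution RGB pixels."""
    img = latent.mean(axis=0).repeat(8, axis=0).repeat(8, axis=1)  # (1024, 1024)
    return np.stack([img, img, img], axis=-1)  # (1024, 1024, 3)

image = stage_a(stage_b(stage_c("a photo of a corgi")))
print(image.shape)  # (1024, 1024, 3)
```

The point of the design is visible even in the toy version: only the first, tiny stage is conditioned on the text, while the later stages just upscale and decode, which is what keeps the expensive text-guided computation small.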
The modular design of Stable Cascade brings several compelling advantages, according to its developers. First is the extreme efficiency: because the model works in a heavily compressed latent space (the compact internal representation an AI reasons over, as opposed to the pixel space humans see) and concentrates the text-guided work in the small Stage C model, Stable Cascade achieves faster inference times, meaning it calculates its predictions faster. And it does so with significantly reduced hardware requirements compared to larger Stable Diffusion models like SDXL.
Stability AI's internal tests demonstrated Stable Cascade's ability to consistently outperform comparable models like SDXL in terms of both image quality and aesthetic appeal. Further, the model achieves these results at very high speeds while demanding significantly fewer computational resources.
Image: Stability AI
Another advantage that Stability AI claims is its versatility. Many of the tools Stable Diffusion artists now use to refine their work, like ControlNets or LoRAs, are compatible. And, because of its extreme efficiency, users can add more of these tools into their workflows without exhausting their GPU memory.
The model's lightweight architecture, smaller model footprint, and compatibility with less powerful computing hardware lower the barrier to entry, increasing the accessibility of advanced text-to-image generation techniques for casual users and researchers alike.
Our tests have found that the model is accurate and detailed and does not show the washed-out, rubbery aesthetic of Stability AI’s previous SDXL turbo or LCM models. Instead, it generates highly detailed images on par with fine-tuned SDXL models.
It also has some basic text generation capabilities, which can be further enhanced with LoRAs that are already available in online repositories like Civitai.
Stability AI reports that despite hosting more parameters than Stable Diffusion XL, Stable Cascade still enjoys faster inference times and excels at prompt alignment.
Fine-tuning Stable Cascade is also less resource-intensive than it is for similarly sized Stable Diffusion models. Researchers and enthusiasts can potentially train the model on smaller datasets and with considerably less computing power, which makes it very cost-efficient.
Stable Cascade is released under a non-commercial research license and is available on Stability AI's GitHub repository; a community-maintained ComfyUI workflow that automatically downloads the models makes it even easier to use.
Edited by Ryan Ozawa.