Illustrious, a text-to-image model based on Stable Diffusion XL, has become so dominant in the AI art community that Civitai, the largest hub for AI art models, had to create a separate category just to handle its massive ecosystem of resources.
And it all happened in three months. The secret behind its success? A return to the basics with a twist.
While newer models like SD 3.5 and Flux rely on lengthy natural language descriptions, Onoma AI, the developers of Illustrious, took a different approach by leveraging Danbooru tags to help their model understand concepts without having to reinvent the wheel with complex captioning systems.
The model's training on Danbooru's vast library of tagged anime images gives it an edge in understanding visual concepts.
Each tag in the Danbooru system represents specific elements like character features, clothing items, poses, or backgrounds, allowing for precise control over the generated images without wasting precious tokens on lengthy descriptions.
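As a rough illustration of how compact a tag-based prompt is, here is a minimal Python sketch that turns a list of Danbooru-style tags into a comma-separated prompt. The helper name `build_tag_prompt`, the example tags, and the choice to replace underscores with spaces and drop duplicates are all assumptions made for this example, not something Illustrious requires.

```python
def build_tag_prompt(tags):
    """Join Danbooru-style tags into one comma-separated prompt string.

    Assumptions for this sketch: tags are lowercased, underscores become
    spaces, and repeated tags are dropped while keeping the original order.
    """
    seen = set()
    cleaned = []
    for tag in tags:
        text = tag.strip().lower().replace("_", " ")
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return ", ".join(cleaned)


# Hypothetical example tags: subject, features, clothing, background.
prompt = build_tag_prompt(
    ["1girl", "silver_hair", "school_uniform", "night_sky", "silver_hair"]
)
print(prompt)
```

The whole scene is described in a handful of short tags rather than a paragraph of natural language.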
These tags have been around for years and have become a de facto standard for image categorization among art and anime enthusiasts.
The model is highly accurate and efficient when it comes to understanding the characteristics of an image.
"It's like having an artist who understands exactly what you want without having to explain it in paragraphs," Vishnu, a Discord member who participates in a server focused on NSFW AI content, told Decrypt. "You just need to know the right tags."
At its core, Illustrious uses the good old SDXL architecture with a sophisticated dual-encoder system that combines CLIP ViT-L and OpenCLIP ViT-bigG to understand words and associate them with their visual equivalent.
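To make the dual-encoder idea concrete, the sketch below concatenates two per-token embeddings the way the published SDXL base model is generally described as doing. The widths (768 for CLIP ViT-L, 1280 for OpenCLIP ViT-bigG) and the 77-token context come from general knowledge of SDXL rather than from Onoma AI, and the zero-filled lists are placeholders standing in for real encoder outputs.

```python
# Assumed values from the public SDXL base model, not from Illustrious docs.
CLIP_VIT_L_WIDTH = 768
OPENCLIP_BIGG_WIDTH = 1280
CONTEXT_TOKENS = 77


def concat_per_token(first, second):
    """Concatenate two token-embedding sequences along the feature axis."""
    if len(first) != len(second):
        raise ValueError("both encoders must produce the same number of tokens")
    return [row_a + row_b for row_a, row_b in zip(first, second)]


# Placeholder embeddings: one row of zeros per token, for each encoder.
emb_l = [[0.0] * CLIP_VIT_L_WIDTH for _ in range(CONTEXT_TOKENS)]
emb_g = [[0.0] * OPENCLIP_BIGG_WIDTH for _ in range(CONTEXT_TOKENS)]

combined = concat_per_token(emb_l, emb_g)
print(len(combined), len(combined[0]))  # tokens, features per token
```

Under these assumptions, each token ends up with a 2,048-wide vector that the image-generating part of the model conditions on.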
The model can process and generate images at an impressive 1536×1536 resolution, and it can stretch up to 2048×2048 and even 3744×3744 without significant quality loss.
For context, the original SDXL was built around a native resolution of 1024×1024.
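A quick bit of arithmetic shows how large that jump is. The snippet below compares the pixel count of each resolution mentioned here against a 1024×1024 baseline; the function name `pixel_ratio` is just an illustrative helper.

```python
SDXL_NATIVE_SIDE = 1024  # baseline side length in pixels


def pixel_ratio(side, base=SDXL_NATIVE_SIDE):
    """Return how many times more pixels a square image has than the baseline."""
    return (side * side) / (base * base)


for side in (1536, 2048, 3744):
    print(f"{side}x{side}: {side * side:,} pixels, "
          f"{pixel_ratio(side):.2f}x the 1024x1024 baseline")
```

A 1536×1536 image carries 2.25 times as many pixels as a 1024×1024 one, and 2048×2048 carries four times as many.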
The journey to create Illustrious was methodical and deliberate. The initial training phase, which produced version 0.1, processed 7.5M images at 1024×1024 resolution with a batch size of 192.
The team carefully balanced learning rates, running for 20 epochs (an epoch is one full pass over the training dataset) to establish a solid foundation. Once the results were satisfactory, the team increased the size of the dataset and the resolutions used for the next iterations.
In the advanced training phase, Illustrious truly began to shine. Version 1.0 expanded the dataset to 10M images and bumped the resolution to 1536×1536.
Though they reduced the batch size to 128, they introduced sophisticated tag manipulation strategies and register tokens, fundamental changes defining the model's exceptional performance.
The final refinement phase for version 2.0 took things a bit further. Working with 20M images at the same high resolution but with a larger batch size of 512, the team incorporated a multi-caption method that dramatically improved text-image correspondence.
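To put those numbers in perspective, here is a back-of-the-envelope sketch of how many optimizer steps one epoch takes in each phase, using only the dataset and batch sizes quoted above. It assumes the batch sizes are global ones and that a final partial batch still counts as a step; the 20-epoch figure is only stated for version 0.1, so the total is computed for that phase alone.

```python
import math

# Dataset and batch sizes as quoted in the article.
PHASES = {
    "v0.1": {"images": 7_500_000, "batch_size": 192},
    "v1.0": {"images": 10_000_000, "batch_size": 128},
    "v2.0": {"images": 20_000_000, "batch_size": 512},
}


def steps_per_epoch(images, batch_size):
    """Number of batches needed to see every image once (last batch may be partial)."""
    return math.ceil(images / batch_size)


steps = {name: steps_per_epoch(**cfg) for name, cfg in PHASES.items()}
v01_total_steps = steps["v0.1"] * 20  # 20 epochs, stated for v0.1 only

for name, count in steps.items():
    print(f"{name}: {count:,} steps per epoch")
print(f"v0.1 over 20 epochs: {v01_total_steps:,} steps")
```

Interestingly, doubling the dataset while quadrupling the batch size means version 2.0 needs roughly half as many steps per epoch as version 1.0.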
The result was the best waifu generator known to man, with good finetuning capabilities, prompt adherence, decent aesthetics, and high-quality outputs.
For the more tech-savvy, the Illustrious devs also introduced several interesting techniques: a “No Dropout Token” approach, which ensures that specific tokens are never excluded during training; Quasi-Register Tokens, which help the model handle unknown or unusual concepts; a Cosine Annealing Scheduler for the learning rate; and a Multi-Level Dropout system combined with Input Perturbation Noise Augmentation. Together, these turn a simple AI model into a powerhouse.
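Two of these ideas are easy to sketch in a few lines of Python. The code below is an illustration of the general techniques, not Onoma AI's implementation: `cosine_annealing_lr` is the textbook cosine schedule, and `drop_tags` is a hypothetical tag-dropout helper in which a protected set of tags is never removed. The learning rate, drop probability, and tag names are made-up values.

```python
import math
import random


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Standard cosine annealing: decay from lr_max to lr_min over total_steps."""
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


def drop_tags(tags, protected, drop_prob, rng):
    """Randomly drop caption tags, but always keep tags in the protected set."""
    return [t for t in tags if t in protected or rng.random() >= drop_prob]


# Hypothetical usage: the learning rate at the start, middle, and end of a run.
for step in (0, 500, 1000):
    print(step, cosine_annealing_lr(step, total_steps=1000, lr_max=1e-4))

# Hypothetical usage: "1girl" is protected, the other tags may be dropped.
rng = random.Random(0)
caption = ["1girl", "silver hair", "school uniform", "night sky"]
print(drop_tags(caption, protected={"1girl"}, drop_prob=0.5, rng=rng))
```

Dropping tags at random pushes a model to cope with incomplete prompts, while protecting certain tags keeps the most important ones tied to every training image in which they appear.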
Illustrious doesn’t need any additional steps to run.
The installation process is the same as with any other SDXL Model. Download the checkpoint and put it in the corresponding folder, depending on which UI you use.
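As a sketch of what "the corresponding folder" means in practice, the Python helper below maps two popular UIs to their default checkpoint folders and copies a downloaded file there. The folder names reflect the default layouts of ComfyUI and AUTOMATIC1111's web UI as commonly documented; the function names, the UI keys, and any paths you pass in are assumptions for this example, and other UIs may use different locations.

```python
from pathlib import Path
import shutil

# Default checkpoint folders, relative to each UI's install directory.
CHECKPOINT_SUBDIRS = {
    "comfyui": Path("models") / "checkpoints",
    "automatic1111": Path("models") / "Stable-diffusion",
}


def checkpoint_dir(ui, ui_root):
    """Return the folder where the given UI looks for SDXL checkpoints."""
    try:
        subdir = CHECKPOINT_SUBDIRS[ui.lower()]
    except KeyError:
        raise ValueError(f"unknown UI: {ui!r}") from None
    return Path(ui_root) / subdir


def install_checkpoint(checkpoint_file, ui, ui_root):
    """Copy a downloaded checkpoint file into the UI's checkpoint folder."""
    target_dir = checkpoint_dir(ui, ui_root)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(checkpoint_file, target_dir))


# Hypothetical install location; nothing is copied until install_checkpoint runs.
print(checkpoint_dir("comfyui", "ComfyUI"))
```

After copying the file, restart or refresh the UI so that the new checkpoint shows up in its model list.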
Windows and Linux
macOS
Mac users have similar routes. However, some popular macOS-oriented UIs require additional steps.
Once the model is loaded, there are three things to consider.
There are many models to choose from, all focusing on different styles, aesthetics, and characteristics.
There are even general models like the ones from Noob AI that used Illustrious as a base and are being used by fine-tuners to build their models.
However, here are our top picks for different needs. These are great at prompt understanding, output quality, and ease of use. All the samples are from the Civitai community and are copyright-free.
Link: Mistoon_Anime - v1.0 Illustrious | Illustrious Checkpoint | Civitai
Link: Smooth Mix - Illustrious | Pony - Illustrious | Illustrious Checkpoint | Civitai
Link: NTR MIX | illustrious-XL | Noob-XL - XIII | Illustrious Checkpoint | Civitai
Link: THRILLustrious - v5.0 THRILLed | Illustrious Checkpoint | Civitai
Edited by Sebastian Sinclair and Josh Quittner