AI startup Stability AI just debuted its latest version of Stable Diffusion—and the model does not disappoint.
Stable Diffusion XL (SDXL) v0.9 delivers ultra-photorealistic imagery, surpassing previous iterations in terms of sophistication and visual quality.
Among other things, this means Stability AI’s new model should produce those troublesome “spaghetti hands” far less often. You also won’t need a prompt dozens of words long to get an impressive image, because the model was trained to do much of the heavy lifting for you. Communicating with the model should feel more natural.
The company announced the release on Twitter yesterday, noting that the new version “provides a leap in use cases for generative AI imagery.”
Introducing the latest release from Stability AI: Breaking barriers with #SDXL 0.9!
SDXL 0.9 produces massively improved text-to-image and composition detail over the beta release and provides a leap in use cases for generative AI imagery. #StabilityAI
Dubbed SDXL v0.9, the image generator responds well to text-based prompts and delivers better composition detail than the SDXL beta launched in April. Side-by-side comparisons of images generated by both versions show the newer model's edge.
For instance, the prompt "A wolf in Yosemite National Park, chilly nature documentary film photography" produced a more realistic image with the new model, while the earlier version fell short on true-to-life detail. These improvements are attributed to SDXL v0.9's larger parameter count, which gives it more capacity to learn than its predecessor.
Comparison between images generated with SDXL beta (left) vs SDXL v0.9 (right) Image: Stability AI
Stability AI, known for bringing the open-source image generator Stable Diffusion to the fore in August 2022, has further intensified its competition with OpenAI's DALL-E and Midjourney. Stable Diffusion is currently the world’s most popular open-source AI image generator.
The company was recognized by TIME yesterday as one of the most influential companies of 2023. Other AI companies on the list include OpenAI (ChatGPT), Hugging Face (collaborative open-source AI platform), Runway AI (generative video), Nvidia, and Google DeepMind. In the crypto space, Polygon and Chainalysis (blockchain forensics) also made the list.
Beautiful Images With Less Work
In a notable shift, SDXL v0.9 does away with the need for complicated prompts, generating better results from simpler, less structured inputs. Decrypt put this to the test with the short prompt "two hands pointing at each other bright art": SDXL v0.9 produced an impressively realistic result, while Stable Diffusion versions 1.5 and 2.1 returned less inspiring scribbles.
Results provided by different Stable Diffusion models using the same prompt. Image: Decrypt
This new ease of use could pose a serious threat to Midjourney, whose main appeal is its user-friendliness. The cinematic aesthetics and precise object rendering of SDXL v0.9, reminiscent of Midjourney's visual style, could also become a strong selling point for Stability AI.
Stability AI's latest gem will be accessible via Clipdrop, the AI image generation and editing tool developed by Init ML, which Stability recently acquired. The company's API customers are also due to gain access soon. However, the model is not yet available for training or fine-tuning and can't be run locally. Once publicly released, it will require a system with at least 16GB of RAM and a GPU with 8GB of VRAM.
Meanwhile, Stability AI continues to develop the model alongside two other projects: StableLM, a large language model (LLM) that has so far languished, and the impressive DeepFloyd IF, an advanced text-to-image generator that can embed legible text in images, something other image models have struggled to do.
According to Stability AI, SDXL is expected to be publicly released as open-source software in mid-July, marking another important milestone for the company.