Recent rapid developments in artificial intelligence rank among the most significant technological breakthroughs of the decade. Today, generative text-to-image models like Midjourney and DALL-E are so sophisticated that users' own human limitations, rather than the model's constraints, are often the primary obstacle when people first encounter the technology.
When you can create anything, deciding what to create becomes the hard part, and many people end up with decision paralysis.
However, AI has its own struggles too. The perfect example is creating perfect hands. The web is littered with eerie, terrifying images of model-perfect people with too many, too few, or impossibly interconnected fingers.
Why is it that a model capable of generating realistic images of a bear in a tuxedo riding a bicycle in the Swiss Alps still has trouble with something as simple as a hand? The answer is far from straightforward.
First, humans have not always been exceptionally skilled at creating hands. Mastering realistic hand drawing has taken us centuries, to say the least. As an example, these hands from different eras are not realistic, and certainly not beautiful.
In fact, human artists have only managed to consistently create visually pleasing hand representations in the last 600 years. That means only about 0.3% of our 200,000-year-old art history features beautiful hands. In this regard, let's give machines some credit.
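The 0.3% figure follows directly from the two numbers quoted above. As a quick check, using only those figures (the variable names are illustrative):

```python
# Figures taken from the paragraph above.
years_of_good_hands = 600        # years of consistently pleasing hands
years_of_art_history = 200_000   # assumed total span of human art history

# Share of art history in which hands were drawn well.
share = years_of_good_hands / years_of_art_history
print(f"{share:.1%}")
```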
There are quite a few reasons for AI's struggle with hands, but they can be divided into two categories: biological and technical.
The hand's complexity stems from a fundamental biological characteristic: it’s the body part with the most joints in a small area. Consequently, a single hand can have dozens of different positions and representations, which is far from ideal for identifying patterns.
Basically, an AI struggles to identify what makes a hand a hand. And the most common, basic characteristics (skin color, skin texture, nails, a palm, and a plural but unidentifiable number of fingers) are not enough to meet our criteria.
What do all of these images have in common?
Artificial intelligence has made significant progress in generating realistic images, and to some extent, it has succeeded even with hands. Whether they have five, six, or seven fingers, AI-generated hands are still recognizable as hands, or at least as facsimiles of them.
However, hands play such a crucial role in our lives and bodies that our perception has extremely high standards. It's more unsettling to see a hand with six fingers or without knuckles than, for example, a woman without a navel or a person with shorter-than-average legs.
This leads to AI hands falling into the uncanny valley, where they appear too realistic to be a fake representation yet too fake to look real.
Technically speaking, AI image generators have trouble accurately depicting anything with defined, regular patterns. For instance, AI-created images of a barefoot person with toned abs and a smiling mouth with visible teeth will probably have too many toes, too many teeth, or perhaps an implausible number of abs.
Images generated by Decrypt using Stable Diffusion.
However, these inconsistencies don't bother us as much because teeth and abs don't play as significant a role in our lives as hands do. Most people would prefer to lose a tooth rather than a finger and can certainly live without a six-pack, unless they're a bodybuilder.
Data scarcity is another issue. AIs have not yet been trained with sufficient data to focus on hands specifically. The algorithm generally understands that when one finger is present, there are typically more. Still, it lacks the detail needed to truly comprehend each finger joint's behavior and location, and the hand's overall function, in each of the billions of images provided for training.
For example, this image (number 2,120,079,006,880 from the Laion-2b-en dataset used to train Stable Diffusion) is captioned "Man with impaired posture position defect scoliosis and ideal," but the caption says nothing about what his normal hands look like. A useful description would read something like: "his hand is in a relaxed position, with the fingers slightly near each other and curved towards his body with the thumb not visible."
Image from the Laion-5b dataset. Source: Stability.ai
Stable Diffusion was trained using the Laion-5b dataset. Why don't you try to spot and properly describe human hands in a dataset of 5.85 billion images? Good luck.
Given that the problem partly lies in inadequate training, it's reasonable to assume that text-to-image generation models will eventually overcome the challenge of creating realistic hands.
For instance, Decrypt was recently provided with samples showing Midjourney's impressive competence in generating realistic hands with its most recent version. In a few months, the algorithm's sixth iteration is likely to yield even more realistic results, given the increasing investment in these technologies and the availability of more powerful hardware to process vast amounts of data.
Samples of hands generated with Midjourney V5. Image created by Decrypt using AI.
Even now, ugly hands are starting to fade into the past—at least for professional or experienced AI artists. It's already possible to generate realistic hands using Stable Diffusion by providing guidance for the process.
Stable Diffusion is an open-source AI image generation model similar to Midjourney or DALL-E. The key difference is that, because of its open architecture, the community can adapt it to their needs, creating custom models focused on anything from futuristic images to cartoonish art and, of course, uncensored adult images.
In addition, users can create plugins compatible with Stable Diffusion for various purposes, such as poses, depth maps, model merging, and implementing instructions for creating realistic hands.
To generate pictures with perfect hands with Stable Diffusion today, users need to install and configure the ControlNet plugin, provide a reference image with normal hands to the installed OpenPose model, give Stable Diffusion the desired prompt, and evaluate the generated image.
Once that's done, users have to play with parameters and practice a lot. But this method (which can identify over 20 different keypoints in a human hand) proves more effective than the inpainting technique, which involves instructing the machine to modify only the hand portion and hoping for the best outcome.
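To make the "over 20 keypoints" idea concrete, here is a minimal sketch of a hand skeleton, assuming the 21-point layout commonly used by OpenPose-style hand detectors (one wrist point plus four joints on each of five fingers). The joint names and helper functions are hypothetical, chosen only for illustration; they are not the plugin's actual API.

```python
# Assumed layout: 1 wrist + 4 joints for each of 5 fingers = 21 keypoints.
# Names below are illustrative, not taken from any specific library.
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]
JOINTS_PER_FINGER = 4


def hand_keypoint_names():
    """List keypoint names: the wrist first, then each finger base-to-tip."""
    names = ["wrist"]
    for finger in FINGERS:
        for joint in range(1, JOINTS_PER_FINGER + 1):
            names.append(f"{finger}_{joint}")
    return names


def hand_bones():
    """Return (parent, child) index pairs: five chains rooted at the wrist."""
    bones = []
    for f in range(len(FINGERS)):
        base = 1 + f * JOINTS_PER_FINGER
        bones.append((0, base))  # wrist to the first joint of this finger
        for j in range(JOINTS_PER_FINGER - 1):
            bones.append((base + j, base + j + 1))
    return bones


def count_fingers(bones):
    """Count the chains that start at the wrist (keypoint index 0)."""
    return sum(1 for parent, _ in bones if parent == 0)


KEYPOINTS = hand_keypoint_names()
BONES = hand_bones()
print(len(KEYPOINTS), len(BONES), count_fingers(BONES))
```

Because the skeleton fixes the number of finger chains up front, a pose extracted from a reference photo gives the generator an explicit five-finger structure to follow, rather than leaving it to infer the count from pixels alone.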
If you don't want to deal with all that, of course, you can just use Photoshop to fix the horrible hands in your pictures. Adobe has been selling AI software to improve images for 30 years, so in a way, you are also technically an AI artist if you use any image editing software.
As AI models continue to evolve and improve, the quality of generated hands and other complex patterns will undoubtedly advance. The combination of increased investment, data availability, and hardware capabilities, as well as collaboration within the open-source community, will drive significant progress in this field.