Apple Shakes Up Open-Source AI With MGIE Image Editor

The new technique lets users chat in natural language with the model and produces better results than more established methods like Pix2Pix.


Feb 7, 2024

After seemingly lurking on the sidelines most of last year, Apple is starting to shake things up in the field of artificial intelligence—and open-source AI in particular.

The Cupertino-based tech giant has partnered with the University of California, Santa Barbara to develop an AI model that can edit images based on natural language instructions, much the way people interact with ChatGPT. Apple calls it Multimodal Large Language Model-Guided Image Editing (MGIE).

MGIE interprets text instructions provided by users, processing and refining them to generate precise image editing commands. Integrating a diffusion model enhances the process, enabling MGIE to apply edits based on the characteristics of the original image.

Multimodal Large Language Models (MLLMs), which can process both text and images, form the foundation of the MGIE method. Unlike traditional single-mode AIs focusing solely on text or images, MLLMs can process complex instructions and work in a wider range of situations. For example, a model may understand a text instruction, analyze the elements of a specific photo, then take something out of the image and create a new picture without that element.

To perform these actions, an AI system must have different capabilities, including generative text, generative image, segmentation, and CLIP analysis, all in the same process.

The introduction of MGIE brings Apple closer to achieving capabilities akin to OpenAI's ChatGPT Plus, enabling users to engage in conversational interactions with AI models to create customized images based on text input. With MGIE, users can provide detailed instructions in natural language, such as "remove the traffic cone from the foreground," which are translated into image editing commands and executed.

In other words, users can start with a photo of a blonde person and turn them into a ginger just by saying, "make this person a redhead." Under the hood, the model would understand the instruction, segment the person's hair, generate a command like "red hair, highly detailed, photorealistic, ginger tone," and then execute the changes via inpainting.
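The three steps in that walkthrough (expand the instruction, segment the target, inpaint the region) can be sketched in a few lines of Python. This is a toy illustration, not MGIE's code: the "image" is a small grid of region labels rather than pixels, and every name here (`EditPlan`, `plan_edit`, `segment`, `inpaint`, `edit`) is hypothetical, with a lookup table standing in for the language model.

```python
from dataclasses import dataclass
from typing import List

# Toy "image": a grid of region labels instead of pixels (an assumption
# made so the sketch runs without any model or image library).
Image = List[List[str]]


@dataclass
class EditPlan:
    target: str  # region to change, e.g. "hair"
    prompt: str  # expanded prompt handed to the image generator


def plan_edit(instruction: str) -> EditPlan:
    """Stand-in for the language model: expand a terse instruction."""
    # Hypothetical lookup table; a real system would generate this text.
    rules = {
        "redhead": EditPlan(
            target="hair",
            prompt="red hair, highly detailed, photorealistic, ginger tone",
        ),
    }
    for keyword, plan in rules.items():
        if keyword in instruction.lower():
            return plan
    raise ValueError(f"no rule for instruction: {instruction!r}")


def segment(image: Image, target: str) -> List[List[bool]]:
    """Stand-in for segmentation: mark the cells that belong to the target."""
    return [[cell == target for cell in row] for row in image]


def inpaint(image: Image, mask: List[List[bool]], prompt: str) -> Image:
    """Stand-in for inpainting: rewrite only the masked cells."""
    fill = prompt.split(",")[0]  # first phrase of the prompt, e.g. "red hair"
    return [
        [fill if masked else cell for cell, masked in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def edit(image: Image, instruction: str) -> Image:
    """Chain the three steps: plan, segment, inpaint."""
    plan = plan_edit(instruction)
    mask = segment(image, plan.target)
    return inpaint(image, mask, plan.prompt)


photo = [["sky", "hair", "sky"],
         ["sky", "face", "sky"]]
edited = edit(photo, "make this person a redhead")
```

Only the cell labeled "hair" changes; the rest of the grid is left alone, which is the point of masking before inpainting.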

Apple's approach aligns with existing tools like Stable Diffusion, which can be augmented with a rudimentary interface for text-guided image editing. Leveraging third-party tools like Pix2Pix, users can interact with the Stable Diffusion interface using natural language commands and see the effects on edited images in real time.

Apple's approach, however, proves more accurate than the comparable methods it was tested against.

Results of editing an image with natural language using InstructPix2Pix, LGIE, and Apple's MGIE, alongside the ground-truth image. Image: Apple

Besides generative edits, Apple's MGIE can perform more conventional image editing tasks like color grading, resizing, rotations, style changes, and sketching.

Why would Apple make it open source?

Apple's open-source forays are a clear strategic move—with a scope beyond mere licensing requirements.

To build MGIE, Apple used open-source models such as LLaVA and Vicuna. Due to the licensing requirements of these models, which limit commercial use by big corporate entities, Apple was likely compelled to share its improvements openly on GitHub.

But this also allows Apple to leverage a worldwide pool of developers in a bid to boost its strength and flexibility. This kind of collaboration moves things forward far faster than Apple working entirely on its own, and starting from scratch. In addition, this openness inspires a wider spectrum of ideas and draws diverse technical talent, allowing MGIE to evolve faster.

Engagement by Apple in the open-source community with projects like MGIE also gives the brand a boost among developers and tech enthusiasts. This aspect is no secret, with Meta and Microsoft both heavily investing in open-source AI.

It's possible that releasing MGIE as open-source software will give Apple a head start in setting still-evolving industry standards for AI and AI-based image editing in particular. With MGIE, Apple has likely given AI artists and developers a solid foundation with which to build the next big thing, providing more accuracy and efficiency than what's available elsewhere.

MGIE will certainly make Apple's products better: it wouldn't be too difficult to transcribe a voice command sent to Siri and use that text to edit a photo on the user's smartphone, computer, or immersive headset.

Technically savvy AI developers can use MGIE right now. Just visit the project’s GitHub repository.

Edited by Ryan Ozawa.
