
I Cloned Myself With Midjourney’s New Feature. It Took 30 Seconds and Was Weirdly Convincing.

This week, Midjourney released a massively exciting feature that users have demanded for months: character references.

The new tool allows Midjourney users to create consistent characters across multiple images. As we’ll see, it’s a huge deal for artists, illustrators and animators.

But the new feature also has a dark side. It allows even novice users to create alarmingly convincing deepfakes in just a few simple clicks.

Let’s explore.

The Power of Character References

Character references are a huge deal for AI image generation.

Before, each image you created with an AI image generator like Midjourney or DALL-E was discrete. The images stood alone — although power users found clever workarounds, you couldn’t easily use elements from one image in future images.

That was a big problem for illustrators and animators. Creating a series of illustrations with a consistent character — for example, to illustrate a children’s book or graphic novel — was nearly impossible.

Character references change that. The new feature allows designers to create a character once, then reuse that character in an unlimited number of future images.

For example, I can create a character called Bichon Frise Man using Midjourney's /imagine command.

Illustration by the author via Midjourney

I can then run additional commands to have Bichon Frise Man do anything that I want, like swooping in to rescue people from a burning building.

Illustration by the author via Midjourney

With Character References enabled, the superhero in my second image closely resembles the character from my first.

Again, this is a huge deal for anyone interested in creating a series of images with Midjourney. With Character References, you could now use the system to create illustrations for an entire comic book, storyboards for a film, and more.

Using the function is easy. You simply append the parameter --cref [URL TO REFERENCE IMAGE] to the end of your Midjourney prompt, like this:

/imagine Bichon Frise man rushes into a burning building to save someone --cref [URL TO ORIGINAL IMAGE]
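As of this writing, Midjourney's release notes also describe an optional --cw ("character weight") parameter that controls how much of the reference carries over. Per the documentation, values run from 0 to 100: the default of 100 copies face, hair, and clothing, while lower values focus the match on the face alone. A prompt using it might look like this (the beach scene here is my own illustrative example):

```
/imagine Bichon Frise man relaxes on a tropical beach --cref [URL TO ORIGINAL IMAGE] --cw 0
```

If the system keeps copying wardrobe details you don't want from your reference image, lowering the weight is worth a try.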

Easy Deepfakes

Again, Character References will be a huge boon to illustrators. But the new feature also comes with a risk. Character References make it far easier to quickly spin up convincing deepfakes, based only on a single reference image of a real person.

To demonstrate how this could work, I decided to deepfake myself. Instead of specifying the URL to a previous Midjourney character image in my new commands, I instead sent Midjourney a selfie of me in a Bay Area restaurant from my new site The Bay Review.

Real photo of Thomas Smith.

I then asked Midjourney to use that image and create scenes that feature me going on a variety of fun vacations.

Here’s the deepfaked me having the time of his life in Tahoe:

Deepfake of Thomas Smith via Midjourney

Here I am on a beach in Maui:

Deepfake of Thomas Smith via Midjourney

At the end of the day, the deepfake Thomas Smith apparently got a bit tired from all the traveling and needed some alone time. When I asked Midjourney to show me eating dinner, it produced this pensive image.

Deepfake of Thomas Smith via Midjourney

Those images aren’t perfect, but they’re pretty convincing. My own family could probably tell that they’re fake. But if I posted the Tahoe one on social media with a caption like “Hi from the slopes!” I guarantee that many people would be fooled (I’m not going to do this — see my section below on using this tech ethically.)

Running Midjourney’s images through a system like Upscaler, which is designed to fix AI faces, made them even more convincing. Here’s the Tahoe image after applying Upscaler’s FacePro function.

Deepfake of Thomas Smith via Midjourney

Many of the somewhat grotesque elongations and other distortions in the original image are gone. There are still flaws (my skin looks too smooth in this one, for example), but many people would probably still believe that it's me.

For reference, here’s a real picture of me on the real slopes.

Real photo of Thomas Smith in Tahoe.

The deepfake isn’t too far off from reality.

Deepfakes on Demand

Midjourney’s new Character Reference function is powerful, but it’s not perfect.

As Midjourney acknowledges in their release notes for the feature (shared on their Discord channel), “This feature works best when using characters made from Midjourney images. It’s not designed for real people / photos (and will likely distort them as regular image prompts do).”

Indeed, many of my Character Reference generations were distorted or strange. I don’t know who this guy is, but he’s definitely not me!

Deepfake of Thomas Smith via Midjourney

Character References also struggle to create images showing more than one subject. I asked for a photo of me meeting Taylor Swift, and instead got this bizarre shot of two pseudo Thomas Smiths embracing.

Deepfake of Thomas Smith via Midjourney

The function also copies clothing to a fault — even when I asked for a photo of myself on a tropical beach, the system showed me wearing the same puffy jacket from my reference image!

Still, in my testing, I found that Midjourney got my salient features right a surprising portion of the time.

Crucially, creating my deepfakes cost nothing, took only 1–2 minutes of processing time, and required no editing skills.

I was also able to create them using a single, publicly available reference image — there was no need to train Midjourney with lots of photos of me, as I would have needed to do with previous image generators.

Character References opens up tons of exciting creative possibilities. But given this ease of use, it also reveals risks.

Specifically, the tech shows that it’s now incredibly simple for anyone with basic prompting skills to create reasonably good deepfakes of anyone else.

My vacation examples are silly. But I could easily create a fake photo of a real person doing something nefarious that they never actually did.

To be fair, Midjourney’s Community Guidelines prohibit these kinds of deceptive uses of their system. Faking images for destructive or abusive purposes is forbidden. As Midjourney says, “We are not a democracy.” Creating abusive deepfakes will almost certainly get you banned from the platform.

The (Deepfaked) Cat is Out of the Bag

Still, even if Midjourney polices its Character References function well and establishes good guardrails, the fact that character references exist on one platform means other platforms will almost certainly implement the same function.

And many of those platforms — especially open-source ones — won’t include any guardrails at all.

The ease of creating deepfakes risks further eroding the public's trust in visuals. Before, good deepfakes were the purview of skilled graphic designers, often with institutional or government backing.

As generative AI advances, it will become ever easier for anyone to call up a convincing deepfake of anyone else with a few mouse clicks.

As Kate Middleton’s high-profile Photoshop snafu revealed this week, people are already suspicious of images. And bad actors are already allegedly claiming that real images are deepfakes in order to cast doubt on photos showing them doing compromising things.

Easy deepfakes will make it harder than ever to trust any image from any source. Now that the deepfaked cat is out of the bag, there’s little we can do as imagery consumers to put it back.

But, there are a few steps we can take to minimize the negative impact of these new technologies:

  • Clearly label manipulated or deepfaked images when we create them, as I have done in this article.
  • Further develop and foster networks of trusted photographers and imagery producers, so that we can trust the real/manipulated labels that creators attach to their work.
  • Embrace technologies like digital watermarking that can identify deepfaked images from the point of creation.
  • Increase our own media literacy, so we can spot potentially deepfaked images when we see them. MIT has a great resource for this.
  • Realize that deepfakes usually work by playing on stereotypes or biases already present in a community. When we see an image that appears to confirm a deeply held suspicion or emotion, we need to be especially leery of it.
  • Avoid rejecting dual-use tech like Character References. Yes, the tech opens up the possibility of expanded deepfakes. But it’s also an incredibly valuable tool for illustrators and other creators. We shouldn’t throw the baby out with the deepfaked bathwater.

As my Midjourney experiments show, the technological barrier for creating deepfakes has almost entirely evaporated.

We can’t stop people from creating deepfakes. It’s now on us as media consumers to establish the systems of trust and education required to avoid being duped or manipulated by these images.

I’ve tested thousands of ChatGPT prompts over the last year. As a full-time creator, there are a handful I come back to every day that fit with the ethical uses I mention in this article. I compiled them into a free guide, 7 Enormously Useful ChatGPT Prompts For Creators. Grab a copy today!

The Upscaler link above is an affiliate link. This article originally appeared in The Generator.

Thomas Smith

Thomas Smith is a food and travel photographer and writer based in the San Francisco Bay Area. His photographic work routinely appears in publications including Food and Wine, Conde Nast Traveler, and the New York Times and his writing appears in IEEE Spectrum, SFGate, the Bold Italic and more. Smith holds a degree in Cognitive Science (Neuroscience) and Anthropology from the Johns Hopkins University.


Discover more from Bay Area Telegraph
