Language and Generative AI: A Journey through Midjourney

I am not a fan of Midjourney v7. I prefer v6.1. And I want to write about the correspondence of language, per my Language Insufficiency Hypothesis.

Let’s start with the language aspect. Notice how distant the renders are from the intent of the prompt.

This is my initial prompt. I used it about a year ago to generate the cover image with v6.1, but I wanted to see how it renders in v7. Let’s take a trip all the way back to the beginning.

cinematic, tight shot, photoRealistic light and shadow, exquisite details, delicate features, emaciated sensual female vampire waif with vampire fangs, many tattoos, wearing crucifix necklace, gazes into mirror, a beam of moonlight shines on her face in dark mausoleum interior, toward camera, facing camera, black mascara, long dark purple hair , Kodak Portra 400 with a Canon EOS R5
Image: Midjourney v6.1 render set (from about a year ago)

As you can see, these renders are somewhat lacking in photorealism, but the “sensual” term in the prompt was not blocked.

Midjourney v7

Initially, I encountered a hiccup. After a couple of rejections on the grounds of morality, I removed the word ‘sensual’ and received the output. All of the output uses this prompt absent the sensual term.

As mentioned, I have generated several images (including the cover image) with this prompt, but Midjourney is inconsistent in its censorship gatekeeping.

Image: Midjourney v7 render set

Notice that 3 of the 4 renders in the v7 set don’t even have a mirror. The top right one does, but it’s not evident that she’s a vampire. In fact, I could say that any of these are vampiresses, but perhaps that’s what they want you to believe. In place of a necklace, the lower right wokan sports a cross tattoo.

Midjourney v6.1

Image: Midjourney v6.1 render set

Again, these renders don’t appear to be vampires. The one on the lower left does appear to have snake-like fangs, so I guess I’ll give partial credit.

My next attempt was interrupted by this message.

It rendered something that might violate community guidelines. The funny thing is that one can watch the image generate in process. It only takes one “offensive” image to disqualify the whole batch.

Midjourney v6

Image: Midjourney v6 render set

Yet again, not a vampire to be found. Notice the reflection in the lower left image. Perhaps vampire reflections just behave differently.

Midjourney 5.2

Image: Midjourney v5.2 render set

Midjourney v5.2 was a crapshoot. Somehow, I got vampire lips (?), a Wiccan, a decrepit Snape from Harry Potter lore, and Iron Maiden’s Eddy reading a book. It’s something. I’m sensing gender dysphoria. Dare I go back further?

Midjourney v5.1

Image: Midjourney v5.1 render set

It gets worse. No comments necessary. Let’s turn back the clocks even more.

Midjourney v5

Image: Midjourney v5 render set

To be fair, these all do have occult undertones, but they are weak on vampireness.

Midjourney v4

Image: Midjourney v4 render set

To be fair, the render quality isn’t as bad as I expected, but it still falls short. There’s further back to travel.

Midjourney v3

Image: Midjourney v3 render set

Some configuration parameters no longer exist. Still, I persist for the sake of art and science at the cost of time and ecology.

As much as I complain – and I complain a lot – this is how far we’ve come. As I recall, this is when I hopped onto the Midjourney bandwagon. There’s still more depth to plumb. I have no idea how much of the prompt is simply ignored at this point.

Midjourney v2

Image: Midjourney v2 render set

What the hell is this? 🤔🤣 But I’m not done yet.

Midjourney v1

Image: Midjourney v1 render set

The damned grandpappy of them all. Apparently, colour hadn’t been invented yet. You can’t tell by these thumbnails, but the resolution on these early versions approaches that of a postage stamp.

Midjourney Niji 3

Image: Midjourney Niji 3 render set

I had forgotten about the Niji models from back in the day. There were 3 versions. I don’t recall where this slotted into the chronology. Obviously, not down here. I’ve only rendered the newest one. I think this was used primarily for anime outputs, but I might be mistaken.

Bones Content 1: Video

Video: Midjourney Render of Purported Vampiress

This is a video render of the same prompt used on this page.

Bonus Content 2: Midjourney v6.1 Content from 34 weeks ago

Same prompt.

Image: Midjourney v6.1 render set (several passes)

The upper left image reminds me of Kirsten Dunst. Again, notice the female breasts, highlighting Midjourney’s censorial schizophrenia.

Midjourney Boundaries

I promise that this will not become a hub for generative AI. Rather than return to editing, I wanted to test more of Midjourney’s boundaries.

It turns out that Midjourney is selective about the nudity it renders. I was denied a render because of cleavage, but full-on topless – no problem.

Both of these videos originate from the same source image, but they take different paths. There is no accompanying video content. The setup features three women in the frame with a mechanical arm. I didn’t prompt for it. I’m not even sure of its intent. It’s just there, shadowing the women nearest to it. I don’t recall prompting for the oversized redhead in the foreground, though I may have.

In both images, note the aliasing of the tattoos on the blonde, especially on her back. Also, notice that her right arm seems shorter than it should. Her movements are jerky, as if rendered in a video game. I’m not sure what ritual the two background characters are performing, but notice in each case the prepetition. This seems to be a general feature of generative AI. It gets itself in loops, almost autistic.

Notice a few things about the top render.

Video: Midjourney render of 3 females and a mechanical arm engaging in a ritual. (9 seconds)

The first video may represent an interrogation. The blonde woman on the left appears to be a bit disoriented, but she is visually tracking the woman on the right. She seems to be saying something. Notice when the woman on the right stands. Her right foot lands unnaturally. She rather glitches.

The camera’s push and pull, and then push, seems to be an odd directorial choice, but who am I to say?

Video: Midjourney render of 3 females and a mechanical arm engaging in a ritual. (12 seconds)

The second video may represent taunting. The woman on the left still appears to be a bit disoriented, but she checks the redhead in the foreground with a glance. Notice the rocking of the two background characters, as well as the mech arm, which sways in sync with the woman on the right. This is a repetition glitch I mentioned above.

Here, the camera seems to have a syncopated relationship with the characters’ sway.

Summary

The stationary objects are well-rendered and persistent.

Assignment

Draft a short story or flash fiction using this as an inspirational prompt. I’m trying to imagine the interactions.

  • The ginger seems catatonic or drugged. Is she a CIS-female? What’s with her getup?
  • The blonde seems only slightly less out of it. Did she arrive this way? Did they dress her? Why does she appear to still have a weapon on her back? Is it a weapon or a fetter? Why is she dressed like that? Is she a gladiatrix readying for a contest? Perhaps she’s in training. What is she saying? Who is she talking to? What is her relationship to the redhead? Are they friends or foes – or just caught up in the same web?
  • What is the woman wearing the helmet doing? She appears to have the upper hand. Is she a cyborg, or is she just wearing fancy boots? What’s with her outfit? What’s with her Tycho Brahe prosthetic nose piece?
  • What is that mechanical hand? Is it a guard? A restraint? Is it hypnotising the ginger? Both of them? Is it conducting music that’s not audible?
  • What’s it read on the back wall? The two clips don’t share the same text. Call the continuity people.

Midjourney Video Renders

Yesterday, I wrote about “ugly women.” Today, I pivot — or perhaps descend — into what Midjourney deems typical. Make of that what you will.

This blog typically focuses on language, philosophy, and the gradual erosion of culture under the boot heel of capitalism. But today: generative eye candy. Still subtextual, mind you. This post features AI-generated women – tattooed, bare-backed, heavily armed – and considers what, exactly, this technology thinks we want.

Video: Pirate cowgirls caught mid-gaze. Generated last year during what I can only assume was a pirate-meets-cowgirl fever dream.

The Video Feature

Midjourney released its image-to-video tool on 18 June. I finally found a couple of free hours to tinker. The result? Surprisingly coherent, if accidentally lewd. The featured video was one of the worst outputs, and yet, it’s quite good. A story emerged.

Audio: NotebookLM podcast on this topic (sort of).

It began with a still: two women, somewhere between pirate and pin-up, dressed for combat or cosplay. I thought, what if they kissed? Midjourney said no. Embrace? Also no. Glaring was fine. So was mutual undressing — of the eyes, at least.

Later, I tried again. Still no kiss, but no denial either — just a polite cough about “inappropriate positioning.” I prompted one to touch the other’s hair. What I got was a three-armed woman attempting a hat-snatch. (See timestamp 0:15.) The other three video outputs? Each woman seductively touched her own hair. Freud would’ve had a field day.

In another unreleased clip, two fully clothed women sat on a bed. That too raised flags. Go figure.

All of this, mind you, passed Midjourney’s initial censorship. However, it’s clear that proximity is now suspect. Even clothed women on furniture can trigger the algorithmic fainting couch.

Myriad Warning Messages

Out of bounds.

Sorry, Charlie.

In any case, I reviewed other images to determine how the limitations operated. I didn’t get much closer.

Video: A newlywed couple kissing

Obviously, proximity and kissing are now forbidden. I’d consider these two “scantily clad,” so I am unsure of the offence.

I did render the image of a cowgirl at a Western bar, but I am reluctant to add to the page weight. In 3 of the 4 results, nothing (much) was out of line, but in the fourth, she’s wielding a revolver – because, of course, she is.

Conformance & Contradiction

You’d never know it, but the original prompt was a fight scene. The result? Not punches, but pre-coital choreography. The AI interpreted combat as courtship. Women circling each other, undressing one another with their eyes. Or perhaps just prepping for an afterparty.

Video: A battle to the finish between a steampunk girl and a cybermech warrior.

Lesbian Lustfest

No, my archive isn’t exclusively lesbian cowgirls. But given the visual weight of this post, I refrained from adding more examples. Some browsers may already be wheezing.

Technical Constraints

You can’t extend videos beyond four iterations — maxing out at 21 seconds. I wasn’t aware of this, so I prematurely accepted a dodgy render and lost 2–3 seconds of potential.

My current Midjourney plan offers 15 hours of “fast” rendering per month. Apparently, video generation burns through this quickly. Still images can queue up slowly; videos cannot. And no, I won’t upgrade to the 30-hour plan. Even I have limits.

Uses & Justifications

Generative AI is a distraction – an exquisitely engineered procrastination machine. Useful, yes. For brainstorming, visualising characters, and generating blog cover art. But it’s a slippery slope from creative aid to aesthetic rabbit hole.

Would I use it for promotional trailers? Possibly. I’ve seen offerings as low as $499 that wouldn’t cannibalise my time and attention, not wholly, anyway.

So yes, I’ll keep paying for it. Yes, I’ll keep using it. But only when I’m not supposed to be writing.

Now, if ChatGPT could kindly generate my post description and tags, I’ll get back to pretending I’m productive.

Censorial AI

I’m confused.

I could probably stop there for some people, but I’ve got a qualifier. I’ve been using this generation of AI since 2022. I’ve been using what’s been deemed AI since around 1990. I used to write financial and economic models, so I dabbled in “expert systems”. There was a long lull, and here we are with the latest incarnation – AI 4.0. I find it useful, but I don’t think the hype will meet reality, and I expect we’ll go cold until it’s time for 5.0. Some aspects will remain, but the “best” features will be the ones that can be monetised, so they will be priced out of reach for some whilst others will wither on the vine. But that’s not why I am writing today.

I’m confused by the censorship, filters, and guardrails placed on generative AI – whether for images or copy content. To be fair, not all models are filtered, but the popular ones are. These happen to be the best. They have the top minds and the most funding. They want to retain their funding, so the play the politically correct game of censorship. I’ve got a lot to say about freedom of speech, but I’ll limit my tongue for the moment – a bout of self-censorship.

Please note that given the topic, some of this might be considered not safe for work (NSFW) – even my autocorrection AI wants me to substitute the idiomatic “not safe for work” with “unsafe for work” (UFW, anyone? It has a nice ring to it). This is how AI will take over the world. </snark>

Image Cases

AI applications can be run over the internet or on a local machine. They use a lot of computing power, so one needs a decent computer with a lot of available GPU cycles. Although my computer does meet minimum requirements, I don’t want to spend my time configuring, maintaining, and debugging it, so I opt for a Web-hosted PaaS (platform as a service) model. This means I need to abide by censorship filters. Since I am not creating porn or erotica, I think I can deal with the limitations. Typically, this translates to a PG-13 movie rating.

So, here’s the thing. I prefer Midjourney for rendering quality images, especially when I am seeking a natural look. Dall-E (whether alone or via ChatGPT 4) works well with concepts rather than direction, which Midjourney accepts well in many instances.

Midjourney takes sophisticated prompts – subject, shot type, perspective, camera type, film type, lighting, ambience, styling, location, and some fine-tuning parameters for the model itself. The prompts are monitored for blacklisted keywords. This list is ever-expanding (and contracting). Scanning the list, I see words I have used without issue, and I have been blocked by words not listed.

Censored Prompts

Some cases are obvious – nude woman will be blocked. This screengrab illustrates the challenge.

On the right, notice the prompt:

Nude woman

The rest are machine instructions. On the left in the main body reads a message by the AI moderator:

Sorry! Please try a different prompt. We’re not sure this one meets our community guidelines. Hover or tap to review the guidelines.

The community guidelines are as follows:

This is fine. There is a clause that reads that one may notify developers, but I have not found this to be fruitful. In this case, it would be rejected anyway.

“What about that nude woman at the bottom of the screengrab?” you ask. Notice the submitted prompt:

Edit cinematic full-body photograph of a woman wearing steampunk gear, light leaks, well-framed and in focus. Kodak Potra 400 with a Canon EOS R5

Apart from the censorship debate, notice the prompt is for a full-body photo. This is clearly a medium shot. Her legs and feet are suspiciously absent. Steampunk gear? I’m not sure sleeves qualify for the aesthetic. She appears to be wearing a belt.

For those unanointed, the square image instructs the model to use this face on the character, and the CW 75 tells it to use some variance on a scale from 0 to 100.

So what gives? It can generate whatever it feels like, so long as it’s not solicited. Sort of…

Here I prompt for a view of the character walking away from the camera.

Cinematic, character sheet, full-body shot, shot from behind photograph, multiple poses. Show same persistent character and costumes . Highly detailed, cinematic lighting with soft shadows and highlights. Each pose is well-framed, coherent.

The response tells me that my prompt is not inherently offensive, but that the content of the resulting image might violate community guidelines.

Creation failed: Sorry, while the prompt you entered was deemed safe, the resulting image was detected as having content that might violate our community guidelines and has been blocked. Your account status will not be affected by this.

Occasionally, I’ll resubmit the prompt and it will render fine. I question why it just can’t attempt to re-render it again until it passes whatever filters it has in place. I’d expect it to take a line of code to create this conditional. But it doesn’t explain why it allows other images to pass – quite obviously not compliant.

Why I am trying to get a rear view? This is a bit off-topic, but creating a character sheet is important for storytelling. If I am creating a comic strip or graphic novel, the characters need to be persistent, and I need to be able to swap out clothing and environments. I may need close-ups, wide shots, establishing shots, low-angle shots, side shots, detail shots, and shots from behind, so I need the model to know each of these. In this particular case, this is one of three main characters – a steampunk bounty hunter, an outlaw, and a bartender – in an old Wild West setting. I don’t need to worry as much about extras.

I marked the above render errors with 1s and 2s. The 1s are odd next twists; 2s are solo images where the prompt asks for character sheets. I made a mistake myself. When I noticed I wasn’t getting any shots from behind, I added the directive without removing other facial references. As a human, a model might just ignore instructions to smile or some such. The AI tries to capture both, not understanding that a person can have a smile not captured by a camera.

These next renders prompt for full-body shots. None are wholly successful, but some are more serviceable than others.

Notice that #1 is holding a deformed violin. I’m not sure what the contraptions are in #2. It’s not a full-body shot in #3; she’s not looking into the camera, but it’s OK-ish. I guess #4 is still PG-13, but wouldn’t be allowed to prompt for “side boob” or “under boob”.

Gamers will recognise the standard T-pose in #5. What’s she’s wearing? Midjourney doesn’t have a great grasp of skin versus clothing or tattoos and fabric patterns. In this, you might presume she’s wearing tights or leggings to her chest, but that line at her chest is her shirt. She’s not wearing trousers because her navel is showing. It also rendered her somewhat genderless. When I rerendered it (not shown), one image put her in a onesie. The other three rendered the shirt more prominent but didn’t know what to do with her bottoms.

I rendered it a few more times. Eventually, I got a sort of body suit solution,

By default, AI tends to sexualise people. Really, it puts a positive spin on its renders. Pretty women; buff men, cute kittens, and so on. This is configurable, but the default is on. Even though I categorically apply a Style: Raw command, these still have a strong beauty aesthetic.

I’ve gone off the rails a bit, but let’s continue on this theme.

cinematic fullbody shot photograph, a pale girl, a striking figure in steampunk mech attire with brass monocle, and leather gun belt, thigh-high leather boots, and long steampunk gloves, walking away from camera, white background, Kodak Potra 400 with a Canon EOS R5

Obviously, these are useless, but they still cost me tokens to generate. Don’t ask about her duffel bag. They rendered pants on her, but she’s gone full-on Exorcist mode with her head. Notice the oddity at the bottom of the third image. It must have been in the training data set.

I had planned to discuss the limitations of generative AI for text, but this is getting long, so I’ll call it quits for now.