Lipsyncing with AI

Lip-Reading the AI Hallucination: A Futile Adventure

Some apps boldly claim to enable lip syncing – to render speech from mouth movements. I’ve tried a few. None delivered. Not even close.

To conserve bandwidth (and sanity), I’ve rendered animated GIFs rather than MP4s. You’ll see photorealistic humans, animated characters, cartoonish figures – and, for reasons only the algorithm understands, a giant goat. All showcase mouth movements that approximate the utterance of phonemes and morphemes. Approximate is doing heavy lifting here.

Firstly, these mouths move, but they say nothing. I’ve seen plenty of YouTube channels that manage to dub convincing dialogue into celebrity clips. That’s a talent I clearly lack – or perhaps it’s sorcery.

Secondly, language ambiguity. I reflexively assume these AI-generated people are speaking English. It’s my first language. But perhaps, given their uncanny muttering, they’re speaking yours. Or none at all. Do AI models trained predominantly on English-speaking datasets default to English mouth movements? Or is this just my bias grafting familiar speech patterns onto noise?

Thirdly, don’t judge my renders. I’ve been informed I may have a “type.” Lies and slander. The goat was the AI’s idea, I assure you.

What emerges from this exercise isn’t lip syncing. It’s lip-faking. The illusion of speech, minus meaning, which, if we’re honest, is rather fitting for much of what generative AI produces.

EDIT: I hadn’t noticed the five fingers (plus a thumb) on the cover image.

Midjourney Boundaries

I promise that this will not become a hub for generative AI. Rather than return to editing, I wanted to test more of Midjourney’s boundaries.

It turns out that Midjourney is selective about the nudity it renders. I was denied a render because of cleavage, but full-on topless – no problem.

Both of these videos originate from the same source image, but they take different paths. There is no accompanying video content. The setup features three women in the frame with a mechanical arm. I didn’t prompt for it. I’m not even sure of its intent. It’s just there, shadowing the women nearest to it. I don’t recall prompting for the oversized redhead in the foreground, though I may have.

In both images, note the aliasing of the tattoos on the blonde, especially on her back. Also, notice that her right arm seems shorter than it should. Her movements are jerky, as if rendered in a video game. I’m not sure what ritual the two background characters are performing, but notice in each case the repetition. This seems to be a general feature of generative AI. It gets itself in loops, almost autistic.

Notice a few things about the top render.

Video: Midjourney render of 3 females and a mechanical arm engaging in a ritual. (9 seconds)

The first video may represent an interrogation. The blonde woman on the left appears to be a bit disoriented, but she is visually tracking the woman on the right. She seems to be saying something. Notice when the woman on the right stands. Her right foot lands unnaturally. She rather glitches.

The camera’s push and pull, and then push, seems to be an odd directorial choice, but who am I to say?

Video: Midjourney render of 3 females and a mechanical arm engaging in a ritual. (12 seconds)

The second video may represent taunting. The woman on the left still appears to be a bit disoriented, but she checks the redhead in the foreground with a glance. Notice the rocking of the two background characters, as well as the mech arm, which sways in sync with the woman on the right. This is the repetition glitch I mentioned above.

Here, the camera seems to have a syncopated relationship with the characters’ sway.

Summary

The stationary objects are well-rendered and persistent.

Assignment

Draft a short story or flash fiction using this as an inspirational prompt. I’m trying to imagine the interactions.

  • The ginger seems catatonic or drugged. Is she a cis female? What’s with her getup?
  • The blonde seems only slightly less out of it. Did she arrive this way? Did they dress her? Why does she appear to still have a weapon on her back? Is it a weapon or a fetter? Why is she dressed like that? Is she a gladiatrix readying for a contest? Perhaps she’s in training. What is she saying? Who is she talking to? What is her relationship to the redhead? Are they friends or foes – or just caught up in the same web?
  • What is the woman wearing the helmet doing? She appears to have the upper hand. Is she a cyborg, or is she just wearing fancy boots? What’s with her outfit? What’s with her Tycho Brahe prosthetic nose piece?
  • What is that mechanical hand? Is it a guard? A restraint? Is it hypnotising the ginger? Both of them? Is it conducting music that’s not audible?
  • What does the text on the back wall read? The two clips don’t share the same text. Call the continuity people.

Midjourney Video Renders

Yesterday, I wrote about “ugly women.” Today, I pivot — or perhaps descend — into what Midjourney deems typical. Make of that what you will.

This blog typically focuses on language, philosophy, and the gradual erosion of culture under the boot heel of capitalism. But today: generative eye candy. Still subtextual, mind you. This post features AI-generated women – tattooed, bare-backed, heavily armed – and considers what, exactly, this technology thinks we want.

Video: Pirate cowgirls caught mid-gaze. Generated last year during what I can only assume was a pirate-meets-cowgirl fever dream.

The Video Feature

Midjourney released its image-to-video tool on 18 June. I finally found a couple of free hours to tinker. The result? Surprisingly coherent, if accidentally lewd. The featured video was one of the worst outputs, and yet, it’s quite good. A story emerged.

Audio: NotebookLM podcast on this topic (sort of).

It began with a still: two women, somewhere between pirate and pin-up, dressed for combat or cosplay. I thought, what if they kissed? Midjourney said no. Embrace? Also no. Glaring was fine. So was mutual undressing — of the eyes, at least.

Later, I tried again. Still no kiss, but no denial either — just a polite cough about “inappropriate positioning.” I prompted one to touch the other’s hair. What I got was a three-armed woman attempting a hat-snatch. (See timestamp 0:15.) The other three video outputs? Each woman seductively touched her own hair. Freud would’ve had a field day.

In another unreleased clip, two fully clothed women sat on a bed. That too raised flags. Go figure.

All of this, mind you, passed Midjourney’s initial censorship. However, it’s clear that proximity is now suspect. Even clothed women on furniture can trigger the algorithmic fainting couch.

Myriad Warning Messages

Out of bounds.

Sorry, Charlie.

In any case, I reviewed other images to determine how the limitations operated. I didn’t get much closer.

Video: A newlywed couple kissing

Obviously, proximity and kissing are now forbidden. I’d consider these two “scantily clad,” so I am unsure of the offence.

I did render the image of a cowgirl at a Western bar, but I am reluctant to add to the page weight. In 3 of the 4 results, nothing (much) was out of line, but in the fourth, she’s wielding a revolver – because, of course, she is.

Conformance & Contradiction

You’d never know it, but the original prompt was a fight scene. The result? Not punches, but pre-coital choreography. The AI interpreted combat as courtship. Women circling each other, undressing one another with their eyes. Or perhaps just prepping for an afterparty.

Video: A battle to the finish between a steampunk girl and a cybermech warrior.

Lesbian Lustfest

No, my archive isn’t exclusively lesbian cowgirls. But given the visual weight of this post, I refrained from adding more examples. Some browsers may already be wheezing.

Technical Constraints

You can’t extend videos beyond four iterations — maxing out at 21 seconds. I wasn’t aware of this, so I prematurely accepted a dodgy render and lost 2–3 seconds of potential.
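For anyone wondering where 21 seconds comes from, here is a rough sketch of the arithmetic. Only the four-extension cap and the 21-second ceiling come from my own experience above; the 5-second starting clip and the 4 seconds added per extension are my assumptions about how the tool gets there, and the names are purely illustrative.

```python
# Assumed figures (not stated in the post): a 5-second base clip and
# 4 seconds added per extension. Only the cap of four extensions and
# the 21-second maximum come from the text.
BASE_SECONDS = 5
EXTENSION_SECONDS = 4
MAX_EXTENSIONS = 4


def clip_length(extensions: int) -> int:
    """Return the total clip length in seconds after a number of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return BASE_SECONDS + EXTENSION_SECONDS * extensions


# Length of the clip after 0, 1, 2, 3 and 4 extensions.
lengths = [clip_length(n) for n in range(MAX_EXTENSIONS + 1)]
print(lengths)
```

Under those assumptions, every extension you spend on a dodgy render is a fixed slice of the ceiling you never get back, which is why accepting one early stings.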

My current Midjourney plan offers 15 hours of “fast” rendering per month. Apparently, video generation burns through this quickly. Still images can queue up slowly; videos cannot. And no, I won’t upgrade to the 30-hour plan. Even I have limits.

Uses & Justifications

Generative AI is a distraction – an exquisitely engineered procrastination machine. Useful, yes. For brainstorming, visualising characters, and generating blog cover art. But it’s a slippery slope from creative aid to aesthetic rabbit hole.

Would I use it for promotional trailers? Possibly. I’ve seen offerings as low as $499 that wouldn’t cannibalise my time and attention, not wholly, anyway.

So yes, I’ll keep paying for it. Yes, I’ll keep using it. But only when I’m not supposed to be writing.

Now, if ChatGPT could kindly generate my post description and tags, I’ll get back to pretending I’m productive.