Lipsyncing with AI
Lip-Reading the AI Hallucination: A Futile Adventure

Some apps boldly claim to enable lip syncing – to render mouth movements that match speech. I’ve tried a few. None delivered. Not even close.

To conserve bandwidth (and sanity), I’ve rendered animated GIFs rather than MP4s. You’ll see photorealistic humans, animated characters, cartoonish figures – and, for reasons only the algorithm understands, a giant goat. All showcase mouth movements that approximate the utterance of phonemes and morphemes. Approximate is doing the heavy lifting here.

Firstly, these mouths move, but they say nothing. I’ve seen plenty of YouTube channels that manage to dub convincing dialogue into celebrity clips. That’s a talent I clearly lack – or perhaps it’s sorcery.

Secondly, there’s the language ambiguity. I reflexively assume these AI-generated people are speaking English. It’s my first language. But perhaps, given their uncanny muttering, they’re speaking yours. Or none at all. Do AI models trained predominantly on English-speaking datasets default to English mouth movements? Or is this just my bias grafting familiar speech patterns onto noise?

Thirdly, don’t judge my renders. I’ve been informed I may have a “type.” Lies and slander. The goat was the AI’s idea, I assure you.

What emerges from this exercise isn’t lip syncing. It’s lip-faking. The illusion of speech, minus meaning, which, if we’re honest, is rather fitting for much of what generative AI produces.

EDIT: I hadn’t noticed the five fingers (plus a thumb) on the cover image.