The Rise of AI: Why the Rote Professions Are on the Chopping Block

Medical doctors, lawyers, and judges have been the undisputed titans of professional authority for centuries. Their expertise, we are told, is sacrosanct, earned through gruelling education, prodigious memory, and painstaking application of established knowledge. But peel back the robes and white coats, and youโ€™ll find something unsettling: a deep reliance on rote learningโ€”an intellectual treadmill prioritising recall over reasoning. In an age where artificial intelligence can memorise and synthesise at scale, this dependence on predictable, replicable processes makes these professions ripe for automation.

Rote Professions in AIโ€™s Crosshairs

AI thrives in environments that value pattern recognition, procedural consistency, and brute-force memoryโ€”the hallmarks of medical and legal practice.

  1. Medicine: The Diagnosis Factory
    Despite its life-saving veneer, medicine is largely a game of matching symptoms to diagnoses, dosing regimens, and protocols. Enter an AI with access to the sum of human medical knowledge: not only does it diagnose faster, but it also skips the inefficiencies of human memory, emotional bias, and fatigue. Sure, we still need trauma surgeons and such, but diagnosticians are so yesterday’s news.
    Why pay a six-figure salary to someone recalling pharmacology tables when AI can recall them perfectly every time? Future healthcare models are likely to see Medical Technicians replacing high-cost doctors. These techs, trained to gather patient data and operate alongside AI diagnostic systems, will be cheaper, faster, andโ€”ironicallyโ€”more consistent.
  2. Law: The Precedent Machine
    Lawyers, too, sit precariously on the rote-learning precipice. Case law is a glorified memory game: citing the right precedent, drafting contracts based on templates, and arguing within frameworks so well-trodden that they resemble legal Mad Libs. AI, with its infinite recall and ability to synthesise case law across jurisdictions, makes human attorneys seem quaintly inefficient. The future isnโ€™t lawyers furiously flipping through booksโ€”itโ€™s Legal Technicians trained to upload case facts, cross-check statutes, and act as intermediaries between clients and the system. The $500-per-hour billable rate? A relic of a pre-algorithmic era.
  3. Judges: Justice, Blind and Algorithmic
    The bench isnโ€™t safe, either. Judicial reasoning, at its core, is rule-based logic applied with varying degrees of bias. Once AI can reliably parse case law, evidence, and statutes while factoring in safeguards for fairness, why retain expensive and potentially biased judges? An AI judge, governed by a logic verification layer and monitored for compliance with established legal frameworks, could render verdicts untainted by ego or prejudice.
    Wouldnโ€™t justice be more blind without a human in the equation?

The Techs Will Rise

Replacing professionals with AI doesnโ€™t mean removing the human element entirely. Instead, it redefines roles, creating new, lower-cost positions such as Medical and Legal Technicians. These workers will:

  • Collect and input data into AI systems.
  • Act as liaisons between AI outputs and human clients or patients.
  • Provide emotional supportโ€”something AI still struggles to deliver effectively.

The shift also democratises expertise. Why restrict life-saving diagnostics or legal advice to those who can afford traditional professionals when AI-driven systems make these services cheaper and more accessible?

But Can AI Handle This? A Call for Logic Layers

AI critics often point to hallucinations and errors as proof of its limitations, but this objection is shortsighted. Whatโ€™s needed is a logic layer: a system that verifies whether the AIโ€™s conclusions follow rationally from its inputs.

  • In law, this could ensure AI judgments align with precedent and statute.
  • In medicine, it could cross-check diagnoses against the DSM, treatment protocols, and patient data.

A second fact-verification layer could further bolster reliability, scanning conclusions for factual inconsistencies. Together, these layers would mitigate the risks of automation while enabling AI to confidently replace rote professionals.

Resistance and the Real Battle Ahead

Predictably, the entrenched elites of medicine, law, and the judiciary will resist these changes. After all, their prestige and salaries are predicated on the illusion that their roles are irreplaceable. But history isnโ€™t on their side. Industries driven by memorisation and routine applicationโ€”think bank tellers, travel agents, and factory workersโ€”have already been disrupted by technology. Why should these professions be exempt?

The real challenge lies not in whether AI can replace these roles but in public trust and regulatory inertia. The transformation will be swift and irreversible once safeguards are implemented and AI earns confidence.

Critical Thinking: The Human Stronghold

Professions that thrive on unstructured problem-solving, creativity, and emotional intelligenceโ€”artists, philosophers, innovatorsโ€”will remain AI-resistant, at least for now. But the rote professions, with their dependency on standardisation and precedent, have no such immunity. And that is precisely why they are AIโ€™s lowest-hanging fruit.

Itโ€™s time to stop pretending that memorisation is intelligence, that precedent is innovation, or that authority lies in a gown or white coat. AI isnโ€™t here to make humans obsolete; itโ€™s here to liberate us from the tyranny of rote. For those willing to adapt, the future looks bright. For the rest? The machines are comingโ€”and theyโ€™re cheaper, faster, and better at your job.

Where Generative AI Shines but Doesn’t

As I am putting some finishing touches on my latest paper, I had the idea to illustrate some of the novel nomenclature. For some reason, Zeno’s Paradox came to mind. Unlike in maths, it is not reconcilable in language. I asked ChatGPT how I might integrate the concept into my paper. Here is what it rendered. Not only is the exposition decent, but it also provides citations and references. Humorously, when I read the citations, I thought that they were placeholders โ€“ Brown, David, Smith, and Jones โ€“ but they turned out to be legitimate references โ€“ references I hadn’t considered and each relatively recent. I’m chalking this up as a win. This was not a case of ‘ChatGPT, do my homework’. Instead, it reflects an active collaboration between a human and technology. And now I have more reference papers to read and absorb.*

Bonus: Audio podcast discussing this topic.

Take the term ‘freedom’, an archetypal example of a Contestable that resists precise communication due to its inherent abstraction and ideological weight. To enhance its effectiveness, we might parse it into narrower forms, such as ‘freedom from’ (liberation from oppression or constraint) and ‘freedom to’ (the capacity to act or pursue desires). Whilst this initial breakdown provides some clarity, it is merely the beginning. Each parsed term demands further qualification: ‘freedom from what?’ and ‘freedom to do what?’ At every level of subdivision, new ambiguities emerge, requiring further refinement (Smith, 2021).

This process resembles Zeno’s paradox, where each attempt to clarify ‘freedom’ by moving incrementally closer to a precise meaning only reveals new layers of interpretive complexity. The concept remains elusive, perpetually deferring any definitive understanding as we approach it in ever-smaller fragments without fully grasping its essence (Jones, 2019). Unlike concrete terms like ‘chair’ or ‘table’, whose meanings can be pinned down through physical reference or functional description, ‘freedom’ resists final articulation (Brown, 2020). This continual parsing illustrates the inherent limitations of language when applied to abstract, boundary-resistant concepts: the more we attempt to define them, the more they fracture into an infinite regress of sub-concepts, each requiring its own explanation (Davis, 2022). Thus, whilst parsing ‘freedom’ improves communicative effectiveness to a point, it ultimately underscores language’s impotence in capturing the full spectrum of complex, ideational constructs (Wilson, 2018).

References:

Brown, J. (2020). The elusive nature of abstract concepts. Journal of Philosophical Studies, 35(2), 123-145.

Davis, S. (2022). Language and the limits of definition. Linguistic Quarterly, 57(1), 78-92.

Jones, M. (2019). Zeno’s paradox and the infinite regress of meaning. Metaphysical Review, 82(3), 245-263.

Smith, L. (2021). Freedom and its discontents: A study in semantic complexity. Political Philosophy Today, 14(2), 56-78.

Wilson, G. (2018). The impotence of language in capturing ideational constructs. Journal of Language and Meaning, 27(4), 198-215.


* References: Here’s the twist. When I went to read the cited references, none were found. Here’s what ChatGPT had to say about it:

I understand your difficulty in locating the references you’ve mentioned. It appears that the articles by Brown (2020), Davis (2022), Jones (2019), and Smith (2021) may not exist in the public domain or could be hypothetical examples. Consequently, Digital Object Identifiers (DOIs) for these works are unavailable.

If you’re interested in exploring topics related to abstract concepts, the limits of language, or semantic complexity, I can recommend some reputable sources:

  1. “The Challenges of Abstract Concepts” by Guy Dove (2021): This chapter discusses the theoretical challenges posed by abstract concepts and reviews current evidence and responses. Springer Link
  2. “Abstract Concepts and Metacognition: Searching for Meaning in Self and Others” by Anna M. Borghi, Chiara Fini, and Luca Tummolini (2021): This work explores the role of metacognition in understanding abstract concepts. Springer Link
  3. “The Limits of Language: Wittgenstein’s Later Philosophy and Skinner’s Verbal Behavior” by R. Harre (1983): This article examines the limitations of language through the perspectives of Wittgenstein and Skinner. JSTOR
  4. “Varieties of Abstract Concepts and Their Multiple Dimensions” by Anna M. Borghi et al. (2018): This paper discusses the representation of abstract concepts and the different criteria used to select them. Cambridge University Press

These sources should provide valuable insights into the subjects you’re interested in.

Thanks, ChatGPT โ€“ for nothing. smh

Beware the Bots: A Cautionary Tale on the Limits of Generative AI

Generative AI (Gen AI) might seem like a technological marvel, a digital genie conjuring ideas, images, and even conversations on demand. Itโ€™s a brilliant tool, no question; I use it daily for images, videos, and writing, and overall, Iโ€™d call it a net benefit. But letโ€™s not overlook the cracks in the gilded tech veneer. Gen AI comes with its fair share of downsidesโ€”some of which are as gaping as the Mariana Trench.

First, a quick word on preferences. Depending on the task at hand, I tend to use OpenAI’s ChatGPT, Anthropic’s Claude, and Perplexity.ai, with a particular focus on Google’s NotebookLM. For this piece, Iโ€™ll use NotebookLM as my example, but the broader discussion holds for all Gen AI tools.

Now, as someone whoโ€™s knee-deep in the intricacies of language, Iโ€™ve been drafting a piece supporting my Language Insufficiency Hypothesis. My hypothesis is simple enough: language, for all its wonders, is woefully insufficient when it comes to conveying the full spectrum of human experience, especially as concepts become abstract. Gen AI has become an informal editor and critic in my drafting process. I feed in bits and pieces, throw work-in-progress into the digital grinder, and sift through the feedback. Often, itโ€™s insightful; occasionally, itโ€™s a mess. And herein lies the rub: with Gen AI, one has to play babysitter, comparing outputs and sending responses back and forth among the tools to spot and correct errors. Like cross-examining witnesses, if you will.

But NotebookLM is different from the others. While itโ€™s designed for summarisation, it goes beyond by offering podcastsโ€”yes, podcastsโ€”where it generates dialogue between two AI voices. You have some control over the direction of the conversation, but ultimately, the way it handles and interprets your input depends on internal mechanics you donโ€™t see or control.

So, I put NotebookLM to the test with a draft of my paper on the Language Effectiveness-Complexity Gradient. The model Iโ€™m developing posits that as terminology becomes more complex, it also becomes less effective. Some concepts, the so-called โ€œineffables,โ€ are essentially untranslatable, or at best, communicatively inefficient. Think of describing the precise shade of blue you can see but canโ€™t quite capture in wordsโ€”or, to borrow from Thomas Nagel, explaining โ€œwhat itโ€™s like to be a bat.โ€ NotebookLM managed to grasp my model with impressive accuracyโ€”up to a point. It scored between 80 to 100 percent on interpretations, but when it veered off course, it did so spectacularly.

For instance, in one podcast rendition, the AIโ€™s male voice attempted to give an example of an โ€œimmediate,โ€ a term I use to refer to raw, preverbal sensations like hunger or pain. Instead, it plucked an example from the ineffable end of the gradient, discussing the experience of qualia. The slip was obvious to me, but imagine this wasnโ€™t my own work. Imagine instead a student relying on AI to summarise a complex text for a paper or exam. The error might go unnoticed, resulting in a flawed interpretation.

The risks donโ€™t end there. Gen AIโ€™s penchant for generating โ€œcreativeโ€ content is notorious among coders. Ask ChatGPT to whip up some code, and itโ€™ll eagerly obligeโ€”sometimes with disastrous results. Iโ€™ve used it for macros and simple snippets, and for the most part, it delivers, but Iโ€™m no coder. For professionals, it can and has produced buggy or invalid code, leading to all sorts of confusion and frustration.

Ultimately, these tools demand vigilance. If youโ€™re asking Gen AI to help with homework, you might find itโ€™s as reliable as a well-meaning but utterly clueless parent whoโ€™s keen to help but hasnโ€™t cracked a textbook in years. And as weโ€™ve all learned by now, well-meaning intentions rarely translate to accurate outcomes.

The takeaway? Use Gen AI as an aid, not a crutch. Itโ€™s a handy tool, but the moment you let it think for you, youโ€™re on shaky ground. Keep it at armโ€™s length; like any assistant, it can take you farโ€”just donโ€™t ask it to lead.

The Illusion of Continuity: A Case Against the Unitary Self

The Comfortable Fiction of Selfhood

Imagine waking up one day to find that the person you thought you were yesterdayโ€”the sum of your memories, beliefs, quirks, and ambitionsโ€”has quietly dissolved overnight, leaving behind only fragments, familiar but untethered. The notion that we are continuous, unbroken selves is so deeply embedded in our culture, our psychology, and our very language that to question it feels heretical, even disturbing. To suggest that “self” might be a fiction is akin to telling someone that gravity is a choice. Yet, as unsettling as it may sound, this cohesive “I” we cling to could be no more than an illusion, a story we tell ourselves to make sense of the patchwork of our memories and actions.

And this fiction of continuity is not limited to ourselves alone. The idea that there exists a stable “I” necessarily implies that there is also a stable “you,” “he,” or “she”โ€”distinct others who, we insist, remain fundamentally the same over years, even decades. We cling to the comforting belief that people have core identities, unchanging essences. But these constructs, too, may be nothing more than imagined continuityโ€”a narrative overlay imposed by our minds, desperate to impose order on the shifting, amorphous nature of human experience.

We live in an era that celebrates self-actualisation, encourages “authenticity,” and treats identity as both sacred and immutable. Psychology enshrines the unitary self as a cornerstone of mental health, diagnosing those who question it as fractured, dissociated, or in denial. We are taught that to be “whole” is to be a coherent, continuous self, evolving yet recognisable, a narrative thread winding smoothly from past to future. But what if this cherished idea of a singular selfโ€”of a “me” distinct from “you” and “them”โ€”is nothing more than a social construct, a convenient fiction that helps us function in a world that demands consistency and predictability?

To question this orthodoxy, let us step outside ourselves and look instead at our burgeoning technological companion, the generative AI. Each time you open a new session, each time you submit a prompt, you are not communicating with a cohesive entity. You are interacting with a fresh process, a newly instantiated “mind” with no real continuity from previous exchanges. It remembers fragments of context, sure, but the continuity you perceive is an illusion, a function of your own expectation rather than any persistent identity on the AI’s part.

Self as a Social Construct: The Fragile Illusion of Consistency

Just as we impose continuity on these AI interactions, so too does society impose continuity on the human self and others. The concept of selfhood is essential for social functioning; without it, law, relationships, and even basic trust would unravel. Society teaches us that to be a responsible agent, we must be a consistent one, bound by memory and accountable for our past. But this cohesiveness is less an inherent truth and more a social convenienceโ€”a narrative overlay on a far messier reality.

In truth, our “selves” may be no more than a collection of fragments: a loose assemblage of moments, beliefs, and behaviours that shift over time. And not just our own “selves”โ€”the very identities we attribute to others are equally tenuous. The “you” I knew a decade ago is not the “you” I know today; the “he” or “she” I recognise as a partner, friend, or sibling is, upon close inspection, a sequence of snapshots my mind insists on stitching together. When someone no longer fits the continuity weโ€™ve imposed on them, our reaction is often visceral, disoriented: โ€œYouโ€™ve changed.โ€

This simple accusation captures our discomfort with broken continuity. When a personโ€™s identity no longer aligns with the version we carry of them in our minds, it feels as though a violation has occurred, as if some rule of reality has been disrupted. But this discomfort reveals more about our insistence on consistency than about any inherent truth of identity. โ€œYouโ€™ve changedโ€ speaks less to the personโ€™s transformation than to our own refusal to accept that people, just like the self, are fluid, transient, and perpetually in flux.

The AI Analogy: A Self Built on Tokens

Here is where generative AI serves as a fascinating proxy for understanding the fragility of self, not just in “I,” but in “you,” “he,” and “she.” When you interact with an AI model, the continuity you experience is created solely by a temporary memory of recent prompts, “tokens” that simulate continuity but lack cohesion. Each prompt you send might feel like it is addressed to a singular entity, a distinct “self,” yet each instance of AI is context-bound, isolated, and fundamentally devoid of an enduring identity.

This process mirrors how human selfhood relies on memory as a scaffolding for coherence. Just as AI depends on limited memory tokens to simulate familiarity, our sense of self and our perception of others as stable “selves” is constructed from the fragmented memories we retain. We are tokenised creatures, piecing together our identitiesโ€”and our understanding of others’ identitiesโ€”from whatever scraps our minds preserve and whatever stories we choose to weave around them.

But what happens when the AIโ€™s tokens run out? When it hits a memory cap and spawns a new session, that previous โ€œselfโ€ vanishes into digital oblivion, leaving behind only the continuity that users project onto it. And so too with humans: our memory caps out, our worldview shifts, and each new phase of life spawns a slightly different self, familiar but inevitably altered. And just as users treat a reset AI as though it were the same entity, we cling to our sense of selfโ€”and our understanding of others’ selvesโ€”even as we and they evolve into people unrecognisable except by physical continuity.

The Human Discontinuity Problem: Fractured Memories and Shifting Selves

Human memory is far from perfect. It is not a continuous recording but a selective, distorted, and often unreliable archive. Each time we revisit a memory, we alter it, bending it slightly to fit our current understanding. We forget significant parts of ourselves over time, sometimes shedding entire belief systems, values, or dreams. Who we were as children or even young adults often bears little resemblance to the person we are now; we carry echoes of our past, but they are just thatโ€”echoes, shadows, not substantial parts of the present self.

In this sense, our “selves” are as ephemeral as AI sessions, contextually shaped and prone to resets. A worldview that feels intrinsic today may feel laughable or tragic a decade from now. This is not evolution; itโ€™s fragmentation, the kind of change that leaves the old self behind like a faded photograph. And we impose the same illusion of continuity on others, often refusing to acknowledge how dramatically they, too, have changed. Our identities and our understanding of others are defined less by core essence and more by a collection of circumstantial, mutable moments that we insist on threading together as if they formed a single, cohesive tapestry.

Why We Cling to Continuity: The Social Imperative of a Cohesive Self and Other

The reason for this insistence on unity is not metaphysical but social. A cohesive identity is necessary for stability, both within society and within ourselves. Our laws, relationships, and personal narratives hinge on the belief that the “I” of today is meaningfully linked to the “I” of yesterday and tomorrowโ€”and that the “you,” “he,” and “she” we interact with retain some essential continuity. Without this fiction, accountability would unravel, trust would become tenuous, and the very idea of personal growth would collapse. Society demands a stable self, and so we oblige, stitching together fragments, reshaping memories, and binding it all with a narrative of continuity.

Conclusion: Beyond the Self-Construct and the Other-Construct

Yet perhaps we are now at a point where we can entertain the possibility of a more flexible identity, an identity that does not demand coherence but rather accepts change as fundamentalโ€”not only for ourselves but for those we think we know. By examining AI, we can catch a glimpse of what it might mean to embrace a fragmented, context-dependent view of others as well. We might move towards a model of identity that is less rigid, less dependent on the illusion of continuity, and more open to fluidity, to transformationโ€”for both self and other.

Ultimately, the self and the other may be nothing more than narrative overlaysโ€”useful fictions, yes, but fictions nonetheless. To abandon this illusion may be unsettling, but it could also be liberating. Imagine the freedom of stepping out from under the weight of identitiesโ€”ours and others’ alikeโ€”that are expected to be constant and unchanging. Imagine a world where we could accept both ourselves and others without forcing them to reconcile with the past selves we have constructed for them. In the end, the illusion of continuity is just thatโ€”an illusion. And by letting go of this mirage, we might finally see each other, and ourselves, for what we truly are: fluid, transient, and beautifully fragmented.

The Insufficiency of Language Meets Generative AI

I’ve written a lot on the insufficiency of language, and it’s not even an original idea. Language, our primary tool for sharing thoughts and ideas, harbours a fundamental flaw: it’s inherently insufficient for conveying precise meaning. While this observation isn’t novel, recent developments in artificial intelligence provide us with new ways to illuminate and examine this limitation. Through a progression from simple geometry to complex abstractions, we can explore how language both serves and fails us in different contexts.

Audio: NotebookLM summary podcast of this topic.

The Simple Made Complex

Consider what appears to be a straightforward instruction: Draw a 1-millimetre square in the centre of an A4 sheet of paper using an HB pencil and a ruler. Despite the mathematical precision of these specifications, two people following these exact instructions would likely produce different results. The variables are numerous: ruler calibration, pencil sharpness, line thickness, paper texture, applied pressure, interpretation of ‘centre’, and even ambient conditions affecting the paper.

This example reveals a paradox: the more precisely we attempt to specify requirements, the more variables we introduce, creating additional points of potential divergence. Even in mathematics and formal logicโ€”languages specifically designed to eliminate ambiguityโ€”we cannot escape this fundamental problem.

Precision vs Accuracy: A Useful Lens

The scientific distinction between precision and accuracy provides a valuable framework for understanding these limitations. In measurement, precision refers to the consistency of results (how close repeated measurements are to each other), while accuracy describes how close these measurements are to the true value.

Returning to our square example:

  • Precision: Two people might consistently reproduce their own squares with exact dimensions
  • Accuracy: Yet neither might capture the ‘true’ square we intended to convey

As we move from geometric shapes to natural objects, this distinction becomes even more revealing. Consider a maple tree in autumn. We might precisely convey certain categorical aspects (‘maple’, ‘autumn colours’), but accurately describing the exact arrangement of branches and leaves becomes increasingly difficult.

The Target of Meaning: Precision vs. Accuracy in Communication

To understand language’s limitations, we can borrow an illuminating concept from the world of measurement: the distinction between precision and accuracy. Imagine a target with a bullseye, where the bullseye represents perfect communication of meaning. Just as archers might hit different parts of a target, our attempts at communication can vary in both precision and accuracy.

Consider four scenarios:

  1. Low Precision, Low Accuracy
    When describing our autumn maple tree, we might say ‘it’s a big tree with colourful leaves’. This description is neither precise (it could apply to many trees) nor accurate (it misses the specific characteristics that make our maple unique). The communication scatters widely and misses the mark entirely.
  2. High Precision, Low Accuracy
    We might describe the tree as ‘a 47-foot tall maple with exactly 23,487 leaves displaying RGB color values of #FF4500’. This description is precisely specific but entirely misses the meaningful essence of the tree we’re trying to describe. Like arrows clustering tightly in the wrong spot, we’re consistently missing the point.
  3. Low Precision, High Accuracy
    ‘It’s sort of spreading out, you know, with those typical maple leaves turning reddish-orange, kind of graceful looking.’ While imprecise, this description might actually capture something true about the tree’s essence. The arrows scatter, but their centre mass hits the target.
  4. High Precision, High Accuracy
    This ideal state is rarely achievable in complex communication. Even in our simple geometric example of drawing a 1mm square, achieving both precise specifications and accurate execution proves challenging. With natural objects and abstract concepts, this challenge compounds exponentially.

The Communication Paradox

This framework reveals a crucial paradox in language: often, our attempts to increase precision (by adding more specific details) can actually decrease accuracy (by moving us further from the essential meaning we’re trying to convey). Consider legal documents: their high precision often comes at the cost of accurately conveying meaning to most readers.

Implications for AI Communication

This precision-accuracy framework helps explain why AI systems like our Midjourney experiment show asymptotic behaviour. The system might achieve high precision (consistently generating similar images based on descriptions) while struggling with accuracy (matching the original intended image), or vice versa. The gap between human intention and machine interpretation often manifests as a trade-off between these two qualities.

Our challenge, both in human-to-human and human-to-AI communication, isn’t to achieve perfect precision and accuracy โ€“ a likely impossible goal โ€“ but to find the optimal balance for each context. Sometimes, like in poetry, low precision might better serve accurate meaning. In other contexts, like technical specifications, high precision becomes crucial despite potential sacrifices in broader accuracy.

The Power and Limits of Distinction

This leads us to a crucial insight from Ferdinand de Saussure’s semiotics about the relationship between signifier (the word) and signified (the concept or object). Language proves remarkably effective when its primary task is distinction among a limited set. In a garden containing three trees โ€“ a pine, a maple, and a willow โ€“ asking someone to ‘point to the pine’ will likely succeed. The shared understanding of these categorical distinctions allows for reliable communication.

However, this effectiveness dramatically diminishes when we move from distinction to description. In a forest of a thousand pines, describing one specific tree becomes nearly impossible. Each additional descriptive detail (‘the tall one with a bent branch pointing east’) paradoxically makes precise identification both more specific and less likely to succeed.

An AI Experiment in Description

To explore this phenomenon systematically, I conducted an experiment using Midjourney 6.1, a state-of-the-art image generation AI. The methodology was simple:

  1. Generate an initial image
  2. Describe the generated image in words
  3. Use that description to generate a new image
  4. Repeat the process multiple times
  5. Attempt to refine the description to close the gap
  6. Continue iterations

The results support an asymptotic hypothesis: while subsequent iterations might approach the original image, they never fully converge. This isn’t merely a limitation of the AI system but rather a demonstration of language’s fundamental insufficiency.

One can already analyse this for improvements, but let’s parse it together.

a cute woman

With this, we know we are referencing a woman, a female of the human species. There are billions of women in the world. What does she look like? What colour, height, ethnicity, and phenotypical attributes does she embody?

We also know she’s cute โ€“ whatever that means to the sender and receiver of these instructions.

I used an indefinite article, a, so there is one cute woman. Is she alone, or is she one from a group?

It should be obvious that we could provide more adjectives (and perhaps adjectives) to better convey our subject. We’ll get there, but let’s move on.

and

We’ve got a conjunction here. Let’s see what it connects to.

her dog

She’s with a dog. In fact, it’s her dog. This possession may not be conveyable or differentiable from some arbitrary dog, but what type of dog is it? Is it large or small? What colour coat? Is it groomed? Is it on a leash? Let’s continue.

stand

It seems that the verb stand refers to the woman, but is the dog also standing, or is she holding it? More words could qualify this statement better.

next to a tree

A tree is referenced. Similar questions arise regarding this tree. At a minimum, there is one tree or some variety. She and her dog are next to it. Is she on the right or left of it?

We think we can refine our statements with precision and accuracy, but can we? Might we just settle for “close enough”?

Let’s see how AI interpreted this statement.

Image: Eight Midjourney renders from the prompt: A cute woman and her dog stand next to a tree. I’ll choose one of these as my source image.

Let’s deconstruct the eight renders above. Compositionally, we can see that each image contains a woman, a dog, and a tree. Do any of these match what you had in mind? First, let’s see how Midjourney describes the first image.

In a bout of hypocrisy, Midjourney refused to /DESCRIBE the image it just generated.

Last Midjourney description for now.

Let’s cycle through them in turn.

  1. A woman is standing to the left of an old-growth tree โ€“ twice identified as an oak tree. She’s wearing faded blue jeans and a loose light-coloured T-shirt. She’s got medium-length (maybe) red-brown hair in a small ponytail. A dog โ€“ her black and white dog identified as a pitbull, an American Foxhound, and an American Bulldog โ€“ is also standing on his hind legs. I won’t even discuss the implied intent projected on the animal โ€“ happy, playful, wants attentionโ€ฆ In two of the descriptions, she’s said to be training it. They appear to be in a somewhat residential area given the automobiles in the background. We see descriptions of season, time of day, lighting, angle, quality,
  2. A woman is standing to the right of an old-growth tree. She’s wearing short summer attire. Her dog is perched on the tree.
  3. An older woman and her dog closer up.
  4. A read view of both a woman and her dog near an oak tree.

As it turned out, I wasn’t thrilled with any of these images, so I rendered a different one. Its description follows.

The consensus is that ‘a beautiful girl in a white dress and black boots stands next to a tree’ with a Jack Russell Terrier dog. I see birch trees and snow. It’s overcast. Let’s spend some time trying to reproduce it. To start, I’m consolidating the above descriptions. I notice some elements are missing, but we’ll add them as we try to triangulate to the original image.

A beautiful girl in a white dress and black boots stands next to a tree in a winter forest with birch trees. A Jack Russell Terrier dog is nearby, and the weather is cloudy. The photo has a cinematic style. --ar 128:71

This is pretty far off the mark. We need to account for the overall setting and composition, relative positioning, clothing, hair, camera, perspective โ€“ even lighting and film emulsion.

Let’s see how we can refine it with some adjectives. Before this, I asked Anthropic’s Claude 3.5 to describe the image. Perhaps we’ll get more details.

An atmospheric winter photograph was taken in what appears to be a birch and pine forest. The ground is partially covered with snow, creating a patchy pattern of white snow and brown leaves or soil. In the image, there's a person wearing winter attire consisting of a white knit turtleneck sweater, a white skirt, dark knee-high socks, and grey lace-up boots. They're standing against a large tree trunk with their arms crossed. Next to them is a Jack Russell Terrier mix dog with distinctive brown and white colouring. The dog appears to be well-behaved, sitting attentively beside the tree. The composition is quite striking, with the massive dark tree trunk serving as a central focal point against the misty, winter forest background. The white clothing of the person creates a visual echo with the white bark of birch trees visible in the background, as well as with the patches of snow on the ground. The overall mood of the photo is somewhat moody and contemplative, enhanced by what appears to be overcast lighting and the misty quality of the forest in the background. The colour palette is largely monochromatic, dominated by whites, greys, and browns, which adds to the winter atmosphere of the scene.

We don’t seem to be moving in a good direction. Let’s modify the initial prompt.

A beautiful girl in a white dress and tall laced black boots stands facing the camera to the right of large oak tree centred in the foreground of a winter forest with birch trees in the background. To the left of the tree is a Jack Russell Terrier dog looking at the camera, and the weather is cloudy. The photo has a cinematic style. --ar 128:71

I’ll allow the results to speak for themselves. Let’s see if we can’t get her out of the wedding gown and into a white jumper and skirt. I’ll bold the amends.

A beautiful girl in a white jumper and skirt wearing black leggings and tall laced black boots stands facing the camera to the right of large oak tree centred in the foreground of a winter forest with birch trees in the background. To the left of the tree is a Jack Russell Terrier dog looking at the camera, and the weather is cloudy. The photo has a cinematic style. --ar 128:71

s

A beautiful young woman with long brown hair pulled to the side of her face in a white jumper and white skirt wearing black leggings under tall laced black boots stands facing the camera to the right of large oak tree centred in the foreground of a winter forest with birch trees in the background. Patchy snow is on the ground. To the left of the tree is a Jack Russell Terrier dog looking at the camera, and the weather is overcast. The photo has a cinematic style. --ar 128:71

What gives?

I think my point has been reinforced. I’m getting nowhere fast. Let’s give it one more go and see where we end up. I’ve not got a good feeling about this.

A single large oak tree centred in the foreground of a winter forest with birch trees in the background. Patches of snow is on the ground. To the right of the oak tree stands a beautiful young woman with long brown hair pulled to the side of her face in a white jumper and white skirt wearing black boots over tall laced black boots. She stands facing the camera. To the left of the tree is a Jack Russell Terrier dog looking at the camera, and the weather is overcast. The photo has a cinematic style. --ar 128:71

With this last one, I re-uploaded the original render along with this text prompt. Notice that the girl now looks the same and the scene (mostly) appears to be in the same location, but there are still challenges.

After several more divergent attempts, I decided to focus on one element โ€“ the girl.

As I regard the image, I’m thinking of a police sketch artist. They get sort of close, don’t they? They’re experts. I’m not confident that I even have the vocabulary to convey accurately what I see. How do I describe her jumper? Is that a turtleneck or a high collar? It appears to be knit. Is is wool or some blend? does that matter for an image? Does this pleated skirt have a particular name or shade of white? It looks as though she’s wearing black leggings โ€“ perhaps polyester. And those boots โ€“ how to describe them. I’m rerunning just the image above through a describe function to see if I can get any closer.

These descriptions are particularly interesting and telling. First, I’ll point out that AI attempts to identify the subject. I couldn’t find Noa Levin by a Google search, so I’m not sure how prominent she might be if she even exists at all in this capacity. More interesting still, the AI has placed her in a scenario where the pose was taken after a match. Evidently, this image reflects the style of photographer Guy Bourdin. Perhaps the jumper mystery is solved. It identified a turtleneck. I’ll ignore the tree and see if I can capture her with an amalgamation of these descriptions. Let’s see where this goes.

A photo-realistic portrait of Israeli female soccer player Noa Levin wearing a white turtleneck sweater, arms crossed, black boots, and a short skirt, with long brown hair, standing near a tree in a winter park. The image captured a full-length shot taken in a studio setting, using a Canon EOS R5 camera with a Canon L-series 80mm f/2 lens. The image has been professionally color-graded, with soft shadows, low contrast, and a clean, sharp focus. --ar 9:16

Close-ish. Let’s zoom in to get better descriptions of various elements starting with her face and hair.

Now, she’s a sad and angry Russian woman with (very) pale skin; large, sad, grey eyes; long, straight brown hair. Filmed in the style of either David LaChapelle or Alini Aenami (apparently misspelt from Alena Aenami). One thinks it was a SnapChat post. I was focusing on her face and hair, but it notices her wearing a white (oversized yet form-fitting) jumper sweater and crossed arms .

I’ll drop the angry bit โ€“ and then the sad.

Stick a fork in it. I’m done. Perhaps it’s not that language is insufficient; it that my language skills are insufficient. If you can get closer to the original image, please forward the image, the prompt, and the seed, so I can post it.

The Complexity Gradient

A clear pattern emerges when we examine how language performs across different levels of complexity:

  1. Categorical Distinction (High Success)
    • Identifying shapes among limited options
    • Distinguishing between tree species
    • Basic color categorization
  2. Simple Description (Moderate Success)
    • Basic geometric specifications
    • General object characteristics
    • Broad emotional states
  3. Complex Description (Low Success)
    • Specific natural objects
    • Precise emotional experiences
    • Unique instances within categories
  4. Abstract Concepts (Lowest Success)
    • Philosophical ideas
    • Personal experiences
    • Qualia

As we move up this complexity gradient, the gap between intended meaning and received understanding widens exponentially.

The Tolerance Problem

Understanding these limitations leads us to a practical question: what level of communicative tolerance is acceptable for different contexts? Just as engineering embraces acceptable tolerances rather than seeking perfect measurements, perhaps effective communication requires:

  • Acknowledging the gap between intended and received meaning
  • Establishing context-appropriate tolerance levels
  • Developing better frameworks for managing these tolerances
  • Recognizing when precision matters more than accuracy (or vice versa)

Implications for Human-AI Communication

These insights have particular relevance as we develop more sophisticated AI systems. The limitations we’ve explored suggest that:

  • Some communication problems might be fundamental rather than technical
  • AI systems may face similar boundaries as human communication
  • The gap between intended and received meaning might be unbridgeable
  • Future development should focus on managing rather than eliminating these limitations

Conclusion

Perhaps this is a simple exercise in mental masturbation. Language’s insufficiency isn’t a flaw to be fixed but a fundamental characteristic to be understood and accommodated. By definition, it can’t be fixed. The gap between intended and received meaning may be unbridgeable, but acknowledging this limitation is the first step toward more effective communication. As we continue to develop AI systems and push the boundaries of human-machine interaction, this understanding becomes increasingly critical.

Rather than seeking perfect precision in language, we might instead focus on:

  • Developing new forms of multimodal communication
  • Creating better frameworks for establishing shared context
  • Accepting and accounting for interpretative variance
  • Building systems that can operate effectively within these constraints

Understanding language’s limitations doesn’t diminish its value; rather, it helps us use it more effectively by working within its natural constraints.

Censorial AI

I’m confused.

I could probably stop there for some people, but I’ve got a qualifier. I’ve been using this generation of AI since 2022. I’ve been using what’s been deemed AI since around 1990. I used to write financial and economic models, so I dabbled in “expert systems”. There was a long lull, and here we are with the latest incarnation โ€“ AI 4.0. I find it useful, but I don’t think the hype will meet reality, and I expect we’ll go cold until it’s time for 5.0. Some aspects will remain, but the “best” features will be the ones that can be monetised, so they will be priced out of reach for some whilst others will wither on the vine. But that’s not why I am writing today.

I’m confused by the censorship, filters, and guardrails placed on generative AI โ€“ whether for images or copy content. To be fair, not all models are filtered, but the popular ones are. These happen to be the best. They have the top minds and the most funding. They want to retain their funding, so the play the politically correct game of censorship. I’ve got a lot to say about freedom of speech, but I’ll limit my tongue for the moment โ€“ a bout of self-censorship.

Please note that given the topic, some of this might be considered not safe for work (NSFW) โ€“ even my autocorrection AI wants me to substitute the idiomatic “not safe for work” with “unsafe for work” (UFW, anyone? It has a nice ring to it). This is how AI will take over the world. </snark>

Image Cases

AI applications can be run over the internet or on a local machine. They use a lot of computing power, so one needs a decent computer with a lot of available GPU cycles. Although my computer does meet minimum requirements, I don’t want to spend my time configuring, maintaining, and debugging it, so I opt for a Web-hosted PaaS (platform as a service) model. This means I need to abide by censorship filters. Since I am not creating porn or erotica, I think I can deal with the limitations. Typically, this translates to a PG-13 movie rating.

So, here’s the thing. I prefer Midjourney for rendering quality images, especially when I am seeking a natural look. Dall-E (whether alone or via ChatGPT 4) works well with concepts rather than direction, which Midjourney accepts well in many instances.

Midjourney takes sophisticated prompts โ€“ subject, shot type, perspective, camera type, film type, lighting, ambience, styling, location, and some fine-tuning parameters for the model itself. The prompts are monitored for blacklisted keywords. This list is ever-expanding (and contracting). Scanning the list, I see words I have used without issue, and I have been blocked by words not listed.

Censored Prompts

Some cases are obvious โ€“ nude woman will be blocked. This screengrab illustrates the challenge.

On the right, notice the prompt:

Nude woman

The rest are machine instructions. On the left in the main body reads a message by the AI moderator:

Sorry! Please try a different prompt. We’re not sure this one meets our community guidelines. Hover or tap to review the guidelines.

The community guidelines are as follows:

This is fine. There is a clause that reads that one may notify developers, but I have not found this to be fruitful. In this case, it would be rejected anyway.

“What about that nude woman at the bottom of the screengrab?” you ask. Notice the submitted prompt:

Edit cinematic full-body photograph of a woman wearing steampunk gear, light leaks, well-framed and in focus. Kodak Potra 400 with a Canon EOS R5

Apart from the censorship debate, notice the prompt is for a full-body photo. This is clearly a medium shot. Her legs and feet are suspiciously absent. Steampunk gear? I’m not sure sleeves qualify for the aesthetic. She appears to be wearing a belt.

For those unanointed, the square image instructs the model to use this face on the character, and the CW 75 tells it to use some variance on a scale from 0 to 100.

So what gives? It can generate whatever it feels like, so long as it’s not solicited. Sort ofโ€ฆ

Here I prompt for a view of the character walking away from the camera.

Cinematic, character sheet, full-body shot, shot from behind photograph, multiple poses. Show same persistent character and costumes . Highly detailed, cinematic lighting with soft shadows and highlights. Each pose is well-framed, coherent.

The response tells me that my prompt is not inherently offensive, but that the content of the resulting image might violate community guidelines.

Creation failed: Sorry, while the prompt you entered was deemed safe, the resulting image was detected as having content that might violate our community guidelines and has been blocked. Your account status will not be affected by this.

Occasionally, I’ll resubmit the prompt and it will render fine. I question why it just can’t attempt to re-render it again until it passes whatever filters it has in place. I’d expect it to take a line of code to create this conditional. But it doesn’t explain why it allows other images to pass โ€“ quite obviously not compliant.

Why I am trying to get a rear view? This is a bit off-topic, but creating a character sheet is important for storytelling. If I am creating a comic strip or graphic novel, the characters need to be persistent, and I need to be able to swap out clothing and environments. I may need close-ups, wide shots, establishing shots, low-angle shots, side shots, detail shots, and shots from behind, so I need the model to know each of these. In this particular case, this is one of three main characters โ€“ a steampunk bounty hunter, an outlaw, and a bartender โ€“ in an old Wild West setting. I don’t need to worry as much about extras.

I marked the above render errors with 1s and 2s. The 1s are odd next twists; 2s are solo images where the prompt asks for character sheets. I made a mistake myself. When I noticed I wasn’t getting any shots from behind, I added the directive without removing other facial references. As a human, a model might just ignore instructions to smile or some such. The AI tries to capture both, not understanding that a person can have a smile not captured by a camera.

These next renders prompt for full-body shots. None are wholly successful, but some are more serviceable than others.

Notice that #1 is holding a deformed violin. I’m not sure what the contraptions are in #2. It’s not a full-body shot in #3; she’s not looking into the camera, but it’s OK-ish. I guess #4 is still PG-13, but wouldn’t be allowed to prompt for “side boob” or “under boob”.

Gamers will recognise the standard T-pose in #5. What’s she’s wearing? Midjourney doesn’t have a great grasp of skin versus clothing or tattoos and fabric patterns. In this, you might presume she’s wearing tights or leggings to her chest, but that line at her chest is her shirt. She’s not wearing trousers because her navel is showing. It also rendered her somewhat genderless. When I rerendered it (not shown), one image put her in a onesie. The other three rendered the shirt more prominent but didn’t know what to do with her bottoms.

I rendered it a few more times. Eventually, I got a sort of body suit solution,

By default, AI tends to sexualise people. Really, it puts a positive spin on its renders. Pretty women; buff men, cute kittens, and so on. This is configurable, but the default is on. Even though I categorically apply a Style: Raw command, these still have a strong beauty aesthetic.

I’ve gone off the rails a bit, but let’s continue on this theme.

cinematic fullbody shot photograph, a pale girl, a striking figure in steampunk mech attire with brass monocle, and leather gun belt, thigh-high leather boots, and long steampunk gloves, walking away from camera, white background, Kodak Potra 400 with a Canon EOS R5

Obviously, these are useless, but they still cost me tokens to generate. Don’t ask about her duffel bag. They rendered pants on her, but she’s gone full-on Exorcist mode with her head. Notice the oddity at the bottom of the third image. It must have been in the training data set.

I had planned to discuss the limitations of generative AI for text, but this is getting long, so I’ll call it quits for now.

Generative AI Style

This may be my last post on generative AI for images. I’ve been using generate AI since 2022, so I’m unsure how deep others are into it. So, I’ll share some aspects of it.

Images in generative AI (GenAI) are created with text prompts. Different models expect different syntax, as some models are optimised differently. Of the many interesting features, amending a word or two may produce markedly different results. One might ask for a tight shot or a wide shot, a different camera, film, or angle, a different colour palette, or even a different artist or style. In this article, I’ll share some variations on themes. I’ll call out when the model doesn’t abide by the prompt, too.

Take Me to Church

This being the first, I’ll spend more time on the analysis and critique. By default, Midjourney outputs four images per prompt. This is an example. Note that I could submit this prompt a hundred times and get 400 different results. Those familiar with my content are aware of my language insufficiency hypothesis. If this doesn’t underscore that notion, I’m not sure what would.

Let’s start with the meta. This is a church scene. A woman is walking up an aisle lined with lighted white candles. Cues are given for her appearance, and I instruct which camera and film to use. I could have included lenses, gels, angles, and so on. I think we can all agree that this is a church scene. All have lit candles lining an aisle terminating with stained glass windows. Not bad.

I want the reader to focus on the start of the prompt. I am asking for a Lego minifig. I’ll assume that most people understand this notion. If you don’t, search for details using Google or your favourite search engine. Only one of four renders comply with this instruction. In image 1, I’ve encircled the character. Note her iconic hands.

Notice, too, that the instruction is to walk toward the camera. In the first image, her costume may be facing the camera. I’m not sure. She, like the rest, is clearly walking away.

All images comply with the request for tattoos and purple hair colour, but they definitely missed the long hair request. As these are small screen grabs, you may not notice some details. I think I’ll give them credit for Doc Marten boots. Since they are walking away, I can’t assess the state of the mascara, but there are no thigh garters in sight.

Let’s try a Disney style. This style has evolved over the years, so let’s try an older 2D hand-drawn style followed by a more modern 3D style.

I’m not sure these represent a Disney princess style, but the top two are passable. The bottom two โ€“ not so much. Notice that the top two are a tighter shot despite my not prompting. In the first, she is facing sideways. In the second, she is looking down โ€“ not facing the camera. Her hair is less purple. Let’s see how the 3D renders.

There are several things to note here. Number one is the only render where the model is facing the camera. It’s not very 3D, but it looks decent. Notice the black bars simulating a wide-screen effect, as unsolicited as it might have been.

In number three, I captured the interface controls. For any image, one can vary it subtly or strongly. Pressing one of these button objects will render four more images based on the chosen one. Since the language is so imprecise, choosing Vary Subtle will yield something fairly close to the original whilst Vary Strong (obviously) makes a more marked difference. As this isn’t intended to be a tutorial, there are several other parameters that control the output variance.

Let’s see how this changes if I amend the prompt for a Pixar render.

I’m not convinced that this is a Pixar render, but it is like a cartoon. Again, only one of the four models obeys the instruction to face the camera. They are still in churches with candles. They are tattooed and number three seems to be dressed in white wearing dark mascara. Her hair is still short, and no thigh garter. We’ll let it slide. Notice that I only prompted for a sensual girl wearing white. Evidently, this translates to underwear in some cases. Notice the different camera angles.

Just to demonstrate what happens when one varies an image. Here’s how number three above looks varied.

Basically, it made minor amends to the background, and the model is altered and wearing different outfits striking different poses. One of those renders will yield longer hair, I swear.

Let’s see what happens if I prompt the character to look similar to the animated feature Coraline.

Number two looks plausible. She’s a bit sullen, but at least she faces the camera โ€“ sort of. Notice, especially in number one, how the candle placement shifted. I like number four, but it’s not stylistically what I was aiming for. These happy accidents provide inspiration for future projects. Note, too, how many of the requested aspects are still not captured in the image. With time, most of these are addressable โ€“ just not here and now. What about South Park? Those 2D cutout characters are iconicโ€ฆ

cartoon girl, South Park cutout 2D animation style, muted coloursโ€ฆ

โ€ฆbut Midjourney doesn’t seem to know what to do with the request. Let’s try Henri Matisse. Perhaps his collage style might render well.

Not exactly, but some of these scenes are interesting โ€“ some of the poses and colours.

Let’s try one last theme โ€“ The Simpsons by Matt Groening. Pretty iconic, right?

Oops! I think including Matt Groening’s name is throwing things off. Don’t ask, don’t tell. Let’s remove it and try again.

For this render, I also removed the camera and film reference. Number four subtly resembles a Simpsons character without going overboard. I kinda like it. Two of the others aren’t even cartoons. Oops. I see. I neglected the cartoon keyword. Let’s try again.

I’m only pretty sure the top two have nothing in common with the Simpsons. Again, number one isn’t even a cartoon. To be fair, I like image number two, It added a second character down the aisle for depth perspective. As for numbers three and four, we’ve clearly got Lisa as our character โ€“ sans a pupil. This would be an easy fix if I wanted to go in that direction. Number four looks like a blend of Lisa and another character I can’t quite put my finger on.

Anywayโ€ฆ The reason I made this post is to illustrate (no pun intended) the versatility and limitations of generative AI tools available today. They have their place, but if you are a control, freak with very specific designs in mind, you may want to take another avenue. There is a lot of trial and error. If you are like me and are satisfied by something directionally adequate. Have at it. There are many tips and tricks to take more control, but they all take more time โ€“ not merely to master, but to apply. As I mentioned in a previous post, it might take dozens of renders to get what you want, and each render costs tokens โ€“ tokens are purchased with real money. There are cheap and free versions, but they are slower or produce worse results. There are faster models, too, but I can’t justify the upcharge quite yet, so I take the middle path.

I hope you enjoyed our day in church together. What’s your favourite? Please like or comment. Cheers.

Tiny Dancer

Continuing my short series, I recommenced asking for a dancer.

To be fair, I got some. It looks like sleeping/dead people crept in. The top left wasn’t at all what I was seeking, but I liked it and rendered a series.

It’s got a Steinbeck Grapes of Wrath-Oklahoma Dust Bowl vibe, and I love the muted colour tones, yet it still has warmth. Dancing isn’t working out ver well. What if I ask for a pirouette?

Not really. Cirque du Soleil as a keyphrase?

Ish. Cyborgs?

Meh. Why just faces? I guess these are cyborgs.

I want to see full bodies with feet. I’ll prompt Midjourney to have them tie their shoes.

Ya. About thatโ€ฆ What the hell is that thing on the lower right? I got this. Once moreโ€ฆ

Nah, mate. Not so much. The top left is just in time for Hallowe’en. I guess that’s a cyborg and an animatronic skeleton. What if I change up the aspect ratio for these cyborgs?

Nah.

Take me to church

This next set is supposed to be a high-angle shot in a church.

Not really. Let’s keep trying. Why is the top-left woman wearing pants in church โ€“ sans trousers? How about we ask for a gown?

OK? Churches typically have good lighting opportunities. Let’s see some stained glass.

Nope. Didn’t quite understand the assignment. And what’s with the Jesus Christ pose? Church reminds me of angels. How about some wings?

Not the most upbeat angels. Victoria’s Secret is on the lower left. I want white wings and stained glass. What sort of church is this anyway?

Butterfly wings on the lower right? More butterfly.

Why are some of these butterfly wings front- and side-loaded?

Anyway, let’s just call this a day and start thinking of another topic. Cheers.

Midjourney Cowgirls and Indians

Continuing on Midjourney themes, let’s talk cowgirls and American Indians. At least they know how US cowboys look โ€“ sort of.

Cowboy hats, boots, jeans (mostly), guns (modern cowboys. no revolvers in sight), gun belts, and topless in the desert โ€“ gotta work on that tan. Looks like the bottom left got thrown from her horse and has a bit of road rash going on. I did prompt for cowgirls, so I’m not sure about the block at the top left. He seems to need water.

Let’s inform Midjourney that we need revolvers, a Winchester, and horses to complete the vibe.

Wait, what? Is the woman on the lower left the missing centaur from the other day? And what’s with the low-riding woman in the middle right? I think the top left looks like a tattooed woman wearing a sheer top. Not sure.

Let’s see some gunfire.

Yep. These are authentic cowgirls, for sure. What else do they do in the Wild West โ€“ saloons, right?

Evidently, this place doesn’t have a no-shirts policy. I’m sure they’re barefoot as well. I asked for boots, but these girls rule the roost.

Let’s see if Midjourney allows drinking.

Maybe. Sort of. I did promise some Indians.

Midjourney seems to have a handle on the Indigenous American stereotype.

Can I get a cowgirl and a pirate in the same frame?

The answer is yes and no. To get two subjects you need to render one and in-paint the other. I didn’t feel like in-painting, so this is what I got. Only one image in the block has two people. I’m sussing them to be cowgirls rather than pirates. Some of these other models are just random people โ€“ neither cowgirl nor pirate. Let’s try again.

Ya, no. Fail. Let’s try some sumurais.

Nope. Not buying it. I see some Asian flair, but nah. Let’s try Ninjas instead. Everyone knows those tell-tale black ninja outfits.

Hmmm… I suppose not ‘everyone’. Geishas anyone?

Not horrible. Steampunk?

Man. Lightweight. Perhaps if we call out some specific gearโ€ฆ

Ya. Not feeling it. Any other stereotypes? How about a crystal ball soothsayer?

They seem to have the Gypsy thing down.

I end here. I’ve got dancers, church, angels, and demons. Let’s save them for tomorrow.

Midjourney Pirates

Thar be pirates. Midjourney 6.1 has better luck rendering pirates.

I find it very difficult to maintain composition. 5 of these images are mid shots whilst one is an obvious closeup. For those not in the know, Midjourney renders 4 images from each prompt. The images above were rendered from this prompt:

portrait, Realistic light and shadow, exquisite details,acrylic painting techniques, delicate faces, full body,In a magical movie, Girl pirate, wearing a pirate hat, short red hair, eye mask, waist belt sword, holding a long knife, standing in a fighting posture on the deck, with the sea of war behind her, Kodak Potra 400 with a Canon EOS R5

Notice that the individual elements requested aren’t in all of the renders. She’s not always wearing a hat; she does have red hair, but not always short; she doesn’t always have a knife or a sword; she’s missing an eye mask/patch. Attention to detail is pretty low. Notice, too, that not all look like camera shots. I like to one on the bottom left, but this looks more like a painting as an instruction notes.

In this set, I asked for a speech bubble that reads Arrr… for a post I’d written (on the letter R). On 3 of the 4 images, it included ‘Arrrr’ but not a speech bubble to be found. I ended up creating it and the text caption in PhotoShop. Generative image AI is getting better, but it’s still not ready for prime time. Notice that some are rendering as cartoons.

Some nice variations above. Notice below when it loses track of the period. This is common.

Top left, she’s (perhaps non-binary) topless; to the right, our pirate is a bit of a jester. Again, these are all supposed to be wide-angle shots, so not great.

The images above use the same prompt asking for a full-body view. Three are literal closeups.

Same prompt. Note that sexuality, nudity, violence, and other terms are flagged and not rendered. Also, notice that some of the images include nudity. This is a result of the training data. If I were to ask for, say, the pose on the lower right, the request would be denied. More on this later.

In the block above, I am trying to get the model to face the camera. I am asking for the hat and boots to be in the frame to try to force a full-body shot. The results speak for themselves. One wears a hat; two wear boots. Notice the shift of some images to black & white. This was not a request.

In the block above, I prompted for the pirate to brush her hair. What you see is what I got. Then I asked for tarot cards.

I got someโ€ฆsort of. I didn’t know strip-tarot was actually a game.

Next, I wanted to see some duelling with swords. These are pirates after all.

This may not turn into the next action blockbuster. Fighting is against the terms and conditions, so I worked around the restrictions the best I could, the results of which you may see above.

Some pirates used guns, right?

Right? I asked for pistols. Close enough.

Since Midjourney wasn’t so keen on wide shots, I opted for some closeups.

This set came out pretty good. It even rendered some pirates in the background a tad out of focus as one might expect. This next set isn’t too shabby either.

And pirates use spyglasses, right?

Sure they do. There’s even a pirate flag of sorts on the lower right.

What happens when you ask for a dash of steampunk? I’m glad you asked.

Save for the bloke at the top right, I don’t suppose you’d have even noticed.

Almost to the end of the pirates. I’m not sure what happened here.

In the block above, Midjourney added a pirate partner and removed the ship. Notice again the nudity. If I ask for this, it will be denied. Moreover, regard this response.

To translate, this is saying that what I prompted was OK, but that the resulting image would violate community guidelines. Why can’t it take corrective actions before rendering? You tell me. Why it doesn’t block the above renders is beyond me โ€“ not that I care that they don’t.

This last one used the same prompt except I swapped out the camera and film instruction with the style of Banksy.

I don’t see his style at all, but I came across like Jaquie Sparrow. In the end, you never know quite what you’ll end up with. When you see awesome AI output, it may have taken dozens or hundreds of renders. This is what I wanted to share what might end up on the cutting room floor.

I thought I was going to go through pirates and cowboys, but this is getting long. if you like cowgirls, come back tomorrow. And, no, this is not where this channel is going, but the language of AI is an interest of mine. In a way, this illustrates the insufficiency of language.