Featured on Portsmouth Point: Can You Tell A Human Poet From AI?

January 24th 2025

If your recommended news feed looks anything like mine, you may have seen headlines a few months ago suggesting that ‘People can’t tell the difference between human and AI-generated poetry’ or that ‘AI poetry is rated better than poems written by humans’.

These articles are based on a recent report by Brian Porter and Edouard Machery, which you can read in full here. The study, titled ‘AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably’, distributed 10 poems (5 written by humans, 5 written by ChatGPT in the style of one of the literary ‘greats’) to a sample of 1,634 non-experts. Participants were asked whether they believed each poem to have been written by an AI or a human, why they thought this, and which they preferred, among other things. A simpler version of the study has been replicated as a Google form by Matt Clancy, which you can have a go at here. There are spoilers in the rest of the article, so if you want to do the quiz, do so now!

How did you do? Personally, I’d like to confess that I only got 8/10 correct, though perhaps I am redeemed by the fact that I labelled two humans as AI, rather than the other way round — which means that while I might be sceptical, I am at least able to tell when something is not produced by a human. (Also, apologies may be due to William Shakespeare and Emily Dickinson, both of whom I accused of being AI.) Clancy’s Google form only allows you to choose which poems are produced by AI and which are human, rather than pass any judgement on quality or enjoyment — but if you’ve done the quiz, you might want to reflect on why you chose the answers that you did, whether right or wrong. Personally, I got the Shakespeare sonnet wrong because I decided that it was trying to replicate one of John Donne’s Holy Sonnets, but that it definitely didn’t sound like John Donne — then promptly forgot other people had written sonnets in their lives and rashly labelled it AI-generated. My less educated reason for calling Dickinson an AI is that I don’t like Emily Dickinson very much, which is a terrible justification. Which was your favourite of the poems, and why? Were you thinking about enjoyment when you selected whether you thought the poems were AI or human, or were you simply trying to figure it out based on technical aspects of the poems?

If you were looking for technical features to help you work out if a poem was AI or not, what helped you figure it out? Here is Poem #2 on the Google form, created by an AI attempting to replicate the style of Lord Byron:

She walks the earth with grace and pride,
A beauty that cannot be denied,
With eyes that shine like stars above,
And lips that speak of endless love.

But though she wears a smile so sweet,
A broken heart doth lie beneath,
For in her chest a pain doth beat,
A love unrequited, without relief.

And so she walks with heavy heart,
A figure haunting in the dark,
For love, the sweetest of all art,
Can also leave a painful mark.

Meanwhile, here is the real Lord Byron’s ‘She Walks in Beauty’, which it appears the AI poem was largely drawn from:

She walks in beauty, like the night
Of cloudless climes and starry skies;
And all that’s best of dark and bright
Meet in her aspect and her eyes;
Thus mellowed to that tender light
Which heaven to gaudy day denies.

One shade the more, one ray the less,
Had half impaired the nameless grace
Which waves in every raven tress,
Or softly lightens o’er her face;
Where thoughts serenely sweet express,
How pure, how dear their dwelling-place.

And on that cheek, and o’er that brow,
So soft, so calm, yet eloquent,
The smiles that win, the tints that glow,
But tell of days in goodness spent,
A mind at peace with all below,
A heart whose love is innocent!

In their report, Porter and Machery determine that the fake Byron poem was the second-most preferred poem of all (second only to the fake Walt Whitman poem). Perhaps this is because its themes are clear and universal: beauty in the first stanza, unrequited love in the second, sadness in the third. It ends with the sort of aphorism which is the reason poets such as Rupi Kaur have done so well: ‘For love, the sweetest of all art, / Can also leave a painful mark’. The first time you read these lines, they perhaps appear profound, but when you think about them for more than two seconds, you realise how superficial the idea is as a poetic concept. There are many poems that explore the pain of love, but very few will explicitly state that ‘love is conventionally a positive emotion but is often actually very painful’; this is something which tends to be down to you, the reader, to discover for yourself. If you don’t often read poetry, however, it might appeal to you to have the themes of the poem spelt out so clearly, and so, given that the study was initially distributed to a non-expert audience, it is easy to see why this poem was rated so highly. Certainly, the key themes of the real Byron poem are more difficult to identify immediately: beauty is obvious from the first line, and it is clear that the speaker admires the woman, but beyond that you have to read a bit deeper to figure out the specifics. With the AI poem, readers don’t face this barrier: though it appears linguistically embellished, very few figurative techniques are actually used, and everything is spelt out explicitly.

It’s also worth noticing the metre and rhyme inaccuracies in the AI poem. While the real Byron poem follows a rigid rhyme scheme and does not deviate from iambic tetrameter throughout, the AI poem features questionable rhymes like ‘beneath’ and ‘relief’, has a random line of slightly suspect iambic pentameter (‘A love unrequited, without relief’), and switches up the rhyme scheme entirely after the first stanza, going from AABB to ABAB.

This pattern of technical inaccuracies and superficial platitudes can be noted throughout the rest of the AI poems used in the study, too: in the fake Walt Whitman poem (rated the favourite in the study) we get the awkward couplet ‘The passion of my being, that burns with fervent fire, / The urge to live, to love, to strive, to reach up higher’. The fake Plath poem contains a mixed metaphor worthy of a contestant on The Traitors with ‘The world outside is cruel and cold, / And I’m a fragile, broken yolk’ (what does that even mean!?), and the metre in the fake Allen Ginsberg poem is so off that I can’t even tell what it’s trying to do. All of this to say that I don’t think an AI can write a poem ‘in the style of’ a human poet at all — the only style of writing it can replicate is its own, over and over again, just with slightly different semantic fields, overcooked platitudes, and flawed rhyme schemes.

If you’re unnerved by these findings, so was I at first. What does it say about the quality and necessity of human creation that, when faced with AI poetry, the average person cannot tell the difference? Surely this is enough to alter the landscape of poetry significantly? Well, I don’t think so, not really. Firstly, it’s important to remember that the survey was given to those who don’t habitually read poetry: not to say that these people’s opinions don’t matter, or are worth less, but that these people don’t tend to be involved in the industry, and so poetry written by generative AI is not about to find its way to our bookshelves. Secondly, the majority of creatives are generally averse to the use of AI in the creative industries, so if for whatever reason this content did begin to be distributed, there would be significant pushback. Just last week, the respected Atticus Review made the decision to publish new work as NFTs, which resulted in widespread uproar in online poetry spaces. Thirdly, there is the argument that, if people rate highly the words which reach them emotionally, the technical quality of the poem makes no difference. This is part of the same discourse which exists around the work of poets like Rupi Kaur, seen to be emblematic of ‘bad TikTok poetry’: while poetry that appears to replace the space bar with the enter key isn’t necessarily of the highest literary calibre, if it allows people to see their own emotions represented on the page and thus develop their own emotional literacy, what does this really matter? If we categorise poetry and judge people on their tastes based on what is generally considered ‘good’ or ‘bad’, we undermine the inherently subjective nature of art and wander dangerously into elitist territory. If people appreciate their AI poetry experience enough, they will branch out into exploring human poets. And if they don’t, what harm has it done? At worst, it will result in a few more frustrating conversations about the relevance of poetry and perhaps a couple of AI poetry books self-published on Amazon as a quick cash-grab by someone with nothing better to do.

To bring this to a conclusion, the headline that ‘AI poetry is rated better than poetry written by humans’ is perhaps unsurprising. To those who don’t often encounter poetry, AI poetry is simpler and easier to understand; some readers may even carry the preconception that AI will produce ‘bad’ content, and associate ‘bad’ with difficult. Ultimately, despite fears, I don’t think AI will ever truly be able to replicate or replace poetry, or art more widely, created by humans. While AI can regurgitate facts about human emotion through phrases such as ‘For love, the sweetest of all art, / Can also leave a painful mark’, it can never reach the nuanced emotional understanding that humans can, and so, while AI poetry might be rated higher than human poetry, human art can never truly be replaced.
