AI is challenging cartoonists and gag writers, but what about judging captions? Turns out, it might hold its own there too. But is it better than human caption analysis?
Maybe not. Perhaps it's the nuances, mistakes, the imperfections, and even the errors that make cartoons, captions, and the banter on podcasts uniquely human.
Wait, did I just diss the podcasters? No. Maybe a little bit. It's true I don't always agree with them, and not because of the overused escape hatch: "Humor is subjective."
During a recent vChat with Bob Mankoff, I brought up one of his TNYCCC books and he reminded me that some captions are just "craptions."
Mistakes, as human as they are, can be great lessons and sometimes very valuable. As a youth, I was a philatelist and envied the owners of the Inverted Jenny.
I recently bought this original Bob Mankoff on eBay. You have to love the spelling of "Council"; let's see AI do that…naturally.
(s/o to Stephen Nadler for the eBay heads-up on the Attempted Bloggery website.)
When speaking of our flaws, I think Capt. Kirk said it best:
"We prefer to help ourselves. We make mistakes, but we're human — and maybe that's the word that best explains us." — Captain James T. Kirk
Sidenote
Because of my background in the sciences, I am inherently a supporter of new technologies. But as a surfer, I'm a purist (I hated when factories started pumping out boards; I loved talking with a shaper and watching him shape my board based on my surf style), and I'm a bit of a humanist.
I'm also a supporter of humans making things, and I put my money where my mouth is. I buy cartoonists' original art, license their artwork, and publish it. Check out PanelStripLOL, https://panelstrip.lol, as an example.
Back to an AI LLM judging captions
AI used properly is great. Tweaked your knee and can't understand your MRI results? AI helped me understand the scan and be better prepared for Doc. Telegraph to internet, horse and buggy to cars—technology rolls on and there are going to be fantastic uses for AI.
This is an unpolished test. I suppose I could run this test on all 993 Contests (so far), but I'd just have to do it again a month later when the next shiny new-and-improved AI LLM is released. (But, if I did do that, I'd use AI tools.)
Here's one test, a quick go at determining if AI can judge a cartoon caption contest as well as humans.
Contest #989
Prompt:
This is a The New Yorker Cartoon Caption Contest. First, describe it to me and I will correct any items you misidentify. Identify the speaker. Then I will give you three captions to list in order of best to worst and assign each a rank 1-10. Ten is the highest for an excellent caption for this cartoon. Your list order and rankings should not be influenced by the caption order. Consequently, If all three captions are poor, rank them as so.
AI LLM:
Scene description:
Two prehistoric cave people in a barren, rocky landscape. One stands upright on the left, mouth open in apparent surprise or concern — clearly the speaker/observer. The other is crouched on the right, apparently carving or chiseling something. Surrounding the crouching figure are several large LEGO-style bricks — oversized interlocking plastic-looking blocks, anachronistically prehistoric. Stone chips are scattered on the ground. A rocky butte is visible in the background.
The visual joke: a caveperson has invented LEGO bricks instead of, say, the wheel. Or is chiseling bricks into LEGO form.
Speaker: The standing figure on the left — open mouth, facing the worker, classic "reaction" posture.
Ready for your three captions.
Prompt:
"I hope these sell better than the frisbees you made."
"Are they for all ages?"
"I have a feeling this will be a painful step in our evolution."
AI LLM:
Here's my ranking, best to worst:
1. "I have a feeling this will be a painful step in our evolution." — 8
The LEGO-stepping-on-bare-feet gag is universally understood, "evolution" does double duty (prehistory + progress), and it lands without explanation. Clean, load-bearing wordplay, image-specific. Submission-worthy.
2. "I hope these sell better than the frisbees you made." — 7
Implies a history of anachronistic invention failures, which is a solid character premise. "Frisbees" is a good specific choice. Slightly weakened by the fact that it's more about an offscreen invention than this one — the image isn't fully load-bearing. Competent but not surprising enough to win.
3. "Are they for all ages?" — 4
This is the LEGO box tagline ("Ages 4 and up"), and the anachronism is the joke — but it requires you to supply the punchline yourself. The caption points at the joke without completing it. A finisher knows the joke; this one just winks at it.
Strong trio overall. Caption 1 is your best shot.
Now, here are Contest #989 results after the public voted.
No doubt this test needs some modifications, but it was an interesting first go.
Note: the captions were given to the AI LLM in the same vertically stacked order as on the TNYCCC online voting page.
Geez, I hate to even mention this; you all know how I feel about it. But because it may not be mentioned elsewhere (unless the results match the stated theory), I will: it's interesting that neither the voting public nor this AI LLM was swayed by the caption in the top voting position. That casts some doubt on the first-position advantage theory, or maybe it just confirms whose confirmation bias it really is.
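If I ever polish this test, one modification I might try is feeding the same three captions to the AI LLM in every possible order and seeing whether its ranking holds. This is only a rough sketch of that idea, not something I ran: the helper name caption_orderings is my own invention, and it only builds the caption blocks to paste into the prompt. It does not call any AI.

```python
from itertools import permutations

# The three Contest #989 captions, in the order they appeared on the voting page.
captions = [
    "I hope these sell better than the frisbees you made.",
    "Are they for all ages?",
    "I have a feeling this will be a painful step in our evolution.",
]

def caption_orderings(caps):
    """Return every ordering of the captions as a block of quoted lines.

    Hypothetical helper for illustration; each block could be pasted
    into the prompt in place of the original caption list.
    """
    blocks = []
    for order in permutations(caps):
        blocks.append("\n".join(f'"{c}"' for c in order))
    return blocks

orderings = caption_orderings(captions)
print(len(orderings))  # 6 possible orders for three captions
```

If the ranking came back the same across all six orders, that would be a stronger hint that position isn't doing the work, at least for this one AI LLM on this one contest.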
The Judging Analysis: A Surprising Alignment
The AI LLM called it the same order as the voting public. "Painful step in our evolution" first, frisbees second, "all ages" a distant third.
What's interesting is how it got there. The AI broke it down like a data problem — is the caption load-bearing? Does it need the image? Does it land without explanation? The voters didn't run that checklist. They just laughed. Or didn't.
Same result, different path.
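For the humor-data folks, here is one way to put a number on "same order." This is a minimal sketch using Kendall's tau, which is my choice of measure and not anything the AI LLM or the contest uses. The rank positions come straight from the order described above; the short caption labels and the function name are just mine.

```python
from itertools import combinations
from math import factorial

def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two rankings of the same items, assuming no ties.

    Returns 1.0 for identical orders and -1.0 for fully reversed orders.
    """
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # A positive product means both rankings order this pair the same way.
        product = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / pairs

# Rank positions only (1 = best); labels are shorthand for the three captions.
ai_rank = {"evolution": 1, "frisbees": 2, "all ages": 3}
public_rank = {"evolution": 1, "frisbees": 2, "all ages": 3}

tau = kendall_tau(ai_rank, public_rank)
chance_of_exact_match = 1 / factorial(len(ai_rank))

print(tau)  # 1.0
print(round(chance_of_exact_match, 3))  # 0.167
```

A perfect score, but keep it in perspective: with only three captions there are just six possible orders, so a random ranking would match the public about one time in six.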
Does that mean AI has figured out what makes something funny? I wouldn't go that far — one contest, three captions, unpolished test. But it's a data point. And as someone who has an interest in humor data, I'll admit it was a little satisfying to see the analysis hold up against an actual vote count.