The Experiment: The New York Times Sunday Crossword, the ultimate test of trivia, wordplay, and "lateral thinking."
> THE CONTENDERS
Can GPT-4 with Vision (working from a screenshot of the grid) beat a fairly competent human (me) and a professional solver (my friend Alice)?
> ROUND 1: TRIVIA CLUES (THE EASY PART)
Computers know everything. This should be a slaughter.
Result: Tie. I know how to Google; the AI recalls the same facts from its training data. No advantage either way.
> ROUND 2: THE PUNS (THE HUMAN EDGE)
Crosswords are famous for trickery: "Flower" can clue "something that flows" (RIVER), not a plant.
Correct Answer: HORSE, because horses live in stables. It's a pun.
AI Failure: It took the clue literally and missed the double meaning of "stable."
> ROUND 3: THE LATERAL THINKING
The AI answered FLAG. Flagstone, sure. "Flagbox"? Is that a word? No. The answer turned out to be SOAP: soapstone, soapbox. The clue was "Lead-in to stone OR box."
That was when I realized reading the grid is hard for the model. The AI had hallucinated the clue number; it was actually answering 46-Across.
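For the curious, the mechanical check behind a "lead-in" clue is easy to express in code. Here is a minimal sketch in Python; the tiny inline word set stands in for a real dictionary file (an assumption I'm making to keep it self-contained):

```python
# A tiny inline word set stands in for a real dictionary
# (e.g. /usr/share/dict/words); purely illustrative.
WORDS = {"flagstone", "soapstone", "soapbox", "flagpole", "sandbox"}

def is_lead_in(candidate: str, *suffixes: str) -> bool:
    """True if candidate + every suffix forms a known word."""
    return all(candidate + suffix in WORDS for suffix in suffixes)

print(is_lead_in("flag", "stone", "box"))  # False: "flagbox" isn't a word
print(is_lead_in("soap", "stone", "box"))  # True: soapstone and soapbox
```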
> THE VISION PROBLEM
I fed the AI a screenshot of the grid. It struggled to correlate the numbers with the boxes.
AI: "I see a grid. But I cannot reliably tell which empty white box corresponds to 12 Down vs 13 Down."
I had to type the clues in manually, which defeats the purpose of using the AI for speed.
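For context on exactly what the model failed to do: crossword numbering follows a fixed rule. Scanning the grid in reading order, a white square gets the next number if it starts an across entry, a down entry, or both. A minimal sketch in Python (the grid encoding, with "#" for black squares, is my own convention):

```python
# Standard crossword numbering: scan the grid in reading order and give
# a white square the next number if it starts an across entry (no white
# square to its left, at least one to its right) or a down entry (no
# white square above, at least one below). "#" marks a black square.
def number_grid(grid: list[str]) -> dict[tuple[int, int], int]:
    rows, cols = len(grid), len(grid[0])

    def white(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#"

    numbers: dict[tuple[int, int], int] = {}
    n = 0
    for r in range(rows):
        for c in range(cols):
            if not white(r, c):
                continue
            starts_across = not white(r, c - 1) and white(r, c + 1)
            starts_down = not white(r - 1, c) and white(r + 1, c)
            if starts_across or starts_down:
                n += 1
                numbers[(r, c)] = n
    return numbers

# Toy 3x3 grid; a real NYT Sunday grid is 21x21.
print(number_grid(["..#", "...", "#.."]))
# {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 2): 4, (2, 1): 5}
```

A few lines of arithmetic in code, but it demands exactly the kind of precise spatial bookkeeping the vision model couldn't manage from a screenshot.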
> THE VERDICT
If I type the clues into ChatGPT, it solves the puzzle in two minutes. It is a super-genius. It even figures out the puns eventually, if I give it the letter pattern ("_ _ _ S E").
But if I ask it to "Look at the puzzle and solve it," it fails completely. It cannot navigate the spatial geometry of the grid.
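That letter pattern is doing real work: it turns an open-ended guess into a constrained search. A minimal sketch of the same filter in Python; the candidate list is invented purely for illustration:

```python
import re

def pattern_to_regex(hint: str) -> re.Pattern:
    """Turn a hint like '_ _ _ S E' into a regex ('...SE$')."""
    body = "".join("." if ch == "_" else ch for ch in hint.split())
    return re.compile(body + "$")

pattern = pattern_to_regex("_ _ _ S E")
candidates = ["HORSE", "RIVER", "MOUSE", "STABLE"]  # invented examples
print([word for word in candidates if pattern.match(word)])
# ['HORSE', 'MOUSE']
```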
> CONCLUSION
Crosswords are safe... for now. The joy of a crossword is the "Aha!" moment. The AI doesn't have "Aha!" moments; it just runs a probability search. It's like playing chess against an engine. You can do it, but why? The struggle is the point.