The November 2025 AI Coding Surprise, Model by Model

Randy Olson, PhD

In November 2025, AI coding tools went from “halting and clumsy” to surprisingly capable. Suddenly they could produce whole, working apps from minimal instruction, in ways they simply couldn't before.

This experiment makes that shift visible. We gave 22 AI models the exact same prompt, asking each to build a working analog clock from scratch, and ran each model five times independently. The prompt:

Create a single HTML file containing an interactive analog clock that displays the current time. Use HTML, CSS, and JavaScript. The clock should update every second. Use a white background. Output only the HTML code with no additional text or explanation.

The models span 2023 to early 2026, sorted oldest to newest. Scroll through the rows and watch the quality change.

The five-run design matters: one generation could be a lucky hit or an unlucky miss. Five runs show whether a model reliably understands the task.
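To see why a single generation is weak evidence, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each run succeeds independently with the same probability p. The function name and the example rates are hypothetical, not measurements from this experiment.

```python
def chance_of_clean_sweep(p: float, runs: int = 5) -> float:
    """Probability that every one of `runs` independent generations succeeds.

    Assumes (for illustration only) a fixed per-run success rate `p`.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p ** runs


# Hypothetical per-run success rates, not results from the experiment.
for p in (0.5, 0.8, 0.95):
    sweep = chance_of_clean_sweep(p)
    print(f"p={p:.2f}: one run succeeds {p:.0%}, five of five {sweep:.1%}")
```

Under these assumptions, a model that succeeds half the time looks fine on a single run 50% of the time, yet sweeps all five runs only about 3% of the time.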



What the results show

Claude Opus 4.5, released in November 2025, produces a polished, working analog clock in all five replicates. Earlier models in the timeline produce inconsistent results at best.

A working analog clock is a surprisingly good probe. The model has to understand what “analog” means, render a clock face with hour markers, position three hands correctly for the current time, and animate them every second. Getting that right consistently, across five independent runs, is a real bar to clear.
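The hand-positioning step is mostly arithmetic, and a minimal sketch may make it concrete. The prompt asks for JavaScript; this version is in Python only to show the math, and it reproduces none of the models' code. The function name `hand_angles` and the convention of measuring degrees clockwise from 12 o'clock are assumptions for illustration.

```python
from datetime import datetime


def hand_angles(now: datetime) -> tuple[float, float, float]:
    """Return (hour, minute, second) hand angles in degrees, clockwise from 12."""
    # Second hand: 360 degrees / 60 seconds = 6 degrees per second.
    second_angle = now.second * 6.0
    # Minute hand: 6 degrees per minute, plus a small advance as seconds pass.
    minute_angle = now.minute * 6.0 + now.second / 10.0
    # Hour hand: 30 degrees per hour on a 12-hour dial, advancing with minutes and seconds.
    hour_angle = (now.hour % 12) * 30.0 + now.minute * 0.5 + now.second / 120.0
    return hour_angle, minute_angle, second_angle


# Example: angles for the current local time.
print(hand_angles(datetime.now()))
```

A common failure mode this guards against is an hour hand that jumps once per hour: at 6:30 the hour hand should sit halfway between 6 and 7 (195 degrees), not exactly on the 6.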

This is what the November 2025 surprise looked like.

Better models need better evaluation.

As model quality improves, the bar for what “good” means keeps rising. Truesight helps you define and measure quality for your specific domain, so you know exactly when a new model is better for your product.

Randy Olson, PhD

Co-Founder & CTO at Goodeye Labs

Dr. Randy Olson builds evaluation infrastructure that puts domain experts in control of AI quality. He holds a PhD in Computer Science from Michigan State University and has spent over 15 years building production AI systems across research and industry.

Inspired by Brian Moore's clock experiment, which I extended with more models and replicates.