I added a puzzle to Octothorpean this weekend. It took longer to figure out where to put the puzzle than it did to write the puzzle itself. I wanted to add a puzzle close to the beginning of Octothorpean, but didn't want to make the longest "arc" even longer. I didn't know which was longest, so I measured.

The plurality of Octothorpean is a set of eight "arcs" or trails of puzzles that come together at a metapuzzle. Some arcs take longer to finish than others. Ideally, they'd all take about the same amount of time. In practice, I'd settle for the slowest path taking less than e times as long as the quickest path.

I wanted to add a puzzle. If I added it to the slowest arc, I'd make that arc even slower. Really, I wanted to add it to the quickest arc. But which was quickest?

I'd already measured this somewhat: I'd originally "balanced" the arcs based on playtesters' solving times. But that timing data was… noisy. Some puzzles had only been seen by two teams; a few had only been seen by one team. In some cases, that one team was a Small Subset of the Demonic Robot Tyrannosaurs, and they were very fast indeed.

But now more than 100 teams have played. So I had more data. I wrote a little program to crunch the logs: for each of the eight arcs, for each team that's completed that arc, figure out how long that team took. From this, for each arc, we can find the median time to complete the arc, easy-peasy.
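Here's roughly what that kind of log-cruncher might look like in Python. It's a sketch, not the actual program: the record shape `(team, arc, started_at, finished_at)` and the function name `median_arc_times` are assumptions made for illustration, with timestamps in seconds and `finished_at` set to `None` for teams that haven't finished.

```python
from collections import defaultdict
from statistics import median


def median_arc_times(records):
    """Return {arc: median seconds to complete} over teams that finished.

    `records` is an iterable of (team, arc, started_at, finished_at)
    tuples. This record shape is hypothetical, not the real log format.
    """
    durations = defaultdict(list)
    for team, arc, started_at, finished_at in records:
        if finished_at is None:
            continue  # this team hasn't completed the arc; skip it
        durations[arc].append(finished_at - started_at)
    return {arc: median(times) for arc, times in durations.items()}


# Made-up sample data, just to show the call.
sample = [
    ("team1", "arcA", 0, 3600),
    ("team2", "arcA", 0, 7200),
    ("team3", "arcA", 0, 40000),
    ("team4", "arcA", 0, None),
    ("team1", "arcB", 0, 1800),
    ("team2", "arcB", 0, 2400),
]
print(median_arc_times(sample))
```

Only finishers count toward an arc's median here, which matches "for each team that's completed that arc."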

The results surprised me: the longest arc took more than five times as long to finish as the shortest.

Now I wasn't trying to figure out where to insert a new puzzle: suddenly I was doing damage control, trying to figure out why this one arc was so difficult to solve.

I wrote another little program, this one to crunch the logs another way, going for finer granularity: instead of computing the median time to finish an arc, I computed the median time for each puzzle. Then I added up the puzzle times to get arc times. The big multiplier… disappeared. Using this computation, another arc was the longest arc (but not by a factor of five, thank goodness).
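The finer-grained computation could be sketched like this. Again, the input shapes are assumptions: `solves` is a list of `(team, puzzle, seconds)` tuples and `arc_of` maps each puzzle to its arc; neither name comes from the real code.

```python
from collections import defaultdict
from statistics import median


def arc_times_from_puzzle_medians(solves, arc_of):
    """Sum per-puzzle median solve times into a total for each arc.

    `solves`: iterable of (team, puzzle, seconds) tuples (hypothetical shape).
    `arc_of`: dict mapping puzzle name -> arc name (hypothetical).
    """
    times_by_puzzle = defaultdict(list)
    for team, puzzle, seconds in solves:
        times_by_puzzle[puzzle].append(seconds)

    arc_totals = defaultdict(float)
    for puzzle, times in times_by_puzzle.items():
        # Take the median per puzzle first, then add it to the arc's total.
        arc_totals[arc_of[puzzle]] += median(times)
    return dict(arc_totals)
```

The order matters: this takes the median before summing, where the whole-arc version sums each team's time before taking the median.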

What was going on? Why did teams take so long to get through that arc, but not if you looked puzzle-by-puzzle?

More than half of the teams took a break partway through that arc. They didn't all do so on the same puzzle. But somewhere in that arc, they took a break. When I looked more closely at the median time to finish the arc, the absolute number turned out to be more interesting than the multiple of the quickest arc's time: the slow arc took 11 hours. Partway through, teams took a break, went away to sleep, and came back. Some went away for a week, some for a day. The median time away was about enough for a good night's sleep and a big breakfast.

Since they took their breaks on different puzzles, when I looked at median puzzle-solving times, the breaks disappeared. For each puzzle, the median-time team solved without a break.
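A toy example shows the effect. The numbers below are made up (three teams, one three-puzzle arc, times in hours) and are not from the real logs: two of the three teams take a 10-hour break, each on a different puzzle.

```python
from statistics import median

# Made-up hours spent on each of three puzzles in one arc.
hours = {
    "team1": [11, 1, 1],  # 10-hour break during the first puzzle
    "team2": [1, 11, 1],  # 10-hour break during the second puzzle
    "team3": [1, 1, 1],   # no break
}

# Whole-arc view: total each team's time, then take the median.
whole_arc_median = median(sum(row) for row in hours.values())

# Puzzle-by-puzzle view: take the median per puzzle, then add them up.
per_puzzle_medians = [median(column) for column in zip(*hours.values())]
summed_puzzle_medians = sum(per_puzzle_medians)

print(whole_arc_median)       # 13
print(summed_puzzle_medians)  # 3
```

Most teams' totals include a break, so the whole-arc median does too. On any single puzzle, though, only one team was on a break, so each per-puzzle median comes from a team that solved straight through.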

In theory, the median's a good thing to measure for stuff like this: avoiding skewed timing data from fast Tyrannosaurs and slow folks who decided to get some rest. But I'm glad I looked more closely at that data, or else I'd still be trying to cure that slow arc.

Anyhow, now I'm looking at "slow" puzzles. And I'm using that "funnel" analysis I wanted for each arc to figure out which puzzles caused the most folks to quit that arc. I looked at stats from 100 teams. As a rule of thumb, about 50 teams started each arc and about 30 finished. That's not terrible, but I bet those quit-inspiring puzzles could each use a closer look.
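One way a per-arc funnel could be computed, as a sketch: it assumes puzzles in an arc are solved in a fixed order and that we know how many of them each team solved. The names `arc_funnel`, `puzzle_order`, and `solved_count_by_team` are hypothetical.

```python
from collections import Counter


def arc_funnel(puzzle_order, solved_count_by_team):
    """Count, per puzzle, how many teams reached it and how many quit there.

    `puzzle_order`: list of puzzle names in the order the arc presents them.
    `solved_count_by_team`: dict team -> number of this arc's puzzles solved
    (0 means the team started the arc but never solved its first puzzle).
    Returns (rows, finishers), where rows is a list of
    (puzzle, teams_reached, teams_quit_here) tuples.
    """
    total = len(puzzle_order)
    # A team that solved k < total puzzles stalled on the puzzle at index k.
    quit_at = Counter(k for k in solved_count_by_team.values() if k < total)

    remaining = len(solved_count_by_team)
    rows = []
    for index, puzzle in enumerate(puzzle_order):
        rows.append((puzzle, remaining, quit_at[index]))
        remaining -= quit_at[index]
    return rows, remaining  # whoever is left solved every puzzle
```

The puzzles with the biggest `teams_quit_here` numbers are the quit-inspiring ones worth a closer look.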

(Oh, and I moved away from using the median to something I've been calling the "twedian": instead of the middle value, I take the value that's 2/3 of the way through the sorted list. It's an attempt to counteract the effect of so many Tyrannosaur-like teams playing this game while I'm trying to optimize the fun for not-so-experienced teams.)
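A minimal sketch of a twedian in Python. The exact index rule is an assumption: the post only says "2/3 of the way through the sorted list," so this version picks the item at position two-thirds of the way from the first index to the last, rounded down.

```python
def twedian(values):
    """Return the value about 2/3 of the way through the sorted values.

    The index rule (two-thirds of the last index, rounded down) is one
    plausible reading of "2/3 of the way through", not a confirmed one.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("twedian of an empty list is undefined")
    index = (2 * (len(ordered) - 1)) // 3
    return ordered[index]


print(twedian([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 7
```

Unlike `statistics.median`, this always returns a value that's actually in the list; it never averages two neighbors.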