Okay, now RAISE is my new Wordle starter word. As before, I am not the first to figure this out. Last night, I was measuring a starting word's quality based on how many green and yellow squares it yielded on average. That's a pretty good measurement, but not quite rigorous. E.g. you have to guess "how much more valuable is a green than a yellow?"

A more algorithmically rigorous measure is: on average, once the game shows you the green and yellow squares for a starter word, how many wrong answers get eliminated? Sorry, that's kind of a mouthful. Maybe more clearly:

If you've wallowed in classic puzzles, you've probably seen plenty of coin-weighing problems, a la: you have 12 coins, one of which is counterfeit and a little light; you have a balance scale; can you find the counterfeit in just three weighings? In many of these coin-weighing problems, the key is to divvy the 12 coins into 3 groups of 4 instead of the obvious-but-wrong 2 groups of 6. Weigh two of the groups of 4 against each other, and whichever way the scale goes, you rule out eight coins in the first round instead of six.

My new measurement looks at a potential starter word. Then it considers all the potential answer-words; for each, what green-and-yellow-squares does the game report? Put all answer-words that get the same green-and-yellow-squares into the same "bucket". You're hoping for many buckets, all about the same size. Because English isn't smooth, you won't get many-many-equally-sized buckets. But you can measure the buckets you do get to see how closely they approach the ideal.
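Here's a minimal sketch of that bucketing, in Python. It's an illustration, not the code I actually ran: the function names (`feedback`, `buckets`, `expected_remaining`) are made up for this example, the duplicate-letter handling follows the usual Wordle convention, and scoring the buckets by "how many answers are left on average" is just one reasonable way to measure how close they come to the ideal.

```python
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    """Return the squares as a string: G = green, Y = yellow, . = gray."""
    pattern = ["."] * len(guess)
    unmatched = Counter()  # answer letters not already used by a green
    # First pass: mark greens, and remember the answer letters left over.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: mark yellows, using up each leftover letter only once.
    for i, g in enumerate(guess):
        if pattern[i] == "." and unmatched[g] > 0:
            pattern[i] = "Y"
            unmatched[g] -= 1
    return "".join(pattern)

def buckets(guess: str, answers: list[str]) -> Counter:
    """Group answer-words by the squares they would show for this guess."""
    return Counter(feedback(guess, answer) for answer in answers)

def expected_remaining(guess: str, answers: list[str]) -> float:
    """Average number of answers still possible after seeing the squares.

    Assumes every answer is equally likely, so an answer lands in a
    bucket of size s with probability s / len(answers). Lower is better.
    """
    sizes = buckets(guess, answers).values()
    return sum(s * s for s in sizes) / len(answers)

if __name__ == "__main__":
    # A toy answer list, just for illustration; the real one is much longer.
    toy_answers = ["raise", "slate", "crane", "arise"]
    for word in ("raise", "slate"):
        print(word, dict(buckets(word, toy_answers)),
              expected_remaining(word, toy_answers))
```

To rank starter words for real, you'd run `expected_remaining` for every candidate guess against the full answer list and sort by the result.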

This is a subtle difference. Last night, my favorite word using the how-many-green-and-yellow-squares measure was SLATE. According to this new measure, SLATE is 99.97% as good as RAISE. I don't know the exact point of diminishing returns for thinking about this problem, but I'm 100% sure I'm way past it.