Every day, I play the puzzle game Collections.
It starts out kinda like the game show Wheel of Fortune: You're looking at three sets of blanks. You ask about some letters. The game shows you where those letters are in the blanks. When you have some idea what words you're looking at, you figure out what collection (category) they all belong to. Guess the collection correctly to win.
To ask about a letter, you spend points. You want to spend as few points as possible to win. From that description, you might think "Obviously the best strategy is to ask about E first; it's the most common letter." But wait: The game designer thought about that. You spend 10 points for E, but only 1 point for Q. If you start by asking for E, your best score will only ever be 90. If you start by asking for rare, "cheap" letters, maybe you can do better. Which letters should you pick? I hypothesized I should start with the cheap letters, but wasn't too sure. So I wrote a computer program. I tell the program what I've guessed so far and which letters are in which blanks. The program knows the phraser word list and knows how much letters cost in this game. Based on that, it recommends which letter to pick next.
Here are the letters it recommended for several recent Collections games. Here, the line zJxqkyFvwhGBdlNcS means in one puzzle, start with Z, then J, then X…. One line per puzzle. The program pretty consistently suggests starting with the cheapest letters: Z, J, X, and Q. Then prrrrobably the next cheapest: K, Y, V, F, and W. Yay, I will stick with my hypothesis with more confidence now.
zJxqkyFvwhGBdlNcS
zjxqkyfvwhGBdMpu
zjxqkyFvwHgBMDCPlUn
zjxqkYfvwhgBMDSpU
ZjxqkYvfWhBgmdPNu
zJxqkyvfwHbGMDuAp
zjxqKYvfwHbGMPdLN
zjxqkyvfWhGBdMLNp
zjxqKYvFwHGbdmlnR
zjxqkyvfWHgbdSmPn
zjxqkyVfwHgbdSpMA
zJxqkyvfwHgbMApd
zjxqkYvfWhGbmdPL
zjxqkyvfwHgBmdSlN
zjxqKyvfwhgBmdSP
zjxqkyvfwHGbmDsPL
zjXQkyvfwHGbmdSpN
zjxqkyVfwHgbmSdA
zjxqKyVfwHgbpDmluC
zjxqkYvFWhGbPlDU
zjXqkyvfwHGmdPlN
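If you're wondering what a letter-picker like this might look like, here's a minimal Python sketch. It is not the actual program, just one plausible approach: for each letter, estimate how many candidate words would be ruled out on average, divide by the letter's cost, and pick the best bargain. It only handles a single blank word. The names expected_remaining and pick_letter are made up for illustration, and so are all the costs except E=10 and Q=1.

```python
from collections import Counter


def expected_remaining(words, letter):
    """Expected number of candidate words left after asking about `letter`.

    Asking reveals the positions where the letter sits, so candidates are
    grouped by those positions. A group of size n is the true one with
    probability n/total and leaves n candidates standing.
    """
    groups = Counter(
        tuple(i for i, ch in enumerate(word) if ch == letter) for word in words
    )
    total = len(words)
    return sum(n * n for n in groups.values()) / total


def pick_letter(words, costs, asked=frozenset()):
    """Pick the not-yet-asked letter that prunes the most candidates per point."""

    def value(letter):
        pruned = len(words) - expected_remaining(words, letter)
        return pruned / costs[letter]

    choices = [letter for letter in costs if letter not in asked]
    return max(choices, key=value)


# Hypothetical costs: only E=10 and Q=1 come from the game description.
costs = {"e": 10, "q": 1, "n": 5, "o": 8}
print(pick_letter(["onyx", "oryx", "styx"], costs))
```

With real data, words would be every word on the list that fits the blanks so far, and costs would cover the whole alphabet.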
While I'm content with this result, some more rigorous thinkers will note problems:
№1 The program chooses a letter trying to narrow down the number of word possibilities overall. In general, that's a fine strategy. But suppose you're looking at _____, __YX, _____. That middle word is onyx, Styx, or oryx; if you knew which, you could probably figure out the collection right away, even if you don't know anything about the other words. But the program doesn't see narrowing three choices down to one as being very useful; instead, it will mostly try to narrow down the thousands of possibilities for those _____ all-blank words.
№2 The phraser word list is just what I had handy. It was created as a general word list, not necessarily words that you'd expect to see in Collections puzzles. For example, the phraser word list thinks that "eagle" and "helped" are roughly equally awesome. But I bet "eagle" shows up in lots more Collections puzzles. Maybe the category is Birds. Maybe it's golf scores. Maybe it's things on the Mexican flag. When I try to think of categories I'd clue with "helped," I think of… Uh. Hmm. I think of… Yyyyeah. Ideally, my word list would have "eagle" but not "helped".
Not the most important thing going on these days, but a topic on which I can provide expertise, so here ya go.
Permalink
I found out about GroceryDB, a data set of grocery ingredients sold at some big chain stores
(via the excellent Data is Plural blog).
Thus, future versions of the Phraser phrase list will know about
cultured dextrose and modified corn starch, substances which are apparently all around us
(if we are standing in our kitchens, anyhow).
Then I wasted some time staring at modified corn starch trying to use it as a cryptic clue.
Alas, if there are interesting anagrams of corn starch, I didn't find 'em. Scorch rant? Never mind, then.
Permalink
I saw mention of another movie database.
I already knew about IMDb, a pretty-good example acquired by Amazon some years back.
New to me: TMDb, The Movie Database.
I previously figured out how to use IMDb's data to improve phraser's phrase list
with lots of movie titles and movie-people names.
Now I've figured out how to do it again with TMDb, hopefully thus getting a broader list.
(Maybe more international?)
And maybe if Amazon ever goes on a cost-cutting rampage and discards IMDb, I might
have a fall-back plan. Anyhow, this data should trickle in next time I update
the phraser lists; I'm trying the new data at home now to make sure it doesn't
ruin everything.
Permalink
While investigating the question "Why doesn't phraser know about [[Redacted MIT Mystery Hunt puzzle solution Redacted]]?" I found a bug: When reading Wikipedia, if there was an absurdly long paragraph, phraser
thought it had reached the end of Wikipedia. Back in 2016 when I was writing phraser and looking carefully for problems, Wikipedia
didn't have any absurdly long paragraphs, so everything seemed to be working fine. In the intervening years, alas, that changed.
I was no longer looking carefully for problems and, alas, didn't notice.
I fixed that bug, yay. (Less notice-ably, I fixed another bug, and thus did something to help phraser find [[Redacted MIT Mystery Hunt puzzle solution Redacted]]. When considering Wikipedia cross-references to, say, Critique of Pure Reason, I was counting "critique of", "of pure", and "pure reason" more than I meant to.)
Anyhow, you might want to download the latest phrase and word lists from the appropriate page. If you run phraser yourself, this would be a good time to refresh and pick up the latest code.
Permalink
I have updated
the Phraser word and phrase lists.
Those of you who find these lists handy for solving/designing
word puzzles, rejoice!
This update incorporates an epiphany!
(It also has updated content from Wikipedia,
etc, but you already expected that.)
tl;dr I fixed many many mistakes, and I like the
quality improvement. If you want the details, read on…
You may recall a quandary: Crossword constructors have hand-crafted
lists of cool phrases, idioms, and such. I can download a few of
these lists, and Phraser can see that SHORTANDSTOUT is a nifty
phrase. Crossword constructors don't care about spaces in phrases;
but I'd like to know where the spaces go.
So Phraser first tries to figure out a list of phrases that have appeared
in text. It reads lots of text-sources: Wikipedia, text files from project
Gutenberg, etc etc. But it doesn't realize that
"short and stout" is a more-interesting phrase than
"copyright 1995", which appears more often.
So it then goes through the crossword lists,
notices that crossword constructors think SHORTANDSTOUT is cool,
notices that "short and stout" is SHORTANDSTOUT
with spaces, and boosts the score of "short and stout".
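The boosting step could be sketched something like this in Python. This is an illustration under assumptions, not Phraser's real code: the function names, the scores, and the choice to multiply by 10 are all made up. The only idea it borrows is matching a phrase to a crossword entry by squeezing out the spaces.

```python
def despace(phrase):
    """Reduce a phrase to crossword-list form: uppercase letters, no spaces."""
    return "".join(ch for ch in phrase.upper() if ch.isalpha())


def boost_scores(phrase_scores, crossword_entries, boost=10.0):
    """Return new scores, multiplied by `boost` for phrases crossword lists like.

    The multiplier is a stand-in; the real boost could be computed any
    number of ways.
    """
    entries = set(crossword_entries)
    return {
        phrase: score * boost if despace(phrase) in entries else score
        for phrase, score in phrase_scores.items()
    }


# Made-up scores: "copyright 1995" shows up more often in raw text.
scores = {"short and stout": 3.0, "copyright 1995": 40.0}
boosted = boost_scores(scores, ["SHORTANDSTOUT"])
print(boosted)
```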
But what if Phraser never figured out that "short and stout" is a thing?
Maybe it figured out the phrase "short and" and the word "stout" are things,
but never sees that teapot song and thus never realizes that "short and stout"
goes together. In that case, when it sees SHORTANDSTOUT in a crossword list,
it just kinda shrugs, thinks "I don't know what to do with that" and moves on. What a waste.
One time, I wrote a program that looked over my crossword lists for SHORTANDSTOUTs
for which Phraser couldn't figure out where to put spaces. For each, it would look
for a pair of phrases that could be combined. So if Phraser had spotted
"short and" and the word "stout", this other program would spot that
those phrases could be combined to make "short and stout". I kept the output from
that program, and fed it to Phraser on subsequent runs. Thus, it would
know "short and stout" was a thing the next time it ran; and when it saw
SHORTANDSTOUT in a crossword list, it would know to boost the score of
"short and stout".
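A pair-combiner along those lines might look like the following Python sketch. It's a guess at the general shape, not the program itself; combine_pairs and despace are hypothetical names. It tries every place to cut the spaceless entry in two and keeps the cuts where both halves match a known phrase.

```python
def despace(phrase):
    """Reduce a phrase to crossword-list form: uppercase letters, no spaces."""
    return "".join(ch for ch in phrase.upper() if ch.isalpha())


def combine_pairs(entry, known_phrases):
    """List the ways to write spaceless `entry` as two known phrases."""
    # Index known phrases by their spaceless form for quick lookup.
    by_despaced = {}
    for phrase in known_phrases:
        by_despaced.setdefault(despace(phrase), []).append(phrase)

    results = []
    for cut in range(1, len(entry)):
        for left in by_despaced.get(entry[:cut], []):
            for right in by_despaced.get(entry[cut:], []):
                results.append(f"{left} {right}")
    return results


print(combine_pairs("SHORTANDSTOUT", ["short and", "stout", "short"]))
```

Any entry that yields more than one split would need some tie-breaking rule, which this sketch leaves out.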
My phrase-combiner program didn't get it right every time.
Like, if a crossword constructor likes the very-obscure word BLUNGE,
my phrase-combiner program would guess that must be "B LUNGE".
But it was right most of the time, and the results were good enough such
that I kept using it.
A few days ago, I was looking at one of the wrong phrases that
my phrase-combiner program had come up with:
diuretic ally
That's not a thing. "Diuretically" is a word, sort of. You can look at it and figure
out it's an adverb to describe something acting in the manner of a diuretic, I guess.
It's reeeeeeeally rare, though. If you look up diuretical
and diuretically on the Google ngram viewer, you can see that they show
up not-quite-never in books.
And diuretically appears so very rarely in texts that Phraser figured
"aw, that's probably just a typo" and forgot about it.
But crossword lists agree that
"DIURETICALLY" is a kinda-important thing. So my phrase-combiner, trying its
best, had come up with diuretic ally. And a couple of days ago, I was
staring at that and wondering: OK, why do all these crossword
lists think that "diuretically" is a good thing to put into a crossword, given that nobody
uses this word in real life? That's when I had the epiphany.
The epiphany: DIURETICALLY is a valid Scrabble word.
I looked through my crossword-word-lists and saw a fair number
of words-only-Scrabble-players use.
As near as I can tell,
crossword constructors are pretty forgiving about Scrabble words that nobody uses
but are figure-out-able.
(They're not so forgiving about obscure scientific terms; there's no obvious
way for a solver to figure out the name of a rare sheep disease by applying
grammar-suffixes to a common word, I guess.)
So I hauled out a SOWPODS list (list of Scrabble words), looked through the list of
best-guess-phrases from my phrase-combiner tool, and thus found many more of my
mistakes like diuretic ally. And I purged them.
I'm now much more confident in the surviving best-guess-phrases; so I increased
their "boost" so that they're more likely to appear in the 5-million-phrases file.
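The purge boils down to a filter, roughly like this Python sketch. The function name is made up and the one-word Scrabble list is a toy stand-in for a real SOWPODS file: if a guessed phrase with its spaces removed is itself a Scrabble word, the guess was probably a single word chopped in two, so out it goes.

```python
def purge_scrabble_words(guessed_phrases, scrabble_words):
    """Drop guessed phrases whose spaceless form is a single Scrabble word."""
    scrabble = {word.upper() for word in scrabble_words}
    return [
        phrase
        for phrase in guessed_phrases
        if phrase.replace(" ", "").upper() not in scrabble
    ]


guesses = ["diuretic ally", "short and stout"]
print(purge_scrabble_words(guesses, ["DIURETICALLY"]))
```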
Permalink
Book Report: Shift Happens, Vol. №2
It's the second volume in a set of books about the history of keyboards, text entry, the user experience of working with text on various devices. This volume got into more modern history. Sometimes ...
Permalink
Book Report: Shift Happens, Vol. №1
It's the first volume in a set of books about the history of keyboards, text entry, the user experience of working with text on various then-newfangled devices. I learned a lot, which might kind of s...
Permalink
In the constructor notes for today's Puzzmo crossword, Zhouqin Burnikel says her original gimmick idea (not used) was people whose names had a fruit-word and a season-word. But she could only find on...
Permalink
Spoiler Warning: This post spoils a twist in "Not Your Typical Reincarnation Story." I read a review of the comic "Not Your Typical Reincarnation Story." The comic falls in the isekai genre: the pro...
Permalink
The Hearst Newspapers News-sites, no doubt jealous of the NYT's puzzle section, have launched their own syndicated puzzle page, Puzzmo. Each day there's a cool mini-crossword from the AVCX folks and...
Permalink
I made some more word ladder memory drill web pages; and tweaked the computer program I use to make them to be not so San Francisco street-specific. Several days ago, I made a San Francisco street n...
Permalink
Animals drawn using the letterforms of their words. Sidewalk chalk art in San Francisco at Judah & 20th, sadly faded by the time I saw it. ...
Permalink
This morning, I spotted a van from local plumbing company Chosen Rooter & Plumbing; painted on the side of their van was their logo: They tease us by narrowly avoiding a naughty word in the...
Permalink
I updated the lists of "popular" phrases and words over on the phraser page. These new lists have fresh data from Wikipedia and some other wikis. Perhaps making the biggest difference between this up...
Permalink
April is National Poetry Month. Today is April Fool's Day. I had an idea for something fun to do today, but ended up getting pranked by the English language instead. Since I recently figured out ho...
Permalink
I'm reading press releases about the Beagle Brigade Act, which would set up a center to train beagles to detect prohibited agricultural items in international mail and the baggage of international tr...
Permalink
rephrased Phraser word+phrase lists
I updated the scored word and phrase lists over at the phraser page, using data from recent copies of Wikipedia and other wikis. Soon after I updated them, I saw that my over-enthusiastic tool tha...
Permalink
Surfwords is an intense word game. I'm enjoying it so far… in short doses, because it's intense. ...
Permalink
phraser improvements
Phraser, the tool for generating word+phrase lists useful for solving+designing puzzles, is now smarter when reading crossword constructor dictionaries. Thus, hundreds of thousands of words+phrases g...
Permalink
I've had a good time playing the word puzzle game Cell Tower at https://www.andrewt.net/puzzles/cell-tower/ ...
Permalink
Thank you Google Books for clearing up the burning questions on common English usage, e.g. is there a space in "backasswards"? (Answer: sometimes, but mostly no.) I usually say "bass ackwards" b...
Permalink
I updated the big ol' list of words and the big ol' list of phrases on the Phraser page. A couple of months back, I noticed that The Collaborative Word List Project was now free. I've used the C.W....
Permalink
I updated that Bewordled game, the one where you swap tiles to make words kinda like Bejewelled but with words. Now it looks prettier with firecracker emojis and clouds. After I updated it, it occurr...
Permalink
The Collaborative Word List Project is a darned useful resource for word puzzle constructors and now it's free.* This is a list of phrases and hand-tuned scores. Here are a few lines from the file: ...
Permalink
Daily 5-dle #0007 11 : 5&8&6&11&10 polydle.github.io/?classic/daily/5 ⬜⬜⬜⬜🟨 ⬜🟩⬜🟨⬜ ⬜⬜⬜⬜🟩 ⬜🟩🟩🟩🟩 🟩🟩🟩🟩🟩 ⬜🟨⬜⬜🟨 ⬜⬜⬜🟨⬜ ⬜🟨⬜🟨⬜ ⬜⬜⬜🟨🟨 ⬜⬜⬜🟨🟨 ⬜⬜🟨⬜🟨 ⬜🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟨🟨⬜🟨🟩 ⬜⬜⬜⬜⬜ 🟩⬜⬜⬜⬜ 🟨⬜⬜🟨⬜ ⬜⬜⬜🟨⬜ 🟩🟩🟩🟩🟩 🟨⬜⬜⬜⬜ 🟩⬜⬜⬜⬜...
Permalink
Okay, now RAISE is my new Wordle starter word. As before, I am not the first to figure this out. Last night, I was measuring a starting word's quality based on how many green and yellow squares it yi...
Permalink
Update: This blog post, which superseded another blog post, has since then itself been superseded. Try to keep up. Also, my "only root words" explanation wasn't quite right. Apparently, non-root wo...
Permalink
UPDATE: This post has been superseded. I've been playing Wordle, the online game that's like a cross between Mastermind and guess-the-word. It occurred to me that the ideal "starting word" would hav...
Permalink
I got wind of a new-ish public word list for crossword constructors, the spread the word(list). So I grabbed a copy and tossed it into the big pile of data that feeds the "Phraser" phrase and word li...
Permalink
Further Bewordled
I read Allison Parrish's article "Rewordable versus the alphabet fetish," in which she discusses the design of the card game Rewordable. Like Scrabble and Bananagrams, in Rewordable a player builds u...
Permalink
Updated "phraser" word list
I updated that big ranked "phraser" word list (and also the even bigger ranked phrase list). It counts words (and phrases) from different sources than it did before. The Expanded Crossword Name Dat...
Permalink
Google Ngrams Download
There's a new-to-me set of Google Ngrams (big files with frequency counts for common and not-so-common English words&phrases&word-strings): Google Ngrams Download. I mention this because when...
Permalink
I have a couple of iron-on patches but no iron. 👆 Still trying to figure out on how many levels that sentence is not ironic. ...
Permalink
"Septuple" has eight letters but "Octuple" has seven. The English language fights you at every turn. ...
Permalink
A few months back, I mentioned that I'd boosted phraser's word lists by using data from Project Gutenberg's huge stash of old books… and mentioned that I wished I'd thought to omit the non-Eng...
Permalink
big ol' text corpus
Project Gutenberg is a collection of Important Works kept online. E.g., if you'd like to read Shakespeare's sonnets and don't want to schlep off to some library for a physical book (ugh), you can dow...
Permalink
The Dutch word "scheepvaart" means "maritime", not ovine flatulent whatever. ...
Permalink
Remember that list of phrases and/or that list of words in a text file handy for designing/solving word puzzles? I updated those lists again with some fresh content. While I'm here: Happy Thanksgivi...
Permalink
Huh. Neither of my senators' voicemail boxes were full this morning. Maybe I should start leaving longer messages. ...
Permalink
As previously threatened, I've updated the phrase and word lists linked from the phraser page with more modern language. E.g., podesta was the 82,350th most "common" word on the old list, but with ne...
Permalink
Remember phraser, that tool for generating puzzle-design-friendly word lists? I just updated it. I found OMDB, a big database of movie info with a public API. (Did I find it? Or did one of you tell m...
Permalink
phraser, a word list generator
When you construct word puzzles, it's good to have a nice list of words to work with. Over the last several weeks, I've been tinkering on and off to build phraser, a tool that chugs through wiki data...
Permalink
Bird Names, part of the new gig
It's an exaggeration to say that Twitter's moving from a Big-Ball-of-Mud monolithic RnR architecture to a loose confederacy of services, but after you tone down the hyperbole that's roughly what's ha...
Permalink
Book Report: Many Subtle Channels in praise of potential literature
In honor of USA's Buy Nothing Day, a report on a book that I checked out of the library: Many Subtle Channels It's a book about the OuLiPo. You've probably heard of them: they're a literary cabal in...
Permalink
Speaking of "what's this kind of puzzle called?", what is "Put together the letter-triples ION ISS NSM TRA to form a word"? It's kind of an anagram, but easier since you've got three triples instead ...
Permalink
Link: Ranking Wikipedia Pages
This puzzle nerd has ideas on how to rank Wikipedia pages for notable-ness. Similar goals to Nutrimatic, but taking advantage of more data. Some of you folks might have some good ideas on things he c...
Permalink
Crossword Compiler Noob Diary
Unsurprisingly, creating mediocre crossword puzzles is easy but creating good crossword puzzles is hard. Mind you, I don't feel pressured to create great crossword puzzles. For puzzlehunts, I only ne...
Permalink
Cyber-F-22
Sometime in the past few years, the prefix "cyber-" changed meaning. It used to mean "high-tech". But lately, it's meant "I am trying to sell some poorly-thought-out computer crap to the USA governmen...
Permalink
Michael Agger wants a word for someone who speechifies about the future. He coined "Keynotist" but I prefer TEDifice. ...
Permalink
The voice of Wikipedia. Each article written by people writing about what they care about most. The precise language of controversies tiptoed around. The earnestness. You might think you could rob...
Permalink
Google & OpenID: discovery URL
A while back, I mentioned that Google supported OpenID. There's one important detail that I had a hard time finding amidst the mountains of documentation: If the user wants to use their Google acco...
Permalink
Book Report: Alphabet Juice
This book is a sort of lexicon, except that instead of definitions there are riffs. These are some of the author's favorite words, or at least words that he wanted to write about. He likes to pron...
Permalink
Book Report: Letting Go of the Words
I'm a professional technical writer and I recommend this book about writing: Letting Go of the Words. I theoretically train engineers so that they can write clearly. This book would help those peopl...
Permalink
Link: Warren Spector, Playing Word Games
Warren Spector does not, as far as I know, play uppercase "T" The uppercase "G" Game. But he designs lowercase "g" games. He worked on some good stuff for the Paranoia pencil-and-paper RPG... uhm, ...
Permalink
Book Report: Ambient Findability
This was not the right book for me. Rather, I was not the right person to read this book. Ambient Findability is a high-level overview, a survey of the surge of information that's coming at us, and...
Permalink
Book Report: Rainbows End
It pays to increase your word power. I always thought that "hyperventilation" meant "breathing too fast", but really it means "breathing too fast and/or too deeply". I didn't know it was possible to...
Permalink
Book Report: Everything is Miscellaneous
I am scheduled for HEAD & NECK SURGERY. It says so, in all-capital letters on the appointment form. Don't worry, mom, HEAD & NECK SURGERY is a scary-sounding category of things, but really s...
Permalink
Link: Travelers Storybook
I have mentioned this before: When I was growing up, I spent a fair amount of time with Bob & Kelly Wilhelm, friends of the family. Bob was and is a storyteller. I don't just mean that he can rela...
Permalink
Link: Webster's Online Dictionary
Puzzle hunts were everywhere last weekend. Midnight Madness in Hot Springs. Some movie called BHAGAMBHAG set up a promo treasure hunt in Mumbai, sounds big-scale. I didn't do any of that. I have ...
Permalink
Puzzle Hunts are Everywhere, from Seattle to Siena
Some awesome folks in Seattle are contributing to their local Game community by setting up a web site with announcements and forums and stuff. Check it out. I fed their RSS feed into my reader so I...
Permalink
Publishing News
Tom Manshreck is in town. Tom was living in NYC, working in publishing. There's a lot of publishing around there. Tom was working on engineering textbooks, but he still cares about the literary st...
Permalink
Not Quite Letting Go of Spring
Did I mention that White Mughals mentions a doctor treating a bladder infection? And the doctor is named George Ure. Ure should totally be the root of the word "urea", though it isn't, really. Tha...
Permalink