Larry Hosken: New: Tag: words

I have updated the Phraser word and phrase lists. Those of you who find these lists handy for solving/designing word puzzles, rejoice!

This update incorporates an epiphany! (It also has updated content from Wikipedia, etc, but you already expected that.) tl;dr I fixed many many mistakes, and I like the quality improvement. If you want the details, read on…

You may recall a quandary: Crossword constructors have hand-crafted lists of cool phrases, idioms, and such. I can download a few of these lists, and Phraser can see that SHORTANDSTOUT is a nifty phrase. Crossword constructors don't care about spaces in phrases; but I'd like to know where the spaces go.

So Phraser first tries to figure out a list of phrases that have appeared in text. It reads lots of text sources: Wikipedia, text files from Project Gutenberg, etc etc. But it doesn't realize that "short and stout" is a more-interesting phrase than "copyright 1995", which appears more often.

So it then goes through the crossword lists, notices that crossword constructors think SHORTANDSTOUT is cool, notices that "short and stout" is SHORTANDSTOUT with spaces, and boosts the score of "short and stout".
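Here's a minimal sketch of that matching step. It is not Phraser's actual code; the names `squash` and `boost_scores`, the multiply-by-`boost` scoring, and the sample numbers are all made up for illustration.

```python
def squash(phrase):
    """Keep only the letters and uppercase them, crossword-style."""
    return "".join(ch for ch in phrase if ch.isalpha()).upper()


def boost_scores(phrase_scores, crossword_entries, boost=10.0):
    """Boost phrases whose squashed form appears in a crossword list.

    phrase_scores: dict of spaced phrase -> score (hypothetical format).
    crossword_entries: iterable of spaceless uppercase entries.
    Returns (boosted scores, entries that matched no known phrase).
    """
    # Index the known phrases by their spaceless form.
    by_squashed = {}
    for phrase in phrase_scores:
        by_squashed.setdefault(squash(phrase), []).append(phrase)

    boosted = dict(phrase_scores)
    unmatched = []
    for entry in crossword_entries:
        matches = by_squashed.get(entry)
        if not matches:
            # The "shrug and move on" case.
            unmatched.append(entry)
            continue
        for phrase in matches:
            boosted[phrase] *= boost
    return boosted, unmatched


scores = {"short and stout": 2.0, "copyright 1995": 50.0}
boosted, unmatched = boost_scores(scores, ["SHORTANDSTOUT", "BLUNGE"])
```

In this toy run, "short and stout" gets its score multiplied, while BLUNGE lands in the unmatched pile because no known phrase squashes down to it.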

But what if Phraser never figured out that "short and stout" is a thing? Maybe it figured out the phrase "short and" and the word "stout" are things, but never sees that teapot song and thus never realizes that "short and stout" goes together. In that case, when it sees SHORTANDSTOUT in a crossword list, it just kinda shrugs, thinks "I don't know what to do with that" and moves on. What a waste.

One time, I wrote a program that looked over my crossword lists for SHORTANDSTOUTs for which Phraser couldn't figure out where to put spaces. For each, it would look for a pair of phrases that could be combined. So if Phraser had spotted "short and" and the word "stout", this other program would spot that those phrases could be combined to make "short and stout". I kept the output from that program, and fed it to Phraser on subsequent runs. Thus, it would know "short and stout" was a thing the next time it ran; and when it saw SHORTANDSTOUT in a crossword list, it would know to boost the score of "short and stout". My phrase-combiner program didn't get it right every time. Like, if a crossword constructor liked the very-obscure word BLUNGE, my phrase-combiner program would guess that must be "B LUNGE". But it was right most of the time, and the results were good enough that I kept using it.
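The phrase-combiner idea can be sketched roughly like this. It's a guess at the general shape, not the real program; the function names and the tiny list of known phrases are hypothetical.

```python
def squash(phrase):
    """Keep only the letters and uppercase them, crossword-style."""
    return "".join(ch for ch in phrase if ch.isalpha()).upper()


def combine(entry, known_phrases):
    """Guess spaced versions of a spaceless entry.

    Tries every split point and keeps the splits where both halves
    match the squashed form of some already-known word or phrase.
    """
    by_squashed = {squash(p): p for p in known_phrases}
    guesses = []
    for i in range(1, len(entry)):
        left = by_squashed.get(entry[:i])
        right = by_squashed.get(entry[i:])
        if left and right:
            guesses.append(f"{left} {right}")
    return guesses


known = ["short and", "stout", "b", "lunge"]
good_guess = combine("SHORTANDSTOUT", known)
bad_guess = combine("BLUNGE", known)
```

With this pretend list of known phrases, SHORTANDSTOUT comes back as "short and stout", and BLUNGE comes back as the wrong-but-plausible "b lunge".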

A few days ago, I was looking at one of the wrong phrases that my phrase-combiner program had come up with:
diuretic ally
That's not a thing. "Diuretically" is a word, sort of. You can look at it and figure out it's an adverb to describe something acting in the manner of a diuretic, I guess. It's reeeeeeeally rare, though. If you look up diuretical and diuretically on the Google ngram viewer, you can see that they show up not-quite-never in books. And diuretically appears so very rarely in texts that Phraser figured "aw, that's probably just a typo" and forgot about it. But crossword lists agree that "DIURETICALLY" is a kinda-important thing. So my phrase-combiner, trying its best, had come up with diuretic ally. And a couple of days ago, I was staring at that and wondering: OK, why do all these crossword lists think that "diuretically" is a good thing to put into a crossword, given that nobody uses this word in real life? That's when I had the epiphany.

The epiphany: DIURETICALLY is a valid Scrabble word. I looked through my crossword-word-lists and saw a fair number of words-only-Scrabble-players-use. As near as I can tell, crossword constructors are pretty forgiving about Scrabble words that nobody uses but are figure-out-able. (They're not so forgiving about obscure scientific terms; there's no obvious way for a solver to figure out the name of a rare sheep disease by applying grammar-suffixes to a common word, I guess.)

So I hauled out a SOWPODS list (list of Scrabble words), looked through the list of best-guess-phrases from my phrase-combiner tool, and thus found many more of my mistakes like diuretic ally. And I purged them. I'm now much more confident in the surviving best-guess-phrases; so I increased their "boost" so that they're more likely to appear in the 5-million-phrases file.
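A rough sketch of that purge, assuming the best-guess phrases are plain lowercase strings with spaces and the Scrabble list is one word per entry. The function name and the three-word stand-in for SOWPODS are invented for the example.

```python
def purge_scrabble_words(guesses, scrabble_words):
    """Split combiner guesses into keepers and likely mistakes.

    A guess counts as a likely mistake when removing its spaces yields
    a single valid Scrabble word, e.g. "diuretic ally" -> DIURETICALLY.
    """
    scrabble = {word.upper() for word in scrabble_words}
    kept, purged = [], []
    for guess in guesses:
        squashed = guess.replace(" ", "").upper()
        if squashed in scrabble:
            purged.append(guess)
        else:
            kept.append(guess)
    return kept, purged


# Tiny stand-in for a real SOWPODS file.
tiny_word_list = ["DIURETICALLY", "BLUNGE", "STOUT"]
kept, purged = purge_scrabble_words(
    ["short and stout", "diuretic ally", "b lunge"], tiny_word_list
)
```

Here "short and stout" survives because SHORTANDSTOUT is not a single Scrabble word, while "diuretic ally" and "b lunge" get purged.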

Permalink

Book Report: Shift Happens, Vol. №2 It's the second volume in a set of books about the history of keyboards, text entry, the user experience of working with text on various devices. This volume got into more modern history. Sometimes ...

Permalink

Book Report: Shift Happens, Vol. №1 It's the first volume in a set of books about the history of keyboards, text entry, the user experience of working with text on various then-newfangled devices. I learned a lot, which might kind of s...

Permalink

In the constructor notes for today's Puzzmo crossword, Zhouqin Burnikel says her original gimmick idea (not used) was people whose names had a fruit-word and a season-word. But she could only find on...

Permalink

Spoiler Warning: This post spoils a twist in "Not Your Typical Reincarnation Story." I read a review of the comic "Not Your Typical Reincarnation Story." The comic falls in the isekai genre: the pro...

Permalink

The Hearst Newspapers News-sites, no doubt jealous of the NYT's puzzle section, have launched their own syndicated puzzle page, Puzzmo. Each day there's a cool mini-crossword from the AVCX folks and...

Permalink

I made some more word ladder memory drill web pages; and tweaked the computer program I use to make them to be not so San Francisco street-specific. Several days ago, I made a San Francisco street n...

Permalink

Animals drawn using the letterforms of their words. Sidewalk chalk art in San Francisco at Judah & 20th, sadly faded by the time I saw it. ...

Permalink

This morning, I spotted a van from local plumbing company Chosen Rooter & Plumbing; painted on the side of their van was their logo: They tease us by narrowly avoiding a naughty word in the...

Permalink

I updated the lists of "popular" phrases and words over on the phraser page. These new lists have fresh data from Wikipedia and some other wikis. Perhaps making the biggest difference between this up...

Permalink

April is National Poetry Month. Today is April Fool's Day. I had an idea for something fun to do today, but ended up getting pranked by the English language instead. Since I recently figured out ho...

Permalink

I'm reading press releases about the Beagle Brigade Act, which would set up a center to train beagles to detect prohibited agricultural items in international mail and the baggage of international tr...

Permalink

rephrased Phraser word+phrase lists I updated the scored word and phrase lists over at the phraser page, using data from recent copies of Wikipedia and other wikis. Soon after I updated them, I saw that my over-enthusiastic tool tha...

Permalink

Surfwords is an intense word game. I'm enjoying it so far… in short doses, because it's intense. ...

Permalink

phraser improvements Phraser, the tool for generating word+phrase lists useful for solving+designing puzzles, is now smarter when reading crossword constructor dictionaries. Thus, hundreds of thousands of words+phrases g...

Permalink

I've had a good time playing the word puzzle game Cell Tower at https://www.andrewt.net/puzzles/cell-tower/ ...

Permalink

Thank you Google Books for clearing up the burning questions on common English usage, e.g. is there a space in "backasswards"? (Answer: sometimes, but mostly no.) I usually say "bass ackwards" b...

Permalink

I updated the big ol' list of words and the big ol' list of phrases on the Phraser page. A couple of months back, I noticed that The Collaborative Word List Project was now free. I've used the C.W....

Permalink

I updated that Bewordled game, the one where you swap tiles to make words kinda like Bejewelled but with words. Now it looks prettier with firecracker emojis and clouds. After I updated it, it occurr...

Permalink

The Collaborative Word List Project is a darned useful resource for word puzzle constructors and now it's free.* This is a list of phrases and hand-tuned scores. Here are a few lines from the file: ...

Permalink

Daily 5-dle #0007 11 : 5&8&6&11&10 polydle.github.io/?classic/daily/5 ⬜⬜⬜⬜🟨 ⬜🟩⬜🟨⬜ ⬜⬜⬜⬜🟩 ⬜🟩🟩🟩🟩 🟩🟩🟩🟩🟩 ⬜🟨⬜⬜🟨 ⬜⬜⬜🟨⬜ ⬜🟨⬜🟨⬜ ⬜⬜⬜🟨🟨 ⬜⬜⬜🟨🟨 ⬜⬜🟨⬜🟨 ⬜🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟨🟨⬜🟨🟩 ⬜⬜⬜⬜⬜ 🟩⬜⬜⬜⬜ 🟨⬜⬜🟨⬜ ⬜⬜⬜🟨⬜ 🟩🟩🟩🟩🟩 🟨⬜⬜⬜⬜ 🟩⬜⬜⬜⬜...

Permalink

Okay, now RAISE is my new Wordle starter word. As before, I am not the first to figure this out. Last night, I was measuring a starting word's quality based on how many green and yellow squares it yi...

Permalink

Update: This blog post, which superseded another blog post, has since then itself been superseded. Try to keep up. Also, my "only root words" explanation wasn't quite right. Apparently, non-root wo...

Permalink

UPDATE: This post has been superseded. I've been playing Wordle, the online game that's like a cross between Mastermind and guess-the-word. It occurred to me that the ideal "starting word" would hav...

Permalink

I got wind of a new-ish public word list for crossword constructors, the spread the word(list). So I grabbed a copy and tossed it into the big pile of data that feeds the "Phraser" phrase and word li...

Permalink

Further Bewordled I read Allison Parrish's article "Rewordable versus the alphabet fetish," in which she discusses the design of the card game Rewordable. Like Scrabble and Bananagrams, in Rewordable a player builds u...

Permalink

Updated "phraser" word list I updated that big ranked "phraser" word list (and also the even bigger ranked phrase list). It counts words (and phrases) from different sources than it did before. The Expanded Crossword Name Dat...

Permalink

Google Ngrams Download There's a new-to-me set of Google Ngrams (big files with frequency counts for common and not-so-common English words&phrases&word-strings): Google Ngrams Download. I mention this because when...

Permalink

I have a couple of iron-on patches but no iron. 👆 Still trying to figure out on how many levels that sentence is not ironic. ...

Permalink

"Septuple" has eight letters but "Octuple" has seven. The English language fights you at every turn. ...

Permalink

A few months back, I mentioned that I'd boosted phraser's word lists by using data from Project Gutenberg's huge stash of old books… and mentioned that I wished I'd thought to omit the non-Eng...

Permalink

big ol' text corpus Project Gutenberg is a collection of Important Works kept online. E.g., if you'd like to read Shakespeare's sonnets and don't want to schlep off to some library for a physical book (ugh), you can dow...

Permalink

The Dutch word "scheepvaart" means "maritime", not ovine flatulent whatever. ...

Permalink

Remember that list of phrases and/or that list of words in a text file handy for designing/solving word puzzles? I updated those lists again with some fresh content. While I'm here: Happy Thanksgivi...

Permalink

Huh. Neither of my senators' voicemail boxes were full this morning. Maybe I should start leaving longer messages. ...

Permalink

As previously threatened, I've updated the phrase and word lists linked from the phraser page with more modern language. E.g., podesta was the 82,350th most "common" word on the old list, but with ne...

Permalink

Remember phraser, that tool for generating puzzle-design-friendly word lists? I just updated it. I found OMDB, a big database of movie info with a public API. (Did I find it? Or did one of you tell m...

Permalink

phraser, a word list generator When you construct word puzzles, it's good to have a nice list of words to work with. Over the last several weeks, I've been tinkering on and off to build phraser, a tool that chugs through wiki data...

Permalink

Bird Names, part of the new gig It's an exaggeration to say that Twitter's moving from a Big-Ball-of-Mud monolithic RnR architecture to a loose confederacy of services, but after you tone down the hyperbole that's roughly what's ha...

Permalink

Book Report: Many Subtle Channels in praise of potential literature In honor of USA's Buy Nothing Day, a report on a book that I checked out of the library: Many Subtle Channels It's a book about the OuLiPo. You've probably heard of them: they're a literary cabal in...

Permalink

Speaking of "what's this kind of puzzle called?", what is "Put together the letter-triples ION ISS NSM TRA to form a word"? It's kind of an anagram, but easier since you've got three triples instead ...

Permalink

Link: Ranking Wikipedia Pages This puzzle nerd has ideas on how to rank Wikipedia pages for notable-ness. Similar goals to Nutrimatic, but taking advantage of more data. Some of you folks might have some good ideas on things he c...

Permalink

Crossword Compiler Noob Diary Unsurprisingly, creating mediocre crossword puzzles is easy but creating good crossword puzzles is hard. Mind you, I don't feel pressured to create great crossword puzzles. For puzzlehunts, I only ne...

Permalink

Cyber-F-22 Sometime in the past few years, the prefix "cyber-" changed meaning. It used to mean "high-tech". But lately, it's meant "I am trying to sell some poorly-thought-out computer crap to the USA governmen...

Permalink

Michael Agger wants a word for someone who speechifies about the future. He coined "Keynotist" but I prefer TEDifice. ...

Permalink

The voice of Wikipedia. Each article written by people writing about what they care about most. The precise language of controversies tiptoed around. The earnestness. You might think you could rob...

Permalink

Google & OpenID: discovery URL A while back, I mentioned that Google supported OpenID. There's one important detail that I had a hard time finding amidst the mountains of documentation: If the user wants to use their Google acco...

Permalink

Book Report: Alphabet Juice This book is a sort of lexicon, except that instead of definitions there are riffs. These are some of the author's favorite words, or at least words that he wanted to write about. He likes to pron...

Permalink

Book Report: Letting Go of the Words I'm a professional technical writer and I recommend this book about writing: Letting Go of the Words. I theoretically train engineers so that they can write clearly. This book would help those peopl...

Permalink

Link: Warren Spector, Playing Word Games Warren Spector does not, as far as I know, play uppercase "T" The uppercase "G" Game. But he designs lowercase "g" games. He worked on some good stuff for the Paranoia pencil-and-paper RPG... uhm, ...

Permalink

Book Report: Ambient Findability This was not the right book for me. Rather, I was not the right person to read this book. Ambient Findability is a high-level overview, a survey of the surge of information that's coming at us, and...

Permalink

Book Report: Rainbows End It pays to increase your word power. I always thought that "hyperventilation" meant "breathing too fast", but really it means "breathing too fast and/or too deeply". I didn't know it was possible to...

Permalink

Book Report: Everything is Miscellaneous I am scheduled for HEAD & NECK SURGERY. It says so, in all-capital letters on the appointment form. Don't worry, mom, HEAD & NECK SURGERY is a scary-sounding category of things, but really s...

Permalink

Link: Travelers Storybook I have mentioned this before: When I was growing up, I spent a fair amount of time with Bob & Kelly Wilhelm, friends of the family. Bob was and is a storyteller. I don't just mean that he can rela...

Permalink

Link: Webster's Online Dictionary Puzzle hunts were everywhere last weekend. Midnight Madness in Hot Springs. Some movie called BHAGAMBHAG set up a promo treasure hunt in Mumbai, sounds big-scale. I didn't do any of that. I have ...

Permalink

Puzzle Hunts are Everywhere, from Seattle to Siena Some awesome folks in Seattle are contributing to their local Game community by setting up a web site with announcements and forums and stuff. Check it out. I fed their RSS feed into my reader so I...

Permalink

Publishing News Tom Manshreck is in town. Tom was living in NYC, working in publishing. There's a lot of publishing around there. Tom was working on engineering textbooks, but he still cares about the literary st...

Permalink

Not Quite Letting Go of Spring Did I mention that White Mughals mentions a doctor treating a bladder infection? And the doctor is named George Ure. Ure should totally be the root of the word "urea", though it isn't, really. Tha...

Permalink
