Larry Hosken: New

wordnet, is-a

ColinTheMathmo asked folks to think of animals that were also verbs, like "bug". I thought of some and then it occurred to me: wordnet ("wn") is a computer tool that knows the meaning of many many words. It knows that "bug" is a verb. It knows that a "bug" is an (arthropod is an invertebrate is an) animal. I already knew how to use wn "by hand" to look up word definitions. I knew that it could also be used in a computer program, but I didn't know how. This seemed like a good opportunity to learn.

So I googled around, found out I should install NLTK (sudo aptitude install python3-nltk) and install its data; and then I started tinkering.

From my previous playing with wn I knew that "hypernyms" was its jargon for relations like "a bug is an arthropod". But there were still things to learn to use it programmatically. I had to learn about Synsets, the objects wn uses to keep track of senses of a word. You don't directly ask wn if "bug" is an arthropod. wn doesn't know whether you mean the creepy-crawly "bug" or the computer-problem "bug" or what-have-you. So you start by asking for a word's Synsets, and then you can query wn about the relations between those Synsets and others (e.g., Synset('animal.n.01')).
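To make the Synset idea concrete without needing WordNet's data installed, here's a toy stand-in in plain Python. It is not NLTK's API. The dictionaries `SENSES` and `HYPERNYMS`, the sense names, and the `fault`/`imperfection` chain are all made up for illustration. It shows why you ask about each sense rather than about the word: only one sense of "bug" leads up to "animal".

```python
# Toy stand-in for WordNet (NOT the real data or API): a word maps to several
# senses, and each sense has its own hypernym ("is-a") links.
SENSES = {
    "bug": ["bug/creepy-crawly", "bug/computer-problem"],
}
HYPERNYMS = {
    "bug/creepy-crawly": ["arthropod"],
    "arthropod": ["invertebrate"],
    "invertebrate": ["animal"],
    "bug/computer-problem": ["fault"],   # made-up chain for the other sense
    "fault": ["imperfection"],
}

def ancestors(sense):
    """All senses reachable from `sense` by repeatedly following hypernym links."""
    found, to_do = set(), list(HYPERNYMS.get(sense, []))
    while to_do:
        h = to_do.pop()
        if h not in found:
            found.add(h)
            to_do += HYPERNYMS.get(h, [])
    return found

def word_is_a(word, category):
    """True if ANY sense of `word` has `category` among its ancestors."""
    return any(category in ancestors(s) for s in SENSES.get(word, []))
```

Here `word_is_a("bug", "animal")` is true only because of the creepy-crawly sense; the computer-problem sense never reaches "animal".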

Once I knew about the important data types, I slapped together a program:

#!/usr/bin/env python3

from nltk.corpus import wordnet as wn

# words_500K.txt: one "score<TAB>word" line per word,
# assumed sorted from highest score to lowest.
def load_words(path="words_500K.txt"):
    words = []
    with open(path) as f:  # close the file when done
        for line in f:
            score_s, word = line.strip().split("\t")
            if len(score_s) < 4: break  # stop at the first score under four digits
            words.append(word)
    return words

ANIMAL = wn.synset('animal.n.01')  # look this up once, not on every call

def is_animal(word_s):
    to_do = []
    for sense in wn.synsets(word_s): # get creepy-crawly bug, computer-problem bug, etc...
        to_do += sense.hypernyms()
    seen = set()
    while to_do: # walk up is-a trees: bug -> arthropod -> invertebrate...
        h = to_do.pop()
        if h == ANIMAL: return True
        if h in seen: continue  # several paths can lead to the same ancestor
        seen.add(h)
        to_do += h.hypernyms()
    return False

def is_verb(word_s):
    return len(wn.synsets(word_s, wn.VERB)) > 0

def main():
    for word in load_words():
        if is_verb(word) and is_animal(word):
            print(word)

if __name__ == "__main__":
    main()

Pretty soon, a list emerges: man head does blue bear horse game mans horses dog heads queen birds fish soldiers fly bird baby stock gray dogs bay soldier jack cat grey pen bears permit mount games sole bull fox kids rays prey flies cow mate kid seal copper wolf ray eagle cats pet rail lamb grade queens pig beef swallow snake steamer cows cock mouse drum bees knot chat duck babies monkey goose rabbit worm rat pigs toy stray crow dove buffalo rats dun hawk fishes hare worms drums toys ducks layer homer hounds fowl snakes mates drill knots buck steer dam bulls butterfly bat rabbits orphan bitch stocks grades frog rails sire swan hound sow welsh whale eagles lambs turtle seals monkeys layers cricket fowls ram crows frogs permits steamers beaver lark butterflies perch pens raven kit quarry bucks sponge jacks bug hog kitten oyster parrot crane swallows crab ape bays calves blues oysters pets grub orphans bugs smelt plug foxes hack monitor stag poll hogs cocks hawks bats mounts babys soles shark cod flicker beetle spat grunt swans kite sharks fawn hares tick apes whales quail gulls falcon cuckoo kittens grays beetles crabs imagines torpedo ravens cub jade nestling larks snail stud grouse cubs parrots gull turtles nag leech badger mares stint rams cranes snails skate whiff polls quarries alligator buffaloes drone crickets drills stunt pup skates dams steers tinker skunk greys bunting clam rooks beavers shrimp ferret wolfs snipe kites homers coppers stags ruff sires carp foxs studs char bitches bulldog lug slug sows grunts clams alligators monitors pout assess reeves sponges suckling stunts torpedoes slugs grubs leeches pollard whiting falcons plugs drones kits quails strays ticks rook chats winkle perches roach mew flounder pinches cockle reeve whelp shrimps pups cuckoos skylark hacks foal fawns duns nags tinkers mews badgers whiffs skunks preys flickers whelps fishs cockles scallop basset ferrets yak nestlings remount scallops prawns crawfish snipes ruffs lugs gelding flys bream flounders spats butterflys bulldogs cocker sucklings foals jades buffalos mouses prawn gooses roaches

They ain't all good, but there are some great ones in there I wouldn't have thought of on my own.


2021-03-29T15:50:54.381632

Book Report: The Doomsday Calculation (How an Equation that Predicts the Future Is Transforming Everything We Know About Life and the Universe)

Back when I was a humble computer science university student learning how to write operating systems, we learned a simple trick.

A computer might run several programs at the same time: a web browser in one window, a text file editor in another window, several programs that don't even have windows… There are so many programs running that they can't actually run at the same time; there aren't enough CPU chips to go around. Instead, the special operating system program sets up a schedule: a fraction of a second for the web browser, a fraction of a second for the text editor, etc. To make an efficient schedule, it helps if you can predict how much longer a program needs to run. For example, that text editor program probably doesn't have much to do, assuming the user hasn't typed anything in the past ⅛ second. That program probably just needs to wake up, re-draw the blinking text cursor, and go back to sleep. On the other hand, a program that reads and sorts all the text of Wikipedia needs much more time. A naive operating system that scheduled equal CPU time to each of these programs would leave that CPU twiddling its metaphorical thumbs much of the time.

We learned a trick to predict how long a program will want to run. Consider how long the program has already been running. On average, it's about halfway done. If the program has run about a quarter second so far, it probably needs another quarter second. This trick isn't always right—if you just started running that Wikipedia-sorting program a quarter second ago, this trick gives you the wrong answer. But on average, it gives you about the right answer.
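Here's a small sketch of the halfway trick. It assumes we can stand in for "you show up at a random moment" by checking a process at evenly spaced moments during its run. The function names `estimate_remaining` and `overestimate_rate` are made up for this example. The guess turns out too long exactly as often as it turns out too short, which is one sense in which it gives about the right answer on average, even though it's wrong at almost every individual moment.

```python
def estimate_remaining(elapsed):
    """The 'halfway' trick: guess that a process has as long to go as it has run."""
    return elapsed

def overestimate_rate(n=1000, total=1.0):
    """Check a process of length `total` at n evenly spaced moments and return
    the fraction of those moments at which the guess exceeds the true time left."""
    over = 0
    for i in range(n):
        elapsed = (i + 0.5) / n * total   # time already run at this moment
        remaining = total - elapsed       # true time still to go
        if estimate_remaining(elapsed) > remaining:
            over += 1
    return over / n

print(overestimate_rate())  # prints 0.5
```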

Other folks, not just computer science students, use this trick for guesstimating mysterious durations. For example, if you want a guess about how long a musical production will continue its run on Broadway (when we get Broadway back), consider how long it's already been running. There's a name for this trick: the Copernican Principle. Well, the Copernican Principle isn't exactly about estimating durations; it's the idea that the observer of a phenomenon shouldn't think they (the observer) are too special. Copernicus was famous for not-assuming that the earth was the center of the universe. It's a similar idea to think "We're probably not at the very start of this thing; we're probably not at the very end; on average, we're probably about in the middle." This trick isn't great. If you have any information about how long something should last, you should use that information, not this trick. E.g., if the sun rose an hour ago, this trick guesstimates that the sun will set in about an hour; but your knowledge of how days work yields the much better estimate of 11 hours.

Some folks used this trick to guesstimate how long the human race will stick around: several thousand years of civilization followed by a couple of hundred thousand years of species survival. This was a terrible mistake. I don't say it was a terrible mistake because I disagree with its estimate. I say it was a terrible mistake because many philosophers and pundits have opinions about the imminent or not-so-imminent end of [civilization|human life|whatever], and all of these people and their opinions crawled out of the woodwork to yell at each other. This book, The Doomsday Calculation, tells the story of their arguments.

For example, the human population has surged in the past few centuries. It's all very well for someone to have said "When you're trying to guesstimate how long human civilization will last, on average you can guess you're about in the middle," but chances are that they said this in the 1600s or later, because most humans lived in the 1600s or later. If someone says "Maybe we'll be around much longer than 'the trick' suggests—like someone an hour after sunrise estimating when sunset is coming," you should raise an eyebrow: odds are, you're a modern Indian, not an early Sumerian. (Some folks, including this book's author, think this implies that doomsday is coming soon—perhaps in 700 years or so. I think the trick suggests that in 800 years we won't be doomed—but we can expect, on average, to have declined to an existence like that humanity had 800 years ago: not great.)

The book gets into some interesting territory. Given that humans are intelligent but haven't been contacted by intelligent aliens, what does "the trick" suggest? If the rules of Physics were just a little different, the universe couldn't support life—and "the trick" uses that fact to suggest that ours might be a universe within a multiverse just like the comic book movies told us (but maybe with fewer Spider-Men).

Though the book is interesting, it does point out that it's silly to use "the trick" to predict how long Homo sapiens will be around. Remember: if you know how long a day lasts, you shouldn't use "the trick" to predict sunset time; instead, use your knowledge, which is more accurate. Fossil-studying scientists know some things about how long species last before extinction. We should use that knowledge instead of just shrugging and guessing "We'll be around about as long as we've been around, on average." (Fossil knowledge doesn't make the doomsday-timeline-arguments go away; we're in a mass extinction event so you need to follow up that "species tend to last 3 million years" with "…uhm, unless there's some mass extinction event going on, that could pull in the timeline, uh-oh.") But the book doesn't look too closely at this—there are other books for that. This book is about high-falutin' applications of the Copernican Principle.


2021-03-19T22:34:33.242638

I enjoyed this crossword puzzle's gimmick: https://www.theatlantic.com/free-daily-crossword-puzzle/?id=atlantic_20210221&set=atlantic&puzzleType=crossword


2021-02-25T14:38:58.227178

I had some thoughts about automatically-generated mazes rattling around in the back of my head and figured out a way to apply an algorithm from that Mazes for Programmers book to a problem I'd noticed a while back.

The problem: I wanted to create a maze with thick walls; and I wanted to choose the maze's entrance and exit squares ahead of time. Alas, the naive algorithms I had were more likely than not to block off the entrance and/or exit with walls. But I'd had Kruskal's algorithm (as applied to mazes) on my mind, and I saw how to apply it here: choose an entrance and an exit, then grow a maze-tree from each; when the two maze-trees collide, treat them as a single maze-tree and continue growing to fill in the rest.

(This problem doesn't show up with normal "skinny-wall" mazes; you can choose any square as the entrance and any as the exit/goal/whatever. You can get from any point to any other point in a normal maze; you can always be assured that there's one way to get from start to finish.)
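Here's a sketch of the two-seed idea in Python. It's a reconstruction, not the author's code, and it assumes a common thick-wall layout: rooms at odd coordinates of a (2h+1) by (2w+1) grid, with wall cells between them. It grows one random tree from the entrance and one from the exit, each step picking a random room that can still expand (closer to the "growing tree" family than to textbook Kruskal's). A tiny union-find records when the trees collide: the wall between them is knocked out once, and from then on they count as a single tree. `two_seed_maze` and its parameters are hypothetical names.

```python
import random

def two_seed_maze(h, w, entrance, exit_, seed=None):
    """Thick-wall maze as a list of strings ('#' = wall, ' ' = open).

    Rooms sit at odd coordinates of a (2h+1) x (2w+1) grid; `entrance` and
    `exit_` are room coordinates (row, col), 0 <= row < h, 0 <= col < w.
    """
    if entrance == exit_:
        raise ValueError("entrance and exit must be different rooms")
    rng = random.Random(seed)
    grid = [["#"] * (2 * w + 1) for _ in range(2 * h + 1)]
    parent = [0, 1]  # union-find over the two tree ids
    owner = {}       # room -> id of the tree that claimed it
    frontier = []    # claimed rooms that may still be able to expand

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def neighbors(room):
        r, c = room
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= r + dr < h and 0 <= c + dc < w:
                yield (r + dr, c + dc)

    # Plant one tree at the entrance and one at the exit.
    for tree_id, room in enumerate((entrance, exit_)):
        owner[room] = tree_id
        frontier.append(room)
        grid[2 * room[0] + 1][2 * room[1] + 1] = " "

    while frontier:
        i = rng.randrange(len(frontier))
        room = frontier[i]
        tree = find(owner[room])
        # Grow into unclaimed rooms, or into the *other* tree (a collision).
        options = [n for n in neighbors(room)
                   if n not in owner or find(owner[n]) != tree]
        if not options:
            frontier[i] = frontier[-1]  # nothing left to do here: drop this room
            frontier.pop()
            continue
        nxt = rng.choice(options)
        # Open the wall cell halfway between the two rooms.
        grid[room[0] + nxt[0] + 1][room[1] + nxt[1] + 1] = " "
        if nxt in owner:
            parent[find(owner[nxt])] = tree  # collision: the two trees become one
        else:
            owner[nxt] = tree
            grid[2 * nxt[0] + 1][2 * nxt[1] + 1] = " "
            frontier.append(nxt)
    return ["".join(row) for row in grid]
```

For example, `print("\n".join(two_seed_maze(5, 7, (0, 0), (4, 6), seed=1)))` draws one maze. Every room joins exactly one tree and the two trees are joined through exactly one opened wall, so there's a single path between any two rooms, including the entrance and the exit.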


2021-02-14T20:49:21.011304

I've been digging these interviews with early employees at tech startups. (Note: I didn't say "with founders"—instead, these are the first hires, wacky risk-takers who maybe weren't that convinced of the vision but were along for an interesting ride.)


2021-02-13T23:48:11.570643

The US Postal Service announced new stamps for 2021. One title especially caught my eye: "Mystery Message." Wow, a stamp with a hidden message. Sounds like something right up my alley.

[Mystery Message stamp: stylized letters in a variety of colors]

According to the USPS explainer page, this stamp's hidden message is MORE THAN MEETS THE EYE!, as spelled out in a designer-ish semi-readable font. But as any serious puzzler will tell you, that's kind of a shallow mystery. Surely there must be another puzzle in there, an extra-mysterious message, as it were. Some doubters might claim that the USPS wouldn't put a multi-layered hidden message on a stamp. Hmph, those doubters probably still think The Crying of Lot 49 was a work of fiction. Thus, we should examine this design more closely.

First, consider possible Morse code. (No great reason to consider Morse code first. But as my puzzlehunt teammates will tell you, I am quite fond of Morse.) The letters in this stamp's "typeface" are decorated with extra dots and lines. E.g., the M has an extra – on top; that's a Morse T. The O is surrounded by ⸬, four dots; four dots is Morse H. TH sure could be the start of a message. Continuing in this manner yields THERM IETRRMIMMRRNRM, which looked kind of promising at first before it devolved into a sort of sleepy mumble.
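For anyone who wants to try other readings, here's a small decoder. The table is standard International Morse for letters. Transcribing each letterform's decorations as a dot/dash string ("-" for the M's bar, "...." for the O's four dots) is an assumption about how you'd copy the stamp by hand, and `decode` is a made-up helper name.

```python
# International Morse code for the 26 letters (digits and punctuation omitted).
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
}

def decode(marks):
    """Decode one dot/dash group per letterform; unknown groups become '?'."""
    return "".join(MORSE.get(m, "?") for m in marks)
```

With the M's bar and the O's four dots, `decode(["-", "...."])` returns "TH".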

Next, consider the use of color. The letterforms use nine colors for the message. Any topologist could tell you that four colors would have sufficed. Consider the most obvious indexing scheme: There are 2 indigo letters (M on the first row, second T on the third row). This suggests that our extra-mysterious message has an N (the №2 letter in iNdigo). Continuing in this manner with the nine colors yields nine letters:

N  2 iNdigo
O  2 fOrest green (or mOss?)
I  2 pInk
U  3 blUe
I  2 vIolet
R  2 oRange
Y  2 cYan
D  3 reD
R  2 gReen
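The indexing scheme is mechanical enough to sketch in a few lines. `COLOR_COUNTS` just restates the table above (using "forest green" rather than "moss"), and `index_letters` is a hypothetical helper: for each color, it takes the letter at position N of the color's name, where N is how many letters on the stamp use that color.

```python
# (color name, number of letters drawn in that color), restating the table.
COLOR_COUNTS = [
    ("indigo", 2), ("forest green", 2), ("pink", 2), ("blue", 3), ("violet", 2),
    ("orange", 2), ("cyan", 2), ("red", 3), ("green", 2),
]

def index_letters(color_counts):
    """Take the Nth letter (1-based) of each color name, N = its letter count."""
    return "".join(name[count - 1].upper() for name, count in color_counts)
```

`index_letters(COLOR_COUNTS)` returns "NOIUIRYDR", in table order.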

NOIUIRYDR. How to order these letters? If this stamp used just ROYGBIV colors, we'd use ROYGBIV order; but the presence of forest green and pink colors suggests we should try something else. But I'm not sure what else, so we can try swizzling the letters around and see what we get. A pessimist might not think much of this idea: Surely that's too many vowels to yield a sensical phrase. But but soon an anagram leaps out:

RUIN IDRYO

To some, that might sound like nonsense. But San Francisco-area puzzle nerds (and Cambridge-area puzzle nerds, one supposes) will of course recognize the last name of Ryan and Christopher IDRYO, designers of The Hunt for Black Bart’s Hidden Hoard puzzle hunt (in which I played with team Dern Tootin').

I think that's the best solution I'll be able to come up with; but if you find another message, don't be shy about it.


2021-02-13T01:50:55.666615

New Glasses Selfie

On the day my new glasses were ready, I found my spare pair of glasses. It sure would have been nice to find those a month and a half ago, yep. Anyhow, my new glasses have metal under-rims instead of plastic wire; maybe they won't break so easily.


2021-02-11T14:29:27.709067

Book Report: Mazes for Programmers

This book is about randomly generating mazes by writing computer programs. Before reading this book, I'd tried randomly generating some mazes, but those mazes hadn't pleased me: too many little nubbly dead ends. This book showed an algorithm (well, a few algorithms) that didn't have so many dead ends. (The secret, obvious when you hear it but I swear not so obvious when you're just staring at a maze and trying to figure out what you don't like about it: have fewer branches in the maze.)
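One way to put a number on "too many little nubbly dead ends" is to count them. This sketch assumes a thick-wall maze stored as a list of strings with odd width and height, rooms at odd coordinates, and '#' for wall; `count_dead_ends` is a made-up name. A dead end is a room with exactly one opening. In a maze without loops, every extra branch point adds leaves to the tree, so fewer branches means fewer dead ends.

```python
def count_dead_ends(grid):
    """Count rooms (odd row, odd column) with exactly one open neighbor.

    `grid` is a list of equal-length strings with odd dimensions and a solid
    '#' border, as in a thick-wall maze.
    """
    dead_ends = 0
    for r in range(1, len(grid), 2):
        for c in range(1, len(grid[0]), 2):
            if grid[r][c] == "#":
                continue  # not an open room
            openings = sum(grid[r + dr][c + dc] != "#"
                           for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
            if openings == 1:
                dead_ends += 1
    return dead_ends
```

A maze that is a single winding path has exactly two dead ends, its two ends; every branch adds more.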

There's plenty in this book: several maze-generation algorithms, tweaks for hex grids, tweaks for 3-D, tweaks for cylinders, tweaks for… Uhm, yeah, there's plenty in this book. I didn't use much of it. I was just looking for this one specific thing! But it's nice to know that if I ever want to put a maze on a Möbius strip, this book has some ideas about that.


2021-02-01T22:13:46.114386

Book Report: Humble Pi

This book talks about math errors and the consequences that follow. There are errors of engineering, software errors (dear to my heart), and plain old computation errors. Some of these get pretty interesting. E.g., until I read this book, I thought the designers of the Millennium Bridge must be pretty darned incompetent. When pedestrians walked across the newly-constructed Millennium Bridge, it started to shake itself apart. Engineers have known about the "breakstep bridge" problem for a long time; there was no excuse for this to happen to a modern bridge. Except except, as I learned from this book, the Millennium Bridge was shaking itself apart in a new, exciting way. It wasn't bouncing up and down as people stomped on it. Rather, it waggled from side to side as walkers shifted.

I found out about medical calculators, used for lives-are-at-stake calculations like drug dosages. If I haul out my phone calculator and type in 2 · 3 · 4, it ignores the second dot and shows 2.34. But it's strange that I hit the · key twice—one of those was probably an accident. Maybe I meant to enter 2.34, maybe I meant 23.4. These fancy-pants medical calculators are more careful: they show a warning about the too-many-dots problem. (The book also discusses a problem that can arise from using too many fancy-pants complex error checkers: if the system gets too complex, that's just more opportunities for errors as layers of a system interfere with each other.)
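A toy illustration of the difference, in Python. Neither function is how any particular phone or medical calculator is actually implemented; `lenient_parse` and `strict_parse` are hypothetical names, and "." stands in for the · key.

```python
def lenient_parse(keys):
    """Phone-calculator style: keep the first '.', silently drop any later ones."""
    out, seen_dot = [], False
    for k in keys:
        if k == ".":
            if seen_dot:
                continue  # extra decimal point ignored, no warning
            seen_dot = True
        out.append(k)
    return float("".join(out))

def strict_parse(keys):
    """Refuse to guess: more than one decimal point is reported as an error."""
    if keys.count(".") > 1:
        raise ValueError(f"{keys!r} has more than one decimal point")
    return float(keys)
```

`lenient_parse("2.3.4")` quietly returns 2.34, while `strict_parse("2.3.4")` raises `ValueError` so the person at the keypad has to decide what they meant.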

I found out that the Spurious Correlations website is pretty funny and reminds us that plenty of unrelated things correlate with crime rates, cancer rates, whatever rates; and you should take any pundit's discovery of the true cause of whatever with a big grain of salt. I found out about a project Tommy Flowers worked on after his WWII codebreaking work: ERNIE, a random number generator that generated entropy from neon tubes. I learned… uhm, there's a lot of cool stuff in this book. Recommended; check it out.
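A quick way to see why unrelated things correlate: any two quantities that both trend upward over time will have a high correlation coefficient, whether or not one has anything to do with the other. The two series below are made up, and `pearson` is a plain implementation of the Pearson correlation coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

years = range(10)
series_a = [3 * t + 5 for t in years]  # made-up quantity growing steadily
series_b = [t * t for t in years]      # unrelated made-up quantity growing faster
print(round(pearson(series_a, series_b), 2))  # prints 0.96
```

Neither series causes the other; they just both go up with time.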


2021-01-25T18:15:30.819359

Occasionally, my web alert for "Hosken" turns up a winner.

Genital shape key to male flies' sexual success

"Male genitals generally, and in Drosophila specifically, evolve very quickly, so we were really surprised to find this weak selection," said Professor David Hosken, of the University of Exeter.

Science Daily

Yes, I have the sense of humor you'd expect of a 12-year-old.


2021-01-18T01:03:48.484308
