: New: wordnet, is-a

ColinTheMathmo asked folks to think of animals that were also verbs, like "bug". I thought of some, and then it occurred to me: WordNet ("wn") is a computer tool that knows the meanings of many, many words. It knows that "bug" is a verb. It knows that a "bug" is an (arthropod is an invertebrate is an) animal. I already knew how to use wn "by hand" to look up word definitions. I knew that it could also be used from a computer program, but I didn't know how. This seemed like a good opportunity to learn.

So I googled around, found out I should install NLTK (sudo aptitude install python3-nltk) along with its data, and then started tinkering.

From my previous playing with wn I knew that "hypernyms" was its jargon for relations like "a bug is an arthropod". But there were still things to learn before I could use it programmatically. I had to learn about Synsets, the objects wn uses to keep track of the senses of a word. You don't directly ask wn if "bug" is an arthropod: wn doesn't know whether you mean the creepy-crawly "bug" or the computer-problem "bug" or what-have-you. So you start by asking for a word's Synsets, and then you can query wn about the relations between those Synsets and others (e.g., Synset('animal.n.01')).
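To make the word -> senses -> hypernyms idea concrete without needing wn installed, here is a toy sketch in plain Python. Everything in it is made up for illustration: the SENSES and HYPERNYMS dicts, the sense names, and the is_a helper are hypothetical stand-ins, not WordNet data or NLTK's API. It only shows the shape of the lookup: get every sense of the word first, then walk up the is-a links from each one.

```python
# Toy stand-in for WordNet. Every name and relation here is invented
# for illustration; the real wn data is far larger and named differently.
SENSES = {
    "bug": ["bug.insect", "bug.software_defect", "bug.annoy"],
    "pen": ["pen.writing_tool", "pen.female_swan"],
}

# Each sense (or concept) maps to its hypernyms, i.e. its "is-a" parents.
HYPERNYMS = {
    "bug.insect": ["arthropod"],
    "arthropod": ["invertebrate"],
    "invertebrate": ["animal"],
    "bug.software_defect": ["flaw"],
    "pen.writing_tool": ["writing_implement"],
    "pen.female_swan": ["swan"],
    "swan": ["bird"],
    "bird": ["animal"],
}

def is_a(word, target):
    """True if any sense of `word` has `target` somewhere up its is-a chain."""
    to_do = list(SENSES.get(word, []))  # start from every sense of the word
    seen = set()                        # avoid re-walking shared ancestors
    while to_do:
        sense = to_do.pop()
        if sense == target:
            return True
        if sense in seen:
            continue
        seen.add(sense)
        to_do += HYPERNYMS.get(sense, [])
    return False

print(is_a("bug", "animal"))
print(is_a("pen", "flaw"))
```

The point of the toy: the question "is a bug an animal?" only has an answer once you pick a sense, so the walk has to start from all of them and succeed if any one gets there.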

Once I knew about the important data types, I slapped together a program:

#!/usr/bin/env python3

from nltk.corpus import wordnet as wn

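def load_words():
    # words_500K.txt holds one "score<TAB>word" pair per line
    words = []
    with open("words_500K.txt") as f:
        for line in f:
            score_s, word = line.strip().split("\t")
            # stop at the first score under 4 digits (assumes highest scores come first)
            if len(score_s) < 4: break
            words.append(word)
    return words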

def is_animal(word_s):
    animal = wn.synset('animal.n.01')
    senses = wn.synsets(word_s)
    to_do = []
    for sense in senses: # get creepy-crawly bug, computer-problem bug, etc...
        to_do += sense.hypernyms()
    while len(to_do) > 0: # walk up is-a trees: bug -> arthropod -> invertebrate...
        h = to_do.pop()
        if h == animal: return True
        to_do += h.hypernyms()
    return False

def is_verb(word_s):
    return len(wn.synsets(word_s, wn.VERB)) > 0

def main():
    words = load_words()
    for word in words:
        if not is_verb(word): continue
        if not is_animal(word): continue
        print(word) # both a verb and an animal

if __name__ == "__main__":
    main()

Pretty soon, a list emerges: man head does blue bear horse game mans horses dog heads queen birds fish soldiers fly bird baby stock gray dogs bay soldier jack cat grey pen bears permit mount games sole bull fox kids rays prey flies cow mate kid seal copper wolf ray eagle cats pet rail lamb grade queens pig beef swallow snake steamer cows cock mouse drum bees knot chat duck babies monkey goose rabbit worm rat pigs toy stray crow dove buffalo rats dun hawk fishes hare worms drums toys ducks layer homer hounds fowl snakes mates drill knots buck steer dam bulls butterfly bat rabbits orphan bitch stocks grades frog rails sire swan hound sow welsh whale eagles lambs turtle seals monkeys layers cricket fowls ram crows frogs permits steamers beaver lark butterflies perch pens raven kit quarry bucks sponge jacks bug hog kitten oyster parrot crane swallows crab ape bays calves blues oysters pets grub orphans bugs smelt plug foxes hack monitor stag poll hogs cocks hawks bats mounts babys soles shark cod flicker beetle spat grunt swans kite sharks fawn hares tick apes whales quail gulls falcon cuckoo kittens grays beetles crabs imagines torpedo ravens cub jade nestling larks snail stud grouse cubs parrots gull turtles nag leech badger mares stint rams cranes snails skate whiff polls quarries alligator buffaloes drone crickets drills stunt pup skates dams steers tinker skunk greys bunting clam rooks beavers shrimp ferret wolfs snipe kites homers coppers stags ruff sires carp foxs studs char bitches bulldog lug slug sows grunts clams alligators monitors pout assess reeves sponges suckling stunts torpedoes slugs grubs leeches pollard whiting falcons plugs drones kits quails strays ticks rook chats winkle perches roach mew flounder pinches cockle reeve whelp shrimps pups cuckoos skylark hacks foal fawns duns nags tinkers mews badgers whiffs skunks preys flickers whelps fishs cockles scallop basset ferrets yak nestlings remount scallops prawns crawfish snipes ruffs lugs gelding 
flys bream flounders spats butterflys bulldogs cocker sucklings foals jades buffalos mouses prawn gooses roaches

They ain't all good, but there are some great ones in there I wouldn't have thought of on my own.

Tags: programming puzzle scene