I like to imagine that's how the CMU Pronouncing Dictionary came into existence: a room full of students muttering to themselves and typing. The result was a big text file, of which I'll copy a fraction here:
REACTS  R IY0 AE1 K T S
READ  R EH1 D
READ(1)  R IY1 D
READ'S  R IY1 D Z
READABILITY  R IY2 D AH0 B IH1 L IH0 T IY0
READABLE  R IY1 D AH0 B AH0 L
Each line looks like WORD (two spaces) PH ON EM ES OF WO RD, where vowel phonemes are further, uhm, decorated with numbers indicating stress. (Useful for poetry-rhyming: UNHURRIED and WANT TO READ both end with an -eed sound, but the different stress levels would make for an awkward couplet.) Since READ can be pronounced like RED or REED, there's an entry for READ and another for READ(1).
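To make that format concrete, here's a minimal Python sketch of reading such lines. It assumes every line looks like the ones quoted above (no comment lines or other surprises), and the names parse_line, load, and stress_pattern are made up for illustration; they aren't part of any CMU tooling.

```python
import re

# The sample entries quoted above, one per line.
SAMPLE = """\
REACTS  R IY0 AE1 K T S
READ  R EH1 D
READ(1)  R IY1 D
READ'S  R IY1 D Z
READABILITY  R IY2 D AH0 B IH1 L IH0 T IY0
READABLE  R IY1 D AH0 B AH0 L
"""

# Matches an alternate-pronunciation marker such as the "(1)" in READ(1).
VARIANT = re.compile(r"^(.*)\((\d+)\)$")

def parse_line(line):
    """Split one dictionary line into (word, variant number, phoneme list)."""
    word, phonemes = line.split(None, 1)  # split at the first run of whitespace
    match = VARIANT.match(word)
    if match:
        word, variant = match.group(1), int(match.group(2))
    else:
        variant = 0
    return word, variant, phonemes.split()

def load(text):
    """Map each word to a list of its pronunciations."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, _variant, phonemes = parse_line(line)
        entries.setdefault(word, []).append(phonemes)
    return entries

def stress_pattern(phonemes):
    """Return the stress digits of the vowel phonemes, in order."""
    return [int(p[-1]) for p in phonemes if p[-1].isdigit()]

entries = load(SAMPLE)
print(entries["READ"])                            # both pronunciations of READ
print(stress_pattern(entries["READABILITY"][0]))  # [2, 0, 1, 0, 0]
```

Folding READ and READ(1) into one list under READ is just one way to do it; keeping the variant number around would work equally well.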
This is a pretty awesome resource, and I've been using it to good effect. Of course, since I have engineer-brain, I don't think much about the parts that work well, and instead stare off into space and think about the parts that don't.
I want to know the syllable-breaks in words. When a computer sees the line
READABLE  R IY1 D AH0 B AH0 L
...it doesn't know whether that word is REE-DUH-BULL or REED-UHB-UHL or what-have-you. Maybe there's an automatic way to figure it out, but when I tried googling [phoneme syllable], I bumped into academic linguistics papers. When one bumps into academic papers, one tends to assume one's reading about not-so-easy problems.
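To see how much the computer is left guessing, here's a rough Python sketch that lists every way to cut a phoneme string into syllables. It leans on a simplifying assumption of mine: each syllable holds exactly one vowel phoneme, and a vowel is any phoneme with a trailing stress digit. The function names are hypothetical.

```python
from itertools import product

def is_vowel(phoneme):
    # Assumption: vowels are exactly the phonemes that end in a stress digit.
    return phoneme[-1].isdigit()

def candidate_syllabifications(phonemes):
    """Yield every split that puts exactly one vowel in each syllable."""
    vowel_idx = [i for i, p in enumerate(phonemes) if is_vowel(p)]
    if not vowel_idx:
        return
    # Between two neighbouring vowels, the cut may fall anywhere from
    # right after the first vowel to right before the second one.
    choices = [range(a + 1, b + 1) for a, b in zip(vowel_idx, vowel_idx[1:])]
    for cuts in product(*choices):
        bounds = [0, *cuts, len(phonemes)]
        yield [phonemes[s:e] for s, e in zip(bounds, bounds[1:])]

readable = "R IY1 D AH0 B AH0 L".split()
for syllables in candidate_syllabifications(readable):
    print(" | ".join(" ".join(s) for s in syllables))
```

For READABLE this prints four candidates, REE-DUH-BULL and REED-UHB-UHL among them, and nothing in the dictionary line says which one is right.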
I want syllable breaks. Both for puzzles and for, uhm, not-so-great rhymes. Who can forget the classic
To find a rhyme for silver
Or any "rhymeless" rhyme
Requires only will, ver-
Bosity and time.
How would you find this rhyme in the CMU dictionary? (You wouldn't; it doesn't have the word VERBOSITY. But if it did, how would you find it?)
VERB  V ER1 B
VERBAL  V ER1 B AH0 L
In VERB, the B is in the same syllable as the "ER"; but in VERBAL, the B wanders over to the next syllable, clinging to AL. But you can't just say "If there's a following vowel, then the consonants in between wander over." In VENTRAL, the N stays in the first syllable, while the T and R travel over to AL.
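Here's that too-simple rule as a small Python sketch, to show where it goes wrong. The function names are hypothetical, a vowel is assumed to be any phoneme ending in a stress digit, and the phonemes for VENTRAL are my own transcription in the dictionary's notation rather than something copied from the excerpts above.

```python
def is_vowel(phoneme):
    # Assumption: vowels are exactly the phonemes that end in a stress digit.
    return phoneme[-1].isdigit()

def naive_syllables(phonemes):
    """Naive rule: every consonant between two vowels joins the later vowel."""
    vowel_idx = [i for i, p in enumerate(phonemes) if is_vowel(p)]
    # Cut right after each vowel except the last; trailing consonants
    # after the last vowel stay in the last syllable.
    cuts = [i + 1 for i in vowel_idx[:-1]]
    bounds = [0, *cuts, len(phonemes)]
    return [phonemes[s:e] for s, e in zip(bounds, bounds[1:])]

words = {
    "VERB": "V ER1 B",
    "VERBAL": "V ER1 B AH0 L",
    "VENTRAL": "V EH1 N T R AH0 L",  # assumed transcription, not from the text
}
for word, phonemes in words.items():
    syllables = naive_syllables(phonemes.split())
    print(word, "->", " | ".join(" ".join(s) for s in syllables))
```

VERB and VERBAL come out fine, but VENTRAL gets cut as V EH1 | N T R AH0 L, which drags the N along with the T and R instead of leaving it behind.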
Of course, there's the "easy" solution of asking a bunch of CMU students to mark the syllable breaks in words; or better yet, find out that someone else has already done this. I haven't had any luck finding such a thing, though. Perhaps because I don't know the name of what I'm looking for; perhaps because it doesn't exist.