Yes, it's another blog post about programming & puzzle-hunts. This one isn't a web crawler.
Dr Clue runs team-building puzzle hunts. Alexandra's done some puzzles for them and I've proofread a few. She mentioned that one of these puzzles was tricky to encode--there were fiddly steps, easy to get wrong. And she'd make more of these puzzles in the future. Dr. Clue re-uses puzzle types; this makes sense, since the participants from, say, last week's dentists convention won't participate in next week's gathering of Coca-Cola managers. (I generally can't talk about things I do for Dr. Clue for this reason; but I got permission to talk about this.) Thus, it made sense to automate encoding for this kind of puzzle.
Spoiler warning! If you're reading this because you're about to play in a Dr Clue game, you probably want to stop. I'm about to describe one of the puzzles.
In this type of puzzle, a team receives a series of numbers along with instructions. The instructions say to decode the numbers. They are atomic numbers, replace them with the chemical symbols for those elements. If a number is marked "reverse", then use the chemical symbol backwards. Then substitute some letters; e.g., the instructions might say if you see "P" replace that with "F". Then cross out some letters; e.g., the instructions might say to cross out all the "X"s. Why all of the substituting and crossing-out? Is it to make the puzzle tougher? Nope.
You probably can't encode a message as a straight-up series of chemical symbols. There aren't many symbols to work with; not many have an "E" in them. So you probably need to have players cross out some letters--keep track of those. And you might need to set up some substitutions; keep track of those. It gets tricky. So I wrote a program to help out, a quick-and-dirty Python script.
But that program didn't do much good on my home machine. And it might not do much good to just hand it to the Dr Clue folks: "Here's the program... Oh, but you need to install Python to use it. Whoopsie." But I have a web site. Everyone knows how to use a web site. So I wrapped that script up into a cgi script, a web form. Here is the messy result... uhm, I did warn you about the "quick-n-dirty" aspect, right?
#!/usr/local/bin/python
import cgi
import random
print "Content-Type:text/html\n"
# Stolen from wikipedia: Tab-separated.
# Each line has Symbol, name, name origin, atomic number, mass, group, period
ELEMENT_DATA = '''Ac Actinium corruption of the Greek aktinos 89 [227][1] 7
Ag Silver Latin argentum 47 107.8682(2)[2] 11 5
Al Aluminium (Aluminum) Latin alumen 13 26.9815386(8) 13 3
...snipping out many many lines of data...
Zn Zinc German zin 30 65.409(4) 12 4
Zr Zirconium zircon 40 91.224(2)[2] 4 5'''
elem_num = {}
sym_dict = {}
known_failures = {}
# ReverseString("Able") returns "elbA"
def ReverseString(s):
c_list = [c for c in s]
c_list.reverse()
return ''.join(c_list)
def MaybeAddToDict(tweaked_sym, sym):
if tweaked_sym in sym_dict:
# If sym_dict['n'] already contains the nice short 'N',
# then don't add sym_dict['N'] <- 'Na'. Short is nice.
# Since we cycle through in order from shortest to longest,
# just check the first element:
if len(sym) > len(sym_dict[tweaked_sym][0]): return
else:
sym_dict[tweaked_sym] = []
sym_dict[tweaked_sym].append(sym)
# Initialize our dictionary of letters->symbols. Pass in message to be encoded.
# (We need the message because: if our message contains no "X", then "Xe" is
# a valid symbol for encoding "E".
def InitDicts(message):
elem_num.clear()
sym_dict.clear()
known_failures.clear()
for sym_len in [1, 2, 3]:
for line in ELEMENT_DATA.splitlines():
fields = [s.strip() for s in line.split('\t')]
(sym, name, origin, number, mass, group, period) = fields
if not len(sym) == sym_len: continue
elem_num[ReverseString(sym)] = '-' + number
elem_num[sym] = number
clean_sym = sym.lower()
tweaked_sym = ''.join([c for c in clean_sym if c in message])
if not tweaked_sym: continue
MaybeAddToDict(tweaked_sym, sym)
if len(tweaked_sym) > 1:
MaybeAddToDict(ReverseString(tweaked_sym), ReverseString(sym))
# We tokenize the message by recursively calling this function. The call
# stack to tokenize "heal" would look like
# HelperFunc("heal", "")
# HelperFunc("al", ".he")
# HelperFunc("", ".he.al")
def HelperFunc(todo, sofar):
if todo in known_failures: return (False, sofar, known_failures[todo])
if not len(todo): return(True, sofar, '') # cool, we finished
bottleneck = ''
for maybe_len in [3, 2, 1]: # try to use the next three letters. or 2. or 1.
maybe = todo[:maybe_len]
if maybe in sym_dict:
(success, tokenization, bottleneck) = HelperFunc(todo[maybe_len:], '.'.join([sofar, maybe]))
if success: return(True, tokenization, '')
if not bottleneck:
print todo, '#', sofar
bottleneck = todo[:3]
known_failures[todo] = bottleneck
return (False, sofar, bottleneck)
def PrintRows(tokens, encoded_tokens):
TOKENS_PER_ROW = 15
while 1:
if not tokens: break
line_of_tokens = tokens[:TOKENS_PER_ROW]
tokens = tokens[TOKENS_PER_ROW:]
line_of_codes = encoded_tokens[:TOKENS_PER_ROW]
encoded_tokens = encoded_tokens[TOKENS_PER_ROW:]
for token in line_of_tokens: print "%4s" % token,
print
for code in line_of_codes: print "%4s" % code,
print
for code in line_of_codes: print "%4s" % elem_num[code],
print
print
def ReportSuccess(message, tokenization, substs):
print "<p>Found an encoding."
print "<p>(If you try again, you might get a slightly different encoding."
print " We use some randomness."
tokenization = tokenization[1:]
tokens = tokenization.split('.')
message_with_subst = message
for subst in substs:
message_with_subst = message_with_subst.replace(subst[1], subst[0])
letters_to_cross_out = ''
encoded_tokens = []
for token in tokens:
for count in range(0, 10):
code = random.choice(sym_dict[token])
if elem_num[code].startswith('-'): continue
if not [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]: break
new_nots = [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]
if new_nots:
letters_to_cross_out += ''.join(new_nots)
encoded_tokens.append(code)
sorted = [c for c in letters_to_cross_out]
sorted.sort()
letters_to_cross_out = ''.join(sorted)
print '<p style="font-weight: bold;">Letters to cross out: %s ' % letters_to_cross_out.upper()
print
if substs:
print '''<p><b>Used some substitutions.</b>
(Double-check these: There might be "transitives" you can
collapse. For example "('z', 'g'), ('q', 'z')" really
means "substitute Q for G") <b>%s</b> ''' % substs
print "<p>Message w/subst: %s " % message_with_subst
print "<pre>"
PrintRows(tokens, encoded_tokens)
print "</pre>"
print "<p>Just the numbers ("-12" means 12 reverse)<br><b>"
print ' '.join([elem_num[elem] for elem in encoded_tokens])
print "</b>"
def ReportFailure(message):
print '<p style="color:red;">Failed to elementarize message'
print '<p>General hints: Q, X, and J are generally difficult to encode'
def GenerateForm(message):
message = cgi.escape(message)
message = message.replace('"', """)
print '''
<form action="elemental_encoding.py">
<input name="msg" value="%s" size="100"> <input type="submit">
</form>
''' % message
def GenerateHead(message):
KOSHER_CHARS = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
title = ''.join([c for c in message if c in KOSHER_CHARS])
print '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Elementarizing</title>
</head>
<body>
<h1>Elementarizing</h1>
'''
def GenerateFoot():
print '''
</body>
</html>
'''
def AttemptTokenizationHelper(message):
InitDicts(message)
print "<p>Attempting to elementarize message "%s"" % message
print "<pre>"
(success, tokenization, bottleneck) = HelperFunc(message, '')
print "</pre>"
print "<p>Finished elementarizing."
return (success, tokenization, bottleneck)
def AttemptTokenization(message):
substs = []
for attempts in range(0, 20):
(success, tokenization, bottleneck) = AttemptTokenizationHelper(message)
if success:
return (True, tokenization, substs)
print '<p style="color: red;">Tokenization attempt failed.'
print '<p>We think we had trouble with: <i>%s</i>' % bottleneck
subst_candidates = [c for c in 'abcdefghijklmnopqrstuvwxyz' if not c in message]
if not subst_candidates:
print "<p>Can't substitute: already using all letters."
return (False, tokenization, substs)
swap_in = random.choice(subst_candidates)
if swap_in in "jqxz":
swap_in = random.choice(subst_candidates)
if swap_in in "jqx":
swap_in = random.choice(subst_candidates)
if swap_in in "jx":
swap_in = random.choice(subst_candidates)
swap_out = bottleneck[0]
if 'x' in message and random.random() > 0.5:
swap_out = 'x'
message = message.replace(swap_out, swap_in)
substs.insert(0, (swap_in, swap_out))
print "<p>Let's try a substitution: %s for %s" % (swap_in, swap_out)
print '<p style="color: red;">Too many Tokenization attempts failed. I give up'
return (False, tokenization, [])
def Main():
form = cgi.FieldStorage()
if "msg" in form:
message = form["msg"].value
else:
message = "Find the frog statue in the lobby."
GenerateHead(message)
GenerateForm(message)
message = ''.join([c for c in message if c.isalpha()])
message = message.lower()
(success, tokenization, substs) = AttemptTokenization(message)
if success:
ReportSuccess(message, tokenization, substs)
else:
ReportFailure(message)
GenerateFoot()
exit
if __name__ == "__main__":
Main()
Wow, I'm kind of embarrassed of this script now. To get the mapping from chemical symbols to numbers, I just pasted in a blob of data--most of which I don't use--from Wikipedia and parse it each time the script runs; it would be more elegant to start with just the right data. This program does a lot of extra work, re-computing things it's already figured out. It could be faster... but it takes a few seconds, and maybe that's fast enough. It's messy... but it's a few months after I originally wrote it, and I can look at it and figured out how it worked; that's better than I can say for my old Perl scripts.
Labels: programming, puzzlehunts