Yes, it's another blog post about programming & puzzle-hunts. This one isn't a web crawler.
Dr Clue runs team-building puzzle hunts. Alexandra's done some puzzles for them and I've proofread a few. She mentioned that one of these puzzles was tricky to encode--there were fiddly steps, easy to get wrong. And she'd make more of these puzzles in the future. Dr. Clue re-uses puzzle types; this makes sense, since the participants from, say, last week's dentists convention won't participate in next week's gathering of Coca-Cola managers. (I generally can't talk about things I do for Dr. Clue for this reason; but I got permission to talk about this.) Thus, it made sense to automate encoding for this kind of puzzle.
Spoiler warning! If you're reading this because you're about to play in a Dr Clue game, you probably want to stop. I'm about to describe one of the puzzles.
In this type of puzzle, a team receives a series of numbers along with instructions. The instructions say to decode the numbers. They are atomic numbers, replace them with the chemical symbols for those elements. If a number is marked "reverse", then use the chemical symbol backwards. Then substitute some letters; e.g., the instructions might say if you see "P" replace that with "F". Then cross out some letters; e.g., the instructions might say to cross out all the "X"s. Why all of the substituting and crossing-out? Is it to make the puzzle tougher? Nope.
You probably can't encode a message as a straight-up series of chemical symbols. There aren't many symbols to work with; not many have an "E" in them. So you probably need to have players cross out some letters--keep track of those. And you might need to set up some substitutions; keep track of those. It gets tricky. So I wrote a program to help out, a quick-and-dirty Python script.
But that program didn't do much good on my home machine. And it might not do much good to just hand it to the Dr Clue folks: "Here's the program... Oh, but you need to install Python to use it. Whoopsie." But I have a web site. Everyone knows how to use a web site. So I wrapped that script up into a cgi script, a web form. Here is the messy result... uhm, I did warn you about the "quick-n-dirty" aspect, right?
#!/usr/local/bin/python import cgi import random print "Content-Type:text/html\n" # Stolen from wikipedia: Tab-separated. # Each line has Symbol, name, name origin, atomic number, mass, group, period ELEMENT_DATA = '''Ac Actinium corruption of the Greek aktinos 89 [227][1] 7 Ag Silver Latin argentum 47 107.8682(2)[2] 11 5 Al Aluminium (Aluminum) Latin alumen 13 26.9815386(8) 13 3 ...snipping out many many lines of data... Zn Zinc German zin 30 65.409(4) 12 4 Zr Zirconium zircon 40 91.224(2)[2] 4 5''' elem_num = {} sym_dict = {} known_failures = {} # ReverseString("Able") returns "elbA" def ReverseString(s): c_list = [c for c in s] c_list.reverse() return ''.join(c_list) def MaybeAddToDict(tweaked_sym, sym): if tweaked_sym in sym_dict: # If sym_dict['n'] already contains the nice short 'N', # then don't add sym_dict['N'] <- 'Na'. Short is nice. # Since we cycle through in order from shortest to longest, # just check the first element: if len(sym) > len(sym_dict[tweaked_sym][0]): return else: sym_dict[tweaked_sym] = [] sym_dict[tweaked_sym].append(sym) # Initialize our dictionary of letters->symbols. Pass in message to be encoded. # (We need the message because: if our message contains no "X", then "Xe" is # a valid symbol for encoding "E". def InitDicts(message): elem_num.clear() sym_dict.clear() known_failures.clear() for sym_len in [1, 2, 3]: for line in ELEMENT_DATA.splitlines(): fields = [s.strip() for s in line.split('\t')] (sym, name, origin, number, mass, group, period) = fields if not len(sym) == sym_len: continue elem_num[ReverseString(sym)] = '-' + number elem_num[sym] = number clean_sym = sym.lower() tweaked_sym = ''.join([c for c in clean_sym if c in message]) if not tweaked_sym: continue MaybeAddToDict(tweaked_sym, sym) if len(tweaked_sym) > 1: MaybeAddToDict(ReverseString(tweaked_sym), ReverseString(sym)) # We tokenize the message by recursively calling this function. The call # stack to tokenize "heal" would look like # HelperFunc("heal", "") # HelperFunc("al", ".he") # HelperFunc("", ".he.al") def HelperFunc(todo, sofar): if todo in known_failures: return (False, sofar, known_failures[todo]) if not len(todo): return(True, sofar, '') # cool, we finished bottleneck = '' for maybe_len in [3, 2, 1]: # try to use the next three letters. or 2. or 1. maybe = todo[:maybe_len] if maybe in sym_dict: (success, tokenization, bottleneck) = HelperFunc(todo[maybe_len:], '.'.join([sofar, maybe])) if success: return(True, tokenization, '') if not bottleneck: print todo, '#', sofar bottleneck = todo[:3] known_failures[todo] = bottleneck return (False, sofar, bottleneck) def PrintRows(tokens, encoded_tokens): TOKENS_PER_ROW = 15 while 1: if not tokens: break line_of_tokens = tokens[:TOKENS_PER_ROW] tokens = tokens[TOKENS_PER_ROW:] line_of_codes = encoded_tokens[:TOKENS_PER_ROW] encoded_tokens = encoded_tokens[TOKENS_PER_ROW:] for token in line_of_tokens: print "%4s" % token, print for code in line_of_codes: print "%4s" % code, print for code in line_of_codes: print "%4s" % elem_num[code], print print def ReportSuccess(message, tokenization, substs): print "<p>Found an encoding." print "<p>(If you try again, you might get a slightly different encoding." print " We use some randomness." tokenization = tokenization[1:] tokens = tokenization.split('.') message_with_subst = message for subst in substs: message_with_subst = message_with_subst.replace(subst[1], subst[0]) letters_to_cross_out = '' encoded_tokens = [] for token in tokens: for count in range(0, 10): code = random.choice(sym_dict[token]) if elem_num[code].startswith('-'): continue if not [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]: break new_nots = [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out] if new_nots: letters_to_cross_out += ''.join(new_nots) encoded_tokens.append(code) sorted = [c for c in letters_to_cross_out] sorted.sort() letters_to_cross_out = ''.join(sorted) print '<p style="font-weight: bold;">Letters to cross out: %s ' % letters_to_cross_out.upper() print if substs: print '''<p><b>Used some substitutions.</b> (Double-check these: There might be "transitives" you can collapse. For example "('z', 'g'), ('q', 'z')" really means "substitute Q for G") <b>%s</b> ''' % substs print "<p>Message w/subst: %s " % message_with_subst print "<pre>" PrintRows(tokens, encoded_tokens) print "</pre>" print "<p>Just the numbers ("-12" means 12 reverse)<br><b>" print ' '.join([elem_num[elem] for elem in encoded_tokens]) print "</b>" def ReportFailure(message): print '<p style="color:red;">Failed to elementarize message' print '<p>General hints: Q, X, and J are generally difficult to encode' def GenerateForm(message): message = cgi.escape(message) message = message.replace('"', """) print ''' <form action="elemental_encoding.py"> <input name="msg" value="%s" size="100"> <input type="submit"> </form> ''' % message def GenerateHead(message): KOSHER_CHARS = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' title = ''.join([c for c in message if c in KOSHER_CHARS]) print '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <title>Elementarizing</title> </head> <body> <h1>Elementarizing</h1> ''' def GenerateFoot(): print ''' </body> </html> ''' def AttemptTokenizationHelper(message): InitDicts(message) print "<p>Attempting to elementarize message "%s"" % message print "<pre>" (success, tokenization, bottleneck) = HelperFunc(message, '') print "</pre>" print "<p>Finished elementarizing." return (success, tokenization, bottleneck) def AttemptTokenization(message): substs = [] for attempts in range(0, 20): (success, tokenization, bottleneck) = AttemptTokenizationHelper(message) if success: return (True, tokenization, substs) print '<p style="color: red;">Tokenization attempt failed.' print '<p>We think we had trouble with: <i>%s</i>' % bottleneck subst_candidates = [c for c in 'abcdefghijklmnopqrstuvwxyz' if not c in message] if not subst_candidates: print "<p>Can't substitute: already using all letters." return (False, tokenization, substs) swap_in = random.choice(subst_candidates) if swap_in in "jqxz": swap_in = random.choice(subst_candidates) if swap_in in "jqx": swap_in = random.choice(subst_candidates) if swap_in in "jx": swap_in = random.choice(subst_candidates) swap_out = bottleneck[0] if 'x' in message and random.random() > 0.5: swap_out = 'x' message = message.replace(swap_out, swap_in) substs.insert(0, (swap_in, swap_out)) print "<p>Let's try a substitution: %s for %s" % (swap_in, swap_out) print '<p style="color: red;">Too many Tokenization attempts failed. I give up' return (False, tokenization, []) def Main(): form = cgi.FieldStorage() if "msg" in form: message = form["msg"].value else: message = "Find the frog statue in the lobby." GenerateHead(message) GenerateForm(message) message = ''.join([c for c in message if c.isalpha()]) message = message.lower() (success, tokenization, substs) = AttemptTokenization(message) if success: ReportSuccess(message, tokenization, substs) else: ReportFailure(message) GenerateFoot() exit if __name__ == "__main__": Main()
Wow, I'm kind of embarrassed of this script now. To get the mapping from chemical symbols to numbers, I just pasted in a blob of data--most of which I don't use--from Wikipedia and parse it each time the script runs; it would be more elegant to start with just the right data. This program does a lot of extra work, re-computing things it's already figured out. It could be faster... but it takes a few seconds, and maybe that's fast enough. It's messy... but it's a few months after I originally wrote it, and I can look at it and figured out how it worked; that's better than I can say for my old Perl scripts.
Labels: programming, puzzlehunts