New: Puzzle Hunts are Everywhere: The Elementarizer

Yes, it's another blog post about programming & puzzle-hunts. This one isn't a web crawler.

Dr Clue runs team-building puzzle hunts. Alexandra's done some puzzles for them and I've proofread a few. She mentioned that one of these puzzles was tricky to encode--there were fiddly steps, easy to get wrong. And she'd make more of these puzzles in the future. Dr. Clue re-uses puzzle types; this makes sense, since the participants from, say, last week's dentists convention won't participate in next week's gathering of Coca-Cola managers. (I generally can't talk about things I do for Dr. Clue for this reason; but I got permission to talk about this.) Thus, it made sense to automate encoding for this kind of puzzle.

Spoiler warning! If you're reading this because you're about to play in a Dr Clue game, you probably want to stop. I'm about to describe one of the puzzles.

In this type of puzzle, a team receives a series of numbers along with instructions. The instructions say to decode the numbers. They are atomic numbers, replace them with the chemical symbols for those elements. If a number is marked "reverse", then use the chemical symbol backwards. Then substitute some letters; e.g., the instructions might say if you see "P" replace that with "F". Then cross out some letters; e.g., the instructions might say to cross out all the "X"s. Why all of the substituting and crossing-out? Is it to make the puzzle tougher? Nope.

You probably can't encode a message as a straight-up series of chemical symbols. There aren't many symbols to work with; not many have an "E" in them. So you probably need to have players cross out some letters--keep track of those. And you might need to set up some substitutions; keep track of those. It gets tricky. So I wrote a program to help out, a quick-and-dirty Python script.

But that program didn't do much good on my home machine. And it might not do much good to just hand it to the Dr Clue folks: "Here's the program... Oh, but you need to install Python to use it. Whoopsie." But I have a web site. Everyone knows how to use a web site. So I wrapped that script up into a cgi script, a web form. Here is the messy result... uhm, I did warn you about the "quick-n-dirty" aspect, right?


import cgi
import random

print "Content-Type:text/html\n"

# Stolen from wikipedia: Tab-separated. 
# Each line has Symbol, name, name origin, atomic number, mass, group, period
ELEMENT_DATA = '''Ac    Actinium        corruption of the Greek aktinos         89      [227][1]                7
Ag      Silver  Latin argentum  47      107.8682(2)[2]  11      5
Al      Aluminium (Aluminum)    Latin alumen    13      26.9815386(8)   13      3
   ...snipping out many many lines of data...
Zn      Zinc    German zin      30      65.409(4)       12      4
Zr      Zirconium       zircon  40      91.224(2)[2]    4       5'''

elem_num = {}
sym_dict = {}
known_failures = {}

# ReverseString("Able") returns "elbA"
def ReverseString(s):
  c_list = [c for c in s]
  return ''.join(c_list)

def MaybeAddToDict(tweaked_sym, sym):
  if tweaked_sym in sym_dict: 
    # If sym_dict['n'] already contains the nice short 'N', 
    # then don't add sym_dict['N'] <- 'Na'.  Short is nice.
    # Since we cycle through in order from shortest to longest,
    # just check the first element:
    if len(sym) > len(sym_dict[tweaked_sym][0]): return
    sym_dict[tweaked_sym] = []

# Initialize our dictionary of letters->symbols.  Pass in message to be encoded.
# (We need the message because: if our message contains no "X", then "Xe" is 
# a valid symbol for encoding "E".  
def InitDicts(message):
  for sym_len in [1, 2, 3]:
    for line in ELEMENT_DATA.splitlines():
      fields = [s.strip() for s in line.split('\t')]
      (sym, name, origin, number, mass, group, period) = fields
      if not len(sym) == sym_len: continue
      elem_num[ReverseString(sym)] = '-' + number
      elem_num[sym] = number
      clean_sym = sym.lower()
      tweaked_sym = ''.join([c for c in clean_sym if c in message])
      if not tweaked_sym: continue
      MaybeAddToDict(tweaked_sym, sym)
      if len(tweaked_sym) > 1:
        MaybeAddToDict(ReverseString(tweaked_sym), ReverseString(sym))

# We tokenize the message by recursively calling this function.  The call
# stack to tokenize "heal" would look like
#  HelperFunc("heal", "")
#    HelperFunc("al", ".he")
#      HelperFunc("", "")
def HelperFunc(todo, sofar):
  if todo in known_failures: return (False, sofar, known_failures[todo])
  if not len(todo): return(True, sofar, '') # cool, we finished
  bottleneck = ''
  for maybe_len in [3, 2, 1]: # try to use the next three letters. or 2. or 1.
    maybe = todo[:maybe_len]  
    if maybe in sym_dict:
      (success, tokenization, bottleneck) = HelperFunc(todo[maybe_len:], '.'.join([sofar, maybe]))
      if success: return(True, tokenization, '')
  if not bottleneck:
    print todo, '#', sofar
    bottleneck = todo[:3]
  known_failures[todo] = bottleneck
  return (False, sofar, bottleneck)

def PrintRows(tokens, encoded_tokens):
  while 1:
    if not tokens: break
    line_of_tokens = tokens[:TOKENS_PER_ROW]
    tokens = tokens[TOKENS_PER_ROW:]
    line_of_codes =  encoded_tokens[:TOKENS_PER_ROW]
    encoded_tokens = encoded_tokens[TOKENS_PER_ROW:]

    for token in line_of_tokens: print "%4s" % token,
    for code in line_of_codes: print "%4s" % code,
    for code in line_of_codes: print "%4s" % elem_num[code],

def ReportSuccess(message, tokenization, substs):
  print "<p>Found an encoding."
  print "<p>(If you try again, you might get a slightly different encoding."
  print "    We use some randomness."
  tokenization = tokenization[1:]
  tokens = tokenization.split('.')

  message_with_subst = message
  for subst in substs:
    message_with_subst = message_with_subst.replace(subst[1], subst[0])

  letters_to_cross_out = ''

  encoded_tokens = []
  for token in tokens:
    for count in range(0, 10):
      code = random.choice(sym_dict[token])
      if elem_num[code].startswith('-'): continue
      if not [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]: break
    new_nots = [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]
    if new_nots: 
      letters_to_cross_out += ''.join(new_nots)


  sorted = [c for c in letters_to_cross_out]
  letters_to_cross_out = ''.join(sorted)
  print '<p style="font-weight: bold;">Letters to cross out: %s ' % letters_to_cross_out.upper()

  if substs:
    print '''<p><b>Used some substitutions.</b>
             (Double-check these: There might be "transitives" you can
              collapse.  For example "('z', 'g'), ('q', 'z')" really
              means "substitute Q for G") <b>%s</b> ''' % substs

    print "<p>Message w/subst: %s " % message_with_subst

  print "<pre>"
  PrintRows(tokens, encoded_tokens)
  print "</pre>"

  print "<p>Just the numbers (&quot;-12&quot; means 12 reverse)<br><b>"
  print ' '.join([elem_num[elem] for elem in encoded_tokens])
  print "</b>"

def ReportFailure(message):
  print '<p style="color:red;">Failed to elementarize message'
  print '<p>General hints: Q, X, and J are generally difficult to encode'

def GenerateForm(message):
  message = cgi.escape(message)
  message = message.replace('"', "&quot;")
  print '''
  <form action="">
  <input name="msg" value="%s" size="100">  <input type="submit">
''' % message

def GenerateHead(message):
  title = ''.join([c for c in message if c in KOSHER_CHARS])
  print '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "">


def GenerateFoot():
  print '''

def AttemptTokenizationHelper(message):

  print "<p>Attempting to elementarize message &quot;%s&quot;" % message
  print "<pre>"
  (success, tokenization, bottleneck) = HelperFunc(message, '')
  print "</pre>"
  print "<p>Finished elementarizing."
  return (success, tokenization, bottleneck)

def AttemptTokenization(message):
  substs = []
  for attempts in range(0, 20):
    (success, tokenization, bottleneck) = AttemptTokenizationHelper(message)
    if success:
      return (True, tokenization, substs)

    print '<p style="color: red;">Tokenization attempt failed.'
    print '<p>We think we had trouble with: <i>%s</i>' % bottleneck
    subst_candidates = [c for c in 'abcdefghijklmnopqrstuvwxyz' if not c in message]
    if not subst_candidates:
      print "<p>Can't substitute: already using all letters."
      return (False, tokenization, substs)
    swap_in = random.choice(subst_candidates)
    if swap_in in "jqxz": 
      swap_in = random.choice(subst_candidates)
    if swap_in in "jqx": 
      swap_in = random.choice(subst_candidates)
    if swap_in in "jx": 
      swap_in = random.choice(subst_candidates)
    swap_out = bottleneck[0]
    if 'x' in message and random.random() > 0.5:
      swap_out = 'x'
    message = message.replace(swap_out, swap_in)
    substs.insert(0, (swap_in, swap_out))
    print "<p>Let's try a substitution: %s for %s" % (swap_in, swap_out)
  print '<p style="color: red;">Too many Tokenization attempts failed. I give up'
  return (False, tokenization, [])

def Main():
  form = cgi.FieldStorage()

  if "msg" in form:
    message = form["msg"].value
    message = "Find the frog statue in the lobby."



  message = ''.join([c for c in message if c.isalpha()])
  message = message.lower()

  (success, tokenization, substs) = AttemptTokenization(message)

  if success:
    ReportSuccess(message, tokenization, substs)



if __name__ == "__main__":

Wow, I'm kind of embarrassed of this script now. To get the mapping from chemical symbols to numbers, I just pasted in a blob of data--most of which I don't use--from Wikipedia and parse it each time the script runs; it would be more elegant to start with just the right data. This program does a lot of extra work, re-computing things it's already figured out. It could be faster... but it takes a few seconds, and maybe that's fast enough. It's messy... but it's a few months after I originally wrote it, and I can look at it and figured out how it worked; that's better than I can say for my old Perl scripts.

Labels: ,

Posted 2008-03-16