Book Report: India (From Midnight to Millennium)

Summary:

India is a melting pot of cultures, except that it hasn't melted. India is a diverse mix of cultures. Groups get mad and argue. It would be better if they just got along. For the most part they do. But sometimes they don't. Also, there is much government corruption. People in small towns still care about caste--but the next generation doesn't really care, so there's hope for the future.

This book was a tough slog. As a careful reader, you know to read some things with a grain of salt, and I needed a lot of salt with this book. But there were some good patches.


Book Report: The Social Life of Information

It was six months ago that I started looking for work, and it was just this last week that I got my first phone screen. There aren't many technical writing jobs in San Francisco, thus phone screens are rare. So I was pretty excited: finally, a chance to eliminate my commute!

Partway through the phone screen, I was trying to convince this company's CTO that he didn't want a technical writer. I don't think I convinced him. I probably wasn't very coherent--I was pretty surprised when I realized "Hey what the heck I am trying to talk this guy out of hiring me!?!" But I couldn't help it; when I talked with him about the project, it sounded like he could get a lot more use out of a short-term contractor writer and a junior programmer. And I said that before I really thought I was disqualifying myself.

And thus I will continue with my long commute.

My job title is "Technical Writer." But that's not my mission. My mission is:

Spread knowledge

Writing technical documents is one way to spread knowledge; sometimes it's the most appropriate way; often, it's not. Sometimes I get so wrapped up in some document that I forget to stick my head up, remember the big picture.

The Social Life of Information is a book about the big picture of spreading knowledge. It was written back in 2000--back when the internet's firehose of information was about to wash away all conventional governments, businesses, economies, societies, and we were all going to live happily ever after. The book points out: new communications technology eases the flow of information around the world, but there's still a "last 5mm" problem--getting that information through a learner's skull.

Lately, I've been working on company-internal documentation. I work for a large multi-national corporation with clusters of software engineers hacking away across the globe. Some group in Dublin figures out a clever way to do something useful. Can we spread that knowledge to other groups? Maybe. Sometimes.

The Social Life of Information touches on many topics. But my current focus caused me to latch onto certain parts. The second half of this book could have been called How Organizations Learn. I kept hitting passages which made me nod my head and say "Right on." Please indulge me as I quote long passages from this book.

...The instability that rapidly-changing technology brings, however, often lies less in the technology itself than in enthusiastic expectations that everything being "just a click away" or "at your fingertips" will make life easy. Battered by such hype, it's easy to believe that everyone except you knows how to use this stuff without a problem.

We saw this pressure at work on a new employee at Xerox PARC. She was intelligent and hard-working, but got mired in difficulties with the office computer system. That system came with the usual promises of "usability" and self-explanatoriness, but she found it impossible to use or understand. Being a newcomer, she was reluctant to keep asking for help. ...

Then chance moved her desk from an isolated office into the center of a group of offices. There she immediately benefitted from... incidental learning... She saw that these "stable" machines crashed for everyone. She saw that there was no more "ease" for experienced assistants, long-time employees, or PARC's hallowed computer scientists than for her. And she also saw that when a machine did crash, its user would without shame look around for help from someone else who, whatever their status, had successfully steered around that particular problem. No one person knew how to handle these temperamental machines. But spread around the office was enough collective knowledge to keep them up and running.

This story makes me cringe. She didn't report these crashes to the people who could fix them? Surely, there should be a way to report the crashes--along with documentation of "known problems"-- a-and with workarounds for those problems! Surely we should solve this problem with tons and tons of documentation. Then I calm down a bit. It takes effort to document things, to maintain that documentation. How much time should she have put into trying to track down where each crash was coming from, coming up with a good report? Who would be in charge of tracking the known problems? That takes effort, too. When you hire someone for some random job, you don't tack "...and must write excellent bug reports" onto their job requirements. It would be nice if these people carefully tracked down every defect, but it's not a realistic hope.

[When Alexander Graham Bell was promoting use of the then-new telephone]... The company needed, Bell argued, to abandon specialists and specialist training and put the phones in people's hands. In the right circumstances, the practicality of the device would do the rest. So he crafted the circumstances. ... [The company] put phones near lunch counters. That way, it reasoned, people who didn't know how to use them would be likely to see people who did know how and in this way learn about the phone system.

Later, they mention that one thing that helps people learn how to drive motor vehicles: before they learn how to drive, they've probably been a passenger many times, watching someone else drive. It's almost an apprenticeship system.

Another interesting section...

An anthropologist, Orr, studied the Xerox technical representatives (reps) who service and repair the company's copiers at customers' sites. ...

The company tried to provide the reps with the targeted information they would need. [It provided training and documentation.] ...

Everyone knew what reps did. But Orr argues forcefully that work is rarely well understood. Neither management nor management theorists, he points out, are adequately "concerned with work practice," by which he means they "do not focus on what is done in accomplishing a given job." He was not surprised, then, to find out that what looked quite clear and simple from above was much more opaque and confusing on the ground. Tasks were no longer so straightforward, and machines, despite their elegant circuit diagrams and diagnostic procedures, exhibited quite incoherent behaviors. Consequently, the information and training provided to the reps was inadequate for all but the most routine tasks they faced. ...

The reps' real difficulties arose, however, not simply because the documentation had lapses. They arose more problematically because it told them what to do, but not why. ... So when machines did something unpredicted, reps found themselves not just off the map, but there without a compass or tools for bushwhacking. ...

Orr begins his account of the reps' day not where the company process begins--9 o'clock at the first call--but at breakfast beforehand. From a conventional perspective, the reps' job was highly individual. Routine work was carried out alone... Yet Orr found that the reps were remarkably social, getting together on their own time for breakfast, lunch, coffee, or at the end of the day--and sometimes for all of the above.

...At these meetings, while eating, playing cribbage, and engaging in what might seem like idle gossip, the reps talked work, and talked it continuously. They posed questions, raised problems, offered solutions, constructed answers, and discussed changes in their work, the machines, or customer relations. In this way... they kept one another up to date with what they knew, what they learned, and what they did.

If I really want to spread knowledge around my company, should I write another document? Or should I offer to take care of an engineer's pet budgie for a week, freeing that engineer to travel to a remote site and sit down to lunch with a different group of people?

Reps tell stories about unsolved problems in an attempt to generate a coherent account of what the problem is and how to solve it. They may do this individually, putting their own story together. Or they can do it collectively, as they draw on the collective wisdom and experience of the group. ...

While it may appear at first that the reps used stories to circulate information, they were actually doing much more. For it is not shared stories or shared information so much as shared interpretation that binds people together. In their storytelling, the reps developed a common framework that allowed them to interpret the information that they received in a common light. To collaborate around shared information you first have to develop a shared framework for interpretation. ...

Before you can describe your day to someone, you need to understand your day and that someone needs to understand your vocabulary of description.

As a result of Orr's work, rather than trying to support the reps with yet more information from outside the reps' community, Xerox instead turned to reinforcing internal ties. The first step was simple. Reps were given two-way radios, which allowed them to continue to talk to one another even when working apart. This intervention both supported and acknowledged the reps' ways of collaboration, narration, and improvisation.

The second step was more ambitious, but it too reflected the resources the reps provided for themselves and tried to amplify this resourcefulness. Though passed on in war stories, the insight reps developed in the course of their work tended to have a short reach, traveling primarily in local groups, and a short life, fading from memory even locally. Consequently, reps near and far ended up reinventing fixes that might have been known elsewhere. The Eureka project set out to create a database of this useful knowledge, preserving over time and delivering over space resourceful ideas.

I.e., rather than giving them the effort of a writer, it was more useful to give them a tool that made it easy to record knowledge themselves. Maybe your organization needs a scruffy wiki more than it needs another polished document.

...To maintain a competitive edge, firms first search for the best practices either within their own or in their competitors' units. Once identified, they then transfer these to areas where practices are less good. The search part has led to a great deal of useful benchmarking. The transfer part, however, has proved much more awkward. ... the now-famous lament of HP's chairman Lew Platt, as he considered how much better the firm would be "if only we knew what we know at HP."

Spreading knowledge ain't easy. Some moves, but a lot of it slips between the cracks.

Another colleague, Jack Whalen, showed the power of practice in his study of learning in a service center taking the calls from customers and scheduling technicians. ...the people who take the calls can save the company money by diagnosing simple problems and telling the customer how to fix these for themselves. ...

The phone operators are not, of course, trained as technicians. In the past, however, they learned from the reps when the latter called in to pick up their next job. The reps would then explain how trivial the last one had been, and in the process the phone operators could learn a lot from these mentors. When they next took such a call, they could offer such a solution. ...technicians no longer pick up calls this way. Consequently, operators no longer pick up insights. ...

The company has tried to replace this kind of learning with the more explicit support of a "case-based expert system." ... This alternative has not worked well. As the reps found with "directive documentation," it can be surprisingly difficult to get a clear diagnosis and solution this way. Moreover, such a system doesn't help the operators understand what they're doing. ... the company contemplated new training courses...

Whalen and his fellow researchers took a slightly different route, however. They studied one service center and the quality of diagnosis its staff provided. There they found two operators who gave especially reliable answers. One, unsurprisingly, was an eight-year veteran of the service center with some college experience and a survivor from the days when reps served as mentors. The other, however, was someone with only a high-school diploma. She had been on the job barely four months.

The researchers noticed, however, that the newcomer had a desk opposite the veteran. There she could hear the veteran taking calls, asking questions, and giving advice. And she began to do the same. She had also noticed that he had acquired a variety of pamphlets and manuals, so she began to build up her own stock. Moreover, when she didn't understand the answers the veteran gave, she asked him to show her what he meant, using the service center's own copier.

...Ultimately, Whalen concluded, given the amount and level of knowledge already available in the room, what the operators needed were not so much expert systems or new training courses but "longer phone cords." (These allow an operator taking a call to slide over to the desk and the screen of a resourceful colleague who could provide the necessary help.)

Documentation wasn't useless in that story. The veteran had pamphlets and manuals. But the documentation wasn't enough, couldn't stand on its own.

We see two types of work-related networks that, with the boundaries they inevitably create, are critical for understanding learning, work, and the movement of knowledge. First, there are the networks that link people to others whom they may never get to know but who work on similar practices. We call these "networks of practice." Second, there are more tight-knit groups formed, again through practice, by people working together on the same or similar tasks. These are what, following Lave and Wenger, we call "communities of practice." ...

Great, new buzzwords with which I can bewilder my colleagues. Maybe if I say "network of practice" enough times, I'll irritate my co-workers enough such that I can trick them into reading the book just to find out what I'm talking about.

...In a similar vein, Saxenian draws attention to the importance of social forces to the development of Silicon Valley that both encourages and is encouraged by the networks of practice that run between companies. By contrast, the companies of Route 128 discouraged fraternization between firms. This insularity not only cut firms off from their ecology but also prevented the ecology as a whole from developing. ...

Promoting internal communication is good; don't forget to pull in people from the outside, though. I work for a big, growing company. I could talk to a different brilliant co-worker every day and never run out. If I did that, it would be tempting to forget the many brilliant people outside the company that I wasn't talking to. It's easy to fall into the trap of provincialism.

Gee, I'm tempted to just type the whole book into this blog entry, but my fingers are getting tired and that would probably violate copyright, so I'll stop here. I'll just recommend that you read this book. But not on this blog post. Oh, poor tired fingers.


Book Report: Seabiscuit

This was a fun read about horseracing.


Embedded Reporter Seeks Team

Is your team playing in the upcoming Back to Basics/Midnight Madness game on April 5? Would you let me play with your team and then write about it afterwards? If so, please get in touch with me (web+comment@lahosken.san-francisco.ca.us).

A couple of years back, Continental Breakfast let me tag along as they play-tested the Hogwarts Game, and I wrote about it. And thus the world got to find out that: Continental Breakfast has an unusually high concentration of Australians. See, that's deep reporting.

The world needs to learn startling facts like this about your team. No, really, other gamers are curious about you.

Questions that people have asked me about this "embedded-reporter" project:

  • "When you're reporting, do you play the Game? Or do you just sit in the van and take notes?" I play. I tone down my style at first until I see how the team works together. I'm interested in how the team works together, and I try not to Heisenbergishly change that. But that doesn't mean I'm going to keep quiet when I notice some paragraph of puzzle text contains an unusually high density of hyphens and periods.
  • "What kinds of teams do you want to report on? Top-scoring? Most veteran? Snappiest dressers?" Any and all. Eventually, I'd like to do this for a variety of teams. So I guess if I end up writing about too many in some, uhm, category then I'd want to make an effort to write about other kinds of teams for a while. If I'm only writing about one team per year, I bet it will be a while before that's an issue.

If that sounds like something that your team could stand, I am web+comment@lahosken.san-francisco.ca.us.


Puzzle Hunts are Everywhere: Dim Memories of GC Summit 2008

The lovely Just Passing Through put together a fun & educational event last night: a GC Summit. Folks who had run Games and/or were considering running Games showed up to eat, talk, and watch informative lectures. Should we call it GC Summit 2008, seeing as how the previous one was GC Summit 2007? Sure, let's say that.

I bet that excellent videos of the lectures will appear on YouTube soon, thanks to Curtis (and maybe thanks to others... Curtis was working the camera... anyhow...). But the conversations before and after were good, too. Unfortunately, I didn't bring along my little audio recorder, so all I've got is a few snippets that lodged in my unreliable-narrator brain.

  • On the car ride over, an idea batted around: open-source software that people write for Games. Not just the little web-crawlers and such. But the software that GC writes to track teams' progress, handle pre-game... more complicated things. This idea came up a couple of times during the Summit itself.
  • It is now public knowledge that Greg et al. are thinking of a game in early November, but They have not committed, good grief people calm down.
  • Casey Jade Holman, age 1.5-ish, has learned to say "puzzle." And she kind of chuckles when she says it. I mean, she says other stuff, too. I don't want you to think that she's growing up warped or something. I'm just pointing out that she seems to like saying the word "puzzle," is all.
  • John Owens' good news is that he got tenure. John Owens' bad news is that he didn't receive an invitation to the "Back to Basics" game. G.C. sent physical invitations by post; so you were much more likely to get an invite if GC knew your address. Teams that did receive an invite received two, so they could pass one along to another deserving team. No-one passed one to John. When he said this, I thought back to when Alexandra said that Team Mystic Fish had an extra invite: I had just naturally thought Which less-connected team needs our help? Which of them could we pass that to? Saying "Should we ask Advil if they want an invitation?" is kind of like saying "I hear that the King of Sweden is coming to San Francisco; I wonder if he has a place to stay; should we call up and offer him a spot on the couch?" But if everyone thinks that way, then the King of Sweden ends up... I don't know where I'm going with that simile. But the upshot is... So Team Advil won't play; John will play with the Scoobies.
  • DeeAnn talked about how to choose locations for The Game: not too long a drive between locations, but the drive should be at least ten minutes. Why ten minutes? DeeAnn explained: There should be enough time for a player to eat half a sandwich. If you pop into the car, unwrap your sandwich, and then boom you have to get back out of the car again, then you're stuck re-wrapping your sandwich and you're grumpy. Rich Bragg, hale and hearty, had doubts: Half a sandwich...so that's like 30 seconds, right? Brent Holman suggested packing many teeny-tiny bite-sized sandwiches. Perhaps Gaming scientists will one day discover the sandwich molecule, the smallest possible particle one can point at and say "that's a sandwich". Once we know how long it takes to eat that, we will know the minimum possible distance between Game locations. Or we could stick with the 10-minute drive rule of thumb. Whatever works.
  • During the after-lecture conversation, Jan Chong's voice was kind of quiet. When two people started talking at once, if Jan was one of them, her voice got drowned out. It made me glad that she presented.
  • A couple of newbies showed up, yay! And somehow we didn't scare them too much. Alexandra recruited them for her Leisurely Stroll team. (Or recruited them for something.)
  • On the car ride back, one of the people in the car was the one who had, during Q&A after a lecture, asked about Seattle teams taking up space in SF Bay Area Games, but not hosting Games themselves. Other folks in the car took issue with this--Seattle doesn't have a monopoly on freeloading teams. If there's a group to kvetch about, it's teams that play plenty but don't host. I don't think anyone convinced anyone else of anything.

While I'm thinking about it... in terms of where to put an "open source" set of Game-ish programs. Some wiki-ish place to put files might be enough, might not need to set up an open-source project. As Jan points out, each GC is probably going to want to tweak enough behavior such that they might want to read old code, but might want to drastically re-write it. I think Yahoo! Groups has a place to dump files, but only readable by people in the group, so maybe not good to use that.


Book Report: Infrastructure

Wow, Infrastructure is a great book. You should acquire it and read it. (Here, by "read" I mean "look at the photos". But you can read it, too, if you like.)

It is photographs of "infrastructure": mines, mining equipment, steel mills, utility poles, electrical transformers, dams, power plants, smokestacks, cooling towers, insulators, water towers. There is text explaining what these things are, what they do, and how that determines what they look like. There is industry, there is engineering, there is design, there is beauty.

This book is dangerous to read; it warps your thinking. I keep looking up at utility poles instead of watching where I'm walking.

  • A Becherian typology of water towers, and then a collection of outliers, of playful water tower designs.
  • The process of modern corn milling and its uses
  • The mystery of the fake cattle guards
  • The Rowan Gorilla III
  • Why you pulverize coal before you burn it (with a photo of the gleaming machine that does it)
  • two smokestacks: one venting visible steam, one venting invisible poison gas. Which one do the neighbors complain about?
  • The evolution of the windmill
  • power line splicing sleeves, dampers, and especially Stockbridge dampers
  • Live-wire guys and hot sticks
  • fire department callboxes don't work like you think they do
  • GEO vs LEO communication satellites
  • Two paragraphs about Botts Dots "In California I once watched a road worker inspecting the Dots with a go-cart. Riding an inch or two off the road surface, he tested each Dot by banging it with a rubber mallet. The ones that rattled or moved were marked for replacement."
  • An illustrated list of bridge truss designs
  • Intermodal freight: history, modern practice.
  • The controversy of re-spraying landfill leachate

That's not all of the topics; that's just a few I picked out while flipping through pages.

This book is a survey; it doesn't dive into any of these topics in depth. But, wow, the breadth. I learned about all kinds of new things to obsess about. You will too. Go read.


Puzzle Hunts are Everywhere: The Elementarizer

Yes, it's another blog post about programming & puzzle-hunts. This one isn't a web crawler.

Dr Clue runs team-building puzzle hunts. Alexandra's done some puzzles for them and I've proofread a few. She mentioned that one of these puzzles was tricky to encode--there were fiddly steps, easy to get wrong. And she'd make more of these puzzles in the future. Dr. Clue re-uses puzzle types; this makes sense, since the participants from, say, last week's dentists convention won't participate in next week's gathering of Coca-Cola managers. (I generally can't talk about things I do for Dr. Clue for this reason; but I got permission to talk about this.) Thus, it made sense to automate encoding for this kind of puzzle.

Spoiler warning! If you're reading this because you're about to play in a Dr Clue game, you probably want to stop. I'm about to describe one of the puzzles.

In this type of puzzle, a team receives a series of numbers along with instructions. The instructions say to decode the numbers. They are atomic numbers; replace them with the chemical symbols for those elements. If a number is marked "reverse", then use the chemical symbol backwards. Then substitute some letters; e.g., the instructions might say if you see "P" replace that with "F". Then cross out some letters; e.g., the instructions might say to cross out all the "X"s. Why all of the substituting and crossing-out? Is it to make the puzzle tougher? Nope.
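
For the concretely minded, here's a minimal sketch of those decoding steps in Python. It is not an actual Dr Clue puzzle: the function name decode, the five-element SYMBOLS table, and the sample numbers are all made up for illustration. It also assumes that a negative number is how "reverse" gets marked, which is the convention the encoder's output uses. It should run under either Python 2 or Python 3.

```python
# Hypothetical, hand-picked subset of the periodic table
# (atomic number -> chemical symbol). A real puzzle would need all of it.
SYMBOLS = {2: "He", 13: "Al", 15: "P", 19: "K", 54: "Xe"}


def decode(numbers, substitutions, cross_out):
    """Decode atomic numbers the way a player would.

    numbers: list of ints; a negative number means "use the symbol backwards"
             (an assumption borrowed from the encoder's output format).
    substitutions: list of (old_letter, new_letter) pairs, applied in order.
    cross_out: string of letters to delete at the end.
    """
    pieces = []
    for n in numbers:
        symbol = SYMBOLS[abs(n)]
        if n < 0:
            symbol = symbol[::-1]  # reversed symbol, e.g. "Al" -> "lA"
        pieces.append(symbol.upper())
    text = "".join(pieces)
    # Substitute first, then cross out, matching the order in the instructions.
    for old, new in substitutions:
        text = text.replace(old, new)
    return "".join(c for c in text if c not in cross_out)


# P, Al reversed, K, Xe -> "PLAKXE" -> swap P for F -> cross out X
print(decode([15, -13, 19, 54], [("P", "F")], "X"))  # FLAKE
```

Notice that the "E" only shows up because Xe lost its "X", and the "F" only shows up because of the substitution.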

You probably can't encode a message as a straight-up series of chemical symbols. There aren't many symbols to work with; not many have an "E" in them. So you probably need to have players cross out some letters--keep track of those. And you might need to set up some substitutions; keep track of those. It gets tricky. So I wrote a program to help out, a quick-and-dirty Python script.

But that program didn't do much good on my home machine. And it might not do much good to just hand it to the Dr Clue folks: "Here's the program... Oh, but you need to install Python to use it. Whoopsie." But I have a web site. Everyone knows how to use a web site. So I wrapped that script up into a cgi script, a web form. Here is the messy result... uhm, I did warn you about the "quick-n-dirty" aspect, right?

#!/usr/local/bin/python

import cgi
import random

print "Content-Type:text/html\n"

# Stolen from wikipedia: Tab-separated. 
# Each line has Symbol, name, name origin, atomic number, mass, group, period
ELEMENT_DATA = '''Ac    Actinium        corruption of the Greek aktinos         89      [227][1]                7
Ag      Silver  Latin argentum  47      107.8682(2)[2]  11      5
Al      Aluminium (Aluminum)    Latin alumen    13      26.9815386(8)   13      3
   ...snipping out many many lines of data...
Zn      Zinc    German zin      30      65.409(4)       12      4
Zr      Zirconium       zircon  40      91.224(2)[2]    4       5'''

elem_num = {}
sym_dict = {}
known_failures = {}

# ReverseString("Able") returns "elbA"
def ReverseString(s):
  return s[::-1]

def MaybeAddToDict(tweaked_sym, sym):
  if tweaked_sym in sym_dict: 
    # If sym_dict['n'] already contains the nice short 'N', 
    # then don't add sym_dict['N'] <- 'Na'.  Short is nice.
    # Since we cycle through in order from shortest to longest,
    # just check the first element:
    if len(sym) > len(sym_dict[tweaked_sym][0]): return
  else:
    sym_dict[tweaked_sym] = []
  sym_dict[tweaked_sym].append(sym)

# Initialize our dictionary of letters->symbols.  Pass in message to be encoded.
# (We need the message because: if our message contains no "X", then "Xe" is 
# a valid symbol for encoding "E".)
def InitDicts(message):
  elem_num.clear()
  sym_dict.clear()
  known_failures.clear()
  for sym_len in [1, 2, 3]:
    for line in ELEMENT_DATA.splitlines():
      fields = [s.strip() for s in line.split('\t')]
      (sym, name, origin, number, mass, group, period) = fields
      if not len(sym) == sym_len: continue
      elem_num[ReverseString(sym)] = '-' + number
      elem_num[sym] = number
      clean_sym = sym.lower()
      tweaked_sym = ''.join([c for c in clean_sym if c in message])
      if not tweaked_sym: continue
      MaybeAddToDict(tweaked_sym, sym)
      if len(tweaked_sym) > 1:
        MaybeAddToDict(ReverseString(tweaked_sym), ReverseString(sym))

# We tokenize the message by recursively calling this function.  The call
# stack to tokenize "heal" would look like
#  HelperFunc("heal", "")
#    HelperFunc("al", ".he")
#      HelperFunc("", ".he.al")
def HelperFunc(todo, sofar):
  if todo in known_failures: return (False, sofar, known_failures[todo])
  if not len(todo): return(True, sofar, '') # cool, we finished
  bottleneck = ''
  for maybe_len in [3, 2, 1]: # try to use the next three letters. or 2. or 1.
    maybe = todo[:maybe_len]  
    if maybe in sym_dict:
      (success, tokenization, bottleneck) = HelperFunc(todo[maybe_len:], '.'.join([sofar, maybe]))
      if success: return(True, tokenization, '')
  if not bottleneck:
    print todo, '#', sofar
    bottleneck = todo[:3]
  known_failures[todo] = bottleneck
  return (False, sofar, bottleneck)

def PrintRows(tokens, encoded_tokens):
  TOKENS_PER_ROW = 15
  while 1:
    if not tokens: break
    line_of_tokens = tokens[:TOKENS_PER_ROW]
    tokens = tokens[TOKENS_PER_ROW:]
    line_of_codes =  encoded_tokens[:TOKENS_PER_ROW]
    encoded_tokens = encoded_tokens[TOKENS_PER_ROW:]

    for token in line_of_tokens: print "%4s" % token,
    print
    for code in line_of_codes: print "%4s" % code,
    print
    for code in line_of_codes: print "%4s" % elem_num[code],
    print
    print

def ReportSuccess(message, tokenization, substs):
  print "<p>Found an encoding."
  print "<p>(If you try again, you might get a slightly different encoding."
  print "    We use some randomness.)"
  tokenization = tokenization[1:]
  tokens = tokenization.split('.')

  message_with_subst = message
  for subst in substs:
    message_with_subst = message_with_subst.replace(subst[1], subst[0])

  letters_to_cross_out = ''

  encoded_tokens = []
  for token in tokens:
    for count in range(0, 10):
      code = random.choice(sym_dict[token])
      if elem_num[code].startswith('-'): continue
      if not [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]: break
    new_nots = [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]
    if new_nots: 
      letters_to_cross_out += ''.join(new_nots)

    encoded_tokens.append(code)

  # Alphabetize the cross-out letters (without shadowing the builtin sorted).
  letters_to_cross_out = ''.join(sorted(letters_to_cross_out))
  print '<p style="font-weight: bold;">Letters to cross out: %s ' % letters_to_cross_out.upper()
  print

  if substs:
    print '''<p><b>Used some substitutions.</b>
             (Double-check these: There might be "transitives" you can
              collapse.  For example "('z', 'g'), ('q', 'z')" really
              means "substitute Q for G") <b>%s</b> ''' % substs

    print "<p>Message w/subst: %s " % message_with_subst

  print "<pre>"
  PrintRows(tokens, encoded_tokens)
  print "</pre>"

  print "<p>Just the numbers (&quot;-12&quot; means 12 reverse)<br><b>"
  print ' '.join([elem_num[elem] for elem in encoded_tokens])
  print "</b>"

def ReportFailure(message):
  print '<p style="color:red;">Failed to elementarize message'
  print '<p>General hints: Q, X, and J are generally difficult to encode'
  

def GenerateForm(message):
  message = cgi.escape(message)
  message = message.replace('"', "&quot;")
  print '''
  <form action="elemental_encoding.py">
  <input name="msg" value="%s" size="100">  <input type="submit">
  </form>
''' % message

def GenerateHead(message):
  KOSHER_CHARS = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
  title = ''.join([c for c in message if c in KOSHER_CHARS])
  print '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
 <head>
  <title>Elementarizing</title>
 </head>

 <body>
 <h1>Elementarizing</h1>
''' 

def GenerateFoot():
  print '''
</body>
</html>
'''

def AttemptTokenizationHelper(message):
  InitDicts(message)

  print "<p>Attempting to elementarize message &quot;%s&quot;" % message
  print "<pre>"
  (success, tokenization, bottleneck) = HelperFunc(message, '')
  print "</pre>"
  print "<p>Finished elementarizing."
  return (success, tokenization, bottleneck)

def AttemptTokenization(message):
  substs = []
  for attempts in range(0, 20):
    (success, tokenization, bottleneck) = AttemptTokenizationHelper(message)
    if success:
      return (True, tokenization, substs)

    print '<p style="color: red;">Tokenization attempt failed.'
    print '<p>We think we had trouble with: <i>%s</i>' % bottleneck
    subst_candidates = [c for c in 'abcdefghijklmnopqrstuvwxyz' if not c in message]
    if not subst_candidates:
      print "<p>Can't substitute: already using all letters."
      return (False, tokenization, substs)
    swap_in = random.choice(subst_candidates)
    if swap_in in "jqxz": 
      swap_in = random.choice(subst_candidates)
    if swap_in in "jqx": 
      swap_in = random.choice(subst_candidates)
    if swap_in in "jx": 
      swap_in = random.choice(subst_candidates)
    swap_out = bottleneck[0]
    if 'x' in message and random.random() > 0.5:
      swap_out = 'x'
    message = message.replace(swap_out, swap_in)
    substs.insert(0, (swap_in, swap_out))
    print "<p>Let's try a substitution: %s for %s" % (swap_in, swap_out)
        
  print '<p style="color: red;">Too many Tokenization attempts failed. I give up'
  return (False, tokenization, [])
      

def Main():
  form = cgi.FieldStorage()

  if "msg" in form:
    message = form["msg"].value
  else:
    message = "Find the frog statue in the lobby."

  GenerateHead(message)

  GenerateForm(message)

  message = ''.join([c for c in message if c.isalpha()])
  message = message.lower()

  (success, tokenization, substs) = AttemptTokenization(message)

  if success:
    ReportSuccess(message, tokenization, substs)
  else:
    ReportFailure(message)

  GenerateFoot()


if __name__ == "__main__":
  Main()

Wow, I'm kind of embarrassed by this script now. To get the mapping from chemical symbols to numbers, I just pasted in a blob of data--most of which I don't use--from Wikipedia and parse it each time the script runs; it would be more elegant to start with just the right data. This program does a lot of extra work, re-computing things it's already figured out. It could be faster... but it takes a few seconds, and maybe that's fast enough. It's messy... but it's a few months after I originally wrote it, and I can look at it and figure out how it works; that's better than I can say for my old Perl scripts.


Puzzle Hunts are Everywhere: an elegant Mastermind Crawler

Last time, I wrote about a brute force web crawler. This time, I'm writing about an elegant web crawler. As you would expect from elegant code, I didn't write it.

The Pirates BATH game had a pregame website. Teams could log in to the web site. There was a web form which I'd programmed before I'd dropped out of Game Control. This web form allowed teams to "search for treasure": enter a string of text. Game Control gave them some strings of text that they could enter: entering one of those into the web form yielded puzzles. When a team solved the puzzle, the answer was a phrase: entering the phrase into the web form yielded a hint which would be useful during the upcoming game.

If they entered text that wasn't a puzzle and wasn't an answer, they were told that they'd found nothing. And if they paid attention, they also noticed some black dots, some white dots, and some xs. These were a "Mastermind" puzzle. If they entered a nonsense phrase, a program figured out which "useful" word was closest; it would then display one white dot for each letter in the correct place; a black dot for each correct letter in the wrong place; an X for each incorrect letter. So if "BELOW" was a word and someone entered "BLOW", they'd see a white dot (for the B), three black dots (for L, O, and W), and an X (for the E).
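The dot-scoring rule can be sketched in a few lines of Python 3. This is only an illustration, not the code the Pirates BATH site ran: the name `score_guess` is made up, repeated letters are handled naively, and the scoring walks the target word's letters because that is what reproduces the BELOW/BLOW example.

```python
def score_guess(target, guess):
    """Return (white, black, x) dot counts for guess against target.

    Hypothetical scoring, walked from the target word's side:
      white = target letter matched in the same position of the guess
      black = target letter present in the guess, but somewhere else
      x     = target letter missing from the guess
    Repeated letters get no special treatment.
    """
    target = target.lower()
    guess = guess.lower()
    white = black = x = 0
    for position, letter in enumerate(target):
        if position < len(guess) and guess[position] == letter:
            white += 1
        elif letter in guess:
            black += 1
        else:
            x += 1
    return (white, black, x)


print(score_guess("BELOW", "BLOW"))  # (1, 3, 1)
```

A real server would also need the "which useful word is closest" step; that part is left out here.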

This was the way to find one game hint: no puzzles solved to the correct word for this hint. But four puzzles gave words that didn't actually yield hints--but instead were just near to the word to enter for this special hint.

What if a team just tried to guess every possible text string? They could guess A B C ... Y Z AA AB AC ... ZY ZZ AAA AAB AAC ... Of course, that would take a long time. It would probably take less time to just solve the puzzles.

So I was kind of surprised when my pager started buzzing one day: BATH Game Control was sending me messages: Team Scoobies had set up a bot to crawl the server! The Scoobies had found puzzles that they couldn't have found!

I looked over the logs. There was a program crawling the system, but the Scoobies weren't running it. Team Blood was running it. The bot was not brute-forcibly checking every possible text string. It was playing Mastermind!

It would guess "A". If it got back a white dot, it knew that at least one word started with A. (A white dot meant right letter in right place.) Next it would try AA AB AC AD AE ... AZ. If AA returned just one white dot (not two), then the bot knew no words started with AA (e.g., no word was AARDVARK). So it never tried AAA AAB AAC... Thus, it didn't need to check so many things. Thus, elegance.
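To see how much guessing the pruning saves, here is a small offline simulation in Python 3. It never touches a web server: `white_dots` is an assumed stand-in for the site's scoring (it reports the best count of right-letter-right-place matches against any one word), and the two-word list is invented.

```python
import string


def white_dots(query, words):
    """Most letters of query that sit in the right place for any one word."""
    return max(sum(1 for a, b in zip(query, word) if a == b) for word in words)


def crawl(words, prefix="", found=None, stats=None):
    """Depth-first prefix search; only extends prefixes that score all white."""
    if found is None:
        found = []
    if stats is None:
        stats = {"queries": 0}
    for letter in string.ascii_lowercase:
        query = prefix + letter
        stats["queries"] += 1
        if white_dots(query, words) == len(query):
            if query in words:
                found.append(query)
            crawl(words, query, found, stats)
    return found, stats["queries"]


secret_words = ["below", "bet"]  # made-up word list
found, queries = crawl(secret_words)
print(found, queries)  # ['below', 'bet'] 182
```

Trying every string of one to five letters would take 12,356,630 guesses; following only the all-white prefixes takes 182 for this toy list.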

When I reported my findings to Game Control, they decided that this thing must be stopped. Though it was elegant, what if it allowed the team to bypass puzzles? Game Control figured that this would be unfair.

Hmm, how to stop the bot without disrupting other teams? How did the bot work? Team Blood was running it. Rich Bragg captained Team Blood. I worked at the same company as Rich. Maybe he'd written this program while at work? And maybe he'd left the program somewhere where I could find it? I thought about it: If I were Rich and I'd written this program at work, where would I have put the source code? I looked there: no program. Then I tried my second guess and saw a file: piratebath.py. Bingo. It was a web crawler, a very specialized web crawler.

#!/usr/bin/python2.4

import cookielib
import re
import time
import urllib2

def Login():
   print "Logging in..."
   cj = cookielib.CookieJar()
   opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
   opener.open("http://www.piratesbath.com/log_in.php",
               "access_login=blood&access_password=bythepint")
   return opener

def NumMatches(html_data, substring):
   matches = re.findall(substring, html_data)
   if not matches:
       return 0   
   return len(matches)

def NumLettersCorrect(html_data):
   return NumMatches(html_data, "dot_white.gif")

def FoundTreasure(html_data):
   return NumMatches(html_data, "No treasure found.") == 0

def SearchOne(opener, results, query):
   data = opener.open("http://www.piratesbath.com/search.php?q=" +
                      query).read()
   letters_correct = NumLettersCorrect(data)
   print "Query:", query, "had", letters_correct, "of", len(query), "letters"
   all_correct = letters_correct == len(query)
   if all_correct and FoundTreasure(data):
       print "Found:", query              
       results.append(query)
   return all_correct

def SearchAll(opener, results, query_prefix = ''):
   alphabet = list('abcdefghijklmnopqrstuvwxyz')
   for letter in alphabet:
       if SearchOne(opener, results, query_prefix + letter):
           SearchAll(opener, results, query_prefix + letter)

def Run(query_prefix = ''):
   opener = Login()
   results = []
   SearchAll(opener, results, query_prefix)
   print "Results: ", len(results), "words starting with '%s'" % query_prefix
   for word in results:
       print word      

Run()

Aha, the code was checking for text in the page: dot_white.gif and No treasure found. If I just added some visible-to-bots-but-invisible-to-humans text like that, I could fool the bot into mis-counting white dots or what-have-you. So that's what I did. (Security-minded folks in the audience might say: uhm, but what about stopping the general case of bots? Yeah, I set up code for that too, but wanted to let Game Control configure it to say how much guessing was "too much", and that took a while. Fooling Rich's bot--that was a quick-n-dirty fix.)
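A rough Python 3 sketch of why that kind of decoy works. The counting function mirrors the bot's approach of searching the raw HTML for an image name; the HTML comment is an assumed stand-in, since the exact hidden text isn't shown here.

```python
import re


def count_white_dots(html_data):
    """Count the way the bot did: every occurrence of the image name."""
    return len(re.findall("dot_white.gif", html_data))


honest_page = '<img src="dot_white.gif"><img src="dot_black.gif">'
# Hypothetical decoy: an HTML comment that browsers never render.
decoy_page = honest_page + "<!-- dot_white.gif dot_white.gif -->"

print(count_white_dots(honest_page))  # 1
print(count_white_dots(decoy_page))   # 3
```

A person looking at the rendered page still sees one white dot; a program counting substrings sees three.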

(I notice that this code imports the "time" module, but doesn't use it. I wonder if an earlier version of code politely "slept" a little between queries--but maybe Rich figured out that the server was waiting a second between responding to a team's queries anyhow, and that the sleep was thus not so useful...)

Rich noticed when his bot started generating garbage results. He mailed Game Control to make sure there were no hard feelings. Game Control asked him to stop running it, and he did. He said that this script was basically another monitor: it alerted the team to the presence of new puzzles; thus no-one had to go re-check the piratesbath.com web site each day.

In hindsight, when I programmed that web form, we should have used it only for entering answers, not for getting puzzles. We should have used some other way to distribute puzzles. Thus, a team could monitor that to look for puzzles and Game Control wouldn't need to panic that someone was bypassing the puzzles to get the answers.


Puzzle Hunts are Everywhere: Brute Force Web Quiz Crawler

It's another blog post about how web programming skillz can aid in game-ish activities.

A couple of years ago, Team XX-Rated hosted the Paparazzi Game. I was sorry that illness made me miss the game itself. Fortunately, the illness didn't strike until after the pre-game, so I was able to participate in that. The pre-game gave me an excuse to use glitter glue. Also, there were puzzles.

The Online Dating Style Quiz Puzzle was pretty mysterious to me. It was a multiple-choice quiz; you could submit a set of choices and get a grade on your dating style. Later on, my team-mates showed me the clever way to solve this puzzle. But I didn't spot that on my own. I wasn't sure that I could spot cleverness in this puzzle on my own. I could tell it had some references to tabloid celebrities. But I knew almost nothing about tabloid celebrities.

I tried filling in the quiz a few times. Each time, I just got the grade "Honestly, you're pretty lame and none of us on staff would want to date you. Maybe you should re-read the questions.". But one time, I got a different grade: "You're exciting, but not so much to scare your partner away. Count on the questions in this quiz to lead you in the right direction." I hoped that might inspire a clever approach, but it didn't. But by now I was pretty sure that the grade was based solely on my multiple-choice choices: there wasn't something weird going on based on my history of past attempts, timing, or other factors.

So I used brute force: try all possible combinations of choices. Or, rather, try many of them. I wrote a little program that would cycle through choices and log those that got a non-"lame" grade. To avoid hogging the server, the program would "sleep" for a second between queries. Since there were many, many combinations of choices to consider, the program would take a long time: I planned to let it run overnight but it still wouldn't have time to try every choice. But I didn't especially want it to finish. I wanted enough data back so that I could understand the problem better, maybe figure out the clever bit.
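A quick back-of-the-envelope check in Python 3 shows why one night wasn't enough. It assumes 11 questions with 3 choices each (the shape of the reconstructed script) and counts only the one-second sleep, not network time.

```python
CHOICES_PER_QUESTION = 3
QUESTIONS = 11
SECONDS_PER_QUERY = 1  # the sleep alone; network time is ignored

combinations = CHOICES_PER_QUESTION ** QUESTIONS
hours = combinations * SECONDS_PER_QUERY / 3600.0

print(combinations)     # 177147
print(round(hours, 1))  # 49.2
```

So a full sweep needs a bit over two days of sleeping, before counting the time each request takes.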

So, the script. This is not that script; I lost track of that script. This script is a reconstruction; it's probably similar to that script.

import time
import urllib

ABC = "abc"
QBC = "qbc" # question 5 has different choices

QUIZ_URL = "http://www.xx-rated.org/xxtraonline/quizzes.php"

LOG_PATH = "/home/lahosken/log.txt"

q = [None,'','','','','','','','','','', '']

# Don't log the whole page; it's mostly boilerplate.  Just log the interesting part.
def GetUsefulParts(s):
  (before, marker, after) = s.partition('<!-- InstanceBeginEditable name="content" -->')
  if after: 
    s = after
  else:
    s = before
  (s, _, _) = s.partition('<!-- InstanceEndEditable')
  retval = ""
  for line in s.splitlines():
    line = line.strip()
    if not line: continue
    if line == '<p><img src="images/quizzes.gif" width="435" height="90"></p>' : continue
    if line == '<div align="center">': continue
    if line == """<h1><span class="style8">What's your dating style?</span></h1>""": continue
    if line == '<font size="+2" color="#000000">': continue
    if line == '</font>': continue
    if line == '<h3><a href="quizzes.php">Try again!</a></h3>': continue
    retval = retval + line
  return retval

for q[1] in ABC:
  for q[2] in ABC:
    for q[3] in ABC:
      for q[4] in ABC:
        for q[5] in QBC: # careful
          for q[6] in ABC:
            for q[7] in ABC:
              for q[8] in ABC:
                for q[9] in ABC:
                  for q[10] in ABC:
                    for q[11] in ABC:
                      time.sleep(1)
                      qlist = [('q'+str(ix), q[ix]) for ix in range(1,12)]
                      qlist.append(('Submit', "What's my style?"))
                      qlist.append(('force', 'brute'))
                      post_data = urllib.urlencode(qlist)
                      page_contents = urllib.urlopen(QUIZ_URL, post_data).read()

                      if page_contents.find("you're pretty lame") > -1: continue

                      log_file = open(LOG_PATH, 'a')
                      log_file.write(post_data)
                      log_file.write('\t')
                      log_file.write(GetUsefulParts(page_contents))
                      log_file.write('\n')
                      log_file.close()

The next morning I had a lot of data: choices which had produced non-"lame" grades.

q1=a&q2=a&q3=a&q4=b&q5=c&q6=b&q7=c&q8=c&q9=c&q10=b&q11=a&Submit=What%27s+my+style%3F&force=brute <p>You're exciting, but not so much to scare your partner away. Count on the questions in this quiz to lead you in the right direction. </p>
q1=a&q2=a&q3=a&q4=c&q5=b&q6=b&q7=a&q8=a&q9=c&q10=b&q11=a&Submit=What%27s+my+style%3F&force=brute <p>You're pretty spicy although not as much as you could be. You may want to reconsider some of your choices on the next date.</p>
q1=a&q2=a&q3=a&q4=c&q5=b&q6=b&q7=a&q8=b&q9=c&q10=b&q11=a&Submit=What%27s+my+style%3F&force=brute <p>You're exciting, but not so much to scare your partner away. Count on the questions in this quiz to lead you in the right direction. </p>
q1=a&q2=a&q3=a&q4=c&q5=b&q6=b&q7=a&q8=c&q9=c&q10=b&q11=a&Submit=What%27s+my+style%3F&force=brute <p>You're exciting, but not so much to scare your partner away. Count on the questions in this quiz to lead you in the right direction. </p>

This was interesting. That "pretty spicy" grade suggested that guess was pretty close. Also, I saw that the (a) choice for question 11 showed up in many good answers, as did the (b) choice for 10, the (c) choice for 9. Probably those were correct choices. (The recurring (a) choice for question 1 is less exciting: because of the way I'd ordered the guesses, all guesses that night had (a) for question 1.) I stared at those correct answers for a while, trying to see the clever pattern. That didn't get me very far.

So I used this information to tweak the brute force script. I changed some of the for loops so that instead of considering each choice (a, b, or c) it only considered the correct choice. (Yeah, there were more elegant ways I could have coded it; this was an easy edit.)

                for q[9] in 'c': # was ABC:
                  for q[10] in 'b': # was ABC:
                    for q[11] in 'a': # was ABC:

...and re-ran the script. Now it was wasting less time on bad choices for those questions. I let it run a while longer, looked at the output again, used it to narrow down a few more choices. Ran it again. The next time I looked through the logs, there was another kind of grade: one that let me know that the script had found the perfect answer.
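One of those "more elegant ways" might look like this Python 3 sketch, which swaps the eleven nested loops for `itertools.product`. The `choices` dict and `all_guesses` are invented names; narrowing a question means editing one dict entry instead of one `for` line.

```python
import itertools

ABC = "abc"
QBC = "qbc"  # question 5 has different choices

# One entry per question; narrow an entry once a choice looks settled.
choices = {n: ABC for n in range(1, 12)}
choices[5] = QBC
choices[9] = "c"
choices[10] = "b"
choices[11] = "a"


def all_guesses(choices):
    """Yield one {question number: choice} dict per combination."""
    numbers = sorted(choices)
    for combo in itertools.product(*(choices[n] for n in numbers)):
        yield dict(zip(numbers, combo))


guesses = list(all_guesses(choices))
print(len(guesses))  # 6561
```

With three questions pinned down, the search drops from 177,147 combinations to 6,561.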

No doubt it would have been more satisfying to solve this puzzle with cleverness than by brute force. But brute force can be fun, too.


Puzzle Hunts are Everywhere: Simple Website Monitor

Waiting for the bus, Jonas asked me: "Why did you start beeping during that tech talk?"

People at work occasionally start beeping. We're an internet company with many servers. When servers have problems, system administrators' pagers start going off. But I'm not a system administrator. I'm a technical writer. When I go on vacation, I don't tell people "In case of emergency, you can reach me at ____", I write "In case of a documentation emergency (ha ha), you can reach me at _____". And it's funny. At least, it was funny the first time.

And the people at this tech talk probably weren't likely to get paged. It was a tech talk on an open-source library of computational geometry functions. You don't really see these people getting paged with... I dunno, some errant line segment getting into a data set, coincidentally totally vertical, its infinite slope causing numbers to spin out of whack or...

I don't know where I'm going with that. That doesn't happen.

I 'fessed up. That pager signal didn't come from a work system. I'd set up a computer program to monitor coed astronomy's web site. They were going to host a puzzle hunt and their web site would announce when sign-ups were possible. I wanted to know when that happened: if their game didn't have many slots for teams, I wanted to sign up before those slots filled up.

It's pretty easy to set up this kind of monitoring if you have a Unix machine on the internet. (I bet that very few people will be interested in this blog post. Half of the bay area puzzlehunters are software developers who will wonder why I'm describing something so obvious; the rest don't program and are about to get scared away when I show them source code. But maybe I'll show them that, if they're going to get into programming, this is a pretty achievable task to take on.)

On a Unix machine, you can set up a "cron job". You can tell the machine to run a program once every few minutes. (You can set up cron jobs to run at strange times. I set up this cron job to run on prime-numbered minute offsets from the hour--because I was in a silly mood.)
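For the curious, a prime-minute schedule could be generated with a few lines of Python 3. This is a sketch, not the actual crontab entry: the script path is a placeholder, and it assumes the usual five-field crontab layout (minute, hour, day of month, month, day of week) with a comma-separated list in the minute field.

```python
def is_prime(n):
    """Trial division; plenty for numbers below 60."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


prime_minutes = [m for m in range(60) if is_prime(m)]
minute_field = ",".join(str(m) for m in prime_minutes)

# The script path is a placeholder, not the real one.
crontab_line = minute_field + " * * * * python /path/to/monitor.py"
print(crontab_line)
```

That works out to seventeen runs an hour, bunched toward the start of the hour because small primes are close together.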

What program did I set up? I set up a simple python script. This script checked coed astronomy's web page and compared its contents to a copy I'd downloaded earlier. If it noticed a change, it mailed my pager. Like I said, this script is simple, if you know what you're looking for. Python is a nice language to use because, for any given task, someone has probably written a library of functions to help you. The bad news is that it can take a while to find the library that you want. In this case, I wanted urllib2 to fetch the web page contents.

import os
import urllib2

# download the page
pagecontents = urllib2.urlopen("http://coedastronomy.org/sf/").read() 

# compare it to previously-saved file, "golden" copy of the page
# contents which I downloaded earlier.
goldenfile = open("/home/lahosken/golden.html")
goldencontents = goldenfile.read()

# If there's a difference, OMG page me by sending mail to my pager
if goldencontents != pagecontents:
  os.popen("mail page@lahosken.san-francisco.ca.us -s IT_IS_HAPPENING < /dev/null")

This script watches one page. I've done stranger things to watch a web site, running wget in spider mode and then recursively diffing the resulting directory to a previously-generated "golden" directory tree. And there are stranger things.
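The "recursively diff against a golden directory tree" idea can also be sketched in Python 3 with the standard `filecmp` module instead of a command-line diff. The function name and directory names are hypothetical, and the mirroring step (wget or otherwise) is left out. One caveat: `dircmp` compares files shallowly, so two files with identical size and modification time are assumed equal without being read.

```python
import filecmp


def changed_paths(golden_dir, fresh_dir):
    """List relative paths that were added, removed, or differ."""
    changes = []

    def walk(cmp, prefix):
        for name in cmp.left_only:
            changes.append(("removed", prefix + name))
        for name in cmp.right_only:
            changes.append(("added", prefix + name))
        for name in cmp.diff_files:
            changes.append(("changed", prefix + name))
        # subdirs maps each directory present on both sides to its own dircmp
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix + name + "/")

    walk(filecmp.dircmp(golden_dir, fresh_dir), "")
    return sorted(changes)
```

Calling `changed_paths("golden", "fresh")` on two mirrored copies returns an empty list when nothing has changed, which makes "page me if the list is non-empty" a one-line check.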

Anyhow, my pager went off during a tech talk. And it kept going off because I was so busy signing up for the game that I didn't immediately shut the script down. But I eventually did. Hopefully, everyone at the tech talk thought I was a pager-wearing badass entrusted to rescue foundering servers. But Jonas wasn't fooled. Maybe none of them were.


Site: Gratuitous Photos of 17th Street

(Am I the only one who checked the coedastronomy site in case they meant March 3 Greenwich time?)

I can post an admission that I'm half-done with a handful of projects, but I don't have to like it. I finish projects! Or I give them up! (Right now, I am making a "chop" hand gesture to emphasize my willingness to give up on stalled projects.) So after posting that blog item, I forced myself to get my act together. (Right now I am gritting and baring my teeth to illustrate my renewed strength of purpose.)

Thus: a page of photos of San Francisco's 17th Street. A couple of weeks back, when it stopped raining, I walked the length of 17th Street. I snapped a bunch of photos. And then for two weeks, I didn't get around to captioning/uploading them. Instead, I just groused about not enjoying being in the middle of projects. Today, I finally finished slapping some captions on. Allez-oupload! I will now stop worrying about those photos.

Last night, I wasn't doing photos. Last night, I finished off my Erlang experiment. I was trying to learn about Erlang concurrency. And sure enough, concurrency is indeed easy with Erlang. Here's my program's, uhm, central dispatch control loop thingy:

queue(Migrant) ->
    receive
        TradedList ->
            RanList = cull_unhealthy(TradedList),
            {NewMigrant, NewList} = judge(Migrant, RanList),
            spawn(critter, run_n_trades, [self(),
                clear_state(NewList), 500]),
            queue(NewMigrant)
    end.

That "spawn" spawns a new thread, a thread that executes a function called run_n_trades. That "receive" receives a message.

run_n_trades(Queue_PID, List, 0) ->
    Queue_PID ! List;
run_n_trades(Queue_PID, List, N) ->
    run_n_trades(Queue_PID, run_trades(List), N-1).

It's not obvious from this code, but run_n_trades does a lot of data crunching and then sends the results back to the, uhm, central dispatch control loop thingy. (It's that mysterious Queue_PID ! List blob.) That's sending back the data structure that the queue will receive. How does this message-passing benefit me? Well, I actually had two threads doing big data crunching at the same time. Each one would crunch, crunch, crunch, then send results back to the queue. The queue could combine their answers. (In this case, the "combine" was allowing one of the genetic-algorithm "critters" to migrate from one batch of critters to... whichever batch was next passed back to the queue. (But this parenthetical remark probably doesn't make much sense unless you're looking at the whole program, which isn't really interesting enough to be worth it.))

What did I learn from all this?

  • Erlang is not so bad. I am not fond of languages designed by people in love with recursion. Among those, Erlang is not so frustrating as many.
  • Erlang concurrency is indeed easy. If I were writing a program whose main challenge was coordinating many threads, watching a little data on each thread, I'd be glad to have Erlang in my bag of tricks. Because those problems can be really hard, and Erlang has nice language structures for these.
  • Erlang was not a great choice for my sample program's purpose: yet another genetic-algorithm prisoner's dilemma fun-fest. Part of Erlang's safety comes from discouraging you from changing the value of variables. Instead, you're supposed to create new variables whose values don't change. (Is "variables" even the right word?) So if you're moving around little structs, there's some extra copies but you don't sweat it much. But if you make many tiny changes to a big array of structs... Erlang isn't a great choice.
  • But it does suggest some ways to make thread-safe programming safer in, you know, real programming languages like C++. Maybe you make a rule saying that cross-thread messages pass in copies, not originals. That's one extra copy, but one extra copy maybe isn't so bad. And at least everyone knows which thread is responsible for which data structure.
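The "messages pass in copies, not originals" rule from the last bullet can be sketched outside Erlang, too. This version is in Python 3 rather than C++, to match the Python used elsewhere in these posts. The names are invented and the "crunching" is a stand-in loop; the point is only that each worker owns its batch and the dispatcher only ever sees copies.

```python
import copy
import queue
import threading


def crunch(batch, results):
    """Worker: mutate a private batch, then send back a copy of it."""
    for _ in range(500):
        batch["score"] += 1
    results.put(copy.deepcopy(batch))  # receiver never sees our original


results = queue.Queue()
batches = [{"name": "batch-a", "score": 0}, {"name": "batch-b", "score": 10}]
# Each worker gets its own copy, so no data structure is shared.
workers = [threading.Thread(target=crunch, args=(copy.deepcopy(b), results))
           for b in batches]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

received = sorted((results.get() for _ in workers), key=lambda b: b["name"])
print(received)
```

The extra `deepcopy` calls are the "one extra copy" cost; in exchange, no locks are needed around the batches themselves.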

Am I rambling? Sorry, I'm rambling. I'm just so happy that I have an excuse to stop thinking about Erlang now that I did what I set out to do. And I'm glad I finally uploaded those photos. So... I'm rambling. You shouldn't have to listen to me ramble. Here, go look at photos instead.


Not-Puzzle Hunts are Everywhere, but especially at Waller and Steiner

On my way back home from the library, I encountered a nicely-made suggestion box at the Northeast corner of Waller and Steiner. Signage encouraged passers-by to write suggestions on index cards and place them within the box. There were no index cards left, but then there was nothing preventing you from writing suggestions on other things. There were two slots for suggestions: one labeled "Public", one "Private". Each slot was on a drawer; the private one was locked, the public one wasn't. I read the public suggestions. Many of them suggested road and traffic improvements. Did their authors think that this was a municipal suggestion box?

A label on the side of the box read "sfzero.org" and thus I found out that this box is part of a game and that one can read suggestions from boxes around the world. I have not joined this game; my shoulders already strain beneath the weight of a few half-done projects. And yet... and yet, after some reading in their web site, I believe that this game has merit. And if you're in San Francisco swinging past the Lower Haight any time soon, I suggest you take a few minutes to visit that suggestion box. (I think I got that intersection right. I'm sure it was on Waller, close to Fillmore.)


Book Report: On Food and Cooking

Here is the recipe I follow for tamales: 1. Remove two tamales from package. 2. Place in pot with steamer rack. 3. Place on high heat. 4. Get distracted by computer stuff, lose track of time. 5. When apartment is full of smoke, the tamales are done. Remove from heat and hold out window. 6. Remove tamales from corn husks, cover with salsa, and serve.

My cooking skills have atrophied in recent years.

I think that explains why I didn't finish reading McGee's On Food and Cooking. The parts that I read were pretty interesting! (Usually when I don't finish a book, it's because I was bored.) I read the chapter about dairy products. It discussed the history of humanity's use of animal milk. It was fascinating. It discussed the chemical transformations by which cream turns into whipped cream and other delights. That was fascinating, too.

But when I tried to remember these things a few hours later, they'd already leaked out of my brain. There were parts of my brain that were once dedicated to thinking about cooking, I'm sure. So I could, say, sit down to surf the web but a couple of neurons would keep track: has it been 15 minutes since you set that stuff cooking? Those neurons went away; I lose track of time. I recommend this book for most people, but it wasn't for me.

