100% Organically Farmed Software

The book The Mythical Man-Month pointed out the organic nature of software development in 1975

...The building metaphor has outlived its usefulness... If, as I believe, the conceptual structures we construct today are too complicated to be accurately specified in advance, and too complex to be built faultlessly, then we must take a radically different approach.

Let us turn to nature and study complexity in living things, instead of just the dead works of man. Here we find constructs whose complexities thrill us with awe. The brain alone is intricate beyond mapping, powerful beyond imagination, rich in diversity, self-protecting, and self-renewing. The secret is that it is grown, not built. So it must be with our software systems.

—Fred Brooks

The book The Pragmatic Programmer nicely compares software development to gardening in its section on Refactoring (in 1999, so maybe it's taking 25 years for the idea to take hold):

...Rather than construction, software is more like gardening—it is more organic than concrete. You plant many things in a garden according to an initial plan and conditions. Some thrive, others are destined to end up as compost. You may move plantings relative to each other to take advantage of the interplay of light and shadow, wind and rain. Overgrown plants get split or pruned, and colors that clash may get moved to more aesthetically pleasing locations. You pull weeds, and you fertilize plantings that are in need of some extra help. You constantly monitor the health of the garden, and make adjustments (to the soil, the plants, the layout) as needed.

—Andrew Hunt and David Thomas

Back when I programmed on an Apple ][, programming didn't feel organic. It felt solid. When my program ran, nothing else ran on that machine; my code was totally in control. There was no OS upgrade coming along next month to plan for. My programming language was in ROM, wouldn't change out from under me. Those programs I wrote were solid... solid and brittle.

Nowadays, we write software that bends instead of breaks. When things change, we bend our software to fit. But we don't always get it right. It fatigues. So we talk about an organic model, software that is meant to grow. We try to create systems that live. Occasionally, we get it right. Often it feels like we don't achieve living, flexible software—just something solid with a couple of hinges in it. We're still figuring this stuff out.

Labels: ,

Book Report: the Pragmatic Programmer

This book, The Pragmatic Programmer is difficult to find by searching, since there's also a series of books by that name. So maybe I'll give the full title here: The Pragmatic Programmer / from journeyman to master. That subtitle sounds pretty highfalutin', at odds with the "pragmatic" in the main part of the title. But it does fit with this book's approach: treating programming as craft, trying to give direction to some folks who want to hone their craft.

This book is a survey of "best practices". It doesn't go into much detail on any one of these practices. So... for example... If you picked up this book thinking "Command Line Interfaces are relics of a stupider age," then you're going to see a section of this book which exposes you to the idea that Command Line Interfaces are awesome--but it won't give you enough detail to reverse your preconceived notion.

I'm older than dirt and have been in some well-run engineering organizations. Thus, I have already been exposed to these practices. I feel kinda sorry for someone who has to learn about these things from a book. If you've been in an engineering group that has a nice testing framework you appreciate it—especially if you then go to another group that doesn't have one. But if you've never used a testing framework, and if you just have this book telling you that it's a darned convenient thing, and you're going to the trouble of setting it up... I dunno if a book would convince me to take the trouble.

The authors are technologically aware, and have a pretty good web site about the book, with a table of contents and a list of "tips". If you're interested in reading a survey of best practices and want to make sure that this book contains the best practices you want to learn about, that table of contents might be pretty interesting to you.

Labels: ,

Book Report: The Mythical Man-Month (leftover cheap joke)

Last week, I posted a rough draft of a study guide for The Mythical Man-Month. I left a cheap joke out of that study guide. That study guide was serious business and had no room for cheap jokes. So I'm posting this separately.

One part of the book discusses how to divide up a programming team's labor among a few people, letting the best programmer concentrate on programming, while offloading minutiae to others. They use the analogy of a surgical team:

A proposal by Harlan Mills offers a fresh and creative solution. Mills proposes that each segment of a large job be tackled by a team...organized like a surgical team... one does the cutting and the others give him every support that will enhance his effectiveness and productivity. ... Can it work? Who are the the anesthesiologists and nurses on a programming team, and how is the work divided?

Because I'm a technical writer, I was flattered that this proposed team has a role for a technical writer: the Editor.

The editor. The surgeon is responsible for generating the documentation—for maximum clarify he must write it... The editor, however, takes the draft or dictated manuscript produced by the surgeon and criticizes it, reworks it, provides it with references and bibliography, nurses it through several revisions, and oversees the mechanics of production.

Why mention a tech writer? Maybe Brooks was thinking about this because he was writing a book. I have another theory, though: he wanted to answer the question he raised earlier: Who is the anesthesiologist of the surgical team? Anyone who's tried to stay awake while reading my documentation can answer that question. Uhm, they can answer it after you wake them up again, I mean.

Labels: , ,

Book Report: The Mythical Man-Month (a Study Guide)

If this book report seems a little heavy on the questions? It's because it's the first draft of a study guide? For people reading the book? Oh man it's way too long? But hey give me a break, it's a first draft?

The Mythical Man-Month

This is a study guide for folks reading the Mythical Man-Month. When reading a book like this, it's useful to have some questions buzzing around in the back of your brain. In case you don't already have enough of those, this study guide provides a few extra.

This book is about managing large software projects. Fred Brooks wrote it in 1975; back then, there wasn't a big literature about such things. There hadn't been many such projects. Many of them had been debacles; nobody wanted to brag about them later. Actually, Fred Brooks had presided over something of a debacle himself--his project was famously late. This book is a post-mortem: why things went wrong, lessons learned, how we can (or why we can't) avoid similar stumbles.

Pick up a recent edition of the book--the 20th anniversary edition includes chapters at the end pointing out which parts of the original have [not] withstood the decades. E.g., the Waterfall Method of project planning isn't SOP anymore. In the preface, he also points out something important about software project management reports in general:

In preparing my retrospective and update of The Mythical Man-Month, I was struck by how few of the propositions asserted in it have been critiqued, proven, or disproven by ongoing software engineering research and experience.

As you read this book (or anything), always be on your guard for snake oil, untested assertions, and handwavery. Some techniques will help your team, some will harm it. Learning to recognize which are which is an important part of leadership.

Preface to the Original edition

Brooks gives us the whole book in a nutshell in the Preface.
I wanted to explain the quite different management experiences encountered in System/360 hardware development and OS/360 software development.

...

Briefly, I believe that large programming projects suffer management problems different from small ones, due to division of labor. I believe the critical need to be the preservation of the conceptual integrity of the product itself.

Got that? He's telling us that Management is important. He's saying that Leadership is important. He's saying that Design is important. 120 people produce no more than 12 do if they're not working towards the same goal. And unless you can help them all see where that goal is, they will not all work towards it. He's not going to teach you new algorithms, tools, none of that. This is people skills. He's reminding you that this stuff matters.

The Tar Pit

Brooks' OS/360 debacle was largely a schedule slip. Here, he points out that junior programmers are optimistic about how long it takes to implement part of a large system. They don't think about the (considerable) time it takes to provide the correctness and polish they'll need to be part of a real-world product. the (considerable) time to design and implement the interface between their system component and the rest of the system;

Try to remember your first projects working on larger software systems. It took longer to get things done than your quick-and-dirty hacks for school homework assignments, didn't they? Where did the time go?

As a rule of thumb, I estimate that a programming product costs at least three times as much as a debugged program with the same function... A programming system component costs at least three times as much as a stand-alone program of the same function.

If you're not sure whether a junior programmer has considered these things, their schedule guesstimate might be 9x optimistic. When you get an estimate from a co-worker, how can you find out whether they've allowed for correctness and API design?

In practice, actual (as opposed to formal) authority is acquired from the very momentum of accomplishment.

The Mythical Man-Month

Here, Brooks points out more things that go wrong with managing schedules. They're hard to estimate. They're hard to boost, too. This chapter gives us Brooks' Law: "Adding manpower to a late software project makes it later." When you add new people to a project, for the first few months, they slow you down as you teach them things and they bump into things. Only later when they're up to speed might they help your schedule. Unless they slow down trying to manage them.

Why would anyone ever think to throw people onto a project at the last minute? Maybe a clue is the context in which this book was written, IBM in 1975:

I wanted to explain the quite different management experiences encountered in System/360 hardware development and OS/360 software development.

Maybe back in 1975, most of IBM's experience was with hardware. If design went slowly, maybe there was a temptation to make up time with more people: run extra shifts at the manufacturing plant.

The need for intercommunication slows everyone down. Can we ease this by designing better interfaces? Can we design our software architectures so that not every engineer needs to pester every other engineer to learn every interface?

Gutless estimating It's bad enough that the junior engineers on your team underestimate how long it will take them to accomplish something. Sometimes an executive does, too. They lean on you, that's no fun. Something that happens at often at some companies: someone pre-announces an unrealistic date to the world. Do you have any stories like this? If not, ask some veteran programmer to tell you some. Maybe buy that veteran a drink first--they tend to be sad stories. In hindsight, could the teams have dodged these disasters?

The Surgical Team

This chapter explores the idea of a small programming team. Some parts of this idea have aged better than others.

Aged well: team of people with complementary roles and diverse skill sets.
Aged less well: some of the suggested roles now seem absurd.

Brooks thinks that every senior programmer needs a secretary. And an editor (tech writer) and another secretary for the tech writer. Brooks was a writer in a time of typewriters, large presses, and other awkward tools. Nowadays instead of giving the senior programmer so many people to manage correspondence and documentation, we have Gmail and the wiki. The "program clerk" role largely went away when revision control systems came along—a few admins can "clerk" the "programs" of many, many engineers.

Suppose you wanted to plan a "surgical team" for your organization in the modern day. What roles would it have? What assumptions do you need to make about how this team would fit in to the organization at large?

Aristocracy, Democracy, and System Design

If one person designs a system, that design captures only one person's knowledge. For a large system with many differrent pieces, some pieces' designs will be clunky.

If many, many people design a system, that system never gets designed.

Brooks praises Reims Cathedral:

The joy that stirs the beholder comes as much from the integrity of the design as from any particular excellences. As the guidebook tells, this integrity was achieved by the self-abnegation of eight generations of builders, each of whom sacrificed some of his ideas so that the whole might be of pure design.

Invoking a cathedral as metaphor might set of alarm bells in your head. Didn't ESR use that same metaphor in his essay The Cathedral and the Bazaar? There ESR praised the "bazaar" over the "cathedral": harnessing hundreds of open source folks to work on a project instead of a small, limited group. But that essay points out that some tasks scale better than others. He wrote

So does the leader/coordinator for a bazaar-style effort really have to have exceptional design talent, or can he get by through leveraging the design talent of others?

I think it is not critical that the coordinator be able to originate designs of exceptional brilliance, but it is absolutely critical that the coordinator be able to recognize good design ideas from others.

Brooks and ESR would agree that you can't just let loose a horde of 150 programmers and hope that a great design emerges. You need some architects with "taste".

How much of a system's design should the architects figure out (leaving details to the horde)? Brooks thinks it's the interfaces: the APIs and UIs:

By the architecture of the system, I mean the complete and detailed specification of the user interface. For a computer this is the programming manual. For a compiler it is the language manual. For a control program it is the manuals for the language or languages used to invoke its functions. For the entire system it is the union of the manuals the user must consult to do his entire job.

Suppose you're Fred Brooks and you have 15 great programmers and 150 good ones. How might you divide tasks between them? What are some ways you could set up bottom-up feedback without drowning in noise?

The Second-System Effect

Have you ever worked on a "second system"? You work on a successful system. It's working. You look for ways to improve it, to "get it right." You get around to elegantly adding those improvements that you couldn't "bolt on before". And somehow... the result is a mess. Years later, you figure out that many of those "improvements" were gratuitous; some of them made the system worse. Have you? Or seen one from a distance?

How can we guard against this effect? Brooks says every design team needs a third-system veteran, someone who has experienced their second system. How might a system like this work at your organization? Anyone can design something, you can't force them to take advice from a veteran. What can you do?

Passing the Word

This chapter is mostly of interest to the historian, or people who like to hear stories of the old days when "we had to walk back and forth through the snow, twenty miles, uphill both ways".

The book assumes that most of a project's "architecture" happens up front, then stays frozen. Wow, it's the opposite of agile. This chapter explains the clumsiness: This chapter talks about project communications back in the 1970s.

Back then, communicating a spec change was a major ordeal. Word processor software was a newfangled clunky thing. There was no revision control system software to keep track of changes. There was physical typesetting; physical pages to truck to far-flung teams.

The section titled Conferences and Courts talks about their change process. He mentions that it happened less often than it should have--Brooks wishes that his architecture team had been more agile. But when you hear about what it took to communicate a change... it makes one glad to be working in these relatively easy times.

Why Did the Tower of Babel Fail?

It's another chapter about the difficulty of communication, but this chapter describes problems that are still with us.

So it is today. Schedule disaster, functional misfits, and system bugs all arise because the left hand doesn't know what the right hand is doing.

Part of the chapter describes the 1970s solution: write everything down. Since they didn't have an intranet and a wiki, and they generated a lot of information, they used microfiche. Ha ha ha. If you get bored reading about 1970s communications technology, you might skip ahead to the section titled Organization of the Large Programming Project

Like the "Surgical Teams" chapter, this section discusses roles. But instead of people working on a small team, these roles are the "diplomats" that keep those teams moving in the same general direction: producers and technical directors. If you work on a large project, who are the people who have this high-level role? How would you decribe their roles? What challenges do you think they face?

Calling the Shot

This chapter has more ponderings on schedule estimation, including some still-relevant thoughts about why things might take longer than you'd expect. Along with this, there are some good reasons for "fudge factors" to apply when you hear a schedule from a naive estimator.

Have you ever estimated how long it would take you to complete a task? How far off were you? Why?

Has someone else ever estimated how long it would take you to complete a task? Why was their estimate so far off?

Have you ever estimated how long it would take someone else to complete a task? How far off were you? Why?

This chapter measures programming output in Lines Of Code. Yes, people really did this back in the 1970s and '80s without irony.

Ten Pounds in a Five-Pound Sack

This chapter discusses the tragedy of metrics: once programmers know what you're trying to measure, they will optimize for it, sometimes at the expense of overall system quality. E.g., if you tell programmers to use less memory, you might notice the system slows down with swapping:

...Digging-in showed that the control program modules were each making many, many disk accesses. Even high-frequency supervisor modules were making many trips to the well, and the result was quite analogous to page thrashing.

You need to measure things, but leaders need to make sure that things stay on track.

The project was large enough and management communication poor enough to prompt many members of the team to see themselves as contestants making brownie points, rather than as builders making programming products.

Have you seen this kind of problem, where a high-level strategy gets mis-applied at the low level? Could a different approach have yielded the benefits without the low-level problems?

Representation is the Essence of Programming is a fun section, reminding us of the importance of designing our data structures.

The Documentary Hypothesis

Brooks was a writer; it's no wonder that he advises that a project's mission, architecture, and everything be expressed in writing. This chapter describes some things worth writing down. If your customers can see your source code, then API documentation is simple. But he wants more.

One document he suggests maintaining is an org chart:

This becomes intertwined with the interface specification, as Conway's Law predicts: "Organizations which design systems are constrained to produce systems which are copies of the communication structures of these organizations." Conway goes on to point out that the organization chart will initially reflect the first system design, which is almost certainly not the right one.

Even with our great software tools, editing an org chart is tricky. People get tetchy when you mess with it. Some inertia is good. Brooks advises some inertia in changing system documents, too... and not just because of clunky 1970s documentation technology:

...the best engineering manager I ever saw served often as a giant flywheel, his inertia damping the fluctuations that came from market and management people.

Have you worked with customers before? What sorts of second-guessing have you had to apply to their suggestions? How do you prioritize their requests?

The task of the manager is to develop a plan and then to realize it. But only the written plan is precise and communicable.

How much precision do you want in a plan? Are there other ways you might communicate it?

Plan to Throw One Away

Twenty years later, Brooks was no longer fond of this chapter. This chapter points out that your first attempt at implementation might not be good; it's OK to scrap it and start over. Later on, as he moved away from the waterfall model, Brooks was OK with the idea of more incremental improvements.

But there's still some good advice here:

Project after project designs a set of algorithms and then plunges into construction of customer-deliverable software on a schedule that demands delivery of the first thing built.

Have you ever been on such a project? Did your customers forgive you?

Plan the Organization for Change reminds us of tricky things to keep in mind when changing things.

Management structures also need to be changed as the system changes. This means that the boss must give a great deal of attention to keeping his managers and his technical people as interchangeable as their talents allow.

I'm not sure I've ever seen a management structure change that was described as in response to a system change. Have you? What was the reason? How was it communicated?

Sharp Tools

This chapter is mostly of interest to the historian. It's a fun read, but if you're in a hurry, I'll summarize it for you: tools are better now than they were in the 1970s. We gripe about our tools, but they are awesome.

The Whole and the Parts

This chapter discusses the creation of high-quality systems:

  • Writing a spec clear enough such that folks know what the system is supposed to do.
  • Test the program.
  • Debugging: excel at it.
  • Don't try to integrate all changes at once.

Have you worked on a large project? How did the project find bugs hidden in the spaces "between" modules worked on by multiple teams? Did teams do anything to make this easier?

Hatching a Catastrophe

Earlier, we learned that everyone is terrible at estimating software development schedules. Here, we learn that they're also terrible at noticing schedule slips. He recommends using scheduling software to find bottlenecks and watch those carefully.

The bosses also need to let underlings safely report schedule slippage.

Have you worked on a project that had a schedule?
Have you worked on a project that did not have a schedule?
Any differences in the projects that could be attributed to the difference?

The Other Face

This chapter discusses how your product interacts with customers: usability and documentation. Some parts of this chapter have aged better than others.

The flow chart is a most thorougly oversold piece of program documentation. Many programs don't need flow charts at all; few programs need more than a one-page flow chart.

Flow charts have fallen out of favor. What would you say are the most oversold pieces of program documentation nowadays?

The chapter has some interesting ideas about using code comments to clarify code. What did you think about using ASCII art arrows to show the path of GOTO statements? Do you still think C is a primitive programming language, now that you've seen PL/I?

No Silver Bullet--Essence and Accident in Software Engineering

If you're reading an old edition of the book, you don't have this chapter. Fortunately, it's out there in a place called the internets.

This 1987 essay points out that software development had made great strides recently. Software development was getting faster! But the easy speed-ups were gone: the remaining problems could not be so easily knocked down with tools.

Our tools were getting really good at dealing with the same problems. But the essential problems hadn't been tackled at all.

I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation. We still make syntax errors, to be sure; but they are fuzz compared to the conceptual errors in most systems.

Putting together a software system isn't like assembling homogeneous bricks.

...it is necessarily an increase in the number of different elements. In most cases, the elements interact with each other in some nonlinear fashion, and the complexity of the whole increases much more than linearly.

And of course, by the time you design a decent solution, the requirements have changed

The software entity is constantly subject to pressures for change. Of course, so are buildings, cars, computers. But manufactured things are infrequently changed after manufacture; they are superseded by later models, or essential changes are incorporated in later serial-number copies of the same basic design.

How can we approach some of these essential problems? If we want to do something similar to what others have done before, we might be able take advantage of their work; e.g., if you want to write a simple database-backed web application, there are several platforms that make this pretty easy. But probably the very fact that you're thinking about large-scale software development means that you're thinking about some project that's not so straightforward.

When you consider the lifetime of a major software project: the evolution from hallway conversation to design to tinkering to large-scale development to refinement to deployment to support to maintenance and improvement: where does the time go? If you wanted to accomplish all this with half the effort, what would need to change?

Some of Brooks' possible-solutions-on-the-horizon have come to pass, at least partially. Are there any of these that you think might speed up software development further in the future? Or have these veins been mined out?

Labels: , ,

Tutorial: Closure Tools Javascript compiler and library

There are some fine tutorials out there for using Closure Tools, but I wrote a tutorial anyhow. Go read Closure Tutorial: Displaying Friendfeed Items. Uhm, by "Closure Tools", I mean the set of recently-opensourced Javascript compilation tools and library code. The Closure library is huge, and there's plenty to explore. This tutorial shines a light on goog.net.Jsonp and goog.array.

Yes, I wrote this on a Saturday. Tech Writer's Saturday is like a busman's holiday, apparently.

Labels: , ,

Google & OpenID: discovery URL

A while back, I mentioned that Google supported Opendid. There's one important detail that I had a hard time finding amidst the mountains of documentation:

If the user wants to use their Google account to log in via OpenID, the discovery url is https://www.google.com/accounts/o8/id

That was hard to find. I think it took me over an hour. It's mentioned on the Federated Login for Google Account Users page... halfway down... under a diagram showing the back-and-forth of redirects which I didn't especially care about because of course my code has to handle those already for all of the other OpenID redirections. And with plenty of mentions of OAuth, just to further convince me that I must be looking at the wrong page, and wander off to look at other places.

It took me just a few minutes to find out Yahoo!'s OpenID discovery URL. Some Yahoo! technical writer deserves a bonus.

(Yeah, yeah, I saw the blog post about webfinger and how it will automagically discover discovery URLs and we'll all be sitting down at lovely unicorn tea parties forever. But maybe instead of waiting for that, I'll just let people log in via their Yahoo or Google account, and that's probably gonna handle all of my users just fine, thank you.)

Labels: , ,

Link: Deny you ever read about Crypto Strikes Back in this blog post

In theory, I'm hobbyishly working on a little programming project. In practice, I make almost no progress on it. I'm almost never home and awake and alert enough to code. The bad news is: not much progress. The good news is: I can make a mental note like "I should find an API for a cryptographically-secure random function in Python" and I don't really need to research it. (Note to self: random.SystemRandom) I just need to keep that in mind and a couple of weeks later, I'll watch a video of a tech talk which mentions the info I want. Normally, you might think that two weeks for "research" of one factlet would be too slow. But it didn't slow me down. It's not like I would have made progress. It was like, no-cost research, anemone-style.

Here's the talk: Crypto Strikes Back, by Nate Lawson.

Oh yeah, and you should totally ignore what I said about crytographically-whatever Python functions and watching that video because in that same video he also says that if someone says they researched crypto by reading a blog post, that's a warning sign of bad crypto. You totally didn't want to read this far. Look, you still have plausible deniability. Go drink gin until you've totally forgot that you read this right now or else your crypto will suffer.

Yeah, if anyone asks, you never read this blog post, and you would never study crypto by reading blog posts. (I, for one, am much too 'leet for that because I study crypto on YouTube.) YOU WERE NEVER HERE! WE NEVER HAD THIS CONVERSATION!

I will now go straight to bed and forget I ever wrote this.

Labels: , ,

Book Report: Knowledge Sharing in Software Development

I was in meetings most of this last week at work. Meanwhile, one of my co-workers was learning a new style of programming--and thus was trying to learn about four big new things at once. She sent me emails, but I was slow to answer; I was in meetings. Days passed; she could read documentation, she was a quick study, she was learning a lot... but wasn't learning the things that would help her, you know, get started doing stuff. Before you know how to get started, you're not sure which things to study to get you started. On Friday afternoon, I sat with this one co-worker. I didn't tell her much--but I steered her at what she needed to get started. Soon afterwards, she was set up with her development environment, had a "hello world" going, was ready to apply some of that stuff she'd read about. Such a difference. Face time speeds things up. Except, of course, that it won't help the next person who wants to learn those four big new things.

How do computer programmers learn? Though it pains this professional tech writer to say it--they don't learn much from documentation. There's gossip, but that only gets you so far. There are so many different kinds of things to learn: general tradecraft, programming languages, project-specific knowledge... The project-specific stuff is especially tough to learn, because you're probably rewriting your project's code all of the time, pulling the rug out from under your own feet. In Knowledge Sharing in Software Development the author, Jan Chong, studied two teams of developers organized in two different styles to watch how job-knowledge flowed within each group.

(Yes, puzzle huntists, that Jan Chong. No, I probably wouldn't have heard about it if she wasn't a puzzle huntist.)

One of the groups followed the Extreme Programming cult, the others were Waterfa-- Hey, come back! This is not a Book on $foo about how $foo is better than the Waterfall strawman.

This book is Jane Goodall among the Geeks. Except that the author is a geek herself, and understands what's going on a lot better than Jane Goodall understands chimps. So when she took notes, she didn't say "Vocalizes technical knowledge, wish I could understand it." She said, "Desire to project technical competence." Oh yeah, how many times have you been stuck in that conversation.

The book doesn't start out seeming useful. It starts out in full-on academic mode. No sentence escapes without citing a couple of sources.

"Work has changed significantly over the past fifty years, moving from a workforce predominantly based on manual labor to one that is increasingly based on knowledge work (Drucker 1993; Elliasson 1996)."
It's not just that, as an academic, one has to cite sources to back up a statement like that. It's that one has to state a statement like that in the first place or risk getting questioned on it by some professor type who doesn't even know what it's like to work for a living these days. Halfway down page 2 of this, I nearly gave up. But I riffled ahead and saw
The Waterfall programmers also invited me to several team-wide social events in the course of observations, including periodic board and card game nights (e.g., playing Robo Rally, Chez Geek, Guillotine)...

Well that was more accessible. And I couldn't help but notice that the list of games didn't say "Robo Rally (Garfield, 1994), Chez Geek (Darbro, 1999), Guillotine (Peterson, 1998)" So, OK, first few pages have plenty of citations to remind the committee that a sufficiency of past reading has gone into the work. But you don't really need to slog through much of that to get to the real work. And there's good stuff in here.

The book describes the behavior of two groups of programmers. One group is a team of Extreme Programmers--they pair-program, forming new pairs each day, all sitting in one big work area where everybody can hear everybody. One group is a team Waterfall programmers, each sitting in an office with a door, where every bit of intra-team communication requires effort. The book looks a lot at the social networks that formed. But that's not what I focused on.

I focused on this snippet about the XP programmers:

...the practice of pair programming made visible the process of reasoning inherent in code construction, a process normally only available through post hoc accounts and code inspection. Through pair programming, the process was made available in situ.

They developed code faster because they all looked at the same code. They rotated between areas of the code. No one was the "datastore expert" or the "security expert" or the whatever expert--they all wrestled with all parts from time to time. They didn't need to comment or document their code so much--because they could talk it over with each other.

They produced code with little documentation. Like most code, it didn't make much sense--but if one of them didn't understand it, they could always ask someone else on the team. Then again, if someone who wasn't on the team didn't understand it... that person would have been out of luck. The XP programmers were happy to be oral, because they didn't need to slow down to explain things. But heaven help you if you wanted to understand their code and you weren't on the team. The XP team checked email once a day; didn't answer their phones. They spent time with the team, no distractions... from, say, the people who were stuck maintaining the code the XP programmers had previously developed.

...on the XP team, specific program code changes were actually quite opaque outside of a given pair.

Yeah, I bet.

So... the good news is that the XP team developed code very quickly by working all day, ignoring the outside world, and not trying to make their code understandable outside of the team. The bad news is that they ended up with some semi-understandable code. By working in pairs, they made sure their knowledge was distributed between their brains but that also made them less likely to capture their knowledge where someone else could read it.

Of course, most of the time that the Waterfall team puts into recording knowledge is wasted. You re-write code so many times over the course of a project, it's a waste of time to explain how any of it works--until you're ready to share it with somebody else. They track so much crap, most of which won't help future maintainers.

What would be the best of both worlds? How do you get the team to work together well, but make sure that they explain their work before they hand it over to the outside world? You almost want to set up an XP team--and then geographically split them up in the last few weeks of the project, force them to clarify things, nudge them to capture the knowledge that they use to clarify the code to each other. Just remember: that explanation you emailed to your co-worker probably also makes a dandy code comment. And the very fact that one co-worker needed that email suggests that patch of code isn't as self-explanatory as you hoped.

(Yes, training-minded co-workers, I have this book at my desk and you can borrow it.)

Labels: , ,

Aiming for Precisionism but Missing

When I was in Houston, I took perhaps my favorite photo-of-mine ever, this shot of the Houston Hyatt. It reminded me of some photos that the artist Charles Sheeler took. But he didn't leave his photos alone. He used them as models for his paintings. His paintings smoothed the images out, made them flatter.

I don't know how to paint. I'm not going to get good at it any time soon. As a lazy engineer, I'm always looking for shortcuts. If all I want to do is smooth some of the texture out of a photo, maybe I can do that in software. I tried opening up the photo in the GIMP, a photo-editing program. I told it to use a palette of just 16 colors. That looked--too flat. Too weird.

So I tried writing my own image-manipulation program. Python has an Imaging library; its API is nice; this was a fun thing to play with. What did I try? Consider the image, one row of pixels at a time. For each row, set up four levels each of Red, Green, Blue. Draw the photo, "restrained" to this palette.

[altered photo]

The result is... not so great. Not so precisionist; all to pointillist. It reminds me of an old GIF file, full of noise forced by its small palette. On the other hand, it's strange to look at this image, and to know why it's noisy, to understand the complicated filter I applied to it. It inspires other filters.

Labels: , ,

OpenID, OAuth, Learning by Gossip

Last weekend, I did some programming. Well, not much programming. Mostly I did research preparatory to programming. Well, not exactly research. It was more un-research.

I started out learning how to use the OAuth protcol to... to do something it's not meant to do. OAuth is useful, but I learned that it wasn't meant to do what I wanted. If lots of people worked hard, you could use it for what I wanted--but that would be silly, because you can use OpenID for what I wanted.

What I wanted was to set up a little web app with user accounts that didn't ask users for a password. Instead, it would ask the user if they already had an account at some service: Yahoo, Google, Twitter, Flickr, or whatever... and then ask that service: hey, is this person who she says she is?

What I wanted was OpenID, which does that. (Like, say, this OpenID consumer sample implementation for AppEngine.)

But I'd heard some third-hand news a while back. Chatter on forums: Don't use OpenID. None of the big services are using OpenID. Folks asked Google to use OpenID, but Google didn't--because it's insecure. Google's pushing for OAuth instead, and they're web security smarties, you should use OAuth.

That was wrong. I'm not sure how much of the wrongness came from me mis-interpreting what I heard. I'm not sure how much of the wrongness came from the ignorance of the folks spouting off in the forums. But there was plenty of wrongness.

I'm pretty sure I'm not the only one who got confused. Some guy wrote a blog post just to say that OpenID and OAuth are not the same thing.

So I spent a while studying OAuth, thinking "This is kind of a bass-ackwards way to do what I want." Until I finally decided to look over OpenID some more.

The rumors of Google's rejection of OpenID are false. I can write a little web app. That little web app can (if you have a Google account and you give your consent) ask Google: is this person who she says she is? And Google will answer. The Google security team will not jump out from behind your refrigerator and break your fingers.

There are so many technologies to learn. You don't have time to learn them all. How do you find out which things are worth learning about? Me, I listen to chatter. I don't think I'm the only one. It's embarrassing to think about but... for all that we're supposed to be rigorous engineers, we fall back on gossip to figure out what to study in depth. What worthwhile things do we ignore? What do we ignore because of some unearned sneering comment on some IRC channel somewhere that's been repeated, relayed, never fact-checked...

Sorry, was I ranting? I do that.

Labels: , ,

Book Report: The Elements of Programming Style

Non-programmers might not realize it, but some computer program source code is even harder to read than the rest. Some of this code is so messy that an experienced programmer looks at it and says "I have no idea what is going on here. Maybe I could figure it out, but... what's on TV?" This book talks about some general principles of writing readable code; and there are examples to illustrate good and bad code.

This book is from the 1970s. The examples are in the FORTRAN and PL/I programming languages. They are in an old FORTRAN--I think FORTRAN has changed a bit since then. I think this book uses an old dialect. I'm not really sure, though. I don't know FORTRAN nor do I know PL/I. Actually, that was a problem with this book. The book had "before" and "after" examples to show how to "clean up" code to make it more readable. I couldn't always understand the "after" examples. Well, I could, but only after the head-scratching I associate with my attempts to read poorly-written code. For example, DO 2 I=1,N After looking at some other examples, I think that means "Loop N times over the block of code that starts here and ends with the line of code labeled "2".

As it was... I think this book might be of more interest to the historian than to the programmer.

Labels: , ,

Book Report: The Difference Between God and Larry Ellison

I've "used" Oracle applications. When I say "used", I mean "tried and gave up". Oracle calendar was slow, buggy, and thought it was a good idea to store my password, unencrypted, in a publically visible file. Yes, I'm still angry about that. Oracle expense report software has saved my employers plenty of money. Not because it's efficient. Rather, because it's so awful that I tend to pay for things out of my own pocket rather than try to file expense reports.

The Difference Between God and Larry Ellison is a history of Oracle Corporation and CEO Larry Ellison. Oracle got ahead through selling vaporware and spreading FUD. I always assumed that their apps were crap but that their underlying database must be good--because they were a successful database company, right? But it turns out that they were in the right time and right place. And they won and kept a lot of business by lying to their customers.

This book was a powerful reminder that a company can do very well by cheating its customers and doing evil. I like being honest with customers and like finding "win-win" situations... but there are people who say "screw that" and go for the quick bucks. If I assume that a company that's been around for several years must have some kind of honesty or else their reputation would have caught up with them, Oracle proves me wrong. I guess I shouldn't have needed this book to remind me. There are plenty of long-lived evil companies out there.

Labels: ,

Book Report: Applied Cryptography

This is an old textbook about applying cryptography; that is, it's about computer security. It's the textbook by Bruce Schneier, the book he later said wasn't so important--you can get this stuff right and your system still might not be secure. Your fancy security system might not do much good if everyone in your company's art department thinks its easier to trade passwords than to set up a shared file server. But I read it anyhow--some pieces of security still seem useful.

It's an old book; people crack codes over time. This led to some disappointment while reading. I got kind of excited to read about FAPKC, a Chinese cryptography system based on cellular automata. This was cool on a number of levels, and not just because it evoked a puzzle from the No More Secrets game. But it turns out that FAPKC was broken back in 1995--probably at around the time this book was slogging through the book publishing process.

I'm glad I read this book; this book made me think. It's not just about the crypto; it's also about protocols built up from crypto. Suppose you have a way to encrypt messages, a way to sign messages. How do you exchange data with someone without being eavesdropped upon if you haven't already exchanged keys with them? OK, you've probably already stumbled into key-exchange protocols. But there are weirder things out there. This book talks about several of them--including how some protocols were found vulnerable. It's good exercise to think about these things, try to figure out how you would crack them. I didn't always succeed. There's another good lesson there--sometimes you can look at a broken system and think "well, it looks OK to me". Trust no-one, least of all yourself. This book had plenty of good puzzles dressed up as protocols.

There was a quick run-through of useful mathematics. This was a nice refresher for stuff I already knew. For the stuff I didn't know--number theory--this wasn't enough to teach me much. But there were references to books with more information with some recommendations, so there's hope for the future. And of course there's still plenty here that you can understand even without the number theory background. The book wants to be both a reference and a lesson-book. Nowadays, for the reference stuff, you'd probably search the web instead; in hindsight, it would have been nice if the book had concentrated on the lessons. Still, it's a fun read; check it out.

Labels: , ,

Book Report: Exploiting Online Games

This book is about hacking online games. Unfortunately, they started out talking about plenty of stuff which I already had read about. Cheating happens. E.g., people in shoot-em-up games use video drivers that don't fill in the walls, just draw "wire frames"--and thus they can see bad guys through walls. Stuff like that. The economics of MMORPGs, of selling stuff. I guess I heard about this stuff from Slashdot and comp.risks? I dunno. By the time they actually start talking about how to exploit games, I wasn't paying attention any more. And it seems like they assume you have access to some software library that only they have access to? Maybe if I'd kept paying attention, I would have learned some things about reverse-engineering.

Labels: , ,

Book Report: Crypto

This last weekend, I pitched in for a playtest of MSPH12 "Jeopardy!". These puzzle-solving endeavors have wonderful moments. Solving puzzles in a team environment--it's very satisfying when my skills complement someone else's and we solve a puzzle quickly by playing off each others' strengths.

I tell people about this stuff, and they're interested at first, until I 'fess up that it's not all about shining flashlights at carefully-constructed paper models of the Las Vegas skyline--it's mostly sitting, thinking, and scritching little notes. It ain't rock and roll, though in conversation I probably paint it exciting.

Of course, I'm thinking about a book as I write this. I'm thinking about Crypto, about another activity full of secret messages.

This book is by Steven Levy and thus has a faux rock-n-roll rebel subtitle: "How the Code Rebels Beat the Government--Saving Privacy in the Digital Age." And there's an awful passage that reminds me of some of Levy's worst writing:

Profane, cranky, and totally in tune with the digital hip-hop of Internet rhythm, they were cryptographers with an attitude.

Argh. Even if the Internet had a rhythm and even if "digital hip-hop" described that rhythm, I don't think the cryptographers could be described as... arrgh. Oh, and he kinda gives partial explanations of some crypto techniques, explanations so incomplete that they're less help than nothing. Arggh, aiyee.

But

But

But if you can get past that, this is a pretty well-researched history. Levy talked with plenty of cryptographers and other figures. And he shows both sides--I still think that the Clipper Chip was a bad idea, and maybe I still can't respect Al Gore for backing it--but I can kinda see how he got roped into supporting it; maybe I can see how well-paved his hell-bound road was.

I got an idea of the personalities of Diffie, Rivest, and some other big names. The story of the coining of "cypherpunks" is in here. There are plenty of good anecdotes. All in all, a worthwhile piece of work.

Labels: , ,

Book Report: The Psychology of Computer Programming

How to get programmers to get along together. Attempts to use psychology to design easier-to-use computer language features. Discussion of which is better for your organization's culture: batch processing of punch cards or time-sharing. Ahem, yes, that's right, I said "punch cards or time-sharing." This book is from 1971. Wow, that's even older than Peopleware. So it's interesting, but in sorta the same way as when you're reading Knuth and you're thinking "Wow, this book is so influential, I'm gonna learn a lot and--wha hey why is he talking about this weird non-Intel architecture OMG did people really argue about what the correct size for a byte is, I mean, c'mon?!?" Apparently, time-sharing was a not-unmixed blessing. If you get a bunch of geeks waiting in line to turn in their punch cards, sometimes they talk about useful stuff.

Labels: , ,

Book Report: Working Effectively with Legacy Code

This book is a classic amongst computer programmers. Well, it's a four-year old classic. It captures the, uhm, zeitg^W movement towards unit testing and refactoring. It shares a problem with other classics: you've already heard its message, heard it from other sources.

Legacy code is scary. Within it lurk strange pieces of code. You look at a piece of code, a piece of code that had a purpose once. But that code has been changed over the years; now it has three purposes, each of which it kinda half-does by working with some other pieces of code which you wouldn't expect to be related. One way to deal with this is by surrounding the strange code with unit tests. Now if someone changes some code that alters behavior, you get an early warning. Blah blah blah.

He also lists a bunch of refactorings useful for making code quickly & easily testable. The first few of these in the list--I'd already heard of them. So I kinda zoned out for the rest of this part of the book.

Labels: , ,

Link: Caja's HTML sanitizer for Javascript

When you write a program that's supposed to be secure, you have to plan on security from the beginning; you can't bolt it on afterwards. The idiomatic way to describe a "plan" like we'll write the program first and figure out the security later is "They're asking for some magical security fairy dust to sprinkle over their code."

I'm tweaking a Javascript program that takes HTML from someone else and renders it on a page. I thought my program was getting "sanitized" HTML; that is, HTML that had any potentially-dangerous stuff removed. If I'm showing someone else's HTML on my page, I want to make sure that HTML doesn't have, for example, an <img src="http://sneaky.org/sneaky.gif"> in it. Otherwise, the webmaster of sneaky.org will know whenever someone reads my page.

I thought the program was getting sanitized HTML, but it was getting "raw" HTML, possibly chock-full of evil. Argh, I needed to bolt on some security. I went pleading to some of the security-minded folks for help. I was embarrassed--I 'fessed up that I needed some "magical security fairy dust". The amazing part is that those security-minded folks came through--they pointed me at Caja.

Caja is primarily a system for enforcing security "capabilities" in Javascript. But, but but even if you don't need all of that, you might still want one part:

Caja comes with a XSS sanitizer for HTML that works with your JS code: html-sanitizer.js. And you'll also need html4-defs.js. It looks like you need to build html4-defs.js via Ant. That's kinda annoying, but a lot easier than writing your own HTML sanitizer from scratch.

I looked over the source code. It's checking for bad stuff I hadn't thought to check for. I sure am glad that folks more knowledgeable than me are working on this thing.

Labels: , ,

Site Update: It's like Web2.0, but three years too late to be considered cool

You know how I had separate lists of Twitter updates and Blog updates? Like, on my home page, I listed each of those, but they were in separate areas? That was kind of silly. And unnecessary: Friendfeed provides a handy combination of my feeds. They provide it as a pretty web page, but also as a Web API. So I wrote a little Javascript to present that list of recent updates.

It's like I'm some kind of web programmer or something. But I'm not turning into a hipster, I swear. I don't even own an ironic trucker cap. Don't shun me just because I used some Javascript. At least I didn't put in any gratuitous XML parsing.

Labels: , ,

Book Report: Code Complete

Computers are hard. This afternoon, I was trying to figure out why some people couldn't view my web site. It sounded like a DNS problem; one guy reported it was affecting him on Comcast in Boston. So I tried Googling for DNS problem reports; I found people complaining that Comcast provides crappy DNS. I don't know if that means that Comcast provides crappy DNS or if that means that Comcast has many customers and thus has more people to whine about it. And then I got sidetracked when I found out that I no longer know what organization manages san-francisco.ca.us. Back when I set up this domain, san-francisco.ca.us was managed by an outfit called Tycho.net. They were the (something something) delegate. No one outfit was going to try to handle registering all of the .us domains; depending on what region you wanted a domain in, you'd deal with some local delegate. Thus, tycho.net. Don't look them up, they no longer exist. I found a forum post which reported that tycho.net got gobbled by something which got gobbled by something else which got gobbled by sonic.net. If sonic.net still manages san-francisco.ca.us, I didn't find any evidence of it on their web site. www.nic.us seems to imply that www.nic.us manages it--but it doesn't have any record of my domain. So the good news about my domain is that it's free, but the bad news of "free" is that if no-one's cashing your checks, then maybe you don't know who you're dealing with.

Computers are hard. Anyhow, yeah, book report, yeah, Code Complete, here we go.

This is one of those influential books that I didn't get much out of because their ideas have already percolated out into society. Heck, a large part of what it does is distill down ideas that had already percolated around society long ago. This book takes the time to summarize both sides of the goto-considered-harmful debate. So I kinda spaced out most of the time I was "reading" this book. Still, there are advantages of skimming over a book that's considered an authority of sanity and stuff.

I'm thinking of this section in particular:

Taking pictures of whiteboard drawings with a digital camera and then embedding those pictures into traditional documents can be a low-effort way to get 80 percent of the benefit of saving design drawings by doing 1 percent of the work required if you use a drawing tool.

I do that. I draw on the whiteboard and snap a photo instead of diving into you favorite diagram-drawing tool. And people laugh at me when I do. But now I can point at this book. I can say, "It's in Code Complete, thus it is accepted industry practice QED."

Labels: , ,

Book Report: Code Reading

I am getting ready for a The Game, and am thus hyper-aware of white cargo vans. This is tricky; while team-mate Wesley is in town, he's staying close to Delancey Street. As in Delancey Street Movers. They have a lot of white cargo vans. I was kind of twitchy in that neighborhood. Anyhow, I guess I'll post a book report: Code Reading

When I was in school, I did an internship at a now-defunct software development company called Geoworks. Some folks worry about hiring students--these kids have been "spoon-fed" assignments, can you trust them to take on more amorphous tasks, to figure things out? But that wasn't my biggest problem when I started at Geoworks. It was the code. There was so much of it. I was used to working on little assignments--a few hundred lines of code, built up over a semester. But at Geoworks--I faced the product of dozens of engineers hacking away for years. Just finding my way around the codebase was tough. Figuring out the conventions necessary to make such a codebase manageable...

I wish that Diomedis Spinellis' book Code Reading was available then. He talks about just that--how to get a handle on a big pile of code. He talks about architecture. He talks about low-level code constructs you're likely to encounter. He talks plenty about C programming (common in open-source programs) which might come in handy if you're a student emerging from a Java-heavy program.

There's some good stuff in this book. I learned a little from this book... which sounds like I'm damning it with faint praise. But remember, I've read a lot of code over the years already; this book wasn't really aimed at me. Mostly, I'd like to send this book back through time, send it back to myself 1991. It would have done me a lot of good.

Labels: ,

Puzzle Hunts are Everywhere: a web-crawling puzzle-hunt robot that didn't work

When the applications for the Ghost Patrol game started appearing, it was pretty humbling. New videos kept showing up on YouTube. The videos... the videos made me glad that my team (Mystic Ghosti) had submitted a cryptic crossword puzzle instead of a video. There was some tough competition.

One video was an advertisement for the Ghostatron 5000, a "ghost capturing device" oddly reminiscent of a Pac-man game. Embedded in the video was a not-so-secret message "number of dots". There was also the URL of a page allowing you to purchase a Ghostatron 5000--but only if you entered the correct password into a little web form. Aha, no doubt the password was the number of dots. But, uhm, what number of dots? Number of dots in a Pac-man layout? Number of dots in a Seurat painting?

I figured it was probably the number of dots in a Pac-man game. I tried counting those. I counted them a couple of times, got a couple of answers. I tried counting a third time as a tie-breaker--and got a third answer. Counting dots was too hard for my puny puny brain. I wasn't even sure I was on the right track. It was time to think about brute force.

In trying out some previous guesses, I had some idea of how the password checker worked. If you entered the guess "able", then the password checker would put you on the page http://www.princeton.edu/~bdbennet/able.html . Since there was no such web page, I figured "able" was not the password. Peeking at the form page's source code confirmed that this was what was going on:

<SCRIPT LANGUAGE="JavaScript">
function testResults (form) {
    var TestVar = form.inputbox.value;
    window.location="./"+TestVar+".html"; 
   
}

To test the hypothesis that the password was a "number of dots", I could check the pages
http://www.princeton.edu/~bdbennet/1.html
http://www.princeton.edu/~bdbennet/2.html
http://www.princeton.edu/~bdbennet/3.html

...

So I threw together a program to check those. Well, not all possible numbers. There are many numbers. I just checked the numbers from 1...999:

import time
import urllib

for count in range(1, 1000):
  time.sleep(1)  # wait a second, in case princeton.edu is too puny to handle a robot (unlikely)
  u = urllib.urlopen("http://www.princeton.edu/~bdbennet/%d.html" % count)
  for line in u.readlines():
      if line.find("<TITLE>404 Not Found</TITLE>") > -1:  # if no such page
          print count,                                                   # print status
          if not count % 10: print
          break
  else:                                                                  # but if page found
      print
      print "HEY"                                                        # say HEY
      print count
      break

My program never said HEY. So I was on the wrong track. I never got on the right track. I never solved that puzzle.

Labels: ,

Book Report: Defensive Design for the Web

It's sad news that Rory Root, owner of Berkeley's Comic Relief comic book store, died today. But no-one reads this blog for news. You're here for book reports. Here is a book report for Defensive Design for the Web:

A quick book of how to deal with those most dangerous parts of designing a web application: when we force people to think. How to write and present a useful error message. How to design a web form. Making online help useful. They point out some examples worth stealing from.

Labels: ,

Book Report: Refactoring

Here I am tending to my blog on the bus. I wasn't really planning on it. I was just checking my email. I get email, among other occasions, when someone or something posts a comment to this blog. My mail this morning suggested that a spambot was posting spam comments to this blog. So here I am, cleaning spam. As long as I'm here, I guess I'll post a Book Report I've had sitting ready for a few months. It's about re-writing software. Like hopefully someone will re-write some of blogger's anti-spammy stuff so that bots can't break CAPTCHA anymore. That would be awesome. Actually, this book isn't about that kind of programming, not about changing how programs behave. It's about changing programs so that they behave the same.

If you're a computer programmer, you've probably heard of this book Refactoring. Everybody talks about it. Everybody talks about Knuth, too, but nobody has read him. (I started to read Knuth, but bogged down when I was reading about his example machine architecture... what was it, 13-bit words? Yeah, yeah, I've heard that the new editions will feature a more modern architecture. I'm sure I'll give them a try eventually. Anyhow.) Reading Knuth wasn't easy. But reading Refactoring is easy.

Parts of it were too easy. I eagerly read the catalog of refactorings. I gobbled them up like popcorn. "Oh, I can see how to apply this refactoring, and now I feel clever," I chuckled to myself. But I was reacting like some silly student who still has a favorite data structure--it was too easy to get swept up by the catalog of refactorings, to forget that each refactoring has a purpose.

I'm doing my best to forget the list of refactorings. Instead, I want to remember the list of "code smells"--things to look for in my code. Symptoms of problems. Often, to recognize one of these suggests the refacto... the edit by which one may purge it.

Alternative classes with different names, Comments, Data class (struct w/few methods), Data clumps, Divergent change, Duplicated code, Feature envy, Inappropriate intimacy, Incomplete library class, Large class, Lazy class (class w/little functionality; perhaps can be absorbed into something else), Long method, Long parameter list, Message chains, Middle man, Parallel inheritance hierarchies, Primitive obsession (reluctance to use objects to represent, say, int values), Refused bequest (subclass overrides most of parent), Shotgun surgery, Speculative generality, Switch statements, Temporary field.

Looking them over, most of these look like problems with poorly-architected OO. There's a temptation to toss out OO--look at the mess it's brought. But then, I do like being able to associate functions with the data they act upon. I like it most of the time. And, as the saying goes, "I can write Java in any language". Don't toss the baby out with the bathwater. Moderation in all things.

Maybe I should turn on moderation for comments on this blog. Hmm.

Labels: ,

Puzzle Hunts are Everywhere: The Elementarizer

Yes, it's another blog post about programming & puzzle-hunts. This one isn't a web crawler.

Dr Clue runs team-building puzzle hunts. Alexandra's done some puzzles for them and I've proofread a few. She mentioned that one of these puzzles was tricky to encode--there were fiddly steps, easy to get wrong. And she'd make more of these puzzles in the future. Dr. Clue re-uses puzzle types; this makes sense, since the participants from, say, last week's dentists convention won't participate in next week's gathering of Coca-Cola managers. (I generally can't talk about things I do for Dr. Clue for this reason; but I got permission to talk about this.) Thus, it made sense to automate encoding for this kind of puzzle.

Spoiler warning! If you're reading this because you're about to play in a Dr Clue game, you probably want to stop. I'm about to describe one of the puzzles.

In this type of puzzle, a team receives a series of numbers along with instructions. The instructions say to decode the numbers. They are atomic numbers, replace them with the chemical symbols for those elements. If a number is marked "reverse", then use the chemical symbol backwards. Then substitute some letters; e.g., the instructions might say if you see "P" replace that with "F". Then cross out some letters; e.g., the instructions might say to cross out all the "X"s. Why all of the substituting and crossing-out? Is it to make the puzzle tougher? Nope.

You probably can't encode a message as a straight-up series of chemical symbols. There aren't many symbols to work with; not many have an "E" in them. So you probably need to have players cross out some letters--keep track of those. And you might need to set up some substitutions; keep track of those. It gets tricky. So I wrote a program to help out, a quick-and-dirty Python script.

But that program didn't do much good on my home machine. And it might not do much good to just hand it to the Dr Clue folks: "Here's the program... Oh, but you need to install Python to use it. Whoopsie." But I have a web site. Everyone knows how to use a web site. So I wrapped that script up into a cgi script, a web form. Here is the messy result... uhm, I did warn you about the "quick-n-dirty" aspect, right?

#!/usr/local/bin/python

import cgi
import random

print "Content-Type:text/html\n"

# Stolen from wikipedia: Tab-separated. 
# Each line has Symbol, name, name origin, atomic number, mass, group, period
ELEMENT_DATA = '''Ac    Actinium        corruption of the Greek aktinos         89      [227][1]                7
Ag      Silver  Latin argentum  47      107.8682(2)[2]  11      5
Al      Aluminium (Aluminum)    Latin alumen    13      26.9815386(8)   13      3
   ...snipping out many many lines of data...
Zn      Zinc    German zin      30      65.409(4)       12      4
Zr      Zirconium       zircon  40      91.224(2)[2]    4       5'''

elem_num = {}
sym_dict = {}
known_failures = {}

# ReverseString("Able") returns "elbA"
def ReverseString(s):
  c_list = [c for c in s]
  c_list.reverse()
  return ''.join(c_list)

def MaybeAddToDict(tweaked_sym, sym):
  if tweaked_sym in sym_dict: 
    # If sym_dict['n'] already contains the nice short 'N', 
    # then don't add sym_dict['N'] <- 'Na'.  Short is nice.
    # Since we cycle through in order from shortest to longest,
    # just check the first element:
    if len(sym) > len(sym_dict[tweaked_sym][0]): return
  else:
    sym_dict[tweaked_sym] = []
  sym_dict[tweaked_sym].append(sym)

# Initialize our dictionary of letters->symbols.  Pass in message to be encoded.
# (We need the message because: if our message contains no "X", then "Xe" is 
# a valid symbol for encoding "E".  
def InitDicts(message):
  elem_num.clear()
  sym_dict.clear()
  known_failures.clear()
  for sym_len in [1, 2, 3]:
    for line in ELEMENT_DATA.splitlines():
      fields = [s.strip() for s in line.split('\t')]
      (sym, name, origin, number, mass, group, period) = fields
      if not len(sym) == sym_len: continue
      elem_num[ReverseString(sym)] = '-' + number
      elem_num[sym] = number
      clean_sym = sym.lower()
      tweaked_sym = ''.join([c for c in clean_sym if c in message])
      if not tweaked_sym: continue
      MaybeAddToDict(tweaked_sym, sym)
      if len(tweaked_sym) > 1:
        MaybeAddToDict(ReverseString(tweaked_sym), ReverseString(sym))

# We tokenize the message by recursively calling this function.  The call
# stack to tokenize "heal" would look like
#  HelperFunc("heal", "")
#    HelperFunc("al", ".he")
#      HelperFunc("", ".he.al")
def HelperFunc(todo, sofar):
  if todo in known_failures: return (False, sofar, known_failures[todo])
  if not len(todo): return(True, sofar, '') # cool, we finished
  bottleneck = ''
  for maybe_len in [3, 2, 1]: # try to use the next three letters. or 2. or 1.
    maybe = todo[:maybe_len]  
    if maybe in sym_dict:
      (success, tokenization, bottleneck) = HelperFunc(todo[maybe_len:], '.'.join([sofar, maybe]))
      if success: return(True, tokenization, '')
  if not bottleneck:
    print todo, '#', sofar
    bottleneck = todo[:3]
  known_failures[todo] = bottleneck
  return (False, sofar, bottleneck)

def PrintRows(tokens, encoded_tokens):
  TOKENS_PER_ROW = 15
  while 1:
    if not tokens: break
    line_of_tokens = tokens[:TOKENS_PER_ROW]
    tokens = tokens[TOKENS_PER_ROW:]
    line_of_codes =  encoded_tokens[:TOKENS_PER_ROW]
    encoded_tokens = encoded_tokens[TOKENS_PER_ROW:]

    for token in line_of_tokens: print "%4s" % token,
    print
    for code in line_of_codes: print "%4s" % code,
    print
    for code in line_of_codes: print "%4s" % elem_num[code],
    print
    print

def ReportSuccess(message, tokenization, substs):
  print "<p>Found an encoding."
  print "<p>(If you try again, you might get a slightly different encoding."
  print "    We use some randomness."
  tokenization = tokenization[1:]
  tokens = tokenization.split('.')

  message_with_subst = message
  for subst in substs:
    message_with_subst = message_with_subst.replace(subst[1], subst[0])

  letters_to_cross_out = ''

  encoded_tokens = []
  for token in tokens:
    for count in range(0, 10):
      code = random.choice(sym_dict[token])
      if elem_num[code].startswith('-'): continue
      if not [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]: break
    new_nots = [c for c in code.lower() if c not in message_with_subst + letters_to_cross_out]
    if new_nots: 
      letters_to_cross_out += ''.join(new_nots)

    encoded_tokens.append(code)

  sorted = [c for c in letters_to_cross_out]
  sorted.sort()
  letters_to_cross_out = ''.join(sorted)
  print '<p style="font-weight: bold;">Letters to cross out: %s ' % letters_to_cross_out.upper()
  print

  if substs:
    print '''<p><b>Used some substitutions.</b>
             (Double-check these: There might be "transitives" you can
              collapse.  For example "('z', 'g'), ('q', 'z')" really
              means "substitute Q for G") <b>%s</b> ''' % substs

    print "<p>Message w/subst: %s " % message_with_subst

  print "<pre>"
  PrintRows(tokens, encoded_tokens)
  print "</pre>"

  print "<p>Just the numbers (&quot;-12&quot; means 12 reverse)<br><b>"
  print ' '.join([elem_num[elem] for elem in encoded_tokens])
  print "</b>"

def ReportFailure(message):
  print '<p style="color:red;">Failed to elementarize message'
  print '<p>General hints: Q, X, and J are generally difficult to encode'
  

def GenerateForm(message):
  message = cgi.escape(message)
  message = message.replace('"', "&quot;")
  print '''
  <form action="elemental_encoding.py">
  <input name="msg" value="%s" size="100">  <input type="submit">
  </form>
''' % message

def GenerateHead(message):
  KOSHER_CHARS = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
  title = ''.join([c for c in message if c in KOSHER_CHARS])
  print '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
 <head>
  <title>Elementarizing</title>
 </head>

 <body>
 <h1>Elementarizing</h1>
''' 

def GenerateFoot():
  print '''
</body>
</html>
'''

def AttemptTokenizationHelper(message):
  InitDicts(message)

  print "<p>Attempting to elementarize message &quot;%s&quot;" % message
  print "<pre>"
  (success, tokenization, bottleneck) = HelperFunc(message, '')
  print "</pre>"
  print "<p>Finished elementarizing."
  return (success, tokenization, bottleneck)

def AttemptTokenization(message):
  substs = []
  for attempts in range(0, 20):
    (success, tokenization, bottleneck) = AttemptTokenizationHelper(message)
    if success:
      return (True, tokenization, substs)

    print '<p style="color: red;">Tokenization attempt failed.'
    print '<p>We think we had trouble with: <i>%s</i>' % bottleneck
    subst_candidates = [c for c in 'abcdefghijklmnopqrstuvwxyz' if not c in message]
    if not subst_candidates:
      print "<p>Can't substitute: already using all letters."
      return (False, tokenization, substs)
    swap_in = random.choice(subst_candidates)
    if swap_in in "jqxz": 
      swap_in = random.choice(subst_candidates)
    if swap_in in "jqx": 
      swap_in = random.choice(subst_candidates)
    if swap_in in "jx": 
      swap_in = random.choice(subst_candidates)
    swap_out = bottleneck[0]
    if 'x' in message and random.random() > 0.5:
      swap_out = 'x'
    message = message.replace(swap_out, swap_in)
    substs.insert(0, (swap_in, swap_out))
    print "<p>Let's try a substitution: %s for %s" % (swap_in, swap_out)
        
  print '<p style="color: red;">Too many Tokenization attempts failed. I give up'
  return (False, tokenization, [])
      

def Main():
  form = cgi.FieldStorage()

  if "msg" in form:
    message = form["msg"].value
  else:
    message = "Find the frog statue in the lobby."

  GenerateHead(message)

  GenerateForm(message)

  message = ''.join([c for c in message if c.isalpha()])
  message = message.lower()

  (success, tokenization, substs) = AttemptTokenization(message)

  if success:
    ReportSuccess(message, tokenization, substs)
  else:
    ReportFailure(message)

  GenerateFoot()

  exit

if __name__ == "__main__":
  Main()

Wow, I'm kind of embarrassed of this script now. To get the mapping from chemical symbols to numbers, I just pasted in a blob of data--most of which I don't use--from Wikipedia and parse it each time the script runs; it would be more elegant to start with just the right data. This program does a lot of extra work, re-computing things it's already figured out. It could be faster... but it takes a few seconds, and maybe that's fast enough. It's messy... but it's a few months after I originally wrote it, and I can look at it and figured out how it worked; that's better than I can say for my old Perl scripts.

Labels: ,

Puzzle Hunts are Everywhere: an elegant Mastermind Crawler

Last time, I wrote about a brute force web crawler. This time, I'm writing about an elegant web crawler. As you would expect from elegant code, I didn't write it.

The Pirates BATH game had a pregame website. Teams could log in to the web site. There was a web form which I'd programmed before I'd dropped out of Game Control. This web form allowed teams to "search for treasure": enter a string of text. Game Control gave them some strings of text that they could enter: entering one of those into the web form yielded puzzles. When a team solved the puzzle, the answer was a phrase: entering the phrase into the web form yielded a hint which would be useful during the upcoming game.

If they entered text that wasn't a puzzle and wasn't an answer, they were told that they'd found nothing. And if they paid attention, they also noticed some black dots, some white dots, and some xs. These were a "Mastermind" puzzle. If they entered a nonsense phrase, a program figured out which "useful" word was closest; it would then display one white dot for each letter in the correct place; a black dot for each correct letter in the wrong place; an X for each incorrect letter. So if "BELOW" was a word and someone entered "BLOW", they'd see a white dot (for the B), three black dots (for L, O, and W), and an X (for the E).

This was the way to find one game hint: no puzzles solved to the correct word for this hint. But four puzzles gave words that didn't actually yield hints--but instead were just near to the word to enter for this special hint.

What if a team just tried to guess every possible text string? They could guess A B C ... Y Z AA AB AC ... ZY ZZ AAA AAB AAC ... Of course, that would take a long time. It would probably take less time to just solve the puzzles.

So I was kind of surprised when my pager started buzzing one day: BATH Game Control was sending me messages: Team Scoobies had set up a bot to crawl the server! The Scoobies had found puzzles that they couldn't have found!

I looked over the logs. There was a program crawling the system, but the Scoobies weren't running it. Team Blood was running it. The bot was not brute-forcibly checking every possible text string. It was playing Mastermind!

It would guess "A". If it got back a white dot, it knew that at least one word started with A. If it got back a white dot, it knew that at least one word started with A. (A white dot meant right letter in right place.) Next it would try try AA AB AC AD AE ... AZ. If AA returned just one white dot (not two), then the bot knew no words started with AA (e.g., no word was AARDVARK). So it never tried AAA AAB AAC... Thus, it didn't need to check so many things. Thus, elegance.

When I reported my findings to Game Control, they decided that this thing must be stopped. Though it was elegant, what if it allowed the team to bypass puzzles? Game Control figured that this would be unfair.

Hmm, how to stop the bot without disrupting other teams? How did the bot work? Team Blood was running it. Rich Bragg captained Team Blood. I worked at the same company as Rich. Maybe he'd written this program while at work? And maybe he'd left the program somewhere where I could find it? I thought about it: If I were Rich and I'd written this program at work, where would I have put the source code? I looked there: no program. Then I tried my second guess and saw a file: piratebath.py. Bingo. It was a web crawler, a very specialized web crawler.

#!/usr/bin/python2.4

import cookielib
import re
import time
import urllib2

def Login():
   print "Logging in..."
   cj = cookielib.CookieJar()
   opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
   opener.open("http://www.piratesbath.com/log_in.php",
               "access_login=blood&access_password=bythepint")
   return opener

def NumMatches(html_data, substring):
   matches = re.findall(substring, html_data)
   if not matches:
       return 0   
   return len(matches)

def NumLettersCorrect(html_data):
   return NumMatches(html_data, "dot_white.gif")

def FoundTreasure(html_data):
   return NumMatches(html_data, "No treasure found.") == 0

def SearchOne(opener, results, query):
   data = opener.open("http://www.piratesbath.com/search.php?q=" +
query).read()
   letters_correct = NumLettersCorrect(data)
   print "Query:", query, "had", letters_correct, "of", len(query), "letters"
   all_correct = letters_correct == len(query)
   if all_correct and FoundTreasure(data):
       print "Found:", query              
       results.append(query)
   return all_correct

def SearchAll(opener, results, query_prefix = ''):
   alphabet = list('abcdefghijklmnopqrstuvwxyz')
   for letter in alphabet:
       if SearchOne(opener, results, query_prefix + letter):
           SearchAll(opener, results, query_prefix + letter)

def Run(query_prefix = ''):
   opener = Login()
   results = []
   SearchAll(opener, results, query_prefix)
   print "Results: ", len(results), "words starting with '%s'" % query_prefix
   for word in results:
       print word      

Run()

Aha, the code was checking for text in the page: dot_white.gif and No treasure found. If I just added some visible-to-bots-but-invisible-to-humans text like that, I could fool the bot into mis-counting white dots or what-have-you. So that's what I did. (Security-minded folks in the audience might say: uhm, but what about stopping the general case of bots? Yeah, I set up code for that too, but wanted to let Game Control configure it to say how much guessing was "too much", and that took a while. Fooling Rich's bot--that was a quick-n-dirty fix.)

(I notice that this code imports the "time" module, but doesn't use it. I wonder if an earlier version of code politely "slept" a little between queries--but maybe Rich figured out that the server was waiting a second between responding to a team's queries anyhow, and that the sleep was thus not so useful...)

Rich noticed when his bot started generating garbage results. He mailed Game Control to make sure there were no hard feelings. Game Control asked him to stop running it, and he did. He said that this script was basically another monitor: it alerted the team to the presence of new puzzles; thus no-one had to go re-check the piratesbath.com web site each day.

In hindsight, when I programmed that web form, we should have used it only for entering answers, not for getting puzzles. We should have used some other way to distribute puzzles. Thus, a team could monitor that to look for puzzles and Game Control wouldn't need to panic that someone was bypassing the puzzles to get the answers.

Labels: , ,

Puzzle Hunts are Everywhere: Brute Force Web Quiz Crawler

It's another blog post about how web programming skillz can aid in game-ish activities.

A couple of years ago, Team XX-Rated hosted the Paparazzi Game. I was sorry that illness made me miss the game itself. Fortunately, the illness didn't strike until after the pre-game, so I was able to participate in that. The pre-game gave me an excuse to use glitter glue. Also, there were puzzles.

The Online Dating Style Quiz Puzzle was pretty mysterious to me. It was a multiple-choice quiz; you could submit a set of choices and get a grade on your dating style. Later on, my team-mates showed me the clever way to solve this puzzle. But I didn't spot that on my own. I wasn't sure that I could spot cleverness in this puzzle on my own. I could tell it had some references to tabloid celebrities. But I knew almost nothing about tabloid celebrities.

I tried filling in the quiz a few times. Each time, I just got the grade "Honestly, you're pretty lame and none of us on staff would want to date you. Maybe you should re-read the questions.". But one time, I got a different grade: "You're exciting, but not so much to scare your partner away. Count on the questions in this quiz to lead you in the right direction." I hoped that might inspire a clever approach, but it didn't. But by now I was pretty sure that the grade was based solely on my multiple-choice choices: there wasn't something weird going on based on my history of past attempts, timing, or other factors.

So I used brute force: try all possible combinations of choices. Or, rather, try many of them. I wrote a little program that would cycle through choices and log those that got a non-"lame" grade. To avoid hogging the server, the program would "sleep" for a second between queries. Since there were many, many combinations of choices to consider, the program would take a long time: I planned to let it run overnight but it still wouldn't have time to try every choice. But I didn't especially want it to finish. I wanted enough data back so that I could understand the problem better, maybe figure out the clever bit.

So, the script. This is not that script; I lost track of that script. This script is a reconstruction; it's probably similar to that script.

import time
import urllib

ABC = "abc"
QBC = "qbc" # question 5 has different choices

QUIZ_URL = "http://www.xx-rated.org/xxtraonline/quizzes.php"

LOG_PATH = "/home/lahosken/log.txt"

q = [None,'','','','','','','','','','', '']

# Don't log the whole page; it's mostly boilerplate.  Just log the interesting part.
def GetUsefulParts(s):
  (before, marker, after) = s.partition('<!-- InstanceBeginEditable name="content" -->')
  if after: 
    s = after
  else:
    s = before
  (s, _, _) = s.partition('<!-- InstanceEndEditable')
  retval = ""
  for line in s.splitlines():
    line = line.strip()
    if not line: continue
    if line == '<p><img src="images/quizzes.gif" width="435" height="90"></p>' : continue
    if line == '<div align="center">': continue
    if line == """<h1><span class="style8">What's your dating style?</span></h1>""": continue
    if line == '<font size="+2" color="#000000">': continue
    if line == '</font>': continue
    if line == '<h3><a href="quizzes.php">Try again!</a></h3>': continue
    retval = retval + line
  return retval

for q[1] in ABC:
  for q[2] in ABC:
    for q[3] in ABC:
      for q[4] in ABC:
        for q[5] in QBC: # careful
          for q[6] in ABC:
            for q[7] in ABC:
              for q[8] in ABC:
                for q[9] in ABC:
                  for q[10] in ABC:
                    for q[11] in ABC:
                      time.sleep(1)
                      qlist = [('q'+str(ix), q[ix]) for ix in range(1,12)]
                      qlist.append(('Submit', "What's my style?"))
                      qlist.append(('force', 'brute'))
                      post_data = urllib.urlencode(qlist)
                      page_contents = urllib.urlopen(QUIZ_URL, post_data).read()

                      if page_contents.find("you're pretty lame") > -1: continue

                      log_file = open(LOG_PATH, 'a')
                      log_file.write(post_data)
                      log_file.write('\t')
                      log_file.write(GetUsefulParts(page_contents))
                      log_file.write('\n')
                      log_file.close()

The next morning I had a lot of data: choices which had produced non-"lame" grades.

q1=a&q2=a&q3=a&q4=b&q5=c&q6=b&q7=c&q8=c&q9=c&q10=b&q11=a&Submit=What%27s+my+style%3F&force=brute <p>You're exciting, but not so much to scare your partner away. Count on the questions in this quiz to lead you in the right direction. </p>
q1=a&q2=a&q3=a&q4=c&q5=b&q6=b&q7=a&q8=a&q9=c&q10=b&q11=a&Submit=What%27s+my+style%3F&force=brute <p>You're pretty spicy although not as much as you could be. You may want to reconsider some of your choices on the next date.</p>
q1=a&q2=a&q3=a&q4=c&q5=b&q6=b&q7=a&q8=b&q9=c&q10=b&q11=a&Submit=What%27s+my+style%3F&force=brute <p>You're exciting, but not so much to scare your partner away. Count on the questions in this quiz to lead you in the right direction. </p>
q1=a&q2=a&q3=a&q4=c&q5=b&q6=b&q7=a&q8=c&q9=c&q10=b&q11=a&Submit=What%27s+my+style%3F&force=brute <p>You're exciting, but not so much to scare your partner away. Count on the questions in this quiz to lead you in the right direction. </p>

This was interesting. That "pretty spicy" grade suggested that guess was pretty close. Also, I saw that the (a) choice for question 11 showed up in many good answers, as did the (b) choice for 10, the (c) choice for 9. Probably those were correct choices. (The recurring (a) choice for question 1 is less exciting: because of the way I'd ordered the guesses, all guesses that night had (a) for question 1.) I stared at those correct answers for a while, trying to see the clever pattern. That didn't get me very far.

So I used this information to tweak the brute force script. I changed some of the for loops so that instead of considering each choice (a, b, or c) it only considered the correct choice. (Yeah, there were more elegant ways I could have coded it; this was an easy edit.)

                for q[9] in 'c': # was ABC:
                  for q[10] in 'b': # was ABC:
                    for q[11] in 'a': # was ABC:
...and re-ran the script. Now it was wasting less time on bad choices for those questions. I let it run a while longer, looked at the output again, used it to narrow down a few more choices. Ran it again. The next time I looked through the logs, there was another kind of grade: one that let me know that the script had found the perfect answer.

No doubt it would have been more satisfying to solve this puzzle with cleverness than by brute force. But brute force can be fun, too.

Labels: ,

Puzzle Hunts are Everywhere: Simple Website Monitor

Waiting for the bus, Jonas asked me: "Why did you start beeping during that tech talk?"

People at work occasionally start beeping. We're an internet company with many servers. When servers have problems, system administrators' pagers start going off. But I'm not a system administrator. I'm a technical writer. When I go on vacation, I don't tell people "In case of emergency, you can reach me at ____", I write "In case of a documentation emergency (ha ha), you can reach me at _____". And it's funny. At least, it was funny the first time.

And the people at this tech talk probably weren't likely to get paged. It was a tech talk on an open-source library of computational geometry functions. You don't really see these people getting paged with... I dunno, some errant line segment getting into a data set, coincidentally totally vertical, its infinite slope causing numbers to spin out of whack or...

I don't know where I'm going with that. That doesn't happen.

I 'fessed up. That pager signal didn't come from a work system. I'd set up a computer program to monitor coed astronomy's web site. They were going to host a puzzle hunt and their web site would announce when sign-ups were possible. I wanted to know when that happened: if their game didn't have many slots for teams, I wanted to sign up before those slots filled up.

It's pretty easy to set up this kind of monitoring if you have a Unix machine on the internet. (I bet that very few people will be interested in this blog post. Half of the bay area puzzlehunters are software developers who will wonder why I'm describing something so obvious; the rest don't program and are about to get scared away when I show them source code. But maybe I'll show them that, if they're going to get into programming, that this is a pretty achievable task to take on.)

On a Unix machine, you can set up a "cron job". You can tell the machine to run a program once every few minutes. (You can set up cron jobs to run at strange times. I set up this cron job to run on prime-numbered minute offsets from the hour--because I was in a silly mood.)

What program did I set up? I set up a simple python script. This script checked coed astronomy's web page and compared its contents to a copy I'd downloaded earlier. If it noticed a change, it mailed my pager. Like I said, this script is simple, if you know what you're looking for. Python is a nice language to use because, for any given task, someone has probably written a library of functions to help you. The bad news is that it can find a while to find the library that you want. In this case, I wanted urllib2 to fetch the web page contents.

import os
import urllib2

# download the page
pagecontents = urllib2.urlopen("http://coedastronomy.org/sf/").read() 

# compare it to previously-saved file, "golden" copy of the page
# contents which I downloaded earlier.
goldenfile = open("/home/lahosken/golden.html")
goldencontents = goldenfile.read()

# If there's a difference, OMG page me by sending mail to my pager
if goldencontents != pagecontents:
  os.popen("mail page@lahosken.san-francisco.ca.us -s IT_IS_HAPPENING < /dev/null")

This script watches one page. I've done stranger things to watch a web site, running wget in spider mode and then recursively diffing the resulting directory to a previously-generated "golden" directory tree. And there are stranger things.

Anyhow, my pager went off during a tech talk. And it kept going off because I was so busy signing up for the game that I didn't immediately shut the script down. But I eventually did. Hopefully, everyone at the tech talk thought I was a pager-wearing badass entrusted to rescue foundering servers. But Jonas wasn't fooled. Maybe none of them were.

Labels: ,

Book Report: Beautiful Code Chs 30-33

(If you're reading these posts in reverse chronological order, be aware that this Book Report is the last one of a series. This book report is for Beautiful Code, a book of essays. Rather than try to review all of the essays at once, I chunked them into blog posts, a few chapters each.) (If you're reading these posts in forward chronological order, you reached the last one! But you probably already figured that out.)

When a Button is All that Connects You to the World / Arun Mehta

I didn't know what to make of this chapter. It's a HCI exercise, inspired by Stephen Hawking: how would you design a computer UI for someone who can just press one button? But Stephen Hawking doesn't use their software. It's not clear that anyone uses this software.

Emacspeak: the Complete Audio Desktop / T.V. Raman

Every so often at work, I see a message on a distribution list from this software developer, T.V. Raman. It's usually about how to use some API for some Web2.0-ish page. At first, these messages surprised me. I suppose I was prejudiced. I kept thinking, Wait, how did this blind guy even notice that this totally visual page had useful information on it? How does he make as much progress around the web as he does? I asked him for some background once, and he gave me an answer that didn't make much sense: it sounded like he'd set up the Emacs text editor to browse web page, screen-scrape useful information out of them, and speak it. But that couldn't be right, could it? Surely I was misunderstanding? But that's what he did.

In this essay, Raman gives us a glimpse into his Emacs configuration, how he put it together, how he's built it up. He talks about the early days of the web, before pages got all visually fancy. He talks about why he likes some parts of Web2.0: the APIs. They reveal data--unadorned with presentation. He knows how to program Emacs to fetch, parse, and speak the data he's interested in. In this article he talks about how he uses Emacs' advice feature to add text-to-speech support. I think I understand it now. I think I understand how this guy bootstrapped his system, turned Emacs into his window to the world.

Code in Motion / Laura Wingerd and Christopher Seiwald

Ha ha, folks from Perforce, a revision control system, wrote this essay about writing code so that you can understand it when you need to revise it years later. That's cute. This was a fun essay.

Writing Programs for "The Book" / Brian Hayes

This essay was pretty cool, though it's not clear how mere mortals are supposed to put its method into practice. Hayes was working on a geometry problem: testing three points for collinearity. He tries a few methods that don't work. So then he tries the best method: he posts to his blog, asking for help. Soon, he gets tons of advice, including one really great set of lecture notes which has a good solution.

How many computational geometers read your blog? Yeah, maybe I have one, but I don't know that he reads often. I'm not sure that Hayes' method would work for me.

The best part about this article was when I recognized the name of the author. Brian Hayes wrote the article about Markov-chain-generated text that inspired my Daily Nonsense page. Nowadays he has a blog. So I spent the rest of the day catching up on two years of archives on that. And I subscribed to his blog. I'm not a computational geometry expert, but maybe he'll be stumped by some Japanese ska trivia someday and I can help out.

Labels: , ,

Book Report: Beautiful Code Chs 26-29

Labor-Saving Architecture / William R. Otte and Douglas C. Schmidt

This is a fun essay, talking about issues that arise if you have a distributed network of computers and you want all of those computers to be able to send their logs to one central log collection server. They talk about different ways you might want to implement that server. You probably want to think about concurrency. Ideally, the client machines wouldn't care so much about how the server is implemented. They mentioned in passing some project called ACE that they've worked on. It wasn't clear how much of this essay came from ACE. I'm still not sure what ACE does. But this essay still got me thinking.

Integrating Business Partners the RESTful Way / Andrew Patzer

Can something partly elegant and partly quick-and-dirty show up in a book about Beauty? Ah, who cares, this was a fun read. Part of the fun: business clients who want the latest greatest buzzword w/out understanding what it means. This essay even had J2EE, but I still enjoyed reading it.

Beautiful Debugging / Andreas Zeller

Brute force debugging. Made me think. In the end, I don't think I much like the approach. Still, it's good when something makes you think.

Treating Code as an Essay / Yukihiro Matsumoto

Matusmoto "Matz", a programming language designer, thinks that programming languages should allow code to be clear and expressive. Then again, every language designer thinks that programming languages should allow code to be clear and expressive--but they can't agree on the details: which code is clearer? This essay talks about some of the issues around language design, but I've already heard these issues as I shielded my ears against various "my favorite programming language is better than yours" religious arguments over the years. I wish he'd chosen something more specific as his topic.

Labels: ,

Book Report: Beautiful Code Chs 22-25

(Visiting the doctor is good for you. Today, I visited a cardiologist to make sure that my recent hospital visit was Really No Big Deal. Thus, I missed the last bus to work and worked from home today. Thus, I'm not trapped in this evening's huge peninsula traffic jam. Three cheers for modern medicine. What? Oh, right, some more chapters from Beautiful Code.)

A Spoonful of Sewage / Bryan Cantrill

Thread scheduling algorithms. When I hear that I'm going to have to think about a thread scheduling algorithm--especially a thread scheduling algorithm that isn't quite working--I gulp nervously and tug at my collar. This essay about thread scheduling algorithms; it desribes a bumpy ride. I learned from it.

Oh, I keep looking for things to quote from this essay--but things from the middle don't make much sense unless you've read the beginning. There's some talk about debugging multi-threaded problems... Oh, just go read it. Oh, here's a bit that makes some sense:

The essence of the problem is this: for user-level locks, we normally keep track of the state associated with the lock (e.g., whether or not there's a waiter) at user level--that information is considered purely advisory by the kernel. (There are several situations in which the waiters can't be trusted, and the kernel knows not to trust it in those situations.)

To implement priority inheritance for user-level locks, however, one must become much more precise about ownership; the ownership must be tracked the same way we track ownership for kernel-level synchronization primitives. That is, when we're doing the complicated thread lock dance in turnstile_interlock(), we can't be doing loads from user-level memory to determine ownership. The nasty implication of this is that the kernel-level state tracking the ownership of the user-level lock must itself be protected by a lock, and that (in-kernel) lock must itself implement priority inheritance to avoid a potential inversion.

It also has some implicit good advice for code reviewers towards the end.

Distributed Programming with MapReduce / Jeff Dean & Sanjay Ghemawat

I skipped this chapter. Hey, give me a break. I hear about this stuff all the frickin' time.

Beautiful Concurrency / Simon Peyton-Jones

People who work with Haskell keep telling me how self-documenting the code is. They never convince me--probably because they make the mistake of showing me some Haskell code. This essay didn't convince me, either. Maybe because it showed me some Haskell code and then rushed along, thinking I'd understand it.

nTimes :: Int -> IO () -> IO ()
nTimes 0 do_this = return ()
nTimes n do_this = do { do_this; n_times (n-1) do_this }

Erlang folks also pull this kind of no-for-loops nonsense, but they don't keep telling me I should be able to read it by instinct. What does a loop have to do with IO? Does IO mean what it usually means? Do I care anymore?

Jones just wrote this essay. It's not his fault that other Haskell fans have soured me on the language. Maybe I should give this essay another chance. Nnnergh. I just don't have the willpower, not today.

Syntactic Abstraction / R. Kent Dybvig

I can't blame this one on the language. I thought I understood Scheme before I read this essay. But it turns out I don't know as much as I thought I did. Or I know the wrong dialect. Or something. Yoy. And this chapter was giving me freshman software engineering class Metacircular Evaluator flashbacks. I gave up on this chapter partway through.

Labels: , ,

Book Report: Beautiful Code Chs 17-21

Another Level of Indirection / Diomidis Spinellis

I'm not exactly sure what I was supposed to get out of this essay. "Function pointers can be useful."? OK, the point of these essays was not to instruct me; the point was that people would write about beautiful code. Still, I keep hoping to learn things.

Update: I got an email from someone claiming to be Diomidis Spinellis:

My point was that layering abstractions is an elegant design that can bring about powerful synergies. Obviously I didn't make it too clear. Pointers are just a mechanism for implementing this policy.

Cheers,

Diomidis

Python's Dictionary Implementation / Andrew Kuchling

Because Python uses dictionaries for just about everything, it has some interesting use cases. I liked the Special Accommodations section here; I also liked the discussion of the Free List.

Multidimensional Iterators in NumPy / Travis E. Oliphant

This essay mostly lost me. It explained the problem OK, but then it started talking about the solution and the special data structures useful for that solution... and just lost me.

A Highly Reliable Enterprise System for NASA's Mars Rover Mission / Ronald Mak

"Middleware... java beans..." I'm sorry, my brain just went into emergency shutdown mode. Skipping to next chapter.

ERP5 / Regerio Atem de Carvalho and Rafael Monnerat

It's a flexible content management system. That's good, because no two people can agree on how they want their content management system to work. I didn't pay much attention to this chapter, though.

Labels: ,

Book Report: Beautiful Code: Chs 13-16

The Design of the Gene Sorter / Jim Kent

This essay is what I want to see in a book called Beautiful Code. He talks about the design. He dives into specifics of implementation. The section "Theory of Beautiful Code in the Large" touches on making code readable, and it has wisdom aplenty. "Programming is a human activity, and perhaps the resource that limits us most when programming is our human memory." If you find yourself, in the first few sections saying "I'm not going to write an OO system in plain old C; I'm skipping the rest of this essay," maybe instead you want to skip ahead to this essay's "Theory of Beautiful Code in the Large", which is pretty generally applicable.

How Elegant Code Evolves with Hardware / Jack Dongarra and Piotr Luszczek

What I got out of this essay: math programming is arcane.

The Long-Term Benefits of Beautiful Design / Adam Kolawa

What I got out of this essay: math programming is arcane, but you can do some things to make it a little less so.

The Linux Kernel Driver Model / Greg Kroah-Hartman

I sprinted through this essay. The previous two essays had been so arcane that I'd got into the habit of skimming. By the time I had determined that this essay might be comprehensible, I had rushed through most of it. But there were some good things.

The lack of runtime checking forces the developers who are manipulating these pointers to be absolutely sure they know exactly what type of pointer they are manipulating and passing around the system. Sure, at moments, a developer really wishes that there would be some way to determine what type of struct device he is staring at, but the feeling eventually passes when the problem is properly debugged.

Is this lack of type checking good enough to be called "beautiful code"? After working with it for over five years, yes, I think it is. It keeps easy hacks from springing up within the kernel and forces everyone to be very exact in their logic, never falling back on any checks for types that might prevent bugs from happening later.

I should note here that only a relatively small number of developers--those who code up subsystems for common buses--work on these parts of the kernel, and that they are expected to develop considerable expertise; that is why no hand-holding in the form of type-checking is done here.

I'm still not sure what I think of that. "No-one else will ever need to look at this" has been an excuse for a lot of ugly code. And yet, I can't argue with their good results here.

Labels: ,

Book Report: Beautiful Code: Chs 9-12

(I started learning Erlang a couple of weeks ago. Then I stopped. I'd started learning how to use the concurrency features. So I tried a simple program: it ran a "while true" loop in two threads--should max out both of my CPUs, right? I looked at my system stats with a top, and saw that my program was only using one CPU.

So then I gave up on Erlang.

Then I hung out with friends. One of them was pretty frustrated trying to view some HD movies he'd downloaded. The people uploading these movies use a compressed format. The compression doesn't save much space, but most people use it anyhow. But it means that when you watch the movie, the computer needs to think pretty hard to de-compress all of that data. My friend's computer can't always keep up. He's mad at people who use this compressed format. Me, I latched onto something he mentioned in passing--his machine has two CPUs; during decompression one CPU is maxed out, but the other sits idle.

I started noodling around some ideas of how you could parallelize this task, which brought me back to thinking about concurrency. OK, Erlang didn't work. C++ threading... I dunno, there's probably some library I could use that would make C++ threading less painful than banging on my fingers with hammers. But would I be able to find that library before my head exploded?

Twisted Python sounded like it was trying to solve the parallelization problem. I set up a Twisted equivalent of the two "while true" loops. It used more than one CPU! It used... about 120% of a CPU. Nice, but less than the 200% I hoped for. Ah, I was hoping that Twisted had removed Python's "global lock" that I'd heard occasional whines about, but they hadn't. But while reading about this, I learned something...

erl -smp. Erlang uses just one CPU, unless you specifically tell it to use more, by passing the -smp flag. Why isn't that the default? Good grief. Anyhow, the "Getting Started with Erlang" tutorial, which I'd been picking my way through... it didn't mention that little detail. So now I'm back to playing with Erlang. When I'm home and alert. Which isn't often. Anyhow.

What? Oh, right, this is supposed to be a Book Report, not me whining about the sad state of parallel programming for home programmers.)

Top Down Operator Precedence / Douglas Crockford

I'm not sure what Top Down Operator Precedence is, and this essay didn't tell me. It jumped right in and showed me. Too bad I didn't follow it. This essay wasn't very useful to me, except that it made me think I should go read the original paper on Top Down Operator Precedence. Maybe if this essay had included a summary of T.D.O.P., I could have remembered that bp stood for Binding Power instead of base pointer, but I gave up on this muddle.

The Quest for an Accelerated Population Count / Henry S. Warren Jr.

Bit-counting. Oh my yes. I enjoyed this essay, but I hope I don't have much excuse to use its lessons.

Secure Communication / Ashish Gulhati

The story of a project that grew over several iterations. This essay meanders plenty. It was a fun read, but at the end, I hadn't learned anything. I wish he'd chosen one tweak to this program and dove in to more detail. The "Auditing Crypt::GPG" section seems like it would be a good one to expand.

Growing Beautiful Code in BioPerl / Lincoln Stein

When I whine about programmers who write about their own code, this is the kind of thing I'm talking about. Lincoln has his act together. There's some good stuff in this essay. But there were some stretches which seemed to talk about stuff that was kinda obvious--and I am not the smartest coder on the block.

Labels: ,

Book Report: Beautiful Code Chs 5-8

Correct, Beautiful, Fast (in That Order) / Elliotte Rusty Harold

Emerging from the previous essay, I saw that this essay was going to be about verifying correctness of XML. My yawning muscles tensed in anticipation. But this essay was good. If you're writing an XML validator in Java, a tricky bit is: "The Java Character.isLetterOrDigit and Character.isDigit methods aren't perfectly aligned with XML's definition of letters and digits." Is it worth the processing time to use your own character-checking function?

Framework for Integrated Test: Beauty through Fragility / Michael Feathers

This essay is about a simple program that could have been made much more complicated. In some ways software engineers are like "real" engineers; one of those ways is: we over-engineer things. It's easier to write about adding things to systems rather than to write about leaving them out. What did you add to the system? You can write about that. What did you leave out of the system? Well, you left out millions of things. How do you write about that? This essay is about a simple framework that left out a lot of cruft.

Beautiful Tests / Alberto Savoia

I've already been won over to the cult of unit testing; I didn't learn much from this essay.

On the Fly Code Generation for Image Processing / Charles Petzold

Back in the day, we wrote self modifying assembly code. And by "we", I mean "not me". I didn't learn much assembly until I emerged from university. And by then, self-modifying code wasn't such a great idea anymore.

In this essay, Petzold shows how to apply the lessons of assembly code to... C# virtual machine instructions. I don't know much about C#, but it was fun to read how some bit-twiddlers are applying old tricks in a new environment to good effect.

Labels: , ,

Book Report: Beautiful Code Chs 2-4

(Another episode of Iron Puzzler is coming soon. And now, on to our partial book report, Beautiful Code, chapters 2-4...)

Subversion's Delta Editor / Karl Fogel

This essay was nice. It talks about SVN's way of expressing file trees and deltas to that tree. I think I'd seen this in some lecture or other, but it had mostly leaked out of my head again; the review was welcome. Next, this chapter discusses the problem the SVN team wanted to solve, a tricky bit of API design. Then the explanation of the API: a code comment. It's a long code comment. But it's a good code comment.

"The real strength of this API, and, I suspect, of any good API, is that it guide's one thinking." Heck, any API guides your thinking. A good API leads your thinking to a better place;a bad API leads your thinking to bedlam.

The Most Beautiful Code I Never Wrote / Jon Bentley

Jon Bentley convinced me to buy this book when he presented this essay's material as a talk. It's a great talk, with twists and turns and math and the importance of measurement. Uhm, but since I'd already seen the talk, I just kinda skimmed the chapter.

Finding Things / Tim Bray

This essay left me cold. Searching Apache weblogs with a regular expression? OK. Lookups using an index done via a binary search? Sure, OK. I dunno, this "beautiful" code seemed kind of everyday to me. Maybe there were subtle insights here that I overlooked. Maybe I take this stuff too much for granted.

Labels: ,

Book Report: Beautiful Code Ch 1

Beautiful Code is a book about programming well. There are 33 chapters. In each chapter, one or two big-name programmers write about "the most beautiful piece of code they knew." As you'd expect with so many authors, the result is a mixed bag. The good parts are very, very good. I'm splitting up my book report into multiple parts, as with that Cyber China book.

A Regular Expression Matcher / Brian Kernighan

In this essay, Kernighan discusses the regular expression matching program that Rob Pike wrote for their book The Practice of Programming. There's plenty to like in this essay.

He's talking about code that he did not write. Most of the better essays in this book were by programmers talking about other people's code. Programmers writing about their own code bog down in minutiae. But maybe there's a deeper reason I liked these essays in which someone looked at someone else's code. I'm trying to learn something new about programming by looking at someone else's code. Kernighan has looked at this code and been impressed, and he thinks that code will impress me. That's a good sign. When someone's writing about their own code, you don't know that you're going to learn from it.

This code was meant to be sample code. I'm a technical writer. If you're not a technical writer, then you think I write English all day. I do write English, but I also work on sample code, and that's darned hard. It should be simple, so that a newbie can understand it; correct because many people will copy-paste that code without understanding it very well. Oh, and make sure you get it correct on the very first try, because people who copy-paste that code into their programs today won't check your errata page next month.

That's why I liked the code. To find out what Kernighan likes, get the book + read the essay. I was hoping to review a few essays from the book in each book report, and I've already meandered for a few paragraphs on just one. Sigh. Hopefully I'll calm down soon, get my concision back.

Oh, and if you're a kid fresh out of school with a newly-minted Computer Science degree, go read the Practice of Programming. I swear if I'd read that book before I entered The Industry, the first ~four years would have made much more sense.

Labels: ,

Site Update: Mini-feed on Home Page

I continue to putter around with the computer. I did some programming this morning, and now this site's home page has a little mini-feed with links to a few recent articles on this blog. Not wildly exciting, but it prompted me to configure some... Oh, you're falling asleep, aren't you?

Hey!

If you're going to act that way... Check out the Richter Scales singing "Here Comes Another Bubble" if you haven't already. It's exciting.

Labels: , ,

Site Update: Updated Tags for Old Blog Posts

Blogger.com manages this part of my site, the /new/ part. In the long-forgotten days of 2006, Blogger.com didn't support labels/tags/whatever. In those dark days, I hand-made some tags, tags which looked suspiciously like tarted-up links to technorati.com. Then Blogger.com did support labels/tags/whatever. But now I had all of these blog posts that didn't follow the new tagging scheme. What to do?

I've been noodling around on the computer a lot, and today I tackled this one. What did I want to do?

  1. Read my blog's data into a program
  2. Look for links to technorati.com/tag/foo
  3. For each of those links, create a label "foo".
  4. Upload the results to Blogger.com

As I was putting a program together to do this, I discovered another task: don't convert all of the tags; tweak some of them. That sounds like a strange goal, doesn't it? Well, I had made technorati tags for nautical and maritime, just one item tagged with each. Maybe I wanted to convert that "maritime" to "nautical" for consistency. Why leave out some tags? I'd technorati-tagged a Pi post "irrational". I'd never used that tag again. (You might doubt that, given the hysterical tone of this blog, but it's true.) I didn't especially want to clutter up my tag space with 100 obscure tags, each used only once.

So I started reading up on the Blogger GData APIs. At first I thought I could install the Python client library with a simple sudo aptitude install python-gdata. That seemed to install an old version of the library that didn't work with my Python. (It insisted on importing ElemTree from some nonexistent place.) So I ended up downloading and installing the latest version of the library.

Then I set about reading the docs, copying out bits of sample code, and twisting them towards my own purpose. Soon I had a messy piece of code that seemed to work:

from xml.etree import ElementTree
import gdata.service
import gdata
import atom
import getopt
import re

TAG_RE = re.compile('technorati.com/tag/([^\'\"]+)')

TAG_MAP = {} # If a tag isn't in here, ignore it.  If it is in here, see how to convert it.
TAG_MAP['book'] = 'book'
TAG_MAP['books'] = 'books'
TAG_MAP['puzzle%20hunts'] = 'puzzlehunts'
TAG_MAP['puzzle%20hunt'] = 'puzzlehunts'
# ...this TAG_MAP crapola went on for a while
TAG_MAP['pi'] = 'pi'
TAG_MAP['poesy'] = 'poesy'

# Get my blog's ID number.  I guess I didn't really need to 
# run this each time.  I should just have made a note of the
# number.
def RetrieveBlogId(blogger_service):
  query = gdata.service.Query()
  query.feed = '/feeds/default/blogs'
  feed = blogger_service.Get(query.ToUri())
  return feed.entry[0].GetSelfLink().href.split("/")[-1]

# The program doesn't actually call this function!  But
# an earlier version of the program did, back when I was
# still trying to figure out how the API worked.  This fn is
# taken from the sample code, but with an important tweak.
# The sample didn't mention that by default, this would
# only retrieve 25 blog posts.  My blog had 427 items, so 
# I appended a tactical "?max-results=500".
def PrintAllPosts(blogger_service, blog_id):
    feed = blogger_service.GetFeed('/feeds/' + blog_id + '/posts/default?max-results=500')
    print feed.title.text
    for entry in feed.entry:
      print "TITLE \t" + entry.title.text
      print "\t" + entry.content.text
      print "\t" + entry.updated.text
    print

# Create a client class which will make HTTP requests with Google Docs server.
blogger_service = gdata.service.GDataService("lahosken@gmail.com", "ilikeyou")
blogger_service.source = 'Technorati-to-Label-1.0' 
blogger_service.service = 'blogger'
blogger_service.server = 'www.blogger.com'
blogger_service.ProgrammaticLogin()

blog_id = RetrieveBlogId(blogger_service)
# PrintAllPosts(blogger_service, blog_id)

feed = blogger_service.GetFeed('/feeds/' + blog_id + '/posts/default?max-results=500')
for e in feed.entry:
    dirty = False
    for tag in TAG_RE.findall(e.content.text):
        if tag in TAG_MAP:
            kitty = atom.Category(term=TAG_MAP[tag], scheme="http://www.blogger.com/atom/ns#")
            e.category.append(kitty)
            dirty = True
    if dirty:
        blogger_service.Put(e, e.GetEditLink().href)

Yes, that is some awful code. No, I don't think it was good style to name that variable "kitty". Give me a break, it was a one-off.

This code ran quickly, but I think that Blogger.com is still handling the results. At first, I didn't think that it had worked. I thought maybe I needed to force a republish of all my pages. So I tweaked the blog's template and triggered a republish. But I don't think that was the problem. Actually, my pages are changing. It's just taking a while. It's several hours later now, and I notice that my blog's pages keep getting re-uploaded. I think I changed tags on about 100 blog posts, and I think each of those changes triggered a republish. They seem to happen about 2-10 minutes apart. If there are a few hundred to process (plus that template change), I guess it makes sense that it would take a few hours.

I wonder when this post will appear.

Labels: , ,

Book Report: Parallel Distributed Processing

Based on the title, I hoped that this heavy two-volume set of books containing a number of articles would teach me a lot about how to write programs that run on several machines at once. After reading a couple of sections and skimming further, I figured out that these articles are about what we nowadays call "neural nets". Maybe these are great articles about neural nets, but that's not what I wanted to read about. I stopped.

Labels: , , ,

Book Report: BAE05: Ellen Ullman's "Dining with Robots"

The Best American Essays 2005 contains two essays which pay homage to the then recently-deceased chef Julia Child. One of them is by Ellen Ullman. Ellen Ullman is a geek; she writes about software development; here in the essay "Dining with Robots" she writes about a clunky metaphor. When she was learning to write computer programs, she heard this metaphor; when I was learning to code, I heard this metaphor. Maybe it's universal. The metaphor is this:

A computer program is like a recipe. It's a set of instructions.

Ellen Ullman points out that this metaphor doesn't work when you look at a recipe from Mastering the Art of French Cooking--she looks at a recipe that mentions "important guests" and "a good bordeaux". These are real-world concepts, not easily expressed as computer data structures.

This leads Ms. Ullman to some musing on the topic: AI is a hard problem. More specifically, creating artificial intelligence that will interact with humans is a hard problem. But she wrote this article for non-technical people, so she doesn't talk about various past AI techniques which flopped. Instead, she talks about the peculiarly human thinking we do automatically at a dinner party: sitting in a chair, understanding the utility of an ice-cream spoon; tasting.

Then it gets weirder. What a fun essay.

Tags:  |  |  |

Labels: , ,

Book Report: Game Physics

David Eberly wrote this computer programming book about physics and numerical methods. Where "numerical methods" means making quick accurate calculations. It's an interesting subject, and this is an interesting book. Plus, the pages of the book are full of calculations, so the guy next to you on the bus will think you're a genius. There's a lot in this book, and I flipped past a bunch of stuff I wasn't interested in, and wish I'd flipped past more--I'm still not sure why there's so much material about LaGrangian dynamics--how often is there a game where the player is stuck on a track?

Tags:  |  |  |

Labels: ,

Switching Gears

Today I felt like I'd lost a fight with the interior of a passenger van, but that wasn't the problem. I'd had a great weekend playing in the Griffiths Game, a 24+ hour puzzle hunt run by the Burninators and their band of merry volunteers. This morning, I was paying the price: bruises everywhere (from stumbling around inside a van) and achy bones (from sitting still inside a van), and more bruises (from walking into things after staying awake for too long). But that wasn't the problem.

The problem was how I prepared my backpack this morning. I remembered to untie the magic lasso, dumped out many spare sheets of not-exactly-hex paper, and carefully extracted a plastic bag full of puzzles. But I forgot to pack a book.

So it was a long bus ride this morning. I sat and stared out the window.

That wasn't bad, though. I looked at things outside the window and I thought of puzzles based around those things. I had a few ideas. It wasn't many by a true puzzler's standard. But up until this point, I'd had about one half-baked puzzle idea per month, so having a few ideas in one hour was quite a jump.

* ~ * ~ * ~ *

On the bus ride back from the office, I read an office copy of the book Effective Java. Thus I spared myself the ordeal of trying to think of any more puzzles. Exercising your creativity every so often is OK, but you wouldn't want to make a habit out of it.

Labels: ,

Link: Parallel Analysis with Sawzall

People ask me what I do at work. I did not write the academic paper Interpreting the Data: Parallel Analysis with Sawzall (Pike, Dorward, Griesemer, Quinlan 2005). But I did revise the tutorial for the described system/language.

(The views expressed on these pages are mine alone and not those of my employer.)

Tags:  |  |  |

Labels: ,

Link: Joel on Hungarian Notation

Just when I thought I was going to have to read the papers myself, Joel Spolsky wrote a readable paper about the non-braindead version of the software engineering technique Hungarian Notation. Is the technique worth using? I used to think "Hell no!", but I'm backsliding to "I guess not."

Labels: ,

Hungarian Notation Not Brain Dead

(If you are not a computer programmer, this item will not make sense.)

For years I made fun of Hungarian Notation and Charles Simonyi. Now, thanks to Joel Spolsky, I find out that Hungarian Notation started out as something useful.

We try to use something called Apps Hungarian notation, as invented by Simonyi, not the grotesque bastard Systems Hungarian notation, misinterpreted by Petzold and the entire Windows team.

--Joel Spolsky, The Road to FogBugz 4.0, Part III

Now I'm tempted to start making fun of this Petzold guy. But of course it would only be a matter of time until I found out that he was OK, too. So never mind.

[Update May 12 2005: Joel wrote about the right way to use Hungarian notation.]

Tags:  |  |

Labels: ,

Book Report: The Process of Creating Life

The Process of Creating Life is the second book of Christopher Alexander's Nature of Order tetralogy. That is, this is a book that is Alexander's theory of the universe and how this nature should guide us in making buildings.

Summary: The first half of the book tells you that an incremental plan/build cycle is good. This is good advice, but it gets lost in the mumbo-jumbo. With this good advice in mind, skip ahead to section 10, part 9. Now start reading--the rest of the book is still about 50% mumbo-jumbo, but the rest is good--there are case studies, and you can learn from them. If you read nothing else in this book, read the appendix. It is one long case study, and has some good stories.

Enough with the summary, already.

This book feels like an update on his The Timeless Way of Building in that the message is: Don't try to plan everything out ahead of time. Instead, alternate between building and planning stages. Maybe this means that you won't know what your second floor will look like as you work on the first floor--but this uncertainty is worth it since it allows you to make decisions while standing on the site of the second floor.

He points out that this is not the normal way of doing things. Most architects try to plan out everything in advance. The results are often ill-fitted to the building site and don't work together well.

OK, so that's pretty much the same message as The Timeless Way of Building. What is new?

The first part of the book combines this incremental approach with the life-giving structural properties of The Phenomenon of Life. Alexander creates a napkin doodle by applying these features incrementally, always preserving symmetry. This is very pretty. It seems doubtful that one could apply this process towards making a building/town/whatever. Or, rather, I can not see how one could use this process to steer a building plan towards something good versus towards something bad.

Oh, wait, before that he talked about how growth works in biological systems. This section is also very pretty. I think the lesson we're supposed to take away from this is that things don't grow as if they were being squirted into an injection mold. They grow--they grow larger, and they grow more complicated. They grow at different rates in different parts. This is very pretty. It seems doubtful that one could apply this process towards making a building/town/whatever.

Then there is a lot of complaining about designs by other architects. Of course, Alexander can not just say, "I didn't like it." Instead, these structures could not have been created by a sequence of Wholeness-preserving application of the Properties. Except that some of them could have been. How can we explain these away? Well, these could have been created by iterated transformations based on the properties--but only if those transformations happened in the wrong order. And who can say what this wrong order is? Christopher Alexander can, of course. But he can not explain it.

He tried to explain it to a student. He told the student to first choose the most important part of the design, to start with that and get it just right. And not to start working on the next part of the design until the first part was right. So that determines the order, right? Maybe. Alexander describes stepping through this process with the student. Alexander asked the student which was the most important part, and the student answered, and Alexander said, OK, work on that part. Then Alexander asked, what is the next most important part of the design? And the student answered, but Alexander did not agree with this answer. And so Alexander said "Really? Is that really the most important part?" And he was eventually able to bully the student into backing down. The student rattled off some other guesses until he found one that pleased Alexander. And thus the process continued.

Alexander does not phrase it that way, of course.

I do not doubt that the resulting design was good, but obviously it's not so easy to determine what the right order of transformations should be.

There are some photos and descriptions of good buildings and places. Alexander points out some modern ones. I guess if I read more architecture theory books, I'd know that Alexander's critics accuse him of being a Luddite or something. Fortunately, I have no plans to read much more architecture theory.

That's the gist of the first half of the book. It had a pretty low signal-to-noise ratio.

Fortunately, it gets better. In the second half of the book, there are some good case studies. One thing I learned from them: if Alexander uses those 15 structural properties that he laid out in The Phenomenon of Life, he does so only unconsciously. Which suggests that one should not read The Phenomenon of Life.

There is a case study of the Julian Street Inn in San Jose. Here we see a series of sketches in which the plan takes shape. You can see the centers form, and Alexander talks about how he made decisions. It's by feel: he creates "strong centers". He creates a place with a "living feel". That is, he does not describe his process in a way that helps the reader to make these decisions. But the sketches help a lot--showing which parts of the plan took shape in which order.

There are some summaries of pattern languages for different places. Though these summaries mostly only provide lists of pattern names, it is interesting to see what variations of pattern language are useful in different places and cultures.

Chapter Fifteen "Emergence of Formal Geometry" has more site sketches and pretty pictures. Here, we are not taken through so many steps of the initial planning, but there are some. Again, the fifteen properties do not appear. Actually, he does rattle off some of them when he talks about creating a grid, but in a way that suggests that one can use those properties to justify anything.

A scary story for the computer geeks in the audience:

When I described the Pasadena apartmnet-building sequence and other examples to a prominent computer scientist in Silicon valley he said to me, with an astonished look on his face: "You mean the generator problem, for architecture is solvable?"

I told him that I did mean that, and that my colleagues and I were on the way to solving it for a large number of particular cases, and believed it to be solvable in general.

Can we claim that a problem is solved if we still believe it is AI-complete? Bah. Did Alexander ask what was meant by the generator problem?

He presents a list of elements of society in which changes must be made:

  • The process of banking
  • The control and regulation of money
  • The way money flows through a project
  • The conditions in which risk is deployed
  • The process of development
  • Speculation in land
  • Construction contracts
  • The role of architects and engineers
  • Organization of construction companies
  • The nature of planning
  • The nature of master plans
  • The nature of construction contracts
  • The process of ecological evaluation
  • Evaluation by lending institutions
  • Architectural competitions
  • The size and scope of architect's work
  • The teaching of architecture
  • The priorities of manufacturers
  • Building codes and regulations
  • The role of town planners
  • The mortgage process
  • The process of housing ownership
  • Control over housing
  • Ownership of public land and streets
  • Protection of the wilderness

Distressingly, he makes a good case for these. Our laws and customs of land use are pretty screwed up. To make them good would require some pretty fundamental changes to our society.

The best part of the book is the Appendix "A small example of a living process", a case study of planning/building a house in the Berkeley hills. Here we see way that the plans changed as the building was built. We see how to trick the Berkeley building inspectors. We see the advantages and frustrations of the incremental building/planning approach. And there are plenty of pretty photos. Check it out.

Tags:  ;  ;

Labels: , ,

Hiding Data in Metadata

I'm flipping through this telegraphic code book which E. E. Morgan's Sons used for encoding messages long ago. Most of it consists of code words to convey phrases. E.g., instead of sending "one hundred tins", you would send the code word "waver".

But the people using this book used a trick to embed data in their signatures. Only a few people were composing messages. So the person's signature didn't convey that much information. But the people at E. E. Morgan's Sons didn't necessarily sign their names. On different days of the week, you would sign a different name. On Monday, Hargrove would sign his name as "Harford". On Tuesday, he would sign as "Harive".

Thus, they encoded both the signator's identity and the day of the week in the signature.

Does this mean that the telegraph service did not charge money for the signature? Otherwise, I would thingk that they would want to choose shorter names. Does this mean that telegraphs sometimes took a few days to arrive? I don't know. But I am jotting this note none-the-less.

Labels: ,

Book Report: The Phenomenon of Life

Summary: This is a good book if you skip the first four chapters, the last chapter, and half of the appendices.

Christopher Alexander is famous as the honcho behind A Pattern Language. A Pattern Language described a set of architectural techniques. It was organized very well. Computer geeks liked it--we recognized that this was a good way to organize a bag of tricks. Do you remember that the introduction to the computer science book Design Patterns mentioned that some architect came up with the idea of design patterns? Christopher Alexander is that architect. He released a big four-volume work, The Nature of Order, which is supposed to provide the foundation for his previous works.

I read volume one: The Phenomenon of Life. It was tough going, but it was worth it. I suspect that Alexander would be disappointed with what I got out of this book: some rules of thumb for judging aesthetics. I was supposed to get a New Way to Undersand the Wholeness.

Although this may sound absurd to ears trained in the last few decades of scientific orthodoxy, I shall try to show that this conception is more profound scientifically, that it has a solid basis in mathematical and physical understanding of space, and above all that it does provide us with a single coherent conception of the world, and of what we are doing in the world, when we try to make the world "alive."

--The Phenomenon of Life, Christopher Alexander
introducing some attempts at rigorous definition

These are scientific facts based on the research done by captive supergeniuses working in controlled conditions with test mice and test mice dressed like tiny wizards.

--Dorkstorm: the Annihilation, Seanbaby
introducing a similarly rigorous work

Alexander distinguishes between good places and bad places by saying that good places have more "life". He spends the half of this book describing the nature of life. The Dead Milkmen tell us: "Philosophers try to bring me down: 'What's the meaning of life?' Go kill a cop then drink 'til you drop; baby that's my advice in a f&#*ed up world." In other words, you should not attempt to be very rigorous or mathematical when defining life. Yet Alexander attempts to be rigorous. He spends several pages telling us that he will teach us a new way to see the world. He says that we need to understand the Wholeness. Yes, he capitalizes it. He provides an appendix which attempts to define this Wholeness. I claim that it's never defined: the appendix shifts the vagueness around, but the vagueness never leaves. The attempts at rigor are long-winded.

And yet, if you can skim over the rigor, he presents fifteen structural features of things which have life. In other words, fifteen things to keep an eye out for when considering the form and function of a place.

  • levels of scale
  • strong centers
  • boundaries
  • alternating repetition
  • positive space
  • good shape
  • local symmetries
  • deep interlock and ambiguity
  • contrast
  • gradients
  • roughness
  • echoes
  • the void
  • simplicity and inner calm
  • not-separateness

He provides illustrations of places and natural objects which do or do not possess these properties. He discusses how each of these properties can help a place. I was glad that I read this part.

I am not sure why Alexander wants to present his ideas encumbered with so much rigor. A well-designed, well-described, well-chosen, well-organized Bag of Tricks is more useful than a Humbug Theory of Architectural Technique. That's how A Pattern Language inspired a generation of computer hackers to rip it off. If this book's fifteen properties don't have a Grand Theory to back them up, but if they work, are they still worth studying? Of course they are.

There is a good story buried in the footnotes, in which we find our heroes battling forces of oppression:

In order to help the city of Nagoya, my colleagues in Japan made a survey in which 100 family members were asked to describe their feelings about the kind of housing I had proposed, compared with the 14-story apartment buildings that are usually built at the same cost and density. ...Once this survey was made it showed overwhelmingly that the families questioned prefered [my kind of housing].

However, it was surprisingly hard even to get permission to make this survey in the first place. Public agencies in Nagoya went to some trouble to prevent this survey from being made at all by interfering with practical details of the survey process, and by trying to change the questions. ... (Details of their attempt to prevent this survey from taking place are given in Christopher Alexander and Hisae Hosoi, The Precious Jewel, forthcoming.

That sounds like it's going to be interesting.

Then, in Appendix 3, we get to see Alexander bully a roomful of students into seeing the universe the way he claims that people naturally see the universe:

It is not always easy to see the wholeness which exists in the world. ... I was astonished, many years ago, to find out, in the course of an experiment I was doing with Radcliffe students, that most of them did not see the wholeness of simple patterns. They saw, instead, a distorted picture of these patterns, viewed them with arbitrary intellectual devices rather than responding to the deeper wholeness that was present in them. I found out, too, that it took immense effort to dissuade them from their distorted cognition, and to help them to see wholeness as it is. ...

Read this book and you might learn some new ways to think. Just don't get so snowed under by rigor that you believe that these are the only ways to think.

Tags:  ;  ;

Labels: , ,

I bet you get these mixed up all the time

Last week, I read the book Managing Gigabytes by Witten, Moffat, and Bell. It's about storing and retrieving huge repositories of data.

This week, I am reading Trilobite! (Eyewitness to Evolution) by Richard Fortey. It's about trilobites, the prehistoric critters.

This, of course, begs the question: What is the difference between a gigabyte and a trilobite?

  • A gigabyte is a K of megabytes--about 109 bytes. "Trilobite" sounds kind of like "A trillion bites", which in the US&A is about 1012 bites. Advantage: Trilobite
  • One half of a gigabyte is a giganybble. One eighth of a trilobite is like a trilobite over eight; and that is like a trilobite overate; which is more than a nibble, but much much less than a giganybble. Advantage: Gigabyte
  • A word-based Huffman encoding can compress a gigabyte of text into much less space. Intense geological pressure can compress a trilobite into a fossil. While this takes less space than the original critter, it is a very lossy transformation. Advantage: Gigabyte
  • Retrieving a gigabyte of information from storage involves a fast index lookup and also decompression. Retrieving a trilobite from the surrounding rock involves a hammer. Advantage: Trilobite

Tags:  ;  ;

Labels: ,

Tech-Brain Candy

When I commute to work, I change buses close to the San Francisco main library. Tonight, I took advantage of this. During the ride from Mountain View to San Francisco, I'd been reading Managing Gigabytes, wrapping my head around algorithms for efficiently implementing an index for a full-text-search system.

My brain was full.

So it was nice to stop in at the library and pick up The Turk, a little volume about a mysterious fake chess-playing automaton. When 20th century* tech is too much to take, it's nice to just jump back a couple of hundred years.

* Yes, I know this is the 21st century, but the book was written in 1994, so back off.

Tag:

Labels: ,

[Powered by Blogger | Feed | Feeds I Like ]

home |