Larry Hosken: New

Book Report: Security Engineering

This book is humongous! It's a survey of security computer engineering. It doesn't go into depth on any one topic, but it's got plenty of breadth. In areas where I already knew something, this book didn't teach me anything. But in areas where I didn't already know something, this book taught me plenty. For example:

Some people are born without fingerprints.
A history of smartcard hacking.
The original motivator for "watermarking" schemes was for proof of authorship (but it turns out that folks aren't trying so hard to claim they wrote Miley Cyrus' songs--they just want to be able to copy those songs).

There were some aspects of Tor I hadn't heard about; admittedly that's because I don't know much about Tor. Similarly, I'd heard some things about government clearance levels, but I hadn't heard about some of the devices used to carefully, carefully move information betwen information clearance levels...

An interesting factoid from the more-exciting-than-it-looks world of banking: about 1% of bank employees "go bad" each year. Embezzles something, steals, helps someone else to defraud... One percent. That's worse than I expected. I don't think everyone is squeaky-clean, but we aren't talking about a random sample of the world population here. These are people who got hired at a bank. There was probably a background check somewhere in there. They had to make it through an interview with folks looking for twitchy behavior. They are monitored; they know they are monitored. I wasn't expecting that "go bad" rate to be zero, but... wow, one percent. Does that mean that anyone who's worked at a medium-to-large bank for a few years probably knows one person who's gone bad?

That was some of the interesting stuff in this book--looks at other worlds, not so far from web apps.

It's a big book. There's plenty in it. There's something to be said for a wide survey.

Labels: book, capabilities, trust the machine

OpenID, OAuth, Learning by Gossip

Last weekend, I did some programming. Well, not much programming. Mostly I did research preparatory to programming. Well, not exactly research. It was more un-research.

I started out learning how to use the OAuth protcol to... to do something it's not meant to do. OAuth is useful, but I learned that it wasn't meant to do what I wanted. If lots of people worked hard, you could use it for what I wanted--but that would be silly, because you can use OpenID for what I wanted.

What I wanted was to set up a little web app with user accounts that didn't ask users for a password. Instead, it would ask the user if they already had an account at some service: Yahoo, Google, Twitter, Flickr, or whatever... and then ask that service: hey, is this person who she says she is?

What I wanted was OpenID, which does that. (Like, say, this OpenID consumer sample implementation for AppEngine.)

But I'd heard some third-hand news a while back. Chatter on forums: Don't use OpenID. None of the big services are using OpenID. Folks asked Google to use OpenID, but Google didn't--because it's insecure. Google's pushing for OAuth instead, and they're web security smarties, you should use OAuth.

That was wrong. I'm not sure how much of the wrongness came from me mis-interpreting what I heard. I'm not sure how much of the wrongness came from the ignorance of the folks spouting off in the forums. But there was plenty of wrongness.

I'm pretty sure I'm not the only one who got confused. Some guy wrote a blog post just to say that OpenID and OAuth are not the same thing.

So I spent a while studying OAuth, thinking "This is kind of a bass-ackwards way to do what I want." Until I finally decided to look over OpenID some more.

The rumors of Google's rejection of OpenID are false. I can write a little web app. That little web app can (if you have a Google account and you give your consent) ask Google: is this person who she says she is? And Google will answer. The Google security team will not jump out from behind your refrigerator and break your fingers.

There are so many technologies to learn. You don't have time to learn them all. How do you find out which things are worth learning about? Me, I listen to chatter. I don't think I'm the only one. It's embarrassing to think about but... for all that we're supposed to be rigorous engineers, we fall back on gossip to figure out what to study in depth. What worthwhile things do we ignore? What do we ignore because of some unearned sneering comment on some IRC channel somewhere that's been repeated, relayed, never fact-checked...

Sorry, was I ranting? I do that.

Labels: capabilities, programming, research

Link: AllMyData

I occasionally backed up my files. But it was always ad-hoc: zip up an archive of some files, upload it to my web server. Done by hand when I got around to it (not often).

Then there was the time when I upgraded my OS and it all went pear-shaped. I knew it was risky, so I zipped up an archive of my files, encrypted it (there was some private info in there), and uploaded it. Then the upgrade took a lot longer than I thought--partway through, it became apparent that I needed to send away for installation discs. And I lost the Post-It with the password I'd used to encrypt my files. And I couldn't remember the password. So that was a few months' of data lost forever.

Gone, daddy, gone.

So it was time to re-think my backups.

I'm starting to buy music online. If I send too much data back-and-forth to my website, I get charged for it. And there's a disk space quota besides. So, as I accumulate more music, I can't keep on using my web-site to hold my backup files.

So it was really time to re-think my backups.

Where could I keep these files? I could get an external hard drive. Of course, a lot of the problems that could wipe out my PC could also wipe out a hard drive sitting next to my PC. I have a teeny-tiny apartment. A small dog could destroy most of my electronics in about a minute. So, I was thinking about off-site backups. Backing up my data to "the cloud," as it were. Every year or so, there's a flurry of rumors that Google will launch a "G-drive". Two years of rumors and it ain't happened. I didn't want to wait for that. (Yes, you know where I work. No, I can't comment on unannounced thingies, nor do I even know half of the stuff that's brewing at work.)

I lurk on a mailing list about "capabilities" in computer security. This guy named "Zooko" keeps posting there about this online storage service he works on. Wow, an online storage system sounds like exactly what I need. Zooko seems reasonably sharp.

At some point, I follow up by learning more about this online file service, Allmydata.com. I find out that their CEO is Peter Secor. I worked with Peter Secor back in the day at Geoworks. Peter was pretty competent back then; no reason to think he's less competent now.

So I signed up for an account at Allmydata.com: $100 a year for "unlimited" storage--where "unlimited" means "as much as I can upload/download".

If you just look at the Allmydata.com website, you'll think "Wait, this is for Windoze losers and Macintosh freaks. There's nothing here about Unix weenies." But my previous research had clued me in to the existence of Allmydata.org's Tahoe, an open source program which would, among other things, allow me to send my files to/from Allmydata.com's service.

What I didn't have was a program that would do something rsync-like every so often. So I threw together some quick-and-dirty Python that could run in a cron job.

#!/usr/bin/python

# Back up files to allmydata.com servers.
#
# Use a random order: I don't trust myself to write
# a program that doesn't fail partway through.  But
# if we run a few times, with a different order each
# time, we should back up most files OK.

import datetime
import simplejson as json
import os
import random
import subprocess
import tempfile

# The trees of files to upload.  Those with 'crypt': True get gpg-encrypted
# before upload.  (The rest are uploaded plain, unencrypted.)
LOCAL_ROOTS = [
    { "path": '/home/lahosken/keep/' },
    { "path": '/home/lahosken/seekrit/', 'crypt': True },
]

# Something like tahoe:/2008/home/lahosken/keep/path/to/file.txt
REMOTE_ROOT = 'tahoe:' + str(datetime.date.today().year)

TMP_DIR = tempfile.mkdtemp()

def EnumerateLocalDirs(start_path):
    retval = []
    for (root, dirs, files) in os.walk(unicode(start_path)):
        if files: retval.append(root)
    return retval

def MakeRemoteDir(local_dir_name):
    print "MakeRemoteDir", local_dir_name
    p = subprocess.Popen("tahoe mkdir '%s'" % REMOTE_ROOT, shell=True)
    sts = os.waitpid(p.pid, 0)
    path = ""
    for pathlet in local_dir_name.split("/"):
        path += pathlet + "/"
        remote_dir_name = REMOTE_ROOT + path
        remote_dir_name = remote_dir_name.replace("'", "'\\''")
        p = subprocess.Popen("tahoe mkdir '%s'" % remote_dir_name, shell=True)
        sts = os.waitpid(p.pid, 0)

def SyncFile(local_path, root):
    print "SyncFile", local_path
    local_path = local_path.replace("'", "'\\''")
    if "crypt" in root:
      encrypted_file_path = os.path.join(TMP_DIR, 
                                         local_path.replace("/","")[-9:]+".gpg")
      p = subprocess.Popen("gpg -r mister@lahosken.san-francisco.ca.us -o %s --encrypt %s" %
                           (encrypted_file_path, local_path),
                           shell=True)
      sts = os.waitpid(p.pid, 0)
      p = subprocess.Popen("tahoe cp '%s' '%s'" % 
                           (encrypted_file_path, REMOTE_ROOT + local_path),
                           shell=True)
      sts = os.waitpid(p.pid, 0)
      os.remove(encrypted_file_path)
    else:
      p = subprocess.Popen("tahoe cp '%s' '%s'" % 
                           (local_path, REMOTE_ROOT + local_path),
                           shell=True)
      sts = os.waitpid(p.pid, 0)

def MaybeSync(local_dir_name, root):
    print "MaybeSync", local_dir_name
    remote_dir_name = REMOTE_ROOT + local_dir_name
    local_file_list = os.listdir(local_dir_name)
    random.shuffle(local_file_list)
    remote_dir_contents_json = subprocess.Popen(["tahoe", "ls", "--json", remote_dir_name], stdout=subprocess.PIPE).communicate()[0]
    remote_dir_contents_dict = { "children" : [] }
    if not remote_dir_contents_json:
        MakeRemoteDir(local_dir_name)
    else: 
        remote_dir_contents_dict = json.loads(remote_dir_contents_json)[1]
    for local_file_name in local_file_list:
        local_path = os.path.join(local_dir_name, local_file_name)
        if not os.path.isfile(local_path): continue
        if "crypt" in root and local_file_name.endswith(".rc4"): continue
        if not local_file_name in remote_dir_contents_dict["children"]:
            SyncFile(local_path, root)
        else:
            remote_mtime = remote_dir_contents_dict["children"][local_file_name][1]["metadata"]["mtime"]
            local_mtime = os.stat(local_path).st_mtime
            if local_mtime > remote_mtime:
                SyncFile(local_path, root)
            else:
                pass
                # print "Skipping", local_dir_name, local_file_name, "already there"
            

def main():
    p = subprocess.Popen("tahoe start", shell=True)
    sts = os.waitpid(p.pid, 0)
    random.shuffle(LOCAL_ROOTS)
    for root in LOCAL_ROOTS:
        local_dirs = EnumerateLocalDirs(root["path"])
        random.shuffle(local_dirs)
        for local_dir in local_dirs:
            MaybeSync(local_dir, root)

if __name__ == "__main__":
    main()

Things can still go wrong with this system. I use gpg to encrypt my private stuff. My gpg key is on... my computer. It's also on a piece of removable media. Remember that hypothetical small dog I mentioned earlier, how easily it could mess up a hypothetical external hard drive at the same time it was destroying my computer? My removable media is not quite right next to my computer, but... things could still go wrong with it. Since my previous backup-recovery failure was due to me losing the encryption key, I'd like a stronger, foolproofier solution here.

Still, this is better than what I had before. I sleep easier now.

Labels: admin, capabilities, link

Link: Caja's HTML sanitizer for Javascript

When you write a program that's supposed to be secure, you have to plan on security from the beginning; you can't bolt it on afterwards. The idiomatic way to describe a "plan" like we'll write the program first and figure out the security later is "They're asking for some magical security fairy dust to sprinkle over their code."

I'm tweaking a Javascript program that takes HTML from someone else and renders it on a page. I thought my program was getting "sanitized" HTML; that is, HTML that had any potentially-dangerous stuff removed. If I'm showing someone else's HTML on my page, I want to make sure that HTML doesn't have, for example, an <img src="http://sneaky.org/sneaky.gif"> in it. Otherwise, the webmaster of sneaky.org will know whenever someone reads my page.

I thought the program was getting sanitized HTML, but it was getting "raw" HTML, possibly chock-full of evil. Argh, I needed to bolt on some security. I went pleading to some of the security-minded folks for help. I was embarrassed--I 'fessed up that I needed some "magical security fairy dust". The amazing part is that those security-minded folks came through--they pointed me at Caja.

Caja is primarily a system for enforcing security "capabilities" in Javascript. But, but but even if you don't need all of that, you might still want one part:

Caja comes with a XSS sanitizer for HTML that works with your JS code: html-sanitizer.js. And you'll also need html4-defs.js. It looks like you need to build html4-defs.js via Ant. That's kinda annoying, but a lot easier than writing your own HTML sanitizer from scratch.

I looked over the source code. It's checking for bad stuff I hadn't thought to check for. I sure am glad that folks more knowledgeable than me are working on this thing.

Labels: capabilities, link, programming

Link: Some thoughts on security after ten years of qmail 1.0

This guy Hans Boehm came and gave a talk at work today about upcoming C++ support for threads. That's support built into the language. It sounds like sometime in the next few years, we will have atomic<int> . That is to say that C++ will support concurrency, you'll be able to create objects that only one process/processor/whatever can mess with at a time. Up until now, it's been fun to mock people who have opinions about concurrency in programming languages, "Enjoy the Erlang!" and all that, but soon there will be no escape.

Actually, I was thinking about concurrency earlier, when I was reading this paper that's been going around, Some thoughts on security after ten years of qmail 1.0. Some things he doesn't say so well. But there is a nice list of things that a root program can do to run another program in a sandbox:

The jpegtopnm program reads a JPEG file, a compressed image, as input. It uncompresses the image, produces a bitmap as output, and exits. Right now this program is trusted: its bugs can compromise security. Let’s see how we can fix that.
Imagine running the jpegtopnm program in an “extreme sandbox” that doesn’t let the program do anything other than read the JPEG file from standard input, write the bitmap to standard output, and allocate a limited amount of memory. Existing UNIX tools make this sandbox tolerably easy for root to create:

Prohibit new files, new sockets, etc., by setting the current and maximum RLIMIT_NOFILE limits to 0.
Prohibit filesystem access: chdir and chroot to an empty directory.
Choose a uid dedicated to this process ID. This can be as simple as adding the process ID to a base uid, as long as other system-administration tools stay away from the same uid range.
Ensure that nothing is running under the uid: fork a child to run setuid(targetuid), kill(-1,SIGKILL), and _exit(0), and then check that the child exited normally.
Prohibit kill(), ptrace(), etc., by setting gid and uid to the target uid.
Prohibit fork(), by setting the current and maximum RLIMIT_NPROC limits to 0.
Set the desired limits on memory allocation and other resource allocation.
Run the rest of the program.

At this point, unless there are severe operating-system bugs, the program has no communication channels other than its initial file descriptors.

Up until now, the phrase "chroot jail" was one of those things that I read with only a vague sense of understanding. And folks kept saying "It's not enough to set up the chroot, there's more to it", but they never seemed to list the other things to do. But now that I have a list of a few things, I can probably search the web for pages and code that mentions these things and get a nice survey.

But something I hadn't caught onto before--this chroot stuff is all about spawning programs. I guess you end up with multiple programs all running at the same time. It's a concurrent programming model, I guess. Actually, now that I look at some code samples that deal with chroot this and RLIMIT that, this stuff doesn't look so easy.

Rob Pike gave a talk about Newsqueak a little while back. Newsqueak is a language that makes it pretty easy to spawn off little programlets--there are these objects that are kinda like function pointers. And you can do this thing where you kind of set up a thingy that invokes one of these functions and blocks/waits for its return value. And I thought it would be nice if I could do something like that, but maybe first set some flag on that function-pointer-like-thingy that means "run this function in jail".

Newsqueak-like message-passing concurrency programming... it lives on in other languages these days--like Erlang. Erlang? Erlang. Jeez, maybe I should look at Erlang, stop making fun of it, see what I can learn from it. Learn from Erlang. So this is how low I've sunk.

Next thing you know, I'll be associating with LARPers. Asking Furries to share their wisdom.

Oh, now I can't stop shuddering.

Maybe I'll put off studying Erlang until C++'s atomic<int> comes along.

Labels: capabilities, link, programming languages

Link: Lectures on Authorization Based Access Control

If you're a programmer, you might be interested in watching some lectures about Authorization Based Access Control. Some folks from an HP research lab lectured at the GooglePlex about better & easier security through fine-grained access control. Maybe if I followed security literature closely, this would be all old news to me. But I don't. And these lectures were pretty good. Well, at least three of them were. I was out of town for one of them, and haven't seen it. Anyhow, links to the lectures:

Intro to the lab's projects (Alan Karp)
Abstraction mechanisms for Access Control (Mark Miller)
A Very Secure and Powerful Wiki (or other web service) (Tyler Close)
Fitting it all Together: Safely running warez from teh internets (Marc Stiegler)

These lectures were dangerous in that they made me want to go join a startup to create a new operating system. But I know better than that by now. So I got over it.

Tags: link | video | capabilities |

Labels: capabilities, link