Link: AllMyData

I occasionally backed up my files. But it was always ad hoc: zip up an archive of some files, upload it to my web server. Done by hand when I got around to it (not often).

Then there was the time when I upgraded my OS and it all went pear-shaped. I knew it was risky, so I zipped up an archive of my files, encrypted it (there was some private info in there), and uploaded it. Then the upgrade took a lot longer than I thought--partway through, it became apparent that I needed to send away for installation discs. And I lost the Post-It with the password I'd used to encrypt my files. And I couldn't remember the password. So that was a few months' worth of data lost forever.

Gone, daddy, gone.

So it was time to re-think my backups.

I'm starting to buy music online. If I send too much data back and forth to my website, I get charged for it. And there's a disk space quota besides. So, as I accumulate more music, I can't keep on using my website to hold my backup files.

So it was really time to re-think my backups.

Where could I keep these files? I could get an external hard drive. Of course, a lot of the problems that could wipe out my PC could also wipe out a hard drive sitting next to my PC. I have a teeny-tiny apartment. A small dog could destroy most of my electronics in about a minute. So, I was thinking about off-site backups. Backing up my data to "the cloud," as it were. Every year or so, there's a flurry of rumors that Google will launch a "G-drive". Two years of rumors and it ain't happened. I didn't want to wait for that. (Yes, you know where I work. No, I can't comment on unannounced thingies, nor do I even know half of the stuff that's brewing at work.)

I lurk on a mailing list about "capabilities" in computer security. This guy named "Zooko" keeps posting there about this online storage service he works on. Wow, an online storage system sounds like exactly what I need. Zooko seems reasonably sharp.

At some point, I follow up by learning more about this online file service, Allmydata.com. I find out that their CEO is Peter Secor. I worked with Peter Secor back in the day at Geoworks. Peter was pretty competent back then; no reason to think he's less competent now.

So I signed up for an account at Allmydata.com: $100 a year for "unlimited" storage--where "unlimited" means "as much as I can upload/download".

If you just look at the Allmydata.com website, you'll think "Wait, this is for Windoze losers and Macintosh freaks. There's nothing here about Unix weenies." But my previous research had clued me in to the existence of Allmydata.org's Tahoe, an open source program which would, among other things, allow me to send my files to/from Allmydata.com's service.

What I didn't have was a program that would do something rsync-like every so often. So I threw together some quick-and-dirty Python that could run in a cron job.

#!/usr/bin/python

# Back up files to allmydata.com servers.
#
# Use a random order: I don't trust myself to write
# a program that doesn't fail partway through.  But
# if we run a few times, with a different order each
# time, we should back up most files OK.

import datetime
import simplejson as json
import os
import random
import subprocess
import tempfile

# The trees of files to upload.  Those with 'crypt': True get gpg-encrypted
# before upload.  (The rest are uploaded plain, unencrypted.)
LOCAL_ROOTS = [
    { "path": '/home/lahosken/keep/' },
    { "path": '/home/lahosken/seekrit/', 'crypt': True },
]

# Something like tahoe:/2008/home/lahosken/keep/path/to/file.txt
REMOTE_ROOT = 'tahoe:' + str(datetime.date.today().year)

TMP_DIR = tempfile.mkdtemp()

def EnumerateLocalDirs(start_path):
    retval = []
    for (root, dirs, files) in os.walk(unicode(start_path)):
        if files: retval.append(root)
    return retval

def MakeRemoteDir(local_dir_name):
    print "MakeRemoteDir", local_dir_name
    # Create the per-year root, then each ancestor directory in turn.
    # Argument lists (no shell=True) mean names need no shell quoting.
    subprocess.call(["tahoe", "mkdir", REMOTE_ROOT])
    path = ""
    for pathlet in local_dir_name.split("/"):
        path += pathlet + "/"
        subprocess.call(["tahoe", "mkdir", REMOTE_ROOT + path])

def SyncFile(local_path, root):
    print "SyncFile", local_path
    remote_path = REMOTE_ROOT + local_path
    if "crypt" in root:
      # Encrypt into the scratch directory, upload the result, then clean up.
      # Argument lists (no shell=True) keep spaces and quotes in names intact;
      # the old gpg command line passed both paths to the shell unquoted.
      encrypted_file_path = os.path.join(TMP_DIR,
                                         local_path.replace("/","")[-9:]+".gpg")
      sts = subprocess.call(["gpg", "-r", "mister@lahosken.san-francisco.ca.us",
                             "-o", encrypted_file_path,
                             "--encrypt", local_path])
      if sts != 0:
          # gpg failed, so there is nothing to upload or remove.
          print "gpg failed for", local_path
          return
      subprocess.call(["tahoe", "cp", encrypted_file_path, remote_path])
      os.remove(encrypted_file_path)
    else:
      subprocess.call(["tahoe", "cp", local_path, remote_path])

def MaybeSync(local_dir_name, root):
    print "MaybeSync", local_dir_name
    remote_dir_name = REMOTE_ROOT + local_dir_name
    local_file_list = os.listdir(local_dir_name)
    random.shuffle(local_file_list)
    remote_dir_contents_json = subprocess.Popen(["tahoe", "ls", "--json", remote_dir_name], stdout=subprocess.PIPE).communicate()[0]
    remote_dir_contents_dict = { "children" : [] }
    if not remote_dir_contents_json:
        MakeRemoteDir(local_dir_name)
    else: 
        remote_dir_contents_dict = json.loads(remote_dir_contents_json)[1]
    for local_file_name in local_file_list:
        local_path = os.path.join(local_dir_name, local_file_name)
        if not os.path.isfile(local_path): continue
        if "crypt" in root and local_file_name.endswith(".rc4"): continue
        if local_file_name not in remote_dir_contents_dict["children"]:
            SyncFile(local_path, root)
        else:
            remote_mtime = remote_dir_contents_dict["children"][local_file_name][1]["metadata"]["mtime"]
            local_mtime = os.stat(local_path).st_mtime
            if local_mtime > remote_mtime:
                SyncFile(local_path, root)
            else:
                pass
                # print "Skipping", local_dir_name, local_file_name, "already there"
            

def main():
    # Make sure the local Tahoe node is running before talking to it.
    subprocess.call(["tahoe", "start"])
    random.shuffle(LOCAL_ROOTS)
    for root in LOCAL_ROOTS:
        local_dirs = EnumerateLocalDirs(root["path"])
        random.shuffle(local_dirs)
        for local_dir in local_dirs:
            MaybeSync(local_dir, root)

if __name__ == "__main__":
    main()
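
The heart of MaybeSync is one small decision: upload a file if the remote directory doesn't list it, or if the local copy has a newer mtime. Here is a minimal sketch of that rule pulled out as a pure function, so it can be poked at without a Tahoe node running. It's written for Python 3, unlike the Python 2 script above. The names needs_upload and plan_uploads are made up for illustration, and the shape of remote_children is assumed to match what the script reads out of "tahoe ls --json".

```python
import random


def needs_upload(file_name, local_mtime, remote_children):
    """Return True if the file is missing remotely or newer locally."""
    entry = remote_children.get(file_name)
    if entry is None:
        return True
    # Assumed shape, mirroring the script above:
    #   ["filenode", {"metadata": {"mtime": 1234567890.0}}]
    remote_mtime = entry[1].get("metadata", {}).get("mtime", 0)
    return local_mtime > remote_mtime


def plan_uploads(local_mtimes, remote_children, rng=random):
    """List the files that need uploading, in random order.

    local_mtimes maps file name -> local modification time.
    The shuffle follows the script's reasoning: if a run dies partway
    through, a different order next time still covers most files.
    """
    names = [name for name, mtime in local_mtimes.items()
             if needs_upload(name, mtime, remote_children)]
    rng.shuffle(names)
    return names


if __name__ == "__main__":
    remote = {"old.txt": ["filenode", {"metadata": {"mtime": 100.0}}]}
    local = {"old.txt": 50.0, "new.txt": 75.0}
    print(plan_uploads(local, remote))
```

Keeping the decision separate from the subprocess calls makes it easy to try out edge cases, such as a remote entry with no mtime recorded (treated here as older than anything local).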

Things can still go wrong with this system. I use gpg to encrypt my private stuff. My gpg key is on... my computer. It's also on a piece of removable media. Remember that hypothetical small dog I mentioned earlier, how easily it could mess up a hypothetical external hard drive at the same time it was destroying my computer? My removable media is not quite right next to my computer, but... things could still go wrong with it. Since my previous backup-recovery failure was due to me losing the encryption key, I'd like a stronger, foolproofier solution here.

Still, this is better than what I had before. I sleep easier now.


LJ People: Do not be alarmed

(If you have a LiveJournal and you befriended my lahosken.myopenid.com account, I encourage you to stay awake through this post. The rest of you folks can fall asleep. Oh, but if you have your own domain, you might want to stay awake a little bit anyhow so that you can read about the mistake I made with OpenID. Then you can avoid making that same mistake yourself if you decide to use OpenID in the future.)

The short story: I dare you to befriend lahosken.san-francisco.ca.us.

The long story: myopenid.com is an OpenID provider. I am an OpenID ignoramus, but I'm getting better--I even went to a lecture! Thus I learned how I can use my own URL as my OpenID instead of, say, the maybe-it's-legit-but-who-can-tell lahosken.myopenid.com, while still letting the folks at myopenid.com do all of the hard programming shme.
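
For the curious: the post doesn't say how the own-URL trick was wired up, but the usual mechanism is OpenID delegation, where the page at your own URL carries a few link tags in its head that point at the provider. A minimal Python 3 sketch that prints those tags is below. The relation names come from the OpenID 1.1 and 2.0 specs; the function name delegation_links and the endpoint URL are placeholders, and the real endpoint has to come from the provider's own documentation.

```python
from html import escape


def delegation_links(provider_endpoint, provider_identity):
    """Build OpenID delegation <link> tags for a page's <head>."""
    endpoint = escape(provider_endpoint, quote=True)
    identity = escape(provider_identity, quote=True)
    return "\n".join([
        # OpenID 1.1 relation names
        '<link rel="openid.server" href="%s">' % endpoint,
        '<link rel="openid.delegate" href="%s">' % identity,
        # OpenID 2.0 relation names
        '<link rel="openid2.provider" href="%s">' % endpoint,
        '<link rel="openid2.local_id" href="%s">' % identity,
    ])


if __name__ == "__main__":
    print(delegation_links(
        "https://provider.example/server",   # hypothetical endpoint
        "http://lahosken.myopenid.com/",     # identity hosted by the provider
    ))
```

With tags like these on the home page, a site that accepts OpenID sees your own URL as the identity while the provider still does the actual authentication.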

So I'm going to stop using the lahosken.myopenid.com LJ account. I'm going to use lahosken.san-francisco.ca.us instead. Sorry for the extra work. Sorry for the confusion.

See, some folks are early adopters: they start using technology before other people have figured it out. I am such an early adopter that I started using OpenID before I'd figured it out myself. I am so elite.


Spam Filtering

If you send mail to this domain, I might discard it without looking at it. But I probably won't. For the last few weeks, I've been using a spam filter to... filter spam. I've been carefully looking over the results to make sure that stuff from real people didn't get filtered. But I'm going to stop looking carefully.

I was spending so much time looking over that filter that I was falling behind on answering my mail. That's no good.

