Book Report: Software Engineering at Google

It's a pretty good survey of important systems (technical systems and people-systems) at Google Engineering. When I say "survey," I mean it covers a lot of topics lightly. E.g., the chapter "How to Work Well on Teams" is by Brian Fitzpatrick, who co-wrote a whole book about teams. The chapter doesn't try to cover everything in that book, but it covers some of it well. Each chapter has plenty of endnotes with references, so you know where to learn more.

I bet if you're a new Google engineer, this book helps you understand why things are the way they are. If you're a new Google engineer who's worked in other engineering organizations, you absolutely want some kind of explanation for why things are the way they are.

E.g., if you're an engineer who came from a 100-person company, you might ask, "Why not use a git repository to hold all the code? Isn't that what everybody does?" But Google has wayyyy too much code to fit in a git repository. If you work in a smaller organization, you don't really think about how much code a git repo can handle. Why would you? You're nowhere near the limit. Then you arrive at Google and suddenly find out it's an issue, and you find out about dozens of other "issues" that arise when thousands of nerds beaver away on the same pile of code.

If you're an engineer not at Google, this book is probably interesting, but I dunno how useful it is. It's good at describing issues a growing engineering organization runs into, but the solution that was best for Google isn't necessarily the best one for your organization.

Consider diamond dependencies. That's the problem where there's low-level utility code that two middle-level libraries use; then some high-level application uses both middle-level libraries. You can hit an issue where you change the low-level code; then you try to update the middle-level libraries, and you hit a snag. Until you work around the snag, one middle-level library only works with the old version of the low-level code; the other middle-level library only works with the new version of the low-level code. And now you can't build your high-level application because there's no version of the low-level code that works with everything.
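To make the snag concrete, here's a minimal Python sketch under made-up assumptions: the library names (`lib_a`, `lib_b`), the version numbers, and the `pick_low_level_version` helper are all hypothetical, and real build systems resolve versions in far more elaborate ways. Each middle-level library lists the low-level versions it works with, and the application can only build if some single version satisfies both.

```python
# Hypothetical example data: library names and version numbers are made up.
AVAILABLE_VERSIONS = {1, 2}  # versions of the low-level utility that exist

# Which low-level versions each middle-level library works with right now.
COMPATIBLE = {
    "lib_a": {1},  # not updated yet: only works with the old version
    "lib_b": {2},  # already updated: only works with the new version
}


def pick_low_level_version(middle_libs, compatible, available):
    """Return the newest low-level version that every library in
    middle_libs accepts, or None if no single version satisfies them all."""
    candidates = set(available)
    for lib in middle_libs:
        candidates &= compatible[lib]  # keep only versions this lib accepts
    return max(candidates) if candidates else None


# The high-level application uses both middle-level libraries.
print(pick_low_level_version(["lib_a", "lib_b"], COMPATIBLE, AVAILABLE_VERSIONS))
# None: no version works with both, so the application can't be built.

# Once someone works around the snag in lib_a, the conflict goes away.
fixed = {**COMPATIBLE, "lib_a": {1, 2}}
print(pick_low_level_version(["lib_a", "lib_b"], fixed, AVAILABLE_VERSIONS))
# 2
```

The empty intersection is the whole problem: each library is fine on its own, and only the combination is unbuildable.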

(It feels kind of silly that we talk about "diamond dependencies". When your dependency graph looks like ♢, you can fix snags in a hurry. Maybe instead of "diamond dependencies" we should talk about "elaborate macrame dependencies." When your dependency graph looks less like ♢ and more like 𝍌, those little snags turn into knotty snarls.)

[Image: elaborate macrame. Theoretically a public picture from the Rijksmuseum, but I had no luck tracking down the original.]

It's good to know that ~~diamo~~ elaborate macrame dependencies are a real problem for big engineering organizations. If you've only worked in more sanely sized organizations, you might think, "Oh, diamond dependency problems can be a hassle, but each one is easy to fix. Surely such an issue would never block all progress for the engineers working on low- and mid-level code in my growing organization for a week at a time." Ideally, you find out otherwise from the efforts of traumatized Googlers, not from experience.

But Google's solution might not be the best one for your organization. For many years, Google used a git-like-but-different tool called Perforce that made it pretty easy to keep lots of code in one big repo and work "at head". When they figured out a system to test low-level changes against middle-level libraries, they relied on all the relevant code being at head in one repo. When their codebase outgrew Perforce, it seemed easiest to write a tool that was like Perforce but bigger. Other solutions would have involved changing many, many tools.

I heard about the system Salesforce uses to solve the same problem. It's pretty different. Instead of putting all projects in the same repo, there's one project per repo. And there's a system that knows which projects depend on which other projects. So if a Salesforce engineer wants to change some low-level code in one repo, this system knows which other repos to test. I bet Salesforce's solution is different because their history is different. They knew they'd run into the elaborate macrame dependencies problem, but they came up with a solution based on where they were, not on what worked for Google.
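I don't know how Salesforce's system is actually built, so here's only a generic sketch of the idea in Python: record which repos each repo depends on, flip those edges around, and walk outward from the changed repo to find everything downstream that deserves a test run. The repo names and the `repos_to_test` helper are hypothetical.

```python
from collections import deque

# Hypothetical repos, each mapped to the repos it depends on directly.
DEPENDS_ON = {
    "low_level_utils": [],
    "lib_a": ["low_level_utils"],
    "lib_b": ["low_level_utils"],
    "app": ["lib_a", "lib_b"],
    "unrelated_tool": [],
}


def reverse_edges(depends_on):
    """Map each repo to the repos that depend on it directly."""
    dependents = {repo: [] for repo in depends_on}
    for repo, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(repo)
    return dependents


def repos_to_test(changed_repo, depends_on):
    """Return every repo that directly or transitively depends on changed_repo."""
    dependents = reverse_edges(depends_on)
    seen = set()
    queue = deque([changed_repo])
    while queue:  # breadth-first walk over the reversed dependency graph
        repo = queue.popleft()
        for dependent in dependents.get(repo, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(repos_to_test("low_level_utils", DEPENDS_ON)))
# ['app', 'lib_a', 'lib_b']
```

Note that `unrelated_tool` never gets tested, which is the point: with one project per repo, a change only costs test time for the repos that can actually break.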

It's not just dependencies and source control. There are unit tests and big tests, release trains(‽), code review, and more and yet more. It was a fun read if you're into large engineering organizations' swarming behavior and how to steer it. I enjoyed it.