Designing Human Systems

Recently I was having a conversation with a colleague who asserted that we (SREs) are broadly the types of engineer who, if given the choice, try to focus on perfecting the fundamentals. This surprised me, because if you were to ask me about my views on engineering, I’d probably lean in a slightly different direction.

My personal view on SRE is that it’s a game of balance. We’re not Software Engineers, we’re not Operations Engineers and we’re also not Security Engineers. We tread a fine line in the middle, pushing on aspects of the broader system (humans included) to help it find a stable equilibrium in which it delivers maximum value for all stakeholders. That kind of balancing requires a pragmatic, flexible approach and often depends more on the subtleties of the system at hand than any rigidly theoretical approach can account for.

With that in mind, I think that as engineers, we need to focus on building systems that support that healthy equilibrium. Doing so means balancing a wide range of requirements from different, often competing, stakeholders while attempting to divine what the future may bring. In my experience, however, all of this becomes much easier to deal with if you can solve two key problems: velocity and observability.

Before I dive into that, let’s quickly talk about that experience.

Read more »

App Updates

These days I work as an SRE, surrounded by dozens of complex systems designed to streamline the process of taking the code we write and exposing it to customers. It’s easy to forget that software deployment itself is a problem many developers have not yet solved.

Today I’d like to run you through a simple process I recently implemented for Git Tool to enable automated updates with minimal fuss. It’s easy to implement and works without any fancy tooling.
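The full post describes the actual process. As a rough, hypothetical sketch of one ingredient any self-update check needs, assuming releases are tagged with semantic versions such as v1.10.0, deciding whether an update is available can be a plain version comparison with no extra tooling. The function names are illustrative and are not Git Tool’s code.

```python
import re
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(tag: str) -> Version:
    """Parse a tag like 'v2.1.3' (or '2.1.3') into a tuple that compares numerically."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a semantic version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def find_update(current: str, available: Iterable[str]) -> Optional[str]:
    """Return the newest tag in `available` that is newer than `current`, or None."""
    current_version = parse_version(current)
    newer = [tag for tag in available if parse_version(tag) > current_version]
    return max(newer, key=parse_version, default=None)


print(find_update("v1.2.0", ["v1.1.9", "v1.2.0", "v1.10.0", "v1.3.2"]))  # v1.10.0
```

Comparing numeric tuples rather than raw strings matters here: as strings, "v1.10.0" sorts before "v1.2.0". Pre-release suffixes such as -beta are deliberately rejected by this sketch.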

Read more »

Organizing your Development Directory

As an engineer, I like to think that I help fix problems. That’s what I’ve tried to do most of my life and career and I love doing so to this day. It struck me, though, that there was one problem which has followed me around for years without due attention: the state of my development directories.

That’s not to say that they are disorganized; I’ve spent hours deliberating over the best way to arrange them so that I can always find what I need. Yet I often end up resorting to some dark incantation involving find to locate the project I was certain sat under my Work folder.

No more. I’ve drawn the line and decided that if I can’t fix the problem, automation damn well better be able to!

I’d like to introduce you to my new, standardized (and automated) development directory structure and the tooling I use to maintain it. With any luck, you’ll find it useful, and it will help you save time, avoid code duplication and move between machines more easily.
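The post describes the actual layout and tooling. Purely as an illustration, assuming a convention where each repository lives at <dev root>/<host>/<owner>/<repo>, the key property of an automated structure is that a repository’s location can be computed from its remote URL rather than remembered. The repo_path helper and the layout itself are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Accepts HTTPS and SSH-style remotes, e.g. "https://github.com/owner/repo.git"
# or "git@github.com:owner/repo.git"; other URL forms are out of scope for this sketch.
_REMOTE_PATTERN = re.compile(
    r"^(?:https?://|ssh://git@|git@)"
    r"(?P<host>[^/:]+)[/:]"
    r"(?P<owner>[^/]+)/"
    r"(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def repo_path(dev_root: str, remote_url: str) -> PurePosixPath:
    """Compute a repository's home under the assumed <root>/<host>/<owner>/<repo> layout."""
    match = _REMOTE_PATTERN.match(remote_url.strip())
    if match is None:
        raise ValueError(f"unrecognised remote URL: {remote_url!r}")
    return PurePosixPath(dev_root, match["host"], match["owner"], match["repo"])


print(repo_path("/home/me/dev", "git@github.com:SierraSoftworks/bash-cli.git"))
# /home/me/dev/github.com/SierraSoftworks/bash-cli
```

Because the path is derived rather than chosen, the same repository lands in the same place on every machine, which is what makes the structure easy to automate.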

Read more »

Python Iterators, Next

How iterators work in Python, details about the next function and a lesson from production

Recently we had an outage. It was a small one by all accounts and, thanks to the way our system is designed, it didn’t impact any users, didn’t lose any data and wasn’t noticeable to anybody except us. It did happen, though, and that’s a problem.

The cause of this outage was pretty simple: engineer A designed a nice new feature in library X, and engineer B liked this feature and decided to use it in service Y. This is a daily occurrence and is generally a very good thing; new, cleaner solutions help you constantly refactor away technical debt and improve the readability and all-important maintainability of your code.

This time, however, it went wrong and caused an outage, so let’s talk about how that happened and take a detour through the land of Python iterators along the way.
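The specifics of the outage are in the full post; the snippet below is only a general illustration of the behaviour it touches on, not the code involved. An iterator implements __next__ (and an __iter__ that returns itself), the built-in next() calls it, and an exhausted iterator raises StopIteration unless next() is given a default. Iterators are also single-pass, which is easy to forget when a function that used to receive a list starts receiving a generator. Countdown and total_and_count are hypothetical names.

```python
from typing import Iterable, Tuple


class Countdown:
    """A minimal iterator: __iter__ returns self, __next__ produces values until done."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> "Countdown":
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration  # signals exhaustion to for-loops, list(), next(), etc.
        self.current -= 1
        return self.current + 1


numbers = Countdown(3)
print(next(numbers))        # 3: next() calls Countdown.__next__
print(list(numbers))        # [2, 1]: list() consumes whatever is left
print(next(numbers, None))  # None: with a default, next() returns it instead of raising
try:
    next(numbers)           # without a default, an exhausted iterator raises
except StopIteration:
    print("exhausted")


def total_and_count(values: Iterable[int]) -> Tuple[int, int]:
    total = sum(values)             # first pass consumes an iterator completely
    count = sum(1 for _ in values)  # second pass over a spent iterator sees nothing
    return total, count


print(total_and_count([1, 2, 3]))        # (6, 3): lists can be iterated repeatedly
print(total_and_count(iter([1, 2, 3])))  # (6, 0): the iterator was used up by sum()
```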

Read more »

Blueprint for a Monitoring Stack

At one point in my career, I spent over two years building a monitoring stack. It started out the way many do: with people staring at dashboards, hoping to divine the secrets of production from ripples in the space-time continuum before an outage occurred.

Over those two years we were able to transform not just the technology we used, but the way the entire organization viewed monitoring, eventually removing the need for a NOC altogether.

I’ll walk you through the final design, which was responsible for everything from data acquisition to alerting and much besides. In this post I’ll go over some of the design decisions we made, why we made them and some guidance for anybody designing their own monitoring stack.

Read more »

Scaling for Latency with Async I/O

I’ve just spent the last month rewriting the core component of a monitoring stack responsible for protecting the availability of a billion-dollar-per-year franchise. The purpose of this rewrite was to make it safer, quicker and easier for our engineers to implement new features; what we delivered ended up offering a four-orders-of-magnitude improvement in performance and efficiency over our previous system.

Let’s talk about how that happened, why it was possible and how we achieved it without making performance a focal point of the redesign. I’m going to discuss evented input/output (I/O), often referred to as async.

By the time you’ve finished reading this article, you should have a good grasp of what evented I/O is, how it works and the situations in which it has a lot to offer, as well as some of the significant advantages it has over alternative approaches in large-scale production systems.
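As a minimal illustration of evented I/O (not the monitoring component from the post), the asyncio sketch below starts 100 simulated requests on a single thread. fake_request is a hypothetical stand-in for a network call: while one request is waiting, the event loop services the others, so the total time tracks the slowest request rather than the sum of all of them.

```python
import asyncio
import time
from typing import List


async def fake_request(request_id: int, latency: float) -> int:
    """Hypothetical stand-in for a network call; awaiting it yields to the event loop."""
    await asyncio.sleep(latency)
    return request_id


async def handle_all(count: int, latency: float) -> List[int]:
    # Schedule every request at once; gather() waits for all of them and keeps input order.
    return list(await asyncio.gather(*(fake_request(i, latency) for i in range(count))))


start = time.perf_counter()
results = asyncio.run(handle_all(count=100, latency=0.1))
elapsed = time.perf_counter() - start
# On a typical machine this is close to 0.1s; handled one after another it would take ~10s.
print(f"{len(results)} requests in {elapsed:.2f}s")
```

No threads are involved: a single event loop interleaves the waiting, which is why the approach scales so well for I/O-bound work.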

Read more »

Patterns for APIs

If you’ve built a production API before, you’ll know that they tend to evolve over time. This evolution is not only unavoidable, it is a natural state that any active system will exist in until it is deprecated.

Recognizing this kind of evolution and proactively designing to support it is one of the things that set a mature API apart from the thousands that litter the Wall of Shame.

At the same time, it is important that your API remains easy to use and intuitive, maximizing the productivity of developers who will make use of it.

Read more »

Relational and Document DBs

One of the most interesting discussions to have with people, notably those with traditional database experience, is about the relationship between an off-the-shelf RDBMS and modern NoSQL document stores.

What makes this discussion so interesting is that there’s invariably a lot of opinion on both sides, often driven by very valid experience. The truth is that there simply isn’t a silver-bullet database solution, and by better understanding the benefits and limitations of each, you can make vastly better decisions about which to adopt.
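A small, hypothetical illustration of one trade-off behind that discussion, using Python’s built-in sqlite3 for the relational side and plain dictionaries standing in for documents: normalized tables need a join to reassemble related data but make updates cheap, while embedded documents can be read in one go but duplicate data that must be kept in sync. The schema and data are invented for the example.

```python
import sqlite3

# Relational model: authors and posts in separate tables linked by a foreign key.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'Ada');
    INSERT INTO posts VALUES (1, 1, 'First post'), (2, 1, 'Second post');
    """
)
# Reassembling an author with their posts needs a join at read time...
rows = db.execute(
    "SELECT a.name, p.title FROM authors AS a JOIN posts AS p ON p.author_id = a.id ORDER BY p.id"
).fetchall()
# ...but renaming the author is a single-row update that every post reflects.
db.execute("UPDATE authors SET name = ? WHERE id = ?", ("Ada L.", 1))

# Document model: each post document embeds a copy of its author's details.
post_documents = [
    {"title": "First post", "author": {"name": "Ada"}},
    {"title": "Second post", "author": {"name": "Ada"}},
]
# Reads need no join: one document holds everything needed to render a post...
titles = [doc["title"] for doc in post_documents if doc["author"]["name"] == "Ada"]
# ...but renaming the author means rewriting every document that embeds the name.
for doc in post_documents:
    if doc["author"]["name"] == "Ada":
        doc["author"]["name"] = "Ada L."
```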

Read more »

Signing Git Commits using Keybase

KeyBase’s Logo

With the increasing popularity of Git as a tool for open source collaboration, not to mention its role in distributing code for ecosystems like Go, being able to verify that the author of a piece of code is indeed who they claim to be has become absolutely critical.

This requirement extends beyond simply ensuring that malicious actors cannot modify the code we’ve published, something GitHub and its kin (usually) do a very good job of preventing. The simple fact is that by adopting code someone else has written, you are entrusting your clients' security to them, so you had better be certain that trust is wisely placed.

Using Git’s built-in support for PGP signing and pairing it with Keybase provides you with a great framework on which to build and verify that trust. In this post I’ll go over how to set up your development environment to support this workflow.

Read more »

Feeling Lucky

Anybody who has worked in the development world for a significant length of time will have built up a vast repertoire of abbreviations describing how they solve problems: everything from TDD to DDD and, my favourites, FDD and HDD. There are so many, in fact, that you’ll find a website dedicated to naming and shaming them.

I’m not one to add another standard to the mix… Oh, who am I kidding? Let me introduce you to Chance Driven Development.

XKCD Standards

Read more »

Autocompletion for Bash CLI

If you haven’t yet read the article on Bash CLI then go read it now.

Bash’s ability to suggest completions for a command when you press the Tab key is one of its most useful features. It makes navigating complex command lines trivially simple, yet it’s not something we often see supported by custom command line tools.

Bash CLI was designed with the intention of making it as easy as possible to build a command line tool with a great user experience. Giving our users the ability to use autocompletion would be great, but we don’t want to make it any more difficult for developers to build their command lines.

Thankfully, Bash CLI’s architecture makes adding basic autocomplete possible without changing our developer-facing API (always a good thing).
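The post covers how Bash CLI actually does this. As a language-neutral sketch of the underlying idea, written in Python and assuming a layout where command groups are directories, commands are files and helper files (such as .help files) have a dot in their name, completion reduces to walking the directory tree with the words typed so far and prefix-matching the last one. The complete function and the layout are assumptions, not Bash CLI’s API.

```python
from pathlib import Path
from typing import List


def complete(app_dir: Path, words: List[str]) -> List[str]:
    """Suggest completions for the last word, given all the words typed so far.

    Assumed layout: directories are command groups, files are runnable commands,
    and names containing a dot (e.g. 'deploy.help') are helper files to skip.
    """
    *parents, partial = words or [""]
    current = app_dir
    for word in parents:
        current = current / word
        if not current.is_dir():
            return []  # already past a leaf command: nothing more to complete
    return sorted(
        entry.name
        for entry in current.iterdir()
        if entry.name.startswith(partial) and "." not in entry.name
    )
```

Because the suggestions come straight from the same directory tree that defines the commands, developers get completion without writing anything extra. Hooking this into the shell would go through Bash’s complete builtin; that wiring is omitted here.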

Read more »

Building a CLI in Bash

If you’re just looking to hop straight to the final project, you’ll want to check out SierraSoftworks/bash-cli on GitHub.

Anybody who has worked in the ops space has probably built up a veritable library of scripts which they use to manage everything from deployments to brewing their coffee.

Unfortunately, this tends to make finding the script you’re after, and its usage information, a pain: you’ll either end up grepping a README file or praying that the script has a help feature built in.

Neither approach is conducive to a productive workflow for you or for those who will (inevitably) replace you. Even if you do end up adding help functionality to all your scripts, a rather significant chunk of your script code will probably be dedicated to docs.

After a project I was working on started reaching that point, I decided to put together a tool to minimize both the development workload involved in building well-documented scripts and the complexity of using them.
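The project itself lives at SierraSoftworks/bash-cli. To illustrate the general approach only, here is a hypothetical Python sketch assuming commands are files in a directory tree, with each command’s documentation stored beside it in a matching .help file. The dispatcher resolves arguments to a command by walking the tree, and because the help text lives outside the script, the script body stays focused on its job. resolve and help_for are illustrative names, not the tool’s API.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def resolve(app_dir: Path, args: List[str]) -> Optional[Tuple[Path, List[str]]]:
    """Map e.g. ['deploy', 'staging', '--force'] to (app_dir/deploy/staging, ['--force']).

    Assumed layout: directories are command groups and files are commands.
    """
    current = app_dir
    for index, arg in enumerate(args):
        candidate = current / arg
        if candidate.is_file():
            return candidate, args[index + 1:]  # remaining args go to the command
        if not candidate.is_dir():
            return None  # unknown command
        current = candidate
    return None  # ran out of arguments while still inside a command group


def help_for(command: Path) -> str:
    """Read the documentation kept beside the command, e.g. 'staging.help'."""
    help_file = command.with_name(command.name + ".help")
    return help_file.read_text().strip() if help_file.exists() else "No help available."
```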

Read more »


Since there seems to be quite a bit of confusion surrounding the process of hacking the Asus ExpressGate system to change its resolution, I’m going to try my best to clarify some of it for you. I recently purchased an Asus P6X58D Premium, which comes with ExpressGate (SplashTop) embedded on it. However, since its maximum resolution is limited to 1280x1024, I decided to do a bit of work and fix that.

I have since created a bash script and numerous applications to aid anyone looking to do their own bit of modding on ExpressGate.

Read more »