Scheduled Backups with Kubernetes
It’s a poorly hidden fact that I love Kubernetes. After spending months running everything from Marathon on DC/OS and CoreOS to Rancher and Docker Swarm in production, I’ve found Kubernetes to be the only container orchestration platform that has struck me as truly “production ready”, and I have been running it for the past year as a result.
While its functionality was somewhat patchy and uninteresting when I first started using it (v1.4), recent releases have made sizeable strides towards addressing the operational challenges we face every day.
With v1.8, Kubernetes has promoted the CronJob controller to batch/v1beta1, enabling it by default so that anyone can start playing with it. That sounds like the perfect time to show you how we use CronJobs to manage automated, scheduled backups within our environments.
Introduction to CronJob
The Kubernetes CronJob controller is responsible for creating Jobs on a schedule. No, really, it is exactly that simple. The Jobs themselves take care of ensuring that the work runs to completion, retrying after crashes and enforcing completion time limits.
This allows you to ensure that a container runs on a schedule such as 0 0 * * * (every day at midnight, for those who don’t speak cron).
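For reference, here is a sketch of where those pieces live in a CronJob manifest. The numbers are illustrative assumptions rather than recommendations. The schedule field uses the standard five cron fields. concurrencyPolicy: Forbid skips a run while the previous one is still going. backoffLimit caps the number of retries before the Job is marked as failed, and activeDeadlineSeconds terminates a Job that runs too long.

```yaml
# Fragment of a CronJob spec (illustrative values; pod template omitted)
spec:
  # minute hour day-of-month month day-of-week: every day at 00:00
  schedule: "0 0 * * *"
  # Skip a scheduled run if the previous Job is still running
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      # Retry a failed backup at most 3 times before marking the Job as failed
      backoffLimit: 3
      # Terminate the Job if it is still running after one hour
      activeDeadlineSeconds: 3600
```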
Let’s take a simple example that shows how one would convert a Job into a CronJob.
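Below is a minimal sketch of that conversion. It assumes a throwaway job named hello that runs in a busybox image, and the name, image and command are placeholders. The first document is the plain Job. The second wraps the same pod template in a CronJob, where the Job’s spec moves under jobTemplate.spec.

```yaml
# A plain Job: runs once, as soon as it is created
apiVersion: batch/v1
kind: Job
metadata:
  name: hello
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: hello
          image: busybox
          command: ["sh", "-c", "date; echo Hello from Kubernetes"]
---
# The same work as a CronJob: the Job's spec now lives under jobTemplate.spec
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: hello
              image: busybox
              command: ["sh", "-c", "date; echo Hello from Kubernetes"]
```

Clusters from v1.21 onward also serve CronJob as batch/v1, and batch/v1beta1 was removed in v1.25. Newer clusters therefore need apiVersion: batch/v1.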
As you can see from this example, it is actually pretty trivial to convert an existing Kubernetes Job to a CronJob, making migrations quick and simple. You’ll also notice that defining a job is no more complex than defining your Deployments.
Building a Backup Container
Now that you’re familiar with how to define a Kubernetes CronJob, you probably want to know how to build the container that is going to run your backups for you. Because a Kubernetes Job is transient, you don’t need to worry about problems like keeping the container running or scheduling work internally.
This means that your backup container can really just run $YOUR_BACKUP_EXECUTABLE and exit when it is done. This removes a huge amount of the complexity that was previously involved in building backup containers and lets you focus on exactly the task you want to perform.
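In manifest terms, the “run and exit” model boils down to a pod template like the sketch below. The image and script path are hypothetical. What matters is that the container runs a single command and that restartPolicy is OnFailure, because Jobs only accept OnFailure or Never. A zero exit code completes the Job, and a non-zero exit code triggers a retry, up to the Job’s backoffLimit.

```yaml
# Pod template fragment for a run-to-completion backup container
template:
  spec:
    # Jobs must use OnFailure or Never; the default (Always) is rejected
    restartPolicy: OnFailure
    containers:
      - name: backup
        # Hypothetical image and script: replace with your own
        image: registry.example.com/my-backup:latest
        # The exit code of this command decides whether the Job succeeds
        command: ["/usr/local/bin/run-backup"]
```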
But let’s not make this too easy: I personally want my backups to end up somewhere safe (otherwise, what’s the point?). To achieve that, let’s toss them over to S3 when we’re done, giving us a pretty reliable place to keep them.
We’re going to use the cdn command line client, a fully standards-compliant S3 client, to upload our backup as it is created. We grab the official binary and use Docker’s new multi-stage builds to toss it into the Postgres image (which already includes pg_dump).
All that’s left is to put together an entrypoint that runs pg_dump and pipes the result to S3.
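One way to sketch that entrypoint without a separate script is to have the container start bash and run the pipeline directly. This sketch rests on several assumptions. The image is assumed to include bash, and the image name, host, user and Secret are hypothetical. The cdn client is assumed to be already configured with the S3 endpoint and credentials. Most importantly, the cdn pipe subcommand used here to stream standard input to an object is a placeholder, so check your client’s documentation for the real syntax. pg_dump itself reads the standard libpq PGHOST, PGUSER, PGPASSWORD and PGDATABASE environment variables.

```yaml
# Container fragment: dump PostgreSQL and stream the result to S3
containers:
  - name: backup
    image: my-postgres-backup:latest  # hypothetical image built as described above
    command: ["/bin/bash", "-c"]
    # pipefail makes the Job fail if pg_dump fails, not only if the upload fails.
    # "$$" escapes Kubernetes' own $(VAR) expansion, so bash receives $(date ...).
    # ASSUMPTION: "cdn pipe <bucket>/<object>" uploads stdin; check your client's docs.
    args:
      - |
        set -euo pipefail
        pg_dump --format=custom | cdn pipe "backups/${PGDATABASE}-$$(date +%Y%m%d%H%M%S).dump"
    env:
      - name: PGHOST
        value: postgres            # hypothetical Service name
      - name: PGUSER
        value: backup              # hypothetical database user
      - name: PGDATABASE
        value: my_db
      - name: PGPASSWORD
        valueFrom:
          secretKeyRef:
            name: postgres-secrets # hypothetical Secret
            key: password
```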
Defining our Backup Job
In the real world, you’re going to want to draw things like your S3 credentials from the Kubernetes Secrets API and provide some additional metadata for tracking and organization, but the result isn’t much more complicated than what we started with.
```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-backup
  labels:
    app: my-app
spec:
  # Standard cron syntax: every day at midnight
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: my-app
          name: my-backup
        spec:
          # Required for Jobs: must be OnFailure or Never
          restartPolicy: OnFailure
          containers:
            - image: minback/mongo:latest
              name: backup
              args:
                - my_db
                - -h
                - mongodb
              env:
                - name: cdn_SERVER
                  value: http://cdn:9000/
                - name: cdn_BUCKET
                  value: backups
                - name: cdn_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      key: access-key
                      name: cdn-secrets
                - name: cdn_SECRET_KEY
                  valueFrom:
                    secretKeyRef:
                      key: secret-key
                      name: cdn-secrets
```
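The CronJob above reads its S3 credentials from a Secret named cdn-secrets that holds access-key and secret-key entries. A minimal sketch of that Secret follows. The values are obvious placeholders. The stringData field lets you write them as plain text, and Kubernetes stores them base64-encoded in data.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cdn-secrets
type: Opaque
stringData:
  # Placeholder values: never commit real credentials to source control
  access-key: REPLACE_WITH_ACCESS_KEY
  secret-key: REPLACE_WITH_SECRET_KEY
```

If you would rather keep credentials out of manifests entirely, kubectl create secret generic cdn-secrets with one --from-literal flag per key produces the same Secret.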
Existing Backup Containers
In the interest of speeding up adoption, we have open-sourced some of the backup containers we use in our infrastructure. Each container runs a backup of a given datastore and pushes the result to S3 using the cdn CLI.