Introduction To Dockerizing For Production

Itamar Turner-Trauring
Contents

Legal disclaimer
Introduction
Production is different
Why packaging? Why Docker images?
An iterative process for Docker packaging
Step 1: Your application runs
Step 2: Security
    Dealing with security updates
Step 3: Automated builds
Step 4: Improving debugging and operational correctness
    Improving debugging
    Operational correctness
Step 5: Reproducible builds
    Preventing stagnation via an update process
Step 6: Faster builds and smaller images
Implementing the process
Legal disclaimer
Introduction
• Packaging is a process: get the process wrong, and the product will be wrong too. Consider, for example, security updates: as we will see, there are at least two approaches you can take. If you implement a “best practice” created with the first process in mind, while actually following the second process, you will end up with insecure images.
In short, before you learn the specifics, you need to understand the big
picture.
Before we continue, there are some prerequisites. You should already
understand the basics of how Docker packaging works: the differences
between a container and an image, what an image registry is, the basic
structure of a Dockerfile, and so on. If you need help with these concepts,
consider reading my introductory book Just Enough Docker Packaging.
Assuming you know the basics, let’s move on to consider packaging for
production.
Production is different
Let’s assume for now that you’re planning on packaging your software for
production using Docker. We’ll consider why packaging and why Docker
in the next section, but for now let’s consider the difference between
production and development.
A common use case for Docker is for development purposes: creating a consistent environment across multiple machines and operating systems, and providing easy access to runtime dependencies.
How exactly does this use case differ from production? The short answer: development is a much simpler use case than production.
Here’s a visualization of the development process where Docker is often
used:
[Figure: the development loop, alternating between Coding and Testing.]
You write some code, you test the code, you find some bugs, you write
some more code. The Docker dev environment is designed solely to help
you run code easily, while still providing fast feedback. For example, a
web server might reload the code every time it changes.
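As a sketch of how that workflow is often wired up, assuming a Flask application and a hypothetical dev image with Flask installed, you mount your code into the container so the server sees every edit:

    # Mount the code from the host into the container, so edits are visible
    # immediately inside; the dev server reloads whenever the code changes.
    # (Image name, app module, and port are all hypothetical.)
    docker run --rm -v "$PWD:/app" -p 5000:5000 -e FLASK_APP=yourapp \
        yourorg/devimage flask run --reload --host=0.0.0.0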
Production use is far more complex:
[Figure: the production pipeline: Developing, Testing, Packaging, Integration testing, Deploying, Deployment, Process startup, Running, and the Runtime environment.]
Software is submitted in pull requests, tests are run, Docker images are
built, integration tests are run, the software is deployed, old servers are
shut down, bug reports are fed from production environments back to
developers… all these steps have some interaction with packaging. As a
result, packaging for production is far more complex.
In addition, packaging for production is also more important to get right. Given that production packaging is both more complex and more important, you will need to spend more time on it.
Why packaging? Why Docker images?
So far we’ve just assumed you’re packaging your software. The next question to consider is why you’re packaging your software at all. Can’t you just check out your code from version control on a machine somewhere, and then run it directly as is?
Technically, yes, but even for this simplistic deployment model we have some requirements. To deploy and run your code you will need, at minimum, a machine to run it on, an interpreter for your language (Python, say), and all of your third-party dependencies.

And that’s just the starting point. Eventually, time passes and things change: you write new code, your dependencies release new versions, and Python itself needs upgrading.

If you start with the simplistic git pull approach, you can certainly pull new code, install new dependencies as needed, upgrade Python manually, and so on.
The problem with this approach is that the machine will end up in any one of many arbitrary combinations of versions of your code and its dependencies. At the same time, your development machine may be in some other slightly different combination. With arbitrary versions of code in different environments, how can you be certain the code that worked on your laptop will also work in production?
Once you run the same code on two machines, or add additional developers, the problem gets worse: now you might have three, four, ten different combinations of versions and dependencies.
And that’s why you want packaging, and that’s why Docker images are an excellent form of packaging. Docker images:

• Don’t just contain your files, but also have a standard way of encapsulating how to run your application: the image entrypoint.
• Encourage standardized idioms for configuration (environment variables) and logging (stdout) inspired by the Twelve-Factor App.
• Can be run in a very broad range of environments, including local machines, PaaS like Heroku, serverless environments like AWS Lambda, a variety of cloud services, and orchestration systems like Kubernetes.
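As a sketch of what this buys you in practice (the image name and environment variable here are hypothetical): anyone, and any system, can run your application the same way, without knowing how it starts internally:

    # The entrypoint encapsulates how the application starts; configuration
    # arrives via environment variables; logs go to stdout, where Docker
    # collects them. (All names are hypothetical.)
    docker run --detach --name myapp \
        -e DATABASE_URL="postgres://db.example.com/app" \
        -p 8000:8000 yourorg/yourimage
    docker logs myapp    # stdout logging means the logs show up here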
In practice you won’t be building just one image, but many Docker images over time, driven by both internal processes like code changes, and external events like security updates to your dependencies.

In short, packaging for production is a process, and a process that continues over time. Some parts of this process will be embodied in artifacts like your Dockerfile, build scripts, and the like. But other parts will be organizational processes that will require human intervention, for example upgrading to newer but incompatible dependencies.
An iterative process for Docker packaging

The process I recommend is iterative, building up your packaging in six steps:

1. Your application runs.
2. Security.
3. Automated builds.
4. Improving debugging and operational correctness.
5. Reproducible builds.
6. Faster builds and smaller images.

This ordering has two useful properties:

• After each step in this process, you will have something useful, even if not perfect.
• You implement the most important parts first, so that if you’re pulled away to work on something else you’re always at a good stopping point.

In practice you might choose to do the steps in a different order, for example doing reproducibility earlier, but the given order is a reasonable starting point for most situations.

Next, we’ll go through each step in turn.
Again, I won’t be going into specific implementation details about how to
do any of these steps—that’s a 100-page book, and this isn’t it. But after
we go through the whole process I’ll point you at resources to help you
do the actual work.
Step 1: Your application runs
Having your application run inside a Docker image is your first step: if
your Docker image can’t run your application, it’s not particularly useful.
This won’t be perfect packaging, since it’s only the first step, but you will
improve the packaging with each additional iteration.
The real world is more complex than a simplified model. So in practice, some of the work you’ll be doing at this stage matches the goals of later steps. For example, choosing a stable base image is part of reproducibility, but you should be doing it here.

The main focus, however, should be on doing just the minimum necessary to get something working.
Since you’re starting from scratch, you will need to decide how your application will be configured: will you use environment variables, configuration files, or both?
For network servers you’ll want to think about which ports are public and
which are private.
For batch jobs that process data you’ll want to think about how input
files will be passed in, for example a volume mount at a known directory,
and how output will be stored.
You should of course document all these decisions, so that people using
the image know how to use it.
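Pulling those decisions together, a first-pass Dockerfile might look something like the following sketch; the base image, paths, port, and module name are all assumptions you’d replace with your own:

    FROM python:3.11-slim
    WORKDIR /app
    COPY . .
    RUN pip install -r requirements.txt
    # Decision: configuration comes from an environment variable, with a default.
    ENV APP_CONFIG=/app/config/production.toml
    # Decision: port 8000 is the public port; anything else stays private.
    EXPOSE 8000
    # Decision: batch input files are mounted at a known directory.
    VOLUME /data
    # The entrypoint documents and encapsulates how to run the application.
    ENTRYPOINT ["python", "-m", "yourapp"]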
Step 2: Security
The next step is making your image more secure. If your image is insecure you won’t want to run it anywhere public, so it’s worth doing this as early as possible so you can start using your image outside your development machine.

This includes not running as root, installing security updates, and other best practices.
Dealing with security updates

The key process you’ll have to implement here is handling security updates for your dependencies. Given that Docker images are immutable, security updates require building a new image with the newly updated dependency, and then redeploying if necessary.
To simplify somewhat, there are two basic approaches you can take:
• Always use latest versions: Whenever you rebuild the image, you
use the latest versions of your dependencies to ensure you get the
latest security fixes. In order to ensure updates are applied in a
timely fashion, you need to either rebuild from scratch nightly or
weekly, or whenever a security update becomes available.
• Use pinned versions: Your dependencies are pinned to specific
versions, for example you only install v1.1.1 of a particular package.
When a security update—or critical bug fix—becomes available you
will need to regenerate your list of pinned dependencies to include
the version with the fix, and then rebuild the image.
Either way you will be rebuilding and redeploying your image on a regular basis, as an ongoing process.
My general advice is to use the latest version approach for system packages: a stable long-term-support Linux distribution will make sure security updates are backwards compatible. For Python dependencies I recommend the pinning approach, because upgrades are much more likely to introduce incompatibilities with your code.
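In Dockerfile form, that combination might look something like this sketch (the base image and file names are assumptions):

    FROM python:3.11-slim
    # Latest-version approach for system packages: every rebuild pulls in
    # the distribution's backwards-compatible security updates.
    RUN apt-get update && apt-get -y upgrade && rm -rf /var/lib/apt/lists/*
    # Pinned-version approach for Python dependencies: requirements.txt
    # lists exact versions, and is regenerated when a fix is released.
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    # Don't run as root: create an unprivileged user and switch to it.
    RUN useradd --create-home appuser
    USER appuser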
Step 3: Automated builds
Now that you have an image that runs your application and hopefully is
somewhat secure, the next step is automating builds. Instead of building
images manually, you want to build images automatically in your build or
CI system. Whenever new code is pushed to version control, you’ll build
a new image. This means it’s easier for multiple team members to get
images built, and means you can also automate deploys if relevant.
You will likely want to push your images to an image registry, for later
access. If you don’t already have one you will need to set one up, either
on your own servers or via a hosted service like your cloud provider.
If you’ve chosen the path of automated daily or weekly rebuilds for security updates, you’ll want to implement that at this point.
Now that it’s not just you manually building one-off images, you need to
think about how your image building integrates with your development
process.
For example, a common way to use version control is feature branches: each feature gets its own branch. Eventually you open a pull request back onto the main branch. Tests get run on the pull request, there might be a code review, and eventually the pull request gets merged into the main branch.
One approach to automating image building is to only build a Docker
image off the main branch, after each branch is merged.
Alternatively, you might want to have tests that use the Docker image, which means you are going to end up building Docker images as part of the pull request. You will then need some way to differentiate between images from different branches—a common way is to base the Docker image tag on the branch name, so you might have yourorg/yourimage:main and yourorg/yourimage:branch-12345.
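As a sketch, the tagging step in a CI build script might look like this; the image name is hypothetical, and most CI systems also expose the branch name in their own environment variable:

    # Build and push an image tagged after the current branch.
    BRANCH="$(git rev-parse --abbrev-ref HEAD)"
    if [ "$BRANCH" = "main" ]; then
        TAG="yourorg/yourimage:main"
    else
        # Real branch names may need sanitizing to fit Docker's tag syntax.
        TAG="yourorg/yourimage:branch-${BRANCH}"
    fi
    docker build -t "$TAG" .
    docker push "$TAG"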
Looking further out, you might want to start thinking about how new releases of the image can be deployed automatically.
Step 4: Improving debugging and operational correctness
Improving debugging
Once builds are automated, your whole team will start generating Docker
images just as a side-effect of normal software development. And so now
you need to start thinking about how to support the debugging process
your team uses.
Imagine an image is running in production, and a bug report comes in. Can you tell what version of the code the image is running, so a developer can reproduce the problem locally? Are you getting sufficient logging to help debug the problem?
The next step then is making images easier to debug, for example by additional logging. You’ll also want to add metadata to the image to keep track of where, how, and when it was created.
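One way to record that metadata, sketched below, is standard OCI image labels filled in at build time; the build arguments are assumptions you’d wire up in your CI script:

    FROM python:3.11-slim
    # Passed in by the build script, for example:
    #   docker build --build-arg GIT_COMMIT="$(git rev-parse HEAD)" .
    ARG GIT_COMMIT=unknown
    ARG BUILD_DATE=unknown
    # Standard OCI labels recording where, how, and when the image was
    # built; recoverable later with `docker inspect`.
    LABEL org.opencontainers.image.revision=$GIT_COMMIT \
          org.opencontainers.image.created=$BUILD_DATE

When a bug report comes in, docker inspect on the image then tells a developer exactly which commit to check out.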
Operational correctness
Beyond debugging, you also want your application to behave correctly as part of a larger running system: starting up, shutting down, and staying healthy. A lot of this comes down to application logic, and to integrating with your runtime environment—Heroku or Kubernetes, for example—but packaging does come into this. For example, health checks are typically part of your packaging, or at least adjacent to it. Likewise, how database schema upgrades work is closely tied to how your image starts up, and how you deploy the images.
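For example, a health check can be declared right in the Dockerfile. This sketch assumes your application serves a /healthz endpoint and that curl is installed in the image, both of which are assumptions about your setup:

    # Poll the application periodically; Docker marks the container
    # unhealthy if the check keeps failing, and your orchestrator can
    # then restart or replace it.
    HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
        CMD curl --fail http://localhost:8000/healthz || exit 1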
Step 5: Reproducible builds
In general, you want reproducible builds: when you rebuild your image with the same version of the source code, you want to get the same image. If you’re always installing the latest version of third-party dependencies, your application will end up breaking in unexpected ways, even if your code has only minor changes, because you’ll end up installing incompatible dependencies.
For system packages like glibc my tendency is to install the latest version automatically, and rely on a stable operating system to ensure backwards compatibility. But if you’re depending on the Django web framework, you probably want to have the same version of Django used whenever you package your software.
What you need, then, is tooling that takes your logical dependencies—the packages you import—and turns them into transitively pinned dependencies. Pinning means you specify exact versions, for example Flask 1.1.1. Transitive means you also pin the dependencies of your dependencies, and their dependencies, and so on.
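For Python, one tool that does this is pip-tools: you list your logical dependencies in a requirements.in file, and pip-compile generates a transitively pinned requirements.txt. A minimal sketch:

    # requirements.in contains only logical dependencies, e.g. the line:
    #   flask
    pip install pip-tools
    pip-compile requirements.in
    # requirements.txt now pins Flask and every transitive dependency
    # (Jinja2, Werkzeug, and so on) to exact versions.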
Preventing stagnation via an update process
At the same time, freezing your dependencies in place for too long is
a bad idea. If you only upgrade your dependencies every three years,
you might find yourself updating your code to deal with multiple major
API changes at once. This makes debugging harder, and problems more
likely.
You therefore need an ongoing organizational process for updating to
newer dependencies. For example, you might upgrade every 3 months or
so, which means chances are you’ll only be facing one major dependency
upgrade at a time.
You can of course automate some parts of this by using services like GitHub’s Dependabot, which updates dependencies for you. But that only solves the mechanical part of changing the version; you may still need to manually upgrade your code.
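With pip-tools, for instance, the mechanical half of that process is a single command; testing and fixing your code afterwards is still up to you:

    # Regenerate requirements.txt with the latest versions of everything
    # listed in requirements.in:
    pip-compile --upgrade requirements.in
    # Or upgrade a single dependency at a time (package name hypothetical):
    pip-compile --upgrade-package flask requirements.in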
Step 6: Faster builds and smaller images
At this point you should have packaging that, from an operational perspective, is in great shape. So next it’s time to improve another part of the development process: the developer feedback loop.
If building a Docker image is a key part of your development cycle, then
slow builds can become an expensive bottleneck. Instead of getting test
results from CI in 2 minutes, you might have to wait another 10 minutes
for the Docker image to build.
If you’re doing continuous deployment, where newly built images get immediately deployed, slow builds also make it harder to get feedback about whether code works in production. And that can impede developers’ ability to feel secure about deploying at any time.
Similarly, large images can slow down deploys and testing, and can cause
higher cloud computing costs due to disk and bandwidth charges.
The final step in the packaging process is therefore optimizing both build
time and image size. This can involve relying on Docker’s layer caching,
multi-stage builds, and nitty-gritty configuration options of your package
manager.
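As one sketch combining two of those techniques (base images and file names are assumptions): install dependencies before copying the rest of the code, so the slow install layer is cached across code-only changes, and use a multi-stage build so compilers and build tools don’t ship in the final image:

    # Build stage: full image with a compiler toolchain for building wheels.
    FROM python:3.11 AS build
    WORKDIR /app
    # Copy only the dependency list first: the expensive install layer
    # below stays cached until requirements.txt itself changes.
    COPY requirements.txt .
    RUN pip install --prefix=/install -r requirements.txt

    # Final stage: slim image, no build tools, much smaller to ship.
    FROM python:3.11-slim
    COPY --from=build /install /usr/local
    WORKDIR /app
    COPY . .
    ENTRYPOINT ["python", "-m", "yourapp"]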
Implementing the process
At this point you should have a sense of the scope of Docker packaging for production, and some of the processes you’ll be thinking about: security updates, dependency feature updates, deployments, pull requests. Actually implementing all this requires getting the details right, though, and Docker packaging has plenty of those. So how can you learn these details?
First, you can read the free articles on my website. There’s a huge
amount of tutorial-style content demonstrating specific best practices for
Python and how you can use them.
Second, if you’d like to get going faster, consider reading my Python on Docker Production Handbook. It’s a streamlined reference, covering 70+ best practices specifically for Python in production, and it’s organized using the exact same process I describe in this mini-book.¹
Finally, if you have any questions, suggestions, or ideas you’d like to share, please email me—I always love hearing from readers.
¹ Given the number of best practices to cover, I split up two of the steps for organizational purposes, so it actually uses an 8-step process. But essentially it’s the same structure.