AWS Reference Architecture Humanitec
An enterprise-grade Internal Developer Platform built with Humanitec on AWS
[Architecture diagram: Developer Control Plane with Backstage (developer portal/service catalog), workload specs (Score), IaC (Terraform), and version control (GitHub); Integration & Delivery Plane with CI pipeline (GitHub Actions), registry (Amazon ECR), Humanitec Platform Orchestrator, and CD pipeline; Resource Plane with compute (Amazon EKS), data (RDS MySQL), networking (Route 53), and services (Amazon SQS); Observability with Amazon CloudWatch; Secrets & Identity Management with HCP Vault.]
Introduction
Organizations need to be agile and innovative to stay competitive in today's software
development era, which has led to changes in how applications are built, deployed, and managed.
This necessitates the transformation of static CI/CD setups into modern Internal Developer
Platforms (IDPs) that provide developers with the tools needed to innovate and move quickly.
43% of DevOps professionals recognize this and, as a result, have built an IDP to improve
developer experience (DevEx) and enable developer self-service.
As an industry, we need to move beyond buzzwords and provide real-life examples of modern
IDPs. A blog post by McKinsey (soon to be released) makes a major contribution to this.
While every platform looks different, certain common patterns emerge. To help simplify things,
McKinsey consolidated the platform designs of hundreds of setups into standard patterns based
on real-world experiences, which have been proven to work effectively. By adopting these
patterns, organizations can create IDPs that keep them ahead of the competition and deliver
innovative applications faster than ever before.
This whitepaper is inspired by McKinsey’s blog post and provides an overview of one reference
architecture for a dynamic IDP using Amazon EKS, RDS, Backstage, Humanitec, GitHub Actions,
Terraform, and several other technologies.
[Reference architecture overview: Developer Portal; deploy and integration flows; Resource Plane with compute (Amazon EKS), data (RDS MySQL), networking (Route 53), and services (Amazon SQS); Monitoring & Logging Plane with Amazon CloudWatch; Security Plane with HCP Vault for secrets & identity management.]
Please note that this architectural design uses the most common combinations, but it
doesn’t mandate these particular technologies; each of them is interchangeable
with alternatives.
There will be a replicable open-source version of this architecture soon, so you can set this up
yourself. In the meantime, please get in touch with a Humanitec solution architect
([email protected]), who will be happy to provide a test setup.
Table of contents
Design principles
Architectural components
Conclusion
Appendix
Environment Management
Deployment Management
Observability
Administration
Cost management
Integration
Problems this IDP design aims to solve
When developers spend their most productive hours dealing with tedious
infrastructure and configuration management tooling, creativity and velocity suffer.
A burned-out engineer does not write code as quickly or efficiently; inspiration
often dies when too many steps are needed to test and deploy an idea. On average,
developers with inefficient platform setups face longer waiting times, as they are
stuck in a loop while other teams manually resolve things for them. Mediocre setups
also make onboarding difficult.
Ops teams and developers often find themselves waiting for each other to complete
tasks, which can result in delays, frustration, and decreased productivity. Modern
IDPs can help solve this problem by providing code-, UI-, CLI-, or API-based
interfaces that allow developers to quickly and easily provision resources without
having to wait for Ops to do it for them.
In current static CI/CD setups, there are often dozens of ways to reach materially the
same goal, such as spinning up a new Postgres database, deploying to production, or
describing the state of a cluster. These scripts often vary only slightly, and sometimes
just by environment. Their sheer number and unstructured nature make them hard to
maintain. Good platform design reduces the number of variances through Dynamic
Configuration Management (DCM) by up to 95%.
Design principles
According to McKinsey, there are eight proven design principles:
02 Run your platform team like a start-up. Establish a small central team that
owns the platform and is responsible for marketing it (and has the resources to
do so), ensuring it is easily consumable and fulfills developers’ needs.
engineers must define how to vend resources and configuration. This ensures
every resource is built securely, compliant, and well-architected.
07 Keep code as the single source of truth. This ensures everyone is working
from the same version, reducing the risk of errors.
Architectural components
According to McKinsey's blog post, “plane levels” are different areas of the platform
architecture that cluster certain functionalities. Let’s zoom in on the plane levels we have to
take care of and see what technologies fulfill each function in each of those levels.
Terraform - an Infrastructure as Code (IaC) tool written in HCL to describe the state of
the infrastructure resources in a declarative way
Backstage - a developer portal/service catalog to provide an interface to consolidate
documentation, structure service templates, and catalog existing services
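As a quick illustration of the declarative style Terraform uses, a minimal definition of an RDS Postgres instance might look like the following (purely illustrative; the resource name and values are assumptions, not part of this reference setup):

```hcl
# Illustrative only: declares the desired state of an RDS Postgres
# instance; Terraform computes and applies the plan to reach it.
resource "aws_db_instance" "dev_postgres" {
  identifier        = "db-dev"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
}
```

Because the file describes the target state rather than the steps to get there, re-running `terraform apply` is idempotent: it only changes what drifted from the declared state.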
[Diagram: Integration & Delivery Plane with GitHub Actions, Amazon ECR, and the Humanitec Platform Orchestrator, connected to the Developer Portal and the deploy step.]
Security Plane
On the Security Plane, we’re managing secrets and identity to protect sensitive data. We’re
storing, managing, and securely retrieving API keys and passwords.
The end-to-end architecture result
The developer portal component acts as an aggregator of information, pulling data from the
Observability Plane, Resource Plane, Platform Orchestrator, CI pipeline, and VCS. If used for
service creation and as a templating engine, the portal might call the templating API of
the VCS.
The Continuous Integration (CI) pipeline receives a notification to build and test from the git
push in the terminal, which pushes the latest changes to the Version Control System (VCS) and
indicates the branch. The CI pipeline will store its image in the registry and, at the last step of
the build pipeline, inform the Platform Orchestrator that a new image is available. It will also
send the metadata and eventual “orders” of new resources from the workload specification
(we’ll get to this in more detail later). The Orchestrator will hand over the deployment-ready
app and infrastructure configs to the Continuous Delivery (CD) part. Note that the Orchestrator
also takes over the CD functionality in our example, which doesn’t necessarily have to be the
case. The CD pipeline will go ahead and update the Resource Plane.
All parts of this plane can output workflow performance data and other metrics to the
Developer Control Plane, such as CI build time, DORA metrics etc. The Orchestrator can
register new services and their dependent resources to the portal layer.
An important integration point for the Monitoring and Logging Plane also happens on the
Integration and Delivery Plane, where the Platform Orchestrator ensures the necessary
sidecars and agents are launched and running next to the cluster.
In the vast majority of cases, teams already have an existing setup when building their IDP, so
it’s often about remodeling to ensure their setup matches the design principles. Here’s how:
01 Design the individual planes. Start with the Resource Plane because it
dictates design decisions on the other layers. We usually propose the
following order:
a. Resource Plane (you probably already have resources; in this case, decide
which ones are supported by your platform as a default).
b. Integration and Delivery Plane: pipeline design, configs of the Orchestrator, etc.
c. Security Plane.
d. Monitoring and Logging Plane.
e. Developer Control Plane: this heavily depends on the design choices
of the other planes and should always come last, after thorough testing
by developers.
How platform engineers or Ops teams operate, build, and maintain a platform
02 Wire the individual components of the planes to each other, as well as each
plane to the others, and test the raw end-to-end flows.
03 Set baseline configs for app and infrastructure configs (more details below).
We’ve covered the planes and their design; let’s next zoom in on the baseline configs
and automations.
Before the platform is ready, the platform engineering team still needs to set a number of
defaults and baseline configs. The entire idea of the presented reference architecture is to
enable developer self-service, lower cognitive load, drive standardization, and reduce ticket
ops. This requires the use of Dynamic Configuration Management (DCM), which in turn requires
a Platform Orchestrator that functions as a rules engine. It matches the request from the
developers with the config defaults provided by the platform team.
This means the next “job to be done” for the platform engineering team is to set those app and
infrastructure config defaults.
id          = "db-dev"
name        = "db-dev"
type        = "postgres"
driver_type = "humanitec/postgres-cloudsql-static"
driver_inputs = {
  values = {
    "instance" = "test:test:test"
    "name"     = "db-dev"
    "host"     = "127.0.0.1"
    "port"     = "5432"
  }
  secrets = {
    "username" = "test"
    "password" = "test"
  }
}
criteria = [
  {
    app_id = "test-app"
  }
]
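The matching between a deployment's context and such Resource Definitions can be sketched as follows (illustrative Python; the field names mirror the definition above, but the logic is an assumption, not Humanitec's implementation):

```python
# Illustrative rules-engine sketch: pick the Resource Definition whose
# criteria all match the deployment context. Hypothetical structure.
def match_definition(definitions: list, context: dict) -> dict:
    """Return the first definition with a criteria set matching the context."""
    for definition in definitions:
        for criteria in definition["criteria"]:
            if all(context.get(k) == v for k, v in criteria.items()):
                return definition
    raise LookupError(f"no resource definition matches context {context}")

definitions = [
    {"id": "db-dev", "type": "postgres", "criteria": [{"app_id": "test-app"}]},
    {"id": "db-prod", "type": "postgres", "criteria": [{"env_type": "production"}]},
]
# A deployment of test-app resolves its postgres request to db-dev.
chosen = match_definition(definitions, {"app_id": "test-app", "env_type": "development"})
assert chosen["id"] == "db-dev"
```

The key property is that developers never reference `db-dev` directly; the platform team's criteria decide which definition satisfies an abstract `postgres` request in a given context.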
The primary interaction method (by far the most used) is the code-based one. Developers
prefer to stay in their usual workflow, in the version control system (VCS), and within their
integrated development environment in order to “indicate” what their workloads require, spin
up new services, add resources etc. This is where a workload specification like Score comes
into play. It provides a code-based “specification” to describe how the workload relates to
other workloads and their dependent resources. Adding a Resource Definition to the Score file
will tell the Orchestrator to automatically create a new resource or wire an existing one. We’ll
explain how this works in the next section.
score.yaml
apiVersion: score.dev/v1b1
metadata:
  name: python-service
containers:
  python-service:
    image: python
    variables:
      CONNECTION_STRING: postgresql://${resources.db.user}:${resources.db.password}@${resources.db.host}:${resources.db.port}/${resources.db.name}
resources:
  db:
    type: postgres
  storage:
    type: s3
  dns:
    type: dns
We can see that the developer requires a database of type Postgres, a storage of type S3, and a
DNS of type DNS. For the vast majority of use cases, this code-based format should be entirely
sufficient and is the preference of most developers.
How developers use such a platform
For specific situations (like running diffs, rolling back, or spinning up new environments), they
can switch to the platform’s UI or CLI. Portals and Service Catalogs are primarily used for
consolidation and/or by product managers. Here’s a list of some activities a developer performs
using an Internal Developer Platform, and the interface used for each.
Deploy (Terminal/IDE)
The workload source code contains the workload specification (Score), which in this case
might look like this:
score.yaml
apiVersion: score.dev/v1b1
metadata:
  name: python-service
containers:
  python-service:
    image: python
    variables:
      CONNECTION_STRING: postgresql://${resources.db.user}:${resources.db.password}@${resources.db.host}:${resources.db.port}/${resources.db.name}
resources:
  db:
    type: postgres
  storage:
    type: s3
  dns:
    type: dns
We can see that the developer requires a database of type Postgres, a storage of type S3, and a
DNS of type DNS.
Zooming in on a “golden path” to understand the interplay of all components
After the CI build has run, the Platform Orchestrator recognizes the context and looks up
which resources are matched against this context (in our case, it's maybe the CI tag
"environment = development"). It checks whether the resources are already created (which is
likely in this case because it’s just a deployment to an existing dev environment) and reaches
out to the AWS API to retrieve the resource credentials. It then creates the application configs
in the form of manifests, because our target compute in this architecture is EKS. Once this is
done, the Orchestrator deploys the configs and injects the secrets at runtime into the
container (utilizing Vault).
[Diagram: the score.yaml file plus the deployment context flow to the Platform Orchestrator, which creates the app configs and configures the resources; S3 credentials are injected and the Route 53 DNS is configured on deploy.]
Let’s repeat the same procedure but spin up a new environment. The developer experience
is as simple as it gets; they push the same repository and the same workload specification
(because it works across all environments). By setting the context to “ephemeral” (through
the tag), the Platform Orchestrator will again interpret the workload specification. It will
realize that the Postgres doesn’t exist yet and that it should create one using a specific Driver.
The Platform Orchestrator will then create the configs, inject the dependencies, and serve.
[Diagram: the same score.yaml file deployed with the context "env = development"; the deploy request reaches the Platform Orchestrator, which matches the resource definitions and creates a new RDS Postgres instance.]
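The create-or-reuse decision described above can be sketched in Python (purely illustrative logic and names, not Humanitec's implementation):

```python
# Illustrative sketch of an orchestrator's create-or-reuse decision for a
# requested resource; the structure and names are hypothetical.
def ensure_resource(existing: dict, env_id: str, res_type: str, create) -> dict:
    """Reuse the resource if it already exists in this environment, else create it."""
    key = (env_id, res_type)
    if key not in existing:
        existing[key] = create(env_id, res_type)  # e.g. invoke a Driver
    return existing[key]

existing = {}
created = []
def fake_driver(env_id, res_type):
    created.append((env_id, res_type))
    return {"id": f"{res_type}-{env_id}"}

# First deploy to a fresh ephemeral environment: Postgres is created.
db = ensure_resource(existing, "pr-42", "postgres", fake_driver)
# Second deploy to the same environment: the existing instance is reused.
db_again = ensure_resource(existing, "pr-42", "postgres", fake_driver)
assert db is db_again and len(created) == 1
```

The developer's spec never changes between these two deploys; only the environment context determines whether a Driver is invoked.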
This is a great example of how platform engineers utilize the platform to maintain a high
degree of standardization. Let’s explore how one would go about updating the dev Postgres
resource to the latest Postgres version. Given our architectural choice, let's assume all
resources use Terraform for Infrastructure as Code. To make the example more compelling,
let's also assume that development environments all have their own instance of a Postgres.
This approach is so expensive that, in reality, we would likely share instances across several
dev environments. So how would we go about updating all our Postgres resources across the
dev environments? If the resources are defined in Terraform, we update the Terraform module.
Otherwise, we would adapt the driver in the Orchestrator.
id = "db-dev"
driver_inputs = {
  values = {
    # ...
  }
  secrets = {
    # ...
  }
}
criteria = [
  # ...
]
03 We then need to find out which workloads currently depend on our
Resource Definition of “dev Postgres”. The answer can be found in our “rules
engine”, the Platform Orchestrator, simply because this is where the “decision
is made” regarding which resources to use to wire the workload up, and in what
context. We can do this by pinging the Orchestrator API or looking at the
“Usage” section of the Resource Definition in the user interface.
Another benefit of IDPs is streamlined config management, which reduces cognitive load.
Developers can focus on writing code instead of worrying about infrastructure, which can be
a complex and time-consuming task. With an IDP, developers can simply select the resources
they need and configure them as required, freeing up more time for coding.
IDPs also offer new superpowers that can boost productivity. For example, developers
can use Score as a workload spec, which allows them to declare their workload’s
configuration and resource dependencies once for all environments. They can also spin up
PR environments, which can be used to test and debug code changes before merging them
into the main codebase.
Furthermore, the diff functionality for debugging allows developers to quickly identify and
fix issues, while secure infrastructure self-service ensures that the entire development
process remains secure.
Benefits of this Architecture
Taken together, IDPs can have a significant impact on the productivity and efficiency of
application developers. By reducing dependencies and waiting times, streamlining config
management, and offering new superpowers, they free developers to focus on delivering
high-quality applications.
In addition to the automation benefits, IDPs also enable developer self-service, which reduces
waiting times and skyrockets productivity. This allows for faster innovation cycles and
enables organizations to stay ahead of the competition. Moreover, dynamic IDPs require fewer
full-time Ops employees per application developer, which helps organizations
streamline overall operations and reduce costs.
Another important benefit of IDPs is cost control. By reducing cloud bills and optimizing
resource allocation, organizations can invest saved money in other business areas. This is
especially important in today's highly competitive landscape, where every dollar counts.
Overall, IDPs have the potential to revolutionize the way organizations develop and deploy
software. By leveraging automation, developer self-service, and other advanced technologies,
IDPs can help organizations to stay ahead of the curve and achieve their goals more quickly
and efficiently than ever before.
Conclusion
In conclusion, adopting modern Internal Developer Platforms (IDPs) using Amazon EKS, RDS,
Backstage, Humanitec, GitHub Actions, Terraform, and several other technologies can help
organizations improve their developer experience, increase productivity and innovation, and
reduce cognitive load for developers. By implementing this architecture, organizations can also
deliver applications faster and more efficiently. However, it is important to remember that the
implementation of an IDP varies widely by organization, and our reference architecture is just a
starting point.
Appendix
Capabilities of this architecture
Dynamic Configuration Management
Use the environment-agnostic workload specification to describe infrastructure
dependencies once and for all environments
Multiple workloads can depend on the same resource (e.g. a shared database or
DNS name)
A full history of all workload configuration, environment specific values and secrets
can be retrieved
Environment Management
New environments can be created on demand by cloning existing environments
Deployment Management
Deployments in an environment can be rolled back to a previous deployment
Deployments can be triggered based on criteria from source control, such as tag
format or branch name
Pipelines including additional pre- and post-deployment steps can be defined (in beta)
Existing resources not managed by the platform can be used and connected via static definitions
Resource provisioning can depend on other resources, e.g. a database can depend on a
database instance which depends on the VPC that the environment is created in
A suite of drivers can be used to create, update and destroy infrastructure and
other resources
Driver for Terraform and other IaC formats to integrate brownfield setups
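The dependency chain mentioned above (a database depends on a database instance, which depends on the VPC) implies resources must be provisioned in dependency order. A minimal topological-ordering sketch in Python (illustrative, assuming an acyclic dependency graph; the resource names are hypothetical):

```python
# Illustrative sketch: order resources so each is provisioned after the
# resources it depends on. Assumes the dependency graph has no cycles.
def provision_order(deps: dict) -> list:
    """Return resource ids so every resource appears after its dependencies."""
    order, seen = [], set()
    def visit(res):
        if res in seen:
            return
        seen.add(res)
        for dep in deps.get(res, []):
            visit(dep)
        order.append(res)
    for res in deps:
        visit(res)
    return order

deps = {"database": ["db-instance"], "db-instance": ["vpc"], "vpc": []}
# The VPC is provisioned first, then the instance, then the database.
assert provision_order(deps) == ["vpc", "db-instance", "database"]
```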
Observability
Can be used to standardize the integration of APM products
Container logs are surfaced without the user needing access to the cluster
Monitor environment health via workload, pod, and container statuses. Errors are
displayed in real time
Administration
Integration with SSO (SAML 1.1 & 2.0)
Run a self-hosted instance (still managed by Humanitec, but running in your network
and on your infrastructure)
Cost management
Resource limits
Pausing of environments
Integration
All functionality is available via the API
Long-lived API tokens can be issued to support integration with 3rd party systems
Humanitec GmbH
Wöhlertstraße 12-13, 10115 Berlin, Germany
Phone: +49 30 6293-8516
Humanitec Inc
228 East 45th Street, Suite 9E,
Humanitec Ltd
3rd Floor, 1 Ashley Road
United Kingdom
E-mail: [email protected]
Website: https://www.humanitec.com
Responsible for the content of humanitec.com ref. § 55 II RStV: Kaspar von Grünberg
humanitec.com