System Design on AWS (Excerpt)
Building and Scaling Enterprise Solutions
Early Release: Raw and Unedited
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. System Design on AWS, the cover
image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors and do not represent the publisher’s views.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use
of or reliance on this work. Use of the information and instructions contained in this work is at your
own risk. If any code samples or other technology this work contains or describes is subject to open
source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights. This work is part of a collaboration between O'Reilly
and Dragonfly. See our statement of editorial independence.
978-1-098-14683-2
[FILL IN]
Table of Contents
Cache Invalidation
Caching Strategies
Caching Deployment
Caching Mechanisms
Content Delivery Networks
Open Source Caching Solutions
    Memcached
    Redis
Conclusion
CHAPTER 1
System Design Trade-offs and Guidelines
This chapter covers the basics of system design, with the goal of helping you understand the core concepts, the trade-offs that naturally arise in large-scale software systems, the fallacies to avoid when building such systems, and the guidelines learned over years of building them. It is meant only as an introduction; we’ll dig into the details in later chapters, but we want you to have a good foundation to start with. Let’s begin by digging into the basic system design concepts.
Communication
A large-scale software system is composed of small subsystems, known as servers, which communicate with each other, that is, exchange information or data over the network, to solve a business problem, provide business logic, and compose functionality.
Communication can take place in either a synchronous or an asynchronous fashion, depending on the needs and requirements of the system.
Figure 1-1 shows the difference in the action sequence of both synchronous and
asynchronous communication.
Let’s go over the details of both communication mechanisms in the following sections.
Synchronous Communication
Consider a phone call with your friend: you hear and speak with them at the same time and use pauses to let each other finish. This is an example of synchronous communication, a type of communication in which two or more parties communicate with each other in real time, with low latency. This type of communication is more immediate and allows for quicker resolution of issues or questions.
Asynchronous Communication
To give you an example of asynchronous communication, consider that instead of a
phone call conversation, you switch to email. As you communicate over email with
your friend, you send the message and wait for the reply at a later time (but within
an acceptable time limit). You also follow up again if there is no response after this time limit has passed. This is an example of asynchronous
communication, a type of communication in which two or more parties do not
communicate with each other in real-time. Asynchronous communication can also
take place through messaging platforms, forums, and social media, where users can
post messages and responses may not be immediate.
In system design, a communication mechanism is asynchronous when the sender
does not block (or wait) for the call or execution to return from the receiver. Execution continues in your program or system, and when the call returns from the receiving server, a “callback” function is executed. In system design, asynchronous
communication is often used when immediate response is not required, or when the
system needs to be more flexible and tolerant of delays or failures.
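To make this concrete, here is a minimal sketch of non-blocking communication with a callback using Python’s asyncio. The remote call is simulated with a sleep, and the names (fetch_report, on_report_ready) are illustrative, not taken from any particular system.

```python
import asyncio

async def fetch_report(job_id: str) -> str:
    # Stand-in for a call to another server; the sleep simulates network latency.
    await asyncio.sleep(2)
    return f"report for {job_id}"

def on_report_ready(task: asyncio.Task) -> None:
    # The "callback": runs when the response finally comes back.
    print("callback received:", task.result())

async def main() -> None:
    task = asyncio.create_task(fetch_report("job-42"))
    task.add_done_callback(on_report_ready)
    # The caller does not block; it continues with other work.
    print("request sent; continuing without waiting")
    await asyncio.sleep(3)  # keep the event loop alive long enough for the callback

asyncio.run(main())
```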
In general, the choice of synchronous or asynchronous communication in system
design depends on the specific requirements and constraints of the system. Synchro‐
nous communication is often preferred when real-time response is needed (such as
the communication between the frontend UI and the backend), while asynchronous
communication is often preferred when flexibility and robustness are more important
(such as the communication to check the status of a long running job).
Consistency
Consistency, i.e. the requirement of being consistent, or in accordance with a set of
rules or standards, is an important issue when it comes to communication between
servers in a software system. Consistency can refer to a variety of concepts and
contexts in system design.
In the context of distributed systems, consistency can be the property of all replica
nodes (more on this in a moment) or servers having the same view of data at a given
point in time. This means that all replica nodes have the same data, and any updates
to the data are immediately reflected on all replica nodes.
Figure 1-2 shows the difference in results when performing the same action sequence under strong consistency and eventual consistency. As you can see on the left side of the figure, under strong consistency, when x is read from a replica node after being updated from 0 to 2, the request blocks until replication completes and then returns 2. On the right side, under eventual consistency, querying the replica node before replication completes returns the stale value of x, which is 0.
Figure 1-2. Sequence Diagram for Strong Consistency and Eventual Consistency
Measuring Availability
Availability can be measured mathematically as the percentage of time the system was up, that is, the total time minus the time the system was down, divided by the total time the system should have been running.
Availability percentages are represented in 9s, based on the above formula over a
period of time. You can see the breakdown of what these numbers really work out to
in Table 1-1.
The goal for availability is usually to achieve the highest level possible, such as “five
nines” (99.999%) or even “six nines” (99.9999%). However, the level of availability
that is considered realistic or achievable depends on several factors, including the
complexity of the system, the resources available for maintenance and redundancy,
and the specific requirements of the application or service.
Achieving higher levels of availability becomes progressively more challenging
and resource-intensive. Each additional nine requires an exponential increase in
redundancy, fault-tolerant architecture, and rigorous maintenance practices. It often
involves implementing redundant components, backup systems, load balancing, failover mechanisms, and continuous monitoring to minimize downtime and ensure
rapid recovery in case of failures.
If the components are in sequence, the overall availability of the service will be the
product of the availability of each component. For example, if two components with
99.9% availability are in sequence, their total availability will be 99.8%.
On the other hand, if the components are in parallel, the overall availability of the service will be the sum of the availabilities of the components minus their product. For example, two components with 99.9% availability in parallel give a combined availability of 99.9999%.
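As a quick illustration of these formulas, here is a small Python sketch; the helper names are ours, and the numbers are the illustrative ones used above.

```python
def serial_availability(*components: float) -> float:
    # Components in sequence: overall availability is the product of the parts.
    result = 1.0
    for a in components:
        result *= a
    return result

def parallel_availability(a1: float, a2: float) -> float:
    # Two redundant components in parallel: A1 + A2 - A1*A2.
    return a1 + a2 - a1 * a2

MINUTES_PER_YEAR = 365 * 24 * 60

print(f"{serial_availability(0.999, 0.999):.4%}")    # ~99.8001%
print(f"{parallel_availability(0.999, 0.999):.4%}")  # ~99.9999%
print(f"{(1 - 0.99999) * MINUTES_PER_YEAR:.2f} minutes of downtime per year at five nines")
```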
Ensuring Availability
Ensuring availability in a system is important for maintaining the performance and
reliability of the system. There are several ways to increase the availability of a system,
including:
Redundancy
By having multiple copies of critical components or subsystems, a system can
continue to function even if one component fails. This can be achieved through
the use of redundant load balancers, failover systems, or replicated data stores.
Fault tolerance
By designing systems to be resistant to failures or errors, the system can continue
to function even in the face of unexpected events. This can be achieved through
the use of error-handling mechanisms, redundant hardware, or self-healing systems.
Load balancing
By distributing incoming requests among multiple servers or components, a
system can more effectively handle heavy load and maintain high availability.
This can be achieved through the use of multiple load balancers or distributed
systems.
Availability Patterns
To ensure availability, there are two major complementary patterns that support high availability: the failover pattern and the replication pattern.
The failover pattern can involve the use of additional hardware and can add com‐
plexity to the system. There is also the potential for data loss if the active system
fails before newly written data can be replicated to the passive system. Overall, the
choice of failover pattern depends on the specific requirements and constraints of the
system, including the desired level of availability and the cost of implementing the
failover solution.
Figure 1-5. Multi-leader replication system setup vs Single leader replication system
setup
There is a risk of data loss if the leader system fails before newly written data has been replicated to other nodes. Also, the more read replicas that are used, the more writes need to be replicated, which can lead to greater replication lag. In addition, the use of read replicas can impact the performance of the system, as they may be bogged down with replaying writes and unable to process as many reads.
Reliability
In system design, reliability refers to the ability of a system or component to perform
its intended function consistently and without failure over a given period of time.
It is a measure of the dependability or trustworthiness of the system. Reliability is
typically expressed as a probability or percentage of time that the system will operate
without failure. For example, a system with a reliability of 99% will fail only 1% of the time. Let’s look at how to quantify the reliability of a system.
Measuring Reliability
One way to measure the reliability of a system is through the use of mean time between failures (MTBF) and mean time to repair (MTTR).
Mean time between failures
Mean time between failures (MTBF) is a measure of the average amount of time
that a system can operate without experiencing a failure. It is typically expressed
in hours or other units of time. The higher the MTBF, the more reliable the
system is considered to be.
Together, MTBF and MTTR can be used to understand the overall reliability of a
system. For example, a system with a high MTBF and a low MTTR is considered to
be more reliable than a system with a low MTBF and a high MTTR, as it is less likely
to experience failures and can be restored to operation more quickly when failures do
occur.
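A common way to tie these two measures back to availability is the steady-state formula Availability = MTBF / (MTBF + MTTR). Here is a minimal sketch; the example numbers are purely illustrative.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    # Steady-state availability: fraction of time the system is operational.
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A system that fails on average every 1,000 hours and takes 1 hour to repair:
print(f"{availability(1000, 1):.3%}")   # ~99.900%
# The same MTBF with a 10-hour repair time is noticeably less available:
print(f"{availability(1000, 10):.3%}")  # ~99.010%
```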
Scalability
In system design, we need to ensure that the performance of the system increases
with the resources added based on the increasing workload, which can either be
request workload or data storage workload. This is referred to as scalability in system
design, which requires the system to respond to increased demand and load. For
example, a social network needs to scale with the increasing number of users as well
as the content feed on the platform, which it indexes and serves.
Vertical Scaling. Vertical scaling involves meeting the load requirements of the system by increasing the capacity of a single server, upgrading it with more resources (CPU, RAM, GPU, storage, etc.) as shown on the left in Figure 1-6. Vertical scaling is useful when dealing with predictable traffic, as it allows more resources to be used to handle the existing demand. However, there are limits to how far a single server can scale up from its current configuration, and the cost of scaling up is generally high, since higher-end hardware commands a premium.
Horizontal Scaling. Horizontal scaling involves meeting the load requirements of the system by increasing the number of servers, adding more commodity servers to serve requests, as shown on the right in Figure 1-6. Horizontal scaling is useful when dealing with unpredictable traffic, as adding more servers increases the capacity to handle more requests, and if demand rises further, more servers can be added to the pool cost-effectively. However, although horizontal scaling offers a better dollar-cost proposition, the catch is the complexity of managing multiple servers and ensuring they work together as a single abstracted server to handle the workload.
In early-stage systems, you can start by scaling vertically, giving the system a better configuration, and later, when you hit the limits of scaling up, move to scaling the system horizontally.
Maintainability
In system design, maintainability is the ability of the system to be modified, adapted,
or extended to meet the changing needs of its users while ensuring smooth system
operations. In order for a software system to be maintainable, it must be designed to
be flexible and easy to modify or extend.
The maintainability of a system requires covering these three underlying aspects of the system:
Operability
This requires the system to operate smoothly under normal conditions and to return to normal operation within a stipulated time after a fault. When a system is maintainable in terms of operability, it reduces the time and effort required to keep the system running smoothly. This is important because efficient operations and management contribute to overall system stability, reliability, and availability.
Lucidity
This requires the system to be simple and lucid: easy to understand, to extend with new features, and to fix bugs in. When a system is lucid, it enables efficient collaboration among team members, simplifies debugging and maintenance tasks, and facilitates knowledge transfer. It also reduces the risk of introducing errors during modifications or updates.
Modifiability
This requires the system to be built in a modular way to allow it to be modified
and extended easily, without disrupting the functionality of other subsystems.
Modifiability is vital because software systems need to evolve over time to adapt
to new business needs, technological advancements, or user feedback. A system
that lacks modifiability can become stagnant, resistant to change, and difficult to
enhance or adapt to future demands.
By prioritizing maintainability, organizations can reduce downtime, lower maintenance costs, enhance productivity, and increase the longevity and value of their software systems.
Replication
Replication-based fault tolerance ensures data safety and continued request serving by replicating both the service, through multiple replica servers, and the data, through multiple copies across storage servers. During a failure, the failed node is swapped with a fully functioning replica node. Similarly, data is served from a replica store if the primary data store has failed. The replication patterns were already covered in the availability section earlier in this chapter.
Checkpointing
Checkpointing based fault tolerance ensures that data is reliably stored and backed
up, even after the initial processing is completed. It allows for a system to recover
from any potential data loss, as it can restore a previous system state and prevent
data loss. Checkpointing is commonly used to ensure system and data integrity,
especially when dealing with large datasets. It can also be used to verify that data
is not corrupted or missing, as it can quickly detect any changes or issues in the
data and then take corrective measures. Checkpointing is an important tool for data
integrity, reliability, and security, as it ensures that all data is stored properly and
securely.
There are two checkpointing patterns: checkpointing can be done using either synchronous or asynchronous mechanisms.
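As a minimal sketch of synchronous checkpointing, the snippet below persists application state to disk atomically so that a restart can resume from the last checkpoint; the file name and state fields are illustrative assumptions, not a specific system’s format.

```python
import json, os, tempfile

def save_checkpoint(state: dict, path: str = "checkpoint.json") -> None:
    # Write to a temporary file, then atomically rename it, so a crash
    # mid-write never leaves a corrupted checkpoint behind.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)

def load_checkpoint(path: str = "checkpoint.json") -> dict:
    # Recovery: restore the most recently persisted state.
    with open(path) as f:
        return json.load(f)

save_checkpoint({"last_processed_record": 42_000})
print(load_checkpoint())   # {'last_processed_record': 42000}
```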
These fallacies cover the basic assumptions we should avoid while building large scale
systems. Overall, neglecting the fallacies of distributed computing can lead to a range
of issues, including system failures, performance bottlenecks, data inconsistencies,
security vulnerabilities, scalability challenges, and increased system administration
complexity. It is important to acknowledge and account for these fallacies during
the design and implementation of distributed systems to ensure their robustness,
reliability, and effective operation in real-world environments. Let’s also go through the trade-offs generally encountered in designing large-scale software systems in the next section.
Time vs Space
Space-time trade-offs, or time-memory trade-offs, arise inherently in the implementation of algorithms, even in distributed systems. This trade-off is necessary because system designers need to consider the time limitations of the algorithms and sometimes use extra memory or storage to make sure everything works optimally. One example of such a trade-off is using look-up tables in memory or data storage instead of recalculating values, thereby serving more requests by simply looking up pre-calculated results.
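Here is a minimal sketch of that look-up-table idea using memoization in Python; the pricing function is a made-up stand-in for any expensive computation.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # trade memory for time: remember results instead of recomputing
def shipping_cost(distance_km: int) -> float:
    # Stand-in for an expensive calculation or query.
    return 5.0 + 0.42 * distance_km

print(shipping_cost(120))            # computed once
print(shipping_cost(120))            # served from the in-memory lookup table
print(shipping_cost.cache_info())    # hits=1, misses=1, ...
```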
Latency vs Throughput
Another trade-off that arises inherently in system design is latency vs throughput.
Before diving into the trade-off, let’s make sure you understand these concepts thoroughly.
Latency, Processing time and Response time
Latency is the time that a request waits to be handled. Until the request is picked
up to be handled, it is latent, inactive, queued or dormant.
Processing time, on the other hand, is the time taken by the system to process the
request, once it is picked up.
Hence, the overall response time is the duration between the request being sent and the corresponding response being received, accounting for network and server latencies.
Mathematically, it can be represented by the following formula:
Response time = Latency + Processing time
Performance vs Scalability
As discussed earlier in the chapter, scalability is the ability of the system to respond to
increased demand and load. On the other hand, performance is how fast the system
responds to a single request. A service is scalable if it results in increased performance
in a manner proportional to the resources added. When a system has performance problems, it is slow for a single user (say, p50 latency of 100 ms), while when a system has scalability problems, it may be fast for some users (p50 latency of 1 ms across 100 requests) but slow for users under heavy load (p50 latency of 100 ms across 100,000 requests).
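As an aside, p50 latency is simply the 50th percentile (median) of measured request latencies. The sketch below shows one way to compute it from a list of timings; the synthetic numbers only mirror the illustrative figures above.

```python
import random
from statistics import quantiles

def p50(latencies_ms: list[float]) -> float:
    # 50th percentile: half of the requests complete at least this quickly.
    return quantiles(latencies_ms, n=100)[49]

light_load = [random.gauss(1.0, 0.2) for _ in range(100)]        # 100 requests
heavy_load = [random.gauss(100.0, 20.0) for _ in range(100_000)] # 100k requests
print(f"p50 under light load: {p50(light_load):.1f} ms")
print(f"p50 under heavy load: {p50(heavy_load):.1f} ms")
```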
CAP Theorem
The CAP theorem, as shown in Venn Diagram Figure 1-7, states that it is impossible
for a distributed system to simultaneously provide all three of the following guaran‐
tees: consistency (C), availability (A), and partition tolerance (P). According to the
theorem, a distributed system can provide at most two of these guarantees at any
given time. Systems need to be designed to handle network partitions as networks
aren’t reliable and hence, partition tolerance needs to be built in. So, in particular, the
CAP theorem implies that in the presence of a network partition, one has to choose
between consistency and availability.
PACELC Theorem
The PACELC theorem, as shown in Decision Flowchart Figure 1-8, is more nuanced
version of CAP theorem, which states that in the case of network partitioning (P)
in a distributed computer system, one has to choose between availability (A) and
consistency (C) (as per the CAP theorem), but else (E), even when the system is
running normally in the absence of partitions, one has to choose between latency
(L) and consistency (C). This trade-off arises naturally because to handle network
partitions, data and services are replicated in large scale systems, leading to the choice
between the consistency spectrum and the corresponding latency.
If the system tries to provide strong consistency, at one end of the consistency spectrum, it has to replicate with synchronous communication, which adds latency; relaxing toward eventual consistency allows asynchronous replication and lower latency.
In summary, the CAP and PACELC theorems are important concepts in distributed
systems design that provide a framework for understanding the trade-offs involved in
designing highly available and strongly consistent systems.
Given such requirements, fallacies, and trade-offs in system design, in order to avoid repeating the mistakes of the past, we should follow a set of guidelines learned by previous generations of software design practitioners. Let’s dig into those now.
The first guideline is to build the system modularly, i.e. break down a complex system
into smaller, independent components or modules that can function independently,
yet also work together to form the larger system. Building it modularly helps in
improving all the requirements of the large scale system:
Maintainability
Modules can be updated or replaced individually without affecting the rest of the
system.
Reusability
Modules can be reused in different systems or projects, reducing the amount of
new development required.
Scalability
Modules can be added or removed and even scaled independently as needed to
accommodate changes in requirements or to support growth.
Reliability
Modules can be tested and validated independently, reducing the risk of system-
wide failures.
Modular systems can be implemented in a variety of ways, including through the use of microservices architecture, component-based development, and modular programming, which we will cover in more detail in Chapter 7. However, designing modular systems can be challenging, as it requires careful consideration of the interfaces between modules, data sharing and flow, and dependencies.
The second guideline is to keep the design simple by avoiding complex and unneces‐
sary features and avoiding over-engineering. To build simple systems using the KISS
(Keep it Simple, Silly) guideline, designers can follow these steps:
1. Identify the core requirements: Determine the essential features and functions the system must have, and prioritize them.
2. Minimize the number of components: Reduce the number of components or modules in the system, making sure each component serves a specific purpose.
3. Avoid over-engineering: Don’t add unnecessary complexity to the system, such as features that are not necessary for its functioning.
4. Make the system easy to use: Ensure the system is intuitive and straightforward for users to use and understand.
5. Test and refine: Test the system to ensure it works as intended and make changes to simplify it if necessary.
By following the KISS guideline, you as a system designer can build simple, efficient,
and effective systems that are easy to maintain and less prone to failure.
The third guideline is to measure, then build, and to rely on metrics, because you can’t cheat performance and scalability. Metrics and observability are crucial for the operation and management of large-scale systems. These concepts are important for understanding the behavior and performance of large-scale systems and for identifying potential issues before they become problems.
Metrics
Metrics are quantitative measures that are used to assess the performance of a
system. They provide a way to track key performance indicators, such as resource
utilization, response times, and error rates, and to identify trends and patterns
in system behavior. By monitoring metrics, engineers can detect performance
bottlenecks and anomalies, and take corrective actions to improve the overall
performance and reliability of the system.
Observability
Observability refers to the degree to which the state of a system can be inferred from its externally visible outputs. This includes being able to monitor system health and diagnose problems in real time, which is especially important in large-scale systems.
The fifth guideline is that design always depends, as system design is a complex and
multifaceted process that is influenced by a variety of factors, including requirements, user needs, technological constraints, cost, scalability, maintenance, and even regulations. By considering these and other factors, you can develop systems that meet the
needs of the users, are feasible to implement, and are sustainable over time. Since
there are many ways to design a system to solve a common problem, it indicates a
stronger underlying truth: there is no one “best” way to design the system, i.e. there is
no silver bullet. Thus, we settle for something reasonable and hope it is good enough.
CHAPTER 4
Caching Policies and Strategies
A cache miss occurs when the requested data is not present in the cache, requiring the system to fetch the data from main memory or external storage. Cache hit rates measure the effectiveness of the cache in serving
requests without needing to access slower external storage, while cache miss rates
indicate how often the cache fails to serve requested data.
This chapter will cover important information to help you understand how to use
data caching effectively. We’ll talk about cache eviction policies, which are rules for
deciding when to remove data from the cache to make retrieval of important data
faster. We’ll also cover cache invalidation, which ensures the cached data is always
correct and matches the real underlying data source. The chapter will also discuss
caching strategies for both read and write intensive applications. We’ll also cover
how to actually put caching into action, including where to put the caches to get the
best results. You’ll also learn about different ways caching works and why Content
Delivery Networks (CDNs) are important. And finally, you’ll learn about two popular
open-source caching solutions. So, let’s get started with the benefits of caching.
Caching Benefits
Caches play a crucial role in improving system performance and reducing latency for
several reasons:
Faster Access
Caches offer faster access times compared to main memory or external storage.
By keeping frequently accessed data closer to the CPU, cache access times can be
significantly lower, reducing the time required to fetch data.
Reduced Latency
Caches help reduce latency by reducing the need to access slower storage
resources. By serving data from a cache hit, the system avoids the delay associ‐
ated with fetching data from main memory or external sources, thereby reducing
overall latency.
Bandwidth Optimization
Caches help optimize bandwidth usage by reducing the number of requests sent
to slower storage. When data is frequently accessed from the cache, it reduces the
demand on the memory bus or external interfaces, freeing up resources for other
operations.
Improved Throughput
Caches improve overall system throughput by allowing the CPU to access fre‐
quently needed data quickly, without waiting for slower storage access. This
enables the CPU to perform more computations in a given amount of time,
increasing overall system throughput.
Belady’s Algorithm
Belady’s algorithm is an optimal caching algorithm that evicts the data item that will be used furthest in the future. It requires knowledge of the future access pattern, which real systems rarely have, so in practice it serves mainly as a theoretical benchmark against which other eviction policies are compared.
Allowlist Policy
An allowlist policy for cache replacement is a mechanism that defines a set of
prioritized items eligible for retention in a cache when space is limited. Instead
of using a traditional cache eviction policy that removes the least recently used or
least frequently accessed items, an allowlist policy focuses on explicitly specifying
which items should be preserved in the cache. This policy ensures that important
or high-priority data remains available in the cache, even during periods of cache
pressure. By allowing specific items to remain in the cache while evicting others,
the allowlist policy optimizes cache utilization and improves performance for critical
data access scenarios.
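To make Belady’s policy described above concrete, here is a minimal sketch that picks an eviction victim given a hypothetical, known future access sequence; since real caches cannot know the future, this is only a benchmarking aid.

```python
def belady_victim(cached_keys: set[str], future_accesses: list[str]) -> str:
    # Evict the cached item whose next use lies furthest in the future
    # (or that is never used again).
    def next_use(key: str) -> float:
        try:
            return future_accesses.index(key)
        except ValueError:
            return float("inf")   # never accessed again: ideal eviction victim
    return max(cached_keys, key=next_use)

cache = {"a", "b", "c"}
future = ["b", "a", "b", "c"]        # "c" is needed last...
print(belady_victim(cache, future))  # ...so "c" is the eviction victim
```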
Caching policies serve different purposes and exhibit varying performance charac‐
teristics based on the access patterns and workload of the system. Choosing the
right caching policy depends on the specific requirements and characteristics of the
application.
By understanding and implementing these caching policies effectively, system designers and developers can optimize cache utilization, improve data retrieval performance, and enhance the overall user experience. Next, let’s discuss cache invalidation strategies, which are applied after identifying which data to evict based on the above cache eviction policies.
Cache Invalidation
Cache invalidation is a crucial aspect of cache management that ensures the cached
data remains consistent with the underlying data source. Effective cache invalidation
strategies help maintain data integrity and prevent stale or outdated data from being
served. Here are four common cache invalidation techniques: active invalidation, invalidating on modification, invalidating on read, and time-to-live (TTL).
Active Invalidation
Active invalidation involves explicitly removing or invalidating cached data when
changes occur in the underlying data source. This approach requires the appli‐
cation or the system to actively notify or trigger cache invalidation operations.
For example, when data is modified or deleted in the data source, the cache
is immediately updated or cleared to ensure that subsequent requests fetch the
latest data. Active invalidation provides precise control over cache consistency
but requires additional overhead to manage the invalidation process effectively.
Invalidating on Modification
With invalidating on modification, the cache is invalidated when data in the
underlying data source is modified. When a modification operation occurs,
such as an update or deletion, the cache is notified or flagged to invalidate
the corresponding cached data. The next access to the invalidated data triggers
a cache miss, and the data is fetched from the data source, ensuring the cache
contains the most up-to-date information. This approach minimizes the chances
of serving stale data but introduces a slight delay for cache misses during the
invalidation process.
Invalidating on Read
In invalidating on read, the cache is invalidated when the cached data is accessed
or read. Upon receiving a read request, the cache checks if the data is still valid
or has expired. If the data is expired or flagged as invalid, the cache fetches the
latest data from the data source and updates the cache before serving the request.
This approach guarantees that fresh data is always served, but it adds overhead
to each read operation since the cache must validate the data’s freshness before
responding.
Time-to-Live (TTL)
Time-to-Live is a cache invalidation technique that associates a time duration
with each cached item. When an item is stored in the cache, it is marked with
a TTL value indicating how long the item is considered valid. After the TTL
period elapses, the cache treats the item as expired, and subsequent requests for
the expired item trigger cache misses, prompting the cache to fetch the latest
data from the data source. TTL-based cache invalidation provides a simple and
automatic way to manage cache freshness, but it may result in serving slightly
stale data until the TTL expires.
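Here is a minimal sketch of a TTL-based cache in Python; it is an in-process illustration of the idea rather than a production cache, and the key names are made up.

```python
import time

class TTLCache:
    """Entries expire ttl_seconds after they are written."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[object, float]] = {}

    def set(self, key: str, value: object) -> None:
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None                        # cache miss
        value, expires_at = entry
        if time.monotonic() > expires_at:      # TTL elapsed: treat as expired
            del self._store[key]
            return None                        # miss; caller refetches from the source
        return value

cache = TTLCache(ttl_seconds=30)
cache.set("user:42", {"name": "Ada"})
print(cache.get("user:42"))   # fresh: {'name': 'Ada'}
```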
The choice of cache invalidation strategy depends on factors such as the nature of
the data, the frequency of updates, the performance requirements, and the desired
consistency guarantees. Active invalidation offers precise control but requires active management, invalidating on modification ensures immediate data freshness, invalidating on read guarantees fresh data on every read operation, and TTL-based invalidation provides a time-based expiration mechanism. Understanding the characteristics of the data and the system’s requirements helps in selecting the appropriate cache invalidation strategy to maintain data consistency and improve overall performance.
The next section covers different caching strategies to ensure that data is properly
consistent between the cache and the underlying data source.
The left-hand side of the diagram displays read-intensive caching strategies, focusing
on optimizing the retrieval of data that is frequently read or accessed. The goal
of a read-intensive caching strategy is to minimize the latency and improve the
overall performance of read operations by serving the cached data directly from
memory, which is much faster than fetching it from a slower and more distant data
source. This strategy is particularly beneficial for applications where the majority of
operations involve reading data rather than updating or writing data.
Let’s take a look at those in more detail:
Cache-Aside
Cache-aside caching strategy, also known as lazy loading, delegates the responsi‐
bility of managing the cache to the application code. When data is requested,
the application first checks the cache. If the data is found, it is returned from
the cache. If the data is not in the cache, the application retrieves it from the
data source, stores it in the cache, and then returns it to the caller. Cache-aside
caching offers flexibility, as the application has full control over caching decisions, but requires additional logic to manage the cache (see the sketch after this list).
Read-Through
Read-through caching strategy retrieves data from the cache if available; other‐
wise, it fetches the data from the underlying data source. When a cache miss
occurs for a read operation, the cache retrieves the data from the data source,
stores it in the cache for future use, and returns the data to the caller. Subsequent
read requests for the same data can be served directly from the cache, improving
the overall read performance. Unlike the cache-aside strategy, read-through offloads the responsibility of managing cache lookups from the application, providing a simplified data retrieval process.
Refresh-Ahead
Refresh-ahead caching strategy, also known as prefetching, proactively retrieves
data from the data source into the cache before it is explicitly requested. The
cache anticipates the future need for specific data items and fetches them in
advance. By prefetching data, the cache reduces latency for subsequent read
requests and improves the overall data retrieval performance.
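Here is a minimal sketch of the cache-aside strategy referenced above; the DictCache and FakeDatabase classes are stand-ins for a real cache client and data source.

```python
class DictCache:
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

class FakeDatabase:
    def fetch_product(self, product_id):
        print(f"DB query for {product_id}")   # stand-in for a real, slower lookup
        return {"id": product_id, "name": "widget"}

def get_product(product_id, cache, db):
    """Cache-aside (lazy loading): the application code manages the cache."""
    product = cache.get(product_id)
    if product is not None:
        return product                        # cache hit: served from memory
    product = db.fetch_product(product_id)    # cache miss: read from the data source
    cache.set(product_id, product)            # populate the cache for future reads
    return product

cache, db = DictCache(), FakeDatabase()
get_product("sku-1", cache, db)   # prints "DB query for sku-1"
get_product("sku-1", cache, db)   # served from the cache, no DB query
```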
The right-hand side of the diagram displays the write-intensive strategies, focusing on optimizing the storage and management of data that is frequently updated or written. Unlike read-intensive caching, where the focus is on optimizing data
retrieval, a write-intensive caching strategy aims to enhance the efficiency of data
updates and writes, while still maintaining acceptable performance levels. In a write-
intensive caching strategy, the cache is designed to handle frequent write operations,
ensuring that updated data is stored temporarily in the cache before being eventually
synchronized with the underlying data source, such as a database or a remote server.
This approach can help reduce the load on the primary data store and improve the
application’s responsiveness by acknowledging write operations more quickly.
Let’s take a look at those in more detail:
Write-Through
Write-through caching strategy involves writing data to both the cache and the
underlying data source simultaneously. When a write operation occurs, the data
is first written to the cache and then immediately propagated to the persistent
storage synchronously before the write operation is considered complete. This
strategy ensures that the data remains consistent between the cache and the data
source. However, it may introduce additional latency due to the synchronous write operations (see the sketch after this list).
Write-Around
Write-around caching strategy involves bypassing the cache for write operations.
When the application wants to update data, it writes directly to the underlying
data source, bypassing the cache. As a result, the written data does not reside in the cache, reducing cache pollution from infrequently accessed data. However, a subsequent read of recently written data will miss the cache and must be served from the slower data source.
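And here is a matching sketch of the write-through strategy referenced above, again with a made-up stand-in for the data store; the point is that the write returns only after both the cache and the store have been updated.

```python
class FakeStore:
    def save(self, key, value):
        print(f"persisted {key} to the data store")   # stand-in for a real database write

class WriteThroughCache:
    """Every write goes to the cache and, synchronously, to the underlying store."""

    def __init__(self, store):
        self.store = store
        self._data = {}

    def write(self, key, value):
        self._data[key] = value       # update the cache...
        self.store.save(key, value)   # ...and propagate synchronously before returning

    def read(self, key):
        return self._data.get(key)

cache = WriteThroughCache(FakeStore())
cache.write("user:42", {"name": "Ada"})   # completes only after the store acknowledges
print(cache.read("user:42"))
```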
Caching Deployment
When deploying a cache, various deployment options are available depending on
the specific requirements and architecture of the system. Here are three common
cache deployment approaches: in-process caching, inter-process caching, and remote
caching.
In-Process Caching
In in-process caching, the cache resides within the same process or application as
the requesting component. The cache is typically implemented as an in-memory
data store and is directly accessible by the application or service. In-process
caching provides fast data access and low latency since the cache is located within
the same process, enabling direct access to the cached data. This deployment
approach is suitable for scenarios where data sharing and caching requirements
are limited to a single application or process.
Inter-Process Caching
Inter-process caching involves deploying the cache as a separate process or ser‐
vice that runs alongside the applications or services. The cache acts as a dedicated
caching layer that can be accessed by multiple applications or processes. Appli‐
cations communicate with the cache using inter-process communication mecha‐
nisms such as shared memory, pipes, sockets, or remote procedure calls (RPC).
Inter-process caching allows multiple applications to share and access the cached
data, enabling better resource utilization and data consistency across different
components. It is well-suited for scenarios where data needs to be shared and
cached across multiple applications or processes within a single machine.
Remote Caching
Remote caching involves deploying the cache as a separate service or cluster
that runs on a different machine or location than the requesting components.
The cache service is accessed remotely over a network using protocols such as
HTTP, TCP/IP, or custom communication protocols. Remote caching enables
distributed caching and can be used to share and cache data across multiple
machines or even geographically distributed locations. It provides scalability,
fault-tolerance, and the ability to share cached data among different applications
or services running on separate machines. Remote caching is suitable for scenar‐
ios that require caching data across a distributed system or when the cache needs
to be accessed by components running on different machines.
The choice of cache deployment depends on factors such as the scale of the system,
performance requirements, data sharing needs, and architectural considerations. In-
process caching offers low latency and direct access to data within a single process,
inter-process caching enables sharing and caching data across multiple applications
or processes, and remote caching provides distributed caching capabilities across
multiple machines or locations. Understanding the specific requirements and charac‐
teristics of the system helps in selecting the appropriate cache deployment strategy
to optimize performance and resource utilization. Let’s cover different caching mech‐
anisms to improve application performance in the next section.
Caching Mechanisms
In this section, we will explore different caching mechanisms, including client-side
caching, CDN caching, web server caching, application caching, database caching,
query-level caching, and object-level caching.
Client-side Caching
Client-side caching involves storing cached data on the client device, typically in
the browser’s memory or local storage. This mechanism allows web applications
to store and retrieve static resources, such as HTML, CSS, JavaScript, and images,
directly from the client’s device. Client-side caching reduces the need to fetch
resources from the server on subsequent requests, leading to faster page load
times and improved user experience.
CDN Caching
Content Delivery Network (CDN) caching is a mechanism that involves cach‐
ing static content on distributed servers strategically located across different
cache invalidation, cache coherence, and cache management strategies to ensure the
consistency and integrity of the cached data.
Out of the above mechanisms, Content Delivery Networks (CDNs) play a crucial
role in improving the performance and availability of web content to end-users by
reducing latency and enhancing scalability by caching at edge locations. Let’s cover
CDNs in detail in the next section.
AWS offers Amazon CloudFront, a pull CDN built for high performance, security, and developer convenience, which we will cover in more detail in Chapter 9, AWS Network Services.
Before ending this section, it is also worth understanding that using a CDN comes with certain drawbacks: cost, stale content, and the need to change URLs. CDNs may involve significant costs depending on the amount of traffic; however, it’s important to weigh these costs against the expenses you would incur without a CDN. If updates are made before the TTL expires, there is a possibility of content being outdated until it is refreshed on the CDN. And CDNs require modifying URLs for static content to point to the CDN, which can be an additional task to manage.
Overall, CDNs offer benefits in terms of performance and scalability but require
careful consideration of these factors and the specific needs of your website. At the
end of this chapter, let’s dive deeper into two popular open-source caching solutions
to understand their architecture and how they implement the caching concepts
discussed in the chapter.
Memcached
Memcached is an open-source, high-performance caching solution widely used in
web applications. It operates as a distributed memory object caching system, storing
data in memory across multiple servers. Here are some key features and benefits of
Memcached:
Simple and Lightweight
Memcached is designed to be simple, lightweight, and easy to deploy. It focuses
solely on caching and provides a straightforward key-value interface for data
storage and retrieval.
Horizontal Scalability
Memcached follows a distributed architecture, allowing it to scale horizontally by
adding more servers to the cache cluster. This distributed approach ensures high
availability, fault tolerance, and improved performance for growing workloads.
Protocol Compatibility
Memcached adheres to a simple protocol that is compatible with various pro‐
gramming languages. This compatibility makes it easy to integrate Memcached
into applications developed in different languages.
Transparent Caching Layer
Memcached operates as a transparent caching layer, sitting between the applica‐
tion and the data source. It helps alleviate database or API load by caching
frequently accessed data, reducing the need for repetitive queries.
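As a brief usage sketch, assuming the third-party pymemcache client library and a Memcached server running locally on the default port 11211 (the keys and values are illustrative):

```python
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

# Simple key-value caching with a 60-second expiry.
client.set("user:42", "Ada", expire=60)
print(client.get("user:42"))   # b'Ada' (values come back as bytes by default)

# Cache-aside style fallback on a miss.
profile = client.get("user:99")
if profile is None:
    profile = b"fetched from the database"   # stand-in for a real lookup
    client.set("user:99", profile, expire=60)
```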
Let’s take a look at Memcached’s architecture.
Memcached Architecture
Memcached’s architecture consists of cache servers that store and retrieve cached data; each server operates independently, and clients decide which server holds a given key, typically by hashing the key. When a client sends a request to store or retrieve data, the chosen server handles the request and interacts with the underlying memory allocation strategy.
Memcached follows a multi-threaded architecture that enables it to efficiently handle concurrent requests and scale across multiple CPU cores. In this architecture, Memcached utilizes a pool of worker threads that can simultaneously process client requests. Each worker thread is responsible for handling a subset of incoming client connections.
Redis
Redis, short for Remote Dictionary Server, is a server-based in-memory data struc‐
ture store that can serve as a high-performance cache. Unlike traditional databases
that rely on iterating, sorting, and ordering rows, Redis organizes data in customizable data structures from the ground up, supporting a wide range of data types, including strings, bitmaps, bitfields, lists, sets, hashes, geospatial indexes, HyperLogLogs, and more, making it versatile for various caching use cases. Here are some key features and
benefits of Redis:
High Performance
Redis is designed for speed, leveraging an in-memory storage model that allows
for extremely fast data retrieval and updates. It can handle a massive number of
operations per second, making it suitable for high-demand applications.
Persistence Options
Redis provides persistence options that allow data to be stored on disk, ensuring durability even in the event of system restarts. This feature makes Redis suitable for workloads where cached data needs to survive a restart.
RDB Files (Redis Database Files). RDB is the default persistence model in Redis. It peri‐
odically creates snapshots of the dataset and saves them as binary RDB files. These
files capture the state of the Redis database at a specific point in time. Here are key
features and considerations of RDB persistence:
Snapshot-based Persistence
RDB persistence works by periodically taking snapshots of the entire dataset
and storing it in a file. The frequency of snapshots can be configured based on
requirements.
1. Forking: Redis uses the fork() system call to create a child process, which is an identical copy of the parent process. Forking is a lightweight operation as it creates a copy-on-write clone of the parent’s memory.
2. Copy-on-Write (COW): Initially, the child process shares the same memory pages with the parent process. However, when either the parent or child process modifies a memory page, COW comes into play. Instead of immediately duplicating the modified page, the operating system creates a new copy only when necessary.
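As a brief sketch of how this looks from a client, assuming the redis-py library and a local Redis server on the default port (the save policy shown is illustrative, not a recommendation):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Cache a value with a 60-second TTL, then read it back.
r.set("user:42", "Ada", ex=60)
print(r.get("user:42"))          # b'Ada'

# RDB snapshotting: "save after 300 seconds if at least 10 keys changed",
# then ask Redis to fork and write a snapshot in the background.
r.config_set("save", "300 10")
r.bgsave()
print(r.lastsave())              # time of the last successful RDB snapshot
```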
Conclusion
In concluding this chapter on caching, we have journeyed through a comprehensive
exploration of the fundamental concepts and strategies that empower efficient data
caching. We’ve covered cache eviction policies, cache invalidation mechanisms, and
a plethora of caching strategies, equipping you with the knowledge to optimize data
access and storage. We’ve delved into caching deployment, understanding how strate‐
gic placement can maximize impact, and explored the diverse caching mechanisms
available. Additionally, we’ve touched upon Content Delivery Networks (CDNs) and
open-source caching solutions including Redis and Memcached, that offer robust
options for enhancing performance. By incorporating Redis or Memcached into your
architecture, you can significantly improve application performance, reduce response
times, and enhance the overall user experience by leveraging the power of in-memory
caching.
As we move forward in our exploration of enhancing system performance, the next
chapter will embark on an exploration of scaling and load balancing strategies. Scal‐
ing is a pivotal aspect of modern computing, allowing systems to handle increased
loads gracefully. We will also delve into strategies for load balancing in distributing
incoming traffic efficiently. Together, these topics will empower you to design and
maintain high-performing systems that can handle the demands of today’s dynamic
digital landscape.
About the Authors
Jayanth Kumar is a Software Development Manager at Amazon, where he is
currently building large-scale software systems. Kumar is a millennial polymath,
published poet, certified AWS Solutions Architect Professional, entrepreneur, engineering leader, and an assistant professor. He earned his bachelor’s degree from IIT Bombay and his master’s degree from UCLA, where he studied Multi-Objective
Deep Learning. He formerly held software engineering positions at SAP Germany
and Silicon Valley. Later, as an entrepreneur, he held the positions of Head of Engi‐
neering at Goodhealth and Engineering Manager at Delhivery, an Indian unicorn
company. He is always seeking a challenge and new opportunities for learning, and
focuses on building robust mechanisms and systems that will stand the test of time.
Mandeep Singh is a Software Engineer at Jupiter. He fell in love with Cloud, AWS
Technologies and Distributed Systems in the role of Big Data Cloud Support Asso‐
ciate at AWS and as a Software Engineer at Amazon. He supports the learning and
development of others by sharing lessons on Cloud and Distributed Systems on his
YouTube channel. He enjoys morning runs, weight lifting, cooking and spending time
with his family in the serene ambience of his hometown.