Creating A Modern Analytics Architecture
Almost all organizations have built data warehouses for reporting and analytics.
They use data from a variety of sources, including their own
transaction-processing systems and other databases. Many have also built
Hadoop frameworks for analyzing what is commonly called “big data,” or data
that does not fit well in highly structured data warehouses. Building and running
both a data warehouse and a big data framework has been complicated and expensive.
When data volumes grow or you want to make analytics and reports available to
more users, you have to choose between accepting slow query performance and
investing time and effort in an expensive upgrade process. In fact, some IT teams
discourage augmenting data or adding queries in order to protect existing
service-level agreements (SLAs). To mitigate this, organizations often set up
multiple data marts, each containing a copy of a subset of the data in the data
warehouse, so that specialized and long-running queries don’t impact the
performance and SLAs of mission-critical business operations and decision-making.
This complicates the data and analytics infrastructure and further locks
organizations in to the vendors chosen for their data warehouse and data marts.
Before data can be analyzed, it needs to be collected, processed, and stored. You
can think of this as an analytics pipeline that extracts data from source systems,
processes the data, and then loads it into data stores where it can be analyzed.
Analytics pipelines are designed to handle large volumes of incoming data from
heterogeneous sources such as databases, applications, and devices.
1. Collect data
2. Process data
3. Store data
4. Analyze and visualize data
5. Predict future outcomes
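The first four stages can be sketched as a small pipeline. This is a minimal illustration in Python, not a reference implementation: the sample sources, the record fields, and the in-memory list standing in for a data store are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical sample inputs standing in for heterogeneous sources.
DATABASE_ROWS = [{"order_id": 1, "amount": "19.99"}, {"order_id": 2, "amount": "5.00"}]
DEVICE_MESSAGES = [{"sensor": "t-1", "reading": "21.5"}]


def collect() -> Iterator[dict]:
    """Stage 1: pull records from every source and tag each with its origin."""
    for row in DATABASE_ROWS:
        yield {"source": "database", **row}
    for message in DEVICE_MESSAGES:
        yield {"source": "device", **message}


def process(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 2: convert numeric strings to floats so they can be analyzed."""
    for record in records:
        cleaned = dict(record)
        for field in ("amount", "reading"):
            if field in cleaned:
                cleaned[field] = float(cleaned[field])
        yield cleaned


def store(records: Iterable[dict], data_store: list) -> None:
    """Stage 3: load processed records into the (in-memory) data store."""
    data_store.extend(records)


def analyze(data_store: list) -> dict:
    """Stage 4: a trivial analysis, counting stored records per source."""
    return dict(Counter(record["source"] for record in data_store))


data_store: list = []
store(process(collect()), data_store)
summary = analyze(data_store)
```

Because each stage only consumes the output of the previous one, a stage can be swapped out (for example, replacing the in-memory list with a real data store) without changing the others.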
Collect data
Consider the different types of data—transactional data, log data, streaming
data, and Internet of Things (IoT) data. Each type may be stored in data
stores best suited for the data and its use. Some data stores are optimized for
transactional or relational data and others for nonrelational or unstructured
data. Your strategy should be to use a purpose-built database that best fits the
data and the applications that produce or consume the data.
• IoT Data: Devices and sensors around the world send messages continuously.
Organizations see a growing need today to capture this data and derive
intelligence from it.
Process data
The collection process gathers or extracts data from data sources, transforms the
data, and stores the data in a separate destination such as another database, a
data lake, or an analytics service like a data warehouse where it can be processed
or analyzed.
Batch data loading has been, and still is, pervasive. Nightly batch jobs extract data
from one system, transform it into a ready-to-consume format for analytics, and
load it into a destination. This introduces delays before data is available to those
who need it.
• Extract, Load, Transform (ELT): ELT is a variant of extract, transform, load
(ETL) in which the extracted data is loaded into the target system before any
transformations are made. The schema is defined when the data is read or used
(schema-on-read). ELT typically works well when your target system is powerful
enough to handle transformations and when you want to explore the data in ways
not consistent with a predefined schema.
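Schema-on-read can be illustrated with a short Python sketch. It assumes a hypothetical target store that keeps raw JSON text exactly as extracted; the event fields and the `read_with_schema` helper are invented for illustration and do not refer to any particular product.

```python
import json

# Hypothetical raw events, loaded into the target exactly as extracted.
raw_events = [
    '{"user": "ana", "amount": "12.50", "country": "BR"}',
    '{"user": "li", "amount": "7.25"}',
]

# Load step: the target keeps the raw text; no schema is enforced on write.
target_store = list(raw_events)


def read_with_schema(store, schema):
    """Apply a schema at read time: pick fields and cast each to its type.

    Fields missing from a raw record come back as None.
    """
    for line in store:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


# Two different reads of the same raw data, each with its own schema.
revenue_view = list(read_with_schema(target_store, {"user": str, "amount": float}))
geo_view = list(read_with_schema(target_store, {"user": str, "country": str}))
```

The same raw records serve both views because nothing was discarded or reshaped at load time, which is the property that makes exploratory analysis easier under ELT.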
Store data
You can store your data in either a data lake or an analytics tool like a data
warehouse.
A data lake is a centralized repository for all data, both structured and
unstructured. In a data lake, the schema is not defined in advance, which enables
additional types of analytics such as big data analytics, full-text search,
real-time analytics, and machine learning. Increasingly, organizations are using
data lakes as a central repository for all data so it can be used by downstream
applications and analytics tools.
A data warehouse uses a predefined schema optimized for analytics. Its data is
highly curated and serves as a single source of truth drawn from multiple
data sources.
• Data Lake: Data lakes can handle the scale, agility, and flexibility required
to combine different types of data and analytics approaches to gain deeper
insights in ways that traditional data silos and data warehouses cannot.
They give organizations the flexibility to use the widest array of analytics
and machine learning services, with easy access to all relevant data, without
compromising on security or governance.
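The contrast between the two kinds of store can be sketched in Python. This is only an illustration under stated assumptions: a plain dictionary stands in for a data lake, an in-memory SQLite table stands in for a data warehouse, and the object keys and the `orders` table are hypothetical.

```python
import sqlite3

# Data lake stand-in: any object is accepted, keyed by a path-like name.
data_lake = {}
data_lake["raw/orders/2019-01-01.json"] = b'{"order_id": 1, "amount": 19.99}'
data_lake["raw/logs/app.log"] = b"GET /checkout 200"

# Data warehouse stand-in: an in-memory SQLite table with a fixed schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)
warehouse.execute("INSERT INTO orders VALUES (?, ?)", (1, 19.99))

# A row that violates the schema is rejected at write time.
try:
    warehouse.execute("INSERT INTO orders VALUES (?, ?)", (2, None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Curated, typed data can be queried directly.
total = warehouse.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

The lake accepts both a JSON document and a log line without complaint, while the warehouse refuses the row with a missing amount. That trade-off between flexibility and curation is why the two are often used together.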
Analyze data
Other big data analytics tools should be able to access the same data in the data
lake. This allows everyone across the organization, from business users to data
scientists, to have confidence in both the data and their analytics results.
• Big Data Analytics: Big data processing uses the Hadoop and Spark
frameworks to process vast amounts of data.
ABOUT AWS
For 13 years, Amazon Web Services has been the world’s most comprehensive and broadly
adopted cloud platform. AWS offers over 165 fully featured services for compute, storage,
databases, networking, analytics, robotics, machine learning and artificial intelligence (AI),
Internet of Things (IoT), mobile, security, hybrid, virtual and augmented reality (VR and AR),
media, and application development, deployment, and management from 61 Availability
Zones (AZs) within 20 geographic regions, spanning the U.S., Australia, Brazil, Canada, China,
France, Germany, India, Ireland, Japan, Korea, Singapore, Sweden, and the U.K. Millions of
customers—including the fastest-growing startups, largest enterprises, and leading government
agencies—trust AWS to power their infrastructure, become more agile, and lower costs.
To learn more about AWS, visit https://aws.amazon.com.
©2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.