Data Architecture


ISAS

Big Data Architecture for Business

Name     : 1. Adrian Hartanto
           2. Latif Arif Putranto
Faculty  : Fachran Nazarullah, S.Kom
Semester : 3
Quarter  : 1
Class    : 3SC8
 

 
Continuing Education Center for Computing and Information Technology
Faculty of Engineering, University of Indonesia

2019
PREFACE

Praise be to Allah, the Almighty and Most Merciful. Thanks to His grace and guidance,
the authors were able to prepare and present this paper on big data architecture. The
authors also thank Mr. Fachran Nazarullah, lecturer of the Introductory IT course, for his
guidance during the preparation of this paper, as well as the various parties who gave
encouragement and motivation.

The authors realize that this paper is still far from perfect. Constructive criticism
and suggestions are therefore welcome, both to improve this paper and to serve as a
reference when preparing later papers or assignments.

The authors also apologize for any typos or errors in this paper that make their
intent harder to understand.

Depok, September 2019

Authors

TABLE OF CONTENTS

PREFACE
TABLE OF CONTENTS
TABLE OF FIGURES
CHAPTER I INTRODUCTION
CHAPTER II BASIC THEORY
    II.3 Characteristics of Big Data
CHAPTER III PROBLEM ANALYSIS
    III.1 Definition of Big Data
    III.4 Challenges of Big Data Architecture
    III.6 Disadvantages of Big Data
        1. Incompatible Tools
        3. Chances of Failure
        4. Correlation Errors
        5. Security and Privacy Concerns
CHAPTER IV CONCLUSION
    IV.1 Conclusion
    IV.2 Suggestion
BIBLIOGRAPHY

TABLE OF FIGURES

Figure 3.1 Components of a big data architecture
Figure 3.2 Big Data overview
Figure 3.3 Big Data in Government


CHAPTER I

INTRODUCTION

I.1.    Background

Architecture is the set of design artifacts, or descriptive representations, that are
relevant for describing an object so that it can be produced to requirements and
maintained over its useful life.

In information technology, data architecture is composed of the models, policies, rules,
and standards that govern which data is collected and how it is stored, arranged,
integrated, and put to use in data systems and in organizations.

I.2.    Writing Objective
This paper, entitled "Big Data Architecture for Business", has the following objectives:
●  Understand the background of data architecture
●  Understand the components of a big data architecture

I.3.    Problem Domain

This paper discusses big data architecture for business, including the functions of a
big data architecture.

I.4.    Writing Methodology

This paper was written through a literature review of various sources, in the form of
material from the internet.

I.5.    Writing Framework

This paper is organized as follows.

CHAPTER I INTRODUCTION
I.1 Background
I.2 Writing Objective
I.3 Problem Domain
I.4 Writing Methodology
I.5 Writing Framework

CHAPTER II BASIC THEORY
II.1 Information Technology Architecture
II.2 Types of Big Data
II.3 Characteristics of Big Data
II.4 What Does Big Data Architecture Look Like?

CHAPTER III PROBLEM ANALYSIS
III.1 Definition of Big Data
III.2 Components of Big Data Architecture
III.3 Applications of Big Data
III.4 Challenges of Big Data Architecture
III.5 Advantages and Disadvantages

CHAPTER IV CONCLUSION AND SUGGESTION
IV.1 Conclusion
IV.2 Suggestion

BIBLIOGRAPHY


CHAPTER II

BASIC THEORY

II.1 Information Technology Architecture


An information technology architecture is a detailed description of the various
information-processing assets needed to meet business objectives, the rules that govern
them, and the information associated with them. It focuses on three basic tiers within the
organization, illustrated here with the example of a newspaper business:

●  Server - generally hardware, this level provides the basic computing power for the
entire organization and is typically centrally located. This is the equipment in the
computer room of the newspaper business.
●  Middleware - generally software, this level sits on top of the server level and provides
the infrastructure necessary to keep the hardware running and the information
flowing. These are the tools and utilities used by the information technology staff of
the newspaper business.
●  Client - a combination of hardware and software, this level provides the capabilities
accessible to users and allows them to access the information a business has
available. These are the things the reporters use in the newspaper business (personal
computers, printers, applications, etc.).

In addition, several documents are created that describe how the levels are organized
and administered:

●  Products - a list of hardware and software used by the architecture.
●  Standards & Guidelines - the set of rules for implementation and use of the various
assets and the level at which each is provided within the architecture.
●  Services - a list of functions and features the architecture will provide.
●  Principles - a set of guiding ideas that form the basis of the architecture.
●  Policies - a set of rules that enforce the principles of the architecture. [1]

II.2 Types of Big Data

●  Structured
Structured data is data that can be processed, stored, and retrieved in a fixed format.
It is highly organized information that can be readily stored in and accessed from a
database using simple search algorithms. For instance, the employee table in a company
database is structured: the employee details, job positions, salaries, and so on are held
in an organized manner.

●  Unstructured
Unstructured data lacks any specific form or structure, which makes it difficult and
time-consuming to process and analyze. The free-text body of an email is an example of
unstructured data.

●  Semi-structured
Semi-structured data combines both of the forms above. It has not been organized into a
particular repository such as a database, yet it contains vital information or tags that
separate individual elements within the data.
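
The three types can be illustrated with a short Python sketch. The sample records, field names, and email text below are invented for illustration and are not taken from any real system.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like an employee table).
structured_csv = "id,name,position,salary\n1,Ana,Editor,5000\n2,Budi,Reporter,4200\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: no fixed table schema, but keys (tags) mark individual elements.
semi_structured = '{"id": 3, "name": "Citra", "skills": ["python", "sql"]}'
record = json.loads(semi_structured)

# Unstructured: free text; any structure has to be inferred by the program.
unstructured_email = "Hi team, the quarterly report is attached. Regards, Dewi"
word_count = len(unstructured_email.split())

print(rows[0]["position"])  # direct access by column name
print(record["skills"])     # access by tag; fields may differ between records
print(word_count)           # without parsing, only generic measures are available
```

The structured and semi-structured samples can be queried by field name straight away, while the email text first needs some form of text analysis before it yields comparable fields.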

II.3 Characteristics of Big Data

Gartner analyst Doug Laney listed the three Vs of Big Data: Variety, Velocity, and
Volume. These characteristics are enough on their own to convey what big data is. Let's
look at them in depth:

1) Variety
Variety refers to the structured, unstructured, and semi-structured data that is
gathered from multiple sources. In the past, data was collected mainly from
spreadsheets and databases; today it arrives in many forms, such as emails, PDFs,
photos, videos, audio, social media posts, and much more.

2) Velocity
Velocity refers to the speed at which data is created in real time. More broadly, it
covers the rate of change, the linking of incoming data sets that arrive at varying
speeds, and bursts of activity.

3) Volume
Big Data implies huge volumes of data generated daily from sources such as social media
platforms, business processes, machines, networks, and human interactions. Such large
amounts of data are stored in data warehouses. [2]

II.4 What Does Big Data Architecture Look Like?


Big data architecture varies with a company's infrastructure and needs, but it usually
contains the following components:

●  Data sources. Every big data architecture starts with your sources. These can include
data from databases, data from real-time sources (such as IoT devices), and static files
generated by applications, such as Windows logs.
●  Real-time message ingestion. If there are real-time sources, you'll need to build a
mechanism into your architecture to ingest that data.
●  Data store. You'll need storage for the data that the architecture will process.
Often, data is stored in a data lake, a large store that holds data in its raw, often
unstructured, form and scales easily.
●  A combination of batch processing and real-time processing. You will need to
handle both real-time data and static data, so a combination of batch and real-time
processing should be built into your big data architecture. Large volumes of data can be
handled efficiently with batch processing, while real-time data needs to be processed
immediately to bring value. Batch processing involves long-running jobs that filter,
aggregate, and prepare the data for analysis.
●  Analytical data store. After you prepare the data for analysis, you need to bring it
together in one place so you can analyze the entire data set. The analytical data store
matters because it keeps all your data in one place, so your analysis can be
comprehensive, and because it is optimized for analysis rather than transactions. It
might take the form of a cloud-based data warehouse or a relational database, depending
on your needs.
●  Analysis or reporting tools. After ingesting and processing the various data sources,
you'll need a tool to analyze the data. Frequently this is a BI (Business Intelligence)
tool, and a data scientist may be needed to explore the data.
●  Automation. Moving the data through these systems requires orchestration, usually in
some form of automation. Ingesting and transforming the data, moving it in batches and
stream processes, loading it into an analytical data store, and finally deriving
insights must form a repeatable workflow so that you can continually gain insights from
your big data. [3]
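
The difference between batch and real-time processing described above can be sketched in a few lines of Python. This is a minimal illustration only: the event data and the names `batch_totals` and `StreamTotals` are hypothetical, and a real architecture would use dedicated batch and stream processing engines.

```python
from collections import defaultdict

# Hypothetical click events: (page, seconds_on_page).
events = [("home", 12), ("news", 40), ("home", 8), ("sports", 25), ("news", 20)]


def batch_totals(all_events):
    """Batch style: process the complete, static data set in one pass."""
    totals = defaultdict(int)
    for page, seconds in all_events:
        totals[page] += seconds
    return dict(totals)


class StreamTotals:
    """Real-time style: update the result as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_event(self, page, seconds):
        self.totals[page] += seconds
        return self.totals[page]  # an up-to-date value is available right away


batch_result = batch_totals(events)

stream = StreamTotals()
for page, seconds in events:
    stream.on_event(page, seconds)

print(batch_result)
print(dict(stream.totals))
```

Both approaches reach the same totals; the difference is when the answer becomes available. The batch function needs the whole data set first, while the streaming object has a current value after every event.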


CHAPTER III

PROBLEM ANALYSIS

III.1 Definition of Big Data

Big data is a term that describes a large volume of structured, semi-structured and
unstructured data that has the potential to be mined for information and used in machine
learning projects and other advanced analytics applications. Big data is often characterized by
the 3Vs: the extreme volume of data, the wide variety of data types and the velocity at which
the data must be processed. Several other Vs have also been added to descriptions of big data,
including veracity, value and variability. Although big data doesn't equate to any specific
volume of data, the term is often used to describe terabytes, petabytes and even exabytes of
data captured over time.

III.2 Components of a big data architecture

Figure 3.1 Components of a big data architecture

Most big data architectures include some or all of the following components:

 Data sources. All big data solutions start with one or more data sources. Examples
include:
o Application data stores, such as relational databases.
o Static files produced by applications, such as web server log files.
o Real-time data sources, such as IoT devices.
 Data storage. Data for batch processing operations is typically stored in a distributed
file store that can hold high volumes of large files in various formats. This kind of store
is often called a data lake. Options for implementing this storage include Azure Data
Lake Store or blob containers in Azure Storage.
 Batch processing. Because the data sets are so large, often a big data solution must
process data files using long-running batch jobs to filter, aggregate, and otherwise
prepare the data for analysis. Usually these jobs involve reading source files, processing
them, and writing the output to new files. Options include running U-SQL jobs in
Azure Data Lake Analytics, using Hive, Pig, or custom Map/Reduce jobs in an
HDInsight Hadoop cluster, or using Java, Scala, or Python programs in an HDInsight
Spark cluster.
 Real-time message ingestion. If the solution includes real-time sources, the
architecture must include a way to capture and store real-time messages for stream
processing. This might be a simple data store, where incoming messages are dropped
into a folder for processing. However, many solutions need a message ingestion store to
act as a buffer for messages, and to support scale-out processing, reliable delivery, and
other message queuing semantics. This portion of a streaming architecture is often
referred to as stream buffering. Options include Azure Event Hubs, Azure IoT Hub, and
Kafka.
 Stream processing. After capturing real-time messages, the solution must process
them by filtering, aggregating, and otherwise preparing the data for analysis. The
processed stream data is then written to an output sink. Azure Stream Analytics
provides a managed stream processing service based on perpetually running SQL
queries that operate on unbounded streams. You can also use open source Apache
streaming technologies like Storm and Spark Streaming in an HDInsight cluster.
 Analytical data store. Many big data solutions prepare data for analysis and then
serve the processed data in a structured format that can be queried using analytical
tools. The analytical data store used to serve these queries can be a Kimball-style
relational data warehouse, as seen in most traditional business intelligence (BI)
solutions. Alternatively, the data could be presented through a low-latency NoSQL
technology such as HBase, or an interactive Hive database that provides a metadata
abstraction over data files in the distributed data store. Azure SQL Data Warehouse
provides a managed service for large-scale, cloud-based data warehousing. HDInsight
supports Interactive Hive, HBase, and Spark SQL, which can also be used to serve data
for analysis.
 Analysis and reporting. The goal of most big data solutions is to provide insights
into the data through analysis and reporting. To empower users to analyze the data, the
architecture may include a data modeling layer, such as a multidimensional OLAP cube
or tabular data model in Azure Analysis Services. It might also support self-service BI,
using the modeling and visualization technologies in Microsoft Power BI or Microsoft
Excel. Analysis and reporting can also take the form of interactive data exploration by
data scientists or data analysts. For these scenarios, many Azure services support
analytical notebooks, such as Jupyter, enabling these users to leverage their existing
skills with Python or R. For large-scale data exploration, you can use Microsoft R
Server, either standalone or with Spark.
 Orchestration. Most big data solutions consist of repeated data processing
operations, encapsulated in workflows, that transform source data, move data between
multiple sources and sinks, load the processed data into an analytical data store, or push
the results straight to a report or dashboard. To automate these workflows, you can use
an orchestration technology such as Azure Data Factory, or Apache Oozie and Sqoop. [4]
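
The way these components fit together can be sketched as a small, repeatable workflow. The Python example below is an assumption-laden miniature, not the Azure or Hadoop tooling named above: a list of log lines stands in for files in a data lake, an in-memory SQLite database stands in for the analytical data store, and the function names and log format are hypothetical.

```python
import sqlite3

# Hypothetical raw log lines standing in for files in a data lake.
RAW_LOG_LINES = [
    "2019-09-01,home,200",
    "2019-09-01,news,200",
    "2019-09-01,news,500",
    "2019-09-02,home,200",
    "malformed line",
]


def batch_process(lines):
    """Batch step: filter out bad records and aggregate hits per (day, page)."""
    counts = {}
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3 or parts[2] != "200":
            continue  # filter: skip malformed lines and failed requests
        key = (parts[0], parts[1])
        counts[key] = counts.get(key, 0) + 1
    return counts


def load_to_store(conn, counts):
    """Load step: write the prepared data into a structured, queryable table."""
    conn.execute("CREATE TABLE page_hits (day TEXT, page TEXT, hits INTEGER)")
    conn.executemany(
        "INSERT INTO page_hits VALUES (?, ?, ?)",
        [(day, page, hits) for (day, page), hits in counts.items()],
    )
    conn.commit()


def report(conn):
    """Reporting step: query the analytical store."""
    cursor = conn.execute(
        "SELECT page, SUM(hits) FROM page_hits GROUP BY page ORDER BY page"
    )
    return cursor.fetchall()


def run_workflow():
    """Orchestration: run the steps in a fixed, repeatable order."""
    conn = sqlite3.connect(":memory:")
    try:
        load_to_store(conn, batch_process(RAW_LOG_LINES))
        return report(conn)
    finally:
        conn.close()


print(run_workflow())
```

Keeping each stage in its own function mirrors the separation between batch processing, the analytical data store, and reporting, and `run_workflow` plays the role that an orchestration tool would play at scale.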

III.3 Applications of Big Data

Having a bird’s eye view of big data and its application in different industries will
help you better appreciate what your role is or what it is likely to be in the future, in your
industry or across different industries.

Figure 3.2 Big Data overview

3.1 Banking and Securities

The U.S. Securities and Exchange Commission (SEC) uses big data to monitor financial
market activity, applying network analytics and natural language processing to catch
illegal trading activity in the financial markets. Retail traders, big banks, hedge
funds and other so-called 'big boys' in the financial markets use big data for trade
analytics in high-frequency trading, pre-trade decision-support analytics, sentiment
measurement, predictive analytics, and so on.

This industry also relies heavily on big data for risk analytics, including anti-money
laundering, demand enterprise risk management, "Know Your Customer", and fraud
mitigation. Big Data providers specific to this industry include 1010data, Panopticon
Software, Streambase Systems, Nice Actimize and Quartet FS.

3.2 Transportation

In recent times, huge amounts of data from location-based social networks and high-speed
data from telecoms have affected travel behavior. Regrettably, research to understand
travel behavior has not progressed as quickly. In most places, transport demand models
are still based on poorly understood new social media structures.

Applications of big data in the transportation industry

Some applications of big data by governments, private organizations and individuals include:

●  Government use of big data: traffic control, route planning, intelligent transport
systems, and congestion management (by predicting traffic conditions).

●  Private sector use of big data in transport: revenue management, technological
enhancements, logistics, and competitive advantage (by consolidating shipments and
optimizing freight movement).

●  Individual use of big data: route planning to save fuel and time, travel
arrangements in tourism, and so on.

Figure 3.3 Big Data in Government

Big Data providers in this industry include Qualcomm and Manhattan Associates.

3.3 Education

From a technical point of view, a major challenge in the education industry is to
incorporate big data from different sources and vendors and to use it on platforms that
were not designed for such varied data or to work with one another. From a practical
point of view, staff and institutions have to learn new data management and analysis
tools.

Politically, privacy and the protection of personal data are a challenge when big data
is used for educational purposes.

Applications of big data in Education

Big data is used quite significantly in higher education. For example, the University
of Tasmania, an Australian university with over 26,000 students, has deployed a Learning
and Management System that tracks, among other things, when a student logs onto the
system, how much time is spent on different pages in the system, and the overall
progress of a student over time.

In a different use case, big data is used to measure teachers' effectiveness to ensure
a good experience for both students and teachers. A teacher's performance can be
fine-tuned and measured against student numbers, subject matter, student demographics,
student aspirations, behavioral classification and several other variables.

At the governmental level, the Office of Educational Technology in the U.S. Department
of Education is using big data to develop analytics that help course-correct students
who are going astray while using online big data courses. Click patterns are also being
used to detect boredom. Big Data providers in this industry include Knewton, Carnegie
Learning and MyFit/Naviance.

III.4 Challenges of Big Data Architecture

When done right, a big data architecture can save your company money and help
predict important trends, but it is not without its challenges. Be aware of the following issues
when working with big data.

Data Quality
Anytime you are working with diverse data sources, data quality is a challenge. This
means that you'll need to do work to ensure that the data formats match and that you don't
have duplicate data or are missing data that would make your analysis unreliable. You'll need
to analyze and prepare your data before you can bring it together with other data for analysis.
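
A minimal Python sketch of this kind of preparation is shown below. The records, the two date formats, and the function names are assumptions made for illustration; real pipelines usually face many more formats and rules.

```python
from datetime import datetime

# Hypothetical records from two sources that use different date formats.
records = [
    {"id": "1", "date": "2019-09-12", "amount": "100"},
    {"id": "1", "date": "12/09/2019", "amount": "100"},  # duplicate, other format
    {"id": "2", "date": "2019-09-13", "amount": ""},     # missing amount
    {"id": "3", "date": "13/09/2019", "amount": "250"},
]


def normalize_date(text):
    """Convert either supported format to ISO 'YYYY-MM-DD'."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean(raw_records):
    """Return (clean_rows, rejected) after matching formats and deduplicating."""
    seen, clean_rows, rejected = set(), [], []
    for rec in raw_records:
        if not rec["amount"]:
            rejected.append(rec)  # missing data would make the analysis unreliable
            continue
        row = (rec["id"], normalize_date(rec["date"]), int(rec["amount"]))
        if row in seen:
            continue  # duplicates only become visible once formats match
        seen.add(row)
        clean_rows.append(row)
    return clean_rows, rejected


clean_rows, rejected = clean(records)
print(clean_rows)
print(rejected)
```

Note that the duplicate is only detected after both dates have been brought to the same format, which is why format matching comes before deduplication.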

Scaling
The value of big data is in its volume. However, this can also become a significant
issue. If you have not designed your architecture to scale up, you can quickly run into
problems. First, the costs of supporting the infrastructure can mount if you don't plan for
them. This can be a burden on your budget. And second, if you don't plan for scaling, your
performance can degrade significantly. Both issues should be addressed in the planning
phases of building your big data architecture.

Security
While big data can give you great insights into your data, it's challenging to protect
that data. Fraudsters and hackers can be very interested in your data, and they may try to
either add their own fake data or skim your data for sensitive information. A cybercriminal
can fabricate data and introduce it to your data lake. For example, suppose you track website
clicks to discover anomalous patterns in traffic and find criminal activity on your site. A
cybercriminal can penetrate your system, adding noise to the data so that it is impossible to
find the criminal activity. Conversely, there is a huge volume of sensitive information to be
found in your big data, and a cybercriminal could mine your data for that information if you
don't secure the perimeters, encrypt your data, and work to anonymize the data to remove
sensitive information.[5]
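
One common way to reduce the sensitive information held in a data set is to replace identifying fields with keyed hashes before the data is stored. The Python sketch below shows the idea under stated assumptions: the field names and the key are placeholders, and strictly speaking this is pseudonymization, which is weaker than full anonymization because anyone holding the key can re-link identical values.

```python
import hashlib
import hmac

# Assumption: a secret key kept outside the data lake; this value is a placeholder.
SECRET_KEY = b"replace-with-a-real-secret"
SENSITIVE_FIELDS = {"email", "name"}  # hypothetical field names


def pseudonymize(value):
    """Replace a sensitive value with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(record):
    """Return a copy of the record with sensitive fields pseudonymized."""
    return {
        key: pseudonymize(val) if key in SENSITIVE_FIELDS else val
        for key, val in record.items()
    }


click = {"email": "user@example.com", "name": "Example User", "page": "/pricing"}
safe_click = scrub(click)
print(safe_click)
```

Because the same input always maps to the same digest, click patterns per user can still be analyzed, while the stored record no longer contains the email address or name in readable form.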

III.5 Advantages and Disadvantages

Advantages of Big Data:
1. Cost Cutting
Big Data provides business intelligence that can improve the efficiency of operations and cut
down on costs. Big Data technologies such as Hadoop and other cloud-based analytics help
significantly reduce costs when storing massive amounts of data. They can also find far more
efficient ways of doing business.

Although the initial implementation may seem expensive, it saves money in the long run.
Shorter waiting times for reports also reduce the load on the organization's IT
landscape, so resources previously set aside to respond to report requests are freed
up.

2. Better Decision Making


Businesses can now analyse information almost instantly thanks to the quick
processing of Hadoop and in-memory analytics, together with the ability to analyse new
sources of information. Based on what they learn, companies are able to make faster and
better decisions.

Big Data tools can analyse data from the past and use it to make predictions about the
future. This helps businesses make better decisions in the present and prepare for the
future. Insights into customer movements, promotions and competitive offerings give
useful information about customer trends. With real-time analytics, decisions can be
made more quickly and be better suited to present customers.

3. New Products and Services


Companies that use Big Data tools can understand customer patterns and see what works
and what does not. This lets them estimate customer satisfaction and needs, and so come
up with products and services that customers actually want.

With Big Data analytics, far more companies are now able to create new products and
services to meet the needs of their customers. Companies can analyse past data about
customer feedback and product launches, which helps them come up with better products.
Additionally, real-time market analysis supports customer-oriented marketing by
allowing businesses to understand changes in consumer behavior and shifts in supply and
demand of products. Understanding consumer needs, buying behaviors and preferences can
help with the increasing demand for personalized services.

4. Fraud Detection
Big Data tools can help automatically detect fraud and attempts to break into your
organisation's systems, and a real-time safeguard system can notify you instantly. Once
a fraudulent attempt is detected, you can immediately take appropriate action. You can
also map the entire data landscape across your organization using Big Data tools.

This lets you analyse internal threats and use the results to keep sensitive
information safe, stored in line with regulatory requirements, and appropriately
protected. Because of this, many industries have begun to use Big Data for data safety
and protection, especially organizations and companies that deal with financial
information.
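
As a rough illustration of automated detection, the Python sketch below flags transactions that lie far from the average. This is a simple z-score rule chosen for the example, not the method of any particular Big Data product; the transaction amounts and the threshold of 2.0 are invented.

```python
from statistics import mean, pstdev


def flag_outliers(amounts, threshold=2.0):
    """Return indices of values whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold]


# Hypothetical transaction amounts; the last one is unusually large.
transactions = [20, 25, 22, 19, 24, 21, 23, 20, 22, 500]
suspicious = flag_outliers(transactions)
print(suspicious)
```

A flagged index is only a candidate for review. Production systems typically combine many such signals and still rely on human investigation before acting.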

5. Control Online Reputation


Big Data tools can help understand the company’s reputation through sentiment
analysis. This gives you feedback about what people say about your company, which will
allow you to improve your company’s online presence and reputation.
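
A very small sentiment scorer can make the idea concrete. The sketch below counts positive and negative words from a toy word list; the word lists and sample posts are invented, and real sentiment analysis uses far larger vocabularies or trained models.

```python
# Toy lexicon, assumed for illustration only.
POSITIVE = {"great", "good", "love", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "hate", "rude"}


def sentiment_score(text):
    """Positive word count minus negative word count for one piece of text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Hypothetical customer posts about a company.
posts = [
    "Great service, fast delivery!",
    "The app is slow and the checkout is broken.",
    "Support was helpful.",
]
scores = [sentiment_score(p) for p in posts]
print(scores)
```

Averaging such scores over time gives a crude picture of whether opinion about the company is improving or getting worse.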

III.6 Disadvantages of Big Data:

1. Incompatible tools

Hadoop is one of the most commonly used tools for Big Data analytics. However,
standard Hadoop MapReduce is batch-oriented and is not designed for real-time data
analysis. This means that other tools, such as the stream processing technologies
described in III.2, must be used alongside it for real-time workloads.

2. New approach

Most organizations are used to working in a manner where insights and updates are
received approximately once a week. With Big Data bringing in insights every second, the
organisation will need a different approach and work method to handle this influx of
information at a much faster rate than it is used to.

Insights need action, and with Big Data this action is required in real time. This
will drastically affect work culture, a change that the company may or may not be
immediately ready for. It can be a great challenge for some organisations and may lead
to a restructuring of plans and decisions.

3. Chances of Failure

Many organizations see other companies using Big Data, and see its benefits touted all
over the internet as the best tool to grow one's business. This may lead them to make
hasty decisions and try to implement it immediately without understanding how to use it
or whether it suits their business.

If Big Data is not implemented appropriately, it can cause more harm than good.
Companies that are not used to handling data at such a rapid rate may produce
inaccurate analyses, which could lead to bigger problems for the organization.

4. Correlation Errors

A common technique used to analyse Big Data is to draw correlations by linking one
variable to another to form a pattern. However, these correlations may not always stand
for anything substantial or meaningful. Just because two variables are linked or
correlated does not mean that a causal relationship exists between them. In short,
correlation does not imply causation. A thorough analysis with the help of a data expert
will help you understand which of these correlations mean anything to your business and
which do not.
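
The point can be demonstrated with a short calculation. The Python sketch below computes the Pearson correlation coefficient for two invented series that both simply grow over time; the figures and series names are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented yearly figures: both grow over time and are otherwise unrelated.
ice_cream_sales = [100, 120, 140, 165, 180, 210]
website_signups = [30, 33, 39, 44, 50, 55]

r = pearson(ice_cream_sales, website_signups)
print(round(r, 3))
```

The coefficient comes out close to 1 even though neither series influences the other; the shared upward trend alone produces the strong correlation.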

5. Security and Privacy Concerns

This may seem ironic, since safety and security were listed above as an advantage of
Big Data. However, although Big Data analytics lets you detect fraudulent attempts, the
framework itself is prone to data breaches, as is the case with many technological
undertakings.

Information that you provide to a third party may be leaked to competitors and
customers. There are also privacy concerns, as many customers are not comfortable with
the idea that Big Data makes it possible to collect detailed information about their
identities. [6]


CHAPTER IV

CONCLUSION

IV.1. Conclusion

The availability of Big Data, low-cost commodity hardware, and new information
management and analytic software have produced a unique moment in the history of data
analysis. The convergence of these trends means that we have the capabilities required to
analyze astonishing data sets quickly and cost-effectively for the first time in history. These
capabilities are neither theoretical nor trivial. They represent a genuine leap forward and a
clear opportunity to realize enormous gains in terms of efficiency, productivity, revenue, and
profitability. The Age of Big Data is here, and these are truly revolutionary times if both
business and technology professionals continue to work together and deliver on the promise.

IV.2. Suggestion

Studying big data is important because the age of big data is here. It will become
even more important in the future, because it will be needed by anyone engaged in
information technology.

BIBLIOGRAPHY

[1] The Technology of Information, from: https://study.com/academy/lesson/what-is-information-technology-architecture-definition-plan.html Retrieved 12 September 2019

[2] Characteristics of Big Data, from: https://www.upgrad.com/blog/what-is-big-data-types-characteristics-benefits-and-examples/ Retrieved 12 September 2019

[3] What Does Big Data Architecture Look Like?, from: https://dzone.com/articles/what-is-big-data-architecture Retrieved 11 September 2019

[4] Big data architectures, from: https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/ Retrieved 12 September 2019

[5] Challenges of Big Data Architecture, from: https://dzone.com/articles/what-is-big-data-architecture Retrieved 11 September 2019

[6] Advantages and Disadvantages of Big Data in Businesses, from: www.stacktunnel.com/advantages-disadvantages-big-data.html Retrieved 11 September 2019
