TP 4 - Second Semester
In the early 2000s, advances in software and hardware capabilities made it possible for
organizations to collect and handle large amounts of unstructured data. With this
explosion of useful data, open-source communities developed big data frameworks to
store and process this data. These frameworks are used for distributed storage and
processing of large data sets across a network of computers. Along with additional tools
and libraries, big data frameworks can be used for:
The main difference between big data analytics and traditional data analytics is the type of
data handled and the tools used to analyze it. Traditional analytics deals with structured
data, typically stored in relational databases. This type of database helps ensure that data
is well-organized and easy for a computer to understand. Traditional data analytics relies
on statistical methods and tools like structured query language (SQL) for querying
databases.
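As a sketch of that traditional workflow, the following uses Python's built-in sqlite3 module to run a SQL aggregate over a small relational table. The table and column names are illustrative assumptions, not taken from the text.

```python
import sqlite3

# An in-memory relational database with a fixed, well-defined schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 50.0)],
)

# Because every row follows the schema, SQL can query and aggregate it directly.
rows = cur.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 170.0), ('south', 80.0)]
```

This ease of querying is exactly what a rigid schema buys: the structure is known in advance, so the analysis is a one-line statement.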
Big data analytics involves massive amounts of data in various formats, including
structured, semi-structured and unstructured data. The complexity of this data requires
more sophisticated analysis techniques. Big data analytics employs advanced techniques
like machine learning and data mining to extract information from complex data sets. It
often requires distributed processing systems like Hadoop to manage the sheer volume of
data.
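Hadoop itself is a JVM-based framework, but the MapReduce idea behind its distributed processing can be sketched in a few lines of single-process Python; the documents below are made up for illustration.

```python
from collections import defaultdict

docs = ["big data needs big tools", "data tools scale out"]

# Map: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group values by key (the framework does this across machines).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate each group independently, which is what lets the
# work be split across a cluster.
counts = {key: sum(values) for key, values in groups.items()}
print(counts["big"], counts["data"], counts["tools"])  # 2 2 2
```

In a real cluster the map and reduce steps run in parallel on different nodes over different chunks of the data; the program structure stays the same.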
Veracity
Data reliability and accuracy are critical, as decisions based on inaccurate or incomplete
data can lead to negative outcomes. Veracity refers to the data's trustworthiness,
encompassing data quality, noise and anomaly detection issues. Techniques and tools for
data cleaning, validation and verification are integral to ensuring the integrity of big data,
enabling organizations to make better decisions based on reliable information.
Value
Big data analytics aims to extract actionable insights that offer tangible value. This involves
turning vast data sets into meaningful information that can inform strategic decisions,
uncover new opportunities and drive innovation. Advanced analytics, machine learning
and AI are key to unlocking the value contained within big data, transforming raw data
into strategic assets.
Data professionals, analysts, scientists and statisticians prepare and process data in a data lakehouse, which combines the performance of a data warehouse with the flexibility of a data lake to clean data and ensure its quality. The process of turning raw data into
valuable insights encompasses several key stages:
Collect data: The first step involves gathering data, which can be a mix of
structured and unstructured forms from myriad sources like cloud, mobile
applications and IoT sensors. This step is where organizations adapt their data
collection strategies and integrate data from varied sources into central
repositories like a data lake, which can automatically assign metadata for better
manageability and accessibility.
Process data: After being collected, data must be systematically organized,
extracted, transformed and then loaded into a storage system to ensure accurate
analytical outcomes. Processing involves converting raw data into a format that is
usable for analysis, which might involve aggregating data from different sources,
converting data types or organizing data into structured formats. Given the
exponential growth of available data, this stage can be challenging. Processing
strategies may vary between batch processing, which handles large data volumes
over extended periods, and stream processing, which deals with smaller real-time
data batches.
Clean data: Regardless of size, data must be cleaned to ensure quality and
relevance. Cleaning data involves formatting it correctly, removing duplicates and
eliminating irrelevant entries. Clean data prevents the corruption of output and
safeguards reliability and accuracy.
Analyze data: Advanced analytics, such as data mining, predictive analytics,
machine learning and deep learning, are employed to sift through the processed
and cleaned data. These methods allow users to discover patterns, relationships
and trends within the data, providing a solid foundation for informed decision-
making.
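The middle stages above (process, clean, analyze) can be sketched end to end in plain Python. The records and field names below are illustrative assumptions, not from the text.

```python
# Collected raw records: inconsistent formatting, a duplicate, and a junk entry.
raw = [
    {"email": " Ana@Example.com ", "amount": "120.0"},
    {"email": "ana@example.com", "amount": "120.0"},  # duplicate after cleaning
    {"email": "", "amount": "50.0"},                  # irrelevant: no identifier
    {"email": "bo@example.com", "amount": "80.0"},
]

# Process and clean: normalize formats, drop duplicates and irrelevant
# entries, and convert types into an analyzable structure.
seen, clean = set(), []
for rec in raw:
    email = rec["email"].strip().lower()
    if not email or email in seen:
        continue
    seen.add(email)
    clean.append({"email": email, "amount": float(rec["amount"])})

# Analyze: a simple aggregate over the cleaned data.
total = sum(r["amount"] for r in clean)
print(len(clean), total)  # 2 200.0
```

Note how the duplicate only becomes detectable after normalization, which is why cleaning precedes analysis in the pipeline.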
Under the Analyze umbrella, there are potentially many technologies at work, including
data mining, which is used to identify patterns and relationships within large data sets;
predictive analytics, which forecasts future trends and opportunities; and deep learning,
which mimics human learning patterns to uncover more abstract ideas.
Deep learning uses an artificial neural network with multiple layers to model complex
patterns in data. Unlike traditional machine learning algorithms, deep learning can learn directly from images, sound and text without manual feature engineering. For big data analytics, this capability means the volume and complexity of the data are not obstacles.
Natural language processing (NLP) models allow machines to understand, interpret and
generate human language. Within big data analytics, NLP extracts insights from massive
unstructured text data generated across an organization and beyond.
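A full NLP model is far beyond a snippet, but the core idea, pulling signal out of unstructured text, can be sketched with the standard library. The feedback strings and stopword list here are made up.

```python
from collections import Counter
import re

# Unstructured customer feedback (illustrative).
feedback = [
    "Shipping was slow but support was helpful",
    "Slow shipping again, product is fine",
    "Support resolved my issue quickly",
]

# Tokenize, lowercase, and drop uninformative filler words.
stopwords = {"was", "but", "is", "my", "again"}
tokens = [
    word
    for text in feedback
    for word in re.findall(r"[a-z]+", text.lower())
    if word not in stopwords
]

# Frequent terms surface recurring themes in the raw text.
counts = Counter(tokens)
print(counts["shipping"], counts["support"])  # 2 2
```

Production NLP replaces each of these steps (tokenization, normalization, scoring) with learned models, but the pipeline shape carries over.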
Structured Data
Structured data refers to highly organized information that is easily searchable and
typically stored in relational databases or spreadsheets. It adheres to a rigid schema,
meaning each data element is clearly defined and accessible in a fixed field within a record
or file. Examples of structured data include:
Structured data's main advantage is its simplicity for entry, search and analysis, often
using straightforward database queries like SQL. However, the rapidly expanding universe
of big data means that structured data represents a relatively small portion of the total
data available to organizations.
Unstructured Data
Unstructured data lacks a pre-defined data model, making it more difficult to collect,
process and analyze. It comprises the majority of data generated today, and includes
formats such as:
The primary challenge with unstructured data is its complexity and lack of uniformity,
requiring more sophisticated methods for indexing, searching and analyzing. NLP, machine
learning and advanced analytics platforms are often employed to extract meaningful
insights from unstructured data.
Semi-structured data
Semi-structured data occupies the middle ground between structured and unstructured
data. While it does not reside in a relational database, it contains tags or other markers to
separate semantic elements and enforce hierarchies of records and fields within the data.
Examples include:
JSON (JavaScript Object Notation) and XML (Extensible Markup Language) files,
which are commonly used for web data interchange
Email, where the data has a standardized format (e.g., headers, subject, body) but
the content within each section is unstructured
NoSQL databases, which can store and manage semi-structured data more efficiently
than traditional relational databases
Semi-structured data is more flexible than structured data but easier to analyze than
unstructured data, providing a balance that is particularly useful in web applications and
data integration tasks.
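The email example above can be made concrete with a JSON record: the tags give it structure, while the content inside a field stays free-form. The field names and message below are illustrative assumptions.

```python
import json

# A semi-structured record: tagged fields, free-form content inside them.
raw = (
    '{"from": "ana@example.com", "subject": "Order delay", '
    '"body": "My package is three days late and support has not replied."}'
)
msg = json.loads(raw)

# The tags let us navigate the record like structured data...
sender = msg["from"]

# ...while the body remains unstructured text that needs further analysis.
mentions_delay = "late" in msg["body"].lower()
print(sender, mentions_delay)  # ana@example.com True
```

This is the balance the text describes: field access is as easy as a database lookup, but extracting meaning from the body still requires text-analysis techniques.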
Ensuring data quality and integrity, integrating disparate data sources, protecting data
privacy and security and finding the right talent to analyze and interpret data can present
challenges to organizations looking to leverage their extensive data volumes. What
follows are the benefits organizations can realize once they see success with big data
analytics:
Real-time intelligence
One of the standout advantages of big data analytics is the capacity to provide real-time
intelligence. Organizations can analyze vast amounts of data as it is generated from
myriad sources and in various formats. Real-time insight allows businesses to make quick
decisions, respond to market changes instantaneously and identify and act on
opportunities as they arise.
Better-informed decisions
With big data analytics, organizations can uncover previously hidden trends, patterns and
correlations. A deeper understanding equips leaders and decision-makers with the
information needed to strategize effectively, enhancing business decision-making in
supply chain management, e-commerce, operations and overall strategic direction.
Cost savings
Big data analytics drives cost savings by identifying business process efficiencies and
optimizations. Organizations can pinpoint wasteful expenditures by analyzing large
datasets, streamlining operations and enhancing productivity. Moreover, predictive
analytics can forecast future trends, allowing companies to allocate resources more
efficiently and avoid costly missteps.
Data scientist
Data scientists analyze complex digital data to assist businesses in making decisions. Using
their data science training and advanced analytics technologies, including machine
learning and predictive modeling, they uncover hidden insights in data.
Data analyst
Data analysts turn data into information and information into insights. They use statistical
techniques to analyze and extract meaningful trends from data sets, often to inform
business strategy and decisions.
Data engineer
Data engineers prepare, process and manage big data infrastructure and tools. They also
develop, maintain, test and evaluate data solutions within organizations, often working
with massive datasets to assist in analytics projects.
Business intelligence analyst
Business intelligence (BI) analysts help businesses make data-driven decisions by analyzing
data to produce actionable insights. They often use BI tools to convert data into easy-to-
understand reports and visualizations for business stakeholders.
Data architect
Data architects design, create, deploy and manage an organization's data architecture.
They define how data is stored, consumed, integrated and managed by different data
entities and IT systems.
Big data analytics products
Answer
Title translation
Subtitle translation
Descriptive analytics
Diagnostic analytics
Predictive analytics
Prescriptive analytics
Velocity
Variety
Veracity
Value
Structured data
Unstructured data
Semi-structured data
Cost savings
Data scientist
Data analyst
Data engineer
Data architect
2)