Data Analytics 1
• Data Mining is the computational process of discovering patterns in large datasets stored in relational databases
and data warehouses.
• It lies at the intersection of artificial intelligence, machine learning, statistics, and database systems.
Data Analytics, sometimes referred to as Predictive Analytics, is about automating insights into a dataset through the
use of queries and data aggregation procedures. It can represent various dependencies between input variables and
discover hidden patterns in the dataset under analysis.
- It is the science of examining raw data in order to draw conclusions about the information it contains, using
methods from statistics and machine learning.
- It goes beyond data mining by analysing semi-structured and unstructured data from different sources and in
different formats, e.g. text mining.
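As a minimal sketch of what "queries and data aggregation procedures" can look like in practice, the Python snippet below groups a handful of records by a key and computes a total and an average per group. The records and the field names (`region`, `sales`) are made up for illustration and are not part of these notes.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
records = [
    {"region": "north", "sales": 120.0},
    {"region": "south", "sales": 80.0},
    {"region": "north", "sales": 100.0},
    {"region": "south", "sales": 40.0},
]

def aggregate_by(rows, key, value):
    """Return {group: (total, average)} for the given key and value fields."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {g: (sum(v), sum(v) / len(v)) for g, v in groups.items()}

summary = aggregate_by(records, "region", "sales")
```

In a real setting the same grouping would more likely be expressed as a database query or with a data analysis library; plain Python is used here only to keep the idea visible.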
Big Data implies huge data volumes that cannot be processed effectively with traditional applications. Big Data
processing begins with raw data that is not aggregated, and it is often impossible to store such data in the memory
of a single computer.
• In other words, big data analytics extends traditional data analytics, which was mainly performed on structured
data (e.g. data stored in relational databases), to more complex analysis of structured, semi-structured, and
unstructured data.
• Big data is characterized by the well-known three Vs: volume, velocity, and variety.
• Big data deals with huge volumes of data that grow very quickly and come from diverse sources in different
formats, e.g. texts, tweets, web click streams, satellite images, sensor readings, and web log data.
• Big data analytics is fuelled by improvements in bandwidth and connectivity, advances in processing power, and
the falling price of hard disk storage.
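One consequence of data not fitting in the memory of a single computer is that it has to be processed incrementally. The sketch below illustrates the idea on a small scale, assuming a hypothetical `read_values` generator that stands in for a large data source: the mean is computed in a single pass while only a count and a running sum are held in memory.

```python
from typing import Iterable, Iterator

def read_values(n: int) -> Iterator[float]:
    """Stand-in for a large data source; yields one value at a time."""
    for i in range(n):
        yield float(i % 10)

def running_mean(values: Iterable[float]) -> float:
    """Compute the mean in one pass, holding only a count and a sum."""
    count = 0
    total = 0.0
    for v in values:
        count += 1
        total += v
    return total / count if count else 0.0

mean = running_mean(read_values(100_000))
```

This is only a single-machine illustration of streaming; actual big data systems additionally distribute the work across many machines.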
2. Information Acquisition
Gathering useful information from the available data sources that can be used to solve the problem.
• The data must be described, together with their type, relevance, and organization.
• Data can be gathered from the company’s databases, from datasets available online, and through Web APIs and
web crawling. Social media sites such as Twitter and Facebook let users access their data by connecting to their
web servers.
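Describing acquired data "together with their type" can be partly automated. The following sketch, which uses a small inline CSV string as a stand-in for a real source, infers a rough type for each column and counts its missing values. The column names, the values, and the helper names `infer_type` and `describe` are assumptions made for this example.

```python
import csv
import io

# Hypothetical CSV content standing in for a real data source.
RAW = """id,city,amount
1,Alpha,10.5
2,,7
3,Beta,
"""

def infer_type(values):
    """Guess a column type from its non-empty values."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in present:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"

def describe(text):
    """Return {column: (inferred type, number of missing values)}."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = rows[0].keys() if rows else []
    return {
        c: (infer_type([r[c] for r in rows]), sum(r[c] == "" for r in rows))
        for c in columns
    }

profile = describe(RAW)
```

Relevance and organization still need human judgement; a profile like this only covers the mechanical part of the description.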
3. Data Preparation
The process of selecting the data applicable to the problem, merging different datasets, and cleaning the data.
- Eliminating inaccurate data and treating missing values and outliers.
• Setting up a sandbox (or testing) environment, and extracting, transforming, and loading the data into the new
sandbox in a format that is ready for analysis and model building.
• Converting the data into the desired format, eliminating columns that are not needed, and deriving new elements
from the data acquired.
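The preparation steps above can be sketched as follows. This is a minimal example on made-up records, not a prescribed procedure: the choice of the median for filling missing values and of the 1.5 x IQR rule for outliers are assumptions, as are the field names.

```python
from statistics import median, quantiles

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "price": 10.0, "qty": 2, "note": "a"},
    {"id": 1, "price": 10.0, "qty": 2, "note": "a"},   # duplicate
    {"id": 2, "price": None, "qty": 1, "note": "b"},   # missing price
    {"id": 3, "price": 12.0, "qty": 3, "note": "c"},
    {"id": 4, "price": 11.0, "qty": 1, "note": "d"},
    {"id": 5, "price": 500.0, "qty": 1, "note": "e"},  # likely outlier
    {"id": 6, "price": 9.0, "qty": 4, "note": "f"},
]

def prepare(rows):
    # 1. Remove duplicate records, keeping the first occurrence of each id.
    seen, unique = set(), []
    for r in rows:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(dict(r))
    # 2. Treat missing values: fill missing prices with the median price.
    fill = median(r["price"] for r in unique if r["price"] is not None)
    for r in unique:
        if r["price"] is None:
            r["price"] = fill
    # 3. Treat outliers: drop prices more than 1.5 * IQR beyond the quartiles.
    q1, _, q3 = quantiles([r["price"] for r in unique], n=4, method="inclusive")
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    kept = [r for r in unique if low <= r["price"] <= high]
    # 4. Drop a column that is not needed and derive a new element.
    for r in kept:
        del r["note"]
        r["total"] = r["price"] * r["qty"]
    return kept

clean = prepare(raw)
```

Whether an extreme value is an error or a genuine observation depends on the problem, so dropping it automatically, as done here, is not always appropriate.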
6. Data Modelling
Creating a conceptual representation of the data and the relationships between the various data entities.
• The main objective of data modelling is to ensure that data is well organized and easily accessible, thereby
facilitating analysis and decision-making.
• A data scientist will often start from a baseline model that has proved successful in similar situations and then
tailor it to suit the specifics of the problem.
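The notes use "baseline" for a proven starting model. A related and simpler idea, shown here as an assumption rather than as the method the notes describe, is a trivial reference model whose score any tailored model should beat. The sketch fits a majority-class baseline on hypothetical labels and measures its accuracy.

```python
from collections import Counter

def fit_majority_baseline(labels):
    """Return the most frequent label; the baseline always predicts it."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical training and test labels (1 = event, 0 = no event).
train_labels = [0, 0, 1, 0, 1, 0, 0]
test_labels = [0, 1, 0, 0]

majority = fit_majority_baseline(train_labels)
baseline_accuracy = accuracy([majority] * len(test_labels), test_labels)
```

A tailored model that cannot improve on `baseline_accuracy` is probably not capturing anything useful about the problem.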
7. Data Visualization / Operationalize
Delivering the final reports on the model's performance and findings, as well as any necessary briefings, code, and
technical documents.
• Communicate the findings to the stakeholders.
• Data visualization is used to convey the information in an accessible way and to show how well the proposed
solution performs.
Data Acquisition
Crime data is collected from various sources, including police reports, crime statistics, and incident reports.
Data Preparation
The data might be cleaned by removing duplicate records or filling in missing values.
Data Exploration
The data might be visualized to identify which types of crime are most common and which locations are most
affected.
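A first exploration of this kind can be as simple as counting. The sketch below tallies crime types and locations from a few invented incident records and renders the counts as text bars; the records and the helper `text_bar_chart` are assumptions for illustration, and a plotting library would normally replace the text output.

```python
from collections import Counter

# Hypothetical incident records; values are illustrative only.
incidents = [
    {"type": "theft", "location": "downtown"},
    {"type": "burglary", "location": "suburb"},
    {"type": "theft", "location": "downtown"},
    {"type": "assault", "location": "downtown"},
    {"type": "theft", "location": "suburb"},
]

type_counts = Counter(i["type"] for i in incidents)
location_counts = Counter(i["location"] for i in incidents)

def text_bar_chart(counts):
    """Render counts as simple text bars, most common first."""
    return [f"{name:<10} {'#' * n} ({n})" for name, n in counts.most_common()]

for line in text_bar_chart(type_counts):
    print(line)
```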
Data Modelling
A model might be built to predict the likelihood of a crime occurring based on factors such as location, time of
day, and weather conditions.
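One very simple way to estimate such a likelihood, shown here only as a sketch, is to use the observed frequency of crime for each combination of factors, with add-one smoothing so that unseen combinations do not get a probability of exactly 0 or 1. The observations are invented, only location and time of day are used (weather is left out for brevity), and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (location, time_of_day, crime_occurred).
observations = [
    ("downtown", "night", 1),
    ("downtown", "night", 1),
    ("downtown", "day", 0),
    ("suburb", "night", 0),
    ("suburb", "day", 0),
    ("downtown", "night", 0),
]

def fit(rows, alpha=1.0):
    """Estimate P(crime | location, time) with add-alpha smoothing."""
    counts = defaultdict(lambda: [0, 0])  # key -> [crimes, total]
    for location, time_of_day, occurred in rows:
        counts[(location, time_of_day)][0] += occurred
        counts[(location, time_of_day)][1] += 1

    def predict(location, time_of_day):
        crimes, total = counts.get((location, time_of_day), (0, 0))
        return (crimes + alpha) / (total + 2 * alpha)

    return predict

predict = fit(observations)
risk_downtown_night = predict("downtown", "night")
risk_unseen = predict("harbour", "day")
```

A practical model would more likely be a classifier such as logistic regression trained on many more factors; the frequency table is just the most transparent starting point.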
Data Visualization
The model might be evaluated to measure how accurately it predicts crime rates.
The model might be integrated into a crime prevention strategy to identify high-risk areas and allocate
resources accordingly.
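Both steps can be sketched together. Assuming a model has already produced a risk score per area, the code below measures precision and recall against what was later observed and then ranks the areas by predicted risk. The scores, the observed outcomes, and the 0.5 threshold are all made up for this example.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary labels (1 = crime occurred)."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predicted risks per area and what was later observed.
risk = {"downtown": 0.8, "harbour": 0.6, "suburb": 0.2, "park": 0.4}
observed = {"downtown": 1, "harbour": 0, "suburb": 0, "park": 1}

threshold = 0.5  # assumed cut-off for flagging an area as high risk
areas = sorted(risk)
predicted = [int(risk[a] >= threshold) for a in areas]
actual = [observed[a] for a in areas]
precision, recall = precision_recall(predicted, actual)

# Rank areas by predicted risk to decide where to allocate resources first.
priority = sorted(risk, key=risk.get, reverse=True)
```

Precision shows how many flagged areas actually saw crime, and recall shows how many affected areas were flagged; which of the two matters more depends on the cost of missed incidents versus wasted patrols.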