Unit#1 - Overview


19 CS, 1st Term, Final Year

Data Sciences and Analytics (DSA)
Prof. Dr. M. S. Memon
Course In charge
[email protected]
1. Overview
DATA SCIENCE VOCABULARY



ALL ABOUT DATA SCIENCE

• Statistics
• Big Data Analytics
• Business Analytics
• Business Intelligence
• Data(base) Management
• Visualization
• Machine Learning
• Data Mining
• Artificial Intelligence
• Predictive Modelling



What is Data Science

Data

Analysing

Manipulating



How is it different from ML, DL, and AI

Artificial Intelligence

Machine Learning

Deep Learning



Artificial Intelligence

What is AI?

Tools

Application



Deep Learning

What is DL?

Tools

Application



ALL ABOUT Data Science



1. WHAT IS DATA SCIENCE?

• “Data science, also known as data-driven science, is an
interdisciplinary field of scientific methods, processes,
algorithms and systems to extract knowledge or insights
from data in various forms, either structured or
unstructured, similar to data mining.”

• “Data science intends to analyze and understand actual
phenomena with ‘data’. In other words, the aim of data science
is to reveal the features or the hidden structure of complicated
natural, human, and social phenomena with data from a
different point of view from the established or traditional theory
and method.”



WHAT IS DATA SCIENCE?

• Fourth paradigm
• “… change of all sciences moving from observational, to
theoretical, to computational and now to the 4th Paradigm –
Data-Intensive Scientific Discovery”



2. WHAT IS IMPORTANT?

Need to solve a real problem using data…


No applications, no data science.
3. Defining Data Science
• A process of finding knowledge (hidden patterns) in raw data
using the principles of machine learning, algorithms, and
various tools.
Data Science Process

01 Setting the research goal
02 Retrieving Data
03 Data Preparation
04 Data Exploration
05 Data Modeling
06 Results analysis and visualization



3.1. Setting the Research Goal

• The data science research goal is mostly derived from the
organization's requirements.
• Prepare a project charter with the major questions and their
answers, such as:
  • What is going to be researched?
  • How will the organization benefit from it?
  • What resources and data are required?
  • What are the timetable and deliverables?
3.2. Retrieving the Data

• Data collection is the second step of the data science process.
• Collect the data required by the project charter, checking its
existence, accessibility, and quality both within and outside
the organization.
• Handle the different data formats and databases involved.
• Access third-party resources to enrich the quality of the
information.
3.3. Data Preparation
• Prepare good-quality data in the required format using common
and domain-specific preprocessing steps.

Data Preparation Phases
01 Data Cleaning: removing inconsistencies and noisy data
02 Data Integration: enriching data by combining multiple data sources
03 Data Transformation: converting data into a format suitable for modeling
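
A minimal sketch of the three preparation phases, assuming a small list of hypothetical customer records. The specific choices here (dropping duplicate IDs, mean imputation, min-max scaling) are common options picked for illustration; the slides do not prescribe them.

```python
# Hypothetical raw records: one duplicate, one missing age, inconsistent city text.
raw_customers = [
    {"customer_id": 1, "age": "34", "city": " karachi "},
    {"customer_id": 2, "age": "", "city": "Hyderabad"},
    {"customer_id": 2, "age": "", "city": "Hyderabad"},
    {"customer_id": 3, "age": "29", "city": "SUKKUR"},
]
segments = {1: "retail", 3: "corporate"}  # hypothetical second data source


def clean(rows):
    """Data cleaning: drop duplicate IDs, normalise text, mark missing ages as None."""
    seen, cleaned = set(), []
    for row in rows:
        if row["customer_id"] in seen:
            continue
        seen.add(row["customer_id"])
        cleaned.append({
            "customer_id": row["customer_id"],
            "age": int(row["age"]) if row["age"] else None,
            "city": row["city"].strip().title(),
        })
    return cleaned


def integrate(rows, segment_lookup):
    """Data integration: enrich each row from a second source keyed by customer_id."""
    return [
        {**row, "segment": segment_lookup.get(row["customer_id"], "unknown")}
        for row in rows
    ]


def transform(rows):
    """Data transformation: impute missing ages with the mean, then min-max scale."""
    known = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(known) / len(known)
    ages = [r["age"] if r["age"] is not None else mean_age for r in rows]
    lo, hi = min(ages), max(ages)
    span = (hi - lo) or 1  # avoid division by zero when all ages are equal
    return [{**r, "age_scaled": (a - lo) / span} for r, a in zip(rows, ages)]


prepared = transform(integrate(clean(raw_customers), segments))
for row in prepared:
    print(row)
```

Keeping each phase in its own function makes it easy to swap in a domain-specific step without touching the others.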



3.4. Data Exploration

• Understand the data using statistical analysis and visualization.
• Detect noise and outliers.
• Understand the interactions between variables.
• Get a sense of the distribution of the data.
• This step is known as exploratory data analysis (EDA).
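
The exploration tasks above can be illustrated with Python's standard `statistics` module. The sample values are hypothetical, and the outlier rule (Tukey's 1.5 × IQR fences) and the Pearson correlation are one common choice each, assumed here for the sketch rather than taken from the slides.

```python
import statistics

# Hypothetical sample: purchase amounts (one suspicious value) and store visits.
amounts = [12, 15, 14, 13, 16, 15, 14, 95]
visits = [2, 3, 3, 2, 4, 3, 3, 4]


def summarize(values):
    """Basic statistics that give a first sense of the distribution."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def pearson(xs, ys):
    """Pearson correlation as a simple measure of variable interaction."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


summary = summarize(amounts)
outliers = iqr_outliers(amounts)
r = pearson(amounts, visits)
print(summary)
print("outliers:", outliers)
print("correlation:", round(r, 3))
```

A large gap between the mean and the median, as in this sample, is itself a hint that an outlier or a skewed distribution is present.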



3.5. Building the Model

• This step uses prior experience of the domain to build the models.
• Model building draws on statistics, operations research methods,
optimization, and machine learning algorithms.
• Hyperparameter tuning is done iteratively to select the final model.
• The final model is selected based on its performance on the
validation set.
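
As an illustration of tuning a hyperparameter against a validation set, the sketch below selects `k` for a k-nearest-neighbours classifier written in plain Python. The toy data, the candidate values of `k`, and the choice of k-NN itself are assumptions made for the example; the slides do not name a specific model.

```python
from collections import Counter

# Hypothetical toy data: (feature_1, feature_2) -> class label.
train = [
    ((1.0, 1.2), "A"), ((0.8, 0.9), "A"), ((1.3, 1.1), "A"), ((1.1, 0.7), "A"),
    ((3.0, 3.2), "B"), ((3.3, 2.9), "B"), ((2.8, 3.1), "B"), ((3.1, 3.4), "B"),
    ((2.9, 2.7), "A"),  # deliberately noisy label inside the class B cluster
]
validation = [
    ((1.0, 1.0), "A"), ((0.9, 1.3), "A"), ((3.0, 3.0), "B"), ((2.9, 2.8), "B"),
]


def knn_predict(train_rows, point, k):
    """Classify `point` by majority vote among its k nearest training rows."""
    by_distance = sorted(
        train_rows,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_rows, eval_rows, k):
    """Fraction of evaluation rows that are classified correctly."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)


def select_k(train_rows, val_rows, candidates):
    """Try each candidate k and keep the one with the best validation accuracy."""
    scores = {k: accuracy(train_rows, val_rows, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best_k, scores


best_k, scores = select_k(train, validation, candidates=[1, 3, 5])
print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

With this data, `k=1` is misled by the noisy training point, while larger values of `k` vote it down, which is the kind of trade-off hyperparameter tuning is meant to expose.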



3.5.1 Models



Models



3.6. Result Analysis and Visualization
• This step involves analyzing and visualizing the results.
• The results can be analyzed using:
  • Quantitative measures
  • Graphical measures
  • Statistical measures
• Sometimes it is important to visualize the results dynamically,
so that they show real-time behavior.
• Business intelligence tools are used to visualize the results,
for example Microsoft Power BI, Tableau Desktop, Google Charts,
and Microsoft BI.
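
To make the quantitative measures concrete, here is a small sketch that computes accuracy, precision, recall, and F1 for a binary classifier and prints a crude text bar chart. The labels and predictions are hypothetical, and these four metrics are a common choice assumed for the example, not a list given in the slides.

```python
# Hypothetical true labels and model predictions for eight cases.
y_true = ["B", "B", "A", "A", "B", "A", "B", "A"]
y_pred = ["B", "A", "A", "A", "B", "B", "B", "A"]


def confusion_counts(truths, preds, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(truths, preds):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(truths, preds, positive):
    """Compute common quantitative measures from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(truths, preds, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(truths),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = classification_metrics(y_true, y_pred, positive="B")

# A crude graphical measure: one text bar per metric.
for name, value in metrics.items():
    print(f"{name:<10}{value:5.2f} {'#' * round(value * 20)}")
```

In practice the same numbers would typically be fed into a dashboard tool such as those listed above instead of being printed as text.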