Introduction to Gather

© Explore Data Science Academy

Overview

This tutorial is laid out as follows:

• The Explore Data Science process
• Gathering data in the real world
• What is involved in “gathering” data?
• Where do we get data?
• Where do we store data and how has this changed over time?
• Conclusion

The Explore Data Science Process

The Explore Data Science Process is about solving real-world problems using data.

GATHER

• Databases: SQL queries, types of databases, modifying data, schema maintenance
• Statistics: probability, distributions, set theory

Gathering data in the real world

Be prepared to spend A LOT of time gathering data!

The fallacy of “perfect data”

• There is no such thing as a perfect dataset.
• Part of the gathering process involves checking and cleaning the data and getting it into the right format.
• Sometimes we need to meet halfway between what is available and what is required.

What do we do with imperfect data?

• The PARETO PRINCIPLE states that roughly 80% of the effects come from 20% of the causes.
• For data science, this means quickly understanding the 20% of the data that accounts for 80% of the results.

ITERATE!

• Use the data that is available and get started.
• Doing descriptive analytics and building models helps you understand what more data you might need to continue.

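
To make the 80/20 idea concrete, one quick check is to rank categories by how much they contribute and keep the smallest group that reaches 80% of the total. The Python sketch below is a minimal illustration, assuming the data has already been aggregated into one number per category; the function name `pareto_subset` and the sales figures are hypothetical.

```python
def pareto_subset(contributions: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return the smallest set of top categories whose combined share reaches `threshold`."""
    total = sum(contributions.values())
    if total <= 0:
        return []
    selected: list[str] = []
    running = 0.0
    # Walk the categories from the largest contribution to the smallest.
    for name, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        running += value
        if running / total >= threshold:
            break
    return selected


# Hypothetical sales per product category, invented for illustration.
sales = {"A": 600, "B": 210, "C": 60, "D": 45, "E": 35,
         "F": 20, "G": 15, "H": 10, "I": 3, "J": 2}

top = pareto_subset(sales)
share = sum(sales[k] for k in top) / sum(sales.values())
print(top, f"{share:.0%}")  # ['A', 'B'] 81%
```

Here 2 of the 10 categories cover 81% of total sales, which is where a first round of analysis might focus. Real data will rarely split so neatly, so the 80/20 figure is best treated as a rough guide.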
What is involved in “Gathering” data?

Data is the essence of data science - it’s in the name!

Processes involved

• Finding data, e.g. using web scraping to extract data.
• Creating data, e.g. collecting or transforming data.
• Storing data, e.g. using AWS to maintain databases.
• Managing data, e.g. backing up and granting access to data.
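
Web scraping usually means downloading a page and pulling structured values out of its HTML. The following is a minimal Python sketch using only the standard library's `html.parser`; it parses an HTML string so that it runs offline, and the class name `TableCellParser`, the helper `scrape_table`, and the sample page are hypothetical. In practice the HTML would first be fetched (for example with `urllib.request.urlopen`), and a site's terms of use should be checked before scraping it.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])              # start a new row
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")          # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.rows[-1][-1] = self.rows[-1][-1].strip()
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                     # ignore text outside table cells
            self.rows[-1][-1] += data


def scrape_table(html: str) -> list[list[str]]:
    """Return the non-empty rows of every table found in `html`."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]


# A hypothetical page; a real scraper would download this HTML first.
page = """
<table>
  <tr><th>Site</th><th>Count</th></tr>
  <tr><td>Site A</td><td>12</td></tr>
  <tr><td>Site B</td><td>7</td></tr>
</table>
"""
print(scrape_table(page))  # [['Site', 'Count'], ['Site A', '12'], ['Site B', '7']]
```

Dedicated libraries such as Beautiful Soup cope better with messy real-world HTML; the standard-library version is used here only to keep the example self-contained.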

A continual process

• We need data to solve problems, so we gather it after we have specified the problem.
• A continual feedback loop exists between E-G-A-D (and we never stop gathering data).

Relevance

• It is impossible to do data science without good-quality data: garbage in, garbage out!
• Data needs to be in the correct format in order to analyse and visualise it.

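
One way to guard against “garbage in, garbage out” is to validate every record as it is read and set aside the ones that fail, rather than letting them flow silently into the analysis. A minimal Python sketch, assuming a CSV with hypothetical `name` and `age` columns and an assumed valid age range of 0 to 120; the function name `clean_records` is also hypothetical.

```python
import csv
import io


def clean_records(raw_csv: str) -> tuple[list[dict], list[int]]:
    """Parse CSV text, coerce each field to its expected type, and set aside rows that fail."""
    clean: list[dict] = []
    rejected: list[int] = []
    reader = csv.DictReader(io.StringIO(raw_csv))
    # Row numbers count the header as row 1 (assumes no multi-line fields).
    for row_no, row in enumerate(reader, start=2):
        try:
            name = row["name"].strip()
            age = int(row["age"])
            if not name or not 0 <= age <= 120:
                raise ValueError("missing name or implausible age")
        except (KeyError, AttributeError, TypeError, ValueError):
            # KeyError: column missing; AttributeError/TypeError: short row (None);
            # ValueError: not a whole number, or failed the checks above.
            rejected.append(row_no)
            continue
        clean.append({"name": name, "age": age})
    return clean, rejected


# Hypothetical raw export with three bad rows.
raw = "name,age\nAda,36\nBob,not-a-number\n,29\nCy,250\n"
good, bad = clean_records(raw)
print(good)  # [{'name': 'Ada', 'age': 36}]
print(bad)   # [3, 4, 5]
```

Returning the rejected row numbers, rather than just dropping them, makes it easy to go back to the source and decide whether the data can be fixed.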
Where do we get data?

Getting data is a critical part of data science. Sometimes you get lucky and it’s already available…

Using other people’s data

Open data sources, for example:
• Stats SA
• UCT’s Data Portal
• City of Cape Town
• The World Bank

Proprietary data sources:
• Industry datasets
• Company-specific datasets

You should not share any proprietary data without written consent from the source. You also need to be aware of regulations like the Protection of Personal Information (POPI) Act.

Collecting your own data

Create your own new datasets:
• Primary research, including:
○ Surveys
○ Interviews
○ Simulating data

Collect other people’s data:
• Use web scraping to pull data off websites
• Use APIs to pull data off systems and specific applications
• Capture data electronically that used to be on paper

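
Simulating data appears above as one form of primary research. As a minimal Python sketch, the function below generates synthetic survey responses with a fixed random seed so that the same dataset is produced on every run; the function name `simulate_survey`, the field names, the answer options and the response probabilities are all assumptions chosen for illustration, not real survey results.

```python
import random
from collections import Counter

# Assumed answer options and probabilities, chosen purely for illustration.
OPTIONS = ["very satisfied", "satisfied", "neutral", "unsatisfied"]
WEIGHTS = [0.2, 0.4, 0.3, 0.1]


def simulate_survey(n: int, seed: int = 42) -> list[dict]:
    """Generate n synthetic survey responses; the fixed seed makes runs repeatable."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [
        {
            "respondent_id": i,
            "age": rng.randint(18, 65),  # inclusive on both ends
            "satisfaction": rng.choices(OPTIONS, weights=WEIGHTS, k=1)[0],
        }
        for i in range(1, n + 1)
    ]


survey = simulate_survey(200)
print(Counter(r["satisfaction"] for r in survey))
```

Because simulated data only reflects the assumptions that went into it, it is most useful for testing a pipeline or a model before real data arrives, not as a substitute for real observations.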
Where do we store data and how has this changed over time?

There are multiple mediums for storing data and these are constantly changing and improving.

Old School
• Prehistoric data storage included writing on clay tablets or on rock.
• Data was then written or typed on paper and stored in filing cabinets.

Local Storage
• Data is stored physically on a local computer, an external drive, or on a server in a database or in a file system.

Cloud Storage
• We are now starting to store data in the “cloud”, e.g. on Amazon Web Services, Microsoft Azure, or Google Cloud.

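
Storing data in a database on a local computer can be tried with SQLite, which ships with Python as the `sqlite3` module. The sketch below is a minimal illustration, assuming a hypothetical `readings` table of rainfall measurements and a hypothetical helper `build_store`; it creates the table, inserts rows, runs an aggregate SQL query, and makes a backup copy. It uses an in-memory database so it leaves no files behind; passing a file path to `build_store` would keep the data on disk.

```python
import sqlite3


def build_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database and make sure the hypothetical table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS readings (
               id          INTEGER PRIMARY KEY,
               station     TEXT NOT NULL,
               rainfall_mm REAL
           )"""
    )
    return conn


conn = build_store()

with conn:  # commits the inserts on success, rolls back if an error is raised
    conn.executemany(
        "INSERT INTO readings (station, rainfall_mm) VALUES (?, ?)",
        [("Station A", 12.5), ("Station B", 3.0), ("Station A", 7.5)],
    )

totals = conn.execute(
    "SELECT station, SUM(rainfall_mm) FROM readings GROUP BY station ORDER BY station"
).fetchall()
print(totals)  # [('Station A', 20.0), ('Station B', 3.0)]

# Managing data: copy the whole database into a second connection as a backup.
backup = sqlite3.connect(":memory:")
conn.backup(backup)
```

Server and cloud databases use different connection libraries, but the SQL itself (CREATE TABLE, INSERT, SELECT ... GROUP BY) largely carries over.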
Conclusion

What you have learnt

• There is no such thing as a perfect dataset.
• The process of finding, creating, storing, and managing data.
• The multiple mediums for storing data.
