Introduction To Gather
Where do we store data and how has this changed over time?
Conclusion
The Explore Data Science Process
The Explore Data Science Process is about solving real-world problems using data.
GATHER
• Databases
  ○ SQL Queries
  ○ Types of Databases
  ○ Modifying Data
  ○ Schema Maintenance
• Statistics
  ○ Probability
  ○ Distributions
  ○ Set Theory
• Gathering data in the real world
ITERATE!
Getting data is a critical part of data science. Sometimes you get lucky and it’s already available…
Open data sources, for example:
• Stats SA
• UCT’s Data Portal
• City of Cape Town
• The World Bank

Create your own new datasets:
• Primary research, including:
  ○ Surveys
  ○ Interviews
  ○ Simulating data
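Simulating data is one way to create your own dataset when none is available. The Python sketch below generates a small, reproducible, made-up survey dataset. The function name `simulate_survey`, the fields (`respondent_id`, `age`, `satisfaction`), the age distribution and the answer weights are all hypothetical choices for illustration, not taken from any real survey.

```python
import random


def simulate_survey(n_respondents, seed=42):
    """Generate a hypothetical survey dataset as a list of dicts.

    All fields, ranges and weights are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so the same dataset is produced every run
    rows = []
    for respondent_id in range(1, n_respondents + 1):
        rows.append({
            "respondent_id": respondent_id,
            # Age drawn from a normal distribution (mean 35, sd 10), clipped to 18-80
            "age": min(80, max(18, round(rng.gauss(35, 10)))),
            # Satisfaction on a 1-5 scale; the weights are made up
            "satisfaction": rng.choices([1, 2, 3, 4, 5], weights=[5, 10, 25, 35, 25])[0],
        })
    return rows


if __name__ == "__main__":
    for row in simulate_survey(5):
        print(row)
```

Fixing the random seed matters here: anyone rerunning the code gets the identical dataset, so results built on it can be reproduced and checked.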
There are multiple mediums for storing data and these are constantly changing and improving.
• Prehistoric data storage included writing on clay tablets or on rock
• Data stored physically on a local computer, external drive or on a server in a database or in a file system
• We are now starting to store data in the “cloud”, e.g. on Amazon Web Services, Microsoft Azure, or Google Cloud
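To make the difference between storing data in a file system and in a database more concrete, here is a minimal Python sketch that saves the same records both ways. It assumes CSV as the file format and SQLite (Python's built-in `sqlite3` module) as the database; the readings are made up and the helper names (`save_to_file`, `load_from_file`, `save_to_database`, `warm_stations`) are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile

# Made-up example records: (station name, temperature in degrees C)
READINGS = [("Station A", 21.5), ("Station B", 18.0), ("Station C", 24.3)]


def save_to_file(path, rows):
    """File-system storage: write the rows to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["station", "temp_c"])  # header row
        writer.writerows(rows)


def load_from_file(path):
    """Read the CSV back; csv returns strings, so temperatures are converted to float."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [(r["station"], float(r["temp_c"])) for r in reader]


def save_to_database(conn, rows):
    """Database storage: insert the rows into a typed SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (station TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


def warm_stations(conn, threshold):
    """Answer a question directly in the database with an SQL query."""
    cur = conn.execute(
        "SELECT station FROM readings WHERE temp_c > ? ORDER BY station", (threshold,)
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "readings.csv")
        save_to_file(csv_path, READINGS)
        print(load_from_file(csv_path))

    conn = sqlite3.connect(":memory:")  # in-memory database; pass a filename to keep it on disk
    save_to_database(conn, READINGS)
    print(warm_stations(conn, 20))
    conn.close()
```

The file is simple and portable but stores only text, so every read has to reparse it. The database enforces column types and lets you filter with SQL instead of loading everything first.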