Introduction To Data Science and Machine Learning
1. Data Science is the science of extracting hidden patterns from large data sets
3. Data sets usually refer to a large volume of cleansed, structured data prepared
for analysis
a. The part of statistics used to understand the data is called descriptive
statistics. Descriptive statistics give vital insights into the data in terms of
central values, spread and distribution shape
b. The part of statistics used to establish the reliability of the potential
patterns identified is called inferential statistics
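As a minimal sketch of descriptive statistics, the Python snippet below summarizes a small sample by its central values, spread and a simple skewness measure for distribution shape. The readings and variable names are made up for illustration.

```python
import statistics

# Hypothetical blood-pressure readings (illustrative values only)
readings = [110, 115, 120, 120, 125, 130, 160]

# Central values
mean = statistics.mean(readings)
median = statistics.median(readings)

# Spread
stdev = statistics.pstdev(readings)          # population standard deviation
value_range = max(readings) - min(readings)

# Distribution shape: moment-based skewness (positive means a longer right tail)
skewness = sum((x - mean) ** 3 for x in readings) / (len(readings) * stdev ** 3)

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f} "
      f"range={value_range} skew={skewness:.2f}")
```

Here the single high reading pulls the mean above the median, which is the kind of insight descriptive statistics are meant to surface.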
What is Machine Learning?
2. These algorithms use a learning process through which they identify the patterns
in the dataset. The patterns they learn from the data are called models
4. Machine learning algorithms work on the data prepared for analytics to express
the hidden patterns in the form of models
1. Fraud detection
2. Sentiment analysis
3. Data science is usually a team effort, where the team as a whole covers all the
required skills and knowledge
Real World as Mathematical Space
Machine learning happens in mathematical space / feature space:
1. A data set representing the real world is a collection of attributes that
define an entity
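To make the idea concrete, the sketch below represents each entity as a record of attributes and then as a point in feature space. The `Patient` class, its fields and the values are hypothetical, chosen to match the BP, Age and Sugar dimensions used in this section.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class Patient:
    """One entity, defined by its attributes (hypothetical fields)."""
    bp: float      # blood-pressure level
    age: float     # age in years
    sugar: float   # blood-sugar level

# A tiny made-up data set: each record is one entity
dataset = [
    Patient(bp=120, age=35, sugar=90),
    Patient(bp=150, age=62, sugar=160),
]

# Each entity becomes one point (a tuple of coordinates) in feature space
points = [astuple(p) for p in dataset]
print(points)
```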
[Figure: data points plotted in feature space with axes Sugar and BP level; legend: Heart healthy, Potential heart ailments]
alternate hypothesis
ax + by + cz = d
8. x, y, z represent the three dimensions, i.e. BP, Age and Sugar,
while d represents the color, i.e. healthy or ailing heart
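The role of the plane ax + by + cz = d can be sketched in Python. The coefficients a, b, c, the threshold d and the rule "a score above d means potential heart ailments" are all assumptions chosen for illustration; the slides do not give actual values.

```python
def plane_score(bp: float, age: float, sugar: float,
                a: float, b: float, c: float) -> float:
    """Value of ax + by + cz for one data point (x=BP, y=Age, z=Sugar)."""
    return a * bp + b * age + c * sugar

def classify(bp, age, sugar, a=0.5, b=0.3, c=0.2, d=110.0) -> str:
    """Label a point by the side of the plane it falls on (hypothetical rule)."""
    score = plane_score(bp, age, sugar, a, b, c)
    return "potential heart ailments" if score > d else "heart healthy"

print(classify(bp=120, age=35, sugar=90))
print(classify(bp=160, age=65, sugar=180))
```

A learning algorithm would search for values of a, b, c and d that separate the two colors well; here they are simply fixed by hand.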
based on d
14. If the majority of new data points are correctly classified, the
model is good; otherwise it is not
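The check on new data points can be read as an accuracy measurement. The sketch below assumes a hypothetical list of true labels and model predictions, and interprets "majority" as more than half; both are illustrative assumptions.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of data points whose predicted label matches the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical labels for five new data points
truth = ["healthy", "ailing", "healthy", "ailing", "healthy"]
predicted = ["healthy", "ailing", "ailing", "ailing", "healthy"]

acc = accuracy(truth, predicted)
print(f"accuracy = {acc:.2f}")
print("model is good" if acc > 0.5 else "model is not good")
```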
Machine Learning Categories
Machine learning algorithms are broadly grouped into three categories:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Supervised Machine Learning:
1. A class of algorithms that work in two stages. The first stage is called training and
the second is usually called testing. Sometimes there is also a validation stage,
followed by testing
2. At each stage the algorithm takes the input data prepared for that stage, i.e. training data for
the training stage, test data for the test stage and validation data for the validation stage
3. During training, the machine learning algorithm receives the training data in the form of
independent and dependent variables
4. In the process of learning, the algorithm learns the relationship between the
dependent and the independent variables
5. This relationship is expressed as a model, which can take the form of an equation,
probability ratios, hidden rules etc.
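The stages above each need their own data. One common way to obtain it, sketched here under assumed proportions (60% training, 20% validation, 20% test) and with a hypothetical `split_dataset` helper, is to shuffle the prepared data once and cut it into three parts.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)     # fixed seed for repeatability
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]         # whatever remains
    return train, validation, test

# Each row pairs independent variables with the dependent variable (made-up data)
rows = [((i, i * 2), i % 2) for i in range(10)]
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))
```

Keeping the three sets disjoint means the test stage measures the model on data it did not see during training.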
1. Regression - Predicting the mileage of a car given other features such as weight,
engine capacity, horsepower, transmission type, number of cylinders etc.
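A minimal sketch of regression, assuming a single feature (weight) instead of the full feature list and a made-up training set: ordinary least squares finds the line that relates mileage to weight, and that line is the model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: car weight (tonnes) and mileage (km per litre)
weights = [1.0, 1.2, 1.5, 1.8, 2.0]
mileage = [20.0, 18.0, 15.0, 12.0, 10.0]

slope, intercept = fit_line(weights, mileage)

def predict(weight):
    """Apply the learned model (an equation) to a new car."""
    return slope * weight + intercept

print(f"mileage = {slope:.2f} * weight + {intercept:.2f}")
print(f"predicted mileage at 1.6 tonnes: {predict(1.6):.1f}")
```

Here weight is the independent variable and mileage the dependent variable; with more features the same idea extends to multiple linear regression.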
Unsupervised Machine Learning:
3. These algorithms are not used to find any relationship between dependent and
independent variables
4. This class of algorithms usually finds patterns in the form of clusters and associations,
reflecting some kind of commonality or togetherness among the data points in the
given data set
1. Clustering - Identifying groups in the given data set, where a group represents
some kind of commonality among the data points. “Birds of a feather flock
together”.
a. Flat clustering, e.g. K-means clustering - The clusters identified are disjoint and
non-overlapping, e.g. segmenting customers into different groups based
on their purchase amount, frequency of purchase and types of items purchased
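The customer segmentation example can be sketched with a plain K-means loop. The customer data, the choice of k = 2 and the simplistic initialisation from the first k points are all assumptions for illustration; a real analysis would usually scale the features and use a library implementation.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: returns (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]   # simplistic initialisation (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Made-up customers: (purchase amount, purchases per year)
customers = [(10, 1), (200, 12), (12, 2), (210, 15), (11, 1), (190, 11)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Each customer receives exactly one label, which is what makes the resulting clusters disjoint and non-overlapping.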
Reinforcement Learning:
2. During the initial stages of learning, the algorithm is likely to commit many errors in learning
the patterns. However, through a process of reward and punishment, it learns to
identify the patterns correctly.
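The reward-and-punishment idea can be sketched with a very small trial-and-error learner. This is a simplified, bandit-style illustration with epsilon-greedy exploration, not a full reinforcement learning algorithm; the situations, labels, reward values (+1 and -1) and parameters are all hypothetical.

```python
import random

def train_agent(episodes=200, epsilon=0.1, alpha=0.5, seed=0):
    """Learn, by trial and error, which label to give in each situation."""
    rng = random.Random(seed)
    # Hypothetical hidden rule the agent must discover
    correct_label = {"high sugar": "ailing", "normal sugar": "healthy"}
    states = list(correct_label)
    actions = ["healthy", "ailing"]
    # Estimated value of each (situation, action) pair, all starting at zero
    q = {(s, a): 0.0 for s in states for a in actions}
    mistakes = 0
    for episode in range(episodes):
        state = states[episode % len(states)]     # alternate the situations
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try something at random
        else:
            action = max(actions, key=lambda a: q[(state, a)])  # exploit best guess
        # Reward for a correct label, punishment for a wrong one
        reward = 1.0 if action == correct_label[state] else -1.0
        if reward < 0:
            mistakes += 1
        # Move the estimate toward the reward just received
        q[(state, action)] += alpha * (reward - q[(state, action)])
    policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in states}
    return policy, mistakes

policy, mistakes = train_agent()
print(policy)
print("mistakes made while learning:", mistakes)
```

Wrong choices lower the estimated value of an action and right choices raise it, so the learned policy ends up favouring the rewarded label in each situation.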