Author: Brendan Martin
Founder of LearnDataSci

Most Recommended Data Science and Machine Learning Books by Top Master's Programs

See the most popular books assigned in Master's programs from top universities


How these books were found

After more than 15 hours of researching and logging materials assigned in Master’s
programs, I found that the following books were the most recommended to graduate
students in those programs. Since data scientists can come from many
backgrounds, the Master’s degrees considered were in applied math, statistics,
computer science, machine learning, and data science.

Specifically, the following programs were explored:

• Master in Machine Learning — Carnegie Mellon University
• Masters in Statistics — Stanford University
• Masters in Computer Science, specializing in Artificial Intelligence — Stanford University
• Masters in Computer Science — Georgia Tech
• Masters in Data Science — Harvard University
• Master in Computational Science and Engineering — Harvard University
• Masters in Data Science — Columbia University

Due to the amount of time it takes to wade through degree requirements, course
codes, and catalogs, this article will continue to evolve as I gather more data.

For each book below, I’ve given an example of how the author(s) decided to
introduce Linear Regression, one of the most basic machine learning algorithms.
If you’re a beginner in data science, I think this will give you some insight into
what sort of math background each book requires.

Without further ado, here are the most assigned and recommended books from top
universities.

Most Recommended Books

#1 The Elements of Statistical Learning: Data Mining, Inference and Prediction (“ESL”)

Amazon or Free — Authors: Trevor Hastie, Robert Tibshirani, Jerome Friedman

This book was either the assigned textbook or recommended reading in every
Master’s program I researched. Due to its advanced nature, you’ll find that book
#5 in this list, An Introduction to Statistical Learning with Applications in R
(ISLR), was written as a more accessible version, and even includes exercises
in R.

It’s usually recommended for beginners in data science to master the content in
ISLR before moving to ESL, where you’ll get a more theoretical background. Just
mastering ISLR is often enough for data analyst roles.

Overall, ESL takes an applied, frequentist approach, as opposed to a Bayesian
approach like in the next book. Exercises in this book are not only challenging,
but also very useful for individuals generally interested in machine learning
research. Fortunately, you can find solutions to the exercises freely available.

To get an idea of the math required, Linear Regression is introduced like so:

We have an input vector $X^T = (X_1, X_2, \ldots, X_p)$, and want
to predict a real-valued output $Y$. The linear regression model
has the form

$$f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$$
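
As a rough illustration (not code from the book), the linear model that ESL writes as $f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$ can be evaluated in a few lines of plain Python. The function name and the numbers below are hypothetical, chosen only to show how the intercept and the per-feature coefficients combine.

```python
def linear_model(x, beta0, beta):
    """Evaluate f(X) = beta0 + sum_j X_j * beta_j for a single input vector."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    return beta0 + sum(x_j * b_j for x_j, b_j in zip(x, beta))


# Hypothetical values, for illustration only.
x = [1.0, 2.0, 3.0]       # input vector with p = 3 features
beta0 = 0.5               # intercept
beta = [2.0, -1.0, 0.5]   # one coefficient per feature

prediction = linear_model(x, beta0, beta)
print(prediction)
```

Most of the difficulty in ESL lies in how the coefficients are estimated and analyzed, not in evaluating the model itself.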

#2 Pattern Recognition and Machine Learning (“PRML”)

Amazon or Free — Author: Christopher Bishop

Recommended in _almost_ every Master's program surveyed, this book usually comes up
second after ESL in many course syllabi. PRML is a great resource for
understanding the Bayesian derivations of classical machine learning algorithms.

Although PRML is very clear and rich in diagrams, to get the full benefit of it
you'll need advanced calculus, linear algebra, and optimization knowledge. Many
of the derivations do not show the intermediate steps, so it's important
to work through each step on your own for a good understanding.

Unlike the applied approach of ESL, PRML is more theoretical. Here's how Linear
Regression is introduced by Bishop:

$$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x})$$

where $\phi_j(\mathbf{x})$ are known as basis functions

Luckily, Bishop has also authored solutions to the exercises labeled “www” in the
book, making this book a possibility for self-study. You can find those solutions
as a PDF here.
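
To make the idea of basis functions less abstract, here is a minimal sketch (not taken from PRML) of a model of the form $y(x, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(x)$ for a scalar input. It assumes polynomial basis functions $\phi_j(x) = x^j$, which is only one possible choice; the helper names and weights are hypothetical.

```python
def polynomial_basis(j):
    """Return phi_j(x) = x**j, one common choice of basis function."""
    return lambda x: x ** j


def basis_model(x, w, basis):
    """Evaluate y(x, w) = w[0] + sum_{j=1}^{M-1} w[j] * phi_j(x)."""
    if len(w) != len(basis) + 1:
        raise ValueError("need one weight per basis function plus a bias w[0]")
    return w[0] + sum(w_j * phi(x) for w_j, phi in zip(w[1:], basis))


# Hypothetical weights, for illustration only (M = 3).
w = [1.0, 2.0, 3.0]
basis = [polynomial_basis(j) for j in (1, 2)]

y = basis_model(2.0, w, basis)   # 1 + 2*2 + 3*(2**2)
print(y)
```

The model stays linear in the weights even though it is nonlinear in the input, which is why it still counts as linear regression.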

#3 Machine Learning: A Probabilistic Perspective (“MLAPP”)

Amazon — Author: Kevin P. Murphy

MLAPP is another book recommended in almost every program; courses usually
assign either this or the previous book. Considered to be more comprehensive and
relevant than PRML, MLAPP is a very dense and broad encyclopedic guide to
machine learning.

It's a great resource for graduate courses, but since it's not freely available and the
solutions manual can only be purchased by professors, it's a little more closed
off than others in this list and is not recommended for self-study. Also, if you're a
beginner in machine learning, this textbook isn't an ideal starting point.

Here's how Linear Regression is introduced:

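$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + \epsilon = \sum_{j=1}^{D} w_j x_j + \epsilon$$

where $\mathbf{w}^T \mathbf{x}$ represents the inner or scalar product between
the input vector $\mathbf{x}$ and the model's weight vector $\mathbf{w}$, and $\epsilon$ is
the residual error between our linear predictions and the
true response.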

Within the next couple of lines, Murphy redefines this in probabilistic terms like
so:

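...we can rewrite the model in the following form:

$$p(y \mid \mathbf{x}, \boldsymbol{\theta}) = \mathcal{N}(y \mid \mu(\mathbf{x}), \sigma^2(\mathbf{x}))$$

This makes it clear that the model is a conditional probability
density.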

Without a more advanced math foundation, it's easy to get caught in the notation
when reading this book on your own.
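
If the probabilistic notation feels opaque, a small numeric sketch may help. The code below (not from MLAPP) treats the response as Gaussian around the linear prediction, assuming the mean is $\mathbf{w}^T \mathbf{x}$ and the noise variance is a single fixed number. The function names and values are hypothetical.

```python
import math


def gaussian_pdf(y, mean, variance):
    """Density of a univariate normal distribution N(y | mean, variance)."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coefficient * math.exp(-((y - mean) ** 2) / (2.0 * variance))


def conditional_density(y, x, w, variance):
    """p(y | x, theta), assuming mean = w^T x and a constant noise variance."""
    mean = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return gaussian_pdf(y, mean, variance)


# Hypothetical parameters, for illustration only.
w = [1.0, 2.0]
x = [1.0, 1.0]            # linear prediction w^T x = 3.0
variance = 1.0

density_at_mean = conditional_density(3.0, x, w, variance)
density_off_mean = conditional_density(4.0, x, w, variance)
print(density_at_mean, density_off_mean)
```

Responses close to the linear prediction get a higher density than responses far from it, which is all the "conditional probability density" phrasing is saying.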

#4 Deep Learning

Amazon or Free — Authors: Ian Goodfellow, Yoshua Bengio, and Aaron Courville
(series editor: Francis Bach)

Unlike the previous two books listed, this textbook offers a nice general
survey of math and machine learning methods. There are many concrete examples
and the math is much simpler than in MLAPP and PRML.

For example, Linear Regression is introduced like so:

Let $\hat{y}$ be the value that our model predicts $y$ should take on. We
define the output to be

$$\hat{y} = \mathbf{w}^\top \mathbf{x}$$

where $\mathbf{w} \in \mathbb{R}^n$ is a vector of parameters [and $\mathbf{x}$ is a
vector of inputs]

This notation is much more straightforward for beginners, and very similar to
how both the next book, ISLR, and Andrew Ng’s famous Machine Learning course
on Coursera present it.

Overall, this book serves as a good reference and starting point for digging
deeper elsewhere, but isn’t comprehensive by any means. There’s not much
direct application, so you won’t gain any insight into how to actually implement
neural networks, but it is a good high-level complement to deep learning
courses, such as the ones Andrew Ng has also created.

#5 An Introduction to Statistical Learning with Applications in R ("ISLR")

Amazon or Free — Authors: Gareth James, Daniela Witten, Trevor Hastie, and
Robert Tibshirani

I’ll start out by saying that this is a fantastic book. ISLR is usually recommended in
the first course of programs specifically built for data science, which makes a lot
of sense given how this book is structured.

Although not a thick book by any means, it’s derived from the #1 book, The
Elements of Statistical Learning, and comprehensively covers the fundamentals
every data scientist should know.

Not only is it extremely clear and accessible to those with a basic undergrad
math background, but it has a very applied approach. Every chapter comes with
exercises in R that let you practice applying the concepts you’re learning directly on
some data.

Furthermore, the authors of the book created an accompanying online course,
which follows each chapter and is totally free.

For comparison, here’s how ISLR introduces Linear Regression:

Mathematically, we can write this linear relationship as

$$Y \approx \beta_0 + \beta_1 X$$

You might read "$\approx$" as "is approximately modeled as". We will
sometimes describe [this equation] by saying that we are
regressing $Y$ on $X$ (or $Y$ onto $X$).

As you can see, ISLR is much more beginner-friendly. Each statistical/machine
learning concept is introduced just like this, without heavy notation, and in a very
approachable way.
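
For a sense of how little machinery this simplest case needs, here is a minimal sketch of fitting $Y \approx \beta_0 + \beta_1 X$ with the standard least-squares formulas for the slope and intercept. ISLR's own exercises are in R; this Python version, its function name, and its toy data are assumptions made purely for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) minimizing the squared error of y ~ beta0 + beta1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope: covariance of x and y divided by the variance of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    beta1 = s_xy / s_xx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Toy data lying exactly on the line y = 1 + 2x (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1)
```

Regressing Y on X here simply means choosing the intercept and slope that make the line track the observed points as closely as possible.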

Let me know your thoughts

Have you read any of the books listed? Did you use any of these in a course?
What did you think?

I'm going to continue compiling books I find in course syllabi from top
universities and frequently update this article, but I would also love to know what
you all think about each of these.

If a book you've found particularly helpful wasn't mentioned, leave a
comment and let me know!


Meet the Authors

Brendan Martin
Founder of LearnDataSci

Author and Editor at LearnDataSci. Python development and data science consultant.

© 2022 LearnDataSci. All rights reserved.