Data Science Training Institute in Hyderabad


Musings on Data Science and Students Experiencing Data Analytics

Presented By

www.kellytechno.com

The first war: Terminology

Analyzing data has a long history!
Many terms have been used to describe such endeavors:

Statistics
Artificial Intelligence
Machine learning
Data analytics

Since I happen to work in a Data Science program, perhaps I may be allowed the indulgence of using that terminology.

Whatever we call it, what makes things different now?


Experiments, observations, and numerical simulations in many areas of science and business are currently generating terabytes of data, and in some cases are on the verge of generating petabytes and beyond. Analyses of the information contained in these data sets have already led to major breakthroughs in fields ranging from genomics to astronomy and high-energy physics and to the development of new information-based industries.

Given a large mass of data, we can by judicious selection construct perfectly plausible unassailable theories, all of which, some of which, or none of which may be right.
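To make the "judicious selection" point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are pure random noise, and the names `pearson` and `best_spurious_correlation` are hypothetical helpers, not anything from the original talk. It generates many noise features, then hand-picks the one that correlates best with an equally random target.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_samples=20, n_features=1000, seed=0):
    """Return (strongest |r|, |r| of the first feature) on pure noise."""
    rng = random.Random(seed)
    # The target is noise, so no feature can truly "explain" it.
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    correlations = []
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        correlations.append(abs(pearson(feature, target)))
    return max(correlations), correlations[0]


if __name__ == "__main__":
    strongest, first = best_spurious_correlation()
    print(f"first feature |r| = {first:.2f}, hand-picked best |r| = {strongest:.2f}")
```

With only 20 samples and 1000 candidate features, the hand-picked winner tends to look like a respectable correlation even though nothing real is there, which is exactly the trap the quote describes.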

The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it: that's going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.


What is Big Data?

There are many examples of "data", but what makes some of it big? The classic definition revolves around the three Vs: volume, velocity, and variety.

Volume: There is just a lot of it being generated all the time. Things get interesting, and big, when you can't fit it all on one computer anymore. Why? There are many ideas here, such as MapReduce, Hadoop, etc., that all revolve around being able to process data that goes from Terabytes, to Petabytes, to Exabytes.

http://pl.wikipedia.org/wiki/Green_Giant#mediaviewer/Plik:Jolly_green_giant.jpg

Velocity: Data is being generated very quickly. Can you even store it all? If not, then what do you get rid of and what do you keep?

Variety: The data types you mention all take different shapes. What does it mean to store them so that you can play with or compare them?
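The MapReduce idea mentioned under Volume can be sketched on a single machine. This is only a toy illustration in plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the "cluster" is just a loop. The point is the shape of the computation: each chunk is mapped independently, the results are grouped by key, and each key is reduced on its own.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (or chunk) would be mapped on a
    # different machine; here the "machines" are just loop iterations.
    emitted = []
    for document in documents:
        emitted.extend(map_phase(document))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
```

Because no map call needs to see any other document, and no reduce call needs to see any other key, the work can be spread over as many computers as the data requires.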
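One possible answer to the Velocity question of what to keep is reservoir sampling, which holds on to a fixed-size uniform random sample of a stream without ever storing the whole thing. The talk does not name this technique; it is offered here as one illustrative option, and `reservoir_sample` is a hypothetical helper name.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1); if it does,
            # it overwrites a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(10_000), 5)` returns five of the ten thousand values while only ever holding five in memory.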


Is Big Data the same as Data Science?

Are Big Data and Data Science the same thing?

I wouldn't say so...

Data Science can be done on small data sets. And not everything done using Big Data would necessarily be called Data Science. But there certainly is a substantial overlap!

Figure: diagram relating Data Science and Big Data.

Can you even be certain?

For real-world problems, I claim that you will never be certain of any inferences from data.
I mean, what happens to your carefully thought-out marketing plan for some rocking slacks when the Martians land?
What is unacceptable is when the data you actually have does not support the conclusion you report.

Public domain image


Skills for Data Science


Which is most important?

http://en.wikipedia.org/wiki/View_of_the_World_from_9th_Avenue


WPI Data Science Program: A Collaboration

Mathematical Sciences Department
Computer Science Department
Business School

Data Science Core

INTEGRATIVE DATA SCIENCE:
DS 501 INTRODUCTION TO DATA SCIENCE (NEW COURSE)

MATHEMATICAL ANALYTICS (SELECT ONE):
MA 543/DS 502 STATISTICAL METHODS FOR DATA SCIENCE (NEW COURSE)
MA 542 REGRESSION ANALYSIS
MA 554 APPLIED MULTIVARIATE ANALYSIS

DATA ACCESS AND MANAGEMENT (SELECT ONE):
CS 542 DATABASE MANAGEMENT SYSTEMS
MIS 571 DATABASE APPLICATIONS DEVELOPMENT
CS 561 ADVANCED TOPICS IN DATABASE SYSTEMS
CS 585/DS 503 BIG DATA MANAGEMENT (NEW COURSE)

DATA ANALYTICS AND MINING (SELECT ONE):
CS 548 KNOWLEDGE DISCOVERY AND DATA MINING
CS 539 MACHINE LEARNING
CS 586/DS 504 BIG DATA ANALYTICS (NEW COURSE)

BUSINESS INTELLIGENCE AND CASE STUDIES (SELECT ONE):
MIS 584 BUSINESS INTELLIGENCE
MKT 568 DATA MINING BUSINESS APPLICATIONS

Data Science Certificate Program (18 credits): 15 CREDIT DATA SCIENCE CORE plus 3 CREDIT ELECTIVE
