Pred Analytics Writeup


The term “data science” is increasingly common, as is “big data.” But what does it mean? Is there something unique about it? What skills do “data scientists” need to be productive in a world deluged by data? What are the implications for scientific inquiry? Here, I address these questions from the perspective of predictive modeling.

Data science might seem to imply a focus on data and, by extension, statistics: the systematic study of the organization, properties, and analysis of data and its role in inference, including our confidence in the inference. But why do we need the term “data science” when we have had statistics for centuries?

As is evident from the figure, the volumes of unstructured and structured data grew worldwide from 2008 to 2015, with a projected difference between the two of almost 200 petabytes (PB) in 2015, compared to a difference of 50 PB in 2012.

Most data generated by humans and computers today is for consumption by computers; that is, computers increasingly do background work for each other and make decisions automatically. This scalability in decision making has become possible because of big data that serves as the raw material for the creation of new knowledge.

The emphasis on prediction is particularly strong in the machine learning and knowledge discovery in databases, or KDD, communities. Unless a learned model is predictive, it is generally regarded with skepticism.

The requirement of predictive accuracy on observations that will occur in the future is a key consideration in data science.
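To make the idea of accuracy on future observations concrete, here is a minimal sketch. It assumes a synthetic, time-ordered dataset and a simple least-squares line; neither comes from the article, and the names `fit_line` and `mean_squared_error` are illustrative only. The model is fitted on the earlier observations and scored on the later ones, which it never saw.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_squared_error(a, b, xs, ys):
    """Average squared gap between predictions and observed values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): a linear trend plus noise.
rng = random.Random(0)
xs = list(range(100))
ys = [2.0 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]

# Time-ordered split: the first 80 points are the "past", the last 20 the "future".
train_x, train_y = xs[:80], ys[:80]
test_x, test_y = xs[80:], ys[80:]

a, b = fit_line(train_x, train_y)
train_mse = mean_squared_error(a, b, train_x, train_y)
test_mse = mean_squared_error(a, b, test_x, test_y)

print(f"fitted slope: {b:.3f}")
print(f"error on past data: {train_mse:.3f}")
print(f"error on future data: {test_mse:.3f}")
```

The number that matters for prediction is the error on the held-out future points, not the error on the data used for fitting.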

Skills

Machine learning skills are fast becoming necessary for data scientists as companies navigate the data deluge and try to build automated decision systems that hinge on predictive accuracy.

Data scientists’ knowledge about machine learning must build on more basic skills that fall into three broad classes. The first is statistics, especially Bayesian statistics, which requires a working knowledge of probability, distributions, hypothesis testing, and multivariate analysis.

The second class of skills comes from computer science and pertains to how data is internally represented and manipulated by computers. This involves a sequence of courses on data structures, algorithms, and systems, including distributed computing, databases, parallel computing, and fault-tolerant computing, together with scripting languages (such as Python and Perl).

The third class of skills requires knowledge about correlation and causation and is at the heart of virtually any modeling exercise involving data. A data scientist should have a clear idea of the distinction between correlation and causality and the ability to assess which models are feasible, desirable, and practical in different settings.
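The distinction can be illustrated with a small simulation. This is a sketch under assumed conditions, not an example from the article: a hidden variable `z` drives both `x` and `y`, so the two are strongly correlated in observational data even though `x` has no effect on `y`. When `x` is instead set independently of `z` (an intervention), the correlation disappears. All variable names and parameters are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
n = 5000

# Hidden common cause (confounder).
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: x and y both depend on z, but not on each other.
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

# Interventional data: x is assigned at random, independently of z.
x_do = [rng.gauss(0, 1) for _ in range(n)]
y_do = [zi + rng.gauss(0, 0.5) for zi in z]

r_obs = pearson(x_obs, y_obs)
r_do = pearson(x_do, y_do)

print(f"correlation in observational data: {r_obs:.2f}")
print(f"correlation after intervening on x: {r_do:.2f}")
```

A model trained on the observational data would predict `y` from `x` well, yet changing `x` would do nothing to `y`; which of the two matters depends on the setting.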

Experienced data scientists are familiar with these problems and know how to formulate them in a way that gives a system a chance to make correct predictions under conditions where the priors are stacked heavily against it.
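One reading of "priors stacked heavily against it" is the rare-event case, where the thing to be predicted has a very low base rate. The sketch below applies Bayes' rule to a hypothetical predictor; the prior, sensitivity, and false-positive rate are made-up numbers for illustration, and `posterior_positive` is not a function from any library.

```python
def posterior_positive(prior, sensitivity, false_positive_rate):
    """P(event | positive prediction), computed with Bayes' rule."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# Rare event: 1 in 1,000 cases. The predictor catches 99% of events
# and raises a false alarm on 1% of non-events (all values assumed).
rare = posterior_positive(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)

# Same predictor applied to a balanced problem, for comparison.
balanced = posterior_positive(prior=0.5, sensitivity=0.99, false_positive_rate=0.01)

print(f"P(event | positive), rare event: {rare:.3f}")
print(f"P(event | positive), balanced:   {balanced:.3f}")
```

With the rare event, most positive predictions are false alarms even though the predictor looks accurate, which is why the way such a problem is formulated matters so much.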

Skills Organization
The data science revolution also poses serious organizational challenges as to how organizations manage their data scientists. Besides recognizing and nurturing the appropriate skill sets, it requires a shift in managers’ mind-sets toward data-driven decision making to replace or augment intuition and past practices.

The saying “In God we trust, everyone else please bring data” has come to characterize the new orientation, from intuition-based decision making to fact-based decision making.

From a decision-making standpoint, we are moving into an era of big data where for many types of problems computers are inherently better decision makers than humans, where “better” could be defined in terms of cost, accuracy, and scalability. This shift has already happened in the world of data-intensive finance where computers make the majority of investment decisions.

The same holds in areas of online advertising where millions of auctions are conducted in milliseconds every day, air traffic control, routing of package delivery, and many types of planning tasks that require scale, speed, and accuracy simultaneously.
