Data Processing/Analysis/Science With R: (A Brief) Introduction To R
Data
Processing/Analysis/Science
with R
(A Brief) Introduction to R
By Marin Fotache
What is R?
An open-source and free language and platform for:
◦ data gathering (from a wide variety of sources)
◦ data processing
◦ data exploration (data visualisation and data analysis)
◦ data mining / machine learning / deep learning
At its inception, R was targeted specifically at
statistical computing (it derives from the S programming
language)
Now it incorporates options for tasks ranging from
data processing to software development; still, it is
generally not regarded as a general-purpose programming
language (as Python is, for example)
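As a small illustration of the tasks listed above, here is a minimal sketch in base R. It uses the built-in mtcars data set, which is chosen here only for illustration and is not part of the original slides; the variable names are likewise hypothetical.

```r
# Minimal sketch using base R only; mtcars ships with R (datasets package).
data(mtcars)

# Data processing: keep the 4-cylinder cars and add a derived column
four_cyl <- subset(mtcars, cyl == 4)
four_cyl$kpl <- four_cyl$mpg * 0.425144   # miles per US gallon -> km per litre

# Data exploration: average fuel consumption per number of cylinders
mean_mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mean_mpg_by_cyl)

# Statistical computing: a simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```

The same three steps (process, explore, model) recur in most R sessions, whatever the data source.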
Why R? (Kabacoff, 2011)
R is free! (SPSS, SAS, etc. cost thousands or tens of thousands of
dollars)
R is a comprehensive statistical platform, offering all manner of
data analytic techniques
R has state-of-the-art graphics capabilities
R is a powerful platform for interactive data analysis and
exploration
R can easily import data from a wide variety of sources, including
text files, database management systems, statistical packages, and
specialized data repositories. It can write data out to these systems
as well
R provides an unparalleled platform for programming new
statistical methods in an easy and straightforward manner. It’s easily
extensible and provides a natural language for quickly
programming recently published methods
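The import/export point above can be sketched with a text file round trip. This is only an illustration under stated assumptions: the students data frame and its values are made up, and the file is a temporary one created on the fly. Other sources (databases, statistical packages) need additional packages and are not shown here.

```r
# Hypothetical example data (not from the slides)
students <- data.frame(
  name  = c("Ana", "Dan", "Ioana"),
  grade = c(9.5, 7, 8.25),
  stringsAsFactors = FALSE
)

# A temporary file path, so nothing is left behind on disk
csv_path <- tempfile(fileext = ".csv")

# Export: write the data frame out as a CSV text file
write.csv(students, csv_path, row.names = FALSE)

# Import: read the file back in; the result is again a data frame
students_back <- read.csv(csv_path, stringsAsFactors = FALSE)
str(students_back)

unlink(csv_path)  # clean up the temporary file
```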
Why R? (cont.)
R contains advanced statistical routines not yet
available in other packages. In fact, new methods
become available for download on a weekly basis
R has an (over) enthusiastic community of users
and developers
A variety of graphic user interfaces (GUIs) are
available, offering the power of R through menus
and dialogs.
R runs on a wide array of platforms, including
Windows, Unix, and Mac OS X
R's main limitations (Fotache & Strimbei, 2013)
User interface:
◦ R was initially based on the command prompt and scripts
◦ SPSS and Excel users find the transition to R's interface difficult
◦ IDEs like RStudio have hugely improved productivity and R's acceptance
A tidal wave of packages
◦ Deciding which function/package to use is not always an easy task
◦ Some of the packages are poorly maintained (unavailable in recent R versions)
◦ New packages require constant scanning of R literature/blogosphere
Data sourcing (not particular to R)
◦ In many cases ETL mechanisms are needed for gathering data from web logs,
sensors, mobile applications, Excel files, etc.
◦ Various packages have been developed in this respect
◦ Recent APIs and web services are beneficial
Functional programming (not only in R) requires some time to master
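To give a flavour of the functional style mentioned above, here is a minimal sketch using only base R functions (sapply, lapply, Filter, Reduce). The scores list and the helper above() are hypothetical names introduced for illustration.

```r
# Hypothetical input: exam scores for three made-up groups
scores <- list(g1 = c(7, 9, 10), g2 = c(5, 6), g3 = c(8, 8, 9, 10))

# Apply a function to every element instead of writing an explicit loop
group_means <- sapply(scores, mean)

# Functions are values: above() returns a new function (a closure)
above <- function(threshold) function(x) x[x > threshold]
above_7 <- above(7)                      # remembers threshold = 7
high_scores <- lapply(scores, above_7)   # scores greater than 7, per group

# Filter keeps the elements for which the predicate is TRUE
big_groups <- Filter(function(x) length(x) >= 3, scores)

# Reduce folds a vector into a single value with a binary function
total <- Reduce(`+`, sapply(scores, sum))

print(group_means)
print(names(big_groups))
print(total)
```

The part that usually takes time to get used to is passing functions as arguments and returning them as results, as above() does here.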
References/resources on R
RStudio Education (Beginners)
https://education.rstudio.com/learn/beginner/
R resources (free courses, books, tutorials, & cheat
sheets – updated regularly)
https://paulvanderlaken.com/2017/08/10/r-resources-cheatsheets-tutorials-books/
New to R? Kickstart your learning and career with
these 6 steps!
https://paulvanderlaken.com/2017/10/18/learn-r/
R Tutorials and Courses
References/resources on R (cont.)
CRAN - the main R site: http://cran.r-project.org
Books/e-Books which can be bought from:
◦ Amazon (unfortunately, statistical formulas are sometimes poorly
displayed in Kindle format)
◦ Publishers (Manning, Packt, …): PDF format widely provided -
excellent to read/display/code copy
Free e-books (see next slides) and PDF tutorials available
on the web
Presentations posted on SlideShare and on university or course
pages
Video-tutorials (mostly on YouTube) (2016)
http://flavioazevedo.com/stats-and-r-blog/2016/9/13/learnin
References/resources on R (cont.)
Journals:
https://github.com/marinfotache/Data-Processing-Analysis-Science-with-R/blob/master/01%20R_Introduction_Data%20Structures/01a_R_Introduction.R