R Crash Course For Business


R Crash Course
Kelley 303
Technology and Business Analyst

Olga Scrivner @obscrivn


https://www.linkedin.com/in/olgascrivner/

Some material is adapted from YaRrr! The Pirate’s Guide to R by Nathaniel D. Phillips (2018)
https://bookdown.org/ndphillips/YaRrr/

Outline
Part 1 – Introduction to R and RStudio
Part 2 – Working with data
Part 3 – R Graphics
Part 4 – Deployment, Sharing, and Publishing with R

RStudio Basics

If you already know RStudio, skip this section.

Getting Started

Two software packages: base R and RStudio

Getting Started

RStudio is a graphical user interface for R.

http://www.rstudio.com/products/rstudio/download/

RStudio Window

1. SSRC desktop – Start button

2. RStudio shortcut icon

RStudio Screen

Open New File

R scripts are just text files with the “.R” extension.

RStudio Tour

Editor – “Source/Script”

Console – “Command Line”

History and Objects

Extremely Useful Panel

Files

The Files panel gives you access to the folders and files on your hard drive.

Files for Workshop

1. Download and unzip the files from https://languagevariationsuite.wordpress.com/
2. Create a folder named CrashCourse on your Desktop
3. Place the unzipped files in the CrashCourse folder

Telling R Where Your Files Are

Select Session > Set Working Directory > Choose Directory >
select the CrashCourse folder on your Desktop > Open
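The same step in code, as a minimal sketch. The path below assumes the CrashCourse folder sits in the Desktop folder of your home directory; adjust it for your computer. getwd() confirms where R is looking.

```r
# Assumed location of the workshop folder; change this for your computer
course_dir <- file.path(path.expand("~"), "Desktop", "CrashCourse")

# Only switch directories if the folder actually exists
if (dir.exists(course_dir)) {
  setwd(course_dir)
}

# Show the current working directory and the files R can see there
print(getwd())
print(list.files())
```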

Packages – Useful Libraries
1. Shows a list of all the R packages installed
2. Indicates whether the package is currently loaded

Packages

Click the checkbox next to a package to load (activate) or unload (deactivate) it
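From code, the Packages panel roughly corresponds to installed.packages() (the list), search() (what is loaded), library() (tick) and detach() (untick). A small sketch using splines, a package that ships with R but is not loaded by default; the choice of package is only for illustration.

```r
# Names of all installed packages (what the Packages panel lists)
installed <- rownames(installed.packages())
print(head(installed))

# Loaded (attached) packages appear in the search path as "package:<name>"
print(search())

# Load a package: similar to ticking its checkbox
library(splines)
print("package:splines" %in% search())

# Unload it again: similar to unticking the checkbox
detach("package:splines")
print("package:splines" %in% search())
```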

Select Package Content – stats

Click on a package name to access its content.

Shortcut to find it quickly

Scroll Stats Content

Do you recognize any statistical functions?
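To list the stats functions from code instead of scrolling, ls() can read the loaded package. The five names checked below are just familiar examples.

```r
# Every object exported by the stats package (loaded by default)
stats_functions <- ls("package:stats")
print(length(stats_functions))

# Check for a few familiar statistical functions
familiar <- c("sd", "median", "t.test", "cor.test", "lm")
print(familiar %in% stats_functions)
```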

Help!

1. TYPE IN CONSOLE:
?hist

2. PRESS ENTER
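help("hist") does the same as ?hist. In a script it can be wrapped in interactive() so it only runs when a person is at the Console, and args() prints the arguments without opening the Help panel. A sketch:

```r
# Open the help page for hist() only in an interactive session
# (for example the RStudio Console); same as typing ?hist
if (interactive()) {
  help("hist")
}

# Print the argument list of hist() without opening the Help panel
print(args(hist))

# The argument names as a character vector
hist_args <- names(formals(hist))
print(hist_args)
```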

Help!

1. Review the Help panel

2. Scroll down the histogram function page: what are the arguments?
hist(x, ...)
x – “a vector of values for which the histogram is desired”

Create Histogram

1. Return to your Editor file (it is empty)
2. Place your cursor on line 1
3. Type any set of numbers
• x = c(1,2,3,4,5)
4. Keep your cursor on line 1
5. Click RUN
6. On line 2, type
• hist(x)
7. Click RUN
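The same two lines as a script, with one optional extra: with plot = FALSE, hist() returns the bin breaks and counts instead of drawing. At the top level, <- and = both assign; many R style guides prefer <-.

```r
# The same vector as in the steps above
x <- c(1, 2, 3, 4, 5)

# Draw the histogram (appears in the Plots panel)
hist(x)

# Compute the histogram without drawing it, to inspect bins and counts
h <- hist(x, plot = FALSE)
print(h$breaks)
print(h$counts)
```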
Organizing the RStudio Layout

Try adjusting the size of the panels.

RUN versus ENTER: Editor versus Console

1. TYPE IN CONSOLE
2. PRESS THE ENTER KEY (keyboard)
3. TYPE IN EDITOR
4. CLICK RUN
5. RESULTS APPEAR IN THE CONSOLE

Opening and Saving Files

Packages

CRAN (https://cran.r-project.org/) is the main source of R packages.

An R package bundles data, functions, and examples (vignettes) together.

Source: https://bookdown.org/ndphillips/YaRrr/packages.html

Installing Packages Practice – Wordcloud
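A minimal sketch of the install-then-load pattern for wordcloud. It assumes an internet connection and uses the cloud.r-project.org CRAN mirror; the words and frequencies are made-up toy values, only there to check that the package works.

```r
# Install wordcloud from CRAN only if it is not installed yet
if (!requireNamespace("wordcloud", quietly = TRUE)) {
  install.packages("wordcloud", repos = "https://cloud.r-project.org")
}

# Load the package for this session
library(wordcloud)

# Hypothetical toy words and frequencies
words <- c("data", "R", "business", "plot", "model", "pirates")
freq  <- c(10, 8, 6, 5, 4, 3)

# min.freq = 1 keeps every word (the default drops words with freq < 3)
wordcloud(words = words, freq = freq, min.freq = 1)
```

install.packages() is needed once per computer; library() is needed once per R session.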

Exploring Data

Load CSV into R

NOTE: Remember the working directory? Go back to Slide 16 (“Telling R Where Your Files Are”) if needed.

pirates = read.csv("pirates.csv")

By default for CSV files:
read.csv(file, header = TRUE, sep = ",")

You can use single quotes or double quotes around file names.
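Because pirates.csv may not be on every machine, this sketch writes a tiny made-up CSV to a temporary file and reads it back twice, once with the defaults and once with them spelled out, to show the results match.

```r
# Hypothetical three-row CSV written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age", "1,male,28", "2,female,31", "3,other,25"), tmp)

# header = TRUE and sep = "," are already the defaults
toy <- read.csv(tmp)

# Spelling the defaults out gives the same data frame
toy2 <- read.csv(tmp, header = TRUE, sep = ",")
print(identical(toy, toy2))
print(toy)
```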
Data Storage in R

View Entire Dataset

Click on pirates in the Environment panel (or run View(pirates)) to open the whole dataset.
Take a Closer Look at Your Data

head() – look at the first few rows (six by default)

Type head(pirates) in your script and click RUN.
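head() also takes an n argument. Here R's built-in mtcars data stands in for pirates so the snippet runs anywhere; dim() and str() are two other quick looks at a data frame.

```r
# Built-in mtcars stands in for pirates
first_rows  <- head(mtcars)         # first 6 rows by default
first_three <- head(mtcars, n = 3)  # choose how many rows
print(first_three)

# Other quick looks at a data frame
print(dim(mtcars))  # number of rows and columns
str(mtcars)         # column names and types
```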

Descriptive Statistics: Min, Max, Mean, Table

Continuous variables:
mean(pirates$age)
max(pirates$height)

Categorical variable:
table(pirates$sex)

pirates$age reads as: dataset name $ column name
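The same functions on R's built-in mtcars data, used here as a stand-in for pirates (mpg is continuous, cyl is categorical). If a column has missing values, mean(), min() and max() return NA unless you add na.rm = TRUE.

```r
# Continuous column: summary statistics
mean_mpg <- mean(mtcars$mpg)
min_mpg  <- min(mtcars$mpg)
max_mpg  <- max(mtcars$mpg)
print(c(mean = mean_mpg, min = min_mpg, max = max_mpg))

# Categorical column: table() counts how often each category occurs
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```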
Summary of Stats Functions

Practice each function: type the examples in your editor and click RUN.
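Aggregate Function

aggregate(age ~ sex, data = pirates, FUN = mean)
• age ~ sex – names of columns
• data – name of dataset
• FUN – type of statistic

Change mean to sum: aggregate(age ~ sex, data = pirates, FUN = sum)

Note: from R 4.2.0 on, naming the formula (formula = age ~ sex) stops with an error because that argument was renamed; leaving it unnamed works in every version.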
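A runnable sketch on the built-in mtcars data (standing in for pirates): mean and total mpg for each number of cylinders, with the formula passed unnamed.

```r
# Mean mpg for each number of cylinders
by_cyl_mean <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(by_cyl_mean)

# Change the statistic by changing FUN
by_cyl_sum <- aggregate(mpg ~ cyl, data = mtcars, FUN = sum)
print(by_cyl_sum)
```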
Plotting

You can select and highlight two lines, then click RUN, or run one line at a time.
If the plot does not fit your window, adjust the RStudio layout and run the plot again. Otherwise R may report:
Error in plot.new() : figure margins too large
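A sketch with mtcars standing in for pirates. Besides resizing the Plots panel, shrinking the plot margins with par(mar = ...) is another common way around the "figure margins too large" error; the margin values below are only an example.

```r
# Scatterplot of car weight against fuel economy
plot(x = mtcars$wt, y = mtcars$mpg)

# Margins are given in lines of text: bottom, left, top, right.
# par() returns the old settings so they can be restored afterwards.
old_par <- par(mar = c(4, 4, 2, 1))
plot(x = mtcars$wt, y = mtcars$mpg)
par(old_par)  # restore the previous margins
```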

Saving Plot
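Besides the Export button in the Plots panel, a plot can be saved from code by opening a file device, drawing, and closing the device with dev.off(). A sketch using a PDF device (png() works the same way where it is supported); the file name and size are arbitrary examples.

```r
# Example output path in a temporary folder
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")

pdf(file = out_file, width = 8, height = 6)  # open a PDF file device (size in inches)
plot(x = mtcars$wt, y = mtcars$mpg)          # draw into the file
dev.off()                                    # close the device and write the file

print(file.exists(out_file))
```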

Adding Title, Labels, and Color
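The main, xlab, ylab and col arguments control the title, axis labels and color; pch picks the point symbol. A sketch on mtcars (the labels and color name are just examples).

```r
# main = title, xlab / ylab = axis labels, col = point color, pch = point shape
plot(x = mtcars$wt, y = mtcars$mpg,
     main = "Fuel Economy by Car Weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     col  = "steelblue",
     pch  = 16)
```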

Adding Regression Line
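A regression line is usually added by fitting a model with lm() and passing it to abline(). A sketch on mtcars, standing in for pirates.

```r
# Fit a simple linear model: mpg explained by weight
fit <- lm(mpg ~ wt, data = mtcars)

# Plot the points, then draw the fitted line on top
plot(mpg ~ wt, data = mtcars, main = "Fuel Economy by Car Weight")
abline(fit, col = "red", lwd = 2)

# Intercept and slope of the line
print(coef(fit))
```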

Box Plot

boxplot
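A sketch of a grouped box plot with the formula interface, on mtcars. With plot = FALSE, boxplot() returns the five numbers behind each box instead of drawing.

```r
# One box per group: mpg split by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon",
        main = "Fuel Economy by Cylinder Count")

# The numbers behind the boxes, without drawing
stats_by_cyl <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats_by_cyl$stats)
```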

How to Interpret Box Plot

Interquartile range – the middle 50% of the data

Median – the midpoint of the data
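These numbers can also be computed directly. A sketch on mtcars$mpg; note that boxplot() draws its box from Tukey's hinges (as in fivenum()), which can differ slightly from quantile()'s default method.

```r
x <- mtcars$mpg

# 25th, 50th and 75th percentiles
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(quartiles)

print(median(x))  # the line inside the box
print(IQR(x))     # the middle 50%: third quartile minus first quartile
```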

Source: https://www.wellbeingatschool.org.nz/information-sheet/understanding-and-interpreting-box-plots
How to Interpret Box Plot

The box plot is tall – a lot of variation

The box plot is short – very little variation

One box plot is much higher or lower than another – a difference between groups

The 4 sections of the box plot are uneven in size – more or less variation in different quartiles

Same median, different distribution

Source: https://www.wellbeingatschool.org.nz/information-sheet/understanding-and-interpreting-box-plots
Hypothesis testing

T-test – two sample
Difference between the means of two groups

TWO-SIDED – you are testing for the possibility of a relationship in both directions

ONE-SIDED – you are testing for the possibility of a relationship in one direction (x > y OR x < y)
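In R, the alternative argument of t.test() picks the direction. A sketch on mtcars comparing mpg between automatic (am = 0) and manual (am = 1) cars, standing in for the pirates example; with a formula, "less" tests whether the first group's mean is lower than the second group's.

```r
# Two-sided test (the default, alternative = "two.sided")
two_sided <- t.test(mpg ~ am, data = mtcars)
print(two_sided)

# One-sided test: is the mean of group am = 0 LESS than group am = 1?
one_sided <- t.test(mpg ~ am, data = mtcars, alternative = "less")

# The p-values can be pulled out and compared with a chosen cutoff such as 0.05
print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
```

When the observed difference points in the tested direction, the one-sided p-value is half of the two-sided one.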

Source: https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-the-differences-between-one-tailed-and-two-tailed-tests/
T-test – two sample
Difference between the means of two groups

p-value of 0.7 – we fail to reject our NULL hypothesis that there is NO difference in the mean age of pirates who wear headbands and those who do not

Correlation Testing
Correlation between two variables

p-value < 0.0000000000000002 – we reject our NULL hypothesis: there is a significant (positive) relationship between a pirate’s height and weight
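A sketch of the same kind of test with cor.test(), on two mtcars columns (car weight and horsepower) standing in for pirate height and weight.

```r
# Pearson correlation test (the default method)
ct <- cor.test(x = mtcars$wt, y = mtcars$hp)
print(ct)

print(ct$estimate)  # correlation coefficient r; its sign gives the direction
print(ct$p.value)   # compare with your significance level, e.g. 0.05
```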

Linear Regression Model
Can we use a pirate’s age, weight, and number of tattoos to predict how many treasure chests they found?

DV – dependent variable
IV – independent variable
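In lm(), the DV goes to the left of ~ and the IVs go to the right, joined by +. A sketch on mtcars; for the pirates data the call would look like lm(tchests ~ age + weight + tattoos, data = pirates), assuming columns named tchests, age, weight and tattoos (check your CSV).

```r
# DV: mpg; IVs: weight, horsepower, number of cylinders
model <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# Coefficients, p-values and R-squared
print(summary(model))
```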

Time Series

Import EU Stocks Data 1991-1998

eu <- read.csv("eustockmarkets.csv")

head(eu)

Create Time Series EU Stocks 1991-1998

time_series <- ts(eu, start=1991, end=1998, frequency = 260)

start(time_series)

end(time_series)
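R ships a built-in copy of similar data as EuStockMarkets (daily closing prices of the DAX, SMI, CAC and FTSE indices). Whether eustockmarkets.csv is an export of it is an assumption, but it makes a handy check. The sketch also shows that when both start and end are given, ts() keeps only as many rows as fit that span.

```r
# Built-in multivariate time series, 260 business days per year
print(colnames(EuStockMarkets))
print(start(EuStockMarkets))
print(frequency(EuStockMarkets))

# Turn it into a plain numeric matrix, like data read from a CSV
values <- as.matrix(as.data.frame(EuStockMarkets))

# Only start given: ts() works out the end from the number of rows
rebuilt <- ts(values, start = start(EuStockMarkets), frequency = 260)
print(end(rebuilt))

# Both start and end given: rows beyond the span are dropped
fixed_span <- ts(values, start = 1991, end = 1998, frequency = 260)
print(nrow(values))
print(nrow(fixed_span))
```

From 1991 to 1998 at 260 observations per year there is room for 7 × 260 + 1 = 1821 rows, so the remaining rows are cut off.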

Plot Time Series EU Stocks 1991-1998

plot(time_series)

Time Series Plot EU Stocks 1991-1998

ts.plot(time_series, col = 1:4, xlab = "Year", ylab = "Index Value", main = "Major European Stock Indices, 1991-1998")

Color – col = 1:4 gives each column (stock index) its own color

Add Legend - Time Series Plot EU Stocks 1991-1998
ts.plot(time_series, col = 1:4, xlab = "Year", ylab = "Index Value", main = "Major European Stock Indices, 1991-1998")

legend("topleft", legend = colnames(time_series), col = 1:4, lty = 1)

Debugging

Debugging R

The + symbol means that R is waiting for you to (properly) finish your code (for example, you forgot a closing parenthesis).

The > symbol means that R is ready for new code.
Source: https://bookdown.org/ndphillips/YaRrr/debugging.html
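A sketch of what the + prompt means, reproduced with parse(): a line missing its closing parenthesis is an incomplete expression. In the RStudio Console you can also press Esc to cancel and get the > prompt back.

```r
# Missing closing parenthesis: in the Console, R would show "+" and wait
incomplete <- "mean(c(1, 2, 3)"
result <- tryCatch(parse(text = incomplete),
                   error = function(e) conditionMessage(e))
print(result)

# The complete version parses and runs normally
complete <- "mean(c(1, 2, 3))"
print(eval(parse(text = complete)))
```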
Debugging R

Console:

Source: https://bookdown.org/ndphillips/YaRrr/debugging.html
Debugging R

Source: https://bookdown.org/ndphillips/YaRrr/debugging.html
Debugging R

Console:

Source: https://bookdown.org/ndphillips/YaRrr/debugging.html
Debugging R

Console:

Source: https://bookdown.org/ndphillips/YaRrr/debugging.html
Syntax

R Syntax

Strings

Strings: Practice

Vector

Length

length() – a function that returns the number of elements in a vector
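A short sketch of c(), length() and nchar() on hypothetical example values. length() counts the elements of a vector, while nchar() counts the characters inside each string.

```r
# c() combines values into a vector
numbers <- c(4, 8, 15, 16, 23, 42)
words   <- c("treasure", "map", "ship")  # a character (string) vector

print(length(numbers))  # number of elements
print(length(words))

# Characters in each string, element by element
print(nchar(words))
```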

