
ST5209/X Assignment 1

ZHAO ZIHUI

Set up

1. Make sure you have the following installed on your system: LaTeX, R 4.2.2+, RStudio
2023.12+, and Quarto 1.3.450+.
2. Clone the course repo.
3. Create a separate folder in the root directory of the repo, label it with your name,
e.g. yanshuo-assignments
4. Copy the assignment1.qmd file over to this directory.
5. Modify the duplicated document with your solutions, writing all R code as code chunks.
6. When running code, make sure your working directory is set to the folder with your
assignment .qmd file, e.g. yanshuo-assignments. This is to ensure that all file paths
are valid. [1]

Submission

1. Render the document to get a .pdf printout.
2. Submit both the .qmd and .pdf files to Canvas.

Question 1 (Quarto)

Read the guide on using Quarto with R and answer the following questions:

a) Write a code chunk that imports tidyverse and fpp3.

library(tidyverse)

[1] You may view and set the working directory using getwd() and setwd().

-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
v dplyr 1.1.4 v readr 2.1.5
v forcats 1.0.0 v stringr 1.5.1
v ggplot2 3.4.4 v tibble 3.2.1
v lubridate 1.9.3 v tidyr 1.3.0
v purrr 1.0.2
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to becom

library(fpp3)

-- Attaching packages ---------------------------------------------- fpp3 0.5 --


v tsibble 1.1.3 v fable 0.3.3
v tsibbledata 0.4.1 v fabletools 0.3.4
v feasts 0.3.1
-- Conflicts ------------------------------------------------- fpp3_conflicts --
x lubridate::date() masks base::date()
x dplyr::filter() masks stats::filter()
x tsibble::intersect() masks base::intersect()
x tsibble::interval() masks lubridate::interval()
x dplyr::lag() masks stats::lag()
x tsibble::setdiff() masks base::setdiff()
x tsibble::union() masks base::union()

b) Modify the chunk so that only the following output is shown (i.e. the usual output about
attaching packages and conflicts is not shown).

library(tidyverse)
library(fpp3)
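
One possible way to get this behaviour is with Quarto chunk options, written as `#|` comment lines at the top of the chunk. This is a sketch rather than the only approach: message: false drops the attach and conflict notices (which R emits as messages), warning: false additionally hides warnings, and the code itself is still echoed. In the .qmd file the chunk would open with {r} rather than r.

```r
#| message: false
#| warning: false
# The two option lines above are read by Quarto, not by R.
library(tidyverse)
library(fpp3)
```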

c) Modify the chunk so that it is executed but no code is shown at all when rendered to a
pdf.
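
A minimal sketch of one way to do this, again assuming Quarto chunk options: echo: false keeps the chunk running but leaves its source out of the rendered pdf. The message: false line is an assumption carried over so that the attach notices stay hidden too. In the .qmd file the chunk would open with {r} rather than r.

```r
#| echo: false
#| message: false
# The chunk is still executed; only the display of its source is suppressed.
library(tidyverse)
library(fpp3)
```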

d) Modify the document so that your name is printed on it beneath the title.

Question 2 (Livestock)

Consider the aus_livestock dataset loaded in the fpp3 package.

a) Use filter() to extract a time series comprising the monthly total number of pigs
slaughtered in Victoria, Australia, from Jul 1972 to Dec 2018.
# Keep Victorian pig slaughter counts between Jul 1972 and Dec 2018 (inclusive).
subset <- filter(
  aus_livestock,
  State == "Victoria",
  Animal == "Pigs",
  yearmonth(Month) >= yearmonth("1972 Jul") &
    yearmonth(Month) <= yearmonth("2018 Dec")
)
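
As an alternative sketch, the date window can also be written with filter_index() from tsibble, which filters on the index column directly. This assumes fpp3 is attached; the name pigs_vic is made up for illustration and is not used elsewhere in this document.

```r
library(fpp3)

# pigs_vic is a hypothetical name chosen for this sketch.
pigs_vic <- aus_livestock |>
  filter(State == "Victoria", Animal == "Pigs") |>
  # Keep rows whose Month falls in the window Jul 1972 to Dec 2018.
  filter_index("1972 Jul" ~ "2018 Dec")
```

The formula form `start ~ end` treats both end points as included, so it should select the same rows as the two comparisons above.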

b) Make a time plot of the resulting time series.


library(ggplot2)
Sys.setlocale("LC_TIME", "en_US.UTF-8")

[1] "en_US.UTF-8"
# Name the plot variable explicitly.
autoplot(subset, Count)

[Figure: time plot of Count against Month [1M].]

Question 3 (Data cleaning)

Inspect the function process_sgcpi() located in _code/clean_data.R. This function is used
to convert the raw Consumer Price Index (CPI) data in _data/raw/sg-cpi.csv into a tsibble,
stored in _data/cleaned/sgcpi.rds.

a) In line 9, what does skip = 10 and n_max = 152 do? Why do we need to do this when
reading the csv file?
Answer: skip = 10 skips the first 10 lines of the csv file. n_max = 152 caps the number
of rows read at 152, so any rows after that are not read. Together they make sure that
only the data of interest is read and the irrelevant parts of the file are left out.
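
The effect of the two arguments can be sketched on a toy file. The contents below are invented for illustration (they are not the sg-cpi.csv data), and the names raw_text and toy are hypothetical; the sketch assumes readr 2.0 or later, where literal text is passed with I().

```r
library(readr)

# Invented file contents: two note lines, a header row, three data rows, a footer.
raw_text <- paste(
  "Title: toy price index",
  "Source: made up for illustration",
  "series,2020 Jan,2020 Feb",
  "All Items,100,101",
  "Food,98,99",
  "Transport,102,103",
  "Footnote: values are invented",
  sep = "\n"
)

# skip = 2 drops the two note lines before the header row;
# n_max = 3 stops after three data rows, so the footer is never parsed.
toy <- read_csv(I(raw_text), skip = 2, n_max = 3, show_col_types = FALSE)
toy
```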
b) In line 14, what does t() do? Why do we need to do this in order to make a tsibble?
Answer: The function t() transposes a matrix or data frame, so the original rows become
columns and the original columns become rows. After transposing, the time points run
down the rows, which gives as_tsibble() a column it can use as the index.
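
A small base R sketch of the transpose step, using an invented matrix (not the CPI data) in which each row is a series and each column is a month. The names raw and long are hypothetical.

```r
# Invented wide layout: one row per series, one column per month.
raw <- matrix(
  c(100, 101, 102,
     50,  51,  52),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("All Items", "Food"),
                  c("2020 Jan", "2020 Feb", "2020 Mar"))
)

# After t(), each month is a row and each series is a column.
long <- t(raw)
dim(raw)
dim(long)
```

In the transposed layout the month labels sit in the row names, which is the shape needed before a time column can be created and passed as the index.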

Question 4 (Beer production)

Consider the aus_production dataset loaded in the fpp3 package. We will study the column
measuring the production of beer.

a) Make a time plot of the beer production time series.


aus_production |> autoplot(.vars = Beer)

[Figure: time plot of Beer against Quarter [1Q].]

b) Describe the observed trend.


Answer: The time plot shows an increasing trend in beer production from 1960 to
approximately 1975, followed by a stable period from 1976 to 1983. There is then a slight
increase from 1984 to 1989, and finally a slight decreasing trend in the following
years.
c) Make a seasonal plot.
aus_production |> gg_season(y = Beer)

[Figure: seasonal plot of Beer against Quarter (Q1 to Q4), one line per year, with a colour legend running from 1965 to 2005.]
d) What is the period of the seasonality?
Answer: The period is 4 quarters, i.e. one year.
e) Describe the seasonal behavior.
Answer: Beer production decreases from Q1 to Q2 and increases from Q2 to Q4, with the
sharpest increase from Q3 to Q4.

Question 5 (Pelts)

Consider the pelt dataset loaded in the fpp3 package, which measures the Hudson Bay
Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935.

a) Plot both time series on the same axes. Hint: Use pivot_longer() to create a key
column.
fit2 <- pivot_longer(pelt, cols = c(Hare, Lynx))
fit2 |> autoplot()

Plot variable not specified, automatically selected `.vars = value`

[Figure: time plot of value against Year [1Y], one line per name (Hare, Lynx).]


b) What happens when you try to use gg_season() to the lynx fur time series? What is
producing the error?
# pelt |> gg_season()

Answer: Running the commented-out call produces the following message and error:

Plot variable not specified, automatically selected `y = Hare`
Error in `gg_season()`:
! The data must contain at least one observation per seasonal period.
Backtrace: 1. feasts::gg_season(pelt)

gg_season() needs at least one observation per seasonal period. The pelt data is
annual, so it contains no sub-yearly seasonal period to plot, and the call fails.
c) Make a lag plot with the first 20 lags. Which lags display strong positive correlation?
Which lags display strong negative correlation? Verify this with the time plot.
pelt |> gg_lag(y = Lynx, lags = 1:20, geom = "point")

[Figure: lag plots of Lynx against lag(Lynx, n), one panel per lag for lags 1 to 20.]

Answer: Lags 1, 19 and 20 display strong positive correlation. Lags 5, 14 and 15 display
strong negative correlation. The time plot below can be used to verify this.
pelt |> autoplot()

Plot variable not specified, automatically selected `.vars = Hare`

[Figure: time plot of Hare against Year [1Y].]

d) If you were to guess the seasonality period based on the lag plot, what would it be?
Answer: It would be 10 years.
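
How lag correlations point to a period can be sketched on a toy series. This uses a synthetic sine wave with a known 10-step cycle and base R's acf(), not the pelt data; the names t, x and r are made up for illustration.

```r
# Synthetic series with a known 10-step cycle (NOT the pelt data).
t <- 1:90
x <- sin(2 * pi * t / 10)

# Sample autocorrelations at lags 1 to 20 (the first element is lag 0, so drop it).
r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]
names(r) <- 1:20

# Lags 10 and 20 are whole cycles apart; lags 5 and 15 are half a cycle out of phase.
round(r[c(5, 10, 15, 20)], 2)
```

In this toy case the correlations are positive at multiples of the cycle length and negative halfway between them, which is the pattern to look for when guessing a period from a lag plot.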
e) Use the provided function gg_custom_season() in _code/plot_util.R [2] to make a
seasonal plot for lynx furs with the period that you guessed. [3] Does the resulting plot
suggest seasonality? Why or why not?
source("../_code/plot_util.R")

Loading required package: rlang

Attaching package: 'rlang'

The following objects are masked from 'package:purrr':

%@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
flatten_raw, invoke, splice
gg_custom_season(pelt, y = Lynx, period = 40)

[Figure: seasonal plot of Lynx against Season (0 to 40), one line per Iteration (1, 2, 3).]

Answer: The guessed period is 10 years. The plot suggests seasonality, although the
length of each cycle varies somewhat. Within each 10-year stretch, the series roughly
decreases over the first 7 years and increases over the following 3 years.
[2] You can load this function using source("../_code/plot_util.R").
[3] Unfortunately, it seems gg_season() does not allow this functionality.
