Assignment1 1
ZHAO ZIHUI
Set up
1. Make sure you have the following installed on your system: LaTeX, R 4.2.2+, RStudio
2023.12+, and Quarto 1.3.450+.
2. Clone the course repo.
3. Create a separate folder in the root directory of the repo, label it with your name,
e.g. yanshuo-assignments
4. Copy the assignment1.qmd file over to this directory.
5. Modify the duplicated document with your solutions, writing all R code as code chunks.
6. When running code, make sure your working directory is set to the folder containing your
assignment .qmd file, e.g. yanshuo-assignments. This ensures that all file paths
are valid (see footnote 1).
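As a minimal sketch of step 6, the snippet below switches the working directory, resolves a relative path against it, and then restores the original directory. The folder is a temporary directory standing in for something like yanshuo-assignments, and the file name example.csv is purely illustrative; neither comes from the course repo.

```r
# Remember the current working directory so it can be restored afterwards.
original_dir <- getwd()

# Hypothetical stand-in for your assignment folder (e.g. yanshuo-assignments).
assignment_dir <- tempdir()
setwd(assignment_dir)

# Relative paths are now resolved against assignment_dir.
writeLines("x,y\n1,2", "example.csv")   # illustrative file, not part of the course
found <- file.exists("example.csv")

# Clean up and go back to where we started.
file.remove("example.csv")
setwd(original_dir)
```

In RStudio the same effect is usually achieved by opening the project or by setting the working directory to the source file location, so an explicit setwd() call is often unnecessary.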
Submission
Question 1 (Quarto)
Read the guide on using Quarto with R and answer the following questions:
library(tidyverse)
Footnote 1: You may view and set the working directory using getwd() and setwd().
-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
v dplyr 1.1.4 v readr 2.1.5
v forcats 1.0.0 v stringr 1.5.1
v ggplot2 3.4.4 v tibble 3.2.1
v lubridate 1.9.3 v tidyr 1.3.0
v purrr 1.0.2
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to becom
library(fpp3)
a) Modify the chunk so that only the following output is shown (i.e. the usual output about
attaching packages and conflicts is not shown.)
library(tidyverse)
library(fpp3)
b) Modify the chunk so that it is executed but no code is shown at all when rendered to a
pdf.
c) Modify the document so that your name is printed on it beneath the title.
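One possible approach to part (a), sketched here under the assumption that the chunk uses Quarto's `#|` option comments: setting message to false hides the package attachment and conflict messages while leaving the code itself visible. The block below is the body of such an R chunk; the option lines are ordinary comments as far as R is concerned.

```r
#| message: false
#| warning: false
library(tidyverse)
library(fpp3)
```

For part (b), the analogous option would be `#| echo: false`, which runs the chunk but omits its code from the rendered output. For part (c), the author name is normally supplied through an `author:` field in the YAML header at the top of the .qmd file.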
Question 2 (Livestock)
a) Use filter() to extract a time series comprising the monthly total number of pigs
slaughtered in Victoria, Australia, from Jul 1972 to Dec 2018.
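# Keep the monthly pig slaughter counts for Victoria, Jul 1972 to Dec 2018.
# Month is already a yearmonth column, so it can be compared directly.
subset <- filter(aus_livestock,
                 State == "Victoria",
                 Animal == "Pigs",
                 Month >= yearmonth("1972 Jul"),
                 Month <= yearmonth("2018 Dec"))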
[1] "en_US.UTF-8"
autoplot(subset, Count)
[Figure: time plot of Count versus Month [1M], from 1972 to 2018.]
Question 3 (Data cleaning)
a) In line 9, what do skip = 10 and n_max = 152 do? Why do we need to do this when
reading the csv file?
Answer: skip = 10 means that the first 10 lines of the csv file are skipped. n_max = 152
limits the number of rows read to 152, so any rows after that are not read. Together,
these options let us read only the data of interest and skip the irrelevant parts.
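The effect of the two arguments can be illustrated on a small made-up file. The sketch below assumes readr 2.0 or later (for literal input via I()); the metadata lines, column names and values are invented for illustration and are not the course data.

```r
library(readr)

# Hypothetical file contents: two metadata lines, a header row,
# three data rows, and a trailing footer line.
raw_text <- paste(
  "Source: hypothetical agency",
  "Released: hypothetical date",
  "year,value",
  "2001,10",
  "2002,12",
  "2003,15",
  "Footer: end of table",
  sep = "\n"
)

# skip = 2 drops the metadata lines before the header is read;
# n_max = 3 stops after three data rows, so the footer is never parsed.
dat <- read_csv(I(raw_text), skip = 2, n_max = 3, show_col_types = FALSE)
dat
```

Note that n_max counts data rows only; the header row is not included in the count.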
b) In line 14, what does t() do? Why do we need to do this in order to make a tsibble?
Answer: The function t() transposes a matrix or data frame, so the original rows become
columns and the original columns become rows. After transposing, each row corresponds to
one time point, which allows as_tsibble() to pick up the correct index.
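A small sketch of why the transpose helps, using invented numbers and hypothetical series names (series_a, series_b) rather than the course data: the wide layout has one column per year, whereas a tsibble expects one row per time point.

```r
library(tsibble)

# Hypothetical wide layout: one row per series, one column per year.
wide <- matrix(
  c(10, 12, 15,
    20, 22, 25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("series_a", "series_b"), c("2001", "2002", "2003"))
)

# After transposing, each row is a year and each column is a series.
long <- t(wide)

# Turn the row names into an explicit year column and use it as the index.
long_df <- data.frame(year = as.integer(rownames(long)), long, row.names = NULL)
long_ts <- as_tsibble(long_df, index = year)
long_ts
```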
Consider the aus_production dataset loaded in the fpp3 package. We will study the column
measuring the production of beer.
[Figure: time plot of Beer versus Quarter [1Q], from 1956 to 2010.]
[Figure: seasonal plot of Beer by Quarter (Q1 to Q4), with lines labelled by year (1965, 1975, 1985, 1995, 2005).]
d) What is the period of the seasonality?
Answer: The period is 4 quarters, i.e. one year.
e) Describe the seasonal behavior.
Answer: Beer production decreases from Q1 to Q2 and then increases from Q2 to Q4, with
the steepest increase occurring from Q3 to Q4.
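The plots discussed above can be reproduced along the following lines. This is a sketch assuming the fpp3 package is installed; the object name period is introduced only for illustration, and the original code used for the figures may have differed.

```r
library(fpp3)

# Time plot of quarterly beer production.
aus_production |> autoplot(Beer)

# Seasonal plot: one line per year, with quarters on the x-axis.
aus_production |> gg_season(Beer, labels = "both")

# Number of observations per year implied by the quarterly index.
period <- guess_frequency(aus_production$Quarter)
period
```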
Question 5 (Pelts)
Consider the pelt dataset loaded in the fpp3 package, which measures the Hudson Bay
Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935.
a) Plot both time series on the same axes. Hint: Use pivot_longer() to create a key
column.
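# Stack the Hare and Lynx columns into name/value pairs so that both
# series can be drawn on the same axes.
fit2 <- pivot_longer(pelt, cols = c(Hare, Lynx))
fit2 |> autoplot(value)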
[Figure: time plot of value versus Year [1Y], with one line each for Hare and Lynx (legend: name).]
b) What happens when you try to apply gg_season() to the lynx fur time series? What is
producing the error?
# pelt |> gg_season()
[Figure: lag plots of Lynx; the panels recovered here are lag 1 through lag 10.]
Answer: Lags 1, 19 and 20 display strong positive correlations, while lags 5, 14 and 15
display strong negative correlations. The following plot supports this conclusion.
pelt |> autoplot()
[Figure: time plot of Hare versus Year [1Y].]
d) If you were to guess the seasonality period based on the lag plot, what would it be?
Answer: It would be 10 years.
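The lag correlations described in (c) and the period guessed in (d) can also be checked numerically. The sketch below assumes the fpp3 package is installed; the object name lynx_acf is introduced here for illustration. It draws the lag plot for the first 20 lags and then computes the sample autocorrelations, whose peaks and troughs indicate which lags are most strongly related.

```r
library(fpp3)

# Scatterplots of Lynx against its own lagged values, one panel per lag.
pelt |> gg_lag(Lynx, geom = "point", lags = 1:20)

# Sample autocorrelations for lags 1 to 20.
lynx_acf <- pelt |> ACF(Lynx, lag_max = 20)
lynx_acf

# Correlogram: the spacing between successive peaks suggests the cycle length.
lynx_acf |> autoplot()
```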
e) Use the provided function gg_custom_season() in _code/plot_util.R (see footnote 2) to
make a seasonal plot for lynx furs with the period that you guessed (see footnote 3).
Does the resulting plot suggest seasonality? Why or why not?
# Load the course helper via a relative path so the document renders on any machine.
source("../_code/plot_util.R")
[Figure: seasonal plot of Lynx versus Season (0 to 40), with lines coloured by Iteration (1, 2, 3).]
Answer: The period is 10 years. The plot suggests seasonality, although the length of each
cycle varies somewhat. Within each 10-year period, the series roughly decreases over the
first 7 years and increases over the following 3 years.
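For readers without access to the course helper, a custom-period seasonal plot can be approximated by hand. The following is a sketch only and is not the gg_custom_season() implementation: the column names Position and Cycle, and the choice period = 10, are assumptions made for illustration. Each observation is assigned a position within its cycle and a cycle number, and the cycles are then overlaid.

```r
library(fpp3)

period <- 10  # assumed cycle length in years, taken from the guess in (d)

lynx_cycles <- pelt |>
  as_tibble() |>
  mutate(
    # Year within the cycle: 0, 1, ..., period - 1.
    Position = (row_number() - 1) %% period,
    # Which cycle the observation belongs to: 1, 2, 3, ...
    Cycle = factor((row_number() - 1) %/% period + 1)
  )

# Overlay the cycles: similar shapes across lines point to a repeating pattern.
ggplot(lynx_cycles, aes(x = Position, y = Lynx, colour = Cycle)) +
  geom_line() +
  labs(x = "Year within cycle", y = "Lynx", colour = "Cycle")
```

If the overlaid lines peak and trough at similar positions, the guessed period is plausible; if they drift apart, the cycle length is not constant.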
Footnote 2: You can load this function using source("../_code/plot_util.R").
Footnote 3: Unfortunately, it seems gg_season() does not allow this functionality.