Time Series Analysis

Lecture Review

Dr. Kalyan N, Assistant Professor, Dept. of CSE (Data Science), B.M.S College of Engineering, Bengaluru - 560019.
Prof. Sangeetha S, Assistant Professor, Dept. of AI & DS, B.M.S College of Engineering, Bengaluru - 560019.
October, 2024.

Contents

1 Time Series Data
  1.1 Purpose
  1.2 Time series
    1.2.1 Uses of Time series
    1.2.2 Plots
    1.2.3 Trends
    1.2.4 Seasonal Variation
  1.3 Decomposition of Series
    1.3.1 Notation
    1.3.2 Models
    1.3.3 Estimating trends and seasonal effects
    1.3.4 Smoothing
    1.3.5 Decomposition in R

2 Characteristics of Time Series
  2.1 Introduction and Examples
  2.2 Objectives and Nature of Time Series
  2.3 Introduction to Time Series Databases and Applications
  2.4 Measures of Dependence
    2.4.1 Introduction to Measures of Dependence
    2.4.2 Example Problem: Estimating Autocorrelation
  2.5 Stationary Time Series
    2.5.1 Definition and Importance of Stationarity
    2.5.2 Features of Stationary Time Series
    2.5.3 R Example: Plotting a Stationary Series
  2.6 Estimation of Correlation
    2.6.1 Definition of Correlation
    2.6.2 Proof of Correlation for Time Series
  2.7 Vector-Valued and Multi-Dimensional Series
    2.7.1 Definition and Importance
    2.7.2 Example: Vector-Valued Series in R

3 Components of Time Series
  3.1 Additive and Multiplicative models
  3.2 Resolving components of a Time Series
  3.3 Measuring Trend
    3.3.1 Graphic
    3.3.2 Semi-Averages
    3.3.3 Example
    3.3.4 Moving Average
    3.3.5 Method of Least Squares

4 Correlation
  4.1 Expectation and the ensemble
    4.1.1 The Ensemble and Stationarity
    4.1.2 Ergodic Series
  4.2 Variance function
    4.2.1 Autocorrelation
  4.3 Correlogram, covariance of sum of random variables
    4.3.1 General discussion
    4.3.2 Example based on air passenger series

5 Seasonal Variation
  5.1 Method of Simple Averages
  5.2 Ratio-to-Trend Method
  5.3 Ratio-to-Moving Average Method and Link Relative Method
  5.4 Link relative method
  5.5 Cyclical and Random Fluctuations
    5.5.1 Example of Cyclical Fluctuations
  5.6 Random Fluctuations
    5.6.1 Example of Random Fluctuations
    5.6.2 Deseasonalisation
  5.7 Variate Difference Methods
    5.7.1 Example 1: Monthly Sales Data
    5.7.2 Example 2: Daily Temperature Records
    5.7.3 Conclusion

6 Index Numbers and Their Definitions
  6.1 Introduction
  6.2 Fixed-based Index Numbers
    6.2.1 Example of Fixed-based Index Numbers
    6.2.2 R Code for Fixed-based Index Numbers
  6.3 Chain-based Index Numbers
    6.3.1 Example of Chain-based Index Numbers
    6.3.2 R Code for Chain-based Index Numbers
  6.4 Uses of Index Numbers

7 Methods of Constructing Index Numbers
  7.1 Unweighted Indices
  7.2 Weighted Indices
  7.3 Comparison of Laspeyres’ and Paasche’s Index Numbers
  7.4 Weighted average of relatives
  7.5 The Chain Index Numbers
  7.6 Chain Index Numbers: Merits, Demerits, and Comparison
  7.7 Base shifting
  7.8 Splicing of Two Series of Index Numbers
  7.9 Deflating
  7.10 Optimum Tests for Index Numbers
  7.11 Cost of Living Index Numbers (Consumer Price Index Numbers)
  7.12 Methods for Construction of Cost of Living Index Numbers
  7.13 Possible Errors in Construction of Cost of Living Index Numbers

8 Forecasting Strategies
  8.1 Leading variables and associated variables
  8.2 Marine Coatings
  8.3 Building Approvals Publication
  8.4 Cross-Correlation
  8.5 Cross-Correlation between Building Approvals and Activity
  8.6 Bass Model
    8.6.1 Model Definition
    8.6.2 Interpretation of the Bass Model
    8.6.3 Example
    8.6.4 Fitting the Bass Curve
    8.6.5 Parameter Ranges for Different Products
    8.6.6 Extensions and Refinements
  8.7 Exponential Smoothing and Holt-Winters method
    8.7.1 Exponential Smoothing
    8.7.2 R Code for Single Exponential Smoothing
  8.8 Holt-Winters Method
    8.8.1 Holt-Winters Additive Model
    8.8.2 Holt-Winters Multiplicative Model
    8.8.3 R Code for Holt-Winters Method
  8.9 Plotting and Visualizing Forecasts
  8.10 Choosing the Right Model

9 Basic Stochastic Models
  9.1 White Noise, Random Walks, Fitted models & diagnostic plots
  9.2 Autoregressive models
    9.2.1 Stationary and non-stationary Autoregressive process

10 Time series Regression and Exploratory Data Analysis
  10.1 Classical Regression
  10.2 Exploratory Data Analysis
  10.3 Generalized least square method
  10.4 Linear models with seasonal variables
  10.5 Harmonic seasonal models
  10.6 Logarithmic transforms

11 Linear Models
  11.1 Moving Average models
  11.2 Fitted MA Models
    11.2.1 Autoregressive Moving Average Models
  11.3 Differential Equations
  11.4 Autocorrelation and Partial Correlation
  11.5 Forecasting & Estimation
  11.6 Non-stationary Models
    11.6.1 Building non-seasonal ARIMA Models
    11.6.2 ARCH Models & GARCH Models

Module - 1
Chapter - 1
A time series is a sequence of statistical data organized according to the time of occurrence or in chronological order.
The numerical data collected at various points in time, forming a set of observations, is referred to as a time series. In
time series analysis, current data within a series can be compared with past data from the same series. Additionally,
the progression of two or more series over time can be compared. These comparisons can provide valuable insights
for individual businesses. Time series analysis is crucial in fields such as economics, statistics, and commerce.

1 Time Series Data


A time series consists of observations made at specific time intervals and arranged in chronological order. For example,
tracking agricultural production, sales, or National Income over a span of 3 to 5 years constitutes a time series. It
is essentially a sequence of quantitative readings recorded at regular intervals, which could be hourly, daily, weekly,
monthly, or annually. Examples of time series include hourly temperature readings, daily shop sales, weekly market
sales, monthly production figures, yearly agricultural outputs, and population growth over ten years. Analyzing a
time series involves comparing past data with current data to forecast future trends and evaluate past performance.
The focus of time series analysis is on understanding chronological variations. Key requirements for a time series
are:

• The time intervals between observations should be as consistent as possible.


• The dataset must be homogeneous.
• Data should be collected over an extended period.

Symbolically, if $t$ represents time and $y_t$ denotes the value at time $t$, then the paired values $(t, y_t)$ constitute the time
series data. Example 1: Production of rice in Karnataka for the period from 2010-11 to 2016-17.

Table 1: Production of rice in Karnataka (in ‘000 metric tons)


Year Production
2010-11 800
2011-12 950
2012-13 870
2013-14 920
2014-15 860
2016-17 720
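
As a minimal sketch, this series can be entered in R as an annual ts object to obtain the $(t, y_t)$ pairs; the 2015-16 value is not given in Table 1, so it is left as NA:

# Rice production in Karnataka ('000 metric tons), 2010-11 to 2016-17
# The 2015-16 figure is missing from Table 1, so it is entered as NA.
rice <- ts(c(800, 950, 870, 920, 860, NA, 720), start = 2010)
plot(rice, ylab = "Production ('000 metric tons)", xlab = "Year")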

1.1 Purpose
Time series analysis is crucial for understanding historical data and forecasting future trends, which aids managers
and policymakers in making informed decisions. By quantifying key features and random variations in data, time
series methods have become widely applicable across government, industry, and commerce, especially with advances
in computing power. The Kyoto Protocol, an amendment to the United Nations Framework Convention on Climate
Change, was signed in December 1997 and came into effect on February 16, 2005. The rationale for reducing
greenhouse gas emissions involves a blend of scientific data, economic considerations, and time series analysis. The
decisions made in the coming years will have significant implications for the planet’s future.
In 2006, Singapore Airlines expanded its fleet by ordering twenty Boeing 787-9s and expressing intent to purchase
twenty-nine Airbus planes, including twenty A350s and nine A380s (superjumbos). This expansion was guided by
time series analysis of passenger trends and strategic corporate planning to maintain or enhance market share. Time
series methods are also employed in everyday operational decisions. For instance, UK gas suppliers must place orders
for offshore gas one day in advance. The variation from the seasonal average is influenced by temperature and, to
a lesser extent, wind speed. Time series analysis helps forecast demand by adjusting the seasonal average with
one-day-ahead weather forecasts.
Additionally, time series models underpin many computer simulations. Examples include evaluating inventory control
strategies using simulated demand series, comparing wave power device designs with simulated sea states, and
simulating daily rainfall to assess the long-term environmental impacts of proposed water management policies.

1.2 Time series


In many fields, including science, engineering, and commerce, variables are measured sequentially over time. For
instance, reserve banks track daily interest and exchange rates, governments report annual GDP figures, and meteo-
rological offices log rainfall at various locations. When data are collected at regular intervals, they form a time series.
A historical time series is created from observations recorded at fixed intervals. In this context, time series are often
treated as realizations of sequences of random variables, known as discrete-time stochastic processes or time series
models. Our focus will be on applying these models using R to fit data and perform analysis.

Time series data often exhibit trends and seasonal variations that can be modeled mathematically. Additionally,
observations close in time are typically correlated. Time series analysis aims to explain this correlation and other data
features using statistical models. Once a model is fitted, it can be used to forecast future values, conduct statistical
tests, and summarize the main characteristics of the data, aiding decision-making.
Sampling intervals impact data
quality. Aggregated data, like daily tourist arrivals, or sampled data, such as daily stock prices, need appropriate
intervals to accurately reflect the original signal. In high-frequency trading or signal processing, continuous signals
are sampled at very high rates to create time series for detailed analysis.

1.2.1 Uses of Time series


The analysis of time series is of great significance not only to economists and business people but also to scientists,
astronomers, geologists, sociologists, biologists, and researchers. This is due to the following reasons:

• It helps in understanding past behavior.


• It assists in planning future operations.
• It aids in evaluating current accomplishments.
• It facilitates comparison.

1.2.2 Plots
Visualizing time series data is crucial for identifying patterns and trends. Common types of plots include:

• Line Plot: Displays data points connected by lines to show changes over time. Useful for identifying trends
and seasonal patterns.
• Scatter Plot: Plots individual data points to observe the relationship between two variables or to identify
patterns and outliers.
• Bar Plot: Represents data with bars, helpful for comparing discrete time periods or categories.

• Histogram: Shows the distribution of data over specified intervals, useful for understanding the frequency of
values.
• Box Plot: Displays the distribution of data based on quartiles, highlighting median, and potential outliers.
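
As a quick illustration, several of these plot types can be produced in base R; the built-in AirPassengers dataset is assumed here purely as example data:

# Common time series plots, using AirPassengers as example data
data("AirPassengers")
plot(AirPassengers)                            # line plot of the series over time
plot(as.numeric(AirPassengers))                # scatter plot of values against observation index
hist(AirPassengers)                            # histogram of the value distribution
boxplot(AirPassengers ~ cycle(AirPassengers))  # box plots of values grouped by month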


1.2.3 Trends
Trends refer to the long-term movement or direction in the data over a period. Identifying trends helps in under-
standing the overall pattern:

• Upward Trend: Indicates a general increase in values over time.


• Downward Trend: Shows a general decrease in values.

• Stationary Trend: The data fluctuates around a constant mean without a long-term trend.

1.2.4 Seasonal Variation


Variations refer to the deviations from the trend and can be categorized into:

• Seasonal Variations: Regular patterns that repeat at consistent intervals, such as monthly or quarterly.
• Cyclical Variations: Fluctuations that occur over longer periods, influenced by economic or business cycles.
• Irregular Variations: Unpredictable changes due to unforeseen events or anomalies that do not follow a
pattern.

Understanding these components allows for effective analysis and forecasting of time series data.

1.3 Decomposition of Series


1.3.1 Notation
The analysis so far has focused on plotting data to identify features such as trends and seasonal variations. While
this is a crucial first step, the next stage involves fitting time series models. We represent a time series of length $n$ as
$\{x_t : t = 1, \ldots, n\} = \{x_1, x_2, \ldots, x_n\}$, where $n$ values are sampled at discrete times $t = 1, 2, \ldots, n$. When the series
length is not essential, we abbreviate it as $\{x_t\}$.
A time series model is a sequence of random variables, and the observed series is a realization of this model. We
use the same notation for both, with context distinguishing between them. An overline denotes the sample mean:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$   (1)

The ‘hat’ notation represents a prediction or forecast. For a series $\{x_t : t = 1, \ldots, n\}$, $\hat{x}_{t+k|t}$ denotes a forecast
made at time $t$ for the value at $t + k$. The number of steps into the future, $k$, is the lead time. Depending on the
context, $\hat{x}_{t+k|t}$ may refer to either the random variable or its numerical value.

1.3.2 Models
Many time series are dominated by trend and/or seasonal effects. A simple additive decomposition model is given
by:

$x_t = m_t + s_t + z_t$   (2)

where $x_t$ is the observed series, $m_t$ is the trend, $s_t$ is the seasonal effect, and $z_t$ is the error term, often a sequence of
correlated random variables with mean zero. Two main approaches for extracting $m_t$ and $s_t$ will be outlined along
with R functions for this.
For cases where the seasonal effect increases with the trend, a multiplicative model may be more suitable:

$x_t = m_t \cdot s_t + z_t$   (3)

Alternatively, an additive decomposition for $\log(x_t)$ can be used:

$\log(x_t) = m_t + s_t + z_t$   (4)


Care is needed when transforming back to $x_t$ from $\log(x_t)$ to avoid bias. If $z_t$ is normally distributed with mean
0 and variance $\sigma^2$, the predicted mean value is:

$\hat{x}_t = e^{m_t + s_t + \frac{1}{2}\sigma^2}$   (5)

For non-normal distributions, bias correction may lead to overcorrection, requiring an empirical adjustment. This is
critical, for instance, in financial forecasts, where underestimating mean costs is a common issue.
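
A minimal sketch of the log-scale decomposition (Eq. 4) with the bias-corrected back-transform (Eq. 5); the AirPassengers dataset is assumed as a stand-in series, and the variance of the log-scale residuals serves as the estimate of $\sigma^2$:

# Additive decomposition on the log scale (Eq. 4)
log.x <- log(AirPassengers)
dec <- decompose(log.x)
# Bias-corrected back-transform to the original scale (Eq. 5)
sigma2 <- var(dec$random, na.rm = TRUE)
x.hat <- exp(dec$trend + dec$seasonal + 0.5 * sigma2)
ts.plot(AirPassengers, x.hat, lty = 1:2)  # data vs. corrected fitted values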

1.3.3 Estimating trends and seasonal effects


A simple way to estimate the trend $m_t$ is by calculating a moving average centered on $x_t$. A moving average smooths
the time series by averaging a specified number of values around each $x_t$, except for the first and last few terms. For
monthly data, the moving average spans 12 months. Since the average of $t = 1$ (January) to $t = 12$ (December) falls
between June and July (i.e., $t = 6.5$), we average two consecutive moving averages to center the result at $t = 7$. The
centered moving average for $m_t$ is given by:

$\hat{m}_t = \frac{1}{12}\left(\tfrac{1}{2}x_{t-6} + x_{t-5} + \cdots + x_{t+5} + \tfrac{1}{2}x_{t+6}\right)$   (6)

where $t = 7, \ldots, n-6$. The coefficients sum to 1, ensuring equal weight for each value. This method generalizes
to other seasonal frequencies (e.g., quarterly) by maintaining the condition that coefficients sum to unity.
The seasonal effect $\hat{s}_t$ can be estimated by subtracting the trend:

$\hat{s}_t = x_t - \hat{m}_t$   (7)

Averaging the monthly estimates across all years provides a single estimate of the effect for each month. To ensure
the seasonal effects sum to zero, they are adjusted by subtracting the mean. For multiplicative models, the estimate
becomes:

$\hat{s}_t = x_t / \hat{m}_t$   (8)

and multiplicative factors are adjusted to average to 1. Seasonally adjusted data, often used in economic indicators,
removes seasonal effects. If the seasonal effect is additive, the adjusted series is $x_t - \bar{s}_t$, and if multiplicative, it is
$x_t / \bar{s}_t$, where $\bar{s}_t$ is the mean seasonal adjustment for the given time.
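
The estimates of Eq. 6 and Eq. 7 can be sketched in base R with stats::filter; the AirPassengers dataset is assumed here as example monthly data:

# Centred 12-month moving average (Eq. 6): half weight on the two end terms
wts <- c(0.5, rep(1, 11), 0.5) / 12            # the 13 coefficients sum to 1
m.hat <- stats::filter(AirPassengers, filter = wts, sides = 2)
# Additive seasonal estimate (Eq. 7), averaged by month and centred to sum to zero
s.hat <- AirPassengers - m.hat
s.month <- tapply(s.hat, cycle(AirPassengers), mean, na.rm = TRUE)
s.month <- s.month - mean(s.month)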

1.3.4 Smoothing
The centred moving average is a smoothing procedure applied retrospectively to identify an underlying trend in a
time series. It uses points before and after the target time, often leaving some missing values at the series’ start
and end unless adapted for edge points. Another smoothing method in R is ‘stl‘, which uses locally weighted regres-
sion (loess). This local regression considers a small number of points around the target time, weighted to reduce the
influence of outliers, making it a robust regression. While straightforward in principle, the details of ‘stl‘ are complex.
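
For instance, assuming the AirPassengers dataset, a loess-based decomposition can be produced and plotted in one line:

# stl: seasonal decomposition by loess; s.window = "periodic" keeps the
# seasonal pattern fixed across years
plot(stl(AirPassengers, s.window = "periodic"))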

Unlike smoothing, which does not provide a forecast model, fitting a linear trend has the advantage of enabling
extrapolation. The term ”filtering” is also used in this context, particularly in engineering, to describe obtaining the
best estimate of a variable based on past and current noisy measurements. Filtering is vital in control algorithms,
such as those used by the Huygens probe during its 2005 landing on Titan.

1.3.5 Decomposition in R
In R, the function ‘decompose‘ estimates trends and seasonal effects using a moving average. Nesting it within
‘plot‘ (e.g., ‘plot(decompose())‘) produces a figure showing the original series $x_t$ and the decomposed series $m_t$, $s_t$, and $z_t$. For
example, additive and multiplicative decomposition plots for electricity data are created by the following commands,
with the seasonal effect superimposed on the trend using ‘lty‘ for line types.


Figure 1: Electricity production data: trend with superimposed multiplicative seasonal effects.

# Decomposition of the time series
plot(decompose(Elec.ts))

# Multiplicative decomposition
Elec.decom <- decompose(Elec.ts, type = "mult")
plot(Elec.decom)

# Extracting the trend and seasonal components
Trend <- Elec.decom$trend
Seasonal <- Elec.decom$seasonal

# Plotting the trend and the product of trend and seasonal effect
ts.plot(cbind(Trend, Trend * Seasonal), lty = 1:2)
Listing 1: Decomposition of Time Series in R

A multiplicative model is often more suitable than an additive one when the variance of the series and trend
increase over time. However, if the random component $z_t$ also shows increasing variance, a log-transformation (Eq.
4) may be more appropriate.


Figure 2: Decomposition of the electricity production data.

The random series produced by ‘decompose‘ is not a true realization of $z_t$, but an estimate left over after the trend
and seasonal components have been removed. Strictly speaking it is a residual error series, yet in practice it is treated as a realisation of the underlying random process.


Module - 1
Chapter - 2

2 Characteristics of Time Series


A time series is a collection of data points indexed in time order, usually spaced at uniform intervals. It captures
observations at successive points in time, and this ordering makes time series distinct from other types of data because
time series analysis is inherently about trends, patterns, and dependencies across time. To effectively analyze time
series data, it is important to understand its core characteristics. These characteristics form the basis for any
meaningful analysis, forecasting, or modeling.

2.1 Introduction and Examples


A time series is a sequence of data points collected or recorded at regular time intervals. It is used in various do-
mains such as economics, finance, meteorology, medicine, and engineering to analyze trends, patterns, and seasonal
variations. Unlike cross-sectional data, time series data captures the dynamics and changes over time, allowing for
forecasting and insight extraction from historical patterns.
Common examples of time series data include:
• Stock market prices recorded every minute.
• Daily temperature recordings in a city.

• Monthly sales data for a retail store.


• Quarterly GDP of a country.
• Yearly rainfall data in a specific region.

In R, we can visualize a simple time series data using the AirPassengers dataset:
# Example in R:
data("AirPassengers")
plot(AirPassengers, main = "AirPassengers Dataset",
     ylab = "Number of Passengers", xlab = "Year")
Listing 2: Example in R

2.2 Objectives and Nature of Time Series


The main objectives of time series analysis include:

• Understanding the underlying patterns in the data.


• Identifying components such as trend, seasonality, and cyclic behaviors.
• Modeling the time series to predict future values.
• Detecting anomalies or unexpected changes.

• Smoothing to eliminate noise and reveal important patterns.


The nature of time series can be categorized into:
• Trend Component (T): A long-term increase or decrease in the data. For example, the overall upward
movement in stock market prices over several years.


• Seasonal Component (S): Regular fluctuations that repeat over a specific period, such as monthly sales
peaking every December.
• Cyclic Component (C): Recurrent but non-periodic fluctuations often linked to economic cycles.

• Irregular Component (I): Random variations that do not follow a pattern.


These components can be combined using the following additive or multiplicative models:

Y (t) = T (t) + S(t) + C(t) + I(t) (9)

or
Y (t) = T (t) × S(t) × C(t) × I(t) (10)
In R, we can decompose a time series to analyze these components:
# Decomposition Example in R:
decomposed <- decompose(AirPassengers)
plot(decomposed)
Listing 3: Decomposition Example in R

2.3 Introduction to Time Series Databases and Applications


Time series databases (TSDB) are optimized for storing and querying time series data. Unlike traditional relational
databases, TSDBs are built to efficiently handle timestamped data, which makes them ideal for applications that
require storing large volumes of time-dependent data with high write-throughput and quick retrieval.
Some popular time series databases include:
• InfluxDB: InfluxDB is an open-source time series database designed specifically for storing and managing time
series data such as metrics, events, and analytics. It is widely used in the Internet of Things (IoT), DevOps,
real-time analytics, and monitoring applications. InfluxDB is part of the InfluxData stack, which includes
Telegraf (a plugin-based collector of metrics), Chronograf (a visualization tool), and Kapacitor (an alerting
and processing tool).
InfluxDB uses a simple and flexible query language called InfluxQL, which is similar to SQL but optimized for
time series operations like aggregations, time windowing, and transformations. It also supports the newer Flux
language, which offers more powerful queries. InfluxDB is known for its high write throughput, enabling it to
handle millions of writes per second, which makes it ideal for real-time data collection and analysis.
InfluxDB stores data in a compressed time series format and offers a retention policy feature, where users can
automatically delete old data to manage storage effectively. It also supports downsampling, which helps reduce
storage costs by keeping high-resolution data for shorter periods while retaining summarized, lower-resolution
data for longer periods.
InfluxDB uses the InfluxQL or Flux query languages. Below is an example using InfluxQL.
# Storing time series data in InfluxDB (line protocol via the influx CLI)
INSERT temperature,location=room1 value=72.5 1627550220
INSERT temperature,location=room2 value=75.3 1627550280
Listing 4: Storing Time Series Data in InfluxDB

# Querying time series data in InfluxDB
SELECT mean("value") FROM "temperature"
WHERE time >= '2021-07-29T00:00:00Z' AND time < '2021-07-30T00:00:00Z'
GROUP BY time(1h), "location" fill(none)
Listing 5: Querying Time Series Data in InfluxDB

Advantages of InfluxDB:


– High Performance: InfluxDB is optimized for fast writes, making it ideal for use cases where data is
collected frequently, such as IoT applications or system monitoring. Its write throughput is among the
best in class for time series databases.
– Retention Policies: Users can define retention policies to automatically expire older data, thus managing
storage costs efficiently. This is particularly useful in environments where data grows exponentially over
time.
– Schema-Free: InfluxDB is schemaless, meaning that data can be written with any fields and tags, making
it flexible to adapt to new use cases and metrics without predefined structures.
– Integrations: InfluxDB integrates easily with other tools such as Grafana for visualization, Telegraf for
data collection, and Kapacitor for alerting and data processing.
Cons of InfluxDB:
– Query Complexity: While InfluxQL is simple for basic queries, more complex queries involving joins
or transformations might be challenging. Flux, the newer query language, addresses these issues but
introduces a learning curve.
– Scaling Issues: Scaling InfluxDB horizontally (i.e., across multiple nodes) can be challenging. The
enterprise version of InfluxDB offers clustering, but the open-source version does not, making scaling
limited for high-availability deployments.
– Storage Costs: Although InfluxDB offers compression, the storage requirements for high-frequency data
can still be substantial, especially in long-term retention scenarios.

• TimescaleDB: TimescaleDB is an open-source time series database built as an extension to PostgreSQL. By


leveraging the mature and robust PostgreSQL ecosystem, TimescaleDB inherits features like ACID compliance,
powerful indexing, relational joins, and SQL support, making it a reliable option for time series applications
that also require relational data.
TimescaleDB splits large datasets into ”chunks” based on time intervals, which allows for efficient time-series
queries. This method also makes TimescaleDB horizontally scalable while keeping storage costs down. Ad-
ditionally, TimescaleDB supports native compression, significantly reducing the storage footprint for large
datasets.
One of the strengths of TimescaleDB is its seamless integration with the broader PostgreSQL ecosystem,
which includes extensions, tools, and libraries. This makes it an attractive choice for users who already have
a PostgreSQL setup and are looking to incorporate time series capabilities into their existing infrastructure
without migrating to a new system.
TimescaleDB uses standard SQL, with time series data stored in hypertables.
-- Creating a hypertable in TimescaleDB
CREATE TABLE temperature (
    time        TIMESTAMPTZ NOT NULL,
    location    TEXT NOT NULL,
    temperature DOUBLE PRECISION NOT NULL
);

-- Convert the table to a hypertable
SELECT create_hypertable('temperature', 'time');

-- Inserting data into TimescaleDB
INSERT INTO temperature (time, location, temperature)
VALUES (NOW(), 'room1', 72.5), (NOW(), 'room2', 75.3);
Listing 6: Storing Time Series Data in TimescaleDB

-- Querying the average temperature by hour
SELECT time_bucket('1 hour', time) AS bucket, location, avg(temperature)
FROM temperature
WHERE time > NOW() - interval '24 hours'
GROUP BY bucket, location
ORDER BY bucket DESC;
Listing 7: Querying Time Series Data in TimescaleDB

Advantages of TimescaleDB:
– PostgreSQL Ecosystem: Since TimescaleDB is an extension of PostgreSQL, it benefits from the sta-
bility, reliability, and community support of PostgreSQL. This includes support for advanced indexing,
relational joins, transactions, and other features common to relational databases.
– SQL Support: TimescaleDB supports standard SQL queries, making it easy for developers familiar with
SQL to work with time series data. This reduces the learning curve compared to other TSDBs that use
custom query languages.
– Efficient Time Series Storage: TimescaleDB automatically partitions data into chunks based on time
intervals, which improves query performance. It also supports data compression, making it highly efficient
for storing large datasets.
– Scalability: TimescaleDB provides built-in tools for scaling horizontally, allowing it to handle large time
series datasets across distributed environments.
Cons of TimescaleDB:
– Limited for Extreme Real-Time Use Cases: While TimescaleDB performs well for most time series
applications, it may not be as optimized for extreme high-frequency, real-time applications as InfluxDB
or Prometheus.
– Complexity with Large Joins: Although relational joins are a strength of TimescaleDB, performing
large-scale joins on massive datasets can lead to performance issues, particularly for real-time queries.
– Enterprise Features: Some advanced features, like continuous aggregation and advanced compression,
are part of TimescaleDB’s enterprise offering, which can be a limitation for users relying only on the
open-source version.
• Prometheus: Prometheus is a highly popular, open-source monitoring and alerting toolkit designed specifically
for cloud-native environments. It was developed as part of the Cloud Native Computing Foundation and is
often used in conjunction with Kubernetes for monitoring application performance, infrastructure metrics, and
other system behaviors.
Prometheus works by scraping metrics from instrumented services at regular intervals, storing them as time
series data. It supports multi-dimensional data collection using labels, which are key-value pairs attached to
the metrics. Prometheus uses its own query language called PromQL (Prometheus Query Language), which
is specifically designed for aggregating and filtering time series data. Its alerting mechanism is flexible and
integrates easily with various notification systems like PagerDuty, Slack, and email.
One of the primary use cases for Prometheus is in monitoring cloud infrastructure, where it excels at tracking
the performance of servers, containers, and microservices. The system is designed to be lightweight and works
well in environments where quick real-time insights and monitoring are critical.
Prometheus uses PromQL for querying, and data is scraped from instrumented services.
# An example of Prometheus scrape configuration
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
Listing 8: Scraping Time Series Data in Prometheus

# Querying time series data in Prometheus using PromQL
avg_over_time(node_cpu_seconds_total[5m])
Listing 9: Querying Time Series Data in Prometheus

Advantages of Prometheus:

– Cloud-Native Friendly: Prometheus is designed to work seamlessly in dynamic cloud-native environ-


ments, particularly with Kubernetes. It is well-suited for containerized environments where services come
and go frequently.
– Highly Scalable: Prometheus is built for large-scale, distributed environments. Its pull-based metrics
collection makes it efficient for monitoring hundreds or thousands of services.
– Alerting System: Prometheus has a powerful alerting system that allows users to define alerting rules
based on metric thresholds. It integrates easily with notification tools, enabling quick responses to system
failures or abnormal metrics.
– Multi-Dimensional Data Collection: Prometheus allows users to attach labels to their metrics, making
it easy to filter and aggregate data along different dimensions, such as by service, data center, or cluster.
Cons of Prometheus:
– Limited Long-Term Storage: Prometheus is not designed for long-term data retention. While it excels
at real-time monitoring, users often need to integrate it with external databases like Thanos or Cortex to
store time series data for long-term historical analysis.
– No Built-In Clustering: Prometheus does not support clustering in its native form, which can be a
limitation for users requiring high availability and fault tolerance without external dependencies.
– Query Language Complexity: PromQL, the query language used by Prometheus, can be complex for
new users, particularly for those used to SQL or other more common query languages. Learning to write
efficient queries in PromQL can take time.
Applications of time series databases include:
• IoT Data Storage: Time series databases are commonly used in IoT devices to store sensor data, such as
temperature readings, GPS data, or humidity levels.
• Financial Market Analysis: TSDBs handle high-frequency trading data, storing stock prices, trading vol-
umes, and other financial indicators over time.
• DevOps Monitoring: Tracking system performance metrics like CPU usage, memory consumption, and
network bandwidth usage in real-time.
An R example to work with a simple time series dataset:
# Simulate time series data and store it
time_series_data <- ts(rnorm(100), frequency = 12, start = c(2020, 1))
plot(time_series_data, main = "Simulated Time Series Data",
     ylab = "Values", xlab = "Time")
Listing 10: Simulate Time Series Data in R

2.4 Measures of Dependence


2.4.1 Introduction to Measures of Dependence
In time series analysis, measures of dependence refer to the statistical relationships between observations in a time
series dataset, especially over time lags. Understanding these dependencies is crucial because they indicate whether
and how past values influence future values. A time series is dependent when the values at different points in time
are not independent but rather exhibit a relationship that we can measure and analyze.
One of the primary measures of dependence in time series is the autocorrelation function (ACF). This function
helps determine how observations at different time points are correlated with one another. The ACF for a time series
$\{X_t\}$ at lag $k$ is given by:

$\rho(k) = \dfrac{\mathrm{Cov}(X_t, X_{t+k})}{\sqrt{\mathrm{Var}(X_t)\,\mathrm{Var}(X_{t+k})}}$

where Cov denotes covariance, and Var represents the variance of the time series.
The partial autocorrelation function (PACF) is another measure of dependence, which describes the relationship
between an observation and its lagged values, excluding the influence of intermediate lags. This is particularly useful
in autoregressive (AR) models, where PACF helps determine the order of the model.
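
In R, the sample ACF and PACF can be computed directly; the AirPassengers dataset is assumed here as example data:

# Sample autocorrelation and partial autocorrelation functions
acf(AirPassengers)   # correlation of the series with its own lags
pacf(AirPassengers)  # lag-k correlation with intermediate lags partialled out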

2.4.2 Example Problem: Estimating Autocorrelation


Consider a simple time series dataset where we observe daily stock prices for 10 days:

X = {120, 122, 119, 118, 121, 123, 124, 125, 126, 128}.

We want to calculate the autocorrelation at lag 1 and lag 2.

Step 1: Compute the mean of the series. First, compute the mean $\bar{X}$ of the series:

$\bar{X} = \dfrac{120 + 122 + 119 + 118 + 121 + 123 + 124 + 125 + 126 + 128}{10} = \dfrac{1226}{10} = 122.6.$

Step 2: Define the autocorrelation formula. The formula for autocorrelation at lag $k$ is:

$\rho(k) = \dfrac{\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2},$

where:
• $X_t$ is the value at time $t$,
• $\bar{X}$ is the mean of the time series,
• $k$ is the lag,
• $n$ is the total number of observations.

Step 3: Calculate the denominator for all lags. The denominator for both lag 1 and lag 2 is the same:

$\sum_{t=1}^{n} (X_t - \bar{X})^2 = (120 - 122.6)^2 + (122 - 122.6)^2 + (119 - 122.6)^2 + \ldots + (128 - 122.6)^2.$

Substitute the values:

$= (-2.6)^2 + (-0.6)^2 + (-3.6)^2 + (-4.6)^2 + (-1.6)^2 + (0.4)^2 + (1.4)^2 + (2.4)^2 + (3.4)^2 + (5.4)^2$
$= 6.76 + 0.36 + 12.96 + 21.16 + 2.56 + 0.16 + 1.96 + 5.76 + 11.56 + 29.16 = 92.4.$

Thus, the denominator is 92.4.

Step 4: Calculate the numerator for lag 1. Now, compute the numerator for lag 1, which is:

$\sum_{t=1}^{n-1} (X_t - \bar{X})(X_{t+1} - \bar{X}) = (120 - 122.6)(122 - 122.6) + (122 - 122.6)(119 - 122.6) + \ldots + (126 - 122.6)(128 - 122.6).$

Substitute the values:

$= (-2.6)(-0.6) + (-0.6)(-3.6) + (-3.6)(-4.6) + (-4.6)(-1.6) + (-1.6)(0.4) + (0.4)(1.4) + (1.4)(2.4) + (2.4)(3.4) + (3.4)(5.4)$
$= 1.56 + 2.16 + 16.56 + 7.36 + (-0.64) + 0.56 + 3.36 + 8.16 + 18.36 = 57.44.$

Step 5: Calculate autocorrelation at lag 1. Now that we have the numerator and denominator, calculate the
autocorrelation:

$\rho(1) = \dfrac{57.44}{92.4} \approx 0.622.$

Thus, the autocorrelation at lag 1 is approximately 0.622, indicating a moderate positive correlation between con-
secutive values.

Step 6: Calculate the numerator for lag 2. For lag 2, compute the numerator:

$\sum_{t=1}^{n-2} (X_t - \bar{X})(X_{t+2} - \bar{X}) = (120 - 122.6)(119 - 122.6) + (122 - 122.6)(118 - 122.6) + \ldots + (125 - 122.6)(128 - 122.6).$

Substitute the values:

$= (-2.6)(-3.6) + (-0.6)(-4.6) + (-3.6)(-1.6) + (-4.6)(0.4) + (-1.6)(1.4) + (0.4)(2.4) + (1.4)(3.4) + (2.4)(5.4)$
$= 9.36 + 2.76 + 5.76 + (-1.84) + (-2.24) + 0.96 + 4.76 + 12.96 = 32.48.$

Step 7: Calculate autocorrelation at lag 2. Finally, calculate the autocorrelation at lag 2:

$\rho(2) = \dfrac{32.48}{92.4} \approx 0.352.$

Thus, the autocorrelation at lag 2 is approximately 0.352, indicating a weaker positive correlation between values
that are two time steps apart.

Conclusion From these calculations, we see that the autocorrelation at lag 1 is higher (0.622) compared to lag
2 (0.352). This suggests that consecutive stock prices are more closely related than prices separated by two days,
which is a typical observation in time series where immediate past values have a stronger influence on the present.
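
As a check, the hand computation can be reproduced in R; the same formula is applied directly and then compared against the built-in acf function:

# Verify the lag-1 and lag-2 autocorrelations of the worked example
x <- c(120, 122, 119, 118, 121, 123, 124, 125, 126, 128)
d <- x - mean(x)                               # deviations from the mean 122.6
n <- length(x)
rho1 <- sum(d[1:(n - 1)] * d[2:n]) / sum(d^2)  # approximately 0.622
rho2 <- sum(d[1:(n - 2)] * d[3:n]) / sum(d^2)  # approximately 0.352
acf(x, lag.max = 2, plot = FALSE)              # matches the hand computation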

2.5 Stationary Time Series


2.5.1 Definition and Importance of Stationarity
A stationary time series is one whose statistical properties such as mean, variance, and autocorrelation are constant
over time. Stationarity is important because many time series models, such as ARMA, ARIMA, or GARCH, assume
that the data is stationary. If the data is not stationary, the model’s performance can suffer, leading to poor forecasts.
Formally, a time series $\{X_t\}$ is stationary if for all time points $t$, the following conditions hold:
• $E(X_t) = \mu$ (constant mean)
• $\mathrm{Var}(X_t) = \sigma^2$ (constant variance)

• $\mathrm{Cov}(X_t, X_{t+k}) = \gamma(k)$ (autocovariance that depends only on the lag $k$)

2.5.2 Features of Stationary Time Series


Key characteristics of a stationary series include:
• The series fluctuates around a constant mean.
• There is no long-term trend.

• The autocorrelation function decreases quickly as the lag increases.

2.5.3 R Example: Plotting a Stationary Series


We can generate and plot a stationary time series in R using the following code:
set.seed(123)
stationary_series <- ts(rnorm(100), frequency = 12)
plot(stationary_series, main = "Stationary Time Series", ylab = "Values", xlab = "Time")
Listing 11: Generating a Stationary Time Series


Figure 3: Stationary Time Series

The plot of the stationary series will show random fluctuations around a constant mean with no visible trend.

2.6 Estimation of Correlation


2.6.1 Definition of Correlation
Correlation is a measure of the strength and direction of the linear relationship between two variables. In the context
of time series, correlation helps identify how strongly values at one time point relate to values at a later time point
(lagged values). Estimating correlation is crucial for understanding the underlying patterns in the data, such as
seasonality or trends.
The Pearson correlation coefficient is given by:

$r = \dfrac{\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})}{\sqrt{\sum_{t=1}^{n} (X_t - \bar{X})^2 \sum_{t=1}^{n} (Y_t - \bar{Y})^2}}$

In time series analysis, we often compute autocorrelation or the correlation between values at different time lags.
Estimating the autocorrelation function (ACF) helps us determine whether previous values have predictive power
for future values.
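
As a small sketch, the lag-1 correlation of a series with itself can be computed with cor; the AirPassengers dataset is assumed as example data:

# Pearson correlation between consecutive observations (lag 1)
x <- as.numeric(AirPassengers)
cor(x[-length(x)], x[-1])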

2.6.2 Proof of Correlation for Time Series


For a time series $\{X_t\}$, the autocorrelation at lag $k$ is:

$\rho(k) = \dfrac{\sum_{t=k+1}^{n} (X_t - \bar{X})(X_{t-k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2}$

This equation calculates the correlation between observations separated by k time steps. As k increases, ρ(k) typically
decreases, reflecting the diminishing influence of earlier observations on future values.

2.7 Vector-Valued and Multi-Dimensional Series


2.7.1 Definition and Importance
A vector-valued time series consists of multiple time series observed together. These can be considered multidimen-
sional or multivariate, where each dimension represents a different but related time series. Analyzing such series is
important in fields like economics (e.g., analyzing stock prices for multiple companies) and environmental science
(e.g., temperature, humidity, and wind speed together).
Multidimensional time series analysis focuses on understanding the relationships between these multiple series
and how they jointly evolve over time. The vector autoregressive (VAR) model is a common model used for such
series.

2.7.2 Example: Vector-Valued Series in R


We can create and analyze a vector-valued time series in R using the following code:
1 # Simulate two related time series
2 set . seed (123)
3 ts1 <- ts ( rnorm (100) , frequency =12)
4 ts2 <- ts ( rnorm (100 , mean =2) , frequency =12)
5
6 # Combine them into a multivariate time series
7 multi _ series <- ts ( cbind ( ts1 , ts2 ) , frequency =12)
8
9 # Plot the multivariate series
10 plot ( multi _ series , main = " Vector - Valued Time Series " , col = c ( " blue " , " red " ) , lty =1:2)
11 legend ( " topright " , legend = c ( " Series 1 " , " Series 2 " ) , col = c ( " blue " , " red " ) , lty =1:2)
Listing 12: Creating and Plotting a Multidimensional Time Series

Equation for Multivariate Model: In a multivariate time series model, each variable depends on its own past
values and the past values of other variables. The vector autoregressive (VAR) model for two time series $X_t$ and $Y_t$
is given by:

$X_t = \alpha_1 X_{t-1} + \beta_1 Y_{t-1} + \epsilon_{1t}$
$Y_t = \alpha_2 X_{t-1} + \beta_2 Y_{t-1} + \epsilon_{2t}$

where $\alpha_1, \alpha_2, \beta_1, \beta_2$ are coefficients and $\epsilon_{1t}, \epsilon_{2t}$ are error terms.
This type of modeling is crucial in understanding how multiple series interact over time.
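
A minimal base-R simulation of the two-equation VAR above; the coefficient values are assumptions chosen purely for illustration (small enough that the process is stationary):

# Simulate the two-equation VAR(1) defined above
set.seed(1)
n <- 200
x <- y <- numeric(n)
a1 <- 0.5; b1 <- 0.2; a2 <- 0.1; b2 <- 0.6  # assumed coefficients
for (t in 2:n) {
  x[t] <- a1 * x[t - 1] + b1 * y[t - 1] + rnorm(1)  # error term epsilon_1t
  y[t] <- a2 * x[t - 1] + b2 * y[t - 1] + rnorm(1)  # error term epsilon_2t
}
ts.plot(cbind(x, y), col = c("blue", "red"))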


Module - 1
Chapter - 3 & 4

3 Components of Time Series


The characteristics of a time series are defined by the various types of movements or fluctuations that occur over
time. These movements, known as the components of a time series, help explain the underlying patterns in the data.
There are four main components:

1. Secular Trend (T)


The Secular Trend, also known as the long-term trend or simply trend, refers to the general movement of data,
either upward or downward, over an extended period. It captures the long-term tendency of a dataset to grow or
decline, ignoring short-term fluctuations.
For example, the population of India shows a clear upward trend over the years, while the death rate after
independence has steadily declined due to improvements in literacy and healthcare. It’s important to note that what
constitutes a ”long period” depends on the context of the data. For instance, an increase in cloth store sales over
one year (e.g., from 1996 to 1997) is too short a period to be considered a secular trend.
However, in certain cases, a shorter time frame can reflect a trend if the nature of the data allows. For example,
in a bacterial culture exposed to germicide, counting the number of organisms still alive every 10 seconds over 5
minutes could reveal a general decline in numbers, which would represent a secular trend over that period.
Mathematically, secular trends are categorized into two types:

1. Linear Trend: A consistent, straight-line increase or decrease over time.


2. Curvi-Linear Trend (Non-Linear Trend): A trend where the rate of change is not constant, resulting in
a curved pattern.

Figure 4: Linear Trend and Non Linear Trend

Example: A consistent rise in global temperatures over decades due to climate change.


2. Seasonal Variations (S)


Seasonal variations occur in a time series due to rhythmic forces that repeat in a regular and periodic manner
within a period of less than one year. These variations follow the same pattern year after year. The period may be
monthly, weekly, or even hourly, but if data is given in yearly terms, seasonal fluctuations do not exist.
Seasonal fluctuations in a time series arise from two main factors:

1. Natural forces
2. Manmade conventions

The most significant cause of seasonal variations is climate. Changes in weather conditions—such as rainfall,
humidity, and temperature—impact industries and products differently. For example, there is a higher demand for
woolen clothes and hot drinks in winter, while in summer, cotton clothes and cold drinks see increased sales. During
the rainy season, the demand for umbrellas and raincoats rises.
In addition to nature, customs, traditions, and habits also influence seasonal variation. For instance, during
festivals like Diwali, Dussehra, and Christmas, there is an increased demand for sweets and clothes. Similarly, the
start of a school or college year sees a surge in demand for books and stationery.
Example: Higher sales of air conditioners during summer months due to the hot weather.

3. Cyclical Variations (C)


Cyclical movements occur over longer time periods than seasonal variations and typically reflect economic cycles
such as booms and recessions. These cycles generally last for several years and, unlike seasonal variations, they do
not follow a fixed or regular pattern.
Cyclical variations refer to fluctuations lasting more than one year. The rhythmic movements in a time series that
repeat in the same manner over a period longer than one year are called cyclical variations, and the duration of one
such repetition is referred to as a cycle. Time series related to business and economics often exhibit cyclical behavior.
A classic example of cyclical variation is the Business Cycle, which includes four well-defined phases:

1. Boom: This phase is characterized by rapid economic growth, high levels of production, employment, and
rising prices. During the boom period, consumer demand is strong, and businesses expand rapidly. However,
inflationary pressures may also build up, leading to potential overheating of the economy.
Example: The global economy in the late 1990s experienced a boom due to the dot-com bubble, where
technology companies saw rapid growth and expansion.
2. Decline: After the boom, the economy begins to slow down. Production and demand decrease, unemployment
starts to rise, and inflation stabilizes. This phase marks the transition from a peak towards a downturn, signaling
the end of rapid economic expansion.
Example: The early 2000s saw a decline after the burst of the dot-com bubble, where stock prices fell, and
many tech companies collapsed, leading to a slowdown in economic growth.
3. Depression: This is the lowest phase of the cycle, marked by a significant decline in economic activity. There
is high unemployment, reduced consumer spending, lower investment, and overall economic stagnation. It
represents the most severe form of economic contraction.
Example: The Great Depression of the 1930s is a classic example, where global economies shrank, unemploy-
ment reached record levels, and industrial output dropped sharply.
4. Improvement (Recovery): After the depression, the economy begins to recover. Businesses start investing
again, employment rises, and consumer confidence gradually returns. Production and demand start increasing,
marking the beginning of the next upward cycle.
Example: After the Great Recession of 2008, the economy began recovering in 2010, with improved job growth,
increased consumer spending, and steady economic expansion.


These phases repeat over time, reflecting the fluctuating nature of economic activity.

Figure 5: Phases of Business cycle

Example: Economic cycles with alternating periods of economic expansion and contraction.

4. Irregular Variations (I)


Irregular variations, also known as Erratic, Accidental, or Random Variations, are unpredictable and non-
recurring fluctuations in a time series caused by unexpected events. Unlike trend, seasonal, and cyclical varia-
tions—which are considered regular variations—irregular variations are random and typically short-term, making
them difficult to model or forecast.
These fluctuations are the result of unforeseen circumstances that are beyond human control, such as natural
disasters, wars, pandemics, or other catastrophic events. Irregular variations significantly disrupt a time series but
are not as structurally important as other variations.
Example: The COVID-19 pandemic in 2020 led to severe and unexpected disruptions across global economies,
causing irregular variations in many time series related to employment, GDP, and stock market performance. This
variation could not be predicted and doesn’t follow any consistent or repeating pattern.
Together, these components provide a framework to analyze time series data, enabling better forecasting and
understanding of the underlying patterns.

3.1 Additive and Multiplicative models


In time series analysis, a mathematical model represents the underlying structure of the data. It is assumed that
the time series consists of various components such as trends, seasonal variations, cyclical variations, and irregular
variations. These components together explain the observed value of the time series at any point in time.
The objective of a mathematical model is to decompose a time series into its constituent components in order to
better understand, analyze, and forecast future values. Two widely used models in classical time series analysis are
the Additive Model and the Multiplicative Model.

Why Are Mathematical Models Needed?


Mathematical models are essential in time series analysis for the following reasons:
1. Understanding patterns: By decomposing the time series into its components, we can identify trends,
seasonal behaviors, and cyclical movements that help in understanding the nature of the data.
2. Forecasting: Mathematical models help us forecast future values based on past observations and the relation-
ships among the components.
3. Analyzing irregularities: With the model, irregular or random variations can be separated from systematic
variations, allowing analysts to focus on predictable aspects of the data.


Additive Model
The Additive Model assumes that the different components of a time series combine in an additive manner. That
is, the observed value Yt at time t is the sum of the contributions of the individual components.
Mathematically, the additive model is represented as:

Yt = Tt + St + Ct + It
where:
• Yt is the observed value at time t,
• Tt is the trend component at time t,

• St is the seasonal component at time t,


• Ct is the cyclical component at time t, and
• It is the irregular component at time t.

When to Use the Additive Model


The additive model is useful when the variation in the seasonal and cyclical components remains relatively constant
over time. For example, if sales of ice cream increase by a fixed amount every summer, we can model that seasonal
variation additively.

Example
Consider a time series of monthly sales data for a store over a year. Suppose the trend increases by 5 units per
month, the seasonal effect adds 10 units during the summer months (June, July, and August), and cyclical factors
add or subtract up to 3 units. Then, using the additive model, we can express the sales data Yt for a summer month
as:

Yt = 5t + 10 + Ct + It
where Ct is the cyclical effect and It represents any irregular variations.
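As an illustration, here is a minimal R sketch (hypothetical numbers loosely following the example above) that simulates an additive series with a linear trend, a fixed summer effect, and random noise:

# A minimal sketch: simulating the additive model Yt = Tt + St + It
# (hypothetical numbers; the cyclical component is omitted for simplicity)
set.seed(1)
t <- 1:36                                        # three years of monthly data
trend <- 5 * t                                   # trend rises by 5 units per month
seasonal <- rep(c(0, 0, 0, 0, 0, 10, 10, 10, 0, 0, 0, 0), 3)  # +10 in Jun-Aug
irregular <- rnorm(36, mean = 0, sd = 2)         # random noise
Y <- trend + seasonal + irregular
plot(ts(Y, frequency = 12), ylab = "Y", main = "Simulated Additive Series")

Note that the seasonal swing stays a constant 10 units regardless of how large the trend grows, which is the defining feature of the additive form.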

Multiplicative Model
The Multiplicative Model assumes that the components of the time series interact in a multiplicative manner.
That is, the observed value Yt at time t is the product of the contributions of the individual components.
Mathematically, the multiplicative model is represented as:

Yt = Tt × St × Ct × It
where the variables Yt , Tt , St , Ct , and It represent the same components as in the additive model.

When to Use the Multiplicative Model


The multiplicative model is useful when the seasonal and cyclical variations are proportional to the level of the trend.
For instance, if sales of ice cream double in the summer but are still dependent on an overall increasing trend, a
multiplicative model would be more appropriate.


Example
Consider a company’s quarterly revenue over a few years. If the trend increases by 10% each quarter, and sales are
doubled during the holiday season, the multiplicative model expresses the revenue as:

Yt = Tt × 2 × Ct × It
where Tt represents the 10% growth in each quarter, the factor 2 accounts for the seasonal holiday surge, Ct
captures any cyclical effects, and It represents irregular variations.
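A parallel R sketch (again with hypothetical numbers) shows the multiplicative form, where the seasonal factor scales the trend rather than adding to it:

# A minimal sketch: simulating the multiplicative model Yt = Tt x St x It
# (hypothetical numbers; cyclical component omitted)
set.seed(2)
t <- 1:12                                     # three years of quarterly data
trend <- 100 * 1.10^t                         # 10% growth per quarter
seasonal <- rep(c(1, 1, 1, 2), 3)             # holiday quarter doubles revenue
irregular <- rnorm(12, mean = 1, sd = 0.05)   # proportional random variation
Y <- trend * seasonal * irregular
plot(ts(Y, frequency = 4), ylab = "Revenue", main = "Simulated Multiplicative Series")

Here the absolute size of the seasonal surge grows with the trend, which is exactly when the multiplicative model is preferred.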

Choosing Between Additive and Multiplicative Models


The choice between the additive and multiplicative models depends on the nature of the data:
• Additive Model: Appropriate when the variations are constant over time and do not depend on the trend.
• Multiplicative Model: Suitable when the variations grow or shrink in proportion to the trend.

Conclusion
Both the additive and multiplicative models provide valuable ways to decompose a time series into its underlying
components. By choosing the right model, analysts can gain better insights into trends, seasonal variations, and
cyclical movements, and make more accurate forecasts.

3.2 Resolving components of a Time Series


In time series analysis, resolving the different components is a fundamental task to understand the underlying
patterns. Time series data is usually composed of several components, and the key components include:
• Trend (Tt ): The long-term movement in the data over time.
• Seasonality (St ): Regular patterns that repeat over fixed intervals of time.
• Cyclicality (Ct ): Long-term fluctuations caused by economic cycles.
• Irregularity (It ): Random or unpredictable movements, typically caused by unforeseen factors.
The relationship between these components can be expressed using two main models:
• Additive Model:
Yt = Tt + St + Ct + It

• Multiplicative Model:
Yt = Tt × St × Ct × It

In R, you can resolve the components of a time series using the built-in decompose() function, which supports both
additive and multiplicative models via its type argument, or stl(), which fits an additive decomposition using loess
(a multiplicative series can be handled by log-transforming it first). Consider the following example where we decompose the
AirPassengers dataset.
# Load the AirPassengers dataset
data(AirPassengers)

# Decompose the time series into trend, seasonal, and random components
decomposed_data <- decompose(AirPassengers, type = "multiplicative")

# Plot the decomposed components
plot(decomposed_data)
Listing 13: Decomposing the AirPassengers Series


3.3 Measuring Trend


The trend component of a time series reflects the long-term movement in the data. Understanding the trend is
crucial for forecasting future values and identifying underlying patterns. There are several methods commonly used
to measure trends:

3.3.1 Graphic
The graphic method, also known as the eye inspection method, is the simplest and most intuitive approach to
identifying trends in time series data. This method involves the following steps:

1. Plot the Data: First, plot the given time series data on a graph, with time on the x-axis and the variable
of interest on the y-axis.
2. Draw a Trend Line: A smooth, free-hand curve is then drawn through the plotted points, representing the
general tendency of the series. This curve visually highlights the trend over time.

The graphic method effectively removes short-term variations to reveal the underlying trend in the data. The
trend line can also be extended to predict or estimate future values, making it a useful tool for forecasting.

Importance of the Graphic Method


• Provides a visual and intuitive understanding of the trend.
• Easy to implement, requiring no complex calculations.
• Serves as a preliminary tool before applying more sophisticated methods.

Limitations
However, it is important to note that this method is subjective, and the accuracy of the predictions may vary
depending on how the trend line is drawn. As such, while the graphic method is useful for initial analysis, it should
be supplemented with more rigorous statistical techniques for reliable forecasting.

Example
Consider monthly sales data for a retail store over a year:

Month Sales
Jan 100
Feb 120
Mar 140
Apr 160
May 150
Jun 130
Jul 180
Aug 190
Sep 170
Oct 160
Nov 140
Dec 200

Table 2: Monthly Sales Data

In R, the plotting can be done using the following code:


# Define the sales data
sales <- c(100, 120, 140, 160, 150, 130, 180, 190, 170, 160, 140, 200)
months <- 1:12

# Plot the sales data
plot(months, sales, type = "o", col = "blue", xlab = "Month", ylab = "Sales")

# Add a manually drawn trend line (approximate)
lines(c(1, 12), c(100, 200), col = "red", lwd = 2)
Listing 14: Plotting the Sales Data with an Approximate Trend Line
Figure 6: Trend from the Data

The red line represents the overall trend in sales. Although the data fluctuates, the general upward direction is
clearly visible.
Advantages:
• Simplicity: The graphic method is one of the simplest approaches to studying trend values and is easy to
implement.
• Expertise Benefits: An experienced statistician can often draw a trend line that better represents the data
than one fitted using mathematical formulas.
• Applicability: Despite not being recommended for beginners, this method has significant merits in the hands
of skilled statisticians and is widely used in practical applications.

Disadvantages:
• Subjectivity: The method is highly subjective; the resulting trend line can vary significantly based on who
draws it.
• Skill Requirements: It requires the work to be conducted by skilled and experienced individuals to ensure
accuracy.
• Reliability Concerns: The subjective nature of this method means that predictions derived from it may not
be reliable.
• Careful Execution: Drawing the trend line must be done carefully to avoid misrepresentation of the data.


3.3.2 Semi-Averages
The semi-averages method involves dividing the time series data into two equal parts with respect to time. For
instance, if we have data spanning from 1999 to 2016 (a total of 18 years), we would split it into two equal parts:
• The first part: 1999 to 2007
• The second part: 2008 to 2016
In cases where the number of years is odd, such as 9, 13, or 17, the middle year is omitted. For example, for 19
years of data from 1998 to 2016, the division would be:
• The first part: 1998 to 2006
• The second part: 2008 to 2016 (omitting the middle year 2007)
Once the data is divided, we calculate the arithmetic mean for each part, yielding two average values. These
averages are then plotted against the mid-year of each part, and a straight line is drawn to connect the two points.
This line represents the trend, which can be extended to estimate intermediate values or predict future values.

3.3.3 Example
Consider the following production data over several years:

Year Production
2001 40
2002 45
2003 40
2004 42
2005 46
2006 52
2007 56
2008 61

Table 3: Production Data

To calculate the semi-averages:

1. Divide the data:
• First part (2001 to 2004): 40, 45, 40, 42
• Second part (2005 to 2008): 46, 52, 56, 61
2. Calculate the averages:
$$\text{Average}_1 = \frac{40 + 45 + 40 + 42}{4} = \frac{167}{4} = 41.75$$
$$\text{Average}_2 = \frac{46 + 52 + 56 + 61}{4} = \frac{215}{4} = 53.75$$
3. Plotting: The averages (41.75 and 53.75) are plotted against the mid-years (2002.5 for the first part and 2006.5
for the second part), and a straight line is drawn connecting these two points, which represents the trend in production.
This method effectively captures the underlying trend in the data, providing a straightforward approach to trend
analysis. The blue points represent the production data over the years, while the red points indicate the semi-averages
calculated for the two parts. The red line shows the trend derived from these semi-averages.
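The same construction can be reproduced in R; the sketch below uses the production data from Table 3:

# A sketch of the semi-averages method for the production data above
years <- 2001:2008
production <- c(40, 45, 40, 42, 46, 52, 56, 61)
avg1 <- mean(production[1:4])   # first half: 41.75
avg2 <- mean(production[5:8])   # second half: 53.75
plot(years, production, type = "o", col = "blue",
     xlab = "Year", ylab = "Production")
# Trend line through the semi-averages, placed at the mid-year of each half
points(c(2002.5, 2006.5), c(avg1, avg2), col = "red", pch = 19)
lines(c(2002.5, 2006.5), c(avg1, avg2), col = "red", lwd = 2)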


Figure 7: Trend analysis using the semi-averages method.

Advantages:
• Simplicity: This method is easier to understand compared to the moving average method and the method of
least squares.
• Objectivity: It is an objective method for measuring trends; anyone applying this method will arrive at the
same results.
Disadvantages:
• Assumption of Linearity: The method assumes a straight-line relationship between the plotted points,
regardless of whether such a relationship actually exists.
• Data Sensitivity: If additional data is added to the original dataset, the entire calculation must be redone
to obtain new trend values, and the trend line will change accordingly.
• Influence of Extremes: Since the arithmetic mean is calculated for each half, an extreme value in either half
can significantly impact the points. As a result, the trend derived from these points may not be sufficiently
accurate for future forecasting.

3.3.4 Moving Average


The moving average method is a widely used technique for computing trend values in a time series. This method
effectively eliminates short-term and random fluctuations by calculating successive arithmetic means over a specified
period. The period of the moving average is denoted as m, where m represents the number of data points included
in each average.
The moving average is calculated as follows:
• The first average is the mean of the first m terms.
• The second average is the mean of the 2nd term to the (m + 1)th term.
• The third average is the mean of the 3rd term to the (m + 2)th term, and so on.

When m is odd, the moving average is associated with the mid-value of the time interval it covers. For instance,
if m = 3, the moving average for the first three data points will be placed against the second data point (mid-point).
However, if m is even, the moving average will lie between two middle periods, which do not correspond to any
specific time period. To address this, a secondary calculation is performed by taking the average of the moving
averages (2-yearly moving average) to align the result with a specific time period.
Example: Calculate the 3-yearly moving average for the following data.

Years      Production   3-Yearly Moving Average (Trend Values)
2001-02    40           —
2002-03    45           (40 + 45 + 40)/3 = 41.67
2003-04    40           (45 + 40 + 42)/3 = 42.33
2004-05    42           (40 + 42 + 46)/3 = 42.67
2005-06    46           (42 + 46 + 52)/3 = 46.67
2006-07    52           (46 + 52 + 56)/3 = 51.33
2007-08    56           (52 + 56 + 61)/3 = 56.33
2008-09    61           —

Table 4: 3-Yearly Moving Average Calculation

Calculation Explanation: For 2002-03, the moving average is calculated using the production values for
2001-02, 2002-03, and 2003-04:
$$\text{Moving Average} = \frac{40 + 45 + 40}{3} = 41.67$$
For 2003-04, the moving average uses the values for 2002-03, 2003-04, and 2004-05:
$$\text{Moving Average} = \frac{45 + 40 + 42}{3} = 42.33$$
This process continues until the last available data point. The moving average method is useful for smoothing out
short-term fluctuations in data, providing a clearer view of the long-term trend. By systematically averaging data
over a specified period, this method facilitates better forecasting and analysis in various fields, including economics,
sales, and environmental studies.
Conclusion: The moving average is a fundamental tool in time series analysis, allowing for a better understanding of underlying trends by reducing noise from random fluctuations.
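The 3-yearly moving averages in Table 4 can be reproduced in R with the stats::filter() function, as in this short sketch:

# A sketch reproducing the 3-yearly moving averages with stats::filter()
production <- c(40, 45, 40, 42, 46, 52, 56, 61)
ma3 <- stats::filter(production, rep(1/3, 3), sides = 2)  # centred 3-term mean
round(ma3, 2)
# NA 41.67 42.33 42.67 46.67 51.33 56.33 NA  (matches Table 4)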

Calculate the 4-yearly moving average for the following data.

Years      Production   4-Yearly Moving Average          2-Yearly Centred Moving Average (Trend Values)
2001-02    40           —                                —
2002-03    45           —                                —
2003-04    40           (40 + 45 + 40 + 42)/4 = 41.75    (41.75 + 43.25)/2 = 42.50
2004-05    42           (45 + 40 + 42 + 46)/4 = 43.25    (43.25 + 45.00)/2 = 44.13
2005-06    46           (40 + 42 + 46 + 52)/4 = 45.00    (45.00 + 49.00)/2 = 47.00
2006-07    52           (42 + 46 + 52 + 56)/4 = 49.00    (49.00 + 53.75)/2 = 51.38
2007-08    56           (46 + 52 + 56 + 61)/4 = 53.75    —
2008-09    61           —                                —

Table 5: 4-Yearly Moving Average Calculation

Calculation Explanation: For 2003-04, the 4-yearly moving average is calculated as follows:
$$\text{Moving Average} = \frac{40 + 45 + 40 + 42}{4} = 41.75$$
For 2004-05, the calculation is:
$$\text{Moving Average} = \frac{45 + 40 + 42 + 46}{4} = 43.25$$
Each pair of adjacent 4-yearly averages is then averaged in turn (the 2-yearly centring step) so that the resulting
trend values align with specific years.

This process continues until the last available data point.
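The centring step can be checked in R as well; a small sketch for the data above:

# A sketch of the 4-yearly moving average with 2-yearly centring
production <- c(40, 45, 40, 42, 46, 52, 56, 61)
n <- length(production)
ma4 <- sapply(1:(n - 3), function(i) mean(production[i:(i + 3)]))
ma4                    # 41.75 43.25 45.00 49.00 53.75 (falls between years)
centred <- (ma4[-length(ma4)] + ma4[-1]) / 2
round(centred, 2)      # 42.50 44.13 47.00 51.38 (aligned with 2003-04 to 2006-07)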


Additional Exercise Problems:

1. Given the following production data over a 5-year period, calculate the 3-yearly moving average. Use the
moving averages to identify the trend:
• 2010: 30
• 2011: 35
• 2012: 50
• 2013: 45
• 2014: 60
2. Consider the following data for sales over 6 years. Calculate the 2-yearly moving average and discuss any
observed trends:
• 2015: 80
• 2016: 90
• 2017: 85
• 2018: 95
• 2019: 100
• 2020: 110
3. A company’s quarterly earnings over two years are as follows. Calculate the 4-quarter moving average and
explain any patterns you find:
• Q1 2018: 200
• Q2 2018: 220
• Q3 2018: 210
• Q4 2018: 250
• Q1 2019: 240
• Q2 2019: 260
• Q3 2019: 280
• Q4 2019: 300

Advantages:
• This method is simple to understand and easy to execute.
• It has flexibility in application; if new data for additional time periods are added, previous calculations remain
unaffected, allowing for the generation of more trend values.
• It provides an accurate representation of the long-term trend, particularly if the trend is linear.
• When the period of the moving average coincides with the period of oscillation (cycle), periodic fluctuations
are effectively eliminated.
• The moving average adapts to general movements in the data, with its shape determined by the actual data
rather than arbitrary choices made by the statistician.
• It is effective for smoothing out short-term fluctuations, allowing for clearer visibility of long-term trends.
• The moving average can be easily visualized on a graph, making it a useful tool for presentations and reports.
Disadvantages:

• For a moving average of period 2m + 1, no trend values are generated for the first m and last m periods, limiting the
analysis of the entire dataset.
• The trend path does not correspond to any specific mathematical function, making it unsuitable for forecasting
or predicting future values.
• If the underlying trend is not linear, moving averages may not accurately reflect the true tendency of the data.
• The selection of the period for the moving average can be subjective, potentially introducing human bias into
the analysis.
• Moving averages can lag behind actual data changes, which may lead to delays in identifying trends.
• In cases of sudden shifts or changes in the data, moving averages may provide a misleading representation of
the trend, as they are based on historical data.
• The smoothing effect of moving averages can sometimes obscure important fluctuations that may need to be
addressed.

3.3.5 Method of Least Squares


This method is widely used in practice. It is a mathematical approach that fits a trend line to the data, satisfying
the following two conditions:

1. Σ(Y − Ŷ) = 0
2. Σ(Y − Ŷ)² is minimized.

The method of least squares relies on two fundamental conditions to ensure that the fitted line provides the best
representation of the data.
1. Condition: Σ(Y − Ŷ) = 0
This condition states that the sum of the residuals (the differences between the observed values Y and the predicted
values Ŷ) must equal zero.
Explanation
• Residuals: The residual for each data point is defined as Yt − Ŷt . It measures the error between the actual
observation and the value predicted by the model.
• Sum of Residuals: When we sum these residuals across all observations, the condition Σ(Y − Ŷ) = 0 ensures
that the positive and negative errors balance out. If this condition is satisfied, it indicates that the model does
not systematically overestimate or underestimate the values.
• Mathematical Justification:
$$\sum_t (Y_t - \hat{Y}_t) = 0$$
This condition also falls out of the optimization process: minimizing the sum of squared deviations with respect
to the intercept inherently leads to it.
2. Condition: Σ(Y − Ŷ)² is minimized
This condition involves minimizing the sum of the squares of the residuals.
Explanation
• Purpose of Squaring: Squaring the residuals ensures that positive and negative errors do not cancel each
other out, which could happen in the first condition. Squaring amplifies larger errors more than smaller ones,
which helps in identifying models that fit better overall.
• Objective: The objective of the least squares method is to find the parameters (like a and b in a linear
equation) that minimize the sum of these squared differences:
$$S = \sum_t (Y_t - \hat{Y}_t)^2$$


• Geometric Interpretation: In a geometric sense, this condition ensures that the trend line is as close as
possible to all data points, minimizing the overall distance from each point to the line.
• Derivation: To find the best fitting line, we take the derivative of S with respect to the parameters (like a
and b) and set these derivatives to zero. This process yields the normal equations, which can then be solved to
find the optimal values of the parameters.
Summary
Both conditions together ensure that the best-fitting line through the data not only balances the residuals (no
systematic bias) but also minimizes the overall error in terms of squared differences, leading to the most accurate
predictions possible within the context of a linear model. This approach is foundational in regression analysis, helping
to create models that accurately reflect underlying trends in data.

Fitting a Straight Line Trend by the Method of Least Squares Let Yt be the value of the time series at
time t. Thus, Yt is the dependent variable, and t is the independent variable.
Assume a straight line trend of the form:
Ŷt = a + bt
where Ŷt designates the trend values to distinguish them from the actual Yt values, a is the Y-intercept, and b is the
slope of the trend line.
To fit a straight line trend to a time series, we assume a linear relationship of the form:

Ytc = a + bt

where Ytc is the trend value at time t, a is the Y-intercept, and b is the slope of the trend line. The goal is to estimate
the parameters a and b such that the sum of the squared deviations between the actual values Yt and the trend
values Ytc is minimized:
$$S = \sum (Y_t - Y_t^c)^2 = \sum \left(Y_t - (a + bt)\right)^2.$$
To find the optimal values of a and b, we differentiate S with respect to a and b and set the derivatives to zero.
Differentiating S with respect to a gives:
$$\frac{\partial S}{\partial a} = -2 \sum \left(Y_t - (a + bt)\right) = 0.$$
Rearranging yields:
$$\sum \left(Y_t - (a + bt)\right) = 0 \implies \sum Y_t = na + b \sum t,$$
where n is the number of observations. Thus, we obtain:
$$na = \sum Y_t - b \sum t \implies a = \frac{1}{n}\left(\sum Y_t - b \sum t\right).$$
Substituting this back into the equation for Ytc leads to a simplified expression for a.
Next, we differentiate S with respect to b:
$$\frac{\partial S}{\partial b} = -2 \sum t\left(Y_t - (a + bt)\right) = 0.$$
Rearranging gives:
$$\sum t\left(Y_t - (a + bt)\right) = 0 \implies \sum t Y_t = a \sum t + b \sum t^2.$$
This can be rearranged to yield:
$$b = \frac{\sum t Y_t - a \sum t}{\sum t^2}.$$
The resulting normal equations from this process are:
$$\sum Y_t = na + b \sum t, \qquad (11)$$
$$\sum t Y_t = a \sum t + b \sum t^2. \qquad (12)$$


Solving these two normal equations will yield the estimates â and b̂.
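In R the normal equations are solved automatically by lm(); the sketch below, on hypothetical production figures, also verifies the estimates against the explicit formulas above:

# A sketch: fitting a straight-line trend by least squares
# (hypothetical production figures)
Y <- c(75, 86, 98, 90, 96, 108, 124, 140, 150, 165)
t <- 1:10
fit <- lm(Y ~ t)       # least squares estimates a-hat and b-hat
coef(fit)              # intercept a and slope b

# The same estimates from the normal equations directly
n <- length(Y)
b <- (sum(t * Y) - sum(t) * sum(Y) / n) / (sum(t^2) - sum(t)^2 / n)
a <- mean(Y) - b * mean(t)
c(a = a, b = b)        # matches coef(fit)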
If we wish to fit a parabolic trend of the form:
$$Y_t^c = a + bt + ct^2,$$
we differentiate S with respect to c as well:
$$\frac{\partial S}{\partial c} = -2 \sum \left(Y_t - (a + bt + ct^2)\right) t^2 = 0.$$
Rearranging yields:
$$\sum \left(Y_t - (a + bt + ct^2)\right) t^2 = 0 \implies \sum t^2 Y_t = a \sum t^2 + b \sum t^3 + c \sum t^4.$$
The normal equations for the parabolic trend can be summarized as:
$$\sum Y_t = na + b \sum t + c \sum t^2, \qquad (13)$$
$$\sum t Y_t = a \sum t + b \sum t^2 + c \sum t^3, \qquad (14)$$
$$\sum t^2 Y_t = a \sum t^2 + b \sum t^3 + c \sum t^4. \qquad (15)$$
Solving these three equations provides the values of â, b̂, and ĉ. Substituting these values into the equation for
the parabolic trend gives:
$$Y_t^c = \hat{a} + \hat{b}t + \hat{c}t^2.$$
To assess the appropriateness of the parabolic trend model, one can use the method of second differences. If the
second differences are constant (or nearly constant), the quadratic equation is a suitable representation of the trend
component.
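A parabolic fit and the second-difference check can both be sketched in R; here on simulated data with a genuinely quadratic pattern:

# A sketch: fitting a parabolic trend and checking second differences
set.seed(3)
t <- 1:8
Y <- c(12, 13, 16, 21, 28, 37, 48, 61) + rnorm(8, sd = 0.2)  # quadratic + noise
fit2 <- lm(Y ~ t + I(t^2))       # solves normal equations (13)-(15) internally
coef(fit2)                       # a-hat, b-hat, c-hat
diff(Y, differences = 2)         # roughly constant, so a quadratic trend is plausible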


Advantages
• This is a mathematical method of measuring trend, and as such, there is no possibility of subjectiveness; i.e.,
everyone who uses this method will get the same trend line.
• The line obtained by this method is called the line of best fit.
• Trend values can be obtained for all the given time periods in the series.

Disadvantages
• Great care should be exercised in selecting the type of trend curve to be fitted, i.e., linear, parabolic, or some
other type. Carelessness in this respect may lead to wrong results.
• The method is more tedious and time-consuming.
• Predictions are based only on long-term variations, i.e., trend, and the impact of cyclical, seasonal, and irregular
variations is ignored.

• This method cannot be used to fit growth curves like the Gompertz curve:
$$Y = K a^{b^X}, \qquad (16)$$
or the logistic curve:
$$Y = \frac{K}{1 + a b^{-X}}. \qquad (17)$$


Question Bank
1. Define a time series and elaborate on its fundamental components.
2. Discuss the notion of a secular trend in a time series and outline the methods employed to isolate it.
3. Explain the moving average method used for trend determination, including its advantages and disadvantages.
4. Analyze the graphic method and the least squares method for trend analysis, emphasizing their respective
advantages and disadvantages.
5. Provide a brief overview of the moving averages method for calculating trends.
6. In what ways does time series analysis support business forecasting?
7. Distinguish between secular trends, seasonal variations, and cyclical fluctuations, and describe the various
methods used to measure each.
8. Summarize the additive and multiplicative models of time series. Which of these models is more prevalent in
practice, and why?
9. Explain the process of determining seasonal variation using a 12-month moving average.
10. What methods are available for identifying trends in a time series?
11. Describe the least squares method for trend determination in detail.
12. Given the production data of steel in a factory over the past 10 years, fit a straight-line trend and tabulate the
trend values. Estimate the production for the year 1997 based on the trend:
• Year: 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996
• Production (tonnes): 75, 86, 98, 90, 96, 108, 124, 140, 150, 165
13. Fit a straight-line trend for the following data using the least squares method and estimate production for the
year 1997:
• Year: 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996
• Production (tonnes): 12, 13, 13, 16, 19, 23, 21, 23
14. Fit a straight-line trend using the least squares method for the following data and estimate production for the
year 2000:
• Year: 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997
• Production (tonnes): 38, 40, 65, 72, 69, 67, 95, 104
15. Calculate the trend using a 4-year moving average from the following data and identify short-term oscillations:
Year Production in Tonnes
1984 5
1985 6
1986 7
1987 7
1988 6
1989 8
1990 9
1991 10
1992 9
1993 10
1994 11
1995 11


Module - 2
Chapter - 1

4 Correlation
In time series analysis, understanding correlation after removing trend and seasonal effects is essential. We start
with fundamental concepts of expectation, the ensemble, stationarity, and ergodicity.

4.1 Expectation and the ensemble


The expected value or expectation, E(x), represents the average of a variable x over a population. The expected value
of x, denoted µ, is:
E(x) = µ.
For a random variable x, the variance is the expected value of the squared deviation from the mean:

E[(x − µ)2 ] = σ 2 ,

where σ 2 is the variance, and σ is the standard deviation.


For two variables, x and y, the covariance γ(x, y) is:

γ(x, y) = E[(x − µx )(y − µy )],

which measures the linear association between them.

Covariance and correlation are key concepts in time series analysis. Covariance measures the linear association
between two variables, and correlation standardizes this measure, giving a dimensionless value between -1 and 1. In
this section, we will explain these concepts using an example from a study that analyzed air quality in Manhattan.
The covariance between two variables x and y is defined as:

γ(x, y) = E[(x − µx )(y − µy )],

where µx and µy are the means of x and y, respectively. The sample covariance, which provides an estimate from
observed data, is given by:
$$\mathrm{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),$$
where n is the number of data pairs and x̄, ȳ are the sample means of x and y.

Example: Air Quality at Herald Square


A real-world example involves the study by Colucci and Begeman (1971), who analyzed air samples from Herald
Square, Manhattan. The data included carbon monoxide (CO) concentration x (in parts per million) and ben-
zoapyrene concentration y (in micrograms per thousand cubic meters), both byproducts of incomplete combustion.
The following R code calculates the covariance between these two variables:

R Code for Covariance

# Load the Herald Square data
www <- "http://www.massey.ac.nz/~pscowper/ts/Herald.dat"
Herald.dat <- read.table(www, header = TRUE)
attach(Herald.dat)

# Calculate covariance manually and using the built-in function

x <- CO; y <- Benzoa; n <- length(x)

# Manual calculation
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
manual_cov

# Using the cov() function
cov_value <- cov(x, y)
cov_value

The manual calculation of covariance yields the same result as the built-in ‘cov()‘ function, showing a covariance
value of 5.51.

Explanation of Covariance
Covariance indicates how two variables move together. If both x and y increase together, the covariance is positive.
Conversely, if one increases while the other decreases, the covariance is negative. In the Herald Square data, a
covariance of 5.51 suggests that there is a moderate positive association between carbon monoxide and benzoapyrene
levels. While covariance provides a measure of association, it depends on the units of the variables, making it difficult
to compare across datasets. Correlation resolves this by standardizing covariance. The population correlation ρ(x, y)
is defined as:
$$\rho(x, y) = \frac{\gamma(x, y)}{\sigma_x \sigma_y},$$
where σx and σy are the standard deviations of x and y. The sample correlation is calculated as:
$$\mathrm{Cor}(x, y) = \frac{\mathrm{Cov}(x, y)}{\mathrm{sd}(x) \cdot \mathrm{sd}(y)}.$$

R Code for Correlation

# Calculate correlation manually and using the cor() function
manual_cor <- cov(x, y) / (sd(x) * sd(y))
manual_cor

# Using the cor() function
cor_value <- cor(x, y)
cor_value

Both methods calculate the correlation between CO and benzoapyrene as 0.3551. Correlation values range between
-1 and 1. A value of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative relationship,
and 0 means no linear association. In this example, the correlation of 0.3551 suggests a weak to moderate positive
linear relationship between CO and benzoapyrene levels.

Graphical Interpretation
We can visualize the relationship between CO and benzoapyrene by plotting the data points and adding a regression
line:
# Plot the data with a fitted regression line
plot(CO, Benzoa, main = "CO vs Benzoapyrene",
     xlab = "CO Concentration (ppm)", ylab = "Benzoapyrene (micrograms)")
abline(lm(Benzoa ~ CO), col = "red")


Figure 8: Scatter plot of CO concentration vs Benzoapyrene concentration, with regression line.

The scatter plot shows a weak upward trend, confirming the positive correlation observed in the data. The red
line represents a simple linear regression that best fits the data.

4.1.1 The Ensemble and Stationarity


The ensemble refers to the entire population of all possible time series realizations from a model. The mean function
of a time series {xt } is:
µ(t) = E[xt ].
In practice, we typically have only one realization of the time series, so we estimate the mean at each time point. A
series is stationary if its mean is constant over time, i.e., µ(t) = µ, for all t.

If the mean function is constant, we say that the time series model is stationary in the mean. The sample estimate
of the population mean, µ, is the sample mean, denoted x̄:
$$\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t$$
This equation assumes that a sufficiently long time series characterizes the hypothetical model. Such models are
known as ergodic models, where time averages are representative of population averages.

The expectation in this definition is an average taken across the ensemble of all the possible time series that might
have been produced by the time series model, as illustrated in Figure 9.


Figure 9: An ensemble of time series. The expected value E(xt ) at a particular time t is the average taken over the
entire population.

4.1.2 Ergodic Series


A time series model that is stationary in the mean is ergodic in the mean if the time average for a single time
series tends to the ensemble mean as the length of the time series increases:
$$\lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} x_t = \mu$$

This implies that the time average is independent of the starting point. Given that we usually only have a single
time series, one might wonder how a time series model can fail to be ergodic, or why we would want a model that is
not ergodic.

Environmental and economic time series are typically single realizations of a hypothetical time series model, which
we often define as ergodic. However, there are cases where multiple time series can arise from the same model. For
instance, when investigating the acceleration at the pilot seat of a microlight aircraft design in a wind tunnel with
simulated random gusts, two prototypes built to the same design may show slightly different average acceleration
responses due to manufacturing differences. In such a case, the number of time series corresponds to the number
of prototypes. Another example is the study of turbulent flows in a complex system where different runs may yield
qualitatively different results based on initial conditions. In such experiments, it is often preferable to perform
multiple runs rather than extending a single run over a long period. The number of runs corresponds to the number
of time series. A stationary time series model can be adapted to be non-ergodic by defining the means of individual
time series to follow a probability distribution.

4.2 Variance function


The variance function of a time series model that is stationary in the mean is defined as:

$$\sigma^2(t) = E\left[(x_t - \mu)^2\right]$$


This equation suggests that the variance, σ 2 (t), could potentially take different values at each time point t.
However, from a single time series, it is not feasible to estimate a different variance at every point in time. Therefore,
to make progress, we introduce a simplifying assumption: if the model is stationary in the variance, we can assume
the variance is constant across time, denoted as σ 2 . In this case, we estimate the population variance using the
sample variance:

$$\mathrm{Var}(x) = \frac{\sum (x_t - \bar{x})^2}{n - 1}$$
In time series analysis, sequential observations may be correlated, particularly when the correlation is positive.
As a result, the sample variance, Var(x), may underestimate the true population variance, especially in short time
series, because consecutive observations tend to be similar. However, this bias decreases quickly as the length of the
time series, n, increases.
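A small simulation sketches this bias; for an AR(1) process with φ = 0.7 the true variance is 1/(1 − 0.7²) ≈ 1.96, and the sample variance falls short of it for short series:

# A sketch: var() underestimates the variance of a short, positively
# autocorrelated series (AR(1), phi = 0.7, true variance about 1.96)
set.seed(4)
short_vars <- replicate(1000, var(arima.sim(model = list(ar = 0.7), n = 20)))
mean(short_vars)    # noticeably below 1.96 for n = 20
long_vars <- replicate(1000, var(arima.sim(model = list(ar = 0.7), n = 2000)))
mean(long_vars)     # close to 1.96 as n grows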

4.2.1 Autocorrelation
The mean and variance play an important role in understanding statistical distributions because they summarize
two key aspects: the central tendency (mean) and the spread (variance). Similarly, in time series analysis, we focus
on second-order properties, which include the mean, variance, and serial correlation.

Consider a time series model that is stationary in both the mean and variance. In such models, variables may be
correlated, and the model is called second-order stationary if the correlation between variables depends only on the
number of time steps between them. This time difference is referred to as the lag.

When a variable is correlated with itself at different time points, this is called autocorrelation or serial correlation.
For a second-order stationary time series model, we can define an autocovariance function (acvf) γk as a function of
the lag k:
γk = E [(xt − µ)(xt+k − µ)]
Here, γk does not depend on the specific time t because the expectation is the same across all time points. This
formula is a natural extension of the covariance formula, where we now compare xt with xt+k . Next, we define the
autocorrelation function (acf) at lag k, denoted as ρk , by dividing the autocovariance by the variance:
$$\rho_k = \frac{\gamma_k}{\sigma^2}$$
From this definition, it follows that ρ0 = 1, meaning that the correlation of a variable with itself at the same time
point is always 1.

In time series analysis, we often estimate the autocovariance function and autocorrelation function from the sample
data. The sample autocovariance function (sample acvf), denoted as ck , is given by:
$$c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})$$

Note that the sample autocovariance at lag 0, c0 , is just the variance of the data. The denominator n is used when
calculating ck , although only n − k terms are summed in the numerator. Finally, the sample autocorrelation function
(sample acf) is defined as:
$$r_k = \frac{c_k}{c_0}$$
We will now illustrate these calculations using an example in R. The data consists of wave heights (in millimeters,
relative to still water level) measured in a wave tank. The sampling interval is 0.1 seconds, and the total recording
length is 39.7 seconds. The waves were generated by a wave maker using a pseudo-random signal to mimic a rough
sea. Since there is no trend or seasonal component, we assume that this time series is a realization of a stationary
process.


R Example: Autocovariance and Autocorrelation for Wave Height Data


First, let’s load the time series data and plot it to visually inspect the stationarity of the process.
# Load necessary libraries
library(tseries)

# Simulate wave height data (for illustration purposes)
set.seed(123)
n <- 398                # Number of observations
time_interval <- 0.1    # Sampling interval in seconds
time_series <- ts(arima.sim(model = list(ar = 0.7), n = n), frequency = 1 / time_interval)

# Plot the wave height data
plot(time_series, main = "Wave Heights (mm) Over Time", ylab = "Height (mm)", xlab = "Time (seconds)")

Figure 10: Wave Heights (mm) Over Time

Next, we calculate the sample autocovariance function (acvf) at different lags using the acf function, which also
gives the sample autocorrelation (acf).
# Load necessary libraries
# install.packages("ggplot2")  # Uncomment if ggplot2 is not installed
library(ggplot2)

# Calculate and plot the sample autocovariance and autocorrelation
acf(time_series, type = "covariance", main = "Sample Autocovariance Function")
acf(time_series, type = "correlation", main = "Sample Autocorrelation Function")

# Plot wave heights against their lagged values
plot(time_series[1:(n - 1)], time_series[2:n],
     xlab = "Wave Height at time t",
     ylab = "Wave Height at time t + 1",
     main = "Wave Heights at Lag 1",
     pch = 19,
     col = "blue")
abline(lm(time_series[2:n] ~ time_series[1:(n - 1)]), col = "red")  # Add regression line


To manually compute the sample autocovariance and autocorrelation at lag k = 1, we use the following steps:
# Mean of the time series
x_mean <- mean(time_series)

# Sample autocovariance at lag 1
k <- 1
n_k <- length(time_series) - k
sample_acvf <- sum((time_series[1:n_k] - x_mean) * (time_series[(1 + k):length(time_series)] - x_mean)) / n

# Sample autocorrelation at lag 1
sample_acf <- sample_acvf / var(time_series)

# Print the results
sample_acvf
sample_acf

Sample Output
Assuming we have the following simulated time series data, the output for the calculations will be:

> sample_acvf
[1] 0.20754 # Sample autocovariance at lag 1

> sample_acf
[1] 0.58783 # Sample autocorrelation at lag 1

These values indicate that at lag k = 1, the sample autocovariance is approximately 0.20754, and the sample
autocorrelation is approximately 0.58783. This suggests a moderate positive correlation between the values of the
time series that are one time step apart. The acf function computes the autocovariance and autocorrelation functions
for all lags, and the results are automatically constrained to lie between −1 and 1. The sample acvf and acf calculated
manually for lag 1 will match those obtained by the acf function.

Figure 11: Auto-correlation Plot of Wave Data


Interpretation
From the plot of the autocorrelation function (acf), we can determine the degree of serial correlation at different lags.
If the autocorrelation decays slowly, this indicates that the process is highly persistent over time. A rapid decay, on
the other hand, suggests weaker serial correlation.

4.3 Correlogram, Covariance of Sum of Random Variables


4.3.1 General discussion
By default, the acf function produces a plot of rk against k, which is called the correlogram. For example, Figure 11
gives the correlogram for the wave heights obtained from acf(waveht). In general, correlograms have the following
features:
• Axes:

– X-axis: Lag (k) in sampling intervals (0.1 seconds).


– Y-axis: Autocorrelation (rk ), which is dimensionless.
• Null Hypothesis Testing:
– If the true autocorrelation ρk = 0, the distribution of rk is approximately normal with:
$$\text{Mean} = -\frac{1}{n}, \qquad \text{Variance} = \frac{1}{n}$$
– Dotted lines are drawn at:
$$-\frac{1}{n} \pm \frac{2}{\sqrt{n}}$$
– If rk falls outside these lines, the null hypothesis is rejected at the 5% significance level. However, about
5% of values will fall outside these lines even when ρk = 0. (A short sketch of drawing these bounds by
hand follows this list.)
• Lag 0 Autocorrelation:

– Always equals 1, aiding in the comparison of other autocorrelation values.


– Squaring the autocorrelation gives the percentage of variability explained by a linear relationship. For
example, a lag 1 autocorrelation of 0.1 explains only 1% of the variability.
• Autocorrelation Patterns:

– The correlogram from an autoregressive model of order 2 typically shows a damped cosine shape.
– Non-stationary series (e.g., air passenger bookings) can still have their sample autocorrelation function
(ACF) calculated.
• Deterministic Signals and ACF Behavior:

– Trend-only Series: Slow, nearly linear decay from 1.


– Discrete Sinusoidal Wave: Produces a discrete cosine pattern.
– Repeated Sequence of p Numbers: Displays a spike near lag p.
• Trends in Data:

– Gradual decay of autocorrelations indicates a trend.


– For the air passenger bookings, an annual cycle is observed in the ACF:
∗ Maximum at lag 12 (positive correlation).
∗ Dip at lag 6 (negative correlation), reflecting seasonal patterns.
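As promised above, a short sketch of drawing the approximate 5% significance bounds by hand for a white-noise series (where ρk = 0 for every k > 0):

# A sketch: sample autocorrelations of white noise with hand-drawn bounds
set.seed(42)
x <- rnorm(100)                       # white noise
n <- length(x)
r <- acf(x, plot = FALSE)$acf[-1]     # sample autocorrelations r1, r2, ...
bounds <- -1/n + c(-2, 2) / sqrt(n)   # mean -1/n plus/minus two standard errors
plot(r, type = "h", xlab = "Lag k", ylab = "r_k")
abline(h = 0)
abline(h = bounds, lty = 2, col = "blue")  # roughly 5% of r_k fall outside

Note that acf() itself draws its dotted lines at ±2/√n, ignoring the small −1/n shift in the mean.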


4.3.2 Example based on air passenger series


Although we want to know about trends and seasonal patterns in a time series, we do not necessarily rely on the
correlogram to identify them. The main use of the correlogram is to detect autocorrelations in the time series after
we have removed an estimate of the trend and seasonal variation.
In the code below, the air passenger series is seasonally adjusted, and the trend is removed using the decompose
function. To plot the random component and draw the correlogram, we must remember that a consequence of using
a centred moving average of 12 months to smooth the time series, and thereby estimate the trend, is that the first
six and last six terms in the random component cannot be calculated and are thus stored in R as NA. The random
component and correlogram are shown in Figures 13 and 14, respectively.

Figure 12: Correlogram for the air passenger bookings over the period 1949–1960. The gradual decay is typical of a
time series containing a trend. The peak at 1 year indicates seasonal variation.

data(AirPassengers)
AP <- AirPassengers
AP.decom <- decompose(AP, "multiplicative")
plot(ts(AP.decom$random[7:138]))
acf(AP.decom$random[7:138])

Figure 13: The random component of the air passenger series after removing the trend and the seasonal variation.

The correlogram in Figure 14 suggests either a damped cosine shape that is characteristic of an autoregressive
model of order 2 or that the seasonal adjustment has not been entirely effective. The latter explanation is unlikely
because the decomposition does estimate twelve independent monthly indices. If we investigate further, we see that
the standard deviation of the original series from July until June is:


# Calculate the standard deviation of the original series
sd_original <- sd(AP[7:138])
sd_original

Output:
109

# Decompose the time series
AP.decom <- decompose(AP, "multiplicative")

# Calculate the standard deviation after subtracting the trend
sd_trend_adjusted <- sd(AP[7:138] - AP.decom$trend[7:138])
sd_trend_adjusted

Output:
41.1

And the standard deviation after seasonal adjustment is:


1 # Calculate the standard deviation of the random component
2 sd _ random <- sd ( AP . decom $ random [7:138])
3 sd _ random

Output:
0.0335

The reduction in the standard deviation shows that the seasonal adjustment has been very effective.

Figure 14: Correlogram for the random component of air passenger bookings over the period 1949–1960.


Module - 2
Chapter - 2

5 Seasonal Variation
Seasonal variations are regular and periodic variations with a period of one year. Examples include the production
of cold drinks, which is high during the summer months and low during winter, and the sales of sarees in a cloth
store, which are high during the festival season and low at other times. The purpose of determining seasonal variations
in a time series is to isolate them and to study their effect on the size of the variable in index form, usually referred
to as the seasonal index. There are several devices to measure seasonal variations, including:

• Method of Simple Averages

• Ratio to Trend Method


• Ratio to Moving Average Method
• Link Relative Method

5.1 Method of Simple Averages


The method of simple averages is one of the simplest techniques for measuring seasonality. It is based on the additive
model of time series, expressed as follows:

Yt = Tt + Ct + St + Rt
In this model, we assume that the trend component (Tt ) and the cyclical component (Ct ) are absent. The method
consists of the following steps:

• Arrange the data by years and months (or quarters if quarterly data is given).

• Compute the average xi for the i-th month or quarter across all years, and then the overall average x̄ of these
averages:
– For monthly data (i = 1, 2, . . . , 12):
$$\bar{x} = \frac{1}{12} \sum_{i=1}^{12} x_i$$
– For quarterly data (i = 1, 2, 3, 4):
$$\bar{x} = \frac{1}{4} \sum_{i=1}^{4} x_i$$

• Seasonal indices for the different months (or quarters) are obtained by expressing the monthly (or quarterly)
averages as percentages of x̄. Thus, the seasonal index for the i-th month (or quarter) is calculated as:
$$\text{Seasonal Index}_i = \frac{x_i}{\bar{x}} \times 100$$


Advantages
• Simplicity:
– The method is straightforward and easy to understand, making it accessible for practitioners with varying
levels of statistical expertise.
– No complex calculations or statistical software are required; basic arithmetic suffices.
• Time Efficiency:
– It requires minimal time to implement, allowing for quick seasonal adjustments in data analysis.
– Suitable for businesses needing rapid assessments of seasonal trends without extensive data processing.
• Clarity of Results:
– The results, represented as seasonal indices, provide a clear and intuitive understanding of seasonal vari-
ations.
– Stakeholders can easily interpret seasonal indices, facilitating communication of insights.
• No Need for Advanced Techniques:
– Useful in cases where advanced statistical techniques are not available or practical.
– Serves as a preliminary analysis tool before employing more sophisticated methods.

Disadvantages
• Assumption of No Trend or Cycles:
– The method assumes that the data does not contain any underlying trends or cyclical components.
– In real-world scenarios, many time series exhibit significant trends, which can distort the results.
• Limited Applicability:

– The method may not be suitable for data with strong seasonal patterns, as it can lead to misleading
conclusions.
– Economic and business time series often include seasonal and cyclical variations, which are not adequately
addressed by this method.
• Sensitivity to Outliers:

– The method is susceptible to outliers or extreme values, which can disproportionately affect average
calculations.
– This sensitivity may result in skewed seasonal indices that do not accurately represent underlying trends.
• Ignores Interactions:

– The method does not consider potential interactions between seasonal effects and other variables, limiting
its explanatory power.
– It provides a simplistic view of seasonality, lacking the depth of analysis found in more advanced methods.
• Static Nature:

– The method produces static seasonal indices that may not adapt to changing patterns over time.
– As market conditions or consumer behavior evolves, these indices may become outdated and less relevant.


Example - 1
Consider the following monthly sales data for a product over three years:

Month Year 1 Year 2 Year 3


January 120 130 140
February 115 125 135
March 140 150 160
April 160 170 180
May 170 180 190
June 200 210 220
July 190 200 210
August 180 190 200
September 160 170 180
October 150 160 170
November 130 140 150
December 120 130 140

Step 1: Calculate Monthly Averages


Calculate the average sales for each month:
$$x_1 = \frac{120 + 130 + 140}{3} = 130 \quad \text{(January)}$$
$$x_2 = \frac{115 + 125 + 135}{3} = 125 \quad \text{(February)}$$
$$x_3 = \frac{140 + 150 + 160}{3} = 150 \quad \text{(March)}$$
$$x_4 = \frac{160 + 170 + 180}{3} = 170 \quad \text{(April)}$$
$$x_5 = \frac{170 + 180 + 190}{3} = 180 \quad \text{(May)}$$
$$x_6 = \frac{200 + 210 + 220}{3} = 210 \quad \text{(June)}$$
$$x_7 = \frac{190 + 200 + 210}{3} = 200 \quad \text{(July)}$$
$$x_8 = \frac{180 + 190 + 200}{3} = 190 \quad \text{(August)}$$
$$x_9 = \frac{160 + 170 + 180}{3} = 170 \quad \text{(September)}$$
$$x_{10} = \frac{150 + 160 + 170}{3} = 160 \quad \text{(October)}$$
$$x_{11} = \frac{130 + 140 + 150}{3} = 140 \quad \text{(November)}$$
$$x_{12} = \frac{120 + 130 + 140}{3} = 130 \quad \text{(December)}$$

Step 2: Calculate Overall Average


Calculate the overall average x̄:
$$\bar{x} = \frac{1}{12} \sum_{i=1}^{12} x_i = \frac{130 + 125 + 150 + 170 + 180 + 210 + 200 + 190 + 170 + 160 + 140 + 130}{12} = \frac{1955}{12} \approx 162.92$$


Step 3: Calculate Seasonal Indices


Now, calculate the seasonal indices for each month:
$$\text{Seasonal Index}_{\text{January}} = \frac{130}{162.92} \times 100 \approx 79.80$$
$$\text{Seasonal Index}_{\text{February}} = \frac{125}{162.92} \times 100 \approx 76.73$$
$$\text{Seasonal Index}_{\text{March}} = \frac{150}{162.92} \times 100 \approx 92.07$$
$$\text{Seasonal Index}_{\text{April}} = \frac{170}{162.92} \times 100 \approx 104.35$$
$$\text{Seasonal Index}_{\text{May}} = \frac{180}{162.92} \times 100 \approx 110.49$$
$$\text{Seasonal Index}_{\text{June}} = \frac{210}{162.92} \times 100 \approx 128.90$$
$$\text{Seasonal Index}_{\text{July}} = \frac{200}{162.92} \times 100 \approx 122.76$$
$$\text{Seasonal Index}_{\text{August}} = \frac{190}{162.92} \times 100 \approx 116.62$$
$$\text{Seasonal Index}_{\text{September}} = \frac{170}{162.92} \times 100 \approx 104.35$$
$$\text{Seasonal Index}_{\text{October}} = \frac{160}{162.92} \times 100 \approx 98.21$$
$$\text{Seasonal Index}_{\text{November}} = \frac{140}{162.92} \times 100 \approx 85.93$$
$$\text{Seasonal Index}_{\text{December}} = \frac{130}{162.92} \times 100 \approx 79.80$$
This example illustrates how to use the method of simple averages to calculate seasonal indices, which can help
analyze seasonal patterns in the data.
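The same computation can be carried out compactly in R; the sketch below rebuilds the table as a matrix (one column per year) and reproduces the indices:

# A sketch: seasonal indices by the method of simple averages
sales <- matrix(c(120, 115, 140, 160, 170, 200, 190, 180, 160, 150, 130, 120,
                  130, 125, 150, 170, 180, 210, 200, 190, 170, 160, 140, 130,
                  140, 135, 160, 180, 190, 220, 210, 200, 180, 170, 150, 140),
                nrow = 12)                 # one column per year
monthly_avg <- rowMeans(sales)             # x_i for each month
overall_avg <- mean(monthly_avg)           # x-bar = 162.92
seasonal_index <- monthly_avg / overall_avg * 100
round(seasonal_index, 2)                   # matches Step 3 above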

Example - 2
Consider the following quarterly sales data (in thousands of units) for a product over three years:

Quarter Year 1 Year 2 Year 3


Q1 150 160 170
Q2 200 210 220
Q3 250 260 270
Q4 300 310 320

Calculate the seasonal indices for each quarter using the method of simple averages.

Step 1: Calculate Quarterly Averages


First, compute the average sales for each quarter over the three years:
$$x_1 = \frac{150 + 160 + 170}{3} = \frac{480}{3} = 160 \quad \text{(Q1)}$$
$$x_2 = \frac{200 + 210 + 220}{3} = \frac{630}{3} = 210 \quad \text{(Q2)}$$
$$x_3 = \frac{250 + 260 + 270}{3} = \frac{780}{3} = 260 \quad \text{(Q3)}$$
$$x_4 = \frac{300 + 310 + 320}{3} = \frac{930}{3} = 310 \quad \text{(Q4)}$$


Step 2: Calculate Overall Average


Next, calculate the overall average x̄:
$$\bar{x} = \frac{1}{4} \sum_{i=1}^{4} x_i = \frac{160 + 210 + 260 + 310}{4} = \frac{940}{4} = 235$$

Step 3: Calculate Seasonal Indices


Now, calculate the seasonal indices for each quarter:
$$\text{Seasonal Index}_{Q1} = \frac{x_1}{\bar{x}} \times 100 = \frac{160}{235} \times 100 \approx 68.09$$
$$\text{Seasonal Index}_{Q2} = \frac{x_2}{\bar{x}} \times 100 = \frac{210}{235} \times 100 \approx 89.36$$
$$\text{Seasonal Index}_{Q3} = \frac{x_3}{\bar{x}} \times 100 = \frac{260}{235} \times 100 \approx 110.64$$
$$\text{Seasonal Index}_{Q4} = \frac{x_4}{\bar{x}} \times 100 = \frac{310}{235} \times 100 \approx 131.91$$

Results
The seasonal indices for each quarter are as follows:

• Q1: 68.09

• Q2: 89.36
• Q3: 110.64
• Q4: 131.91

These indices indicate that:

• Q1 has a seasonal index of 68.09, indicating sales well below the average.
• Q2 has a seasonal index of 89.36, indicating sales about 11% below average.
• Q3 has a seasonal index of 110.64, reflecting higher-than-average sales.

• Q4 has a seasonal index of 131.91, showing significantly higher sales relative to the average.



5.2 Ratio-to-Trend Method


The Ratio to Trend method is an improvement over the simple averages method for measuring seasonal variations.
This method assumes a multiplicative model, represented as:

Yt = Tt × St × Ct × Rt
Where:
• Yt = Observed value at time t
• Tt = Trend component at time t

• St = Seasonal component at time t


• Ct = Cyclical component at time t
• Rt = Irregular component at time t

Steps to Calculate Seasonal Indices


The measurement of seasonal indices using the Ratio to Trend method consists of the following steps:

• Step 1: Obtain Trend Values


The first step in the Ratio to Trend method is to isolate the trend component from the time series data. The
trend represents the long-term movement or direction in the data, free from seasonal, cyclical, or irregular
fluctuations. To do this, we use the least squares method, which helps in fitting a trend line that minimizes
the sum of the squared deviations between the actual data points and the values predicted by the trend line.


Why the Least Squares Method? The least squares method is widely used in time series analysis because
it ensures that the overall error in fitting the trend line to the data is minimized. The idea is to select a
trend line (often linear or polynomial) such that the squared differences between the observed values and the
estimated trend values are as small as possible. Mathematically, the objective is to minimize the following
function:
Minimize Σ_{t=1}^{n} (Y_t − T_t)²

where:
– Yt is the actual observed value at time t,
– Tt is the estimated trend value at time t,
– n is the number of observations.

Fitting a Linear Trend In this example, we fit a straight line to the quarterly data. A linear trend assumes
the form:
y = a + bx
where:
– a is the intercept, which represents the trend value when x = 0,
– b is the slope, which indicates the rate of change in the trend per unit time.
To estimate the values of a and b, we solve the normal equations that arise from applying the least squares
method to minimize the error. These normal equations are:

b = (Σxy − n x̄ ȳ) / (Σx² − n x̄²)
a = ȳ − b x̄

Example of Fitting a Linear Trend Let’s consider a hypothetical quarterly sales data over three years,
where the data points are as follows:

Quarter Year 1 Year 2 Year 3


Q1 120 130 140
Q2 180 190 200
Q3 240 250 260
Q4 300 310 320

To fit the trend, we assign values for t, where t = 1 for the first quarter in Year 1, t = 2 for the second quarter in Year 1, and so on. We compute the sums ΣY_t, Σt, ΣtY_t and Σt² to find a and b. For illustration, suppose the fitted trend line yields the following values for the four quarters of Year 1:

TQ1 = 130
TQ2 = 190
TQ3 = 250
TQ4 = 310

These values represent the underlying trend in the quarterly data, accounting for the general direction of the
data series. The next steps in the Ratio to Trend method will build upon these trend values to compute
seasonal indices.


Why Use Trend Values? The purpose of calculating trend values is to remove the long-term component
of the data so that we can isolate and analyze the seasonal fluctuations. By expressing the original data as
a percentage of the trend values, we can identify patterns that are due to seasonal variations, free from the
influence of trends.
For example, the percentage calculation for Q1 of Year 1 would be:

P_Q1 = (120 / 130) × 100 ≈ 92.31
This percentage represents how the observed value for Q1 deviates from the underlying trend.

• Step 2: Calculate Percentages


Express the original data as percentages of the trend values:
P_t = (Y_t / T_t) × 100

where P_t is the percentage of the trend value at time t.

• Step 3: Eliminate Cyclical and Irregular Components


Average the percentages for different months (or quarters) to eliminate the cyclical and irregular components,
resulting in seasonal indices:
S_i = (1/n) Σ_t P_t   for each month (or quarter) i

• Step 4: Adjust Seasonal Indices


Adjust the seasonal indices so that they sum to 1200 for monthly data (or 400 for quarterly data) by multiplying each index by the constant K:

K = 1200 / (Total of the indices)   (for monthly data)
K = 400 / (Total of the indices)   (for quarterly data)

A short R sketch of these four steps is given below.
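The following is a minimal R sketch of the four steps, using the quarterly data of the example calculation that follows (all variable names are ours, for illustration only):

# Ratio-to-trend seasonal indices in R (illustrative sketch)
y       <- c(120, 180, 240, 300, 130, 190, 250, 310, 140, 200, 260, 320)  # quarterly data
t       <- seq_along(y)
quarter <- factor(rep(1:4, times = 3))
trend   <- fitted(lm(y ~ t))            # Step 1: least-squares trend values
pct     <- y / trend * 100              # Step 2: data as a percentage of trend
s       <- tapply(pct, quarter, mean)   # Step 3: average the percentages per quarter
s_adj   <- s * 400 / sum(s)             # Step 4: adjust so the indices total 400
round(s_adj, 2)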

Advantages
• It is easy to compute and understand.
• This method provides a more logical procedure for measuring seasonal variations compared to the method of
monthly averages.
• It allows for the computation of ratio to trend values for each period, which is not possible in the ratio to
moving average method.

Disadvantages
• The main defect of the Ratio to Trend method is that if there are cyclical swings in the series, the trend
(whether a straight line or a curve) cannot follow the actual data as closely as a 12-month moving average can.
• Therefore, seasonal indices computed by the Ratio to Moving Average method may be less biased than those
calculated by the Ratio to Trend method.


Example Calculation
Let’s consider a hypothetical quarterly sales data for a product over three years:

Quarter Year 1 Year 2 Year 3


Q1 120 130 140
Q2 180 190 200
Q3 240 250 260
Q4 300 310 320

Step 1: Obtain Trend Values


Using the least squares method, let’s fit a straight line to this data. Assume we have determined the following trend
values:

TQ1 = 130
TQ2 = 190
TQ3 = 250
TQ4 = 310

Step 2: Calculate Percentages


Now, calculate the percentages:
P_Q1 = (120 / 130) × 100 ≈ 92.31
P_Q2 = (180 / 190) × 100 ≈ 94.74
P_Q3 = (240 / 250) × 100 = 96.00
P_Q4 = (300 / 310) × 100 ≈ 96.77

Step 3: Average Percentages


Next, average the percentages for each quarter:

SQ1 = 92.31
SQ2 = 94.74
SQ3 = 96.00
SQ4 = 96.77

Step 4: Adjust Seasonal Indices


Total the seasonal indices:

Total = SQ1 + SQ2 + SQ3 + SQ4 = 92.31 + 94.74 + 96.00 + 96.77 = 379.82
Now adjust to sum to 400 (for quarterly data):
K = 400 / 379.82 ≈ 1.0531
Thus, the adjusted seasonal indices are:


Adjusted S_Q1 = S_Q1 × K ≈ 92.31 × 1.0531 ≈ 97.21
Adjusted S_Q2 = S_Q2 × K ≈ 94.74 × 1.0531 ≈ 99.77
Adjusted S_Q3 = S_Q3 × K ≈ 96.00 × 1.0531 ≈ 101.10
Adjusted S_Q4 = S_Q4 × K ≈ 96.77 × 1.0531 ≈ 101.91

The final seasonal indices, which now total 400 as required, are approximately:

• Q1: 97.21

• Q2: 99.77
• Q3: 101.10
• Q4: 101.91

Example 2 : Ratio to Trend Method


Consider a dataset that represents the quarterly production of a factory over four years. The data is as follows:

Quarter Year 1 Year 2 Year 3 Year 4


Q1 120 130 140 150
Q2 180 190 200 210
Q3 240 250 260 270
Q4 300 310 320 330

Step 1: Obtain Trend Values


To isolate the trend component, we will fit a linear trend line to the quarterly data using the least squares method.

1. Assign Values for t


Assign t values as follows:

Q1 : t = 1, 5, 9, 13 (Year 1, Year 2, Year 3, Year 4)


Q2 : t = 2, 6, 10, 14
Q3 : t = 3, 7, 11, 15
Q4 : t = 4, 8, 12, 16


Thus, we have:
Quarter Production(Yt ) t
Q1 120 1
Q1 130 5
Q1 140 9
Q1 150 13
Q2 180 2
Q2 190 6
Q2 200 10
Q2 210 14
Q3 240 3
Q3 250 7
Q3 260 11
Q3 270 15
Q4 300 4
Q4 310 8
Q4 320 12
Q4 330 16

2. Calculate Necessary Sums


Now, we compute the sums needed to find the coefficients a and b:

ΣY_t = 120 + 130 + 140 + 150 + 180 + 190 + 200 + 210 + 240 + 250 + 260 + 270 + 300 + 310 + 320 + 330 = 3600

Σt = 1 + 2 + 3 + · · · + 16 = 136

ΣtY_t = 1×120 + 5×130 + 9×140 + 13×150 + 2×180 + 6×190 + 10×200 + 14×210 + 3×240 + 7×250 + 11×260 + 15×270 + 4×300 + 8×310 + 12×320 + 16×330 = 32600

Σt² = 1² + 2² + · · · + 16² = 1496

3. Calculate a and b
Using the least squares formulas, we can find a and b:

a = [ (ΣY_t)(Σt²) − (Σt)(ΣtY_t) ] / [ n(Σt²) − (Σt)² ]

Substituting the calculated values:

a = [ (3600)(1496) − (136)(32600) ] / [ 16(1496) − (136)² ] = (5385600 − 4433600) / (23936 − 18496) = 952000 / 5440 = 175.00

Now, calculate b:

b = [ n(ΣtY_t) − (Σt)(ΣY_t) ] / [ n(Σt²) − (Σt)² ]

Substituting the calculated values:

b = [ 16(32600) − (136)(3600) ] / [ 16(1496) − (136)² ] = (521600 − 489600) / 5440 = 32000 / 5440 ≈ 5.8824

4. Trend Equation
Thus, the linear trend equation is:

T_t = a + bt, i.e. T_t = 175.00 + 5.8824t

Using this equation, we calculate the trend values at each time point. For the four quarters of Year 1:

T_1 = 175.00 + 5.8824(1) ≈ 180.88
T_2 = 175.00 + 5.8824(2) ≈ 186.76
T_3 = 175.00 + 5.8824(3) ≈ 192.65
T_4 = 175.00 + 5.8824(4) ≈ 198.53

and similarly for t = 5, . . . , 16; for example, T_5 ≈ 204.41, T_9 ≈ 227.94 and T_13 ≈ 251.47.

Step 2: Calculate Percentages


Next, we express the original production data as percentages of the corresponding trend values, P_t = (Y_t / T_t) × 100. For the four Q1 observations:

P_1 = (120 / 180.88) × 100 ≈ 66.34%
P_5 = (130 / 204.41) × 100 ≈ 63.60%
P_9 = (140 / 227.94) × 100 ≈ 61.42%
P_13 = (150 / 251.47) × 100 ≈ 59.65%

Proceeding in the same way for the remaining quarters gives, for Q2: 96.38%, 90.35%, 85.54%, 81.60%; for Q3: 124.58%, 115.64%, 108.46%, 102.57%; and for Q4: 151.11%, 139.60%, 130.30%, 122.62%.

Step 3: Average Percentages


To obtain preliminary seasonal indices, we average the percentages for each quarter over the four years:

Average for Q1 = (66.34 + 63.60 + 61.42 + 59.65) / 4 ≈ 62.75%
Average for Q2 = (96.38 + 90.35 + 85.54 + 81.60) / 4 ≈ 88.47%
Average for Q3 = (124.58 + 115.64 + 108.46 + 102.57) / 4 ≈ 112.81%
Average for Q4 = (151.11 + 139.60 + 130.30 + 122.62) / 4 ≈ 135.91%

Thus, the average percentages for each quarter are:

Quarter Average Percentage


Q1 62.75%
Q2 88.47%
Q3 112.81%
Q4 135.91%

The pattern is sensible: within each year the early quarters fall below the trend and the later quarters rise above it, so the percentages isolate a genuine seasonal pattern.

Step 4: Adjust Seasonal Indices


Next, we calculate the adjustment factor K so that the seasonal indices sum to a total of 400 for quarterly data. We first total the indices:

Total = 62.75 + 88.47 + 112.81 + 135.91 = 399.94

Now we calculate the adjustment factor K:

K = 400 / 399.94 ≈ 1.0002

We then multiply each average percentage by K to obtain the adjusted seasonal indices:

Adjusted Index for Q1 = 62.75 × 1.0002 ≈ 62.76
Adjusted Index for Q2 = 88.47 × 1.0002 ≈ 88.48
Adjusted Index for Q3 = 112.81 × 1.0002 ≈ 112.83
Adjusted Index for Q4 = 135.91 × 1.0002 ≈ 135.93

Thus, the final adjusted seasonal indices, which total 400 as required, are:

Quarter Adjusted Seasonal Index


Q1 62.76
Q2 88.48
Q3 112.83
Q4 135.93




5.3 Ratio-to-Moving Average Method and Link Relative Method


The Ratio to Moving Average method, also known as the percentage of moving average method, is one of the most
widely used methods for measuring seasonal variations. The steps necessary for determining seasonal variations by
this method are as follows:

• Calculate the centered 12-monthly moving average (or 4-quarterly moving average) of the given data. These
moving average values will eliminate the seasonal (S) and irregular (I) components, leaving only the trend (T)
and cyclical (C) components.
• Express the original data as percentages of the centered moving average values.
• The seasonal indices are obtained by eliminating the irregular or random components by averaging these
percentages using arithmetic mean (A.M) or median.

• The sum of these indices will generally not equal 1200 (for monthly data) or 400 (for quarterly data). Finally,
an adjustment is made to ensure that the sum of the indices totals 1200 for monthly data and 400 for quarterly
data by multiplying them throughout by a constant K:
K = 1200 / (Total of the indices)   (for monthly data)
K = 400 / (Total of the indices)   (for quarterly data)

Advantages
• Of all the methods of measuring seasonal variations, the Ratio to Moving Average method is the most satis-
factory, flexible, and widely used method.
• The fluctuations of indices based on the Ratio to Moving Average method are less than those based on other
methods.

Disadvantages
• This method does not completely utilize the data. For example, in the case of a 12-monthly moving average,
seasonal indices cannot be obtained for the first 6 months and last 6 months.

Example
Let’s consider a company that records its quarterly sales data over four years. The sales figures (in thousands) are
as follows:

Year Q1 Q2 Q3 Q4
2019 150 200 250 300
2020 180 220 270 320
2021 160 210 260 310
2022 170 230 280 340

Step 1: Calculate the Centered 4-Quarterly Moving Average


To calculate the centered 4-quarterly moving average, we first average each set of four consecutive quarters, and then average adjacent pairs of these moving averages so that the result is centered on a quarter. Equivalently, the centered moving average at quarter t is:

CMA_t = (Y_{t−2} + 2Y_{t−1} + 2Y_t + 2Y_{t+1} + Y_{t+2}) / 8

For example, for Q3 of 2019:

CMA = (150 + 2×200 + 2×250 + 2×300 + 180) / 8 = 1830 / 8 = 228.75

No centered moving average exists for the first two and the last two quarters of the series. The centered moving averages are:

Quarter Centered Moving Average


Q3 (2019) 228.75
Q4 (2019) 235.00
Q1 (2020) 240.00
Q2 (2020) 245.00
Q3 (2020) 245.00
Q4 (2020) 241.25
Q1 (2021) 238.75
Q2 (2021) 236.25
Q3 (2021) 236.25
Q4 (2021) 240.00
Q1 (2022) 245.00
Q2 (2022) 251.25

Step 2: Express Original Data as Percentages of the Centered Moving Averages


Now we calculate the percentage of the original sales data relative to the centered moving averages, for the quarters where a centered moving average exists:

For Q3 (2019): (250 / 228.75) × 100 ≈ 109.29%
For Q4 (2019): (300 / 235.00) × 100 ≈ 127.66%
For Q1 (2020): (180 / 240.00) × 100 = 75.00%
For Q2 (2020): (220 / 245.00) × 100 ≈ 89.80%
For Q3 (2020): (270 / 245.00) × 100 ≈ 110.20%
For Q4 (2020): (320 / 241.25) × 100 ≈ 132.64%
For Q1 (2021): (160 / 238.75) × 100 ≈ 67.02%
For Q2 (2021): (210 / 236.25) × 100 ≈ 88.89%
For Q3 (2021): (260 / 236.25) × 100 ≈ 110.05%
For Q4 (2021): (310 / 240.00) × 100 ≈ 129.17%
For Q1 (2022): (170 / 245.00) × 100 ≈ 69.39%
For Q2 (2022): (230 / 251.25) × 100 ≈ 91.54%

Step 3: Average Percentages to Obtain Seasonal Indices


Next, we average the percentages for each quarter (three values are available per quarter). This yields the seasonal indices:

Average for Q1 = (75.00 + 67.02 + 69.39) / 3 ≈ 70.47%
Average for Q2 = (89.80 + 88.89 + 91.54) / 3 ≈ 90.08%
Average for Q3 = (109.29 + 110.20 + 110.05) / 3 ≈ 109.85%
Average for Q4 = (127.66 + 132.64 + 129.17) / 3 ≈ 129.82%

Thus, the average percentages for each quarter are:

Quarter Seasonal Index


Q1 70.47%
Q2 90.08%
Q3 109.85%
Q4 129.82%


Step 4: Adjustment of Seasonal Indices


Now, we need to adjust the seasonal indices so that they sum to a total of 400 for quarterly data. The total of the indices is:

Total = 70.47 + 90.08 + 109.85 + 129.82 = 400.22

The adjustment factor K is given by:

K = 400 / 400.22 ≈ 0.9995

We multiply each seasonal index by K:

Adjusted Index for Q1 = 70.47 × 0.9995 ≈ 70.43
Adjusted Index for Q2 = 90.08 × 0.9995 ≈ 90.03
Adjusted Index for Q3 = 109.85 × 0.9995 ≈ 109.79
Adjusted Index for Q4 = 129.82 × 0.9995 ≈ 129.75

Thus, the final adjusted seasonal indices are:

Quarter Adjusted Seasonal Index


Q1 70.43
Q2 90.03
Q3 109.79
Q4 129.75

Conclusion
The Ratio to Moving Average method provides a systematic approach to estimating seasonal variations in time
series data. In this example, we calculated the centered moving averages, expressed the original data as percentages,
averaged these percentages to find the seasonal indices, and finally adjusted these indices to sum to a total of 400.
This method enables businesses to better understand seasonal effects and make informed decisions based on these
insights.
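The whole procedure can be sketched in R using the sales data of this example (variable names are ours). stats::filter() with the weights (1, 2, 2, 2, 1)/8 produces exactly the centered 4-quarter moving average used above:

# Ratio-to-moving-average seasonal indices in R (illustrative sketch)
y <- c(150, 200, 250, 300, 180, 220, 270, 320,
       160, 210, 260, 310, 170, 230, 280, 340)               # quarterly sales, 2019-2022
cma     <- stats::filter(y, c(1, 2, 2, 2, 1) / 8, sides = 2)  # centered 4-quarter MA
pct     <- y / cma * 100                                      # data as % of the centered MA
quarter <- factor(rep(1:4, times = 4))
s       <- tapply(pct, quarter, mean, na.rm = TRUE)           # average % per quarter
s_adj   <- s * 400 / sum(s)                                   # adjust to total 400
round(s_adj, 2)

Note that cma is NA for the first two and last two quarters, which is exactly the loss of observations mentioned among the disadvantages of this method.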




5.4 Link relative method


The Link Relative Method, also known as Pearson’s Method, is a systematic approach for measuring seasonal
variations. The steps involved in this method are as follows:

1. Calculate the Link Relatives for each period using the formula:

Link Relative for any period = (Current period's figure / Previous period's figure) × 100

2. Calculate the average of the Link Relatives for each period across all years using either the mean or median.
3. Convert the average Link Relatives into Chain Relatives based on the first season. The Chain Relative for any period is obtained as:

Chain Relative for the first period = 100
Chain Relative for any period = (Average Link Relative for that period × Chain Relative of the previous period) / 100

4. Compute the Adjusted Chain Relatives by subtracting the correction k·d from the (k + 1)th Chain Relative, where k = 1, 2, . . . , 11 for monthly data and k = 1, 2, 3 for quarterly data. The correction factor d is defined as:

d = (C′ − 100) / N

where C′ = (Average Link Relative of the first period × Chain Relative of the last period) / 100 is the new chain relative obtained for the first period by carrying the chain through a full cycle, and N denotes the number of periods (i.e., N = 12 for monthly data and N = 4 for quarterly data).
5. Finally, calculate the average of the corrected Chain Relatives and convert these values into percentages based
on this average. These percentages represent the seasonal indices calculated by the Link Relative Method.

Advantages
• The Link Relative Method utilizes the data more effectively compared to the moving average method.

Disadvantages
• This method involves extensive calculations and is more complex than the moving average method.
• The average of Link Relatives may contain both trend and cyclical components, which are eliminated by
applying corrections.
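The first three steps can be sketched in R as follows; this is a minimal illustration with our own variable names, using the first three years of the quarterly price data from question 10 of the question bank (the correction and final averaging steps are omitted for brevity):

# Link relatives and chain relatives in R (illustrative sketch, steps 1-3)
y  <- c(30, 26, 22, 31, 35, 28, 22, 36, 31, 29, 28, 32)  # three years of quarters
lr <- c(NA, y[-1] / y[-length(y)] * 100)                  # Step 1: link relatives
quarter <- factor(rep(1:4, times = 3))
alr <- tapply(lr, quarter, mean, na.rm = TRUE)            # Step 2: average link relatives
cr  <- numeric(4); cr[1] <- 100                           # Step 3: chain relatives
for (k in 2:4) cr[k] <- alr[k] * cr[k - 1] / 100
round(cr, 2)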


5.5 Cyclical and Random Fluctuations


Cyclical fluctuations refer to the oscillations in data that occur in a systematic pattern over a longer period, typically
aligned with economic or business cycles. These cycles are not fixed in length and can vary widely, commonly seen
in economic indicators like GDP, employment rates, and business profits.
For example, economic activity tends to rise during expansions and fall during recessions, creating a cyclical
pattern. Cycles can span several years, such as the typical business cycle of about 4 to 10 years.

5.5.1 Example of Cyclical Fluctuations


Consider the following quarterly GDP data for a hypothetical economy over five years:

Year Q1 Q2 Q3 Q4
1 200 220 230 240
2 250 270 260 280
3 290 300 310 320
4 310 330 340 350
5 340 350 360 370
In this example, we can observe an increasing trend in GDP, but there may also be fluctuations that correspond
to economic cycles, indicating periods of growth followed by stagnation or decline.

Figure 15: Cyclic Variations


Methods for Measuring Cyclical Variations


Cyclical variations in data occur due to periodic fluctuations in economic activity, which can affect various sectors
of the economy. Several methods are used to measure these variations effectively. The key methods include:

• Residual Method
• Reference Cycle Analysis Method
• Direct Method
• Harmonic Analysis Method

1. Residual Method
The Residual Method involves isolating the cyclical component of a time series by removing the trend and seasonal
components. The cyclical variation is derived as the residuals after fitting a trend line and seasonal pattern to the
data.


Steps:
1. Fit a trend line (linear, polynomial, etc.) to the time series data.
2. Identify and remove seasonal variations.
3. Calculate the residuals, which represent the cyclical variations.

Example:
Given a time series data of quarterly sales figures:

Sales = [200, 220, 210, 240, 260, 280, 270, 300]


Assume we fit a linear trend:

Trend = 180 + 10t (t = 1, 2, . . . , 8)


The fitted values and residuals can be calculated as follows:

Quarter Sales Trend Residuals


1 200 190 10
2 220 200 20
3 210 210 0
4 240 220 20
5 260 230 30
6 280 240 40
7 270 250 20
8 300 260 40
The residuals represent the cyclical variations.
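The same computation can be sketched in R; note that lm() estimates its own least-squares line rather than the assumed trend 180 + 10t, so the residuals will differ slightly from the table above (variable names are ours):

# Residual method in R (illustrative sketch)
sales <- c(200, 220, 210, 240, 260, 280, 270, 300)
t     <- seq_along(sales)
fit   <- lm(sales ~ t)           # fit a linear trend
trend <- fitted(fit)             # trend values
cyc   <- sales - trend           # residuals = cyclical (plus irregular) variation
round(cbind(t, sales, trend, cyc), 1)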

2. Reference Cycle Analysis Method


The Reference Cycle Analysis Method involves comparing a specific cycle with a standard reference cycle. The cyclical
components of different time series can be analyzed to identify similarities and differences against a benchmark.

Steps:
1. Define a reference cycle based on historical data.
2. Compare the current data cycle with the reference cycle.
3. Measure deviations and similarities quantitatively.

Example:
Suppose the reference cycle is defined as follows:

Reference Cycle = [1, 0.9, 1.1, 1.2]


We compare this with a new cycle:

Current Cycle = [1.1, 0.8, 1.2, 1.3]


The deviations can be calculated element-wise:

Deviation = Current Cycle / Reference Cycle = [1.10, 0.89, 1.09, 1.08]
This analysis helps in assessing the performance against established benchmarks.


3. Direct Method
The Direct Method involves directly measuring the cyclical component from the time series data without removing
the trend or seasonal effects. This method focuses on identifying peaks and troughs in the data.

Steps:
1. Identify the peaks and troughs in the data.
2. Calculate the amplitude of the cycles (the difference between peaks and troughs).
3. Analyze the duration of cycles to assess periodicity.

Example:
Given a time series of monthly sales:

Sales = [100, 120, 130, 125, 150, 160, 140, 130]


Identifying the local peaks and troughs:

Peaks = [130, 160] (days 3 and 6)    Troughs = [100, 125] (days 1 and 4)


The amplitudes can be calculated as follows:

Amplitude = Peak − Trough


For the first cycle:

Amplitude = 130 − 100 = 30


And so forth for each cycle.

4. Harmonic Analysis Method


Harmonic Analysis Method is a mathematical technique used to decompose time series data into its constituent sine
and cosine components. This method is effective in identifying cyclical patterns in data that may not be readily
apparent.

Steps:
1. Use Fourier transforms to convert the time series data into the frequency domain.
2. Identify significant harmonics that represent cyclical variations.
3. Reconstruct the cyclical component using selected harmonics.

Example:
Suppose we have a time series data:

Data = [1, 2, 3, 4, 5, 4, 3, 2]
Applying Fourier transform yields harmonics:

Harmonics = [A1 sin(ω1 t), A2 cos(ω2 t), . . .]


Assuming two significant harmonics:

Cyclical Component = 2 sin(2πt / T) + 3 cos(2πt / T)
Where T is the period of the cycle.
The reconstructed cycle can show periodic behavior that highlights cyclical variations.
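A minimal R sketch of this idea uses the built-in fft() to obtain the harmonic amplitudes of the example series (variable names are ours):

# Harmonic analysis via the discrete Fourier transform (illustrative sketch)
y <- c(1, 2, 3, 4, 5, 4, 3, 2)
n <- length(y)
z <- fft(y) / n                      # complex Fourier coefficients
amp <- Mod(z)[1:(n/2 + 1)]           # amplitudes of harmonics 0 .. n/2
amp[2:(n/2)] <- 2 * amp[2:(n/2)]     # fold in the conjugate frequencies
round(amp, 3)                        # harmonic 1 dominates: one cycle over the 8 points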

Conclusion
Each of these methods provides unique insights into the cyclical variations in time series data. The choice of method
depends on the nature of the data, the underlying cycles, and the objectives of the analysis.

5.6 Random Fluctuations


Random fluctuations are unpredictable variations in data that do not follow a discernible pattern. These fluctuations
can be caused by irregular, unforeseen events such as natural disasters, political instability, or sudden market changes.
Unlike cyclical fluctuations, random variations are typically short-term and do not contribute to the overall trend.

5.6.1 Example of Random Fluctuations


Consider a stock price over time that displays random fluctuations due to market sentiment, news events, or economic
reports:

Day Stock Price


1 100
2 98
3 102
4 101
5 95
6 110
7 97
8 105
9 99
10 103
In this example, the stock price fluctuates randomly without showing any consistent trend, indicating that the
changes are primarily driven by unpredictable market forces.

5.6.2 Deseasonalisation
Deseasonalisation is the process of removing seasonal components from time series data to obtain data that reflects
only the underlying trends and cycles. The resulting data, free from seasonal variations, is known as deseasonalised
data.

1. Multiplicative Model
In a multiplicative model, the relationship between the observed data Yt , the trend Tt , and the seasonal component
St is given by:

Yt = Tt × St
To deseasonalise the data, we divide the original data by the seasonal index. Since the seasonal index is typically expressed as a percentage, it must first be divided by 100 (i.e., multiplied by 0.01).

Formula for Deseasonalisation:


Deseasonalised Data = Y_t / (Seasonal Index × 0.01)


Example:
Consider the following quarterly sales data and corresponding seasonal indices:

Quarter Sales (Y) Seasonal Index


Q1 120 110
Q2 150 90
Q3 180 100
Q4 200 130
Calculating the deseasonalised data:
Deseasonalised Sales (Q1) = 120 / (110 × 0.01) = 120 / 1.1 ≈ 109.09
Deseasonalised Sales (Q2) = 150 / (90 × 0.01) = 150 / 0.9 ≈ 166.67
Deseasonalised Sales (Q3) = 180 / (100 × 0.01) = 180 / 1.0 = 180.00
Deseasonalised Sales (Q4) = 200 / (130 × 0.01) = 200 / 1.3 ≈ 153.85
The deseasonalised data is:

Quarter Deseasonalised Sales


Q1 109.09
Q2 166.67
Q3 180.00
Q4 153.85

2. Additive Model
In an additive model, the relationship is expressed as:

Yt = Tt + St
In this case, deseasonalisation involves subtracting the seasonal component from the original data.

Formula for Deseasonalisation:


Deseasonalised Data = Yt − St

Example:
Using the same quarterly sales data:

Quarter Sales (Y) Seasonal Component (S)


Q1 120 10
Q2 150 20
Q3 180 30
Q4 200 40
Calculating the deseasonalised data:

Deseasonalised Sales (Q1) = 120 − 10 = 110

Deseasonalised Sales (Q2) = 150 − 20 = 130


Deseasonalised Sales (Q3) = 180 − 30 = 150

Deseasonalised Sales (Q4) = 200 − 40 = 160


The deseasonalised data is:

Quarter Deseasonalised Sales


Q1 110
Q2 130
Q3 150
Q4 160
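Both deseasonalisation models can be sketched in R in a few lines, using the data and indices of the two examples above (variable names are ours):

# Deseasonalisation in R (illustrative sketch)
sales <- c(120, 150, 180, 200)

# Multiplicative model: divide by (seasonal index / 100)
seasonal_index <- c(110, 90, 100, 130)
deseason_mult  <- sales / (seasonal_index / 100)   # 109.09 166.67 180.00 153.85

# Additive model: subtract the seasonal component
seasonal_comp <- c(10, 20, 30, 40)
deseason_add  <- sales - seasonal_comp             # 110 130 150 160

round(deseason_mult, 2); deseason_add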

Uses and Limitations of Seasonal Indices


Uses:

• Seasonal indices provide a quantitative measure of typical seasonal behavior.


• They are used for forecasting and making informed business decisions by understanding seasonal fluctuations.
Limitations:
• Seasonal indices may not capture unexpected shocks or anomalies in data.

• They rely on historical data, which may not always predict future patterns accurately.

Conclusion
Deseasonalisation is a critical step in time series analysis, allowing analysts to focus on the underlying trends and
cycles in data without the influence of seasonal fluctuations.

5.7 Variate Difference Methods


Variate difference methods, also known as difference methods or differencing, are techniques used in time series
analysis to stabilize the mean of a time series by removing changes in the level of a time series, thereby making it
stationary. This is particularly useful for analyzing seasonal data or trends.

5.7.1 Example 1: Monthly Sales Data


Consider a small business that records its monthly sales over six months as follows:

Month Sales (in thousands)


1 50
2 60
3 70
4 80
5 65
6 75
To apply the variate difference method, we will calculate the first differences of the sales data.


Step 1: Calculate First Differences


The first difference is calculated as:

Dt = Yt − Yt−1
Where Dt is the first difference at time t and Yt is the sales at time t.

Month Sales (Y) First Difference (D)


1 50 −
2 60 60 − 50 = 10
3 70 70 − 60 = 10
4 80 80 − 70 = 10
5 65 65 − 80 = −15
6 75 75 − 65 = 10

Step 2: Analyze the Differences


The first differences show that the sales increased consistently for the first four months but decreased in the fifth
month. The last month, however, saw an increase again. This information helps the business understand that
although sales fluctuated, the overall trend was increasing with occasional drops.

5.7.2 Example 2: Daily Temperature Records


Suppose we have daily temperature records for a week as follows:

Day Temperature (°C)


1 20
2 22
3 24
4 23
5 25
6 27
7 26
We will apply the variate difference method to analyze the temperature changes.

Step 1: Calculate First Differences


Using the same formula for first differences:

Dt = Yt − Yt−1
We calculate the first differences:

Day Temperature (Y) First Difference (D)


1 20 −
2 22 22 − 20 = 2
3 24 24 − 22 = 2
4 23 23 − 24 = −1
5 25 25 − 23 = 2
6 27 27 − 25 = 2
7 26 26 − 27 = −1


Step 2: Analyze the Differences


From the first differences, we see that the temperature generally increased over the week, with minor fluctuations on
days 4 and 7. The differences provide insight into the daily temperature variations, indicating stability with minor
drops.

Differencing
The simplest form of variate difference methods is first-order differencing, where the difference between consecutive
observations is calculated. The first difference is given by:

∆Yt = Yt − Yt−1
where Y_t is the value at time t and Y_{t−1} is the value at the previous time period.

Example of First-Order Differencing


Consider the following time series data representing monthly sales figures (in thousands):

Month Sales
1 100
2 120
3 130
4 150
5 180
The first-order differences can be calculated as follows:

Month Sales ∆Yt


1 100 −
2 120 120 − 100 = 20
3 130 130 − 120 = 10
4 150 150 − 130 = 20
5 180 180 − 150 = 30
The resulting first-order differences:

Month ∆Yt
2 20
3 10
4 20
5 30

Second-Order Differencing
If the time series still shows non-stationarity after first differencing, a second-order differencing can be applied:

∆2 Yt = ∆Yt − ∆Yt−1
This method is useful in capturing the cyclical patterns that may remain even after the first differencing.

Example of Second-Order Differencing


Continuing from the previous example, we calculate the second-order differences:


Month ∆Yt ∆2 Yt
2 20 −
3 10 10 − 20 = −10
4 20 20 − 10 = 10
5 30 30 − 20 = 10
The second-order differences indicate the rate of change of the first differences, helping us understand the under-
lying dynamics of the time series data.
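In R, both orders of differencing are available through the built-in diff() function; a minimal sketch using the monthly sales figures above:

# First- and second-order differencing in R (illustrative sketch)
sales <- c(100, 120, 130, 150, 180)
d1 <- diff(sales)                    # first differences: 20 10 20 30
d2 <- diff(sales, differences = 2)   # second differences: -10 10 10
d1; d2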

5.7.3 Conclusion
The variate difference methods are effective in analyzing trends and fluctuations in time series data. By calculating
and interpreting the differences, we can derive valuable insights into the underlying patterns in the data.

Question Bank
1. Distinguish between seasonal variations and cyclical fluctuations. How would you measure secular trend in any
given data?
2. Describe the method of link relatives for calculating the seasonal variation indices.
3. How would you determine seasonal variation in the absence of trend?
4. Briefly describe the relative merits and demerits of the ratio to trend and ratio to moving average methods.
5. What do you understand by cyclical fluctuations in time series?
6. What do you understand by random fluctuation in time series?
7. Explain the term ”Business cycle” and point out the necessity of its study in time series analysis.
8. Calculate seasonal variation for the following data of sales in thousands Rs. of a firm by the Ratio to trend
method.
Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter
1979 30 40 36 34
1980 34 52 50 44
1981 40 58 54 48
1982 52 76 68 62

9. Calculate seasonal indices by the Ratio to moving average method from the following data.

Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter


1980 75 60 54 59
1981 86 65 63 80
1982 90 72 66 85
1983 100 78 72 93

10. The data below gives the average quarterly prices of a commodity for five years. Calculate seasonal indices by
the method of link relatives.
Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter
1979 30 26 22 31
1980 35 28 22 36
1981 31 29 28 32
1982 31 31 25 35
1983 34 36 26 33


Module - 3
Chapter - 1

6 Index Numbers and Their Definitions


6.1 Introduction
An Index Number is a statistical measure designed to show changes in a variable or group of related variables
with respect to time, geographic location, or other characteristics. It is often used to measure the relative change in
prices, quantities, or values over time.
The general formula for an index number is:
Index Number = (Current Value / Base Value) × 100
Index numbers are classified into two main types:
• Fixed-based Index Numbers

• Chain-based Index Numbers

6.2 Fixed-based Index Numbers


In a Fixed-based Index Number, the base year remains the same throughout the analysis period. This method
helps compare current data with a specific, constant base period. The general formula for a fixed-base index number
is:
I_t = (P_t / P_0) × 100
where:

• It is the index number for year t.


• Pt is the price or value in year t.
• P0 is the price or value in the base year.

6.2.1 Example of Fixed-based Index Numbers


Consider the following price data for a commodity:

Year Price (in $) Fixed-base Index


2020 (Base Year) 50 100
2021 55 (55/50) × 100 = 110
2022 60 (60/50) × 100 = 120
2023 65 (65/50) × 100 = 130

Table 6: Fixed-based Index Number Example


6.2.2 R Code for Fixed-based Index Numbers


Below is an R code snippet to compute the fixed-based index numbers:

# R Code for Fixed-based Index Numbers


prices <- c(50, 55, 60, 65) # Prices for 2020 to 2023
base_price <- prices[1] # Base year price
fixed_index <- (prices / base_price) * 100
years <- 2020:2023

# Display the index numbers


data.frame(Year = years, Price = prices, FixedBaseIndex = fixed_index)

6.3 Chain-based Index Numbers


A Chain-based Index Number uses the previous period as the base, allowing for continuous comparison from
one period to the next. This method is useful when there are changes in the base year, or when the data series is
extended over a long period.
The formula for a chain-based index number is:
I_t = (P_t / P_{t−1}) × I_{t−1}

where:
• It is the index number for year t.
• Pt is the price or value in year t.

• Pt−1 is the price or value in the previous year.


• It−1 is the index number for the previous year.

6.3.1 Example of Chain-based Index Numbers


Using the same price data as before:

Year Price (in $) Chain-based Index


2020 (Base Year) 50 100
2021 55 (55/50) × 100 = 110
2022 60 (60/55) × 110 = 120
2023 65 (65/60) × 120 = 130

Table 7: Chain-based Index Number Example

6.3.2 R Code for Chain-based Index Numbers


Below is an R code snippet to compute the chain-based index numbers:

# R Code for Chain-based Index Numbers


prices <- c(50, 55, 60, 65) # Prices for 2020 to 2023
years <- 2020:2023
chain_index <- numeric(length(prices))
chain_index[1] <- 100 # Base year index

# Compute the chain-based index from each year to the next

for (i in 2:length(prices)) {
  chain_index[i] <- (prices[i] / prices[i-1]) * chain_index[i-1]
}

# Display the index numbers

data.frame(Year = years, Price = prices, ChainBasedIndex = chain_index)

6.4 Uses of Index Numbers


Index numbers are indispensable tools in economics and business analysis. Their main applications are outlined as
follows:

1. Economic Barometers: Index numbers act as economic barometers, measuring fluctuations in economic
indicators such as price levels, the money market, and economic cycles like inflation and deflation. According
to G. Simpson and F. Kafka, ”Index numbers are among the most widely used statistical devices today, taking
the pulse of the economy and indicating tendencies towards inflation or deflation.”
2. Formulation of Economic Policies: Index numbers play a crucial role in guiding economic and business
policies. For instance, when determining the increase in Dearness Allowance (DA) for employees, employers
rely on the Cost of Living Index. Failure to adjust salaries or wages according to cost of living changes can
lead to labor unrest, such as strikes or lockouts.
3. Studying Trends and Tendencies: Index numbers are extensively used to measure changes over time,
forming a time series that helps in analyzing the general trend of a phenomenon. For example, data on imports
over the last 8-10 years might indicate an upward trend.
4. Forecasting Future Economic Activity: Beyond analyzing past and present economic conditions, index
numbers are valuable for forecasting future economic activities, providing insights that help in making informed
decisions.
5. Measuring the Purchasing Power of Money: Index numbers, especially the Cost of Living Index, are
used to determine changes in real wages. Real wages can be calculated using the formula:
Real Wages = (Money Wages / Price Index) × 100
This helps assess whether the purchasing power of money is rising, falling, or remaining constant.
6. Deflating Economic Data: Index numbers are crucial for deflating economic data, i.e., adjusting wages,
income, and sales figures for changes in the cost of living. This transformation allows for the calculation of real
wages, real income, and real sales using appropriate index numbers, providing a clearer picture of economic
conditions.

7 Methods of Constructing Index Numbers


Index numbers can be broadly categorized into two types: Unweighted and Weighted indices, depending on
whether the quantities of commodities are considered while calculating the index.

7.1 Unweighted Indices


In unweighted indices, all items are considered equally important, and no weights are assigned to reflect the relative
significance of the items. This method is simple but may not always provide an accurate representation of the overall
change if the items vary significantly in importance.

1. Simple Aggregative Method


• This method calculates the index as the ratio of the total of current year prices to the total of base year prices, multiplied by 100:

I = (ΣP_t / ΣP_0) × 100

Commodity Base Year Price (P0 ) Current Year Price (Pt )


A 50 60
B 80 100
C 120 150
D 30 40

Table 8: Data for Simple Aggregative Method

• Example: Consider four commodities with the following data:


The index number is:

I = [(60 + 100 + 150 + 40) / (50 + 80 + 120 + 30)] × 100 = (350 / 280) × 100 = 125
This indicates a 25% increase in prices from the base year.

• Advantages:
– Easy to understand and calculate.
– Requires minimal data.
• Disadvantages:
– Does not account for the relative importance of commodities.
– May be misleading if the prices of less important commodities change drastically.
2. Simple Average of Relatives
• This method averages the price relatives of individual items. A price relative is the ratio of the current
year price to the base year price, multiplied by 100.
• When this method is used to construct a price index number, first of all price relatives are obtained for
the various items included in the index and then the average of these relatives is obtained using any one
of the averages i.e. mean or median etc.
I = [ Σ (P_t / P_0) × 100 ] / n


• Example: Using the previous data:


I = [(60/50) × 100 + (100/80) × 100 + (150/120) × 100 + (40/30) × 100] / 4
I = (120 + 125 + 125 + 133.33) / 4 = 503.33 / 4 ≈ 125.83
The index number is 125.83, indicating an average increase of 25.83%.

• Advantages:
– Simple and easy to compute.
– Each commodity’s price change is accounted for.
• Disadvantages:
– Does not consider the quantity or importance of items.
– Sensitive to extreme values.

7.2 Weighted Indices


In weighted indices, weights are assigned to commodities based on their significance. This method provides a more
accurate measure by reflecting the relative importance of each item.

1. Weighted Aggregative Method


• In the weighted aggregative method, weights are assigned based on the quantities of commodities to
reflect their relative importance. The index number is calculated using the weighted sum of prices. The
commonly used methods are:
(a) Laspeyres' Method
– This method was devised by Laspeyres in 1871. It is the most important of all the types of index numbers. In this method the base year quantities are taken as weights.
– Formula:

I_L = (Σ P_t Q_0 / Σ P_0 Q_0) × 100
Where:
∗ Pt = Current year price
∗ P0 = Base year price
∗ Q0 = Base year quantity
– Example:

Commodity P0 Pt Q0
A 50 60 10
B 80 100 5
C 120 150 8
D 30 40 12

Table 9: Data for Laspeyres’ Method

I_L = [(60 × 10) + (100 × 5) + (150 × 8) + (40 × 12)] / [(50 × 10) + (80 × 5) + (120 × 8) + (30 × 12)] × 100

I_L = (2780 / 2220) × 100 ≈ 125.23
2160
– Advantages:
∗ Easy to compute using base year quantities.
∗ Useful for comparisons when the base year is relevant.
– Disadvantages:
∗ Ignores changes in consumption patterns over time.
∗ May not accurately reflect current market conditions if the base year is outdated.
(b) Paasche’s Method
– Uses current year quantities as weights.
– Formula:

I_P = (Σ P_t Q_t / Σ P_0 Q_t) × 100
Where:
∗ Qt = Current year quantity


Commodity P0 Pt Qt
A 50 60 12
B 80 100 6
C 120 150 9
D 30 40 15

Table 10: Data for Paasche’s Method

– Example:

I_P = [(60 × 12) + (100 × 6) + (150 × 9) + (40 × 15)] / [(50 × 12) + (80 × 6) + (120 × 9) + (30 × 15)] × 100

I_P = (3270 / 2610) × 100 ≈ 125.29
2640
– Advantages:
∗ Reflects current consumption patterns.
∗ More accurate for current year analysis.
– Disadvantages:
∗ Requires current year quantities, which may not always be available.
∗ Less stable, as it changes with the current year data.
(c) Fisher’s Ideal Index
– Fisher’s Ideal Index is the geometric mean of Laspeyres and Paasche indices. It is considered an
”ideal” index as it minimizes the biases inherent in both methods.
– Formula:

I_F = √(I_L × I_P)

– Example: Using the previous calculations for I_L ≈ 125.23 and I_P ≈ 125.29:

I_F = √(125.23 × 125.29) = √15690.06 ≈ 125.26
– Advantages:
∗ Satisfies the time reversal and factor reversal tests, making it a consistent measure.
∗ Balances both base year and current year data.
– Disadvantages:
∗ More complex to calculate.
∗ Requires both base year and current year quantity data.
– Fisher's index number is called the ideal index number. Why?
∗ It is based on the G.M., which is theoretically considered the best average for constructing index numbers.
∗ It takes into account both current and base year prices as well as quantities.
∗ It satisfies both the time reversal and factor reversal tests suggested by Fisher.
∗ The upward bias of Laspeyres' index number and the downward bias of Paasche's index number are balanced to a great extent.


7.3 Comparison of Laspeyres’ and Paasche’s Index Numbers


Index numbers are widely used in economics and statistics to measure changes in variables such as prices or quantities
over time. Two of the most commonly used index numbers are Laspeyres’ and Paasche’s index numbers.

Laspeyres’ Index Number


The Laspeyres index is calculated using the base period quantities as weights. The formula is:

L = [Σ(p_1 q_0) / Σ(p_0 q_0)] × 100
where:
• p0 = Price in the base period


• p1 = Price in the current period


• q0 = Quantity in the base period

Example

Item p0 p1 q0
A 10 12 100
B 20 25 50
C 15 18 80
L = [(12 × 100) + (25 × 50) + (18 × 80)] / [(10 × 100) + (20 × 50) + (15 × 80)] × 100 = (1200 + 1250 + 1440) / (1000 + 1000 + 1200) × 100 = (3890 / 3200) × 100 ≈ 121.56

Paasche’s Index Number


The Paasche index uses current period quantities as weights. The formula is:

P = [Σ(p_1 q_1) / Σ(p_0 q_1)] × 100
where:

• p0 = Price in the base period


• p1 = Price in the current period
• q1 = Quantity in the current period

Example

Item p0 p1 q1
A 10 12 120
B 20 25 60
C 15 18 90
P = [(12 × 120) + (25 × 60) + (18 × 90)] / [(10 × 120) + (20 × 60) + (15 × 90)] × 100 = (1440 + 1500 + 1620) / (1200 + 1200 + 1350) × 100 = (4560 / 3750) × 100 = 121.60

Comparison
• Weights: Laspeyres’ index uses base period quantities, while Paasche’s index uses current period quantities.

• Bias: Laspeyres’ index tends to overstate price increases because it does not account for changes in consumption
patterns. Paasche’s index can understate price increases because it uses current period quantities that might
be influenced by price changes.
• Use Case: Laspeyres’ index is easier to compute when historical quantity data is available. Paasche’s index
is more reflective of current consumption patterns.

7.4 Weighted average of relatives


The weighted average of relatives is calculated by assigning weights to price or quantity relatives. Typically, the base
year values (p0 q0 ) are used as weights.


Formula Using Arithmetic Mean (A.M.)


When the Arithmetic Mean (A.M.) is used, the formula is:

P̄ = Σ(V · P) / ΣV

where:

• P = (p_1 / p_0) × 100 is the price relative.

• V = p_0 q_0 is the base year value (price multiplied by quantity in the base year).

Formula Using Geometric Mean (G.M.)


When the Geometric Mean (G.M.) is used, the formula is:

log P̄ = Σ(V · log P) / ΣV

To compute P̄, take the antilog of the weighted mean of the logarithms:

P̄ = antilog[ Σ(V · log P) / ΣV ]

Components of the Formula


• P = (p_1 / p_0) × 100, where:

– p_1 = Current period price.
– p_0 = Base period price.

• V = p_0 q_0, which is the value in the base year.

Explanation
• The Arithmetic Mean (A.M.) method calculates the simple weighted average of price relatives.
• The Geometric Mean (G.M.) method is used when a more proportional measure is needed, as it reduces
the impact of extreme values.

These methods are commonly used in economic indices to calculate weighted averages, such as price index
numbers, by considering the relative importance of each component.
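Both forms can be sketched in R using the data of the Laspeyres comparison example above (variable names are ours); note that with V = p_0 q_0 as weights the A.M. form reproduces Laspeyres' index:

# Weighted average of price relatives in R (illustrative sketch)
p0 <- c(10, 20, 15); p1 <- c(12, 25, 18); q0 <- c(100, 50, 80)
P <- p1 / p0 * 100                          # price relatives
V <- p0 * q0                                # base-year values used as weights
I_am <- sum(V * P) / sum(V)                 # A.M. form: 121.56 (equals Laspeyres)
I_gm <- 10^(sum(V * log10(P)) / sum(V))     # G.M. form: antilog of the weighted mean of logs
round(c(AM = I_am, GM = I_gm), 2)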


7.5 The Chain Index Numbers


In the fixed base method, the base remains constant throughout, meaning that the relatives for all years are calculated
using the prices of a single base year. Conversely, in the chain base method, the relatives for each year are calculated
based on the prices of the immediately preceding year. Thus, the base changes from year to year.
Chain index numbers are particularly useful for comparing current year figures with those of the preceding year.
The relatives obtained using this method are called link relatives.

Link Relatives
The formula for calculating the link relative for a current year is:

Link Relative for Current Year = (Current Year's Figure / Previous Year's Figure) × 100

Chain Index Formula


Using the link relatives, the chain index for each year can be computed using the following formula:

Chain Index for Current Year = (Chain Index of Previous Year × Link Relative of Current Year) / 100


Note
If there is only one commodity and its index is being calculated, the fixed base index number computed directly from
the original data will be equal to the chain index number computed from the link relatives.

Applications and Significance


• The chain base method is flexible, as it adjusts the base year annually, reflecting more recent comparisons.
• It is particularly helpful in economic and business scenarios where annual changes are crucial for analysis.

Example: 1.
From the following data of wholesale prices of wheat for ten years construct index number taking
1. 1998 as base

2. by chain base method

Solution


Example 2: Compute the Chain Index Number


Compute the chain index number with 2003 prices as the base year using the following data for average wholesale
prices of commodities A, B, and C from 2003 to 2007.

Given Data


Example 3.
Compute the chain base index numbers

Example 4.
Calculate fixed base index numbers from the following chain base index numbers



7.6 Chain Index Numbers: Merits, Demerits, and Comparison


Merits of Chain Index Numbers
1. The chain base method has great significance in practice because, in economic and business data, we are often
concerned with making comparisons with the previous period.

2. The chain base method does not require recalculation if some more items are introduced or deleted from the
old data.
3. Index numbers calculated using the chain base method are free from seasonal and cyclical variations.

Demerits of Chain Index Numbers


1. This method is not useful for long-term comparisons.
2. If there is any abnormal year in the series, it will affect the subsequent years as well.

Differences Between Fixed Base and Chain Base Methods


Chain Base Method:
1. The base year changes from year to year.
2. The link relative method is used.
3. Calculations are tedious.
4. It cannot be computed if any one year is missing.
5. Suitable for short periods.
6. An error committed in the calculation of a link relative makes all subsequent index numbers incorrect.

Fixed Base Method:
1. The base year does not change.
2. No link relative method is used.
3. Calculations are simple.
4. It can be computed even if a year is missing.
5. Suitable for long periods.
6. An error is confined to the index of that year only.

7.7 Base shifting


Base shifting is one of the most frequent operations necessary when working with index numbers. It involves changing
the base of an index from one period to another without recompiling the entire series. This process is referred to as
base shifting.


Reasons for Base Shifting


1. If the previous base has become too old and is almost useless for purposes of comparison.
2. If the comparison is to be made with another series of index numbers having a different base.

Formula for Base Shifting


The formula used in base shifting is:
Index number based on new base year = (Current year's old index number / New base year's old index number) × 100
This formula allows the recalculation of index numbers with a new base year without recompiling the entire series.

Problem: Base Shifting


The following are the index numbers of prices with 1998 as the base year:

Year Index (Base Year: 1998)


1998 100
1999 110
2000 120
2001 200
2002 400
2003 410
2004 400
2005 380
2006 370
2007 340

Task
Shift the base from 1998 to 2004 and recast the index numbers.

Solution
The formula for shifting the base year is:
Index Number Based on New Base Year (2004) = (Old Index Number / Index for 2004) × 100

Recalculated Index Numbers (Base Year: 2004)


Using the formula, the index numbers with 2004 as the new base year are:


Year Index (Base Year: 1998) Index (Base Year: 2004)


1998 100 (100/400) × 100 = 25.00
1999 110 (110/400) × 100 = 27.50
2000 120 (120/400) × 100 = 30.00
2001 200 (200/400) × 100 = 50.00
2002 400 (400/400) × 100 = 100.00
2003 410 (410/400) × 100 = 102.50
2004 400 (400/400) × 100 = 100.00
2005 380 (380/400) × 100 = 95.00
2006 370 (370/400) × 100 = 92.50
2007 340 (340/400) × 100 = 85.00

Conclusion
The recalculated index numbers with 2004 as the base year reflect the price movements relative to the new base year.
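Base shifting is a one-line operation in R; a minimal sketch using the index numbers from the table above (variable names are ours):

# Base shifting in R (illustrative sketch)
years     <- 1998:2007
old_index <- c(100, 110, 120, 200, 400, 410, 400, 380, 370, 340)  # base 1998
new_index <- old_index / old_index[years == 2004] * 100           # base 2004
data.frame(Year = years, NewBase2004 = new_index)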

7.8 Splicing of Two Series of Index Numbers


The problem of combining two or more overlapping series of index numbers into one continuous series is called
splicing. In other words, if we have a series of index numbers with some base year that is discontinued at a certain
year, and another series of index numbers with the year of discontinuation as the base year, connecting these two
series to make a continuous series is referred to as splicing.

Formula for Splicing


The formula used for splicing two series of index numbers is:
Index Number After Splicing = (Index Number to be Spliced × 100) / (Old Index Number of Existing Base)
This formula ensures that the two series are seamlessly combined into a single continuous series.

Example 1.
Index A was started in 1993 and continued up to 2003, in which year another index B was started. Splice index B to index A so that a continuous series of index numbers is obtained.


Solution

7.9 Deflating
Deflating means correcting or adjusting a value that has been inflated; it makes allowance for the effect of price changes.
When prices rise, the purchasing power of money declines. For example, if the money incomes of people remain
constant between two periods but the prices of commodities double, the purchasing power of money is reduced to
half.
For instance, if the price of rice increases from Rs.10/kg in the year 1980 to Rs.20/kg in the year 1982, a person
can buy only half a kilogram of rice with Rs.10 in 1982. This implies that the purchasing power of a rupee is only
50 paise in 1982 compared to 1980.

Purchasing Power of Money


The purchasing power of money can be calculated as:
Purchasing Power of Money = 1 / Price Index


Real Wages
In times of rising prices, money wages should be deflated by the price index to get the figure of real wages. Real
wages alone indicate whether a wage earner is in a better or worse position.
To calculate real wages, the money wages or income are divided by the corresponding price index and multiplied
by 100:
Real Wages = (Money Wages / Price Index) × 100

The real wage index can also be computed using the following formula:

Real Wage Index = (Real Wage of Current Year / Real Wage of Base Year) × 100
These calculations provide meaningful insights into the actual purchasing power and living standards of individuals
over time.
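A minimal R sketch of these two formulas, using hypothetical wage and price-index figures (all numbers below are ours, for illustration only):

# Deflating money wages in R (illustrative sketch, hypothetical figures)
money_wages <- c(12000, 13200, 14400)
price_index <- c(100, 120, 150)
real_wages      <- money_wages / price_index * 100   # 12000 11000 9600
real_wage_index <- real_wages / real_wages[1] * 100  # 100.00 91.67 80.00
round(cbind(real_wages, real_wage_index), 2)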

Exercise 1.
The following table gives the annual income of a worker and the general index numbers of prices during 1999-2007. Prepare an index number to show the changes in the real income of the worker, and comment on the price increase.

7.10 Optimum Tests for Index Numbers


Index numbers are used to measure changes in a variable over time, and they must satisfy certain conditions or tests
to ensure their reliability. These tests help verify the accuracy and consistency of index numbers and their suitability
for practical use. The following are some of the key tests that an index number should pass to be considered optimum:


1. Test of Consistency or Time Reversal Test


The index number should be consistent over time, meaning that it should remain valid when the direction of com-
parison is reversed. Specifically, if the index number is calculated using a certain base period and compared to a
subsequent period, it should produce the same result if the roles of the two periods are reversed.
Example: Let us consider the following price index numbers based on the base year 2010:

Index for 2015 based on 2010 = 125


Now, if we reverse the comparison (i.e., compare 2010 with 2015), the index should be:
Index for 2010 based on 2015 = (100 / 125) × 100 = 0.8 × 100 = 80
Thus, the index for 2015 based on 2010 is 125, and the index for 2010 based on 2015 is 80, satisfying the time
reversal test.
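Expressed as ratios (index ÷ 100), the time reversal test requires that the two indices multiply to 1. A one-line R check with the numbers above:

p01 <- 125 / 100  # index for 2015 on base 2010, as a ratio
p10 <- 80 / 100   # index for 2010 on base 2015, as a ratio
p01 * p10         # equals 1, so the time reversal test is satisfied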

2. Test of Proportionality or Factor Reversal Test


This test requires that the index number treat prices and quantities symmetrically: the product of the price index and the corresponding quantity index (each expressed as a ratio) should equal the value index, i.e., the ratio of total value in the current year to total value in the base year:

Price Index × Quantity Index = Value Index

Example: Suppose we have the following data for two commodities, A and B:

Price Index based on 2010 for A = 120, Price Index based on 2010 for B = 110
Quantity Index based on 2010 for A = 95, Quantity Index based on 2010 for B = 105

Multiplying the price and quantity indices (as percentages) and rescaling by 100:

For A: 120 × 95 / 100 = 114

For B: 110 × 105 / 100 = 115.5

These products are the implied value indices for A and B; the factor reversal test is satisfied when they equal the actual value indices computed from the data.

3. Unit Test or Unit Consistency


The unit test ensures that the index number is independent of the units of measurement of the variable being studied.
If the units are changed, the index number should not change.
Example: Let us assume the price of a commodity in 2020 was Rs 50 per unit, and in 2021 the price rose to Rs 60 per unit. The price index number is:

Price Index = (60 / 50) × 100 = 120

Now suppose the price is measured in dollars instead of rupees, with a price of $50 in 2020 and $60 in 2021. The price index number is still:

Price Index = (60 / 50) × 100 = 120
This shows that the index is not affected by the change in units, satisfying the unit consistency test.


4. Test of Aggregation or Circular Test


The aggregation test checks that the index number satisfies the requirement of being consistent when different groups
of commodities or data points are combined. The circular test ensures that the index number remains consistent
when calculated for different groups, either individually or in a cumulative manner.
Example: Let’s say we have price indices for two groups of goods, X and Y:

Price Index for X = 120, Price Index for Y = 110


If the overall index is calculated for both X and Y together, giving the two groups equal weights, we would compute:

Overall Index = (120 + 110) / 2 = 115

This index should be consistent even if computed for different combinations of X and Y.

5. Test of Homogeneity
The test of homogeneity ensures that an index number should be applicable to the entire data set or series, regardless
of the types of commodities or components being considered. It means that the variables included in the index should
share the same characteristics, making them compatible for comparison.
Example: If we calculate an index number for different commodities such as food, clothing, and transportation,
they should be comparable if they share similar characteristics, such as being part of the consumer basket. If, for
example, we include a highly volatile commodity like gold in the same index, it could distort the results, as it does
not have the same consumption pattern as food or clothing.

6. Time and Factor Reversal Test


This test combines both time reversal and factor reversal tests. It asserts that if the time direction is reversed
(comparing earlier period with later period) and all factors are scaled by the same constant, the index number
should produce a consistent result.
Example: Let’s assume we have a price index number for a commodity as:

Price Index from 2010 to 2015 = 130


Reversing the time direction (i.e., comparing 2015 with 2010):

Price Index from 2015 to 2010 = (100 / 130) × 100 ≈ 0.769 × 100 ≈ 77

The factor reversal test would ensure that a corresponding quantity index satisfies the relation:

Price Index × Quantity Index = Value Index


Thus, if the price index number is accurate, the time and factor reversal tests help to confirm its consistency.

7.11 Cost of Living Index Numbers (Consumer Price Index Numbers)


The cost of living index numbers measure the changes in the level of prices of commodities that directly affect
the cost of living for a specified group of persons in a specific place. Unlike general index numbers, they provide
insights into the cost of living for different classes of people across various locations.
Different classes of people consume different commodities, and consumption habits vary by individual, location,
and socio-economic class. For example, the cost of living for rickshaw pullers in Bhubaneswar differs from that of
rickshaw pullers in Kolkata. The consumer price index helps determine the effect of price fluctuations on various
classes of consumers in different areas.


Main Steps in Constructing Cost of Living Index Numbers


The following steps are essential for constructing a cost of living index number:

1. Decision about the class of people for whom the index is meant: It is important to decide the target
class of people, such as industrial workers, teachers, officers, or laborers. Additionally, the geographical area
(e.g., a city, industrial area, or locality) should also be specified.

2. Conducting family budget enquiry: After defining the scope, a sample family budget enquiry is conducted
for the target group. This involves selecting a sample of families and analyzing their budgets in detail during
a normal economic period. The enquiry provides information about the average expenditure on different
commodities, categorized as:
• Food
• Clothing
• Fuel and Lighting
• House Rent
• Miscellaneous

3. Collecting retail prices of different commodities: Retail prices are collected from local markets, super
bazaars, or departmental stores frequented by the target group. Since prices may vary by location, shop, and
individual, this step is both critical and challenging.

Uses of Cost of Living Index Numbers


Cost of living index numbers serve several purposes:

1. They indicate whether real wages are rising or falling, which helps in determining the purchasing power of money. The purchasing power of money can be calculated as:

   Purchasing Power of Money = 1 / Cost of Living Index Number

   Real wages can be computed as:

   Real Wages = (Money Wages / Cost of Living Index Number) × 100

2. They are used to regulate dearness allowance (D.A.) or grant bonuses to workers, enabling them to cope with
increased living costs.
3. They play a crucial role in wage negotiations.

4. They are used to analyze markets for specific goods.

7.12 Methods for Construction of Cost of Living Index Numbers


Cost of living index numbers can be constructed by the following methods:

1. Aggregate Expenditure Method (or Weighted Aggregative Method)

2. Family Budget Method (or Method of Weighted Relatives)


1. Aggregate Expenditure Method (Weighted Aggregative Method)


In this method, the quantities of commodities consumed by a particular group in the base year are taken as weights.
The formula for the consumer price index is:

Consumer Price Index = (Σ p1q0 / Σ p0q0) × 100

Steps:

1. Multiply the prices of the commodities in the current year by the quantities of the base year, and obtain the aggregate expenditure for the current year: Σ p1q0

2. Similarly, obtain the aggregate expenditure for the base year: Σ p0q0

3. Divide the aggregate expenditure of the current year by the aggregate expenditure of the base year and multiply the quotient by 100:

   Consumer Price Index = (Σ p1q0 / Σ p0q0) × 100

2. Family Budget Method (Method of Weighted Relatives)


In this method, the cost of living index is obtained by taking the weighted average of price relatives, where the weights are the values of quantities consumed in the base year, i.e., v = p0q0. The consumer price index number is given by:

Consumer Price Index = (Σ v (p1/p0) / Σ v) × 100

Explanation:
• For each item, the price relative is calculated as (p1/p0) × 100, where p1 is the price in the current year and p0 is the price in the base year.
• The weight v for each commodity is given by v = p0q0, the value of the commodity in the base year.
Note: the answer obtained by applying the Aggregate Expenditure Method and the Family Budget Method will be the same.
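This equivalence can be checked numerically. The following R sketch uses invented prices and base-year quantities for four commodities to compute the index by both methods:

# Hypothetical base-year (2006) and current-year (2007) data
p0 <- c(10, 20, 5, 8)     # base-year prices
p1 <- c(12, 22, 6, 10)    # current-year prices
q0 <- c(50, 10, 100, 20)  # base-year quantities

# (i) Aggregate expenditure method
cpi.agg <- sum(p1 * q0) / sum(p0 * q0) * 100

# (ii) Family budget method: weights v = p0*q0, price relatives P = (p1/p0)*100
v <- p0 * q0
P <- (p1 / p0) * 100
cpi.fb <- sum(v * P) / sum(v)

print(c(aggregate = cpi.agg, family.budget = cpi.fb))  # the two values are identical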

Example 1.
Construct the consumer price index number for 2007 on the basis of 2006 from the following data using (i) the
aggregate expenditure method, and (ii) the family budget method.


7.13 Possible Errors in Construction of Cost of Living Index Numbers


Cost of living index numbers, now more commonly called consumer price index (CPI) numbers, may be inaccurate for various reasons. The following are some of the common errors in their construction:

1. Inaccurate specification of groups: Errors may occur if the group for whom the index is meant is not
accurately specified.

2. Faulty selection of representative commodities: This can result from unscientific family budget inquiries,
leading to an unrepresentative selection of commodities.
3. Inadequate and unrepresentative price quotations: If price quotations are inadequate or unrepresenta-
tive, or if inaccurate weights are used, the index number may not reflect the true cost of living.

4. Frequent changes in demand and prices: Fluctuations in the demand and prices of commodities can affect
the reliability of the cost of living index.
5. The average family may not be representative: The average family used in the construction of the index
might not always be a truly representative sample of the target population.


Module - 3
Chapter - 2

8 Forecasting Strategies
Businesses rely on forecasts of sales to plan production, justify marketing decisions, and guide research. A very
efficient method of forecasting one variable is to find a related variable that leads it by one or more time intervals.
The closer the relationship and the longer the lead time, the better this strategy becomes. The trick is to find a
suitable lead variable.
An Australian example is the Building Approvals time series published by the Australian Bureau of
Statistics. This provides valuable information on the likely demand over the next few months for all sectors of the
building industry. The number of building approvals can be a leading indicator of future construction activities.
As the approval of new buildings generally precedes actual construction, businesses can forecast the demand for
construction materials and labor using this data. A variation on the strategy of seeking a leading variable is to find
a variable that is associated with the variable we need to forecast and is easier to predict. For instance, the sales of
winter clothing might be more directly correlated with the weather forecast than with past sales data, making it a
useful variable to predict future demand.

Second Approach: Using Past Sales Data


In many applications, we cannot rely on finding a suitable leading variable and have to try other methods. A second
approach, common in marketing, is to use information about the sales of similar products in the past. This is often
seen in the influential Bass Diffusion Model, which is used to forecast the adoption of new products. The Bass model
is based on the assumption that new product sales are driven by two factors:
• Innovation: The influence of early adopters.
• Imitation: The influence of word of mouth or social contagion.
For example, if a company is launching a new smartphone, they might use sales data from similar previous product
launches to predict the adoption rate of the new phone. The model helps in estimating how many people will adopt
the product over time and at what rate.

Third Approach: Extrapolation of Present Trends


A third strategy is to make extrapolations based on present trends continuing and to implement adaptive estimates
of these trends. In this method, businesses analyze historical data and make predictions based on the assumption
that current trends will continue into the future. This is often done using methods such as linear regression or moving
averages. For example, if the demand for a product has been increasing at a constant rate for the last five years, a
business may forecast that this trend will continue in the coming years.

Examples of Forecasting Strategies


• Leading Variable: A company in the automobile industry may use the data of vehicle registration trends in
a particular region as a leading variable to forecast future vehicle sales in that region.
• Past Sales Data: A company launching a new fashion product can use historical data of similar products to
predict the sales of the new product, as seen in the Bass diffusion model.
• Extrapolation: A retail store may observe that their sales increase by 5% each year. They may extrapolate
this trend to forecast next year’s sales.
The statistical technicalities of forecasting are covered throughout this book, and the purpose of this section is
to introduce the general strategies that are available for businesses to forecast their future needs.

8.1 Leading variables and associated variables


In forecasting, it is often useful to identify a leading variable, which is a variable that changes before another variable
and can be used to predict its future values. For example, economic indicators such as interest rates, stock market
trends, or consumer sentiment indices can act as leading variables for predicting future economic activity, including
GDP growth or inflation rates.
Associated variables, on the other hand, do not necessarily lead but are correlated with the variable of interest.
These variables provide additional information that can help in refining predictions. For instance, the weather can
be associated with the sales of certain products like clothing, outdoor equipment, or beverages. While weather does
not lead sales, it has a strong correlation and is valuable in forecasting demand.
The key to successful forecasting lies in the identification of such variables and understanding their relationships. If
a suitable leading variable is found, it can significantly improve the accuracy of forecasts.

8.2 Marine Coatings


A leading international marine paint company uses publicly available statistics to forecast the numbers, types, and
sizes of ships to be built over the next three years. One source of such information is the World Shipyard Monitor,
which provides brief details of orders from over 300 shipyards. By maintaining a database of ship types and sizes,
the company is able to forecast the areas to be painted and, consequently, the likely demand for paint.
The company closely monitors its market share and uses these forecasts for planning production and setting
prices. The effectiveness of this method depends on the accuracy of shipyard data and market trends, as well as the
company’s ability to accurately predict the demand for marine coatings.

8.3 Building Approvals Publication


In Australia, the Australian Bureau of Statistics publishes detailed data on building approvals for each month, along
with data on building activity in the Building Activity Publication. These data are valuable for understanding the
trends in construction and can be used for forecasting related industries, including construction materials and labor
demand.
The Building Approvals data includes the total number of dwellings approved per month, while the Building
Activity data provides the value of building work done in each quarter. These series are used to track the health of
the construction sector and to forecast future building activities.
To illustrate the use of this data, consider the following example in R, where time series objects are created and
plotted based on the data:
> www <- "http://www.massey.ac.nz/~pscowper/ts/ApprovActiv.dat"
> Build.dat <- read.table(www, header = TRUE); attach(Build.dat)
> App.ts <- ts(Approvals, start = c(1996, 1), freq = 4)
> Act.ts <- ts(Activity, start = c(1996, 1), freq = 4)
> ts.plot(App.ts, Act.ts, lty = c(1, 3))

The data from the file ApprovActiv.dat includes the following:

• Approvals: Total dwellings approved per month, averaged over the past three months.
• Activity: The value of building work done in millions of Australian dollars, chain volume measured at the
reference year 2004–05 prices.
The time series objects, App.ts and Act.ts, are created for approvals and activity, respectively. The ts.plot
function plots both series on the same graph, allowing for comparison and analysis of the trends over time.


Figure 16: Building approvals (solid line) and building activity (dotted line).

In Figure 16, we can see that the building activity tends to lag one quarter behind the building approvals, or
equivalently that the building approvals appear to lead the building activity by a quarter. The cross-correlation
function, which is abbreviated to ccf, can be used to quantify this relationship. A plot of the cross-correlation
function against lag is referred to as a cross-correlogram.

8.4 Cross-Correlation
Suppose we have time series models for variables x and y that are stationary in both mean and variance. These
variables may each be serially correlated, and correlated with each other at different time lags. The combined
model is second-order stationary if all these correlations depend only on the lag. In this case, we can define the
cross-covariance function (ccvf) as a function of the lag k:

γ_k(x, y) = E[(x_{t+k} − µ_x)(y_t − µ_y)]


This is not a symmetric relationship, and variable x is lagging variable y by k. For instance, if x is the input to some
physical system and y is the response, the cause precedes the effect, so y will lag x. The ccvf will be 0 for positive
k, and there will be spikes in the ccvf at negative lags. Some textbooks define the ccvf with y lagging when k is
positive, but we have used the definition consistent with R. Regardless of the definition, the relationship holds:

γk (x, y) = γ−k (y, x)


When we have several variables and wish to refer to the acvf of one rather than the ccvf of a pair, we can write it as, for example, γ_k(x, x).
The lag k cross-correlation function (ccf) is defined by:

ρ_k(x, y) = γ_k(x, y) / (σ_x σ_y)

where σ_x and σ_y are the standard deviations of x and y, respectively.
The ccvf and ccf can be estimated from a time series using their sample equivalents. The sample ccvf, c_k(x, y), is calculated as:

c_k(x, y) = (1/n) Σ_{t=1}^{n−k} (x_{t+k} − x̄)(y_t − ȳ)

The sample ccf is defined as:

r_k(x, y) = c_k(x, y) / √(c_0(x, x) c_0(y, y))
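These sample formulas match what R's ccf function computes. As a check, the sketch below simulates two series in which x lags y by two time units, evaluates c_k and r_k by hand at k = 2, and compares the result with the ccf output (the variable names are illustrative):

set.seed(1)
n <- 200
x <- rnorm(n)
y <- c(x[3:n], rnorm(2)) + rnorm(n, sd = 0.5)  # y[t] is roughly x[t+2], so x lags y

k <- 2
ck <- sum((x[(1 + k):n] - mean(x)) * (y[1:(n - k)] - mean(y))) / n  # sample ccvf at lag k
c0x <- sum((x - mean(x))^2) / n                                     # sample acvf of x at lag 0
c0y <- sum((y - mean(y))^2) / n                                     # sample acvf of y at lag 0
rk <- ck / sqrt(c0x * c0y)                                          # sample ccf at lag k

rk
ccf(x, y, lag.max = 5, plot = FALSE)  # the value at lag 2 agrees with rk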


8.5 Cross-Correlation between Building Approvals and Activity


The ts.union function binds time series with a common frequency, padding with ’NA’s to the union of their time
coverages. If ts.union is used within the acf command, R returns the correlograms for the two variables and the
cross-correlograms in a single figure.
acf(ts.union(App.ts, Act.ts))

Figure 17: Correlogram and cross-correlogram for building approvals and building activity.

In Figure 17, the autocorrelations for x and y are in the upper left and lower right frames, respectively, and the
cross-correlations are in the lower left and upper right frames. The time unit for the lag is one year, so a correlation
at a lag of one quarter appears at 0.25. If the variables are independent, we would expect 5% of sample correlations
to lie outside the dashed lines. Several of the cross-correlations at negative lags pass these lines, indicating that the
approvals time series is leading the activity. Numerical values can be printed using the print() function, and are as
follows at lags of 0, 1, 2, and 3:

0.432, 0.494, 0.499, 0.458


Figure 18: Decomposition of the building approvals series using a centred moving average (decompose).

The ccf can be calculated for any two time series that overlap, but if both have trends or similar seasonal effects,
these will dominate (Exercise 1). It is often necessary to remove the trend and seasonal effects before investigating
cross-correlations. Here, we use the decompose function, which uses a centered moving average of four quarters (see
Fig. 18).
We perform the decomposition for both series as follows:
app.ran <- decompose(App.ts)$random
app.ran.ts <- window(app.ran, start = c(1996, 3))
act.ran <- decompose(Act.ts)$random
act.ran.ts <- window(act.ran, start = c(1996, 3))

acf(ts.union(app.ran.ts, act.ran.ts))
ccf(app.ran.ts, act.ran.ts)

We can print the autocorrelation and cross-correlation functions as follows:


print(acf(ts.union(app.ran.ts, act.ran.ts)))

The output will display the autocorrelations for the approvals and activity series as well as the cross-correlations
between the two series. A sample output is:

Lag (in years)   Autocorrelation (Approvals)   Autocorrelation (Activity)
0.00             1.000                         0.123
0.25             0.422                         0.258
0.50             −0.328                        −0.410
0.75             0.510                         −0.250
1.00             −0.461                        0.071
1.25             −0.400                        0.353
1.50             −0.193                        0.180


Figure 19: Cross-correlogram of the random components of building approvals and building activity after using
decompose

The ccf function produces a single plot, shown in Figure 19, illustrating the lagged relationship between the two
time series. The Australian Bureau of Statistics publishes building approvals data by state and other categories, and
specific sectors of the building industry may find higher correlations between demand for their products and one of
these series.

8.6 Bass Model


The Bass model, introduced by Frank Bass in 1969, quantifies the adoption and diffusion of a new product in society.
This model has been influential in marketing and is often used by entrepreneurs to justify their funding requirements.
It is also widely applied in market research, such as by the Marketing Science Centre at the University of South
Australia, which became the Ehrenberg-Bass Institute for Marketing Science in 2005.

8.6.1 Model Definition


The Bass model describes the number of people, N_t, who have adopted a product at time t. It depends on three parameters:

• m: the total number of people who will eventually buy the product,
• p: the coefficient of innovation (the rate at which innovators adopt the product),
• q: the coefficient of imitation (the rate at which adopters influence others).

The Bass formula is expressed as a difference equation:

N_{t+1} = N_t + p(m − N_t) + q (N_t/m)(m − N_t)

This equation states that the increase in adopters over the next period is the sum of two components:

• a fixed proportion p of the m − N_t people who have not yet bought the product,
• a time-varying proportion q N_t/m of those same people, who are influenced by existing adopters.

The underlying assumption is that early adopters are driven by the novelty of the product, while later adopters are influenced by others' usage of the product.
The solution to this difference equation is:

N_t = m (1 − e^{−(p+q)t}) / (1 + (q/p) e^{−(p+q)t})

This is the discrete-time form of the model. A continuous-time version also exists, which is easier to verify mathematically.
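As an illustrative check with made-up parameter values, the R sketch below iterates the difference equation from N_0 = 0 and compares it with the closed-form curve; the two agree closely, with small differences because the closed form comes from the continuous-time version:

m <- 1000; p <- 0.03; q <- 0.38   # illustrative parameter values
steps <- 15
N <- numeric(steps + 1)           # N[1] holds N_0 = 0
for (t in 1:steps) {
  N[t + 1] <- N[t] + p * (m - N[t]) + q * (N[t] / m) * (m - N[t])
}

tt <- 0:steps
Nc <- m * (1 - exp(-(p + q) * tt)) / (1 + (q / p) * exp(-(p + q) * tt))
print(round(cbind(t = tt, recursion = N, closed.form = Nc)))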


8.6.2 Interpretation of the Bass Model


The Bass model can be interpreted in terms of the probability distribution of time until purchase. This distribution
is parameterized by p and q. Let f (t), F (t), and h(t) denote the probability density function (pdf), the cumulative
distribution function (cdf), and the hazard function, respectively. The hazard function represents the probability
that a random individual who has not yet made a purchase will do so in the next small time interval.
The hazard function is defined as:

h(t) = f(t) / (1 − F(t))
The hazard function in the Bass model is given by:

h(t) = p + qF (t)
This shows that the hazard depends on the cumulative proportion of people who have adopted the product by
time t. The cumulative distribution function F (t) can be expressed as:

F(t) = (1 − e^{−(p+q)t}) / (1 + (q/p) e^{−(p+q)t})

Two special cases of the distribution occur when q = 0 (exponential distribution) and p = 0 (logistic distribution).
The logistic distribution resembles the normal distribution.
The probability density function is the derivative of the cumulative distribution function:

f(t) = ((p + q)² / p) · e^{−(p+q)t} / (1 + (q/p) e^{−(p+q)t})²

The sales per unit time at time t are given by:

S(t) = m f(t) = (m (p + q)² / p) · e^{−(p+q)t} / (1 + (q/p) e^{−(p+q)t})²

The time to peak sales, t_peak, occurs when the sales rate is maximized. It is given by:

t_peak = (log(q) − log(p)) / (p + q)
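For example, using the parameter values estimated in the VCR example that follows (p ≈ 0.0066, q ≈ 0.64), the peak-sales time can be computed in one line of R:

p <- 0.0066; q <- 0.64
(log(q) - log(p)) / (p + q)  # about 7.1 years from 1979, i.e. around 1986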

8.6.3 Example
In this example, we fit the Bass model to the yearly sales of VCRs in the US home market between 1980 and 1989
using the R non-linear least squares function nls. The variable T79 represents the year from 1979, while Tdelt
denotes the time from 1979 at a finer resolution (0.1 year) for plotting the Bass curves. The cumulative sum function
cumsum is useful for monitoring changes in the mean level of the process.
The sales data and the cumulative sales are given by:

Sales = {840, 1470, 2110, 4000, 7590, 10950, 10530, 9470, 7790, 5890}
Cumulative Sales = cumsum(Sales)
We fit the Bass model using the following R code:
T79 <- 1:10
Tdelt <- (1:100) / 10
Sales <- c(840, 1470, 2110, 4000, 7590, 10950, 10530, 9470, 7790, 5890)
Cusales <- cumsum(Sales)
Bass.nls <- nls(Sales ~ M * (((P + Q)^2 / P) * exp(-(P + Q) * T79)) /
                  (1 + (Q / P) * exp(-(P + Q) * T79))^2,
                start = list(M = 60630, P = 0.03, Q = 0.38))
summary(Bass.nls)


The results from the nls function are summarized as follows:

Parameter   Estimate   Std. Error
M           68000      3128
P           0.0066     0.0014
Q           0.64       0.0414
The final estimates for the parameters are m = 68000, p = 0.0066, and q = 0.64. These values were obtained by
setting starting values for p and q based on typical product parameters and estimating m from the total recorded
sales.

8.6.4 Fitting the Bass Curve


The Bass curve is fitted to the sales data, and the estimated parameters are used to plot the predicted sales per year
and cumulative sales.
Bcoef <- coef(Bass.nls)
m <- Bcoef[1]
p <- Bcoef[2]
q <- Bcoef[3]
ngete <- exp(-(p + q) * Tdelt)
Bpdf <- m * ((p + q)^2 / p) * ngete / (1 + (q / p) * ngete)^2

# Plotting the sales per year
plot(Tdelt, Bpdf, xlab = "Year from 1979", ylab = "Sales per year", type = "l")
points(T79, Sales)

# Plotting the cumulative sales
Bcdf <- m * (1 - ngete) / (1 + (q / p) * ngete)
plot(Tdelt, Bcdf, xlab = "Year from 1979", ylab = "Cumulative sales", type = "l")
points(T79, Cusales)

Figure 20: Bass sales curve fitted to sales of VCRs in the US home market, 1980–1989.

This generates two plots: the first showing the predicted sales per year (Figure 20) and the second showing the cumulative sales (Figure 21).


Figure 21: Bass cumulative sales curve, obtained as the integral of the sales curve, and cumulative sales of VCRs in
the US home market, 1980–1989.

8.6.5 Parameter Ranges for Different Products


While fitting the Bass model to sales data is straightforward, forecasting with the model requires plausible values
for the parameters m, p, and q. Typical parameter values for different products are shown below:

Product                             m              p       q
Typical product                     —              0.030   0.380
35 mm projectors (1965–1986)        3.37 × 10^6    0.009   0.173
Overhead projectors (1960–1970)     0.961 × 10^6   0.028   0.311
PCs (1981–2010)                     3.384 × 10^9   0.001   0.195

Table 11: Typical values for m, p, and q based on historical products.

Although forecasts based on the Bass model are inherently uncertain, they offer the best available information for
marketing and investment decisions. Scenarios can be developed based on the most likely, optimistic, and pessimistic
sets of parameters.

8.6.6 Extensions and Refinements


The basic Bass model does not account for replacement sales or multiple purchases. Extensions of the model that
include these factors, along with the effects of pricing and advertising, have been proposed. However, these refinements
may be less relevant to investors, as the focus is typically on a quick return on investment. Successful inventions
are often superseded by newer technologies, and replacement sales are limited once patent protection expires and
competitors enter the market.

8.7 Exponential Smoothing and Holt-Winters method


Exponential smoothing and the Holt-Winters method are popular forecasting techniques used for time series data,
especially when the data shows trends and seasonality. These methods are based on the concept of smoothing, where
future values are forecasted based on a weighted average of past observations, with weights decaying exponentially
as we move back in time.

8.7.1 Exponential Smoothing


Exponential smoothing is a time series forecasting method that applies weights to past observations, giving exponentially more weight to recent observations. The simplest form of exponential smoothing is single exponential smoothing, suitable for data without trends or seasonality.


Single Exponential Smoothing


In single exponential smoothing, the forecast for the next period ŷt+1 is given by:

ŷt+1 = αyt + (1 − α)ŷt


Where:

• ŷt+1 is the forecast for the next time period.


• yt is the actual value at time t.
• ŷt is the forecast for the current time period.
• α is the smoothing constant, where 0 < α < 1.

8.7.2 R Code for Single Exponential Smoothing


Here’s how to apply single exponential smoothing in R using the HoltWinters() function with a smoothing param-
eter:
# R code for single exponential smoothing
data <- c(50, 51, 52, 53, 55, 56, 57, 58, 60, 61)        # Example data
model <- HoltWinters(data, beta = FALSE, gamma = FALSE)  # Exponential smoothing (alpha only)
forecast <- predict(model, n.ahead = 3)                  # Forecast for the next 3 periods
print(forecast)

The HoltWinters() function performs single exponential smoothing when beta=FALSE and gamma=FALSE, since these settings disable the trend and seasonality components.

8.8 Holt-Winters Method


The Holt-Winters method extends exponential smoothing to handle time series data with trends and seasonality. It
is a more advanced method consisting of three components:
1. Level (smoothing of the series)
2. Trend (smoothing of the trend component)

3. Seasonality (smoothing of the seasonal component)


The forecast equation in the Holt-Winters method is:

ŷt+1 = (Lt + Tt ) + ϕSt+1


Where:
• Lt is the level component.

• Tt is the trend component.


• St+1 is the seasonal component.
• ϕ is a seasonal factor (for additive or multiplicative seasonality).


8.8.1 Holt-Winters Additive Model


In the additive model, the component update equations are:

Lt = αyt + (1 − α)(Lt−1 + Tt−1 )


Tt = β(Lt − Lt−1 ) + (1 − β)Tt−1
St = γ(yt − Lt ) + (1 − γ)St−p
Where:
• α, β, and γ are smoothing parameters.
• p is the period length for the seasonality.

8.8.2 Holt-Winters Multiplicative Model


For multiplicative seasonality, the equations change to:

L_t = α (y_t / S_{t−p}) + (1 − α)(L_{t−1} + T_{t−1})
T_t = β (L_t − L_{t−1}) + (1 − β) T_{t−1}
S_t = γ (y_t / L_t) + (1 − γ) S_{t−p}
The multiplicative model is used when the seasonal fluctuations are proportional to the level of the time series.

8.8.3 R Code for Holt-Winters Method


The HoltWinters() function in R can be used to apply the Holt-Winters method for both additive and multiplicative
seasonality.
# Holt-Winters method in R; seasonal fitting needs a ts object with frequency > 1
data <- ts(c(100, 120, 130, 150, 170, 180, 200, 210, 220, 230, 250, 260),
           frequency = 4)                                    # Example data, treated as quarterly
model_additive <- HoltWinters(data, seasonal = "additive")   # Additive seasonality
forecast_additive <- predict(model_additive, n.ahead = 3)    # Forecast next 3 periods
print(forecast_additive)

model_multiplicative <- HoltWinters(data, seasonal = "multiplicative")  # Multiplicative seasonality
forecast_multiplicative <- predict(model_multiplicative, n.ahead = 3)   # Forecast next 3 periods
print(forecast_multiplicative)

In this code, seasonal = "additive" is used when the seasonal component is additive, and seasonal = "multiplicative" when the seasonal component is multiplicative.

8.9 Plotting and Visualizing Forecasts


You can also visualize the forecasts alongside the original data for better insights. Here is an example of how to plot
the forecasts:
# Plot the original data, the fitted values, and the forecasts
plot(data, main = "Sales Forecasting with Holt-Winters Method",
     xlab = "Time", ylab = "Sales",
     xlim = c(start(data)[1], end(data)[1] + 1))                # leave room for the forecast
lines(fitted(model_additive)[, "xhat"], col = "blue", lty = 2)  # fitted series (xhat column)
lines(forecast_additive, col = "red", lty = 1)
legend("topleft",
       legend = c("Original Data", "Fitted Model (Additive)", "Forecast (Additive)"),
       col = c("black", "blue", "red"), lty = c(1, 2, 1))

This will plot the original data (black), the fitted values for the additive model (blue), and the forecasted values
for the next 3 periods (red).


8.10 Choosing the Right Model


The choice between exponential smoothing and the Holt-Winters method depends on the characteristics of your data:
• Exponential smoothing is suitable when there is no trend or seasonality in the data.

• Holt-Winters method is appropriate when there is a trend or seasonal component in the data. Use the
additive model when the seasonal variations are roughly constant, and the multiplicative model when they are
proportional to the level of the series.


Module - 4
Chapter - 1

9 Basic Stochastic Models


9.1 White Noise, Random Walks, Fitted Models & Diagnostic Plots
9.2 Autoregressive Models
9.2.1 Stationary and Non-stationary Autoregressive Processes


Module - 4
Chapter - 2

10 Time series Regression and Exploratory Data Analysis


10.1 Classical Regression
10.2 Exploratory Data Analysis
10.3 Generalized Least Squares Method
10.4 Linear Models with Seasonal Variables
10.5 Harmonic Seasonal Models
10.6 Logarithmic Transforms


Module - 5
Chapter - 1

11 Linear Models
11.1 Moving Average Models
11.2 Fitted MA Models
11.2.1 Autoregressive Moving Average Models

11.3 Differential Equations


11.4 Autocorrelation and Partial Correlation
11.5 Forecasting & Estimation
11.6 Non-stationary Models
11.6.1 Building Non-seasonal ARIMA Models
11.6.2 ARCH Models & GARCH Models
