Time_Series_Analysis (3)
Lecture Review
Contents
4 Correlation
4.1 Expectation and the ensemble
4.1.1 The Ensemble and Stationarity
4.1.2 Ergodic Series
4.2 Variance function
4.2.1 Autocorrelation
4.3 Correlogram, covariance of sum of random variables
4.3.1 General discussion
4.3.2 Example based on air passenger series
2 Time Series Analysis, Lecture Review, Dr. Kalyan N & Prof. Sangeetha S
Time Series Analysis
5 Seasonal Variation
5.1 Method of Simple Averages
5.2 Ratio-to-Trend Method
5.3 Ratio-to-Moving Average Method and Link Relative Method
5.4 Link relative method
5.5 Cyclical and Random Fluctuations
5.5.1 Example of Cyclical Fluctuations
5.6 Random Fluctuations
5.6.1 Example of Random Fluctuations
5.6.2 Deseasonalisation
5.7 Variate Difference Methods
5.7.1 Example 1: Monthly Sales Data
5.7.2 Example 2: Daily Temperature Records
5.7.3 Conclusion
Module - 1
Chapter - 1
A time series is a sequence of statistical data organized according to the time of occurrence or in chronological order.
The numerical data collected at various points in time, forming a set of observations, is referred to as a time series. In
time series analysis, current data within a series can be compared with past data from the same series. Additionally,
the progression of two or more series over time can be compared. These comparisons can provide valuable insights
for individual businesses. Time series analysis is crucial in fields such as economics, statistics, and commerce.
Symbolically, if $t$ represents time and $y_t$ denotes the value at time $t$, then the paired values $(t, y_t)$ constitute the time
series data.
Ex 1: Production of rice in Karnataka for the period from 2010-11 to 2016-17.
1.1 Purpose
Time series analysis is crucial for understanding historical data and forecasting future trends, which aids managers
and policymakers in making informed decisions. By quantifying key features and random variations in data, time
series methods have become widely applicable across government, industry, and commerce, especially with advances
in computing power. The Kyoto Protocol, an amendment to the United Nations Framework Convention on Climate
Change, was signed in December 1997 and came into effect on February 16, 2005. The rationale for reducing
greenhouse gas emissions involves a blend of scientific data, economic considerations, and time series analysis. The
decisions made in the coming years will have significant implications for the planet’s future.
In 2006, Singapore Airlines expanded its fleet by ordering twenty Boeing 787-9s and expressing intent to purchase
twenty-nine Airbus planes, including twenty A350s and nine A380s (superjumbos). This expansion was guided by
time series analysis of passenger trends and strategic corporate planning to maintain or enhance market share. Time
series methods are also employed in everyday operational decisions. For instance, UK gas suppliers must place orders
for offshore gas one day in advance. The variation from the seasonal average is influenced by temperature and, to
a lesser extent, wind speed. Time series analysis helps forecast demand by adjusting the seasonal average with
one-day-ahead weather forecasts.
Additionally, time series models underpin many computer simulations. Examples include evaluating inventory control
strategies using simulated demand series, comparing wave power device designs with simulated sea states, and
simulating daily rainfall to assess the long-term environmental impacts of proposed water management policies.
Time series data often exhibit trends and seasonal variations that can be modeled mathematically. Additionally,
observations close in time are typically correlated. Time series analysis aims to explain this correlation and other data
features using statistical models. Once a model is fitted, it can be used to forecast future values, conduct statistical
tests, and summarize the main characteristics of the data, aiding decision-making.
Sampling intervals affect data quality. Aggregated data, like daily tourist arrivals, and sampled data, such as daily
stock prices, need intervals short enough to reflect the original signal accurately. In high-frequency trading or signal
processing, continuous signals are sampled at very high rates to create time series for detailed analysis.
1.2.2 Plots
Visualizing time series data is crucial for identifying patterns and trends. Common types of plots include:
• Line Plot: Displays data points connected by lines to show changes over time. Useful for identifying trends
and seasonal patterns.
• Scatter Plot: Plots individual data points to observe the relationship between two variables or to identify
patterns and outliers.
• Bar Plot: Represents data with bars, helpful for comparing discrete time periods or categories.
• Histogram: Shows the distribution of data over specified intervals, useful for understanding the frequency of
values.
• Box Plot: Displays the distribution of data based on quartiles, highlighting the median, the spread, and potential outliers.
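The plot types listed above can be sketched in base R as follows. This is a minimal illustration that assumes the built-in AirPassengers series as example data; the 2 x 2 panel layout and the lag-1 scatter plot are illustrative choices, not part of the lecture.

```r
# AirPassengers ships with base R (datasets package); used here as example data
x <- AirPassengers
v <- as.numeric(x)

op <- par(mfrow = c(2, 2))                       # four panels on one page
plot(x, main = "Line plot", ylab = "Passengers") # trend and seasonality over time
plot(v[-length(v)], v[-1], main = "Lag-1 scatter plot",
     xlab = "x[t-1]", ylab = "x[t]")             # relation between neighbouring values
hist(v, main = "Histogram", xlab = "Passengers") # distribution of values
boxplot(x ~ cycle(x), main = "Box plot by month",
        xlab = "Month", ylab = "Passengers")     # quartiles per calendar month
par(op)                                          # restore graphics settings
```

Grouping the box plot by `cycle(x)` gives one box per calendar month, which makes a seasonal pattern easy to spot.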
1.2.3 Trends
Trends refer to the long-term movement or direction in the data over a period. Identifying trends helps in
understanding the overall pattern:
• Stationary Trend: The data fluctuates around a constant mean without a long-term trend.
• Seasonal Variations: Regular patterns that repeat at consistent intervals, such as monthly or quarterly.
• Cyclical Variations: Fluctuations that occur over longer periods, influenced by economic or business cycles.
• Irregular Variations: Unpredictable changes due to unforeseen events or anomalies that do not follow a
pattern.
Understanding these components allows for effective analysis and forecasting of time series data.
1.3.2 Models
Many time series are dominated by trend and/or seasonal effects. A simple additive decomposition model is given
by:
$$x_t = m_t + s_t + z_t \tag{2}$$
where $x_t$ is the observed series, $m_t$ is the trend, $s_t$ is the seasonal effect, and $z_t$ is the error term, often a sequence of
correlated random variables with mean zero. Two main approaches for extracting $m_t$ and $s_t$ will be outlined along
with R functions for this.
For cases where the seasonal effect increases with the trend, a multiplicative model may be more suitable:
$$x_t = m_t \cdot s_t + z_t \tag{3}$$
If the random variation is also multiplicative and the series is positive, an additive model can instead be fitted to the
logarithm of the series:
$$\log(x_t) = m_t + s_t + z_t \tag{4}$$
Care is needed when transforming back to $x_t$ from $\log(x_t)$ to avoid bias. If $z_t$ is normally distributed with mean
0 and variance $\sigma^2$, the predicted mean value is:
$$\hat{x}_t = e^{m_t + s_t + \frac{1}{2}\sigma^2} \tag{5}$$
For non-normal distributions, bias correction may lead to overcorrection, requiring an empirical adjustment. This is
critical, for instance, in financial forecasts, where underestimating mean costs is a common issue.
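The size of the bias correction can be checked numerically. The sketch below uses hypothetical values for $m_t + s_t$ and $\sigma$ (neither comes from the lecture) and compares the naive back-transform with the corrected mean, using numerical integration over the normal density as an independent check.

```r
# Hypothetical values, chosen only for illustration
m_plus_s <- 2     # m_t + s_t on the log scale
sigma    <- 0.5   # standard deviation of z_t

naive     <- exp(m_plus_s)                 # ignores the variance of z_t
corrected <- exp(m_plus_s + sigma^2 / 2)   # mean of a lognormal variable

# Independent check: E[exp(m_t + s_t + z_t)] with z_t ~ N(0, sigma^2)
numeric_mean <- integrate(
  function(z) exp(m_plus_s + z) * dnorm(z, sd = sigma),
  lower = -Inf, upper = Inf
)$value

print(c(naive = naive, corrected = corrected, numeric = numeric_mean))
```

The naive value is smaller than the corrected one, which is why simply exponentiating a forecast made on the log scale tends to underestimate the mean.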
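For monthly data, the trend at time $t$ can be estimated with a centred moving average:
$$\hat{m}_t = \frac{\tfrac{1}{2}x_{t-6} + x_{t-5} + \dots + x_{t-1} + x_t + x_{t+1} + \dots + x_{t+5} + \tfrac{1}{2}x_{t+6}}{12} \tag{6}$$
where $t = 7, \dots, n - 6$. The coefficients sum to 1, and each calendar month receives the same total weight. This method
generalizes to other seasonal frequencies (e.g., quarterly) by maintaining the condition that the coefficients sum to unity.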
The seasonal effect $\hat{s}_t$ can be estimated by subtracting the trend:
$$\hat{s}_t = x_t - \hat{m}_t \tag{7}$$
Averaging the monthly estimates across all years provides a single estimate of the effect for each month. To ensure
the seasonal effects sum to zero, they are adjusted by subtracting the mean. For multiplicative models, the estimate
becomes:
$$\hat{s}_t = \frac{x_t}{\hat{m}_t} \tag{8}$$
and multiplicative factors are adjusted to average to 1. Seasonally adjusted data, often used in economic indicators,
have the seasonal effects removed. If the seasonal effect is additive, the adjusted series is $x_t - \bar{s}_t$, and if multiplicative, it is
$x_t / \bar{s}_t$, where $\bar{s}_t$ is the mean seasonal adjustment for the given time.
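The additive estimates described above can be reproduced by hand in R. The following is a sketch that assumes monthly data and uses the built-in AirPassengers series as a stand-in; the variable names (`m_hat`, `s_hat`, `adjusted`) are illustrative.

```r
x <- AirPassengers                     # monthly example series (stand-in data)

# Centred 12-month moving average: half weight on the two end points
w     <- c(0.5, rep(1, 11), 0.5) / 12
m_hat <- stats::filter(x, filter = w, sides = 2)   # NA for the first and last 6 months

# Additive seasonal effect: detrend, average by calendar month, centre on zero
detrended <- as.numeric(x - m_hat)
month     <- as.integer(cycle(x))
s_raw     <- tapply(detrended, month, mean, na.rm = TRUE)
s_hat     <- as.numeric(s_raw - mean(s_raw))       # 12 effects summing to zero

# Seasonally adjusted series
adjusted <- x - s_hat[month]
```

Under these assumptions the results agree with `decompose(x)$trend` and `decompose(x)$figure`, which follow the same moving-average recipe.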
1.3.4 Smoothing
The centred moving average is a smoothing procedure applied retrospectively to identify an underlying trend in a
time series. It uses points before and after the target time, often leaving some missing values at the series' start
and end unless adapted for edge points.
Another smoothing method in R is `stl`, which uses locally weighted regression (loess). This local regression considers
a small number of points around the target time, weighted to reduce the influence of outliers, making it a robust
regression. While straightforward in principle, the details of `stl` are complex.
Unlike smoothing, which does not provide a forecast model, fitting a linear trend has the advantage of enabling
extrapolation. The term "filtering" is also used in this context, particularly in engineering, to describe obtaining the
best estimate of a variable based on past and current noisy measurements. Filtering is vital in control algorithms,
such as those used by the Huygens probe during its 2005 landing on Titan.
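A minimal `stl` sketch follows, assuming the built-in AirPassengers series and a log transform so that the additive loess decomposition suits a series whose seasonal swings grow with the level. The choice `s.window = "periodic"` (a seasonal pattern that is constant from year to year) is an assumption made for simplicity.

```r
# stl() fits an additive decomposition, so work on the log scale here
log_x <- log(AirPassengers)
fit   <- stl(log_x, s.window = "periodic")

parts <- fit$time.series     # columns: seasonal, trend, remainder
head(parts)

# The three components add back up to the (log) series
rebuilt <- rowSums(parts)

plot(fit)                    # data, seasonal, trend and remainder panels
```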
1.3.5 Decomposition in R
In R, the function `decompose` estimates trends and seasonal effects using a moving average. Nesting it within
`plot` (e.g., `plot(decompose(x))`) produces a figure showing the original series $x_t$ and the decomposed series $m_t$, $s_t$,
and $z_t$; the output of `stl` can be plotted the same way. For example, additive and multiplicative decomposition plots
can be produced for the electricity data, with the seasonal effect superimposed on the trend and `lty` used to
distinguish the line types.
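The electricity series itself is not reproduced in these notes, so the sketch below uses the built-in AirPassengers series as a stand-in to show the same idea: decompose with a multiplicative model, then draw the trend with the seasonal effect superimposed.

```r
x <- AirPassengers                     # stand-in for the electricity series

plot(decompose(x))                     # additive decomposition (default)

dec <- decompose(x, type = "multiplicative")
plot(dec)                              # multiplicative decomposition

trend    <- dec$trend
seasonal <- dec$seasonal

# Solid line: trend; dashed line: trend multiplied by the seasonal factor
ts.plot(cbind(trend, trend * seasonal), lty = 1:2, ylab = "Passengers")
```

In the multiplicative case the product of the trend, seasonal and random components returns the original series wherever the trend is defined.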
Figure 1: Electricity production data: trend with superimposed multiplicative seasonal effects.
A multiplicative model is often more suitable than an additive one when the variance of the series increases with
the trend over time. However, if the random component $z_t$ also shows increasing variance, a log-transformation (Eq. 4)
may be more appropriate.
The random series returned by `decompose` is not a true realisation of $z_t$. It is an estimate obtained by removing the
estimated trend and seasonal components from the data, that is, a residual error series. In practice it is nevertheless
treated as a realisation of the random process.
Module - 1
Chapter - 2
In R, we can visualize a simple time series using the AirPassengers dataset:
    # Example in R: plot the monthly airline passenger series
    data("AirPassengers")
    plot(AirPassengers, main = "AirPassengers Dataset",
         ylab = "Number of Passengers", xlab = "Year")
Listing 2: Example in R
• Seasonal Component (S): Regular fluctuations that repeat over a specific period, such as monthly sales
peaking every December.
• Cyclic Component (C): Recurrent but non-periodic fluctuations often linked to economic cycles.
or
$$Y(t) = T(t) \times S(t) \times C(t) \times I(t) \tag{10}$$
In R, we can decompose a time series to analyze these components:
    # Decomposition example in R (additive model by default)
    decomposed <- decompose(AirPassengers)
    plot(decomposed)
Listing 3: Decomposition Example in R
Advantages of InfluxDB:
– High Performance: InfluxDB is optimized for fast writes, making it ideal for use cases where data is
collected frequently, such as IoT applications or system monitoring. Its write throughput is among the
best in class for time series databases.
– Retention Policies: Users can define retention policies to automatically expire older data, thus managing
storage costs efficiently. This is particularly useful in environments where data grows exponentially over
time.
– Schema-Free: InfluxDB is schemaless, meaning that data can be written with any fields and tags, making
it flexible to adapt to new use cases and metrics without predefined structures.
– Integrations: InfluxDB integrates easily with other tools such as Grafana for visualization, Telegraf for
data collection, and Kapacitor for alerting and data processing.
Cons of InfluxDB:
– Query Complexity: While InfluxQL is simple for basic queries, more complex queries involving joins
or transformations might be challenging. Flux, the newer query language, addresses these issues but
introduces a learning curve.
– Scaling Issues: Scaling InfluxDB horizontally (i.e., across multiple nodes) can be challenging. The
enterprise version of InfluxDB offers clustering, but the open-source version does not, making scaling
limited for high-availability deployments.
– Storage Costs: Although InfluxDB offers compression, the storage requirements for high-frequency data
can still be substantial, especially in long-term retention scenarios.
Advantages of TimescaleDB:
– PostgreSQL Ecosystem: Since TimescaleDB is an extension of PostgreSQL, it benefits from the stability,
reliability, and community support of PostgreSQL. This includes support for advanced indexing,
relational joins, transactions, and other features common to relational databases.
– SQL Support: TimescaleDB supports standard SQL queries, making it easy for developers familiar with
SQL to work with time series data. This reduces the learning curve compared to other TSDBs that use
custom query languages.
– Efficient Time Series Storage: TimescaleDB automatically partitions data into chunks based on time
intervals, which improves query performance. It also supports data compression, making it highly efficient
for storing large datasets.
– Scalability: TimescaleDB provides built-in tools for scaling horizontally, allowing it to handle large time
series datasets across distributed environments.
Cons of TimescaleDB:
– Limited for Extreme Real-Time Use Cases: While TimescaleDB performs well for most time series
applications, it may not be as optimized for extreme high-frequency, real-time applications as InfluxDB
or Prometheus.
– Complexity with Large Joins: Although relational joins are a strength of TimescaleDB, performing
large-scale joins on massive datasets can lead to performance issues, particularly for real-time queries.
– Enterprise Features: Some advanced features, like continuous aggregation and advanced compression,
are part of TimescaleDB’s enterprise offering, which can be a limitation for users relying only on the
open-source version.
• Prometheus: Prometheus is a highly popular, open-source monitoring and alerting toolkit designed specifically
for cloud-native environments. It was originally developed at SoundCloud, later became a Cloud Native Computing
Foundation project, and is often used in conjunction with Kubernetes for monitoring application performance,
infrastructure metrics, and other system behaviors.
Prometheus works by scraping metrics from instrumented services at regular intervals, storing them as time
series data. It supports multi-dimensional data collection using labels, which are key-value pairs attached to
the metrics. Prometheus uses its own query language called PromQL (Prometheus Query Language), which
is specifically designed for aggregating and filtering time series data. Its alerting mechanism is flexible and
integrates easily with various notification systems like PagerDuty, Slack, and email.
One of the primary use cases for Prometheus is in monitoring cloud infrastructure, where it excels at tracking
the performance of servers, containers, and microservices. The system is designed to be lightweight and works
well in environments where quick real-time insights and monitoring are critical.
Prometheus uses PromQL for querying, and data is scraped from instrumented services.
    # An example of Prometheus scrape configuration
    scrape_configs:
      - job_name: 'node_exporter'
        static_configs:
          - targets: ['localhost:9100']
Listing 8: Scraping Time Series Data in Prometheus
Advantages of Prometheus:
$X = \{120, 122, 119, 118, 121, 123, 124, 125, 126, 128\}$.
Step 1: Compute the mean of the series. First, compute the mean $\bar{X}$ of the series:
$$\bar{X} = \frac{120 + 122 + 119 + 118 + 121 + 123 + 124 + 125 + 126 + 128}{10} = \frac{1226}{10} = 122.6.$$
Step 2: Define the autocorrelation formula. The formula for autocorrelation at lag $k$ is:
$$\rho(k) = \frac{\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2},$$
where:
• $X_t$ is the value at time $t$,
• $\bar{X}$ is the mean of the time series,
• $k$ is the lag,
• $n$ is the total number of observations.
Step 3: Calculate the denominator for all lags. The denominator for both lag 1 and lag 2 is the same:
$$\sum_{t=1}^{n} (X_t - \bar{X})^2 = (120 - 122.6)^2 + (122 - 122.6)^2 + (119 - 122.6)^2 + \dots + (128 - 122.6)^2$$
$$= (-2.6)^2 + (-0.6)^2 + (-3.6)^2 + (-4.6)^2 + (-1.6)^2 + (0.4)^2 + (1.4)^2 + (2.4)^2 + (3.4)^2 + (5.4)^2$$
$$= 6.76 + 0.36 + 12.96 + 21.16 + 2.56 + 0.16 + 1.96 + 5.76 + 11.56 + 29.16 = 92.4.$$
Thus, the denominator is 92.4.
Step 4: Calculate the numerator for lag 1. Now, compute the numerator for lag 1, which is:
$$\sum_{t=1}^{n-1} (X_t - \bar{X})(X_{t+1} - \bar{X}) = (120 - 122.6)(122 - 122.6) + (122 - 122.6)(119 - 122.6) + \dots + (126 - 122.6)(128 - 122.6)$$
$$= (-2.6)(-0.6) + (-0.6)(-3.6) + (-3.6)(-4.6) + (-4.6)(-1.6) + (-1.6)(0.4) + (0.4)(1.4) + (1.4)(2.4) + (2.4)(3.4) + (3.4)(5.4)$$
$$= 1.56 + 2.16 + 16.56 + 7.36 + (-0.64) + 0.56 + 3.36 + 8.16 + 18.36 = 57.44.$$
Step 5: Calculate autocorrelation at lag 1. Now that we have the numerator and denominator, calculate the
autocorrelation:
$$\rho(1) = \frac{57.44}{92.4} \approx 0.622.$$
Thus, the autocorrelation at lag 1 is approximately 0.622, indicating a moderate positive correlation between
consecutive values.
Step 6: Calculate the numerator for lag 2. For lag 2, compute the numerator:
$$\sum_{t=1}^{n-2} (X_t - \bar{X})(X_{t+2} - \bar{X}) = (120 - 122.6)(119 - 122.6) + (122 - 122.6)(118 - 122.6) + \dots + (125 - 122.6)(128 - 122.6)$$
$$= 9.36 + 2.76 + 5.76 + (-1.84) + (-2.24) + 0.96 + 4.76 + 12.96 = 32.48.$$
Step 7: Calculate autocorrelation at lag 2. Dividing by the same denominator gives:
$$\rho(2) = \frac{32.48}{92.4} \approx 0.352.$$
Conclusion: From these calculations, we see that the autocorrelation at lag 1 is higher (0.622) compared to lag
2 (0.352). This suggests that consecutive stock prices are more closely related than prices separated by two days,
which is a typical observation in time series where immediate past values have a stronger influence on the present.
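The hand calculation can be cross-checked in R. The sketch below implements the autocorrelation formula from Step 2 directly (the helper name `rho` is illustrative) and compares it with the built-in `acf` function, which uses the same ratio of sums.

```r
x <- c(120, 122, 119, 118, 121, 123, 124, 125, 126, 128)

# Autocorrelation at lag k, written exactly as in the formula of Step 2
rho <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)                             # deviations from the mean
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)  # lagged products over total squares
}

rho1 <- rho(x, 1)
rho2 <- rho(x, 2)

# Built-in estimate; element 1 of $acf is lag 0, so take elements 2 and 3
builtin <- as.numeric(acf(x, lag.max = 2, plot = FALSE)$acf[2:3])

print(round(c(lag1 = rho1, lag2 = rho2), 3))
```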
The plot of the stationary series will show random fluctuations around a constant mean with no visible trend.
In time series analysis, we often compute autocorrelation or the correlation between values at different time lags.
Estimating the autocorrelation function (ACF) helps us determine whether previous values have predictive power
for future values.
This equation calculates the correlation between observations separated by k time steps. As k increases, ρ(k) typically
decreases, reflecting the diminishing influence of earlier observations on future values.
important in fields like economics (e.g., analyzing stock prices for multiple companies) and environmental science
(e.g., temperature, humidity, and wind speed together).
Multidimensional time series analysis focuses on understanding the relationships between these multiple series
and how they jointly evolve over time. The vector autoregressive (VAR) model is a common model used for such
series.
Equation for Multivariate Model: In a multivariate time series model, each variable depends on its own past
values and the past values of other variables. The first-order vector autoregressive model, VAR(1), for two time
series $X_t$ and $Y_t$ is given by:
$$X_t = \alpha_1 X_{t-1} + \beta_1 Y_{t-1} + \epsilon_{1t}$$
$$Y_t = \alpha_2 X_{t-1} + \beta_2 Y_{t-1} + \epsilon_{2t}$$
where $\alpha_1, \alpha_2, \beta_1, \beta_2$ are coefficients and $\epsilon_{1t}, \epsilon_{2t}$ are error terms.
This type of modeling is crucial in understanding how multiple series interact over time.
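To make the two equations concrete, the sketch below simulates a bivariate VAR(1) in base R and recovers the coefficients by regressing each series on the lagged values of both. The coefficient values, the seed and the series length are hypothetical choices for illustration; dedicated packages exist for VAR fitting, but ordinary least squares per equation is enough to show the structure.

```r
set.seed(1)
n <- 5000

# Hypothetical coefficients (chosen so that the process is stationary)
a1 <- 0.5; b1 <- 0.2   # X equation
a2 <- 0.1; b2 <- 0.4   # Y equation

x  <- numeric(n); y <- numeric(n)
e1 <- rnorm(n);   e2 <- rnorm(n)     # independent error terms
for (t in 2:n) {
  x[t] <- a1 * x[t - 1] + b1 * y[t - 1] + e1[t]
  y[t] <- a2 * x[t - 1] + b2 * y[t - 1] + e2[t]
}

# Regress each series on the previous values of both series (no intercept)
lagged <- data.frame(x = x[-1], y = y[-1], x_lag = x[-n], y_lag = y[-n])
fit_x  <- lm(x ~ 0 + x_lag + y_lag, data = lagged)
fit_y  <- lm(y ~ 0 + x_lag + y_lag, data = lagged)

est <- rbind(X = coef(fit_x), Y = coef(fit_y))  # rows: equations; columns: lags
print(round(est, 2))
```

The off-diagonal estimates measure how strongly the past of one series feeds into the other.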
Module - 1
Chapter - 3 & 4
Example: A consistent rise in global temperatures over decades due to climate change.
1. Natural forces
2. Manmade conventions
The most significant cause of seasonal variations is climate. Changes in weather conditions—such as rainfall,
humidity, and temperature—impact industries and products differently. For example, there is a higher demand for
woolen clothes and hot drinks in winter, while in summer, cotton clothes and cold drinks see increased sales. During
the rainy season, the demand for umbrellas and raincoats rises.
In addition to nature, customs, traditions, and habits also influence seasonal variation. For instance, during
festivals like Diwali, Dussehra, and Christmas, there is an increased demand for sweets and clothes. Similarly, the
start of a school or college year sees a surge in demand for books and stationery.
Example: Higher sales of air conditioners during summer months due to the hot weather.
1. Boom: This phase is characterized by rapid economic growth, high levels of production, employment, and
rising prices. During the boom period, consumer demand is strong, and businesses expand rapidly. However,
inflationary pressures may also build up, leading to potential overheating of the economy.
Example: The global economy in the late 1990s experienced a boom due to the dot-com bubble, where
technology companies saw rapid growth and expansion.
2. Decline: After the boom, the economy begins to slow down. Production and demand decrease, unemployment
starts to rise, and inflation stabilizes. This phase marks the transition from a peak towards a downturn, signaling
the end of rapid economic expansion.
Example: The early 2000s saw a decline after the burst of the dot-com bubble, where stock prices fell, and
many tech companies collapsed, leading to a slowdown in economic growth.
3. Depression: This is the lowest phase of the cycle, marked by a significant decline in economic activity. There
is high unemployment, reduced consumer spending, lower investment, and overall economic stagnation. It
represents the most severe form of economic contraction.
Example: The Great Depression of the 1930s is a classic example, where global economies shrank, unemploy-
ment reached record levels, and industrial output dropped sharply.
4. Improvement (Recovery): After the depression, the economy begins to recover. Businesses start investing
again, employment rises, and consumer confidence gradually returns. Production and demand start increasing,
marking the beginning of the next upward cycle.
Example: After the Great Recession of 2008, the economy began recovering in 2010, with improved job growth,
increased consumer spending, and steady economic expansion.
These phases repeat over time, reflecting the fluctuating nature of economic activity.
Example: Economic cycles with alternating periods of economic expansion and contraction.
Additive Model
The Additive Model assumes that the different components of a time series combine in an additive manner. That
is, the observed value $Y_t$ at time $t$ is the sum of the contributions of the individual components.
Mathematically, the additive model is represented as:
$$Y_t = T_t + S_t + C_t + I_t$$
where:
• $Y_t$ is the observed value at time $t$,
• $T_t$ is the trend component at time $t$,
• $S_t$ is the seasonal component at time $t$,
• $C_t$ is the cyclical component at time $t$,
• $I_t$ is the irregular component at time $t$.
Example
Consider a time series of monthly sales data for a store over a year. Suppose the trend increases by 5 units per
month, the seasonal effect adds 10 units during the summer months (June, July, and August), and cyclical factors
add or subtract up to 3 units. Then, using the additive model, we can express the sales data $Y_t$ for a summer month
as:
$$Y_t = 5t + 10 + C_t + I_t$$
where $C_t$ is the cyclical effect and $I_t$ represents any irregular variations.
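The additive example can be written out numerically. In the sketch below the trend and seasonal terms follow the example, while the sine-shaped cyclical term (bounded by 3 units) and the zero irregular term are assumptions made only so that the series can be computed.

```r
month <- 1:12

trend     <- 5 * month                        # rises by 5 units per month
seasonal  <- ifelse(month %in% 6:8, 10, 0)    # +10 in June, July, August
cyclical  <- 3 * sin(2 * pi * month / 12)     # assumed shape, within +/- 3 units
irregular <- rep(0, 12)                       # assumed zero for clarity

sales <- trend + seasonal + cyclical + irregular

# July (t = 7): 5 * 7 + 10 + C_7 + I_7
print(sales[7])
```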
Multiplicative Model
The Multiplicative Model assumes that the components of the time series interact in a multiplicative manner.
That is, the observed value $Y_t$ at time $t$ is the product of the contributions of the individual components.
Mathematically, the multiplicative model is represented as:
$$Y_t = T_t \times S_t \times C_t \times I_t$$
where the variables $Y_t$, $T_t$, $S_t$, $C_t$, and $I_t$ represent the same components as in the additive model.
Example
Consider a company's quarterly revenue over a few years. If the trend increases by 10% each quarter, and sales are
doubled during the holiday season, the multiplicative model expresses the revenue in a holiday quarter as:
$$Y_t = T_t \times 2 \times C_t \times I_t$$
where $T_t$ represents the 10% growth in each quarter, the factor 2 accounts for the seasonal holiday surge, $C_t$
captures any cyclical effects, and $I_t$ represents irregular variations.
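A numerical version of the multiplicative example is sketched below. The starting level of 100, the choice of every fourth quarter as the holiday season, and the neutral cyclical and irregular factors are assumptions introduced for illustration.

```r
quarter <- 1:8

trend     <- 100 * 1.10^quarter                  # assumed base of 100, 10% growth per quarter
seasonal  <- ifelse(quarter %% 4 == 0, 2, 1)     # factor 2 in the assumed holiday quarter
cyclical  <- rep(1, 8)                           # neutral factor (assumption)
irregular <- rep(1, 8)                           # neutral factor (assumption)

revenue <- trend * seasonal * cyclical * irregular

print(round(revenue, 1))
```

Because the seasonal factor multiplies the trend, the holiday surge grows in absolute size as the trend grows, which is the pattern the multiplicative model is meant to capture.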
Conclusion
Both the additive and multiplicative models provide valuable ways to decompose a time series into its underlying
components. By choosing the right model, analysts can gain better insights into trends, seasonal variations, and
cyclical movements, and make more accurate forecasts.
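• Multiplicative Model:
$$Y_t = T_t \times S_t \times C_t \times I_t$$
In R, you can resolve the components of a time series using built-in functions. `decompose()` supports both additive
and multiplicative models through its `type` argument, while `stl()` fits an additive model, so a multiplicative
structure is usually handled by log-transforming the series first. Consider the following example where we decompose the
AirPassengers dataset.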
```r
# Load the AirPassengers dataset
data(AirPassengers)

# Decompose the time series with a multiplicative model
decomposed_data <- decompose(AirPassengers, type = "multiplicative")

# Plot the decomposed components
plot(decomposed_data)
```
Listing 13: Multiplicative Decomposition of the AirPassengers Series
3.3.1 Graphic
The graphic method, also known as the eye inspection method, is the simplest and most intuitive approach to
identifying trends in time series data. This method involves the following steps:
1. Plot the Data: First, plot the given time series data on a graph, with time on the x-axis and the variable
of interest on the y-axis.
2. Draw a Trend Line: A smooth, free-hand curve is then drawn through the plotted points, representing the
general tendency of the series. This curve visually highlights the trend over time.
The graphic method effectively removes short-term variations to reveal the underlying trend in the data. The
trend line can also be extended to predict or estimate future values, making it a useful tool for forecasting.
Limitations
However, it is important to note that this method is subjective, and the accuracy of the predictions may vary
depending on how the trend line is drawn. As such, while the graphic method is useful for initial analysis, it should
be supplemented with more rigorous statistical techniques for reliable forecasting.
Example
Consider monthly sales data for a retail store over a year:
Month Sales
Jan 100
Feb 120
Mar 140
Apr 160
May 150
Jun 130
Jul 180
Aug 190
Sep 170
Oct 160
Nov 140
Dec 200
[Figure: monthly sales plotted against month (1 to 12), with a free-hand trend line drawn in red.]
The red line represents the overall trend in sales. Although the data fluctuates, the general upward direction is
clearly visible.
Advantages:
• Simplicity: The graphic method is one of the simplest approaches to studying trend values and is easy to
implement.
• Expertise Benefits: An experienced statistician can often draw a trend line that better represents the data
than one fitted using mathematical formulas.
• Applicability: Despite not being recommended for beginners, this method has significant merits in the hands
of skilled statisticians and is widely used in practical applications.
Disadvantages:
• Subjectivity: The method is highly subjective; the resulting trend line can vary significantly based on who
draws it.
• Skill Requirements: It requires the work to be conducted by skilled and experienced individuals to ensure
accuracy.
• Reliability Concerns: The subjective nature of this method means that predictions derived from it may not
be reliable.
• Careful Execution: Drawing the trend line must be done carefully to avoid misrepresentation of the data.
3.3.2 Semi-Averages
The semi-averages method involves dividing the time series data into two equal parts with respect to time. For
instance, if we have data spanning from 1999 to 2016 (a total of 18 years), we would split it into two equal parts:
- The first part: 1999 to 2007
- The second part: 2008 to 2016
In cases where the number of years is odd, such as 9, 13, or 17, the middle year is omitted. For example, for 19
years of data from 1998 to 2016, the division would be:
- The first part: 1998 to 2006
- The second part: 2008 to 2016 (omitting the middle year 2007)
Once the data is divided, we calculate the arithmetic mean for each part, yielding two average values. These
averages are then plotted against the mid-year of each part, and a straight line is drawn to connect the two points.
This line represents the trend, which can be extended to estimate intermediate values or predict future values.
3.3.3 Example
Consider the following production data over several years:
Year Production
2001 40
2002 45
2003 40
2004 42
2005 46
2006 52
2007 56
2008 61
[Figure: production data with the semi-averages trend line; x-axis: Year, y-axis: Production.]
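The semi-averages calculation for the production data above can be sketched in R as follows. The variable names are illustrative, and the code assumes an even number of years so that the series splits into two equal halves.

```r
year       <- 2001:2008
production <- c(40, 45, 40, 42, 46, 52, 56, 61)

half   <- length(year) / 2                 # even number of years assumed
first  <- 1:half
second <- (half + 1):length(year)

# Average of each half, plotted against the middle of its period
mid_years <- c(mean(year[first]), mean(year[second]))
semi_avgs <- c(mean(production[first]), mean(production[second]))

# Straight line through the two points gives the trend values
slope <- diff(semi_avgs) / diff(mid_years)
trend <- semi_avgs[1] + slope * (year - mid_years[1])

print(data.frame(year, production, trend))
```

Extending `year` beyond 2008 in the last formula extrapolates the same line to future periods.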
Advantages:
• Simplicity: This method is easier to understand compared to the moving average method and the method of
least squares.
• Objectivity: It is an objective method for measuring trends; anyone applying this method will arrive at the
same results.
Disadvantages:
• Assumption of Linearity: The method assumes a straight-line relationship between the plotted points,
regardless of whether such a relationship actually exists.
• Data Sensitivity: If additional data is added to the original dataset, the entire calculation must be redone
to obtain new trend values, and the trend line will change accordingly.
• Influence of Extremes: Since the arithmetic mean is calculated for each half, an extreme value in either half
can significantly impact the points. As a result, the trend derived from these points may not be sufficiently
accurate for future forecasting.
For a moving average of period m: when m is odd, the moving average is placed against the middle of the time
interval it covers. For instance, if m = 3, the moving average of the first three data points is placed against the
second data point (the mid-point). However, if m is even, the moving average falls between two middle periods and so
does not correspond to any specific time period. To address this, a second step averages each pair of adjacent moving
averages (a 2-yearly moving average of the moving averages), which centers the result on a specific time period.
Example: Calculate the 3-yearly moving average for the following data.
Calculation Explanation:
- For 2002-03, the moving average is calculated using the production values for 2001-02, 2002-03, and 2003-04:
\[ \text{Moving Average} = \frac{40 + 45 + 40}{3} = 41.67 \]
- For 2003-04, the moving average uses the values for 2002-03, 2003-04, and 2004-05:
\[ \text{Moving Average} = \frac{45 + 40 + 42}{3} = 42.33 \]
This process continues until the last available data point. The moving average method is useful for smoothing out
short-term fluctuations in data, providing a clearer view of the long-term trend. By systematically averaging data
over a specified period, this method facilitates better forecasting and analysis in various fields, including economics,
sales, and environmental studies.
Conclusion: The moving average is a fundamental tool in time series analysis, allowing for a better understand-
ing of underlying trends by reducing noise from random fluctuations.
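As a rough sketch, the 3-yearly moving average can be computed in R with stats::filter, which applies equal weights of 1/3 centered on each observation. The production values are taken from the 2001-02 to 2008-09 series used in this section; the variable names are illustrative assumptions.

```r
# Production series for 2001-02 ... 2008-09
production <- c(40, 45, 40, 42, 46, 52, 56, 61)

# Centered 3-period moving average; the first and last positions are NA
# because a full 3-year window is not available there
ma3 <- stats::filter(production, rep(1 / 3, 3), sides = 2)

print(round(as.numeric(ma3), 2))
```

The second and third entries correspond to the values 41.67 and 42.33 worked out by hand.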
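Years     Production   4-Yearly Moving Average      Centered 2-Yearly Moving Average (Trend Values)
2001-02   40
2002-03   45
                       (40+45+40+42)/4 = 41.75
2003-04   40                                        (41.75+43.25)/2 = 42.50
                       (45+40+42+46)/4 = 43.25
2004-05   42                                        (43.25+45.00)/2 = 44.125
                       (40+42+46+52)/4 = 45.00
2005-06   46                                        (45.00+49.00)/2 = 47.00
                       (42+46+52+56)/4 = 49.00
2006-07   52                                        (49.00+53.75)/2 = 51.375
                       (46+52+56+61)/4 = 53.75
2007-08   56
2008-09   61
Each 4-yearly moving average falls between two years, so it is written between the rows. Averaging adjacent pairs
of 4-yearly moving averages centers each trend value on a year.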
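Calculation Explanation:
- The first 4-yearly moving average covers 2001-02 to 2004-05:
\[ \text{Moving Average} = \frac{40 + 45 + 40 + 42}{4} = 41.75 \]
- The second covers 2002-03 to 2005-06:
\[ \text{Moving Average} = \frac{45 + 40 + 42 + 46}{4} = 43.25 \]
- Averaging these two values centers the trend value on 2003-04: (41.75 + 43.25)/2 = 42.5.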
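The two-step calculation for an even period can be sketched in R as below. This is an illustrative sketch under the assumption that the trend value is the average of two adjacent 4-yearly moving averages, as described above; the variable names are not from the original notes.

```r
# Production series for 2001-02 ... 2008-09
production <- c(40, 45, 40, 42, 46, 52, 56, 61)

# Step 1: 4-yearly moving averages (each falls between two years)
ma4 <- sapply(1:(length(production) - 3),
              function(i) mean(production[i:(i + 3)]))

# Step 2: average adjacent pairs so each value is centered on a year
# (2003-04, 2004-05, 2005-06, 2006-07)
centered <- (head(ma4, -1) + tail(ma4, -1)) / 2

print(ma4)
print(centered)
```

Eight observations give five 4-yearly averages and therefore four centered trend values; the first two and last two years have no trend value.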
1. Given the following production data over a 5-year period, calculate the 3-yearly moving average. Use the
moving averages to identify the trend:
• 2010: 30
• 2011: 35
• 2012: 50
• 2013: 45
• 2014: 60
2. Consider the following data for sales over 6 years. Calculate the 2-yearly moving average and discuss any
observed trends:
• 2015: 80
• 2016: 90
• 2017: 85
• 2018: 95
• 2019: 100
• 2020: 110
3. A company’s quarterly earnings over two years are as follows. Calculate the 4-quarter moving average and
explain any patterns you find:
• Q1 2018: 200
• Q2 2018: 220
• Q3 2018: 210
• Q4 2018: 250
• Q1 2019: 240
• Q2 2019: 260
• Q3 2019: 280
• Q4 2019: 300
Advantages:
• This method is simple to understand and easy to execute.
• It has flexibility in application; if new data for additional time periods are added, previous calculations remain
unaffected, allowing for the generation of more trend values.
• It provides an accurate representation of the long-term trend, particularly if the trend is linear.
• When the period of the moving average coincides with the period of oscillation (cycle), periodic fluctuations
are effectively eliminated.
• The moving average adapts to general movements in the data, with its shape determined by the actual data
rather than arbitrary choices made by the statistician.
• It is effective for smoothing out short-term fluctuations, allowing for clearer visibility of long-term trends.
• The moving average can be easily visualized on a graph, making it a useful tool for presentations and reports.
Disadvantages:
• For a moving average of period 2m + 1, no trend values are generated for the first m and last m periods, limiting
the analysis of the entire dataset.
• The trend path does not correspond to any specific mathematical function, making it unsuitable for forecasting
or predicting future values.
• If the underlying trend is not linear, moving averages may not accurately reflect the true tendency of the data.
• The selection of the period for the moving average can be subjective, potentially introducing human bias into
the analysis.
• Moving averages can lag behind actual data changes, which may lead to delays in identifying trends.
• In cases of sudden shifts or changes in the data, moving averages may provide a misleading representation of
the trend, as they are based on historical data.
• The smoothing effect of moving averages can sometimes obscure important fluctuations that may need to be
addressed.
The method of least squares relies on two fundamental conditions to ensure that the fitted line provides the best
representation of the data.
1. Condition: $\sum (Y - \hat{Y}) = 0$
This condition states that the sum of the residuals (the differences between the observed values Y and the predicted
values Ŷ ) must equal zero.
Explanation
• Residuals: The residual for each data point is defined as Yt − Ŷt . It measures the error between the actual
observation and the value predicted by the model.
• Sum of Residuals: When we sum these residuals across all observations, the condition $\sum (Y - \hat{Y}) = 0$ ensures
that the positive and negative errors balance out. If this condition is satisfied, it indicates that the model does
not systematically overestimate or underestimate the values.
• Mathematical Justification:
\[ \sum_t (Y_t - \hat{Y}_t) = 0 \]
This can also be derived from the optimization process, where minimizing the sum of squared deviations
inherently leads to this condition.
2. Condition: $\sum (Y - \hat{Y})^2$ is minimized
This condition involves minimizing the sum of the squares of the residuals.
Explanation
• Purpose of Squaring: Squaring the residuals ensures that positive and negative errors do not cancel each
other out, which could happen in the first condition. Squaring amplifies larger errors more than smaller ones,
which helps in identifying models that fit better overall.
• Objective: The objective of the least squares method is to find the parameters (like a and b in a linear
equation) that minimize the sum of these squared differences:
\[ S = \sum_t (Y_t - \hat{Y}_t)^2 \]
• Geometric Interpretation: In a geometric sense, this condition ensures that the trend line is as close as
possible to all data points, minimizing the overall distance from each point to the line.
• Derivation: To find the best fitting line, we take the derivative of S with respect to the parameters (like a
and b) and set these derivatives to zero. This process yields the normal equations, which can then be solved to
find the optimal values of the parameters.
Summary
Both conditions together ensure that the best-fitting line through the data not only balances the residuals (no
systematic bias) but also minimizes the overall error in terms of squared differences, leading to the most accurate
predictions possible within the context of a linear model. This approach is foundational in regression analysis, helping
to create models that accurately reflect underlying trends in data.
Fitting a Straight Line Trend by the Method of Least Squares
Let Yt be the value of the time series at time t. Thus, Yt is the dependent variable, depending on t.
Assume a straight-line trend of the form:
\[ Y_{tc} = a + bt \]
where $Y_{tc}$ (also written $\hat{Y}_t$) designates the trend value at time t, to distinguish it from the actual value Yt,
a is the Y-intercept, and b is the slope of the trend line. The goal is to estimate the parameters a and b such that
the sum of the squared deviations between the actual values Yt and the trend values is minimized:
\[ S = \sum (Y_t - Y_{tc})^2 = \sum (Y_t - (a + bt))^2. \]
To find the optimal values of a and b, we differentiate S with respect to a and b and set the derivatives to zero.
Differentiating S with respect to a gives:
\[ \frac{\partial S}{\partial a} = -2 \sum (Y_t - (a + bt)) = 0. \]
Rearranging yields:
\[ \sum (Y_t - (a + bt)) = 0 \implies \sum Y_t = na + b \sum t, \]
where n is the number of observations. Thus, we obtain:
\[ na = \sum Y_t - b \sum t \implies a = \frac{1}{n} \left( \sum Y_t - b \sum t \right). \]
Equivalently, $a = \bar{Y} - b\bar{t}$, where $\bar{Y}$ and $\bar{t}$ are the means of the observations and of the time values.
Next, we differentiate S with respect to b:
\[ \frac{\partial S}{\partial b} = -2 \sum t (Y_t - (a + bt)) = 0. \]
Rearranging gives:
\[ \sum t (Y_t - (a + bt)) = 0 \implies \sum t Y_t = a \sum t + b \sum t^2. \]
This can be rearranged to yield:
\[ b = \frac{\sum t Y_t - a \sum t}{\sum t^2}. \]
The resulting normal equations from this process are:
\[ \sum Y_t = na + b \sum t, \tag{11} \]
\[ \sum t Y_t = a \sum t + b \sum t^2. \tag{12} \]
Solving these two normal equations will yield the estimates â and b̂.
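As an illustration, the two normal equations can be solved numerically in R. The sketch below assumes the production series used earlier in this chapter with time coded as t = 1, ..., 8 (the coding of t is an assumption made here), and cross-checks the result against lm(), which solves the same least squares problem.

```r
# Production series, with time coded as 1, 2, ..., 8 (an assumed coding)
production <- c(40, 45, 40, 42, 46, 52, 56, 61)
time_idx <- seq_along(production)
n <- length(production)

# Normal equations (11) and (12) in matrix form: A %*% c(a, b) = rhs
A <- matrix(c(n,             sum(time_idx),
              sum(time_idx), sum(time_idx^2)), nrow = 2, byrow = TRUE)
rhs <- c(sum(production), sum(time_idx * production))
est <- solve(A, rhs)
a_hat <- est[1]
b_hat <- est[2]

# Trend values and the first least squares condition (residuals sum to zero)
trend <- a_hat + b_hat * time_idx
resid_sum <- sum(production - trend)

print(c(a = a_hat, b = b_hat))
print(resid_sum)                      # zero up to rounding error
print(coef(lm(production ~ time_idx)))  # cross-check with lm()
```

Solving the system directly keeps the link to equations (11) and (12) visible; in practice lm() is the more convenient route.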
If we wish to fit a parabolic trend of the form:
\[ Y_{tc} = a + bt + ct^2, \]
the normal equations can be summarized as:
\[ \sum Y_t = na + b \sum t + c \sum t^2, \tag{13} \]
\[ \sum t Y_t = a \sum t + b \sum t^2 + c \sum t^3, \tag{14} \]
\[ \sum t^2 Y_t = a \sum t^2 + b \sum t^3 + c \sum t^4. \tag{15} \]
Solving these three equations provides the values of â, b̂, and ĉ. Substituting these values into the equation for
the parabolic trend gives:
\[ Y_{tc} = \hat{a} + \hat{b} t + \hat{c} t^2. \]
To assess the appropriateness of the parabolic trend model, one can use the method of second differences. If the
second differences are constant (or nearly constant), the quadratic equation is a suitable representation of the trend
component.
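The second-differences check and the parabolic fit can be sketched in R as follows. The series here is hypothetical, constructed to follow an exact quadratic purely for illustration, and lm() is used in place of solving the three normal equations by hand (it minimizes the same sum of squares).

```r
# Hypothetical series following an exact quadratic: 5 + 2t + 0.5t^2
time_idx <- 1:8
y <- 5 + 2 * time_idx + 0.5 * time_idx^2

# Second differences: constant values point to a quadratic trend
second_diff <- diff(y, differences = 2)
print(second_diff)

# Least squares fit of a + b t + c t^2
fit <- lm(y ~ time_idx + I(time_idx^2))
print(coef(fit))
```

For real data the second differences will only be approximately constant, so the check is a judgment call rather than an exact test.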
Advantages
• This is a mathematical method of measuring trend, and as such, there is no possibility of subjectiveness; i.e.,
everyone who uses this method will get the same trend line.
• The line obtained by this method is called the line of best fit.
• Trend values can be obtained for all the given time periods in the series.
Disadvantages
• Great care should be exercised in selecting the type of trend curve to be fitted, i.e., linear, parabolic, or some
other type. Carelessness in this respect may lead to wrong results.
• The method is more tedious and time-consuming.
• Predictions are based only on long-term variations, i.e., trend, and the impact of cyclical, seasonal, and irregular
variations is ignored.
• This method cannot be used to fit growth curves like the Gompertz curve:
\[ Y = K a^{b^{X}} \tag{16} \]
Question Bank
1. Define a time series and elaborate on its fundamental components.
2. Discuss the notion of a secular trend in a time series and outline the methods employed to isolate it.
3. Explain the moving average method used for trend determination, including its advantages and disadvantages.
4. Analyze the graphic method and the least squares method for trend analysis, emphasizing their respective
advantages and disadvantages.
5. Provide a brief overview of the moving averages method for calculating trends.
6. In what ways does time series analysis support business forecasting?
7. Distinguish between secular trends, seasonal variations, and cyclical fluctuations, and describe the various
methods used to measure each.
8. Summarize the additive and multiplicative models of time series. Which of these models is more prevalent in
practice, and why?
9. Explain the process of determining seasonal variation using a 12-month moving average.
10. What methods are available for identifying trends in a time series?
11. Describe the least squares method for trend determination in detail.
12. Given the production data of steel in a factory over the past 10 years, fit a straight-line trend and tabulate the
trend values. Estimate the production for the year 1997 based on the trend:
• Year: 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996
• Production (tonnes): 75, 86, 98, 90, 96, 108, 124, 140, 150, 165
13. Fit a straight-line trend for the following data using the least squares method and estimate production for the
year 1997:
• Year: 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996
• Production (tonnes): 12, 13, 13, 16, 19, 23, 21, 23
14. Fit a straight-line trend using the least squares method for the following data and estimate production for the
year 2000:
• Year: 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997
• Production (tonnes): 38, 40, 65, 72, 69, 67, 95, 104
15. Calculate the trend using a 4-year moving average from the following data and identify short-term oscillations:
Year Production in Tonnes
1984 5
1985 6
1986 7
1987 7
1988 6
1989 8
1990 9
1991 10
1992 9
1993 10
1994 11
1995 11
Module - 2
Chapter - 1
4 Correlation
In time series analysis, understanding correlation after removing trend and seasonal effects is essential. We start
with fundamental concepts of expectation, the ensemble, stationarity, and ergodicity.
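For a random variable x with mean µ = E(x), the variance is the expected squared deviation from the mean:
\[ E[(x - \mu)^2] = \sigma^2. \]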
Covariance and correlation are key concepts in time series analysis. Covariance measures the linear association
between two variables, and correlation standardizes this measure, giving a dimensionless value between -1 and 1. In
this section, we will explain these concepts using an example from a study that analyzed air quality in Manhattan.
The covariance between two variables x and y is defined as:
\[ \gamma(x, y) = E[(x - \mu_x)(y - \mu_y)], \]
where µx and µy are the means of x and y, respectively. The sample covariance, which provides an estimate from
observed data, is given by:
\[ \mathrm{Cov}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \]
where n is the number of data pairs and x̄, ȳ are the sample means of x and y.
The manual calculation of covariance yields the same result as the built-in cov() function, showing a covariance
value of 5.51.
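The comparison between the manual formula and cov() can be sketched as below. The vectors x and y are hypothetical values invented for illustration; they are not the Herald Square measurements, so the result will not be 5.51.

```r
# Hypothetical paired measurements (NOT the Herald Square data)
x <- c(2.1, 3.4, 1.9, 4.2, 3.8, 2.7)
y <- c(5.0, 6.1, 4.8, 7.9, 6.5, 5.9)
n <- length(x)

# Sample covariance from the definition, with denominator n - 1
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Built-in function, which also uses denominator n - 1
cov_builtin <- cov(x, y)

print(c(manual = cov_manual, builtin = cov_builtin))
```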
Explanation of Covariance
Covariance indicates how two variables move together. If both x and y increase together, the covariance is positive.
Conversely, if one increases while the other decreases, the covariance is negative. In the Herald Square data, the
covariance of 5.51 is positive, indicating that carbon monoxide and benzoapyrene levels tend to rise and fall
together. Its magnitude, however, depends on the units of the variables, so covariance alone cannot show how strong
the association is and is difficult to compare across datasets. Correlation resolves this by standardizing
covariance. The population correlation ρ(x, y) is defined as:
\[ \rho(x, y) = \frac{\gamma(x, y)}{\sigma_x \sigma_y}, \]
where σx and σy are the standard deviations of x and y. The sample correlation is calculated as:
\[ \mathrm{Cor}(x, y) = \frac{\mathrm{Cov}(x, y)}{\mathrm{sd}(x) \cdot \mathrm{sd}(y)}. \]
Both methods calculate the correlation between CO and benzoapyrene as 0.3551. Correlation values range between
-1 and 1. A value of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative relationship,
and 0 means no linear association. In this example, the correlation of 0.3551 suggests a weak to moderate positive
linear relationship between CO and benzoapyrene levels.
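The two ways of obtaining the sample correlation can be sketched in R as follows. The data are hypothetical values chosen for illustration, not the CO and benzoapyrene measurements, so the result will differ from 0.3551.

```r
# Hypothetical paired measurements (NOT the Herald Square data)
x <- c(2.1, 3.4, 1.9, 4.2, 3.8, 2.7)
y <- c(5.0, 6.1, 4.8, 7.9, 6.5, 5.9)

# Correlation as standardized covariance
cor_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in function
cor_builtin <- cor(x, y)

print(c(manual = cor_manual, builtin = cor_builtin))
```

Because cov() and sd() both use the denominator n − 1, the factor cancels and the ratio agrees with cor().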
Graphical Interpretation
We can visualize the relationship between CO and benzoapyrene by plotting the data points and adding a regression
line:
# Plot the data (assumes CO and Benzoa are numeric vectors already in the workspace)
plot(CO, Benzoa, main = "CO vs Benzoapyrene",
     xlab = "CO Concentration (ppm)", ylab = "Benzoapyrene (micrograms)")
# Overlay the least squares regression line of Benzoa on CO
abline(lm(Benzoa ~ CO), col = "red")
The scatter plot shows a weak upward trend, confirming the positive correlation observed in the data. The red
line represents a simple linear regression that best fits the data.
If the mean function is constant, we say that the time series model is stationary in the mean. The sample estimate
of the population mean, µ, is the sample mean, denoted x̄:
\[ \bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t \]
This equation assumes that a sufficiently long time series characterizes the hypothetical model. Such models are
known as ergodic models, where time averages are representative of population averages.
The expectation in this definition is an average taken across the ensemble of all the possible time series that might
have been produced by the time series model (Figure 9).
Figure 9: An ensemble of time series. The expected value E(xt ) at a particular time t is the average taken over the
entire population.
This implies that the time average is independent of the starting point. Given that we usually only have a single
time series, one might wonder how a time series model can fail to be ergodic, or why we would want a model that is
not ergodic.
Environmental and economic time series are typically single realizations of a hypothetical time series model, which
we often define as ergodic. However, there are cases where multiple time series can arise from the same model. For
instance, when investigating the acceleration at the pilot seat of a microlight aircraft design in a wind tunnel with
simulated random gusts, two prototypes built to the same design may show slightly different average acceleration
responses due to manufacturing differences. In such a case, the number of time series corresponds to the number
of prototypes. Another example is the study of turbulent flows in a complex system where different runs may yield
qualitatively different results based on initial conditions. In such experiments, it is often preferable to perform
multiple runs rather than extending a single run over a long period. The number of runs corresponds to the number
of time series. A stationary time series model can be adapted to be non-ergodic by defining the means of individual
time series to follow a probability distribution.
4.2 Variance function
The variance function of a time series model that is stationary in the mean is:
\[ \sigma^2(t) = E[(x_t - \mu)^2] \]
This equation suggests that the variance, σ²(t), could potentially take different values at each time point t.
However, from a single time series, it is not feasible to estimate a different variance at every point in time. Therefore,
to make progress, we introduce a simplifying assumption: if the model is stationary in the variance, we can assume
the variance is constant across time, denoted as σ². In this case, we estimate the population variance using the
sample variance:
\[ \mathrm{Var}(x) = \frac{\sum (x_t - \bar{x})^2}{n - 1} \]
In time series analysis, sequential observations may be correlated. When the correlation is positive, the sample
variance, Var(x), tends to underestimate the true population variance, especially in short time series, because
consecutive observations tend to be similar. However, this bias decreases quickly as the length of the time series, n,
increases.
4.2.1 Autocorrelation
The mean and variance play an important role in understanding statistical distributions because they summarize
two key aspects: the central tendency (mean) and the spread (variance). Similarly, in time series analysis, we focus
on second-order properties, which include the mean, variance, and serial correlation.
Consider a time series model that is stationary in both the mean and variance. In such models, variables may be
correlated, and the model is called second-order stationary if the correlation between variables depends only on the
number of time steps between them. This time difference is referred to as the lag.
When a variable is correlated with itself at different time points, this is called autocorrelation or serial correlation.
For a second-order stationary time series model, we can define an autocovariance function (acvf) γk as a function of
the lag k:
\[ \gamma_k = E[(x_t - \mu)(x_{t+k} - \mu)] \]
Here, γk does not depend on the specific time t because the expectation is the same across all time points. This
formula is a natural extension of the covariance formula, where we now compare xt with xt+k . Next, we define the
autocorrelation function (acf) at lag k, denoted as ρk , by dividing the autocovariance by the variance:
\[ \rho_k = \frac{\gamma_k}{\sigma^2} \]
From this definition, it follows that ρ0 = 1, meaning that the correlation of a variable with itself at the same time
point is always 1.
In time series analysis, we often estimate the autocovariance function and autocorrelation function from the sample
data. The sample autocovariance function (sample acvf), denoted as ck , is given by:
\[ c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \]
Note that the sample autocovariance at lag 0, c0, is the variance of the data calculated with denominator n. The
denominator n is used when calculating ck, although only n − k terms are summed in the numerator. Finally, the
sample autocorrelation function (sample acf) is defined as:
\[ r_k = \frac{c_k}{c_0} \]
We will now illustrate these calculations using an example in R. The data consists of wave heights (in millimeters,
relative to still water level) measured in a wave tank. The sampling interval is 0.1 seconds, and the total recording
length is 39.7 seconds. The waves were generated by a wave maker using a pseudo-random signal to mimic a rough
sea. Since there is no trend or seasonal component, we assume that this time series is a realization of a stationary
process.
[Figure: Time plot of the wave height series. x-axis: Time (seconds).]
Next, we calculate the sample autocovariance function (acvf) at different lags using the acf function, which also
gives the sample autocorrelation (acf).
# Wave height series; how it is loaded is not shown here
# waveht <- ...   # Assume this is your time series data (397 observations)
time_series <- waveht

# Calculate and plot the sample autocovariance and autocorrelation
acf(time_series, type = "covariance", main = "Sample Autocovariance Function")
acf(time_series, type = "correlation", main = "Sample Autocorrelation Function")

# Plot wave heights against their lag-1 values
plot(waveht[1:396], waveht[2:397],
     xlab = "Wave Height at time t",
     ylab = "Wave Height at time t + 1",
     main = "Wave Heights at Lag 1",
     pch = 19,
     col = "blue")
abline(lm(waveht[2:397] ~ waveht[1:396]), col = "red")  # Add regression line
To manually compute the sample autocovariance and autocorrelation at lag k = 1, we use the following steps:
# Mean and length of the time series
x_mean <- mean(time_series)
n <- length(time_series)

# Sample autocovariance at lag 1 (denominator n, as in the definition of c_k)
k <- 1
sample_acvf <- sum((time_series[1:(n - k)] - x_mean) *
                   (time_series[(1 + k):n] - x_mean)) / n

# Sample autocorrelation at lag 1: r_1 = c_1 / c_0, with c_0 also using denominator n
# (var() divides by n - 1, so it is not used here)
c0 <- sum((time_series - x_mean)^2) / n
sample_acf <- sample_acvf / c0

# Print the results
sample_acvf
sample_acf
Sample Output
Assuming we have the following simulated time series data, the output for the calculations will be:
> sample_acvf
[1] 0.20754 # Sample autocovariance at lag 1
> sample_acf
[1] 0.58783 # Sample autocorrelation at lag 1
These values indicate that at lag k = 1, the sample autocovariance is approximately 0.20754, and the sample
autocorrelation is approximately 0.58783. This suggests a moderate positive correlation between the values of the
time series that are one time step apart. The acf function computes the sample autocovariance or autocorrelation
function up to a default maximum lag, and the sample autocorrelations always lie between −1 and 1. Provided the same
denominator n is used for both ck and c0, the sample acvf and acf calculated manually for lag 1 will match those
obtained by the acf function.
[Figure: Correlogram of the wave height series. x-axis: Lag.]
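The agreement between the definitions of ck and rk and the acf function can be checked with a self-contained sketch. The short series below is hypothetical (it is not the wave tank data), and sample_acvf is a helper name introduced here for illustration.

```r
# Hypothetical short series (NOT the wave tank data)
x <- c(1.2, 0.8, -0.3, -1.1, -0.6, 0.4, 1.0, 0.7, -0.2, -0.9, -0.5, 0.3)
n <- length(x)

# Sample autocovariance at lag k, with denominator n
sample_acvf <- function(x, k) {
  n <- length(x)
  x_bar <- mean(x)
  sum((x[1:(n - k)] - x_bar) * (x[(1 + k):n] - x_bar)) / n
}

c0 <- sample_acvf(x, 0)
c1 <- sample_acvf(x, 1)
r1 <- c1 / c0

# Built-in results; element 1 of $acf is lag 0, element 2 is lag 1
acvf_builtin <- acf(x, type = "covariance", plot = FALSE)$acf
acf_builtin <- acf(x, type = "correlation", plot = FALSE)$acf

print(c(manual_c1 = c1, builtin_c1 = acvf_builtin[2]))
print(c(manual_r1 = r1, builtin_r1 = acf_builtin[2]))

# c0 uses denominator n, so it is slightly smaller than var(x)
print(c(c0 = c0, var_rescaled = var(x) * (n - 1) / n))
```

Dividing c1 by var(x) instead of c0 would give a slightly different number, which is why the denominator matters when comparing against acf.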
Interpretation
From the plot of the autocorrelation function (acf), we can determine the degree of serial correlation at different lags.
If the autocorrelation decays slowly, this indicates that the process is highly persistent over time. A rapid decay, on
the other hand, suggests weaker serial correlation.
– The correlogram from an autoregressive model of order 2 typically shows a damped cosine shape.
– Non-stationary series (e.g., air passenger bookings) can still have their sample autocorrelation function
(ACF) calculated.
• Deterministic Signals and ACF Behavior:
Figure 12: Correlogram for the air passenger bookings over the period 1949–1960. The gradual decay is typical of a
time series containing a trend. The peak at 1 year indicates seasonal variation.
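# Load the air passenger bookings series
data(AirPassengers)
AP <- AirPassengers
# Multiplicative decomposition into trend, seasonal and random components
AP.decom <- decompose(AP, "multiplicative")
# The first and last six values of the random component are NA, so use 7:138
plot(ts(AP.decom$random[7:138]))
acf(AP.decom$random[7:138])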
Figure 13: The random component of the air passenger series after removing the trend and the seasonal variation.
The correlogram in Figure 14 suggests either a damped cosine shape that is characteristic of an autoregressive
model of order 2 or that the seasonal adjustment has not been entirely effective. The latter explanation is unlikely
because the decomposition does estimate twelve independent monthly indices. If we investigate further, we see that
the standard deviation of the original series from July until June is:
Output:
109
Output:
41.1
Output:
0.0335
The reduction in the standard deviation shows that the seasonal adjustment has been very effective.
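The code that produced the three outputs above is not shown in these notes. One plausible reading, stated here as an assumption, is that they are the standard deviations of (i) the original series, (ii) the series after subtracting the estimated trend, and (iii) the random component, each over observations 7 to 138. A sketch under that assumption:

```r
# Assumed reconstruction: the notes do not show the original commands
AP <- AirPassengers
AP.decom <- decompose(AP, "multiplicative")
idx <- 7:138  # trend and random components are NA outside this range

sd_original <- sd(AP[idx])                          # original series
sd_detrended <- sd(AP[idx] - AP.decom$trend[idx])   # after subtracting the trend
sd_random <- sd(AP.decom$random[idx])               # random component

print(c(original = sd_original,
        detrended = sd_detrended,
        random = sd_random))
```

Note that the random component of a multiplicative decomposition is a ratio centered near 1, so its standard deviation is on a different scale from the first two values.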
Figure 14: Correlogram for the random component of air passenger bookings over the period 1949–1960.
Module - 2
Chapter - 2
5 Seasonal Variation
Seasonal variations are regular, periodic variations with a period of one year. For example, the production of cold
drinks is high during the summer months and low during the winter season, and sales of sarees in a cloth store are
high during the festival season and low during other periods. The reason for determining seasonal variations in a
time series is to isolate them and to study their effect on the size of the variable in index form, usually referred
to as the seasonal index. There are different methods of measuring seasonal variations, including the method of
simple averages, the ratio-to-trend method, the ratio-to-moving-average method, and the link relative method.
5.1 Method of Simple Averages
The method of simple averages starts from the additive model:
\[ Y_t = T_t + C_t + S_t + R_t \]
In this model, we assume that the trend component (Tt) and the cyclical component (Ct) are absent. The method
consists of the following steps:
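• Arrange the data by years and months (or quarters if quarterly data is given).
• Compute the average $\bar{x}_i$ for the i-th month or quarter across all years.
• Compute the overall average $\bar{x}$ of these averages:
– For monthly data (i = 1, 2, . . . , 12):
\[ \bar{x} = \frac{1}{12} \sum_{i=1}^{12} \bar{x}_i \]
– For quarterly data (i = 1, 2, 3, 4):
\[ \bar{x} = \frac{1}{4} \sum_{i=1}^{4} \bar{x}_i \]
• Seasonal indices for different months (or quarters) are obtained by expressing the monthly (or quarterly)
averages as percentages of $\bar{x}$. Thus, the seasonal index for the i-th month (or quarter) is calculated as:
\[ \text{Seasonal Index}_i = \frac{\bar{x}_i}{\bar{x}} \times 100 \]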
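The steps above can be sketched in R for quarterly data. The sales figures are hypothetical values made up for illustration, and the variable names are assumptions.

```r
# Hypothetical quarterly sales: rows are years, columns are quarters
sales <- matrix(c(30, 40, 50, 60,
                  32, 42, 52, 62,
                  34, 44, 54, 64),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste("Year", 1:3), paste0("Q", 1:4)))

# Step 2: average for each quarter across all years
quarter_avg <- colMeans(sales)

# Step 3: overall average of the quarterly averages
grand_avg <- mean(quarter_avg)

# Step 4: seasonal indices as percentages of the overall average
seasonal_index <- quarter_avg / grand_avg * 100

print(round(seasonal_index, 2))
print(sum(seasonal_index))  # quarterly indices sum to 400
```

An index below 100 marks a quarter that is typically below the annual average, and an index above 100 marks one that is above it.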
Advantages
• Simplicity:
– The method is straightforward and easy to understand, making it accessible for practitioners with varying
levels of statistical expertise.
– No complex calculations or statistical software are required; basic arithmetic suffices.
• Time Efficiency:
– It requires minimal time to implement, allowing for quick seasonal adjustments in data analysis.
– Suitable for businesses needing rapid assessments of seasonal trends without extensive data processing.
• Clarity of Results:
– The results, represented as seasonal indices, provide a clear and intuitive understanding of seasonal vari-
ations.
– Stakeholders can easily interpret seasonal indices, facilitating communication of insights.
• No Need for Advanced Techniques:
– Useful in cases where advanced statistical techniques are not available or practical.
– Serves as a preliminary analysis tool before employing more sophisticated methods.
Disadvantages
• Assumption of No Trend or Cycles:
– The method assumes that the data does not contain any underlying trends or cyclical components.
– In real-world scenarios, many time series exhibit significant trends, which can distort the results.
• Limited Applicability:
– The method may not be suitable for data with a pronounced trend or cycle, as it can lead to misleading
conclusions.
– Economic and business time series often include trend and cyclical variations, which are not adequately
addressed by this method.
• Sensitivity to Outliers:
– The method is susceptible to outliers or extreme values, which can disproportionately affect average
calculations.
– This sensitivity may result in skewed seasonal indices that do not accurately represent underlying trends.
• Ignores Interactions:
– The method does not consider potential interactions between seasonal effects and other variables, limiting
its explanatory power.
– It provides a simplistic view of seasonality, lacking the depth of analysis found in more advanced methods.
• Static Nature:
– The method produces static seasonal indices that may not adapt to changing patterns over time.
– As market conditions or consumer behavior evolves, these indices may become outdated and less relevant.
Example - 1
Consider the following monthly sales data for a product over three years:
\[ \bar{x} = \frac{1}{12} \sum_{i=1}^{12} \bar{x}_i = \frac{130 + 125 + 150 + 170 + 180 + 210 + 200 + 190 + 170 + 160 + 140 + 130}{12} = \frac{1955}{12} \approx 162.92 \]
Example - 2
Consider the following quarterly sales data (in thousands of units) for a product over three years:
Calculate the seasonal indices for each quarter using the method of simple averages.
Results
The seasonal indices for each quarter are as follows:
• Q1: 68.09
• Q2: 89.36
• Q3: 110.64
• Q4: 131.91
• Q1 has a seasonal index of 68.09, suggesting lower sales compared to the average.
• Q2 has a seasonal index of 89.36, indicating sales slightly below average.
• Q3 has a seasonal index of 110.64, reflecting higher-than-average sales.
• Q4 has a seasonal index of 131.91, showing significantly higher sales relative to the average.
Example 3
\[ Y_t = T_t \times S_t \times C_t \times R_t \]
Where:
• Yt = Observed value at time t
• Tt = Trend component at time t
• St = Seasonal component at time t
• Ct = Cyclical component at time t
• Rt = Random (irregular) component at time t
Why the Least Squares Method? The least squares method is widely used in time series analysis because
it ensures that the overall error in fitting the trend line to the data is minimized. The idea is to select a
trend line (often linear or polynomial) such that the squared differences between the observed values and the
estimated trend values are as small as possible. Mathematically, the objective is to minimize the following
function:
\[ \text{Minimize} \quad \sum_{t=1}^{n} (Y_t - T_t)^2 \]
where:
– Yt is the actual observed value at time t,
– Tt is the estimated trend value at time t,
– n is the number of observations.
Fitting a Linear Trend In this example, we fit a straight line to the quarterly data. A linear trend assumes
the form:
y = a + bx
where:
– a is the intercept, which represents the trend value when x = 0,
– b is the slope, which indicates the rate of change in the trend per unit time.
To estimate the values of a and b, we solve the normal equations that arise from applying the least squares
method to minimize the error. Their solution is:
\[
b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}, \qquad a = \bar{y} - b\bar{x}
\]
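For reference, the normal equations follow from setting the partial derivatives of the squared-error sum to zero. A brief sketch, writing $S(a, b)$ for the objective (a symbol introduced here only for convenience):

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2, \\
\frac{\partial S}{\partial a} = 0 &\;\Longrightarrow\; \sum y_i = n a + b \sum x_i, \\
\frac{\partial S}{\partial b} = 0 &\;\Longrightarrow\; \sum x_i y_i = a \sum x_i + b \sum x_i^2 .
\end{aligned}
```

Dividing the first equation by $n$ gives $a = \bar{y} - b\bar{x}$; substituting this into the second and collecting terms in $b$ gives the slope formula.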
Example of Fitting a Linear Trend Consider hypothetical quarterly sales data over three years, for which
the fitted trend values are:
\[
T_{Q1} = 130, \quad T_{Q2} = 190, \quad T_{Q3} = 250, \quad T_{Q4} = 310
\]
These values represent the underlying trend in the quarterly data, accounting for the general direction of the
data series. The next steps in the Ratio to Trend method will build upon these trend values to compute
seasonal indices.
Why Use Trend Values? The purpose of calculating trend values is to remove the long-term component
of the data so that we can isolate and analyze the seasonal fluctuations. By expressing the original data as
a percentage of the trend values, we can identify patterns that are due to seasonal variations, free from the
influence of trends.
For example, the percentage calculation for Q1 of Year 1 would be:
\[
P_{Q1} = \frac{120}{130} \times 100 \approx 92.31
\]
This percentage represents how the observed value for Q1 deviates from the underlying trend.
Advantages
• It is easy to compute and understand.
• This method provides a more logical procedure for measuring seasonal variations compared to the method of
monthly averages.
• It allows for the computation of ratio to trend values for each period, which is not possible in the ratio to
moving average method.
Disadvantages
• The main defect of the Ratio to Trend method is that if there are cyclical swings in the series, the trend
(whether a straight line or a curve) cannot follow the actual data as closely as a 12-month moving average can.
• Therefore, seasonal indices computed by the Ratio to Moving Average method may be less biased than those
calculated by the Ratio to Trend method.
Example Calculation
Consider hypothetical quarterly sales data for a product over three years, with trend values
\[
T_{Q1} = 130, \quad T_{Q2} = 190, \quad T_{Q3} = 250, \quad T_{Q4} = 310
\]
and ratio-to-trend percentages
\[
S_{Q1} = 92.31, \quad S_{Q2} = 94.74, \quad S_{Q3} = 96.00, \quad S_{Q4} = 96.77
\]
\[
\text{Total} = S_{Q1} + S_{Q2} + S_{Q3} + S_{Q4} = 92.31 + 94.74 + 96.00 + 96.77 = 379.82
\]
Now adjust to sum to 400 (for quarterly data):
\[
K = \frac{400}{379.82} \approx 1.0531
\]
Thus, the adjusted seasonal indices are:
• Q1: 97.21
• Q2: 99.77
• Q3: 101.10
• Q4: 101.91
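The adjustment to a total of 400 can be checked numerically. The following is a minimal R sketch that takes the four ratio-to-trend percentages quoted in the example as given; the variable names are chosen here for illustration only.

```r
# Ratio-to-trend percentages from the example (one value per quarter).
raw_index <- c(Q1 = 92.31, Q2 = 94.74, Q3 = 96.00, Q4 = 96.77)

# Quarterly indices should total 400 (monthly indices would total 1200).
target_total <- 100 * length(raw_index)

# Correction factor K and the adjusted seasonal indices.
K <- target_total / sum(raw_index)
adjusted_index <- raw_index * K

print(round(K, 4))
print(round(adjusted_index, 2))
print(sum(adjusted_index))
```

Because every index is multiplied by the same constant, the relative sizes of the quarterly indices are unchanged; only their total is rescaled.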
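Thus, we have:
Quarter   Production (Y_t)   t
Q1        120                1
Q1        130                5
Q1        140                9
Q1        150                13
Q2        180                2
Q2        190                6
Q2        200                10
Q2        210                14
Q3        240                3
Q3        250                7
Q3        260                11
Q3        270                15
Q4        300                4
Q4        310                8
Q4        320                12
Q4        330                16
\[
\sum Y_t = 120 + 130 + 140 + 150 + 180 + 190 + 200 + 210 + 240 + 250 + 260 + 270 + 300 + 310 + 320 + 330 = 3{,}600
\]
\[
\sum t = 1 + 5 + 9 + 13 + 2 + 6 + 10 + 14 + 3 + 7 + 11 + 15 + 4 + 8 + 12 + 16 = 136
\]
\[
\sum tY_t = 1 \times 120 + 5 \times 130 + 9 \times 140 + 13 \times 150 + 2 \times 180 + 6 \times 190 + 10 \times 200 + 14 \times 210
+ 3 \times 240 + 7 \times 250 + 11 \times 260 + 15 \times 270 + 4 \times 300 + 8 \times 310 + 12 \times 320 + 16 \times 330 = 32{,}600
\]
\[
\sum t^2 = 1^2 + 5^2 + 9^2 + 13^2 + 2^2 + 6^2 + 10^2 + 14^2 + 3^2 + 7^2 + 11^2 + 15^2 + 4^2 + 8^2 + 12^2 + 16^2 = 1{,}496
\]
3. Calculate a and b
Using the normal equations, we can find a and b:
\[
a = \frac{(\sum Y_t)(\sum t^2) - (\sum t)(\sum tY_t)}{n(\sum t^2) - (\sum t)^2}, \qquad
b = \frac{n(\sum tY_t) - (\sum t)(\sum Y_t)}{n(\sum t^2) - (\sum t)^2}
\]
Substituting the calculated values, with n = 16:
\[
a = \frac{3600 \times 1496 - 136 \times 32600}{16 \times 1496 - 136^2} = \frac{5{,}385{,}600 - 4{,}433{,}600}{23{,}936 - 18{,}496} = \frac{952{,}000}{5{,}440} = 175
\]
\[
b = \frac{16 \times 32600 - 136 \times 3600}{16 \times 1496 - 136^2} = \frac{521{,}600 - 489{,}600}{5{,}440} = \frac{32{,}000}{5{,}440} \approx 5.8824
\]
4. Trend Equation
Thus, the linear trend equation is:
\[
T_t = a + bt \implies T_t = 175 + 5.8824\,t
\]
Using this equation, we calculate the trend value for each quarter and express the observed value as a percentage
of it. For the four quarters of the first year (t = 1, 2, 3, 4):
\[
P_{Q1} = \frac{120}{180.88} \times 100 \approx 66.34\%, \qquad P_{Q2} = \frac{180}{186.76} \times 100 \approx 96.38\%
\]
\[
P_{Q3} = \frac{240}{192.65} \times 100 \approx 124.58\%, \qquad P_{Q4} = \frac{300}{198.53} \times 100 \approx 151.11\%
\]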
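As a cross-check on the hand calculation, the fit can be reproduced in a few lines of R. This is a sketch under the assumption that the production figures tabulated above are arranged in time order t = 1, ..., 16; the variable names are illustrative, and `lm()` is used only as an independent check on the closed-form solution.

```r
# Production figures from the table above, arranged in time order t = 1, ..., 16.
y <- c(120, 180, 240, 300,
       130, 190, 250, 310,
       140, 200, 260, 320,
       150, 210, 270, 330)
time_idx <- seq_along(y)
n <- length(y)

# Sums that enter the normal equations.
sum_y  <- sum(y)
sum_t  <- sum(time_idx)
sum_ty <- sum(time_idx * y)
sum_t2 <- sum(time_idx^2)

# Closed-form least squares solution for intercept a and slope b.
denom <- n * sum_t2 - sum_t^2
a <- (sum_y * sum_t2 - sum_t * sum_ty) / denom
b <- (n * sum_ty - sum_t * sum_y) / denom

# Independent check with R's built-in linear model fit.
fit <- lm(y ~ time_idx)
print(c(a = a, b = b))
print(coef(fit))

# Trend values and ratio-to-trend percentages for every quarter.
trend <- a + b * time_idx
ratio_to_trend <- 100 * y / trend
print(round(ratio_to_trend[1:4], 2))
```

Averaging `ratio_to_trend` quarter by quarter and rescaling the four averages to total 400 would complete the Ratio to Trend calculation.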
Example 3
• Calculate the centered 12-monthly moving average (or 4-quarterly moving average) of the given data. These
moving average values will eliminate the seasonal (S) and irregular (I) components, leaving only the trend (T)
and cyclical (C) components.
• Express the original data as percentages of the centered moving average values.
• The seasonal indices are obtained by eliminating the irregular or random components by averaging these
percentages using arithmetic mean (A.M) or median.
• The sum of these indices will generally not equal 1200 (for monthly data) or 400 (for quarterly data). Finally,
an adjustment is made to ensure that the sum of the indices totals 1200 for monthly data and 400 for quarterly
data by multiplying them throughout by a constant K:
\[
K = \frac{1200}{\text{Total of the indices}} \quad \text{(for monthly data)}
\]
\[
K = \frac{400}{\text{Total of the indices}} \quad \text{(for quarterly data)}
\]
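The four steps above can be sketched in R as follows. The series is hypothetical (values made up purely for illustration), and `stats::filter` is used here as one convenient way to form the centered moving average; it is not necessarily how the moving averages in the worked example were obtained.

```r
# Hypothetical quarterly series (three years); values are made up for illustration.
sales <- ts(c(10, 14, 18, 12,
              12, 16, 22, 14,
              14, 20, 24, 16),
            start = c(2019, 1), frequency = 4)

# Step 1: centered 4-quarter moving average.
# The weights (1/8, 1/4, 1/4, 1/4, 1/8) average two adjacent 4-quarter means.
cma <- stats::filter(sales, filter = c(0.5, 1, 1, 1, 0.5) / 4, sides = 2)

# Step 2: original data as a percentage of the centered moving average.
ratio <- 100 * as.numeric(sales) / as.numeric(cma)

# Step 3: average the percentages quarter by quarter (the NA end values are dropped).
quarter <- as.integer(cycle(sales))
raw_index <- tapply(ratio, quarter, mean, na.rm = TRUE)

# Step 4: rescale so that the four indices total 400.
K <- 400 / sum(raw_index)
seasonal_index <- raw_index * K
print(round(seasonal_index, 2))
```

The first two and last two values of `cma` are `NA`, which is the quarterly counterpart of losing six months at each end of a monthly series.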
Advantages
• Of all the methods of measuring seasonal variations, the Ratio to Moving Average method is the most
satisfactory, flexible, and widely used.
• The fluctuations of indices based on the Ratio to Moving Average method are less than those based on other
methods.
Disadvantages
• This method does not completely utilize the data. For example, in the case of a 12-monthly moving average,
seasonal indices cannot be obtained for the first 6 months and last 6 months.
Example
Let’s consider a company that records its quarterly sales data over four years. The sales figures (in thousands) are
as follows:
Year Q1 Q2 Q3 Q4
2019 150 200 250 300
2020 180 220 270 320
2021 160 210 260 310
2022 170 230 280 340
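For Q1 (2019): $\frac{150}{165} \times 100 \approx 90.91\%$
For Q2 (2019): $\frac{200}{215} \times 100 \approx 93.02\%$
For Q3 (2019): $\frac{250}{265} \times 100 \approx 94.34\%$
For Q4 (2019): $\frac{300}{317.5} \times 100 \approx 94.49\%$
For Q1 (2020): $\frac{180}{165} \times 100 \approx 109.09\%$
For Q2 (2020): $\frac{220}{215} \times 100 \approx 102.33\%$
For Q3 (2020): $\frac{270}{265} \times 100 \approx 101.89\%$
For Q4 (2020): $\frac{320}{317.5} \times 100 \approx 100.79\%$
For Q1 (2021): $\frac{160}{175} \times 100 \approx 91.43\%$
For Q2 (2021): $\frac{210}{222.5} \times 100 \approx 94.38\%$
For Q3 (2021): $\frac{260}{270} \times 100 \approx 96.30\%$
For Q4 (2021): $\frac{310}{285} \times 100 \approx 108.77\%$
For Q1 (2022): $\frac{170}{180} \times 100 \approx 94.44\%$
For Q2 (2022): $\frac{230}{240} \times 100 \approx 95.83\%$
For Q3 (2022): $\frac{280}{285} \times 100 \approx 98.25\%$
For Q4 (2022): $\frac{340}{327.5} \times 100 \approx 103.82\%$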
Conclusion
The Ratio to Moving Average method provides a systematic approach to estimating seasonal variations in time
series data. In this example, we calculated the centered moving averages, expressed the original data as percentages,
averaged these percentages to find the seasonal indices, and finally adjusted these indices to sum to a total of 400.
This method enables businesses to better understand seasonal effects and make informed decisions based on these
insights.
Example: 2
1. Calculate the Link Relatives for each period using the formula:
\[
\text{Link Relative for any period} = \frac{\text{Current period's figure}}{\text{Previous period's figure}} \times 100
\]
2. Calculate the average of the Link Relatives for each period across all years using either the mean or median.
3. Convert the average Link Relatives into Chain Relatives based on the first season. The Chain Relative for
any period is obtained as:
\[
\text{Chain Relative for the first period} = 100
\]
\[
\text{Chain Relative for any period} = \frac{\text{Average Link Relative for that period} \times \text{Chain Relative of the previous period}}{100}
\]
4. Compute a second (new) Chain Relative for the first period from the Chain Relative of the last period. Because
of trend, it will generally differ from 100. The correction factor d is defined as:
\[
d = \frac{1}{N}\left[\text{New Chain Relative for the first period} - 100\right]
\]
where N denotes the number of periods (i.e., N = 12 for monthly data and N = 4 for quarterly data). The Adjusted
Chain Relatives are obtained by subtracting kd from the (k + 1)th Chain Relative, where k = 1, 2, . . . , 11 for
monthly data and k = 1, 2, 3 for quarterly data.
5. Finally, calculate the average of the corrected Chain Relatives and convert these values into percentages based
on this average. These percentages represent the seasonal indices calculated by the Link Relative Method.
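The five steps can be traced in a short R sketch. The quarterly figures below are hypothetical, and the correction term `d` is taken as (new chain relative for the first quarter − 100) / N, which is the usual textbook form; treat that choice as an assumption of this sketch rather than a statement of the lecture's exact procedure.

```r
# Hypothetical quarterly data: rows are years, columns are quarters Q1..Q4.
y <- matrix(c(60, 64, 58, 70,
              66, 70, 62, 76,
              70, 76, 68, 82),
            ncol = 4, byrow = TRUE)
n_seasons <- ncol(y)

# Step 1: link relatives along the series in time order.
v <- as.vector(t(y))                       # year 1 Q1..Q4, year 2 Q1..Q4, ...
lr <- c(NA, 100 * v[-1] / v[-length(v)])   # the first observation has no predecessor
lr <- matrix(lr, ncol = n_seasons, byrow = TRUE)

# Step 2: average link relative for each quarter (mean; median is an alternative).
avg_lr <- colMeans(lr, na.rm = TRUE)

# Step 3: chain relatives, with the first quarter fixed at 100.
cr <- numeric(n_seasons)
cr[1] <- 100
for (k in 2:n_seasons) cr[k] <- avg_lr[k] * cr[k - 1] / 100

# Step 4: new chain relative for Q1 (via Q4), correction d, adjusted chain relatives.
new_cr1 <- avg_lr[1] * cr[n_seasons] / 100
d <- (new_cr1 - 100) / n_seasons
adj_cr <- cr - (0:(n_seasons - 1)) * d

# Step 5: express the adjusted chain relatives as percentages of their average.
seasonal_index <- 100 * adj_cr / mean(adj_cr)
print(round(seasonal_index, 2))
```

Dividing by the mean in the last step forces the four indices to total 400, so no separate rescaling constant is needed.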
Advantages
• The Link Relative Method utilizes the data more effectively compared to the moving average method.
Disadvantages
• This method involves extensive calculations and is more complex than the moving average method.
• The average of Link Relatives may contain both trend and cyclical components, which are eliminated by
applying corrections.
Year Q1 Q2 Q3 Q4
1 200 220 230 240
2 250 270 260 280
3 290 300 310 320
4 310 330 340 350
5 340 350 360 370
In this example, we can observe an increasing trend in GDP, but there may also be fluctuations that correspond
to economic cycles, indicating periods of growth followed by stagnation or decline.
• Residual Method
• Reference Cycle Analysis Method
• Direct Method
• Harmonic Analysis Method
1. Residual Method
The Residual Method involves isolating the cyclical component of a time series by removing the trend and seasonal
components. The cyclical variation is derived as the residuals after fitting a trend line and seasonal pattern to the
data.
Steps:
1. Fit a trend line (linear, polynomial, etc.) to the time series data.
2. Identify and remove seasonal variations.
3. Calculate the residuals, which represent the cyclical variations.
Example:
Given a time series of quarterly sales figures:
2. Reference Cycle Analysis Method
Steps:
1. Define a reference cycle based on historical data.
2. Compare the current data cycle with the reference cycle.
3. Measure deviations and similarities quantitatively.
Example:
Suppose the reference cycle is defined as follows:
3. Direct Method
The Direct Method involves directly measuring the cyclical component from the time series data without removing
the trend or seasonal effects. This method focuses on identifying peaks and troughs in the data.
Steps:
1. Identify the peaks and troughs in the data.
2. Calculate the amplitude of the cycles (the difference between peaks and troughs).
3. Analyze the duration of cycles to assess periodicity.
Example:
Given a time series of monthly sales:
4. Harmonic Analysis Method
Steps:
1. Use Fourier transforms to convert the time series data into the frequency domain.
2. Identify significant harmonics that represent cyclical variations.
3. Reconstruct the cyclical component using selected harmonics.
Example:
Suppose we have the following time series:
Data = [1, 2, 3, 4, 5, 4, 3, 2]
Applying Fourier transform yields harmonics:
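The harmonic steps can be tried on this eight-point series with base R's `fft()`. The sketch below keeps only the mean and the first harmonic when reconstructing the cyclical component; that choice of harmonics is an assumption made for illustration, not a rule from the text.

```r
x <- c(1, 2, 3, 4, 5, 4, 3, 2)
n <- length(x)

# Discrete Fourier transform; element j + 1 holds harmonic j (harmonic 0 is n times the mean).
X <- fft(x)
amplitude <- Mod(X) / n
print(round(amplitude, 4))

# Keep the mean (harmonic 0) and the first harmonic together with its conjugate partner.
keep <- c(1, 2, n)
X_keep <- complex(length.out = n)   # vector of complex zeros
X_keep[keep] <- X[keep]

# R's inverse fft is unnormalised, so divide by n to return to the data scale.
cyclical <- Re(fft(X_keep, inverse = TRUE)) / n
print(round(cyclical, 4))
```

Because the series is symmetric about its peak, the transform is real-valued and the even-numbered harmonics vanish, so the first harmonic carries most of the variation around the mean.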
Conclusion
Each of these methods provides unique insights into the cyclical variations in time series data. The choice of method
depends on the nature of the data, the underlying cycles, and the objectives of the analysis.
5.6.2 Deseasonalisation
Deseasonalisation is the process of removing seasonal components from time series data to obtain data that reflects
only the underlying trends and cycles. The resulting data, free from seasonal variations, is known as deseasonalised
data.
1. Multiplicative Model
In a multiplicative model, the relationship between the observed data Yt , the trend Tt , and the seasonal component
St is given by:
Yt = Tt × St
To deseasonalise the data, we divide the original data by the seasonal index. The seasonal index is typically expressed
as a percentage, so we must adjust for that by using an adjustment multiplier of 100.
Example:
Consider the following quarterly sales data and corresponding seasonal indices:
2. Additive Model
In an additive model, the relationship is expressed as:
Yt = Tt + St
In this case, deseasonalisation involves subtracting the seasonal component from the original data.
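Both forms of deseasonalisation amount to one line of arithmetic each. The following R sketch uses hypothetical sales figures, seasonal indices and seasonal effects (all values are made up for illustration) to show the division with the multiplier of 100 and the subtraction side by side.

```r
# Hypothetical quarterly sales (made-up values).
sales <- c(Q1 = 120, Q2 = 180, Q3 = 240, Q4 = 300)

# Multiplicative model: seasonal indices are percentages, so divide and multiply by 100.
seasonal_index <- c(Q1 = 60, Q2 = 90, Q3 = 110, Q4 = 140)
deseasonalised_mult <- sales / seasonal_index * 100

# Additive model: seasonal effects are in the units of the data, so subtract them.
seasonal_effect <- c(Q1 = -80, Q2 = -20, Q3 = 30, Q4 = 70)
deseasonalised_add <- sales - seasonal_effect

print(round(deseasonalised_mult, 2))
print(deseasonalised_add)
```

In the multiplicative case the indices total 400, and in the additive case the effects total zero, so neither adjustment changes the overall level of the series.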
Example:
Using the same quarterly sales data:
• They rely on historical data, which may not always predict future patterns accurately.
Conclusion
Deseasonalisation is a critical step in time series analysis, allowing analysts to focus on the underlying trends and
cycles in data without the influence of seasonal fluctuations.
\[
D_t = Y_t - Y_{t-1}
\]
where $D_t$ is the first difference at time $t$ and $Y_t$ is the sales at time $t$.
We calculate the first differences:
Differencing
The simplest form of variate difference methods is first-order differencing, where the difference between consecutive
observations is calculated. The first difference is given by:
\[
\Delta Y_t = Y_t - Y_{t-1}
\]
where $Y_t$ is the value at time $t$ and $Y_{t-1}$ is the value at the previous time period.
Month Sales
1 100
2 120
3 130
4 150
5 180
The first-order differences can be calculated as follows:
Month ∆Yt
2 20
3 10
4 20
5 30
Second-Order Differencing
If the time series still shows non-stationarity after first differencing, a second-order differencing can be applied:
\[
\Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1}
\]
This method is useful in capturing the cyclical patterns that may remain even after the first differencing.
Month ∆Yt ∆2 Yt
2 20 −
3 10 10 − 20 = −10
4 20 20 − 10 = 10
5 30 30 − 20 = 10
The second-order differences indicate the rate of change of the first differences, helping us understand the
underlying dynamics of the time series data.
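The two difference tables can be reproduced with R's built-in `diff()` function. A minimal sketch using the five monthly sales values from the example:

```r
# Monthly sales from the example (months 1 to 5).
sales <- c(100, 120, 130, 150, 180)

# First-order differences: Y_t - Y_{t-1}, available from month 2 onwards.
first_diff <- diff(sales)
print(first_diff)

# Second-order differences: differences of the first differences, from month 3 onwards.
second_diff <- diff(sales, differences = 2)
print(second_diff)
```

Each order of differencing shortens the series by one observation, which is why the second-difference column has no entry for month 2.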
5.7.3 Conclusion
The variate difference methods are effective in analyzing trends and fluctuations in time series data. By calculating
and interpreting the differences, we can derive valuable insights into the underlying patterns in the data.
Question Bank
1. Distinguish between seasonal variations and cyclical fluctuations. How would you measure secular trend in any
given data?
2. Describe the method of link relatives for calculating the seasonal variation indices.
3. How would you determine seasonal variation in the absence of trend?
4. Briefly describe the relative merits and demerits of the ratio to trend and ratio to moving average methods.
5. What do you understand by cyclical fluctuations in time series?
6. What do you understand by random fluctuation in time series?
7. Explain the term "Business cycle" and point out the necessity of its study in time series analysis.
8. Calculate seasonal variation for the following data of sales in thousands Rs. of a firm by the Ratio to trend
method.
Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter
1979 30 40 36 34
1980 34 52 50 44
1981 40 58 54 48
1982 52 76 68 62
9. Calculate seasonal indices by the Ratio to moving average method from the following data.
10. The data below gives the average quarterly prices of a commodity for five years. Calculate seasonal indices by
the method of link relatives.
Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter
1979 30 26 22 31
1980 35 28 22 36
1981 31 29 28 32
1982 31 31 25 35
1983 34 36 26 33
Module - 3
Chapter - 1
where:
• $I_t$ is the index number for year $t$.
• $P_t$ is the price or value in year $t$.
1. Economic Barometers: Index numbers act as economic barometers, measuring fluctuations in economic
indicators such as price levels, the money market, and economic cycles like inflation and deflation. According
to G. Simpson and F. Kafka, "Index numbers are among the most widely used statistical devices today, taking
the pulse of the economy and indicating tendencies towards inflation or deflation."
2. Formulation of Economic Policies: Index numbers play a crucial role in guiding economic and business
policies. For instance, when determining the increase in Dearness Allowance (DA) for employees, employers
rely on the Cost of Living Index. Failure to adjust salaries or wages according to cost of living changes can
lead to labor unrest, such as strikes or lockouts.
3. Studying Trends and Tendencies: Index numbers are extensively used to measure changes over time,
forming a time series that helps in analyzing the general trend of a phenomenon. For example, data on imports
over the last 8-10 years might indicate an upward trend.
4. Forecasting Future Economic Activity: Beyond analyzing past and present economic conditions, index
numbers are valuable for forecasting future economic activities, providing insights that help in making informed
decisions.
5. Measuring the Purchasing Power of Money: Index numbers, especially the Cost of Living Index, are
used to determine changes in real wages. Real wages can be calculated using the formula:
\[
\text{Real Wages} = \frac{\text{Money Wages}}{\text{Price Index}} \times 100
\]
This helps assess whether the purchasing power of money is rising, falling, or remaining constant.
6. Deflating Economic Data: Index numbers are crucial for deflating economic data, i.e., adjusting wages,
income, and sales figures for changes in the cost of living. This transformation allows for the calculation of real
wages, real income, and real sales using appropriate index numbers, providing a clearer picture of economic
conditions.
• Advantages:
– Easy to understand and calculate.
– Requires minimal data.
• Disadvantages:
– Does not account for the relative importance of commodities.
– May be misleading if the prices of less important commodities change drastically.
2. Simple Average of Relatives
• This method averages the price relatives of individual items. A price relative is the ratio of the current
year price to the base year price, multiplied by 100.
• To construct a price index number by this method, price relatives are first obtained for the various items
included in the index, and these relatives are then averaged using the mean or the median. With the
arithmetic mean:
\[
I = \frac{\sum \left( \frac{P_t}{P_0} \times 100 \right)}{n}
\]
• Advantages:
– Simple and easy to compute.
– Each commodity’s price change is accounted for.
• Disadvantages:
– Does not consider the quantity or importance of items.
– Sensitive to extreme values.
Commodity P0 Pt Q0
A 50 60 10
B 80 100 5
C 120 150 8
D 30 40 12
Commodity P0 Pt Qt
A 50 60 12
B 80 100 6
C 120 150 9
D 30 40 15
– Example:
Example
Item p0 p1 q0
A 10 12 100
B 20 25 50
C 15 18 80
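\[
L = \frac{(12 \times 100) + (25 \times 50) + (18 \times 80)}{(10 \times 100) + (20 \times 50) + (15 \times 80)} \times 100 = \frac{1200 + 1250 + 1440}{1000 + 1000 + 1200} \times 100 = \frac{3890}{3200} \times 100 \approx 121.56
\]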
Example
Item p0 p1 q1
A 10 12 120
B 20 25 60
C 15 18 90
\[
P = \frac{(12 \times 120) + (25 \times 60) + (18 \times 90)}{(10 \times 120) + (20 \times 60) + (15 \times 90)} \times 100 = \frac{1440 + 1500 + 1620}{1200 + 1200 + 1350} \times 100 = \frac{4560}{3750} \times 100 = 121.60
\]
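Both indices can be recomputed directly from the two item tables. The R sketch below uses the prices and quantities of items A, B and C from the examples; the variable names are illustrative.

```r
# Prices and quantities for items A, B, C taken from the two example tables.
p0 <- c(A = 10, B = 20, C = 15)     # base period prices
p1 <- c(A = 12, B = 25, C = 18)     # current period prices
q0 <- c(A = 100, B = 50, C = 80)    # base period quantities
q1 <- c(A = 120, B = 60, C = 90)    # current period quantities

# Laspeyres' index: base period quantities as weights.
laspeyres <- 100 * sum(p1 * q0) / sum(p0 * q0)

# Paasche's index: current period quantities as weights.
paasche <- 100 * sum(p1 * q1) / sum(p0 * q1)

print(round(c(Laspeyres = laspeyres, Paasche = paasche), 2))
```

The two formulas differ only in which quantity vector multiplies the prices, which is the point made under "Weights" in the comparison.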
Comparison
• Weights: Laspeyres’ index uses base period quantities, while Paasche’s index uses current period quantities.
• Bias: Laspeyres’ index tends to overstate price increases because it does not account for changes in consumption
patterns. Paasche’s index can understate price increases because it uses current period quantities that might
be influenced by price changes.
• Use Case: Laspeyres’ index is easier to compute when historical quantity data is available. Paasche’s index
is more reflective of current consumption patterns.
• $P = \frac{p_1}{p_0} \times 100$ is the price relative.
• $V = p_0 q_0$ is the base year value (price multiplied by quantity in the base year).
Explanation
• The Arithmetic Mean (A.M.) method calculates the simple weighted average of price relatives.
• The Geometric Mean (G.M.) method is used when a more proportional measure is needed, as it reduces
the impact of extreme values.
These methods are commonly used in economic indices to calculate weighted averages, such as price index
numbers, by considering the relative importance of each component.
Link Relatives
The formula for calculating the link relative for a current year is:
\[
\text{Link Relative for Current Year} = \frac{\text{Current Year's Figure}}{\text{Previous Year's Figure}} \times 100
\]
Note
If there is only one commodity and its index is being calculated, the fixed base index number computed directly from
the original data will be equal to the chain index number computed from the link relatives.
Example: 1.
From the following data of wholesale prices of wheat for ten years construct index number taking
1. 1998 as base
Solution
Given Data
Example 3.
Compute the chain base index numbers
Example 4.
Calculate fixed base index numbers from the following chain base index numbers
2. The chain base method does not require recalculation if some more items are introduced or deleted from the
old data.
3. Index numbers calculated using the chain base method are free from seasonal and cyclical variations.
Task
Shift the base from 1998 to 2004 and recast the index numbers.
Solution
The formula for shifting the base year is:
\[
\text{Index Number Based on New Base Year (2004)} = \frac{\text{Old Index Number}}{\text{Index for 2004}} \times 100
\]
Conclusion
The recalculated index numbers with 2004 as the base year reflect the price movements relative to the new base year.
Example 1.
Index A was started in 1993 and continued up to 2003, in which year another index B was started. Splice
index B to index A so that a continuous series of index numbers is obtained.
Solution
7.9 Deflating
Deflating means correcting or adjusting a value that has inflated. It makes allowances for the effect of price changes.
When prices rise, the purchasing power of money declines. For example, if the money incomes of people remain
constant between two periods but the prices of commodities double, the purchasing power of money is reduced to
half.
For instance, if the price of rice increases from Rs.10/kg in the year 1980 to Rs.20/kg in the year 1982, a person
can buy only half a kilogram of rice with Rs.10 in 1982. This implies that the purchasing power of a rupee is only
50 paise in 1982 compared to 1980.
Real Wages
In times of rising prices, money wages should be deflated by the price index to get the figure of real wages. Real
wages alone indicate whether a wage earner is in a better or worse position.
To calculate real wages, the money wages or income are divided by the corresponding price index and multiplied
by 100:
\[
\text{Real Wages} = \frac{\text{Money Wages}}{\text{Price Index}} \times 100
\]
The real wage index can also be computed using the following formula:
\[
\text{Real Wage Index} = \frac{\text{Real Wage of Current Year}}{\text{Real Wage of Base Year}} \times 100
\]
These calculations provide meaningful insights into the actual purchasing power and living standards of individuals
over time.
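The two formulas can be applied to a whole series at once. The R sketch below uses hypothetical money wages and price index values (made up for illustration, with the first year as the base) to compute real wages and the real wage index.

```r
# Hypothetical data: money wages and a price index with the first year as base (= 100).
year        <- 2001:2004
money_wage  <- c(3600, 4200, 5000, 5400)
price_index <- c(100, 120, 160, 180)

# Real wages: money wages deflated by the price index.
real_wage <- money_wage / price_index * 100

# Real wage index: real wage of each year relative to the base year.
real_wage_index <- real_wage / real_wage[1] * 100

print(data.frame(year, money_wage, price_index,
                 real_wage = round(real_wage, 2),
                 real_wage_index = round(real_wage_index, 2)))
```

In this made-up series money wages rise every year while real wages fall, because prices rise faster than wages.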
Exercise 1.
The following table gives the annual income of a worker and the general Index Numbers of price during 1999-2007.
Prepare Index Numbers to show the changes in the real income of the worker and comment on the price increase.
Price Index based on 2010 for A = 120, Price Index based on 2010 for B = 110
Quantity Index based on 2010 for A = 95, Quantity Index based on 2010 for B = 105
Multiplying the price and quantity indices for A and B:
5. Test of Homogeneity
The test of homogeneity ensures that an index number should be applicable to the entire data set or series, regardless
of the types of commodities or components being considered. It means that the variables included in the index should
share the same characteristics, making them compatible for comparison.
Example: If we calculate an index number for different commodities such as food, clothing, and transportation,
they should be comparable if they share similar characteristics, such as being part of the consumer basket. If, for
example, we include a highly volatile commodity like gold in the same index, it could distort the results, as it does
not have the same consumption pattern as food or clothing.
1. Decision about the class of people for whom the index is meant: It is important to decide the target
class of people, such as industrial workers, teachers, officers, or laborers. Additionally, the geographical area
(e.g., a city, industrial area, or locality) should also be specified.
2. Conducting family budget enquiry: After defining the scope, a sample family budget enquiry is conducted
for the target group. This involves selecting a sample of families and analyzing their budgets in detail during
a normal economic period. The enquiry provides information about the average expenditure on different
commodities, categorized as:
• Food
• Clothing
• Fuel and Lighting
• House Rent
• Miscellaneous
3. Collecting retail prices of different commodities: Retail prices are collected from local markets, super
bazaars, or departmental stores frequented by the target group. Since prices may vary by location, shop, and
individual, this step is both critical and challenging.
1. They indicate whether real wages are rising or falling, which helps in determining the purchasing power of
money. The purchasing power of money can be calculated as:
\[
\text{Purchasing Power of Money} = \frac{1}{\text{Cost of Living Index Number}}
\]
Real wages can be computed as:
\[
\text{Real Wages} = \frac{\text{Money Wages} \times 100}{\text{Cost of Living Index Number}}
\]
2. They are used to regulate dearness allowance (D.A.) or grant bonuses to workers, enabling them to cope with
increased living costs.
3. They play a crucial role in wage negotiations.
3. Divide the aggregate expenditure of the current year by the aggregate expenditure of the base year and multiply
the quotient by 100:
\[
\text{Consumer Price Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100
\]
• The price relative of each commodity is $\frac{p_1}{p_0} \times 100$,
where $p_1$ is the price in the current year, and $p_0$ is the price in the base year.
• The weight $v$ for each commodity is given by $v = p_0 q_0$, the value of the commodity in the base year.
Note: It should be noted that the answer obtained by applying the Aggregate Expenditure Method and the
Family Budget Method will be the same.
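The agreement between the two methods can be seen algebraically. The short derivation below assumes the Family Budget index is the value-weighted mean of price relatives, $\sum Pv / \sum v$, with $P = \frac{p_1}{p_0} \times 100$ (the symbol $P$ is used here for the price relative) and $v = p_0 q_0$:

```latex
\frac{\sum P v}{\sum v}
  = \frac{\sum \left( \dfrac{p_1}{p_0} \times 100 \right) p_0 q_0}{\sum p_0 q_0}
  = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 .
```

The base year price $p_0$ cancels inside each term of the numerator, leaving exactly the Aggregate Expenditure formula.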
Example 1.
Construct the consumer price index number for 2007 on the basis of 2006 from the following data using (i) the
aggregate expenditure method, and (ii) the family budget method.
1. Inaccurate specification of groups: Errors may occur if the group for whom the index is meant is not
accurately specified.
2. Faulty selection of representative commodities: This can result from unscientific family budget inquiries,
leading to an unrepresentative selection of commodities.
3. Inadequate and unrepresentative price quotations: If price quotations are inadequate or
unrepresentative, or if inaccurate weights are used, the index number may not reflect the true cost of living.
4. Frequent changes in demand and prices: Fluctuations in the demand and prices of commodities can affect
the reliability of the cost of living index.
5. The average family may not be representative: The average family used in the construction of the index
might not always be a truly representative sample of the target population.
Module - 3
Chapter - 2
8 Forecasting Strategies
Businesses rely on forecasts of sales to plan production, justify marketing decisions, and guide research. A very
efficient method of forecasting one variable is to find a related variable that leads it by one or more time intervals.
The closer the relationship and the longer the lead time, the better this strategy becomes. The trick is to find a
suitable lead variable.
An Australian example is the Building Approvals time series published by the Australian Bureau of
Statistics. This provides valuable information on the likely demand over the next few months for all sectors of the
building industry. The number of building approvals can be a leading indicator of future construction activities.
As the approval of new buildings generally precedes actual construction, businesses can forecast the demand for
construction materials and labor using this data. A variation on the strategy of seeking a leading variable is to find
a variable that is associated with the variable we need to forecast and is easier to predict. For instance, the sales of
winter clothing might be more directly correlated with the weather forecast than with past sales data, making it a
useful variable to predict future demand.
• Approvals: Total dwellings approved per month, averaged over the past three months.
• Activity: The value of building work done in millions of Australian dollars, chain volume measured at the
reference year 2004–05 prices.
The time series objects, App.ts and Act.ts, are created for approvals and activity, respectively. The ts.plot
function plots both series on the same graph, allowing for comparison and analysis of the trends over time.
Figure 16: Building approvals (solid line) and building activity (dotted line).
In Figure 16, we can see that the building activity tends to lag one quarter behind the building approvals, or
equivalently that the building approvals appear to lead the building activity by a quarter. The cross-correlation
function, which is abbreviated to ccf, can be used to quantify this relationship. A plot of the cross-correlation
function against lag is referred to as a cross-correlogram.
8.4 Cross-Correlation
Suppose we have time series models for variables x and y that are stationary in both mean and variance. These
variables may each be serially correlated, and correlated with each other at different time lags. The combined
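model is second-order stationary if all these correlations depend only on the lag. In this case, we can define the
cross-covariance function (ccvf) as a function of the lag k:
\[
\gamma_k(x, y) = E\left[ (x_{t+k} - \mu_x)(y_t - \mu_y) \right]
\]
where $\mu_x$ and $\mu_y$ are the means of x and y. The cross-correlation function (ccf) is the ccvf standardized
by the standard deviations:
\[
\rho_k(x, y) = \frac{\gamma_k(x, y)}{\sigma_x \sigma_y}
\]
Where $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, respectively.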
The ccvf and ccf can be estimated from a time series using their sample equivalents. The sample ccvf ck (x, y) is
calculated as:
\[ c_k(x, y) = \frac{1}{n} \sum_{t=1}^{n-k} (x_{t+k} - \bar{x})(y_t - \bar{y}) \]
The sample ccf is defined as:
\[ r_k(x, y) = \frac{c_k(x, y)}{\sqrt{c_0(x, x)\, c_0(y, y)}} \]
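The sample formulas can be coded directly and compared with R's built-in ccf function. This is a sketch under stated assumptions: the helper names sample_ccvf and sample_ccf are hypothetical, and the data are simulated so that x is a copy of y delayed by one step.

```r
# Sample cross-covariance c_k(x, y) for a non-negative lag k (divisor n)
sample_ccvf <- function(x, y, k) {
  n <- length(x)
  t <- seq_len(n - k)
  sum((x[t + k] - mean(x)) * (y[t] - mean(y))) / n
}

# Sample cross-correlation r_k(x, y) = c_k(x, y) / sqrt(c_0(x, x) * c_0(y, y))
sample_ccf <- function(x, y, k) {
  sample_ccvf(x, y, k) / sqrt(sample_ccvf(x, x, 0) * sample_ccvf(y, y, 0))
}

# Simulated data (an assumption): x[t] = y[t - 1] + noise
set.seed(1)
y <- rnorm(200)
x <- c(0, y[-200]) + rnorm(200, sd = 0.5)

r_manual <- sapply(0:3, function(k) sample_ccf(x, y, k))

# Built-in estimate; lag k relates x[t + k] to y[t]
cc <- ccf(x, y, lag.max = 3, plot = FALSE)
r_builtin <- cc$acf[cc$lag %in% 0:3]

round(cbind(lag = 0:3, manual = r_manual, builtin = r_builtin), 3)
```

Because x trails y by one step, the largest value should appear at lag 1, and the two columns should agree.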
Figure 17: Correlogram and cross-correlogram for building approvals and building activity.
In Figure 17, the autocorrelations for x and y are in the upper left and lower right frames, respectively, and the
cross-correlations are in the lower left and upper right frames. The time unit for the lag is one year, so a correlation
at a lag of one quarter appears at 0.25. If the variables are independent, we would expect 5% of sample correlations
to lie outside the dashed lines. Several of the cross-correlations at negative lags pass these lines, indicating that the
approvals time series is leading the activity. Numerical values of the correlations at lags of 0, 1, 2, and 3 can be
printed using the print() function.
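The 5% figure for independent series can be illustrated by simulation. The sketch below assumes two independent white-noise series (simulated, not the building data) and uses the approximate 95% bounds of plus or minus qnorm(0.975)/sqrt(n), which is how R's default correlogram plot places its dashed lines.

```r
set.seed(123)
n <- 400
x <- rnorm(n)  # two independent white-noise series
y <- rnorm(n)

cc <- ccf(x, y, lag.max = 20, plot = FALSE)

# Approximate 95% bounds, matching the dashed lines on the default plot
bound <- qnorm(0.975) / sqrt(n)

# Proportion of the 41 sample cross-correlations outside the bounds
frac_outside <- mean(abs(cc$acf) > bound)
frac_outside
```

The proportion varies from sample to sample but should be small, in the region of 0.05.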
The ccf can be calculated for any two time series that overlap, but if both have trends or similar seasonal effects,
these will dominate (Exercise 1). It is often necessary to remove the trend and seasonal effects before investigating
cross-correlations. Here, we use the decompose function, which uses a centered moving average of four quarters (see
Fig. 18).
We perform the decomposition for both series as follows:
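# Extract the random (remainder) component of each series; decompose()
# leaves NA values at both ends because of the centered moving average
app.ran <- decompose(App.ts)$random
app.ran.ts <- window(app.ran, start = c(1996, 3))
act.ran <- decompose(Act.ts)$random
act.ran.ts <- window(act.ran, start = c(1996, 3))

# Correlogram and cross-correlogram; na.pass tolerates the trailing NA values
acf(ts.union(app.ran.ts, act.ran.ts), na.action = na.pass)
ccf(app.ran.ts, act.ran.ts, na.action = na.pass)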
The output displays the autocorrelations for the approvals and activity series as well as the cross-correlations
between the two series.
Figure 19: Cross-correlogram of the random components of building approvals and building activity after using
decompose
The ccf function produces a single plot, shown in Figure 19, illustrating the lagged relationship between the two
time series. The Australian Bureau of Statistics publishes building approvals data by state and other categories, and
specific sectors of the building industry may find higher correlations between demand for their products and one of
these series.
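\[ N_{t+1} = N_t + p(m - N_t) + q \frac{N_t (m - N_t)}{m} \]
This equation states that the increase in sales over the next period, $N_{t+1} - N_t$, is the sum of two components,
each taken from the $m - N_t$ people who will eventually buy the product but have not yet done so:
• A fixed proportion $p$.
• A time-varying proportion $q N_t / m$.
The corresponding expression for $N_t$ is:
\[ N_t = \frac{m \left( 1 - e^{-(p+q)t} \right)}{1 + \frac{q}{p} e^{-(p+q)t}} \]
The recurrence above is the discrete-time form of the model. A continuous-time version also exists, in which this
expression for $N_t$ is easier to verify mathematically.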
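To see how the recurrence behaves, it can be iterated from zero sales and set beside the closed-form expression. This is an illustrative sketch: the function names are hypothetical, and the values of m, p, and q are borrowed from the starting values used in the example later in this section, not fitted estimates. Since the closed form is verified for the continuous-time version, the two columns are expected to be similar in shape but need not agree exactly.

```r
# Iterate N_{t+1} = N_t + p (m - N_t) + q N_t (m - N_t) / m from N_0 = 0
bass_recurrence <- function(m, p, q, n_steps) {
  N <- numeric(n_steps + 1)  # N[1] holds N_0 = 0
  for (t in seq_len(n_steps)) {
    N[t + 1] <- N[t] + p * (m - N[t]) + q * N[t] * (m - N[t]) / m
  }
  N
}

# Closed-form expression for N_t
bass_closed_form <- function(m, p, q, t) {
  e <- exp(-(p + q) * t)
  m * (1 - e) / (1 + (q / p) * e)
}

# Illustrative parameter values (assumed, not estimated)
m <- 60630
p <- 0.03
q <- 0.38

N_rec <- bass_recurrence(m, p, q, 30)
N_cf <- bass_closed_form(m, p, q, 0:30)
head(round(cbind(t = 0:30, recurrence = N_rec, closed_form = N_cf)), 5)
```

Both sequences start at zero, increase monotonically, and level off at m.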
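The hazard at time t is defined in terms of the probability density function f(t) and the cumulative distribution
function F(t) of the time at which a person adopts the product:
\[ h(t) = \frac{f(t)}{1 - F(t)} \]
The hazard function in the Bass model is given by:
\[ h(t) = p + q F(t) \]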
This shows that the hazard depends on the cumulative proportion of people who have adopted the product by
time t. The cumulative distribution function F (t) can be expressed as:
\[ F(t) = \frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p} e^{-(p+q)t}} \]
Two special cases of the distribution occur when q = 0 (exponential distribution) and p = 0 (logistic distribution).
The logistic distribution resembles the normal distribution.
The probability density function is the derivative of the cumulative distribution function:
\[ f(t) = \frac{(p+q)^2 e^{-(p+q)t}}{p \left[ 1 + \frac{q}{p} e^{-(p+q)t} \right]^2} \]
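The relationships among F(t), f(t), and the hazard can be checked numerically. The sketch below uses hypothetical helper names and illustrative values of p and q. It compares a central-difference derivative of F with f, and compares f(t)/(1 − F(t)) with p + qF(t).

```r
bass_cdf <- function(t, p, q) {
  e <- exp(-(p + q) * t)
  (1 - e) / (1 + (q / p) * e)
}

bass_pdf <- function(t, p, q) {
  e <- exp(-(p + q) * t)
  (p + q)^2 * e / (p * (1 + (q / p) * e)^2)
}

# Illustrative parameter values (assumed)
p <- 0.03
q <- 0.38
t <- seq(0.5, 15, by = 0.5)

# f(t) should match the numerical derivative of F(t)
h <- 1e-5
num_deriv <- (bass_cdf(t + h, p, q) - bass_cdf(t - h, p, q)) / (2 * h)
max_deriv_gap <- max(abs(num_deriv - bass_pdf(t, p, q)))

# The hazard f / (1 - F) should match p + q F
hazard_ratio <- bass_pdf(t, p, q) / (1 - bass_cdf(t, p, q))
hazard_bass <- p + q * bass_cdf(t, p, q)
max_hazard_gap <- max(abs(hazard_ratio - hazard_bass))

c(derivative_gap = max_deriv_gap, hazard_gap = max_hazard_gap)
```

Both gaps should be at the level of numerical rounding error.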
The time to peak sales, $t_{\text{peak}}$, is the time at which the sales rate f(t) is maximized. It is given by:
\[ t_{\text{peak}} = \frac{\log(q) - \log(p)}{p + q} \]
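As a quick check of this formula, the density can be maximized numerically and the location of the maximum compared with the formula. The values of p and q are illustrative assumptions, and bass_pdf is a hypothetical helper name.

```r
bass_pdf <- function(t, p, q) {
  e <- exp(-(p + q) * t)
  (p + q)^2 * e / (p * (1 + (q / p) * e)^2)
}

# Illustrative parameter values (assumed)
p <- 0.03
q <- 0.38

# Closed-form time to peak sales (about 6.2 time units for these values)
t_peak <- (log(q) - log(p)) / (p + q)

# Numerical maximization of f(t) over a wide interval
opt <- optimize(bass_pdf, interval = c(0, 20), p = p, q = q, maximum = TRUE)

c(formula = t_peak, numerical = opt$maximum)
```

The two values should agree to within the tolerance of optimize.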
8.6.3 Example
In this example, we fit the Bass model to the yearly sales of VCRs in the US home market between 1980 and 1989
using the R non-linear least squares function nls. The variable T79 is the number of years since 1979, while Tdelt
denotes the time since 1979 at a finer resolution (0.1 year) for plotting the Bass curves. The cumulative sum function
cumsum is useful for monitoring changes in the mean level of the process.
The sales data and the cumulative sales are given by:
Sales = {840, 1470, 2110, 4000, 7590, 10950, 10530, 9470, 7790, 5890}
Cumulative Sales = cumsum(Sales)
We fit the Bass model using the following R code:
# Years since 1979 (1 = 1980, ..., 10 = 1989)
T79 <- 1:10
# Finer time grid (0.1 year steps) for plotting the fitted curves
Tdelt <- (1:100) / 10
Sales <- c(840, 1470, 2110, 4000, 7590, 10950, 10530, 9470, 7790, 5890)
Cusales <- cumsum(Sales)
# Fit sales per year as M * f(t), where f(t) is the Bass density
Bass.nls <- nls(Sales ~ M * (((P + Q)^2 / P) * exp(-(P + Q) * T79)) /
                  (1 + (Q / P) * exp(-(P + Q) * T79))^2,
                start = list(M = 60630, P = 0.03, Q = 0.38))
summary(Bass.nls)
Figure 20: Bass sales curve fitted to sales of VCRs in the US home market, 1980–1989.
Plotting the fitted model gives two figures: the first shows the predicted sales per year (Figure 20) and the second
shows the cumulative sales (Figure 21).
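One way such plots might be produced is sketched below. It refits the model so that the block is self-contained, then evaluates the Bass sales and cumulative sales curves on the finer grid Tdelt. The intermediate names Bcoef, ngete, Bpdf, and Bcdf are choices made for this sketch.

```r
T79 <- 1:10
Tdelt <- (1:100) / 10
Sales <- c(840, 1470, 2110, 4000, 7590, 10950, 10530, 9470, 7790, 5890)
Cusales <- cumsum(Sales)
Bass.nls <- nls(Sales ~ M * (((P + Q)^2 / P) * exp(-(P + Q) * T79)) /
                  (1 + (Q / P) * exp(-(P + Q) * T79))^2,
                start = list(M = 60630, P = 0.03, Q = 0.38))

# Extract the fitted parameters
Bcoef <- coef(Bass.nls)
m <- Bcoef[["M"]]
p <- Bcoef[["P"]]
q <- Bcoef[["Q"]]

# Evaluate the fitted sales curve m * f(t) and cumulative curve m * F(t)
ngete <- exp(-(p + q) * Tdelt)
Bpdf <- m * ((p + q)^2 / p) * ngete / (1 + (q / p) * ngete)^2
Bcdf <- m * (1 - ngete) / (1 + (q / p) * ngete)

# Fitted sales per year with the observed sales as points
plot(Tdelt, Bpdf, type = "l", xlab = "Year from 1979", ylab = "Sales per year")
points(T79, Sales)

# Fitted cumulative sales with the observed cumulative sales as points
plot(Tdelt, Bcdf, type = "l", xlab = "Year from 1979", ylab = "Cumulative sales")
points(T79, Cusales)
```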
Figure 21: Bass cumulative sales curve, obtained as the integral of the sales curve, and cumulative sales of VCRs in
the US home market, 1980–1989.
Product                            m               p       q
Typical product                    -               0.030   0.380
35 mm projectors (1965–1986)       3.37 × 10^6     0.009   0.173
Overhead projectors (1960–1970)    0.961 × 10^6    0.028   0.311
PCs (1981–2010)                    3.384 × 10^9    0.001   0.195
Although forecasts based on the Bass model are inherently uncertain, they offer the best available information for
marketing and investment decisions. Scenarios can be developed based on the most likely, optimistic, and pessimistic
sets of parameters.
The HoltWinters() function can be used for simple exponential smoothing by setting beta=FALSE and gamma=FALSE,
which disables the trend and seasonal components.
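A minimal sketch of this call is shown below. The Nile series (annual river flow, shipped with R) is an assumed stand-in chosen only because it has no obvious trend or seasonality. It is not a data set used in these notes.

```r
# Simple exponential smoothing: trend and seasonal components switched off
fit <- HoltWinters(Nile, beta = FALSE, gamma = FALSE)

# Smoothing parameter estimated by minimizing the one-step-ahead squared errors
fit$alpha

# With no trend or seasonal term, the forecasts stay flat at the last level
fc <- predict(fit, n.ahead = 3)
fc

# Observed series with the fitted values overlaid
plot(fit)
```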
In this code:
• seasonal="additive" is used when the seasonal component is additive.
• seasonal="multiplicative" is used when the seasonal component is multiplicative.
This will plot the original data (black), the fitted values for the additive model (blue), and the forecasted values
for the next 3 periods (red).
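A plot of the kind described above might be produced along the following lines. This is a sketch, not the original listing: it assumes the built-in AirPassengers series as the data, and the object names hw_add, hw_mult, and fc are hypothetical.

```r
# Fit Holt-Winters models with additive and multiplicative seasonality
hw_add <- HoltWinters(AirPassengers, seasonal = "additive")
hw_mult <- HoltWinters(AirPassengers, seasonal = "multiplicative")

# Forecast the next 3 periods (months) from the additive model
fc <- predict(hw_add, n.ahead = 3)

# Original data (black), additive fitted values (blue), forecasts (red)
plot(AirPassengers, xlim = c(1949, 1961.5), col = "black",
     ylab = "Passengers (1000s)")
lines(hw_add$fitted[, "xhat"], col = "blue")
lines(fc, col = "red")

# Sum of squared one-step-ahead errors for each seasonal form
c(additive = hw_add$SSE, multiplicative = hw_mult$SSE)
```

Comparing the two SSE values is one informal way to judge which seasonal form suits a series better.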
• Holt-Winters method is appropriate when there is a trend or seasonal component in the data. Use the
additive model when the seasonal variations are roughly constant, and the multiplicative model when they are
proportional to the level of the series.
Module - 4
Chapter - 1
Module - 4
Chapter - 2
Module - 5
Chapter - 1
11 Linear Models
11.1 Moving Average models
11.2 Fitted MA Models
11.2.1 Autoregressive Moving Average Models