A Method For Visualizing Multivariate Time Series Data
A Method For Visualizing Multivariate Time Series Data
A Method For Visualizing Multivariate Time Series Data
Abstract
Visualization and exploratory analysis is an important part of any data
analysis and is made more challenging when the data are voluminous and
high-dimensional. One such example is environmental monitoring data,
which are often collected over time and at multiple locations, resulting in a
geographically indexed multivariate time series. Financial data, although
not necessarily containing a geographic component, present another source
of high-volume multivariate time series data. We present the mvtsplot
function which provides a method for visualizing multivariate time series
data. We outline the basic design concepts and provide some examples of
its usage by applying it to a database of ambient air pollution measure-
ments in the United States and to a hypothetical portfolio of stocks.
1 Introduction
Multivariate time series data are collected in a number of different fields rang-
ing from environmental health to finance and the physical sciences. In some
instances, the data may be geographically indexed, so that each individual time
series can be thought of as representing a specific location. In other instances,
the time series may be grouped together because they share an unobserved char-
acteristic or are correlated for some other reason. For example, time series of
stock or mutual fund prices might be grouped together because they fall into a
predefined asset class.
Monitoring of environmental factors over time and space is a common com-
ponent of a number of important applications. Monitoring can provide data to
produce descriptive analyses of a given geographic area, to serve as early warn-
ing systems for various contaminants, or to assess compliance with regulatory
mandates. Often, monitoring is conducted in many different locations using a
fixed sampling rate over a period of time. Taken together, the data from each
1
of the monitors can be thought of as a geographically indexed multivariate time
series.
Monitoring data for ambient air pollutants in particular is abundant due to
the US Environmental Protection Agency’s mandate to enforce National Am-
bient Air Quality Standards throughout the country. The EPA’s Air Quality
System is a national network of monitors designed to collect near-daily samples
of particulate matter, ozone, nitrogen dioxide, sulfur dioxide, carbon monox-
ide, and lead. Other important monitoring networks include the National
Weather Service’s network for measuring temperature and humidity and the
EPA’s Chemical Speciation Network for collecting data on the chemical compo-
nents of fine particulate matter. Given such available data, it is important to
develop tools for exploratory analysis and for summarizing the data.
2
Los Angeles, CA
10 20 30 40 50
ppb
Chicago, IL
50
30
ppb
0 10
Bronx, NY
60
40
ppb
20
0
Figure 1: Daily ozone levels in parts per billion (ppb) for Los Angeles, CA,
Chicago, IL, and Bronx, NY, for the years 1999–2005.
3
X 20
X 19
X 18
X 17
X 16
X 15
X 14
X 13
X 12
X 11
X 10
X9
X8
X7
X6
X5
X4
X3
X2
X1
colors. The colors used are taken from palettes provided by the RColorBrewer
package (Neuwirth, 2007). By default, we use the diverging palette PRGn, which
assigns purple to low values, grey to medium values, and green to high values.
The discretization of the data values is an important aspect of the plotting
code. Our default of three levels is arbitrary, but it provides the user with the
ability to visualize variation in the data while also acting as a simple smoother.
In particular, the discretization handles highly skewed data and outliers by
not allowing those skewed values to push the remaining values into a relatively
narrow range of the color spectrum. The choice of the number of levels for
discretization depends on the nature of the data and is analogous to choosing
the number of degrees of freedom to use in a smoother. Very smooth data can
usually be plotted with many levels; very noisy or spikey data typically need
to be plotted with fewer levels to be useful in an image plot. Currently, the
mvtsplot function discretizes the data using quantiles of the time series values.
Therefore, using the default of three levels implies dividing the data into tertiles
with roughly an equal number of points in each.
An example of the default plot created by mvtsplot is shown in Figure 2
using simulated data for 20 time series of length 200. The simulated data
actually come from two groups. Ten of the series have an error distribution
that is mean 0 and variance 1 while the other 10 come from a distribution that
is mean 1 and variance 4. The black horizontal line in the middle of Figure 2
indicates the separation between the two groups of time series. This line is
drawn when the group argument is non-NULL.
Although the two groups of time series come from different distributions,
the plot appears the same for both because, by default, each time series is
categorized individually. Hence, each time series is assigned to “low”, “medium”,
and “high” using an internal definition of those categories. One can change this
4
X 20
X 19
X 18
X 17
X 16
X 15
X 14
X 13
X 12
X 11
X 10
X9
X8
X7
X6
X5
X4
X3
X2
X1
behavior by setting the argument norm = "global" which normalizes each time
series using a definition of the categories based on data from all the time series,
not just the time series under consideration. Figure 3 shows the data when they
are normalized using the global categories. One can see now that the top group
has an overall higher level, producing mostly green values while the bottom
group is lower, producing mostly purple values.
While normalizing the data using global categories can provide some insight
into the differences between the time series, it is often more useful to use the
internal normalization and to show information about the overall levels in the
margins. Setting the margin option to TRUE (the default) makes mvtsplot show
summary information about the time series in the bottom and right hand sides
of the image plot. Specifically, on the right hand side panel are boxplots of the
data in each time series and on the bottom panel are median values across all the
time series for each time point. Figure 4 shows this plot for the simulated data.
Now we can see in the right hand side panel that the top group not only has a
higher level but also a larger variance than the bottom group. In the bottom
panel we see that on average across the 40 time series, there is a downward
trend that was not as easily seen in the image plot. In Figure 4, we have also
increased the number of colors used in the image to 7 from the default of 3.
In many exploratory statistical analyses, it is often useful to smooth the data
to obtain a clearer picture of the trends. The mvtsplot function has an option
called smooth.df which can be used to apply a natural spline smoother to each
of the time series in the time series matrix. The smooth.df argument specifies
the number of degrees of freedom to be used in the natural spline smoother for
each of the time series (the same number is used for each series). Once the
smoother is fit to the data, each observed value is replaced by its fitted value
and then plotted. Missing data are replaced by their predicted values, so this
5
X 20 ●
X 19 ●
X 18 ●
X 17 ●
X 16 ●
X 15 ●
X 14 ●
X 13 ●
X 12 ●
X 11 ●
X 10 ●
X9 ●
X8 ●
X7 ●
X6 ●
X5 ●
X4 ●
X3 ●
X2 ●
X1 ●
1.5
−4 0 2 4 6
1.0
0.5
Level
0.0
−0.5
−1.0
6
option can be used to fill in occasional missing data. While it is probably not
wise to fill in long stretches of missing data (and mvtsplot attempts to detect
such long stretches at the beginning and at the end of the series), missing values
could reasonably be filled in if the underlying process were sufficiently smooth.
Missing values often obscure the overall picture in a dataset and filling in values
can be useful for visualization purposes if not for inference. We make use of this
in the example in Section 3.2.
These arguments can be used to control the display and the data that are
plotted.
x: an n × p matrix representing a multivariate time series where n is the
length of each series and p is the number of time series. The matrix is
interpreted such that the first row is the first time point and the last row
is the last time point. If x is a data frame, it is converted to a matrix
using data.matrix.
group: a vector of length equal to the number of columns of x. Each
element of the vector should assign a column of x into a group. If group
is not a factor it will be coerced to be one.
xtime: a vector of dates or times. This vector should inherit from class
Date or POSIXt. If xtime is NULL (the default) then the vector seq_len(nrow(x))
is used.
norm: possible values are “internal” and “global”. Should categorization of
the time series be based on within-series categories (“internal”) or using a
global set of categories (“global”)?
levels: the number of categories or a vector of ranges describing the
categories into which each time series should be divided. When specifying
ranges, the vector should be in the form of quantiles (i.e. between 0 and
1).
smooth.df: the number of degress of freedom to use in a natural spline
smooth of the data. The default is NULL which indicates no smoothing.
7
margin: logical, indicating whether the marginal summary information
should be plotted on the bottom and right sides. The default is TRUE.
sort: the function for determining the sorting of the columns of the time
series matrix. The default (sort = NULL) is no sorting but one can specify
an arbitrary function to sort the columns in increasing order of the values
of that function applied to the columns of the time series matrix.
main: a character string indicating a name for the plot. This name will
be plotted in the lower right corner of the overall plot.
palette: the RColorBrewer palette to be used for the image plot. The
default is PRGn which uses green to indicate high values and purple to
indicate low values.
xlim: the range of x values for plotting. If xtime is non-NULL, then xlim
should be expressed in the same values.
gcol: the color of the grid lines used to indicate the boundary between
different groups (only used when group is non-NULL).
3 Examples
3.1 Ozone in the United States
Ozone (O3 ) is one of the “criteria pollutants” monitored by the US EPA and has
been shown in numerous studies to have a short-term effect on daily mortality
and hospitalization (Samet et al., 2000; Bell et al., 2004, see e.g.). Figure 5
shows daily ozone in the 100 largest US counties, ordered from top to bottom
by decreasing latitude. These data were obtained from the EPA’s Air Quality
System. When multiple monitors for ozone were available on a given day in
a given county, the values for those monitors were averaged together using a
trimmed mean to create a county-wide average. From the bottom panel we see
that ozone is highly seasonal across the US with a summer peak and winter
trough. Indeed, ozone development depends on sunlight and is correlated with
temperature. In the right hand side panel we see that the overall levels of
8
ozone as well as the within-county ranges of variation are similar across the 100
counties.
Missing data in Figure 5 are denoted by the color white. Many of the counties
have blocks of missing data equal to about six months during the late fall, winter,
and early spring periods. This is because ozone is often not monitored in areas
where the levels are very low during the winter season. Notice that as the
latitude decreases (towards the bottom of the plot), the number of counties
with all-year monitoring increases. Also, in the lower latitudes, the pattern of
ozone variation tends to be less strongly seasonal, indicated by a more scattered
pattern of greens and purples as opposed to long blocks of either color. This
presumably reflects the relative similarity of the different seasons in the warmer
latitudes.
9
to smooth each of the time series in the matrix. The result of smoothing the
sulfate time series is shown in Figure 7. Here we use a natural spline smoother
with 16 degrees of freedom. We see clearly that sulfate tends to peak in the
summer time and reach its lowest levels in the winter. The bottom panel shows
the median of the smooths across counties. We also see that this general seasonal
pattern holds for almost all of the counties, regardless of what region they fall
in.
One component of PM2.5 that is thought to be particularly harmful is
nickel (Lippmann et al., 2006). Compared to sulfate, nickel makes up a very
small fraction of total PM2.5 mass. Figure 8 shows the smoothed daily nickel
levels for the 98 US counties, sorted by their overall median level. The data here
are smoothed using a natural spline smoother with 32 degrees of freedom. In
contrast to sulfate, we do not see any obvious seasonal or other temporal trend
in the data, either overall or for any particular county. Also, we can see that in
general, levels of nickel are very low with the exception of a few counties. Bronx,
Queens, and New York counties (all in New York state) all have levels of nickel
that are much higher than in all other counties. Similarly, these three counties
have a much larger range of variation across time. One anomaly in Figure 8 is
a large peak in the wither of 2005. It is not immediately clear why such a rela-
tively large peak occurs during this time across many different counties except
that nickel tends to be highly variable across time and that such a peak simply
occurred by chance.
Figure 9 shows 40 of the 56 components measured by the Speciation Trends
Network for Bronx County, New York. The 40 components shown in Figure 9
are the 40 largest components as a percent of the total PM2.5 mass. We see from
the right hand side panel that handful of components—sulfate, organic carbon
(OC_K14), ammonium, nitrate, and elemental carbon—dominate the total PM2.5
mass. The lower panel uses a different summary statistic than the previous plots.
Since the median component level across components is not interesting, we set
the rowstat argument to be the sum function so that the bottom panel shows
the sum of all the component levels, closely approximating the total PM2.5 mass.
Note that we have applied a smoother with 16 degrees of freedom to each of the
columns of the component matrix.
One interesting feature of the data that is shown in the bottom panel is the
presence of roughly two peaks in each calendar year, one in summer and one in
winter. From the image plot, it would appear that the summer peak is largely
driven by the levels of organic carbon and sulfate while the winter peak is driven
by nitrate and elemental carbon. Ammonium appears to correlate with both
peaks, which is not surprising since it appears in the air as either ammonium
nitrate or ammonium sulfate.
10
cal price data for stocks and mutual funds in the US. The R user already has
a number of tools for downloading financial data, including the tseries pack-
age (Trapletti and Hornik, 2007). There are numerous other packages available
on CRAN for analyzing financial time series data and for calculating many
common statistics.
In this section we present an example of visualizing an hypothetical portfolio
of stocks and exchange traded funds (ETFs). We have collected daily price data
for a portfolio consisting of an S&P 500 index exchange traded fund (ticker
symbol SPY), an ETF tracking the Lehman Brothers Aggregate Bond Index
(AGG), an ETF tracking an index of Treasury Inflation Protected Securities
(TIP), and nine large real estate investment trusts (REITs)1 . The data were
obtained using the get.hist.quote function from the tseries package.
Figure 10 shows the portolio over the period 2006–2007. We took an hypo-
thetical $10,000 investment and divided it equally between the portfolio com-
ponents. The bottom panel shows the sum of the components, i.e. the market
value of the entire portfolio. One can see that the REITs rose dramatically in
2006 and early 2007 and then declined towards the end of the period. The S&P
500 index fund increases in value more or less steadily throughout the period
but its relatively small allocation is not sufficient to prop up the market value
of the portfolio. The right hand side panel shows the differences in variation
between the holdings. As one might expect, the two bond funds indicate the
least variation while the REITs indicate much higher levels of variability.
4 Discussion
We have presented the mvtsplot function for producing plots of large-scale
multivariate time series data that can be useful for exploratory analysis prior to
formal model fitting. We have demonstrated the function’s usage by applying
it to data on ambient air pollution monitoring as well as stock prices.
Many choices had to be made in order to make the default plots reasonable.
In particular, an attempt is made to calculate the appropriate positioning of
the axis labels, but this system is easily thwarted. The mvtsplot function also
uses the layout function to position the different panels when margin = TRUE,
and so subsequent modification of the individual panels is not possible. We
encourage interested users to make their own modifications to the code to suit
their unique purposes.
References
Bell ML, Dominici F, Ebisu K, Zeger SL, Samet JM (2007). “Spatial and tem-
poral variation in PM2.5 chemical composition in the United States for health
effects studies.” Environmental Health Perspectives, 115, 989–995.
1 Ticker symbols SPG, PLD, VNO, BXP, EQR, GGP, HST, KIM, PSA
11
Bell ML, McDermott A, Zeger SL, Samet JM, Dominici F (2004). “Ozone and
Short-term Mortality in 95 US Urban Communities, 1987-2000.” Journal of
the American Medical Association, 292(19), 2372–2378.
Samet JM, Zeger SL, Dominici F, et al (2000). The National Morbidity, Mor-
tality, and Air Pollution Study, Part II: Morbidity and Mortality from Air
Pollution in the United States. Health Effects Institute, Cambridge MA.
A Figures
12
King, WA ●
Pierce, WA ●
Multnomah, OR ●
Monroe, NY ●
Milwaukee, WI ●
Kent, MI ●
Erie, NY ●
Essex, MA ●
Macomb, MI ●
Oakland, MI ●
Middlesex, MA ●
Wayne, MI ●
Suffolk, MA ●
Worcester, MA ●
Lake, IL ●
Norfolk, MA ●
DuPage, IL ●
Providence, RI ●
Cook, IL ●
Bristo, MA ●
Hartford, CT ●
Cuyahoga, OH ●
New Haven, CT ●
Fairfield, CT ●
Summit, OH ●
Westchester, NY ●
Bergen, NJ ●
Bronx, NY ●
Suffolk, NY ●
Essex, NJ ●
Hudson, NJ ●
Queens, NY ●
Salt Lake, UT ●
Middlesex, NJ ●
Allegheny, PA ●
Philadelphia, PA ●
Franklin, OH ●
Delaware, PA ●
Camden, NJ ●
Marion, IN ●
Montgomery, OH ●
Denver, CO ●
Baltimore, MD ●
Baltimore (city), MD ●
Hamilton, OH ●
Montgomery, MD ●
District of Columbia, DC ●
Prince George's, MD ●
Fairfax, VA ●
St. Louis, MO ●
Sacramento, CA ●
Jefferson, KY ●
San Joaquin, CA ●
Contra Costa, CA ●
San Francisco, CA ●
Alameda, CA ●
San Mateo, CA ●
Santa Clara, CA ●
Fresno, CA ●
Davidson, TN ●
Clark, NV ●
Tulsa, OK ●
Wake, NC ●
Oklahoma, OK ●
Kern, CA ●
Mecklenburg, NC ●
Shelby, TN ●
Bernalillo, NM ●
San Bernardino, CA ●
Los Angeles, CA ●
Cobb, GA ●
DeKalb, GA ●
Riverside, CA ●
Fulton, GA ●
Jefferson, AL ●
Maricopa, AZ ●
San Diego, CA ●
Dallas, TX ●
Tarrant, TX ●
Pima, AZ ●
El Paso, TX ●
Travis, TX ●
Duval, FL ●
Harris, TX ●
Bexar, TX ●
Orange, FL ●
Hillsborough, FL ●
Pinellas, FL ●
Palm Beach, FL ●
Hidalgo, TX ●
Broward, FL ●
Miami−Dade, FL ●
Honolulu, HI ●
0 20 40 60
40
Level
30
20
10
Figure 5: Daily ozone for the largest 100 US counties, 2002–2005. Counties are
sorted from top to bottom by decreasing latitude.
13
Davis, UT ●
Hinds, MS ●
Washoe, NV ●
Pulaski, AR ●
Adams, CO ●
E. Baton Rouge, LA ●
Bernalillo, NM ●
Tulsa, OK ●
Multnomah, OR ●
Kern, CA ●
El Paso, TX ●
Fresno, CA ●
Pima, AZ ●
Shelby, TN ●
Salt Lake, UT ●
Sacramento, CA ●
Clark, NV ●
Riverside, CA ●
Santa Clara, CA ●
King, WA ●
Dallas, TX ●
San Diego, CA ●
Maricopa, AZ ●
Los Angeles, CA ●
Kanawha, WV ●
Buncombe, NC ●
Lackawanna, PA ●
Chatham, GA ●
Leon, FL ●
Dauphin, PA ●
Mahoning, OH ●
Madison, IL ●
Fayette, KY ●
Henrico, VA ●
Northampton, PA ●
Madison, AL ●
Rockingham, NH ●
Erie, PA ●
Lorain, OH ●
Escambia, FL ●
Forsyth, NC ●
Charleston, SC ●
Richland, SC ●
Washtenaw, MI ●
Butler, OH ●
Waukesha, WI ●
Polk, IA ●
Stark, OH ●
Greenville, SC ●
Hillsborough, NH ●
York, PA ●
Knox, TN ●
Mobile, AL ●
Guilford, NC ●
Lucas, OH ●
Hampden, MA ●
Morris, NJ ●
Lancaster, PA ●
Anne Arundel, MD ●
New Castle, DE ●
Camden, NJ ●
Ramsey, MN ●
Union, NJ ●
Summit, OH ●
Delaware, PA ●
Montgomery, OH ●
Davidson, TN ●
District of Columbia, DC ●
Kent, MI ●
Providence, RI ●
Wake, NC ●
Jefferson, AL ●
DeKalb, GA ●
Suffolk, MA ●
Jefferson, KY ●
Mecklenburg, NC ●
Monroe, NY ●
Middlesex, NJ ●
Baltimore, MD ●
New Haven, CT ●
Hamilton, OH ●
Marion, IN ●
DuPage, IL ●
Milwaukee, WI ●
Erie, NY ●
Hillsborough, FL ●
Franklin, OH ●
Hennepin, MN ●
Allegheny, PA ●
Bronx, NY ●
Cuyahoga, OH ●
Philadelphia, PA ●
New York, NY ●
Wayne, MI ●
Queens, NY ●
Miami−Dade, FL ●
Cook, IL ●
0 2 4 6 8 10 14
20
15
Level
10
14
Davis, UT ●
Hinds, MS ●
Washoe, NV ●
Pulaski, AR ●
Adams, CO ●
E. Baton Rouge, LA ●
Bernalillo, NM ●
Tulsa, OK ●
Multnomah, OR ●
Kern, CA ●
El Paso, TX ●
Fresno, CA ●
Pima, AZ ●
Shelby, TN ●
Salt Lake, UT ●
Sacramento, CA ●
Clark, NV ●
Riverside, CA ●
Santa Clara, CA ●
King, WA ●
Dallas, TX ●
San Diego, CA ●
Maricopa, AZ ●
Los Angeles, CA ●
Kanawha, WV ●
Buncombe, NC ●
Lackawanna, PA ●
Chatham, GA ●
Leon, FL ●
Dauphin, PA ●
Mahoning, OH ●
Madison, IL ●
Fayette, KY ●
Henrico, VA ●
Northampton, PA ●
Madison, AL ●
Rockingham, NH ●
Erie, PA ●
Lorain, OH ●
Escambia, FL ●
Forsyth, NC ●
Charleston, SC ●
Richland, SC ●
Washtenaw, MI ●
Butler, OH ●
Waukesha, WI ●
Polk, IA ●
Stark, OH ●
Greenville, SC ●
Hillsborough, NH ●
York, PA ●
Knox, TN ●
Mobile, AL ●
Guilford, NC ●
Lucas, OH ●
Hampden, MA ●
Morris, NJ ●
Lancaster, PA ●
Anne Arundel, MD ●
New Castle, DE ●
Camden, NJ ●
Ramsey, MN ●
Union, NJ ●
Summit, OH ●
Delaware, PA ●
Montgomery, OH ●
Davidson, TN ●
District of Columbia, DC ●
Kent, MI ●
Providence, RI ●
Wake, NC ●
Jefferson, AL ●
DeKalb, GA ●
Suffolk, MA ●
Jefferson, KY ●
Mecklenburg, NC ●
Monroe, NY ●
Middlesex, NJ ●
Baltimore, MD ●
New Haven, CT ●
Hamilton, OH ●
Marion, IN ●
DuPage, IL ●
Milwaukee, WI ●
Erie, NY ●
Hillsborough, FL ●
Franklin, OH ●
Hennepin, MN ●
Allegheny, PA ●
Bronx, NY ●
Cuyahoga, OH ●
Philadelphia, PA ●
New York, NY ●
Wayne, MI ●
Queens, NY ●
Miami−Dade, FL ●
Cook, IL ●
0 2 4 6 8 10 14
6
5
Level
15
Bronx, NY ●
Queens, NY ●
New York, NY ●
Union, NJ ●
Philadelphia, PA ●
New Castle, DE ●
New Haven, CT ●
Camden, NJ ●
Delaware, PA ●
Providence, RI ●
Santa Clara, CA ●
Suffolk, MA ●
Los Angeles, CA ●
Chatham, GA ●
Hampden, MA ●
Hillsborough, NH ●
San Diego, CA ●
Cuyahoga, OH ●
King, WA ●
Northampton, PA ●
Rockingham, NH ●
E. Baton Rouge, LA ●
Multnomah, OR ●
Middlesex, NJ ●
Allegheny, PA ●
Riverside, CA ●
Miami−Dade, FL ●
Baltimore, MD ●
Charleston, SC ●
Hillsborough, FL ●
Erie, PA ●
York, PA ●
Morris, NJ ●
Lancaster, PA ●
Kanawha, WV ●
Buncombe, NC ●
Lackawanna, PA ●
Davis, UT ●
Leon, FL ●
Hinds, MS ●
Dauphin, PA ●
Mahoning, OH ●
Fayette, KY ●
Henrico, VA ●
Madison, AL ●
Lorain, OH ●
Escambia, FL ●
Forsyth, NC ●
Richland, SC ●
Washtenaw, MI ●
Butler, OH ●
Washoe, NV ●
Waukesha, WI ●
Pulaski, AR ●
Adams, CO ●
Stark, OH ●
Greenville, SC ●
Mobile, AL ●
Guilford, NC ●
Lucas, OH ●
Ramsey, MN ●
Summit, OH ●
Montgomery, OH ●
Tulsa, OK ●
Kent, MI ●
Wake, NC ●
Kern, CA ●
Jefferson, AL ●
Jefferson, KY ●
Mecklenburg, NC ●
Fresno, CA ●
Pima, AZ ●
Hamilton, OH ●
Marion, IN ●
Salt Lake, UT ●
DuPage, IL ●
Milwaukee, WI ●
Franklin, OH ●
Hennepin, MN ●
Sacramento, CA ●
Clark, NV ●
Wayne, MI ●
Maricopa, AZ ●
Cook, IL ●
District of Columbia, DC ●
Anne Arundel, MD ●
Erie, NY ●
Dallas, TX ●
Polk, IA ●
Knox, TN ●
Bernalillo, NM ●
Shelby, TN ●
Madison, IL ●
Davidson, TN ●
DeKalb, GA ●
Monroe, NY ●
El Paso, TX ●
0.0020
0.0018
Level
0.0016
0.0014
0.0012
0.0010
0.0008
16
terbium ●
molybdenum ●
bromine ●
copper ●
silver ●
cadmium ●
phosphorus ●
lead ●
europium ●
titanium ●
wolfram ●
indium ●
vanadium ●
potassium_Ion ●
tantalum ●
tin ●
magnesium ●
chlorine ●
Antimony ●
hafnium ●
aluminum ●
cesium ●
nickel ●
barium ●
lanthanum ●
zinc ●
cerium ●
potassium ●
calcium ●
sodium ●
Carbonate_Carbon ●
silicon ●
iron ●
Sodium_Ion ●
Elemental_Carbon ●
NITRATE ●
AMMONIUM ●
OC_K14 ●
SULFATE ●
0 2 4 6 8
18
16
Level
14
12
10
Figure 9: Daily smoothed levels of the largest 40 components of PM2.5 (by mass)
for Bronx County, New York, 2002–2005.
17
PSA ●
KIM ●
HST ●
GGP ●
EQR ●
BXP ●
VNO ●
PLD ●
SPG ●
TIP ●
AGG ●
SPY ●
14000
600 1000 1400
13000
Level
12000
11000
10000
2007 2008
Figure 10: Daily prices and market value for a hypothetical portfolio of an
S&P 500 exchange traded fund (ETF), two bond ETFs, and nine real estate
investment trusts, 2006–2007.
18
B Code
Below we show the code for the main mvtsplot function and the drawImage-
Margin function which draws the margin panels. The full code for using the mvt-
splot function can be found at http://www.biostat.jhsph.edu/~rpeng/RR/mvtsplot.
> body(mvtsplot)
{
if (is.data.frame(x))
x <- data.matrix(x)
checkMatrix(x)
norm <- match.arg(norm)
if (!is.null(sort))
sort <- match.fun(sort)
rowstat <- match.fun(rowstat)
if (!require(RColorBrewer))
stop("'RColorBrewer' package required")
if (is.null(xtime)) {
xtime <- seq_len(nrow(x))
xlim <- c(0, max(xtime))
}
else xlim <- range(xtime)
if (!is.null(group)) {
group <- as.factor(group)
x <- reorderCols(x, group)
}
if (!margin && !is.null(sort)) {
stat <- apply(x, 2, sort, na.rm = TRUE)
x <- x[, order(stat)]
}
if (margin) {
colm <- apply(x, 2, function(x) {
grDevices::boxplot.stats(x)$stats
})
if (!is.null(sort)) {
stat <- apply(x, 2, sort, na.rm = TRUE)
ord <- order(stat)
x <- x[, ord]
colm <- colm[, ord]
}
}
if (is.null(smooth.df))
cx <- catcols(x, levels, norm)
else {
x <- smoothX(x, smooth.df)
cx <- catcols(x, levels, norm)
19
}
if (margin)
rowm <- apply(x, 1, rowstat, na.rm = TRUE)
colnames(cx) <- colnames(x)
empty <- apply(cx, 2, function(x) all(is.na(x)))
if (any(empty)) {
cx <- cx[, !empty]
if (margin)
colm <- colm[, !empty]
}
pal <- colorRampPalette(brewer.pal(4, palette))
nlevels <- if (length(levels) == 1)
levels
else length(levels)
if (margin)
drawImageMargin(cx, pal, nlevels, xlim, cn, xtime, group,
gcol, smooth.df, rowm, nrow(x), bottom.ylim, colm,
right.xlim, main)
else drawImage(cx, pal, nlevels, xlim, cn, xtime, group,
gcol)
}
attr(,"srcfile")
mvtsplot.R
> body(drawImageMargin)
{
op <- par(no.readonly = TRUE)
on.exit(par(op))
par(las = 1, cex.axis = 0.6)
cn <- colnames(cx)
nc <- ncol(cx)
side2 <- 0.55
utime <- unclass(xtime)
layout(rbind(c(1, 1, 1, 1, 1, 1, 3), c(1, 1, 1, 1, 1, 1,
3), c(1, 1, 1, 1, 1, 1, 3), c(2, 2, 2, 2, 2, 2, 4)))
if (!is.null(cn))
side2 <- max(0.55, max(strwidth(cn, "inches")) + 0.1)
else cn <- rep("", nc)
par(mai = c(0.05, side2, 0.1, 0.05))
image(utime, seq_len(nc), cx, col = pal(nlevels), xlim = xlim,
xaxt = "n", yaxt = "n", ylab = "", xlab = "")
axis(2, at = seq_len(nc), cn)
box()
if (!is.null(group)) {
usrpar <- par("usr")
20
par(usr = c(usrpar[1:2], 0, 1))
tg <- table(group)[-nlevels(group)]
abline(h = cumsum(tg)/nc, lwd = 2, col = gcol)
}
if (!is.null(smooth.df)) {
xx <- seq_along(rowm)
tmp.fit <- lm(rowm ~ ns(xx, smooth.df), na.action = na.exclude)
rowm <- predict(tmp.fit)
}
bottom.ylim <- if (is.null(bottom.ylim))
range(rowm, na.rm = TRUE)
else bottom.ylim
par(mai = c(0.4, side2, 0.05, 0.05))
plot(utime, rep(0, nr), type = "n", ylim = bottom.ylim, xaxt = "n",
xlab = "", ylab = "Level")
par(usr = c(xlim, par("usr")[3:4]))
nalines(utime, rowm)
Axis(xtime, side = 1)
right.xlim <- if (is.null(right.xlim))
range(colm, na.rm = TRUE)
else right.xlim
par(mai = c(0.05, 0.05, 0.1, 0.1))
plot(colm[3, ], 1:nc, type = "n", ylab = "", yaxt = "n",
xlab = "", xlim = right.xlim)
usrpar <- par("usr")
par(usr = c(usrpar[1:2], 0, 1))
ypos <- (1:nc - 1/2)/nc
segments(colm[1, ], ypos, colm[2, ], ypos, col = gray(0.6))
segments(colm[4, ], ypos, colm[5, ], ypos, col = gray(0.6))
points(colm[3, ], ypos, pch = 19, cex = 0.6)
if (!is.null(group))
abline(h = cumsum(tg)/nc, lwd = 2, col = gcol)
blankplot()
text(0, 0, main)
rval <- list(z = cx, rowm = rowm, colm = colm)
invisible(rval)
}
attr(,"srcfile")
mvtsplot.R
C Code Repository
The latest source code can be found at the project’s git repository located at
http://repo.or.cz/w/mvtsplot.git
21