Two-Stream Multi-Channel Convolutional Neural Network (TM-CNN) For Multi-Lane Traffic Speed Prediction Considering Traffic Volume Impact

Ruimin Ke¹, Wan Li², Zhiyong Cui¹, Yinhai Wang¹,*

¹ Smart Transportation Applications and Research (STAR) Lab, Department of Civil and Environmental Engineering, University of Washington
² The intelligent Urban Transportation Systems (iUTS) Lab, Department of Civil and Environmental Engineering, University of Washington
* [email protected]

Abstract: Traffic speed prediction is a critically important component of intelligent transportation systems (ITS). Recently,
with the rapid development of deep learning and transportation data science, a growing body of new traffic speed
prediction models have been designed, which achieved high accuracy and large-scale prediction. However, existing studies
have two major limitations. First, they predict aggregated traffic speed rather than lane-level traffic speed; second, most
studies ignore the impact of other traffic flow parameters in speed prediction. To address these issues, we propose a two-
stream multi-channel convolutional neural network (TM-CNN) model for multi-lane traffic speed prediction considering
traffic volume impact. In this model, we first introduce a new data conversion method that converts raw traffic speed data
and volume data into spatial-temporal multi-channel matrices. Then we carefully design a two-stream deep neural network
to effectively learn the features and correlations between individual lanes, in the spatial-temporal dimensions, and
between speed and volume. Accordingly, a new loss function that considers the volume impact in speed prediction is
developed. A case study using one-year data validates the TM-CNN model and demonstrates its superiority. This paper
contributes to two research areas: (1) traffic speed prediction, and (2) multi-lane traffic flow study.

1. Introduction
Traffic speed prediction is one of the most crucial components of intelligent transportation systems (ITS). It can benefit both traffic agencies and travelers by contributing to key applications such as variable speed limit control and route guidance. Although traffic speed prediction has a history dating back several decades, traditional traffic speed prediction methods are unable to precisely capture the high-dimensional and nonlinear characteristics of traffic flow because of limited computational power or limited data [1, 2]. In recent years, with the emerging trends in artificial intelligence and transportation data science, a growing body of research has been conducted in this field.
The typical traffic speed prediction problem is to predict traffic speed at a future time using given historical traffic data. Traditionally, time series methods such as autoregressive integrated moving average (ARIMA) and conventional machine learning models such as support vector regression (SVR) have been widely applied to traffic prediction [3–12]. Later on, with the tremendous success of deep learning in many fields [13–15], researchers started to explore deep learning for traffic speed prediction and developed several deep learning models that significantly outperformed the conventional models [16–21]. For example, Ma et al. implemented a long short-term memory neural network (LSTM NN) for the first time in traffic speed prediction. Their work suggested that the LSTM NN outperformed previous methods [19]. Tang et al. designed an improved fuzzy neural network (FNN) for traffic speed prediction. This model considered the periodic characteristics of traffic flow and achieved state-of-the-art performance [20]. Although these pioneering studies still focus on relatively small-scale prediction at individual locations, they have greatly inspired the exploration of more advanced deep-learning-based traffic speed prediction methods.
Recently, substantial research has focused on extending the traffic speed prediction problem from individual roadway locations to traffic networks by designing new deep neural networks that integrate physical roadway structures [22–30]. Ma et al. developed a convolutional neural network (CNN) model that can capture spatial correlations between adjacent roadway segments and temporal correlations between adjacent times in a 2D spatial-temporal matrix [22]. Yao et al. proposed a deep learning architecture named spatial-temporal dynamic network (STDN) that incorporated CNN, LSTM, and a periodically shifted attention mechanism to address dynamic dependency and the shifting of long-term periodic dependency [29]. Cui et al. devised a high-order graph convolutional LSTM (HGC-LSTM) to model the dynamics of the traffic speed and acquire the spatial dependencies within the traffic network. This group of studies considers both spatial dependencies and temporal dynamics of traffic flow in deep learning models, thereby enabling effective learning and accurate speed prediction for network-scale traffic.
Despite the achievements mentioned above in traffic speed prediction, the existing studies have two major limitations. First, they predict aggregated traffic speed rather than lane-level traffic speed. At every data collection unit of roadways, they implicitly assume no traffic pattern
difference between different lanes. In some studies, this is due to the unavailability of lane-level traffic data; in others where the lane-level data are available, the speeds are still often aggregated to reduce model complexity. However, research has long shown that traffic flows on different lanes exhibit different yet correlated patterns [31–39]. For instance, Daganzo et al. studied a “reverse lambda” pattern in their work [37]. This pattern appears as consistently high flows on freeway median lanes, but it has not been reported for the shoulder lanes. It is also observed that for either two-lane or three-lane freeway segments, there are certain volume-density distributions for individual lanes [34]. Given the increasing need for lane-based traffic operations such as carpool lane tolling and reversible lane control in modern transportation systems, this issue can no longer be ignored.
The second limitation is that most existing studies ignore other traffic flow parameters in speed prediction tasks. In traffic flow theory, there are correlations among traffic flow speed, volume, and occupancy [40]. Without the integration of volume or occupancy into speed prediction, the hidden traffic flow patterns may not be fully captured and learned, which can lead to reduced prediction accuracy [41]. An intuitive example: in free-flow conditions, a larger-volume traffic stream tends to be more sensitive to perturbations than a smaller-volume traffic stream. Therefore, the speed of the larger-volume traffic stream is more likely to decrease at a future time. However, without the volume or occupancy data, it is hard to model the hidden traffic flow patterns.
To address these challenges, we propose a two-stream multi-channel convolutional neural network (TM-CNN) for multi-lane traffic speed prediction with the consideration of traffic volume impact. In the proposed model, we develop a data conversion method to convert both the multi-lane speed data and multi-lane volume data into multi-channel spatial-temporal matrices. We design a CNN architecture with two streams, where one takes the multi-channel speed matrix as input and the other takes the multi-channel volume matrix as input. A fusion method is further implemented for the two streams. Specifically, convolutional layers learn the two matrices to capture traffic flow features in three dimensions: the spatial dimension, the temporal dimension, and the lane dimension. Then, the output tensors of the two streams are flattened and concatenated into one speed-volume vector, and this vector is learned by the fully connected (FC) layers. Accordingly, a new loss function is devised considering the volume impact in the speed prediction task.
The proposed TM-CNN model is validated using one-year loop detector data on a major freeway in the Seattle area. The comprehensive comparisons and analyses demonstrate the strength and effectiveness of our model. This paper contributes to two transportation research areas. First, it contributes to the traffic speed prediction area by adding a new deep neural network model to the existing literature. Second, it pushes the boundary of knowledge in the multi-lane traffic flow study area by developing a method for the learning and speed prediction of multi-lane traffic. In summary, the contribution of this paper is fourfold:
(1) We introduce a new data conversion method to convert the multi-lane traffic speed data and volume data into spatial-temporal multi-channel matrices. The converted data matrices are organized as the inputs to the deep neural network.
(2) We design a two-stream CNN architecture for multi-lane traffic speed prediction. The convolutional layers extract the correlations between lanes and spatial-temporal features in the multi-channel data matrices. It also concatenates the outputs of the two convolutional-layer streams and learns a speed-volume feature vector.
(3) We propose a new loss function for the deep learning model. It is the sum of a speed term and a weighted volume term. By appropriately setting the weight, the volume term improves the learning ability of the model and helps prevent overfitting.
(4) Traditional studies on multi-lane traffic flow mostly focus on the mathematical modeling and behavior description of multi-lane traffic. This study is among the first efforts to apply deep learning methods for multi-lane traffic pattern mining and prediction.

2. Methodology
2.1 Modeling multi-lane traffic as multi-channel matrices
The first step of our methodology is modeling the multi-lane traffic flow as multi-channel matrices. We propose a data conversion method to convert the raw data into spatial-temporal multi-channel matrices, in which traffic on every individual lane is added to the matrices as a separate channel. This modeling idea comes from the CNN’s strength in capturing features in multi-channel RGB images. In RGB images, each color channel is correlated with, yet different from, the other two. This is similar to traffic flows on different lanes, where correlations and differences both exist [32, 37]. Thus, averaging traffic flow parameters at a certain milepost and timestamp is like taking a weighted average of the RGB values to get the grayscale value. In this sense, previous methods for traffic speed prediction are designed for “grayscale images” (spatial-temporal prediction for averaged speed) or even just a single image column (speed prediction for an individual location). In this study, the proposed model handles lane-level traffic information by formulating the data inputs as “RGB images.”
In this paper, loop detector data is used because it includes different types of traffic flow data on individual lanes. That said, although the loop detector is a relatively traditional traffic detector, it provides lane-level traffic speed, volume, and occupancy data, which many other detectors do not [42–44]. For example, probe vehicle data are widely used nowadays, but besides a small sample of traffic speeds and trajectories, most of them are unable to collect lane-level data or volume data.
The data conversion method is illustrated in Figure 1. There are loop detectors installed at k different mileposts along this segment, and the past n time steps are considered in the prediction task. We denote the number of lanes as c. Without loss of generality, the number of lanes is assumed to be three in Figure 1 for the sake of illustration. Single-lane traffic would be represented by two $k \times n$ spatial-temporal 2D matrices, where one is for speed and the other for volume. We denote them as $I_u$ for speed and $I_q$ for volume. We define the speed value and volume value to be $u_{ilt}$ and $q_{ilt}$ respectively for a detector at milepost i (i
= 1,2,…,k) and lane l (l = 1,2,…,c) at time t (t = 1,2,…,n). Note that each $u_{ilt}$ or $q_{ilt}$ is normalized to between 0 and 1 using min-max normalization since speed and volume have different value ranges. Hence, in the speed and volume matrices with the size $k \times n \times c$, we construct the matrices using Eq. (1) and Eq. (2),

$$I_u(i,t) = (u_{i1t}, u_{i2t}, \ldots, u_{ict}) \tag{1}$$
$$I_q(i,t) = (q_{i1t}, q_{i2t}, \ldots, q_{ict}) \tag{2}$$

where $i$ and $t$ are the row index and column index of a spatial-temporal matrix, representing the milepost and the timestamp, respectively. $I_u(i,t)$ and $I_q(i,t)$ denote the multi-channel pixel values of the speed and the volume. The number of channels corresponds to the number of lanes c. Each element in the 2D multi-channel matrices is a c-unit vector representing c lanes’ traffic speeds or volumes at a given milepost $i$ and time $t$. In the three-lane example in Figure 1, the spatial-temporal matrices have three channels. Mathematically, the spatial-temporal multi-channel matrices for traffic speed ($X_u$) and volume ($X_q$) can be denoted as

$$X_u = \begin{bmatrix} I_u(1,1) & I_u(1,2) & \cdots & I_u(1,n) \\ I_u(2,1) & I_u(2,2) & \cdots & I_u(2,n) \\ \vdots & \vdots & & \vdots \\ I_u(k,1) & I_u(k,2) & \cdots & I_u(k,n) \end{bmatrix} \tag{3}$$

$$X_q = \begin{bmatrix} I_q(1,1) & I_q(1,2) & \cdots & I_q(1,n) \\ I_q(2,1) & I_q(2,2) & \cdots & I_q(2,n) \\ \vdots & \vdots & & \vdots \\ I_q(k,1) & I_q(k,2) & \cdots & I_q(k,n) \end{bmatrix} \tag{4}$$

Fig. 1 The data input modeling process of converting the multi-lane traffic flow raw data to the multi-channel spatial-temporal
matrix
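
The conversion described in Section 2.1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function name `to_multichannel`, the dictionary layout of the raw readings, and the choice to take the minimum and maximum over the supplied readings are all assumptions made here, and indices are 0-based rather than 1-based as in the text.

```python
def to_multichannel(raw, k, n, c):
    """Build a k x n x c nested list from per-lane detector readings.

    raw[(i, l, t)] is the reading at milepost index i, lane l, time step t.
    Values are min-max normalized to [0, 1]; the min and max are taken over
    the supplied readings (an assumption, the paper does not specify this).
    """
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # guard against constant data
    # X[i][t] is the c-unit vector I(i, t): one entry per lane.
    return [[[(raw[(i, l, t)] - lo) / span for l in range(c)]
             for t in range(n)]
            for i in range(k)]


# Toy speeds for k = 2 mileposts, n = 2 time steps, c = 3 lanes (made-up values).
raw_speed = {(i, l, t): 40.0 + 10 * l + 5 * i - 2 * t
             for i in range(2) for l in range(3) for t in range(2)}
X_u = to_multichannel(raw_speed, k=2, n=2, c=3)
```

The same function would be applied separately to the volume readings to obtain the second input matrix.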

2.2 Convolution for feature extraction
The CNN has demonstrated promising performance in image classification and many other applications due to its locally connected layers and its better ability than other neural networks to capture local features. In transportation, the traffic stream, as well as disturbances to it, moves along the spatial axis and the temporal axis. Thus, applying a CNN to the spatial-temporal traffic image captures local features in both spatial and temporal dimensions. The fundamental operation in the feature extraction process of a CNN is convolution. With the re-organized input as a multi-channel matrix $X$ ($X$ could be $X_u$ or $X_q$), the basic unit of a convolution operation is shown in Figure 2. The leftmost part of the figure is the input spatial-temporal matrix or image $X$. Every channel of the input matrix is a 2D spatial-temporal matrix representing the traffic flow pattern on the corresponding lane. At the top of the leftmost column, channel #1 displays the traffic pattern of lane #1, and at the bottom, the pattern of lane #c is presented. The symbol “*” denotes the convolution operation in Figure 2. Since our input is a multi-channel image, the convolution filters are also multi-channel. In the figure, a 3 × 3 × c filter is drawn, while the size of the filter can be changed in practice. The values inside the cells of a filter are weights of the CNN, which are automatically updated during the training process. The final weights are able to extract the most salient features in the multi-channel image. The convolution operation outputs a feature map for each channel, and these are summed to give the extracted feature map of this convolution filter in the current convolutional layer. With multiple filters applied to the same input image, a multi-channel feature map is constructed and serves as the input to the next layer.

Fig. 2 The convolution operation to extract features from the multi-channel spatial-temporal traffic flow matrices
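
The per-channel summation described in Section 2.2 can be made concrete with a small pure-Python sketch. It applies one multi-channel filter to a k × n × c input and sums the per-channel products into a single feature map. The function name, the stride of 1, and the absence of padding are assumptions for illustration (the text does not state the padding), and, as in most deep learning libraries, the operation is written as a cross-correlation without flipping the filter.

```python
def conv2d_multichannel(X, W, bias=0.0):
    """Apply one fh x fw x c filter W to a k x n x c input X.

    Stride 1 and no padding are assumed. The products from all c channels
    are summed, so each filter yields a single 2D feature map.
    """
    k, n, c = len(X), len(X[0]), len(X[0][0])
    fh, fw = len(W), len(W[0])
    out = []
    for i in range(k - fh + 1):          # slide along the spatial axis
        row = []
        for t in range(n - fw + 1):      # slide along the temporal axis
            s = bias
            for a in range(fh):
                for b in range(fw):
                    for l in range(c):   # sum over lanes (channels)
                        s += X[i + a][t + b][l] * W[a][b][l]
            row.append(s)
        out.append(row)
    return out


# A 3 x 3 x 2 input of ones and a 2 x 2 x 2 filter of ones.
X_toy = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
W_toy = [[[1.0, 1.0] for _ in range(2)] for _ in range(2)]
feature_map = conv2d_multichannel(X_toy, W_toy)

# Shape check for a 10 x 8 x 4 input with a 2 x 2 x 4 filter (no padding).
X_case = [[[0.0] * 4 for _ in range(8)] for _ in range(10)]
W_case = [[[0.0] * 4 for _ in range(2)] for _ in range(2)]
shape_case = (len(conv2d_multichannel(X_case, W_case)),
              len(conv2d_multichannel(X_case, W_case)[0]))
```

Under these assumptions each output cell of the toy example equals 8.0 (2 × 2 positions times 2 channels), and the 10 × 8 × 4 input shrinks to a 9 × 7 map per filter.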

2.3 The TM-CNN for speed prediction
In order to learn the multi-lane traffic flow patterns and predict traffic speeds, a CNN structure is designed (see Figure 3). Compared to a standard CNN, the proposed CNN architecture is modified in the following aspects:
(1) The network inputs are different: the input image is a spatial-temporal image built from traffic sensor data, and it has multiple channels that represent the lanes of a corridor. Moreover, the range of the pixel values differs from that of a normal image. For a normal image, it is 0 to 255; here it ranges from 0 to either the highest speed (often the speed limit) or the highest volume (often the capacity).
(2) The neural network has two streams of convolutional layers, which process the speeds and the volumes, whereas most CNNs have only one stream of convolutional layers. The purpose of having two streams is to integrate both speed information and volume information into the model so that the network can learn the traffic patterns better than by learning speed alone. To combine the two streams, a fusion operation that flattens and concatenates the outputs of the two streams is implemented between the convolutional layers and the FC layers. Concatenation is chosen over addition or multiplication because it leaves us free to modify each stream’s structure. In other words, the concatenation fusion method allows the two streams of convolutional layers to have different structures.
(3) The extracted features have unique meanings and differ from those in image classification or most other tasks. The extracted features here are relations among road segments, time series, adjacent lanes, and between traffic flow speeds and volumes.
(4) The output is different: our output is a vector of traffic speeds at multiple locations at a future time rather than a single category label or the coordinates of bounding boxes. The output itself is part of the input for another prediction, which is not the case for most other CNNs.
(5) Unlike most CNNs, our CNN does not have a pooling layer. The main reason for not inserting pooling layers between convolutional layers is that our input images are much smaller than regular images for image classification or object detection [45, 46]. Regular input images to a CNN usually have hundreds of columns and rows, while the spatial-temporal images for roadway traffic are not that large. In this research and many existing traffic prediction studies, the time resolution of the data is five minutes, which means that even two hours of data provide only 24 time steps. Thus, we do not risk losing information by pooling.
(6) The loss function is devised to contain both speed and volume information. For traditional image classification CNNs, the loss function is the cross-entropy loss, and for traffic speed prediction tasks, the loss function is commonly the Mean Squared Error (MSE) function with only speed values. In this research, however, we add a new term to the loss function to incorporate the volume information. We denote the ground truth speed vector and volume vector as $Y_u$ and $Y_q$, and the predicted speed vector and volume vector as $\hat{Y}_u$ and $\hat{Y}_q$. Note that $Y_u$, $Y_q$, $\hat{Y}_u$, and $\hat{Y}_q$ are all normalized between 0 and 1. The loss function $L$ is defined in Eq. (5) by summing the MSEs of speed and volume. The volume term $\lambda \|\hat{Y}_q - Y_q\|_2^2$ is added to the loss function to reduce the probability of overfitting by helping the model better understand the essential traffic patterns. This design improves the speed prediction accuracy on the test dataset with a proper setting of λ. Our suggested value of λ is between 0 and 1, considering that the volume term, which deals with overfitting, should still have a lower impact than the speed term on speed prediction problems.

$$L = \|\hat{Y}_u - Y_u\|_2^2 + \lambda \|\hat{Y}_q - Y_q\|_2^2 \tag{5}$$

Fig. 3 The proposed two-stream multi-channel convolutional neural network (TM-CNN) architecture
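
As a minimal sketch of the loss in Eq. (5), the following plain-Python function adds a λ-weighted volume term to the speed term. The function name `tm_cnn_loss` and the example vectors are hypothetical. The code follows the squared L2 norm written in Eq. (5); dividing each term by the vector length would give the MSE form mentioned in the text, which differs only by a constant factor. The default λ = 0.1 is the value the case study reports as giving the best accuracy.

```python
def tm_cnn_loss(y_u_pred, y_u_true, y_q_pred, y_q_true, lam=0.1):
    """Speed term plus lam-weighted volume term, as in Eq. (5).

    All vectors are assumed to be normalized to [0, 1] and of matching length.
    """
    speed_term = sum((p - t) ** 2 for p, t in zip(y_u_pred, y_u_true))
    volume_term = sum((p - t) ** 2 for p, t in zip(y_q_pred, y_q_true))
    return speed_term + lam * volume_term


# Made-up normalized predictions and ground truths for two detectors.
loss_with_volume = tm_cnn_loss([0.5, 0.5], [0.5, 0.7], [0.2, 0.3], [0.4, 0.3])
loss_speed_only = tm_cnn_loss([0.5, 0.5], [0.5, 0.7], [0.2, 0.3], [0.4, 0.3], lam=0.0)
```

Setting `lam=0.0` recovers a speed-only loss, which corresponds to the single-stream baseline's loss described in Section 3.3.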

In the proposed TM-CNN, the inputs are our multi-channel matrices $X_u$ and $X_q$ with the dimension of $k \times n \times c$. All filters have size $2 \times 2 \times c$ in order to better capture the correlations between each pair of adjacent loops as well as adjacent times. The number of filters for each convolutional layer is chosen based on experience and the need to balance efficiency and accuracy. The last convolutional layer in each of the two streams is flattened and connected to a fully connected (FC) layer. This FC layer is fully connected with the output layer as well. The length of the output vector $\hat{Y}_u$ is 1 × (k × c), since the prediction is for one future step. All layers except the output layer use the ReLU activation function. The output layer has a linear activation function, which is commonly adopted for regression tasks. Eq. (6) and Eq. (7) describe mathematically the mapping from the inputs to the outputs of the last convolutional layers,

$$\hat{Y}_u^{conv} = \varphi\{W_{u3} * \varphi[W_{u2} * \varphi(W_{u1} * X_u + b_{u1}) + b_{u2}] + b_{u3}\} \tag{6}$$
$$\hat{Y}_q^{conv} = \varphi\{W_{q3} * \varphi[W_{q2} * \varphi(W_{q1} * X_q + b_{q1}) + b_{q2}] + b_{q3}\} \tag{7}$$

where $\hat{Y}_u^{conv}$ and $\hat{Y}_q^{conv}$ are the intermediate speed and volume outputs of the CNN between the last convolutional layers and the flatten layers, $W_{ui}$ and $W_{qi}$ ($i = 1,2,3$) are the weights for the convolutions, $b_{ui}$ and $b_{qi}$ ($i = 1,2,3$) are the biases, and $\varphi(\cdot)$ is the ReLU activation function. After getting these two intermediate outputs, we flatten them and fuse them into one vector, and then further learn the relations between the volume feature map and the speed feature map using FC layers. As mentioned above, we choose concatenation as the fusion function for the two flattened intermediate outputs to allow different neural network designs for the two streams. Customized streams could result in two intermediate outputs of different dimensions. Concatenation would still fuse the two outputs and support the learning of speed-volume relationships by the FC layers, whereas most other fusion operations require the two vectors to have the same length. This fusion process is mathematically represented in Eq. (8) as follows,

$$\hat{Y}_u, \hat{Y}_q = W_5 \times \varphi[W_4 \times Conc(F(\hat{Y}_u^{conv}), F(\hat{Y}_q^{conv})) + b_4] + b_5 \tag{8}$$

where $W_4$ and $W_5$ are the weights for the two FC layers, $b_4$ and $b_5$ are the biases, $F(\cdot)$ is the flatten function, and $Conc(\cdot)$ is the concatenation function.

3. Case Study
3.1 Data description
In this paper, one-year loop data from January 1st, 2016 to December 31st, 2016 for a four-lane freeway corridor in Seattle is used for validation. Seattle is currently among the ten cities with the busiest traffic in the United States, and the study freeway segment is one of the busiest corridors in Seattle. It runs from milepost 170 to milepost 165 of the southbound Interstate 5 (I-5) freeway, connecting the University of Washington to Downtown Seattle. There are 40 loop detectors on this corridor. They collect speed, volume, and occupancy data. In our study, we use speed and volume for speed prediction. We do not include occupancy in our model because adding occupancy increases the training time and complexity yet does not improve the prediction accuracy. This can be explained by traffic flow theory: in most cases, if two of the three traffic parameters are known, the third one can be estimated. Hence, using all three does not add more information to the prediction model. The data is downloaded from a traffic big data platform named Digital Roadway Interactive Visualization and Evaluation Network (DRIVE Net) [47], where the data is aggregated to 5-minute intervals. Based on our data conversion method, speed and volume are each converted to a four-channel matrix. The data conversion of the one-year data generates about 105,000 data samples for model validation.

3.2 Model implementation
The proposed model is implemented in the Keras deep learning library using the TensorFlow backend on an Nvidia GTX 1080 GPU. The implemented model architecture in the case study is shown in Figure 4. This architecture figure is automatically generated by Keras after the model design. It displays the overall model structure as well as the dimensions of the inputs and outputs of all layers. Each input data sample is organized into the dimension of 10×8×4,
where 10 is the number of detectors along the freeway segment on each lane, 8 is the number of time steps used for learning and prediction, and 4 is the number of channels (lanes). We choose 8 as the number of time steps because we observe that 8 steps (40 minutes) are enough for the model to adequately capture the past traffic patterns, while a short history keeps training efficient. In this model, three convolutional layers are added sequentially to each stream. We select three as the number of hidden convolutional layers based on a trial-and-error process, during which we observe that three convolutional layers consistently outperform one or two convolutional layers, yet little improvement is observed with more than three. Two dropout layers are added to the model to reduce the probability of overfitting in the training process. One dropout layer is inserted between the last convolutional layer and the concatenation layer, with a dropout ratio of 0.5; the other is inserted between the FC layer (which is shown as a dense layer in Keras) and the output layer, with a dropout ratio of 0.25.

Fig. 4 The model architecture diagram of the TM-CNN implemented in the case study

In the model validation process, we split the dataset into 80,000 samples for training and 25,000 samples for testing. In addition to the model architecture parameters, several other hyper-parameters need to be tuned in the training process. For example, the optimizer in our model training is RMSprop, chosen for its faster convergence than other optimizers. One key hyper-parameter that often influences the performance of deep learning models is the learning rate, which determines how much the weights are adjusted with respect to the loss function gradient. It impacts both model accuracy and training speed. In our case, we examined learning rates ranging from 0.01 to 0.00001 and found that a learning rate around 0.0001 generated the best model accuracy. As shown in the first plot in Figure 5, training loss curves are plotted for different learning rates in the first 50 epochs. It can be observed that when the learning rate is larger than 0.0001, a smaller learning rate generates a smaller loss. However, when the learning rate is smaller than 0.0001, the model loss starts becoming larger again, and at the same time, the model training takes longer. Thus, we picked 0.0001 as the learning rate for our model.
Another critical parameter of the proposed model is the λ in the loss function. It determines the impact of traffic volume on the speed prediction. As mentioned above, our suggested value of λ is between 0 and 1, with the consideration that the volume term should have a lower impact than the speed term. Therefore, we tested ten values of λ from 0 to 0.9 with an interval of 0.1. The curve of speed prediction accuracy on the test dataset with respect to λ is shown as the second plot in Figure 5. Compared to having no volume term in the loss function, the speed prediction accuracy improves when λ = 0.1, which implies that the loss function design is effective. The model accuracy starts to decrease as λ grows beyond 0.1. This finding indicates that, on the one hand, the volume term does have an impact on the speed prediction accuracy; on the other hand, the impact of the volume term should exist but not be too large. This observation is reasonable: first, according to traffic flow theory, two traffic flow parameters can better determine the actual traffic flow status than just one parameter; second, since volume has more randomness and variation than speed in the short term, a large impact of volume could increase the uncertainty in speed prediction.

Fig. 5 Model parameter tuning for learning rate and the λ in the loss function

3.3 Results and comparison
In order to demonstrate the superiority of the proposed model, we conducted two evaluations. On the one hand, we compared it with five baseline models; on the
other hand, we visualized both the ground truth speeds and the predicted speeds in the formats of spatial-temporal heat map and single-detector speed plot. ARIMA is one of the pioneering methods for traffic prediction; SVR was a popular model in the field before the large-scale application of deep learning in traffic prediction; ANN is the traditional fully connected artificial neural network, which often serves as a baseline; LSTM, a specific type of recurrent neural network, is the most widely used model in recent years for traffic prediction. We also compared the proposed two-stream CNN with a single-stream CNN, which contains only a speed stream in the network structure and no volume term in the loss function. The models were all fine-tuned to achieve their best performance. There are many other speed prediction models, some of which could be more advanced than the baseline models for specific tasks. However, considering that most of the existing models are not designed for multi-lane traffic pattern prediction, modifying them to predict multi-lane traffic just for the purpose of comparison could degrade their capability and is also not meaningful at this point.
Table 1 shows the accuracies and comparison results. For each model, three different prediction time steps, i.e., 5 mins, 10 mins, and 15 mins, are examined. In general, the shorter the prediction time step is, the higher the accuracy. This is consistent with most previous studies. It can be seen that the proposed two-stream multi-channel CNN has the best prediction accuracy among all models in all three cases. The single-stream CNN in general beats the other four baseline models, while it has a lower accuracy than the two-stream CNN. Also, it can be observed that as the time step increases, the prediction accuracy differences between TM-CNN and the other models generally become larger. These comparisons show three strengths of the proposed model: First, the conversion of raw traffic data to the multi-channel matrix indeed improves the learning and prediction ability by better capturing spatial-temporal correlations between adjacent lanes, mileposts, and times. Second, the fusion of volume and speed further enhances the learning ability and model accuracy. Third, compared to the baseline methods, the TM-CNN demonstrates a better performance overall and its superiority in relatively longer-term speed prediction.

Table 1 Accuracy comparison with baseline methods
                       Prediction time steps
Model                  1 (5 mins)    2 (10 mins)    3 (15 mins)
ARIMA                  83.13%        81.06%         78.35%
SVR                    82.66%        80.90%         78.47%
ANN                    87.74%        85.79%         83.90%
LSTM                   88.46%        86.78%         84.65%
Single-stream CNN      90.83%        89.25%         86.94%
TM-CNN                 91.21%        90.06%         88.15%

Figure 6 displays the heat maps of the ground truth speeds (upper) and the predicted speeds (lower) for every lane from 6 am to 8 pm on a day. The prediction time step is 5 minutes in this visualization. The horizontal axis is the index of time. The vertical axis is the index for loop detectors, where loops 0-9 are on lane #1 (the shoulder lane), and loops 30-39 are on lane #4 (the median lane). Figure 7 shows the speed prediction curves (the orange curves) and the ground truth curves (the blue curves) for single loops on every lane at milepost 166.4 for 24 hours. Three observations can be summarized based on the visualizations in Figure 6 and Figure 7: First, the proposed model achieves excellent learning and prediction performance in different traffic conditions (free flow and congestion). Second, the proposed model can learn and capture the similar trends yet unique patterns of the traffic flow speed on all individual lanes. Third, the predicted speed values are smoother than the ground truth values. This is due to the variation and noise in real-world traffic flow and traffic data collection. The smoothness of the prediction demonstrates the ability of the proposed model to capture the general trends of traffic flow and its robustness to noise.

Fig. 6 Heat maps showing the ground truth speeds (upper) and our predicted speeds (lower) for all four lanes from 6 am to 8 pm on a day

Fig. 7 The predicted speeds and ground truths at milepost 166.4 for all lanes in 24 hours
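
The statement that the accuracy gap between TM-CNN and the baselines generally widens with the prediction time step can be checked directly from Table 1. The short Python sketch below only does that arithmetic; the accuracy values are copied from Table 1, while the dictionary layout and the function name `gaps_to_tm_cnn` are illustrative choices rather than part of the original study.

```python
# Accuracies (%) copied from Table 1; columns are the 5-, 10-, and 15-min steps.
accuracy = {
    "ARIMA": (83.13, 81.06, 78.35),
    "SVR": (82.66, 80.90, 78.47),
    "ANN": (87.74, 85.79, 83.90),
    "LSTM": (88.46, 86.78, 84.65),
    "Single-stream CNN": (90.83, 89.25, 86.94),
    "TM-CNN": (91.21, 90.06, 88.15),
}


def gaps_to_tm_cnn(table):
    """Return TM-CNN accuracy minus each baseline's accuracy, per time step."""
    best = table["TM-CNN"]
    return {
        name: tuple(round(b - a, 2) for a, b in zip(values, best))
        for name, values in table.items()
        if name != "TM-CNN"
    }


gaps = gaps_to_tm_cnn(accuracy)
for name, gap in gaps.items():
    # One row per baseline: gap in percentage points at 5, 10, and 15 mins.
    print(f"{name:18s} {gap[0]:5.2f} {gap[1]:5.2f} {gap[2]:5.2f}")
```

For every baseline the 15-min gap is larger than the 5-min gap. For example, the margin over the single-stream CNN grows from 0.38 to 1.21 percentage points. The ANN gap rises from 3.47 to 4.27 and then dips slightly to 4.25, which is why the trend holds "generally" rather than strictly.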

4. Conclusion and Future Work

In this paper, we proposed a novel deep learning model called TM-CNN for multi-lane traffic speed prediction. Several new components were carefully designed and incorporated in the model to enable the effective learning of multi-lane traffic flow characteristics and the accurate prediction of multi-lane speeds. The new components included a raw data conversion method, a two-stream multi-channel convolutional neural network architecture, and a new loss function.

Some interesting findings and recommendations can be drawn: (1) Experimental results demonstrate that the TM-CNN can learn and capture the traffic patterns in different traffic conditions and individual lanes. (2) Comparisons with the baseline models show that the TM-CNN achieves superior prediction accuracy and robustness over ARIMA, SVR, ANN, LSTM, and single-stream CNN. (3) The learning rate in the training process and the weight of the volume term in the loss function are critical hyper-parameters in this model. (4) For multi-lane traffic learning and prediction, we suggest converting traffic flow data into multi-channel matrices using the proposed data conversion method. (5) We suggest incorporating traffic volume data into both the neural network architecture and the loss function for speed prediction tasks.

Future work will be carried out in two directions. First, this study conducted an initial experiment on a relatively small-scale dataset for the purpose of validating the model performance on predicting multi-lane traffic. In future studies, we will fine-tune and test the model for network-scale multi-lane traffic speed prediction. The second future direction is to modify the model structure to integrate ramp detector data into speed prediction. By doing this, we aim to further improve the learning ability and prediction accuracy of the model.
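
Recommendations (4) and (5) can be illustrated with a minimal Python sketch. It is not the implementation used in this study. The array layout (lanes as channels, mileposts as rows, time steps as columns), the function names `to_multichannel` and `speed_volume_loss`, the toy readings, and the loss form (speed mean squared error plus a weighted volume mean squared error) are all assumptions made for illustration. In particular, the sketch assumes the network also outputs a volume estimate, which this section does not state.

```python
import numpy as np


def to_multichannel(records, n_lanes, n_mileposts, n_steps):
    """Pack (lane, milepost, step, value) records into a lanes-first array.

    Assumed layout: axis 0 = lane (channel), axis 1 = milepost index,
    axis 2 = time-step index. Cells without a record stay NaN.
    """
    matrix = np.full((n_lanes, n_mileposts, n_steps), np.nan)
    for lane, milepost, step, value in records:
        matrix[lane, milepost, step] = value
    return matrix


def speed_volume_loss(speed_true, speed_pred, volume_true, volume_pred,
                      volume_weight):
    """Assumed loss form: speed MSE plus a weighted volume MSE."""
    speed_term = np.mean((speed_true - speed_pred) ** 2)
    volume_term = np.mean((volume_true - volume_pred) ** 2)
    return float(speed_term + volume_weight * volume_term)


# Hypothetical toy readings: 2 lanes, 3 mileposts, 4 time steps.
speed_records = [(0, 0, 0, 61.0), (0, 1, 0, 58.5), (1, 2, 3, 42.0)]
speed = to_multichannel(speed_records, n_lanes=2, n_mileposts=3, n_steps=4)
print(speed.shape)

# Hypothetical targets and predictions for two cells.
loss = speed_volume_loss(
    speed_true=np.array([60.0, 50.0]),
    speed_pred=np.array([58.0, 53.0]),
    volume_true=np.array([10.0, 20.0]),
    volume_pred=np.array([12.0, 20.0]),
    volume_weight=0.5,
)
print(loss)
```

Setting `volume_weight` to zero reduces this assumed loss to a plain speed error, which makes the role of the volume weight named in finding (3) easy to see: it controls how strongly volume errors influence training relative to speed errors.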
