1 School of Traffic and Transportation Engineering, Central South University, Changsha 410075, China
2 Harvard Medical School, Harvard University, Boston, MA 02138, USA
3 School of Resources and Safety Engineering, Central South University, Changsha 410083, China
* Correspondence: [email protected]
Abstract: To improve traffic efficiency, adaptive traffic signal control (ATSC) systems have been widely developed. However, few studies have proactively optimized the air environmental issues in the development of ATSC. To fill this research gap, this study proposes an optimized ATSC algorithm to take into consideration both traffic efficiency and decarbonization. The proposed algorithm is developed based on the deep reinforcement learning (DRL) framework with dual goals (DRL-DG) for traffic control system optimization. A novel network structure combining Convolutional Neural Networks and Long Short-Term Memory Networks is designed to map the intersection traffic state to a Q-value, accelerating the learning process. The reward mechanism involves a multi-objective optimization function, employing the entropy weight method to balance the weights among dual goals. Based on a representative intersection in Changsha, Hunan Province, China, a simulated intersection scenario is constructed to train and test the proposed algorithm. The result shows that the ATSC system optimized by the proposed DRL-DG results in a reduction of more than 71% in vehicle waiting time and 46% in carbon emissions compared to traditional traffic signal control systems. It converges faster and achieves a balanced dual-objective optimization compared to the prevailing DRL-based ATSC.

Keywords: adaptive signal control system; intersections; carbon emissions; deep reinforcement learning

MSC: 93C40

Citation: Zhang, G.; Chang, F.; Huang, H.; Zhou, Z. Dual-Objective Reinforcement Learning-Based Adaptive Traffic Signal Control for Decarbonization and Efficiency Optimization. Mathematics 2024, 12, 2056. https://doi.org/10.3390/math12132056
developed. In such a system, the traffic light or its duration time varies with the detected
traffic flow at the specific entrance of an intersection according to pre-defined rules [8,9].
Although the actuated control system takes into consideration traffic fluctuations, the traf-
fic flow alone is insufficient in reflecting the actual traffic demands in complex traffic con-
ditions [10].
To relax the limitations of the actuated control system, adaptive traffic control sys-
tems have been proposed. In such a system, the real-time traffic state is monitored contin-
uously through several critical parameters, based on which the adaptive control strategies
are updated accordingly [11]. The most deployed ATSC systems at urban intersections
include the Sydney Coordinated Adaptive Traffic System (SCATS) and the Split Cycle
Offset Optimization Technique (SCOOT). The SCATS aims to select the optimal phasing
(i.e., cycle times, phase splits, and offsets) scheme for a traffic situation from pre-stored
plans according to the vehicle equivalent flow and saturation level calculated from the
fluctuations in traffic flow and system capacity. The SCOOT reacts to different traffic
states by changing the cycle length, phase splits, and offset in small increments according
to vehicle delays and stops calculated from the detected flow [12]. SCATS and SCOOT have proven their great potential in improving traffic efficiency, yet both remain human-crafted systems, with their respective control schemes and incremental designs pre-determined by experts [13]. The experts' knowledge is valuable but may suffer from subjective bias.
In recent years, reinforcement learning (RL), particularly data-driven deep reinforce-
ment learning (DRL), has shown excellent application prospects in ATSC [14]. In the ATSC
system, RL self-learns the optimal actions through interaction and feedback with the traf-
fic environment instead of manually setting pre-defined rules. One or several intersections
are considered an agent. The signal control system of the agent makes a decision after
observing the state of the road network, and then learns the optimal signal timing scheme
by maximizing the reward of environmental feedback [15]. Mikami and Kakazu [16] first applied RL to TSC optimization, leading to an upsurge in the application of RL in TSC systems. However, classical RL is suited to problems with discrete, low-dimensional states, and its direct application to TSC systems increases the computational complexity and requires large storage space [17]. Deep learning (DL), inspired by the working mode of the human brain, can effectively
process high-dimensional data by transforming low-level features to abstract high-level
features, and thus can address the application limitation of RL in traffic signal control
systems [18]. By combining the perception capacity of DL with the decision-making ca-
pacity of RL, DRL has been widely applied to ATSC [19].
The application of the DRL algorithm to ATSC in most studies focuses on the calcula-
tion rate, convergence effect, and application scenarios, in which traffic efficiency is the ma-
jor goal of TSC optimization [19–22]. Considering the severe air pollution caused by idling
times, parking times, and frequent accelerations/decelerations at intersections [23], vehicle
emissions are also taken into consideration, in addition to traffic efficiency, in the development of fixed-time traffic signal control (FTSC) [24,25] and actuated signal control (ASC) [26]. However, these bi-objective traffic control systems rely on timing schemes pre-optimized from historical traffic data, which cannot be applied to the real-time control of real-world dynamic traffic flow for efficiency and emission optimization. To fill this research gap, this study proposes an optimized algorithm for the development of the ATSC system that takes into consideration both vehicle emissions, especially carbon emissions, and traffic efficiency.
The proposed ATSC algorithm utilizes a DRL framework with traffic efficiency and
carbon emissions as the dual-objective optimization. The agent in the DRL framework is
developed to change the traffic signal phase based on the multiple-reward function re-
lated to optimization objectives. More specifically, traffic efficiency and carbon emissions
are optimized by reducing the cumulative waiting time (CWT) and carbon dioxide emis-
sions (CDEs) of all vehicles, respectively. The agent self-learns the optimal decision of
traffic signal phases by minimizing the CWT and CDE between two adjacent traffic signal
phases. To accelerate and balance the agent learning process, we develop a novel neural
network comprising Convolutional Neural Networks and Long Short-Term Memory Net-
works and utilize the entropy weight method to balance the weights among the CWT and
CDE. A representative intersection in the real world is simulated for training and testing
the proposed algorithm.
2. Literature Review
2.1. TSC System for Decarbonization and Efficiency
Traffic signal control (TSC) systems are the primary means of organizing traffic at
intersections, and a reasonable allocation of signal phase durations can improve vehicle
passage efficiency. Early studies focused on calculating signal phase durations or setting
rules for FTSC and ASC, which are unsuitable for dynamically changing traffic flows
[4,8,9]. Hence, ATSC systems were proposed, which dynamically adjust signal timing
based on real-time traffic data collected from various sensors [27]. The core principle of
ATSC involves real-time analysis of traffic patterns and the optimization of signal phases
to improve overall traffic throughput [28,29]. The mainstream approach is dynamic pro-
gramming models [30,31] that predict traffic patterns based on historical and real-time
data to optimize signal timing. Zhao and Ma [32] established an ATSC dynamic planning
model to increase traffic volume at intersections under an alternative design. Dynamic
programming considers the traffic density, vehicle arrival rates, and intersection geome-
try in signal timing allocation [33,34]. Although these models can provide optimal solu-
tions for traffic signal planning, they are computationally intensive and resource-demand-
ing, especially for large-scale networks. Therefore, reinforcement learning (RL), particu-
larly deep reinforcement learning (DRL), offers an innovative solution for training agents
to manage traffic signals [20]. Agents learn optimal strategies through trial and error to
minimize delays and enhance traffic flow stability. Although RL techniques can handle
complex nonlinear traffic patterns, they often require long training periods and significant
computational power.
Previous studies indicated that TSC systems can alter driving behavior, effectively
reducing vehicle carbon emissions and fuel consumption [35–37]. Eco-driving strategies
integrated with TSC systems encourage drivers to adopt energy-saving driving habits,
such as smooth acceleration and deceleration, maintaining optimal speeds and minimiz-
ing idling time [38,39]. By advising drivers to maintain stable speeds and avoid rapid ac-
celeration or braking, TSC systems developed based on eco-driving strategies reduce fuel
consumption and emissions. Hao et al. [26] developed a vehicle eco-approach and depar-
ture framework controlled by ASC to achieve carbon emissions reduction. Dynamic pro-
gramming models [40,41] and RL techniques [42,43] are also applied to optimize signal
timing specifically for decarbonization. These models can prioritize green waves in high-
traffic corridors and adjust signal timings to minimize idling at intersections, both of
which reduce fuel consumption. Some studies use multi-agent deep reinforcement learn-
ing techniques to coordinate multiple traffic signals, ensuring smooth traffic flow and re-
ducing stop-and-go traffic [44,45]. Additionally, TSC systems are designed to prioritize
eco-friendly vehicles, such as electric and hybrid cars, by providing them with longer
green phases or giving them priority at intersections to lower overall emissions [46,47].
To achieve the synergistic optimization of decarbonization and efficiency, balancing
the demand for efficient traffic flow with the goals of reducing emissions and fuel con-
sumption is necessary. Multi-objective optimization frameworks are employed to tackle
this challenge as they can handle conflicting objectives and provide solutions that balance
efficiency and decarbonization [48–50]. These multi-objective frameworks often use evo-
lutionary algorithms or other advanced optimization techniques to find Pareto optimal
solutions. Lin et al. [51] tackle multi-objective urban traffic light scheduling, minimizing
delays and carbon emissions using Q-learning-enhanced algorithms. Furthermore, adap-
tive TSC systems integrating eco-driving strategies and dynamic programming models
are particularly effective in achieving synergistic optimization [52,53]. Integrating eco-driving/carbon-emission objectives with DRL allows the system to learn and adapt to real-time traffic
conditions. Boukerche et al. [54] used deep reinforcement learning to optimize traffic sig-
nals and vehicle speeds, ensuring smooth passage through intersections and reducing de-
lays and emissions. The DRL-based TSC method continuously improves performance by
incorporating real-time data from various sources, including vehicle-to-infrastructure
(V2I) communication, to enhance its optimization capabilities.
continuous interaction with the environment. The real-time state 𝑠 of the environment is
first input to the agent for taking the corresponding action 𝑎 according to its current
knowledge of policy 𝜋. Then, the agent obtains feedback reward 𝑅 (or punishment) from
the environment, and accumulates long-term returns based on the reward. Under the action $a_t$, the state $s_t$ transitions to the state $s_{t+1}$ with a probability of $p_a$. In the learning process, the policy is constantly updated to maximize the expected value of the long-term reward (the action-value function) until this expected value stabilizes under the optimal policy $\pi^*$, i.e., the learning converges. The action-value function is defined as
$$Q^*(s, a) = E\left[R_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \mid s_t = s, a_t = a\right] = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \max_{a'} Q^*(s', a')\right]$$
where $Q^*(s, a)$ denotes the optimal expected return of the optimal policy $\pi$ after action $a$ is taken at state $s$, $E$ is the expected value, $\gamma$ is the discount factor, $R_{t+1}$ is the reward at time step $t+1$, $s_{t+1}$ is the state at time step $t+1$, $a'$ is the action taken at time step $t+1$, and $Q^*(s', a')$ is the optimal expected return of the optimal policy $\pi$ after action $a'$ is taken at state $s_{t+1}$. In addition, $s'$ is the state at time step $t+1$, $r$ is the reward after action $a$ is taken at state $s$, and $p$ is the probability of the state transition. The optimal policy is solved by continuously iterating the optimal action-value function, which is approximated by a neural network with weight parameter $w^\theta$:
$$y(s, a) = r + \gamma \max_{a'} Q^*(s', a'; w^\theta) \quad (5)$$
However, the DQN tends to overestimate Q-values. Therefore, this study employs
the Double DQN (DDQN) framework to design the agent, whose current target action-
value function is defined as
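For reference, the standard DDQN target [68], which decouples action selection from action evaluation, takes the following form (the use of online weights $w$ and target-network weights $w^-$ is an assumed notation here, not necessarily that of the original formulation):
$$y_t^{DDQN} = r_t + \gamma \, Q\big(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; w); \, w^-\big)$$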
3.3. Framework
Based on the DRL architecture, the conceptual framework of the DRL-DG approach consists of the environment and the agent, where the agent is composed of a self-learning algorithm and a TSC component, as shown in Figure 2. The agent applies the DDQN algorithm and receives a reward related to the optimization goals after executing actions affecting the environment. The TSC system takes actions to adjust traffic signal phases to smooth traffic flow.
Traffic environment information is collected and transformed to the state 𝑠𝑡 at the 𝑡
time step as the input of the agent in DRL-DG. Based on 𝑠𝑡 , an action 𝑎𝑡 is selected for
the agent through an 𝜀-greedy policy. According to the action 𝑎𝑡 , the TSC system remains
in the current traffic signal phase or switches to another traffic signal phase to change
vehicular movements on specific lanes. After taking action 𝑎𝑡 , the traffic environment
changes to the state 𝑠𝑡+1 at the next time step 𝑡 + 1. The reward 𝑟𝑡 of the state–action pair
(𝑠𝑡 , 𝑎𝑡 ) is calculated according to the definition of reward functions. Next, the reward 𝑟𝑡
and the state 𝑠𝑡+1 are returned from the environment, forming (𝑠𝑡 , 𝑎𝑡 , 𝑟𝑡 , 𝑠𝑡+1 ) together
with state–action pair (𝑠𝑡 , 𝑎𝑡 ), stored as the agent’s experience in the memory pool. The
state 𝑠𝑡+1 is used as the agent’s input at the next time step 𝑡 + 1. All procedures involving
the input and feedback mechanism between the agent and environment are iterative. Fi-
nally, the agent learns the optimal traffic signal phases and updates the DDQN model
from the memory pool by the experience replay method.
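As an illustration of this loop, the minimal Python sketch below shows one possible structure for the agent-environment interaction; the objects and method names (env, agent, best_action, learn_from) are placeholders and not taken from the paper's implementation.

```python
import random
from collections import deque

memory = deque(maxlen=50000)          # experience replay pool

def run_episode(env, agent, epsilon=0.1):
    """One training episode of the DRL-DG interaction loop (illustrative only)."""
    state = env.reset()                                     # s_t: 0-1 cell occupancy matrix
    done = False
    while not done:
        # epsilon-greedy selection over the four signal-phase actions
        if random.random() < epsilon:
            action = random.randrange(agent.n_actions)
        else:
            action = agent.best_action(state)               # argmax_a Q(s, a)
        next_state, reward, done = env.step(action)         # keep/switch phase, advance SUMO
        memory.append((state, action, reward, next_state))  # (s_t, a_t, r_t, s_{t+1})
        agent.learn_from(memory)                            # sample a minibatch, update DDQN weights
        state = next_state
```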
Figure 3. Schematic diagram of cells designed for west entrance at intersection (state presentation).
Each cell is designed to reflect the distribution of vehicles along the
road. As shown in Figure 3, the cell nearest to the intersection is 7 m long, which can
accommodate only one vehicle. Considering the relatively low traffic density in the road-
way sections far from the intersection, the cell farthest from the intersection is 180 m long.
Compared with the method of using a real-time image or lane uniform division to repre-
sent the state, the proposed division method can reflect the actual nonuniform traffic den-
sity along the road, reduce the data dimension, and shorten the calculation time [77]. Us-
ing the presence of vehicles in each cell as the state can simplify traffic information, give
samples specific labels of the environmental features, facilitate the feature extraction in
the model, and thus increase the stability of convergence.
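A minimal sketch of this encoding, assuming a plausible set of nonuniform cell lengths, is given below; only the 7 m (nearest) and 180 m (farthest) cell lengths and the 500 m approach length come from the text, and the remaining boundaries are illustrative.

```python
import numpy as np

# Illustrative nonuniform cell lengths per lane (m), finest near the stop line;
# only the 7 m (nearest) and 180 m (farthest) values come from the text.
CELL_LENGTHS = [7, 7, 14, 21, 28, 40, 50, 70, 83, 180]
CELL_EDGES = np.cumsum(CELL_LENGTHS)          # distance of each cell's far edge from the stop line

def encode_state(vehicle_positions, n_lanes=8):
    """Map {lane_index: [distances to stop line in m]} to an 8 x 10 occupancy matrix."""
    state = np.zeros((n_lanes, len(CELL_LENGTHS)), dtype=np.int8)
    for lane, distances in vehicle_positions.items():
        for d in distances:
            cell = int(np.searchsorted(CELL_EDGES, d))      # first cell whose far edge >= d
            if cell < len(CELL_LENGTHS):
                state[lane, cell] = 1                       # presence of a vehicle, not a count
    return state

# Example: two vehicles queued on lane 0, one vehicle mid-block on lane 3
print(encode_state({0: [3.0, 12.0], 3: [250.0]}))
```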
3.4.2. Action
The agent selects appropriate actions to divert traffic flow based on the traffic state.
The action in this study is defined as the selection of a possible traffic signal phase. The
action set is 𝐴 = {𝐸𝑊𝐺, 𝐸𝑊𝐿𝐺, 𝑁𝑆𝐺, 𝑁𝑆𝐿𝐺} representing the east–west straight move-
ment and right turn, the east–west left turn, the north–south straight movement and right
turn, and the north–south left turn, respectively. The minimum duration of each green
traffic signal phase is set to 10 s [63]. Meanwhile, a 4 s yellow signal is set during the
switching between green and red signals for intersection safety. At each signal phase de-
cision, if the agent selects the same phase, the green light for that phase is extended by 10
s. Otherwise, a 4 s yellow light is executed before switching to the next phase. In the DRL-DG system, after a phase has been selected six consecutive times, switching to another phase is enforced. Each green phase duration therefore ranges from 10 to 60 s.
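The phase-keeping and phase-switching rules described above can be summarised in the short sketch below; the durations match the text, while the environment methods and the choice of forcing the next phase in cyclic order are assumptions.

```python
GREEN_EXTENSION = 10   # s, minimum green and per-selection extension
YELLOW_TIME = 4        # s, transition before switching phases
MAX_REPEATS = 6        # after six consecutive selections, another phase is enforced

def apply_action(env, chosen_phase, current_phase, repeat_count, n_phases=4):
    """Apply a selected phase following the rules of Section 3.4.2 (illustrative)."""
    if chosen_phase == current_phase:
        if repeat_count < MAX_REPEATS:
            env.extend_green(current_phase, GREEN_EXTENSION)   # same phase: extend green by 10 s
            return current_phase, repeat_count + 1
        chosen_phase = (current_phase + 1) % n_phases          # force a change after 6 repeats (assumed cyclic order)
    env.run_yellow(current_phase, YELLOW_TIME)                 # 4 s yellow before switching
    env.set_green(chosen_phase, GREEN_EXTENSION)               # new phase gets the 10 s minimum green
    return chosen_phase, 1
```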
3.4.3. Reward
At a certain moment, the agent selects an action according to the observed state. Once
the action is executed, the feedback, i.e., reward, is obtained for evaluating the perfor-
mance of the action. The reward function is a key factor in ensuring the convergence of
DRL and the achievement of optimization goals. The dual-objective reward function is
defined by the reward functions of traffic efficiency $R_{TE}$ and carbon emissions $R_{CE}$:
$$R^{(t)} = W_{TE} R_{TE}^{(t)} + W_{CE} R_{CE}^{(t)} \quad (7)$$
where $W_{TE}$ and $W_{CE}$ are the weights of traffic efficiency and carbon emissions set in the dual-objective reward function, respectively.
The weight values in the reward function influence the model’s convergence. Com-
pared to the expert scoring method, analytic hierarchy process, or simple linear weighting,
the entropy weight method calculates weights based on data distribution, reducing sub-
jective biases and providing a more data-driven and adaptable solution. The entropy
weight method is used to adjust the weights based on the reward values in the DRL-based
ATSC system [49,78]. Given that the entropy method is sensitive to data distribution and
has initial subjective weighting issues, data normalization and the dynamic adjustment of
weights based on real-time traffic data and reward values are implemented, ensuring sta-
ble and reliable weighting results.
$$y_i = \frac{x_i - \min\{x_i\}}{\max\{x_i\} - \min\{x_i\}} \quad (8)$$
where $\min\{x_i\}$ and $\max\{x_i\}$ represent the minimum and maximum values of the $i$-th reward, respectively.
$$P_{ij} = \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}}, \quad 0 \le P_{ij} \le 1 \quad (9)$$
where $x_{ij}$ is the reward value at action $i$ calculated by the reward function $j$.
$$g_j = 1 - H_j \quad (12)$$
where $g_j$ is the coefficient of variation of the reward function $j$.
$$w_j = \frac{g_j}{\sum_{j=1}^{n} g_j} \quad (13)$$
where $w_j$ is the weight value of the reward function $j$, i.e., the value of $W_{TE}$ and $W_{CE}$.
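A compact sketch of this weighting step is shown below, assuming the weights are recomputed from a recent window of reward samples for the two objectives; the normalization and entropy computations follow the spirit of Eqs. (8)-(13), and applying the share calculation to the normalized values is an assumption of the sketch.

```python
import numpy as np

def entropy_weights(reward_samples):
    """Entropy-weight method over a window of recent reward samples.

    reward_samples: shape (m, n), with m recorded actions (rows) and n reward
    functions (columns); here n = 2 for R_TE and R_CE. Returns (W_TE, W_CE).
    """
    x = np.asarray(reward_samples, dtype=float)
    m = x.shape[0]
    # Eq. (8): min-max normalization of each reward function
    y = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0) + 1e-12)
    # Eq. (9): share of each action in each reward function (normalized values assumed)
    p = y / (y.sum(axis=0) + 1e-12)
    # entropy of each reward function, scaled into [0, 1]
    logp = np.log(np.where(p > 0, p, 1.0))        # log(1) = 0 where the share is zero
    h = -(p * logp).sum(axis=0) / np.log(m)
    # Eqs. (12)-(13): divergence coefficient and normalized weights
    g = 1.0 - h
    return g / g.sum()

# Example with hypothetical reward samples for (R_TE, R_CE)
print(entropy_weights([[-12.0, -30.0], [-8.0, -22.0], [-15.0, -41.0], [-5.0, -18.0]]))
```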
In terms of traffic efficiency, minimizing travel delays is the primary goal. Previous
studies have proved that the waiting time of vehicles at the intersection can be used as an
indicator of travel delay [58,61,64].
CWT denotes the cumulative or total waiting time of all vehicles stopping and wait-
ing at the lane before crossing the intersection. A longer waiting time indicates longer
delays. The difference in CWT between two adjacent execution time steps refers to the
reward function indicating traffic efficiency:
$$R_{TE}^{(t)} = -\left(CWT^{(t+1)} - CWT^{(t)}\right) \quad (14)$$
where 𝐶𝑊𝑇(𝑡) and 𝐶𝑊𝑇(𝑡+1) denote the cumulative waiting time at step 𝑡 and 𝑡 + 1, re-
spectively.
In terms of carbon emissions, the major component is carbon dioxide. Thus, the second goal is minimizing carbon dioxide emissions (CDEs). The difference in CDE between two
adjacent executing actions refers to the reward function indicating carbon emission reduc-
tions:
$$R_{CE}^{(t)} = -\left(PE^{(t+1)} - PE^{(t)}\right) \quad (15)$$
where 𝑃𝐸(𝑡) and 𝑃𝐸(𝑡+1) denote the cumulative carbon dioxide emissions of step 𝑡 and
𝑡 + 1, respectively.
Carbon dioxide emissions are acquired by the pollutant emission model of SUMO
[79], which defines the emission quantity (g/h) as a function of the vehicular current en-
gine power using typical emission curves over power (CEPs). The total carbon dioxide
emissions 𝑃𝐸 are defined as
where
$$P_{Air} = \left(c_d \times A \times \frac{\rho}{2}\right) v^3 \quad (18)$$
$$P_{Accel} = \left(m_{vehicle} + m_{rot} + m_{load}\right) a v \quad (19)$$
The given state, represented by 80 cells (i.e., a 0–1 matrix of size 8 × 10), is mapped to the Q-value of each action through convolutional and fully connected layers. Based on the size of the 0–1 state matrix, two convolution layers with 100 and 10 kernels are set up [80], each with a filter size of 1 × 3 and a stride of 2, to extract lane-level features. The output of the final convolution layer is flattened via a pooling layer into a state vector that is passed to the subsequent layers. The LSTM includes two layers with 80 units and a 0.2 dropout rate. The number of fully connected layers is 5, each with a width of 400. Using the Adam optimizer, the learning rate is 0.001, the batch size is 100, and the training iteration is 800 times per round, with the mean square error as the loss function. The Q-value indicates the expected reward value; thus, the optimal selection is the action with the highest Q-value. The agent's experience at every time step is stored in the memory pool, and the network is trained by the experience replay method to update its weight parameters.
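The layer settings in this paragraph can be assembled into a network along the following lines. This PyTorch sketch is illustrative only; details the text leaves open, such as the activation functions, the single-channel 2D convolution over the 8 × 10 matrix, and feeding the flattened convolutional features to the LSTM as a length-1 sequence, are assumptions.

```python
import torch
import torch.nn as nn

class CnnLstmQNet(nn.Module):
    """Hypothetical sketch of the CNN+LSTM Q-network described in the text."""

    def __init__(self, n_actions: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 100, kernel_size=(1, 3), stride=2), nn.ReLU(),
            nn.Conv2d(100, 10, kernel_size=(1, 3), stride=2), nn.ReLU(),
            nn.Flatten(),                      # -> (batch, 10 * 2 * 1) = (batch, 20)
        )
        self.lstm = nn.LSTM(input_size=20, hidden_size=80,
                            num_layers=2, dropout=0.2, batch_first=True)
        fc_layers, in_dim = [], 80
        for _ in range(5):                     # five fully connected layers, width 400
            fc_layers += [nn.Linear(in_dim, 400), nn.ReLU()]
            in_dim = 400
        self.fc = nn.Sequential(*fc_layers, nn.Linear(400, n_actions))

    def forward(self, state):                  # state: (batch, 1, 8, 10) 0-1 matrix
        feats = self.conv(state)               # (batch, 20)
        seq_out, _ = self.lstm(feats.unsqueeze(1))   # treated as a length-1 sequence (assumption)
        return self.fc(seq_out[:, -1, :])      # one Q-value per signal-phase action

# Optimizer and loss settings reported in the text
net = CnnLstmQNet()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss_fn = nn.MSELoss()
q_values = net(torch.zeros(100, 1, 8, 10))     # batch size 100 -> (100, 4)
```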
4. Case Validation
Based on a representative signalized intersection in the Changsha urban road net-
work, Simulation of Urban Mobility (SUMO) software is adopted to build the simulated
intersection scenario for training and testing the proposed algorithm. In the simulation,
the algorithm collects traffic information and controls traffic signal phases by the Traci
interface coded directly in Python. The agent in DRL-DG is trained under a random traffic
flow generated by a Weibull distribution. The performance of the proposed DRL-DG is
evaluated at the simulated intersection with real-world traffic flow data recorded on video and compared with that of three classic traffic signal control algorithms.
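For illustration, the snippet below shows one way per-step traffic information can be collected through SUMO's TraCI Python interface and turned into the reward terms of Eqs. (14), (15), and (7); the configuration file name, traffic-light ID, simplified per-second loop, and fixed phase command are placeholders rather than the paper's implementation.

```python
import traci

def measure_step():
    """Return cumulative waiting time (s) and CO2 emitted in the last step (g) over all vehicles."""
    veh_ids = traci.vehicle.getIDList()
    cwt = sum(traci.vehicle.getAccumulatedWaitingTime(v) for v in veh_ids)     # CWT
    co2_g = sum(traci.vehicle.getCO2Emission(v) for v in veh_ids) / 1000.0     # mg -> g per 1 s step
    return cwt, co2_g

traci.start(["sumo", "-c", "intersection.sumocfg"])     # placeholder configuration file
cwt_prev, cde_prev, cde_total = 0.0, 0.0, 0.0
w_te, w_ce = 0.5, 0.5                                   # weights supplied by the entropy method
for _ in range(3600):                                   # one simulated hour, 1 s steps
    traci.trafficlight.setPhase("center", 0)            # placeholder: the agent's action would set the phase
    traci.simulationStep()
    cwt, co2_g = measure_step()
    cde_total += co2_g
    r_te = -(cwt - cwt_prev)                            # Eq. (14)
    r_ce = -(cde_total - cde_prev)                      # Eq. (15)
    reward = w_te * r_te + w_ce * r_ce                  # Eq. (7)
    cwt_prev, cde_prev = cwt, cde_total
traci.close()
```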
4.1. Scenario
The experimental scenario refers to the intersection of Kaiyuan East Road and
Huangxing Avenue in Changsha City, Hunan Province, China. The intersection is a typical cruciform signalized intersection in China, which connects four 500 m long dual carriageways with four lanes in each direction, as shown in Figure 5. For each entrance direction, there is an
inside lane for left-turning vehicles, two middle lanes for straight-ahead vehicles, and an
outside lane for right-turning or straight-ahead vehicles. The vehicles on the outside lane
are permitted to turn right during the red signal phase without conflicts in the intersection
area.
The four directions of the real-world intersection are all business areas with a bal-
anced traffic flow distribution. The real-world traffic flow data were collected from 7:30
a.m. to 8:30 a.m., part of the peak hours in Changsha City, on Thursday, 23 June 2022. A total of 979 vehicles were observed during this period. The number of vehicles increases significantly during the peak hours, especially in the middle lanes, which account for about 70% of the total number of vehicles, as presented in Figure 6, causing traffic congestion. In such a case, the traffic flow approximately obeys a Weibull distribution, which is thus used to simulate the flow distributions during peak hours.
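A small sketch of how such Weibull-distributed demand could be generated for the simulation is given below; the shape parameter and the rescaling of the samples into one simulated hour are assumptions, while the 979-vehicle count follows the observation above.

```python
import numpy as np

rng = np.random.default_rng(42)

def weibull_departures(n_vehicles=979, horizon_s=3600.0, shape=2.0):
    """Sample vehicle departure times (s) whose density follows a Weibull distribution."""
    samples = rng.weibull(shape, size=n_vehicles)      # unit-scale Weibull draws
    times = samples / samples.max() * horizon_s        # rescale into the simulated hour (assumption)
    return np.sort(times)

departures = weibull_departures()
print(departures[:5], departures[-1])                  # earliest departures and the last one
```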
the maximum acceleration, 4.5 m/s² as the maximum deceleration, and 0.5 as the driver imperfection parameter (sigma).
Table 1 shows the detailed setting for the simulation and algorithm. The parameters
presented are for the agent (action number, duration of signal phases) and algorithm (ep-
isode, step, learning rate, batch size, memory, etc.).
$$P = \left(m \cdot a + m \cdot g \cdot C_r + \frac{1}{2} \cdot p \cdot A \cdot C_d \cdot v^2\right) \cdot v \quad (24)$$
where 𝑃 is the engine power in kilowatts (kW). 𝐸𝐹𝐶𝑂 and 𝐸𝐹𝑁𝑂𝑥 are the emission fac-
tors for CO and NOx (grams/kWh). 𝑣 is the vehicle speed (m/s). 𝑚 is the vehicle mass
(kg). 𝑎 is the vehicle acceleration in (m/s2). 𝑔 is the gravitational acceleration, typically
9.81 m/s2. 𝐶𝑟 is the rolling resistance coefficient. 𝑝 is the air density (kg/m3), typically
1.225 kg/m3 at sea level and 15 °C. 𝐴 is the vehicle frontal area (m²). 𝐶𝑑 is the air re-
sistance coefficient.
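As a worked example of Eq. (24), the short function below computes the instantaneous engine power; the vehicle mass, frontal area, and resistance coefficients used in the example call are illustrative assumptions.

```python
def engine_power_kw(v, a, m=1500.0, g=9.81, c_r=0.012, rho=1.225, area=2.2, c_d=0.30):
    """Instantaneous engine power (kW) from Eq. (24): P = (m*a + m*g*Cr + 0.5*rho*A*Cd*v^2) * v."""
    p_watts = (m * a + m * g * c_r + 0.5 * rho * area * c_d * v**2) * v
    return p_watts / 1000.0

# A car cruising at 13.9 m/s (~50 km/h) with a mild 0.5 m/s^2 acceleration
print(round(engine_power_kw(13.9, 0.5), 1))   # roughly 14 kW under these assumed parameters
```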
Table 2. Average performance of traffic signal control methods with evaluation metrics.
TSC | Average VWT (s/vehicle) | Average VQL (vehicle/s) | Average CDE (g/vehicle) | Average CDE (rate) | Average VFC (mL/vehicle) | Average NGE (g/vehicle)
emit higher levels of pollutants, such as CO2, CO, and NOx, during idling and frequent
stop-and-go movements. The DRL-DG system effectively lowers emissions by minimiz-
ing these periods through optimized signal timings. Specifically, the system adjusts signal
phases dynamically, ensuring that vehicles spend less time idling at red lights and expe-
rience fewer abrupt stops and starts. This leads to a smoother traffic flow with reduced
acceleration and deceleration cycles, significantly reducing carbon emissions. The macro-
scopic simulation results confirm this point (see Figure 12), showing that the DRL-DG
system has the lowest vehicle acceleration/deceleration frequency. In fact, the DRL-DG
system influences several critical factors directly impacting emissions: it reduces idle
times, ensures smoother transitions through intersections, and adapts to real-time traffic
conditions to prevent congestion. These adjustments result in a substantial reduction in
overall fuel consumption and emissions. In simulations, the DRL-DG system achieved a
69.71% reduction in CO2 emissions compared to FTSC and a 52.71% reduction compared
to ASC. Further, CO and NOx emissions were significantly reduced, proving the environ-
mental benefits of the DRL-DG system and its potential to improve sustainable urban mo-
bility by lowering harmful emissions.
Several factors also influence vehicle carbon emissions at intersections, including in-
tersection design, vehicle types, and traffic volume. The design of an intersection, such as
the number of lanes, presence of dedicated turning lanes, and overall layout, can signifi-
cantly affect traffic flow and emissions. Well-designed intersections that minimize vehicle
idling and facilitate smooth traffic flow can reduce emissions. Additionally, the types of
vehicles and their respective emission rates impact overall emissions. Eco-friendly cars
produce fewer emissions than conventional vehicles. Traffic volume is another critical fac-
tor: high traffic volumes often lead to increased idling times and more frequent stop-and-
go movements, contributing to higher emissions. Traffic signal optimization aims to pre-
vent high traffic volumes, thereby reducing carbon emissions. Therefore, effective policy
measures are essential to addressing these factors and reducing vehicle emissions at in-
tersections. Implementing congestion pricing can decrease traffic volume during peak
hours, reducing emissions. Incentives for eco-driving behaviors and driver education can
promote energy-efficient driving practices. Investing in smart infrastructure, such as
adaptive traffic signal control systems and real-time traffic monitoring, can enhance traffic
flow and reduce emissions.
5. Conclusions
To improve traffic efficiency and reduce carbon emissions at intersections, this study
proposes a deep reinforcement learning-based dual-objective optimization algorithm for
the adaptive traffic signal control system. The objectives of this study are achieved by
reducing vehicle waiting time and carbon dioxide emissions through the proposed DRL-
DG-based ATSC traffic control systems. In addition, the performance of the proposed sys-
tem in reducing vehicle fuel consumption and toxic gas emissions is also evaluated.
Based on the video data collected from an isolated intersection in Changsha City,
China, the intersection and traffic flow are simulated through SUMO. Based on the simu-
lated intersection, the proposed DRL-DG algorithm is trained and tested with an equal pri-
ority set for vehicle waiting time and carbon dioxide emissions. For comparison purposes,
fixed-time signal control (FTSC), actuated signal control (ASC), and DRL-based ATSC that
optimizes only traffic efficiency (DRL-SG) are also trained and tested. In terms of traffic efficiency, the results show that the DRL-DG and DRL-SG methods perform similarly, with no significant difference. However, DRL-DG performs much better than FTSC and ASC, with a reduction of more than 71% in vehicle waiting time. Regarding carbon dioxide emissions, the DRL-DG
method performs best with a reduction of more than 46%. The traffic control system devel-
oped based on the proposed DRL-DG also shows its advantage in the reduction in vehicle
fuel consumption and toxic gas emissions. For all evaluation metrics, the performance of the
proposed algorithm is especially outstanding in high-traffic-flow situations.
The proposed DRL-DG-based traffic control systems are suitable for intersections
with heavy traffic, considering their overwhelming advantage in high-traffic-flow situa-
tions and the limited funds available for system development. By revising the weights of
the two objectives, the algorithms can be adjusted to government policies and practical demands regarding the trade-off between traffic efficiency and carbon emissions.
This study is not without limitations. In terms of objectives, road safety, especially
traffic conflicts, which is another important issue of traffic in intersections, is not taken
into consideration. In addition, the DRL-DG in the ATSC system faces challenges such as
requiring extensive high-quality data, hyperparameter tuning, system complexity,
lengthy training times, and ensuring robustness under diverse conditions. Future research
will address these issues, aiming to develop more efficient, scalable, and practical DRL-
DG-optimized ATSC systems for diverse urban environments.
Author Contributions: Conceptualization, G.Z. and F.C.; methodology, G.Z.; software, G.Z. and
Z.Z.; validation, G.Z., F.C., and Z.Z.; formal analysis, G.Z. and H.H.; investigation, G.Z. and H.H.;
resources, G.Z.; data curation, G.Z. and Z.Z.; writing—original draft preparation, G.Z.; writing—
review and editing, G.Z., F.C., H.H., and Z.Z.; visualization, G.Z. and Z.Z.; supervision, F.C. and
H.H.; project administration, F.C.; funding acquisition, F.C. and H.H. All authors have read and
agreed to the published version of the manuscript.
Funding: This research was funded by the National Key Research and Development Program of
China (grant number 2023YFB2504704) and the Natural Science Foundation in Hunan Province
(grant number S2023JJQNJJ1969).
Data Availability Statement: The raw data supporting the conclusions of this article will be made
available by the authors on request.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Zhu, L.; Yu, F.R.; Wang, Y.; Ning, B.; Tang, T. Big Data Analytics in Intelligent Transportation Systems: A Survey. IEEE Trans.
Intell. Transp. Syst. 2019, 20, 383–398. https://doi.org/10.1109/tits.2018.2815678.
2. Zhao, Y.; Tian, Z. Applicability of Adaptive Traffic Control Systems in Nevada’s Urban Areas. No. 092-09-803. Nevada Department of
Transportation: Carson City, NV, USA, 2011.
3. Federal Highway Administration. Traffic Signal Timing Manual. Technical Report FHWA-HOP-08-024, U.S. Department of
Transportation: Washington, DC, USA, 2008.
4. Muralidharan, A.; Pedarsani, R.; Varaiya, P. Analysis of fixed-time control. Transp. Res. Part B Methodol. 2015, 73, 81–90.
5. Celtek, S.A.; Durdu, A.; Ali, M.E.M. Evaluating Action Durations for Adaptive Traffic Signal Control Based On Deep Q-Learn-
ing. Int. J. Intell. Transp. Syst. Res. 2021, 19, 557–571. https://doi.org/10.1007/s13177-021-00262-5.
6. Roess, R.P.; Prassas, E.S.; Mcshane, W.R. Traffic Engineering; Pearson/Prentice Hall: Hoboken, NJ, USA, 2014.
7. Zhou, P.; Fang, Z.; Dong, H.; Liu, J.; Pan, S. Data analysis with multi-objective optimization algorithm: A study in smart traffic
signal system. In Proceedings of the 2017 IEEE 15th International Conference on Software Engineering Research, Management
and Applications (SERA), London, UK, 7–9 June 2017; pp. 307–310.
8. Cesme, B.; Furth, P.G. Self-organizing traffic signals using secondary extension and dynamic coordination. Transp. Res. Part C
Emerg. Technol. 2014, 48, 1–15. https://doi.org/10.1016/j.trc.2014.08.006.
9. Wang, X.B.; Yin, K.; Liu, H. Vehicle actuated signal performance under general traffic at an isolated intersection. Transp. Res.
Part C Emerg. Technol. 2018, 95, 582–598. https://doi.org/10.1016/j.trc.2018.08.002.
10. Eom, M.; Kim, B.-I. The traffic signal control problem for intersections: A review. Eur. Transp. Res. Rev. 2020, 12, 50.
https://doi.org/10.1186/s12544-020-00440-8.
11. Wang, Y.; Yang, X.; Liang, H.; Liu, Y. A Review of the Self-Adaptive Traffic Signal Control System Based on Future Traffic
Environment. J. Adv. Transp. 2018, 2018, 1096123. https://doi.org/10.1155/2018/1096123.
12. Stevanovic, A.; Kergaye, C.; Martin, P.T. Scoot and scats: A closer look into their operations. In Proceedings of the 88th Annual
Meeting of the Transportation Research Board, Washington DC, USA, 11–15 January 2009.
13. Zhao, D.; Dai, Y.; Zhang, Z. Computational Intelligence in Urban Traffic Signal Control: A Survey. IEEE Trans. Syst. Man Cybern.
Part C Appl. Rev. 2011, 42, 485–494. https://doi.org/10.1109/tsmcc.2011.2161577.
14. Wei, H.; Zheng, G.; Gayah, V.; Li, Z. Recent advances in reinforcement learning for traffic signal control: A survey of models
and evaluation. ACM SIGKDD Explor. Newsl. 2021, 22, 12–18.
15. Balaji, P.; German, X.; Srinivasan, D. Urban traffic signal control using reinforcement learning agents. IET Intell. Transp. Syst.
2010, 4, 177–188. https://doi.org/10.1049/iet-its.2009.0096.
16. Mikami, S.; Kakazu, Y. Genetic reinforcement learning for cooperative traffic signal control. In Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, Orlando, FL, USA, 27–29 June 1994; pp. 223–228.
17. Dai, Y.; Hu, J.; Zhao, D.; Zhu, F. Neural network based online traffic signal controller design with reinforcement training. In
Proceedings of the 2011 14th International IEEE Conference on Intelligent Transportation Systems—(ITSC 2011), Washington,
DC, USA, 5–7 October 2011; pp. 1045–1050.
18. Arel, I.; Liu, C.; Urbanik, T.; Kohls, A. Reinforcement learning-based multi-agent system for network traffic signal control. IET
Intell. Transp. Syst. 2010, 4, 128. https://doi.org/10.1049/iet-its.2009.0070.
19. Haydari, A.; Yilmaz, Y. Deep Reinforcement Learning for Intelligent Transportation Systems: A Survey. IEEE Trans. Intell.
Transp. Syst. 2020, 23, 11–32. https://doi.org/10.1109/tits.2020.3008612.
20. Cel
21. , M.; Naik, A.; Goodman, L.; Crebo, J.; Abrar, T.; Abad, Z.S.H.; Bazzan, A.L.; Far, B. Reinforcement learning in urban network
traffic signal control: A systematic literature review. Expert Syst. Appl. 2022, 199, 116830.
https://doi.org/10.1016/j.eswa.2022.116830.
22. Gregurić, M.; Vujić, M.; Alexopoulos, C.; Miletić, M. Application of Deep Reinforcement Learning in Traffic Signal Control: An
Overview and Impact of Open Traffic Data. Appl. Sci. 2020, 10, 4011. https://doi.org/10.3390/app10114011.
23. Liu, S.; Wu, G.; Barth, M. A Complete State Transition-Based Traffic Signal Control Using Deep Reinforcement Learning. In
Proceedings of the 2022 IEEE Conference on Technologies for Sustainability (SusTech), Corona, CA, USA, 21–23 April 2022; pp.
100–107.
24. Vasconcelos, L.; Silva, A.B.; Seco, Á.M.; Fernandes, P.; Coelho, M.C. Turboroundabouts: Multicriterion assessment of intersec-
tion capacity, safety, and emissions. Transp. Res. Rec. 2014, 2402, 28–37.
25. Yao, R.; Wang, X.; Xu, H.; Lian, L. Emission factor calibration and signal timing optimisation for isolated intersections. IET Intell.
Transp. Syst. 2018, 12, 158–167. https://doi.org/10.1049/iet-its.2016.0332.
26. Yao, R.; Sun, L.; Long, M. VSP‐based emission factor calibration and signal timing optimisation for arterial streets. IET Intell.
Transp. Syst. 2019, 13, 228–241. https://doi.org/10.1049/iet-its.2018.5066.
27. Hao, P.; Wu, G.; Boriboonsomsin, K.; Barth, M.J. Eco-Approach and Departure (EAD) Application for Actuated Signals in Real-
World Traffic. IEEE Trans. Intell. Transp. Syst. 2018, 20, 30–40. https://doi.org/10.1109/TITS.2018.2794509.
28. Shepelev, V.; Glushkov, A.; Fadina, O.; Gritsenko, A. Comparative Evaluation of Road Vehicle Emissions at Urban Intersections
with Detailed Traffic Dynamics. Mathematics 2022, 10, 1887. https://doi.org/10.3390/math10111887.
29. Shepelev, V.; Glushkov, A.; Slobodin, I.; Balfaqih, M. Studying the Relationship between the Traffic Flow Structure, the Traffic
Capacity of Intersections, and Vehicle-Related Emissions. Mathematics 2023, 11, 3591. https://doi.org/10.3390/math11163591.
30. Jovanović, A.; Kukić, K.; Stevanović, A.; Teodorović, D. Restricted crossing U-turn traffic control by interval Type-2 fuzzy logic.
Expert Syst. Appl. 2023, 211, 118613. https://doi.org/10.1016/j.eswa.2022.118613.
31. Zheng, L.; Li, X. Simulation‐based optimization method for arterial signal control considering traffic safety and efficiency under
uncertainties. Comput. Civ. Infrastruct. Eng. 2023, 38, 640–659. https://doi.org/10.1111/mice.12876.
32. Tsitsokas, D.; Kouvelas, A.; Geroliminis, N. Two-layer adaptive signal control framework for large-scale dynamically-congested
networks: Combining efficient Max Pressure with Perimeter Control. Transp. Res. Part C Emerg. Technol. 2023, 152, 104128.
https://doi.org/10.1016/j.trc.2023.104128.
33. Zhao, J.; Ma, W. An Alternative Design for the Intersections with Limited Traffic Lanes and Queuing Space. IEEE Trans. Intell.
Transp. Syst. 2021, 22, 1473–1483. https://doi.org/10.1109/tits.2020.2971353.
34. Li, J.; Peng, L.; Xu, S.; Li, Z. Distributed edge signal control for cooperating pre-planned connected automated vehicle path and
signal timing at edge computing-enabled intersections. Expert Syst. Appl. 2024, 241, 122570.
https://doi.org/10.1016/j.eswa.2023.122570.
35. Li, J.; Yu, C.; Shen, Z.; Su, Z.; Ma, W. A survey on urban traffic control under mixed traffic environment with connected auto-
mated vehicles. Transp. Res. Part C Emerg. Technol. 2023, 154, 104258. https://doi.org/10.1016/j.trc.2023.104258.
36. McKenney, D.; White, T. Distributed and adaptive traffic signal control within a realistic traffic simulation. Eng. Appl. Artif.
Intell. 2013, 26, 574–583. https://doi.org/10.1016/j.engappai.2012.04.008.
37. Tan, W.; Li, Z.C.; Tan, Z.J. Modeling the effects of speed limit, acceleration, and deceleration on overall delay and traffic emission
at a signalized intersection. J. Transp. Eng. Part A-Syst. 2017, 143, 04017063.
38. Shi, X.; Zhang, J.; Jiang, X.; Chen, J.; Hao, W.; Wang, B. Learning eco-driving strategies from human driving trajectories. Phys.
A Stat. Mech. Its Appl. 2024, 633, 129353. https://doi.org/10.1016/j.physa.2023.129353.
39. Rabinowitz, A.I.; Ang, C.C.; Mahmoud, Y.H.; Araghi, F.M.; Meyer, R.T.; Kolmanovsky, I.; Asher, Z.D.; Bradley, T.H. Real-Time
Implementation Comparison of Urban Eco-Driving Controls. IEEE Trans. Control. Syst. Technol. 2023, 32, 143–157.
https://doi.org/10.1109/tcst.2023.3304910.
40. Li, Y.; Yang, Y.; Lin, X.; Hu, X. Traffic Information-Based Hierarchical Control Strategies for Eco-Driving of Plug-In Hybrid
Electric Vehicles. IEEE Trans. Veh. Technol. 2023, 73, 3206–3217. https://doi.org/10.1109/tvt.2023.3326989.
41. Dong, S.; Harzer, J.; Frey, J.; Meng, X.; Liu, Q.; Gao, B.; Diehl, M.; Chen, H. Cooperative Eco-Driving Control of Connected
Multi-Vehicles With Spatio-Temporal Constraints. IEEE Trans. Intell. Veh. 2023, 9, 1733–1743.
https://doi.org/10.1109/tiv.2023.3282490.
42. Zhang, Z.; Ding, H.; Guo, K.; Zhang, N. An Eco-driving Control Strategy for Connected Electric Vehicles at Intersections Based
on Preceding Vehicle Speed Prediction. IEEE Trans. Transp. Electrif. 2024, PP, 1–1. https://doi.org/10.1109/tte.2024.3410278.
43. Boukerche, A.; Zhong, D.; Sun, P. FECO: An Efficient Deep Reinforcement Learning-Based Fuel-Economic Traffic Signal Control
Scheme. IEEE Trans. Sustain. Comput. 2021, 7, 144–156. https://doi.org/10.1109/tsusc.2021.3138926.
44. Ding, H.; Zhuang, W.; Dong, H.; Yin, G.; Liu, S.; Bai, S. Eco-Driving Strategy Design of Connected Vehicle among Multiple
Signalized Intersections Using Constraint-enforced Reinforcement Learning. IEEE Trans. Transp. Electrif. 2024, PP, 1–1.
https://doi.org/10.1109/tte.2024.3396122.
45. Wang, Q.; Ju, F.; Wang, H.; Qian, Y.; Zhu, M.; Zhuang, W.; Wang, L. Multi-agent reinforcement learning for ecological car-
following control in mixed traffic. IEEE Trans. Transp. Electrification 2024, PP, 1–1. https://doi.org/10.1109/tte.2024.3383091.
46. Feng, J.; Lin, K.; Shi, T.; Wu, Y.; Wang, Y.; Zhang, H.; Tan, H. Cooperative traffic optimization with multi-agent reinforcement
learning and evolutionary strategy: Bridging the gap between micro and macro traffic control. Phys. A Stat. Mech. Its Appl. 2024,
647, 129734. https://doi.org/10.1016/j.physa.2024.129734.
47. Krishankumar, R.; Pamucar, D.; Deveci, M.; Ravichandran, K.S. Prioritization of zero-carbon measures for sustainable urban
mobility using integrated double hierarchy decision framework and EDAS approach. Sci. Total. Environ. 2021, 797, 149068.
https://doi.org/10.1016/j.scitotenv.2021.149068.
48. Liu, J.; Wang, C.; Zhao, W. An eco-driving strategy for autonomous electric vehicles crossing continuous speed-limit signalized
intersections. Energy 2024, 294, 130829. https://doi.org/10.1016/j.energy.2024.130829.
49. Zhang, X.; Fan, X.; Yu, S.; Shan, A.; Fan, S.; Xiao, Y.; Dang, F. Intersection Signal Timing Optimization: A Multi-Objective Evo-
lutionary Algorithm. Sustainability 2022, 14, 1506. https://doi.org/10.3390/su14031506.
50. Zhang, G.; Chang, F.; Jin, J.; Yang, F.; Huang, H. Multi-objective deep reinforcement learning approach for adaptive traffic
signal control system with concurrent optimization of safety, efficiency, and decarbonization at intersections. Accid. Anal. Prev.
2024, 199, 107451. https://doi.org/10.1016/j.aap.2023.107451.
51. Salem, S.; Leonhardt, A. Optimizing Traffic Adaptive Signal Control: A Multi-Objective Simulation-Based Approach for En-
hanced Transportation Efficiency. In Proceedings of the 10th International Conference on Vehicle Technology and Intelligent
Transport Systems – VEHITS, Angers, France, 2-4 May 2024; pp. 344-351. https://doi.org/10.5220/0012682100003702.
52. Lin, Z.; Gao, K.; Wu, N.; Suganthan, P.N. Problem-Specific Knowledge Based Multi-Objective Meta-Heuristics Combined Q-
Learning for Scheduling Urban Traffic Lights With Carbon Emissions. IEEE Trans. Intell. Transp. Syst. 2024, PP, 1–12.
https://doi.org/10.1109/tits.2024.3397077.
53. Deshpande, S.R.; Jung, D.; Bauer, L.; Canova, M. Integrated Approximate Dynamic Programming and Equivalent Consumption
Minimization Strategy for Eco-Driving in a Connected and Automated Vehicle. IEEE Trans. Veh. Technol. 2021, 70, 11204–11215.
https://doi.org/10.1109/tvt.2021.3102505.
54. Wan, C.; Shan, X.; Hao, P.; Wu, G. Multi-objective coordinated control strategy for mixed traffic with partially connected and
automated vehicles in urban corridors. Phys. A Stat. Mech. Its Appl. 2024, 635, 129485.
https://doi.org/10.1016/j.physa.2023.129485.
55. Boukerche, A.; Zhong, D.; Sun, P. FECO: An Efficient Deep Reinforcement Learning-Based Fuel-Economic Traffic Signal Control Scheme. IEEE Trans. Sustain. Comput. 2021, 7, 144–156.
56. Jamil, A.R.M.; Ganguly, K.K.; Nower, N. Adaptive traffic signal control system using composite reward architecture based deep
reinforcement learning. IET Intell. Transp. Syst. 2020, 14, 2030–2041. https://doi.org/10.1049/iet-its.2020.0443.
57. Liu, C.; Sheng, Z.; Chen, S.; Shi, H.; Ran, B. Longitudinal control of connected and automated vehicles among signalized inter-
sections in mixed traffic flow with deep reinforcement learning approach. Phys. A Stat. Mech. Its Appl. 2023, 629, 129189.
https://doi.org/10.1016/j.physa.2023.129189.
58. Hua, C.; Fan, W.D. Safety-oriented dynamic speed harmonization of mixed traffic flow in nonrecurrent congestion. Phys. A Stat.
Mech. Its Appl. 2024, 634, 129439. https://doi.org/10.1016/j.physa.2023.129439.
59. Jamil, A.R.M.; Nower, N. A Comprehensive Analysis of Reward Function for Adaptive Traffic Signal Control. Knowl. Eng. Data
Sci. 2021, 4, 85–96. https://doi.org/10.17977/um018v4i22021p85-96.
60. Ahmed, A.A.; Malebary, S.J.; Ali, W.; Barukab, O.M. Smart Traffic Shaping Based on Distributed Reinforcement Learning for
Multimedia Streaming over 5G-VANET Communication Technology. Mathematics 2023, 11, 700.
https://doi.org/10.3390/math11030700.
61. Agafonov, A.; Yumaganov, A.; Myasnikov, V. Cooperative Control for Signalized Intersections in Intelligent Connected Vehicle
Environments. Mathematics 2023, 11, 1540. https://doi.org/10.3390/math11061540.
62. Genders, W.; Razavi, S. Evaluating reinforcement learning state representations for adaptive traffic signal control. Procedia Com-
put. Sci. 2018, 130, 26–33. https://doi.org/10.1016/j.procs.2018.04.008.
63. Dong, L.; Xie, X.; Lu, J.; Feng, L.; Zhang, L. OAS Deep Q-Learning-Based Fast and Smooth Control Method for Traffic Signal
Transition in Urban Arterial Tidal Lanes. Sensors 2024, 24, 1845. https://doi.org/10.3390/s24061845.
64. Aslani, M.; Mesgari, M.S.; Wiering, M. Adaptive traffic signal control with actor-critic methods in a real-world traffic network
with different traffic disruption events. Transp. Res. Part C Emerg. Technol. 2017, 85, 732–752.
https://doi.org/10.1016/j.trc.2017.09.020.
65. Touhbi, S.; Babram, M.A.; Nguyen-Huu, T.; Marilleau, N.; Hbid, M.L.; Cambier, C.; Stinckwich, S. Adaptive Traffic Signal Con-
trol : Exploring Reward Definition For Reinforcement Learning. Procedia Comput. Sci. 2017, 109, 513–520.
https://doi.org/10.1016/j.procs.2017.05.327.
66. Li, D.; Wu, J.; Xu, M.; Wang, Z.; Hu, K. Adaptive Traffic Signal Control Model on Intersections Based on Deep Reinforcement
Learning. J. Adv. Transp. 2020, 2020, 6505893. https://doi.org/10.1155/2020/6505893.
67. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal
Process. Mag. 2017, 34, 26–38. https://doi.org/10.1109/msp.2017.2743240.
68. Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Confer-
ence on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. https://doi.org/10.1609/aaai.v30i1.10295.
69. Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952.
70. Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling network architectures for deep reinforcement
learning. Int. Conf. Mach. Learn. PMLR, 2016; pp. 1995–2003. https://proceedings.mlr.press/v48/wangf16.html.
71. Liang, X.; Du, X.; Wang, G.; Han, Z. A Deep Reinforcement Learning Network for Traffic Light Cycle Control. IEEE Trans. Veh.
Technol. 2019, 68, 1243–1253. https://doi.org/10.1109/tvt.2018.2890726.
72. Chu, T.; Wang, J.; Codeca, L.; Li, Z. Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control. IEEE
Trans. Intell. Transp. Syst. 2019, 21, 1086–1095. https://doi.org/10.1109/TITS.2019.2901791.
73. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep rein-
forcement learning. arXiv 2015, arXiv:1509.02971.
74. Pang, H.; Gao, W. Deep Deterministic Policy Gradient for Traffic Signal Control of Single Intersection. In Proceedings of the
2019 Chinese Control And Decision Conference (CCDC), Nanchang, China, 3–5 June 2019; pp. 5861–5866.
75. Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In International Conference
on Machine Learning; PMLR: 2018; pp. 1587–1596. https://proceedings.mlr.press/v80/fujimoto18a.html.
76. Ding, Z.; Huang, Y.; Yuan, H.; Dong, H. Introduction to reinforcement learning. In Deep Reinforcement Learning; Springer: Sin-
gapore, 2020; pp. 47–123.
77. Zeng, J.; Hu, J.; Zhang, Y. Training Reinforcement Learning Agent for Traffic Signal Control under Different Traffic Conditions.
In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference—ITSC, Auckland, New Zealand, 27–30 October
2019.
78. Jin, J.; Li, Y.; Huang, H.; Dong, Y.; Liu, P. A variable speed limit control approach for freeway tunnels based on the model-based
reinforcement learning framework with safety perception. Accid. Anal. Prev. 2024, 201, 107570.
https://doi.org/10.1016/j.aap.2024.107570.
79. Krajzewicz, D.; Behrisch, M.; Wagner, P.; Luz, R.; Krumnow, M. Second generation of pollutant emission models for SUMO. In
Modeling Mobility with Open Data; Springer, Cham, Switzerland, 2015; pp. 203–221.
80. Jin, J.; Huang, H.; Yuan, C.; Li, Y.; Zou, G.; Xue, H. Real-time crash risk prediction in freeway tunnels considering features
interaction and unobserved heterogeneity: A two-stage deep learning modeling framework. Anal. Methods Accid. Res. 2023, 40,
100306. https://doi.org/10.1016/j.amar.2023.100306.
81. Wu, Y.; Ho, C. The development of taiwan arterial traffic‐adaptive signal control system and its field test: A taiwan experience.
J. Adv. Transp. 2009, 43, 455–480. https://doi.org/10.1002/atr.5670430404.
82. Tsang, K.; Hung, W.; Cheung, C. Emissions and fuel consumption of a Euro 4 car operating along different routes in Hong
Kong. Transp. Res. Part D Transp. Environ. 2011, 16, 415–422. https://doi.org/10.1016/j.trd.2011.02.004.
83. Choudhary, A.; Gokhale, S. Urban real-world driving traffic emissions during interruption and congestion. Transp. Res. Part D
Transp. Environ. 2016, 43, 59–70. https://doi.org/10.1016/j.trd.2015.12.006.
84. Zhou, X.; Tanvir, S.; Lei, H.; Taylor, J.; Liu, B.; Rouphail, N.M.; Frey, H.C. Integrating a simplified emission estimation model
and mesoscopic dynamic traffic simulator to efficiently evaluate emission impacts of traffic management strategies. Transp. Res.
Part D Transp. Environ. 2015, 37, 123–136. https://doi.org/10.1016/j.trd.2015.04.013.
85. Clarke, P.; Muneer, T.; Cullinane, K. Cutting vehicle emissions with regenerative braking. Transp. Res. Part D Transp. Environ.
2010, 15, 160–167.
86. Gallus, J.; Kirchner, U.; Vogt, R.; Benter, T. Impact of driving style and road grade on gaseous exhaust emissions of passenger
vehicles measured by a Portable Emission Measurement System (PEMS). Transp. Res. Part D Transp. Environ. 2017, 52, 215–226.
https://doi.org/10.1016/j.trd.2017.03.011.
87. Ye, Q.; Chen, X.; Liao, R.; Yu, L. Development and evaluation of a vehicle platoon guidance strategy at signalized intersections
considering fuel savings. Transp. Res. Part D Transp. Environ. 2020, 77, 120–131. https://doi.org/10.1016/j.trd.2019.10.020.
88. Pandian, S.; Gokhale, S.; Ghoshal, A.K. Evaluating effects of traffic and vehicle characteristics on vehicular emissions near traffic
intersections. Transp. Res. Part D Transp. Environ. 2009, 14, 180–196. https://doi.org/10.1016/j.trd.2008.12.001.
89. Boryaev, A.; Malygin, I.; Marusin, A. Areas of focus in ensuring the environmental safety of motor transport. Transp. Res. Pro-
cedia 2020, 50, 68–76. https://doi.org/10.1016/j.trpro.2020.10.009.
90. Grote, M.; Williams, I.; Preston, J.; Kemp, S. A practical model for predicting road traffic carbon dioxide emissions using Induc-
tive Loop Detector data. Transp. Res. Part D Transp. Environ. 2018, 63, 809–825. https://doi.org/10.1016/j.trd.2018.06.026.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual au-
thor(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.