A New Approach To Controlling An Active Suspension
Abstract
Active suspension provides better vehicle control and safety on the road with optimal driving comfort compared to passive suspension. Achieving this requires a good control system that can adapt to any environment. This article uses a deep reinforcement learning method to develop an optimal neural network that meets the comfort requirements according to ISO 2631-5 standards. The algorithm trains the agent without any prior knowledge of the environment. Various simulations were performed, and the results were validated against the literature and the standard until the appropriate reward function was found. Simple and consistent road profiles were used while maintaining constant system parameters during training. The results show that suspension based on deep reinforcement learning reduces vehicle body acceleration and improves ride comfort without sacrificing suspension deflection and dynamic tire loading. The controller achieves an RMS acceleration of 0.228 m/s² with a minimum overshoot of the suspended mass.
Keywords
Active suspension system, vehicle stability, vibration, artificial neural network, reinforcement learning
control techniques to have a more optimal custom solution. According to studies, there is no perfect solution, but we can achieve good results. However, the complexity of systems increases, and added external influencing factors (like the behavior of the road) can cause the regulator to lose its performance.

Hence the refuge toward artificial intelligence, which makes it possible to predict the behavior of the shock absorber and react accordingly to the state, is a kind of imitation of the behaviors of living beings. For example, the reference Salem and Aly2 showed that Fuzzy Logic, an approach used in AI (artificial intelligence), works better than PID in the daily model grounded on two types of road conditions.

Neural networks can be combined with several traditional controllers like Proportional integral-derivative (PID), Linear-quadratic regulator (LQR), etc., and the neural network aims to detect road roughness to improve the performance of these traditional controllers by varying their parameters according to road conditions.3,4

However, the neural network is rarely used as a controller itself. Despite many motivating trials, such as the one that trained its neural network with an optimal classical controller, the results show that the performance of neural networks exceeds that of traditional controllers.5

Another method of reinforcement learning (or machine learning) has recently been used in various fields such as economics, games, aviation (drone control), and even the automotive field. This technique has gained momentum and great success in various fields, as shown in the results obtained from the studies carried out, such as in this article,6 which studied suspension control in vehicles and trains. The results are highly motivating, surpassing conventional controllers and even smart controllers like artificial neural network (ANN) and FUZZY logic. The main idea of reinforcement learning is to develop the suspension environment that interacts with the agent throughout the learning phase, the objective of which is to maximize the reward function to achieve the best neural network performance. The results obtained by the articles7 are optimal compared to the Linear Quadratic Gaussian (LQG), and show an improvement of 62% compared to the passive suspension.

This work is a continuation of the research carried out by Anis Hamza.

State of the art

Influence of the suspension on the human body

When driving a vehicle on the road, the wheels encounter a variety of obstacles with random and variable distributions, both spatially and temporally. This unevenness in the road can lead to vibrational movements. The intensity of these movements depends on the profile of the obstacle and the vehicle's speed.

The wear of a suspension part can lead to the failure of the shock absorber. This can harm the handling, direction, or braking of a car and damage other parts of the vehicle. The effect that can be noticed is that the car begins to bounce, squat, or dive excessively. All these actions can make driving uncomfortable and dangerous, and increase the difficulty of controlling the vehicle and the risk of aquaplaning.

The solution for detecting damper malfunction is through specific diagnosis while exciting the damper and comparing the measured values with the predicted ones.8 With artificial intelligence, there is a new method of identifying and diagnosing shock absorbers. The principle consists of analyzing the squealing noise of the shock absorbers.9 This method of early fault detection can be a solution for the automotive industry.

Comfort is a physiological feeling of well-being associated with the properties of the driver's environment in a moving vehicle.

When the whole body is subjected to prolonged vibrations, it has harmful effects on the organs of the human body such as lumbar pain, early degeneration of the spine, rapid heartbeat, pelvic osteoarthritis,10 visual disturbances, pain in the neck and shoulders, etc.11,12 The driver's sensitivity to vibrations inherent in the use of the vehicle depends on the frequency of road conditions.

As an example, studies have been conducted on the exposure of an agricultural tractor driver to suspension for different durations, which showed that 92% of the people studied suffered from health problems as a result of long periods of sitting in a vehicle.13,14

According to ISO 2631-5,15 the standard methodology for assessing the exposure of individuals to vibration containing repeated shocks, the most dangerous vibrations for the human body are in the frequency range 4–15 Hz16,17:

- Between 4 and 8 Hz: the vibration of the whole body is significant.
- Between 8 and 15 Hz: the vibrations are transmitted to the whole body through the spine.

Vibration can be rated according to ISO 2631-5, which measures weighted root-mean-square (RMS) acceleration, defined as follows15:

a_w = \sqrt{\frac{1}{T}\int_0^T a_w^2(t)\,dt} = \mathrm{RMS}   (1)

RMS: Acceleration (m/s²)
T: Time (s)
a_w(t): The weighted acceleration (m/s²)
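To make equation (1) operational, the short sketch below computes the weighted RMS acceleration from a sampled signal. It is a minimal Python illustration (not code from the paper); it assumes the signal has already been frequency-weighted per ISO 2631, and the sine-wave test data is a placeholder.

```python
import numpy as np

def weighted_rms(a_w: np.ndarray, dt: float) -> float:
    """Discrete version of equation (1): sqrt((1/T) * integral of a_w(t)^2 dt)."""
    T = dt * len(a_w)
    return float(np.sqrt(np.trapz(a_w ** 2, dx=dt) / T))

# Placeholder signal: a 5 Hz sine of amplitude 0.3 m/s^2 (RMS should be ~0.3/sqrt(2) = 0.212)
dt = 1e-3
t = np.arange(0.0, 10.0, dt)
a_w = 0.3 * np.sin(2 * np.pi * 5.0 * t)
print(f"weighted RMS = {weighted_rms(a_w, dt):.3f} m/s^2")
```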
Figure 2. Mechanical and pneumatic suspension: (a) the high coefficient of friction of these springs limits the suspension travel to
approximately 50–75 mm and (b and c) the travel of these suspensions can go up to 230 mm.24
Active suspension: An active suspension system is a passive shock absorber equipped with an actuator. The role of the actuator is to transmit a calculated force (using information collected from sensors attached to the vehicle) to suppress the vibrations of the vehicle, ensuring greater comfort and safety for the driver (shown in Figure 3). Unlike other passive and semi-active suspension systems, active suspension provides greater flexibility to react to unpredictable forces caused by road roughness and vehicle load, even while driving. In theory, all this control freedom provides better driving comfort and ideal wheel holding. However, to use this technology effectively in a real car, we need an intelligent system that can control it. Unfortunately, active suspension remains a complicated and expensive solution, which explains why it is only used in several high-end car models or truck ranges.29–31

Several studies have investigated the different types of vehicle suspension systems from the point of view of complexity, efficiency, maintenance, and lifespan. All this study is analyzed and summarized in Table 1.

Several studies have shown that the electromagnetic actuator is more efficient than the hydraulic actuator, given its simplicity of manufacture and dynamic behavior, despite the limitations in terms of structure and complexity of the hydraulic system. The high cost of manufacturing and system maintenance also poses problems in terms of efficiency.

The linear electromagnetic motor (Figure 3) may be the right choice as an actuator for an active suspension system. Studies have shown that the force density of electromagnetic systems can be as high as 663 kN/m³, compared to hydraulic and pneumatic systems. In addition, the ability to regenerate energy through the transfer of linear motion directly into electrical energy reduces overall electrical consumption. The linear suspension movement can also be a stored energy source, which leads to reduced overall consumption. The cylindrical shape, the absence of attraction force, and the active force generated in real-time offer potential to the active suspension. All these advantages improve performance in terms of comfort, safety, and total control of the vehicle.33

Active suspension sits at the top of the pyramid of suspension techniques. This adaptive suspension system is capable of adapting to various changes, such as vehicle loading or different phases of evolution like acceleration, braking, and turning. Sensors measure the inclination and acceleration of the wheels, as well as the anti-skid and steering wheel angle, among other parameters. All this information is analyzed by a computer, which controls the supply of the cylinder, enabling the system to compensate in real-time for the body's movements. This technology can anticipate the roughness of the road and other potentially dangerous situations, explaining the manufacturers' focus on this technology.
Controller | Advantages | Disadvantages
1. Linear controllers | Design implicit; ease of implementation. | Accurate linear structural model; full state feedback requirement; mainly used to control linear structures.
LQR | Can deal with various data sources and results. | In some cases, it cannot control stability errors.
LQG | No full feedback requirement. | –
PID | Can overcome steady state error. | Does not resist stress and noise; unable to process input data and results at the same time.
H∞ | Good performance when the system has multiple variables. | Design complexity; requires a well-designed model.
2. Non-linear controllers | No specific structural model requirement; ease of implementation; good performance when the system has multiple variables. | Design complexity; requires a well-designed model.
SMC | Less sensitivity to model perturbations and uncertainties. | System instability due to the effect of excessive noise.
MPC | Predicts the future behavior of states; processes multiple inputs and outputs simultaneously; can overcome noise and disturbances. | Slow tracking.
3. Learning-based controllers | No specific structural model requirement; improved robustness and reliability; improved control performance; used to control linear/non-linear structures. | Complexity of design.
ANN | A good method of learning and adaptation in the case of distributed parallel. | Requires enough data for training; poor system stability.
Fuzzy Logic | Offers an efficient solution to a complex model. | System control and analysis problem; long parameter adjustment; reliability issue; approximation error.
classes, one of which is based on a model and the other without a model, or we can combine the two. The choice of the algorithm is specific to the objective of our problem. Figure 5 shows the various algorithms.

According to Poole and Mackworth in their book,44 if one uses model-based learning, it is much more efficient from the point of view of experience and neural network maturity. However, in return for free learning (without a model), the agent will be confronted with new experiences, which include a certain inaccuracy and imprecision of the state. This will be an advantage throughout the learning phase in the improvement of the policy. Several approaches propose to combine these two techniques.45

Another classification of deep reinforcement learning is based on the type of stimulation used to optimize the agent's reaction. A positive model uses favorable stimulation of the system, while a negative model uses an undesirable stimulus to distract the agent from a specific action.

Our suspension system is based on the interaction between the agent and the environment. The realization of an optimal strategy requires precise parameterization and an environment based on mathematical formulas.46 Deep reinforcement learning (DRL) application in active suspension control addresses several challenges, such as safety and ride comfort. Given the complexity of the suspension model (non-controllable internal and external parameters), model-free algorithms are a solution for the non-linearities of the suspension system. They allow one to actively learn in real-time without pre-learning to be functional in an unknown environment.

Choice of algorithm. The agent's decision-making function (control strategy) represents a mapping of situations to actions. Given the importance of choosing an algorithm, performance studies of these DRL algorithms are carried out. For example, drone flight control studies include hovering, landing, random waypoints, and target tracking. All DRL algorithms have their pros and cons.

In a study by William Koch in 2019, learning algorithms such as Proximal Policy Optimization (PPO), Q-learning, Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG) were used to ensure stable and fluid autonomous navigation of drones. The study analyzed the angular velocity to reach a target velocity Ω. The results were compared with a PID controller (shown in Figure 6).47

According to a comparative study of the behavior of algorithms on the stability of drones, it was noted that the TRPO and DDPG algorithms have extreme oscillations. This leads us to discard these types of algorithms in our suspension system because they generated
Figure 6. The best reinforcement learning (RL) agent response compared to PID, with a target angular velocity equal to Ω = [2.20, 8.14, 1.81] rad/s.47
instability in both the roll and yaw axes of the drone, resulting in instability during flight. According to this study, the PPO algorithm is more precise, faster, and produces the smoothest navigation with the minimum error. This is why Song et al.48 trained the drone using the PPO algorithm for AlphaPilot and AirSim drone racing tracks due to its excellent performance and simple implementation.49–51 So to stabilize our suspension system, we will use the Proximal Policy Optimization (PPO) algorithm.

Numerical modeling

Researchers often use two different approaches to solve control problems, such as ANN52 and fuzzy logic.53 ANN uses interconnected neurons that learn and adjust.
B = \begin{bmatrix} 0 \\ \frac{1}{m_s} \\ 0 \\ -\frac{1}{m_{us}} \end{bmatrix}   (7)

d = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \frac{b_{us}}{m_{us}}\dot{x}_r + \frac{k_{us}}{m_{us}} x_r \end{bmatrix}   (8)

x = \begin{bmatrix} x_s \\ \dot{x}_s \\ x_{us} \\ \dot{x}_{us} \end{bmatrix}   (9)

y = \begin{bmatrix} x_s - x_{us} \\ \ddot{x}_s \end{bmatrix}, \qquad u = F_c   (10)

C = \begin{bmatrix} 1 & 0 & -1 & 0 \\ -\frac{k_s}{m_s} & -\frac{b_s}{m_s} & \frac{k_s}{m_s} & \frac{b_s}{m_s} \end{bmatrix}, \qquad D = \begin{bmatrix} 0 \\ \frac{1}{m_s} \end{bmatrix}   (11)
Equations (4) and (5) describe the state-space representation of the vehicle's suspension system, where:

A: State matrix.
B: Input matrix.
C: Output matrix.
D: Feedthrough matrix.
u: Input variable representing the actuator force.
y: Output variable representing the displacement and acceleration of the sprung mass.
d: Disturbance variable due to changes in road profile.

Figure 8. Reinforcement learning model.
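As a concrete reading of the state-space model, the sketch below assembles the matrices of equations (7) to (11) with NumPy and integrates one passive-response trajectory. The parameter values come from Table 4; the A matrix is rebuilt from the standard quarter-car equations of motion and the tire damping b_us is taken as zero because the paper gives no value for it, so both are assumptions rather than the authors' exact implementation.

```python
import numpy as np

# Quarter-car parameters from Table 4; b_us (tire damping) is assumed to be zero,
# it only appears in the disturbance vector d of equation (8).
m_s, m_us = 300.0, 30.0            # sprung / unsprung mass [kg]
k_s, k_us = 40_000.0, 22_000.0     # suspension / tire stiffness [N/m]
b_s, b_us = 1385.0, 0.0            # suspension damping [N.s/m], assumed tire damping

# State x = [x_s, x_s_dot, x_us, x_us_dot]^T (equation (9)).
# A is reconstructed from the usual quarter-car equations of motion (assumption);
# B, C, D follow equations (7) and (11).
A = np.array([
    [0.0,          1.0,         0.0,                  0.0],
    [-k_s / m_s,  -b_s / m_s,   k_s / m_s,            b_s / m_s],
    [0.0,          0.0,         0.0,                  1.0],
    [k_s / m_us,   b_s / m_us, -(k_s + k_us) / m_us, -(b_s + b_us) / m_us],
])
B = np.array([0.0, 1.0 / m_s, 0.0, -1.0 / m_us])
C = np.array([[1.0, 0.0, -1.0, 0.0],
              [-k_s / m_s, -b_s / m_s, k_s / m_s, b_s / m_s]])
D = np.array([0.0, 1.0 / m_s])

def d_vec(x_r: float, x_r_dot: float) -> np.ndarray:
    """Road disturbance vector d of equation (8)."""
    return np.array([0.0, 0.0, 0.0, (b_us * x_r_dot + k_us * x_r) / m_us])

def step(x: np.ndarray, u: float, x_r: float, x_r_dot: float, dt: float = 1e-3):
    """One explicit-Euler step of x_dot = A x + B u + d; returns (x_next, y)."""
    x_next = x + dt * (A @ x + B * u + d_vec(x_r, x_r_dot))
    y = C @ x_next + D * u      # y = [suspension deflection, sprung-mass acceleration]
    return x_next, y

# Passive response (u = 0) to a 0.08 m step in the road, as a quick sanity check.
x = np.zeros(4)
for _ in range(2000):
    x, y = step(x, u=0.0, x_r=0.08, x_r_dot=0.0)
print(f"suspension deflection after 2 s: {y[0]:.4f} m")
```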
The actuator plays the role of a regulator between these components (suspended mass and unsprung mass) while minimizing the acceleration of the suspended mass and eliminating the effect of wheel travel. Note: if we deactivate the actuator, we give it zero force, and we return to the behavior of a passive damper. The force of the actuator varies between [-8000 N, 8000 N].

Numerical simulation with reinforcement learning

Our contribution in this work is to combine the PPO algorithm with active suspension. The global process is as follows: the road information is generated according to ISO 8608 and fed into the suspension system model. Meanwhile, the control performance index of the current time t is calculated and fed into the actor network of the PPO algorithm as a state value. Then, the corresponding actor value is selected as an output according to the probability density function in the policy network. A series of trajectories t_i = (s_i, a_i, r_i, s_{i+1}) is stored in the memory space by repeatedly interacting with the environment. The value network is updated by importance sampling, and the control strategy is continuously optimized according to the obtained reward value until the control performance is better and convergence is achieved. The optimal neural network is then saved so that it can be used in a real system. The flowchart of the active suspension control structure based on the PPO algorithm is shown in Figure 8.

During the reinforcement learning phase, a rectangular shape was used for the road to simplify the problem and reduce the complexity of the environment. This created a more controllable and reproducible environment for learning, making it easier to evaluate the model's performance. However, this shape may not represent all real driving situations, and the model's performance may be limited when facing unexpected or unknown
situations. Nonetheless, our model will behave according to imposed rules and will be validated in advance using this learning method with a reproducible road profile. To increase reliability, we can diversify the learning environments so that the model can generalize to different situations encountered during learning.

r(x) = \begin{cases} 1 & \text{if } x \in [0, L] \\ 0 & \text{otherwise} \end{cases}   (12)

where x represents the longitudinal position on the road, L represents the total length of the road, and r(x) represents the height of the road at position x. The r(x) function is defined as being equal to 1 if x is between 0 and L, and 0 otherwise.

Active suspension control based on the proximal policy optimization algorithm

The PPO algorithm belongs to the policy gradient (PG) family. The basic idea is to update the policy to maximize the probability of actions that provide the greatest future reward. It does this by running algorithms in the environment and collecting state changes based on the agent's actions. Collections of these interactions are called trajectories. Once one or more trajectories are captured, the algorithm examines each step, verifies whether the chosen action yields a positive or negative reward, and updates the policy. The environment represents the physical behavior of our suspension system, with its state representing the accelerations and displacements of the suspended and unsuspended mass, and its action being the value of the force that must be exerted.

Trajectories are sampled through step-by-step interaction with the environment. To perform a single step, an agent selects an action and passes it to the environment.

Agent update: Policy update: The basis of the PG algorithm is the formula for updating the weights of a network (Formula 13). The gradient is the positive or negative direction of the weights in which the policy change will make actions more likely in a given state.

\theta_{t+1} = \theta_t + \alpha A(a|s)\,\frac{\nabla \pi_{\theta_t}(a|s)}{\pi_{\theta_t}(a|s)}   (13)

\theta_{t+1}: New weights of a network
\theta_t: Current weights
\alpha: Learning rate
\nabla \pi_{\theta_t}(a|s): Gradient of the current network
A(a|s): Advantage function (adjustment of direction and amount of weight)
\pi_{\theta_t}(a|s): Policy output, importance sampling (how much weight to give a particular update)

Visualization of the neural network update: We move to the visualization of the neural network update once the trajectory is completed (step 1). All values (log probabilities, values, and rewards) are recorded. After the end of the trajectory, the rewards and benefits are discounted (step 2) (where advantages = discounted return − expected return). In step 3, the loss of each step is calculated. Finally, in step 4, we calculate the average of all these losses and update it with gradient descent (see Figure 9).

Algorithm PPO: OpenAI proposed PPO to solve the problem of the gradient policy's learning rate convergence. If the step size is too large, the policy diverges, and if it is too small, the calculation time will be very long. PPO adds a factor, the probability ratio, to prevent large updates from occurring and makes the policy gradient less sensitive.55,56

L^{CLIP}(\theta) = \hat{E}_t\left[\min\left(g_t(\theta)\hat{A}_t,\ \mathrm{clip}(g_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]   (14)

L: Loss function.
\hat{E}_t: Empirical average over a finite batch of samples.
\hat{A}_t: Advantage function, which is the difference between Q^{\pi_\theta} and V^{\pi_\theta}.
\epsilon: Constant, usually equals 0.2.

The first term inside the min is L^{CPI}:

L^{CPI}(\theta) = \hat{E}_t\left[\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}\hat{A}_t\right] = \hat{E}_t\left[g_t(\theta)\hat{A}_t\right]   (15)

g_t(\theta) = \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}   (16)

Moreover, the second term, \mathrm{clip}(g_t(\theta), 1-\epsilon, 1+\epsilon), modifies the surrogate objective by clipping the probability ratio.

\hat{A}^{\pi_\theta}(a_t|s_t) = Q^{\pi_\theta}(a_t|s_t) - V^{\pi_\theta}(a_t|s_t)   (17)

Reward function. The reward function plays a crucial role throughout the learning process as it indicates the quality of the action undertaken by the agent after transitioning to the next state, with quality varying between positive or negative values. The reward is then transferred to the neural network via back-propagation, and the optimization method adjusts the neural network parameters to minimize errors.
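To make equations (14) to (17) concrete, here is a small PyTorch sketch of the clipped surrogate objective evaluated on a stored batch. It is only an illustration of the formulas (the paper does not publish its implementation), and the toy tensors at the end are placeholders.

```python
import torch

def ppo_clip_loss(log_prob_new: torch.Tensor,
                  log_prob_old: torch.Tensor,
                  advantages: torch.Tensor,
                  eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective of equations (14)-(16), returned as a loss to minimize.

    log_prob_new: log pi_theta(a_t | s_t) under the current policy
    log_prob_old: log pi_theta_old(a_t | s_t) stored when the trajectory was collected
    advantages:   A_hat_t, e.g. Q - V as in equation (17) or a GAE estimate
    """
    ratio = torch.exp(log_prob_new - log_prob_old)                 # g_t(theta), equation (16)
    unclipped = ratio * advantages                                 # L^CPI term, equation (15)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))              # maximizing -> minimize the negative

# Toy batch (placeholder numbers, only to show the call).
lp_new = torch.tensor([-1.1, -0.7, -2.3])
lp_old = torch.tensor([-1.0, -0.9, -2.0])
adv = torch.tensor([0.5, -0.2, 1.3])
print(float(ppo_clip_loss(lp_new, lp_old, adv)))
```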
Table 3. Benchmarking reward functions for performance optimization of an active suspension system.
Table 4. Parameters for the quarter active suspension.

Symbol | Description | Value
m_s | Sprung mass | 300 kg
k_s | Stiffness of the car body | 40,000 N/m
m_us | Unsprung mass | 30 kg
C | Suspension damping | 1385 N·s/m
k_us | Stiffness of the tire | 22,000 N/m
U | Control force | [-8000, 8000] N

Result and discussion

In this article, we used Google Colab Pro to build a 1/4 active suspension system model, and the suspension parameters are shown in Table 4. The actor network and the critic network of the PPO algorithm were eight- and ten-layered neural networks, respectively. The learning rates of the actor network α_A and the critic network α_B were set to 5e-5 and 1e-4, respectively. The discount factor γ was set to 0.98. The clipping parameter ε was set to 0.2, and the GAE λ parameter was set to 0.9. The time steps were fixed at 1e7. The specific hyperparameters were defined as shown in Table 5.

The results obtained from the intelligent controller with a reward function (equation (18)) after the training phase are compared with the passive suspension. The results show that we obtained a reduction of 53.24% and 35.60%, respectively, in acceleration and displacement compared to the passive suspension. The reduction results of the chosen reward function are shown in Table 6 and Figure 11.

Figure 12 presents the simulation results of the acceleration of the unsprung mass compared to the passive suspension. The proposed method significantly
improves the acceleration of the vehicle body, leading to more stability.

Table 6. Reduced overshoot values for stepped road entry.

Parameter | Passive | Active | Reduction (passive − active)/passive × 100%
Acceleration (suspended mass) | 7.97 m/s² | 3.72 m/s² | 53.24%
Displacement (suspended mass) | 1.28 cm | 0.82 cm | 35.60%
Displacement (unsprung mass) | 1.19 cm | Yes | –

However, evaluating comfort solely by the acceleration limit method does not capture the behavior of the suspension throughout the journey. Therefore, according to the ISO 2631 standard,15 one can use the Root Mean Square (RMS) acceleration method, which calculates the average acceleration over a certain period of time.60 The comfort reference values are shown in Table 7.

Table 7. System settings.

Acceleration RMS value (m/s²) | Comfort reaction
< 0.315 | Not uncomfortable
0.315–0.63 | A little uncomfortable
0.5–1 | Fairly uncomfortable
0.8–1.6 | Uncomfortable
1.25–2.5 | Very uncomfortable
≥ 2 | Extremely uncomfortable

Several reward functions were tested during the neural network optimization phase. To evaluate the performance of each model, the hyperparameters for the proximal policy optimization algorithm were set as shown in Table 5. After each test, the root-mean-square (RMS) values were calculated and evaluated according to ISO 2631-5. The results obtained showed that the reward function played a guiding role throughout the learning phase. As described later, our objective was to work on several aspects, such as the stability of the suspended mass and the dynamics of wheel movement, to ensure better grip with the road.

Table 5. Hyperparameters for the proximal policy optimization structure.

Symbol | Description | Value
α_A | Actor learning rate | 5e-5
α_B | Critic learning rate | 1e-4
γ | Discount factor | 0.98
ε | Clip parameter | 0.2
λ | GAE parameter | 0.9
– | Time steps | 1e7

In the reward formulas studied, we gradually introduced constraints such as:

- The distance between the road condition and the unsprung mass (X_us − X_r)
- The distance between the state of the road and the suspended mass (X_s − X_r)
- The acceleration of the suspended mass (Ẍ_s).

The objective is to minimize acceleration while maintaining the wheel's grip on the road and keeping up with the suspended mass.

To optimize the results, we adjusted the weights of each constraint so that the agent maximizes its reward, which generates stability of the suspension system with optimal values.

After several tests, we noticed that the agent controlled our suspension system according to the latter reward criteria. The system exhibited a uniform behavior of displacement of the unsprung mass in the logarithmic form with an RMS that can reach up to 0.180 m/s². This prompted us to rectify the reward formula to follow a logarithmic form while incorporating the exponential function. The result is motivating and can be improved later, with an RMS of 0.228 m/s² and a minimum overrun of the sprung and unsprung mass.

The integration of the force exerted on the actuator U as a criterion in the reward formula aims to minimize the phenomenon of wheel deflection. The result reaches its target toward more stability of the wheel with an RMS of the order of 0.250 m/s² and an overrun of the suspension mass of 0.04 cm.

Several articles have used the reinforcement learning method, for example, that of Fares and Younes,54 who used the actor-critic algorithm, but the results obtained with the PPO algorithm are more efficient. Comparison between the results:

- Actor-critic algorithm: profile of the road used during the learning phase (square signal of amplitude 0.02 m with a period of 1.5 s); average acceleration of the suspended mass 4 m/s².
- PPO algorithm used by our model: profile of the road used during the learning phase (square signal of amplitude 0.08 m with a period of 1.5 s); acceleration of the suspended mass does not exceed 3.8 m/s².

It is clearly concluded that the PPO algorithm with reward function 4 listed in Table 8 is more efficient than the actor-critic algorithm.

Simulation of different road levels

To validate the control performance and robustness of the PPO-based active suspension system, road profiles
were created in accordance with ISO 8608,61,62 and the road characteristics were described using the special G_q(n) index data. The road roughness can be classified into eight classes, as shown in Table 9. Figure 13 depicts the road power spectrum, which includes the upper and lower limits as well as the average values of the road spectrum for each road class. The formula for the spectral density of road power is given by G_q(n):

G_q(n) = G_q(n_0)\left(\frac{n}{n_0}\right)^{-w}   (19)

knowing that:

G_q(n): Road unevenness coefficient (the road power spectral density value at the reference spatial frequency).
n: Spatial frequency (reciprocal of the wavelength).
n_0: Reference spatial frequency (generally selected as 0.1 m⁻¹).
w: Frequency index.
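Equation (19) only specifies the PSD of the road; to excite the model one still has to synthesize a height profile from it. The sketch below uses the common superposition-of-sinusoids construction; the class-C coefficient, frequency band, and discretization choices are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def iso8608_profile(length_m: float = 200.0, dx: float = 0.05,
                    Gq_n0: float = 256e-6, n0: float = 0.1, w: float = 2.0,
                    n_min: float = 0.011, n_max: float = 2.83,
                    bands: int = 200, seed: int = 0):
    """Synthesize a road height profile whose PSD follows equation (19):
    G_q(n) = G_q(n0) * (n / n0)**(-w). Gq_n0 is in m^3/cycle; 256e-6 would
    correspond to the class-C geometric mean of Table 9."""
    rng = np.random.default_rng(seed)
    x = np.arange(0.0, length_m, dx)               # longitudinal position [m]
    n = np.linspace(n_min, n_max, bands)           # spatial frequencies [cycles/m]
    dn = n[1] - n[0]
    Gq = Gq_n0 * (n / n0) ** (-w)                  # PSD in each band, equation (19)
    amp = np.sqrt(2.0 * Gq * dn)                   # sinusoid amplitudes
    phase = rng.uniform(0.0, 2.0 * np.pi, bands)   # random phases
    z = (amp[:, None] * np.cos(2.0 * np.pi * np.outer(n, x) + phase[:, None])).sum(axis=0)
    return x, z

x, z = iso8608_profile()
print(f"max |z| = {100 * np.max(np.abs(z)):.1f} cm")   # rough amplitude of the synthetic road
```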
Table 9. Road classification according to ISO 8608 [1] with road unevenness coefficient G_d(n_0) (10⁻⁶ m³/cycle), n_0 = 0.1 cycles/m, and w = 2.

Road level | Range | Geometric mean
A | < 32 | 16
B | 32–128 | 64
C | 128–512 | 256
D | 512–2048 | 1024
E | 2048–8192 | 4096
F | 8192–32,768 | 16,384
G | 32,768–131,072 | 65,536
H | > 131,072 | 262,144

Throughout the simulation, the same conditions apply. Specifically, for a vehicle speed of 20 m/s, the result is shown in Figure 14.

Results of the acceleration of the suspended mass

After simulating the passive and active suspensions under class D road conditions, the results showed a significant decrease in acceleration, which could reach up to 62.5%, as depicted in Figure 15. This improvement effectively enhances ride comfort and stability. Moreover, the dynamic stability characteristics of the active suspension with delay have been improved. Therefore, we can conclude that the proposed reinforcement controller is capable of meeting the design and simulation requirements.

To further support our findings, it would be valuable to conduct physical experiments to validate the results obtained from the simulations. Furthermore, future work can focus on testing the proposed controller on other road conditions to investigate its performance in different scenarios.

Evaluation

Several simulations were conducted to evaluate and compare the results obtained. The simulations were conducted in the OpenAI Gym environment, and the same set of hyperparameters was used throughout all phases of the simulations to ensure consistency. To conduct these simulations, we utilized the cloud simulator "Google Colab Pro," which is based on Jupyter Notebook and offers a high-performance computing environment with 32 GB of RAM and either a Tesla T4 GPU or NVIDIA Tesla P100.

Figure 16 shows the convergence process of the algorithm, which appeared to stabilize at around 0.4e7 episodes. The simulation process lasted approximately 6 h and required more than 30 trials to find optimal convergence. Despite the long time it took to find optimal convergence, the results obtained from the simulations were reliable and justified the time and effort spent conducting them.

Jin et al.64–66 used the Active Suspension of In-Wheel-Drive Electric Vehicles (IWMD-EV) solution.
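For readers who want to reproduce a run like the one described in the Evaluation above, here is a self-contained sketch of a quarter-car Gym environment wired to the stable-baselines3 PPO implementation. Only the PPO hyperparameters (learning rate, γ, GAE λ, clip range, total time steps) and the Table 4 parameters come from the paper; the environment class, its reward weights, and the square road excitation are illustrative assumptions, and stable-baselines3 ≥ 2.0 with the gymnasium API is assumed rather than the authors' exact tooling.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class QuarterCarEnv(gym.Env):
    """Quarter-car active suspension driven over a square road signal (illustrative)."""

    def __init__(self, dt: float = 1e-3, horizon: int = 3000):
        m_s, m_us, k_s, k_us, b_s = 300.0, 30.0, 40_000.0, 22_000.0, 1385.0  # Table 4
        self.A = np.array([[0.0, 1.0, 0.0, 0.0],
                           [-k_s / m_s, -b_s / m_s, k_s / m_s, b_s / m_s],
                           [0.0, 0.0, 0.0, 1.0],
                           [k_s / m_us, b_s / m_us, -(k_s + k_us) / m_us, -b_s / m_us]])
        self.B = np.array([0.0, 1.0 / m_s, 0.0, -1.0 / m_us])
        self.k_us, self.m_us = k_us, m_us
        self.dt, self.horizon = dt, horizon
        self.action_space = spaces.Box(-8000.0, 8000.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)

    def _road(self, t: float) -> float:
        return 0.08 if (t % 1.5) < 0.75 else 0.0   # 0.08 m square signal, 1.5 s period

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.x, self.t = np.zeros(4), 0.0
        return self.x.astype(np.float32), {}

    def step(self, action):
        u = float(np.clip(action[0], -8000.0, 8000.0))
        x_r = self._road(self.t)
        d = np.array([0.0, 0.0, 0.0, self.k_us * x_r / self.m_us])
        x_dot = self.A @ self.x + self.B * u + d
        self.x = self.x + self.dt * x_dot
        self.t += self.dt
        acc = x_dot[1]                                          # sprung-mass acceleration
        reward = -(acc ** 2) - 10.0 * (self.x[0] - x_r) ** 2    # illustrative weights only
        truncated = self.t >= self.horizon * self.dt
        return self.x.astype(np.float32), float(reward), False, truncated, {}

# Hyperparameters mirror Table 5; the paper trains for 1e7 steps (shortened here).
model = PPO("MlpPolicy", QuarterCarEnv(), learning_rate=5e-5, gamma=0.98,
            gae_lambda=0.9, clip_range=0.2, verbose=0)
model.learn(total_timesteps=10_000)
model.save("ppo_active_suspension")
```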
Figure 13. The various PSDs of different classes utilized to stimulate the model in this paper.63
In order to achieve better driving comfort and reduce the force applied to the motor bearing in the wheel, a robust H∞ dynamic output feedback controller64,66 and μ-synthesis methodology65 have been derived so that the closed-loop system has asymptotic stability and simultaneously satisfies stress performance such as road holding, suspension travel, dynamic load applied to bearings, and limitation of actuators. Finally, the simulation results demonstrated that the proposed controller offered better suspension performance despite the faults of the actuators and the time delay.

If we now compare our solution, the RL controller, with the work of Jin et al., we find that the results at the RMS level of the acceleration of the body are very close, approximately 0.1 m/s². Active suspension is a technology that adjusts the suspension system of a vehicle in real-time to optimize the ride quality, handling, and stability of the vehicle. In the context of in-wheel-drive electric vehicles (IWMD-EVs), active suspension can play an important role in improving the overall performance of the vehicle. There are two main approaches to implementing active suspension in IWMD-EVs: using a dedicated active suspension system or using a reinforcement learning controller.

A dedicated active suspension system typically consists of a set of sensors, actuators, and a control unit. The sensors measure the vehicle's motion and the road conditions, while the actuators adjust the suspension components (such as dampers, springs, and anti-roll bars) to optimize the ride quality and handling. On the other hand, a reinforcement learning controller uses machine learning algorithms to learn the optimal suspension settings for a given driving scenario. The controller receives feedback from sensors on the vehicle's motion and makes adjustments to the suspension settings to optimize the ride quality and handling.

Both approaches have their advantages and disadvantages. A dedicated active suspension system provides more precise control over the suspension settings and can respond more quickly to changes in driving conditions. However, it requires more complex hardware and software, which can increase the cost and weight of the vehicle. A reinforcement learning controller, on the other hand, can adapt to different driving scenarios and learn from experience, which can lead to better overall performance. However, it requires a significant amount of computing power and can be difficult to train and optimize.

In summary, both approaches have their strengths and weaknesses, and the choice between them depends on the specific requirements of the IWMD-EV and the preferences of the designer.

Analysis of the results and performance of our reinforcement learning-based controller for active suspension compared to recent works by Hamza and Ben Yahia1 and Swethamarai and Lakshmi67 proposing controllers such as Artificial Neural Network (ANN), Proportional, Integral, and Derivative (PID), Fractional Order PID (FOPID), and Adaptive Fuzzy tuned Fractional Order PID (AFFOPID) controllers, demonstrates the effectiveness and efficiency of our proposed method in this paper. The results are listed in Table 10.

The superior performance of the reinforcement learning controller significantly outperforms that of
Figure 14. Movement of the suspended mass on different types of road (A, B, C, D, E, and F): (a) class A, (b) class B, (c) class C,
(d) class D, (e) class E, and (f) class F.
- Very bad suspension behavior (a great risk to the health of the driver, and poor adhesion of the vehicle with the road).
- The suspension control with PID minimizes the acceleration of the suspended mass; the results vary between fair and average.
- The suspension control with FOPID goes from medium to above average.
- The suspension control with AFFOPID goes from average to good.
- The control with the neural network belongs to the family of good suspension controllers that ensure good driving comfort.
- The controller by reinforcement goes from good to excellent comfort and more driving stability.
Deterministic Policy Gradient (DDPG). Furthermore, continuous learning under more complex perturbations could enhance reinforcement learning. We should also consider optimizing the number of epochs and the learning rate to provide further neural network optimization.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Issam Dridi https://orcid.org/0000-0003-2185-4390
Anis Hamza https://orcid.org/0000-0003-4283-5236

References

1. Hamza A and Ben Yahia N. Heavy trucks with intelligent control of active suspension based on artificial neural networks. Proc IMechE, Part I: J Systems and Control Engineering 2021; 235: 952–969.
2. Salem M and Aly A. Fuzzy control of a quarter-car suspension system. Int J Comput Inf Eng 2009; 3: 1276–1281.
3. Qin Y, Xiang C, Wang Z, et al. Road excitation classification for semi-active suspension system based on system response. J Vib Control 2018; 24: 2732–2748.
4. Zhao F, Dong M, Qin Y, et al. Adaptive neural networks control for camera stabilization with active suspension system. Adv Mech Eng 2015; 7: 1687814015599926.
5. Konoiko A, Kadhem A, Saiful I, et al. Deep learning framework for controlling an active suspension system. J Vib Control 2019; 25: 2316–2329.
6. Papadimitrakis M and Alexandridis A. Active vehicle suspension control using road preview model predictive control and radial basis function networks. Appl Soft Comput 2022; 120: 108646.
7. Chen H, Lin Y and Chang Y. An actor-critic reinforcement learning control approach for discrete-time linear system with uncertainty. In: Proceedings of the 2018 international automatic control conference (CACS), Taoyuan, Taiwan, 2018, pp.1–5. New York: IEEE.
8. Ragot J, Maquin D, Adrot O, et al. Détection de dysfonctionnement d'un système amortisseur de véhicule automobile. In: 5ème Congrès International Pluridisciplinaire Qualité et Sûreté de Fonctionnement, Qualita 2003, Nancy, France, 2003, p.3.
9. Huang H, Huang X, Wu J, et al. Novel method for identifying and diagnosing electric vehicle shock absorber squeak noise based on a DNN. Mech Syst Signal Process 2019; 124: 439–458.
10. Axmacher B and Lindberg H. Coxarthrosis in farmers. Clin Orthop Relat Res 1993; 287: 82–86.
11. INRS. Vibrations, plein le dos. Edition INRS, ED 864, 2001.
12. Ning D, Du H, Sun S, et al. An innovative two-layer multiple-DOF seat suspension for vehicle whole body vibration control. IEEE ASME Trans Mechatron 2018; 23: 1787–1799.
13. Bovenzi M and Betta A. Low-back disorders in agricultural tractor drivers exposed to whole-body vibration and postural stress. Appl Ergon 1994; 25: 231–241.
14. Bovenzi M, Pinto I and Stacchini N. Low back pain in port machinery operators. J Sound Vib 2002; 253: 3–20.
15. BS ISO 2631-5:2018. Mechanical vibration and shock—evaluation of human exposure to whole-body vibration. International Standard. ISO, 2018.
16. Kuznetsov A, Mammadov M, Sultan I, et al. Optimization of a quarter-car suspension model coupled with the driver biomechanical effects. J Sound Vib 2011; 330: 2937–2946.
17. De la Hoz-Torres M, Aguilar-Aguilera A, Martínez-Aires M, et al. A comparison of ISO 2631-5:2004 and ISO 2631-5:2018 standards for whole-body vibrations exposure: a case study. In: Arezes P, Santos Baptista J and Barroso MP (eds) Occupational and environmental safety and health. Studies in systems, decision and control. Cham: Springer, 2019, pp.711–719.
18. Chumchan C and Tontiwattanakul K. Health risk and ride comfort assessment by ISO2631 of an ambulance. In: 2019 5th international conference on engineering, applied sciences and technology (ICEAST), Luang Prabang, Laos, 2 July 2019, pp.1–4. New York: IEEE.
19. Ghazaly N and Moaaz A. The future development and analysis of vehicle active suspension system. IOSR J Mech Civil Eng 2014; 11: 19–25.
20. Goodarzi A and Khajepour A. Vehicle suspension system technology and design. Synth Lect Adv Automot Technol 2017; 1: i–77.
21. Zheng P, Wang R, Gao J, et al. Parameter optimisation of power regeneration on the hydraulic electric regenerative shock absorber system. Shock and Vibration 2019; 2019.
22. Bouvin J. Vers une version alternative à la suspension CRONE Hydractive. Doctoral Dissertation, Bordeaux, 2019.
23. Khemoudj O. Développement d'une méthode de pesage embarqué pour poids lourd. Doctoral Dissertation, Valenciennes, 2010.
24. Gabriel.com. Amortisseurs pour poids lourds, semi-remorques et autobus, https://gabriel.com/wp-content/uploads/2011/06/Gabriel_Section_Fraincaise_FRE.pdf (2011).
25. Riduan A, Tamaldin N, Sudrajat A, et al. Review on active suspension system. SHS Web Conf 2018; 49: 02008.
26. Sharp R and Hassan S. The relative performance capabilities of passive, active and semi-active car suspension systems. Proc IMechE, Part D: Transport Engineering 1986; 200: 219–228.
27. Inoue H, Yamaguchi T and Kondo T. Damping force generation system and vehicle suspension system constructed by including the same. Google Patents. United States patent US 7,722,056, 2010.
28. Soliman A and Kaldas M. Semi-active suspension systems from research to mass-market: a review. J Low Freq Noise Vib Act Control 2019; 40: 146134841987639.
29. Shafie A, Bello M and Khan R. Active vehicle suspension control using electro hydraulic actuator on rough road terrain. J Adv Res Appl Mech 2015; 9: 15–30.
30. Martins I, Esteves M, da Silva F, et al. Electromagnetic hybrid active-passive vehicle suspension system. In: Proceedings of the IEEE 49th vehicular technology conference, VTC 1999, Houston, TX, USA, 16–20 May 1999.
31. Heidarian A and Wang X. Review on seat suspension system technology development. Appl Sci 2019; 9: 2834.
32. Jiregna I and Sirata G. A review of the vehicle suspension system. J Mech Energy Eng 2020; 4: 109–114.
33. Gysen B, van der Sande T, Paulides J, et al. Efficiency of a regenerative direct-drive electromagnetic active suspension. IEEE Trans Veh Technol 2014; 60: 1384–1393.
34. Kuber C. Modelling simulation and control of an active suspension system. Int J Mech Eng Technol 2014; 5: 66–75.
35. Göhrle C, Schindler A, Wagner A, et al. Road profile estimation and preview control for low-bandwidth active suspension systems. IEEE ASME Trans Mechatron 2014; 20: 2299–2310.
36. Abdul Ali A, Abdul Razak F and Hayima N. A review on the AC servo motor control systems. ELEKTRIKA J Electr Eng 2020; 19: 22–39.
37. Amin R, Aijun L and Shamshirband S. A review of quadrotor UAV: control methodologies and performance evaluation. Int J Autom Control 2016; 10: 87–103.
38. Li Z and Adeli H. Control methodologies for vibration control of smart civil and mechanical structures. Expert Syst 2018; 35: e12354.
39. Roy R, Islam M, Sadman N, et al. A review on comparative remarks, performance evaluation and improvement strategies of quadrotor controllers. Technologies 2021; 9: 37.
40. Sutton R and Barto A. Reinforcement learning: an introduction. IEEE Trans Neural Netw 1998; 9: 1054.
41. Nguyen N, Nguyen T and Nahavandi S. System design perspective for human-level agents using deep reinforcement learning: a survey. IEEE Access 2017; 5: 27091–27102.
42. Ding Z, Huang Y, Yuan H, et al. Introduction to reinforcement learning. In: Dong H, Ding Z and Zhang S (eds) Deep reinforcement learning: fundamentals, research and applications. Singapore: Springer, 2020, pp.47–123.
43. Azar A, Koubaa A, Mohamed N, et al. Drone deep reinforcement learning: a review. Electronics 2021; 10: 999.
44. Poole D and Mackworth A. Artificial intelligence: foundations of computational agents. New York, NY: Cambridge University Press, 2010.
45. François-Lavet V, Henderson P, Islam R, et al. An introduction to deep reinforcement learning. Found Trends Mach Learn 2018; 11: 219–354.
46. Huang H, Yang Y, Wang H, et al. Deep reinforcement learning for UAV navigation through massive MIMO technique. IEEE Trans Veh Technol 2019; 69: 1117–1121.
47. Koch W. Flight controller synthesis via deep reinforcement learning. arXiv:1909.06493, 2019.
48. Song Y, Steinweg M, Kaufmann E, et al. Autonomous drone racing with deep reinforcement learning. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS), Detroit, MI, USA, 1–5 October 2021, pp.1205–1212. New York: IEEE.
49. Li B and Wu Y. Path planning for UAV ground target tracking via deep reinforcement learning. IEEE Access 2020; 8: 29064–29074.
50. Koch W, Mancuso R, West R, et al. Reinforcement learning for UAV attitude control. ACM Trans Cyber Phys Syst 2019; 3: 1–21.
51. Bøhn E, Coates E, Moe S, et al. Deep reinforcement learning attitude control of fixed-wing UAVs using proximal policy optimization. In: Proceedings of the 2019 international conference on unmanned aircraft systems, ICUAS 2019, Atlanta, GA, USA, 2019, pp.523–533. New York: IEEE.
52. Hamza A and Ben Yahia N. Artificial neural networks controller of active suspension for ambulance based on ISO standards. Proc IMechE, Part D: J Automobile Engineering 2023; 237: 34–47.
53. Deng Y, Gong M and Ni T. Double-channel event-triggered adaptive optimal control of active suspension systems. Nonlinear Dyn 2022; 108: 3435–3448.
54. Fares A and Younes A. Online reinforcement learning-based control of an active suspension system using the actor critic approach. Appl Sci 2020; 10: 8060.
55. Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms. arXiv:1707.06247v2 [cs.LG], 2017.
56. Hsu C, Mendler-Dünner C and Hardt M. Revisiting design choices in proximal policy optimization. arXiv:2009.10897v1 [cs.LG], 2020.
57. Han S and Liang T. Reinforcement-learning-based vibration control for a vehicle semi-active suspension system via the PPO approach. Appl Sci 2022; 12: 3078.
58. Li Z, Chu T and Kalabić U. Dynamics-enabled safe deep reinforcement learning: case study on active suspension control. In: 2019 IEEE conference on control technology and applications (CCTA), Hong Kong, China, 2019, pp.585–591. New York: IEEE.
59. Ming L, Yibin L, Xuewen R, et al. Semi-active suspension control based on deep reinforcement learning. IEEE Access 2020; 8: 9978–9986.
60. Kumar K, Pal S and Sethi R. Objective evaluation of ride quality of road vehicles. SAE technical paper 990055, 1999.
61. ISO 8608. Mechanical vibration—road surface profiles—reporting of measured data. International Organisation for Standardisation, 1995.
62. Múčka P. Simulated road profiles according to ISO 8608 in vibration analysis. J Test Eval 2017; 46: 405–418.
63. Dridi I, Hamza A and Ben Yahia N. Control of an active suspension system based on long short-term memory (LSTM) learning. Adv Mech Eng 2023; 15: 16878132231156789.
64. Jin X, Wang J, He X, et al. Improving vibration performance of electric vehicles based on in-wheel motor-active suspension system via robust finite frequency control. IEEE Trans Intell Transp Syst 2023; 24: 1631–1643.
65. Jin X, Wang J, Yan Z, et al. Robust vibration control for active suspension system of in-wheel-motor-driven electric vehicle via μ-synthesis methodology. J Dyn Syst Meas Control 2022; 144: 051007.
66. Jin X, Wang J and Yang J. Development of robust guaranteed cost mixed control system for active suspension of in-wheel-drive electric vehicles. Math Probl Eng 2022; 2022: 4628539.
67. Swethamarai P and Lakshmi P. Adaptive-fuzzy fractional order PID controller-based active suspension for vibration control. IETE J Res 2022; 68: 3487–3502.