A New Approach To Controlling An Active Suspension

Download as pdf or txt
Download as pdf or txt
You are on page 1of 21

Research Article

Advances in Mechanical Engineering


2023, Vol. 15(6) 1–21
Ó The Author(s) 2023
A new approach to controlling an DOI: 10.1177/16878132231180480
journals.sagepub.com/home/ade
active suspension system based on
reinforcement learning

Issam Dridi , Anis Hamza and Noureddine Ben Yahia

Abstract
Active suspension provides better vehicle control and safety on the road with optimal driving comfort compared to pas-
sive suspension. Achieving this requires a good control system that can adapt to any environment. This article uses a
deep reinforcement learning method to develop an optimal neural network that meets the comfort requirements
according to ISO 2631-5 standards. The algorithm trains the agent without any prior knowledge of the environment.
Various simulations were performed, and the results were validated with the literature and the standard until the appro-
priate reward function was found. Simple and consistent road profiles were used while maintaining constant system para-
meters during training. The results show that suspension based on deep reinforcement learning reduces vehicle body
acceleration and improves ride comfort without sacrificing suspension deflection and dynamic tire loading. The control-
ler expects the RMS value of the acceleration to be 0.228 with a minimum overrun of the suspended mass.

Keywords
Active suspension system, vehicle stability, vibration, artificial neural network, reinforcement learning

Date received: 27 February 2023; accepted: 19 May 2023

Handling Editor: Chenhui Liang

Introduction suspension system that has highly modular dynamic


behavior, which is the active suspension. In this suspen-
Passenger safety and comfort, as well as the safety of sion system, we find a spring, a shock absorber, and an
the load carried by the vehicle, are some of the greatest actuator that exerts an adaptive counter-force to meet
concerns of researchers and manufacturers of heavy the requirements of stability and safety.
goods vehicles. This is evident from the amount of The actuator control is the most delicate phase in an
research published throughout this decade.1 active suspension system, which requires a good servo
The shock absorber is the centerpiece that plays the system, such as commonly used controllers like PID,
role of a conductor in the suspension system, which LQR, etc. Engineers are not strictly limited to using one
tries to harmonize movements to maintain the maxi- of these types, and they are free to develop or merge
mum possible stability without sacrificing safety. There
are three types of suspension systems: active, semi-
active, and passive. Each of these systems offers benefits Mechanical, Production and Energy Laboratory (LMPE), National School
and drawbacks. For instance, the passive suspension of Engineering of Tunis (ENSIT), University of Tunis, Tunis, Tunisia
offers appropriate performance in a restricted frequency
range, while the semi-active mechanical system changes Corresponding author:
Issam Dridi, Mechanical, Production and Energy Laboratory (LMPE),
the coefficient by changing the viscosity of the shock National School of Engineering of Tunis (ENSIT), University of Tunis,
absorber, making it efficient in a wide frequency band. Avenue Taha Hussein, Montfleury, Tunis 1008, Tunisia.
However, the technique is limited, hence the need for a Email: [email protected]

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License
(https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work
without further permission provided the original work is attributed as specified on the SAGE and Open Access pages
(https://us.sagepub.com/en-us/nam/open-access-at-sage).
2 Advances in Mechanical Engineering

control techniques to have a more optimal custom solu- the road can lead to vibrational movements. The inten-
tion. According to studies, there is no perfect solution, sity of these movements depends on the profile of the
but we can achieve good results. However, the complex- obstacle and the vehicle’s speed.
ity of systems increases, and added external influencing The wear of a suspension part can lead to the failure
factors (like the behavior of the road) can cause the reg- of the shock absorber. This can harm the handling,
ulator to lose its performance. direction, or braking of a car and damage other parts
Hence the refuge toward artificial intelligence, which of the vehicle. The effect that can be noticed is that the
makes it possible to predict the behavior of the shock car begins to bounce, squat, or dive excessively. All
absorber and react accordingly to the state, is a kind of these actions can make driving uncomfortable and dan-
imitation of the behaviors of living beings. For exam- gerous, increase the difficulty of controlling the vehicle,
ple, the reference Salem and Aly2 showed that Fuzzy and the risk of aquaplaning.
Logic, an approach used in AI (artificial intelligence), The solution for detecting damper malfunction is
works better than PID in the daily model grounded on through specific diagnosis while exciting the damper
two types of road conditions. and comparing the measured values with the predicted
Neural networks can be combined with several tra- ones.8 With artificial intelligence, there is a new method
ditional controllers like Proportional integral-derivative of identifying and diagnosing shock absorbers. The
(PID), Linear-quadratic regulator (LQR), etc., and the principle consists of analyzing the squealing noise of
neural network aims to detect road roughness to the shock absorbers.9 This method of early fault detec-
improve the performance of these traditional control- tion can be a solution for the automotive industry.
lers by varying their parameters according to road Comfort is a physiological feeling of well-being asso-
conditions.3,4 ciated with the properties of the driver’s environment in
However, the neural network is rarely used as a con- a moving vehicle.
troller itself. Despite many motivating trials, such as When the whole body is subjected to prolonged
the one that trained its neural network with an optimal vibrations, it has harmful effects on the organs of the
classical controller, the results show that the perfor- human body such as lumbar pain, early degeneration
mance of neural networks exceeds that of traditional of the spine, rapid heartbeat, pelvic osteoarthritis,10
controllers.5 visual disturbances, pain in the neck and shoulders,
Another method of reinforcement learning (or etc.11,12 The driver’s sensitivity to vibrations inherent in
machine learning) has recently been used in various the use of the vehicle depends on the frequency of road
fields such as economics, games, aviation (drone con- conditions.
trol), and even the automotive field. This technique has As an example, studies have been conducted on the
gained momentum and great success in various fields, exposure of an agricultural tractor driver to suspension
as shown in the results obtained from the studies car- for different durations, which showed that 92% of the
ried out, such as in this article,6 which studied suspen- people studied suffered from health problems as a result
sion control in vehicles and trains. The results are of long periods of sitting in a vehicle.13,14
highly motivating, surpassing conventional controllers According to ISO 2631-5,15 the standard methodol-
and even smart controllers like artificial neural network ogy for assessing the exposure of individuals to vibra-
(ANN) and FUZZY logic. The main idea of reinforce- tion containing repeated shocks, the most dangerous
ment learning is to develop the suspension environment vibrations for the human body are in the following fre-
that interacts with the agent throughout the learning quency range [4.15 Hz]16,17:
phase, the objective of which is to maximize the reward
function to achieve the best neural network perfor-  Between 4 and 8 Hz: the vibration of the whole
mance. The results obtained by the articles7 are optimal body is significant.
compared to the Linear Quadratic Gaussian (LQG),  Between 8 and 15 Hz: the vibrations are trans-
and show an improvement of 62% compared to the mitted to the whole body through the spine.
passive suspension.
This work is a continuation of the research carried Vibration can be rated according to ISO 2631-5,
out by Anis Hamza. which measures weighted root-mean-square (RMS)
acceleration, defined as follows15:
State of the art sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð
1 T 2
Influence of the suspension on the human body aw = a (t)dt = RMS ð1Þ
T 0 w
When driving a vehicle on the road, the wheels encounter
a variety of obstacles with random and variable distribu-
tions, both spatially and temporally. This unevenness in  RMS: Acceleration (m=s2 )
Dridi et al. 3

 T: Time (s)
 aw (t): The weighted (m=s2 )

The ISO 2631 standard is devoted to the assessment


of health risks and provides a guideline on comfort.
The health alert diagram (Figure 1) shows the health
alert zone (which is in red) is a health risk zone that is a
function of the duration of exposure to vibration.18

Suspension system study


The suspension is a component that connects the vehi-
cle with the wheels, and it ensures relative movement
between them. The device acts as a vibration insulator
to protect the vehicle and provide ride comfort while
maintaining tire contact with the road, so they have a Figure 1. Health guidance caution zone (HGCZ) from (ISO
grip for rolling.19 The suspension system generally con- 2631-5).
sists of two main elements:
the coefficient of friction and better control of suspen-
Springs: Support the vehicle’s weight and allow up
and down movement to absorb road shock. Its mis- sion oscillation. This requires working on a new perfor-
sion is to transform kinetic energy into potential mance standard for shock absorbers.
energy; or vice versa. There are several types of A new generation of suspensions with a reduced coef-
springs (pneumatic, coil, torsion bar, and leaf). ficient of friction, such as air suspensions or parabolic
There are several types of springs, such as multi-leaf leaf suspensions, have a very low damping coefficient
springs, which have very limited travel of 50–75 mm and significant vertical travel of up to 230 mm. These
with a high coefficient of friction, which dampens features optimize vehicle control and performance.
the oscillations of the suspension and lightens the Coil springs, also called coil springs, have supplanted
task of the shock absorber. leaf suspensions in passenger vehicles, offering optimal
Shock Absorbers: Control spring oscillations, help- performance adapted to the vehicle. They can be used
ing to maintain vehicle control over bumps and in low-tonnage trucks. Pneumatic suspensions (bellows
corners. The damper is designed to dissipate the filled with a compressible fluid, the air is used for heavy
kinetic energy produced by the various modes of vehicles) replace leaf springs. These air bellows can
excitation. Such as single tube, twin-tube, compen- work at a constant volume or air mass (injection of air
sating chamber, and gas chamber shock absorbers.20 into the bellows by a compressor) or static charge22,23
Automotive suspension vibration energy can be (Figure 2).
recovered using regenerative shock absorbers to con- We can classify the suspension into three different
vert vibration energy into electrical energy, effec- types:
tively reducing vehicle fuel consumption. According
to comparative studies, it has been found that dam-  Passive suspension: This type of suspension does
pers with regenerative behavior are more reliable not ensure vehicle stability.25 The dynamic beha-
than others, specifically the hydroelectric damper.21 vior of the system changes as a result of variations
Tire: Its primary function is grip, but it also plays a in spring stiffness and damping coefficient.26,27
role comparable to the shock absorber by deforming.  Semi-active suspension: The main idea behind
It is an essential component in controlling the beha- semi-active control is to change the characteris-
vior of a vehicle. It transmits the longitudinal forces tics of energy dissipation devices in real-time,
necessary for acceleration and braking and the lat- with minimal energy input. The principle of oper-
eral forces for turning. ation of a semi-active suspension system is to
modify the damping coefficient, which requires
More precisely, the suspension of heavy trucks has only a reduced energy source. For example,
seen a significant evolution toward high performance. Pierce showed that changing the piston orifice
Today, the demand for more efficient shock absorbers diameter is sufficient, or another type of semi-
meets safety and comfort requirements. active rheological magnet (MR) damper that uses
Reducing driver fatigue by isolating the vehicle’s a magnetic fluid that interacts with the magnetic
components, as well as its loading from vibrations field produced by the magnetic coil to change the
excited by bad road conditions, requires a reduction in oil flow and break the piston movement.28
4 Advances in Mechanical Engineering

Figure 2. Mechanical and pneumatic suspension: (a) the high coefficient of friction of these springs limits the suspension travel to
approximately 50–75 mm and (b and c) the travel of these suspensions can go up to 230 mm.24

 Active suspension: An active suspension system Several studies have shown that the electromagnetic
is a passive shock absorber equipped with an actuator is more efficient than the hydraulic actuator,
actuator. The role of the actuator is to transmit a given its simplicity of manufacture and dynamic beha-
calculated force (using information collected vior, despite the limitations in terms of structure and
from sensors attached to the vehicle) to suppress complexity of the hydraulic system. The high cost of
the vibrations of the vehicle, ensuring greater manufacturing and system maintenance also poses
comfort and safety for the driver (shown in problems in terms of efficiency.
Figure 3). Unlike other passive and semi-active The linear electromagnetic motor (Figure 3) may be
suspension systems, active suspension provides the right choice as an actuator for an active suspension
greater flexibility to react to unpredictable forces system. Studies have shown that the finite force density
caused by road roughness and vehicle load, even of electromagnetic systems can be as high as 663 kN/m3,
while driving. In theory, all this control freedom compared to hydraulic and pneumatic systems. In addi-
provides better driving comfort and ideal wheel tion, the ability to regenerate energy through the transfer
holding. However, to use this technology effec- of linear motion directly into electrical energy reduces
tively in a real car, we need an intelligent system overall electrical consumption. The linear suspension
that can control it. Unfortunately, active suspen- movement can also be a stored energy source, which
sion remains a complicated and expensive solu- leads to reduced overall consumption. The cylindrical
tion, which explains why it is only used in several shape and the absence of attraction force, and the active
high-end car models or truck ranges.29–31 force generated in real-time, offer potential to the active
suspension. All these advantages improve performance
Several studies have investigated the different types of in terms of comfort, safety, and total control of the
vehicle suspension systems from the point of view of vehicle.33
complexity, efficiency, maintenance, and lifespan. All Active suspension sits at the top of the pyramid of
this study is analyzed and summarized in Table 1. suspension techniques. This adaptive suspension system
is capable of adapting to various changes, such as vehi-
cle loading or different phases of evolution like accel-
eration, braking, and turning. Sensors measure the
inclination and acceleration of the wheels, as well as the
anti-skid and steering wheel angle, among other para-
meters. All this information is analyzed by a computer,
which controls the supply of the cylinder, enabling the
system to compensate in real-time for the body’s move-
ments. This technology can anticipate the roughness of
the road and other potentially dangerous situations,
explaining the manufacturers’ focus on this technology.

Active suspension control system


The actuator is a crucial component of an active sus-
pension system. It acts as a regulator, applying a force
Figure 3. The electromagnetic active suspension system. between the sprung and unsprung mass, and ensuring
Dridi et al. 5

Table 1. Comparative study of suspension systems.32

Parameters Passive suspensions Semi-active suspensions Active suspension


Hydraulic/pneumatic Electro-magnetic

Structure Very clear Complicated Very complicated Light


Weight/volume Very inferior Low High Highest
Cost Very inferior Inferior Highest High
Ride comfort Bad Medium Perfect Best
Handling Bad Medium Good Perfect
Reliability Highest High Medium High
Dynamic Passive Passive Medium Good

the system’s dynamics according to state variables. This


results in a better driving quality, with excellent vehicle
maneuverability and improved wheel contact with the
road.
The active suspension system requires sensors to
measure physical parameters such as vertical displace-
ment, speed, and acceleration. These measurements
provide important information for the system’s opera-
tion, including vehicle comfort, suspension travel, and
tire condition estimation since it’s not possible to mea-
sure tire compression directly.34,35
The actuator requires a servo system to reach and
maintain the desired setpoint value more quickly. The Figure 4. Control techniques for a suspension system.37
objective of this study is to decrease the frequency of
the suspended mass, resulting in zero acceleration.
Various algorithms and control techniques are avail- the science of decision-making opens the horizon to
able, which can be classified into three categories, as reinforcement learning, which can optimize trajectories,
shown in Figure 4. plan movements, or establish routes dynamically.
The control algorithm gives a value to the power
 Linear controllers amplifier, and the amplifier translates this value into
 Non-linear controllers bidirectional electrical power to the electromagnetic
 Controllers based on learning technique. motor. The system uses the compression force to har-
vest energy and store it. Thus, the amplifier functions as
Several control algorithms include neural networks, a generator and provides power to extend or contract
fuzzy Logic, iterative control, vector control, scalar the motor to ensure the vehicle’s comfort and safety.34
control, etc. However, all control techniques had
advantages and limitations. Most control techniques
have various problems such as: Reinforcement learning
Deep learning, more precisely reinforcement learning
 Sensitivity to parameter variations and external (RL), attracts researchers for its ability to solve com-
disturbances. plex problems. In RL, agents can imitate the human
 Weak dynamic responses learning process to achieve a designated goal, and they
 Configuration complexity are trained on a mechanism of reward and punishment.
The agent perceives the current state of the environ-
Each control technique led to at least one problem. ment and performs actions for which it is rewarded for
No control technique has been implemented to solve all good moves and punished for bad ones. In doing so,
these problems simultaneously while providing high the agent tries to minimize bad moves and maximize
precision control.36 Below is a comparative table of good ones.40–42
control techniques according to their advantages and Reinforcement learning uses several algorithms that
disadvantages (Table 2). have a single objective, and the agent must find the pol-
In this context, to find an optimal solution for any icy that maximizes the sum of the rewards over time.
internal or external variation of a suspension system, We can classify these learning algorithms into two
6 Advances in Mechanical Engineering

Table 2. Evaluation of control techniques for an active suspension system.38,39

Method Advantages Disadvantages

1. Linear controllers.
Design implicit; Ease of implementation. Accurate linear structural model; Full state feedback
requirement; Mainly used to control linear structures.
LQR Can deal with various data sources and results. In some cases, it cannot control stability errors.
LQG No full feedback requirement.
PID Can overcome steady state error. Does not resist stress and noise; unable to process
input data and results at the same time.
H‘ Good performance when the system has multiple Design complexity; Requires a well-designed model.
variables.
2. Non-linear controllers.
No specific structural model requirement; Ease of Design complexity; Requires a well-designed model.
implementation; Good performance when the system
has multiple variables.
SMC Less sensitivity to model perturbations and System instability due to the effect of excessive noise.
uncertainties.
MPC Predicts the future behavior of states; Processes Slow tracking.
multiple inputs and outputs simultaneously; Can
overcome noise and disturbances.
3. Learning based controller.
No specific structural model requirement; Improved Complexity of design.
robustness and reliability; Improved control
performance; Used to control linear/non-linear
structures.
ANN A good method of learning and adaptation in the case Requires enough data for training; Poor system
of distributed parallel. stability.
Fuzzy Logic Offers an efficient solution to a complex model. System control and analysis problem; long parameter
adjustment; reliability issue; approximation error.

classes, one of which is based on a model and the other internal and external parameters), model-free algo-
without a model, or we can combine the two. The rithms are a solution for the non-linearities of the
choice of the algorithm is specific to the objective of suspension system. It allows one to actively learn in
our problem. Figure 5 shows the various algorithms. real-time without pre-learning to be functional in an
According to Poole and Mackworth in their book,44 unknown environment.
if one uses model-based learning, it is much more effi-
cient from the point of view of experience and neural
network maturity. However, in return for free learning Choice of algorithm. The agent’s decision-making func-
(without a model), the agent will be confronted with tion (control strategy) represents a mapping of
new experiences, which include a certain inaccuracy situations to actions. Given the importance of choosing
and imprecision of the state. This will be an advantage an algorithm, performance studies of these DRL
throughout the learning phase in the improvement of algorithms are carried out. For example, drone flight
the policy. Several approaches propose to combine control studies include hovering, landing, random
these two techniques.45 waypoints, and target tracking. All DRL algorithms
Another classification of deep reinforcement learn- have their pros and cons.
ing is based on the type of stimulation used to optimize In a study by Mr. William Koch in 2019, learning
the agent’s reaction. A positive model uses favorable algorithms such as Proximal Policy Optimization
stimulation of the system, while a negative model uses (PPO), Q-learning, Deep Q-Network (DQN), and
an undesirable stimulus to distract the agent from a Deep Deterministic Policy Gradient (DDPG) were used
specific action. to ensure stable and fluid autonomous navigation of
Our suspension system is based on the interaction drones. The study analyzed the angular velocity to
between the agent and the environment. The realization reach a target velocity O . The results were compared
of an optimal strategy requires precise parameteriza- with a PID controller (shown in Figure 647).
tion and an environment based on mathematical According to a comparative study of the behavior of
formulas.46 Deep reinforcement learning (DRL) appli- algorithms on the stability of drones, it was noted that
cation in active suspension control addresses several the TRPO and DDPG algorithms have extreme oscilla-
challenges, such as safety and ride comfort. Given the tions. This leads us to discard these types of algorithms
complexity of the suspension model (non-controllable in our suspension system because they generated
Dridi et al. 7

Figure 5. Taxonomy model of deep reinforcement learning (DRL).43

Figure 6. The best learning (RL) reinforcement agent response compared to PID. with a target angular velocity equal to
O = ½2:20,  8:14,  1:81 rad/s.47

instability in both the roll and yaw axis of the drone, system, we will use the Proximal Policy Optimization
resulting in instability during flight. According to this (PPO) algorithm.
study, the PPO algorithm is more precise, faster, and
produces the smoothest navigation with the minimum
Numerical modeling
error. This is why Song et al.48 trained the drone using
the PPO algorithm for AlphaPilot and Airsim drone Researchers often use two different approaches to solve
racing tracks due to its excellent performance and sim- control problems, such as ANN52 and fuzzy logic.53
ple implementation.49–51 So to stabilize our suspension ANN uses interconnected neurons that learn and adjust
8 Advances in Mechanical Engineering

their behavior based on expected input and output,


while fuzzy logic uses fuzzy sets and decision rules to
provide outputs. The researchers chose ANN as a
method for stabilizing a vehicle because it is better sui-
ted for modeling nonlinear systems like vehicle suspen-
sion. The dynamic characteristics of suspension are
very complex and difficult to model with simple mathe-
matical equations. ANN is capable of learning these
characteristics and adjusting its behavior accordingly.
However, for our study, we opted for reinforcement
learning instead of ANN because the latter requires a
large amount of training data, which can be difficult to
collect for real-world driving scenarios. Reinforcement
learning is an unsupervised approach in which an agent
interacts with its environment to learn how to make
decisions by maximizing a reward. This method is more
flexible and can adapt to unforeseen situations in the
environment, making it more robust for stabilizing a Figure 7. Active suspension model.
vehicle. Additionally, reinforcement learning allows for
directly optimizing the desired reward, in this case, the  ms : Sprung mass, which represents the mass of
stability of the car, which can lead to better overall the vehicle body and is supported by the
performance. suspension.
Our model consists of two suspended and unsus-  xs : Displacement of the sprung mass.
pended masses, respectively Ms and Mus, supported by  x_ s : Velocity of the sprung mass.
two shock absorbers and two springs. This model is  bs : Damping coefficient of the suspension for the
similar to the passive model but includes an actuator sprung mass.
between the sprung and unsprung mass, as shown in  ks : Spring constant of the suspension for the
Figure 7 below. The active damper generates forces sprung mass.
under the demand of a control strategy. The simplicity  mus : Unsprung mass, which represents the mass
of this model facilitates the analysis and optimization of the wheel and tire that are not supported by
of the calculation. the suspension.
 xus : Displacement of the unsprung mass.
 x_ us : Velocity of the unsprung mass.
Mathematical formula of an active damper  bus : Damping coefficient of the suspension for
the unsprung mass.
In our model with two degrees of freedom (Figure 7),  kus : Spring constant of the suspension for the
we use the general theorems of mechanics based on the unsprung mass.
fundamental principle of dynamics. The suspension sys-  w: Displacement of the wheel.
tem considers the vertical movement of the body xs and  _ Velocity of the wheel.
w:
that of the wheel xus along the road presented by xr . We  Fc : Force due to tire-road contact.
can present the dynamics of our suspension system by
the following differential equations54:
From (2) and ((3)), the following state space equa-
tions can be formulated:
 Sprung mass:
x_ = Ax + Bu + d ð4Þ
ms€xs =  bs ðx_ s  x_ us Þ  ks ðxs  xus Þ + U ð2Þ
y = Cx + Du ð5Þ
 Unsprung mass: 2 3
0 1 0 0
6 ks bs ks bs 7
6  7
 bs ðx_ s  x_ us Þ + ks ðxs  Xus Þ
Mus€xus =
ð3Þ 6 ms ms ms ms 7
+ bus W_  x_ us + kus ðw  Xus Þ  Fc A=6
6 0
7
7 ð6Þ
6 0 0 1 7
4 ks bs ks + kus bs + bus 5
Equations (2) and (3) present the mathematical models  
mus mus mus mus
of a vehicle suspension system knowing that:
Dridi et al. 9

2 3
0
6 1 7
6 7
6 ms 7
B=6
6 0 7
7 ð7Þ
6 7
4 1 5

mus
2 3
0
6 7
6 07
6 7
d=6 07 ð8Þ
6 7
4 bus kus 5
x_ r + xr
mus mus
2 3
xs
6 x_ S 7
x=6 4 xus 5
7 ð9Þ
x_ us
 
x  xus
y= s u = Fc ð10Þ
€xs
2 3 2 3
1 0 1 0 0
C = 4 ks bs ks bs 5, D = 4 1 5 ð11Þ
 
ms ms ms ms ms
The equations (4) and (5) describes the state space
representation of the vehicle’s suspension system,
where:

 A: State matrix.
 B: Input matrix.
 C: Output matrix.
 D: Feedthrough matrix.
 u: Input variable representing the actuator force.
 y: Output variable representing the displacement
and acceleration of the sprung mass. Figure 8. Reinforcement learning model.
 d: Disturbance variable due to changes in road
profile.
to the probability density function in the policy net-
The actuator plays the role of a regulator between work. A series of trajectories ti = si , ai , ri , si + 1 is stored
these components (suspended mass and unsprung mass) in the memory space by repeatedly interacting with the
while minimizing the acceleration of the suspended environment. The value network is updated by impor-
mass and eliminating the effect of wheel travel. Note: if tance sampling, and the control strategy is continuously
we deactivate the actuator, we give it zero force, and we optimized according to the obtained reward value until
return to the behavior of a passive damper. The force the control performance is better and convergence is
of the actuator varies between ½8000N ::8000N . achieved. The optimal neural network is then saved so
that it can be used in a real system. The flowchart of
the active suspension control structure based on the
Numerical simulation with reinforcement learning PPO algorithm is shown in Figure 8.
Our contribution in this work is to combine the PPO During the reinforcement learning phase, a rectangu-
algorithm with active suspension. The global process is lar shape was used for the road to simplify the problem
as follows: the road information is generated according and reduce the complexity of the environment. This cre-
to ISO 8608 and fed into the suspension system model. ated a more controllable and reproducible environment
Meanwhile, the control performance index of the cur- for learning, making it easier to evaluate the model’s
rent time t is calculated and fed into the actor network performance. However, this shape may not represent all
of the PPO algorithm as a state value. Then, the corre- real driving situations, and the model’s performance
sponding actor value is selected as an output according may be limited when facing unexpected or unknown
10 Advances in Mechanical Engineering

situations. Nonetheless, our model will behave accord-  put (ajs): Policy output, importance sampling
ing to imposed rules and will be validated in advance (how much weight to give a particular update)
using this learning method with a reproducible road
profile. To increase reliability, we can diversify the Visualization of the neural network update: We move
learning environments so that the model can generalize to the visualization of the neural network update once
to different situations encountered during learning. the trajectory is completed (step 1). All values (log
 probabilities, values, and rewards) are recorded. After
1 ifx 2 ½0, L the end of the trajectory, the rewards and benefits are
r(x) = ð12Þ
0 other wise discounted (step 2) (where advantages = discounted
return 2 expected return). In step 3, the loss of each
where x represents the longitudinal position on the step is calculated. Finally, in step 4, we calculate the
road, L represents the total length of the road, and r(x) average of all these losses and update it with gradient
represents the height of the road at position x. The r(x) descent (see Figure 9).
function is defined as being equal to 1 if x is between 0 Algorithm PPO: OpenAI proposed PPO to solve the
and L, and 0 otherwise. problem of the gradient policy’s learning rate conver-
gence. If the step size is too large, the policy diverges,
and if it is too small, the calculation time will be very
Active suspension control based on the proximal long. PPO adds a factor, the probability ratio, to pre-
policy optimization algorithm vent large updates from occurring and makes the policy
The PPO algorithm belongs to the policy gradient (PG) gradient less sensitive.55,56
family. The basic idea is to update the policy to maxi-   
^ t , clipðg t (u), 1  e, 1 + eÞA
^ t min g t (u)A
LCLIP (u) = E ^t
mize the probability of actions that provide the greatest
future reward. It does this by running algorithms in the ð14Þ
environment and collecting state changes based on the
agent’s actions. Collections of these interactions are
called trajectories. Once one or more trajectories are  L: Loss function.
captured, the algorithm examines each step, verifies  ^ t : Empirical average over a nite batch of
E
whether the chosen action yields a positive or negative samples.
reward, and updates the policy. The environment rep-  ^ t : Advantage function, which is the difference
A
resents the physical behavior of our suspension system, between Qpu and V pu
with its state representing the accelerations and displa-  e: Constant, usually equals 0.2.
cements of the suspended and unsuspended mass, and
its action being the value of the force that must be The first term inside the min is LCPI :
exerted.  
Trajectories are sampled through step-by-step inter- pu ðat jst Þ ^ 
CPI
L ^t
(u) = E At = E ^t
^ t g t (u)A ð15Þ
action with the environment. To perform a single step, puold ðat jst Þ
an agent selects an action and passes it to the
environment. pu ðat jst Þ
g t (u) = ð16Þ
Agent Update: Policy update: The basis of the PG puold ðat jst Þ
algorithm is the formula for updating the weights of a
Moreover, the second term clipðg t (u), 1  e, 1 + eÞ,
network (Formula 13).
The gradient is the positive or negative direction of modifies the surrogate objective by clipping the prob-
the weights in which the policy change will make ability ratio.
actions more likely in a given state. ^ pu ðat jst Þ = Qpu ðat jst Þ  V pu ðat jst Þ
A t ð17Þ
rput (ajs)
ut + 1 = ut + aA(ajs) ð13Þ
put (ajs)
Reward function. The reward function plays a crucial
role throughout the learning process as it indicates the
 ut + 1 : New weights of a network quality of the action undertaken by the agent after
 ut : Current weights transitioning to the next state, with quality varying
 a: Learning rate between positive or negative values. The reward is then
 rput (ajs): Gradient of the current network transferred to the neural network via back-propaga-
 A(ajs): Advantage function (adjustment of direc- tion, and the optimization method adjusts the neural
tion and amount of weight ) network parameters to minimize errors. During the
Dridi et al. 11

Figure 9. Neural network update diagram.

reinforcement learning phase, the agent receives a


reward value from the environment.
Therefore, the choice of the reward function is a key
element in the design of any control system, including
active vehicle suspension systems. The reward function
defines the goals of the control system and evaluates
the system’s performance in terms of these goals. In this
Table 3, we have compiled and compared the reward
functions used in several studies to stabilize an active sus-
pension system. We also evaluated the pros and cons of
each reward feature in terms of the performance of the
active suspension system. This comparison helped us to
choose the most suitable reward function for achieving
an effective and efficient active suspension system.
The reward of our suspension system is studied
according to four objectives that must be expected
simultaneously. The first is to minimize the acceleration
of the suspended mass (j€xs j). The second objective is to Figure 10. Convergence of objective parameters throughout
catch up with the position of the state of the road learning.
(jxs  xr j) to prepare for the next situation, as well as
the third objective is to ensure maximum grip between
the wheel and the road while keeping the position of are tested and studied in the result part. The following
the unsprung mass with the road state (jxus  xr j). (formula 18) presents the general reward function.
Ultimately optimize the use of force exerted on the
actuator to minimize unsprung mass vibration (jFc j). R =  ajxs  xr j  bðxus  xr Þ2  g ð€xs Þ2  dðFc Þ2
Figure 10 shows the four objectives of the work. The ð18Þ
more the values converge toward the origin (converges
toward zero), the more we have good stability of the Where:
suspended mass and good wheel adhesion. Our contri-
bution is to develop a formula that seeks a compromise  a, b, g, d: are variables to be optimized to find
between all these constraints. Several reward formulas the right occurrence of neural network.
12 Advances in Mechanical Engineering

Table 3. Benchmarking reward functions for performance optimization of an active suspension system.

Study Reward function Benefits Disadvantages

Fares and Younes54 rt =  k(jxs  xus j) Simple Convergence difficulties due to


small and close numerical values
Fares and Younes54 rt =  k(jx_ s j) Worked better Failed to eliminate the suspended
mass steady state error
Fares and Younes54 rt =  k1 (_xs )2  k2 (juj) Induce the actor to produce zero Do not respect the state of the
force when the speed of the road
suspended mass is zero
Han and Liang57 zr ! u, Dynamically adjusting the Using traditional policy over policy
PT suspensions performance weight leads to low sampling efficiency and
R= uT Au, matrix f based on these sampled data can only be used for
2t = 0 3 . conditions, all based on passive one policy update
x
A=4 y 5, suspension performance
measurements. Where A
z
represents the weight matrix of
each performance indicator, and the
values of x, y, and z are dynamically
adjusted according to the different
8 routes
Li et al.58 >
> c(t)  a  E(t) Hit-and-run sampling technique for Poor control of suspension
< sampling safe actions to preserve stresses, especially in non-linear
if a(t) is admissible
rt = exploration efficiency, provides systems
> G,
>
: good adaptability to new situation.
otherwise
c(t) standard objective cost. E(t)
and the overhead measuring the
amount of effort needed to keep
the system within constraints, a is a
weight on this latter cost, G is the
big penalty assigned to an
exploratory action and a(t)
  exploratory action
Ming et al.59 rt =  k1 y12 + k2 y22 + k3 y32 The reward is applied to the Test that on a semi-active
control of a semi-active suspension suspension, the weights used are
system, providing better not optimal
adaptability. Where, y1 is the
vehicle body acceleration, y2 is the
suspension dynamic deflection, y3 is
the vehicle body displacement;
k1 , k2 , k3 are the weights

Table 4. Parameters for the quarter active suspension. ten-layered neural networks, respectively. The learning
rates of the actor network aA and the critic network aB
Settings were set to 5e  5 and 1e  4, respectively. The discount
Symbol Description Values factor g was set to 0:98. The clipping parameter e was
set to 0:2, and the GAE l parameter was set to 0:9. The
ms Sprung mass 300kg time steps were fixed at 1e7. The specific hyperpara-
ks Stiffness of the car body 40, 000N=m meters were defined as shown in Table 5.
mus Unsprung mass 30kg
C Suspension dumping 1385N:S=m The results obtained from the intelligent controller
kus Stiffness of the tire 22, 000N=m with a reward function (equation (18)) after the training
U Control force ½8000::8000N phase are compared with the passive suspension. The
results show that we obtained a reduction of 53.24%
and 35.60%, respectively, in acceleration and displace-
ment compared to the passive suspension. The reduc-
Result and discussion
tion results of the chosen reward function are shown in
In this article, we used Google Colab Pro to build a 1/4 Table 6 and Figure 11.
active suspension system model, and the suspension Figure 12 presents the simulation results of the
parameters are shown in Table 4. The actor network and acceleration of the unsprung mass compared to the
the critic network of the PPO algorithm were eight- and passive suspension. The proposed method significantly
Dridi et al. 13

Table 5. Hyperparameters for the proximal policy optimization Table 7. System settings.
structure.
Acceleration RMS value (m=s2 ) Comfort reaction
Settings
<0:315 Not uncomfortable
Symbol Description Values 0:315;0:63 A little uncomfortable
0:5;1 Fairly uncomfortable
aA Actor learning rate 5e  5 0:8;1:6 Uncomfortable
aB Critic learning rate 1e  4 1:25;2:5 Very uncomfortable
g Discount factor 0:98 ø2 Extremely uncomfortable
e Clip parameter 0:2
l GAE parameter 0:9
Times steps 1e7 The objective is to minimize acceleration while main-
taining the wheel’s grip on the road and keeping up
with the suspended mass.
Table 6. Reduced overshoot values for stepped road entry. To optimize the results, we adjusted the weights of
each constraint so that the agent maximizes its reward,
Parameters Passive Active Reduction
passiveactive
which generates stability of the suspension system with
passive 100% optimal values.
After several tests, we noticed that the agent con-
Acceleration 7:97(m=s2 ) 3:72(m=s2 ) 53:24%
(suspended mass)
trolled our suspension system according to the latter
Displacement 1:28(cm) 0:82(cm) 35:60% reward criteria. The system exhibited a uniform beha-
(suspended mass) vior of displacement of the unsprung mass in the loga-
displacement 1:19(cm) Yes – rithmic form with RMS, which can reach up to
(unsprung mass) 0.180 m=s2 . This prompted us to rectify the reward for-
mula to follow a logarithmic form while incorporating
the exponential function. The result is motivating and
improves the acceleration of the vehicle body, leading can be improved later, with an RMS of 0.228 m=s2 and
to more stability. a minimum overrun of the sprung and unsprung mass.
However, evaluating comfort solely by the accelera- The integration of the force exerted on the actuator
tion limit method does not capture the behavior of the U as a criterion in the reward formula aims to minimize
suspension throughout the journey. Therefore, accord- the phenomenon of wheel deflection. The result reaches
ing to the ISO 2631 standard,15 one can use the Root its target toward more stability of the wheel with an
Mean Square (RMS) acceleration method, which calcu- RMS of the order of 0.250 m=s2 and an overrun of the
lates the average acceleration over a certain period of suspension mass of 0.04 cm.
time.60 The comfort reference values are shown in Several articles have used the reinforcement learning
Table 7. method, for example, that of Fares and Younes,54 who
Several reward functions were tested during the used the critical actor algorithm, but the results
neural network optimization phase. To evaluate the obtained with the PPO algorithm are more efficient.
performance of each model, the hyperparameters for Comparison between the results:
the proximal policy optimization algorithm were set as  Critical actor algorithm: profile of the road used
shown in Table 5. After each test, the root-mean-square during the learning phase (square signal of
(RMS) values were calculated and evaluated according amplitude 0:02m with a period of 1:5s), accelera-
to ISO 2631-5. The results obtained showed that the tion of the average suspended mass 4m=s2
reward function played a guiding role throughout the  PPO algorithm used by our model: profile of the
learning phase. As described later, our objective was to road used during the learning phase (square signal
work on several aspects, such as the stability of the sus- of amplitude 0:08m of period 1:5s), acceleration
pended mass and the dynamics of wheel movement, to of the suspended mass does not exceed 3:8m=s2 .
ensure better grip with the road.
According to the reward, formulas studied, in which
It is clearly concluded that the PPO algorithm with
we gradually introduced the constraints such as:
the reward function4 listed in the Table 8 is more effi-
cient than the critical actor algorithm.
 The distance between the road condition and the
unsprung mass (Xus  Xr )
 The distance between the state of the road and Simulation of different road levels
the suspended mass (Xs  Xr ) To validate the control performance and robustness of
 The acceleration of the suspended mass (X€s ). the PPO-based active suspension system, road profiles
14 Advances in Mechanical Engineering

Figure 11. Sprung mass displacement on rectangular road profile.

Figure 12. Sprung mass acceleration on rectangular road profile.

w
were created in accordance with ISO 8608,61,62 and the n
G q ðn Þ = G q ðn 0 Þ ð19Þ
road characteristics were described using the special n0
Gq (n) index data. The road roughness can be classified
into eight classes, as shown in Table 9. Figure 13 depicts knowing that :
the road power spectrum, which includes the upper and
lower limits as well as the average values of the road  Gq (n): Road unevenness coefficient (the road
spectrum for each road class. The formula for the spec- power spectral density value at the reference spa-
tral density of road power is given by Gq (n): tial frequency).
Dridi et al. 15

Table 8. Analysis of RMS values according to reward functions.

Reward function €s
RMS X maxðXs Þ  maxðXr Þ ðcmÞ minðXs Þ  minðXr Þ ðcmÞ

(1) 10jXs  Xr j  10jXus  Xr j2 1.173 0:49 " 0:49 #


(2) 10jXs  Xr j2  10jXus  Xr j2   1.020 0:15 " 0:22 #
(3) 10jXs  Xr j2  10jXus  Xr j2  X €  0.922 0:10 " 0:20 #
 s
(4) 10jXs  Xr j  10jXus  Xr j  X s 
2  € 0.930 0:05 " 0:06 #
(5)  2 0.759 0:13 " 0:09 #
10jXs  Xr j2  100jXus  Xr j2  X € s
(6)  2 0.180 0:53 # 0:13 "
10jXs  Xr j  100jXus  Xr j  100X
2 € s
  0:01 " 0:01 #
(7) 10jXs  Xr j2  100jXus  Xr j  10X € s 2 0.804
  0:01 # 0:12 "
(8) jXs  Xr j2  jXus  Xr j2  X € s 2 0.212
  0:02 " 0:01 #
(9) jXs  Xr j  jXus  Xr j2  X € s 2 0.225
  1:0     t 2
(10)   0.228 0:06 " 0:03 "
 Xr + Xst1  Xrt e 0:5  Xxt   Xus t
 Xrt   X €
s
  t1      t 2
(11)   2 0.483 0:08 # 0:05 "
 Xx   Xus  Xr  X
 t €
1:0
t t t
10 Xr + Xs  Xr e 0:5
s
 
(12)   t1  1:0      0.228 0:08 " 0:01 #
10 Xr + Xs  Xrt e jXs j  Xxt   Xus
t t
 Xrt   X € t 2
s

(13)  2 0.250 0:04 " 0:01 #


10jXs  Xr j  2jXus  Xr j2  10X € s   0:00001U2
(14) Passive 1.172 0:48 " 0:483 #

Table 9. Road classification according to ISO 8608 [1] dynamic stability characteristics of the active suspension
with road unevenness coefficient Gd (n0 )(106 m3 =cycles), with delay have been improved. Therefore, we can con-
n0 = 0:1cycles=m, and w = 2. clude that the proposed reinforcement controller is capa-
ble of meeting the design and simulation requirements.
Road level Range Geometric To further support our findings, it would be valu-
mean able to conduct physical experiments to validate the
A \32 16 results obtained from the simulations. Furthermore,
B 32–128 64 future work can focus on testing the proposed control-
C 128–512 512 ler on other road conditions to investigate its perfor-
D 512–2048 1024 mance in different scenarios.
E 2048–8192 4096
F 8192–32,768 16,384
G 32,768–131,072 65,536 Evaluation
H .131,072 262,144
Several simulations were conducted to evaluate and
compare the results obtained. The simulations were
conducted in the OpenAI Gym environment, and the
 n: Spatial frequency (reciprocal of the same set of hyperparameters were used throughout all
wavelength). phases of the simulations to ensure consistency. To
 n0 : Reference spatial frequency (generally conduct these simulations, we utilized the cloud simula-
selected as 0:1m1 ). tor ‘‘Google Colab Pro,’’ which is based on Jupyter
 v: Frequency index. Notebook and offers a high-performance computing
environment with 32 GB of RAM and either a Tesla
Throughout the simulation, the same conditions T4 GPU or NVIDIA Tesla P100.
apply. Specifically, for a vehicle speed of 20 m/s, the Figure 16 shows the convergence process of the algo-
result is shown in the Figure 14. rithm, which appeared to stabilize at around 0.4e7 epi-
sodes. The simulation process lasted approximately 6 h
and required more than 30 trials to find optimal con-
Results of the acceleration of the suspended mass vergence. Despite the long time it took to find optimal
After simulating the passive and active suspensions under convergence, the results obtained from the simulations
class D road conditions, the results showed a significant were reliable and justified the time and effort spent
decrease in acceleration, which could reach up to 62.5%, conducting them.
as depicted in Figure 15. This improvement effectively Jin et al.64–66 used the Active Suspension of In-
enhances ride comfort and stability. Moreover, the Wheel-Drive Electric Vehicles (IWMD-EV) solution.
16 Advances in Mechanical Engineering

Figure 13. The various PSDs of different classes utilized to stimulate the model in this paper.63

In order to achieve better driving comfort and reduce suspension settings for a given driving scenario. The
the force applied to the motor bearing in the wheel, a controller receives feedback from sensors on the vehi-
robust H‘ dynamic output feedback controller64,66 and cle’s motion and makes adjustments to the suspension
m-Synthesis Methodology65 have been derived so that settings to optimize the ride quality and handling.
the closed loop system has asymptotic stability and Both approaches have their advantages and disad-
simultaneously satisfies stress performance such as vantages. A dedicated active suspension system pro-
road holding, suspension travel, dynamic load applied vides more precise control over the suspension settings
to bearings, and limitation of actuators. Finally, the and can respond more quickly to changes in driving
simulation results demonstrated that the proposed con- conditions. However, it requires more complex hard-
troller offered better suspension performance despite ware and software, which can increase the cost and
the faults of the actuators and the time delay. weight of the vehicle. A reinforcement learning control-
If we now compare our solution, the RL controller, ler, on the other hand, can adapt to different driving
with the work of Jin et al., we find that the results at the scenarios and learn from experience, which can lead to
RMS level of the acceleration of the body are very better overall performance. However, it requires a sig-
close, approximately 0:1m=s2 . Active suspension is a nificant amount of computing power and can be diffi-
technology that adjusts the suspension system of a vehi- cult to train and optimize.
cle in real-time to optimize the ride quality, handling, In summary, both approaches have their strengths
and stability of the vehicle. In the context of in-wheel- and weaknesses, and the choice between them depends
drive electric vehicles (IWMD-EVs), active suspension on the specific requirements of the IWMD-EV and the
can play an important role in improving the overall per- preferences of the designer.
formance of the vehicle. There are two main approaches Analysis of the results and performance of our rein-
to implementing active suspension in IWMD-EVs: forcement learning-based controller for active suspension
using a dedicated active suspension system or using a compared to recent works by Hamza and Ben Yahia1
reinforcement learning controller. and Swethamarai and Lakshmi67 proposing controllers
A dedicated active suspension system typically con- such as Artificial Neural Network (ANN), Proportional,
sists of a set of sensors, actuators, and a control unit. Integral, and Derivative (PID), Fractional Order PID
The sensors measure the vehicle’s motion and the road (FOPID), and Adaptive Fuzzy tuned Fractional Order
conditions, while the actuators adjust the suspension PID (AFFOPID) controllers, demonstrates the effective-
components (such as dampers, springs, and anti-roll ness and efficiency of our proposed method in this paper.
bars) to optimize the ride quality and handling. On the The results are listed in Table 10.
other hand, a reinforcement learning controller uses The superior performance of the reinforcement
machine learning algorithms to learn the optimal learning controller significantly outperforms that of
Dridi et al. 17

Figure 14. Movement of the suspended mass on different types of road (A, B, C, D, E, and F): (a) class A, (b) class B, (c) class C,
(d) class D, (e) class E, and (f) class F.

ANN and AFFOPID, attributed to the constant Conclusion and outlook


improvement in RMS values (from 0:342 for ANN to
In this study, we utilized the Proximal Policy
0:180 for RL), leading to a considerable reduction in
Optimization (PPO) algorithm in deep reinforcement
vibrations experienced by the driver. Additionally, our learning to overcome the drawbacks of conventional
controller has the advantage of achieving these comfort suspension methods. We conducted studies and tests to
results without integrating the driver seat suspension. optimize the neural network on the reward function
On the other hand, the active suspension model of a using 13 different reward functions in both uniform
quarter car is supposed to be rigid. Therefore, we can and square road conditions. The results showed that
claim that the proposed solution in this paper, namely reinforcement learning provided better comfort and
the reinforcement learning controller, is the best and improved driving stability and vehicle safety, with
most cost-effective for controlling active suspension, near-optimal results. Additionally, the results high-
especially for heavy-duty trucks. lighted the importance of the reward function as a
18 Advances in Mechanical Engineering

Table 10. Level of comfort of the different types of regulators.

Controller Acceleration Comfort


RMS (m/s2) level

Passive 2.614 Very bad


Proportional Integral Derivative (PID) 1.051 Bad
Fractional Order PID (FOPID) 0.642 Mean
Adaptive Fuzzy tuned Fractional Order PID (AFFOPID) 0.339 Good
Artificial Neural Network(ANN) 0.342 Good
Reinforcement Learning (RL), current study 0.180 Very good

j Very bad suspension behavior (a great risk to the health of the driver, and poor adhesion of the vehicle with the road)
j The suspension control with PID minimizes the acceleration of the suspended mass, the results vary between fair and average
j The suspension control with FOPID to exceed the medium to above average
j The suspension control with AFFOPID surpasses from average to good
j The control with the neural network belongs to the family of good suspension controllers that ensure good driving comfort
j The controller by reinforcement goes from good to excellent comfort and more driving stability

Figure 15. Sprung mass acceleration on class D road profile.

guide throughout the learning phase. A clear reward


function leads to optimal neural network results that
meet all the intended goals.
We compared our results with the ISO 2631-5 stan-
dard to evaluate the degree of comfort of our solution.
The Root Mean Square (RMS) of the acceleration of
the suspended mass was reduced to 0.118 m=s2 com-
pared to the passive suspension’s RMS of 1.172 m=s2 .
This represents a significant reduction in RMS of 90%,
which is highly significant when compared to passive
suspension.
These results encourage further research on reward function optimization and the exploration of other algorithms such as Asynchronous Advantage Actor Critic (A3C), Deterministic Policy Gradient (DPG), and Deep Deterministic Policy Gradient (DDPG). Furthermore, continuous learning under more complex perturbations could enhance reinforcement learning. We should also consider optimizing the number of epochs and the learning rate to provide further neural network optimization.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs
Issam Dridi https://orcid.org/0000-0003-2185-4390
Anis Hamza https://orcid.org/0000-0003-4283-5236