A Survey on Yogic Posture Recognition
ABSTRACT Yoga is a popular form of physical activity and one of the promising applications in personal health care. Several studies show that yoga is used as a physical treatment for cancer, musculoskeletal disorders, depression, Parkinson's disease, and respiratory and heart diseases. In yoga, the body should be mechanically aligned, with some effort from the muscles, ligaments, and joints, to achieve optimal posture. Postural yoga increases flexibility, energy, and overall brain activity, and reduces stress, blood pressure, and back pain. Body postural alignment is a very important aspect of performing yogic asanas. Many asanas, including uttanasana, kurmasana, ustrasana, and dhanurasana, require bending forward or backward; if they are performed incorrectly, strain in the joints, ligaments, and backbone can result, which can cause problems with the hip joints. Hence, it is vital to monitor correct yoga poses while performing different asanas. Yoga posture prediction and automatic movement analysis are now possible because of advancements in computer vision algorithms and sensors. This survey presents a thorough analysis of yoga posture identification systems using computer vision, machine learning, and deep learning techniques.

INDEX TERMS Yogic posture recognition, optimal posture, machine learning methods, deep learning methods.
All yoga postures are classified based on their utility and pre-position [5]. As per utility, yoga asanas are classified into three categories: 1. Cultural Posture, 2. Meditative Posture, and 3. Relaxative Posture.

Cultural or Corrective Posture is used to correct defective body posture, systematize the different activities, and renew energy in the body. Cultural postures are commonly performed as daily physical exercise. They are separated into seven categories: i) Dynamic Sequences, ii) Inverted Postures, iii) Forward Bending Postures, iv) Backward Bending Postures, v) Twisting Postures, vi) Sideward Bending Postures, and vii) Standing Postures.

Meditative Posture is used to stabilize the body for advanced pranayama and meditation practices. Siddhasana (Adept's Pose), Padmasana (Lotus Pose), Vajrasana (Diamond Pose), Sukhasana (Comfortable Pose), Swastikasana (Auspicious Pose), and Vrukshasana (Tree Pose) are some of the meditative asanas. Relaxative Posture is performed to recover from conditions such as stress and the associated tension; it brings physical and mental relaxation. A few common relaxation postures include Makarasana (Crocodile Pose), Shavasana (Corpse Pose), and Balasana (Child's Pose) [6]. As per pre-position, the major types of poses are standing, sitting, supine, prone, balancing, forward bend, backward bend, twisting, and inversion poses [7].

Pranayama is the art of deliberately altering one's breathing patterns, typically while seated, to include quick diaphragmatic breathing; slow, deep breathing; breathing via alternate nostrils; and breath holding or retention. Pranayama exercises four essential aspects of breathing: pūraka (inhalation), recaka (exhalation), antaḥ kumbhaka (retention of breath inside the body), and bahiḥ kumbhaka (retention of breath outside the body) [8]. Figure 1 depicts the classification of asanas based on utility and body position.

A. HEALTH BENEFITS OF YOGA
Sun salutation is an ideal activity that places optimal stress on the cardiovascular and respiratory systems because it combines the slow, dynamic, and static stretching elements of exercise. To examine the potential of sun salutation as an exercise and weight-loss tool, four rounds of it were assessed for their effects on the cardiorespiratory and metabolic systems. Six healthy Asian Indian males and females (18 to 22 years) who had practiced sun salutation for more than two years took part in this study. The volunteers were attached to a heart rate tracker and the Oxycon Mobile metabolic system to assess their heart rate and oxygen consumption during the four rounds of sun salutation. The assessment was conducted in a single session lasting around 30 minutes. The volunteers burned approximately 100 kcal every 15 minutes (400 kcal per hour) [9].

Yoga has been accepted as a physical exercise that is safe to practice following a stroke. According to participants, the physical advantages of yoga include "increasing flexibility, strength, and coordination [10]." In another study, a two-week training program in an integrated set of yoga practices, comprising breathing techniques, sun salutation, yogasana (physical postures), pranayama (breathing), dhyana (meditation), and a devotion session, was taught to fifty-three asthmatic patients.
They were instructed to practice for sixty-five minutes each day. They were compared with a control group of fifty-three asthmatics, matched for age, gender, and the type and severity of their condition, who continued taking their regular medications. Regarding the frequency of asthma attacks per week, drug treatment scores, and peak flow rate, there was a noticeably greater improvement in the yoga-practicing group [11].

Another trial demonstrated that Sahaja yoga has modest positive effects on several objective and subjective indicators of asthma in patients willing to undertake nonpharmacological therapy. Compared with the control group, participants in the yoga group scored marginally higher on the asthma-related quality-of-life mood subscale and had higher peak expiratory flow values. Possible explanations include a change in the passage of "vital energy," as defined by the conventional yogic system, or a change in the dynamics of airway muscle cells [12].

A further trial examined the effects of a single session of yoga on cardiovascular physiology and found that, compared to simply sitting on a chair at rest, yoga raises oxygen consumption (VO2), Metabolic Equivalents (METs), Heart Rate (HR), and percent Maximal Heart Rate (MHR) by 0.35 L/min, 1.67, 20 beats per minute (bpm), and 11%, respectively [13].

One study examined heart rate differences while practicing yoga postures, deep breathing, and relaxation techniques. Sixteen volunteers were instructed through three distinct styles of yoga practice. Polar S610 heart rate monitors measured each practice session's one-minute mean heart rate. A repeated-measures analysis of variance did not find any substantial variation among the three yogic styles in the early or final resting heart rate, measured in the fourth and seventy-eighth minute of every session. A substantial variation was discovered between the early and final relaxing postures during the whole 80-minute practice, as well as throughout the "postures only" phase of the session. Bonferroni post hoc tests found that the heart rate for ashtanga yoga differed significantly from the remaining two forms; there was no discernible variation between hatha yoga and the gentle methods. These findings reveal that various yogic exercises might have different health advantages [14].

Participants graded yoga's impact on seven health-related factors covering lifestyle, mental, and physical components. The majority strongly agreed or agreed that yoga had improved their strength (87.1%), flexibility (91.6%), stress levels (82.6%), and mental stability (86.2%). In addition, 57.4% of people reported sleeping better, and 69.3% mentioned their yoga practice had improved the way they lived [15].

Another study suggests that in all suspected and proven COVID-19 cases (stages 1 and 2), patients should practice relaxation exercises such as breathing exercises, meditation, and yoga to manage their stress [16]. The availability of internet platforms that offer remote fitness classes such as yoga, Pilates, and workouts, as well as suitable activities that can be completed at home or in a quarantine room, provides an opportunity for these people to maintain or engage in physical activity [17]. Yoga, healthy fresh fruits, vitamins C and D, and zinc boost the immunity of those affected [18]. Sun salutation can help to manage weight and maintain or increase cardiorespiratory fitness [9].

During practice, close monitoring of yogic posture is mandatory to gain all of these health benefits.

B. YOGA RELATED INJURIES
Yoga-related injuries occur for several reasons, including poor instruction, improper posture, self- or instructor-imposed pressure to perform specific complex postures, and a lack of proper guidance.

Muscular injuries are common during any exercise or activity, and yoga is no exception. Sixty-two percent of survey respondents reported at least one musculoskeletal injury that lasted over a month. There were 107 reports of muscular injuries in total during the practice of Ashtanga Vinyasa Yoga, a rate of 1.18 new practice-related injuries per 1,000 training hours. When recurrences of preceding injuries and non-specific back pain of unidentified origin were added, the injury rate increased to 1.45 per 1,000 training hours. Injuries during yoga were more common in the lower extremities, specifically the hamstrings and knees [19].

The following case studies reveal that an improper stance in a yogic posture can become a health issue during practice. A 15-year-old girl suffered a fracture-separation of the epiphyseal plate of the distal tibia while practicing the lotus posture. In the lotus posture, the ankles are placed in a supinated-inverted position with the foot on the opposing thigh. This posture loads the entire outer compartment of the tibiotarsal joint, including the tibiofibular ligament, and forces the insertion of an anterolateral region of the epiphysis tract to its maximum [20]. A 38-year-old healthy yoga practitioner suffered a low-energy femoral shaft fracture while practicing a specific yogic stance, marichyasana pose B, which requires bending the hip and knee to bring the foot into the opposing inguinal crease [21].

According to one case report, doing yoga while using sedative drugs, being elderly, or having "benign hypermobility" of the connective tissue requires particular alertness. These individuals risk sciatic nerve damage while holding complex and challenging yoga postures for an extended amount of time [22]. Information released on particular injuries linked to yoga has frequently come from relatively small case studies, often focusing on a single person who sustained an injury as an outcome of extreme activities well outside the norm for a yoga session, or because of the participant's underlying medical condition [23]. Injuries in yoga are not rare; however, the most frequent adverse effects are musculoskeletal, primarily small ligament or muscular injuries that heal completely without intervention [21].
In this study, researchers discovered that the incidence of severe injuries in yoga was low compared to certain other physical exercises, and that the number of injuries reported by practitioners per year of practice exposure is low, confirming that yoga is not a high-risk activity. Despite the limits of yogic research studies, the available evidence indicates that participants gain a wide range of wellness and health improvements [23].

Optimal posture is essential to obtain all the health benefits of yoga; otherwise, poor postures result in minor injuries while practicing.

C. THE ROLE OF COMPUTER VISION IN YOGA
Yoga is traditionally practiced in a yoga centre under the proper guidance of a yoga instructor, who can instruct and correct the client directly.

The COVID-19 outbreak and its associated lockdown measures upset the work-life balance. Due to global restrictions, individuals were forced to stay in their households for days or weeks while gyms, yoga centres, and sports centres were closed. This put enormous stress on people's physical and mental health.

Recent advances in vision- and sensor-based methods suggest a distanced yoga approach without a trainer, in which the individual can stand before a gadget and precisely practice yoga postures without the need for an educator or a yoga learning centre [24]. Researchers focus on creating novel digital systems based on computer vision methods to process, analyze, and sense visual information (such as images or videos) from cameras or sensors.

With recent technological advancements in sensors and the emergence of kinetic devices, it is now easier to detect human poses in real time. Human pose estimation is one such technique that is widely utilized for real-time yogic posture detection around the world. It focuses on reconstructing and comprehending the asana's posture from depth images.

Several reviews have been published in the area of human activity recognition covering vision [25], [26], [27], [28], [29], [30], [31], sensor [32], [33], [34], [35], [36], [37], machine learning [38], [39], [40], [41], [42], [43], and deep learning-based methodologies [44], [45], [46], [47], [48], [49], [50], [51]. Nevertheless, no survey focuses specifically on yogic posture recognition. This is the first comprehensive survey of current vision-based and sensor-based yogic posture recognition utilizing Machine Learning (ML) and Deep Learning (DL) techniques. It gives an overview of recent studies and recommendations for further research directions in yogic posture recognition. This review focuses on the factual studies in yoga posture classification, yoga posture grading, and performance analysis of the numerous works conducted on yoga posture recognition in recent years.

D. HUMAN POSE ESTIMATION (HPE)
Real-time HPE is a promising and significant research challenge in computer vision. The HPE's core concept is to detect and evaluate human posture [52], [53]. HPE mainly focuses on the problem of localizing the keypoints or body parts of the human [54].

HPE automatically discovers the keypoints of human body parts in images and videos. When detecting a single person in an image or video in simple scenarios, a single-person posture estimation algorithm is used to detect the keypoints of a specific posture. If more than one person is detected in the image or video, multi-person algorithms are employed to detect the keypoints of the human body parts [55].

1) Single-Person Pose Estimation: It is used to estimate the stance of a single person from an image or video. Pictorial structure models are the conventional methods of this approach; for example, tree and random forest (RF) models are among the most efficient in single-pose estimation. Deep learning techniques have recently been very convincing in object/face detection and HPE [56].
2) Multi-Person Pose Estimation: Most of these algorithms first analyze an image and then extract the posture of every person [57]. Multi-person posture estimation algorithms [58] follow either a bottom-up or a top-down approach, whereas all single-person posture estimation approaches are top-down [59].
   • Top-Down Approach: This straightforward method first recognizes each person, then detects the keypoints of the human body parts and calculates a posture for each person [60], [61], [62], [63], [64], [65].
   • Bottom-Up Approach: This approach first detects all body parts in the image and then associates the parts with each distinct person [54], [66], [67], [68], [69].
3) 2D Posture Estimation: In two-dimensional posture estimation, only the X and Y coordinates of each landmark in the image are predicted. It provides no information regarding the angles of the skeleton.
4) 3D Posture Estimation: In three-dimensional posture estimation, the X, Y, and Z coordinates of each landmark and the angles of each body joint of the human skeleton are predicted, as illustrated below.
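A 2D estimator outputs only landmark coordinates, but joint angles can be derived from them. A minimal sketch, assuming three 2D keypoints (hip, knee, ankle) given as (x, y) pairs — the values are illustrative, not from any surveyed system:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b (degrees) formed by segments b->a and b->c."""
    a, b, c = np.asarray(a), np.asarray(b), np.asarray(c)
    v1, v2 = a - b, c - b
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Hypothetical hip, knee, and ankle keypoints from a 2D pose estimator.
print(joint_angle((320, 240), (330, 360), (325, 470)))  # knee angle
```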
E. HUMAN BODY MODELING
One of the most important aspects of identifying human posture is the choice of a human body model. A body model contains information such as human shape and texture [43]. Analysis of the human body model gives all the essential information about the body. The significant aspects of a human body model include points, segments (each segment is considered a rigid body), and segment groups linked to one another via joints. Human body models are separated into the following categories based on how frequently they are used [70]. Figure 2 shows the three human body models.
FIGURE 2. Types of models in human body modeling.

1) SKELETON-BASED MODEL
This model is also referred to as the kinematic model [71]. This human body model is simple and adaptable enough to be used in both 2D and 3D posture estimation [54], [72]. It represents the set of joints, such as the shoulders, elbows, knees, and ankles, and the limb orientations encompassing the skeletal structure of a human body. While the advantages of this model include its flexibility and ease of representation, it also has certain disadvantages; for example, a lack of texture in an image implies a lack of width and contour details of the human body [70].
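As an illustration of this kinematic idea, a skeleton can be encoded as a joint list plus a parent index per joint. The 14-joint layout below is hypothetical; real systems such as Kinect or OpenPose define their own joint sets:

```python
import numpy as np

# Hypothetical 14-joint kinematic skeleton: each joint stores the index
# of its parent, so every (joint, parent) pair defines one rigid segment.
JOINTS = ["head", "neck", "r_shoulder", "r_elbow", "r_wrist",
          "l_shoulder", "l_elbow", "l_wrist", "pelvis",
          "r_hip", "r_knee", "r_ankle", "l_hip", "l_knee"]
PARENT = [1, 8, 1, 2, 3, 1, 5, 6, -1, 8, 9, 10, 8, 12]  # -1 = root

def segments(pose):
    """Yield (child_xy, parent_xy) limb segments from a (14, 2) pose array."""
    pose = np.asarray(pose)
    for j, p in enumerate(PARENT):
        if p >= 0:
            yield pose[j], pose[p]
```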
2) CONTOUR-BASED MODEL
This model is also referred to as the planar model. It depicts the human body shape by encircling it with a contour [73]. Multiple rectangles approximating the person's body contours reveal the human body parts in this model, and it is used for 2D pose estimation [74]. The most widely used contour-based models are the active shape model and the cardboard model [75], [76], [77], [78], [79].

3) VOLUME-BASED MODEL
This model is used to estimate 3D poses. Various well-known 3D body models are used in deep learning-based three-dimensional postural measurement for 3D human mesh extraction [74]. One such model was trained using 60,000 human scans from the GHS3D dataset, which includes full-body scans and close-ups of the face and hands [80].
F. YOGA POSTURE RECOGNITION (YPR)
Over the past decade, numerous works have been carried out in yoga posture recognition using vision and sensor technology. Motion capture sensors [81], [82], [83], [84], [85], [86], [87], [88], accelerometers [89], [90], gyroscopes [91], magnetometers [89], motion sensors [92], pressure sensors [93], other visual sensors [24], [94], [95], [96], [97], [98], [99], [100], [101], [102], [103], [104], [105], [106], [107], [108], and wearable sensors [97], [109] are some of the most utilized in postural estimation. Radio Frequency Identification (RFID) [110] and Wireless Fidelity (Wi-Fi) [111], for example, are location-based sensors. In today's era, a wide range of affordable and portable sensors capable of sensing and communicating data via wireless networks is widely available.

Data collection [24], [81], [82], [83], [84], [85], [86], [87], [88], [91], [92], [94], [95], [96], [97], [98], [99], [100], [101], [102], [103], [104], [105], [106], [109], [110], [111], [112], [113], [114], [115], [116], keypoint extraction [81], [82], [83], [84], [85], [86], [87], [88], [91], [92], [94], [95], [96], [97], [98], [99], [100], [101], [102], [103], [104], [105], [106], [109], [110], [111], [112], [113], [114], [115], [116], pose identification [24], [81], [82], [83], [84], [85], [86], [87], [88], [91], [92], [94], [95], [96], [97], [98], [99], [100], [101], [102], [103], [104], [105], [106], [109], [110], [111], [112], [113], [114], [115], [116], and evaluation [81], [84], [85], [86], [87], [88], [91], [92], [94], [95], [96], [97], [100], [101], [102], [104], [105], [106], [110], [111], [112], [113], [115] are the four essential stages of the yoga posture recognition process. Figure 3 depicts the whole architecture of yogic posture recognition.

The first stage collects the user's image or real-time video of yoga postures using various cameras and sensors. The second stage uses a keypoint estimation method on the real-time video stream to automatically extract the user's body posture while doing asanas. The third stage uses machine learning and deep learning approaches to extract essential features of the human body posture in images and videos, manually or automatically. The fourth and final stage predicts the asana and analyses the yogic posture performed by the user, as sketched below.
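A minimal sketch of that four-stage pipeline, with hypothetical stage functions standing in for the concrete methods surveyed here:

```python
def recognize_yoga_posture(frame, keypoint_model, classifier):
    """Four-stage YPR pipeline: capture -> keypoints -> features -> prediction.

    `keypoint_model` and `classifier` are placeholders for any of the
    estimators and learners discussed in this survey.
    """
    keypoints = keypoint_model.detect(frame)      # stage 2: keypoint extraction
    features = extract_features(keypoints)        # stage 3: feature extraction
    asana, score = classifier.predict(features)   # stage 4: pose identification
    return asana, score

def extract_features(keypoints):
    # e.g., joint angles and pairwise joint distances (see Section III).
    ...
```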
G. PAPER COLLECTION PROCESS
One of our primary paper collection sources is Google Scholar; we additionally utilized IEEE Xplore and PubMed. We utilized nineteen keywords for searching references in all three databases, separated into five groups: general search keywords are in groups 1 and 2, the types of sensors utilized are considered in group 3, the keypoint detection tools and libraries utilized are mentioned in group 4, and the learning methodologies utilized are considered in group 5. Table 1 illustrates those five groups.
• This paper might be helpful for those who are working on machine learning and deep learning-based yoga posture identification in real time, and it guides future researchers in the right direction.

This review is organized as follows: Section II enumerates various sources of data based on vision and sensors, Section III explains the several keypoint detection techniques used in pose estimation, Section IV illustrates the learning models and the evaluation metrics used in yoga posture recognition, Section V specifies the yoga pose prediction approaches, Section VI discusses the inferences of yoga posture recognition systems, and finally, Section VII presents the future directions and conclusion of this review paper. Figure 5 illustrates the taxonomy of this review.

II. SOURCE OF DATA
Pose recognition methods are classified into two main categories based on the generated data:
1) Vision Based
2) Sensor Based

A. VISION BASED
Vision-based pose analysis has the promising ability to provide inexpensive and feasible solutions for estimating the human body pose using cameras.

1) CAMERAS
The most basic and classic approach to action identification is to install security cameras within the venue and observe human activities. The information can be reviewed either manually (by a user reviewing all images and videos) or automatically; computer vision methodologies have been proposed to process and analyze this information and distinguish activities automatically [117].

Vision-based pose recognition systems use cameras, such as RGB cameras, web cameras, and mobile cameras, as their sensors and analyze the human motions in images or videos [118], [119]. This approach is inexpensive, as it makes use of computer vision technology, which is less costly than specialized sensors. Furthermore, it can run on a CPU-only system using just a camera and does not really require a GPU for computing, keeping it inexpensive. It does not require any special sensors or advanced technologies to operate, reducing operating costs and making it available to a wide range of people [101].
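As a minimal illustration of this low-cost setup, frames can be pulled from an ordinary webcam with OpenCV on a CPU-only machine; the downstream pose estimator is left as a placeholder:

```python
import cv2

cap = cv2.VideoCapture(0)  # default webcam; no GPU or special sensor needed
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # frame (a BGR numpy array) would be passed to a pose estimator here.
    cv2.imshow("yoga practice", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop
        break
cap.release()
cv2.destroyAllWindows()
```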
2) DEPTH CAMERAS
One disadvantage of traditional cameras is their dependency on lighting, which means they cannot work in a dark environment. Depth cameras, like the Kinect, can operate in dark places, effectively solving this problem.
Kinect produces a wide variety of streaming data, including color, depth, and audio [81], [82], [83], [84], [85], [86], [87]. It can collect more data about the human body and produce an accurate virtual skeleton. Activities can be identified using this knowledge because distinct motions of a body (particularly the skeleton) are associated with various activities. Apart from the processing complexity, depth-sensing devices are expensive, which is a barrier to using them for activity recognition [117].

B. SENSOR BASED
Many tools are available for yoga posture recognition, including vision-based marker, markerless, and wearable sensors. Vision-based sensors rely on the presence of an infrastructure to be assessed; as a result, this remedy is intrusive for continuous monitoring of everyday actions. Furthermore, vision-based recognition systems are expensive. The main reason for considering wearable (motion) detectors is practical. Accordingly, innumerable acceleration sensors, gyroscopes, and force sensor nodes have been developed. Wearable sensors are small, inexpensive, and lightweight, and are used to acquire data without interfering with daily activities [120]. Figure 6 specifies the sensors utilized in YPR.

Wearables have had a massive effect on one of the areas where conventional medical systems are migrating to active models that take better care of a patient's medical status through continuous monitoring, in order to diagnose the patient's illness at an early stage. Wearables are tiny, portable medical tools that provide immediate access to a patient's health status, often in time to protect life. A wearable device makes it easier to react quickly to changes in the patient's body. Real-time data can be gathered from a wearable device, and findings can be obtained using machine learning algorithms: the user data is collected by wearables, which then analyze the data using machine learning algorithms [121].

Most wearables are marketed toward customers and monitor body vitals and motion. They can also monitor and identify arrhythmia and aid in exercise and recovery. Wearable gadgets targeted at healthcare professionals include ECG patch recorders, vests, fitness trackers, smartwatches, and smart clothing with built-in sensors to improve prognostication and early identification of severe decompensation. Wearable technology is developing very quickly, and decisions on the treatment of cardiac failure can benefit from the instruments' increasing precision. The advancement of wearable technologies and information interfaces between customers, patients, and medical professionals is likely to support healthier lifestyles and disease prevention [122]. Nadi X yoga pants were created for people who seek a step-by-step tutorial on how to perform a yogic posture, along with advice on where to concentrate and an evaluation of the posture's effectiveness [123]. Nadi X yoga pants eliminate the need for an instructor to monitor yogic stances during practice: they give gentle vibration feedback about the optimal alignment for the specific posture during yoga practice.

According to the healthcare global market report in 2022, the size of the global Internet of Things (IoT) healthcare market is anticipated to increase from $130.26 billion in 2021 to $158.03 billion in 2022, representing a compound annual growth rate (CAGR) of 21.3%. With a CAGR of 22.4%, the worldwide healthcare market share is expected to reach $354.66 billion in 2026. The growing use of smart devices and wearables will propel the IoT in the healthcare market. For instance, the Times of India, an Indian newspaper, reported in April 2022 that there were just over 200 million smart gadget users in India in 2019; the IoT revolution, as well as the rapid digitization caused by the pandemic, increased the number of users to over 2 billion in 2021 [124].

To recognize their activities, people wear various sensing devices in different places: Sensewear, ActiGraph on the right and left wrist, and ActivPal on the right hip [125]. Owing to the rapid development in sensor technology, sensor-based pose recognition is prevalent and widely used in many areas, such as medical and healthcare applications. They provide
and to examine the smoothness with which exercise was executed [90].

2) GYROSCOPE
A gyroscope is a triaxial MEMS device that measures the angular motion of an object, such as a body part. A gyroscope operates on the Coriolis principle, which measures angular position based on linear movement. Modern gyroscopes have typical resolutions and sampling rates equivalent to accelerometers; the maximal angular speed is roughly 1000-2000 degrees per second, while energy usage is an order of magnitude greater. Gyroscope sensors can be mounted on several human body regions, including the ankle, foot, waist, and knee, to monitor body pose and kinematic movements. Gyroscopes have a smaller bias drift than accelerometers, and their measurements are less sensitive to shocks and gravitational field effects [120].

With sensors like accelerometers and gyroscopes in wearables and smartphones, exercises and physical activities can be easily tracked by capturing the user's body motions. The smartphone collects a 6D sequence of time-stamped samples corresponding to the 3-axis accelerometer and gyroscope [91].
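For illustration, such a 6D stream can be segmented into fixed-length windows before classification; the 50 Hz rate and 2-second window here are assumptions:

```python
import numpy as np

def make_windows(samples, rate_hz=50, window_s=2.0):
    """Slice an (N, 6) accel+gyro stream [ax, ay, az, gx, gy, gz]
    into non-overlapping windows of shape (window, 6)."""
    samples = np.asarray(samples)
    size = int(rate_hz * window_s)
    n = len(samples) // size
    return samples[: n * size].reshape(n, size, 6)

stream = np.random.randn(500, 6)   # stand-in for real sensor data
print(make_windows(stream).shape)  # (5, 100, 6)
```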
3) MAGNETOMETER
A magnetometer is a device that detects the magnetic field's direction, strength, and changes. In wearable technology, the Hall effect is used to detect the Earth's magnetic field. Magnetometers can also help determine the user's absolute alignment for posture recognition. Micromechanical magnetometers typically have lower sampling rates and signal-to-noise ratio (SNR) resolutions, around 10-100 Hz and 8-12 bits. As a result, the magnetometer is used as an assistive motion-sensing component [120]. Magnetometers are one of the few sensing devices that are unresponsive to acceleration and can provide absolute information regarding alignment in 3D space [89]. Magnetic sensors are typically Hall effect sensors.

4) INERTIAL MEASUREMENT UNIT (IMU)
The IMU is a sensor unit that uses a combination of an accelerometer, a gyroscope, and a magnetometer to detect the user's linear velocity, angular speed, alignment, and the force of gravity. A Mahony filter with an additional Kalman filter is commonly utilized for data fusion of triaxial inertial measurement units. Some gadgets, such as Bosch Sensortec chips, directly output exact orientations as quaternions at a sample rate of 100 Hz [120].
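The survey mentions Mahony and Kalman filters for this fusion; as a simpler stand-in that captures the same idea (blending integrated gyroscope rate with an accelerometer-derived tilt), here is a minimal single-axis complementary-filter sketch:

```python
import numpy as np

def complementary_filter(accel, gyro, dt=0.01, alpha=0.98):
    """Fuse accelerometer tilt (noisy but drift-free) with integrated
    gyroscope rate (smooth but drifting) into one pitch-angle estimate.
    accel: (N, 2) array of [ax, az]; gyro: (N,) pitch rate in rad/s."""
    angle = 0.0
    out = []
    for (ax, az), g in zip(accel, gyro):
        accel_angle = np.arctan2(ax, az)  # tilt inferred from gravity
        angle = alpha * (angle + g * dt) + (1 - alpha) * accel_angle
        out.append(angle)
    return np.array(out)
```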
Human motion assessment is now performed using various technologies, including infrared optoelectronic motion capture (OMC) systems, Magneto-Inertial Measurement Units (MIMUs), and camera-based systems. OMCs are the "gold standard" for motion tracking; even though these systems have high accuracy, they require a controlled environment and specialized skills, making them impractical for outdoor environments [138]. MIMUs are one of the most promising motion capture technologies because of their smaller size and low cost. The procedures used to extract orientation information from MIMUs are known as "sensor fusion" algorithms, which integrate information from several sensors to provide an accurate estimation of kinematic features like joint angles. They can also be used in outdoor applications, but their kinematic estimation accuracy is lower than that of OMCs. As a result, IMUs can be considered a suitable choice [138].

Among sensor fusion algorithms, Kalman filters are one of the most dependable, efficient, and durable. One system for human kinematic assessment in indoor circumstances utilizes an IMU-based Extended Kalman filter. Four IMUs were mounted laterally on the right upper limb and trunk using elastic bands, and all the IMUs and reflective markers were braced on 3D-printed plastic supports. Five qualified yoga trainers tested the system during the execution of the sun salutation series. By comparing the joint angle predictions with the results collected from an optoelectronic reference system, Pearson's correlation coefficients and Mean Absolute Errors (MAEs) were measured [138].
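These two agreement metrics are straightforward to reproduce; a sketch comparing an IMU-derived joint-angle series against a reference series (the data here are simulated):

```python
import numpy as np

def agreement(pred_deg, ref_deg):
    """Pearson's r and Mean Absolute Error between two joint-angle series."""
    pred_deg, ref_deg = np.asarray(pred_deg), np.asarray(ref_deg)
    r = np.corrcoef(pred_deg, ref_deg)[0, 1]
    mae = np.mean(np.abs(pred_deg - ref_deg))
    return r, mae

ref = np.linspace(0, 90, 200)                  # reference elbow-angle sweep
pred = ref + np.random.normal(0, 2, size=200)  # simulated IMU estimate
print(agreement(pred, ref))                    # r near 1, MAE near 1.6 deg
```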
Gupta and Gupta [91] proposed a yoga assistance system that mounts sensory units on various body parts to correct yoga postures with feedback. Each sensor unit comprises a Node Micro Controller Unit (NodeMCU) module, including motion sensor units: an accelerometer and a gyroscope. This module serves as a control unit and transmits the sensor information via Bluetooth. The nine-axis IMU sensor (MPU9250) provides an accelerometer and gyroscope with a sampling rate of 50 Hz. They also created a mobile application that makes data collection fast and easy on a smartphone. Furthermore, a feedback report portraying the accuracy level is prepared and sent back to the trainee's mobile phone. This system predicts and evaluates the sun salutation yoga posture sequence using sensors and a deep neural network.

An interactive yoga training model with motion replication is used in virtual reality. This system uses sixteen IMUs and six tactors to capture the user's body posture. It compares and analyzes the yoga postures of experts and practitioners and gives feedback to the users for correcting the yoga posture [139]. The IMU seems to be the most common and accurate wearable device for building movement analysis applications due to its small size and integrated sensor fusion implementations [120].

5) ELECTROMYOGRAPHY (EMG)
An EMG sensor evaluates muscular actions such as voluntary or involuntary muscular contractions. It can reveal muscular disorders, nerve disorders, and nerve-muscle transmission issues, all of which cause locomotion difficulties. The sensor's EMG electrodes record the electrical signals of muscular contractions; once acquired, these signals can be analyzed to identify anomalous behaviour. The electromyographic sensor employs two different types of electrodes: needle-like invasive electrodes for readings of depth and sensitivity, and non-invasive, lower-sensitivity skin-surface electrodes.
Surface EMG (sEMG) analysis can examine various gait-related characteristics such as muscular properties, paresis, rigidity, and stress [120].

Myoware is often used to monitor muscular activity through electric potential, generally known as sEMG, which has already been utilized in clinical research and in the diagnosis of neuromuscular disorders. EMG sensors, however, have also found their way into robotic systems, prosthetic devices, and other control devices as microcontrollers and integrated circuits have become much more powerful [113].

Accelerometers have intrinsic difficulties distinguishing between passive and active movement performance. sEMG sensors, on the other hand, are sensitive to muscular contraction and hence help distinguish between active and passive movements. The sEMG method is gaining significance in sports and is now the most practical method of determining how hard a muscle is working (although it has some limitations) [90].

A postural recognition system is used during yoga sessions to ensure that specific lower-limb muscle movements are accurate. To collect data, ten people participated in this study and performed five yoga poses. Their technique analyses the movement of four lower-limb muscles in both legs using the EMG signal output. The analog-read package of Arduino, a simple open-source circuit design that relies on hardware and software, provides data-gathering EMG values ranging from 0 to 1,023. The Myoware muscle sensor is a pre-configured printed circuit board that includes the circuitry needed to convert minor fluctuations of muscle energy into analog values that a microcontroller can recognize. After collecting EMG data from the ten participants, a Simple Moving Average (SMA) calculation was utilized to analyze the EMG data and remove the noise. The EMG data went through feature extraction before being fed into machine learning approaches (like SMO, J48, and Random Forest) for posture detection. According to the results, the Random Forest algorithm gives the most accurate results in identifying yoga poses when compared with the other approaches [113].
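A sketch of that preprocessing and classification chain, assuming raw 0-1,023 Arduino readings, toy statistical features, and scikit-learn's Random Forest in place of the Weka implementations (SMO, J48) named above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sma(signal, k=10):
    """Simple Moving Average: smooth a raw EMG stream (values 0-1023)."""
    return np.convolve(signal, np.ones(k) / k, mode="valid")

def features(signal):
    """Toy feature vector per recording: mean, std, max of smoothed EMG."""
    s = sma(np.asarray(signal, dtype=float))
    return [s.mean(), s.std(), s.max()]

# Hypothetical dataset: one smoothed-EMG feature row per yoga-pose trial.
X = [features(np.random.randint(0, 1024, 500)) for _ in range(50)]
y = np.random.randint(0, 5, 50)  # five pose labels
clf = RandomForestClassifier().fit(X, y)
print(clf.predict([features(np.random.randint(0, 1024, 500))]))
```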
The amplitude of the sEMG signal is proportional to the force produced by the muscles, and it was used to assess the muscle effort exerted by people performing specific asanas. The signal should be adjusted before comparing sEMG signals from various individuals; sEMG signals were normalized with respect to characteristics such as the person's height, body weight, limb, and neck [90].

6) ELECTROENCEPHALOGRAPHY (EEG)
Most meditation studies have focused on EEG due to its low price, portability, and non-invasive connection to the activity in the brain. Brain waves are altered by specific cognitive and motor acts. EEG has become one of the finest ways of grasping brainwave activity in a non-invasive fashion with high resolution, compared with many other brain signals. Electrodes are placed on the scalp surface to assess the overall brain impulses of the cerebral cortex. EEG devices assess the electromagnetic fields linked with a broad set of neurons. Due to the general complexity of mapping the spatial activities onto distinct areas of the brain and the electrode locations, EEG is tough for inexperienced observers to understand [112].

7) INFRARED (IR) SENSOR
An IR sensor is a radiation-sensing device with a spectral sensing component within specific wavelengths ranging between 780 nm and 50 µm. IR sensors are most extensively used in movement detection; IR sensing elements easily detect physical movements based on heat variation.

Low-resolution IR sensors have been used in a gadget-free yogic pose detection system. It contains a sensory module with eight thermal sensors and an I2C serial port connecting to a WiPy 2.0 (Wi-Fi/Bluetooth module). The WiPy 2.0 unit, including an ESP32S microcontroller, interacts with a router and the deep learning server over the Internet. The AMG8833 is a small surface-mounted device that senses IR signals with wavelengths ranging from 8 to 13 µm and records the observed temperatures as floating-point values with two decimal digits in Celsius. The significant benefits of the AMG8833 in a YPR system are its compactness, low energy consumption, unobtrusiveness, ability to identify immobile subjects, off-the-shelf nature, and low cost compared to other thermal imaging cameras [92].

A unique IoT-based yogic posture recognition system contains three wireless sensor nodes, each interconnecting a wireless unit and low-resolution IR sensors. The wireless sensor nodes are mounted on the ceiling and walls to capture yogic postures in the x, y, and z directions. The authors selected 18 yogis to do 26 postures throughout two sessions, each lasting up to 20 seconds. The sessions were recorded, preprocessed, and then transformed into grayscale images. Their model evaluated 93,200 posture images using tenfold cross-validation with a DCNN, producing an accuracy of 99.991% [92].
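To illustrate the preprocessing step, an 8 x 8 temperature frame of the kind such sensors produce can be rescaled into an upsampled grayscale image before being fed to a CNN. A sketch, with assumed temperature bounds:

```python
import numpy as np
import cv2

def thermal_to_grayscale(frame_c, t_min=18.0, t_max=38.0, out_size=(64, 64)):
    """Map an 8x8 Celsius frame to an 8-bit grayscale image and upsample."""
    frame_c = np.clip(np.asarray(frame_c, dtype=float), t_min, t_max)
    gray = ((frame_c - t_min) / (t_max - t_min) * 255).astype(np.uint8)
    return cv2.resize(gray, out_size, interpolation=cv2.INTER_CUBIC)

frame = 20 + 10 * np.random.rand(8, 8)    # stand-in for an AMG8833 reading
print(thermal_to_grayscale(frame).shape)  # (64, 64)
```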
8) FORCE SENSITIVE RESISTOR (FSR) SENSOR
FSR sensors sense the static and dynamic pressure exerted on a target surface; their reaction range is mainly decided by the fluctuation in electrical resistivity [140].

FSRs are made of a semi-conductive substance or semi-conductive inks between two thin substrates. Shunt type and thru type are the two distinct types of FSR sensors. Shunt-type FSRs are polymeric thick-film sensors with two membranes divided by a thin air gap; one membrane contains two sets of interdigitated traces that are electrically separated from each other, and the other is covered with a special textured, resistive ink. Thru-type FSRs use twin polyester outer substrates as resilient printed circuitry, with silver rings and traces positioned above and below the pressure-sensitive layer, followed by the polymer film [141].
One such system facilitates real-time yoga practice using an embedded-system-based Intelligent Yoga Mat (ESYM) to correct yoga postures. FSR-type sensors are favoured for evaluating the pressure of human poses, and the ESYM is created using a network of pressure sensors. First, the pressure nodes on the ESYM are recognized, and then a pattern is derived using the FSR sensors. Pressure sensor data modules store each sensor's information. An approach for pattern recognition was developed for the assessment of yoga poses, and biofeedback results are delivered through a speech unit to correct the poses [93].
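A sketch of that idea: threshold a pressure-sensor grid into an activation pattern and match it against stored pose templates. The grid size, threshold, and templates below are assumptions, not the ESYM's actual parameters:

```python
import numpy as np

def pressure_pattern(grid_kpa, threshold=5.0):
    """Binarize an FSR mat reading (rows x cols, kPa) into an activation map."""
    return (np.asarray(grid_kpa) > threshold).astype(int)

def closest_pose(pattern, templates):
    """Return the template pose whose activation map overlaps most."""
    return max(templates, key=lambda name: (pattern == templates[name]).mean())

templates = {"tadasana": np.zeros((8, 16), int), "balasana": np.ones((8, 16), int)}
reading = np.random.rand(8, 16) * 10  # stand-in mat reading
print(closest_pose(pressure_pattern(reading), templates))
```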
9) RFID
With the growth of RFID technology, various techniques for human action recognition employing device-free RFID technology have been developed in recent years. Initially, the RFID technology's range was limited to a few centimeters, but it has since been expanded for passive and active tags.

RFID technology comprises two major components: readers and tags. A reader is a device that reads tags and collects data from them; it is equipped with an antenna that emits radio waves. RFID tags receive and manipulate these radio signals with data such as an ID, and the reader collects the backscattered signals, which contain the tag information, through its antenna. Tags are microscopic chips that can be applied to a variety of objects. There are two kinds of tags: active and passive. Active RFID tags have their own power source, whereas passive RFID tags do not have a battery and rely on the readers' radio signals for energy. As opposed to passive tags, active RFID tags have a greater range. RFID has been embraced in various disciplines because of its passive nature, low price, and unobtrusiveness, and it is currently frequently employed in scientific work on activity recognition. RFID technology is used by researchers in tracking, localization, posture, gesture, and behaviour recognition [117].

RFitness is a fitness posture system that detects yogic postures using RFID tags on a yoga mat. Multiple commodity RFID tags are attached to the yoga mat to predict the yoga posture, letting different yogic postures activate distinct RFID tags. According to the detected signals from the RFID tags, distinct yogic postures are easily identified using deep neural networks [110].

10) WI-FI
There has been a paradigm change in activity recognition research during the last decade, from device-based methods to device-free ones. Researchers have started to employ Channel State Information (CSI) for activity detection after investigating the features of wireless networks. Several Wi-Fi-based localization, tracking, and fall detection methods have already been proposed. Wi-Fi has the advantage of being unobtrusive, so users are not obliged to carry any device with them [117].

Wiga is a noncontact activity detection system using Wi-Fi in real time. It can detect a sequence of actions; moreover, the user does not need to participate in a training phase. Wiga uses non-intrusive Wi-Fi gadgets that can classify action sequences, which is more comfortable and secure than vision-based and wearable sensor systems. Nevertheless, the wireless system is sensitive to the ambient environment: although Wiga works well in this experimental scenario, its accuracy might degrade in a different setting, since a model trained in one environment learns environment-specific properties [111].

The contributions of sensors in yoga posture recognition systems are summarized in Table 3. In these approaches, the sensors are mounted on the yoga practitioner's entire body [91], [93], [109], [112], [113], [114], [138] or on the walls and ceiling [92]. These sensors predict the entire body posture, muscle activities, and the rested brain states during the practice of yoga.

Many studies have been conducted to develop posture recognition models using cameras, Kinect, wearables, and smart mats. Vision-based methods, in general, have higher accuracy due to higher resolutions; nevertheless, these methodologies raise privacy issues, so most users may not tolerate such systems. On the contrary, the key advantages of wearable device approaches are non-invasiveness and high accuracy, even though transporting and maintaining numerous or even a single wearable sensing device is problematic in long-term real-life usage because of its maintenance overhead and discomfort. As a result, one of the most optimal alternatives for yogic posture detection would be a privacy-preserving and device-free sensor module that does not use cameras or any wearables [92].

III. FEATURE EXTRACTION
The collected information first goes through feature extraction methods in order to support an effective perception of human actions. The recognition model can be constructed using learning approaches from every feature occurrence; after training, unseen occurrences can be assessed with the recognition model to estimate a prediction of the act performed [32].

In order to classify images effectively, features play a crucial role, and they are correlated with image classification efficiency. Traditionally, image features are separated into two categories: the first covers basic features, including shape, color, and texture, and the second covers semantic features like scene and behaviour. Conventional feature extraction techniques concentrate primarily on extracting the image's local or global features, including spatial relation extraction and other hand-crafted feature extraction methods. Local binary patterns, the scale-invariant feature transform, and the histogram of oriented gradients are some representative methods of extracting hand-crafted features from images. Additionally, automated feature extraction methods may be linear or non-linear and supervised or unsupervised, including principal component analysis [111], [142], decision trees [103], [113], random forests [87], [101], [103], [113], kernel approaches [87], [106], [112], and deep learning feature extraction methods.
Support vector machines [101], [103], [113] among kernel approaches, and convolutional neural networks [83], [94], [97], [100], [104], [115], deep neural networks [91], [92], [110], and recurrent neural networks [96], [102], [111] among deep learning methods, have efficient feature learning capabilities [143].
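As a small example of automated, unsupervised feature extraction, principal component analysis can compress flattened pose frames into a handful of components. A sketch using scikit-learn with hypothetical data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 200 pose frames, each flattened to 34 values
# (17 keypoints x 2 coordinates).
frames = np.random.rand(200, 34)
pca = PCA(n_components=8)
features = pca.fit_transform(frames)  # compact learned features
print(features.shape, pca.explained_variance_ratio_.sum())
```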
Feature extraction methodologies are separated into the following categories based on the extracted features.

A. LOCAL FEATURE EXTRACTION
Local features in images or videos refer to patterns or distinctive structures such as points, edges, or small image patches. They are primarily associated with an image patch that is distinct from its surroundings in terms of texture, color, or intensity.

Local descriptors, including the Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Local Binary Patterns (LBP), have been widely used in various computer vision tasks, especially HPE [43]. SIFT was initially developed to recognize objects; SIFT features have become very effective image descriptors, and their use in face analysis has been extensively researched [144], [145], [146], [147]. SIFT and SURF features have been utilized to compare the user's portrayal of an asana to a video of the identical asana performed by an expert [107], [148], [149].
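A sketch of that comparison using OpenCV's SIFT implementation with Lowe's ratio test; the file names are hypothetical:

```python
import cv2

user = cv2.imread("user_asana.jpg", cv2.IMREAD_GRAYSCALE)
expert = cv2.imread("expert_asana.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(user, None)
kp2, des2 = sift.detectAndCompute(expert, None)

# Lowe's ratio test keeps only distinctive keypoint matches.
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} good matches between user and expert frames")
```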
B. GLOBAL FEATURE EXTRACTION
The Histogram of Oriented Gradients (HOG) is a computer vision method for extracting global features [150], [151], [152], [153]. It calculates the horizontal and vertical orientations and magnitudes of gradients over the entire image. Typically, this method is employed to identify human postures in images. HOG divides the image into blocks, computes the gradient histogram of every block, and finally concatenates these histograms to create the feature vector [43].
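For illustration, scikit-image provides a ready-made HOG implementation; the parameter values below are common defaults, not the surveyed papers' settings:

```python
from skimage import data, color
from skimage.feature import hog

image = color.rgb2gray(data.astronaut())  # stand-in for a yoga frame
feature_vector = hog(image, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2), block_norm="L2-Hys")
print(feature_vector.shape)               # one concatenated global descriptor
```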
C. SKELETAL BASED FEATURE EXTRACTION
Skeletal features extracted from the human skeleton have recently spurred research into yoga activity recognition [97]. Extracted skeletal data provides significant improvements in prediction accuracy. In most cases, depth sensors such as the Microsoft Kinect derive the skeleton-based key features; such a depth sensor provides the 3D coordinates of the body joints. The feature vector can be calculated using the relative distances between the joints [43].
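A minimal sketch of such a feature vector, assuming an (N, 3) array of 3D joint coordinates from a depth sensor:

```python
import numpy as np

def pairwise_joint_distances(joints_xyz):
    """Feature vector of relative distances between all joint pairs.
    joints_xyz: (N, 3) array of 3D joint coordinates."""
    j = np.asarray(joints_xyz)
    diffs = j[:, None, :] - j[None, :, :]  # (N, N, 3) displacement grid
    dists = np.linalg.norm(diffs, axis=-1) # (N, N) distance matrix
    iu = np.triu_indices(len(j), k=1)      # unique pairs only
    return dists[iu]

print(pairwise_joint_distances(np.random.rand(25, 3)).shape)  # (300,)
```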
D. KEYPOINT BASED FEATURE EXTRACTION
In pose estimation, the configuration of the body (pose) is predicted from an image. The process of pose estimation is separated into two basic steps:
1) Identifying human body joints/keypoints.
2) Grouping those predicted joints.
In the first step, keypoints are usually identified at various joints in the human body, including all the joint positions: eyes, ears, neck, shoulders, and so on. In the second step, by grouping all those joints, the entire body structure is formed, and these grouped keypoints predict the pose of a human at a given time [154]. The keypoint detection method extracts the x and y coordinates of distinct areas of the body parts and their confidence levels. The coordinates of the different areas of the body derived from such an image then provide sufficient evidence to identify whether the posture is correct or not [94]. Keypoint estimation methods do not need high-tech devices for analyzing pictures or videos. They can predict the following poses and activities: sitting, standing, walking, running, cycling, lying down, fall detection, push-up counting, and activity prediction [155].
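As a sketch of this step with one of the keypoint extraction tools discussed later in this survey (MediaPipe Pose), the x, y coordinates and confidence (visibility) of each landmark can be read directly from an image; the file name is hypothetical:

```python
import cv2
import mediapipe as mp

image = cv2.imread("asana.jpg")  # hypothetical input frame
with mp.solutions.pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    h, w = image.shape[:2]
    for lm in results.pose_landmarks.landmark:
        # Normalized coordinates scaled to pixels, plus a confidence score.
        print(int(lm.x * w), int(lm.y * h), round(lm.visibility, 2))
```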
1) MICROSOFT KINECT
In pose estimation, the major challenge is predicting the coordinates of the human body joints. With the introduction of the Kinect, anyone can easily detect and forecast the body joint coordinates in any human pose using depth images. Conventional techniques in this field have mainly relied on common sensors, like RGB cameras, that are often computationally expensive and susceptible to lighting changes and cluttered backgrounds [86]. Figure 7 illustrates Microsoft Kinect's three versions, and Table 4 summarizes the Kinect versions utilized in yoga posture recognition systems.

TABLE 4. Kinect models used in YPR.

• Kinect Version 1
Kinect is a motion perception input device released by Microsoft in 2010. The Microsoft Kinect contains RGB cameras, an infrared camera, infrared projectors, and microphones. It can be used for gesture recognition, posture recognition, speech recognition, skeletal body detection, and voice control. It captures real-time RGB and depth video feeds. Compared with traditional RGB videos, the key benefit of employing depth video content is that separating the human in the foreground is easier, even when the video seems cluttered. Color information is also absent in depth videos, so the clothes worn by the human subjects cannot affect the segmentation process. Kinect was initially manufactured as a motion controller to avoid physical controllers in 3D gaming applications, and it is a notably novel product among the numerous competitors in the gaming sector. It permits activity recognition researchers to focus mainly on acquiring robust feature descriptors to specify actions instead of low-level segmentation [158], [159].
Kinect has an inbuilt IR sensor that can detect the distance between the object and the sensor and generates depth images at a rate of 30 frames per second. Furthermore, Kinect's precision and adaptability in detecting the joints of a human body are outstanding, letting it detect complex and challenging postures [86].
The first version of Kinect captures images at a resolution of 640 × 480 pixels [83]. To extract the features of the user's body map, Kinect uses the OpenNI and OpenCV libraries to extract the body contour from the body map [81]. Body contouring information is gathered in a single direction; therefore, it cannot precisely compare and contrast the yoga poses of beginners and experts. With the help of two Kinects, the body maps of the practitioners were extracted from the front and side views; this eliminates the limitations in [107] and [139] and surpasses the previous comparison results [84]. Kinect utilizes a built-in infrared laser projector, multiarray microphone, RGB camera, and CMOS sensor to capture images and video. It also has a skeleton tracking tool to detect and track the coordinates of human joints. There has recently been a surge in interest in employing Microsoft's 3D Kinect sensor in vision-based pose identification [86].
Using a Microsoft Kinect, a novel self-trained yogic posture recognition system was designed. The authors selected five yoga practitioners to perform 12 yoga postures five times each and collected 300 videos. First, the user's body map is captured, and the body contour is extracted using a Kinect.
A star skeleton, a rapid skeletonization method that connects the centroid of a particular object to the contour extremes, is utilized as a prominent identifier of human posture in YPR. A distance function is used to examine the divergence between the user's pose and the pre-built traditional yoga poses. The system's pose prediction accuracy was 99.33% [81].
Another model classifies yoga poses based on the detected body joint points in real time using Kinect. The authors selected five yoga practitioners to perform three yoga poses, three times each, in 5-8 seconds (5 × 3 × 3 = 45 videos). They achieved over 97% accuracy in every calculated angle between the distinct body parts associated with those three poses [86].
• Kinect Version 2
Microsoft launched the second generation of Kinect in 2014. It has numerous improvements in face, hand, and body gesture recognition. Compared to version 1, Kinect version 2 is more accurate and effective; however, it takes a lot of computation time to build more complicated models. Kinect V2 additionally includes advanced audio capabilities for speech recognition techniques with its Software Development Kit (SDK 2.0). Motion detection Kinect cameras can work with three input data streams: depth, color, and body tracking. Kinect Studio 2.0 and the Visual Gesture Builder (VGB) are two crucial tools within Kinect V2 [82]. VGB is a Kinect V2 tool that uses classification techniques on recorded data to discover motions [82], [85]. It records the depth and color images and tracks the entire body skeleton with its 25 body joint coordinates. In one study, the Kinect and the yogis (children) were separated by 1.5 to 3.0 meters [87]. In another, two Kinects were placed around 2 meters from the yoga mat, in perpendicular and front directions. The perpendicular view orientation is slightly more accurate than the front view orientation in obtaining the body maps from the two separate directions. The study assessed the inaccuracy of the body joints and showed that standing poses are recognized more accurately than seated and supine yoga poses. Kinect's algorithms become confused when the yoga practitioner moves their head below the waist during practice. Kinect VGB predicted most sampled yoga postures with a high true positive rate of 99.5% and a low false positive rate of 0.03% [85].
Another model uses an innovative approach for practicing yogic poses that can monitor up to six persons simultaneously using Kinect V2. It recognizes six yoga poses using Kinect V2 and the AdaBoost algorithm, producing accurate results for yogic postures greater than 94.78% [82].
A gesture analysis model of yoga using machine learning selected six yoga experts to perform five yoga poses in 1-2 minutes each and collected 10-20 GB of video clips for training purposes. It uses the Visual Gesture Builder tool of Kinect V2 and the AdaBoost algorithm to recognize yoga gestures. The authors recruited 20 students (adults) with minimal yoga experience for this course and measured the posture alignment of yoga conducted twice weekly for 75 minutes over ten weeks. The system accuracy was over 90% for all yoga postures, with specificity close to 1 [85].
• Azure Kinect
Microsoft has made great efforts to enhance the capabilities of Kinect in business applications, as previous versions of the device had been a failure among consumers in the gaming industry. Due to these continuous efforts, Azure Kinect was born in 2019 using computer vision technology, and it supports some new applications [160]. For building robust body tracking, advanced computer vision, and voice recognition models, Azure Kinect has built-in RGB and depth cameras, a 360-degree circular array of seven angled microphones, and an orientation sensor. Many Azure Kinects can be connected into one volumetric capture rig using software tools through a volumetric capture workflow, permitting users to experience interactive virtual reality renderings of human performances. Previous versions supported only gaming, but Azure Kinect supports new fields such as logistics, robotics, healthcare, and retail [161].
Detecting a person's body posture from an image or video can be problematic in cases of strong articulation or occlusion and tiny or scarcely visible joints; to achieve accurate detection, context must be collected. Each approach for transforming 2D images acquired through a simple RGB camera into a three-dimensional human posture prediction using IR sensors has its own set of advantages and disadvantages, and Microsoft's Azure Kinect is among the most well-known solutions. Azure Kinect records all skeletal information as the three-dimensional coordinates of the participant's joints, which are fed into the model as input. It employs them to generate various vectors and angles, allowing for precise postural detection. The gadget weighs 440 g, significantly less than the Kinect V2. The joint coordinates are based on the 3D coordinate system of the depth camera, and the joints are arranged in a hierarchy from the middle of the body towards the extremities. A mathematical formula is utilized to calculate the angles. A Bayesian network then computes the likelihood of each predetermined pose and selects the one with the greatest probability as the final posture, as sketched below. The system identifies six body postures of the players: Hands Up, Hands Down, Turn Left, Turn Right, Dive, and Slow Down [157].
94.78% [82]. Figure 8 illustrates the skeletal body joints or keypoints
A gesture analysis model of yoga using machine learn- tracking in Kinect versions.
ing selected six yoga experts to perform five yoga poses Nowadays, the Microsoft 3D Kinect sensor is widely
in 1-2 minutes and collected 10 – 20 GB of video clips used in vision-based human action recognition tech-
for training purposes. It uses the Visual Gesture Builder niques [157]. Table 5 illustrates the comparison features
tool of Kinect V2 and the AdaBoost algorithm that of Kinect’s three versions.
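The angle computation these skeleton-based systems rely on is simple to reproduce. The following minimal sketch (not the authors' code; the joint names and coordinates are illustrative) derives the angle at a middle joint from three 3-D joint coordinates via the dot product of the two limb vectors.

import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by segments b->a and b->c."""
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(c, float)
    u, v = a - b, c - b
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: elbow angle from shoulder, elbow, and wrist coordinates (metres).
print(joint_angle([0.10, 0.45, 2.1], [0.12, 0.20, 2.1], [0.30, 0.18, 2.0]))

Angles computed this way for each joint in the hierarchy form the feature vector that a downstream classifier, such as the Bayesian network mentioned above, scores against each predetermined pose.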
The captured joint or skeletal information is converted into color texture features using joint angle and joint distance maps (JDMs) [97]. Surya namaskar asana consists of 12 sequences of postures with distinct body motions. Furthermore, the postures' duration varies based on the user's training, shape, and body mass index. Apart from time flexibility, other aspects must be considered when practicing yoga, including muscular stiffness and different stages of joint occlusion over the course of the practice. Utilizing merely joint-location data as features that vary considerably irrespective of the cause could produce an excessive number of false positives and false negatives. Figure 9 shows the standard nine-camera mocap system in the indoor environment [97].
In their model, all practitioners were fitted with sixteen low-mass retro-reflective markers at specified body positions, in accordance with the Vicon Plug-In Gait model for the leg muscles. All kinematics data were obtained at 100 Hz employing a Vicon ten-camera motion capture system. Each of the seven yogic postures was executed in a way that users could practice frequently. Every pose was performed from the balanced standing position three times, with each participant holding the posture for fifteen seconds, and all positions were done on both the left and right legs. For analysis, each pose's three-dimensional ankle displacements and joint motions were extracted and entered into an Excel file. For the mid-five seconds of the stance hold, the mean joint displacements and joint moments were determined, and the stress and strain along the three axes of motion were charted using means and standard errors (SE). One shortcoming of this model is that the MoCap features represent movement between the shank and foot sections and do not account for the numerous joints within the foot [163].
A three-dimensional motion was collected utilizing 39 retro-reflective markers with a 12-camera Vicon system at a sampling frequency of 100 Hz. Markers were positioned on predefined anatomical landmarks specified by the Plug-in-Gait model and fixed using double-sided sticky tape. For calibration, static trials were captured while standing in a skeletal posture, and the analog data were filtered at 10 Hz. Vicon Nexus was used to compute the angular position of the spine and the upper and lower extremities during the 12 successive poses of Surya namaskar [109]. The markers are placed across the whole body, as illustrated in figure 10.
FIGURE 10. Front and Back side view of 39 reflective markers mounted on yoga practitioners [162].
Compared to the Kinect, the Vicon is a marker-based motion-tracking system. Vicon utilizes a global coordinate system, while Kinect does not; Vicon adopts a reference-standard laboratory method for measuring movements, whereas Kinect relies on algorithms to improve measurement accuracy.
Keypoint detection methodologies estimate the yoga posture from the skeletal joint keypoints of the head, shoulders, body, and feet. From an image or video, the acceleration of positional changes in yoga is predicted based on the skeletal keypoint information extracted using keypoint extraction tools such as OpenCV, OpenPose, Mask R-CNN, and MediaPipe, which increases the prediction accuracy for yoga postures.

E. KEYPOINT DETECTION TOOLS
Some standard keypoint extraction tools for images or videos used in yoga posture recognition are outlined below.

1) OpenCV
Intel's Open-Source Computer Vision (OpenCV) library was written in C and C++. The OpenCV library contains more than 500 functions and 2500 optimized algorithms covering many computer vision areas. OpenCV has been downloaded more than 2 million times, and this number is growing, averaging over 26,000 downloads per month [118], [164]. Furthermore, OpenCV includes OpenPose, with its Part Affinity Fields-based topology, in its neural network module [57].

2) OpenPose
It is the first real-time multi-person detection library. It identifies 135 keypoints in total from the human posture in a single image, grouped into three different categories of keypoint blocks, as illustrated in figure 11 [102]:
1) Body and foot keypoint detection.
2) Hand keypoint detection.
3) Face keypoint detection.
Existing 2D body pose estimation libraries did not combine their face, body, hand, and foot keypoint detectors [57].
FIGURE 11. Representation of Body, Hand, Face, [165] and Foot [57] detected keypoints in OpenPose.
Features of OpenPose: OpenPose, the keypoint detection library, has many inbuilt features, which are listed below.
• It is compatible with many versions of hardware and software.
• It is compatible with multiple operating systems, such as Windows, Mac OS, Ubuntu, and embedded systems.
• It supports hardware such as CUDA GPUs, OpenCL GPUs, and CPUs.
• Users can select their input from an image, video, webcam, or IP camera.
• Users can choose to have their results displayed or saved to disk.
• Users can activate or deactivate the face, body, hand, and foot keypoint detectors.
• Users can enable pixel-coordinate normalization.
• Users can select the number of GPUs required for their application.
• 2-dimensional multi-person and 3-dimensional single-person keypoint detection in real time.
• Fast tracking and visual smoothing when detecting a single person in real time.
Compared with OpenPose, Mask R-CNN and Alpha-Pose have some drawbacks in their libraries. Although Mask R-CNN and Alpha-Pose enable their users to complete their frame readers (image, video, and live-streaming data), most of their pipelines, result visualization, and JSON or XML output file creation, face and body keypoint representations are not integrated into these conventional keypoint detection methods, requiring a different library for each use [57].
The OpenPose network utilizes the first ten layers of VGG-19 to extract the features of an image; those extracted features are then given as input to two parallel stages of convolutional layers. The first stage of convolutional layers predicts a series of two-dimensional confidence maps, each representing a particular body part of the human skeleton. In the second stage, a collection of two-dimensional vector fields, the part affinity fields, is predicted, which encodes the degree of association between estimated parts. In the final stage, bipartite graphs are created between pairs of body parts utilizing the confidence maps, and the part affinity fields prune the weak linkages. After the results of each phase are obtained, human skeletal postures are evaluated and assigned to every person in the image [166].
Extracted postural features are required for developing a self-trained yoga posture detection model. Nevertheless, this model utilizes manually extracted features and needs a distinct model for every asana. The skeletal system is the fundamental characteristic required for expressing many human postures. There are several methods for obtaining the human body's skeletal structure that can subsequently be used to predict posture; however, such approaches [83], [88], [107] are computationally intensive, unsuitable for general smartphone apps, and sensitive to vibrations. OpenPose measures the user's motion and identifies the position of hidden body parts that are hard to discover in normal situations. HPE has been a rapidly evolving area, and OpenPose transformed it by substantially decreasing computing time without compromising the prediction accuracy of the model [94]. The outcome for every frame of the live stream is obtained as JSON, consisting of the locations of each body part for every person identified in the image. The pose extraction was carried out at the OpenPose network's default resolution for optimal performance.
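As a concrete illustration of consuming that per-frame JSON output, the sketch below regroups OpenPose's flat per-person coordinate list (the documented "pose_keypoints_2d" layout, 25 triples for the BODY_25 model) into per-joint (x, y, confidence) triples; the file name and the confidence threshold are assumptions.

import json

def load_keypoints(path):
    with open(path) as f:
        frame = json.load(f)
    people = []
    for person in frame.get("people", []):
        flat = person["pose_keypoints_2d"]
        # Regroup the flat [x1, y1, c1, x2, y2, c2, ...] list into triples.
        people.append([tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)])
    return people

for joints in load_keypoints("frame_000001_keypoints.json"):
    visible = [j for j in joints if j[2] > 0.3]  # drop low-confidence joints
    print(f"{len(visible)} of {len(joints)} joints detected confidently")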
3) MASK R-CNN
Mask R-CNN is an extension of Faster R-CNN [168] that adds one additional branch to the network to predict segmentation masks for every Region of Interest (RoI), while retaining the existing classification and bounding-box recognition features [61].
Each RoI's mask branch is a small Fully Convolutional Network (FCN) that forecasts the segmentation results in a pixel-to-pixel manner. To achieve better accuracy and speed, RoIPool lets the network attend to RoIs on feature maps; RoIAlign, a straightforward, quantization-free layer that reliably conserves precise spatial information, is used by Mask R-CNN to fix the resulting misalignments [61].
FIGURE 12. Keypoint detection using MediaPipe [171].
4) MediaPipe
MediaPipe is used to extract features and to create a skeleton of the body by marking and linking all the joints. The coordinates and angles formed by the joints can be retrieved and used as predictive features in machine learning models, and multiple ML approaches were utilized to compute the posture classification accuracy [103].
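A minimal sketch of such keypoint extraction with MediaPipe's Pose solution follows (assuming the Python mediapipe package and an illustrative image file); it returns 33 normalized body landmarks per image that can be linked into a skeleton and converted into joint angles and distances.

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

image = cv2.imread("asana.jpg")  # illustrative file name
with mp_pose.Pose(static_image_mode=True) as pose:
    result = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if result.pose_landmarks:
    # Each landmark carries normalized x/y (and z) plus a visibility score.
    for idx, lm in enumerate(result.pose_landmarks.landmark):
        print(idx, round(lm.x, 3), round(lm.y, 3), round(lm.visibility, 3))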
A. MACHINE LEARNING MODELS
Figure 14 represents the implemented YPR system using machine learning methods to classify yoga postures. Yogic postures are identified by trained classifiers using machine learning models: the classifiers are first trained with training feature datasets, and the trained classifier is then utilized to predict the specific yoga pose from a testing set of features [43]. The popular machine learning classification models used in yoga posture recognition systems are listed below.

1) LOGISTIC REGRESSION (LR)
The LR model is mainly used for binary classification problems, though it produces binomial, multinomial, and ordinal outcomes as well. The linear regression model deals with the prediction of continuous variables, while this model predicts a categorical target. It uses the sigmoid function to predict probabilistic values between 0 and 1 in binary classification problems [179].
LR might be the most well-known discriminative method, and it supports both L1 and L2 regularization. For multiclass problems, it uses the SoftMax activation function to predict the probability of each class. The newton-cg, sag, and lbfgs solvers only support L2 regularization; the liblinear solver supports both regularizations and is the first and best choice for smaller datasets, while the 'sag' and 'saga' solvers are faster for larger datasets [180].
Agrawal et al. [103] use the multinomial setting with the newton-cg solver to predict ten yoga poses. With the maximum number of iterations for the newton-cg solver set to 1000, 1500, 2000, and 2500, it achieves average accuracies of 82.15%, 83.02%, 83.79%, and 83.16%, respectively, across all the yoga poses.
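A sketch of this setup in scikit-learn follows, mirroring the newton-cg solver and iteration budgets described above; the feature matrix and labels are random placeholders standing in for extracted pose features.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = np.random.rand(500, 24)          # placeholder joint-angle features
y = np.random.randint(0, 10, 500)    # ten asana labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for max_iter in (1000, 1500, 2000, 2500):
    clf = LogisticRegression(solver="newton-cg", max_iter=max_iter)
    clf.fit(X_tr, y_tr)
    print(max_iter, accuracy_score(y_te, clf.predict(X_te)))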
2) Naïve BAYES (NB)
NB is built on the Bayesian theorem's probability model. The characteristics of the class variables are assumed to be independent of one another in a Naïve Bayesian classifier [181]. This model maintains a probability table that is updated through the training data; to predict a new observation, the model looks up the probability table based on the feature values of the classes [179]. Agrawal et al. [103] created an NB classification model that classifies ten yogic poses at an accuracy rate of 74.75%.

3) SUPPORT VECTOR MACHINE (SVM)
SVM is a supervised learning method with a two-class classifier. To solve problems involving a greater number of categories, multiclass SVM is the best choice: a multiclass SVM generates several classifiers, distinguishing one label from the rest or discriminating between every pair of classes [182]. Support Vector Machines also handle both classification and regression problems. They classify objects in the training dataset based on examples [179], generating a hyperplane with the greatest possible separation between classes [179], [182].
SVM and kernel SVM (k-SVM) have been used in a comparison study to categorize the resting brainwave patterns affiliated with Kriya Yoga meditation sessions. The EEG signals of 10 non-meditating persons and 23 meditating persons were recorded. EEG data were collected using a 64-channel EEG device with the conventional global 10/20 electrode placement, and a 16-bit-resolution EEG gadget with a 256 Hz sample rate was used to gather the samples. During data collection, the meditating persons meditated while the non-meditators sat casually. It was found that the polynomial kernel has higher classification rates than the other three most-used kernel functions. The results of SVM and k-SVM for the two distinct groups were displayed and compared: the polynomial kernel function had an average classification rate of 90.82%, and the average accuracies of SVM and k-SVM were observed to be 85.54 percent and 90.82 percent. These findings revealed that k-SVM outperforms traditional SVM in differentiating meditating and non-meditating EEG patterns. Among the compared classifiers, k-SVM is a stronger predictor for detecting non-linearity in EEG time series; for nonlinear signals such as EEG, k-SVM is far superior to SVM [112].
Lee et al. [87] classify four yoga positions performed by kids, and the linear-kernel SVM classifier yields 91.3% accuracy. Compared to linear SVM, KNN, and Random Forest classifiers in yoga posture classification, the polynomial-kernel SVM produces better results.
Gupta and Jangid [101] developed an SVM classification model for YPR with an accuracy of 97.64% in classifying four yoga poses. The SVM and RF models both achieved excellent results, with accuracies above 95%; however, the SVM classifier surpasses traditional RF with a precision of 97.64 percent, which is 1.17 percent higher.
Agrawal et al. [103] created an SVM classification method that uses linear, polynomial, and radial kernel functions and classifies the yoga poses with accuracies of 87.91%, 93.58%, and 98.71%, respectively.
Nagalakshmi and Mukherjee [106] proposed an SVM model that uses linear and radial kernel classifiers to predict 13 yoga asanas, achieving overall accuracies of 71.5% and 59%. Linear SVM achieves the highest accuracy compared with K-Nearest Neighbors (KNN) and k-SVM.
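The kernel comparison described above can be sketched in scikit-learn as follows; the data is again a random stand-in for keypoint-derived features, and the kernel list mirrors the linear, polynomial, and radial options discussed.

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 24)          # placeholder pose features
y = np.random.randint(0, 10, 500)    # ten asana labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    # SVC handles the multiclass case internally (one-vs-one).
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_tr, y_tr)
    print(kernel, clf.score(X_te, y_te))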
4) DECISION TREE (DT)
The DT classifier is one of the most well-known data categorization algorithms. An essential feature of DT is its potential to convert complex decision-making issues into simpler processes; it constructs the classifier model as a tree [183]. Decision trees are mainly applied to solve classification and regression problems: target variables are categorical in classification, whereas they are continuous in regression. This machine learning model is widely used to predict future outcomes [179].
Agrawal et al. [103] predict the ten yoga postures with the highest accuracy of 97.71 percent when detecting the yogic position using the DT classifier.

5) RANDOM FOREST
RF is an ensemble classification technique widely used in ML classification and regression. It uses an ensembling technique that connects multiple decision trees in parallel, with different subsamples of the dataset given as input to each decision tree. It uses the majority-voting technique to collect the binary outputs in classification problems and takes the average (mean or median) of the continuous outputs in regression problems, which minimizes the loss and improves accuracy. The RF model is more accurate than the DT learning model [142].
The yoga posture recognition model using RF classifiers achieves 94.9% accuracy in classifying the four yoga postures performed by children [87]. The RF classifier correctly classified four yoga positions, tree pose, triangle stance, warrior I, and warrior II, with an accuracy of 96.47% [68], and it detects ten yogic postures with accuracies of 99.26%, 99.72%, and 99.90% using different parameters [103].

6) K-NEAREST NEIGHBORS
KNN is a non-parametric and lazy-learner algorithm. It makes no assumptions about the data and takes action only at the classification stage; it does not learn anything from the underlying data but simply stores it. This model involves expensive calculations on huge datasets [179].
Lee et al. [87] identify four yoga postures of children using KNN, and it produces the best overall average accuracy of 93.1%, which could be attributed to the short amount of video data.
Nagalakshmi and Mukherjee [106] created a KNN classifier model that uses the Euclidean distance function for all the yoga posture classifications. In their model, uniform and inverse weight functions are used for prediction: in the uniform weight function, all points in each neighborhood are equally weighted, whereas with inverse weighting, points are weighted inversely to their distance, so that closer points have more weight than farther ones. It predicts the highest accuracy of 99.01% using five neighbors, inverse distance weighting, and Euclidean distance. A KNN classifier with a k value of 6 classifies the 13 yoga asanas with an overall 71% accuracy, 71.59% precision, and 72.76% recall.

7) NEURAL NETWORKS (NN)
An NN is a set of algorithms that uses nodes to recognize relationships in the data, analogous to how neurons in the human brain function. Changes to the input data are absorbed by neural networks, which produce the best resulting outputs without being redesigned. This model relies entirely on the training data for the learning process; once it has learned from the training data, the performance of the network increases automatically. YPR uses neural networks to classify the 13 yogic postures with an overall accuracy of around 74% [106].

8) PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA is an unsupervised machine learning approach widely used for reducing the dimensions of a dataset, and it can be used for feature extraction. Principal components are a statistical technique for changing correlated variables into uncorrelated ones. This model is popular in ML and data analysis [142].
To reduce data redundancy, Wiga performs PCA on video. Wiga uses PCA to obtain real-time detection with a latency of less than 0.5 seconds over a smaller time window to reduce duplicate data, and it avoids the segmentation procedure. The redundancy-removal algorithm at each receiver antenna uses PCA to minimize the dimensions of the subcarriers; three Channel State Information (CSI) stream antennas are used to compute the initial PCA. PCA was conducted on CSI streams across a sliding window, termed short PCA (SPCA), to fulfil sequential action identification in real time. Because most everyday human activities last between 0.5 s and 3 s, the window duration for SPCA was adjusted to 0.25, 0.5, 0.75, or 1 second to pick the relevant metrics [111].
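The KNN configuration reported in [106] maps directly onto scikit-learn, as sketched below with placeholder features; weights="distance" implements the inverse-distance weighting described above.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(400, 16)          # placeholder pose features
y = np.random.randint(0, 13, 400)    # 13 asana labels

for weights in ("uniform", "distance"):
    knn = KNeighborsClassifier(n_neighbors=5, weights=weights,
                               metric="euclidean")
    knn.fit(X[:320], y[:320])        # "training" just stores the samples
    print(weights, knn.score(X[320:], y[320:]))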
9) AdaBoost
Adaptive Boosting is an ensemble method that combines machine learning models to improve performance and minimize the errors produced by the classifiers. In this technique, weights are initially distributed equally across the base classifier's training records. If it produces incorrectly classified records, the weights of all the records are updated, and the data is fed to a newly added weak classifier. It continually adds weak learners for the incorrectly classified records until the ensemble becomes a strong learner. The AdaBoost model improves accuracy and produces minimal error [82].
In [82], three datasets of Hatha yoga's six yoga poses were collected to evaluate the performance of a YPR system. The yoga trainer's pose sequence from the first session was evaluated in the first dataset; footage from the second session, conducted by the yoga trainee, was included in the second dataset; and the third dataset was made from videos of both sessions but restricted to 5 clips. After the collected frames were gathered, the system's accuracy was calculated, and the third dataset achieved the highest accuracies of more than 94.78% in all postures.
VGB classifies every recorded frame onto motion-capture patterns using one of two detection technologies: AdaBoost Trigger or RFR Progress. The AdaBoost algorithm merges the outputs of an ensemble of weak classifiers into a total, and it can create thousands of additional weak classifiers during data analysis. It chooses the relevant features used to increase the model's predictive power, so that irrelevant features need not be computed, providing dimensionality reduction and increased speed. A filter was applied to the raw frame findings, removing noise and jitter in the skeleton, and the filtering settings were examined to identify ideal values for yoga postures that minimize the false positives and negatives [85].
In [85], Kinect Studio was utilized to record six yoga trainers performing a set of five yoga asanas, mountain, forward bend, upward salute, side bend, and tree pose, which were then converted to extracted videos using KSConvert. By consensus of two yoga teachers, video clips were labelled throughout the recording wherever an entire frame showed an identifiable yoga posture. This model achieves accuracies of 95.8% and 98.4% on the yoga postures of trainees and experts, respectively. Table 7 illustrates an overview of the ML models, and Table 8 illustrates the accuracy of machine learning models in classifying yogic postures.
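A sketch of the boosting scheme just described follows, assuming scikit-learn's AdaBoostClassifier over depth-1 decision stumps; the six-class labels echo the Hatha yoga setup of [82], and the data is a placeholder. (Older scikit-learn versions name the first argument base_estimator rather than estimator.)

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(600, 20)          # placeholder skeletal features
y = np.random.randint(0, 6, 600)     # six Hatha yoga poses, as in [82]

# Each round re-weights misclassified samples and fits another weak learner.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=200, learning_rate=0.5)
ada.fit(X[:480], y[:480])
print("held-out accuracy:", ada.score(X[480:], y[480:]))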
B. DEEP LEARNING MODELS
The deep learning models most widely used in yoga posture recognition systems are convolutional neural networks, recurrent neural networks, deep neural networks, autoencoders, and hybrid models. Figure 15 shows how deep learning models work in YPR using different types of networks.

1) CONVOLUTIONAL NEURAL NETWORK (CNN)
For vision tasks, CNNs have been the most frequently utilized deep learning model. Classical ML approaches use hand-crafted features, whilst CNNs automatically learn representative features [96]. CNNs work quite well for visual recognition tasks, including picture categorization and object recognition, specifically whenever massive data is used to train the network. Detecting an incorrect yogic stance requires identifying, with the CNN, which pose is being executed from among the various yogic postures. Because providing keypoints and skeletal annotations for pose identification might not be feasible due to the higher computational requirements, a CNN was utilized to alleviate this problem: the CNN extracted 15 body keypoints from which the skeleton of a user's posture was constructed [94]. Each layer in a CNN performs a distinct role. The three important types of neural layers that make up a CNN are as follows; a minimal model combining them is sketched after the list.
• Convolutional Layer: In this first layer, the CNN convolves the entire picture into its feature maps [186], [187]. A convolutional layer is composed of 3 × 3 filters, each with its own set of parameters that must be learned. These filters' height and width are less than the input array's. Every filter is convolved with the input matrix to generate a neuron-based activation map, and the convolution layer's output volume is obtained by stacking the resulting activation maps along its depth dimension. Every neuron within the activation map is connected only to a relatively limited region of the input volume, since the height and width of every filter are designed to be much less than those of the input. The convolution layer's local connectivity lets the network learn filters that respond maximally to a local region of the input, exploiting the spatially localized correlation of inputs: each pixel in an input image is much more correlated with adjacent pixels than with distant ones. Furthermore, when the activation maps are created by convolving a filter with the inputs, all filter parameters are shared across all local positions. This weight-sharing method minimizes the number of parameters needed for successful expression, learning, and generalization [188].
• Pooling Layer: Pooling layers diminish the incoming volume's spatial extent ahead of the upcoming convolutional layer, and they have no effect on the volume's depth dimension. This layer's operation is also known as subsampling or downsampling, because the decrease in size results in a concurrent loss of data. However, this loss is advantageous to the network, as the reduction in size directly leads to a lower computational burden for the network's successive layers and tends to work against overfitting. The most popular methods are average pooling and maximum pooling; maximum pooling has been shown to lead to faster convergence, excellent feature-invariant selection, and improved generalization [186]. A pooling layer is often placed between two successive convolutional layers; through downsampling of the representations, the layer minimizes the number of elements and computations. Max pooling is widely utilized because it is much more effective [188].
• Fully Connected Layer: After the convolution and pooling layers, fully connected layers are used to perform high-level reasoning in the CNN. As the name indicates, this layer's neurons possess full connections to all activations in the preceding layer; these activations are calculated by a matrix multiplication followed by a bias offset. The two-dimensional extracted features are eventually converted to a one-dimensional feature vector by the fully connected layers. The generated vector may be classified into a specific set of categories or used as a feature vector for further analysis [186].
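To make the composition of these three layer types concrete, the following minimal Keras model (an illustrative sketch, not any surveyed system's architecture; the input size and the ten-class head are assumptions) stacks convolution, max pooling, and a fully connected SoftMax head.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # local 3x3 filters, shared weights
    layers.MaxPooling2D((2, 2)),                    # downsample, depth unchanged
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                               # 2-D maps -> 1-D feature vector
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),         # one probability per asana
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()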
The VGG network influenced the 3-dimensional Yoga Net model. VGG net is a very deep Convolutional Neural Network model that won the ImageNet Large Scale Visual Recognition Competition at categorization and localization tasks in 2014. The model architecture is the same as that of standard CNN architectures. Both the Keras and TensorFlow packages were used to create this network in Python. This CNN uses activations such as the rectified linear unit (ReLU) in the convolution layers and SoftMax in the dense layers. In three-dimensional skeletal-based human action classification tasks, combining three-dimensional movement sensory data with two-dimensional color-coded JDMs as input to this deep network proved beneficial. However, the joint distances were restricted in their potential to reflect rotated joint motions, which contribute a significant amount of data in human action categorization tasks. Incorporating the extra feature of joint orientation alongside the joint-distance characteristics reduces the recurrence of false positives in this approach; as a result, JADMs were employed rather than JDMs. In multiple trials, JADMs outperformed previous techniques utilizing JDMs and CNNs in terms of training time and accuracy. Over four weeks, a nine-camera motion capture system was used to capture 42 yoga positions with ten subjects at ten different positions. This model can track the transition of yogic movements over time and could serve as a baseline for a self-assessment yogic approach [97].
A CNN achieves a yogic posture classification accuracy of 95% in training [94]. In [83], the proposed CNN model achieved 90% average accuracy over all yoga postures; in this model, the CNN achieved 98% training, 88% validation, and 91% testing accuracy. Another approach is to employ a VGG-16 pre-trained network with a transfer learning algorithm. This approach achieves an accuracy of 98.44% in training, 79.3% in validation, and 72% in testing; the results indicate that the VGG16 model's accuracies are marginally lower than those of the CNN models [100].
Some popular CNN architectures, such as AlexNet, VGG16, and ResNet18, are used in [106]. The AlexNet model was built with eight layers: the first five layers are convolutional layers, followed by two fully connected layers and one SoftMax layer. The eight layers of AlexNet, 16 layers of VGG, and 18 layers of ResNet were trained and tested with 13 classes of yoga datasets comprising 2129 images. In this approach, the deep learning classifiers other than AlexNet perform worse than other models: the yoga postures are classified with an accuracy of 30% by VGG16, 60% by ResNet18, and 83.05% by AlexNet [106].
For vision-based applications, including facial recognition, object identification and recognition, posture detection, robotics, and autonomous vehicles, CNNs have already proven to be incredibly effective [186].
1) Three-Dimensional CNN (3D-CNN) Model
A three-dimensional CNN model architecture has been developed to identify yoga postures quickly. This model utilizes the three-dimensional convolutional DL framework to recognize yoga poses based on their underlying spatial-temporal relationship. Cell-phone cameras with 4K resolution at a 30-fps frame rate were used for recording all videos, and the sequences are fed into the proposed three-dimensional CNN. The 3D convolution layers of the stated 3D-CNN model extract discriminative features from the video clips; the derived features are then fed into the SoftMax layer to predict the specified yoga pose from the ten yogic postures [95].
It consists of repeatedly iterated three-dimensional convolution, maximum pooling, average pooling, and fully connected layers with SoftMax activations. First, the convolution layers in the upper and lower phases of the network learn to retrieve high- and low-level discriminating data from yogic pose videos for categorization, while the pooling layers minimize the spectral dimensionality of the feature maps derived from the convolution layers. The average pooling layer aggregates the combined output features retrieved from the activity video into one-dimensional input vectors, which are then categorized into one of the ten yoga positions by the SoftMax layer [95].
The network's convolution layers use the ReLU function to perform nonlinear feature transformation; with no computational overhead, this activation function generates monotone gradients that are quicker and simpler to compute. Moreover, batch normalization is used at every stage of the constructed networks to increase speed, accuracy, and stability. Additionally, a dropout layer with a probability of 0.5 was used, forcing the network to identify highly resilient characteristics by combining several randomly chosen subsets of the other neurons. Dropout is perhaps the fastest way of aggregating models in a neural network by eliminating random elements [95].
The network is set to automatically pick the best filters that reflect the key features for recognizing yoga poses. They collected 261 videos running for 37 minutes and 53 seconds, using 27 yoga experts performing ten yoga postures. Among these 261 videos, the first 241 were chosen for training and validation, and the last 20 were utilized for testing. The in-house-generated dataset of ten yoga postures obtained a test recognition rate of 91.15 percent; moreover, the built architecture obtained a comparable test recognition rate of 99.39 percent on publicly available data. This model computes efficiently and works even at a frame-processing rate of 20 frames per second, so it could be integrated easily into resource-constrained embedded systems in real-world scenarios [95].

2) Deep Convolutional Neural Network (DCNN)
In this DCNN model, the feature extraction section comprises three convolution stages, each followed by a max-pooling layer with a kernel size of 2 × 2 and a stride of two. The three convolution layers consist of 32, 128, and 256 kernels of size 3 × 3 with a stride of one. Because this model uses three max-pooling layers, the feature extraction stage produces 256 extracted features that are three times smaller than the original image, and these features are then sent to the FC layers for classification. The first three layers of the classification stage include 128, 64, and 32 units of neurons with the ReLU activation function; the final stage, on the other hand, includes 26 neurons, one for every output, and utilizes a SoftMax layer. The DCNN classifiers are evaluated over 100 iterations using a ten-fold cross-validation approach. This DCNN model for YPR achieves 99.889% average precision, 99.889% recall, 99.996% specificity, 99.889% F1-measure, and 99.991% accuracy over all yoga poses, and the system classifies a single yogic posture, and all the yogic postures, with an average latency of 110 milliseconds on the DL server [92].
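A hedged Keras sketch of such a 3-D convolutional front end follows; the clip shape, layer counts, and ten-class output are illustrative, while batch normalization and the 0.5 dropout rate mirror the description of [95].

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(16, 112, 112, 3)),          # 16-frame RGB clip (assumed)
    layers.Conv3D(32, (3, 3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling3D((1, 2, 2)),                 # pool spatially, keep time
    layers.Conv3D(64, (3, 3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.GlobalAveragePooling3D(),                # aggregate to a 1-D vector
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),         # ten yoga poses
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")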
2) RECURRENT NEURAL NETWORK (RNN)
RNNs are neural-network architectures used to resolve problems involving sequence prediction. Sequence prediction issues can be of one-to-many, many-to-one, or many-to-many type. The prior data of a neuron is kept in RNNs, which aids in processing sequential data; therefore, the context is kept, and output is given while taking earlier learned facts into account. RNNs have frequently been used to solve problems in natural language processing, where the inputs are typically modelled as sequences. Likewise, there is an interdependence between the past action and the following act in action detection or posture categorization tasks. When identifying the final posture in yoga, the context, that is, the understanding of the opening and intermediate poses, is equally crucial; as a result, yoga can be conceived of as a series of stances. RNNs are an excellent option for yoga posture categorization, since the sequential assessment of joint positions can better reflect the interdependence among joint locations [182].
However, the major issue with RNNs is that they cannot protect against long-term dependency. The current data can be suitable for performing the most recent task; however, in some cases, the distance between the current data and its task becomes too large, and RNNs fail in these circumstances, unable to integrate the relevant data. During yoga, when the intermediate movements of yoga postures become too lengthy, RNNs fail to preserve the record of the initial phases required to estimate the portrayed action; this is termed the long-term dependency issue of RNNs [182].
• Long Short-Term Memory (LSTM): The specific RNN type termed LSTM addresses the long-term dependency issue. An LSTM is a well-known RNN that, by default, can easily remember knowledge or information for sufficiently long periods. The essence of the concept is the cell state it maintains: a cell state enables a continuous flow of information and can be compared to a conveyor belt. LSTMs use regulatory structures called gates to add and remove information from the cell state; these gates permit information to pass through when required. LSTM employs three gates, input, update, and forget, so an LSTM may select, forget, or remember what it learns. Because LSTMs allow extended storage of the network's input history, they can handle long sequences effectively and successfully [182].

3) DEEP NEURAL NETWORK (DNN)
With YogaHelp, beginners can practice yoga correctly on their own. This deep learning model uses a convolution layer, which does not require precisely extracted information, to identify the yoga postures. This recognition model is built utilizing DNNs, chosen because of their capacity to autonomously retrieve the most relevant features of raw information. Every dataset is passed through a series of interconnected convolution layers, then flattened and fully connected layers. After that, the SoftMax function, which assigns a set of predetermined probabilities to each Sun Salutation posture, is used to integrate the results of the fully connected layers and send them through another FC layer. Eight trainers were selected to perform 20 rounds of sun salutation sequences over 30 days, yielding 4800 videos and 57120 instances. Within four weeks, the YogaHelp system improved the trainers' yogic posture accuracy from 93.7 percent to 98.7 percent, and the trainees' from 88.7 percent to 98.3 percent [91].

4) AUTO ENCODER (AE)
This model utilizes a pose recognition system using CNN and a stacked autoencoder. An autoencoder is another unsupervised measure to reduce dimensions: an autoencoder performs encoding, after which the inputs can be classified by decoding with an NN. Image inputs are supplied into the hidden units of a Stacked Auto Encoder (SAE) to extract the features, which are subsequently fed into the output units of the SAE to rebuild the input image. The final layers are used as inputs for the classifier. They train an NN as a classifier that maps the extracted features to the output labels. After the SAE, there is a neural network containing 784 input units, 100 hidden units, and output units equal to the total number of classes [83]. They collected two datasets of 12 karanas and 14 karanas, with 864 and 1260 images from YouTube, and a third dataset containing 400 images of 8 yoga postures, achieving accuracies of 86.11%, 97.22%, and 70% on those three datasets [83].

5) HYBRID MODEL
In recent times, the CNN-LSTM hybrid has been used for sentiment analysis, text categorization, cardiac prognosis, face anti-spoofing, and skeleton-based pose estimation. A combined deep learning approach utilizing CNN and LSTM was implemented for yoga posture recognition on real-time clips: the CNN layer was utilized for extracting features from the keypoints of every frame acquired through OpenPose, and the LSTM layer delivers the temporal posture estimation [96].
A hybrid of CNN and LSTM is employed in this deep learning model. CNNs are often used for pattern recognition issues, while LSTMs are used for time-series applications. A time-distributed CNN network was employed to extract features from the two-dimensional coordinated keypoints generated in the preceding stage. The SoftMax estimates the probability of every asana in a frame, while an LSTM model examines the changes in those features over the frames. The predicted threshold value was utilized to identify frames in which the user is not practicing yoga, and the impact of polling over frames was explored. This model predicts the yoga postures in every video with 99.04% frame-wise accuracy and 99.38% accuracy on a poll of 45 frames, and it attains an accuracy of 98.92% in real time for a group of 12 diverse persons, demonstrating its capability to recognize six yoga poses effectively [96].
Wiga introduces a deep learning model that combines CNN and LSTM to retrieve high-level characteristics. It uses fine-grained CSI as a source to construct a deep learning model that maps motion-induced signal changes onto activity sequences. Beginning with the measured CSI inputs, Wiga filters out the undesirable signals and their redundant components; after that, it abstracts the deep features using a CNN and models the source's temporal dependencies with an LSTM. This model was evaluated with 17 yogic postures performed by seven practitioners, and Wiga attains 97.7% and 85.6% accurate yogic posture results for trained and untrained yoga participants, respectively [111].
Table 9 specifies the achieved accuracy of existing yoga posture recognition systems using deep learning models for classification.
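A minimal Keras sketch of this hybrid arrangement follows; the 45-frame clips of 25 keypoint triples and the six-class head mirror the figures above, while the per-frame dense feature extractor is a simplified stand-in for the time-distributed CNN.

from tensorflow.keras import layers, models

# 45-frame clips of 25 OpenPose joints, (x, y, confidence) each (assumed shape).
model = models.Sequential([
    layers.Input(shape=(45, 25, 3)),
    layers.TimeDistributed(layers.Flatten()),                      # per-frame keypoint vector
    layers.TimeDistributed(layers.Dense(64, activation="relu")),   # per-frame features
    layers.LSTM(64),                                               # temporal dependencies
    layers.Dense(6, activation="softmax"),                         # six yoga poses
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])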
C. METRICS FOR PERFORMANCE ANALYSIS
The following metrics are utilized for evaluating the performance of the ML and DL classification models.
1) Confusion Matrix: An N × N matrix used for comparing the actual values with the predicted ones, where N is the number of target classes. It measures the classification performance of ML methods and is also referred to as the error matrix, a specific table layout used to visualize and summarize the performance of a classification model. It has four specific outcomes used to define the evaluation metrics of the classifier:
• True Positive (TP): The model correctly predicted the specific type of asana.
• False Positive (FP): The model incorrectly predicted the specific type of asana (a Type-I error).
• True Negative (TN): The model correctly rejected the specific type of asana.
• False Negative (FN): The model incorrectly rejected the specific type of asana (a Type-II error).
2) Accuracy: The ratio of correctly classified asanas out of all the asanas; accuracy indicates how the model performs across all asanas.
Accuracy = (TP + TN) / (TP + TN + FP + FN) (1)
3) Overall Accuracy (OA): The total ratio of accurately classified asanas across all the asanas [105].
OA = (TP1 + TP2 + TP3 + ... + TPn) / (Total Testing Set) (2)
4) Precision: The ratio of the number of asanas correctly classified as the specific type to all asanas classified, correctly or incorrectly, as that specific type.
Precision = TP / (TP + FP) (3)
5) Recall: The ratio of correctly identified yoga poses to the overall number of yoga poses in that group.
Recall = TP / (TP + FN) (4)
6) F1-score: The harmonic mean of precision and recall. It attempts to find the balance between precision and recall and specifies how precise and robust the classification model is.
F1-score = 2 × (Precision × Recall) / (Precision + Recall) (5)
7) Activity Error Rate (AER): To maintain consistency with the standard recognized sequence of activities, some activities must be altered (R), removed (D), or inserted (I). AER is the overall percentage of all altered, removed, or inserted activities divided by the length N of the standard sequence of actions [111].
AER = (R + D + I) / N (6)
8) Matthews Correlation Coefficient (MCC): A correlation coefficient between the actual and predicted classifications that takes all four values of the confusion matrix into account. The value returned by MCC lies between +1 and −1:
• +1 means perfect prediction and best agreement between actual and predicted values.
• 0 means random prediction, with no agreement or relationship between the values.
• −1 means wrong prediction and total disagreement between the values [105].
MCC = [(TP × TN) − (FP × FN)] / sqrt[(TP + FN)(FP + TN)(TP + FP)(TN + FN)] (7)
9) Overall Correctness Score (OCS): The YogaHelp system [91] assesses the entire sun salutation sequence. If any sun salutation step is missed during the performance, it assigns an OCS of zero; otherwise, it calculates the OCS as
OCS = (0.5 × a + 0.5 × b) / 12 (8)
where a is the number of steps with correct speed and b is the number of steps with acceptable deviation.
10) Support: The number of actual class occurrences in the dataset.
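Most of these metrics are one-liners in scikit-learn, as the following sketch with placeholder labels shows; macro averaging treats every asana class equally.

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score,
                             matthews_corrcoef)

y_true = [0, 0, 1, 1, 2, 2, 2, 1]   # placeholder asana labels
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

print(confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
print("MCC      :", matthews_corrcoef(y_true, y_pred))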
Table 10 illustrates the performance evaluation metrics used for the yoga posture recognition systems.
This subsection has discussed the machine learning algorithms frequently used in YPR and explained how each differs depending on the application and settings; an algorithm must be chosen based on its performance and the type of data. It offers a comprehensive breakdown of the ML and DL models utilized in YPR so that readers can decide what course of action to take in light of the availability of new ML methods.
TABLE 11. Detailed analysis of computer vision methods used in YPR.
Table 11 shows the detailed analysis of distinct computer vision methodologies utilized in YPR.

V. PREDICTION APPROACHES
Yoga poses are detected by YPR systems through two approaches, using vision- and sensor-based methodologies:
1) Yoga Posture Classification
2) Yoga Posture Grading

A. YOGA POSTURE CLASSIFICATION
Yoga has recently witnessed extraordinary global popularity and is a great way to exercise at home. Sun Salutation is a form of yoga that involves strengthening practically every region of the body and incorporates a series of 12 linked stages [91]. Deep learning has recently achieved incredible performance in tackling yogic posture classification due to its remarkable feature-learning capability [108]. With the proliferation of motion sensors, it is now possible to collect motion data and monitor the performance of yoga postures; the unique YogaHelp system uses motion data to recognize the various movements and estimate how well they were performed by the practitioner [91].
In [88], a computer-vision-assisted self-training yoga system was implemented. This system categorizes yoga positions according to skeleton- and contour-based analysis from front and side perspectives. Five practitioners were selected to perform a total of 12 yoga poses, five times each, and the system achieved maximum accuracies of 99.87% and 99.15% in the front and back views.
Yogic postures are classified by several YPR system models [81], [82], [83], [84], [85], [86], [87], [88], [91], [92], [93], [95], [96], [97], [98], [100], [101], [102], [103], [104], [105], [106], [109], [110], [111], [112], [113], [114], [115], [138] based on recent developments in vision, wearable, and RF-based sensor technologies utilizing ML and DL learning techniques.
TABLE 12. Summary of yoga pose classification and grading methodologies in recent years.

B. YOGA POSTURE GRADING
Unlike yoga pose categorization, which seeks to deduce the yoga posture class label, yogic posture grading (YPG) attempts to measure an individual's yogic acts quantitatively. Although there are numerous studies on yogic posture categorization, there are very few on yogic posture grading [108].
The reference posture is the desired yogic posture the client attempts to perform. The targeted posture and the posture derived from a keypoint prediction model are compared in order to validate the consensus among various angles and joints; this resemblance determines whether the participant's posture was correct. Examining the angles between a participant's joints and then confirming that those angles lie within the level of tolerance for performing a yogic posture, based on yoga expertise, is one technique to identify anomalies [94].
In [24], a computer-vision-assisted autonomous yoga learning system was developed. In this interactive learning system, the player's gesture is compared with the standard yoga posture, and a grade is computed using distance transition and pattern matching. Six people were chosen to perform every posture three times in the trials, and the system score was assigned between 0 and 100. For approximately 86 percent of cases, the difference between the scores given by the computer and by the yoga instructor lay within −2.5 to 2.5.
The effectiveness of a user's yogic posture is assessed using a self-practice YPR with an OpenPose-based Workout & Overall Coaching Consultant model, and a statistically based grading approach is suggested. The algorithm generates the overall score of the selected keypoints by adding weights to the total score. Initially, the poses of 50 practitioners resembling a yoga instructor's pose were recorded using subjective evaluation, and the two angle variations of each chosen keypoint were computed. Second, the angle-difference values were normalized, and the mean score and standard deviation were computed using the features of the distribution. Finally, the intervals of postural assessment for the chosen keypoints are determined by adjusting the threshold based on angle changes. Some postures concentrate on the hands, while others concentrate on the feet, and every competent yoga teacher is familiar with the prerequisites for each yogic position. With appropriate weighting applied, the major vital keypoints would receive a weight of 80, while the other keypoints would receive 20 [99].
The yoga teacher visual analysis system was used to compare and assess the performed yogic video sequences of beginners and professionals using the speeded-up robust features algorithm. It enables users to adjust their yoga postures without the assistance of yoga professionals [107].
Selecting two yoga posture photos, of the student and of the teacher, retrieving the human skeleton keypoints, and entering those into a posture characteristic encoder are the usual steps for grading yoga postures; finally, to generate the posture grade, the overall feature similarity between them is computed [24], [94], [99], [107], [108]. Table 12 represents a detailed overview of these approaches used in the recent ML and DL models.
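The angle-tolerance grading idea can be sketched as below; the joint names, reference angles, and the 10-degree tolerance are illustrative assumptions rather than any surveyed system's thresholds.

# Compare a student's joint angles against a reference (teacher) pose and
# score the share of graded joints that fall within tolerance.
REFERENCE = {"left_knee": 178.0, "right_knee": 95.0, "left_elbow": 175.0}
TOLERANCE_DEG = 10.0

def grade_pose(student_angles, reference=REFERENCE, tol=TOLERANCE_DEG):
    """Return a 0-100 score: percentage of graded joints within tolerance."""
    graded = [abs(student_angles[j] - ref) <= tol for j, ref in reference.items()]
    return 100.0 * sum(graded) / len(graded)

print(grade_pose({"left_knee": 170.0, "right_knee": 99.0, "left_elbow": 150.0}))
# approx. 66.7: two of the three joints fall inside the tolerance band

A weighted variant, in the spirit of [99], would multiply each joint's pass/fail result by a per-joint importance weight before summing.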
assessment developments using sensing devices, machine improvements in this pandemic period. Moreover, design a
learning, and deep learning methodologies. Initially, the novel approach to predict the amputees’ yogic postures in
feature of vision-based and sensor-based gadgets has been images.
provided. Furthermore, a thorough analysis of keypoint esti-
mation strategies used in recent research is presented, high- ACKNOWLEDGMENT
lighting the key merits and shortcomings of each strategy. The authors would like to thank the editors and reviewers.
The study also includes information about the publication, They would also like to thank Dr. S. V. Kota Reddy, Vice
used kinetic models, and the sensor’s location attached to Chancellor and Dr. Jagadish Chandra Mudiganti, Registrar,
the human body. The widely used machine learning and VIT-AP University for their great support.
deep learning algorithms are then suggested, along with
their accuracy. Finally, it specifies the recent research on REFERENCES
classification and grading approaches of yogic posture. The [1] S. Madan, J. Sembhi, N. Khurana, K. Makkar, and P. Byati, ‘‘Yoga for
discovery of advanced textiles with multi-sensing abilities preventive health: A holistic approach,’’ Amer. J. Lifestyle Med., pp. 1–6,
and the development of sensing devices for smart healthcare Jan. 2022.
[2] H. Cramer, L. Ward, A. Steel, R. Lauche, G. Dobos, and Y. Zhang,
open new directions for further research. Yoga, pranayama, ‘‘Prevalence, patterns, and predictors of yoga use: Results of a US nation-
and other forms of exercise can address health advantages, ally representative survey,’’ Amer. J. Preventive Med., vol. 50, no. 2,
which can be a breakthrough in healthcare applications and pp. 230–235, 2016.
[3] R. La Forge, ‘‘Mind-body fitness: Encouraging prospects for primary
open the door for healthcare sectors to extend their emphasis and secondary prevention,’’ J. Cardiovascular Nursing, vol. 11, no. 3,
in the face of the COVID-19 pandemic. It raises awareness pp. 53–65, Apr. 1997.
about using E-textiles, smart textiles, and other vital human [4] R. Govindaraj, S. Karmani, S. Varambally, and B. Gangadhar, ‘‘Yoga
and physical exercise—A review and comparison,’’ Int. Rev. Psychiatry,
monitoring devices. The use of such wearables to monitor vol. 28, no. 3, pp. 242–253, 2016.
bodily vitals has rapidly increased during this pandemic; it [5] Acharya. (2006). Yogic Asanas and Their Classification. [Online].
creates self-awareness in every individual’s health care across Available: http://www.ayurvedictalk.com/yogic-asanas-and-their-
the globe and predicts health issues. The existing approaches classification/187/
[6] AYMYogaSchool. (2022). Introduction of Asana—Yoga in India.
only predict the accurate pose when all the body parts are [Online]. Available: https://www.indianyogaassociation.com/blog/
visible. During the practice of complex yogic postures, some asana.html
body parts overlap with others; these parts are obscured from [7] A. Pizer. (2021). Yoga Poses: An Introduction to Asana Practice. [Online].
Available: https://www.verywellfit.com/poses-yoga-4157147
view or difficult to detect. The current approaches typically [8] R. Jayawardena, P. Ranasinghe, H. Ranawaka, N. Gamage,
do not label or predict concealed body parts. Most large-scale D. Dissanayake, and A. Misra, ‘‘Exploring the therapeutic benefits
pose estimate datasets comprise typical poses used daily, of pranayama (yogic breathing): A systematic review,’’ Int. J. Yoga,
vol. 13, no. 2, p. 99, 2020.
making it difficult for the existing models to handle such com- [9] B. S. Mody, ‘‘Acute effects of Surya Namaskar on the cardiovascular
plicated poses successfully. Existing yoga posture recogni- & metabolic system,’’ J. Bodywork Movement Therapies, vol. 15, no. 3,
tion systems couldn’t produce the expected results in complex pp. 343–347, 2011.
yogic posture recognition. Furthermore, the investigation of [10] A. Harris, M. Austin, T. M. Blake, and M. L. Bird, ‘‘Perceived ben-
efits and barriers to yoga participation after stroke: A focus group
three-dimensional body posture recognition might be helpful approach,’’ Complementary Therapies Clin. Pract., vol. 34, pp. 153–156,
in this, and it could address new challenges in the future. Feb. 2019.
A novel approach that detects multi-person yogic postures in a single frame during practice could be a useful extension of YPR. Many factors, including background, lighting, and overlapping figures, would make multi-person posture prediction even more difficult. Future investigation should focus on enhancing such systems to detect a broad range of yogic postures performed by multiple persons and on generalizing the methods for use in the real world.
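One common way to approach the multi-person case is a top-down pipeline: detect each person first, then run a single-person estimator on every crop. The sketch below is a minimal example of that idea; the HOG people detector is a deliberately simple stand-in for a stronger detector, and MediaPipe Pose is assumed as the single-person estimator.

```python
# Top-down multi-person sketch: person detection, then one
# single-person pose estimate per cropped detection.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def multi_person_poses(image_bgr):
    """Return one landmark set per detected person (possibly empty)."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    boxes, _ = hog.detectMultiScale(image_bgr, winStride=(8, 8))

    poses = []
    with mp_pose.Pose(static_image_mode=True) as pose:
        for (x, y, w, h) in boxes:
            crop = image_bgr[y:y + h, x:x + w]
            result = pose.process(cv2.cvtColor(crop, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks is not None:
                poses.append(result.pose_landmarks)
    return poses
```

Bottom-up methods such as OpenPose, which estimate all keypoints jointly before grouping them into individuals, are an alternative that can cope better when practitioners overlap.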
Robust models that work across multiple camera angles would also be essential in applications such as an automated yoga trainer; this requires training the models on a large number of yogic postures captured from a variety of camera angles.
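When genuinely multi-view recordings are scarce, geometric augmentation is one inexpensive, if partial, way to approximate viewpoint variety at training time. The sketch below shows such a pipeline with torchvision; all parameter values are arbitrary illustrative choices.

```python
# Illustrative training-time augmentation that loosely approximates
# varied camera viewpoints; parameter values are arbitrary choices.
import torchvision.transforms as T

viewpoint_augmentation = T.Compose([
    T.RandomPerspective(distortion_scale=0.4, p=0.7),  # camera tilt
    T.RandomRotation(degrees=15),                      # camera roll
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),        # distance / zoom
    T.ColorJitter(brightness=0.3, contrast=0.3),       # lighting shifts
    T.ToTensor(),
])
```

Augmentation cannot replace true multi-view capture, so collecting yoga datasets from several synchronized cameras remains a worthwhile direction.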
New inventions that pair the right ML methods with wearables will benefit society as a whole and conveniently improve people's quality of life. An innovative integrated yoga mat, combined with a smartphone and optional wearables such as a smartwatch or smart band, is a powerful combination for creating a personalized yoga journey in the future; an efficient mobile application for yoga self-assessment and gamified health tools could complete such a system.
ARUN KUMAR RAJENDRAN received the B.E. degree in computer science and engineering and the M.E. degree in computer and communication engineering from Anna University, in 2006 and 2008, respectively. He is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, Vellore Institute of Technology-Andhra Pradesh (VIT-AP) University. His Ph.D. research focuses on developing a computer vision model for yogic posture correction and classification to assist users in better understanding the health benefits of yoga. His research interests include computer vision, machine learning, and deep learning.

SIBI CHAKKARAVARTHY SETHURAMAN (Member, IEEE) received the Ph.D. degree from Anna University, in 2018. He is currently working as an Associate Professor with the School of Computer Science and Engineering, Vellore Institute of Technology-Andhra Pradesh (VIT-AP) University. He is the Coordinator of the Artificial Intelligence and Robotics (AIR) Research Center, VIT-AP University. He is the Lead Engineer for the Project ‘‘VISU,’’ an advanced 3D-printed humanoid robot developed by VIT-AP. He is an active contributor to the open source community and a lead writer in top security magazines, such as Pentestmag and eForensics. He was a recipient of the DST Fellowship. He is an active Reviewer of many reputed journals, including IEEE, Springer, IET, IGI Global, and Hindawi.