Icses 24 T3 1047

An Advanced Autoencoder-Based Approach to Anomaly Detection for Video Surveillance Systems
1st Abhiram G
Department of Electronics and Communication Engineering
SRM Institute of Science and Technology, Kattankulathur
Chennai, India
[email protected]

2nd Madineni Nitheesha
Department of Electronics and Communication Engineering
SRM Institute of Science and Technology, Kattankulathur
Chennai, India
[email protected]

3rd Ambati Samatha
Department of Electronics and Communication Engineering
SRM Institute of Science and Technology, Kattankulathur
Chennai, India
[email protected]

4th S. Giriprasad
Department of Electronics and Communication Engineering
SRM Institute of Science and Technology, Kattankulathur
Chennai, India
[email protected]

Abstract— This paper presents a state-of-the-art machine learning approach to anomaly detection in video surveillance based on autoencoders. Video surveillance helps ensure safety and security in places such as public spaces, office buildings, and homes. However, when many cameras are present, monitoring the video feeds is demanding and prone to human error and inefficiency, especially in large environments. The key capability needed to automate this process is the identification of unusual events or behavior. The proposed system relies on an autoencoder neural network that learns patterns of normal behavior in video streams. The UCSD dataset, a diverse collection of surveillance videos, was used to train and test the system. The video data is preprocessed by frame extraction, resizing, and normalization before being fed into the model. The model consists of two parts: an encoder that compresses the input data and a decoder that reconstructs the input from the compressed representation. The reconstruction error is then calculated by comparing the original and reconstructed frames, and frames with high reconstruction error are flagged as containing anomalies. The system shows a high level of accuracy, measured in terms of precision and recall. In addition to real-time anomaly detection, the system gives the user visual feedback about which frames are anomalous and can be inspected further. The experimental results demonstrate that the model is robust across large and complex environments. The proposed system offers a reliable, highly scalable, and flexible solution for smart surveillance, reducing the need for constant human supervision and improving the speed of response to security threats. Future work will focus on enhancing the system so that it handles subtler anomalies better and generalizes to larger datasets for eventual deployment.

Keywords— Video Surveillance, Anomaly Detection, Autoencoder, Machine Learning, Smart Surveillance

I. INTRODUCTION
Video surveillance is the use of cameras installed over public or private areas. These cameras capture video that is either archived or watched in real time by security personnel [1]. When many cameras are in use, too many human observers are needed to watch the feeds, and incidents are eventually missed because people become fatigued over time. Anomaly detection plays a critical role in this process because it detects unusual events or behaviours that do not normally appear, such as unauthorized access, suspicious behaviour, or sudden and unexpected crowd formation.

Machine learning (ML) is now an important tool for automating anomaly detection in video surveillance [2]. It allows systems to "learn" from large datasets, identifying patterns of normal behaviour and detecting deviations without manual programming. One effective ML technique is the autoencoder, a neural network designed to learn the essential features of normal data. When an anomaly appears, for example a person entering a restricted area, the autoencoder fails to reconstruct the event properly, and the frame is flagged as anomalous. This paper focuses on using autoencoders for anomaly detection in smart surveillance systems, in particular on video data from the UCSD dataset.

Traditional surveillance systems rely on human operators to monitor feeds, or employ predefined rules for anomaly detection. They are inherently limited by their inability to react to novel and unexpected patterns of anomalous behaviour. Machine learning-based solutions remove these constraints by learning automatically from data. Techniques such as convolutional neural networks (CNNs) and autoencoders have been widely used in image and video processing [3]. Earlier work reports that autoencoders outperform other models and model variants for video anomaly detection because they focus on reconstruction error, which may suit real-time scenarios.

II. RELATED WORK
Recent advances in video anomaly detection have led to two-stage self-training approaches that generate high-confidence pseudo-labels over video snippets, thereby recasting weakly supervised anomaly detection as a supervised learning problem with noisy labels. In this regard, Althubiti et al. [4] optimized their model using an LSTM network, focusing on hyperparameters such as the number of hidden layers and the dropout rate. Their approach iteratively refines the anomaly classifier and shows that recurrent architectures are a good fit for the underlying temporal dynamics.

In parallel, efforts have been made to improve the quality of the training process using multiple instance learning (MIL). For instance, Kwon et al. [5] used a graph convolution network to refine pseudo-labels, iteratively improving the anomaly classifier. This method improves detection performance and addresses essential hyperparameters such as learning rates and graph parameters. Feng et al. developed a multi-instance pseudo-label generation technique to fine-tune feature encoders and create task-specific discriminative features. In their work, optimizing the encoder dimensions and the learning schedule led to more robust model output.

Ganesh et al. [6] introduced Multi-Sequence Learning (MSL), which adaptively optimizes reduced sample lengths to sharpen localization boundaries. Their approach shows the importance of tuning hyperparameters such as sequence length and sampling rate for anomaly localization accuracy. Meanwhile, Dhole et al. [7] proposed a convolutional spatiotemporal autoencoder for feature extraction in video sequences. Their changes to the convolutional filter size and their choice of pooling strategy improved temporal feature extraction, which is essential for anomaly detection.

In the multimodal domain, Babanne et al. proposed integrating visual and audio cues to strengthen video anomaly detection frameworks. Their method required careful tuning of the parameters related to feature fusion, demonstrating that aligning different modalities can enhance a system's ability to identify complex anomalous events. Wu et al. [8] developed this foundation further with HL-Net, which combines appearance, motion, and audio into a more thorough multimodal approach and brings out the need to optimize hyperparameters to reach higher accuracy across different datasets.

As the field advances, a recurring concern is that, although diverse methods have thoroughly explored the modelling of temporal relations, most of them depend on parallel branches that introduce more parameters and thus increase computational cost. This is seen in [9], where Vinayakumar et al. proposed a model that combines CNNs with LSTMs to better capture temporal dynamics. Their hyperparameters, the number of convolutional layers and the sequence length, highlight the appeal of fine-tuned models that are both more accurate and less computationally expensive.

Recent research has also pivoted towards refining anomaly detection methodologies through new architectures and frameworks. The work by Ergen and Kozat [10] emphasizes unsupervised learning methods with LSTM neural networks, tuning parameters to improve temporal feature learning. This aligns with the objectives of Zhang et al., who introduced a multi-head module that generates varied pseudo-labels and employs an iterative uncertainty-based training strategy to prioritize clips with lower uncertainty.

III. PROPOSED METHODOLOGY
The methodology involves several stages, as represented in the block diagram (Fig. 1). The process can be broken down into the following steps, from the input video dataset to the final anomaly detection output.

Fig. 1: Block diagram

A. Input Video Dataset (UCSD Dataset)
The input to the system consists of videos from the UCSD dataset. This dataset includes surveillance videos recorded in various environments, containing both normal and anomalous behaviours.

TABLE 1: Details of UCSD Pedestrian Dataset (Ped1 and Ped2)
Dataset      No. of Videos   No. of Videos   Resolution   Anomalous Events
             (Training)      (Testing)
UCSD Ped1    34              36              238 × 158    People walking outside designated paths, bikes, cars, etc.
UCSD Ped2    16              12              360 × 240    Similar anomalies, such as non-pedestrian objects on walkways.

This table summarizes the UCSD Ped1 and Ped2 datasets, detailing the number of training and testing videos, resolutions, and types of anomalous events. Anomalies include people walking outside designated paths, vehicles, and non-pedestrian objects on walkways.

B. Convert Videos to Frames (Frame Extraction)
The videos are split into individual frames. Each frame is treated as a separate image for further processing [11].

Reason: Video anomaly detection works at the frame level, as processing each frame individually allows the model to detect sudden irregularities.

TABLE 2: AUC Comparison Across Datasets for Different Models
Dataset         Our Model (AUC)   STG-NF (AUC)   Jigsaw (AUC)
UCSD Ped2       99.7%             93.07%         98.88%
CUHK Avenue     92.8%             60.90%         91.41%
ShanghaiTech    87.72%            85.93%         84.26%
UBnormal        69.88%            71.78%         55.57%

This table compares the AUC performance of three models (the proposed model, STG-NF, and Jigsaw) across four datasets: UCSD Ped2, CUHK Avenue, ShanghaiTech, and UBnormal. The proposed model achieves the highest AUC on three of the four datasets, notably 99.7% on UCSD Ped2 and 92.8% on CUHK Avenue; on UBnormal, STG-NF scores higher.

C. Preprocessing
Resizing: The extracted frames are resized to a standard dimension (e.g., 128x128 pixels), which ensures that all frames are uniform in size.

Normalization: Pixel values are normalized to a range between 0 and 1 to speed up convergence during model training.

x_{\text{normalise}} = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (1)

where x is the original pixel value.

Data Augmentation: Various transformations such as flipping, rotating, and cropping are applied to the frames to create variations in the training data. This helps the model generalize better [12].

D. Feature Extraction
Convolutional Neural Networks (CNN): CNNs are applied to the frames to extract spatial features. The layers of a CNN perform convolution operations to detect patterns such as edges, corners, and textures in the images.

Layers:
• Convolution Layer: Detects low-level features using filters.
• Activation Function (ReLU): Introduces non-linearity into the model.
• Pooling Layer: Reduces the spatial dimensions of the feature maps.
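The three layer types listed above can be illustrated with a minimal NumPy sketch. It is not the network used in this work: the function names, the toy 6x6 frame, and the hand-written edge filter are assumptions chosen only to show how a filter response, a ReLU, and a pooling step fit together.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding:
    out[p, q] = sum_a sum_b kernel[a, b] * image[p + a, q + b]."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for p in range(out_h):
        for q in range(out_w):
            out[p, q] = np.sum(kernel * image[p:p + kh, q:q + kw])
    return out

def relu(x):
    """Element-wise ReLU: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy 6x6 frame with a vertical edge between columns 2 and 3.
frame = np.zeros((6, 6))
frame[:, 3:] = 1.0

# Hypothetical hand-written edge filter; a trained CNN would learn its filters.
edge_kernel = np.array([[-1.0, 1.0]])

feature_map = max_pool2d(relu(conv2d_valid(frame, edge_kernel)))
```

Here the filter responds only where the intensity jumps from 0 to 1, and pooling keeps that response while halving each spatial dimension.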
The convolution operation is given by

f(p, q) = \sum_{a} \sum_{b} \beta(a, b) \, \gamma(p + a, q + b) \qquad (2)

where \beta is the filter/kernel, and \gamma is the input image.

Principal Component Analysis (PCA): PCA is applied to reduce the dimensionality [13] of the extracted features while preserving most of the variance. This helps reduce the computational load and prevents overfitting.

Z = XW \qquad (3)

where X is the data matrix, and W is the matrix of eigenvectors.

E. Model Training
Autoencoder: An autoencoder is used to learn a compressed representation of the normal data. It consists of two main parts [14]:
▪ Encoder: Compresses the input into a latent space representation.
▪ Decoder: Reconstructs the input from the compressed data.

L = \lVert A - \bar{A} \rVert^{2} \qquad (4)

where A is the input, and \bar{A} is the reconstructed output. Training minimizes the loss L, which is the reconstruction error.

TABLE 3: Average Reconstruction Error Comparison for Normal and Anomalous Events
Dataset         Average Reconstruction Error   Average Reconstruction Error
                (Normal)                       (Anomalous)
ShanghaiTech    0.0032                         0.0467
UCSD Ped2       0.0025                         0.0354
CUHK Avenue     0.0048                         0.0491

This table shows the average reconstruction error for normal and anomalous events across three datasets: ShanghaiTech, UCSD Ped2, and CUHK Avenue. Anomalous events consistently have a higher reconstruction error, indicating the model's effectiveness in distinguishing between normal and abnormal behaviour [15].

LSTM (Long Short-Term Memory): LSTM networks are used to capture temporal patterns between frames. This is crucial for video data as anomalies may occur across consecutive frames.

a_t = \sigma(\omega_f \cdot [g_{t-1}, x_t] + p_f) \qquad (5)

b_t = \sigma(\omega_i \cdot [g_{t-1}, x_t] + p_i) \qquad (6)

\tilde{C}_t = \tanh(\omega_c \cdot [g_{t-1}, x_t] + p_c) \qquad (7)

C_t = a_t * C_{t-1} + b_t * \tilde{C}_t \qquad (8)

where a_t is the forget gate, b_t is the input gate, \tilde{C}_t is the candidate cell state, and C_t is the cell state. Here g_{t-1} is the previous hidden state, x_t is the current input, and \omega and p denote the weights and biases of each gate.

F. Anomaly Detection
Anomalies are detected based on the reconstruction error. Because the model is trained on normal data only, frames with high reconstruction errors are flagged as anomalies.

Threshold Setting: A threshold is set to classify frames as anomalous or normal.

G. Postprocessing & Visualization
The detected anomalies are visualized by marking the frames where the anomalies occurred. These frames are then compiled into a video, showing when the system detects abnormal behaviour.

H. Output (Final Result)
The final output is a video or series of frames showing the detected anomalies. A report summarizing the anomalies found is also generated.

I. Proposed Flow Diagram
The block diagram represents the overall flow of the project, from the input video dataset to the final output. Each block corresponds to a major stage in the process, and within each block, sub-blocks can represent the more detailed operations.

TABLE 4: Confusion Matrix
                  Predicted Anomaly   Predicted Normal
Actual Anomaly    45                  5
Actual Normal     7                   43
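As a rough illustration of the overall flow in this section, the sketch below chains Eq. (1), Eq. (3), and Eq. (4) on synthetic data. It is a simplified stand-in under stated assumptions, not the proposed model: PCA acts as a linear encoder/decoder in place of the CNN and LSTM autoencoder, the "frames" are random 64-value vectors, and all function names are hypothetical.

```python
import numpy as np

def min_max_normalize(frames, lo, hi):
    """Eq. (1) with a fixed minimum `lo` and maximum `hi` (taken from the training data)."""
    return (frames - lo) / (hi - lo)

def fit_pca(X, n_components):
    """Return the data mean and W, whose columns are the leading eigenvectors."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def reconstruction_error(X, mean, W):
    """Encode with Z = XW (Eq. 3), decode, and return the squared error per frame (Eq. 4)."""
    Z = (X - mean) @ W
    X_hat = Z @ W.T + mean
    return np.sum((X - X_hat) ** 2, axis=1)

# Synthetic data: "normal" frames are mixtures of 3 fixed patterns plus small noise.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(3, 64))

def make_normal(n):
    return rng.normal(size=(n, 3)) @ patterns + 0.01 * rng.normal(size=(n, 64))

train_raw = make_normal(200)
test_normal_raw = make_normal(20)
test_anomalous_raw = rng.normal(size=(20, 64))  # frames that ignore the normal patterns

# Normalize everything with the training range; test values may fall slightly outside [0, 1].
lo, hi = train_raw.min(), train_raw.max()
train = min_max_normalize(train_raw, lo, hi)
test_normal = min_max_normalize(test_normal_raw, lo, hi)
test_anomalous = min_max_normalize(test_anomalous_raw, lo, hi)

# Fit on normal data only, then compare reconstruction errors.
mean, W = fit_pca(train, n_components=3)
err_normal = reconstruction_error(test_normal, mean, W)
err_anomalous = reconstruction_error(test_anomalous, mean, W)
```

In this toy setting the frames that do not follow the learned patterns reconstruct far worse than the held-out normal frames, which is the property the anomaly detection stage relies on.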
Fig. 2: Anomaly Detection Using CNN with Autoencoder

Temporal Sequence Analysis Using LSTM (Long Short-Term Memory):
While the CNN handles the spatial features in individual frames, the LSTM network captures the temporal dynamics (i.e., the progression of events over time). LSTMs are well suited to sequences because they can remember long-term dependencies between frames. The LSTM processes sequences of frames (typically in batches of 32 frames). It learns the flow of events and detects whether an anomaly disrupts the expected sequence. For example, if a person suddenly runs in an area where people normally walk slowly, the LSTM flags this as an anomaly based on the deviation from the usual pattern. The LSTM thus detects anomalies not just by looking at single frames, but by modelling how the frames transition over time. Sudden, unexpected changes in motion or behavior (such as a car speeding into a parking lot) are detected from the flow of frames.

Anomaly Detection and Postprocessing:
The system decides whether an event is anomalous by comparing the autoencoder's reconstruction error against a threshold. The threshold is normally set to the 95th percentile of the reconstruction errors observed during training on normal behavior. Whenever the reconstruction error of a frame exceeds the threshold, the frame is considered an anomaly. After the system has picked a potential anomaly, filters are applied during postprocessing to rule out false positives. Minor variations such as lighting changes or camera shake can produce false positives, so the filters keep the focus on major anomalies. To illustrate detected anomalies clearly in the video, the system draws bounding boxes around areas of interest, for instance where a person is detected crossing a restricted area. This allows security personnel to identify abnormalities in real time or when reviewing the footage afterwards.
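The 95th-percentile rule described above can be sketched in a few lines of NumPy. The paper does not specify its postprocessing filters, so `keep_sustained` below is only a hypothetical example of one such filter (it drops detections shorter than `min_run` consecutive frames); the function names and the error values are illustrative assumptions.

```python
import numpy as np

def fit_threshold(train_errors, percentile=95.0):
    """Threshold = given percentile of reconstruction errors seen on normal training frames."""
    return float(np.percentile(train_errors, percentile))

def flag_frames(errors, threshold):
    """A frame is flagged when its reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold

def keep_sustained(flags, min_run=3):
    """Hypothetical false-positive filter: keep only runs of >= min_run consecutive flags."""
    flags = np.asarray(flags, dtype=bool)
    kept = np.zeros_like(flags)
    start = None
    # A trailing False acts as a sentinel that closes a run ending at the last frame.
    for i, flagged in enumerate(np.append(flags, False)):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_run:
                kept[start:i] = True
            start = None
    return kept

# Illustrative numbers only, not measurements from the paper.
train_errors = np.linspace(0.001, 0.004, 101)
threshold = fit_threshold(train_errors)

test_errors = [0.002, 0.05, 0.002, 0.04, 0.045, 0.05, 0.002]
raw_flags = flag_frames(test_errors, threshold)
final_flags = keep_sustained(raw_flags, min_run=3)
```

With these values the isolated spike at the second frame is discarded, while the three consecutive high-error frames remain flagged.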
Fig.3: LSTM – Autoencoder Model
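To make the LSTM update in Eqs. (5)-(8) concrete, the following sketch runs a single NumPy LSTM cell over a 32-step sequence, matching the sequence length mentioned above. It is an illustration under assumptions: the weights are random rather than trained, the feature and hidden sizes are arbitrary, and the output gate and hidden-state update are the standard LSTM ones, which the text does not list.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, g_prev, c_prev, weights, biases):
    """One LSTM step; `weights` and `biases` are dicts keyed by gate: 'f', 'i', 'c', 'o'."""
    z = np.concatenate([g_prev, x_t])                      # [g_{t-1}, x_t]
    a_t = sigmoid(weights["f"] @ z + biases["f"])          # forget gate, Eq. (5)
    b_t = sigmoid(weights["i"] @ z + biases["i"])          # input gate, Eq. (6)
    c_tilde = np.tanh(weights["c"] @ z + biases["c"])      # candidate cell state, Eq. (7)
    c_t = a_t * c_prev + b_t * c_tilde                     # cell state, Eq. (8)
    # Standard output gate and hidden state (assumed; not given in the text).
    o_t = sigmoid(weights["o"] @ z + biases["o"])
    g_t = o_t * np.tanh(c_t)
    return g_t, c_t

hidden, features, steps = 4, 8, 32   # arbitrary sizes; 32 matches the sequence length above
rng = np.random.default_rng(0)
weights = {k: 0.1 * rng.normal(size=(hidden, hidden + features)) for k in "fico"}
biases = {k: np.zeros(hidden) for k in "fico"}

# Each row stands for the feature vector of one frame in the sequence.
sequence = rng.normal(size=(steps, features))
g, c = np.zeros(hidden), np.zeros(hidden)
for x_t in sequence:
    g, c = lstm_step(x_t, g, c, weights, biases)
```

After the loop, `g` summarizes the whole 32-frame sequence; in an LSTM autoencoder a decoder would try to reconstruct the sequence from such a state.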

IV. EXPERIMENTS
A. Qualitative Analysis
The autoencoder-based video anomaly detection system for smart surveillance shows its strength in learning and detecting anomalies from unlabelled video datasets. Qualitatively, the model can identify abnormal behaviour in real time without requiring labelled data, which makes it suitable for dynamic environments such as public transportation hubs, office premises, and industrial zones. This unsupervised learning approach is well suited to anomaly detection because it adapts to diverse environments and scales well across large datasets like UCSD Ped2 and ShanghaiTech.

The system also supports reconstruction error analysis: the model compresses the video frames and then reconstructs them, and a high reconstruction error indicates an anomaly. A low false-positive rate is one of the significant qualitative advantages of this model, since unnecessary alerts in a surveillance system are very troublesome. In addition, the model tolerates changes in illumination, crowd density, and scene complexity. However, it has limitations with subtle and context-dependent anomalies, such as slightly unusual human behavior: because the reconstruction error threshold is fixed, the system may miss them in scenes that are highly complex or noisy. Moreover, robust detection of general anomalies says little about environments with rare but subtle anomalies that do not deviate strongly from the learned normal behavior.

In summary, the system offers strong real-time performance, very low false positives, and scalability, which make it an effective smart surveillance tool. Improving its sensitivity and optimizing the threshold would allow it to be applied in more challenging environments.

V. TRAINING AND TESTING
The test data includes both normal and abnormal frames and is passed through the trained model. During testing, the model computes the reconstruction error of every frame. If the error crosses the threshold value, the frame is classified as anomalous.

A. Training Phase
The autoencoder was trained on a dataset that contained only video frames showing normal conditions. During training, it learned to reconstruct those frames accurately by adjusting the model parameters to minimize the mean squared error (MSE) loss. It took a few epochs of training until the model could reconstruct normal data with minimal error.

B. Testing Phase
The testing phase evaluates frames containing both normal and abnormal data. The reconstruction error is computed for every frame. If it crosses the threshold, the frame is identified as anomalous.

TABLE 5: Dataset Statistics for Training and Testing Videos
Dataset         No. of Videos   No. of Videos   Total Frames   Total Frames
                (Training)      (Testing)       (Training)     (Testing)
ShanghaiTech    330             107             274,515        42,883
UCSD Ped2       16              12              2,550          2,010
CUHK Avenue     16              21              15,328         15,324

This table summarizes the number of videos and frames used during training and testing across three datasets: ShanghaiTech, UCSD Ped2, and CUHK Avenue. It gives the total number of frames analysed in the training and testing phases to show the size and balance of each dataset.

C. Evaluation & Testing
A testing set is used to evaluate the trained model. Its precision, recall, and F1 score are calculated as a measure of how well it can detect anomalies.

\text{Precision} = \frac{TP}{TP + FP} \qquad (9)

\text{Recall} = \frac{TP}{TP + FN} \qquad (10)

\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (11)

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives.
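Eqs. (9)-(11) translate directly into code. The short sketch below uses the counts from Table 4 purely as example input; the function name is a hypothetical helper, and the zero-denominator guards are an added convention rather than part of the equations.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision (Eq. 9), recall (Eq. 10), and F1 score (Eq. 11) from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Counts read from the confusion matrix in Table 4:
# actual anomaly -> 45 predicted anomaly (TP), 5 predicted normal (FN)
# actual normal  -> 7 predicted anomaly (FP), 43 predicted normal (TN)
tp, fn, fp, tn = 45, 5, 7, 43

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

True negatives do not enter any of the three metrics, which is why they are often reported alongside accuracy or specificity.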
VI. RESULTS
We test the system on the UCSD dataset, which contains a variety of normal and abnormal video events. Our results show that autoencoder-based methods are highly effective for anomaly detection. The system achieved a precision of 90% and a recall of 88%. These metrics indicate that the system detected most anomalies while keeping false positives low.

Precision is the fraction of frames flagged as anomalous that are true anomalies, whereas recall is the fraction of true anomalies that the system finds. The F1 score, the harmonic mean of precision and recall, was also used for evaluation.

Fig. 4: Training & Validation Loss Vs Epochs
Fig. 5: Training & Validation Accuracy Vs Epochs
Fig. 6: Accuracy
Fig. 7: Precision
Fig. 8: Recall
Fig. 9: Specificity
Fig. 10: F1 Score

VII. CONCLUSION
This paper presents and implements an autoencoder-based system for video anomaly detection. The system detects anomalies with high accuracy, adapts to different surveillance settings, and is fast enough for real-time anomaly detection. By learning normal behavior and flagging frames that have high reconstruction errors, it identifies anomalies with fewer false positives than traditional techniques such as KNN and Decision Trees. Results show that our model outperformed the existing techniques on all performance metrics: Accuracy, Precision, Recall, and F1 Score. This suggests that its range of applications can be extended to environments such as office buildings, parking lots, and industrial areas, and that it can serve as a robust tool for modern security systems. Future work will focus on raising the sensitivity of the model and on exploring deeper, more complex architectures in order to detect subtle anomalies in video data.

REFERENCES
[1] A. M.R., M. Makker and A. Ashok, "Anomaly Detection in Surveillance Videos," 2019 26th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW), Hyderabad, India, 2019, pp. 93-98, doi: 10.1109/HiPCW.2019.00031.
[2] A. B. Nassif, M. A. Talib, Q. Nasir and F. M. Dakalbab, "Machine Learning for Anomaly Detection: A Systematic Review," in IEEE Access, vol. 9, pp. 78658-78700, 2021, doi: 10.1109/ACCESS.2021.3083060.
[3] Zhang, L., Li, S., Luo, X. et al. Video anomaly detection with both normal and anomaly memory modules. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03584-z
[4] Rezaiezadeh Roukerd, F., Rajabi, M.M. Anomaly detection in groundwater monitoring data using LSTM-Autoencoder neural networks. Environ Monit Assess 196, 692 (2024). https://doi.org/10.1007/s10661-024-12848-z.
[5] D. Kwon, K. Natarajan, S. C. Suh, H. Kim and J. Kim, "An Empirical Study on Network Anomaly Detection Using Convolutional Neural Networks," 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, 2018, pp. 1595-1598, doi: 10.1109/ICDCS.2018.00178.
[6] M. Ganesh, A. Kumar and V. Pattabiraman, "Autoencoder Based Network Anomaly Detection," 2020 IEEE International Conference on Technology, Engineering, Management for Societal impact using Marketing, Entrepreneurship and Talent (TEMSMET), Bengaluru, India, 2020, pp. 1-6, doi: 10.1109/TEMSMET51618.2020.9557464.
[7] H. Dhole, M. Sutaone and V. Vyas, "Anomaly Detection using Convolutional Spatiotemporal Autoencoder," 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, 2019, pp. 1-5, doi: 10.1109/ICCCNT45670.2019.8944523.
[8] T. -Y. WU, Z. Lee, Y. Huang, C. -M. Chen and Y. -C. Chen, "Security Analysis of Wu et al.'s Authentication Protocol for Distributed Cloud Computing," 2019 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW), Yilan, Taiwan, 2019, pp. 1-2, doi: 10.1109/ICCE-TW46550.2019.8991710.
[9] R. Vinayakumar, K. P. Soman and P. Poornachandran, "Long short-term memory based operation log anomaly detection," 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 2017, pp. 236-242, doi: 10.1109/ICACCI.2017.8125846.
[10] T. Ergen and S. S. Kozat, "Neural networks based online learning," 2017 25th Signal Processing and Communications Applications Conference (SIU), Antalya, Turkey, 2017, pp. 1-4, doi: 10.1109/SIU.2017.7960218.
[11] H. Yuqing, L. Shanshan and Z. Jian, "Multi-channel key frame extraction for video surveillance system," 2022 2nd International Conference on Networking, Communications and Information Technology (NetCIT), Manchester, United Kingdom, 2022, pp. 83-85, doi: 10.1109/NetCIT57419.2022.00028.
[12] X. Qi, Z. Hu and G. Ji, "Retraining Generative Adversarial Autoencoder for Video Anomaly Detection," in 2023 Eleventh International Conference on Advanced Cloud and Big Data (CBD), Danzhou, China, 2023, pp. 63-68, doi: 10.1109/CBD63341.2023.00020.
[13] S. K. Dani, C. Thakur, N. Nagvanshi and G. Singh, "Anomaly Detection using PCA in Time Series Data," 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI), Gwalior, India, 2024, pp. 1-6, doi: 10.1109/IATMSI60426.2024.10502929.
[14] Mishra, S., Jabin, S. Anomaly detection in surveillance videos using deep autoencoder. Int. j. inf. tecnol. 16, 1111–1122 (2024). https://doi.org/10.1007/s41870-023-01659-z.
[15] Gnouma, M., Ejbali, R., Zaied, M. (2023). Abnormal Event Detection Method Based on Spatiotemporal CNN Hashing Model. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 717. Springer, Cham. https://doi.org/10.1007/978-3-031-35510-3_16.
