crowd counting
24.08.2022
─
Team size : 3
Members :
● Mohd Kaif
● Pranav Singh Sehgal
● Nausheen Noor Zaidi
Objective
● To estimate the number of people within a given area, using an image or live
video footage from a camera.
● Should be able to collate data from different image sources and handle
very large crowds.
● Should be able to produce estimates under all illumination conditions.
● Trigger an alarm when the number of people reaches a specified threshold. The
algorithms should work on casual, conventional, expressive and acting
crowds.
● Send a notification of any undesirable event in the crowded place.
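The threshold alarm described in the objectives can be sketched as follows. This is only an illustration, not the project's implementation: the function name `check_alarm`, the threshold value and the sample counts are hypothetical, and the per-frame counts are assumed to come from a separate counting model.

```python
def check_alarm(count, threshold):
    """Return True when the estimated crowd count reaches the threshold."""
    return count >= threshold


# Hypothetical per-frame estimates from a counting model (assumed values).
frame_counts = [120, 180, 240, 310]
THRESHOLD = 250  # assumed limit for the monitored area

# One alarm decision per frame.
alarms = [check_alarm(c, THRESHOLD) for c in frame_counts]
print(alarms)
```

Only the last frame exceeds the assumed limit, so only that frame raises the alarm.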
Can you give me an approximate number of how many people are in the frame?
Yes, including the ones present way in the background. The most direct method is to
manually count each person, but does that make practical sense? It’s nearly
impossible when the crowd is this big!
While we don’t yet have algorithms that can give us the EXACT number, modern
computer vision techniques can produce impressively close estimates.
1. OpenCV
OpenCV (Open Source Computer Vision Library) is an open source computer
vision and machine learning software library. OpenCV was built to provide a
common infrastructure for computer vision applications and to accelerate the use
of machine perception in commercial products.
It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux,
Android and macOS. OpenCV leans mostly towards real-time vision applications
and takes advantage of MMX and SSE instructions when available.
Full-featured CUDA and OpenCL interfaces are being actively developed.
There are over 500 algorithms and about 10 times as many functions that
compose or support those algorithms. OpenCV is written natively in C++ and has
a templated interface that works seamlessly with STL containers.
2. Deep Learning
Deep learning is a type of machine learning and artificial intelligence (AI) that
imitates the way humans gain certain types of knowledge. Deep learning is an
important element of data science, which includes statistics and predictive
modeling. It is extremely beneficial to data scientists who are tasked with
collecting, analyzing and interpreting large amounts of data; deep learning makes
this process faster and easier. At its simplest, deep learning can be thought of as
a way to automate predictive analytics. While many traditional machine learning
algorithms are linear, deep learning algorithms are stacked in a hierarchy of
increasing complexity and abstraction.
Computer programs that use deep learning go through much the same process
as a toddler learning to identify a dog. Each layer in the hierarchy applies
a nonlinear transformation to its input and uses what it learns to create a
statistical model as output.
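The idea of stacked layers, each applying a nonlinear transformation to its input, can be sketched in plain Python. This is a toy illustration under stated assumptions: the names `dense_layer` and `forward`, the choice of tanh as the nonlinearity, and the hand-picked weights are all hypothetical. A real network would learn its weights from data.

```python
import math


def dense_layer(inputs, weights, biases):
    """One layer: weighted sum of the inputs followed by a nonlinearity (tanh)."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(math.tanh(z))
    return outputs


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical hand-picked weights; a real network learns these from data.
layers = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),  # layer 1: 2 inputs -> 2 units
    ([[1.0, -1.0]], [0.0]),                   # layer 2: 2 inputs -> 1 unit
]
output = forward([1.0, 2.0], layers)
print(output)
```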
3. PyTorch
PyTorch is an open source machine learning framework based on the Torch library,
used for applications such as computer vision and natural language
processing, primarily developed by Meta AI. It is free and open-source software released
under the Modified BSD license. Although the Python interface is more polished and the
primary focus of development, PyTorch also has a C++ interface.
A number of pieces of deep learning software are built on top of PyTorch, including
Tesla Autopilot, Uber's Pyro, Hugging Face's Transformers, PyTorch Lightning, and
Catalyst.
PyTorch provides two high-level features:
● Tensor computing (like NumPy) with strong acceleration via graphics
processing units (GPU)
● Deep neural networks built on a tape-based automatic differentiation system
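To make "tape-based automatic differentiation" concrete, here is a tiny pure-Python sketch of the general idea: every operation is recorded on a tape, and gradients are obtained by replaying the tape in reverse. This is not PyTorch's code or API. The `Var` class and `backward` function are hypothetical names used only for illustration, and only addition and multiplication of scalars are supported.

```python
class Var:
    """A scalar that records how it was computed on a shared tape."""

    def __init__(self, value, tape, parents=()):
        self.value = value
        self.grad = 0.0
        self.tape = tape
        self.parents = parents  # pairs of (parent Var, local derivative)
        tape.append(self)       # nodes are appended in creation order

    def __add__(self, other):
        return Var(self.value + other.value, self.tape,
                   ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, self.tape,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Walk the tape in reverse and accumulate gradients via the chain rule."""
    output.grad = 1.0
    for node in reversed(output.tape):
        for parent, local_grad in node.parents:
            parent.grad += local_grad * node.grad


tape = []
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x  # z = x*y + x, so dz/dx = y + 1 and dz/dy = x
backward(z)
print(z.value, x.grad, y.grad)
```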
4. CNN
A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm
which can take in an input image, assign importance (learnable weights and
biases) to various aspects/objects in the image and differentiate one
from the other. The pre-processing required in a ConvNet is much lower than
in other classification algorithms. While in primitive methods filters are
hand-engineered, with enough training, ConvNets have the ability to learn these
filters/characteristics.
The architecture of a ConvNet is analogous to the connectivity pattern of
neurons in the human brain and was inspired by the organization of the visual
cortex. Individual neurons respond to stimuli only in a restricted region of the
visual field known as the receptive field. A collection of such fields overlaps to
cover the entire visual area.
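The filter and receptive-field ideas above can be illustrated with a small pure-Python convolution. This is a sketch under stated assumptions: `conv2d`, the toy image and the hand-engineered vertical-edge kernel are illustrative only. In a ConvNet the kernel weights would be learned during training rather than written by hand.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers use."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value depends only on a kh x kw receptive field.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A hand-engineered vertical-edge filter; a ConvNet would learn such weights.
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

# Toy 4x4 image: dark on the left, bright on the right.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]

feature_map = conv2d(image, edge_kernel)
print(feature_map)
```

Every position in the 2x2 feature map responds strongly because each 3x3 receptive field straddles the dark-to-bright edge.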
1. Detection-based methods
Here, we use a sliding-window detector to identify people in an image and
count how many there are. These detection methods require well-trained
classifiers that can extract low-level features. Although they work
well for detecting faces, they do not perform well on crowded images, as
most of the target objects are not clearly visible.
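A sliding-window detector of the kind described here can be sketched as below. The helper names `sliding_windows` and `count_detections`, the window and stride values, and the `toy_classifier` are all hypothetical. A real system would plug in a trained person classifier and would also need to merge overlapping detections of the same person.

```python
def sliding_windows(width, height, window, stride):
    """Yield (x, y, w, h) boxes that sweep across an image of the given size."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)


def count_detections(width, height, window, stride, classifier):
    """Count the windows that the classifier labels as containing a person."""
    return sum(1 for box in sliding_windows(width, height, window, stride)
               if classifier(box))


def toy_classifier(box):
    """Placeholder (assumption): flags only windows on the left image border."""
    return box[0] == 0


boxes = list(sliding_windows(8, 8, window=4, stride=2))
n_people = count_detections(8, 8, 4, 2, toy_classifier)
print(len(boxes), n_people)
```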
2. Regression-based methods
Detection-based methods cannot reliably extract low-level features from
crowded scenes. Regression-based methods come up trumps here. We first crop
patches from the image and then, for each patch, extract the low-level features,
which a regression model then maps to a count.
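The patch-cropping step can be sketched as follows. This is a simplified illustration: `crop_patches` and `mean_intensity` are hypothetical helpers, and the mean pixel value is only a stand-in for the richer low-level features (edges, textures and so on) that an actual regression-based method would use.

```python
def crop_patches(image, size):
    """Split a 2D image (list of rows) into non-overlapping size x size patches."""
    patches = []
    for top in range(0, len(image) - size + 1, size):
        for left in range(0, len(image[0]) - size + 1, size):
            patches.append([row[left:left + size]
                            for row in image[top:top + size]])
    return patches


def mean_intensity(patch):
    """A stand-in low-level feature: the average pixel value of the patch."""
    pixels = [p for row in patch for p in row]
    return sum(pixels) / len(pixels)


# Toy 4x4 grayscale image (assumed values).
image = [[1, 1, 5, 5],
         [1, 1, 5, 5],
         [3, 3, 7, 7],
         [3, 3, 7, 7]]

# One feature value per 2x2 patch, in row-major patch order.
features = [mean_intensity(p) for p in crop_patches(image, 2)]
print(features)
```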
4. CNN-based methods
Ah, good old reliable convolutional neural networks (CNNs). Instead of
looking at patches of an image, we build an end-to-end regression method using
CNNs. This takes the entire image as input and directly generates the crowd count.
CNNs work really well for regression and classification tasks, and they have also proved
their worth in generating density maps.
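A density map relates to the crowd count in a simple way: summing all of its values gives the estimated number of people. The sketch below assumes a toy 3x3 map with made-up values. The function name `count_from_density_map` is hypothetical, and in practice the map would be produced by the network.

```python
def count_from_density_map(density_map):
    """The estimated count is the sum of all values in the density map."""
    return sum(sum(row) for row in density_map)


# Toy 3x3 density map (assumed values); real maps come from the network.
density_map = [[0.0, 0.5, 0.5],
               [0.25, 0.75, 0.0],
               [0.0, 0.0, 1.0]]

estimated_count = count_from_density_map(density_map)
print(estimated_count)
```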
CSRNet, the technique we will implement, deploys a deeper CNN for capturing high-level
features and generating high-quality density maps without expanding the network
complexity. It does so with dilated convolutions, so let’s understand how those work.
The basic idea of dilated convolutions is to enlarge the kernel's receptive field without
increasing the number of parameters. With a dilation rate of 1, we get a standard convolution:
the kernel covers adjacent pixels as it slides over the entire image. If we increase the
dilation rate to 2, the kernel weights are spaced one pixel apart, so a 3x3 kernel covers a
5x5 area, as shown in the above image (follow the labels below each image). Dilated
convolutions can be an alternative to pooling layers.
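The effect of the dilation rate can be checked with a short one-dimensional sketch. A kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input positions while still having only k weights. The function names `effective_kernel_size` and `dilated_conv1d` are hypothetical, and the example is 1D for brevity. The same spacing rule applies along each axis of a 2D kernel.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D convolution (cross-correlation) with gaps between kernel taps."""
    span = effective_kernel_size(len(kernel), dilation)
    output = []
    for start in range(len(signal) - span + 1):
        total = 0
        for i, weight in enumerate(kernel):
            # Taps are `dilation` positions apart instead of adjacent.
            total += weight * signal[start + i * dilation]
        output.append(total)
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]  # three weights regardless of the dilation rate

dense = dilated_conv1d(signal, kernel, dilation=1)
dilated = dilated_conv1d(signal, kernel, dilation=2)
print(dense, dilated)
```

With dilation 2 each output sums positions two apart (for example 1 + 3 + 5), so the same three weights see a window of five inputs.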
We can detect stampedes by continuously comparing the current heat map with the
previous one and looking for sudden movement. If there is a major change in the x, y
coordinates of the pixels between frames, it is flagged as a stampede.
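One possible reading of this check is to track the density-weighted centroid of the heat map and flag a frame when it shifts by more than some limit. The sketch below is an assumption, not the project's method: `centroid`, `sudden_movement`, the `max_shift` value and the toy maps are all hypothetical, and a centroid shift is only a crude proxy for crowd movement.

```python
def centroid(density_map):
    """Density-weighted mean (x, y) position of the crowd, or None if empty."""
    total = sum(sum(row) for row in density_map)
    if total == 0:
        return None  # no crowd detected in this frame
    cx = sum(x * v for row in density_map for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(density_map) for v in row) / total
    return cx, cy


def sudden_movement(previous, current, max_shift):
    """Flag a frame when the crowd centroid moves more than max_shift pixels."""
    before, after = centroid(previous), centroid(current)
    if before is None or after is None:
        return False
    shift = ((after[0] - before[0]) ** 2 + (after[1] - before[1]) ** 2) ** 0.5
    return shift > max_shift


# Toy heat maps (assumed values): the crowd jumps from one corner to the other.
prev_map = [[1, 0, 0],
            [0, 0, 0],
            [0, 0, 0]]
curr_map = [[0, 0, 0],
            [0, 0, 0],
            [0, 0, 1]]

print(sudden_movement(prev_map, curr_map, max_shift=1.0))
```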
The main objective of this project is to design a traffic light controller based on
computer vision that can adapt to the current traffic situation, and to improve
traffic control by adding necessary features and new technologies to the
application. We propose a system for controlling the traffic light by image
processing. Instead of using electronic sensors embedded in the pavement, the
system detects vehicles in images fed from the CCTV cameras at traffic junctions.
It calculates the traffic density at the signal in real time and sets the green
signal time accordingly. Detecting the individual vehicles gives a more accurate
estimate of the green signal time.
A. Image Capturing
First, the system includes a camera facing each lane that captures images of
the road on which we want to control traffic. These cameras capture image
sequences, which are then analyzed using digital image
processing. Image processing is done using OpenCV.
➔ F + Vc * 2 - 2
➔ 4 + 5*2 - 2
➔ 12 sec
Even when there are no vehicles in a lane, a fixed time of 4 sec is kept for
pedestrians to cross.
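The worked example above can be reproduced with a short function. The symbols are not defined in the text, so this sketch assumes that F is the fixed time (4 sec) and Vc is the number of vehicles detected in the lane. Keeping the result from dropping below the fixed time is also an assumption, based on the statement that pedestrians always get 4 sec. The name `green_signal_time` is hypothetical.

```python
FIXED_TIME = 4  # F: assumed to be the fixed pedestrian time in seconds


def green_signal_time(vehicle_count, fixed_time=FIXED_TIME):
    """Green time in seconds from the formula F + Vc * 2 - 2.

    The result is never allowed to fall below the fixed time (assumption).
    """
    computed = fixed_time + vehicle_count * 2 - 2
    return max(fixed_time, computed)


print(green_signal_time(5))  # the worked example: 4 + 5*2 - 2
print(green_signal_time(0))  # empty lane: falls back to the fixed time
```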