crowd counting


Crowd Counting System

24.08.2022

Team Name : UPP-82820128

Team size : 3

Members :
● Mohd Kaif
● Pranav Singh Sehgal
● Nausheen Noor Zaidi

Objective
● To estimate the number of people within a given area, using the image or live
video footage from a camera.
● Should be able to collate data of different image sources and be able to work on
a very large volume of crowd.
● Should be able to estimate in all illumination conditions.
● Trigger an alarm when the number of people reaches a specified threshold. The
algorithms should work on casual, conventional, expressive and acting
crowds.
● Notification of any undesirable event in the crowded place.

What is Crowd Counting


Crowd Counting is a technique to count or estimate the number of people in an
image. Take a moment to analyze the below image:

Can you give me an approximate number of how many people are in the frame?
Yes, including the ones way back in the background. The most direct method is to
manually count each person, but does that make practical sense? It’s nearly
impossible when the crowd is this big!
While we don’t yet have algorithms that can give us the EXACT number, most
computer vision techniques can produce impressively precise estimates.

Some Technologies Used in Crowd Counting

1. OpenCV
OpenCV (Open Source Computer Vision Library) is an open source computer
vision and machine learning software library. OpenCV was built to provide a
common infrastructure for computer vision applications and to accelerate the use
of machine perception in commercial products.
It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux,
Android and Mac OS. OpenCV leans mostly towards real-time vision applications
and takes advantage of MMX and SSE instructions when available.
Full-featured CUDA and OpenCL interfaces are being actively developed.
There are over 500 algorithms and about 10 times as many functions that
compose or support those algorithms. OpenCV is written natively in C++ and has
a templated interface that works seamlessly with STL containers.

2. Deep Learning
Deep learning is a type of machine learning and artificial intelligence (AI) that
imitates the way humans gain certain types of knowledge. Deep learning is an
important element of data science, which includes statistics and predictive
modeling. It is extremely beneficial to data scientists who are tasked with
collecting, analyzing and interpreting large amounts of data; deep learning makes
this process faster and easier. At its simplest, deep learning can be thought of as
a way to automate predictive analytics. While many traditional machine learning
algorithms are linear, deep learning algorithms are stacked in a hierarchy of
increasing complexity and abstraction.

Computer programs that use deep learning go through much the same process
as a toddler learning to identify a dog. Each algorithm in the hierarchy applies
a nonlinear transformation to its input and uses what it learns to create a
statistical model as output.

3. PyTorch

PyTorch is an open source machine learning framework based on the Torch library,
used for applications such as computer vision and natural language
processing. It was primarily developed by Meta AI. It is free and open-source software
released under the Modified BSD license. Although the Python interface is more polished
and the primary focus of development, PyTorch also has a C++ interface.
A number of pieces of deep learning software are built on top of PyTorch, including
Tesla Autopilot, Uber's Pyro, Hugging Face's Transformers, PyTorch Lightning, and
Catalyst.
PyTorch provides two high-level features:
● Tensor computing (like NumPy) with strong acceleration via graphics
processing units (GPU)
● Deep neural networks built on a tape-based automatic differentiation system

4. CNN
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm
which can take in an input image, assign importance (learnable weights and
biases) to various aspects/objects in the image and be able to differentiate one
from the other. The pre-processing required in a ConvNet is much lower as
compared to other classification algorithms. While in primitive methods filters are
hand-engineered, with enough training, ConvNets have the ability to learn these
filters/characteristics.
The architecture of a ConvNet is analogous to that of the connectivity pattern of
Neurons in the Human Brain and was inspired by the organization of the Visual
Cortex. Individual neurons respond to stimuli only in a restricted region of the
visual field known as the Receptive Field. A collection of such fields overlap to
cover the entire visual area.

Understanding the Different Computer Vision Techniques for Crowd Counting

1. Detection-based methods
Here, we use a sliding-window detector to identify people in an image and
count how many there are. The methods used for detection require well-trained
classifiers that can extract low-level features. Although these methods work
well for detecting faces, they do not perform well on crowded images, as
most of the target objects are not clearly visible.
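
As a rough sketch of the sliding-window idea (not a real detector), the Python snippet below enumerates fixed-size windows over an image and counts those that a classifier accepts. The names `sliding_windows`, `count_detections` and the `is_person` callback are hypothetical stand-ins for a trained classifier, and the window size and stride are arbitrary.

```python
def sliding_windows(height, width, win, stride):
    """Yield (top, left, bottom, right) boxes that fit inside the image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left, top + win, left + win


def count_detections(height, width, win, stride, is_person):
    """Count the windows that the classifier callback marks as a person."""
    boxes = sliding_windows(height, width, win, stride)
    return sum(1 for box in boxes if is_person(box))


# Toy classifier (assumption): treat every window in the top row as a person.
windows = list(sliding_windows(4, 6, win=2, stride=2))
people = count_detections(4, 6, 2, 2, is_person=lambda box: box[0] == 0)
print(len(windows), people)
```

With overlapping windows a practical detector would also need to merge duplicate hits on the same person, which this sketch leaves out.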

2. Regression-based methods
The detection approach above cannot extract reliable low-level features from a
dense crowd. Regression-based methods come up trumps here. We first crop patches
from the image and then, for each patch, extract the low-level features and learn
a mapping from those features to the count.

3. Density estimation-based methods


We first create a density map for the objects. Then, the algorithm learns a linear
mapping between the extracted features and their object density maps. We can
also use random forest regression to learn non-linear mapping.
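
To make the density-map idea concrete, here is a minimal Python sketch, assuming head positions are annotated as (row, col) points. It places one normalized Gaussian blob per head, so the map sums to the number of people. The function names, the `sigma` and `radius` values and the sample coordinates are illustrative assumptions, not part of the original project.

```python
import math


def density_map(height, width, heads, sigma=1.5, radius=4):
    """Build a density map whose pixel values sum to the number of heads.

    Each head contributes a Gaussian blob that is normalized over the
    pixels it actually covers, so blobs clipped by the border still add 1.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        # Collect the unnormalized Gaussian weights that fall inside the image.
        weights = {}
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                weights[(r, c)] = math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(weights.values())
        # Normalize so that this head adds a total mass of one person.
        for (r, c), w in weights.items():
            dmap[r][c] += w / total
    return dmap


def estimated_count(dmap):
    """The crowd count is the sum of the density map."""
    return sum(sum(row) for row in dmap)


heads = [(2, 3), (10, 10), (10, 12), (19, 0)]  # hypothetical annotations
dmap = density_map(20, 20, heads)
print(round(estimated_count(dmap), 6))
```

A learned model would predict such a map from image features; summing the predicted map then gives the estimated count.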

4. CNN-based methods
Ah, good old reliable convolutional neural networks (CNNs). Instead of
looking at the patches of an image, we build an end-to-end regression method using
CNNs. This takes the entire image as input and directly generates the crowd count.
CNNs work really well with regression or classification tasks, and they have also proved
their worth in generating density maps.
CSRNet, a technique we will implement, deploys a deeper CNN for capturing high-level
features and generating high-quality density maps without expanding the network
complexity. Let’s understand what CSRNet is.

Understanding the Architecture and Training Method of CSRNet


CSRNet uses VGG-16 as the front end because of its strong transfer learning ability. The
output of the VGG front end is 1/8 of the original input size in each spatial dimension.
CSRNet also uses dilated convolutional layers in the back end.
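
The 1/8 factor comes from the three 2x2 max-pooling layers kept in the truncated VGG-16 front end, each of which halves the height and width. The short Python sketch below only reproduces that arithmetic; the function name and the sample image size are illustrative assumptions.

```python
def frontend_output_size(height, width, num_pools=3):
    """Spatial size after `num_pools` 2x2 max-pooling layers with stride 2."""
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return height, width


print(frontend_output_size(768, 1024))  # (96, 128)
```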
But what in the world are dilated convolutions? It’s a fair question to ask. Consider the
below image:

The basic concept of using dilated convolutions is to enlarge the kernel without increasing
the parameters. So, if the dilation rate is 1, we take the kernel and convolve it on the entire
image. Whereas, if we increase the dilation rate to 2, the kernel extends as shown in the
above image (follow the labels below each image). It can be an alternative to pooling layers.
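
A small one-dimensional Python sketch can illustrate the point that dilation enlarges the area a kernel covers without adding weights. The helper names below are assumptions made for this example; a real network would use a framework's 2-D convolution layer instead.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a kernel once gaps of (dilation - 1) are inserted."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, as used in CNNs)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are read `dilation` samples apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # always 3 weights, whatever the dilation rate
print(effective_kernel_size(3, 1), dilated_conv1d(signal, kernel, 1))
print(effective_kernel_size(3, 2), dilated_conv1d(signal, kernel, 2))
```

With a dilation rate of 2 the same three weights span five samples, which is why dilation can widen the receptive field without the loss of resolution that pooling causes.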

Solution to the problem

I. Crowd counting through live video source


1. Split the video into single frames.
2. Apply the CSRNet algorithm to each frame, one by one, and calculate the average
count over all the frames.
3. Recombine all frames and write them back into video format.
4. Store the result in a file.
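
The steps above can be sketched in Python as follows. This is a simplified outline under stated assumptions: the frames are already decoded, `count_fn` is a placeholder for the CSRNet model, the `threshold` corresponds to the alarm level from the objectives, and the output file name is hypothetical. Writing the frames back into a video (step 3) is left out.

```python
import json
import os
import tempfile
from statistics import mean


def count_video(frames, count_fn, threshold, result_path):
    """Run `count_fn` on every frame, average the counts and save a summary.

    `frames` is any non-empty iterable of decoded frames and `count_fn`
    stands in for the crowd counting model.
    """
    per_frame = [float(count_fn(frame)) for frame in frames]
    summary = {
        "per_frame": per_frame,
        "average": mean(per_frame),
        # Indices of frames whose count reaches the alarm threshold.
        "alarm_frames": [i for i, c in enumerate(per_frame) if c >= threshold],
    }
    with open(result_path, "w") as fh:
        json.dump(summary, fh)
    return summary


# Toy stand-ins: each "frame" is a tiny density map given as a flat list,
# and summing it plays the role of the model's count.
frames = [[0.5, 1.5, 1.0], [2.0, 2.0, 1.0], [3.0, 2.0, 2.0]]
result_path = os.path.join(tempfile.gettempdir(), "crowd_counts.json")
summary = count_video(frames, count_fn=sum, threshold=5, result_path=result_path)
print(summary["average"], summary["alarm_frames"])
```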

We can detect stampedes by continuously comparing the current heat map with the
previous one and looking for sudden movement. If there is a major change in the x,y
coordinates of the pixels between frames, we flag it as a stampede.
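
One possible reading of this check, sketched in Python: track the density-weighted centre of each heat map and raise a flag when it jumps by more than a chosen distance between consecutive frames. The centroid measure, the function names and the `max_shift` value are assumptions made for illustration; the project does not specify how the change is measured.

```python
import math


def centroid(dmap):
    """Density-weighted mean (x, y) position of a 2-D heat map."""
    total = sum(sum(row) for row in dmap)
    if total == 0:
        return None  # empty scene, nothing to track
    x = sum(v * c for row in dmap for c, v in enumerate(row)) / total
    y = sum(v * r for r, row in enumerate(dmap) for v in row) / total
    return x, y


def sudden_movement(prev_map, curr_map, max_shift):
    """True when the crowd centre moves more than `max_shift` pixels."""
    prev_c, curr_c = centroid(prev_map), centroid(curr_map)
    if prev_c is None or curr_c is None:
        return False
    return math.dist(prev_c, curr_c) > max_shift


previous = [[1, 1, 0, 0, 0]]
drift = [[0, 1, 1, 0, 0]]  # crowd centre moves by 1 pixel
rush = [[0, 0, 0, 1, 1]]   # crowd centre moves by 3 pixels
print(sudden_movement(previous, drift, max_shift=2))
print(sudden_movement(previous, rush, max_shift=2))
```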

II. Live traffic analysis

The main objective of this project is to design a traffic light controller based on
Computer Vision that can adapt to the current traffic situation, and to improve
Traffic Control by adding necessary features and new technologies to the
application. We propose a system for controlling the traffic light by image
processing. Instead of using electronic sensors embedded in the pavement, the
system detects vehicles in images fed from the CCTV cameras at traffic junctions.
It calculates the real-time traffic density by detecting the vehicles at the signal
and sets the green signal time accordingly. Detecting the individual vehicles gives a
more accurate estimate of the green signal time.

A. Image Capturing
First, the system includes a camera placed facing each lane that captures images of
the road on which we want to control traffic. These cameras capture image
sequences. The image sequence is then analyzed using digital image
processing. Image processing is done using OpenCV.

B. Vehicle detection and Calculating traffic Density


We detect vehicles with OpenCV using the Haar cascade algorithm. The cascade
classifier detects the objects in the video stream and gives the vehicle density
count on the road. This algorithm is capable of differentiating vehicles from
other objects.

C. Calculation of green signal time


Based on the density of the vehicles, a green signal time is allotted to every
path. In our project the time given for each vehicle is 2 seconds. If the number of
vehicles in one path is 5, then the total green signal time given for that path
will be:
4 (fixed time) + 5*2 (2 sec for each vehicle) - 2 (yellow light is displayed for the last 2 sec)

➔ F + Vc * 2 - 2, where F is the fixed time and Vc is the vehicle count
➔ 4 + 5*2 - 2
➔ 12 sec
Even when there are no vehicles in a path, there is a fixed time of 4 sec for
pedestrians to cross.
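
The same calculation as a short Python function, using the values given above (fixed time 4 sec, 2 sec per vehicle, 2 sec of yellow). The function and parameter names are chosen for this sketch. Reading the zero-vehicle case as 2 sec of green followed by 2 sec of yellow, which together make up the fixed 4 sec, is an assumption.

```python
def green_signal_time(vehicle_count, fixed_time=4, per_vehicle=2, yellow_time=2):
    """Green time in seconds: F + Vc * 2 - 2 with the document's values."""
    return fixed_time + vehicle_count * per_vehicle - yellow_time


print(green_signal_time(5))  # 12
print(green_signal_time(0))  # 2
```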

D. Updating Traffic signal timer


When the green signal timer ends for a path, the controller moves clockwise to
the next path and starts detecting the vehicles there. A queue is a line of
vehicles waiting to be served by a phase, in which the flow rate from the front of
the queue determines the average speed within the queue. Slowly
moving vehicles joining the rear of the queue are usually considered part of the
queue. A faster moving line of vehicles is often referred to as a moving queue or
a platoon.
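
The clockwise update can be sketched in Python as a simple round-robin loop. This is an illustration under assumptions: the path names are made up, `count_vehicles` is a placeholder for the detector, and vehicles are counted only when a path's turn begins.

```python
from itertools import cycle, islice


def signal_schedule(paths, count_vehicles, num_phases):
    """Yield (path, green_seconds) in clockwise order, one phase at a time."""
    for path in islice(cycle(paths), num_phases):
        vehicles = count_vehicles(path)  # detect only when the turn starts
        yield path, 4 + vehicles * 2 - 2  # F + Vc * 2 - 2


# Hypothetical vehicle counts per path, listed in clockwise order.
counts = {"north": 5, "east": 0, "south": 3, "west": 1}
schedule = list(signal_schedule(list(counts), counts.get, num_phases=5))
print(schedule)
```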
