Main Python Code
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 4
      2 from torch.utils.data import Dataset, DataLoader
      3 import os, random, time, shutil
----> 4 import torch, torchvision
      5 from torchvision import transforms, datasets, models
      6 from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
Let's implement data augmentation by doubling the dataset. To do this, I wrote a custom
function aug:
In [3]: aug(out_folder='augmented_dataset')
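The aug helper itself is not shown in this export; one simple way to double a detection dataset is horizontal flipping, which has to mirror the boxes together with the pixels. A sketch of the core transform (flip_example is a hypothetical name, not the author's code):

from PIL import Image

def flip_example(img, boxes):
    # Mirror the image and its [xmin, ymin, xmax, ymax] boxes
    # around the vertical axis; class labels stay the same
    w = img.width
    flipped = img.transpose(Image.FLIP_LEFT_RIGHT)
    flipped_boxes = [[w - xmax, ymin, w - xmin, ymax]
                     for xmin, ymin, xmax, ymax in boxes]
    return flipped, flipped_boxes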
I randomly split the photos into train (80%) and test (20%) sets.
Annotations are available in two different formats: COCO_json and PASCAL_VOC_xml.
Let's look at how to work with annotation data presented in XML format:
[<object>
<name>person</name>
<truncated>0</truncated>
<occluded>0</occluded>
<difficult>0</difficult>
<bndbox>
<xmin>1000.3</xmin>
<ymin>301.6</ymin>
<xmax>1082.3</xmax>
<ymax>514.5</ymax>
</bndbox>
</object>, <object>
<name>person</name>
<truncated>0</truncated>
<occluded>0</occluded>
<difficult>0</difficult>
<bndbox>
<xmin>1140.9</xmin>
<ymin>260.36</ymin>
<xmax>1215.6</xmax>
<ymax>493.2</ymax>
</bndbox>
</object>]
Here the useful information is contained in the xmin, ymin, xmax, and ymax fields, and the object class is given in the name field.
When training detection models, PyTorch requires the data in the format [xmin, ymin, xmax, ymax] for each box.
In [56]: '''
This function will output a dictionary with 3 keys: boxes, labels and image_id
The function takes as input:
image_id - index of the photo from the Pytorch class Dataset
file - path to xml file
'''
from bs4 import BeautifulSoup  # parses the PASCAL VOC xml annotations

def generate_target(image_id, file):
    with open(file) as f:
        data = f.read()
    soup = BeautifulSoup(data, 'xml')
    objects = soup.find_all('object')
    # Iterate through the list of <object> elements parsed from the xml
    boxes = []
    labels = []
    for i in objects:
        boxes.append(generate_box(i))
        labels.append(generate_label(i))
    boxes = torch.as_tensor(boxes, dtype=torch.float32)
    # In this case there is only 1 class
    labels = torch.as_tensor(labels, dtype=torch.int64)
    # Convert the image index to a torch tensor
    img_id = torch.tensor([image_id])
    # Assemble the final dictionary for the photograph under study
    target = {}
    target["boxes"] = boxes
    target["labels"] = labels
    target["image_id"] = img_id
    return target
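generate_box and generate_label are custom helpers defined earlier in the notebook and not visible in this export; sketches consistent with the XML structure shown above:

def generate_box(obj):
    # Extract the four corner coordinates from a <bndbox> element
    xmin = float(obj.find('xmin').text)
    ymin = float(obj.find('ymin').text)
    xmax = float(obj.find('xmax').text)
    ymax = float(obj.find('ymax').text)
    return [xmin, ymin, xmax, ymax]

def generate_label(obj):
    # Single-class task: every annotated object is a person;
    # label 0 is reserved for the background in torchvision detection models
    return 1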
Let's create a MakeDataset class, inheriting from the Dataset class, and define its __init__, __getitem__, and __len__ methods.
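The class definition itself does not survive in this export; a minimal sketch of what it plausibly looks like, assuming images and XML annotations sit in parallel folders with matching sorted names:

from PIL import Image

class MakeDataset(Dataset):
    def __init__(self, image_dir, annotation_dir, transforms=None):
        self.image_dir = image_dir
        self.annotation_dir = annotation_dir
        self.transforms = transforms
        self.images = sorted(os.listdir(image_dir))
        self.annotations = sorted(os.listdir(annotation_dir))

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.image_dir, self.images[idx])).convert('RGB')
        target = generate_target(idx, os.path.join(self.annotation_dir, self.annotations[idx]))
        if self.transforms is not None:
            img = self.transforms(img)
        return img, target

    def __len__(self):
        return len(self.images)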
Let's see in what format the data is stored in the MakeDataset class:
In this case, there are 3 objects of the person class in the photo, so there are 3 bounding
boxes
Demonstration of the output from the DataLoader class: each batch is [batch_size images, batch_size target dicts]
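Each target dict holds a different number of boxes, so the default collate function (which stacks tensors) cannot batch them; the standard trick, and very likely what this notebook uses, is a collate_fn that simply returns tuples (batch_size=4 and the train_dataset name are assumptions):

def collate_fn(batch):
    # Turn [(img1, tgt1), (img2, tgt2), ...] into
    # ((img1, img2, ...), (tgt1, tgt2, ...)) without stacking
    return tuple(zip(*batch))

train_data_loader = DataLoader(train_dataset, batch_size=4,
                               shuffle=True, collate_fn=collate_fn)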
Network configuration:
We will use the Transfer learning approach, training the Faster RCNN network, which has
already been pretrained on COCO
Inside the Faster R-CNN implementation with FPN (Feature Pyramid Network) in the
PyTorch torchvision library, a complex loss function is used that combines several sub-
functions.
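The cell that builds the model is not visible in this export, but given the FastRCNNPredictor import in the first cell, it almost certainly follows torchvision's standard transfer-learning recipe; a sketch:

# Faster R-CNN with ResNet50-FPN backbone, pretrained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

num_classes = 2  # 1 object class (person) + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
# Replace the COCO classification head (91 classes) with a 2-class one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)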
In [18]: # We will train the network on a video card when one is available:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)
model.to(device)
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9, weight_decay=0.0)
num_epochs = 30  # number of training epochs
cuda
In this case, the choice of the SGD optimizer may be due to the following reasons:
Model size and complexity: Faster R-CNN with ResNet50-FPN is a fairly large and complex
model with many trainable parameters, which can lead to rapid overfitting and instability
when using more complex optimizers such as Adam.
Amount and type of data: When training object detectors based on Faster R-CNN, a loss
function is used, which consists of several components, including components related to
object classification and regression. SGD is a classic optimizer that does a good job of
training such models, while Adam, which is a more advanced method, may not produce
optimal results.
Availability of pre-trained weights: In this case, we use pre-trained weights for Faster R-CNN
with ResNet50-FPN, which can simplify the training process and allow the use of a simpler
SGD optimizer instead of Adam.
In [19]: # Create an empty folder in which we will save the trained models
newpath = 'models'
if not os.path.exists(newpath):
    os.makedirs(newpath)
In training mode, the model's forward pass returns loss_dict, a dictionary containing the loss values for each of the Faster R-CNN components used during training. The loss_dict of the Faster R-CNN model includes the following loss components:
1. Loss_objectness is responsible for determining whether the region of the inferred object
contains any object or not (binary classification). To do this, loss_objectness uses the
binary cross-entropy between the network output and the corresponding labels for each
region.
2. Loss_classifier is responsible for determining which class an object belongs to in a
given region. To do this, loss_classifier uses multi-class cross-entropy between the
network output and the corresponding class labels for each region.
3. Loss_box_reg is responsible for how well the model predicts the bounding box
coordinates for a detected object in the image. For this, loss_box_reg uses the
smooth L1 loss between the predicted and ground-truth box coordinates.
4. Loss_rpn_box_reg is responsible for how well the model predicts bounding box
coordinates for the region proposals produced by the Region Proposal Network (RPN)
that may contain objects. For this, loss_rpn_box_reg also uses the smooth L1 loss
between the predicted and ground-truth coordinates.
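The author's train helper is loaded from a separate module and is not shown here; a minimal sketch of how these four components are combined in one training step, following the standard torchvision pattern:

model.train()
for images, targets in train_data_loader:
    images = [img.to(device) for img in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    # In train mode the forward pass returns the loss_dict described above
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())
    optimizer.zero_grad()
    losses.backward()
    optimizer.step()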
Let’s load a custom function and train the network (30 training epochs):
Validation loss after 29 epoch: 0.11717739876359701
TensorBoard logs from training are saved in the results_training directory. I also uploaded them to the dev website so you can view them via the link.
!!! The training results can be viewed by clicking on this link
(https://tensorboard.dev/experiment/rr43qafqQKyKP7CQ5r1RCA/#scalars&_smoothingW
Since the model file is too large (158 MB), the trained model could not be uploaded to
GitHub. So, to run the code yourself with my trained model, you need to run this
function, which downloads the trained networks from my Google Drive into the models
folder:
In [ ]: #download_models(folder_name='models')
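download_models is a custom helper; for reference, a sketch of how such a download can be done with the gdown package (FILE_ID is a placeholder — the real Google Drive id lives inside the author's helper):

import os
import gdown

def download_models(folder_name='models'):
    os.makedirs(folder_name, exist_ok=True)
    # FILE_ID is a placeholder for the real Google Drive file id
    url = 'https://drive.google.com/uc?id=FILE_ID'
    gdown.download(url, os.path.join(folder_name, 'model_human_detection_final.pth'),
                   quiet=False)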
Testing:
Let's run a custom function that displays the model's predictions:
In [ ]: detect_and_visualize(image_input='detect_dataset/images/am3_7_violation_fram
model_path='models/model_human_detection_final.pth',
classes=['person'], plt_show=True)
detect_and_visualize(image_input='detect_dataset/images/am3_9_frame111.jpg',
model_path='models/model_human_detection_final.pth',
classes=['person'], plt_show=True)
To calculate metrics in object detection tasks, two thresholds are usually used: a
classification threshold and a threshold for determining the intersection with the true position
of an object (intersection over union, IoU threshold).
The threshold for classification determines which predictions are considered positive and
which are negative. Usually the threshold is set based on the score that the model produces
for each detected object. If the score exceeds the threshold, then the object is considered
positive, otherwise - negative.
The IoU detection threshold determines how much the detected object overlaps with the
object's true position. Typically, the IoU threshold is set based on specified detection quality
requirements. If the IoU between the detected object and the true position of the object
exceeds the threshold, then the object is considered to be truly detected, otherwise it is
considered to be falsely detected.
Thus, to calculate metrics in object detection tasks, it is necessary to know two thresholds:
the threshold for classification and the threshold for determining IoU. They allow you to
separate positive and negative examples and determine how well the model detects the true
positions of objects.
An important note: the score threshold is used when the model is deployed in production, so it should be chosen especially carefully. The IoU threshold, by contrast, is used only during validation to evaluate several well-known metrics (more on them below).
Let's evaluate the quality of the model using the IOU metric:
In [61]: '''
The first step is to find the IoU scores:
We get an array with the number of elements equal to the number of objects in the dataset,
each element containing a matrix of IoU correspondences between the predicted and real bounding boxes
'''
iou_scores_list = calculate_iou(model, val_dataset, treshold=0.85)
Example 1:
tensor([[0.7746]])
Example 2:
tensor([[0.9480, 0.0432],
[0.0486, 0.8020]])
We obtain IoU matrices containing the IoU coefficients between all pairs of predicted and ground-truth boxes. These values range from 0 to 1, where 0 means the boxes do not overlap at all and 1 means they match exactly.
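calculate_iou is a custom function; matrices of exactly this shape are what torchvision.ops.box_iou produces, so it plausibly sits at its core (the coordinates below are illustrative):

from torchvision.ops import box_iou

# Pairwise IoU between predicted and ground-truth boxes,
# both in [xmin, ymin, xmax, ymax] format
pred_boxes = torch.tensor([[1000.3, 301.6, 1082.3, 514.5]])
true_boxes = torch.tensor([[1005.0, 305.0, 1080.0, 510.0]])
print(box_iou(pred_boxes, true_boxes))  # shape [num_pred, num_true]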
Let's calculate the average IOU for validation at a given threshold score = 0.85
In [63]: val = []
for image in iou_scores_list:
    for detect in image:
        val.append(max(detect))
print(f'Average IOU on validation is: {np.mean(val)}')
Let's visualize what the predicted and real bounding boxes look like on images from the
validation dataset and find the IOU for them:
P.S. In this custom function, all predicted boxes with score > 0 are drawn at once.
Recall for a one-class detection task shows how many of all objects of interest were
detected by the algorithm. That is, the closer the recall value is to 1, the more objects of
interest were found by the algorithm.
Precision for a one-class detection task shows how many of all bounding boxes predicted by
the algorithm actually contain objects of interest. That is, the closer the precision value is to
1, the fewer false objects were predicted by the algorithm.
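The recall and precision helpers used below are not included in this export; a sketch consistent with the artificial example that follows, where rows of the IoU matrix correspond to predicted boxes and columns to ground-truth objects:

def recall(iou_matrix, iou_threshold):
    # A ground-truth object counts as found if some predicted box
    # overlaps it above the threshold (column-wise maximum)
    found = (iou_matrix.max(dim=0).values >= iou_threshold).sum().item()
    return found / iou_matrix.shape[1]

def precision(iou_matrix, iou_threshold):
    # A predicted box counts as correct if it overlaps some
    # ground-truth object above the threshold (row-wise maximum)
    correct = (iou_matrix.max(dim=1).values >= iou_threshold).sum().item()
    return correct / iou_matrix.shape[0]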
In [25]: # Artificial example: only 3 objects actually existed, but 4 were detected
# at the selected score threshold, with the following IoU matrix
# (rows = predicted boxes, columns = ground-truth objects):
score = torch.tensor([[0.20, 0.90, 0.10],
                      [0.80, 0.10, 0.20],
                      [0.00, 0.10, 0.20],
                      [0.10, 0.20, 0.00]])
print('recall (iou=0.1) =', recall(score, iou_threshold=0.1))
print('precision (iou=0.1) =', precision(score, iou_threshold=0.1))
print('recall (iou=0.5) =', recall(score, iou_threshold=0.5))
print('precision (iou=0.5) =', precision(score, iou_threshold=0.5))
print('recall (iou=0.85) =', recall(score, iou_threshold=0.85))
print('precision (iou=0.85) =', precision(score, iou_threshold=0.85))
In [26]: # Calculate the average metrics over the entire dataset at score threshold 0.85
print('Average recall over the entire validation dataset =',
      mean_metric(iou_scores_list, func='recall', iou_treshold=0.5))
print('Average precision over the entire validation dataset =',
      mean_metric(iou_scores_list, func='precision', iou_treshold=0.5))
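mean_metric is likewise a custom helper; a plausible sketch given the recall/precision functions above (the empty-matrix guard is an assumption):

import numpy as np

def mean_metric(iou_scores_list, func='recall', iou_treshold=0.5):
    # Average the chosen per-image metric over the whole validation set
    f = recall if func == 'recall' else precision
    values = [f(m, iou_treshold) for m in iou_scores_list if m.numel() > 0]
    return float(np.mean(values))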
If the detection algorithm identifies many regions that look very much like a real object but do not actually correspond to one (a cluster of predicted boxes around a person, as in YOLO without non-maximum suppression), then recall will be high and precision will be low.
In our case, the detection confidence threshold was chosen to be very high (score = 0.85), so our situation is exactly the opposite: the model is more likely to miss a person entirely than to detect one twice. For this reason, our precision is very high, while recall is lower.
Here AP is the average precision over score thresholds from 0 to 1, as recall varies from 0 to 1. AP takes into account both precision and recall when assessing the quality of an object detection algorithm, and does not depend on the choice of a specific score threshold, which gives a more general assessment of the detection model's quality. Since there is only one class to detect in this problem, mAP (mean AP) = AP.
In [27]: rec = []
prec = []
for i in np.linspace(0, 1, num=11, endpoint=False):
    iou_scores_list = calculate_iou(model, val_dataset, treshold=i)
    rec.append(mean_metric(iou_scores_list, func='recall', iou_treshold=0.5))
    prec.append(mean_metric(iou_scores_list, func='precision', iou_treshold=0.5))
rec.append(0)
prec.append(1)
Building precision_recall_curve:
AP can be defined as the area under the Precision-Recall curve. Let us depict this graph,
constructed when choosing a threshold IoU = 0.5
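With the (recall, precision) points collected above, AP can be estimated numerically, for example with the trapezoidal rule (a sketch; the notebook's exact computation is not shown):

import numpy as np

rec_arr, prec_arr = np.array(rec), np.array(prec)
order = np.argsort(rec_arr)  # integrate in order of increasing recall
ap = np.trapz(prec_arr[order], rec_arr[order])  # area under the PR curve
print(f'AP (IoU=0.5) = {ap}')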
Determining the average AP over the images of the validation dataset at IoU = 0.5:
We trained a rather complex Faster RCNN model and obtained very high quality metrics
during validation.
The reason for such high metrics is that there was a fairly large training dataset for this task, and only 1 class had to be detected, which makes the task quite simple. Most importantly, the entire dataset consists of photographs from a few fixed cameras, so the validation and training photographs are, unfortunately, very similar to each other. Even augmentation does not add much variety to the dataset. So it is worth assuming that on completely different images this pre-trained network will show significantly lower quality.
In [31]: detect_and_visualize(image_input='test_folder/1.jpg',
model_path='models/model_human_detection_final.pth',
classes=['person'], plt_show=True)
detect_and_visualize(image_input='test_folder/2.jpg',
model_path='models/model_human_detection_final.pth',
classes=['person'], plt_show=True)
detect_and_visualize(image_input='test_folder/3.jpg',
model_path='models/model_human_detection_final.pth',
classes=['person'], plt_show=True)
Regarding the procedure for creating a suitable dataset for training the model:
Initially, I annotated the photographs by boxing the entire human figure and labeling the object class (wearing a helmet or not), but this approach did not give good results. Because of the severe class imbalance, I got a very low recall for the no-hardhat class (95% of the photos in the dataset show people in helmets).
I was able to get slightly better results by artificially reducing the number of examples of the hardhat class (thereby shrinking the dataset to 93 photos) and by drawing the boxes not around the human body but only around the head. That is, a box can have one of 2 classes: a head with a helmet on or a bare head.
In [ ]: aug(image_dir="detect_hat_dataset/images",
xml_dir="detect_hat_dataset/annotations",
out_folder='augmented_hat_dataset')
cuda
Training the network:
In [35]: train(model=model, train_data_loader=train_data_loader, optimizer=optimizer,
val_data_loader=val_data_loader,
num_epochs=30, comment=' hardhat detection new', device=device,
save_path='models/model_hardhat_detection.pth')
Since the model file is too large (158 MB), the trained model could not be uploaded to
GitHub. So, to run the code yourself with my trained model, you need to run this
function, which downloads the trained networks from my Google Drive into the models
folder:
In [48]: #download_models(folder_name='models')
Testing:
In [ ]: detect_and_visualize(image_input='detect_hat_dataset/images/am3_6_frame084.j
model_path='models/model_hardhat_detection_final.pth',
classes=['hardhat','no_harhat'], plt_show=True,
treshhold=0.6)
detect_and_visualize(image_input='detect_hat_dataset/images/am3_9_frame090.j
model_path='models/model_hardhat_detection_final.pth',
classes=['hardhat','no_harhat'], plt_show=True,
treshhold=0.6)
detect_and_visualize(image_input='detect_hat_dataset/images/am3_9_violation_
model_path='models/model_hardhat_detection_final.pth',
classes=['hardhat','no_harhat'], plt_show=True,
treshhold=0.6)
In [37]: '''
The first step is to find the IoU scores:
We get an array with the number of elements equal to the number of objects in the dataset,
each element containing a matrix of IoU correspondences between the predicted and real bounding boxes
'''
iou_scores_list = calculate_iou(model, val_dataset, treshold=0.6)
In [38]: val = []
for image in iou_scores_list:
    for detect in image:
        val.append(max(detect))
print(f'Average IOU on validation is: {np.mean(val)}')
Let's determine the mAP metric at different IoU threshold values, as well as the AP values for both classes:
This time we received lower values for quality metrics during validation.
The reason for this may be the small size of the training dataset, the low diversity of the photographs themselves (which makes it difficult for the model to generalize), the high complexity of the task itself, and the very strong imbalance in the prior class probabilities in the dataset. There are very few photographs of people without helmets in the training set, so the recall for the "without helmet" class on validation turned out to be very low, and the AP metric for this class is low as well.
At the same time, the model has learned to find people wearing helmets quite accurately (the metrics for this class are an order of magnitude better). So we can assume that the model is good at finding people with helmets but struggles to find people without one.
Let's try detection on unfamiliar photographs that do not resemble the dataset:
localhost:8888/notebooks/Downloads/main.ipynb 30/32
2/24/24, 9:50 PM main - Jupyter Notebook
In [66]: detect_and_visualize(image_input='test_folder/2.jpg',
                     model_path='models/model_hardhat_detection_final.pth',
                     classes=['hardhat','no_harhat'], plt_show=True, treshhold=0.6)
detect_and_visualize(image_input='test_folder/3.jpg',
                     model_path='models/model_hardhat_detection_final.pth',
                     classes=['hardhat','no_harhat'], plt_show=True, treshhold=0.6)