Object Detection
Abstract. This paper presents an Android system for persons who are blind or visually impaired. It offers object detection in the user's close vicinity, helping blind users recognise everyday objects such as a chair, table, or phone in their environment. The system can also assist a blind person shopping at the supermarket: an RGB colour camera captures images of the immediate area, and deep learning recognises the type and placement of objects in front of the user. Two image databases with a combined total of nine categories are built to train the neural networks: an object detection system based on an SSD network and an object classification system based on a MobileNets network. An Android-TensorFlow interface is used to deploy the object classification network on an Android phone. The Android terminal includes a voice announcement feature that notifies the blind person in real time when an object is recognised. Experiments show the system to be effective for persons with vision impairments.
Keywords: Blind, Deep Learning, Object Detection, Object Classification, Convolutional Neural Network
1. Introduction
Computer vision has always been an active area of research, and in this paper we present a device integrated with an object detection application that runs in real time. Because the project hardware has limited processing power, a lightweight object detection model is required for real-time detection. Available lightweight models include Tiny-YOLO and SSD MobileNet; in this paper, we use SSD MobileNet.
Research background and significance. This topic is a subdivision of the broader field of visual aids for the blind. It aims to use deep-learning-based networks and algorithms from the field of object detection to detect objects encountered in blind people's daily lives and to feed the detection results back to them, helping them improve their day-to-day life. According to statistics on visual impairment and blindness released by the World Health Organization in October 2017, about 253 million people worldwide are visually impaired; 36 million of them are blind, and 217 million have moderate to severe visual impairment. The number of affected people is huge. Vision, hearing, touch, and smell constitute the most important human perception systems; vision is the primary source of human experience and knowledge, and the impact of image information on human perception is far greater than that of other media. Solving the visually impaired's problem of access to visual information has therefore become a pressing need.
Computer vision is one of the burgeoning areas of current deep learning research, and object detection is a crucial part of it. Images have evolved into an essential information medium, and enormous volumes of image data are created every second, so accurately identifying objects in images is becoming more and more important [1]. Object classification evaluates an input image and outputs a series of labels with scores, indicating the categories of interest and the likelihood that such an object appears in the image, thereby determining whether objects of the specified categories are present [2]. Object detection additionally outputs each object's location and extent, as its centre, closed boundary, or a bounding box; the bounding box is the most popular shape representation [3].
The purpose of this system is to help blind people obtain classification information about their specific environment. The Android application's camera captures environmental information and identifies the types of objects present, such as computers, bags, toothbrushes, and other daily necessities. In this way, blind people can recognise the objects surrounding them, enhancing their adaptability and independence in complex environments.
Object classification, detection, and segmentation are the three major tasks in the field of computer vision; this topic involves the first two. Object classification returns, for a single image, the category with the highest confidence, usually corresponding to the most prominent object in the image; object detection additionally returns the categories and locations of all detected objects in the image. The residual network proposed by He et al. is a useful way to deal with the gradient explosion or gradient vanishing issues that can arise when training a traditional network of great depth, and because it performs much better than a regular network model, it was a strong contender in the ILSVRC 2015 and COCO 2015 competitions, winning first place in the detection, localisation, and segmentation tasks. Compared with traditional VGG-Nets, residual networks are much deeper. Andrew G. Howard et al. proposed MobileNets in 2017 [12], a network designed primarily for mobile and embedded devices, considerably increasing inference speed [13] and making it more appropriate for such devices [14].
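The efficiency gain of MobileNets comes from replacing standard convolutions with depthwise-separable ones. A minimal sketch of the parameter-count arithmetic behind that design choice (the layer sizes below are illustrative, not taken from the authors' network):

```python
# Parameter counts: standard convolution vs. the depthwise-separable
# convolution used by MobileNets.

def standard_conv_params(k, c_in, c_out):
    """k x k standard convolution: every output channel sees all input channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise convolution (one filter per input channel)
    followed by a 1x1 pointwise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

# Example layer: 3x3 kernels, 256 input channels, 256 output channels.
std = standard_conv_params(3, 256, 256)        # 589824
sep = depthwise_separable_params(3, 256, 256)  # 67840
print(std, sep, round(std / sep, 1))           # roughly 8.7x fewer parameters
```

The same factoring also cuts the multiply-accumulate count by a similar ratio, which is what makes the network fast enough for phones.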
The output is a border (or "bounding box") and the object's label, which are shown visually on the input image when detection completes; the network is then ready to accept the next image. The Android-TensorFlow interface sends each image captured by the camera directly to the network. In accordance with the returned result, the Android app displays the detected category and its confidence in real time, and a spoken announcement is made for the corresponding category if the confidence is higher than 0.8. The average forward propagation time of the network is 200 ms, so the system achieves real-time detection on the Android side.
Two builds of TensorFlow can be installed: CPU-only TensorFlow and TensorFlow with GPU support. At the time of this project, TensorFlow on Windows supported only Python 3.5.x and 3.6.x. Based on this analysis, the CPU version of TensorFlow was installed on the notebook using the pip command. On the server, so as not to interfere with the other Python packages installed on the system, an independent virtual environment was created and the GPU version of TensorFlow was installed into it.
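The two installations described above can be sketched as follows; this assumes the TensorFlow 1.x era the paper appears to target, when the GPU build shipped as the separate `tensorflow-gpu` package:

```shell
# Notebook: CPU-only TensorFlow, installed system-wide with pip.
pip install tensorflow

# Server: isolated virtual environment so system Python packages
# are untouched, then the GPU build inside it.
python -m venv tf-gpu-env
source tf-gpu-env/bin/activate
pip install tensorflow-gpu
```

On current TensorFlow releases the single `tensorflow` package covers both CPU and GPU, so the second `pip install` line would differ.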
The SSD detector is built on the VGG base network. The system takes the first five convolutional stages of the VGG16 network as the base classification network; the two fully connected layers fc6 and fc7 are converted into convolutional layers by the atrous algorithm, followed by three additional convolutional layers with different channel numbers and a global average pooling layer. Starting from the sixth convolutional stage, the feature map of each layer is used to predict the size and position of the default boxes and the categories of objects; the final network obtains its result via Non-Maximum Suppression. The core of the SSD algorithm is that the feature map of each layer after the sixth stage passes through a detector: for each layer's feature map, the algorithm automatically generates k default boxes according to different sizes (scales) and aspect ratios (ratios).
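The default-box generation described above can be sketched as follows. This is a simplified illustration of the SSD scheme, not the project's exact configuration; the scale and ratio values are example inputs:

```python
import math

def default_boxes(fmap_size, scale, ratios, extra_scale=None):
    """Generate SSD-style default boxes (cx, cy, w, h) in [0, 1] relative
    coordinates for one square feature map: one box per aspect ratio at
    each cell, plus an optional extra square box whose scale is the
    geometric mean of this layer's scale and the next layer's."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size   # cell centre, x
            cy = (i + 0.5) / fmap_size   # cell centre, y
            for r in ratios:
                boxes.append((cx, cy, scale * math.sqrt(r), scale / math.sqrt(r)))
            if extra_scale is not None:
                s = math.sqrt(scale * extra_scale)
                boxes.append((cx, cy, s, s))
    return boxes

# 3x3 feature map, scale 0.5, aspect ratios {1, 2, 1/2} plus the extra
# box -> k = 4 default boxes per cell, 36 in total.
boxes = default_boxes(3, 0.5, [1.0, 2.0, 0.5], extra_scale=0.7)
print(len(boxes))  # 36
```

Each feature map contributes boxes at its own scale, which is how SSD covers objects of different sizes in a single forward pass.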
When the intersection-over-union of a default box and a ground-truth box is greater than the threshold (set to 0.5), the two are matched; matched default boxes are positive samples, and unmatched default boxes are negative samples. This topic uses an SSD object detection network built on the TensorFlow deep learning framework.
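The matching rule above can be sketched directly; boxes here are illustrative corner-form tuples, not the project's data:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(defaults, ground_truths, threshold=0.5):
    """Label each default box positive (True) if it overlaps any
    ground-truth box with IoU above the threshold, else negative."""
    return [any(iou(d, gt) > threshold for gt in ground_truths)
            for d in defaults]

defaults = [(0.0, 0.0, 1.0, 1.0), (0.5, 0.5, 0.6, 0.6)]
gts = [(0.0, 0.0, 0.9, 0.9)]
print(match(defaults, gts))  # [True, False]
```

During training, the positive boxes contribute to both the localization and classification losses, while negatives contribute only to classification.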
Fig. 5. Results of the object detection system: panels (a) and (b).
Figure 5 shows the results of the object detection system. Training was deployed on the server, and the network improved as the number of training iterations increased. After training, single images were tested: for single-object detection the network achieves high confidence, with the confidence for the three test images above 0.99. After completing the single-image evaluation, the network was used for real-time detection. Since real-time detection and video detection use the same method and network calls, for convenience this project only tests the network on an input video stream, without calling the computer camera from the program. Both video stream reading and output use OpenCV.
Frames are sampled from the stream (the detection time of each frame differs, so sampling cannot be driven purely by elapsed time). Voice feedback is then performed on each sampled frame: feedback is given only if the confidence is greater than 0.8; otherwise the system enters the next cycle. This setting greatly increases the accuracy of the voice feedback and makes the whole system more robust.
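The confidence gate on the sampled frames can be sketched as below. The `speak` callback stands in for whatever text-to-speech facility the platform provides, and the detection tuples are illustrative:

```python
CONFIDENCE_THRESHOLD = 0.8  # announcements only above this score

def feedback_for_frame(detections, speak):
    """Announce at most the single best detection in a sampled frame.
    `detections` is a list of (label, confidence) pairs; `speak` is a
    text-to-speech callback supplied by the platform (an assumption).
    Returns the announced label, or None if nothing cleared the gate."""
    if not detections:
        return None
    label, conf = max(detections, key=lambda d: d[1])
    if conf > CONFIDENCE_THRESHOLD:
        speak(label)
        return label
    return None  # below the gate: stay silent, wait for the next cycle

spoken = []
feedback_for_frame([("chair", 0.93), ("table", 0.55)], spoken.append)
feedback_for_frame([("phone", 0.42)], spoken.append)
print(spoken)  # ['chair']
```

Announcing only the top detection per sampled frame keeps the audio channel from flooding the user with low-confidence guesses.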
4. Summary
4.1 Features and Summary of the Design
This project develops an image recognition system for the blind. Drawing on existing deep learning applications in computer vision, this paper surveys the mainstream image detection and image classification algorithms since deep learning entered the field and, according to the actual needs of blind people, builds two systems to help them recognise surrounding objects. The specific results of this study are as follows: 1. Two data sets were produced: one for image detection, containing 1 type of object, 2,635 colour images, and 2,635 .xml default box files; another for image classification, containing 8 classes of objects and 11,200 colour pictures. 2. Two convolutional neural networks were trained on the TensorFlow deep learning framework: an SSD object detection network with 10 convolutional layers and 1 global pooling layer, and a MobileNets object classification network with 27 convolutional layers, 1 average pooling layer, and 1 fully connected layer. 3. The framework was deployed on both the PC and Android sides: the PC side imports the SSD detection network for real-time detection on video, and the Android side imports the MobileNets classification network for real-time detection of surrounding objects through the mobile phone camera. 4. A simple audio library containing 8 types of object voices was made and deployed on the Android side, realising real-time feedback of the detection results.
In the early stage of training, dimension-mismatch errors were constantly reported. The mismatch was mainly because the number of input categories did not match the network's original number of classes; the errors persisted even after num_class in the program was changed to the number of classes to be trained. The solution was to change the original network's structure parameters during fine-tuning so that the last layer is excluded, making the number of network classes match the number of input classes.
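The fix above amounts to restoring every pretrained variable except the final classification layer, whose shape depends on the class count. A minimal sketch of that filtering step; the variable names and shapes below are hypothetical, not taken from the project's checkpoint:

```python
def restorable_variables(checkpoint_vars, exclude_scopes):
    """When fine-tuning with a different num_classes, keep every
    pretrained variable except those under the excluded name scopes
    (the final classification layer).
    `checkpoint_vars` maps variable name -> shape."""
    return {
        name: shape
        for name, shape in checkpoint_vars.items()
        if not any(name.startswith(scope) for scope in exclude_scopes)
    }

# Hypothetical checkpoint of a detector pretrained on 91 classes,
# being fine-tuned here for a different class count.
ckpt = {
    "FeatureExtractor/Conv2d_0/weights": (3, 3, 3, 32),
    "BoxPredictor/ClassPredictor/weights": (1, 1, 512, 91 * 4),
    "BoxPredictor/BoxEncodingPredictor/weights": (1, 1, 512, 16),
}
kept = restorable_variables(ckpt, exclude_scopes=["BoxPredictor/ClassPredictor"])
print(sorted(kept))
```

The surviving variables are restored from the checkpoint while the excluded layer is reinitialised at the new size, so the shapes match the new number of classes.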
The network's detection accuracy for images containing multiple objects is not high. Trying various data augmentation methods did not improve it, so this problem can only be addressed by enlarging the multi-object data set.
5. Conclusion
This paper discussed a real-time object detection system using TensorFlow Lite. The purpose of the system is to help blind people perceive their surroundings, and there are still many areas for improvement. First, the number of training categories can be increased and more data collected to improve the adaptability and robustness of the system's object detection. Second, the system is not sensitive to small objects, so blind users can identify goods only when they are close to the camera; improving small-object detection is left for future work on the system.
References
1. Szegedy C., Toshev A., Erhan D., "Deep Neural Networks for Object Detection", Advances in Neural Information Processing Systems, (2013), 26:2553-2561.
2. Felzenszwalb P. F., Girshick R. B., McAllester D., et al., "Object Detection with Discriminatively Trained Part-Based Models", IEEE Transactions on Pattern Analysis & Machine Intelligence, (2014), 47(2):6-7.
3. Li Xudong, Ye Mao, Li Tao, "A review of object detection research based on a convolutional neural network", Computer Application Research, (2017), 34(10):2881-2886.
4. Khekare G., Wankhade K., Dhanre U., Vidhale B., "Internet of Things Based Best Fruit Segregation and Taxonomy System for Smart Agriculture", in: Verma J. K., Saxena D., González-Prida V. (eds), IoT and Cloud Computing for Societal Good, EAI/Springer Innovations in Communication and Computing, Springer, Cham, (2022). https://doi.org/10.1007/978-3-030-73885-3_4
5. Ren S., He K., Girshick R., et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", IEEE Trans. Pattern Anal. Mach. Intell., (2015), 39(6):1137-1149.
6. Redmon J., Divvala S., Girshick R., et al., "You Only Look Once: Unified, Real-Time Object Detection", (2015):779-788.
7. Liu W., Anguelov D., Erhan D., et al., "SSD: Single Shot MultiBox Detector", (2015):21-37.
8. LeCun Y., Bottou L., Bengio Y., et al., "Gradient-based learning applied to document recognition", Proceedings of the IEEE, (1998), 86(11):2278-2324.
9. Khekare G., Sheikh S., "Autonomous Navigation Using Deep Reinforcement Learning in ROS", IJAIML, 11(2), (2021):63-70. http://doi.org/10.4018/IJAIML.20210701.oa4
10. Simonyan K., Zisserman A., "Very Deep Convolutional Networks for Large-Scale Image Recognition", Computer Science, (2014).
11. He K., Zhang X., Ren S., et al., "Deep Residual Learning for Image Recognition", (2015):770-778.
12. Howard A. G., Zhu M., Chen B., et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", (2017).
13. Temurnikar A., Verma P., Dhiman G., "A PSO Enabled Multi-Hop Clustering Algorithm for VANET", International Journal of Swarm Intelligence Research (IJSIR), 13(2), (2022):1-14. http://doi.org/10.4018/IJSIR.20220401.oa7
14. Yenurkar G. K., Nasare R. K., Chavhan S. S., "RFID based transaction and searching of library books", 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), (2017):1870-1874. doi: 10.1109/ICPCSI.2017.8392040