Abstract— In this research, a tool that provides information about surrounding objects is developed. The tool can also estimate the distance of a detected object through a camera combined with glasses, to assist the blind people who use it. It can certainly help them identify the objects around them and improve their skill and ability. The tool uses a camera as the main sensor, which works like the human eye, to provide real-time video as visual data. The RGB visual data of 176x132 pixels is processed using a Convolutional Neural Network by convolving it two times. This produces a smaller feature map of 41x33 pixels, from which weights are obtained for classification using backpropagation and a predetermined dataset. After the detection result is obtained, the next step is to find the centroid value as the center point for measuring the distance between the object and the cameras with Stereo Vision. The result is converted into sound and connected to earphones, so blind people can hear the information. The test results show that this tool can detect the predetermined objects, namely humans, tables, chairs, cars, bicycles, and motorbikes, with an average accuracy of 93.33%. For distance measurements between 50 cm and 300 cm it has an error of around 6.1%.

Keywords— Blind, Centroid, Convolutional Neural Network.

I. INTRODUCTION

Nowadays, object recognition makes essential use of Machine Learning methods. To improve its performance, we can collect larger datasets, learn stronger models, and use better techniques to prevent overfitting. In reality, however, objects show considerable variability, so recognizing them requires a larger training set [1]. Datasets containing small numbers of images do have drawbacks, but it is now possible to collect large amounts of data. Such datasets include hundreds of thousands of fully segmented images, or more than 15 million high-resolution images labeled in more than 22,000 categories. To learn about thousands of objects from millions of images, we need a model with a large learning capacity. A CNN has far fewer connections and parameters, so it is easier to train, and with recent data collections that contain sufficient data it can be trained on these objects without severe overfitting [2]. This network contains two convolutional layers and three fully connected layers, and this depth seems important; we find that eliminating any convolutional layer produces lower performance.

In this paper, we train an object detection and distance aid for blind people, because blind people have difficulties identifying the environment around them. Several earlier studies on object recognition for blind people have been done. One of them is object detection for the blind that uses SIFT to identify and locate objects in a video scene and converts the output into sound, but the object label must face the camera [3], which makes it very hard for blind people to locate the label efficiently. The second study also uses SIFT and can identify objects in a picture containing many objects [4]. In those studies, the object detection result only provides information about the kind of object or converts it into sound. Based on this, our innovation is to build object detection for blind people with stereo vision added to the tool. Object detection and classification use the CNN method with two convolution processes. The object distance is also measured using stereo vision. The result is connected to an earphone in the form of sound.

Our model has several contributions. First, we define object detection as a regression problem for the environment around the user, which consists of many objects; each predicted box has a distance value measured from the user's position. The second main contribution is combining object detection using CNN with stereo vision for object distance measurement. The third contribution is converting the results of the classification and distance measurement into sound, so the user can hear the objects around him.

II. METHODOLOGY

The purpose of this research is to create a device that can detect an object and determine its distance using the Convolutional Neural Network method. The camera is used as the main sensor to get the image input. The next stages of this research are explained by the flowchart in Figure 1.
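As a concrete illustration of the processing described above, the following is a minimal sketch of a two-convolution CNN in Python (Keras). The 176x132 RGB input, the two convolution stages, the backpropagation training, and the six object classes come from the text; the kernel sizes, channel counts, the 256-unit hidden layer, the optimizer, and the loss are assumptions made only for illustration, not the authors' exact architecture.

```python
# Minimal sketch of the two-convolution CNN described in the paper, in Keras.
# Kernel sizes, channel counts and the dense layers are assumptions; the
# 176x132 RGB input and the two convolution stages come from the text.
from tensorflow.keras import layers, models

NUM_CLASSES = 6  # human, table, chair, car, bicycle, motorbike

model = models.Sequential([
    layers.Input(shape=(132, 176, 3)),            # RGB frame, 176x132 pixels
    layers.Conv2D(16, (5, 5), activation="relu"),
    layers.MaxPooling2D((2, 2)),                  # first convolution stage
    layers.Conv2D(32, (5, 5), activation="relu"),
    layers.MaxPooling2D((2, 2)),                  # second stage, ~41x30 feature map
    layers.Flatten(),
    layers.Dense(256, activation="relu"),         # "Hidden Neuron 256" configuration
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Backpropagation training on the predetermined dataset (cross-entropy loss).
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

With 5x5 kernels and 2x2 pooling this downsampling roughly matches the 41x33 feature-map size reported in the abstract, but the exact choices are not given in the excerpt.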
Fig 5. Stereo Vision Concept Illustration

From Figure 5, the stereo vision formula is obtained so that the distance of the object can be estimated [10]. The equation used to measure the distance of an object to the camera is explained as follows:

fl = focal length of the left camera
fr = focal length of the right camera
T  = distance between the two camera center points
Xl = centroid point X of the left camera
Xr = centroid point X of the right camera

The difference between the centroid X points of the two frames is given in equation (3), and it is used as the divisor when measuring the distance in equation (4).

d = Xl − Xr     (3)
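Equation (4) itself is not reproduced in this excerpt, so the short sketch below uses the standard pinhole stereo relation, distance = f·T/d, with the disparity d from equation (3) as the divisor. Averaging the two focal lengths and all numeric values are assumptions for illustration only.

```python
# Sketch of the stereo distance estimate built from the variables above.
# The standard relation Z = f * T / d is assumed for equation (4); the
# focal lengths are averaged and the example numbers are placeholders.

def object_distance(xl, xr, fl, fr, baseline_t):
    """Estimate distance (same unit as baseline_t) from the centroid X
    coordinates of the detected object in the left and right frames."""
    d = xl - xr                      # equation (3): disparity of the centroids
    if d == 0:
        raise ValueError("zero disparity: object too far or not matched")
    f = (fl + fr) / 2.0              # assume both focal lengths are in pixels
    return f * baseline_t / d        # standard stereo relation (assumed eq. 4)

# Example with placeholder values: 700 px focal length, 6 cm baseline,
# 30 px disparity -> about 140 cm.
print(object_distance(xl=350, xr=320, fl=700, fr=700, baseline_t=6.0))
```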
F. Hardware

The main hardware used is a Mini PC, cameras, and a battery. The tool is divided into two parts. The first part is the computer for the main algorithm processing. The second part is the camera placement: the cameras are combined with the glasses so that the system works like the eye, as in Figure 8.

Fig 7. Glasses with Camera

In Figure 8, the cameras are placed on the right and left sides of the glasses. There are two holes, each of which holds a camera. This part is designed to resemble glasses so that it is comfortable to wear.
A. CNN Test Result

This test detects, in real time, all of the objects that have been defined in the dataset. The test results are described below.

In Figure 10, testing is carried out on a chair made of plastic, whereas in Figure 11 the table used for testing is small and made of wood. The results of human detection show no obstacles, as seen in Figure 12, and the last object that can be detected is a bicycle, as seen in Figure 13. Figure 14 shows the configuration results using Hidden Neuron 256. In the graph, the accuracy with Hidden Neuron 256 becomes more stable starting at epoch 10, and by epoch 60 the value remains stable within a range close to 1.
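The training run behind this curve is not listed in the excerpt; the sketch below only illustrates how the Hidden Neuron 256 configuration might be trained for 60 epochs. Here `model` is the earlier two-convolution sketch, and the optimizer, batch size, and the x_train/y_train arrays are assumptions.

```python
# Illustrative training run for the "Hidden Neuron 256" curve: 60 epochs,
# accuracy tracked per epoch. The batch size and the x_train / y_train /
# x_val / y_val arrays are assumptions; `model` is the sketch shown earlier.
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=60,            # accuracy reported as stable from about epoch 10 onward
    batch_size=32,
)

# Accuracy per epoch, as plotted in Figure 14.
for epoch, acc in enumerate(history.history["accuracy"], start=1):
    print(f"epoch {epoch:2d}: accuracy = {acc:.3f}")
```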
TABLE 1. HUMAN DISTANCE MEASUREMENT RESULT
No   Meter (cm)   Camera (cm)   Error (%)
9    210          236.4         12.5
10   230          241.2          4.8
11   250          275.3         10.1
12   270          316.2         17.1
Average Error Percentage = 5.3%

TABLE 2. CHAIR DISTANCE MEASUREMENT RESULT
No   Meter (cm)   Camera (cm)   Error (%)
1     50           56.9         13.8
2     70           71.6          2.3
3     90           82.1          8.8
4    110          108.3          1.5
5    130          134.5          3.5
6    150          159.2          6.1
7    170          177.3          4.3
8    190          196.6          3.5
9    210          246.3         17.3
10   230          236.4          2.8
11   250          278.6         11.4
12   270          288.9          7.0
Average Error Percentage = 6.9%

TABLE 3. STEREO VISION ERROR AVERAGE
Object   Error (%)
Human    5.3
Chair    6.9
Average Error Percentage = 6.1%
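The error columns and the table averages above follow the usual relative-error formula, error (%) = |camera − meter| / meter × 100, and averaging the 5.3% and 6.9% table results gives the overall 6.1% in Table 3. The short sketch below, written only as a check, reproduces the Table 2 values and its 6.9% average.

```python
# Relative error per row and table average, as used in Tables 1-3:
# error(%) = |camera - meter| / meter * 100. The values below are the
# twelve chair measurements from Table 2; the average comes out to ~6.9%.
rows = [(50, 56.9), (70, 71.6), (90, 82.1), (110, 108.3),
        (130, 134.5), (150, 159.2), (170, 177.3), (190, 196.6),
        (210, 246.3), (230, 236.4), (250, 278.6), (270, 288.9)]

errors = [abs(camera - meter) / meter * 100 for meter, camera in rows]
for (meter, camera), err in zip(rows, errors):
    print(f"{meter:3d} cm -> {camera:5.1f} cm  error {err:4.1f}%")

print(f"average error = {sum(errors) / len(errors):.1f}%")  # about 6.9%
```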
IV. CONCLUSION

2. The performance of the CNN system when detecting objects in the form of cars, tables, chairs, bicycles, humans, and motorbikes has its own characteristics for each object viewed from various directions, and some detection results are still unsuitable. Nevertheless, the overall detection success rate is quite high at 93.33%.

3. The Stereo Vision measurements have an average error of around 6.1% when the object is located directly between the two cameras. Stereo Vision has a weakness: if the detected object is not directly in front of and between the two cameras, the measurement cannot be carried out properly. Stereo Vision also has a limited measurement range; distance measurement only works in the range of 50 cm to about 300 cm.
REFERENCES

[3] "… using portable camera," IEEE WCTFTR 2016 - Proc. 2016 World Conf. Futur. Trends Res. Innov. Soc. Welf., pp. 3–6, 2016.
[4] H. Jabnoun, F. Benzarti, and H. Amiri, "Visual substitution system for blind people based on SIFT description," 6th Int. Conf. Soft Comput. Pattern Recognition, SoCPaR 2014, pp. 300–305, 2015.
[5] Q. Zhang, M. Zhang, T. Chen, Z. Sun, Y. Ma, and B. Yu, "Recent advances in convolutional neural network acceleration," Neurocomputing, vol. 323, pp. 37–51, 2019.
[6] L. Sroba, J. Grman, and R. Ravas, "Impact of Gaussian noise and image filtering to detected corner points positions stability," 2017 11th Int. Conf. Meas., Meas. 2017 - Proc., vol. 10, no. 1, pp. 123–126, 2017.
[7] A. Khumaidi, E. M. Yuniarno, and M. H. Purnomo, "Welding defect classification based on convolution neural network (CNN) and Gaussian kernel," 2017 Int. Semin. Intell. Technol. Its Appl. (ISITIA), pp. 261–265, 2017.
[8] J. Nagi et al., "Max-pooling convolutional neural networks for vision-based hand gesture recognition," 2011 IEEE Int. Conf. Signal Image Process. Appl. (ICSIPA), pp. 342–347, 2011.
[9] K. Selvakumar, S. Prabu, and L. Ramanathan, "Centroid neural network based clustering technique using competitive learning," 6th Int. Conf. Cond. Monit. Mach. Fail. Prev. Technol. 2009, vol. 1, pp. 389–397, 2009.
[10] I. Marzuqi, G. P. Arinata, Z. M. A. Putra, and A. Khumaidi, "Segmentasi dan Estimasi Jarak Bola dengan Robot Menggunakan Stereo Vision," pp. 140–144, 2017.
[11] M. Syai'in et al., "Smart-meter based on current transient signal signature and constructive backpropagation method," 2014 1st Int. Conf. Inf. Technol. Comput. Electr. Eng., 2014.