Visual OCR
Human communication relies heavily on speech and writing. Visually impaired individuals cannot access written text directly, but they can gather the same information through speech. This project reads out the text found in a captured image for visually impaired users. A PC camera is used to take the image, which is then transformed into a scanned image for further processing by the ImageMagick program. The output of ImageMagick, the scanned image, is used as input to the Tesseract OCR (Optical Character Recognition) software, which converts the image to text. A TTS (Text to Speech) engine then converts the text to speech. Experimental studies suggest that such an interpretation of the various captured images is genuinely useful to blind people.
INTRODUCTION
Today, approximately 2-3 percent of the world's population is blind or has low vision. Visually disabled or blind people lack sight and are unable to see objects. Blind people have their own script, known as Braille, but it is difficult to learn and requires extensive practice. They do, however, retain the ability to hear, and this ability lets them receive information from the world around them. To solve the problem of reading text and documents for visually disabled or blind people, we need to build a device that reads a text or script and outputs a sound signal. Blind people can clearly perceive the audio and recognize the text from the sound source. Such a device can benefit many of the world's people who face vision loss.
Visually impaired people represent a significant population segment, currently estimated at tens of millions around the globe. Their integration into society is an important and constant objective, and a great effort has been made to provide an adequate health care system. Various guidance techniques have been developed to assist visually impaired people in living a normal life. Often, these systems are designed only for specific tasks. Nevertheless, they can greatly contribute to the mobility and safety of such people.
Basically, picking the best solution is a trade-off between the performance of the software component and the capabilities of the hardware, so optimum parameter tuning is required. During the indoor movement of a visually impaired person, one of the main objectives of the assistive system is to automatically detect and recognize objects or obstacles and to follow each detection with an acoustic alert.
The vision module for image processing proposed in this system is an integrated part of a platform dedicated to guiding visually impaired people. Moreover, the proposed module can also be used off the shelf, independently of the integrated platform. The vision-based guidance system is designed, developed, validated through experiments and iteratively optimized. The module follows the principle of developing a high-performance yet cost-effective device with practical usage. It uses disruptive technology and allows for updating and the inclusion of new functions.
LITERATURE SURVEY
Smart reader for visually impaired [2018]
Many people suffer from visual disabilities. Written text is an ever-present form of information that is inaccessible to many blind and visually impaired people unless it is represented in a non-visual form such as Braille, so a smart reader answers the need for an effective reading system for the visually impaired. This paper proposes a smart reader system that uses the OCR (Optical Character Recognition) functions of MATLAB to convert images to text, together with a novel audio-tactile user interface that supports the user in reading the information.
Portable computer vision based assisting device for the visually impaired people [2015]
This project presents a complete aid for visually impaired people. The application deals mainly with face detection and recognition for blind users, helping them live independently in their own homes. The project uses a Raspberry Pi B+ module and a 5 MP Raspberry Pi camera module interfaced to the processor through a camera serial interface cable. The process works as follows: when the webcam is activated, it first searches for a face to check whether the subject is a human being or an object. It then checks for the eyes, since the eye search region is one of the important criteria for face detection. Once the eyes are detected, it starts collecting samples for training, and all these files are saved properly. When the collection process is over, the final step is to recognize the face. When an input image is applied, the system checks the above criteria and announces the name of the person through a spoken message. The blind user hears this message through a headset: if the face is known, the system tells the person's name, and if it is unknown, it alerts the user that the person is unauthorized.
Autonomous OCR dictating system for blind people [2016]
In this study, the main idea is the development of an autonomous mobile system for dictating text documents via image processing algorithms for blind people. The system consists of a Raspberry Pi 2B (the mobile processing unit) and a pair of specially designed glasses with an HD camera and a Bluetooth headset. The blind user holds the book open (two pages) with arms stretched straight at eye level; a calibration procedure then takes place in order to capture the best image. A 1-D signal transformation of this image is produced in order to isolate every text line. Finally, every word of each text line is identified via an OCR (Optical Character Recognition) method and the user hears it via a TTS (Text To Speech) procedure.
Smart Glasses for the Visually Impaired People [2016]
People with visual impairment face various problems in their daily lives, as modern assistive devices often do not meet consumer requirements in terms of price and level of assistance. This paper presents a new design of assistive smart glasses for visually impaired students. The objective is to assist in multiple daily tasks using the advantages of a wearable design format. As a proof of concept, the paper presents one example application, text recognition, which can help with reading from hardcopy materials. The building cost is kept low by using the Raspberry Pi 2 single-board computer as the heart of the processing and the Raspberry Pi 2 camera for image capturing. Experimental results demonstrate that the prototype works as intended.
Efficient Portable Camera Based Text to Speech Converter for Blind Person [2019]
Text Reader for Blind Person using a camera module is a portable prototype built with the Raspberry Pi 3B and Python to read text from objects held by a blind person. The paper proposes a better approach to text localization and extraction for detecting text areas in images. Text size is an important factor whose dimensions should be properly selected to make the method more general and insensitive to various font shapes and sizes. The proposed method involves four steps: detection of an object, localization of the text, extraction of the text, and text-to-speech conversion. The region of interest (ROI) is extracted from the cluttered background, and a text localization algorithm is then applied to locate and extract the text. After the text is extracted from the ROI, it is converted into speech. The system works more efficiently with Optical Character Recognition, and a convolutional recurrent neural network (CRNN) is proposed for training on the words separately. The experiments and training are performed on the Synth90k word dataset. Finally, a model combining OCR and the CRNN has been developed.
Raspberry Pi-based reader for blind people [2018]
This paper presents an automatic document reader for visually impaired people, developed on the Raspberry Pi. It uses Optical Character Recognition technology to identify printed characters using image sensing devices and computer programming, converting images of typed, handwritten or printed text into machine-encoded text. In this research, the images are converted into audio output (speech) through the use of OCR and text-to-speech synthesis. The conversion of printed documents into text files is done on the Raspberry Pi using the Tesseract library and Python programming. The text files are then processed with the OpenCV library and Python, and the audio output is produced.
OCR based automatic book reader for the visually impaired using Raspberry Pi [2016]
This paper aims at making open-source audio book software and building a book reader controlled by a Raspberry Pi. The paper presents the Pi Book Reader, which can read a real book aloud and also turn its pages. The overall process involves image-to-text conversion followed by text-to-speech conversion. The image-to-text conversion is carried out with the help of OCR (Optical Character Recognition); OCR technology can be used to convert various kinds of documents such as images, scanned documents and PDF files. The OCR algorithm involves several stages: scanning, preprocessing, feature extraction, classification and recognition. Finally, the eSpeak voice synthesis software is used to convert the text obtained from OCR into speech, which is read aloud by a speaker connected to the Raspberry Pi. The programming language used is Python. The project is practical and of great use to the visually impaired.
Text recognition and face detection aid for visually impaired person using Raspberry Pi [2017]
Speech and text are the main media of human communication. A person needs vision to access the information in a text, but those with poor vision can gather information from voice. This paper proposes camera-based assistive text reading to help visually impaired persons read the text present in a captured image. Faces can also be detected when a person enters the frame, via a mode control. The proposed idea involves extracting the text from the scanned image using the Tesseract Optical Character Recognition (OCR) engine and converting the text to speech with the eSpeak tool, a process which enables visually impaired persons to read the text. This is a prototype that lets blind people recognize products in the real world by extracting the text on an image and converting it into speech. The proposed method is implemented on a Raspberry Pi, and portability is achieved with a battery backup, so the user can carry the device anywhere and use it at any time. Identifying previously stored faces as they enter the camera view, and announcing them, is suggested as a future extension. This technology can help millions of people in the world who experience a significant loss of vision.
Smart specs: Voice assisted text reading system for visually impaired persons using TTS
method [2017]
According to the World Health Organization, around 285 million people out of a population of 7.4 billion are estimated to be visually impaired worldwide. They still find it difficult to manage their day-to-day lives, and it is important to take the necessary measures, with emerging technologies, to help them live in the current world irrespective of their impairments. With the motive of supporting them, the authors propose smart specs for blind persons that can perform text detection and produce voice output, helping visually impaired persons hear any printed text in vocal form. A camera built into the specs captures an image of the printed text, and the captured image is analyzed using the Tesseract Optical Character Recognition (OCR) engine. The detected text is then converted into speech using eSpeak, a compact open-source speech synthesizer, and the synthesized speech is delivered through headphones via the TTS method. The Raspberry Pi is the main implementation target, as it provides an interface between the camera, the sensors and the image processing results, while also managing peripheral units (keyboard, USB, etc.).
Portable camera-based text reading of objects for blind persons [2018]
This paper proposes a camera-based assistive text reading framework to help blind persons read text labels on hand-held objects in their day-to-day lives; self-dependency is very important to blind people. It presents a cost-effective prototype system that helps blind persons shop independently. Printed text is everywhere: product names, instructions on medicine bottles, restaurant menus, signboards and so on, and blind and visually impaired people need help to read it. The paper presents a camera-based assistive product label reader that lets blind persons access product information. Detecting text is hard due to variations in font, size and color, cluttered backgrounds and different orientations. A camera is used to capture an image of the product, and the captured image is then processed internally using algorithms such as text localization and text recognition to extract the text label from the image using MATLAB. The extracted text label is converted to audio output using a text-to-speech converter and pronounced to the blind person.
EXISTING SYSTEM
Existing systems have attempted to ease the burden on visually impaired individuals by proposing various strategies that convert text into audible sound.
Tyflos is a pair of glasses with cameras attached to the sides, headphones and a microphone. Voice commands can be used to direct both the user and the platform: some commands go from the device to the user, such as "bring paper closer", "move paper up" and "move paper up, right", while others go from the user to the device, such as "rewind paragraph", "forward paragraph" and "volume up". However, this speech-based interaction may not work reliably in a noisy environment, restricting the device to indoor use.
FingerReader is another such device, a wearable ring with a camera mounted on the front. Its voice user interface likewise may not work perfectly in noisy surroundings, limiting it to indoor use. The proposed system helps blind persons read products. The project aims to implement a reading aid that is small, lightweight, efficient in its use of computational resources, cost-effective and, of course, user-friendly. The processor-based system can be equipped with a high-resolution webcam, and the microcontroller-based system is easier to use than a mobile one.
OBJECTIVE
The project aims to provide a movement aid and a smart reader device for the blind and visually impaired. The plan defines a vision-based platform for the identification of real-life indoor and outdoor objects to guide visually impaired people, and also for reading out any written text. The software is developed using Python and OpenCV library functions and is eventually ported to a Raspberry Pi (RPi).
The main objectives of the proposed system are:
• To study and understand the existing vision module systems.
• To study the working of different frameworks for the image acquisition system.
• To study how to classify objects using a CNN.
• To find an object's position in the given input frame, or to find the characters written in the frame.
• To map both the detected objects and their positions to a speech output using a text-to-speech converter.
• To analyse the written text and convert it into speech.
PROPOSED SYSTEM
This system provides a voice-assisted text-reading device for visually disabled individuals. The proposed framework uses four distinct modules: a camera module, an image processing module, an optical character recognition module and a text-to-speech module.
Image Capturing
In this stage, the text image is captured using a 5-megapixel Raspberry Pi camera. The captured image is not flawless in shape and scale, nor is it in an acceptable state for text extraction. Thus, the captured image is first passed to the image processing module. The captured picture is in JPG format.
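As an illustration, the capturing step can be scripted in a few lines. This is a minimal sketch, assuming a camera on device index 0 (a USB/PC camera, or the Pi camera exposed through the V4L2 driver); the file name capture.jpg is illustrative.

import cv2

# Minimal image-capturing sketch: grab one frame from camera index 0
# and store it in JPG format, as described above.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)   # request a resolution below the 5 MP maximum
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 960)
ok, frame = cap.read()
cap.release()
if ok:
    cv2.imwrite('capture.jpg', frame)     # the captured picture, in JPG format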
[Figure: Block diagram of the proposed system: object detection → TTS → speaker/headphone.]
Object Detection
Many objects may be present in the captured shot. All objects in the frame are identified by the detection module, but not every object is announced; only the one detected with the highest confidence is read out.
Image Processing
In the image processing stage, unnecessary noise is eliminated using the ImageMagick program. ImageMagick is free and open-source software that provides a variety of tools, of which the proposed technique uses image sharpening and text cleaning. Image sharpening increases the contrast between the light and dark regions of the image, and text cleaning cleans the scanned document so that the final image is more readable for the OCR process.
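For concreteness, this step can be invoked from Python by calling ImageMagick's convert tool. This is only a sketch: the particular options (-sharpen for sharpening, -lat as an approximation of the text-cleaning step) are assumptions, not the exact settings used.

import subprocess

# Pre-processing sketch: grayscale conversion, sharpening and local
# adaptive thresholding to approximate the text cleaning described above.
subprocess.run([
    'convert', 'capture.jpg',
    '-colorspace', 'Gray',    # OCR needs intensity, not colour
    '-sharpen', '0x1',        # raise contrast between light and dark regions
    '-lat', '25x25+10%',      # clean the background around the text
    'scan.png'
], check=True)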
Text Extraction
In this step, the output image of the ImageMagick program is translated into text, i.e. editable data. We used the Tesseract OCR program for this operation: Tesseract detects the text in the captured image after it has been analyzed. The output of the text extraction is a .txt file.
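A sketch of this step, driving the tesseract command-line tool from Python (file names continue the earlier sketches; tesseract writes its result to the given output base name plus a .txt extension):

import subprocess

# Text-extraction sketch: run Tesseract OCR on the cleaned scan image.
# The second argument is the output base name, so the text lands in scan.txt.
subprocess.run(['tesseract', 'scan.png', 'scan'], check=True)

with open('scan.txt', encoding='utf-8') as f:
    extracted_text = f.read()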
Text to Speech
In this step, the extracted text is translated into speech using a speech synthesizer. We used the eSpeak TTS engine and the Google speech synthesizer for this operation. The output of the speech synthesizer is in sound or audio format.
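A minimal sketch of this step with the eSpeak engine, reusing the extracted_text variable from the previous sketch (the -s speaking-rate flag is an illustrative default):

import subprocess

# Speech-synthesis sketch: speak the extracted text aloud with eSpeak.
subprocess.run(['espeak', '-s', '140', extracted_text], check=True)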
METHODOLOGY
OBJECT DETECTION
The video is captured using a camera and then divided into a sequence of frames. Object detection is done using CNN classifiers, and text-to-speech conversion is done using pyttsx3.
The process image acquisition → image processing → acoustic notification is looped for the whole of the person's movement in the indoor environment. Summing the three processing times gives the overall processing time, which determines the acquisition rate for the input image frames. The process needs to be fast enough that potential obstacles can be avoided in time.
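The loop below sketches this acquisition → processing → notification cycle. detect_objects() is a hypothetical placeholder for the CNN classifier, and the timing print shows how the three stages add up to the cycle time that bounds the acquisition rate.

import time
import cv2
import pyttsx3

def detect_objects(frame):
    # Hypothetical stand-in for the CNN classifier: returns (label, score) pairs.
    return []

engine = pyttsx3.init()
cap = cv2.VideoCapture(0)

while True:
    t0 = time.time()
    ok, frame = cap.read()                 # image acquisition
    if not ok:
        break
    detections = detect_objects(frame)     # image processing
    if detections:
        label, _ = max(detections, key=lambda d: d[1])
        engine.say(label)                  # acoustic notification
        engine.runAndWait()
    # acquisition + processing + notification = overall cycle time,
    # which determines the effective frame rate of the system
    print('cycle time: %.2f s' % (time.time() - t0))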
The image processing method is applied to a specific object detection task, more precisely traffic sign recognition. We used the integrated OpenCV function named cv2.matchTemplate, available in the library for the Python version.

[Figure 3: Workflow of the object detection algorithm (start → image capturing → library import → image pre-processing → output to speakers).]

The module addressed the following specifications in the design:
• Required time between two consecutive video frames. We want a low processing time for each template; since the approach is applied at several scales, the summed processing time must be small enough to permit real-time decisions.
• In the multiscale approach, each acquired video frame is down-sampled with various resolution factors, i.e. 5, 3 and 1. For instance, if the source image has an initial resolution of 960×1280 pixels, down-sampling at these three scales leads to three lower-resolution images of the following sizes: 960×1280 pixels, 720×960 pixels and 480×640 pixels. The template is then compared against each such scaled version of the source image (see the sketch after this list). Another aspect considered is how the internal module parameters correlate with the size of the traffic signs and the resolution of the source image. For an overall evaluation, we need to take into account the processing time of each step, starting from image acquisition, through module communication, and ending with the action that triggers the acoustic alert. Once an object is identified, an acoustic message is sent to the user via headphones. In its simplest form, the audio message is approximately 1.2–1.5 s long; this time interval is taken here as a reference.
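The multiscale matching described above can be sketched as follows; the template file, the scale factors and the 0.8 acceptance threshold are illustrative assumptions.

import cv2

# Multiscale template-matching sketch built around cv2.matchTemplate.
template = cv2.imread('sign_template.png', cv2.IMREAD_GRAYSCALE)
frame = cv2.imread('frame.jpg', cv2.IMREAD_GRAYSCALE)

best_score, best_loc, best_scale = 0.0, None, 1.0
for scale in (1.0, 0.75, 0.5):   # e.g. 960x1280 -> 720x960 -> 480x640
    resized = cv2.resize(frame, None, fx=scale, fy=scale)
    if resized.shape[0] < template.shape[0] or resized.shape[1] < template.shape[1]:
        continue                  # template no longer fits at this scale
    result = cv2.matchTemplate(resized, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val > best_score:
        best_score, best_loc, best_scale = max_val, max_loc, scale

if best_score > 0.8:              # illustrative confidence threshold
    print('traffic sign found at', best_loc, 'in scale', best_scale)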
TEXT RECOGNITION
Image capturing: The first step is to capture an image of the document or book, which is placed under the camera. The camera used to capture the image is a PC camera.
Image pre-processing: Image pre-processing removes unwanted noise from the image by applying an appropriate threshold. It is used for correcting skew angles, sharpening the image, thresholding and segmentation.
Text extraction: In our project we use the Tesseract OCR engine to extract the recognized text.
Text to speech: After extraction, the text is converted into speech by the text-to-speech synthesizer. In the last stage, the speech output is delivered through the speaker or headphones.
[Figure: Text recognition workflow (start → image captured/loaded? → image pre-processing → text extraction → convert to speech → speech output → end).]
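The whole flow in the figure above can be strung together in a short script. This is a sketch under the same assumptions as the earlier snippets: file names are illustrative, and the tools are those named in this section.

import subprocess
import cv2

# End-to-end text-recognition sketch: capture -> pre-process -> OCR -> speech.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()            # is the image captured/loaded?
cap.release()
if ok:
    cv2.imwrite('page.jpg', frame)
    subprocess.run(['convert', 'page.jpg', '-colorspace', 'Gray',
                    '-sharpen', '0x1', 'page_clean.png'], check=True)   # pre-processing
    subprocess.run(['tesseract', 'page_clean.png', 'page'], check=True) # text extraction
    with open('page.txt', encoding='utf-8') as f:
        text = f.read().strip()
    if text:
        subprocess.run(['espeak', text], check=True)                    # speech output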
FUNCTIONAL REQUIREMENTS
• The device must be enabled or disabled by the user.
• The device should be able to capture an image from the mounted camera.
• The device should be able to convert the received text data to speech.
• The device should be able to direct audio to headphones connected to the audio jack.
• The GUI should stream the live video feed without severe lag.
NONFUNCTIONAL REQUIREMENTS
USABILITY
The system must be easy to learn, both for users of the device and for the helpers who are the users of the GUI.
RELIABILITY
The reliability of the device essentially depends on the software tools (OpenCV,
Text-to-Speech etc.) and hardware tools (camera, computer etc.) used for the system
development.
PERFORMANCE
Image data loading through the program and live streaming make performance measures crucial.
For the desired performance, image capturing, transferred data size, connection speed and processing time all need to be considered.
The system should work in real time, meaning any time delay should stay within an acceptable bound.
Image processing should be optimized so that it takes no more than 2 seconds.
The program should not process every frame; it should decide whether or not to process each frame (see the sketch below).
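One simple way to satisfy the last two requirements is to run the heavy pipeline only on every Nth frame. The sketch below assumes N = 10, a value to be tuned against the 2-second budget.

import cv2

# Frame-skipping sketch: read every frame to keep the stream live, but
# hand only every Nth frame to the expensive detection/OCR pipeline.
PROCESS_EVERY_N = 10   # assumed value; tune against the 2-second budget
cap = cv2.VideoCapture(0)
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    count += 1
    if count % PROCESS_EVERY_N != 0:
        continue       # skip this frame to stay real-time
    # ... run detection / OCR on this frame only ...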
SUPPORTABILITY
The system shall allow the system administrator to add additional features. The system needs to be cost-effective to maintain.
SAFETY
In case of a malfunction, the system should shut itself down and reboot in order to prevent unpredictable results.
CONCLUSION
We have implemented a voice-assisted text reading and object recognition system. Our methodology processes the captured image and reads it out clearly. The device's output is voice, so it is easy for visually impaired people to listen to. The system is an efficient device and economically helpful for blind people; it is useful in schools and colleges for the blind and can also be seen as an application of artificial intelligence. The device is helpful for illiterate people as well, is compact in size, and is very useful to society.
REFERENCES
[1] Musale, Sandeep & Ghiye, Vikram. (2018). Smart reader for visually impaired. 339-342.
10.1109/ICISC.2018.8399091.
[2] Monisha, M. & Nandhini, A. (2015). Portable computer vision based assisting device for the visually impaired people. International Journal of Applied Engineering Research. 10. 14379–14387.
[3] Liambas, Christos & Saratzidis, Miltiadis. (2016). Autonomous OCR dictating system for blind people.
172-179. 10.1109/GHTC.2016.7857276.
[4] Ali, Maghfirah & Tang, Tong Boon. (2016). Smart Glasses for the Visually Impaired People. 9759. 579–582. 10.1007/978-3-319-41267-2_82.
[5] Shah, Trupti & Parshionikar, Sangeeta. (2019). Efficient Portable Camera Based Text to Speech
Converter for Blind Person. 353-358. 10.1109/ISS1.2019.8907995.
[6] A. Goel, A. Sehrawat, A. Patil, P. Chougule, and S. Khatavkar, “Raspberry pi-based reader for blind
people,” 2018.
[7] S. Aaron James, S. Sanjana, and M. Monisha, “OCR based automatic book reader for the visually
impaired using raspberry pi,” International Journal of Innovative Research in Computer and
Communication, vol. 4, no. 7, 2016.
[8] M. Rajesh, B. K. Rajan, A. Roy, K. A. Thomas, A. Thomas, T. B. Tharakan, and C. Dinesh, “Text
recognition and face detection aid for visually impaired person using raspberry pi,” in 2017 International
Conference on Circuit, Power and Computing Technologies (ICCPCT). IEEE, 2017, pp. 1–5.
[9] R. Ani, E. Maria, J. J. Joyce, V. Sakkaravarthy, and M. Raja, “Smart specs: Voice assisted text reading
system for visually impaired persons using TTS method,” in 2017 International Conference on
Innovations in Green Energy and Healthcare Technologies (IGEHT). IEEE, 2017, pp. 1–6.
[10] S. I. Shirke and S. V. Patil, “Portable camera-based text reading of objects for blind persons,” International Journal of Applied Engineering Research, vol. 13, no. 17, pp. 12995–12999, 2018.