Sinhala OCR To Produce Rich Text: Punchihewa D.H.T. Nishshanka N.M.J.W. Samaranayake M.M.U.C


Sinhala OCR to produce Rich Text

Punchihewa D.H.T.
Nishshanka N.M.J.W.
Samaranayake M.M.U.C.

Supervisor: Dr. Roshan G. Ragel

Outline

1 Introduction
2 Present status of OCR
3 Our Target
4 Our Plan
5 Finally...

What is an OCR System?

Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.
- Wikipedia

Sinhala OCR to produce Rich Text

This is a software solution for:
Recognizing Sinhala characters and encoding them into Unicode.
Detecting the layout of the document and preserving it.

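The encoding step can be illustrated with a minimal Python sketch. It assumes a recognizer that emits one label per glyph; the label names and the `encode_labels` helper are hypothetical and are not taken from the project. Only the code points themselves are fixed: Sinhala occupies the Unicode block U+0D80 to U+0DFF.

```python
# Sinhala Unicode block: U+0D80 .. U+0DFF.
SINHALA_BLOCK = range(0x0D80, 0x0E00)

# Hypothetical recognizer labels mapped to Sinhala code points.
LABEL_TO_CODEPOINT = {
    "sa": 0x0DC3,          # letter sa
    "ha": 0x0DC4,          # letter ha
    "la": 0x0DBD,          # letter la
    "sign_i": 0x0DD2,      # dependent vowel sign i
    "anusvaraya": 0x0D82,  # anusvaraya sign
}


def encode_labels(labels):
    """Turn a sequence of recognizer labels into a Unicode string."""
    chars = []
    for label in labels:
        code_point = LABEL_TO_CODEPOINT[label]
        if code_point not in SINHALA_BLOCK:
            raise ValueError(f"{label!r} maps outside the Sinhala block")
        chars.append(chr(code_point))
    return "".join(chars)


# The five labels below spell the word "Sinhala" in Sinhala script.
word = encode_labels(["sa", "sign_i", "anusvaraya", "ha", "la"])
```

A real system would need a much larger table, and it would have to emit dependent vowel signs in logical order (after the consonant they modify) even when they are drawn to its left on the page.
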
Why is this important?

To convert books and documents into electronic files.
To computerize a record-keeping system in an office.
To publish the text on a website.

OCR makes it possible to:
Edit the text
Search for a word or phrase
Apply text-to-speech
Apply text mining

Current work in the field

Leading software:
Microsoft OneNote 2007
SimpleOCR
TopOCR
freeOCR

Many of these:
Produce only plain text.
Do not preserve font styling.
Do not detect tables.
Support only two or three image file types.

Available Sinhala OCR

Sinhala OCR
A tool from UCSC
Only outputs a text file without formatting

TerpWord
A tool from UOP
Only supports certain levels of formatting

What we are going to do about it

Support many image types
Detect formatting per word rather than per line
Detect tables
Detect bullets
Detect columns
Detect colors
Provide all these facilities as a web service

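Detecting formatting per word first requires finding word boundaries inside a text line. The slides do not say which method the project uses; the sketch below shows one common approach, a vertical projection profile, in plain Python. The `split_words` name, the `min_gap` threshold, and the 0/1 input format are assumptions made for illustration.

```python
def split_words(line, min_gap=2):
    """Split a binarized text line into word column ranges.

    `line` is a list of pixel rows, each a list of 0/1 values (1 = ink).
    Returns (start, end) column pairs with `end` exclusive.
    """
    width = len(line[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in line) for x in range(width)]

    words, start, gap = [], None, 0
    for x, ink in enumerate(profile):
        if ink:
            if start is None:
                start = x          # first ink column of a new word
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # blank run wide enough to end the word
                words.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:          # word that runs to the right edge
        words.append((start, width - gap))
    return words


line = [
    [1, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 1, 1, 0],
]
print(split_words(line))  # [(0, 4), (6, 9)]
```

Each returned range can then be cropped and analyzed on its own, for example to estimate stroke thickness (bold) or slant (italic) for that word only. A fixed `min_gap` is a simplification; in practice the threshold would likely depend on the font size.
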
Task distribution

Reviewing the current TerpWord code base
Gathering related knowledge
Devising a way to recognize words and their formatting
Implementing those features in phase one
Debugging the whole application
Developing a way to detect tables
Developing a way to detect colors
Carrying punctuation marks from the image over to the rich text
Recognizing English characters in the text
Integrating it with a Java text editor
Developing a web service as a web version of this application

Technologies and needed resources

Image processing using MATLAB
Text editor integration using Java
Version control using Mercurial
Eclipse for code editing

Final outcome

A text editor capable of converting images of Sinhala documents while preserving all page formats and styles, and delivering a *.rtf file.
A web service that provides all these features.

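To make the *.rtf output concrete, here is a minimal Python sketch of writing styled Sinhala text as RTF. It is not the project's implementation; `rtf_escape` and `to_rtf` are hypothetical helpers. The RTF constructs it relies on are standard: `\uN?` escapes a UTF-16 code unit as a signed 16-bit decimal followed by a one-character fallback, `\uc1` declares that fallback length, and `{\b ...}` is a bold group.

```python
def rtf_escape(text):
    """Escape text for RTF, writing non-ASCII characters as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)      # RTF control characters
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Emit one \uN per UTF-16 code unit, N as a signed 16-bit value.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append("\\u%d?" % n)
    return "".join(out)


def to_rtf(runs):
    """Build an RTF document from (text, is_bold) runs."""
    body = []
    for text, bold in runs:
        escaped = rtf_escape(text)
        body.append("{\\b " + escaped + "}" if bold else escaped)
    return "{\\rtf1\\ansi\\uc1 " + "".join(body) + "}"


# U+0DC3 (Sinhala letter sa) in bold, followed by plain ASCII text.
document = to_rtf([("\u0dc3", True), (" ok", False)])
print(document)  # {\rtf1\ansi\uc1 {\b \u3523?} ok}
```

Paragraph breaks, font tables, colors, and tables are left out here; a complete writer would add the corresponding RTF control words for each detected style.
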
Q&A

Thank You!
