Sinhala OCR to Produce Rich Text
Punchihewa D.H.T., Nishshanka N.M.J.W., Samaranayake M.M.U.C.
Supervisor: Dr. Roshan G. Ragel

Outline
1 Introduction
2 Present status of OCR
3 Our Target
4 Our Plan
5 Finally...

What is an OCR System?
"Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text."
−Wikipedia

Sinhala OCR to Produce Rich Text
This is a software solution for:
Recognizing Sinhala characters and encoding them into Unicode.
Detecting the layout of the document and preserving it.

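Encoding recognized characters into Unicode takes more than a lookup from glyph to code point. In Sinhala, the vowel signs kombuva (U+0DD9) and kombu deka (U+0DDB) are drawn to the left of the consonant they modify, but Unicode stores them after that consonant. The Java sketch below illustrates this reordering step. It assumes the recognizer already yields one code point per glyph in left-to-right visual order. The class and method names are hypothetical. Split vowel signs such as U+0DDC, which have parts on both sides of the consonant, are deliberately not handled.

```java
public class SinhalaEncoder {

    // Vowel signs drawn entirely to the LEFT of their consonant
    // but stored AFTER it in Unicode (logical) order.
    static boolean isPreBaseVowelSign(int cp) {
        return cp == 0x0DD9   // SINHALA VOWEL SIGN KOMBUVA
            || cp == 0x0DDB;  // SINHALA VOWEL SIGN KOMBU DEKA
    }

    // True for code points in the Unicode Sinhala block (U+0D80..U+0DFF).
    static boolean isSinhala(int cp) {
        return Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.SINHALA;
    }

    /**
     * Converts code points listed in visual (left-to-right) order into
     * Unicode logical order by moving each pre-base vowel sign behind the
     * Sinhala character that follows it. Other code points pass through.
     */
    public static String toLogicalOrder(int[] visual) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < visual.length; i++) {
            int cp = visual[i];
            if (isPreBaseVowelSign(cp) && i + 1 < visual.length && isSinhala(visual[i + 1])) {
                out.appendCodePoint(visual[i + 1]); // base consonant first
                out.appendCodePoint(cp);            // then the vowel sign
                i++;                                // consonant already emitted
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Visual order: kombuva, ka, ka, ketti is-pilla
        int[] visual = {0x0DD9, 0x0D9A, 0x0D9A, 0x0DD2};
        String text = toLogicalOrder(visual);
        text.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

Running main prints U+0D9A U+0DD9 U+0D9A U+0DD2. The kombuva now follows its consonant, while the ketti is-pilla, which is already drawn after its consonant, is left in place.
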
Why is this important?
To convert books and documents into electronic files.
To computerize a record-keeping system in an office.
To publish the text on a website.

Present status of OCR
Sinhala OCR
A tool from UCSC
Only outputs a text file, without formatting
TerpWord
A tool from UOP

Our Target
Support many image file types
Detect formatting word by word rather than line by line
Detect tables
Detect bullets
Detect columns
Detect colors
Provide all these facilities as a web service

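Column detection is one of the targets above. One common approach is a vertical projection profile. It is used here only as an illustration and is not necessarily the method the project will adopt. The idea is to count the dark pixels in each pixel column of a binarized page and to treat sufficiently wide runs of empty columns as gutters between text columns. The sketch makes two assumptions. First, the page is already binarized into a boolean[][] array (true = ink). Second, minGap, the minimum gutter width in pixels, is a tuning parameter. Both the class name and that parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnDetector {

    /**
     * Splits a binarized page into text columns using a vertical projection
     * profile. ink[y][x] is true where a pixel is dark. Returns one
     * {startX, endX} pair (end exclusive) per detected column.
     */
    public static List<int[]> detectColumns(boolean[][] ink, int minGap) {
        int width = ink[0].length;

        // Projection profile: number of ink pixels in each pixel column.
        int[] profile = new int[width];
        for (boolean[] row : ink)
            for (int x = 0; x < width; x++)
                if (row[x]) profile[x]++;

        List<int[]> columns = new ArrayList<>();
        int start = -1; // left edge of the column being tracked, -1 if none
        int blank = 0;  // consecutive empty pixel columns since last ink
        for (int x = 0; x < width; x++) {
            if (profile[x] > 0) {
                if (start < 0) start = x;
                blank = 0;
            } else if (start >= 0 && ++blank >= minGap) {
                // Gap is wide enough: close the column at the last ink pixel.
                columns.add(new int[]{start, x - blank + 1});
                start = -1;
                blank = 0;
            }
        }
        if (start >= 0) columns.add(new int[]{start, width - blank});
        return columns;
    }
}
```

The same profile taken horizontally inside each column would separate text lines. On a skewed scan the gutters blur, so in practice deskewing would normally come first.
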
Task distribution
Reviewing the current TerpWord code base
Gathering related knowledge
Devising a way to recognize words and their formatting
Implementing those features in phase one
Debugging the whole application
Developing a way to detect tables
Developing a way to detect colors
Carrying punctuation marks from the image into the rich text
Recognizing English characters in the text
Integrating it with a Java text editor
Developing a web service as a web version of this application

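For the color-detection task, one simple starting point is to average the RGB values of the dark "ink" pixels inside each recognized word's bounding box. This approach is an assumption for illustration and is not taken from the plan. The sketch uses java.awt.image.BufferedImage. The bounding box coordinates and the luminance threshold are hypothetical inputs. It also assumes dark or saturated text on a light background, so very light text, such as yellow on white, would be missed.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

public class WordColor {

    /**
     * Estimates the text color of one word by averaging the RGB values of
     * "ink" pixels (luma below lumThreshold) inside its bounding box.
     * Assumes the box lies within the image. Returns null if no ink is found.
     */
    public static Color inkColor(BufferedImage img, int x0, int y0, int w, int h,
                                 double lumThreshold) {
        long r = 0, g = 0, b = 0, n = 0;
        for (int y = y0; y < y0 + h; y++) {
            for (int x = x0; x < x0 + w; x++) {
                int rgb = img.getRGB(x, y);
                int pr = (rgb >> 16) & 0xFF, pg = (rgb >> 8) & 0xFF, pb = rgb & 0xFF;
                // ITU-R BT.601 luma weights
                double lum = 0.299 * pr + 0.587 * pg + 0.114 * pb;
                if (lum < lumThreshold) {
                    r += pr; g += pg; b += pb; n++;
                }
            }
        }
        if (n == 0) return null;
        return new Color((int) (r / n), (int) (g / n), (int) (b / n));
    }
}
```

The resulting color could then be entered in the color table (\colortbl) of the RTF output so that the word keeps its color in the editor.
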
Technologies and needed resources
Image processing using MATLAB
Text editor integration using Java
Version control using Mercurial
Eclipse for code editing

Final outcome
A text editor capable of converting images of Sinhala documents while preserving all page formats and styles, delivering the result as an *.rtf file.
A web service that provides all these features.

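The planned output is an *.rtf file. RTF is normally written as 7-bit ASCII, so Sinhala text has to be stored with the \uN control word. Here N is the character's UTF-16 code unit written as a signed 16-bit decimal. It is followed by a fallback character for readers without Unicode support (one character, as declared by \uc1). The Java sketch below writes one plain run and one bold run this way. The font name Iskoola Pota, the class name and the output file name are assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RtfWriter {

    // Escapes a Java string for an RTF body: RTF special characters get a
    // backslash, and every non-ASCII UTF-16 unit becomes an RTF Unicode
    // control word carrying a signed 16-bit value plus a '?' fallback.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }

    // Builds a one-paragraph RTF document: a plain run followed by a bold run.
    static String document(String plain, String bold) {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Iskoola Pota;}}\\uc1\\f0\\fs24 "
                + escape(plain) + " {\\b " + escape(bold) + "}\\par}";
    }

    public static void main(String[] args) throws IOException {
        String rtf = document("\u0D9A\u0DD2", "\u0D9A\u0DD9"); // Sinhala sample text
        Files.write(Paths.get("output.rtf"), rtf.getBytes(StandardCharsets.US_ASCII));
        System.out.println(rtf);
    }
}
```

With this encoding, the letter U+0D9A is written as \u3482?. Any RTF reader that supports Unicode, and has a Sinhala font available, can then display the text with its bold formatting intact.
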
Q&A
Thank You!