Sinhala Mobile Framework: Name: U. D. Wettewa Index Number: 054095T
Name: U. D. Wettewa
University of Moratuwa
2008
Declaration
I declare that this is my own work and has not been submitted in any form for another
degree or diploma at any university or other institution of tertiary education.
Information derived from the published or unpublished work of others has been
acknowledged in the text and a list of references is given.
U. D. Wettewa ………………………
Supervised by:
Dedication
This dissertation is dedicated to all those who contributed to making my life what it is.
In particular, I dedicate this dissertation to my beloved parents and teachers.
Abstract
Mobile applications are at the forefront of the technologies that shape and model world
culture. Together with Internet technologies, mobile technology has changed
people’s lifestyles and habits. Novel concepts like mobile information distribution and
m-governance are becoming a part of life. To broaden access to mobile applications,
it has become essential to support native languages, both to make these applications
legitimate for government use and to reach wider audiences.
The aim of this project is to enhance mobile support for Sinhala, the mother tongue of
the majority of people in Sri Lanka. It proposes a framework that provides Sinhala support
for applications. Currently the area is covered by many tricks and practices, each focused
on a specific area or platform. Such methods greatly reduce the generality and portability
of content across platforms.
This project aims to build a new solution that combines Java ME, the Unicode standard and
bitmap fonts. Most previous solutions were centred on a particular problem at hand.
This project takes a new approach: it develops a framework that enables developers to
build a collection of programs with Sinhala capabilities. This will enable a new generation
of mobile devices with Sinhala support.
Table of Contents
Design of Framework
5.1 Introduction
5.2 Sinhala Mobile Framework – The Big Picture
5.3 Sinhala Enabling Framework
5.4 Representation of Bitmap Fonts in the System
5.5 Finding Sinhala Characters and Mapping Them to Glyphs
5.6 Designing a Custom Bitmap Font Format
References
Abbreviations
List of Figures/Tables
Figure 5.1 The Sinhala Mobile Framework in Use – The Big Picture
Chapter 1
1.1 Introduction
Though this is the case with personal computers, mobile computing is a gray area.
Compared to personal computing environments, mobile devices have a wider variety of
system software, which is proprietary and varies vastly from one device to another.
High-end mobile devices that run operating systems like Windows Mobile or Windows CE
can be localized, can install new fonts and can display non-Latin characters, but they
cater to only a small portion of current mobile users. To address the problem of
supporting local languages on mobile phones, a considerable portion of low-end mobile
phones must also be supported. This is not an easy task, as it involves an enormous
number of platforms that differ from each other and lack a common, programmable
extension mechanism. To make matters worse, the proprietary firmware that powers these
devices is kept under tight control.
In recent times the web has been transformed in a novel way: content is more dynamic
and is often provided by ordinary users. Quite a number of sites, ranging from news,
mail and messaging services to social networks, provide mobile access for their
subscribers. This has made it possible to deliver more user-centric information quickly
and to gain feedback quickly. Though most content is in English, Sinhala content on the
web is increasing rapidly, ranging from personal weblogs to news sites.
The current government policy of empowering its departments with Information
Technology has also caused a rapid increase in Sinhala content. These sites not only
provide static content but also provide services through which a user can obtain
particular information. To preserve the value of some information, it is not enough to
provide it through desktop browsers; it must also be available as mobile content. For
example, news or special alerts related to a disaster should be propagated quickly.
Currently, most such services are provided in English.
Apart from all this, mobile computing has introduced a new concept called m-governance,
where citizens access government services through their mobile devices. Mobile devices
offer device authentication through service providers and wide access to such services,
which makes them an attractive solution. For such services to work, local language
support is essential. Further, it can be a legal requirement to provide a service in all
official languages of a state, and such services should be accessible from any platform
and device without favouring any vendor or proprietary protocol. For these reasons there
is a strong case for ubiquitous mobile access to information. To achieve such a goal in
Sri Lanka, mobile devices need to support not only English but also the native languages
Sinhala and Tamil. So it is essential to increase the number of Sinhala-supported
mobile devices. Further, it is necessary that this support is Unicode compliant.
Current Sinhala support in mobile devices is weak for many reasons. Only a handful of
devices can display Sinhala script. To make matters worse, the formats used in such
devices are not portable, and most devices are locked into vendor-created formats. The
intention of this project is to address these issues. In doing so, the project tries to
achieve the goals listed below.
1. Increase the number of Sinhala-supported mobiles, i.e. enable a good portion of the
mobile devices in current use to access Sinhala data.
2. Follow the Unicode standard, a ubiquitous standard that represents content in a
vendor-neutral and platform-neutral encoding scheme so that it can be shared.
supporting a specific vendor, the set of supported devices is going to be very limited in
the mobile world, because even devices from a single vendor differ a lot. So the approach
will be to support a specific platform such as Java ME, so that many devices can be
supported with ease. This is desirable because Java creates a common platform on top of
different types of devices, lightening the burden of supporting specific phones. As
pointed out above, Unicode is a standard aimed at producing a single, unified character
encoding for most languages. Though the Java language supports Unicode text and
multi-byte encodings, it depends on the underlying system's capability to display the
appropriate character glyph. So Java by itself is not a solution for displaying new
fonts. This project aims to implement a Unicode-aware Sinhala font-enabling framework on
top of Java 2 Micro Edition.
The aim of this project is to develop a Sinhala framework that will enable mobile
phones to display Unicode-compliant Sinhala fonts. Java 2 Micro Edition is used as the
platform for developing Sinhala mobile applications. Since Java 2 Micro Edition is widely
supported in mobile phones, it will enable a major portion of mobile users to access
Sinhala news services and other online content. In doing so, the following goals and
objectives will be achieved.
As the final deliverable, the framework that enables the display of Sinhala fonts will be
presented. This framework will enable developers to easily write applications that
seamlessly display Sinhala fonts.
1.2 Order and Intention of This Report
As explained above, this project aims to build a framework that is usable by others. A
framework by itself is of no use to the end user; its primary users will be developers.
This project is part of a more ambitious project that aims to build a stack of
programs enabling the delivery of a real Sinhala news service accessible to a
majority of mobile users in Sri Lanka. This portion of the project runs exclusively
on the client side and can be used for multiple purposes. For example, the framework
can be used either to display Unicode text messages sent to the phone or to access
a web page in a Sinhala font.
As noted above, the value of this solution lies in its ability to be deployed on the many
mobile devices used by Sri Lankan users. Technically, it requires a device that
supports Java 2 Micro Edition and the Mobile Information Device Profile (MIDP) 2.0. That
is, apart from the environment for running Java, the system should support a basic set of
services and the minimum amount of memory defined in the MIDP specification. Such
mobiles can be readily obtained, as most Java-enabled mobiles fall into this category.
This report provides details on the Sinhala Mobile Framework. It first
elaborates on the need for such a solution, its worth and its applicability. The
second chapter presents work related to this project, highlighting both proposed and
currently used solutions together with their advantages and disadvantages.
Chapter 03 gives a general explanation of the technologies used in the
proposed solution. It gives a high-level overview of each technology, where it fits in
the overall solution and its current state. Chapter 04 discusses how the
technologies used will fill the gaps and achieve the desired target.
Chapter 05 then discusses the design of the solution, and chapter 06 discusses the
implementation done so far. Finally, chapter 08 contains details of further work and
the conclusion.
Chapter 2
2.1 Introduction
There are several methods currently used to display native or non-Latin characters.
Some of these are standards, while others are not standards but have been adopted by
different vendors and service providers. This section discusses the advantages and
disadvantages of each of these methods. Some of them are applicable only to a limited
number of devices or scenarios, making them difficult to deploy on a wider variety of
devices.
Converting text-based content into an image was once a widely used method of
displaying content on desktop computers, especially on the web. Even today some
web design practices advocate designing a web page as an image, slicing it into
parts, delivering the parts separately and reassembling them to form the complete
picture. In the early days of the web this was done mainly to cope with differences in
HTML rendering between browsers. Later it was extended to the delivery of pages with
non-Latin fonts, either as pure web pages or through other means such as PDF with
embedded fonts. The method was adopted on mobile devices not only because of their
inability to display native fonts, but also because these devices did not support the
styling standards that desktop browsers support and often displayed content in a mangled
manner. It is often not possible to predict the width of a mobile display, so content is
restructured in unpredictable ways. Though the CSS standard specifies constructs to deal
with such limitations, mobile browsers hardly adhere to the full CSS standard. The
method can be further enhanced by dividing the image and delivering each part
separately [3].
The obvious advantage of this method is that the mobile device can display the
content independently of the applications installed on it. The only requirement is a
simple web browser that can display images; no special software is needed at all. In
such a service, news is converted to images on the fly. When a request for a particular
page is received from a mobile device, it is passed through a gateway to the server.
Special software running on the gateway converts the content into an image or a set of
images to be loaded into the browser.
Despite these advantages, this method of displaying information has certain
drawbacks. The first concerns bandwidth. The bandwidth available to a mobile
device is limited, and users are often billed for the amount of data transferred rather
than a fixed amount, so any increase in the amount of data downloaded affects them.
Image content is always more expensive than textual content, and converting textual
information into pictures increases its size by orders of magnitude. For example, a
100 by 100 pixel image in 24-bit color occupies 30,000 bytes (about 30 KB) uncompressed
and can display only about 50 characters, whereas in a Unicode encoding scheme
50 characters require no more than 200 bytes. Even if a special encoding scheme with a
single color and compression is introduced, it cannot compete with the size of the
actual text. Displaying text as images also poses the fundamental problem that the text
cannot be selected, searched or copied elsewhere.
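The size comparison above can be checked with a short Java sketch. It computes the uncompressed size of a bitmap from its dimensions and color depth (100 × 100 × 24 bits = 30,000 bytes) and compares it with the UTF-8 size of 50 Sinhala characters. The class and method names are illustrative only, and the 50-character message is simulated by repeating the letter U+0D85.

```java
import java.nio.charset.StandardCharsets;

// Compares the raw size of text drawn as an uncompressed image
// with the size of the same amount of text stored as UTF-8.
class ImageVersusText {

    // Bytes needed for an uncompressed bitmap of the given size and color depth.
    static long rawImageBytes(int width, int height, int bitsPerPixel) {
        long bits = (long) width * height * bitsPerPixel;
        return (bits + 7) / 8; // round up to whole bytes
    }

    // Bytes needed to store the text in UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 50 copies of the Sinhala letter U+0D85 stand in for a 50-character message.
        StringBuilder message = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            message.append('\u0D85');
        }
        System.out.println("24-bit 100x100 image: " + rawImageBytes(100, 100, 24) + " bytes");
        System.out.println("1-bit 100x100 image:  " + rawImageBytes(100, 100, 1) + " bytes");
        System.out.println("50 chars as UTF-8:    " + utf8Bytes(message.toString()) + " bytes");
    }
}
```

Even at one bit per pixel (a single-color image), the 100 by 100 image needs 1,250 bytes, several times the 150 bytes of the UTF-8 text, before any compression is considered.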
Secondly, the mobile device needs a considerable amount of memory and
processing power to display the image. It must have enough memory to hold the
whole message and should provide features such as scrolling, which lets the user move
the viewport across the image.
Using custom fonts (i.e. fonts created by an application developer that do not follow
the Unicode standard) is another way of displaying non-Latin characters on a
computer or a mobile device. Unlike the method described previously, this method is
only applicable to high-end mobile devices, which often have the capability to install
fonts through the operating system.
To understand this method, it helps to know how such fonts are displayed on a device.
A font is simply a mapping between binary values and glyphs (pictorial representations
of characters). To display characters, a computer converts each binary value in the data
into the corresponding glyph. For example, if under a certain font the hexadecimal value
0x41 represents the character ‘A’, the computer displays a pictorial representation of
‘A’; under another font, 0x41 may represent a Sinhala letter and be displayed as that
letter. In practice, the lower half of the byte range is used for Latin characters and
the upper half is mapped to non-Latin glyphs. The subtle point to note is that, to display
the characters correctly, the exact font used to create the content (or a compatible one)
must be present on the target platform. Otherwise meaningless words are displayed, as
happens on some web sites that use this method.
This method does have an advantage: many varieties of fonts can be displayed using a
single-byte character encoding scheme. But this is outweighed by the difficulties
associated with the scheme. There are two main difficulties in the context of mobile
devices. The first is that a font must be installed on a system before it can be displayed
correctly. The fonts installed on a system normally depend on the device, and unlike
desktop operating systems, mobile firmware does not have the capability to install fonts.
The second problem these fonts pose is coexistence. If a single document is composed of
text in different fonts, the representation needs an appropriate mechanism to handle
this, such as a special tag indicating the font to be used. Lastly, these fonts are unable
to support languages with large alphabets: since many Asian languages consist of more
than 256 characters, a single-byte scheme cannot represent such an alphabet.
The most technically suitable solution is to use a mobile phone that supports the Unicode
standard and the Sinhala language. Such a device should support the display of fonts with
multi-byte encoding schemes and should have built-in support for a Sinhala font.
These mobile devices fall into two distinct categories. The first is feature-rich phones
that run an operating system like Windows Mobile or Symbian OS. These devices support
the installation of fonts, which enables them to display any Unicode character once a
font that maps it to a glyph is installed. This would be a good solution if all users used
such devices, but they are expensive and represent only a small fraction of mobile users
in Sri Lanka.
The second kind is phones that do not have font installation capabilities but
come with Unicode support and preinstalled Sinhala support. Nokia has introduced a
range of mobile phones in this category [5]. Rather than being able to install any
Unicode-compliant font, these phones have Sinhala fonts embedded in their firmware.
Though these devices are not as expensive as those mentioned previously, they too
represent only a small fraction of mobile users in Sri Lanka. Native language support in
these phones is just one feature that users weigh against other features these phones
lack.
Another approach used by some web and mobile applications is to use the English
alphabet to represent Sinhala words. This is common practice among many who use
services like SMS in the Sri Lankan context, and there are online services that
convert Sinhala into this format [2]. It works without any trouble, since the mobile
phone only has to display Latin characters. Though this sounds attractive, most users are
not familiar with this format. Also, different words can end up with nearly identical
spellings, which may be confusing for a person who is not familiar with the language.
2.6 Summary
As highlighted above, though there are several methods to display Sinhala fonts on
mobile devices, these fail either to support a wide range of devices or to support core
standards like Unicode. The purpose of this project is to fill this gap by creating an
enabling framework.
Chapter 3
3.1 Introduction
This chapter discusses the blend of technologies that enables the Sinhala
Mobile Framework. As mentioned in section 1.2 of this document, the project has
several objectives: supporting the widest possible range of devices and using existing
components to deliver a solution for a variety of mobile devices. For this purpose
several technologies are used and adapted. This section briefly describes these
technologies, highlighting the features relevant to the project, to give an idea of why
each was adopted.
With time, the need arose to represent characters that are not in the Roman alphabet.
For example, in Sri Lanka, with the spread of computing technology, a council was formed
in 1985 to oversee the development of language standards. One of the basic tasks of this
council was to introduce a character encoding scheme for Sinhala. Since the ASCII
standard was well established, any encoding scheme had to be backward compatible with
it. The initial approach of the committee was to define the alphabet and encode it in the
range 0xA0 to 0xFF. This scheme was backward compatible with the ASCII standard: the
first 128 characters were the same as ASCII, and values of 128 (0x80) and above were
used to represent the extended characters.
There were a few limitations with this approach. First, such a character encoding
scheme can accommodate only two or three languages; in the scheme mentioned above, only
Latin and Sinhala letters can be encoded, so any document was limited to a few
languages. Further, there were many such schemes for encoding different languages of
the world. The lower part of the byte range was used to encode Latin characters and the
upper part was used for various other languages, forming a set of ASCII-compatible
encoding schemes. The result was difficulty in exchanging documents, because different
software encoded different characters with the same binary value.
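The clash between ASCII-compatible 8-bit schemes can be demonstrated in Java. Since the JDK does not, as far as we know, ship a legacy Sinhala 8-bit code page, the sketch below uses three ISO 8859 code pages (Latin-1, Cyrillic and Greek) as stand-ins, assuming the JDK in use provides ISO-8859-5 and ISO-8859-7 as standard OpenJDK builds do. The class name is illustrative. The byte 0x41 decodes to ‘A’ in all three, while 0xE9 decodes to a different letter in each.

```java
import java.nio.charset.Charset;

// Decodes single bytes under different ASCII-compatible 8-bit code pages.
class CodePageClash {

    // Interprets one byte using the named character set.
    static String decode(byte b, String charsetName) {
        return new String(new byte[] { b }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte lower = 0x41;         // lower half: the same in every code page
        byte upper = (byte) 0xE9;  // upper half: meaning depends on the code page
        String[] codePages = { "ISO-8859-1", "ISO-8859-5", "ISO-8859-7" };
        for (String cp : codePages) {
            char c = decode(upper, cp).charAt(0);
            System.out.printf("%-10s 0x41 -> %s, 0xE9 -> U+%04X%n",
                    cp, decode(lower, cp), (int) c);
        }
    }
}
```

The byte 0xE9 becomes U+00E9 (é), U+0449 (щ) or U+03B9 (ι) depending only on which code page the reader assumes, which is exactly the exchange problem described above.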
To get out of this chaos, the Unicode standard was introduced. The intention of
this standard is to assign a unique value to each character of every language in
the world, independent of platform and application. The Unicode standard is
aligned with the ISO/IEC 10646 standard [6,8]. Strictly speaking, Unicode does not
define a binary format for how data should be represented. It assigns each character a
number (a code point), which can be represented in binary through different encoding
schemes. For example, Unicode defines the value of the letter ‘A’ as U+0041, which is not
the same as storing it as the binary value 0x0041. The actual representation is produced
by an encoding scheme that converts U+0041 into a sequence of bytes.
Though the original standard assigned values that could be encoded in a 16-bit number,
the code space has since been extended to U+10FFFF to meet further needs.
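The distinction between a code point and its encoded bytes can be seen with Java's standard charsets. This sketch (class and method names are illustrative) prints the bytes produced for U+0041 under three encoding schemes; the code point is the same each time, but the byte sequences differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows that one code point has different byte sequences in different encoding schemes.
class CodePointBytes {

    // Encodes the text with the given charset and formats the bytes as hex, e.g. "00 41".
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "A"; // the code point U+0041
        System.out.println("UTF-8:    " + hex(a, StandardCharsets.UTF_8));    // 41
        System.out.println("UTF-16BE: " + hex(a, StandardCharsets.UTF_16BE)); // 00 41
        System.out.println("UTF-16LE: " + hex(a, StandardCharsets.UTF_16LE)); // 41 00
    }
}
```

UTF_16BE and UTF_16LE are used rather than UTF_16 because Java's UTF-16 encoder prepends a byte-order mark, which would obscure the comparison.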
With the Unicode standardization, the signs used to write Sinhala were standardized
and, as a result, the alphabet itself was standardized. There had been controversy among
language specialists over which characters belonged in the Sinhala alphabet. Some
argued that obsolete letters should be dropped from the language, while others advocated
new letters for pronouncing words from foreign languages. There was no agreement on how
to do this, but after standardization with Unicode we now have an alphabet that most
agree on. The Unicode standardization of Sinhala includes not only all Sinhala
characters but also character modifiers. It divides the language's symbols into the six
parts given below.
1. Various Signs
2. Independent Vowels
3. Consonants
4. Sign
5. Dependent Vowel Signs
6. Punctuation
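The six-part division corresponds to code point ranges within the Sinhala block, U+0D80 to U+0DFF. The sketch below maps a code point to its part using the ranges from the Unicode code chart. The class name is illustrative; it does not check for the few unassigned code points inside those ranges, and characters added in later Unicode versions (such as the Sinhala Lith digits) fall into "Other".

```java
// Maps a code point to one of the six parts of the Unicode Sinhala block.
// Ranges follow the Unicode code chart; unassigned gaps inside them are not checked.
class SinhalaCategory {

    static String categoryOf(int cp) {
        if (cp >= 0x0D82 && cp <= 0x0D83) return "Various Signs";      // anusvaraya, visargaya
        if (cp >= 0x0D85 && cp <= 0x0D96) return "Independent Vowels";
        if (cp >= 0x0D9A && cp <= 0x0DC6) return "Consonants";
        if (cp == 0x0DCA) return "Sign";                              // al-lakuna (virama)
        if ((cp >= 0x0DCF && cp <= 0x0DDF) || cp == 0x0DF2 || cp == 0x0DF3) {
            return "Dependent Vowel Signs";
        }
        if (cp == 0x0DF4) return "Punctuation";                       // kunddaliya
        return "Other";
    }

    public static void main(String[] args) {
        String word = "\u0D85\u0DB8\u0DCA\u0DB8\u0DCF"; // the Sinhala word "amma" (mother)
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            System.out.printf("U+%04X %s%n", (int) c, categoryOf(c));
        }
    }
}
```

The example word shows why modifiers matter for rendering: five code points (a vowel, two consonants, the al-lakuna and a dependent vowel sign) combine into fewer visible letters.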
Some of the encoding schemes used to encode Unicode values are UTF-8, UTF-16 Big
Endian, UTF-16 Little Endian, UTF-32 Big Endian and UTF-32 Little Endian. The
UTF-16 Big Endian and Little Endian schemes represent each value defined in the
Unicode standard as a two-byte number in big-endian or little-endian byte order.
Every character in the Basic Multilingual Plane (U+0000 to U+FFFF), including all Sinhala
characters, takes exactly two bytes; characters above U+FFFF take four bytes, encoded as
a surrogate pair.
The drawback of UTF-16 is that it requires twice the space of ASCII to encode a file
containing only Latin characters. Since most current documents in the world are written
in Latin characters, a huge amount of space would be wasted. To cope with this, UTF-8
can be used. UTF-8 uses one to four bytes to encode a single character, whereas UTF-16
uses two or four bytes and UTF-32 uses four bytes. This variation among encoding schemes
serves to reduce the space requirements of files and to stay backward compatible with
ASCII text created earlier. Currently the most widely used encoding scheme for Unicode is
UTF-8, because it is backward compatible with ASCII and needs no more space than ASCII
for ASCII text.
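The space trade-off between UTF-8 and UTF-16 can be measured directly. The sketch below (class and method names are illustrative) counts the bytes needed for a Latin word, the same word in Sinhala script, and one supplementary character above U+FFFF.

```java
import java.nio.charset.StandardCharsets;

// Compares how many bytes the same text needs in UTF-8 and in UTF-16.
class EncodingSize {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length; // no byte-order mark
    }

    public static void main(String[] args) {
        String latin = "Sinhala";                               // 7 ASCII characters
        String sinhala = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD";     // "Sinhala" in Sinhala script, 5 code points
        String supplementary = new String(Character.toChars(0x1F600)); // one code point above U+FFFF
        String[] samples = { latin, sinhala, supplementary };
        String[] labels = { "Latin", "Sinhala", "Supplementary" };
        for (int i = 0; i < samples.length; i++) {
            System.out.printf("%-13s UTF-8: %2d bytes, UTF-16: %2d bytes%n",
                    labels[i], utf8Length(samples[i]), utf16Length(samples[i]));
        }
    }
}
```

For the Latin word UTF-8 needs half the space of UTF-16 (7 bytes against 14), but Sinhala characters fall in UTF-8's three-byte range, so the Sinhala word needs 15 bytes in UTF-8 against 10 in UTF-16. The saving described above therefore applies mainly to Latin-heavy content.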
For the UTF-8 encoding scheme, the Unicode values are divided into four ranges, represented
with one, two, three or four bytes respectively. The single-byte range is U+0000 to
U+007F. To write such a value in a single byte, the least significant hexadecimal digit
fills the lower four bits of the octet, and the more significant digit, which can only be
0 to 7, fills the next three bits. The most significant bit is always zero. The two
hexadecimal digits are therefore distributed into a byte of the form 0xxxyyyy, where xxx
is a value between 0 and 7.
The range U+0080 to U+07FF is represented by two bytes of the form 110xxxyy 10yyzzzz.
Here the z bits are filled with the least significant hexadecimal digit, the y bits with
the next digit and the x bits with the digit after that. Note that only three bits are
needed for this last digit, since it is at most 7.
The range U+0800 to U+FFFF is represented by three bytes of the form 1110wwww 10xxxxyy
10yyzzzz. Here the z bits are filled with the least significant hexadecimal digit, the y
bits with the next, the x bits with the one after that and the w bits with the most
significant digit. For example, consider U+0D85 (w = 0, x = D, y = 8, z = 5). The first
byte becomes 1110 | 0000, the second 10 | 1101 | 10 and the third 10 | 00 | 0101, so the
value is the three-byte sequence E0 B6 85. The four-byte form (a lead byte beginning with
11110 followed by three bytes beginning with 10) represents values from U+10000 to
U+10FFFF, which is useful when Unicode is extended beyond the 16-bit range.
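The bit layouts described above translate almost directly into code. The following Java sketch, with an illustrative class name, encodes a single code point by hand using the one-, two-, three- and four-byte forms, and its output can be compared against the JDK's own UTF-8 encoder; for U+0D85 it produces E0 B6 85, matching the worked example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Encodes one code point into UTF-8 by hand, using the bit patterns
// 0xxxxxxx, 110xxxxx 10xxxxxx, 1110xxxx 10xxxxxx 10xxxxxx,
// and 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (x = payload bits).
class Utf8Encoder {

    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value: " + cp);
        }
        if (cp <= 0x7F) {                       // one byte
            return new byte[] { (byte) cp };
        }
        if (cp <= 0x7FF) {                      // two bytes: 5 + 6 payload bits
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
        if (cp <= 0xFFFF) {                     // three bytes: 4 + 6 + 6 payload bits
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
        return new byte[] {                     // four bytes: 3 + 6 + 6 + 6 payload bits
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    public static void main(String[] args) {
        byte[] mine = encode(0x0D85); // Sinhala letter U+0D85
        byte[] jdk = new String(Character.toChars(0x0D85)).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(mine) + " matches JDK: " + Arrays.equals(mine, jdk));
    }
}
```

Note that Java prints bytes as signed values, so E0 B6 85 appears as negative numbers in the output of Arrays.toString.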
Conversely, when reading a byte stream, a byte starting with the bit sequence 1110
indicates that the next two bytes are part of the same character, and a byte starting
with 110 indicates that the next byte is part of it; a byte starting with 11110 is
followed by three such bytes. A byte starting with the bit sequence 10 is a continuation
byte belonging to a character that started earlier. The framework primarily targets the
UTF-8 encoding scheme, but it can be extended to other encodings by supplying a class
that implements the encoding scheme.
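The decoding rules above, together with the idea of plugging in an encoding scheme as a class, can be sketched in Java as follows. The interface CodePointDecoder and its signature are hypothetical, chosen only to illustrate the extension point; they are not the framework's actual API. The decoder uses only arrays and basic language features, so the same logic could plausibly be ported to a MIDP device, but it does not reject overlong forms or surrogate code points.

```java
// HYPOTHETICAL extension point: an encoding scheme supplied as a class.
// The name and signature are illustrative, not the framework's real API.
interface CodePointDecoder {
    int[] decode(byte[] data);
}

// Decodes UTF-8 by looking at the leading bits of each byte:
//   0xxxxxxx one-byte character        110xxxxx starts a two-byte character
//   1110xxxx starts a three-byte one   11110xxx starts a four-byte one
//   10xxxxxx continues the current character
class Utf8Decoder implements CodePointDecoder {

    public int[] decode(byte[] data) {
        int[] buffer = new int[data.length]; // never more code points than bytes
        int count = 0;
        int i = 0;
        while (i < data.length) {
            int lead = data[i] & 0xFF;
            int cp;
            int extra; // number of continuation bytes that follow
            if (lead < 0x80) {
                cp = lead; extra = 0;
            } else if ((lead & 0xE0) == 0xC0) {
                cp = lead & 0x1F; extra = 1;
            } else if ((lead & 0xF0) == 0xE0) {
                cp = lead & 0x0F; extra = 2;
            } else if ((lead & 0xF8) == 0xF0) {
                cp = lead & 0x07; extra = 3;
            } else {
                throw new IllegalArgumentException("invalid lead byte at index " + i);
            }
            if (i + extra >= data.length) {
                throw new IllegalArgumentException("truncated character at index " + i);
            }
            for (int k = 1; k <= extra; k++) {
                int next = data[i + k] & 0xFF;
                if ((next & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("missing continuation byte at index " + (i + k));
                }
                cp = (cp << 6) | (next & 0x3F); // append six payload bits
            }
            buffer[count++] = cp;
            i += extra + 1;
        }
        int[] result = new int[count];
        System.arraycopy(buffer, 0, result, 0, count);
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] utf8 = "\u0D85\u0DB8\u0DCA\u0DB8\u0DCF".getBytes("UTF-8"); // Sinhala word "amma"
        CodePointDecoder decoder = new Utf8Decoder();
        int[] codePoints = decoder.decode(utf8);
        for (int k = 0; k < codePoints.length; k++) {
            System.out.println("U+" + Integer.toHexString(codePoints[k]).toUpperCase());
        }
    }
}
```

Keeping the decoder behind an interface means a UTF-16 decoder, for example, could be substituted without changing the code that maps code points to glyphs.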
Java 2 Micro Edition (J2ME) is the member of the Java family aimed at small devices,
alongside Java 2 Standard Edition and Java 2 Enterprise Edition. It tries to carry the
Java philosophy of ‘Write once, run anywhere’ into the realm of mobile devices. This is
invaluable, since mobile devices are full of inconsistencies and differences compared
with their desktop counterparts.
J2ME is a platform rather than a single piece of software that runs on a single
computer. Because of the variety of underlying platforms in mobile devices, it is hard
to define a single J2ME standard covering all environments, as was done for Java 2
Standard Edition. The problem is tackled by introducing a number of standards, each
governing some aspect of the platform: configurations, profiles, and optional APIs that
a device may supply, such as location services and media libraries [4]. Device
manufacturers are responsible for porting a Java Virtual Machine (smaller than the JVM
run on desktop machines) to their devices and for complying with a specific
configuration, profile and, if needed, a set of optional APIs.
Configurations cover the very basic features of a device. A configuration defines a JVM
that supports a specific set of core facilities. Currently there are two
configurations: the Connected Limited Device Configuration (CLDC) and the Connected
Device Configuration (CDC). CDC is intended for electronic devices that have
relatively high processing power and strong network connectivity, such as home
electronic appliances. CLDC is intended for mobile phones and personal digital
assistants with limited capacity [4]. A glimpse of the standards and the devices
implementing them is given in figure 3.1.
[Figure 3.1: J2ME profiles for smaller to larger devices: MIDP (Mobile Information Device Profile), PDAP (Personal Digital Assistant Profile), Foundation Profile and Personal Profile]
On top of Configurations lies a profile which is more relevant for J2ME application
development. A profile is layered on top of a configuration, adding the APIs and
specifications necessary to develop applications for a specific family of devices.
Several different profiles are being developed and being used. The Foundation Profile
is a specification for devices that can support a rich networked J2ME environment. It
does not support a user interface; other profiles can be layered on top of the
Foundation Profile to add user interface support and other functionality.
Layered on top of the Foundation Profile are the Personal Basis Profile and the
Personal Profile. The PDA Profile (PDAP), which is built on CLDC, is designed for
palmtop devices with a minimum of 512KB combined ROM and RAM (and a
maximum of 16MB). It sits midway between the Mobile Information Device Profile
(MIDP) and the Personal Profile. It includes an application model based on MIDlets
but uses a subset of the J2SE Abstract Windowing Toolkit (AWT) for graphic user
interface. The J2ME world currently is covered by MIDP on the small end and
Personal Profile on the higher end.
The focus of this project is to develop a framework for devices that comply with the
Mobile Information Device Profile (MIDP). According to the MIDP 2.0 specification,
defined in JSR-118, a Mobile Information Device must have the following
characteristics:
Many of the devices currently marketed as “Java enabled” support MIDP 2.0, and new
devices will be backward compatible with this standard.
This section briefly discusses how computer fonts are displayed. Though mobile
devices have raster displays that can show their built-in fonts, this is of little use
for adding new font support to a mobile device. A more software-oriented approach is
needed, and for this purpose computer fonts are used. Computer fonts can be added to or
removed from a system by adding or removing files together with their configuration. A
computer font is a data file containing a set of glyphs, or pictorial representations of
the symbols used in a particular language. There are several ways to store information
about these pictorial representations. Based on the technique used, computer fonts can
be divided into three types, which are explained below.
Bitmap fonts store each glyph as a grid of pixels. A separate file is needed for each
size, and each file stores the font at its native size. For example, if glyphs of 12pt
and 20pt are needed, either two files are required, or one 20pt file and an algorithm to
scale it down. The advantage of bitmap fonts is that they are simple and fast to process
compared with other formats. For the same reason they are used in low-resource
environments, and they produce exactly the intended output.
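Both points above, that a bitmap glyph is simply a grid of pixels and that producing a second size needs either another file or a scaling algorithm, can be illustrated in Java. The sketch below stores a hypothetical 4 by 4 test pattern one int per row and resizes it with nearest-neighbour sampling, a simple scaling method chosen here for illustration; the framework's own font format and any scaling it performs may differ.

```java
// A bitmap glyph as a grid of pixels, with nearest-neighbour scaling.
// The pattern used in main is a made-up test shape, not part of any real font.
class BitmapGlyph {

    final boolean[][] pixels; // pixels[row][column], true = ink
    final int width;
    final int height;

    BitmapGlyph(boolean[][] pixels) {
        this.pixels = pixels;
        this.height = pixels.length;
        this.width = pixels[0].length;
    }

    // Unpacks one int per row; the leftmost pixel is bit (width - 1).
    static BitmapGlyph fromRows(int width, int[] rows) {
        boolean[][] p = new boolean[rows.length][width];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < width; c++) {
                p[r][c] = ((rows[r] >> (width - 1 - c)) & 1) == 1;
            }
        }
        return new BitmapGlyph(p);
    }

    // Nearest-neighbour resize: each target pixel copies the source pixel it maps onto.
    BitmapGlyph scale(int newWidth, int newHeight) {
        boolean[][] p = new boolean[newHeight][newWidth];
        for (int r = 0; r < newHeight; r++) {
            for (int c = 0; c < newWidth; c++) {
                p[r][c] = pixels[r * height / newHeight][c * width / newWidth];
            }
        }
        return new BitmapGlyph(p);
    }

    // Draws the glyph as text: '#' for ink, '.' for background.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < height; r++) {
            for (int c = 0; c < width; c++) {
                sb.append(pixels[r][c] ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitmapGlyph square = fromRows(4, new int[] { 0xF, 0x9, 0x9, 0xF }); // hollow square
        System.out.print(square.render());
        System.out.println();
        System.out.print(square.scale(2, 2).render()); // scaling down drops pixels and distorts the shape
    }
}
```

Scaling the square down to 2 by 2 turns it into an irregular shape, which illustrates why bitmap fonts are usually stored separately for each size rather than scaled at run time.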
Though most bitmap fonts are shown in one color on screen, several colors (shades) can be
used to give an anti-aliased effect that smooths rough edges. Even current software on
modern, high-resource systems uses bitmap fonts; for example, Windows safe mode supports
only bitmap fonts. There are several bitmap font formats, and bitmap fonts are the
technique used in this framework. For the purpose of the framework, a custom font format
was developed.
Vector fonts, or outline fonts, are another type of font. They are the technique used in
most modern computing and high-end mobile systems to display fonts. The idea behind
vector fonts is to model the glyph of a character as a collection of curves described by
equations. Rather than storing a picture of the glyph, a mathematical representation is
stored. The advantage of this approach is that a glyph can be scaled to any size from a
single file. Vector fonts produce sharp-edged glyphs at a variety of sizes and are used
extensively by word processors and graphic design programs. The disadvantage of this
type of font is the burden on system resources: a considerable amount of mathematical
processing is needed to scale and construct each glyph. Apart from this cost, the system
needs special libraries to convert the mathematical representation into a raster image
before display.
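To make the idea of a glyph stored as equations concrete, the Java sketch below evaluates a quadratic Bézier curve, the curve type used in TrueType outlines. The control points are hypothetical, chosen only to show that scaling a glyph means scaling its control points, after which the curve must be re-evaluated; that re-evaluation, repeated for every curve of every glyph before rasterizing, is the processing cost mentioned above.

```java
// Evaluates a quadratic Bezier curve, the curve type used in TrueType glyph outlines:
// B(t) = (1 - t)^2 * P0 + 2 * (1 - t) * t * P1 + t^2 * P2, with t from 0 to 1.
class QuadraticBezier {

    // Returns the point {x, y} on the curve at parameter t.
    static double[] point(double[] p0, double[] p1, double[] p2, double t) {
        double u = 1 - t;
        double a = u * u, b = 2 * u * t, c = t * t; // Bernstein weights, summing to 1
        return new double[] {
            a * p0[0] + b * p1[0] + c * p2[0],
            a * p0[1] + b * p1[1] + c * p2[1]
        };
    }

    // Scales a control point about the origin; scaling an outline means scaling these.
    static double[] scale(double[] p, double factor) {
        return new double[] { p[0] * factor, p[1] * factor };
    }

    public static void main(String[] args) {
        // A hypothetical arch-shaped segment of a glyph outline.
        double[] p0 = { 0, 0 }, p1 = { 5, 10 }, p2 = { 10, 0 };
        for (int i = 0; i <= 4; i++) {
            double t = i / 4.0;
            double[] small = point(p0, p1, p2, t);
            double[] big = point(scale(p0, 2), scale(p1, 2), scale(p2, 2), t);
            System.out.printf("t=%.2f  size 1: (%.2f, %.2f)  size 2: (%.2f, %.2f)%n",
                    t, small[0], small[1], big[0], big[1]);
        }
    }
}
```

Every point on the size-2 curve is exactly twice the corresponding point on the size-1 curve, so one stored outline serves all sizes; a bitmap font has no such property.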
To create fonts for the mobile environment, it is necessary to create binary objects
that embody the glyph data. Though there are commercial and open source tools
available for creating fonts and converting them into bitmap font formats, they did not
meet the needs of this project, for two main reasons. First, most tools converted fonts
into some well-established bitmap font format, and using such a format on a mobile device
would require a special library. Second was the problem of extended character sets.
Tools that converted fonts into bitmap fonts assumed that the supplied font files used
only single-byte character encodings, so they were unable to convert Unicode fonts into a
usable format. To overcome this difficulty, the binary object editor that comes with
J2ME Polish was adapted.
A binary object is a collection of data that has meaning only when interpreted with
knowledge of what each field holds. The main advantage of binary objects over
human-readable data is that they can be used directly after being read from a
stream, without the need for special parsing libraries.
Since there is no metadata describing what is stored in a binary object, its layout
must be specified externally so that the object can be presented in a human-readable
way. In the J2ME Polish binary editor this is done through an XML file [1]. The format
of the XML file is given below.
<data-definition>
<extension>.bmf</extension>
</data-definition>
The root tag of this file is <data-definition>, and it contains other fields such as
the file extension and a description of the definition. After these, the actual data
fields that the binary object holds are specified. Every field must have a name, a type
and the number of elements it contains. The type can be one of the types built into
the binary editor or the fully qualified name of a class representing the object.
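The role of the external definition can be sketched as follows: a list of field descriptors, each with a name, a type and an element count, drives how raw bytes are decoded, exactly because the bytes themselves carry no labels. The FieldDef and DefinitionReader classes and the type names "int", "short" and "byte" are hypothetical stand-ins for the editor's built-in types; this is not the J2ME Polish implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical field descriptor: name, primitive type and number of elements. */
final class FieldDef {
    final String name;
    final String type;   // "int", "short" or "byte" in this sketch
    final int count;

    FieldDef(String name, String type, int count) {
        this.name = name; this.type = type; this.count = count;
    }
}

final class DefinitionReader {
    /** Decodes the bytes field by field, in the order given by the definition. */
    static Map<String, long[]> read(byte[] data, List<FieldDef> definition) {
        Map<String, long[]> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (FieldDef f : definition) {
                long[] values = new long[f.count];
                for (int i = 0; i < f.count; i++) {
                    switch (f.type) {
                        case "int":   values[i] = in.readInt();   break; // 4 bytes
                        case "short": values[i] = in.readShort(); break; // 2 bytes
                        case "byte":  values[i] = in.readByte();  break; // 1 byte
                        default: throw new IllegalArgumentException("unknown type " + f.type);
                    }
                }
                result.put(f.name, values);
            }
        } catch (IOException e) {
            // EOFException here means the data is shorter than the definition claims
            throw new IllegalArgumentException("data does not match definition", e);
        }
        return result;
    }
}
```

The same eight bytes read with a different definition would produce entirely different values, which is why the definition file is indispensable for the editor.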
Chapter 4
4.1 Introduction
This chapter discusses the approach taken to implement the Sinhala Mobile Framework. As
mentioned in the previous chapter, several technologies are used in this project. This
chapter discusses how these technologies are combined to construct a solution that
meets the objectives.
As we have discussed, all existing solutions for displaying text in a particular font
are application based. When an application programmer is faced with the task of
displaying a set of non-Latin characters, he creates a set of libraries or a custom
font and displays the content on the device. This method is limited to the environment
for which it was built. For another application a different method is used, and a new
solution must be re-implemented from the ground up, as shown on the right of figure
4.1. Such solutions tend to introduce inconsistencies, and the lack of standards leads
to distribution issues and incompatibility between platforms. Therefore a different
approach is taken in this project.
[Figure 4.1: Sinhala Enabled Browser; Sinhala Enabled Message Reader; ... viewer that shows Sinhala Unicode messages.]
One of the most serious problems all mobile applications face is the set of
assumptions made about the underlying system. As highlighted in the insight, mobile
platforms are vastly different from desktop machines. Adding to this difficulty, more
and more advanced mobile devices come onto the market as technology improves, so a
developer faces the problem of deciding which devices to target. Catering to all mobile
phones is not possible, and trade-offs in features must be made if a large superset of
mobile devices is chosen.
This framework is built on top of a JVM. A JVM is a piece of software that executes
high-level instructions, known as bytecode, produced by compiling Java source code.
Many mobile device vendors provide a JVM, which makes their devices Java enabled. It is
the duty of the device manufacturer to port a suitable JVM to its platform and to
implement the required standard support. There are several levels of JVM support
depending on the resources available on the platform, and mobile devices are
subcategorized according to this support and their available memory. The Connected
Device Configuration (CDC) and the Connected Limited Device Configuration (CLDC) are
such standards, and they are further categorized by the APIs they provide (profiles).
The MIDP 2.0 specification is a profile that falls under CLDC and is the de facto
standard for Java-enabled mobile phones. It specifies a minimum amount of memory and a
set of basic classes built into the Java virtual machine that every application can
readily access. Additional layers of support are added through the Java Specification
Request (JSR) process. Since this framework is intended to run on mobile phones
compliant with the MIDP 2.0 specification, a large number of mobile devices will be
supported.
In Chapter 02, in the discussion of current solutions, it was pointed out that these
solutions are spread across everything from the server to the mobile device. Within
the device, they also run at different layers, ranging from the internal firmware to
high-level application code. If a framework is to be independent of the device it runs
on, it should sit in a higher layer and must not depend on platform-specific services.
Running the framework as an application is realized by two means. First, the entire
code base is written in the Java programming language and does not use any
system-specific features. Second, it does not depend on the platform's font display and
makes no assumptions about non-Latin character support in the running environment. The
second point is vital for supporting a wide range of devices, as platforms differ
widely in their support for displaying Sinhala fonts. Normally font display is handled
by the firmware of the underlying platform, but in this case the framework itself
carries the files necessary to display the font. The framework asks the underlying
system to draw the font as an image, a task the system can carry out with ease.
Running the framework as an application has further advantages. It can be installed,
uninstalled or upgraded with ease, and it affects the underlying system minimally.
A central part of the Sinhala enabling framework is the storage of font information. As
described in Chapter 03 under font technologies, there are a number of ways to
represent font data in a file. In particular, a bitmap font stores a picture of the
glyphs together with some metadata about the font. Because of the nature of the Sinhala
script, a considerable number of glyphs and related information must be stored. On a
mobile platform, efficient data storage and retrieval is vital for good application
performance.
The method used by this framework is to store variables and objects directly in a file
in binary format rather than converting them into a human-readable format. Java
provides interfaces to serialize basic types and objects so that they can be written to
a file; the values written can later be read back as needed and cast into objects. The
advantage of this approach is that it reduces the size of the data, which is critical
in this setting. Because the file carries no information about what it contains, its
values could be read as anything. Here this has little effect, since these objects are
read only by the application itself and the layout is hard coded into it.
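The size argument can be checked with a small experiment: the same array of 16-bit glyph numbers is written once as raw binary values with DataOutputStream (a class CLDC also provides) and once as comma-separated decimal text. The glyph numbers and the class name are arbitrary examples for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class EncodingSize {
    /** Size in bytes when each value is written with DataOutputStream.writeShort. */
    static int binarySize(short[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < values.length; i++) {
                out.writeShort(values[i]); // always exactly 2 bytes, no separators
            }
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return bytes.size();
    }

    /** Size in bytes of the same values written as comma-separated decimal text. */
    static int textSize(short[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(values[i]);
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII).length;
    }
}
```

For the four values {3482, 3483, 12000, 30000} the binary form takes 8 bytes and the text form "3482,3483,12000,30000" takes 21 bytes; the text form would also need parsing code on the device.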
Chapter 5
5 Design of Framework
5.1 Introduction
This chapter deals with the design of the Sinhala Framework. In a system of
considerable complexity a design is needed so that the development of separate modules
does not conflict. The first section gives a glimpse of an environment in which the
mobile framework will fit, and the next section explains the module details.
Figure 5.1 The Sinhala Mobile Framework in use - The Big Picture
As the architecture in figure 5.1 shows, a news service with special capabilities is
operated so that news can be entered in Sinhala. This news is delivered to users in two
modes: one is through a web interface and the other is through SMS (Short Message
Service). The web pages are transferred through WAP rather than HTTP because it is a
much lighter protocol suited to mobile communication. The WML page is accessed through
a special browser built using the Sinhala Framework, which is able to display Sinhala
text encoded in Unicode.
Another application residing on the mobile device is used to read SMS messages received
by subscribed users. This application also uses the capabilities of the Sinhala
Framework.
[Figure 5.2: Sinhala Mobile Framework: API, BitMapFont and SinhalaFontMapper classes]
The major components and their interaction are shown in figure 5.2. When an
application needs to display text in Sinhala, it delegates the task to the Mobile
Framework. The framework consists of two main features: a mapping from Unicode
characters to glyphs, and a set of bitmap fonts that hold the pictorial representation
of those glyphs. Based on the value it receives from the upper layer, it displays the
appropriate glyph.
5.4 Representation of Bit Map fonts in the System
This section gives the public interface for users of the framework. It does not deal
with the implementation details of these classes or features, which are discussed in
chapter 06.
The font file is the static representation of a font in the system. When an application
is running, however, the fonts must be active in memory, and words must be generated
and displayed when the application requires it. Obviously there must also be a dynamic
representation of fonts in the system, and classes are defined for this purpose.
The first class, BitMapFont, represents a bitmap font from the programmer's point of
view; its public interface is given in figure 5.3. The BitMapFont class has a method to
obtain an instance of BitMapFont by passing the file name of the font. Fonts of
different sizes can be displayed simply by passing the file name of the corresponding
font file to this method.
BitMapFont
+getInstance() : BitMapFont
+removeInstance()
+getCharWidth() : int
+getStringWidth() : int
+getBitMapFontViewer() : BitMapFontViewer
As noted in the previous section, mobile devices have a very small amount of memory.
The objects created this way take a considerable amount of space, so the programmer
must be able to reclaim them when desired. This is achieved through a method that looks
up a created font and removes it. Apart from these two methods, the public interface of
the BitMapFont class has three other useful methods: the first returns the width of an
individual character, the second the width of an entire String, and the third returns a
BitMapFontViewer for this font.
The BitMapFontViewer is the worker class that actually displays text in the font; its
definition is given in figure 5.4. After the internal data of a BitMapFontViewer has
been initialized, its paint method should be called with the Graphics object of the
current context and the x and y coordinates on the screen.
Instances of the BitMapFontViewer class are not intended to be created by the end user
but are obtained from the BitMapFont class through the getBitMapFontViewer method. The
most important method is paint, which is called when the actual drawing should be
carried out. All other methods adjust the formatting of the text and return information
about the text drawn on screen.
BitMapFontViewer
+BitMapFontViewer() : BitMapFontViewer
+paint()
+layout()
+getWidth() : int
+setHorizontalOrientation()
+getHeight() : int
Two other class hierarchies are needed for the framework to work properly. An outline
class diagram defining the relationships among the classes is provided in figure 5.5.
The first is an object implementing a mapping interface. The mapping interface contains
the information on how to obtain the numerical value of an encoded Unicode stream and
the information needed to form a glyph from the internal representation.
[Figure 5.5: SinhalaFontMapperFactory and SinhalaFontMapper classes]
The other is the ByteStreamParser class, which parses a byte stream and detects the
presence of Sinhala Unicode characters. To create a ByteStreamParser, a
SinhalaFontMapper instance must be specified. Based on the information in this
instance, the ByteStreamParser parses the supplied byte stream and returns sequences of
Sinhala characters.
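A minimal sketch of what such a parser does, assuming a UTF-8 input stream: the bytes are decoded and split into runs, each either Sinhala or not, according to whether its code points lie in the Sinhala block U+0D80 to U+0DFF. The class name SinhalaRunSplitter is hypothetical, and for brevity the sketch uses the standard Java decoder rather than a SinhalaFontMapper. The Zero Width Joiner (U+200D) is kept inside Sinhala runs because Sinhala text uses it to request special signs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SinhalaRunSplitter {
    static final int ZWJ = 0x200D;

    static boolean isSinhala(int cp) {
        return (cp >= 0x0D80 && cp <= 0x0DFF) || cp == ZWJ;
    }

    /** Splits UTF-8 bytes into maximal runs of Sinhala and non-Sinhala text. */
    static List<String> split(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        List<String> runs = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // close the current run when the script class changes
            if (i > start && isSinhala(cp) != isSinhala(text.codePointAt(start))) {
                runs.add(text.substring(start, i));
                start = i;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }
}
```

Only the Sinhala runs would then be handed to the bitmap font; the other runs can be drawn with the platform's own font.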
A font file contains a set of glyphs and symbols to be matched with binary values, so
it must contain the pictorial representation of each glyph together with the other
metadata needed to display the font. As explained in chapter 03, there are three broad
categories of font technologies, and each of them stores this data in its font files in
some form.
First it is wise to look at the Sinhala alphabet, its encoding schemes and how glyphs
are formed. As we learned in school, the Sinhala alphabet consists of 60 letters: 18
vowels and 42 consonants. Combinations of vowels and consonants create the collection
of characters in written Sinhala; the original 18 vowels and 42 consonants combine to
yield all the letters we normally write, about 42 × 18 = 756 letters. Sinhala letters
are more complicated still: there are other constructs such as "YANSAYA" and
"RAKARANSHAYA", which are not unique characters but consonant modifiers.
A brief explanation of Unicode was given in Chapter 03, and the Unicode scheme for
encoding Sinhala is given in Appendix C. As noted there, although Unicode assigns a
unique two-octet value to each Sinhala character, different encoding schemes are used
to represent this value in documents. UTF-8 is one such scheme; others include UTF-16,
UTF-32, and encoding the raw Unicode values as big-endian or little-endian numbers.
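To show that one code point has several byte representations, the sketch below encodes SINHALA LETTER ALPAPRAANA KAYANNA (U+0D9A) with the standard Java charsets for UTF-8 and for big- and little-endian UTF-16. The class name is only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SinhalaEncodings {
    static final String KAYANNA = "\u0D9A"; // SINHALA LETTER ALPAPRAANA KAYANNA

    /** Returns the encoded bytes as upper-case hex pairs separated by spaces. */
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8    : " + hex(KAYANNA, StandardCharsets.UTF_8));    // E0 B6 9A
        System.out.println("UTF-16BE : " + hex(KAYANNA, StandardCharsets.UTF_16BE)); // 0D 9A
        System.out.println("UTF-16LE : " + hex(KAYANNA, StandardCharsets.UTF_16LE)); // 9A 0D
    }
}
```

Every character in the Sinhala block takes three bytes in UTF-8 and two in UTF-16, so a mapper must know which scheme the incoming stream uses before it can recover the code point.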
Comparing Sinhala character glyphs with the standardized Unicode alphabet reveals that
an implementation has more work to do to represent the actual glyphs. In the Latin
alphabet, character glyphs and the alphabet have a one-to-one correspondence: if a file
is encoded in ASCII, each value can be matched exactly to a glyph. If Sinhala glyphs
were represented in such a one-to-one fashion, the character set would exceed 700 and
introduce further problems. For example, there is a practice of writing 'beddi akuru'
to get the effect of SINHALA SIGN AL-LAKUNA. If these are considered, the number becomes
enormous, and features such as alphabetical sorting based on the character encoding
become more complex than in the current case.
Now let us consider how Sinhala glyphs can be formed from the character encoding.
First, all vowels in the Sinhala alphabet can be mapped directly to vowel glyphs, since
Sinhala vowels are not modified by any other modifiers. Further, some vowels can be
considered a combination of two or more elements, and if these elements are stored
separately they can be reused. Let us now consider how vowels can be formed by storing
glyph elements.
Figure 5.6 illustrates how the vowels of the Sinhala language can be formed from a
limited set of glyphs. If composite characters are treated as composed of sub-glyphs,
reusing these components reduces the number of glyphs needed. Counted distinctly, there
are 17 components, and these components can also be used in forming consonants.
In a mobile environment, storing every Sinhala glyph separately is not practical due to
the limited memory of the device. Instead, each glyph component is given a unique
number so that it can be used in forming a glyph. The internal representation of a
single glyph therefore needs either 2 or 4 bytes, that is, one or two 16-bit component
numbers. The following discussion of consonants will show that this can be as much as
6 bytes.
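The storage arithmetic above can be sketched as a table from a code point to the 16-bit component numbers drawn for it, so a glyph of one, two or three components occupies 2, 4 or 6 bytes. The component numbers, the class name and the particular components assigned to AYANNA (U+0D85), AAYANNA (U+0D86) and AEYANNA (U+0D87) are hypothetical placeholders; the real assignments belong to the font file.

```java
import java.util.Hashtable;

final class GlyphComposition {
    // Hypothetical component numbers; a real font file defines its own.
    static final short COMP_BASE = 1;
    static final short COMP_EXTRA = 2;
    static final short COMP_MODIFIER = 3;

    // Code point -> components drawn side by side (illustrative entries only).
    static final Hashtable<Integer, short[]> TABLE = new Hashtable<>();
    static {
        TABLE.put(0x0D85, new short[] {COMP_BASE});                            // 1 component
        TABLE.put(0x0D86, new short[] {COMP_BASE, COMP_EXTRA});                // reuses COMP_BASE
        TABLE.put(0x0D87, new short[] {COMP_BASE, COMP_EXTRA, COMP_MODIFIER}); // 3 components
    }

    /** Bytes needed to store one code point's composition: 2 bytes per component. */
    static int storageBytes(int codePoint) {
        short[] parts = TABLE.get(codePoint);
        return parts == null ? 0 : parts.length * 2;
    }
}
```

Because COMP_BASE is stored once and referenced three times, the picture data for it is shared, which is where the memory saving comes from.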
Forming consonants is more challenging than forming vowels. There are only 18 vowels
and no modifiers for vowels, but consonants are modified by a large number of modifiers
depending on the character they are appended to. As an example, let us consider the
first consonant and how it is modified to form the needed characters.
Six distinct glyphs can be seen in figure 5.7. The first is the pure consonant. The
second is the character combined with the vowel 'Ayana', and the third and fourth are
its combinations with 'SINHALA LETTER IYANNA' and 'SINHALA LETTER IIYANNA'. Lastly, the
combinations with 'SINHALA LETTER UYANNA' and 'SINHALA LETTER UUYANNA' yield further
distinct characters.
These rules apply to the vast majority of consonants in the Sinhala alphabet, though
there are a few exceptions. In summary, if one of the modifiers above is present in a
character, a new glyph for the base character is chosen. Figure 5.8 below summarizes
these modifiers.
There are exceptions to the rules above, and the following paragraphs discuss some of
them. A notable consonant is 'SINHALA CONSONANT RAYANA', shown in figure 5.9: to
represent this letter it is not enough to store its six basic forms.
The notable difference is that the letter combined with SINHALA VOWEL AEYANNA or
SINHALA VOWEL AEEYANNA forms a different variant. These letters are an exception and
must be handled separately.
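The selection rule and the RAYANNA exception can be sketched together as a function that looks at the dependent vowel sign following a consonant and picks which stored form of the consonant to draw. The variant numbers and the class name are hypothetical; the code points are the Unicode Sinhala vowel signs (U+0DD0/U+0DD1 KETTI and DIGA AEDA-PILLA for AE and AEE, U+0DD2/U+0DD3 KETTI and DIGA IS-PILLA for I and II, U+0DD4/U+0DD6 KETTI and DIGA PAA-PILLA for U and UU) and SINHALA LETTER RAYANNA (U+0DBB).

```java
final class ConsonantVariants {
    // Hypothetical numbers for the stored forms of a consonant.
    static final int BASE = 0, WITH_I = 1, WITH_II = 2, WITH_U = 3, WITH_UU = 4, RA_WITH_AE = 5;

    static final int RAYANNA = 0x0DBB;
    static final int KETTI_AEDA_PILLA = 0x0DD0; // vowel sign AE
    static final int DIGA_AEDA_PILLA = 0x0DD1;  // vowel sign AEE
    static final int KETTI_IS_PILLA = 0x0DD2;   // vowel sign I
    static final int DIGA_IS_PILLA = 0x0DD3;    // vowel sign II
    static final int KETTI_PAA_PILLA = 0x0DD4;  // vowel sign U
    static final int DIGA_PAA_PILLA = 0x0DD6;   // vowel sign UU

    /**
     * Chooses the stored form of the consonant given the vowel sign that follows it
     * (-1 if none). Signs not listed here leave the base glyph unchanged.
     */
    static int variantFor(int consonant, int vowelSign) {
        if (consonant == RAYANNA
                && (vowelSign == KETTI_AEDA_PILLA || vowelSign == DIGA_AEDA_PILLA)) {
            return RA_WITH_AE; // the RAYANNA exception
        }
        switch (vowelSign) {
            case KETTI_IS_PILLA:  return WITH_I;
            case DIGA_IS_PILLA:   return WITH_II;
            case KETTI_PAA_PILLA: return WITH_U;
            case DIGA_PAA_PILLA:  return WITH_UU;
            default:              return BASE;
        }
    }
}
```

Checking the exception before the general switch keeps each special case in one place, so further exceptions can be added without touching the common rule.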
Apart from these letters, some special glyphs are needed to represent certain
characters. These act like escape sequences: when a specific sequence of letters
occurs, a special symbol is substituted. SINHALA SIGN YANSAYA is one such case. When a
consonant other than SINHALA LETTER RAYANNA is modified with SINHALA SIGN AL-LAKUNA and
followed by SINHALA LETTER YAYANNA, both letters are replaced with a new sequence formed
by the consonant without the SINHALA SIGN AL-LAKUNA and a special symbol called SINHALA
SIGN YANSAYA. SINHALA SIGN YANSAYA does not have its own Unicode code point; instead it
is indicated by a special character placed between the two letters. This character is
the Zero Width Joiner (U+200D), which lies outside the Sinhala Unicode range because it
is used by many other languages of this family.
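A similar rule applies to SINHALA SIGN RAKARANSAYA. In the sequence <ANY CONSONANT> +
SINHALA SIGN AL-LAKUNA + SINHALA LETTER RAYANNA, a Zero Width Joiner placed after the
SINHALA SIGN AL-LAKUNA indicates that the consonant is to be drawn with SINHALA SIGN
RAKARANSAYA. The sequence SINHALA LETTER RAYANNA + SINHALA SIGN AL-LAKUNA + Zero Width
Joiner + <ANY CONSONANT> instead produces SINHALA SIGN REPAYA. Both these rules form
exceptions and must be handled when creating the appropriate glyph.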
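The two substitutions can be sketched as a check over a sequence of code points. The sketch follows the sequences given in the Unicode Standard for Sinhala: yansaya is written <consonant> + AL-LAKUNA (U+0DCA) + ZWJ (U+200D) + YAYANNA (U+0DBA), and rakaransaya is written <consonant> + AL-LAKUNA + ZWJ + RAYANNA (U+0DBB). The class name and the coarse consonant test (U+0D9A to U+0DC6, which also admits a few unassigned code points) are simplifications of this sketch.

```java
final class SinhalaLigatures {
    static final int AL_LAKUNA = 0x0DCA;
    static final int ZWJ = 0x200D;
    static final int YAYANNA = 0x0DBA;
    static final int RAYANNA = 0x0DBB;

    enum Kind { NONE, YANSAYA, RAKARANSAYA }

    /** Coarse consonant range check; unassigned code points are not excluded. */
    static boolean isConsonant(int cp) {
        return cp >= 0x0D9A && cp <= 0x0DC6;
    }

    /** Checks whether the four code points starting at index i form a special sign. */
    static Kind matchAt(int[] cps, int i) {
        if (i + 3 >= cps.length) return Kind.NONE; // need cps[i] .. cps[i + 3]
        if (!isConsonant(cps[i]) || cps[i + 1] != AL_LAKUNA || cps[i + 2] != ZWJ) {
            return Kind.NONE;
        }
        if (cps[i + 3] == YAYANNA && cps[i] != RAYANNA) return Kind.YANSAYA;
        if (cps[i + 3] == RAYANNA) return Kind.RAKARANSAYA;
        return Kind.NONE;
    }
}
```

On a match the renderer would draw the consonant's base form followed by the sign glyph and skip the remaining three code points; without the ZWJ the same letters are drawn normally with a visible AL-LAKUNA. A Java String can be turned into the required array with `text.codePoints().toArray()`.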
Chapter 6
6.1 Introduction
This section describes the implementation of the framework. The API given to the
programmer was explained previously; the implementation aims to hide the internal
details of the framework so that users can easily use its features without worrying
about them. This section highlights those implementation details.
As explained previously, a custom bitmap font format was designed for the Sinhala
Mobile Framework. The entire file is a serialized Java object consisting of the data
definition described below. This section deals with the format and the creation of
bitmap fonts.
Figure 6.1 shows a screen shot of the binary object editor of the J2ME Polish standalone
tool with a custom-created bitmap font. The binary object editor can edit the object
directly without the need to write code [7]. Further, objects can be modified by adding
fields without recreating them.
As explained in chapter 03, the format of the binary object is given in an XML file.
The XML file defining the format of the bitmap font is given below. It first contains a
single integer specifying the number of glyphs in the object. Second, a short array
called the char map is present; the char map stores the unique 16-bit number given to
each glyph to represent it internally. Figure 6.2 shows a screen shot of the image
embedded in the font object.
<data-definition>
<description></description>
<extension>.</extension>
</data-definition>
A byte array of length count is also present to store the width of each character
glyph. Finally, an image containing all the glyphs is stored as an object of type
javax.microedition.lcdui.Image.
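The layout described above (an int glyph count, the short char map, the byte width array, then the image) can be read with a DataInputStream, in the same order the loading code in Appendix C uses. Because javax.microedition.lcdui.Image exists only on MIDP, this sketch stops before the image and returns its bytes untouched; the class name FontFileHeader is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical holder for the parts of a font file that precede the image. */
final class FontFileHeader {
    final short[] charMap;   // internal 16-bit number of each glyph
    final byte[] widths;     // width in pixels of each glyph
    final byte[] imageBytes; // everything after the widths: the encoded glyph image

    private FontFileHeader(short[] charMap, byte[] widths, byte[] imageBytes) {
        this.charMap = charMap; this.widths = widths; this.imageBytes = imageBytes;
    }

    static FontFileHeader parse(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int count = in.readInt();                 // number of glyphs
            short[] charMap = new short[count];
            for (int i = 0; i < count; i++) {
                charMap[i] = in.readShort();
            }
            byte[] widths = new byte[count];
            in.readFully(widths);                     // one width byte per glyph
            ByteArrayOutputStream rest = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {        // the remainder is the image
                rest.write(buf, 0, n);
            }
            return new FontFileHeader(charMap, widths, rest.toByteArray());
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated font file", e);
        }
    }
}
```

On the device, the remaining bytes would be passed to Image.createImage(InputStream) instead of being collected into an array.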
In the program, a bitmap font is represented by an object of the BitMapFont class. The
public interface of the BitMapFont class has five methods, but its internal data
structures have not yet been described. The most notable structures used in the
BitMapFont class are as follows.
The BitMapFont class follows a variation of the singleton design pattern. Multiple
BitMapFont objects can exist, but only one instance is desired per font file. To
achieve this, the constructor is made private so that instances cannot be created from
outside. A hash table (java.util.Hashtable) is used with the name of the font as the
key and a reference to the font object as the value, and all created BitMapFont
instances are registered in it. When a font is requested, the table is checked; if an
instance is present, that reference is returned. Otherwise a new instance is created,
registered in the hash table, and its reference is returned.
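This variation of the singleton pattern can be sketched as a registry keyed by font name. The method names getInstance and removeInstance follow the public interface given in chapter 5; the class name SketchFont and the omission of actual font loading are assumptions of the sketch.

```java
import java.util.Hashtable;

/** Sketch of one instance per font file; the real class also loads glyph data. */
final class SketchFont {
    private static final Hashtable<String, SketchFont> fontsByName = new Hashtable<>();

    private final String name;

    private SketchFont(String name) { // private: instances only via getInstance
        this.name = name;
        // a real implementation would read the font file here
    }

    static synchronized SketchFont getInstance(String name) {
        SketchFont font = fontsByName.get(name);
        if (font == null) {                 // first request: create and register
            font = new SketchFont(name);
            fontsByName.put(name, font);
        }
        return font;                        // later requests: the same reference
    }

    static synchronized void removeInstance(String name) {
        // the font becomes collectable once no other references to it remain
        fontsByName.remove(name);
    }

    String getName() { return name; }
}
```

For example, two calls to `SketchFont.getInstance("sinhala12.bmf")` (a hypothetical file name) return the same object, so the glyph image is loaded only once however many screens use the font.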
Every BitMapFont instance has an Image object that stores all the glyphs of the font in
a single image, which is the source of glyphs for display. Apart from this, the width
of each glyph is stored in an array. When this array is initialized, a separate array
is created to hold the starting pixel position of each character. The unique internal
16-bit value given to each glyph is also read and stored in an array.
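The arrays described above support two simple computations: the starting pixel of each glyph is the running sum of the widths before it, and the width of a string of glyphs is the sum of their widths. In this sketch a glyph is located by its internal 16-bit number with a linear search of the char map; that search strategy and the class name GlyphMetrics are assumptions, not necessarily what the framework does.

```java
final class GlyphMetrics {
    final short[] charMap;    // internal 16-bit number of each glyph
    final byte[] widths;      // width of each glyph in pixels (assumed below 128)
    final short[] xPositions; // starting pixel of each glyph in the font image

    GlyphMetrics(short[] charMap, byte[] widths) {
        this.charMap = charMap;
        this.widths = widths;
        this.xPositions = new short[widths.length];
        short x = 0;
        for (int i = 0; i < widths.length; i++) {
            xPositions[i] = x;  // glyph i starts where the previous glyphs end
            x += widths[i];
        }
    }

    /** Index of the glyph with the given internal number, or -1 if absent. */
    int indexOf(short glyphNumber) {
        for (int i = 0; i < charMap.length; i++) {
            if (charMap[i] == glyphNumber) return i;
        }
        return -1;
    }

    /** Total width of a sequence of glyphs; unknown glyphs count as zero width. */
    int stringWidth(short[] glyphNumbers) {
        int total = 0;
        for (int i = 0; i < glyphNumbers.length; i++) {
            int idx = indexOf(glyphNumbers[i]);
            if (idx >= 0) total += widths[idx];
        }
        return total;
    }
}
```

With widths {5, 7, 4} the glyphs start at pixels 0, 5 and 12 of the font image. If the char map were kept sorted, a binary search could replace the linear one.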
In the early stages of the project, J2ME Polish was tested for handling bitmap fonts,
but this did not prove successful because it can only handle fonts with a character map
of 256 characters. Displaying Sinhala glyphs requires more than 256 characters, so the
project had to define its own format for representing fonts.
Another class, BitMapFontViewer, is used to create the pictorial view of a string of
glyphs. Once a BitMapFont object has been initialized, calling its
getBitMapFontViewer() method returns an object of type BitMapFontViewer. Calling the
paint() method of this object with the x and y coordinates and the Graphics object of
the current screen context draws the text.
The paint method is the most important method in the BitMapFontViewer class. It
encapsulates the functionality needed to access the system's drawing capabilities,
calling the methods of the Graphics object passed to it to accomplish this task. Apart
from paint, there are methods for accessing details of the font such as height and
width. The source code of this class is appended in appendix C.
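What paint has to do for each glyph can be sketched without MIDP: work out which region of the font image holds the glyph and where on the screen it goes. On MIDP 2.0, each resulting operation would correspond to one Graphics.drawRegion(image, srcX, 0, width, fontHeight, Sprite.TRANS_NONE, destX, destY, anchor) call. The GlyphLayout and DrawOp classes are hypothetical, and the sketch assumes glyphs are laid out left to right in a single row of the font image.

```java
import java.util.ArrayList;
import java.util.List;

final class GlyphLayout {
    /** One glyph copy: a vertical strip of the font image placed at (destX, destY). */
    static final class DrawOp {
        final int srcX, width, destX, destY;

        DrawOp(int srcX, int width, int destX, int destY) {
            this.srcX = srcX; this.width = width; this.destX = destX; this.destY = destY;
        }
    }

    /**
     * Lays out glyph indices left to right starting at (x, y).
     * xPositions and widths are the per-glyph arrays kept by the font.
     */
    static List<DrawOp> layout(int[] glyphIndices, short[] xPositions, byte[] widths, int x, int y) {
        List<DrawOp> ops = new ArrayList<>();
        int cursor = x;
        for (int i = 0; i < glyphIndices.length; i++) {
            int g = glyphIndices[i];
            ops.add(new DrawOp(xPositions[g], widths[g], cursor, y));
            cursor += widths[g]; // advance the pen by the glyph's width
        }
        return ops;
    }
}
```

Separating layout from drawing in this way also makes getWidth cheap: it is the final cursor position minus the starting x.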
6.5 The SinhalaFontMapper Hierarchy
SinhalaFontMapper is an abstract class capable of decoding a given encoding into unique
2-byte numbers for character glyphs and of identifying the sequence of characters that
make up the current glyph. SinhalaFontMapper instances are created through the abstract
factory pattern, and the corresponding factory class is SinhalaFontMapperFactory. This
abstract class can be the parent of any class that provides methods for decoding and
finding characters in Unicode.
The important methods to consider in the SinhalaFontMapper class are the method that
converts a UTF-8 character into the locally kept 16-bit character value and the method
that does the reverse.
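A minimal sketch of the two conversions, assuming the locally kept 16-bit value is simply the Unicode code point (the framework's own numbering may differ). Every character in the Sinhala block (U+0D80 to U+0DFF) is encoded in UTF-8 as exactly three bytes, 1110xxxx 10yyyyyy 10zzzzzz, so both directions reduce to bit manipulation. AbstractCharMapper and Utf8CharMapper are hypothetical names that only echo the design.

```java
/** Hypothetical mapper contract: encoded bytes <-> 16-bit character value. */
abstract class AbstractCharMapper {
    /** Decodes the character starting at offset and returns its 16-bit value. */
    abstract char decode(byte[] data, int offset);

    /** Encodes a 16-bit value back into the external encoding. */
    abstract byte[] encode(char value);
}

/** UTF-8 version for the three-byte range U+0800..U+FFFF, which includes Sinhala. */
final class Utf8CharMapper extends AbstractCharMapper {
    @Override
    char decode(byte[] data, int offset) {
        int b0 = data[offset] & 0xFF, b1 = data[offset + 1] & 0xFF, b2 = data[offset + 2] & 0xFF;
        if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80) {
            throw new IllegalArgumentException("not a three-byte UTF-8 sequence");
        }
        // 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyyyyzzzzzz
        return (char) (((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
    }

    @Override
    byte[] encode(char value) {
        if (value < 0x0800) {
            throw new IllegalArgumentException("value needs fewer than three bytes");
        }
        // surrogate values (U+D800..U+DFFF) are not rejected in this sketch
        return new byte[] {
            (byte) (0xE0 | (value >> 12)),
            (byte) (0x80 | ((value >> 6) & 0x3F)),
            (byte) (0x80 | (value & 0x3F))
        };
    }
}
```

For example, the bytes E0 B6 9A decode to U+0D9A (SINHALA LETTER ALPAPRAANA KAYANNA), and encoding U+0D9A gives the same three bytes back. A UTF-16 mapper would be another subclass, which is the point of keeping the base class abstract.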
Chapter 07
Initial testing was done on the emulators packaged with the Sun Microsystems Wireless
Toolkit, which includes several emulators with differing screen sizes. The results
were acceptable. The screen shots in figure 7.1 are taken from the testing of the
product.
Apart from this, several other tests were done. One was the use of a code obfuscator to
make the resulting executable JAR file smaller. A code obfuscator analyses the JAR file
and renames the identifiers in it to short, cryptic names. The purpose is twofold: to
reduce the size of the JAR through shorter names, and to make reverse engineering
harder, since J2ME applications are small and their bytecode is easily decompiled into
source. In this case the obfuscator was used to reduce the size of the executable.
Lastly, memory usage and the number of instructions executed while running the
framework were examined for a few cases, to make sure that the framework does not
consume a lot of resources.
Chapter 08
The mobile framework developed is able to display the most commonly and frequently used
glyphs. As explained in chapter 05, Sinhala writing has numerous rules, such as the
introduction of the signs SINHALA SIGN RAKARANSAYA and SINHALA SIGN YANSAYA. Another
such character is SINHALA SIGN REPAYA, which can be used in place of SINHALA LETTER
RAYANA with a SINHALA SIGN AL-LAKUNA. Although this sign can be omitted entirely from
written Sinhala, it is used mostly in older forms of literature and is still considered
grammatically correct.
These few exceptional rules are not properly handled by the current version of the
framework; they can be added as a further level of support. To implement such rules,
the conversion engine must remember more of the previous characters it has encountered,
which requires more information to be stored on the device and consumes more memory.
Finally, Sinhala Unicode is not an elegant standard: although it promises to solve many
problems, it raises many concerns for developers, especially when it comes to
displaying glyphs. But its benefits reach much further than these glitches. It makes
content available to a wide range of devices through a common standard without the use
of custom fonts.
References
[1] Binary Object Editor Documentation. http://www.j2mepolish.org/. [Online] [Cited: 02 16, 2009.] http://www.j2mepolish.org/cms/leftsection/documentation/programming/programming-utilities.html.
[3] Ishida, Kazuo, Takada, Jun and Hiroaki, Toshihik. Document Distribution for Mobile Phone by Divided JPEG Images. 2005, Vol. 29, 30.
[4] Li, Sing and Knudsen, Jonathan. Beginning J2ME: From Novice to Professional, Third Edition. s.l. : Apress, Apr, 2005. ISBN10: 1-59059-479-7.
[5] Ratnayake, Gayan. Sri Lanka falls in love with Nokia’s Sinhala enabled handsets. http://www.lankabusinessonline.com. [Online] October 02, 2006. http://www.lankabusinessonline.com/fullstory.php?newsID=1524104686&no_view=1&SEARCH_TERM=
[7] Virkus, Robert. Pro J2ME Polish: Open Source Wireless Java Tools Suite. s.l. : Apress, Jul, 2005. ISBN10: 1-59059-503-3.
[8] Weerasinghe A. R., Herath D. L., Gamage K. The Sinhala Collation Sequence and its Representation in UNICODE. Localization Focus. 2006, Vol. 05, 01.
Appendix A
Abbreviations
API Application Programming Interface
Appendix B
Appendix C
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Hashtable;
import javax.microedition.lcdui.Image;
                System.out.println(this.getClass().getResourceAsStream(fontUrl));
                return;
            }
            // Header: number of glyphs in the font
            DataInputStream dataIn = new DataInputStream( in );
            this.length = dataIn.readInt();
            System.out.println(this.length);
            // Internal 16-bit number of each glyph
            this.characterMap = new short[this.length];
            System.out.println("Character Map");
            for(int i=0; i<this.length; ++i) {
                characterMap[i]=dataIn.readShort();
                System.out.println(characterMap[i]);
            }
            /*this.spaceIndex = map.indexOf(' ');*/
            // Width of each glyph and its starting x position in the font image
            this.characterWidths = new byte[this.length];
            this.xPositions = new short[this.length];
            short xPos = 0;
            System.out.println("char widths");
            for (int i = 0; i < length; i++ ) {
                byte width = dataIn.readByte();
                System.out.println(width);
                this.characterWidths[i] = width;
                this.xPositions[i] = xPos;
                xPos += width;
            }
            System.out.println("Over with char widths");
            // The rest of the stream is the image holding all glyphs
            this.fontImage = Image.createImage( in );
            this.fontHeight = this.fontImage.getHeight();
            this.fontUrl = null;
        } catch (IOException e) {
            System.out.println("Unable to load bitmap-font [" +
                    this.fontUrl + "]" + e);
        } finally {
            if (in != null) {
                try {
                    in.close();
                } catch (IOException e) {
                    System.out.println("Unable to close bitmap-font stream" + e);
                }
            }
        }
    }
public static void removeInstance(String url) {
fontsByUrl.remove( url );
}
return width;
}