



TECHNICAL REPORT
ISO/IEC TR 15285

First edition
1998-12-15

Information technology — An operational
model for characters and glyphs
Technologies de l’information — Modèle pour l'utilisation de caractères
graphiques et de glyphes

Reference number
 ISO/IEC TR 15285:1998(E)


ISO/IEC TR 15285: 1998 (E) © ISO/IEC

Contents
Foreword
Introduction
1 Scope
2 References
3 Definitions
4 Character and glyph distinctions
5 Operational model
6 Glyph selection
7 Summary
Annex A: Bibliography
Annex B: Characters
Annex C: Glyphs
Annex D: Font models
Annex E: Examples of character-to-glyph mapping
Annex F: Recommendations of the original report

© ISO/IEC 1998
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or
utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm, without permission in writing from the publisher.

ISO/IEC Copyright Office • Case Postale 56 • CH-1211 Genève 20 • Switzerland


Printed in Switzerland



Foreword
ISO (the International Organization for Standardization) and IEC (the
International Electrotechnical Commission) form the specialized system
for worldwide standardization. National bodies that are members of ISO
or IEC participate in the development of International Standards
through technical committees established by the respective organization
to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in
liaison with ISO and IEC, also take part in the work.

The main task of a technical committee is to prepare International
Standards, but in exceptional circumstances a technical committee may
propose the publication of a Technical Report of one of the following
types:

• Type 1, when the required support cannot be obtained for the
  publication of an International Standard, despite repeated efforts;

• Type 2, when the subject is still under technical development or
  where for any other reason there is the future but not immediate
  possibility of an agreement on an International Standard;

• Type 3, when a technical committee has collected data of a different
  kind from that which is normally published as an International
  Standard (“state of the art”, for example).

Technical Reports of types 1 and 2 are subject to review within three
years of publication to decide whether they can be transformed into
International Standards. Technical Reports of type 3 do not necessarily
have to be reviewed until the data they provide are considered to be no
longer valid or useful.

ISO/IEC TR 15285, which is a Technical Report of type 3, was prepared
by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 2, Coded character sets, and Subcommittee SC 18,
Document processing and related communication (which has since been
reorganized into SC 34, Document description and processing languages).


Introduction
People interpret the meaning of a written sentence by the shapes of the
characters contained in it. For the characters themselves, people
consider the information content of a character inseparable from its
printed image. Information technology, in contrast, makes a distinction
between the concepts of a character’s meaning (the information content)
and its shape (the presentation image). Information technology uses the
term character (or coded character) for the information content, and
the term glyph for the presentation image. A conflict exists because
people consider characters and glyphs equivalent. Moreover, this
conflict has led to misunderstanding and confusion. This Technical
Report provides a framework for relating characters and glyphs to
resolve the conflict because successful processing and printing of
character information on computers requires an understanding of the
appropriate use of characters and glyphs.

Historically, ISO/IEC JTC 1/SC 2 has had responsibility for the
development of coded character set standards such as ISO/IEC 10646 for
the digital representation of letters, ideographs, digits, symbols, etc.
ISO/IEC JTC 1/SC 18 has had responsibility for the development of
standards for document processing, which presents the characters coded
by SC 2. SC 18 standards include the font standard, ISO/IEC 9541, and
the glyph registration standard, ISO/IEC 10036. The Association for
Font Information Interchange (AFII) maintains the 10036 glyph registry
on behalf of ISO.

This Technical Report is written for a reader who is familiar with the
work of SC 2 and SC 18. Readers without this background should first
read Annex B, “Characters”, and Annex C, “Glyphs”.

This edition of the Technical Report does not fully develop the complex
issues associated with the Chinese, Japanese, Korean, and Vietnamese
ideographic characters used in East Asia. In addition, although it
discusses the process of rendering digital character information for
display and printing, it avoids discussing the inverse process of
character recognition (that is, converting printed text into character
information in the computer).


Information technology —
An operational model for characters and glyphs

1 Scope

The purpose of this Technical Report is to provide a general framework
for discussing characters and glyphs. The framework is applicable to a
variety of coded character sets and glyph-identification schemes. For
illustration, this Technical Report uses examples from characters coded
in ISO/IEC 10646 and glyphs registered according to ISO/IEC 10036.

This Technical Report

• differentiates between coded characters and registered glyphs

• identifies the domain of use of coded characters and glyph identifiers

• provides a conceptual framework for the formatting and presentation
  of coded character data using glyph identifiers and glyph
  representations

This Technical Report describes idealized principles that were not
completely followed in coding characters for ISO/IEC 10646 and in
registering glyphs according to ISO/IEC 10036. The fact that ISO/IEC
10646, ISO/IEC 10036, and other standards do not completely follow the
principles in the model does not invalidate the model and does not
diminish the utility of having the model.

2 References

ISO/IEC 9541-1: 1991, Information technology — Font information
interchange — Part 1: Architecture.

ISO/IEC 10036: 1996, Information technology — Font information
interchange — Procedures for registration of font-related identifiers.

ISO/IEC 10180: 1995, Information technology — Processing languages —
Standard Page Description Language (SPDL).

ISO/IEC 10646-1: 1993, Information technology — Universal
Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and
Basic Multilingual Plane.

3 Definitions

For the purpose of this Technical Report, the following definitions
apply. The definitions have been extracted from the ISO/IEC 9541-1:
1991 and ISO/IEC 10646-1: 1993 standards.

3.1 character: A member of a set of elements used for the
organisation, control, or representation of data. (ISO/IEC 10646-1:
1993)

3.2 coded character set: A set of unambiguous rules that establishes a
character set and the relationship between the characters of the set
and their coded representation. (ISO/IEC 10646-1: 1993)

3.3 font: A collection of glyph images having the same basic design,
e.g. Courier Bold Oblique. (ISO/IEC 9541-1: 1991)

3.4 font resource: A collection of glyph representations together with
descriptive and font metric information which are relevant to the
collection of glyph representations as a whole. (ISO/IEC 9541-1: 1991)

3.5 glyph: A recognizable abstract graphic symbol which is independent
of any specific design. (ISO/IEC 9541-1: 1991)

3.6 glyph collection: An identified set of glyphs. (ISO/IEC 9541-1:
1991)

3.7 glyph image: An image of a glyph, as obtained from a glyph
representation displayed on a presentation surface. (ISO/IEC 9541-1:
1991) [See the definition of graphic symbol.]

3.8 glyph metrics: The set of information in a glyph representation
used for defining
the dimensions and positioning of the glyph shape. (ISO/IEC 9541-1:
1991)

3.9 glyph representation: The glyph shape and glyph metrics associated
with a specific glyph in a font resource. (ISO/IEC 9541-1: 1991)

3.10 glyph shape: The set of information in a glyph representation
used for defining the shape which represents the glyph. (ISO/IEC
9541-1: 1991)

3.11 graphic character: A character, other than a control function,
that has a visual representation normally handwritten, printed, or
displayed. (ISO/IEC 10646-1: 1993)

3.12 graphic symbol: The visual representation of a graphic character
or of a composite sequence. (ISO/IEC 10646-1: 1993) [See the definition
of glyph image.]

3.13 presentation [of a graphic symbol]: The process of writing,
printing, or displaying a graphic symbol. (ISO/IEC 10646-1: 1993)

3.14 presentation form: In the presentation of some scripts, a form of
a graphic symbol representing a character that depends on the position
of the character relative to other characters. (ISO/IEC 10646-1: 1993)

3.15 presentation surface: A virtual representation of a presentation
medium (page, graphic display, etc.) maintained by the presentation
process, on which all glyph shapes are to be imaged. (ISO/IEC 9541-1:
1991)

3.16 repertoire: A specified set of characters that are represented in
a coded character set. (ISO/IEC 10646-1: 1993)

4 Character and glyph distinctions

The character and glyph definitions in clause 3, which were taken from
ISO/IEC 10646 and ISO/IEC 9541, were developed independently and
contain terminology that requires explanation.

In information technology, characters are abstract information elements
in the domain of coding for data representation and, in particular,
data interchange. Coded character set standards assign numeric values,
character names, and representative (sample) images to each character
contained in a coded character set. Typically a character is given a
name, which also serves to differentiate it from the other characters
of the coded character set. The precise semantics and appearance of the
information elements in any given implementation are not defined by
those standards for coded character sets. This apparent lack of
definition is not considered to be a defect in the standards.
Recognizing that the information may be acted upon (deciphered, sorted,
transformed, formatted, archived, presented, etc.) by many different
application processes during its lifetime, standards for coded
character sets are defined as a basis for information interchange.

In information technology, glyphs are abstract presentation elements in
the domain of presentation processing. The ISO/IEC 10036 standard for
glyph registration defines the process for assigning glyph identifiers,
glyph descriptions, and representative (sample) images to each glyph
submitted for registration. The precise usage and appearance of these
presentation elements in any implemented font resource is not defined
by those glyph registration activities. As with the coded character set
standards, this apparent lack of definition is not considered to be a
defect in the standards. Glyph identifiers are unambiguously assigned
as a basis for tagging presentation elements in and among interchanged
font resources, recognizing that the font-specific design information
may vary from one font resource to another.

Characters and glyphs are closely related, with many attributes in
common and yet with distinctions that make it essential that they be
managed in information processing as separate entities. The ISO/IEC
10646 standard recognizes the distinction between
characters and their visual representation by defining the term,
graphic symbol. The graphic symbol of SC 2 standards and the glyph
image of SC 18 standards represent equivalent concepts. However, glyph
and its associated ISO/IEC 9541 terminology are preferred when
referring to presentation and presentation processing.

The historical association of characters and glyphs has resulted in
character sets maintaining distinctions that cannot be founded on
distinctions in meaning, but only on distinctions in shape. Similarly,
the glyph registration authority and the SC 18 font resource model have
made use of criteria based on meaning to abstract potential
distinctions in shape. In practice, ISO/IEC 10646 contains characters
that appear to be instances of glyphs, while the glyph registry
prescribed by ISO/IEC 10036 contains glyphs that appear to be
designated as abstract characters. In both cases, the ideal nature of
characters and glyphs has been compromised to a degree. For example, in
ISO/IEC 10646-1, SC 2 coded the “ﬁ” glyph into the character U+FB01
LATIN SMALL LIGATURE FI “ﬁ” for round-trip integrity with other
standards.1) (See Annex B.5 The “round-trip rule”.) Also, the JTC 1
Registration Authority (AFII) for ISO/IEC 10036 could have registered
the same glyph identifier for the “A” glyph and used it for the U+0041
LATIN CAPITAL LETTER A “A” character, for the U+0391 GREEK CAPITAL
LETTER ALPHA “Α” character, and the U+0410 CYRILLIC CAPITAL LETTER A
“А” character. However, AFII instead registered three glyph
identifiers.

1) This Technical Report describes a character in terms of its 10646
   code position (U+FB01), its 10646 name (LATIN SMALL LIGATURE FI),
   and illustrates it with a representative glyph in quotation marks
   (“ﬁ”).

Within the realm of information technology, an ideal characterization
of characters and glyphs and their relationship may be stated as
follows:

• A character conveys distinctions in meaning or sounds. A character
  has no intrinsic appearance.

• A glyph conveys distinctions in form or appearance. A glyph has no
  intrinsic meaning.

• One or more characters may be depicted by no, one, or multiple glyph
  representations (instances of an abstract glyph) in a way that may
  depend on the context.

5 Operational model

5.1 Character and glyph domains

Character information has two primary domains as illustrated in
Figure 1. The first pertains to the processing of the content, that is,
the meaning or phonetic value of the character information. This is
depicted on the left side of the figure. The second pertains to the
presentation of the content of the character information. This is
depicted on the right side of the figure.2) Each domain places
different requirements on the representation of the character
information. For example, searching for character information in a
database and sorting records containing character information entail
different requirements from those found in presenting characters on
paper. The former processes are primarily concerned with the content of
data and have little or no concern about the appearance that the data
may take.

2) ISO/IEC 6429 also depicts a 2-layer structure. For ISO/IEC 6429, the
   data layer could use characters, and the presentation layer could
   use glyphs to present the characters in the data layer.

On the other hand, a composition and layout process has little concern
for the content of data, but great concern about its appearance. In
general, processing of character information in the content domain is
independent of font resources, whereas processing in the presentation
domain is strongly dependent on the font resource used for the
presentation of the character information. However, processes that
perform transformations from one domain to the other are aware of both
the content and appearance of characters. For example, a character
recognition process converts images into coded characters. Also, a
paragraph-level hyphenation process is an example of a layout process
that requires content information.

[Figure 1: diagram. Characters (left) and Glyphs (right) are joined by
Layout: Glyph Selection and Substitution.
Content Processing Operations: Data Entry, Search, Sort Order, Spell
Checking, Grammar Checking.
Operations between Domains: Character Recognition, Mouse Selection.
Appearance Processing Operations: Format, Display, Print.]

Figure 1 — Character and glyph domains

It is not possible, in general, to code data in such a way as to
optimize one process without reducing the performance of other
processes. Even within the content domain, the nature of the character
coding employed for textual data affects the type or types of
processing to be performed on the data; no single coding can optimize
more than a few such potential processes. Given this situation, the
best solution is to formulate an independent, logical character coding
that, when necessary, can be transformed into another coding more
amenable to the processing required. For example, in the case of
searching, character data is often recast into specific forms that
facilitate quick searches. For sorting, a specially created sort key is
required. In addition, because ISO/IEC 10646 contains glyph-like
characters, it is expected that implementations may choose to
canonicalize or normalize such characters by translating them to
normative characters. A presentation subsystem that employs such a
technique may require that character data be normalized prior to
presentation.

The recognition that two separate domains of processing are commonly
applied to character-based information leads to a conclusion that two
primary forms of this information are needed:

1. a content-oriented form that is amenable to immediate content-based
   processes and that can be easily converted to and from other
   optimized forms

2. an appearance-oriented form that facilitates imaging of content

These are, respectively, the character-based form and the glyph-based
form. Failure to recognize this distinction between the character
domain and the glyph domain has led to the development of inconsistent
standards and inconsistent systems that lack functional separation of
the two domains.

5.2 Composition, layout, and presentation

As depicted in Figure 2, the composition and layout process (for glyph
selection and positioning) spans both processing domains. If attention
is restricted to the text portion of this process, the presentation of
character-based information requires three primary operations:

• selecting the glyph representations needed to display character data

• positioning the glyph shapes on the presentation surface

• imaging the glyph shapes

[Figure 2: diagram. Labels: Data Entry; Character Information;
Content-based Processing (Sorting, Searching, Spell checking);
Composition / Layout (Both); Presentation Information (glyph
identifiers); Appearance-based Processing (Displaying, Printing).]

Figure 2 — Composition, layout, and presentation

Glyph selection is the process of selecting (possibly through several
iterations) the most appropriate glyph identifier or combination of
glyph identifiers to render a coded character or composite sequence of
coded characters. Coded characters and their associated implicit or
explicit formatting information (for example, specification of the font
and its size) represent the primary inputs to composition and layout
processing, and glyph identifiers (or the associated glyph metrics and
glyph shapes) represent the primary output from composition and layout
processing. The degree of glyph selection sophistication varies widely
among existing standards and implementations.

The relationship between coded characters and glyph identifiers may be
one-to-one, one-to-many, many-to-one, or many-to-many.3) This is
particularly true for ISO/IEC 10646 implementation level 3, which uses
combining characters. In its fully general form, the relationship is a
context-sensitive M-to-N mapping where M > 0, N ≥ 0. For some
characters in ISO/IEC 10646-1, for example, the U+FEFF ZERO WIDTH
NO-BREAK SPACE character, no glyph (N=0) is defined.

3) The necessity for mapping characters to glyphs (glyph selection),
   not its complexity, is one of the motivations for developing this
   operational model for characters and glyphs.

The SC 18 document-processing model separates the glyph selection and
layout operations from the operation of imaging the glyph shape to
permit document interchange between the processes. Glyph selection and
positioning are part of the composition and layout process, whereas
imaging the glyph shape is part of the presentation process. The result
of composition and layout is a final-form document, which contains font
identifiers, glyph identifiers, and coordinate positions, along with
either references to font resources or the actual font resources
themselves. Such a document form contains all the necessary information
required to present the formatted
document on some presentation medium. An example of such a final form
document is an SPDL (ISO/IEC 10180) document instance.

An important aspect of this document-processing model is that it begins
with coded-character data as its input and produces either glyph-based
data or directly imaged glyph shapes as its output. That is, it
incorporates a transformation from a coded-character representation of
a document’s content to a glyph-based coding of a document’s
appearance. The latter may only be visible to the internal mechanisms
of an operating system or a user-interface subsystem in the case that
the result is directly imaged for presentation. However, even these
systems frequently support some form of output that contains the
glyph-based final form of the document.

6 Glyph selection

While some earlier formatting systems assume a one-to-one
correspondence between characters and glyphs, this is inadequate for
many applications and scripts. Many contemporary composition and layout
systems support more complex glyph-selection processes that provide for
the representation of sequences of multiple character codes by a single
glyph or by the use of sequences of glyphs to represent certain
characters. In general, glyph selection needs to be based on style
information and context as well as on the character data itself. For
example, consider the following:

• When the U+0022 QUOTATION MARK “ " ” character is encountered, a
  composition and layout process may need to determine whether it
  begins or ends a quotation and then choose either an opening or
  closing quotation mark glyph (“ “ ” or “ ” ”) as appropriate. In
  addition, the process may select glyphs depending on the language of
  the text being formatted (or the formatting style specifications that
  apply to the content being formatted). For example, German text could
  use the “ „ ” and “ “ ” glyphs for quotation marks; and French text,
  the “«” and “»” glyphs.

• When the U+002D HYPHEN-MINUS “-” character is encountered, a
  composition and layout process may have to determine if it is used in
  a mathematical formula, as a separator between figures (digits), as a
  separator between words, or as a separator between syllables.
  Depending on which context applies, it will select a minus sign, a
  figure dash, a quotation dash, or a hyphen dash (or possibly a hyphen
  point) glyph to display the character.

NOTE: Because the ISO/IEC 10646 repertoire includes the necessary
characters, some applications resolve quotation marks and the
hyphen-minus illustrated in the previous two points by converting to
the appropriate 10646 characters as they are input rather than
selecting the appropriate glyphs for presentation.

• When a parenthesis or square bracket character is encountered in a
  document being formatted in vertical lines (for example, with East
  Asian ideographs), a composition and layout process may need to
  choose a vertical variant glyph form of the parenthesis or square
  bracket. It may also perform a similar selection for certain other
  characters such as U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK “ー”,
  U+2014 EM DASH “—”, U+2025 TWO DOT LEADER “‥”, etc.

• When an Arabic letter is encountered in an Arabic, Farsi, Urdu, etc.
  document, if the Arabic style being used to display the text is of
  the Simplified Naskh type, a composition and layout process may have
  to choose an isolated, initial, medial, or final glyph form for the
  given letter according to its context in the document. For example,
  glyphs for U+0647 ARABIC LETTER HEH “ه” are shown in Figure 3.

[Figure 3: glyph images of the Isolated, Initial, Medial, and Final
forms.]

Figure 3 — Glyphs for ARABIC LETTER HEH

• In addition, Arabic typography makes extensive use of ligatures. For
  example, Figure 4 shows the isolated forms of U+0627 ARABIC LETTER
  ALEF “ا” and U+0644 ARABIC LETTER LAM “ل”, and then the two ligature
  forms used when Lam is followed by Alef.

[Figure 4: glyph images labelled Alef; Lam; Ligature Lam-Alef Isolated;
Ligature Lam-Alef Final.]

Figure 4 — Two example ligatures in an Arabic font

• When a U+0930 DEVANAGARI LETTER RA “र” is encountered in a Hindi,
  Marathi, Sanskrit, etc. document, a composition and layout process
  may have to determine whether a subscript, superscript, half
  (“eyelash”), or full form glyph is required according to context. If
  a subscript form is required, a composition and layout process may
  have to choose from one of a number of possible subscript forms
  depending on the glyph to which it is to be attached. Figure 5 shows
  an example of this.

[Figure 5: glyph images labelled Full; Superscript; Subscripts; Half.]

Figure 5 — Glyphs for DEVANAGARI LETTER RA

The process of glyph selection is sometimes implemented as a separate
part of composition and layout because many of the choices required to
determine an appropriate glyph are based solely on (1) the context of a
character within a document, (2) the style specifications that apply to
a given character, or (3) a combination of the context and style
specification. All of the choices required for the examples shown above
fall into one of these categories. However, in general, glyph selection
can only be made as an integral part of the entire composition and
layout process. Consider the following:

• When hyphenating a line of text during composition, a composition and
  layout process may insert a hyphen glyph form at the end of a line if
  the line is broken at a hyphenation point.

• If hyphenating a German text between the letters “c” and “k”, a
  composition and layout process may replace the “c” with a “k”.

• If during the composition of a German text, the character sequence
  “fff” is encountered, a composition and layout process may select two
  distinct (non-ligated) glyph forms for U+0066 LATIN SMALL LETTER F
  “f”. However, if the position for a hyphen (a hyphen point) should
  occur before the last “f”, that is, at “ff f”, then a composition and
  layout process may select an ff ligature glyph “ﬀ”, followed by a
  hyphen (on the first line), and begin the subsequent line with a
  normal glyph for the third and final “f”.

• A composition and layout process may select small cap glyph forms for
  the first line of a paragraph of Roman text.

• A composition and layout process may select a swash glyph form for
  the first and last character of each line of a paragraph.

• A composition and layout process may select one of a number of
  possible variant glyph forms for certain Arabic letters depending on
  whether more or less space is available for composing a line of
  Arabic text.

• When justifying a line of Arabic text, a composition and layout
  process may start by selecting ligature glyph forms that consume the
  smallest amount of linear space in a line, and then sequentially
  replace these ligatures with component ligatures or component
  non-ligature glyphs such that more linear line space is consumed up
  to the required line measure. Alternatively, a composition and layout
  process may start justification by selecting no ligatures and then
  sequentially select ligatures that consume a smaller amount of linear
  space until the desired line measure is achieved or until an
  inter-word space stretch threshold is reached (that is, a point at
  which inter-word spaces can be stretched to justify the line to the
  desired measure).

In summary, the glyph-selection process is primarily applicable to
behavior occurring at the end or beginning of individual lines of text,
or within the context of justifying or altering the measure of a given
line during line composition. A system supporting the capabilities
illustrated in the preceding examples must include glyph selection as
an integral part of the composition and layout process.

7 Summary

Here are the primary points of this technical report:

• Most people equate a character and its shape.

• This causes difficulties and misunderstanding because contemporary
  information technology distinguishes two related, but distinct,
  domains:

  - The processing domain uses coded characters to represent the
    character’s meaning.
  - The presentation domain uses glyph identifiers to represent the
    character’s image.

• Processes are available to convert between the two domains:

  - Presentation processing takes the coded-character data plus any
    formatting data plus font information to display and print
    character data.
  - A character recognition process scans images, analyzes the shapes,
    and outputs the coded characters that correspond to the shapes.

• Depending on the script and the particular font or fonts used, glyph
  selection can be straightforward or relatively complex.

  - It is straightforward when a one-to-one correspondence exists
    between the set of coded characters and the set of glyphs in a
    font.
  - The process is more complex when it must choose between several
    alternatives; for example, when a sequence of coded characters may
    be mapped into more than one sequence of glyphs in a font.
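
The context-sensitive M-to-N relationship between coded characters and
glyph identifiers (clause 5.2) and the glyph-selection cases of clause 6
can be illustrated with a minimal sketch. Everything named here is an
assumption made for illustration: the function select_glyphs, the glyph
identifier strings (fi_ligature, quote_open, glyph_XXXX), and the
deliberately naive rule for opening versus closing quotation marks.
None of them are identifiers registered under ISO/IEC 10036, and a real
composition and layout process would also consult font and style
information.

```python
from typing import List

# Hypothetical glyph identifiers, invented for this sketch only.
LIGATURES = {("f", "f"): "ff_ligature", ("f", "i"): "fi_ligature"}

# Characters for which no glyph is defined (N = 0), e.g. U+FEFF.
NO_GLYPH = {"\ufeff"}


def select_glyphs(text: str) -> List[str]:
    """Map a sequence of coded characters to a sequence of glyph identifiers."""
    glyphs: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        pair = tuple(text[i:i + 2])
        if ch in NO_GLYPH:
            # One character, zero glyphs.
            i += 1
        elif pair in LIGATURES:
            # Two characters, one glyph (many-to-one).
            glyphs.append(LIGATURES[pair])
            i += 2
        elif ch == '"':
            # Context-sensitive choice (naive assumption): a quotation mark
            # at the start of the text or after white space opens a quotation.
            opening = i == 0 or text[i - 1].isspace()
            glyphs.append("quote_open" if opening else "quote_close")
            i += 1
        else:
            # One character, one glyph (one-to-one).
            glyphs.append(f"glyph_{ord(ch):04X}")
            i += 1
    return glyphs
```

With these assumptions, select_glyphs("fff") yields ["ff_ligature",
"glyph_0066"], and select_glyphs("a\ufeffb") yields only the two glyph
identifiers for "a" and "b".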

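
Clause 5.1 observes that implementations may normalize the glyph-like
characters coded in ISO/IEC 10646 and may recast character data into
forms that suit searching. The sketch below shows one possible way to
do this in the content domain. It assumes Unicode compatibility
normalization (NFKC) as provided by Python's standard unicodedata
module, which this Technical Report does not itself name; the function
names to_content_form and search_key are hypothetical.

```python
import unicodedata


def to_content_form(text: str) -> str:
    """Fold glyph-like compatibility characters into ordinary characters.

    Assumption: NFKC stands in for the normalization that clause 5.1
    describes only in general terms.
    """
    return unicodedata.normalize("NFKC", text)


def search_key(text: str) -> str:
    """Build a simple key for caseless matching from the normalized form."""
    return to_content_form(text).casefold()
```

Under this assumption, U+FB01 LATIN SMALL LIGATURE FI becomes the two
characters "f" and "i", and U+FEEB ARABIC LETTER HEH INITIAL FORM
becomes U+0647 ARABIC LETTER HEH, so a search for "file" also matches
text that was entered with the ligature character.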

Annex A
Bibliography

1. ISO/IEC 646: 1991, Information technology — ISO 7-bit coded
   character set for information interchange.

2. ISO/IEC 6429: 1992, Information technology — Control functions for
   coded character sets.

3. ISO/IEC 6937: 1993, Information technology — Coded graphic character
   sets for text communication – Latin alphabet.

4. ISO/IEC 8859, Information technology — 8-bit single-byte coded
   graphic character sets
   — Part 1. Latin alphabet No. 1 (1987)
   — Part 2. Latin alphabet No. 2 (1987)
   — Part 3. Latin alphabet No. 3 (1988)
   — Part 4. Latin alphabet No. 4 (1988)
   — Part 5. Latin/Cyrillic alphabet (1988)
   — Part 6. Latin/Arabic alphabet (1987)
   — Part 7. Latin/Greek alphabet (1987)
   — Part 8. Latin/Hebrew alphabet (1988)
   — Part 9. Latin alphabet No. 5 (1989)
   — Part 10. Latin alphabet No. 6 (1993).

5. ISO/IEC 10367: 1991, Information technology — Standardized coded
   graphic character sets for use in 8-bit codes.

6. ISO/IEC 10538: 1991, Information technology — Control functions for
   text communication.

7. ANSI X3.4-1986, American National Standard for Information Systems —
   Coded Character Sets — 7-Bit American National Standard Code for
   Information Interchange (7-Bit ASCII).

8. JIS X 0201-1976, Japanese Standards Association, Jouhou koukan you
   fugou (Code for Information Interchange).

9. JIS X 0208-1990, Japanese Standards Association, Jouhou koukan you
   kanji fugoukei (Code of the Japanese Graphic Character Set for
   Information Interchange).

10. Becker, Joseph D., “Multilingual Word Processing”, Scientific
    American, Vol. 251, No. 1, July, 1984, pp. 96–107.

11. Bringhurst, Robert, The Elements of Typographic Style, Hartley and
    Marks, Vancouver, 1996.

12. Hartmann, R. R. K., and Stork, F. C., Dictionary of language and
    linguistics, Applied Science Publishers Ltd., London, 1976.

13. Lofting, Peter, “The Perception of Character Entities in Unfamiliar
    Scripts”, unpublished paper, July, 1995.

14. The Unicode Consortium, The Unicode Standard, Version 2.0,
    Addison-Wesley, Reading, MA, 1996.


Annex B
Characters

B.1 Definition

In ISO/IEC 10646-1:1993, SC 2 defines a character as:

    A member of a set of elements used for the organisation, control, and representation of data.

This definition asserts (1) that, in the context of the role of SC 2, a character is an element of a larger set, a character set, and (2) that a character is used to represent data or to organize and control data, or in a few cases, both. The division between data characters and control characters is usually specified by requiring the former to be graphical characters, that is, characters with which some graphical form can be associated. A character is not generally found (or interpreted) in isolation, but appears as an element of a sequence (an array) of characters, that is, a character string, and therefore is interpreted according to the context in which it appears.

After defining a character in this fashion, SC 2 defines character sets by enumerating a list of characters. Such characters are enumerated by assigning a unique name to each character, by specifying a unique code (the code position), and by depicting a representative image in a table (the code table). In general, this describes the entire formal content of any given SC 2 coded character set standard, although various standards sometimes augment their formal content with additional information, particularly information pertaining to characters that participate in control functions.

B.2 Character information

What SC 2 does not do—and this is perhaps the most important point of this annex—is formally define the data or units of information that graphic characters are supposed to represent; that is, no formal semantics are specified to assist in the task of interpreting the so-called data supposedly being represented by a character. Instead, SC 2 assumes that the semantics of a character is either (1) self-evident or (2) subject to conventions adhered to by the user of the character, namely, the application.

In a small character set standard, such as ISO/IEC 646: 1991, the process of determining the information represented by each character is relatively straightforward and usually involves the invocation of self-evident knowledge. For example, the characters of ISO/IEC 646 that appear to be the letters of the modern English alphabet, and to which are assigned names that appear to be the names of the letters of this alphabet, are indeed usually assumed to represent none other than the English alphabet. However, this assumption is not supported by the formal definition of ISO/IEC 646. Nowhere does this standard specify that these characters actually represent information to be interpreted as letters of the English alphabet. Indeed, an application developer who happens to be Hawaiian may interpret these characters as representing the elements of the Hawaiian alphabet (plus a few extra letters not used by Hawaiian), or a Japanese developer may interpret them as representing the elements of the Romaji form of written Japanese. In each case, the user of the standard is applying conventions that do not conflict with the standard itself and that enable the user to employ the standard in a useful way. Other elements of ISO/IEC 646, such as the characters assigned to positions 2/13 (U+002D HYPHEN-MINUS “-”) and 2/7 (U+0027 APOSTROPHE “'”), are commonly given multiple interpretations depending on their use. For example, the latter character may be used as an apostrophe, as a single quote mark, or, in some transliteration systems, as standing for a glottal stop or a palatalized consonant. Since the standard does not specify which information the character represents, a user of the standard is free to choose. Once the number of characters in a standard is increased many times, such as the case with
ISO/IEC 10646-1: 1993 where over 30,000 characters are defined, the potential for multiple usage conventions increases.

B.3 Example, the unit of information “one”

Consider for a moment the case of the unit of information meaning “one”. ISO/IEC 10646 not only codes a large number of characters that conceivably represent this unit of information but also codes a number of characters that represent a particular form associated with this meaning. The characters that may be said to represent the unit of information designated by “one” are (at least):

    U+0031 DIGIT ONE “1”
    U+00B9 SUPERSCRIPT ONE “¹”
    U+0661 ARABIC-INDIC DIGIT ONE “١”
    U+06F1 EXTENDED ARABIC-INDIC DIGIT ONE “۱”
    U+0967 DEVANAGARI DIGIT ONE “१”
    U+09E7 BENGALI DIGIT ONE “১”
    U+09F4 BENGALI CURRENCY NUMERATOR ONE “৴”
    U+0A67 GURMUKHI DIGIT ONE “੧”
    U+0AE7 GUJARATI DIGIT ONE “૧”
    U+0B67 ORIYA DIGIT ONE “୧”
    U+0BE7 TAMIL DIGIT ONE “௧”
    U+0C67 TELUGU DIGIT ONE “౧”
    U+0CE7 KANNADA DIGIT ONE “೧”
    U+0D67 MALAYALAM DIGIT ONE “൧”
    U+0E51 THAI DIGIT ONE “๑”
    U+0ED1 LAO DIGIT ONE “໑”
    U+2081 SUBSCRIPT ONE “₁”
    U+215F FRACTION NUMERATOR ONE “⅟”
    U+2160 ROMAN NUMERAL ONE “Ⅰ”
    U+2170 SMALL ROMAN NUMERAL ONE “ⅰ”
    U+2460 CIRCLED DIGIT ONE “①”
    U+2474 PARENTHESIZED DIGIT ONE “⑴”
    U+2488 DIGIT ONE FULL STOP “⒈”
    U+2776 DINGBAT NEGATIVE CIRCLED DIGIT ONE “❶”
    U+2780 DINGBAT CIRCLED SANS-SERIF DIGIT ONE “➀”
    U+278A DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE “➊”
    U+3021 HANGZHOU NUMERAL ONE “〡”
    U+3192 IDEOGRAPHIC ANNOTATION ONE MARK “㆒”
    U+3220 PARENTHESIZED IDEOGRAPH ONE “㈠”
    U+3280 CIRCLED IDEOGRAPH ONE “㊀”
    U+4E00 CJK UNIFIED IDEOGRAPH-4E00 “一”
    U+58F9 CJK UNIFIED IDEOGRAPH-58F9 “壹”
    U+FF11 FULLWIDTH DIGIT ONE “１”

Of these characters, the following are merely size or position variants of a single form:

    U+0031 DIGIT ONE “1”
    U+00B9 SUPERSCRIPT ONE “¹”
    U+2081 SUBSCRIPT ONE “₁”
    U+FF11 FULLWIDTH DIGIT ONE “１”

The following are various adorned variants of this form:

    U+215F FRACTION NUMERATOR ONE “⅟”
    U+2460 CIRCLED DIGIT ONE “①”
    U+2474 PARENTHESIZED DIGIT ONE “⑴”
    U+2488 DIGIT ONE FULL STOP “⒈”
    U+2776 DINGBAT NEGATIVE CIRCLED DIGIT ONE “❶”
    U+2780 DINGBAT CIRCLED SANS-SERIF DIGIT ONE “➀”
    U+278A DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE “➊”

The following characters, although they all represent the concept “one”, employ different forms depending on the script with which they are associated. However, one could argue that several of these forms are really different instances of a single form from which they are historically derived, namely, the Indic-script forms of “one”:

    U+0661 ARABIC-INDIC DIGIT ONE “١”
    U+06F1 EXTENDED ARABIC-INDIC DIGIT ONE “۱”
    U+0967 DEVANAGARI DIGIT ONE “१”
    U+09E7 BENGALI DIGIT ONE “১”
    U+0A67 GURMUKHI DIGIT ONE “੧”
    U+0AE7 GUJARATI DIGIT ONE “૧”
    U+0B67 ORIYA DIGIT ONE “୧”
    U+0BE7 TAMIL DIGIT ONE “௧”
    U+0C67 TELUGU DIGIT ONE “౧”
    U+0CE7 KANNADA DIGIT ONE “೧”
    U+0D67 MALAYALAM DIGIT ONE “൧”
    U+0E51 THAI DIGIT ONE “๑”
    U+0ED1 LAO DIGIT ONE “໑”
    U+3021 HANGZHOU NUMERAL ONE “〡”
    U+3192 IDEOGRAPHIC ANNOTATION ONE MARK “㆒”
    U+3220 PARENTHESIZED IDEOGRAPH ONE “㈠”
    U+3280 CIRCLED IDEOGRAPH ONE “㊀”
    U+4E00 CJK UNIFIED IDEOGRAPH-4E00 “一”
    U+58F9 CJK UNIFIED IDEOGRAPH-58F9 “壹”

This example clearly shows that the designers of this character set did not start with individual units of information and assign each such unit to a unique character;
furthermore, it is also clear that the designers did not start with individual forms and assign each to a unique character. Rather, a combination of forms and variations of a single form, all signifying the idea “one”, were included as distinct characters.

To gain an understanding of the distinction between characters and glyphs, consider that the following characters could have easily been unified into a single character that would be displayed using one of four glyphs:

    U+0031 DIGIT ONE “1”
    U+00B9 SUPERSCRIPT ONE “¹”
    U+2081 SUBSCRIPT ONE “₁”
    U+FF11 FULLWIDTH DIGIT ONE “１”

These four characters can be considered as instances of one character that takes on slightly different forms depending on usage. In this case, usage or style alone would govern the form chosen to depict a single abstract character. In the case of a form used as the numerator of a fraction, the appropriate glyph could be determined based on the local context of the character, assuming for a moment that a character such as a U+0031 DIGIT ONE “1” is followed by a U+2044 FRACTION SLASH “⁄”. In the remaining cases, the character’s immediate context would not be sufficient but would require that additional information be supplied such as style information that would govern the appearance of a character when displayed. In either case, the process of depicting a given character may require the selection of one of a number of possible glyphs, each of which may serve (in different cases) to present the image of a character.

Notice that certain other possible forms of a “one” are, in fact, not found in this standard as characters. For example, many high-quality font collections supply a collection of forms for the Arabic numerals known as old style figures shown in Figure 6. Were the old style figures included as characters, the OLD STYLE FIGURE DIGIT ONE “1” could have been added to 10646.

0123456789

Figure 6 — Old style figures

B.4 Considerations for deciding the repertoire of a coded character set

Various arguments are possible for defending the inclusion or exclusion of a particular form as a possible graphic character in a repertoire. In many cases, the criterion for either inclusion or exclusion has not been articulated but is based on informal opinion about appropriateness. Justifying why certain forms were coded into ISO/IEC 10646-1: 1993 and why others were not is beyond the scope of this Technical Report. However, with respect to coding glyphs versus characters, the objective is to code characters that represent different information. To meet this objective, three important considerations should be applied.4)

4) Peter Lofting, “The Perception of Character Entities in Unfamiliar Scripts”.

1. Same shape/different meanings

    Does one shape have multiple meanings (semantics)?

    Some shapes will be the same, or nearly the same, but have different meanings or different semantics. An example of this is that in many sans-serif fonts the glyph “I” is used for both the U+0049 LATIN CAPITAL LETTER I “I” and the U+006C LATIN SMALL LETTER L “l”. Similarly, for years many typewriters lacked a key for the U+0031 DIGIT ONE “1” and people were taught to type the U+006C LATIN SMALL LETTER L “l” instead. Later, when people switched from typewriters to computers, this practice failed and people had to relearn to type the digit one “1” instead of the letter “l”.

2. Different shapes/same meaning

    Do two or more shapes imply the same meaning (semantics)?


    Shape differences may be font design differences or glyph rendering differences. Examples of font design differences (for which the different shapes would have the same glyph identifier in the ISO/IEC 10036 glyph register) are the “a” and “a” glyph variations of the U+0061 LATIN SMALL LETTER A “a”. Examples of glyph rendering differences (for which the different shapes would have different glyph identifiers) are the Arabic letters and corresponding initial, medial, and final presentation forms. Figure 3 illustrates this concept. It is important to discern small differences and determine when they are merely embellishments and when they change the meaning. For example, the shape of the U+0428 CYRILLIC CAPITAL LETTER SHA “Ш” differs very little from the shape of the U+0429 CYRILLIC CAPITAL LETTER SHCHA “Щ”, yet they are different letters.

3. Compatibility

    Is the shape needed for migration of, and coexistence with, text coded using an older coded character set?

    In practice, the need for compatibility with existing coded character sets frequently overrides the second consideration. Examples of this are found in ISO/IEC 10646-1: 1993. The next clause describes an important compatibility criterion, the “round-trip rule”.

These considerations should be used to help decide which forms to include in a new repertoire to be coded. Although the considerations are easy to state, obtaining definitive answers requires considerable effort, for example, to consult with experts and native users, who are normally unaware of information technology and not concerned with such details.

B.5 The “round-trip rule”

In the case of ISO/IEC 10646, an informal criterion (known as the “round-trip rule”) for the inclusion of a character can be phrased as follows:

    If a form is included as a character in any of the character sets from which ISO/IEC 10646 is derived, then that form shall be included as a character in ISO/IEC 10646 such that distinctions among characters in the source character set are maintained as distinctions in ISO/IEC 10646.

This criterion was defined such that the elements of two source character sets could be unified with each other (for example, the ideographic characters in the Chinese, Japanese, and Korean national standards) while at the same time guaranteeing that distinctions within a source character set would be maintained. The latter was required to guarantee that no loss of information would occur when translating from one of the source character sets to 10646 and then back to the original character set.

Certain characters that might have been unified in 10646 were not unified because of the round-trip rule. For instance, U+00B9 SUPERSCRIPT ONE “¹” was not unified with U+0031 DIGIT ONE “1” because ISO 8859-1: 1987, a source character set for 10646, includes these two forms as distinct characters. Most of the instances of formal entities within 10646 that could have been unified were likewise distinct characters in some source character set or, in some special instances, distinct characters in certain unions of character sets, for example, the union of 7-bit ASCII (ANSI X3.4-1986), JIS X 0201-1976, and JIS X 0208-1990 as employed in Shift JIS coding in Japan.
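
The round-trip rule of B.5 can be checked mechanically for a single-byte source set. The following is a minimal Python sketch, not part of this Technical Report. It assumes that Python's built-in `latin-1` codec is an acceptable stand-in for ISO 8859-1, and it uses Unicode compatibility normalization (NFKC) as a hypothetical "unification" of SUPERSCRIPT ONE with DIGIT ONE. The function names are illustrative only.

```python
import unicodedata


def round_trips(data: bytes, codec: str = "latin-1") -> bool:
    """Return True if source bytes survive decoding and re-encoding."""
    return data.decode(codec).encode(codec) == data


def unify_then_round_trip(data: bytes, codec: str = "latin-1") -> bytes:
    """Decode, apply a hypothetical unification (NFKC), then re-encode."""
    text = unicodedata.normalize("NFKC", data.decode(codec))
    return text.encode(codec)


# 0xB9 is SUPERSCRIPT ONE and 0x31 is DIGIT ONE in ISO 8859-1.
source = b"x\xb9 and x1"

# The two source characters stay distinct, so no information is lost.
assert round_trips(source)

# Had the two forms been unified, 0xB9 would come back as 0x31.
assert unify_then_round_trip(source) == b"x1 and x1"
```

The second assertion shows the loss that the rule is meant to prevent: after unification the original byte 0xB9 can no longer be recovered from the text.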


Annex C
Glyphs

C.1 Definition

SC 18 defines a glyph as:

    An abstract identified graphical symbol independent of any actual image.

Two aspects of this definition are important to consider: (1) a glyph is identifiable; and (2) a glyph is an abstraction of an actual image. The notion of identification is closely tied to the use of a glyph. In the SC 18 model of font resources, articulated by ISO/IEC 9541, ISO/IEC 10180 (SPDL), and other standards, each element of a font resource must be able to be identified. This identification facilitates the unique selection of the representation of a glyph from a font resource and the interchange of such identifications embedded in the formatted, final form of a document, for example, an ISO/IEC 10180 file. The definition of a final-form document specifies that all composition and layout operations have already taken place and, in particular, that the selection of the glyphs that will be employed to depict character data has already occurred. The business of defining identifiers for glyphs is the task of ISO/IEC 10036, and AFII (Association for Font Information Interchange) is the current registration authority. To ensure global uniqueness, the ISO/IEC 10036 glyph identifiers are structured names as defined by ISO/IEC 9541.

The second aspect of the SC 18 definition of a glyph is that it is an abstraction that is independent of an actual image. This is analogous to the primary definition of a character as representing data. The level of abstraction is not defined nor are criteria defined that would allow determining whether two potential images (forms) are instances of one abstract glyph or are to be considered two distinct glyphs, each having an independent image.

The distinction between the concepts of glyph and grapheme is not addressed by this Technical Report. Grapheme is the concept used in linguistic theory in the following sense:5)

    Allograph: One of a group of variants of a grapheme or written sign. It usually refers to different shapes of letters and punctuation marks, e.g., lower case, capital, cursive, printed, strokes, etc., …

    Grapheme: A minimum distinctive unit of the writing system of a particular language, … the grapheme has no physical identity, but is an abstraction based on the different shapes of written signs and their distribution within a given system. These different variants, e.g., the cursive and printed shapes of letters M, m, cursivated m, M, etc. in an alphabetic writing system are all allographs of the grapheme /m/.

5) R. R. K. Hartmann and F. C. Stork, Dictionary of language and linguistics.

As can be seen, glyph and grapheme are clearly related, partly overlapping concepts. The difference is that the grapheme concept is defined in relation to writing systems of particular languages, whereas the glyph concept is defined independently of language.

C.2 Assignment of glyph identifiers

In specifying characters for inclusion in a character set standard, SC 2 normally has recourse to the meaning of a character and, in particular, has the option of unifying two or more forms if it is determined that those forms do not represent distinctions in meaning within a particular written language or that the forms represent merely stylistic differences. In registering glyphs, the glyph registration authority of ISO/IEC 10036 has recourse to analysis of the form of the glyph and has worked to identify which potential glyphs are merely design variations of a single abstract glyph. However, the glyph registration authority of ISO/IEC 10036
must be prepared to register an arbitrary glyph if so requested.

The difficulty of identifying design or writing system variants of a glyph is that the criteria for identifying distinct glyphs are culturally dependent. In Latin fonts used with European languages, a wide set of variations is allowed in the design of the glyphs. The skeletal structure of the glyphs can change; strokes can be omitted; the form of the stroke can change; and extra elements and some flourishes can be added without creating a new glyph. The users of ideographic glyphs are much more restrictive in the set of variations they will allow before a new glyph is created. Thus, the input of experts is extremely important in identifying the relevant glyphs to be registered.

C.3 Use of glyph identifiers

Glyph identifiers are typically used in the following data structures: (1) a font resource to uniquely identify the glyph metric and shape information contained in that font resource, (2) a character-to-glyph mapping table to identify the glyph(s) to be used when one or more character codes occur in a revisable document, (3) a glyph-index map to identify the glyph to be used when a glyph index occurs in a formatted document, and (4) a glyph collection to identify the set of glyphs making up the collection. In these four uses, the industry is better served by having commonly defined, universal glyph identifiers. However, fonts are not required to use registered glyph identifiers. For example, within a font, ISO/IEC 9541 specifically allows the use of glyph identifiers that are not registered under ISO/IEC 10036.

C.3.1 Font resource

ISO/IEC 9541 defines a font resource as:

    A collection of glyph representations together with descriptive and font metric information which are relevant to the collection of glyph representations as a whole.

Each glyph representation in a font resource defines the metric and shape information associated with a specific glyph. It is necessary that each glyph representation be uniquely identified from all other glyph representations in that font resource. The glyph identifiers used within a font resource may be unique to that one font resource only or may be unique within some larger scope (company register, industry register, national register, or international register).

C.3.2 Character-to-glyph mapping table

Character-to-glyph mapping tables are not defined by ISO standards but are necessary to show the relationship between the character codes of a given coded character set standard and the glyph identifiers of a given font resource. A character-to-glyph mapping table is used in document formatting to identify which glyph identifier or identifiers should be used for presentation when a given character code or code sequence is encountered in a revisable document. For one-to-one mappings, the character-to-glyph mapping table is trivial or non-existent. However, for many-to-one, one-to-many, or many-to-many mappings, the character-to-glyph mapping table may become quite complex and include metric information for repositioning component glyphs into composite shapes. The glyph identifiers used in a character-to-glyph mapping table may be the same as those used in the associated font resource or may be indirectly mapped to the associated font resource.

C.3.3 Glyph-index map

Glyph-index maps are defined by ISO/IEC 10180 as a data structure that maps index values (presentation codes) in a formatted document to the glyph identifiers in an associated font resource. Document formatting processes transform the character codes of an input document (using the information contained in a character-to-glyph map) into glyph-index numbers in a formatted output document. The formatting process will either dynamically build a glyph-index map that uniquely associates
the index values in the document to the glyph identifiers of the font resource, or it may use predefined (registered) glyph-index maps.

C.3.4 Glyph collection

To aid in the process of identifying a font resource that contains a required set of glyphs, ISO/IEC 9541 defines a data structure called a glyph collection. A glyph collection is a list of glyph identifiers, and it may be assigned a unique identifier. Font resources may contain any combination of glyph identifiers, and revisable documents may contain any repertoire of character codes. In formatting and presenting a document, glyph collections help locate font resources that contain a full set of glyphs that correspond to the set of character codes contained in the document.
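
The data structures of C.3 can be sketched in a few lines of Python. This is an illustrative sketch only. The glyph identifiers (`glyph.fi` and so on), the table contents, and the function names are hypothetical and are not taken from ISO/IEC 9541, ISO/IEC 10036, or ISO/IEC 10180. The sketch shows a character-to-glyph mapping table with one-to-one, many-to-one, and one-to-many entries, a glyph-index map built dynamically by the formatting process, and a glyph collection used to check whether a font resource covers a document.

```python
# Hypothetical character-to-glyph mapping table. Keys are sequences of
# character codes; values are sequences of (made-up) glyph identifiers.
CHAR_TO_GLYPH = {
    ("f", "i"): ["glyph.fi"],                 # many-to-one: ligature
    ("f",): ["glyph.f"],                      # one-to-one
    ("i",): ["glyph.i"],
    ("n",): ["glyph.n"],
    ("\u00e9",): ["glyph.e", "glyph.acute"],  # one-to-many: composite
}
MAX_KEY = max(len(key) for key in CHAR_TO_GLYPH)


def select_glyphs(text):
    """Map character codes to glyph identifiers, longest match first."""
    glyphs, pos = [], 0
    while pos < len(text):
        for size in range(min(MAX_KEY, len(text) - pos), 0, -1):
            key = tuple(text[pos:pos + size])
            if key in CHAR_TO_GLYPH:
                glyphs.extend(CHAR_TO_GLYPH[key])
                pos += size
                break
        else:
            raise KeyError(f"no glyph for {text[pos]!r}")
    return glyphs


def build_glyph_index_map(glyph_ids):
    """Assign index values on first use.

    Returns the index values for the formatted document and the
    glyph-index map (index value -> glyph identifier).
    """
    index_of, index_map, indexes = {}, [], []
    for gid in glyph_ids:
        if gid not in index_of:
            index_of[gid] = len(index_map)
            index_map.append(gid)
        indexes.append(index_of[gid])
    return indexes, index_map


def covers(glyph_collection, glyph_ids):
    """True if the collection lists every glyph the document needs."""
    return set(glyph_ids) <= set(glyph_collection)


glyph_ids = select_glyphs("fin\u00e9")
indexes, index_map = build_glyph_index_map(glyph_ids)
```

In this sketch the formatted document would carry only `indexes`, while `index_map` ties those values back to the glyph identifiers of the font resource. A collection that lacks the ligature glyph would fail the `covers` check for this text.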


Annex D
Font models

D.1 Overview of font models

This annex describes three font models. The first two, the coded font model and the font resource model, are from SC 18. The third, the intelligent font model, is from the Unicode Consortium. Any one of these models could be used successfully to print or display characters coded in ISO/IEC 10646 or in other coded character sets. These font models rely not only on the processes described in this annex but also on the glyph data structures described in Annex C.3, “Use of glyph identifiers”.

D.2 Coded font model

A coded font (or a character-coded font) is a data structure in which character codes are used to identify the glyph metric and glyph shape information contained in the font. In practice, two primary forms of this data structure are used: one in which the character codes are used directly in the font to identify the glyph metric and glyph shape information, and one in which the character codes are mapped to independent glyph identifiers contained in the font. The first form requires separate fonts for each code table supported, while the second form requires separate mapping tables for each code table supported (this latter form saves storage). Both data structures depend on a one-to-one mapping of character codes to glyphs in a font, and this is the basis for the coded font model illustrated in Figure 7.

This font model is the historic presentation model for data processing. In this model, each character code encountered by the layout process is used to locate a corresponding glyph in the coded font. The glyph metric information for that character code is used to determine positioning of the glyph, along with line and page breaks. The formatted document may be interchanged to another location for presentation processing or transmitted to a local presentation process. The presentation process would use the character codes contained in the formatted document to locate a corresponding glyph in the coded font and use the associated glyph shape information to image the glyph on the presentation surface at the

[Figure 7 diagram, divided into a Layout part and a Presentation part. Labels: Revisable Document (text as character codes; style information; optional device information; format control information: format, code table information); General Layout Process (page layout; accesses glyph metrics by character code); Coded Font (glyph metrics and glyph shapes, each identified by character code); Formatted Document (device independent; character codes with position information); Presentation Process (raster image processing, RIP; accesses shape information by character code); Images on Presentation Surface.]

Figure 7 — Coded font model


position indicated by the layout process.

With the coded font model, if a desired glyph is not associated with a character in a coded character set, then the glyph cannot be displayed or printed. For example, if the U+FB01 LATIN SMALL LIGATURE FI “ﬁ” character is not in a coded character set, the “ﬁ” glyph is not available in the corresponding coded font for display or printing. This fact and the widespread implementation of the coded font model have resulted in pressure to include some glyphs in coded character sets. The other two font models, which can be implemented to do sophisticated glyph selection, do not require that all the glyphs in a font resource be coded as characters in the coded character set to print or display the glyphs.

The coded font model is less suitable than the other two models for the more complex glyph-selection requirements of printing and publishing. For example, the Arabic script requires special processing in the coded font model. If the input to the general layout process includes Arabic characters, the process also needs to convert the Arabic characters to the correct Arabic presentation forms.

D.3 Font resource model

The font resource model permits definition of font resources that are less dependent on any single coded character set or document-processing model. It is illustrated in Figure 8. This model is more suited to the document printing and publishing environment and permits blind interchange to occur between the layout and presentation processes. Glyph identifiers index the glyph metrics and glyph shape representations in the font resource. In this model, the layout process uses predefined character-to-glyph maps to determine the mapping (one-to-one, many-to-one, or one-to-many) of character codes to presentation glyphs and replaces the character codes in the formatted document with glyph index values. At the same time, the layout process builds a glyph index map (or it may use a predefined, registered glyph index map) that associates the glyph index values to the glyph identifiers used in the font resource.

The glyph index map is a mapping of glyph index values to glyph identifiers as shown in Figure 9. The glyph index map may be
tion forms.

[Figure 8 diagram, divided into a Layout part and a Presentation part. Labels: Revisable Document (text as character codes; style information; optional device information; format control information: format, code table information); Character-to-Glyph Maps (coded characters to glyph identifiers, for various character encodings); General Layout Process (select glyph, layout page, build glyph index; accesses glyph metrics by glyph identification); Font Resource (glyph metrics and glyph shapes, each identified by glyph identifier); Formatted Document (device independent; glyph index with position information); Glyph Index Map (glyph index to glyph identifiers); Presentation Process (raster image processing, RIP; accesses shape information by glyph index); Images on Presentation Surface.]

Figure 8 — Font resource model


[Figure 9 diagram with four linked parts. Unformatted Document (coded characters, for example 0x007C), passed through the Document Formatting Process; Formatted Document (glyph index values, for example 0x0122); Glyph Index Map (index values to glyph IDs, for example 0x0122 to 1874); Font Resource (glyph IDs to glyph metric and shape data, for example 1874 to mmmm,sssss).]

Figure 9 — Font resource, glyph index model

• unique to a particular indexed font,
• a mapping that is shared among several fonts, or
• a standardized mapping.

This flexibility allows a composition and layout process to generate a glyph index map that accesses only and exactly those glyphs of a large font resource that are needed to image the output of the process. This glyph index map may be combined with the font resource to produce an indexed font for this particular output.

In the font resource model, the relationship between the character repertoire and the glyph collection may involve a one-to-one mapping but may also involve a one-to-many or many-to-one mapping. It is essential for successful presentation that the set of glyphs in the glyph collection be mappable to the repertoire of characters used in the text or ideographic string. For the smaller, single-byte coded character sets, it is common to have a font resource that contains a glyph collection that contains all of the glyphs required to present the character repertoire of several coded character sets. However, for the larger ISO/IEC 10646 multi-octet coded character set, it will be more common to have font resources that contain glyph collections that are capable of presenting selected sub-repertoires of the total 10646 repertoire.

D.4 Intelligent font model

An intelligent font is a data structure that augments a font resource with additional information. The font resource contains

• glyph representations
• glyph metrics

To this data structure, the intelligent font adds information describing

• how a sequence of coded characters is transformed into a sequence of glyph identifiers, with associated position information
• how the transformation of coded characters to glyph identifiers is affected by style information

The first type of additional information typically includes several mappings from various coded character sets to private (font-specific) glyph identifiers. Subsequent transformations use the glyph identifiers. The subsequent transformations may be complex and may result in changes to the number and ordering of the glyph identifiers. For example, it may transform multiple coded characters into a single glyph (either because the glyph is a ligature or because the coded character sequence is a composite sequence) or a single coded character into multiple glyph representations that together construct the intended shape. See Annex E. The second type of additional information permits, for example, substitution of glyph subsets (for example, swash variants, vertical substitution) based on style information.
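The two kinds of additional information in an intelligent font can be illustrated with a minimal Python sketch. It is not taken from this Technical Report: the class name, field names, glyph identifier values, and the "swash" style key are all hypothetical, and position information is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class IntelligentFont:
    """Hypothetical sketch of the data an intelligent font adds to a font resource."""
    # First type of information: one character-to-glyph map per coded character set.
    char_maps: dict[str, dict[int, int]]
    # Transformations on glyph identifiers: a glyph sequence replaced by one ligature glyph.
    ligatures: dict[tuple[int, ...], int] = field(default_factory=dict)
    # Second type of information: glyph substitutions selected by style.
    style_variants: dict[str, dict[int, int]] = field(default_factory=dict)

    def select_glyphs(self, codes, encoding, style=None):
        cmap = self.char_maps[encoding]
        glyphs = [cmap[code] for code in codes]      # coded characters -> private glyph ids
        glyphs = self._apply_ligatures(glyphs)       # may change the number of glyph ids
        variants = self.style_variants.get(style, {})
        return [variants.get(g, g) for g in glyphs]  # style-based substitution

    def _apply_ligatures(self, glyphs):
        # Try longer ligature sequences before shorter ones.
        ordered = sorted(self.ligatures.items(), key=lambda kv: -len(kv[0]))
        out, i = [], 0
        while i < len(glyphs):
            for seq, lig in ordered:
                if tuple(glyphs[i:i + len(seq)]) == seq:
                    out.append(lig)
                    i += len(seq)
                    break
            else:
                out.append(glyphs[i])
                i += 1
        return out


# Illustrative data only: glyph identifiers 10, 11, 12, 20 and 30 are made up.
font = IntelligentFont(
    char_maps={"ISO/IEC 10646": {0x0066: 10, 0x0069: 11, 0x0041: 12}},
    ligatures={(10, 11): 20},            # "f" + "i" -> ligature glyph
    style_variants={"swash": {12: 30}},  # "A" -> swash variant of "A"
)
```

With this data, the sequence U+0066, U+0069, U+0041 yields two glyph identifiers rather than three, and selecting the hypothetical "swash" style replaces the glyph for "A" without touching the character data.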


[Figure 10: flow diagram. Inputs: text character codes in memory order, character code identification, optional style information, and device information. Within the layout and presentation process, the glyph selection process produces glyph identifiers in character order; the general layout process (no knowledge of writing system) produces modified glyph identifiers in display order, with position information; the presentation process produces images on the presentation surface. The intelligent font supplies character-to-glyph maps for various encodings, the feature selection process, the layout transformation, glyph metrics, and glyph shapes.]

Figure 10 — Intelligent font layout and presentation model

An intelligent font can be used with a layout and presentation process that directly presents coded characters, that is, plain text (a coded character sequence that does not contain additional formatting information). Figure 10 shows the intelligent font model and the following paragraphs describe this model.

Within the layout and presentation process of the intelligent font model, the glyph selection process transforms coded characters to glyph identifiers. This process requires

• information about how the characters are coded
• the map from coded characters to glyph identifiers for the specified character coding

The process takes coded characters in memory or logical order and produces glyph identifiers in character or logical order. Logical order is the order in which a person would normally read the characters regardless of the normal direction of the characters. Thus, for a text stream of Arabic, which is written from right to left, the first character would be the rightmost character; for Latin, which is written from left to right, the first character would be the leftmost character. For Latin text included in the middle of Arabic text, the logical order would be the rightmost Arabic character to the end of the Arabic text, then the leftmost Latin character to the end of the Latin text, and then the rightmost Arabic character of the second group of Arabic text to the end of the Arabic text.

Next, the general layout process transforms the glyph identifiers in logical order into (possibly modified) glyph identifiers in display order. Display order is the order in which the characters are to appear on paper or on a screen. The general layout process requires

• glyph metrics
• layout transformation
• feature selection information (how to use the optional style information)
• optional style information
• device information
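The step from logical order to display order can be sketched in Python for the mixed Arabic and Latin case described above. This is a deliberately simplified illustration, not the layout transformation of any real font and not a full bidirectional algorithm. It assumes a right-to-left base direction, and the function name and the use of uppercase letters as stand-ins for Arabic glyphs are assumptions made for the example.

```python
from itertools import groupby


def to_display_order(glyphs, is_rtl):
    """Reorder glyphs from logical order to left-to-right display positions.

    Simplified sketch: assumes the overall text direction is right-to-left
    and that each glyph is either right-to-left or left-to-right.
    """
    # Reverse everything so that the first glyph read ends up at the right edge.
    reversed_all = list(reversed(glyphs))
    display = []
    # Left-to-right runs were reversed too, so turn each of them back around.
    for rtl, run in groupby(reversed_all, key=is_rtl):
        run = list(run)
        display.extend(run if rtl else reversed(run))
    return display


# Uppercase letters stand in for Arabic glyphs, lowercase for Latin glyphs.
logical = list("ABCxyzDEF")
display = "".join(to_display_order(logical, str.isupper))
```

Read from the right edge, the result starts with the first Arabic group, continues with the Latin run from its leftmost character, and ends with the second Arabic group, which matches the logical order given in the text.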


The presentation process is the final process. It takes the glyph identifiers in display order, the glyph positions, and the glyph shapes to produce the images on paper or a screen.

D.5 Font model summary

Table 1 summarizes and compares the three font models described in this Annex. The primary difference between the three models is the sophistication of the process for selecting glyphs.

Table 1 — Comparison of font models

                              Font Models
Characteristic                Coded Font                  Font Resource             Intelligent Font

Glyph Selection Process       None                        Yes (1 Process)           Yes (2 Processes)
(character-to-glyph mapping)  (1-to-1)                    (1-to-1 or M-to-N)        (1-to-1 or M-to-N)

Font Data Structure:
  Character-to-Glyph Mapping  No (implied by character    Yes (external to          Yes (in font resource)
                              code position)              font resource)
  Index to Glyphs             Code Position in            Glyph Identifier          Glyph Identifier
                              Code Table                  (private or registered)   (private)
  Glyph Metrics and Shapes    Yes                         Yes                       Yes
  Additional Data             No                          No                        Feature Selection,
                                                                                    Layout Transformation
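The differences summarized in Table 1 can be made concrete with a small Python sketch of where the character-to-glyph mapping lives in each model. All function names, dictionary keys, glyph identifiers, and shape strings below are hypothetical, and the sketch shows only one-to-one mappings for brevity, although the font resource and intelligent font models also allow M-to-N mappings.

```python
# Glyph shapes are stand-in strings; a real font stores outlines or bitmaps.

def present_with_coded_font(codes, code_table):
    """Coded font: the code position itself indexes the glyph (always 1-to-1)."""
    return [code_table[code] for code in codes]


def present_with_font_resource(codes, glyph_index_map, font_resource):
    """Font resource: a map held outside the font yields glyph identifiers."""
    glyph_ids = [glyph_index_map[code] for code in codes]
    return [font_resource[gid] for gid in glyph_ids]


def present_with_intelligent_font(codes, font, style=None):
    """Intelligent font: the map and the additional data are part of the font."""
    glyph_ids = [font["char_map"][code] for code in codes]      # glyph selection
    variants = font["feature_selection"].get(style, {})         # style-driven step
    glyph_ids = [variants.get(gid, gid) for gid in glyph_ids]
    return [font["shapes"][gid] for gid in glyph_ids]


codes = [0x41, 0x42]
code_table = {0x41: "A-shape", 0x42: "B-shape"}
glyph_index_map = {0x41: "g1", 0x42: "g2"}
font_resource = {"g1": "A-shape", "g2": "B-shape"}
intelligent_font = {
    "char_map": {0x41: 1, 0x42: 2},
    "feature_selection": {"swash": {1: 3}},
    "shapes": {1: "A-shape", 2: "B-shape", 3: "A-swash-shape"},
}
```

All three calls produce the same shapes for plain text. Only the intelligent font can respond to the hypothetical "swash" style, because the data that drives the substitution is part of the font itself.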




Annex E
Examples of character-to-glyph mapping

E.1 Mapping characters to glyphs

This Annex shows examples of the character-to-glyph mapping process. It should be emphasized that it is often possible to represent a coded character sequence in more than one way and to provide a visual representation for it in more than one way. The two processes are separate, and they can be individually optimized.

E.2 One-to-one

A one-to-one mapping from character to glyph is the most frequently used in representing Latin-based languages, where the character U+0041 LATIN CAPITAL LETTER A “A”, for example, is likely to be drawn by using a single “A” glyph. The coded font model assumes that a one-to-one mapping is always the case.

It is often possible to use a single glyph to represent more than one distinct character. For example, both the U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE “Å” and U+212B ANGSTROM SIGN “Å” can be represented by the glyph “Å”. It is also conceivable for some implementations to use a single glyph for U+0041 LATIN CAPITAL LETTER A “A”, U+0391 GREEK CAPITAL LETTER ALPHA “Α”, and U+0410 CYRILLIC CAPITAL LETTER A “А”. These examples are different from the many-to-one mapping discussed below.

E.3 Many-to-one

Many-to-one mappings are common even in Latin typography. The sequence U+0066 LATIN SMALL LETTER F “f” and U+0069 LATIN SMALL LETTER I “i” could be drawn by using a single glyph “ﬁ” for the ligature of “f” and “i”. The sequence U+0031 DIGIT ONE “1”, and U+2044 FRACTION SLASH “⁄”, and U+0032 DIGIT TWO “2” could be drawn by using a single “½” glyph.

Such mappings are more common in other writing systems. Hebrew, for example, makes extensive use of diacritical marks that are written around and even within various letters of the alphabet. The exact position of the diacritical marks varies depending on the letter with which they are written. The sequence U+05E4 HEBREW LETTER PE “פ”, U+05BC HEBREW POINT DAGESH OR MAPIQ “◌ּ”, and U+05B8 HEBREW POINT QAMATS “◌ָ” is often drawn by using a single glyph “פָּ” to provide optimal placement of the diacritical marks.

Level 3 implementations of ISO/IEC 10646-1 also use combining characters to represent accented Latin letters. Again, individual glyphs can be used to provide the best alignment of letter and accent. A level 3 implementation of ISO/IEC 10646-1 might well use the coded character sequence U+0065 LATIN SMALL LETTER E “e” and U+0302 COMBINING CIRCUMFLEX ACCENT “◌̂” but draw it using a single “ê” glyph.

E.4 One-to-many

One-to-many mappings are more common than is often suspected. Whereas high-quality typography would insist on a large number of glyphs to provide greatest visual appeal, systems that cannot afford the necessary overhead can resort to other schemes. They might draw a U+00E9 LATIN SMALL LETTER E WITH ACUTE “é” by drawing the “e” glyph first then positioning the “´” glyph above it to form the glyph for the “é”.

One-to-many mappings are also found in Indic languages, where vowels can be written in two pieces, one on either side of the character they follow. The single character U+09CB BENGALI VOWEL SIGN O “ো” can be displayed using two glyphs that appear on either side of the related consonant.

ISO/IEC 10646 also included characters for Roman numerals. A system may choose to draw U+2165 ROMAN NUMERAL SIX “Ⅵ” by drawing a “V” and an “I” to the right.

E.5 Many-to-many

Given the previous examples, it should not be surprising that even many-to-many mappings occur. For example, in writing Vietnamese using level 3 of ISO/IEC 10646, the coded character sequence U+0065 LATIN SMALL LETTER E “e”, U+0302 COMBINING CIRCUMFLEX ACCENT “◌̂”, and U+0323 COMBINING DOT BELOW “◌̣” could occur. Displaying this sequence would require drawing an “e” with a “^” above it and a dot “.” below it. A system that has an “ê” glyph may choose to use that glyph and then add the dot below, and a system that has a single glyph for this sequence may simply draw that. (Similarly, a level 1 implementation of ISO/IEC 10646 would use the coded character U+1EC7 LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW “ệ”.)

Indeed, depending on the details of the individual implementation, many of the examples from the previous clauses could be recast in a many-to-many fashion. Again, note carefully that depending on the individual designs of the glyphs, individual presentation systems will often differ in how they represent characters and how they present the associated glyphs.
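The four kinds of mapping in this Annex can be expressed with one table that maps sequences of coded characters to sequences of glyphs. The Python sketch below is an illustration only: the glyph names, the table contents, and the longest-match strategy are assumptions made for the example, not something this Technical Report prescribes.

```python
# Hypothetical mapping table: keys are coded character sequences,
# values are sequences of made-up glyph names.
MAPPINGS = {
    (0x0041,): ["A"],                                       # one-to-one
    (0x0066,): ["f"],
    (0x0069,): ["i"],
    (0x0066, 0x0069): ["fi"],                               # many-to-one (ligature)
    (0x00E9,): ["e", "acute"],                              # one-to-many
    (0x0065, 0x0302, 0x0323): ["ecircumflex", "dotbelow"],  # many-to-many
}

MAX_KEY_LENGTH = max(len(key) for key in MAPPINGS)


def map_characters(codes):
    """Map coded characters to glyph names, preferring the longest matching sequence."""
    glyphs = []
    i = 0
    while i < len(codes):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(MAX_KEY_LENGTH, len(codes) - i), 0, -1):
            key = tuple(codes[i:i + n])
            if key in MAPPINGS:
                glyphs.extend(MAPPINGS[key])
                i += n
                break
        else:
            raise KeyError(f"no glyph mapping for U+{codes[i]:04X}")
    return glyphs
```

Because longer sequences are tried first, the pair U+0066, U+0069 maps to the single ligature glyph, while a lone U+0066 still maps to the plain “f” glyph. A different presentation system could use a different table for the same characters, which is the point made at the end of E.5.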
choose to use that glyph and then add the


Annex F
Recommendations of the original report

At its meeting held in November, 1993, ISO/IEC JTC 1/SC 2/WG 2 (WG 2) received the original draft of the “Character-glyph model” (WG 2 document, N 915, dated 23 September, 1993). At the meeting, WG 2 resolved to accept the document as a first working draft of this Technical Report and requested a change to the “Purpose” clause (WG 2 document, N 949 R, dated 30 November, 1993). The requested change in purpose became item 4 in this list of recommendations.

1. In accordance with ISO/IEC 10036, AFII should undertake to register a comprehensive set of glyphs (graphic symbols) needed for each known writing system.

2. To facilitate the formatting and presentation of ISO/IEC 10646 coded character data, a set of associations between characters coded in 10646 and glyphs registered according to ISO/IEC 10036 should be defined. In particular, AFII should provide a table to document the ISO/IEC 10036 glyph identifier (or in the case of East Asian ideographs, the glyph identifiers) used to print each code position in the ISO/IEC 10646 standard.

   a. The term “association” in this context means that some glyph is suitable for presenting a character or a sequence of characters under appropriate circumstances.

   b. At least one glyph should be associated with each character.

   c. A character may be associated with multiple glyphs; likewise, a glyph may be associated with multiple characters.

   d. Some glyphs may not be associated with any single character; other glyphs may be associated only with a sequence (string) of characters.

3. The coding of additional presentation forms in ISO/IEC 10646 should be avoided. Rather, such forms should be registered as glyphs in accordance with ISO/IEC 10036.

4. The registration of additional glyphs in accordance with ISO/IEC 10036 should be avoided when

   a. the proposed glyph shares the same shape and associated glyph properties as a glyph already registered, and

   b. the proposed glyph is distinguished solely by being associated with a different character.

5. SC 2 and SC 18 should adopt a common definition of terms and use the same terminology in developing standards. If SC 2 and SC 18 are unable to reach consensus on terminology, then when appropriate, SC 2 and SC 18 standards should cross-reference terms for the other subcommittee.


ICS 35.240.30

Price based on 24 pages
