TECHNICAL REPORT
ISO/IEC TR 15285
First edition
1998-12-15
Reference number
ISO/IEC TR 15285:1998(E)
ISO/IEC TR 15285: 1998 (E) © ISO/IEC
Contents
Foreword
Introduction
1 Scope
2 References
3 Definitions
4 Character and glyph distinctions
5 Operational model
6 Glyph selection
7 Summary
Annex A: Bibliography
Annex B: Characters
Annex C: Glyphs
Annex D: Font models
Annex E: Examples of character-to-glyph mapping
Annex F: Recommendations of the original report
© ISO/IEC 1998
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or
utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm, without permission in writing from the publisher.
Foreword
ISO (the International Organization for Standardization) and IEC (the
International Electrotechnical Commission) form the specialized system
for worldwide standardization. National bodies that are members of ISO
or IEC participate in the development of International Standards
through technical committees established by the respective
organization to deal with particular fields of technical activity. ISO
and IEC technical committees collaborate in fields of mutual interest.
Other international organizations, governmental and non-governmental,
in liaison with ISO and IEC, also take part in the work.
Introduction
People interpret the meaning of a written sentence by the shapes of
the characters contained in it. For the characters themselves, people
consider the information content of a character inseparable from its
printed image. Information technology, in contrast, makes a
distinction between the concepts of a character’s meaning (the
information content) and its shape (the presentation image).
Information technology uses the term character (or coded character)
for the information content, and the term glyph for the presentation
image. A conflict exists because people consider characters and glyphs
equivalent. Moreover, this conflict has led to misunderstanding and
confusion.

This Technical Report provides a framework for relating characters and
glyphs to resolve the conflict because successful processing and
printing of character information on computers requires an
understanding of the appropriate use of characters and glyphs.

Historically, ISO/IEC JTC 1/SC 2 has had responsibility for the
development of coded character set standards such as ISO/IEC 10646 for
the digital representation of letters, ideographs, digits, symbols,
etc. ISO/IEC JTC 1/SC 18 has had responsibility for the development of
standards for document processing, which presents the characters coded
by SC 2. SC 18 standards include the font standard, ISO/IEC 9541, and
the glyph registration standard, ISO/IEC 10036. The Association for
Font Information Interchange (AFII) maintains the 10036 glyph registry
on behalf of ISO.

This Technical Report is written for a reader who is familiar with the
work of SC 2 and SC 18. Readers without this background should first
read Annex B, “Characters”, and Annex C, “Glyphs”.

This edition of the Technical Report does not fully develop the
complex issues associated with the Chinese, Japanese, Korean, and
Vietnamese ideographic characters used in East Asia. In addition,
although it discusses the process of rendering digital character
information for display and printing, it avoids discussing the inverse
process of character recognition (that is, converting printed text
into character information in the computer).
Information technology —
An operational model for characters and glyphs
ISO/IEC 10036: 1996, Information technology — Font information
interchange — Procedures for registration of font-related identifiers.

ISO/IEC 10180: 1995, Information technology — Processing languages —
Standard Page Description Language (SPDL).

3.7 glyph image: An image of a glyph, as obtained from a glyph
representation displayed on a presentation surface. (ISO/IEC 9541-1:
1991) [See the definition of graphic symbol.]

3.8 glyph metrics: The set of information in a glyph representation
used for defining the dimensions and positioning of the glyph shape.
(ISO/IEC 9541-1: 1991)

independently and contain terminology that requires explanation.
[Figure 1 labels. Characters: content processing operations (data
entry, search, spell checking, grammar checking). Glyphs: appearance
processing operations (format, display). Operations between domains:
layout, glyph selection and substitution, recognition, mouse
selection.]
Figure 1 — Character and glyph domains
images into coded characters. Also, a paragraph-level hyphenation
process is an example of a layout process that requires content
information.

It is not possible, in general, to code data in such a way as to
optimize one process without reducing the performance of other
processes. Even within the content domain, the nature of the character
coding employed for textual data affects the type or types of
processing to be performed on the data; no single coding can optimize
more than a few such potential processes. Given this situation, the
best solution is to formulate an independent, logical character coding
that, when necessary, can be transformed into another coding more
amenable to the processing required. For example, in the case of
searching, character data is often recast into specific forms that
facilitate quick searches. For sorting, a specially created sort key
is required. In addition, because ISO/IEC 10646 contains glyph-like
characters, it is expected that implementations may choose to
canonicalize or normalize such characters by translating them to
normative characters. A presentation subsystem that employs such a
technique may require that character data be normalized prior to
presentation.

The recognition that two separate domains of processing are commonly
applied to character-based information leads to a conclusion that two
primary forms of this information are needed:

1. a content-oriented form that is amenable to immediate content-based
   processes and that can be easily converted to and from other
   optimized forms

2. an appearance-oriented form that facilitates imaging of content

These are, respectively, the character-based form and the glyph-based
form. Failure to recognize this distinction between the character
domain and the glyph domain has led to the development of inconsistent
standards and inconsistent systems that lack functional separation of
the two domains.

5.2 Composition, layout, and presentation

As depicted in Figure 2, the composition and layout process (for glyph
selection and positioning) spans both processing domains. If attention
is restricted to the text portion of this process, the presentation of
character-based information requires three primary operations:
[Figure 2 labels. Content-based processing: entry, sorting, searching,
spell checking. Both: information, layout, presentation information
(glyph identifiers). Appearance-based processing: displaying,
printing.]
Figure 2 — Composition, layout, and presentation
- selecting the glyph representations needed to display character data

- positioning the glyph shapes on the presentation surface

- imaging the glyph shapes

Glyph selection is the process of selecting (possibly through several
iterations) the most appropriate glyph identifier or combination of
glyph identifiers to render a coded character or composite sequence of
coded characters. Coded characters and their associated implicit or
explicit formatting information (for example, specification of the
font and its size) represent the primary inputs to composition and
layout processing, and glyph identifiers (or the associated glyph
metrics and glyph shapes) represent the primary output from
composition and layout processing. The degree of glyph selection
sophistication varies widely among existing standards and
implementations.

The relationship between coded characters and glyph identifiers may be
one-to-one, one-to-many, many-to-one, or many-to-many.3) This is
particularly true for ISO/IEC 10646 implementation level 3, which uses
combining characters. In its fully general form, the relationship is a
context-sensitive M-to-N mapping where M > 0, N ≥ 0. For some
characters in ISO/IEC 10646-1, for example, the U+FEFF ZERO WIDTH
NO-BREAK SPACE character, no glyph (N=0) is defined.

The SC 18 document-processing model separates the glyph selection and
layout operations from the operation of imaging the glyph shape to
permit document interchange between the processes. Glyph selection and
positioning are part of the composition and layout process, whereas
imaging the glyph shape is part of the presentation process. The
result of composition and layout is a final-form document, which
contains font identifiers, glyph identifiers, and coordinate
positions, along with either references to font resources or the
actual font resources themselves. Such a document form contains all
the necessary information required to present the formatted

3) The necessity for mapping characters to glyphs (glyph selection),
not its complexity, is one of the motivations for developing this
operational model for characters and glyphs.
Isolated Initial Medial Final
Figure 3 — Glyphs for ARABIC LETTER HEH

- In addition, Arabic typography makes extensive use of ligatures. For
  example, Figure 4 shows the isolated forms of U+0627 ARABIC LETTER
  ALEF “ا” and U+0644 ARABIC LETTER LAM “ل”, and then the two ligature
  forms used when Lam is followed by Alef.

appropriate glyph are based solely on (1) the context of a character
within a document, (2) the style specifications that apply to a given
character, or (3) a combination of the context and style
specification. All of the choices required for the examples shown
above fall into one of these categories.

However, in general, glyph selection can only be made as an integral
part of the entire composition and layout process. Consider the
following:

- When hyphenating a line of text during composition, a composition
  and layout process may insert a hyphen glyph form at the end of a
  line if the line is broken at a hyphenation point.
- When justifying a line of Arabic text, a composition and layout
  process may start by selecting ligature glyph forms that consume the
  smallest amount of linear space in a line, and then sequentially
  replace these ligatures with component ligatures or component
  non-ligature glyphs such that more linear line space is consumed up
  to the required line measure. Alternatively, a composition and
  layout process may start justification by selecting no ligatures and
  then sequentially select ligatures that consume a smaller amount of
  linear space until the desired line measure is achieved or until an
  inter-word space stretch threshold is reached (that is, a point at
  which inter-word spaces can be stretched to justify the line to the
  desired measure).

In summary, the glyph-selection process is primarily applicable to
behavior occurring at the end or beginning of individual lines of
text, or within the context of justifying or altering the measure of a
given line during line composition. A system supporting the
capabilities illustrated in the preceding examples must include glyph
selection as an integral part of the composition and layout process.

7 Summary

Here are the primary points of this technical report:

- Most people equate a character and its shape.

- This causes difficulties and misunderstanding because contemporary
  information technology distinguishes two related, but distinct,
  domains:

  - The processing domain uses coded characters to represent the
    character’s meaning.
  - The presentation domain uses glyph identifiers to represent the
    character’s image.

- Processes are available to convert between the two domains:

  - Presentation processing takes the coded-character data plus any
    formatting data plus font information to display and print
    character data.
  - A character recognition process scans images, analyzes the shapes,
    and outputs the coded characters that correspond to the shapes.

- Depending on the script and the particular font or fonts used, glyph
  selection can be straightforward or relatively complex.

  - It is straightforward when a one-to-one correspondence exists
    between the set of coded characters and the set of glyphs in a
    font.
  - The process is more complex when it must choose between several
    alternatives; for example, when a sequence of coded characters may
    be mapped into more than one sequence of glyphs in a font.
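
The last point, choosing between alternative glyph sequences, is what the first Arabic justification strategy in clause 6 does: start with every ligature selected and break ligatures one at a time while the line still fits the measure. The following is a minimal sketch of that strategy only. The Slot structure, the widths, and the left-to-right visiting order are assumptions for illustration and are not taken from any standard.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One ligature opportunity on a line (hypothetical structure)."""
    ligature_width: int     # width when set as a single ligature glyph
    components_width: int   # width when set as separate component glyphs
    use_ligature: bool = True


def justify(fixed_width: int, slots: list[Slot], measure: int) -> int:
    """Widen a line toward `measure` by breaking ligatures sequentially.

    `fixed_width` is the width of everything on the line that is not a
    ligature opportunity. Returns the resulting line width; any remaining
    shortfall would be left to inter-word space stretching.
    """
    width = fixed_width + sum(s.ligature_width for s in slots)
    for slot in slots:
        gain = slot.components_width - slot.ligature_width
        if width + gain <= measure:
            slot.use_ligature = False   # select the wider component glyphs
            width += gain
    return width
```

With a fixed width of 100, slots of (10, 14), (8, 13), and (12, 15), and a measure of 138, the first and third ligatures are broken and the line reaches a width of 137.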
Annex A
Bibliography
6. ISO/IEC 10538: 1991, Information technology — Control functions for
   text communication.

14. The Unicode Consortium, The Unicode Standard, Version 2.0,
    Addison-Wesley, Reading, MA, 1996.
Annex B
Characters
ISO/IEC 10646-1: 1993 where over 30,000 characters are defined, the
potential for multiple usage conventions increases.

U+0C67 TELUGU DIGIT ONE “౧”
U+0CE7 KANNADA DIGIT ONE “೧”
U+0D67 MALAYALAM DIGIT ONE “൧”
U+0E51 THAI DIGIT ONE “๑”
U+0ED1 LAO DIGIT ONE “໑”
U+2081 SUBSCRIPT ONE “₁”
U+215F FRACTION NUMERATOR ONE “⅟”
U+2160 ROMAN NUMERAL ONE “Ⅰ”
U+2170 SMALL ROMAN NUMERAL ONE “ⅰ”
U+2460 CIRCLED DIGIT ONE “①”
U+2474 PARENTHESIZED DIGIT ONE “⑴”
U+2488 DIGIT ONE FULL STOP “⒈”
U+2776 DINGBAT NEGATIVE CIRCLED DIGIT ONE “❶”
U+2780 DINGBAT CIRCLED SANS-SERIF DIGIT ONE “➀”
U+278A DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE “➊”
U+3021 HANGZHOU NUMERAL ONE “〡”
U+3192 IDEOGRAPHIC ANNOTATION ONE MARK “㆒”
U+3220 PARENTHESIZED IDEOGRAPH ONE “㈠”
U+3280 CIRCLED IDEOGRAPH ONE “㊀”
U+4E00 CJK UNIFIED IDEOGRAPH-4E00 “一”
U+58F9 CJK UNIFIED IDEOGRAPH-58F9 “壹”
U+FF11 FULLWIDTH DIGIT ONE “１”

Of these characters, the following are merely size or position
variants of a single form:

U+0661 ARABIC-INDIC DIGIT ONE “١”
U+06F1 EXTENDED ARABIC-INDIC DIGIT ONE “۱”
U+0967 DEVANAGARI DIGIT ONE “१”
U+09E7 BENGALI DIGIT ONE “১”
U+0A67 GURMUKHI DIGIT ONE “੧”
U+0AE7 GUJARATI DIGIT ONE “૧”
U+0B67 ORIYA DIGIT ONE “୧”
U+0BE7 TAMIL DIGIT ONE “௧”
U+0C67 TELUGU DIGIT ONE “౧”
U+0CE7 KANNADA DIGIT ONE “೧”
U+0D67 MALAYALAM DIGIT ONE “൧”
U+0E51 THAI DIGIT ONE “๑”
U+0ED1 LAO DIGIT ONE “໑”
U+3021 HANGZHOU NUMERAL ONE “〡”
U+3192 IDEOGRAPHIC ANNOTATION ONE MARK “㆒”
U+3220 PARENTHESIZED IDEOGRAPH ONE “㈠”
U+3280 CIRCLED IDEOGRAPH ONE “㊀”
U+4E00 CJK UNIFIED IDEOGRAPH-4E00 “一”
U+58F9 CJK UNIFIED IDEOGRAPH-58F9 “壹”

This example clearly shows that the designers of this character set
did not start with individual units of information and assign each
such unit to a unique character;
furthermore, it is also clear that the designers did not start with
individual forms and assign each to a unique character. Rather, a
combination of forms and variations of a single form, all signifying
the idea “one”, were included as distinct characters.

To gain an understanding of the distinction between characters and
glyphs, consider that the following characters could have easily been
unified into a single character that would be displayed using one of
four glyphs:

U+0031 DIGIT ONE “1”
U+00B9 SUPERSCRIPT ONE “¹”
U+2081 SUBSCRIPT ONE “₁”
U+FF11 FULLWIDTH DIGIT ONE “１”

These four characters can be considered as instances of one character
that takes on slightly different forms depending on usage. In this
case, usage or style alone would govern the form chosen to depict a
single abstract character. In the case of a form used as the numerator
of a fraction, the appropriate glyph could be determined based on the
local context of the character, assuming for a moment that a character
such as a U+0031 DIGIT ONE “1” is followed by a U+2044 FRACTION SLASH
“⁄”. In the remaining cases, the character’s immediate context would
not be sufficient but would require that additional information be
supplied such as style information that would govern the appearance of
a character when displayed. In either case, the process of depicting a
given character may require the selection of one of a number of
possible glyphs, each of which may serve (in different cases) to
present the image of a character.

Notice that certain other possible forms of a “one” are, in fact, not
found in this standard as characters. For example, many high-quality
font collections supply a collection of forms for the Arabic numerals
known as old style figures shown in Figure 6. Were the old style
figures included as characters, the OLD STYLE FIGURE DIGIT ONE “1”
could have been added to 10646.

0123456789
Figure 6 — Old style figures

B.4 Considerations for deciding the repertoire of a coded character set

Various arguments are possible for defending the inclusion or
exclusion of a particular form as a possible graphic character in a
repertoire. In many cases, the criterion for either inclusion or
exclusion has not been articulated but is based on informal opinion
about appropriateness. Justifying why certain forms were coded into
ISO/IEC 10646-1: 1993 and why others were not is beyond the scope of
this Technical Report. However, with respect to coding glyphs versus
characters, the objective is to code characters that represent
different information. To meet this objective, three important
considerations should be applied.4)

1. Same shape/different meanings

   Does one shape have multiple meanings (semantics)?

   Some shapes will be the same, or nearly the same, but have
   different meanings or different semantics. An example of this is
   that in many sans-serif fonts the glyph “I” is used for both the
   U+0049 LATIN CAPITAL LETTER I “I” and the U+006C LATIN SMALL LETTER
   L “l”. Similarly, for years many typewriters lacked a key for the
   U+0031 DIGIT ONE “1” and people were taught to type the U+006C
   LATIN SMALL LETTER L “l” instead. Later, when people switched from
   typewriters to computers, this practice failed and people had to
   relearn to type the digit one “1” instead of the letter “l”.

2. Different shapes/same meaning

   Do two or more shapes imply the same meaning (semantics)?

4) Peter Lofting, “The Perception of Character Entities in Unfamiliar
Scripts”.
In practice, the need for compatibility with existing coded character
sets frequently overrides the second consideration. Examples of this
are found in ISO/IEC 10646-1: 1993. The next clause describes an
important compatibility criterion, the “round-trip rule”.

These considerations should be used to help decide which forms to
include in a new repertoire to be coded. Although the considerations
are easy to state, obtaining definitive answers requires considerable
effort, for example, to consult with experts and native users, who are
normally unaware of information technology and not concerned with such
details.

Certain characters that might have been unified in 10646 were not
unified because of the round-trip rule. For instance, U+00B9
SUPERSCRIPT ONE “¹” was not unified with U+0031 DIGIT ONE “1” because
ISO 8859-1: 1987, a source character set for 10646, includes these two
forms as distinct characters. Most of the instances of formal entities
within 10646 that could have been unified were likewise distinct
characters in some source character set or, in some special instances,
distinct characters in certain unions of character sets, for example,
the union of 7-bit ASCII (ANSI X3.4-1986), JIS X 0201-1976, and JIS X
0208-1990 as employed in Shift JIS coding in Japan.
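
The round-trip rule can be checked mechanically for the ISO 8859-1 example above. The sketch below uses Python's standard latin-1 codec and the Unicode NFKC normalization form. NFKC is a later Unicode facility, used here only as an assumed stand-in for what "unifying" the two characters would mean; the function names are hypothetical.

```python
import unicodedata


def round_trips(data: bytes, encoding: str = "latin-1") -> bool:
    """True if source bytes survive conversion to coded characters and back."""
    return data.decode(encoding).encode(encoding) == data


def round_trips_if_unified(data: bytes, encoding: str = "latin-1") -> bool:
    """The same check after folding compatibility variants together (NFKC).

    Assumes the folded text is still encodable in `encoding`, which holds
    for the two-byte example below.
    """
    text = unicodedata.normalize("NFKC", data.decode(encoding))
    return text.encode(encoding) == data


# DIGIT ONE (0x31) and SUPERSCRIPT ONE (0xB9) are distinct in ISO 8859-1.
SOURCE = bytes([0x31, 0xB9])
```

Because the two characters stay distinct in 10646, round_trips(SOURCE) holds. Folding them together makes round_trips_if_unified(SOURCE) fail, since SUPERSCRIPT ONE comes back as DIGIT ONE.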
Annex C
Glyphs
the index values in the document to the glyph identifiers of the font
resource, or it may use predefined (registered) glyph-index maps.

C.3.4 Glyph collection

To aid in the process of identifying a font resource that contains a
required set of glyphs, ISO/IEC 9541 defines a data structure called a
glyph collection. A glyph collection is a list of glyph identifiers,
and it may be assigned a unique identifier. Font resources may contain
any combination of glyph identifiers, and revisable documents may
contain any repertoire of character codes. In formatting and
presenting a document, glyph collections help locate font resources
that contain a full set of glyphs that correspond to the set of
character codes contained in the document.
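
The use of glyph collections to locate a suitable font resource amounts to a subset test. The following is a minimal sketch under stated assumptions: glyph identifiers are modelled as plain integers, and the font names and collections are invented; the actual data structures are those of ISO/IEC 9541 and are not reproduced here.

```python
def fonts_covering(required: set[int],
                   collections: dict[str, frozenset[int]]) -> list[str]:
    """Return the names of font resources whose glyph collection contains
    every required glyph identifier, in alphabetical order."""
    return sorted(name for name, glyphs in collections.items()
                  if required <= glyphs)


# Hypothetical font resources and their glyph collections.
COLLECTIONS = {
    "SerifText": frozenset({1, 2, 3, 4}),
    "SansCaps": frozenset({1, 2}),
    "Symbols": frozenset({90, 91}),
}
```

A document whose characters map to glyph identifiers 1, 2, and 3 can be presented with SerifText only; a document needing identifier 90 as well finds no single covering resource in this example.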
Annex D
Font models
D.1 Overview of font models

This annex describes three font models. The first two, the coded font
model and the font resource model, are from SC 18. The third, the
intelligent font model, is from the Unicode Consortium. Any one of
these models could be used successfully to print or display characters
coded in ISO/IEC 10646 or in other coded character sets. These font
models rely not only on the processes described in this annex but also
on the glyph data structures described in Annex C.3, “Use of glyph
identifiers”.

D.2 Coded font model

A coded font (or a character-coded font) is a data structure in which
character codes are used to identify the glyph metric and glyph shape
information contained in the font. In practice, two primary forms of
this data structure are used: one in which the character codes are
used directly in the font to identify the glyph metric and glyph shape
information, and one in which the character codes are mapped to
independent glyph identifiers contained in the font. The first form
requires separate fonts for each code table supported, while the
second form requires separate mapping tables for each code table
supported (this latter form saves storage). Both data structures
depend on a one-to-one mapping of character codes to glyphs in a font,
and this is the basis for the coded font model illustrated in Figure 7.

This font model is the historic presentation model for data
processing. In this model, each character code encountered by the
layout process is used to locate a corresponding glyph in the coded
font. The glyph metric information for that character code is used to
determine positioning of the glyph, along with line and page breaks.
The formatted document may be interchanged to another location for
presentation processing or transmitted to a local presentation
process. The presentation process would use the character codes
contained in the formatted document to locate a corresponding glyph in
the coded font and use the associated glyph shape information to image
the glyph on the presentation surface at the
[Figure 7 flow. Revisable document (text as character codes; style,
format control, and code table information; optional device
information) → layout and presentation process: general layout process
(page layout; accesses glyph metrics by character code) → formatted
document (device independent; character codes with position
information) → presentation process (raster image processing, RIP;
accesses shape information by character code) → images on presentation
surface. The coded font supplies glyph metrics and glyph shapes, both
identified by character code.]
Figure 7 — Coded font model
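
The second form of coded font described above, in which one set of glyph data is shared and each code table has its own small mapping table, can be sketched as follows. The glyph identifiers and advance widths are invented for illustration. The two code tables are the latin-1 and cp437 codecs from Python's standard library, chosen only because they place LATIN SMALL LETTER E WITH ACUTE at different code positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    advance: int  # glyph metric: advance width in font units (made-up value)


# Shared glyph data, keyed by hypothetical glyph identifiers.
GLYPHS = {"A": Glyph(650), "eacute": Glyph(500)}
GLYPH_FOR_CHAR = {"A": "A", "\u00e9": "eacute"}


def mapping_table(code_table: str) -> dict[int, str]:
    """Build the character-code to glyph-identifier table for one code table."""
    return {ch.encode(code_table)[0]: gid for ch, gid in GLYPH_FOR_CHAR.items()}


def line_width(codes: bytes, table: dict[int, str]) -> int:
    """Sum glyph advances for a run of character codes (one-to-one mapping)."""
    return sum(GLYPHS[table[code]].advance for code in codes)


LATIN1 = mapping_table("latin-1")   # e-acute is code 0xE9
CP437 = mapping_table("cp437")      # e-acute is code 0x82
```

The same two glyphs serve both code tables: only the mapping table differs, which is the storage saving the text refers to.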
[Figure 8 flow. Revisable document (text as character codes; style,
format control, and code table information; optional device
information) → layout and presentation process: general layout process
(select glyph, layout page, build glyph index; accesses glyph metrics
by glyph identification) → formatted document (device independent;
glyph index with position information) plus glyph index map (glyph
index to glyph identifiers) → presentation process (raster image
processing, RIP; accesses shape information by glyph index) → images
on presentation surface. Character-to-glyph maps (coded characters to
glyph identifiers, for various character encodings) feed the layout
process. The font resource supplies glyph metrics and glyph shapes,
both identified by glyph identifier.]
Figure 8 — Font resource model
[Figure labels. Unformatted document (coded characters) → document
formatting process → formatted document (glyph index values) → glyph
index map (index values to glyph IDs) → font resource (glyph IDs to
glyph metric and shape data). Example row: coded character 0x007C,
glyph index value 0x0122, glyph ID 1874, data mmmm,sssss. Other values
shown: index values 0 and 0x04AB; glyph IDs 0 and 33661, each with
data mmmm,sssss.]
This flexibility allows a composition and layout process to generate a
glyph index map that accesses only and exactly those glyphs of a large
font resource that are needed to image the output of the process. This
glyph index map may be combined with the font resource to produce an
indexed font for this particular output.

In the font resource model, the relationship between the character
repertoire and the glyph collection may involve a one-to-one mapping
but may also involve a one-to-many or many-to-one mapping. It is
essential for successful presentation that the set of glyphs in the
glyph collection be mappable to the repertoire of characters used in
the text or ideographic string. For the smaller, single-byte coded
character sets, it is common to have a font resource that contains a
glyph collection that contains all of the glyphs required to present
the character repertoire of several coded character sets. However, for
the larger ISO/IEC 10646 multi-octet coded character set, it will be
more common to have font resources that contain glyph collections that
are capable of presenting selected sub-repertoires of the total 10646
repertoire.

D.4 Intelligent font model

An intelligent font is a data structure that

To this data structure, the intelligent font adds information
describing

- how a sequence of coded characters is transformed into a sequence of
  glyph identifiers, with associated position information

- how the transformation of coded characters to glyph identifiers is
  affected by style information

The first type of additional information typically includes several
mappings from various coded character sets to private (font-specific)
glyph identifiers. Subsequent transformations use the glyph
identifiers. The subsequent transformations may be complex and may
result in changes to the number and ordering of the glyph identifiers.
For example, it may transform multiple coded characters into a single
glyph (either because the glyph is a ligature or because the coded
character sequence is a composite sequence) or a single coded
character into multiple glyph representations that together construct
the intended shape. See Annex E. The second type of additional
information permits, for example, substitution of glyph subsets (for
example, swash variants, vertical substitution) based on style
information.
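
The glyph index map of the font resource model, described earlier in this annex, can be sketched as a pair of small functions. This is only an illustration under stated assumptions: glyph identifiers are plain integers, the font resource is a dictionary from glyph identifier to an opaque metric-and-shape record, and index values are assigned in order of first use. None of these choices is prescribed by ISO/IEC 9541.

```python
def build_glyph_index_map(glyph_ids_used: list[int]) -> tuple[list[int], list[int]]:
    """Return (index_map, formatted) for the glyph identifiers of one output.

    index_map[i] is the glyph identifier for index value i, and `formatted`
    is the sequence of index values that replaces the glyph identifiers in
    the formatted document.
    """
    index_map: list[int] = []
    index_of: dict[int, int] = {}
    formatted: list[int] = []
    for gid in glyph_ids_used:
        if gid not in index_of:
            index_of[gid] = len(index_map)
            index_map.append(gid)
        formatted.append(index_of[gid])
    return index_map, formatted


def indexed_font(font_resource: dict[int, str], index_map: list[int]) -> list[str]:
    """Combine the map with the font resource: only and exactly the glyphs
    needed, addressable by index value."""
    return [font_resource[gid] for gid in index_map]


# Hypothetical font resource: glyph identifier -> metric and shape data.
FONT_RESOURCE = {0: "data-0", 1874: "data-1874", 33661: "data-33661"}
```

For the glyph identifier sequence 1874, 33661, 1874 the index map is [1874, 33661], the formatted document carries the index values [0, 1, 0], and the indexed font holds two records rather than the whole resource.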
[Figure 10 (intelligent font model) flow. Text (character codes in
memory order; character code identification; style information;
optional device information) → layout and presentation process: glyph
selection process → glyph identifiers in character order → general
layout process (no knowledge of writing system) → modified glyph
identifiers in display order with position information → presentation
process → images on presentation surface. The intelligent font
supplies character-to-glyph maps for various encodings, feature
selection, layout transformation, glyph metrics, and glyph shapes.]
An intelligent font can be used with a layout and presentation process
that directly presents coded characters, that is, plain text (a coded
character sequence that does not contain additional formatting
information). Figure 10 shows the intelligent font model and the
following paragraphs describe this model.

Within the layout and presentation process of the intelligent font
model, the glyph selection process transforms coded characters to
glyph identifiers. This process requires

- information about how the characters are coded

- the map from coded characters to glyph identifiers for the specified
  character coding

The process takes coded characters in memory or logical order and
produces glyph identifiers in character or logical order. Logical
order is the order in which a person would normally read the
characters regardless of the normal direction of the characters. Thus,
for a text stream of Arabic, which is written from right to left, the
first character would be the rightmost character; for Latin, which is
written from left to right, the first character would be the leftmost
character. For Latin text included in the middle of Arabic text, the
logical order would be the rightmost Arabic character to the end of
the Arabic text, then the leftmost Latin character to the end of the
Latin text, and then the rightmost Arabic character of the second
group of Arabic text to the end of the Arabic text.

Next, the general layout process transforms the glyph identifiers in
logical order into (possibly modified) glyph identifiers in display
order. Display order is the order in which the characters are to
appear on paper or on a screen. The general layout process requires

- glyph metrics

- layout transformation

- feature selection information (how to use the optional style
  information)

- optional style information

- device information
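
The difference between logical order and display order for Latin text embedded in Arabic text can be shown with a deliberately simplified sketch. It assumes a right-to-left paragraph, works on coded characters rather than glyph identifiers, and treats every character that is not strongly left-to-right (including spaces) as following the right-to-left direction. It is not an implementation of the Unicode Bidirectional Algorithm and it mishandles Latin runs that contain spaces or digits.

```python
import unicodedata


def display_order(logical: str) -> str:
    """Reorder a right-to-left paragraph from logical to display order.

    The result lists characters from the left edge of the line to the
    right edge.
    """
    runs: list[tuple[bool, list[str]]] = []   # (is_left_to_right, characters)
    for ch in logical:
        is_ltr = unicodedata.bidirectional(ch) == "L"
        if runs and runs[-1][0] == is_ltr:
            runs[-1][1].append(ch)
        else:
            runs.append((is_ltr, [ch]))

    out: list[str] = []
    # In a right-to-left paragraph the first run sits at the right edge,
    # so runs are emitted last to first; right-to-left runs are also
    # reversed internally, left-to-right runs keep their order.
    for is_ltr, chars in reversed(runs):
        out.extend(chars if is_ltr else reversed(chars))
    return "".join(out)
```

For the logical sequence ALEF, BEH, space, "ab", space, JEEM, the display sequence from left to right is JEEM, space, "ab", space, BEH, ALEF: the Latin run keeps its internal order while the Arabic runs are mirrored around it.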
The presentation process is the final process. It takes the glyph
identifiers in display order, the glyph positions, and the glyph
shapes to produce the images on paper or a screen.

D.5 Font model summary

Table 1 summarizes and compares the three font models described in
this Annex. The primary difference between the three models is the
sophistication of the process for selecting glyphs.

                               Font Models
Characteristic                 Coded Font             Font Resource          Intelligent Font
Glyph Selection Process        None                   Yes (1 Process)        Yes (2 Processes)
(character-to-glyph mapping)   (1-to-1)               (1-to-1 or M-to-N)     (1-to-1 or M-to-N)
Character-to-Glyph             No                     Yes                    Yes
Mapping                        (implied by character  (external to font re-  (in font resource)
Font Data Structure
Annex E
Examples of character-to-glyph mapping
E.1 Mapping characters to glyphs

This Annex shows examples of the character-to-glyph mapping process. It should
be emphasized that it is often possible to represent a coded character
sequence in more than one way and to provide a visual representation for it in
more than one way. The two processes are separate, and they can be
individually optimized.

E.2 One-to-one

A one-to-one mapping from character to glyph is the most frequently used in
representing Latin-based languages, where the character U+0041 LATIN CAPITAL
LETTER A “A”, for example, is likely to be drawn by using a single “A” glyph.
The coded font model assumes that a one-to-one mapping is always the case.

It is often possible to use a single glyph to represent more than one distinct
character. For example, both U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE “Å”
and U+212B ANGSTROM SIGN “Å” can be represented by the glyph “Å”. It is also
conceivable for some implementations to use a single glyph for U+0041 LATIN
CAPITAL LETTER A “A”, U+0391 GREEK CAPITAL LETTER ALPHA “Α”, and U+0410
CYRILLIC CAPITAL LETTER A “А”. These examples are different from the
many-to-one mapping discussed below.

E.3 Many-to-one

Many-to-one mappings are common even in Latin typography. The sequence U+0066
LATIN SMALL LETTER F “f” and U+0069 LATIN SMALL LETTER I “i” could be drawn by
using a single glyph “ﬁ” for the ligature of “f” and “i”. The sequence U+0031
DIGIT ONE “1”, U+2044 FRACTION SLASH “⁄”, and U+0032 DIGIT TWO “2” could be
drawn by using a single “½” glyph.

Such mappings are more common in other writing systems. Hebrew, for example,
makes extensive use of diacritical marks that are written around and even
within various letters of the alphabet. The exact position of the diacritical
marks varies depending on the letter with which they are written. The sequence
U+05E4 HEBREW LETTER PE “פ”, U+05BC HEBREW POINT DAGESH OR MAPIQ “◌ּ”, and
U+05B8 HEBREW POINT QAMATS “◌ָ” is often drawn by using a single glyph “פָּ” to
provide optimal placement of the diacritical marks.

Level 3 implementations of ISO/IEC 10646-1 also use combining characters to
represent accented Latin letters. Again, individual glyphs can be used to
provide the best alignment of letter and accent. A level 3 implementation of
ISO/IEC 10646-1 might well use the coded character sequence U+0065 LATIN SMALL
LETTER E “e” and U+0302 COMBINING CIRCUMFLEX ACCENT “◌̂” but draw it using a
single “ê” glyph.

E.4 One-to-many

One-to-many mappings are more common than is often suspected. Whereas
high-quality typography would insist on a large number of glyphs to provide
greatest visual appeal, systems that cannot afford the necessary overhead can
resort to other schemes. They might draw a U+00E9 LATIN SMALL LETTER E WITH
ACUTE “é” by drawing the “e” glyph first, then positioning the “´” glyph above
it to form the glyph for the “é”.

One-to-many mappings are also found in Indic languages, where vowels can be
written in two pieces, one on either side of the character they follow. The
single character U+09CB BENGALI VOWEL SIGN O “◌ো” can be displayed using two
glyphs that appear on either side of the related consonant.

ISO/IEC 10646 also included characters for Roman numerals. A system may choose to
draw U+2165 ROMAN NUMERAL SIX “Ⅵ” by drawing a “V” and an “I” to the right.

E.5 Many-to-many

Given the previous examples, it should not be surprising that even
many-to-many mappings occur. For example, in writing Vietnamese using level 3
of ISO/IEC 10646, the coded character sequence U+0065 LATIN SMALL LETTER E
“e”, U+0302 COMBINING CIRCUMFLEX ACCENT “◌̂”, and U+0323 COMBINING DOT BELOW
“◌̣” could occur. Displaying this sequence would require drawing an “e” with a
“ˆ” above it and a dot “.” below it. A system that has an “ê” glyph may choose
to use that glyph and then add the dot below, and a system that has a single
glyph for this sequence may simply draw that. (Similarly, a level 1
implementation of ISO/IEC 10646 would use the coded character U+1EC7 LATIN
SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW “ệ”.)

Indeed, depending on the details of the individual implementation, many of the
examples from the previous clauses could be recast in a many-to-many fashion.
Again, note carefully that depending on the individual designs of the glyphs,
individual presentation systems will often differ in how they represent
characters and how they present the associated glyphs.
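
The four kinds of mapping described in this Annex can be expressed with a
single table that maps sequences of coded characters to sequences of glyphs.
The following Python sketch is an illustration only, under stated assumptions:
the glyph names, the table contents, the greedy longest-match strategy, and
the ".notdef" fallback name are all hypothetical choices, not requirements of
this Report.

```python
# Hypothetical table: a tuple of code points maps to a tuple of glyph names.
# The glyph names are illustrative; real fonts use their own glyph identifiers.
GLYPH_MAP = {
    (0x0041,): ("A",),                                      # one-to-one
    (0x0066, 0x0069): ("fi_ligature",),                     # many-to-one
    (0x00E9,): ("e", "acute"),                              # one-to-many
    (0x2165,): ("V", "I"),                                  # one-to-many
    (0x0065, 0x0302, 0x0323): ("ecircumflex", "dotbelow"),  # many-to-many
    (0x0065,): ("e",),
    (0x0066,): ("f",),
    (0x0069,): ("i",),
}

MAX_KEY_LEN = max(len(key) for key in GLYPH_MAP)


def map_to_glyphs(text):
    """Map coded characters (in logical order) to a list of glyph names."""
    code_points = [ord(ch) for ch in text]
    glyphs = []
    i = 0
    while i < len(code_points):
        # Try the longest candidate sequence first, then shorter ones.
        for length in range(min(MAX_KEY_LEN, len(code_points) - i), 0, -1):
            key = tuple(code_points[i:i + length])
            if key in GLYPH_MAP:
                glyphs.extend(GLYPH_MAP[key])
                i += length
                break
        else:
            # No table entry starts here: emit a placeholder glyph and move on.
            glyphs.append(".notdef")
            i += 1
    return glyphs
```

For instance, map_to_glyphs("fi") yields the single hypothetical glyph
"fi_ligature", while map_to_glyphs("\u2165") yields the two glyphs "V" and
"I". A different presentation system could hold a different table for the same
coded characters, which is the point made in E.5.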
Annex F
Recommendations of the original report
ICS 35.240.30