Book Pahlavi
2018-08-26
Anshuman Pandey
[email protected]
1 Introduction
This is a proposal to encode the ‘Book Pahlavi’ script in Unicode. Other proposals for the script have been
submitted previously by different authors:
• 1993: “Unicode Technical Report #3”, Rick McGowan and Joe Becker
• 2007: “Preliminary proposal to encode the Book Pahlavi script in the BMP of the UCS” (L2/07-234),
Michael Everson, Roozbeh Pournader, and Desmond Durkin-Meisterernst
• 2013: “Preliminary proposal to encode the Book Pahlavi script in the Unicode Standard” (L2/13-141),
Roozbeh Pournader
• 2014: “Proposal for Encoding Book Pahlavi in the Unicode Standard” (L2/14-077), Abe Meyers
The present proposal aims to provide:
• an encoding that aligns with Unicode principles and the character-glyph model
• a character repertoire based upon semantically distinctive letters, numbers, and signs that can be used
for completely representing the script
• a model that supports the joining structure of the script and variations in the joining behavior of letters
This document is concerned primarily with presenting an encoding model for Book Pahlavi that provides
for the full encoding of printed texts, as these records are currently used by the Zoroastrian and Parsi com-
munities. I am actively conducting research to develop and expand the model. Towards that end, I request
feedback from experts and users of the script. A comparison of the advantages of my proposed encoding
with previous proposals will be offered in the formal proposal, which is forthcoming. The formal proposal
will also include additional background information and a set of specimens of usage. At present, the figures
provided in the previous proposals should be consulted.
Preliminary proposal to encode Book Pahlavi in Unicode Anshuman Pandey
2 Background
The ‘Book Pahlavi’ script is used for writing the Iranian language known as ‘Middle Persian’ (ISO 639-3:
pal). Originally spoken in southwestern Iran, this language began to flourish during the 3rd century with
the rise of the Sasanian dynasty, which succeeded the Parthian dynasty in 224 CE. Middle Persian was used
as a prestige language during the Sasanian dynasty, but began to decline after the Arab invasion in 651.
The script is one of three ‘Pahlavi’ writing systems (see table 1). The earliest is known as ‘Inscriptional
Pahlavi’. It is derived from the Parthian script, which evolved from a form of Imperial Aramaic. The
inscriptional Pahlavi script is a non-cursive abjad. ‘Psalter Pahlavi’ is a fully cursive joining abjad
derived from the inscriptional form. It is attested in the Syriac Psalter, a Christian manuscript consisting of
twelve extant folios, from c. the 5th century CE. ‘Book Pahlavi’ is the best known of these scripts
and has the largest extant corpus. It developed from the inscriptional type. Of the three, only Book Pahlavi
remains unencoded in Unicode.
The labels ‘inscriptional’ and ‘book’ are scholarly classifications based upon strict assessments of how
the Pahlavi scripts are applied in the available records. Although described as ‘book’ on account of its usage in
Zoroastrian literature, the script also occurs in inscriptions, coins, seals, and ostraca. From the perspective of
script encoding, the terms ‘inscriptional’, ‘psalter’, and ‘book’ refer to the structure of the scripts, particularly
the lapidary nature of the ‘inscriptional’ type and the connected or cursive nature of the ‘psalter’ and ‘book’
forms.
Although common usage of Book Pahlavi declined after the introduction of the Arabic script in the 7th cen-
tury, it was maintained as an important liturgical and literary script. Alongside the Avestan script, Book
Pahlavi continues to possess significance for the Zoroastrian community. The extant literature of Zoroas-
trianism is written in these scripts. Book Pahlavi was adapted for printing in the late 19th century, and
Zoroastrian texts and Middle Persian grammatical studies continue to be printed in India in the script. The
script is also actively studied by scholars, especially of Middle Persian language and linguistics, and the
history and culture of pre-Islamic Iran.
3 Proposed Repertoire
The proposed repertoire contains 42 characters:
• 20 letters
• 2 fixed-form letters
• 2 special ligatures
• 1 word ligature
• 1 particle
• 8 combining signs
• 1 end-of-word mark
• 2 punctuation signs
• 5 numbers
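As a quick arithmetic check, the category counts listed above can be summed and compared with the size of the block shown in the code chart (10BB0..10BDF). The sketch below is only illustrative: the dictionary keys paraphrase the list above, and the variable names are not part of the proposal.

```python
# Category counts as listed in the proposed repertoire.
repertoire = {
    "letters": 20,
    "fixed-form letters": 2,
    "special ligatures": 2,
    "word ligature": 1,
    "particle": 1,
    "combining signs": 8,
    "end-of-word mark": 1,
    "punctuation signs": 2,
    "numbers": 5,
}

# Block range taken from the header of the code chart (10BB0..10BDF).
BLOCK_START, BLOCK_END = 0x10BB0, 0x10BDF

total = sum(repertoire.values())
block_size = BLOCK_END - BLOCK_START + 1
unused = block_size - total

print(f"{total} characters in a block of {block_size} code points ({unused} left over)")
```

With these figures the sum is 42, which fits within the 48 positions of the block and leaves 6 positions unassigned.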
The code chart and names list follow p. 38. The encoded set may differ from traditional and scholarly
inventories of the script that occur in manuscript, inscriptional, and printed sources. Such differences naturally
arise from the requirements for digitally representing a script in plain text and for preserving the semantics
of characters.
Unicode character names are based upon those of ‘Imperial Aramaic’ characters. This convention has been
followed for Unicode encodings for related scripts, e.g. Inscriptional Pahlavi and Psalter Pahlavi.
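The naming convention can be observed in the scripts that are already encoded. The following is a minimal sketch using Python's standard `unicodedata` module (it assumes a Python build whose Unicode database includes Psalter Pahlavi, i.e. Unicode 7.0 or later). The helper `letter_names` and the table `RELATED_SCRIPTS` are illustrative names only, and Book Pahlavi itself is not included because it is not yet encoded.

```python
import unicodedata

# First code point of the letter range in three related, already-encoded scripts.
RELATED_SCRIPTS = {
    "IMPERIAL ARAMAIC": 0x10840,
    "INSCRIPTIONAL PAHLAVI": 0x10B60,
    "PSALTER PAHLAVI": 0x10B80,
}

def letter_names(offset):
    """Return {script: letter name} for the letter at `offset` from each block start."""
    names = {}
    for script, start in RELATED_SCRIPTS.items():
        full = unicodedata.name(chr(start + offset))
        # Full names look like "<SCRIPT> LETTER <NAME>"; keep only the trailing name.
        names[script] = full.split(" LETTER ", 1)[1]
    return names

for offset in (0, 1):
    print(letter_names(offset))
```

For the first two positions, all three scripts share the letter names ALEPH and BETH. Later positions diverge where a script unifies letters, so the sketch deliberately stops at the second letter.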
In this document, names in italics refer to scholarly names for graphemes, while names in small capitals refer
to Unicode characters, e.g. beth and BOOK PAHLAVI LETTER BETH. For the sake of brevity, the descriptor
‘BOOK PAHLAVI’ is dropped when referring to Book Pahlavi characters, e.g. BOOK PAHLAVI LETTER BETH
may be referred to as BETH. For letters that have been unified as one character, the graphemes may
be referred to using the names of the individual letters, while the character is known by the compound
name. For example, ALEPH-HETH is a single character, but may be referred to as either
aleph or heth in discussion of the individual graphemes. Characters of other scripts are designated by their
full Unicode names. Latin transliteration of Book Pahlavi follows the current scholarly convention, with
Aramaic heterograms given in uppercase letters.
3.1 Letters
The following 20 basic letters are proposed. Details on the joining behavior of letters are given in § 5.2.
Letter                  Joining  Value
aleph-heth              dual     ʾ, h, x
beth                    right    b
gimel-daleth-yodh       dual     g, d, y
daleth (archaic)        right    d
he                      right    h
waw-nun-ayin-resh       right    w, n, ʿ, r
zayin                   dual     z
kaph                    right    k
kaph                    right    k
lamedh                  dual     r
lamedh                  dual     l, r
lamedh                  dual     l
lamedh                  right    l
mem-qoph                dual     m, q
samekh                  dual     s
samekh                  dual     s
pe                      right    p
sadhe                   right    c
shin                    dual     š
taw                     right    t
3.2 Fixed-form letters
The following two ‘fixed-form’ characters are proposed in order to represent the respective letters
in cases where their normal joining behavior is suspended (see § 6.2 and § 6.4.2). If the different
behaviors described in those sections can be produced using existing Unicode control
characters, then these ‘fixed-form’ letters may be removed from the proposed repertoire.
Letter                          Joining  Value
aleph-heth (fixed form)         dual     ʾ, h, x
gimel-daleth-yodh (fixed form)  dual     g, d, y
3.3 Special ligatures
The following 2 special ligatures are encoded as atomic characters and their character names are based upon
scholarly usage:
Ligature  Joining  Value
x1        non      x1
x2        non      x2
3.4 Word ligature
The following character is the word for Ahriman, the Zoroastrian antagonist, rotated 180° counter-clockwise.
The orientation carries the metaphor of turning away the negative spirit. It occurs primarily in Pahlavi texts of
the 9th–12th centuries. It is proposed as an atomic character in order to provide a means for its representation
in plain text.
non ʾhlmn
3.5 Particle
The following character represents the Aramaic heterogram ZY. It is proposed as an atomic character in order
to provide for its representation in plain text.
non ZY
3.6 Combining signs
The following 8 combining signs are used for distinguishing different values for letters that have the same
shape:
[Chart of the 8 combining signs, each shown on a dotted circle ◌.]
3.7 End-of-word mark
The following character is used for marking the end of a word. Also known in some scholarly works as
the ‘otiose stroke’, it is used only after letters that do not connect to the left. This character resembles
WAW-NUN-AYIN-RESH, but it is encoded as a separate character on account of its character semantics. It is a
non-joining character that is used solely for delimiting words.
non .
3.8 Punctuation
The following two signs of punctuation occur in manuscripts and printed works. They resemble punctuation
already encoded in the Avestan block, i.e. 𐬺 U+10B3C and
𐬾 U+10B3E. The difference is that the Book Pahlavi
punctuation signs are not ‘tiny’ or ‘large’ like the Avestan signs, but are of a ‘medium’ or ‘normal’ size. The
characters below are, therefore, encoded separately in order to accurately represent the proportions of the signs
relative to the surrounding text.
3.9 Numbers
right 1
right 2
right 3
right 4
right 100
4 Script Details
4.1 Structure
Book Pahlavi is a cursive joining abjad. It is written from right to left, with lines that advance from top to
bottom.
Letters are written on a baseline. The nominal forms of letters are shown below where they occur in relation
to the baseline:
The ‘baseline’ is not readily apparent. It may be established by taking the baselines of the nominal shapes of
the letters - , - - , , , and alternate forms of the latter. The typical
‘head-height’ may be established by the heights of - , , - - , etc. Accord-
ingly, all other letters have features that are either ascending or descending.
4.3 Punctuation
Spaces are commonly used for separating words. The proposed signs of punctuation are used for indicating
text segments of varying length.
4.4 Line-breaking
There are no formal rules for the breaking of words at the end of a line. Moreover, the available sources do not
contain text with words broken across lines. It may be assumed that words were not split at line boundaries.
There are no indications of hyphens or other continuation marks. In digital layouts, line-breaks should
occur after words.
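The line-breaking behavior described above can be sketched with Python's standard `textwrap` module, which can be told to break only at spaces and never inside a word. This is an illustration under stated assumptions, not a layout specification: the function name `wrap_words` is hypothetical, the width is counted in code points rather than rendered widths, and Latin transcriptions from this document stand in for Book Pahlavi text in logical order.

```python
import textwrap

def wrap_words(text, width):
    """Break lines only at spaces; never split or hyphenate a word."""
    return textwrap.wrap(
        text,
        width=width,
        break_long_words=False,   # keep over-long words whole
        break_on_hyphens=False,   # do not treat hyphens as break opportunities
    )

# Transcribed words from this document stand in for Book Pahlavi text.
sample = "šāhān gāhān abēzagīh drust"
for line in wrap_words(sample, width=12):
    print(line)
```

A word that is longer than the requested width is placed on a line of its own rather than being divided, which matches the assumption that words were not split at line boundaries.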
4.5 Collation
5 Joining behavior
It is commonly said that Book Pahlavi has numerous ‘standard’ or ‘obligatory’ ligatures. Previous proposals
for encoding the script did not provide a thorough analysis of these ligatures. However, examples of ligatures
are provided in published materials. Such statements and absence of information on ligatures are based upon
a lack of understanding of the joining rules for the script.
To be fair, there is no manuscript or scholarly manual that is readily available that specifies such rules. The
ambiguity of certain sequences of letters further adds to the supposed complexity of ligatures in the script.
Nonetheless, the first step in understanding such ligatures is to analyze the joining behavior of each letter of
Book Pahlavi. This process permits a practical method of analyzing all ligatures in the script.
The word šāhān ‘kings’ (pl. of šāh ‘king’) is written using the following five letters: shin, aleph, heth, aleph, nun.
According to the rules of the script, these five letters are not simply strung together in their nominal shapes,
but are rendered as a single connected ligature.
In the ligature, the original shapes of the underlying letters are not easily recognizable, with the exception of
the nun, and perhaps the penultimate aleph. For this reason, decomposing the ligature into its constituent characters
is difficult. Without knowing the joining behavior of letters, one could conjure up several different ways of
analyzing the cursive properties of the letters.
One method is to segment ligatures into primitive graphical components, as was done for producing metal
types. Such an approach, however, is quite subjective. It provides for numerous dissections of the ligature
into glyphic elements. For example:
[Figure: alternative dissections of the šāhān ligature into graphical components.]
There are many other possibilities. Composing Book Pahlavi text using a glyphic model was certainly fea-
sible for metal printing. For that purpose, it was sufficient to graphically reproduce the text of a particular
book or manuscript. But, such an approach is not useful for representation of Book Pahlavi texts in a digital
medium. It is necessary to represent the underlying characters, more than their graphical appearance. Instead
of stringing together a sequence of graphical primitives, it is more valuable from a plain text perspective to
use characters that correspond to letters of the script, as this transmits semantic values and identities, and to
use font technologies to render the ligatures.
As described in § 4.2, Book Pahlavi letters may be considered to be written on a baseline.
The joining rules of certain letters specify that the connection to the next letter occurs not at the baseline,
but through a loop that descends roughly a full x-height before curving back up to the baseline to join the
next letter. In this regard, it is the responsibility of a given letter to ensure that it joins to the following letter
according to the rules. The same principle should be applied in typography.
Based upon these rules, the cursive connections for producing šāhān are as follows:
[Figure: cursive connections between the letters of šāhān.]
Book Pahlavi letters are traditionally divided into two sets: seven dual-joining and seven right-joining letters.
Alternate forms of letters have the same joining properties as the conventional letter. The isolated or nominal
forms of letters are typically identical to their initial forms.
[Table: contextual forms (Xn, Xf, Xm, Xi) of the dual-joining letters.]
[Table: contextual forms (Xn, Xf, Xi) of the right-joining letters.]
In order to develop a preliminary encoding model for Book Pahlavi, I have analyzed a variety of texts in
order to understand and identify the rules for connections between letters, as well as the contextual forms of
letters in cursive contexts. I provide these details in the next section.
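The basic consequence of dividing letters into dual-joining and right-joining classes can be sketched as follows. This is a simplified illustration, not the proposed shaping model: it only decides which broad contextual form (nominal, initial, medial, final) each letter takes from the joining class of its neighbors, and it ignores the letter-specific variants described in the next section. The scholarly letter names used as keys, the function names, and the omission of the lamedh letters are choices made for this sketch.

```python
# Joining classes following the repertoire tables; the keys are scholarly
# letter names chosen for illustration, not proposed character names.
DUAL = {"aleph-heth", "gimel-daleth-yodh", "zayin", "mem-qoph", "samekh", "shin"}
RIGHT = {"beth", "he", "waw-nun-ayin-resh", "kaph", "pe", "sadhe", "taw"}

def joins_left(letter):
    """True if the letter can connect to the letter that follows it."""
    return letter in DUAL

def joins_right(letter):
    """True if the letter can connect to the letter that precedes it."""
    return letter in DUAL or letter in RIGHT

def contextual_forms(word):
    """Return 'nominal', 'initial', 'medial', or 'final' for each letter.

    `word` is a list of letters in logical (reading) order.
    """
    forms = []
    for i, letter in enumerate(word):
        prev_joined = i > 0 and joins_left(word[i - 1]) and joins_right(letter)
        next_joined = (
            i + 1 < len(word) and joins_left(letter) and joins_right(word[i + 1])
        )
        if prev_joined and next_joined:
            forms.append("medial")
        elif prev_joined:
            forms.append("final")
        elif next_joined:
            forms.append("initial")
        else:
            forms.append("nominal")
    return forms

# šāhān: shin, aleph, heth, aleph, nun, using the unified letters.
sahan = ["shin", "aleph-heth", "aleph-heth", "aleph-heth", "waw-nun-ayin-resh"]
print(list(zip(sahan, contextual_forms(sahan))))
```

In this sketch every letter of šāhān is connected to its neighbors, which is consistent with the word being rendered as one continuous ligature. A right-joining letter such as beth at the start of a word stays in its nominal form, and the letter after it begins a new connected run.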
6 Description of Letters
The Book Pahlavi letters aleph and heth have the same shape and joining behavior. For this reason they are
unified into the single character ALEPH-HETH. This character is a dual-joining letter and has the following
behavior:
[Table: contextual forms of ALEPH-HETH before various following letters.]
Transliteration  Transcription  Gloss
<ʾzg>            azg            branch
<ʾpyckyh>        abēzagīh       purity
<ʾthš>           ātaxš          fire
<bʾht>           baxt           destiny
<GBRʾ>           mard           man
<ʾšʾdyh>         ašāyīh         righteousness
<gʾh>            gāh            special place, throne
<gʾhʾn>          gāhān          the Gathas
<hmʾhl>          hamahl         someone of equal social standing
<zʾhr>           zahr           poison, venom
<lʾtyh>          rādīh          generosity
<šʾh>            šāh            king
6.3 Beth
The letter beth is represented using BETH. It is a right-joining letter with the following joining behavior.
Letters that follow beth are written after the right descender of the letter and above its horizontal stroke, nested
within the letter. The behavior of BETH is illustrated below.
When beth occurs more than once in a character sequence, the horizontal stroke of each preceding beth is
lowered to accommodate each subsequent occurrence. This behavior results in a nested appearance in which
the horizontal stroke of the left-most beth is nested within the lowered stroke of each preceding beth.
The Book Pahlavi letters gimel, daleth, and yodh have the same shape and joining behavior, and are therefore
unified as the single character GIMEL-DALETH-YODH. It is a dual-joining letter, whose regular behavior is
illustrated below.
[Table: contextual forms of GIMEL-DALETH-YODH before various following letters; one form also occurs before GIMEL-DALETH-YODH in certain cases (see below).]
Transliteration  Transcription  Gloss
<drwyst>         drust          healthy, sound
<ym>             yam            Jam
When GIMEL-DALETH-YODH is followed immediately by another instance of the same letter, then its contextual
form is determined by that of the second GIMEL-DALETH-YODH. The cases described below are to be
considered regular rendering behaviors for adjacent sequences of this letter.
2. When the second is shaped as 𐰄 — as before — then the first is rendered using its nominal
form:
In some words, in an adjacent sequence of GIMEL-DALETH-YODH the first is rendered using its nominal form,
while the second is shaped based upon the following letter. This behavior differs from the representation
of the words gēhān and gētīy, as described above. The exceptional cases require some mechanism for
representing a form of GIMEL-DALETH-YODH that does not change its shape. This behavior is morphological in
nature and cannot be predicted using conventional rules of the script. Instead of using a control character for
modifying the regular behavior of GIMEL-DALETH-YODH, a ‘fixed’ form of the letter is proposed for
encoding. If experts agree that the representations below may be suitably
represented using a control character, then the ‘fixed-form’ letter may be withdrawn. The fixed-form
letter is to be used for representing the following cases:
An archaic form of daleth occurs in historical spellings. This form is inherited from Psalter Pahlavi. It
has a distinctive shape and differs in its joining behavior from daleth. This letter is encoded as a separate
character.
6.6 he
The letter he is used only in Aramaic heterograms. It often resembles the sequence mem + nun
(or waw), but it is encoded as a separate character because of its semantic value and its treatment as an
atomic unit.
The four letters waw, nun, ayin, and resh have the same shape and joining behavior, and are unified as the
single character WAW-NUN-AYIN-RESH.
6.8 zayin
[Table: contextual forms of zayin.]
Transliteration  Transcription  Gloss
<ʾz>             az             goat
6.9 kaph
The letter kaph is written using KAPH, but it also has an archaic form that occurs in Aramaic heterograms
and historical spellings of words. This latter form is encoded as a separate character on account
of its distinctive shape.
[Table: contextual form of kaph used after GIMEL-DALETH-YODH and mem.]
6.11 lamedh
Although palaeographically derived from Aramaic lamedh, the regular lamedh letter generally represents /r/ in
Book Pahlavi. Two other letters represent lamedh in Aramaic heterograms. As they occur concurrently
and are preserved in historical spellings of words, they are encoded as separate characters.
When the regular letter represents /l/ instead of /r/, it is marked with a small stroke.
This marked form is also encoded as a separate letter.
Transliteration  Transcription  Gloss
<ʾL>             ma             do not (neg. part.)
<MHL>            fradāg         tomorrow
<ʿL>             ō              to (prep.)
The letters mem and qoph are written using the same shape and have the same joining behavior. The
letter qoph rarely occurs, and only in Aramaic heterograms.
6.13 samekh
The samekh is written using two distinctive forms. These forms are not glyphic variants, but
may occur concurrently in a text, and also within a word. Each form is encoded as a separate
character.
6.14 pe
6.15 sadhe
Transliteration  Transcription  Gloss
<CCA>            sang           stone
6.16 shin
6.17 taw
6.18 x1, x2
7 Description of numbers
Number  Name   Gloss
1       ēk     one
2       dō     two
3       sē     three
4       čahār  four
5       panǰ   five
6       šaš    six
7       haft   seven
8       hašt   eight
9       nō     nine
8 Character Properties
9 Acknowledgments
I would like to thank Roozbeh Pournader for sharing his materials on the Book Pahlavi script and for moti-
vating me to carry on the effort to develop an encoding for the script.
This project has been made possible in part by funding from the Adopt-A-Character program of the Unicode
Consortium, and has been supervised by Deborah Anderson and Rick McGowan.
It was also made possible in part by a grant from the U.S. National Endowment for the Humanities, which
funded the Universal Scripts Project (part of the Script Encoding Initiative at UC Berkeley). Any views,
findings, conclusions or recommendations expressed in this publication do not necessarily reflect those of
the National Endowment for the Humanities.
Book Pahlavi (10BB0..10BDF)
     10BB   10BC   10BD
0    10BB0  10BC0  10BD0
1    10BB1  10BC1  10BD1
2    10BB2  10BC2  10BD2
3    10BB3  10BC3  10BD3
4    10BB4  10BC4  10BD4
5    10BB5  10BC5  10BD5
6    10BB6  10BC6  10BD6
7    10BB7  10BC7  10BD7
8    10BB8  10BC8  10BD8
9    10BB9  10BC9  10BD9
A    10BBA  10BCA
B    10BBB  10BCB
C    10BBC  10BCC
D    10BBD  10BCD
E    10BBE  10BCE
F    10BBF  10BCF
Letter   Book Pahlavi  Psalter Pahlavi  Inscriptional Pahlavi  Parthian  Aramaic
aleph                  𐮀                𐭠                      𐭀         𐡀
beth                   𐮁                𐭡                      𐭁         𐡁
gimel                  𐮂                𐭢                      𐭂         𐡂
daleth   ( )           𐮃                𐭣                      𐭃         𐡃
he                     𐮄                𐭤                      𐭄         𐡄
waw                    𐮅                𐭥                      𐭅         𐡅
zayin                  𐮆                𐭦                      𐭆         𐡆
heth     ( )           𐮇                𐭧                      𐭇         𐡇
teth     -             -                𐭨                      𐭈         𐡈
yodh     ( )           𐮈                𐭩                      𐭉         𐡉
kaph                   𐮉                𐭪                      𐭊         𐡊
lamedh                 𐮊                𐭫                      𐭋         𐡋
mem                    𐮋                𐭬                      𐭌         𐡌
nun      ( )           𐮌                𐭭                      𐭍         𐡍
samekh                 𐮍                𐭮                      𐭎         𐡎
ayin     ( )           -                (𐭥)                    𐭏         𐡏
pe                     𐮎                𐭯                      𐭐         𐡐
sadhe                  𐮏                𐭰                      𐭑         𐡑
qoph     ( )           -                (𐭬)                    𐭒         𐡒
resh     ( )                            (𐭥)                    𐭓         𐡓
shin                   𐮐                𐭱                      𐭔         𐡔
taw                    𐮑                𐭲                      𐭕         𐡕
Table 1: Comparison of the Pahlavi scripts with Parthian and Aramaic. Parentheses indicate that a
letter has been unified with another in the respective encoding. In Inscriptional Pahlavi, ayin and
resh are unified with waw, and qoph with mem.