Aspects of Mongol writing today
Michael Balk, Berlin1
The theme of the 63rd Annual Meeting of the PIAC in Ulaanbaatar – Communality and
Mutual Influence of Language and Civilisation in the Altaic World – provides an opportunity to address a complex cultural feature shared by every civilisation: writing. The original
script of the Mongols, in use since the dawn of the Mongol Empire, was adopted from the
Uighurs. The script of this Turkic-speaking nation goes back to the alphabet of the Sogdians, an Iranian people, and ultimately to Aramaic. As a Semitic script, it is characterised
by a clear-cut range in the notation of consonants, while vowels are marked in a more
restrained manner in comparison.
While it was abandoned by the Uighurs themselves in favour of the Arabic script in the
wake of their adoption of Islam, their peculiar vertical form of writing became a unique
feature of Mongolian communities over centuries. Not only has it developed, in the form
of the Oirat “clear script” (тод бичиг), a variant characterised by greater precision in the
rendering of phonetic features, but also, in the form of so-called Galig (галиг), a system of
notation has emerged that is capable of expressing the characters of both Tibetan and
Sanskrit in a distinctive manner. Let us not forget that Written Manchu is also
a derivative of the Mongol script, holding nothing less than imperial status during the Qing
dynasty. Though, as a result of Soviet influence, the old script in modern Mongolia has met
a strong and presumably lasting competitor in an adapted form of Cyrillic, Written Mongol
has never really disappeared. Since 1990, there has been a steady renaissance, giving historical substance and etymological scope to what is now a two-pronged performance, constituting a dynamic duopoly in the writing of the language. In Inner Mongolia, which is
part of present-day China, the ancient script was not replaced. Here Mongol seems less
threatened by Cyrillic than by the levelling pulls of Chinese in the new age.
The nineties of the twentieth century were marked by a technological development that
seems to have been as significant for writing and publishing as the invention of modern
letterpress printing with movable type by Johannes Gutenberg around 1450. The reference is to Unicode, a comprehensive technical standard that aims to make it possible to
write all characters of all scripts on the same technical platform – using the same typewriter, so to speak – comprising an inventory that includes all letters and ideographs on
Earth.
This is a profound media revolution. If it has become commonplace today to type not only
elementary Latin letters on a computer or a smartphone, but also Cyrillic, Mongolian or
characters of any other script more or less without problems, we owe this to the development of the Unicode standard in the 1990s. As far as the traditional Mongol script is concerned, its characters were first included as a component of the third version of the
Unicode standard, which was published in September 1999.2 During this period, the present author worked as subject specialist for Central Asia at the Berlin State Library
1 Thanks go to Juha Janhunen, who contributed some important points of detail.
2 See http://www.unicode.org/versions/Unicode3.0.0/.
(Staatsbibliothek zu Berlin). In this capacity, I was involved in the project by attending
meetings and international consultations as an official representative of Germany. My
impression was that the details of the standard were mainly worked out by experts from
the People’s Republic of China and a group of non-governmental specialists from the computer community who were interested in an early completion and, naturally, the widest
possible acceptance for the new character code tables of the Mongol range of the standard.3
Opinions may differ on the outcome. A fundamental objection to the Unicode standard for
Written Mongol is the fact that the interpretation of the characters of the script does not
primarily follow the logic of a one-to-one transliteration of the characters, but that of a
transcription in the sense of determining an articulation understood as classical standard.
Put simply, the letters and basic elements of the script are not encoded in relation to themselves, but in relation to a pronunciation or “reading” in which the characters can be
interpreted in different ways. This is particularly obvious in the encoding of the (genuine)
Mongolian vowels, for which a total of seven letters are defined in the relevant partition
of the code:
1820 ᠠ MONGOLIAN LETTER A
1821 ᠡ MONGOLIAN LETTER E
1822 ᠢ MONGOLIAN LETTER I
1823 ᠣ MONGOLIAN LETTER O
1824 ᠤ MONGOLIAN LETTER U
1825 ᠥ MONGOLIAN LETTER OE
1826 ᠦ MONGOLIAN LETTER UE
This way of presenting the Mongol vowels – the Unicode labels A E I O U OE UE are non-diacritical modes of expressing a e i o u ö ü – corresponds to a traditional description as
laid down in numerous grammars4 and reappearing in like manner in the Cyrillic script (а
э и о у ө ү). On an unbiased view, however, this can only be understood as a phonetic
interpretation of certain characters or composed ligatures of letters, not as a transliteration of the factual elements of the script. This is particularly obvious with the codes that
are graphically identical in the table: U+1823 (O) = U+1824 (U) and U+1825 (OE) =
U+1826 (UE). The letters themselves offer no clue for recognising O and U as separate elements, and the same applies to OE and UE. Written Mongol simply does not make this
distinction.
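The point can be checked against the character database that ships with Python. The following is only a minimal sketch: the helper name vowel_names is an assumption made for illustration, while the code points and character names are those of the Unicode standard quoted above.

```python
import unicodedata

# The seven vowel code points of the Mongolian block, U+1820 to U+1826.
VOWEL_CODE_POINTS = range(0x1820, 0x1827)


def vowel_names():
    """Return (code point, character name) pairs as defined by Unicode."""
    return [(f"U+{cp:04X}", unicodedata.name(chr(cp))) for cp in VOWEL_CODE_POINTS]


for code, name in vowel_names():
    print(code, name)

# O and U are separate code points, whatever shape a font draws for them.
print(chr(0x1823) != chr(0x1824))  # True
```

The distinction between O and U, or between OE and UE, thus exists in the code only; it is not something the database derives from the shape of the letters.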
As a genuinely Semitic script, it basically has only three characters to express vowels:
ālaph, yodh and waw, to use the common names of these letters in the Aramaic alphabet.
When Mongolian children learn their script at school, teachers are likely to teach them the
elementary signs (махбод) through a series of popular nicknames.5 Semitic ālaph is
3 A list of the currently implemented characters of the script will be found at https://www.unicode.org/charts/PDF/U1800.pdf. Besides punctuation marks, format controls, digits and the basic Mongol letters the
table also includes additional signs for Todo, Sibe, Manchu, Buryat as well as extensions for Sanskrit and
Tibetan.
4 For example: Poppe, Grammar of written Mongolian, p. 17.
5 In naming these elements I follow Шагдарсүрэн, Монголчуудын үсэг бичигийн товчоон, pp. 29-30. See
also: Kara, Books of the Mongolian nomads, pp. 79-95.
matched in Mongol by characters that look and are named differently depending on their
position in the word. Initially a “crown” (титим or титэм) is written, medially a “tooth”
(шүд). In final position at the end of a word there can be either a “tail” (сүүл) curving to
the right or a final stroke to the left (цацлага); the latter can appear connected to a preceding medial or unconnected, i.e. in isolated position after a final letter and a space. Primordial yodh corresponds to “shin” (шилбэ) in popular Mongol parlance, which also occurs in a smaller variant called “hook” (дэгээ) appearing in final position only. The Mongolian “belly” (гэдэс) goes back to an ancestral waw.
In the Unicode characters listed above, most of these basic elements of the script can be
detected, along with a so-called “bow” (нум). In the order of their appearance in the
Unicode range we can discern: титим (at the onset of all the seven vowels), сүүл (second
element in Mongolian letter A), цацлага (E), нум (I), гэдэс (O and U), дэгээ (final element
after гэдэс in letters OE and UE).
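The composition just described can be written down as data. The sketch below is illustrative only: the dictionary COMPOSITION restates the description given above using the English nicknames of the elements, and the function name group_by_glyphs is an assumption. Grouping the seven Unicode vowels by their elements shows that they reduce to five distinct graphic shapes.

```python
from collections import defaultdict

# Elements of the seven Unicode vowel letters as described in the text,
# named by their English nicknames.
COMPOSITION = {
    "A": ("crown", "tail"),
    "E": ("crown", "final stroke"),
    "I": ("crown", "bow"),
    "O": ("crown", "belly"),
    "U": ("crown", "belly"),
    "OE": ("crown", "belly", "hook"),
    "UE": ("crown", "belly", "hook"),
}


def group_by_glyphs(composition):
    """Group Unicode vowel labels that share the same sequence of elements."""
    groups = defaultdict(list)
    for label, elements in composition.items():
        groups[elements].append(label)
    return dict(groups)


for elements, labels in group_by_glyphs(COMPOSITION).items():
    print(" + ".join(elements), "->", ", ".join(labels))
```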
During the deliberations on the Unicode Standard, there was occasional talk of the basic
elements known as “glyphs”,6 but they remained largely unconsidered in the alphabetic
encoding of letters and were relegated to the level of graphic fonts. The reason for this
procedure was certainly that at that time there existed no transliteration generally accepted among scholars in Mongolian studies to which one could have referred easily and
without generating objections. There was no precise rule mechanism for an unambiguous
transliteration of letters (in the sense of an orthographic romanisation) that was universally accepted, as is the case with the Greek or Cyrillic alphabets and most other non-Latin
scripts in Oriental philologies. The fact that only a transcription (or phonetic romanisation)
has been in general use, and still is to a large extent, has probably something to do with
the fact that Mongolian studies is a comparatively old discipline with conservative
instincts.
When Isaak Jakob Schmidt (1779-1847) dedicated his “Anfangsgründe der mongolischen
Sprache” (rudiments of the Mongolian language) with the since unchanged transcription
of the vowels7 to His Majesty Nicholas the First, in 1831, almost twenty years were to pass
before Franz Xaver Ritter von Miklosich (in Slovenian: Franc Miklošič, 1813-1891) was
appointed to the first chair of Slavic philology at the University of Vienna in 1849. In
Oriental studies, especially in the then prominent disciplines of Indology, Iranian and Arabic
Studies, it was only towards the end of the nineteenth century that what we understand
today as transliteration became firmly established in the methodological apparatus
of scholars.8
While the principle of strict letter-based transliteration has since become widely accepted
in most philological disciplines dealing with foreign scripts, Mongolian studies has
6 The term glyph is derived from the Greek word γλυφη “carving”, from γλυφειν “to hollow out, cut out with
a knife, engrave, carve”. In typography, this is broadly understood to mean any meaningful sign. While
character refers to the abstract idea, glyph is more understood as its concrete graphic representation.
7 Schmidt, Grammatik der mongolischen Sprache, p. 1.
8 Though it was common at the time to call the unambiguous transposition of letters transcription rather
than transliteration. Sometimes we come across wordings like transcription of the script or (in German)
Schrifttranskription. An important document of the time is the “Rapport de la commission de transcription”
issued by the Tenth International Congress of Orientalists in Geneva in 1894.
adhered to a kind of mixed form in which the rendering of a letter is combined with an
articulation-based transcription. This is not the place to expound the history and
authorship of Latin renditions of Written Mongol, used today with a certain degree of variation. Nicholas Poppe, whose grammar may be considered a standard reference for
orthographic questions, has only remarked in the preface: “The transcription used in this
book is that found in most scientific works dealing with the Mongolian language.”9
Around the same time as my participation in the deliberations on the Unicode Standard
for Written Mongolian, I presented some thoughts on a transliteration of the script in a
talk given at the Seventh International Congress of Mongolists in Ulaanbaatar in 1997. The
response was not overwhelming, but I made the wonderful acquaintance of Professor
Juha Janhunen from Helsinki, who in his own presentation at the conference also firmly
expressed the opinion that a true letter-based transliteration was an urgent necessity.10
The result was a joint project, the first outline of which appeared in a paper for the proceedings of the 41st PIAC held in Majvik (Finland) in 1998.11 A detailed description of our
transliteration of the classical Mongolian written language is also contained in the volume
“The Mongolic Languages” edited by Juha Janhunen, which has become a standard work
in Mongolian studies.12 A brief overview of this romanisation following the Unicode
arrangement of the Mongolian alphabet can be found on the website of the Berlin State
Library.13
The system developed by Janhunen and myself is the first practical and manageable
romanisation of the script that has proved useful, amongst other purposes, in the bibliographical documentation of Mongol publications from Inner Mongolia. It is not intended
to displace or supersede existing transcriptions commonly used in scholarship, which
everyone may continue to maintain as he or she wishes. The system is conceived and
designed as an additional tool to existing forms of notation; anything else would be unhistorical and not feasible. However, our romanisation system claims to describe the Mongolian script more accurately than the existing systems of transcription, and it does so in
several respects.
On the one hand, a general principle is observed that when describing a source, no more
information should be indicated than what the original actually contains or implies. On
the other, all recognisable features of the script should be distinctly expressed in its Latin
rendering. The analysis of the script is done in two steps. At the elementary level, the
9 Poppe, Grammar of written Mongolian, p. xiii.
10 A short recount of our first meeting can be found in: Balk, Sieben Strophen des Udānavarga in mongolischer Version, p. 25.
11 Balk & Janhunen, A new approach to the Romanization of Written Mongol.
12 See in particular Janhunen's contribution “Written Mongol” (The Mongolic Languages, pp. 30-56) and the
“Chart of Romanization” (pp. xxvii-xxviii).
13 Please consult https://staatsbibliothek-berlin.de/die-staatsbibliothek/abteilungen/ostasien/rechercheund-ressourcen/zentralasiatischer-katalog/transkription-mongolisch. The Staatsbibliothek zu Berlin has
one of the richest bibliographically documented Mongolian-language library collections in the world. A little
less than 15,000 books and periodicals are post-war holdings from Mongolia, mostly in Cyrillic script. The
slightly more than 7,000 Mongolian items published in the People's Republic of China date mainly from the
period after about 1980 until the present and are catalogued following BJR (Balk-Janhunen Romanisation).
The bibliographic data is available on the internet at http://stabikat.de/.
glyphs are described, taking up the traditional names mentioned above (титэм, шүд, сүүл,
шилбэ, гэдэс etc) and expressing them through Latin characters. Based on these elementary signs, functional letters can be identified, distinguishing between vowels and consonants as they correspond to the inherent logic of Mongol phonotactic and orthographic
rules. It is a common heritage of Semitic writing that this extends to three characters in
the first place, referred to in Aramaic as ālaph, yodh and waw. These signs can have both
a vocalic and a consonantal function in the Mongol script, which is expressed accordingly
at the alphabetic level. There is also a limited number of ligatures, where two glyphs stand
for one single letter. A tactical detail on which Janhunen and I were very much in agreement from the outset was that we wanted to use only the 26 letters of the Latin alphabet for
romanisation, hence no diacritics or Greek letters. Distinctions that are necessary are
resolved combinatorially and not by means of such special marks (for example sh qh and
not š ɣ to indicate the dots to the right and the left of letters s q).
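The restriction to the 26 basic Latin letters is easy to verify mechanically. The following is a minimal sketch; the function name uses_only_basic_latin is hypothetical and serves only to illustrate the principle.

```python
import string

# The 26 letters of the basic Latin alphabet, lower case.
LATIN_26 = set(string.ascii_lowercase)


def uses_only_basic_latin(romanised: str) -> bool:
    """True if a romanised form contains nothing but the 26 basic letters."""
    return all(ch in LATIN_26 for ch in romanised)


print(uses_only_basic_latin("sh"))  # True: combinatorial solution
print(uses_only_basic_latin("qh"))  # True
print(uses_only_basic_latin("š"))   # False: diacritic
print(uses_only_basic_latin("ɣ"))   # False: special letter
```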
One advantage of a purely script-based romanisation is an unbiased look at elementary
facts concerning the relationship between script and language. It is the particular Mongolian way of balancing sound and letter that becomes much more apparent to the eye than
if transcriptional hybrids are used, where letter-based rewriting of a conservative script
is combined with assumptions about a historical articulation that has long since ceased to
be relevant for communication in contemporary language. There are perhaps not too
many examples from other languages where spelling and pronunciation are as far apart
as in the case of Written Mongol and modern Khalkha. The same applies mutatis mutandis
to other Mongolic languages such as Buryat, albeit to a lesser extent.
The gap between speech and writing is even wider than in English, where the Tudor
Vowel Shift has created a discrepancy between spelling and pronunciation that to call
loose would be an understatement. Nevertheless, adherence to traditional spellings has
not hindered the rise of English as the modern world’s most important language. Rather,
the apparent inconsistencies in the notation of vowels in relation to pronunciation seem
to have given English a visual conciseness of orthography that allows the reader to grasp
what is meant quite quickly. Something similar may be argued about Written Mongol.
Despite its reductionism in the representation of vowels, partly also of consonants, its significance and prevalence extends beyond that of a local language as it is used across linguistic boundaries for different Mongolic idioms. The orchestral range in the number of
syllables, a result of the history of the language, as well as the resourceful inventiveness
in the selection of glyphs and design of letters in various positions of a word, give the
spelling a stimulating precision that facilitates reading. Both the recognisability of the
agglutinative structure and the visual unambiguity of the orthographic impression of
words as a whole seem to be higher in Mongol than in the Cyrillic script. For native Mongolian speakers, Written Mongol is complicated to write but comfortable to read if the
spelling of the words is familiar.
In the following remarks on Mongol writing, I will refer to the Cyrillic spelling of the Mongolian language and its terminology throughout. This seems to make more sense to me
than a parallel specification of the transcriptions commonly used in Mongolian studies,
which are likely to be rather unknown to the general public. For Mongolian in Cyrillic
script, there is an abundance of good dictionaries as well as excellent websites freely
accessible via the internet, which have been developed thanks to the far-sighted support
of the Mongolian cultural authorities. They are characterised by a high degree of reliability,
and I may take this opportunity to express my deep appreciation for these achievements
and the dedicated people who have created them.
Let me now go into some examples, not claiming to be exhaustive, but merely discussing
a few salient points and some problems that typically arise in connection with Unicode.
Those who are not familiar with Balk-Janhunen Romanisation may take them as a brief
introduction to the idea and functioning of the system. Please consider these names for
elementary signs:
Glyph   Mongolian name   English nickname
<v>     титэм            crown
<v>     шүд              tooth
<v>     сүүл             tail
<e>     цацлага          final stroke to the left
<i>     шилбэ            shin
<j>     дэгээ            hook
<u>     гэдэс            belly
<g>     нум              bow
<b>     нумтай гэдэс     belly with bow
The system presented here for the Latin rendering of Written Mongol consists, in the first
step, of a transliteration of the elementary signs or glyphs (махбод). The second step is a
romanisation of these glyphs, which claims to capture the logic of writing in terms of the
distribution of vowels and consonants and the implication of ligatures; it will prove surprisingly readable for those familiar with Mongolian. While the glyphs are set in angle
brackets, romanised Mongolian letters appear in bold. Here are examples of the letter g
occurring in varying positions:
No.   Mongol   Glyphs              Romanisation   Cyrillic   Meaning
1     ᠭᠡᠭᠡ     < g-v-g-v-e >       gagae          гэгээ      light
2     ᠬᠥᠭ      < g-u-i-i-e >       guig           хөг        tune
3     ᠭᠦᠩ      < g-u-i-v-i-e >     guivg          гүн        prince
In the first line you see a word consisting of a “bow” (нум), followed by a “tooth” (шүд),
another “bow”, another “tooth” and a final stroke to the left at the end (цацлага). The
sequence of these glyphs (нум-шүд-нум-шүд-цацлага) can be expressed via transliteration or elementary conversion into Latin letters by the formula < g‑v‑g‑v‑e >. However,
this formula does not yet express the alphabetical value of the elements according to the
inner logic of the Mongol script. The нум or < g > in glyphic transliteration is used only in
initial and medial position to notate the letter romanised by g. If the same letter g is to be
placed at the end of a word, a ligature of шилбэ and цацлага is written: in the second and
third example it is the glyph sequence < i‑e > that serves the purpose. When talking about
letters, it should be emphasised that the term is based on an alphabetical understanding
6
of the concept. Pronunciation is something else: different phonemes may well correspond
to the same letter, as can be seen from the Mongol-Cyrillic equation guig ~ хөг (traditional
transcription: kög): in the onset of the syllable, the letter g has a different phonetic value
than in the coda. In the nucleus, there is a “belly” (гэдэс) followed by a “shin” (шилбэ),
which are graphically a sequence of two letters (ui) and phonemically represent an umlaut (ө or in other words ү).
The glyphs титэм, шүд and сүүл transliterated with < v > as well as the glyph with the
Mongolian nickname шилбэ represented with < i > allow for both a vocalic and a consonantal interpretation as regards their alphabetic value. If < v > is used for a consonant, v
is retained, as in the final ligature in guivg where the penultima is part of a digraphic
notation of the velar nasal (traditional transcription: güng). Consonantal function of the
glyph < v > is also manifest in initial position of full words (particles aside) in the appearance of a prosthetic consonant before the vowels a i u ui, for which ālaph is also found in
other Semitic scripts. Phonetically, the sign represents the onset of a vowel at the glottis.
In the coda of a closed syllable, consonantal < v > usually serves to mark a nasal, as in
valdav ~ алтан “golden”. Only if a diacritical dot is visible to the left of the element is the
dental nasal expressed in romanisation, as in vuinav ~ үнэн “truth”. If < v > functions as
a vowel, it is uniformly romanised as a, as in gagae (above) or words like ardani ~ эрдэнэ
“jewel” (lacking prosthetic v), regardless of how the vowel is to be understood phonologically. Final цацлага, following a and romanised by e in the word for “light”, is the
second element in the digraphic spelling ae in vocalic auslaut. The principle of either consonantal (v) or vocalic valency (a) is fully independent of whether, in the case of a vocalic
interpretation, the letter is phonetically realised as back (а) or front vowel (э). This may
take some getting used to, but it corresponds to the orthographic facts. In close analogy,
glyph < i > can also be understood in two ways, either as a vowel (i) or a consonant (j). In
examples 2 and 3, ui is a sequence of two vowels, even if phonetically monophthongs.
As said, < g > is the regular character to write g only in initial or medial position of a word
while in final position we find < i-e > for the letter. While transliteration of glyphs consists
in the description of graphic facts, romanisation involves the notation of letters, whereby
the position in the word is a decisive criterion for determining which letter is intended by
a glyph or a combination of glyphs. To describe these mechanisms is to outline the inner
logic of Written Mongol. It should be clear that this does not and should not imply the
specification of a pronunciation – this is what transcription is supposed to do.
If the elementary sign < g > appears in final or in isolated position (the latter is the case
with the accusative particle after a final consonant of the preceding word), the “bow” is
used to express the vowel i as can be seen at the end of the following examples:
No.   Mongol   Glyphs              Romanisation   Cyrillic   Meaning
4     ᠮᠣᠷᠢ     < m-u-r-g >         muri           морь       horse
5     ᠵᠥᠭᠡᠢ    < i-u-i-g-v-g >     juigai         зөгий      bee
It was mentioned that “shin” is the Mongol continuation of Semitic yodh. Like most of its
ancestors and relatives, it can have both a consonantal and a vocalic function. The rule is
that шилбэ is always a consonantal letter in initial and intervocalic position of non-enclitic words; only in grammatical particles may romanisations such as ijav (reflexive particle)
or ijar (instrumental particle) occur.14 In medial position, < i > can be read as a vowel,
which applies both to singular letter i (between two consonants) and to vowel compounds
or digraphs with i as a second component such as ai, ii or ui. The latter can be observed
in the first syllable of the word for “bee”, where the first and third elements are graphically
identical, but romanised differently: initially by j (corresponding to з in Cyrillic) and
medially by i (part of the digraphic notation of umlaut ө). In final position, “shin” does not
occur; its place is taken in the script by a “bow”, which stands for i in this position.
In connection with these samples, it is worth recalling the clear conceptual difference
between graphic transliteration, alphabetical romanisation and phonetic transcription as
understood here. The sequences < u‑i > and < v‑g > are romanised as ui and ai according
to their alphabetic function, which is dependent on the position of the glyphs in the word.
As letters, ui and ai are sequences consisting of two vowels. Phonetically, ui is a digraph
standing for an umlaut (traditionally ö ~ Cyrillic ө) and ai may be regarded as an original
diphthong that has developed into a long vowel in Khalkha (ai ~ ий).
There is a remarkable parallelism between the glyphs < g > and < b > in terms of their
alphabetic valence and otherwise. Unlike the glyph usually called “bow”, there seems to
be no generally used name for <b>, at least not in the works of Kara and Shagdarsüren
(Шагдарсүрэн) already mentioned. A relaxed view allows one to see a blend of “bow” and
“belly”, in which a нум extending across the full width of the letter is fused with a гэдэс
on the left side. A Mongolian colleague in the library referred to the sign as нумтай гэдэс,
saying that this is what she was taught in school, and I think the term is good enough to
be retained. Anyway, the parallelism between < g > and < b > is that these glyphs have the
alphabetic value of a g and b only in non-final positions. In final position, they express the
vowels i and u:
No.   Mongol   Glyphs              Romanisation   Cyrillic   Meaning
6     ᠪᠠᠪᠠᠢ    < b-v-b-v-g >       babai          баавай     father
7     ᠬᠦᠦ      < g-u-i-b >         guju           хүү        son
8     ᠭᠡᠭᠦᠦ    < g-v-g-u-b >       gaguu          гүү        mare
The examples show < b > and < g > in initial and medial (letters b and g) as well as in final
position (u and i). With regard to guju, an objection could be raised as to whether guiu
should be written instead of guju because of the palatal nature of the word. The sequence
ui would then correspond to the first umlaut in хүү (ui ~ ү).
14 Particles are subject to other rules regarding the orthography of the initial. Prosthetic consonants (v) are
generally not written, cf. vuv ~ он “year” versus uv = genitive particle after final consonant other than v.
In the evidence available to me, it is consistently the case that the trigraphic Mongolian sequence < u‑i‑u > corresponds
to a vowel sequence үү in Cyrillic, as also found in gujusi ~ гүүш “teacher” or
bujur ~ бүүр “group”. I am not aware of any deviating example, at least not in modern
orthography. For the sake of a consistent romanisation, however, a decision must be made
between uiu or uju. I argue for the latter, because this corresponds to the general rule
that < i > between vowels is considered a consonant (as in sajiv ~ сайн “good” or gajit ~
хийд “monastery”).
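The positional rules stated so far can be sketched in a few lines of Python. The sketch is deliberately restricted: it covers the final ligature < i‑e > (letter g), the glyphs < g > and < b > (g and b in non-final position, i and u in final position) and the rule that < i > is read as j in initial and intervocalic position. The glyphs < v > and < e > outside the final ligature are rejected, since their treatment depends on rules not reduced to code here. The function name romanise_sketch and the treatment of a following < i > as a potential vowel are assumptions made for illustration, not part of the published system.

```python
VOWELS = {"i", "u"}   # vowel letters that can arise in this restricted sketch
SHIN = "<i>"          # placeholder for a shin whose value is not yet decided


def romanise_sketch(glyphs):
    """Romanise one word given as a list of glyph labels (restricted sketch)."""
    glyphs = list(glyphs)
    # Step 1: the end of the word decides the value of some glyphs.
    if glyphs[-2:] == ["i", "e"]:
        body, final = glyphs[:-2], ["g"]   # ligature of shin and final stroke
    elif glyphs[-1:] == ["g"]:
        body, final = glyphs[:-1], ["i"]   # final bow stands for the vowel i
    elif glyphs[-1:] == ["b"]:
        body, final = glyphs[:-1], ["u"]   # final belly with bow stands for u
    else:
        body, final = glyphs, []
    if "v" in body or "e" in body:
        raise ValueError("glyphs <v> and <e> are outside the scope of this sketch")
    letters = [SHIN if g == "i" else g for g in body] + final

    # Step 2: a shin is j in initial or intervocalic position, otherwise i.
    result = []
    for pos, letter in enumerate(letters):
        if letter != SHIN:
            result.append(letter)
            continue
        after_vowel = pos > 0 and result[-1] in VOWELS
        nxt = letters[pos + 1] if pos + 1 < len(letters) else None
        before_vowel = nxt in VOWELS or nxt == SHIN
        result.append("j" if pos == 0 or (after_vowel and before_vowel) else "i")
    return "".join(result)


print(romanise_sketch(["m", "u", "r", "g"]))       # muri
print(romanise_sketch(["g", "u", "i", "i", "e"]))  # guig
print(romanise_sketch(["g", "u", "i", "b"]))       # guju
```

The two-step design mirrors the distinction drawn in the text: the input is a pure description of glyphs, and only the position of each glyph in the word determines which letter it is taken to represent.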
In order to adequately assess cases such as guju, it must be taken into account that today,
in addition to the glyph < i >, a character < y > is also used in writing, which differs in that
the stroke of the шилбэ has a small upward bend at the end. The difference between j and
y is found in initial position in examples such as jaruca ~ зарц “servant” versus yaruvggai ~ ерөнхий “general”, in medial position in sajiqav ~ сайхан “beautiful” versus sayiqav ~ саяхан “recent”. It is therefore not surprising to find next to uju the sequence uyu,
for example in buyu ~ буюу “is” or suyul ~ соёл “culture”. These back-vowel examples
can be contrasted with front-vowel ones, for example guiyugu ~ гүйх “to run”.
It can be empirically established that the glyph sequence < u‑i‑u > occurs regularly in palatal words, while in velar words the sequence < u‑y‑u > is used. But this is the evidence
only for today’s orthography. As late as the nineteenth century the glyph < y > is not written in many texts, but uniformly expressed by < i >. We will therefore not only read guju
and bujur (with pattern uju ~ үү), but also buju and sujul throughout (representing
uju ~ ую or оё). Since it is essential that the romanisation be applicable to older orthographic variants as well, there is a case for consistently romanising < u‑i‑u > so that in
intervocalic position the consonant letter j is used for < i > also in palatal words.
It is instructive in this context to look at what can be observed in the glyph sequence
< u‑i‑i > that is comparable. The sequence < u‑i‑i > can appear in both palatal and velar
vowel environments. Examples include gujidav ~ хүйтэн “cold” and jujil ~ зүйл “kind”
on the front side and bujiziqu ~ бойжих “grow” and bujilaqu ~ буйлах “roar (of camels)”
for the back vowels. It seems reasonable to suggest that the Mongol element ji corresponds to short й (also known as и краткое). Viewed in this way, uji is obviously a correct
spelling for the latter two (showing uji ~ ой and уй). In cases like gujidav one would
rather expect a spelling *guijidav, where ui would correspond to the Cyrillic umlaut ү and
ji to following й. But this is not the case. The reason is that Written Mongol does not seem
to permit a sequence of three шилбэ, which such a spelling would amount to. A sequence
< i‑i‑i > never occurs; at least I know of no evidence for it in my lexical records.15
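A constraint of this kind lends itself to a mechanical check over a word list. The following is a minimal sketch with a hypothetical helper has_triple_shin; the two inputs are only the core glyph sequences discussed above, namely attested < u‑i‑i > and the sequence with three шилбэ that a spelling *guijidav would amount to.

```python
def has_triple_shin(glyphs):
    """True if three shin glyphs (label "i") follow each other directly."""
    run = 0
    for glyph in glyphs:
        run = run + 1 if glyph == "i" else 0
        if run == 3:
            return True
    return False


print(has_triple_shin(["u", "i", "i"]))       # False: attested sequence
print(has_triple_shin(["u", "i", "i", "i"]))  # True: what *guijidav would require
```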
The observation indicates the following: the general rule that the letters traditionally
denoted by ö and ü in the first syllable of a word are to be spelt by a combination of гэдэс
and шилбэ does not always apply. The rule is not adhered to if ui is followed by the
sequence ji. In that case *uiji is shortened to uji (implying u ~ ө ү in first-syllable position).
15 In the late 1990s, as part of my library work, I began to compile orthographic checklists of Mongolian
terms appearing in the catalogue or elsewhere, which now contain over 7000 entries in both Cyrillic and
Mongol orthography. From this material is derived my statement about the non-existence of the sequence
< i‑i‑i >. The database is currently being prepared for publication on a website of the State Library,
presumably at the following location, where a number of other online tools are already assembled:
https://crossasia.org/service/crossasia-lab/.
From this I derive confidence that the glyph sequence < i‑u‑i‑g > corresponding to
Cyrillic зүй is correctly rendered alphabetically by juji and not *juii. The romanisation
guju can be considered a casual analogy, which seems acceptable in order to preserve the
general principle of how < i > is romanised in intervocalic position (as consonantal letter j).
If this principle were abandoned, the glyphic sequences < a‑i‑i > and < u‑i‑i > would also
have to be romanised as aii and uii throughout, which would not be an improvement. The
reliability of the description of the glyphs remains the same, since both romanisations,
uju as well as uiu, imply and represent a Mongol sequence to be transliterated as < u‑i‑u >.
In both cases, then, the romanisation is not equivocal.
At this point I will briefly discuss how the problem of spelling has been solved on the excellent website Монгол хэлний их тайлбар толь.16 This primarily Cyrillic site also provides the lemmas in Mongol script based on the existing Unicode standard. With the help
of tools like “What Unicode character is this?”17 it can be determined which codes were
actually used when entering Mongol characters. The sequence < u‑i‑u > was consistently
realised by U+1826 (UE) inserted twice, which corresponds to üü in traditional notation:
No.   Cyrillic   Romanisation   Mongol    Traditional   Unicode sequence
9     хүү        guju           ᠬᠦᠦ       üü            QA · UE · UE
10    гүүш       gujusi         ᠭᠦᠦᠰᠢ     üü            GA · UE · UE · SA · I
11    бүүр       bujur          ᠪᠦᠦᠷ      üü            BA · UE · UE · RA
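The same kind of inspection can be done locally with Python's unicodedata module instead of an online tool. In this minimal sketch the helper names build and spell_out are assumptions; the sample string is constructed from the character names listed for example 9 rather than copied from the website.

```python
import unicodedata

PREFIX = "MONGOLIAN LETTER "


def build(labels):
    """Build a string from Unicode letter labels such as QA, UE, I."""
    return "".join(unicodedata.lookup(PREFIX + label) for label in labels)


def spell_out(text):
    """List the code point and short Unicode label of every character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch).replace(PREFIX, "")) for ch in text]


sample = build(["QA", "UE", "UE"])  # the encoding reported for хүү
for code, label in spell_out(sample):
    print(code, label)
```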
Unicode letter QA (U+182C) may surprise traditionally socialised readers, but can be explained by the fact that Unicode ignores the difference between velar q and palatal g in
the encoding. Which character will actually appear in the typeface depends on the vowel
environment, which must be either velar (traditionally a o u) or palatal (e ö ü). So basically, what is encoded under label QA is the fact that the assumed sound corresponds to
what is traditionally transcribed as q or k and appears as х in Cyrillic. Analogous reasoning
applies to Unicode letter GA (U+182D), which likewise does not really encode a Mongol character, but a pronunciation for qh or g, which is traditionally transcribed as γ or g and corresponds to Cyrillic г. Here it becomes particularly apparent that Unicode is not oriented
towards alphabetic reality, but towards articulation, which can only be derived from the
script to a limited extent. The two velar signs (q and qh) and the one palatal letter (g) are
encoded as if there were no distinction between velar and palatal. U+182C encodes a
voiceless spirant (QA ~ х) and U+182D a voiced stop (GA ~ г).
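The dependence of QA and GA on the vowel environment can be made explicit in a few lines. This is only a sketch under simplifying assumptions: the word class is taken from the first decisive vowel, the letter I is treated as neutral, and loanwords are ignored. The names harmony_class and traditional_reading are hypothetical.

```python
# Vowel letters by class, following the division given in the text
# (velar: a o u; palatal: e ö ü), as Unicode letters A, O, U and E, OE, UE.
VELAR_VOWELS = {"\u1820", "\u1823", "\u1824"}
PALATAL_VOWELS = {"\u1821", "\u1825", "\u1826"}

QA, GA = "\u182C", "\u182D"

# Traditional transcription of the two encoded letters in each environment.
TRADITIONAL = {
    (QA, "velar"): "q", (QA, "palatal"): "k",
    (GA, "velar"): "γ", (GA, "palatal"): "g",
}

def harmony_class(word):
    """Return 'velar', 'palatal', or None if no decisive vowel is found."""
    for ch in word:
        if ch in VELAR_VOWELS:
            return "velar"
        if ch in PALATAL_VOWELS:
            return "palatal"
    return None

def traditional_reading(word):
    """List the traditional transcription of every QA or GA in the word."""
    env = harmony_class(word)
    if env is None:
        return []
    return [TRADITIONAL[(ch, env)] for ch in word if ch in (QA, GA)]

print(traditional_reading("\u182C\u1826\u1826"))  # QA UE UE (хүү)
print(traditional_reading("\u182A\u1824\u1836\u1822\u182F\u1820\u182C\u1824"))  # буйлах
```

The same code point U+182C thus yields k in the first word and q in the second, which is the behaviour the paragraph above describes.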
16 https://mongoltoli.mn/.
17 https://www.babelstone.co.uk/Unicode/whatisit.html.
Examples 12–16:
Cyrillic   Romanisation   Mongol script   Traditional   Unicode sequence
хүйтэн     gujidav        ᠬᠦᠢᠲᠡᠨ          üi            QA · UE · I · TA · E · NA
зүйл       jujil          ᠵᠦᠢᠯ            üi            JA · UE · I · LA
зүй        juji           ᠵᠦᠢ             üi            JA · UE · I
бойжих     bujiziqu       ᠪᠣᠶᠢᠵᠢᠬᠤ        oyi           BA · O · YA · I · JA · I · QA · U
буйлах     bujilaqu       ᠪᠤᠶᠢᠯᠠᠬᠤ        uyi           BA · U · YA · I · LA · A · QA · U
For the glyph sequence < u‑i‑i > romanised here as uji, the website gives examples of
U+1826 (UE) followed by U+1822 (I) if a word is front-vocalic. If a word is velar in character, the letters uji are written as trigraphs, namely by U+1823 (O) or U+1824 (U) in the
first, U+1836 (YA) in the second and U+1822 (I) in the third place. The sequence < u‑i‑i >
of the Mongol script can thus be coded in Unicode in no fewer than three different
forms. This is an orderly array of readings, but no clear encoding of letters.
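The three spellings named above can be listed and detected mechanically. The following sketch is limited to exactly those three sequences and makes no claim about other positions or words; the names UII_VARIANTS and uii_variant are hypothetical.

```python
# The three Unicode spellings of the glyph sequence < u-i-i > named in the text.
UII_VARIANTS = {
    "UE · I": "\u1826\u1822",
    "O · YA · I": "\u1823\u1836\u1822",
    "U · YA · I": "\u1824\u1836\u1822",
}

def uii_variant(word):
    """Return the label of the first listed variant found in word, else None."""
    for label, sequence in UII_VARIANTS.items():
        if sequence in word:
            return label
    return None

# Encodings as given in the examples above.
examples = {
    "хүйтэн": "\u182C\u1826\u1822\u1832\u1821\u1828",              # QA UE I TA E NA
    "бойжих": "\u182A\u1823\u1836\u1822\u1835\u1822\u182C\u1824",  # BA O YA I JA I QA U
    "буйлах": "\u182A\u1824\u1836\u1822\u182F\u1820\u182C\u1824",  # BA U YA I LA A QA U
}
for cyrillic, encoded in examples.items():
    print(cyrillic, "->", uii_variant(encoded))
```

A search for the glyph sequence therefore has to try all three code sequences, although the writer sees only one spelling on the page.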
The remarkable parallelism between the glyphs < g > and < b > mentioned above is also
evident in other respects. As described, < b > in the final position has the alphabetic
valency of u. Now it is precisely the letters g and b where this rule does not apply. Instead
of writing the final syllable < *b‑b > or < *g‑b >, the glyph < u > is retained here and not
supplanted:
Examples 17–18:
Mongol script   Transliteration        Romanisation   Cyrillic   Meaning
ᠪᠣᠣᠪᠣ           < b-u-u-b-u >          buubu          боов       cake
ᠵᠢᠳᠬᠦᠬᠦ         < i-i-u-v-g-u-g-u >    jitgugu        зүтгэх     to pull
At this point, I would like to conclude these considerations; a detailed overall presentation
of the Mongol script would go beyond the scope of this article. The aim was to illustrate,
through a few simple examples, ways of approaching the script without being determined
from the outset by transcriptions, and to show where Unicode becomes problematic.
The following are some remarks on words that are spelled the same but can be interpreted phonetically and semantically differently. The fact that homographs are not necessarily homophones is anything but a Mongolian peculiarity. In English, one may think of
the bow that is pronounced |bou| when used to shoot arrows. The same noun with the
same spelling will sound like |bau| when it is supposed to mean an instance of bending
the head or body, for example in greeting. Mongolian also has many words that are homographs in the Mongol script but can be understood and pronounced differently. The Cyrillic standard, which is lexicologically well established, provides us with a good basis for
contrastive description of such cases:
Examples 19–22:
Mongol script   Transliteration      Romanisation   Cyrillic   Meaning
ᠪᠥᠬᠡ            < b-u-i-g-v-e >      buigae         бөх        wrestler
ᠪᠥᠭᠡ            < b-u-i-g-v-e >      buigae         бөө        shaman
ᠳᠠᠶᠢᠨ           < t-v-i-i-v >        tajiv          дайн       war
ᠲᠡᠶᠢᠨ           < t-v-i-i-v >        tajiv          тийн       thus
It is at the reader’s discretion whether he or she will understand buigae as “wrestler” and
pronounce it as бөх, or whether a “shaman” is assumed, who will be called бөө. In traditional notation, the two words are distinguished by using two different transcriptions,
böke “wrestler” and böge “shaman”. In Unicode notation, the first word must be typed with
QA (U+182C) in the middle of the word (~ k), the second with GA (U+182D) representing
the voiced stop (~ g).18 A fundamental problem with these inputs tied to interpretations
is that they do capture the real picture, but at the same time limit the range of what is
potentially meant or alluded to by introducing a specification that has no basis in the
spelling itself. Someone writing a prosaic text in Mongol will know whether he or she has
бөх or бөө in mind. Unicode gives the possibility to encode this subcutaneously, so to
speak, if not openly recognisable in the typeface (to be detected only by software tools).
The fact that Unicode requires the user to make such specifications can be a disadvantage.
Why deprive a writer of the opportunity to use a stylistic device known in Sanskrit as śleṣa,
a poetological term signifying “pun, paronomasia, double entendre, susceptibility of a
word or sentence to yield two or more interpretations (regarded as a figure of speech and
very commonly used by poets)”19? It could be, for example, that there is an intention in a
poem that buigae may be understood and pronounced in the sense of both “wrestler” and
“shaman”. Forcing an unambiguous decision here has a totalitarian feel to it. If Written
Mongol gives the freedom to write both words the same, this is more than a licence, it is
the writer’s liberty.
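How little separates the two encodings can be shown by comparing them code by code. This is a minimal sketch using Python's standard unicodedata module. Only the middle letter (QA versus GA) is stated in the text; the remaining letters BA, OE and E are an assumption about how the two words are keyed in. The function name encoding_differences is hypothetical.

```python
import unicodedata

BOEKE = "\u182A\u1825\u182C\u1821"  # BA OE QA E, read as бөх "wrestler"
BOEGE = "\u182A\u1825\u182D\u1821"  # BA OE GA E, read as бөө "shaman"

def encoding_differences(first, second):
    """Return (index, name_in_first, name_in_second) for each differing position."""
    if len(first) != len(second):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (i, unicodedata.name(a), unicodedata.name(b))
        for i, (a, b) in enumerate(zip(first, second))
        if a != b
    ]

print(encoding_differences(BOEKE, BOEGE))
```

The single differing position is the interpretive decision that the encoding forces on the writer, and that a pun would need to leave open.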
18 See Монгол хэлний их тайлбар толь, s.v. бөх II (https://mongoltoli.mn/dictionary/detail/16458) and бөө (https://mongoltoli.mn/dictionary/detail/16177).
19 Apte, The practical Sanskrit-English dictionary, p. 1579, s.v. श्लेषः
Since tajiv ~ дайн and tajiv ~ тийн belong to different word categories – the former a
noun, the latter a pronominal adverb – more sophisticated spell-checker programs should
be able to offer the “correct” encoding as an automatic suggestion during input. From the
point of view of Mongol script, however, different encodings of the two homographs seem
redundant, as they are not associated with any visible disparity in the typeface.
Superfluous things have the property of either disappearing or losing their relevance.
There is an aspect that is perhaps not entirely unimportant for the practice of writing: for
the first two letters of tajiv ~ дайн “war” and tajiv ~ тийн “thus”, the person writing has
to press different keys on the keyboard: DA (U+1833) and A (U+1820) for the first word
(traditionally dayin), but TA (U+1832) and E (U+1821) for the second (teyin). The big
problem is that you cannot see the difference on the screen or on a paper printout. If you
cannot see a difference, you cannot tell if you have typed your text correctly. Proofreading
is impossible unless additional technical tools are used to identify characters by their code.
If spelling mistakes cannot be seen and recognised straight away, this is an invitation not
to worry about making errors in the first place and not to try to correct them afterwards.
Whether “war” or “thus” has been encoded correctly may not matter when reading, but
with automatic indexing the two spellings are sorted into different places,
since the alphabetical arrangement is based on the codes in Unicode and not on the external appearance of the letters.
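The effect on indexing can be illustrated with plain code point ordering, which is what Python applies to strings by default. This is a sketch, not a description of any particular catalogue system. The first two letters of each word are those named in the text; the remaining letters YA, I and NA are an assumption about the rest of the encoding.

```python
DAYIN = "\u1833\u1820\u1836\u1822\u1828"  # DA A YA I NA, дайн "war"
TEYIN = "\u1832\u1821\u1836\u1822\u1828"  # TA E YA I NA, тийн "thus"

def code_points(word):
    """Format a word as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in word]

# The two homographs are different strings for the machine.
print(DAYIN == TEYIN)

# Sorting by code point separates them in an index.
index = sorted({"war": DAYIN, "thus": TEYIN}.items(), key=lambda kv: kv[1])
for gloss, word in index:
    print(gloss, code_points(word))
```

Because TA (U+1832) precedes DA (U+1833), the word typed as teyin is filed before the word typed as dayin, although both spellings are homographs in the script.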
Homographs also exist in Cyrillic. Insofar as the Cyrillic standard of the state language of
Mongolia is regarded as a reasonably reliable transcription of modern pronunciation, one
can also speak of homophones with regard to the Mongol script, whereby the homophony
of differently spelled words refers primarily only to Khalkha, not necessarily to other
Mongolic tongues.
Examples 23–26:
Mongol script   Transliteration      Romanisation   Cyrillic   Meaning
ᠪᠡᠬᠡ            < b-v-g-v-e >        bagae          бэх        ink
ᠪᠡᠬᠢ            < b-v-g-i-j >        bagij          бэх        strong
ᠲᠡᠦᠬᠡ           < t-v-u-g-v-e >      taugae         түүх       history
ᠲᠡᠭᠦᠬᠦ          < t-v-g-u-g-u >      tagugu         түүх       to gather
The words for “ink” and “strong”, homographs in Cyrillic (бэх), show a different orthography at the end in Mongol: the first is written with шүд and цацлага (romanised ae),
the second with шилбэ and дэгээ (ij). The glyph called “hook” in Mongolian, transliterated by < j > and always romanised by j, can in a sense be considered a final variant of
шилбэ, which itself does not appear in final position. This is why there can be no possibility of confusion between the glyph < i > on the one hand and < j > on the other, even
though < i > is romanised by j in initial and intervocalic position: in final position, romanised letter j can only represent a дэгээ under the logic of the script. The examples show
yet another peculiarity of the letter g: vowels following are notated differently at the end
than is the case with most other preceding letters. The vowel is not written < *g‑v > or
< *g‑g >, but < g‑v‑e > and < g‑i‑j >, equivalent to gae and gij in the romanisation (which is
not *ga or *gi). At this point, the parallelism between g and b can be pointed out yet again.
With b, too, the vowel is not written < *b‑v > and < *b‑g >, but expressed digraphically by
< b‑v‑e > and < b‑i‑j >, as can be seen in common words like bae ~ ба “and” and bij ~ би
“I”. Another pair of Cyrillic homographs is түүх: the noun “history”, which philologically
trained historians will probably pronounce as teüke, is distinct from the verb tegükü “to gather”.
From a Cyrillic perspective, Written Mongol appears to be graphemically overdifferentiated, especially in regard to the number of syllables, but phonemically underdifferentiated due to the lack of precision in notation especially of vowels. For someone who uses
the script, Mongol is certainly more cumbersome and complicated than Cyrillic, but it
seems to make reading easier for those familiar with the orthography. Historically, it was
probably precisely the tendency towards a non-specific correspondence between sound
and letter that ensured that Written Mongol was in use across wide linguistic and dialectal
boundaries and could stand the test of time as the common script of the Mongols. Written
Mongol orthography does not force anyone to adopt a particular pronunciation. The writing is loose and tight at the same time: it leaves the reader free to articulate what is written
in his or her own tongue, and on the other hand is quite precise when it comes to using
orthography to explicitly note what is meant.
There is a striking similarity between traditional transcription mostly used in Mongolian
studies and the modern Cyrillic script with its refined orthographic regularities. In both
systems, there is an evident effort to specify the phonemes as precisely and meticulously
as possible. With the Cyrillic script, this is a good thing, since the script is conceived and
intended as a binding standard for the official language of a country to be learned at school
and expected to be adhered to later. In Written Mongol, the case is somewhat different, as
the script obviously avoids noting phonemes too precisely. In genuine Mongolian words
(foreign words aside), there are only three letters with vowel intention (a i u), a few
digraphs (ai au ii ui uu), trigraphic sequences (aji uji uju), and some special notations
that appear only in final position (e ae ij and a few more). It should have become clear
that the author of this paper considers it a priority to have a reliable transliteration of the
elementary glyphs for the Mongol script, as well as an alphabetic romanisation with
letters that accurately express the inherent logic of Written Mongol – no more, but no less.
Transcription in the sense of traditional conventions, which have developed over a period
of almost two hundred years, undeniably has its merits, which are in no way questioned
here. But it is not a good idea to do transliteration by means of transcription when the
focus is on the script.
The encoding of Written Mongol in Unicode fell into exactly this trap – in the midst of the
rapid technical innovations that took place in those eventful nineties of the last century.
Basically, it was not the simple elements of the script or their alphabetic correlates that
were encoded, but rather the traditional interpretation of these elements. This is particularly clear in the case of the vowels. The practical consequences are serious and not always
pleasant. Now it is definitely impossible to undo or revoke a Unicode encoding once it has
been authorised. Therefore, more flexible options should be discussed on how to supplement the existing specifications with fresh solutions to a convoluted situation. A step
towards something more sustainable could be, for example, to allow the insertion of a
гэдэс in a document even if one does not know (and will never know) whether the vowel
is meant in terms of O or U. This amounts to providing, here and in similar cases, an additional encoding in which a character is not locked into one or other phonetic interpretation, but capable of representing both in an overarching representation. Furthermore, one
should be able to distinguish the letters g and qh and enter them directly, regardless of
the nature of adjacent vowels. This requirement amounts to complementing the letters
encoded as QA (~ х) and GA (~ г) in such a way that q qh can be disentangled from their
counterpart g along the evidence of the letters themselves.
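Until such an additional encoding exists, software can at least emulate an undetermined гэдэс when comparing or searching text. The following is a hypothetical sketch, not a Unicode mechanism: the placeholder code point and the names glyph_key and same_spelling are arbitrary choices made for illustration, and only the O/U case named above is covered.

```python
# O and U share the glyph called гэдэс, so both are folded onto one
# placeholder before comparison. The placeholder is a private-use code
# point chosen only for this sketch.
O, U = "\u1823", "\u1824"
UNDETERMINED = "\uE000"

FOLD = str.maketrans({O: UNDETERMINED, U: UNDETERMINED})

def glyph_key(word):
    """Return a comparison key in which O and U are no longer distinguished."""
    return word.translate(FOLD)

def same_spelling(first, second):
    """True if two encodings differ at most in the choice between O and U."""
    return glyph_key(first) == glyph_key(second)

print(same_spelling("\u182A\u1823", "\u182A\u1824"))  # BA O versus BA U
print(same_spelling("\u182A\u1823", "\u182A\u1820"))  # BA O versus BA A
```

Folding of this kind only hides the interpretive choice at search time; the stored text still carries one reading, which is why an encoding at the level of the letters themselves remains the more sustainable option.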
Works cited
APTE – The practical Sanskrit-English dictionary / Vaman Shivaram Apte. Kyoto 1978
[reprinted from the revised & enlarged edition, Poona 1957]
BALK – Sieben Strophen des Udānavarga in mongolischer Version / Michael Balk. In: Per
Urales ad Orientem : iter polyphonicum multilingue ; festskrift tillägnad Juha Janhunen på
hans sextioårsdag den 12 februari 2012 / edited by Tiina Hyytiäinen ... Helsinki 2012
(Suomalais-Ugrilaisen Seuran toimituksia ; 264), pp. 25-37
BALK & JANHUNEN – A new approach to the Romanization of Written Mongol / Michael Balk
& Juha Janhunen. In: Writing in the Altaic world / edited by Juha Janhunen and Volker
Rybatzki. Helsinki 1999 (Studia orientalia ; 87), pp. 17-27
JANHUNEN – The Mongolic languages / edited by Juha Janhunen. London 2003 (Routledge
language family series ; 5)
KARA – Books of the Mongolian nomads : more than eight centuries of writing Mongolian / György Kara ; translated from the Russian by John R. Krueger. Bloomington 2005
(Indiana University Uralic and Altaic series ; volume 171)
POPPE – Grammar of written Mongolian / Nicholas Poppe. Wiesbaden 1954 [5th unrevised printing 2006] (Porta linguarum orientalium ; Neue Serie ; Band 1)
Rapport de la commission de transcription / Xme Congrès international des orientalistes.
Session de Genève. Geneva, Switzerland 1894
SCHMIDT – Grammatik der mongolischen Sprache / verfasst von I. J. Schmidt. St. Petersburg 1831
ШАГДАРСҮРЭН – Монголчуудын үсэг бичигийн товчоон : үсэгзүйн судалгаа = Study of
Mongolian scripts / Цэвэлийн Шагдарсүрэн. Улаанбаатар 2001 (Bibliotheca mongolica ;
monograph 1)
Websites
http://stabikat.de/
http://www.unicode.org/versions/Unicode3.0.0/
https://crossasia.org/service/crossasia-lab/
https://mongoltoli.mn/
https://mongoltoli.mn/dictionary/detail/16177
https://mongoltoli.mn/dictionary/detail/16458
https://staatsbibliothek-berlin.de/die-staatsbibliothek/abteilungen/ostasien/recherche-und-ressourcen/zentralasiatischer-katalog/transkription-mongolisch
https://www.babelstone.co.uk/Unicode/whatisit.html
https://www.unicode.org/charts/PDF/U1800.pdf