Romanization of Khmer


Checked for validity and accuracy – October 2017

ROMANIZATION OF KHMER (CAMBODIAN)
BGN/PCGN 1972 Agreement

This system is based on the modified 1959 Service Géographique Khmère (SGK)
system. A small number of differences between the two systems are listed
at the end of this document.

While most Cambodian toponyms consist of Khmer lexical items, word division is
not generally indicated, and Khmer diacritical marks are often omitted. Reference
sources should be consulted in cases of uncertainty.

CONSONANT CHARACTERS AND SUBSCRIPT CONSONANT FORMS
When a Khmer graphic cluster contains two consecutive consonants, the second
is generally written below the base character in a special subscript form, sometimes
called a “foot”. Most base consonant characters have a corresponding subscript form.
The list below shows the consonant characters and, in the right-hand columns, the form
of each consonant when it appears as a “foot”.

     Consonant Characters    Consonant “Feet”        Romanization
     â series   ô series     â series   ô series
1    ក          គ            ◌្ក         ◌្គ          k
2    ខ          ឃ            ◌្ខ         ◌្ឃ          kh
3    -          ង            -          ◌្ង          ng
4    ច          ជ            ◌្ច         ◌្ជ          ch
5    ឆ          ឈ            ◌្ឆ         ◌្ឈ          chh
6    -          ញ            -          ◌្ញ          nh
7    ដ          ឌ            ◌្ដ         ◌្ឌ          d
8    ឋ          ឍ            ◌្ឋ         ◌្ឍ          th
9    ថ          ធ            ◌្ថ         ◌្ធ          th
10   ណ          ន            ◌្ណ         ◌្ន          n
11   ត          ទ            ◌្ត         ◌្ទ          t
12   ប          -            ◌្ប         -           b, p (Note 3)
13   -          ព            -          ◌្ព          p
14   ផ          ភ            ◌្ផ         ◌្ភ          ph
15   -          ម            -          ◌្ម          m
16   -          យ            -          ◌្យ          y
17   -          រ            -          ◌្រ          r
18   ឡ          ល            -          ◌្ល          l
19   -          វ            -          ◌្វ          v
20   ស          -            ◌្ស         -           s
21   ហ          -            ◌្ហ         -           h
22   អ          -            ◌្អ         -           ’ (Note 4)

There is no subscript form for the character ឡ, which is romanized l. The
subscript forms that may stand for either ដ or ត, and for either ធ or ឋ, usually
represent the characters ដ (d) and ធ (th), respectively, rather than ត (t) and
ឋ (th): ក្ដី → kdei, កន្ធាយ → kânthéay, but កន្ត្រប់ → kântráb.
[This last example employs two subscript characters on a single base consonant
character. In such cases, the base consonant character is romanized first, followed by
the character below the base, and then the character to the side.] Consultation of a
reference source may sometimes be necessary to establish which consonant character is
represented by the subscript form.
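
The ordering rule above can be sketched in Python. This is only an illustration under stated assumptions, not part of the BGN/PCGN agreement: it assumes Unicode-encoded Khmer text in which each subscript form is stored as U+17D2 (KHMER SIGN COENG) followed by the consonant, with the foot below the base stored before the foot to the side. The names COENG, CONSONANTS, split_cluster and romanize_cluster_consonants are hypothetical. The sketch romanizes consonants only, maps ប to b (Note 3 covers p), and takes each encoded subscript at face value, so it does not resolve the d/t and th ambiguity described above.

```python
COENG = "\u17d2"  # KHMER SIGN COENG: the following consonant is a subscript ("foot")

# Consonant romanizations taken from the consonant table (assumption: ប as b only).
CONSONANTS = {
    "ក": "k", "គ": "k", "ខ": "kh", "ឃ": "kh", "ង": "ng",
    "ច": "ch", "ជ": "ch", "ឆ": "chh", "ឈ": "chh", "ញ": "nh",
    "ដ": "d", "ឌ": "d", "ឋ": "th", "ឍ": "th", "ថ": "th", "ធ": "th",
    "ណ": "n", "ន": "n", "ត": "t", "ទ": "t", "ប": "b", "ព": "p",
    "ផ": "ph", "ភ": "ph", "ម": "m", "យ": "y", "រ": "r", "ឡ": "l",
    "ល": "l", "វ": "v", "ស": "s", "ហ": "h", "អ": "’",
}


def split_cluster(cluster):
    """Split a consonant cluster into (base, [feet]) at each COENG sign."""
    parts = cluster.split(COENG)
    return parts[0], parts[1:]


def romanize_cluster_consonants(cluster):
    """Romanize the base first, then each foot in stored order."""
    base, feet = split_cluster(cluster)
    return "".join(CONSONANTS[c] for c in [base, *feet])


# The cluster from kântráb: base ន with feet ត (below) and រ (to the side).
print(romanize_cluster_consonants("ន្ត្រ"))  # ntr
```

Vowels, diacritical marks and the series rules are deliberately left out; a full romanizer would need the vowel tables and notes as well.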
The subscript form usually determines the series of the vocalic nucleus that
follows: e.g. ខ្ពង → khpông and ល្អ → l’â. The base consonant character determines the
vocalic series only if the subscript form is a “weak” consonant (either a nasal consonant
character (ង, ញ, ណ, ន, ម → ng, nh, n, n, m) or one of the following consonant
characters យ, រ, ល, វ → y, r, l, v), and the base consonant character is not one of these
“weak” consonants: e.g. ថ្ម → thmâ.
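
The series rule in the paragraph above can be expressed as a short Python sketch. It is illustrative only: the function names and the A_SERIES, O_SERIES and WEAK sets are hypothetical, the series membership follows the consonant table, only a single foot is handled, and the series-shifting diacritical marks of Note 2 and the special treatment of ប in Note 3 are ignored.

```python
# Series membership as given in the consonant table (assumption of this sketch).
A_SERIES = set("កខចឆដឋណតថបផសហឡអ")
O_SERIES = set("គឃងជឈញឌឍនទធពភមយរលវ")
# "Weak" consonants: the nasals plus y, r, l, v.
WEAK = set("ងញណនមយរលវ")


def governing_consonant(base, foot=None):
    """Return the consonant whose series governs the following vowel."""
    if foot is None:
        return base
    # The base governs only when the foot is weak and the base is not.
    if foot in WEAK and base not in WEAK:
        return base
    return foot


def vowel_series(base, foot=None):
    """Return "â" or "ô" for a base consonant with an optional single foot."""
    consonant = governing_consonant(base, foot)
    if consonant in A_SERIES:
        return "â"
    if consonant in O_SERIES:
        return "ô"
    raise ValueError(f"not a consonant from the table: {consonant!r}")


print(vowel_series("ខ", "ព"))  # ô, as in khpông
print(vowel_series("ល", "អ"))  # â, as in l’â
print(vowel_series("ថ", "ម"))  # â, as in thmâ
```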

VOWEL CHARACTERS
The Roman-script vowel letters in the â series columns follow a romanized
syllable-initial consonant letter or consonant letter plus subscript form (represented by
◌) of that series: ក → kâ, ក្រា → kra. The Roman-script vowel letters in the ô series
columns follow a romanized syllable-initial ◌ of that series: គ → kô, គ្រា → kréa. Some
vowel characters are not differentiated as to series and may, therefore, follow a
romanized syllable-initial ◌ of either series: កែ → kê, គែ → kê, ក្រែ → krê. A Khmer ◌ in
syllable-final position, not accompanied by a vowel character or by ◌៍, should generally
be romanized without a vowel letter following: កក → kâk, អង្គ → ’ângk.


     Independent         Dependent Characters
     Characters          Normal                         Shortened
                                â series  ô series              â series     ô series
1    -        -          ◌      â         ô             ◌◌់      á            ó
2    -        -          ◌ា     a         éa            ◌ា◌់, ◌័◌  ă (Note 5)   eă, oă (Note 5)
3    ឥ        ĕ          ◌ិ     ĕ         ĭ             -       -            -
4    ឦ        ei         ◌ី     ei        i             -       -            -
5    -        -          ◌ឹ     œ̆         œ̆             -       -            -
6    -        -          ◌ឺ     œ         œ             -       -            -
7    ឧ        ŏ, ŭ (Note 9)  ◌ុ  ŏ         ŭ             -       -            -
8    -        -          ◌ូ     o         u             -       -            -
9    -        -          ◌ួ     uŏ        uŏ            -       -            -
10   -        -          ◌ើ     aeu       eu            -       -            -
11   -        -          ◌ឿ     œă        œă            -       -            -
12   -        -          ◌ៀ     iĕ        iĕ            -       -            -
13   -        -          ◌េ     é         é             -       -            -
14   ឯ        ê          ◌ែ     ê         ê             -       -            -
15   ឰ        ai         ◌ៃ     ai        ey            -       -            -
16   ឱ (ឲ)    aô         ◌ោ     aô        oŭ            -       -            -
17   ឳ        au         ◌ៅ     au        ŏu            -       -            -
18   ឪ        âu         -      -         -             -       -            -
19   ឫ        rœ̆         -      -         -             -       -            -
20   ឬ        rœ         -      -         -             -       -            -
21   ឭ        lœ̆         -      -         -             -       -            -
22   ឮ        lœ         -      -         -             -       -            -

VOWEL CHARACTERS WITH ANUSVARA OR VISARGA

            â series  ô series                â series  ô series
1   ◌ុំ      om        ŭm          5   ◌ុះ     ŏh        ŭh
2   ◌ំ       âm        um          6   ◌េះ     éh        éh
3   ◌ាំ      ăm        ŏâm         7   ◌ោះ     aôh       ŏăh
4   ◌ះ       ăh        eăh         8   ◌ាំង    ăng       eăng

DIACRITICAL MARKS
1   ◌៊ (Notes 2, 3)      4   ◌៌ (Note 6)
2   ◌៉ (Notes 2, 3)      5   ◌៏ (Note 7)
3   ◌់ (Note 5)          6   ◌៍ (Note 8)

NUMERALS

០ ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩

0 1 2 3 4 5 6 7 8 9
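
The numeral correspondence above is a one-to-one character mapping, which can be sketched with Python's built-in str.maketrans and str.translate. The names KHMER_DIGITS, TO_ARABIC and romanize_numerals are hypothetical, and the snippet covers digits only.

```python
# Khmer digits U+17E0..U+17E9 in order, mapped to 0..9 as in the table above.
KHMER_DIGITS = "០១២៣៤៥៦៧៨៩"
TO_ARABIC = str.maketrans(KHMER_DIGITS, "0123456789")


def romanize_numerals(text):
    """Replace every Khmer digit in text; all other characters are left as-is."""
    return text.translate(TO_ARABIC)


print(romanize_numerals("១៩៧២"))  # 1972
```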

NOTES

1 The symbol ◌ represents any Khmer consonant character or consonant
character plus “foot”. The symbol → means “is romanized”.
2 The Khmer diacritical mark ◌៊ written above ◌ of the â series, except ប (b, p)
and បា (ba), changes it to the ô series: ហ៊ាង → héang (see also Note 3). The
diacritical mark ◌៉ written above ◌ of the ô series changes it to the â series:
ញ៉ង → nhâng. (These marks are frequently omitted in Khmer writing,
particularly in words of Indic provenance.)
3 The combination ប plus ា is written បា → ba. The latter character is a
graphic device designed to prevent confusion with ហ → hâ. The characters
ប and បា with the diacritical mark ◌៉ are romanized as p in the â series,
rather than as b in the ô series: ប៉ង → pâng. The characters ប and បា when
accompanied by a subscript form are also romanized as p in the â series,
although the Khmer diacritical mark is generally omitted: ប្លែង → plêng, ប្អ →
p’â, ប្រាប់ → prăb.

4 The â series consonant character អ is romanized by means of an apostrophe
(’): ក្អែក → k’êk, ចង្អៀត → châng’iĕt, រអិល → rô’ĕl, អ្វី → ’vei. In word-initial
position before a vowel letter, however, the apostrophe is optional: អាង
→ ’ang or ang.
5 The diacritical mark ◌់ appears only in two combinations: ◌◌់ (example: បត់
→ bát) and ◌ា◌់. The diacritical mark ◌័ appears only in the
combination ◌័◌. In the â series, both ◌ា◌់ and ◌័◌ are romanized ă: ចាក់ →
chăk, ច័ក → chăk. In the ô series, both are romanized eă when followed by k,
ng, or h; otherwise they are romanized oă: រពាក់ → rôpeăk, មាត់ → moăt, វ័ង្ក
→ veăngk, ភ័ព្វ → phoăpv.
6 The combination ◌៌ is romanized rC, where C represents any romanized
Khmer consonant character: ធម៌ → thôrm.

7 The diacritical mark ◌៏ in syllable-initial position should not be romanized: ស៏
→ sâ, ស៏សស → sâsâs. In syllable-final position ◌៏ indicates that the
consonant is vowelled, i.e., followed by â in the â series or by ô in the ô
series: តំណ៏ → tâmnâ, ពម៏ → pômô.

8 The diacritical mark ◌៍, which appears above characters that are not
pronounced, should not be romanized: បុណ្យ → bŏny, ពោធិ → poŭthĭ, ភូមិ
→ phumĭ.
9 The independent character ឧ is romanized ŏ or ŭ. Consult a reference
source in case of uncertainty.
10 The following letter-diacritic combinations, shown with their Unicode encodings,
are used in addition to the unmodified letters of the basic Roman script:
â (U+00E2); Â (U+00C2)    ă (U+0103); Ă (U+0102)    á (U+00E1); Á (U+00C1)
ê (U+00EA); Ê (U+00CA)    ĕ (U+0115); Ĕ (U+0114)    é (U+00E9); É (U+00C9)
ĭ (U+012D); Ĭ (U+012C)    ô (U+00F4); Ô (U+00D4)    ó (U+00F3); Ó (U+00D3)
ŏ (U+014F); Ŏ (U+014E)    œ (U+0153); Œ (U+0152)    œ̆ (U+0153+0306); Œ̆ (U+0152+0306)
ŭ (U+016D); Ŭ (U+016C)

11 The Romanization columns show only lowercase forms but, when romanizing,
uppercase and lowercase Roman letters as appropriate should be used.
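
The inventory in Note 10 gives œ̆ and Œ̆ as two code points each, while the other letters have a single code point. The Python sketch below uses the standard unicodedata module to show the practical consequence: NFC normalization composes a + U+0306 into ă (U+0103) but leaves œ + U+0306 as a sequence, since Unicode has no precomposed œ with breve. The helper name nfc_codepoints is hypothetical.

```python
import unicodedata


def nfc_codepoints(text):
    """Return the code points of text after NFC normalization, as U+XXXX strings."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


print(nfc_codepoints("a\u0306"))       # ['U+0103']
print(nfc_codepoints("\u0153\u0306"))  # ['U+0153', 'U+0306']
```

Software that compares or searches romanized names may therefore want to normalize both sides to the same form before matching.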

The BGN/PCGN system differs from the 1959 Service Géographique Khmère (SGK)
system in the following respects. In the BGN/PCGN system:

1. The characters ឫ, ឬ, ឭ, ឮ are romanized rœ̆, rœ, lœ̆, lœ, respectively.

2. The graphic cluster ហ្វ is romanized hv.

3. The “foot” ◌្ត is romanized d or t.

4. The graphic combinations ◌ា◌់ and ◌័◌ are romanized eă when followed by k,
ng, or h; otherwise, they are romanized oă.
5. The independent character ឧ is romanized ŏ or ŭ.
6. Diacritical marks and numerals have been added.

7. Notes, which are essential to the application of the system, have been provided.
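
The eă/oă rule in item 4, together with the â series value ă given in Note 5, can be written as a small decision function. This Python sketch is illustrative only: the function name shortened_a is hypothetical, the series is passed in as "â" or "ô", and the following consonant is passed in its romanized form.

```python
def shortened_a(series, following):
    """Romanize the combinations ◌ា◌់ and ◌័◌ per Note 5.

    series:    "â" or "ô", the series of the preceding consonant.
    following: the romanized consonant that follows the vowel.
    """
    if series == "â":
        return "ă"
    if series != "ô":
        raise ValueError(f"unknown series: {series!r}")
    # ô series: eă before k, ng, or h; oă otherwise.
    return "eă" if following in {"k", "ng", "h"} else "oă"


print(shortened_a("â", "k"))  # ă, as in chăk
print(shortened_a("ô", "k"))  # eă, as in rôpeăk
print(shortened_a("ô", "t"))  # oă, as in moăt
```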
