
Cataloging & Classification Quarterly, 48:279–302, 2010

Copyright © Taylor & Francis Group, LLC


ISSN: 0163-9374 print / 1544-4554 online
DOI: 10.1080/01639370903338352

A Study of Romanization Practice for Japanese Language Titles in OCLC WorldCat Records

YOKO KUDO
Texas A&M University, College Station, Texas, USA

Consistent romanization practice is one of the biggest challenges in cataloging Japanese materials. This study provides a snapshot of how Japanese is inconsistently or incorrectly romanized in Online Computer Library Center (OCLC) WorldCat records, and analyzes factors that might be causing the romanization problems in those records. Particular focus is placed on word division problems that might result from different interpretations and applications of the rules in ALA-LC Romanization Tables. The results revealed that the major factors behind the inconsistencies were associated with the ambiguity and complexity of the word division rules. Solutions and ideas for further studies are suggested.

KEYWORDS cataloging, romanization, Japanese language titles, OCLC WorldCat, word division

Romanization is a method of representing non-Roman (or non-Latin) scripts or languages with the Roman (or Latin) alphabet.1 As increasing numbers of libraries offer vernacular script-based access to non-Roman language resources, the role of romanization in information retrieval may be less critical than it used to be. Indeed, as Agenbroad pointed out, vernacular script access points have an advantage over romanized access points in that they do not impose on catalog users the additional burden of learning how the original languages are romanized.2 Nevertheless, for users who are not very familiar with the original scripts, romanization still serves as an important tool. It helps those users not only search for resources, but also read and understand the bibliographic information in retrieved records.

Received June 2009; revised August 2009; accepted August 2009.


Address correspondence to Yoko Kudo, Texas A&M University Libraries, 5000 TAMU, College Station, TX 77843-5000.


In North America, romanization is performed in accordance with ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman Scripts,3 which were established to ensure precise and consistent romanization practice across institutions and thereby promote an effective network of cooperative cataloging and resource sharing.4 Despite this mission, incorrectly or inconsistently romanized bibliographic information has been a concern in the Japanese language cataloging community for many years. On mailing lists such as Eastlib, there have been frequent discussions about how to romanize specific words and phrases. According to the findings of earlier studies, the difficulty of Japanese romanization is caused partly by the ambiguity and complexity of the rules.

This study addresses the issues of romanization practice in Japanese cataloging. It examines the extent of inconsistency in romanized Japanese titles in Online Computer Library Center (OCLC) WorldCat, and investigates the common view that ambiguous and complex romanization rules are a possible cause of the inconsistencies. Based on the findings, suggestions are offered to help improve the consistency and efficiency of Japanese romanization practice. Directions for further study are also suggested.

BRIEF HISTORY OF ALA-LC ROMANIZATION TABLES

According to Tsukamoto, the history of uniform romanization practice in foreign language cataloging dates back to 1885,5 when the Transliteration Committee of the American Library Association (ALA) considered transliteration tables for Semitic, Sanskrit, and Russian.6 The Library of Congress (LC) began to publish romanization tables in 1945,7 but the rules for the Japanese language were not issued until 1957.8 Those rules, developed through the collaborative effort of LC's Orientalia Processing Committee and the ALA Special Committee on Cataloging Oriental Materials,9 were compiled into Cataloging Rules of the American Library Association and the Library of Congress. Additions and Changes, 1949–1958,10 in 1959. After significant revisions to a section of the rules entitled "Far Eastern Languages: Manual of Romanization, Capitalization, Punctuation, and Word Division for Chinese, Japanese, and Korean," the Japanese romanization table was issued in the Spring 1983 issue of LC's Cataloging Service Bulletin (CSB).11 One of the major changes was that the third and later editions of Kenkyusha's New Japanese-English Dictionary12 replaced the 1931 edition13 as the source of the modified Hepburn romanization system.14 The American National Standard System for the Romanization of Japanese (ANSI Z39.11-1972) was also adopted in this table for the first time.15

Romanization tables that had been developed separately and published in various issues of the CSB were compiled into the 1991 edition of ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman Scripts.16 The table for Japanese in this edition incorporated the additional instructions issued in 1985,17 as well as the rules of 1983. The Japanese table was not revised or expanded between the 1991 edition and the 1997 edition, which is the current edition.18 In 2003, LC announced a minor revision of the table, replacing the apostrophe with the alif (’)19 as the special character used to separate the letter n from a following vowel or y.20 No further official changes have been made since then.
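Because the 2003 change is purely a character substitution, it can be expressed as a small text transformation. The Python sketch below, offered as an illustration rather than as an LC tool, rewrites an apostrophe (ASCII or U+2019) that stands between n and a following vowel or y as an alif. It assumes the alif is encoded as U+02BC MODIFIER LETTER APOSTROPHE, the code point to which the MARC-8 alif is commonly mapped; the name upgrade_separator is hypothetical.

```python
import re

# Assumption: alif is encoded as U+02BC MODIFIER LETTER APOSTROPHE.
ALIF = "\u02bc"

# An older separator (ASCII apostrophe or U+2019) preceded by n and
# followed by a vowel (with or without macron) or y.
_OLD_SEPARATOR = re.compile(r"(?<=[nN])['\u2019](?=[aeiouyāēīōūAEIOUYĀĒĪŌŪ])")


def upgrade_separator(romanized: str) -> str:
    """Replace an apostrophe used between n and a vowel or y with alif."""
    return _OLD_SEPARATOR.sub(ALIF, romanized)


print(ascii(upgrade_separator("hon'yaku")))  # 'hon\u02bcyaku'
print(upgrade_separator("tōzai"))            # tōzai (unchanged)
```

Because the two marks look nearly identical on screen, the first call prints its result with ascii() so that the substituted code point is visible.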

LITERATURE REVIEW

Since as early as 1930, a variety of literature on Japanese romanization has been published in both the United States and Japan. Many works have focused on three different romanization schemes (Nihon-shiki, Kunrei-shiki, and the Hepburn system) (e.g., Reischauer 1940,21 Umesao 1987,22 Tsuge 2004 23). Others have discussed the background of those schemes in the context of the script reform movement in Japan from the late 1800s to the early 1900s (e.g., Hirai 1948 24).

Only a few studies have examined issues related to Japanese romanization practice in cataloging. Tsukamoto, in his 1962 master's thesis, examined selected LC card catalog records and pointed out that two factors made the romanization of Japanese remarkably challenging: the difficulty of correctly reading Chinese-based characters, or kanji, and the complexity of the rules, especially the word division rules.25 According to Matsumura, the complexity of word division is not simply a matter of how the rules are expressed; it comes from the fact that Japanese lacks the concept of word division.26 Unlike English, which separates every word with a space, the Japanese writing system does not clearly mark where one word ends and the next begins. Matsumura analyzed six word division rules for Japanese romanization that were in concurrent use at Japanese institutions, and the analysis revealed varied opinions and a considerable degree of confusion about the concept among institutions and scholars.

Almost no publications have addressed Japanese romanization issues in cataloging since the romanization table came out in 1983. Even so, Morimoto's presentation at the European Association of Japanese Resource Specialists conference in 2005 clearly showed that the challenge of romanization, especially word division, remains.27 Morimoto reported that 14 percent of the 965 OCLC WorldCat records he examined were incorrectly romanized, and that over 90 percent of these records had been left uncorrected in the local catalogs of OCLC participants. He also pointed out that most of the errors he found were related to word division.

This study provides an updated snapshot of how Japanese is inconsistently or incorrectly romanized in OCLC WorldCat records in selected classification ranges, and offers a more detailed analysis of the factors that might be causing the romanization problems in those records. It devotes particular attention to the various interpretations and applications of the rules in ALA-LC Romanization Tables that might be affecting proper romanization practice. The collected sample titles are examined in terms of two aspects: romanization in the narrower sense, that is, converting Japanese script into Roman letters; and word division, that is, separating romanized words with spaces. Word reading, capitalization, and punctuation are outside the scope of this study.

METHOD
Sample Data
For this study, 950 romanized Japanese titles proper were gathered from subfield "a" of the MARC 21 field 245 in OCLC WorldCat master bibliographic records. Because no service was available to generate a random sample from WorldCat automatically,28 and because of limits on the number of titles that could be displayed in the Connexion client, search queries were run to extract titles published between 1998 and 2008 in the following classification ranges: BQ, DS, HD, PL, QH, and TR (shown in Table 1). The year range was chosen to obtain records created in accordance with the current edition of ALA-LC Romanization Tables. The classification ranges were chosen from LC classes frequently assigned to Japanese language monographs, so as to cover a variety of Japanese words and phrases from different subjects. From the lists retrieved, every third record in the default order of the lists was pulled for examination. Records missing Japanese script in the 245 field were skipped, and titles with exceptional readings, such as itsu datte teiku wan (once-in-a-lifetime opportunity), were also excluded. Records loaded from Japanese vendors and libraries were included, even though their romanization does not always conform to North American standards.

TABLE 1 Search Queries

Search Query                                                Date of Search   Total No. of Records   No. of Selected Records
vp:cjk and la:jpn and mt:bks and lc:BQ* and yr:1998–2008    Feb. 2009        2,728                  100
vp:cjk and la:jpn and mt:bks and lc:DS* and yr:1998–2008    Nov. 2008        17,599                 250
vp:cjk and la:jpn and mt:bks and lc:HD* and yr:1998–2008    Jan. 2009        5,521                  150
vp:cjk and la:jpn and mt:bks and lc:PL* and yr:1998–2008    Nov. 2008        24,378                 250
vp:cjk and la:jpn and mt:bks and lc:QH* and yr:1998–2008    Jan. 2009        479                    100
vp:cjk and la:jpn and mt:bks and lc:TR* and yr:1998–2008    Jan. 2009        420                    100
Total                                                                        51,125                 950
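The record selection described above amounts to systematic sampling with a skip rule. The Python below is a minimal sketch under assumptions the text does not state: records are hypothetical dictionaries with a title_vernacular key; "every third record" is read as the 3rd, 6th, 9th, and so on; a skipped record is simply dropped rather than replaced; and selection stops once a per-class quota (the "No. of Selected Records" column of Table 1) is reached. Japanese script is detected by the presence of any hiragana, katakana, or CJK Unified Ideographs character. The manual exclusion of titles with exceptional readings is not modeled.

```python
import re

# Any hiragana, katakana, or CJK Unified Ideograph counts as Japanese script.
_JAPANESE_SCRIPT = re.compile(r"[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff]")


def has_japanese_script(text: str) -> bool:
    return _JAPANESE_SCRIPT.search(text) is not None


def systematic_sample(records: list[dict], quota: int, step: int = 3) -> list[dict]:
    """Take the 3rd, 6th, 9th, ... record in list order, skip records whose
    vernacular title lacks Japanese script, and stop once `quota` are kept."""
    sample = []
    for record in records[step - 1::step]:
        if not has_japanese_script(record.get("title_vernacular", "")):
            continue  # skipped, not replaced by a neighboring record
        sample.append(record)
        if len(sample) == quota:
            break
    return sample


records = [{"title_vernacular": t} for t in
           ["a", "b", "日本", "c", "d", "", "e", "f", "写真"]]
print([r["title_vernacular"] for r in systematic_sample(records, quota=100)])
# ['日本', '写真']
```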

Procedures
ROMANIZATION
The titles were examined for any Japanese characters represented in Roman letters in a way that departs from the modified Hepburn romanization scheme adopted by the third, fourth,29 and fifth editions30 of Kenkyusha's New Japanese-English Dictionary, and from the ANSI standard. These sources cover different ranges of syllables; ANSI covers the widest range, including syllables used mainly in non-Japanese words. Interestingly, although ALA-LC Romanization Tables instruct catalogers to refer to ANSI for "words of foreign origin,"31 ANSI also provides a chart of native Japanese long vowel syllables,32 which is more detailed and helpful than Kenkyusha's brief notes in a page margin.

Having the instructions on letter conversion scattered across more than one source may be confusing enough to cause catalogers to overlook certain rules, resulting in inconsistent romanization. In addition to this confusing layout of the rules, the titles were examined for possible confusion due to the following factors: other romanization schemes recognized in Japan, namely Kunrei-shiki and Nihon-shiki; the original spellings of foreign words; and so-called wapuro romanization, a method used to input Japanese into word processors and computers with an English keyboard.
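The factors listed above tend to leave recognizable traces in romanized text, so a first screening pass could be automated. The Python sketch below flags substrings that usually indicate Kunrei-shiki or Nihon-shiki syllables (si, ti, tu, hu, zi, sya, and so on), wapuro-style long vowels (ou, uu), the wapuro spelling cch where modified Hepburn writes tch, and a pre-2003 apostrophe after n. The pattern list is an illustrative assumption, not a rule set taken from ALA-LC or ANSI, and every hit still needs a human check: ou, for instance, is correct in a word such as omou (think), and some foreign-origin words may legitimately contain sequences such as ti.

```python
import re

# Illustrative patterns only; each hit is a candidate for review, not an error.
SUSPECT_PATTERNS = {
    # Kunrei-shiki / Nihon-shiki spellings where Hepburn has shi, chi, tsu, fu, ji, sha ...
    "Kunrei/Nihon-shiki syllable": re.compile(r"si|ti|tu|(?<![sc])hu|zi|[stz]y[aou]", re.IGNORECASE),
    # Long vowels typed as doubled letters instead of a macron (e.g. ou for ō).
    "wapuro-style long vowel (ou/uu)": re.compile(r"ou|uu", re.IGNORECASE),
    # Keyboard-input spelling of a doubled consonant before chi (Hepburn: tch).
    "wapuro-style double consonant (cch)": re.compile(r"cch", re.IGNORECASE),
    # Pre-2003 separator: apostrophe (ASCII or U+2019) after n.
    "apostrophe instead of alif": re.compile(r"n['\u2019](?=[aeiouyāēīōū])", re.IGNORECASE),
}


def flag_suspect_spellings(title: str) -> list[tuple[str, str]]:
    """Return (reason, matched text) pairs for manual review."""
    hits = []
    for reason, pattern in SUSPECT_PATTERNS.items():
        hits.extend((reason, m.group()) for m in pattern.finditer(title))
    return hits


print(flag_suspect_spellings("Toukyou no rekisi"))
print(flag_suspect_spellings("Tōkyō no rekishi"))  # []
```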

WORD DIVISION
The claim that the rules in ALA-LC Romanization Tables are too ambiguous and complex generally refers to the word division rules. Because the rules have been interpreted and applied in more than one way, the sample was compared for consistency rather than judged "right" or "wrong." The appropriateness of a word division was discussed only when the case in question was clearly illustrated in the rules. Each title was broken down into its component elements, and those elements were categorized by type of word. This study is concerned with verbs, adjectives, adverbs, compound particles, and the types of nouns presented in Table 2. Matsumura's analysis was consulted for the definitions of the words on the list.33 This word list, as well as the word categorization presented hereafter, was prepared solely for the purposes of this study and is not necessarily linguistically accurate. It should also be noted that the categorization of sample words is unavoidably biased to some extent. Types or forms of words that were too few to compare are not discussed.
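The consistency comparison described above can be sketched as grouping occurrences of the same element and checking whether their romanized forms differ only in spacing or hyphenation. In the Python sketch below, each occurrence is a hypothetical (vernacular, romanized) pair, and the segmentation of titles into elements, which the study did by hand, is assumed to have been done already. Keying on the vernacular form keeps apart words that share a romanization but not a script form, and lowercasing sets capitalization aside, since it is outside the scope of the study.

```python
import re
from collections import defaultdict


def division_key(romanized: str) -> str:
    """Lowercase and drop spaces and hyphens, so forms that differ only in
    word division (or capitalization) collapse to the same key."""
    return re.sub(r"[\s-]+", "", romanized.lower())


def find_division_variants(occurrences):
    """Group (vernacular, romanized) pairs; return the groups whose
    romanized forms differ only in word division."""
    groups = defaultdict(set)
    for vernacular, romanized in occurrences:
        groups[(vernacular, division_key(romanized))].add(romanized.lower())
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


sample = [
    ("日本論", "Nihon ron"),
    ("日本論", "Nihon-ron"),
    ("日本学", "Nihongaku"),
    ("日本学", "Nihongaku"),
]
print(find_division_variants(sample))
# {('日本論', 'nihonron'): ['nihon ron', 'nihon-ron']}
```

Forms whose letters differ, rather than their spacing, land in different groups and are not reported; those belong to the romanization check, not to word division.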

TABLE 2 Word Categories

Type of Words                  Definition/Scope
Verbs
  Simple verbs
  Compound verbs               Verbs formed by connecting with another verb or noun
  Derived verbs                Verbs of "noun suru" form
Adjectives/Adverbs
  Adjectives                   Adjectives ending with i
  Adjectival verbs             Adjectives ending with da (also known as ... noun)
  Adverbs
Compound particles             Collocations of more than one particle
Nouns
  Simple nouns
  Pronouns                     Personal and demonstrative pronouns
  Proper nouns                 Types covered by the Capitalization rules of ALA-LC
                               Romanization Tables, and their derived words
                               (corporate names were identified by whether or not
                               those names were capitalized in bibliographic records)
  Sino-Japanese compounds      Binary and other types, excluding proper name
                               derivatives
  Native-Japanese compounds    Binary and other types, excluding proper name
                               derivatives

Word division standards other than those of ALA-LC, which may have been used in records created by Japanese institutions, were not taken into consideration in discussing possible causes of inconsistency.

RESULTS AND ANALYSIS


Romanization
Of the 950 titles, inappropriate or questionable spellings were found in twenty-five (2.63 percent). These twenty-five problems include six apparently careless mistakes, such as omitting letters or assigning Roman letters for a completely different sound. The types and numbers of problems are shown in Table 3. Seven problems were related to the diacritical symbols alif (’) and macron (¯). Three of these used an apostrophe instead of alif, probably because the records were created before the rule revision of 2003 and have not been updated since. In 2007, concern was expressed on Eastlib about the frequent misplacement of macrons, when several such cases were found in words such as Tōkyō (Tokyo),34 shiryō (document),35 and ryūgaku (studying abroad).36 In the sample, however, only one title had this problem, in t̄ōzai (east and west). It is possible that the warning on the mailing list encouraged catalogers to correct the records. Incorrect Roman letters were assigned in nine titles. Cases such as puresentāshon (presentation) and Shiracchise (a proper name) suggest that catalogers were confused by two of the factors mentioned earlier: original English spellings and the Japanese input system for computers. It should also be mentioned that, whereas most initials and acronyms were transcribed as they appeared, four were spelled out as pronounced (e.g., īyū for EU). This inconsistency apparently stems from the lack of a clear policy in ALA-LC Romanization Tables on how to treat non-Japanese letters used with Japanese script. Overall, the examination revealed that such problems were infrequent, and the great majority of title statements were consistently romanized.

TABLE 3 Romanization Problems

Type of Problems                      No.   %        Title Example
Missing diacritics                    3     12.00    saibo (cell); denen (countryside, farm land)
Misplaced diacritics                  1     4.00     t̄ōzai (east and west)
Use of incorrect diacritical symbols  3     12.00    hon’yaku (translation); Ken’yūsha [proper name]
Spelled-out initials/acronyms         4     16.00    Shīesuāru for CSR; īyū for EU
Spelled-out Arabic numerals           2     8.00     nitenzero for 2.0; nijūisseiki for 21 (21st century)
Missing letters                       3     12.00    hōkosho (report); gjutsu (skill, technology)
Use of incorrect letters              9     36.00    puresentāshon (presentation); Shiracchise [proper name]
Total                                 25    100.00

Word Division
In the following discussion, the term "word" may refer both to the component elements examined and to the minimal units that result from word division. Likewise, the terms "affix," "prefix," and "suffix" may refer to any noun element attached to another noun, whether or not it ends up as part of that word after word division.

VERBS
Simple Verbs. The results for verbs are shown in Table 4. Most simple verbs in the plain form (which is also the form that precedes nouns) or in inflected forms were consistently written as single words, separated from the preceding and following words. Seven inflected verbs were connected with another verb or with auxiliaries, as in oshiete kurenai (can teach [negative form]). They varied in form and combination, but were consistently written separately from the additional verbs and auxiliaries. A problem was found with verbs in the plain or inflected form followed by the particle ka. As shown in Figure 1, three of the five such verbs were written together with the particle. As illustrated by the examples under Word Division rules 2(e) and 2(b)(3),37 the particle ka should be written separately.

TABLE 4 Word Division Inconsistencies: Verbs

Type / Form                                             Total  Incorrect No.  Incorrect %  Questionable No.  Questionable %
Basic
  Plain/dictionary form                                 108    1              0.93         0                 0
  Inflected form (past tense, negation, passive, etc.)  48     1              2.08         0                 0
  Inflected form with verbs and auxiliary verbs         7      1              14.29        0                 0
  Total                                                 163    3              1.84         0                 0
Compound
  Plain/dictionary form                                 9      0              0            1                 11.11
  Inflected form (past tense, negation, passive, etc.)  3      0              0            0                 0
  Inflected form with verbs and auxiliary verbs         0      0              0            0                 0
  Total                                                 12     0              0            1                 8.33
Derived
  Plain/dictionary form                                 10     2              20.00        0                 0
  Inflected form (past tense, negation, passive, etc.)  7      1              14.29        0                 0
  Inflected form with verbs and auxiliary verbs         1      0              0            0                 0
  Total                                                 18     3              16.67        0                 0
Total                                                   193    6              3.11         1                 0.52
Note. Inconsistencies that were covered by ALA-LC Romanization Tables were counted as Incorrect. Inconsistencies that were not clearly covered by the Tables, and were thus treated in different ways, were counted as Questionable. Percentages were rounded to the nearest hundredth place.

FIGURE 1 Examples of simple verbs followed by a particle ka.

FIGURE 2 Examples of derived verbs.
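Each percentage in Table 4, and in the tables that follow, is a row count divided by the row total and rounded to the nearest hundredth. The Python sketch below recomputes the simple-verb rows as a quick arithmetic check; half-up rounding is an assumption, since the table note does not say how ties are rounded. Decimal is used so that the rounding is exact rather than subject to binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(part: int, total: int) -> Decimal:
    """part / total as a percentage, rounded half-up to the hundredth."""
    return (Decimal(part) * 100 / Decimal(total)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)


# Simple-verb rows of Table 4: (form, total, incorrect)
simple_verbs = [
    ("Plain/dictionary form", 108, 1),
    ("Inflected form", 48, 1),
    ("Inflected form with verbs and auxiliary verbs", 7, 1),
]
for form, total, incorrect in simple_verbs:
    print(f"{form}: {percent(incorrect, total)}")
grand_total = sum(total for _, total, _ in simple_verbs)
grand_incorrect = sum(incorrect for _, _, incorrect in simple_verbs)
print(f"Total: {percent(grand_incorrect, grand_total)}")
# Plain/dictionary form: 0.93
# Inflected form: 2.08
# Inflected form with verbs and auxiliary verbs: 14.29
# Total: 1.84
```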
Compound/Derived Verbs. The compound verb meaning "say something; complain" was written as two units, mono and mōsu, which was inconsistent with the treatment of other verbs of the "noun + verb" form, such as yumemite (dreaming). The word is established as a single verb in some dictionaries,38 but is treated as a collocation in others.39 These differing perceptions of the word have likely caused the inconsistency. Figure 2 shows three of the eighteen derived verbs of the "noun suru" form that were written as two words: one compound plus the verb suru. The word meaning "stroll" was the only one derived from a native Japanese compound. It was divided into two parts, sozoro and arukisuru, as if it were an adverb followed by a verb.

ADJECTIVES AND ADVERBS


The results obtained from the sample adjectives and adverbs are shown in Table 5. Simple adjectives and adverbs, regardless of inflection, were all written as single words. In contrast, adjectival verbs, also called quasi-adjectives,40 presented interesting inconsistencies. Eleven of the sixteen such words were divided into a noun followed by an inflected part, whereas the other five were treated as single words, as shown in Figure 3. This result seems to reflect opposing views on whether the adjectival verb is a distinct word class. While major Japanese dictionaries recognize it as a word class independent of verbs and adjectives, it has also been suggested that this type of word be considered a combination of a noun and a particle,41 and Johnson categorizes it as a kind of adjective.42 ALA-LC Romanization Tables provide little help in solving this problem. The distinction between bimyōnaru (delicate; subtle) in Word Division rule 2(c)43 and nonki na (easygoing; optimistic) in Word Division rule 2(e)44 is unclear and adds to catalogers' confusion in interpreting the rules.

TABLE 5 Word Division Inconsistencies: Adjectives and Adverbs

Type / Form                                             Total  Incorrect No.  Incorrect %  Questionable No.  Questionable %
Adjectives
  Plain/dictionary form                                 19     0              0            0                 0
  Inflected form (past tense, negation, passive, etc.)  2      0              0            0                 0
  Total                                                 21     0              0            0                 0
Adjectival verbs
  Plain/dictionary form                                 16     0              0            5                 31.25
  Inflected form (past tense, negation, passive, etc.)  0      0              0            0                 0
  Total                                                 16     0              0            5                 31.25
Adverbs
  Plain/dictionary form                                 10     0              0            0                 0
  Inflected form (past tense, negation, passive, etc.)  —      —              —            —                 —
  Total                                                 10     0              0            0                 0
Total                                                   47     0              0            5                 10.64
Note. Inconsistencies that were covered by ALA-LC Romanization Tables were counted as Incorrect. Inconsistencies that were not clearly covered by the Tables, and were thus treated in different ways, were counted as Questionable. Percentages were rounded to the nearest hundredth place.

FIGURE 3 Examples of adjectival verbs.

PARTICLES
Eighteen compound particles (e.g., e no, to no, made ni, de wa, to wa) were found in the sample titles, and only one of them was written as a single word (5.56 percent).

NOUNS
Simple Nouns and Pronouns. The results for simple nouns and pronouns are shown in Table 6. The word division of simple nouns, including those derived from verbs such as kurashi (life) and ayumi (footstep), showed no inconsistency. Demonstrative pronouns, when inflected to function as adjectives or adverbs, tend to collocate with verbs, as in dō iu (what kind of) and dō shite (how; why). Nine such combinations were found in the sample titles, and only one of them, meaning "in this way," was written as a single word, as shown in Figure 4. Very similar examples are provided under Word Division rule 2(b)(2).45

TABLE 6 Word Division Inconsistencies: Simple Nouns and Pronouns

Type / Form                      Total  Incorrect No.  Incorrect %  Questionable No.  Questionable %
Simple nouns
  Simple single-stem nouns       124    0              0            0                 0
  Nouns derived from verbs       22     0              0            0                 0
  Total                          146    0              0            0                 0
Pronouns
  Personal                       5      0              0            0                 0
  Demonstrative                  6      0              0            0                 0
  Demonstrative, adjectival use  9      1              11.11        0                 0
  Total                          20     1              5.00         0                 0
Total                            166    1              0.60         0                 0
Note. Inconsistencies that were covered by ALA-LC Romanization Tables were counted as Incorrect. Inconsistencies that were not clearly covered by the Tables, and were thus treated in different ways, were counted as Questionable. Percentages were rounded to the nearest hundredth place.

Proper Nouns. Personal Names and Corporate Names. Table 7 shows the results for proper nouns. Five of the nine suffixes attached to names were titles or terms of address (shinnō, hakase, sama, shi, in), and they were consistently treated as instructed in Word Division rules 4(b)(2) and 4(b)(3).46 The suffix shō in Dazai Osamu-shō (Dazai Osamu Award) and ke in Mizuki-ke (Mizuki family) were both hyphenated to the name, probably in accordance with Word Division rule 4(a) exception (4).47 All corporate names occurred on their own, without affixes; no inconsistency was found, although there were too few comparable cases for a meaningful comparison.

FIGURE 4 Examples of demonstrative pronouns collocated with verbs.

TABLE 7 Word Division Inconsistencies: Proper Nouns

Type / Form                  Total  Incorrect No.  Incorrect %  Questionable No.  Questionable %
Personal
  Single use                 76     0              0            0                 0
  Derived by affixes         9      0              0            0                 0
  Total                      85     0              0            0                 0
Corporate
  Single use                 21     0              0            0                 0
  Derived by affixes         0      0              0            0                 0
  Total                      21     0              0            0                 0
Geographic
  Single use                 171    0              0            0                 0
  Derived by affixes         108    1              0.93         1                 0.93
  Followed by generic terms  18     1              5.56         0                 0
  Total                      297    2              0.67         1                 0.34
Religion
  Single use                 48     0              0            0                 0
  Derived by affixes         7      0              0            2                 28.57
  Total                      55     0              0            2                 3.64
Historical period
  Single use                 22     0              0            0                 0
  Derived by affixes         1      0              0            0                 0
  Total                      23     0              0            0                 0
Total                        481    2              0.42         3                 0.62
Note. Inconsistencies that were covered by ALA-LC Romanization Tables were counted as Incorrect. Inconsistencies that were not clearly covered by the Tables, and were thus treated in different ways, were counted as Questionable. Percentages were rounded to the nearest hundredth place.

Geographic Names and Names of Religion. Of seven single-character terms for generic features suffixed to geographic names, only yama, in the name of Mt. Ishimaki, was hyphenated (Ishimaki-yama). This was probably caused by confusion between Word Division rules 4(c)(4) and 4(c)(5).48 Because of the LC classes selected for the sample, Nihon (Japan) made up more than 70 percent of all geographic names found. Of the 108 compounds derived from Nihon, Nihongo (Japanese language), Nihonjin (Japanese people), and Nihon shi (Japanese history) made up 84 percent (ninety-one words). Since the romanization of these three words is specifically provided in ALA-LC Romanization Tables, no variation was found. For the words shown in Figure 5, however, different rules seem to have been applied, and treatment therefore varied.

Whether suffixes are written separately or hyphenated with names depends on the cataloger's interpretation of what counts as "proper names and titles of books" (Word Division rule 4(a))49 and as "single characters which can be suffixed to any proper names" (Word Division rule 4(a) exception (4)). For Nihon ron (discussion about Japan), the cataloger's choice seems to have been based on the former rule, while Nihon-ron seems to have been based on the latter. It is unclear which rules were used for Nihongaku (Japanese studies) or Nihongogaku (Japanese language studies), even though the suffix gaku was consistently written as part of the compound. No rules in ALA-LC Romanization Tables specifically instruct catalogers to capitalize such compounds and write them as single words. It is possible that Capitalization rule 8 indirectly suggests50 that "derived words like Nihonjin, which are still considered to be proper names, but not the derivative formed by the suffix of a single character like Taiwan-sei (made in Taiwan), be capitalized and written as a single word." Treatment was similarly divided for the compounds of Bukkyō meaning "history of Buddhism" and "Buddhist studies," as shown in Figure 6.

FIGURE 5 Examples of compounds derived from Nihon.

FIGURE 6 Examples of compounds derived from Bukkyō.

Compounds. Sino-Japanese Binary Compounds. The results obtained from the sample compounds are shown in Table 8. In all, 1,520 binary compounds were found in 676 titles, and all but one were consistently written separately from adjacent elements. The word for "linguist" was written as a single word, gengogakusha. This might be because the word was judged to be one trinary compound, gengogaku (linguistics), plus one suffix, sha (person), rather than two binary compounds, gengo (language) and gakusha (scholar). This difference in judgment is discussed further below.

TABLE 8 Word Division Inconsistencies: Compounds

Type / Form                     Total  Incorrect No.  Incorrect %  Questionable No.  Questionable %
Sino-Japanese: Binary           1,520  0              0            1                 0.07
Sino-Japanese: Trinary/Derived
  Trinary                       5      0              0            0                 0
  Derived by prefix             43     0              0            2                 4.65
  Derived by suffix             260    2              0.77         0                 0
  Total                         308    2              0.65         2                 0.65
Sino-Japanese: Other            9      2              22.22        3                 33.33
Native-Japanese: Binary
  Native-Japanese               64     0              0            1                 1.56
  Hybrid (On and Kun)           0      0              0            0                 0
  Total                         64     0              0            1                 1.56
Native-Japanese: Trinary
  Native-Japanese               4      0              0            1                 25.00
  Hybrid (On and Kun)           10     0              0            1                 10.00
  Total                         14     0              0            2                 14.29
Total                           1,915  4              0.21         9                 0.47
Note. Inconsistencies that were covered by ALA-LC Romanization Tables were counted as Incorrect. Inconsistencies that were not clearly covered by the Tables, and were thus treated in different ways, were counted as Questionable. Percentages were rounded to the nearest hundredth place.
Sino-Japanese Derived Compounds. In romanizing trinary derived compounds, or words that look like them, it is quite difficult to understand all the relevant rules (Word Division rule 1(b), Trinary, derived, and other compounds,51 and rules 3(a), (c), and (d), Prefixes, suffixes, etc.52) and to choose the appropriate one for the case in question, because the process involves several judgments. For a prefix, catalogers must determine whether the character is part of an established compound, whether it is one of the "such characters as ... etc." listed in rule 1(b), and whether it forms a "pseudo-compound" rather than a derived compound. Even if the word in question is not established in dictionaries, it has to be written as a single word if it is a pseudo-compound. To determine whether the prefix is part of the compound, it is also necessary to understand the context in which the character is used, that is, whether the prefix affects only the compound to which it appears to be attached, or the following compounds as well. For example, the phrase romanized as shin meika jiten could be read either as "new dictionary of acclaimed songs" or as "dictionary of newly acclaimed songs." A suffix, by contrast, is basically written together with the preceding components, no matter what context it is used in. Only the kinds defined in Word Division rule 3(c) require a judgment on whether they are part of a compound. The sample words were examined to see whether these judgments led to inconsistent word division.

TABLE 9 Single Character Prefixes

Type of Prefixes   Written as Part of Word  Written Separately   Total No.   Total %
dai                10                       0                    10          23.26
shin               6                        3                    9           20.93
hi                 2                        0                    2           4.65
ko                 2                        0                    2           4.65
shō                2                        0                    2           4.65
sai                2                        0                    2           4.65
i                  2                        0                    2           4.65
mei                2                        0                    2           4.65
zen                2                        0                    2           4.65
ta                 2                        0                    2           4.65
ichi               0                        1                    1           2.33
Other              7                        0                    7           16.28
Total              39                       4                    43          100.00
Note. Percentages were rounded to the nearest hundredth place.

There were forty-three apparent trinary compounds derived from or affected by a prefix, including shin nōson bijinesu (new rural business) and shin meika jiten, mentioned above. The types and numbers of prefixes found in the sample compounds are shown in Table 9. In these two titles, the prefix shin was written separately, following the instruction in Word Division rule 3(a). Of the remaining forty-one words, only two had a prefix written separately from the following binary compound: shin in shin chimei (new place name) and ichi in ichi seinen (one young man), shown in Figure 7. These characters were probably judged not to be part of an established compound.
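The scope judgment for prefixes can be made concrete with a small helper. In the Python sketch below, attach_prefix is hypothetical, and scope (how many following words the prefix modifies) is supplied by the cataloger, because deciding it requires reading the title in context. The sketch deliberately leaves out the other judgments listed above, such as whether the character forms an established compound or a pseudo-compound, so its output illustrates how the two readings of shin meika jiten lead to different divisions; it does not state what the rules require in every case.

```python
def attach_prefix(prefix: str, following: list[str], scope: int) -> list[str]:
    """Hypothetical helper. `scope` is the number of following words the
    prefix modifies, as judged by the cataloger from context: a prefix that
    modifies only the next compound is joined to it; one that modifies a
    longer phrase is written as a separate word."""
    if not 1 <= scope <= len(following):
        raise ValueError("scope must cover 1..len(following) words")
    if scope == 1:
        return [prefix + following[0]] + following[1:]
    return [prefix] + following


# "new dictionary of acclaimed songs": shin modifies meika jiten
print(" ".join(attach_prefix("shin", ["meika", "jiten"], scope=2)))  # shin meika jiten
# "dictionary of newly acclaimed songs": shin modifies meika only
print(" ".join(attach_prefix("shin", ["meika", "jiten"], scope=1)))  # shinmeika jiten
```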

FIGURE 7 Examples of compounds affected by a prefix.


TABLE 10 Single Character Suffixes

Type of Suffixes   Written as Part of Word  Written Separately   Total No.   Total %
gaku               38                       0                    38          14.62
sho                37                       0                    37          14.23
shi                21                       0                    21          8.08
shū                19                       0                    19          7.31
ka                 16                       0                    16          6.15
ron                11                       2                    13          5.00
ki                 10                       0                    10          3.85
sha                10                       0                    10          3.85
hō                 6                        0                    6           2.31
sei, shō           5                        0                    5           1.92
tō                 0                        5                    5           1.92
ryoku              5                        0                    5           1.92
ka, ke             4                        0                    4           1.54
shi                4                        0                    4           1.54
teki               3                        0                    3           1.15
den                3                        0                    3           1.15
shō                2                        0                    2           0.77
kō                 0                        1                    1           0.38
Other              58                       0                    58          22.31
Total              252                      8                    260         100.00
Note. Percentages were rounded to the nearest hundredth place.

As can be seen in Table 10, suffixes were much more numerous and varied than prefixes. Eight of the 260 suffixes were of the types defined in Word Division rule 3(c). All five occurrences of tō (or nado) were written separately, and three trinary words were written as single compounds (jokunshō, shunsetsushō, shashinten). Unfortunately, the sample was too small, in both number and variety, to examine consistency related to rule 3(c). Figure 8 shows inconsistent treatment of ron. Of the thirteen occurrences of this suffix, only two, in fukyū katei ron (discussion about popularization process) and keizai hatten ron (discussion about economic growth), were written separately from the preceding compounds. As mentioned earlier, the suffix should not be written separately from the preceding nouns.

FIGURE 8 Examples of compounds affected by a suffix ron.
In spite of the confusing instructions and the judgments required, only a small amount of inconsistency was found in the sample for trinary compounds. This may indicate two things: few controversial cases actually occur in title statements, and there is a tacit agreement among catalogers that, in general, three-character words are treated as a single compound word.
Other Complex Compounds. This section discusses four-character words derived from or affected by more than one affix. Nine words of this type were found in the sample, as shown in Figure 9. Based on the examples provided with Word Division rule 3(a) (e.g., hi bunkateki [non-cultural]), the hi in hikanjiken (non-Chinese-character-using regions) should have been written separately. According to LC’s comment on the romanization of keizai gakushi (history of economics), such words should be divided into two binary compounds if the two resulting words make sense, even if they are not found in any dictionary.53 Treatment of similarly formed sample words was not perfectly consistent, probably because catalogers perceive differently what “makes sense” and what does not. Whether gakushi makes more sense than zaigaku or hōteki seems to depend largely on the individual cataloger. Under this LC policy, the word for linguist (gengogakusha) discussed earlier should be treated as two words, gengo (language) and gakusha (scholar), because gakusha is clearly an established compound.

FIGURE 9 Examples of complex compounds affected by more than one affix.



FIGURE 10 Examples of native Japanese compounds.

Native Japanese Compounds. Of sixty-four native Japanese compounds, only the word for rice making was divided into two units, kome and zukuri, which was inconsistent with other words such as monozukuri (manufacturing) and tezukai (use of hands). The word hanashikotoba (spoken language) was written as a single word, whereas kotoba asobi (wordplay) and two additional words were divided into two units (shown in Figure 10). This inconsistency apparently came from different judgments on whether these words are single compounds or not.
Figure 11 shows words composed of a single-character prefix with a Kun (native Japanese) reading followed by a compound, the combination defined in Word Division rule 2(a)(2).54 The rule instructs catalogers to write this combination as two words, but only tabi bunka (travel culture) followed that rule in this sample. The other words were written as single units, possibly based on Word Division rule 2(a)(1)55 or on rule 1(b) for Sino-Japanese trinary compounds. More confusingly, the prefix ko in kogirei (trim; tidy), an example given for rule 2(a)(1), is also a single Kun character.56 The only evident difference between kogirei and the examples given with rule 2(a)(2) is that the former involves a phonetic change, yet no rule explains how to deal with phonetic changes. This lack of clarity has likely contributed to inconsistency in romanizing this type of word.

FIGURE 11 Examples of compounds affected by a single Kun character.

SUMMARY OF THE STUDY

The extent of inconsistent romanization in the sample titles was extremely small. Of the 950 titles, only twenty-five (2.6 percent) contained errors or inappropriate forms in romanization letter conversion, and those errors apparently included a considerable number of careless mistakes. Incorrect or questionable word division, as shown in Table 11, was found in only thirty-one of the 2,820 words extracted from the sample titles (1.1 percent). This low inconsistency rate may partly be attributed to the examination method. As noted, the sample words were analyzed by comparison with other words of similar type and form, which means that problems may have been missed in types and forms with too few examples to compare. Even allowing for this possibility, however, the overall percentage should remain relatively low. These results suggest that the difficult situations raised in catalogers’ discussions are actually not very common in title statements. Of all the word types, adjectives and adverbs showed the highest rate of inconsistency (10.64 percent). Nouns made up nearly 91 percent of all the examined words (2,562 in total), yet their word division was largely consistent, with problems in only eighteen words (0.7 percent). Among noun types, pronouns had the highest percentage (5 percent), and noun compounds presented the greatest number of problems (thirteen words).
TABLE 11 Word Division Inconsistencies: Summary

                                     Inconsistency
                                     Incorrect       Questionable    Total
Type of word          Total No.      No.    %        No.    %        No.    %
Verbs                   193          6      3.11     1      0.52     7      3.63
Adjectives/Adverbs       47          0      0.00     5     10.64     5     10.64
Compound particles       18          1      5.56     0      0.00     1      5.56
Nouns
  Simple nouns          146          0      0.00     0      0.00     0      0.00
  Pronouns               20          1      5.00     0      0.00     1      5.00
  Proper nouns          481          2      0.42     3      0.62     5      1.04
  Compounds           1,915          4      0.21     9      0.47    13      0.68
  Nouns total         2,562          7      0.27    11      0.43    18      0.70
Total                 2,820         14      0.50    17      0.60    31      1.10
Note. Percentages were rounded to the nearest hundredth place.

The results suggest that romanization (letter conversion) errors in foreign words were partly influenced by the spelling in the original language. They also revealed that the treatment of Arabic numerals, initials, and acronyms is not yet settled among catalogers. Incorrect word division was likely caused by catalogers overlooking rules or confusing one rule with another. An important point is that some guidelines are embedded in examples rather than stated explicitly as rules. The particle ka, which appears under the rule for honorific and potential auxiliaries (Word Division rule 2(b)(3)), is a case in point. Such implicit guidelines can easily be missed, which may account for some of the incorrect treatments. It was also found that some examples convey conflicting ideas (e.g., bimyōnaru [delicate; subtle] vs. nonki na [easygoing; optimistic] in Word Division rule 2(c) and (e)), which would undoubtedly lead catalogers to different decisions. Not surprisingly, inconsistency was found more frequently in types or forms of words that were not clearly covered by the rules or examples. One such type was words derived from proper names such as Nihon (Japan). Without specific instructions, catalogers must handle words of this type by reading between the lines or stretching their interpretation of the rules. The judgments that some rules require are another factor that makes consistent word division difficult. The divided treatment of affixes found in shin chimei (new place name), although rare, appears to reflect confusion in judging where a compound starts and ends. Similarly, native Japanese compounds were affected by differing views on what should be recognized as a single word.
These findings strongly suggest that the ALA-LC Romanization Tables rules, especially those for word division, are considerably ambiguous and complex. The difficulty of understanding such rules is clearly an underlying factor in the problems discovered in the sample. At the 1997 meeting of the ALA Committee on Cataloging: Asian & African Materials, the need for rule clarification was raised as a way to reduce inconsistent Japanese romanization practices.57 As of this writing, however, that request has not been acted on. Judging from the comment reported by the Council on East Asian Libraries (CEAL) Committee on Technical Processing regarding the treatment of the word Ura Sen-ke [proper name of a tea ceremony school],58 LC does not seem enthusiastic about creating new rules or additional statements to address every possible area of confusion. LC may have a point, considering the linguistic nature of the Japanese language, which inherently lacks a concept of clear word division. Adding more regulations, or further segmenting the rules, for what is essentially an ambiguous situation might not be practical, as it could end up inviting even more diverse interpretations. It might be more helpful to develop a supplement corresponding to the Library of Congress Rule Interpretations (LCRI)59 for the Anglo-American Cataloguing Rules, Second Edition (AACR2).60 Just as the LCRI helps catalogers understand AACR2 rules and make necessary judgments, a supplementary resource showing how LC and other major libraries interpret and apply the ALA-LC Romanization Tables would help catalogers.
Meanwhile, the confusing examples in the ALA-LC Romanization Tables should be improved. The types, variety, and arrangement of the examples should be reconsidered to make the rules, which are full of technical terms, more understandable and accessible. The examples should also be reorganized to clarify the differences between rules more effectively. Popular words and phrases that have emerged since 1983 might be added in order to respond to increasingly diverse Japanese titles. In fact, part of this improvement has already been achieved through the development of Descriptive Cataloging of East Asian Material: CJK Examples of AACR2 and Library of Congress Rule Interpretations.61 This workbook conveniently provides additional examples of romanized Japanese, as well as guidelines for numerals that are not adequately covered by the ALA-LC Romanization Tables. It would be useful if the Japanese examples in this manual were brought into line with the ALA-LC Romanization Tables.

CONCLUSION

This study addressed romanization issues reflected in Japanese title statements in OCLC WorldCat records in selected LC classification ranges. The results showed a reassuringly low degree of inconsistency, but they also revealed problems that should be taken seriously and addressed. Ongoing efforts to develop guidelines that are easier to access, understand, and apply are essential for better romanization practices.
The findings of the study suggest a few improvements and ideas for future research. First, the comparison-based analysis of word division may not be the best approach for capturing actual problems. Depending on how the sample words are categorized, the results could be somewhat biased or distorted. It would be useful to conduct another study using an alternative method and compare the results. Second, samples should be collected from a broader range of subject areas in a balanced and random manner, so that they better represent the actual variety of Japanese words and expressions. Third, it would be desirable to examine titles from records of Japanese origin separately from the others.
As shown in Table 12, more than 30 percent of the inconsistencies were found in records loaded into OCLC WorldCat by Japanese vendors or libraries, many of which had never been updated by North American institutions. To analyze the causes of inconsistency more accurately, romanization practices possibly based on foreign standards should be investigated separately. Finally, it would be helpful and of interest to many catalogers to examine the word division of foreign words (or loanwords) and their derivatives, as such words are increasingly used in titles.

TABLE 12 Inconsistencies Found in Japanese Vendor/Institution Records

                          Inconsistency
                          Total No.   No. from Japanese vendors/institutions   %
Romanization              25           7                                       28.00
Word division
  Incorrect               14           8                                       57.14
  Questionable            17           4                                       23.53
  Word division total     31          12                                       38.71
Total                     56          19                                       33.93
Note. Percentages were rounded to the nearest hundredth place.

The title statement is perhaps the most challenging field to romanize in a bibliographic record because of its variability. Yet consistent and cohesive romanization of titles remains crucial for providing efficient bibliographic access to the growing body of Japanese-language resources for all kinds of users. It is hoped that this study will serve as a reminder of these issues and inspire further studies that benefit the relevant library communities.

NOTES

1. Tamiko Matsumura, “Word Division in Romanized Japanese Titles” (Master’s thesis, University
of Chicago, 1964), 11.
2. James E. Agenbroad, “Romanization Is Not Enough,” Cataloging & Classification Quarterly 42,
no. 2 (2006): 22.
3. Randall K. Barry, ed., ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman
Scripts (Washington, DC: Library of Congress Cataloging Distribution Service, 1997). Also available online
at http://www.loc.gov/catdir/cpso/roman.html.
4. Sally C. Tseng, ed., LC Romanization Tables and Cataloging Policies. Assisted by David C.
Tseng and Linda C. Tseng (Metuchen, NJ: Scarecrow Press, 1990), xiii.
5. Jack Toru Tsukamoto, “A Study of Problems of Romanization of the Japanese Language in
Library Cataloging” (Master’s thesis, University of Texas, 1962), 7.
6. Ibid., 8.
7. Tseng, ed., LC Romanization Tables, xi.
8. Tsukamoto, “Japanese Language in Library Cataloging,” 24.
9. Ibid., 23.
10. Cataloging Rules of the American Library Association and the Library of Congress. Additions
and Changes, 1949–1958 (Washington, DC: Library of Congress, 1959).
11. “Japanese,” Cataloging Service Bulletin 20 (Spring 1983): 51–65.
12. Senkichiro Katsumata, ed., Kenkyusha’s New Japanese-English Dictionary, 3rd ed. (Tokyo:
Kenkyusha, 1954), xvi.
13. Yoshitaro Takenobu, ed., Kenkyusha’s New Japanese-English Dictionary (Tokyo: Kenkyusha,
1931), [i].
14. “Japanese,” Cataloging Service Bulletin 20 (Spring 1983): 51.
15. Ibid.; American National Standards Institute, American National Standard System for the Romanization of Japanese (New York: ANSI, 1972).
16. Randall K. Barry, ed., ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman
Scripts (Washington, DC: Library of Congress Cataloging Distribution Service, 1991).
17. “Japanese Romanization,” Cataloging Service Bulletin 29 (Summer 1985): 43.

18. Randall K. Barry, ed., “Japanese,” in ALA-LC Romanization Tables: Transliteration Schemes for
Non-Roman Scripts (Washington, DC: Library of Congress Cataloging Distribution Service, 1997), 73–85.
Also available online at http://www.loc.gov/catdir/cpso/romanization/japanese.pdf.
19. Hisako Kotaka, e-mail to OCLC CJK Users Group mailing list, June 5, 2003,
http://listserv.oclc.org/scripts/wa.exe?A2=ind0306&L=oclc-cjk&T=0&F=PP&P=63. The symbol was
copied and pasted from the e-mail message.
20. “Romanization,” Cataloging Service Bulletin 100 (Spring 2003): 84.
21. Edwin O. Reischauer, “Romaji or Romazi,” Journal of the American Oriental Society 60, no. 1
(1940): 82–89.
22. Tadao Umesao, Asu no Nihongo no tame ni (Tokyo: Kumon
Shuppan, 1987).
23. Gen’ichi Tsuge, “Coercively Standardized or Not: Romanization Systems of the Japanese Language in Music Literature,” The World of Music 46, no. 2 (2004): 137–143.
24. Masao Hirai, Kokugo kokuji mondai no rekishi (1948; repr., Tokyo:
Sangensha, 1998).
25. Tsukamoto, “Japanese Language in Library Cataloging,” 44.
26. Matsumura, “Word Division,” 11.
27. Hideyuki Morimoto, “Persistence of Misromanisation of Japanese Words/Phrases in Bibliographic Records Through Copy-Assisted Cataloging at North American Libraries,” EAJRS European Association of Japanese Resource Specialists, http://japanesestudies.arts.kuleuven.be/eajrs/files-eajrs/Hideyuki Morimoto-2005eajrsslides01.pdf.
28. Lisa McDonald, e-mail message to author, February 12, 2009.
29. Koh Masuda, ed., Kenkyusha’s New Japanese-English Dictionary, 4th ed. (Tokyo: Kenkyusha,
1974), xiii.
30. Toshiro Watanabe, Edmund R. Skrzypczak, and Paul Snowden, eds., Kenkyusha’s New
Japanese-English Dictionary, 5th ed. (Tokyo: Kenkyusha, 2003), [ix].
31. Barry, ed., “Japanese,” in ALA-LC Romanization Tables, 73.
32. American National Standards Institute, American National Standard System, 9.
33. Matsumura, “Word Division,” 74–109.
34. Hideyuki Morimoto, e-mail to Eastlib mailing list, July 10, 2006, http://lists.unc.edu/read/archive?id=3450452.
35. Morimoto, e-mail to Eastlib mailing list, November 30, 2006, http://lists.unc.edu/read/archive?id=3674857.
36. Morimoto, e-mail to Eastlib mailing list, December 7, 2006, http://lists.unc.edu/read/archive?id=3689364.
37. Barry, ed., “Japanese,” in ALA-LC Romanization Tables, 79.
38. Dejitaru daijisen , s.v. “ ,” available from JapanKnowledge at http://www.japanknowledge.com/ (accessed May 31, 2009).
39. Nihon kokugo daijiten , 2nd ed., s.v. “ ,” available
from JapanKnowledge at http://www.japanknowledge.com/ (accessed May 31, 2009).
40. Matsumura, “Word Division,” 98.
41. Nihon kokugo daijiten, 2nd ed., s.v. “ .”
42. Yuki Johnson, Fundamentals of Japanese Grammar: Comprehensive Acquisition (Honolulu:
University of Hawaii Press, 2008), 30.
43. Barry, ed., “Japanese,” in ALA-LC Romanization Tables, 79.
44. Ibid.
45. Ibid., 78.
46. Ibid., 82.
47. Ibid.
48. Ibid., 84.
49. Ibid., 81.
50. Ibid., 74.
51. Ibid., 76.
52. Ibid., 79.

53. Council on East Asian Libraries, Committee on Technical Processing, “Asian Materials Cataloging Questions and Answers,” CEAL Committee on Technical Processing, http://www.eastasianlib.org/ctp/index.htm.
54. Barry, ed., “Japanese,” in ALA-LC Romanization Tables, 78.
55. Ibid., 77.
56. Kanjigen , Kaitei shinpan [rev. and new ed.], s.v. “ .”
57. Association for Library Collections & Technical Services, Committee on Cataloging: Asian and
African Materials, “1997 Midwinter Meeting Minutes,” Association for Library Collections & Technical
Services, http://www.ala.org/ala/mgrps/divs/alcts/mgrps/ccs/cmtes/catalogingasiana/97MWmin.doc.
58. Council on East Asian Libraries, Committee on Technical Processing, “Questions and Answers.”
59. Library of Congress, Cataloging Policy and Support Office, Library of Congress Rule Interpretations, 2nd ed. (Washington, DC: Library of Congress, 1998–), also available from Cataloger’s Desktop.
60. Anglo-American Cataloguing Rules, 2nd ed., 2002 rev. (Chicago: ALA, 2002), also available
from Cataloger’s Desktop.
61. Library of Congress, “Descriptive Cataloging of East Asian Material: CJK Examples of AACR2 and Library of Congress Rule Interpretations,” Library of Congress, http://www.loc.gov/catdir/cpso/CJKIntro2.html.