To cite this article: Wooseob Jeong, Joy Kim & Miree Ku (2009) Spaces in Korean Bibliographic
Records: To Be, or Not to Be, Cataloging & Classification Quarterly, 47:8, 708-721, DOI:
10.1080/01639370903203382
Cataloging & Classification Quarterly, 47:708–721, 2009
Copyright © Taylor & Francis Group, LLC
ISSN: 0163-9374 print / 1544-4554 online
DOI: 10.1080/01639370903203382
WOOSEOB JEONG
School of Information Studies, University of Wisconsin–Milwaukee,
Milwaukee, Wisconsin, USA
Downloaded by [Universitat Politècnica de València] at 02:21 29 October 2014
JOY KIM
Korean Heritage Library, University of Southern California, Los Angeles, California, USA
MIREE KU
Duke University, Durham, North Carolina, USA
INTRODUCTION
Spaces in Korean Bibliographic Records 709
LITERATURE REVIEW
The two purposes of this study are to investigate how spacing in the Korean
script fields affects retrieval in various systems and to propose ways to deal
with the problems, both individually and as a community.
METHODOLOGY
LIMITATIONS
Each local database is unique: it is made up of records created over a long
period by different people, and it serves varied clienteles. The records at
each institution reflect idiosyncratic local practices that may not be
duplicated anywhere else. Each system is also more or less unique; one vendor
may supply the same product to multiple institutions, but each campus may
customize its system. For these reasons, our results are difficult to
generalize.
For this study, we used only basic search methods, such as Keyword Search
and Title Keyword Search, on the assumption that library users rely on basic
keyword searching most often.
The North American Korean Studies librarianship field is a small community
of fewer than 25 institutions. Accordingly, the number of survey participants
and the number of responses in each group were too small for inferential
statistics.
712 W. Jeong et al.
DATA ANALYSIS
Survey of Librarians on Korean Language Support in Their OPAC System
To learn about the current state of Korean language support in the OPAC
systems of North American universities, we conducted a brief online survey
via Eastlib, a listserv. A total of 23 people responded, representing 22
institutions. These institutions use 7 or 8 different systems (one respondent
did not specify the system name). The respondents were mostly expert Korean
studies librarians but also included a few librarians with limited Korean
expertise whose responsibilities include working with
Voyager (9) 8 1 7 1 1
SIRSI (4) 0 4 0 0 4
Innovative (4) 3 1 4 0 0
Aleph (2) 2 0 0 2 0
Horizon (2) 2 0 1 1 0
GEAC (1) 0 1 0 0 1
TLC (1) 0 1 0 0 1
Total 15 8 12 4 7
Librarians 11 2 1 0
Users (observed by librarians) 4 4 3 3
∗
Reasons:
  Better, more comprehensive, & more accurate results; ease of searching
  Convenient, except for Chinese characters; cannot romanize
  Different searches require different strategies
option. One librarian who employs both said that different queries require
different strategies.
However, when the librarians were asked about their users' favorite ways
to search for Korean materials, the answers were spread nearly evenly: four
(4) romanization, four (4) scripts, and three (3) both equally. While the
difference between the two groups is curious, the information on users'
search behavior was based on the librarians' casual perceptions rather than
systematic observation, so it may not accurately reflect reality. A direct
survey of users would have been more reliable.
Table 2 shows a summary of the survey on favorite search methods for
Korean materials.
Other findings from the survey are that Chinese characters (called Hancha
in Korea) are rarely used, and then only incidentally, and that most
non-native Korean librarians who perform Korean studies librarianship duties
part-time were either unaware of the role of spaces in Korean script fields
or misunderstood the issue.
Keyword Search
No. Institution System Script (no space) Script (with space) Kugo munpop
1 Duke Aleph 17 17 8
2 Harvard Aleph 37 37 51
3 Univ. of Michigan Aleph 38 38 20
4 Melvyl (UC) Aleph 90 90 5
5 WorldCat OCLC 234 234 103
6 Univ. of Chicago Horizon 50 50 36
7 Library of Congress Voyager no result 452 1015
8 Univ. of British Columbia Voyager no result 20 36
TABLE 4 Search Results of Korean Scripts versus Romanization from Duke University OPAC
in the third column were not retrieved from our Hangul searches because
they contained the search terms in Chinese characters. This illustrates the
need for cross mapping between Korean and Chinese. Much Korean scholarly
vocabulary is derived from Chinese, and Korean scholarly works commonly use
Chinese characters. Such Sino-Korean words can be written either in Hangul
or in their Chinese-character (Hancha) forms, and the two are
interchangeable. However, since Hangul and Hancha are not cross-referenced,
queries in Hangul cannot retrieve the many works written in Hancha, and vice
versa. Another issue involves phonetic changes in romanization. In records
where affixes are attached to the search words, the affixes cause euphonic
changes in the pronunciation of those words, which are therefore romanized
differently in the McCune-Reischauer (MR) system. Thus “kugo” becomes
“Hangugo” and “munpop” becomes “munpomnon,” and such records cannot be
retrieved with the search terms “kugo munpop.” As a result, 10 highly
relevant records were retrievable only by Korean script searches because of
the affixes in the words.
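The two retrieval failures described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' test setup: the record strings are hypothetical, built from the romanized forms quoted in the text (written without diacritics, as in the text), and the two-entry table `HANCHA_TO_HANGUL` is a hypothetical stand-in for the Hangul-Hancha cross-reference the text calls for. Real Hancha-to-Hangul conversion is not always one-to-one, so a working mapping would need more care.

```python
# Hypothetical records for illustration only (not data from the Duke OPAC test).
ROMANIZED_RECORD = "Hangugo munpomnon"   # romanized title field
HANGUL_RECORD = "한국어 문법론"            # the same title in Hangul
HANCHA_RECORD = "國語 文法"                # a title written in Hancha

# Hypothetical two-entry Hancha-to-Hangul cross-reference table.
HANCHA_TO_HANGUL = {"國語": "국어", "文法": "문법"}


def romanized_keyword_match(query: str, field: str) -> bool:
    """Every query word must appear as a whole word in the romanized field."""
    field_words = field.lower().split()
    return all(word in field_words for word in query.lower().split())


def hangul_substring_match(query: str, field: str) -> bool:
    """Every query word must occur somewhere in the field once spaces are removed."""
    compact_field = "".join(field.split())
    return all(word in compact_field for word in query.split())


def hancha_to_hangul(text: str) -> str:
    """Rewrite known Hancha words as Hangul using the cross-reference table."""
    for hancha, hangul in HANCHA_TO_HANGUL.items():
        text = text.replace(hancha, hangul)
    return text


if __name__ == "__main__":
    # Euphonic change: "kugo" and "munpop" never appear as words in the romanized field.
    print(romanized_keyword_match("kugo munpop", ROMANIZED_RECORD))   # False
    # The Hangul query words survive inside the affixed forms.
    print(hangul_substring_match("국어 문법", HANGUL_RECORD))           # True
    # Without cross-referencing, a Hangul query misses the Hancha record...
    print(hangul_substring_match("국어 문법", HANCHA_RECORD))           # False
    # ...but matches once the Hancha is mapped to Hangul.
    print(hangul_substring_match("국어 문법", hancha_to_hangul(HANCHA_RECORD)))  # True
```

Substring matching on Hangul is itself a design choice with a cost: it can also match inside unrelated longer words.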
by using any of these index terms. We note that the bi-gram segmentation
technique produces some irrelevant or meaningless index terms (such as
in this case), which would theoretically result in false drops.
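The bi-gram technique can be sketched as follows. This is a minimal Python illustration assuming precomposed (NFC) Hangul syllables and syllable-level bigrams taken after all spaces are removed; the function names are hypothetical, and the sketch does not reproduce any particular OPAC's indexer. It shows why bigram indexing makes spacing irrelevant, and also why it produces index terms that span a word boundary (here '어문', which exists only because the two words were joined).

```python
def syllable_bigrams(text: str) -> list[str]:
    """Drop all whitespace, then return each pair of adjacent syllables."""
    compact = "".join(text.split())
    return [compact[i:i + 2] for i in range(len(compact) - 1)]


def bigram_match(query: str, field: str) -> bool:
    """A field matches when it contains every bigram of the query."""
    compact_query = "".join(query.split())
    if len(compact_query) < 2:
        # A one-syllable query has no bigrams; fall back to a plain substring test.
        return compact_query in "".join(field.split())
    field_terms = set(syllable_bigrams(field))
    return all(term in field_terms for term in syllable_bigrams(query))


if __name__ == "__main__":
    # Spaced and unspaced forms yield identical index terms.
    print(syllable_bigrams("국어 문법"))   # ['국어', '어문', '문법']
    print(syllable_bigrams("국어문법"))    # ['국어', '어문', '문법']
    print(bigram_match("국어문법", "국어 문법"))  # True
```

Because a cross-boundary bigram such as '어문' can also occur as a genuine word in other records, this kind of term is one way the false drops mentioned in the text could arise.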
Our test searches in Korean OPACs produced somewhat inconsistent results.
We suspect this is because each system has its own local indexing settings.
Tables 5 and 6 show the results of comparative searches in four OPAC systems
in Korea using variations of a longer and a shorter Korean string,
respectively. As Table 5 shows, longer queries seemed to generate more
conflicting results, presumably because longer query strings allow more
spacing variations. Table 6 shows that shorter queries produce more
consistent results regardless of spacing.
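A rough count shows why longer strings leave more room for inconsistency. Assuming, as a simplification that ignores Korean word division rules, that each gap between two adjacent syllables may or may not contain a space, a string of n syllables has 2^(n-1) spacing variants. The Python sketch below enumerates them; the function name and sample strings are hypothetical and are not the strings used in Tables 5 and 6.

```python
from itertools import product


def spacing_variants(text: str) -> list[str]:
    """Every way of placing or omitting one space in each gap between syllables."""
    syllables = list("".join(text.split()))
    if not syllables:
        return []
    variants = []
    # product(...) yields one choice ("" or " ") per gap; there are len - 1 gaps.
    for gaps in product(("", " "), repeat=len(syllables) - 1):
        pieces = [syllables[0]]
        for gap, syllable in zip(gaps, syllables[1:]):
            pieces.append(gap + syllable)
        variants.append("".join(pieces))
    return variants


if __name__ == "__main__":
    for sample in ("국어", "국어문법", "한국어문법론"):
        print(sample, len(spacing_variants(sample)))
    # 국어 2
    # 국어문법 8
    # 한국어문법론 32
```

Each additional syllable doubles the count, which is consistent with the observation that longer queries gave less consistent results across systems.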
Our study so far suggests that the role of spaces in Korean script searching
on OPAC systems in North America and Korea depends largely on how individual
libraries configure their indexing mechanisms, and these configurations
vary. This is a serious issue for librarians who deal with Korean materials
daily in today's networked global environment. Without a standard or a
common shared practice, their work in bibliographic control and user service
becomes unnecessarily complicated. The area clearly calls for further study
and improvement.
In an attempt to form a consensus among Korean studies librarians
and make a recommendation to the Library of Congress, the Committee
Therefore, they have no way of knowing the spacing variations used in the
resource. OCLC and the Library of Congress each continue to maintain their
own practices, so as of this writing the vote has had no impact.
RECOMMENDATIONS
We have demonstrated the need for a common standard for Korean script fields
in North American catalogs. While achieving this goal may take time, we
propose the following interim solutions, which can be implemented locally.
ALA/LC/RLIN
Advantages:
• Easy for catalogers to apply when creating records: just follow the same spacing used in the romanized fields
• Predictable for both librarians and users
Disadvantages:
• Not user friendly; the average user finds the rules unfamiliar and difficult to learn
• Mediated search by a reference librarian often necessary
• Looks and feels unnatural to average users (i.e., native speakers)
• Inconsistent with OCLC spacing practice for Chinese and Japanese, which may mislead unsuspecting CJ colleagues. The survey responses confirm this concern.

No Space/OCLC
Advantages:
• Easy for catalogers to apply when creating records
• Easy for everyone to search (only one “rule” to learn and remember)
• Predictable for everyone
• Consistent with Chinese and Japanese practice
Disadvantages:
• Looks unnatural
• Hard to read

As in Resource
Advantages:
• Easy for catalogers to apply when creating records: just copy the resource
Disadvantages:
• Paying attention to spaces as well as words when creating records may become an added chore for catalogers
• Unpredictable, since publishers often ignore word division rules when designing the title page, cover, colophon, and spine (descriptive sources)
• Hard to search, since users do not see the resource when they search for it in library catalogs
• Totally dependent on the whims of publishers and users; systematic recall cannot be expected
• Requires superior indexing mechanisms to work, over which we have little control
• Inconsistent with OCLC spacing practice for Chinese and Japanese, which can mislead unsuspecting CJ colleagues

Korean Word Division Rules
Advantages:
• Rules are well established and based on scientific principles
• Widely accepted
• Look and feel natural
• Many users are already familiar with the rules
Disadvantages:
• For those unfamiliar with the rules, there is a learning curve
• Inconsistent with OCLC spacing practice for Chinese and Japanese, which can mislead unsuspecting CJ colleagues
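Independently of which cataloging option is chosen, a library that controls its own indexing could make Korean script matching space-insensitive. A minimal Python sketch, assuming the library can normalize both the indexed field and the user's query (the function names are hypothetical): text is NFC-normalized so that Hangul entered as conjoining jamo compares equal to precomposed syllables, and all whitespace is then removed.

```python
import unicodedata


def normalize_korean(text: str) -> str:
    """NFC-normalize (compose jamo into syllables), then remove all whitespace."""
    return "".join(unicodedata.normalize("NFC", text).split())


def space_insensitive_match(query: str, field: str) -> bool:
    """True when the normalized query occurs anywhere in the normalized field."""
    return normalize_korean(query) in normalize_korean(field)


if __name__ == "__main__":
    print(space_insensitive_match("국어 문법", "국어문법 연구"))   # True
    print(space_insensitive_match("국어문법", "국어 문법"))       # True
    decomposed = unicodedata.normalize("NFD", "국어 문법")      # jamo sequence
    print(space_insensitive_match(decomposed, "국어문법"))      # True
```

The trade-off is the same one noted for bi-gram indexing: once spaces are discarded, a query can match across word boundaries, so some false drops should be expected.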
Hancha.
CONCLUSION
NOTES
1. Eugene Wu, “CEAL at the Dawn of the 21st Century,” JEAL no. 121 (June 2000), http://contentdm.lib.byu.edu/u?/EastAsianLibraries,152
2. Council on East Asian Libraries Statistics, CEAL Annual Statistics from All Institutions, Year 1950 to 2008 (every 3 years), http://www.lib.ku.edu/ceal/quickview.asp?view=all yearly&step=3&tblview=1&from=1950&to=2008
3. “WorldCat Records by Language,” http://www.oclc.org/us/en/worldcat/statistics/charts/languagecloud.htm
4. http://www.loc.gov/catdir/cpso/romanization/korean.pdf
5. “2008 CKM Annual Meeting Agenda,” http://www.eastasianlib.org/ckm/meetings/p2008.html
6. J. J. Lee, H. Y. Cho, and H. R. Park, “N-gram-based Indexing for Korean Text Retrieval,” Information Processing & Management 35, no. 4 (1999): 427–441.
7. J. Savoy, “Comparative Study of Monolingual and Multilingual Search Methods for Use with Asian Languages,” ACM Transactions on Asian Language Information Processing 4, no. 2 (2005): 163–189.
8. J. J. Lee, H. Y. Cho, and H. R. Park, “N-gram-based Indexing for Korean Text Retrieval”; J. H. Lee and J. S. Ahn, “Using n-grams for Korean Text Retrieval,” Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland (1996): 216–224.
9. R. W. Sproat, Morphology and Computation (Cambridge, Mass.: MIT Press, 1992); K. R. Beesley and L. Karttunen, Finite State Morphology (Stanford, Calif.: CSLI Publications, 2003).
10. Seung-Shik Kang and Yung Taek Kim, “Syllable-based Model for the Korean Morphology.”
Proceedings of the 15th Conference on Computational Linguistics, Kyoto, Japan, August 5–9, 1994; Deok-
Bong Kim et al., “A Two-level Morphological Analysis of Korean.” Proceedings of the 15th Conference on
Computational Linguistics, Kyoto, Japan, August 5–9, 1994; Hyun S. Park, “Integrating Phrase Structure
Grammar Rules with Spelling Rules for Morphological Analysis of Korean.” Proceedings of the 18th
APPENDIX
Survey on Character Searching (Hangul, Hancha) in Local OPACs
1. Your name (optional):
2. Institution:
3. Your OPAC vendor:
4. Does your OPAC support Hangul/Hancha searching?
No (Please skip to item 10 at the end of the survey)
Yes, since (month/year):
5. What is your favorite way of searching?
By romanization By characters No preference
Why?