Csaba Hatvani

Papers by Csaba Hatvani

Addressee: Olcsó Áron

Nouns in the Hungarian WordNet

Manually annotated Hungarian natural language corpus: the Szeged Korpusz

Hungarian word-sense disambiguated corpus

To create the first Hungarian WSD corpus, we selected 39 word forms that serve as good examples for studying the task of word sense disambiguation. The selection criteria required that a word form be frequent in Hungarian (measured using the frequency data of the Hungarian National Corpus (Magyar Nemzeti Szövegtár, MNSZ) [8]) and that it have several senses that can be considered frequent in use. The corpus texts were also drawn from the MNSZ, specifically from its subcorpus compiled from issues of Heti Világgazdaság (HVG). As a result, every example comes with the context relevant to the study (the full HVG article), as well as automatic tokenization, part-of-speech coding, and lemma information.

László Tihanyi

This paper presents the results of a two-year project in which a consortium of the University of Szeged and MorphoLogic Ltd., Budapest, developed a morpho-syntactically parsed and annotated (disambiguated) corpus for Hungarian. The Hungarian version of MSD (Morpho-Syntactic Description) was used for morpho-syntactic encoding. The corpus contains texts from five topic areas: schoolchildren's compositions, fiction, computer-related texts, news, and legal texts. During annotation, linguists checked the morpho-syntactic parsing of each word. The consortium's researchers also studied how to find part-of-speech tagging (disambiguation) rules with machine learning algorithms. Because the corpus reaches 1 million text words, not counting punctuation characters, it may serve as a reference source for numerous future research applications. The corpus can be obtained freely via the Internet for research and educational purposes.

Manually annotated Hungarian corpus

Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - EACL '03, 2003

Manually annotated Hungarian language corpus: the Szeged Korpusz

6720 Szeged, Árpád tér 2. {dcsendes, hacso, alexin, csirik, gyimi}@inf.u-szeged.hu http://www.inf.u-szeged.hu

2 MorphoLogic Kft Budapest, 1118 Budapest, Késmárki u. 8. proszeky@morphologic.hu http://www.morphologic.hu

3 MTA Nyelvtudományi Intézet, 1068 Budapest, Benczúr u. 33. varadi@nytud.hu http://www.nytud.hu

Abstract: In its current state, the Szeged Korpusz is a database of 1.2 million text words that is disambiguated for part of speech and shallowly parsed syntactically. The analyses were produced by rule-based automatic pre-analysis followed by manual checking and correction. The ongoing work aims at a richer syntactic analysis, that is, at building a Hungarian treebank that will also include semantic information. The corpus is accessible after registration and can be downloaded free of charge for educational and research purposes.

Methods and results of the Hungarian WordNet project

This paper presents a complete outline of the results of the Hungarian WordNet (HuWN) project: the construction process of the general vocabulary Hungarian WordNet ontology, its validation and evaluation, the construction of a domain ontology of financial terms built on top of the general ontology, and two practical applications demonstrating the utilization of the ontology.

Hungarian Word-sense Disambiguated Corpus

by Csaba Hatvani, Richárd Farkas, and Attila Almási

Proceedings of 6th International Conference on Language Resources and Evaluation, May 26, 2008

To create the first Hungarian WSD corpus, 39 suitable word form samples were selected for the purpose of word sense disambiguation. Among other selection criteria, a word form had to be frequent in Hungarian usage (measured with the frequency rates available in the Hungarian National Corpus (HNC) (Váradi, 2000)) and had to have more than one sense considered frequent in usage. The HNC and its Heti Világgazdaság (HVG) subcorpus provided the basis for corpus text selection. This way, ...
