We investigate an automatic method for Cross Language Information Retrieval (CLIR) that uses the multilingual UMLS Metathesaurus to translate Spanish and French natural language queries into English. Two experiments are presented using OHSUMED, a subset of MEDLINE. Both experiments examine the retrieval effectiveness of the translated queries. In the second experiment, however, the query translation procedure is augmented with digram-based vocabulary normalization procedures. Three measures are used in this comparative study of retrieval effectiveness: the 11-point average precision score (11-AvgP); average interpolated precision at a recall of 0.1; and noninterpolated (i.e., exact) precision after 10 retrieved documents. Our results indicate that for Spanish the UMLS Metathesaurus-based CLIR method appears equivalent to the multilingual dictionary-based approaches investigated in the current literature. French yields less favorable results, and our analysis suggests that linguistic differences may have caused the performance differences.
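The three effectiveness measures named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the evaluation code used in the paper; the function names and the toy ranking are assumptions introduced here.

```python
# Hypothetical helper names; standard textbook definitions of the measures.
RECALL_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def precision_recall_points(ranking, relevant):
    """Return a (recall, precision) pair after each rank position."""
    if not relevant:
        raise ValueError("the set of relevant documents must be non-empty")
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return points


def interpolated_precision(points, level):
    """Highest precision at any point with recall >= level (0.0 if none)."""
    return max((p for r, p in points if r >= level), default=0.0)


def eleven_point_average(ranking, relevant):
    """Mean of the interpolated precision at the 11 standard recall levels."""
    points = precision_recall_points(ranking, relevant)
    return sum(interpolated_precision(points, lv) for lv in RECALL_LEVELS) / 11


def precision_at_k(ranking, relevant, k=10):
    """Exact (noninterpolated) precision after k retrieved documents."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Toy example (made-up document ids, not OHSUMED data).
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3"}
points = precision_recall_points(ranking, relevant)
print(eleven_point_average(ranking, relevant))
print(interpolated_precision(points, 0.1))
print(precision_at_k(ranking, relevant, k=10))
```

Note that `precision_at_k` divides by `k` even when fewer than `k` documents were retrieved, which is one common convention; other conventions exist.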
... Miguel E. Ruiz and Padmini Srinivasan, The University of Iowa, 3087 Main Library, Iowa City, IA 52246-1420. Phone: 1 (319) 335-5707 ... Text categorization can be characterized as a supervised learning problem. ...
We present results of the University of Iowa topic tracking and detection as well as story segmentation efforts. Topic tracking is performed for the "boundaries given" case. The DET curves for all the runs are consistently smooth and concave, suggesting that no sudden changes in expectation are required from the user. The effect of reducing the training size of relevant stories is examined. The detection runs are performed using a "pipeline" model to take advantage of the deferral period. Performance is strongly influenced by the fact that roughly 2000 to 3000 declared topic clusters are generated during the detection runs. Performance is analyzed with respect to changing the cluster threshold. In segmentation, an agglomerative clustering strategy is adopted. The decision to declare a boundary depends upon both the lexical similarity of neighboring segments and the pause duration. The algorithmic complexity of the method is O(k log k), where k is the number of pause-delimited sentences in the file. The tracking, detection and segmentation modules provide a sound framework for future extension and experimentation.
Vocabulary mining in information retrieval refers to the use of the domain vocabulary to improve the user's query. Queries posed to information retrieval systems are most often not optimal for retrieval purposes. Vocabulary mining allows one to generalize, specialize or perform other kinds of vocabulary-based transformations on the query in order to improve retrieval performance. This paper investigates a new framework for vocabulary mining that derives from the combination of rough sets and fuzzy sets. The framework allows one to use rough set-based approximations even when the documents and queries are described using weighted, i.e., fuzzy, representations. The paper also explores the application of generalized rough sets and the variable precision models. The problem of coordination between multiple vocabulary views is also examined. Finally, a preliminary analysis of issues that arise when applying the proposed vocabulary mining framework to the Unified Medical Language System (a state-of-the-art vocabulary system) is presented. The proposed framework supports the systematic study and application of different vocabulary views in information retrieval.
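As a rough illustration of the rough set-based approximations mentioned in this abstract, the sketch below computes the classical (crisp) lower and upper approximations of a query's term set with respect to a partition of the vocabulary into equivalence classes. It does not implement the fuzzy, generalized, or variable precision extensions the paper studies; the function name and the toy synonym classes are assumptions made for this example.

```python
def approximations(classes, target):
    """Classical rough set approximations of `target`.

    classes: iterable of sets forming a partition of the vocabulary
             (for example, groups of terms treated as equivalent).
    target:  set of terms, such as the terms of a query.
    Returns (lower, upper): the union of classes fully contained in
    `target`, and the union of classes that overlap `target`.
    """
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class shares at least one term with the target
            upper |= cls
    return lower, upper


# Toy vocabulary view (hypothetical synonym classes).
classes = [{"heart", "cardiac"}, {"attack", "infarction"}, {"aspirin"}]
query = {"heart", "cardiac", "attack"}
lower, upper = approximations(classes, query)
print(sorted(lower))  # terms the query certainly covers
print(sorted(upper))  # terms the query possibly covers
```

In this reading, the lower approximation acts as a more specific version of the query, while the upper approximation acts as a generalization of it.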
This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model uses a divide-and-conquer principle to define smaller categorization problems based on a predefined hierarchical structure. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure and the OHSUMED test set of MEDLINE records. Comparisons are provided with an optimized version of the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers. The results show that the use of the hierarchical structure improves text categorization performance with respect to an equivalent flat model. The optimized Rocchio algorithm achieves performance comparable with that of the hierarchical neural networks.
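For readers unfamiliar with the Rocchio baseline named in this abstract, here is a minimal sketch of a plain Rocchio prototype classifier for one category. It is not the optimized version evaluated in the paper; the weights `beta` and `gamma`, the clipping of negative weights, and all function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse term-weight dictionaries."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {t: w / n for t, w in total.items()} if n else {}


def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.25):
    """Prototype = beta * positive centroid - gamma * negative centroid.

    beta and gamma are illustrative defaults, not values from the paper.
    Terms whose resulting weight is not positive are dropped.
    """
    pos, neg = centroid(positives), centroid(negatives)
    terms = set(pos) | set(neg)
    proto = {t: beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0) for t in terms}
    return {t: w for t, w in proto.items() if w > 0}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy training data (made-up term weights).
positives = [{"heart": 1.0, "attack": 1.0}, {"heart": 1.0}]
negatives = [{"lung": 1.0}]
proto = rocchio_prototype(positives, negatives)
print(cosine(proto, {"heart": 1.0}))  # similarity of an in-category document
print(cosine(proto, {"lung": 1.0}))   # similarity of an out-of-category document
```

A document would be assigned to the category when its similarity to the prototype exceeds a threshold tuned on held-out data.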
IEEE Transactions on Knowledge and Data Engineering, 1999
We develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach using two real-world document collections from the MEDLINE database. Next, we investigate the application of automatic categorization to text retrieval. Our experiments clearly indicate that automatic categorization improves retrieval performance compared with no categorization. We also demonstrate that retrieval using automatic categorization achieves the same retrieval quality as retrieval using manual categorization. Furthermore, a detailed analysis of the retrieval performance on each individual test query is provided.
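The instance-based learning idea in this abstract can be illustrated with a simple k-nearest-neighbour category scorer: the categories of the most similar training documents vote for the new document, weighted by similarity. This sketch omits the retrieval feedback component and is not the method developed in the paper; the function names, the value of `k`, and the toy data are assumptions.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_category_scores(doc, training, k=3):
    """Score categories for `doc` from its k most similar training instances.

    training: list of (term_weight_dict, set_of_categories) pairs.
    Each neighbour adds its similarity to the score of every category
    it is labelled with; neighbours with zero similarity add nothing.
    """
    scored = [(cosine(doc, vec), cats) for vec, cats in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    scores = defaultdict(float)
    for sim, cats in scored[:k]:
        if sim > 0:
            for cat in cats:
                scores[cat] += sim
    return dict(scores)


# Toy training set (made-up documents and category labels).
training = [
    ({"heart": 1.0, "attack": 1.0}, {"Cardiology"}),
    ({"heart": 1.0}, {"Cardiology"}),
    ({"lung": 1.0, "cancer": 1.0}, {"Oncology"}),
]
print(knn_category_scores({"heart": 1.0}, training, k=2))
```

The highest-scoring categories would then be assigned to the document, for instance by keeping every category above a cutoff.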
Papers by Miguel Ruiz