This paper describes the joint submission of two systems to the all-words WSD sub-task of SemEval-2007 task 17. The main goal of this work was to build a competitive unsupervised system by combining heterogeneous algorithms. As a secondary goal, we explored the integration of unsupervised predictions into a supervised system by different means.
In this paper, matching networks of finite degree are computed. Additionally, the presented results are compared with the lower fundamental bounds available in the literature. These bounds are used to certify the optimality of the provided matching networks as a function of the attained matching tolerance. To illustrate the results, two different examples of matching problems are presented.
AEU - International Journal of Electronics and Communications, 2019
This paper presents a methodology to design filters with folded canonical topologies, which implement cross couplings between non-adjacent resonators. The technique is based on segmenting the traditional coupling matrix in a step-by-step fashion. At each step, a subset of the whole physical structure is optimized to match the response of the corresponding segment of the coupling matrix. In the context of this design technique, we propose an efficient segmentation methodology of the coupling matrix based on multiport networks. The use of multiport networks makes it possible to generate several goal functions at each step, which are used simultaneously during the optimization of the corresponding physical segment. These multiport networks also make it possible to monitor efficiently the different signal paths present in folded canonical topologies. It is shown that this strategy leads to fast convergence of the step-by-step segmentation technique for the design of this type of coupling topology. We apply the proposed methodology to the design of two filters using the quartet topology. The first filter has two transmission zeros placed on the real frequency axis, and the second one has two complex transmission zeros intended for group delay equalization. The results indicate that the proposed methodology is effective for the design of this type of coupling topology, leading to initial filter dimensions that typically differ by less than 1% from those obtained from a final global optimization.
This article focuses on Word Sense Disambiguation (WSD), a Natural Language Processing task that is thought to be important for many Language Technology applications, such as Information Retrieval, Information Extraction, or Machine Translation. One of the main issues preventing the deployment of WSD technology is the lack of training examples for Machine Learning systems, also known as the Knowledge Acquisition Bottleneck. A method which has been shown to work for small samples of words is the automatic acquisition of examples. We have previously shown that one of the most promising example acquisition methods scales up and produces a freely available database of 150 million examples from Web snippets for all polysemous nouns in WordNet. This paper focuses on the issues that arise when using those examples, alone or in addition to manually tagged examples, to train a supervised WSD system for all nouns. The extensive evaluation on both lexical-sample and all-words Sens...
Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, 2000
This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora. We show that the hypothesis is weaker for fine-grained sense distinctions (70% vs. 99% reported earlier on 2-way ambiguities). We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to the other, following genre and topic variations. This explains the low results when performing word sense disambiguation across corpora. In fact, we demonstrate that when two independent corpora share a related genre/topic, the word sense disambiguation results are better. Future work on word sense disambiguation will have to take genre and topic into account as important parameters in its models.
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing - EMNLP '06, 2006
This paper explores the use of two graph algorithms for unsupervised induction and tagging of nominal word senses based on corpora. Our main contribution is the optimization of the free parameters of those algorithms and its evaluation against publicly available gold standards. We present a thorough evaluation comprising supervised and unsupervised modes, and both lexical-sample and all-words tasks. The results show that, in spite of the information loss inherent to mapping the induced senses to the gold-standard, the optimization of parameters based on a small sample of nouns carries over to all nouns, performing close to supervised systems in the lexical sample task and yielding the second-best WSD systems for the Senseval-3 all-words task.
Persistent organic pollutants (POPs) are suggested to contribute to lower vitamin D levels; however, studies in humans are scarce and have never focused on pregnancy, a susceptibility period for vitamin D deficiency. We investigated whether serum levels of POPs were associated with circulating 25-hydroxyvitamin D3 [25(OH)D3] concentration in pregnancy. Cross-sectional associations of serum concentrations of eight POPs with plasma 25(OH)D3 concentration were analyzed in 2031 pregnant women participating in the Spanish population-based cohort INfancia y Medio Ambiente (INMA) Project. Serum concentrations of POPs were measured by gas chromatography and plasma 25(OH)D3 concentration was measured by high-performance liquid chromatography in pregnancy (mean 13.3 ± 1.5 weeks of gestation). Multivariable regression models were performed to assess the relationship between blood concentrations of POPs and 25(OH)D3. An inverse linear relationship was found between serum concentration of PCB180 and circulating 25(OH)D3. Multivariate linear regression models showed higher PCB180 levels to be associated with lower 25(OH)D3 concentration: quartile Q4 vs. quartile Q1, coefficient = −1.59, 95% CI −3.27, 0.08, p trend = 0.060. A non-monotonic inverse relationship was found between the sum of predominant PCB congeners (PCB 180, 153 and 138) and 25(OH)D3 concentration: coefficient (95% CI) for quartile Q2 vs. Q1 [−0.50 (−1.94, 0.94)], quartile Q3 vs. Q1 [−1.56 (−3.11, −0.02)] and quartile Q4 vs. Q1 [−1.21 (−2.80, 0.38)], p trend = 0.081. No significant associations were found between circulating 25(OH)D3 and serum levels of p,p′-DDE, p,p′-DDT, HCB, and β-HCH. Our results suggest that the background exposure to PCBs may result in lower 25(OH)D3 concentration in pregnant women.
This paper describes a method for detecting event trigger words in biomedical text based on a word sense disambiguation (WSD) approach. We first investigate the applicability of existing WSD techniques to trigger word disambiguation in the BioNLP 2009 shared task data, and find that we are able to outperform a traditional CRF-based approach for certain word types. On the basis of this finding, we combine the WSD approach with the CRF, and obtain significant improvements over the standalone CRF, gaining particularly in recall.
Working Package 6 aims at developing accurate Word Sense Disambiguation (WSD) methods for a number of languages. The main problem for current systems is the acquisition bottleneck. For instance, supervised systems (which get the most accurate results) need large amounts of hand-tagged data, which are not currently available.
The results of recent WSD exercises, e.g. Senseval-2 [Edmonds and Cotton, 2001], show clearly that WSD methods based on hand-tagged examples are the ones performing best. However, the main drawback for supervised WSD is the knowledge acquisition bottleneck: the systems need large amounts of costly hand-tagged data. The situation is more dramatic for lesser-studied languages. In order to overcome this problem, different research lines have been explored: automatic acquisition of training examples [Mihalcea, 2002], ...
Proceedings of the Fourth International Conference on Language Resources and Evaluations (LREC-04) (Lisbon, Portugal, May 24, 2004)
The goal of this paper is to explore the large-scale automatic acquisition of sense-tagged examples to be used for Word Sense Disambiguation (WSD). We have applied the “monosemous relatives” method on the Web in order to build such a resource for all nouns in WordNet. The analysis of some parameters revealed that the distribution of the word senses (bias) in the training and test corpus is a determinant factor. Provided there is a method to approximate the bias for each word sense, the results we obtained for English ...
ACM Transactions on Asian Language Information Processing, 2010
This article reconsiders the task of MRD-based word sense disambiguation, extending the basic Lesk algorithm to investigate the impact on WSD performance of different tokenization schemes and methods of definition extension. In experimentation over the Hinoki Sensebank and the Japanese Senseval-2 dictionary task, we demonstrate that sense-sensitive definition extension over hyponyms, hypernyms, and synonyms, combined with definition extension and word tokenization, leads to WSD accuracy above both unsupervised and supervised baselines. In doing so, we demonstrate the utility of ontology induction and establish new opportunities for the development of baseline unsupervised WSD methods.
Papers by David Martinez