Papers by Daniel Dechelotte
Advances in Pattern Recognition, 2008
Recent advances in the processing capabilities of handheld devices (PDAs or mobile phones) have made it possible to run speech recognition systems, and even end-to-end speech translation systems, on these devices. However, two-way free-form speech-to-speech translation (as opposed to fixed-phrase translation) is a highly complex task, and achieving reliable transformation performance requires a large amount of computation.
Proceedings of the Fourth Workshop on Statistical Machine Translation - StatMT '09, 2009
This paper describes our statistical machine translation systems for the WMT09 (en:fr) shared task. For this evaluation, we developed four systems using two different MT toolkits: our primary submission, in both directions, is based on Moses, boosted with contextual information on phrases, and is contrasted with a conventional Moses-based system. Additional contrastive systems are based on the Ncode toolkit, one of which uses (part of) the English/French GigaWord parallel corpus.
Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation - SSST '07, 2007
The purpose of this work is to explore the integration of morphosyntactic information into the translation model itself, by enriching words with their morphosyntactic categories. We investigate word disambiguation using morphosyntactic categories, n-best hypothesis reranking, and the combination of both methods with word or morphosyntactic n-gram language model reranking. Experiments are carried out on the English-to-Spanish translation task. Using the morphosyntactic language model alone does not result in any improvement in performance. However, combining morphosyntactic word disambiguation with a word-based 4-gram language model yields a relative BLEU improvement of 2.3% on the development set and 1.9% on the test set.
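The abstract does not say how the n-best lists are reranked. A common approach, sketched here under that assumption, scores each hypothesis as a weighted sum of log-domain feature scores (for example, the decoder score plus a word or morphosyntactic n-gram language model score) and keeps the top-scoring one. The feature names, weights, and example sentences below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    # Log-domain feature scores; the names used below are hypothetical.
    features: dict[str, float]


def loglinear_score(hyp: Hypothesis, weights: dict[str, float]) -> float:
    """Weighted sum of feature scores; features without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in hyp.features.items())


def rerank(nbest: list[Hypothesis], weights: dict[str, float]) -> list[Hypothesis]:
    """Return the n-best list sorted from best to worst combined score."""
    return sorted(nbest, key=lambda h: loglinear_score(h, weights), reverse=True)


# Hypothetical 2-best list for an English-to-Spanish input.
nbest = [
    Hypothesis("la casa blanca", {"decoder": -4.0, "word_lm": -6.0}),
    Hypothesis("la blanca casa", {"decoder": -3.5, "word_lm": -9.0}),
]

print(rerank(nbest, {"decoder": 1.0})[0].text)                  # la blanca casa
print(rerank(nbest, {"decoder": 1.0, "word_lm": 0.5})[0].text)  # la casa blanca
```

Adding the language model feature flips the choice (-7.0 versus -8.0), which is the kind of correction reranking is meant to make; such weights are typically tuned on a development set.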
IEEE Workshop on Automatic Speech Recognition and Understanding, 2005
This paper reports on recent experiments in speech-to-text (STT) translation of European Parliamentary speeches. A system translating Spanish speech into English text has been built using data from the TC-STAR European project. The speech recognizer is a state-of-the-art multipass system trained for the Spanish EPPS task, and the statistical translation system relies on the IBM-4 model. First, MT results are compared using manual transcriptions and 1-best ASR hypotheses with different word error rates. Then, an n-best interface between the ASR and MT components is investigated to improve the STT process. Derivation of the fundamental equation for machine translation suggests that the source language model is not necessary for STT. This was investigated by using weak source language models and by n-best rescoring with only the acoustic model score added. A significant loss in the BLEU score was observed, suggesting that the source language model is needed given the insufficiencies of the translation model. Adding the source language model score in the n-best rescoring process recovers the loss and slightly improves the BLEU score over the 1-best ASR hypothesis. The system achieves a BLEU score of 37.3 with an ASR word error rate of 10% and a BLEU score of 40.5 using the manual transcripts.
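One standard way to write the derivation this abstract refers to is sketched below. The notation is chosen for illustration and is not taken from the paper: x is the acoustic input, f the source-language (Spanish) transcription, and e the target-language (English) sentence. The key assumption is that x depends on e only through f, i.e. P(x | f, e) = P(x | f).

$$
\hat{e} = \arg\max_{e} P(e \mid x)
= \arg\max_{e} P(e)\, P(x \mid e)
= \arg\max_{e} P(e) \sum_{f} P(f \mid e)\, P(x \mid f)
\approx \arg\max_{e} \max_{f} P(e)\, P(f \mid e)\, P(x \mid f)
$$

Only the target language model P(e), the translation model P(f | e), and the acoustic model P(x | f) appear; a source language model P(f) does not, which is the prediction the weak-source-LM and acoustic-score-only rescoring experiments put to the test.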
Eighth International Conference on …, 2004
This paper presents a two-way speech translation system that is completely hosted on an off-the-shelf handheld device. Specifically, this end-to-end system includes an HMM-based large vocabulary continuous speech recognizer (LVCSR) for both English and Chinese ...
IEEE Transactions on Audio, Speech, and Language Processing
This paper describes an approach for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The consensus translation is computed by weighted majority voting on a confusion network, similar to the well-established ROVER approach of Fiscus for combining speech recognition hypotheses. To create the confusion network, pairwise word alignments of the original MT hypotheses are learned using an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole corpus of automatic translations, rather than a single sentence, is taken into account in order to achieve high alignment quality. The confusion network is rescored with a special language model, and the consensus translation is extracted as the best path. The proposed system combination approach was evaluated in the framework of the TC-STAR speech translation project. Up to six state-of-the-art statistical phrase-based translation systems from different...
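A minimal sketch of the voting step, assuming the pairwise alignment has already produced a confusion network (a sequence of slots, each holding one word or one empty arc per system) and that each system carries a fixed weight. The language model rescoring described in the abstract is omitted, and the names `consensus`, `EPSILON`, and the example systems A, B, C are hypothetical.

```python
from collections import defaultdict

EPSILON = ""  # hypothetical marker for an empty arc (the system has no word in this slot)

# A confusion network: one slot per position, each slot holding (word, system) pairs.
ConfusionNetwork = list[list[tuple[str, str]]]


def consensus(network: ConfusionNetwork, system_weights: dict[str, float]) -> list[str]:
    """Weighted majority vote per slot; winning empty arcs are dropped from the output."""
    output = []
    for slot in network:
        votes: dict[str, float] = defaultdict(float)
        for word, system in slot:
            votes[word] += system_weights[system]
        # Ties go to the word seen first in the slot (dicts keep insertion order).
        best = max(votes, key=votes.get)
        if best != EPSILON:
            output.append(best)
    return output


network = [
    [("the", "A"), ("the", "B"), ("a", "C")],
    [("house", "A"), ("home", "B"), ("house", "C")],
    [("is", "A"), ("is", "B"), (EPSILON, "C")],
    [(EPSILON, "A"), (EPSILON, "B"), ("very", "C")],
    [("small", "A"), ("small", "B"), ("little", "C")],
]
weights = {"A": 0.4, "B": 0.35, "C": 0.25}

print(" ".join(consensus(network, weights)))  # the house is small
```

Because votes are summed per word, two lower-weighted systems that agree can outvote a single higher-weighted one, as in the second slot where A and C together select "house".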