SOM-based corpus modeling for disambiguation purposes in MT

Year: 2012
Type of Publication: In Proceedings
Keywords: Machine translation, word translation disambiguation, SOM
Authors:
  • 34, Dologlou 35
Book title: Proceedings of the Hybrid Machine Translation Workshop [held in conjunction with the 15th International Conference on Text, Speech and Dialogue (TSD2012)]
Address: Brno, Czech Republic
Organization: Hybrid Machine Translation Workshop [held in conjunction with the 15th International Conference on Text, Speech and Dialogue (TSD2012)]
Month: September 3
Abstract:
The PRESEMT project constitutes a novel approach to the machine translation (MT) task. It aims to develop a language-independent MT system architecture that is readily portable to new language pairs. PRESEMT falls within the Corpus-based MT (CBMT) paradigm, using a small bilingual parallel corpus and a large monolingual corpus in the target language (TL). The present article investigates how to select the best translation for a given token from a set of suggested translations. For this disambiguation task, a dedicated module based on the Self-Organizing Map (SOM) model is presented. Although the SOM has been studied extensively for text processing applications, its application to translation disambiguation is novel. The features employed to project textual data onto the SOM lattice are described. Details are also provided on the modifications required to model very large corpora and on experimental results from integrating the SOM into the PRESEMT system.
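The abstract does not spell out the procedure, so the following is only a minimal, generic sketch of how a SOM could be used to choose among candidate translations. It uses standard Kohonen training (Gaussian neighbourhood, linearly decaying learning rate and radius) in pure Python. The feature vectors, the function names `train_som`, `best_matching_unit` and `pick_translation`, and the selection rule (pick the candidate whose best-matching unit lies nearest to the context's best-matching unit on the lattice) are all hypothetical assumptions, not the PRESEMT module, and the sketch does not attempt the large-corpus modifications the paper describes.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the lattice coordinate whose weight vector is closest to x."""
    return min(weights,
               key=lambda n: sum((wi - xi) ** 2 for wi, xi in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise each node's weight vector with random values in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def pick_translation(weights, context_vec, candidates):
    """Hypothetical rule: choose the candidate whose BMU is nearest (Manhattan
    distance on the grid) to the context's BMU; ties go to the first candidate."""
    ctx = best_matching_unit(weights, context_vec)

    def lattice_dist(item):
        node = best_matching_unit(weights, item[1])
        return abs(node[0] - ctx[0]) + abs(node[1] - ctx[1])

    return min(candidates.items(), key=lattice_dist)[0]
```

A possible use, with made-up three-dimensional feature vectors standing in for whatever corpus features a real system would extract: train with `w = train_som([[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1], [0.1, 0, 0.9]], 4, 4, epochs=50)`, then call `pick_translation(w, [1, 0, 0], {"river": [0, 0, 1], "finance": [1, 0, 0]})`. Comparing positions on the lattice rather than raw vector distances is what makes this a SOM-based rule: the map's topology groups similar contexts into neighbouring nodes.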
Full text: SOM_TSD2012.pdf