PRESEMT: A hybrid machine translation system based on large monolingual corpora

Year: 2014
Type of Publication: In Proceedings
Editor: G. Kotzoglou, K. Nikolou, E. Karantzola, K. Frantzi, I. Galantomos, M. Georgalidou, V. Kourti-Kazoullis, Ch. Papadopoulou, E. Vlachou
Book title: Proceedings of the 11th International Conference on Greek Linguistics
Pages: 1642-1654
Address: Rhodes, Greece
Organization: 11th International Conference on Greek Linguistics
Month: September 26–29
ISBN: 978-960-87197-9-8
This article discusses a novel language-independent approach to machine translation, based on machine learning principles and inexpensive language resources, which was developed within the PRESEMT project. Unlike most modern MT methodologies, which require large parallel corpora, this methodology uses only a small parallel corpus and relies instead on models derived from monolingual corpora. This makes PRESEMT ideal for low-resource languages such as Greek. The article presents the translation process for the Greek-English language pair, together with objective and subjective evaluation results that assess translation accuracy. Emphasis is placed on benchmarking PRESEMT against other established MT systems.