Project details
The PRESEMT project constitutes a novel approach to Machine Translation (MT), characterised by the use of (a) cross-disciplinary techniques, mainly borrowed from the machine learning and computational intelligence domains, and (b) relatively inexpensive language resources. The aim is to develop a language-independent methodology for creating a flexible and adaptable MT system whose features ensure easy portability to new language pairs or adaptability to particular user requirements. PRESEMT falls within the Corpus-based MT (CBMT) paradigm. The resources employed, a small bilingual corpus and a large target language (TL) monolingual corpus, are collected from the web as far as possible, to simplify the development of resources for new language pairs.
The key aspects of PRESEMT are (a) modelling based on syntactic phrases, which have been shown to improve translation quality, (b) pattern recognition approaches (such as extended clustering or neural networks) for developing a language-independent analysis, and (c) evolutionary algorithms for system optimisation.
PRESEMT has a duration of three years. The work plan is divided into nine work packages covering five aspects, namely project management (WP1), dissemination activities (WP8), system specifications (WP2), system development & integration (WP3 – WP7) and validation & evaluation (WP9).
The language pairs studied are given below:
- Czech --> English & German
- English --> German
- German --> English
- Greek --> English & German
- Norwegian --> English & German
Near the end of the project an assessment phase is scheduled, where additional language pairs will be investigated, with Italian as the target language.