PRESEMT

Conditional Random Fields versus template-matching in MT phrasing tasks involving sparse training data

Research areas: Machine translation
Year: 2015
Type of Publication: Article
Keywords: Parsing of natural language; Template-matching; Conditional-random fields; Phrasing model generator; Machine translation
Journal: Pattern Recognition Letters
Volume: 53
Pages: 44-52


Abstract: This communication compares the template-matching technique with established probabilistic approaches, such as conditional random fields (CRF), on a specific linguistic task: phrasing, i.e. segmenting a sequence of words into phrases. This task amounts to a low-level parsing of the sequence into linguistically motivated phrases. CRF is the established method for implementing such a data-driven parser, whereas template-matching is a simpler method that is faster to train and to operate. The two techniques are compared here to determine which is better suited to extracting an accurate model. The application studied is a machine translation (MT) methodology, namely PRESEMT, although the comparison also holds for other applications for which only sparse training data are available. PRESEMT uses small parallel corpora to learn structural transformations from a source language (SL) to a target language (TL) and thus translate input text; as a result, only sparse training data are available for training the parser. Experimental results indicate that for a limited-size training set, as is the case in the PRESEMT methodology, template-matching generates a superior phrasing model, which in turn leads to higher-quality translations. This is confirmed for more than one source/target language pair and for multiple independent test sets.
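To make the phrasing task concrete, the following is a minimal, hypothetical sketch of template-matching phrasing. It assumes that templates are part-of-speech tag sequences, each mapped to the phrase type it formed in a small training corpus, and that the input is segmented greedily, longest template first. The function name `phrase`, the tag set and the template inventory are illustrative assumptions; this is not the PRESEMT Phrasing Model Generator itself.

```python
from typing import Dict, List, Tuple

# Hypothetical template: a sequence of POS tags seen as one phrase in training.
Template = Tuple[str, ...]


def phrase(tags: List[str], templates: Dict[Template, str]) -> List[Tuple[str, int, int]]:
    """Greedily segment a tag sequence into phrases, longest template first.

    Returns (label, start, end) spans with `end` exclusive. A token that
    starts no known template becomes a single-token phrase labelled "O".
    """
    max_len = max((len(t) for t in templates), default=1)
    spans: List[Tuple[str, int, int]] = []
    i = 0
    while i < len(tags):
        # Try the longest candidate window first, shrinking towards one token.
        for n in range(min(max_len, len(tags) - i), 0, -1):
            label = templates.get(tuple(tags[i:i + n]))
            if label is not None:
                spans.append((label, i, i + n))
                i += n
                break
        else:
            # No template matched at position i: emit an unlabelled token.
            spans.append(("O", i, i + 1))
            i += 1
    return spans


# Illustrative templates, as if extracted from a tiny annotated corpus.
templates = {
    ("DT", "JJ", "NN"): "NP",
    ("DT", "NN"): "NP",
    ("VBD",): "VP",
    ("IN", "DT", "NN"): "PP",
}

# "the old man sat on the bench"
tags = ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"]
spans = phrase(tags, templates)
# spans == [("NP", 0, 3), ("VP", 3, 4), ("PP", 4, 7)]
```

Such a matcher only needs a lookup table built from observed phrases, which illustrates why template-matching is cheap to train; a CRF, by contrast, estimates feature weights over the whole label sequence and therefore typically benefits from more training data.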
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement No 248307.
