
Data

1. Evaluation data sets for development purposes [Link]

These data sets were manually developed based on content drawn from the web. Each set consists of ca. 200 sentences.

Language pairs:

  • {Czech, German, Greek, Norwegian} to English
  • {Czech, English, Greek, Norwegian} to German

 

Evaluation data sets (development) are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

 

2. Evaluation data sets for test purposes [Link]

These data sets, each of which contains 200 sentences, were manually developed based on content drawn from the web. They were used for the evaluation of the PRESEMT system by human evaluators.

Language pairs:

  • {Czech, German, Greek, Norwegian} to English
  • {Czech, English, Greek, Norwegian} to German

 

Evaluation data sets (testing) are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

 

3. Bilingual corpora [txt] [xml]

These corpora were manually developed based on content drawn from the web. Each corpus consists of 200-300 sentences.

Language pairs:

  • {Czech, German, Greek, Norwegian} to English
  • {Czech, English, Greek, Norwegian} to German

 

Bilingual corpora are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.