Secondary abstract: |
An important task in developing machine translation (MT) systems is evaluating their performance. Automatic measures are most commonly used for this task, as manual evaluation is time-consuming and costly. However, performing an objective evaluation is not trivial. Automatic measures such as BLEU, TER, NIST and METEOR have their own weaknesses, while manual evaluations are also problematic, since they are always subjective to some extent.
In this paper we test the influence of the test set on the results of automatic MT evaluation in the subtitling domain. Translating subtitles is a rather specific task for MT, since subtitles are a kind of summarization of spoken text rather than a direct translation of (written) text. An additional problem when translating a language pair that does not include English, in our case Slovene-Serbian, is that the translations are commonly done from English to Serbian and from English to Slovene rather than directly, since most TV productions are originally filmed in English.
All this poses additional challenges for MT and, consequently, for MT evaluation. Automatic evaluation is based on a reference translation, which is usually taken from an existing parallel corpus and designated as the test set. In our experiments, we compare the evaluation results for the same MT system output using three types of test set. In the first round, the test set consists of 4000 subtitles from the SUMAT parallel corpus of subtitles. These subtitles are not direct translations from Serbian to Slovene or vice versa, but are based on an English original. In the second round, the test set consists of 1000 subtitles randomly extracted from the first test set and translated anew, from Serbian to Slovene, based solely on the Serbian written subtitles. In the third round, the test set consists of the same 1000 subtitles, but this time the Slovene translations were obtained by manually correcting the Slovene MT output so that it constitutes a correct translation of the Serbian subtitles.
The results of MT evaluation were calculated for the NIST, BLEU and TER metrics. They were strikingly diverse, even though the system output was always the same: when calculated on the original translations from the parallel corpus, BLEU was 19.47%, TER 65.27% and NIST 5.05; when calculated on the subtitles translated directly from Serbian to Slovene, BLEU was 43.10%, TER 32.91% and NIST 7.78; when calculated on the manually corrected MT output, BLEU (in this case also called hBLEU) was 71.6%, (h)TER 14.1% and (h)NIST 10.62.
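The evaluation setup described above can be illustrated with standard scoring tools. The following is a minimal sketch, not taken from the paper, of how the same MT output could be scored against three different reference sets; it assumes the sacrebleu and nltk packages, and the file names (mt_output.sl, refs_corpus.sl, refs_retranslated.sl, refs_postedited.sl) are hypothetical placeholders.

```python
import sacrebleu
from nltk.translate.nist_score import corpus_nist

def evaluate(hypotheses, references):
    """Score a list of hypothesis lines against a single reference list."""
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])  # corpus-level BLEU (percent)
    ter = sacrebleu.corpus_ter(hypotheses, [references])    # corpus-level TER (percent)
    # NLTK's corpus_nist expects tokenised input: a list of reference token lists
    # per hypothesis, and a list of hypothesis token lists.
    nist = corpus_nist([[r.split()] for r in references],
                       [h.split() for h in hypotheses], n=5)
    return bleu.score, ter.score, nist

# The same MT output is scored three times, once per reference test set.
hypotheses = open("mt_output.sl", encoding="utf-8").read().splitlines()
for ref_file in ("refs_corpus.sl", "refs_retranslated.sl", "refs_postedited.sl"):
    references = open(ref_file, encoding="utf-8").read().splitlines()
    b, t, n = evaluate(hypotheses, references)
    print(f"{ref_file}: BLEU={b:.2f} TER={t:.2f} NIST={n:.2f}")
```

Because only the reference file changes between rounds, any variation in the reported scores reflects the choice of test set rather than the MT system itself, which is the effect the paper measures.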