Jani Dugonik (Author), Mirjam Sepesy Maučec (Author), Domen Verber (Author), Janez Brest (Author)

Abstract

This paper proposes a hybrid machine translation (HMT) system that improves the quality of neural machine translation (NMT) by incorporating statistical machine translation (SMT). To this end, two NMT systems and two SMT systems were built for the Slovenian-English language pair, one of each per translation direction. We used a multilingual language model to embed the source sentence and its translations into the same vector space. From these vectors, we extracted features based on the distances and similarities between the source sentence and the NMT translation, and between the source sentence and the SMT translation. To select the better translation, we used several well-known classifiers to predict which system had translated the source sentence better. The proposed method of combining SMT and NMT in a hybrid system is novel. Our framework is language-independent and can be applied to other languages supported by the multilingual language model. We evaluated the approach empirically and compared the performance of the classifiers. The results show that the proposed HMT system improved the BLEU score by 1.5 points in one translation direction and by 10.9 points in the other.
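
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the source sentence and both translations have already been embedded into a shared vector space. The abstract does not name the features or the classifiers, so the three measures used here (cosine similarity, Euclidean distance, Manhattan distance), all function names, and the rule-based `toy_classifier` are hypothetical stand-ins, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def pair_features(src: Vector, hyp: Vector) -> List[float]:
    """Assumed feature set for one (source, translation) pair."""
    return [
        cosine_similarity(src, hyp),
        euclidean_distance(src, hyp),
        manhattan_distance(src, hyp),
    ]


def build_features(src: Vector, nmt: Vector, smt: Vector) -> List[float]:
    """Concatenate the source-vs-NMT and source-vs-SMT features."""
    return pair_features(src, nmt) + pair_features(src, smt)


def select_translation(
    src_vec: Vector,
    nmt_vec: Vector,
    smt_vec: Vector,
    nmt_text: str,
    smt_text: str,
    classifier: Callable[[List[float]], str],
) -> str:
    """Return the translation whose system the classifier predicts is better."""
    label = classifier(build_features(src_vec, nmt_vec, smt_vec))
    return nmt_text if label == "NMT" else smt_text


def toy_classifier(features: List[float]) -> str:
    """Stand-in for a trained classifier: prefer the higher cosine similarity."""
    return "NMT" if features[0] >= features[3] else "SMT"
```

In practice `toy_classifier` would be replaced by a classifier trained on sentences for which the better system is known; any callable that maps the feature list to `"NMT"` or `"SMT"` fits this interface.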

Keywords

neural machine translation; statistical machine translation; sentence embedding; similarity; classification; hybrid machine translation

Data

Language: English
Year of publication: 2023
Typology: 1.01 - Original scientific article
Organization: UM FERI - Fakulteta za elektrotehniko, računalništvo in informatiko (Faculty of Electrical Engineering and Computer Science)
Publisher: MDPI
UDC: 004.5
COBISS: 154543107
ISSN: 2227-7390
Views: 210
Downloads: 12

Other data

Secondary language: Slovenian
Secondary keywords (Slovenian): nevronsko strojno prevajanje; statistično strojno prevajanje; podobnost; klasifikacija; hibridno strojno prevajanje
Type of work (COBISS): Journal article
Pages: 22 pp.
Volume: 11
Issue: 11, Article no. 2484
Publication date: 2023
DOI: 10.3390/math11112484
ID: 22958077