Jernej Vičič (Author), Andrej Brodnik (Author)

Abstract

This article describes a method that improves translation quality for language pairs with a less widely used source language and a widely used target language. The method enables the use of parse-tree-based statistical translation algorithms for such pairs. Automatic part-of-speech (POS) tagging algorithms are now accurate enough to be used effectively in many tasks, and most of them can be implemented fairly easily for most of the world's languages. The method has two parts: the first constructs alignments between the POS tags of source sentences and the induced parse trees of the target language; the second searches the trained data and selects the best candidates for target sentences, that is, the translations. Due to time constraints the method was not fully implemented: the training part was implemented and incorporated into a functional translation system, but the inclusion of a word alignment model in the translation part was not. An empirical evaluation of the quality of the trained data was carried out on a full implementation of the presented training algorithms, and the results confirm the applicability of the method.

Keywords

machine translation; parse tree

Details

Language: English
Year of publication:
Typology: 1.01 - Original scientific article
Organization: UP - Univerza na Primorskem
Publisher: Fakulteta za družbene vede
UDC: 004.8
COBISS: 2818007
ISSN: 1854-0023

Other data

Secondary language: Unknown language
URN: URN:NBN:SI
Type of work (COBISS): Work not categorized
Pages: pp. 65-81
Volume: Vol. 5
Issue: no. 1
Date of publication: 2008
Keywords (UDC): science and knowledge; organization; computer science; information; documentation; librarianship; institutions; publications; prolegomena; fundamentals of knowledge and culture; propaedeutics; computer science and technology; computing; data processing; artificial intelligence
ID: 35801