Jernej Vičič (Author), Andrej Brodnik (Author)

Abstract

The article describes a method that improves translation performance for language pairs with a less used source language and a widely used target language. The method enables the use of parse-tree-based statistical translation algorithms for such pairs. Automatic part-of-speech (POS) tagging algorithms have become accurate enough to be used effectively in many tasks, and most of them can be implemented fairly easily for most of the world's languages. The method has two parts. The first constructs alignments between the POS tags of source sentences and the induced parse trees of the target language. The second searches the trained data and selects the best candidates for the target sentences, that is, the translations. The method was not fully implemented due to time constraints: the training part was implemented and incorporated into a functional translation system, but the inclusion of a word alignment model in the translation part was not. The empirical evaluation of the quality of the trained data was carried out on a full implementation of the presented training algorithms, and the results confirm the applicability of the method.

Keywords

machine translation; parse tree

Data

Language: English
Year of publishing:
Typology: 1.01 - Original Scientific Article
Organization: UP - University of Primorska
Publisher: Fakulteta za družbene vede
UDC: 004.8
COBISS: 2818007
ISSN: 1854-0023

Other data

Secondary language: Unknown
URN: URN:NBN:SI
Type (COBISS): Not categorized
Pages: pp. 65-81
Volume: Vol. 5
Issue: no. 1
Chronology: 2008
Keywords (UDC): science and knowledge; organization; computer science; information; documentation; librarianship; institutions; publications; prolegomena; fundamentals of knowledge and culture; propaedeutics; computer science and technology; computing; data processing; artificial intelligence
ID: 35801