Data collection and parallel corpus compilation for machine translation of subtitles

Abstract

This paper describes the data collection and parallel corpus compilation activities carried out in the FP7 EU-funded SUMAT project. The project aims to develop an online subtitle translation service for nine European languages combined into 14 different language pairs. The compiled corpora supply bilingual and monolingual training data for statistical machine translation engines, which will semi-automate the subtitle translation processes of subtitling companies on a large scale.

Keywords

parallel multilingual corpora;statistical machine translation;subtitle translation service

Data

Language: English
Year of publishing:
Typology: 1.08 - Published Scientific Conference Contribution
Organization: UM FERI - Faculty of Electrical Engineering and Computer Science
UDC: 004.8
COBISS: 16027926

Other data

Secondary language: Unknown
URN: URN:SI:UM:
Type (COBISS): Not categorized
Pages: pp. 21-28
Keywords (UDC): science and knowledge;organization;computer science;information;documentation;librarianship;institutions;publications;prolegomena;fundamentals of knowledge and culture;propaedeutics;computer science and technology;computing;data processing;artificial intelligence
ID: 1439062