Undergraduate thesis
Abstract
Automatic machine translation can be very useful for companies,
because it lets users who do not speak English use software and
do business in a language of their choice. Many companies offer machine
translation services, but these usually do not reach sufficiently high
accuracy on domain-specific texts. To address this problem, some providers
offer services that allow translators to be fine-tuned with the customer's
own data. One of them is Microsoft's Azure Custom Translator, which is
used in our thesis. Since we have no influence over the model itself, this
work focuses mostly on acquiring and preparing data. Using the LASER and
Vecalign models, parallel sentences are extracted from professionally
translated texts and aligned. These are used to train two versions of a
custom translator, based on the general and the technology base models.
Using the BLEU, chrF++ and BERTScore metrics, our two models are compared
with other options within Azure and with one of the leading external
services. The technology model achieves results comparable to the external
service. With the best model we also build a simple translation application.
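To make the character-level evaluation idea concrete, the following is a minimal sketch of a character n-gram F-score in the spirit of chrF. It is an illustration only, not the implementation used in the thesis: the function names `char_ngrams` and `simple_chrf` are hypothetical, whitespace is simply dropped, and the word n-grams that chrF++ adds on top of chrF are left out.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified character n-gram F-score in the range [0, 1].

    Precision and recall are averaged over n-gram orders 1..max_n and
    combined into an F-beta score; beta=2 weights recall above precision.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is too short for this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Toy sentences, invented for illustration.
    print(simple_chrf("print label", "print the label"))
```

Because matching happens on characters rather than whole words, a score of this kind still gives partial credit when a translation differs from the reference only in word endings, which is relevant for a morphologically rich language such as Slovenian.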
Keywords
translation;development of a machine translator;labelling industry;Azure;cloud services;university studies;diploma theses;
Data
| Language: | Slovenian |
| Year of publishing: | 2023 |
| Typology: | 2.11 - Undergraduate Thesis |
| Organization: | UL FRI - Faculty of Computer and Information Science |
| Publisher: | [A. Zrimšek] |
| UDC: | 004:81'322.4(043.2) |
| COBISS: | 166032899 |
Other data
| Secondary language: | English |
| Secondary title: | Developing a machine translation system for use in the labelling industry |
| Secondary abstract: | Automatic machine translation can be a very useful tool for companies whose employees are not fluent in English, because it lets them use software in their preferred language. While several companies offer machine translation services, these often do not reach the desired accuracy when translating industry-specific texts. As a solution, some providers offer services that allow a translator to be tuned with one's own data. One of these is Microsoft's Azure Custom Translator, which is the basis for this thesis. Since we cannot affect the model itself, the thesis focuses primarily on gathering and processing the required data. Using the LASER and Vecalign models, parallel sentences are extracted from professionally translated texts and aligned. These are then used to train two versions of a custom translator, one based on a general baseline model and the other on a technology baseline model. To evaluate our models, we use the BLEU, chrF++, and BERTScore metrics to compare them with Azure's other options, as well as with one of the leading outside services. We conclude that our model built on the technology baseline is comparable to the outside service. Finally, we use the best model to develop a simple translation app. |
| Secondary keywords: | translation;translator;developing machine translation system;labeling industry;Azure;cloud services;computer science;diploma;machine learning;translation and interpreting;machine translation;computing;university and higher-education theses; |
| Type (COBISS): | Bachelor thesis/paper |
| Study programme: | 1000468 |
| Embargo end date (OpenAIRE): | 1970-01-01 |
| Thesis comment: | University of Ljubljana, Faculty of Computer and Information Science |
| Pages: | 48 pp. |
| ID: | 21439470 |