master's thesis

Abstract

Implementing natural language processing (NLP) techniques for low-resource languages is one of the biggest challenges in today's machine learning field. Most state-of-the-art work focuses on well-resourced languages such as English. However, most languages have scarce resources, which makes it hard, and in some cases almost impossible, to develop NLP models for them. We focus on implementing automatic question answering (QA) models in Macedonian. Since there are no QA datasets in Macedonian yet, we provide the first semi-automatic translation of the SuperGLUE benchmark. Using three question answering datasets from this benchmark (BoolQ, COPA and MultiRC), we fine-tune and compare several transformer-based models. The results show that good automatic QA performance can be obtained even in a low-resource language such as Macedonian. The translated benchmark and the fine-tuned models can serve as a baseline for further research.

Keywords

question answering;cross-lingual transfer;information retrieval;deep learning;Macedonian language;transformer models;computer science;master's thesis;

Data

Language: English
Year of publishing:
Typology: 2.09 - Master's Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [L. Dodevska]
UDC: 004.8:81'322(043.2)
COBISS: 128897795
Views: 55
Downloads: 10

Other data

Secondary language: Slovenian
Secondary title: Medjezikovni prenos virov in modelov za problem odgovarjanja na vprašanja (Cross-lingual transfer of resources and models for the question answering problem)
Secondary abstract: Implementing natural language processing (NLP) techniques for low-resource languages is one of the major challenges in machine learning. Most research focuses on well-resourced languages such as English. Because resources are limited for most languages, it is difficult to develop NLP models for them. In this master's thesis we focus on implementing automatic question answering (QA) models in Macedonian. Since no datasets for this purpose exist in Macedonian yet, we produce the first semi-automatic translation of the SuperGLUE benchmark. Using three question answering datasets (BoolQ, COPA and MultiRC), we fine-tune several models based on the transformer architecture. The results show that good QA results can be obtained even in a low-resource language such as Macedonian. The translated datasets and the fine-tuned models serve as a starting point for further research.
Secondary keywords: question answering;cross-lingual transfer;information retrieval;deep learning;Macedonian;transformer model;master's theses;Natural language processing (computer science);Computational linguistics;Machine learning;Computer science;University and higher education works;
Type (COBISS): Master's thesis/paper
Study programme: 1000471
Embargo end date (OpenAIRE): 1970-01-01
Thesis comment: Univ. v Ljubljani, Fak. za računalništvo in informatiko
Pages: VIII, 59 pp.
ID: 16812122