Undergraduate thesis

Abstract

In this thesis we analyse the bias of Slovenian news media towards political-ideological topics and the people who frequently appear in them. We aim to classify articles into three classes (against, for, neutral) according to the author's stance towards a given topic or person. Stance detection for Slovene is not yet solved, as no dataset exists for this problem. To train our models, we used a publicly available labelled training set of Twitter posts, in both its English original and a translated Slovenian version. For evaluation, we manually annotated 150 Slovenian articles. We test two classification models based on BERT: SloBERTa and CroSloEngualBERT. The experiments show considerable differences between topics. Most models predict best on full articles. The best results were obtained on the topic of feminism (F1 = 0.58) and the worst on the topic of atheism (F1 = 0.33).
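The reported scores are F1 measures over the three stance classes (against, for, neutral). As an illustration only — the exact averaging used in the thesis is not stated here — a minimal pure-Python sketch of macro-averaged F1 over these classes:

```python
# Illustrative sketch: macro-averaged F1 over the three stance classes.
# The label names and the macro averaging are assumptions, not taken
# verbatim from the thesis.

LABELS = ["against", "for", "neutral"]

def macro_f1(gold, pred):
    """Average the per-class F1 scores over the three stance labels."""
    scores = []
    for label in LABELS:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)
```

In practice one would use `sklearn.metrics.f1_score(..., average="macro")`; the explicit loop above just makes the per-class computation visible.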

Keywords

stance detection;BERT model;multilingual models;cross-lingual transfer;CroSloEngualBERT;SloBERTa;university studies;undergraduate theses;

Data

Language: Slovenian
Year of publishing:
Typology: 2.11 - Undergraduate Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [A. Potočnik]
UDC: 004.85:81'322:32(043.2)
COBISS: 142949123

Other data

Secondary language: English
Secondary title: Political stance detection in news using large language models
Secondary abstract: We analyse the bias of Slovenian news media towards political-ideological topics and the people who often appear in them. We want to classify the articles into classes (against, for, neutral) according to the author's inclination towards a certain topic or person. Stance detection in the Slovene language is not yet solved, as there is no dataset for this problem. To train our models, we used a publicly available labelled training set of Twitter posts in English and in a translated Slovenian version. We test two classification models based on the BERT model, SloBERTa and CroSloEngualBERT. The experiments show considerable differences between the topics. Most models predict best on full articles. The best results were obtained on the topic of feminism with an F1 measure of 0.58 and the worst on the topic of atheism with an F1 measure of 0.33.
Secondary keywords: natural language processing;stance detection;BERT model;multilingual models;cross-lingual transfer;CroSloEngualBERT;SloBERTa;computer science;computer and information science;diploma;Natural language processing (computer science);Computational linguistics;Media and politics;Computer science;University and higher-education theses;
Type (COBISS): Bachelor thesis/paper
Study programme: 1000468
Embargo end date (OpenAIRE): 1970-01-01
Thesis comment: University of Ljubljana, Faculty of Computer and Information Science
Pages: 46 pp.
ID: 17908353