Undergraduate thesis
Erik Calcina (Author), Marko Robnik Šikonja (Mentor), Erik Novak (Co-mentor)

Abstract

In the modern world, we wade through a flood of news every day. Searching is easier when news articles are grouped by the events they report. In this thesis, we present a methodology for clustering news into events. The methodology combines text embeddings, a clustering algorithm, and methods for filtering news. We tested the methodology on a dataset of online news and carried out both a statistical and a manual evaluation. The results showed that the news clusters mostly describe the same events. The cost of this higher accuracy is a large number of unassigned news articles.
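
The combination of embeddings, clustering, and filtering described above can be illustrated with a small sketch. This is not the thesis's implementation: the bag-of-words `embed` function is a toy stand-in for a real text embedding model, the greedy single-pass clustering is one assumed choice of algorithm, and the `threshold` and `min_size` parameters are hypothetical. It only shows how a stricter similarity requirement can yield more consistent clusters while leaving more articles unassigned.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a text embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_news(articles, threshold=0.5, min_size=2):
    """Greedy single-pass clustering; returns (clusters, unassigned indices)."""
    clusters = []  # each entry: {"centroid": Counter, "members": [article indices]}
    for idx, text in enumerate(articles):
        vec = embed(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # No cluster is similar enough: start a new one.
            clusters.append({"centroid": Counter(vec), "members": [idx]})
        else:
            # The summed vector serves as an unnormalised centroid.
            best["centroid"].update(vec)
            best["members"].append(idx)
    # Filtering step (assumed): clusters below min_size are treated as unassigned.
    kept = [c["members"] for c in clusters if len(c["members"]) >= min_size]
    unassigned = [i for c in clusters if len(c["members"]) < min_size
                  for i in c["members"]]
    return kept, unassigned


articles = [
    "earthquake strikes city centre overnight",
    "strong earthquake strikes city overnight",
    "team wins football cup final",
    "local bakery opens new shop",
]
clusters, unassigned = cluster_news(articles)
print(clusters, unassigned)
```

In this toy example the two earthquake reports form one cluster, while the remaining two articles stay unassigned because nothing else is similar enough to them.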

Keywords

news; events; news clustering; event detection; language model; higher professional study programme; undergraduate theses

Data

Language: Slovenian
Year of publishing:
Typology: 2.11 - Undergraduate Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [E. Calcina]
UDC: 004.85:81'322(043.2)
COBISS: 163762435

Other data

Secondary language: English
Secondary title: Event-based news clustering
Secondary abstract: In the modern world, we face a flood of news every day. To make searching easier, it helps to group news articles by the events they relate to. In this thesis, we present a methodology for clustering news by events. The methodology combines text embeddings, a clustering algorithm, and news filtering methods. We tested the methodology on a dataset of online news and evaluated it statistically and manually. The results indicate that the news clusters mostly describe the same events. However, the higher accuracy comes with a substantial number of unclustered news articles.
Secondary keywords: news; events; machine learning; news clustering; event detection; language model; computer science; diploma; computational linguistics; university and higher-education works
Type (COBISS): Bachelor thesis/paper
Study programme: 1000470
Thesis comment: University of Ljubljana, Faculty of Computer and Information Science
Pages: 23 pp.
ID: 19904923