Language: | Slovenian |
---|---|
Year of publishing: | 2023 |
Typology: | 2.11 - Undergraduate Thesis |
Organization: | UL FRI - Faculty of Computer and Information Science |
Publisher: | [B. Bulić] |
UDC: | 004.8:81'322(043.2) |
COBISS: | 168959747 |
Metadata: |
Secondary language: | English |
---|---|
Secondary title: | Word sense induction in Slovene using large language models |
Secondary abstract: | In this thesis, we developed a procedure for discovering new word meanings. We extracted the list of observed words from a word-sense disambiguation dataset. Sentences containing each observed word were retrieved from the news database of the Event Registry service. We represented the words as vectors using the multilingual BERT-Base Cased and SloBERTa models and clustered them in several ways. We compared the results with the data from the disambiguation dataset and manually checked some words with known semantic shifts. The results are not promising; we believe the main reason is an unsuitable text database. |
Secondary keywords: | meanings of words;sentence vector embedding;clustering;BERT;natural language processing;word sense induction;computer science;computer and information science;computer science and mathematics;interdisciplinary studies;diploma;computational linguistics;computer science;university and higher-education theses |
Type (COBISS): | Bachelor thesis/paper |
Study programme: | 1000407 |
Embargo end date (OpenAIRE): | 1970-01-01 |
Thesis comment: | University of Ljubljana, Faculty of Computer and Information Science |
Pages: | 37 pp. |
ID: | 19937509 |