Master's thesis
Abstract
The growing importance of online communication in contemporary society has underlined the need to understand and identify suicidal ideation within these online spaces. Online communities, especially those centered on mental health, frequently feature communication that is closely interwoven with expressions of suicidal ideation. While detecting these expressions is important for research, it is also crucial for proactive moderation and prevention strategies on these platforms. Traditional machine learning methods have shown promising results in recognizing suicidal tendencies in textual data. However, the emergence of large language models (LLMs) such as GPT-4, built on sophisticated deep learning architectures, offers the potential for deeper and more nuanced detection of subtle cues linked to suicidal ideation, which are often intertwined with other themes and difficult to isolate. The core aim of this research is to examine the capability of LLMs to detect suicidal content in online posts. The objectives include 1) embedding the texts and clustering them based on content similarity, and 2) fine-tuning the models to distinguish and categorize documents according to the presence of genuine suicidal ideation versus general mental health discussions. The results confirm the effectiveness of LLMs in both tasks: they successfully cluster posts based on content similarity to generate class labels, and they achieve high precision and recall in differentiating suicidal ideation from general mental health narratives.
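The abstract describes a two-step pipeline: posts are first embedded and grouped by content similarity (the secondary keywords name hierarchical clustering), and a fine-tuned model is then evaluated by precision and recall when separating suicidal ideation from general mental health discussion. The record does not name the libraries or models used, so the sketch below is only a minimal illustration of that pipeline, assuming sentence-transformers for embeddings and scikit-learn for agglomerative clustering and metrics; the example posts, model name, and labels are invented placeholders, not the author's data or code.

# Minimal illustrative sketch, not the thesis implementation. Assumes
# sentence-transformers for embeddings and scikit-learn for hierarchical
# clustering and evaluation; posts and labels are invented placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import precision_score, recall_score

posts = [
    "I can't see a way forward anymore and keep thinking about ending it.",
    "Lately I feel like everyone would be better off without me.",
    "Therapy has been helping me manage my anxiety this year.",
    "Any tips for coping with work stress and poor sleep?",
]

# Step 1: embed posts and cluster them by content similarity; the resulting
# cluster ids can serve as candidate class labels for later fine-tuning.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = encoder.encode(posts)
clusterer = AgglomerativeClustering(n_clusters=2, metric="cosine", linkage="average")
cluster_ids = clusterer.fit_predict(embeddings)
print("cluster ids:", cluster_ids)

# Step 2: a fine-tuned classifier would then be scored with precision and
# recall (1 = suicidal ideation, 0 = general mental health discussion).
y_true = [1, 1, 0, 0]          # gold labels for the four posts above
y_pred = [1, 0, 0, 0]          # placeholder predictions from such a model
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))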
Keywords
suicide detection; machine learning; large language models; document embedding; clustering
Data
Language: English
Year of publishing: 2023
Typology: 2.09 - Master's Thesis
Organization: UL FDV - Faculty of Social Sciences
Publisher: [P. Kerkez]
UDC: 316.624(043.2)
COBISS: 167950083
Other data
Secondary language: Slovenian
Secondary title: Suicidal ideation detection in online posts using large language models
Secondary abstract:
The increasing significance of online communication in contemporary society has underlined the need to understand and identify suicidal ideation within these online spaces. Online communities, especially those centered on mental health, frequently feature communications deeply interwoven with expressions of suicidal ideation. While detecting these expressions is important for research, it is also fundamental for proactive moderation and prevention strategies within these platforms. Traditional machine learning methodologies have shown promise in recognizing suicidal tendencies in textual data. However, the emergence of large language models (LLMs) like GPT-4, built on sophisticated deep learning architectures, offers potential for a deeper and more nuanced detection of subtle cues linked with suicidal ideation that are often mingled with other themes and difficult to isolate. The core focus of this research is to examine the capability of LLMs in detecting suicidal content in online posts. The objectives include 1) embedding the texts and clustering them based on content similarity, and 2) fine-tuning the models to distinguish and categorize documents based on the presence of genuine suicidal ideation versus general mental health discussions. The results validate the efficacy of LLMs in both tasks, achieving successful clustering of posts based on their content similarities to generate class labels, as well as having high precision and recall in differentiating suicidal ideation from general mental health narratives.
Secondary keywords: suicide detection; machine learning; large language models; document embeddings; hierarchical clustering; Social psychology; World Wide Web; Suicidality; University and higher education theses
Type (COBISS): Master's thesis/paper
Study programme: 0
Embargo end date (OpenAIRE): 1970-01-01
Thesis comment: Univ. v Ljubljani, Fak. za družbene vede
Pages: 66 pp.
ID: 20034058