bachelor thesis
Abstract
In natural language processing, sarcasm detection is the task of determining whether a given text is sarcastic. It can have many real-world applications, such as machine translation. In this work, we present three language modelling approaches and adapt them to the task of sarcasm detection. Two of the approaches are pretrained language models: BERT, which uses the encoder part of the transformer architecture, and GPT-3, which uses the decoder part. The third approach uses TLM, a recently proposed task-driven learning technique. We evaluated the methods using well-known metrics such as classification accuracy, precision, and recall. GPT-3 performed best in almost every aspect, with BERT a close second. Our findings show that TLM depends heavily on the task data and is therefore not suitable for a general task such as sarcasm detection.
Keywords
natural language processing;language models;sarcasm detection;transformer architecture;computer and information science;diploma thesis;
Data
Language: |
English |
Year of publishing: |
2022 |
Typology: |
2.11 - Undergraduate Thesis |
Organization: |
UL FRI - Faculty of Computer and Information Science |
Publisher: |
[A. Dimitrievikj] |
UDC: |
004.8:81'322.2(043.2) |
COBISS: |
125453827 |
Other data
Secondary language: |
Slovenian |
Secondary title: |
Jezikovni modeli in učenje s prilagajanjem nalogi za prepoznavanje sarkazma |
Secondary abstract: |
Zaznavanje sarkazma je postopek ugotavljanja, ali je besedilo sarkastično ali ne. Avtomatsko prepoznavanje sarkazma je pomemben vidik obdelave naravnega jezika in ima lahko veliko aplikacij, npr. strojno prevajanje. V delu predstavljamo tri pristope jezikovnega modeliranja in jih prilagajamo nalogi odkrivanja sarkazma. Dva pristopa sta vnaprej naučena jezikovna modela, BERT uporablja kodirni del transformerske arhitekture, GPT-3 pa uporablja dekodirni del transformerja. Tretja metoda, TLM, uporablja novo predlagano tehniko učenja, ki temelji na ekstrakciji podatkov glede na dano nalogo. Metode smo ovrednotili z uporabo dobro znanih metrik, kot so klasifikacijska točnost, natančnost in priklic. Metoda GPT-3 se je izkazala za najboljšo v skoraj vseh vidikih, BERT pa je bil na drugem mestu. Naše ugotovitve so pokazale, da je TLM zelo odvisen od podatkov dane naloge in zato ni primeren za splošno nalogo, kot je odkrivanje sarkazma. |
Secondary keywords: |
jezikovni modeli;prepoznavanje sarkazma;arhitektura transformer;računalništvo in informatika;univerzitetni študij;diplomske naloge;Obdelava naravnega jezika (računalništvo);Računalniško jezikoslovje;Računalništvo;Univerzitetna in visokošolska dela; |
Type (COBISS): |
Bachelor thesis/paper |
Study programme: |
1000468 |
Thesis comment: |
Univ. v Ljubljani, Fak. za računalništvo in informatiko |
Pages: |
33 str. |
ID: |
16252552 |