Bayesian attention networks for reliable hate speech detection

Abstract

Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify their output in terms of reliability. We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test whether affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions.
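
The reliability estimate described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the paper applies Monte Carlo dropout inside the attention layers of BERT, whereas the toy example below assumes a hypothetical logistic classifier with dropout on its input features. All function names, weights, and feature values are invented for illustration. The idea it shows is only the general one: keep dropout active at prediction time, run many stochastic forward passes, and read the spread of the outputs as an indication of how much to trust the prediction.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stochastic_forward(features, weights, bias, drop_prob, rng):
    """One forward pass with dropout kept active at prediction time."""
    keep_prob = 1.0 - drop_prob
    z = bias
    for x, w in zip(features, weights):
        # Keep each input with probability keep_prob and rescale survivors
        # (inverted dropout) so the expected pre-activation is unchanged.
        if rng.random() < keep_prob:
            z += w * x / keep_prob
    return sigmoid(z)


def mc_dropout_predict(features, weights, bias, drop_prob=0.3, passes=200, seed=0):
    """Return (mean probability, variance) over repeated stochastic passes."""
    rng = random.Random(seed)
    probs = [
        stochastic_forward(features, weights, bias, drop_prob, rng)
        for _ in range(passes)
    ]
    mean = sum(probs) / passes
    variance = sum((p - mean) ** 2 for p in probs) / passes
    return mean, variance


# Hypothetical toy model and input, not taken from the paper.
weights = [2.0, -1.5, 0.5]
bias = -0.2
mean, variance = mc_dropout_predict([1.0, 0.2, 0.7], weights, bias)
print(f"mean probability: {mean:.3f}, variance: {variance:.4f}")
```

A prediction whose variance across passes is high would be flagged as less trusted. With `drop_prob=0.0` every pass is identical and the variance collapses to zero, which corresponds to an ordinary deterministic prediction with no reliability signal.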

Keywords

natural language processing; machine learning; transformer neural networks; Bayesian neural networks; BERT models

Details

Language: English
Year of publication:
Typology: 1.01 - Original scientific article
Organization: UL FRI - Faculty of Computer and Information Science
UDC: 004.85:81'322.2
COBISS: 80879363
ISSN: 1866-9956

Other details

Secondary language: Slovenian
Secondary keywords: obdelava naravnega jezika; strojno učenje; nevronske mreže transformer; bayesovske nevronske mreže; modeli BERT
Type of work (COBISS): Journal article
Pages: 353-371
Volume: 14
Issue: 1
Publication date: Jan. 2022
DOI: 10.1007/s12559-021-09826-9
ID: 18264590