Undergraduate thesis
Ines Panker (Author), Janez Demšar (Mentor)

Abstract

Avtomatsko določanje avtorstva slovenskih leposlovnih besedil

Keywords

data mining; authorship attribution; naive Bayes classifier; computer science; university study programme; undergraduate theses

Details

Language: Slovenian
Year of publication:
Typology: 2.11 - Undergraduate thesis
Organization: UL FRI - Fakulteta za računalništvo in informatiko
Publisher: [I. Panker]
UDC: 004.9(043.2)
COBISS: 9174100

Other data

Secondary language: English
Secondary title: Automated authorship attribution for Slovenian literary texts
Secondary abstract: Automatic authorship attribution is an umbrella term for methods that try to infer the author of a text, typically using data mining techniques. Our task was to test how well such methods perform on a subset of Slovenian literary texts. Each text was represented as a vector whose dimensions correspond to the attributes we chose to measure. We began by counting punctuation marks and then moved on to counting word occurrences. We relied on simple, well-known classifiers: SVM, kNN, classification trees and the naive Bayes classifier. The last of these gave the best results. Our final results were very satisfactory: with rudimentary approaches we achieved a classification accuracy of 78% and an average precision of 87%, with two thirds of the authors at 100% precision.
Secondary keywords: data mining; authorship attribution; naive Bayes classifier; computer science; diploma
File type: application/pdf
Type of work (COBISS): Undergraduate thesis
Comment on the material: Univ. v Ljubljani, Fak. za računalništvo in informatiko
Pages: 49 pp.
ID: 24063110