Undergraduate thesis
Maks Horvat (Author), Marko Robnik Šikonja (Mentor)

Abstract

Orodja za tekstovno rudarjenje v slovenščini (Text mining tools for the Slovene language)

Keywords

text mining;natural language processing;Slovenian language;language tools;higher professional education study;computer science;computer and information science;diploma theses;

Data

Language: Slovenian
Year of publishing:
Typology: 2.11 - Undergraduate Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [M. Horvat]
UDC: 004(043.2)
COBISS: 9903956

Other data

Secondary language: English
Secondary title: Text mining tools for Slovene language
Secondary abstract: We present several tools for processing the Slovene language and adapt them for use with the NLTK library. To assign part-of-speech tags automatically, we use algorithms from the NLTK library. From the Gigafida corpus we build several taggers: n-gram, Brill, naive Bayes, maximum entropy and hidden Markov model. We measure the part-of-speech tagging accuracy and the time complexity of each tagger. We also incorporate the Obeliks program for lemmatization and part-of-speech tagging. For text parsing and named entity recognition we use the dependencyParser and SLNER tools. We develop and test a module for information retrieval that uses an inverted index, search with Boolean operators, vector representation of documents and cosine similarity.
Secondary keywords: text mining;natural language processing;Slovenian language;language tools;computer science;computer and information science;diploma;
File type: application/pdf
Type (COBISS): Bachelor thesis/paper
Thesis comment: Univ. of Ljubljana, Faculty of Computer and Information Science
Pages: 58 pp.
ID: 24168236
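
Illustrative example: The secondary abstract mentions an information retrieval module built on an inverted index, Boolean search, vector representation of documents and cosine similarity. The sketch below shows how those pieces could fit together in plain Python. It is not the thesis's implementation: the function names, the whitespace tokenizer, the sample documents and the use of raw term frequencies as vector weights are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return ids of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, terms):
    """Return ids of documents that contain at least one query term."""
    result = set()
    for t in terms:
        result |= index.get(t.lower(), set())
    return result


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(docs, query):
    """Order document ids by cosine similarity to the query, best first."""
    q = Counter(tokenize(query))
    scores = {
        doc_id: cosine_similarity(q, Counter(tokenize(text)))
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample collection, used only to exercise the functions.
docs = {
    1: "text mining tools",
    2: "part of speech tagging tools",
    3: "information retrieval with an inverted index",
}
index = build_inverted_index(docs)
```

With this sample collection, `boolean_and(index, ["text", "tools"])` selects only document 1, and `rank(docs, "inverted index")` places document 3 first. A real system for Slovene would likely lemmatize tokens before indexing and weight terms with something like tf-idf; both steps are left out here to keep the sketch short.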