Diploma thesis
Blaž Novak (Author), Ivan Bratko (Supervisor), Dunja Mladenić (Co-supervisor)

Abstract

Detecting topics in a sequence of texts and tracking their changes

Keywords

clustering; stream mining; computer science; university studies; diploma theses

Details

Language: Slovenian
Year of publication:
Typology: 2.11 - Diploma thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [B. Novak]
UDC: 004(043.2)
COBISS: 6723668
Views: 795
Downloads: 163

Other details

Secondary language: English
Secondary title: [Topic detection and tracking in a stream of documents]
Secondary abstract: One challenge created by recent developments in information technology is that people are often faced with an overwhelming amount of available information, with blogs being the newest and most abundant source of it. In this thesis, I approach the problem from the standpoint of organizing newly created information into meaningful groups. The first part of the thesis surveys the state of the art in the areas relevant to the problem and analyzes the shortcomings of different methods. The main contribution is a new algorithm that combines several of the ideas presented in the first part. It is an online hierarchical clustering algorithm capable of incremental model updates that support both the addition and the removal of documents. After each step, the structure of the model is adapted to better reflect the structure of the currently observed world. The model can also be optimized while the algorithm waits for new events. Experiments testing the properties of the new algorithm were performed on simulated data streams created from the Reuters Corpus Volume 1 (RCV1) dataset. I found that the basic assumptions about time complexity and the ability to adapt the model hold, and that the algorithm performs surprisingly well for a range of different inputs.
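The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind incremental updates that support both adding and removing documents. It is a flat (not hierarchical) simplification and not the thesis's method. Assumptions, all hypothetical: documents arrive as sparse term-weight dicts, each cluster keeps the sum of its member vectors (cosine similarity to that sum equals cosine similarity to the centroid, since scaling does not change the angle), a fixed `threshold` decides when to open a new cluster, and `refine()` stands in for the "optimize while waiting for new events" step as a single reassignment pass.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class IncrementalClusterer:
    """Hypothetical flat online clusterer; add/remove cost is O(|doc| + #clusters)."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # hypothetical cutoff below which a new cluster is opened
        self.docs = {}              # doc_id -> sparse vector
        self.assign = {}            # doc_id -> cluster_id
        self.sums = {}              # cluster_id -> summed member vector
        self.sizes = {}             # cluster_id -> number of members
        self._next_id = 0

    def _best_cluster(self, vec):
        best, best_sim = None, -1.0
        for cid, s in self.sums.items():
            sim = cosine(vec, s)
            if sim > best_sim:
                best, best_sim = cid, sim
        return best, best_sim

    def _attach(self, doc_id, cid):
        # Adding a document only touches its own terms in the cluster sum.
        s = self.sums.setdefault(cid, defaultdict(float))
        for t, w in self.docs[doc_id].items():
            s[t] += w
        self.sizes[cid] = self.sizes.get(cid, 0) + 1
        self.assign[doc_id] = cid

    def _detach(self, doc_id):
        # Removal subtracts the document's vector; empty clusters are dropped.
        cid = self.assign.pop(doc_id)
        s = self.sums[cid]
        for t, w in self.docs[doc_id].items():
            s[t] -= w
            if abs(s[t]) < 1e-12:
                del s[t]
        self.sizes[cid] -= 1
        if self.sizes[cid] == 0:
            del self.sums[cid], self.sizes[cid]
        return cid

    def _place(self, doc_id, fallback=None):
        cid, sim = self._best_cluster(self.docs[doc_id])
        if cid is None or sim < self.threshold:
            if fallback is not None and fallback not in self.sums:
                cid = fallback  # reuse the id of a cluster that was just emptied
            else:
                cid = self._next_id
                self._next_id += 1
        self._attach(doc_id, cid)
        return cid

    def add(self, doc_id, vec):
        self.docs[doc_id] = dict(vec)
        return self._place(doc_id)

    def remove(self, doc_id):
        self._detach(doc_id)
        del self.docs[doc_id]

    def refine(self):
        """One reassignment pass over all documents; returns how many moved."""
        moved = 0
        for doc_id in list(self.docs):
            old = self._detach(doc_id)
            if self._place(doc_id, fallback=old) != old:
                moved += 1
        return moved


if __name__ == "__main__":
    c = IncrementalClusterer(threshold=0.3)
    print(c.add("d1", {"oil": 1.0, "price": 1.0}))
    print(c.add("d2", {"oil": 1.0, "opec": 1.0}))
    print(c.add("d3", {"goal": 1.0, "match": 1.0}))
    c.remove("d1")  # e.g. a document falling out of a sliding time window
    print(c.refine())
```

Keeping per-cluster sums rather than recomputing centroids is what makes removal as cheap as addition, which is the property a sliding-window stream needs; a hierarchical variant would maintain such sums at every node and also split or merge nodes, which this sketch leaves out.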
Secondary keywords: clustering; stream mining; computer science; diploma
File type: application/pdf
Type of work (COBISS): Diploma thesis
Note on the material: University of Ljubljana, Faculty of Computer and Information Science
Pages: 63
ID: 23829061