undergraduate thesis
Vid Keršič (Author), Damjan Strnad (Mentor), Štefan Kohek (Co-mentor)

Abstract

A graph is a non-Euclidean data structure that is hard to analyze directly with machine learning methods that process data in vector form. For this reason, machine learning methods for vector embedding, which transform a graph into a vector space, have become popular in recent years. In this thesis, we build a graph from English Wikipedia articles by following the hyperlinks they contain. We run the experiment for films and music albums. We embed the nodes of the resulting graph in a vector space, which enables a more efficient analysis of the graph, in which we focus on visualization, similarity, and the classification of films and albums into genres. We compare the embeddings produced by the DeepWalk, node2vec, and SDNE methods. In film classification we achieve an average accuracy of 88.5 %, and in album classification 89.3 %.
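As a rough illustration of the random-walk idea behind DeepWalk and node2vec, the sketch below generates uniform truncated random walks over a tiny link graph. It is not the thesis's implementation: the `LINKS` graph, the function names, and the parameters `walks_per_node` and `length` are all hypothetical, and only the walk-generation step is shown (a DeepWalk-style method would then feed such walks to a skip-gram model to learn the node vectors).

```python
import random

# Hypothetical toy link graph: article title -> titles it links to.
# (Assumed data for illustration only; not taken from the thesis.)
LINKS = {
    "Film A": ["Director X", "Genre: Drama"],
    "Film B": ["Director X", "Genre: Drama"],
    "Director X": ["Film A", "Film B"],
    "Genre: Drama": ["Film A", "Film B"],
}


def random_walk(graph, start, length, rng):
    """Return a uniform random walk of at most `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph.get(walk[-1], [])
        if not neighbours:
            # Dead end: stop the walk early.
            break
        walk.append(rng.choice(neighbours))
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks


walks = generate_walks(LINKS, walks_per_node=2, length=5)
```

Each walk is a sequence of article titles that plays the role of a "sentence"; nodes that often appear close together in walks end up with similar vectors once an embedding model is trained on them.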

Keywords

machine learning;graph;node embedding;random walk;autoencoder;undergraduate theses;

Data

Language: Slovenian
Year of publishing:
Typology: 2.11 - Undergraduate Thesis
Organization: UM FERI - Faculty of Electrical Engineering and Computer Science
Publisher: [V. Keršič]
UDC: 004.85:004.422.63(043.2)
COBISS: 38602243
Views: 338
Downloads: 65

Other data

Secondary language: English
Secondary title: Machine learning methods for vector embedding of graph nodes
Secondary abstract: A graph is a non-Euclidean data structure that is hard to analyze directly with machine learning methods that process data in vector form. Therefore, in recent years, machine learning methods for vector embedding, which transform graphs into a vector space, have gained a lot of traction. In the thesis, we construct a graph from English Wikipedia articles by following the hyperlinks they contain. We conduct experiments for movies and music albums. We embed the nodes of the obtained graph in a vector space, which allows us to analyze them more efficiently, focusing on visualization, similarity, and classification of movies and albums into genres. We compare the embeddings produced by the DeepWalk, node2vec, and SDNE methods. On average, we achieve a classification accuracy of 88.5 % for movies and 89.3 % for albums.
Secondary keywords: machine learning;graph;node embeddings;random walk;autoencoder;
Type (COBISS): Bachelor thesis/paper
Thesis comment: University of Maribor, Faculty of Electrical Engineering and Computer Science, Computer Science and Information Technologies
Pages: XII, 42 pp.
ID: 11967015