diploma thesis
Klemen Simonič (Author), Vladimir Batagelj (Mentor), Primož Škraba (Co-mentor)

Abstract

Network structural properties and their application to missing property prediction

Keywords

Linked Data;graph mining;network;structural properties;missing properties;prediction;computer science;computer and information science;computer science and mathematics;diploma;

Data

Language: English
Year of publishing:
Typology: 2.11 - Undergraduate Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [K. Simonič]
UDC: 004(043.2)
COBISS: 9896020

Other data

Secondary language: Slovenian
Secondary title: Network structural properties and their application to missing property prediction
Secondary abstract: The volume of available structured data is increasing, particularly in the form of Linked Data, where relationships between individual pieces of data are encoded by a graph-like structure. Despite the growing scale of the data, the use and applicability of these resources are currently limited by mistakes and omissions in the linking data. In this diploma thesis, we look at the problem of predicting potential instance properties (types of relations). Given a specific query node in our multigraph dataset, can we correctly rank possibly omitted properties? We propose a method based on leveraging properties from similar nodes in our dataset. In order to find similar nodes, we define various network structural properties, which induce dissimilarities between nodes. These structural properties are based on either local or global processing of the underlying network. Since their complexity varies widely, special treatment is needed when dealing with networks containing hundreds of millions of nodes and edges. In our tool LODminer, we use weighted averages of property frequency vectors over a set of similar nodes to determine the most likely missing instance property. We investigate the performance of different dissimilarities and compare them to several other methods on three large-scale datasets, two based on DBpedia and one based on Freebase. Mathematics Subject Classification [MSC2010]: 68R10 [Graph theory], 68T30 [Knowledge representation], 05C82 [Small world graphs, complex networks], 91D30 [Social networks]. CCS Categories and Subject Descriptors [1998 system]: G.2.2 [Graph Theory], I.2.4 [Knowledge Representation Formalisms and Methods]: Semantic networks, H.2 [Database Management]: Database Applications – Data Mining.
Secondary keywords: Linked Data;graph analysis;networks;structural properties;missing properties;prediction;computer science;computer and information science;computer science and mathematics;university studies;diploma theses;
File type: application/pdf
Type (COBISS): Undergraduate thesis
Thesis comment: Univerza v Ljubljani, Fakulteta za računalništvo in informatiko
Pages: 55 pp.
ID: 24168244
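
Illustrative sketch: the abstract describes ranking missing instance properties by a weighted average of property frequency vectors over a set of similar nodes. The Python sketch below shows one way such a ranking could look; it is not the LODminer implementation. The function name, the choice of the k nearest nodes, and the weighting 1 / (1 + dissimilarity) are all assumptions made for illustration, and the dissimilarities are taken as given rather than computed from network structural properties.

```python
from collections import defaultdict


def rank_missing_properties(query_props, neighbours, k=3):
    """Rank properties the query node lacks, using its most similar nodes.

    query_props: set of property names already present on the query node.
    neighbours:  list of (dissimilarity, frequency_vector) pairs, where
                 frequency_vector maps a property name to how often it
                 occurs on that node (a multigraph allows repeats).
    k:           number of most similar nodes to average over (assumed).
    """
    # Keep the k nodes with the smallest dissimilarity to the query node.
    nearest = sorted(neighbours, key=lambda pair: pair[0])[:k]

    scores = defaultdict(float)
    total_weight = 0.0
    for dissimilarity, freq in nearest:
        # Assumed weighting: closer nodes count more. The thesis may use
        # a different weighting scheme.
        weight = 1.0 / (1.0 + dissimilarity)
        total_weight += weight
        for prop, count in freq.items():
            scores[prop] += weight * count

    if total_weight == 0.0:
        return []  # no similar nodes available

    # Weighted average per property, skipping those the query already has.
    ranked = [
        (prop, score / total_weight)
        for prop, score in scores.items()
        if prop not in query_props
    ]
    # Highest score first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical toy data: property names are made up for the example.
example_neighbours = [
    (0.0, {"birthPlace": 1, "birthDate": 1}),
    (1.0, {"birthDate": 1, "occupation": 2}),
    (3.0, {"spouse": 1}),
]
example_ranking = rank_missing_properties({"birthPlace"}, example_neighbours, k=2)
```

With k=2 only the two closest nodes contribute, so "spouse" never enters the ranking, and "birthPlace" is excluded because the query node already has it.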