Network structural properties and their application to missing property prediction

diploma thesis

Klemen Simonič (Author), Vladimir Batagelj (Mentor), Primož Škraba (Co-mentor)

Abstract

Keywords

Linked Data;graph mining;network;structural properties;missing properties;prediction;computer science;computer and information science;computer science and mathematics;diploma;

Details

Language:	English
Year of publication:	2013
Typology:	2.11 - Diploma thesis
Organization:	UL FRI - Fakulteta za računalništvo in informatiko
Publisher:	[K. Simonič]
UDC:	004(043.2)
COBISS:	9896020

Other data

Secondary language:	Slovenian
Secondary title:	Network structural properties and their application to missing property prediction
Secondary abstract:	The volume of available structured data is increasing, particularly in the form of Linked Data, where relationships between individual pieces of data are encoded by a graph-like structure. Despite the increasing scale of the data, the use and applicability of these resources is currently limited by mistakes and omissions in the linking data. In this diploma thesis, we look at the problem of predicting potential instance properties (types of relations). Given a specific query node in our multigraph dataset, can we correctly rank possibly omitted properties? We propose a method that leverages properties from similar nodes in the dataset. To find similar nodes, we define various network structural properties, which induce dissimilarities between nodes. These structural properties are based on either local or global processing of the underlying network. Since their complexity varies widely, special treatment is needed when dealing with networks containing hundreds of millions of nodes and edges. In our tool LODminer, we use weighted averages of property frequency vectors over a set of similar nodes to determine the most likely missing instance property. We investigate the performance of different dissimilarities and compare them to several other methods on three large-scale datasets, two based on DBpedia and one based on Freebase. Mathematics Subject Classification [MSC2010]: 68R10 [Graph theory], 68T30 [Knowledge representation], 05C82 [Small world graphs, complex networks], 91D30 [Social networks]. CCS Categories and Subject Descriptors [1998 system]: G.2.2 [Graph Theory], I.2.4 [Knowledge Representation Formalisms and Methods]: Semantic networks, H.2 [Database Management]: Database Applications – Data Mining.
Secondary keywords:	Linked Data;graph analysis;networks;structural properties;missing properties;prediction;computer science;computer and information science;computer science and mathematics;university studies;diploma theses;
File type:	application/pdf
Type of work (COBISS):	Diploma thesis
Comment on the material:	Univerza v Ljubljani, Fakulteta za računalništvo in informatiko
Pages:	55 pp.
ID:	24168244
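
The abstract describes ranking possibly omitted properties of a query node by a weighted average of property frequency vectors over a set of similar nodes. The following is a minimal sketch of that idea only, not the LODminer implementation. The function name `rank_missing_properties`, the weighting `1 / (1 + dissimilarity)`, and the sample property names are assumptions made for illustration; the record does not state which weights or dissimilarities the thesis uses.

```python
from collections import Counter


def rank_missing_properties(query_props, neighbors):
    """Rank properties the query node lacks.

    query_props: set of property names already present on the query node.
    neighbors: list of (dissimilarity, frequency_vector) pairs, where
        frequency_vector maps a property name to how often it occurs
        on that similar node.
    Returns a list of (property, score) sorted by descending score.
    """
    scores = Counter()
    total_weight = 0.0
    for dissimilarity, freq in neighbors:
        # Assumed weighting: nodes with smaller dissimilarity count more.
        weight = 1.0 / (1.0 + dissimilarity)
        total_weight += weight
        for prop, count in freq.items():
            scores[prop] += weight * count
    if total_weight == 0.0:
        return []
    # Weighted average of the frequency vectors, restricted to
    # properties the query node does not already have.
    candidates = [
        (prop, score / total_weight)
        for prop, score in scores.items()
        if prop not in query_props
    ]
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Hypothetical query node and three similar nodes.
query = {"birthPlace"}
similar = [
    (0.0, {"birthPlace": 1, "deathPlace": 1}),
    (1.0, {"birthPlace": 1, "occupation": 2}),
    (3.0, {"deathPlace": 1}),
]
ranked = rank_missing_properties(query, similar)
for prop, score in ranked:
    print(f"{prop}\t{score:.3f}")
```

Properties already present on the query node are excluded before ranking, so the result lists only candidates for omitted properties. How the set of similar nodes is chosen (the structural dissimilarities mentioned in the abstract) is outside the scope of this sketch.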