Secondary abstract: |
The volume of available structured data is increasing, particularly in the form of Linked Data, where relationships between individual pieces of data are encoded in a graph structure. Despite the growing scale of these data, their use and applicability are currently limited by errors and omissions in the links.
In this diploma thesis, we study the problem of predicting potential instance properties (relation types). Given a query node in our multigraph dataset, can we correctly rank the properties that may have been omitted? We propose a method that exploits the properties of similar nodes in the dataset.
To find similar nodes, we define several structural network properties that induce dissimilarities between nodes. These properties are based on either local or global processing of the underlying network. Because their computational complexity varies widely, they require special treatment on networks with hundreds of millions of nodes and edges.
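As an illustration of what a "local" structural dissimilarity can look like, the sketch below uses the Jaccard distance between the neighbour sets of two nodes. This is an assumed example only: the abstract does not name the dissimilarities used in the thesis, and the adjacency-set representation, the function name `jaccard_dissimilarity`, and the toy graph are hypothetical. Edge labels of the multigraph are ignored here for simplicity.

```python
from typing import Dict, Hashable, Set


def jaccard_dissimilarity(adj: Dict[Hashable, Set[Hashable]], u: Hashable, v: Hashable) -> float:
    """Hypothetical local dissimilarity: 1 - |N(u) & N(v)| / |N(u) | N(v)|.

    adj maps each node to the set of its neighbours (edge labels ignored).
    """
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    if not union:
        # Assumed convention: two nodes without neighbours are maximally dissimilar.
        return 1.0
    return 1.0 - len(nu & nv) / len(union)


# Toy graph (hypothetical data): two cities sharing one neighbour.
adj = {
    "Ljubljana": {"Slovenia", "Sava"},
    "Maribor": {"Slovenia", "Drava"},
    "Slovenia": {"Ljubljana", "Maribor"},
}
print(jaccard_dissimilarity(adj, "Ljubljana", "Maribor"))
```

Because it only inspects the immediate neighbourhoods of the two nodes, a measure of this kind scales to very large graphs far more easily than global measures that process the whole network.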
In our tool LODminer, we use weighted averages of property frequency vectors over a set of similar nodes to determine the most likely missing instance property. We investigate the performance of different dissimilarities and compare them to several other methods on three large-scale datasets, two based on DBpedia and one based on Freebase.
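The ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LODminer's implementation: the weighting `1 / (1 + d)` for a non-negative dissimilarity `d`, the exclusion of properties the query node already has, and the name `rank_missing_properties` are all hypothetical choices.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_missing_properties(
    query_props: Counter,
    similar_nodes: List[Tuple[Counter, float]],
) -> List[Tuple[str, float]]:
    """Rank candidate properties for a query node (illustrative sketch).

    similar_nodes holds (property frequency vector, dissimilarity to query) pairs.
    The weight 1 / (1 + d) is an assumed choice; d is assumed to be >= 0.
    """
    scores: Dict[str, float] = {}
    total_weight = 0.0
    for freq, dissim in similar_nodes:
        w = 1.0 / (1.0 + dissim)  # closer nodes contribute more
        total_weight += w
        for prop, count in freq.items():
            scores[prop] = scores.get(prop, 0.0) + w * count
    if total_weight == 0.0:
        return []
    # Normalise to a weighted average and skip properties the query already has.
    ranked = [(p, s / total_weight) for p, s in scores.items() if p not in query_props]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical example: the query node lacks "population" and "mayor".
query = Counter({"name": 1, "country": 1})
similar = [
    (Counter({"name": 1, "country": 1, "population": 1, "mayor": 1}), 0.0),
    (Counter({"name": 1, "population": 2}), 1.0),
]
print(rank_missing_properties(query, similar))
```

The top-ranked entry is then the candidate for the most likely missing property; ties are broken alphabetically only to make the output deterministic.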
Mathematics Subject Classification [MSC2010]: 68R10 [Graph theory], 68T30 [Knowledge representation], 05C82 [Small world graphs, complex networks], 91D30 [Social networks].
CCS Categories and Subject Descriptors [1998 system]: G.2.2 [Graph Theory], I.2.4 [Knowledge Representation Formalisms and Methods]: Semantic networks, H.2 [Database Management]: Database Applications – Data Mining. |