Hierarhično gručenje na velikih podatkih

Master's thesis

Nejc Debevec (Author), Blaž Zupan (Mentor)

Abstract

Hierarchical clustering is a very popular and useful clustering method. It lets us build an informative visualization of the hierarchies in the data, called a dendrogram. A problem arises when processing larger amounts of data, because the method has high time and space complexity. In this master's thesis we present an approach for reducing the complexity of hierarchical clustering. It is based on preprocessing the data with faster clustering techniques. For this purpose we test the following methods: DBSCAN, BIRCH, MeanShift, k-means, and network clustering (Louvain). We test each method on various synthetic and real data sets. We also propose a conceptual visualization for displaying the results of our approach. The results show that our approach substantially speeds up hierarchical clustering, but at some cost in accuracy: it does not return exactly the same results as hierarchical clustering itself.
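The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, which this record does not show. It assumes k-means as the fast preprocessing step (one of the methods the abstract names) and single linkage for the hierarchical step (an arbitrary choice made here). The function names `kmeans`, `single_linkage`, `cut`, and `two_stage` are hypothetical, and only the Python standard library is used. The expensive hierarchical step runs on `k` centroids instead of on all `n` points.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, centroid index per point)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, assign(points, centroids)


def single_linkage(items):
    """Naive single-linkage agglomeration.

    Returns the merges in order as (members_a, members_b, distance),
    which is the information a dendrogram displays.
    """
    clusters = [(i,) for i in range(len(items))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(items[i], items[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for n, c in enumerate(clusters) if n not in (a, b)]
        clusters.append(merged)
    return merges


def cut(merges, n_items, n_clusters):
    """Replay merges until n_clusters groups remain; returns a label per item."""
    groups = {(i,) for i in range(n_items)}
    for left, right, _ in merges[: n_items - n_clusters]:
        groups -= {left, right}
        groups.add(left + right)
    labels = [0] * n_items
    for g, members in enumerate(sorted(groups)):
        for i in members:
            labels[i] = g
    return labels


def two_stage(points, k, n_clusters, seed=0):
    """Stage 1: compress n points to k centroids. Stage 2: cluster the centroids."""
    centroids, point_to_centroid = kmeans(points, k, seed=seed)
    merges = single_linkage(centroids)
    centroid_labels = cut(merges, k, n_clusters)
    return merges, [centroid_labels[c] for c in point_to_centroid]


# Two well-separated synthetic blobs of 20 points each (illustrative data only).
blob_a = [(i * 0.1, (i % 3) * 0.1) for i in range(20)]
blob_b = [(100 + i * 0.1, 100 + (i % 3) * 0.1) for i in range(20)]
merges, labels = two_stage(blob_a + blob_b, k=4, n_clusters=2)
print(len(merges), "merges over the centroids")
```

The dendrogram built this way has `k - 1` merges rather than `n - 1`, so it is coarser than a full hierarchical clustering of the points. That is consistent with the trade-off the abstract reports: faster, but not identical to the result of hierarchical clustering on all the data.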

Keywords

knowledge discovery from data;clustering;hierarchical clustering;data visualization;computer science;computer and information science;master's theses;

Data

Language:	Slovenian
Year of publishing:	2020
Typology:	2.09 - Master's Thesis
Organization:	UL FRI - Faculty of Computer and Information Science
Publisher:	[N. Debevec]
UDC:	004.8(043.2)
COBISS:	51746051
Views:	1076
Downloads:	216

Other data

Secondary language:	English
Secondary title:	Hierarchical Clustering for Large Data Sets
Secondary abstract:	Hierarchical clustering is a very popular and useful clustering method. It allows us to build an informative visualization of the hierarchies in data, called a dendrogram. A problem arises when processing large amounts of data, as the method has high time and space complexity. In this master's thesis, we present an approach to reducing the complexity of hierarchical clustering. It is based on preprocessing the data with faster clustering techniques. For this purpose, we test the following methods: DBSCAN, BIRCH, MeanShift, K-means and Louvain clustering. Each method is tested on different synthetic and real data sets. We also provide a conceptual visualization to show the results of our approach. The results show that our approach significantly speeds up hierarchical clustering, but at some cost in accuracy: it does not return exactly the same results as hierarchical clustering itself.
Secondary keywords:	data mining;clustering;hierarchical clustering;data visualization;computer science;computer and information science;master's degree;
Type (COBISS):	Master's thesis/paper
Study programme:	1000471
Embargo end date (OpenAIRE):	1970-01-01
Thesis comment:	Univ. of Ljubljana, Faculty of Computer and Information Science
Pages:	60 pp.
ID:	12352463