Metoda alokacije za klasifikacijo neuravnoteženih podatkov

doctoral dissertation

Sašo Karakatič (Author), Vili Podgorelec (Mentor)

Abstract

In this doctoral dissertation, we present a method called allocation, which is intended for the classification of imbalanced data. The allocation method is a two-level classification ensemble. On the first level, an allocator uses unsupervised learning algorithms to learn how to split the original dataset into homogeneous subsets, which are then allocated to specialized classifiers on the second level. The second level consists of a set of specialized classifiers, each trained on the specific subset allocated to it, so that it specializes in a particular type of data. These classifiers return the final class decision for individual instances, which is also the output of the allocation method. To test the concept of the allocation method, we developed two variants of the allocator: an allocator with anomaly detection, which uses a one-class SVM classifier, and an allocator with k-means clustering. Both types of allocator were tested in combination with six classification methods serving as the specialized classifiers on the second level. All variants of the allocation method, in all combinations, were evaluated on imbalanced and balanced data, the latter to validate the method as a general classification approach. The results of the allocation variants were compared with existing methods for handling imbalanced data: informed undersampling, SMOTE oversampling, and the ensembles bagging, MultiBoost and AdaBoost. In the experiments, we compared the classification metrics (identified in the theoretical part of the dissertation) and the time needed to train the classification model. The experimental results were further verified with statistical analysis, on the basis of which we concluded that the allocation method is an effective alternative to existing approaches for classifying both imbalanced and balanced data.
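
To make the two-level structure described in the abstract concrete, the k-means variant can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it uses plain Lloyd's k-means written in standard-library Python, and `KMeansAllocation` and `NearestNeighbour` are hypothetical names. The 1-nearest-neighbour classifier only stands in for whichever of the six classification methods occupies the second level.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumption for this sketch); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


class NearestNeighbour:
    """Hypothetical stand-in for one specialized classifier (1-NN)."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_one(self, x):
        i = min(range(len(self.X)), key=lambda i: sq_dist(x, self.X[i]))
        return self.y[i]


class KMeansAllocation:
    """Hypothetical sketch. Level 1: a k-means allocator splits the data
    into subsets. Level 2: one specialized classifier per subset."""

    def __init__(self, k=2, make_classifier=NearestNeighbour):
        self.k = k
        self.make_classifier = make_classifier

    def allocate(self, x):
        # Index of the nearest centroid = the subset x is allocated to.
        return min(range(self.k), key=lambda j: sq_dist(x, self.centroids[j]))

    def fit(self, X, y):
        self.centroids = kmeans(X, self.k)  # unsupervised: labels are not used
        subsets = {}
        for xi, yi in zip(X, y):
            sx, sy = subsets.setdefault(self.allocate(xi), ([], []))
            sx.append(xi)
            sy.append(yi)
        self.classifiers = {
            j: self.make_classifier().fit(sx, sy) for j, (sx, sy) in subsets.items()
        }
        # Fallback for a subset that received no training instances.
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        out = []
        for x in X:
            clf = self.classifiers.get(self.allocate(x))
            out.append(clf.predict_one(x) if clf is not None else self.default)
        return out
```

On a toy dataset with two well-separated groups, `X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` and `y = [0, 0, 1, 1, 1, 0]`, each cluster gets its own 1-NN model, so a query is answered only by the instances of its own subset: `KMeansAllocation(k=2).fit(X, y).predict([(0.9, 0.1), (10.1, 10.9), (11, 10.1)])` returns `[1, 1, 0]`. Passing a different `make_classifier` with the same `fit`/`predict_one` interface changes the second level without touching the allocator.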

Keywords

machine learning;classification;imbalanced data;anomaly detection;allocation;clustering;ensembles;doctoral dissertations;

Data

Language:	Slovenian
Year of publishing:	2017
Typology:	2.08 - Doctoral Dissertation
Organization:	UM FERI - Faculty of Electrical Engineering and Computer Science
Publisher:	S. Karakatič
UDC:	004.67:519.254(043.3)
COBISS:	20562966

Other data

Secondary language:	English
Secondary title:	Allocation method for classification of imbalanced data
Secondary abstract:	In this doctoral dissertation, we present a method called allocation, which is intended for the classification of imbalanced data. The allocation method is a classification ensemble composed of two levels. On the first level, the allocator uses unsupervised learning algorithms to learn how to split the original dataset into homogeneous subsets, which are then allocated to specialized classifiers on the second level. The second level consists of multiple specialized classifiers, each of which is trained on the specific subset of instances allocated to it and thus specializes in a particular type of data. These specialized classifiers return the class of each instance, which is also the final result of the allocation method. To test the concept of the allocation method, we developed two variants of the allocator: an allocator with anomaly detection, which uses a one-class SVM classifier, and an allocator with k-means clustering. Both types of allocator were tested in combination with six basic classification methods as the specialized classifiers on the second level. All variants and combinations of the allocation method were tested on imbalanced and balanced datasets, the latter to validate allocation as a general classification approach. The results of the allocation method were compared with existing methods for dealing with imbalanced data: informed undersampling, SMOTE oversampling, and the ensemble methods bagging, MultiBoost and AdaBoost. In the experiments, we compared the classification metrics of each method (identified in the theoretical part of the thesis) and the time needed to construct a classification model. The experimental results were further analyzed with statistical methods, which confirmed that the allocation method is an effective alternative to existing approaches for the classification of both imbalanced and balanced data.
Secondary keywords:	machine learning;classification;imbalanced data;anomaly detection;allocation;clustering;ensembles;Data;Dissertations;Classification;Machine learning;
URN:	URN:SI:UM:
Type (COBISS):	Doctoral dissertation
Thesis comment:	Univ. of Maribor, Faculty of Electrical Engineering and Computer Science
Pages:	XXI, 181 pp.
ID:	9604511
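
The anomaly-detection allocator mentioned in the abstract can be sketched in the same hedged spirit. Two assumptions are worth stating plainly: the one-class SVM is replaced here by a much simpler distance-to-centroid detector so that the example needs no third-party libraries, and the allocator is assumed to route normal instances and anomalies to two separate specialized classifiers. `AnomalyAllocation`, `DistanceAnomalyDetector` and `MajorityClass` are hypothetical names, and the majority-vote classifier is chosen only because it makes the effect of the split easy to see.

```python
import math
from collections import Counter


class DistanceAnomalyDetector:
    """Hypothetical stand-in for a one-class SVM: on the training data it
    flags roughly the `nu` fraction of points farthest from the centroid."""

    def __init__(self, nu=0.2):
        assert 0 < nu < 1
        self.nu = nu

    def fit(self, X):
        self.centre = [sum(col) / len(X) for col in zip(*X)]
        d = sorted(math.dist(x, self.centre) for x in X)
        n_out = min(len(X) - 1, max(1, round(self.nu * len(X))))
        # Largest distance still considered "normal" on the training data.
        self.threshold = d[len(d) - n_out - 1]
        return self

    def is_anomaly(self, x):
        return math.dist(x, self.centre) > self.threshold


class MajorityClass:
    """Hypothetical stand-in for one specialized classifier."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict_one(self, x):
        return self.label


class AnomalyAllocation:
    """Hypothetical sketch. Level 1: the detector allocates each instance to
    the 'normal' or 'anomaly' subset. Level 2: one classifier per subset."""

    def __init__(self, nu=0.2, make_classifier=MajorityClass):
        self.nu = nu
        self.make_classifier = make_classifier

    def fit(self, X, y):
        self.detector = DistanceAnomalyDetector(self.nu).fit(X)  # labels unused
        subsets = {False: ([], []), True: ([], [])}
        for xi, yi in zip(X, y):
            sx, sy = subsets[self.detector.is_anomaly(xi)]
            sx.append(xi)
            sy.append(yi)
        self.classifiers = {
            flag: self.make_classifier().fit(sx, sy)
            for flag, (sx, sy) in subsets.items()
            if sx
        }
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        out = []
        for x in X:
            clf = self.classifiers.get(self.detector.is_anomaly(x))
            out.append(clf.predict_one(x) if clf is not None else self.default)
        return out
```

With eight class-0 points near the origin and two class-1 points around (9, 9), the detector treats the class-0 points as normal and flags the two distant points as anomalies, so `AnomalyAllocation(nu=0.2).fit(X, y).predict([(0.2, 0.3), (8.5, 9.5)])` returns `[0, 1]`, whereas a single majority-vote classifier trained on all ten points would predict class 0 for both. In scikit-learn, `sklearn.svm.OneClassSVM`, whose `predict` returns +1 for inliers and -1 for outliers, could take the detector's place.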