doctoral dissertation
Branko Kavšek (Author), Igor Kononenko (Mentor), Nada Lavrač (Co-mentor)

Abstract

Subgroup discovery using rule learning algorithms

Keywords

machine learning;data mining;subgroup discovery;rule learning;ROC analysis;evaluation measures;weighted relative classification accuracy;computer science;dissertations;

Data

Language: Slovenian
Year of publication:
Typology: 2.08 - Doctoral dissertation
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [B. Kavšek]
UDC: 004.8
COBISS: 4607060

Other data

Secondary language: English
Secondary title: Using rule learning for subgroup discovery
Secondary abstract: This dissertation investigates how to adapt standard classification rule learning approaches to subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of a selected population that are sufficiently large and statistically unusual in terms of class distribution. The dissertation presents a subgroup discovery algorithm, CN2-SD, developed by modifying parts of the CN2 classification rule learner: its covering algorithm, search heuristic, probabilistic classification of instances, and evaluation measures. Experimental evaluation of CN2-SD on selected data sets shows a substantial reduction in the number of induced rules, increased rule coverage, rule significance and overall coverage of the target concept, as well as slight improvements in the area under the ROC curve, when compared with the rule learning algorithms CN2 and RIPPER. An application of CN2-SD to a large traffic accident data set confirms these findings. The dissertation also presents the subgroup discovery algorithm APRIORI-SD, developed by adapting association rule learning to subgroup discovery. This was achieved by building on the classification rule learner APRIORI-C, enhanced with a novel post-processing mechanism, a new quality measure for induced rules (weighted relative accuracy) and probabilistic classification of instances. Experimental results show that APRIORI-SD behaves similarly to the subgroup discovery algorithm CN2-SD, i.e. a substantial reduction in the number of induced rules, increased rule coverage, rule significance and overall coverage of the target concept, as well as slight improvements in the area under the ROC curve, when compared with the rule learning algorithms CN2, RIPPER and APRIORI-C. A new optimization approach to subgroup discovery based on ROC analysis is also presented and implemented as an adaptation of the APRIORI-SD algorithm.
The implications of the “number-of-rules–unusualness–coverage” trade-off for subgroup discovery are investigated through an experimental evaluation of the adapted APRIORI-SD algorithm on selected data sets. The results are presented as 2D graphs depicting the dependencies between the number of induced rules, unusualness, accuracy and overall coverage of the target concept. The original APRIORI-SD subgroup discovery algorithm is then discussed within this new optimization framework. Finally, the dissertation compares the new algorithms with existing state-of-the-art subgroup discovery algorithms and applies CN2-SD and APRIORI-SD to a real-life problem: a database describing traffic accidents in Great Britain.
Secondary keywords: machine learning;data mining;subgroup discovery;rule learning;ROC analysis;evaluation measures;weighted relative accuracy;
File type: application/pdf
Type of work (COBISS): Doctoral dissertation
Comment on material: University of Ljubljana, Faculty of Computer and Information Science
Pages: XII, 132 pp.
ID: 23829111
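
Illustrative note: the abstract names weighted relative accuracy as the quality measure for induced rules. The sketch below shows how that measure is commonly defined in the subgroup discovery literature, WRAcc(Cond -> Class) = p(Cond) * (p(Class | Cond) - p(Class)). It is not taken from the dissertation; the function name `wracc` and its count-based parameters are hypothetical and chosen only for illustration.

```python
def wracc(n_total, n_class, n_cond, n_cond_class):
    """Weighted relative accuracy of a rule Cond -> Class (illustrative sketch).

    n_total      -- number of examples in the data set
    n_class      -- number of examples belonging to the target class
    n_cond       -- number of examples covered by the rule condition
    n_cond_class -- number of covered examples that belong to the target class

    All parameter names are assumptions made for this example.
    """
    if n_total <= 0 or n_cond <= 0:
        # A rule that covers nothing describes no subgroup; score it as 0.
        return 0.0
    coverage = n_cond / n_total          # p(Cond): how general the rule is
    precision = n_cond_class / n_cond    # p(Class | Cond): rule accuracy
    prior = n_class / n_total            # p(Class): default class frequency
    # Generality times the gain in accuracy over the class prior.
    return coverage * (precision - prior)


if __name__ == "__main__":
    # A rule covering 20 of 100 examples, 15 of them in a class of size 40.
    print(wracc(100, 40, 20, 15))
```

Under this definition a rule scores high only if it is both general (large coverage) and unusual (its class distribution differs from the prior), which matches the abstract's description of subgroups as "sufficiently large and statistically unusual".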