diploma thesis in the interdisciplinary university study programme
Anže Starič (Author), Blaž Zupan (Mentor)

Abstract

Machine learning approaches for the UCSD Data Mining Contest

Keywords

machine learning;naive Bayes;ranking;classifier ensembles;computer science;university study programme;diploma theses;

Data

Language: Slovenian
Year of publication:
Typology: 2.11 - Diploma thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [A. Starič]
UDC: 004.85(043.2)
COBISS: 7963220

Other data

Secondary language: English
Secondary title: [Machine learning techniques for UCSD Data Mining Contest]
Secondary abstract: Participating in machine learning competitions exposes us to new problem domains and new types of problems, and forces us to try new techniques and look for innovative approaches. In the UCSD Data Mining Contest, the task was to rank a pool of consumers by how likely each is to become a customer of the retailer. In this thesis we developed a technique for predicting the probability that a consumer becomes a customer. We evaluated standard machine learning algorithms and performed attribute analysis on the training dataset. To improve on the scores of the standard algorithms, we reviewed methods that augment Naive Bayes for ranking and implemented the most promising one in the Orange framework. We also assessed the impact of data discretization on Naive Bayes and evaluated ensemble techniques that combine Naive Bayes classifiers. The results show that ranking potential customers is a hard task for standard machine learning algorithms. Augmented Naive Bayes performed slightly better in terms of AUC, but the best results came from combining data discretization with the standard Naive Bayes classifier. The AUC scores achieved were relatively low compared with scores achieved on other machine learning problems, which suggests that more attributes should be added to the dataset before the method is used in a production environment.
Secondary keywords: machine learning;naive Bayes;ranking;ensemble techniques;computer science;diploma;
File type: application/pdf
Type of work (COBISS): Diploma thesis
Comment on the material: University of Ljubljana, Faculty of Computer and Information Science
Pages: 36 pp.
ID: 23960040