master's thesis
Peter Mlakar (Author), Polona Oblak (Mentor), Tapio Nummi (Co-mentor)

Abstract

Regression and clustering are important components of machine learning. The first serves as a tool for discovering relations between dependent and independent variables in a dataset. With the second, data can be ordered into clusters or groups, depending on the similarities between individual data entries. In our thesis, we investigate a novel algorithm that conducts both tasks at the same time. The algorithm for non-parametric regression, which is based on Gaussian mixture models, discovers clusters in longitudinal datasets and, with the help of non-parametric regression, creates smooth mean development curves for those clusters. In the proposed algorithm, the non-parametric regression is based on natural cubic spline regression. We present the theoretical basis for the algorithm and its components. We also incorporate approaches to reduce the proposed algorithm's computational complexity. An implementation of the proposed algorithm and the corresponding speed-ups is written in the programming language Julia. The algorithm's performance is demonstrated quantitatively on a synthetic dataset and qualitatively on a real one. A Covid-19 dataset available from the World Health Organization was used in the latter evaluation. The goal of this evaluation is to group together countries with similar epidemiological development trends.

Keywords

mixture models;regression;natural cubic splines;clustering;computer science;computer and information science;master's degree;

Data

Language: English
Year of publishing:
Typology: 2.09 - Master's Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [P. Mlakar]
UDC: 004.8:51(043.2)
COBISS: 77153027

Other data

Secondary language: Slovenian
Secondary title: Uporaba regresije z mešanimi modeli v strojnem učenju (Use of regression with mixture models in machine learning)
Secondary abstract: Regression and clustering are important components of machine learning. The first serves as a tool for discovering relations between dependent and independent variables in the data. With the second method, the data are arranged into groups according to their mutual similarities. In our work we present a new algorithm that performs both tasks simultaneously. The algorithm for non-parametric regression, which is based on Gaussian mixture models, finds clusters in time-dependent data and, with the help of non-parametric regression, creates mean development curves for the individual clusters. In the presented algorithm, non-parametric regression is based on regression with natural cubic splines. We begin by presenting the theoretical background of the proposed algorithm and its components. We also reduce the algorithm's time complexity by means of various speed-ups. The algorithm and the speed-ups used were implemented in the programming language Julia. Its performance is evaluated quantitatively on a synthetic dataset and qualitatively on a real Covid-19 dataset. The goal of the latter evaluation is to cluster similar countries according to the course of the Covid-19 epidemic in the individual countries.
Secondary keywords: mixture models;regression;natural cubic splines;clustering;computer and information science;master's theses;machine learning;regression analysis;computer science;university and higher-education works;
Type (COBISS): Master's thesis/paper
Study programme: 1000471
Thesis comment: Univ. v Ljubljani, Fak. za računalništvo in informatiko
Pages: XII, 83 pp.
ID: 13381032