doctoral dissertation
Abstract
Canonical correlation methods form a family of statistical methods for analysing the association between two sets of variables. The standard solution procedure is based on solving an eigenvalue problem. A canonical solution consists of a pair of canonical variables and the corresponding canonical correlation. Canonical solutions are mutually uncorrelated and are ordered by decreasing canonical correlation. Classical canonical correlation analysis examines the linear association between two sets of variables, whereas extensions of the basic method allow other kinds of analysis as well. We examine two such extensions in detail: canonical correlation analysis with non-negativity constraints and kernel canonical correlation analysis with non-negativity constraints. The former allows the analysis of linear association and the latter the analysis of nonlinear association. The standard procedure for solving both non-negatively constrained problems is limited to computing the first canonical solution. Because the procedure has exponential time complexity, even problems with a few tens of variables are practically unsolvable in reasonable time. The doctoral dissertation presents an alternative approach based on the method of alternating least squares and regularization. Together, these make it possible to find the first and subsequent canonical solutions in reasonable time for problems with tens of thousands of variables. The proposed alternative approach to solving problems with non-negativity constraints is formulated as algorithms, which were implemented in the Python programming language. The proposed algorithms were also successfully tested on data from the international TIMSS study and on cross-language information retrieval.
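The combination of alternating least squares, regularization and non-negativity constraints described in the abstract can be illustrated with a small sketch. This is not the dissertation's algorithm: it is a minimal illustration, assuming a ridge penalty as the regularizer and a simple coordinate-descent solver for the non-negative least-squares subproblems, and it computes only the first canonical solution. The names `ridge_nnls`, `nonneg_cca_als`, `lam`, `n_sweeps` and `n_iter` are hypothetical.

```python
import numpy as np


def ridge_nnls(X, t, lam, n_sweeps=50):
    """Minimise ||X w - t||^2 + lam * ||w||^2 subject to w >= 0.

    Hypothetical helper: plain cyclic coordinate descent on the Gram matrix.
    """
    G = X.T @ X
    c = X.T @ t
    w = np.zeros(X.shape[1])
    for _ in range(n_sweeps):
        for j in range(w.size):
            # Exact minimiser over coordinate j with the others fixed,
            # then clipped at zero to respect the constraint.
            partial = c[j] - G[j] @ w + G[j, j] * w[j]
            w[j] = max(0.0, partial / (G[j, j] + lam))
    return w


def nonneg_cca_als(X, Y, lam=1e-3, n_iter=50, seed=0):
    """Sketch of a first canonical pair with non-negative weights (assumed setup)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)  # centre both sets of variables
    Y = Y - Y.mean(axis=0)
    a = np.zeros(X.shape[1])
    b = np.abs(rng.standard_normal(Y.shape[1]))  # non-negative starting weights
    for _ in range(n_iter):
        v = Y @ b
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            break
        # Fix b: regress the normalised Y-side variate on X, with a >= 0.
        a = ridge_nnls(X, v / norm_v, lam)
        u = X @ a
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:
            break
        # Fix a: regress the normalised X-side variate on Y, with b >= 0.
        b = ridge_nnls(Y, u / norm_u, lam)
    u, v = X @ a, Y @ b
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    rho = float(u @ v / denom) if denom > 0.0 else 0.0
    return a, b, rho
```

Calling `nonneg_cca_als(X, Y)` on two data matrices with the same number of rows returns the two non-negative weight vectors and the sample correlation between the resulting canonical variables. Each half-step is a convex problem whose cost grows polynomially with the number of variables, which is the general reason an alternating scheme can scale where an exhaustive search over constraint patterns cannot.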
Keywords
kernel methods;restricted canonical correlation analysis;cross-language information retrieval;alternating least squares;regularization;Social science research;Statistical methods;Methodology;Doctoral dissertations;
Data
Language: |
Slovenian |
Year of publishing: |
2020 |
Typology: |
2.08 - Doctoral Dissertation |
Organization: |
UL FDV - Faculty of Social Sciences |
Publisher: |
[E. Polajnar] |
UDC: |
519.22:303(043.3) |
COBISS: |
29041411
|
Other data
Secondary language: |
English |
Secondary title: |
Regularization of restricted canonical correlation analysis |
Secondary abstract: |
Canonical correlation methods are a family of statistical methods for analysing the correlation between two sets of variables. The standard technique for solving canonical correlation analysis problems is based on an eigenvalue problem. A canonical solution consists of a pair of canonical variables and the corresponding canonical correlation. The first pair of canonical variables has the largest canonical correlation, the second pair has the second largest, and so on. The original canonical correlation analysis was developed to examine linear relationships between two sets of variables. To increase the flexibility of the original method, several extensions have been proposed. Two extensions are discussed in detail: restricted canonical correlation analysis and restricted kernel canonical correlation analysis. The former examines linear relationships and the latter non-linear relationships. The standard technique for solving the two restricted problems is limited to the first pair of canonical variables. Its search process has exponential time complexity, so even problems with a few tens of variables cannot be solved in a feasible time. In this doctoral dissertation we propose an alternative technique for solving the two restricted problems, based on alternating least squares and regularization. Combining the two, we were able to solve the two restricted problems with tens of thousands of variables in a feasible time. The proposed technique was implemented as several algorithms in Python. The algorithms were successfully applied to the analysis of TIMSS international assessment data and to the problem of cross-language information retrieval. |
Secondary keywords: |
alternating least-squares;cross-language information retrieval;kernel methods;restricted canonical correlation analysis;regularization;Social science research;Statistical methods;Methodology;Doctoral dissertations; |
Type (COBISS): |
Doctoral dissertation |
Thesis comment: |
University of Ljubljana, Faculty of Social Sciences |
Pages: |
167 pp. |
ID: |
12035039 |