Primerjava statističnih modelov za napovedovanje uporabnikove zvestobe menjalnici kriptovalut

Master's thesis

Matej Gregorc (Author), Irena Ograjenšek (Mentor), Erik Štrumbelj (Co-mentor)

Abstract

In this master's thesis, we use several statistical models to predict user loyalty from data on the users of a cryptocurrency exchange. For users of a company's services, loyalty is an important factor in business performance. A loyal user believes that the company's services or products meet their needs well, so they keep using them and have no interest in competitors' offerings. The first building block of loyalty is the quality of the products and services a company offers. Retaining loyal users also requires building a relationship with them, and a prerequisite for a good relationship is knowing one's base of loyal users, both their sociodemographic and their behavioral characteristics. The collection and availability of large amounts of data on users and their activity on cryptocurrency exchanges gives detailed insight into each individual and makes it possible to identify characteristics that loyal users share. Companies can use these findings to retain and grow their base of loyal users, which can affect business performance. The thesis uses three statistical models, logistic regression, random forests, and gradient-boosted decision trees, to predict the binary variable loyalty. The loyalty event is rare: only 2.96% of users are loyal under the definition used by the company. The models are evaluated on the sensitivity and specificity of their predictions and on the AUC and G-mean measures, using 8-fold cross-validation. Models built on the baseline sample, without adjustment for rare events, show a strong rare-event bias. The gap between sensitivity and specificity is large, and prediction accuracy for loyal users is low (at most 51%).
We addressed the rare-event bias with undersampling, training the models on a reduced sample with an equal ratio of loyal to non-loyal users. Models trained on the reduced sample predict more accurately, and the differences between sensitivity and specificity are considerably smaller. The average G-mean of all three models is above 85%; it is highest for the gradient boosting model, at 89.61%, with prediction accuracy for loyal users at 91.70%. An important part of analyzing loyal users is understanding their characteristics, which can be studied through variable importance. In the thesis, variable importance is evaluated with SHAP values, which allow complex models to be interpreted. Some variables have a similar effect in all three models, while comparing the models also reveals a few differences in how variables affect the prediction.
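
The two ideas the abstract relies on, random undersampling to a balanced sample and the G-mean (the geometric mean of sensitivity and specificity), can be illustrated with a minimal sketch. This is not the thesis's implementation, which the record does not describe; the function names `sensitivity_specificity`, `g_mean`, and `undersample` are hypothetical, and label 1 is assumed to mean "loyal".

```python
import math
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = loyal (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against empty classes so the sketch never divides by zero.
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return math.sqrt(sens * spec)


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Toy data with a rare positive class (3 of 100), not the thesis data.
    labels = [1] * 3 + [0] * 97
    rows = list(range(100))
    balanced_rows, balanced_labels = undersample(rows, labels)
    print(len(balanced_labels), sum(balanced_labels))
    # A model that always predicts "not loyal" has perfect specificity
    # but zero sensitivity, so its G-mean collapses to zero.
    print(g_mean(labels, [0] * 100))
```

The G-mean is useful here because it penalizes a model that scores well on the majority class while missing the rare class, which plain accuracy would hide. In this sketch only the training data would be undersampled; evaluation folds would normally keep the original class ratio.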

Keywords

loyalty;prediction models;logistic regression;random forest;gradient boosted decision trees;variable importance;SHAP value;master's theses;

Data

Language:	Slovenian
Year of publishing:	2025
Typology:	2.09 - Master's Thesis
Organization:	UL FE - Faculty of Electrical Engineering
Publisher:	[M. Gregorc]
UDC:	311(043.3)
COBISS:	234034691

Other data

Secondary language:	English
Secondary title:	Comparison of statistical models for prediction of user loyalty to a cryptocurrency exchange
Secondary abstract:	In the master's thesis, multiple statistical models are used to predict user loyalty based on data from cryptocurrency exchange users. Loyalty, in the context of users of a service a company provides, is an important factor that influences business success. The behavior of a loyal user is reflected in their belief that the company's services or products meet their needs well, leading them to use them repeatedly without showing interest in competitors' offerings. The first building block of loyalty is the quality of the products or services offered by the company. Maintaining loyal users also requires building a relationship with them, and a prerequisite for building a good relationship is knowing your loyal user base, including their sociodemographic and behavioral characteristics. The collection and availability of large amounts of data on users and their activities in cryptocurrency exchanges allow for a detailed insight into each individual and for identifying common characteristics of loyal users. Companies can use these insights to maintain and expand their base of such users, which can, in turn, impact business performance. Three statistical models are used in the master's thesis, logistic regression, random forest, and gradient-boosted decision trees, to predict the variable loyalty, which is a binary variable with values of 0 or 1. The loyalty event (value 1 of the loyalty variable) is rare, with only 2.96% of users being loyal according to the company's definition. The statistical models are evaluated based on the sensitivity and specificity of the prediction, as well as the AUC and G-mean measures, using 8-fold cross-validation. All three models built on the baseline sample, without adjusting for rare events, show a strong rare-event bias. There is a large difference between the sensitivity and specificity of the predictions, with low prediction accuracy for loyal users (at most 51%).
The rare-event bias was addressed using undersampling, where the models are trained on a reduced sample with an equal ratio of loyal to non-loyal users. The models trained on the reduced sample show higher prediction accuracy, and the differences between sensitivity and specificity are considerably smaller. The average G-mean of all three models is above 85%, with the highest in the gradient boosting model, where it is 89.61%, with prediction accuracy for loyal users at 91.70%. An important part of the analysis of loyal users is also understanding the characteristics of these individuals, which can be analyzed using the importance of variables in predictive models. Variable importance in the master's thesis is evaluated using SHAP values, which allow for the interpretation of complex models. Some variables have a similar impact across all three models, while some differences in the impact of variables on the final prediction can also be observed when comparing the models.
Secondary keywords:	loyalty;prediction models;logistic regression;random forest;gradient boosted decision trees;variable importance;SHAP value;
Type (COBISS):	Master's thesis/paper
Study programme:	1000927
Thesis comment:	Univ. v Ljubljani, Fak. za elektrotehniko
Pages:	1 online resource (1 PDF file (XII, 59 pp.))
ID:	26048550