Metode za ustvarjanje sintetičnih podatkov športnih aktivnosti (Methods for generating synthetic sports activity data)

Master's thesis

Rok Kukovec (Author), Iztok Fister (Mentor)

Abstract

With the advent of ubiquitous devices, every user generates physical activity data, whether or not they are aware of it. Research in sport, which reflects society and culture, is hampered because access to these data is limited. The need for large amounts of data for machine learning and artificial intelligence has grown in every field, which has led to the generation of synthetic data. Synthetic data carry the correlations, patterns and statistical characteristics of real data but are produced by sampling techniques or by simulating the natural environment. They are used to extend the original training set for machine learning without endangering the security of individuals. They may be published freely because they contain no real personal data. Their structure is fixed, so there is less room for error. It is important to assess the quality of the generated data and to confirm, before use, that they are comparable to real data. The experimental work resulted in the software library SportyDataGen, which can generate synthetic data and evaluate them with statistical metrics.
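
The evaluation idea in the abstract (checking that generated data are statistically comparable to real data) can be illustrated with a minimal sketch. It is not the SportyDataGen API and not the thesis's method: the function names, the normal-distribution generator and the heart-rate values are all hypothetical, and the check is a univariate two-sample Kolmogorov-Smirnov statistic with a threshold derived from the DKW inequality, whereas the keywords refer to a multivariate KS test.

```python
import math
import random
import statistics
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # The supremum of the gap between two step functions is reached at a sample point.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


def dkw_epsilon(n, alpha=0.05):
    """Half-width of the DKW confidence band for an empirical CDF of n samples."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def generate_synthetic(real, size, rng):
    """Toy generator (assumption): sample from a normal fitted to the real values."""
    mu = statistics.fmean(real)
    sigma = statistics.stdev(real)
    return [rng.gauss(mu, sigma) for _ in range(size)]


rng = random.Random(42)
# Hypothetical "real" data: average heart rate (bpm) of 500 activities.
real_hr = [rng.gauss(140.0, 12.0) for _ in range(500)]
synthetic_hr = generate_synthetic(real_hr, 500, rng)

d = ks_statistic(real_hr, synthetic_hr)
# If both samples share one distribution, each empirical CDF stays within its
# DKW band with probability >= 1 - alpha, so the gap between them is at most
# the sum of the two half-widths with probability >= 1 - 2 * alpha.
threshold = dkw_epsilon(len(real_hr)) + dkw_epsilon(len(synthetic_hr))
print(f"KS statistic: {d:.3f}, DKW-based threshold: {threshold:.3f}")
print("comparable" if d <= threshold else "distributions differ")
```

The DKW-based threshold is conservative; a real evaluation would more likely use the exact KS test distribution and compare several features jointly.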

Keywords

DKW inequality;data generation;multivariate KS test;synthetic data;SportyDataGen;sports activities;summary metrics;master's theses;

Data

Language:	Slovenian
Year of publishing:	2024
Typology:	2.09 - Master's Thesis
Organization:	UM FERI - Faculty of Electrical Engineering and Computer Science
Publisher:	[R. Kukovec]
UDC:	004.652.6(043.2)
COBISS:	202502915

Other data

Secondary language:	English
Secondary title:	Methods for generating synthetic sports activity data
Secondary abstract:	With the emergence of ubiquitous devices, every user generates physical activity data, whether they are aware of it or not. Research in the field of sport, which reflects society and culture, is hindered by limited access to these data. For machine learning and artificial intelligence purposes, the demand for large amounts of data has increased across all fields, leading to the generation of synthetic data. These are data that possess the correlations, patterns and statistical characteristics of real data but are produced by sampling techniques or by simulating the natural environment. They are used to expand the original training set for machine learning without compromising the security of individuals. They can be freely published because they do not contain actual personal data. Their structure is well-defined, which reduces the possibility of errors. It is important to assess the quality of the resulting data and to ensure that they are comparable to real data before use. As a result of the experimental work, the software library SportyDataGen has been developed, capable of generating synthetic data and evaluating them with statistical metrics.
Secondary keywords:	DKW inequality;data generation;multivariate KS test;synthetic data;SportyDataGen;sports activities;summary metrics;
Type (COBISS):	Master's thesis/paper
Thesis comment:	University of Maribor, Faculty of Electrical Engineering and Computer Science, Informatics and Technologies of Communication
Pages:	1 online resource (1 PDF file (XVII, 82 leaves))
ID:	23455930