Master's thesis
Abstract
With the advent of ubiquitous devices, every user generates physical activity data, whether or not they are aware of it. Research in sport, which reflects society and culture, is hampered because access to these data is limited. The need for large amounts of data for machine learning and artificial intelligence has grown in every field, which has led to the generation of synthetic data. Synthetic data have the correlations, patterns, and statistical characteristics of real data but are produced by sampling techniques or by simulating the natural environment. They extend the original training set for machine learning without endangering the security of individuals. They may be published freely because they contain no real personal data. Their structure is fixed, so there is less room for error. It is important to assess the quality of the generated data and to confirm, before use, that they are comparable to real data. The experimental work resulted in the software library SportyDataGen, which can generate synthetic data and evaluate them with statistical metrics.
Keywords
DKW inequality;data generation;multivariate KS test;synthetic data;SportyDataGen;sports activities;summary metrics;master's theses;
Data
Language: Slovenian
Year of publishing: 2024
Typology: 2.09 - Master's Thesis
Organization: UM FERI - Faculty of Electrical Engineering and Computer Science
Publisher: [R. Kukovec]
UDC: 004.652.6(043.2)
COBISS: 202502915
Other data
Secondary language: English
Secondary title: Methods for generating synthetic sports activity data
Secondary abstract: With the emergence of ubiquitous devices, every user generates physical activity data, whether they are aware of it or not. Research in the field of sport, which reflects society and culture, is hindered by limited access to these data. For machine learning and artificial intelligence purposes, the demand for large amounts of data has increased across all fields, leading to the generation of synthetic data. These are datasets that possess the correlations, patterns, and statistical characteristics of real data but are produced by sampling techniques or by simulating the natural environment. They are used to expand the original training set for machine learning without compromising the security of individuals. These datasets can be freely published because they do not contain actual personal data. Their structure is well defined, which reduces the possibility of errors. It is important to assess the quality of the resulting data and to ensure that they are comparable to real data before use. As a result of the experimental work, the software library SportyDataGen has been developed, capable of generating synthetic data and evaluating it with statistical metrics.
Secondary keywords: DKW inequality;data generation;multivariate KS test;synthetic data;SportyDataGen;sports activities;summary metrics;
Type (COBISS): Master's thesis/paper
Thesis comment: Univ. of Maribor, Faculty of Electrical Engineering and Computer Science, Informatics and Technologies of Communication
Pages: 1 online resource (1 PDF file (XVII, 82 leaves))
ID: 23455930