Transfer learning-based object detection in video

undergraduate thesis

Andraž Kristan (Author), Marko Meža (Mentor)

Abstract

In this thesis I studied the use of transfer learning for detecting and classifying dice throws. The main goal was to determine how the amount of training data and the variety of backgrounds affect model performance. I captured 1,200 images of dice throws, 200 for each possible value from 1 to 6, and formed six training sets from them: five of different sizes (1,200, 600, 150, 60, and 30 images) and a sixth obtained by artificially expanding the 30-image set to 600 images with affine transformations (rotations, translations). Using the PyTorch framework and transfer learning, I trained six YOLOv8 models, one on each training set. For testing, I recorded 50 videos of dice throws: half on the same background as the training images (the reference background) and half on a black background. I tested every model on these videos, recorded the detection rate and classification accuracy, computed macro precision, recall, and F1 score, and built confusion matrices. I found that 150 training images were already enough for flawless performance on the reference background (100% detection and classification accuracy). On the black background this model's detection rate stayed high (98%), while classification accuracy and F1 score fell to 78% and 77% respectively, which shows that a change of background has a substantial effect. The two largest models (trained on 1,200 and 600 images) each made only two errors on the black background, which points to greater robustness with more data. Performance degraded as the amount of training data shrank. The model trained on 60 images reached 73% detection, 83% classification accuracy, and a 63% F1 score on the reference background, and 75% detection, 75% classification accuracy, and only a 55% F1 score on the black background. The model trained on just 30 images did not detect a single die on either background, which confirms that this number is far too low for the task.
However, I also found that augmentation can effectively make up for a lack of data. The model trained on the set artificially expanded from 30 to 600 images reached 100% detection on both backgrounds, 97% classification accuracy and F1 score on the reference background, and 88% accuracy and 87% F1 score on the black background. For the smaller models or on the black background, the most common classification error involved the value 2, which was mistaken for 1, while the values 3 and 5 already caused problems at the detection stage.
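
The macro precision, recall, and F1 score mentioned in the abstract can be illustrated with a minimal sketch. It assumes a confusion matrix whose rows are true die values and whose columns are predicted values; the function name `macro_scores`, the example counts, and the convention of scoring an undefined ratio as 0 are all assumptions for illustration, not details taken from the thesis. The sketch also ignores missed detections, because the abstract does not say how those entered the scores.

```python
def macro_scores(confusion):
    """Return (macro precision, macro recall, macro F1).

    confusion[i][j] is the number of throws whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        # Assumption: an undefined ratio (empty row or column) counts as 0.
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    # Macro averaging: every class contributes equally, whatever its size.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical 3-class counts, used only to show the call.
example = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 7],
]
precision, recall, f1 = macro_scores(example)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because macro averaging weights every die value equally, one value that is classified poorly lowers the averaged F1 score more than it lowers overall accuracy.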

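The five training-set sizes named in the abstract (1,200, 600, 150, 60, and 30 images) could be drawn from the full collection as sketched below. The abstract does not say how the subsets were chosen; this sketch assumes each subset keeps an equal number of images per die value, and the `balanced_subset` helper, the `labels` list, and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def balanced_subset(labels, total, seed=0):
    """Pick `total` image indices with an equal count for each die value."""
    by_class = defaultdict(list)
    for index, value in enumerate(labels):
        by_class[value].append(index)
    per_class, remainder = divmod(total, len(by_class))
    if remainder:
        raise ValueError("total must be divisible by the number of classes")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    chosen = []
    for value in sorted(by_class):
        chosen.extend(rng.sample(by_class[value], per_class))
    return sorted(chosen)


# Hypothetical label list: 200 images for each die value from 1 to 6.
labels = [value for value in range(1, 7) for _ in range(200)]
subsets = {size: balanced_subset(labels, size) for size in (1200, 600, 150, 60, 30)}
for size, indices in subsets.items():
    print(size, len(indices))
```

Under this assumption the smallest set holds only five images per die value, which gives a sense of how little data the 30-image model had to learn from.
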
Keywords

transfer learning;object detection;deep learning;YOLO model;dice throw classification;university studies;Electrical Engineering;undergraduate theses;

Details

Language:	Slovenian
Year of publication:	2025
Typology:	2.11 - Undergraduate thesis
Organization:	UL FE - Faculty of Electrical Engineering
Publisher:	[A. Kristan]
UDC:	621.3(043.2)(0.034.2)
COBISS:	248279811

Other data

Secondary language:	English
Secondary title:	Transfer learning-based object detection in video
Secondary abstract:	This thesis explores the use of transfer learning for the detection and classification of dice throws. The main objective was to determine how the number of training images and the variation in background affect model performance. I collected a dataset of 1,200 images, with 200 images per possible dice value, and divided them into six training subsets. The largest set contained all 1,200 images, while the others contained 600, 150, 60, and 30 images respectively. An additional set was created by augmenting the smallest set of 30 images using affine transformations, resulting in 600 training images. Each of the six YOLOv8 models was trained using transfer learning within the PyTorch framework. For testing, I recorded 50 video sequences of dice throws: half on the same background as the training images (reference background), and half on a black background. I used these videos to test all models and recorded detection rates and classification accuracy, calculated macro precision, recall, and F1 scores, and generated confusion matrices. I found that just 150 training images were sufficient for flawless performance on the reference background (100% detection and classification accuracy). On the black background, the detection rate of this model remained high (98%), but the classification accuracy and F1 score dropped to 78% and 77%, respectively, demonstrating the significant impact of background changes. The two largest models (trained on 1,200 and 600 images) made only two classification errors on the black background, the same two in both cases, which indicates improved robustness with larger amounts of data. As the number of training images decreased, the performance of the models worsened. The model trained on 60 images achieved 73% detection, 83% classification accuracy, and a 63% F1 score on the reference background; on the black background, it achieved 75% detection, 75% classification accuracy, and only a 55% F1 score. The model trained on only 30 images failed to detect a single die on either background, confirming that this amount is clearly insufficient for the task at hand. However, I found that data augmentation can effectively compensate for the lack of training data. The model trained on the artificially expanded set of 600 images (from the original 30) achieved 100% detection on both backgrounds, 97% classification accuracy and F1 score on the reference background, and 88% accuracy and 87% F1 score on the black background. With smaller models or on the black background, the most frequent classification error occurred with the value 2, which was often misclassified as 1. Values 3 and 5 also caused problems, particularly in the detection phase.
Secondary keywords:	Transfer learning;object detection;deep learning;YOLO model;dice throw classification;
Type of work (COBISS):	Undergraduate thesis
Study programme:	1000313
Embargo end (OpenAIRE):	1970-01-01
Comment on the material:	Univ. of Ljubljana, Faculty of Electrical Engineering
Pages:	1 online resource (1 PDF file (XVI, 41 pp.))
ID:	27231883