Reinforcement learning of game-playing agents in the Unity engine (Okrepitveno učenje agentov za igranje iger v pogonu Unity)

Master's thesis

Jan Banko (Author), Damjan Strnad (Mentor), Štefan Kohek (Co-mentor)

Abstract

This master's thesis examines reinforcement learning algorithms as applied to playing computer games. Its aim is to implement a game in the Unity environment and to analyze how effectively reinforcement learning algorithms train a computer-controlled player. It describes the theoretical foundations of reinforcement learning and presents in more detail the algorithms PPO (Proximal Policy Optimization), SAC (Soft Actor Critic), and DQN (Deep Q-Network), which are used in the final analysis. The results show that training the agent was successful overall. In the test environment, PPO performed best: the agent trained with it achieved on average 86.4% of the maximum possible reward. DQN performed worst and is not suitable for use in the implemented test environment.

Keywords

reinforcement learning;computer games;Unity;agent;machine learning;master's theses;

Data

Language:	Slovenian
Year of publishing:	2021
Typology:	2.09 - Master's Thesis
Organization:	UM FERI - Faculty of Electrical Engineering and Computer Science
Publisher:	[J. Banko]
UDC:	004.85:004.96(043.2)
COBISS:	67936771
Views:	375
Downloads:	67

Other data

Secondary language:	English
Secondary title:	Reinforcement learning of game-playing agents in the Unity engine
Secondary abstract:	This master's thesis deals with reinforcement learning algorithms, using computer game playing as an example. The purpose of the thesis is to implement a game in the Unity engine and to analyze the effectiveness of reinforcement learning algorithms for training a computer player. The theoretical foundations of reinforcement learning are described, and the PPO (Proximal Policy Optimization), SAC (Soft Actor Critic), and DQN (Deep Q-Network) algorithms used in the final analysis are presented in detail. The results show that training the agent was successful overall. PPO was the best algorithm in the test environment: the agent trained with it achieved 86.4% of the maximum possible reward on average. DQN was the worst and is not suitable for use in the implemented test environment.
Secondary keywords:	reinforcement learning;computer games;Unity;agent;machine learning;
Type (COBISS):	Master's thesis/paper
Thesis comment:	University of Maribor, Faculty of Electrical Engineering and Computer Science, Computer Science and Information Technologies
Pages:	VIII, 53 pp.
ID:	12934011