masters thesis
Povzetek
Drevesno preiskovanje Monte Carlo(MCTS) je postalo znano po zaslugi uspehov
v igri Go, pri kateri računalnik nikoli prej ni premagal človeškega mojstra.
Nastalo je več različic algoritma. Ena izmed najbolj znanih različic je Zgornja
meja zaupanja za drevesa oz. UCT (Kocsis in Szepesvari). Mnoge izboljšave
osnovnega algoritma MCTS vključujejo uporabo domenskih hevristik, zaradi
katerih pa algoritem izgubi na splošnosti.
Cilj tega magistrskega dela je bil raziskati, kako izboljˇsati algoritem MCTS
brez ogrožanja njegove splošnosti. Paradigma spodbujevalnega učenja, ki se
imenuje učenje s časovnimi razlikami, omogoča uporabo kombinacije dveh
konceptov, dinamičnega programiranja in metod Monte Carlo. Moj cilj je
bil vkljuˇciti prednosti učenja s časovnimi razlikami v algoritem MCTS. Na
ta način se spremeni naˇcin posodabljanja vrednosti vozlišč glede na rezultat
oz. nagrado.
Iz rezultatov je mogoče sklepati, da je kombinacija algoritma MCTS in
učenja s časovnimi razlikami dobra ideja. Na novo razvit algoritem Sarsa-
TS(λ) kaže na splošno izboljšanje uspešnosti igranja. Ker pa so igre, na
katerih so bili izvedeni poskusi, zelo različne narave, se učinek algoritma na
uspešnost posameznih iger lahko precej razlikuje.
Ključne besede
Monte Carlo tree search;Monte Carlo;tree search;upper confidence bounds for trees;temporal difference learning;reinforcement learning;artificial intelligence;computer science;computer and information science;master's degree;
Podatki
Jezik: |
Angleški jezik |
Leto izida: |
2015 |
Tipologija: |
2.09 - Magistrsko delo |
Organizacija: |
UL FRI - Fakulteta za računalništvo in informatiko |
Založnik: |
[A. Deleva] |
UDK: |
004.85(043.2) |
COBISS: |
1536598211
|
Št. ogledov: |
650 |
Št. prenosov: |
219 |
Ocena: |
0 (0 glasov) |
Metapodatki: |
|
Ostali podatki
Sekundarni jezik: |
Slovenski jezik |
Sekundarni naslov: |
TD learning in Monte Carlo tree search |
Sekundarni povzetek: |
Monte Carlo tree search (MCTS) has become well known with its success in
the game of Go. A computer has never before won a game against a human
master player before. There have been multiple variations of the algorithm
since. One of the best known versions is the Upper Confidence Bounds for
Trees (UCT) by Kocsis and Szepesv´ari. Many of the enhancements to the
basic MCTS algorithm include usage of domain specific heuristics, which
make the algorithm less general.
The goal of this thesis is to investigate how to improve the MCTS algorithm
without compromising its generality. A Reinforcement Learning (RL)
paradigm, called Temporal Difference (TD) learning, is a method that makes
use of two concepts, Dynamic Programming (DP) and the Monte Carlo (MC)
method. Our goal was to try to incorporate the advantages of the TD learning
paradigm into the MCTS algorithm. The main idea was to change how
rewards for each node are calculated, and when they are updated.
From the results of the experiments, one can conclude that the combination
of the MCTS algorithm and the TD learning paradigm is after all a
good idea. The newly developed Sarsa-TS(λ) shows a general improvement
on the performance. Since the games we have done our experiments on are
all very different, the effect the algorithm has on the performance varies. |
Sekundarne ključne besede: |
drevesno preiskovanje Monte Carlo;Monte Carlo;drevesno preiskovanje;zgornja meja zaupanja za drevesa;učenje s časovnimi razlikami;umetna inteligenca;računalništvo;računalništvo in informatika;magisteriji; |
Vrsta datoteke: |
application/pdf |
Vrsta dela (COBISS): |
Magistrsko delo/naloga |
Študijski program: |
1000471 |
Konec prepovedi (OpenAIRE): |
1970-01-01 |
Komentar na gradivo: |
Univ. v Ljubljani, Fak. za računalništvo in informatiko |
Strani: |
56 str. |
ID: |
9055667 |