Odkrivanje enačb za celoštevilska zaporedja z verjetnostnimi gramatikami

magistrsko delo

Boštjan Gec (Author), Ljupčo Todorovski (Mentor)

Abstract

Algoritmi za odkrivanje enačb, ki uporabljajo verjetnostne gramatike, delujejo tako, da najprej vzorčijo strukture izrazov iz gramatike in nato na podlagi teh poiščejo enačbe, ki se najbolj prilegajo vhodnim podatkom. Strukture izrazov vzorčijo na podlagi verjetnosti, ki jih določa verjetnostna gramatika. Problem, ki ga srečamo pri tem je, da želimo tvoriti samo končne strukture in želimo imeti ustrezno verjetnostno porazdelitev na množici vseh možnih končnih struktur izrazov, ki jih tvori gramatika. Na srečo lahko v ta namen na verjetnostne gramatike gledamo kot na večtipske procese razvejanja. Za te obstaja izrek, ki pod določenimi pogoji pove, kdaj lahko ustrezno porazdelitev definiramo in kdaj ne. Poleg tega v magistrskem delu razvijem empirično okolje, ki omogoča uporabo omenjenih algoritmov za odkrivanje enačb v celoštevilskih zaporedjih iz Spletne enciklopedije celoštevilskih zaporedij (OEIS). Uporabo okolja ilustriram na odkrivanju enačb za štirinajst izbranih zaporedij iz OEIS.

Keywords

odkrivanje enačb;simbolna regresija;strojno učenje;verjetnostne kontekstno-neodvisne gramatike;večtipski procesi razvejanja;

Data

Language:	Slovenian
Year of publishing:	2021
Typology:	2.09 - Master's Thesis
Organization:	UL FMF - Faculty of Mathematics and Physics
Publisher:	[B. Gec]
UDC:	519.21:004
COBISS:	79059971
Views:	1223
Downloads:	100
Average score:	0 (0 votes)
Metadata:

Other data

Secondary language:	English
Secondary title:	Equation discovery for integer sequences with probabilistic grammars
Secondary abstract:	Equation discovery algorithms that are based on probabilistic grammars sample arithmetic expressions from the grammar that are then fitted to the input data, to become equations that describe that data. The arithmetical expressions are generated according to the probabilities encoded in the probabilistic grammar. The problem we encounter in this approach is that we consider only finite expressions and we try to define the corresponding probabilistic distribution on the space of the candidate finite expressions. Fortunately, probabilistic grammars can be seen as multitype branching processes. I present and partly prove a theorem that holds for multitype branching processes that tells us whether the grammar properly define the corresponding distribution or not. Furthermore, in this master thesis I design an empirical framework for applying the aforementioned algorithm to the task of discovery of equations that hold for integer sequences from The On-Line Encyclopedia of Integer Sequences (OEIS). I illustrate the use of the framework on discovery of equations for fourteen selected sequences from OEIS.
Secondary keywords:	equation discovery;symbolic regression;machine learning;probabilistic context-free grammars;multitype branching processes;
Type (COBISS):	Master's thesis/paper
Study programme:	0
Thesis comment:	Univ. v Ljubljani, Fak. za matematiko in fiziko, Oddelek za matematiko, Matematika - 2. stopnja
Pages:	VII, 52 str.
ID:	13600429