Master's thesis
Nik Pirnovar (Author), Matej Guid (Mentor)

Abstract

Future values of time series can be forecast with both statistical approaches and machine learning. After years of research, the latter offers numerous techniques, protocols and predictive models for forecasting one or several future values of data collected over time. Time series forecasting usually refers to predicting a single point, since forecasting several time points into the future is more complex: multi-step forecasting must cope with a greater accumulation of prediction errors. It is generally accepted that the more points ahead we forecast, the larger the prediction error for the more distant points. In this thesis, we searched for optimal combinations of time series preprocessing techniques and parameter settings of different predictive models. We forecast several points ahead for events of differing frequency, using online advertising data from social networks. We focused on different architectures of long short-term memory (LSTM) neural networks. We compared the forecasts of the neural network approaches with those of ARIMA, a statistical model well established for time series forecasting, and with those of the XGBoost regression model. We included XGBoost because it has recently achieved excellent results in competitions across many areas of machine learning. We hypothesized that the LSTM architectures would give the most accurate forecasts. Experiments and analysis of the results showed that, as expected, LSTM networks give the most accurate forecasts for events of all frequencies. Only XGBoost comes close to their accuracy, mainly for frequent events, while ARIMA forecasts are on average the least accurate. Data preprocessing plays an important role in forecast accuracy. Logarithmic transformation and normalization performed well for all window sizes, and for larger forecasting windows, removing seasonality was particularly effective.
We also found that, within individual groups, the predictive models do not lose accuracy as the forecasting window grows.
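The abstract notes that forecasting several points ahead accumulates prediction errors. The abstract does not say which multi-step strategy the thesis used, so the following is only a minimal sketch under stated assumptions, not the thesis code. `make_windows` frames a series as supervised pairs of `n_in` past values and the next `n_out` values (the kind of input a multi-output LSTM or XGBoost model can be trained on), and `recursive_forecast` shows how errors accumulate when a one-step model's predictions are fed back as inputs. The function names, parameters, and the deliberately biased toy model are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_in: int, n_out: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Frame a series as supervised pairs: n_in past values -> next n_out values."""
    if n_in < 1 or n_out < 1:
        raise ValueError("n_in and n_out must be positive")
    inputs, targets = [], []
    # Slide one step at a time; stop before the target window runs past the end.
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(list(series[start:start + n_in]))
        targets.append(list(series[start + n_in:start + n_in + n_out]))
    return inputs, targets


def recursive_forecast(
    history: Sequence[float],
    one_step: Callable[[List[float]], float],
    n_in: int,
    n_out: int,
) -> List[float]:
    """Forecast n_out steps by feeding each one-step prediction back as input.

    Every prediction replaces an observed value in the input window, so an error
    made at one step is carried into all later steps.
    """
    window = list(history[-n_in:])
    preds: List[float] = []
    for _ in range(n_out):
        y_hat = one_step(window)
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the prediction
    return preds


if __name__ == "__main__":
    series = [float(v) for v in range(10)]  # toy series 0.0, 1.0, ..., 9.0
    X, Y = make_windows(series, n_in=4, n_out=3)
    print(len(X), X[0], Y[0])  # 4 [0.0, 1.0, 2.0, 3.0] [4.0, 5.0, 6.0]

    def biased_model(window: List[float]) -> float:
        # Hypothetical one-step model that overestimates the true step of 1.0 by 0.1.
        return window[-1] + 1.1

    preds = recursive_forecast(series, biased_model, n_in=4, n_out=3)
    truth = [10.0, 11.0, 12.0]
    print([round(p - t, 6) for p, t in zip(preds, truth)])  # [0.1, 0.2, 0.3]
```

The error of the biased toy model grows by 0.1 per step because each step starts from the previous, already wrong, prediction. A direct multi-output model trained on the `make_windows` pairs predicts all `n_out` values at once and avoids this feedback loop, though its accuracy for distant points can still degrade.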

Keywords

data science;artificial intelligence;time series analysis;time series processing;multi-step time series forecasting;long short-term memory neural networks;online advertising;computer science;computer and information science;master's theses;

Data

Language: Slovenian
Year of publishing:
Typology: 2.09 - Master's Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [N. Pirnovar]
UDC: 004(043.2)
COBISS: 1538494659

Other data

Secondary language: English
Secondary title: Multi-step time series forecasting with long short-term memory neural networks
Secondary abstract: Future values of time series can be predicted using both statistical approaches and machine learning. After years of research, the latter offers numerous techniques, protocols, and predictive models for predicting one or several future values of data collected over time. Time series prediction usually refers to predicting a single point, as predicting more than one future time point is a more complex problem: when predicting more than one point, we must deal with a greater accumulation of prediction errors. It is generally accepted that the more points we predict ahead, the greater the prediction error for the more distant points. In the thesis, we searched for optimal combinations of time series processing techniques and parameterizations of different predictive models. We predicted several points ahead for events of different frequencies, based on online advertising data from social networks. We focused on various architectures of long short-term memory (LSTM) neural networks. We compared the predictions of the neural network techniques with those of the statistical ARIMA model, which is well established for time series prediction, and with those of the XGBoost regression model. We used XGBoost because it has recently given very good results in competitions across numerous fields of machine learning. We hypothesized that LSTM architectures would provide the most accurate predictions. Based on the experiments and analysis of the results, we established that, as expected, LSTM neural networks give the most accurate predictions for events of all frequencies. In terms of accuracy, only XGBoost predictions come close to those of the LSTM networks, mainly for frequent events, while ARIMA predictions are on average the least accurate. Data processing plays an important role in the accuracy of the results.
Logarithmic transformation and normalization performed well for all window sizes, and for larger predictive windows, removal of seasonality performed especially well. We also established that, within individual groups, the predictive models do not lose accuracy as the predictive window size increases.
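The preprocessing steps named in the abstract (logarithmic transformation, normalization, removal of seasonality) can be sketched as below. This is an illustration under assumptions, not the thesis pipeline: the order of the steps (log, then seasonal differencing, then min-max scaling), the use of `log1p` so that zero event counts stay defined, seasonal differencing as the way of removing seasonality, and the names `preprocess`, `postprocess` and `period` are all hypothetical. The inverse function shows how scaled model outputs would be mapped back to event counts.

```python
import math
from typing import List, Sequence, Tuple


def preprocess(series: Sequence[float], period: int) -> Tuple[List[float], float, float]:
    """Log-transform, remove seasonality by seasonal differencing, then min-max scale.

    Returns the scaled series (`period` values shorter than the input) together
    with the offset and scale needed to undo the normalization.
    """
    if period < 1 or len(series) <= period:
        raise ValueError("need a positive period shorter than the series")
    logged = [math.log1p(x) for x in series]  # log(1 + x) keeps zero counts defined
    # Seasonal difference: each value minus the value one season earlier.
    diffs = [logged[t] - logged[t - period] for t in range(period, len(logged))]
    lo, hi = min(diffs), max(diffs)
    scale = (hi - lo) or 1.0  # avoid division by zero when all differences are equal
    return [(d - lo) / scale for d in diffs], lo, scale


def postprocess(
    scaled: Sequence[float], seed: Sequence[float], lo: float, scale: float, period: int
) -> List[float]:
    """Invert preprocess for values that directly follow the last `period` raw values of seed."""
    if len(seed) < period:
        raise ValueError("seed must contain at least one full season")
    logged = [math.log1p(x) for x in seed[-period:]]
    for s in scaled:
        d = s * scale + lo  # undo min-max scaling
        logged.append(d + logged[-period])  # undo seasonal differencing
    return [math.expm1(v) for v in logged[period:]]  # undo log(1 + x)


if __name__ == "__main__":
    weekly = [10, 12, 15, 14, 13, 30, 35]  # hypothetical daily counts with a weekly pattern
    series = [weekly[i % 7] + i for i in range(28)]  # pattern plus a slow upward trend
    scaled, lo, scale = preprocess(series, period=7)
    print(len(scaled), min(scaled), max(scaled))  # 21 0.0 1.0
    restored = postprocess(scaled, series[:7], lo, scale, period=7)
    print(all(abs(r - s) < 1e-9 for r, s in zip(restored, series[7:])))  # True
```

In a forecasting setting, `seed` would be the last `period` observed counts and `scaled` the model's outputs for the forecast window. The normalization constants `lo` and `scale` should be fitted on training data only, so that no information from the test period leaks into the model.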
Secondary keywords: data science;artificial intelligence;time series analysis;time series preparation;multi-step time series forecasting;long short-term memory neural networks;online advertising;computer science;computer and information science;master's degree;
Type (COBISS): Master's thesis/paper
Study programme: 1000471
Embargo end date (OpenAIRE): 1970-01-01
Thesis comment: Univ. of Ljubljana, Fac. of Computer and Information Science
Pages: 137 p.
ID: 11332762