Strojno učenje v porazdeljenem okolju z uporabo paradigme MapReduce

Master's thesis

Roman Orač (Author), Marko Robnik Šikonja (Mentor), Nada Lavrač (Co-mentor)

Abstract

Keywords

MapReduce;distributed computing;Disco;machine learning;summation form;DiscoMLL;distributed random forests;ClowdFlows;computer science;computer and information science;master's theses;

Data

Language:	Slovenian
Year of publishing:	2014
Typology:	2.09 - Master's Thesis
Organization:	UL FRI - Faculty of Computer and Information Science
Publisher:	[R. Orač]
UDC:	004.85(043.2)
COBISS:	1536017347

Other data

Secondary language:	English
Secondary title:	Machine learning algorithms in distributed environment with MapReduce paradigm
Secondary abstract:	Implementing machine learning algorithms in a distributed environment offers several advantages, such as the ability to process large datasets and linear speedup with additional processing units. We describe the MapReduce paradigm, which enables distributed computing, and the Disco framework, which implements it. We present the summation form, which is a condition for the efficient implementation of algorithms in the MapReduce paradigm, and describe the implementations of selected algorithms. We propose novel distributed random forest algorithms that build models on subsets of the dataset. We compare the running time and accuracy of the algorithms with those of well-established data analytics tools. We conclude the thesis by describing the integration of the implemented algorithms into the ClowdFlows platform, a web platform for the construction, execution, and sharing of interactive data mining workflows. This integration enables the processing of large batch data through visual programming.
Secondary keywords:	MapReduce;distributed computing;Disco;machine learning;DiscoMLL;distributed random forest;ClowdFlows;computer science;computer and information science;master's degree;
File type:	application/pdf
Type (COBISS):	Master's thesis/paper
Study programme:	1000471
Thesis comment:	University of Ljubljana, Faculty of Computer and Information Science
Pages:	123 pp.
ID:	8739557