Undergraduate thesis

Abstract

Determining the consistency of survey responses with machine learning

Keywords

machine learning;surveying;web surveying;computer science;computer and information science;higher professional study programme;diploma theses;

Data

Language: Slovenian
Year of publishing:
Typology: 2.11 - Undergraduate Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [D. Ognjenović]
UDC: 004.85(043.2)
COBISS: 8632660

Other data

Secondary language: English
Secondary title: Analysis of survey consistency with machine learning
Secondary abstract: We investigated the quality of survey responses. It is generally unknown whether answers truly reflect the opinions of interviewees. We believe that inconsistent respondents can be detected with machine learning techniques. Our idea is to build a prediction model for every question of a survey. With these models, we obtain a probability distribution for every answer in the survey. We use cross-validation to obtain distributions for all instances. We evaluate them with the Brier score, information score, probabilities, classification accuracy, Brier ranking, information score ranking, probability ranking and classification accuracy ranking. We merge these scores to get an inconsistency score for every instance (interviewee) of the survey. We visualize the inconsistent cases for better comprehension. We developed the method in the statistical system R with the packages CORElearn [14], MASS [20] and rpart [16]. For visualization we used the package CORElearn and the data mining software Orange [5]. For testing we used the data sets Monk, B2B, B2C, DPS and Hearing aid. As prediction models we mostly used random forests because of their superb accuracy. Missing values were imputed with k-nearest neighbors (kNN), the mode or the mean, or the instance was simply removed from the data. We generated inconsistent data and tried to identify these cases. There was some variance in our inconsistency scores, so we reduced it by averaging the scores. For better comprehension and identification, we plotted the cases that were identified as inconsistent. The results depend on the data and the evaluation method. The Brier score, probabilities, Brier ranking and probability ranking identified all inconsistent instances (interviewees) in most cases. Other methods sometimes failed to identify inconsistent cases. The approach is computationally demanding for larger data sets.
Secondary keywords: survey;machine learning;Brier score;information score;rank;probabilities;R;CORElearn;MASS;rpart;Orange;Monk;B2B;B2C;DPS;Hearing aid;random forest
File type: application/pdf
Type (COBISS): Bachelor thesis/paper
Thesis comment: Univ. of Ljubljana, Faculty of Computer and Information Science
Pages: 56 pp.
ID: 23936526
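
Illustrative sketch (not part of the thesis): the scoring step described in the secondary abstract might look roughly like the Python below. The thesis itself was implemented in R, so this is a stand-in under several assumptions. The predicted distributions are taken as given (in the thesis they come from cross-validated per-question models, which are not reproduced here), the multi-class form of the Brier score is assumed, the merge is assumed to be a plain average over questions, and all function names and data are hypothetical.

```python
from statistics import mean


def brier_score(dist, answer):
    """Multi-class Brier score of one predicted distribution.

    dist maps each answer label to its predicted probability;
    answer is the label the respondent actually gave.
    """
    return sum(
        (p - (1.0 if label == answer else 0.0)) ** 2
        for label, p in dist.items()
    )


def inconsistency_scores(predictions, answers):
    """Average the per-question Brier scores of each respondent.

    predictions[r][q] is the predicted distribution for respondent r
    on question q; answers[r][q] is the answer actually given.
    Averaging is an assumed merge rule, not the thesis's exact one.
    """
    return [
        mean(brier_score(dist, given) for dist, given in zip(pred_row, ans_row))
        for pred_row, ans_row in zip(predictions, answers)
    ]


def rank_by_inconsistency(scores):
    """Return respondent indices ordered from most to least inconsistent."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical data: 3 respondents, 2 questions with answers "a" or "b".
predictions = [
    [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}],
    [{"a": 0.8, "b": 0.2}, {"a": 0.7, "b": 0.3}],
    [{"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}],
]
answers = [["a", "b"], ["b", "b"], ["a", "a"]]

scores = inconsistency_scores(predictions, answers)
ranking = rank_by_inconsistency(scores)
```

In this toy data the second respondent (index 1) gives answers that the models consider unlikely on both questions, so that respondent receives the highest averaged Brier score and is ranked first as a candidate for inconsistency.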