Diploma thesis

Abstract

Determining the consistency of survey responses with machine learning

Keywords

machine learning;surveying;web surveying;computer science;computer and information science;professional higher education studies;diploma theses

Details

Language: Slovenian
Year of publication:
Typology: 2.11 - Diploma thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [D. Ognjenović]
UDC: 004.85(043.2)
COBISS: 8632660

Other details

Secondary language: English
Secondary title: Analysis of survey consistency with machine learning
Secondary abstract: We investigated the quality of survey responses. We cannot know whether the answers really reflect the opinions of the respondents. We believe that inconsistent respondents can be detected with machine learning techniques. Our idea is to build a prediction model for every question in a survey. The models give us a probability distribution for every answer in the survey, and we use cross-validation to obtain these distributions for all instances. We evaluate them with the Brier score, information score, probabilities, classification accuracy, Brier ranking, information score ranking, probability ranking, and classification accuracy ranking. We merge these scores into an inconsistency score for every instance (respondent) of the survey, and we visualize the inconsistent cases for better comprehension. We developed the method in the statistical system R with the packages CORElearn [14], MASS [20], and rpart [16]. For visualization we used the CORElearn package and the data mining software Orange [5]. For testing we used the Monk, B2B, B2C, DPS, and hearing aid data sets. As prediction models we mostly used random forests because of their superb accuracy. Missing values were imputed with k-nearest neighbors (kNN), the mode, or the mean, or the instance was simply removed from the data. We generated inconsistent data and tried to identify these cases. Since our inconsistency scores showed some variance, we reduced it by averaging the scores. For better comprehension and identification, we plotted the cases that were identified as inconsistent. The results depend on the data and the evaluation method. The Brier score, probabilities, Brier ranking, and probability ranking identified all inconsistent instances (respondents) in most cases, while the other methods sometimes failed to identify inconsistent cases. The approach is computationally demanding for larger data sets.
Secondary keywords: survey;machine learning;Brier score;information score;rank;probabilities;R;CORElearn;MASS;rpart;Orange;Monk;B2B;B2C;DPS;Hearing aid;random forest
File type: application/pdf
Type of work (COBISS): Diploma thesis
Note: Univ. of Ljubljana, Fac. of Computer and Information Science
Pages: 56 pp.
ID: 23936526
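The secondary abstract outlines a pipeline with four parts. A prediction model is built for each survey question, and cross-validation supplies out-of-fold class probabilities. Each answer is then scored, for example with the Brier score, and the answer scores are combined into a per-respondent inconsistency score, which is averaged over repeated runs to reduce variance. The Python sketch below illustrates that idea under stated assumptions. It is not the author's R implementation. It replaces the random forests with a categorical naive Bayes model with Laplace smoothing so that only the standard library is needed. It uses only the Brier score (the other measures and their merging are left out) and assumes complete data, i.e. missing values already imputed or the rows removed. All function names (`nb_proba`, `brier`, `inconsistency_scores`, `rank_by_inconsistency`) are hypothetical.

```python
import random
from collections import Counter


def nb_proba(train, target, x, classes, alpha=1.0):
    """Class distribution for column `target` of row `x`, learned from `train`.

    Hypothetical stand-in model: categorical naive Bayes with Laplace
    smoothing, used here instead of the random forests from the thesis.
    """
    counts = Counter(row[target] for row in train)
    scores = {}
    for c in classes:
        rows_c = [r for r in train if r[target] == c]
        p = (counts[c] + alpha) / (len(train) + alpha * len(classes))
        for j, v in enumerate(x):
            if j == target:
                continue  # the answer being predicted is not a feature
            n_values = len({r[j] for r in train} | {v})
            match = sum(1 for r in rows_c if r[j] == v)
            p *= (match + alpha) / (len(rows_c) + alpha * n_values)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def brier(probs, true_class):
    """Brier score of one prediction: 0 is perfect, 2 is the worst possible."""
    return sum((p - (1.0 if c == true_class else 0.0)) ** 2 for c, p in probs.items())


def inconsistency_scores(data, k=5, repeats=3, seed=0):
    """Mean out-of-fold Brier score per respondent, averaged over `repeats` CV runs."""
    n, m = len(data), len(data[0])
    totals = [0.0] * n
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for q in range(m):  # one model per survey question
            classes = sorted({row[q] for row in data})
            for fold in folds:
                held_out = set(fold)
                train = [data[i] for i in range(n) if i not in held_out]
                for i in fold:
                    probs = nb_proba(train, q, data[i], classes)
                    totals[i] += brier(probs, data[i][q])
    return [t / (m * repeats) for t in totals]


def rank_by_inconsistency(scores):
    """Respondent indices ordered from most to least inconsistent."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy survey: 20 respondents answer four related questions identically,
    # the last respondent contradicts themselves.
    answers = [("yes",) * 4] * 10 + [("no",) * 4] * 10 + [("yes", "no", "yes", "no")]
    scores = inconsistency_scores(answers)
    print(rank_by_inconsistency(scores)[0])  # prints 20
```

In this toy data every respondent except the last answers all four questions the same way, so each per-question model predicts the last respondent's answers poorly, and that respondent (index 20) ranks first. In a real survey a better model than naive Bayes would usually be needed, and the Brier score could be combined with other per-answer measures, as the abstract describes.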