Generiranje anonimiziranih statističnih vzorcev iz zdravstvenih podatkovnih zbirk

master's thesis

Martin Arsovski (Author), Andrej Brodnik (Mentor), Janez Žibert (Co-mentor)

Abstract

The study of data from patients' medical examinations is both popular and useful: it can advance modern medicine and improve the quality of health services. Today, probably every hospital maintains health databases on its patients, containing a great deal of private data on patients, treatments, procedures, laboratory results and so on. Using these data for medical research and analysis requires permission from hospitals and other institutions, which is an obstacle for researchers. Such analyses can also be costly in both money and time. In addition, the data must be anonymized and prepared so that they preserve the statistical properties of the original database. In this master's thesis we will review and present several methods for generating synthetic data from real data. We will select and implement some of the best methods from the literature and use them to generate synthetic data. The sample generation procedures will be evaluated by comparing the statistical properties of the generated sample with those of the population. Based on this evaluation, we will assess which methods of generating synthetic data are the most successful.

Keywords

population sampling;data anonymization;health informatics;statistics;synthetic data;computer and information science;master's theses;

Data

Language:	Slovenian
Year of publishing:	2023
Typology:	2.09 - Master's Thesis
Organization:	UL FRI - Faculty of Computer and Information Science
Publisher:	[M. Arsovski]
UDC:	004.65:61(043.2)
COBISS:	178836995

Other data

Secondary language:	English
Secondary title:	Generation of anonymized statistical samples from health databases
Secondary abstract:	Studying data from patients' medical examinations is both popular and useful: it can advance modern medicine and improve the quality of health services. Today, probably every hospital maintains medical databases on its patients, containing a great deal of private data about patients, treatments, interventions, laboratory results, etc. Using these data for medical research and analysis, however, requires permission from hospitals and other institutions, which is an obstacle for researchers. Such analyses can also be costly in both money and time. In addition, the data must be anonymized and prepared so that they preserve the statistical properties of the original database. In this master's thesis, we will review and present several methods for generating synthetic data from real data. Based on the review, we will select some of the best methods from the literature and implement them. We will use the implemented methods to generate synthetic data. The sample generation procedures will be evaluated by comparing the statistical properties of the generated sample with those of the population. Based on this evaluation, we will assess which methods of generating synthetic data are the most successful.
Secondary keywords:	population sampling;data anonymization;health informatics;statistics;synthetic data;computer science;computer and information science;master's degree;databases;health care;medical informatics;university and higher-education theses;
Type (COBISS):	Master's thesis/paper
Study programme:	1000471
Thesis comment:	University of Ljubljana, Faculty of Computer and Information Science
Pages:	92 pp.
ID:	21843139