master's thesis
Veronika Dolšak (Author), Mario Gorenjak (Mentor), Uroš Potočnik (Co-mentor)

Abstract

Background: The development of next-generation sequencing technology has greatly accelerated the generation of large volumes of sequencing data that require further bioinformatic analysis, and the number of software tools for processing these data has grown rapidly as a result. Bioconductor software packages, designed for the R programming environment, are a common choice for RNA sequencing (RNA-seq) analysis because they provide a complete workflow for identifying differentially expressed genes and pathways. New versions of R are released frequently, and differences in performance are observed in practice, which may affect the comparability of RNA-seq results obtained with different versions of R. Methods: We analyzed raw RNA-seq data with the Bioconductor tools Rsubread, edgeR, and limma in several versions of the R programming environment: R 3.5, R 3.6, R 4.0, R 4.1, and R 4.2. Results: Comparisons of alignment efficiency with Rsubread show statistically significant differences between R 4.2 and the other versions of R. Statistically significant differences also appear in the results of differential gene expression analysis obtained with the same command pipeline, both between R 4.2 and the other versions and between R 3.5 and the other versions. Discussion: Based on the results, we conclude that RNA-seq data should be analyzed with the latest version of the R programming environment and the latest versions of the Bioconductor tools, which is particularly important when performing a meta-analysis of RNA-seq data from different independent studies.

Keywords

RNA sequencing;differential gene expression;R;bioinformatics;

Data

Language: Slovenian
Year of publishing:
Typology: 2.09 - Master's Thesis
Organization: UM - University of Maribor
Publisher: [V. Dolšak]
UDC: 575.112(043.2)
COBISS: 158635011

Other data

Secondary language: English
Secondary title: Comparison of performance efficiency and reproducibility of RNA-seq bioinformatics analyses between different upgrades of R software environment
Secondary abstract: Background: The development of next-generation sequencing technology has greatly increased the speed at which large amounts of sequencing data are obtained, and these data need further bioinformatics analysis. Consequently, the number of software tools for processing such data has also grown rapidly. A common choice for analyzing RNA sequencing (RNA-seq) data to discover differentially expressed genes and pathways is the set of Bioconductor software packages, which provide a complete analysis and are designed to work in the R programming environment. The R programming environment is upgraded frequently, and differences in efficiency are observed in practice as a result, which may affect the comparability of RNA-seq results obtained with different versions of R. Methods: We analyzed raw RNA-seq data using the Bioconductor software tools Rsubread, edgeR, and limma in different versions of the R programming environment: R 3.5, R 3.6, R 4.0, R 4.1, and R 4.2. Results: Comparisons of alignment efficiency with the Rsubread software tool show statistically significant differences between R 4.2 and the other versions of R. There are also statistically significant differences in the results of differential gene expression analysis obtained with the same pipeline of commands between R 4.2 and the other versions of R, as well as between R 3.5 and the other versions of R. Discussion: Based on the results, we found that RNA-seq data should be analyzed with the latest version of the R programming environment and the latest versions of the Bioconductor software tools, which is of particular importance when performing a meta-analysis of RNA-seq data from different independent studies.
Secondary keywords: RNA sequencing;differential gene expression;R;Medical informatics;RNA;
Type (COBISS): Master's thesis/paper
Thesis comment: Univ. of Maribor, Faculty of Health Sciences
Pages: 1 online resource (1 PDF file (XII, 60 pp.))
ID: 19019713