(master's thesis)
Nejc Haberman (Author), Gregor Štiglic (Mentor), Jernej Ule (Co-mentor)

Abstract

RNA-binding proteins play an important role in regulating post-transcriptional processes and are key regulators of gene expression. CLIP (UV cross-linking and immunoprecipitation) and iCLIP (individual-nucleotide resolution CLIP) are the most suitable methods for studying RNA-binding proteins in living cells. RNA-seq has become the standard method for measuring gene expression across the whole transcriptome. In our study we investigated the most suitable way to normalize CLIP data with RNA-seq data. The main problem is that a CLIP/iCLIP method on its own identifies more binding sites in highly expressed regions of RNA, which consequently mask binding sites in lowly expressed regions. RNA-seq data are therefore needed to normalize the binding sites in CLIP/iCLIP data. To this end we designed and implemented several methods for normalizing CLIP and RNA-seq data and compared them with previously known results for the LIN28A protein. We included the Piranha software tool, which already provides normalization with the ZTNBR (zero-truncated negative binomial regression) model, but this did not prove to be a suitable solution to our problem. We developed a hybrid method that improves on the published method for LIN28A, which uses a simple normalization by the sum of RNA-seq reads. The hybrid method uses the ZTNB (zero-truncated negative binomial) statistical model, which is part of the Piranha software tool, to identify significant binding sites. For all of the normalization methods described, we developed a pipeline of scripts and bioinformatics tools for comparative analysis of the methods. We also developed a software pipeline for pre-processing iCLIP data, which can then be used in these normalizations. All methods will be freely available online and can be used in future studies of RNA-binding proteins with CLIP/iCLIP and other related sequencing methods.
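
The expression bias described in the abstract, and the two ingredients of the hybrid approach (normalization by RNA-seq read counts and a ZTNB significance test), can be illustrated with a minimal sketch. This is not the thesis code and not Piranha's implementation: the function names, the pseudocount, and the fixed distribution parameters `r` and `p` are assumptions chosen for illustration, and in practice the parameters would be fitted to the observed read counts.

```python
import math


def normalize_by_expression(clip_counts, rnaseq_counts, pseudocount=1.0):
    """Divide each gene's CLIP read count by its RNA-seq read count.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no RNA-seq reads.
    """
    return {
        gene: clip_counts[gene] / (rnaseq_counts.get(gene, 0) + pseudocount)
        for gene in clip_counts
    }


def nb_pmf(k, r, p):
    """Negative binomial pmf: k failures before r successes, success prob. p."""
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def ztnb_pmf(k, r, p):
    """Zero-truncated negative binomial pmf, defined for k >= 1."""
    if k < 1:
        return 0.0
    return nb_pmf(k, r, p) / (1.0 - nb_pmf(0, r, p))


def ztnb_upper_tail(k, r, p):
    """P(X >= k | X > 0): an upper-tail p-value for an observed count k."""
    if k <= 1:
        return 1.0
    below = sum(ztnb_pmf(j, r, p) for j in range(1, k))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Hypothetical counts: gene A is highly expressed, gene B is not.
    clip = {"A": 100, "B": 10}
    rnaseq = {"A": 999, "B": 9}
    print(normalize_by_expression(clip, rnaseq))

    # Hypothetical, hand-picked parameters; real ones would be fitted.
    print(ztnb_upper_tail(12, r=2.0, p=0.5))
```

With raw counts, gene A appears to be bound ten times more strongly than gene B; after dividing by expression, B scores higher than A, which is the masking effect the normalization is meant to correct. Zero truncation matters because a region with no reads is never reported as a candidate site, so the count distribution is only observed for values of one or more.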

Keywords

normalization;RNA-binding proteins;CLIP;iCLIP;RNA-seq;LIN28A;bioinformatics tools;gene enrichment;

Data

Language: Slovenian
Year of publishing:
Typology: 2.09 - Master's Thesis
Organization: UM FZV - Faculty of Health Sciences
Publisher: [N. Haberman]
UDC: 577
COBISS: 1916836
Views: 1680
Downloads: 169

Other data

Secondary language: English
Secondary title: Binding site identification and gene ranking based on CLIP and RNAseq data
Secondary abstract: RNA-binding proteins (RBPs) are key players in post-transcriptional processes and have a major role in the regulation of gene expression. CLIP (UV cross-linking and immunoprecipitation) and iCLIP (individual-nucleotide resolution CLIP) are the most appropriate methods for studying protein-RNA interactions in living cells. RNA-seq has become the standard method for measuring gene expression across the transcriptome. In our study, we searched for the most appropriate way to normalize CLIP data and integrate them with matched RNA-seq data. Normalization is needed mainly because the number of reads detected by CLIP/iCLIP methods depends on the expression level of each mRNA, so RNA-seq data are required to normalize the CLIP/iCLIP data. We developed several algorithms for data normalization and tested them against a published method and data sets for the LIN28A protein, using CLIP and RNA-seq data. We included the Piranha software, which integrates CLIP and RNA-seq normalization through a ZTNBR (zero-truncated negative binomial regression) model, and found that ZTNBR is not suitable for the described problem. We devised a hybrid method that extends the previous simple normalization method by using the statistical ZTNB (zero-truncated negative binomial) model, which is also part of the Piranha software, to identify significant binding sites. For all normalization methods we developed a pipeline of scripts and bioinformatics tools for comparative analysis of the different types of normalization. We also developed a pre-processing pipeline for iCLIP data, which can then be used in the same normalizations. All the methods will be freely available online, and we expect them to be used frequently in future studies of RNA-binding proteins based on CLIP and related sequencing techniques.
Secondary keywords: normalization;RNA-binding proteins;CLIP;iCLIP;RNA-seq;LIN28A;bioinformatics tools;gene enrichment;
URN: URN:SI:UM:
Type (COBISS): Master's thesis/paper
Thesis comment: Univ. v Mariboru, Fak. za zdravstvene vede
Pages: VIII, 65 pp.
ID: 8725859