Master's thesis
Patrik Kojanec (Author), Tomaž Curk (Mentor), Guido Sanguinetti (Co-mentor)

Abstract

Cell-to-cell variability in gene expression is often associated with cell differentiation during embryonic development and with the emergence of cancer. Although part of the variability observed in single-cell RNA sequencing (scRNA-seq) experiments stems from technical noise, a substantial proportion is attributed to biological processes within the cell. In this Master's thesis, we propose a novel approach to predict both the mean expression and the cell-to-cell variability of gene expression directly from the DNA sequence. We use Enformer, a transformer-based deep learning model, to embed DNA sequences into a feature space better suited to prediction, and from these embeddings we use linear models to predict the mean expression and the overdispersion of scRNA-seq gene expression. We evaluated the approach on mouse and human datasets generated with two different scRNA-seq protocols. Our approach explains up to 60% of the variance of overdispersion in the mouse dataset and up to 25% in the human dataset. We also examine how differences between the scRNA-seq protocols affect the performance of our models.
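As a rough illustration of the two quantities the abstract refers to, the sketch below assumes a negative-binomial model in which a gene's variance is μ + φμ², estimates the overdispersion φ by the method of moments, and regresses log-overdispersion on per-gene sequence embeddings with a closed-form ridge regression (one kind of linear model), reporting held-out R² as the "variance explained". The estimator, the ridge model, the helper names (`nb_moments`, `fit_ridge`, `r2_score`) and the random matrix standing in for Enformer embeddings are all assumptions for illustration, not the thesis's actual pipeline.

```python
import numpy as np


def nb_moments(counts):
    """Per-gene mean and method-of-moments negative-binomial overdispersion.

    counts: array of shape (n_cells, n_genes) with raw counts.
    Assumes Var = mu + phi * mu**2 (an assumption, not the thesis's estimator).
    phi is clipped at 0 when the sample variance does not exceed the mean,
    and set to NaN for genes with zero mean.
    """
    mu = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = (var - mu) / mu**2
    phi = np.where(mu > 0, np.clip(phi, 0.0, None), np.nan)
    return mu, phi


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def r2_score(y_true, y_pred):
    """Fraction of variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_genes, n_cells, d = 500, 2000, 16
    # Hypothetical stand-in for per-gene Enformer embeddings (random, not real).
    emb = rng.normal(size=(n_genes, d))
    log_mu = 0.5 + emb @ rng.normal(scale=0.2, size=d)
    log_phi = (-1.0 + emb @ rng.normal(scale=0.3, size=d)
               + rng.normal(scale=0.3, size=n_genes))
    mu_true, phi_true = np.exp(log_mu), np.exp(log_phi)
    # numpy's NB(n, p) has mean n(1-p)/p; with n = 1/phi and p = n/(n+mu)
    # this gives mean mu and variance mu + phi * mu**2.
    n = 1.0 / phi_true
    p = n / (n + mu_true)
    counts = rng.negative_binomial(n, p, size=(n_cells, n_genes))

    mu_hat, phi_hat = nb_moments(counts)
    keep = phi_hat > 0  # the log requires strictly positive overdispersion
    X, y = emb[keep], np.log(phi_hat[keep])
    split = int(0.8 * len(y))
    w, b = fit_ridge(X[:split], y[:split], alpha=1.0)
    print("held-out R^2 for log overdispersion:",
          r2_score(y[split:], X[split:] @ w + b))
```

Working on the log scale keeps the regression target roughly symmetric, since overdispersion values span orders of magnitude; in real data the regularization strength would normally be chosen by cross-validation rather than fixed.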

Keywords

scRNA-seq; gene expression variability; deep learning; computer science; master's thesis

Data

Language: English
Year of publishing:
Typology: 2.09 - Master's Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [P. Kojanec]
UDC: 004.8:575(043.2)
COBISS: 124837891

Other data

Secondary language: Slovenian
Secondary title: Modeliranje variabilnosti genskega izražanja posameznih celic na podlagi sekvenc DNA (Modeling single-cell gene expression variability from DNA sequences)
Secondary abstract (translated from Slovenian): Gene expression variability is often associated with factors that regulate cell differentiation in the early stages of embryonic development or the formation of cancer cells. The gene expression variability of individual cells can be measured with scRNA-seq, but these measurements are very noisy due to technical limitations. In this Master's thesis, we present a novel approach for predicting gene expression variability from DNA sequences. We use the deep learning model Enformer, which embeds DNA sequences into a more effective feature space. Using linear models, we then predict the mean gene expression and the dispersion of scRNA-seq data from the sequence embeddings. We evaluated the proposed approach on data from two different organisms, obtained with two different scRNA-seq protocols. With the proposed approach, we can explain up to 60% of the variance of gene expression dispersion in the mouse dataset and 25% in the human dataset.
Secondary keywords (translated from Slovenian): scRNA-seq; gene expression variability; deep learning; master's theses; machine learning; data modeling (computer science); genetics; computer science; university and higher-education works
Type (COBISS): Master's thesis/paper
Study programme: 1000471
Embargo end date (OpenAIRE): 1970-01-01
Thesis comment: University of Ljubljana, Faculty of Computer and Information Science
Pages: VI, 55 pp.
ID: 16608103