Master's thesis
Luka Močivnik (Author), Peter Trontelj (Reviewer), Tomaž Skrbinšek (Mentor), Tomaž Skrbinšek (Thesis defence commission member), Peter Trontelj (Thesis defence commission member), Matej Butala (Thesis defence commission member)

Abstract

Third-generation sequencing technologies, nanopore technology in particular, enable rapid sequencing of long DNA sequences. Their drawback is a high error rate. In this master's thesis we present a bioinformatics pipeline for processing microsatellite sequences obtained with third-generation sequencing. To test the pipeline, we used microsatellite sequences obtained from non-invasive genetic samples of brown bear (Ursus arctos) and sequenced on an Illumina sequencer. In these sequences we simulated substitutions, insertions, and deletions in various combinations and at different total error rates. In addition to the 8 bp DNA sample tags already in use, we also tested tags 12 and 16 bp long. The pipeline proved effective when only substitutions were simulated, but not when all three error types were simulated. Nevertheless, we found that the currently used 8 bp tags are unusable at high error rates, especially when all three error types are simulated, and that successful sample identification requires longer tags, preferably 16 bp. We also found that problems can arise in the search for oligonucleotide primers and, consequently, in the identification of the loci they mark. We identified weak points in the pipeline and propose possible solutions. The presented bioinformatics pipeline is thus suitable as a basis for further work.
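
The kind of experiment the abstract describes (injecting substitutions, insertions, and deletions at a chosen total rate, then trying to recognise a sample tag) can be illustrated with a minimal Python sketch. This is not the thesis pipeline: the function names `simulate_errors`, `edit_distance`, and `assign_tag` are hypothetical, and matching tags by Levenshtein distance with a `max_dist` cutoff is an assumption made here for illustration only.

```python
import random

BASES = "ACGT"

def simulate_errors(seq, rate, rng, kinds=("sub", "ins", "del")):
    """Hypothetical helper: hit each base with probability `rate`,
    choosing one error type from `kinds` uniformly at random."""
    out = []
    for base in seq:
        if rng.random() >= rate:
            out.append(base)          # base left untouched
            continue
        kind = rng.choice(kinds)
        if kind == "sub":
            # replace with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            # keep the base and insert a random base after it
            out.append(base)
            out.append(rng.choice(BASES))
        # kind == "del": the base is simply dropped
    return "".join(out)

def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]

def assign_tag(observed, known_tags, max_dist):
    """Return the unique closest known tag within `max_dist`, else None.
    The cutoff and the tie rule are assumptions, not the thesis method."""
    scored = sorted((edit_distance(observed, t), t) for t in known_tags)
    best_dist, best_tag = scored[0]
    if best_dist > max_dist:
        return None                   # too far from every known tag
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None                   # ambiguous: two tags equally close
    return best_tag

if __name__ == "__main__":
    rng = random.Random(0)
    tags = ["ACGTACGT", "TTGACCAA", "GGCATTCG"]   # made-up 8 bp tags
    noisy = simulate_errors(tags[0], rate=0.15, rng=rng)
    print(noisy, assign_tag(noisy, tags, max_dist=2))
```

Under these assumptions, longer tags leave more room between tag sequences, so a noisy tag is less likely to land equally close to two known tags, which is consistent with the abstract's recommendation of longer tags at high error rates.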

Keywords

third-generation sequencing;sequencing with high error rates;microsatellites;short DNA sequences;

Data

Language: Slovenian
Year of publishing:
Typology: 2.09 - Master's Thesis
Organization: UL BF - Biotechnical Faculty
Publisher: [L. Močivnik]
UDC: 577.2(043.2)
COBISS: 55304195

Other data

Secondary language: English
Secondary title: Algorithm for reliable recognition of short DNA tag sequences in presence of high sequencing error rates
Secondary abstract: Third-generation sequencing technologies, especially nanopore sequencing, make it possible to sequence DNA quickly and obtain long reads. Their downside is a high error rate. In this thesis, we present a bioinformatics pipeline for processing microsatellite sequences obtained using third-generation sequencing. For testing, we used brown bear (Ursus arctos) microsatellite sequences obtained from non-invasive genetic samples and sequenced on the Illumina platform. In these sequences, we simulated substitutions, insertions, and deletions in various combinations and at different total error rates. Aside from the previously used 8 bp DNA tags for sample marking, we also tested longer 12 and 16 bp tags. Our bioinformatics pipeline was effective when dealing with substitutions only. It was ineffective when all three error types were simulated. Nonetheless, we found that the currently used 8 bp tags are not usable at high error rates, especially when all three error types are present, and that successful sample identification requires longer tags, preferably 16 bp. We also found issues with the primer search and, consequently, with the identification of the loci marked by the primers. We identified weak points in the pipeline and suggest possible solutions. The presented bioinformatics pipeline should therefore provide a useful basis for further work.
Secondary keywords: third generation sequencing;sequencing with high error rates;microsatellites;short DNA sequences;
Type (COBISS): Master's thesis/paper
Study programme: 0
Embargo end date (OpenAIRE): 1970-01-01
Thesis comment: Univ. Ljubljana, Biotehniška fak.
Pages: IX, 85, [6] f.
ID: 12608546