Master's thesis
Abstract
Computational technologies now support research into and the search for new drugs. Existing databases can, for example, be used to classify chemical structures. In this thesis we examined the quality of vector embeddings of chemical structures produced by an autoencoder in classification problems, where they could serve as an alternative to established fingerprints. The autoencoder architecture follows current research trends in using convolutional layers and gated recurrent units. We evaluated the quality of the embeddings on real-world databases of current active compounds. The experiments showed that the vector embeddings are comparable to existing fingerprints. In some cases the autoencoder provides vector representations of compounds that improve the accuracy of the machine learning techniques used. We also developed a widget for the open-source software Orange that embeds chemical structures given in SMILES notation, using either the method developed in the thesis or the other fingerprints used in the thesis.
Keywords
vector embeddings;autoencoder;classification;SMILES notation;computer science;computer and information science;master's theses;
Data
Language: Slovenian
Year of publishing: 2019
Typology: 2.09 - Master's Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [B. Golobič]
UDC: 004:544.188(043.2)
COBISS: 1538418883
Views: 703
Downloads: 180
Other data
Secondary language: English
Secondary title: Vector embedding of chemical compounds
Secondary abstract:
Recent developments in computational techniques have advanced drug discovery and design. For example, machine learning can use standard databases of known chemicals and their modes of action to classify new drugs. Here, we were interested in vectorized representations of the structure of small molecules, a crucial first step towards any data analytics in computational chemistry. The vectorized representations were inferred by constructing autoencoders. We followed current trends in the literature and used a combination of convolutional and recurrent layers. Experimental results show that our model is comparable to standard chemical fingerprints, and on some of the test databases it even provides improved accuracy. The code used to infer the embedder is published as open source on GitHub, and the embedder is included in the fingerprinting widget for the Orange data mining suite.
Secondary keywords: vector embeddings;autoencoder;classification;SMILES notation;computer science;computer and information science;master's degree;
Type (COBISS): Master's thesis/paper
Study programme: 1000471
Embargo end date (OpenAIRE): 1970-01-01
Thesis comment: Univ. of Ljubljana, Faculty of Computer and Information Science
Pages: 48 pp.
ID: 11244023