diploma thesis
Abstract
Structured and unstructured textual data require efficient representations for computation and manipulation. Many methods have been developed to represent text in numerical form. Some of these methods rely only on statistical metrics, while others introduce the concept of word context. Structured textual data about concepts and entities is stored in knowledge graphs, for which various numerical representations have also been developed. By using facts about concepts, semantics can be introduced into the representation of documents. We propose an approach that merges the numerical representation of texts with the numerical representations, induced from knowledge bases, of the entities that appear in those texts. We analyze the proposed method on two use cases. The results show that the use of external knowledge significantly improves the performance of machine learning models and that the proposed method outperforms non-enriched representations.
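The abstract does not say how the text and entity representations are merged. The sketch below shows one simple variant, given purely as an assumption: average the knowledge-graph embeddings of the entities linked in a document and concatenate that average to the document's text embedding. The function names, entity IDs, and toy vectors are hypothetical. Entity linking and the embedding models themselves are out of scope, and the thesis may combine the vectors differently.

```python
from typing import Dict, List, Sequence


def mean_vector(vectors: List[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean of equally sized vectors; a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def enrich_document_embedding(
    doc_vec: Sequence[float],
    entity_ids: List[str],
    entity_embeddings: Dict[str, Sequence[float]],
    kg_dim: int,
) -> List[float]:
    """Append the mean KG embedding of a document's linked entities to its text embedding.

    Hypothetical helper: assumes entity linking has already produced `entity_ids`
    and that `entity_embeddings` holds precomputed knowledge-graph vectors of size `kg_dim`.
    """
    # Entities that have no knowledge-graph embedding are skipped.
    known = [entity_embeddings[e] for e in entity_ids if e in entity_embeddings]
    # The zero-vector fallback keeps the output length fixed at len(doc_vec) + kg_dim.
    return list(doc_vec) + mean_vector(known, kg_dim)


if __name__ == "__main__":
    # Toy values for illustration only: a 2-d text vector and 3-d entity vectors.
    kg = {"Berlin": [1.0, 0.0, 2.0], "Germany": [3.0, 2.0, 0.0]}
    print(enrich_document_embedding([0.5, -0.5], ["Berlin", "Germany", "Unknown"], kg, kg_dim=3))
    # → [0.5, -0.5, 2.0, 1.0, 1.0]
```

Concatenation keeps the two sources in separate coordinates, so a downstream classifier can learn how much weight each one deserves. Plain averaging of entity vectors is only one choice. Weighted averages or separate pooling per entity type would fit the same interface.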
Keywords
knowledge graphs;word embedding;knowledge graph embedding;natural language processing;computer and information science;diploma thesis;
Data
Language: English
Year of publishing: 2020
Typology: 2.11 - Undergraduate Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [B. Koloski]
UDC: 004.85:81'322(043.2)
COBISS: 30743555
Other data
Secondary language: Slovenian
Secondary title: Obogatitev dokumentnih vložitev z grafi znanja (Enriching document embeddings with knowledge graphs)
Secondary abstract (translated from Slovenian): Structured and unstructured textual data require efficient representations for computation and processing. Many different methods have been developed to represent text in numerical form. Some of these methods rely solely on statistical metrics, while others introduce the concept of word context. Structured textual data about concepts and entities is stored in knowledge graphs, for which numerous numerical representations have been developed. By using facts about concepts, we can introduce semantics into the representation of documents. We propose an approach that combines the numerical representations of texts and of the knowledge-base entities that appear in those texts. We analyze the proposed method on two use cases. The results show that the use of external knowledge substantially improves the performance of machine learning models. Furthermore, we show that the proposed method outperforms non-enriched representations.
Secondary keywords (translated from Slovenian): data graphs;word vector embeddings;data graph embeddings;natural language processing;computer and information science;university studies;diploma theses;
Type (COBISS): Bachelor thesis/paper
Study programme: 1000468
Embargo end date (OpenAIRE): 1970-01-01
Thesis comment: Univ. of Ljubljana, Fac. of Computer and Information Science
Pages: 54 pp.
ID: 12033042