Language: | Slovenian |
---|---|
Year of publishing: | 2011 |
Typology: | 1.16 - Independent Scientific Component Part or a Chapter in a Monograph |
Organization: | UL FF - Faculty of Arts |
UDC: | 801.8=163.6:81'322.2:004.738.5 |
COBISS: | 47262818 |
Secondary language: | English |
---|---|
Secondary abstract: | This paper presents a new method for definition extraction from Slovene domain-specific corpora, based on a model for definition classification learned using machine-learning methods on examples from Slovene Wikipedia. In the first step we extract definition candidates using a Slovene semantic lexicon, automatic terminology recognition and lexico-syntactic patterns. Next, we use the learned classification model to select "true" definitions from the set of definition candidates. The method was tested on a natural science domain corpus from which we extracted more than a thousand definition candidates and achieved up to 70% classification accuracy with the learned classification model. |
Secondary keywords: | corpus linguistics; Slovene language; definition extraction; information extraction; natural language processing; machine learning; information retrieval |
Type (COBISS): | Article |
Pages: | pp. 145-150 |
ID: | 19892444 |