Language: | Slovene |
---|---|
Year of publication: | 2011 |
Typology: | 1.16 - Independent scientific component part or a chapter in a monograph |
Organization: | UL FF - Filozofska fakulteta (Faculty of Arts) |
UDC: | 801.8=163.6:81'322.2:004.738.5 |
COBISS: | 47262818 |
Views: | 5 |
Downloads: | 0 |
Metadata: |
Secondary language: | English |
---|---|
Secondary abstract: | This paper presents a new method for definition extraction from Slovene domain-specific corpora, based on a model for definition classification learned using machine-learning methods on examples from Slovene Wikipedia. In the first step, we extract definition candidates using a Slovene semantic lexicon, automatic terminology recognition and lexico-syntactic patterns. Next, we use the learned classification model to select "true" definitions from the set of definition candidates. The method was tested on a natural science domain corpus from which we extracted more than a thousand definition candidates and achieved up to 70% classification accuracy with the learned classification model. |
Secondary keywords: | corpus linguistics;Slovene language;definition extraction;information extraction;natural language processing;machine learning;information retrieval |
Type of work (COBISS): | Journal article |
Pages: | pp. 145-150 |
ID: | 19892444 |