Janez Brezovnik (Author), Milan Ojsteršek (Author)

Abstract

This paper describes TextProc, a natural language processing framework. It first presents the framework's software architecture, which consists of several parts, each described in detail. Natural language processing capabilities are implemented as software plug-ins, which can be combined into processes that perform a practical natural language processing function. Several such TextProc processes are briefly described, including part-of-speech tagging and named entity tagging. One of them performs plagiarism detection on texts in the Slovenian language and is explained in detail. This process is in use in the digital library of the University of Maribor, and the integration of the digital library with TextProc is also briefly described. The paper closes with some ideas for future development.

Keywords

natural language processing;text processing;text mining;Slovenian language;plagiarism detection;

Data

Language: English
Year of publishing: 2011
Typology: 1.01 - Original Scientific Article
Organization: UM FERI - Faculty of Electrical Engineering and Computer Science
UDC: 004.777
COBISS: 14856982
ISSN: 2074-1316

Other data

Secondary language: English
Secondary keywords: procesiranje naravnih jezikov;tekstovno procesiranje;detekcija plagiatov;slovenski jezik;
URN: URN:SI:UM:
Pages: pp. 293-300
Volume: Vol. 5
Issue: iss. 3
Chronology: 2011
ID: 8718519