Undergraduate thesis
Matic Di Batista (Author), Marko Bajec (Mentor)

Abstract

Označevanje imenskih entitet v pravnih besedilih (Named entity tagging in legal texts)

Keywords

named entity recognition;morphosyntactic tags;conditional random fields;Stanford CoreNLP;CRFsuite;computer science;university study programme;diploma theses;

Data

Language: Slovenian
Year of publishing:
Typology: 2.11 - Undergraduate Thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [M. Di Batista]
UDC: 004.6:004.8(043.2)
COBISS: 10232660

Other data

Secondary language: English
Secondary title: Named entity recognition in legal documents
Secondary abstract: Named entity recognition in natural language text is becoming increasingly important because it helps users work with text. Technologies developed in recent decades achieve very good results in retrieving information from natural language text. In this diploma thesis we give a brief overview of the available solutions for named entity recognition in legal texts. We want to recognize as many named entities as possible so that we can use them to create hyperlinks to the documents they refer to. By combining multiple named entities we can obtain additional information about the observed document. We describe the properties of the available solutions for named entity recognition. We then tested named entity recognition on Slovenian legal texts with two solutions: Stanford CoreNLP and our own application, NERInLaw, which uses CRFsuite. Both solutions were tested on manually annotated legal texts in which we marked multiple named entities. We divided the texts into a training set and a test set so that we could evaluate the results. The tests were run with different sets of feature functions, so that we could compare the results and see which functions matter most for the system. Another important aspect of the testing was the speed of the tested solutions, since with a large dataset it is important to get results as quickly as possible. Our implementation achieved very good results with some basic settings, and we are confident that future work could improve them further. In addition, the current implementation could easily be adapted to languages other than Slovenian with minor changes.
Secondary keywords: named entity recognition;part of speech;conditional random fields;Stanford CoreNLP;CRFsuite;computer science;diploma;
File type: application/pdf
Type (COBISS): Undergraduate thesis
Thesis comment: Univ. of Ljubljana, Faculty of Computer and Information Science
Pages: 59
ID: 24199462