diploma thesis
Matic Di Batista (Author), Marko Bajec (Mentor)

Abstract

Named entity tagging in legal texts

Keywords

named entity recognition; part-of-speech tags; conditional random fields; Stanford CoreNLP; CRFsuite; computer science; university studies; diploma theses

Details

Language: Slovenian
Year of publication:
Typology: 2.11 - Diploma thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [M. Di Batista]
UDC: 004.6:004.8(043.2)
COBISS: 10232660

Other details

Secondary language: English
Secondary title: Named entity recognition in legal documents
Secondary abstract: Named entity recognition in natural-language texts is becoming more important every day because it helps users work with text. Technologies developed in recent decades achieve very good results when retrieving information from natural-language texts. In this diploma thesis we give a brief overview of the available solutions for named entity recognition in legal texts. We aim to recognize as many named entities as possible so that they can be used to create hyperlinks to the documents they refer to. By combining multiple named entities we can obtain additional information about the observed document. We describe the properties of the available named entity recognition solutions. We then tested named entity recognition on Slovenian legal texts with two solutions: Stanford CoreNLP and our own application, NERInLaw, which uses CRFsuite. We tested both solutions on manually annotated legal texts in which we marked multiple named entities. We split the texts into a training set and a test set so that we could evaluate the results. The tests were run with different sets of attribute functions, so that we could compare the results and see which functions matter most to the system. Another important aspect of testing was the speed of the tested solutions: with a large dataset, it is important to obtain results as quickly as possible. Our implementation achieved very good results with some basic settings, and we are confident that future work could improve them further. In addition, the current implementation could easily be adapted to languages other than Slovenian with minor changes.
Secondary keywords: named entity recognition; part of speech; conditional random fields; Stanford CoreNLP; CRFsuite; computer science; diploma
File type: application/pdf
Type of work (COBISS): Diploma thesis
Note: Univ. of Ljubljana, Faculty of Computer and Information Science
Pages: 59
ID: 24199462
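The English abstract mentions that NERInLaw was tested with different sets of attribute functions for a CRF model trained with CRFsuite. The thesis record does not list those functions, so the following is only a minimal sketch of what typical token-level attribute functions for CRF-based NER might look like: lowercase form, 3-character suffix, capitalization and digit flags, the part-of-speech tag, and the same information for neighbouring tokens. All function names, feature keys, and the example POS tags are hypothetical. The per-token dictionaries are in a form that Python CRFsuite bindings can typically consume, but no CRFsuite library is called here, so the sketch stays dependency-free.

```python
from typing import Dict, List, Sequence, Union

# One attribute dictionary per token (hypothetical feature naming).
Feature = Dict[str, Union[str, bool]]


def token_features(tokens: Sequence[str], pos_tags: Sequence[str], i: int) -> Feature:
    """Hypothetical attribute functions for the token at position i."""
    word = tokens[i]
    feats: Feature = {
        "bias": True,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.is_title": word.istitle(),
        "word.is_upper": word.isupper(),
        "word.has_digit": any(c.isdigit() for c in word),  # e.g. article numbers like "3."
        "pos": pos_tags[i],
    }
    # Context from the previous token, or a beginning-of-sentence marker.
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True
    # Context from the next token, or an end-of-sentence marker.
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: Sequence[str], pos_tags: Sequence[str]) -> List[Feature]:
    """Apply the attribute functions to every token of one sentence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Example legal-text fragment; the POS tags are illustrative, not from the thesis.
    tokens = ["V", "skladu", "s", "3.", "členom", "ZVOP-1"]
    tags = ["ADP", "NOUN", "ADP", "NUM", "NOUN", "PROPN"]
    for tok, feats in zip(tokens, sentence_features(tokens, tags)):
        print(tok, feats)
```

Comparing feature sets, as the abstract describes, could then be done by removing or adding keys in `token_features` and retraining the model on the same training/test split.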