Secondary abstract: |
Named entity recognition in natural-language text is becoming increasingly important because it helps users work with text. Technologies developed over the last few decades achieve very good results in retrieving information from natural-language text.
In this diploma thesis we give a brief overview of available solutions for named entity recognition in legal texts. We want to recognize as many named entities as possible so that we can use them to create hyperlinks to the documents they refer to. By combining multiple named entities we can obtain additional information about the document under examination.
We described the properties of the available solutions for named entity recognition. We then tested named entity recognition on Slovenian legal texts with two solutions: Stanford CoreNLP and our own application, NERInLaw, which uses CRFsuite.
We tested both solutions on hand-annotated legal texts in which we marked multiple named entities. We divided the texts into a training set and a test set so that we could evaluate the results. The tests were run with different sets of attribute functions, so that we could compare the results and see which functions matter most to the system. Another important aspect of testing was the speed of the tested solutions. With a large dataset, it is important to obtain results as quickly as possible.
Our implementation achieved very good results with only basic settings. We are confident that further work could yield even better results. Another advantage is that the current implementation could easily be adapted to languages other than Slovenian with minor changes. |