Text analysis with sequence matching

Marko Ferme (Author), Milan Ojsteršek (Author)

Abstract

This article describes some common problems in natural language processing. The main problem is matching a user-given sentence against an existing knowledge base of semantically described words or phrases. The main difficulties in this process are outlined, and the most common solutions used in natural language processing are reviewed. A sequence matching algorithm is introduced as an alternative, and its advantages over existing approaches are explained. The algorithm is described in detail, beginning with the algorithm for discovering the longest subsequences. The major components of the similarity measure are then defined, and the computation of the concurrence and dispersion measures is presented. Results of the algorithm's performance on a test set are shown, and different ways of using the algorithm are discussed. The work concludes with ideas for future work and examples of where the approach can be used in practice.

Keywords

sequence matching; subsequence analysis; similarity measure; fuzzy string search; phrase detection

Data

Language:	English
Year of publishing:	2011
Typology:	1.01 - Original Scientific Article
Organization:	UM FERI - Faculty of Electrical Engineering and Computer Science
UDC:	004.77
COBISS:	14857750
ISSN:	1998-4308

Other data

Secondary language:	English
URN:	URN:SI:UM:
Pages:	pp. 235-242
Volume:	Vol. 5
Issue:	iss. 2
Chronology:	2011
ID:	8718521