Text analysis with sequence matching

Marko Ferme (Author), Milan Ojsteršek (Author)

Abstract

This article describes some common problems in natural language processing. The central problem is matching a sentence given by a user against an existing knowledge base of semantically described words or phrases. The main difficulties in this process are outlined, and the solutions most commonly used in natural language processing are reviewed. A sequence matching algorithm is then introduced as an alternative, and its advantages over the existing approaches are explained. The algorithm is described in detail: first the algorithm for discovering the longest subsequences, then the major components of the similarity measure, including the computation of the concurrence and dispersion measures. Results of the algorithm's performance on a test set are shown, and different ways of applying the algorithm are discussed. The work concludes with ideas for future work and examples of where our approach can be used in practice.
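The abstract names the steps but not their definitions, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm. It finds one longest common word subsequence between a user sentence and a knowledge-base phrase using standard dynamic programming, then reports two hypothetical scores: coverage (the share of phrase words matched) and gaps (unmatched sentence words between the first and last match). These are simple stand-ins; the paper's concurrence and dispersion measures are not reproduced here. The functions `tokenize`, `lcs_alignment` and `match_phrase` are illustrative names.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization with trailing punctuation stripped.
    # A real NLP pipeline would also handle inflection, synonyms, etc.
    tokens = (w.strip(".,;:!?") for w in text.lower().split())
    return [t for t in tokens if t]


def lcs_alignment(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover the matched positions.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def match_phrase(sentence: str, phrase: str) -> Tuple[float, int]:
    """Hypothetical scores: (coverage of the phrase, gap words inside the matched span)."""
    s, p = tokenize(sentence), tokenize(phrase)
    pairs = lcs_alignment(s, p)
    if not p or not pairs:
        return 0.0, 0
    coverage = len(pairs) / len(p)
    first, last = pairs[0][0], pairs[-1][0]
    gaps = (last - first + 1) - len(pairs)  # unmatched sentence words between matches
    return coverage, gaps


if __name__ == "__main__":
    print(match_phrase("Please show me the weather forecast for tomorrow.", "weather forecast"))
    print(match_phrase("weather for tomorrow forecast", "weather forecast"))
```

Running the script prints `(1.0, 0)` for the contiguous match and `(1.0, 2)` when the same two words are separated by two others, which shows why a spread-style measure is useful alongside plain word overlap.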

Keywords

sequence matching; subsequence analysis; similarity measure; fuzzy string search; phrase detection

Details

Language:	English
Year of publication:	2011
Typology:	1.01 - Original scientific article
Organization:	UM FERI - Faculty of Electrical Engineering and Computer Science
UDC:	004.77
COBISS:	14857750
ISSN:	1998-4308

Other details

Secondary language:	English
URN:	URN:SI:UM:
Pages:	235-242
Volume:	5
Issue:	2
Date of issue:	2011
ID:	8718521