Gregor Donaj (Author), Zdravko Kačič (Author)

Abstract

The incorporation of grammatical information into speech recognition systems is often used to increase performance in morphologically rich languages. However, this introduces demands for sufficiently large training corpora and for proper methods of using the additional information. In this paper, we present a method for building factored language models that use data obtained by morphosyntactic tagging. The models use only those factors that help to increase performance and ignore data from the other factors, which also reduces the need for large morphosyntactically tagged training corpora. Which data is relevant is determined at run-time, based on the current text segment being estimated, i.e., the context. We show that, by using a context-dependent model in a two-pass recognition algorithm, the overall speech recognition accuracy in a Broadcast News application improved by 1.73% relative, whereas simpler models using the same data achieved only a 0.07% improvement. We also present a more detailed error analysis based on lexical features, comparing first-pass and second-pass results.

Keywords

speech technologies; speech recognition; automatic speech recognition; factored language model; dynamic backoff path; word context; inflectional language; morphosyntactic tags

Data

Language: English
Year of publishing: 2017
Typology: 1.01 - Original Scientific Article
Organization: UM FERI - Faculty of Electrical Engineering and Computer Science
UDC: 004.934
COBISS: 20330774
ISSN: 1687-4722

Other data

Secondary language: Slovenian
Secondary keywords: govorne tehnologije (speech technologies); razpoznavanje govora (speech recognition); avtomatsko razpoznavanje govora (automatic speech recognition)
URN: URN:SI:UM:
Type (COBISS): Scientific work
Pages: 1-16
Volume: 2017
Issue: 6
Chronology: 2017
DOI: 10.1186/s13636-017-0104-6
ID: 10844541