
4 Krasutskogo str., Saint Petersburg 196084, Russia, tel. +7 812 331-0665, fax +7 812 327-9297
21.10.2008
MORPHOLOGICAL RANDOM FORESTS FOR LANGUAGE MODELING OF INFLECTIONAL LANGUAGES
In this paper, we are concerned with using decision trees (DT) and random forests (RF) in language modeling for Czech LVCSR. We show that the RF approach can be successfully implemented for language modeling of an inflectional language. The performance of word-based and morphological DTs and RFs was evaluated on a lecture recognition task. We show that while DTs perform worse than conventional trigram language models (LM), RFs outperform the latter. WER (up to 3.4% relative) and perplexity (10%) reduction over the trigram model can be gained with morphological RFs. Further improvement is obtained after interpolation of DT and RF LMs with the trigram one (up to 15.6% perplexity and 4.8% WER relative reduction). In this paper we also investigate the distribution of morphological feature types chosen for splitting data at different levels of DTs.
21.10.2008
Inflectional Language Modeling with Random Forests for ASR
In this paper we show that the Random Forest (RF) approach can be successfully implemented for language modeling of an inflectional language for Automatic Speech Recognition (ASR) tasks. While Decision Trees (DTs) perform worse than a conventional trigram language model (LM), RFs outperform the latter. WER (up to 3.4% relative) and perplexity (10%) reduction over the trigram model can be gained with morphological RFs. Further improvement is obtained after interpolation of DT and RF LMs with the trigram one (up to 15.6% perplexity and 4.8% WER relative reduction).
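The interpolation and perplexity figures quoted above can be illustrated with a minimal sketch. It assumes plain linear interpolation with a fixed weight, which the abstract does not confirm; the function names, the weight `lam`, and the per-word probabilities are hypothetical and chosen only for illustration.

```python
import math

def interpolate(p_rf, p_trigram, lam=0.5):
    """Linearly interpolate two LM probabilities for the same word.

    lam is a hypothetical mixing weight; in practice it would be tuned
    on held-out data.
    """
    return lam * p_rf + (1.0 - lam) * p_trigram

def perplexity(probs):
    """Perplexity of a word sequence from its per-word probabilities."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))

def relative_reduction(baseline, improved):
    """Relative reduction of a metric (perplexity or WER) over a baseline."""
    return (baseline - improved) / baseline

# Made-up per-word probabilities for a four-word test sequence.
p_trigram = [0.10, 0.02, 0.30, 0.05]
p_rf = [0.12, 0.03, 0.25, 0.08]
p_mix = [interpolate(a, b, 0.5) for a, b in zip(p_rf, p_trigram)]

ppl_tri = perplexity(p_trigram)
ppl_mix = perplexity(p_mix)
print(ppl_tri, ppl_mix, relative_reduction(ppl_tri, ppl_mix))
```

The same `relative_reduction` helper applies to WER, since both figures in the abstract are reported relative to the trigram baseline.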
20.10.2008
Large Scale Russian Hybrid Unit Selection TTS
This paper outlines a project on the development of a new hybrid unit-selection and concatenative Russian TTS system. The project is carried out within the Federal Research and Development Program in Priority Directions of Development of Scientific and Technological Complex of Russia in 2007-2012. The major aim of the project is a new-generation Russian TTS system that makes use of syntactic and semantic analysis and can be implemented in various types of electronic devices.
18.10.2007
Outline of a New Hybrid Russian TTS System
This paper outlines a recently started project on the development of a new hybrid unit-selection and concatenative Russian TTS system. The project is carried out within the Federal Research and Development Program in Priority Directions of Development of Scientific and Technological Complex of Russia in 2007-2012 (http://www.fcntp.ru/). Major features of the proposed system are presented. The main aim of the paper is to stimulate a wide scientific discussion that would help to improve the system at its early stages.
16.10.2007
Eigen Channel Method for Text-Independent Russian Speaker Verification
A method for compensating session variability in text-independent speaker verification is presented in this paper. It is based on maximum likelihood estimation for modelling speaker sessions. The method is shown to reduce the verification error by 21% for 4-second and by 36% for 20-second test segments compared to the GMM-UBM baseline. The evaluation was performed on conversational speech recorded over GSM channels.
16.10.2007
USING PARAMETERS OF IDENTICAL PITCH CONTOUR ELEMENTS
A formalized approach to pitch-based speaker discrimination using identical components of pitch contour structure is presented. The designed list of pitch units includes 7 basic unit types (16 subtypes). Each unit is described with a set of relevant pitch parameters. The effectiveness of three unit types (nuclear fall, nuclear rise and a filled hesitation pause) was tested on a 10-male speech corpus, first on 2-session and then on 3-session data. The results show a positive discriminating potential of certain pitch parameters. The lowest EER values were obtained for the so-called “physical” F0 parameters of a rising nucleus (18% for F0 minimum in a 2-session comparison and 22% for F0 mean in a 3-session comparison). The fusion of all parameters for the three contour unit types produced an EER of 13% in a 3-session comparison.
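The EER (equal error rate) figures reported here and in the other speaker recognition abstracts can be read with the help of a short sketch. It is a generic threshold sweep, not the evaluation procedure used in the paper; the function name and the score lists are hypothetical, and higher scores are assumed to mean "more likely the same speaker".

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a decision threshold over all scores.

    At each threshold, FRR is the share of same-speaker trials rejected
    and FAR is the share of different-speaker trials accepted. The EER
    is taken where the two rates are closest, as their average.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Made-up similarity scores for same-speaker and different-speaker trials.
targets = [0.4, 0.6, 0.8, 0.9]
impostors = [0.1, 0.3, 0.5, 0.7]
print(equal_error_rate(targets, impostors))
```

With so few trials the estimate is coarse; a real evaluation would use many more comparisons and often interpolates between thresholds.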
15.10.2007
Phone Recognition driven Method for Creating Context-Dependent Phones
Progress in the development of the Large Vocabulary Speech Recognition System created at Speech Technology Center is presented in this paper. The most widely used method for creating context-dependent phones is based on growing a decision tree with branches defined by binary questions concerning neighboring phones. The list of questions may vary and is to some extent arbitrary. The decision on splitting is based on the behavior of entropy.
A new method based on recognition scores of phones is proposed. At each step of the algorithm, additional models are introduced whenever existing models of monophones or triphones are poorly recognized. Retraining of the new model redistributes the training data among all models.
The problem of unseen triphones is solved by introducing a measure of similarity of contexts and by clustering monophones according to this measure.
Context-dependent phones obtained with this method were tested on the task of keyword spotting. This limited task was chosen because of known and not yet properly solved problems with creating a decoder for LVCSR.
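The conventional decision-tree approach mentioned above, in which binary questions about neighboring phones are chosen by their effect on entropy, can be sketched as follows. This is a simplified illustration with discrete class labels; real systems usually score splits on acoustic model likelihoods. The question names, phone sets, and sample data are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_question(samples, questions):
    """Pick the binary question with the largest entropy reduction.

    samples:   list of (left_phone, right_phone, label) tuples
    questions: dict mapping a question name to a predicate on (left, right)
    Returns (question_name, entropy_gain); name is None if nothing helps.
    """
    base = entropy([s[2] for s in samples])
    best_name, best_gain = None, 0.0
    for name, pred in questions.items():
        yes = [s[2] for s in samples if pred(s[0], s[1])]
        no = [s[2] for s in samples if not pred(s[0], s[1])]
        if not yes or not no:
            continue  # the question does not actually split the data
        split = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(samples)
        gain = base - split
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain

# Hypothetical contexts of one phone, labeled with an acoustic class.
samples = [("a", "n", "A"), ("o", "t", "A"), ("t", "n", "B"), ("k", "t", "B")]
questions = {
    "left_is_vowel": lambda left, right: left in {"a", "o"},
    "right_is_nasal": lambda left, right: right in {"n", "m"},
}
print(best_question(samples, questions))
```

Growing a full tree would apply `best_question` recursively to the "yes" and "no" subsets until the gain falls below some stopping threshold.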
10.08.2007
SPEAKER IDENTIFICATION USING SELECTIVE COMPARISON OF
A method of selective pitch data comparison for speaker identification is presented. Pitch parameters of rising and falling nuclear monosyllables and filled hesitation pauses are evaluated for their discriminating ability using F-ratio and EER measures obtained on a 10-male 3-session speech database. “Physical” F0 parameters providing 20%-30% EER in isolation proved more effective than linguistically conditioned ones. Using all parameters in combination produced an EER of 13%.
Directions of future research are outlined and the scope of possible application of the method in forensic tasks is discussed.
29.06.2007
Frequency-Domain Auditory Suppression Modelling (FASM) – A WDFT-Based Anthropomorphic Noise-Robust Feature Extraction Algorithm For Speech Recognition
This paper presents a physiologically inspired feature extraction algorithm for use in speech recognition engines that must remain effective in noisy environments. Essentially, the algorithm simulates a key property of the “active cochlea” models: a signal-dependent variable gain over the frequency range. In order to drastically reduce the computational complexity of the algorithm in comparison to the original time-domain “active cochlea” models, it is implemented in the frequency domain with the help of a warped discrete Fourier transformation (WDFT). The essence of the FASM technique is that in the presence of noise, higher frequency channels are attenuated more if there are “enough” signal components in the lower part of the spectrum, which is less susceptible to noise. As confirmed by the performed measurements, the FASM algorithm boosts feature invariance to noise while keeping feature informativeness at an acceptable level.
29.06.2007
Building Acoustic Models for a Large Vocabulary Continuous Speech Recognizer for Russian
Different types of acoustic models created at Speech Technology Center are evaluated in this paper. Our main goal was to test how well those models work and to choose one model for implementation in a large vocabulary continuous speech recognition (LVCSR) system for Russian, which is currently under development. Context-independent discrete and continuous models, as well as context-dependent continuous models, were built and evaluated on an isolated word recognition task. The results gained with the context-dependent continuous model prove its consistency and show that it can be used for acoustic modelling in a large vocabulary speech recognizer.
27.06.2007
AUDIO SEARCH AND MINING: A Look at Unstructured Data
Search engines such as Google are receiving about 1 billion search requests per day. Taking the ability to query text and applying it to voice opens up many areas of opportunity. As it pertains to the Web, users can search audio files and audio-video feeds. Enterprises can use this technology to find important customer concerns and even potentially enable employees to search voicemails or recorded calls for key words and phrases. It would seem that the sky is the limit for this technology, which is why Speech Technology magazine pulled together a group of experts to determine the limitations and opportunities for the audio search and mining industry.
20.06.2007
Flexible Rule-Based Break Assignment for Russian
This paper introduces a linguistically motivated rule-based method of automatic segmentation of Russian text into intonational units. We claim that it can provide a high rate of correct prosodic break assignment (92.1% of junctures correct for spontaneous speech) while staying flexible, which is especially important for rule-based models. The kernel of the proposed algorithm, in addition to the use of punctuation, is automatic morphological analysis, which, although a complicated construct in itself, is the basis of a very simple interface grammar. This contextual grammar acts as an intermediary between a linguist and the core of the model, allowing the former to tune the whole system in a very efficient and intuitive way. The algorithm we present in this paper is implemented in the recent “Orator” text-to-speech system for Russian.
02.06.2007
Analysis of the IHC Adaptation for the Anthropomorphic Speech Processing Systems
We analyse the properties of the physiological model of the adaptive behaviour of the chemical synapse between inner hair cells (IHC) and auditory neurons. On the basis of this analysis, we propose equivalent structures of the model for implementation in the digital domain. The main conclusion of the analysis is that the synapse reservoir model is equivalent in its properties to a signal-dependent automatic gain-control mechanism. We outline guidelines for the creation of artificial anthropomorphic algorithms that exploit properties of the original synapse model. This paper also presents a concise description of the experiments, which demonstrate the positive effect of introducing the described anthropomorphic algorithm into the feature extraction of an automated speech recognition engine.
Keywords and phrases: inner hair cell (IHC), Meddis IHC model, IHC adaptation, auditory models, modulation spectrum filtering.
14.05.2007
Neuromorphic audio processing: A model simulation of the way auditory neurons encode signals
Neuromorphic technology aims at the artificial recreation of the algorithms that govern information processing in the living brain. The key question here is how to characterise the way real neurons relay information to each other. The present paper contains a concise description of a modelling approach aimed at quantifying the information transfer in the ascending auditory neural pathways.
12.05.2007
Speaker recognition system for standard telephone network
This paper introduces a novel automatic speaker recognition system. The system includes a database of telephone speech templates and personal data for selected target speakers. It performs noise cancellation and spectral normalization of the speech signal in the communication channel. The telephone conversation is then automatically segmented to single out the speech of an individual speaker. The resulting speech sample is searched against the database, and the system reports how close the speaker is to the known speakers. Speech spectral characteristics (formants) are compared. The reliability test gives 16% EER when both known and unknown speech samples are 16 s long and 8% EER for 96 s samples. The test was done on speech data from the RUSTEN database (Russian telephone speech, 120 speakers × 5 sessions).
19.04.2007
Stem-Based Approach to Pronunciation Vocabulary Construction and Language Modeling for Russian
This paper deals with the problem of pronunciation dictionary construction for large vocabulary continuous speech recognition (LVCSR). Here we present a method of constructing stem-inflexion pronunciation vocabularies designed for use with a stem-based language model (LM) of Russian as an inflective language. We show that the stem-based approach leads to a 13-fold reduction in the size of pronunciation vocabularies. On the basis of this method we developed a tool called STcEM. STcEM was applied to several LVCSR tasks: stemming of free text and high-quality transcription of stems and inflexions into monophones. We report here the results achieved on these two tasks. The related subtasks of context homography disambiguation and of stress and ё placement detection are discussed in detail.
02.03.2007
Using Speech Synthesis in Keyword Spotting
A technology for unlimited vocabulary automatic keyword spotting in spontaneous Russian speech is presented in this paper. We propose a novel speech database search system. It is based on the ideas of word pattern recognition and speech synthesis. Keywords to be searched are input in text form and the corresponding speech signals are synthesized by a text-to-speech (TTS) system. These signals are used as training material for a recognizer based on the dynamic programming approach. Evaluation of the system was performed for telephone and microphone channels. In the latter case, for a limited number of keywords searched simultaneously, we achieve an 83% hit rate without speaker adaptation (false alarm rate of 9.3%) and a 99% hit rate with speaker adaptation (false alarm rate of 17.5%). The hit rate in a noisy telephone channel is 78.5%, while the false alarm rate is 60%.
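The abstract says only that the recognizer is "based on the dynamic programming approach". One common technique of that kind is dynamic time warping (DTW), sketched below as an illustration rather than as the authors' method. Frames are assumed to be equal-length lists of floats; the function names, the length normalization, and the threshold are hypothetical.

```python
def dtw_distance(template, segment):
    """Accumulated DTW cost between two sequences of feature frames.

    Uses the classic step pattern (insertion, deletion, match) with
    Euclidean distance between frames.
    """
    inf = float("inf")
    n, m = len(template), len(segment)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((a - b) ** 2
                    for a, b in zip(template[i - 1], segment[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # template frame repeated
                                 cost[i][j - 1],      # segment frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def is_hit(template, segment, threshold):
    """Declare a keyword hit if the length-normalized DTW cost is small."""
    norm = dtw_distance(template, segment) / (len(template) + len(segment))
    return norm <= threshold

# Toy one-dimensional "features": a synthesized keyword template and a
# time-stretched occurrence of the same pattern in recorded speech.
keyword = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(keyword, stretched), is_hit(keyword, stretched, 0.1))
```

Raising the threshold trades a higher hit rate for more false alarms, which is the trade-off visible in the figures reported above.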
07.02.2007
Noisy speech text decoding by experts based on psychoacoustic approach
Transcription of poor-quality speech is an everyday challenge in many applications, e.g. sound archive processing, forensic audio, etc. Speech intelligibility is often quite low due to noisy recording conditions or inadequate recording equipment. Moreover, it is usually impossible to obtain a new, clearer recording or to substantially enhance the useful speech, so listeners have to deal with degraded speech. This report describes some principles of processing noisy speech recordings to improve message intelligibility and quality, and some methods to improve speech decoding speed and reliability. They are based on demasking and normalization of audio signals and on the use of binaural and adaptive properties of human hearing.
The discussed methods have a proven track record at STC and at other services that perform speech transcription on the orders of courts, law enforcement, emergency and accident investigation services, etc.
12.01.2007
Robust Rule-Based Method for Automatic Break Assignment in Russian Texts
In this paper a new rule-based approach to break assignment for the Russian language is discussed. It is a flexible and robust method of segmenting Russian texts into prosodic units. We implemented it in the recent “Orator” text-to-speech (TTS) system. The model was developed for inflective languages as an alternative to both statistical and strict rule-based algorithms. It is designed in such a way that all potentially tunable context dependencies are brought up to the interface grammar and can be easily modified by linguists. The algorithm performs well on different kinds of texts due to this simple and intuitive grammar built upon an elaborate mechanism of morphogrammatical analysis. The juncture correct rate varies between more than 98% for simple literary texts and 85% for raw transcripts of spontaneous speech.
05.12.2008
Work on creating a joint product and integrating it with Genesys Voice Platform (GVP) took the last half year. STC developed an MRCP server supporting the MRCPv1 and MRCPv2 protocols.
04.12.2008
Speech Technology Center and the Interregional Debt Collection Center have integrated their products, the “Rupor” automatic notification system and the “Credit Histories” debt collection control system (SCCD), to organize reliable and fast debt collection for banks and collection agencies.
11.11.2008
STC at INTERPOLITECH
The XII International Exhibition of State Security Means "Interpolitech" has concluded in Moscow. Speech Technology Center has long been a regular participant in this exhibition.