1 |
Signal enhancement based on multivariable adaptive noise cancellation. Hung, Chih-Pin, January 1995.
No description available.
|
2 |
Statistical models for noise-robust speech recognition. van Dalen, Rogier Christiaan, January 2011.
A standard way of improving the robustness of speech recognition systems to noise is model compensation, which replaces a speech recogniser's distributions over clean speech with ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech distribution with a diagonal-covariance Gaussian distribution. This thesis looks into improving on this approximation in two ways: firstly, by estimating full-covariance Gaussian distributions; secondly, by approximating corrupted-speech likelihoods without any parameterised distribution. The first part of this work is about compensating for within-component feature correlations under noise, for which the covariance matrices of the computed Gaussians should be full instead of diagonal. The estimation of off-diagonal covariance elements turns out to be sensitive to approximations. Standard speech recognisers use both per-time-slice (static) coefficients and dynamic coefficients, which represent signal changes over time and are normally computed from a window of static coefficients. A popular approximation, which state-of-the-art compensation schemes such as VTS compensation use for the dynamic coefficients, is the continuous-time approximation. To remove the need for it, this thesis introduces a new technique: it first compensates a distribution over the window of statics, and then applies the same linear projection that extracts the dynamic coefficients. Within this framework, a number of methods are introduced that address the correlation changes that occur in noise. The next problem is decoding speed with full covariances. This thesis re-analyses the previously introduced predictive linear transformations and shows how they can model feature correlations at low and tunable computational cost. The second part of this work removes the Gaussian assumption completely. It introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. For this, it transforms the integral in the likelihood expression and then applies sequential importance resampling. Though too slow to use for recognition, this enables a more fine-grained assessment of compensation techniques, based on the KL divergence to the ideal compensation for one component; the KL divergence proves to predict the word error rate well. The technique also makes it possible to evaluate the impact of the approximations that standard compensation schemes make.
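To make the compensation idea concrete, the following is a minimal Python sketch of estimating a full-covariance corrupted-speech Gaussian by Monte Carlo, in the spirit of data-driven model combination rather than the thesis's own schemes; the log-spectral mismatch function y = log(exp(x) + exp(n)) and the sample count are illustrative assumptions.

```python
import numpy as np

def monte_carlo_compensation(mu_x, cov_x, mu_n, cov_n, n_samples=10000, seed=0):
    """Estimate a corrupted-speech Gaussian by sampling clean speech and noise
    and pushing the samples through an assumed log-spectral mismatch function.

    With enough samples this captures the full covariance of the corrupted
    distribution, rather than a diagonal approximation.
    """
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu_x, cov_x, size=n_samples)  # clean speech samples
    n = rng.multivariate_normal(mu_n, cov_n, size=n_samples)  # noise samples
    y = np.logaddexp(x, n)          # assumed mismatch: y = log(exp(x) + exp(n))
    mu_y = y.mean(axis=0)
    cov_y = np.cov(y, rowvar=False)  # full covariance, not just the diagonal
    return mu_y, cov_y

# Toy usage with two log-spectral dimensions
mu_y, cov_y = monte_carlo_compensation(
    mu_x=np.array([2.0, 1.5]), cov_x=np.eye(2) * 0.3,
    mu_n=np.array([0.5, 0.8]), cov_n=np.eye(2) * 0.1)
print(mu_y, cov_y, sep="\n")
```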
|
3 |
Subspace Gaussian mixture models for automatic speech recognition. Lu, Liang, January 2013.
In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in hidden Markov models (HMMs). In a conventional system, the model parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated and, consequently, a large amount of training data is required to fit the model. In addition, the different sources of acoustic variability that affect the accuracy of a recogniser, such as pronunciation variation, accent, speaker characteristics and environmental noise, are only weakly modelled and factorized by adaptation techniques such as maximum likelihood linear regression (MLLR), maximum a posteriori (MAP) adaptation and vocal tract length normalisation (VTLN). In this thesis, we discuss an alternative acoustic modelling approach, the subspace Gaussian mixture model (SGMM), which is expected to deal with these two issues better. In an SGMM, the model parameters are derived from low-dimensional model and speaker subspaces that can capture phonetic and speaker correlations. Given these subspaces, only a small number of state-dependent parameters are required to derive the corresponding GMMs. Hence, the total number of model parameters can be reduced, which allows acoustic modelling with a limited amount of training data. In addition, the SGMM-based acoustic model factorizes the phonetic and speaker factors, and within this framework other sources of acoustic variability may also be explored. In this thesis, we propose a regularised model estimation for SGMMs, which avoids overtraining when the training data is sparse. We also take advantage of the structure of SGMMs to explore cross-lingual acoustic modelling for low-resource speech recognition. Here, the model subspace is estimated from out-of-domain data and ported to the target language system, so that only the state-dependent parameters need to be estimated, which relaxes the requirement on the amount of training data. To improve the robustness of SGMMs against environmental noise, we propose to apply the joint uncertainty decoding (JUD) technique, which is shown to be efficient and effective. We report experimental results on the Wall Street Journal (WSJ) database and GlobalPhone corpora to evaluate the regularisation and cross-lingual modelling of SGMMs. Noise compensation using JUD for SGMM acoustic models is evaluated on the Aurora 4 database.
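As a rough illustration of the parameter-derivation step, the sketch below shows how a state's GMM can be obtained from shared subspace parameters: each state stores only a low-dimensional vector, and its component means and mixture weights come from projecting that vector through globally shared matrices. The shapes are assumptions for illustration, and the shared covariances and sub-state vectors of a full SGMM are omitted.

```python
import numpy as np

def sgmm_state_gmm(M, w, v_j):
    """Derive one state's GMM parameters from shared subspace parameters.

    M   : (I, D, S) array, one mean-projection matrix per shared component i
    w   : (I, S) array, weight-projection vectors
    v_j : (S,) low-dimensional state vector for state j
    Returns per-component means (I, D) and mixture weights (I,).
    """
    means = np.einsum('ids,s->id', M, v_j)   # mu_ji = M_i v_j
    logits = w @ v_j                          # w_i^T v_j
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                  # softmax over components
    return means, weights

# Toy usage: I=4 shared components, D=3 feature dimensions, S=2 subspace dimensions
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 3, 2))
w = rng.normal(size=(4, 2))
means, weights = sgmm_state_gmm(M, w, v_j=np.array([0.7, -0.2]))
print(means.shape, weights)
```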
|
4 |
The Effects of Spectral Smearing and Elevated Thresholds on Speech in Noise Recognition in Simulated Electric-Acoustic Hearing. Mulder, Aretha, January 2014.
Combined electric-acoustic stimulation (EAS) is becoming an increasingly viable treatment option for individuals with sloping, severe-to-profound high-frequency hearing loss and residual low-frequency hearing. Sound is delivered to the high-frequency region electrically via a cochlear implant, and to the low-frequency region acoustically, with or without amplification from hearing aids. This combined mode of stimulation often results in improved speech recognition in background noise compared with either mode of stimulation in isolation. It is important to note that many EAS listeners have some degree of hearing loss in the low-frequency region and may experience associated effects such as reduced frequency selectivity and elevated audiometric thresholds. This study simulated EAS listening in 20 normal-hearing listeners by combining vocoded high-frequency sound with low-frequency sound. The low-frequency sound was further manipulated by applying varying degrees of spectral smearing and attenuation, to simulate the changes in frequency selectivity and sensitivity that usually accompany sensorineural hearing loss. The aim of this study was to investigate the effects of spectral smearing and attenuation of low-frequency information on the identification of vocoded speech in noise. Participants completed a sentence recognition task in the presence of competing talkers for six simulated listening conditions with varying degrees of processing in the low-frequency region. Results indicated that the speech-in-noise advantage of simulated combined EAS over simulated electric stimulation alone was 3.9 dB when the low-frequency sound was unprocessed, 2.9 dB with x3 spectral smearing, and 2.4 dB with x6 spectral smearing. When 30 dB of attenuation was applied in addition to x3 spectral smearing, no significant benefit was observed. When 60 dB of attenuation was applied in addition to x3 spectral smearing, a significant negative effect was found, with a 3 dB disadvantage for simulated EAS compared with simulated electric stimulation alone. Overall, the results indicate that there is indeed a significant improvement in speech recognition in a background of competing speakers with simulated EAS compared to simulated electric stimulation only. However, when elevated hearing thresholds were simulated for the residual low-frequency hearing, this benefit was either absent or reversed. These results therefore support the use of amplification for individuals with elevated low-frequency thresholds, so that they can realise the benefit available from combined EAS.
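For intuition, the sketch below shows one way such an EAS simulation could be built in Python: the band above a crossover frequency is noise-vocoded to stand in for electric hearing, and the band below it is kept acoustic and optionally attenuated to mimic elevated thresholds. The crossover frequency, channel count, filter orders and attenuation value are illustrative assumptions, and the study's spectral smearing step is omitted.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def simulate_eas(signal, fs, cutoff=500.0, n_channels=8, low_atten_db=0.0, seed=0):
    """Rough EAS simulation: low frequencies stay acoustic (optionally attenuated),
    high frequencies are replaced by an envelope-modulated noise vocoder."""
    rng = np.random.default_rng(seed)
    # Acoustic part: low-pass, then attenuate to mimic elevated low-frequency thresholds
    low_sos = butter(4, cutoff, btype='low', fs=fs, output='sos')
    acoustic = sosfilt(low_sos, signal) * 10 ** (-low_atten_db / 20)
    # Electric part: noise-vocode the region above the cutoff
    edges = np.geomspace(cutoff, fs / 2 * 0.9, n_channels + 1)
    electric = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype='band', fs=fs, output='sos')
        band = sosfilt(sos, signal)
        env = np.abs(hilbert(band))                            # channel envelope
        carrier = sosfilt(sos, rng.standard_normal(len(signal)))
        electric += env * carrier                              # envelope-modulated noise
    return acoustic + electric

# Toy usage: 1 s of white noise standing in for a sentence recording
fs = 16000
out = simulate_eas(np.random.randn(fs), fs, low_atten_db=30.0)
```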
|
5 |
Detection of Nonstationary Noise and Improved Voice Activity Detection in an Automotive Hands-free Environment. Laverty, Stephen William, January 2005.
Thesis (M.S.), Worcester Polytechnic Institute. Keywords: automobile; noise reduction; passing vehicle noise; hands-free. Includes bibliographical references (p. 89-91).
|
6 |
Evaluating Speech-in-Noise Performance of Bilateral Cochlear Implant Performance. Lim, Stacey, 20 August 2013.
No description available.
|
7 |
Maturation of speech-in-noise performance in children using binaural diotic and antiphasic digits-in-noise testing. Wolmarans, Jenique, January 2019.
Digits-in-noise (DIN) tests have become very popular over the past 15 years for the detection of hearing loss, and several recent studies have highlighted their potential utility as a school-age hearing test. However, age may influence test performance in children. In addition, a new antiphasic stimulus paradigm has recently been introduced. This study determined the maturation of speech recognition for diotic and antiphasic DIN in children and evaluated DIN self-testing in young children. A cross-sectional, quantitative, quasi-experimental research design was used. Participants with confirmed normal hearing were tested with diotic and antiphasic DIN tests. During the DIN test, sequences of three spoken digits were presented in noise via headphones at varying signal-to-noise ratios (SNRs), and the researcher entered each three-digit sequence reported by the participant on a smartphone keypad. Six hundred and twenty-one children with normal hearing (bilateral pure-tone thresholds of ≤ 20 dB HL at 1, 2, and 4 kHz), aged 6 to 13 years, were recruited to examine the comparative maturation of diotic and antiphasic performance. A further sample of 30 first-grade (7-year-old) children with normal hearing was recruited to determine the validity of self-testing on a smartphone. Multiple regression analysis including age, gender, and English as an additional language (i.e. participants whose first or home language is not English) showed only age to be a significant predictor of both diotic and antiphasic speech reception thresholds (SRTs) (p < 0.05). SRTs improved by 0.15 dB and 0.35 dB SNR per year for diotic and antiphasic testing, respectively. Post hoc age-group comparisons (by year) using the Bonferroni adjustment for multiple comparisons showed that SRTs of young children (6 to 9 years old) differed significantly from those of older children (11 to 13 years old) (p < 0.05). There was no significant difference in SRT from age 10 upward. Self- and facilitated testing in young children did not differ significantly (p > 0.05) in the antiphasic condition, but demonstrated poor reliability in both the diotic and antiphasic conditions. Increasing age was significantly associated with improved SRTs on both diotic and antiphasic DIN. Beyond 10 years of age, children's SRTs became more adult-like; however, age effects were only significant up to 10 and 12 years for antiphasic and diotic SRT, respectively. Furthermore, the SRT difference between self- and facilitated testing was not significant (p > 0.05). Dissertation (MA), University of Pretoria, 2019. Speech-Language Pathology and Audiology.
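To illustrate how a DIN speech reception threshold is typically obtained, here is a minimal Python sketch of a one-up, one-down adaptive track over SNR; the 2 dB step, 24-trial run and scoring rule are generic assumptions, not necessarily the exact protocol used in this study.

```python
import math
import random

def din_srt(triplets, present_and_score, start_snr=0.0, step=2.0, n_trials=24):
    """Adaptive digits-in-noise run: one-up, one-down tracking of the SNR at
    which digit triplets are repeated correctly, converging on ~50% correct.

    present_and_score(triplet, snr) must play the triplet at the given SNR and
    return True if all three digits were reported correctly.
    """
    snr = start_snr
    snrs = []
    for _ in range(n_trials):
        triplet = random.choice(triplets)
        snrs.append(snr)
        correct = present_and_score(triplet, snr)
        snr += -step if correct else step      # make it harder after a correct answer
    # Score the SRT as the mean SNR once the track has settled (skip the first 4 trials)
    return sum(snrs[4:]) / len(snrs[4:])

# Toy usage with a simulated listener whose true SRT is -10 dB SNR
def fake_listener(triplet, snr, true_srt=-10.0):
    p = 1 / (1 + math.exp(-(snr - true_srt)))  # logistic psychometric function
    return random.random() < p

print(din_srt([('1', '5', '9'), ('3', '6', '2')], fake_listener))
```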
|
8 |
Speech reception thresholds of monosyllabic words in noise with and without top-down support. Persson, Johan, January 2012.
Theories in cognitive hearing science describe how speech perception depends on two different types of processes. Bottom-up processes are associated with the acoustic and phonetic properties of an incoming signal, while top-down processes are associated with lexical, syntactic, semantic and contextual properties. The ability to exploit top-down processes is thought to depend on working memory capacity. To examine the difference between bottom-up and top-down processing, and their relation to working memory, a speech-in-noise (SIN) test was designed and administered to 15 participants. The test compares thresholds for identifying a monosyllabic word in an ascending relation to stationary noise against thresholds for identifying a monosyllabic word in stationary noise with top-down support. Top-down support was given in the form of explicit priming and was examined in both ascending and descending relations to the noise. Two working memory tests, the Letter Memory Test and the Reading Span Test, were used to examine correlations with the differences between thresholds. The results showed a significant difference between ordinary thresholds and thresholds obtained with explicit top-down support. No significant correlation was found between working memory capacity and the differences between these thresholds. However, the analysis did find a significant correlation between the Letter Memory Test and the difference in thresholds for top-down support in ascending versus descending relation to the noise.
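The analysis shape here is a paired comparison of thresholds plus a correlation with working-memory scores. The sketch below shows how that could look in Python with entirely made-up placeholder numbers; the study's actual data and exact statistical procedure are not reproduced.

```python
import numpy as np
from scipy.stats import ttest_rel, pearsonr

# Hypothetical per-participant data: SRTs in dB SNR and working-memory scores
srt_no_support = np.array([-4.1, -3.8, -5.0, -4.4, -3.9, -4.7, -4.2, -4.0,
                           -4.6, -3.7, -4.9, -4.3, -4.5, -3.6, -4.8])
srt_primed     = np.array([-6.0, -5.2, -6.8, -5.9, -5.1, -6.3, -5.8, -5.5,
                           -6.1, -5.0, -6.6, -5.7, -6.2, -4.9, -6.4])
letter_memory  = np.array([14, 11, 17, 13, 10, 16, 12, 12, 15, 9, 18, 13, 14, 8, 16])

# Paired comparison: does explicit priming lower (improve) the threshold?
t, p = ttest_rel(srt_no_support, srt_primed)

# Does working-memory capacity predict the size of the priming benefit?
benefit = srt_no_support - srt_primed
r, p_r = pearsonr(letter_memory, benefit)
print(f"priming effect: t={t:.2f}, p={p:.3f}; correlation r={r:.2f}, p={p_r:.3f}")
```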
|
9 |
Does Vocabulary Knowledge Affect Lexical Segmentation in Adverse Conditions? Bishell, Michelle, January 2015.
There is significant variability in the ability of listeners to perceive degraded speech. Existing research has suggested that vocabulary knowledge is one factor that differentiates better listeners from poorer ones, though the reason for such a relationship is unclear. This study aimed to investigate whether a relationship exists between vocabulary knowledge and the type of lexical segmentation strategy listeners use in adverse conditions. This study conducted error pattern analysis using an existing dataset of 34 normal-hearing listeners (11 males, 23 females, aged 18 to 35) who participated in a speech recognition in noise task. Listeners were divided into a higher vocabulary (HV) and a lower vocabulary (LV) group based on their receptive vocabulary score on the Peabody Picture Vocabulary Test (PPVT). Lexical boundary errors (LBEs) were analysed to examine whether the groups showed differential use of syllabic strength cues for lexical segmentation. Word substitution errors (WSEs) were also analysed to examine patterns in phoneme identification. The type and number of errors were compared between the HV and LV groups. Simple linear regression showed a significant relationship between vocabulary and performance on the speech recognition task. Independent samples t-tests showed no significant differences between the HV and LV groups in Metrical Segmentation Strategy (MSS) ratio or number of LBEs. Further independent samples t-tests showed no significant differences between the WSEs produced by HV and LV groups in the degree of phonemic resemblance to the target. There was no significant difference in the proportion of target phrases to which HV and LV listeners responded. The results of this study suggest that vocabulary knowledge does not affect lexical segmentation strategy in adverse conditions. Further research is required to investigate why higher vocabulary listeners appear to perform better on speech recognition tasks.
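As a sketch of how lexical boundary errors are commonly summarised: each error is coded as an insertion or deletion of a word boundary before a strong or weak syllable, and a ratio of strategy-consistent errors is computed. The coding and the exact ratio formula below are assumptions for illustration, not necessarily the scoring rule used in this thesis.

```python
from collections import Counter

# Each lexical boundary error is coded by type (insertion 'I' or deletion 'D')
# and by the stress of the syllable it precedes ('S' strong, 'W' weak).
# Hypothetical codes for one listener's errors:
errors = ['IS', 'IS', 'DW', 'IW', 'IS', 'DS', 'DW', 'IS']

counts = Counter(errors)
# Errors consistent with a metrical segmentation strategy: boundaries inserted
# before strong syllables and deleted before weak ones (assumed scoring rule).
predicted = counts['IS'] + counts['DW']
opposed   = counts['IW'] + counts['DS']
mss_ratio = predicted / (predicted + opposed)
print(f"MSS ratio: {mss_ratio:.2f} ({predicted} predicted vs {opposed} opposed)")
```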
|
10 |
An Approach to Automatic and Human Speech Recognition Using Ear-Recorded Speech. Johnston, Samuel John Charles, January 2017.
Speech in a noisy background presents a challenge for the recognition of that speech both by human listeners and by computers tasked with understanding human speech (automatic speech recognition; ASR). Years of research have resulted in many solutions, though none so far have completely solved the problem. Current solutions generally require some form of estimation of the noise, in order to remove it from the signal. The limitation is that noise can be highly unpredictable and highly variable, both in form and loudness.
The present report proposes a method of recording a speech signal in a noisy environment that largely prevents noise from reaching the recording microphone. This method uses the human skull as a noise-attenuation device by placing the microphone in the ear canal. For further noise dampening, a pair of noise-reduction earmuffs is worn over the speakers' ears.
A corpus of speech was recorded with a microphone in the ear canal, while also simultaneously recording speech at the mouth. Noise was emitted from a loudspeaker in the background. Following the data collection, the speech recorded at the ear was analyzed. A substantial noise-reduction benefit was found over mouth-recorded speech. However, this speech was missing much high-frequency information. With minor processing, mid-range frequencies were amplified, increasing the intelligibility of the speech.
A human perception task was conducted using both the ear-recorded and mouth-recorded speech. Participants in this experiment were significantly more likely to understand ear-recorded speech over the noisy, mouth-recorded speech. Yet, participants found mouth-recorded speech with no noise the easiest to understand.
These recordings were also used with an ASR system. Since the ear-recorded speech is missing much high-frequency information, the system did not recognize the ear-recorded speech readily. However, when an acoustic model was trained on low-pass filtered speech, performance improved.
These experiments demonstrated that humans, and likely an ASR system given additional training, can recognize ear-recorded speech more easily than speech recorded in noise. Further speech processing and training may improve the signal's intelligibility for both human and automatic speech recognition.
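The mid-frequency amplification mentioned above could be approximated with a simple band boost; the sketch below is one way to do this in Python, where the 1-3 kHz band and 12 dB gain are illustrative assumptions rather than the processing actually used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def boost_mid_frequencies(x, fs, lo=1000.0, hi=3000.0, gain_db=12.0):
    """Add an amplified copy of the mid-frequency band back into the signal,
    compensating for energy the ear-canal recording loses at higher frequencies."""
    sos = butter(4, [lo, hi], btype='band', fs=fs, output='sos')
    band = sosfiltfilt(sos, x)                  # zero-phase, so the band adds coherently
    return x + band * (10 ** (gain_db / 20) - 1)

# Toy usage on a synthetic signal standing in for an ear-canal recording
fs = 16000
t = np.arange(fs) / fs
ear_recording = np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 2000 * t)
enhanced = boost_mid_frequencies(ear_recording, fs)
```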
|