1 |
Robust low bit rate analysis-by-synthesis predictive speech coding
Salami, Redwan Ali January 1990
No description available.
|
2 |
Digital encoding of speech signals at 16-4.8 KBPS
Kondoz, Ahmet M. January 1988
Speech coding at 64 and 32 Kb/s is well developed and standardized. The next bit rate of interest is 16 Kb/s. Although standardization has yet to be completed, speech coding at 16 Kb/s is fairly well developed. Existing coders can produce good quality speech at rates as low as about 9.6 Kb/s. At present the major research area is 8 to 4.8 Kb/s. This work deals first of all with improving the quality and complexity of some of the most promising coders at 16 to 9.6 Kb/s, as well as proposing new alternative coders. For this purpose, coders operating at 16 Kb/s and at 12 to 9.6 Kb/s have been grouped together and optimized for their corresponding bit rates. The second part of the work deals with the possibilities of coding speech signals at rates lower than 9.6 Kb/s. To this end, coders which produce good quality speech at bit rates of 8 to 4.8 Kb/s have been designed and simulated. As well as designing coders to operate at rates below 32 Kb/s, it is very important to test them. Coders operating at 32 Kb/s and above contain only quantization noise and usually have large signal-to-noise ratios (SNR). For this reason their SNRs may be used to compare the coders. For coders operating at 16 Kb/s and below, however, this is not the case, and subjective testing is necessary for a true comparison. The final part of this work deals with the subjective testing of six coders, three at 16 Kb/s and the other three at 9.6 Kb/s.
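The SNR comparison mentioned for the higher-rate coders can be illustrated with a minimal sketch. It is not taken from the thesis: the uniform quantizer, the 200 Hz test tone standing in for speech, and the names `snr_db` and `uniform_quantize` are assumptions chosen only to show how an SNR figure is computed when quantization noise is the only distortion.

```python
import math

def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between a reference signal and its decoded version."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)

def uniform_quantize(samples, bits, full_scale=1.0):
    """Mid-rise uniform quantizer over [-full_scale, full_scale) (illustrative only)."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    out = []
    for x in samples:
        index = math.floor(x / step)
        # Clamp to the available quantizer levels.
        index = max(-(levels // 2), min(levels // 2 - 1, index))
        out.append((index + 0.5) * step)
    return out

# Assumed test signal: a 200 Hz tone sampled at 8 kHz.
tone = [0.9 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
snr_8bit = snr_db(tone, uniform_quantize(tone, 8))
snr_4bit = snr_db(tone, uniform_quantize(tone, 4))
print(f"8-bit SNR: {snr_8bit:.1f} dB, 4-bit SNR: {snr_4bit:.1f} dB")
```

A measure of this kind ranks coders only when the error is plain quantization noise, which is why the abstract turns to subjective tests at 16 Kb/s and below.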
|
3 |
Paralinguistic event detection in children's speech
Rao, Hrishikesh 07 January 2016
Paralinguistic events are useful indicators of the affective state of a speaker. In children's speech, these cues are used to form social bonds with caregivers. They have also been found useful in the very early detection of developmental disorders such as autism spectrum disorder (ASD). Prior work on children's speech has relied on a limited number of subjects, who do not provide sufficient diversity in the types of vocalizations produced. In addition, the features needed to characterize the production of paralinguistic events are not fully understood. Because there is no off-the-shelf solution for detecting instances of laughter and crying in children's speech, this thesis investigates and develops signal processing algorithms to extract acoustic features and applies machine learning algorithms to various corpora. Results obtained using baseline spectral and prosodic features indicate that a combination of spectral, prosodic, and dysphonation-related features is needed to detect laughter and whining in toddlers' speech across different age groups and recording environments. Long-term features were found to be useful for capturing the periodic properties of laughter in adults' and children's speech and detected instances of laughter with a high degree of accuracy. Finally, the thesis examines the use of multi-modal information, combining acoustic features with computer vision-based smile-related features, to detect instances of laughter and to reduce false positives in adults' and children's speech. Fusing the features improved accuracy and recall compared with using either of the two modalities on its own.
|
4 |
Single-Microphone Speech Dereverberation: Modulation Domain Processing and Quality Assessment
ZHENG, CHENXI 25 July 2011
In a reverberant enclosure, acoustic speech signals are degraded by reflections from walls, ceilings, and objects. Restoring speech quality and intelligibility from reverberated speech has received increasing interest over the past few years. Although multiple-channel dereverberation methods provide some improvements in speech quality and intelligibility, single-channel dereverberation remains an open challenge. Two types of advanced single-channel dereverberation methods, namely acoustic domain spectral subtraction and modulation domain filtering, provide only small improvements in speech quality and intelligibility. In this thesis, we study single-channel dereverberation algorithms.
Firstly, an upper bound on time-frequency masking (TFM) performance for dereverberation is obtained using ideal time-frequency masking (ITFM). ITFM has access to both the clean and reverberated speech signals when estimating the binary-mask matrix. ITFM implements binary masking in the short-time Fourier transform (STFT) domain, preserving only those spectral components less corrupted by reverberation. The experimental results show that single-channel ITFM outperforms four existing multi-channel dereverberation methods and suggest that large potential improvements could be obtained using TFM for speech dereverberation.
Secondly, a novel modulation domain spectral subtraction method is proposed for dereverberation. This method estimates the modulation domain long reverberation spectral variance (LRSV) from the time domain LRSV using a statistical room impulse response (RIR) model and implements spectral subtraction in the modulation domain. On the one hand, unlike acoustic domain spectral subtraction, our method implements spectral subtraction in the modulation domain, which has been shown to play an important role in speech perception. On the other hand, unlike modulation domain filtering, which uses a time-invariant filter, our method takes the changes of the reverberated speech spectral variance over time into account and implements spectral subtraction adaptively. Objective and informal subjective tests show that our proposed method outperforms two existing state-of-the-art single-channel dereverberation algorithms. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2011-07-20 03:18:30.021
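The ideal binary masking step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the STFTs are assumed to be already computed and passed in as nested lists of complex values (frames by frequency bins), the reverberant component is taken to be the difference between the reverberated and clean spectra, and the 0 dB threshold and the function names are hypothetical.

```python
import math

def ideal_binary_mask(clean_stft, reverb_stft, threshold_db=0.0):
    """Return a 0/1 mask that keeps cells where clean energy dominates.

    Both inputs are lists of frames, each frame a list of complex STFT bins.
    The decision rule (clean-to-reverberation energy ratio above a threshold)
    is an assumed criterion for illustration.
    """
    mask = []
    for clean_frame, reverb_frame in zip(clean_stft, reverb_stft):
        row = []
        for c, r in zip(clean_frame, reverb_frame):
            direct = abs(c) ** 2
            interference = abs(r - c) ** 2
            if direct == 0.0:
                keep = False
            elif interference == 0.0:
                keep = True
            else:
                keep = 10.0 * math.log10(direct / interference) > threshold_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask

def apply_mask(mask, reverb_stft):
    """Zero out the masked cells of the reverberated STFT."""
    return [[m * r for m, r in zip(mask_row, frame)]
            for mask_row, frame in zip(mask, reverb_stft)]

# Tiny hand-made example: 2 frames x 2 bins.
clean = [[1 + 0j, 0.1 + 0j], [0.5j, 0j]]
reverb = [[1.1 + 0j, 1.0 + 0j], [0.6 + 0.5j, 0.3 + 0j]]
mask = ideal_binary_mask(clean, reverb)
masked = apply_mask(mask, reverb)
```

Because the mask is built from the clean signal, it is an oracle rather than a practical estimator, which is consistent with its use in the abstract as an upper bound.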
|
5 |
Implementation of i-vector algorithm in speech emotion recognition by using two different classifiers : Gaussian mixture model and support vector machine
Gomes, Joan January 2016
Indiana University-Purdue University Indianapolis (IUPUI) / Emotions are essential for our existence, as they exert great influence on people's mental health. Speech is the most powerful mode of communication. It conveys our intentions and emotions. Over the past years many researchers have worked hard to recognize emotion from speech samples, and many systems have been proposed to make the Speech Emotion Recognition (SER) process more accurate. This thesis discusses the design of a speech emotion recognition system implementing a comparatively new method, the i-vector model. The i-vector model has found much success in the areas of speaker identification, speech recognition, and language identification, but it has not been much explored for emotion recognition. In this research, the i-vector model was used to process extracted features for speech representation. Two classification schemes were designed using two different classifiers, Gaussian Mixture Model (GMM) and Support Vector Machine (SVM), along with the i-vector algorithm. The performance of these two systems was evaluated using the same emotional speech database to identify four emotional speech categories: Angry, Happy, Sad and Neutral. Results were analyzed, and an accuracy of more than 75% was obtained by both systems, showing that the proposed i-vector algorithm can identify speech emotions with lower error and higher accuracy.
|
6 |
DSP Techniques for Performance Enhancement of Digital Hearing Aid
Udayashankara, V 12 1900
Hearing impairment is the number one chronic disability affecting people in the world. Many people have great difficulty in understanding speech with background noise. This is especially true for a large number of elderly people and persons with sensorineural impairment. Several investigations on speech intelligibility have demonstrated that subjects with sensorineural loss may need a 5-15 dB higher signal-to-noise ratio than normal-hearing subjects. While most defects in the transmission chain up to the cochlea can nowadays be successfully rehabilitated by means of surgery, the great majority of the remaining inoperable cases are sensorineural hearing impaired. Recent statistics on hearing-impaired patients applying for a hearing aid reveal that 20% of the cases are due to conductive losses, more than 50% are due to sensorineural losses, and the remaining 30% of the cases are of mixed origin. Presenting speech to the hearing impaired in an intelligible form remains a major challenge in hearing-aid research today. Even though various methods have been suggested in the literature for minimizing noise in contaminated speech signals, they fail to give good SNR and intelligibility improvement for subjects with moderate-to-severe sensorineural loss. So far, the power and capability of Newton's method, nonlinear adaptive filtering methods and feedback-type artificial neural networks have not been exploited for this purpose. Hence we apply all these methods to improve SNR and intelligibility for subjects with sensorineural loss. Digital hearing aids frequently employ the concept of filter banks. One of the major drawbacks of this technique is its computational complexity, which requires a large number of multiplications. This increases the power consumption.
Therefore this thesis presents a new approach to speech enhancement for the hearing impaired and also the construction of a filter bank for a digital hearing aid with a minimum number of multiplications. The following topics are covered in this thesis.
One of the most important applications of adaptive systems is noise cancellation using adaptive filters. The adaptive noise cancellation (ANC) setup requires two input signals (viz., primary and reference). The primary input consists of the sum of the desired signal and noise, which are uncorrelated. The reference input consists of noise that is correlated in some unknown way with the noise in the primary input. The primary signal is obtained by placing an omnidirectional microphone just above one ear on the head of the KEMAR manikin, and the reference signal is obtained by placing a hypercardioid microphone at the center of the vertebral column on the back. Conventional speech enhancement techniques use linear schemes for enhancing speech signals. So far, nonlinear adaptive filtering techniques have not been used in hearing aid applications. The motivation for using a nonlinear model is that it gives better noise suppression than a linear model, because the medium through which signals reach the microphone may be highly nonlinear. Hence the use of linear schemes, though motivated by computational simplicity and mathematical tractability, may be suboptimal, and we propose the use of nonlinear models to enhance speech signals for the hearing impaired. We propose both linear LMS and nonlinear second-order Volterra LMS schemes to enhance speech signals. Studies conducted for different environmental noises, including babble, cafeteria and low-frequency noise, show that the second-order Volterra LMS performs better than the linear LMS algorithm. We use measures such as signal-to-noise ratio (SNR), time plots, and intelligibility tests for performance comparison.
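The two-input noise canceller described above, in its linear LMS and second-order Volterra LMS forms, can be sketched as follows. This is a simplified illustration and not the thesis code: the synthetic tone standing in for speech, the made-up nonlinear noise path, the two-tap memory, the step size, and all function names are assumptions.

```python
import math
import random

def linear_features(window):
    """Regressor for the linear LMS filter: the delayed reference samples."""
    return list(window)

def volterra2_features(window):
    """Linear terms plus all second-order products x[i]*x[j] with i <= j."""
    feats = list(window)
    for i in range(len(window)):
        for j in range(i, len(window)):
            feats.append(window[i] * window[j])
    return feats

def lms_noise_canceller(primary, reference, taps, step, features):
    """Adaptive noise canceller: the error signal is the enhanced output."""
    window = [0.0] * taps
    weights = [0.0] * len(features(window))
    enhanced = []
    for d, x in zip(primary, reference):
        window = [x] + window[:-1]
        u = features(window)
        noise_estimate = sum(w * v for w, v in zip(weights, u))
        error = d - noise_estimate
        weights = [w + step * error * v for w, v in zip(weights, u)]
        enhanced.append(error)
    return enhanced

def residual_noise_power(features, seed=0, n=8000):
    """Mean squared residual noise over the last 2000 samples (synthetic data)."""
    rng = random.Random(seed)
    speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    noise, prev = [], 0.0
    for x in reference:
        # Assumed nonlinear path from the reference to the primary microphone.
        noise.append(0.6 * x + 0.3 * prev + 0.4 * x * x)
        prev = x
    primary = [s + v for s, v in zip(speech, noise)]
    enhanced = lms_noise_canceller(primary, reference, taps=2, step=0.02,
                                   features=features)
    return sum((enhanced[k] - speech[k]) ** 2 for k in range(n - 2000, n)) / 2000

linear_residual = residual_noise_power(linear_features)
volterra_residual = residual_noise_power(volterra2_features)
```

On this synthetic path the quadratic term cannot be represented by the linear regressor, so the Volterra variant leaves less residual noise. The example says nothing about the real recordings used in the thesis.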
We also propose an ANC scheme which uses Newton's method to enhance speech signals. The main problem with LMS-based ANC is that its convergence is slow, so its performance becomes poor for hearing aid applications. The reason for choosing Newton's method is that Newton-type algorithms are high-performance adaptive filtering methods that often converge and track faster than the LMS method. We propose two models to enhance speech signals: one is a conventional linear model and the other is a nonlinear model using a second-order Volterra function. Developing a Newton-type algorithm for the linear model results in the familiar recursive least squares (RLS) algorithm. The performance of both the linear and nonlinear Newton algorithms is evaluated for babble, cafeteria and low-frequency noise. SNR, time plots and intelligibility tests are used for performance comparison. The results show that Newton's method using the Volterra nonlinearity performs better than the RLS method.
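For the linear model, the RLS recursion mentioned above can be sketched in a few lines. This is a textbook-style illustration rather than the thesis implementation; the forgetting factor, the initialization constant `delta`, and the function name are assumed values.

```python
def rls_noise_canceller(primary, reference, taps, forgetting=0.99, delta=100.0):
    """Linear RLS adaptive noise canceller.

    Returns the enhanced signal (a priori error) and the final weights.
    P approximates the inverse of the exponentially weighted correlation
    matrix of the reference regressor.
    """
    weights = [0.0] * taps
    P = [[delta if i == j else 0.0 for j in range(taps)] for i in range(taps)]
    window = [0.0] * taps
    enhanced = []
    for d, x in zip(primary, reference):
        window = [x] + window[:-1]
        Pu = [sum(P[i][j] * window[j] for j in range(taps)) for i in range(taps)]
        denom = forgetting + sum(window[i] * Pu[i] for i in range(taps))
        gain = [v / denom for v in Pu]
        error = d - sum(w * u for w, u in zip(weights, window))
        weights = [w + g * error for w, g in zip(weights, gain)]
        # P <- (P - gain * u^T P) / forgetting, using the symmetry of P.
        P = [[(P[i][j] - gain[i] * Pu[j]) / forgetting for j in range(taps)]
             for i in range(taps)]
        enhanced.append(error)
    return enhanced, weights
```

Replacing the regressor `window` with an expanded vector that also contains second-order products would give one possible nonlinear variant. Whether that matches the Volterra formulation in the thesis is an assumption.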
In addition to the ANC-based schemes, we also develop speech enhancement for the hearing impaired using a feedback-type neural network (FBNN). The main reason is that this gives a parallel algorithm which can be implemented directly in hardware. We translate the speech enhancement problem into a neural network (NN) framework by forming an appropriate energy function. We propose both linear and nonlinear FBNNs for enhancing speech signals. Simulation studies on different environmental noises reveal that the FBNN using the Volterra nonlinearity is superior to the linear FBNN in enhancing speech signals. We use SNR, time plots, and intelligibility tests for performance comparison.
The design of an effective hearing aid for people with sensorineural hearing impairment is a challenging problem. For persons with sensorineural losses, the frequency response must be optimally fitted to their residual auditory area. Digital filtering makes possible hearing-aid responses that are difficult or impossible to realize using analog techniques. The major problem in a digital hearing aid is reducing power consumption, and multiplication is one of the most power-consuming operations in digital filtering. Hence a serious effort has been made to design a filter bank with a minimum number of multiplications, thereby minimizing the power consumption. This is achieved by using interpolated and complementary FIR filters. This method gives significant savings in the number of arithmetic operations.
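The saving offered by complementary FIR filters can be illustrated with a small sketch. It covers only the complementary part and is not the filter bank design of the thesis: the three-tap low-pass prototype and the function names are assumptions. For an odd-length linear-phase low-pass filter, the complementary high-pass output is the delayed input minus the low-pass output, so the second band costs one subtraction per sample and no additional multiplications.

```python
def fir_filter(coeffs, signal):
    """Direct-form FIR filter with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out

def complementary_pair(lowpass, signal):
    """Return (low band, high band) using a single set of multiplications.

    Assumes `lowpass` is an odd-length, symmetric (linear-phase) FIR filter.
    """
    delay = (len(lowpass) - 1) // 2
    low = fir_filter(lowpass, signal)
    delayed = [0.0] * delay + list(signal[:len(signal) - delay])
    high = [d - l for d, l in zip(delayed, low)]
    return low, high

# Assumed prototype: a 3-tap low-pass filter with unity gain at DC.
prototype = [0.25, 0.5, 0.25]
dc_low, dc_high = complementary_pair(prototype, [1.0] * 10)
alt_low, alt_high = complementary_pair(prototype, [(-1.0) ** n for n in range(10)])
```

After the start-up transient, the constant input appears only in the low band and the alternating input only in the high band, and the two bands always sum to the delayed input.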
The thesis concludes by summarizing the results of the analysis and suggesting scope for further investigation.
|
7 |
Kalbos signalų segmentacija / Speech signal segmentation
Lokutijevskaja, Alina 11 June 2004
The task of this work is to segment a speech signal given the speech waveform and the parameters of the segments. We used a dynamic programming approach.
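The abstract does not give the cost function or the segment parameters, so the following is only a generic sketch of segmentation by dynamic programming under stated assumptions: the number of segments is known in advance, each segment is modelled by its mean, and the cost is the within-segment squared error. The function name `segment_boundaries` is hypothetical.

```python
def segment_boundaries(samples, num_segments):
    """Split `samples` into `num_segments` contiguous segments that minimize
    the total squared deviation from each segment's mean.

    Returns the start indices of segments 2..K (the interior boundaries).
    """
    n = len(samples)
    prefix, prefix_sq = [0.0], [0.0]
    for x in samples:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i, j):
        """Squared error of samples[i:j] around its mean (requires j > i)."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Trace back the segment start indices from the end of the signal.
    starts, j = [], n
    for k in range(num_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    return starts[::-1][1:]

boundaries = segment_boundaries([0, 0, 0, 5, 5, 5, 5, 1, 1, 1], 3)
```

In a speech setting the per-sample values would typically be replaced by frame-level feature vectors and the cost by a distance to the given segment parameters. That substitution is an assumption about how the approach might be applied.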
|
8 |
Kalbos garsų aiškumo pagerinimas / Improvement of quality of speech signal
Siliuk, Žana 14 June 2005
This work analyzes the use of digital filters to improve the quality of a speech signal. For this purpose, the influence of noise on the intelligibility of speech is examined and a digital filter design with optimal pole placement is presented. In the experiments, the speech signal was recorded using the Praat program, and noise was generated with the Matlab function randn(n) or as a sum of one or two cosines. The recorded speech signal is mixed with the generated noise: an algorithm written in Matlab adds the values of the speech signal to the corresponding values of the noise amplitudes. To improve the quality of the speech signal mixed with the generated noise, low-pass, notch and band (strip) filters are used, depending on the noise that was mixed with the speech signal. The programs used are written in Matlab. The end of the work presents the listings of the algorithms, the listings of the created filters, and the groups of words.
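The mixing and notch-filtering steps can be sketched as follows. The work itself used Matlab, so this Python version is only an illustration. The sampling rate, the tone frequencies, the pole radius of 0.95, and the function names are assumptions, and the pole radius is not the optimal placement derived in the work.

```python
import math

def cosine_noise(freqs_hz, sample_rate, length, amplitude=0.5):
    """Noise built as a sum of one or more cosines."""
    return [amplitude * sum(math.cos(2 * math.pi * f * n / sample_rate)
                            for f in freqs_hz)
            for n in range(length)]

def mix(speech, noise):
    """Add the noise to the speech sample by sample."""
    return [s + v for s, v in zip(speech, noise)]

def notch_filter(signal, noise_hz, sample_rate, pole_radius=0.95):
    """Second-order notch: zeros on the unit circle at the noise frequency,
    poles at the same angle just inside the circle (assumed radius)."""
    w0 = 2 * math.pi * noise_hz / sample_rate
    b1 = -2.0 * math.cos(w0)
    a1 = -2.0 * pole_radius * math.cos(w0)
    a2 = pole_radius ** 2
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = x + b1 * x1 + x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

sample_rate = 8000
# A 300 Hz tone stands in for recorded speech; the noise is a 1 kHz cosine.
speech = [math.sin(2 * math.pi * 300 * n / sample_rate) for n in range(4000)]
noise = cosine_noise([1000], sample_rate, 4000)
noisy = mix(speech, noise)
cleaned = notch_filter(noisy, 1000, sample_rate)
noise_only_tail = notch_filter(noise, 1000, sample_rate)[2000:]
```

A pole radius closer to 1 narrows the notch and lengthens the start-up transient. That trade-off is the kind of choice a pole-placement design has to settle.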
|
9 |
A kepstrum approach to real-time speech enhancement : thesis for the degree of Doctor of Philosophy, Information Engineering, Institute of Technology and Engineering, Massey University at Albany
Jeong, Jinsoo January 2007
Content removed due to copyright:
Conference proceedings (I) J. Jeong and T. J. Moir, "Kepstrum approach to real-time speech enhancement methods using two microphones", Proceedings of the International Conference on Sensing Technology (ICST), pp 691-695, November 21-23, 2005, Palmerston North, New Zealand
Conference proceedings (II) J. Jeong and T. J. Moir, "Two-microphone kepstrum approach to real-time speech enhancement methods", Proceedings of the IEEE International Conference on Engineering of Intelligent Systems (ICEIS), pp 392-397, April 22-23, 2006, Islamabad, Pakistan
Conference proceedings (III) T. J. Moir and J. Jeong, "Identification of non-minimum phase transfer function components", Proceedings of the IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp 380-384, August 27-30, 2006, Vancouver, Canada
/ This research is mainly concerned with a robust method for improving the performance of real-time speech enhancement and noise cancellation in a real reverberant environment. The thesis, titled "A Kepstrum Approach to Real-Time Speech Enhancement", therefore presents a technique for applying a kepstrum method to speech enhancement. The kepstrum approach is based on the fundamental theory of kepstrum analysis, which gives a mathematical construct for the speech enhancement application. Kepstrum analysis is applied to the system identification of unknown acoustic transfer functions between two microphones. The kepstrum method provides a mathematical representation with FFT-based processing and is independent of the acoustic path model order. Applying the kepstrum method as a front end to speech enhancement methods improves performance in speech enhancement and noise cancellation, with several favourable effects.
|
10 |
Jednoduchý textově nezávislý hlasový zámek - Softwarový systém pro verifikaci mluvčích / Simple text-independent voice lock - speaker verification software system
Kotulek, Milan January 2015
This thesis gives a brief introduction to biometrics, leading to the description and design of a speaker verification system based on speech analysis. The designed system first performs basic signal processing and then vowel recognition in fluent Czech speech. For each vowel found, the observed speech features are calculated. The resulting GUI application was tested on a speaker database created for this purpose; its efficiency is approximately 54% for short test utterances and approximately 88% for long test utterances.
|