Advances in digital technology over the last decade have motivated the development of very efficient, high quality speech compression algorithms. Whereas the main target of early low bit rate coding systems was the production of intelligible speech at low information rates, the growth of new applications such as mobile satellite systems has increased the demand for high quality speech at the lowest possible bit rates. This resulted in the development of efficient parametric models of the speech production system. These models were the basis of powerful speech compression algorithms such as CELP and multiband excitation. CELP is a very efficient algorithm at medium bit rates and has achieved almost toll quality at 8 kb/s. However, the performance of CELP degrades rapidly at bit rates below 4.8 kb/s. Sinusoidal coding algorithms, and in particular the multiband excitation technique, have proved capable of producing high quality speech at bit rates below 5 kb/s. In recent years, another efficient speech compression algorithm called prototype waveform interpolation (PWI) has emerged. PWI introduced a novel model which proved to be very efficient in removing redundant information from speech. While the early PWI systems produced high quality speech at bit rates around 3.5 kb/s, the latest versions produce even higher quality at bit rates as low as 2.4 kb/s. The key to the success of PWI is the approach it takes to reducing the distortion associated with low bit rate coding algorithms. However, the price of this achievement is a very high computational demand, which has been the main hurdle to its use in real-time applications. The aim of the research in this thesis is the development of low complexity PWI systems without sacrificing high quality. While the majority of PWI systems target efficient coding of the excitation signal in the LP model of speech, this research focuses on exploiting PWI to encode the original speech directly.
In the first part of the thesis, basic techniques in low bit rate speech coding are described and suitable tools are developed for use in a PWI based coding system. In the second part, the original PWI algorithm operating in the LP residual domain is briefly explained, and the application of PWI in the speech domain is introduced as a method of coping with the problems associated with the original PWI. To demonstrate the capabilities of this approach, various coding schemes operating in the range of 1.85 to 2.95 kb/s are developed. In the final stage, a new technique which combines the two powerful low bit rate coding techniques, i.e. multiband excitation and PWI, is developed to produce high quality synthetic speech at 2.6 kb/s.
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:360903 |
Date | January 1997 |
Creators | Yaghmaie, Khashayar |
Publisher | University of Surrey |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://epubs.surrey.ac.uk/843152/ |