61 |
Efficient speech storage via compression of silence periods / Gan, Cheong Kuoon. January 1984
An adaptive optimal silence detector is designed and implemented in four speech coding schemes: N-bit PCM (N = 5 to 12), N-bit A-law PCM (N = 4 to 8), N-bit ADPCM (N = 3 to 8), and ADM (Adaptive Delta Modulation) at 16, 24 and 32 kbit/s.
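The N-bit A-law PCM scheme listed above can be sketched as A-law companding followed by an N-bit uniform quantizer. This is a minimal illustration, assuming samples normalised to [-1, 1] and the common parameter A = 87.6. It uses the smooth A-law curve rather than the segmented approximation of ITU-T G.711, and the function names are hypothetical, not taken from the thesis.

```python
import numpy as np

# Illustrative sketch: names and parameters are assumptions, not from the thesis.
A = 87.6  # common A-law compression parameter
LOG_A = 1.0 + np.log(A)


def alaw_compress(x: np.ndarray) -> np.ndarray:
    """Map samples in [-1, 1] through the continuous A-law curve."""
    ax = np.abs(x)
    # np.maximum avoids log(0); np.where discards this branch for small |x| anyway
    large = (1.0 + np.log(np.maximum(A * ax, 1e-12))) / LOG_A
    y = np.where(ax < 1.0 / A, A * ax / LOG_A, large)
    return np.sign(x) * y


def alaw_expand(y: np.ndarray) -> np.ndarray:
    """Inverse of alaw_compress."""
    ay = np.abs(y)
    x = np.where(ay < 1.0 / LOG_A, ay * LOG_A / A, np.exp(ay * LOG_A - 1.0) / A)
    return np.sign(y) * x


def uniform_quantize(y: np.ndarray, n_bits: int) -> np.ndarray:
    """Mid-rise uniform quantizer with 2**n_bits levels over [-1, 1]."""
    levels = 2 ** n_bits
    idx = np.clip(np.floor((y + 1.0) / 2.0 * levels), 0, levels - 1)
    return (idx + 0.5) / levels * 2.0 - 1.0


def alaw_pcm(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Encode (compress + quantize to n_bits) and decode (expand) in one step."""
    return alaw_expand(uniform_quantize(alaw_compress(x), n_bits))


def snr_db(clean: np.ndarray, coded: np.ndarray) -> float:
    """Signal-to-quantization-noise ratio in dB."""
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum((clean - coded) ** 2))
```

Companding gives quiet passages a finer effective step than uniform PCM with the same bit count, so `snr_db(s, alaw_pcm(s, 8))` for a low-amplitude sine `s` should exceed `snr_db(s, uniform_quantize(s, 8))`.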
The amount of compression is approximately 35% for voice recordings such as radio newscasts, highly active conversations and readings from prepared texts. Subjective evaluation shows that the silence-edited versions (silence played back as absolute silence) score 1.07 points lower in acceptability than the unedited versions for the same coding scheme, on a scale of 1 to 5. For the noise-edited versions (silence replaced by random noise during playback), the degradation is 0.5 points. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
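The silence-compression idea can be illustrated with a simple energy-based detector that tracks the background noise floor adaptively. This is not the thesis's "adaptive optimal" detector, whose details are not given here. The frame length, threshold margin, smoothing factor and hangover below are illustrative assumptions, as is treating the first frame as background noise. The saved percentage also ignores the small overhead of storing silence durations.

```python
import numpy as np

# Illustrative sketch: all names and parameter values are assumptions.


def frame_energies(x: np.ndarray, frame_len: int) -> np.ndarray:
    """Mean-square energy of each complete frame."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)


def detect_silence(x, frame_len=160, margin=4.0, alpha=0.95, hangover=2):
    """Boolean mask, True for frames classified as silence.

    Assumes the first frame is background noise; the noise floor is updated
    only on frames below the threshold.
    """
    energies = frame_energies(x, frame_len)
    noise_floor = energies[0] + 1e-12
    silent = np.zeros(len(energies), dtype=bool)
    hang = 0
    for i, e in enumerate(energies):
        if e < margin * noise_floor:
            if hang > 0:
                hang -= 1  # keep a few frames after speech so word endings are not clipped
            else:
                silent[i] = True
            noise_floor = alpha * noise_floor + (1.0 - alpha) * e
        else:
            hang = hangover
    return silent


def compression_percent(silent: np.ndarray) -> float:
    """Share of frames that need not be stored (duration overhead ignored)."""
    return 100.0 * np.count_nonzero(silent) / len(silent)


def playback(x, silent, frame_len=160, noise_rms=None, rng=None):
    """Rebuild the signal: silent frames become zeros ("silence-edited") or,
    when noise_rms is given, Gaussian noise ("noise-edited")."""
    rng = np.random.default_rng() if rng is None else rng
    y = x[: len(silent) * frame_len].copy()
    for i in np.flatnonzero(silent):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        y[seg] = 0.0 if noise_rms is None else rng.normal(0.0, noise_rms, frame_len)
    return y
```

The hangover trades a little compression for protection against cutting off low-energy word endings; the two `playback` modes mirror the silence-edited and noise-edited versions compared in the listening test.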
|
62 |
The stability of pitch synthesis filters in speech coding / Lam, Victor T. M. January 1985
No description available.
|
63 |
Speech synthesis by Haar functions with comparison to a terminal analog device / Meltzer, David. January 1972
No description available.
|
64 |
The Generation of Synthetic Speech Sounds by Digital Coding / Steinberger, Eddy Alan. 01 October 1975
The feasibility of representing human speech by serial digital codes was investigated using specially constructed digital logic coupled with standard audio output equipment. The theories being tested represent a radical departure from previous efforts in the field of speech research. Therefore, this initial investigation was limited in scope to a study of unconnected English-language speech sounds at the phoneme level. The experiments were conducted in two parts. The first developed serialized digital codes for selected speech sounds, derived from actual human speech. The second synthesized these sounds with the specially constructed digital synthesizer and had human listeners evaluate them for intelligibility. The results suggest that this is a viable scheme for speech synthesis.
|
65 |
The Quality of Synthesized Speech Using Linear Predictive Coding on Finite Wordlength Integrated Circuits / Carlson, Gerrard Merrill. January 1985
This paper studies the quality of synthetic speech produced by integrated circuit (IC) hardware using fixed-point arithmetic and Linear Predictive Coding (LPC). A theoretical model explaining the combined effects of finite wordlength and parametric model order is developed and used to predict the results of the experimental phase of this study. In the experimental phase, selected model utterances are synthesized from LPC parameters under finite wordlength constraints, and the synthetic speech is evaluated with log area ratios, which define objective speech quality as a parametric distance. Simulations of the theoretical model yield the same information as running the fixed-point synthesizer simulator, and the model's predictions agree closely with the experimental measurements. It is therefore concluded that fixed-point synthesizer performance can be predicted without running a complicated and expensive fixed-point synthesizer. In addition, results from either method clearly indicate that ten poles is the best model order for 15- or 16-bit wordlengths, while eight usable poles are indicated for 14 bits and seven for 13 bits. Based on these results, using fewer than 13 bits for fixed-point calculations is not recommended.
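The log-area-ratio (LAR) evaluation can be sketched as follows: LPC analysis by the Levinson-Durbin recursion yields reflection coefficients k_i, these are rounded to an N-bit fixed-point format, and the degradation is measured as an RMS distance between the LARs log((1 + k_i)/(1 - k_i)) of the exact and rounded coefficients. This is a simplified, hypothetical proxy: it quantizes only the reflection coefficients, whereas the thesis models finite wordlength in the synthesizer's arithmetic, and the exact distance definition used there is not stated.

```python
import numpy as np

# Illustrative sketch: function names and the RMS distance are assumptions.


def autocorr(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return np.array([np.dot(x[: n - lag], x[lag:]) for lag in range(max_lag + 1)])


def levinson_durbin(r: np.ndarray, order: int):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p; also return reflection coefficients."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k[i - 1] = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k[i - 1] * a_prev[i - j]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2  # prediction-error power shrinks with each order
    return a, k, err


def quantize_fixed(v: np.ndarray, n_bits: int) -> np.ndarray:
    """Round to signed fixed point with n_bits in total (1 sign bit), keeping |v| < 1."""
    scale = 2 ** (n_bits - 1)
    return np.clip(np.round(v * scale), -(scale - 1), scale - 1) / scale


def log_area_ratios(k: np.ndarray) -> np.ndarray:
    return np.log((1.0 + k) / (1.0 - k))


def lar_distance(k_ref: np.ndarray, k_test: np.ndarray) -> float:
    """RMS difference between two LAR vectors (one possible distance choice)."""
    d = log_area_ratios(k_ref) - log_area_ratios(k_test)
    return float(np.sqrt(np.mean(d ** 2)))
```

Clipping to ±(1 - 2^-(N-1)) keeps every rounded k strictly inside (-1, 1), so the LARs stay finite and the corresponding all-pole synthesis filter remains stable.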
|
66 |
Effects of noise type on speech understanding / Ng, H. N., Elaine (吳凱寧). January 2006
Speech and Hearing Sciences / Master / Master of Science in Audiology
|
67 |
Efficient Coding of Speech Synthesis Data / Hosne-Sanaye, Simin. January 1984
No description available.
|
68 |
Speech processing using digital MEMS microphones / Zwyssig, Erich Paul. January 2013
The last few years have seen the start of a unique change in microphones for consumer devices such as smartphones and tablets: almost all analogue capacitive microphones are being replaced by digital silicon microphones, or MEMS microphones. MEMS microphones perform differently from conventional analogue microphones. Their greatest disadvantage is significantly higher self-noise, i.e. lower SNR, while their most significant benefits are ease of design and manufacturing and improved sensitivity matching. This thesis presents research on speech processing that compares conventional analogue microphones with the newly available digital MEMS microphones. Specifically, voice activity detection (VAD), speaker diarisation (who spoke when), speech separation and speech recognition are examined in detail. For this research, several microphone arrays were built using digital MEMS microphones, and corpora were recorded to test existing algorithms and devise new ones. Some of the corpora created for this research will be released to the public in 2013. It was found that the VAD algorithm most commonly used in current state-of-the-art diarisation systems is not the best-performing one: MLP-based VAD consistently outperforms the more frequently used GMM-HMM-based VAD schemes. In addition, an algorithm was derived that can determine the number of active speakers in a meeting recording from the audio of a microphone array of known geometry, leading to improved diarisation results. Finally, speech separation experiments were carried out using different post-filtering algorithms, matching or exceeding current state-of-the-art results. The performance of the algorithms and methods presented in this thesis was verified by passing their output through speech recognition tools with simple MLLR (maximum likelihood linear regression) adaptation, and the results are presented as word error rates, an easily comprehensible scale.
To summarise, through speech recognition and speech separation experiments, this thesis demonstrates that the significantly reduced SNR of MEMS microphones can be compensated for with well-established adaptation techniques such as MLLR. MEMS microphones do not affect voice activity detection or speaker diarisation performance.
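Word error rate, the evaluation scale used above, is conventionally computed from a word-level edit distance: WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions needed to turn the reference transcript of N words into the recognizer output. A minimal sketch, assuming whitespace-tokenized transcripts; the thesis's own scoring tools are not specified.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(S + D + I) / N via Levenshtein distance over words (illustrative sketch)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)


# one substitution (sat -> sit) plus one deletion (the) over six reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # prints 0.3333333333333333
```

Because insertions count against the reference length, WER can exceed 1.0 when the recognizer outputs many spurious words.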
|
69 |
Voice recognition systems: assessment of implementation aboard U.S. naval ships / Wilson, Shawn C. 03 1900
Approved for public release; distribution is unlimited. / Technological advances have had profound effects on the conduct of military operations in both peacetime and war. One advance that has had a great impact outside the military by reducing human intervention is Voice Recognition (VR) technology. This thesis examines the implementation of a Voice Recognition System as a ship-driving device and as a means of decreasing the occurrence of mishaps while reducing the fatigue of watchstanders on the bridge. Chapter I discusses the need for the United States Navy to investigate implementing a Voice Recognition System to help reduce the probability of mishaps. Chapter II explains voice recognition technology, how it works, and how the proposed system could be fielded aboard U.S. Navy ships. Chapter III examines the opinions of officers charged with the safe navigation of naval ships on implementing a Voice Recognition System. Chapter IV reviews the officers' concerns and justifies the implementation by answering them. The conclusion summarizes the advances in voice recognition and explains why a Voice Recognition System should be implemented on the bridges of U.S. Navy ships. / Lieutenant, United States Navy
|
70 |
Query-by-example spoken term detection for low-resource languages / CUHK electronic theses & dissertations collection. January 2014
In this thesis, we consider the problem of query-by-example (QbyE) spoken term detection (STD) for low-resource languages. The task is to automatically detect and locate the occurrences of a query term in a large audio database, where the query term is given as one or more audio examples. This research is motivated by the demand for information retrieval technologies that can handle speech data in low-resource languages. The major technical difficulty is that manual transcriptions and linguistic knowledge are not available for these languages. / The framework of acoustic segment modeling (ASM) is adopted for unsupervised training of a speech tokenizer. Three novel algorithms are developed for segment labeling in the ASM framework. They are based on different class-by-segment posterior representations and on spectral clustering techniques. The posterior representations are shown to be more robust than conventional spectral representations. Spectral clustering has achieved significant success in many applications; here the spectral clustering algorithms are reformulated to make them computationally feasible for clustering a large number of speech segments. Experiments on a multilingual speech database demonstrate the advantage of the proposed algorithms over existing approaches. / The speech tokenizer obtained with ASM is applied to QbyE STD. Spoken queries are detected within a frame-based template matching framework: the ASM tokenizer serves as the front end that generates posterior features, which are used for temporal template matching by dynamic time warping (DTW). Experiments show that the ASM tokenizer outperforms a GMM tokenizer and language-mismatched phoneme recognizers. Moreover, a two-step approach is proposed for efficient search. / The frame-based template matching framework for QbyE STD is enhanced in three ways. A novel DTW matrix combination approach is proposed for fusing multiple systems that use different posterior features, pseudo-relevance feedback is used for query expansion, and score normalization is applied to calibrate the score distributions of different query terms. Experimental results show that all three approaches significantly improve the performance of the QbyE STD system. / Spoken term detection is a technique for locating the occurrences of a given term in a large speech database, and it is of great value in both academic research and practical applications. Traditional research on spoken term detection has mainly targeted resource-rich languages; this thesis studies spoken term detection for low-resource languages. Under the conditions assumed here, the target language lacks sufficient resources to train a speech recognition system, and the query terms are given as audio examples. / This thesis adopts the acoustic segment modeling (ASM) framework for unsupervised training of a speech recognizer. We propose three new methods for clustering speech segments in the ASM framework. Our methods are based on a new, robust speech-segment feature and use spectral clustering. Experiments show that our methods outperform three other commonly used baseline methods and achieve better modeling. / We apply the ASM recognizer in a template-matching-based spoken term detection system, in which it serves as a front-end feature transformation module that extracts posterior probability features. To improve detection efficiency, we also propose a two-step detection method. Experimental results show that our method achieves high detection accuracy. / To further improve detection accuracy, this thesis optimizes the template-matching-based system in three ways. First, we propose system fusion on the DTW distance matrices. Second, we use pseudo-relevance feedback to obtain more examples of the query term. Finally, we normalize the system scores so that a uniform score threshold can be set. Experimental results show that all three methods effectively improve the performance of the spoken term detection system. / Wang, Haipeng. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2014. / Includes bibliographical references (leaves 110-127). / Abstracts also in Chinese. / Title from PDF title page (viewed on 5 December 2016). / Detailed summary in vernacular field only.
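The frame-based template matching step can be sketched as subsequence DTW over posteriorgrams. Each frame is a posterior vector over tokenizer classes, the local cost is the negative log inner product of a query frame and an utterance frame, and a match may start and end anywhere in the utterance. This is a generic illustration under those assumptions, not the thesis's implementation; the names are hypothetical, and it omits path-length normalization, the two-step search, DTW matrix fusion and score normalization.

```python
import numpy as np

# Illustrative sketch: names and the -log inner-product cost are assumptions.


def frame_costs(query: np.ndarray, utterance: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Local cost -log(q . x) between every query frame and every utterance frame."""
    return -np.log(np.maximum(query @ utterance.T, eps))


def subsequence_dtw(query: np.ndarray, utterance: np.ndarray):
    """Best-matching region of `utterance` for `query` (both shaped frames x classes).

    Returns (accumulated cost, start frame, end frame); lower cost means a better match.
    """
    cost = frame_costs(query, utterance)
    m, n = cost.shape
    acc = np.full((m, n), np.inf)
    start = np.zeros((m, n), dtype=int)
    acc[0] = cost[0]  # a match may begin at any utterance frame
    start[0] = np.arange(n)
    for i in range(1, m):
        for j in range(n):
            best, s = acc[i - 1, j], start[i - 1, j]  # vertical step
            if j > 0 and acc[i, j - 1] < best:  # horizontal step
                best, s = acc[i, j - 1], start[i, j - 1]
            if j > 0 and acc[i - 1, j - 1] < best:  # diagonal step
                best, s = acc[i - 1, j - 1], start[i - 1, j - 1]
            acc[i, j] = cost[i, j] + best
            start[i, j] = s
    end = int(np.argmin(acc[-1]))  # a match may also end at any utterance frame
    return float(acc[-1, end]), int(start[-1, end]), end
```

Tracking the start frame alongside the accumulated cost lets one pass both score the query and locate its occurrence; a detection decision would then compare a (normalized) score against a threshold.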
|