  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
241

Low bit rate speech coding

Kritzinger, Carl 03 1900
Thesis (MScIng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006. / Despite enormous advances in digital communication, the voice is still the primary tool with which people exchange ideas. However, uncompressed digital speech tends to require prohibitively high data rates (upward of 64 kbps), making it impractical for many applications. Speech coding is the process of reducing the data rate of digital voice to manageable levels. Parametric speech coders, or vocoders, use a priori information about the mechanism by which speech is produced to achieve extremely efficient compression of speech signals (as low as 1 kbps). The greater part of this thesis comprises an investigation into parametric speech coding. This consisted of a review of the mathematical and heuristic tools used in parametric speech coding, as well as the implementation of an accepted standard algorithm for parametric voice coding. To examine avenues of improvement for existing vocoders, we examined some of the mathematical structure underlying parametric speech coding. Following on from this, we developed a novel approach to parametric speech coding which obtained promising results under both objective and subjective evaluation. An additional contribution of this thesis is the comparative subjective evaluation of the effect of parametric speech coding on English and Xhosa speech. We investigated the performance of two different encoding algorithms on the two languages.
242

Automatic alignment and error detection for phonetic transcriptions in the African speech technology project databases

De Villiers, Edward 03 1900
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006. / The African Speech Technology (AST) project ran from 2000 to 2004 and involved collecting speech data for five South African languages, transcribing the data and building automatic speech recognition systems in these languages. The work described here forms part of this project and involved implementing methods for automatic boundary placement in manually labelled files and for determining errors made by transcribers during the labelling process.
243

Nonlinear Acoustic Echo Cancellation for Mobile Phones: A Practical Approach

Fhager, Anders, Hussien, Jemal Mohammed January 2010
Acoustic echo cancellation (AEC) is a fundamental part of speech processing that makes a telecommunication conversation pleasant. Without it, a speaker would hear an annoying echo of their own voice along with the speech from the other party, which would make a conversation through any telecommunication device an unpleasant experience.
AEC has been a subject of interest in the telecom industry since the 1950s, and very efficient solutions have been devised to cancel linear echo. With the advent of low-cost hands-free communication devices, nonlinear echo became a prominent issue: these devices use cheap loudspeakers that produce artifacts in addition to the desired sound, causing nonlinear echo that linear echo cancellers cannot cancel.
In this thesis, a Harmonic Distortion Residual Echo Cancellation (HDRES) algorithm was chosen for further investigation. HDRES has many of the features desirable in an algorithm for nonlinear acoustic echo cancellation, such as low computational complexity and fast convergence. The algorithm was first implemented in Matlab, where it was tested and modified. The modified algorithm was then implemented in C and integrated with a complete AEC system. Before the implementation, a number of measurements were made to characterise the nonlinearities caused by the mobile phone loudspeaker. The measurements were performed on three different mobile phones that were documented to have problems with nonlinear acoustic echo.
The results of this thesis show that it might be possible to use an adaptive filter with both low complexity and fast convergence in an operating AEC system. However, for such a system to work, a doubletalk detector would have to be implemented alongside the adaptive algorithm. That way, doubletalk could be detected and the adaptation of the algorithm halted, so that the major part of the speech would be preserved.
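The idea of halting adaptation during doubletalk can be illustrated with a small sketch. This is not the HDRES algorithm from the thesis: it swaps in a plain normalised LMS (NLMS) linear echo canceller, and the function name `nlms_echo_canceller`, its parameters, and the `freeze` flags (standing in for the output of some doubletalk detector) are all assumptions made for illustration.

```python
def nlms_echo_canceller(far_end, mic, taps=16, mu=0.5, eps=1e-8, freeze=None):
    """Illustrative NLMS linear echo canceller (hypothetical helper).

    far_end : loudspeaker (reference) samples
    mic     : microphone samples containing the echo
    freeze  : optional list of booleans; True means "doubletalk detected",
              so the filter coefficients are held at that sample
    Returns (echo-cancelled output, final filter weights).
    """
    n_samples = len(mic)
    if freeze is None:
        freeze = [False] * n_samples
    w = [0.0] * taps          # adaptive estimate of the echo path
    x_buf = [0.0] * taps      # most recent far-end samples, newest first
    out = []
    for n in range(n_samples):
        x_buf = [far_end[n]] + x_buf[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = mic[n] - echo_est  # residual after subtracting the estimated echo
        out.append(e)
        if not freeze[n]:
            # Normalised LMS update; skipped while doubletalk is flagged
            norm = sum(xi * xi for xi in x_buf) + eps
            step = mu * e / norm
            w = [wi + step * xi for wi, xi in zip(w, x_buf)]
    return out, w
```

With `freeze` set to True for the samples where both parties speak, the near-end speech passes through in the residual without pulling the filter away from the echo path it has already learned.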
244

Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

Gritzman, Ashley Daniel January 2016
A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016 / Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR, and focuses on three contributions to colour-based lip segmentation. The first contribution concerns the colour transform to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date by measuring the overlap between lip and skin histograms for 33 different colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves percentage overlap (OL) of 92.23% and segmentation error (SE) of 7.39%. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The first stage of ATO incorporates -SVR to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp or relative improvement of 15.1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications. / MT2017
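As a rough illustration of what "overlap between lip and skin histograms" could mean for the HSV hue component, the sketch below computes a histogram intersection over hue values. The abstract does not give the thesis's exact overlap definition, so the intersection measure, the half-turn hue rotation, the bin count, and the helper names `shifted_hue`, `normalised_histogram` and `histogram_overlap` are all assumptions.

```python
import colorsys

def shifted_hue(rgb):
    """Hue of an 8-bit RGB pixel, rotated by half a turn (an assumption)
    so that reddish lip colours do not straddle the 0/1 wrap-around."""
    r, g, b = (c / 255.0 for c in rgb)
    h, _, _ = colorsys.rgb_to_hsv(r, g, b)
    return (h + 0.5) % 1.0

def normalised_histogram(values, bins=32):
    """Histogram of values in [0, 1), normalised to sum to 1.
    Expects a non-empty list of values."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    total = float(len(values))
    return [c / total for c in counts]

def histogram_overlap(p, q):
    """Histogram intersection: 0.0 for disjoint, 1.0 for identical histograms."""
    return sum(min(a, b) for a, b in zip(p, q))

# Hypothetical hand-labelled pixels, for illustration only
lip_pixels = [(200, 60, 90), (190, 50, 80)]
skin_pixels = [(220, 170, 140), (230, 180, 150)]

lip_hist = normalised_histogram([shifted_hue(p) for p in lip_pixels])
skin_hist = normalised_histogram([shifted_hue(p) for p in skin_pixels])
overlap = histogram_overlap(lip_hist, skin_hist)
```

A lower overlap means the two classes are easier to separate with a single threshold on that colour component, which is the sense in which one transform can be compared against another.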
245

Audio search of surveillance data using keyword spotting and dynamic models = 利用關鍵詞及動態模型進行的語音情報搜尋 (Li yong guan jian ci ji dong tai mo xing jin xing de yu yin qing bao sou xun)

January 2001
Lam Hiu Sing. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references. / Text in English; abstracts in English and Chinese.
Contents:
Acknowledgement
I. Table of Content
II. Lists of Tables
III. Lists of Figures
Chapter 1. Introduction
  1.1 Intelligence gathering by surveillance
  1.2 Speech recognition and keyword spotting
  1.3 Audio indexing and searching
    1.3.1 Nature of audio sources
    1.3.2 Different searching objectives
  1.4 Objective of thesis
  1.5 Thesis outline
  1.6 References
Chapter 2. HMM-based Keyword Spotting System
  2.1 Statistical speech model
    2.1.1 Speech signal representations
    2.1.2 Acoustic modeling
    2.1.3 HMM message generation model
  2.2 Basics of keyword spotting
    2.2.1 Keyword and non-keyword modeling
    2.2.2 Language model
    2.2.3 Performance measure
  2.3 Keyword spotting applications
    2.3.1 Information query system
    2.3.2 Topic identification system
    2.3.3 Audio indexing and searching system
    2.3.4 Lexicon learning system
  2.4 Summary
  2.5 References
Chapter 3. Cantonese Characteristics
  3.1 Cantonese Dialect
  3.2 Phonological properties of Cantonese
    3.2.1 Initials and finals of Cantonese
    3.2.2 Tones of Cantonese
  3.3 Summary
  3.4 References
Chapter 4. System Configuration for Audio Search of Surveillance Data
  4.1 Audio Search of Surveillance Data
  4.2 Requirements and Specifications of the Proposed Audio Search System
  4.3 Proposed Audio Search System Architecture
  4.4 Summary
  4.5 References
Chapter 5. Development of a Keyword Spotting based Audio Indexing and Searching System
  5.1 Acoustic Models for Keywords and Fillers
  5.2 Adaptation mechanism
    5.2.1 Adaptation techniques
    5.2.2 Adaptation strategy for MLLR
  5.3 Language model
  5.4 Summary
  5.5 References
Chapter 6. System Evaluations
  6.1 Data for training and evaluation of the system
    6.1.1 Training Data
    6.1.2 Evaluation data
    6.1.3 Performance measure
  6.2 Cluster settings for MLLR adaptation
  6.3 Effects of word insertion penalty
  6.4 Acoustic modeling performance comparisons
    6.4.1 System robustness test
    6.4.2 The performance limit
  6.5 Overall System Performance
  6.6 Summary
  6.7 References
Chapter 7. Conclusions and Future Works
  7.1 Conclusions
  7.2 Future works
    7.2.1 Discriminative adaptation
    7.2.2 Pronunciation dictionary
    7.2.3 Channel effect
246

The use of subword-based audio indexing in Chinese spoken document retrieval.

January 2001
Li Yuk Chi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves [112]-119). / Abstracts in English and Chinese.
Contents:
Abstract
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Information Retrieval
    1.1.1 Information Retrieval Models
    1.1.2 Information Retrieval in English
    1.1.3 Information Retrieval in Chinese
  1.2 Spoken Document Retrieval
    1.2.1 Spoken Document Retrieval in English
    1.2.2 Spoken Document Retrieval in Chinese
  1.3 Previous Work
  1.4 Motivation
  1.5 Goals
  1.6 Thesis Organization
Chapter 2. Investigation Framework
  2.1 Indexing the Spoken Document Collection
  2.2 Query Processing
  2.3 Subword Indexing
  2.4 Robustness in Chinese Spoken Document Retrieval
  2.5 Retrieval
  2.6 Evaluation
    2.6.1 Average Inverse Rank
    2.6.2 Mean Average Precision
Chapter 3. Subword-based Chinese Spoken Document Retrieval
  3.1 The Cantonese Corpus
  3.2 Known-Item Retrieval
  3.3 Subword Formulation for Cantonese Spoken Document Retrieval
  3.4 Audio Indexing by Cantonese Speech Recognition
    3.4.1 Seed Models from Adapted Data
    3.4.2 Retraining Acoustic Models
  3.5 The Retrieval Model
  3.6 Experiments
    3.6.1 Setup and Observations
    3.6.2 Results Analysis
  3.7 Chapter Summary
Chapter 4. Robust Indexing and Retrieval Methods
  4.1 Query Expansion using Phonetic Confusion
    4.1.1 Syllable-Syllable Confusions from Recognition
    4.1.2 Experimental Setup and Observation
  4.2 Document Expansion
    4.2.1 The Side Collection for Expansion
    4.2.2 Detailed Procedures in Document Expansion
    4.2.3 Improvements due to Document Expansion
  4.3 Using both Query and Document Expansion
  4.4 Chapter Summary
Chapter 5. Cross-Language Spoken Document Retrieval
  5.1 The Topic Detection and Tracking Collection
    5.1.1 The Spoken Document Collection
    5.1.2 The Translingual Query
    5.1.3 The Side Collection
    5.1.4 Subword-based Indexing
  5.2 The Translingual Retrieval Task
  5.3 Machine Translated Query
    5.3.1 The Unbalanced Query
    5.3.2 The Balanced Query
    5.3.3 Results on the Weight Balancing Scheme
  5.4 Document Expansion from a Side Collection
  5.5 Performance Evaluation and Analysis
  5.6 Chapter Summary
Chapter 6. Summary and Future Work
  6.1 Future Directions
Appendix A. Input format for the IR engine
Appendix B. Preliminary Results on the Two Normalization Schemes
Appendix C. Significance Tests
  C.1 Query Expansions for Cantonese Spoken Document Retrieval
  C.2 Document Expansion for Cantonese Spoken Document Retrieval
  C.3 Balanced Query for Cross-Language Spoken Document Retrieval
  C.4 Document Expansion for Cross-Language Spoken Document Retrieval
Appendix D. The Use of an Unrelated Source for Expanding Spoken Documents in Cantonese
Bibliography
247

A computational framework for mixed-initiative dialog modeling.

January 2002
Chan, Shuk Fong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 114-122). / Abstracts in English and Chinese.
Contents:
Chapter 1. Introduction
  1.1 Overview
  1.2 Thesis Contributions
  1.3 Thesis Outline
Chapter 2. Background
  2.1 Mixed-Initiative Interactions
  2.2 Mixed-Initiative Spoken Dialog Systems
    2.2.1 Finite-state Networks
    2.2.2 Form-based Approaches
    2.2.3 Sequential Decision Approaches
    2.2.4 Machine Learning Approaches
  2.3 Understanding Mixed-Initiative Dialogs
  2.4 Cooperative Response Generation
    2.4.1 Plan-based Approach
    2.4.2 Constraint-based Approach
  2.5 Chapter Summary
Chapter 3. Mixed-Initiative Dialog Management in the ISIS system
  3.1 The ISIS Domain
    3.1.1 System Overview
    3.1.2 Domain-Specific Constraints
  3.2 Discourse and Dialog
    3.2.1 Discourse Inheritance
    3.2.2 Mixed-Initiative Dialogs
  3.3 Challenges and New Directions
    3.3.1 A Learning System
    3.3.2 Combining Interaction and Delegation Subdialogs
  3.4 Chapter Summary
Chapter 4. Understanding Mixed-Initiative Human-Human Dialogs
  4.1 The CU Restaurants Domain
  4.2 Task Goals, Dialog Acts, Categories and Annotation
    4.2.1 Task Goals and Dialog Acts
    4.2.2 Semantic and Syntactic Categories
    4.2.3 Annotating the Training Sentences
  4.3 Selective Inheritance Strategy
    4.3.1 Category Inheritance Rules
    4.3.2 Category Refresh Rules
  4.4 Task Goal and Dialog Act Identification
    4.4.1 Belief Networks Development
    4.4.2 Varying the Input Dimensionality
    4.4.3 Evaluation
  4.5 Procedure for Discourse Inheritance
  4.6 Chapter Summary
Chapter 5. Cooperative Response Generation in Mixed-Initiative Dialog Modeling
  5.1 System Overview
    5.1.1 State Space Generation
    5.1.2 Task Goal and Dialog Act Generation for System Response
    5.1.3 Response Frame Generation
    5.1.4 Text Generation
  5.2 Experiments and Results
    5.2.1 Subjective Results
    5.2.2 Objective Results
  5.3 Chapter Summary
Chapter 6. Conclusions
  6.1 Summary
  6.2 Contributions
  6.3 Future Work
Bibliography
Appendix A. Domain-Specific Task Goals in CU Restaurants Domain
Appendix B. Full list of VERBMOBIL-2 Dialog Acts
Appendix C. Dialog Acts for Customer Requests and Waiter Responses in CU Restaurants Domain
Appendix D. The Two Grammars for Task Goal and Dialog Act Identification
Appendix E. Category Inheritance Rules
Appendix F. Category Refresh Rules
Appendix G. Full list of Response Trigger Words
Appendix H. Evaluation Test Questionnaire for Dialog System in CU Restaurants Domain
Appendix I. Details of the statistical testing Regarding Grice's Maxims and User Satisfaction
248

Enhancement and bandwidth compression of noisy speech by estimation of speech and its model parameters.

Lim, Jae Soo January 1978
Thesis. 1978. Sc.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Vita. / Includes bibliographical references. / Sc.D.
249

Digital encoding of speech and audio signals based on the perceptual requirements of the auditory system

Krasner, Michael Allen January 1979
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1979. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Vita. / Bibliography: leaves 130-138. / by Michael Allen Krasner. / Ph.D.
250

Methods of endpoint detection for isolated word recognition

Lamel, Lori Faith January 1980
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1980. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Includes bibliographical references. / by Lori F. Lamel. / M.S.
