21 |
Learning speaker-specific characteristics with deep neural architecture
Salman, Ahmad January 2012
Robust Speaker Recognition (SR) has long been a focus of attention for researchers. The advancement of speech-aided technologies, especially biometrics, highlights the need for foolproof SR systems. However, the performance of an SR system depends critically on the quality of the speech features used to represent the speaker-specific information. This research aims to extract the speaker-specific information from Mel-frequency Cepstral Coefficients (MFCCs) using deep learning. Speech is a mixture of various information components, including linguistic, speaker-specific and emotional-state information. Robust performance in different speech-related tasks requires feature extraction for the relevant information component. However, almost all forms of speech representation carry all of this information as a whole, which is responsible for the compromised performance of SR systems. Motivated by the ability of deep architectures to solve complex problems by learning high-level, task-specific information from the data, we propose a novel Deep Neural Architecture (DNA) to extract speaker-specific information (SI) from MFCCs, a popular frequency-domain representation of the speech signal. A two-stage learning strategy is adopted: unsupervised training for network initialisation, followed by regularised contrastive learning. To train our network in the second stage, we devise a contrastive loss function that discriminates between speakers on the basis of their intrinsic statistical patterns, which are distributed in the representations yielded by our deep network. This is achieved through contrastive pair-wise comparison of these representations for similar or dissimilar speakers. To improve generalisation and reduce the interference of environmental effects with the speaker-specific representation, we regularise the contrastive loss with the data reconstruction loss in a multi-objective optimisation.
A detailed study analyses the parameter space for training the proposed deep architecture for optimum performance. Finally, we compare the performance of our learned speaker-specific representations with several state-of-the-art techniques in speaker verification and speaker segmentation tasks. It is evident that the representations acquired through the learned DNA are invariant and comparatively less sensitive to text, language and environmental variability.
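The abstract does not give the form of its contrastive loss, so the following is only a minimal sketch of the general idea of combining a pair-wise contrastive term with a reconstruction term. It substitutes the generic margin-based contrastive loss for the thesis's own loss; the function names, the `margin` value and the weighting factor `alpha` are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(rep_a, rep_b, same_speaker, margin=1.0):
    """Generic margin-based contrastive loss (a stand-in, not the thesis's loss).

    Same-speaker pairs are pulled together; different-speaker pairs are
    pushed apart until they are at least `margin` away from each other.
    """
    d = euclidean(rep_a, rep_b)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def reconstruction_loss(original, reconstructed):
    """Mean squared error between an input frame and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def combined_loss(rep_a, rep_b, same_speaker, inputs, reconstructions,
                  alpha=0.5, margin=1.0):
    """Weighted sum of the contrastive term and the mean reconstruction term.

    `alpha` (hypothetical) trades discrimination against reconstruction.
    """
    c = contrastive_loss(rep_a, rep_b, same_speaker, margin)
    r = sum(reconstruction_loss(x, y)
            for x, y in zip(inputs, reconstructions)) / len(inputs)
    return alpha * c + (1.0 - alpha) * r
```

In this sketch the reconstruction term acts as the regulariser: a representation that separates speakers well but can no longer reproduce its input is penalised.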
|
22 |
Some Innovations in an Oral Approach to Teaching English to Spanish-Speaking Students: Eighth Grade Level
Woolsey, Normada L. 01 1900
The aim of this thesis is to suggest how some of the trends mentioned above may be incorporated into a program to help the eighth grade Spanish-speaking student in a predominantly English-speaking school: the student who has not only given up the idea of getting an education himself but is also considered by his teachers "too late" to reach.
|
23 |
Speaker Recognition in a handheld computer
Domínguez Sánchez, Carlos January 2010
Handheld computers are widely used, be it a mobile phone, personal digital assistant (PDA), or a media player. Although these devices are personal, a given device is often used by a small set of people, for example a group of friends or a family. The most natural way for most humans to communicate is through speech. A natural way for these devices to know who is using them is therefore to listen to the user's speech, i.e., to recognize the speaker based upon their speech. This project exploits the microphone built into most of these devices and asks whether it is possible to develop an effective speaker recognition system that can operate within the limited resources of these devices (as compared to a desktop PC). The goal of this speaker recognition is to distinguish between the small set of people who could share a handheld device and those outside this set. The criterion is therefore that the device should work for any member of this small set and not work for anyone outside it. Furthermore, the device should recognize which specific person within the set is using it. An application for a Windows Mobile PDA has been developed using C++. This application and its underlying theoretical concepts, as well as parts of the code and the results obtained (in terms of accuracy rate and performance), are presented in this thesis. The experiments conducted within this research indicate that it is feasible to recognize, based upon their speech, that the user is within a small group and furthermore to identify which member of the group is the user. This has great potential for automatically configuring devices within a home or office environment for the specific user. Potentially, all a user needs to do is speak within hearing range of the device to identify themselves to it. The device in turn can configure itself for this user.
/ Handheld computers are widely used; the device may be a mobile phone, a personal digital assistant (PDA) or a media player. Although these devices are personal, a small set of people can often use a given device, for example a group of friends or a family. The most natural way for most people to communicate is to speak. A natural way for these devices to know who is using them is therefore for the device to listen to the user's voice, i.e., to recognize the speaker based on their voice. This project exploits the microphone built into most of these devices and asks whether it is possible to develop an effective speaker recognition system that can operate within the limited resources of these devices (compared to a desktop computer). The goal of this speaker recognition is to distinguish between the small set of people who could share a handheld device and those outside this set. The criterion is therefore that the device should work for any member of this small set and not work for anyone outside it. Furthermore, the device should recognize which specific person within this small group is using it. An application for a Windows Mobile PDA has been developed in C++. This application and the underlying theoretical concepts, as well as parts of the code and the results achieved (in terms of accuracy rate and performance), are presented in this thesis. The experiments carried out in this research show that it is possible to recognize the user based on their voice within a small group and, furthermore, to identify which member of the group is the user. This has great potential for automatically configuring devices in a home or office environment for the specific user. Potentially, a user only needs to speak within hearing range to identify themselves to the device. The device can then configure itself for this user.
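The two requirements in this abstract (reject anyone outside the small group, and name the member inside it) amount to what is usually called open-set identification. The sketch below illustrates one common way to frame that decision, assuming each enrolled user already has a similarity score for the current utterance; the function name, the score dictionary and the `threshold` value are hypothetical and are not taken from the thesis.

```python
def open_set_identify(scores, threshold):
    """Return the best-matching enrolled user, or None for an outsider.

    scores: dict mapping enrolled user name -> similarity score
            (higher means a closer match); assumed to be computed elsewhere.
    threshold: minimum score required to accept the best match
               (hypothetical; it would need tuning on real data).
    """
    if not scores:
        return None
    best_user = max(scores, key=scores.get)
    if scores[best_user] < threshold:
        return None  # nobody in the group matches well enough
    return best_user


# Usage with made-up scores for a three-person household.
household = {"anna": 0.82, "ben": 0.41, "carl": 0.37}
print(open_set_identify(household, threshold=0.6))   # accepted as "anna"
print(open_set_identify({"anna": 0.3, "ben": 0.2}, threshold=0.6))  # rejected
```

A single threshold keeps the decision cheap, which suits a resource-limited handheld device, at the cost of needing careful tuning to balance false accepts against false rejects.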
|
24 |
An analysis of Hosii in modern spoken Japanese
Hoshino, Takane January 1991
No description available.
|
25 |
Rozpoznávání mluvčího na mobilním telefonu / Speaker Recognition on Mobile Phone
Pešán, Jan January 2011
This thesis focuses on implementing a computer speaker recognition system in a mobile phone environment. It describes the principle, function, and implementation of a recognizer on the Nokia N900 mobile phone.
|
26 |
Children prefer to acquire information from unambiguous speakers
Gillis, Randall January 2011
Detecting ambiguity is essential for successful communication. Two studies investigated whether preschool-age (4- to 5-year-old) and school-age (6- to 7-year-old) children show sensitivity to communicative ambiguity and can use this cue to determine which speakers constitute valuable informational sources. Children were provided clues to the location of hidden dots by speakers who varied in clarity and accuracy. Subsequently, children decided from whom they would like to receive additional information. In Study 1, preschool-age (n=40) and school-age (n=42) children preferred to solicit information from unambiguous rather than ambiguous speakers. However, ambiguous speakers were preferred to speakers who provided inaccurate information. In Study 2, when not provided with information about the outcome of the speakers' clues, school-age (n=22), but not preschool-age (n=19), children preferred unambiguous to ambiguous speakers. The results highlight a developmental progression in children's use of communicative ambiguity as a cue for determining which individuals are preferable informants.
|
27 |
The Utilization of Listening Strategies in the Development of Listening Comprehension among Skilled and Less-skilled Non-native English Speakers at the College Level
Liu, Yi-Chun 2009 December 1900
This study aimed to explore Chinese and Korean EFL learners' perceptions with regard to the use of listening strategies. The purpose is to learn whether Chinese and Korean students achieve academic listening comprehension through specific listening strategies. The data were collected from first- and second-year students currently studying abroad in the US. Although they are immersed in an English-speaking environment, their use of listening strategies, based on what they learned in their home countries, still affects their development of academic listening comprehension. For this reason, this study provides a corpus for understanding Chinese and Korean EFL students' listening behavior and what constrains their English listening comprehension.
In the research design, one hundred and sixty-six college-level students from three public universities in Texas completed web-based questionnaires. Skilled and less-skilled groups were differentiated according to their TOEFL listening scores: students scoring above 570 were categorized as skilled listeners, and those scoring below 570 as less-skilled listeners. Given the need for additional research on the different factors that affect developmental outcomes in L2 listening comprehension, the following research questions were investigated: 1) Is there a statistically significant relationship between the self-reported use of listening strategies and self-reported listening comprehension scores on the TOEFL? 2) Is there a difference between skilled and less-skilled non-native English speakers in the self-reported use of four categories of listening strategies (memory, cognitive, metacognitive, and socio-affective)? 3) What factors influence the use of self-reported listening strategies?
The findings show that students in this sample tended to employ memory strategies as a means of achieving listening comprehension. In theory, cognitive and metacognitive strategies are more difficult than memory strategies, which leads to a lack of sophisticated strategies among Chinese and Korean students. In addition, students' listening skills are not mature. The pedagogical implications of this study for EFL education are that teachers, while teaching listening, should be alert to such phenomena and, specifically, should instruct students to reach listening maturity via cognitive and metacognitive strategies.
|
28 |
DSP Base Independent Phrase Real Time Speaker Recognition System
Yan, Ming-Xiang 27 July 2004
This thesis presents a DSP-based speaker recognition system. To keep the modules within the floating-point representation, we simplify the algorithm. The system comprises the hardware setup and the implementation of the speaker recognition algorithm. The DSP chip is a floating-point DSP (the ADSP-21161 of the ADI SHARC series), and the speaker recognition algorithm is a Gaussian mixture model. According to the experimental results, the DSP-based speaker recognition achieves good recognition accuracy and speed.
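As a rough illustration of how a Gaussian mixture model can be used for speaker recognition, the sketch below scores feature frames against one diagonal-covariance GMM per speaker and picks the speaker with the highest average log-likelihood. The model layout, the function names and the toy one-dimensional parameters are assumptions; the abstract does not describe the thesis's simplified algorithm, and training (for example with EM) is left out.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM.

    gmm: list of (weight, mean, var) tuples, one per mixture component.
    Uses the log-sum-exp trick for numerical stability.
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_speaker(frames, models):
    """Return (best speaker, per-speaker average log-likelihood)."""
    scores = {
        name: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for name, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy models with hand-picked parameters (hypothetical, not trained).
models = {
    "alice": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "bob": [(1.0, [5.0], [1.0])],
}
best, _ = identify_speaker([[0.2], [0.8]], models)
print(best)
```

Averaging over frames makes the score independent of utterance length, which matters when test phrases vary in duration.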
|
29 |
Feature Design for Text Independent Speaker Recognition in Numerous Speaker Cases
Huang, Chun-Hao 28 June 2001
A Microsoft Windows program is designed to implement a text-independent speaker recognition system for numerous-speaker cases, based on Mel-cepstrum features, a hierarchical tree classifier and binary vector quantization. Experimental results show that the accuracy is barely affected by increasing population sizes, and that recognition is faster than with traditional methods.
|
30 |
Chinese Input Method Based on First Mandarin Phonetic Alphabet for Mobile Devices and an Approach in Speaker Diarization with Divide-and-Conquer
Tseng, Chun-han 09 September 2008
There are two research topics in this thesis. First, we implement a
highly efficient Chinese input method. Second, we apply a
divide-and-conquer scheme to the speaker diarization problem.
The implemented Chinese input method transforms an input first-symbol
sequence into a character string (a sentence). This means that a user
only needs to input the first Mandarin phonetic symbol of each character,
which is very efficient compared to the current methods.
The implementation is based on a dynamic programming scheme
and language models. To reduce time complexity, the vocabulary for the
language model consists of 1-, 2-, and 3-character words only.
The speaker diarization system consists of segmentation and clustering
modules. The divide-and-conquer scheme is essentially implemented in
the clustering module. We evaluate the performance of our system using
the speaker diarization score defined in the 2003 Rich Transcription
Evaluation Plan. Compared to the baseline, our method significantly
reduces the processing time without compromising diarization accuracy.
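The first-symbol input method described above can be illustrated with a small dynamic programming sketch. It assumes a unigram word model over a toy lexicon keyed by first Mandarin phonetic symbols, with words of at most three characters; the lexicon entries, their probabilities and the function name `decode` are invented for illustration, and the thesis's actual language models are not specified in the abstract.

```python
import math

# Toy lexicon: first-symbol key -> candidate words with log-probabilities.
# All probabilities are made-up illustration values.
LEXICON = {
    "ㄋ": [("你", math.log(0.05)), ("那", math.log(0.03))],
    "ㄏ": [("好", math.log(0.04))],
    "ㄋㄏ": [("你好", math.log(0.02))],
    "ㄨ": [("我", math.log(0.06))],
    "ㄇ": [("們", math.log(0.01)), ("嗎", math.log(0.02))],
    "ㄨㄇ": [("我們", math.log(0.03))],
}
MAX_WORD_LEN = 3  # vocabulary limited to 1-, 2- and 3-character words


def decode(symbols):
    """Return the most probable character string for a first-symbol sequence.

    best[i] holds the best score for the first i symbols together with a
    back-pointer (start index, word) used to rebuild the sentence.
    Returns None if no segmentation is covered by the lexicon.
    """
    n = len(symbols)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for length in range(1, min(MAX_WORD_LEN, end) + 1):
            start = end - length
            prev_score = best[start][0]
            if prev_score == -math.inf:
                continue  # no valid path reaches this start position
            for word, logp in LEXICON.get(symbols[start:end], []):
                score = prev_score + logp
                if score > best[end][0]:
                    best[end] = (score, (start, word))
    if best[n][0] == -math.inf:
        return None
    words = []
    pos = n
    while pos > 0:
        start, word = best[pos][1]
        words.append(word)
        pos = start
    return "".join(reversed(words))


print(decode("ㄨㄇㄋㄏ"))
```

Capping word length at three keeps the inner loop constant-sized, so decoding time grows linearly with the number of input symbols (times the number of candidates per key).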
|