31 |
Children prefer to acquire information from unambiguous speakers
Gillis, Randall January 2011
Detecting ambiguity is essential for successful communication. Two studies investigated whether preschool- (4- to 5-year-old) and school-age (6- to 7-year-old) children show sensitivity to communicative ambiguity and can use this cue to determine which speakers constitute valuable informational sources. Children were given clues to the location of hidden dots by speakers who varied in clarity and accuracy. Subsequently, children decided from whom they would like to receive additional information. In Study 1, preschool- (n=40) and school-age (n=42) children preferred to solicit information from unambiguous rather than ambiguous speakers. However, ambiguous speakers were preferred to speakers who provided inaccurate information. In Study 2, when not provided with information about the outcome of the speakers’ clues, school-age (n=22), but not preschool-age (n=19), children preferred unambiguous over ambiguous speakers. Results highlight a developmental progression in children’s use of communicative ambiguity as a cue for determining which individuals are preferable informants.
|
32 |
Automatic speaker recognition using phase based features
Thiruvaran, Tharmarajah, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2009
Despite recent advances, improving the accuracy of automatic speaker recognition systems remains an important and challenging area of research. This thesis investigates two phase-based features, namely the frequency modulation (FM) feature and the group delay feature, in order to improve speaker recognition accuracy. Introducing features complementary to spectral envelope-based features is a promising approach for increasing the information content of the speaker recognition system. Although phase-based features are motivated by psychophysics and speech production considerations, they have rarely been incorporated into speaker recognition front-ends. A theory has been developed and reported in this thesis to show that the FM component can be extracted using second-order all-pole modelling, and a technique for extracting FM features using this model is proposed, to produce very smooth, slowly varying FM features that are effective for speaker recognition tasks. This approach is shown herein to significantly improve speaker recognition performance over other existing FM extraction methods. A highly computationally efficient FM estimation technique is then proposed, and its computational efficiency is shown through a comparative study with other methods with respect to the trade-off between computational complexity and performance. In order to further enhance the FM-based front-end specifically for speaker recognition, optimum frequency band allocation is studied in terms of the number of sub-bands and spacing of centre frequencies, and two new frequency band re-allocations are proposed for FM-based speaker recognition. Two group delay features are also proposed: the log-compressed group delay feature and the sub-band group delay feature, to address problems in group delay caused by the zeros of the z-transform polynomial of a speech signal being close to the unit circle.
It has been shown that the combination of group delay and FM complements Mel Frequency Cepstral Coefficients (MFCCs) in speaker recognition tasks. Furthermore, the proposed FM feature is successfully utilised for automatic forensic speaker recognition, which is implemented based on the likelihood ratio framework with two-stage modelling and calibration, and shown to behave in a complementary manner to MFCCs. Notably, the FM-based system provides better calibration loss than the MFCC-based system, suggesting less ambiguity of FM information than MFCC information in an automatic forensic speaker recognition system. In order to demonstrate the effectiveness of FM features in a large-scale speaker recognition environment, an FM-based speaker recognition subsystem is developed and submitted to the NIST 2008 speaker recognition evaluation as part of the I4U submission. Post-evaluation analysis shows a 19.7% relative improvement over the traditional MFCC-based subsystem when it is augmented by the FM-based subsystem. Consistent improvements in performance are obtained when MFCC is augmented with FM in all sub-categories of NIST 2008, in three development tasks and for the NIST 2001 database, demonstrating the complementary behaviour of MFCC and FM features.
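The abstract above mentions extracting the FM component with second-order all-pole modelling. The thesis's own estimator is not given in this record, so the following is only a generic sketch of the underlying idea: fit a two-coefficient autoregressive model to a narrow-band signal and read the frequency off the pole angle. The function name `ar2_frequency` and the least-squares fitting method are assumptions made for illustration.

```python
import math

def ar2_frequency(x, fs):
    """Estimate the dominant frequency (Hz) of x from a least-squares AR(2) fit.

    Illustrative only: this is a generic second-order all-pole fit,
    not the estimator developed in the thesis.
    """
    # Model: x[n] ~ a1*x[n-1] + a2*x[n-2]. Accumulate the 2x2 normal equations.
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        p1, p2 = x[n - 1], x[n - 2]
        r11 += p1 * p1
        r12 += p1 * p2
        r22 += p2 * p2
        b1 += x[n] * p1
        b2 += x[n] * p2
    det = r11 * r22 - r12 * r12
    if det == 0.0:
        raise ValueError("degenerate signal: cannot fit an AR(2) model")
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (b2 * r11 - b1 * r12) / det
    if a2 >= 0.0:
        raise ValueError("fitted model has no complex-conjugate pole pair")
    # For poles r*exp(+/- j*w): a1 = 2*r*cos(w) and a2 = -r**2.
    r = math.sqrt(-a2)
    # Clamp to the valid acos domain to guard against rounding.
    cos_w = max(-1.0, min(1.0, a1 / (2.0 * r)))
    return math.acos(cos_w) * fs / (2.0 * math.pi)
```

Applied frame by frame to the output of each sub-band filter, an estimate of this kind would give one frequency value per band per frame; how the thesis smooths or normalises those values is not stated here.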
|
33 |
No, they won't "just sound like each other": NNS-NNS negotiated interaction and attention to phonological form on targeted L2 pronunciation tasks
Sicola, Laura January 1900
Also a dissertation, University of Pennsylvania, Philadelphia.
|
34 |
Speaker Verification Systems Under Various Noise and SNR Conditions
Wan, Qianhui January 2017
In speaker verification, mismatches between the training speech and the testing speech can greatly affect the robustness of classification algorithms; these mismatches are mainly caused by changes in noise type and signal-to-noise ratio. This thesis aims to find the most robust classification methods under multi-noise and multiple signal-to-noise ratio conditions. Several well-known state-of-the-art classification algorithms and features in speaker verification are compared by examining the performance of a small-set speaker verification system (e.g. a voice lock for a family). The effect of the testing speech length is also examined. The i-vector/Probabilistic Linear Discriminant Analysis method with compensation strategies is shown to provide stable performance for both previously seen and previously unseen noise scenarios, and a C++ implementation with online processing and multi-threading is developed for this approach.
|
35 |
A Nonlinear Mixture Autoregressive Model For Speaker Verification
Srinivasan, Sundararajan 30 April 2011
In this work, we apply a nonlinear mixture autoregressive (MixAR) model to supplant the Gaussian mixture model (GMM) for speaker verification. MixAR is a statistical model that is a probabilistically weighted combination of components, each of which is an autoregressive filter in addition to a mean. The probabilistic mixing and the data-dependent weights are responsible for the nonlinear nature of the model. Our experiments with synthetic as well as real speech data from standard speech corpora show that the MixAR model outperforms GMM, especially under unseen noisy conditions. Moreover, MixAR did not require delta features and used 2.5x fewer parameters to achieve performance comparable to or better than that of GMM using static as well as delta features. Also, MixAR suffered less from overfitting issues than GMM when training data was sparse. However, MixAR performance deteriorated more quickly than that of GMM when evaluation data duration was reduced. This could pose limitations on the required minimum amount of evaluation data when using the MixAR model for speaker verification.
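To make the MixAR description above concrete, here is a minimal sketch of how such a model could score one sample given its recent past. It assumes Gaussian components and a softmax gate that is linear in the past samples; the gate form, the dictionary keys and the function name `mixar_density` are illustrative assumptions, not details taken from the thesis.

```python
import math

def mixar_density(x_now, past, components):
    """Conditional density p(x_now | past) under a toy mixture-autoregressive model.

    past: recent samples, past[0] = x[n-1], past[1] = x[n-2], ...
    components: list of dicts with keys (all hypothetical names):
        "mean", "ar" (AR coefficients), "sigma", "gate_w", "gate_b"
    """
    # Data-dependent mixing weights: softmax over linear functions of the past.
    # (Assumed gate form; the abstract only says the weights depend on the data.)
    scores = [
        c["gate_b"] + sum(g * p for g, p in zip(c["gate_w"], past))
        for c in components
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    density = 0.0
    for w, c in zip(weights, components):
        # Each component predicts x_now as a mean plus an AR filter of the past.
        pred = c["mean"] + sum(a * p for a, p in zip(c["ar"], past))
        z = (x_now - pred) / c["sigma"]
        density += w * math.exp(-0.5 * z * z) / (c["sigma"] * math.sqrt(2.0 * math.pi))
    return density
```

Setting every AR coefficient and gate weight to zero reduces this to an ordinary Gaussian mixture, which is one way to see why the abstract treats MixAR as a replacement for the GMM.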
|
36 |
C-SALT: Conversational Style Attribution Given Legislative Transcriptions
Summers, Garrett D 01 June 2016
Common authorship attribution is well described by various authors, as summed up in Jacques Savoy’s work. Namely, authorship attribution is the process “whereby the author of a given text must be determined based on text samples written by known authors [48].” The field of authorship attribution has been explored in various contexts. Most of this work has been done on authors’ written text. This work seeks to approach a field similar to authorship attribution. We seek to attribute not a given author to a work based on style, but a style itself that is used by a group of people. Our work classifies an author into a category based on the dialogue they have spoken, not text they have written down. Using this system, we differentiate California State Legislators from other entities in a hearing. This is done using audio transcripts of the hearing in question. As this is not authorship attribution, the work can better be described as “Conversational Style Attribution”. Used as a tool in speaker identification classifiers, it allowed us to increase the accuracy of audio recognition by 50.9% and of facial recognition by 51.6%. These results show that our research into Conversational Style Attribution provides a significant benefit to the speaker identification process.
|
37 |
Analysis of speaking time and content of the various debates of the presidential campaign : Automated AI analysis of speech time and content of presidential debates based on the audio using speaker detection and topic detection / Analys av talartid och innehåll i de olika debatterna under presidentvalskampanjen. : Automatiserad AI-analys av taltid och innehåll i presidentdebatter baserat på ljudet med hjälp av talardetektering och ämnesdetektering.
Valentin Maza, Axel January 2023
The field of artificial intelligence (AI) has grown rapidly in recent years and its applications are becoming more widespread in various fields, including politics. In particular, presidential debates have become a crucial aspect of election campaigns, and it is important to analyze the information exchanged in these debates objectively so that voters can choose without being influenced by biased data. The objective of this project was to create an automatic analysis tool for presidential debates using AI. The main challenge of the final system was to determine the speaking time of each candidate and to analyze what each candidate said, to detect the topics discussed and to calculate the time spent on each topic. This thesis focuses mainly on the speaker detection part of this system. In addition, the high overlap rate in the debates, where candidates cut each other off, posed a significant challenge for speaker diarization, which aims to determine who speaks when. This problem was considered appropriate for a Master’s thesis project, as it involves a combination of advanced techniques in AI and speech processing, making it an important and difficult task. The application to political debates and the accompanying overlapping pathways makes this task both challenging and innovative. There are several ways to solve the problem of speaker detection. We have implemented classical approaches that involve segmentation techniques, speaker representation using embeddings such as i-vectors or x-vectors, and clustering. Yet, due to speech overlaps, an end-to-end solution was implemented using pyannote-audio (an open-source toolkit written in Python for speaker diarization), and the diarization error rate was significantly reduced after refining the model using our own labeled data. The results of this project showed that it was possible to create an automated presidential debate analysis tool using AI.
Specifically, this thesis has established the state of the art in speaker detection, taking into account the particularities of politics, such as the high speaker overlap rate.
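As a rough illustration of the speaking-time analysis this record describes, the sketch below totals time per speaker from diarization output and measures how long two or more speakers talk at once. The `(start, end, speaker)` tuple format and both function names are assumptions chosen for the example; they are not the thesis's code and do not use the pyannote-audio API.

```python
from collections import defaultdict

def speaking_time(segments):
    """Total speaking time per speaker from (start, end, speaker) segments."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)

def overlap_time(segments):
    """Total duration during which two or more segments are active at once."""
    events = []
    for start, end, _ in segments:
        events.append((start, 1))   # a segment opens
        events.append((end, -1))    # a segment closes
    # At equal times, closes (-1) sort before opens (+1),
    # so segments that merely touch are not counted as overlapping.
    events.sort()
    active, prev, overlap = 0, None, 0.0
    for t, delta in events:
        if active >= 2:
            overlap += t - prev
        active += delta
        prev = t
    return overlap
```

Because overlapped speech is counted once for each speaker involved, the per-speaker totals can sum to more than the length of the debate; the overlap figure shows by how much.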
|
38 |
On Traffic Analysis Attacks To Encrypted VoIP Calls
Lu, Yuanchao 10 December 2009
No description available.
|
39 |
Functions of yahari/yappari
Okutsu, Yuko January 1992
No description available.
|
40 |
Robust Formant tracking for Continuous Speech with Speaker Variability / Robust Formant tracking for Continuous Speech
Mustafa, Kamran 12 1900
Exposure to loud sounds can cause damage to the inner ear, leading to degradation of the neural response to speech and to formant frequencies in particular. This may result in decreased intelligibility of speech. An amplification scheme for hearing aids, called Contrast Enhanced Frequency Shaping (CEFS), may improve speech perception for ears with sound-induced hearing damage. CEFS takes into account across-frequency distortions introduced by the impaired ear and requires accurate and robust formant frequency estimates to allow dynamic, speech-spectrum-dependent amplification of speech in hearing aids. Several algorithms have been developed for extracting formant information from speech signals; however, most of these algorithms are either not robust in real-life noise environments or not suitable for real-time implementation. The algorithm proposed in this thesis achieves formant extraction from continuous speech by using a time-varying adaptive filterbank to track and estimate individual formant frequencies. The formant tracker incorporates an adaptive voicing detector and a gender detector for robust formant extraction from continuous speech, for both male and female speakers in the presence of background noise. Thorough testing of the algorithm using various speech sentences has shown promising results over a wide range of SNRs for various types of background noise, such as AWGN, single and multiple competing background speakers, and various other environmental sounds. / Thesis / Master of Applied Science (MASc)
|