  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Automatic speaker recognition by linear prediction : a study of the parametric sensitivity of the model

Collins, Anthony McLaren January 1982
The application of the linear prediction model for speech waveform analysis to context-independent automatic speaker recognition is explored, primarily in terms of the parametric sensitivity of the model. Feature vectors to characterize speakers are formed from linear prediction speech parameters computed as inverse filter coefficients, reflection coefficients or cepstral coefficients, and also from power spectrum parameters obtained via Fast Fourier Transform coefficients. The comparative performance of these parameters is investigated in speaker recognition experiments. The stability of the linear prediction parameters is tested over a range of model orders from p=6 to p=30. Two independent speech databases are used to substantiate the experimental results. The quality of the automatic recognition technique is assessed in a novel experiment based on a direct performance comparison with the human skill of aural recognition. Correlation is sought between the performance of the aural and automatic recognition methods for each of the four parameter sets. Although the recognition accuracy of the automatic system is superior to that of the direct aural technique, the error distributions are highly variable. The performance of the automatic system is shown to be empirically based and unlike the intuitive human process. An extended preamble to the description of the experiments reviews the current art of automatic speaker recognition, with a critical consideration of the performance of linear prediction techniques. As supported by our experimental results, it is concluded that success in the laboratory rests upon a rather fragile foundation. Application to problems beyond the controlled laboratory environment is seen, therefore, to be still more precarious.
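The three linear prediction parameter sets named in this abstract are closely related, and a small sketch may make the relationship concrete. The code below is only an illustration, not the thesis's implementation: it assumes the textbook autocorrelation method with the Levinson-Durbin recursion and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and all function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, k, err):
      a   - inverse filter coefficients [1, a1, ..., ap]
      k   - reflection coefficients [k1, ..., kp]
      err - residual (prediction error) energy
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        km = -acc / err
        k.append(km)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + km * a[m - i]
        new_a[m] = km
        a = new_a
        err *= 1.0 - km * km
    return a, k, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c1..c_n of the all-pole model 1/A(z)."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        acc = sum(j * c[j] * a[n - j] for j in range(1, n) if n - j <= p)
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]


# Autocorrelation of an ideal first-order process with correlation 0.5.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
cepstrum = lpc_to_cepstrum(a, n_ceps=2)
```

For this first-order example the second reflection coefficient comes out as zero, which illustrates why the choice of model order p matters: coefficients beyond the true order carry no information about the signal.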
2

Development of a text-independent automatic speaker recognition system

Mokgonyane, Tumisho Billson January 2021
Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2021 / The task of automatic speaker recognition, wherein a system verifies or identifies speakers from a recording of their voices, has been researched for several decades. However, research in this area has been carried out largely on freely accessible speaker datasets built on well-resourced languages such as English. This study undertakes automatic speaker recognition research focused on a low-resourced language, Sepedi. As one of the 11 official languages in South Africa, Sepedi is spoken by at least 2.8 million people. Pre-recorded voices were acquired from a national speech and language repository, the National Centre for Human Language Technology (NCHLT), from which we selected the Sepedi NCHLT Speech Corpus. The open-source pyAudioAnalysis Python library was used to extract three types of acoustic features of speech, namely time-, frequency- and cepstral-domain features, from the acquired speech data. The effects and compatibility of these acoustic features were investigated. It was observed that combining the three types of acoustic features had a more significant effect on speaker recognition accuracy than using individual feature types. The study also investigated the performance of machine learning algorithms on low-resourced languages such as Sepedi. Five machine learning (ML) algorithms implemented in scikit-learn, namely K-nearest neighbours (KNN), support vector machines (SVM), random forest (RF), logistic regression (LR), and multi-layer perceptrons (MLP), were used to train different classifier models. The GridSearchCV algorithm, also implemented in scikit-learn, was used to determine suitable hyper-parameters for each of the five ML algorithms. The classifier models were evaluated on recognition accuracy, and the results show that the MLP classifier, with a recognition accuracy of 98%, outperforms the KNN, RF, LR and SVM classifiers.
A graphical user interface (GUI) was developed, and the best-performing classifier model, MLP, was deployed on it for real-time speaker identification and verification tasks. Participants were recruited to evaluate the GUI's performance, and acceptable results were obtained.
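The idea behind a cross-validated hyper-parameter search can be shown without any dependencies. The sketch below is a hand-rolled stand-in, not scikit-learn's GridSearchCV and not the thesis's code: it assumes a toy two-speaker dataset with made-up 2-D feature vectors, a plain K-nearest-neighbours classifier, and leave-one-out cross-validation. All names and data are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training items nearest to the query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of a k-NN classifier on (features, label) pairs."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def grid_search_k(data, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: loo_accuracy(data, k) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores


# Toy feature vectors for two hypothetical speakers "A" and "B".
data = [
    ([0.0, 0.1], "A"), ([0.2, 0.0], "A"), ([0.1, 0.2], "A"),
    ([1.0, 1.1], "B"), ([1.2, 0.9], "B"), ([0.9, 1.0], "B"),
]
best_k, scores = grid_search_k(data, candidates=[1, 3, 5])
```

On this toy data k=5 fails entirely, because each held-out point is outvoted by the other speaker's three samples. That is the kind of failure a systematic search over hyper-parameters is meant to expose.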
3

Measuring, refining and calibrating speaker and language information extracted from speech

Brummer, Niko 12 1900
Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: We propose a new methodology, based on proper scoring rules, for the evaluation of the goodness of pattern recognizers with probabilistic outputs. The recognizers of interest take an input, known to belong to one of a discrete set of classes, and output a calibrated likelihood for each class. This is a generalization of the traditional use of proper scoring rules to evaluate the goodness of probability distributions. A recognizer with outputs in well-calibrated probability distribution form can be applied to make cost-effective Bayes decisions over a range of applications, having different cost functions. A recognizer with likelihood output can additionally be employed for a wide range of prior distributions for the to-be-recognized classes. We use automatic speaker recognition and automatic spoken language recognition as prototypes of this type of pattern recognizer. The traditional evaluation methods in these fields, as represented by the series of NIST Speaker and Language Recognition Evaluations, evaluate hard decisions made by the recognizers. This makes these recognizers cost-and-prior-dependent. The proposed methodology generalizes that of the NIST evaluations, allowing for the evaluation of recognizers which are intended to be usefully applied over a wide range of applications, having variable priors and costs. The proposal includes a family of evaluation criteria, where each member of the family is formed by a proper scoring rule. We emphasize two members of this family: (i) A non-strict scoring rule, directly representing error-rate at a given prior. (ii) The strict logarithmic scoring rule which represents information content, or which equivalently represents summarized error-rate, or expected cost, over a wide range of applications.
We further show how to form a family of secondary evaluation criteria, which, by contrast with the primary criteria, provide an analysis of the goodness of calibration of the recognizers' likelihoods. Finally, we show how to use the logarithmic scoring rule as an objective function for the discriminative training of fusion and calibration of speaker and language recognizers. / AFRIKAANS SUMMARY (translated): We show how to represent, measure, calibrate and optimize the uncertainty in the output of automatic speaker recognition and language recognition systems. This makes the existing technology more accurate, more effective and more generally applicable.
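A logarithmic scoring rule can be illustrated for the simplest case of two-class speaker detection. The sketch below is an assumption-laden illustration and not code from the thesis: it assumes the recognizer outputs natural-log likelihood ratios, weights target and non-target trials equally, and reports the average logarithmic cost in bits. The function name `log_cost_bits` is hypothetical.

```python
import math


def log_cost_bits(target_llrs, nontarget_llrs):
    """Average logarithmic cost (in bits) of log-likelihood-ratio scores.

    target_llrs    - scores for trials where the speaker is the claimed one
    nontarget_llrs - scores for trials where the speaker is someone else
    """
    def softplus(x):
        # Numerically stable log(1 + exp(x)).
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    cost_tar = sum(softplus(-s) for s in target_llrs) / len(target_llrs)
    cost_non = sum(softplus(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (cost_tar + cost_non) / math.log(2)


uninformative = log_cost_bits([0.0, 0.0], [0.0, 0.0])    # always says "no idea"
confident_right = log_cost_bits([10.0, 10.0], [-10.0, -10.0])
confident_wrong = log_cost_bits([-10.0], [10.0])
```

Under these assumptions a recognizer that always outputs a log-likelihood ratio of zero costs exactly one bit. Confident correct scores cost almost nothing, and confident wrong scores are penalized heavily, which is what makes the rule sensitive to calibration and not only to hard-decision error rates.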
4

Speaker recognition by voice / Asmens atpažinimas pagal balsą

Kamarauskas, Juozas 15 June 2009
This dissertation investigates speaker recognition by voice. It reviews speaker recognition systems and their evolution, recognition problems, feature systems, and the speaker modeling and matching methods used in text-independent and text-dependent speaker recognition. A text-independent speaker recognition system was developed in this work. Gaussian mixture models were used for speaker modeling and pattern matching. An automatic method for voice activity detection (selection of voiced segments) is proposed. The method is fast and requires no additional action from the user, such as indicating examples of the speech signal and of the noise. A feature system is proposed that consists of parameters of the excitation (glottal) source and parameters of the vocal tract. The fundamental frequency was taken as the excitation source parameter, and four formants together with three antiformants were taken as the vocal tract parameters. To equalize the dispersions of the lower and higher formants and antiformants, we propose computing them on the mel-frequency scale. Standard mel-frequency cepstral coefficients (MFCC), the baseline features in speech and speaker recognition, were also implemented in the system for comparison. The speaker recognition experiments showed that the proposed feature system outperformed standard MFCCs. The equal error rate (EER) was 5.17% using the proposed...
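The effect of moving formant frequencies onto the mel scale can be seen with a short calculation. The sketch below assumes the widely used conversion mel = 2595 log10(1 + f/700); the dissertation may use a different variant of the mel formula, and the formant values are made-up round numbers chosen only for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Hypothetical formant frequencies in Hz, evenly spaced 1000 Hz apart.
formants_hz = [500.0, 1500.0, 2500.0, 3500.0]
formants_mel = [hz_to_mel(f) for f in formants_hz]

# Gaps between neighbouring formants shrink at higher frequencies.
gaps_mel = [b - a for a, b in zip(formants_mel, formants_mel[1:])]
```

Equal steps in Hz become progressively smaller steps in mels, so the same absolute variation in a high formant contributes less on the mel scale. This is consistent with the stated aim of equalizing the dispersions of lower and higher formants.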
5

Efficient speaker diarization and low-latency speaker spotting / Segmentation et regroupement efficaces en locuteurs et détection des locuteurs à faible latence

Patino Villar, José María 24 October 2019
Speaker diarization (SD) involves the detection of speakers within an audio stream and of the intervals during which each speaker is active, i.e. the determination of ‘who spoke when’. The first part of the work presented in this thesis exploits an approach to speaker modelling involving binary keys (BKs) as a solution to SD. BK modelling is efficient and operates without external training data, as it uses test data alone. The presented contributions include the extraction of BKs based on multi-resolution spectral analysis, the explicit detection of speaker changes using BKs, and SD fusion techniques that combine the benefits of both BK and deep learning based solutions.
The SD task is closely linked to that of speaker recognition or detection, which involves the comparison of two speech segments and the determination of whether or not they were uttered by the same speaker. Even though many practical applications require their combination, the two tasks are traditionally tackled independently of each other. The second part of this thesis considers an application where SD and speaker recognition solutions are brought together. The new task, coined low-latency speaker spotting (LLSS), involves the rapid detection of known speakers within multi-speaker audio streams. It requires rethinking online diarization and the manner in which diarization and detection sub-systems should best be combined.
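To give a feel for why binary keys are cheap to compare, the sketch below computes a Jaccard similarity between two binary vectors. This is an illustrative assumption only: the abstract does not say which similarity measure the thesis uses, and the key values and function name are hypothetical.

```python
def jaccard_similarity(key_a, key_b):
    """Share of positions set in both keys among positions set in either."""
    if len(key_a) != len(key_b):
        raise ValueError("binary keys must have the same length")
    both = sum(1 for a, b in zip(key_a, key_b) if a and b)
    either = sum(1 for a, b in zip(key_a, key_b) if a or b)
    return both / either if either else 0.0


# Hypothetical 8-bit keys for three speech segments.
segment_1 = [1, 1, 0, 0, 1, 0, 0, 1]
segment_2 = [1, 1, 0, 0, 1, 0, 1, 0]   # mostly overlaps with segment_1
segment_3 = [0, 0, 1, 1, 0, 1, 0, 0]   # no overlap with segment_1

same_speaker_score = jaccard_similarity(segment_1, segment_2)
different_speaker_score = jaccard_similarity(segment_1, segment_3)
```

Under this assumption, comparing two segments needs only bit counting, with no model training. That fits the abstract's point that BK modelling is efficient and works from test data alone.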
6

Анализа мел-фреквенцијских кепстралних коефицијената као обележја коришћених при аутоматском препознавању говорника / Analiza mel-frekvencijskih kepstralnih koeficijenata kao obeležja korišćenih pri automatskom prepoznavanju govornika / Analysis of mel-frequency cepstral coefficients as features used for automatic speaker recognition

Jokić, Ivan 24 October 2014
This work analyzes mel-frequency cepstral coefficients as speaker features used in automatic speaker recognition. The influence on speaker recognition accuracy of changing the shape of the auditory critical bands, and of the proposed modification of the energy within them, is tested. Some transformations for reducing the time variability of models of the same speaker are also proposed and tested.
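Two steps of a typical MFCC computation relate directly to what this abstract studies: the shape of each critical-band filter, and the transform from band energies to cepstral coefficients. The sketch below assumes the conventional triangular filter shape and an unnormalized DCT-II. It is a generic illustration, not the modified filters or transformations proposed in the thesis, and all names and values are hypothetical.

```python
import math


def triangular_weight(f, left, center, right):
    """Weight of frequency f in a triangular band rising to 1 at center."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0


def dct_ii(log_energies, n_coeffs):
    """Unnormalized DCT-II of log band energies (the cepstral step)."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n)
            for i, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]


# A band spanning 300-700 Hz with its peak at 500 Hz (made-up values).
peak = triangular_weight(500.0, 300.0, 500.0, 700.0)
halfway = triangular_weight(400.0, 300.0, 500.0, 700.0)

# Flat log energies across four bands carry no spectral shape.
coeffs = dct_ii([2.0, 2.0, 2.0, 2.0], n_coeffs=3)
```

With flat band energies, only the zeroth coefficient is non-zero. The higher coefficients respond to how energy is distributed across bands, so changing the filter shape or the energy inside each band changes the resulting features.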
