Spelling suggestions: "subject:"automatic speakers recognition""
1 |
Automatic speaker recognition by linear prediction : a study of the parametric sensitivity of the model
Collins, Anthony McLaren, n/a January 1982
The application of the linear prediction model for
speech waveform analysis to context-independent automatic
speaker recognition is explored, primarily in terms of the
parametric sensitivity of the model. Feature vectors to
characterize speakers are formed from linear prediction
speech parameters computed as inverse filter coefficients,
reflection coefficients or cepstral coefficients, and also
power spectrum parameters via Fast Fourier Transform coefficients.
The comparative performance of these parameters is
investigated in speaker recognition experiments. The stability
of the linear prediction parameters is tested over a
range of model order from p=6 to p=30. Two independent
speech databases are used to substantiate the experimental
results.
The quality of the automatic recognition technique is
assessed in a novel experiment based on a direct performance
comparison with the human skill of aural recognition.
Correlation is sought between the performance of the aural
and automatic recognition methods, for each of the four parameter
sets. Although the recognition accuracy of the automatic system is superior to that of the direct aural technique,
the error distributions are highly variable. The performance
of the automatic system is shown to be empirically
based and unlike the intuitive human process.
An extended preamble to the description of the experiments
reviews the current art of automatic speaker recognition,
with a critical consideration of the performance of
linear prediction techniques. As supported by our experimental
results, it is concluded that success in the laboratory
rests upon a rather fragile foundation. Application to
problems beyond the controlled laboratory environment is
seen, therefore, to be still more precarious.
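
The abstract lists inverse filter coefficients and reflection coefficients as alternative parameter sets of the same linear prediction model. The following is a minimal sketch of how both can be obtained together, assuming the autocorrelation method solved by the Levinson-Durbin recursion; the abstract does not say which estimation method the thesis used, and the function name `levinson_durbin` is hypothetical.

```python
def levinson_durbin(r, p):
    """Fit an order-p linear predictor from autocorrelation values r[0..p].

    Assumed convention: inverse filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
    Returns (a, k, err):
      a   -- inverse filter coefficients, a[0] == 1.0
      k   -- reflection coefficients, one per model order 1..p
      err -- final prediction error energy
    """
    a = [1.0] + [0.0] * p
    k = []
    err = r[0]
    for i in range(1, p + 1):
        # Correlation between the current residual and the lag-i sample.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        # Update the order-(i-1) coefficients to order i.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki
    return a, k, err


# Autocorrelation of a first-order process with correlation 0.5 per lag.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], p=2)
```

Because the recursion produces one reflection coefficient per order, a sweep of the model order (the abstract tests p=6 to p=30) reuses the lower-order results rather than refitting from scratch.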
|
2 |
Development of a text-independent automatic speaker recognition system
Mokgonyane, Tumisho Billson January 2021
Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2021 / The task of automatic speaker recognition, wherein a system verifies or identifies
speakers from a recording of their voices, has been researched for several decades.
However, research in this area has been carried out largely on freely accessible
speaker datasets built on languages that are well-resourced like English. This study
undertakes automatic speaker recognition research focused on a low-resourced
language, Sepedi. As one of the 11 official languages in South Africa, Sepedi is
spoken by at least 2.8 million people. Pre-recorded voices were acquired from a
speech and language national repository, namely, the National Centre for Human
Language Technology (NCHLT), from which we selected the Sepedi NCHLT Speech
Corpus. The open-source pyAudioAnalysis Python library was used to extract three
types of acoustic features of speech, namely time, frequency and cepstral domain
features, from the acquired speech data. The effects and compatibility of these
acoustic features were investigated. It was observed that combining the three acoustic
features of speech had a more significant effect than using individual features as far
as speaker recognition accuracy is concerned. The study also investigated the
performance of machine learning algorithms on low-resourced languages such as
Sepedi. Five machine learning (ML) algorithms implemented in scikit-learn, namely
K-nearest neighbours (KNN), support vector machines (SVM), random forest (RF),
logistic regression (LR), and multi-layer perceptrons (MLP), were used to train different
classifier models. The GridSearchCV algorithm, also implemented in scikit-learn, was
used to select hyper-parameters for each of the five ML algorithms. The
classifier models were evaluated on recognition accuracy and the results show that
the MLP classifier, with a recognition accuracy of 98%, outperforms KNN, RF, LR and
SVM classifiers. A graphical user interface (GUI) was developed, and the best-performing
classifier model, MLP, was deployed on it for real-time speaker identification and verification tasks. Participants were recruited to test the
GUI's performance, and acceptable results were obtained.
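
The hyper-parameter search described in the abstract can be sketched as follows. This is an illustration only: the data are synthetic stand-ins generated with `make_classification` (not the Sepedi NCHLT features), only the KNN classifier is shown, and the parameter grid is an assumption rather than the grid used in the thesis.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-utterance feature vectors from 4 "speakers".
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale features, then classify; the grid below is a hypothetical example.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

# 5-fold cross-validated search over the grid, scored by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy on held-out data
```

Keeping the scaler inside the pipeline means it is refitted on each training fold, so the cross-validation scores are not influenced by statistics of the validation fold.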
|
3 |
Measuring, refining and calibrating speaker and language information extracted from speech
Brummer, Niko 12 1900
Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: We propose a new methodology, based on proper scoring rules, for the evaluation
of the goodness of pattern recognizers with probabilistic outputs. The
recognizers of interest take an input, known to belong to one of a discrete set
of classes, and output a calibrated likelihood for each class. This is a generalization
of the traditional use of proper scoring rules to evaluate the goodness
of probability distributions. A recognizer with outputs in well-calibrated probability
distribution form can be applied to make cost-effective Bayes decisions
over a range of applications, having different cost functions. A recognizer
with likelihood output can additionally be employed for a wide range of prior
distributions for the to-be-recognized classes.
We use automatic speaker recognition and automatic spoken language
recognition as prototypes of this type of pattern recognizer. The traditional
evaluation methods in these fields, as represented by the series of NIST Speaker
and Language Recognition Evaluations, evaluate hard decisions made by the
recognizers. This makes these recognizers cost-and-prior-dependent. The proposed
methodology generalizes that of the NIST evaluations, allowing for the
evaluation of recognizers which are intended to be usefully applied over a wide
range of applications, having variable priors and costs.
The proposal includes a family of evaluation criteria, where each member
of the family is formed by a proper scoring rule. We emphasize two members
of this family: (i) A non-strict scoring rule, directly representing error-rate
at a given prior. (ii) The strict logarithmic scoring rule which represents
information content, or which equivalently represents summarized error-rate,
or expected cost, over a wide range of applications.
We further show how to form a family of secondary evaluation criteria,
which by contrasting with the primary criteria, form an analysis of the goodness
of calibration of the recognizers' likelihoods.
Finally, we show how to use the logarithmic scoring rule as an objective
function for the discriminative training of fusion and calibration of speaker
and language recognizers. / AFRIKAANS SUMMARY (translated): We show how to represent, measure, calibrate
and optimize the uncertainty in the output of automatic speaker recognition and
language recognition systems. This makes the existing technology more accurate, more effective
and more generally applicable.
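
As an illustration of the logarithmic scoring rule applied to a recognizer with likelihood outputs, the sketch below combines class log-likelihoods with a prior through Bayes' rule and charges the negative log posterior of the true class. It is a simplified reading of the abstract, not the thesis's exact criterion: the cost is a plain average over trials, and the function names and toy trials are hypothetical.

```python
import math


def log_posterior(log_likelihoods, prior):
    """Bayes' rule in the log domain: log P(class | input) for every class."""
    joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, prior)]
    m = max(joint)
    # Log-sum-exp for a numerically stable normalizer.
    norm = m + math.log(sum(math.exp(j - m) for j in joint))
    return [j - norm for j in joint]


def log_score(trials, prior):
    """Average logarithmic scoring-rule cost in bits.

    trials -- list of (log_likelihoods, true_class_index) pairs
    prior  -- prior probability of each class
    """
    total = 0.0
    for log_likelihoods, true_class in trials:
        total += -log_posterior(log_likelihoods, prior)[true_class] / math.log(2)
    return total / len(trials)


prior = [0.5, 0.5]
# A recognizer that says nothing: equal likelihoods for both classes.
uninformative = [([0.0, 0.0], 0), ([0.0, 0.0], 1)]
# A recognizer that favours the true class by 5 nats.
informative = [([0.0, -5.0], 0), ([-5.0, 0.0], 1)]

cost_uninformative = log_score(uninformative, prior)
cost_informative = log_score(informative, prior)
```

Since the prior is an argument rather than being built into the recognizer, the same likelihood outputs can be scored under any prior, which is the kind of application independence the abstract describes.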
|
4 |
Speaker recognition by voice / Asmens atpažinimas pagal balsą
Kamarauskas, Juozas 15 June 2009
Questions of speaker recognition by voice are investigated in this dissertation. Speaker recognition systems, their evolution, problems of recognition, systems of features, and questions of speaker modeling and matching used in text-independent and text-dependent speaker recognition are also considered.
A text-independent speaker recognition system was developed in this work. The Gaussian mixture model approach was used for speaker modeling and pattern matching.
An automatic method for voice activity detection is proposed. The method is fast and does not require any additional actions from the user, such as indicating patterns of the speech signal and noise.
A system of features is proposed, consisting of parameters of the excitation source (glottal) and parameters of the vocal tract. The fundamental frequency was taken as the excitation source parameter, and four formants with three antiformants were taken as parameters of the vocal tract. To equalize the dispersions of the formants and antiformants, we propose to use them on the mel-frequency scale. The standard mel-frequency cepstral coefficients (MFCC), the baseline features in speech and speaker recognition, were also implemented in the recognition system for comparison. The speaker recognition experiments showed that the proposed system of features outperformed the standard mel-frequency cepstral coefficients. The equal error rate (EER) was 5.17% using the proposed... / The dissertation examines questions of speaker recognition by voice. It discusses speaker recognition systems, their evolution, recognition problems, the variety of feature systems, and the speaker modelling and feature matching methods used in text-independent and text-dependent speaker recognition.
A text-independent speaker recognition system was developed during the work. Gaussian mixture models were used to build the speaker models and to match features.
An automatic method for selecting (segmenting) voiced sounds is proposed. The method is fast and requires no additional actions from the user, such as indicating samples of the speech signal and noise.
A system of feature vectors is proposed, consisting of parameters of the excitation signal and of the vocal tract. The fundamental frequency of the excitation signal is used as the excitation parameter, and four formants and three antiformants are used as the vocal tract parameters. To equalize the dispersions of the lower and higher formants and antiformants, we propose computing them on the mel scale. For comparison of the results, the system also implements the standard features used in speech and speaker recognition, the mel-frequency cepstral coefficients (MFCC). The speaker recognition experiments showed that the proposed feature system gave better recognition results than the standard features (MFCC). The equal error rate obtained using the proposed feature...
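
The abstract reports its result as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how an EER can be approximated from verification scores follows; the threshold sweep, the function name and the toy scores are assumptions for illustration, not the procedure or data of the dissertation.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The returned value
    is the mean of the two error rates at the threshold where they are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: genuine speaker scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores: higher means "more likely the claimed speaker".
overlapping = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 3.5])
separated = equal_error_rate([1.0, 2.0], [-1.0, -2.0])
```

With few trials the two error rates may never be exactly equal, which is why the sketch reports the midpoint at the closest threshold instead of interpolating.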
|
5 |
Efficient speaker diarization and low-latency speaker spotting / Segmentation et regroupement efficaces en locuteurs et détection des locuteurs à faible latence
Patino Villar, José María 24 October 2019
Speaker diarization (SD) involves detecting the speakers in an audio stream and the intervals during which each speaker is active, that is, determining ‘who speaks when’. The first part of the work presented in this thesis uses a speaker modelling approach based on binary keys (BKs) as a solution to SD. BK modelling is efficient and works without external training data, since it uses test data only. The contributions presented include BK extraction based on multi-resolution spectral analysis, explicit detection of speaker changes using BKs, and SD fusion techniques that combine the advantages of BKs and of solutions based on deep learning. The SD task is closely related to that of speaker recognition or detection, which consists of comparing two speech segments and determining whether or not they were spoken by the same speaker. Although many practical applications require combining them, the two tasks are traditionally carried out independently of each other. The second part of this thesis deals with an application in which SD and speaker recognition solutions are brought together. The new task, called low-latency speaker detection, consists of rapidly detecting known speakers in multi-speaker audio streams. It requires rethinking online SD and the way in which the SD and detection sub-systems should best be combined. / Speaker diarization (SD) involves the detection of speakers within an audio stream and the intervals during which each speaker is active, i.e. the determination of ‘who spoke when’. The first part of the work presented in this thesis exploits an approach to speaker modelling involving binary keys (BKs) as a solution to SD.
BK modelling is efficient and operates without external training data, as it operates using test data alone. The presented contributions include the extraction of BKs based on multi-resolution spectral analysis, the explicit detection of speaker changes using BKs, as well as SD fusion techniques that combine the benefits of both BK and deep learning based solutions. The SD task is closely linked to that of speaker recognition or detection, which involves the comparison of two speech segments and the determination of whether or not they were uttered by the same speaker. Even if many practical applications require their combination, the two tasks are traditionally tackled independently from each other. The second part of this thesis considers an application where SD and speaker recognition solutions are brought together. The new task, coined low latency speaker spotting (LLSS), involves the rapid detection of known speakers within multi-speaker audio streams. It involves the re-thinking of online diarization and the manner by which diarization and detection sub-systems should best be combined.
|
6 |
Анализа мел-фреквенцијских кепстралних коефицијената као обележја коришћених при аутоматском препознавању говорника / Analiza mel-frekvencijskih kepstralnih koeficijenata kao obeležja korišćenih pri automatskom prepoznavanju govornika / Analysis of mel-frequency cepstral coefficients as features used for automatic speaker recognition
Jokić Ivan 24 October 2014
The work is devoted to the analysis of mel-frequency cepstral coefficients as speaker features used in automatic speaker recognition. The influence on speaker recognition accuracy of changing the shape of the auditory critical bands, and of modifying the energy within them, is examined. Some transformations for reducing the temporal variability of models of the same speakers are also examined. / The work is oriented towards the analysis of mel-frequency cepstral coefficients as speaker features used in automatic speaker recognition. The influence of the shape of auditory critical bands as well as the proposed energy modification inside them is tested. Also, some transformations for reducing the time variability of models of the same speakers are proposed.
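
To make the notion of critical-band shape concrete, the sketch below places band edges at equal spacing on the mel scale and defines the conventional triangular weighting for one band. The mel formula (2595 log10(1 + f/700)), the triangular shape, and all function names are assumptions chosen for illustration; the abstract does not state which shapes or formula the thesis examined.

```python
import math


def hz_to_mel(f_hz):
    """A commonly used mel-scale approximation (assumed, not from the thesis)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_low, f_high):
    """Return n_bands + 2 frequencies in Hz, equally spaced on the mel scale.

    Band i spans edges[i] .. edges[i + 2] and is centred on edges[i + 1].
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


def triangular_weight(f_hz, left, center, right):
    """Weight of frequency f_hz in a triangular band: 0 at the edges, 1 at the centre."""
    if left < f_hz <= center:
        return (f_hz - left) / (center - left)
    if center < f_hz < right:
        return (right - f_hz) / (right - center)
    return 0.0


edges = mel_band_edges(n_bands=3, f_low=0.0, f_high=4000.0)
peak = triangular_weight(edges[1], edges[0], edges[1], edges[2])
```

Changing the band shape then amounts to substituting a different function for `triangular_weight` while keeping the same edges, so the effect of the shape can be compared with the band placement held fixed.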
|