Search results for subject: "voice biometric"
1. Voice biometrics under mismatched noise conditions. Pillay, Surosh Govindasamy, January 2011
This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its various applications in such areas as telephone banking, remote access control and surveillance. One of the main challenges associated with the deployment of voice biometrics in practice is that of undesired variations in speech characteristics caused by environmental noise. Such variations can in turn lead to a mismatch between the corresponding test and reference material from the same speaker. This mismatch is found to adversely affect speaker recognition accuracy. To address the above problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under the aforementioned adverse operating conditions. Through experimental investigations, based on the two main classes of speaker recognition (i.e. verification and open-set identification), it is shown that the proposed approach can significantly improve accuracy under mismatched noise conditions. To further improve recognition accuracy in severe mismatch conditions, an enhancement of the above method is proposed. This enhancement, which adjusts the reference speaker models more closely to the noise condition in the test utterance, is shown to considerably increase accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden associated with the use of the enhanced approach with open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated. The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes.
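The abstract does not spell out the new T-Norm variant the thesis proposes; as background, the following is a minimal sketch of standard Test-Normalisation, assuming a generic score function (e.g. a log-likelihood ratio of the test utterance against a speaker model) and a cohort of impostor models, both of which are illustrative assumptions rather than details taken from the thesis.

```python
import numpy as np

def t_norm(raw_score, test_features, cohort_models, score_fn):
    """Standard T-Norm: normalise the raw score of a test utterance against the
    claimed speaker model using the statistics of the same utterance scored
    against a cohort of impostor models."""
    cohort_scores = np.array([score_fn(test_features, m) for m in cohort_models])
    mu, sigma = cohort_scores.mean(), cohort_scores.std()
    return (raw_score - mu) / (sigma + 1e-12)  # epsilon guards against a zero-variance cohort

# Usage (hypothetical score_fn, models and threshold):
# normalised = t_norm(raw, test_feats, cohort_models, score_fn)
# accept = normalised > threshold  # threshold tuned on development data
```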
2. Použitelnost Deepfakes v oblasti kybernetické bezpečnosti / Applicability of Deepfakes in the Field of Cyber Security. Firc, Anton, January 2021
Deepfake technology has been on the rise in recent years. Many techniques and tools for creating deepfake media are emerging, and they are beginning to be used for both illicit and beneficial purposes. Illicit use drives research into techniques for detecting deepfake media and their continual improvement, as well as the need to educate the general public about the pitfalls this technology brings. One of the little-explored areas of malicious use is the use of deepfakes to deceive voice authentication systems. Opinions on the feasibility of such attacks differ, but there is little scientific evidence either way. The aim of this thesis is to examine how prepared current voice biometrics systems are to face deepfake recordings. The experiments carried out show that voice biometrics systems are vulnerable to deepfake recordings. Even though almost all publicly available tools and models are intended for synthesising English, this thesis shows that synthesising speech in any language is not particularly difficult. Finally, a measure is proposed to reduce the risk that deepfake recordings pose to voice biometrics systems: the use of text-dependent voice verification, which is shown to be more resistant to deepfake recordings.
3. Automatic Person Verification Using Speech and Face Information. Sanderson, Conrad (conradsand@ieee.org), January 2003
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM), which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one assigned to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone & internet based banking, airline reservations & check-in, as well as forensic work and law enforcement applications. Biometric systems based on face images and/or speech signals have been shown to be quite effective. However, their performance easily degrades in the presence of a mismatch between training and testing conditions. For speech based systems this is usually in the form of channel distortion and/or ambient noise; for face based systems it can be in the form of a change in the illumination direction. A system which uses more than one biometric at the same time is known as a multi-modal verification system; it typically comprises several modality experts and a decision stage. Since a multi-modal system uses complementary discriminative information, lower error rates can be achieved; moreover, such a system can also be more robust, since the contribution of the modality affected by environmental conditions can be decreased. This thesis makes several contributions aimed at increasing the robustness of single- and multi-modal verification systems. Some of the major contributions are listed below. The robustness of a speech based system to ambient noise is increased by using Maximum Auto-Correlation Value (MACV) features, which utilize information from the source part of the speech signal. A new facial feature extraction technique is proposed (termed DCT-mod2), which utilizes polynomial coefficients derived from 2D Discrete Cosine Transform (DCT) coefficients of spatially neighbouring blocks. The DCT-mod2 features are shown to be robust to an illumination direction change as well as being over 80 times quicker to compute than 2D Gabor wavelet derived features. The fragility of Principal Component Analysis (PCA) derived features to an illumination direction change is solved by introducing a pre-processing step utilizing the DCT-mod2 feature extraction. We show that the enhanced PCA technique retains all the positive aspects of traditional PCA (that is, robustness to compression artefacts and white Gaussian noise) while also being robust to the illumination direction change. Several new methods, for use in fusion of speech and face information under noisy conditions, are proposed; these include a weight adjustment procedure, which explicitly measures the quality of the speech signal, and a decision stage composed of a structurally noise resistant piece-wise linear classifier, which attempts to minimize the effects of noisy conditions via structural constraints on the decision boundary.
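The thesis defines the exact DCT-mod2 formulation; the sketch below only illustrates the general idea of block-based 2D DCT features augmented with delta terms computed from spatially neighbouring blocks. The block size, number of retained coefficients, coefficient ordering and delta formulation here are illustrative assumptions, not the recipe from the thesis.

```python
import numpy as np
from scipy.fftpack import dct

def block_dct_features(image, block=8, n_coeffs=15):
    """Low-order 2D DCT coefficients for each non-overlapping block of a
    grayscale image (coefficients taken in row-major order for simplicity;
    zigzag ordering is more common in practice)."""
    rows, cols = image.shape[0] // block, image.shape[1] // block
    feats = np.zeros((rows, cols, n_coeffs))
    for r in range(rows):
        for c in range(cols):
            patch = image[r*block:(r+1)*block, c*block:(c+1)*block].astype(float)
            coeffs = dct(dct(patch.T, norm='ortho').T, norm='ortho')  # separable 2D DCT
            feats[r, c] = coeffs.flatten()[:n_coeffs]
    return feats

def add_neighbour_deltas(feats, n_delta=3):
    """Replace the first n_delta coefficients of each interior block with their
    horizontal and vertical deltas computed from the neighbouring blocks
    (an approximation of the DCT-mod2 idea, not the thesis's exact recipe)."""
    rows, cols, _ = feats.shape
    out = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dh = (feats[r, c + 1, :n_delta] - feats[r, c - 1, :n_delta]) / 2.0
            dv = (feats[r + 1, c, :n_delta] - feats[r - 1, c, :n_delta]) / 2.0
            out.append(np.concatenate([dh, dv, feats[r, c, n_delta:]]))
    return np.array(out)  # one feature vector per interior block
```

In a full system the per-block vectors would typically be modelled statistically (e.g. with a Gaussian mixture model per client); that modelling stage is omitted here.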
4. Formy zadávání a zpracování textových dat a informací v podnikových IS - trendy a aktuální praxe / Forms of text data input and processing in business information systems - trends and current practices. Válková, Jana, January 2011
This thesis introduces readers to the basic forms of entering text data and information into a computer and processing them. It also covers the historical context, current trends and future outlook of computer data input technologies and their use in practice. The first part of the thesis summarises the particular forms of entering and processing text data and information. The following part presents technological trends on the market, focusing on automatic speech recognition systems and the possibilities of their application in the business sphere. The rest of the thesis consists of a survey among Czech IT companies; based on its results, a recommendation is made as to which technologies should be used as part of business information systems.
5. Efficient speaker diarization and low-latency speaker spotting / Segmentation et regroupement efficaces en locuteurs et détection des locuteurs à faible latence. Patino Villar, José María, 24 October 2019
Speaker diarization (SD) involves the detection of speakers within an audio stream and the intervals during which each speaker is active, i.e. the determination of ‘who spoke when’. The first part of the work presented in this thesis exploits an approach to speaker modelling involving binary keys (BKs) as a solution to SD. BK modelling is efficient and operates without external training data, as it uses the test data alone. The presented contributions include the extraction of BKs based on multi-resolution spectral analysis, the explicit detection of speaker changes using BKs, as well as SD fusion techniques that combine the benefits of both BK and deep learning based solutions. The SD task is closely linked to that of speaker recognition or detection, which involves the comparison of two speech segments and the determination of whether or not they were uttered by the same speaker. Even though many practical applications require their combination, the two tasks are traditionally tackled independently from each other. The second part of this thesis considers an application where SD and speaker recognition solutions are brought together. The new task, coined low-latency speaker spotting (LLSS), involves the rapid detection of known speakers within multi-speaker audio streams. It involves the re-thinking of online diarization and the manner by which diarization and detection sub-systems should best be combined.
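The thesis couples binary-key diarization with speaker detection; as a rough illustration of the low-latency speaker spotting task itself, the sketch below processes a stream of short segments in arrival order and reports the first segment at which an enrolled speaker is detected. The embed() extractor and cosine scoring are illustrative assumptions and not the methods developed in the thesis.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def spot_speaker(segments, target_embedding, embed, threshold=0.7):
    """Naive low-latency speaker spotting: score each incoming segment against
    the enrolled target speaker and raise a detection as soon as the score
    exceeds the threshold. Returns the index of the detected segment, or None."""
    for i, seg in enumerate(segments):
        if cosine(embed(seg), target_embedding) > threshold:
            return i  # detection latency grows with the number of segments consumed
    return None
```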