1

Controlo remoto de presenças recorrendo à tecnologia de speaker verification / Remote attendance control using speaker verification technology

Moura, Paulo André Alves January 2010 (has links)
Internship carried out at PT Inovação and supervised by Eng. Sérgio Ramalho / Integrated master's thesis. Electrical and Computer Engineering (Telecommunications major). Faculdade de Engenharia, Universidade do Porto, 2010
2

Towards the use of sub-band processing in automatic speaker recognition

Finan, Robert Andrew January 1998 (has links)
No description available.
3

Speaker Verification Systems Under Various Noise and SNR Conditions

Wan, Qianhui January 2017 (has links)
In speaker verification, mismatches between the training speech and the testing speech can greatly affect the robustness of classification algorithms, and these mismatches are mainly caused by changes in noise type and signal-to-noise ratio. This thesis aims at finding the most robust classification methods under multi-noise and multiple signal-to-noise-ratio conditions. Several well-known state-of-the-art classification algorithms and features for speaker verification are compared by examining the performance of a small-set speaker verification system (e.g. a voice lock for a family). The effect of the testing speech length is also examined. The i-vector/Probabilistic Linear Discriminant Analysis method with compensation strategies is shown to provide stable performance for both previously seen and previously unseen noise scenarios, and a C++ implementation with online processing and multi-threading is developed for this approach.
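As a hedged illustration of the scoring side of such a small-set system, the sketch below (in Python rather than the thesis's C++) verifies a claimed speaker by length-normalising i-vectors and comparing a cosine score against a threshold. The i-vector extraction itself (UBM and total-variability training) is assumed to happen elsewhere, cosine scoring stands in for the full PLDA back-end, and the threshold is an illustrative placeholder.

```python
import numpy as np

def length_normalize(v):
    """Project an i-vector onto the unit sphere (a common compensation step)."""
    return v / np.linalg.norm(v)

def verify(enroll_ivectors, test_ivector, threshold=0.5):
    """Accept the claimed identity if the cosine score between the test i-vector
    and the speaker's enrollment model exceeds a placeholder threshold."""
    model = length_normalize(
        np.mean([length_normalize(v) for v in enroll_ivectors], axis=0))
    score = float(np.dot(model, length_normalize(test_ivector)))
    return score >= threshold, score
```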
4

A Nonlinear Mixture Autoregressive Model For Speaker Verification

Srinivasan, Sundararajan 30 April 2011 (has links)
In this work, we apply a nonlinear mixture autoregressive (MixAR) model to supplant the Gaussian mixture model (GMM) for speaker verification. MixAR is a statistical model that is a probabilistically weighted combination of components, each of which is an autoregressive filter in addition to a mean. The probabilistic mixing and the data-dependent weights are responsible for the nonlinear nature of the model. Our experiments with synthetic as well as real speech data from standard speech corpora show that the MixAR model outperforms the GMM, especially under unseen noisy conditions. Moreover, MixAR did not require delta features and used 2.5x fewer parameters to achieve performance comparable to or better than that of a GMM using static as well as delta features. MixAR also suffered less from overfitting than the GMM when training data was sparse. However, MixAR performance deteriorated more quickly than that of the GMM when the evaluation data duration was reduced, which could place limits on the minimum amount of evaluation data required when using the MixAR model for speaker verification.
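To make the model concrete, here is a minimal sketch of the MixAR one-step likelihood described above: each component predicts the next sample as a mean plus an autoregressive filter applied to the recent history, and the mixture weights depend on that history. The gating function, parameter values, and the EM-style training are not given in the abstract, so `weights_fn`, `softmax_gate`, and the array shapes below are illustrative assumptions.

```python
import numpy as np

def mixar_log_likelihood(x, weights_fn, ar_coeffs, means, variances):
    """Log-likelihood of a 1-D sequence x under a MixAR model.
    Component i predicts x[t] as means[i] + ar_coeffs[i] . x[t-p:t];
    the mixing weights are data-dependent via weights_fn(history)."""
    order = ar_coeffs.shape[1]                      # AR order p
    log_lik = 0.0
    for t in range(order, len(x)):
        history = x[t - order:t]
        w = weights_fn(history)                     # data-dependent weights, sum to 1
        preds = means + ar_coeffs @ history         # per-component linear predictions
        dens = np.exp(-0.5 * (x[t] - preds) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
        log_lik += np.log(np.dot(w, dens) + 1e-12)
    return log_lik

def softmax_gate(gate_weights, gate_bias):
    """One assumed form of gating: softmax over a linear score of the history."""
    def weights_fn(history):
        z = gate_weights @ history + gate_bias
        e = np.exp(z - z.max())
        return e / e.sum()
    return weights_fn
```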
5

Text-Dependent Speaker Verification Implemented in Matlab Using MFCC and DTW

Tolunay, Atahan January 2010 (has links)
Even though speaker verification is a broad subject, commercial and personal-use implementations are rare, and several problems need to be solved before speaker verification can become more useful. The number of pattern matching and feature extraction techniques is large, and the choice of which ones to use is debatable. One of the main problems of speaker verification in general is the impact of noise: the very popular feature extraction technique MFCC is inherently sensitive to mismatch between training and verification conditions. MFCC is used in many speech recognition applications and is not only useful in text-dependent speaker verification. However, the most reliable verification techniques are text-dependent. One of the most popular pattern matching techniques in text-dependent speaker verification is DTW; although it has limitations outside text-dependent applications, it is a reliable way of matching templates even with a limited amount of training material. The signal processing techniques, MFCC and DTW, are explained and discussed in detail along with a Matlab program in which these techniques have been implemented. The choices made in signal processing, feature extraction and pattern matching are motivated by discussions of available studies on these topics. The results indicate that it is possible to program text-dependent speaker verification systems that are functional in clean conditions with tools like Matlab.
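A minimal sketch of the MFCC-plus-DTW pipeline the abstract describes, written in Python with librosa rather than the author's Matlab code; the 16 kHz sample rate, 13 coefficients, and the decision threshold are illustrative assumptions, not values from the thesis.

```python
import numpy as np
import librosa

def mfcc_features(path, n_mfcc=13):
    """MFCC feature matrix (frames x coefficients) for one utterance."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two MFCC sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)      # normalise by path length

def verify(template_path, test_path, threshold=60.0):
    """Accept if the test utterance warps closely enough onto the enrolled
    template; the threshold here is a placeholder, not a tuned value."""
    return dtw_distance(mfcc_features(template_path), mfcc_features(test_path)) < threshold
```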
6

Robustní rozpoznávání mluvčího pomocí neuronových sítí / Robust Speaker Verification with Deep Neural Networks

Profant, Ján January 2019 (has links)
The objective of this work is to study state-of-the-art deep neural network based speaker verification systems, called x-vectors, under various conditions such as wideband and narrowband data, and to develop a system that is robust to unseen languages, specific noise, or speech codecs. The system takes a variable-length audio recording and maps it into a fixed-length embedding that is afterwards used to represent the speaker. We compared our systems to BUT's submission to the Speakers in the Wild Speaker Recognition Challenge (SITW) from 2016, which used the previously popular statistical models, i-vectors. We observed that, when comparing single best systems, the recently published x-vectors obtained a more than 4.38 times lower Equal Error Rate on the SITW core-core condition than BUT's SITW submission. Moreover, we find that diarization substantially reduces the error rate when there are multiple speakers in the SITW core-multi condition, but we could not see the same trend on NIST SRE 2018 VAST data.
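The key architectural idea mentioned above, mapping a variable-length recording to a fixed-length embedding, rests on a statistics-pooling layer; the short sketch below shows that pooling step in isolation, assuming the frame-level features have already been produced by the network (the TDNN layers and the PLDA scoring back-end are omitted).

```python
import numpy as np

def statistics_pooling(frame_features):
    """Collapse a variable-length sequence of frame-level outputs (T x D)
    into one fixed-length vector by concatenating the per-dimension mean
    and standard deviation, as in x-vector systems."""
    mean = frame_features.mean(axis=0)
    std = frame_features.std(axis=0)
    return np.concatenate([mean, std])    # length 2*D regardless of T
```

In a full x-vector network, this pooled vector is passed through further affine layers, one of which is read out as the speaker embedding.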
7

Rozpoznávání mluvčího ve Skype hovorech / Speaker Recognition in Skype Calls

Kaňok, Tomáš January 2011 (has links)
This diploma thesis is concerned with machine identification and verification of a speaker, its theory and applications. It evaluates an existing implementation of the subject by the Speech@FIT group and reviews plugins for the Skype program. A functioning plugin that makes identification of the speaker possible is then proposed, implemented and evaluated. Towards the end of the thesis, suggestions for future development are presented.
8

Defending against Adversarial Attacks in Speaker Verification Systems

Li-Chi Chang (11178210) 26 July 2021 (has links)
With the advance of Internet of Things technologies, smart devices and virtual personal assistants at home, such as Google Assistant, Apple Siri, and Amazon Alexa, have been widely used to control and access objects like door locks, bulbs, air conditioners, and even bank accounts, which makes our lives convenient. Because of its ease of operation, voice control has become a main interface between users and these smart devices. To make voice control more secure, speaker verification systems have been researched, applying the human voice as a biometric to accurately identify a legitimate user and prevent illegal access. Recent studies have shown, however, that speaker verification systems are vulnerable to different security attacks such as replay, voice cloning, and adversarial attacks. Among all attacks, adversarial attacks are the most dangerous and very challenging to defend against. Currently, there is no known method that can effectively defend against such an attack in speaker verification systems.

The goal of this project is to design and implement a defense system that is simple, lightweight, and effective against adversarial attacks on speaker verification. To achieve this goal, we study the audio samples from adversarial attacks in both the time domain and the Mel spectrogram, and find that the generated adversarial audio is simply a clean illegal audio with small perturbations that are similar to white noise, but well designed to fool speaker verification. Our intuition is that if these perturbations can be removed or modified, adversarial attacks can potentially lose their attacking ability. Therefore, we propose to add a plug-in function module that preprocesses the input audio before it is fed into the verification system. As a first attempt, we study two opposite plug-in functions: denoising, which attempts to remove or reduce the perturbations, and noise-adding, which adds small Gaussian noise to the input audio. We show through experiments that both methods can significantly degrade the performance of a state-of-the-art adversarial attack. Specifically, denoising and noise-adding reduce the targeted attack success rate from 100% to only 56% and 5.2%, respectively. Moreover, noise-adding can slow down the attack 25 times in speed and has only a minor effect on the normal operation of a speaker verification system. Therefore, we believe that noise-adding can be applied to any speaker verification system against adversarial attacks. To the best of our knowledge, this is the first attempt at applying the noise-adding method to defend against adversarial attacks in speaker verification systems.
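A minimal sketch of the noise-adding plug-in function described above: Gaussian noise is mixed into the waveform before it reaches the verification model. The SNR level and the use of NumPy are illustrative assumptions; the thesis's exact noise parameters are not given in the abstract.

```python
import numpy as np

def noise_adding_defense(waveform, snr_db=30.0, rng=None):
    """Preprocess the input audio by adding small Gaussian noise, aiming to
    disturb adversarial perturbations while barely affecting normal operation.
    The SNR value is an illustrative placeholder, not the thesis setting."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise
```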
9

How do voiceprints age?

Nachesa, Maya Konstantinovna January 2023 (has links)
Voiceprints, like fingerprints, are a biometric. Where fingerprints record the unique pattern on a person's finger, voiceprints record what a person's voice "sounds like", abstracting away from what the person said. They have been used in speaker recognition, including verification and identification; in other words, they have been used to ask "is this speaker who they say they are?" or "who is this speaker?", respectively. However, people age, and so do their voices. Do voiceprints age, too? That is, can a person's voice change enough that after a while the original voiceprint can no longer be used to identify them? In this thesis, I use Swedish audio recordings of debate speeches from the Riksdag (the Swedish parliament) to test this idea. The answer influences how well we can search the database for previously unmarked speeches. I find that speaker verification performance decreases as the age gap between voiceprints increases, and that it decreases more strongly after roughly five years. Additionally, I grouped the speakers into age groups spanning five years and found that speaker verification performs best for speakers whose initial voiceprint was recorded at 29-33 years of age. Longer input speech also provides higher-quality voiceprints, with performance improvements stagnating once voiceprints become longer than 30 seconds. Finally, voiceprints for men age more strongly than those for women after roughly five years. I also investigated how emotions are encoded in voiceprints, since this could potentially impede speaker recognition. I found that it is possible to train a classifier to recognise emotions from voiceprints, and that this classifier does better when recognising emotions from known speakers. That is, emotions are encoded more characteristically per person than per emotion itself. As such, they are unlikely to interfere with speaker recognition.
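The abstract does not name the performance metric, but verification performance across age gaps is conventionally reported as an equal error rate; the sketch below shows one common way to approximate the EER from target and non-target trial scores, offered only as an assumption about how such per-age-gap numbers could be computed.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER: the operating point where the false-acceptance
    and false-rejection rates cross."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    far = np.array([np.mean(nontarget_scores >= t) for t in thresholds])
    frr = np.array([np.mean(target_scores < t) for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0
```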
10

Personalized Voice Activated Grasping System for a Robotic Exoskeleton Glove

Guo, Yunfei 05 January 2021 (has links)
Controlling an exoskeleton glove with a highly efficient human-machine interface (HMI) while accurately applying force to each joint remains a hot topic. This thesis proposes a fast, secure, accurate, and portable solution to control an exoskeleton glove. This state-of-the-art solution includes both hardware and software components. The exoskeleton glove uses a modified serial elastic actuator (SEA) to achieve accurate force sensing. A portable electronic system is designed around the SEA to allow force measurement, force application, slip detection, cloud computing, and a power supply that provides over 2 hours of continuous usage. A voice-control-based HMI, referred to as the integrated trigger-word configurable voice activation and speaker verification system (CVASV), is integrated into the robotic exoskeleton glove to perform high-level control. The CVASV HMI is designed for embedded systems with limited computing power to perform voice activation and voice verification simultaneously. The system uses MobileNet as the feature extractor to reduce computational cost, and the HMI is tuned for better performance when grasping daily objects. This study focuses on applying the CVASV HMI to the SEA-based exoskeleton glove to perform a stable grasp with force control and slip detection. This research found that using MobileNet as the speaker verification neural network can increase processing speed while maintaining similar verification accuracy. / Master of Science / The robotic exoskeleton glove used in this research is designed to help patients with hand disabilities. This thesis proposes a voice-activated grasping system to control the exoskeleton glove. The user can define a keyword to activate the exoskeleton and then control it by voice. The voice command system can distinguish between different users' voices, thereby improving the safety of the glove control. A smartphone is used to process the voice commands and send them to an onboard computer on the exoskeleton glove. The exoskeleton glove then accurately applies force to each fingertip using a force-feedback actuator. This study focused on designing a state-of-the-art human-machine interface to control an exoskeleton glove and perform an accurate and stable grasp.
