  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Rozpoznávání mluvčího ve Skype hovorech / Speaker Recognition in Skype Calls

Kaňok, Tomáš January 2011 (has links)
This diploma thesis is concerned with machine identification and verification of speakers, its theory and applications. It evaluates an existing implementation of the subject by the Speech@FIT group and surveys plugins for the Skype program. A functioning plugin that makes speaker identification possible is then proposed, implemented, and evaluated. The thesis concludes with suggestions for future development.
12

Detecting Speakers in Video Footage

Williams, Michael 01 April 2018 (has links) (PDF)
Facial recognition is a powerful tool for identifying people visually, yet problems can arise when the end goal is more specific than merely identifying the person in a picture. Speaker identification is one such task: it expects more predictive power from a facial recognition system than the system can provide on its own. Speaker identification is the task of identifying who is speaking in a video, not simply who is present in it. This extra requirement introduces numerous false positives into the facial recognition system, largely due to one main scenario: the person speaking is not on camera. This paper investigates a solution to this problem by incorporating information from a new system that indicates whether or not the person on camera is speaking. This information can then be combined with an existing facial recognition system to boost its predictive capabilities. We propose a speaker detection system that visually detects when someone in a given video is speaking. The system relies strictly on visual information and does not depend on audio. Because it uses only visual information, it can be synced with an existing facial recognition system and extend that system's predictive power. We use a two-stream convolutional neural network to accomplish the speaker detection. The network is trained and tested on data extracted from Digital Democracy's large database of transcribed political hearings [4]. We show that the system detects when someone on camera is speaking with an accuracy of 87% on a dataset of legislators. Furthermore, we demonstrate how this information can benefit a facial recognition system whose end goal is identifying the speaker: it increased the precision of an existing facial recognition system by up to 5%, at the cost of a large drop in recall.
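The fusion step the abstract describes — gating face-recognition hits by a visual speaking score — can be sketched in a few lines. This is an illustrative reconstruction, not the thesis code; the function name, thresholds, and score combination are all assumptions:

```python
def identify_speaker(face_candidates, speaking_probs,
                     face_threshold=0.5, speaking_threshold=0.5):
    """Combine face-recognition confidences with visual speaking scores.

    face_candidates: dict mapping person id -> face-recognition confidence
    speaking_probs:  dict mapping person id -> probability that this face
                     is currently speaking (from the two-stream network)
    Returns the best speaker id, or None when nobody on camera is speaking.
    """
    best_id, best_score = None, 0.0
    for person, face_conf in face_candidates.items():
        speak_p = speaking_probs.get(person, 0.0)
        # Reject faces that are visible but not speaking: this is the
        # main source of false positives the abstract targets.
        if face_conf < face_threshold or speak_p < speaking_threshold:
            continue
        score = face_conf * speak_p
        if score > best_score:
            best_id, best_score = person, score
    return best_id
```

Returning `None` when no visible face passes the speaking gate is what trades recall for precision, matching the result reported above.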
13

SPEAKER AND GENDER IDENTIFICATION USING BIOACOUSTIC DATA SETS

Jose, Neenu 01 January 2018 (has links)
Acoustic analysis of animal vocalizations has been widely used to identify the presence of individual species, classify vocalizations, identify individuals, and determine gender. In this work, automatic identification of the speaker and gender of mice from ultrasonic vocalizations, and speaker identification of meerkats from their close calls, is investigated. Feature extraction was implemented using Greenwood Function Cepstral Coefficients (GFCC), designed specifically for extracting features from animal vocalizations. Mice ultrasonic vocalizations were analyzed using Gaussian Mixture Models (GMM), which yielded an accuracy of 78.3% for speaker identification and 93.2% for gender identification. Meerkat speaker identification with close calls was implemented using Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM), with accuracies of 90.8% and 94.4% respectively. The results show that these methods detect the presence of gender and identity information in vocalizations and support the possibility of robust gender and individual identification using bioacoustic data sets.
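The GMM classification scheme used here is standard and easy to sketch: fit one mixture model per class, then score an utterance under each model and take the maximum likelihood. A minimal illustration with scikit-learn's `GaussianMixture` (GFCC extraction is omitted; random vectors stand in for real frame-level features, and the function names are illustrative, not from the thesis):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_components=4, seed=0):
    # One GMM per speaker, fitted on that speaker's feature frames
    # (GFCC vectors in the thesis; any frame-level features work here).
    models = {}
    for speaker, frames in features_by_speaker.items():
        gmm = GaussianMixture(n_components=n_components, random_state=seed)
        models[speaker] = gmm.fit(frames)
    return models

def identify(models, frames):
    # Score the utterance under each speaker model and pick the
    # maximum average log-likelihood.
    return max(models, key=lambda s: models[s].score(frames))
```

The same per-class-model pattern covers both tasks in the abstract: speaker identification (one GMM per individual) and gender identification (one GMM per gender).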
14

Investigation Of The Significance Of Periodicity Information In Speaker Identification

Gursoy, Secil 01 April 2008 (has links) (PDF)
In this thesis, general feature selection methods, and especially the use of periodicity and aperiodicity information in the speaker verification task, are investigated. A software system is constructed to obtain periodicity and aperiodicity information from speech. This information is obtained using a 16-channel filterbank, analyzing channel outputs frame by frame according to the pitch of each frame. The pitch value of a frame is also found using periodicity algorithms. A Parzen window (kernel density estimation) is used to represent each person's selected phoneme. The constructed method is tested on different phonemes in order to assess its usability across phonemes. Periodicity features are also used together with MFCC features to determine their contribution to the speaker identification problem.
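A Parzen window model of the kind used here to represent a phoneme can be sketched directly in NumPy. This is a generic 1-D Gaussian-kernel density estimator, not the thesis implementation; the bandwidth `h` is a free parameter:

```python
import numpy as np

def parzen_density(train, h):
    """Return a Parzen-window (kernel density) estimate built from 1-D
    training samples, using Gaussian kernels of bandwidth h."""
    train = np.asarray(train, dtype=float)

    def density(x):
        x = np.asarray(x, dtype=float)
        # Average a Gaussian bump centred on every training sample.
        diffs = (x[..., None] - train) / h
        kernels = np.exp(-0.5 * diffs**2) / (h * np.sqrt(2 * np.pi))
        return kernels.mean(axis=-1)

    return density
```

Classification then amounts to evaluating a test feature under each person's density model and picking the largest value.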
15

Identifikace mluvčího v temporální doméně řeči / Speaker identification in the temporal domain of speech

Weingartová, Lenka January 2015 (has links)
This thesis aims to thoroughly describe the temporal characteristics of spoken Czech by means of phone durations and their changes under the influence of several prosodic and segmental factors, such as position in a higher unit (syllable, word or prosodic phrase), length of the higher unit, segmental environment, structure of the syllable, or phrase-final lengthening. The speech material comes from a semi-spontaneous corpus of scripted dialogues comprising 4046 utterances by 34 speakers. The descriptions are then used to create a rule-based temporal model, which provides a baseline for analysing local articulation rate contours and their speaker-specificity. The results indicate that systematic speaker-specific differences can be found in the segmental domain as well as in the temporal contours. Moreover, the speaker identification potential of articulation rate and global temporal features is also assessed. Keywords: temporal characteristics, temporal modelling, phone duration, speaker identification, Czech
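The local articulation rate contours mentioned above can be approximated from phone durations with a simple sliding window. This is a hedged sketch: the window size and the phones-per-second definition are illustrative assumptions, not the thesis's exact procedure:

```python
def local_articulation_rate(phone_durations, window=5):
    """Sliding-window articulation rate (phones per second) over a list
    of phone durations in seconds; `window` is the number of phones per
    span. Pauses are assumed to have been removed beforehand."""
    rates = []
    for i in range(len(phone_durations) - window + 1):
        span = phone_durations[i:i + window]
        rates.append(window / sum(span))
    return rates
```

The resulting contour can then be compared against a duration model's predictions to expose speaker-specific deviations.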
16

Rozpoznávání mluvčího na mobilním telefonu / Speaker Recognition on Mobile Phone

Pešán, Jan January 2011 (has links)
This thesis focuses on implementing a computer speaker recognition system in a mobile phone environment. It describes the principle, operation, and implementation of the recognizer on a Nokia N900 mobile phone.
17

Asmens identifikavimas pagal pirštų atspaudus ir balsą / Person Identification by fingerprints and voice

Kisel, Andrej 30 December 2010 (has links)
The five parts of this dissertation address problems of person identification by fingerprints and voice and propose solutions to them. The first part concerns evaluating the performance of fingerprint feature extraction algorithms using synthesized fingerprints; modifications to a known fingerprint synthesis algorithm are proposed that speed up the synthesis process and allow generating a fingerprint image with predetermined characteristics and features. The second part discusses fingerprint feature matching and proposes a new matching algorithm for deformed fingerprints, which uses local structures and does not perform fingerprint alignment; its performance is evaluated on publicly available and internal databases. The third part proposes a new speaker identification method based on group delay features of the linear prediction model, together with a similarity metric for those features; it is demonstrated that an automatic speaker identification system using the proposed features and metric outperforms traditional speaker identification systems. Finally, the independence of fingerprint and voice recordings is demonstrated, and multibiometric identification by fingerprints and voice together is proposed to address common problems of biometric systems.
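Group delay features of a linear prediction model, as used in the third part, can be sketched with SciPy. This is a generic illustration rather than the dissertation's feature pipeline; the frequency grid and feature dimension are arbitrary choices:

```python
import numpy as np
from scipy.signal import group_delay

def lp_group_delay_features(lpc_coeffs, n_points=64):
    """Group delay (in samples) of the all-pole LP synthesis filter
    1/A(z), sampled at n_points frequencies in (0, pi); the resulting
    vector can serve as a frame-level feature."""
    # Avoid the endpoints, where group delay can be ill-conditioned.
    w = np.linspace(0.01, np.pi - 0.01, n_points)
    _, gd = group_delay((np.ones(1), np.asarray(lpc_coeffs)), w=w)
    return gd
```

Frames are then compared with a similarity metric over these vectors; the metric itself is a contribution of the dissertation and is not reproduced here.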
18

Využití dlouhodobé formantové distribuce pro rozpoznatelnost mluvčího v různých akustických podmínkách / Using long-term formant distributions for speaker identification in various acoustic conditions

Lazárková, Dita January 2015 (has links)
The analysis of long-term formant distributions is a relatively young but promising discipline of speaker identification. It is a method of mapping the long-term behavior of formants in the speech of individual speakers. Problems frequently encountered in practice are poor acoustic quality and the very short duration of the analyzed recordings. This work aims to present the historical development of forensic phonetics and currently used methods. In the practical part, it examines the usability of the LTF method in forensic practice, especially for recordings containing background noise. It was shown that noise appreciably affects the extracted LTF values and, unfortunately, the change is not systematic. We therefore proposed several methods to compensate for the noise in recordings, in order to be able to compare recordings with and without noise. We also investigated the minimum recording duration necessary for statistical reliability of the resulting values. This boundary is not exact and is substantially individual for each speaker, but it is apparent that recordings (vocalic streams) shorter than 15 s often provide incomplete information and therefore cannot be recommended for analysis. Keywords: LTF, long-time formant distribution, speaker identification, forensic phonetics, acoustic quality of...
19

Analyse acoustique de la voix émotionnelle de locuteurs lors d’une interaction humain-robot / Acoustic analysis of speakers emotional voices during a human-robot interaction

Tahon, Marie 15 November 2012 (has links)
This thesis deals with emotional voice in a human-robot interaction context. In a realistic interaction, we define at least four main kinds of variability: the environment (room, microphone); the speaker, including physical characteristics (gender, age, voice type) and personality; the speaker's emotional states; and finally the kind of interaction (game scenario, emergency, everyday life). From audio signals collected in different conditions, we sought, using acoustic features, to jointly characterize a speaker and their emotional state while taking these variabilities into account. Determining which features are essential and which should be avoided is a complex challenge, because it requires working with a large number of variabilities and thus having rich and diverse corpora at one's disposal. The main results concern both the collection and annotation of realistic emotional corpora with varied speakers (children, adults, elderly people) in several environments, and the robustness of acoustic features across the four variabilities. Two interesting results follow from this acoustic analysis: the audio characterization of a corpus and the establishment of a "black list" of highly variable features. Emotions are only one part of the paralinguistic cues carried by the audio signal; personality and stress in the voice have also been studied. We also implemented an automatic emotion recognition and speaker characterization module that was tested during realistic human-robot interactions. An ethical reflection on this work was also carried out.
20

Improved GMM-Based Classification Of Music Instrument Sounds

Krishna, A G 05 1900 (has links)
This thesis concerns the recognition of music instruments from isolated notes. Music instrument recognition is a relatively nascent problem fast gaining importance, not only because of its academic value but also for its potential in applications like music content analysis and music transcription. Line spectral frequencies are proposed as features for music instrument recognition and are shown to perform better than Mel-filtered cepstral coefficients and linear prediction cepstral coefficients. Assuming a linear model of sound production, features based on the prediction residual, which represents the excitation signal, are proposed. Four improvements are proposed for classification using Gaussian mixture model (GMM) based classifiers, one of which involves characterizing the regions of overlap between classes in the feature space. Applications to music instrument recognition and speaker recognition are shown. An experiment is proposed for discovering the hierarchy of music instruments in a data-driven manner. The hierarchy thus discovered closely corresponds to the hierarchy defined by musicians and experts, showing that the feature space has successfully captured the features required for music instrument characterization.
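Line spectral frequencies, the features proposed here, can be computed from LPC coefficients via the classical sum/difference-polynomial construction: the roots of P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1) lie on the unit circle, and their angles, interleaved and sorted, are the LSFs. A NumPy sketch (illustrative only; production code would use a dedicated routine with better numerical behavior):

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral
    frequencies (radians in (0, pi)) via the roots of the sum and
    difference polynomials P(z) and Q(z)."""
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]   # P(z): always has a root at z = -1
    q_poly = a_ext - a_ext[::-1]   # Q(z): always has a root at z = +1
    angles = []
    for poly in (p_poly, q_poly):
        ang = np.angle(np.roots(poly))
        # Keep one member of each conjugate pair; drop the trivial
        # roots at z = +1 and z = -1 (angles 0 and pi).
        angles.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
    return np.sort(np.array(angles))
```

For a stable p-th order predictor this yields p monotonically increasing frequencies, a property that makes LSFs well behaved as classifier inputs.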
