  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Estimation of glottal source features from the spectral envelope of the acoustic speech signal

Torres, Juan Félix 17 May 2010
Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, since their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors.
The development of the proposed method entailed a study into the aspects of glottal source information that are already contained within the spectral features commonly used in speech analysis, yielding an objective assessment regarding the expected advantages of explicitly using glottal information extracted from the speech signal via currently available IF methods, versus the alternative of relying on the glottal source information that is implicitly contained in spectral envelope representations.
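The GMR transformation at the heart of this approach can be illustrated with a minimal sketch: fit a joint Gaussian mixture over stacked input/output features, then form the conditional expectation of the glottal features given a spectral-envelope vector. This is an illustrative reimplementation under assumed interfaces (scikit-learn's `GaussianMixture`; hypothetical arrays `X` for spectral features and `Y` for glottal features), not the thesis's code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

def fit_gmr(X, Y, n_components=4, seed=0):
    """Fit a joint GMM on stacked [X | Y] for Gaussian mixture regression."""
    Z = np.hstack([X, Y])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed).fit(Z)
    return gmm, X.shape[1]

def gmr_predict(gmm, dx, X):
    """Conditional expectation E[y | x] under the joint GMM."""
    preds = []
    for x in X:
        w, cond = [], []
        for k in range(gmm.n_components):
            mu, S = gmm.means_[k], gmm.covariances_[k]
            mux, muy = mu[:dx], mu[dx:]
            Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
            # responsibility of component k for this input
            w.append(gmm.weights_[k] *
                     multivariate_normal.pdf(x, mux, Sxx, allow_singular=True))
            # component-wise conditional mean of y given x
            cond.append(muy + Sxy.T @ np.linalg.solve(Sxx, x - mux))
        w = np.asarray(w) / np.sum(w)
        preds.append(np.sum(w[:, None] * np.asarray(cond), axis=0))
    return np.asarray(preds)
```

On a toy linear relation (y = 2x plus small noise), the regression recovers the conditional mean closely; in the thesis's setting, X and Y would instead hold spectral-envelope and IF-derived glottal features.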
232

Probabilistic space maps for speech with applications

Kalgaonkar, Kaustubh 22 August 2011
The objective of the proposed research is to develop a probabilistic model of speech production that exploits the multiplicity of mappings between vocal tract area functions (VTAF) and speech spectra. Two thrusts are developed. In the first, a latent variable model that captures the uncertainty in estimating the VTAF from speech data is investigated; the model uses this uncertainty to generate a many-to-one mapping between observations of the VTAF and speech spectra. The second thrust uses the probabilistic model of speech production to improve the performance of traditional speech algorithms, such as enhancement and acoustic model adaptation. In this thesis, we propose to model the process of speech production with a probability map. The proposed model treats speech production as a probabilistic process with a many-to-one mapping between VTAF and speech spectra. The thesis not only outlines a statistical framework to generate and train these probabilistic models from speech, but also demonstrates its power and flexibility with applications such as enhancing speech from both perceptual and recognition perspectives.
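The core idea of a probability map — that one observed spectrum may correspond to several vocal tract configurations — can be sketched with a toy discrete model that estimates p(VTAF | spectrum) from co-occurrence counts. The quantized codes below are hypothetical stand-ins for real VTAF and spectral observations:

```python
from collections import Counter, defaultdict

def build_probability_map(pairs):
    """Estimate p(vtaf | spectrum) from co-occurrence counts of
    (quantized spectrum, quantized VTAF) observation pairs."""
    counts = defaultdict(Counter)
    for spec, vtaf in pairs:
        counts[spec][vtaf] += 1
    pmap = {}
    for spec, c in counts.items():
        total = sum(c.values())
        pmap[spec] = {v: n / total for v, n in c.items()}
    return pmap
```

When the same spectrum code co-occurs with two different VTAF codes, the map retains both with their relative frequencies — exactly the uncertainty a point-estimate inversion would discard.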
233

A multi-objective programming perspective to statistical learning problems

Yaman, Sibel 17 November 2008
It has been increasingly recognized that realistic problems often involve a tradeoff among many conflicting objectives. Traditional methods aim at satisfying multiple objectives by combining them into a global cost function, which in most cases overlooks the underlying tradeoffs between the conflicting objectives. This raises the issue of how different objectives should be combined to yield a final solution. Moreover, such approaches only ensure that the chosen overall objective function is optimized over the training samples; there is no guarantee on the performance in terms of the individual objectives, since they are not considered on an individual basis. Motivated by these shortcomings of traditional methods, the objective in this dissertation is to investigate theory, algorithms, and applications for problems with competing objectives and to understand the behavior of the proposed algorithms in light of some applications. We develop a multi-objective programming (MOP) framework for finding compromise solutions that are satisfactory for each of multiple competing performance criteria. The fundamental idea of our formulation, which we refer to as iterative constrained optimization (ICO), revolves around improving one objective while allowing the rest to degrade. This is achieved by optimizing individual objectives with proper constraints on the remaining competing objectives. The constraint bounds are adjusted based on the objective function values obtained in the most recent iteration. An aggregated utility function is used to evaluate the acceptability of local changes in the competing criteria, i.e., changes from one iteration to the next. Conflicting objectives arise in different contexts in many problems of speech and language technologies. In this dissertation, we consider two applications.
The first application is language model (LM) adaptation, where a general LM is adapted to a specific application domain so that the adapted LM is as close as possible to both the general model and the application domain data. Language modeling and adaptation are used in many speech and language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing, and information retrieval. The second application is automatic language identification (LID), where the standard detection performance measures, the false-rejection (miss) and false-acceptance (false alarm) rates for a number of languages, are to be minimized simultaneously. LID systems might be used as a pre-processing stage for understanding systems and for human listeners, and find applications, for example, in a hotel lobby or an international airport where one might speak to a multi-lingual voice-controlled travel information retrieval system. This dissertation is expected to provide new insights and techniques for achieving significant performance improvements over existing approaches in terms of the individual competing objectives; meanwhile, the designer gains better control over what is achieved in terms of the individual objectives. Although many MOP approaches developed so far are formal and extensible to a large number of competing objectives, their capabilities are typically examined with only two or three objectives, mainly because practical problems become significantly harder to manage as the number of objectives grows. We, however, illustrate the proposed framework with a larger number of objectives.
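The ICO loop described above — minimize one objective while bounding the rest near their latest values — can be sketched generically. This toy version uses SciPy's `minimize` with inequality constraints and a fixed degradation allowance; the `slack` parameter and the round-robin objective schedule are assumptions for illustration, not details from the dissertation:

```python
import numpy as np
from scipy.optimize import minimize

def ico(objectives, x0, slack=0.1, n_iter=20):
    """Iterative constrained optimization (ICO) sketch: cycle through
    objectives, minimizing one while bounding the others near their
    most recent values plus a small allowed degradation `slack`."""
    x = np.asarray(x0, dtype=float)
    for it in range(n_iter):
        i = it % len(objectives)
        bounds = [f(x) + slack for f in objectives]   # current levels
        cons = [{"type": "ineq", "fun": (lambda z, f=f, b=b: b - f(z))}
                for j, (f, b) in enumerate(zip(objectives, bounds)) if j != i]
        res = minimize(objectives[i], x, constraints=cons)
        if res.success:
            x = res.x
    return x
```

With the conflicting pair f1(x) = (x - 1)^2 and f2(x) = (x + 1)^2, whose minimizers sit at +1 and -1, the loop settles at a compromise between the two rather than sacrificing either objective entirely.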
234

Noun phrase generation for situated dialogs

Stoia, Laura Cristina. January 2007
Thesis (Ph. D.)--Ohio State University, 2007. / Title from first page of PDF file. Includes bibliographical references (p. 154-163).
235

Automatic classification of spoken South African English variants using a transcription-less speech recognition approach

Du Toit, A. (Andre) March 2004
Thesis (MEng)--University of Stellenbosch, 2004. / ABSTRACT: We present the development of a pattern recognition system capable of classifying different Spoken Variants (SVs) of South African English (SAE) using a transcription-less speech recognition approach. Spoken Variants allow us to unify the linguistic concepts of accent and dialect from a pattern recognition viewpoint. The need for the SAE SV classification system arose from the multi-linguality requirement for South African speech recognition applications and the costs involved in developing such applications.
236

Determinadores de pitch / Pitch trackers

Daniel Espanhol Razera 05 May 2004
Several studies on digital voice analysis have validated the use of acoustic voice parameters in diagnostic and therapeutic processes. The perturbation parameters require knowledge of every period in the analyzed stretch of the voice signal before their values can be calculated. This task is carried out by pitch trackers, and their precision determines the reliability of the computed parameters. The purpose of this work is to study several methods proposed over the years and to establish which algorithm has the best precision and robustness when used with pathological voices. Pitch estimators are also studied as an auxiliary tool for the correction and adjustment of the trackers. 
The results demonstrate that both external and internal modifications of the tracker and estimator algorithms are needed to reach the desired robustness and precision. Two tracking algorithms, one based on autocorrelation and one on harmonic extraction, met the established goals and are confirmed as the most promising for extracting acoustic voice parameters.
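The autocorrelation-based tracker that the study singles out can be sketched for a single voiced frame: locate the strongest autocorrelation peak within a plausible period range. The 50-500 Hz search band below is an assumed default, not a value from the thesis:

```python
import numpy as np

def autocorr_pitch(signal, fs, fmin=50.0, fmax=500.0):
    """Estimate the pitch (F0) of one voiced frame by locating the
    highest autocorrelation peak inside the plausible lag range."""
    x = signal - np.mean(signal)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / fmax)          # shortest period considered
    lag_max = int(fs / fmin)          # longest period considered
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return fs / lag
```

On a clean 200 Hz sinusoid the estimate is exact; with the pathological voices studied here, the thesis finds that such trackers need additional internal and external modifications to stay reliable.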
237

Proposta de metodologia de avaliação de voz sintética com ênfase no ambiente educacional / Methodology for evaluation of synthetic speech emphasizing the educational environment

Leite, Harlei Miguel de Arruda, 1989- 06 September 2014
Advisor: Dalton Soares Arantes / Master's dissertation - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: This dissertation proposes, as its main contribution, a new methodology for evaluating synthesized voices. The method consists of a set of steps that assist the assessor in planning the evaluation, applying it, and analyzing the collected data. It was originally developed to evaluate a set of synthesized voices in order to find the voice best suited to distance-education environments that use avatars. The relations between intelligibility, comprehensibility and naturalness were also studied in order to identify the factors to be considered when improving speech synthesizers. The dissertation also surveys the main evaluation methods in the literature and explains how TTS (Text-to-Speech) systems work / Master's degree in Electrical Engineering (Telecommunications and Telematics)
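The analysis stage of such a listening test typically reduces the collected ratings to a mean opinion score (MOS) with a confidence interval. A minimal sketch follows; the normal approximation and the 95% level are conventional assumptions, not the dissertation's prescription:

```python
import math

def mean_opinion_score(ratings):
    """Mean opinion score (MOS) with a normal-approximation 95%
    confidence interval for one voice's listener ratings (1-5 scale)."""
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)  # sample variance
    half = 1.96 * math.sqrt(var / n)                       # CI half-width
    return mean, (mean - half, mean + half)
```

Comparing the intervals of candidate voices, rather than their bare means, is what lets the assessor claim one synthesized voice is genuinely preferred.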
238

Uso de parâmetros multifractais no reconhecimento de locutor / Use of multifractal parameters for speaker recognition

González González, Diana Cristina, 1984- 19 August 2018
Advisors: Lee Luan Ling, Fábio Violaro / Master's dissertation - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: This dissertation presents an Automatic Speaker Recognition (ASR) system that employs a new speaker feature based on the VVGM (Variable Variance Gaussian Multiplier) multifractal model. The methodology adopted for the development of this system comprises two stages. First, a traditional ASR system was implemented, using Mel-Frequency Cepstral Coefficients (MFCCs) as the feature vector and Gaussian mixture models (GMMs) as the classifier, since this classic configuration is the standard reference in the literature. This provides both broad insight into the production of speech signals and a baseline against which to compare the performance of the new VVGM feature. 
The second stage was dedicated to the study of multifractal processes in speech signals, which emphasize the information contained in the non-stationary parts of the evaluated signal. Taking advantage of this characteristic, speech signals are modeled with the VVGM model, which is based on the binomial multiplicative cascade process and uses the variances of the multipliers at each stage as a new feature vector. The information obtained by the two methods is different and complementary, so it is natural to combine the classic parameters with the multifractal parameters in order to improve the performance of speaker recognition systems. The proposed systems were evaluated on three speech databases with different configurations, such as sampling rates, numbers of speakers and phrases, and training and test durations; these configurations make it possible to determine the characteristics of the speech signal required by the system. In the experiments, the speaker identification system based on the VVGM features achieved significant success rates, showing that this multifractal model carries relevant information about the identity of each speaker. For example, the second database is composed of speech signals from 71 speakers (50 men and 21 women) digitized at 22.05 kHz with 16 bits/sample; training used 20 phrases per speaker, with a total duration of about 70 s. With 3-second test utterances, the VVGM-based ASR system obtained a recognition rate of 91.30%, while the MFCC-based system reached 98.76%. When the two feature sets were combined, the recognition rate increased to 99.43%, showing that the new feature adds substantial information to the speaker recognition system / Master's degree in Electrical Engineering (Telecommunications and Telematics)
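The VVGM feature idea — per-stage variances of the dyadic cascade multipliers — can be sketched roughly as follows. This is an illustrative estimator for a positive energy series of power-of-two length, not the thesis's exact model:

```python
import numpy as np

def cascade_multiplier_variances(energy):
    """Per cascade stage, estimate the variance of the dyadic
    mass-splitting multipliers of a positive series whose length is a
    power of two (the feature family used by multiplicative-cascade
    models such as VVGM)."""
    x = np.asarray(energy, dtype=float)
    n_stages = int(np.log2(len(x)))
    variances = []
    level = x
    for _ in range(n_stages):
        pairs = level.reshape(-1, 2)
        parents = pairs.sum(axis=1)
        mult = pairs[:, 0] / parents     # left-child mass fraction
        variances.append(float(np.var(mult)))
        level = parents                  # coarsen one stage
    return variances[::-1]               # coarse-to-fine order
```

A perfectly uniform series splits 50/50 at every stage, so all variances are zero; irregular, bursty signals such as speech energy yield nonzero variances that vary by stage, and it is this stage profile that serves as the speaker feature.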
239

Unsupervised Morphological Segmentation and Part-of-Speech Tagging for Low-Resource Scenarios

Eskander, Ramy January 2021
With the high cost of manually labeling data and the increasing interest in low-resource languages, for which human annotators might not even be available, unsupervised approaches have become essential for processing a typologically diverse set of languages, whether high-resource or low-resource. In this work, we propose new fully unsupervised approaches for two tasks in morphology: unsupervised morphological segmentation and unsupervised cross-lingual part-of-speech (POS) tagging, which are essential subtasks for several downstream NLP applications, such as machine translation, speech recognition, information extraction and question answering. We propose a new unsupervised morphological-segmentation approach that utilizes Adaptor Grammars (AGs), nonparametric Bayesian models that generalize probabilistic context-free grammars (PCFGs), where a PCFG models word structure in the task of morphological segmentation. We implement the approach as a publicly available morphological-segmentation framework, MorphAGram, that enables unsupervised morphological segmentation through the use of several proposed language-independent grammars. In addition, the framework allows for the use of scholar knowledge, when available, in the form of affixes that can be seeded into the grammars. The framework handles the cases where the scholar-seeded knowledge is either generated from language resources, possibly by someone who does not know the language, as weak linguistic priors, or generated by an expert in the underlying language as strong linguistic priors. Another form of linguistic priors is the design of a grammar that models language-dependent specifications. We also propose a fully unsupervised learning setting that approximates the effect of scholar-seeded knowledge through self-training.
Moreover, since there is no single grammar that works best across all languages, we propose an approach that picks a nearly optimal configuration (a learning setting and a grammar) for an unseen language, a language that is not part of the development data. Finally, we examine multilingual learning for unsupervised morphological segmentation in low-resource setups. For unsupervised POS tagging, two cross-lingual approaches have been widely adopted: 1) annotation projection, where POS annotations are projected, across an aligned parallel text, from a source language for which a POS tagger is available to the target language prior to training a POS model; and 2) zero-shot model transfer, where a model of a source language is directly applied to texts in the target language. We propose an end-to-end architecture for unsupervised cross-lingual POS tagging via annotation projection in truly low-resource scenarios that do not assume access to parallel corpora that are large in size or represent a specific domain. We integrate and expand the best practices in alignment and projection and design a rich neural architecture that exploits non-contextualized and transformer-based contextualized word embeddings, affix embeddings and word-cluster embeddings. Additionally, since parallel data might be available between the target language and multiple source languages, as in the case of the Bible, we propose different approaches for learning from multiple sources. Finally, we combine our work on unsupervised morphological segmentation and unsupervised cross-lingual POS tagging by conducting unsupervised stem-based cross-lingual POS tagging via annotation projection, which relies on the stem as the core unit of abstraction for alignment and projection, a choice that is beneficial to low-resource morphologically complex languages.
We also examine morpheme-based alignment and projection, the use of linguistic priors towards better POS models and the use of segmentation information as learning features in the neural architecture. We conduct comprehensive evaluation and analysis to assess the performance of our approaches of unsupervised morphological segmentation and unsupervised POS tagging and show that they achieve the state-of-the-art performance for the two morphology tasks when evaluated on a large set of languages of different typologies: analytic, fusional, agglutinative and synthetic/polysynthetic.
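The annotation-projection step at the center of the cross-lingual POS approach can be sketched in its simplest form: copy each source word's tag to the target words aligned to it, leaving unaligned target words untagged. The alignment pairs below are hypothetical:

```python
def project_pos_tags(source_tags, alignments, target_len):
    """Project POS tags from a tagged source sentence onto an untagged
    target sentence through word alignments, given as (source index,
    target index) pairs; unaligned target words stay untagged (None)."""
    projected = [None] * target_len
    for src_i, tgt_j in alignments:
        projected[tgt_j] = source_tags[src_i]
    return projected
```

In practice, the thesis's architecture then trains a neural tagger on such (noisily) projected tags rather than using them directly, which is what absorbs alignment errors and coverage gaps.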
240

Tvorba zvuku v technologii VST / Sound Creation Using VST

Švec, Michal January 2014
This diploma thesis deals with digital sound synthesis. The main task was to design and implement a new sound synthesizer. The created instrument combines different approaches to sound synthesis, so it can be described as a hybrid; its design was inspired by existing audio synthesizers. It is implemented in C++ using Steinberg's VST technology. As an extension, a module was designed and implemented that processes voice or text input and builds a MIDI file with a melody, which can then be played by any synthesizer; this module is written in Python. A simple graphical user interface was also created for the synthesizer.
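The synthesis core of such an instrument can be illustrated with a few lines of additive synthesis — a sum of harmonic sine partials under a decay envelope. This is a generic sketch in Python (with assumed partial amplitudes), not the thesis's C++/VST implementation:

```python
import numpy as np

def render_note(freq, dur, fs=44100, partials=(1.0, 0.5, 0.25)):
    """Render one note with simple additive synthesis: a sum of
    harmonically related sine partials with the given amplitudes,
    shaped by a linear fade-out envelope."""
    t = np.arange(int(dur * fs)) / fs
    wave = sum(a * np.sin(2 * np.pi * freq * (k + 1) * t)
               for k, a in enumerate(partials))
    env = np.linspace(1.0, 0.0, t.size)      # simple decay envelope
    wave *= env
    return wave / np.max(np.abs(wave))       # normalize to [-1, 1]
```

A hybrid synthesizer of the kind described would mix several such generators (and other synthesis methods) per voice; a VST plug-in renders the same kind of sample buffers inside the host's processing callback.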
