  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Speech Analysis for Processing of Musical Signals

Mészáros, Tomáš January 2015
The main goal of this thesis is to enrich musical signals with characteristics of human speech. The work covers the creation of an audio effect inspired by the talk-box effect: analysis of the vocal tract with a suitable algorithm such as linear prediction, and application of the estimated filter to a musical audio signal. Emphasis is placed on perfect output quality, low latency, and low computational cost for real-time use. The result of the thesis is a software plugin usable in professional audio-editing applications and, given a suitable hardware platform, also for live performance. The plugin emulates a real talk-box device and provides similar output quality with a unique sound.
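The talk-box-style processing described in this abstract (estimate a vocal-tract filter by linear prediction, then apply it to a musical signal) can be sketched roughly as follows. This is a minimal illustration and not the thesis's plugin: the autocorrelation method with the Levinson-Durbin recursion is one common way to compute linear-prediction coefficients, and the function names, the frame-by-frame usage, and the order of 10 are assumptions made here.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear-prediction normal equations.

    Returns (a, err) where a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction-error filter; err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. silent) frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def all_pole_filter(a, excitation):
    """Apply 1/A(z): y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k in range(1, len(a)):
            if n - k >= 0:
                y -= a[k] * out[n - k]
        out.append(y)
    return out
```

With a voice frame and a music frame of equal length, a hypothetical call sequence would be `a, _ = levinson_durbin(autocorrelation(voice_frame, 10), 10)` followed by `all_pole_filter(a, music_frame)`. A real-time effect would additionally need windowing, overlapping frames, and filter-state handling across frames, none of which is shown.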
32

Robust Speech Filter And Voice Encoder Parameter Estimation using the Phase-Phase Correlator

Azad, Abul K. 08 November 2019
In recent years, linear prediction voice encoders have become very efficient in terms of computing execution time and channel bandwidth usage while providing, in the absence of impulsive noise, natural-sounding synthetic speech signals. This good performance has been achieved via the use of a maximum likelihood parameter estimation of an auto-regressive model of order ten that best fits the speech signal under the assumption that the signal and the noise are Gaussian stochastic processes. However, this method breaks down in the presence of impulsive noise, which is common in practice, resulting in harsh or unintelligible audio signals. In this paper, we propose a robust estimator of correlation, the Phase-Phase correlator, that is able to cope with impulsive noise. Utilizing this correlator, we develop a Robust Mixed Excitation Linear Prediction encoder that provides improved audio quality for voiced, unvoiced, and transition speech segments. This is achieved by applying a statistical test to robust Mahalanobis distances for identifying the outliers in the corrupted speech signal, which are then replaced with filtered signals. Simulation results reveal that the proposed method outperforms in variance, bias, and breakdown point three other robust approaches based on the arcsin law, the polarity coincidence correlator, and the median-of-ratio estimator without sacrificing the encoder bandwidth efficiency and the compression gain while remaining compatible with real-time applications. Furthermore, in the presence of impulsive noise, the perceptual speech quality of the proposed encoder also surpasses the state of the art in terms of mean opinion score. / Doctor of Philosophy / Impulsive noise is a natural phenomenon in everyday experience. Impulsive noise can be analogous to discontinuities or a drastic change in natural progressions of events.
Specifically, in this research the disrupting events can occur in signals such as speech, power transmission, stock market, communication systems, etc. Sudden power outages due to lightning, maintenance, or other catastrophic events are some of the reasons why we may experience performance degradation in our electronic devices. Another example of impulsive noise is when we play an old, damaged vinyl record, which results in annoying clicking sounds. At the time instant of each click, the true music or speech, or simply the audible waveform, is completely destroyed. Another example of impulsive noise is a sudden crash in the stock market; a sudden dive in the market can destroy the regression and future predictions. Unfortunately, in the presence of impulsive noise, classical methods are unable to filter out the impulse corruptions. The intended filtering objective of this dissertation is specific, but not limited, to speech signal processing. Specifically, we research different filter models to determine the optimum method of eliminating impulsive noise in speech. Note that the optimal filter model differs across time-series signal models such as speech, stock market, power systems, etc. In our studies we have shown that our speech filter method outperforms the state-of-the-art algorithms. Another major contribution of our research is a speech compression algorithm that is robust to impulsive noise in speech. In digital signal processing, a compression method entails representing the same signal with less data while conveying the same message as the original signal. For example, the human voice can produce sounds in the range of approximately 60 Hz to 3500 Hz; in other words, speech can occupy approximately 4000 Hz in frequency space. So the challenge is: can we compress speech into one half of that space, or even less?
This is a very attractive proposition because frequency space is limited, but wireless service providers desire to serve as many users as possible without sacrificing quality and ultimately to maximize the bottom line. Encoding impulse-corrupted speech produces harsh synthesized audio. We have shown that if the encoding is done with the proposed method, synthesized audio quality is far superior to the state of the art.
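To illustrate why sign-based correlation estimates resist impulsive outliers, here is a small sketch in the spirit of the polarity coincidence correlator, one of the baseline approaches named in the abstract. It is not the Phase-Phase correlator proposed in the dissertation. It relies on the arcsine law for zero-mean, jointly Gaussian pairs, E[sgn(x) sgn(y)] = (2/π) arcsin(ρ), and the function names are chosen here for illustration.

```python
import math


def sign(value):
    """Return -1.0, 0.0 or 1.0 according to the sign of value."""
    if value > 0:
        return 1.0
    if value < 0:
        return -1.0
    return 0.0


def polarity_coincidence_correlation(x, y):
    """Estimate the correlation coefficient from sample signs only.

    Assumes zero-mean, jointly Gaussian data, for which the arcsine law
    gives rho = sin(pi/2 * E[sgn(x) * sgn(y)]).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_sign_product = sum(sign(a) * sign(b) for a, b in zip(x, y)) / len(x)
    return math.sin(0.5 * math.pi * mean_sign_product)
```

Because only the signs enter the estimate, replacing one sample by an arbitrarily large spike of the same sign leaves the result unchanged, whereas the ordinary sample correlation would be dominated by that spike.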
33

Transforming high-effort voices into breathy voices using adaptive pre-emphasis linear prediction

Nordstrom, Karl 29 April 2008
During musical performance and recording, there are a variety of techniques and electronic effects available to transform the singing voice. The particular effect examined in this dissertation is breathiness, where artificial noise is added to a voice to simulate aspiration noise. The typical problem with this effect is that artificial noise does not effectively blend into voices that exhibit high vocal effort. The existing breathy effect does not reduce the perceived effort; breathy voices exhibit low effort. A typical approach to synthesizing breathiness is to separate the voice into a filter representing the vocal tract and a source representing the excitation of the vocal folds. Artificial noise is added to the source to simulate aspiration noise. The modified source is then fed through the vocal tract filter to synthesize a new voice. The resulting voice sounds like the original voice plus noise. Listening experiments were carried out. These listening experiments demonstrated that constant pre-emphasis linear prediction (LP) results in an estimated vocal tract filter that retains the perception of vocal effort. It was hypothesized that reducing the perception of vocal effort in the estimated vocal tract filter may improve the breathy effect. This dissertation presents adaptive pre-emphasis LP (APLP) as a technique to more appropriately model the spectral envelope of the voice. The APLP algorithm results in a more consistent vocal tract filter and an estimated voice source that varies more appropriately with changes in vocal effort. This dissertation describes how APLP estimates a spectral emphasis filter that can transform the spectral envelope of the voice, thereby reducing the perception of vocal effort. A listening experiment was carried out to determine whether APLP is able to transform high effort voices into breathy voices more effectively than constant pre-emphasis LP. 
The experiment demonstrates that APLP is able to reduce the perceived effort in the voice. In addition, the voices transformed using APLP sound less artificial than the same voices transformed using constant pre-emphasis LP. This indicates that APLP is able to more effectively transform high-effort voices into breathy voices.
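For readers unfamiliar with the constant pre-emphasis step that this abstract contrasts with APLP, a minimal sketch follows. It shows only the conventional fixed first-order pre-emphasis filter applied before linear prediction, plus its inverse. The adaptive scheme of the dissertation is not reproduced, and the default coefficient of 0.95 is an assumption (values near 1 are typical).

```python
def pre_emphasis(signal, alpha=0.95):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1], with x[-1] = 0."""
    previous = 0.0
    out = []
    for x in signal:
        out.append(x - alpha * previous)
        previous = x
    return out


def de_emphasis(signal, alpha=0.95):
    """Inverse filter: y[n] = x[n] + alpha * y[n-1], with y[-1] = 0."""
    previous = 0.0
    out = []
    for x in signal:
        previous = x + alpha * previous
        out.append(previous)
    return out
```

The fixed coefficient flattens the spectral tilt by the same amount regardless of vocal effort, which is the limitation the abstract attributes to constant pre-emphasis LP.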
34

Characterization of the Voice Source by the DCT for Speaker Information

Abhiram, B January 2014
Extracting speaker-specific information from speech is of great interest to researchers and developers alike, since speaker recognition technology finds application in a wide range of areas, primary among them being forensics and biometric security systems. Several models and techniques have been employed to extract speaker information from the speech signal. Speech production is generally modeled as an excitation source followed by a filter. Physiologically, the source corresponds to the vocal fold vibrations and the filter corresponds to the spectrum-shaping vocal tract. Vocal tract-based features like the mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients have been shown to contain speaker information. However, high-speed videos of the larynx show that the vocal folds of different individuals vibrate differently. Voice source (VS)-based features have also been shown to perform well in speaker recognition tasks, thereby revealing that the VS does contain speaker information. Moreover, a combination of the vocal tract and VS-based features has been shown to give an improved performance, showing that the latter contains supplementary speaker information. In this study, the focus is on extracting speaker information from the VS. The existing techniques for the same are reviewed, and it is observed that the features which are obtained by fitting a time-domain model on the VS perform more poorly than those obtained by simple transformations of the VS. Here, an attempt is made to propose an alternate way of characterizing the VS to extract speaker information, and to study the merits and shortcomings of the proposed speaker-specific features. The VS cannot be measured directly. Thus, to characterize the VS, we first need an estimate of the VS, and the integrated linear prediction residual (ILPR) extracted from the speech signal is used as the VS estimate in this study.
The voice source linear prediction model, which was proposed in an earlier study to obtain the ILPR, is used in this work. It is hypothesized here that a speaker’s voice may be characterized by the relative proportions of the harmonics present in the VS. The pitch-synchronous discrete cosine transform (DCT) is shown to capture these, and the gross shape of the ILPR, in a few coefficients. The ILPR and hence its DCT coefficients are visually observed to distinguish between speakers. However, it is also observed that they do have intra-speaker variability, and thus it is hypothesized that the distribution of the DCT coefficients may capture speaker information, and this distribution is modeled by a Gaussian mixture model (GMM). The DCT coefficients of the ILPR (termed the DCTILPR) are directly used as a feature vector in speaker identification (SID) tasks. Issues related to the GMM, like the type of covariance matrix, are studied, and it is found that diagonal covariance matrices perform better than full covariance matrices. Thus, mixtures of Gaussians having diagonal covariances are used as speaker models, and by conducting SID experiments on three standard databases, it is found that the proposed DCTILPR features fare comparably with the existing VS-based features. It is also found that the gross shape of the VS contains most of the speaker information, and the very fine structure of the VS does not help in distinguishing speakers, and instead leads to more confusion between speakers. The major drawbacks of the DCTILPR are the session and handset variability, but they are also present in existing state-of-the-art speaker-specific VS-based features and the MFCCs, and hence seem to be common problems. There are techniques to compensate for these variabilities, which need to be used when the systems using these features are deployed in an actual application.
The DCTILPR is found to improve the SID accuracy of a system trained with MFCC features by 12%, indicating that the DCTILPR features capture speaker information which is missed by the MFCCs. It is also found that a combination of MFCC and DCTILPR features on a speaker verification task gives significant performance improvement in the case of short test utterances. Thus, on the whole, this study proposes an alternate way of extracting speaker information from the VS, and adds to the evidence for speaker information present in the VS.
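The idea of summarising one pitch cycle of a voice-source estimate by its first few DCT coefficients can be sketched as below. This is an illustration under stated assumptions, not the thesis's feature extractor: the ILPR computation and pitch-mark detection are not shown, the orthonormal DCT-II scaling is a choice made here, and `gross_shape_features` is a hypothetical helper name.

```python
import math


def dct_ii(cycle):
    """Orthonormal DCT-II of one cycle (a list of samples)."""
    n = len(cycle)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(cycle))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def gross_shape_features(cycle, n_coeffs):
    """Keep only the lowest-order coefficients, i.e. the coarse waveform shape."""
    return dct_ii(cycle)[:n_coeffs]
```

Truncating to the low-order coefficients discards fine structure, which matches the abstract's observation that the gross shape of the voice source carries most of the speaker information.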
35

Link Stability Analysis of Wireless Sensor Networks Over the Ocean Surface

Shahanaghi, Alireza 03 September 2021
Ocean-surface Wireless Sensor Networks (WSNs) are essential in various thalassic applications, such as maritime communication, ocean monitoring, seawater examination, pollution detection, etc. Formed by simply structured sensor nodes, ocean-surface WSNs can improve the data transmission rate, enhance the monitoring resolution, expand the geographical coverage, extend the observation period, and lower the cost compared to vessel-based monitoring approaches. Despite the importance and the broad applications of ocean-surface WSNs, little is known about the stability of the wireless links among the sensors. Notably, research suffers from the lack of an accurate model that describes the environmental effects, including the ocean surface movements and the wind speed, on the link stability. An inadequate understanding of link stability can result in network protocols that are not robust to environmental interruptions. Such a shortcoming decreases the network reliability and degrades the accuracy of the network planning. To compensate for this shortcoming, in this dissertation, we provide a thorough analysis of the stability of the wireless links over the ocean. In particular, we investigate and capture the effects of ocean waves on the link stability through the following steps. First, we use the linear wave theory and obtain a novel stochastic model of Line-of-Sight (LoS) links over the ocean based on the realistic behavior of ocean waves. Second, we present and prove an important theorem on the level-crossing of Wide Sense Stationary (WSS) random processes, and combine that with our stochastic model of LoS links to study two important indicators of link stability, i.e., the blockage probability and the blockage and connectivity periods. The former indicates the probability that a LoS link is blocked by the ocean waves while the latter determines the duration of on/off periods of the LoS links over the ocean.
The aforementioned stability parameters directly affect different stages of network design, such as choosing the antenna height, planning the sensors' deployment distances, determining the packet length, designing the retransmission and scheduling strategies in the Medium Access Control (MAC) protocols and transport layer protocols, selecting the fragmentation threshold in Internet Protocol (IP), etc., which will be discussed in the respective chapters. In the last part of our dissertation, we investigate the problem of linear prediction of ocean waves, which has special importance in the design of ocean-surface WSNs. In this regard, we first introduce a low-complexity metric for the effectiveness of k-step-ahead linear prediction, which we refer to as the efficiency curve. The significance of the efficiency curve becomes evident when we decide upon the number of previous samples in the linear prediction model, and determine the extent to which the predictor forecasts the future. After introducing the efficiency curve, we formulate an adaptive Wiener filter to predict the ocean waves and adapt the prediction model according to the environmental changes. / Doctor of Philosophy / Covering almost three quarters of the earth and supplying half of its oxygen, oceans are vital to the support of life on our planet. It is important to continuously monitor different parts of the ocean environment for tracking climate changes, detecting pollution, etc. However, the existing monitoring approaches have serious weaknesses, which prevent us from constantly monitoring the state of the ocean, and drastically limit the geographical coverage. For instance, the traditional ocean monitoring system using oceanographic research vessels is time-consuming and expensive. Besides, it has low resolution in time and space, which poses serious challenges to oceanographers by providing under-sampled records of the ocean.
To compensate for these defects, one of the promising alternatives is to employ Wireless Sensor Networks (WSNs), which have many advantages, such as real-time access to data for a longer period of time and a larger geographical coverage of the ocean, higher resolution of monitoring, faster processing of collected data and instantaneous transmission to onshore monitoring centers. With the benefit of simply structured sensor nodes, ocean-surface WSNs can also decrease the cost by at least one order of magnitude compared to the conventional approaches. Despite the advantages that ocean-surface WSNs have over traditional ocean monitoring methods, ocean-surface WSN research suffers from the lack of an accurate model that describes the stability of wireless links among sensor nodes. While some of the existing literature has developed accurate models of the electromagnetic wave propagation over the ocean surface, it has failed to consider the environmental effects, such as ocean waves, on the stability of the links. To fill this void, in this dissertation, we investigate ocean surface waves' effects on the Line-of-Sight (LoS) link between the sensors in an ocean-surface WSN. Specifically, we derive the blockage probability, and the blockage and connectivity periods of LoS links between a transmitter and receiver pair due to wave movements. In addition to the link stability analysis, we dedicate the last part of this dissertation to the problem of linear prediction of ocean waves, which has special importance in the design process of ocean-surface WSNs. In this regard, we present a low-complexity metric for the effectiveness of k-step-ahead linear prediction, and formulate an adaptive Wiener filter to predict the ocean waves and adapt the prediction model according to the environmental changes.
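As a rough illustration of adaptive k-step-ahead linear prediction, the sketch below uses a normalised LMS (NLMS) update, a standard stochastic-gradient stand-in for an adaptive Wiener filter. It is not the dissertation's formulation, and the efficiency-curve metric is not reproduced. The function names, step size, and the sinusoidal test "swell" are assumptions made for this example.

```python
import math


def synthetic_swell(n_samples, omega=0.2 * math.pi):
    """Toy wave-height record: a single sinusoid (an assumption, not real data)."""
    return [math.sin(omega * n) for n in range(n_samples)]


def nlms_k_step_predictor(signal, order, k, mu=0.5, eps=1e-8):
    """Predict signal[n + k] from the `order` most recent samples up to n.

    Offline sketch: the weights are updated as soon as the prediction is
    made, whereas a live system would have to wait k samples for the target.
    Returns (predictions, errors), one entry per predicted target.
    """
    weights = [0.0] * order
    predictions, errors = [], []
    for n in range(order - 1, len(signal) - k):
        window = [signal[n - i] for i in range(order)]  # newest sample first
        estimate = sum(w * x for w, x in zip(weights, window))
        error = signal[n + k] - estimate
        energy = eps + sum(x * x for x in window)
        # Normalised LMS update: the step is scaled by the window energy.
        weights = [w + mu * error * x / energy
                   for w, x in zip(weights, window)]
        predictions.append(estimate)
        errors.append(error)
    return predictions, errors
```

Running the predictor for several values of `order` and `k` and comparing the late-stage error gives a crude view of the trade-off the abstract describes between the number of past samples used and how far ahead the predictor remains useful.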
36

Nouvelles méthodes multi-échelles pour l'analyse non-linéaire de la parole / Novel multiscale methods for nonlinear speech analysis

Khanagha, Vahid 16 January 2013
This thesis presents exploratory research on the application of a nonlinear multiscale formalism, called the Microcanonical Multiscale Formalism (the MMF), to the analysis of speech signals. Derived from principles in Statistical Physics, the MMF allows accurate analysis of the nonlinear dynamics of complex signals. It relies on the estimation of local geometrical parameters, the singularity exponents (SE), which quantify the degree of predictability at each point of the signal domain. When correctly defined and estimated, these exponents can provide valuable information about the local dynamics of complex signals and have been successfully used in many applications ranging from signal representation to inference and prediction. We show the relevance of the MMF to speech analysis and develop several applications to show the strength and potential of the formalism.
Using the MMF, in this thesis we introduce: a novel and accurate text-independent phonetic segmentation algorithm, a novel waveform coder, a robust accurate algorithm for detection of the Glottal Closure Instants, a closed-form solution for the problem of sparse linear prediction analysis and finally, an efficient algorithm for estimation of the excitation source signal.
37

Melizmų sintezė dirbtinių neuronų tinklais / Melisma Synthesis Using Artificial Neural Networks

Leonavičius, Romas 12 January 2007
Modern methods of speech synthesis are not suitable for the restoration of song signals because the resulting sounds lack vitality and intonation. The aim of the presented work is to synthesize melismas found in Lithuanian folk songs by applying Artificial Neural Networks. An analytical survey of a fairly extensive literature is presented. First, a classification and a comprehensive discussion of melismas are given. The theory of dynamic systems, which forms the basis for studying melismas, is presented, and finally the relationship for modeling a melisma with nonlinear and dynamic systems is outlined. The most widely used Linear Prediction Coding (LPC) method and the possibilities for its improvement are investigated. A modification of the original Linear Prediction method based on dynamic LPC frame positioning is proposed. On this basis, a new melisma synthesis technique is presented. A flexible generalized melisma model is developed, based on two Artificial Neural Networks (a Multilayer Perceptron and Adaline) and on two network training algorithms (Levenberg-Marquardt and Least Squares error minimization). Moreover, original mathematical models of Fortis, Gruppett, Mordent and Trill are created, fit for synthesizing melismas, and their minimal sizes are proposed. The last chapter concerns an experimental investigation using over 500 melisma records, and corroborates the application of the new mathematical models to melisma synthesis for one performer.
39

Automatická klasifikace výslovnosti hlásky R / Automatic classification of pronunciation of the letter „R“

Hrušovský, Enrik January 2018
This diploma thesis deals with automatic classification of the pronunciation of the sound R. The purpose of the thesis is to create a program that detects defective pronunciation of the sound R in children. The thesis covers speech production, speech therapy, dyslalia, and subsequently speech signal processing and analysis methods. The last part describes the design of software for automatic detection of the pronunciation of the sound R. For pronunciation recognition, the MFCC algorithm is used to extract features. These features are then classified by a neural network as correct or incorrect pronunciation, and the classification success rate is evaluated.
40

Remediation of instability in Best Linear Unbiased Prediction

Eatwell, Karen Anne January 2013 (has links)
In most breeding programmes breeders use phenotypic data obtained in breeding trials to rank the performance of the parents or progeny on pre-selected performance criteria. Through this ranking the best candidates are identified and selected for breeding or production purposes. Best Linear Unbiased Prediction (BLUP) is an efficient selection method that combines information into a single index. Unbalanced or messy data are frequently found in tree breeding trials. Trial individuals are related and a degree of correlation is expected between individuals over sites; this can lead to collinearity in the data, which in turn may lead to instability in certain selection models. A high degree of collinearity may cause problems and adversely affect the prediction of the breeding values in a BLUP selection index. Simulation studies have highlighted that instability is a concern and needs to be investigated in experimental data. The occurrence of instability, relating to collinearity, in BLUP of tree breeding data and possible methods to deal with it were investigated in this study. Case study data from 39 forestry breeding trials (three generations) of Eucalyptus grandis and 20 trials of Pinus patula (two generations) were used. A series of BLUP predictions (rankings) using three selection traits and 10 economic weighting sets were made. Backward and forward prediction models with three different matrix inversion techniques (singular value decomposition, Gaussian elimination with partial pivoting, and Gaussian elimination with full pivoting) and an adapted ridge regression technique were used in calculating BLUP indices. A Delphi and a Clipper version of the same BLUP programme, which run with different numerical precision, were used and compared. Predicted breeding values (forward prediction) were determined in the F1 and F2 E. grandis trials and F1 P. patula trials and realised breeding performance (backward prediction) was determined in the F2 and F3 E. grandis trials and F2 P. patula trials.
The accuracy (correlation between the predicted breeding values and realised breeding performance) was estimated in order to assess the efficiency of the predictions and evaluate the different matrix inversion methods. The accuracy (correlations) was found to be mostly of acceptable magnitude when compared to the heritability of the compound weighted trait in the F1F2 E. grandis scenarios. Realised genetic gains were also calculated for each method used. Instability was observed in both E. grandis and P. patula breeding data in the study, and this may cause a significant loss in realised genetic gains. Instability can be identified by examining the matrix calculated from the product of the phenotypic covariance matrix with its inverse, for deviations from the expected identity pattern. Results of this study indicate that it may not always be optimal to use a higher numerical precision programme when there is collinearity in the data and instability in the matrix calculations. In some cases, where there is a large amount of collinearity, the use of a higher precision programme for BLUP calculations can significantly increase or decrease the accuracy of the rankings. The different matrix inversion techniques, particularly SVD and adapted ridge regression, did not perform much better than the full pivoting technique. The study found that it is beneficial to use the full pivoting Gaussian elimination matrix inversion technique in preference to the partial pivoting Gaussian elimination matrix inversion technique for both high and lower numerical precision programmes. / Thesis (PhD)--University of Pretoria, 2013. / Genetics
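The instability check mentioned in this abstract (multiply the covariance matrix by its computed inverse and look for deviations from the identity) can be sketched as follows. This is a generic illustration, not the study's Delphi or Clipper programme: the inversion uses Gauss-Jordan elimination with partial pivoting, and `add_ridge` applies plain diagonal loading, which is an assumed simplification of the "adapted ridge regression" technique the study names.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination, partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def identity_deviation(matrix, inverse):
    """Largest absolute entry of (matrix @ inverse - I)."""
    n = len(matrix)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(matrix[i][k] * inverse[k][j] for k in range(n))
            worst = max(worst, abs(entry - (1.0 if i == j else 0.0)))
    return worst


def add_ridge(matrix, lam):
    """Return a copy of matrix with lam added to every diagonal entry."""
    return [[v + (lam if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]
```

A large value from `identity_deviation` signals that the computed inverse is unreliable, typically because collinear traits make the covariance matrix nearly singular; adding a small ridge term trades a little bias for a better-conditioned matrix.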
