• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 9
  • 1
  • 1
  • 1
  • Tagged with
  • 13
  • 8
  • 5
  • 5
  • 4
  • 4
  • 3
  • 3
  • 3
  • 3
  • 3
  • 3
  • 3
  • 3
  • 3
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Single-Microphone Speech Dereverberation: Modulation Domain Processing and Quality Assessment

ZHENG, CHENXI 25 July 2011 (has links)
In a reverberant enclosure, acoustic speech signals are degraded by reflections from walls, ceilings, and objects. Restoring speech quality and intelligibility from reverberated speech has received increasing interest over the past few years. Although multiple channel dereverberation methods provide some improvements in speech quality/ intelligibility, single-channel dereverberation remains an open challenge. Two types of advanced single-channel dereverberation methods, namely acoustic domain spectral subtraction and modulation domain filtering, provide small improvement in speech quality and intelligibility. In this thesis, we study single-channel dereverberation algorithms. Firstly, an upper bound of time-frequency masking (TFM) performance for dereverberation is obtained using ideal time-frequency masking (ITFM). ITFM has access to both the clean and reverberated speech signals in estimating the binary-mask matrix. ITFM implements binary masking in the short time Fourier transform (STFT) domain, preserving only those spectral components less corrupted by reverberation. The experiment results show that single-channel ITFM outperforms four existing multi-channel dereverberation methods and suggest that large potential improvements could be obtained using TFM for speech dereverberation. Secondly, a novel modulation domain spectral subtraction method is proposed for dereverberation. This method estimates modulation domain long reverberation spectral variance (LRSV) from time domain LRSV using a statistical room impulse response (RIR) model and implements spectral subtraction in the modulation domain. On one hand, different from acoustic domain spectral subtraction, our method implements spectral subtraction in the modulation domain, which has been shown to play an important role in speech perception. On the other hand, different from modulation domain filtering which uses a time-invariant filter, our method takes the changes of reverberated speech spectral variance along time into account and implements spectral subtraction adaptively. Objective and informal subjective tests show that our proposed method outperforms two existing state-of-the-art single-channel dereverberation algorithms. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2011-07-20 03:18:30.021
2

Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods

Evers, Christine January 2010 (has links)
Speech signals radiated in confined spaces are subject to reverberation due to reflections of surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain the approaches to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. Therefore, the aim of this dissertation is the development of a flexible and extendible framework for blind speech dereverberation accommodating different speech sound types, single- or multiple sensor as well as stationary and moving speakers. Bayesian methods benefit from – rather than being dictated by – appropriate model choices. Therefore, the problem of blind speech dereverberation is considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is consequently derived. In this approach both the anechoic source signal and reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state-space of unknown variables. The remaining model parameters are estimated using sequential importance resampling. The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Due to the structure of measurement model, single- as well as multi-microphone processing is facilitated, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern. This dissertation is concluded with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, as well as an extension to subband processing for improved computational efficiency.
3

ベイス推定に基づく音楽アライメント / Bayesian Music Alignment

前澤, 陽 23 March 2015 (has links)
Kyoto University (京都大学) / 0048 / 新制・課程博士 / 博士(情報学) / 甲第19106号 / 情博第552号 / 新制||情||98 / 32057 / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授 河原 達也, 教授 田中 利幸, 講師 吉井 和佳 / 学位規則第4条第1項該当
4

Far-Field Speech Recognition / Far-Field Speech Recognition

Žmolíková, Kateřina January 2016 (has links)
Systémy rozpoznávání řeči v dnešní době dosahují poměrně vysoké úspěšnosti. V případě řeči, která je snímána vzdáleným mikrofonem a je tak narušena množstvím šumu a dozvukem (reverberací), je ale přesnost rozpoznávání značně zhoršena. Tento problém je možné zmírnit využitím mikrofonních polí. Tato práce se zabývá technikami, které umožňují kombinovat signály z více mikrofonů tak, aby byla zlepšena kvalita výsledného signálu a tedy i přesnost rozpoznávání. Práce nejprve shrnuje teorii rozpoznávání řeči a uvádí nejpoužívanější algoritmy pro zpracování mikrofonních polí. Následně jsou demonstrovány a analyzovány výsledky použití dvou metod pro beamforming a metody dereverberace vícekanálových signálů. Na závěr je vyzkoušen alternativní způsob beamformingu za použití neuronových sítí.
5

Bayesian Music Alignment / ベイス推定に基づく音楽アライメント

Maezawa, Akira 23 March 2015 (has links)
京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第19106号 / 情博第552号 / 新制||情||98(附属図書館) / 32057 / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授 河原 達也, 教授 田中 利幸, 講師 吉井 和佳 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
6

Deep learning methods for speaker separation in reverberant conditions

Delfarah, Masood 16 October 2019 (has links)
No description available.
7

Extraction of Loudspeaker- and room impulse responses under overlapping conditions

Gustafsson, Felix January 2022 (has links)
A loudspeaker is often considered to be a Linear Time Invariant (LTI) system, which can be completely categorized by its impulse response. What sets loudspeakers apart from other LTI-systems is the acoustical aspect including echoes, which makes it a lot harder to take accurate noise free measurements compared to other LTI-systems such as a simple RC circuit. There are two main challenges regarding loudspeaker measurement, the first is high frequency reflections of surrounding surfaces and the second is low frequency modal resonances in the room stemming from the initial echoes. A straightforward way of dealing with this issue is simply truncating the measured impulse response before the arrival of the first high frequency reflection. This is however not without its problems as this will result in high uncertainty for low frequency content of the measurement. The longer time until the first reflection is measured, the better the measurement. The ideal measurement would be a noise free environment with infinite distance towards the nearest reflective surface. This is of course not possible in practice, but this ideal environment can be simulated by using an anechoic chamber. This thesis investigates the possibility of creating pseudo anechoic measurements in a general room using optimization with information extracted from measurement data in combination with linear time-varying (LTV) filtering. Algorithms for extracting information such as time delay between reflections as well as compensation for distortion in the reflections have been developed. This information is later used to minimize a cost function in order to obtain an estimation of the loudspeakers' impulse response using multiple measurements. The resulting estimation is then filtered using the LTV filter in order to obtain the pseudo anechoic impulse response. This thesis investigates two different loudspeakers in two ordinary rooms as well as in an anechoic chamber, and evaluates the performance of the developed methods. The overall results seem promising, but due to some inconsistencies of the measurements taken in the anechoic chamber that changes the direct wave of the loudspeakers, the developed methods are unable to achieve a true anechoic impulse response.  It is concluded that to be able to achieve true pseudo anechoic results, measurements in rooms must better resemble the ones taken inside the anechoic chamber. This in combination with tuning the hyper parameters of the LTV filter looks promising to achieve pseudo anechoic impulse responses with high correlation to the true anechoic measurements.
8

A Unified Statistical Approach to Fast and Robust Multichannel Speech Separation and Dereverberation / 高速かつ頑健な多チャンネル音声分離・残響除去のための統合的・統計的アプローチ

Sekiguchi, Kouhei 23 March 2021 (has links)
京都大学 / 新制・課程博士 / 博士(情報学) / 甲第23309号 / 情博第745号 / 新制||情||127(附属図書館) / 京都大学大学院情報学研究科知能情報学専攻 / (主査)准教授 吉井 和佳, 教授 河原 達也, 教授 西野 恒, 教授 田中 利幸 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
9

Analyse de scène sonore multi-capteurs : un front-end temps-réel pour la manipulation de scène / Multi-sensor sound scene analysis : a real-time front-end for scene manipulation

Baque, Mathieu 09 June 2017 (has links)
La thèse s’inscrit dans un contexte d’essor de l’audio spatialisé (5.1, Dolby Atmos...). Parmi les formats audio 3D existants, l’ambisonie permet une représentation spatiale homogène du champ sonore et se prête naturellement à des manipulations : rotations, distorsion du champ sonore. L’objectif de cette thèse est de fournir un outil d’analyse et de manipulation de contenus audio (essentiellement vocaux) au format ambisonique. Un fonctionnement temps-réel et en conditions acoustiques réelles sont les principales contraintes à respecter. L’algorithme mis au point est basé sur une analyse en composantes indépendantes (ACI) appliquée trame à trame qui permet de décomposer le champ acoustique en un ensemble de contributions, correspondant à des sources (champ direct) ou à de la réverbération. Une étape de classification bayésienne, appliquée aux composantes extraites, permet alors l’identification et le dénombrement des sources sonores contenues dans le mélange. Les sources identifiées sont localisées grâce à la matrice de mélange obtenue par ACI, pour fournir une cartographie de la scène sonore. Une étude exhaustive des performances est menée sur des contenus réels en fonction de plusieurs paramètres : nombre de sources, environnement acoustique, longueur des trames, ou ordre ambisonique utilisé. Des résultats fiables en terme de localisation et de comptage de sources ont été obtenus pour des trames de quelques centaines de ms. L’algorithme, exploité comme prétraitement dans un prototype d’assistant vocal domestique, permet d’améliorer significativement les performances de reconnaissance, notamment en prise de son lointaine et en présence de sources interférentes. / The context of this thesis is the development of spatialized audio (5.1 contents, Dolby Atmos...) and particularly of 3D audio. Among the existing 3D audio formats, Ambisonics and Higher Order Ambisonics (HOA) allow a homogeneous spatial representation of a sound field and allows basics manipulations, like rotations or distorsions. The aim of the thesis is to provides efficient tools for ambisonics and HOA sound scene analyse and manipulations. A real-time implementation and robustness to reverberation are the main constraints to deal with. The implemented algorithm is based on a frame-by-frame Independent Component Analysis (ICA), wich decomposes the sound field into a set of acoustic contributions. Then a bayesian classification step is applied to the extracted components to identify the real sources and the residual reverberation. Direction of arrival of the sources are extracted from the mixing matrix estimated by ICA, according to the ambisonic formalism, and a real-time cartography of the sound scene is obtained. Performances have been evaluated in different acoustic environnements to assess the influence of several parameters such as the ambisonic order, the frame length or the number of sources. Accurate results in terms of source localization and source counting have been obtained for frame lengths of a few hundred milliseconds. The algorithm is exploited as a pre-processing step for a speech recognition prototype and allows a significant increasing of the recognition results, in far field conditions and in the presence of noise and interferent sources.
10

Deep learning methods for reverberant and noisy speech enhancement

Zhao, Yan 15 September 2020 (has links)
No description available.

Page generated in 0.0818 seconds