1 |
Time Delay Estimate Based Direction of Arrival Estimation for Speech in Reverberant Environments
Varma, Krishnaraj M. 11 November 2002
Time delay estimation (TDE)-based algorithms for estimating direction of arrival (DOA) have been the most popular choice for speech signals because of their simplicity and low computational requirements. Other algorithms, such as the steered response power with phase transform (SRP-PHAT), perform better than TDE-based algorithms, but the heavy computational load of SRP-PHAT makes it unsuitable for applications that require fast refresh rates using short frames. In addition, the estimation errors that do occur with SRP-PHAT tend to be large. Such behavior is unsuitable for an application such as video camera steering, which tolerates small errors far better than large ones.
We propose an improved TDE-based DOA estimation algorithm called time delay selection (TIDES) based on either minimizing the weighted least squares error (MWLSE) or minimizing the time delay separation (MWTDS). In the TIDES algorithm, we consider not only the maximum likelihood (ML) TDEs for each pair of microphones, but also other secondary delays corresponding to smaller peaks in the generalized cross-correlation (GCC). From these multiple candidate delays for each microphone pair, we form all possible combinations of time delay sets. From among these we pick one set based on one of the two criteria mentioned above and perform least squares DOA estimation using the selected set of time delays. The MWLSE criterion selects that set of time delays that minimizes the least squares error. The MWTDS criterion selects that set of time delays that has minimum distance from a statistically averaged set of time delays from previously selected time delays.
Both TIDES algorithms are shown to outperform the ML-TDE algorithm at moderate signal-to-reverberation ratios. In fact, TIDES-MWTDS gives fewer large errors than even the SRP-PHAT algorithm, which makes it well suited to video camera steering applications. At low signal-to-reverberation ratios, TIDES-MWTDS breaks down, but TIDES-MWLSE is still shown to outperform the algorithm based on ML-TDE. / Master of Science
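As a rough illustration of the selection idea described in this abstract, the following Python sketch computes a PHAT-weighted GCC for one microphone pair, keeps the lags of its k largest peaks as candidate delays, and then tries every combination of candidates across pairs, keeping the set with the smallest least-squares residual. It is not the thesis's implementation: the function names, the far-field model, the sign convention, and the unweighted residual are all assumptions made here for illustration (the MWLSE criterion itself is weighted).

```python
import numpy as np
from itertools import product


def gcc_phat(x, y, max_lag):
    """PHAT-weighted cross-correlation; a positive lag means y is delayed relative to x."""
    n = len(x) + len(y)                        # zero-pad so the correlation is linear
    cross = np.fft.rfft(y, n) * np.conj(np.fft.rfft(x, n))
    cross /= np.abs(cross) + 1e-12             # PHAT: keep the phase, drop the magnitude
    cc = np.fft.irfft(cross, n)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # lags -max_lag..+max_lag
    return np.arange(-max_lag, max_lag + 1), cc


def candidate_delays(x, y, max_lag, k=3):
    """Lags (in samples) of the k largest local GCC peaks, strongest first."""
    lags, cc = gcc_phat(x, y, max_lag)
    i = np.arange(1, len(cc) - 1)
    peaks = i[(cc[i] > cc[i - 1]) & (cc[i] >= cc[i + 1])]
    strongest = peaks[np.argsort(cc[peaks])[::-1][:k]]
    return lags[strongest]


def select_delay_set(baselines, candidates, c=343.0):
    """Pick one delay per pair so that the far-field least-squares residual is smallest.

    baselines:  (P, dim) mic-pair difference vectors in metres (hypothetical layout).
    candidates: list of P sequences of candidate delays in seconds.
    Assumed model: tau_p = baselines[p] . u / c for a direction vector u.
    """
    A = np.asarray(baselines, dtype=float) / c
    A_pinv = np.linalg.pinv(A)
    best = None
    for combo in product(*candidates):         # every combination of candidates
        tau = np.array(combo)
        u = A_pinv @ tau                       # least-squares direction estimate
        err = np.sum((A @ u - tau) ** 2)       # unweighted residual (assumption)
        if best is None or err < best[0]:
            best = (err, tau, u)
    return best[1], best[2]
```

The exhaustive search grows as k to the power of the number of pairs, so a sketch like this only makes sense for a few candidates per pair.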
|
2 |
Kinect em conjunto com o SRP-PHAT como solução de localização de fonte sonora [Kinect together with SRP-PHAT as a sound source localization solution]
Seewald, Lucas Adams 28 March 2014
Submitted by Maicon Juliano Schmidt (maicons) on 2015-07-06T18:20:56Z; made available in DSpace on 2015-07-06T18:20:56Z (GMT).
Previous issue date: 2014-01-31 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / PROSUP - Programa de Suporte à Pós-Graduação de Instituições de Ensino Particulares / This document evaluates the Kinect combined with SRP-PHAT as a sound source localization solution. A functional prototype able to communicate with the device and run SRP-PHAT was implemented in order to test the solution's accuracy. The fundamentals of sound source localization and its mathematical principles are reviewed, focusing specifically on SRP-PHAT. Moving on to the Kinect device, some considerations are made about its components and limitations. Related work that relies on the Kinect to localize sound sources is presented, followed by SRP-PHAT accuracy results obtained by different authors. Two sets of experiments were conducted, one focused on the properties of the sound source and the other on the quality of the proposed solution. The experiments cover two-dimensional and three-dimensional localization, with a second Kinect required for the latter. Implementation details of the software that controls both Kinects and runs the localization algorithm are described along with the test procedures adopted. The results show that the proposed solution can accurately point toward the direction of the source.
|
3 |
EXPERIMENTAL EVALUATION OF MODIFIED PHASE TRANSFORM FOR SOUND SOURCE DETECTION
Ramamurthy, Anand 01 January 2007
The detection of sound sources with microphone arrays can be enhanced by processing individual microphone signals prior to the delay-and-sum operation. One method in particular, the Phase Transform (PHAT), has demonstrated improvement in sound source location images, especially in reverberant and noisy environments. Recent work proposed a modification to the PHAT that allows varying degrees of spectral whitening through a single parameter, β, which has improved target detection in simulation results. This work focuses on experimental evaluation of the modified SRP-PHAT algorithm. Performance results are computed from an actual experimental setup of an 8-element perimeter array, with a receiver operating characteristic (ROC) analysis for detecting sound sources. The results verified the simulation results for PHAT-β in improving target detection probabilities. The ROC analysis demonstrated the relationships between target type (narrowband and broadband), room reverberation level (high and low), and noise level (different SNRs) with respect to the optimal β. The experimental results strongly agree with those of simulations on the effect of PHAT in significantly improving detection performance for narrowband and broadband signals, especially at low SNR and in the presence of high levels of reverberation.
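For reference, the partial whitening controlled by that single parameter (written β here) is commonly expressed as below. This is a sketch drawn from the general PHAT-β literature rather than a formula quoted from this abstract; the symbol X_i(ω) for the spectrum of microphone i and the range 0 ≤ β ≤ 1 are assumptions.

```latex
\hat{X}_i(\omega) = \frac{X_i(\omega)}{\lvert X_i(\omega) \rvert^{\beta}}, \qquad 0 \le \beta \le 1
```

With β = 1 this reduces to the standard PHAT, where the magnitude is discarded and only the phase is kept; with β = 0 the signal is left unwhitened.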
|
4 |
Phase Transform Time Delay Estimation to Counteract Spectral Haystacking Effects in Jet Exhaust Flow Measurements
Silas, Kevin Alexander 01 September 2021
This study determined a superior data processing technique for correlating an acoustic signal passing through a subsonic jet engine exhaust in order to estimate the traversal time of the signal. Thrust measurement is possible with enough time delay estimates across different portions of the exhaust. This preliminary study did not take the full array of data necessary to measure thrust, but did validate key aspects of the measurement process. The turbulent shear layers of the exhaust spectrally broaden the signal, creating the appearance of spectral "haystacks", making traditional correlation methods unworkable. An experiment was performed to evaluate the ability of a novel sound source to produce a signal from which a reliable and precise time delay estimate could be found. The test apparatus was installed on either side of a Honeywell TFE731-2 turbofan research engine exhaust cone, with the source and receivers placed near the jet exit plane. The signal was then directed across the jet exhaust. This flow environment is considered an extreme challenge for accurate acoustic signal propagation. A key contribution of this paper is the determination that the Phase Transform processor of the Generalized Cross-Correlation (GCC) method produces the most reliable time delay estimates, for the given signal and flow conditions. Several alternative time delay estimators and GCC processors were examined and evaluated on this data. A proposed explanation is provided for why this time delay estimation technique produces the most accurate results, as well as explanations for why the technique became less reliable as the flow environment became more challenging, with an observed 22% anomalous TDE selection rate for the N1Corr = 60% and N1Corr = 70% conditions combined, versus only 6% for the idle and N1Corr = 50% conditions combined. 
This paper also details the development and first use of a novel acoustic source that produces a two-tone narrowband signal emanating from a single point – the dual Hartmann generator. / Master of Science / This study builds on a Computational Tomography (CT) technique that uses an acoustic signal and an array of receivers to measure the velocity and temperature of a gas flow field. In particular, the velocity and temperature field tested involves multiple turbulent and disruptive elements, requiring a loud and specifically designed signal. As such, a novel acoustic signal generator, the dual Hartmann generator, was designed that is both loud and produces a specific two-toned signal. The key contribution of the study was to process the data, comparing the sets of transmitted and received signals, in order to estimate the time delay amongst receiver pairs – a key input in the CT method. Traditional cross-correlation methods were inadequate, and multiple alternatives were evaluated. The Phase Transform (PHAT) technique showed the most promise, and an explanation is given for why this technique is most suitable for this type of signal.
|
5 |
Water Depth Estimation Using Ultrasound Pulses for Handheld Diving Equipment / Skattning av vattendjup med ultraljudspulser för mobil dykarutrustning
Mollén, Katarina January 2015
This thesis studies the design and implementation of an ultrasonic water depth sounder. The depth sounder is implemented in a handheld smart console used by divers. Since the idea of echo sounding is to measure the flight time between transmitting the signal and receiving the echo, the main challenge is to find a time-of-flight (ToF) estimation method for a signal in noise. It should suit this specific application and be robust when implemented in the device. The thesis contains an investigation of suitable ToF methods. More detailed evaluations are made of the matched filter, also known as the correlation method, and the linear phase approach. Aspects such as pulse frequency and duration, the speed of sound in water, and underwater noise are taken into account. The ToF methods are evaluated through simulation and experiments. The matched filter approach is found suitable based on these simulations and on tests with signals recorded by the console. This verification leads to the implementation of the algorithm on the device. The algorithm is tested in real time, the results are evaluated, and improvements are suggested.
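The matched filter (correlation) approach named in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the thesis's device code: the function names, the nominal sound speed of 1500 m/s in water, and the idea that the recording starts at the moment of transmission are all assumptions.

```python
import numpy as np


def estimate_tof(received, pulse, fs):
    """Time of flight in seconds: the lag at which the matched-filter output peaks.

    Assumes the recording starts at the instant the pulse is transmitted.
    """
    # Correlating with the known pulse is the matched filter for white noise.
    mf = np.correlate(received, pulse, mode="valid")
    return np.argmax(np.abs(mf)) / fs


def depth_from_tof(tof, c=1500.0):
    """Depth in metres; the pulse travels down and back, so halve the path.

    c is an assumed nominal speed of sound in water.
    """
    return 0.5 * c * tof
```

Because a narrowband pulse has an oscillating autocorrelation, the peak can land one carrier period off in noise; taking the envelope of the filter output is a common refinement.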
|
6 |
Source Localization and Speech Enhancement for Speech Recognition for Real time Environment
Muhammad, Asim, Ali, Akbar January 2012
The popularity of speech communication is rapidly increasing in contexts such as conferencing systems, mobile and fixed electronic devices, and laptops, leading to heightened demand for new services and improved speech quality. Dictaphones used for dictation usually have one microphone. A single microphone does not provide enough degrees of freedom to estimate the location of the source. A microphone array uses multiple microphones for spatial filtering, suppressing background noise. This report aims at speech enhancement that exploits the benefits of microphone arrays to find the direction of the desired speaker and focus the listening beam in that direction. Generalized Cross Correlation (GCC) methods for locating the source are compared in a real office environment. Beamforming is implemented to make the microphone array listen in the desired direction, reducing interference from other sources. The Minimum Variance Distortionless Response (MVDR) approach is shown to give better results than simpler techniques. A perceptually based eigenfilter that incorporates human hearing models in the subspace, built into the suppressor, eliminates the residual noise. Objective system performance is evaluated by estimating signal-to-noise ratio improvement (SNRI), segmental SNR, signal degradation, and noise suppression. Perceptual Evaluation of Speech Quality (PESQ) gives a Mean Opinion Score for subjective evaluation. / asim_zolo@yahoo.com, akbarali45@gmail.com
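To make the MVDR idea concrete, here is a minimal narrowband Python sketch of the textbook weight formula w = R⁻¹d / (dᴴR⁻¹d). It is not taken from the report: the uniform linear array, the function names, and all parameter values in the usage note are assumptions chosen for illustration.

```python
import numpy as np


def steering_vector(theta, n_mics, spacing, freq, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry).

    theta is the arrival angle in radians measured from broadside.
    """
    m = np.arange(n_mics)
    return np.exp(-2j * np.pi * freq * m * spacing * np.sin(theta) / c)


def mvdr_weights(R, d):
    """MVDR weights: minimize output power subject to unit gain toward d."""
    Rinv_d = np.linalg.solve(R, d)          # R^{-1} d without forming the inverse
    return Rinv_d / (d.conj() @ Rinv_d)     # normalize so that w^H d = 1
```

For example, with `d = steering_vector(0.0, 4, 0.05, 2000.0)` and a covariance `R` built from sensor noise plus one interferer, `mvdr_weights(R, d)` keeps unit gain toward broadside while lowering the total output power compared with plain delay-and-sum weights `d / 4`.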
|
7 |
Multichannel audio processing for speaker localization, separation and enhancement
Martí Guerola, Amparo 29 October 2013
This thesis is related to the field of acoustic signal processing and its applications to emerging
communication environments. Acoustic signal processing is a very wide research area covering
the design of signal processing algorithms involving one or several acoustic signals to perform
a given task, such as locating the sound source that originated the acquired signals, improving
their signal to noise ratio, separating signals of interest from a set of interfering sources or recognizing
the type of source and the content of the message. Among the above tasks, Sound Source
localization (SSL) and Automatic Speech Recognition (ASR) have been especially addressed in
this thesis. In fact, the localization of sound sources in a room has received a lot of attention in
the last decades. Most real-world microphone array applications require the localization of one
or more active sound sources in adverse environments (low signal-to-noise ratio and high reverberation).
Some of these applications are teleconferencing systems, video-gaming, autonomous
robots, remote surveillance, hands-free speech acquisition, etc. Indeed, performing robust sound
source localization under high noise and reverberation is a very challenging task. One of the
most well-known algorithms for source localization in noisy and reverberant environments is
the Steered Response Power - Phase Transform (SRP-PHAT) algorithm, which constitutes the
baseline framework for the contributions proposed in this thesis. Another challenge in the design
of SSL algorithms is to achieve real-time performance and high localization accuracy with a reasonable
number of microphones and limited computational resources. Although the SRP-PHAT
algorithm has been shown to be an effective localization algorithm for real-world environments,
its practical implementation is usually based on a costly fine grid-search procedure, making the
computational cost of the method a real issue. In this context, several modifications and optimizations
have been proposed to improve its performance and applicability. An effective strategy
that extends the conventional SRP-PHAT functional is presented in this thesis. This approach
performs a full exploration of the sampled space rather than computing the SRP at discrete spatial
positions, increasing its robustness and allowing for a coarser spatial grid that reduces the
computational cost required in a practical implementation with a small hardware cost (reduced
number of microphones). This strategy makes it possible to implement real-time applications based on
location information, such as automatic camera steering or the detection of speech/non-speech
fragments in advanced videoconferencing systems.
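The grid search that makes conventional SRP-PHAT costly can be sketched as follows. This Python example shows only the conventional point-wise baseline, not the extended functional proposed in the thesis; the function names and array layouts are assumptions. For every candidate position it sums, over all microphone pairs, the PHAT-weighted cross-correlation value at the delay that position would produce.

```python
import numpy as np
from itertools import combinations


def gcc_phat_full(x, y):
    """PHAT-weighted cross-correlation; index k holds lag k, negative lags wrap around."""
    n = 2 * max(len(x), len(y))
    cross = np.fft.rfft(y, n) * np.conj(np.fft.rfft(x, n))
    cross /= np.abs(cross) + 1e-12          # keep the phase, drop the magnitude
    return np.fft.irfft(cross, n)


def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Conventional SRP-PHAT over a grid of candidate positions.

    signals: (M, N) samples, mic_pos: (M, dim) metres, grid: (G, dim) metres.
    Returns the grid point with the highest steered power and the power map.
    """
    power = np.zeros(len(grid))
    for i, j in combinations(range(len(signals)), 2):
        cc = gcc_phat_full(signals[i], signals[j])
        # Delay of mic j relative to mic i, in samples, for every grid point.
        dist_i = np.linalg.norm(grid - mic_pos[i], axis=1)
        dist_j = np.linalg.norm(grid - mic_pos[j], axis=1)
        lag = np.rint((dist_j - dist_i) * fs / c).astype(int)
        power += cc[lag % len(cc)]          # modulo maps negative lags to the tail
    return grid[np.argmax(power)], power
```

Because each grid point reads a single correlation sample per pair, a coarse grid can step over the narrow PHAT peak entirely, which is the weakness a coarser-grid strategy has to address.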
As stated before, besides the contributions related to SSL, this thesis is also related to the
field of ASR. This technology allows a computer or electronic device to identify the words spoken
by a person so that the message can be stored or processed in a useful way. ASR is used on
a day-to-day basis in a number of applications and services such as natural human-machine
interfaces, dictation systems, electronic translators and automatic information desks. However,
there are still some challenges to be solved. A major problem in ASR is to recognize people
speaking in a room by using distant microphones. In distant-speech recognition, the microphone
does not only receive the direct path signal, but also delayed replicas as a result of multi-path
propagation. Moreover, there are multiple situations in teleconferencing meetings when multiple
speakers talk simultaneously. In this context, when multiple speaker signals are present, Sound
Source Separation (SSS) methods can be successfully employed to improve ASR performance
in multi-source scenarios. This is the motivation behind the training method for multiple talk
situations proposed in this thesis. This training, which is based on a robust transformed model
constructed from separated speech in diverse acoustic environments, makes use of a SSS method
as a speech enhancement stage that suppresses the unwanted interferences. The combination
of source separation and this specific training has been explored and evaluated under different
acoustical conditions, leading to improvements of up to 35% in ASR performance. / Martí Guerola, A. (2013). Multichannel audio processing for speaker localization, separation and enhancement [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33101
|
8 |
Acoustic Beamforming : Design and Development of Steered Response Power With Phase Transformation (SRP-PHAT).
Dey, Ajoy Kumar, Saha, Susmita January 2011
Acoustic sound source localization using signal processing is required to estimate the direction from which a particular acoustic source signal arrives, and it is also important for hands-free communication. Video conferencing and hands-free communication are among the applications requiring acoustic sound source localization. These applications need a robust algorithm that can reliably localize and position acoustic sound sources. The Steered Response Power Phase Transform (SRP-PHAT) is an important and robust algorithm for localizing acoustic sound sources. However, its high computational complexity makes it unsuitable for real-time applications. This thesis describes the implementation of the SRP-PHAT algorithm as a function of source type, reverberation level, and ambient noise. Its main objective is to present different approaches to SRP-PHAT and to verify the algorithm with respect to acoustic environment, microphone array configuration, acoustic source position, and levels of reverberation and noise.
|
9 |
Lokalizace pohyblivých akustických zdrojů / Localization of moving acoustical sources
Bezdíček, Martin January 2010
This master's thesis deals with the localization of static acoustic sources (the semester project assignment) and moving acoustic sources (the master's thesis assignment) using microphone arrays. The first part covers general problems of localization. It then describes types of microphone arrays, simplifying assumptions that delimit the problem, and general information about room acoustics. The next part presents, step by step, methods for localizing acoustic sources. The practical part uses two classes of algorithms: steered-beamformer-based locators and TDOA-based locators. The last part presents the results of these algorithms.
|
10 |
Mikrofonní pole malých rozměrů pro odhad směru přicházejícího zvuku / Small-Size Microphone Array for Estimation of Direction of Arrival of Sound
Kubišta, Ladislav January 2020
This thesis describes the detection of the direction of arriving sound with a small-size microphone array. The work is based on analyzing methods of time delay estimation, energy decay, and signal phase difference. It focuses mainly on finding the angle of arrival when the time differences are small. Measurement results, both for programmatically generated sound and for sound recorded in laboratory conditions and in a real environment, are given in the conclusion. All calculations were done in Matlab.
|