Global ETD Search

1	Kinect em conjunto com o SRP-PHAT como solução de localização de fonte sonora [Kinect together with SRP-PHAT as a sound source localization solution] Seewald, Lucas Adams 28 March 2014 Submitted by Maicon Juliano Schmidt (maicons) on 2015-07-06T18:20:56Z No. of bitstreams: 1 Lucas Adams Seewald.pdf: 2650183 bytes, checksum: b48d406145d4e90aaf15d30b38b2ccbc (MD5) / Made available in DSpace on 2015-07-06T18:20:56Z (GMT). Previous issue date: 2014-01-31 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / PROSUP - Programa de Suporte à Pós-Graduação de Instituições de Ensino Particulares
/ This document presents an evaluation of the Kinect together with SRP-PHAT as a Sound Source Localization solution. A functional prototype able to communicate with the device and perform SRP-PHAT was implemented in order to test the solution’s accuracy. The fundamentals of Sound Source Localization and its mathematical principles are reviewed, focusing specifically on SRP-PHAT. Moving on to the Kinect device, some considerations are made about its components and limitations. Related work that resorts to the Kinect to localize sound sources is presented, followed by SRP-PHAT precision test results attained by different authors. Two sets of experiments were conducted, one focused on the properties of the source signal and the other on measuring the quality of the proposed solution. The experiments cover two-dimensional and three-dimensional localization, with a second Kinect needed for the latter. Implementation aspects of the software responsible for controlling both Kinects and executing the localization algorithm are described along with details of the experimental procedure. The results presented show that the proposed solution can accurately point in the direction of the source. SRP-PHAT Localização de Fonte Sonora Kinect Sound source localization
2	Time Delay Estimate Based Direction of Arrival Estimation for Speech in Reverberant Environments Varma, Krishnaraj M. 11 November 2002 Time delay estimation (TDE)-based algorithms for estimation of direction of arrival (DOA) have been most popular for use with speech signals. This is due to their simplicity and low computational requirements. Though other algorithms, like the steered response power with phase transform (SRP-PHAT), are available that perform better than TDE-based algorithms, the huge computational load required for this algorithm makes it unsuitable for applications that require fast refresh rates using short frames. In addition, the estimation errors that do occur with SRP-PHAT tend to be large. This kind of performance is unsuitable for an application such as video camera steering, which is much less tolerant to large errors than it is to small errors. We propose an improved TDE-based DOA estimation algorithm called time delay selection (TIDES) based on either minimizing the weighted least squares error (MWLSE) or minimizing the time delay separation (MWTDS). In the TIDES algorithm, we consider not only the maximum likelihood (ML) TDEs for each pair of microphones, but also other secondary delays corresponding to smaller peaks in the generalized cross-correlation (GCC). From these multiple candidate delays for each microphone pair, we form all possible combinations of time delay sets. From among these we pick one set based on one of the two criteria mentioned above and perform least squares DOA estimation using the selected set of time delays. The MWLSE criterion selects the set of time delays that minimizes the least squares error. The MWTDS criterion selects the set of time delays that has minimum distance from a statistically averaged set of time delays from previously selected time delays. Both TIDES algorithms are shown to outperform the ML-TDE algorithm in moderate signal-to-reverberation ratios. 
In fact, TIDES-MWTDS gives fewer large errors than even the SRP-PHAT algorithm, which makes it very suitable for video camera steering applications. In environments with a low signal-to-reverberation ratio, TIDES-MWTDS breaks down, but TIDES-MWLSE is still shown to outperform the algorithm based on ML-TDE. / Master of Science MUSIC Beamformer Microphone array processing Least squares estimate TDE SRP-PHAT PHAT GCC
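The selection step described in entry 2 (several candidate delays per microphone pair, all combinations formed, one set chosen by least squares error) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes a 2-D far-field source, an unweighted least squares fit, a speed of sound of 343 m/s, and the sign convention that each delay equals the pair's baseline vector dotted with the direction vector, divided by the speed of sound. The function names `least_squares_direction` and `select_delay_set` are hypothetical, and the MWTDS criterion is not covered.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the thesis


def least_squares_direction(baselines, delays, c=SPEED_OF_SOUND):
    """Fit a 2-D direction u minimizing sum((b . u - c * tau)^2).

    baselines: (dx, dy) vector per microphone pair, in metres.
    delays: one time delay per pair, in seconds (assumed tau = b . u / c).
    Returns ((ux, uy), residual).
    """
    # Normal equations (B^T B) u = B^T r for the 2x2 case, solved in closed form.
    a11 = sum(bx * bx for bx, _ in baselines)
    a12 = sum(bx * by for bx, by in baselines)
    a22 = sum(by * by for _, by in baselines)
    ranges = [c * t for t in delays]
    b1 = sum(bx * r for (bx, _), r in zip(baselines, ranges))
    b2 = sum(by * r for (_, by), r in zip(baselines, ranges))
    det = a11 * a22 - a12 * a12
    ux = (a22 * b1 - a12 * b2) / det
    uy = (a11 * b2 - a12 * b1) / det
    residual = sum((bx * ux + by * uy - r) ** 2
                   for (bx, by), r in zip(baselines, ranges))
    return (ux, uy), residual


def select_delay_set(baselines, candidates, c=SPEED_OF_SOUND):
    """Try every combination of candidate delays (one per pair) and keep
    the combination whose least squares fit leaves the smallest residual."""
    best = None
    for combo in itertools.product(*candidates):
        direction, residual = least_squares_direction(baselines, combo, c)
        if best is None or residual < best[2]:
            best = (combo, direction, residual)
    return best
```

With three pairs and two candidates each, `select_delay_set` evaluates eight combinations. The count grows exponentially with the number of pairs, which is worth keeping in mind for larger arrays.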
3	Source Localization and Speech Enhancement for Speech Recognition for Real time Environment Muhammad, Asim, Ali, Akbar January 2012 The popularity of speech communication is rapidly increasing in various contexts such as conferencing systems, mobile/fixed electronic devices and laptops, leading to a heightened demand for new services and improved speech quality. Dictaphones used for dictation usually have one microphone. A single microphone does not provide enough degrees of freedom to estimate the location of the source. A microphone array uses multiple microphones for spatial filtering, which suppresses the background noise. This report aims at speech enhancement, using the benefits inherent in microphone arrays to find the direction of the desired speaker and focus the listening beam in that direction. A comparison is made between Generalized Cross Correlation (GCC) methods for locating the source in a real office environment. Beamforming is implemented to make the microphone array listen in the desired direction, thus reducing the interference from other sources. The Minimum Variance Distortionless Response (MVDR) approach is shown to give better results than simpler techniques. A perceptually based eigenfilter, which incorporates human hearing models in the subspace domain, is built into the suppressor to eliminate the residual noise. Objective system performance is evaluated by estimating Signal-to-Noise-Ratio improvement (SNRI), segmental SNR, signal degradation and noise suppression. Perceptual Evaluation of Speech Quality (PESQ) gives a Mean Opinion Score for subjective evaluation. Beamforming Localization Lapped Transform SRP-PHAT MVDR Subspace Suppression PESQ Computer Sciences Datavetenskap (datalogi) Signal Processing Signalbehandling
4	Multichannel audio processing for speaker localization, separation and enhancement Martí Guerola, Amparo 29 October 2013 This thesis is related to the field of acoustic signal processing and its applications to emerging communication environments. Acoustic signal processing is a very wide research area covering the design of signal processing algorithms involving one or several acoustic signals to perform a given task, such as locating the sound source that originated the acquired signals, improving their signal-to-noise ratio, separating signals of interest from a set of interfering sources or recognizing the type of source and the content of the message. Among the above tasks, Sound Source Localization (SSL) and Automatic Speech Recognition (ASR) have been especially addressed in this thesis. In fact, the localization of sound sources in a room has received a lot of attention in recent decades. Most real-world microphone array applications require the localization of one or more active sound sources in adverse environments (low signal-to-noise ratio and high reverberation). Some of these applications are teleconferencing systems, video-gaming, autonomous robots, remote surveillance, hands-free speech acquisition, etc. Indeed, performing robust sound source localization under high noise and reverberation is a very challenging task. One of the most well-known algorithms for source localization in noisy and reverberant environments is the Steered Response Power - Phase Transform (SRP-PHAT) algorithm, which constitutes the baseline framework for the contributions proposed in this thesis. Another challenge in the design of SSL algorithms is to achieve real-time performance and high localization accuracy with a reasonable number of microphones and limited computational resources. 
Although the SRP-PHAT algorithm has been shown to be an effective localization algorithm for real-world environments, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue. In this context, several modifications and optimizations have been proposed to improve its performance and applicability. An effective strategy that extends the conventional SRP-PHAT functional is presented in this thesis. This approach performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid that reduces the computational cost required in a practical implementation with a small hardware cost (reduced number of microphones). This strategy makes it possible to implement real-time applications based on location information, such as automatic camera steering or the detection of speech/non-speech fragments in advanced videoconferencing systems. As stated before, besides the contributions related to SSL, this thesis is also related to the field of ASR. This technology allows a computer or electronic device to identify the words spoken by a person so that the message can be stored or processed in a useful way. ASR is used on a day-to-day basis in a number of applications and services such as natural human-machine interfaces, dictation systems, electronic translators and automatic information desks. However, there are still some challenges to be solved. A major problem in ASR is to recognize people speaking in a room by using distant microphones. In distant-speech recognition, the microphone receives not only the direct-path signal but also delayed replicas as a result of multi-path propagation. Moreover, there are many situations in teleconferencing meetings in which multiple speakers talk simultaneously. 
In this context, when multiple speaker signals are present, Sound Source Separation (SSS) methods can be successfully employed to improve ASR performance in multi-source scenarios. This is the motivation behind the training method for multiple-talker situations proposed in this thesis. This training, which is based on a robust transformed model constructed from separated speech in diverse acoustic environments, makes use of an SSS method as a speech enhancement stage that suppresses the unwanted interference. The combination of source separation and this specific training has been explored and evaluated under different acoustical conditions, leading to improvements of up to 35% in ASR performance. / Martí Guerola, A. (2013). Multichannel audio processing for speaker localization, separation and enhancement [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33101 Sound source localization Sound source separation SRP-PHAT Microphone array Speaker detection Automatic speech recognition. TEORIA DE LA SEÑAL Y COMUNICACIONES
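The conventional grid search that entry 4 takes as its baseline can be sketched as follows: for every candidate position, the GCC-PHAT value of each microphone pair is read at the delay that position would produce, and the position with the largest sum wins. This is only an illustrative sketch, not the thesis's extended functional. It assumes 2-D candidate points, integer-sample delays, circular correlation on a single short frame, and a naive O(n^2) DFT so that only the Python standard library is needed. The names `dft`, `gcc_phat` and `srp_phat` are hypothetical.

```python
import cmath
import itertools
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(spec_a, spec_b):
    """Circular GCC-PHAT from two spectra. Index (lag % n) holds the value
    for 'signal a lags signal b by `lag` samples'."""
    cross = [a * b.conjugate() for a, b in zip(spec_a, spec_b)]
    # PHAT weighting: keep only the phase of the cross-spectrum.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    return [v.real for v in dft(whitened, inverse=True)]


def srp_phat(signals, mics, candidates, fs, c=343.0):
    """Score each candidate (x, y) point by summing pairwise GCC-PHAT values
    at the lags that point implies. Returns (best_candidate, scores)."""
    n = len(signals[0])
    spectra = [dft(s) for s in signals]
    pairs = list(itertools.combinations(range(len(mics)), 2))
    gcc = {(i, j): gcc_phat(spectra[i], spectra[j]) for i, j in pairs}
    scores = []
    for p in candidates:
        dist = [math.dist(p, m) for m in mics]
        score = 0.0
        for i, j in pairs:
            # Expected lag of microphone i relative to j, in whole samples.
            lag = round((dist[i] - dist[j]) * fs / c)
            score += gcc[(i, j)][lag % n]
        scores.append(score)
    best = candidates[scores.index(max(scores))]
    return best, scores
```

The cost is one table lookup per pair per candidate, so a fine grid multiplies the work directly. That is the computational issue the abstract refers to. A practical version would replace the naive DFT with an FFT and handle fractional delays.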
5	Acoustic Beamforming : Design and Development of Steered Response Power With Phase Transformation (SRP-PHAT). Dey, Ajoy Kumar, Saha, Susmita January 2011 Acoustic sound source localization using signal processing is required to estimate the direction from which a particular acoustic source signal arrives, and it is also important for finding a solution for hands-free communication. Video conferencing and hands-free communication are among the applications that require acoustic sound source localization. These applications need a robust algorithm that can reliably localize and position the acoustic sound sources. The Steered Response Power Phase Transform (SRP-PHAT) is an important and robust algorithm for localizing acoustic sound sources. However, the algorithm has a high computational complexity, which makes it unsuitable for real-time applications. This thesis describes the implementation of the SRP-PHAT algorithm as a function of source type, reverberation level and ambient noise. Its main objective is to present different approaches to SRP-PHAT in order to verify the algorithm in terms of acoustic environment, microphone array configuration, acoustic source position and levels of reverberation and noise. Acoustic Beamforming SRP-PHAT Sound Source Localization Source Position Microphone Array algorithm Signal Processing Signalbehandling Computer Sciences Datavetenskap (datalogi) Telecommunications Telekommunikation
