Global ETD Search

1	Moving Sound Sources Direction of Arrival Classification Using Different Deep Learning Schemes. Rusrus, Jana. 19 April 2023. Sound source localization is an important task for several applications, and the use of deep learning for this task has recently become a popular research topic. While the majority of previous work has focused on static sound sources, in this work we evaluate the performance of a deep learning classification system for the localization of high-speed moving sound sources. In particular, we systematically evaluate the effect of a wide range of parameters at three levels: data generation (e.g., acoustic conditions), feature extraction (e.g., STFT parameters), and model training (e.g., neural network architectures). We evaluate performance in terms of precision, recall, F-score, and confusion matrix in a multi-class, multi-label classification framework. We used four different deep learning models: feedforward neural networks, recurrent neural networks, gated recurrent networks, and temporal convolutional neural networks. We showed that (1) the presence of some reverberation in the training dataset can help achieve better detection of the direction of arrival of acoustic sources, (2) window size does not affect performance for static sources but strongly affects performance for moving sources, (3) sequence length has a significant effect on the performance of recurrent neural network architectures, (4) temporal convolutional neural networks can outperform both recurrent and feedforward networks for moving sound sources, (5) training and testing on white noise is easier for the network than training on speech data, and (6) increasing the number of elements in the microphone array improves the performance of the direction of arrival estimation. Keywords: Direction of arrival; Deep learning; Sound source localization
2	Kinect em conjunto com o SRP-PHAT como solução de localização de fonte sonora / Kinect together with SRP-PHAT as a sound source localization solution. Seewald, Lucas Adams. 28 March 2014. Previous issue date: 2014-01-31. Funding: CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior); PROSUP (Programa de Suporte à Pós-Graduação de Instituições de Ensino Particulares). This document presents an evaluation of the Kinect together with SRP-PHAT as a Sound Source Localization solution. A functional prototype able to communicate with the device and perform SRP-PHAT was implemented in order to test the solution's accuracy. The fundamentals of Sound Source Localization and its mathematical principles are reviewed, focusing specifically on SRP-PHAT. Moving on to the Kinect device, some considerations are made about its components and limitations. Related work that relies on the Kinect to localize sound sources is presented, followed by SRP-PHAT precision results obtained by different authors. Two sets of experiments were conducted, one focused on the properties of the source signal and the other on measuring the quality of the proposed solution. The experiments cover two-dimensional and three-dimensional localization, with a second Kinect required for the latter. Implementation aspects of the software that controls both Kinects and executes the localization algorithm are described along with details of the experimental procedure. The results presented show that the proposed solution can accurately point in the direction of the source. Keywords: SRP-PHAT; Kinect; Sound source localization
3	Servostyrning med binaural ljudlokalisering / Servo Control Using Binaural Sound Source Localization. Jansson, Conny. January 2015. People usually face each other in conversation, which makes it easier to hear what is being said. Algorithms for voice and speech recognition benefit in a similar way when the microphone is directed towards the sound source. In this electronics thesis, a servo control with binaural sound localization has therefore been implemented on a microcontroller connected to two microphones. When people perceive sound, the brain can estimate the direction of the sound source by comparing the time at which the sound reaches one ear with the time at which it reaches the other [1]. This difference in time is called the interaural time difference, and it can be calculated using various techniques. An exploratory comparison between cross-correlation and cross-spectrum analysis was carried out before implementation, and the advantages and disadvantages of each technique were evaluated. The result is a functioning servo control that uses a cross-correlation algorithm to calculate the interaural time difference and steers a servo motor towards the sound source with a P-regulated error reduction method. The project was implemented on the ATmega328P microcontroller from Atmel without using floating-point calculations. The thesis was carried out on behalf of the company Jetspark Robotics. Keywords: binaural sound localization; cross-correlation; sound source localization; interaural time delay; Jetspark Robotics
4	STATISTICAL MODELS FOR CONSTANT FALSE-ALARM RATE THRESHOLD ESTIMATION IN SOUND SOURCE DETECTION SYSTEMS. Saghaian Nejad Esfahani, Sayed Mahdi. 01 January 2010. Constant False Alarm Rate (CFAR) processors are important for applications where thousands of detection tests are made per second, such as in radar. This thesis introduces a new method for CFAR threshold estimation that is particularly applicable to sound source detection with distributed microphone systems. The novel CFAR processor exploits the near symmetry about 0 of the acoustic pixel values created by steered-response coherent power, in conjunction with a partial whitening preprocessor, to estimate thresholds for positive values, which represent potential targets. To remove the low-frequency components responsible for degrading CFAR performance, fixed and adaptive high-pass filters are applied. A relation between the minimum high-pass cut-off frequency and the microphone geometry is proposed and tested. Experimental results for linear, perimeter, and planar arrays illustrate that, for desired false alarm (FA) probabilities ranging from 10^-1 to 10^-6, good CFAR performance can be achieved by modeling the coherent power with Chi-square and Weibull distributions, and that the ratio of desired to experimental FA probabilities can be kept within an order of magnitude. Keywords: Sound Source Localization; CFAR Processor; High-pass Filter; Chi-square Distribution; Weibull Distribution; Electrical and Computer Engineering
5	Dynamic Spatial Hearing by Human and Robot Listeners. January 2015. This study consisted of several related projects on dynamic spatial hearing by both human and robot listeners. The first experiment investigated the maximum number of sound sources that human listeners could localize at the same time. Speech stimuli were presented simultaneously from different loudspeakers at multiple time intervals. The maximum number of perceived sound sources was close to four. The second experiment asked whether the amplitude modulation of multiple static sound sources could lead to the perception of auditory motion. On the horizontal and vertical planes, four independent noise sources with 60° spacing were amplitude modulated with consecutively larger phase delays. At lower modulation rates, motion could be perceived by human listeners in both cases. The third experiment asked whether several sources at static positions could serve as "acoustic landmarks" to improve the localization of other sources. Four continuous speech sources were placed on the horizontal plane with 90° spacing and served as the landmarks. The task was to localize a noise that was played for only three seconds while the listener was passively rotated in a chair in the middle of the loudspeaker array. The human listeners were better able to localize the sound sources with landmarks than without. The remaining experiments used an acoustic manikin in an attempt to fuse binaural recordings and motion data to localize sound sources. A dummy head with recording devices was mounted on top of a rotating chair and motion data were collected. The fourth experiment showed that an Extended Kalman Filter could be used to localize sound sources in a recursive manner. The fifth experiment demonstrated the use of a fitting method for separating multiple sound sources. Dissertation/Thesis; Doctoral Dissertation, Speech and Hearing Science, 2015. Keywords: Behavioral sciences; Acoustics; Robotics; machine hearing; sound source localization; spatial hearing
6	Sound localization for human interaction in real environment. Strömberg, Ralf; Svensson, Stig-Åke. January 2011. For a robot to succeed at speech recognition, it is advantageous to have a strong and clear signal to interpret. To facilitate this, the robot can steer towards and aim at the sound source to get a clearer signal; to do this, a sound source localization system is required. If the robot turns towards the speaker, this also gives a more natural feeling when a human interacts with the robot. To determine where the sound source is positioned, an angle relative to the microphone pair is calculated using the interaural time difference (ITD), which is the difference in the time of arrival of the sound between the two microphones. To achieve good results, the microphone signals need to be preprocessed, and there are also different algorithms for calculating the time difference, which are investigated in this thesis. The results presented in this work come from tests with an emphasis on real-time systems, involving noisy environments and response time. The results show the complexity of the balance between computational time and precision. Keywords: Cross-correlation; ITD; Fourier transform; Sound source localization; Computer Sciences
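Illustrative note for entry 6: the abstract describes computing an angle relative to a microphone pair from the ITD but gives no formulas. The sketch below is one common way to do this and is not taken from the thesis. It assumes a far-field source, a speed of sound of 343 m/s, and a brute-force time-domain cross-correlation; the function names `estimate_itd` and `itd_to_angle` and all parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C


def estimate_itd(left, right, fs, max_lag):
    """Estimate how many seconds `right` lags `left` (negative if it leads).

    Brute-force cross-correlation over lags in [-max_lag, max_lag] samples.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs


def itd_to_angle(itd, mic_distance):
    """Convert an ITD to a far-field arrival angle in degrees (0 = broadside)."""
    ratio = SPEED_OF_SOUND * itd / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # guard against |ratio| > 1 caused by noise
    return math.degrees(math.asin(ratio))
```

The brute-force loop costs on the order of `n * max_lag` multiplications per frame, which hints at the trade-off between computation time and precision that the abstract mentions; FFT-based correlation is a common faster alternative.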
7	Sound Source Localization for an Urban Outdoor Setting : A Systematic Review / Ljudlokalisering för en urban utomhusmiljö : En systematisk litteraturstudie. Malmgren, Anna. January 2022. Sound source localization (SSL) is a broad field with many important application areas. In outdoor environments, SSL systems can, among other things, be used to increase citizens' safety by detecting and locating abnormal sounds such as gunshots or screams. Localization is a complex field: in an outdoor setting, the sound signal is affected by weather conditions, noise, and objects blocking the propagation path. Furthermore, challenges remain concerning cost-effective algorithms, robustness, accuracy, and the balancing of trade-offs. SSL is a field of intense research, and new studies are continuously published. However, to the best of the author's knowledge, there are no recent reviews of state-of-the-art SSL solutions applicable in an outdoor urban setting. Hence, this study provides a knowledge base of current SSL approaches intended for that environment, and to this end a systematic literature review was performed. The review covered a total of 43 studies published between 2017 and 2021. From the extracted data, a taxonomy of currently seen design principles was developed. Additionally, both the applied measurement techniques and the positioning methods were defined. The results show that classical methods such as direction of arrival and time difference of arrival are still the most used principles in research. However, learning-based approaches have seemingly started to attract more attention. Furthermore, a general description of the SSL approaches is presented. Thus, the knowledge base provided by this study contains both information on which current state-of-the-art techniques are most commonly adopted and the basic ideas behind these principles. Keywords: sound source localization; outdoor localization; state of the art; SSL; Computer Sciences
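8	Improvement of Sound Source Localization for a Binaural Robot of Spherical Head with Pinnae / 耳介付球状頭部を持つ両耳聴ロボットのための音源定位の高性能化. Kim, Ui-Hyun. 24 September 2013. Kyoto University / 0048 / New-system course doctorate / Doctor of Informatics / 甲第17928号 / 情博第510号 / 新制||情||90 (University Library) / 30748 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 奥乃博 (Hiroshi Okuno), Professor 河原達也 (Tatsuya Kawahara), Professor 山本章博 (Akihiro Yamamoto) / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM. Keywords: Intelligent robot audition; human-robot interaction; voice activity detection; sound source localization; front-back disambiguation; 007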
9	Sound Source Localization and Beamforming for Teleconferencing Solutions. Kjellson, Angelica. January 2014. In teleconferencing, audio quality is key to conducting successful meetings. The conference room setting imposes various challenges on the speech signal processing, such as noise and interfering signals, reverberation, or participants positioned far from the telephone unit. This work aims to improve the received speech signal of a conference telephone by implementing sound source localization and beamforming. The implemented microphone array signal processing techniques are compared with an existing multi-microphone solution and evaluated under various conditions using a planar uniform circular array. Recordings of test sequences for the evaluation were made using a custom-built array mockup. The implemented algorithms did not perform well enough to justify their increased computational complexity compared with the existing solution. Moreover, increasing the number of microphones was found to have little or no effect on the performance of the methods. The type of microphone used was, however, found to affect performance, and a subjective listening evaluation indicated a preference for omnidirectional microphones, a difference that is recommended for further investigation. Keywords: statistical digital signal processing; microphone array signal processing; sound source localization (SSL); beamforming; uniform circular array (UCA); teleconferencing
10	CONSTANT FALSE ALARM RATE PERFORMANCE OF SOUND SOURCE DETECTION WITH TIME DELAY OF ARRIVAL ALGORITHM. Wang, Xipeng. 01 January 2017. Time Delay of Arrival (TDOA) based algorithms and Steered Response Power (SRP) based algorithms are the two most commonly used methods for sound source detection and localization. SRP is more robust under high reverberation and multi-target conditions, while TDOA is less computationally intensive. This thesis introduces a modified TDOA algorithm, TDOA delay table search (TDOA-DTS), that has more stable performance than the original TDOA and requires only 4% of the SRP computation load for a 3-dimensional space of a typical room. A 2-step adaptive thresholding procedure is used for the final detection, based on a Weibull distribution of noise peaks for the cross-correlations and a binomial distribution for combining potential peaks over all microphone pairs. The first threshold limits the potential target peaks in the microphone-pair cross-correlations to a user-defined false-alarm (FA) rate. The initial false-positive peak rate can be set to a higher level than desired for the final FA target rate, so that high accuracy is not required of the probability distribution model (model errors then do not affect FA rates as much as they would for a threshold set deep in the tail of the curve). The final FA rate can be lowered to the actual desired value using an M out of N (MON) rule on significant correlation peaks from different microphone pairs associated with a point in the space of interest. The algorithm is tested with simulated and real recorded data to verify that the resulting FA rates are consistent with the user-defined rates down to 10^-6. Keywords: Sound Source Localization; Multiple Targets Detection; Time Delay of Arrival; Constant False Alarm Rate; M out of N Rule; Signal Processing
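Illustrative note for entry 10: the abstract states that an M out of N rule with a binomial distribution lowers the final false-alarm rate, but it does not give the formula. The sketch below shows the standard binomial tail calculation under two assumptions that are not from the thesis: every microphone pair has the same per-pair false-alarm probability, and the pairs are statistically independent. The function name `mon_false_alarm` and its parameters are hypothetical.

```python
from math import comb


def mon_false_alarm(p_pair, n_pairs, m_required):
    """Probability that at least `m_required` of `n_pairs` independent pairs
    each raise a false peak, when each does so with probability `p_pair`."""
    return sum(
        comb(n_pairs, k) * p_pair**k * (1.0 - p_pair) ** (n_pairs - k)
        for k in range(m_required, n_pairs + 1)
    )


# Example: a loose per-pair rate of 1e-2 combined with a 3-out-of-6 rule
final_rate = mon_false_alarm(p_pair=1e-2, n_pairs=6, m_required=3)
```

Under these assumptions, a per-pair rate of 10^-2 combined with a 3-out-of-6 rule gives a final rate of roughly 2 × 10^-5, which illustrates how the first threshold can stay in a well-modeled part of the distribution while the combination rule supplies most of the reduction.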
