11 |
Playable ambisonic spatial motion: music performance techniques and mappings for the extended bassoon
Cannon, Joanne January 2009
This research dissertation presents work undertaken to develop new performance techniques and mappings for the expressive control of spatial motion using Ambisonic projection. The dissertation reviews relevant research from the fields of Spatial Sound and Extended Instruments, and establishes playability as a useful set of criteria for a reflexive project methodology and evaluation. This reflexive research systematically investigates Trevor Wishart’s taxonomy of spatial motions through the development of new hardware, software, performance techniques and spatial motion analysis.
|
12 |
Evaluation of ambisonic microphone techniques in conjunction with spot-microphones for 360-degree video within an acoustic environment
Sjöholm, Linus January 2023
In recent years, 360-degree video paired with first-order ambisonic audio has grown in popularity on social media platforms. As a result, many new ambisonic microphones have been developed and are now available on the market. However, research in this field has consisted almost exclusively of case studies in which microphone manufacturers showcase practical applications of their equipment, and no real comparisons between ambisonic recording methods have been made. This study aims to fill that gap by conducting a listening test that compares four common methods of recording ambisonic audio and evaluates listeners’ preferences regarding spatial attributes. Because of the relatively small sample size of 15, no definitive conclusions can be drawn, but the study did find a clear preference for a combined method of an ambisonic microphone paired with spot microphones.
|
13 |
Spatial Audio for Bat Biosonar
Lee, Hyeon 24 August 2020
Research investigating the behavioral and physiological responses of bats to echoes typically includes analysis of acoustic signals from microphones and/or microphone arrays, using time difference of arrival (TDOA) between array elements or the microphones to locate flying bats (azimuth and elevation). This has provided insight into transmission adaptations with respect to target distance, clutter, and interference. Microphones recording transmitted signals and echoes near a stationary bat provide sound pressure as a function of time but no directional information.
This dissertation introduces spatial audio techniques to bat biosonar studies as a complement to the current TDOA-based acoustical methods. It proposes two feasible methods based on spatial audio techniques that both track bats in flight and pinpoint the directions of echoes received by a bat. A spatial audio (soundfield) microphone array is introduced to measure sounds in the sonar frequency range (20-80 kHz) of the big brown bat (Eptesicus fuscus). The custom-built ultrasonic tetrahedral soundfield microphone consists of four capacitive microphones that were calibrated to match magnitude and phase responses using a transfer function approach. Ambisonics, a signal processing technique used in three-dimensional (3D) audio applications, is used for the basic processing and reproduction of the signals measured by the soundfield microphone. Ambisonics synthesizes and decomposes a signal together with its directional properties, using the relationship between the spherical harmonics and those directional properties.
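To illustrate how four tetrahedral capsule signals carry directional information, here is a minimal sketch of the textbook first-order conversion from capsule signals (A-format) to ambisonic components W, X, Y, Z (B-format), followed by a simple direction estimate from time-averaged products of W with X, Y and Z. It is not the dissertation's processing chain: the capsule layout, the ideal coincident cardioid model in `simulate_capsules`, and all function names are assumptions made for illustration, and the capsule spacing and calibration that the abstract discusses are ignored.

```python
import math

# Assumed idealised capsule axes of a tetrahedral array:
# front-left-up, front-right-down, back-left-down, back-right-up.
_S = 1.0 / math.sqrt(3.0)
CAPSULE_AXES = {
    "FLU": (_S, _S, _S),
    "FRD": (_S, -_S, -_S),
    "BLD": (-_S, _S, -_S),
    "BRU": (-_S, -_S, _S),
}


def a_to_b(flu, frd, bld, bru):
    """Convert four A-format capsule signals to first-order B-format (W, X, Y, Z)."""
    w = [a + b + c + d for a, b, c, d in zip(flu, frd, bld, bru)]
    x = [a + b - c - d for a, b, c, d in zip(flu, frd, bld, bru)]
    y = [a - b + c - d for a, b, c, d in zip(flu, frd, bld, bru)]
    z = [a - b - c + d for a, b, c, d in zip(flu, frd, bld, bru)]
    return w, x, y, z


def estimate_direction(w, x, y, z):
    """Return (azimuth, elevation) in degrees from the summed products W*X, W*Y, W*Z."""
    ix = sum(wi * xi for wi, xi in zip(w, x))
    iy = sum(wi * yi for wi, yi in zip(w, y))
    iz = sum(wi * zi for wi, zi in zip(w, z))
    azimuth = math.degrees(math.atan2(iy, ix))
    elevation = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azimuth, elevation


def simulate_capsules(signal, azimuth_deg, elevation_deg):
    """Hypothetical test signal: ideal coincident cardioids hit by one plane wave."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    u = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    capsules = {}
    for name, axis in CAPSULE_AXES.items():
        gain = 0.5 * (1.0 + sum(a * b for a, b in zip(axis, u)))
        capsules[name] = [gain * s for s in signal]
    return capsules
```

With a single simulated source, passing the output of `simulate_capsules` through `a_to_b` and `estimate_direction` recovers the azimuth and elevation that were put in; a real ultrasonic array would additionally need the capsule matching described above.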
As the first proposed method, a spatial audio decoding technique called HARPEx (High Angular Resolution Planewave Expansion) was used to build a system providing angle and elevation estimates. HARPEx can estimate the directions of arrival (DOA) of up to two simultaneous sources because it decomposes a signal into two dominant plane waves. Experiments showed that the HARPEx-based estimation system provides accurate DOA estimates for static and moving sources. It also reconstructed a smooth flight path of a bat by accurately estimating its direction at each snapshot of pulse measurements in time. The performance of the system was also assessed using statistical analyses of simulations. A signal model was built to generate microphone capsule responses to a virtual source emitting a linear frequency-modulated (LFM) signal (3 ms, two harmonics: 40-22 kHz and 80-44 kHz) at an angle of 30° in the simulations. The medians and RMSEs (root-mean-square errors) of 10,000 simulations for each case represent the accuracy and precision of the estimates, respectively. The results show that a smaller d (the distance between a capsule and the center of the soundfield microphone) and/or a higher SNR (signal-to-noise ratio) is required to achieve higher estimator performance. The Cramér-Rao lower bounds (CRLB) of the estimator were also computed for various d and SNR conditions. The CRLB, which is derived for TDOA-based methods, does not account for the effects on the estimation system of different incident angles at the capsules or of signal delays between the capsules caused by a non-zero d. This shows that the CRLB is not a suitable tool for assessing the estimator's performance.
For the second proposed method, the matched-filter technique is used instead of HARPEx to build another estimation system. The signal processing algorithm, based on Ambisonics and the matched-filter approach, reproduces a measured signal in various directions and computes the matched-filter responses of the reproduced signals as time series. The matched-filter result indicates the target(s) by the highest filter response. This is a sonar-like estimation system that provides information about the target (range, direction, and velocity) using sonar fundamentals. Experiments using a loudspeaker (emitter) and an artificial or natural target (either stationary or moving) show that the system provides accurate estimates of the target's direction and range. Simulations imitating a situation in which a bat emits a pulse and receives an echo from a target (30°) were also performed. The echo sound level is determined using the sonar equation. The system processed the virtual bat pulse and echo, and accurately estimated the direction, range, and velocity of the target. The simulation results also suggest that an echo level above -3 dB is needed for accurate and precise estimates (below 15% RMSE for all parameters).
This work proposes two methods to track bats in flight and/or pinpoint the directions of targets using spatial audio techniques. The suggested methods provide accurate estimates of the direction, range, and/or velocity of a bat based on its pulses, or of a target based on its echoes. This demonstrates that these methods can be used as key tools to reconstruct bat biosonar. They could also serve as an independent tool or as a complement to TDOA-based methods in bat echolocation studies. The developed methods are also believed to be useful for improving man-made sonar technology. / Doctor of Philosophy / While bats are among the most intriguing creatures to the general population, they are also a popular subject of study in various disciplines. Their extraordinary ability to navigate and forage regardless of clutter using echolocation has attracted attention from many scientists and engineers. Research investigating bats typically includes analysis of acoustic signals from microphones and/or microphone arrays. Using the time difference of arrival (TDOA) between array elements or microphones is probably the most popular method of locating flying bats (azimuth and elevation). Microphone responses to transmitted signals and echoes near a bat provide sound pressure but no directional information.
This dissertation proposes a complement to the current TDOA methods that delivers directional information by introducing spatial audio techniques. It presents two feasible methods based on spatial audio techniques that can both track bats in flight and pinpoint the directions of echoes received by a bat. An ultrasonic tetrahedral soundfield microphone is introduced as a measurement tool for sounds in the sonar frequency range (20-80 kHz) of the big brown bat (Eptesicus fuscus). Ambisonics, a signal processing technique used in three-dimensional (3D) audio applications, is used for the basic processing of the signals measured by the soundfield microphone. Ambisonics also reproduces a measured signal together with its directional properties.
As the first method, a spatial audio decoding technique called HARPEx (High Angular Resolution Planewave Expansion) was used to build a system providing angle and elevation estimates. HARPEx can estimate the directions of arrival (DOA) of up to two simultaneous sound sources. Experiments showed that the HARPEx-based estimation system provides accurate DOA estimates for static and moving sources. The performance of the system was also assessed using statistical analyses of simulations. The medians and RMSEs (root-mean-square errors) of 10,000 simulations for each simulation case represent the accuracy and precision of the estimates, respectively. The results show that a shorter distance between a capsule and the center of the soundfield microphone and/or a higher SNR (signal-to-noise ratio) is required to achieve higher performance.
For the second method, the matched-filter technique is used to build another estimation system. This is a sonar-like estimation system that provides information about the target (range, direction, and velocity) using matched-filter responses and sonar fundamentals. Experiments using a loudspeaker (emitter) and an artificial or natural target (either stationary or moving) show that the system provides accurate estimates of the target's direction and range. Simulations imitating a situation in which a bat emits a pulse and receives an echo from a target (30°) were also performed. The system processed the virtual bat pulse and echo, and accurately estimated the direction, range, and velocity of the target.
The suggested methods provide accurate estimates of the direction, range, and/or velocity of a bat based on its pulses, or of a target based on its echoes. This demonstrates that these methods can be used as key tools to reconstruct bat biosonar. They could also serve as an independent tool or as a complement to TDOA-based methods in bat echolocation studies. The developed methods are also believed to be useful for improving sonar technology.
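The range part of such a sonar-like estimate can be sketched with a plain matched filter: correlate the recording with the emitted pulse and convert the lag of the strongest response into a two-way travel distance. The sketch below uses a 3 ms linear sweep from 40 kHz to 22 kHz, as in the simulation described earlier in this abstract, but everything else is an assumption for illustration: the 200 kHz sample rate, a sound speed of 343 m/s, emission at sample 0, the echo gain, the noise level, and the function names. It covers range only, not the direction or velocity estimates, and it is not the dissertation's algorithm.

```python
import math
import random


def lfm_pulse(fs, duration, f_start, f_end):
    """Linear frequency-modulated sweep from f_start to f_end (Hz)."""
    n = int(round(fs * duration))
    k = (f_end - f_start) / duration  # sweep rate in Hz per second
    return [
        math.sin(2.0 * math.pi * (f_start * t + 0.5 * k * t * t))
        for t in (i / fs for i in range(n))
    ]


def simulate_echo(pulse, fs, target_range_m, n_samples,
                  echo_gain=0.5, noise_std=0.2, c=343.0, seed=0):
    """Hypothetical recording: Gaussian noise plus one delayed, scaled echo."""
    rng = random.Random(seed)
    recording = [rng.gauss(0.0, noise_std) for _ in range(n_samples)]
    delay = int(round(2.0 * target_range_m / c * fs))  # two-way travel time
    for i, p in enumerate(pulse):
        if delay + i < n_samples:
            recording[delay + i] += echo_gain * p
    return recording


def matched_filter_range(recording, pulse, fs, c=343.0):
    """Range (m) at the lag with the highest matched-filter response.

    Assumes the pulse was emitted at sample 0 of the recording.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(len(recording) - len(pulse) + 1):
        val = sum(p * recording[lag + i] for i, p in enumerate(pulse))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag / fs * c / 2.0
```

Calling `matched_filter_range(simulate_echo(pulse, fs, 1.0, 2000), pulse, fs)` with `pulse = lfm_pulse(200_000, 0.003, 40_000.0, 22_000.0)` returns a range close to the simulated 1 m, limited by the sample period.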
|
14 |
Popis a reprezentace dvourozměrných zvukových scén ve vícekanálových systémech reprodukce zvuku / 2D Audio Scene Analysis and Rendering in Multichannel Sound-Reproduction Systems
Trzos, Michal January 2009
This thesis deals with the cues used by the human auditory system to identify the location of a sound, and with sound localisation methods based on these cues, namely vector base amplitude panning and ambisonics, which are described in detail. These methods have been implemented as a VST plug-in module. The thesis also contains listening tests of second-order ambisonics, along with an analysis of the acquired data.
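For readers unfamiliar with vector base amplitude panning, the two-loudspeaker, two-dimensional case reduces to solving a 2x2 linear system: the unit vector towards the virtual source is written as a weighted sum of the unit vectors towards the two loudspeakers, and the weights are normalised to constant power. The sketch below shows that textbook formulation only; it is not taken from the thesis or its VST plug-in, and the function name and angle convention (degrees, counter-clockwise from straight ahead) are assumptions.

```python
import math


def vbap_2d_gains(source_deg, speaker1_deg, speaker2_deg):
    """Constant-power gains (g1, g2) for a source panned between two loudspeakers."""
    p = (math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg)))
    l1 = (math.cos(math.radians(speaker1_deg)), math.sin(math.radians(speaker1_deg)))
    l2 = (math.cos(math.radians(speaker2_deg)), math.sin(math.radians(speaker2_deg)))

    # Solve p = g1 * l1 + g2 * l2 with Cramer's rule.
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det

    # Normalise so that g1**2 + g2**2 == 1 (constant power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A source straight ahead of loudspeakers at +45° and -45° gets equal gains, and a source at a loudspeaker's own angle is sent to that loudspeaker alone. A negative gain means the source lies outside the arc spanned by the pair, in which case a different pair should be chosen.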
|
15 |
Ambisonie d'ordre élevé en trois dimensions : captation, transformations et décodage adaptatif de champs sonores
Lecomte, Pierre January 2016
Résumé : La synthèse de champs sonores est un domaine de recherche actif trouvant de nombreuses applications musicales, multimédias ou encore industrielles. Dans ce dernier cas, la reconstruction précise du champ sonore est souhaitée, ce qui implique de répondre à un certain nombre de questionnements scientifiques. À l’aide de réseaux de microphones et de haut-parleurs, la captation, la synthèse et la reconstruction précise de champs sonores sont théoriquement possibles. Seulement, pour des applications pratiques, la disposition des haut-parleurs et l’influence acoustique du lieu de restitution sont des facteurs cruciaux à prendre en compte pour s’assurer de la bonne reconstruction du champ sonore.
Dans ce contexte, cette thèse de doctorat propose des méthodes et des techniques pour la captation, la transformation et la reconstruction précise de champs sonores en trois dimensions en se basant sur la méthode ambisonique d’ordre élevé. Une configuration sphérique pour le réseau de microphones et de haut-parleurs est proposée. Elle suit un maillage de Lebedev à cinquante points qui permet la captation et la reconstruction du champ sonore jusqu’à l’ordre 5 avec le formalisme ambisonique. Les limitations de cette approche, tel le repliement spatial, sont étudiées en détail. De plus, une opération de transformation du champ sonore est présentée. Elle est établie dans le domaine des harmoniques sphériques et permet d’effectuer un filtrage directionnel avant le décodage pour privilégier certaines directions dans le champ sonore, suivant une fonction de directivité choisie. Pour la reconstruction, une approche originale, également établie dans le domaine des harmoniques sphériques, permet de prendre en compte l’influence acoustique du lieu de restitution, ainsi que les défauts du système de restitution. Ce traitement permet alors d’adapter la synthèse de champs sonores au lieu de restitution, en conservant le formalisme théorique établi en champ libre. Finalement, une validation expérimentale des méthodes et des techniques développées au cours de la thèse est faite. Dans ce contexte, une suite logicielle de synthèse et traitement en temps-réel des champs sonores est développée. / Abstract : Sound field synthesis is an active research domain with various musical, multimedia or industrial applications. In the latter case, the accurate reconstruction of the sound field is targeted, which involves answering several scientific questions. Using arrays of microphones and loudspeakers, the capture, synthesis and accurate reconstruction of sound fields are theoretically possible. However, for practical applications, the arrangement of the loudspeakers and the acoustic influence of the restitution room are critical factors to consider in order to ensure the accurate reconstruction of the sound field.
In this context, this thesis proposes methods and techniques for the capture, transformation and accurate reconstruction of sound fields in three dimensions based on the Higher Order Ambisonics (HOA) method. A spherical configuration for the array of microphones and loudspeakers is proposed. It follows a fifty-node Lebedev grid that enables the capture and reconstruction of the sound field up to order 5 with the HOA formalism. The limitations of this approach, such as spatial aliasing, are studied in detail. A transformation operation on the sound field is also proposed. The formulation is established in the spherical harmonics domain and enables directional filtering of the sound field prior to the decoding step. For the reconstruction of the sound field, an original approach, also established in the spherical harmonics domain, can take into account the acoustic influence of the restitution room and the defects of the playback system. This treatment then adapts the synthesis of sound fields to the restitution room while maintaining the theoretical formalism established in the free field. Finally, the methods and techniques developed in the thesis are validated experimentally. In this context, a digital signal processing toolkit is developed. It processes the microphone, ambisonic, and loudspeaker signals in real time for sound field capture, transformation, and decoding.
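As a quick check of the figures in this abstract, and assuming the usual convention that a full three-dimensional ambisonic representation of order N uses every spherical harmonic of degree 0 to N, the number of components K(N) is:

```latex
K(N) = \sum_{n=0}^{N} (2n + 1) = (N + 1)^2, \qquad K(5) = 36 \le 50 .
```

The fifty nodes therefore exceed the 36 components needed at order 5. Having at least as many nodes as components is only a necessary condition; the abstract does not say which further properties of the grid set the order limit.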
|
16 |
Implementation and Evaluation of Encoder Tools for Multi-Channel Audio
Malmelöv, Tomas January 2019
The increasing interest in immersive experiences in areas such as augmented and virtual reality makes high-quality 3D sound more important than ever before. A technique for capturing and rendering 3D audio that has received growing attention during the last twenty years is Higher Order Ambisonics (HOA). HOA is a scene-based audio format with many advantages over other standard formats. However, one problem with HOA is that it requires a lot of bandwidth. For example, sending an uncoded high-quality HOA signal requires 49 channels to be transmitted at the same time, which requires a bandwidth of about 40 Mbps. Much effort has gone into coding HOA signals over the last ten years. In this thesis, two different approaches to coding HOA signals are taken. In the first approach, called Sound Field Rotation (SFR) in this thesis, the microphone that records the sound field is virtually rotated to see whether some of the channels can be made zero. The second approach, called Sound Field Decomposition (SFD) in this thesis, uses principal component analysis to decompose a sound field into a foreground and a background component. The SFD approach is inspired by the emerging MPEG-H 3D Audio standard for coding HOA signals. The results show that the SFR method only works for very simple sound scenes. It was also shown that a 49-channel HOA signal can be reduced to as few as 7 channels if the sound scene consists of a point source. The SFD method worked for more complex sound scenes. It was shown that an MPEG-like system could be improved. Results from MUSHRA (Multiple Stimuli with Hidden Reference and Anchor) listening tests showed that the improved MPEG-like system reached a MUSHRA score of about 78, while the MPEG-like system reached 55, at a bitrate of 256 kbps. Without coding each mono channel with the 3GPP EVS (Enhanced Voice Services) codec, the improved MPEG-like system reached a MUSHRA score of 85. At 256 kbps, the improved MPEG-like system coded the HOA signal into six channels instead of the 49 of the uncoded signal. The objective results showed that the improved MPEG-like system had the largest effect at low bitrates.
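The bandwidth figure quoted above can be reproduced with simple arithmetic. The sketch below assumes uncompressed PCM at 48 kHz and 16 bits per sample, which the abstract does not state; under that assumption 49 channels (a full 3D representation of order 6, since (6 + 1)² = 49) come to roughly 37.6 Mbps, in line with "about 40 Mbps". The function names are hypothetical.

```python
def hoa_channel_count(order):
    """Number of channels in a full 3D ambisonic signal of the given order."""
    return (order + 1) ** 2


def pcm_bitrate_bps(channels, sample_rate_hz=48_000, bits_per_sample=16):
    """Bitrate of uncompressed PCM audio in bits per second.

    The 48 kHz / 16-bit defaults are assumptions, not values from the thesis.
    """
    return channels * sample_rate_hz * bits_per_sample


channels = hoa_channel_count(6)             # 49 channels
uncoded_bps = pcm_bitrate_bps(channels)     # 37,632,000 bit/s
reduction_factor = uncoded_bps / 256_000    # relative to the 256 kbps test bitrate
```

Under these assumptions, coding at 256 kbps corresponds to a reduction by a factor of 147 relative to the uncoded signal.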
|
17 |
Localisation et rehaussement de sources de parole au format Ambisonique : analyse de scènes sonores pour faciliter la commande vocale / Localization and enhancement of speech from the Ambisonics format
Perotin, Lauréline 31 October 2019
Cette thèse s'inscrit dans le contexte de l'essor des assistants vocaux mains libres. Dans un environnement domestique, l'appareil est généralement posé à un endroit fixe, tandis que le locuteur s'adresse à lui depuis diverses positions, sans nécessairement s'appliquer à être proche du dispositif, ni même à lui faire face. Cela ajoute des difficultés majeures par rapport au cas, plus simple, de la commande vocale en champ proche (pour les téléphones portables par exemple) : ici, la réverbération est plus importante ; des réflexions précoces sur les meubles entourant l'appareil peuvent brouiller le signal ; les bruits environnants sont également sources d'interférences. À ceci s'ajoutent de potentiels locuteurs concurrents qui rendent la compréhension du locuteur principal particulièrement difficile. Afin de faciliter la reconnaissance vocale dans ces conditions adverses, plusieurs pré-traitements sont proposés ici. Nous utilisons un format audio spatialisé, le format Ambisonique, adapté à l'analyse de scènes sonores. Dans un premier temps, nous présentons une méthode de localisation des sources sonores basée sur un réseau de neurones convolutif et récurrent. Nous proposons des descripteurs inspirés du vecteur d'intensité acoustique qui améliorent la performance de localisation, notamment dans des situations réelles où plusieurs sources sont présentes et l'antenne de microphones est posée sur une table. La technique de visualisation appelée layerwise relevance propagation (LRP) met en valeur les zones temps-fréquence positivement corrélées avec la localisation prédite par le réseau dans un cas donné. En plus d'être méthodologiquement indispensable, cette analyse permet d'observer que le réseau de neurones exploite principalement les zones dans lesquelles le son direct domine la réverbération et le bruit ambiant. Dans un second temps, nous proposons une méthode pour rehausser la parole du locuteur principal et faciliter sa reconnaissance.
Nous nous plaçons dans le cadre de la formation de voies basée sur des masques temps-fréquence estimés par un réseau de neurones. Afin de traiter le cas où plusieurs personnes parlent à un volume similaire, nous utilisons l'information de localisation pour faire un premier rehaussement à large bande dans la direction du locuteur cible. Nous montrons que donner cette information supplémentaire au réseau n'est pas suffisant dans le cas où deux locuteurs sont proches ; en revanche, donner en plus la version rehaussée du locuteur concurrent permet au réseau de renvoyer de meilleurs masques. Ces masques permettent d'en déduire un filtre multicanal qui améliore grandement la reconnaissance vocale. Nous évaluons cet algorithme dans différents environnements, y compris réels, grâce à un moteur de reconnaissance de la parole utilisé comme boîte noire. Dans un dernier temps, nous combinons les systèmes de localisation et de rehaussement et nous évaluons la robustesse du second aux imprécisions du premier sur des exemples réels. / This work was conducted in the fast-growing context of hands-free voice command. In domestic environments, smart devices are usually placed in a fixed position, while the human speaker gives orders from anywhere, not necessarily next to the device, nor even facing it. This adds difficulties compared to the problem of near-field voice command (typically for mobile phones): strong reverberation, early reflections on furniture around the device, and surrounding noises can degrade the signal. Moreover, other speakers may interfere, which makes understanding the target speaker quite difficult. In order to facilitate speech recognition in such adverse conditions, several preprocessing methods are introduced here. We use a spatialized audio format suitable for audio scene analysis: the Ambisonic format. We first propose a sound source localization method that relies on a convolutional and recurrent neural network.
We define an input feature vector inspired by the acoustic intensity vector, which improves localization performance, in particular in real conditions involving several speakers and a microphone array placed on a table. We exploit the visualization technique called layerwise relevance propagation (LRP) to highlight the time-frequency zones that correlate positively with the network output. This analysis is of paramount importance for establishing the validity of a neural network. In addition, it shows that the neural network essentially relies on time-frequency zones where direct sound dominates reverberation and background noise. We then present a method to enhance the voice of the main speaker and ease its recognition. We adopt a mask-based beamforming framework based on a time-frequency mask estimated by a neural network. To deal with the situation of multiple speakers with similar loudness, we first use a wideband beamformer to enhance the target speaker using the associated localization information. We show that this additional information is not enough for the network when two speakers are close to each other. However, if we also give an enhanced version of the interfering speaker as input to the network, it returns much better masks. The filters generated from those masks greatly improve speech recognition performance. We evaluate this algorithm in various environments, including real ones, with a black-box automatic speech recognition system. Finally, we combine the proposed localization and enhancement systems and evaluate the robustness of the latter to localization errors in real environments.
|
18 |
Méthodes de spatialisation sonore et intégration dans le processus de composition
Néron Baribeau, Raphaël 07 1900
L’espace est un élément peu exploré en musique. Méconnu des compositeurs, il n’est généralement pas pensé comme paramètre musical « composable ». Pourtant si la musique peut être perçue comme une organisation et une succession d’éléments dans le temps, pourquoi ne pourrait-elle pas l’être aussi dans l’espace?
Ce travail se veut en quelque sorte un pont entre la recherche et la pratique, qui se construit par la synthèse de l’information que j’ai pu trouver sur chacune des quatre méthodes de spatialisation abordées ici. Dans un premier temps, je traiterai de leur développement, leur fonctionnement et des possibilités d’intégration de ces méthodes dans le processus de composition musicale, notamment en discutant les outils disponibles.
Dans un second temps, les pièces Minimale Sédation et Fondations, toutes deux composées en octophonie seront discutées. J’expliquerai leurs processus de composition à travers les intentions, les techniques d’écriture et les outils qui ont mené à leurs créations. / Space is a parameter of sound that is relatively unexplored in music. Poorly understood by composers, it is not generally thought of as a "composable" musical parameter. Yet if music can be seen as an organization and a succession of elements in time, why could it not also be seen that way in space? This work is intended as a bridge between research and practice, built by synthesizing the information I could find on each of the four sound spatialization methods discussed here. First, I discuss their development, how they work, and the possibilities for integrating them into the process of musical composition, in particular by discussing the available tools. Second, the pieces Minimale Sédation and Fondations, both composed for eight channels, are discussed. I explain their compositional processes through the intentions, writing techniques and tools that led to their creation.
|
19 |
La représentation intermédiaire et abstraite de l’espace comme outil de spatialisation du son : enjeux et conséquences de l’appropriation musicale de l’ambisonie et des expérimentations dans le domaine des harmoniques sphériques / The intermediate and abstract representation of space as a tool for sound spatialization: issues and consequences of the musical appropriation of Ambisonics and of experiments in the spherical harmonics domain
Guillot, Pierre 20 December 2017
Penser les traitements du son spatialisés en ambisonie permet de mettre en valeur le potentiel musical de la décomposition du champ sonore en harmoniques sphériques, et amène à redéfinir la représentation de l’espace sonore. Cette thèse défend que les représentations abstraites et intermédiaires de l’espace sonore permettent d’élaborer de nouvelles approches originales de la mise en espace du son. Le raisonnement amenant à cette affirmation commence par l’appropriation musicale de l’approche ambisonique. La création de nouveaux traitements de l’espace et du son amène à utiliser de manière originale les signaux associés aux harmoniques sphériques, et à concevoir différemment les relations qui les régissent, ainsi que leur hiérarchisation. La particularité de ces approches expérimentales et les caractéristiques singulières des champs sonores générés, nécessitent de concevoir de nouveaux outils théoriques et pratiques pour leur analyse et leur restitution. Les changements opérés permettent alors de libérer cette approche des enjeux techniques et matériels initiaux en ambisonie. Mais ils permettent surtout de s’émanciper des modèles psychoacoustiques et acoustiques sur lesquels ces techniques reposent originellement. Dans ce contexte, les signaux associés aux harmoniques sphériques ne sont plus nécessairement une représentation rationnelle du champ sonore, mais deviennent une représentation abstraite de l’espace sonore possédant en soi, un potentiel musical. Cette thèse propose alors un nouveau modèle de spatialisation fondé sur une décomposition matricielle de l’espace sonore permettant de valider les hypothèses. / The creation of sound effects in space with Ambisonics highlights the musical potential of sound field decomposition by spherical harmonics, and redefines the representation of the sound space. 
This thesis argues that abstract and intermediate representations of the sound space make it possible to develop original new approaches to sound spatialization. The reasoning that leads to this claim begins with the musical appropriation of the ambisonic approach. The creation of new space and sound processing in Ambisonics leads to an original way of using the signals associated with spherical harmonics, and to a different conception of the relations between them and of their hierarchy. The specific features of these experimental approaches and the singular characteristics of the sound fields they generate call for new theoretical and practical tools for their analysis and restitution. These changes make it possible to free the approach from the technical and material issues that Ambisonics originally addressed. Above all, they emancipate it from the psychoacoustic and acoustic models on which ambisonic techniques were originally based. In this context, the signals associated with spherical harmonics are no longer necessarily a rational representation of the sound field but become an abstract representation of the sound space that has musical potential in itself. To validate these hypotheses, the thesis then proposes a new spatialization model based on a matrix decomposition of the sound space.
|
20 |
Abordagem à espacialização de um ensemble de percissão no palco sonoro : estudo de caso Drumming GP
Ribeiro, Suse Patrícia Carvalho January 2012
Master's thesis. Multimedia (Interactive Music and Sound Design). Faculdade de Engenharia, Universidade do Porto. 2012
|