11 |
Psychophysical and Neural Correlates of Auditory Attraction and AversionJanuary 2014 (has links)
abstract: This study explores the psychophysical and neural processes associated with the perception of sounds as either pleasant or aversive. The underlying psychophysical theory is based on auditory scene analysis, the process through which listeners parse auditory signals into individual acoustic sources. The first experiment tests and confirms that a self-rated pleasantness continuum reliably exists for 20 various stimuli (r = .48). In addition, the pleasantness continuum correlated with the physical acoustic characteristics of consonance/dissonance (r = .78), which can facilitate auditory parsing processes. The second experiment uses an fMRI block design to test blood oxygen level dependent (BOLD) changes elicited by a subset of 5 exemplar stimuli chosen from Experiment 1 that are evenly distributed over the pleasantness continuum. Specifically, it tests and confirms that the pleasantness continuum produces systematic changes in brain activity for unpleasant acoustic stimuli beyond what occurs with pleasant auditory stimuli. Results revealed that the combination of two positively and two negatively valenced experimental sounds compared to one neutral baseline control elicited BOLD increases in the primary auditory cortex, specifically the bilateral superior temporal gyrus, and left dorsomedial prefrontal cortex; the latter being consistent with a frontal decision-making process common in identification tasks. The negatively-valenced stimuli yielded additional BOLD increases in the left insula, which typically indicates processing of visceral emotions. The positively-valenced stimuli did not yield any significant BOLD activation, consistent with consonant, harmonic stimuli being the prototypical acoustic pattern of auditory objects that is optimal for auditory scene analysis. Both the psychophysical findings of Experiment 1 and the neural processing findings of Experiment 2 support that consonance is an important dimension of sound that is processed in a manner that aids auditory parsing and functional representation of acoustic objects and was found to be a principal feature of pleasing auditory stimuli. / Dissertation/Thesis / Masters Thesis Psychology 2014
|
12 |
Natural Correlations of Spectral Envelope and their Contribution to Auditory Scene AnalysisJanuary 2017 (has links)
abstract: Auditory scene analysis (ASA) is the process through which listeners parse and organize their acoustic environment into relevant auditory objects. ASA functions by exploiting natural regularities in the structure of auditory information. The current study investigates spectral envelope and its contribution to the perception of changes in pitch and loudness. Experiment 1 constructs a perceptual continuum of twelve f0- and intensity-matched vowel phonemes (i.e. a pure timbre manipulation) and reveals spectral envelope as a primary organizational dimension. The extremes of this dimension are i (as in “bee”) and Ʌ (“bun”). Experiment 2 measures the strength of the relationship between produced f0 and the previously observed phonetic-pitch continuum at three different levels of phonemic constraint. Scat performances and, to a lesser extent, recorded interviews were found to exhibit changes in accordance with the natural regularity; specifically, f0 changes were correlated with the phoneme pitch-height continuum. The more constrained case of lyrical singing did not exhibit the natural regularity. Experiment 3 investigates participant ratings of pitch and loudness as stimuli vary in f0, intensity, and the phonetic-pitch continuum. Psychophysical functions derived from the results reveal that moving from i to Ʌ is equivalent to a .38 semitone decrease in f0 and a .75 dB decrease in intensity. Experiment 4 examines the potentially functional aspect of the pitch, loudness, and spectral envelope relationship. Detection thresholds of stimuli in which all three dimensions change congruently (f0 increase, intensity increase, Ʌ to i) or incongruently (no f0 change, intensity increase, i to Ʌ) are compared using an objective version of the method of limits. Congruent changes did not provide a detection benefit over incongruent changes; however, when the contribution of phoneme change was removed, congruent changes did offer a slight detection benefit, as in previous research. While this relationship does not offer a detection benefit at threshold, there is a natural regularity for humans to produce phonemes at higher f0s according to their relative position on the pitch height continuum. Likewise, humans have a bias to detect pitch and loudness changes in phoneme sweeps in accordance with the natural regularity. / Dissertation/Thesis / Doctoral Dissertation Psychology 2017
|
13 |
Interactions audiovisuelles pour l'analyse de scènes auditives / Audiovisual interactions for auditory scene analysisDevergie, Aymeric 10 December 2010 (has links)
Percevoir la parole dans le bruit représente une opération complexe pour notre système perceptif. Pour parvenir à analyser cette scène auditive, nous mettons en place des mécanismes de ségrégation auditive. Nous pouvons également lire sur les lèvres pour améliorer notre compréhension de la parole. L'hypothèse initiale, présentée dans ce travail de thèse, est que ce bénéfice visuel pourrait en partie reposer sur des interactions entre l'information visuelle et les mécanismes de ségrégation auditive. Les travaux réalisés montrent que lorsque la cohérence audiovisuelle est importante, les mécanismes de ségrégation précoce peuvent être renforcés. Les mécanismes de ségrégation tardives, quant à eux, ont été démontré comme mettant en jeu des processus attentionnels. Ces processus attentionnels pourraient donc être renforcés par la présentation d'un indice visuel lié perceptivement. Il apparaît que ce liage entre un flux de voyelles et un indice visuel élémentaire est possible mais cependant moins fort que lorsque l'indice visuel possède un contenu phonétique. En conclusion, les résultats présentés dans ce travail suggèrent que les mécanismes de ségrégation auditive puissent être influencés par un indice visuel pour peu que la cohérence audiovisuelle soit importante comme dans le cas de la parole. / Perceive speech in noise is a complex operation for our perceptual system. To achieve this auditory scene analysis, we involve mechanisms of auditory streaming. We can also read lips to improve our understanding of speech. The intial hypothesis, presented in this thesis, is that visual benefit could be partly based on interactions between the visual input and the auditory streaming mechanisms. Studies conduced here shows that when the audiovisual coherence is strong, primary streaming mechanisms can be strengthened. Late segregation mechanisms, meanwhile, have been shown as involving attentional processes. These attentional processes could therefore be strengthened by the presentation of a visual cue linked perceptually to auditory signal. It appears that binding between a stream of vowels and a elementary visual cue can occur but is less strong than when the visual cue contained phonetic information. In conclusion, the results presented in this work suggest that the mechanisms of auditory streaming can be influenced by a visual cue as long as the audiovisual coherence is important as in the case of speech.
|
14 |
Bayesian Microphone Array Processing / ベイズ法によるマイクロフォンアレイ処理Otsuka, Takuma 24 March 2014 (has links)
京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第18412号 / 情博第527号 / 新制||情||93(附属図書館) / 31270 / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授 奥乃 博, 教授 河原 達也, 准教授 CUTURI CAMETO Marco, 講師 吉井 和佳 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
|
15 |
Sound source segregation of multiple concurrent talkers via Short-Time Target CancellationCantu, Marcos Antonio 22 October 2018 (has links)
The Short-Time Target Cancellation (STTC) algorithm, developed as part of this dissertation research, is a “Cocktail Party Problem” processor that can boost speech intelligibility for a target talker from a specified “look” direction, while suppressing the intelligibility of competing talkers. The algorithm holds promise for both automatic speech recognition and assistive listening device applications. The STTC algorithm operates on a frame-by-frame basis, leverages the computational efficiency of the Fast Fourier Transform (FFT), and is designed to run in real time. Notably, performance in objective measures of speech intelligibility and sound source segregation is comparable to that of the Ideal Binary Mask (IBM) and Ideal Ratio Mask (IRM). Because the STTC algorithm computes a time-frequency mask that can be applied independently to both the left and right signals, binaural cues for spatial hearing, including Interaural Time Differences (ITDs), Interaural Level Differences (ILDs) and spectral cues, can be preserved in potential hearing aid applications. A minimalist design for a proposed STTC Assistive Listening Device (ALD), consisting of six microphones embedded in the frame of a pair of eyeglasses, is presented and evaluated using virtual room acoustics and both objective and behavioral measures. The results suggest that the proposed STTC ALD can provide a significant speech intelligibility benefit in complex auditory scenes comprised of multiple spatially separated talkers. / 2020-10-22T00:00:00Z
|
16 |
Bio-inspired noise robust auditory featuresJavadi, Ailar 12 June 2012 (has links)
The purpose of this work
is to investigate a series of biologically inspired modifications to state-of-the-art Mel-
frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition
results. We have provided recommendations to improve speech recognition results de-
pending on signal-to-noise ratio levels of input signals. This work has been motivated by
noise-robust auditory features (NRAF). In the feature extraction technique, after a signal is filtered using bandpass filters, a
spatial derivative step is used to sharpen the results, followed by an envelope detector (recti-
fication and smoothing) and down-sampling for each filter bank before being compressed.
DCT is then applied to the results of all filter banks to produce features. The Hidden-
Markov Model Toolkit (HTK) is used as the recognition back-end to perform speech
recognition given the features we have extracted. In this work, we investigate the
role of filter types, window size, spatial derivative, rectification types, smoothing, down-
sampling and compression and compared the final results to state-of-the-art Mel-frequency
cepstral coefficients (MFCC). A series of conclusions and insights are provided for each
step of the process. The goal of this work has not been to outperform MFCCs; however,
we have shown that by changing the compression type from log compression to 0.07 root
compression we are able to outperform MFCCs for all noisy conditions.
|
17 |
Association of Sound to Motion in Video Using Perceptual OrganizationRavulapalli, Sunil Babu 29 March 2006 (has links)
Technological developments and innovations of the first forty years of the digital era have primarily addressed either the audio or the visual senses. Consequently, designers have primarily focused on the audio or the visual aspects of design. In the perspective of video surveillance, the data under consideration has always been visual. However, in light of the new behavioral and physiological studies which established a proof of cross modality in human perception i.e. humans do not process audio and visual stimulus separately, but percieve a scene based on all stimulus available, similar cues are being used to develop a surveillance system which uses both audio and visual data available. Human beings can easily associate a particular sound to an object in the surrounding. Drawing from such studies, we demonstrate a technique by which we can isolate concurrent audio and video events and associate them based on perceptual grouping principles. Associating sound to an object can form apart of larger surveillance system by producing a better description of objects.
We represent audio in the pitch-time domain and use image processing algorithms such as line detection to isolate significant events. These events and are then grouped based on gestalt principles of proximity and similarity which operates in audio. Once auditory events are isolated we can extract their periodicity. In video, we can extract objects by using simple background subtraction. We extract motion and shape periodicities of all the objects by tracking their position or the number of pixels in each frame. By comparing all the periodicities in audio and video using a simple index we can easily associate audio to video. We show results on five scenariosin outdoor settings with different kinds of human activity such as running, walking and other moving objects such as balls and cars.
|
18 |
AUDIO SCENE SEGEMENTATION USING A MICROPHONE ARRAY AND AUDITORY FEATURESUnnikrishnan, Harikrishnan 01 January 2010 (has links)
Auditory stream denotes the abstract effect a source creates in the mind of the listener. An auditory scene consists of many streams, which the listener uses to analyze and understand the environment. Computer analyses that attempt to mimic human analysis of a scene must first perform Audio Scene Segmentation (ASS). ASS find applications in surveillance, automatic speech recognition and human computer interfaces. Microphone arrays can be employed for extracting streams corresponding to spatially separated sources. However, when a source moves to a new location during a period of silence, such a system loses track of the source. This results in multiple spatially localized streams for the same source. This thesis proposes to identify local streams associated with the same source using auditory features extracted from the beamformed signal. ASS using the spatial cues is first performed. Then auditory features are extracted and segments are linked together based on similarity of the feature vector. An experiment was carried out with two simultaneous speakers. A classifier is used to classify the localized streams as belonging to one speaker or the other. The best performance was achieved when pitch appended with Gammatone Frequency Cepstral Coefficeints (GFCC) was used as the feature vector. An accuracy of 96.2% was achieved.
|
19 |
Monaural Speech Segregation in Reverberant EnvironmentsJin, Zhaozhang 27 September 2010 (has links)
No description available.
|
20 |
A biologically inspired approach to the cocktail party problemChou, Kenny F. 19 May 2020 (has links)
At a cocktail party, one can choose to scan the room for conversations of interest, attend to a specific conversation partner, switch between conversation partners, or not attend to anything at all. The ability of the normal-functioning auditory system to flexibly listen in complex acoustic scenes plays a central role in solving the cocktail party problem (CPP). In contrast, certain demographics (e.g., individuals with hearing impairment or older adults) are unable to solve the CPP, leading to psychological ailments and reduced quality of life. Since the normal auditory system still outperforms machines in solving the CPP, an effective solution may be found by mimicking the normal-functioning auditory system.
Spatial hearing likely plays an important role in CPP-processing in the auditory system. This thesis details the development of a biologically based approach to the CPP by modeling specific neural mechanisms underlying spatial tuning in the auditory cortex. First, we modeled bottom-up, stimulus-driven mechanisms using a multi-layer network model of the auditory system. To convert spike trains from the model output into audible waveforms, we designed a novel reconstruction method based on the estimation of time-frequency masks. We showed that our reconstruction method produced sounds with significantly higher intelligibility and quality than previous reconstruction methods. We also evaluated the algorithm's performance using a psychoacoustic study, and found that it provided the same amount of benefit to normal-hearing listeners as a current state-of-the-art acoustic beamforming algorithm.
Finally, we modeled top-down, attention driven mechanisms that allowed the network to flexibly operate in different regimes, e.g., monitor the acoustic scene, attend to a specific target, and switch between attended targets. The model explains previous experimental observations, and proposes candidate neural mechanisms underlying flexible listening in cocktail-party scenarios. The strategies proposed here would benefit hearing-assistive devices for CPP processing (e.g., hearing aids), where users would benefit from switching between various modes of listening in different social situations. / 2022-05-19T00:00:00Z
|
Page generated in 0.0933 seconds