21 |
Interactions audiovisuelles pour l'analyse de scènes auditives / Audiovisual interactions for auditory scene analysis
Devergie, Aymeric, 10 December 2010 (has links)
Perceiving speech in noise is a complex operation for our perceptual system. To analyze such an auditory scene, we engage mechanisms of auditory stream segregation. We can also read lips to improve our understanding of speech. The initial hypothesis presented in this thesis is that this visual benefit could rest in part on interactions between visual information and the mechanisms of auditory stream segregation. The studies reported here show that when audiovisual coherence is strong, early segregation mechanisms can be strengthened. Late segregation mechanisms, for their part, have been shown to involve attentional processes. These attentional processes could therefore be reinforced by presenting a visual cue that is perceptually bound to the auditory signal. It appears that such binding between a stream of vowels and an elementary visual cue is possible, but weaker than when the visual cue carries phonetic content. In conclusion, the results presented in this work suggest that auditory stream segregation mechanisms can be influenced by a visual cue provided that the audiovisual coherence is strong, as it is in the case of speech.
|
22 |
Bayesian Microphone Array Processing / ベイズ法によるマイクロフォンアレイ処理
Otsuka, Takuma, 24 March 2014 (links)
Kyoto University / 0048 / New-system doctoral program / Doctor of Informatics / Kō No. 18412 / Jōhaku No. 527 / 新制||情||93 (University Library) / 31270 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hiroshi Okuno, Professor Tatsuya Kawahara, Associate Professor Marco Cuturi Cameto, Lecturer Kazuyoshi Yoshii / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
|
23 |
Decoding spatial location of attended audio-visual stimulus with EEG and fNIRS
Ning, Matthew H., 17 January 2023 (links)
When analyzing complex scenes, humans often focus their attention on an object at a particular spatial location in the presence of background noise and irrelevant visual objects. The ability to decode the attended spatial location would facilitate brain-computer interfaces (BCIs) for complex scene analysis. Here, we tested two different neuroimaging technologies and investigated their capability to decode audio-visual spatial attention in the presence of competing stimuli from multiple locations. For functional near-infrared spectroscopy (fNIRS), we targeted the dorsal frontoparietal network, including the frontal eye field (FEF) and intraparietal sulcus (IPS), as well as the superior temporal gyrus/planum temporale (STG/PT); all of these regions were shown in previous functional magnetic resonance imaging (fMRI) studies to be activated by auditory, visual, or audio-visual spatial tasks. We found that fNIRS provides robust decoding of attended spatial locations for most participants and that decoding accuracy correlates with behavioral performance. Moreover, we found that FEF makes a large contribution to decoding performance. Surprisingly, performance was significantly above chance as early as 1 s after cue onset, which is well before the peak of the fNIRS response.
For electroencephalography (EEG), while there are several successful EEG-based decoding algorithms, to date all of them have focused exclusively on the auditory modality, where eye-related artifacts are minimized or controlled. Successful integration into more ecologically typical usage requires careful treatment of eye-related artifacts, which are inevitable. We showed that fast and reliable decoding can be done with or without an ocular-artifact removal algorithm. Our results show that EEG and fNIRS are promising platforms for compact, wearable technologies that could be applied to decode attended spatial location and reveal contributions of specific brain regions during complex scene analysis.
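As a rough illustration of the decoding step described above, the following sketch trains a cross-validated linear classifier to predict the attended side from per-trial multichannel features. The synthetic data, the channel count, and the choice of linear discriminant analysis are assumptions made for illustration, not the pipeline used in the dissertation.

# Hypothetical decoding sketch: attended location from multichannel features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels = 120, 16          # e.g. fNIRS channels over FEF/IPS/STG
labels = rng.integers(0, 2, n_trials)   # 0 = attend left, 1 = attend right

# Simulated trial features: mean response per channel, with a small
# label-dependent shift on a few "informative" channels.
features = rng.normal(size=(n_trials, n_channels))
features[:, :4] += 0.8 * labels[:, None]

clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, features, labels, cv=5)
print(f"decoding accuracy: {scores.mean():.2f} (chance = 0.50)")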
|
24 |
Sound source segregation of multiple concurrent talkers via Short-Time Target Cancellation
Cantu, Marcos Antonio, 22 October 2018 (links)
The Short-Time Target Cancellation (STTC) algorithm, developed as part of this dissertation research, is a “Cocktail Party Problem” processor that can boost speech intelligibility for a target talker from a specified “look” direction, while suppressing the intelligibility of competing talkers. The algorithm holds promise for both automatic speech recognition and assistive listening device applications. The STTC algorithm operates on a frame-by-frame basis, leverages the computational efficiency of the Fast Fourier Transform (FFT), and is designed to run in real time. Notably, performance in objective measures of speech intelligibility and sound source segregation is comparable to that of the Ideal Binary Mask (IBM) and Ideal Ratio Mask (IRM). Because the STTC algorithm computes a time-frequency mask that can be applied independently to both the left and right signals, binaural cues for spatial hearing, including Interaural Time Differences (ITDs), Interaural Level Differences (ILDs) and spectral cues, can be preserved in potential hearing aid applications. A minimalist design for a proposed STTC Assistive Listening Device (ALD), consisting of six microphones embedded in the frame of a pair of eyeglasses, is presented and evaluated using virtual room acoustics and both objective and behavioral measures. The results suggest that the proposed STTC ALD can provide a significant speech intelligibility benefit in complex auditory scenes comprised of multiple spatially separated talkers.
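The sketch below illustrates one property highlighted above: a single real-valued time-frequency mask, computed frame by frame from STFT spectra, is applied identically to the left and right signals so that binaural cues survive. The placeholder ratio mask computed from a known target stands in for the STTC cancellation stage and is an assumption for illustration, not the dissertation's algorithm.

# Illustrative "one mask, both ears" sketch; not the STTC algorithm itself.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)          # stand-in target talker
noise = 0.5 * np.random.randn(fs)             # stand-in interferer
left, right = target + noise, 0.9 * target + noise

nperseg = 512
_, _, T = stft(target, fs, nperseg=nperseg)   # frame-by-frame spectra
_, _, L = stft(left, fs, nperseg=nperseg)
_, _, R = stft(right, fs, nperseg=nperseg)

# Placeholder ratio mask: target-dominated bins get weights near 1.
mask = np.abs(T) ** 2 / (np.abs(T) ** 2 + np.abs(L - T) ** 2 + 1e-12)

_, left_out = istft(mask * L, fs, nperseg=nperseg)    # same mask on both ears,
_, right_out = istft(mask * R, fs, nperseg=nperseg)   # so ITD/ILD cues are kept
print(left_out.shape, right_out.shape)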
|
25 |
Bio-inspired noise robust auditory features
Javadi, Ailar, 12 June 2012 (links)
The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We provide recommendations for improving speech recognition results depending on the signal-to-noise ratio of the input signal. This work has been motivated by noise-robust auditory features (NRAF).
In the feature extraction technique, after a signal is filtered using bandpass filters, a spatial derivative step is used to sharpen the results, followed by an envelope detector (rectification and smoothing) and down-sampling for each filter bank before being compressed. A DCT is then applied to the results of all filter banks to produce features. The Hidden Markov Model Toolkit (HTK) is used as the recognition back-end to perform speech recognition given the features we have extracted. In this work, we investigate the roles of filter type, window size, spatial derivative, rectification type, smoothing, down-sampling, and compression, and compare the final results to state-of-the-art MFCCs. A series of conclusions and insights are provided for each step of the process. The goal of this work has not been to outperform MFCCs; however, we show that by changing the compression type from log compression to 0.07 root compression we are able to outperform MFCCs in all noisy conditions.
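A minimal sketch of the pipeline described above follows: a bandpass filter bank, an across-channel (spatial) derivative, envelope detection by rectification and smoothing, down-sampling, 0.07 root compression, and a DCT across channels. The filter design, band edges, and rates are illustrative assumptions rather than the configuration evaluated in the thesis.

# Sketch of the described feature pipeline under assumed parameters.
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate
from scipy.fft import dct

fs = 16000
x = np.random.randn(fs)                          # stand-in speech signal

edges = np.geomspace(100, 6000, 17)              # 16 log-spaced bands
bands = []
for lo, hi in zip(edges[:-1], edges[1:]):
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    bands.append(sosfiltfilt(sos, x))
bands = np.array(bands)                          # (n_bands, n_samples)

sharpened = np.diff(bands, axis=0)               # spatial derivative
envelope = np.maximum(sharpened, 0.0)            # half-wave rectification
smooth = butter(2, 50, btype="lowpass", fs=fs, output="sos")
envelope = sosfiltfilt(smooth, envelope, axis=1) # smoothing
envelope = decimate(envelope, 10, axis=1)        # down-sample each band

compressed = np.sign(envelope) * np.abs(envelope) ** 0.07   # root compression
features = dct(compressed, type=2, norm="ortho", axis=0)    # DCT across bands
print(features.shape)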
|
26 |
Range Data Recognition: Segmentation, Matching, And Similarity Retrieval
Yalcin Bayramoglu, Neslihan, 01 September 2011 (links) (PDF)
Improvements in 3D scanning technologies have led to the need to manage range image databases, and hence to the requirement of describing and indexing this type of data. Much work has been done to date on capturing, transmission, and visualization; however, there is still a gap in 3D semantic analysis between the requirements of applications and the results obtained. In this thesis we study the 3D semantic analysis of range data. Under this broad title we address the segmentation of range scenes, correspondence matching of range images, and similarity retrieval of range models. Inputs are considered to be single-view depth images. First, possible research topics related to 3D semantic analysis are introduced. Planar structure detection in range scenes is analyzed and some modifications to available methods are proposed. In addition, a novel algorithm is presented that segments a 3D point cloud (obtained via a time-of-flight camera) into objects using spatial information. We then propose a novel local range image matching method that combines 3D surface properties with the 2D scale-invariant feature transform. Next, our proposal for retrieving similar models, where both the query and the database consist only of range models, is presented. Finally, an analysis of the heat diffusion process on range data is presented, along with challenges and experimental results.
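As a generic baseline for the planar-structure detection mentioned above, the sketch below fits a dominant plane to a 3D point cloud with plain RANSAC. It is a stand-in illustration under assumed thresholds, not the modified segmentation method proposed in the thesis.

# Generic RANSAC plane fit on a synthetic point cloud (illustration only).
import numpy as np

rng = np.random.default_rng(1)
plane_pts = np.c_[rng.uniform(-1, 1, (500, 2)), np.zeros(500)]   # z = 0 plane
clutter = rng.uniform(-1, 1, (200, 3))
points = np.vstack([plane_pts + 0.01 * rng.normal(size=plane_pts.shape),
                    clutter])

best_inliers = np.zeros(len(points), dtype=bool)
for _ in range(200):                                   # RANSAC iterations
    sample = points[rng.choice(len(points), 3, replace=False)]
    normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
    norm = np.linalg.norm(normal)
    if norm < 1e-9:                                    # degenerate sample
        continue
    normal /= norm
    dist = np.abs((points - sample[0]) @ normal)       # point-to-plane distance
    inliers = dist < 0.02
    if inliers.sum() > best_inliers.sum():
        best_inliers = inliers

print(f"plane inliers: {best_inliers.sum()} of {len(points)} points")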
|
27 |
Real-time analysis of aggregate network traffic for anomaly detection
Kim, Seong Soo, 29 August 2005 (links)
Frequent and large-scale network attacks have led to an increased need for techniques for analyzing network traffic. If efficient analysis tools were available, it would become possible to detect attacks and anomalies and to take appropriate action to contain them before they have time to propagate across the network.
In this dissertation, we suggest a technique for traffic anomaly detection based on analyzing the correlation of destination IP addresses and the distribution of image-based signals, both postmortem and in real time, by passively monitoring the packet headers of traffic. These address correlation data are transformed using the discrete wavelet transform for effective detection of anomalies through statistical analysis. Results from trace-driven evaluation suggest that the proposed approach could provide an effective means of detecting anomalies close to the source. We also present a multidimensional indicator that uses the correlation of port numbers as a means of detecting anomalies.
We further present a network measurement approach that can simultaneously detect, identify, and visualize attacks and anomalous traffic in real time. We propose to represent samples of network packet header data as frames or images; with such a formulation, a series of samples can be seen as a sequence of frames, or video. This enables techniques from image processing and video compression, such as the DCT, to be applied to packet header data to reveal interesting properties of the traffic. We show that "scene change analysis" can reveal sudden changes in traffic behavior or anomalies, that "motion prediction" techniques can be employed to understand the patterns of some of the attacks, and that it may be feasible to represent multiple pieces of data as different colors of an image, enabling a uniform treatment of multidimensional packet header data.
Measurement-based techniques for analyzing network traffic treat traffic volume and traffic header data as signals or images in order to make the analysis feasible. In this dissertation, we propose an approach based on the classical Neyman-Pearson test from signal detection theory to evaluate these different strategies, using both analytical models and trace-driven experiments to compare their performance. Our evaluations on real traces reveal differences in the effectiveness of different traffic header data as potential signals for traffic analysis, in terms of their detection rates and false alarm rates. The results show that address distributions and the number of flows are better signals than traffic volume for anomaly detection, and that statistical techniques can sometimes be more effective than the NP test when the attack patterns change over time.
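The sketch below shows the wavelet-based detection idea in miniature: a per-interval traffic-header statistic (a synthetic stand-in for the destination-address correlation signal) is decomposed with a discrete wavelet transform, and windows whose detail-coefficient energy exceeds a simple statistical threshold are flagged. The wavelet choice, window size, and threshold are assumptions for illustration, and the example uses the PyWavelets package.

# Toy wavelet-energy anomaly detector on a synthetic header statistic.
import numpy as np
import pywt                                 # PyWavelets

rng = np.random.default_rng(2)
signal = rng.normal(1.0, 0.05, 1024)        # "normal" traffic statistic
signal[600:640] += 1.5                      # injected anomaly (e.g. a scan)

coeffs = pywt.wavedec(signal, "haar", level=3)
detail = coeffs[-1]                         # finest-scale detail coefficients

# Energy of detail coefficients in short windows; flag statistical outliers.
win = 16
energy = np.array([np.sum(detail[i:i + win] ** 2)
                   for i in range(0, len(detail) - win + 1, win)])
threshold = energy.mean() + 3 * energy.std()
print("anomalous windows:", np.where(energy > threshold)[0])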
|
28 |
Association of Sound to Motion in Video Using Perceptual Organization
Ravulapalli, Sunil Babu, 29 March 2006 (links)
Technological developments and innovations during the first forty years of the digital era have primarily addressed either the audio or the visual senses, and designers have consequently focused on audio or visual aspects of design in isolation. In video surveillance, the data under consideration has traditionally been visual. However, in light of behavioral and physiological studies establishing cross-modality in human perception, i.e., that humans do not process audio and visual stimuli separately but perceive a scene based on all available stimuli, similar cues are being used to develop a surveillance system that uses both the audio and visual data available. Human beings can easily associate a particular sound with an object in their surroundings. Drawing from such studies, we demonstrate a technique by which we can isolate concurrent audio and video events and associate them based on perceptual grouping principles. Associating sound with an object can form a part of a larger surveillance system by producing a better description of objects.
We represent audio in the pitch-time domain and use image processing algorithms such as line detection to isolate significant events. These events are then grouped based on the Gestalt principles of proximity and similarity as they operate in audio. Once auditory events are isolated, we can extract their periodicity. In video, we extract objects using simple background subtraction, and we extract the motion and shape periodicities of all objects by tracking their positions or the number of pixels they occupy in each frame. By comparing the audio and video periodicities with a simple index, we can easily associate audio with video. We show results for five scenarios in outdoor settings with different kinds of human activity, such as running and walking, and other moving objects such as balls and cars.
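A toy version of the association step follows: the dominant repetition rate of the audio event stream and of each tracked object's motion trace is estimated from their spectra, and the sound is associated with the object whose rate is closest. The signals are synthetic stand-ins, and this simple rate-matching index is an assumption for illustration rather than the exact index used in the thesis.

# Associate a periodic sound with the video object of matching periodicity.
import numpy as np

def dominant_rate(sig, rate_hz):
    """Strongest nonzero frequency (Hz) of a 1-D signal."""
    spectrum = np.abs(np.fft.rfft(sig - sig.mean()))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / rate_hz)
    return freqs[1:][np.argmax(spectrum[1:])]

fps, dur = 30, 10                                   # 30 fps video, 10 s clip
t = np.arange(fps * dur) / fps

audio_events = (np.sin(2 * np.pi * 2.0 * t) > 0.9).astype(float)  # ~2 Hz impacts
ball_y = np.sin(2 * np.pi * 2.0 * t)                # bouncing ball position, ~2 Hz
walker_x = np.sin(2 * np.pi * 0.7 * t)              # walking person sway, ~0.7 Hz

audio_rate = dominant_rate(audio_events, fps)
objects = {"ball": dominant_rate(ball_y, fps),
           "walker": dominant_rate(walker_x, fps)}
match = min(objects, key=lambda name: abs(objects[name] - audio_rate))
print(f"audio at {audio_rate:.2f} Hz associated with '{match}'")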
|
29 |
AUDIO SCENE SEGMENTATION USING A MICROPHONE ARRAY AND AUDITORY FEATURES
Unnikrishnan, Harikrishnan, 01 January 2010 (links)
An auditory stream denotes the abstract effect a source creates in the mind of the listener. An auditory scene consists of many streams, which the listener uses to analyze and understand the environment. Computer analyses that attempt to mimic human analysis of a scene must first perform Audio Scene Segmentation (ASS). ASS finds applications in surveillance, automatic speech recognition, and human-computer interfaces. Microphone arrays can be employed to extract streams corresponding to spatially separated sources. However, when a source moves to a new location during a period of silence, such a system loses track of the source, resulting in multiple spatially localized streams for the same source. This thesis proposes to identify local streams associated with the same source using auditory features extracted from the beamformed signal. ASS using spatial cues is performed first; auditory features are then extracted, and segments are linked together based on the similarity of their feature vectors. An experiment was carried out with two simultaneous speakers, using a classifier to assign the localized streams to one speaker or the other. The best performance was achieved when pitch appended with Gammatone Frequency Cepstral Coefficients (GFCC) was used as the feature vector, giving an accuracy of 96.2%.
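As a toy illustration of linking localized streams by auditory-feature similarity, the sketch below assigns a new segment's feature vector (a random stand-in for pitch plus GFCC-like features) to the most similar existing speaker centroid by cosine similarity. The thesis uses a trained classifier, so this nearest-centroid rule and the synthetic features are illustrative simplifications.

# Link a re-localized segment to a speaker by feature-vector similarity.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(3)
clusters = {
    "speaker_a": rng.normal(0.0, 1.0, 13),   # centroid: pitch + cepstral-like dims
    "speaker_b": rng.normal(3.0, 1.0, 13),
}

# A new localized stream appears after a talker moved during silence;
# its features resemble speaker_a's centroid plus noise.
new_segment = clusters["speaker_a"] + rng.normal(0.0, 0.3, 13)

scores = {name: cosine(new_segment, c) for name, c in clusters.items()}
best = max(scores, key=scores.get)
print(f"segment linked to {best} (similarity {scores[best]:.2f})")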
|
30 |
Shape Analysis Using Contour-based And Region-based Approaches
Ciftci, Gunce, 01 January 2004 (links) (PDF)
The user of an image database often wishes to retrieve all images similar to the one (s)he already has. In this thesis, shape analysis methods for shape retrieval are investigated. Shape analysis methods can be classified into two groups, contour-based and region-based, according to the shape information used. In such a classification, the curvature scale space (CSS) representation and the angular radial transform (ART) are promising methods for shape similarity retrieval, respectively. The CSS representation operates by decomposing the shape contour into convex and concave sections: the CSS descriptor is extracted from the curvature zero-crossing behaviour of the shape boundary as the boundary is smoothed with a Gaussian filter. The ART decomposes the shape region onto a number of orthogonal 2-D basis functions defined on a unit disk, and the ART descriptor is formed from the magnitudes of the ART coefficients. These methods are implemented for similarity comparison of binary images, and the retrieval performance of the descriptors is investigated for varying numbers of boundary sampling points and ART coefficient orders. The experiments use 1000 images from the MPEG-7 Core Experiments Shape-1 data set. Results show that different descriptors are more successful for different classes of shape; when the choice of approach is made according to the properties of the query shape, similarity retrieval performance increases.
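The core CSS computation described above can be illustrated with the short sketch below: a closed contour is smoothed with Gaussians of increasing width and the sign changes of its curvature are counted at each scale. The synthetic contour and the chosen scales are assumptions, and descriptor construction (zero-crossing peak extraction and matching) is omitted.

# Curvature zero-crossings of a closed contour across Gaussian scales.
import numpy as np
from scipy.ndimage import gaussian_filter1d

n = 400
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
r = 1.0 + 0.3 * np.sin(5 * t)                 # star-like closed contour
x, y = r * np.cos(t), r * np.sin(t)

for sigma in (1, 4, 16):                      # increasing smoothing scale
    xs = gaussian_filter1d(x, sigma, mode="wrap")
    ys = gaussian_filter1d(y, sigma, mode="wrap")
    dx, dy = np.gradient(xs), np.gradient(ys)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    curvature = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    crossings = np.sum(np.signbit(curvature[:-1]) != np.signbit(curvature[1:]))
    print(f"sigma={sigma:2d}: {crossings} curvature zero-crossings")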
|