21 |
La synesthésie chez l'enfant : prévalence, aspects développementaux et cognitifs / Synaesthesia in children : prevalence, developmental and cognitive aspects
Garnier, Marie-Margeride 31 May 2016 (has links)
Synaesthetes have the particularity of associating an additional experience (e.g. a colour) with the presence of certain stimuli (e.g. a letter). The aim of this thesis was to better understand the development of synaesthesia through three of its types: Grapheme–Colour, Grapheme–Personality and the representation of numbers in space ("number forms"). While Grapheme–Colour synaesthesia has already been studied in children, the other two types had never before been the subject of a developmental study. The graphical productions of 797 children in the last year of kindergarten and the first two years of primary school (CP and CE1) were collected in two sessions, 2 to 3 weeks apart. To test the hypothesis of a continuum between synaesthetes and the general population, we also studied the associations of non-synaesthete children. We highlighted the methodological difficulties of studying synaesthesia in children aged 5 to 8. Our results do not allow us to attest with certainty to the existence of synaesthesia in this age group, but we observed an increase in the number of potential synaesthetes per school grade, together with a low level of consistency in their associations compared with what is observed in adults. Since children know letters and numbers well, at least by CE1, these types of synaesthesia are presumably not linked only to the learning of the inducer as such; they could develop alongside more complex learning (e.g. arithmetic or writing), or follow the development of memory and mental imagery. / Synaesthetes have the peculiarity of associating a supplementary experience (e.g. the colour green) when certain stimuli are presented (e.g. the letter F). The aim of this thesis was to better understand the development of synaesthesia through three types: Grapheme–Colour, Grapheme–Personality and the representation of numbers in space. We collected the graphical representations of 797 children from Kindergarten, 1st Grade, and 2nd Grade twice, at a 2-3 week interval. To test the hypothesis of a continuum between synaesthetes and the general population, we also studied the associations of non-synaesthete children. We reported the methodological problems specific to the study of synaesthesia in 5- to 8-year-old children. We could not definitively attest to the existence of synaesthesia in this age group, but we observed a growing number of potential synaesthetes in each grade, as well as a low consistency level of their associations in comparison with adults. Since children know letters and numbers well, at least in the 2nd grade, we can consider that these types of synaesthesia are not linked only to the acquisition of their inducers themselves, but could develop during harder acquisitions (numeracy or literacy) or with the development of mental imagery and associative memory.
|
22 |
Audio-tactile displays to improve learnability and perceived urgency of alarming stimuli
Momenipour, Amirmasoud 01 August 2019 (has links)
Based on cross-modal learning and multiple resources theory, human performance can be improved by receiving and processing additional streams of information from the environment. In alarm situations, alarm meanings need to be distinguishable from each other and learnable for users. In audible alarms, different signals can be generated by manipulating the temporal characteristics of sounds. However, in some cases, such as with discrete medical alarms, when there are too many audible signals to manage, changes in temporal characteristics may not generate discriminable signals that are easy for listeners to learn. Multimodal displays can be developed to generate additional auditory, visual, and tactile stimuli, helping humans benefit from cross-modal learning and multiple attentional resources for a better understanding of alarm situations. In work domains where alarms are predominantly auditory-based and where accessing visual displays is not possible at all times, tactile displays can enhance the effectiveness of alarms by providing additional streams of information for understanding them. However, because of the low information density of tactile information presentation, the use of tactile alarms has been limited. In this thesis, the learnability of auditory and tactile alarms, separately and together in an audio-tactile display, was studied with human subjects. The objective of the study was to test cross-modal learning when the messages of an alarm (i.e. meaning and urgency level) were conveyed simultaneously in audible, tactile, and audio-tactile alarm displays. The alarm signals were designed using spatial characteristics of tactile signals and temporal characteristics of audible signals, separately in audible and tactile displays as well as together in an audio-tactile display. The study explored whether using multimodal (tactile and audible) alarms would help in learning unimodal (audible or tactile) alarm meanings and urgency levels. The findings can inform the design of more efficient discrete audio-tactile alarms that promote the learnability of alarm meanings and urgency levels.
|
23 |
The Reorganization of Primary Auditory Cortex by Invasion of Ectopic Visual Inputs
Mao, Yuting 06 May 2012 (has links)
Brain injury is a serious clinical problem. The success of recovery from brain injury involves functional compensation in the affected brain area. We are interested in general mechanisms that underlie compensatory plasticity after brain damage, particularly when multiple brain areas or multiple modalities are included. In this thesis, I studied the function of auditory cortex after recovery from neonatal midbrain damage as a model system that resembles patients with brain damage or sensory dysfunction. I addressed maladaptive changes of auditory cortex after invasion by ectopic visual inputs. I found that auditory cortex contained auditory, visual, and multisensory neurons after it recovered from neonatal midbrain damage (Mao et al. 2011). The distribution of these different neuronal responses did not show any clustering or segregation. As might be predicted from the fact that auditory neurons and visual neurons were intermingled throughout the entire auditory cortex, I found that residual auditory tuning and tonotopy in the rewired auditory cortex were compromised. Auditory tuning curves were broader and tonotopic maps were disrupted in the experimental animals. Because lateral inhibition is proposed to contribute to refinement of sensory maps and tuning of receptive fields, I tested whether loss of inhibition is responsible for the compromised auditory function in my experimental animals. I found an increase rather than a decrease of inhibition in the rewired auditory cortex, suggesting that broader tuning curves in the experimental animals are not caused by loss of lateral inhibition.
These results suggest that compensatory plasticity can be maladaptive and thus impair the recovery of the original sensory cortical function. The reorganization of brain areas after recovery from brain damage may require stronger inhibition in order to process multiple sensory modalities simultaneously. These findings provide insight into compensatory plasticity after sensory dysfunction and brain damage, as well as new information about the role of inhibition in cross-modal plasticity. This study can guide further research on the design of therapeutic strategies to encourage adaptive changes and discourage maladaptive changes after brain damage, sensory/motor dysfunction, and deafferentation.
|
24 |
Selective attention and speech processing in the cortex
Rajaram, Siddharth 24 September 2015 (has links)
In noisy and complex environments, human listeners must segregate the mixture of sound sources arriving at their ears and selectively attend to a single source, thereby solving a computationally difficult problem known as the cocktail party problem. However, the neural mechanisms underlying these computations are still largely a mystery. Oscillatory synchronization of neuronal activity between cortical areas is thought to play a crucial role in facilitating information transmission between spatially separated populations of neurons, enabling the formation of functional networks.
In this thesis, we seek to analyze and model the functional neuronal networks underlying attention to speech stimuli and find that the Frontal Eye Fields play a central 'hub' role in the auditory spatial attention network in a cocktail party experiment. We use magnetoencephalography (MEG) to measure neural signals with high temporal precision, while sampling from the whole cortex. However, several methodological issues arise when undertaking functional connectivity analysis with MEG data. Specifically, volume conduction of electrical and magnetic fields in the brain complicates interpretation of results. We compare several approaches through simulations, and analyze the trade-offs among various measures of neural phase-locking in the presence of volume conduction. We use these insights to study functional networks in a cocktail party experiment.
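To give a concrete sense of the kind of phase-locking measures being compared here, the sketch below computes a single-trial phase-locking value and the imaginary part of coherency for two sensor signals; the imaginary part discards zero-lag coupling and is therefore less sensitive to volume conduction. This is a generic illustration (assuming the inputs are already band-pass filtered to the band of interest), not the thesis's actual analysis pipeline.

```python
import numpy as np
from scipy.signal import hilbert

def plv_and_imcoh(x, y):
    """Phase-locking value (PLV) and imaginary coherence between two
    band-pass-filtered signals x and y (1-D arrays of equal length).
    PLV is inflated by zero-lag coupling from volume conduction/field spread;
    the imaginary part of coherency ignores zero-phase-lag contributions."""
    ax, ay = hilbert(x), hilbert(y)                  # analytic signals
    phase_diff = np.angle(ax) - np.angle(ay)
    plv = np.abs(np.mean(np.exp(1j * phase_diff)))   # |mean phase difference vector|
    cross = np.mean(ax * np.conj(ay))                # cross term
    coh = cross / np.sqrt(np.mean(np.abs(ax) ** 2) * np.mean(np.abs(ay) ** 2))
    return plv, np.abs(np.imag(coh))
```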
We then construct a linear dynamical system model of neural responses to ongoing speech. Using this model, we are able to correctly predict which of two speakers is being attended by a listener. We then apply this model to data from a task where people were attending to stories with synchronous and scrambled videos of the speakers' faces to explore how the presence of visual information modifies the underlying neuronal mechanisms of speech perception. This model allows us to probe neural processes as subjects listen to long stimuli, without the need for a trial-based experimental design. We model the neural activity with latent states, and model the neural noise spectrum and functional connectivity with multivariate autoregressive dynamics, along with impulse responses for external stimulus processing. We also develop a new regularized Expectation-Maximization (EM) algorithm to fit this model to electroencephalography (EEG) data.
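As a rough illustration of this kind of stimulus-driven linear dynamical system (not the regularized model developed in the thesis), the sketch below evaluates the log-likelihood of multichannel recordings under latent linear-Gaussian dynamics driven by a speech envelope; comparing the likelihood obtained with each candidate speaker's envelope gives a simple attended-speaker decoder. All parameter and variable names are placeholders.

```python
import numpy as np

def lds_loglik(y, s, A, B, C, Q, R, x0, P0):
    """Log-likelihood of observations y (T x p) under a stimulus-driven LDS:
        x[t+1] = A x[t] + B s[t] + w[t],  w ~ N(0, Q)   (latent neural state)
        y[t]   = C x[t] + v[t],           v ~ N(0, R)   (EEG/MEG channels)
    computed with the Kalman-filter prediction-error decomposition."""
    T, p = y.shape
    x, P, ll = x0.copy(), P0.copy(), 0.0
    for t in range(T):
        e = y[t] - C @ x                                   # innovation
        S = C @ P @ C.T + R                                # innovation covariance
        ll += -0.5 * (p * np.log(2 * np.pi)
                      + np.linalg.slogdet(S)[1]
                      + e @ np.linalg.solve(S, e))
        K = P @ C.T @ np.linalg.inv(S)                     # Kalman gain
        x, P = x + K @ e, P - K @ C @ P                    # measurement update
        x, P = A @ x + B @ s[t], A @ P @ A.T + Q           # stimulus-driven time update
    return ll

# Attended-speaker decoding: pick the speech envelope whose fitted model
# explains the recording better (theta_1, theta_2 are fitted parameter tuples).
# attended = np.argmax([lds_loglik(meg, env1, *theta_1),
#                       lds_loglik(meg, env2, *theta_2)])
```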
|
25 |
Describing and retrieving visual content using natural language
Ramanishka, Vasili 11 February 2021 (has links)
Modern deep learning methods have boosted research progress in visual recognition and text understanding, but uniting these advances from both disciplines is a non-trivial task. In this thesis, we develop models and techniques that allow us to connect natural language and visual content, enabling automatic video subtitling, visual grounding, and text-based image search. Such models could be useful in a wide range of applications in robotics and human-computer interaction, bridging the gap between vision and language understanding.
First, we develop a model that generates natural language descriptions of the main activities and scenes depicted in short videos. While previous methods were constrained to a predefined list of objects, actions, or attributes, our model learns to generate descriptions directly from raw pixels. The model exploits available audio information and the video’s category (e.g., cooking, movie, education) to generate more relevant and coherent sentences.
Then, we introduce a technique for visual grounding of generated sentences using the same video description model. Our approach allows for explaining the model’s prediction by localizing salient video regions for corresponding words in the generated sentence.
Lastly, we address the problem of image retrieval. Existing cross-modal retrieval methods work by learning a common embedding space for different modalities using parallel data such as images and their accompanying descriptions. Instead, we focus on the case where images are connected by relative annotations: given a context set consisting of an image and its metadata, the user can specify desired semantic changes using natural language instructions. The model needs to capture distinctive visual differences between image pairs as described by the user. Our approach enables interactive image search in which natural language feedback significantly improves the efficacy of image retrieval.
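A minimal sketch of the common-embedding-space retrieval described above is shown below, assuming the text and image encoders have already been trained; the additive composition used for relative feedback is an illustrative placeholder, not the thesis's model.

```python
import numpy as np

def normalize(v):
    """L2-normalize vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def retrieve(query_emb, image_embs, k=5):
    """Indices of the k images closest to a query in a shared text-image
    embedding space, ranked by cosine similarity."""
    sims = normalize(image_embs) @ normalize(query_emb)
    return np.argsort(-sims)[:k]

def retrieve_with_feedback(ref_image_emb, instruction_emb, image_embs, k=5):
    """Interactive retrieval with relative natural-language feedback: start
    from a reference image and move in the direction of the instruction
    embedding (a simple additive composition, purely for illustration)."""
    target = normalize(ref_image_emb + instruction_emb)
    return retrieve(target, image_embs, k)
```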
We show that the proposed methods advance the state-of-the-art for video captioning and image retrieval tasks in terms of both accuracy and interpretability.
|
26 |
Online Ads: The influence of color cues on sensory expectation : An expectation-based approach to explain the cross-modal influence of color cues in online ads
Lindholm, Maxime, Svensson, Sofia January 2021 (has links)
The purpose of this thesis is to contribute new insights into how businesses can use color cues as a multisensory tool in online ads to evoke positive sensory expectations. An expectation-based approach was used to examine the cross-modal influence of warm relative to cold colored cues. The theory chapter starts with sensory marketing; further theories are then discussed, such as cross-modal correspondences and the perceptual process. The study followed a deductive, quantitative research approach, collecting data through a between-subjects experiment. The results provide a greater understanding of color usage in digital media and of how it might trigger other senses: online ads with a warm colored background showed a stronger cross-modal correspondence to taste, touch, sound and smell relative to cold colored ads, with significant differences supporting all of the study's hypotheses. The thesis thereby offers a better understanding of the impact of color cues on consumer expectations and of the importance of using color as a tool when planning marketing activities and advertising. The findings correspond closely with theories from color psychology and cognitive psychology.
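For readers unfamiliar with the analysis behind such a between-subjects design, a minimal sketch of the group comparison is given below; the group sizes, rating scale and numbers are invented placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 7-point ratings of expected taste intensity from two groups:
# participants shown a warm-colored ad vs. participants shown a cold-colored ad.
warm_ratings = rng.normal(5.4, 1.0, 60)
cold_ratings = rng.normal(4.6, 1.0, 60)

# Welch's two-sample t-test (does not assume equal group variances).
t, p = stats.ttest_ind(warm_ratings, cold_ratings, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```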
|
27 |
Empirical approaches to timbre semantics as a foundation for musical analysis
Reymore, Lindsey E. 30 September 2020 (has links)
No description available.
|
28 |
CLIP-RS: A Cross-modal Remote Sensing Image Retrieval Based on CLIP, a Northern Virginia Case Study
Djoufack Basso, Larissa 21 June 2022 (has links)
Satellite imagery research used to be an expensive research topic for companies and organizations due to limited data and compute resources. As computing power and storage capacity grow exponentially, a large amount of aerial and satellite imagery is generated and analyzed every day for various applications. Current technological advancement and extensive data collection by numerous Internet of Things (IoT) devices and platforms have amplified the supply of labeled natural images. Such data availability catalyzed the development and performance of current state-of-the-art image classification and cross-modal models. Despite the abundance of publicly available remote sensing images, very few remote sensing (RS) images are labeled and even fewer are multi-captioned. These scarcities limit the scope of fine-tuned state-of-the-art models to at most 38 classes, based on the PatternNet data, one of the largest publicly available labeled RS datasets. Recent state-of-the-art image-to-image retrieval and detection models in RS have shown great results. Because text-to-image retrieval of RS images is still emerging, it still faces some challenges, namely the inaccurate retrieval of image categories that were not present in the training dataset and the retrieval of images from descriptive input. Motivated by those shortcomings in current cross-modal remote sensing image retrieval, we proposed CLIP-RS, a cross-modal remote sensing image retrieval platform. Our proposed framework, CLIP-RS, combines a fine-tuned implementation of a recent state-of-the-art cross-modal and text-based image retrieval model, Contrastive Language Image Pre-training (CLIP), with FAISS (Facebook AI Similarity Search), a library for efficient similarity search. Our implementation is deployed on a web app for inference on text-to-image and image-to-image retrieval of RS images collected via the Mapbox GL JS API. We used the free tier of the Mapbox GL JS API and took advantage of its raster tiles option to locate the retrieved results on a local map, a combination of the downloaded raster tiles. Other options offered on our platform are: image similarity search, locating an image on the map, and viewing images' geocoordinates and addresses. In this work we also proposed two remote sensing fine-tuned models and conducted a comparative analysis of our proposed models with a different fine-tuned model as well as the zero-shot CLIP model on remote sensing data. / Master of Science / Satellite imagery research used to be an expensive research topic for companies and organizations due to limited data and compute resources. As computing power and storage capacity grow exponentially, a large amount of aerial and satellite imagery is generated and analyzed every day for various applications. Current technological advancement and extensive data collection by numerous Internet of Things (IoT) devices and platforms have amplified the supply of labeled natural images. Such data availability catalyzed the development and performance of current state-of-the-art image classification and cross-modal models.
Despite the abundance of publicly available remote sensing images, very few remote sensing (RS) images are labeled and even fewer are multi-captioned. These scarcities limit the scope of fine-tuned state-of-the-art models to at most 38 classes, based on the PatternNet data, one of the largest publicly available labeled RS datasets. Recent state-of-the-art image-to-image retrieval and detection models in RS have shown great results. Because text-to-image retrieval of RS images is still emerging, it still faces some challenges, namely the inaccurate retrieval of image categories that were not present in the training dataset and the retrieval of images from descriptive input. Motivated by those shortcomings in current cross-modal remote sensing image retrieval, we proposed CLIP-RS, a cross-modal remote sensing image retrieval platform. Cross-modal retrieval focuses on data retrieval across different modalities; in the context of this work, we focus on the textual and imagery modalities. Our proposed framework, CLIP-RS, combines a fine-tuned implementation of a recent state-of-the-art cross-modal and text-based image retrieval model, Contrastive Language Image Pre-training (CLIP), with FAISS (Facebook AI Similarity Search), a library for efficient similarity search. In deep learning, fine-tuning consists of reusing the weights of a model trained on one task in a similar model for a different, domain-specific application. Our implementation is deployed on a web application for inference tasks on text-to-image and image-to-image retrieval of RS images collected via the Mapbox GL JS API. We used the free tier of the Mapbox GL JS API and took advantage of its raster tiles option to locate the retrieved results on a local map, a combination of the downloaded raster tiles. Other options offered on our platform are: image similarity search, locating an image on the map, and viewing images' geocoordinates and addresses. In this work we also proposed two remote sensing fine-tuned models and conducted a comparative analysis of our proposed models with a different fine-tuned model as well as the zero-shot CLIP model on remote sensing data.
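A minimal sketch of the CLIP-plus-FAISS retrieval pattern the abstract describes, using the publicly available CLIP checkpoint rather than the author's fine-tuned weights; `tile_images` (a list of PIL images, e.g. downloaded map tiles) is an assumed input.

```python
import faiss
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(pil_images):
    """CLIP image embeddings, L2-normalized, as a float32 NumPy array."""
    inputs = processor(images=pil_images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

def embed_text(queries):
    """CLIP text embeddings, L2-normalized, as a float32 NumPy array."""
    inputs = processor(text=queries, return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

# Index the normalized image embeddings; inner product on unit vectors equals
# cosine similarity, so IndexFlatIP gives exact nearest neighbours.
image_vecs = embed_images(tile_images)   # tile_images: assumed list of PIL images
index = faiss.IndexFlatIP(image_vecs.shape[1])
index.add(image_vecs)

# Text-to-image retrieval: top-5 tiles for a natural-language query.
scores, ids = index.search(embed_text(["an airport runway"]), 5)

# Image-to-image similarity search reuses the same index with an image query.
scores, ids = index.search(embed_images([tile_images[0]]), 5)
```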
|
29 |
Modality dominance in young children: the underlying mechanisms and broader implications
Napolitano, Amanda C. 15 November 2006 (has links)
No description available.
|
30 |
Beyond narrative : a cross-modal approach to soundtrack composition
Georgiou, Chrystalla January 2017 (has links)
This research project addresses the problem of scoring non-narrative film work. Deprived of narrative content to follow, the composer faces the fundamental problem of deciding what other elements should be considered for establishing a meaningful relationship between the screened events and the music soundtrack. In order to mitigate this problem, the project investigates the possibility of applying cross-modal principles to soundtrack composition, systematically exploiting the human ability to experience or interpret information channeled through one sense modality in terms of another. After the Introduction, which explains the research aims and methods, the thesis proceeds as follows. Chapter two considers cross-modal relationships in music and other expressive arts, along with a brief consideration of Reception Theory and its relation to my work. Chapter three provides a set of four case studies of contemporary compositional approaches to non-narrative film. Chapter four demonstrates a new and systematic approach to soundtrack composition through a specially devised Table of Audio-Visual Correspondences, mapping parameters from one domain to another. This method is then applied in Chapter five in relation to a portfolio of original composed soundtracks; a detailed analysis is provided of each piece, and the application of cross-modal logic to the scoring of non-narrative video is discussed and evaluated. Finally, Chapter six offers conclusions and recommendations, outlines the scope for further research, explains how work on this thesis has affected my own practice and compositional voice, and suggests how the thesis can benefit the wider film music academic and practitioner community.
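To make the idea of such a parameter mapping concrete, here is a tiny hypothetical sketch of an audio-visual correspondence table in Python; the specific pairings are illustrative placeholders, not the table devised in the thesis.

```python
# Hypothetical correspondences (visual parameter -> musical parameter).
# These pairings are placeholders for illustration only.
audio_visual_map = {
    "brightness":     "pitch register",    # brighter frame -> higher register
    "motion_speed":   "tempo",             # faster on-screen motion -> faster tempo
    "visual_density": "textural density",  # busier frame -> denser orchestration
    "colour_warmth":  "timbral warmth",    # warmer palette -> warmer instrumentation
}

def musical_parameters(visual_features):
    """Map measured visual features of a shot onto the musical parameters
    suggested by the correspondence table."""
    return {audio_visual_map[name]: value
            for name, value in visual_features.items()
            if name in audio_visual_map}

# Example: a bright, fast-moving shot suggests a high register and a fast tempo.
print(musical_parameters({"brightness": 0.9, "motion_speed": 0.8}))
```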
|