1

Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation

January 2018 (has links)
abstract: Multimodal representation learning is a multi-disciplinary research field that aims to integrate information from multiple communicative modalities in a meaningful manner to help solve a downstream task. These modalities can be visual, acoustic, linguistic, haptic, etc. What counts as a "meaningful integration of information from different modalities" remains modality- and task-dependent. The downstream task can range from understanding one modality in the presence of information from other modalities to translating input from one modality to another. This thesis investigates the utility of multimodal representation learning both for understanding one modality given corresponding information in other modalities (image understanding for visual reasoning) and for translating from one modality to another (text-to-image translation). Visual reasoning has been an active area of research in computer vision. It encompasses advanced image processing and artificial intelligence techniques to locate, characterize, and recognize objects, regions, and their attributes in an image in order to comprehend the image itself. One way of building a visual reasoning system is to ask the system questions about the image that require attribute identification, counting, comparison, multi-step attention, and reasoning. An intelligent system is thought to have a proper grasp of the image if it can answer such questions correctly and provide valid reasoning for the given answers. This work investigates how such a system can be built by learning a multimodal representation between the image and the questions, and demonstrates how background knowledge, specifically scene-graph information when available, can be incorporated into existing image understanding models. Multimodal learning provides an intuitive way of learning a joint representation between different modalities. Such a joint representation can be used to translate from one modality to the other. It also opens the way to learning a shared representation between these varied modalities and to specifying what that shared representation should capture. Using the surrogate task of text-to-image translation, this work investigates neural network architectures for learning a shared representation between these two modalities, and proposes that such a shared representation can capture parts of different modalities that are equivalent in some sense. Specifically, given an image and a semantic description of certain objects present in the image, a shared representation between the text and image modalities capable of capturing the parts of the image mentioned in the text was demonstrated on a publicly available dataset. / Dissertation/Thesis / Master's Thesis Computer Engineering 2018
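The joint image-question representation described in this abstract can be illustrated with a small fusion model. The following is a minimal sketch, not the thesis architecture: a pretrained CNN feature vector for the image and an LSTM encoding of the question are projected into a common space and fused by element-wise product before answer classification. All layer sizes, vocabulary size, and the fusion choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointVQAModel(nn.Module):
    """Sketch of a joint image-question representation for VQA-style reasoning."""
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=512,
                 img_feat_dim=2048, joint_dim=1024, num_answers=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_feat_dim, joint_dim)
        self.txt_proj = nn.Linear(hidden_dim, joint_dim)
        self.classifier = nn.Linear(joint_dim, num_answers)

    def forward(self, img_feats, question_tokens):
        # img_feats: (B, img_feat_dim), assumed to come from a pretrained CNN.
        _, (h, _) = self.lstm(self.embed(question_tokens))
        # Element-wise product is one common fusion choice among many.
        joint = torch.tanh(self.img_proj(img_feats)) * torch.tanh(self.txt_proj(h[-1]))
        return self.classifier(joint)

model = JointVQAModel()
logits = model(torch.randn(4, 2048), torch.randint(0, 10000, (4, 12)))
```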
2

Data-Driven Representation Learning in Multimodal Feature Fusion

January 2018 (has links)
abstract: Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is central to achieving improved model robustness and inference performance. This dissertation focuses on representation learning approaches as the fusion strategy. Specifically, the objective is to learn a shared latent representation that jointly exploits the structural information encoded in all modalities, such that a straightforward learning model can be adopted to obtain the prediction. We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Multiple Kernel Learning (MKL) algorithms, which learn an optimal combination of kernels, have been successfully applied to numerous fusion problems in computer vision and beyond. Utilizing the MKL formulation, we next describe an auto-context algorithm for learning image context via fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems. In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently, special design of the learning architecture is needed. To improve temporal modeling for multivariate sequences, we develop two architectures centered around attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model, coupled with a triplet ranking loss as a metric-learning framework, is described to better solve speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance while having lower computational complexity. Finally, to perform community detection on multilayer graphs, a fusion algorithm is described that derives node embeddings from word embedding techniques and exploits the complementary relational information contained in each layer of the graph. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
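The MKL idea referenced in this abstract can be sketched as a convex combination of base kernels, one per modality or descriptor, fed to a kernel SVM. This is an illustrative sketch only: real MKL learns the weights jointly with the classifier, whereas here the weights, kernel choices, and data are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def combined_kernel(X, Y, betas=(0.5, 0.3, 0.2)):
    """Fixed convex combination of base kernels (weights sum to 1)."""
    kernels = [
        rbf_kernel(X, Y, gamma=0.1),   # e.g., one kernel per modality
        rbf_kernel(X, Y, gamma=1.0),
        linear_kernel(X, Y),
    ]
    return sum(b * K for b, K in zip(betas, kernels))

rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 16))      # placeholder fused features
y_train = rng.integers(0, 2, 100)

# Fit on K(train, train); predict with K(test, train).
clf = SVC(kernel="precomputed").fit(combined_kernel(X_train, X_train), y_train)
X_test = rng.standard_normal((10, 16))
preds = clf.predict(combined_kernel(X_test, X_train))
```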
3

Multimodal Learning and Single Source WiFi Based Indoor Localization

Wu, Hongyu 15 June 2020 (has links)
No description available.
4

Audio-tactile displays to improve learnability and perceived urgency of alarming stimuli

Momenipour, Amirmasoud 01 August 2019 (has links)
Based on cross-modal learning and multiple resources theory, human performance can be improved by receiving and processing additional streams of information from the environment. In alarm situations, alarm meanings need to be distinguishable from each other and learnable for users. In audible alarms, different signals can be generated by manipulating the temporal characteristics of sounds. However, in some cases, such as with discrete medical alarms, when there are too many audible signals to manage, changes in temporal characteristics may not generate discriminable signals that are easy for listeners to learn. Multimodal displays can be developed to generate additional auditory, visual, and tactile stimuli, helping humans benefit from cross-modal learning and multiple attentional resources for a better understanding of alarm situations. In work domains where alarms are predominantly auditory and where accessing visual displays is not possible at all times, tactile displays can enhance the effectiveness of alarms by providing additional streams of information. However, because of the low information density of tactile presentation, the use of tactile alarms has been limited. In this thesis, the learnability of auditory and tactile alarms, presented separately and together in an audio-tactile display, was studied with human subjects. The objective was to test cross-modal learning when the messages of an alarm (i.e., meaning and urgency level) were conveyed simultaneously in audible, tactile, and audio-tactile alarm displays. The alarm signals were designed using the spatial characteristics of tactile signals and the temporal characteristics of audible signals, both separately in unimodal displays and together in an audio-tactile display. The study explored whether using multimodal (tactile and audible) alarms would help in learning unimodal (audible or tactile) alarm meanings and urgency levels. The findings can inform the design of more efficient discrete audio-tactile alarms that promote learnability of alarm meanings and urgency levels.
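The temporal manipulation of audible signals described above can be illustrated with a short sketch: the same tone rendered at two urgency levels by varying burst count and inter-burst interval. The frequency, durations, and the mapping to urgency are illustrative assumptions, not the thesis stimuli.

```python
import numpy as np

def alarm_burst(freq_hz=960.0, burst_s=0.15, gap_s=0.10, n_bursts=3, sr=44100):
    """Render a train of tone bursts; timing parameters encode urgency."""
    tone = np.sin(2 * np.pi * freq_hz * np.arange(int(sr * burst_s)) / sr)
    gap = np.zeros(int(sr * gap_s))
    return np.concatenate([np.concatenate([tone, gap]) for _ in range(n_bursts)])

low_urgency  = alarm_burst(gap_s=0.30, n_bursts=2)   # slow, sparse pattern
high_urgency = alarm_burst(gap_s=0.05, n_bursts=5)   # fast, dense pattern
```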
5

MULTIMODAL VIRTUAL LEARNING ENVIRONMENTS: THE EFFECTS OF VISUO-HAPTIC SIMULATIONS ON CONCEPTUAL LEARNING

Mayari Serrano Anazco (8790932) 03 May 2020 (has links)
Presently, it is possible to use virtual learning environments for simulating abstract and/or complex scientific concepts. Multimodal Virtual Learning Environments use multiple sensory stimuli, including haptic feedback, in the representation of concepts. Past research on the utilization of haptics for learning has shown inconsistent results when gains in conceptual knowledge have been assessed. This research focused on two abstract phenomena: Electricity and Magnetism, and Buoyancy. These abstract concepts were experienced by students using either visual, visuo-haptic, or hands-on learning activities. Embodied Cognition Theory was used as a framework for the implementation of the learning environments. Both phenomena were assessed using qualitative and quantitative data analysis techniques. Results suggested that the haptic, visual, and physical modalities positively affected the acquisition of conceptual knowledge of both concepts.
6

Multimodal Image Classification In Fluoropolymer AFM And Chest X-Ray Images

Meshnick, David Chad 26 May 2023 (has links)
No description available.
7

Modeling Complex Networks via Graph Neural Networks

Yella, Jaswanth 05 June 2023 (has links)
No description available.
8

Storytime at Irish Libraries: How public libraries can boost early literacy through reading promotion events

O'Driscoll, Mariana January 2019 (has links)
The purpose of this Master's thesis is to explore how libraries' storytime for babies and toddlers can be construed as a reading promotion event that boosts early literacy, by way of a multiple-case study of storytime at four public libraries in western Ireland. The study also explores, through a sociocultural lens, how the different libraries design these events and include elements of traditional reading and storytelling, multimodal reading and technology, participation, and accessibility and inclusion. The theoretical framework is built on concepts and theory related to literacy development and reading promotion, and serves as an analytical tool through which the empirical data are examined. Data were collected through observations of storytimes at four public libraries in Ireland, as well as interviews with the librarians involved. The results show that although the librarians do not actively work to implement national and EU storytime templates, they offer programmes that are in tune with their participants' needs and spark engagement and excitement about reading among children and parents or guardians alike.
9

Visual Interactive Labeling of Large Multimedia News Corpora

Han, Qi, John, Markus, Kurzhals, Kuno, Messner, Johannes, Ertl, Thomas 25 January 2019 (has links)
The semantic annotation of large multimedia corpora is essential for numerous tasks. Whether for training classification algorithms, for efficient content retrieval, or for analytical reasoning, appropriate labels are often a prerequisite before automatic processing becomes efficient. However, manual labeling of large datasets is time-consuming and tedious. Hence, we present a new visual approach for labeling and retrieving reports in multimedia news corpora. It combines automatic classifier training based on caption text from news reports with human interpretation to ease the annotation process. In our approach, users can initialize labels with keyword queries and iteratively annotate examples to train a classifier. The proposed visualization displays representative results in an overview that allows users to follow different annotation strategies (e.g., active learning) and assess the quality of the classifier. Based on a usage scenario, we demonstrate the successful application of our approach: users label several topics that interest them and retrieve related documents with high confidence from three years of news reports.
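The keyword-seeded, iterative labeling loop described above can be sketched with uncertainty sampling over a caption-text classifier. This is a minimal sketch under stated assumptions: the toy corpus, the seed labels, the logistic-regression classifier, and the simulated user response are all illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

captions = ["flood hits coastal town", "election results announced",
            "storm damages harbor", "parliament debates budget"] * 25
labels = np.full(len(captions), -1)   # -1 marks unlabeled captions
labels[:2] = [1, 0]                   # seed labels, e.g., from keyword queries

vec = TfidfVectorizer()
X = vec.fit_transform(captions)

for _ in range(5):                    # five annotation rounds
    known = labels >= 0
    clf = LogisticRegression().fit(X[known], labels[known])
    proba = clf.predict_proba(X)[:, 1]
    # Pick the unlabeled caption the classifier is least sure about.
    uncertainty = np.where(known, -1.0, -np.abs(proba - 0.5))
    idx = int(np.argmax(uncertainty))
    # Simulated user answer; a real system would ask the annotator here.
    labels[idx] = int("flood" in captions[idx] or "storm" in captions[idx])
```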
10

Investigating reading comprehension in Reading While Listening and the relevancy of The Voice Effect

Hedenström, Edvin, Barck-Holst, Axel January 2023 (has links)
Various forms of multimedia learning have been shown to aid learners time and time again. One form that has not been thoroughly studied is reading while listening (RWL), especially with respect to its immediate impact on reading comprehension. Furthermore, recent advancements in Text-To-Speech (TTS) have started to challenge the established notion that recorded human speech is always preferable for learning, also known as The Voice Effect. This study looked at Swedish university students with English as their second language (L2) and examined their L2 reading comprehension across three groups: Reading Only (RO), Reading-While-Listening with human-recorded spoken word (RWL-SW), and Reading-While-Listening with text-to-speech (RWL-TTS). The RO group was compared to the RWL groups, and the two RWL groups were also compared on test scores as well as on the enjoyment and aid from the narration that participants reported perceiving. Our results showed no statistically significant difference in reading comprehension between the RO group and the RWL groups; on the reading comprehension test, the RO and RWL-TTS groups achieved exactly the same number of correct answers. This suggests that RWL did not have any notable impact on reading comprehension. Furthermore, no statistically significant difference was found between the two RWL groups in test scores or in perceived enjoyment and aid from the narration. Notably, RWL-SW performed slightly worse than RWL-TTS on the comprehension test, and the reported enjoyment and perceived aid from the narration were notably similar across the two groups. This suggests that The Voice Effect was not relevant in this test.
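The group comparison reported above can be sketched as follows. The abstract does not name the specific statistical test used; a one-way ANOVA across the RO, RWL-SW, and RWL-TTS scores is one reasonable analysis, shown here with made-up data as an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ro      = rng.normal(7.0, 1.5, 30)   # hypothetical comprehension scores out of 10
rwl_sw  = rng.normal(6.8, 1.5, 30)
rwl_tts = rng.normal(7.0, 1.5, 30)

f_stat, p_value = stats.f_oneway(ro, rwl_sw, rwl_tts)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")  # p > .05 means no significant difference
```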
