Approches jointes texte/image pour la compréhension multimodale de documents / Text/image joint approaches for multimodal understanding of documents
Delecraz, Sébastien 10 December 2018
The human faculties of understanding are essentially multimodal. To understand the world around them, human beings fuse the information coming from all of their sensory receptors. Most of the documents used in automatic information processing are multimodal, for example text and images in textual documents, or images and sound in video documents. However, the processing applied to them is most often monomodal. The aim of this thesis is to propose joint processing methods, applied mainly to text and images, for multimodal documents, through two studies: one on multimodal fusion for speaker role recognition in television broadcasts, the other on the complementarity of modalities for a linguistic analysis task on corpora of images with captions. In the first study, we are interested in the analysis of audiovisual documents from television news channels. We propose an approach that uses deep neural networks to build a joint multimodal representation for the representation and fusion of modalities. In the second part of this thesis, we are interested in approaches that use several sources of multimodal information for a monomodal natural language processing task, in order to study their complementarity. We propose a complete system for correcting prepositional attachments using visual information, trained on a multimodal corpus of images with captions.
Gestion de flot de conteneurs et de véhicules dans un réseau multimodal / Managing the Flow of Containers and Vehiculs in a Multimodal Network
Hemmidy, Mohamed 06 December 2018
The aim of this work is to study the problem of managing the flow of containers and vehicles in a multimodal network. We formulate the problem as a realistic mathematical model that takes into account the different aspects of container transport and storage, and whose objective is to minimize the overall transport cost. To solve large instances, a two-level solution approach is proposed. At the first level, we build an aggregated model that is easier to solve, from which we extract the decisions on train and barge movements as well as a dual bound for the original problem. This information is used at the second level to obtain good-quality solutions to the problem, compared with the solutions CPLEX gives when solving the original model directly. Numerical results on randomly generated instances are presented and demonstrate the value of the proposed solution approach.
Theoretical analysis of the effects of bus operations on urban corridors and networks
Castrillon, Felipe 07 January 2016
Bus systems have a large passenger capacity when compared to personal vehicles and thus have the potential to improve urban mobility. However, buses that operate in mixed vehicle traffic can undermine the effectiveness of the road system, as they travel at lower speeds, take longer to accelerate and stop frequently to board and alight passengers. In traffic flow theory, buses are known as slow-moving bottlenecks that have the potential to create queue spillbacks and thus increase the probability of gridlock. Currently, traditional metropolitan transportation planning models do not account for these negative effects on roadway capacity. Also, research methods that study multimodal operations are often simulation-based or algorithmic, and so can only provide specific results for defined inputs.
The objective of this research is to model and understand the effects of bus operations (e.g., headway, number of stops, number of routes) on system performance (e.g., urban corridor and network vehicular capacity) using a parsimonious analytical approach with few parameters. The models are built using the Macroscopic Fundamental Diagram (MFD) of traffic, which provides aggregate measures of vehicle density and flow. Existing MFD theory, which accounts for corridors with only one vehicle class, is extended to include network roadway systems and bus operations. The results indicate that buses have two major effects on corridors: the moving bottleneck effect and the bus short-block effect. These corridor effects are then extended to urban networks through a vehicle density-weighted average. The models have the potential to transform urban multimodal operations and management, as they provide a simple tool to capture the aggregate performance of transportation systems.
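The density-weighted aggregation mentioned above can be illustrated with a minimal sketch. The abstract does not state which corridor quantity is averaged or how density enters the weights, so the function below is an assumption: it takes a hypothetical per-corridor value (for example a capacity in veh/h) and weights it by a hypothetical per-corridor vehicle density (veh/km). It is not the thesis model.

```python
def density_weighted_average(values, densities):
    """Average per-corridor values, weighting each by its vehicle density.

    Both arguments are hypothetical inputs chosen for illustration:
    values[i] is some corridor-level quantity and densities[i] is the
    vehicle density on the same corridor.
    """
    if len(values) != len(densities):
        raise ValueError("values and densities must have the same length")
    total_density = sum(densities)
    if total_density <= 0:
        raise ValueError("total density must be positive")
    weighted_sum = sum(v * k for v, k in zip(values, densities))
    return weighted_sum / total_density


# Two corridors: the denser corridor pulls the network value towards itself.
network_value = density_weighted_average([1800.0, 1200.0], [20.0, 40.0])
```

With these inputs the denser corridor carries twice the weight, so the result sits closer to its value than a plain mean would.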
Recalage pour imagerie multimodale tomographique sur petit animal / Registration for multimodal tomographic imaging in small animals
Bandou, Massinissa January 2014
Medical imaging now plays a central role both in fundamental research on disease development and in supporting the diagnosis of pathologies. Several imaging techniques, or modalities, make it possible to visualise biological bodies non-invasively and to provide their structural and functional properties. One of the major problems encountered is being able to analyse and process these imaging techniques in a common frame of reference. This problem, known as registration, consists in estimating a geometric transformation that spatially superimposes corresponding features between images. Two application contexts can be distinguished: monomodal registration, which deals with temporal sequences of images from the same modality, and multimodal registration, which deals with matching images from different modalities.
The TomOptUs laboratory at the Université de Sherbrooke has developed a multimodal bed, with fiducial markers, designed to hold a small animal during measurements with different modalities. It is therefore of interest to register these modalities using information based on geometric features or on pixel or voxel intensity. The laboratory has also developed a laser-scanner system that measures the surface relief of an object or animal in order to obtain profilometric images. With these images, surface registration with other imaging modalities becomes possible, insofar as the external profile of the animal is needed to perform an internal 3D reconstruction.
The scope of this research is to develop a computer program that meets the laboratory's needs for image registration using fiducial markers and for registration of profilometric images. This document covers the theory and algorithms used, as well as the detailed software design that leads to conclusive results.
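As an illustration of registration from fiducial markers, here is a minimal sketch of a least-squares rigid alignment of paired marker positions. It is reduced to 2D to stay short and dependency-free, and it assumes the markers have already been located and matched in both images. The function names and sample coordinates are hypothetical; the abstract does not say which algorithm the thesis implements.

```python
import math


def estimate_rigid_2d(source, target):
    """Least-squares rotation and translation mapping source onto target.

    source and target are equal-length lists of (x, y) marker positions,
    where source[i] and target[i] are the same physical marker.
    Returns (theta, (shift_x, shift_y)) with theta in radians.
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need at least two paired markers")
    n = len(source)
    # Centroids of both marker sets.
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n
    # Accumulate dot and cross products of the centred coordinates.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy
        bx, by = qx - tx, qy - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # Optimal rotation angle in the least-squares sense.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    shift = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, shift


def apply_rigid_2d(point, theta, shift):
    """Apply the estimated rotation and translation to one (x, y) point."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + shift[0], s * x + c * y + shift[1])


# Hypothetical markers: the target is the source rotated by 90 degrees
# and shifted by (2, 3).
markers_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
markers_b = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, shift = estimate_rigid_2d(markers_a, markers_b)
```

In 3D the same idea is usually solved with a singular value decomposition of the cross-covariance matrix of the centred marker sets. Intensity-based and surface registration, also mentioned in the abstract, need different techniques.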
Adding, retrieving and browsing content in social media and e-journalism
Alharbe, Mahmood January 2012
This thesis explores the use of avatars with facial expressions in social media and e-journalism communication interfaces. The work involved three experimental conditions. In the first, a survey (n=34) and an experiment (n=25) were carried out to explore the central problems users face when adding and retrieving comments, and methods to overcome those problems. The survey was intended to establish users' attitudes towards these metaphors. Twenty-five users from the Aljazeera Channel in Doha, Qatar took part. The first experimental condition consisted of two interfaces, TARCS (traditional adding and retrieving comments system) and CMARCS (classification multimodal adding and retrieving comments system). It was carried out to assess users' perception of unique text-with-graphic classification and multimodal metaphors in an EARCS (electronic adding and retrieving comments system) interface, in the presence and absence of an interactive context, and to assess the role of these classification interfaces in news comments in terms of usability. In the second experiment, forty users evaluated the VARCS (visual adding and retrieving comments system) and MMARCS (multimodal adding and retrieving comments system) interfaces. Both interfaces were evaluated for their effect on public opinion, as in media studies, and for effectiveness, interactivity and user satisfaction, as in HCI studies. The third experimental condition consisted of one study that investigated the impact and usability of facial expressions compared with text-with-graphic and multimodal metaphors. Sixty-six users from Al-Arabiya Channel in Dubai, UAE took part in these two experiments. The results show that users had problems with adding and retrieving comments in social media, such as missing data and lack of organisation.
Also, the new classification performed better and faster under an interface that implemented avatars with specific facial expressions than under textual and multimodal interfaces. Practical guidelines are also provided to assist multimedia designers who use avatars with facial expressions in interactive e-journalism systems, as well as their impact on public opinion.
Multimodal e-assessment : an empirical study
Algahtani, Amirah January 2015
Due to the availability of technology, there has been a shift from traditional assessment methods to e-assessment methods designed to support learning. With this development there is a need to address the suitability and effectiveness of the e-assessment interface. One development in the e-assessment interface has been the use of the multimodal metaphor. However, the effectiveness of multimodality in terms of usability, and its suitability for achieving assessment aims, has not been fully addressed. Thus, there is a need to determine the impact of multimodality on the effectiveness of e-assessment and to reveal the benefits, primarily to the user. Moreover, those involved in development and assessment should be aware of potential impacts and benefits. This thesis investigates the role and effectiveness of multimodal metaphors in e-assessment; specifically, it assesses the effect of multimodal metaphors, alone or in combination, on usability in e-assessment. Usability includes efficiency, effectiveness and user satisfaction. The empirical research described in this study consisted of three experiments of 30 participants each, evaluating the effect of: description text, avatars and images individually; avatars, description text and recorded speech in combination with images; and, finally, avatars with whole-body gestures, earcons and auditory icons. The experimental stages were designed as a progression towards the main focus of the study, which was the effectiveness of the full-body gesture avatar, considered to be the latest development in multimodal metaphors. The experimentation also assessed the role that an avatar could play as a tutor in e-assessment interfaces. The results demonstrated the effectiveness and applicability of metaphors for enhancing e-assessment usability. This was achieved through a more effective interaction between the user and the assessment interface.
A set of empirically derived guidelines for the design and use of these metaphors is also provided, in order to generate more usable e-assessment interfaces.
Constructing scientific knowledge in the classroom : a multimodal analysis of conceptual change and the significance of gesture
Callinan, Carol Jane January 2014
Constructivism remains one of the most influential views of how children learn science today. Research investigating learning from within this viewpoint has led to the development of a range of theoretical models, most of which aim to explain the underlying processes associated with conceptual change. Such models range in depth and scope, with some attributing change to purely cognitive processes while others suggest a role for social factors. Contemporary research has also begun to explore links between the role of practical activity, skills development and language. This study utilises a cross-sectional design in order to investigate the development of children’s ideas and concepts related to two areas of the English National Curriculum for Science: ‘electricity’ and ‘floating and sinking’. A new and innovative multimodal methodology combining practical science activities and traditional / conventional perspectives alongside interview and observational protocols is presented. Multimodal research proposes that knowledge and meaning are transmitted through a range of response types including language, drawings and gesture. The participants in this study were children aged 7, 11 and 14 years attending four schools in the East Midlands region. Results demonstrate that the children’s ideas could be developed using conceptual challenge tasks. The gestures that the children produced were categorised according to five different forms: referential, representative, expressive, thinking and social, often containing information about their science ideas that was not included in other response types. The results also begin to uncover how meaning is socially constructed and supported. These results form the basis of a critique of methodology intended to re-evaluate and inform debate arising from different models of conceptual change.
The potential importance of studying children’s gestures in classroom settings for providing important cues and clues to underlying thoughts that may not be present in verbal or other more conventional responses alone is highlighted.
Superhumans: How teachers use graphic novels to encourage student engagement in learning
2016 April 1900
This qualitative study explored how teachers used graphic novels to encourage student engagement in learning. A case study approach was used to achieve my two research objectives: 1) to examine current research about graphic novels and pedagogical understandings relevant to the study of graphic novels as a pedagogical resource, and 2) to identify the pedagogical understandings of four secondary language arts teachers using graphic novels to encourage student engagement in learning.
Action research framed the approach used to examine the collaborative practices of four teacher participants and myself as we learned about graphic novels. Interviews, focus groups, observations, and artifact analysis all contributed to highlighting the pedagogical understandings of the participants.
The findings confirmed previous scholarship that graphic novels can be a beneficial pedagogical tool in ELA classrooms, further encouraging student engagement in learning and valuing students' out-of-school interests. The findings also confirmed that teachers go through a unique, collaborative, and at times individualized process of learning before teaching a new resource, but that when preparing and sharing graphic novels with students they preferred to frame their units using before, during, and after comprehension strategies and activities. The findings also affirmed that resource selection and evaluation were highly influenced by the teachers' prior interests and understanding of curriculum.
The study also produced some interesting findings that suggested the need for pre-service and in-service professional development opportunities around graphic novels, so that teachers can be prepared to support a growing multimodal and multiliterate population. Furthermore, and unexpectedly, the participants each developed a passion for graphic novels where they previously had none, and all continue to use graphic novels in their classrooms and read them for pleasure.
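Multimodalt skrivande - förutsättningar och lärandemöjligheter : Litteraturstudie om mellanstadieelevers lärandemöjligheter vid multimodalt skrivande inom svenskämnet och förutsättningar för en multimodal skrivundervisning / Multimodal writing - prerequisites and learning opportunities : a literature study of middle-school pupils' learning opportunities in multimodal writing within the subject of Swedish and the prerequisites for multimodal writing instruction
Sundström, Jessica January 2015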
The Swedish syllabus states that pupils are to develop multimodal writing within the subject of Swedish. Multimodal writing means that words, images and sound are combined and interact. The main purpose of this literature study has been to examine what multimodal writing in the subject of Swedish for grades 4-6 can look like, which competences and resources are required to conduct multimodal writing instruction, and what kind of learning multimodal writing can give rise to in pupils. The literature study shows that multimodal writing can occur in both analogue and digital form. It also shows that Swedish research in the area is very limited. The articles and theses included in the literature study show that researchers agree that teachers need to develop their knowledge of different sign systems, such as auditory and visual ones, in order to make pupils aware of the meaning potential and interplay of those sign systems. Multimodal writing gives rise to a form of coordinated learning, since it is a complex process that requires pupils to receive explicit instruction on the relevant digital software and on the meaning-making of the sign systems. Multimodal writing is a common feature outside school, but should be given access to the school world. This presupposes that digital resources are available and that teachers have a positive attitude towards the development of multimodal writing.
Score-level fusion for multimodal biometrics
Alsaade, Fawaz January 2008
This thesis describes research into the score-level fusion process in multimodal biometrics. The emphasis of the research is on the fusion of face and voice biometrics in the two recognition modes of verification and open-set identification. The growing interest in the use of multiple modalities in biometrics is due to its potential for overcoming certain important limitations of unimodal biometrics. One of the factors important to the accuracy of a multimodal biometric system is the choice of the technique deployed for data fusion. To address this issue, investigations are carried out into the relative performance of several statistical data fusion techniques for combining the score information in both unimodal and multimodal biometrics (i.e. speaker and/or face verification). Another important issue associated with any multimodal technique is that of variations in the biometric data. Such variations are reflected in the corresponding biometric scores, and can thereby adversely influence the overall effectiveness of multimodal biometric recognition. To address this problem, different methods are proposed and investigated. The first approach is based on estimating the relative quality aspects of the test scores and then passing them into the fusion process either as features or as weights. This approach makes it possible to tackle the data variations by adjusting the weight of each modality according to its relative quality. Another approach considered for tackling the effects of data variations is based on the use of score normalisation mechanisms. Whilst score normalisation has been widely used in voice biometrics, its effectiveness in other biometrics has not been previously investigated. This method is shown to considerably improve the accuracy of multimodal biometrics by appropriately correcting the scores from degraded modalities prior to the fusion process.
The investigations in this work are also extended to the combination of score normalisation with relative quality estimation. The experimental results show that such a combination is more effective than the use of only one of these techniques with the fusion process. The thesis presents a thorough description of the research undertaken, details the experimental results and provides a comprehensive analysis of them.
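The combination of score normalisation and quality-based weighting described above can be sketched as follows. This is an illustration under assumptions, not the method evaluated in the thesis: z-score normalisation against a cohort of reference scores stands in for the unspecified normalisation mechanism, a weighted sum stands in for the fusion rule, and all function names, quality values and sample scores are hypothetical.

```python
from statistics import mean, pstdev


def z_normalise(score, cohort_scores):
    """Normalise a raw match score against a cohort of reference scores."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        # Degenerate cohort: no spread to normalise against.
        return 0.0
    return (score - mu) / sigma


def fuse_scores(face_score, voice_score, face_quality, voice_quality):
    """Weighted-sum fusion with weights proportional to relative quality."""
    total_quality = face_quality + voice_quality
    if total_quality <= 0:
        raise ValueError("at least one modality must have positive quality")
    face_weight = face_quality / total_quality
    return face_weight * face_score + (1.0 - face_weight) * voice_score


# Hypothetical trial: normalise each modality, then fuse with more weight
# on the face score because the voice sample is assumed to be degraded.
face_norm = z_normalise(0.82, [0.40, 0.55, 0.47, 0.60])
voice_norm = z_normalise(12.5, [9.0, 11.0, 10.0, 10.5])
fused = fuse_scores(face_norm, voice_norm, face_quality=3.0, voice_quality=1.0)
```

Normalising first puts the two modalities on a comparable scale, so that the quality weights, rather than the raw score ranges, determine each modality's influence on the fused score.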
