Global ETD Search

1	Polarimetric Imagery for Object Pose Estimation Siefring, Matthew D. 15 May 2023 (has links) No description available. Electrical Engineering Optics Polarimetric Imagery visible-spectrum deep-learning object pose estimation CNN late-fusion Stokes-products dataset
2	Fusion tardive asynchrone appliquée à la reconnaissance des gestes / Asyncronous late fusion applied to gesture recognition Saade, Philippe 11 May 2017 (has links) Dans cette thèse, nous nous intéressons à la reconnaissance de l'activité humaine. Nous commençons par proposer notre propre définition d'une action : une action est une séquence prédéfinie de gestes simples et concaténés. Ainsi, des actions similaires sont composées par les mêmes gestes simples. Chaque réalisation d'une action (enregistrement) est unique. Le corps humain et ses articulations vont effectuer les mêmes mouvements que celles d'un enregistrement de référence, avec des variations d'amplitude et de dynamique ne devant pas dépasser certaines limites qui conduiraient à un changement complet d'action. Pour effectuer nos expérimentations, nous avons capturé un jeu de données contenant des variations de base, puis fusionné certains enregistrements avec d'autres actions pour former un second jeu induisant plus de confusion au cours de la classification. Ensuite, nous avons capturé trois autres jeux contenant des propriétés intéressantes pour nos expérimentations avec la Fusion Tardive Asynchrone (ou Asynchronous Late Fusion notée ALF). Nous avons surmonté le problème des petits jeux non discriminants pour la reconnaissance d'actions en étendant un ensemble d'enregistrements effectués par différentes personnes et capturés par une caméra RGB-D. Nous avons présenté une nouvelle méthode pour générer des enregistrements synthétiques pouvant être utilisés pour l'apprentissage d'algorithmes de reconnaissance de l'activité humaine. La méthode de simulation a ainsi permis d'améliorer les performances des différents classifieurs. Un aperçu général de la classification des données dans un contexte audiovisuel a conduit à l'idée de l'ALF. En effet, la plupart des approches dans ce domaine classifient les flux audio et vidéo séparément, avec des outils différents. 
Chaque séquence temporelle est analysée séparément, comme dans l'analyse de flux audiovisuels, où la classification délivre des décisions à des instants différents. Ainsi, pour déduire la décision finale, il est important de fusionner les décisions prises séparément, d'où l'idée de la fusion asynchrone. Donc, nous avons trouvé intéressant d'appliquer l'ALF à des séquences temporelles. Nous avons introduit l'ALF afin d'améliorer la classification temporelle appliquée à des algorithmes de fusion tardive tout en justifiant l'utilisation d'un modèle asynchrone lors de la classification des données temporelles. Ensuite, nous avons présenté l'algorithme de l'ALF et les paramètres utilisés pour l'optimiser. Enfin, après avoir mesuré les performances de classifications avec différents algorithmes et jeux de données, nous avons montré que l'ALF donne de meilleurs résultats qu'une solution synchrone simple. Etant donné qu'il peut être difficile d'identifier les jeux de données compatibles avec l'ALF, nous avons construit des indicateurs permettant d'en extraire des informations statistiques. / In this thesis, we took interest in human action recognition. Thus, it was important to define an action. We proposed our own definition: an action is a predefined sequence of concatenated simple gestures. The same actions are composed of the same simple gestures. Every performance of an action (recording) is unique. Hence, the body and the joints will perform the same movements as the reference recording, with changes of dynamicity of the sequence and amplitude in the DOF. We note that the variations in the amplitude and dynamicity must not exceed certain boundaries in order not to lead to entirely different actions. For our experiments, we captured a dataset composed of actions containing basic variations. We merged some of those recordings with other actions to form a second dataset, consequently inducing more confusion than the previous one during the classification. 
We also captured three other datasets with properties that are interesting for our experimentations with the ALF (Asynchronous Late Fusion). We overcame the problem of non-discriminatory actions datasets for action recognition by enlarging a set of recordings performed by different persons and captured by an RGB-D camera. We presented a novel method for generating synthetic recordings, for training action recognition algorithms. We analyzed the parameters of the method and identified the most appropriate ones, for the different classifiers. The simulation method improved the performances while classifying different datasets. A general overview of data classification starting from the audio-visual context led to the ALF idea. In fact, most of the approaches in the domain classify sound and video streams separately with different tools. Every temporal sequence from a recording is analyzed distinctly, as in audiovisual stream analysis, where the classification outputs decisions at various time instants. Therefore, to infer the final decision, it is important to fuse the decisions that were taken separately, hence the idea of the asynchronous fusion. As a result, we found it interesting to implement the ALF in temporal sequences. We introduced the ALF model for improving temporal events classification applied on late fusion classification algorithms. We showed the reason behind the use of an asynchronous model when classifying datasets with temporal properties. Then, we introduced the algorithm behind the ALF and the parameters used to tune it. Finally, according to computed performances from different algorithms and datasets, we showed that the ALF improves the results of a simple Synchronous solution in most of the cases. As it can be difficult for the user of the ALF solution to determine which datasets are compatible with the ALF, we built indicators to compare the datasets by extracting statistical information from the recordings. 
We developed two indexes, the ASI and the ASIP, combined into a final index (the ASIv), to provide information concerning the compatibility of a dataset with the ALF. We evaluated the performance of the ALF on the segmentation of action series and compared the results of the synchronous and ALF solutions. The method that we proposed increased the performance. We analyzed human movement and gave a general definition of an action. Later, we improved this definition and proposed a "visual definition" of an action. With the aid of the ALF model, we focus on the parts and joints of an action that are the most discriminant and display them in an image. In the end, we proposed multiple paths for future studies. The most important ones are: - Working on a process to find the ALF's number of parts using the ASIv. - Reducing the complexity by finding the discriminant joints and features thanks to the ALF properties. - Studying the MD-DTW features in depth, since the algorithm depends on the choice of the features. - Implementing a DNN for comparison purposes. - Developing the confidence coefficient. Fusion tardive Reconnaissance de gestes Classification de gestes Analyse temporelle Simulation des gestes Late Fusion Gesture Recognition Gesture Classification Temporal Analysis Gesture Simulation
3	Analyse et interprétation de scènes visuelles par approches collaboratives / Analysis and interpretation of visual scenes through collaborative approaches / Analiza si interpretarea scenelor vizuale prin abordari colaborative Strat, Sabin Tiberius 04 December 2013 (has links) Les dernières années, la taille des collections vidéo a connu une forte augmentation. La recherche et la navigation efficaces dans des telles collections demande une indexation avec des termes pertinents, ce qui nous amène au sujet de cette thèse, l’indexation sémantique des vidéos. Dans ce contexte, le modèle Sac de Mots (BoW), utilisant souvent des caractéristiques SIFT ou SURF, donne de bons résultats sur les images statiques. Notre première contribution est d’améliorer les résultats des descripteurs SIFT/SURF BoW sur les vidéos en pré-traitant les vidéos avec un modèle de rétine humaine, ce qui rend les descripteurs SIFT/SURF BoW plus robustes aux dégradations vidéo et qui leurs donne une sensitivité à l’information spatio-temporelle. Notre deuxième contribution est un ensemble de descripteurs BoW basés sur les trajectoires. Ceux-ci apportent une information de mouvement et contribuent vers une description plus riche des vidéos. Notre troisième contribution, motivée par la disponibilité de descripteurs complémentaires, est une fusion tardive qui détermine automatiquement comment combiner un grand ensemble de descripteurs et améliore significativement la précision moyenne des concepts détectés. Toutes ces approches sont validées sur les bases vidéo du challenge TRECVid, dont le but est la détection de concepts sémantiques visuels dans un contenu multimédia très riche et non contrôlé. / During the last years, we have witnessed a great increase in the size of digital video collections. Efficient searching and browsing through such collections requires an indexing according to various meaningful terms, bringing us to the focus of this thesis, the automatic semantic indexing of videos. 
Within this topic, the Bag of Words (BoW) model, often employing SIFT or SURF features, has shown good performance, especially on static images. As our first contribution, we propose to improve the results of SIFT/SURF BoW descriptors on videos by pre-processing the videos with a model of the human retina, thereby making these descriptors more robust to video degradations and sensitive to spatio-temporal information. Our second contribution is a set of BoW descriptors based on trajectories. These give additional motion information, leading to a richer description of the video. Our third contribution, motivated by the availability of complementary descriptors, is a late fusion approach that automatically determines how to combine a large set of descriptors, giving a significant increase in the average precision of detected concepts. All the proposed approaches are validated on the TRECVid challenge datasets, which focus on visual concept detection in very large and uncontrolled multimedia content. Indexation sémantique Vidéo Sac de mots SIFT SURF Rétine Spatio-temporel Trajectoires Fusion tardive Semantic indexing Video Bag of Words SIFT SURF Retina Spatio-temporal Trajectories Late fusion
4	Multimodal Model for Construction Site Aversion Classification Appelstål, Michael January 2020 (has links) Aversion on construction sites can be everything from missing material, fire hazards, or insufficient cleaning. These aversions appear very often on construction sites and the construction company needs to report and take care of them in order for the site to run correctly. The reports consist of an image of the aversion and a text describing the aversion. Report categorization is currently done manually which is both time and cost-ineffective. The task for this thesis was to implement and evaluate an automatic multimodal machine learning classifier for the reported aversions that utilized both the image and text data from the reports. The model presented is a late-fusion model consisting of a Swedish BERT text classifier and a VGG16 for image classification. The results showed that an automated classifier is feasible for this task and could be used in real life to make the classification task more time and cost-efficient. The model scored a 66.2% accuracy and 89.7% top-5 accuracy on the task and the experiments revealed some areas of improvement on the data and model that could be further explored to potentially improve the performance. Machine Learning Artificial Intelligence Convolutional Neural Networks Natural Language Processing BERT Multi modal AI ML CNN late-fusion classification Engineering and Technology Teknik och teknologier
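The record above reports both plain accuracy and top-5 accuracy. As a minimal sketch of how the two metrics relate, assuming per-class scores are available for each report, the hypothetical helper `top_k_accuracy` below counts a sample as correct when its true label is among the k highest-scoring classes. The function name and the toy scores are illustrative and are not taken from the thesis.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int = 5) -> float:
    """Fraction of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data: two samples, three classes (illustrative values only).
toy_scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3]]
toy_labels = [2, 1]
print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

With k=1 the metric reduces to ordinary accuracy, which is why a top-5 figure is never lower than the accuracy figure for the same model.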
5	Analyse et description de la morphologie foliaire : application à la classification et l'identification d'espèces de plantes / Analysis and description of leaf morphology : application to the classification and identification of plant species Mzoughi, Olfa 14 May 2016 (has links) De nos jours, l’identification automatique des espèces de plantes par l’analyse d’images, devient incontournable pour faire perdurer, standardiser voire approfondir les connaissances relatives à la communauté végétale. Cette thèse aborde le problème d’identification automatique des espèces de plantes en utilisant les images de feuilles. Elle s’attaque à deux principaux challenges: Le premier challenge est le grand nombre et la large variabilité de la morphologie foliaire des espèces et le deuxième challenge est la variabilité intra-espèces qui se manifeste localement au niveau de régions particulières des feuilles. Pour pallier à ces deux problèmes, un retour à la botanique et notamment aux concepts botaniques foliaires a été établi pour définir une structuration automatique des feuilles à deux niveaux: Le premier niveau concerne un schéma de catégorisation selon les deux concepts botaniques “arrangement” et “lobation”. Le deuxième niveau consiste à définir les parties sémantiques qui composent la feuille. L’approche de la thèse s’articule autour de deux principaux volets: Dans le premier volet, nous nous intéressons à mettre en place cette structuration guidée par la sémantique botanique en définissant des propriétés géométriques simples corrélées avec les définitions et les observations botaniques. Dans le deuxième volet, nous étudions la faisabilité et la pertinence d’intégrer cette structuration dans la chaîne d’identification. Particulièrement, nous établissons des recherches ciblées dans les catégories et nous définissons des modèles de parties à significations botaniques. 
Nous établissons notre évaluation sur les deux bases d’images de Scans de feuilles ImageCLEF 2011 et ImageCLEF 2012. Nous comparons notre approche par rapport à un schéma d’identification de référence, appliqué sur la totalité de la base et en utilisant l’image entière, et par rapport à plusieurs méthodes référencées dans la littérature. / Nowadays, automatic identification of plant species by image analysis has become crucial to maintain, standardize or deepen knowledge about the plant community. This thesis focuses on the problem of automatic identification of plant species using leaf images. It addresses two main challenges: The first challenge is the large number of species and the high variability in foliar morphology across them. The second challenge is the intra-species variability which occurs locally at particular regions of leaves. To overcome these two problems, a return to botany and especially to leaf botanical concepts is established in order to define an automatic structuring of leaves at two levels: The first level concerns a categorisation scheme according to the botanical concepts “arrangement” and “lobation”. The second level consists in decomposing leaves into semantic parts. The approach of the thesis is based on two key parts: In the first part, we focus on establishing this botany-guided structuring process by defining simple geometric properties correlated with botanical definitions and observations. In the second part, we investigate the feasibility and relevance of integrating this structuring process into the identification scheme. In particular, we perform targeted searches within categories and we define part-based models with botanical meaning. Experiments are conducted using the ImageCLEF 2011 and 2012 leaf scan image databases. We compare our approach with a reference identification scheme, applied to the whole database and using the entire images, and with several methods referenced in the literature. 
Feuilles Botanique Caractères morphologiques Structuration sémantique Catégorisation Partition Modèles statistiques Fusion tardive Identification des espèces Leaves Botany Morphological characters Semantic structuring Categorisation Partition Statistical models Late fusion Species identification
6	Multimodal Classification of Second-Hand E-Commerce Ads / Multimodal klassiciering av annonser på Second-Hand-Marknadsplatser Åberg, Ludvig January 2018 (has links) In second-hand e-commerce, categorization of new products is typically done by the seller. Automating this process makes it easier to upload ads and could lower the number of incorrectly categorized ads. Automatic ad categorization also makes it possible for a second-hand e-commerce platform to use a more detailed category system, which could make the shopping experience better for potential buyers. Product ad categorization is typically addressed as a text classification problem as most metadata associated with products are textual. By including image information, i.e. using a multimodal approach, better performance can however be expected. The work done in this thesis evaluates different multimodal deep learning models for the task of ad categorization on data from Blocket.se. We examine late fusion models, where the modalities are combined at decision level, and early fusion models, where the modalities are combined at feature level. We also introduce our own approach Text Based Visual Attention (TBVA), which extends the image CNN Inception v3 with an attention mechanism to incorporate textual information. For all models evaluated, the text classifier fastText is used to process text data and the Inception v3 network to process image data. Our results show that the late fusion models perform best in our setting. We conclude that these models generally learn which of the baseline models to ’trust’, while early fusion and the TBVA models learn more abstract concepts. As future work, we would like to examine how the TBVA models perform on other tasks, such as ad similarity. / Produkter som läggs ut på marknadsplatser, såsom Blocket.se, kategoriseras oftast av säljaren själv. 
Att automatisera processen för kategorisering gör det därför både enklare och snabbare att lägga upp annonser och kan minska antalet produkter med felaktig kategori. Automatisk kategorisering gör det också möjligt för marknadsplatsen att använda ett mer detaljerat kategorisystem, vilket skulle kunna effektivisera sökandet efter produkter för potentiella köpare. Produktkategorisering adresseras ofta som ett klassificeringsproblem för text, eftersom den största delen av produktinformationen finns i skriftlig form. Genom att också inkludera produktbilder kan vi dock förvänta oss bättre resultat. I den här uppsatsen evalueras olika metoder för att använda både bild och text för annonsklassificering av data från blocket.se. I synnerhet undersöks late fusion modeller, där informationen från modaliteterna kombineras i samband med klassificeringen, samt early fusion modeller, där modaliteterna istället kombineras på en abstrakt nivå innan klassificeringen. Vi introducerar också vår egen modell Text Based Visual Attention (TBVA), en utvidgning av bildklassificeraren Inception v3 [1], som använder en attention mekanism för att inkorporera textinformation. För alla modeller som beskrivs i denna uppsats används textklassificeraren fastText [2] för att processa text och bildklassificeraren Inception v3 för att processa bild. Våra resultat visar att late fusion modeller presterar bäst med vår data. I slutsatsen konstateras att late fusion modellerna lär sig vilka fall den ska 'lita' på text eller bild informationen, där early fusion och TBVA modellerna istället lär sig mer abstrakta koncept. Som framtida arbete tror vi det skulle vara av värde att undersöka hur TBVA modellerna presterar på andra uppgifter, såsom att bedöma likheter mellan annonser. Machine Learning Classification Multimodal Classification Multimodal Learning Representation Learning Late Fusion Early Fusion Visual Attention Marketplace Second Hand E-commerce Blocket.se Computer Sciences Datavetenskap (datalogi)
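The record above distinguishes late fusion (modalities combined at decision level) from early fusion (modalities combined at feature level). The sketch below illustrates only that distinction in plain Python. The function names, the fixed averaging weight, and the toy numbers are assumptions made for illustration; the thesis itself builds its models on fastText and Inception v3 and, per the abstract, learns how to combine them, which this sketch does not attempt to reproduce.

```python
from typing import Sequence


def late_fusion(text_probs: Sequence[float],
                image_probs: Sequence[float],
                text_weight: float = 0.5) -> list[float]:
    """Decision-level fusion: weighted average of two per-class probability vectors."""
    if len(text_probs) != len(image_probs):
        raise ValueError("both classifiers must score the same set of classes")
    w = text_weight
    return [w * t + (1.0 - w) * i for t, i in zip(text_probs, image_probs)]


def early_fusion(text_features: Sequence[float],
                 image_features: Sequence[float]) -> list[float]:
    """Feature-level fusion: concatenate feature vectors for one joint classifier."""
    return list(text_features) + list(image_features)


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical outputs of a text classifier and an image classifier over three categories.
fused = late_fusion([0.7, 0.2, 0.1], [0.2, 0.5, 0.3])
print(argmax(fused))                     # predicted category index
print(early_fusion([1.0, 2.0], [3.0]))   # joint feature vector fed to a single model
```

In the late-fusion path each modality keeps its own classifier and only the outputs meet, whereas in the early-fusion path a single downstream classifier would be trained on the concatenated vector.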
7	Nuevas contribuciones en aplicaciones de fusión multimodal de bioseñales Pereira González, Luis Manuel 26 December 2024 (has links) [ES] Esta tesis aborda el problema de fusión de datos en el ámbito de la neurociencia. El objetivo principal de este estudio es la fusión de modalidades, con énfasis en la fusión bimodal de señales biomédicas fMRI+EEG y de ECG+EEG. Las técnicas de fusión de datos tienen como objetivo alcanzar la exactitud y precisión en la toma de decisiones que sería más difícil con una sola modalidad. Hemos hecho una extensa revisión bibliográfica que contempla la fusión temprana y la fusión tardía de la siguiente manera: fusión temprana a nivel de sensores; fusión temprana a nivel de características; fusión tardía a nivel de scores; y fusión tardía a nivel de decisiones. En cada uno de esos apartados se presenta una tabla comparativa con las debilidades y fortalezas de cada método, así como los trabajos más citados. También hemos hecho aportes teóricos en esta área abordando el tema de la comparación entre la fusión temprana y la fusión tardía (soft y hard) para un problema multimodal de dos clases, dando elementos sobre la opción más adecuada a la hora de seleccionar la fusión temprana o tardía. Para este análisis hemos asumido inicialmente el conocimiento de los modelos utilizados, para después considerar modelos donde hay que estimar una serie de parámetros a partir de un conjunto de entrenamiento. El análisis se ha hecho para datos incorrelados y se ha extendido a datos con matrices de covarianza arbitrarias. Hemos realizado un estudio experimental como complemento del capítulo teórico. A partir de cuatro experimentos diferentes se destaca la efectividad de la fusión de datos multimodales para la mejora del rendimiento de los clasificadores. 
Los métodos de fusión y los clasificadores probados mostraron consistentemente un rendimiento superior en términos de métricas como el F1 score, la precisión, AUC y APR, en comparación con el uso de una sola modalidad de datos. Los resultados logrados subrayan la importancia de la fusión de datos en aplicaciones neurocientíficas y abren nuevas posibilidades para el desarrollo de sistemas de diagnóstico más precisos y robustos. / [CA] Aquesta tesi aborda el problema de la fusió de dades en l'àmbit de la neurociència. L'objectiu principal d'aquest estudi és la fusió de modalitats, amb èmfasi en la fusió bimodal de senyals biomèdiques fMRI+EEG i d'ECG+EEG. Les tècniques de fusió de dades tenen com a objectiu assolir l'exactitud i precisió en la presa de decisions que seria més difícil amb una sola modalitat. Hem fet una extensa revisió bibliogràfica que contempla la fusió primerenca i la fusió tardana de la següent manera: fusió primerenca a nivell de sensors; fusió primerenca a nivell de característiques; fusió tardana a nivell de puntuacions; i fusió tardana a nivell de decisions. En cadascun d'aquests apartats es presenta una taula comparativa amb les debilitats i fortaleses de cada mètode, així com els treballs més citats. També hem fet aportacions teòriques en aquesta àrea abordant el tema de la comparació entre la fusió primerenca i la fusió tardana (suau i dura) per a un problema multimodal de dues classes, donant elements sobre l'opció més adequada a l'hora de seleccionar la fusió primerenca o tardana. Per a aquesta anàlisi, hem assumit inicialment el coneixement dels models utilitzats, per després considerar models on cal estimar una sèrie de paràmetres a partir d'un conjunt d'entrenament. L'anàlisi s'ha fet per a dades incorrelades i s'ha estès a dades amb matrius de covariància arbitràries. Hem realitzat un estudi experimental com a complement del capítol teòric. 
A partir de quatre experiments diferents es destaca l'efectivitat de la fusió de dades multimodals per a la millora del rendiment dels classificadors. Els mètodes de fusió i els classificadors provats van mostrar constantment un rendiment superior en termes de mètriques com el F1 score, la precisió, AUC i APR, en comparació amb l'ús d'una sola modalitat de dades. Els resultats obtinguts subratllen la importància de la fusió de dades en aplicacions neurocientífiques i obrin noves possibilitats per al desenvolupament de sistemes de diagnòstic més precisos i robusts. / [EN] This thesis addresses the problem of data fusion in the field of neuroscience. The main objective of this study is to explore multimodal fusion, with an emphasis on bimodal fusion of biomedical signals such as fMRI+EEG and ECG+EEG. Data fusion techniques aim to achieve accuracy and precision in decision-making that would be more challenging with a single modality. We have conducted an extensive literature review covering early fusion and late fusion, as follows: early fusion at the sensor level, early fusion at the feature level, late fusion at the score level, and late fusion at the decision level. In each of these sections, we present a comparative table outlining the strengths and weaknesses of each method, as well as the most cited works. We have also made theoretical contributions to this area by addressing the comparison between early and late fusion (both soft and hard) for a two-class multimodal problem, providing insights into the most suitable choice between early and late fusion. For this analysis, we initially assumed knowledge of the models used, then considered scenarios where a series of parameters must be estimated from a training set. The analysis was conducted for uncorrelated data and extended to data with arbitrary covariance matrices. We conducted an experimental study to complement the theoretical chapter. 
Based on four different experiments, the effectiveness of multimodal data fusion in enhancing classifier performance was highlighted. The tested fusion methods and classifiers consistently demonstrated superior performance in terms of metrics such as F1 score, precision, AUC, and APR compared to using a single data modality. The results emphasize the importance of data fusion in neuroscientific applications and open up new possibilities for developing more accurate and robust diagnostic systems. / Pereira González, LM. (2024). Nuevas contribuciones en aplicaciones de fusión multimodal de bioseñales [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/213614 Biomedical signals Late fusion Early fusion Data fusion Neuroscience Machine learning algorithms Fusion methods Classifiers Algoritmos de aprendizaje automático Neurociencia Fusión de datos Fusión temprana Fusión tardía Señales biomédicas Métodos de fusión Clasificadores TEORÍA DE LA SEÑAL Y COMUNICACIONES
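The record above contrasts late fusion at the score level (soft) with late fusion at the decision level (hard) for a two-class problem. A minimal sketch of that contrast follows, assuming each modality emits a positive-class score in [0, 1]. The 0.5 threshold, the equal weighting, and the rule that a tied vote falls back to class 0 are assumptions chosen for illustration and are not the thesis's analysis.

```python
from typing import Sequence


def soft_fusion(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Score-level fusion: average the positive-class scores, then threshold once."""
    return int(sum(scores) / len(scores) >= threshold)


def hard_fusion(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Decision-level fusion: threshold each modality, then take a strict majority vote.

    Assumption: a tied vote is resolved in favour of class 0.
    """
    votes = [int(s >= threshold) for s in scores]
    return int(2 * sum(votes) > len(votes))


# Hypothetical positive-class scores from two modalities (for example EEG and ECG).
modal_scores = [0.9, 0.4]
print(soft_fusion(modal_scores))
print(hard_fusion(modal_scores))
```

On these toy scores the two rules disagree: the soft rule keeps the confidence of the first modality, while the hard rule sees one vote each and falls back to the tie rule.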
