Global ETD Search

1	Contributions on 3D Human Computer-Interaction using Deep approaches Castro-Vargas, John Alejandro 16 March 2023 (has links) There are many challenges facing society today, both socially and industrially. Whether it is to improve productivity in factories or with the intention of improving the quality of life of people in their homes, technological advances in robotics and computing have led to solutions to many problems in modern society. These areas are of great interest and are in constant development, especially in societies with a relatively ageing population. In this thesis, we address different challenges in which robotics, artificial intelligence and computer vision are used as tools to propose solutions oriented to home assistance. These tools can be organised into three main groups: “Grasping Challenges”, where we have addressed the problem of performing robot grasping in domestic environments; “Hand Interaction Challenges”, where we have addressed the detection of static and dynamic hand gestures, using approaches based on DeepLearning and GeometricLearning; and finally, “Human Behaviour Recognition”, where using a machine learning model based on hyperbolic geometry, we seek to group the actions that performed in a video sequence. CNN Hand gesture Hand pose Self-supervised learning
2	Self-Supervised Remote Sensing Image Change Detection and Data Fusion Chen, Yuxing 27 November 2023 (has links) Self-supervised learning models, which are called foundation models, have achieved great success in computer vision. Meanwhile, the limited access to labeled data has driven the development of self-supervised methods in remote sensing tasks. In remote sensing image change detection, the generative models are extensively utilized in unsupervised binary change detection tasks, while they overly focus on pixels rather than on abstract feature representations. In addition, the state-of-the-art satellite image time series change detection approaches fail to effectively leverage the spatial-temporal information of image time series or generalize well to unseen scenarios. Similarly, in the context of multimodal remote sensing data fusion, the recent successes of deep learning techniques mainly focus on specific tasks and complete data fusion paradigms. These task-specific models lack of generalizability to other remote sensing tasks and become overfitted to the dominant modalities. Moreover, they fail to handle incomplete modalities inputs and experience severe degradation in downstream tasks. To address these challenges associated with individual supervised learning models, this thesis presents two novel contributions to self-supervised learning models on remote sensing image change detection and multimodal remote sensing data fusion. The first contribution proposes a bi-temporal / multi-temporal contrastive change detection framework, which employs contrastive loss on image patches or superpixels to get fine-grained change maps and incorporates an uncertainty method to enhance the temporal robustness. In the context of satellite image time series change detection, the proposed approach improves the consistency of pseudo labels through feature tracking and tackles the challenges posed by seasonal changes in long-term remote sensing image time series using supervised contrastive loss and the random walk loss in ConvLSTM. The second contribution develops a self-supervised multimodal RS data fusion framework, with a specific focus on addressing the incomplete multimodal RS data fusion challenges in downstream tasks. Within this framework, multimodal RS data are fused by applying a multi-view contrastive loss at the pixel level and reconstructing each modality using others in a generative way based on MultiMAE. In downstream tasks, the proposed approach leverages a random modality combination training strategy and an attention block to enable fusion across modal-incomplete inputs. The thesis assesses the effectiveness of the proposed self-supervised change detection approach on single-sensor and cross-sensor datasets of SAR and multispectral images, and evaluates the proposed self-supervised multimodal RS data fusion approach on the multimodal RS dataset with SAR, multispectral images, DEM and also LULC maps. The self-supervised change detection approach demonstrates improvements over state-of-the-art unsupervised change detection methods in challenging scenarios involving multi-temporal and multi-sensor RS image change detection. Similarly, the self-supervised multimodal remote sensing data fusion approach achieves the best performance by employing an intermediate fusion strategy on SAR and optical image pairs, outperforming existing unsupervised data fusion approaches. Notably, in incomplete multimodal fusion tasks, the proposed method exhibits impressive performance on all modal-incomplete and single modality inputs, surpassing the performance of vanilla MultiViT, which tends to overfit on dominant modality inputs and fails in tasks with single modality inputs.
3	From Pixels to Prices with ViTMAE : Integrating Real Estate Images through Masked Autoencoder Vision Transformers (ViTMAE) with Conventional Real Estate Data for Enhanced Automated Valuation / Från pixlar till priser med ViTMAE : Integrering av bostadsbilder genom Masked Autoencoder Vision Transformers (ViTMAE) med konventionell fastighetsdata för förbättrad automatiserad värdering Ekblad Voltaire, Fanny January 2024 (has links) The integration of Vision Transformers (ViTs) using Masked Autoencoder pre-training (ViTMAE) into real estate valuation is investigated in this Master’s thesis, addressing the challenge of effectively analyzing visual information from real estate images. This integration aims to enhance the accuracy and efficiency of valuation, a task traditionally dependent on realtor expertise. The research involved developing a model that combines ViTMAE-extracted visual features from real estate images with traditional property data. Focusing on residential properties in Sweden, the study utilized a dataset of images and metadata from online real estate listings. An adapted ViTMAE model, accessed via the Hugging Face library, was trained on the dataset for feature extraction, which was then integrated with metadata to create a comprehensive multimodal valuation model. Results indicate that including ViTMAE-extracted image features improves prediction accuracy in real estate valuation models. The multimodal approach, merging visual and traditional metadata, improved accuracy over metadata-only models. This thesis contributes to real estate valuation by showcasing the potential of advanced image processing techniques in enhancing valuation models. It lays the groundwork for future research in more refined holistic valuation models, incorporating a wider range of factors beyond visual data. / Detta examensarbete undersöker integrationen av Vision Transformers (ViTs) med Masked Autoencoder pre-training (ViTMAE) i bostadsvärdering, genom att addressera utmaningen att effektivt analysera visuell information från bostadsannonser. Denna integration syftar till att förbättra noggrannheten och effektiviteten i fastighetsvärdering, en uppgift som traditionellt är beroende av en fysisk besiktning av mäklare. Arbetet innefattade utvecklingen av en modell som kombinerar bildinformation extraherad med ViTMAE från fastighetsbilder med traditionella fastighetsdata. Med fokus på bostadsfastigheter i Sverige använde studien en databas med bilder och metadata från bostadsannonser. Den anpassade ViTMAE-modellen, tillgänglig via Hugging Face-biblioteket, tränades på denna databas för extraktion av bildinformation, som sedan integrerades med metadata för att skapa en omfattande värderingsmodell. Resultaten indikerar att inklusion av ViTMAE-extraherad bildinformation förbättrar noggranheten av bostadssvärderingsmodeller. Den multimodala metoden, som kombinerar visuell och traditionell metadata, visade en förbättring i noggrannhet jämfört med modeller som endast använder metadata. Denna uppsats bidrar till bostadsvärdering genom att visa på potentialen hos avancerade bildanalys för att förbättra värderingsmodeller. Den lägger grunden för framtida forskning i mer raffinerade holistiska värderingsmodeller som inkluderar ett bredare spektrum av faktorer utöver visuell data. Computer Vision Transformer Self-supervised Learning Masked Autoencoder Automated Real Estate Appraisal Datorseende Transformer Self-supervised Learning Masked Autoencoder Automatiserad Bostadsvärdering Computer and Information Sciences Data- och informationsvetenskap
4	Supervision Beyond Manual Annotations for Learning Visual Representations Doersch, Carl 01 April 2016 (has links) For both humans and machines, understanding the visual world requires relating new percepts with past experience. We argue that a good visual representation for an image should encode what makes it similar to other images, enabling the recall of associated experiences. Current machine implementations of visual representations can capture some aspects of similarity, but fall far short of human ability overall. Even if one explicitly labels objects in millions of images to tell the computer what should be considered similar—a very expensive procedure—the labels still do not capture everything that might be relevant. This thesis shows that one can often train a representation which captures similarity beyond what is labeled in a given dataset. That means we can begin with a dataset that has uninteresting labels, or no labels at all, and still build a useful representation. To do this, we propose to using pretext tasks: tasks that are not useful in and of themselves, but serve as an excuse to learn a more general-purpose representation. The labels for a pretext task can be inexpensive or even free. Furthermore, since this approach assumes training labels differ from the desired outputs, it can handle output spaces where the correct answer is ambiguous, and therefore impossible to annotate by hand. The thesis explores two broad classes of supervision. The first isweak image-level supervision, which is exploited to train mid-level discriminative patch classifiers. For example, given a dataset of street-level imagery labeled only with GPS coordinates, patch classifiers are trained to differentiate one specific geographical region (e.g. the city of Paris) from others. The resulting classifiers each automatically collect and associate a set of patches which all depict the same distinctive architectural element. In this way, we can learn to detect elements like balconies, signs, and lamps without annotations. The second type of supervision requires no information about images other than the pixels themselves. Instead, the algorithm is trained to predict the context around image patches. The context serves as a sort of weak label: to predict well, the algorithm must associate similar-looking patches which also have similar contexts. After training, the feature representation learned using this within-image context indeed captures visual similarity across images, which ultimately makes it useful for real tasks like object detection and geometry estimation. pretext tasks self-supervised learning computer vision unsupervised learning weakly-supervised learning context
5	Robots that Anticipate Pain: Anticipating Physical Perturbations from Visual Cues through Deep Predictive Models January 2017 (has links) abstract: To ensure system integrity, robots need to proactively avoid any unwanted physical perturbation that may cause damage to the underlying hardware. In this thesis work, we investigate a machine learning approach that allows robots to anticipate impending physical perturbations from perceptual cues. In contrast to other approaches that require knowledge about sources of perturbation to be encoded before deployment, our method is based on experiential learning. Robots learn to associate visual cues with subsequent physical perturbations and contacts. In turn, these extracted visual cues are then used to predict potential future perturbations acting on the robot. To this end, we introduce a novel deep network architecture which combines multiple sub- networks for dealing with robot dynamics and perceptual input from the environment. We present a self-supervised approach for training the system that does not require any labeling of training data. Extensive experiments in a human-robot interaction task show that a robot can learn to predict physical contact by a human interaction partner without any prior information or labeling. Furthermore, the network is able to successfully predict physical contact from either depth stream input or traditional video input or using both modalities as input. / Dissertation/Thesis / Masters Thesis Computer Science 2017 Robotics Artificial intelligence Anticipating Physical Perturbations Deep Learning Safe Robotics Self-Supervised Learning
6	An Approach to Self-Supervised Object Localisation through Deep Learning Based Classification Politov, Andrei 28 December 2021 (has links) Deep learning has become ubiquitous in science and industry for classifying images or identifying patterns in data. The most widely used approach to training convolutional neural networks is supervised learning, which requires a large set of annotated data. To elude the high cost of collecting and annotating datasets, selfsupervised learning methods represent a promising way to learn the common functions of images and videos from large-scale unlabeled data without using humanannotated labels. This thesis provides the results of using self-supervised learning and explainable AI to localise objects in images from electron microscopes. The work used a synthetic geometric dataset and a synthetic pollen dataset. The classification was used as a pretext task. Different methods of explainable AI were applied: Grad-CAM and backpropagation-based approaches showed the lack of prospects; at the same time, the Extremal Perturbation function has shown efficiency. As a result of the downstream localisation task, the objects of interest were detected with competitive accuracy for one-class images. The advantages and limitations of the approach have been analysed. Directions for further work are proposed. info:eu-repo/classification/ddc/004 ddc:004
7	Time-domain Deep Neural Networks for Speech Separation Sun, Tao 24 May 2022 (has links) No description available. Computer Science Speech Separation Deep Neural Networks Self-supervised Learning Speech Enhancement Speaker Separation
8	Towards label-efficient deep learning for medical image analysis Sun, Li 11 September 2024 (has links) Deep learning methods have achieved state-of-the-art performance in various tasks of medical image analysis. However, the success relies heavily on the expensive and time-consuming collection of large quantities of labeled data, which is not always available. This dissertation investigates the use of self-supervised and generative methods to enhance the label efficiency of deep learning models for 3D medical image analysis. Unlike natural images, medical images contain consistent anatomical contexts specific to the domain, which can be exploited as self-supervision signals to pre-train the model. Furthermore, generative methods can be utilized to synthesize additional samples, thereby increasing sample diversity. In the first part of the dissertation, we introduce self-supervised learning frameworks that learn anatomy-aware and disease-related representation. In order to learn disease-related representation, we propose two domain-specific contrasting strategies that leverage anatomical similarity across patients to create hard negative samples that incentivize learning fine-grained pathological features. In order to learn anatomy-sensitive representation, we develop a novel 3D convolutional layer with kernels that are conditionally parameterized based on the anatomical locations. We perform extensive experiments on large-scale datasets of CT scans, which show that our method improves the performance of many downstream tasks. In the second part of the dissertation, we introduce generative models capable of synthesizing high-resolution, anatomy-guided 3D medical images. Current generative models are typically limited to low-resolution outputs due to memory constraints, despite clinicians' need for high-resolution details in diagnoses. To overcome this, we present a hierarchical architecture that efficiently manages memory demands, enabling the generation of high-resolution images. In addition, diffusion-based generative models are becoming more prevalent in medical imaging. However, existing state-of-the-art methods often under-utilize the extensive information found in radiology reports and anatomical structures. To address these limitations, we propose a text-guided 3D image diffusion model that preserves anatomical details. We conduct experiments on downstream tasks and blind evaluation by radiologists, which demonstrate the clinical value of our proposed methodologies. Artificial intelligence 3D imaging Anatomy-aware Computed tomography Generative models Medical image analysis Self-supervised learning
9	Online Unsupervised Domain Adaptation / Online-övervakad domänanpassning Panagiotakopoulos, Theodoros January 2022 (has links) Deep Learning models have seen great application in demanding tasks such as machine translation and autonomous driving. However, building such models has proved challenging, both from a computational perspective and due to the requirement of a plethora of annotated data. Moreover, when challenged on new situations or data distributions (target domain), those models may perform inadequately. Such examples are transitioning from one city to another, different weather situations, or changes in sunlight. Unsupervised Domain adaptation (UDA) exploits unlabelled data (easy access) to adapt models to new conditions or data distributions. Inspired by the fact that environmental changes happen gradually, we focus on Online UDA. Instead of directly adjusting a model to a demanding condition, we constantly perform minor adaptions to every slight change in the data, creating a soft transition from the current domain to the target one. To perform gradual adaptation, we utilized state-of-the-art semantic segmentation approaches on increasing rain intensities (25, 50, 75, 100, and 200mm of rain). We demonstrate that deep learning models can adapt substantially better to hard domains when exploiting intermediate ones. Moreover, we introduce a model switching mechanism that allows adjusting back to the source domain, after adaptation, without dropping performance. / Deep Learning-modeller har sett stor tillämpning i krävande uppgifter som maskinöversättning och autonom körning. Att bygga sådana modeller har dock visat sig vara utmanande, både ur ett beräkningsperspektiv och på grund av kravet på en uppsjö av kommenterade data. Dessutom, när de utmanas i nya situationer eller datadistributioner (måldomän), kan dessa modeller prestera otillräckligt. Sådana exempel är övergång från en stad till en annan, olika vädersituationer eller förändringar i solljus. Unsupervised Domain adaptation (UDA) utnyttjar omärkt data (enkel åtkomst) för att anpassa modeller till nya förhållanden eller datadistributioner. Inspirerade av att miljöförändringar sker gradvis, fokuserar vi på Online UDA. Istället för att direkt anpassa en modell till ett krävande tillstånd, gör vi ständigt mindre anpassningar till varje liten förändring i data, vilket skapar en mjuk övergång från den aktuella domänen till måldomänen. För att utföra gradvis anpassning använde vi toppmoderna semantiska segmenteringsmetoder för att öka regnintensiteten (25, 50, 75, 100 och 200 mm regn). Vi visar att modeller för djupinlärning kan anpassa sig betydligt bättre till hårda domäner när man utnyttjar mellanliggande. Dessutom introducerar vi en modellväxlingsmekanism som tillåter justering tillbaka till källdomänen, efter anpassning, utan att tappa prestanda. Unsupervised Domain Adaptation Continual Learning Curriculum Learning Clear2Rain Self-Supervised Learning Semantic Segmentation Transfer Learning Online Learning Unsupervised Domain Adaptation Kontinuerligt lärande Curriculum Learning Clear2Rain Self-Supervised Learning Semantisk Segmentering Transfer Learning Online Learning Computer and Information Sciences Data- och informationsvetenskap
10	Self-supervised Representation Learning via Image Out-painting for Medical Image Analysis January 2020 (has links) abstract: In recent years, Convolutional Neural Networks (CNNs) have been widely used in not only the computer vision community but also within the medical imaging community. Specifically, the use of pre-trained CNNs on large-scale datasets (e.g., ImageNet) via transfer learning for a variety of medical imaging applications, has become the de facto standard within both communities. However, to fit the current paradigm, 3D imaging tasks have to be reformulated and solved in 2D, losing rich 3D contextual information. Moreover, pre-trained models on natural images never see any biomedical images and do not have knowledge about anatomical structures present in medical images. To overcome the above limitations, this thesis proposes an image out-painting self-supervised proxy task to develop pre-trained models directly from medical images without utilizing systematic annotations. The idea is to randomly mask an image and train the model to predict the missing region. It is demonstrated that by predicting missing anatomical structures when seeing only parts of the image, the model will learn generic representation yielding better performance on various medical imaging applications via transfer learning. The extensive experiments demonstrate that the proposed proxy task outperforms training from scratch in six out of seven medical imaging applications covering 2D and 3D classification and segmentation. Moreover, image out-painting proxy task offers competitive performance to state-of-the-art models pre-trained on ImageNet and other self-supervised baselines such as in-painting. Owing to its outstanding performance, out-painting is utilized as one of the self-supervised proxy tasks to provide generic 3D pre-trained models for medical image analysis. / Dissertation/Thesis / Masters Thesis Computer Science 2020 Computer science Bioinformatics 3D Deep learning Computer Vision Medical Image Analysis Representation Learning Self-supervised learning Transfer Learning

Search results