21 |
Exploring adaptation of self-supervised representation learning to histopathology images for liver cancer detection. Jonsson, Markus. January 2024
This thesis explores adapting self-supervised representation learning to visual domains beyond natural scenes, focusing on medical imaging. The research addresses the central question: “How can self-supervised representation learning be specifically adapted for detecting liver cancer in histopathology images?” The study utilizes the PAIP 2019 dataset for liver cancer segmentation and employs a self-supervised approach based on the VICReg method. The evaluation results demonstrated that the ImageNet-pretrained model achieved superior performance on the test set, with a clipped Jaccard index of 0.7747 at a threshold of 0.65. The VICReg-pretrained model followed closely with a score of 0.7461, while the model initialized with random weights trailed behind at 0.5420. These findings indicate that while ImageNet-pretrained models outperformed VICReg-pretrained models, the latter still captured essential data characteristics, suggesting the potential of self-supervised learning in diverse visual domains. The research attempts to contribute to advancing self-supervised learning in non-natural scenes and provides insights into model pretraining strategies.
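As a concrete illustration of the metric reported above, a clipped Jaccard index can be sketched as follows. This is a hedged reconstruction in the spirit of the PAIP 2019 evaluation, not code from the thesis: the function name, the empty-mask convention, and the exact clipping rule are assumptions.

```python
import numpy as np

def clipped_jaccard(pred: np.ndarray, target: np.ndarray, threshold: float = 0.65) -> float:
    """Jaccard (IoU) of two binary masks, clipped to zero below `threshold`.

    Assumed reading of the PAIP 2019-style score: predictions whose
    overlap with the ground truth falls under the threshold count as 0.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    jaccard = np.logical_and(pred, target).sum() / union
    return float(jaccard) if jaccard >= threshold else 0.0
```

For example, two masks sharing 2 of 3 labelled pixels score 2/3 and survive the clip, while a mask with no overlap is clipped to 0.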
|
22 |
Self-Supervised Representation Learning for Early Breast Cancer Detection in Mammographic Imaging. Ågren, Kristofer. January 2024
The proposed master's thesis investigates the adaptability and efficacy of self-supervised representation learning (SSL) in medical image analysis, focusing on Mammographic Imaging to develop robust representation learning models. This research builds upon existing studies in Mammographic Imaging that have utilized contrastive learning and knowledge distillation-based self-supervised methods, focusing on SimCLR (Chen et al., 2020) and SimSiam (Chen and He, 2020), and evaluates approaches to increase classification performance in a transfer learning setting. The thesis will critically evaluate and integrate recent advancements in these SSL paradigms (Chhipa 2023, chapter 2) and incorporate additional SSL approaches. The core objective is to enhance robust generalization and label efficiency in medical imaging analysis, contributing to the broader field of AI-driven diagnostic methodologies. The proposed master's thesis aims not only to extend the current understanding of SSL in medical imaging but also to provide actionable insights that could be instrumental in enhancing breast cancer detection methodologies, thereby contributing significantly to the field of medical imaging and cancer research.
|
23 |
Towards Label Efficiency and Privacy Preservation in Video Understanding. Dave, Ishan Rajendrakumar. 01 January 2024
Video understanding involves tasks such as action recognition, video retrieval, and human pose propagation, which are essential for applications such as surveillance, surgical video analysis, sports analysis, and content recommendation. Progress in this domain has been largely driven by advancements in deep learning, facilitated by large-scale labeled datasets. However, video annotation presents significant challenges due to its time-consuming and expensive nature. This limitation underscores the importance of developing methods that can learn effectively from unlabeled or limited-labeled data, which makes self-supervised learning (SSL) and semi-supervised learning particularly relevant for video understanding. Another significant challenge in video understanding is privacy preservation, as methods often inadvertently leak private information, a growing concern in the field. In this dissertation, we present methods to improve the label efficiency of deep video models by employing self-supervised and semi-supervised methods, and a self-supervised method designed to mitigate privacy leakage in the action recognition task. Our first contribution is the Temporal Contrastive Learning framework for Video Representation (TCLR). Unlike prior contrastive self-supervised learning methods, which aim to learn temporal similarity between different clips of the same video, TCLR encourages learning the differences, rather than the similarities, between clips from the same video. TCLR consists of two novel losses that improve upon existing contrastive self-supervised video representations by contrasting temporal segments of the same video at two different temporal aggregation steps: the clip level and the temporal pooling level. Although TCLR offers an effective solution for video-level downstream tasks, it does not encourage a framewise video representation for addressing low-level temporal correspondence-based downstream tasks.
To promote a more effective framewise video representation, we first eliminate learning shortcuts present in existing temporal pretext tasks by introducing framewise spatial jittering, and we propose more challenging frame-level temporal pretext tasks. Our approach, "No More Shortcuts" (NMS), results in state-of-the-art performance across a wide range of downstream tasks, encompassing both high-level semantic and low-level temporal correspondence tasks. While the video SSL approaches TCLR and NMS focus only on learning from unlabeled videos, in practice some labeled data often exists. Our next focus is therefore semi-supervised action recognition, where we have a small set of labeled videos and a large pool of unlabeled videos. Using observations from the self-supervised representations, we leverage the unlabeled videos through the complementary strengths of temporally-invariant and temporally-distinctive contrastive self-supervised video representations. Our proposed semi-supervised method, "TimeBalance", introduces a student-teacher framework that dynamically combines the knowledge of two self-supervised teachers based on the nature of the unlabeled video, using the proposed reweighting strategy. Although TimeBalance performs well for coarse-grained actions, it struggles with fine-grained actions. To address this, we propose the "FinePseudo" framework, which leverages temporal alignability to learn phase-aware distances. It also introduces collaborative pseudo-labeling between the video-level and alignability encoders, refining the pseudo-labeling process for fine-grained actions. Although the above-mentioned video representations are useful for various downstream applications, they often leak a considerable amount of private information present in the videos. To mitigate privacy leaks in videos, we propose SPAct, a self-supervised framework that removes private information from input videos without requiring privacy labels.
SPAct exhibits competitive performance compared to supervised methods and introduces new evaluation protocols to assess the generalization capability of the anonymization across novel action and privacy attributes. Overall, this dissertation contributes to the advancement of label-efficient and privacy-preserving video understanding by exploring novel self-supervised and semi-supervised learning approaches and their applications in privacy-preserving action recognition.
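The temporal-distinctiveness idea behind TCLR, pulling the same segment across two views together while pushing apart other segments of the same video, can be sketched as a small InfoNCE-style loss. This is an illustrative sketch under toy assumptions (the shapes, names, and temperature are invented here, and the actual framework combines two losses at different aggregation levels):

```python
import numpy as np

def temporal_contrastive_loss(clips: np.ndarray, temperature: float = 0.1) -> float:
    """TCLR-style local-local contrastive loss (illustrative sketch).

    `clips` holds L2-normalised embeddings of shape (2, n_segments, d):
    two augmented views of the same video, split into non-overlapping
    temporal segments. Each segment is pulled towards the same segment
    in the other view and pushed away from the *other* segments of the
    same video, encouraging temporally distinct features.
    """
    views, n, _ = clips.shape
    assert views == 2
    sim = clips[0] @ clips[1].T / temperature          # (n, n) similarity matrix
    # cross-entropy with the diagonal (same segment, other view) as target
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float((logsumexp - np.diag(sim)).mean())
```

With matching, mutually orthogonal segment embeddings in both views the loss is near zero; it grows as segments of the same video become indistinguishable.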
|
24 |
Online Unsupervised Domain Adaptation / Online-övervakad domänanpassning. Panagiotakopoulos, Theodoros. January 2022
Deep Learning models have seen great application in demanding tasks such as machine translation and autonomous driving. However, building such models has proved challenging, both from a computational perspective and due to the requirement for a plethora of annotated data. Moreover, when challenged with new situations or data distributions (a target domain), those models may perform inadequately. Examples include transitioning from one city to another, different weather situations, or changes in sunlight. Unsupervised Domain Adaptation (UDA) exploits easily accessible unlabelled data to adapt models to new conditions or data distributions. Inspired by the fact that environmental changes happen gradually, we focus on online UDA. Instead of directly adjusting a model to a demanding condition, we constantly perform minor adaptations to every slight change in the data, creating a soft transition from the current domain to the target one. To perform gradual adaptation, we applied state-of-the-art semantic segmentation approaches to increasing rain intensities (25, 50, 75, 100, and 200 mm of rain). We demonstrate that deep learning models can adapt substantially better to hard domains when exploiting intermediate ones. Moreover, we introduce a model-switching mechanism that allows adjusting back to the source domain, after adaptation, without dropping performance.
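The gradual-adaptation idea, many minor updates across intermediate domains instead of one jump to the hardest condition, can be illustrated by adapting a batch-norm-style running mean across a curriculum of increasingly shifted feature distributions. This is a toy analogy, not the thesis method; the momentum, batch sizes, and shift values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def adapt_running_mean(domain_shifts, momentum: float = 0.9, steps: int = 20):
    """Track a drifting feature distribution with small repeated updates.

    Each entry of `domain_shifts` plays the role of one intermediate
    domain (e.g. a rain intensity); within it, the running mean is
    nudged towards fresh batch statistics, mimicking online UDA.
    """
    running_mean = 0.0                       # statistic from the source domain
    for shift in domain_shifts:              # gradually harder target domains
        for _ in range(steps):               # many minor updates per domain
            batch = rng.normal(loc=shift, scale=1.0, size=256)
            running_mean = momentum * running_mean + (1 - momentum) * batch.mean()
    return running_mean
```

Walking through the intermediate shifts leaves the statistic close to the final domain without any single large correction.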
|
25 |
Self-supervised Representation Learning via Image Out-painting for Medical Image Analysis. January 2020
In recent years, Convolutional Neural Networks (CNNs) have been widely used not only in the computer vision community but also in the medical imaging community. Specifically, the use of CNNs pre-trained on large-scale datasets (e.g., ImageNet) via transfer learning has become the de facto standard in both communities for a variety of medical imaging applications.
However, to fit the current paradigm, 3D imaging tasks have to be reformulated and solved in 2D, losing rich 3D contextual information. Moreover, models pre-trained on natural images never see any biomedical images and have no knowledge of the anatomical structures present in medical images. To overcome these limitations, this thesis proposes an image out-painting self-supervised proxy task to develop pre-trained models directly from medical images, without utilizing systematic annotations. The idea is to randomly mask an image and train the model to predict the missing region. It is demonstrated that by predicting missing anatomical structures when seeing only parts of the image, the model learns a generic representation that yields better performance on various medical imaging applications via transfer learning.
The extensive experiments demonstrate that the proposed proxy task outperforms training from scratch in six out of seven medical imaging applications, covering 2D and 3D classification and segmentation. Moreover, the image out-painting proxy task offers performance competitive with state-of-the-art models pre-trained on ImageNet and with other self-supervised baselines such as in-painting. Owing to its outstanding performance, out-painting is utilized as one of the self-supervised proxy tasks to provide generic 3D pre-trained models for medical image analysis. / Dissertation/Thesis / Master's Thesis, Computer Science, 2020
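The proxy task itself, masking an image and predicting the missing region, amounts to a simple data-preparation step. The sketch below keeps a randomly placed visible window and zeroes its surroundings; the window size, placement policy, and names are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def make_outpainting_pair(image: np.ndarray, visible: int = 16, seed: int = 0):
    """Create an (input, target) pair for the out-painting proxy task.

    A randomly placed `visible` x `visible` window is kept; everything
    outside it is zeroed. The model is then trained to reconstruct the
    full image (i.e. the masked surroundings) from the visible patch.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    top = rng.integers(0, h - visible + 1)
    left = rng.integers(0, w - visible + 1)
    masked = np.zeros_like(image)
    masked[top:top + visible, left:left + visible] = \
        image[top:top + visible, left:left + visible]
    return masked, image  # network input, reconstruction target
```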
|
26 |
Applicability of Detection Transformers in Resource-Constrained Environments: Investigating Detection Transformer Performance Under Computational Limitations and Scarcity of Annotated Data. Senel, Altan. January 2023
Object detection is a fundamental task in computer vision, with significant applications in various domains. However, the reliance on large-scale annotated data and the demand for computational resources pose challenges to practical implementation. This thesis aims to address these complexities by exploring self-supervised training approaches for the detection transformer (DETR) family of object detectors. The project investigates the necessity of training the backbone under a semi-supervised setting and explores the benefits of initializing scene graph generation architectures with pretrained DETReg and DETR models for faster training convergence and reduced computational resource requirements. The significance of this research lies in its potential to mitigate the dependence on annotated data and make deep learning techniques more accessible to researchers and practitioners. By overcoming the limitations of data and computational resources, this thesis contributes to the accessibility of DETR and encourages a more sustainable and inclusive approach to deep learning research.
|
27 |
Structural Self-Supervised Objectives for Transformers. Di Liello, Luca. 21 September 2023
In this Thesis, we leverage unsupervised raw data to develop more efficient pre-training objectives and self-supervised tasks that align well with downstream applications.
In the first part, we present three alternative objectives to BERT's Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM). Unlike MLM, all of these proposals involve token swapping rather than replacing tokens with BERT's [MASK]. RTS and C-RTS involve predicting whether each token is original, while SLM tasks the model with predicting the original token values. Each objective is applied to several models, which are trained using the same computational budget and corpora. Evaluation results reveal that RTS and C-RTS require up to 45% less pre-training time while achieving performance on par with MLM. Notably, SLM outperforms MLM on several Answer Sentence Selection and GLUE tasks, despite utilizing the same computational budget for pre-training.
In the second part of the Thesis, we propose self-supervised pre-training tasks that exhibit structural alignment with downstream applications, leading to improved performance and reduced reliance on labeled data to achieve comparable results. We exploit the weak supervision provided by large corpora like Wikipedia and CC-News, challenging the model to recognize whether spans of text originate from the same paragraph or document. To this end, we design (i) a pre-training objective that targets multi-sentence inference models by performing predictions over multiple spans of texts simultaneously, (ii) self-supervised objectives tailored to enhance performance in Answer Sentence Selection and its Contextual version, and (iii) a pre-training objective aimed at performance improvements in Summarization.
Through continuous pre-training, starting from renowned checkpoints such as RoBERTa, ELECTRA, DeBERTa, BART, and T5, we demonstrate that our models achieve higher performance on Fact Verification, Answer Sentence Selection, and Summarization. We extensively evaluate our proposals on different benchmarks, revealing significant accuracy gains, particularly when annotation in the target dataset is limited. Notably, we achieve state-of-the-art results on the development set of the FEVER dataset, and results on the test set close to those of state-of-the-art models that use many more parameters. Furthermore, our objectives enable us to attain state-of-the-art results on the ASNQ, WikiQA, and TREC-QA test sets across all evaluation metrics (MAP, MRR, and P@1). For Summarization, our objective enhances summary quality, as measured by various metrics such as ROUGE and BLEURT. We maintain that our proposals can be seamlessly combined with other techniques from recently proposed works, as they do not require alterations to the internal structure of Transformer models but only involve modifications to the training tasks.
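The flavour of the RTS objective can be sketched in a few lines; the corruption rate, vocabulary, and names below are toy assumptions rather than the thesis configuration, and a comment notes how SLM differs:

```python
import random

def rts_example(tokens, vocab, p=0.15, seed=0):
    """Random Token Substitution (RTS), sketched.

    With probability `p`, a token is swapped for a random vocabulary
    token; the label marks whether each position is original (0) or
    substituted (1). Unlike MLM, no [MASK] symbol is introduced, so
    pre-training inputs look like ordinary text. For SLM, the labels
    would instead be the original token ids at corrupted positions.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < p:
            corrupted.append(rng.choice(vocab))  # swap in a random token
            labels.append(1)                     # substituted position
        else:
            corrupted.append(tok)
            labels.append(0)                     # untouched position
    return corrupted, labels
```

Because corrupted inputs contain only real tokens, there is no pre-training/fine-tuning mismatch from a special [MASK] symbol.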
|
28 |
[en] VISION TRANSFORMERS AND MASKED AUTOENCODERS FOR SEISMIC FACIES SEGMENTATION. DANIEL CESAR BOSCO DE MIRANDA. 12 January 2024
[en] The development of self-supervised learning techniques has gained a lot of visibility in the field of Computer Vision, as it allows the pre-training of deep neural networks without the need for annotated data. In some domains, annotations are costly because they require a lot of specialized work to label the data. This problem is very common in the Oil and Gas sector, where there is a vast amount of uninterpreted data. The present work aims to apply the self-supervised learning technique called Masked Autoencoders to pre-train Vision Transformer models with seismic data. To evaluate the pre-training, transfer learning was applied to the seismic facies segmentation problem. In the pre-training phase, four different seismic volumes were used. For the segmentation, the Facies-Mark dataset was used, and the Segmentation Transformers model was chosen from the literature. To evaluate and compare the performance of the methodology, the segmentation metrics from the benchmarking work of Alaudah et al. (2019) were used. The metrics obtained in the present work show superior results: for the frequency weighted intersection over union (FWIoU) metric, for example, we obtained a gain of 7.45 percent relative to the reference work. The results indicate that the methodology is promising for improving computer vision problems on seismic data.
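The frequency weighted intersection over union metric used for the comparison above can be sketched as follows. This follows the standard FWIoU definition from semantic segmentation benchmarks; the function and variable names are mine, not code from the dissertation:

```python
import numpy as np

def frequency_weighted_iou(pred: np.ndarray, target: np.ndarray, n_classes: int) -> float:
    """Frequency-weighted IoU over a labelled class map (sketch).

    Per-class IoU is weighted by how often the class occurs in the
    ground truth, so frequent facies dominate the score; classes
    absent from the ground truth contribute nothing.
    """
    total = target.size
    score = 0.0
    for c in range(n_classes):
        gt_c = target == c
        freq = gt_c.sum() / total
        if freq == 0:
            continue  # class absent from ground truth
        union = np.logical_or(pred == c, gt_c).sum()
        iou = np.logical_and(pred == c, gt_c).sum() / union
        score += freq * iou
    return float(score)
```

A perfect prediction scores 1.0; predicting a single class everywhere is rewarded only in proportion to that class's frequency and overlap.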
|
29 |
Multimodal Representation Learning for Textual Reasoning over Knowledge Graphs. Choudhary, Nurendra. 18 May 2023
Knowledge graphs (KGs) store relational information in a flexible triplet schema and have become ubiquitous for information storage in domains such as web search, e-commerce, social networks, and biology. Retrieval of information from KGs is generally achieved through logical reasoning, but this process can be computationally expensive and has limited performance due to the large size and complexity of relationships within the KGs. Furthermore, to extend the usage of KGs to non-expert users, retrieval over them cannot solely rely on logical reasoning but also needs to consider text-based search. This creates a need for multi-modal representations that capture both the semantic and structural features from the KGs.
The primary objective of the proposed work is to extend the accessibility of KGs to non-expert users/institutions by enabling them to utilize non-technical textual queries to search over the vast amount of information stored in KGs. To achieve this objective, the research aims to solve four limitations: (i) develop a framework for logical reasoning over KGs that can learn representations to capture hierarchical dependencies between entities, (ii) design an architecture that can effectively learn the logic flow of queries from natural language text, (iii) create a multi-modal architecture that can capture inherent semantic and structural features from the entities and KGs, respectively, and (iv) introduce a novel hyperbolic learning framework to enable the scalability of hyperbolic neural networks over large graphs using meta-learning.
The proposed work is distinct from current research because it models the logical flow of textual queries in hyperbolic space and uses it to perform complex reasoning over large KGs. The models developed in this work are evaluated both in the standard research setting of logical reasoning and in real-world scenarios of query matching and search, specifically in the e-commerce domain.
In summary, the proposed work aims to extend the accessibility of KGs to non-expert users by enabling them to use non-technical textual queries to search vast amounts of information stored in KGs. To achieve this objective, the work proposes the use of multi-modal representations that capture both semantic and structural features from the KGs, and a novel hyperbolic learning framework to enable scalability of hyperbolic neural networks over large graphs. The work also models the logical flow of textual queries in hyperbolic space to perform complex reasoning over large KGs. The models developed in this work are evaluated on both the standard research setting of logical reasoning and real-world scenarios in the e-commerce domain. / Doctor of Philosophy / Knowledge graphs (KGs) are databases that store information in a way that allows computers to easily identify relationships between different pieces of data. They are widely used in domains such as web search, e-commerce, social networks, and biology. However, retrieving information from KGs can be computationally expensive, and relying solely on logical reasoning can limit their accessibility to non-expert users. This is where the proposed work comes in. The primary objective is to make KGs more accessible to non-experts by enabling them to use natural language queries to search the vast amounts of information stored in KGs. To achieve this objective, the research aims to address four limitations. Firstly, a framework for logical reasoning over KGs that can learn representations to capture hierarchical dependencies between entities is developed. Secondly, an architecture is designed that can effectively learn the logic flow of queries from natural language text. Thirdly, a multi-modal architecture is created that can capture inherent semantic and structural features from the entities and KGs, respectively. 
Finally, a novel hyperbolic learning framework is introduced to enable the scalability of hyperbolic neural networks over large graphs using meta-learning. The proposed work is unique because it models the logical flow of textual queries in hyperbolic space and uses it to perform complex reasoning over large KGs. The models developed in this work are evaluated both in the standard research setting of logical reasoning and in real-world scenarios of query matching and search, specifically in the e-commerce domain.
In summary, the proposed work aims to make KGs more accessible to non-experts by enabling them to use natural language queries to search vast amounts of information stored in KGs. To achieve this objective, the work proposes the use of multi-modal representations that capture both semantic and structural features from the KGs, and a novel hyperbolic learning framework to enable scalability of hyperbolic neural networks over large graphs. The work also models the logical flow of textual queries in hyperbolic space to perform complex reasoning over large KGs. The results of this work have significant implications for the field of information retrieval, as it provides a more efficient and accessible way to retrieve information from KGs. Additionally, the multi-modal approach taken in this work has potential applications in other areas of machine learning, such as image recognition and natural language processing. The work also contributes to the development of hyperbolic geometry as a tool for modeling complex networks, which has implications for fields such as network science and social network analysis. Overall, this work represents an important step towards making the vast amounts of information stored in KGs more accessible and useful to a wider audience.
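The hyperbolic-space machinery referenced throughout can be made concrete with the distance function of the Poincaré-ball model, a standard choice for embedding hierarchies because distances grow rapidly towards the boundary, mirroring tree growth. A minimal sketch follows (the formula is the textbook one; nothing here is code from the dissertation):

```python
import math

def poincare_distance(u, v, eps=1e-9):
    """Distance in the Poincaré-ball model of hyperbolic space.

    Inputs are points with Euclidean norm < 1. The `eps` floor guards
    against division by zero for points numerically on the boundary.
    """
    nu = sum(x * x for x in u)                       # squared norm of u
    nv = sum(x * x for x in v)                       # squared norm of v
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))     # squared Euclidean gap
    denom = max((1 - nu) * (1 - nv), eps)
    return math.acosh(1 + 2 * d2 / denom)
```

For instance, the origin and (0.5, 0) are Euclidean distance 0.5 apart but hyperbolic distance ln 3 apart, and points near the boundary are far from everything, which is what lets shallow coordinates encode deep hierarchies.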
|
30 |
Predicate Calculus for Perception-led Automata. Byrne, Thomas J. January 2023
Artificial Intelligence is a fuzzy concept. My role, as I see it, is to put down a working definition, a criterion, and a set of assumptions to set up equations for a workable methodology. This research introduces the notion of Artificial Intelligent Agency, denoting the application of Artificial General Intelligence. The problem being handled by mathematics and logic, and only thereafter semantics, is Self-Supervised Machine Learning (SSML) towards Intuitive Vehicle Health Management, in the domain of cybernetic-physical science.
The present work stems from a broader engagement with a major multinational automotive OEM, where Intelligent Vehicle Health Management will dynamically choose suitable variants only to realise predefined variation points. Physics-based models infer properties of a model of the system, not properties of the implemented system itself. The validity of their inference depends on the models' degree of fidelity, which is always an approximate, localised engineering abstraction. In sum, people are not very good at establishing causality.
To deduce new truths from implicit patterns in the data about the physical processes that generate the data, the kernel of this transformative technology is the intersystem architecture, occurring in between and involving the physical and engineered system and the construct thereof, through the communication core at their interface. In this thesis it is shown that the most practicable way to establish causality is by transforming application models into actual implementation. The hypothesis is that the ideal source of training data for SSML is an isomorphic monoid of indexical facts: trace-preserving events of natural kind.
|