Global ETD Search

1	Title-based video summarization using attention networks Li, Changwei 23 August 2022 No description available. Electrical Engineering; Supervised video summarization; Key-frame extraction; Text-visual cross-attention; Key-shot extraction; Query based Summarization; Self-Attention
2	Deep Brain Dynamics and Images Mining for Tumor Detection and Precision Medicine Lakshmi Ramesh (16637316) 30 August 2023 Automatic brain tumor segmentation in Magnetic Resonance Imaging (MRI) scans is essential for the diagnosis, treatment, and surgery of cancerous tumors. However, identifying hard-to-detect tumors, which usually vary in size and have irregular shapes and vague invasion areas, poses a considerable challenge. Current approaches have not yet fully leveraged the dynamics across the multiple MRI modalities: they usually treat multi-modality as multi-channel input, and the early channel merging may not fully reveal inter-modal couplings and complementary patterns. In this thesis, we propose a novel deep cross-attention learning algorithm that maximizes the mining of subtle dynamics from each input modality and then boosts feature fusion capability. More specifically, we have designed a Multimodal Cross-Attention Module (MM-CAM), equipped with a 3D Multimodal Feature Rectification and Feature Fusion Module. Extensive experiments have shown that the proposed deep learning architecture, empowered by the MM-CAM, produces higher-quality segmentation masks of the tumor subregions. Further, we have enhanced the algorithm with image matting refinement techniques. We propose to integrate a Progressive Refinement Module (PRM) and perform Cross-Subregion Refinement (CSR) for the precise identification of tumor boundaries. A Multiscale Dice Loss was also successfully employed to enforce additional supervision for the auxiliary segmentation outputs. This enhancement will facilitate effective matting-based refinement for medical image segmentation applications.
Overall, this thesis, with deep learning, transformer-empowered pattern mining, and sophisticated architecture designs, will greatly advance deep brain dynamics and images mining for tumor detection and precision medicine. Computer vision; Multimodal analysis and synthesis; Deep learning; Neural networks; Semantic Segmentation; Brain Tumor Segmentation; Multimodal ML; 3D Computer Vision; Attention; Cross-Attention; Biomedical Segmentation
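The abstract above mentions a Multiscale Dice Loss that supervises auxiliary segmentation outputs, but it does not give the formulation. The following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes 2D probability maps rather than 3D volumes, assumes each auxiliary output is at half the resolution of the previous one, and assumes the target is downsampled by 2x2 average pooling. All function names and the weighting scheme are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def soft_dice_loss(pred: Grid, target: Grid, eps: float = 1e-6) -> float:
    """Return 1 - Dice coefficient for two same-sized 2D maps with values in [0, 1]."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(p for row in pred for p in row) + sum(t for row in target for t in row)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def avg_pool_2x2(grid: Grid) -> Grid:
    """Halve each spatial dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def multiscale_dice_loss(preds: List[Grid], target: Grid, weights: List[float]) -> float:
    """Weighted sum of Dice losses; preds[k] is assumed to be at 1/2**k resolution."""
    if len(preds) != len(weights):
        raise ValueError("preds and weights must have the same length")
    loss = 0.0
    scaled = target
    for k, (pred, weight) in enumerate(zip(preds, weights)):
        if k > 0:
            # Assumption: match the target to the auxiliary output by pooling.
            scaled = avg_pool_2x2(scaled)
        loss += weight * soft_dice_loss(pred, scaled)
    return loss
```

In this sketch, the full-resolution prediction and each coarser auxiliary prediction contribute their own Dice term, so the coarser decoder stages receive a direct training signal. How the thesis weights or resamples the scales is not stated in the abstract.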
3	Towards meaningful and data-efficient learning: exploring GAN losses, improving few-shot benchmarks, and multimodal video captioning Huang, Gabriel 09 1900 In recent years, the field of deep learning has seen tremendous progress in applications ranging from image generation, object detection, and language modeling to visual question answering. Classic approaches such as supervised learning require large amounts of task-specific and labeled data, which may be too expensive, time-consuming, or impractical to collect. Data-efficient methods, such as few-shot and self-supervised learning, attempt to deal with the limited availability of task-specific data by leveraging large amounts of general data. Progress in deep learning, and in particular, few-shot learning, is largely driven by the relevant benchmarks, evaluation metrics, and datasets. They are used to test and compare different methods on a given task, and determine the state of the art.
However, due to being idealized versions of the task to solve, benchmarks are rarely equivalent to the original task, and can have several limitations which hinder their role of identifying the most promising research directions. Moreover, defining meaningful evaluation metrics can be challenging, especially in the case of high-dimensional and structured outputs, such as images, audio, speech, or text. This thesis discusses the limitations and perspectives of existing benchmarks, training losses, and evaluation metrics, with a focus on generative modeling—Generative Adversarial Networks (GANs) in particular—and data-efficient modeling, which includes few-shot and self-supervised learning. The first contribution is a discussion of the generative modeling task, followed by an exploration of theoretical and empirical properties of the GAN loss. The second contribution is a discussion of a limitation of few-shot classification benchmarks, which is that they may not require class semantic generalization to be solved, and the proposal of a baseline method for solving them without test-time labels. The third contribution is a survey of few-shot and self-supervised object detection, which points out the limitations and promising future research for the field. Finally, the fourth contribution is a data-efficient method for video captioning, which leverages unsupervised text and video datasets, and explores several multimodal pretraining strategies. 
self-supervised learning; few-shot classification; few-shot object detection; low-data learning; object detection; instance segmentation; representation learning; residual network; visual transformer; Faster R-CNN; DETR; parametric adversarial divergence; generative adversarial network; variational auto-encoder; maximum-likelihood; structured prediction; optimal discriminator; mutual information; implicit generative model; multimodal pretraining; dense video captioning; cross-attention; YouCook2; HowTo-100M; Youtube-8M; Recipe-1M; Pascal VOC; MSCOCO; LVIS; mutual information neural estimation; ResNet; ViT; GAN; VAE; MINE
