1

A Segmentation Network with a Class-Agnostic Loss Function for Training on Incomplete Data / Ett segmenteringsnätverk med en klass-agnostisk förlustfunktion för att träna på inkomplett data

Norman, Gabriella January 2020 (has links)
The use of deep learning methods is increasing in medical image analysis, for example in the segmentation of organs in medical images. Deep learning methods depend heavily on large amounts of training data, a common obstacle in medical image analysis. This master thesis proposes a class-agnostic loss function as a method for training on incomplete data. The project used CT images from 1587 breast cancer patients, with a varying set of available segmentation masks for each patient. The class-agnostic loss function receives a label for each class of each sample (in this project, for each segmentation mask of each CT slice) indicating whether the mask is an actual annotation or merely a placeholder. If it is a placeholder, the comparison between the predicted mask and the placeholder does not contribute to the loss value. The results show that the class-agnostic loss function made it possible to train a segmentation model with eight output masks, on data in which all eight masks were never present at the same time, while achieving approximately the same performance as single-mask models.
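The masking mechanism described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the thesis code: the function name, the array shapes, and the choice of per-class binary cross-entropy are assumptions for illustration; the key idea, zeroing out the loss contribution of placeholder masks, follows the abstract.

```python
import numpy as np

def class_agnostic_loss(pred, target, present):
    """Per-class binary cross-entropy where placeholder masks are masked out.

    pred, target: (C, H, W) arrays of predicted probabilities and ground truth.
    present: (C,) binary array; 0 marks a class whose "mask" is only a
    placeholder and must not contribute to the loss.
    """
    eps = 1e-7
    p = np.clip(pred, eps, 1 - eps)
    # per-class binary cross-entropy, averaged over pixels
    bce = -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean(axis=(1, 2))
    # zero out classes whose ground truth is only a placeholder
    masked = bce * present
    # normalize by the number of classes that actually have annotations
    return masked.sum() / max(present.sum(), 1)
```

Because absent classes are multiplied by zero before normalization, the content of a placeholder mask never influences the gradient, which is what allows training an eight-output model on samples that never carry all eight annotations at once.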
2

Toward Robust Class-Agnostic Object Counting

Jiban, Md Jibanul Haque 01 January 2024 (has links) (PDF)
Object counting is the process of determining the quantity of specific objects in images. Accurate object counting is key for various applications in image understanding; common applications include traffic monitoring, crowd management, wildlife migration monitoring, cell counting in medical images, and plant and insect counting in agriculture. Occlusions, complex backgrounds, changes in scale, and variations in object appearance in real-world settings make object counting challenging. This dissertation explores a progression of techniques for achieving robust localization and counting across diverse image modalities. The exploration begins by addressing the challenges of vehicular target localization in cluttered environments using infrared (IR) imagery. We propose a network, called TCRNet-2, that processes target and clutter information in two parallel channels and then combines them to optimize the target-to-clutter ratio (TCR) metric. Next, we explore class-agnostic object counting in RGB images using vision transformers. The primary motivation for this work is that most current methods excel at counting known object types but struggle with unseen categories. To address these drawbacks, we propose a class-agnostic object counting method. We introduce a dual-branch architecture with interconnected cross-attention that generates feature pyramids for robust object representations, along with a dedicated feature aggregator module that further improves performance. Finally, we propose a novel framework that leverages vision-language models (VLMs) for zero-shot object counting. While our earlier class-agnostic counting method demonstrates high efficacy in generalized counting tasks, it relies on user-defined exemplars of target objects, which is a limitation. Additionally, the previous zero-shot counting method was reference-less, which limits the ability to control the selection of the target object of interest in multi-class scenarios. To address these shortcomings, we propose to use vision-language models for zero-shot counting, where object categories of interest can be specified by text prompts.
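The cross-attention that links the two branches can be illustrated with a generic scaled dot-product attention step in which one branch's tokens attend to the other's. This is a textbook sketch under assumed token shapes, not the dissertation's architecture; the function name and dimensions are illustrative only.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: one branch's tokens (queries)
    attend to the other branch's tokens (keys/values)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key tokens
    return weights @ values                         # (n_q, d_v) fused features
```

In an exemplar-based counter, the queries would typically come from image features and the keys/values from exemplar features, so that each image location is enriched with exemplar similarity before being passed to a counting or density head.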
