Global ETD Search

Toward Robust Class-Agnostic Object Counting
Jiban, Md Jibanul Haque
01 January 2024

Object counting is the process of determining the number of specific objects in images. Accurate object counting is key to various applications in image understanding. Common applications include traffic monitoring, crowd management, wildlife migration monitoring, cell counting in medical images, and plant and insect counting in agriculture. Occlusions, complex backgrounds, changes in scale, and variations in object appearance in real-world settings make object counting challenging. This dissertation explores a progression of techniques to achieve robust localization and counting under diverse image modalities. The exploration begins by addressing the challenges of vehicular target localization in cluttered environments using infrared (IR) imagery. We propose a network, called TCRNet-2, that processes target and clutter information in two parallel channels and then combines them to optimize the target-to-clutter ratio (TCR) metric. Next, we explore class-agnostic object counting in RGB images using vision transformers. The primary motivation for this work is that most current methods excel at counting known object types but struggle with unseen categories. To address this drawback, we propose a class-agnostic object counting method. We introduce a dual-branch architecture with interconnected cross-attention that generates feature pyramids for robust object representations, and a dedicated feature aggregator module that further improves performance. Finally, we propose a novel framework that leverages vision-language models (VLMs) for zero-shot object counting. While our earlier class-agnostic counting method demonstrates high efficacy in generalized counting tasks, it relies on user-defined exemplars of target objects, which is a limitation. Additionally, the previous zero-shot counting method was a reference-less approach, which limits the ability to control the selection of the target object of interest in multi-class scenarios. To address these shortcomings, we propose to utilize vision-language models for zero-shot counting, where object categories of interest can be specified by text prompts.

Object counting class-agnostic vision transformer vision-language infrared target detection
