Global ETD Search

1	APPLYING CLIP FOR LAND COVER CLASSIFICATION USING AERIAL AND SATELLITE IMAGERY Kexin Meng (17541795) 04 December 2023 (has links) Land cover classification has always been a crucial topic in the remote sensing domain. Utilizing data collected by unmanned aerial vehicles and satellites, researchers can detect land degradation, monitor environmental changes, and provide insights for urban planning. Recent advancements in large multi-modal models have enabled open-vocabulary classification, which is particularly beneficial in this field. Because of the pre-training method, these models can perform zero-shot inference on unseen data, significantly reducing the costs associated with data collection and model training. This open-vocabulary feature of large-scale vision-language pre-training aligns well with the requirements of land cover classification, where benchmark datasets in the remote sensing domain comprise various categories, and transferring results from one dataset to another through supervised learning methods is challenging. In this thesis, the author explored the performance of zero-shot CLIP and linear probe CLIP to assess the feasibility of using the CLIP model for land cover classification tasks. Further, the author fine-tuned CLIP by creating hierarchical label sets for the datasets, leading to better zero-shot classification results and improving overall accuracy by 2.5%. Regarding data engineering, the author examined the performance of zero-shot CLIP and linear probe CLIP across different categories and proposed a categorization method for land cover datasets. In summary, this work evaluated CLIP's overall performance on land cover datasets of varying spatial resolutions and proposed a hierarchical classification method to enhance its zero-shot performance. The thesis also offers a practical approach for modifying current dataset categorizations to better align with the model.
Computer vision Multimodal analysis and synthesis Land Cover Classification CLIP Deep Learning Vision-language Pre-training
2	Deep Brain Dynamics and Images Mining for Tumor Detection and Precision Medicine Lakshmi Ramesh (16637316) 30 August 2023 (has links) Automatic brain tumor segmentation in Magnetic Resonance Imaging (MRI) scans is essential for the diagnosis, treatment, and surgery of cancerous tumors. However, identifying hard-to-detect tumors, which usually have different sizes, irregular shapes, and vague invasion areas, poses a considerable challenge. Current advancements have not yet fully leveraged the dynamics in the multiple modalities of MRI, since they usually treat multi-modality as multi-channel, and the early channel merging may not fully reveal inter-modal couplings and complementary patterns. In this thesis, we propose a novel deep cross-attention learning algorithm that maximizes the mining of subtle dynamics from each of the input modalities and then boosts feature fusion capability. More specifically, we have designed a Multimodal Cross-Attention Module (MM-CAM), equipped with a 3D Multimodal Feature Rectification and Feature Fusion Module. Extensive experiments have shown that the proposed novel deep learning architecture, empowered by the innovative MM-CAM, produces higher-quality segmentation masks of the tumor subregions. Further, we have enhanced the algorithm with image matting refinement techniques. We propose to integrate a Progressive Refinement Module (PRM) and perform Cross-Subregion Refinement (CSR) for the precise identification of tumor boundaries. A Multiscale Dice Loss was also successfully employed to enforce additional supervision for the auxiliary segmentation outputs. This enhancement will facilitate effective matting-based refinement for medical image segmentation applications. Overall, this thesis, with deep learning, transformer-empowered pattern mining, and sophisticated architecture designs, will greatly advance deep brain dynamics and images mining for tumor detection and precision medicine.
Computer vision Multimodal analysis and synthesis Deep learning Neural networks Semantic Segmentation Brain Tumor Segmentation Deep Learning Computer Vision Multimodal ML 3D Computer Vision Attention Cross-Attention Biomedical Segmentation
3	TEMPORAL DIET AND PHYSICAL ACTIVITY PATTERN ANALYSIS, UNSUPERVISED PERSON RE-IDENTIFICATION, AND PLANT PHENOTYPING Jiaqi Guo (18108289) 06 March 2024 (has links) Both diet and physical activity are known to be risk factors for obesity and chronic diseases such as diabetes and metabolic syndrome. We explore a distance-based approach for clustering daily physical activity time series to find temporal physical activity patterns among U.S. adults (ages 20-65). We further extend this approach to integrate both diet and physical activity, and find joint temporal diet and physical activity patterns. Our experiments indicate that the integration of diet, physical activity, and time has the potential to discover joint patterns associated with health. Unsupervised domain adaptive (UDA) person re-identification (re-ID) aims to learn identity information from labeled images in source domains and apply it to unlabeled images in a target domain. We propose a deep learning architecture called Synthesis Model Bank (SMB) to deal with illumination variation in unsupervised person re-ID. In our experiments, the proposed SMB outperforms other synthesis methods on several re-ID benchmarks. Recent technological advances have introduced modern high-throughput methodologies such as Unmanned Aerial Vehicles (UAVs) to replace traditional, labor-intensive phenotyping. For many UAV phenotyping analyses, the first step is to extract the smallest groups of plants, called “plots”, that have the same genotype. We propose an optimization-based, rotation-adaptive approach for extracting plots in a UAV RGB orthomosaic image. In our experiments, the proposed method achieves better plot extraction accuracy than existing approaches and does not require training data.
Computer vision Image processing Multimodal analysis and synthesis Deep learning Neural networks Semi- and unsupervised learning computer vision deep learning physical activity diet time series analysis time series clustering generative model image synthesis diffusion model GAN CUDA segmentation
4	Forensic Analysis of Images and Documents Ruiting Shao (18018187) 23 February 2024 (has links) This thesis involves three topics related to forensic analysis of media data. The first topic is the analysis of images and documents that have been created with a scanner. The goal is to detect and identify the scanner model from the scanned images/documents. We propose a deep learning system that can automatically learn the inherent features of the scanned images. This system will produce a scanner model identification and a reliability map for a scanned image. The proposed system has shown promising results in the forensic analysis of scanned images. The second topic is related to the forensic integrity of scientific papers. The project is divided into multiple tasks: data collection, image extraction, and manipulation detection. We have constructed a dataset of retracted scientific papers that have been verified to have issues with integrity. We design and maintain a web-based Scientific Integrity System for forensic analysis of the images within scientific publications. The third topic is related to media document analysis. Our goal is to identify the publication style of a media document, aiding in the detection of potential document manipulation. We mainly focus on image-text consistency checking and synthetic tweet analysis. For image-text consistency checking, we describe a system that can examine an image in a document and the corresponding text caption (or other text associated with the image) to check the image/text consistency. For synthetic tweet analysis, we propose a system to detect and identify the text generation models and paraphrase attack models.
Natural language processing Computer vision Image processing Multimodal analysis and synthesis Digital forensics Deep learning scanner detection synthetic text analysis media forensics natural language generation (NLG) author attribution image-text consistency social media misinformations scientific integrity Person of Interest
5	A MULTI-HEAD ATTENTION APPROACH WITH COMPLEMENTARY MULTIMODAL FUSION FOR VEHICLE DETECTION Nujhat Tabassum (18010969) 03 June 2024 (has links) In the realm of autonomous vehicle technology, the Multimodal Vehicle Detection Network (MVDNet) represents a significant leap forward, particularly in the challenging context of weather conditions. This paper focuses on the enhancement of MVDNet through the integration of a multi-head attention layer, aimed at refining its performance. The integrated multi-head attention layer in the MVDNet model is a pivotal modification, advancing the network's ability to process and fuse multimodal sensor information more efficiently. The paper validates the improved performance of MVDNet with multi-head attention through comprehensive testing, which includes a training dataset derived from the Oxford Radar Robotcar. The results clearly demonstrate that the Multi-Head MVDNet outperforms the other related conventional models, particularly in the Average Precision (AP) estimation, under challenging environmental conditions. The proposed Multi-Head MVDNet not only contributes significantly to the field of autonomous vehicle detection but also underscores the potential of sophisticated sensor fusion techniques in overcoming environmental limitations.
Electronic sensors Computer vision Multimodal analysis and synthesis Deep learning Neural networks Multi-head Attention Deep Learning Attention Neural Network Autonomous Vehicle Sensor Fusion CNN R-CNN Vehicle Detection Object Detection Deep Fusion Lidar Radar Vision Transformer (ViT)
