Global ETD Search

1	A Deep 3D Object Pose Estimation Framework for Robots with RGB-D Sensors Wagh, Ameya Yatindra 24 April 2019 Object detection and pose estimation have widely been performed using template-matching techniques. However, these algorithms are sensitive to outliers and occlusions, and they have high latency due to their iterative nature. Recent research in computer vision and deep learning has greatly improved the robustness of these algorithms. However, a major drawback of these learned algorithms is that they are object-specific. Moreover, their pose estimates depend significantly on RGB image features. Because these algorithms are trained on large, meticulously labeled datasets of ground-truth object poses, it is difficult to re-train them for real-world applications. To overcome this problem, we propose a two-stage pipeline of convolutional neural networks that uses RGB images to localize objects in 2D space and depth images to estimate a 6DoF pose. The pose estimation network therefore learns only the geometric features of the object and is not biased by its color features. We evaluate the performance of this framework on the LINEMOD dataset, which is widely used to benchmark object pose estimation frameworks, and found the results to be comparable with state-of-the-art algorithms using RGB-D images. Second, to show the transferability of the proposed pipeline, we implement it on an ATLAS robot for a pick-and-place experiment. Because the distribution of images in the LINEMOD dataset differs from that of the images captured by the MultiSense sensor on ATLAS, we generate a synthetic dataset from very few real-world images captured with the MultiSense sensor. We use this dataset to train only the object detection networks used in the ATLAS robot experiment. Atlas robots pose estimation semantic segmentation
2	UNRESTRICTED CONTROLLABLE ATTACKS FOR SEGMENTATION NEURAL NETWORKS Guangyu Shen (8795963) 12 October 2021 Despite the rapid development of adversarial attacks on machine learning models, many types of adversarial examples remain unknown. Undiscovered types of adversarial attacks pose a serious concern for the safety of models, which raises questions about the effectiveness of current adversarial robustness evaluation. Image semantic segmentation is a practical computer vision task; however, the robustness of segmentation networks under adversarial attacks has received insufficient attention. Recently, machine learning researchers have started to focus on generating adversarial examples beyond the norm-bound restriction for segmentation neural networks. In this thesis, a simple and efficient method, AdvDRIT, is proposed to synthesize unconstrained, controllable adversarial images by leveraging a conditional GAN (CGAN). A simple CGAN yields poor image quality and low attack effectiveness. Instead, the DRIT (Disentangled Representation Image Translation) structure is leveraged with a well-designed loss function, which can generate valid adversarial images in one step. AdvDRIT is evaluated on two large image datasets: ADE20K and Cityscapes. Experimental results show that AdvDRIT can improve the quality of adversarial examples, decreasing the FID score down to 40% compared to state-of-the-art generative models such as Pix2Pix, and can also improve the attack success rate by 38% compared to other adversarial attack methods, including PGD. Data Communications Adversarial Attacks Semantic segmentation
3	Real-Time Instance and Semantic Segmentation Using Deep Learning Kolhatkar, Dhanvin 10 June 2020 In this thesis, we explore the use of Convolutional Neural Networks for semantic and instance segmentation, with a focus on applying existing methods with cheaper neural networks. We modify a fast object detection architecture for the instance segmentation task and study the concepts behind these modifications, both in the simpler context of semantic segmentation and in the more difficult context of instance segmentation. Various instance segmentation branch architectures are implemented in parallel with a box prediction branch, using its results to crop each instance's features. We compensate for the imprecision of the final box predictions, and eliminate the need for bounding box alignment, by using an enlarged bounding box for cropping. We report and study the performance, advantages, and disadvantages of each architecture. All of our methods achieve fast speeds. Instance segmentation Semantic segmentation Deep learning Real-time Mask prediction
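The abstract does not state the enlargement factor or how cropping is implemented. The following is a minimal sketch, assuming axis-aligned (x1, y1, x2, y2) boxes, a hypothetical fixed scale factor, and a feature map stored as a list of rows, of enlarging a predicted box about its centre before cropping, so that a slightly misplaced box is less likely to cut off part of the instance. The helper names are illustrative, not taken from the thesis.

```python
import math

def enlarge_box(box, scale, img_w, img_h):
    """Scale an (x1, y1, x2, y2) box about its centre by `scale`
    (hypothetical parameter) and clip the result to the image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )

def crop_features(feature_map, box):
    """Crop a 2D feature map (list of rows) to the box, rounding outward
    so that no partially covered cell is dropped."""
    x1, y1, x2, y2 = box
    c0, r0 = math.floor(x1), math.floor(y1)
    c1, r1 = math.ceil(x2), math.ceil(y2)
    return [row[c0:c1] for row in feature_map[r0:r1]]
```

For example, enlarge_box((10, 10, 20, 20), 1.5, 100, 100) returns (7.5, 7.5, 22.5, 22.5); crop_features then keeps cells 7 through 22 in each direction. Rounding outward trades a few extra background cells for not losing object pixels at the box border.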
4	Let there be light... Characterizing the Effects of Adverse Lighting on Semantic Segmentation of Wound Images and Mitigation using a Deep Retinex Model Iyer, Akshay B. 14 May 2020 Wound assessment using a smartphone image has recently emerged as a novel way to provide actionable feedback to patients and caregivers. Wound segmentation is an important step in image-based wound assessment, after which the wound area can be analyzed. Semantic segmentation algorithms for wounds assume favorable lighting conditions. However, smartphone wound imaging in natural environments can encounter adverse lighting that causes errors during semantic segmentation of wound images, which in turn affects the wound analysis. In this work, we study and characterize the effects of adverse lighting on the accuracy of semantic segmentation of wound images. Our findings inform a deep learning-based approach to mitigate these adverse effects. We make three main contributions. First, we create the first large-scale Illumination Varying Dataset (IVDS) of 55,440 images of a wound moulage captured under systematically varying illumination conditions and with different camera types and settings. Second, we characterize the effects of changing light intensity on U-Net's wound semantic segmentation accuracy and show that image luminance is highly correlated with wound segmentation performance. In particular, we show that low-light conditions severely degrade segmentation performance. Third, using a deep learning mitigation method based on Retinex theory, we improve U-Net's wound Dice scores for low-light images to up to four times the baseline values. Our method works well at the typical illumination levels observed in homes and clinics, as well as across a wide gamut of lighting, including very dark conditions (20 lux), medium-intensity lighting (750-1500 lux), and even very bright lighting (6000 lux). Lighting Semantic Segmentation Deep Learning Wounds Retinex Theory Dataset
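The improvement above is reported in terms of the Dice score. As a reference point, here is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for a predicted and a ground-truth wound mask, assuming both masks are flattened to equal-length sequences of 0/1 values; the smoothing constant eps is an assumption added so that two empty masks score 1.0.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flat binary masks of 0/1 values.

    eps (an assumed smoothing constant) keeps the score defined,
    and equal to 1.0, when both masks are empty.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled as wound in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)
```

With this definition, dice_score([1, 1, 0, 0], [1, 0, 1, 0]) is approximately 0.5, and a fourfold improvement means the mitigated score is about four times the baseline score on the same low-light images.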
5	A Closer Look at Neighborhoods in Graph Based Point Cloud Scene Semantic Segmentation Networks Itani, Hani 11 1900 Large-scale semantic segmentation is considered one of the fundamental tasks in 3D scene understanding. Point clouds provide a basic and rich geometric representation of scenes and tangible objects. Convolutional Neural Networks (CNNs) have demonstrated impressive success in processing regular discrete data such as 2D images and 1D audio. However, CNNs do not directly generalize to point cloud processing because point clouds are irregular and unordered. One way to extend CNNs to point cloud understanding is to derive an intermediate Euclidean representation of a point cloud by projecting it onto the image domain, voxelizing it, or treating points as vertices of an undirected graph. Graph-CNNs (GCNs) have proven to be a very promising solution for deep learning on irregular data such as social networks, biological systems, and, recently, point clouds. Early works in the literature on graph-based point networks relied on constructing dynamic graphs in the node feature space to define a convolution kernel. Later works constructed hierarchical static graphs in 3D space for an encoder-decoder framework inspired by image segmentation. This thesis takes a closer look at both dynamic and static graph neighborhoods of graph-based point networks for the task of semantic segmentation in order to: 1) discuss a potential reason why going deeper in dynamic GCNs does not necessarily improve performance, and 2) propose a new approach to treating points in a static graph neighborhood for improved information aggregation. The proposed method leads to an efficient graph-based 3D semantic segmentation network that is on par with current state-of-the-art methods on both indoor and outdoor scene semantic segmentation benchmarks such as S3DIS and Semantic3D. Deep Learning on point clouds Local Aggregation Function Semantic Segmentation
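To make "dynamic graphs in the node feature space" concrete, here is a minimal brute-force sketch of the general idea behind earlier dynamic GCNs such as EdgeConv-style networks: connect each point to its k nearest neighbours in the current feature space and form edge features (x_i, x_j - x_i). This illustrates the prior work the abstract describes, not the thesis's proposed static-neighborhood method; k and the function names are assumptions. In a dynamic GCN the graph would be rebuilt at every layer from that layer's features.

```python
import math

def knn_graph(features, k):
    """For each point, return the indices of its k nearest neighbours by
    Euclidean distance in feature space, excluding the point itself."""
    n = len(features)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    neighbours = []
    for i in range(n):
        # (distance, index) pairs; ties are broken by the smaller index.
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def edge_features(features, neighbours):
    """EdgeConv-style edge features: concatenate x_i with x_j - x_i."""
    edges = {}
    for i, nbrs in enumerate(neighbours):
        xi = features[i]
        edges[i] = [
            tuple(xi) + tuple(b - a for a, b in zip(xi, features[j]))
            for j in nbrs
        ]
    return edges
```

Because the neighbourhood is recomputed from learned features rather than fixed 3D coordinates, points that are far apart in space can become neighbours in deeper layers, which is one reason dynamic and static neighborhoods behave differently.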
6	An evaluation of deep learning semantic segmentation for land cover classification of oblique ground-based photography Rose, Spencer 30 September 2020 This thesis presents a case study on the application of deep learning methods to the dense prediction of land cover types in oblique ground-based photography. While deep learning approaches are widely used in land cover classification of remote-sensing data (i.e., aerial and satellite orthoimagery) for change detection analysis, dense classification of the oblique landscape imagery used in repeat photography remains undeveloped. A performance evaluation was carried out to test two state-of-the-art architectures, U-Net and DeepLabv3+, as well as a fully connected conditional random field model used to boost segmentation accuracy. The evaluation focuses on the use of a novel threshold-based data augmentation technique and three multi-loss functions selected to mitigate class imbalance and input noise. The dataset used for this study was sampled from the Mountain Legacy Project (MLP) collection, which comprises high-resolution historic (grayscale) survey photographs of Canada's Western mountains captured from the 1880s through the 1950s, together with their corresponding modern (colour) repeat images. Land cover segmentations manually created by MLP researchers were used as ground-truth labels. Experimental results showed top overall F1 scores of 0.841 for historic models and 0.909 for repeat models. Data augmentation showed modest improvements to overall accuracy (+3.0% historic / +1.0% repeat), but much larger gains for under-represented classes. / Graduate landscape classification semantic segmentation change detection deep learning
7	Extracting Topography from Historic Topographic Maps Using GIS-Based Deep Learning Pierce, Briar Z, Ernenwein, Eileen G 25 April 2023 Historical topographic maps are valuable resources for studying past landscapes, but their two-dimensional cartographic features are unsuitable for geospatial analysis; they must first be extracted and converted into digital formats. Researchers have accomplished this using sophisticated image processing and pattern recognition techniques and, more recently, artificial intelligence. While these methods are sometimes successful, they require a high level of technical expertise, which limits their accessibility. This research presents a straightforward method that practitioners can use to create digital representations of historical topographic data within commercially available Geographic Information Systems (GIS) software. This study uses convolutional neural networks to extract elevation contour lines from a 1940 United States Geological Survey (USGS) topographic map in Sevier County, TN, ultimately producing a Digital Elevation Model (DEM). The topographically derived DEM (TOPO-DEM) is compared to a modern LiDAR-derived DEM to analyze its quality and utility. GIS-capable historians, archaeologists, geographers, and others can use this method in their research and land management practices. topographic maps semantic segmentation deep learning DEM of difference Environmental Geography
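The comparison of the TOPO-DEM with the LiDAR-derived DEM (the "DEM of difference" keyword) amounts to a cell-by-cell subtraction. The following is a minimal sketch, assuming both DEMs have already been resampled onto the same grid (in practice a GIS raster calculator does this) and using a hypothetical nodata value; the mean difference and RMSE computed here are common summary statistics, not necessarily the ones reported in the study.

```python
import math

def dem_of_difference(dem_a, dem_b, nodata=None):
    """Cell-by-cell difference dem_a - dem_b for two co-registered grids
    (lists of rows with identical shape). Cells equal to `nodata`
    (an assumed sentinel) in either grid become None."""
    if len(dem_a) != len(dem_b) or any(
        len(ra) != len(rb) for ra, rb in zip(dem_a, dem_b)
    ):
        raise ValueError("DEMs must share the same grid")
    return [
        [None if (a == nodata or b == nodata) else a - b for a, b in zip(ra, rb)]
        for ra, rb in zip(dem_a, dem_b)
    ]

def summarize(dod):
    """Mean difference (bias) and RMSE over the valid cells of a DoD grid."""
    vals = [v for row in dod for v in row if v is not None]
    if not vals:
        raise ValueError("no valid cells to summarize")
    mean = sum(vals) / len(vals)
    rmse = math.sqrt(sum(v * v for v in vals) / len(vals))
    return mean, rmse
```

A positive mean difference would indicate that the map-derived surface sits above the LiDAR surface on average, while the RMSE captures the overall vertical disagreement.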
8	Facade Segmentation in the Wild Para, Wamiq Reyaz 19 August 2019 Façade parsing is a fundamental problem in urban modeling that forms the backbone of a variety of tasks, including procedural modeling, architectural analysis, and urban reconstruction, and it often relies on semantic segmentation as the first step. With the shift to deep learning-based approaches, existing small-scale datasets are the bottleneck for further progress in façade segmentation and, consequently, façade parsing. In this thesis, we propose a new façade image dataset for semantic segmentation, called PSV-22, which is the largest such dataset. We show that PSV-22 captures the semantics of façades better than existing datasets. Additionally, we propose three architectural modifications to current state-of-the-art deep learning-based semantic segmentation architectures and show that these modifications improve performance on our dataset and on existing datasets. Our modifications generalize to a large variety of semantic segmentation networks, but they are façade-specific and employ heuristics that arise from the regular, grid-like nature of façades. Furthermore, results show that our proposed architectural modifications improve performance compared to baseline models as well as to specialized segmentation approaches on façade datasets, and either come close to or improve performance on existing datasets. We show that deep models trained on existing data suffer a substantial performance reduction on our data, whereas models trained only on our data actually improve when evaluated on existing datasets. We intend to release the dataset publicly in the future. computer vision semantic segmentation Deep learning urban reconstruction
9	Learning with Limited Labeled Data: Techniques and Applications Lei, Shuo 11 October 2023 Recent advances in large neural network-style models have demonstrated great performance in various applications, such as image generation, question answering, and audio classification. However, these deep, high-capacity models require a large amount of labeled data to function properly, rendering them inapplicable in many real-world scenarios. This dissertation focuses on the development and evaluation of advanced machine learning algorithms to address the following research questions: (1) how to learn novel classes with limited labeled data; (2) how to adapt a large pre-trained model to a target domain when only unlabeled data is available; (3) how to boost the performance of a few-shot learning model with unlabeled data; and (4) how to use limited labeled data to learn new classes when no training data from the same domain is available. First, we study few-shot learning in text classification tasks. Meta-learning is becoming a popular approach for few-shot text classification and has achieved state-of-the-art performance. However, the performance of existing approaches depends heavily on the inter-class variance of the support set. To address this problem, we propose a TART network for few-shot text classification. The model enhances generalization by transforming the class prototypes to per-class fixed reference points in task-adaptive metric spaces. In addition, we design a novel discriminative reference regularization that maximizes the divergence between transformed prototypes in task-adaptive metric spaces to further improve performance. In the second problem, we focus on self-learning in cross-lingual transfer tasks. Our goal here is to develop a framework that enables a pre-trained cross-lingual model to continue learning from a large amount of unlabeled data.
Existing self-learning methods in cross-lingual transfer tasks suffer from the large number of incorrectly pseudo-labeled samples used in the training phase. We first design an uncertainty-aware cross-lingual transfer framework with pseudo-partial-labels. We also propose a novel pseudo-partial-label estimation method that considers prediction confidences and a limit on the number of candidate classes. Next, to boost the performance of the few-shot learning model with unlabeled data, we propose a semi-supervised approach for the few-shot semantic segmentation task. Existing solutions for few-shot semantic segmentation cannot easily be applied to utilize image-level weak annotations. We propose a class-prototype augmentation method that enriches the prototype representation by utilizing a few image-level annotations, achieving superior performance in one-way/multi-way and weak-annotation settings. We also design a robust strategy with soft-masked average pooling to handle the noise in image-level annotations, which considers the prediction uncertainty and employs a task-specific threshold to mask distractions. Finally, we study cross-domain few-shot learning in the semantic segmentation task. Most existing few-shot segmentation methods consider a setting where base classes are drawn from the same domain as the new classes. Nevertheless, gathering enough training data for meta-learning is either unattainable or impractical in many applications. We extend few-shot semantic segmentation to a new task, called Cross-Domain Few-Shot Semantic Segmentation (CD-FSS), which aims to generalize meta-knowledge from domains with sufficient training labels to low-resource domains. We also establish a new benchmark for the CD-FSS task and evaluate both representative few-shot segmentation methods and transfer learning-based methods on it.
We then propose a novel Pyramid-Anchor-Transformation based few-shot segmentation network (PATNet), in which domain-specific features are transformed into domain-agnostic ones so that downstream segmentation modules can adapt quickly to unseen domains. / Doctor of Philosophy / Nowadays, deep learning techniques play a crucial role in our everyday lives. They are also key to the success of many e-commerce and local businesses, enhancing data analytics and decision-making. Notable applications include intelligent transportation, intelligent healthcare, natural language generation, and intrusion detection, among others. To achieve reasonable performance on a new task, these deep, high-capacity models require thousands of labeled examples, which increases the data collection effort and the computational cost of training a model. Moreover, in many disciplines, it may be difficult or even impossible to obtain data due to concerns such as privacy and safety. This dissertation focuses on learning with limited labeled data in natural language processing and computer vision tasks. To recognize novel classes from a few examples in text classification tasks, we develop a deep learning-based model that can capture both cross-task transferable knowledge and task-specific features. We also build an uncertainty-aware self-learning framework and a semi-supervised few-shot learning method, which allow us to boost the pre-trained model with easily accessible unlabeled data. In addition, we propose a cross-domain few-shot semantic segmentation method to generalize the model to different domains with a few examples. By addressing these unique challenges of learning with limited labeled data and developing suitable approaches, we hope to improve the efficiency and generalization of deep learning methods in the real world. few-shot learning self-learning semantic segmentation natural language processing
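Prototype-based few-shot segmentation typically builds a class prototype by masked average pooling: prototype = Σ F(x)·M(x) / Σ M(x) over pixels x. The following is a minimal, generic sketch of a soft-masked variant in which M(x) is a weight in [0, 1] and weights below a threshold are zeroed, followed by cosine matching of a query feature against the prototype. The dissertation's exact formulation, including how it derives the weights from prediction uncertainty and sets the task-specific threshold, is not given in the abstract, so the threshold parameter and function names here are assumptions.

```python
import math

def soft_masked_average_pooling(features, mask, threshold=0.0, eps=1e-8):
    """Prototype = sum_x F(x) * M(x) / sum_x M(x).

    features: one C-dimensional feature vector per pixel.
    mask: per-pixel foreground weights in [0, 1] (a soft mask).
    threshold: assumed cut-off; weights below it are treated as 0.
    """
    if len(features) != len(mask):
        raise ValueError("one mask value per pixel is required")
    weights = [m if m >= threshold else 0.0 for m in mask]
    total = sum(weights)
    dim = len(features[0])
    proto = [0.0] * dim
    for f, w in zip(features, weights):
        for c in range(dim):
            proto[c] += w * f[c]
    return [v / (total + eps) for v in proto]

def cosine_similarity(a, b, eps=1e-8):
    """Similarity used to match a query pixel feature against a prototype."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)
```

Soft weights let uncertain pixels contribute partially instead of being forced into or out of the mask, and the threshold discards weakly weighted pixels that are more likely to be noise.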
10	Sémantická segmentace v horském prostředí / Semantic Segmentation in Mountainous Environment Pelikán, Jakub January 2017 Semantic segmentation is one of the classic computer vision problems and a powerful tool for machine processing and understanding of a scene. In this thesis, we apply semantic segmentation to a mountainous environment. The main motivation of this work is to use semantic segmentation to automatically determine the geographic position where a picture was taken. We evaluated current semantic segmentation methods and chose three of them that are suitable for adapting to a mountainous environment. We split the mountainous-environment dataset into training, validation, and test sets and used them to train models with the chosen methods. Segments from the best-trained models were evaluated by respondents in an online survey, and we also evaluated these segments within a camera orientation estimation process. We showed that the chosen semantic segmentation methods can be used in a mountainous environment. Our models are trained on 11, 5, or 4 mountain classes, and the best of them achieves a mean IU of 57.4% on 4 classes. The models are usable in practice, as we show by deploying them as part of the camera orientation estimation process.
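The 57.4% figure above is a mean IU (mean intersection over union). Under the standard definition, IU for class c is TP / (TP + FP + FN), averaged over classes. The following is a minimal sketch computing it from flat lists of predicted and ground-truth class indices via a confusion matrix. Skipping classes that appear in neither the prediction nor the ground truth is one common convention and an assumption here, not necessarily the thesis's choice.

```python
def confusion_matrix(pred, target, num_classes):
    """cm[t][p] counts pixels with ground-truth class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm

def mean_iu(cm):
    """Mean of per-class IU = TP / (TP + FP + FN), skipping absent classes."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # class c pixels predicted as something else
        fp = sum(row[c] for row in cm) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no classes present")
    return sum(ious) / len(ious)
```

For example, with ground truth [0, 0, 1, 1] and prediction [0, 1, 1, 1], class 0 has IU 1/2 and class 1 has IU 2/3, so the mean IU is 7/12 (about 58.3%).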
