1 |
A Deep 3D Object Pose Estimation Framework for Robots with RGB-D Sensors. Wagh, Ameya Yatindra. 24 April 2019.
The task of object detection and pose estimation has traditionally been addressed with template matching techniques. However, these algorithms are sensitive to outliers and occlusions, and have high latency due to their iterative nature. Recent research in computer vision and deep learning has greatly improved the robustness of such algorithms. However, one of their major drawbacks is that they are object-specific. Moreover, their pose estimates depend significantly on RGB image features. Because these algorithms are trained on large, meticulously labeled datasets of ground-truth object poses, they are difficult to re-train for real-world applications. To overcome this problem, we propose a two-stage pipeline of convolutional neural networks which uses RGB images to localize objects in 2D space and depth images to estimate a 6DoF pose. The pose estimation network thus learns only the geometric features of the object and is not biased by its color features. We evaluate the performance of this framework on the LINEMOD dataset, which is widely used to benchmark object pose estimation frameworks, and find the results to be comparable with state-of-the-art algorithms that use RGB-D images. Second, to show the transferability of the proposed pipeline, we implement it on the ATLAS robot for a pick-and-place experiment. Because the distribution of images in the LINEMOD dataset differs from that of images captured by the MultiSense sensor on ATLAS, we generate a synthetic dataset from a small number of real-world images captured with the MultiSense sensor. We use this dataset to train only the object detection networks used in the ATLAS robot experiment.
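As an illustration of the decoupling described above, here is a minimal sketch of such a two-stage pipeline; `detect_2d` and `PoseNet` are hypothetical stand-ins, not the thesis's actual networks:

```python
# A minimal sketch of a two-stage RGB-D pose pipeline, assuming stand-in components.
import numpy as np

def detect_2d(rgb):
    """Stage 1 (stand-in): an RGB object detector returning 2D boxes.
    Here we return one dummy box (x, y, w, h) for illustration."""
    return [(80, 60, 128, 128)]

class PoseNet:
    """Stage 2 (stand-in): a network that regresses a 6DoF pose from a
    depth crop only, so it learns geometry rather than color features."""
    def predict(self, depth_crop):
        # placeholder output: (tx, ty, tz, roll, pitch, yaw)
        return np.zeros(6)

def estimate_poses(rgb, depth, pose_net):
    poses = []
    for (x, y, w, h) in detect_2d(rgb):      # localize in 2D on the RGB image
        crop = depth[y:y + h, x:x + w]        # crop the depth map only
        poses.append(pose_net.predict(crop))  # 6DoF pose from geometry alone
    return poses

rgb = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.zeros((480, 640), dtype=np.float32)
print(estimate_poses(rgb, depth, PoseNet()))
```

Keeping the pose network blind to color is what allows re-training from few real images: only the detector needs new appearance data.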
|
2 |
Unrestricted Controllable Attacks for Segmentation Neural Networks. Guangyu Shen. 12 October 2021.
Despite the rapid development of adversarial attacks on machine learning models, many types of adversarial examples remain unknown. Undiscovered attack types pose a serious concern for model safety, which raises questions about the effectiveness of current adversarial robustness evaluation. Image semantic segmentation is a practical computer vision task, yet the robustness of segmentation networks under adversarial attack has received insufficient attention. Recently, machine learning researchers have started to focus on generating adversarial examples beyond the norm-bound restriction for segmentation neural networks. In this thesis, a simple and efficient method, AdvDRIT, is proposed to synthesize unconstrained, controllable adversarial images by leveraging a conditional GAN. A simple cGAN yields poor image quality and low attack effectiveness; instead, the DRIT (Disentangled Representation Image Translation) structure is leveraged with a well-designed loss function, which can generate valid adversarial images in one step. AdvDRIT is evaluated on two large image datasets: ADE20K and Cityscapes. Experimental results show that AdvDRIT improves the quality of adversarial examples, decreasing the FID score to 40% of that of state-of-the-art generative models such as Pix2Pix, and improves the attack success rate by 38% compared to other adversarial attack methods, including PGD.
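As an illustration only, the sketch below shows the kind of composite objective a conditional-GAN-based segmentation attack can optimize: a term that fools the segmentation network, a GAN realism term, and a content-preservation term. The components and weights are assumptions, not AdvDRIT's published loss:

```python
# A hedged sketch of a generator-side attack objective; terms and weights
# are illustrative assumptions, not AdvDRIT's actual formulation.
import torch
import torch.nn.functional as F

def attack_loss(adv_img, clean_img, seg_logits, true_mask,
                disc_score, w_adv=1.0, w_gan=0.5, w_rec=10.0):
    # (1) fool the segmentation net: minimize its confidence in true labels
    fool = -F.cross_entropy(seg_logits, true_mask)
    # (2) keep images realistic: generator side of a GAN loss
    gan = F.binary_cross_entropy_with_logits(
        disc_score, torch.ones_like(disc_score))
    # (3) stay close to the source content (content preservation)
    rec = F.l1_loss(adv_img, clean_img)
    return w_adv * fool + w_gan * gan + w_rec * rec

logits = torch.randn(2, 19, 64, 64)          # segmentation output on adv images
mask = torch.randint(0, 19, (2, 64, 64))     # ground-truth labels
adv, clean = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
print(attack_loss(adv, clean, logits, mask, torch.randn(2, 1)))
```

Unlike iterative attacks such as PGD, a generator trained on such an objective produces the adversarial image in a single forward pass.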
|
3 |
Temporally consistent semantic segmentation in videos. Raza, Syed H. 08 June 2015.
The objective of this thesis research is to develop algorithms for temporally consistent semantic segmentation in videos. Though many different forms of semantic segmentation exist, this research focuses on the problem of temporally consistent holistic scene understanding in outdoor videos. Holistic scene understanding requires an understanding of many individual aspects of the scene, including 3D layout, objects present, occlusion boundaries, and depth. Such a description of a dynamic scene would be useful for many robotic applications, including object reasoning, 3D perception, video analysis, video coding, segmentation, navigation, and activity recognition.
Scene understanding has been studied with great success for still images. However, scene understanding in videos requires additional approaches to account for temporal variation and dynamic information and to exploit causality. As a first step, image-based scene understanding methods can be applied directly to individual video frames to generate a description of the scene. However, these methods do not exploit temporal information across neighboring frames and, lacking temporal consistency, can produce temporally inconsistent labels across frames. This inconsistency can impact performance, as scene labels suddenly change between frames.
The objective of this study is to develop temporally consistent scene description algorithms by processing videos efficiently, exploiting causality and data redundancy, and catering for scene dynamics. Specifically, we achieve our research objectives by (1) extracting geometric context from videos to give the broad 3D structure of the scene with all objects present, (2) detecting occlusion boundaries in videos due to depth discontinuity, and (3) estimating depth in videos by combining monocular and motion features with semantic features and occlusion boundaries.
|
4 |
Superparsing with Improved Segmentation Boundaries through Nonparametric Context. Pan, Hong. January 2015.
Scene parsing, or segmenting all the objects in an image and identifying their categories, is one of the core problems of computer vision. To achieve object-level semantic segmentation, we build upon the recent superparsing approach of Tighe and Lazebnik, a nonparametric solution to the image labeling problem.
Superparsing consists of four steps. For a new query image, the most similar images from the training dataset of labeled images are retrieved based on global features. In the second step, the query image is segmented into superpixels and 20 different local features are computed for each superpixel. We propose to use the SLICO segmentation method, which allows control over the size, shape, and compactness of the superpixels, because SLICO produces accurate boundaries. After all superpixel features have been extracted, feature-based matching of superpixels is performed to find the nearest-neighbour superpixels in the retrieval set for each query superpixel. Based on the neighbouring superpixels, a likelihood score for each class is calculated. Finally, we formulate a Conditional Random Field (CRF) using the likelihoods and a pairwise cost, both computed from nonparametric estimation, to optimize the labeling of the image. Specifically, we define a novel pairwise cost that provides stronger semantic contextual constraints by incorporating the similarity of adjacent superpixels based on local features. The optimized labeling obtained with the CRF groups superpixels with the same labels together, generating segmentation results that also identify the categories of objects in the image.
We evaluate our improvements to the superparsing approach using segmentation evaluation measures as well as the per-pixel rate and average per-class rate in a labeling evaluation. We demonstrate the success of our modified approach on the SIFT Flow dataset and compare our results with the basic superparsing method proposed by Tighe and Lazebnik.
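As an illustration of the nonparametric likelihood step, here is a simplified sketch assuming one feature vector per superpixel and integer class labels in the retrieval set; the thesis combines 20 feature types and adds a CRF on top, so this is not its actual implementation:

```python
# A simplified sketch of nearest-neighbour class likelihoods for one
# query superpixel (illustrative assumptions throughout).
import numpy as np

def class_likelihoods(query_feat, retrieval_feats, retrieval_labels,
                      n_classes, k=10):
    # nearest-neighbour superpixels in the retrieval set
    d = np.linalg.norm(retrieval_feats - query_feat, axis=1)
    nn = np.argsort(d)[:k]
    # likelihood score per class from neighbour label frequencies
    counts = np.bincount(retrieval_labels[nn], minlength=n_classes)
    return (counts + 1) / (k + n_classes)   # Laplace-smoothed

feats = np.random.rand(100, 20)             # 100 retrieval superpixels
labels = np.random.randint(0, 5, size=100)  # 5 classes
print(class_likelihoods(np.random.rand(20), feats, labels, n_classes=5))
```

The CRF then trades these per-superpixel likelihoods off against the pairwise cost between adjacent superpixels to produce a spatially coherent labeling.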
|
5 |
Real-Time Instance and Semantic Segmentation Using Deep Learning. Kolhatkar, Dhanvin. 10 June 2020.
In this thesis, we explore the use of Convolutional Neural Networks for semantic and instance segmentation, with a focus on applying existing methods with cheaper neural networks. We modify a fast object detection architecture for the instance segmentation task and study the concepts behind these modifications both in the simpler context of semantic segmentation and in the more difficult context of instance segmentation. Various instance segmentation branch architectures are implemented in parallel with a box prediction branch, using its results to crop each instance's features. We counteract the imprecision of the final box predictions, and eliminate the need for bounding box alignment, by using an enlarged bounding box for cropping. We report and study the performance, advantages, and disadvantages of each architecture. All of our methods achieve fast inference speeds.
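As an illustration of the enlarged-box cropping idea, here is a minimal sketch that widens each predicted box before cropping instance features so that small box errors do not cut off the mask; the scale factor is an illustrative assumption:

```python
# A minimal sketch of enlarged-box cropping on a feature map.
import numpy as np

def enlarge_and_crop(features, box, scale=1.4):
    """features: (H, W, C) feature map; box: (x1, y1, x2, y2)."""
    h, w = features.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2           # box center
    bw, bh = (x2 - x1) * scale, (y2 - y1) * scale   # widened extent
    nx1 = int(max(0, cx - bw / 2)); ny1 = int(max(0, cy - bh / 2))
    nx2 = int(min(w, cx + bw / 2)); ny2 = int(min(h, cy + bh / 2))
    return features[ny1:ny2, nx1:nx2]

fmap = np.zeros((64, 64, 256))
print(enlarge_and_crop(fmap, (10, 10, 30, 30)).shape)
```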
|
6 |
Let there be light... Characterizing the Effects of Adverse Lighting on Semantic Segmentation of Wound Images and Mitigation using a Deep Retinex Model. Iyer, Akshay B. 14 May 2020.
Wound assessment using a smartphone image has recently emerged as a novel way to provide actionable feedback to patients and caregivers. Wound segmentation is an important step in image-based wound assessment, after which the wound area can be analyzed. Semantic segmentation algorithms for wounds assume favorable lighting conditions. However, smartphone wound imaging in natural environments can encounter adverse lighting, which can cause several errors during semantic segmentation of wound images and in turn affects the wound analysis. In this work, we study and characterize the effects of adverse lighting on the accuracy of semantic segmentation of wound images. Our findings inform a deep learning-based approach to mitigating the adverse effects. We make three main contributions. First, we create the first large-scale Illumination Varying Dataset (IVDS) of 55,440 images of a wound moulage captured under systematically varying illumination conditions and with different camera types and settings. Second, we characterize the effects of changing light intensity on U-Net's wound semantic segmentation accuracy and show the luminance of images to be highly correlated with wound segmentation performance. In particular, we show that low-light conditions severely degrade segmentation performance. Third, we improve the wound Dice scores of U-Net for low-light images to up to four times the baseline values using a deep learning mitigation method based on the Retinex theory. Our method works well at the typical illumination levels observed in homes and clinics, as well as across a wide gamut of lighting, from very dark conditions (20 Lux) and medium-intensity lighting (750-1500 Lux) to very bright lighting (6000 Lux).
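As an illustration of the luminance measure such a study might correlate with Dice scores, here is a small sketch using Rec. 709 luma weights; this is an assumption, not necessarily the thesis's exact definition:

```python
# A small sketch of a mean-luminance measure for an RGB image.
import numpy as np

def mean_luminance(rgb):
    """rgb: (H, W, 3) array in [0, 255]. Returns mean relative luminance."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return float(np.mean(0.2126 * r + 0.7152 * g + 0.0722 * b))

dark = np.full((100, 100, 3), 15.0)     # roughly a very dark capture
bright = np.full((100, 100, 3), 220.0)  # roughly a bright capture
print(mean_luminance(dark), mean_luminance(bright))
```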
|
7 |
A Closer Look at Neighborhoods in Graph Based Point Cloud Scene Semantic Segmentation Networks. Itani, Hani. 11 1900.
Large-scale semantic segmentation is considered one of the fundamental tasks in 3D scene understanding. Point clouds provide a basic yet rich geometric representation of scenes and tangible objects. Convolutional Neural Networks (CNNs) have demonstrated impressive success in processing regular discrete data such as 2D images and 1D audio. However, CNNs do not directly generalize to point cloud processing because of the irregular and unordered nature of point clouds. One way to extend CNNs to point cloud understanding is to derive an intermediate Euclidean representation of a point cloud by projecting it onto the image domain, voxelizing it, or treating points as vertices of an undirected graph. Graph CNNs (GCNs) have proven to be a very promising solution for deep learning on irregular data such as social networks, biological systems, and, recently, point clouds. Early works in the literature on graph-based point networks relied on constructing dynamic graphs in the node feature space to define a convolution kernel. Later works constructed hierarchical static graphs in 3D space for an encoder-decoder framework inspired by image segmentation. This thesis takes a closer look at both dynamic and static graph neighborhoods of graph-based point networks for the task of semantic segmentation in order to: 1) discuss a potential cause for why going deep in dynamic GCNs does not necessarily improve performance, and 2) propose a new approach to treating points in a static graph neighborhood for improved information aggregation. The proposed method leads to an efficient graph-based 3D semantic segmentation network that is on par with current state-of-the-art methods on both indoor and outdoor scene semantic segmentation benchmarks such as S3DIS and Semantic3D.
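To make the dynamic-versus-static distinction concrete, the following sketch shows the dynamic graph construction that such networks repeat at every layer: k-nearest neighbours computed in the current feature space rather than in fixed 3D coordinates (a pure-NumPy illustration, not the thesis's code):

```python
# A sketch of dynamic kNN graph construction in feature space.
import numpy as np

def knn_graph(features, k):
    """features: (N, C) node features. Returns (N, k) neighbour indices."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-loops
    return np.argsort(d, axis=1)[:, :k]

pts = np.random.rand(32, 3)                  # first layer: graph in 3D space
idx0 = knn_graph(pts, k=4)
feat = np.random.rand(32, 64)                # deeper layer: learned feature space
idx1 = knn_graph(feat, k=4)                  # neighbourhoods now differ
print((idx0 != idx1).any())
```

A static-graph network instead fixes `idx0` once from 3D coordinates and reuses it at every layer, which is cheaper and keeps neighbourhoods geometrically meaningful.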
|
8 |
An evaluation of deep learning semantic segmentation for land cover classification of oblique ground-based photography. Rose, Spencer. 30 September 2020.
This thesis presents a case study on the application of deep learning methods for the dense prediction of land cover types in oblique ground-based photography. While deep learning approaches are widely used in land cover classification of remote-sensing data (i.e., aerial and satellite orthoimagery) for change detection analysis, dense classification of the oblique landscape imagery used in repeat photography remains undeveloped. A performance evaluation was carried out to test two state-of-the-art architectures, U-Net and DeepLabv3+, as well as a fully-connected conditional random field model used to boost segmentation accuracy. The evaluation focuses on a novel threshold-based data augmentation technique and three multi-loss functions selected to mitigate class imbalance and input noise. The dataset used for this study was sampled from the Mountain Legacy Project (MLP) collection, comprising high-resolution historic (grayscale) survey photographs of Canada's Western mountains captured from the 1880s through the 1950s and their corresponding modern (colour) repeat images. Land cover segmentations manually created by MLP researchers were used as ground-truth labels. Experimental results showed top overall F1 scores of 0.841 for historic models and 0.909 for repeat models. Data augmentation showed modest improvements in overall accuracy (+3.0% historic / +1.0% repeat), but much larger gains for under-represented classes.
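As an illustration of the class-imbalance mitigation described above, here is a hedged sketch of a multi-loss combining weighted cross-entropy with a Dice term; the specific weights and mixing factor are assumptions, not the thesis's exact loss functions:

```python
# A hedged sketch of a weighted cross-entropy + Dice multi-loss.
import torch
import torch.nn.functional as F

def combined_loss(logits, target, class_weights, alpha=0.5, eps=1e-6):
    """logits: (B, C, H, W); target: (B, H, W) integer labels."""
    # weighted cross-entropy counters class imbalance directly
    ce = F.cross_entropy(logits, target, weight=class_weights)
    # soft Dice rewards overlap per class, insensitive to class frequency
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1 - ((2 * inter + eps) / (union + eps)).mean()
    return alpha * ce + (1 - alpha) * dice

w = torch.ones(5)  # per-class weights; in practice, inverse frequency
print(combined_loss(torch.randn(2, 5, 32, 32),
                    torch.randint(0, 5, (2, 32, 32)), w))
```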
|
9 |
Extracting Topography from Historic Topographic Maps Using GIS-Based Deep Learning. Pierce, Briar Z; Ernenwein, Eileen G. 25 April 2023.
Historical topographic maps are valuable resources for studying past landscapes, but two-dimensional cartographic features are unsuitable for geospatial analysis. They must be extracted and converted into digital formats. This has been accomplished by researchers using sophisticated image processing and pattern recognition techniques, and more recently, artificial intelligence. While these methods are sometimes successful, they require a high level of technical expertise, limiting their accessibility. This research presents a straightforward method practitioners can use to create digital representations of historical topographic data within commercially available Geographic Information Systems (GIS) software. This study uses convolutional neural networks to extract elevation contour lines from a 1940 United States Geological Survey (USGS) topographic map in Sevier County, TN, ultimately producing a Digital Elevation Model (DEM). The topographically derived DEM (TOPO-DEM) is compared to a modern LiDAR-derived DEM to analyze its quality and utility. GIS-capable historians, archaeologists, geographers, and others can use this method in their research and land management practices.
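As a sketch of the DEM comparison described above, the following computes an elevation difference surface and RMSE between a contour-derived DEM and a LiDAR reference; the arrays are synthetic stand-ins for rasters that would normally be loaded from GeoTIFFs:

```python
# A small sketch of TOPO-DEM vs. LiDAR-DEM comparison (synthetic data).
import numpy as np

def compare_dems(topo_dem, lidar_dem):
    diff = topo_dem - lidar_dem                  # elevation error surface
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return diff, rmse

topo = np.random.rand(50, 50) * 100 + 300        # synthetic elevations (m)
lidar = topo + np.random.normal(0, 2.0, topo.shape)
_, rmse = compare_dems(topo, lidar)
print(f"RMSE: {rmse:.2f} m")
```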
|
10 |
Facade Segmentation in the Wild. Para, Wamiq Reyaz. 19 August 2019.
Facade parsing is a fundamental problem in urban modeling that forms the backbone of a variety of tasks, including procedural modeling, architectural analysis, and urban reconstruction, and quite often relies on semantic segmentation as its first step. With the shift to deep learning based approaches, existing small-scale datasets are the bottleneck for making further progress in façade segmentation and, consequently, façade parsing. In this thesis, we propose a new façade image dataset for semantic segmentation called PSV-22, which is the largest such dataset. We show that PSV-22 captures the semantics of façades better than existing datasets. Additionally, we propose three architectural modifications to current state-of-the-art deep-learning based semantic segmentation architectures and show that these modifications improve performance on our dataset and on already existing datasets. Our modifications are generalizable to a large variety of semantic segmentation networks, but are façade-specific and employ heuristics that arise from the regular grid-like nature of façades. Furthermore, results show that our proposed architectural modifications improve performance compared to baseline models as well as specialized segmentation approaches on façade datasets, and either come close to or improve on performance on existing datasets. We show that deep models trained on existing data suffer a substantial performance reduction on our data, whereas models trained only on our data actually improve when evaluated on existing datasets. We intend to release the dataset publicly in the future.
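As one illustration of a grid-like heuristic of the kind alluded to above, the sketch below mixes row- and column-averaged context into segmentation logits so that predictions respect the repetitive structure of façades; this is an assumed example, not the thesis's actual architectural modules:

```python
# A hedged sketch of a row/column context heuristic for facade logits.
import torch

def grid_pool_logits(logits, weight=0.3):
    """logits: (B, C, H, W). Mix in row- and column-averaged context."""
    row_ctx = logits.mean(dim=3, keepdim=True).expand_as(logits)  # per-row mean
    col_ctx = logits.mean(dim=2, keepdim=True).expand_as(logits)  # per-column mean
    return (1 - weight) * logits + weight * 0.5 * (row_ctx + col_ctx)

x = torch.randn(1, 5, 64, 64)
print(grid_pool_logits(x).shape)
```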
|