Global ETD Search

11	Learning with Limited Labeled Data: Techniques and Applications Lei, Shuo 11 October 2023 Recent advances in large neural network-style models have demonstrated great performance in various applications, such as image generation, question answering, and audio classification. However, these deep and high-capacity models require a large amount of labeled data to function properly, rendering them inapplicable in many real-world scenarios. This dissertation focuses on the development and evaluation of advanced machine learning algorithms to address the following research questions: (1) how to learn novel classes with limited labeled data, (2) how to adapt a large pre-trained model to the target domain if only unlabeled data is available, (3) how to boost the performance of a few-shot learning model with unlabeled data, and (4) how to utilize limited labeled data to learn new classes without training data in the same domain. First, we study few-shot learning in text classification tasks. Meta-learning is becoming a popular approach for addressing few-shot text classification and has achieved state-of-the-art performance. However, the performance of existing approaches heavily depends on the inter-class variance of the support set. To address this problem, we propose a TART network for few-shot text classification. The model enhances generalization by transforming the class prototypes to per-class fixed reference points in task-adaptive metric spaces. In addition, we design a novel discriminative reference regularization that maximizes the divergence between transformed prototypes in task-adaptive metric spaces to further improve performance. In the second problem, we focus on self-learning in cross-lingual transfer tasks. Our goal here is to develop a framework that lets the pre-trained cross-lingual model continue learning from a large amount of unlabeled data. 
Existing self-learning methods in cross-lingual transfer tasks suffer from the large number of incorrectly pseudo-labeled samples used in the training phase. We first design an uncertainty-aware cross-lingual transfer framework with pseudo-partial-labels. We also propose a novel pseudo-partial-label estimation method that considers prediction confidences and a limit on the number of candidate classes. Next, to boost the performance of the few-shot learning model with unlabeled data, we propose a semi-supervised approach for the few-shot semantic segmentation task. Existing solutions for few-shot semantic segmentation cannot easily exploit image-level weak annotations. We propose a class-prototype augmentation method that enriches the prototype representation by utilizing a few image-level annotations, achieving superior performance in one-/multi-way and weak annotation settings. We also design a robust strategy with soft-masked average pooling to handle the noise in image-level annotations; it considers the prediction uncertainty and employs a task-specific threshold to mask distractions. Finally, we study cross-domain few-shot learning in the semantic segmentation task. Most existing few-shot segmentation methods consider a setting where base classes are drawn from the same domain as the new classes. Nevertheless, gathering enough training data for meta-learning is either unattainable or impractical in many applications. We extend few-shot semantic segmentation to a new task, called Cross-Domain Few-Shot Semantic Segmentation (CD-FSS), which aims to generalize the meta-knowledge from domains with sufficient training labels to low-resource domains. Then, we establish a new benchmark for the CD-FSS task and evaluate both representative few-shot segmentation methods and transfer-learning-based methods on the proposed benchmark. 
We then propose a novel Pyramid-Anchor-Transformation based few-shot segmentation network (PATNet), in which domain-specific features are transformed into domain-agnostic ones so that downstream segmentation modules can adapt quickly to unseen domains. / Doctor of Philosophy / Nowadays, deep learning techniques play a crucial role in our everyday lives. They are also crucial to the success of many e-commerce and local businesses, where they enhance data analytics and decision-making. Notable applications include intelligent transportation, intelligent healthcare, natural language generation, and intrusion detection, among others. To achieve reasonable performance on a new task, these deep and high-capacity models require thousands of labeled examples, which increases the data collection effort and the computation costs associated with training a model. Moreover, in many disciplines, it might be difficult or even impossible to obtain data due to concerns such as privacy and safety. This dissertation focuses on learning with limited labeled data in natural language processing and computer vision tasks. To recognize novel classes with a few examples in text classification tasks, we develop a deep learning-based model that can capture both cross-task transferable knowledge and task-specific features. We also build an uncertainty-aware self-learning framework and a semi-supervised few-shot learning method, which allow us to boost the pre-trained model with easily accessible unlabeled data. In addition, we propose a cross-domain few-shot semantic segmentation method to generalize the model to different domains with a few examples. By handling these unique challenges in learning with limited labeled data and developing suitable approaches, we hope to improve the efficiency and generalization of deep learning methods in the real world. few-shot learning self-learning semantic segmentation natural language processing
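The abstract of entry 11 mentions building class prototypes with soft-masked average pooling under a task-specific threshold. The following is only a minimal sketch of that general idea, not the dissertation's implementation: the function name, the nested-list data layout, the default threshold, and the choice to weight each kept pixel by its probability are all assumptions made for illustration.

```python
def soft_masked_average_pooling(features, soft_mask, threshold=0.5):
    """Pool an H x W x C feature map into one C-dimensional prototype.

    features:  nested lists, features[y][x] is a list of C channel values.
    soft_mask: nested lists, soft_mask[y][x] is a foreground probability.
    threshold: hypothetical cutoff; pixels below it are ignored entirely.
    """
    channels = len(features[0][0])
    sums = [0.0] * channels
    total_weight = 0.0
    for feat_row, mask_row in zip(features, soft_mask):
        for feat, prob in zip(feat_row, mask_row):
            if prob < threshold:
                continue  # mask out low-confidence (distracting) pixels
            total_weight += prob
            for c in range(channels):
                sums[c] += prob * feat[c]
    if total_weight == 0.0:
        # No confident foreground pixel: fall back to a zero prototype.
        return [0.0] * channels
    return [s / total_weight for s in sums]
```

With a hard 0/1 mask this reduces to ordinary masked average pooling; the soft weights let uncertain pixels contribute less.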
12	Sémantická segmentace v horském prostředí / Semantic Segmentation in Mountainous Environment Pelikán, Jakub January 2017 Semantic segmentation is one of the classic computer vision problems and a strong tool for machine processing and understanding of a scene. In this thesis, we apply semantic segmentation to mountainous environments. The main motivation of this work is to use semantic segmentation to automatically determine the geographic position where a picture was taken. We evaluated current semantic segmentation methods and chose three that are suitable for adaptation to mountainous environments. We split the mountainous dataset into training, validation, and test sets and trained models based on the chosen methods on the mountainous data. Segments produced by the best trained models were evaluated by respondents in an electronic survey and within a camera orientation estimation process. We showed that the chosen semantic segmentation methods can be used in mountainous environments. Our models are trained on 11, 5, or 4 mountainous classes, and the best of them achieves a mean IU of 57.4% on 4 classes. The models are usable in practice, which we show by deploying them as part of a camera orientation estimation process.
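Entry 12 reports its result as mean IU (mean intersection over union). As a hedged illustration of how such a score is commonly computed, the sketch below averages per-class IoU over flat lists of integer labels; the function name, the flat-list input, and the choice to skip classes absent from both labelings are assumptions, and the thesis may define the metric slightly differently.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union across classes.

    y_true, y_pred: equal-length sequences of integer class labels,
    one per pixel. Classes that appear in neither sequence are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatched pixel belongs to the union of both classes.
            union[t] += 1
            union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.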
13	Mutual Enhancement of Environment Recognition and Semantic Segmentation in Indoor Environment Challa, Venkata Vamsi January 2024 Background: The dynamic field of computer vision and artificial intelligence has continually evolved, pushing the boundaries in areas like semantic segmentation and environmental recognition, which are pivotal for indoor scene analysis. This research investigates the integration of these two technologies, examining their synergy and implications for enhancing indoor scene understanding. The application of this integration spans various domains, including smart home systems for enhanced ambient living, navigation assistance for cleaning robots, and advanced surveillance for security. Objectives: The primary goal is to assess the impact of integrating semantic segmentation data on the accuracy of environmental recognition algorithms in indoor environments. Additionally, the study explores how environmental context can enhance the precision and accuracy of contour-aware semantic segmentation. Methods: The research employed an extensive methodology, utilizing various machine learning models, including standard algorithms, Long Short-Term Memory networks, and ensemble methods. Transfer learning with models like EfficientNet B3, MobileNetV3 and Vision Transformer was a key aspect of the experimentation. The experiments were designed to measure the effect of semantic segmentation on environmental recognition and its reciprocal influence. Results: The findings indicated that the integration of semantic segmentation data significantly enhanced the accuracy of environmental recognition algorithms. Conversely, incorporating environmental context into contour-aware semantic segmentation led to notable improvements in precision and accuracy, reflected in metrics such as Mean Intersection over Union (MIoU). 
Conclusion: This research underscores the mutual enhancement between semantic segmentation and environmental recognition, demonstrating how each technology significantly boosts the effectiveness of the other in indoor scene analysis. The integration of semantic segmentation data notably elevates the accuracy of environmental recognition algorithms, while the incorporation of environmental context into contour-aware semantic segmentation substantially improves its precision and accuracy. The results also open avenues for advancements in automated annotation processes, paving the way for smarter environmental interaction. Semantic Segmentation Scene Classification Environment Recognition Machine Learning Deep Learning Image Classification Vision Transformers SAM (Segment Anything Model) Image Segmentation Contour-aware semantic segmentation Computer Sciences Datavetenskap (datalogi)
14	Segmentação de cenas em telejornais: uma abordagem multimodal / Scene segmentation in news programs: a multimodal approach Coimbra, Danilo Barbosa 11 April 2011 This work aims to develop a method for scene segmentation in digital video that deals with semantically complex segments. As a proof of concept, we present a multimodal approach that uses a more general definition of TV news scenes, covering both scenes where anchors appear and scenes where no anchor appears. The results of the multimodal technique were significantly better than the results of the monomodal techniques applied separately. The tests were performed on four groups of Brazilian news programs obtained from two different television stations, each containing five editions, for a total of twenty newscasts. Multimodal scene segmentation Multimodal video segmentation Segmentação de cena multimodal Segmentação de vídeo multimodal Segmentação semântica Semantic segmentation
15	Multi-Task Learning using Road Surface Condition Classification and Road Scene Semantic Segmentation Westell, Jesper January 2019 Understanding road surface conditions is an important component in active vehicle safety. Estimations can be achieved through image classification using increasingly popular convolutional neural networks (CNNs). In this paper, we explore the effects of multi-task learning by creating CNNs capable of simultaneously performing two tasks: road surface condition classification (RSCC) and road scene semantic segmentation (RSSS). A multi-task network, containing a shared feature extractor (VGG16, ResNet-18, ResNet-101) and two task-specific network branches, is built and trained using the Road-Conditions and Cityscapes datasets. We reveal that utilizing task-dependent homoscedastic uncertainty in the learning process improves multi-task model performance on both tasks. When performing task adaptation, using a small set of additional data labeled with semantic information, we gain considerable RSCC improvements on complex models. Furthermore, we demonstrate increased model generalizability in multi-task models, with up to 12% higher F1-score compared to single-task models. Computer Vision Deep Learning Machine Learning Convolutional Neural Networks Classification Semantic Segmentation Signal Processing Signalbehandling
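Entry 15 states that task-dependent homoscedastic uncertainty is used to balance the two task losses. The abstract does not give the formula, so the sketch below uses one commonly seen simplified form, in which each task loss is scaled by exp(-s) and the log-variance s is added as a penalty. The function name and this exact form are assumptions; in real training the log-variances would be learnable parameters, whereas this sketch only evaluates the combined value.

```python
import math

def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses using assumed log-variance weights.

    task_losses:   list of scalar losses, e.g. [rscc_loss, rsss_loss].
    log_variances: list of scalars s_i = log(sigma_i^2), one per task.
    Returns sum_i exp(-s_i) * loss_i + s_i (a simplified, assumed form).
    """
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        # A larger s down-weights the task but pays a penalty of s.
        total += math.exp(-s) * loss + s
    return total
```

With all log-variances at zero the result is the plain sum of the task losses, which makes the unweighted baseline a special case.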
16	Evaluation of Deep Learning-Based Semantic Segmentation Approaches for Autonomous Corrosion Detection on Metallic Surfaces Cheng Qian (7479359) 17 October 2019 Structural defects can lead to serious safety issues and corresponding economic losses. In 2013, it was estimated that 2.5 trillion US dollars were spent on corrosion around the world, which was 3.4% of the global Gross Domestic Product (GDP) (Koch, 2016). Periodic inspection for corrosion and maintenance of steel structures are essential to minimize these losses. Current corrosion inspection guidelines require inspectors to visually assess every critical member within arm's reach. This process is time-consuming, subjective and labor-intensive, and is therefore done only once every two years. A promising solution is to use a robotic system, such as an Unmanned Aerial Vehicle (UAV), with computer vision techniques to assess corrosion on metallic surfaces. Several studies have been conducted in this area, but their shortcoming is that they cannot quantify the corroded region reliably: some studies only classify whether corrosion exists in the image or not; some only draw a box around the corroded region; and some need human-engineered features to identify corrosion. This study aims to address this problem by using deep learning-based semantic segmentation to let the computer capture useful features and find the boundaries of corroded regions accurately. In this study, the performance of four state-of-the-art deep learning techniques for semantic segmentation, namely U-Net, DeepLab, PSPNet, and RefineNet, was investigated for the corrosion assessment task. Six hundred high-resolution images of corroded regions were used to train and test the networks. Ten sets of experiments were performed on each architecture for cross-validation. 
Since the images were large, two approaches were used to analyze them: 1) subdividing images and 2) down-sampling images. A parametric analysis of these two preprocessing methods was also considered. Prediction results were evaluated based on intersection over union (IoU), recall and precision scores. Statistical analysis using box charts and the Wilcoxon signed-rank test showed that the subdivided image dataset gave better results, while resized images required less time for prediction. PSPNet outperformed the other three architectures on the subdivided dataset. DeepLab showed the best performance on the resized dataset. U-Net was found to be ideal for real-time image processing, while RefineNet was found not to be appropriate for the corrosion detection task. corrosion assessment semantic segmentation deep learning
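Entry 16 compares two ways of handling large images: subdividing them into smaller pieces and down-sampling them. The sketch below illustrates both on a 2D image stored as a list of rows. The function names, the non-overlapping tiling, and the strided (nearest-neighbor style) down-sampling are assumptions for illustration; the thesis may use overlapping tiles or interpolation-based resizing.

```python
def subdivide(image, tile_h, tile_w):
    """Split a 2D image (list of rows) into non-overlapping tiles.

    Tiles are returned in row-major order; tiles on the right and
    bottom edges may be smaller if the size does not divide evenly.
    """
    tiles = []
    for top in range(0, len(image), tile_h):
        for left in range(0, len(image[0]), tile_w):
            tile = [row[left:left + tile_w] for row in image[top:top + tile_h]]
            tiles.append(tile)
    return tiles

def downsample(image, factor):
    """Keep every `factor`-th pixel along both axes."""
    return [row[::factor] for row in image[::factor]]
```

Subdividing keeps full resolution at the cost of more forward passes per image, while down-sampling needs a single pass but discards detail, which is consistent with the accuracy and timing trade-off described in the abstract.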
17	Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling Zhang, Chenxi 01 January 2014 This dissertation addresses the problem of employing 3D depth information in solving a number of traditionally challenging computer vision/graphics problems. Humans are able to perceive depth in the 3D world, which enables them to reconstruct layouts, recognize objects and understand the geometric space and semantic meanings of the visual world. It is therefore important to explore how 3D depth information can be utilized by computer vision systems to mimic these human abilities. This dissertation aims at employing 3D depth information to solve vision/graphics problems in the following areas: scene understanding, image enhancement, and 3D reconstruction and modeling. In addressing the scene understanding problem, we present a framework for semantic segmentation and object recognition on urban video sequences using only dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from dense depth maps and used for segmenting and recognizing different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information and outperforms approaches using sparse 3D or 2D appearance features. In addressing the image enhancement problem, we present a framework to overcome the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement techniques and achieve high-quality results using simple and robust algorithms. In addressing the 3D reconstruction and modeling problem, we focus on parametric modeling of flower petals, the most distinctive part of a plant. 
The complex structure, severe occlusions and wide variations make the reconstruction of their 3D models a challenging task. We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned 3D exemplar petals. Novel constraints based on botany studies are incorporated into the fitting process for realistically reconstructing occluded regions and maintaining correct 3D spatial relations. The main contribution of the dissertation is the intelligent use of 3D depth information in solving traditionally challenging vision/graphics problems. By developing advanced algorithms that run either automatically or with minimal user interaction, this dissertation aims to demonstrate that the 3D depth computed from multiple images contains rich information about the visual world and can therefore be intelligently utilized to recognize/understand the semantic meanings of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models. Semantic segmentation image enhancement 3D parametric modeling Multiview stereo Artificial Intelligence and Robotics
18	Improving Photogrammetry using Semantic Segmentation Kernell, Björn January 2018 3D reconstruction is the process of constructing a three-dimensional model from images. It contains multiple steps, each of which can introduce errors. When doing 3D reconstruction of outdoor scenes, some types of scene content regularly cause problems and affect the resulting 3D model. Two of these are water, due to its fluctuating nature, and sky, because it contains no useful (3D) data. These areas cause different problems throughout the process and generally do not benefit it in any way. Therefore, masking them early in the reconstruction chain could be a useful step in an outdoor scene reconstruction pipeline. Manual masking of images is a time-consuming and boring task, and it becomes very tedious for the big data sets that are often used in large-scale 3D reconstructions. This master thesis explores whether this can be done automatically using Convolutional Neural Networks for semantic segmentation, and to what degree the masking would benefit a 3D reconstruction pipeline. / 3D reconstruction is the technology behind creating 3D models from images. It is a process with many steps, each of which can introduce errors. In 3D reconstruction of large outdoor environments, certain types of image content often cause problems. Two of these are water and sky. Water is problematic because it can fluctuate considerably from image to image and can contain reflections that look different from different angles. Sky, on the other hand, should never give rise to 3D information, so it may as well be masked out. Manual masking of images is very time-consuming and expensive. This thesis investigates whether this masking can be done automatically with convolutional networks for semantic segmentation, and how this could improve a 3D reconstruction process. photogrammetry semantic segmentation convolutional neural networks
19	Semantic Segmentation of Oblique Views in a 3D-Environment Tranell, Victor January 2019 This thesis presents and evaluates different methods for semantically segmenting 3D models using rendered 2D views. The 2D views are segmented separately and then merged together. The thesis evaluates three different merge strategies, two different classification architectures, how many views should be rendered and how these rendered views should be arranged. The results are evaluated both quantitatively and qualitatively and then compared with the current classifier at Vricon presented in [30]. The conclusion of this thesis is that there is a performance gain to be had using this method. The best model used two views and attains an accuracy of 90.89%, compared with 84.52% achieved by the single-view network from [30]. The best nine-view system achieved an accuracy of 87.72%. The difference in accuracy between the two-view and nine-view systems is attributed to the higher-quality mesh on the sunny side of objects, which is typically the south side. The thesis provides a proof of concept, and there are still many areas where the system can be improved. One of them is the extraction of training data, which would seemingly have a large impact on performance. Semantic segmentation 3D segmentation oblique views multiview segmentation satellite imagery convolutional neural networks Signal Processing Signalbehandling
20	Automated taxiing for unmanned aircraft systems Eaton, William H. January 2017 Over the last few years, the concept of civil Unmanned Aircraft System(s) (UAS) has been realised, with small UASs commonly used in industries such as law enforcement, agriculture and mapping. With increased development in other areas, such as logistics and advertisement, the size and range of civil UAS is likely to grow. Taken to its logical conclusion, it is likely that large-scale UAS will be operating in civil airspace within the next decade. Although the airborne operations of civil UAS have already gathered much research attention, work is also required to determine how UAS will function when on the ground. Motivated by the assumption that large UAS will share ground facilities with manned aircraft, this thesis describes the preliminary development of an Automated Taxiing System (ATS) for UAS operating at civil aerodromes. To allow the ATS to function on the majority of UAS without the need for additional hardware, a visual sensing approach has been chosen, with the majority of work focusing on monocular image processing techniques. The purpose of the computer vision system is to provide direct sensor data which can be used to validate the vehicle's position, in addition to detecting potential collision risks. As aerospace regulations require the most robust and reliable algorithms for control, any methods which are not fully definable or explainable will not be suitable for real-world use. Therefore, non-deterministic methods and algorithms with hidden components (such as Artificial Neural Networks (ANNs)) have not been used. Instead, the visual sensing is achieved through semantic segmentation, with separate segmentation and classification stages. Segmentation is performed using superpixels and reachability clustering to divide the image into single-content clusters. 
Each cluster is then classified using multiple types of image data, probabilistically fused within a Bayesian network. The data set for testing has been provided by BAE Systems, allowing the system to be trained and tested on real-world aerodrome data. The system has demonstrated good performance on this limited dataset, accurately detecting both collision risks and terrain features for use in navigation.
