81

U-net based deep learning architectures for object segmentation in biomedical images

Nahian Siddique (11219427) 04 August 2021 (has links)
U-net is an image segmentation technique developed primarily for medical image analysis that can precisely segment images using a scarce amount of training data. These traits give U-net high utility within the medical imaging community and have led to its extensive adoption as the primary tool for segmentation tasks in medical imaging. The success of U-net is evident in its widespread use across nearly all major imaging modalities, from CT scans and MRI to X-rays and microscopy. Furthermore, while U-net is largely a segmentation tool, it has also been applied to other tasks. Given that U-net's potential is still growing, this review examines the numerous developments and breakthroughs in the U-net architecture and provides observations on recent trends. We also discuss the many innovations in deep learning that facilitate U-net, and review the different image modalities and application areas that U-net has enhanced.

In recent years, deep learning for health care has been rapidly transforming medical fields thanks to advances in computing power, data availability, and algorithm development. In particular, U-Net, a deep learning technique, has achieved remarkable success in medical image segmentation and has become one of the premier tools in this area. While the accomplishments of U-Net and other deep learning algorithms are evident, many challenges remain in medical image processing before human-like performance is achieved. In this thesis, we propose a U-net architecture that integrates residual skip connections and recurrent feedback with EfficientNet as a pretrained encoder. Residual connections help feature propagation in deep neural networks and significantly improve performance over networks with a similar number of parameters, while recurrent connections ameliorate gradient learning. We also propose a second model that utilizes densely connected layers to aid deeper networks, and a third model that incorporates fractal expansions to bypass diminishing gradients. EfficientNet is a family of powerful pretrained encoders that streamline neural network design. Using EfficientNet as an encoder provides the network with robust feature extraction that the U-Net decoder can use to create highly accurate segmentation maps. The proposed networks are evaluated against state-of-the-art deep learning based segmentation techniques to demonstrate their superior performance.
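To make the described design concrete, below is a minimal sketch of a U-Net-style decoder with residual skip connections on top of a pretrained EfficientNet encoder. It illustrates the general pattern, not the author's exact model; the `timm` library, the `efficientnet_b0` variant, and all channel widths are assumptions.

```python
import torch
import torch.nn as nn
import timm

class ResidualDecoderBlock(nn.Module):
    """Upsample, fuse the encoder skip feature, and apply a residual refinement."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the residual addition has matching channel counts
        self.proj = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=1)

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)
        return torch.relu(self.conv(x) + self.proj(x))

class EfficientUNet(nn.Module):
    def __init__(self, n_classes=1):
        super().__init__()
        # features_only=True returns a pyramid of encoder feature maps
        self.encoder = timm.create_model(
            "efficientnet_b0", pretrained=True, features_only=True)
        chs = self.encoder.feature_info.channels()  # [16, 24, 40, 112, 320] for b0
        self.dec3 = ResidualDecoderBlock(chs[4], chs[3], 128)
        self.dec2 = ResidualDecoderBlock(128, chs[2], 64)
        self.dec1 = ResidualDecoderBlock(64, chs[1], 32)
        self.dec0 = ResidualDecoderBlock(32, chs[0], 16)
        self.head = nn.Conv2d(16, n_classes, kernel_size=1)

    def forward(self, x):
        f = self.encoder(x)          # 5 feature maps at strides 2, 4, 8, 16, 32
        d = self.dec3(f[4], f[3])
        d = self.dec2(d, f[2])
        d = self.dec1(d, f[1])
        d = self.dec0(d, f[0])
        return self.head(d)          # logits at half the input resolution
```

Each decoder block upsamples, concatenates the encoder skip feature, and adds a 1x1-projected residual so the addition has matching channels.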
82

Automatic segmentation of articular cartilage in arthroscopic images using deep neural networks and multifractal analysis

Ångman, Mikael, Viken, Hampus January 2020 (has links)
Osteoarthritis is a widespread problem affecting many patients globally, and diagnosis of osteoarthritis is often done using evidence from arthroscopic surgeries. Making a correct diagnosis is hard and takes years of experience and training on thousands of images. An automatic solution to perform the diagnosis would therefore be extremely helpful to the medical field. Since machine learning has proven useful and effective at classifying and segmenting medical images, this thesis aimed to solve the problem using machine learning methods. Multifractal analysis has also been used extensively for medical image segmentation. This study proposes two methods of automatic segmentation using neural networks and multifractal analysis. The thesis was performed using real arthroscopic images from surgeries. The MultiResUNet architecture is shown to be well suited for pixel-perfect segmentation. Classification of multifractal features using neural networks is also shown to perform well when compared to related studies.
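As a rough illustration of the multifractal features mentioned above, the sketch below estimates generalized box-counting dimensions D(q) of a grayscale image; a vector of such values over several q could serve as input to a classifier. The box sizes, the q = 1 special case, and the intensity-based measure are simplifying assumptions, not the thesis's exact pipeline.

```python
import numpy as np

def box_measures(img, size):
    """Normalized total intensity inside each size x size box of the image."""
    h, w = img.shape
    h, w = h - h % size, w - w % size          # crop so boxes tile evenly
    boxes = img[:h, :w].reshape(h // size, size, w // size, size)
    m = boxes.sum(axis=(1, 3)).ravel()
    return m / m.sum()

def generalized_dimension(img, q, sizes=(2, 4, 8, 16, 32)):
    """Estimate D(q) from the slope of the log partition sum vs. log box size."""
    img = img.astype(float) + 1e-12            # avoid an all-zero measure
    log_s, log_z = [], []
    for s in sizes:
        p = box_measures(img, s)
        p = p[p > 0]
        # q = 1 needs the information-dimension form sum(p * log p)
        z = (p * np.log(p)).sum() if q == 1 else np.log((p ** q).sum())
        log_s.append(np.log(s))
        log_z.append(z)
    slope = np.polyfit(log_s, log_z, 1)[0]
    return slope if q == 1 else slope / (q - 1)

# Example: a few D(q) values as a feature vector for a classifier
# features = [generalized_dimension(gray_image, q) for q in (-2, 0, 1, 2)]
```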
83

Design of Mobility Cyber Range and Vision-Based Adversarial Attacks on Camera Sensors in Autonomous Vehicles

Ramayee, Harish Asokan January 2021 (has links)
No description available.
84

Classification of Terrain Roughness from Nationwide Data Sources Using Deep Learning

Fredriksson, Emily January 2022 (has links)
3D semantic segmentation is an expanding topic within the field of computer vision, which has received more attention in recent years due to the development of more powerful GPUs and the new possibilities offered by deep learning techniques. Simultaneously, the amount of available spatial LiDAR data over Sweden has also increased. This work combines these two advances and investigates whether a 3D deep learning model for semantic segmentation can learn to detect terrain roughness in airborne LiDAR data. The annotations for terrain roughness used in this work are taken from SGU's 2D soil type map. Other airborne data sources are also used to filter the annotations and to see if additional information can boost the performance of the model. Since this is the first known attempt at terrain roughness classification from 3D data, an initial test was performed in which fields were classified. This ensured that the model could process airborne LiDAR data and work for a terrain classification task. The classification of fields showed very promising results without any fine-tuning. The results for the terrain roughness classification task show that the model could find a pattern in the validation data but had difficulty generalizing it to the test data. The filtering methods tested gave an increased mIoU and indicated that better annotations might be necessary to distinguish terrain roughness from other terrain types. None of the features obtained from the other data sources improved the results, and they showed no discriminating abilities when examining their individual histograms. In the end, more research is needed to determine whether terrain roughness can be detected from LiDAR data.
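For reference, the mIoU figure cited above is the mean intersection-over-union across classes. A minimal version for integer label maps might look like the following (the ignore-index handling is an assumption):

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=None):
    """mIoU over integer label arrays of identical shape."""
    ious = []
    for c in range(num_classes):
        if c == ignore_index:
            continue
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```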
85

Structuring of image databases for the suggestion of products for online advertising

Yang, Lixuan 10 July 2017 (has links)
The topic of the thesis is the extraction and segmentation of clothing items from still images using techniques from computer vision, machine learning, and image description, in view of suggesting to users, non-intrusively, similar items from a database of retail products. We first propose a dedicated object extractor for dress segmentation by combining local information with prior learning. A person detector localizes sites in the image that are likely to contain the object. Then, an intra-image two-stage learning process is developed to roughly separate foreground pixels from the background. Finally, the object is finely segmented by employing an active contour algorithm that takes into account the previous segmentation and injects specific knowledge about local curvature into the energy function. We then propose a new framework for extracting general deformable clothing items using a three-stage global-local fitting procedure. A set of templates initiates an object extraction process by a global alignment of the model, followed by a local search minimizing a measure of the misfit with respect to the potential boundaries in the neighborhood. The results provided by each template are aggregated, with a global fitting criterion, to obtain the final segmentation. In our latest work, we extend the output of a Fully Convolutional Network to infer context from local units (superpixels). To achieve this, we optimize an energy function that combines the large-scale structure of the image with the local low-level visual descriptions of superpixels, over the space of all possible pixel labellings. In addition, we introduce a novel dataset called RichPicture, consisting of 1,000 images for clothing extraction from fashion images. The methods are validated on public databases and compare favorably to other methods according to all the performance measures considered.
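The energy optimized in the final work can be sketched in simplified form: a unary term from FCN class probabilities aggregated per superpixel, plus a pairwise term that penalizes assigning different labels to similar neighboring superpixels. The data structures below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def labelling_energy(labels, unary, adjacency, similarity, lam=1.0):
    """
    labels:     (n_superpixels,) candidate label per superpixel
    unary:      (n_superpixels, n_classes) -log FCN class probabilities,
                averaged over the pixels of each superpixel
    adjacency:  iterable of (i, j) neighbouring superpixel pairs
    similarity: dict mapping (i, j) to an appearance similarity in [0, 1]
    """
    energy = unary[np.arange(len(labels)), labels].sum()
    for i, j in adjacency:
        if labels[i] != labels[j]:
            # cutting between similar-looking neighbours costs more
            energy += lam * similarity[(i, j)]
    return energy
```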
86

Real-time Unsupervised Domain Adaptation

Botet Colomer, Marc January 2023 (has links)
Machine learning systems have been demonstrated to be highly effective in various fields, such as vision tasks for autonomous driving. However, the deployment of these systems poses a significant challenge in ensuring their reliability and safety in diverse and dynamic environments. Online Unsupervised Domain Adaptation (UDA) aims to address continuous domain changes that may occur during deployment, such as sudden weather changes. Although these methods possess a remarkable ability to adapt to unseen domains, they are hindered by the high computational cost of constant adaptation, making them unsuitable for real-world applications that demand real-time performance. In this work, we focus on the challenging task of semantic segmentation. We present a framework for real-time domain adaptation that utilizes novel strategies to enable online adaptation at a rate of over 29 FPS on a single GPU. We propose a clever partial backpropagation scheme in conjunction with a lightweight domain-shift detector that identifies the need for adaptation, appropriately adapting domain-specific hyperparameters to enhance performance. To validate the proposed framework, we conduct experiments in various storm scenarios using different rain intensities, evaluate our results under different domain shifts such as fog visibility, and use the SHIFT dataset. Our results demonstrate that the framework achieves an optimal trade-off between accuracy and speed, surpassing state-of-the-art results, while the introduced strategies enable it to run more than six times faster at a minimal performance loss.
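A hedged sketch of the two ideas named above, partial backpropagation and a lightweight domain-shift detector, is given below. The module names (`model.encoder`, `model.decoder`), the statistics-based detector, and the threshold are illustrative assumptions; the thesis's actual criteria and losses may differ.

```python
import torch

def configure_partial_backprop(model):
    """Freeze the encoder; only decoder parameters receive gradients."""
    for p in model.encoder.parameters():
        p.requires_grad = False
    for p in model.decoder.parameters():
        p.requires_grad = True

def domain_shift_score(feats, source_mean):
    """Distance between current feature statistics and source-domain
    statistics (assumes a 4D feature tensor)."""
    return torch.norm(feats.mean(dim=(0, 2, 3)) - source_mean).item()

def adapt_step(model, optimizer, loss_fn, image, source_mean, threshold=0.5):
    feats = model.encoder(image)
    if domain_shift_score(feats, source_mean) < threshold:
        return model.decoder(feats)      # no shift detected: inference only
    out = model.decoder(feats)
    loss = loss_fn(out)                  # e.g. an entropy or pseudo-label loss
    optimizer.zero_grad()
    loss.backward()                      # gradients reach the decoder only
    optimizer.step()
    return out

# The optimizer would be built over the adapted subset only, e.g.:
# optimizer = torch.optim.SGD(model.decoder.parameters(), lr=1e-4)
```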
87

Redefining Visual SLAM for Construction Robots: Addressing Dynamic Features and Semantic Composition for Robust Performance

Liu Yang (16642902) 07 August 2023 (has links)
This research is motivated by the potential of autonomous mobile robots (AMRs) to enhance safety, productivity, and efficiency in the construction industry. The dynamic and complex nature of construction sites presents significant challenges to AMRs, particularly in localization and mapping, the process by which AMRs determine their own position in the environment while creating a map of the surrounding area. These capabilities are crucial for autonomous navigation and task execution but are inadequately addressed by existing solutions, which primarily rely on visual Simultaneous Localization and Mapping (SLAM) methods. These methods are often ineffective on construction sites because of their underlying assumption of a static environment, leading to unreliable outcomes. There is therefore a pressing need to enhance the applicability of AMRs in construction by addressing the limitations of current localization and mapping methods in dynamic environments, thereby empowering AMRs to function more effectively and fully realize their potential in the construction industry.

The overarching goal of this research is to fulfill this critical need by developing a novel visual SLAM framework capable of not only detecting and segmenting diverse dynamic objects in construction environments but also effectively interpreting the semantic structure of the environment, and of efficiently integrating these functionalities into a unified system that provides an improved SLAM solution for dynamic, complex, and unstructured environments. The rationale is that such a SLAM system could effectively address the dynamic nature of construction sites, thereby significantly improving the efficiency and accuracy of robot localization and mapping in the construction working environment.

Towards this goal, three specific objectives have been formulated. The first objective is to develop a novel methodology for comprehensive dynamic object segmentation that can support visual SLAM within highly variable construction environments. This method integrates class-agnostic objectness masks and motion cues into video object segmentation, thereby significantly improving the identification and segmentation of dynamic objects within construction sites. These dynamic objects present a significant challenge to the reliable operation of AMRs, and by accurately identifying and segmenting them, the accuracy and reliability of SLAM-based localization is expected to improve greatly. The key to this approach is a four-stage method for dynamic object segmentation: objectness mask generation, motion saliency estimation, fusion of objectness masks and motion saliency, and bi-directional propagation of the fused mask. Experimental results show that the proposed method achieves up to a 6.4% improvement in dynamic object segmentation over state-of-the-art methods, as well as the lowest localization errors when integrated into a visual SLAM system on a public dataset.

The second objective focuses on developing a flexible, cost-effective method for semantic segmentation of construction images of structural elements. This method harnesses image-level labels and Building Information Modeling (BIM) object data to replace traditional, labor-intensive pixel-level annotations. The hypothesis is that by fusing image-level labels with BIM-derived object information, a segmentation competitive with pixel-level annotations can be achieved while drastically reducing the associated cost and labor. The research method involves initializing object locations, extracting object information, and incorporating location priors. Extensive experiments indicate that the proposed method, using simple image-level labels, achieves results competitive with full pixel-level supervision while completely removing the need for laborious and expensive pixel-level annotations when adapting networks to unseen environments.

The third objective aims to create an efficient integration of dynamic object segmentation and semantic interpretation within a unified visual SLAM framework. It is proposed that more efficient dynamic object segmentation with adaptively selected frames, combined with a semantic floorplan from an as-built BIM, would speed up the removal of dynamic objects and enhance localization while reducing the frequency of scene segmentation. The technical approach involves two major modifications to the classic visual SLAM system: adaptive dynamic object segmentation and a semantic-based feature reliability update. The resulting framework seamlessly integrates dynamic object segmentation and semantic interpretation into visual SLAM. Experiments demonstrate that the proposed framework achieves competitive performance over the testing scenarios, with processing time almost half that of counterpart dynamic SLAM algorithms.

In conclusion, this research contributes significantly to the adoption of AMRs in construction by tailoring a visual SLAM framework specifically to dynamic construction sites. Through the integration of dynamic object segmentation and semantic interpretation, it enhances localization accuracy, mapping efficiency, and overall SLAM performance. With broader applications of visual SLAM such as site inspection in dangerous zones, progress monitoring, and material transportation, the study promises to advance AMR capabilities, marking a significant step towards a new era in construction automation.
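The fusion stage of the four-stage method described above can be sketched as follows: each class-agnostic objectness mask is declared dynamic if the motion saliency inside it is high, and the dynamic masks are merged so a SLAM front end can discard features falling inside them. The threshold and input formats are assumptions for illustration.

```python
import numpy as np

def fuse_dynamic_masks(object_masks, motion_saliency, motion_thresh=0.5):
    """
    object_masks:    list of (H, W) boolean class-agnostic instance masks
    motion_saliency: (H, W) float map in [0, 1], high where pixels appear to move
    Returns one (H, W) boolean mask covering pixels of dynamic objects.
    """
    dynamic = np.zeros(motion_saliency.shape, dtype=bool)
    for mask in object_masks:
        if not mask.any():
            continue
        # an object counts as dynamic if it is, on average, salient in motion
        if motion_saliency[mask].mean() > motion_thresh:
            dynamic |= mask
    return dynamic  # a SLAM front end would drop features inside this mask
```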
88

Quantum-inspired neural architecture search applied to semantic segmentation

GUILHERME BALDO CARLOS 14 July 2023 (has links)
Deep neural networks are responsible for great progress on several perceptual tasks, especially in the fields of computer vision, speech recognition, and natural language processing. These results produced a paradigm shift in pattern recognition techniques, shifting the demand from feature extractor design to neural architecture design. However, designing novel deep neural network architectures is very time-consuming and relies heavily on experts' intuition and knowledge and on a trial-and-error process. In that context, the idea of automating the architecture design of deep neural networks has gained popularity, establishing the field of neural architecture search (NAS). To tackle the NAS problem, authors have proposed several approaches to the search space definition, algorithms for the search strategy, and techniques to mitigate the resource consumption of those algorithms. Q-NAS (Quantum-inspired Neural Architecture Search) is one approach that addresses the NAS problem using a quantum-inspired evolutionary algorithm as the search strategy. The method has been successfully applied to image classification, outperforming handcrafted models on the CIFAR-10 and CIFAR-100 datasets as well as on a real-world seismic application. Motivated by this success, we propose SegQNAS (Quantum-inspired Neural Architecture Search applied to Semantic Segmentation), an adaptation of Q-NAS to semantic segmentation. We carried out several experiments to verify the applicability of SegQNAS on two datasets from the Medical Segmentation Decathlon challenge. SegQNAS achieved a 0.9583 dice similarity coefficient on the spleen dataset, outperforming traditional architectures such as U-Net and ResU-Net and achieving results comparable to a similar NAS work from the literature, but with networks containing far fewer parameters. On the prostate dataset, SegQNAS achieved a 0.6887 dice similarity coefficient, also outperforming U-Net, ResU-Net, and the similar NAS work from the literature.
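For reference, the dice similarity coefficient reported above is a standard overlap metric between a predicted and a ground-truth mask; a minimal binary-mask version is shown below.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity between two binary masks of identical shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```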
89

Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation

Dickens, James 01 September 2021 (has links)
The rise of convolutional neural networks (CNNs) in the context of computer vision has occurred in tandem with the advancement of depth sensing technology. Depth cameras yield two-dimensional arrays that store, at each pixel, the distance from the sensor to objects and surfaces in a scene; aligned with a regular color image, these produce so-called RGBD images. Inspired by prior models in the literature, this work develops a suite of RGBD CNN models to tackle the challenging tasks of object detection, instance segmentation, and semantic segmentation. Prominent architectures for object detection and image segmentation are modified to incorporate dual-backbone approaches that input RGB and depth images, combining features from both modalities through novel fusion modules. For each task, the models developed are competitive with state-of-the-art RGBD architectures. In particular, the proposed RGBD object detection approach achieves 53.5% mAP on the SUN RGBD 19-class object detection benchmark, while the proposed RGBD semantic segmentation architecture yields 69.4% accuracy on the SUN RGBD 37-class semantic segmentation benchmark. An original 13-class RGBD instance segmentation benchmark is introduced for the SUN RGBD dataset, for which the proposed model achieves 38.4% mAP. Additionally, an original depth-aware panoptic segmentation model is developed, trained, and tested on new benchmarks conceived for the NYUDv2 and SUN RGBD datasets. These benchmarks offer researchers a baseline for RGBD panoptic segmentation on these datasets, where the novel depth-aware model outperforms a comparable RGB counterpart.
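A dual-backbone fusion module of the general kind described can be sketched as concatenation of RGB and depth feature maps followed by a 1x1 convolution. The thesis's fusion modules are described as novel and are certainly more elaborate; this is only the baseline pattern.

```python
import torch
import torch.nn as nn

class RGBDFusion(nn.Module):
    """Merge same-resolution RGB and depth feature maps into one tensor."""
    def __init__(self, rgb_ch, depth_ch, out_ch):
        super().__init__()
        self.merge = nn.Sequential(
            nn.Conv2d(rgb_ch + depth_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, depth_feat):
        # assumes both backbones produce features at the same spatial size
        return self.merge(torch.cat([rgb_feat, depth_feat], dim=1))
```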
90

Transformer Based Object Detection and Semantic Segmentation for Autonomous Driving

Hardebro, Mikaela, Jirskog, Elin January 2022 (has links)
The development of autonomous driving systems has been one of the most popular research areas of the 21st century. One key component of such systems is the ability to perceive and comprehend the physical world. Two techniques that address this are object detection and semantic segmentation. During the last decade, CNN-based models have dominated these types of tasks. However, in 2021, transformer-based networks were able to outperform the existing CNN approaches, indicating a paradigm shift in the domain. This thesis explores the use of a vision transformer, particularly a Swin Transformer, in an object detection and semantic segmentation framework, and compares it to a classical CNN on road scenes. In addition, since real-time execution is crucial for autonomous driving systems, the possibility of reducing the parameter count of the transformer-based network is investigated. The results favor the Swin Transformer over the convolution-based network for both object detection and semantic segmentation. Furthermore, the analysis indicates that it is possible to reduce the computational complexity while retaining the performance.
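As an illustration of using a Swin Transformer as a detection/segmentation backbone, the snippet below pulls multi-scale features via `timm`. This assumes a recent timm version that supports `features_only` for Swin models, and the feature maps may be returned channels-last.

```python
import timm
import torch

# Multi-scale features that a detection or segmentation head could consume
backbone = timm.create_model(
    "swin_tiny_patch4_window7_224", pretrained=True, features_only=True)
x = torch.randn(1, 3, 224, 224)
for f in backbone(x):
    print(f.shape)  # four maps at strides 4, 8, 16, 32 (possibly channels-last)
```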
