101 |
Design of Mobility Cyber Range and Vision-Based Adversarial Attacks on Camera Sensors in Autonomous Vehicles
Ramayee, Harish Asokan January 2021
No description available.
|
102 |
LiDAR Point Cloud De-noising for Adverse Weather
Bergius, Johan, Holmblad, Jesper January 2022
Light Detection And Ranging (LiDAR) is a hot topic today, primarily because of its importance for autonomous vehicles. LiDAR sensors are capable of capturing and identifying objects in the 3D environment. However, a drawback of LiDAR sensors is that they perform poorly under adverse weather conditions. Noise present in LiDAR scans can be divided into random and pseudo-random noise. Random noise can be modeled and mitigated by statistical means. The same approach works on pseudo-random noise, but it is less effective; for this kind of noise, Deep Neural Networks (DNNs) are better suited. The main goal of this thesis is to investigate how snow can be detected in LiDAR point clouds and filtered out. The dataset used is the Winter Adverse Driving dataSet (WADS). The supervised filtering part compares statistical filtering with segmentation-based neural networks, evaluated on recall, precision, and F1. The supervised approach is expanded by investigating an ensemble approach. The supervised results indicate that neural networks have an advantage over statistical filters, and the best result was obtained from the 3D convolutional network with an F1 score of 94.58%. Our ensemble approaches improved the F1 score but did not lead to more snow being removed. We determine that an ensemble approach is a sub-optimal way of increasing the prediction performance and has the drawback of being more complex. We also investigate an unsupervised approach. The unsupervised networks are evaluated on their ability to find noisy data and correct it. Correcting the LiDAR data means predicting new values for detected noise instead of just removing it. Correctness of such predictions is evaluated manually but with the assistance of metrics like PSNR and SSIM. None of the unsupervised networks produced an acceptable result. The reason behind this negative result is investigated and presented in our conclusion, along with a model that suffers none of the flaws pointed out.
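The abstract above evaluates snow filtering on recall, precision, and F1. As a rough illustration of how those three numbers relate, the following sketch computes them for point-wise binary labels. The label encoding (1 = snow point, 0 = any other point) and the function name are assumptions made for this example; they are not taken from the thesis.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Point-wise precision, recall and F1 for a binary snow/no-snow labelling.

    Assumption for this sketch: label 1 marks a snow point, label 0 anything else.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty denominators (no predicted or no actual snow points).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    guess = [1, 1, 0, 0, 1, 0, 1, 0]
    print(precision_recall_f1(truth, guess))
```

Because F1 is the harmonic mean of precision and recall, a filter can raise F1 by removing fewer non-snow points without removing any additional snow, which is one way an F1 gain and "no more snow removed" can coexist.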
|
103 |
Classification of Terrain Roughness from Nationwide Data Sources Using Deep Learning
Fredriksson, Emily January 2022
3D semantic segmentation is an expanding topic within the field of computer vision, which has received more attention in recent years due to the development of more powerful GPUs and the new possibilities offered by deep learning techniques. Simultaneously, the amount of available spatial LiDAR data over Sweden has also increased. This work combines these two advances and investigates if a 3D deep learning model for semantic segmentation can learn to detect terrain roughness in airborne LiDAR data. The annotations for terrain roughness used in this work are taken from SGU's 2D soil type map. Other airborne data sources are also used to filter the annotations and see if additional information can boost the performance of the model. Since this is the first known attempt at terrain roughness classification from 3D data, an initial test was performed where fields were classified. This ensured that the model could process airborne LiDAR data and work for a terrain classification task. The classification of fields showed very promising results without any fine-tuning. The results for the terrain roughness classification task show that the model could find a pattern in the validation data but had difficulty generalizing it to the test data. The filtering methods tested gave an increased mIoU and indicated that better annotations might be necessary to distinguish terrain roughness from other terrain types. None of the features obtained from the other data sources improved the results, and none showed discriminating ability when their individual histograms were examined. In the end, more research is needed to determine whether terrain roughness can be detected from LiDAR data or not.
|
104 |
Learning from Synthetic Data : Towards Effective Domain Adaptation Techniques for Semantic Segmentation of Urban Scenes / Lärande från Syntetiska Data : Mot Effektiva Domänanpassningstekniker för Semantisk Segmentering av Urbana Scener
Valls I Ferrer, Gerard January 2021
Semantic segmentation is the task of predicting predefined class labels for each pixel in a given image. It is essential in autonomous driving, but also challenging because training accurate models requires large and diverse datasets, which are difficult to collect due to the high cost of annotating images at pixel-level. This raises interest in using synthetic images from simulators, which can be labelled automatically. However, models trained directly on synthetic data perform poorly in real-world scenarios due to the distributional misalignment between synthetic and real images (domain shift). This thesis explores the effectiveness of several techniques for alleviating this issue, employing Synscapes and Cityscapes as the synthetic and real datasets, respectively. Some of the tested methods exploit a few additional labelled real images (few-shot supervised domain adaptation), some have access to plentiful real images but not their associated labels (unsupervised domain adaptation), and others do not take advantage of any image or annotation from the real domain (domain generalisation). After extensive experiments and a thorough comparative study, this work shows the severity of the domain shift problem by revealing that a semantic segmentation model trained directly on the synthetic dataset scores a poor mean Intersection over Union (mIoU) of 33.5% when tested on the real dataset. This thesis also demonstrates that such performance can be boosted by 25.7% without accessing any annotations from the real domain and 17.3% without leveraging any information from the real domain. Nevertheless, these gains are still inferior to the 31.0% relative improvement achieved with as little as 25 supplementary labelled real images, which suggests that there is still room for improvement in the fields of unsupervised domain adaptation and domain generalisation.
Future work should focus on developing better algorithms and creating synthetic datasets with a greater diversity of shapes and textures in order to reduce the domain shift. / Semantisk segmentering är uppgiften att förutsäga fördefinierade klassetiketter för varje pixel i en given bild. Det är viktigt för autonom körning, men också utmanande eftersom utveckling av noggranna modeller kräver stora och varierade datamängder, som är svåra att samla in på grund av de höga kostnaderna för att märka bilder på pixelnivå. Detta väcker intresset att använda syntetiska bilder från simulatorer, som kan märkas automatiskt. Problemet är emellertid att modeller som tränats direkt på syntetiska data presterar dåligt i verkliga scenarier på grund av fördelningsfel mellan syntetiska och verkliga bilder (domänskift). Denna avhandling undersöker effektiviteten hos flera tekniker för att lindra detta problem, med Synscapes och Cityscapes som syntetiska respektive verkliga datamängder. Några av de testade metoderna utnyttjar några ytterligare märkta riktiga bilder (few-shot övervakad domänanpassning), vissa har tillgång till många riktiga bilder men inte deras associerade etiketter (oövervakad domänanpassning), och andra drar inte nytta av någon bild eller annotering från den verkliga domänen (domängeneralisering). Efter omfattande experiment och en grundlig jämförande studie visar detta arbete svårighetsgraden av domänskiftproblemet genom att avslöja att en semantisk segmenteringsmodell som upplärts direkt på den syntetiska datauppsättningen ger en dålig mean Intersection over Union (mIoU) på 33,5% när den testas på den verkliga datamängden. Denna avhandling visar också att sådan prestanda kan ökas med 25,7% utan att komma åt några annoteringar från den verkliga domänen och 17,3% utan att utnyttja någon information från den verkliga domänen.
Ändå är dessa vinster fortfarande sämre än den 31,0% relativa förbättringen som uppnåtts med så lite som 25 kompletterande annoterade riktiga bilder, vilket tyder på att det fortfarande finns utrymme för förbättringar inom områdena oövervakad domänanpassning och domängeneralisering. Framtida arbetsinsatser bör fokusera på att utveckla bättre algoritmer och på att skapa syntetiska datamängder med en större mångfald av former och texturer för att minska domänskiftet.
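The abstract above reports its results as mean Intersection over Union (mIoU). A minimal sketch of that metric, assuming flattened per-pixel class indices and skipping classes that appear in neither the ground truth nor the prediction (a common convention, but an assumption here rather than something the thesis states):

```python
from typing import List, Sequence


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union over classes for flattened per-pixel labels.

    Assumption for this sketch: classes absent from both ground truth and
    prediction are skipped rather than counted as IoU = 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    ious: List[float] = []
    for c in range(num_classes):
        # Pixels labelled c in both maps (intersection) or in either map (union).
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    guess = [0, 1, 1, 1, 2, 0]
    print(mean_iou(truth, guess, num_classes=3))
```

In the example, the per-class IoU values are 1/3, 2/3 and 1/2, so a single well-segmented class cannot hide poor performance on the others.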
|
105 |
Semantic Segmentation of Historical Document Images Using Recurrent Neural Networks
Ahrneteg, Jakob, Kulenovic, Dean January 2019
Background. This thesis focuses on the task of historical document semantic segmentation with recurrent neural networks. Document semantic segmentation involves the segmentation of a page into different meaningful regions and is an important prerequisite step of automated document analysis and digitisation with optical character recognition. At the time of writing, convolutional neural network based solutions are the state-of-the-art for analyzing document images while the use of recurrent neural networks in document semantic segmentation has not yet been studied. Considering the nature of a recurrent neural network and the recent success of recurrent neural networks in document image binarization, it should be possible to employ a recurrent neural network for document semantic segmentation and further achieve high performance results. Objectives. The main objective of this thesis is to investigate if recurrent neural networks are a viable alternative to convolutional neural networks in document semantic segmentation. By using a combination of a convolutional neural network and a recurrent neural network, another objective is also to determine if the performance of the combination can improve upon the existing case of only using the recurrent neural network. Methods. To investigate the impact of recurrent neural networks in document semantic segmentation, three different recurrent neural network architectures are implemented and trained, and their performance is evaluated with Intersection over Union. Afterwards their segmentation results are compared to those of a convolutional neural network. By performing pre-processing on training images and multi-class labeling, prediction images are ultimately produced by the employed models. Results. The results from the gathered performance data show a 2.7% performance difference between the best recurrent neural network model and the convolutional neural network.
Notably, it can be observed that this recurrent neural network model has a more consistent performance than the convolutional neural network but comparable performance results overall. For the other recurrent neural network architectures lower performance results are observed which is connected to the complexity of these models. Furthermore, by analyzing the performance results of a model using a combination of a convolutional neural network and a recurrent neural network, it can be noticed that the combination performs significantly better with a 4.9% performance increase compared to the case with only using the recurrent neural network. Conclusions. This thesis concludes that recurrent neural networks are likely a viable alternative to convolutional neural networks in document semantic segmentation but that further investigation is required. Furthermore, by combining a convolutional neural network with a recurrent neural network it is concluded that the performance of a recurrent neural network model is significantly increased. / Bakgrund. Detta arbete handlar om semantisk segmentering av historiska dokument med recurrent neural network. Semantisk segmentering av dokument inbegriper att dela in ett dokument i olika regioner, något som är viktigt för att i efterhand kunna utföra automatisk dokument analys och digitalisering med optisk teckenläsning. Vidare är convolutional neural network det främsta alternativet för bearbetning av dokument bilder medan recurrent neural network aldrig har använts för semantisk segmentering av dokument. Detta är intressant eftersom om vi tar hänsyn till hur ett recurrent neural network fungerar och att recurrent neural network har uppnått mycket bra resultat inom binär bearbetning av dokument, borde det likväl vara möjligt att använda ett recurrent neural network för semantisk segmentering av dokument och även här uppnå bra resultat. Syfte. 
Syftet med arbetet är att undersöka om ett recurrent neural network kan uppnå ett likvärdigt resultat jämfört med ett convolutional neural network för semantisk segmentering av dokument. Vidare är syftet även att undersöka om en kombination av ett convolutional neural network och ett recurrent neural network kan ge ett bättre resultat än att bara endast använda ett recurrent neural network. Metod. För att kunna avgöra om ett recurrent neural network är ett lämpligt alternativ för semantisk segmentering av dokument utvärderas prestanda resultatet för tre olika modeller av recurrent neural network. Därefter jämförs dessa resultat med prestanda resultatet för ett convolutional neural network. Vidare utförs förbehandling av bilder och multi klassificering för att modellerna i slutändan ska kunna producera mätbara resultat av uppskattnings bilder. Resultat. Genom att utvärdera prestanda resultaten för modellerna kan vi i en jämförelse med den bästa modellen och ett convolutional neural network uppmäta en prestanda skillnad på 2.7%. Noterbart i det här fallet är att den bästa modellen uppvisar en jämnare fördelning av prestanda. För de två modellerna som uppvisade en lägre prestanda kan slutsatsen dras att deras utfall beror på en lägre modell komplexitet. Vidare vid en jämförelse av dessa två modeller, där den ena har en kombination av ett convolutional neural network och ett recurrent neural network medan den andra endast har ett recurrent neural network uppmäts en prestanda skillnad på 4.9%. Slutsatser. Resultatet antyder att ett recurrent neural network förmodligen är ett lämpligt alternativ till ett convolutional neural network för semantisk segmentering av dokument. Vidare dras slutsatsen att en kombination av de båda varianterna bidrar till ett bättre prestanda resultat.
|
106 |
Structuring of image databases for the suggestion of products for online advertising / Structuration des bases d’images pour la suggestion des produits pour la publicité en ligne
Yang, Lixuan 10 July 2017
Le sujet de la thèse est l'extraction et la segmentation des vêtements à partir d'images en utilisant des techniques de la vision par ordinateur, de l'apprentissage par ordinateur et de la description d'image, pour la recommandation de manière non intrusive aux utilisateurs des produits similaires provenant d'une base de données de vente. Nous proposons tout d'abord un extracteur d'objets dédié à la segmentation de la robe en combinant les informations locales avec un apprentissage préalable. Un détecteur de personne localises des sites dans l'image qui est probable de contenir l'objet. Ensuite, un processus d'apprentissage intra-image en deux étapes est est développé pour séparer les pixels de l'objet de fond. L'objet est finalement segmenté en utilisant un algorithme de contour actif qui prend en compte la segmentation précédente et injecte des connaissances spécifiques sur la courbure locale dans la fonction énergie. Nous proposons ensuite un nouveau framework pour l'extraction des vêtements généraux en utilisant une procédure d'ajustement globale et locale à trois étapes. Un ensemble de modèles initialises un processus d'extraction d'objet par un alignement global du modèle, suivi d'une recherche locale en minimisant une mesure de l'inadéquation par rapport aux limites potentielles dans le voisinage. Les résultats fournis par chaque modèle sont agrégés, mesuré par un critère d'ajustement globale, pour choisir la segmentation finale. Dans notre dernier travail, nous étendons la sortie d'un réseau de neurones Fully Convolutional Network pour inférer le contexte à partir d'unités locales (superpixels). Pour ce faire, nous optimisons une fonction énergie, qui combine la structure à grande échelle de l'image avec le local structure superpixels, en recherchant dans l'espace de toutes les possibilité d'étiquetage. De plus, nous introduisons une nouvelle base de données RichPicture, constituée de 1000 images pour l'extraction de vêtements à partir d'images de mode. 
Les méthodes sont validées sur la base de données publiques et se comparent favorablement aux autres méthodes selon toutes les mesures de performance considérées. / The topic of the thesis is the extraction and segmentation of clothing items from still images using techniques from computer vision, machine learning and image description, with a view to non-intrusively suggesting to users similar items from a database of retail products. We first propose a dedicated object extractor for dress segmentation that combines local information with prior learning. A person detector is applied to localize sites in the image that are likely to contain the object. Then, an intra-image two-stage learning process is developed to roughly separate foreground pixels from the background. Finally, the object is finely segmented by employing an active contour algorithm that takes into account the previous segmentation and injects specific knowledge about local curvature in the energy function. We then propose a new framework for extracting general deformable clothing items by using a three-stage global-local fitting procedure. A set of templates initiates an object extraction process by a global alignment of the model, followed by a local search minimizing a measure of the misfit with respect to the potential boundaries in the neighborhood. The results provided by each template are aggregated, with a global fitting criterion, to obtain the final segmentation. In our latest work, we extend the output of a Fully Convolutional Network to infer context from local units (superpixels). To achieve this, we optimize an energy function that combines the large-scale structure of the image with the local low-level visual descriptions of superpixels, over the space of all possible pixel labellings.
In addition, we introduce a novel dataset called RichPicture, consisting of 1000 images for clothing extraction from fashion images. The methods are validated on the public database and compare favorably to the other methods according to all the performance measures considered.
|
107 |
Real-time Unsupervised Domain Adaptation / Oövervakad domänanpassning i realtid
Botet Colomer, Marc January 2023
Machine learning systems have been demonstrated to be highly effective in various fields, such as in vision tasks for autonomous driving. However, the deployment of these systems poses a significant challenge in terms of ensuring their reliability and safety in diverse and dynamic environments. Online Unsupervised Domain Adaptation (UDA) aims to address the issue of continuous domain changes that may occur during deployment, such as sudden weather changes. Although these methods possess a remarkable ability to adapt to unseen domains, they are hindered by the high computational cost associated with constant adaptation, making them unsuitable for real-world applications that demand real-time performance. In this work, we focus on the challenging task of semantic segmentation. We present a framework for real-time domain adaptation that utilizes novel strategies to enable online adaptation at a rate of over 29 FPS on a single GPU. We propose a partial backpropagation scheme in conjunction with a lightweight domain-shift detector that identifies the need for adaptation and adjusts domain-specific hyperparameters accordingly to enhance performance. To validate our proposed framework, we conduct experiments in various storm scenarios using different rain intensities and evaluate our results in different domain shifts, such as fog visibility, and using the SHIFT dataset. Our results demonstrate that our framework achieves an optimal trade-off between accuracy and speed, surpassing state-of-the-art results, while the introduced strategies enable it to run more than six times faster at a minimal performance loss. / Maskininlärningssystem har visat sig vara mycket effektiva inom olika områden, till exempel i datorseende uppgifter för autonom körning. Spridning av dessa system utgör dock en betydande utmaning när det gäller att säkerställa deras tillförlitlighet och säkerhet i olika och dynamiska miljöer.
Online Unsupervised Domain Adaptation (UDA) syftar till att behandla problemet med kontinuerliga domänändringar som kan inträffa under systemets användning, till exempel plötsliga väderförändringar. Även om dessa metoder har en anmärkningsvärd förmåga att anpassa sig till okända domäner, hindras de av den höga beräkningskostnaden som är förknippad med ständig nödvändighet för anpassning, vilket gör dem olämpliga för verkliga tillämpningar som kräver realtidsprestanda. I denna avhandling fokuserar vi på utmanande uppgiften semantisk segmentering. Vi presenterar ett system för domänanpassning i realtid som använder nya strategier för att möjliggöra onlineanpassning med en hastighet av över 29 FPS på en enda GPU. Vi föreslår en smart partiell backpropagation i kombination med en lätt domänförskjutningsdetektor som identifierar när anpassning egentligen behövs, vilket kan konfigureras av domänspecifika hyperparametrar på lämpligt sätt för att förbättra prestandan. För att validera vårt föreslagna system genomför vi experiment i olika stormscenarier med olika regnintensiteter och utvärderar våra resultat i olika domänförskjutningar, såsom dimmasynlighet, och med hjälp av SHIFT-datauppsättningen. Våra resultat visar att vårt system uppnår en optimal avvägning mellan noggrannhet och hastighet, och överträffar toppmoderna resultat, medan de introducerade strategierna gör det möjligt att köra mer än sex gånger snabbare med minimal prestandaförlust.
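The abstract above describes a lightweight domain-shift detector that decides when adaptation is needed, so that the costly adaptation step does not run on every frame. The thesis abstract does not say how the detector works, so the following is only a toy illustration of the general idea of gating adaptation on a drift signal. The class name `ShiftGate`, the scalar per-frame statistic, the threshold, and the moving-average update are all hypothetical choices made for this sketch.

```python
from typing import Optional


class ShiftGate:
    """Hypothetical gate that triggers adaptation only when a per-frame
    statistic drifts away from a slowly updated reference value.

    This is NOT the detector from the thesis; it only illustrates the idea of
    skipping adaptation while the input domain looks stable.
    """

    def __init__(self, threshold: float, momentum: float = 0.9) -> None:
        self.threshold = threshold
        self.momentum = momentum
        self.reference: Optional[float] = None

    def should_adapt(self, frame_stat: float) -> bool:
        # First frame: nothing to compare against yet.
        if self.reference is None:
            self.reference = frame_stat
            return False

        drift = abs(frame_stat - self.reference)
        if drift > self.threshold:
            # Large jump: request adaptation and re-anchor on the new domain.
            self.reference = frame_stat
            return True

        # Small drift: track it with an exponential moving average, no adaptation.
        self.reference = self.momentum * self.reference + (1.0 - self.momentum) * frame_stat
        return False


if __name__ == "__main__":
    gate = ShiftGate(threshold=0.5)
    # Hypothetical per-frame statistics, e.g. a summary of feature activations.
    for stat in [1.0, 1.05, 0.95, 2.0, 2.02]:
        print(stat, gate.should_adapt(stat))
```

The design choice illustrated here is that the check itself is far cheaper than an adaptation step, so running it on every frame costs little compared with adapting continuously.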
|
108 |
Redefining Visual SLAM for Construction Robots: Addressing Dynamic Features and Semantic Composition for Robust Performance
Liu Yang (16642902) 07 August 2023
This research is motivated by the potential of autonomous mobile robots (AMRs) in enhancing safety, productivity, and efficiency in the construction industry. The dynamic and complex nature of construction sites presents significant challenges to AMRs, particularly in localization and mapping, the process by which AMRs determine their own position in the environment while creating a map of the surrounding area. These capabilities are crucial for autonomous navigation and task execution but are inadequately addressed by existing solutions, which primarily rely on visual Simultaneous Localization and Mapping (SLAM) methods. These methods are often ineffective in construction sites due to their underlying assumption of a static environment, leading to unreliable outcomes. Therefore, there is a pressing need to enhance the applicability of AMRs in construction by overcoming the limitations of current localization and mapping methods in handling the dynamic nature of construction sites, thereby empowering AMRs to function more effectively and fully realize their potential in the construction industry.
The overarching goal of this research is to fulfill this critical need by developing a novel visual SLAM framework that is capable of not only detecting and segmenting diverse dynamic objects in construction environments but also effectively interpreting the semantic structure of the environment. Furthermore, the framework efficiently integrates these functionalities into a unified system to provide an improved SLAM solution for dynamic, complex, and unstructured environments. The rationale is that such a SLAM system could effectively address the dynamic nature of construction sites, thereby significantly improving the efficiency and accuracy of robot localization and mapping in the construction working environment.
Towards this goal, three specific objectives have been formulated. The first objective is to develop a novel methodology for comprehensive dynamic object segmentation that can support visual SLAM within highly variable construction environments. This novel method integrates class-agnostic objectness masks and motion cues into video object segmentation, thereby significantly improving the identification and segmentation of dynamic objects within construction sites. These dynamic objects present a significant challenge to the reliable operation of AMRs and, by accurately identifying and segmenting them, the accuracy and reliability of SLAM-based localization are expected to greatly improve. The key to this innovative approach is a four-stage method for dynamic object segmentation, consisting of objectness mask generation, motion saliency estimation, fusion of objectness masks and motion saliency, and bi-directional propagation of the fused mask. Experimental results show that the proposed method improves dynamic object segmentation by up to 6.4% over state-of-the-art methods and yields the lowest localization errors when integrated into a visual SLAM system on a public dataset.
The second objective focuses on developing a flexible, cost-effective method for semantic segmentation of construction images of structural elements. This method harnesses the power of image-level labels and Building Information Modeling (BIM) object data to replace the traditional and often labor-intensive pixel-level annotations. The hypothesis for this objective is that fusing image-level labels with BIM-derived object information can yield a segmentation that is competitive with one trained on pixel-level annotations while drastically reducing the associated cost and labor intensity. The research method involves initializing object location, extracting object information, and incorporating location priors. Extensive experiments indicate that the proposed method, using simple image-level labels, achieves results competitive with full pixel-level supervision while completely removing the need for laborious and expensive pixel-level annotations when adapting networks to unseen environments.
The third objective aims to create an efficient integration of dynamic object segmentation and semantic interpretation within a unified visual SLAM framework. It is proposed that a more efficient dynamic object segmentation with adaptively selected frames, combined with a semantic floorplan from an as-built BIM, would speed up the removal of dynamic objects and enhance localization while reducing the frequency of scene segmentation. The technical approach to achieving this objective consists of two major modifications to the classic visual SLAM system: adaptive dynamic object segmentation and a semantic-based feature reliability update. Upon the accomplishment of this objective, an efficient framework is developed that seamlessly integrates dynamic object segmentation and semantic interpretation into a visual SLAM framework. Experiments demonstrate that the proposed framework achieves competitive performance over the testing scenarios, with processing time almost halved compared with counterpart dynamic SLAM algorithms.
In conclusion, this research contributes significantly to the adoption of AMRs in construction by tailoring a visual SLAM framework specifically for dynamic construction sites. Through the integration of dynamic object segmentation and semantic interpretation, it enhances localization accuracy, mapping efficiency, and overall SLAM performance. With implications for broader applications of visual SLAM algorithms, such as site inspection in dangerous zones, progress monitoring, and material transportation, the study promises to advance AMR capabilities, marking a significant step towards a new era in construction automation.
|
109 |
[pt] BUSCA POR ARQUITETURA NEURAL COM INSPIRAÇÃO QUÂNTICA APLICADA A SEGMENTAÇÃO SEMÂNTICA / [en] QUANTUM-INSPIRED NEURAL ARCHITECTURE SEARCH APPLIED TO SEMANTIC SEGMENTATION
GUILHERME BALDO CARLOS 14 July 2023
[pt] Redes neurais profundas são responsáveis pelo grande progresso em diversas tarefas perceptuais, especialmente nos campos da visão computacional, reconhecimento de fala e processamento de linguagem natural. Estes resultados produziram uma mudança de paradigma nas técnicas de reconhecimento de padrões, deslocando a demanda do design de extratores de características para o design de arquiteturas de redes neurais. No entanto, o design de novas arquiteturas de redes neurais profundas é bastante demandante em termos de tempo e depende fortemente da intuição e conhecimento de especialistas, além de se basear em um processo de tentativa e erro. Neste contexto, a ideia de automatizar o design de arquiteturas de redes neurais profundas tem ganhado popularidade, estabelecendo o campo da busca por arquiteturas neurais (NAS - Neural Architecture Search). Para resolver o problema de NAS, autores propuseram diversas abordagens envolvendo o espaço de buscas, a estratégia de buscas e técnicas para mitigar o consumo de recursos destes algoritmos. O Q-NAS (Quantum-inspired Neural Architecture Search) é uma abordagem proposta para endereçar o problema de NAS utilizando um algoritmo evolucionário com inspiração quântica como estratégia de buscas. Este método foi aplicado de forma bem sucedida em classificação de imagens, superando resultados de arquiteturas de design manual nos conjuntos de dados CIFAR-10 e CIFAR-100 além de uma aplicação de mundo real na área da sísmica. Motivados por este sucesso, propõe-se nesta Dissertação o SegQNAS (Quantum-inspired Neural Architecture Search applied to Semantic Segmentation), uma adaptação do Q-NAS para a tarefa de segmentação semântica. Diversos experimentos foram realizados com objetivo de verificar a aplicabilidade do SegQNAS em dois conjuntos de dados do desafio Medical Segmentation Decathlon.
O SegQNAS foi capaz de alcançar um coeficiente de similaridade dice de 0.9583 no conjunto de dados de baço, superando os resultados de arquiteturas tradicionais como U-Net e ResU-Net e atingindo resultados comparáveis a outros trabalhos que aplicaram NAS a este conjunto de dados, mas encontrando arquiteturas com muito menos parâmetros. No conjunto de dados de próstata, o SegQNAS alcançou um coeficiente de similaridade dice de 0.6887 superando a U-Net, ResU-Net e o trabalho na área de NAS que utilizamos como comparação. / [en] Deep neural networks are responsible for great progress in performance for several perceptual tasks, especially in the fields of computer vision, speech recognition, and natural language processing. These results produced a paradigm shift in pattern recognition techniques, shifting the demand from feature extractor design to neural architecture design. However, designing novel deep neural network architectures is very time-consuming and relies heavily on experts' intuition and knowledge and on a trial-and-error process. In that context, the idea of automating the architecture design of deep neural networks has gained popularity, establishing the field of neural architecture search (NAS). To tackle the problem of NAS, authors have proposed several approaches regarding the search space definition, algorithms for the search strategy, and techniques to mitigate the resource consumption of those algorithms. Q-NAS (Quantum-inspired Neural Architecture Search) is one proposed approach to address the NAS problem using a quantum-inspired evolutionary algorithm as the search strategy. That method has been successfully applied to image classification, outperforming handcrafted models on the CIFAR-10 and CIFAR-100 datasets and also on a real-world seismic application. Motivated by this success, we propose SegQNAS (Quantum-inspired Neural Architecture Search applied to Semantic Segmentation), which is an adaptation of Q-NAS applied to semantic segmentation. We carried out several experiments to verify the applicability of SegQNAS on two datasets from the Medical Segmentation Decathlon challenge. SegQNAS was able to achieve a 0.9583 dice similarity coefficient on the spleen dataset, outperforming traditional architectures like U-Net and ResU-Net and achieving results comparable to a similar NAS work from the literature, but with a network that has far fewer parameters. On the prostate dataset, SegQNAS achieved a 0.6887 dice similarity coefficient, also outperforming U-Net, ResU-Net, and a similar NAS work from the literature.
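The results above are stated as Dice similarity coefficients. For two binary masks A and B the coefficient is 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming flattened 0/1 masks and treating two empty masks as a perfect match (a convention chosen for this example, not something the abstract specifies):

```python
from typing import Sequence


def dice_coefficient(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    Assumption for this sketch: entries are 0 or 1, and two empty masks
    score 1.0 because there is nothing to disagree about.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")

    # Overlap counts positions that are foreground in both masks.
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * overlap / total if total else 1.0


if __name__ == "__main__":
    reference = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(dice_coefficient(reference, predicted))
```

For a single foreground class, this value coincides with the F1 score of the foreground labels, which is why the two metrics are often reported interchangeably in segmentation work.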
|
110 |
Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation
Dickens, James 01 September 2021
The rise of convolutional neural networks (CNNs) in the context of computer vision has occurred in tandem with the advancement of depth sensing technology. Depth cameras are capable of yielding two-dimensional arrays storing at each pixel the distance from objects and surfaces in a scene from a given sensor, aligned with a regular color image, obtaining so-called RGBD images. Inspired by prior models in the literature, this work develops a suite of RGBD CNN models to tackle the challenging tasks of object detection, instance segmentation, and semantic segmentation. Prominent architectures for object detection and image segmentation are modified to incorporate dual backbone approaches inputting RGB and depth images, combining features from both modalities through the use of novel fusion modules. For each task, the models developed are competitive with state-of-the-art RGBD architectures. In particular, the proposed RGBD object detection approach achieves 53.5% mAP on the SUN RGBD 19-class object detection benchmark, while the proposed RGBD semantic segmentation architecture yields 69.4% accuracy with respect to the SUN RGBD 37-class semantic segmentation benchmark. An original 13-class RGBD instance segmentation benchmark is introduced for the SUN RGBD dataset, for which the proposed model achieves 38.4% mAP. Additionally, an original depth-aware panoptic segmentation model is developed, trained, and tested for new benchmarks conceived for the NYUDv2 and SUN RGBD datasets. These benchmarks offer researchers a baseline for the task of RGBD panoptic segmentation on these datasets, where the novel depth-aware model outperforms a comparable RGB counterpart.
|