Global ETD Search

401	Interactive segmentation of multiple 3D objects in medical images by optimum graph cuts = Segmentação interativa de múltiplos objetos 3D em imagens médicas por cortes ótimos em grafo Moya, Nikolas, 1991- 03 December 2015 Advisor: Alexandre Xavier Falcão / Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-27T14:45:13Z (GMT). No. of bitstreams: 1 Moya_Nikolas_M.pdf: 5706960 bytes, checksum: 9304544bfe8a78039de8b62562531865 (MD5) Previous issue date: 2015 / Abstract (Portuguese version): Medical image segmentation is crucial for extracting measurements of 3D objects (anatomical structures) that are useful in the diagnosis and treatment of diseases. In these applications, interactive segmentation is necessary when automatic methods fail or are not feasible. Graph-cut methods are considered the state of the art in interactive segmentation, but many approaches use the min-cut/max-flow algorithm, which is limited to binary segmentation, whereas multi-object segmentation can save user time and effort. This work revisits the differential image foresting transform (DIFT), a graph-cut approach suited to multi-object segmentation, and solves problems related to it. The DIFT algorithm runs in time proportional to the number of voxels in the regions modified at each segmentation execution (sublinear). This property is highly desirable in interactive segmentation of 3D images, to respond to user actions in real time.
The DIFT algorithm works as follows: the user draws labeled markers (strokes of seed voxels) inside each object and the background, while the computer interprets the image as a graph whose nodes are the voxels and whose arcs are defined by neighboring voxels, producing as output an optimum-path forest (an image partition) rooted at the seed nodes of the graph. In this forest, each object is represented by the optimum-path trees rooted at its internal seeds. These trees are painted with the same color associated with the label of the corresponding marker. By adding or removing markers, the user can correct the segmentation until the object label map represents the desired result. To guarantee segmentation consistency, seed-based methods should always maintain the connectivity between voxels and their seeds. However, this is not maintained in some approaches, such as random walkers, or when the label map is filtered to smooth object boundaries. This connectivity is essential for making corrections without restarting the process after each user intervention. Nevertheless, we observed that the DIFT fails to maintain segmentation consistency in some cases. We fix this problem both in the DIFT algorithm and after object smoothing. The results are compared on several 3D anatomical structures from magnetic resonance and computed tomography images. / Abstract: Medical image segmentation is crucial for extracting measurements of 3D objects (body anatomical structures) that are useful for the diagnosis and treatment of diseases. In such applications, interactive segmentation is necessary whenever automated methods fail or are not feasible. Graph-cut methods are considered the state of the art in interactive segmentation, but most approaches rely on the min-cut/max-flow algorithm, which is limited to binary segmentation, whereas multi-object segmentation can considerably save user time and effort.
This work revisits the differential image foresting transform (DIFT), a graph-cut approach suitable for multi-object segmentation in linear time, and solves several problems related to it. In fact, the DIFT algorithm can take time proportional to the number of voxels in the regions modified at each segmentation execution (sublinear time). Such a characteristic is highly desirable in 3D interactive segmentation, to respond to the user's actions as close to real time as possible. Segmentation using the DIFT works as follows: the user draws labeled markers (strokes of connected seed voxels) inside each object and the background, while the computer interprets the image as a graph, whose nodes are the voxels and whose arcs are defined by neighboring voxels, and outputs an optimum-path forest (image partition) rooted at the seed nodes in the graph. In the forest, each object is represented by the optimum-path trees rooted at its internal seeds. Such trees are painted with the same color associated with the label of the corresponding marker. By adding or removing markers, the user can correct the segmentation until the forest (its object label map) represents the desired result. For the sake of consistency in segmentation, similar seed-based methods should always maintain the connectivity between voxels and the seeds that labeled them. However, this does not hold in some approaches, such as random walkers, or when the segmentation is filtered to smooth object boundaries. That connectivity is also paramount for making corrections without starting the process over at each user intervention. Nevertheless, we observed that the DIFT algorithm fails to maintain segmentation consistency in some cases. We have fixed this problem in the DIFT algorithm and for the case where the obtained object boundaries are smoothed.
These results are presented and evaluated on several 3D body anatomical structures from MR and CT images / Master's / Computer Science / Master in Computer Science Image processing Image segmentation Graph cuts Medical image segmentation Multi-object segmentation
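The seed competition described in this record can be illustrated with a plain (non-differential) image foresting transform. This is a minimal sketch, not the thesis's DIFT: it assumes a 2D image, 4-adjacency, an arc weight equal to the absolute intensity difference between neighboring pixels, and the f_max path cost (the largest arc weight along a path); `ift_segment` is a hypothetical name. Each pixel takes the label of the seed that reaches it with the lowest path cost, so every labeled pixel stays connected to its seed. The DIFT's incremental recomputation of only the modified regions, and the consistency fixes the thesis describes, are not reproduced.

```python
import heapq
import numpy as np

def ift_segment(image, seeds):
    """Seeded multi-label IFT with the f_max path cost on a 4-connected 2D grid.

    image: 2D array of intensities.
    seeds: dict mapping (row, col) -> object label (int >= 1).
    Returns (label_map, cost_map).
    """
    h, w = image.shape
    cost = np.full((h, w), np.inf)
    label = np.zeros((h, w), dtype=int)
    done = np.zeros((h, w), dtype=bool)
    heap = []
    for (r, c), lab in seeds.items():
        cost[r, c] = 0.0
        label[r, c] = lab
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        path_cost, r, c = heapq.heappop(heap)
        if done[r, c] or path_cost > cost[r, c]:
            continue  # stale heap entry
        done[r, c] = True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not done[nr, nc]:
                # Arc weight: absolute intensity difference (an assumption)
                weight = abs(float(image[r, c]) - float(image[nr, nc]))
                new_cost = max(path_cost, weight)  # f_max: largest arc on the path
                if new_cost < cost[nr, nc]:
                    cost[nr, nc] = new_cost
                    label[nr, nc] = label[r, c]  # the predecessor's label propagates
                    heapq.heappush(heap, (new_cost, nr, nc))
    return label, cost
```

With a 4x6 image whose right half is 10 and left half 0, and seeds at (0, 0) with label 1 and (3, 5) with label 2, the left half is labeled 1 and the right half 2: any path crossing the intensity step costs 10, while paths inside each half cost 0.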
402	Segmentace obrazu jako výškové mapy / Image Segmentation Using Height Maps Moučka, Milan January 2011 This thesis deals with image segmentation of volumetric medical data. It describes the well-known watershed technique, which has received much attention in the field of medical image processing. An application for direct segmentation of 3D data is proposed and implemented using the ITK and VTK toolkits. Several kinds of pre-processing steps applied before the watershed method are presented and evaluated. The obtained results are compared against manually annotated datasets by means of the F-Measure and discussed.
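The comparison against manual annotations by F-Measure can be sketched as follows, assuming both the watershed result and the annotation are binary masks of the same shape (2D or 3D) and beta = 1; under that assumption the F-Measure equals the Dice coefficient, 2TP / (2TP + FP + FN). `f_measure` is a hypothetical helper, not part of ITK or VTK.

```python
import numpy as np

def f_measure(pred, truth, beta=1.0):
    """F-Measure between a binary segmentation and a binary manual annotation."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)   # voxels in both masks
    fp = np.count_nonzero(pred & ~truth)  # segmented but not annotated
    fn = np.count_nonzero(~pred & truth)  # annotated but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, `f_measure([1, 1, 1, 0], [1, 1, 0, 1])` has precision = recall = 2/3 and therefore F = 2/3.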
403	Steps towards end-to-end neural speaker diarization / Étapes vers un système neuronal de bout en bout pour la tâche de segmentation et de regroupement en locuteurs Yin, Ruiqing 26 September 2019 The speaker diarization task consists of identifying "who speaks when" in an audio stream without prior knowledge of the number of speakers or their respective speaking times. Speaker diarization systems are generally built by combining four main stages. First, regions without speech, such as silence, music, and noise, are removed by voice activity detection (VAD). Next, speech regions are split into speaker-homogeneous segments by speaker change detection, then grouped according to speaker identity. Finally, the boundaries of speech turns and their labels are refined with a re-segmentation stage. In this thesis, we propose to address these four stages with neural network approaches. We first formulate the initial segmentation (voice activity detection and speaker change detection) and the final re-segmentation as a set of sequence labeling problems, and then solve them with Bidirectional Long Short-Term Memory (Bi-LSTM) recurrent neural networks. At the speech turn clustering stage, we propose to apply the affinity propagation algorithm to neural embeddings of these speech turns in the speaker vector space. Experiments on a broadcast TV dataset show that affinity propagation clustering is more suitable than agglomerative hierarchical clustering when applied to neural speaker embeddings.
The recurrent-network-based segmentation and affinity propagation are also combined and jointly optimized to form a speaker diarization pipeline. Compared with a system whose modules are optimized independently, the new pipeline brings a significant improvement. In addition, we propose to improve the estimation of the similarity matrix with recurrent neural networks and then apply spectral clustering to this improved similarity matrix. The proposed system achieves state-of-the-art performance on the CALLHOME telephone conversation dataset. Finally, we formulate sequential clustering of speech turns as a supervised sequence labeling task and address it with stacked recurrent networks. To better understand the system's behavior, an analysis based on an encoder-decoder architecture is proposed. On synthetic examples, our systems bring a significant improvement over traditional clustering methods. / Speaker diarization is the task of determining "who speaks when" in an audio stream that usually contains an unknown amount of speech from an unknown number of speakers. Speaker diarization systems are usually built as the combination of four main stages. First, non-speech regions such as silence, music, and noise are removed by Voice Activity Detection (VAD). Next, speech regions are split into speaker-homogeneous segments by Speaker Change Detection (SCD), and later grouped according to the identity of the speaker by unsupervised clustering approaches. Finally, speech turn boundaries and labels are (optionally) refined with a re-segmentation stage. In this thesis, we propose to address these four stages with neural network approaches.
We first formulate both the initial segmentation (voice activity detection and speaker change detection) and the final re-segmentation as a set of sequence labeling problems and then address them with Bidirectional Long Short-Term Memory (Bi-LSTM) networks. In the speech turn clustering stage, we propose to use affinity propagation on top of neural speaker embeddings. Experiments on a broadcast TV dataset show that affinity propagation clustering is more suitable than hierarchical agglomerative clustering when applied to neural speaker embeddings. The LSTM-based segmentation and affinity propagation clustering are also combined and jointly optimized to form a speaker diarization pipeline. Compared with a pipeline whose modules are optimized independently, the new pipeline brings a significant improvement. In addition, we propose to improve the similarity matrix with a bidirectional LSTM and then apply spectral clustering to the improved similarity matrix. The proposed system achieves state-of-the-art performance on the CALLHOME telephone conversation dataset. Finally, we formulate sequential clustering as a supervised sequence labeling task and address it with stacked RNNs. To better understand its behavior, we propose an analysis based on an encoder-decoder architecture. Our proposed systems bring a significant improvement over traditional clustering methods on toy examples. Speaker diarization Speaker change detection Speech segmentation Segmentation LSTM Affinity propagation Spectral clustering
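Affinity propagation, which the thesis applies to neural speaker embeddings, selects exemplars by exchanging "responsibility" and "availability" messages between points, so the number of clusters (speakers) need not be fixed in advance. Below is a minimal NumPy sketch of the standard damped updates, assuming negative squared Euclidean distance as the similarity and the median similarity as every point's preference (a common default that the thesis may not use). The toy 2D vectors stand in for speaker embeddings, and `affinity_propagation` is a hypothetical name.

```python
import numpy as np

def affinity_propagation(X, damping=0.5, max_iter=200):
    """Cluster the rows of X (e.g. speech-turn embeddings); returns (exemplars, labels)."""
    n = X.shape[0]
    # Similarity: negative squared Euclidean distance (an assumption)
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Preference (self-similarity): median of the off-diagonal similarities
    np.fill_diagonal(S, np.median(S[~np.eye(n, dtype=bool)]))
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    labels = np.argmax(S[:, exemplars], axis=1)  # join the most similar exemplar
    labels[exemplars] = np.arange(len(exemplars))
    return exemplars, labels
```

Exemplars are the points whose self-responsibility plus self-availability is positive; every other point joins its most similar exemplar. On two well-separated groups of three points, this yields one label per group.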
404	Redefining Visual SLAM for Construction Robots: Addressing Dynamic Features and Semantic Composition for Robust Performance Liu Yang (16642902) 07 August 2023 This research is motivated by the potential of autonomous mobile robots (AMRs) to enhance safety, productivity, and efficiency in the construction industry. The dynamic and complex nature of construction sites presents significant challenges to AMRs, particularly in localization and mapping, the process by which AMRs determine their own position in the environment while building a map of the surrounding area. These capabilities are crucial for autonomous navigation and task execution but are inadequately addressed by existing solutions, which primarily rely on visual Simultaneous Localization and Mapping (SLAM) methods. These methods are often ineffective on construction sites because they assume a static environment, which leads to unreliable results. There is therefore a pressing need to overcome the limitations of current localization and mapping methods in handling the dynamic nature of construction sites, so that AMRs can function more effectively and fully realize their potential in the construction industry. The overarching goal of this research is to fulfill this critical need by developing a novel visual SLAM framework that can not only detect and segment diverse dynamic objects in construction environments but also effectively interpret the semantic structure of the environment, and that efficiently integrates these functionalities into a unified system, providing an improved SLAM solution for dynamic, complex, and unstructured environments.
The rationale is that such a SLAM system could effectively address the dynamic nature of construction sites, thereby significantly improving the efficiency and accuracy of robot localization and mapping in the construction working environment. Toward this goal, three specific objectives have been formulated. The first objective is to develop a novel methodology for comprehensive dynamic object segmentation that supports visual SLAM in highly variable construction environments. The method integrates class-agnostic objectness masks and motion cues into video object segmentation, significantly improving the identification and segmentation of dynamic objects on construction sites. These dynamic objects present a significant challenge to the reliable operation of AMRs, and accurately identifying and segmenting them is expected to greatly improve the accuracy and reliability of SLAM-based localization. The approach is a four-stage method for dynamic object segmentation: objectness mask generation, motion saliency estimation, fusion of objectness masks and motion saliency, and bidirectional propagation of the fused mask. Experimental results show that the proposed method improves dynamic object segmentation by up to 6.4% over state-of-the-art methods and yields the lowest localization errors on a public dataset when integrated into a visual SLAM system. The second objective focuses on developing a flexible, cost-effective method for semantic segmentation of structural elements in construction images. This method uses image-level labels and Building Information Modeling (BIM) object data in place of traditional, often labor-intensive pixel-level annotations.
The hypothesis for this objective is that fusing image-level labels with BIM-derived object information can achieve segmentation competitive with pixel-level annotations while drastically reducing the associated cost and labor. The method involves initializing object locations, extracting object information, and incorporating location priors. Extensive experiments indicate that the proposed method, using simple image-level labels, achieves results competitive with full pixel-level supervision while completely removing the need for laborious and expensive pixel-level annotations when adapting networks to unseen environments. The third objective is to integrate dynamic object segmentation and semantic interpretation efficiently within a unified visual SLAM framework. The hypothesis is that more efficient dynamic object segmentation on adaptively selected frames, combined with a semantic floorplan from an as-built BIM, would speed up the removal of dynamic objects and improve localization while reducing how often the scene must be segmented. This objective is achieved through two major modifications to the classic visual SLAM system: adaptive dynamic object segmentation and semantic-based feature reliability updates. The result is an efficient framework that seamlessly integrates dynamic object segmentation and semantic interpretation into visual SLAM. Experiments demonstrate that the proposed framework achieves competitive performance in the tested scenarios, with processing time almost half that of comparable dynamic SLAM algorithms. In conclusion, this research contributes significantly to the adoption of AMRs in construction by tailoring a visual SLAM framework specifically to dynamic construction sites.
Through the integration of dynamic object segmentation and semantic interpretation, it enhances localization accuracy, mapping efficiency, and overall SLAM performance. Given broader applications of visual SLAM such as site inspection in dangerous zones, progress monitoring, and material transportation, the study promises to advance AMR capabilities, marking a significant step toward a new era in construction automation. Construction engineering Visual SLAM Building Information Modeling Video Object Segmentation Scene Understanding Weakly Supervised Segmentation Localization Mapping Robotics Construction Automation Image-level labels Semantic Segmentation
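One way to picture the fusion stage of the first objective, and how a fused mask keeps moving objects out of SLAM, is the sketch below. It assumes boolean class-agnostic object masks, a per-pixel motion saliency map in [0, 1], and a simple rule that flags an object as dynamic when its mean saliency exceeds a threshold. The rule, the threshold, and both function names are hypothetical; the thesis's actual fusion and bidirectional mask propagation are not reproduced.

```python
import numpy as np

def fuse_objectness_and_motion(object_masks, motion_saliency, threshold=0.5):
    """Union of the object masks whose mean motion saliency exceeds a threshold.

    object_masks: list of (H, W) boolean masks (class-agnostic objects).
    motion_saliency: (H, W) float map in [0, 1] (hypothetical input format).
    """
    dynamic = np.zeros(motion_saliency.shape, dtype=bool)
    for mask in object_masks:
        # Whole objects are kept or dropped, so moving regions get complete outlines
        if mask.any() and motion_saliency[mask].mean() > threshold:
            dynamic |= mask
    return dynamic

def filter_static_keypoints(keypoints, dynamic_mask):
    """Drop (row, col) keypoints that fall on dynamic pixels before pose estimation."""
    return [(r, c) for r, c in keypoints if not dynamic_mask[r, c]]
```

Discarding keypoints on dynamic pixels before tracking keeps moving objects from violating the static-world assumption of classic visual SLAM.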
405	Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation Dickens, James 01 September 2021 The rise of convolutional neural networks (CNNs) in computer vision has occurred in tandem with advances in depth sensing technology. Depth cameras produce two-dimensional arrays that store, at each pixel, the distance from the sensor to the objects and surfaces in the scene; aligned with a regular color image, these yield so-called RGBD images. Inspired by prior models in the literature, this work develops a suite of RGBD CNN models to tackle the challenging tasks of object detection, instance segmentation, and semantic segmentation. Prominent architectures for object detection and image segmentation are modified to use dual backbones that take RGB and depth images as input, combining features from both modalities through novel fusion modules. For each task, the models developed are competitive with state-of-the-art RGBD architectures. In particular, the proposed RGBD object detection approach achieves 53.5% mAP on the SUN RGBD 19-class object detection benchmark, while the proposed RGBD semantic segmentation architecture achieves 69.4% accuracy on the SUN RGBD 37-class semantic segmentation benchmark. An original 13-class RGBD instance segmentation benchmark is introduced for the SUN RGBD dataset, on which the proposed model achieves 38.4% mAP. Additionally, an original depth-aware panoptic segmentation model is developed, trained, and tested on new benchmarks conceived for the NYUDv2 and SUN RGBD datasets. These benchmarks offer researchers a baseline for RGBD panoptic segmentation on these datasets, where the novel depth-aware model outperforms a comparable RGB counterpart.
Deep learning Computer vision CNN Object detection Semantic segmentation Instance segmentation Multi-modal deep learning Panoptic segmentation Artificial intelligence Convolutional neural networks Neural networks RGBD Depth images
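A common baseline for fusing the features of two backbones is to concatenate the RGB and depth feature maps along the channel axis and project them back with a 1x1 convolution. The sketch below shows only that baseline, assuming (C, H, W) feature maps of equal size and caller-supplied weights. It is not one of the novel fusion modules proposed in the thesis, and `fuse_rgbd_features` is a hypothetical name.

```python
import numpy as np

def fuse_rgbd_features(rgb_feat, depth_feat, weight, bias):
    """Concatenate-and-project fusion of two (C, H, W) backbone feature maps.

    weight: (C_out, 2 * C) matrix acting as a 1x1 convolution; bias: (C_out,).
    """
    stacked = np.concatenate([rgb_feat, depth_feat], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis
    fused = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    return np.maximum(fused, 0.0)  # ReLU
```

Because a 1x1 convolution mixes channels independently at every pixel, it can be written as a single `np.einsum` over the channel axis.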
406	Fashion Object Detection and Pixel-Wise Semantic Segmentation: Crowdsourcing framework for image bounding box detection & Pixel-Wise Segmentation Mallu, Mallu January 2018 Technology has revamped every aspect of our lives, and one of those facets is the fashion industry. Plenty of deep learning architectures are taking shape to augment fashion experiences for everyone. There are numerous possibilities for enhancing fashion technology with deep learning. One of the key ideas is to generate fashion styles and recommendations using artificial intelligence. Likewise, another significant feature is gathering reliable information on fashion trends, which includes analysis of existing fashion-related images and data. When dealing specifically with images, localisation and segmentation are well-known techniques for in-depth study of the pixels, objects, and labels present in an image. In this master's thesis, a complete framework is presented for performing localisation and segmentation on fashionista images. This work is part of a research project on fashion style detection and recommendation. The developed solution aims to localise fashion items in an image by drawing bounding boxes and labelling them. It also provides pixel-wise semantic segmentation functionality, which extracts fashion item label-pixel data. The collected data can serve as ground truth as well as training data for the targeted deep learning architecture. A study of localisation and segmentation in videos is also presented. The developed system has been evaluated in terms of flexibility, output quality, and reliability compared with similar platforms. It has proven to be a fully functional solution capable of providing essential localisation and segmentation services while keeping the core architecture simple and extensible.
/ Technology has renewed every aspect of our lives, and one of its many facets is the fashion industry. Many deep learning architectures are taking shape to enhance fashion experiences for everyone. There are many opportunities to improve fashion technology with deep learning. One of the most important ideas is to create fashion styles and recommendations with the help of artificial intelligence. Likewise, another important feature is collecting reliable information about fashion trends, which includes analysis of existing fashion-related images and data. When it comes specifically to images, localisation and segmentation are well known for supporting in-depth study of the pixels, objects, and labels present in the image. This master's project presents a complete framework for performing localisation and segmentation on fashionista images. This work is part of a research project related to fashion style detection and recommendation. The developed solution aims to localise fashion items in an image by drawing bounding boxes and labelling them. It also provides pixel-wise semantic segmentation functionality that extracts item label-pixel data. The collected data can serve as ground truth as well as training data for the targeted deep learning architecture. A study related to localisation and segmentation of videos is also presented in this work. The developed system has been evaluated with respect to flexibility, output quality, and reliability compared with similar platforms. It has proven to be a fully functional solution that can provide essential localisation and segmentation services while keeping the core architecture simple and extensible. Computer Systems
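Pixel-wise label data can also yield bounding boxes, so a single annotation pass could feed both localisation and segmentation training. A minimal sketch, assuming an integer label map in which 0 is background and every other value is a fashion class, with one record per class (several instances of the same class are merged). `label_map_to_records` and the class names are hypothetical and are not the framework's actual export format.

```python
import numpy as np

def label_map_to_records(label_map, class_names):
    """Summarise a pixel-wise label map as per-class records.

    label_map: (H, W) integer array, 0 = background.
    class_names: dict mapping label id -> name (hypothetical label set).
    """
    records = []
    for label_id in np.unique(label_map):  # sorted label ids
        if label_id == 0:
            continue  # skip background
        rows, cols = np.nonzero(label_map == label_id)
        records.append({
            "label": class_names[int(label_id)],
            # (x_min, y_min, x_max, y_max), inclusive pixel coordinates
            "bbox": (int(cols.min()), int(rows.min()), int(cols.max()), int(rows.max())),
            "pixels": int(rows.size),
        })
    return records
```

A 2x3 block labelled "dress" produces a record with a 6-pixel area and a box spanning exactly that block.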
407	Deep Brain Dynamics and Images Mining for Tumor Detection and Precision Medicine Lakshmi Ramesh (16637316) 30 August 2023 Automatic brain tumor segmentation in Magnetic Resonance Imaging (MRI) scans is essential for the diagnosis, treatment, and surgery of cancerous tumors. However, identifying hard-to-detect tumors, which vary in size and have irregular shapes and vague invasion areas, poses a considerable challenge. Current approaches have not yet fully leveraged the dynamics in the multiple MRI modalities, since they usually treat multiple modalities as multiple channels, and early channel merging may not fully reveal inter-modal couplings and complementary patterns. In this thesis, we propose a novel deep cross-attention learning algorithm that maximizes the mining of subtle dynamics from each input modality and then boosts feature fusion capability. More specifically, we have designed a Multimodal Cross-Attention Module (MM-CAM), equipped with a 3D Multimodal Feature Rectification and Feature Fusion Module. Extensive experiments have shown that the proposed deep learning architecture, empowered by the MM-CAM, produces higher-quality segmentation masks of the tumor subregions. Further, we have enhanced the algorithm with image matting refinement techniques. We propose to integrate a Progressive Refinement Module (PRM) and perform Cross-Subregion Refinement (CSR) for the precise identification of tumor boundaries. A Multiscale Dice Loss was also employed to provide additional supervision for the auxiliary segmentation outputs. This enhancement will facilitate effective matting-based refinement for medical image segmentation applications.
Overall, this thesis, with deep learning, transformer-empowered pattern mining, and sophisticated architecture designs, will greatly advance deep brain dynamics and image mining for tumor detection and precision medicine. Computer vision Multimodal analysis and synthesis Deep learning Neural networks Semantic Segmentation Brain Tumor Segmentation Deep Learning Computer Vision Multimodal ML 3D Computer Vision Attention Cross-Attention Biomedical Segmentation
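A multiscale Dice loss for auxiliary outputs can be sketched as a weighted sum of soft Dice losses, one per decoder output, each compared with a correspondingly downsampled ground truth. The sketch assumes a binary target for a single tumor subregion, outputs that are already probabilities, downsampling by the same integer factor along every axis, nearest-neighbor downsampling of the target by striding, and caller-chosen weights. The abstract does not give the thesis's exact formulation, and the function names are hypothetical.

```python
import numpy as np

def soft_dice_loss(prob, target, eps=1e-6):
    """1 - soft Dice coefficient between predicted probabilities and a binary target."""
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def multiscale_dice_loss(outputs, target, weights):
    """Weighted sum of Dice losses over auxiliary outputs at decreasing resolutions.

    outputs: list of 3D probability volumes; each is assumed to be the full
    resolution divided by the same integer factor along every axis.
    """
    total = 0.0
    for prob, w in zip(outputs, weights):
        factor = target.shape[0] // prob.shape[0]
        # Nearest-neighbor downsampling of the target by striding (an assumption)
        t = target[::factor, ::factor, ::factor]
        total += w * soft_dice_loss(prob, t)
    return total
```

A perfect prediction at every scale gives a loss of 0, while an empty prediction of a non-empty target contributes close to its full weight.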
408	Advertising product improvement opportunities using segmentation in Video-on-Demand services: A case study of MTG’s opportunities in the shift from television to streaming video Kohlberg, Marcus, Westman, Lars-Peter January 2014 More and more people choose to watch television online through video-on-demand services. For media corporations such as the Modern Times Group (MTG), this means that video-on-demand will become an increasingly important source of revenue. Because video-on-demand is an online service, the advertising products offered there compete with other online advertising products. Currently, MTG’s video-on-demand advertising products are the same as on regular television, meaning they have not yet taken advantage of the advertising product development opportunities made possible by Internet technology. The purpose of this thesis is therefore to determine what MTG’s strategy should be to improve the competitiveness and revenue of its video-on-demand advertising products, and which key concerns need to be addressed to realize that strategy. At the request of the commissioner, MTG, possible uses of segmentation to achieve the strategy are studied. The data collection methods include multiple interviews, both at MTG and with its current advertising customers, as well as web analytics and a questionnaire. Both qualitative and quantitative analyses were used to answer the research questions. Findings suggest that MTG should strive to improve the engagement of its advertising products through contextual segmentation and self-segmentation. This goes against the current trend in online advertising, where segmentation is primarily used for ad targeting. The reason for not following the trend is that MTG’s advertising customers operate with a television mindset, in which ad targeting is very limited and engagement has greater perceived value.
Video-on-demand ad avoidance segmentation self-segmentation contextual segmentation advertising products signal strength ad engagement play services Media Studies
409	Mutual Enhancement of Environment Recognition and Semantic Segmentation in Indoor Environment Challa, Venkata Vamsi January 2024 Background: The dynamic field of computer vision and artificial intelligence has continually evolved, pushing the boundaries in areas like semantic segmentation and environmental recognition, which are pivotal for indoor scene analysis. This research investigates the integration of these two technologies, examining their synergy and its implications for enhancing indoor scene understanding. Applications of this integration span various domains, including smart home systems for enhanced ambient living, navigation assistance for cleaning robots, and advanced surveillance for security. Objectives: The primary goal is to assess the impact of integrating semantic segmentation data on the accuracy of environmental recognition algorithms in indoor environments. Additionally, the study explores how environmental context can enhance the precision and accuracy of contour-aware semantic segmentation. Methods: The research employed an extensive methodology, using various machine learning models, including standard algorithms, Long Short-Term Memory networks, and ensemble methods. Transfer learning with models such as EfficientNet B3, MobileNetV3, and Vision Transformer was a key aspect of the experimentation. The experiments were designed to measure the effect of semantic segmentation on environmental recognition and its reciprocal influence. Results: The findings indicate that integrating semantic segmentation data significantly enhanced the accuracy of environmental recognition algorithms. Conversely, incorporating environmental context into contour-aware semantic segmentation led to notable improvements in precision and accuracy, reflected in metrics such as Mean Intersection over Union (mIoU).
Conclusion: This research underscores the mutual enhancement between semantic segmentation and environmental recognition, demonstrating how each technology significantly boosts the effectiveness of the other in indoor scene analysis. The integration of semantic segmentation data notably elevates the accuracy of environmental recognition algorithms, while the incorporation of environmental context into contour-aware semantic segmentation substantially improves its precision and accuracy. The results also open avenues for advances in automated annotation processes, paving the way for smarter environmental interaction. Semantic Segmentation Scene Classification Environment Recognition Machine Learning Deep Learning Image Classification Vision Transformers SAM (Segment Anything Model) Image Segmentation Contour-aware semantic segmentation Computer Sciences
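Mean Intersection over Union, the metric reported for contour-aware segmentation, averages the per-class IoU = TP / (TP + FP + FN). A minimal sketch that computes it from a confusion matrix, assuming integer class maps with values in [0, num_classes) and ignoring classes absent from both prediction and ground truth (conventions vary between benchmarks); `mean_iou` is a hypothetical helper.

```python
import numpy as np

def mean_iou(pred, truth, num_classes):
    """Mean IoU over the classes present in the prediction or the ground truth."""
    pred = np.asarray(pred).ravel()
    truth = np.asarray(truth).ravel()
    # Confusion matrix: rows = ground-truth class, columns = predicted class
    conf = np.bincount(truth * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    tp = np.diag(conf)
    # Union = predicted count + ground-truth count - intersection
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())
```

For `truth = [0, 0, 1, 1]` and `pred = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so mIoU = 7/12 ≈ 0.583.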
410	FGSSNet: Applying Feature-Guided Semantic Segmentation on real world floorplans Norrby, Hugo, Färm, Gabriel January 2024 This master's thesis introduces FGSSNet, a novel multi-headed feature-guided semantic segmentation (FGSS) architecture designed to improve the generalization ability of segmentation models on floorplans by injecting domain-specific information into the latent space to guide the segmentation process. FGSSNet features a U-Net segmentation backbone with a jointly trained reconstruction head attached to the U-Net decoder, tasked with reconstructing the injected feature maps and thereby forcing their use throughout the decoding process. A dedicated multi-headed feature extractor extracts the domain-specific feature maps used by FGSSNet while also predicting the wall width used by our novel dynamic scaling algorithm, which is designed to ensure spatial consistency between the training and real-world floorplans. The results show that the reconstruction head proved redundant, diverting the network's attention away from the segmentation task and ultimately hindering its performance. Instead, the model with the reconstruction head ablated, FGSSNet-NoRec, showed increased performance by using the injected features freely, showcasing their importance. FGSSNet-NoRec slightly improves on the IoU of comparable U-Net models, achieving a wall IoU of 79.3% on a preprocessed CubiCasa5K dataset and an average IoU increase of 3.0 units (5.3%) on the more challenging real-world floorplans, displaying superior generalization by leveraging the injected domain-specific information. Segmentation Semantic-Segmentation Feature-guided guide segmentation CubiCasa5k Floorplan injecting domain-specific FGSSNet FGSSNet-NoRec Unet Unet-backbone Computer and Information Sciences
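The dynamic scaling step uses the predicted wall width to bring real-world floorplans to the scale seen during training. This is a minimal sketch under stated assumptions: the scale factor is the ratio of a reference (training-set) wall width to the predicted wall width, and the image is resampled with nearest-neighbor indexing. The reference width, the resampling method, and `dynamic_rescale` are hypothetical and are not taken from FGSSNet.

```python
import numpy as np

def dynamic_rescale(image, predicted_wall_width, reference_wall_width):
    """Rescale a floorplan so its walls match a reference wall width (in pixels).

    Works for (H, W) and (H, W, C) arrays; returns (rescaled_image, scale).
    """
    scale = reference_wall_width / predicted_wall_width
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Nearest-neighbor resampling: map each output pixel back to a source pixel
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows[:, None], cols[None, :]], scale
```

For example, a plan whose walls are predicted to be 8 px wide, rescaled to a 4 px reference, is halved in each dimension.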
