Global ETD Search

91	Transformer Based Object Detection and Semantic Segmentation for Autonomous Driving Hardebro, Mikaela, Jirskog, Elin January 2022 The development of autonomous driving systems has been one of the most popular research areas in the 21st century. One key component of these kinds of systems is the ability to perceive and comprehend the physical world. Two techniques that address this are object detection and semantic segmentation. During the last decade, CNN-based models have dominated these types of tasks. However, in 2021, transformer-based networks outperformed the existing CNN approaches, indicating a paradigm shift in the domain. This thesis aims to explore the use of a vision transformer, particularly a Swin Transformer, in an object detection and semantic segmentation framework, and compare it to a classical CNN on road scenes. In addition, since real-time execution is crucial for autonomous driving systems, the possibility of a parameter reduction of the transformer-based network is investigated. The results appear to be advantageous for the Swin Transformer compared to the convolution-based network, considering both object detection and semantic segmentation. Furthermore, the analysis indicates that it is possible to reduce the computational complexity while retaining the performance. Computer Vision Autonomous Driving Machine Learning Transformers Swin CNN Object Detection Semantic Segmentation Grad-CAM PCA Mean Attention Distance
92	Fashion Object Detection and Pixel-Wise Semantic Segmentation : Crowdsourcing framework for image bounding box detection & Pixel-Wise Segmentation Mallu, Mallu January 2018 Technology has reshaped every aspect of our lives, and the fashion industry is one of them. Many deep learning architectures are emerging to enhance fashion experiences for everyone, and deep learning offers numerous possibilities for improving fashion technology. One key idea is to generate fashion styles and recommendations using artificial intelligence. Another is to gather reliable information about fashion trends, which includes analysing existing fashion-related images and data. For images specifically, localisation and segmentation are well-established ways to study the pixels, objects and labels present in an image. This master's thesis presents a complete framework for performing localisation and segmentation on fashionista images. The work is part of a larger research effort on fashion style detection and recommendation. The developed solution localises fashion items in an image by drawing bounding boxes and labelling them. It also provides pixel-wise semantic segmentation, which extracts label-pixel data for each fashion item. The collected data can serve as ground truth as well as training data for the target deep learning architecture. A study of localisation and segmentation in videos is also presented. The developed system has been evaluated in terms of flexibility, output quality and reliability in comparison with similar platforms. It has proven to be a fully functional solution capable of providing essential localisation and segmentation services while keeping the core architecture simple and extensible.
Computer Systems
93	Deep Brain Dynamics and Images Mining for Tumor Detection and Precision Medicine Lakshmi Ramesh (16637316) 30 August 2023 Automatic brain tumor segmentation in Magnetic Resonance Imaging (MRI) scans is essential for the diagnosis, treatment, and surgery of cancerous tumors. However, identifying hard-to-detect tumors, which usually vary in size and have irregular shapes and vague invasion areas, poses a considerable challenge. Current approaches have not yet fully leveraged the dynamics in the multiple modalities of MRI, since they usually treat multi-modality as multi-channel, and the early channel merging may not fully reveal inter-modal couplings and complementary patterns. In this thesis, we propose a novel deep cross-attention learning algorithm that maximizes the mining of subtle dynamics from each input modality and then boosts feature fusion capability. More specifically, we have designed a Multimodal Cross-Attention Module (MM-CAM), equipped with a 3D Multimodal Feature Rectification and Feature Fusion Module. Extensive experiments have shown that the proposed novel deep learning architecture, empowered by the innovative MM-CAM, produces higher-quality segmentation masks of the tumor subregions. Further, we have enhanced the algorithm with image matting refinement techniques. We propose to integrate a Progressive Refinement Module (PRM) and perform Cross-Subregion Refinement (CSR) for the precise identification of tumor boundaries. A Multiscale Dice Loss was also successfully employed to enforce additional supervision for the auxiliary segmentation outputs. This enhancement will facilitate effective matting-based refinement for medical image segmentation applications.
Overall, this thesis, with deep learning, transformer-empowered pattern mining, and sophisticated architecture designs, will greatly advance deep brain dynamics and images mining for tumor detection and precision medicine. Computer vision Multimodal analysis and synthesis Deep learning Neural networks Semantic Segmentation Brain Tumor Segmentation Deep Learning Computer Vision Multimodal ML 3D Computer Vision Attention Cross-Attention Biomedical Segmentation
94	Exploring the Depth-Performance Trade-Off : Applying Torch Pruning to YOLOv8 Models for Semantic Segmentation Tasks / Utforska kompromissen mellan djup och prestanda : Tillämpning av Torch Pruning på YOLOv8-modeller för uppgifter om semantisk segmentering Wang, Xinchen January 2024 To comprehend environments from different aspects, a large variety of computer vision methods have been developed to detect objects, classify them, or even segment them semantically. Semantic segmentation is growing in significance due to its broad applications in fields such as robotics, environmental understanding for virtual or augmented reality, and autonomous driving. The development of convolutional neural networks, a powerful tool, has helped solve classification and object detection tasks, with a trend towards larger and deeper models. It is hard to compare models in terms of depth since they have different structures. At the same time, semantic segmentation is computationally demanding because it requires assigning each pixel to a class. Running these complicated processes on resource-constrained embedded systems may degrade performance in terms of inference time and accuracy. Network pruning, a model compression technique that aims to eliminate redundant parameters in a model according to a given evaluation rule, is one solution. Most traditional network pruning methods, structural or non-structural, apply zero masks to cover the original parameters rather than actually eliminating the connections. A newer method, Torch-Pruning, provides a general-purpose library for structural pruning. It is based on the dependencies between parameters and can remove groups of less important parameters and reconstruct the new model.
YOLOv8, a cutting-edge work addressing several computer vision tasks, offers pre-trained models in nano, small, medium, large and xlarge sizes, with similar structures but different parameters for different applications. This thesis applies Torch-Pruning to YOLOv8 semantic segmentation models to compare pruning performance across existing models with similar structures, which makes it meaningful to compare model depth as a factor. Several pruning configurations have been explored. The results show that greater depth does not always lead to better performance. In addition, pruning can improve generalization to Gaussian noise at medium levels, from 20% to 40%, compared with the original models. Deep Learning Semantic segmentation Network optimization Network pruning Torch Pruning YOLOv8 Network Depth Computer and Information Sciences
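The contrast that entry 94 draws between zero-mask pruning and structural pruning can be illustrated with a small sketch. It does not use the Torch-Pruning library; the L1-norm importance criterion, the toy two-layer network, and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # one row per output channel


def channel_importance(weights: Matrix) -> List[float]:
    """L1 norm of each output channel (an assumed importance criterion)."""
    return [sum(abs(w) for w in row) for row in weights]


def channels_to_keep(weights: Matrix, num_to_prune: int) -> List[int]:
    """Indices of the channels that survive after dropping the weakest ones."""
    importance = channel_importance(weights)
    weakest_first = sorted(range(len(weights)), key=lambda i: importance[i])
    pruned = set(weakest_first[:num_to_prune])
    return [i for i in range(len(weights)) if i not in pruned]


def mask_prune(weights: Matrix, num_to_prune: int) -> Matrix:
    """Zero-mask pruning: weak channels become zeros, the shape is unchanged."""
    keep = set(channels_to_keep(weights, num_to_prune))
    return [list(row) if i in keep else [0.0] * len(row)
            for i, row in enumerate(weights)]


def structural_prune(layer1: Matrix, layer2: Matrix,
                     num_to_prune: int) -> Tuple[Matrix, Matrix]:
    """Structural pruning: remove weak output channels of layer1 together
    with the input columns of layer2 that depend on them."""
    keep = channels_to_keep(layer1, num_to_prune)
    new_layer1 = [list(layer1[i]) for i in keep]
    new_layer2 = [[row[i] for i in keep] for row in layer2]
    return new_layer1, new_layer2


# Toy layers: layer1 maps 2 inputs to 3 channels, layer2 maps 3 channels to 1 output.
layer1 = [[0.5, -0.5], [0.01, 0.02], [1.0, 2.0]]
layer2 = [[1.0, 2.0, 3.0]]

masked = mask_prune(layer1, num_to_prune=1)
small1, small2 = structural_prune(layer1, layer2, num_to_prune=1)
```

In the masked variant the layer keeps its three channels and only the values change, so memory and compute stay the same. In the structural variant both layers shrink, which is why the dependency between a layer's output channels and the next layer's inputs has to be tracked.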
95	Assessing wood failure in plywood by deep learning/semantic segmentation Ferreira Oliveira, Ramon 09 December 2022 The current method for estimating wood failure is highly subjective. Various techniques have been proposed to improve the current protocol, but none have succeeded. This research aims to use deep learning/semantic segmentation with the SegNet architecture to estimate wood failure in four types of three-ply plywood from mechanical shear strength specimens. We trained and tested our approach on custom and commercial plywood with bio-based and phenol-formaldehyde adhesives. Shear specimens were prepared and tested. Photographs of 255 shear bonded areas were taken. Forty photographs were used to solicit visual estimates from five human evaluators, and the remaining photographs were used to train the machine learning models. Twelve models were trained with the combination of four image sizes and three dataset splits. In comparison to visual estimates, the model trained on 512 × 512 images with a 90/10 dataset split had a mean absolute error (MAE) of 6%, the best among the results reported in the literature. wood failure plywood deep learning semantic segmentation SegNet cottonseed soybean phenol formaldehyde visual estimation model prediction Artificial Intelligence and Robotics Wood Science and Pulp, Paper Technology
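As a rough illustration of how the comparison in entry 95 could be computed, the sketch below derives a wood failure percentage from a binary segmentation mask and then takes the mean absolute error against visual estimates. The mask, the estimate values, and the function names are made up for the example and are not data from the thesis.

```python
from typing import Sequence


def wood_failure_percent(mask: Sequence[Sequence[int]]) -> float:
    """Percentage of pixels labelled 1 (wood failure) in a binary mask."""
    total = sum(len(row) for row in mask)
    failed = sum(sum(row) for row in mask)
    return 100.0 * failed / total


def mean_absolute_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Average absolute difference between paired estimates."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical 2x4 mask: 3 of 8 pixels are wood failure.
example_mask = [[1, 0, 0, 1], [0, 1, 0, 0]]

# Hypothetical model outputs and averaged evaluator estimates, in percent.
model_estimates = [wood_failure_percent(example_mask), 80.0, 55.0]
visual_estimates = [40.0, 75.0, 60.0]

mae = mean_absolute_error(model_estimates, visual_estimates)
```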
96	MULTI-SPECTRAL FUSION FOR SEMANTIC SEGMENTATION NETWORKS Justin Cody Edwards (14700769) 31 May 2023 Semantic segmentation is a machine learning task that is seeing increased utilization in multiple fields, from medical imagery to land demarcation and autonomous vehicles. Semantic segmentation performs pixel-wise classification of images, creating a new, segmented representation of the input that can be useful for detecting various terrain and objects within an image. Recently, convolutional neural networks have been heavily utilized in networks tackling the semantic segmentation task. This is particularly true in the field of autonomous driving systems.
The requirements of automated driver assistance systems (ADAS) drive semantic segmentation models targeted for deployment on ADAS to be lightweight while maintaining accuracy. A commonly used method to increase accuracy in the autonomous vehicle field is to fuse multiple sensory modalities. This research focuses on leveraging the fusion of long wave infrared (LWIR) imagery with visual spectrum imagery to fill in the inherent performance gaps when using visual imagery alone. This comes with a host of benefits, such as increased performance in various lighting conditions and adverse environmental conditions. Utilizing this fusion technique is an effective method of increasing the accuracy of a semantic segmentation model. Being a lightweight architecture is key for successful deployment on ADAS, as these systems often have resource constraints and need to operate in real time. Multi-Spectral Fusion Network (MFNet) [1] meets these requirements by leveraging a sensory fusion approach, and as such was selected as the baseline architecture for this research.
Many improvements were made upon the baseline architecture by leveraging a variety of techniques. Such improvements include the proposal of a novel loss function, categorical cross-entropy dice loss; the introduction of squeeze and excitation (SE) blocks; the addition of pyramid pooling; a new fusion technique; and drop input data augmentation. These improvements culminated in the creation of the Fast Thermal Fusion Network (FTFNet). Further improvements were made by introducing depthwise separable convolutional layers, leading to lightweight FTFNet variants, FTFNet Lite 1 & 2. Computer vision Neural networks Semantic Segmentation Convolutional Neural Networks CNN Thermal Imagery Sensory Fusion Data Augmentation Loss Function Multi-Spectral Neural Networks
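Entry 96 names a categorical cross-entropy dice loss but does not give its formula. One plausible reading, sketched below, is an unweighted sum of a per-pixel cross-entropy term and a soft Dice term averaged over classes. The unweighted sum, the smoothing constant `eps`, and the function names are assumptions, not the thesis's definition.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Mean negative log-probability of the true class over all pixels."""
    total = sum(math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels))
    return -total / len(labels)


def dice_loss(probs: Sequence[Sequence[float]],
              labels: Sequence[int], eps: float = 1e-6) -> float:
    """One minus the soft Dice score, averaged over classes."""
    num_classes = len(probs[0])
    scores = []
    for c in range(num_classes):
        intersection = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_sum = sum(p[c] for p in probs)
        true_sum = sum(1 for y in labels if y == c)
        scores.append((2 * intersection + eps) / (pred_sum + true_sum + eps))
    return 1.0 - sum(scores) / num_classes


def ce_dice_loss(probs: Sequence[Sequence[float]],
                 labels: Sequence[int]) -> float:
    """Assumed combination: plain sum of the two terms."""
    return cross_entropy(probs, labels) + dice_loss(probs, labels)


# Two pixels, two classes: each pixel is a list of class probabilities.
labels = [0, 1]
perfect = [[1.0, 0.0], [0.0, 1.0]]
uncertain = [[0.6, 0.4], [0.3, 0.7]]

loss_perfect = ce_dice_loss(perfect, labels)
loss_uncertain = ce_dice_loss(uncertain, labels)
```

The cross-entropy term penalises each pixel independently, while the Dice term scores region overlap per class, which is a common motivation for combining the two when classes are imbalanced.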
97	Land Use/Land Cover Classification From Satellite Remote Sensing Images Over Urban Areas in Sweden : An Investigative Multiclass, Multimodal and Spectral Transformation, Deep Learning Semantic Image Segmentation Study / Klassificering av markanvändning/marktäckning från satellit-fjärranalysbilder över urbana områden i Sverige : En undersökande multiklass, multimodal och spektral transformation, djupinlärningsstudie inom semantisk bildsegmentering Aidantausta, Oskar, Asman, Patrick January 2023 Remote Sensing (RS) technology provides valuable information about Earth by enabling an overview of the planet from above, making it a much-needed resource for many applications. Given the abundance of RS data and continued urbanisation, there is a need for efficient approaches to leverage RS data and its unique characteristics for the assessment and management of urban areas. Consequently, employing Deep Learning (DL) for RS applications has attracted much attention over the past few years. In this thesis, novel datasets consisting of satellite RS images over urban areas in Sweden were compiled from Sentinel-2 multispectral, Sentinel-1 Synthetic Aperture Radar (SAR) and Urban Atlas 2018 Land Use/Land Cover (LULC) data. Then, DL was applied for multiband and multiclass semantic image segmentation of LULC. The contributions of complementary spectral, temporal and SAR data and spectral indices to LULC classification performance compared to using only Sentinel-2 data with red, green and blue spectral bands were investigated by implementing DL models based on the fully convolutional network-based architecture, U-Net, and performing data fusion. Promising results were achieved with 25 possible LULC classes. Furthermore, almost all DL models at an overall model level and all DL models at an individual class level for most LULC classes benefited from complementary satellite RS data with varying degrees of classification improvement.
Additionally, practical knowledge and insights were gained from evaluating the results and are presented regarding satellite RS data characteristics and semantic segmentation of LULC in urban areas. The obtained results are helpful for practitioners and researchers applying or intending to apply DL for semantic segmentation of LULC in general and specifically in Swedish urban environments. data fusion deep learning land use/land cover classification multiclass multimodal remote sensing semantic segmentation Sentinel satellite spectral index U-Net Urban Atlas Remote Sensing
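Entry 97 mentions spectral indices as complementary input but does not say which ones were used. As an assumed example, the sketch below computes the widely used normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), and appends it to each pixel as an extra channel. For Sentinel-2 the near-infrared and red bands are B8 and B4; the pixel layout and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index; 0.0 where both bands are zero."""
    denominator = nir + red
    return (nir - red) / denominator if denominator != 0 else 0.0


def add_ndvi_channel(
    pixels: Sequence[Tuple[float, float, float, float]],
) -> List[Tuple[float, float, float, float, float]]:
    """Append NDVI to pixels given as (red, green, blue, nir) reflectances."""
    return [(r, g, b, n, ndvi(n, r)) for r, g, b, n in pixels]


# Hypothetical reflectance values for two pixels.
pixels = [(0.1, 0.2, 0.3, 0.5), (0.0, 0.0, 0.0, 0.0)]
fused = add_ndvi_channel(pixels)
```

Stacking a derived index next to the raw bands is one simple form of the input-level data fusion the abstract describes. Whether the thesis fuses data this way is not stated.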
98	Hybrid Deep Learning approach for Lane Detection : Combining convolutional and transformer networks with a post-processing temporal information mechanism, for efficient road lane detection on a road image scene Zarogiannis, Dimitrios, Bompai, Stelio January 2023 Lane detection is a crucial task in the field of autonomous driving and advanced driver assistance systems. In recent years, convolutional neural networks (CNNs) have been the primary approach for solving this problem. However, recent research has found Transformer models and attention-based mechanisms to be beneficial for semantic segmentation of road lane markings. In this work, we investigate the effectiveness of incorporating a Vision Transformer (ViT) to process feature maps extracted by a CNN for lane detection. We compare the performance of a baseline CNN-based lane detection model with that of a hybrid CNN-ViT pipeline and test the model on a well-known dataset. Furthermore, we explore the impact of incorporating temporal information from a road scene on a lane detection model’s predictive performance. We propose a post-processing technique that utilizes information from previous frames to improve the accuracy of the lane detection model. Our results show that incorporating temporal information noticeably improves the model’s performance and makes effective corrections to the originally predicted lane masks. Our SegNet backbone, exploiting the proposed post-processing mechanism, reached an F1 score of 0.52 and an Intersection-over-Union (IoU) of 0.36 on the TuSimple test set. However, the findings from testing our CNN-ViT pipeline, together with a relevant ablation study, indicate that this hybrid approach might not be a good fit for lane detection.
More specifically, the ViT module fails to exploit the features extracted by our CNN backbone and therefore, our hybrid pipeline results in less accurate lane marking predictions. Lane Detection CNN Vision Transformer Deep Learning Semantic Segmentation Computer Vision Other Electrical Engineering and Electronics
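The two metrics reported in entry 98 can be computed from the same pixel counts, as the sketch below shows for a pair of binary lane masks. The toy masks and function names are illustrative assumptions. When both scores come from the same counts they satisfy F1 = 2·IoU / (1 + IoU), so an IoU of 0.36 would correspond to an F1 of about 0.53, close to the reported 0.52. Scores averaged per image need not satisfy the identity exactly.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives over all pixels."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou_score(tp: int, fp: int, fn: int) -> float:
    """Intersection over union; defined as 1.0 when both masks are empty."""
    union = tp + fp + fn
    return tp / union if union else 1.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 1.0 when both masks are empty."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0


# Toy 2x4 lane masks (1 = lane pixel).
predicted = [[1, 1, 0, 0], [0, 1, 0, 0]]
ground_truth = [[0, 1, 1, 0], [0, 1, 0, 0]]

tp, fp, fn = confusion_counts(predicted, ground_truth)
iou = iou_score(tp, fp, fn)
f1 = f1_score(tp, fp, fn)
```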
99	Sequential Semantic Segmentation of Streaming Scenes for Autonomous Driving Guo Cheng (13892388) 03 February 2023 In traffic scene perception for autonomous vehicles, driving videos are available from in-car sensors such as cameras and LiDAR for road detection and collision avoidance. Several challenges remain in computer vision tasks for video processing, including object detection and tracking and semantic segmentation. First, because consecutive video frames carry large data redundancy, the traditional spatial-to-temporal approach inherently demands huge computational resources. Second, in many real-time scenarios, targets move continuously in the view as data streams in. To achieve a prompt response with minimum latency, an online model that processes the streaming data in shift mode is necessary. Third, in addition to shape-based recognition in the spatial domain, motion detection also relies on the inherent temporal continuity in videos, yet current works either lack long-term memory for reference or consume a huge amount of computation.
The purpose of this work is to achieve strongly temporally associated sensing results in real time with minimum memory, which is continually embedded in a pragmatic framework for speed and path planning. It takes a temporal-to-spatial approach to cope with fast-moving vehicles in autonomous navigation. It utilizes compact road profiles (RP) and motion profiles (MP) to identify path regions and dynamic objects, which drastically reduces video data to a lower dimension and increases the sensing rate. Specifically, we sample a one-pixel line from each video frame, and the temporal concatenation of lines from consecutive frames forms a road profile image, while a motion profile consists of the averaged lines obtained by sampling a one-belt region of pixels from each frame. By using dense temporal resolution to compensate for sparse spatial resolution, this method reduces 3D streaming data to a 2D image layout. Based on RP and MP under various weather conditions, three main tasks are conducted to contribute to the knowledge domain of perception and planning for autonomous driving.
The first application is semantic segmentation of temporal-to-spatial streaming scenes, including recognition of road and roadside, driving events, and static or moving objects. Since the main vision sensing tasks for autonomous driving are identifying the road area to follow and locating traffic to avoid collision, this work tackles the problem by applying semantic segmentation to road and motion profiles. Though a one-pixel line may not contain sufficient spatial information about the road and objects, the consecutive collection of lines as a temporal-spatial image provides an intrinsic spatial layout because of the continuous observation and smooth vehicle motion. Moreover, by capturing the trajectories of pedestrians' moving legs in the motion profile, we can robustly distinguish pedestrians in motion against a smooth background. The experimental results on streaming data collected from various sensors, including camera and LiDAR, demonstrate that, in the reduced temporal-to-spatial space, an effective recognition of driving scenes can be learned through semantic segmentation.
The second contribution of this work is that it adapts standard semantic segmentation into a sequential semantic segmentation network (SE3), which is implemented as a new benchmark for image and video segmentation. Most state-of-the-art methods pursue accuracy by designing complex structures at the expense of memory use, which makes trained models depend heavily on GPUs and thus inapplicable to real-time inference. Without accuracy loss, this work enables image segmentation with minimum memory. Specifically, instead of predicting for an image patch, SE3 generates output as lines are scanned. By pinpointing the memory associated with the input line at each neural layer in the network, it preserves the same receptive field as the patch size but saves the computation in the overlapped regions during network shifting. Generally, SE3 applies to most current backbone models in image segmentation, and extends inference by fusing temporal information without increasing computational complexity for video semantic segmentation. Thus, it achieves long-range 3D association at the computational cost of a 2D setting. This will facilitate inference of semantic segmentation on lightweight devices.
The third application is speed and path planning based on the sensing results from naturalistic driving videos. To avoid collision at close range and navigate a vehicle at middle and far ranges, several RP/MPs are scanned continuously from different depths for vehicle path planning. The semantic segmentation of RP/MP is further extended to multiple depths for path and speed planning according to the sensed headway and lane position. We conduct experiments on profiles of different sensing depths and build a smooth planning framework based on them. We also build an initial dataset of road and motion profiles with semantic labels from long HD driving videos. The dataset is published as an additional contribution to future work in computer vision and autonomous driving. Computer vision Image processing Pattern recognition Video processing Deep learning Neural networks Sequential Semantic Segmentation Autonomous Driving Temporal-to-Spatial Inference Model Video Profile Speed and Path Planning
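The profile construction described in entry 99 can be sketched in a few lines: one image row is taken from every frame and the rows are stacked over time to form a road profile, while a motion profile averages the rows of a horizontal belt in each frame. Which row and which belt are sampled in the thesis is not stated, so `row`, `top` and `bottom` below are hypothetical parameters, as are the function names and the synthetic frames.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale image: rows of pixel values


def road_profile(frames: Sequence[Frame], row: int) -> List[List[float]]:
    """Stack the same one-pixel line from each frame; output row t is frame t."""
    return [list(frame[row]) for frame in frames]


def motion_profile(frames: Sequence[Frame],
                   top: int, bottom: int) -> List[List[float]]:
    """Average the rows of the belt [top, bottom) column by column per frame."""
    profile = []
    for frame in frames:
        belt = frame[top:bottom]
        height = len(belt)
        profile.append([sum(column) / height for column in zip(*belt)])
    return profile


# Three synthetic 4x5 frames: pixel value = 10 * frame index + row index.
frames = [[[t * 10 + r for _ in range(5)] for r in range(4)] for t in range(3)]

rp = road_profile(frames, row=2)
mp = motion_profile(frames, top=1, bottom=3)
```

A video of T frames with height H and width W is thereby reduced from T×H×W values to a single T×W image per profile, which is the 3D-to-2D reduction the abstract refers to.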
100	Mutual Enhancement of Environment Recognition and Semantic Segmentation in Indoor Environment Challa, Venkata Vamsi January 2024 Background: The dynamic field of computer vision and artificial intelligence has continually evolved, pushing the boundaries in areas like semantic segmentation and environmental recognition, pivotal for indoor scene analysis. This research investigates the integration of these two technologies, examining their synergy and implications for enhancing indoor scene understanding. The application of this integration spans various domains, including smart home systems for enhanced ambient living, navigation assistance for cleaning robots, and advanced surveillance for security. Objectives: The primary goal is to assess the impact of integrating semantic segmentation data on the accuracy of environmental recognition algorithms in indoor environments. Additionally, the study explores how environmental context can enhance the precision and accuracy of contour-aware semantic segmentation. Methods: The research employed an extensive methodology, utilizing various machine learning models, including standard algorithms, Long Short-Term Memory networks, and ensemble methods. Transfer learning with models like EfficientNet B3, MobileNetV3 and Vision Transformer was a key aspect of the experimentation. The experiments were designed to measure the effect of semantic segmentation on environmental recognition and its reciprocal influence. Results: The findings indicated that the integration of semantic segmentation data significantly enhanced the accuracy of environmental recognition algorithms. Conversely, incorporating environmental context into contour-aware semantic segmentation led to notable improvements in precision and accuracy, reflected in metrics such as Mean Intersection over Union (MIoU).
Conclusion: This research underscores the mutual enhancement between semantic segmentation and environmental recognition, demonstrating how each technology significantly boosts the effectiveness of the other in indoor scene analysis. The integration of semantic segmentation data notably elevates the accuracy of environmental recognition algorithms, while the incorporation of environmental context into contour-aware semantic segmentation substantially improves its precision and accuracy. The results also open avenues for advancements in automated annotation processes, paving the way for smarter environmental interaction. Semantic Segmentation Scene Classification Environment Recognition Machine Learning Deep Learning Image Classification Vision Transformers SAM (Segment Anything Model) Image Segmentation Contour-aware semantic segmentation Computer Sciences
