1

Evaluation of deep learning methods for industrial automation

Onning, Ragnar January 2023 (has links)
The adaptation of the transformer architecture from natural language processing to visual tasks has proven it a useful and powerful tool. Subsequent architectures such as the Vision Transformer (ViT) and the Shifted Window (Swin) Transformer have proven comparable to, and oftentimes better than, convolutional neural networks (CNNs) in terms of accuracy. However, for mobile vision tasks and limited hardware, the computational complexity of the transformer architecture is an impediment. This project aims to answer whether the Swin Transformer can be adapted into a lightweight, low-latency classifier as a basis for industrial automation, and how it compares to CNNs for a specific task. A case study from the logging industry, binary classification of wooden boards on chain conveyors, serves as the basis of this evaluation. For this purpose, a novel dataset has been collected and annotated. The results of this project include an overview of the respective architectures and their performance for different implementations of the classification task. Both architectures exhibited sufficient accuracy, while the CNN models performed best for the specific case study.
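As an illustration of the kind of comparison this abstract describes, the following minimal sketch (not from the thesis) contrasts parameter count and single-image CPU latency of a Swin-Tiny classifier against a small CNN for binary classification; the model choices, input size, and use of the timm and torchvision packages are assumptions.

```python
# Sketch: compare model size and single-image CPU latency for binary classification.
# Assumes timm and torchvision are installed; models are illustrative, not the thesis models.
import time
import torch
import timm
from torchvision.models import resnet18

def count_params(model):
    return sum(p.numel() for p in model.parameters())

def cpu_latency_ms(model, runs=20):
    model.eval()
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        model(x)                                  # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1000

swin = timm.create_model("swin_tiny_patch4_window7_224", pretrained=False, num_classes=2)
cnn = resnet18(num_classes=2)

for name, model in [("Swin-Tiny", swin), ("ResNet-18", cnn)]:
    print(f"{name}: {count_params(model) / 1e6:.1f}M params, {cpu_latency_ms(model):.1f} ms/image")
```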
2

Comparative Analysis of Transformer and CNN Based Models for 2D Brain Tumor Segmentation

Träff, Henrik January 2023 (has links)
A brain tumor is an abnormal growth of cells within the brain, which can be categorized into primary and secondary tumor types. The most common type of primary tumor in adults is the glioma, which can be further classified into high-grade gliomas (HGGs) and low-grade gliomas (LGGs). Approximately 50% of patients diagnosed with HGG pass away within 1-2 years. Early detection and prompt treatment of brain tumors are therefore essential for effective management and improved patient outcomes.

Brain tumor segmentation is a task in medical image analysis that entails distinguishing brain tumors from normal brain tissue in magnetic resonance imaging (MRI) scans. Computer vision algorithms and deep learning models capable of analyzing medical images can be leveraged for brain tumor segmentation. These algorithms and models have the potential to provide automated, reliable, and non-invasive screening for brain tumors, thereby enabling earlier and more effective treatment. For a considerable time, Convolutional Neural Networks (CNNs), including the U-Net, have served as the standard backbone architectures for challenges in computer vision. In recent years, the Transformer architecture, which has already firmly established itself as the state of the art in natural language processing (NLP), has been adapted to computer vision tasks. The Vision Transformer (ViT) and the Swin Transformer are two architectures derived from the original Transformer that have been successfully employed for image analysis. The emergence of Transformer-based architectures in computer vision calls for an investigation into whether CNNs can be rivaled as the de facto architecture in this field.

This thesis compares the performance of four model architectures: the Swin Transformer, the Vision Transformer, the 2D U-Net, and the 2D U-Net implemented with the nnU-Net framework. These architectures are trained on increasing amounts of brain tumor images from the BraTS 2020 dataset and subsequently evaluated on brain tumor segmentation for HGG and LGG together, as well as for HGG and LGG individually. The architectures are compared on total training time, segmentation time, GPU memory usage, and on the evaluation metrics Dice coefficient, Jaccard index, precision, and recall. The 2D U-Net implemented with the nnU-Net framework performs best at correctly segmenting HGG and LGG, followed by the Swin Transformer, the 2D U-Net, and the Vision Transformer. The Transformer-based architectures improve the least when going from 50% to 100% of the training data. Furthermore, when data augmentation is applied during training, the nnU-Net outperforms the other architectures, followed by the Swin Transformer, the 2D U-Net, and the Vision Transformer. The nnU-Net benefited the least from data augmentation during training, while the Transformer-based architectures benefited the most.

In this thesis we performed a comparative analysis that effectively showcases the distinct advantages of the four model architectures under discussion. Future comparisons could train the architectures on a larger set of brain tumor images, such as the BraTS 2021 dataset. It would also be interesting to explore how Vision Transformers and Swin Transformers pre-trained on either ImageNet-21K or RadImageNet compare to the model architectures of this thesis on brain tumor segmentation.
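For reference, the overlap metrics used in this comparison can be computed directly from binary segmentation masks, as in the sketch below; the arrays are hypothetical placeholders.

```python
# Sketch: Dice coefficient, Jaccard index, precision, and recall on binary masks.
# The masks here are synthetic placeholders; eps guards against empty masks.
import numpy as np

def segmentation_metrics(pred, target, eps=1e-7):
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    jaccard = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return dice, jaccard, precision, recall

pred = np.zeros((240, 240), dtype=np.uint8)
target = np.zeros((240, 240), dtype=np.uint8)
pred[60:120, 60:120] = 1      # predicted tumor region
target[70:130, 70:130] = 1    # ground-truth tumor region
print(segmentation_metrics(pred, target))
```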
3

Transformer Based Object Detection and Semantic Segmentation for Autonomous Driving

Hardebro, Mikaela, Jirskog, Elin January 2022 (has links)
The development of autonomous driving systems has been one of the most popular research areas of the 21st century. One key component of such systems is the ability to perceive and comprehend the physical world. Two techniques that address this are object detection and semantic segmentation. During the last decade, CNN-based models have dominated these tasks. In 2021, however, transformer-based networks were able to outperform the existing CNN approaches, indicating a paradigm shift in the domain. This thesis aims to explore the use of a vision transformer, particularly a Swin Transformer, in an object detection and semantic segmentation framework, and to compare it to a classical CNN on road scenes. In addition, since real-time execution is crucial for autonomous driving systems, the possibility of reducing the number of parameters of the transformer-based network is investigated. The results appear to be advantageous for the Swin Transformer compared to the convolution-based network for both object detection and semantic segmentation. Furthermore, the analysis indicates that it is possible to reduce the computational complexity while retaining the performance.
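To illustrate the parameter-reduction idea, the sketch below builds a Swin backbone at a reduced width and depth and compares parameter counts; the reduced configuration and the use of timm's SwinTransformer class are assumptions for illustration, not the configuration studied in the thesis.

```python
# Sketch: shrinking a Swin backbone by reducing embedding width and stage depths.
# The reduced configuration is illustrative only; defaults assume 224x224 input, patch size 4, window 7.
from timm.models.swin_transformer import SwinTransformer

def params_m(model):
    return sum(p.numel() for p in model.parameters()) / 1e6

# Swin-Tiny-like baseline backbone (no classification head, num_classes=0)
baseline = SwinTransformer(embed_dim=96, depths=(2, 2, 6, 2),
                           num_heads=(3, 6, 12, 24), num_classes=0)

# Reduced variant: half the width, fewer blocks in the third stage, heads scaled to the smaller width
reduced = SwinTransformer(embed_dim=48, depths=(2, 2, 4, 2),
                          num_heads=(2, 4, 8, 16), num_classes=0)

print(f"baseline backbone: {params_m(baseline):.1f}M params")
print(f"reduced backbone:  {params_m(reduced):.1f}M params")
```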
4

Teaching an AI to recycle by looking at scrap metal : Semantic segmentation through self-supervised learning with transformers / Lär en AI att källsortera genom att kolla på metallskrot

Forsberg, Edwin, Harris, Carl January 2022 (has links)
Stena Recycling is one of the leading recycling companies in Sweden, and at their facility in Halmstad, 300 tonnes of refuse are handled every day; aluminium is one of the most valuable materials they sort. Today, most of the sorting process is automated, but parts of the refuse are still not correctly sorted: approximately 4% of the aluminium is currently not properly sorted and goes to waste. Earlier works have investigated using machine vision to assist the sorting process at Stena Recycling, but a consistent problem throughout these works is gathering enough annotated data to train the machine learning models. This thesis investigates how machine vision could be used in the recycling process and whether pre-training models using self-supervised learning can alleviate the problem of gathering annotated data and yield an improvement. The results show that machine vision models could viably be used in an information system to assist operators. This thesis also shows that pre-training models with self-supervised learning may yield a small increase in performance. Furthermore, we show that models pre-trained using self-supervised learning also appear to transfer the knowledge learned from images created in a lab environment to images taken at the recycling plant.
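A hedged sketch of the fine-tuning step described above: backbone weights produced by a self-supervised pretext task are loaded before supervised training on the small annotated set. The checkpoint name, model choice, and hyperparameters are placeholders, not those of the thesis.

```python
# Sketch: reuse a self-supervised checkpoint as the starting point for supervised fine-tuning.
# "ssl_pretrained_backbone.pth" and the model name are assumptions for illustration.
import torch
import timm

backbone = timm.create_model("swin_tiny_patch4_window7_224", pretrained=False, num_classes=0)

# Weights produced by a self-supervised pretext task on unlabelled scrap images
state = torch.load("ssl_pretrained_backbone.pth", map_location="cpu")
missing, unexpected = backbone.load_state_dict(state, strict=False)  # head keys may differ
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")

# Fine-tune on the small annotated set: the pre-trained weights are a starting point, not frozen
optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4)
```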
5

Instance Segmentation on depth images using Swin Transformer for improved accuracy on indoor images / Instans-segmentering på bilder med djupinformation för förbättrad prestanda på inomhusbilder

Hagberg, Alfred, Musse, Mustaf Abdullahi January 2022 (has links)
Simultaneous Localisation And Mapping (SLAM) is an open, fundamental problem in autonomous mobile robotics. One of the latest and most researched techniques used to enhance SLAM methods is instance segmentation. In this thesis, we implement an instance segmentation system using the Swin Transformer combined with two state-of-the-art instance segmentation methods, namely Cascade Mask R-CNN and Mask R-CNN. Instance segmentation is a technique that simultaneously solves the problems of object detection and semantic segmentation. We show that depth information enhances the average precision (AP) by approximately 7%. We also show that the Swin Transformer backbone can work well with depth images. Our results further show that Cascade Mask R-CNN outperforms Mask R-CNN. However, the results should be interpreted with caution due to the small size of the NYU-Depth V2 dataset. Most instance segmentation research uses the COCO dataset, which has roughly a hundred times more images than NYU-Depth V2 but does not include depth information.
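One common way to expose depth information to such a backbone is to normalise the depth map and stack it as a fourth input channel; the sketch below illustrates this with made-up NYU-style array shapes and is not necessarily how the thesis fuses depth.

```python
# Sketch: build an RGB-D input by appending a normalised depth map as a fourth channel.
# The arrays are synthetic placeholders standing in for an NYU-Depth V2 frame.
import numpy as np

rgb = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)        # placeholder colour frame
depth = np.random.uniform(0.5, 10.0, size=(480, 640)).astype(np.float32)   # placeholder depth in metres

rgb_norm = rgb.astype(np.float32) / 255.0
depth_norm = (depth - depth.min()) / (depth.max() - depth.min() + 1e-7)

rgbd = np.concatenate([rgb_norm, depth_norm[..., None]], axis=-1)  # shape (480, 640, 4)
print(rgbd.shape)
# The backbone's first convolution / patch-embedding layer must then accept 4 input channels.
```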
6

Machine Learning for Automatic Annotation and Recognition of Demographic Characteristics in Facial Images / Maskininlärning för Automatisk Annotering och Igenkänning av Demografiska Egenskaper hos Ansiktsbilder

Gustavsson Roth, Ludvig, Rimér Högberg, Camilla January 2024 (has links)
The recent increase in the widespread use of facial recognition technologies has accelerated the utilization of demographic information extracted from facial features, yet it is accompanied by ethical concerns. It is therefore crucial, for ethical reasons, to ensure that algorithms such as the face recognition algorithms employed in legal proceedings are equitable and thoroughly documented across diverse populations. Accurate classification of demographic traits is therefore essential for enabling a comprehensive understanding of other algorithms. This thesis explores how classical machine learning algorithms compare to deep-learning models in predicting sex, age, and skin color. The more compute-heavy deep-learning models significantly outperform their classical machine learning counterparts: the best deep-learning models achieved an MCC of 0.99, 0.48, and 0.85 for sex, age, and skin color respectively, while the best classical models achieved an MCC of 0.57, 0.22, and 0.54. Having established that the deep-learning models are superior, further methods such as semi-supervised learning, a multi-characteristic classifier, sex-specific age classifiers, and using tightly cropped facial images instead of upper-body images were employed to try to improve the deep-learning results. Throughout all deep-learning experiments, state-of-the-art vision transformer and convolutional neural network architectures were compared. While the different architectures performed remarkably alike, the convolutional neural network had a slight edge. The results further show that using cropped facial images generally improves model performance and that more specialized models achieve modest improvements compared to their less specialized counterparts. Semi-supervised learning showed potential to improve the models slightly further. The predictive performance achieved in this thesis indicates that the deep-learning models can reliably predict demographic features at a level close to, or surpassing, that of a human.
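For reference, the Matthews correlation coefficient (MCC) reported above can be computed with scikit-learn as in the sketch below; the labels are hypothetical.

```python
# Sketch: Matthews correlation coefficient on hypothetical binary labels.
# MCC of 1.0 is perfect agreement, 0 is chance-level, -1 is total disagreement.
from sklearn.metrics import matthews_corrcoef

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # e.g. annotated sex labels
y_pred = [0, 0, 1, 1, 0, 0, 1, 1]   # model predictions
print(f"MCC: {matthews_corrcoef(y_true, y_pred):.2f}")
```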
7

Návrh řízení rotačního inverzního kyvadla / Control Design of the Rotation Inverted Pendulum

Cejpek, Zdeněk January 2019 (has links)
The aim of this thesis is to build a simulator model of a rotary (Furuta) pendulum and to design appropriate controllers. The thesis describes the assembly of a nonlinear simulator model using Matlab-Simulink and its Simscape-Simmechanics library. It further discusses a linear discrete model obtained from the system response using the least squares method. This linear model serves as an approximation of the system for the design of two linear discrete state-space controllers with a summator. These controllers are supported by a simple swing-up controller and logic managing their cooperation.
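The design steps described above (identify a linear discrete model, then compute a state-feedback regulator) can be sketched as follows; the thesis works in Matlab-Simulink, so this Python/SciPy version with made-up system matrices is only an illustration of the idea.

```python
# Sketch: discrete state-feedback (LQR) design for an identified linear pendulum model.
# The matrices below are illustrative placeholders, not the identified model from the thesis.
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical identified discrete-time model x[k+1] = A x[k] + B u[k]
A = np.array([[1.00, 0.01],
              [0.10, 1.00]])
B = np.array([[0.00],
              [0.05]])

Q = np.diag([10.0, 1.0])   # state weighting
R = np.array([[0.1]])      # input weighting

# Solve the discrete algebraic Riccati equation and form the feedback gain
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("state-feedback gain K:", K)
# Control law u[k] = -K x[k]; summator (integral) action would extend the state with the tracking error.
```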
