About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Object Detection with Swin Vision Transformers from Raw ADC Radar Signals

Giroux, James 15 August 2023 (has links)
Object detection utilizing frequency modulated continuous wave radar is becoming increasingly popular in the field of autonomous vehicles. Radar does not possess the same drawbacks seen in other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals due to weather conditions such as rain or snow. Thus, there is a necessity for fully autonomous systems to utilize radar sensing in downstream decision-making tasks, generally handled by deep learning algorithms. Commonly, three transformations have been used to form range-azimuth-Doppler cubes on which deep learning algorithms perform object detection. This method has drawbacks, specifically the pre-processing costs associated with performing multiple Fourier transforms and normalization. We develop a network utilizing raw radar analog-to-digital converter (ADC) output capable of operating in near real-time given the removal of all pre-processing. We obtain inference time estimates one-fifth of the traditional range-Doppler pipeline, decreasing from 156 ms to 30 ms, and similar decreases in comparison to the full range-azimuth-Doppler cube. Moreover, we introduce hierarchical Swin Vision Transformers to the field of radar object detection and show their capability to operate on inputs varying in pre-processing, along with different radar configurations, i.e., relatively low and high numbers of transmitters and receivers. Our network increases both average recall and mean intersection-over-union performance by approximately 6-7%, obtaining state-of-the-art F1 scores on high-definition radar as a result. On low-definition radar, we note an increase in mean average precision of approximately 2.5% over state-of-the-art range-Doppler networks when raw ADC data is used, and an approximately 5% increase over networks using the full range-azimuth-Doppler cube.
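
For context, the range-Doppler representation that the raw-ADC approach above bypasses is conventionally built by applying Fourier transforms along the fast-time and slow-time axes of the ADC samples, followed by normalization. A minimal NumPy sketch of that standard pre-processing follows; the array layout, windowing, and normalisation choices are illustrative assumptions, not the exact pipeline benchmarked in the thesis.

```python
import numpy as np

def range_doppler_map(adc, remove_dc=True):
    """Conventional range-Doppler pre-processing for FMCW radar.

    adc: complex ADC samples of shape (n_rx, n_chirps, n_samples)
         (antenna, slow time, fast time) -- an illustrative layout.
    Returns a log-power map of the same shape.
    """
    # Range FFT along fast time (per chirp), with a window to reduce leakage.
    win = np.hanning(adc.shape[-1])
    rng = np.fft.fft(adc * win, axis=-1)

    # Optionally remove the static (zero-Doppler) component.
    if remove_dc:
        rng = rng - rng.mean(axis=1, keepdims=True)

    # Doppler FFT along slow time (across chirps), centred at zero velocity.
    rd = np.fft.fftshift(np.fft.fft(rng, axis=1), axes=1)

    # Log-power normalisation, as commonly fed to detection networks.
    return 20 * np.log10(np.abs(rd) + 1e-12)

# Example: 4 RX antennas, 128 chirps, 256 samples per chirp.
adc = np.random.randn(4, 128, 256) + 1j * np.random.randn(4, 128, 256)
rd_map = range_doppler_map(adc)
print(rd_map.shape)  # (4, 128, 256)
```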
2

Convolution-compacted vision transformers for prediction of local wall heat flux at multiple Prandtl numbers in turbulent channel flow

Wang, Yuning January 2023 (has links)
Predicting wall heat flux accurately in wall-bounded turbulent flows is critical for a variety of engineering applications, including thermal management systems and energy-efficient designs. Traditional methods, which rely on expensive numerical simulations, are hampered by increasing complexity and extremely high computational cost. Recent advances in deep neural networks (DNNs), however, offer an effective solution by predicting wall heat flux using non-intrusive measurements derived from off-wall quantities. This study introduces a novel approach, the convolution-compacted vision transformer (ViT), which integrates convolutional neural networks (CNNs) and ViT to predict instantaneous fields of wall heat flux accurately based on off-wall quantities, including velocity components in three directions and temperature. Our method is applied to an existing database of wall-bounded turbulent flows obtained from direct numerical simulations (DNS). We first conduct an ablation study to examine the effects of incorporating convolution-based modules into ViT architectures and report on the impact of different modules. Subsequently, we utilize fully convolutional neural networks (FCNs) with various architectures to identify the distinctions between FCN models and the convolution-compacted ViT. Our optimized ViT model surpasses the FCN models in terms of instantaneous field predictions, learning turbulence statistics, and accurately capturing energy spectra. Finally, we undertake a sensitivity analysis using a gradient map to enhance the understanding of the nonlinear relationship established by DNN models, thus augmenting the interpretability of these models. / Presentation online
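
The convolution-compacted idea described above — convolutional layers that tokenise the off-wall planes before a transformer encoder regresses the wall field — can be sketched generically in PyTorch. The layer sizes, the four input channels (three velocity components plus temperature), and the single-channel heat-flux output below are assumptions for illustration, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class ConvViTRegressor(nn.Module):
    """Toy CNN-stem + transformer encoder for dense wall-quantity prediction."""

    def __init__(self, in_ch=4, embed_dim=128, depth=4, heads=4, patch=8, size=64):
        super().__init__()
        # Convolutional stem: tokenises the input into patch embeddings.
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, embed_dim // 2, 3, padding=1), nn.GELU(),
            nn.Conv2d(embed_dim // 2, embed_dim, patch, stride=patch),
        )
        n_tokens = (size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, embed_dim))
        enc_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        # Decode tokens back to a full-resolution single-channel field.
        self.head = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, 32, patch, stride=patch), nn.GELU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
        self.grid = size // patch

    def forward(self, x):                              # x: (B, 4, H, W) off-wall planes
        tok = self.stem(x).flatten(2).transpose(1, 2)  # (B, N, C)
        tok = self.encoder(tok + self.pos)
        fmap = tok.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        return self.head(fmap)                         # (B, 1, H, W) predicted heat flux

model = ConvViTRegressor()
print(model(torch.randn(2, 4, 64, 64)).shape)  # torch.Size([2, 1, 64, 64])
```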
3

Convolution-compacted vision transformers for prediction of local wall heat flux at multiple Prandtl numbers in turbulent channel flow

Wang, Yuning January 2023 (has links)
Predicting wall heat flux accurately in wall-bounded turbulent flows is critical for a variety of engineering applications, including thermal management systems and energy-efficient designs. Traditional methods, which rely on expensive numerical simulations, are hampered by increasing complexity and extremely high computational cost. Recent advances in deep neural networks (DNNs), however, offer an effective solution by predicting wall heat flux using non-intrusive measurements derived from off-wall quantities. This study introduces a novel approach, the convolution-compacted vision transformer (ViT), which integrates convolutional neural networks (CNNs) and ViT to predict instantaneous fields of wall heat flux accurately based on off-wall quantities, including velocity components in three directions and temperature. Our method is applied to an existing database of wall-bounded turbulent flows obtained from direct numerical simulations (DNS). We first conduct an ablation study to examine the effects of incorporating convolution-based modules into ViT architectures and report on the impact of different modules. Subsequently, we utilize fully convolutional neural networks (FCNs) with various architectures to identify the distinctions between FCN models and the convolution-compacted ViT. Our optimized ViT model surpasses the FCN models in terms of instantaneous field predictions, learning turbulence statistics, and accurately capturing energy spectra. Finally, we undertake a sensitivity analysis using a gradient map to enhance the understanding of the nonlinear relationship established by DNN models, thus augmenting the interpretability of these models.
4

Histogram of Oriented Gradients in a Vision Transformer

Malmsten, Jakob, Cengiz, Heja, Lood, David January 2022 (has links)
This study aims to modify the Vision Transformer (ViT) to achieve higher accuracy. ViT is a model used in computer vision to, among other things, classify images. By applying ViT to the MNIST data set, an accuracy of approximately 98% is achieved. ViT is modified by implementing a method called Histogram of Oriented Gradients (HOG) in two different ways. The results show that the first approach with HOG gives an accuracy of 98.74% (setup 1) and the second approach gives an accuracy of 96.87% (patch size 4x4 pixels). The study shows that when HOG is applied to the entire image, a better accuracy is obtained. However, no systematic optimization has taken place, which makes it difficult to draw conclusions with certainty.
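
For reference, HOG descriptors of the kind combined with the ViT here can be computed with scikit-image. The cell, block, and orientation parameters in this sketch are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np
from skimage.feature import hog

def hog_descriptor(image, pixels_per_cell=(7, 7)):
    """Histogram of Oriented Gradients over the whole image.

    image: 2D grayscale array (e.g. a 28x28 MNIST digit).
    Returns a 1D feature vector that could be concatenated with,
    or substituted for, ViT patch embeddings.
    """
    return hog(
        image,
        orientations=9,
        pixels_per_cell=pixels_per_cell,
        cells_per_block=(2, 2),
        block_norm="L2-Hys",
        feature_vector=True,
    )

digit = np.random.rand(28, 28)          # stand-in for an MNIST image
features = hog_descriptor(digit)
print(features.shape)                   # (324,) with these settings
```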
5

Multiclass Brain Tumour Tissue Classification on Histopathology Images Using Vision Transformers

Spyretos, Christoforos January 2023 (has links)
Histopathology refers to inspecting and analysing tissue samples under a microscope to identify and examine signs of disease. The manual investigation of histology slides by pathologists is time-consuming and susceptible to misinterpretation. Deep learning models have demonstrated outstanding performance in digital histopathology, providing doctors and clinicians with immediate and reliable decision-making assistance in their workflow. In this study, deep learning models, including vision transformers (ViT) and convolutional neural networks (CNN), were employed to compare their performance in a patch-level classification task on feature annotations of glioblastoma multiforme in H&E histology whole slide images (WSI). The dataset utilised in this study was obtained from the Ivy Glioblastoma Atlas Project (IvyGAP). The pre-processing steps included stain normalisation of the images, and patches of size 256x256 pixels were extracted from the WSIs. In addition, a per-subject split was implemented to prevent data leakage between the training, validation and test sets. Three models were employed to perform the classification task on the IvyGAP data: two scratch-trained models, a ViT and a CNN (a variant of VGG16), and a pre-trained ViT. The models were assessed using various metrics such as accuracy, F1-score, confusion matrices, Matthews correlation coefficient (MCC), area under the curve (AUC) and receiver operating characteristic (ROC) curves. In addition, experiments were conducted to calibrate the models to better reflect the ground truth of the task using the temperature scaling technique, and their uncertainty was estimated through the Monte Carlo dropout approach. Lastly, the models were statistically compared using the Wilcoxon signed-rank test. Among the evaluated models, the scratch-trained ViT exhibited the best test accuracy of 67%, with an MCC of 0.45. The scratch-trained CNN obtained a test accuracy of 49% and an MCC of 0.15. However, the pre-trained ViT only achieved a test accuracy of 28% and an MCC of 0.034. The reliability diagrams and metrics indicated that the scratch-trained ViT demonstrated better calibration. After applying temperature scaling, only the scratch-trained CNN showed improved calibration. Therefore, the calibrated CNN was used for subsequent experiments. The scratch-trained ViT and calibrated CNN exhibited different uncertainty levels. The scratch-trained ViT had moderate uncertainty, while the calibrated CNN exhibited modest to high uncertainty across classes. The pre-trained ViT had an overall high uncertainty. Finally, the statistical tests reported that the scratch-trained ViT model performed best among the three models at a significance level of approximately 0.0167 after applying the Bonferroni correction. In conclusion, the scratch-trained ViT model achieved the highest test accuracy and better class discrimination. In contrast, the scratch-trained CNN and pre-trained ViT performed poorly and were comparable to random classifiers. The scratch-trained ViT demonstrated better calibration, while the calibrated CNN showed varying levels of uncertainty. The statistical tests demonstrated no statistical difference among the models.
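
Temperature scaling, the calibration technique applied above, learns a single scalar T on validation logits so that softmax(logits / T) better matches observed accuracy. A minimal PyTorch sketch under assumed variable names and optimiser settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, max_iter=50):
    """Learn a single temperature T > 0 minimising NLL on held-out logits.

    val_logits: (N, n_classes) raw model outputs on the validation set.
    val_labels: (N,) integer class labels.
    """
    log_t = nn.Parameter(torch.zeros(1))        # T = exp(log_t) stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# Usage: calibrate on validation logits, then divide test logits by T before softmax.
logits = torch.randn(100, 3)                    # stand-in for a 3-class task
labels = torch.randint(0, 3, (100,))
T = fit_temperature(logits, labels)
calibrated_probs = F.softmax(logits / T, dim=1)
print(round(T, 3), calibrated_probs.shape)
```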
6

Comparative Analysis of Transformer and CNN Based Models for 2D Brain Tumor Segmentation

Träff, Henrik January 2023 (has links)
A brain tumor is an abnormal growth of cells within the brain, which can be categorized into primary and secondary tumor types. The most common type of primary tumor in adults is the glioma, which can be further classified into high-grade gliomas (HGGs) and low-grade gliomas (LGGs). Approximately 50% of patients diagnosed with HGG pass away within 1-2 years. Therefore, the early detection and prompt treatment of brain tumors are essential for effective management and improved patient outcomes. Brain tumor segmentation is a task in medical image analysis that entails distinguishing brain tumors from normal brain tissue in magnetic resonance imaging (MRI) scans. Computer vision algorithms and deep learning models capable of analyzing medical images can be leveraged for brain tumor segmentation. These algorithms and models have the potential to provide automated, reliable, and non-invasive screening for brain tumors, thereby enabling earlier and more effective treatment. For a considerable time, Convolutional Neural Networks (CNNs), including the U-Net, have served as the standard backbone architectures employed to address challenges in computer vision. In recent years, the Transformer architecture, which has already firmly established itself as the new state of the art in the field of natural language processing (NLP), has been adapted to computer vision tasks. The Vision Transformer (ViT) and the Swin Transformer are two architectures derived from the original Transformer architecture that have been successfully employed for image analysis. The emergence of Transformer-based architectures in the field of computer vision calls for an investigation of whether CNNs can be rivaled as the de facto architecture in this field. This thesis compares the performance of four model architectures, namely the Swin Transformer, the Vision Transformer, the 2D U-Net, and the 2D U-Net implemented with the nnU-Net framework. These model architectures are trained using increasing amounts of brain tumor images from the BraTS 2020 dataset and subsequently evaluated on the task of brain tumor segmentation for HGG and LGG together, as well as HGG and LGG individually. The model architectures are compared on total training time, segmentation time, GPU memory usage, and on the evaluation metrics Dice coefficient, Jaccard index, precision, and recall. The 2D U-Net implemented using the nnU-Net framework performs best in correctly segmenting HGG and LGG, followed by the Swin Transformer, 2D U-Net, and Vision Transformer. The Transformer-based architectures improve the least when going from 50% to 100% of the training data. Furthermore, when data augmentation is applied during training, the nnU-Net outperforms the other model architectures, followed by the Swin Transformer, 2D U-Net, and Vision Transformer. The nnU-Net benefited the least from employing data augmentation during training, while the Transformer-based architectures benefited the most. In this thesis we were able to perform a successful comparative analysis, effectively showcasing the distinct advantages of the four model architectures under discussion. Future comparisons could incorporate training the model architectures on a larger set of brain tumor images, such as the BraTS 2021 dataset. Additionally, it would be interesting to explore how Vision Transformers and Swin Transformers, pre-trained on either ImageNet-21K or RadImageNet, compare to the model architectures of this thesis on brain tumor segmentation.
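
The Dice coefficient and Jaccard index used for evaluation above can be computed directly from binary segmentation masks; a small NumPy sketch (mask sizes and the smoothing constant are illustrative):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def jaccard_index(pred, target, eps=1e-7):
    """IoU = |A∩B| / |A∪B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

# Example on a toy 2D tumour mask.
pred = np.zeros((240, 240), dtype=np.uint8);  pred[50:120, 60:130] = 1
gt   = np.zeros((240, 240), dtype=np.uint8);  gt[55:125, 65:135] = 1
print(f"Dice: {dice_coefficient(pred, gt):.3f}, IoU: {jaccard_index(pred, gt):.3f}")
```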
7

Industrial 3D Anomaly Detection and Localization Using Unsupervised Machine Learning

Bärudde, Kevin, Gandal, Marcus January 2023 (has links)
Detecting defects in industrially manufactured products is crucial to ensure their safety and quality. This process can be both expensive and error-prone if done manually, making automated solutions desirable. There is extensive research on industrial anomaly detection in images, but recent studies have shown that adding 3D information can increase performance. This thesis aims to extend the 2D anomaly detection framework PaDiM to incorporate 3D information. The proposed methods combine RGB with depth maps or point clouds, and the effects of using PointNet++ and vision transformers to extract features are investigated. The methods are evaluated on the MVTec 3D-AD public dataset using the metrics image AUROC, pixel AUROC and AUPRO, and on a small dataset collected with a Time-of-Flight sensor. This thesis concludes that the addition of 3D information improves the performance of PaDiM and that vision transformers achieve the best results, scoring an average image AUROC of 86.2±0.2 on MVTec 3D-AD.
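
For background, PaDiM scores anomalies by fitting a multivariate Gaussian over patch embeddings of normal images at each spatial position and taking the Mahalanobis distance at test time. The sketch below shows that core step in NumPy; the embedding dimensions are illustrative, and the feature extractor (the ViT or PointNet++ backbone discussed above) is left abstract.

```python
import numpy as np

def fit_padim_gaussians(embeddings, reg=0.01):
    """Fit a per-position Gaussian over normal-sample patch embeddings.

    embeddings: (n_images, n_positions, dim) features from defect-free images.
    Returns per-position means (P, D) and inverse covariances (P, D, D).
    """
    n, p, d = embeddings.shape
    means = embeddings.mean(axis=0)                        # (P, D)
    inv_covs = np.empty((p, d, d))
    for i in range(p):
        cov = np.cov(embeddings[:, i, :], rowvar=False) + reg * np.eye(d)
        inv_covs[i] = np.linalg.inv(cov)
    return means, inv_covs

def anomaly_map(test_embedding, means, inv_covs):
    """Mahalanobis distance of one test image's patches to the normal model."""
    diff = test_embedding - means                          # (P, D)
    return np.sqrt(np.einsum("pd,pde,pe->p", diff, inv_covs, diff))

# Toy example: 50 normal images, a 14x14 patch grid, 64-dim embeddings.
train = np.random.randn(50, 196, 64)
means, inv_covs = fit_padim_gaussians(train)
scores = anomaly_map(np.random.randn(196, 64), means, inv_covs)
print(scores.shape)  # (196,) -- reshape to 14x14 for a pixel-level map
```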
8

Hybrid Deep Learning approach for Lane Detection : Combining convolutional and transformer networks with a post-processing temporal information mechanism, for efficient road lane detection on a road image scene

Zarogiannis, Dimitrios, Bompai, Stelio January 2023 (has links)
Lane detection is a crucial task in the field of autonomous driving and advanced driver assistance systems. In recent years, convolutional neural networks (CNNs) have been the primary approach for solving this problem. However, interesting findings from recent research on the use of Transformer models and attention-based mechanisms have shown them to be beneficial in the task of semantic segmentation of road lane markings. In this work, we investigate the effectiveness of incorporating a Vision Transformer (ViT) to process feature maps extracted by a CNN network for lane detection. We compare the performance of a baseline CNN-based lane detection model with that of a hybrid CNN-ViT pipeline and test the model on a well-known dataset. Furthermore, we explore the impact of incorporating temporal information from a road scene on a lane detection model's predictive performance. We propose a post-processing technique that utilizes information from previous frames to improve the accuracy of the lane detection model. Our results show that incorporating temporal information noticeably improves the model's performance and manages to make effective corrections over the originally predicted lane masks. Our SegNet backbone, exploiting the proposed post-processing mechanism, reached an F1 score of 0.52 and an Intersection-over-Union (IoU) of 0.36 on the TuSimple test set. However, the findings from the testing of our CNN-ViT pipeline and a relevant ablation study indicate that this hybrid approach might not be a good fit for lane detection. More specifically, the ViT module fails to exploit the features extracted by our CNN backbone and, therefore, our hybrid pipeline results in less accurate lane marking predictions.
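
A generic form of frame-to-frame post-processing — not necessarily the mechanism proposed in the thesis — is an exponentially weighted blend of the current lane-probability mask with previous frames, followed by a threshold. A sketch under those assumptions:

```python
import numpy as np

class TemporalMaskSmoother:
    """Blend each new lane-probability mask with an exponential moving average
    of previous frames, then threshold -- a simple temporal correction step."""

    def __init__(self, alpha=0.6, threshold=0.5):
        self.alpha = alpha          # weight given to the current frame
        self.threshold = threshold
        self.state = None           # running average of past probability masks

    def update(self, prob_mask):
        """prob_mask: (H, W) lane probabilities in [0, 1] for the current frame."""
        if self.state is None:
            self.state = prob_mask.astype(np.float32)
        else:
            self.state = self.alpha * prob_mask + (1 - self.alpha) * self.state
        return (self.state >= self.threshold).astype(np.uint8)

# Usage over a short sequence of frames.
smoother = TemporalMaskSmoother()
for _ in range(5):
    frame_probs = np.random.rand(720, 1280)   # stand-in for the network output
    lane_mask = smoother.update(frame_probs)
print(lane_mask.shape, lane_mask.dtype)        # (720, 1280) uint8
```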
9

Visual Transformers for 3D Medical Images Classification: Use-Case Neurodegenerative Disorders

Khorramyar, Pooriya January 2022 (has links)
A Neurodegenerative Disease (ND) is progressive damage to brain neurons, which the human body cannot repair or replace. Well-known examples of such conditions are dementia and Alzheimer's Disease (AD), which affect millions of lives each year. Despite numerous research efforts, there are no effective treatments for these diseases today. However, early diagnosis is crucial in disease management. Diagnosing NDs is challenging for neurologists and requires years of training and experience. There has therefore been a trend to harness the power of deep learning, including state-of-the-art Convolutional Neural Networks (CNNs), to assist doctors in diagnosing such conditions using brain scans. CNN models have led to promising results comparable to those of experienced neurologists. However, the advent of transformers in the Natural Language Processing (NLP) domain and their outstanding performance persuaded Computer Vision (CV) researchers to adapt them to various CV tasks in multiple areas, including the medical field. This research aims to develop Vision Transformer (ViT) models using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to classify NDs. More specifically, the models classify three categories (Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and Alzheimer's Disease (AD)) using brain Fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) scans. We also take advantage of the Automated Anatomical Labeling (AAL) brain atlas and attention maps to develop explainable models. We propose three ViTs, the best of which obtains an accuracy of 82% on the test dataset with the help of transfer learning. We also encode the AAL brain atlas information into the best-performing ViT, so the model outputs the predicted label, the most critical region in its prediction, and an attention map overlaid on the input scan with the crucial areas highlighted. Furthermore, we develop two CNN models with 2D and 3D convolutional kernels as baselines to classify NDs, which achieve accuracies of 77% and 73%, respectively, on the test dataset. We also conduct a study on the importance of brain regions and their combinations in classifying NDs using ViTs and the AAL brain atlas. / This thesis was awarded a prize of 50,000 SEK by Getinge Sterilization for projects within Health Innovation.
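
Adapting a ViT to volumetric PET scans largely comes down to the patch-embedding step, where cubic patches are extracted with a strided 3D convolution and flattened into tokens. A minimal PyTorch sketch; the volume size, patch size, and embedding width are assumptions, not the configuration trained in the thesis.

```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Turn a (B, 1, D, H, W) brain volume into a sequence of patch tokens."""

    def __init__(self, patch=16, embed_dim=256):
        super().__init__()
        # A strided 3D convolution extracts and projects cubic patches in one go.
        self.proj = nn.Conv3d(1, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, volume):
        tokens = self.proj(volume)                # (B, C, D/p, H/p, W/p)
        return tokens.flatten(2).transpose(1, 2)  # (B, N, C) for the ViT encoder

embed = PatchEmbed3D()
scan = torch.randn(1, 1, 96, 96, 96)           # stand-in for an FDG-PET volume
tokens = embed(scan)
print(tokens.shape)                            # torch.Size([1, 216, 256])
```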
10

Instance Segmentation on depth images using Swin Transformer for improved accuracy on indoor images / Instans-segmentering på bilder med djupinformation för förbättrad prestanda på inomhusbilder

Hagberg, Alfred, Musse, Mustaf Abdullahi January 2022 (has links)
The Simultaneous Localisation And Mapping (SLAM) problem is an open fundamental problem in autonomous mobile robotics. One of the most recently researched techniques used to enhance SLAM methods is instance segmentation. In this thesis, we implement an instance segmentation system using the Swin Transformer combined with two state-of-the-art instance segmentation methods, namely Cascade Mask RCNN and Mask RCNN. Instance segmentation is a technique that simultaneously solves the problems of object detection and semantic segmentation. We show that depth information enhances the average precision (AP) by approximately 7%. We also show that the Swin Transformer backbone model can work well with depth images. Our results also show that Cascade Mask RCNN outperforms Mask RCNN. However, the results should be interpreted with caution due to the small size of the NYU-Depth V2 dataset. Most instance segmentation research uses the COCO dataset, which has a hundred times more images than the NYU-Depth V2 dataset but does not include depth information.
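
One common way to feed RGB-D input to a backbone designed for RGB is to widen its first convolution to four channels, initialising the extra depth channel from the mean of the RGB filters. The hedged PyTorch sketch below uses a torchvision ResNet-50 as a stand-in backbone; the thesis's Swin-based models may fuse depth differently.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def widen_first_conv_to_rgbd(model):
    """Replace a ResNet's 3-channel stem conv with a 4-channel (RGB-D) one,
    copying the RGB weights and averaging them to initialise the depth channel."""
    old = model.conv1                                   # Conv2d(3, 64, 7, 2, 3)
    new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                    stride=old.stride, padding=old.padding, bias=False)
    with torch.no_grad():
        new.weight[:, :3] = old.weight                  # keep RGB filters
        new.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)  # init depth
    model.conv1 = new
    return model

backbone = widen_first_conv_to_rgbd(resnet50(weights=None))
out = backbone(torch.randn(1, 4, 224, 224))             # RGB + depth input
print(out.shape)                                        # torch.Size([1, 1000])
```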
