1 |
Evaluating Transfer Learning Capabilities of Neural Network Architectures for Image Classification. Darouich, Mohammed; Youmortaji, Anton. January 2022.
Training a deep neural network from scratch can be very expensive in terms of resources. In addition, training a neural network on a new task is usually done by training the model from scratch. Recently, new approaches in machine learning have emerged that reuse the knowledge of a pre-trained deep neural network on a new task. This technique of reusing knowledge from previously trained deep neural networks is called transfer learning. In this paper we evaluate the transfer learning capabilities of deep neural network architectures for image classification. The research applies transfer learning with different datasets and models in order to investigate its behavior in different situations.
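The transfer-learning recipe the abstract describes can be sketched minimally: a pre-trained feature extractor ("backbone") is kept frozen, and only a small classification head is trained on the new task. The backbone below is a hand-made stand-in function, not a real pre-trained network, and the logistic-regression head is purely illustrative.

```python
import math

def pretrained_backbone(x):
    """Frozen features; in practice e.g. a ResNet trained on ImageNet.
    Here: a fixed, hand-made transform standing in for the real thing."""
    return [x[0] + x[1], x[0] - x[1]]

def train_head(data, labels, lr=0.1, epochs=200):
    """Train a logistic-regression head from scratch on the new task,
    while the backbone's parameters stay untouched."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            f = pretrained_backbone(x)          # backbone is never updated
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                           # gradient of the log-loss w.r.t. z
            w[0] -= lr * g * f[0]
            w[1] -= lr * g * f[1]
            b -= lr * g
    return w, b

def predict(x, w, b):
    f = pretrained_backbone(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
```

Only the head's few parameters are learned, which is why transfer learning needs far less data and compute than training the full network from scratch.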
|
2 |
Automatic usability assessment of CR images using deep learning. Wårdemark, Erik; Unell, Olle. January 2024.
Computed Radiography exams are rarely performed by the same physicians who will interpret the image. Therefore, if the image does not help the physician diagnose the patient, it can be rejected by the interpreting physician. The rejection normally happens after the patient has already left the hospital, meaning that they will have to return to retake the exam. This leads to unnecessary work for the physicians and for the patient. In order to solve this problem we have explored deep learning algorithms to automatically analyze the images and distinguish between usable and unusable images. The deep learning algorithms include convolutional neural networks, vision transformers and fusion networks utilizing different types of data. In total, seven architectures were used to train 42 models. The models were trained on a dataset of 61 127 DICOM files containing images and metadata collected from a clinical setting, labeled according to whether the images were deemed usable in that setting. The complete dataset was used for training generalized models, and subsets containing specific body parts were used for training specialized models. Three architectures were used for classification using images only, where two architectures used a ResNet-50 backbone and one used a ViT-B/16 backbone. These architectures produced 15 specialized models and three generalized models. Four architectures implementing joint fusion produced 20 specialized models and four generalized models. Two of these architectures had a ResNet-50 backbone and the other two a ViT-B/16 backbone. For each backbone, two types of joint fusion were implemented, type I and type II, which had different structures. The two modalities utilized were images and metadata from the DICOM files. The best image-only model had a ViT-B/16 backbone and was trained on a specialized dataset containing hands and feet. This model reached an AUC score of 0.842 and an MCC of 0.545.
The two fusion models trained on the same dataset reached AUC scores of 0.843 and 0.834 respectively, and MCCs of 0.547 and 0.546 respectively. We concluded that it is possible to perform automatic rejections with deep learning models, even though the results of this study are not good enough for clinical use. The models using a ViT-B/16 backbone performed better than those using ResNet-50 in all cases. The generalized and specialized models performed equally well in most cases, with the exception of the smaller subsets of the full dataset. Utilizing metadata from the DICOM files did not improve the models compared to the image-only models.
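The joint-fusion step the abstract mentions can be illustrated in a type-I-style sketch: the image embedding and a vector of DICOM metadata features are concatenated into one feature vector and scored by a shared head. The shapes, the linear head, and the example values are all simplified assumptions; the thesis architectures are deep networks, not this toy.

```python
def joint_fusion(image_embedding, metadata_vector, weights, bias):
    """Concatenate two modalities into a single feature vector and score
    it with a shared linear head (a stand-in for a learned classifier)."""
    fused = list(image_embedding) + list(metadata_vector)
    score = sum(w * f for w, f in zip(weights, fused)) + bias
    return fused, score

# Illustrative call: a 2-d image embedding fused with one metadata feature.
fused, score = joint_fusion([0.2, 0.8], [1.0], weights=[1.0, 1.0, 0.5], bias=0.0)
```

The key design point is that fusion happens before the final classifier, so the head can learn interactions between the image and the metadata, rather than combining two separate predictions afterwards.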
|
3 |
Investigations of calorimeter clustering in ATLAS using machine learning. Niedermayer, Graeme. 11 January 2018.
The Large Hadron Collider (LHC) at CERN is designed to search for new physics by colliding protons with a center-of-mass energy of 13 TeV. The ATLAS detector is a multipurpose particle detector built to record these proton-proton collisions. In order to improve sensitivity to new physics at the LHC, luminosity increases are planned for 2018 and beyond. With this greater luminosity comes an increase in the number of simultaneous proton-proton collisions per bunch crossing (pile-up). This extra pile-up has adverse effects on algorithms for clustering the ATLAS detector's calorimeter cells. These adverse effects stem from overlapping energy deposits originating from distinct particles and could lead to difficulties in accurately reconstructing events. Machine learning algorithms provide a new tool with the potential to improve clustering performance. Recent developments in computer science have given rise to a new set of machine learning algorithms that, in many circumstances, outperform more conventional algorithms. One of these algorithms, the convolutional neural network, has shown impressive performance when identifying objects in 2D or 3D arrays. This thesis develops a convolutional neural network model for calorimeter cell clustering and compares it to the standard ATLAS clustering algorithm.
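The CNN building block behind the approach above is an ordinary 2-D convolution slid over a grid of cell values. As a hedged illustration, the sketch below convolves a small grid of made-up calorimeter cell energies with a 3x3 kernel; real ATLAS inputs and the trained kernels are of course very different.

```python
def conv2d_valid(grid, kernel):
    """Plain 2-D convolution (valid padding, no flipping) over a grid of
    cell energies. Purely illustrative of the CNN primitive; not a real
    topo-clustering input or a trained filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(grid) - kh + 1
    out_w = len(grid[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                grid[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

# A single energy deposit of 9.0 in the middle of an otherwise empty grid,
# smoothed by an averaging kernel.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 9.0
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
smoothed = conv2d_valid(grid, kernel)
```

Because the same small kernel is reused at every position, the network can pick up local energy-deposit patterns regardless of where in the calorimeter they occur.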
|
4 |
A Comparative Analysis of Machine Learning Algorithms in Binary Facial Expression Recognition. Nordén, Frans; von Reis Marlevi, Filip. January 2019.
In this paper an analysis is conducted regarding whether a higher classification accuracy of facial expressions is possible when the seven basic emotional states are combined into a binary classification problem. Five different machine learning algorithms are implemented: support vector machines, Extreme Learning Machine, and three different convolutional neural networks (CNNs). The CNNs used were one conventional network, one based on VGG16 and transfer learning, and one based on residual learning, known as ResNet50. The experiment was conducted on two datasets: a small one without contamination, called JAFFE, and a large one containing contamination, called FER2013. The highest accuracy was achieved with the CNNs, among which ResNet50 had the highest classification accuracy. When comparing the classification accuracy with the state-of-the-art accuracy, an improvement of around 0.09 was achieved on the FER2013 dataset. This dataset does, however, include some ambiguities regarding which facial expression is shown. It would therefore be of interest to conduct an experiment where humans classify the facial expressions in the dataset in order to establish a benchmark.
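The label-folding step the abstract describes, turning the seven basic emotional states into a binary problem, amounts to a fixed mapping over labels. The particular positive/negative grouping below is an assumption for illustration only; the abstract does not state which split the authors used.

```python
# Hypothetical positive/negative split of the seven basic emotional
# states; the actual grouping used in the study is not specified here.
BINARY_LABEL = {
    "happiness": 1, "surprise": 1,
    "anger": 0, "disgust": 0, "fear": 0, "sadness": 0, "neutral": 0,
}

def to_binary(emotion):
    """Map a seven-class emotion label onto the binary task."""
    return BINARY_LABEL[emotion]
```

Folding classes this way trades label granularity for more examples per class, which is one plausible reason a binary formulation can reach higher accuracy than the seven-way one.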
|
5 |
Semantic segmentation of off-road scenery on embedded hardware using transfer learning / Semantisk segmentering av terränglandskap på inbyggda system med överförd lärande. Elander, Filip. January 2021.
Real-time semantic scene understanding is a challenging computer vision task for autonomous vehicles. A limited amount of research has been done regarding forestry and off-road scene understanding, as the industry focuses on urban and on-road applications. Studies have shown that deep convolutional neural network architectures, using parameters trained on large datasets, can be re-trained and customized with smaller off-road datasets, using a method called transfer learning, and yield state-of-the-art classification performance. This master's thesis served as an extension of such existing off-road semantic segmentation studies. The thesis focused on detecting and visualizing the general trade-offs between classification performance, classification time, and the network's number of available classes. The results showed that the classification performance declined for every class that was added to the network. Misclassification mainly occurred in the class boundary areas, and increased as more classes were added to the network. However, the number of classes did not affect the network's classification time. Further, there was a nonlinear trade-off between classification time and classification performance. The classification performance improved with an increased number of network layers and a larger data type resolution. However, the layer depth increased the number of calculations, and the larger data type resolution required a longer calculation time. The network's classification performance increased by 0.5% when using a 16-bit data type resolution instead of an 8-bit resolution, but its classification time worsened considerably, as it segmented about 20 camera frames fewer per second with the larger data type. Also, tests showed that a 101-layered network degraded slightly in classification performance compared to a 50-layered network, which indicated the nonlinearity of the trade-off between classification time and classification performance.
Moreover, the class constellations considerably impacted the network's classification performance and continuity. It was essential that a class's content and objects were visually similar and shared the same features. Mixing visually ambiguous objects into the same class could drop the inference performance by almost 30%. There are several directions for future work, including writing new, customized source code for the ResNet50 network. A customized and pruned network could enhance both the application's classification performance and classification speed. Further, procuring a task-specific forestry dataset and transferring weights pre-trained for autonomous navigation instead of generic object segmentation could lead to even better classification performance.
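The 8-bit versus 16-bit trade-off discussed in this abstract can be made concrete with a toy round-trip quantization: lower bit widths give coarser value grids, and therefore larger reconstruction error, at a lower compute and memory cost. This naive uniform scheme is only a sketch; real embedded inference frameworks use calibrated per-tensor or per-channel scales.

```python
def quantize_roundtrip(x, bits, x_max=1.0):
    """Quantize a value in [0, x_max] to the given bit width and back,
    returning the absolute reconstruction error. A toy illustration of
    data-type resolution, not a production quantization scheme."""
    levels = 2 ** bits - 1
    q = round(x / x_max * levels)   # nearest representable level
    return abs(q / levels * x_max - x)

# The same activation value represented at 8 and at 16 bits.
err8 = quantize_roundtrip(0.123456, bits=8)
err16 = quantize_roundtrip(0.123456, bits=16)
```

The 16-bit error is orders of magnitude smaller, mirroring the thesis finding that higher resolution buys a small accuracy gain at a substantial cost in frames per second.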
|
6 |
The Effect of Beautification Filters on Image Recognition: "Are filtered social media images viable Open Source Intelligence?" / Effekten av försköningsfilter vid bildigenkänning: "Är filtrerade bilder från sociala media lämpliga som fritt tillgänglig underrättelseinformation?". Skepetzis, Vasilios; Hedman, Pontus. January 2021.
In light of the emergence of social media, and its abundance of facial imagery, facial recognition finds itself useful from an Open Source Intelligence standpoint. Images uploaded on social media are likely to be filtered, which can destroy or modify biometric features. This study looks at the recognition effort of identifying individuals based on their facial image after filters have been applied to the image. The social media image filters studied occlude parts of the nose and eyes, with a particular interest in filters occluding the eye region. Our proposed method uses a residual neural network model to extract features from images, with recognition of individuals based on distance measures over the extracted features. Individuals are also classified using a linear Support Vector Machine and an XGBoost classifier. To increase the recognition performance for images completely occluded in the eye region, we present a method to reconstruct this information using a variation of a U-Net, and from the classification perspective, we also train the classifier on filtered images to increase recognition performance. Our experimental results showed good recognition of individuals when filters were not occluding important landmarks, especially around the eye region. Our proposed solution shows an ability to mitigate the occlusion done by filters through either reconstruction or training on manipulated images, in some cases with an increase in the classifier's accuracy of approximately 17 percentage points with reconstruction alone, 16 percentage points when the classifier was trained on filtered data, and 24 percentage points when both were used at the same time. When training on filtered images, we observe an average increase in performance of 9.7 percentage points across all datasets.
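The distance-based recognition step described above can be sketched as nearest-neighbour matching of embeddings under cosine distance. The tiny two-dimensional vectors and subject names below are made up for illustration; in the thesis the embeddings come from a residual network.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def identify(probe, gallery):
    """Return the enrolled identity whose embedding is closest to the
    probe embedding. `gallery` maps identity name -> embedding."""
    return min(gallery, key=lambda name: cosine_distance(probe, gallery[name]))

# Hypothetical enrolled gallery and a probe image's embedding.
gallery = {"subject_a": [1.0, 0.0], "subject_b": [0.0, 1.0]}
match = identify([0.9, 0.1], gallery)
```

A filter that occludes the eye region perturbs the probe embedding, pushing it away from the correct gallery entry, which is exactly the degradation the reconstruction and filtered-training strategies aim to undo.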
|
7 |
Towards a Nuanced Evaluation of Voice Activity Detection Systems: An Examination of Metrics, Sampling Rates and Noise with Deep Learning / Mot en nyanserad utvärdering av system för detektering av talaktivitet. Joborn, Ludvig; Beming, Mattias. January 2022.
Recently, Deep Learning has revolutionized many fields, one of which is Voice Activity Detection (VAD). This is of great interest to sectors of society concerned with detecting speech in sound signals. One such sector is the police, where criminal investigations regularly involve analysis of audio material. Convolutional Neural Networks (CNN) have recently become the state-of-the-art method of detecting speech in audio. But so far, understanding of the impact of noise and sampling rates on such methods remains incomplete. Additionally, there are evaluation metrics from neighboring fields that remain unintegrated into VAD. We trained models on four different sampling rates and found that changing the sampling rate can have dramatic effects on the results. As such, we recommend explicitly evaluating CNN-based VAD systems on pertinent sampling rates. Further, with increasing amounts of white Gaussian noise, we observed better performance when increasing the capacity of our Gated Recurrent Unit (GRU). Finally, we discuss how careful consideration is necessary when choosing a main evaluation metric, leading us to recommend the Polyphonic Sound Detection Score (PSDS).
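To make the task concrete, a VAD system consumes a sampled signal and emits one speech/non-speech decision per frame. The naive energy-threshold baseline below only illustrates that input/output format; it is not the CNN/GRU approach studied in the thesis, and the frame length and threshold are arbitrary.

```python
def energy_vad(signal, frame_len, threshold):
    """Naive frame-wise energy VAD: 1 = speech-like frame, 0 = silence.
    A toy baseline showing the per-frame decision format, not the deep
    models evaluated in the thesis."""
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len   # mean square
        flags.append(1 if energy > threshold else 0)
    return flags

# 100 silent samples followed by 100 loud ones, framed into 50-sample chunks.
sig = [0.0] * 100 + [0.5, -0.5] * 50
decisions = energy_vad(sig, frame_len=50, threshold=0.01)
```

Because decisions are made per frame, the sampling rate directly changes how much signal each frame covers, which is one intuition for why the thesis found sampling rate to matter so much.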
|