11 |
Produktmatchning EfficientNet vs. ResNet : En jämförelse / Product matching EfficientNet vs. ResNet. Malmgren, Emil; Järdemar, Elin. January 2021
E-commerce is growing steadily: between 2010 and 2014, the share of consumers shopping online rose from 28.9% to 34.2%. Insufficient information about a product's price forces buyers to search across several retailers for the best price. There are different ways to obtain the information needed to compare prices, and one of them is automated product matching. This method uses image recognition algorithms, whose purpose is to detect, locate, and recognize objects in images. Such algorithms often struggle to find objects because of external factors such as lighting, viewing angle, and irrelevant content in the image. In the past, algorithms such as artificial neural networks (ANN), random forest classifiers, and support vector machines have been used, but recent studies have shown that convolutional neural networks (CNN) are better at finding important object features, which makes them less sensitive to these external factors. Two CNN architectures that have emerged are EfficientNet and ResNet. Both have shown good results in previous studies, but little research helps one choose which CNN architecture leads to the best possible result. Our research question is therefore: which of the EfficientNet and ResNet architectures gives the best result on product matching, as measured by F1-score, precision, and recall? The results show that EfficientNet is the best overall architecture for product matching on the study's dataset. ResNet, however, was better at proposing correct matches: its proposed matches are more often correct than EfficientNet's, as reflected in its higher precision. EfficientNet achieves a better recall, meaning it is better than ResNet at finding more, or all, of the correct matches among its potential matches. The difference in recall between the models is greater than the difference in precision, which gives EfficientNet a higher F1-score and makes it the better model overall, but which property matters most is open to discussion. Is it more important that the suggested matches are correct, or that all correct matches are found? If correctness of the proposed matches matters most, ResNet has the advantage; if finding all correct matches matters most, EfficientNet has the advantage. Which architecture gives the best result therefore depends on what is considered most important.
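The precision/recall trade-off described in this abstract can be illustrated with a minimal sketch. The counts below are hypothetical and are not the thesis's results; they only show how a model with lower precision can still reach the higher F1-score when its recall advantage is larger than its precision deficit.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, NOT taken from the thesis.
model_a = precision_recall_f1(tp=80, fp=10, fn=20)  # proposes fewer, safer matches
model_b = precision_recall_f1(tp=95, fp=20, fn=5)   # finds more of the true matches

for name, (p, r, f1) in (("model_a", model_a), ("model_b", model_b)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here `model_a` plays the role of the higher-precision model and `model_b` the higher-recall one; which is preferable depends on whether false matches or missed matches are more costly.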
12 |
Transfer learning between domains : Evaluating the usefulness of transfer learning between object classification and audio classification. Frenger, Tobias; Häggmark, Johan. January 2020
Convolutional neural networks have been successfully applied to both object classification and audio classification. The aim of this thesis is to evaluate how well transfer learning of convolutional neural networks, trained in the object classification domain on large datasets (such as CIFAR-10 and ImageNet), can be applied to the audio classification domain when only a small dataset is available. Four different convolutional neural networks are tested with three configurations of transfer learning against a configuration without transfer learning. This makes it possible to test how transfer learning and the architectural complexity of the networks affect performance. Two of the models, Inception-V3 and Inception-ResNet-V2, were developed by Google. They are implemented using the Keras API, where they are pre-trained on the ImageNet dataset. The thesis also introduces two new architectures developed by its authors, Mini-Inception and Mini-Inception-ResNet, which are inspired by Inception-V3 and Inception-ResNet-V2 but have significantly lower complexity. The audio classification dataset consists of audio from RC boats, transformed into mel-spectrogram images. To make transfer learning possible, Mini-Inception and Mini-Inception-ResNet are pre-trained on CIFAR-10. The results show that transfer learning does not increase performance. However, in some cases it enables models to reach higher performance in the earlier stages of training.
13 |
Efektivnost hlubokých konvolučních neuronových sítí na elementární klasifikační úloze / Efficiency of deep convolutional neural networks on an elementary classification task. Prax, Jan. January 2021
This thesis compares deep convolutional neural network models with feature descriptor models, where each feature descriptor is paired with a suitably chosen classifier. Because these models belong to machine learning, the thesis first describes the types of machine learning. The chosen models are then described, and their principles and problems are explained. The hardware and software used for the tests are listed, followed by the test results and a summary. Finally, the models are compared on validation accuracy and training time.
14 |
Real-time face recognition using one-shot learning : A deep learning and machine learning project. Darborg, Alex. January 2020
Face recognition is often described as the process of identifying and verifying people in a photograph by their face. Researchers have recently given this field increased attention, continuously improving the underlying models. The objective of this study is to implement a real-time face recognition system using one-shot learning, where "one-shot" means learning from one or a few training samples. The study evaluates different methods for solving this problem. Convolutional neural networks are known to require large datasets to reach acceptable accuracy. This project proposes a method that uses transfer learning to reduce the number of training instances to one while still achieving an accuracy close to 100%.
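One common way to realize one-shot recognition with transfer learning is to treat a pretrained network as a fixed embedding extractor and match a new face against a single stored reference per person. The sketch below assumes that approach; it is not necessarily the method used in the thesis, and the embedding vectors, the names in `gallery`, and the threshold value are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(query: List[float],
             gallery: Dict[str, List[float]],
             threshold: float = 0.8) -> Optional[str]:
    """Return the gallery identity most similar to the query embedding,
    or None if even the best match falls below the threshold."""
    best_name, best_score = None, -1.0
    for name, reference in gallery.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# One reference embedding per person (hypothetical 3-D vectors; real face
# embeddings typically have far more dimensions).
gallery = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.95]}

print(identify([0.85, 0.15, 0.05], gallery))  # close to alice's reference
print(identify([0.5, 0.5, 0.5], gallery))     # not close enough to anyone
```

Adding a new person requires only one more entry in the gallery, with no retraining of the network, which is what makes the one-shot setting practical.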
15 |
Re-identifikace graffiti tagů / Graffiti Tags Re-Identification. Pavlica, Jan. January 2020
This thesis focuses on the possibility of using current methods in the field of computer vision to re-identify graffiti tags. The work examines the possibility of using convolutional neural networks to re-identify graffiti tags, which are the most common type of graffiti. The work experimented with various models of convolutional neural networks, the most suitable of which was MobileNet using the triplet loss function, which managed to achieve a mAP of 36.02%.
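The triplet loss mentioned in this abstract can be sketched as follows. This is the standard formulation with Euclidean distance; the margin value and the example embeddings are hypothetical, and the thesis may use a different distance or margin.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.2) -> float:
    """Penalize the triplet unless the positive (same tag) is closer to the
    anchor than the negative (different tag) by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings of graffiti tag images.
anchor, positive = [0.0, 0.0], [0.1, 0.0]
easy_negative = [1.0, 0.0]   # far from the anchor: no penalty
hard_negative = [0.2, 0.0]   # almost as close as the positive: penalized

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

Training with this loss pulls images of the same tag together in embedding space, so that re-identification reduces to ranking gallery images by distance to a query.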
16 |
Improving Situational Awareness in Aviation: Robust Vision-Based Detection of Hazardous Objects. Levin, Alexandra; Vidimlic, Najda. January 2020
Enhanced vision and object detection could be useful in the aviation domain in situations of bad weather or cluttered environments. In particular, they could improve situational awareness and aid the pilot in interpreting the environment and detecting hazardous objects. The fundamental concept of object detection is to interpret what objects are present in an image with the aid of a prediction model or other feature extraction techniques. Constructing a comprehensive data set that describes the operational environment and is robust to weather and lighting conditions is vital if the object detector is to be used in the avionics domain. Evaluating the accuracy and robustness of the constructed data set is crucial, since erroneous detection, meaning that the object detection algorithm fails to detect a potentially hazardous object or falsely detects an object, is a major safety issue. Bayesian uncertainty estimations are evaluated to examine whether they can be used to detect misclassifications, enabling the use of a Bayesian Neural Network with the object detector to identify an erroneous detection. The object detector Faster RCNN with ResNet-50-FPN was used with the development framework Detectron2, and the accuracy of the object detection algorithm was evaluated using MS-COCO metrics. The setup achieved a 50.327 % AP@[IoU=.5:.95] score, with an 18.1 % decrease when exposed to weather and lighting conditions. By inducing artificial artefacts and augmentations of luminance, motion, and weather in the images of the training set, the AP@[IoU=.5:.95] score increased by 15.6 %. The inducement improved the robustness necessary to maintain accuracy under varying environmental conditions, resulting in just a 2.6 % decrease from the initial accuracy.
To fully conclude that the augmentations provide the necessary robustness for variations in environmental conditions, the model needs to be subjected to actual image representations of the operational environment with different weather and lighting phenomena. Bayesian uncertainty estimations show great promise in providing additional information to interpret objects in the operational environment correctly. Further research is needed to conclude if uncertainty estimations can provide necessary information to detect erroneous predictions.
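The AP@[IoU=.5:.95] score reported in this abstract is the MS-COCO metric that averages average precision over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the underlying box-overlap computation and the threshold list; the box coordinates are hypothetical, and the full AP computation (ranking detections and integrating the precision-recall curve) is omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds averaged over by the COCO AP@[.5:.95] metric.
COCO_IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical ground-truth and predicted boxes.
ground_truth: Box = (0.0, 0.0, 2.0, 2.0)
prediction: Box = (1.0, 1.0, 3.0, 3.0)

iou = box_iou(ground_truth, prediction)
# A detection counts as a true positive at a given threshold only if IoU >= threshold.
matched_at = [t for t in COCO_IOU_THRESHOLDS if iou >= t]
print(f"IoU={iou:.3f}, counted as correct at thresholds: {matched_at}")
```

Because the stricter thresholds demand tight localization, a detector can score well at IoU 0.5 and still obtain a much lower AP@[.5:.95].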
17 |
Deep Learning Based Image Segmentation for Tumor Cell Death Characterization. Forsberg, Elise; Resare, Alexander. January 2024
This report presents a deep-learning-based approach for segmenting and characterizing tumor cell deaths using images provided by the Önfelt lab, which contain NK cells and HL60 leukemia cells. We explore the efficiency of convolutional neural networks (CNNs) in distinguishing between live and dead tumor cells, as well as between different classes of cell death. Three CNN architectures (MobileNetV2, ResNet-18, and ResNet-50) were employed, using transfer learning to optimize performance given the limited size of the available datasets. The networks were trained with two loss functions, weighted cross-entropy and generalized Dice loss, and two optimizers, adaptive moment estimation (Adam) and stochastic gradient descent with momentum (SGDM). Performance was evaluated with metrics such as mean accuracy, intersection over union (IoU), and BF score. Our results indicate that MobileNetV2 with cross-entropy loss and the Adam optimizer outperformed the other configurations, demonstrating high mean accuracy. Challenges such as class imbalance, annotation bias, and dataset limitations are discussed, alongside potential future directions to enhance model robustness and accuracy. The successful training of networks capable of classifying all identified types of cell death demonstrates the potential of a deep learning approach as a tool for analyzing immunotherapeutic strategies and enhancing the understanding of NK cell behavior in cancer treatment.
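Weighted cross-entropy, one of the two loss functions named in this abstract, counters class imbalance by scaling each pixel's loss by a weight for its true class. The sketch below is a minimal pure-Python version under two assumptions that the abstract does not confirm: the weights are inverse class frequencies, and the loss is normalized by the sum of the weights. The pixel labels and probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Weight each class by n / (num_classes * count); absent classes get 0."""
    n = len(labels)
    counts = [sum(1 for y in labels if y == c) for c in range(num_classes)]
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(probs: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           class_weights: Sequence[float]) -> float:
    """Weighted mean of -log p(true class) over all pixels."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Hypothetical flattened segmentation mask: class 1 is rare.
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.6, 0.4]]  # predicted class probabilities

weights = inverse_frequency_weights(labels, num_classes=2)
print(weights)
print(weighted_cross_entropy(probs, labels, weights))
```

With these weights the single misclassified rare-class pixel contributes as much to the loss as all three majority-class pixels together, which is the intended effect on imbalanced cell-death classes.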
18 |
Siamese Network with Dynamic Contrastive Loss for Semantic Segmentation of Agricultural Lands. Pendotagaya, Srinivas. 07 1900
This research applies semantic segmentation to precision agriculture, specifically targeting the automated identification and classification of irrigation system types within agricultural landscapes using high-resolution aerial imagery. Irrigated agriculture occupies a substantial portion of US land and is a major freshwater user, which creates a critical need for precise water-use estimates in the face of evolving environmental challenges. The study uses advanced computer vision to identify irrigation systems. The outcomes contribute to effective water management, sustainable resource utilization, and informed decision-making for farmers and policymakers, with broader implications for environmental monitoring and land-use planning.
In this geospatial evaluation research, we tackle the challenges of intraclass variability and a limited dataset. The research problem centers on optimizing accuracy in geospatial analyses when confronted with intricate intraclass variations and the constraints of a limited dataset. The research introduces a novel approach termed "dynamic contrastive learning," which refines the existing contrastive learning framework with tailored modifications that aim to improve the model's accuracy in classifying and segmenting geographic features. Various deep learning models, including EfficientNetV2L, EfficientNetB7, ConvNeXtXLarge, ResNet-50, and ResNet-101, serve as backbones to assess their performance in the geospatial context. The evaluation data consists of high-resolution aerial imagery from the National Agriculture Imagery Program (NAIP) captured in 2015. It includes four bands (red, green, blue, and near-infrared) with a 1-meter ground sampling distance. The dataset covers Lonoke County, USA, and is annotated for various irrigation system types. It encompasses diverse geographic features, including urban, agricultural, and natural landscapes, providing a representative and challenging scenario for model assessment.
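As background for the contrastive learning framework that this work refines, the sketch below shows the classic pairwise contrastive loss often used with Siamese networks. It is not the thesis's dynamic variant, whose details the abstract does not give; the margin value, the label convention (`True` for same class), and the example embeddings are assumptions made for illustration.

```python
import math
from typing import Sequence


def contrastive_loss(emb_a: Sequence[float],
                     emb_b: Sequence[float],
                     same_class: bool,
                     margin: float = 1.0) -> float:
    """Pairwise contrastive loss on two embeddings from a Siamese network.

    Same-class pairs are pulled together (loss = d^2); different-class pairs
    are pushed apart until they are at least `margin` away
    (loss = max(0, margin - d)^2).
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(emb_a, emb_b)))
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings of image patches.
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same_class=True))   # penalized for distance
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same_class=False))  # penalized for closeness
print(contrastive_loss([0.0, 0.0], [2.0, 0.0], same_class=False))  # already beyond the margin
```

High intraclass variability makes same-class pairs start far apart, which is the situation the fixed-margin formulation above handles poorly and that a modified loss would aim to address.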
The experimental results underscore the efficacy of the modified contrastive learning approach in mitigating intraclass variability and improving performance metrics. The proposed method achieves an average accuracy of 96.7%, a BER of 0.05, and an mIoU of 88.4%, surpassing existing contrastive learning methods. This research contributes a valuable solution to the specific challenges posed by intraclass variability and limited datasets in geospatial feature classification. The investigation also extends to prominent deep learning architectures such as Segformer, Swin Transformer, ConvNeXt, and Convolution Vision Transformer, shedding light on their impact on geospatial image analysis. ConvNeXtXLarge emerges as a robust backbone, demonstrating high accuracy (96.02%), low BER (0.06), and a high mIoU (85.99%).
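The mIoU and BER figures above can both be derived from a confusion matrix. The sketch below assumes that BER stands for balanced error rate (one minus the mean per-class recall); the abstract does not expand the abbreviation, so this reading is an assumption. The confusion matrix values are hypothetical and unrelated to the reported results.

```python
from typing import List, Sequence


def per_class_iou(cm: Sequence[Sequence[int]]) -> List[float]:
    """IoU per class, where cm[i][j] counts pixels of true class i predicted as j."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return ious


def mean_iou(cm: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-class IoU values."""
    ious = per_class_iou(cm)
    return sum(ious) / len(ious)


def balanced_error_rate(cm: Sequence[Sequence[int]]) -> float:
    """1 - mean per-class recall, ignoring classes with no true pixels
    (assumed meaning of BER)."""
    recalls = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c])]
    return 1.0 - sum(recalls) / len(recalls)


# Hypothetical 2-class confusion matrix (rows: true class, columns: predicted).
confusion = [[8, 2],
             [1, 9]]

print(f"mIoU = {mean_iou(confusion):.4f}")
print(f"BER  = {balanced_error_rate(confusion):.4f}")
```

Both metrics weight every class equally, so they are less flattering than plain pixel accuracy when rare classes are segmented poorly.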
19 |
Нейросетевая модель предупреждения столкновений квадрокоптера на основе компьютерного зрения : магистерская диссертация / Neural Network Based Quadcopter Collision Avoidance System using Computer Vision (master's thesis). Туомас, Э. В. (Tuomas, E. V.). January 2024
This thesis develops a neural network obstacle detector for quadcopters based on computer vision and optimizes it for deployment on resource-constrained embedded devices. The work reviews the characteristics of binary classification tasks, trains a baseline neural network model for obstacle detection, and applies various optimization techniques to improve its computational efficiency.
20 |
Charakterizace chodců ve videu / Pedestrian Attribute Analysis. Studená, Zuzana. January 2019
This work deals with obtaining information about pedestrians captured by static external cameras located in public outdoor or indoor spaces. The aim is to obtain as much information as possible. Information such as gender, age, type of clothing, accessories, fashion style, and overall personality is obtained using convolutional neural networks. One part of the work consists of creating a new dataset that captures pedestrians and includes information about each person's sex, age, and fashion style. Another part of the thesis is the design and implementation of convolutional neural networks that classify these pedestrian characteristics. The networks evaluate pedestrian input images from the PETA, FashionStyle14, and BUT Pedestrian Attributes datasets. Experiments on the PETA and FashionStyle14 datasets compare the author's results with those of various convolutional neural networks described in publications. Further experiments are presented on the newly created BUT Pedestrian Attributes dataset.