• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 15
  • 3
  • 2
  • 2
  • Tagged with
  • 23
  • 13
  • 13
  • 11
  • 11
  • 7
  • 7
  • 7
  • 6
  • 6
  • 6
  • 5
  • 5
  • 5
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Produktmatchning EfficientNet vs. ResNet : En jämförelse / Product matching EfficientNet vs. ResNet

Malmgren, Emil, Järdemar, Elin January 2021 (has links)
E-handeln ökar stadigt och mellan åren 2010 och 2014 var det en ökning på antalet konsumenter som handlar online från 28,9% till 34,2%. Otillräcklig information kring en produkts pris tvingar köpare att leta bland flera olika återförsäljare efter det bästa priset. Det finns olika sätt att ta fram informationen som krävs för att kunna jämföra priser. En metod för att kunna jämföra priser är automatiserad produktmatchning. Denna metod använder algoritmer för bildigenkänning där dess syfte är att detektera, lokalisera och känna igen objekt i bilder. Bildigenkänningsalgoritmer har ofta problem med att hitta objekt i bilder på grund av yttre faktorer såsom belysning, synvinklar och om bilden innehåller mycket onödig information. Tidigare har algoritmer såsom ANN (artificial neural network), random forest classifier och support vector machine används men senare undersökningar har visat att CNN (convolutional neural network) är bättre på att hitta viktiga egenskaper hos objekt som gör dem mindre känsliga mot dessa yttre faktorer. Två exempel på alternativa CNN-arkitekturer som vuxit fram är EfficientNet och ResNet som båda har visat bra resultat i tidigare forskning men det finns inte mycket forskning som hjälper en välja vilken CNN-arkitektur som leder till ett så bra resultat som möjligt. Vår frågeställning är därför: Vilken av EfficientNet- och ResNetarkitekturerna ger det högsta resultatet på produktmatchning med utvärderingsmåtten f1-score, precision och recall? Resultatet av studien visar att EfficientNet är den över lag bästa arkitekturen för produktmatchning på studiens datamängd. Resultatet visar också att ResNet var bättre än EfficientNet på att föreslå rätt matchningar av bilderna. De matchningarna ResNet gör stämmer mer än de matchningar EfficientNet föreslår då Resnet fick ett högre recall än vad EfficientNet fick.  EfficientNet uppnår dock en bättre recall som visar att EfficientNet är bättre än ResNet på att hitta fler eller alla korrekta matchningar bland sina potentiella matchningar. Men skillnaden i recall är större mellan modellerna vilket göra att EfficientNet får en högre f1-score och är över lag bättre än ResNet, men vad som är viktigast kan diskuteras. Är det viktigt att de föreslagna matchningarna är korrekta eller att man hittar alla korrekta matchningar. Är det viktigaste att de föreslagna matchningarna är korrekta har ResNet ett övertag men är det viktigare att hitta alla korrekta matchningar har EfficientNet ett övertag. Resultatet beror därför på vad som anses vara viktigast för att avgöra vilken av arkitekturerna som ger bäst resultat. / E-commerce is steadily increasing and between the years 2010 and 2014, there was an increase in the number of consumers shopping online from 28,9% to 34,2%. Insufficient information about the price of a product forces buyers to search among several different retailers for the best price. There are different ways to produce the information required to be able to compare prices. One method to compare prices is automated product matching. This method uses image recognition algorithms where its purpose is to detect, locate and recognize objects in images. Image recognition algorithms often have problems finding objects in images due to external factors such as brightness, viewing angles and if the image contains a lot of unnecessary information. In the past, algorithms such as ANN, random forest classifier and support vector machine have been used, but recent studies have shown that CNN is better at finding important properties of objects that make them less sensitive to these external factors. Two examples of alternative CNN architectures that have emerged are EfficientNet and ResNet, both of which have shown good results in previous studies, but there is not a lot of research that helps one choose which CNN architecture that leads to the best possible result. Our question is therefore: Which of the EfficientNet and ResNet architectures gives the highest result on product matching with the evaluation measures f1-score, precision, and recall? The results of the study show that EfficientNet is the overall best architecture for product matching on the dataset. The results also show that ResNet was better than EfficientNet in proposing the right matches for the images. The matches ResNet makes are more accurate than the matches EfficientNet suggests when Resnet received a higher precision than EfficientNet. However, EfficientNet achieves a better recall that shows that EfficientNet is better than ResNet at finding more or all correct matches among its potential matches. The difference in recall is greater than the difference in precision between the models, which means that EfficientNet gets a higher f1-score and is generally better than ResNet, but what is most important can be discussed. Is it important that the suggested matches are correct or that you find all the correct matches? If the most important thing is that the proposed matches are correct, ResNet has an advantage, but if it is more important to find all correct matches, EfficientNet has an advantage. The result therefore depends on what is considered to be most important in determining which of the architectures gives the best results
12

Transfer learning between domains : Evaluating the usefulness of transfer learning between object classification and audio classification

Frenger, Tobias, Häggmark, Johan January 2020 (has links)
Convolutional neural networks have been successfully applied to both object classification and audio classification. The aim of this thesis is to evaluate the degree of how well transfer learning of convolutional neural networks, trained in the object classification domain on large datasets (such as CIFAR-10, and ImageNet), can be applied to the audio classification domain when only a small dataset is available. In this work, four different convolutional neural networks are tested with three configurations of transfer learning against a configuration without transfer learning. This allows for testing how transfer learning and the architectural complexity of the networks affects the performance. Two of the models developed by Google (Inception-V3, Inception-ResNet-V2), are used. These models are implemented using the Keras API where they are pre-trained on the ImageNet dataset. This paper also introduces two new architectures which are developed by the authors of this thesis. These are Mini-Inception, and Mini-Inception-ResNet, and are inspired by Inception-V3 and Inception-ResNet-V2, but with a significantly lower complexity. The audio classification dataset consists of audio from RC-boats which are transformed into mel-spectrogram images. For transfer learning to be possible, Mini-Inception, and Mini-Inception-ResNet are pre-trained on the dataset CIFAR-10. The results show that transfer learning is not able to increase the performance. However, transfer learning does in some cases enable models to obtain higher performance in the earlier stages of training.
13

Efektivnost hlubokých konvolučních neuronových sítí na elementární klasifikační úloze / Efficiency of deep convolutional neural networks on an elementary classification task

Prax, Jan January 2021 (has links)
In this thesis deep convolutional neural networks models and feature descriptor models are compared. Feature descriptors are paired with suitable chosen classifier. These models are a part of machine learning therefore machine learning types are described in this thesis. Further these chosen models are described, and their basics and problems are explained. Hardware and software used for tests is listed and then test results and results summary is listed. Then comparison based on the validation accuracy and training time of these said models is done.
14

Real-time face recognition using one-shot learning : A deep learning and machine learning project

Darborg, Alex January 2020 (has links)
Face recognition is often described as the process of identifying and verifying people in a photograph by their face. Researchers have recently given this field increased attention, continuously improving the underlying models. The objective of this study is to implement a real-time face recognition system using one-shot learning. “One shot” means learning from one or few training samples. This paper evaluates different methods to solve this problem. Convolutional neural networks are known to require large datasets to reach an acceptable accuracy. This project proposes a method to solve this problem by reducing the number of training instances to one and still achieving an accuracy close to 100%, utilizing the concept of transfer learning.
15

Re-identifikace graffiti tagů / Graffiti Tags Re-Identification

Pavlica, Jan January 2020 (has links)
This thesis focuses on the possibility of using current methods in the field of computer vision to re-identify graffiti tags. The work examines the possibility of using convolutional neural networks to re-identify graffiti tags, which are the most common type of graffiti. The work experimented with various models of convolutional neural networks, the most suitable of which was MobileNet using the triplet loss function, which managed to achieve a mAP of 36.02%.
16

Improving Situational Awareness in Aviation: Robust Vision-Based Detection of Hazardous Objects

Levin, Alexandra, Vidimlic, Najda January 2020 (has links)
Enhanced vision and object detection could be useful in the aviation domain in situations of bad weather or cluttered environments. In particular, enhanced vision and object detection could improve situational awareness and aid the pilot in environment interpretation and detection of hazardous objects. The fundamental concept of object detection is to interpret what objects are present in an image with the aid of a prediction model or other feature extraction techniques. Constructing a comprehensive data set that can describe the operational environment and be robust for weather and lighting conditions is vital if the object detector is to be utilised in the avionics domain. Evaluating the accuracy and robustness of the constructed data set is crucial. Since erroneous detection, referring to the object detection algorithm failing to detect a potentially hazardous object or falsely detecting an object, is a major safety issue. Bayesian uncertainty estimations are evaluated to examine if they can be utilised to detect miss-classifications, enabling the use of a Bayesian Neural Network with the object detector to identify an erroneous detection. The object detector Faster RCNN with ResNet-50-FPN was utilised using the development framework Detectron2; the accuracy of the object detection algorithm was evaluated based on obtained MS-COCO metrics. The setup achieved a 50.327 % AP@[IoU=.5:.95] score. With an 18.1 % decrease when exposed to weather and lighting conditions. By inducing artificial artefacts and augmentations of luminance, motion, and weather to the images of the training set, the AP@[IoU=.5:.95] score increased by 15.6 %. The inducement improved the robustness necessary to maintain the accuracy when exposed to variations of environmental conditions, which resulted in just a 2.6 % decrease from the initial accuracy. To fully conclude that the augmentations provide the necessary robustness for variations in environmental conditions, the model needs to be subjected to actual image representations of the operational environment with different weather and lighting phenomena. Bayesian uncertainty estimations show great promise in providing additional information to interpret objects in the operational environment correctly. Further research is needed to conclude if uncertainty estimations can provide necessary information to detect erroneous predictions.
17

Charakterizace chodců ve videu / Pedestrian Attribute Analysis

Studená, Zuzana January 2019 (has links)
This work deals with obtaining pedestrian information, which are captured by static, external cameras located in public, outdoor or indoor spaces. The aim is to obtain as much information as possible. Information such as gender, age and type of clothing, accessories, fashion style, or overall personality are obtained using using convolutional neural networks. One part of the work consists of creating a new dataset that captures pedestrians and includes information about the person's sex, age, and fashion style. Another part of the thesis is the design and implementation of convolutional neural networks, which classify the mentioned pedestrian characteristics. Neural networks evaluate pedestrian input images in PETA, FashionStyle14 and BUT Pedestrian Attributes datasets. Experiments performed over the PETA and FashionStyle datasets compare my results to various convolutional neural networks described in publications. Further experiments are shown on created BUT data set of pedestrian attributes.
18

Writer identification using semi-supervised GAN and LSR method on offline block characters

Hagström, Adrian, Stanikzai, Rustam January 2020 (has links)
Block characters are often used when filling out forms, for example when writing ones personal number. The question of whether or not there is recoverable, biometric (identity related) information within individual digits of hand written personal numbers is then relevant. This thesis investigates the question by using both handcrafted features and extracting features via Deep learning (DL) models, and successively limiting the amount of available training samples. Some recent works using DL have presented semi-supervised methods using Generative adveserial network (GAN) generated data together with a modified Label smoothing regularization (LSR) function. Using this training method might improve performance on a baseline fully supervised model when doing authentication. This work additionally proposes a novel modified LSR function named Bootstrap label smooting regularizer (BLSR) designed to mitigate some of the problems of previous methods, and is compared to the others. The DL feature extraction is done by training a ResNet50 model to recognize writers of a personal numbers and then extracting the feature vector from the second to last layer of the network.Results show a clear indication of recoverable identity related information within the hand written (personal number) digits in boxes. Our results indicate an authentication performance, expressed in Equal error rate (EER), of around 25% with handcrafted features. The same performance measured in EER was between 20-30% when using the features extracted from the DL model. The DL methods, while showing potential for greater performance than the handcrafted, seem to suffer from fluctuation (noisiness) of results, making conclusions on their use in practice hard to draw. Additionally when using 1-2 training samples the handcrafted features easily beat the DL methods.When using the LSR variant semi-supervised methods there is no noticeable performance boost and BLSR gets the second best results among the alternatives.
19

Deep Learning Models for Human Activity Recognition

Albert Florea, George, Weilid, Filip January 2019 (has links)
AMI Meeting Corpus (AMI) -databasen används för att undersöka igenkännande av gruppaktivitet. AMI Meeting Corpus (AMI) -databasen ger forskare fjärrstyrda möten och naturliga möten i en kontorsmiljö; mötescenario i ett fyra personers stort kontorsrum. För attuppnågruppaktivitetsigenkänninganvändesbildsekvenserfrånvideosoch2-dimensionella audiospektrogram från AMI-databasen. Bildsekvenserna är RGB-färgade bilder och ljudspektrogram har en färgkanal. Bildsekvenserna producerades i batcher så att temporala funktioner kunde utvärderas tillsammans med ljudspektrogrammen. Det har visats att inkludering av temporala funktioner både under modellträning och sedan förutsäga beteende hos en aktivitet ökar valideringsnoggrannheten jämfört med modeller som endast använder rumsfunktioner[1]. Deep learning arkitekturer har implementerats för att känna igen olika mänskliga aktiviteter i AMI-kontorsmiljön med hjälp av extraherade data från the AMI-databas.Neurala nätverks modellerna byggdes med hjälp av KerasAPI tillsammans med TensorFlow biblioteket. Det finns olika typer av neurala nätverksarkitekturer. Arkitekturerna som undersöktes i detta projektet var Residual Neural Network, Visual GeometryGroup 16, Inception V3 och RCNN (LSTM). ImageNet-vikter har använts för att initialisera vikterna för Neurala nätverk basmodeller. ImageNet-vikterna tillhandahålls av Keras API och är optimerade för varje basmodell [2]. Basmodellerna använder ImageNet-vikter när de extraherar funktioner från inmatningsdata. Funktionsextraktionen med hjälp av ImageNet-vikter eller slumpmässiga vikter tillsammans med basmodellerna visade lovande resultat. Både Deep Learning användningen av täta skikt och LSTM spatio-temporala sekvens predikering implementerades framgångsrikt. / The Augmented Multi-party Interaction(AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus database provides researchers with remote controlled meetings and natural meetings in an office environment; meeting scenario in a four person sized office room. To achieve the group activity recognition video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB colored images and audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrogrames. It has been shown that including temporal features both during model training and then predicting the behavior of an activity increases the validation accuracy compared to models that only use spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the extracted data from the AMI database.The Neural Network models were built using the Keras API together with TensorFlow library. There are different types of Neural Network architectures. The architecture types that were investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3 and RCNN(Recurrent Neural Network). ImageNet weights have been used to initialize the weights for the Neural Network base models. ImageNet weights were provided by Keras API and was optimized for each base model[2]. The base models uses ImageNet weights when extracting features from the input data.The feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the Deep Learning using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
20

Image forgery detection using textural features and deep learning

Malhotra, Yishu 06 1900 (has links)
La croissance exponentielle et les progrès de la technologie ont rendu très pratique le partage de données visuelles, d'images et de données vidéo par le biais d’une vaste prépondérance de platesformes disponibles. Avec le développement rapide des technologies Internet et multimédia, l’efficacité de la gestion et du stockage, la rapidité de transmission et de partage, l'analyse en temps réel et le traitement des ressources multimédias numériques sont progressivement devenus un élément indispensable du travail et de la vie de nombreuses personnes. Sans aucun doute, une telle croissance technologique a rendu le forgeage de données visuelles relativement facile et réaliste sans laisser de traces évidentes. L'abus de ces données falsifiées peut tromper le public et répandre la désinformation parmi les masses. Compte tenu des faits mentionnés ci-dessus, la criminalistique des images doit être utilisée pour authentifier et maintenir l'intégrité des données visuelles. Pour cela, nous proposons une technique de détection passive de falsification d'images basée sur les incohérences de texture et de bruit introduites dans une image du fait de l'opération de falsification. De plus, le réseau de détection de falsification d'images (IFD-Net) proposé utilise une architecture basée sur un réseau de neurones à convolution (CNN) pour classer les images comme falsifiées ou vierges. Les motifs résiduels de texture et de bruit sont extraits des images à l'aide du motif binaire local (LBP) et du modèle Noiseprint. Les images classées comme forgées sont ensuite utilisées pour mener des expériences afin d'analyser les difficultés de localisation des pièces forgées dans ces images à l'aide de différents modèles de segmentation d'apprentissage en profondeur. Les résultats expérimentaux montrent que l'IFD-Net fonctionne comme les autres méthodes de détection de falsification d'images sur l'ensemble de données CASIA v2.0. Les résultats discutent également des raisons des difficultés de segmentation des régions forgées dans les images du jeu de données CASIA v2.0. / The exponential growth and advancement of technology have made it quite convenient for people to share visual data, imagery, and video data through a vast preponderance of available platforms. With the rapid development of Internet and multimedia technologies, performing efficient storage and management, fast transmission and sharing, real-time analysis, and processing of digital media resources has gradually become an indispensable part of many people’s work and life. Undoubtedly such technological growth has made forging visual data relatively easy and realistic without leaving any obvious visual clues. Abuse of such tampered data can deceive the public and spread misinformation amongst the masses. Considering the facts mentioned above, image forensics must be used to authenticate and maintain the integrity of visual data. For this purpose, we propose a passive image forgery detection technique based on textural and noise inconsistencies introduced in an image because of the tampering operation. Moreover, the proposed Image Forgery Detection Network (IFD-Net) uses a Convolution Neural Network (CNN) based architecture to classify the images as forged or pristine. The textural and noise residual patterns are extracted from the images using Local Binary Pattern (LBP) and the Noiseprint model. The images classified as forged are then utilized to conduct experiments to analyze the difficulties in localizing the forged parts in these images using different deep learning segmentation models. Experimental results show that both the IFD-Net perform like other image forgery detection methods on the CASIA v2.0 dataset. The results also discuss the reasons behind the difficulties in segmenting the forged regions in the images of the CASIA v2.0 dataset.

Page generated in 0.0324 seconds