1

A study of deep learning-based face recognition models for sibling identification

Goel, R., Mehmood, Irfan, Ugail, Hassan 20 March 2022 (has links)
Accurate identification of siblings through face recognition is a challenging task, predominantly because of the high degree of similarity among the faces of siblings. In this study, we investigate the use of state-of-the-art deep learning face recognition models to evaluate their capacity to discriminate between sibling faces using various similarity indices. The specific models examined for this purpose are FaceNet, VGGFace, VGG16, and VGG19. For each pair of images provided, embeddings are calculated using the chosen deep learning model. Five standard similarity measures, namely cosine similarity, Euclidean distance, structural similarity, Manhattan distance, and Minkowski distance, are used to classify each image pair as matching or not against a threshold defined for each similarity measure. The accuracy, precision, and misclassification rate of each model are calculated using standard confusion matrices. Four experimental datasets for the full frontal face, eyes, nose, and forehead of sibling pairs are constructed using the publicly available HQf subset of the SiblingDB database. The experimental results show that the accuracy of the chosen deep learning models in distinguishing siblings varies with the face area compared. VGGFace performs best when comparing the full frontal face and the eyes, with classification accuracy above 95% in these cases. However, its accuracy degrades significantly when the noses are compared, where FaceNet provides the best results. Similarly, VGG16 and VGG19 are not the best models for classification using the eyes, but they provide favorable results when foreheads are compared.
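As a hedged sketch of the verification scheme this abstract describes — embed two face crops with a pretrained network, then threshold a similarity measure — the snippet below uses a Keras VGG16 backbone as a stand-in for the models named above; the 224x224 input size and the 0.4 cosine-distance threshold are illustrative assumptions, not values from the thesis.

```python
import numpy as np
from scipy.spatial.distance import cosine
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained backbone with the classifier removed; global average pooling
# turns the last feature map into a fixed-length embedding vector.
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    # Load, resize, and preprocess one face crop, then compute its embedding.
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x, verbose=0)[0]

def same_identity(path_a, path_b, threshold=0.4):
    # Declare a match when the cosine distance between embeddings falls
    # below the threshold; Euclidean or Manhattan distance would slot in
    # the same way with their own thresholds.
    a, b = embed(path_a), embed(path_b)
    return cosine(a, b) < threshold

# Example usage (hypothetical file names):
# print(same_identity("sibling_a_eyes.png", "sibling_b_eyes.png"))
```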
2

Identification and Classification of Traffic Light Signals Using Machine Learning Models : Comparison, training, and testing of machine learning models for the identification and classification of traffic light signals

Bosik, Geni, Gergis, Fadi January 2024 (has links)
This thesis explored the development of advanced machine learning models to improve autonomous transportation systems. By focusing on the identification and classification of traffic light signals, the work contributes to the safety and efficiency of self-driving vehicles. A review of models such as the Single Shot MultiBox Detector (SSD), as an object detection model, and InceptionV3 and VGG16, as classification models, was conducted, with particular emphasis on their training and testing processes. The results, in terms of validation accuracy and validation loss, showed that the InceptionV3 model performed well across various parameters. This model proved to be robust and adaptable, making it a good choice for the project's goal of accurate and reliable classification of traffic light signals. On the other hand, the VGG16 model showed varying results. While it performed well under certain conditions, it proved to be less robust at certain parameter settings, especially at higher batch sizes, which led to lower validation accuracy and higher validation loss.
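A minimal sketch of the kind of batch-size comparison the abstract reports, assuming a frozen ImageNet-initialized InceptionV3, a hypothetical traffic_lights/ directory layout, and three illustrative classes; none of these settings are taken from the thesis itself.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(num_classes):
    # Frozen InceptionV3 backbone with a small softmax head; inputs are
    # rescaled to [-1, 1] as the backbone expects.
    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, pooling="avg")
    base.trainable = False
    model = tf.keras.Sequential([
        layers.Rescaling(1.0 / 127.5, offset=-1.0),
        base,
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Sweep batch sizes and compare validation accuracy and loss.
for batch_size in (16, 32, 64):
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "traffic_lights/train", image_size=(299, 299),
        batch_size=batch_size, label_mode="categorical")
    val_ds = tf.keras.utils.image_dataset_from_directory(
        "traffic_lights/val", image_size=(299, 299),
        batch_size=batch_size, label_mode="categorical")
    model = build_classifier(num_classes=3)  # e.g. red / yellow / green
    history = model.fit(train_ds, validation_data=val_ds, epochs=10)
    print(batch_size, history.history["val_accuracy"][-1],
          history.history["val_loss"][-1])
```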
3

Graffiti Tags Re-Identification

Pavlica, Jan January 2020 (has links)
This thesis focuses on the possibility of using current computer vision methods to re-identify graffiti tags, the most common type of graffiti. The work examines the use of convolutional neural networks for this task, experimenting with various models, of which the most suitable was MobileNet trained with the triplet loss function, which achieved a mAP of 36.02%.
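A hedged sketch of the approach named in the abstract — a MobileNet embedder trained with a triplet loss — where the 128-dimensional embedding, the 0.2 margin, and the L2 normalization are illustrative choices, not the thesis's reported configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_embedder(embedding_dim=128):
    # MobileNet backbone followed by a projection to an L2-normalized
    # embedding, so Euclidean distances between tags are directly comparable.
    base = tf.keras.applications.MobileNet(
        weights="imagenet", include_top=False, pooling="avg")
    return tf.keras.Sequential([
        base,
        layers.Dense(embedding_dim),
        layers.Lambda(lambda v: tf.math.l2_normalize(v, axis=1)),
    ])

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull embeddings of the same tag together and push different tags
    # at least `margin` apart (squared-distance formulation).
    pos = tf.reduce_sum(tf.square(anchor - positive), axis=1)
    neg = tf.reduce_sum(tf.square(anchor - negative), axis=1)
    return tf.reduce_mean(tf.maximum(pos - neg + margin, 0.0))
```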
4

Multi-Task Convolutional Learning for Flame Characterization

Ur Rehman, Obaid January 2020 (has links)
This thesis explores multi-task learning for combustion flame characterization, i.e., learning different characteristics of the combustion flame. We propose a multi-task convolutional neural network for two tasks, pilot fuel ratio (PFR) and fuel type classification, based on images of stable combustion. We utilize transfer learning and adopt VGG16 to develop a multi-task convolutional neural network that jointly learns the aforementioned tasks. We also compare the performance of individual CNN models for the two tasks with the multi-task CNN, which learns these two tasks jointly by sharing visual knowledge among the tasks. We demonstrate the effectiveness of our proposed approach on a private company's dataset. To the best of our knowledge, this is the first work on jointly learning different characteristics of the combustion flame. / This work was done with Siemens, and we have applied for a patent, which is still pending.
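A hedged sketch of a multi-task network of the kind described: one VGG16 backbone shared by two classification heads. The class counts, head size, and loss weights are invented placeholders; the abstract does not give the thesis's actual architecture details.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

N_PFR_CLASSES = 3   # hypothetical number of pilot-fuel-ratio classes
N_FUEL_TYPES = 2    # hypothetical number of fuel types

# A single VGG16 backbone is shared by both task heads, so visual
# features learned for one task can benefit the other.
base = VGG16(weights="imagenet", include_top=False, pooling="avg",
             input_shape=(224, 224, 3))
base.trainable = False

shared = layers.Dense(256, activation="relu")(base.output)
pfr_head = layers.Dense(N_PFR_CLASSES, activation="softmax", name="pfr")(shared)
fuel_head = layers.Dense(N_FUEL_TYPES, activation="softmax", name="fuel")(shared)

model = Model(inputs=base.input, outputs=[pfr_head, fuel_head])
model.compile(optimizer="adam",
              loss={"pfr": "sparse_categorical_crossentropy",
                    "fuel": "sparse_categorical_crossentropy"},
              loss_weights={"pfr": 1.0, "fuel": 1.0})
```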
5

Automatic Change Detection in Visual Scenes

Brolin, Morgan January 2021 (has links)
This thesis proposes a Visual Scene Change Detector (VSCD) system, which involves four parts: image retrieval, image registration, image change detection, and panorama creation. Two prestudies are conducted in order to find an image retrieval method and an image change detection method. The two found methods are then combined with a proposed image registration method and a proposed panorama creation method to form the proposed VSCD. The image retrieval prestudy evaluates a Scale-Invariant Feature Transform (SIFT)-related method against a Bag of Words (BoW)-related method and finds the SIFT-related method to be superior. The image change detection prestudy evaluates 8 different image change detection methods. Results from this prestudy show that the methods' performance depends on the image category, and that an ensemble method is the least dependent on the category of images. The ensemble method is found to be the best-performing method, followed by a range filter method and then a Convolutional Neural Network (CNN) method. Using combinations of the 2 image retrieval methods and the 8 image change detection methods, 16 different VSCDs are formed and tested. The final results show that the VSCD composed of the best methods from the prestudies is the best-performing system.
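A minimal sketch of a SIFT-based retrieval score of the kind the prestudy evaluates, using OpenCV's SIFT implementation and Lowe's ratio test; the ratio threshold and grayscale inputs are assumptions, not the thesis's exact method.

```python
import cv2

def sift_match_count(img_a, img_b, ratio=0.75):
    # Count SIFT correspondences that survive Lowe's ratio test; a higher
    # count suggests the two images show the same scene.
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

# Example usage (hypothetical files): rank candidate images by match count.
# query = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
# candidate = cv2.imread("scene_042.png", cv2.IMREAD_GRAYSCALE)
# print(sift_match_count(query, candidate))
```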
6

Development of web tools for the detection of handwritten annotations in early printed books

M'Begnan Nagnan, Arthur January 2021 (has links)
No description available.
7

Deep Learning Models for Human Activity Recognition

Albert Florea, George, Weilid, Filip January 2019 (has links)
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus provides researchers with remote controlled meetings and natural meetings in an office environment; the meeting scenario is a four-person office room. To achieve group activity recognition, video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames are RGB colored images and the audio spectrograms have one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features both during model training and when predicting the behavior of an activity increases the validation accuracy compared to models that only use spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the extracted data from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. There are different types of neural network architectures. The architectures investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3, and RCNN (Recurrent Neural Network). ImageNet weights have been used to initialize the weights for the neural network base models. The ImageNet weights are provided by the Keras API and are optimized for each base model [2]. The base models use ImageNet weights when extracting features from the input data. The feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the deep learning using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
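A hedged sketch of one way to combine an ImageNet-initialized Keras base model with LSTM sequence prediction as the abstract describes: a frozen InceptionV3 extracts per-frame features and an LSTM classifies the clip. The clip length, LSTM width, and class count are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_ACTIVITIES = 4   # hypothetical number of group-activity classes
CLIP_LENGTH = 16     # assumed number of frames per clip

# A frozen, ImageNet-initialized InceptionV3 extracts one feature vector
# per frame; an LSTM then models the temporal sequence of those vectors.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
base.trainable = False

frames = layers.Input(shape=(CLIP_LENGTH, 299, 299, 3))
features = layers.TimeDistributed(base)(frames)   # (batch, 16, 2048)
x = layers.LSTM(256)(features)
outputs = layers.Dense(NUM_ACTIVITIES, activation="softmax")(x)

model = tf.keras.Model(frames, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```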
8

Machine learning assisted decision support system for image analysis of OCT

Yacoub, Elias January 2022 (has links)
Optical Coherence Tomography (OCT) has been around for more than 30 years and is still being continuously improved. The department of ophthalmology is a part of Sahlgrenska Hospital that heavily uses OCT to help people with the treatment of eye diseases. The department is currently facing a problem where the time from an OCT scan to treatment is increasing due to an overload of patient visits every day. Since analyzing each OCT scan requires a trained expert, the increase in patients is overwhelming for the few experts the department has. It is believed that the next phase of this medical field will come through the adoption of machine learning technology. This thesis has been issued by Sahlgrenska University Hospital (SUH), which wants to address the problem ophthalmology faces by introducing machine learning into its workflow. This thesis aims to determine the best suited CNN through training and testing of pre-trained models, and to build a tool that a model can be integrated into for use in ophthalmology. Transfer learning was used to compare three different types of pre-trained models offered by Keras, namely VGG16, InceptionResNet50V2, and ResNet50V2. They were all trained on an open dataset containing 84495 OCT images categorized into four classes: the three disease classes Choroidal Neovascularization (CNV), Diabetic Macular Edema (DME), and drusen, plus normal eyes. To further improve the accuracy of the models, oversampling, undersampling, and data augmentation were applied to the training set and then tested in different variations. A web application was built using TensorFlow.js and Node.js, into which the best-performing model was later integrated. Of the three, the VGG16 model with only oversampling applied performed best, yielding an average of 95% precision, 95% recall, and a 95% F1-score. Second was the Inception model with only oversampling applied, which got an average of 93% precision, 93% recall, and a 93% F1-score. Last came the ResNet model with an average of 93% precision, 92% recall, and a 92% F1-score. The results suggest that oversampling is the overall best technique for this dataset. The chosen data augmentation techniques only led to models performing marginally worse in all cases. The results also suggest that pre-trained models with more parameters, such as the VGG16 model, have more feature mappings and therefore achieve higher accuracy. On this basis, parameters and better mappings of features should be taken into account when using pre-trained models.
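A minimal sketch of the random oversampling the thesis found most effective, assuming per-image file-path and label lists; the function simply duplicates minority-class samples until all four OCT classes match the largest one.

```python
import numpy as np

def oversample(paths, labels, seed=0):
    # Randomly duplicate samples of every minority class until each class
    # has as many samples as the largest one.
    rng = np.random.default_rng(seed)
    paths, labels = np.asarray(paths), np.asarray(labels)
    target = max(np.sum(labels == c) for c in np.unique(labels))
    out_paths, out_labels = [], []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        extra = rng.choice(idx, size=target - len(idx), replace=True)
        keep = np.concatenate([idx, extra])
        out_paths.append(paths[keep])
        out_labels.append(labels[keep])
    return np.concatenate(out_paths), np.concatenate(out_labels)

# Example with hypothetical file lists for the four OCT classes:
# train_paths, train_labels = oversample(train_paths, train_labels)
```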
