1 |
Comparative Analysis of Convolutional Neural Network (CNN) Architectures for Content Restriction
Daher, Abdulhadi January 2024 (has links)
The increase in social media usage has introduced significant challenges in managing the large amounts of data being shared, particularly images. With more than 63% of the global population using social media platforms, the need for effective content restriction has become critical. Manual moderation is no longer practical due to the sheer volume of content. This thesis addresses the critical issue of image restriction by evaluating the performance of advanced image classification models, specifically the VGG16 and Inception_v3 Convolutional Neural Networks (CNNs). To address this challenge, the study uses the CIFAR-10 dataset, widely known as a benchmark in image classification research. The research involves implementing pre-trained models and conducting a comprehensive comparison using various performance metrics, including accuracy, precision, recall, F1 score, confusion matrix, ROC curve, and AUC. These metrics provide a comprehensive evaluation of each model's ability to classify images accurately. Furthermore, the study includes a fine-tuning phase after the initial comparison to further improve model performance. This involves adjusting the parameters of the pre-trained models to better suit the specific characteristics of the CIFAR-10 dataset. Following the fine-tuning, another round of comparative analysis is conducted to assess the improvements and determine the most effective model. The results demonstrate that both VGG16 and Inception_v3 showed significant improvements in performance after fine-tuning, with notable increases in accuracy and other metrics. However, VGG16 showed better overall performance, making it the preferred model for this application. The primary objective of this research is to identify the most effective model for image classification, thereby establishing a foundational proof of concept for the application of CNNs in content restriction on social media platforms.
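The metrics listed in the abstract (accuracy, precision, recall, F1 score) can all be derived from a confusion matrix. A minimal numpy sketch, using illustrative toy labels rather than the thesis's actual CIFAR-10 results:

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Compute a confusion matrix and per-class precision/recall/F1.

    Rows of the confusion matrix are true labels, columns are predictions.
    """
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    accuracy = np.trace(cm) / cm.sum()
    # Per-class metrics; a small epsilon guards against empty rows/columns.
    eps = 1e-12
    precision = np.diag(cm) / (cm.sum(axis=0) + eps)
    recall = np.diag(cm) / (cm.sum(axis=1) + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return cm, accuracy, precision, recall, f1

# Toy example with 3 classes standing in for CIFAR-10's 10.
cm, acc, prec, rec, f1 = classification_metrics(
    [0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(acc)  # 4 of 6 correct, roughly 0.667
```

The same per-class vectors feed directly into macro- or weighted-average summaries when classes are imbalanced.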
|
2 |
A study of deep learning-based face recognition models for sibling identification
Goel, R., Mehmood, Irfan, Ugail, Hassan 20 March 2022 (has links)
Yes / Accurate identification of siblings through face recognition is a challenging task, predominantly because of the high degree of similarity among the faces of siblings. In this study, we investigate the use of state-of-the-art deep learning face recognition models to evaluate their capacity to discriminate between sibling faces using various similarity indices. The specific models examined for this purpose are FaceNet, VGGFace, VGG16, and VGG19. For each pair of images provided, embeddings are calculated using the chosen deep learning model. Five standard similarity measures, namely cosine similarity, Euclidean distance, structured similarity, Manhattan distance, and Minkowski distance, are used to classify image pairs as same identity or not, based on a threshold defined for each of the similarity measures. The accuracy, precision, and misclassification rate of each model are calculated using standard confusion matrices. Four experimental datasets for the full frontal face, eyes, nose, and forehead of sibling pairs are constructed using the publicly available HQf subset of the SiblingDB database. The experimental results show that the accuracy of the chosen deep learning models in distinguishing siblings varies with the face area compared. VGGFace performs best when comparing the full frontal face and the eyes, with classification accuracy above 95% in these cases. However, its accuracy degrades significantly when noses are compared, where FaceNet provides the best result. Similarly, VGG16 and VGG19 are not the best models for classification using the eyes, but they provide favorable results when foreheads are compared.
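The core decision step described here is thresholding a similarity score computed between two embedding vectors. A minimal sketch of two of the five measures (the threshold value below is illustrative; the study tunes a separate threshold per measure):

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))

def same_identity(emb1, emb2, threshold=0.8):
    """Declare a match when cosine similarity exceeds the threshold.

    The threshold of 0.8 is an assumption for illustration, not the
    value used in the study.
    """
    return cosine_similarity(emb1, emb2) >= threshold

# Toy 4-d "embeddings" standing in for FaceNet/VGGFace output vectors.
a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([0.9, 0.1, 0.0, 0.0])
print(same_identity(a, b))  # nearly parallel vectors, prints True
```

For distance-based measures (Euclidean, Manhattan, Minkowski) the comparison flips: a pair matches when the distance falls below its threshold.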
|
3 |
Identifiering och Klassificering av trafikljussignaler med hjälp av maskininlärningsmodeller : Jämförelse, träning, testning av maskininlärningsmodeller för identifiering och klassificering av trafikljussignaler. / Identification and classification of traffic light signals using machine learning models
Bosik, Geni, Gergis, Fadi January 2024 (has links)
This thesis explored the development of advanced machine learning models to improve autonomous transportation systems. By focusing on the identification and classification of traffic light signals, the work contributes to the safety and efficiency of self-driving vehicles. A review of models such as the Single Shot MultiBox Detector (SSD), as an object detection model, and InceptionV3 and VGG16, as classification models, was conducted, with particular emphasis on their training and testing processes. The results, in terms of validation accuracy and validation loss, showed that the InceptionV3 model performed well across various parameters. This model proved to be robust and adaptable, making it a good choice for the project's goal of accurate and reliable classification of traffic light signals. On the other hand, the VGG16 model showed varying results. While it performed well under certain conditions, it proved to be less robust at certain parameter settings, especially at higher batch sizes, which led to lower validation accuracy and higher validation loss.
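The batch size studied above is a training hyperparameter: it controls how many samples contribute to each gradient step. A toy numpy sketch of mini-batch gradient descent on a one-parameter regression problem, showing where the batch-size knob enters (real accuracy/loss curves come from the Keras models, not from anything this simple):

```python
import numpy as np

def minibatch_sgd(x, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit y ~ w*x with mini-batch SGD; return (w, final mean-squared error)."""
    rng = np.random.default_rng(seed)
    w = 0.0
    n = len(x)
    for _ in range(epochs):
        idx = rng.permutation(n)               # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            # Gradient of mean squared error over the mini-batch.
            grad = np.mean(2 * (w * x[b] - y[b]) * x[b])
            w -= lr * grad
    return w, float(np.mean((w * x - y) ** 2))

x = np.linspace(-1, 1, 64)
y = 2.0 * x                                    # true slope is 2
for bs in (8, 64):
    w, mse = minibatch_sgd(x, y, batch_size=bs)
    print(bs, round(w, 3), mse)
```

Smaller batches take more (noisier) steps per epoch; larger batches take fewer, smoother steps, which is one reason validation behaviour can shift with batch size.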
|
4 |
Crowd Counting Camera Array and Correction
Fausak, Andrew Todd 05 1900 (has links)
"Crowd counting" is a term used to describe the process of calculating the number of people in a given context; however, crowd counting has multiple challenges especially when images representing a given crowd span multiple cameras or images. In this thesis, we propose a crowd counting camera array and correction (CCCAC) method using a camera array of scaled, adjusted, geometrically corrected, combined, processed, and then corrected images to determine the number of people within the newly created combined crowd field. The purpose of CCCAC is to transform and combine valid regions from multiple images from different sources and order as a uniform proportioned set of images for a collage or discrete summation through a new precision counting architecture. Determining counts in this manner within normalized view (collage), results in superior counting accuracy than processing individual images and summing totals with prior models. Finally, the output from the counting model is adjusted with learned results over time to perfect the counting ability of the entire counting system itself. Results show that CCCAC crowd counting corrected and uncorrected methods perform superior to raw image processing methods.
|
5 |
Re-identifikace graffiti tagů / Graffiti Tags Re-Identification
Pavlica, Jan January 2020 (has links)
This thesis focuses on the possibility of using current computer vision methods to re-identify graffiti tags, the most common type of graffiti. The work experimented with various convolutional neural network models, the most suitable of which was MobileNet trained with the triplet loss function, which achieved a mAP of 36.02%.
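The triplet loss mentioned here trains an embedding so that an anchor image lies closer to a positive (same tag) than to a negative (different tag) by at least a margin. A minimal numpy sketch of the loss itself, on toy 2-d embeddings (the margin value is an illustrative assumption):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embedding vectors: zero once the positive is
    closer to the anchor than the negative by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to negative
    return float(max(0.0, d_pos - d_neg + margin))

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same tag, close to the anchor
n = np.array([1.0, 0.0])   # different tag, far away
print(triplet_loss(a, p, n))  # 0.01 - 1.0 + 0.2 is negative, prints 0.0
```

During training the loss is minimized over many such triplets, pulling embeddings of the same tag together; re-identification then reduces to nearest-neighbour search in the embedding space.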
|
6 |
Multi-Task Convolutional Learning for Flame Characterization
Ur Rehman, Obaid January 2020 (has links)
This thesis explores multi-task learning for combustion flame characterization, i.e., learning different characteristics of the combustion flame. We propose a multi-task convolutional neural network for two tasks, pilot fuel ratio (PFR) classification and fuel type classification, based on images of stable combustion. We utilize transfer learning and adopt VGG16 to develop a multi-task convolutional neural network that jointly learns the aforementioned tasks. We also compare the performance of individual CNN models for the two tasks with the multi-task CNN, which learns the two tasks jointly by sharing visual knowledge among them. We demonstrate the effectiveness of our proposed approach on a private company's dataset. To the best of our knowledge, this is the first work on jointly learning different characteristics of the combustion flame. / This work was done with Siemens, and we have applied for a patent, which is still pending.
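In a multi-task setup like this, one shared backbone feeds two task heads and training minimizes a combined objective. A numpy sketch of the joint loss as a weighted sum of two cross-entropies; the equal task weights and head outputs are illustrative assumptions, not the thesis's actual configuration:

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under softmax output."""
    return float(-np.log(probs[label]))

def multitask_loss(pfr_probs, pfr_label, fuel_probs, fuel_label,
                   w_pfr=1.0, w_fuel=1.0):
    """Weighted sum of the two task losses from a shared backbone."""
    return (w_pfr * cross_entropy(pfr_probs, pfr_label)
            + w_fuel * cross_entropy(fuel_probs, fuel_label))

# Hypothetical softmax outputs of the two heads for one flame image.
loss = multitask_loss(np.array([0.7, 0.2, 0.1]), 0,   # PFR head, 3 classes
                      np.array([0.6, 0.4]), 1)        # fuel head, 2 classes
print(round(loss, 4))  # -ln(0.7) - ln(0.4)
```

Because the gradient of this sum flows through the shared VGG16 features, each task regularizes the representation learned for the other.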
|
7 |
Body Rumen Fill Scoring of Dairy Cows Using Digital Images
Derakhshan, Reza, Yousefzadeh Boroujeni, Soroush January 2024 (has links)
The research presented in this thesis focuses on an innovative use of digital imaging and machine learning techniques to assess rumen fill scoring in dairy cows. This study aims to enhance the efficiency of monitoring and managing dairy cow health, which is crucial for the dairy industry's productivity and sustainability. The primary objective was to develop an automated annotation system for evaluating rumen fill status in dairy cows using digital images extracted from recorded videos. This system leverages machine learning algorithms and neural networks, aiming to mimic manual assessments by veterinarians and specialists on farms. To achieve these objectives, this thesis made use of existing video records from a Swedish dairy farm hosting mainly the Swedish Red and the Swedish Holstein breeds. A subset of these images was then processed and manually classified using a modified rumen fill scoring system based on visual assessment, and supervised classification algorithms were trained on 277 manually annotated images. The thesis explored various machine learning techniques for classifying these images, including logistic regression, support vector machines (SVM), and a deep neural network using the VGG16 architecture. These models were trained, validated, and tested with a dataset that included variations in cow color patterns, aiming to determine the most effective approach for automated rumen fill scoring. The results indicated that while each model had its strengths and weaknesses, the simple logistic regression model performed best in terms of test accuracy and F1 score. This research contributes to the field of precision livestock farming, particularly in the context of dairy farming. By automating the process of rumen fill scoring, the study aims to provide dairy farmers with a reliable, efficient, and cost-effective tool for monitoring cow health. This tool has the potential to enhance dairy cow welfare, improve milk production, and support the sustainability of dairy farming operations. However, in its current state, even the best model achieved only moderate accuracy. Further improvement of prediction performance is needed, possibly by adding more cow images, using improved image processing, and applying feature engineering.
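It is notable that plain logistic regression beat the deeper models here. A minimal numpy sketch of logistic regression trained by gradient descent, the simplest model the thesis compared; the two-feature toy data below is an assumption standing in for the 277 annotated cow images:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=200):
    """Binary logistic regression via full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)   # gradient of log-loss w.r.t. w
        b -= lr * np.mean(p - y)             # gradient w.r.t. bias
    return w, b

# Linearly separable toy "features" (e.g. two summary statistics per image).
X = np.array([[-0.9, -0.8], [-0.8, -0.9], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])
w, b = train_logistic(X, y)
pred = (sigmoid(X @ w + b) > 0.5).astype(int)
print(pred)  # [0 0 1 1]
```

With only a few hundred training images, such a low-capacity model can outperform a fine-tuned deep network, which is consistent with the authors' call for more data.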
|
8 |
Automatic Change Detection in Visual Scenes
Brolin, Morgan January 2021 (has links)
This thesis proposes a Visual Scene Change Detector (VSCD) system, which involves four parts: image retrieval, image registration, image change detection, and panorama creation. Two prestudies are conducted to select an image retrieval method and an image change detection method. The two selected methods are then combined with a proposed image registration method and a proposed panorama creation method to form the proposed VSCD. The image retrieval prestudy evaluates a SIFT-based method against a bag-of-words method and finds the SIFT-based method to be superior. The image change detection prestudy evaluates 8 different image change detection methods. Its results show that method performance depends on the image category, and that an ensemble method is the least dependent on the category of images. The ensemble method is found to be the best performing, followed by a range filter method and then a Convolutional Neural Network (CNN) method. Using combinations of the 2 image retrieval methods and the 8 image change detection methods, 16 different VSCDs are formed and tested. The final result shows that the VSCD comprised of the best methods from the prestudies is the best performing one.
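Two of the detectors compared above can be sketched in numpy: a range-filter detector (change in local max-minus-min texture) and a plain differencing detector, combined by majority vote as a toy stand-in for the thesis's ensemble. The window size, threshold, and voting rule are illustrative assumptions:

```python
import numpy as np

def range_filter_change(img_a, img_b, win=3, threshold=0.2):
    """Change mask from the difference of local range (max - min) filters."""
    def local_range(img):
        h, w = img.shape
        pad = win // 2
        padded = np.pad(img, pad, mode='edge')
        out = np.empty_like(img, dtype=float)
        for i in range(h):
            for j in range(w):
                patch = padded[i:i + win, j:j + win]
                out[i, j] = patch.max() - patch.min()
        return out
    return np.abs(local_range(img_a) - local_range(img_b)) > threshold

def ensemble_change(masks):
    """Strict majority vote over binary change masks from several detectors."""
    stacked = np.stack(masks).astype(int)
    return stacked.sum(axis=0) * 2 > len(masks)

a = np.zeros((5, 5))
b = np.zeros((5, 5)); b[2, 2] = 1.0        # one changed pixel
m1 = range_filter_change(a, b)
m2 = np.abs(a - b) > 0.5                   # plain differencing detector
print(ensemble_change([m1, m2])[2, 2])     # both detectors fire, prints True
```

Voting over heterogeneous detectors is one way an ensemble becomes less sensitive to image category: a detector that fails on one category is outvoted by the others.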
|
9 |
Développement d'outils web de détection d'annotations manuscrites dans les imprimés anciens / Development of web tools for detecting handwritten annotations in early printed books
M'Begnan Nagnan, Arthur January 2021 (has links) (PDF)
No description available.
|
10 |
Deep Learning Models for Human Activity Recognition
Albert Florea, George, Weilid, Filip January 2019 (has links)
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus provides researchers with remote-controlled meetings and natural meetings in an office environment; the meeting scenario is a four-person office room. To achieve group activity recognition, video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB color images, and the audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features, both during model training and when predicting the behavior of an activity, increases the validation accuracy compared to models that only use spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the extracted data from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. There are different types of neural network architectures. The architecture types investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3, and RCNN (Recurrent Neural Network) with LSTM. ImageNet weights were used to initialize the weights of the neural network base models. The ImageNet weights were provided by the Keras API and optimized for each base model [2]. The base models use ImageNet weights when extracting features from the input data. Feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the deep learning approach using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
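The spatio-temporal part of this pipeline feeds per-frame CNN features through an LSTM. A numpy sketch of a single LSTM cell step applied over a short frame sequence; the dimensions, gate ordering, and random weights are illustrative assumptions, not the Keras model's actual configuration:

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step on frame features x and the previous state.

    W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,).
    Gate order here is [input, forget, cell, output]; libraries differ.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1 / (1 + np.exp(-z[:hidden]))              # input gate
    f = 1 / (1 + np.exp(-z[hidden:2 * hidden]))    # forget gate
    g = np.tanh(z[2 * hidden:3 * hidden])          # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * hidden:]))          # output gate
    c = f * c_prev + i * g                         # new cell state
    h = o * np.tanh(c)                             # new hidden state
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden = 8, 4                  # e.g. 8 CNN features per frame
W = rng.normal(size=(4 * hidden, input_dim)) * 0.1
U = rng.normal(size=(4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for frame in rng.normal(size=(5, input_dim)):     # a 5-frame sequence
    h, c = lstm_step(frame, h, c, W, U, b)
print(h.shape)  # (4,)
```

The final hidden state summarizes the sequence and would feed a dense softmax layer to predict the group activity; in the thesis the per-frame features come from the pre-trained base models rather than random inputs.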
|