21 |
Deep Learning Models for Human Activity RecognitionAlbert Florea, George, Weilid, Filip January 2019 (has links)
AMI Meeting Corpus (AMI) -databasen används för att undersöka igenkännande av gruppaktivitet. AMI Meeting Corpus (AMI) -databasen ger forskare fjärrstyrda möten och naturliga möten i en kontorsmiljö; mötescenario i ett fyra personers stort kontorsrum. För attuppnågruppaktivitetsigenkänninganvändesbildsekvenserfrånvideosoch2-dimensionella audiospektrogram från AMI-databasen. Bildsekvenserna är RGB-färgade bilder och ljudspektrogram har en färgkanal. Bildsekvenserna producerades i batcher så att temporala funktioner kunde utvärderas tillsammans med ljudspektrogrammen. Det har visats att inkludering av temporala funktioner både under modellträning och sedan förutsäga beteende hos en aktivitet ökar valideringsnoggrannheten jämfört med modeller som endast använder rumsfunktioner[1]. Deep learning arkitekturer har implementerats för att känna igen olika mänskliga aktiviteter i AMI-kontorsmiljön med hjälp av extraherade data från the AMI-databas.Neurala nätverks modellerna byggdes med hjälp av KerasAPI tillsammans med TensorFlow biblioteket. Det finns olika typer av neurala nätverksarkitekturer. Arkitekturerna som undersöktes i detta projektet var Residual Neural Network, Visual GeometryGroup 16, Inception V3 och RCNN (LSTM). ImageNet-vikter har använts för att initialisera vikterna för Neurala nätverk basmodeller. ImageNet-vikterna tillhandahålls av Keras API och är optimerade för varje basmodell [2]. Basmodellerna använder ImageNet-vikter när de extraherar funktioner från inmatningsdata. Funktionsextraktionen med hjälp av ImageNet-vikter eller slumpmässiga vikter tillsammans med basmodellerna visade lovande resultat. Både Deep Learning användningen av täta skikt och LSTM spatio-temporala sekvens predikering implementerades framgångsrikt. / The Augmented Multi-party Interaction(AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus database provides researchers with remote controlled meetings and natural meetings in an office environment; meeting scenario in a four person sized office room. To achieve the group activity recognition video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB colored images and audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrogrames. It has been shown that including temporal features both during model training and then predicting the behavior of an activity increases the validation accuracy compared to models that only use spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the extracted data from the AMI database.The Neural Network models were built using the Keras API together with TensorFlow library. There are different types of Neural Network architectures. The architecture types that were investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3 and RCNN(Recurrent Neural Network). ImageNet weights have been used to initialize the weights for the Neural Network base models. ImageNet weights were provided by Keras API and was optimized for each base model[2]. The base models uses ImageNet weights when extracting features from the input data.The feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the Deep Learning using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
|
22 |
Image forgery detection using textural features and deep learningMalhotra, Yishu 06 1900 (has links)
La croissance exponentielle et les progrès de la technologie ont rendu très pratique le partage de données visuelles, d'images et de données vidéo par le biais d’une vaste prépondérance de platesformes disponibles. Avec le développement rapide des technologies Internet et multimédia, l’efficacité de la gestion et du stockage, la rapidité de transmission et de partage, l'analyse en temps réel et le traitement des ressources multimédias numériques sont progressivement devenus un élément indispensable du travail et de la vie de nombreuses personnes. Sans aucun doute, une telle croissance technologique a rendu le forgeage de données visuelles relativement facile et réaliste sans laisser de traces évidentes. L'abus de ces données falsifiées peut tromper le public et répandre la désinformation parmi les masses.
Compte tenu des faits mentionnés ci-dessus, la criminalistique des images doit être utilisée pour authentifier et maintenir l'intégrité des données visuelles. Pour cela, nous proposons une technique de détection passive de falsification d'images basée sur les incohérences de texture et de bruit introduites dans une image du fait de l'opération de falsification.
De plus, le réseau de détection de falsification d'images (IFD-Net) proposé utilise une architecture basée sur un réseau de neurones à convolution (CNN) pour classer les images comme falsifiées ou vierges. Les motifs résiduels de texture et de bruit sont extraits des images à l'aide du motif binaire local (LBP) et du modèle Noiseprint. Les images classées comme forgées sont ensuite utilisées pour mener des expériences afin d'analyser les difficultés de localisation des pièces forgées dans ces images à l'aide de différents modèles de segmentation d'apprentissage en profondeur.
Les résultats expérimentaux montrent que l'IFD-Net fonctionne comme les autres méthodes de détection de falsification d'images sur l'ensemble de données CASIA v2.0. Les résultats discutent également des raisons des difficultés de segmentation des régions forgées dans les images du jeu de données CASIA v2.0. / The exponential growth and advancement of technology have made it quite convenient for people to share visual data, imagery, and video data through a vast preponderance of available platforms. With the rapid development of Internet and multimedia technologies, performing efficient storage and management, fast transmission and sharing, real-time analysis, and processing of digital media resources has gradually become an indispensable part of many people’s work and life. Undoubtedly such technological growth has made forging visual data relatively easy and realistic without leaving any obvious visual clues. Abuse of such tampered data can deceive the public and spread misinformation amongst the masses. Considering the facts mentioned above, image forensics must be used to authenticate and maintain the integrity of visual data. For this purpose, we propose a passive image forgery detection technique based on textural and noise inconsistencies introduced in an image because of the tampering operation.
Moreover, the proposed Image Forgery Detection Network (IFD-Net) uses a Convolution Neural Network (CNN) based architecture to classify the images as forged or pristine. The textural and noise residual patterns are extracted from the images using Local Binary Pattern (LBP) and the Noiseprint model. The images classified as forged are then utilized to conduct experiments to analyze the difficulties in localizing the forged parts in these images using different deep learning segmentation models.
Experimental results show that both the IFD-Net perform like other image forgery detection methods on the CASIA v2.0 dataset. The results also discuss the reasons behind the difficulties in segmenting the forged regions in the images of the CASIA v2.0 dataset.
|
23 |
Artificial data for Image classification in industrial applicationsYonan, Yonan, Baaz, August January 2022 (has links)
Machine learning and AI are growing rapidly and they are being implemented more often than before due to their high accuracy and performance. One of the biggest challenges to machine learning is data collection. The training data is the most important part of any machine learning project since it determines how the trained model will behave. In the case of object classification and detection, capturing a large number of images per object is not always possible and can be a very time-consuming and tedious process. This thesis explores options specific to image classification that help reducing the need to capture many images per object while still keeping the same performance accuracy. In this thesis, experiments have been performed with the goal of achieving a high classification accuracy with a limited dataset. One method that is explored is to create artificial training images using a game engine. Ways to expand a small dataset such as different data augmentation methods, and regularization methods, are also employed. / Maskininlärning och AI växer snabbt och de implementeras allt oftare på grund av deras höga noggrannhet och prestanda. En av de största utmaningarna för maskininlärning är datainsamling. Träningsdata är den viktigaste delen av ett maskininlärningsprojekt eftersom den avgör hur den tränade modellen kommer att bete sig. När det gäller objektklassificering och detektering är det inte alltid möjligt att ta många bilder per objekt och det kan vara en process som kräver mycket tid och arbete. Det här examensarbetet utforskar alternativ som är specifika för bildklassificering som minskar behovet av att ta många bilder per objekt samtidigt som prestanda bibehålls. I det här examensarbetet, flera experiment har utförts med målet att uppnå en hög klassificeringsprestanda med en begränsad dataset. En metod som utforskas är att skapa träningsbilder med hjälp av en spelmotor. Metoder för att utöka antal bilder i ett litet dataset, som data augmenteringsmetoder och regleringsmetoder, används också.
|
24 |
Deep Neural Network for Classification of H&E-stained Colorectal Polyps : Exploring the Pipeline of Computer-Assisted HistopathologyBrunzell, Stina January 2024 (has links)
Colorectal cancer is one of the most prevalent malignancies globally and recently introduced digital pathology enables the use of machine learning as an aid for fast diagnostics. This project aimed to develop a deep neural network model to specifically identify and differentiate dysplasia in the epithelium of colorectal polyps and was posed as a binary classification problem. The available dataset consisted of 80 whole slide images of different H&E-stained polyp sections, which were parted info smaller patches, annotated by a pathologist. The best performing model was a pre-trained ResNet-18 utilising a weighted sampler, weight decay and augmentation during fine tuning. Reaching an area under precision-recall curve of 0.9989 and 97.41% accuracy on previously unseen data, the model’s performance was determined to underperform compared to the task’s intra-observer variability and be in alignment with the inter-observer variability. Final model made publicly available at https://github.com/stinabr/classification-of-colorectal-polyps.
|
25 |
Art to Genre through Deep Learning: A Comparative Analysis of ResNet and EfficientNet for Album Cover Image-Based Music ClassificationBernsdorff Wallstedt, Simon January 2024 (has links)
Musical genres enable listeners to differentiate between diverse styles and forms of music, serving as a practical tool to organize and categorize artists, albums, and songs. Album covers, featuring graphic depictions that reflect the vibe and tone of the music, serve as a visual intermediary between the artist and the audience. While numerous machine learning techniques leverage textual, visual, and audio information in a multi-modal approach to categorize music, the sole focus on visual aspects, specifically album cover images, and their correlation with musical genres has been less explored. The question guides this research: How do EfficientNet and ResNet compare in their ability to accurately classify album cover images into specific genres based solely on visual features? Two state-of-the-art convolutional neural networks, ResNet and EfficientNet, are employed to classify a newly created dataset (the EquiGen dataset) of 60,000 album cover images into 15 distinct genres. The dataset was divided into 70% for training, 15% for validation, and 15% for testing.The findings reveal that both ResNet and EfficientNet achieve better-than-random classification accuracy, indicating that visual features alone can be informative for genre classification. Some genres performed much better than others, namely Metal, New Age and Rap. EfficientNet demonstrated slightly superior performance compared to ResNet, with higher accuracy, precision, recall, and F1 scores. However, both models exhibited challenges in generalizing well-to-unseen data and showed signs of overfitting.This study contributes to the interdisciplinary research on Music Genre Categorization (MGC), machine learning, and music.
|
26 |
Enhancing Fairness in Facial Recognition: Balancing Datasets and Leveraging AI-Generated Imagery for Bias Mitigation : A Study on Mitigating Ethnic and Gender Bias in Public Surveillance SystemsAbbas, Rashad, Tesfagiorgish, William Issac January 2024 (has links)
Facial recognition technology has become a ubiquitous tool in security and personal identification. However, the rise of this technology has been accompanied by concerns over inherent biases, particularly regarding ethnic and gender. This thesis examines the extent of these biases by focusing on the influence of dataset imbalances in facial recognition algorithms. We employ a structured methodological approach that integrates AI-generated images to enhance dataset diversity, with the intent to balance representation across ethnics and genders. Using the ResNet and Vgg model, we conducted a series of controlled experiments that compare the performance impacts of balanced versus imbalanced datasets. Our analysis includes the use of confusion matrices and accuracy, precision, recall and F1-score metrics to critically assess the model’s performance. The results demonstrate how tailored augmentation of training datasets can mitigate bias, leading to more equitable outcomes in facial recognition technology. We present our findings with the aim of contributing to the ongoing dialogue regarding AI fairness and propose a framework for future research in the field.
|
27 |
Deep Learning with Vision-based Technologies for Structural Damage Detection and Health MonitoringBai, Yongsheng 08 December 2022 (has links)
No description available.
|
28 |
Towards meaningful and data-efficient learning : exploring GAN losses, improving few-shot benchmarks, and multimodal video captioningHuang, Gabriel 09 1900 (has links)
Ces dernières années, le domaine de l’apprentissage profond a connu des progrès énormes dans des applications allant de la génération d’images, détection d’objets, modélisation du langage à la réponse aux questions visuelles. Les approches classiques telles que l’apprentissage supervisé nécessitent de grandes quantités de données étiquetées et spécifiques à la tâches. Cependant, celles-ci sont parfois coûteuses, peu pratiques, ou trop longues à collecter. La modélisation efficace en données, qui comprend des techniques comme l’apprentissage few-shot (à partir de peu d’exemples) et l’apprentissage self-supervised (auto-supervisé), tentent de remédier au manque de données spécifiques à la tâche en exploitant de grandes quantités de données plus “générales”. Les progrès de l’apprentissage profond, et en particulier de l’apprentissage few-shot, s’appuient sur les benchmarks (suites d’évaluation), les métriques d’évaluation et les jeux de données, car ceux-ci sont utilisés pour tester et départager différentes méthodes sur des tâches précises, et identifier l’état de l’art. Cependant, du fait qu’il s’agit de versions idéalisées de la tâche à résoudre, les benchmarks sont rarement équivalents à la tâche originelle, et peuvent avoir plusieurs limitations qui entravent leur rôle de sélection des directions de recherche les plus prometteuses. De plus, la définition de métriques d’évaluation pertinentes peut être difficile, en particulier dans le cas de sorties structurées et en haute dimension, telles que des images, de l’audio, de la parole ou encore du texte. Cette thèse discute des limites et des perspectives des benchmarks existants, des fonctions de coût (training losses) et des métriques d’évaluation (evaluation metrics), en mettant l’accent sur la modélisation générative - les Réseaux Antagonistes Génératifs (GANs) en particulier - et la modélisation efficace des données, qui comprend l’apprentissage few-shot et self-supervised. La première contribution est une discussion de la tâche de modélisation générative, suivie d’une exploration des propriétés théoriques et empiriques des fonctions de coût des GANs. La deuxième contribution est une discussion sur la limitation des few-shot classification benchmarks, certains ne nécessitant pas de généralisation à de nouvelles sémantiques de classe pour être résolus, et la proposition d’une méthode de base pour les résoudre sans étiquettes en phase de testing. La troisième contribution est une revue sur les méthodes few-shot et self-supervised de détection d’objets , qui souligne les limites et directions de recherche prometteuses. Enfin, la quatrième contribution est une méthode efficace en données pour la description de vidéo qui exploite des jeux de données texte et vidéo non supervisés. / In recent years, the field of deep learning has seen tremendous progress for applications ranging from image generation, object detection, language modeling, to visual question answering. Classic approaches such as supervised learning require large amounts of task-specific and labeled data, which may be too expensive, time-consuming, or impractical to collect. Data-efficient methods, such as few-shot and self-supervised learning, attempt to deal with the limited availability of task-specific data by leveraging large amounts of general data. Progress in deep learning, and in particular, few-shot learning, is largely driven by the relevant benchmarks, evaluation metrics, and datasets. They are used to test and compare different methods on a given task, and determine the state-of-the-art. However, due to being idealized versions of the task to solve, benchmarks are rarely equivalent to the original task, and can have several limitations which hinder their role of identifying the most promising research directions. Moreover, defining meaningful evaluation metrics can be challenging, especially in the case of high-dimensional and structured outputs, such as images, audio, speech, or text. This thesis discusses the limitations and perspectives of existing benchmarks, training losses, and evaluation metrics, with a focus on generative modeling—Generative Adversarial Networks (GANs) in particular—and data-efficient modeling, which includes few-shot and self-supervised learning. The first contribution is a discussion of the generative modeling task, followed by an exploration of theoretical and empirical properties of the GAN loss. The second contribution is a discussion of a limitation of few-shot classification benchmarks, which is that they may not require class semantic generalization to be solved, and the proposal of a baseline method for solving them without test-time labels. The third contribution is a survey of few-shot and self-supervised object detection, which points out the limitations and promising future research for the field. Finally, the fourth contribution is a data-efficient method for video captioning, which leverages unsupervised text and video datasets, and explores several multimodal pretraining strategies.
|
Page generated in 0.0493 seconds