• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 17
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 24
  • 24
  • 24
  • 14
  • 13
  • 10
  • 10
  • 9
  • 8
  • 7
  • 7
  • 6
  • 6
  • 6
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Proposal networks in object detection / Förslagsnätverk för objektdetektering

Grossman, Mikael January 2019 (has links)
Locating and extracting useful data from images is a task that has been revolutionized in the last decade as computing power has risen to such a level to use deep neural networks with success. A type of neural network that uses the convolutional operation called convolutional neural network (CNN) is suited for image related tasks. Using the convolution operation creates opportunities for the network to learn their own filters, that previously had to be hand engineered. For locating objects in an image the state-of-the-art Faster R-CNN model predicts objects in two parts. Firstly, the region proposal network (RPN) extracts regions from the picture where it is likely to find an object. Secondly, a detector verifies the likelihood of an object being in that region.For this thesis, we review the current literature on artificial neural networks, object detection methods, proposal methods and present our new way of generating proposals. By replacing the RPN with our network, the multiscale proposal network (MPN), we increase the average precision (AP) with 12% and reduce the computation time per image by 10%. / Lokalisering av användbar data från bilder är något som har revolutionerats under det senaste decenniet när datorkraften har ökat till en nivå då man kan använda artificiella neurala nätverk i praktiken. En typ av ett neuralt nätverk som använder faltning passar utmärkt till bilder eftersom det ger möjlighet för nätverket att skapa sina egna filter som tidigare skapades för hand. För lokalisering av objekt i bilder används huvudsakligen Faster R-CNN arkitekturen. Den fungerar i två steg, först skapar RPN boxar som innehåller regioner där nätverket tror det är störst sannolikhet att hitta ett objekt. Sedan är det en detektor som verifierar om boxen är på ett objekt .I denna uppsats går vi igenom den nuvarande litteraturen i artificiella neurala nätverk, objektdektektering, förslags metoder och presenterar ett nytt förslag att generera förslag på regioner. Vi visar att genom att byta ut RPN med vår metod (MPN) ökar vi precisionen med 12% och reducerar tiden med 10%.
22

Service robot for the visually impaired: Providing navigational assistance using Deep Learning

Shakeel, Amlaan 28 July 2017 (has links)
No description available.
23

From Pixels to Predators: Wildlife Monitoring with Machine Learning / Från Pixlar till Rovdjur: Viltövervakning med Maskininlärning

Eriksson, Max January 2024 (has links)
This master’s thesis investigates the application of advanced machine learning models for the identification and classification of Swedish predators using camera trap images. With the growing threats to biodiversity, there is an urgent need for innovative and non-intrusive monitoring techniques. This study focuses on the development and evaluation of object detection models, including YOLOv5, YOLOv8, YOLOv9, and Faster R-CNN, aiming to enhance the surveillance capabilities of Swedish predatory species such as bears, wolves, lynxes, foxes, and wolverines. The research leverages a dataset from the NINA database, applying data preprocessing and augmentation techniques to ensure robust model training. The models were trained and evaluated using various dataset sizes and conditions, including day and night images. Notably, YOLOv8 and YOLOv9 underwent extended training for 300 epochs, leading to significant improvements in performance metrics. The performance of the models was evaluated using metrics such as mean Average Precision (mAP), precision, recall, and F1-score. YOLOv9, with its innovative Programmable Gradient Information (PGI) and GELAN architecture, demonstrated superior accuracy and reliability, achieving an F1-score of 0.98 on the expanded dataset. The research found that training models on images captured during both day and night jointly versus separately resulted in only minor differences in performance. However, models trained exclusively on daytime images showed slightly better performance due to more consistent and favorable lighting conditions. The study also revealed a positive correlation between the size of the training dataset and model performance, with larger datasets yielding better results across all metrics. However, the marginal gains decreased as the dataset size increased, suggesting diminishing returns. Among the species studied, foxes were the least challenging for the models to detect and identify, while wolves presented more significant challenges, likely due to their complex fur patterns and coloration blending with the background.
24

Towards meaningful and data-efficient learning : exploring GAN losses, improving few-shot benchmarks, and multimodal video captioning

Huang, Gabriel 09 1900 (has links)
Ces dernières années, le domaine de l’apprentissage profond a connu des progrès énormes dans des applications allant de la génération d’images, détection d’objets, modélisation du langage à la réponse aux questions visuelles. Les approches classiques telles que l’apprentissage supervisé nécessitent de grandes quantités de données étiquetées et spécifiques à la tâches. Cependant, celles-ci sont parfois coûteuses, peu pratiques, ou trop longues à collecter. La modélisation efficace en données, qui comprend des techniques comme l’apprentissage few-shot (à partir de peu d’exemples) et l’apprentissage self-supervised (auto-supervisé), tentent de remédier au manque de données spécifiques à la tâche en exploitant de grandes quantités de données plus “générales”. Les progrès de l’apprentissage profond, et en particulier de l’apprentissage few-shot, s’appuient sur les benchmarks (suites d’évaluation), les métriques d’évaluation et les jeux de données, car ceux-ci sont utilisés pour tester et départager différentes méthodes sur des tâches précises, et identifier l’état de l’art. Cependant, du fait qu’il s’agit de versions idéalisées de la tâche à résoudre, les benchmarks sont rarement équivalents à la tâche originelle, et peuvent avoir plusieurs limitations qui entravent leur rôle de sélection des directions de recherche les plus prometteuses. De plus, la définition de métriques d’évaluation pertinentes peut être difficile, en particulier dans le cas de sorties structurées et en haute dimension, telles que des images, de l’audio, de la parole ou encore du texte. Cette thèse discute des limites et des perspectives des benchmarks existants, des fonctions de coût (training losses) et des métriques d’évaluation (evaluation metrics), en mettant l’accent sur la modélisation générative - les Réseaux Antagonistes Génératifs (GANs) en particulier - et la modélisation efficace des données, qui comprend l’apprentissage few-shot et self-supervised. La première contribution est une discussion de la tâche de modélisation générative, suivie d’une exploration des propriétés théoriques et empiriques des fonctions de coût des GANs. La deuxième contribution est une discussion sur la limitation des few-shot classification benchmarks, certains ne nécessitant pas de généralisation à de nouvelles sémantiques de classe pour être résolus, et la proposition d’une méthode de base pour les résoudre sans étiquettes en phase de testing. La troisième contribution est une revue sur les méthodes few-shot et self-supervised de détection d’objets , qui souligne les limites et directions de recherche prometteuses. Enfin, la quatrième contribution est une méthode efficace en données pour la description de vidéo qui exploite des jeux de données texte et vidéo non supervisés. / In recent years, the field of deep learning has seen tremendous progress for applications ranging from image generation, object detection, language modeling, to visual question answering. Classic approaches such as supervised learning require large amounts of task-specific and labeled data, which may be too expensive, time-consuming, or impractical to collect. Data-efficient methods, such as few-shot and self-supervised learning, attempt to deal with the limited availability of task-specific data by leveraging large amounts of general data. Progress in deep learning, and in particular, few-shot learning, is largely driven by the relevant benchmarks, evaluation metrics, and datasets. They are used to test and compare different methods on a given task, and determine the state-of-the-art. However, due to being idealized versions of the task to solve, benchmarks are rarely equivalent to the original task, and can have several limitations which hinder their role of identifying the most promising research directions. Moreover, defining meaningful evaluation metrics can be challenging, especially in the case of high-dimensional and structured outputs, such as images, audio, speech, or text. This thesis discusses the limitations and perspectives of existing benchmarks, training losses, and evaluation metrics, with a focus on generative modeling—Generative Adversarial Networks (GANs) in particular—and data-efficient modeling, which includes few-shot and self-supervised learning. The first contribution is a discussion of the generative modeling task, followed by an exploration of theoretical and empirical properties of the GAN loss. The second contribution is a discussion of a limitation of few-shot classification benchmarks, which is that they may not require class semantic generalization to be solved, and the proposal of a baseline method for solving them without test-time labels. The third contribution is a survey of few-shot and self-supervised object detection, which points out the limitations and promising future research for the field. Finally, the fourth contribution is a data-efficient method for video captioning, which leverages unsupervised text and video datasets, and explores several multimodal pretraining strategies.

Page generated in 0.0412 seconds