281

Tabular Information Extraction from Datasheets with Deep Learning for Semantic Modeling

Akkaya, Yakup 22 March 2022 (has links)
The growing popularity of artificial intelligence and machine learning has led many institutions and organizations to adopt the vision of industrial automation. Many corporations have made it a primary objective to deliver goods and services, and to manufacture, more efficiently and with minimal human intervention. Automated document processing and analysis is a critical component of this cycle for many organizations in the supply chain, and the massive volume and diversity of data created in this rapidly evolving environment make it a highly desired step. Across this diversity, much of the important information in documents is provided in tables, so extracting tabular data is a crucial aspect of document processing. This thesis applies deep learning methodologies to detect table structure elements for data extraction and preparation for semantic modelling. To find the optimal structure definition, we analyzed the performance of deep learning models on different formats such as row/column and cell. The combined row-and-column detection models perform poorly compared to the other models due to the highly overlapping nature of rows and columns. Separate row and column detection models achieve the best average F1-scores, 78.5% and 79.1%, respectively. However, determining cell elements from row and column detections for semantic modelling is complicated by spanning rows and columns. Considering these facts, a new method of setting the ground-truth information, called content-focused annotation, is proposed to define table elements better. Our content-focused method handles ambiguities caused by large white spaces and missing boundary lines in table structures, and hence provides higher accuracy. Prior works have addressed the table analysis problem as table detection and table structure detection tasks; however, the impact of dataset structure on table structure detection has not been investigated. We compare table structure detection performance on cropped and uncropped datasets. The cropped set consists only of table images cropped from documents, assuming tables are detected perfectly; the uncropped set consists of regular document images. Experiments show that deep learning models can improve detection performance by up to 9% in average precision and average recall on the cropped versions. The impact of cropping is negligible at Intersection over Union (IoU) thresholds of 50%-70% compared to the uncropped versions; beyond 70% IoU, however, the cropped datasets provide significantly higher detection performance.
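For reference, the Intersection over Union criterion behind the 50%-70% thresholds reported above can be sketched in a few lines of Python. This is a generic illustration, not code from the thesis; the boxes and threshold are made up.

```python
# Generic IoU between a predicted and a ground-truth box, each (x1, y1, x2, y2).
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# At the 0.7 threshold, a predicted table cell counts as a true positive
# only if it overlaps a ground-truth cell this strongly.
print(iou((10, 10, 110, 60), (20, 15, 115, 65)) >= 0.7)  # True (IoU ~ 0.71)
```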
282

Investigating techniques for improving accuracy and limiting overfitting for YOLO and real-time object detection on iOS

Güven, Jakup January 2019 (has links)
This thesis presents the development of a real-time object detection system for iOS using YOLO, a state-of-the-art one-stage object detector and convolutional neural network that far surpasses other real-time object detectors in speed and accuracy. In this process, an object detection model is trained to detect doors and implemented within a system development process. The machine learning process is outlined, and practices for combating overfitting and for increasing accuracy and speed are discussed and applied. A series of experiments is conducted, the results of which suggest that data augmentation, inclusion of negative data in a dataset, hyperparameter optimisation and transfer learning are viable techniques for improving the performance of an object detection model. Based on the results of these experiments, the author increases the model's mAP, a measurement of accuracy for object detectors, from 63.76% to 86.73%. The model's tendency to overfit is also explored, with results suggesting that training beyond 300 epochs is likely to produce an overfitted model.
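As a minimal sketch, assuming details the thesis does not publish, the snippet below shows simple image augmentation of the kind the experiments link to higher mAP; the image size and jitter range are illustrative assumptions.

```python
# Hypothetical augmentation of a detector training frame: random flip plus
# brightness jitter. Not the author's code.
import numpy as np

def augment(image, rng):
    # Random horizontal flip; for object detection, bounding-box
    # x-coordinates must be mirrored alongside the image.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random brightness jitter.
    factor = rng.uniform(0.7, 1.3)
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(416, 416, 3), dtype=np.uint8)  # stand-in frame
augmented = augment(image, rng)
print(augmented.shape, augmented.dtype)
```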
283

Learning of semantic classes for aerial image analysis

Randrianarivo, Hicham 15 December 2016 (has links)
This work concerns the interpretation of the content of very high resolution panchromatic optical aerial images. Two methods for classifying the content of these images are proposed: one detects instances of a class of objects, and the other semantically segments superpixels extracted from the images using a contextual model of the relations between superpixels. The object detection method for very high resolution images learns a mixture of appearance models of a class of objects and then fuses the hypotheses returned by the models. We develop a method that clusters training samples into visual subcategories through a two-stage procedure based on the available metadata and the visual appearance of the training samples. This clustering allows us to learn appearance models, each specialised in recognising a subset of the dataset, whose fusion generalises detection to all objects of the class. The performance of the resulting detector is evaluated on several datasets of very high resolution aerial images, at different resolutions and from several places around the world. The contextual semantic segmentation method combines the visual description of a superpixel extracted from the image with contextual information gathered between a superpixel and its neighbours. The context between superpixels is represented by a graphical model over neighbouring superpixels, in which the nodes are the visual representations of the superpixels and the edges the contextual relations between neighbours. Finally, we predict the category of a superpixel from the predictions of its neighbours, using the contextual model to make the predictions more robust. The method is tested on a dataset of very high resolution aerial images.
284

Multimodal Sensor Fusion with Object Detection Networks for Automated Driving

Schröder, Enrico 07 January 2022 (has links)
Object detection is one of the key tasks of environment perception for highly automated vehicles. To achieve a high level of performance and fault tolerance, automated vehicles are equipped with an array of different sensors to observe their environment. Perception systems for automated vehicles usually rely on Bayesian fusion methods to combine information from different sensors late in the perception pipeline in a highly abstract, low-dimensional representation. Newer research on deep learning object detection proposes fusion of information in higher-dimensional space directly in the convolutional neural networks to significantly increase performance. However, the resulting deep learning architectures violate key non-functional requirements of a real-world safety-critical perception system for a series-production vehicle, notably modularity, fault tolerance and traceability. This dissertation presents a modular multimodal perception architecture for detecting objects using camera, lidar and radar data that is entirely based on deep learning and that was designed to respect above requirements. The presented method is applicable to any region-based, two-stage object detection architecture (such as Faster R-CNN by Ren et al.). Information is fused in the high-dimensional feature space of a convolutional neural network. The feature map of a convolutional neural network is shown to be a suitable representation in which to fuse multimodal sensor data and to be a suitable interface to combine different parts of object detection networks in a modular fashion. The implementation centers around a novel neural network architecture that learns a transformation of feature maps from one sensor modality and input space to another and can thereby map feature representations into a common feature space. It is shown how transformed feature maps from different sensors can be fused in this common feature space to increase object detection performance by up to 10% compared to the unimodal baseline networks. Feature extraction front ends of the architecture are interchangeable and different sensor modalities can be integrated with little additional training effort. Variants of the presented method are able to predict object distance from monocular camera images and detect objects from radar data. Results are verified using a large labeled, multimodal automotive dataset created during the course of this dissertation. The processing pipeline and methodology for creating this dataset along with detailed statistics are presented as well.
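As a hedged illustration of the fusion idea described above, not Schröder's actual architecture, the following PyTorch sketch maps a lidar feature map into the camera feature space with a learned 1x1 transform and fuses the two by concatenation; all channel and spatial sizes are assumptions.

```python
# Minimal sketch of feature-map fusion in a common feature space.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, cam_ch, lidar_ch, out_ch):
        super().__init__()
        # Learned transformation mapping lidar features into the camera
        # feature space before fusion.
        self.transform = nn.Conv2d(lidar_ch, cam_ch, kernel_size=1)
        # 1x1 convolution fusing the concatenated maps.
        self.fuse = nn.Conv2d(cam_ch * 2, out_ch, kernel_size=1)

    def forward(self, cam_feat, lidar_feat):
        lidar_feat = self.transform(lidar_feat)
        fused = torch.cat([cam_feat, lidar_feat], dim=1)
        return self.fuse(fused)

# Feature maps from interchangeable front ends, assumed spatially aligned.
cam = torch.randn(1, 256, 50, 50)
lidar = torch.randn(1, 128, 50, 50)
fused = FusionBlock(256, 128, 256)(cam, lidar)
print(fused.shape)  # torch.Size([1, 256, 50, 50])
```

The fused map can then feed the second stage of any region-based detector, which is what makes the front ends interchangeable in such a design.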
285

Spatial Temporal Analysis of Traffic Patterns during the COVID-19 Epidemic by Vehicle Detection using Planet Remote Sensing Satellite Images

Chen, Yulu 07 October 2021 (has links)
No description available.
286

Semantic Segmentation of RGB images for feature extraction in Real Time

Elavarthi, Pradyumna January 2019 (has links)
No description available.
287

The influence of neural network-based image enhancements on object detection

Pettersson, Eric, Al Khayyat, Muhammed January 2023 (has links)
This thesis investigates the impact of image enhancement techniques on object detection for cars in real-world traffic scenarios. The study focuses on upscaling and light-correction treatments and their effects on detecting cars in challenging conditions. Initially, a YOLOv8x model is trained on clear static car images. The model is then evaluated on a test dataset captured during real-world driving, with images from a front-mounted camera on a car incorporating various lighting conditions and challenges. The images are then enhanced with the above treatments and evaluated again. Within this experiment's specific context, the results show that upscaling seems to decrease mAP performance while light correction slightly improves accuracy. Additional training on a complex image dataset outperforms all other approaches, highlighting the importance of diverse and realistic training data. These findings contribute to advancing computer vision research on object detection models.
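The abstract does not specify how light correction is implemented; as one plausible, clearly hypothetical example of such a treatment, the sketch below applies gamma correction with OpenCV. The file names and gamma values are placeholders.

```python
# Hypothetical light-correction preprocessing step (gamma correction).
import cv2
import numpy as np

def gamma_correct(image, gamma=1.5):
    # Build a lookup table mapping each 8-bit value through the gamma curve.
    inv = 1.0 / gamma
    table = np.array([(i / 255.0) ** inv * 255 for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(image, table)

frame = cv2.imread("dashcam_frame.jpg")        # placeholder input image
brightened = gamma_correct(frame, gamma=1.8)   # lighten a dark scene
cv2.imwrite("dashcam_frame_corrected.jpg", brightened)
```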
288

Comparing the effect of random and contextual removal of images on object detection performance

Pettersson, Patrik, Gomez Palomäki, José Gabriel January 2023 (has links)
As datasets grow, the need for automated methods to ensure dataset quality arises. This report presents an experiment conducted on the MSCOCO train2017 dataset to identify image outliers using a force-directed graph built from a co-occurrence context, focusing on the mean average precision and average precision. The experiment involved placing anomaly scores on images using Euclidean distance and k-means clustering, creating subsets where a percentage of images with the highest anomaly scores were removed. You Only Look Once version 8 models were trained on each subset, and the results showed a promising increase in performance compared to randomly removing images. However, the increase was relatively small, and further research is needed. In terms of future work, other methods of identifying outliers, other datasets, and investigating the uses of contextual information in other areas are discussed.
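A minimal sketch of the scoring step as described: cluster per-image context vectors with k-means and use the Euclidean distance to the assigned centroid as the anomaly score. Building the vectors from the force-directed co-occurrence graph is not shown, and all shapes and thresholds are assumptions.

```python
# Anomaly scoring via k-means distance, then dropping the worst images.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
context_vectors = rng.random((1000, 80))   # one vector per image (assumed shape)

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(context_vectors)
# Distance from each image to its assigned centroid serves as its anomaly score.
scores = np.linalg.norm(
    context_vectors - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# Remove the 5% most anomalous images before retraining the detector.
keep = scores < np.quantile(scores, 0.95)
print(keep.sum(), "images kept of", len(scores))
```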
289

Automatic object detection and tracking for eye-tracking analysis

Cederin, Liv, Bremberg, Ulrika January 2023 (has links)
In recent years, eye-tracking technology has gained considerable attention, facilitating analysis of gaze behavior and human visual attention. However, eye-tracking analysis often requires manual annotation of the objects being gazed upon, making quantitative data analysis a difficult and time-consuming process. This thesis explores object detection and object tracking applied to scene camera footage from mobile eye-tracking glasses. We have evaluated the performance of state-of-the-art object detectors and trackers, resulting in an automated pipeline specialized in detecting and tracking objects in scene videos. Motion blur constitutes a significant challenge with moving cameras, complicating tasks such as object detection and tracking. To address this, we explored two approaches: the first involved retraining object detection models on datasets augmented with motion-blurred images, while the second involved preprocessing the video frames with deblurring techniques. The findings of our research contribute insights into efficient approaches for optimally detecting and tracking objects in scene camera footage from eye-tracking glasses. Of the technologies we tested, we found that motion deblurring using DeblurGAN-v2, along with a DINO object detector combined with the StrongSORT tracker, achieved the highest accuracies. Furthermore, we present an annotated dataset, consisting of frames from recordings with eye-tracking glasses, that can be utilized for evaluating object detection and tracking performance.
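As a hedged sketch of the first approach, retraining on motion-blurred augmentations, the snippet below synthesises horizontal motion blur with OpenCV; the kernel design and file names are illustrative assumptions, not the authors' code.

```python
# Synthetic motion blur for augmenting detector training images.
import cv2
import numpy as np

def motion_blur(image, kernel_size=15):
    # Horizontal motion-blur kernel: a normalised row of ones, which
    # averages each pixel with its horizontal neighbours.
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size
    return cv2.filter2D(image, -1, kernel)

frame = cv2.imread("scene_camera_frame.jpg")   # placeholder frame
blurred = motion_blur(frame, kernel_size=21)   # simulate camera motion
cv2.imwrite("scene_camera_frame_blurred.jpg", blurred)
```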
290

Fusion of Evolution Constructed Features for Computer Vision

Price, Stanton Robert 04 May 2018 (has links)
In this dissertation, image feature extraction quality is enhanced through the introduction of two feature learning techniques and, subsequently, feature-level fusion strategies are presented that improve classification performance. Two image/signal processing techniques are defined for pre-conditioning image data such that the discriminatory information is highlighted for improved feature extraction. The first approach, improved Evolution-COnstructed features, employs a modified genetic algorithm to learn a series of image transforms, specific to a given feature descriptor, for enhanced feature extraction. The second method, Genetic prOgramming Optimal Feature Descriptor (GOOFeD), is a genetic programming-based approach to learning the transformations of the data for feature extraction. GOOFeD offers a very rich and expressive solution space due to its ability to represent highly complex compositions of image transforms through binary and unary operators, or combinations of the two. Whichever of the two techniques is employed, the goal of each is to learn a composition of image transforms from training data that gives a feature descriptor the best opportunity to extract its information for the application at hand. Next, feature-level fusion via multiple kernel learning (MKL) is utilized to better combine the extracted features and, ultimately, improve classification accuracy. MKL is advanced through the introduction of six new indices for kernel weight assignment. Five of the indices are measured directly from the kernel matrix proximity values, making them highly efficient to compute. The calculation of the sixth index is performed explicitly on distributions in the reproducing kernel Hilbert space. The proposed techniques are applied to an automatic buried explosive hazard detection application and significant results are achieved.
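The dissertation's six weight-assignment indices are not reproduced here; as a loosely related illustration, the sketch below combines two RBF kernels with weights from the classical kernel-target alignment measure, which, like five of the proposed indices, is computed directly from kernel matrix values. All data shapes and the alignment rule are assumptions.

```python
# Fixed-rule weighted combination of kernels for feature-level fusion.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X1 = rng.random((100, 32))     # features from descriptor 1 (assumed)
X2 = rng.random((100, 16))     # features from descriptor 2 (assumed)
y = rng.integers(0, 2, 100)    # binary labels

def alignment_weight(K, y):
    # Kernel-target alignment: how closely K matches the ideal kernel yy^T.
    Y = np.outer(2 * y - 1, 2 * y - 1).astype(float)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

kernels = [rbf_kernel(X1), rbf_kernel(X2)]
w = np.array([alignment_weight(K, y) for K in kernels])
w = np.clip(w, 0, None)
w = w / (w.sum() + 1e-12)      # normalise to a convex combination
K_fused = sum(wi * Ki for wi, Ki in zip(w, kernels))
print(w, K_fused.shape)        # fused kernel for a downstream SVM
```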
