281 |
Tuning into uncertainty: A material exploration of object detection through play
Rukanskaitė, Julija (January 2021)
The ubiquitous yet opaque logic of machine learning complicates both the design process and end-use. Because of this, much of Interaction Design and HCI now focuses on making this logic transparent through human-like explanations and tight control, while disregarding other, non-normative human-AI interactions as technical failures. In this thesis I re-frame such interactions as generative for both material exploration and user experience in non-purpose-driven applications. By expanding on the notion of machine learning uncertainty with play, queering, and more-than-human design, I try to understand them in a designerly way. This re-framing is followed by a material-centred Research through Design process that concludes with Object Detection Radio: a ludic device that sonifies the prediction probabilities of the TensorFlow.js Object Detection API. The design process suggests ways of making machine learning uncertainty explicit in human-AI interaction. In addition, I propose play as an alternative way of relating to and understanding the agency of machine learning technology.
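To illustrate what sonifying a prediction probability can mean, here is a minimal sketch in Python. It is not the thesis's implementation, which the abstract does not describe: the function name `probability_to_frequency`, the 220-880 Hz range, and the exponential mapping are all assumptions chosen for illustration.

```python
def probability_to_frequency(p, low_hz=220.0, high_hz=880.0):
    """Map a detection probability in [0, 1] to a tone frequency in hertz.

    Assumed mapping: exponential interpolation between low_hz and high_hz,
    so equal steps in probability sound like equal steps in musical pitch.
    """
    p = min(1.0, max(0.0, p))  # clamp out-of-range scores
    return low_hz * (high_hz / low_hz) ** p


# Hypothetical detector scores: uncertain predictions sound lower.
for score in (0.1, 0.5, 0.9):
    print(f"p={score:.1f} -> {probability_to_frequency(score):.1f} Hz")
```

With this mapping, a low-confidence detection is heard as a low tone rather than being filtered out, which is one possible way of making uncertainty perceptible instead of hiding it.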
|
282 |
Tabular Information Extraction from Datasheets with Deep Learning for Semantic Modeling
Akkaya, Yakup (22 March 2022)
The growing popularity of artificial intelligence and machine learning has led to the adoption of the automation vision in industry by many institutions and organizations. Many corporations have made it their primary objective to deliver goods and services, and to manufacture, more efficiently and with minimal human intervention. Automated document processing and analysis is also a critical component of this cycle for many organizations that contribute to the supply chain. The massive volume and diversity of data created in this rapidly evolving environment make this a highly desired step. Despite this diversity, important information in the documents is provided in the tables. As a result, extracting tabular data is a crucial aspect of document processing.
This thesis applies deep learning methodologies to detect table structure elements for the extraction of data and preparation for semantic modelling. In order to find the optimal structure definition, we analyzed the performance of deep learning models in different formats such as row/column and cell. The combined row and column detection models perform poorly compared to the other models due to the highly overlapping nature of rows and columns. Separate row and column detection models seem to achieve the best average F1-scores, with 78.5% and 79.1%, respectively. However, determining cell elements from the row and column detections for semantic modelling is a complicated task due to spanning rows and columns. Considering these facts, a new method for setting the ground-truth information, called content-focused annotation, is proposed to define table elements better. Our content-focused method is competent in handling ambiguities caused by huge white spaces and a lack of boundary lines in table structures; hence, it provides higher accuracy.
Prior works have addressed the table analysis problem under table detection and table structure detection tasks. However, the impact of dataset structures on table structure detection has not been investigated. We provide a comparison of table structure detection performance with cropped and uncropped datasets. The cropped set consists only of table images that are cropped from documents, assuming tables are detected perfectly. The uncropped set consists of regular document images. Experiments show that deep learning models can improve the detection performance by up to 9% in average precision and average recall on the cropped versions. Furthermore, the impact of cropped images is negligible at Intersection over Union (IoU) thresholds of 50%-70% when compared to the uncropped versions. However, beyond the 70% IoU threshold, cropped datasets provide significantly higher detection performance.
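The IoU thresholds mentioned above decide whether a predicted box counts as a correct detection. The following is a minimal sketch of the standard IoU computation in Python; the `(x_min, y_min, x_max, y_max)` box layout, the function names, and the example boxes are assumptions for illustration and are not taken from the thesis.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(predicted, ground_truth, threshold):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold


# Hypothetical table-row boxes: the prediction is shifted by 2 units.
predicted = (0, 0, 10, 10)
ground_truth = (2, 0, 12, 10)
print(iou(predicted, ground_truth))            # 80 / 120, about 0.667
print(is_match(predicted, ground_truth, 0.5))  # True
print(is_match(predicted, ground_truth, 0.7))  # False
```

The example shows why stricter thresholds are more demanding: the same slightly misaligned box is accepted at 50% IoU and rejected at 70%.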
|
283 |
Investigating techniques for improving accuracy and limiting overfitting for YOLO and real-time object detection on iOS
Güven, Jakup (January 2019)
This paper describes the creation of a real-time object detection system for iOS using YOLO, a state-of-the-art one-stage object detector and convolutional neural network that far surpasses other real-time object detectors in speed and accuracy. In this process, an object detection model is trained to detect doors. The machine learning process is outlined, and practices for combating overfitting and increasing accuracy and speed are discussed. A series of experiments is conducted, the results of which suggest that data augmentation, including negative data in a dataset, hyperparameter optimisation and transfer learning are viable techniques for improving the performance of an object detection model. The author is able to increase mAP, a measure of accuracy for object detectors, from 63.76% to 86.73% based on the results of the experiments. The tendency towards overfitting is also explored, and the results suggest that training beyond 300 epochs is likely to produce an overfitted model.
|
284 |
Apprentissage statistique de classes sémantiques pour l'interprétation d'images aériennes / Learning of semantic classes for aerial image analysis
Randrianarivo, Hicham (15 December 2016)
This work concerns the interpretation of the content of very high resolution panchromatic optical aerial images. Two methods are proposed for classifying the content of such images. The first detects instances of the different object categories; the second performs semantic segmentation of superpixels extracted from the images, using a contextual model of the relations between superpixels. The object detection method learns a mixture of appearance models for the object category to be detected and then fuses the hypotheses returned by the individual models. We develop a method that clusters training samples into visual subcategories using a two-stage procedure based on the available metadata and on the visual appearance of the samples. This clustering makes it possible to learn models that are each specialised in recognizing a subset of the dataset, and whose fusion generalizes detection to the whole object class. The performance of the resulting detector is evaluated on several datasets of very high resolution aerial images, at different resolutions and from several places in the world. The contextual semantic segmentation method combines the visual description of a superpixel extracted from the image with contextual information gathered between a superpixel and its neighbors. The context is represented by a graphical model in which the nodes are the visual representations of superpixels and the edges are the contextual relations between two neighbors. Finally, we predict the category of a superpixel from the decisions given by its neighbors in order to make the predictions more robust. We test this method on a dataset of very high resolution aerial images.
|
285 |
Multimodal Sensor Fusion with Object Detection Networks for Automated Driving
Schröder, Enrico (07 January 2022)
Object detection is one of the key tasks of environment perception for highly automated vehicles. To achieve a high level of performance and fault tolerance, automated vehicles are equipped with an array of different sensors to observe their environment. Perception systems for automated vehicles usually rely on Bayesian fusion methods to combine information from different sensors late in the perception pipeline in a highly abstract, low-dimensional representation. Newer research on deep learning object detection proposes fusion of information in higher-dimensional space directly in the convolutional neural networks to significantly increase performance. However, the resulting deep learning architectures violate key non-functional requirements of a real-world safety-critical perception system for a series-production vehicle, notably modularity, fault tolerance and traceability.
This dissertation presents a modular multimodal perception architecture for detecting objects using camera, lidar and radar data that is entirely based on deep learning and that was designed to respect the above requirements. The presented method is applicable to any region-based, two-stage object detection architecture (such as Faster R-CNN by Ren et al.). Information is fused in the high-dimensional feature space of a convolutional neural network. The feature map of a convolutional neural network is shown to be a suitable representation in which to fuse multimodal sensor data and a suitable interface for combining different parts of object detection networks in a modular fashion. The implementation centers around a novel neural network architecture that learns a transformation of feature maps from one sensor modality and input space to another and can thereby map feature representations into a common feature space. It is shown how transformed feature maps from different sensors can be fused in this common feature space to increase object detection performance by up to 10% compared to the unimodal baseline networks. Feature extraction front ends of the architecture are interchangeable, and different sensor modalities can be integrated with little additional training effort. Variants of the presented method are able to predict object distance from monocular camera images and detect objects from radar data.
Results are verified using a large labeled, multimodal automotive dataset created during the course of this dissertation. The processing pipeline and methodology for creating this dataset along with detailed statistics are presented as well.
|
286 |
Spatial Temporal Analysis of Traffic Patterns during the COVID-19 Epidemic by Vehicle Detection using Planet Remote Sensing Satellite Images
Chen, Yulu (07 October 2021)
No description available.
|
287 |
Semantic Segmentation of RGB images for feature extraction in Real Time
Elavarthi, Pradyumna (January 2019)
No description available.
|
288 |
The influence of neural network-based image enhancements on object detection
Pettersson, Eric; Al Khayyat, Muhammed (January 2023)
This thesis investigates the impact of image enhancement techniques on object detection for cars in real-world traffic scenarios. The study focuses on upscaling and light correction treatments and their effects on detecting cars in challenging conditions. Initially, a YOLOv8x model is trained on clear, static car images. The model is then evaluated on a test dataset of images captured during real-world driving by a front-mounted camera on a car, covering various lighting conditions and challenges. The images are then enhanced with these treatments and evaluated again. Within the specific context of this experiment, the results show that upscaling seems to decrease mAP performance, while lighting correction slightly improves accuracy. Additional training on a complex image dataset outperforms all other approaches, highlighting the importance of diverse and realistic training data. These findings contribute to advancing computer vision research for object detection models.
|
289 |
Comparing the effect of random and contextual removal of images on object detection performance
Pettersson, Patrik; Gomez Palomäki, José Gabriel (January 2023)
As datasets grow, the need for automated methods to ensure dataset quality arises. This report presents an experiment conducted on the MSCOCO train2017 dataset to identify image outliers using a force-directed graph built from a co-occurrence context, focusing on the mean average precision and average precision. The experiment involved assigning anomaly scores to images using Euclidean distance and k-means clustering, then creating subsets from which a percentage of the images with the highest anomaly scores were removed. You Only Look Once version 8 models were trained on each subset, and the results showed a promising increase in performance compared to randomly removing images. However, the increase was relatively small, and further research is needed. In terms of future work, other methods of identifying outliers, other datasets, and the uses of contextual information in other areas are discussed.
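A minimal sketch of the removal step in Python, under stated assumptions: each image is taken to already have a 2-D position (for example from a graph layout) and the cluster centres are taken as given, with the anomaly score defined as the Euclidean distance to the nearest centre. The abstract does not specify the report's exact scoring, so this definition, the function names, and the sample data are all hypothetical.

```python
import math


def anomaly_scores(points, centroids):
    """Score each point by its Euclidean distance to the nearest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


def drop_most_anomalous(items, scores, fraction):
    """Return items with the highest-scoring `fraction` of them removed."""
    n_drop = int(len(items) * fraction)
    # Indices ordered from most to least anomalous.
    ranked = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [item for i, item in enumerate(items) if i not in dropped]


# Hypothetical image positions and a single cluster centre.
images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (0.0, 1.0)]
centres = [(0.0, 0.0)]

scores = anomaly_scores(positions, centres)
print(drop_most_anomalous(images, scores, 0.25))  # removes "c.jpg"
```

A random-removal baseline, as used for comparison in the report, would drop the same number of images chosen without regard to score.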
|
290 |
Automatic object detection and tracking for eye-tracking analysis
Cederin, Liv; Bremberg, Ulrika (January 2023)
In recent years, eye-tracking technology has gained considerable attention, facilitating analysis of gaze behavior and human visual attention. However, eye-tracking analysis often requires manual annotation of the objects being gazed upon, making quantitative data analysis a difficult and time-consuming process. This thesis explores object detection and object tracking applied to scene camera footage from mobile eye-tracking glasses. We have evaluated the performance of state-of-the-art object detectors and trackers, resulting in an automated pipeline specialized in detecting and tracking objects in scene videos. Motion blur constitutes a significant challenge with moving cameras, complicating tasks such as object detection and tracking. To address this, we explored two approaches. The first involved retraining object detection models on datasets augmented with motion-blurred images, while the second involved preprocessing the video frames with deblurring techniques. The findings of our research contribute insights into efficient approaches for detecting and tracking objects in scene camera footage from eye-tracking glasses. Out of the technologies we tested, we found that motion deblurring using DeblurGAN-v2, along with a DINO object detector combined with the StrongSORT tracker, achieved the highest accuracies. Furthermore, we present an annotated dataset consisting of frames from recordings with eye-tracking glasses that can be utilized for evaluating object detection and tracking performance.
|