591

Anomaly Detection for Product Inspection and Surveillance Applications / Anomalidetektion för produktinspektions- och övervakningsapplikationer

Thulin, Peter January 2015 (has links)
Anomaly detection is the general problem of detecting unusual patterns or events in data. This master's thesis investigates anomaly detection in two different applications: product inspection using a camera, and surveillance using a 2D laser scanner. The first part of the thesis presents a system for automatic visual defect inspection. The system is based on aligning images of the product to a common template and performing pixel-wise comparisons. The system is trained using only images of products that are defined as normal, i.e. non-defective products. The visual properties of the inspected products are modelled using three different methods, and the performance of the system and the methods has been evaluated on four different datasets. The second part of the thesis presents a surveillance system based on a single laser range scanner. The system is able to detect certain anomalous events based on the time, position, and velocities of individual objects in the scene. The practical usefulness of the system is made plausible by a qualitative evaluation on unlabelled data.
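The align-then-compare pipeline is concrete enough to sketch. The abstract does not spell out the three modelling methods, so the sketch below assumes the simplest plausible one, a per-pixel Gaussian fitted over aligned grayscale images of normal products; the OpenCV ECC alignment is likewise an illustrative choice, not necessarily the thesis's.

```python
import cv2
import numpy as np

def align_to_template(image, template):
    # Estimate an affine warp via OpenCV's ECC maximization and resample
    # the image into the template's frame. Inputs are (H, W) grayscale.
    t = template.astype(np.float32)
    i = image.astype(np.float32)
    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    _, warp = cv2.findTransformECC(t, i, warp, cv2.MOTION_AFFINE,
                                   criteria, None, 5)
    return cv2.warpAffine(i, warp, template.shape[::-1])

def fit_normal_model(normal_images, template):
    # Train on normal products only: per-pixel mean and std of aligned images.
    aligned = np.stack([align_to_template(im, template) for im in normal_images])
    return aligned.mean(axis=0), aligned.std(axis=0) + 1e-6

def defect_mask(image, template, mu, sigma, k=4.0):
    # Flag pixels deviating from the normal model by more than k sigma.
    aligned = align_to_template(image, template)
    return np.abs(aligned - mu) / sigma > k
```

Thresholding per-pixel z-scores is only one of several plausible comparison models; the thesis evaluates three.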
592

Shape From Shading Analysis By Synthesis

Sathish, Sriram J 10 1900 (has links) (PDF)
No description available.
593

Detect Dense Products on Grocery Shelves with Deep Learning Techniques

Li Shen (8735982) 12 October 2021 (has links)
Object detection is a major area of computer vision, and increasing its efficacy and accuracy has always been a central goal. Object detection serves many broad areas, including self-driving, manufacturing, and retail stores. However, detecting dense objects has received comparatively little attention, even though dense and small object detection is relevant to many real-world scenarios, for example in retail stores and surveillance systems. Humans are slow and inaccurate at counting and auditing crowded products on shelves, which motivates automatic detection of densely packed shelf products, a research problem with direct industrial relevance. In this thesis, we fine-tune CenterNet as a detector for objects on shelves. To validate the effectiveness of the CenterNet architecture, we collected the Bottle dataset, consisting of images of real-world supermarket shelves in different environments, and compared performance on it under many different conditions. The ResNet-101 (colored+PT) backbone achieved the best CenterNet result, outperforming the other network architectures. We showed that perspective transformation can be applied with state-of-the-art detectors, which addresses the poor results detectors otherwise achieve on strongly angled images. We concluded that color information does contribute to detection performance on shelves, but less than the geometric information available for learning. The detection accuracy achieved with CenterNet meets industry accuracy requirements.
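The perspective-transformation step mentioned above can be illustrated directly: rectify the strongly angled shelf photo before running the detector. In the sketch below, the corner coordinates, output size, and file name are invented for illustration, not figures from the thesis.

```python
import cv2
import numpy as np

# Hypothetical corners of the shelf region in the angled photo,
# ordered top-left, top-right, bottom-right, bottom-left.
src = np.float32([[120, 80], [980, 140], [1010, 690], [90, 640]])
w, h = 900, 600  # output size chosen to roughly preserve shelf proportions
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

image = cv2.imread("shelf.jpg")                 # placeholder input image
H = cv2.getPerspectiveTransform(src, dst)       # 3x3 homography
rectified = cv2.warpPerspective(image, H, (w, h))
# `rectified` is then fed to the detector instead of the raw angled image.
```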
594

Object Detection and Semantic Segmentation Using Self-Supervised Learning

Gustavsson, Simon January 2021 (has links)
In this thesis, three well-known self-supervised methods have been implemented and trained on road scene images. The three so-called pretext tasks RotNet, MoCo v2, and DeepCluster were used to train a neural network self-supervised. The self-supervised networks were then evaluated with different amounts of labeled data on two downstream tasks, object detection and semantic segmentation. The performance of the self-supervised methods is compared to networks trained from scratch on the respective downstream task. The results show that it is possible to achieve a performance increase using self-supervision on a dataset containing road scene images only. When only a small amount of labeled data is available, the performance increase can be substantial, e.g., a mIoU improvement from 33 to 39 when training semantic segmentation on 1750 images with a RotNet pre-trained backbone compared to training from scratch. However, when a large amount of labeled images is available (>70,000 images), the self-supervised pretraining does not increase performance as much, or at all.
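Of the three pretext tasks, RotNet is the simplest to sketch: the network is trained to predict which of four rotations was applied to an unlabeled image, and the learned backbone is then reused for the downstream task. The PyTorch sketch below is a minimal illustration of the idea, not the thesis's implementation; the backbone and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torchvision

def rotate_batch(images):
    # Rotate each (C, H, W) image by a random multiple of 90 degrees;
    # the rotation index (0..3) is the free, self-supervised label.
    rotations = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                           for img, k in zip(images, rotations)])
    return rotated, rotations

backbone = torchvision.models.resnet50(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 4)  # 4 rotation classes

optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def pretext_step(images):
    # One self-supervised training step on unlabeled road scene images.
    rotated, labels = rotate_batch(images)
    loss = criterion(backbone(rotated), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After pretraining, `backbone.fc` would be replaced by the detection or segmentation head and fine-tuned on the labeled subset.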
595

Programming methodologies for ADAS applications in parallel heterogeneous architectures / Méthodologies de programmation d'applications ADAS sur des architectures parallèles et hétérogènes

Dekkiche, Djamila 10 November 2017 (has links)
Computer Vision (CV) is crucial for understanding and analyzing the driving scene in order to build more intelligent Advanced Driver Assistance Systems (ADAS). However, implementing CV-based ADAS in a real automotive environment is not straightforward: CV algorithms combine the challenges of high computing performance and algorithmic accuracy. To meet these requirements, new heterogeneous architectures have been developed, consisting of several processing units with different parallel computing technologies, such as GPUs and dedicated accelerators. To better exploit the performance of such architectures, different languages are required depending on the underlying parallel execution model. In this work, we investigate various parallel programming methodologies using a complex case study based on stereo vision. We present the relevant features and limitations of each approach, and evaluate the programming tools employed, mainly in terms of computing performance and programming productivity. The feedback from this research is crucial for the development of future CV algorithms well matched to parallel architectures, with the best compromise between computing performance, algorithmic accuracy, and programming effort.
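The stereo-vision case study rests on a kernel whose data parallelism is precisely what the studied architectures exploit: every pixel's disparity search is independent. The abstract does not name the toolchains compared, so as a language-neutral illustration the sketch below shows a toy block-matching disparity computation in NumPy rather than any of the thesis's parallel implementations.

```python
import cv2
import numpy as np

def disparity_map(left, right, max_disp=16, win=5):
    # left, right: rectified (H, W) grayscale images. For each candidate
    # disparity d, shift the right image, compute per-pixel absolute
    # differences, aggregate them over a window with a box filter, and
    # keep the d with the lowest cost at each pixel.
    h, w = left.shape
    best = np.zeros((h, w), dtype=np.int32)
    best_cost = np.full((h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        shifted = np.zeros_like(right, dtype=np.float32)
        shifted[:, d:] = right[:, :w - d]
        cost = cv2.boxFilter(np.abs(left.astype(np.float32) - shifted),
                             -1, (win, win))
        improved = cost < best_cost
        best[improved] = d
        best_cost[improved] = cost[improved]
    return best
```

Each pixel's winner-take-all decision depends only on its own window, which is why the kernel maps naturally onto GPUs and other data-parallel units.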
596

DRONE CLASSIFICATION WITH MOTION AND APPEARANCE FEATURE USING CONVOLUTIONAL NEURAL NETWORKS

Eunsuh Lee (8981213) 17 June 2020 (has links)
With the advancement of Unmanned Aerial Vehicle (UAV) technology, UAVs have become accessible to the public. However, recent world events have highlighted that the rapid increase in UAVs brings with it a threat to public privacy and security, so it is important to consider how to counter UAV threats in order to protect our privacy and safety. This study aims to provide an affordable alternative to expensive detection systems by using 2D optical sensors that can be easily obtained. One of the main challenges for aerial object recognition with computer vision is discriminating other flying objects from the targets at far distances: classification is difficult when the flying object appears as a small cluster of dark pixels in the frame. Motion features can help the system extract discriminative features so that the classifier can distinguish a UAV from other objects, such as birds. This study therefore proposes a drone detection system that combines two sources of information, appearance and motion, to overcome the limitations of a vision-based system.
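A common way to combine appearance and motion cues is a two-stream network. The abstract does not detail the architecture used, so the PyTorch sketch below is an assumed illustration: one ResNet stream on the RGB crop, one on a stack of frame differences as a cheap motion representation, fused before the drone/bird decision.

```python
import torch
import torch.nn as nn
import torchvision

class TwoStreamClassifier(nn.Module):
    # Appearance stream: standard ResNet-18 on the 3-channel RGB crop.
    # Motion stream: ResNet-18 with conv1 widened to accept a stack of
    # `motion_frames` grayscale frame differences. Features are
    # concatenated and classified by a linear head.
    def __init__(self, num_classes=2, motion_frames=5):
        super().__init__()
        self.appearance = torchvision.models.resnet18(weights=None)
        self.appearance.fc = nn.Identity()
        self.motion = torchvision.models.resnet18(weights=None)
        self.motion.conv1 = nn.Conv2d(motion_frames, 64, kernel_size=7,
                                      stride=2, padding=3, bias=False)
        self.motion.fc = nn.Identity()
        self.head = nn.Linear(512 * 2, num_classes)

    def forward(self, rgb_crop, diff_stack):
        feats = torch.cat([self.appearance(rgb_crop),
                           self.motion(diff_stack)], dim=1)
        return self.head(feats)

model = TwoStreamClassifier()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 5, 224, 224))
```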
597

Leveraging Big Data and Deep Learning for Economical Condition Assessment of Wastewater Pipelines

Srinath Shiv Kumar (8782508) 30 April 2020 (has links)
Sewer pipelines are an essential component of wastewater infrastructure and serve as the primary means for transporting wastewater to treatment plants. In the face of increasing demands and declining budgets, municipalities across the US face unprecedented challenges in maintaining current service levels of the 800,000 miles of public sewer pipes. Inadequate maintenance of sewer pipes leads to inflow and infiltration, sanitary sewer overflows, and sinkholes, which threaten human health and are expensive to correct. Accurate condition information from sewers is essential for planning maintenance, repair, and rehabilitation activities and ensuring the longevity of sewer systems. Currently, this information is obtained through visual closed-circuit television (CCTV) inspections and deterioration modeling of sewer pipelines. CCTV inspection facilitates the identification of defects in pipe walls, whereas deterioration modeling estimates the remaining service life of pipes based on their current condition. However, both methods have drawbacks that limit their effective usage for sewer condition assessment. For instance, CCTV inspections tend to be labor intensive, costly, and time consuming, with the accuracy of collected data depending on the operator's experience and skill level. Current deterioration modeling approaches are unable to incorporate spatial information about pipe deterioration, such as the relative locations, densities, and clustering of defects, which play a crucial role in pipe failure. This study attempts to leverage recent advances in deep learning and data mining to address these limitations of CCTV inspection and deterioration modeling, and consists of three objectives.

The first objective of this study seeks to develop algorithms for automated defect interpretation, to improve the speed and consistency of sewer CCTV inspections. The development, calibration, and testing of the algorithms followed an iterative approach that began with a defect classification system using a 5-layer convolutional neural network (CNN) and evolved into a two-step defect classification and localization framework, which combines the ResNet-34 CNN and the Faster R-CNN object detection model. This study also demonstrates the use of a feature visualization technique, called class activation mapping (CAM), as a diagnostic tool to improve the accuracy of CNNs in defect classification tasks, thereby representing a crucial first step in using CNN interpretation techniques to develop improved models for sewer defect identification.

Extending upon the development of automated defect interpretation algorithms, the second objective of this study attempts to facilitate autonomous navigation of sewer CCTV robots. To overcome Global Positioning System (GPS) signal unavailability inside underground pipes, this study developed a vision-based algorithm that combines deep learning-based object detection with optical flow for estimating the orientation of sewer CCTV cameras. This algorithm can enable inspection robots to estimate their trajectories and make corrective actions while autonomously traversing pipes. Hence, considered together, the first two objectives of this study pave the way for future inspection technologies that combine automated defect interpretation with autonomous navigation of sewer CCTV robots.

The third and final objective of this study seeks to develop a novel methodology that incorporates spatial information about defects (such as their locations, densities, and co-occurrence characteristics) when assessing sewer deterioration. A methodology called Defect Cluster Analysis (DCA) was developed to mine sewer inspection reports and identify pipe segments that contain clusters of defects (i.e., multiple defects in proximity). Additionally, an approach to mining co-occurrence characteristics among defects is introduced (i.e., identification of defects that frequently occur together). Together, the two approaches (DCA and co-occurrence mining) address a key limitation of existing deterioration modeling approaches, namely the lack of consideration of spatial information about defects, thereby leading to new insights into pipeline rehabilitation decision-making.

The algorithms and approaches presented in this dissertation have the potential to improve the speed, accuracy, and consistency of assessing sewer pipeline deterioration, leading to better prioritization strategies for maintenance, repair, and rehabilitation. The automated defect interpretation algorithms proposed in this study can be used to assign the subjective and error-prone task of defect identification to computer processes, thereby enabling human operators to focus on decision-making aspects, such as deciding whether to repair or rehabilitate a pipe. Automated interpretation of sewer CCTV videos could also facilitate re-evaluation of historical sewer inspection videos, which would be infeasible if performed manually. The information gleaned from re-evaluating these videos could generate insights into pipe deterioration, leading to improved deterioration models. The algorithms for autonomous navigation could enable the development of completely autonomous inspection platforms that utilize unmanned aerial vehicles (UAVs) or similar technologies to facilitate rapid assessment of sewers. Furthermore, these technologies could be integrated into wireless sensor networks, paving the way for real-time condition monitoring of sewer infrastructure. The DCA approach could be used as a diagnostic tool to identify specific sections in a pipeline system that have a high propensity for failure due to the existence of multiple defects in proximity. When combined with contextual information (e.g., soil properties, water table levels, and presence of large trees), DCA could provide insights about the likelihood of void formation due to sand infiltration. The DCA approach could also be used to periodically determine how the distribution of defects and their clustering progresses with time, and, when examined alongside contextual data (e.g., soil properties, water table levels, presence of trees), could reveal trends in pipeline deterioration.
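Class activation mapping, used above as a diagnostic tool, can be sketched in a few lines for a ResNet-style classifier: the final convolutional feature maps are weighted by the classifier weights of the predicted class, yielding a coarse heatmap of the regions that drove the prediction. The sketch below uses a randomly initialized ResNet-34 for illustration rather than the study's trained defect classifier.

```python
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet34(weights=None)  # stand-in for the trained model
model.eval()

features = {}
def hook(module, inputs, output):
    # Capture the last convolutional activations, shape (1, 512, H, W).
    features["maps"] = output
model.layer4.register_forward_hook(hook)

def class_activation_map(image):
    # image: (3, 224, 224) tensor. Returns a (H, W) heatmap in [0, 1]
    # plus the predicted class index.
    with torch.no_grad():
        logits = model(image.unsqueeze(0))
    cls = logits.argmax(dim=1).item()
    maps = features["maps"][0]           # (512, H, W)
    weights = model.fc.weight[cls]       # (512,) weights for predicted class
    cam = F.relu(torch.einsum("c,chw->hw", weights, maps))
    return cam / (cam.max() + 1e-8), cls

cam, predicted = class_activation_map(torch.randn(3, 224, 224))
```

Upsampled to the input resolution and overlaid on the frame, such a heatmap shows whether the network attends to the defect itself or to spurious context.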
598

Automated Disconnected Towing System

Yaqin Wang (8797037) 06 May 2020 (has links)
Towing capacity limits a vehicle's ability to tow, and it is usually costly to buy or even rent a vehicle that can tow a given weight. A widely swaying trailer is also one of the main causes of accidents involving towing trailers. This study proposes an affordable Automated Disconnected Towing System (ADTS) that requires no physical connection between the leading vehicle and the trailer vehicle, using only a computer vision system. The ADTS contains two main parts: a leading vehicle that performs lane detection, and a trailer vehicle that automatically follows the leading vehicle by detecting its license plate. The trailer vehicle adjusts its speed according to its distance from the leading vehicle.
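Following by license plate implies a monocular distance estimate. The abstract does not state how distance is computed, so the sketch below assumes a standard pinhole-model estimate from the plate's pixel width, coupled with a simple proportional speed controller; all constants are placeholders, not figures from the thesis.

```python
# Pinhole model: apparent plate width in pixels shrinks inversely with
# distance, so distance = real_width * focal_length / pixel_width.
PLATE_WIDTH_M = 0.52      # physical plate width in metres (assumed)
FOCAL_LENGTH_PX = 1400.0  # camera focal length in pixels (assumed)

def distance_to_leader(plate_bbox):
    # plate_bbox: (x1, y1, x2, y2) from the license-plate detector.
    x1, _, x2, _ = plate_bbox
    pixel_width = max(x2 - x1, 1)
    return PLATE_WIDTH_M * FOCAL_LENGTH_PX / pixel_width

def speed_command(distance, target=8.0, gain=0.5, v_max=20.0):
    # Proportional control: speed up when the gap exceeds the target
    # following distance, slow to a stop as the gap closes.
    return min(max(gain * (distance - target), 0.0), v_max)

print(speed_command(distance_to_leader((400, 300, 460, 330))))
```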
599

Deep Learning for Computer Vision and its Application to Machine Perception of Hand and Object

Sangpil Kim (9745326) 15 December 2020 (has links)
The advances in computing power and artificial intelligence have made applications such as augmented reality/virtual reality (AR/VR) and smart factories possible. In smart factories, robots interact with workers, and AR/VR devices are used for skill transfer. In order to enable these types of applications, a computer needs to recognize the user's hand and body movement together with objects and their interactions. In this regard, machine perception of hands and objects is the first step toward human and computer integration, because personal activity is represented by the interaction of objects and hands. For machine perception of objects and hands, vision sensors are widely used in a wide range of industrial applications, since visual information provides non-contact input signals. For these reasons, computer vision-oriented machine perception has been researched extensively. However, due to the complexity of object space and hand movement, machine perception of hands and objects remains a challenging problem.

Recently, deep learning has been introduced with groundbreaking results in the computer vision domain, addressing many challenging problems and significantly improving the performance of AI in many tasks. The success of deep learning algorithms depends on the learning strategy and on the quality and quantity of the training data. Therefore, in this thesis, we tackle machine perception of hands and objects from four aspects: learning the underlying structure of 2D data, fusing surface and volume content of a 3D object, developing an annotation tool for mechanical components, and using thermal information of bare hands. More broadly, we improve the machine perception of interacting hands and objects by developing a learning strategy and a framework for large-scale dataset creation.

For the learning strategy, we use a conditional generative model, which learns the conditional distribution of the dataset by minimizing the gap between the data distribution and the model distribution for hands and objects. First, we propose an efficient conditional generative model for 2D images that can traverse the latent space given a conditional vector. Subsequently, we develop a conditional generative model for 3D space that fuses volume and surface representations and learns the association of functional parts. These methods improve machine perception of objects and hands not only in 2D images but also in 3D space. However, the performance of deep learning algorithms is positively correlated with the quality and quantity of the datasets, which motivates us to develop a large-scale dataset creation framework.

In order to leverage the learning strategies of deep learning algorithms, we develop annotation tools that can establish large-scale datasets for objects and hands, and we evaluate existing deep learning methods with extensive performance analysis. For the object dataset, we establish a taxonomy of mechanical components and a web-based annotation tool, and with this framework we create a large-scale mechanical components dataset, on which we benchmark seven different machine perception algorithms for 3D objects. For hand annotation, we propose a novel data curation method for creating a pixel-wise hand segmentation dataset, which uses thermal information and hand geometry to identify and segment hands from objects and backgrounds. We also introduce a data fusion method that fuses thermal information and RGB-D data for the machine perception of hands while they interact with objects.
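The thermal cue behind the hand-segmentation curation method can be sketched simply: bare skin sits in a narrow temperature band, so thresholding a thermal image registered to the RGB frame separates hands from cooler objects and background. The temperature band below is an assumed illustrative value, not the thesis's calibration.

```python
import numpy as np

# Assumed skin-temperature band in degrees Celsius, for illustration only.
SKIN_MIN_C, SKIN_MAX_C = 30.0, 37.0

def hand_mask(thermal_c):
    # thermal_c: (H, W) array of per-pixel temperatures, registered to the
    # RGB frame. Returns a boolean mask that seeds the hand segmentation;
    # in practice it would be refined with hand-geometry constraints.
    return (thermal_c >= SKIN_MIN_C) & (thermal_c <= SKIN_MAX_C)

thermal = np.random.uniform(20.0, 40.0, size=(480, 640))  # synthetic frame
print(hand_mask(thermal).mean())  # fraction of pixels flagged as skin
```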
600

Learning Structured Representations for Understanding Visual and Multimedia Data

Zareian, Alireza January 2021 (has links)
Recent advances in Deep Learning (DL) have achieved impressive performance in a variety of Computer Vision (CV) tasks, leading to an exciting wave of academic and industrial efforts to develop Artificial Intelligence (AI) facilities for every aspect of human life. Nevertheless, there are inherent limitations in the understanding ability of DL models, which limit the potential of AI in real-world applications, especially in the face of complex, multimedia input. Despite tremendous progress in solving basic CV tasks, such as object detection and action recognition, state-of-the-art CV models can merely extract a partial summary of visual content, which falls short of a comprehensive understanding of what happens in the scene. This is partly due to the oversimplified definition of CV tasks, which often ignores the compositional nature of semantics and scene structure. Even less studied is how to understand the content of multiple modalities, which requires processing visual and textual information in a holistic and coordinated manner, and extracting interconnected structures despite the semantic gap between the two modalities. In this thesis, we argue that a key to improving the understanding capacity of DL models in visual and multimedia domains is to use structured, graph-based representations to extract and convey semantic information more comprehensively. To this end, we explore a variety of ideas for defining more realistic DL tasks in both visual and multimedia domains, and propose novel methods to solve those tasks by addressing several fundamental challenges, such as weak supervision, discovery and incorporation of commonsense knowledge, and scaling up vocabulary. More specifically, inspired by the rich literature on semantic graphs in Natural Language Processing (NLP), we explore innovative scene understanding tasks and methods that describe images using semantic graphs, which reflect the scene structure and interactions between objects.

In the first part of this thesis, we present progress towards such graph-based scene understanding solutions, which are more accurate, need less supervision, and have more human-like common sense compared to the state of the art.

In the second part, we extend our results on graph-based scene understanding to the multimedia domain, by incorporating recent advances in NLP and CV, and developing a new task and method from the ground up, specialized for joint information extraction in the multimedia domain. We address the inherent semantic gap between visual content and text by creating high-level graph-based representations of images, and by developing a multitask learning framework that establishes a common, structured semantic space for representing both modalities.

In the third part, we explore another extension of our scene understanding methodology, to open-vocabulary settings, in order to make scene understanding methods more scalable and versatile. We develop visually grounded language models that use naturally supervised data to learn the meaning of all words, and transfer that knowledge to CV tasks such as object detection with little supervision. Collectively, the proposed solutions and empirical results set a new state of the art for the structured semantic comprehension of visual and multimedia content, in terms of accuracy, efficiency, scalability, and robustness.
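The graph-based representation the thesis argues for can be made concrete with a minimal scene-graph structure: detected objects as nodes, pairwise relations as labeled edges. The sketch below is a generic illustration with invented instances, not the thesis's data model.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraphNode:
    label: str    # object category, e.g. "person"
    bbox: tuple   # (x1, y1, x2, y2) in image coordinates

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (subj_idx, predicate, obj_idx)

    def add_relation(self, subj, predicate, obj):
        # Record a directed, labeled relation between two object nodes.
        self.edges.append((subj, predicate, obj))

# Invented example: "person riding bicycle" as a two-node, one-edge graph.
g = SceneGraph()
g.nodes += [SceneGraphNode("person", (10, 20, 120, 300)),
            SceneGraphNode("bicycle", (90, 150, 260, 320))]
g.add_relation(0, "riding", 1)
print(g.edges)
```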
