71

Thermal-RGB Sensory Data for Reliable and Robust Perception

El Ahmar, Wassim 29 November 2023 (has links)
The significant advancements and breakthroughs achieved in Machine Learning (ML) have revolutionized the field of Computer Vision (CV), where numerous real-world applications now utilize state-of-the-art advancements in the field. Advanced video surveillance and analytics, entertainment, and autonomous vehicles are a few examples that rely heavily on reliable and accurate perception systems. Deep learning in Computer Vision has come a long way since its breakthrough in 2012 with the introduction of AlexNet. Convolutional Neural Networks (CNNs) have evolved to become more accurate and reliable. This is attributed to advancements in GPU parallel processing and to the recent availability of large-scale, high-quality annotated datasets that allow the training of complex models. However, ML models can only be as good as the data they train on and the data they receive in production. In real-world environments, a perception system often needs to operate in different environments and conditions (weather, lighting, obstructions, etc.). As such, it is imperative for a perception system to utilize information from different types of sensors to mitigate the limitations of individual sensors. In this dissertation, we study the efficacy of using thermal sensors to enhance the robustness of perception systems. We focus on two common vision tasks: object detection and multiple object tracking. Through our work, we prove the viability of thermal sensors as a complement, and in some scenarios a replacement, to RGB cameras. Given their important applications in autonomous vehicles and surveillance, we focus our research on pedestrian and vehicle perception. We also introduce the world's first (to the best of our knowledge) large-scale dataset for pedestrian detection and tracking that includes thermal and corresponding RGB images.
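The abstract does not include implementation details. As an illustration only (all names here are hypothetical, not the dissertation's code), one simple way to let a detector consume both modalities is early fusion: registering a single-channel thermal frame with an RGB frame and stacking them into a four-channel input.

```python
import numpy as np

def early_fuse(rgb, thermal):
    """Stack a registered RGB frame and a single-channel thermal frame into
    a 4-channel array that a detector's first conv layer could consume."""
    if rgb.shape[:2] != thermal.shape[:2]:
        raise ValueError("RGB and thermal frames must be registered to the same size")
    if thermal.ndim == 2:
        thermal = thermal[..., np.newaxis]
    # Normalize each modality independently before concatenation.
    rgb_n = rgb.astype(np.float32) / 255.0
    t_n = (thermal - thermal.min()) / (np.ptp(thermal) + 1e-8)
    return np.concatenate([rgb_n, t_n.astype(np.float32)], axis=-1)

rgb = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
thermal = np.random.rand(480, 640).astype(np.float32)
fused = early_fuse(rgb, thermal)
print(fused.shape)  # (480, 640, 4)
```

This is only one fusion point; mid- or late-fusion architectures combine the modalities after separate feature extraction instead.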
72

Integrating Multiple Deep Learning Models for Disaster Description in Low-Altitude Videos

Wang, Haili 12 1900 (has links)
Computer vision technologies are rapidly improving and becoming more important in disaster response. Most disaster description techniques focus either on identifying objects or on categorizing disasters. In this study, we trained multiple deep neural networks on low-altitude imagery with highly imbalanced and noisy labels. We utilize labeled images from the LADI dataset to formulate a solution to the general problem of disaster classification and object detection. Our research integrated and developed multiple deep learning models that perform both the object detection task and the disaster scene classification task. Our solution is competitive in the TRECVID Disaster Scene Description and Indexing (DSDI) task, demonstrating that it is comparable to other suggested approaches in retrieving disaster-related video clips.
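Retrieving disaster-related clips requires turning per-frame model outputs into one clip-level score. The sketch below (a hypothetical late-fusion step, not the authors' actual pipeline) averages the top-k frame scores, a simple way to keep a clip with a few strong disaster frames from being diluted by uneventful ones.

```python
def clip_score(frame_scores, top_k=3):
    """Aggregate per-frame disaster-class scores into one clip-level score
    by averaging the top-k frames -- one simple late-fusion choice."""
    top = sorted(frame_scores, reverse=True)[:top_k]
    return sum(top) / len(top)

print(round(clip_score([0.1, 0.9, 0.8, 0.2, 0.7]), 3))  # 0.8
```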
73

DEEP LEARNING FOR DETECTING AND CLASSIFYING THE GROWTH STAGES OF WEEDS ON FIELDS

Almalky, Abeer Matar 01 May 2023 (has links) (PDF)
Due to the current and anticipated massive increase in world population, expanding the agriculture cycle is necessary to accommodate the expected human demand. However, weed invasion, a detrimental factor for agricultural production and quality, is a challenge for such agricultural expansion. Therefore, controlling weeds on fields with an accurate, automatic, low-cost, environment-friendly, and real-time weed detection technique is required. Additionally, automating the process of detecting, classifying, and counting weeds by their growth stages is vital for applying appropriate weed-control techniques. The literature review shows that there is a gap in research efforts that handle the automation of weed growth-stage classification using DL models. Accordingly, in this thesis, a dataset of four growth stages of one weed (Consolida regalis) was collected using an unmanned aerial vehicle. In addition, we developed and trained one-stage and two-stage deep learning models: YOLOv5, RetinaNet (with ResNet-101-FPN and ResNet-50-FPN backbones), and Faster R-CNN (with ResNet-101-DC5, ResNet-101-FPN, and ResNet-50-FPN backbones), respectively. Comparing the results of all trained models, we concluded that, on the one hand, the YOLOv5-small model succeeds in detecting weeds and classifying their growth stages in real time with the shortest inference time and the highest recall (0.794), and succeeds in counting the instances of weeds per the four growth stages in real time with a counting time of 0.033 millisecond per frame. On the other hand, RetinaNet with the ResNet-101-FPN backbone shows accurate and precise results in the testing phase (average precision of 87.457). Even though the YOLOv5-large model showed the highest precision value in classifying almost all growth stages in the training phase, it could not detect all objects in the tested images.
As a whole, RetinaNet with the ResNet-101-FPN backbone is accurate and highly precise, while YOLOv5-small has the shortest real inference time for detection and growth-stage classification. Farmers can use the resulting deep learning model to detect, classify, and count weeds by growth stage automatically, and as a result decrease not only the needed time and labor cost but also the use of chemicals to control weeds on fields.
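The per-growth-stage counting step described above reduces, once the detector has run, to tallying detections per class above a confidence threshold. A minimal sketch (stage labels and threshold are hypothetical; this is not the thesis's code):

```python
from collections import Counter

GROWTH_STAGES = ["stage-1", "stage-2", "stage-3", "stage-4"]  # hypothetical labels

def count_per_stage(detections, conf_thresh=0.5):
    """Count detections per growth stage for one frame.
    detections: iterable of (stage_index, confidence) pairs from a detector."""
    counts = Counter(GROWTH_STAGES[i] for i, c in detections if c >= conf_thresh)
    return {s: counts.get(s, 0) for s in GROWTH_STAGES}

dets = [(0, 0.9), (0, 0.4), (2, 0.8), (3, 0.6), (3, 0.7)]
print(count_per_stage(dets))
# {'stage-1': 1, 'stage-2': 0, 'stage-3': 1, 'stage-4': 2}
```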
74

Video-based Motion Analysis and Visualization for Shooting Strategies : A visualization tool for shooting videos

Carlsson, Jonas January 2023 (has links)
Video analysis and visualization are widely used in various applications, including sports analysis, video surveillance, and medical imaging. This study investigates the use of video visualization as a tool in the field of shooting. The study aims to answer the question: "What can be a good visualization strategy for video-based shooting analysis?". To answer this question, a software tool that uses video visualization to analyze shooting videos was created. The visualized videos were then tested by both inexperienced individuals and shooting experts. The implementation steps of the project consisted of recording shooting videos followed by implementing the visualizations. Object detection was used to track the shooting target and extract data. Displayed on the visualized videos were colored markings showing the locations of the aim point of the gun and the target. The extracted data were used to show graphs of relevant shooting metrics. The user tests focused on collecting quantitative and qualitative data from the users. The testers reacted positively to the use of visualized videos as a tool for analyzing shooting performance. The analysis of the testers' responses suggests that video visualization is an effective tool for analyzing shooting videos and holds great promise for future research in the field of shooting. The program implemented in the study has good potential for use as a tool for improving shooting strategies.
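One metric a tool like this could graph is the per-frame distance between the gun's aim point and the detected target centre. The sketch below is illustrative only (the study does not specify its metrics or code here):

```python
import math

def aim_error_series(aim_points, target_centers):
    """Per-frame Euclidean distance (pixels) between the gun's aim point
    and the detected target centre -- the kind of metric the tool graphs."""
    return [math.dist(a, t) for a, t in zip(aim_points, target_centers)]

aim = [(100, 100), (103, 104), (98, 99)]
target = [(100, 100), (100, 100), (100, 100)]
errors = aim_error_series(aim, target)
print([round(e, 2) for e in errors])  # [0.0, 5.0, 2.24]
```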
75

Generalized Landmark Recognition in Robot Navigation

Zhou, Qiang January 2004 (has links)
No description available.
76

Evaluating Methods for Image Segmentation

Dissing, Lukas January 2023 (has links)
This work implements and evaluates different methods of image analysis and manipulation for the purposes of object recognition. It lays the groundwork for possible future projects that could use machine learning on the output for the purposes of analyzing the behaviour of lab mice. Three different methods are presented, implemented on a selection of examples, and evaluated.
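The abstract does not name its evaluation criteria; a standard way to score a segmentation method against a ground-truth mask is intersection-over-union (IoU), sketched below as an illustration only:

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

a = np.zeros((4, 4), bool); a[:2, :] = True   # top two rows
b = np.zeros((4, 4), bool); b[1:3, :] = True  # middle two rows
print(round(iou(a, b), 3))  # 0.333  (overlap 4 px, union 12 px)
```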
77

Graph-based Inference with Constraints for Object Detection and Segmentation

Ma, Tianyang January 2013 (has links)
For many fundamental problems of computer vision, adopting a graph-based framework can be straightforward and very effective. In this thesis, I propose several graph-based inference methods tailored for different computer vision applications. It starts from studying contour-based object detection methods. In particular, we propose a novel framework for contour-based object detection, replacing the Hough-voting framework with dense-subgraph inference. Compared to previous work, we propose a novel shape matching scheme suitable for partial matching of edge fragments. The shape descriptor has the same geometric units as shape context, but our shape representation is not histogram based. The key contribution is that we formulate the grouping of partial matching hypotheses into object detection hypotheses as maximum-clique inference on a weighted graph. Consequently, each detection result not only identifies the location of the target object in the image, but also provides a precise location of its contours, since we transform a complete model contour to the image. We achieve very competitive results on the ETHZ dataset, obtained in a pure shape-based framework, demonstrating that our method achieves not only accurate object detection but also precise contour localization on cluttered backgrounds. Similar to the task of grouping partial matches in the contour-based method, in many computer vision problems we would like to discover a certain pattern among a large amount of data. For instance, in the application of unsupervised video object segmentation, we need to automatically identify the primary object and segment it out in every frame. We propose a novel formulation that selects object region candidates simultaneously in all frames by finding a maximum weight clique in a weighted region graph. The selected regions are expected to have high objectness scores (unary potential) as well as share similar appearance (binary potential).
Since both unary and binary potentials are unreliable, we introduce two types of mutex (mutual exclusion) constraints on regions in the same clique: intra-frame and inter-frame constraints. Both types of constraints are expressed in a single quadratic form. An efficient algorithm is applied to compute the maximal weight cliques that satisfy the constraints. We apply our method to challenging benchmark videos and obtain very competitive results that outperform state-of-the-art methods. We also show that the same maximum-weight-subgraph-with-mutex-constraints formulation can be used to solve various computer vision problems, such as point matching, solving image jigsaw puzzles, and detecting objects using 3D contours. / Computer and Information Science
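To make the objective concrete: a maximum-weight clique with mutex constraints selects a set of mutually compatible nodes (here, region candidates) of maximal total weight, rejecting any set containing a forbidden pair. The brute-force sketch below only illustrates that objective on a toy graph; the thesis uses an efficient algorithm, not exhaustive enumeration, and the node weights here stand in for objectness scores.

```python
from itertools import combinations

def max_weight_clique(weights, edges, mutex):
    """Brute-force maximum-weight clique on a small graph, rejecting any
    clique that contains a mutually exclusive pair (mutex constraint)."""
    edge_set = {frozenset(e) for e in edges}
    nodes = list(weights)
    best, best_w = (), float("-inf")
    for r in range(1, len(nodes) + 1):
        for cand in combinations(nodes, r):
            pairs = [frozenset(p) for p in combinations(cand, 2)]
            if any(p not in edge_set for p in pairs):
                continue  # not a clique
            if any(p in mutex for p in pairs):
                continue  # violates a mutex constraint
            w = sum(weights[n] for n in cand)
            if w > best_w:
                best, best_w = cand, w
    return set(best), best_w

weights = {"a": 3.0, "b": 2.0, "c": 2.5, "d": 1.0}   # objectness scores
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
mutex = {frozenset(("a", "b"))}  # e.g. two overlapping regions in one frame
clique, w = max_weight_clique(weights, edges, mutex)
print(sorted(clique), w)  # ['a', 'c'] 5.5
```

Without the mutex constraint the best clique would be {a, b, c} (weight 7.5); the constraint forces the cheaper but consistent selection {a, c}.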
78

Shape Based Object Detection and Recognition in Silhouettes and Real Images

Yang, Xingwei January 2011 (has links)
Shape is essential for detecting and recognizing objects. It is robust to illumination and color changes. Humans can recognize objects based on shape alone, so shape-based object detection and recognition methods have been popular for many years. Due to the problem of segmentation, some researchers have worked on silhouettes instead of real images. The main problem in this area is object recognition, and the difficulty is handling shape articulation and distortion. Previous methods mainly focus on one-to-one shape similarity measurement, which ignores context information between shapes. Instead, we utilize graph-transduction methods to reveal the intrinsic relation between shapes on the 'shape manifold'. Our methods consider the context information in the dataset, which substantially improves performance. To better describe the manifold structure, we also propose a novel method that adds synthetic data points to densify the data manifold. The experimental results have shown the advantage of the algorithm. Moreover, a novel diffusion process on the Tensor Product Graph is carried out to learn better affinities between data. This is also used for shape retrieval, reaching the best results ever reported on the MPEG-7 dataset. As shapes are important and helpful for object detection and recognition in real images, many methods have used shapes to detect and recognize objects. There are two important parts of shape-based methods: model construction, and object detection and recognition. Most current methods are based on hand-selected models, which is helpful but not extendable. To solve this problem, we propose to construct the model by shape matching between some silhouettes and one hand-decomposed silhouette. This weakly supervised method can be used not only to learn the models in one object class, but also to transfer the structure knowledge to other classes that have a structure similar to the hand-decomposed silhouette. The other problem is detecting and recognizing objects.
Many methods search images with a sliding window to detect objects, which can find the global solution but with high complexity. Instead, we use sampling methods to reduce the complexity. The method we utilized is the particle filter, which is popular in robot mapping and localization. We modified the standard particle filter to make it suitable for static observations, and it is very helpful for object detection. Moreover, the usage of the particle filter is extended to solve the jigsaw puzzle problem, where puzzle pieces are square image patches. The proposed method is able to reach much better results than the method with Loopy Belief Propagation. / Computer and Information Science
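The sampling idea above can be sketched as a particle filter searching a static observation score instead of a temporal sequence. Everything below is illustrative only (a 1D toy with an annealed diffusion step, not the thesis's modified filter): particles are resampled in proportion to a detector-response function and diffused, concentrating near the response peak far more cheaply than an exhaustive sliding-window scan.

```python
import random

def particle_filter_locate(score, n=500, iters=30, width=100, sigma=5.0, seed=0):
    """Importance-resampling search for the location maximising a static
    observation score -- a sampling alternative to sliding-window search."""
    rng = random.Random(seed)
    particles = [rng.uniform(0, width) for _ in range(n)]
    for _ in range(iters):
        weights = [score(p) for p in particles]
        # Resample proportionally to the observation score, then diffuse.
        particles = rng.choices(particles, weights=weights, k=n)
        particles = [min(max(p + rng.gauss(0, sigma), 0), width) for p in particles]
        sigma *= 0.8  # anneal the diffusion, since the observation is static
    return sum(particles) / n

# Hypothetical detector response peaking at x = 42.
score = lambda x: 1.0 / (1.0 + (x - 42.0) ** 2)
est = particle_filter_locate(score)
print(round(est, 1))  # close to 42
```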
79

3D Object Detection from Images

Simonelli, Andrea 28 September 2022 (has links)
Remarkable advancements in the fields of Computer Vision, Artificial Intelligence, and Machine Learning have led to unprecedented breakthroughs in what machines are able to achieve. In many tasks, such as Image Classification, they are now capable of even surpassing human performance. While this is truly outstanding, there are still many tasks in which machines lag far behind. Walking in a room, driving on a highway, or grabbing some food, for example. These are all actions that feel natural to us but can be quite unfeasible for them. Such actions require identifying and localizing objects in the environment, effectively building a robust understanding of the scene. Humans easily gain this understanding thanks to their binocular vision, which provides a high-resolution and continuous stream of information to our brain, which efficiently processes it. Unfortunately, things are much different for machines. With cameras instead of eyes and artificial neural networks instead of a brain, gaining this understanding is still an open problem. In this thesis we will not focus on solving this problem as a whole, but instead delve into a very relevant part of it. We will analyze how to make machines able to identify and precisely localize objects in 3D space by relying only on visual input, i.e., 3D Object Detection from Images. One of the most complex aspects of Image-based 3D Object Detection is that it inherently requires the solution of many different sub-tasks, e.g., the estimation of the object's distance and its rotation. A first contribution of this thesis is an analysis of how these sub-tasks are usually learned, highlighting a destructive behavior which limits the overall performance, and the proposal of an alternative learning method that avoids it.
A second contribution is the discovery of a flaw in the computation of the metric widely used in the field, affecting the re-computation of the performance of all published methods, and the introduction of a novel un-flawed metric which has now become the official one. A third contribution focuses on one particular sub-task, the estimation of the object's distance, which is demonstrated to be the most challenging. Thanks to the introduction of a novel approach which normalizes the appearance of objects with respect to their distance, detection performance can be greatly improved. A last contribution of the thesis is the critical analysis of the recently proposed Pseudo-LiDAR methods. Two flaws in their training protocol have been identified and analyzed. On top of this, a novel method able to achieve state-of-the-art performance in Image-based 3D Object Detection has been developed.
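The abstract does not spell out the metric flaw, but as background: detection benchmarks typically report interpolated average precision, sampling precision at a fixed set of recall levels. The sketch below (illustrative only, not the thesis's computation) shows how the choice of sampled recall levels alone can hand a detector with a single correct detection a non-trivial score.

```python
def interpolated_ap(recalls, precisions, num_points):
    """Interpolated average precision: mean of the best precision achieved
    at or beyond each of `num_points` equally spaced recall levels."""
    ap = 0.0
    for i in range(num_points):
        r = i / (num_points - 1)  # samples include recall = 0
        p = max((prec for rec, prec in zip(recalls, precisions) if rec >= r),
                default=0.0)  # 0 if that recall level is never reached
        ap += p / num_points
    return ap

# A detector with one correct, high-confidence detection out of 10
# ground-truth objects: recall never exceeds 0.1, precision 1.0 there.
recalls, precisions = [0.1], [1.0]
ap11 = interpolated_ap(recalls, precisions, 11)
print(round(ap11, 4))  # 0.1818 -- 2 of the 11 samples (r=0 and r=0.1) score 1.0
```

The point of the sketch is only that the set of sampled recall levels materially changes the score; the actual flaw and corrected metric are detailed in the thesis, not reproduced here.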
80

Analyzing and Navigating Electronic Theses and Dissertations

Ahuja, Aman 21 July 2023 (has links)
Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the scholarly community. Millions of ETDs are now publicly available online, often through one of many digital libraries. However, since a majority of these digital libraries are institutional repositories with the objective being content archiving, they often lack end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier. In recent years, with advances in the field of machine learning for text data, several techniques have been proposed to support such end-user services. However, limited research has been conducted towards integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, as well as at supporting end-user services for digital libraries, such as document browsing and long document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models that are tailored to the ETD data, we introduce several new datasets from the ETD domain. To minimize the resources needed to develop the high-quality datasets required for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support the various digital library services such as search, browsing, and recommendation.
The key contributions of this research are as follows:
- A system to help with parsing long scholarly documents such as ETDs by means of object-detection methods trained to extract digital objects from long documents. The parsed documents can be used for further downstream tasks such as long document navigation, figure and/or table search, etc.
- Datasets to support supervised training of object detection models on scholarly documents of multiple types, such as born-digital and scanned. In addition to manually annotated datasets, a framework (along with the resulting dataset) for AI-aided annotation is also proposed.
- A web-based system for information extraction from long PDF theses and dissertations into a structured format such as XML, aimed at making scholarly literature more accessible to users with disabilities.
- A topic-modeling based framework to support exploration tasks such as searching and/or browsing documents (and document portions, e.g., chapters) by topic, document recommendation, topic recommendation, and describing temporal topic trends.
/ Doctor of Philosophy / Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the research community. Millions of ETDs are now publicly available online, often through one of many online digital libraries. However, since a majority of these digital libraries are institutional repositories with the objective being content archiving, they often lack end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier and accessible.
Several advances in the field of machine learning for text data in recent years have led to the development of techniques that can serve as the backbone of such end-user services. However, limited research has been conducted towards integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs by parsing the information contained in the long PDF documents that make up ETDs into a more compute-friendly format. This would enable researchers and developers to build end-user services for digital libraries. We also propose a framework to support document browsing and long document navigation, which are some of the important end-user services required in digital libraries.
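Browsing documents by topic, as described above, ultimately needs a short set of descriptive terms per document. The tiny TF-IDF sketch below is only a stand-in for the topic-modeling framework (the corpus and function names are hypothetical): it surfaces the terms most distinctive of each document, the kind of labels a browsing interface could display.

```python
import math
from collections import Counter

def top_terms(docs, k=2):
    """Tiny TF-IDF sketch: the k most distinctive terms per document,
    a stand-in for the topic labels a browsing interface could display."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    n = len(docs)
    out = []
    for doc in tokenized:
        tf = Counter(doc)
        scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
        # Highest score first; ties broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        out.append([t for t, _ in ranked[:k]])
    return out

docs = ["neural networks for object detection",
        "topic models for document browsing",
        "graph inference for object detection neural networks"]
print(top_terms(docs))
# [['detection', 'networks'], ['browsing', 'document'], ['graph', 'inference']]
```

Real topic models (e.g., LDA) go further by learning latent topics shared across documents rather than scoring surface terms, which is closer to what the proposed framework would need for topic trends and recommendation.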
