81 |
Analyzing and Navigating Electronic Theses and Dissertations
Ahuja, Aman 21 July 2023
Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the scholarly community. Millions of ETDs are now publicly available online, often through one of many digital libraries. However, since a majority of these digital libraries are institutional repositories with the objective being content archiving, they often lack end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier. In recent years, with advances in the field of machine learning for text data, several techniques have been proposed to support such end-user services. However, limited research has been conducted towards integrating such techniques with digital libraries.
This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, as well as to support end-user services for digital libraries, such as document browsing and long document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models that are tailored to the ETD data, we introduce several new datasets from the ETD domain. To minimize the resources required to develop high quality training datasets required for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support the various digital library services such as search, browsing, and recommendation. The key contributions of this research are as follows:
- A system to help with parsing long scholarly documents such as ETDs by means of object-detection methods trained to extract digital objects from long documents. The parsed documents can be used for further downstream tasks such as long document navigation, figure and/or table search, etc.
- Datasets to support supervised training of object detection models on scholarly documents of multiple types, such as born-digital and scanned. In addition to manually annotated datasets, a framework (along with the resulting dataset) for AI-aided annotation also is proposed.
- A web-based system for information extraction from long PDF theses and dissertations, into a structured format such as XML, aimed at making scholarly literature more accessible to users with disabilities.
- A topic-modeling based framework to support exploration tasks such as searching and/or browsing documents (and document portions, e.g., chapters) by topic, document recommendation, topic recommendation, and describing temporal topic trends.
/ Doctor of Philosophy /
Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the research community. Millions of ETDs are now publicly available online, often through one of many online digital libraries. However, since a majority of these digital libraries are institutional repositories with the objective being content archiving, they often lack end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier and more accessible. Several advances in the field of machine learning for text data in recent years have led to the development of techniques that can serve as the backbone of such end-user services. However, limited research has been conducted towards integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs by parsing the information contained in the long PDF documents that make up ETDs into a more compute-friendly format. This would enable researchers and developers to build end-user services for digital libraries. We also propose a framework to support document browsing and long document navigation, which are some of the important end-user services required in digital libraries.
|
82 |
Features identification and tracking for an autonomous ground vehicle
Nguyen, Chuong Hoang 14 June 2013
This thesis attempts to develop a feature identification and tracking system for an autonomous ground vehicle by focusing on four fundamental tasks: motion detection, object tracking, scene recognition, and object detection and recognition. For motion detection, we combined background subtraction using a mixture-of-Gaussians model with optical flow to highlight moving objects as well as newly entered objects that then remained still. To make the object tracking result more robust, we used a Kalman filter to combine a tracking method based on color histograms with a method based on invariant features. For scene recognition, we applied the Census Transform Histogram (CENTRIST) algorithm, which is based on Census Transform images of the training data and a Support Vector Machine classifier, to recognize a total of 8 scene categories. Because detecting the horizon is also an important task for many navigation applications, we also performed horizon detection in this thesis. Finally, the deformable parts-based models algorithm was implemented to detect some common objects, such as humans and vehicles. Furthermore, objects were detected only in the area under the horizon to reduce the detection time and the false matching rate. / Master of Science
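The census transform behind CENTRIST can be illustrated with a short sketch. It is not the thesis's implementation: the bit ordering (neighbours read left to right, top to bottom), the "centre greater than or equal to neighbour" rule, and the choice to skip border pixels are assumptions, and the function names are hypothetical. The resulting 256-bin histogram is the kind of descriptor that would then be passed to a classifier such as an SVM.

```python
from typing import List

# Offsets of the 8 neighbours, read left to right, top to bottom (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]


def census_transform(img: List[List[int]]) -> List[List[int]]:
    """Return the 8-bit census code of every interior pixel of a grayscale image."""
    height, width = len(img), len(img[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            code = 0
            for dy, dx in OFFSETS:
                # Bit is 1 when the centre is at least as bright as the neighbour.
                bit = 1 if img[y][x] >= img[y + dy][x + dx] else 0
                code = (code << 1) | bit
            row.append(code)
        codes.append(row)
    return codes


def centrist_histogram(img: List[List[int]]) -> List[int]:
    """Histogram of census codes (256 bins), usable as a scene descriptor."""
    hist = [0] * 256
    for row in census_transform(img):
        for code in row:
            hist[code] += 1
    return hist


# Toy 4x4 image with a bright pixel and a dark pixel in the interior.
image = [
    [10, 10, 10, 10],
    [10, 50, 20, 10],
    [10, 20, 5, 10],
    [10, 10, 10, 10],
]
codes = census_transform(image)
hist = centrist_histogram(image)
print(codes)  # [[255, 239], [191, 0]]
```

The brightest pixel maps to 255 (all comparisons succeed) and the darkest to 0, which shows how the code captures local structure rather than absolute intensity.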
|
83 |
Supervoxel Based Object Detection and Seafloor Segmentation Using Novel 3d Side-Scan Sonar
Patel, Kushal Girishkumar 12 November 2021
Object detection and seafloor segmentation for conventional 2D side-scan sonar imagery is a well-investigated problem. However, due to recent advances in sensing technology, the side-scan sonar now produces a true 3D point cloud representation of the seafloor embedded with echo intensity. This creates a need to develop algorithms to process the incoming 3D data for applications such as object detection and segmentation, and an opportunity to leverage advances in 3D point cloud processing developed for terrestrial applications using optical sensors (e.g. LiDAR). A bottleneck in deploying 3D side-scan sonar sensors for online applications is the complexity of handling large amounts of data, which requires more memory for storing and processing data on embedded computers. The present research aims to improve data processing capabilities on board autonomous underwater vehicles (AUVs). A supervoxel-based framework for over-segmentation and object detection is proposed which reduces a dense point cloud into clusters of similar points in a neighborhood. Supervoxels extracted from the point cloud are then described using feature vectors which are computed using geometry, echo intensity and depth attributes of the constituent points. Unsupervised density-based clustering is applied on the feature space to detect objects, which appear as outliers.
/ Master of Science /
Acoustic imaging using side-scan sonar sensors has proven to be useful for tasks like seafloor mapping, mine countermeasures and habitat mapping. Due to advancements in sensing technology, a novel type of side-scan sonar sensor has been developed which provides a true 3D representation of the seafloor along with the echo intensity image. To improve the usability of the novel sensors on board the carrying vehicles, efficient algorithms need to be developed.
In underwater robotics, only limited computational and data storage capabilities are available, which poses additional challenges in online perception applications like object detection and segmentation. In this project, I investigate a clustering-based approach followed by an unsupervised machine learning method to perform detection of objects on the seafloor using the novel side-scan sonar. I also show the usability of the approach for performing segmentation of the seafloor.
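The idea of detecting objects as outliers in a density-based clustering of feature vectors can be sketched as follows. The abstract does not name the clustering algorithm or its parameters, so this uses a simple neighbour-count rule in the spirit of DBSCAN noise detection as a stand-in; `eps`, `min_pts`, and the two-dimensional toy features are all assumptions.

```python
import math
from typing import List, Sequence


def density_outliers(features: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Indices of feature vectors with fewer than min_pts neighbours within eps.

    The point itself counts as a neighbour. This is a simplification: full
    DBSCAN would also keep sparse points that border a dense cluster.
    """
    outliers = []
    for i, p in enumerate(features):
        neighbours = sum(1 for q in features if math.dist(p, q) <= eps)
        if neighbours < min_pts:
            outliers.append(i)
    return outliers


# Four supervoxel descriptors that resemble flat seafloor, plus one that does not.
supervoxel_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
candidates = density_outliers(supervoxel_features, eps=0.5, min_pts=3)
print(candidates)  # [4]
```

Working on supervoxel descriptors instead of raw points keeps the pairwise comparison small, which matches the abstract's motivation of limited on-board memory and compute.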
|
84 |
Towards open-world image recognition
Saito, Kuniaki 17 September 2024
Deep neural networks can achieve state-of-the-art performance on various image recognition tasks, such as object categorization (image classification) and object localization (object detection), with the help of a large amount of training data. However, to achieve models that perform well in the real world, we must overcome the shift from training to real-world data, which involves two factors: (1) covariate shift and (2) unseen classes.
Covariate shift occurs when the input distribution of a particular category changes from the training time. Deep models can easily make mistakes with a small change in the input, such as small noise addition, lighting change, or changes in the object pose. On the other hand, unseen classes - classes that are absent in the training set - may be present in real-world test samples. It is important to differentiate between "seen" and "unseen" classes in image classification, while locating diverse classes, including classes unseen during training, is crucial in object detection. Therefore, an open-world image recognition model needs to handle both factors. In this thesis, we propose approaches for image classification and object detection that can handle these two kinds of shifts in a label-efficient way.
Firstly, we examine the adaptation of large-scale pre-trained models to the object detection task while preserving their robustness to handle covariate shift. We investigate various pre-trained models and discover that the acquisition of robust representations by a trained model depends heavily on the pre-trained model’s architecture. Based on this intuition, we develop simple techniques to prevent the loss of generalizable representations.
Secondly, we study the adaptation to an unlabeled target domain for object detection to address the covariate shift. Traditional domain alignment methods may be inadequate due to various factors that cause domain shift between the source and target domains, such as layout and the number of objects in an image. To address this, we propose a strong-weak distribution alignment approach that can handle diverse domain shifts. Furthermore, we study the problem of semi-supervised domain adaptation for image classification when partially labeled target data is available. We introduce a simple yet effective approach, MME, for this task, which extracts discriminative features for the target domain using adversarial learning. We also develop a method to handle the situation where the unlabeled target domain includes categories unseen in the source domain. Since there is no supervision, recognizing instances of unseen classes as "unseen" is challenging. To address this, we devise a straightforward approach that trains a one-vs-all classifier using source data to build a classifier that can detect unseen instances. Additionally, we introduce an approach to enable an object detector to recognize an unseen foreground instance as an "object" using a simple data augmentation and learning framework that is applicable to diverse detectors and datasets.
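The one-vs-all idea for flagging unseen classes can be sketched in a few lines. The abstract states only that a one-vs-all classifier is trained on source data; the decision rule below (reject when no per-class binary classifier exceeds a threshold), the threshold of 0.5, and the function names are assumptions for illustration.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_open_set(logits: List[float], class_names: List[str],
                     threshold: float = 0.5) -> str:
    """Return the best known class, or "unseen" if every binary classifier rejects.

    logits[k] is the score of the hypothetical "class k vs. rest" classifier.
    """
    probs = [sigmoid(z) for z in logits]
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unseen"
    return class_names[best]


names = ["car", "person", "bike"]
known = predict_open_set([2.0, -1.0, -3.0], names)
unknown = predict_open_set([-2.0, -1.0, -3.0], names)
print(known, unknown)  # car unseen
```

Unlike a softmax, independent sigmoid outputs do not have to sum to one, so all of them can be low at once, which is what makes an explicit "unseen" decision possible.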
In conclusion, owing to their simple design, our proposed approaches can be applied to various datasets and architectures, and they achieve state-of-the-art results. Our work can contribute to the development of a unified open-world image recognition model in future research.
|
85 |
A Machine Learning Approach to Recognize Environmental Features Associated with Social Factors
Diaz-Ramos, Jonathan 11 June 2024
In this thesis we aim to supplement the Climate and Economic Justice Screening Tool (CEJST), which assists federal agencies in identifying disadvantaged census tracts, by extracting five environmental features from Google Street View (GSV) images. The five environmental features are garbage bags, greenery, and three distinct road damage types (longitudinal, transverse, and alligator cracks), which were identified using image classification, object detection, and image segmentation. We evaluate three cities using this developed feature space in order to distinguish between disadvantaged and non-disadvantaged census tracts.
The results of the analysis reveal the significance of the feature space and demonstrate the time efficiency, detail, and cost-effectiveness of the proposed methodology. / Master of Science
|
86 |
Sémantický popis obrazovky embedded zařízení / Semantic description of the embedded device screen
Horák, Martin January 2020
This master's thesis deals with the detection of user interface elements in an image of a printer display using convolutional neural networks. The theoretical part surveys the architectures currently used for object detection. The practical part covers the creation of the image gallery and the training and evaluation of selected models using the TensorFlow Object Detection API. The conclusion discusses the suitability of the trained models for the given task.
|
87 |
Machine vision for automation of earth-moving machines : Transfer learning experiments with YOLOv3
Borngrund, Carl January 2019
This master's thesis investigates the possibility of creating a machine vision solution for the automation of earth-moving machines. The research was undertaken because, without some type of vision system, it will not be possible to create a fully autonomous earth-moving machine that can safely be used around humans or other machines. Cameras were used as the primary sensors because they are cheap, provide high resolution, and are the type of sensor that most closely mimics the human vision system. The purpose of this master's thesis was to use existing real-time object detectors together with transfer learning and examine whether they can successfully be used to extract information in environments such as construction, forestry and mining. The amount of data needed to successfully train a real-time object detector was also investigated. Furthermore, the thesis examines whether there are particularly difficult situations for the defined object detector, how reliable the object detector is, and finally how service-oriented architecture principles can be used to create deep learning systems. To investigate the questions formulated above, three data sets were created in which different properties were varied. These properties were light conditions, ground material and dump truck orientation. The data sets were created using a toy dump truck together with a similarly sized wheel loader with a camera mounted on the roof of its cab. The first data set contained only indoor images where the dump truck was placed in different orientations but neither the light nor the ground material changed. The second data set contained images where the light source was kept constant, but the dump truck orientation and ground materials changed. The last data set contained images where all properties were varied. The real-time object detector YOLOv3 was used to examine how a real-time object detector would perform depending on which of the three data sets it was trained on.
No matter the data set, it was possible to train a model to perform real-time object detection. Using an Nvidia 980 Ti, the inference time of the model was around 22 ms, which is more than fast enough to process video running at 30 fps (a budget of about 33 ms per frame). Training on all three data sets converged to a training loss of around 0.10. The data set containing more varied data, namely the one where all properties were changed, performed considerably better, reaching a validation loss of 0.164, whereas the indoor data set, containing the least varied data, reached a validation loss of only 0.257. The size of the data set was also a factor in the performance; however, it was not as important as having varied data. The results also showed that all three data sets could reach a mAP score of around 0.98 using transfer learning.
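The real-time claim above rests on simple arithmetic, sketched here. Only the 22 ms inference time and the 30 fps video rate come from the abstract; the sketch ignores image capture, pre-processing and post-processing time, which the abstract does not report.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def max_sustainable_fps(inference_ms: float) -> float:
    """Highest frame rate a detector can keep up with at a given latency."""
    return 1000.0 / inference_ms


INFERENCE_MS = 22.0   # reported inference time
VIDEO_FPS = 30.0      # reported video frame rate

budget = frame_budget_ms(VIDEO_FPS)               # about 33.3 ms per frame
headroom = budget - INFERENCE_MS                  # about 11.3 ms to spare
sustainable = max_sustainable_fps(INFERENCE_MS)   # about 45.5 fps

print(f"budget={budget:.1f} ms, headroom={headroom:.1f} ms, max={sustainable:.1f} fps")
```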
|
88 |
Experiential Sampling For Object Detection In Video
Paresh, A 05 1900
The problem of object detection deals with determining whether an instance of a given class of object is present or not. There are robust, supervised learning based algorithms available for object detection in an image. These image object detectors (image-based object detectors) use characteristics learnt from the training samples to find object and non-object regions. The characteristics used are such that the detectors work under a variety of conditions and hence are very robust.
Object detection in video can be performed by using such a detector on each frame of the video sequence. This approach checks for presence of an object around each pixel, at different scales. Such a frame-based approach completely ignores the temporal continuity inherent in the video. The detector declares presence of the object independent of what has happened in the past frames. Also, various visual cues such as motion and color, which give hints about the location of the object, are not used.
The current work is aimed at building a generic framework for using a supervised learning based image object detector for video that exploits temporal continuity and the presence of various visual cues. We use temporal continuity and visual cues to speed up the detection and improve detection accuracy by considering past detection results.
We propose a generic framework, based on Experiential Sampling [1], which considers temporal continuity and visual cues to focus on a relevant subset of each frame. We determine some key positions in each frame, called attention samples, and object detection is performed only at scales with these positions as centers. These key positions are statistical samples from a density function that is estimated based on various visual cues, past experience and temporal continuity. This density estimation is modeled as a Bayesian Filtering problem and is carried out using Sequential Monte Carlo methods (also known as Particle Filtering), where a density is represented by a weighted sample set. The experiential sampling framework is inspired by Neisser’s perceptual cycle [2] and Itti-Koch’s static visual attention model [3].
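Representing a density by a weighted sample set, and drawing new attention samples from it, can be sketched as follows. The abstract does not say which resampling scheme is used; systematic resampling is chosen here as a common option in particle filtering, and the positions, weights, and function name are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Position = Tuple[int, int]


def systematic_resample(samples: Sequence[Position], weights: Sequence[float],
                        n: int, rng: random.Random) -> List[Position]:
    """Draw n samples with probability proportional to their weights."""
    total = sum(weights)
    step = total / n
    start = rng.random() * step       # single random offset in [0, step)
    picked = []
    i = 0
    cumulative = weights[0]
    for k in range(n):
        target = start + k * step
        # Advance until the cumulative weight passes the target.
        while cumulative <= target and i < len(samples) - 1:
            i += 1
            cumulative += weights[i]
        picked.append(samples[i])
    return picked


positions = [(10, 10), (20, 20), (30, 30), (40, 40)]

# All weight on one position: every attention sample lands there.
peaked = systematic_resample(positions, [0.0, 0.0, 1.0, 0.0], 5, random.Random(0))

# Equal weight on two positions: samples are split evenly between them.
even = systematic_resample(positions[:2], [0.5, 0.5], 4, random.Random(0))

print(peaked)
print(even)
```

Running the detector only around the resampled positions, instead of around every pixel, is where the computational saving described in the abstract would come from.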
In this work, we first use Basic Experiential Sampling as presented in [1] for object detection in video and show its limitations. To overcome these limitations, we extend the framework to effectively combine top-down and bottom-up visual attention phenomena. We use the learning-based detector’s response, which is a top-down cue, along with visual cues to improve the attention estimate. To effectively handle multiple objects, we maintain a minimum number of attention samples per object. We propose to use motion as an alert cue to reduce the delay in detecting new objects entering the field of view. We use an inhibition map to avoid revisiting already attended regions. Finally, we improve detection accuracy by using a particle filter based detection scheme [4], also known as Track Before Detect (TBD). In this scheme, we compute the likelihood of presence of the object based on current and past frame data. This likelihood is shown to be approximately equal to the product of average sample weights over past frames.
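The last statement, that the presence likelihood is approximately the product of the average sample weights over past frames, can be written as a small function. This is a sketch under assumptions: the weights are taken to be unnormalised per-frame sample weights, the length of the frame window is left to the caller, and the product is accumulated in log space only to avoid underflow over long windows.

```python
import math
from typing import List, Sequence


def presence_likelihood(weights_per_frame: Sequence[Sequence[float]]) -> float:
    """Product over frames of the mean sample weight in each frame."""
    log_likelihood = 0.0
    for weights in weights_per_frame:
        mean_weight = sum(weights) / len(weights)
        if mean_weight <= 0.0:
            return 0.0                # a frame with no support rules the object out
        log_likelihood += math.log(mean_weight)
    return math.exp(log_likelihood)


# Hypothetical unnormalised weights for three consecutive frames.
frames: List[List[float]] = [[0.5, 0.5], [0.2, 0.4], [1.0, 1.0]]
likelihood = presence_likelihood(frames)   # 0.5 * 0.3 * 1.0
print(round(likelihood, 3))  # 0.15
```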
Our framework results in a significant reduction in overall computation required by the object detector, with an improvement in accuracy while retaining its robustness. This enables the use of learning based image object detectors in real time video applications which otherwise are computationally expensive.
We demonstrate the usefulness of this framework for frontal face detection in video. We use Viola-Jones’ frontal face detector [5] and color and motion visual cues. We show results for various cases such as sequences with single object, multiple objects, distracting background, moving camera, changing illumination, objects entering/exiting the frame, crossing objects, objects with pose variation and sequences with scene change.
The main contributions of the thesis are:
i) We give an experiential sampling formulation for object detection in video. Many concepts, such as attention point and attention density, which are vague in [1], are precisely defined.
ii) We combine the detector’s response with visual cues to estimate attention. This is inspired by a combination of top-down and bottom-up attention maps in visual attention models. To the best of our knowledge, this is used for the first time for object detection in video.
iii) In the case of multiple objects, we highlight the problem with sample-based density representation and solve it by maintaining a minimum number of attention samples per object.
iv) For objects first detected by the learning-based detector, we propose to use a TBD scheme for their subsequent detections along with the learning-based detector. This improves accuracy compared to using the learning-based detector alone.
This thesis is organized as follows:
- Chapter 1: In this chapter we present a brief survey of related work and define our problem.
- Chapter 2: We present an overview of biological models that have motivated our work.
- Chapter 3: We give the experiential sampling formulation as in previous work [1], show results and discuss its limitations.
- Chapter 4: In this chapter, which is on Enhanced Experiential Sampling, we suggest enhancements to overcome limitations of basic experiential sampling. We propose a track-before-detect scheme to improve detection accuracy.
- Chapter 5: We conclude the thesis and give possible directions for future work in this area.
- Appendix A: A description of the video database used in this thesis.
- Appendix B: A list of commonly used abbreviations and notations.
|
89 |
Automotive 3D Object Detection Without Target Domain Annotations
Gustafsson, Fredrik, Linder-Norén, Erik January 2018
In this thesis we study a perception problem in the context of autonomous driving. Specifically, we study the computer vision problem of 3D object detection, in which objects should be detected from various sensor data and their position in the 3D world should be estimated. We also study the application of Generative Adversarial Networks in domain adaptation techniques, aiming to improve the 3D object detection model's ability to transfer between different domains. The state-of-the-art Frustum-PointNet architecture for LiDAR-based 3D object detection was implemented and found to closely match its reported performance when trained and evaluated on the KITTI dataset. The architecture was also found to transfer reasonably well from the synthetic SYN dataset to KITTI, and is thus believed to be usable in a semi-automatic 3D bounding box annotation process. The Frustum-PointNet architecture was also extended to explicitly utilize image features, which surprisingly degraded its detection performance. Furthermore, an image-only 3D object detection model was designed and implemented, which was found to compare quite favourably with current state-of-the-art in terms of detection performance. Additionally, the PixelDA approach was adopted and successfully applied to the MNIST to MNIST-M domain adaptation problem, which validated the idea that unsupervised domain adaptation using Generative Adversarial Networks can improve the performance of a task network for a dataset lacking ground truth annotations. Surprisingly, the approach did however not significantly improve upon the performance of the image-based 3D object detection models when trained on the SYN dataset and evaluated on KITTI.
|
90 |
Detekce objektů na GPU / Object Detection on GPU
Macenauer, Pavel January 2015
This thesis addresses the topic of object detection on graphics processing units. As part of it, a system for object detection using NVIDIA CUDA was designed and implemented, allowing for real-time video object detection and bulk processing. Its main contribution is a study of the options that NVIDIA CUDA technology and current graphics processing units offer for accelerating object detection. Parallel algorithms for object detection are also discussed and suggested.
|