  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Automatic Eartag Recognition on Dairy Cows in Real Barn Environment

Ilestrand, Maja January 2017
All dairy cows in Europe wear unique identification tags in their ears. These eartags are standardized and contain the cow's identification number, which today is used only for visual identification by the farmer. The cow also needs to be identified by an automatic identification system connected to milking machines and other robotics used at the farm. Currently this is solved with a non-standardized radio transmitter, which can be placed at different positions on the cow, and different receivers need to be used on different farms. Other drawbacks of the current identification system are that it is expensive and unreliable. This thesis explores the possibility of replacing this non-standardized radio-frequency-based identification system with a standardized computer-vision-based system. The proposed method uses a color threshold approach for detection; a flood fill approach followed by a Hough transform and a projection method for segmentation; and evaluates template matching, k-nearest neighbour and support vector machines as optical character recognition (OCR) methods. The results show that the quality of the input data is vital. With good data, k-nearest neighbour, which showed the best results of the three OCR approaches, handles 98 % of the digits.
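The k-nearest-neighbour step named in the abstract above can be illustrated with a minimal sketch. It is not the thesis implementation: the 3x3 binary glyphs, the squared Euclidean distance, the value k=3, and the names `knn_classify` and `TRAINING` are all assumptions made for illustration.

```python
from collections import Counter


def knn_classify(sample, training, k=3):
    """Return the majority label among the k training vectors nearest to sample.

    training is a list of (feature_vector, label) pairs.
    """
    # Sort training items by squared Euclidean distance to the sample.
    by_distance = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(sample, item[0])),
    )
    # Majority vote over the k closest items.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 3x3 binary glyphs, flattened row by row (not data from the thesis).
TRAINING = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "1"),
    ((1, 1, 0, 0, 1, 0, 0, 1, 0), "1"),
    ((0, 1, 0, 0, 1, 0, 1, 1, 1), "1"),
    ((1, 1, 1, 1, 0, 1, 1, 1, 1), "0"),
    ((1, 1, 1, 1, 0, 1, 1, 1, 0), "0"),
    ((0, 1, 1, 1, 0, 1, 1, 1, 1), "0"),
]

# A "1" with one noisy pixel in the bottom-right corner.
noisy_one = (0, 1, 0, 0, 1, 0, 0, 1, 1)
predicted = knn_classify(noisy_one, TRAINING)
print(predicted)
```

In a real pipeline the feature vectors would come from the segmented digit images rather than hand-written tuples.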
12

Universal object segmentation in fused range-color data

Finley, Jeffery Michael January 1900
Master of Science / Department of Electrical and Computer Engineering / Christopher L. Lewis / This thesis presents a method to perform universal object segmentation on fused SICK laser range data and color CCD camera images collected from a mobile robot. This thesis also details the method of fusion. Fused data allows for higher resolution than range-only data and provides more information than color-only data. The segmentation method utilizes the Expectation Maximization (EM) algorithm to detect the location and number of universal objects modeled by a six-dimensional Gaussian distribution. This is achieved by continuously subdividing objects previously identified by EM. After several iterations, objects with similar traits are merged. The universal object model performs well in environments consisting of both man-made (walls, furniture, pavement) and natural objects (trees, bushes, grass). This makes it ideal for use in both indoor and outdoor environments. The algorithm does not require the number of objects to be known prior to calculation nor does it require a training set of data. Once the universal objects have been segmented, they can be processed and classified or left alone and used inside robotic navigation algorithms like SLAM.
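As a rough illustration of the Expectation Maximization step mentioned above, the following sketch fits a mixture of one-dimensional Gaussians. It is a deliberate simplification and not the thesis method: the thesis models objects with six-dimensional Gaussians and determines the number of objects by repeated subdivision and merging, whereas this sketch assumes a fixed number of components, scalar data, and hand-picked starting values. The function name `em_1d` and the sample data are hypothetical.

```python
import math


def em_1d(data, means, sigmas, weights, iterations=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given parameters."""
    means, sigmas, weights = list(means), list(sigmas), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [
                weights[j]
                * math.exp(-0.5 * ((x - means[j]) / sigmas[j]) ** 2)
                / (sigmas[j] * math.sqrt(2.0 * math.pi))
                for j in range(k)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate mean, spread and weight of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
            weights[j] = nj / len(data)
    return means, sigmas, weights


# Hypothetical scalar measurements forming two well-separated groups.
samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
means, sigmas, weights = em_1d(samples, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, weights)
```

The same E-step and M-step structure carries over to multivariate Gaussians, with covariance matrices replacing the scalar spreads.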
13

Semantic-oriented Object Segmentation / Segmentation d'objet pour l'interprétation sémantique

Zou, Wenbin 13 March 2014
This thesis focuses on the problems of object segmentation and semantic segmentation, which aim at separating objects from the background or assigning a specific semantic label to each pixel in an image. We propose two approaches for object segmentation and one approach for semantic segmentation. The first proposed approach for object segmentation is based on saliency detection. Motivated by our ultimate goal of object segmentation, a novel saliency detection model is proposed. This model is formulated in the low-rank matrix recovery framework, taking the image structure information derived from bottom-up segmentation as an important constraint. The object segmentation is built in an iterative and mutual optimization framework, which simultaneously performs object segmentation based on the saliency map resulting from saliency detection, and saliency quality boosting based on the segmentation. The optimal saliency map and the final segmentation are achieved after several iterations. The second proposed approach for object segmentation is based on exemplar images. The underlying idea is to transfer segmentation labels of globally and locally similar exemplar images to the query image. To find the best-matching exemplars, we propose a novel high-level image representation called the object-oriented descriptor, which captures both global and local information of the image. Then, a discriminative predictor is learned online using the retrieved exemplars. This predictor assigns a probabilistic foreground score to each region of the query image. After that, the predicted scores are integrated into a segmentation scheme based on Markov random field (MRF) energy optimization. Iteratively finding the minimum energy of the MRF leads to the final segmentation. For semantic segmentation, we propose an approach based on a region bank and sparse coding. The region bank is a set of regions generated by multi-level segmentations. This is motivated by the observation that some objects might be captured at certain levels in a hierarchical segmentation. For region description, we propose a sparse coding method which represents each local feature descriptor with several basis vectors in the learned visual dictionary, and describes all local feature descriptors within a region by a single sparse histogram. With the sparse representation, a support vector machine with multiple kernel learning is employed for semantic inference. The proposed approaches have been extensively evaluated on several challenging and widely used datasets. Experiments demonstrate that the proposed approaches outperform the state-of-the-art methods. For example, compared to the best result in the literature, the proposed object segmentation approach based on exemplar images improves the F-score from 63% to 68.7% on the Pascal VOC 2011 dataset.
14

Development of Novel Attention-Aware Deep Learning Models and Their Applications in Computer Vision and Dynamical System Calibration

Maftouni, Maede 12 July 2023
In recent years, deep learning has revolutionized computer vision and natural language processing tasks, but the black-box nature of these models poses significant challenges for their interpretability and reliability, especially in critical applications such as healthcare. To address this, attention-based methods have been proposed to enhance the focus and interpretability of deep learning models. In this dissertation, we investigate the effectiveness of attention mechanisms in improving prediction and modeling tasks across different domains. We propose three essays that utilize task-specific designed trainable attention modules in manufacturing, healthcare, and system identification applications. In essay 1, we introduce a novel computer vision tool that tracks the melt pool in X-ray images of laser powder bed fusion using attention modules. In essay 2, we present a mask-guided attention (MGA) classifier for COVID-19 classification on lung CT scan images. The MGA classifier incorporates lesion masks to improve both the accuracy and interpretability of the model, outperforming state-of-the-art models with limited training data. Finally, in essay 3, we propose a Transformer-based model, utilizing self-attention mechanisms, for parameter estimation in system dynamics models that outpaces the conventional system calibration methods. Overall, our results demonstrate the effectiveness of attention-based methods in improving deep learning model performance and reliability in diverse applications. / Doctor of Philosophy / Deep learning, a type of artificial intelligence, has brought significant advancements to tasks like recognizing images or understanding texts. However, the inner workings of these models are often not transparent, which can make it difficult to comprehend and have confidence in their decision-making processes. 
Transparency is particularly important in areas like healthcare, where understanding why a decision was made can be as crucial as the decision itself. To help with this, we've been exploring an interpretable tool that helps the computer focus on the most important parts of the data, which we call the "attention module". Inspired by the human perception system, these modules focus more on certain important details, similar to how our eyes might be drawn to a familiar face in a crowded room. We propose three essays that utilize task-specific attention modules in manufacturing, healthcare, and system identification applications. In essay one, we introduce a computer vision tool that tracks a moving object in a manufacturing X-ray image sequence using attention modules. In the second essay, we discuss a new deep learning model that uses focused attention on lung lesions for more accurate COVID-19 detection on CT scan images, outperforming other top models even with less training data. In essay three, we propose an attention-based deep learning model for faster parameter estimation in system dynamics models. Overall, our research shows that attention-based methods can enhance the performance, transparency, and usability of deep learning models across diverse applications.
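The self-attention mechanism referred to in this abstract is commonly formulated as scaled dot-product attention. The sketch below shows that generic formulation only; the dissertation's task-specific attention modules are not described in enough detail here to reproduce, so the function names, the absence of learned projection matrices, and the toy token vectors are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention_weights)."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        all_weights.append(weights)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [
                sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))
            ]
        )
    return outputs, all_weights


# Hypothetical token embeddings; self-attention uses them as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attn = attention(tokens, tokens, tokens)
print(attn[0])
```

Each row of `attn` sums to one, so the weights can be read as how strongly one token attends to the others, which is the property that makes attention maps useful for interpretation.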
15

Human Detection, Tracking and Segmentation in Surveillance Video

Shu, Guang 01 January 2014
This dissertation addresses the problem of human detection and tracking in surveillance videos. Even though this is a well-explored topic, many challenges remain when confronted with data from real world situations. These challenges include appearance variation, illumination changes, camera motion, cluttered scenes and occlusion. In this dissertation several novel methods for improving on the current state of human detection and tracking based on learning scene-specific information in video feeds are proposed. Firstly, we propose a novel method for human detection which employs unsupervised learning and superpixel segmentation. The performance of generic human detectors is usually degraded in unconstrained video environments due to varying lighting conditions, backgrounds and camera viewpoints. To handle this problem, we employ an unsupervised learning framework that improves the detection performance of a generic detector when it is applied to a particular video. In our approach, a generic DPM human detector is employed to collect initial detection examples. These examples are segmented into superpixels and then represented using Bag-of-Words (BoW) framework. The superpixel-based BoW feature encodes useful color features of the scene, which provides additional information. Finally a new scene-specific classifier is trained using the BoW features extracted from the new examples. Compared to previous work, our method learns scene-specific information through superpixel-based features, hence it can avoid many false detections typically obtained by a generic detector. We are able to demonstrate a significant improvement in the performance of the state-of-the-art detector. Given robust human detection, we propose a robust multiple-human tracking framework using a part-based model. Human detection using part models has become quite popular, yet its extension in tracking has not been fully explored. 
Single camera-based multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. We address such problems by developing an online-learning tracking-by-detection method. Our approach learns part-based person-specific Support Vector Machine (SVM) classifiers which capture articulations of moving human bodies with dynamically changing backgrounds. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection. This leads to a significant improvement in detection performance in cluttered scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions and prevent the performance of the classifiers from being degraded. Extensive experiments using the proposed method on several challenging sequences demonstrate state-of-the-art performance in multiple-people tracking. Next, in order to obtain precise boundaries of humans, we propose a novel method for multiple human segmentation in videos by incorporating human detection and part-based detection potential into a multi-frame optimization framework. In the first stage, after obtaining the superpixel segmentation for each detection window, we separate superpixels corresponding to a human and background by minimizing an energy function using a Conditional Random Field (CRF). We use the part detection potentials from the DPM detector, which provide useful information about human shape. In the second stage, the spatio-temporal constraints of the video are leveraged to build a tracklet-based Gaussian Mixture Model for each person, and the boundaries are smoothed by multi-frame graph optimization. 
Compared to previous work, our method can automatically segment multiple people in videos with accurate boundaries, and it is robust to camera motion. Experimental results show that our method achieves better segmentation accuracy than previous methods on several challenging video sequences. Most work in computer vision deals with point solutions: a specific algorithm for a specific problem. However, putting different algorithms into one real-world integrated system is a big challenge. Finally, we introduce an efficient tracking system, NONA, for high-definition surveillance video. We implement the system using a multi-threaded architecture (Intel Threading Building Blocks (TBB)), which executes video ingestion, tracking, and video output in parallel. To improve tracking accuracy without sacrificing efficiency, we employ several useful techniques. Adaptive Template Scaling is used to handle the scale change due to objects moving towards a camera. Incremental Searching and Local Frame Differencing are used to resolve challenging issues such as scale change, occlusion and cluttered backgrounds. We tested our tracking system on a high-definition video dataset and achieved acceptable tracking accuracy while maintaining real-time performance.
16

Pixel-level video understanding with efficient deep models

Hu, Ping 02 February 2024
The ability to understand videos at the level of pixels plays a key role in a wide range of computer vision applications. For example, a robot or autonomous vehicle relies on classifying each pixel in the video stream into semantic categories to holistically understand the surrounding environment, and video editing software needs to exploit the spatiotemporal context of video pixels to generate various visual effects. Despite the great progress of Deep Learning (DL) techniques, applying DL-based vision models to process video pixels remains practically challenging, due to the high volume of video data and the compute-intensive design of DL approaches. In this thesis, we aim to design efficient and robust deep models for pixel-level video understanding of high-level semantics, mid-level grouping, and low-level interpolation. Toward this goal, in Part I, we address the semantic analysis of video pixels with the task of Video Semantic Segmentation (VSS), which aims to assign pixel-level semantic labels to video frames. We introduce methods that utilize temporal redundancy and context to efficiently recognize video pixels without sacrificing performance. Extensive experiments on various datasets demonstrate our methods' effectiveness and efficiency on both common GPUs and edge devices. Then, in Part II, we show that pixel-level motion patterns help to differentiate video objects from their background. In particular, we propose a fast and efficient contour-based algorithm to group and separate motion patterns for video objects. Furthermore, we present learning-based models to solve the tracking of objects across frames. We show that by explicitly separating the object segmentation and object tracking problems, our framework achieves efficiency during both training and inference. Finally, in Part III, we study the temporal interpolation of pixels given their spatial-temporal context. 
We show that intermediate video frames can be inferred via interpolation in a very efficient way, by introducing the many-to-many splatting framework that can quickly warp and fuse pixels at any number of arbitrary intermediate time steps. We also propose a dynamic refinement mechanism to further improve the interpolation quality by reducing redundant computation. Evaluation on various types of datasets shows that our method can interpolate videos with state-of-the-art quality and efficiency. To summarize, we discuss and propose efficient pipelines for pixel-level video understanding tasks across high-level semantics, mid-level grouping, and low-level interpolation. The proposed models can contribute to tackling a wide range of real-world video perception and understanding problems in future research.
17

Simultaneous object detection and segmentation using top-down and bottom-up processing

Sharma, Vinay 07 January 2008
No description available.
18

Recognition of Anomalous Motion Patterns in Urban Surveillance

Andersson, Maria, Gustafsson, Fredrik, St-Laurent, Louis, Prevost, Donald January 2013
We investigate unsupervised K-means clustering and the semi-supervised hidden Markov model (HMM) for automatically detecting anomalous motion patterns in groups of people (crowds). Anomalous motion patterns are typically people merging into a dense group, followed by disturbances or threatening situations within the group. The application of K-means clustering and HMM is illustrated with datasets from four surveillance scenarios. The results indicate that by investigating the group of people in a systematic way with different K values, and analyzing cluster density, cluster quality and changes in cluster shape, we can automatically detect anomalous motion patterns. The results correspond well with the events in the datasets. The results also indicate that very accurate detections of the people in the dense group are not necessary: the clustering and HMM results remain very much the same even with some increased uncertainty in the detections. / Funding agencies: Vinnova (Swedish Governmental Agency for Innovation Systems) under the VINNMER program
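A minimal sketch of K-means clustering on 2-D positions, with a within-cluster sum of squared distances as one possible cluster-quality measure. This is an illustration under stated assumptions, not the authors' pipeline: the positions, the fixed starting centroids, and the names `kmeans` and `within_cluster_sse` are hypothetical, and the paper does not say which density or quality measures it uses.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Plain K-means (Lloyd's algorithm) from the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


def within_cluster_sse(centroids, clusters):
    """Sum of squared distances to the centroid; lower means tighter clusters."""
    return sum(
        squared_distance(p, c)
        for c, members in zip(centroids, clusters)
        for p in members
    )


# Hypothetical detected positions of people forming two groups.
positions = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(positions, [(0.0, 0.0), (10.0, 10.0)])
sse = within_cluster_sse(centroids, clusters)
print(centroids, sse)
```

Running the same routine for several values of K and comparing the resulting quality measure over time is one way to look for the changes in cluster density and shape that the abstract describes.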
19

Region-based face detection, segmentation and tracking: framework definition and application to other objects

Vilaplana Besler, Verónica 17 December 2010
One of the central problems in computer vision is the automatic recognition of object classes. In particular, the detection of the class of human faces is a problem that generates special interest due to the large number of applications that require face detection as a first step. In this thesis we approach the problem of face detection as a joint detection and segmentation problem, in order to precisely localize faces with pixel accurate masks. Even though this is our primary goal, in finding a solution we have tried to create a general framework as independent as possible of the type of object being searched. For that purpose, the technique relies on a hierarchical region-based image model, the Binary Partition Tree, where objects are obtained by the union of regions in an image partition. In this work, this model is optimized for the face detection and segmentation tasks. Different merging and stopping criteria are proposed and compared through a large set of experiments. In the proposed system the intra-class variability of faces is managed within a learning framework. The face class is characterized using a set of descriptors measured on the tree nodes, and a set of one-class classifiers. The system is formed by two strong classifiers. First, a cascade of binary classifiers simplifies the search space, and afterwards, an ensemble of more complex classifiers performs the final classification of the tree nodes. The system is extensively tested on different face data sets, producing accurate segmentations and proving to be quite robust to variations in scale, position, orientation, lighting conditions and background complexity. We show that the technique proposed for faces can be easily adapted to detect other object classes. Since the construction of the image model does not depend on any object class, different objects can be detected and segmented using the appropriate object model on the same image model. 
New object models can be easily built by selecting and training a suitable set of descriptors and classifiers. Finally, a tracking mechanism is proposed. It combines the efficiency of the mean-shift algorithm with the use of regions to track and segment faces through a video sequence, where both the face and the camera may move. The method is extended to deal with other deformable objects, using a region-based graph-cut method for the final object segmentation at each frame. Experiments show that both mean-shift based trackers produce accurate segmentations even in difficult scenarios such as those with similar object and background colors and fast camera and object movements.
20

Visual Perception of Objects and their Parts in Artificial Systems

Schoeler, Markus 12 October 2015
No description available.
