601

Automatic Gait Recognition: using deep metric learning / Automatisk gångstilsigenkänning

Persson, Martin January 2020
Recent improvements in pose estimation have opened up the possibility of new areas of application. One of them is gait recognition, the task of identifying persons based on their unique style of walking, which is increasingly being recognized as an important method of biometric identification. This thesis has explored the possibility of using a pose estimation system, OpenPose, together with deep Recurrent Neural Networks (RNNs) to see if there is sufficient information in sequences of 2D poses for gait recognition. To make this possible, a new multi-camera dataset of persons walking on a treadmill was gathered, dubbed the FOI dataset. The results show that this approach has some promise: it achieved an overall classification accuracy of 95.5% on classes seen during training and 83.8% on classes not seen during training. However, it was unable to recognize sequences from camera angles it had not seen during training; for that to be possible, more data pre-processing will likely be required.
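The pipeline described here, OpenPose keypoints fed to a recurrent network, can be pictured with a short sketch. The PyTorch code below is illustrative only: the BODY_25 keypoint count, layer sizes, and the plain classification head are assumptions; the metric-learning angle in the title would instead train an embedding with, for example, a triplet loss.

```python
# A minimal sketch, assuming poses were extracted offline with OpenPose
# (BODY_25: 25 keypoints, flattened to 50 (x, y) values per frame).
import torch
import torch.nn as nn

class GaitRNN(nn.Module):
    def __init__(self, num_keypoints=25, hidden=128, num_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2 * num_keypoints,
                            hidden_size=hidden, num_layers=2,
                            batch_first=True)
        self.head = nn.Linear(hidden, num_classes)  # an embedding layer here
                                                    # would suit metric learning

    def forward(self, poses):           # poses: (batch, frames, 2*keypoints)
        _, (h_n, _) = self.lstm(poses)  # final hidden state summarizes the gait
        return self.head(h_n[-1])       # logits over walker identities

model = GaitRNN()
clips = torch.randn(4, 60, 50)          # 4 clips of 60 frames each
print(model(clips).shape)               # torch.Size([4, 10])
```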
602

Development of Dropwise Additive Manufacturing with non-Brownian Suspensions: Applications of Computer Vision and Bayesian Modeling to Process Design, Monitoring and Control: Video Files in Chapter 5 and Appendix E

Andrew J. Radcliffe (9080312) 24 July 2020
Video files found in Chapter 5: AUTOMATED OBJECT TRACKING, EVENT DETECTION AND RECOGNITION FOR HIGH-SPEED VIDEO OF DROP FORMATION PHENOMENA. Video files found in Appendix E, Chapter 5, Resource 2.
603

FROM SEEING BETTER TO UNDERSTANDING BETTER: DEEP LEARNING FOR MODERN COMPUTER VISION APPLICATIONS

Tianqi Guo (12890459) 17 June 2022
In this dissertation, we document a few of our recent attempts at bridging the gap between fast-evolving deep learning research and the vast industry need for dealing with computer vision challenges. More specifically, we developed novel deep-learning-based techniques for the following application-driven computer vision challenges: image super-resolution with quality restoration, motion estimation by optical flow, object detection for shape reconstruction, and object segmentation for motion tracking. These four topics cover the computer vision hierarchy: from the low level, where digital images are processed to restore missing information for better human perception; to the middle level, where certain objects of interest are recognized and their motions analyzed; and finally to the high level, where the scene captured in the video footage is interpreted for further analysis. In building whole packages of ready-to-deploy solutions, we center our efforts on designing and training the most suitable convolutional neural networks for the particular computer vision problem at hand. Complementary procedures for data collection, data annotation, post-processing of network outputs tailored for specific application needs, and deployment details are also discussed where necessary. We hope our work demonstrates the applicability and versatility of convolutional neural networks for real-world computer vision tasks on a broad spectrum, from seeing better to understanding better.
604

Intelligent Collision Prevention System For SPECT Detectors by Implementing Deep Learning Based Real-Time Object Detection

Tahrir Ibraq Siddiqui (11173185) 23 July 2021
The SPECT-CT machines manufactured by Siemens consist of two heavy detector heads (~1,500 lbs each) that are moved into various configurations for radionuclide imaging. These detectors are driven by large torque, powered by motors in the gantry, enabling linear and rotational motion. If the detectors collide with large objects (stools, tables, patient extremities, etc.), they are very likely to damage the objects and be damaged as well. This research work proposes an intelligent real-time object detection system to prevent collisions between detector heads and external objects in the path of the detector's motion by implementing an end-to-end deep learning object detector. The research extensively documents all the work done in identifying the most suitable object detection framework for this use case; collecting and processing the image dataset of target objects; training the deep neural net to detect target objects; deploying the trained deep neural net in live demos through a real-time object detection application written in Python; improving the model's performance; and finally investigating methods to stop detector motion upon detecting external objects in the collision region. We successfully demonstrated that a Caffe version of MobileNet-SSD can be trained and deployed to detect target objects entering the collision region in real time by following the methodologies outlined in this paper. We then laid out the future work that must be done to bring this system into production, such as training the model to detect all possible objects that may be found in the collision region, controlling the activation of the RTOD application, and efficiently stopping the detector motion.
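Since the abstract names a Caffe MobileNet-SSD run from a Python real-time application, a minimal sketch of such a loop using OpenCV's dnn module is shown below. The model file names, confidence threshold, and collision-region coordinates are placeholders, not values from the thesis.

```python
# A hedged sketch: run a Caffe MobileNet-SSD on live frames and flag any
# confident detection that overlaps an assumed collision region.
import cv2

net = cv2.dnn.readNetFromCaffe("MobileNetSSD.prototxt",    # hypothetical paths
                               "MobileNetSSD.caffemodel")
COLLISION_REGION = (100, 200, 500, 480)                    # x1, y1, x2, y2 (assumed)

def intersects(box, region):
    # Axis-aligned overlap test between a detection box and the region.
    return not (box[2] < region[0] or box[0] > region[2] or
                box[3] < region[1] or box[1] > region[3])

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # Standard MobileNet-SSD preprocessing: 300x300 input, mean 127.5, scale 1/127.5.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()          # shape (1, 1, N, 7)
    for i in range(detections.shape[2]):
        if detections[0, 0, i, 2] < 0.5:
            continue                    # skip low-confidence detections
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        if intersects(box, COLLISION_REGION):
            print("object in collision region")  # here: halt detector motion
```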
605

Evaluating DCNN architectures for multinomial area classification using satellite data / Utvärdering av DCNN-arkitekturer för multinomial arealklassificering med hjälp av satellitdata

Wojtulewicz, Karol, Agbrink, Viktor January 2020
The most common approach to analysing satellite imagery is building or object segmentation, which expects an algorithm to find and segment objects with specific boundaries that are present in the satellite imagery. The company Vricon takes satellite imagery analysis further, with the goal of reproducing the entire world as a 3D mesh. This 3D reconstruction is performed by a set of complex algorithms excelling at different object reconstructions, which need sufficient labeling in the original 2D satellite imagery to ensure valid transformations. Vricon believes that the labeling of areas can be used to further improve the algorithm selection process. Therefore, the company wants to investigate whether multinomial large-area classification can be performed successfully using the satellite image data available at the company. To enable this type of classification, the company's gold-standard dataset, containing labeled objects such as individual buildings, single trees, and roads, among others, has been transformed into a large-area gold-standard dataset in an unsupervised manner. This dataset was then used to evaluate large-area classification with several state-of-the-art Deep Convolutional Neural Network (DCNN) semantic segmentation architectures, on RGB alone as well as on RGB plus Digital Surface Model (DSM) height data. The results yield close to 63% mIoU and close to 80% pixel accuracy on validation data without using the DSM height data. This thesis additionally contributes a novel approach for large-area gold-standard creation from existing object-labeled datasets.
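For reference, the two figures quoted above (mIoU and pixel accuracy) can both be computed from a confusion matrix; a short NumPy sketch follows, with the class count chosen arbitrarily for illustration.

```python
import numpy as np

def confusion(pred, target, num_classes):
    # Rows are ground-truth classes, columns are predicted classes.
    mask = (target >= 0) & (target < num_classes)
    return np.bincount(num_classes * target[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_and_pixel_acc(cm):
    tp = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - tp
    iou = tp / np.maximum(union, 1)     # guard against empty classes
    return iou.mean(), tp.sum() / cm.sum()

pred = np.random.randint(0, 5, 65536)    # flattened predicted label map
target = np.random.randint(0, 5, 65536)  # flattened ground-truth label map
miou, acc = miou_and_pixel_acc(confusion(pred, target, 5))
print(f"mIoU={miou:.3f}, pixel accuracy={acc:.3f}")
```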
606

Efficient Multi-Object Tracking On Unmanned Aerial Vehicle

Xiao Hu (12469473) 27 April 2022
Multi-object tracking has been well studied in the field of computer vision. Meanwhile, advances in Unmanned Aerial Vehicle (UAV) technology, together with the flexibility and accessibility of UAVs, have drawn research attention to deploying multi-object tracking on UAVs. Conventional solutions usually adopt the "tracking-by-detection" paradigm, in which tracking is achieved by detecting objects in consecutive frames and then associating them with re-identification. However, the dynamic background, crowded small objects, and limited computational resources make multi-object tracking on UAVs more challenging. Energy-efficient multi-object tracking solutions for drone-captured video are critically demanded by the research community.

To stimulate innovation in both industry and academia, we organized the 2021 Low-Power Computer Vision Challenge with a UAV Video track focusing on multi-class multi-object tracking on customized UAV video. This thesis analyzes the qualified submissions of 17 different teams and provides a detailed analysis of the best solution. Methods and future directions for energy-efficient AI and computer vision research are discussed. The solutions and insights presented in this thesis are expected to facilitate future research and applications in the field of low-power vision on UAVs.

With the knowledge gathered from the submissions, an optical-flow-oriented multi-object tracking framework, named OF-MOT, is proposed to address a similar problem on a more realistic drone-captured video dataset. OF-MOT uses the motion information of each object detected in the previous frame to detect objects in the current frame, then applies a customized object tracker using the motion information to associate the detected instances. OF-MOT is evaluated on a drone-captured video dataset and achieves 24 FPS with 17% accuracy on a modern GPU (Titan X), showing that optical flow can effectively improve multi-object tracking.

Both the competition results analysis and OF-MOT provide insights and experimental results on deploying multi-object tracking on UAVs. We hope these findings will facilitate future research and applications in the field of UAV vision.
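OF-MOT's core idea, propagating the previous frame's boxes with optical flow and associating detections by overlap, can be sketched as follows. This is a schematic reading of the abstract, not the authors' implementation; the Farneback parameters, greedy matching, and IoU threshold are assumptions.

```python
import cv2
import numpy as np

def propagate(boxes, prev_gray, gray):
    # Dense Farneback flow; shift each box by the mean flow inside it.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    moved = []
    for box in boxes:
        x1, y1, x2, y2 = map(int, box)
        dx, dy = flow[y1:y2, x1:x2].reshape(-1, 2).mean(axis=0)
        moved.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return moved

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / max(area(a) + area(b) - inter, 1e-6)

def associate(predicted, detections, thresh=0.3):
    # Greedy matching: each propagated track takes its best-overlapping detection.
    matches = []
    for ti, p in enumerate(predicted):
        scores = [iou(p, d) for d in detections]
        if scores and max(scores) > thresh:
            matches.append((ti, int(np.argmax(scores))))
    return matches
```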
607

TREE-BASED UNIDIRECTIONAL NEURAL NETWORKS FOR LOW-POWER COMPUTER VISION ON EMBEDDED DEVICES

Abhinav Goel (12468279) 27 April 2022
Deep Neural Networks (DNNs) are a class of machine learning algorithms that are widely successful in various computer vision tasks. DNNs filter input images and videos with many convolution operations in each layer to extract high-quality features and achieve high accuracy. Although highly accurate, state-of-the-art DNNs usually require server-grade GPUs, and are too energy-, computation- and memory-intensive to be deployed on most devices. This is a significant problem because billions of mobile and embedded devices that do not contain GPUs are now equipped with high-definition cameras. Running DNNs locally on these devices enables applications such as emergency response and safety monitoring, because data cannot always be offloaded to the Cloud due to latency, privacy, or network bandwidth constraints.

Prior research has shown that a considerable number of a DNN's memory accesses and computations are redundant when performing computer vision tasks. Eliminating these redundancies will enable faster and more efficient DNN inference on low-power embedded devices. To reduce these redundancies and thereby reduce the energy consumption of DNNs, this thesis proposes a novel Tree-based Unidirectional Neural Network (TRUNK) architecture. Instead of a single large DNN, multiple small DNNs in the form of a tree work together to perform computer vision tasks. The TRUNK architecture first finds the similarity between different object categories. Similar object categories are grouped into clusters. Similar clusters are then grouped into a hierarchy, creating a tree. The small DNNs at every node of TRUNK classify between different clusters. During inference, for an input image, once a DNN selects a cluster, another DNN further classifies among the children of that cluster (sub-clusters). The DNNs associated with other clusters are not used during the inference of that image. By doing so, only a small subset of the DNNs is used during inference, thus reducing redundant operations, memory accesses, and energy consumption. Since each intermediate classification reduces the search space of possible object categories in the image, the small efficient DNNs still achieve high accuracy.

In this thesis, we identify the computer vision applications and scenarios that are well suited for the TRUNK architecture. We develop methods to use TRUNK to improve the efficiency of the image classification, object counting, and object re-identification problems. We also present methods to adapt the TRUNK structure for different embedded/edge application contexts with different system architectures, accuracy requirements, and hardware constraints.

Experiments with TRUNK on several image datasets reveal the effectiveness of the proposed solution: memory requirement is reduced by ~50%, inference time by ~65%, energy consumption by ~65%, and the number of operations by ~45% when compared with existing DNN architectures. These experiments are conducted on consumer-grade embedded systems: NVIDIA Jetson Nano, Raspberry Pi 3, and Raspberry Pi Zero. The TRUNK architecture has only marginal losses in accuracy when compared with state-of-the-art DNNs.
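The inference procedure described above, descending a tree so that only the small DNNs on one root-to-leaf path ever execute, can be condensed into a sketch. The node structure below is a hypothetical illustration, not the thesis' code.

```python
import torch

class TrunkNode:
    def __init__(self, model, children=None, label=None):
        self.model = model            # small DNN choosing among this node's clusters
        self.children = children or []
        self.label = label            # at a leaf: the final object category

def trunk_infer(node, image):
    # Only nodes on the chosen path run; sibling DNNs are skipped entirely,
    # which is where the memory-access and energy savings come from.
    while node.children:
        with torch.no_grad():
            cluster = node.model(image).argmax(dim=1).item()
        node = node.children[cluster]
    return node.label
```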
608

Multi-Agent Neural Rearrangement Planning of Objects in Cluttered Environments

Vivek Gupta (16642227) 27 July 2023
Object rearrangement is a fundamental problem in robotics with various practical applications ranging from managing warehouses to cleaning and organizing home kitchens. While existing research has primarily focused on single-agent solutions, real-world scenarios often require multiple robots to work together on rearrangement tasks. We propose a comprehensive learning-based framework for multi-agent object rearrangement planning, addressing the challenges of task sequencing and path planning in complex environments. The proposed method iteratively selects objects, determines their relocation regions, and pairs them with available robots under kinematic feasibility and task reachability for execution to achieve the target arrangement. Our experiments on a diverse range of environments demonstrate the effectiveness and robustness of the proposed framework. Furthermore, results indicate improved performance in terms of traversal time and success rate compared to baseline approaches. The videos and supplementary material are available at https://sites.google.com/view/maner-supplementary
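The iterative loop the abstract outlines can be written schematically as below; every helper is a hypothetical stand-in for the learned or kinematic components named in the abstract, and a real planner would need the replanning and termination checks this sketch omits.

```python
def rearrange(objects, robots, target_arrangement):
    pending = list(objects)
    while pending:
        obj = select_next_object(pending, target_arrangement)   # learned selector
        region = select_region(obj, target_arrangement)         # learned region proposal
        candidates = [r for r in robots
                      if kinematically_feasible(r, obj, region)
                      and task_reachable(r, obj)]
        if not candidates:
            pending.remove(obj)
            pending.append(obj)   # defer: retry once other objects have moved
            continue
        robot = min(candidates, key=lambda r: path_cost(r, obj, region))
        execute_pick_and_place(robot, obj, region)              # execution step
        pending.remove(obj)
```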
609

Fine-Grained Bayesian Zero-Shot Object Recognition

Sarkhan Badirli (11820785) 03 January 2022
Building machine learning algorithms to recognize objects in real-world tasks is a very challenging problem. With an increasing number of classes, it becomes very costly and impractical to collect samples for all classes to obtain exhaustive training data. This limited-labeled-data bottleneck manifests itself most profoundly among fine-grained object classes, where some classes may lack any labeled representatives in the training data. A robust algorithm in this realistic scenario must classify samples from well-represented classes as well as handle samples of unknown origin. In this thesis, we break down this difficult task into more manageable sub-problems and methodically explore novel solutions to address each component in sequential order.

We begin with the zero-shot learning (ZSL) scenario, where classes lacking any labeled images in the training data, i.e., unseen classes, are assumed to have some semantic descriptions associated with them. The ZSL paradigm is motivated by analogy to humans' learning process: we human beings can recognize new categories by just knowing some semantic descriptions of them, without ever seeing an instance from these categories. We develop a novel hierarchical Bayesian classifier for the ZSL task. The two-layer architecture of the model is specifically designed to exploit the implicit hierarchy present among classes, particularly evident in fine-grained datasets. In the proposed method, latent classes define the class hierarchy in the image space, and semantic information is used to build the Bayesian hierarchy around these meta-classes. Our Bayesian model imposes local priors on semantically similar classes that share the same meta-class to realize knowledge transfer. We finally derive posterior predictive distributions to reconcile information about local and global priors and then blend them with the data likelihood for the final likelihood calculation. With its closed-form solution, our two-layer hierarchical classifier proves to be fast in training and flexible enough to model both fine- and coarse-grained datasets. In particular, for challenging fine-grained datasets the proposed model can leverage the large number of seen classes for better local prior estimation without sacrificing seen-class accuracy. Side information plays a critical role in ZSL, and ZSL models typically rest on the strong assumption that the side information is strongly correlated with image features. Our model uses side information only to build the hierarchy, so no explicit correlation with image features is assumed. This in turn makes the Bayesian model very resilient to various side information sources, as long as they are discriminative enough to define the class hierarchy.

When dealing with thousands of classes, it becomes very difficult to obtain semantic descriptions for fine-grained classes. For example, in species classification, where classes display very similar morphological traits, it is impractical if not impossible to derive characteristic visual attributes that can distinguish thousands of classes. Moreover, it would be unrealistic to assume that an exhaustive list of visual attributes characterizing all object classes, both seen and unseen, can be determined based only on seen classes. We propose DNA as side information to overcome this obstacle for fine-grained zero-shot species classification. We demonstrate that 658-base-pair DNA barcodes are sufficient to serve as a robust source of side information for a newly compiled insect dataset with more than a thousand classes. The experiments are further validated on the well-known CUB dataset, on which DNA attributes prove to be as competitive as word vectors. Our proposed Bayesian classifier delivers state-of-the-art results on both datasets while using DNA as side information.

The traditional ZSL framework, however, is not quite suitable for scalable species identification and discovery. For example, insects are one of the largest groups of the animal kingdom, with an estimated 5.5 million species, yet only 20% of them are described. We extend traditional ZSL into a more practical framework in which no explicit side information is available for unseen classes. We transform our Bayesian model to utilize the taxonomical hierarchy of species to perform insect identification at scale. Our approach is the first to combine two different data modalities, namely image and DNA information, to perform insect identification with more than a thousand classes. Our algorithm not only classifies known species with an impressive 97% accuracy but also identifies unknown species and classifies them to their true genus with 81% accuracy.

Our approach has the ability to address major societal issues arising from climate change, such as shifting insect distributions, and to measure biodiversity across the world. We believe this work can pave the way for more precise and, more importantly, scalable monitoring of biodiversity, and can become instrumental in offering objective measures of the impacts of the recent changes our planet has been going through.
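The hierarchy-building step can be pictured with a short sketch: cluster the seen classes into meta-classes in image-feature space, then attach each unseen class to a meta-class through its semantic (here, DNA-barcode) similarity to seen classes. The priors and posterior predictive derivation that make the model Bayesian are omitted, and all array shapes and the cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_hierarchy(seen_means, seen_semantics, unseen_semantics, n_meta=20):
    # seen_means: (S, d) per-class image-feature means
    # seen_semantics: (S, k), unseen_semantics: (U, k) side-information vectors
    meta = KMeans(n_clusters=n_meta, n_init=10).fit(seen_means)
    seen_meta = meta.labels_
    unseen_meta = []
    for u in unseen_semantics:
        # The nearest seen class in semantic space decides the meta-class,
        # so side information is used only to place classes in the hierarchy.
        nearest = int(np.argmin(np.linalg.norm(seen_semantics - u, axis=1)))
        unseen_meta.append(seen_meta[nearest])
    return seen_meta, np.array(unseen_meta)
```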
610

Human Detection, Tracking and Segmentation in Surveillance Video

Shu, Guang 01 January 2014
This dissertation addresses the problem of human detection and tracking in surveillance videos. Even though this is a well-explored topic, many challenges remain when confronted with data from real-world situations. These challenges include appearance variation, illumination changes, camera motion, cluttered scenes, and occlusion. In this dissertation, several novel methods are proposed for improving on the current state of human detection and tracking by learning scene-specific information from video feeds.

Firstly, we propose a novel method for human detection which employs unsupervised learning and superpixel segmentation. The performance of generic human detectors is usually degraded in unconstrained video environments due to varying lighting conditions, backgrounds, and camera viewpoints. To handle this problem, we employ an unsupervised learning framework that improves the detection performance of a generic detector when it is applied to a particular video. In our approach, a generic DPM human detector is employed to collect initial detection examples. These examples are segmented into superpixels and then represented using a Bag-of-Words (BoW) framework. The superpixel-based BoW feature encodes useful color features of the scene, which provides additional information. Finally, a new scene-specific classifier is trained using the BoW features extracted from the new examples. Compared to previous work, our method learns scene-specific information through superpixel-based features, so it can avoid many false detections typically produced by a generic detector. We demonstrate a significant improvement in the performance of the state-of-the-art detector.

Given robust human detection, we propose a robust multiple-human tracking framework using a part-based model. Human detection using part models has become quite popular, yet its extension to tracking has not been fully explored. Single-camera multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. We address such problems by developing an online-learning tracking-by-detection method. Our approach learns part-based, person-specific Support Vector Machine (SVM) classifiers which capture the articulations of moving human bodies against dynamically changing backgrounds. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection, leading to a significant improvement in detection performance in cluttered scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions and prevent the performance of the classifiers from being degraded. Extensive experiments on several challenging sequences demonstrate state-of-the-art performance in multiple-people tracking.

Next, in order to obtain precise boundaries of humans, we propose a novel method for multiple-human segmentation in videos by incorporating human detection and part-based detection potentials into a multi-frame optimization framework. In the first stage, after obtaining the superpixel segmentation for each detection window, we separate the superpixels corresponding to a human from the background by minimizing an energy function using a Conditional Random Field (CRF). We use the part detection potentials from the DPM detector, which provide useful information about human shape. In the second stage, the spatio-temporal constraints of the video are leveraged to build a tracklet-based Gaussian Mixture Model for each person, and the boundaries are smoothed by multi-frame graph optimization. Compared to previous work, our method can automatically segment multiple people in videos with accurate boundaries, and it is robust to camera motion. Experimental results show that our method achieves better segmentation accuracy than previous methods on several challenging video sequences.

Most work in computer vision deals with point solutions: a specific algorithm for a specific problem. Putting different algorithms together into one integrated real-world system, however, is a big challenge. Finally, we introduce an efficient tracking system, NONA, for high-definition surveillance video. We implement the system using a multi-threaded architecture (Intel Threading Building Blocks (TBB)), which executes video ingestion, tracking, and video output in parallel. To improve tracking accuracy without sacrificing efficiency, we employ several useful techniques: Adaptive Template Scaling handles the scale change due to objects moving towards a camera, while Incremental Searching and Local Frame Differencing resolve challenging issues such as scale change, occlusion, and cluttered backgrounds. We tested our tracking system on a high-definition video dataset and achieved acceptable tracking accuracy while maintaining real-time performance.
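The superpixel Bag-of-Words step in the first method can be sketched as follows; the segment count, vocabulary size, and the mean-RGB superpixel descriptor are illustrative assumptions rather than the dissertation's settings.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def superpixel_features(window):
    # Segment a detection window into superpixels; describe each by mean color.
    labels = slic(window, n_segments=50, compactness=10)
    return np.array([window[labels == i].mean(axis=0)
                     for i in np.unique(labels)])

def bow_histogram(window, vocabulary):
    # Quantize superpixel descriptors against the vocabulary, then histogram.
    words = vocabulary.predict(superpixel_features(window))
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1)    # L1-normalized BoW vector

# The vocabulary would be learned once from superpixels pooled over the
# generic detector's initial detections, e.g.:
#   vocabulary = KMeans(n_clusters=64).fit(np.vstack(all_superpixel_features))
```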
