Global ETD Search

221	Object Recognition with Progressive Refinement for Collaborative Robots Task Allocation Wu, Wenbo 18 December 2020 (has links) With the rapid development of deep learning techniques, the application of Convolutional Neural Network (CNN) has benefited the task of target object recognition. Several state-of-the-art object detectors have achieved excellent performance on the precision for object recognition. When it comes to applying the detection results for the real world application of collaborative robots, the reliability and robustness of the target object detection stage is essential to support efficient task allocation. In this work, collaborative robots task allocation is based on the assumption that each individual robotic agent possesses specialized capabilities to be matched with detected targets representing tasks to be performed in the surrounding environment which impose specific requirements. The goal is to reach a specialized labor distribution among the individual robots based on best matching their specialized capabilities with the corresponding requirements imposed by the tasks. In order to further improve task recognition with convolutional neural networks in the context of robotic task allocation, this thesis proposes an innovative approach for progressively refining the target detection process by taking advantage of the fact that additional images can be collected by mobile cameras installed on robotic vehicles. The proposed methodology combines a CNN-based object detection module with a refinement module. For the detection module, a two-stage object detector, Mask RCNN, for which some adaptations on region proposal generation are introduced, and a one-stage object detector, YOLO, are experimentally investigated in the context considered. The generated recognition scores serve as input for the refinement module. 
In the latter, the current detection result is considered as the a priori evidence to enhance the next detection for the same target with the goal to iteratively improve the target recognition scores. Both the Bayesian method and the Dempster-Shafer theory are experimentally investigated to achieve the data fusion process involved in the refinement process. The experimental validation is conducted on indoor search-and-rescue (SAR) scenarios and the results presented in this work demonstrate the feasibility and reliability of the proposed progressive refinement framework, especially when the combination of adapted Mask RCNN and D-S theory data fusion is exploited. Object recognition Convolutional neural network Deep learning Machine vision
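The two fusion schemes named in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the class names (`victim`, `debris`), the score values, and the choice to place leftover mass on the whole frame are hypothetical, chosen only to show how an earlier detection can act as prior evidence for the next one under a Bayesian update and under Dempster's rule of combination.

```python
from itertools import product


def bayes_update(prior, likelihood):
    """Posterior over classes: prior times likelihood, renormalized."""
    unnorm = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical recognition scores for the same target seen in two images.
V, D = frozenset({"victim"}), frozenset({"debris"})
EITHER = V | D  # mass left on the whole frame models "not sure"

posterior = bayes_update({"victim": 0.6, "debris": 0.4},
                         {"victim": 0.7, "debris": 0.3})

fused = dempster_combine({V: 0.6, D: 0.3, EITHER: 0.1},
                         {V: 0.7, D: 0.2, EITHER: 0.1})
```

In both cases the belief in the class supported by both views rises above either single score, which is the effect the refinement module relies on. The output of one fusion step can be fed back in as the prior (or first mass function) when a further image arrives.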
222	OBJECT DETECTION IN DEEP LEARNING Haoyu Shi (8100614) 10 December 2019 (has links) With advances in computing and the availability of GPUs (Graphics Processing Units) for mathematical computation, deep learning has become increasingly popular and prevalent. Object detection with deep learning, which is a part of image processing, plays an important role in autonomous driving and computer vision. Object detection comprises object localization and object classification. In object localization, the computer scans the image and returns the coordinates that locate each object. In object classification, the computer assigns each target to a category. The traditional object detection pipeline comes from Fast/Faster R-CNN [32] [58]. A region proposal network generates the regions likely to contain objects and passes them to a classifier. The first step performs object localization and the second performs object classification. This two-step pipeline is costly in time. To address this problem, the You Only Look Once (YOLO) [4] network was introduced. YOLO is a single neural network forming an end-to-end pipeline that processes images at 45 frames per second in real time at prediction. In this thesis, convolutional neural networks are introduced, including state-of-the-art convolutional neural networks from recent years. YOLO implementation details are illustrated step by step. We adopt the YOLO network for our applications because it converges faster in training, provides high accuracy, and has an end-to-end architecture, which makes the network easy to optimize and train. Computer Engineering Deep learning neural network convolution neural network yolo
223	Deep Neural Networks Based Disaggregation of Swedish Household Energy Consumption Bhupathiraju, Praneeth Varma January 2020 (has links) Context: In recent years, household energy consumption has risen to levels that are no longer sustainable, creating a pressing need to use energy more sustainably. One of the main causes of this unsustainable usage is that users are not well acquainted with the energy consumed by the smart appliances (dishwasher, refrigerator, washing machine, etc.) in their households. One remedy is to let household users know how much energy their smart appliances consume. To do so, energy analytics companies must analyze the energy consumed by the smart appliances present in a house. To achieve this, Kelly et al. [7] performed energy disaggregation using deep neural networks and produced good results. Zhang et al. [8] went a step further by improving the deep neural networks proposed by Kelly et al. The task was performed with the non-intrusive load monitoring (NILM) technique. Objectives: The thesis aims to assess the performance of the deep neural networks proposed by Kelly et al. [7] and Zhang et al. [8]. We use deep neural networks to disaggregate dishwasher energy consumption, in the presence of vampire loads such as electric heaters, in a Swedish household setting. We also measure the training time of the proposed deep neural networks. Methods: An intensive literature review is done to identify state-of-the-art deep neural network techniques used for energy disaggregation. All experiments are performed on a dataset provided by the energy analytics company Eliq AB. The data is collected from 4 households in Sweden. All the households contain a vampire load, an electrical heater, whose power consumption can be seen in the main power sensor. 
A separate smart plug is used to collect the dishwasher power consumption data. Each algorithm is trained on data from 2 of the houses, and the remaining two are used for testing. The metrics used for analyzing the algorithms are Accuracy, Recall, Precision, Root mean square error (RMSE), and F1 measure. These metrics help us identify the most suitable algorithm for the disaggregation of dishwasher energy in our case. Results: The results of our study show that the Gated recurrent unit (GRU) performed best compared to the other neural networks in our study: Simple recurrent neural network (SRN), Convolutional Neural Network (CNN), Long short-term memory (LSTM), and Recurrent convolution neural network (RCNN). The Accuracy, RMSE and the F1 score of the GRU algorithm are higher when compared with the other algorithms. Also, if the user disregards F1 score and RMSE as evaluation metrics and considers training time instead, then the Simple recurrent neural network outperforms all the other neural nets with an average training time of 19.34 minutes. Deep learning Non-intrusive load monitoring disaggregation. Computer Systems Datorsystem
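The evaluation metrics listed in this abstract have standard definitions, sketched below in plain Python. The abstract does not say how appliance on/off states were derived from power readings, so the binary labels and wattage values here are hypothetical inputs used only to exercise the formulas.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = appliance on)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1; ratios default to 0.0 if undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical dishwasher on/off states and power readings in watts.
scores = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
error_w = rmse([0.0, 1200.0, 1150.0, 0.0], [100.0, 1100.0, 1150.0, 0.0])
```

Accuracy, precision, recall, and F1 are better when larger, whereas RMSE is an error measure expressed in the units of the signal, so smaller values indicate a closer fit.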
224	Applicability of deep learning for mandibular growth prediction Jiwa, Safeer 29 July 2020 (has links) OBJECTIVES: Cephalometric analysis is a tool used in orthodontics for craniofacial growth assessment. Magnitude and direction of mandibular growth pose challenges that may impede successful orthodontic treatment. Accurate growth prediction enables the practitioner to improve diagnostics and orthodontic treatment planning. Deep learning provides a novel method due to its ability to analyze massive quantities of data. We compared the growth prediction capabilities of a novel deep learning algorithm with an industry-standard method. METHODS: Using OrthoDx™, 17 mandibular landmarks were plotted on selected serial cephalograms of 101 growing subjects, obtained from the Forsyth Moorrees Twin Study. The Deep Learning Algorithm (DLA) was trained for a 2-year prediction with 81 subjects. X/Y coordinates of initial and final landmark positions were inputted into a multilayer perceptron that was trained to improve its growth prediction accuracy over several iterations. These parameters were then used on 20 test subjects and compared to the ground truth landmark locations to compute the accuracy. The 20 subjects’ growth was also predicted using Ricketts’s growth prediction (RGP) in Dolphin Imaging™ 11.9 and compared to the ground truth. Mean Absolute Error (MAE) of Ricketts and DLA were then compared to each other, and human landmark detection error used as a clinical reference mean (CRM). RESULTS: The 2-year mandibular growth prediction MAE was 4.21mm for DLA and 3.28mm for RGP. DLA’s error for skeletal landmarks was 2.11x larger than CRM, while RGP was 1.78x larger. For dental landmarks, DLA was 2.79x, and Ricketts was 1.73x larger than CRM. CONCLUSIONS: DLA is currently not on par with RGP for a 2-year growth prediction. However, an increase in data volume and increased training may improve DLA’s prediction accuracy. 
Regardless, significant future improvements to all growth prediction methods would more accurately assess growth from lateral cephalograms and improve orthodontic diagnoses and treatment plans. Dentistry Cephalometry Craniofacial Deep learning Growth prediction Mandible Orthodontics
225	Attributed Multi-Relational Attention Network for Fact-checking URL Recommendation You, Di 06 June 2019 (has links) To combat fake news, researchers mostly focused on detecting fake news and journalists built and maintained fact-checking sites (e.g., Snopes.com and Politifact.com). However, fake news dissemination has been greatly promoted by social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, in this thesis, we propose a deep-learning based fact-checking URL recommender system to mitigate impact of fake news in social media sites such as Twitter and Facebook. In particular, our proposed framework consists of a multi-relational attentive module and a heterogeneous graph attention network to learn complex/semantic relationship between user-URL pairs, user-user pairs, and URL-URL pairs. Extensive experiments on a real-world dataset show that our proposed framework outperforms seven state-of-the-art recommendation models, achieving at least 3~5.3% improvement. deep learning fact-checking graph neural network recommender system
226	Representation learning for single cell morphological phenotyping / Representationsinlärning för morfologisk fenotypning av enskilda celler Nenner, Andreas January 2022 (has links) Preclinical research for developing new drugs is a long and expensive procedure. Experiments relying on image acquisition and analysis tend to be low throughput and use reporter systems that may influence the studied cells. With image-based assays focusing on extracting qualitative information from microscopic images of mammalian cells, more cost-efficient and high-throughput analyses are possible. Furthermore, studying cell morphology has proven to be a good indicator of cell phenotype. Using hand-crafted feature descriptors based on cell morphology, label-free quantification of cell apoptosis has been achieved. These hand-crafted descriptors are based on cell characteristics translated to quantifiable metrics, but risk being biased towards easily observable features and therefore miss subtle ones. This project proposes an alternative approach by generating a latent representation of cell features using deep learning models and aims to find if they can compete with pre-defined hand-crafted representations in classifying live or dead cells. For this purpose, three deep learning models are implemented, one autoencoder and two variational-autoencoder. We develop a core architecture shared between the models based on a convolutional neural network using a latent space with 16 dimensions. We then train the models to recreate single-cell images of SKOV3 ovarian cancer cells. The latent representation was extracted at specific checkpoints during training and later used for training a logistic regression classifier. Finally, comparing classification accuracy between the hand-crafted feature representations and generated representation was made with novel cell images. 
The generated representations show a slight but consistent increase in classification accuracy, up to 4.9 percent points, even without capturing all morphological details in the recreation. Thus, we conclude that it is possible for generated representations to outperform hand-crafted feature descriptors in live or dead cell classification. Deep learning Image-based profiling Computer Sciences Datavetenskap (datalogi)
227	Multi-Source and Source-Private Cross-Domain Learning For Visual Recognition Peng, Qucheng 05 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Domain adaptation is one of the most active directions for addressing the annotation insufficiency problem in deep learning. General domain adaptation is not consistent with practical scenarios in industry. In this thesis, we focus on two concerns. The first is that labeled data are generally collected from multiple domains; in other words, multi-source adaptation is the more common situation. Simply extending single-source approaches to multi-source cases can cause sub-optimal inference, so specialized multi-source adaptation methods are essential. The main challenge in the multi-source scenario is a more complex divergence situation: not only does the divergence between the target and each source play a role, but the divergences among distinct sources matter as well. However, the significance of maintaining consistency among multiple sources has not gained enough attention in previous work. In this thesis, we propose an Enhanced Consistency Multi-Source Adaptation (EC-MSA) framework to address it from three perspectives. First, we mitigate feature-level discrepancy by cross-domain conditional alignment, narrowing the divergence between each source and the target domain class by class. Second, we enhance multi-source consistency via dual mix-up, diminishing the disagreements among different sources. Third, we deploy a target distilling mechanism to handle the uncertainty of target prediction, aiming to provide high-quality pseudo-labeled target samples to benefit the previous two aspects. Extensive experiments are conducted on several common benchmark datasets and demonstrate that our model outperforms the state-of-the-art methods. The second concern is that data privacy and security are necessary in practice. That is, we hope to keep the raw data stored locally while still obtaining a satisfactory model. 
In such a case, the risk of data leakage greatly decreases. Therefore, it is natural for us to combine the federated learning paradigm with domain adaptation. Under the source-private setting, the main challenge is to expose information from the source domain to the target domain while making sure that the communication process is safe enough. In this thesis, we propose a method named Fourier Transform-Assisted Federated Domain Adaptation (FTA-FDA) to alleviate the difficulties in two ways. We apply the Fast Fourier Transform to the raw data and transfer only the amplitude spectra during communication. Frequency-space interpolations between the two domains are then conducted, minimizing the discrepancies while keeping the domains in contact and the raw data safe. In addition, we perform prototype alignment using the model weights together with target features, aiming to reduce the discrepancy at the class level. Experiments on Office-31 demonstrate the effectiveness and competitiveness of our approach, and further analyses show that our algorithm can help protect privacy and security. Transfer learning Domain adaptation Deep learning Machine learning Image classification
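The amplitude-only exchange described in this abstract can be sketched on a toy 1-D signal. This is an illustration under stated assumptions rather than the FTA-FDA method itself: it uses a naive discrete Fourier transform instead of an FFT, works on short lists instead of images, and assumes the receiving side keeps its own phase and linearly interpolates amplitudes with a hypothetical mixing ratio `lam`.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for a toy signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def amplitude(x):
    """Amplitude spectrum: the only quantity that leaves the data owner."""
    return [abs(c) for c in dft(x)]


def mix_amplitude(local_signal, remote_amplitude, lam=0.5):
    """Interpolate amplitudes in frequency space while keeping the local phase.

    lam = 0 returns the local signal unchanged; lam = 1 uses only the remote
    amplitude. Which side keeps its phase is an assumption of this sketch.
    """
    mixed = []
    for c, a_remote in zip(dft(local_signal), remote_amplitude):
        a = (1.0 - lam) * abs(c) + lam * a_remote
        mixed.append(cmath.rect(a, cmath.phase(c)))
    return [v.real for v in idft(mixed)]


# Hypothetical samples from two domains; only amplitude(remote) is shared.
local = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5, -0.5]
remote = [0.2, 0.1, 0.4, 0.3, 0.9, 0.8, 0.6, 0.7]
shared = amplitude(remote)
blended = mix_amplitude(local, shared, lam=0.5)
```

Because the phase never leaves its owner, the remote raw signal cannot be rebuilt from the shared amplitude spectrum alone, which is the intuition behind transferring only amplitudes.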
228	An Investigation of Scale Factor in Deep Networks for Scene Recognition Qiao, Zhinan 05 1900 (has links) Is there a significant difference in the design of deep networks for the tasks of classifying object-centric images and scenery images? How to design networks that extract the most representative features for scene recognition? To answer these questions, we design studies to examine the scales and richness of image features for scenery image recognition. Three methods are proposed that integrate the scale factor to the deep networks and reveal the fundamental network design strategies. In our first attempt to integrate scale factors into the deep network, we proposed a method that aggregates both the context and multi-scale object information of scene images by constructing a multi-scale pyramid. In our design, integration of object-centric multi-scale networks achieved a performance boost of 9.8%; integration of object- and scene-centric models obtained an accuracy improvement of 5.9% compared with single scene-centric models. We also exploit bringing the attention scheme to the deep network and proposed a Scale Attentive Network (SANet). The SANet streamlines the multi-scale scene recognition pipeline, learns comprehensive scene features at various scales and locations, addresses the inter-dependency among scales, and further assists feature re-calibration as well as the aggregation process. The proposed network achieved a Top-1 accuracy increase by 1.83% on Place365 standard dataset with only 0.12% additional parameters and 0.24% additional GFLOPs using ResNet-50 as the backbone. We further bring the scale factor implicitly into network backbone design by proposing a Deep-Narrow Network and Dilated Pooling module. The Deep-narrow architecture increased the depth of the network as well as decreased the width of the network, which uses a variety of receptive fields by stacking more layers. 
We further proposed a Dilated Pooling module which expanded the pooling scope and made use of multi-scale features in the pooling operation. By embedding the Dilated Pooling into Deep-Narrow Network, we obtained a Top-1 accuracy boost of 0.40% using less than half of the GFLOPs and parameters compared to benchmark ResNet-50. deep learning computer vision scene recognition multi-scale
229	A Comparative Study of Routing Methods in Capsule Networks Malmgren, Christoffer January 2019 (has links) Recently, the deep neural network structure caps-net was proposed by Sabour et al. [11]. Capsule networks are designed to learn relative geometry between the features of a layer and the features of the next layer. The Capsule network’s main building blocks are capsules, which are represented by vectors. The idea is that each capsule will represent a feature as well as traits or subfeatures of that feature. This allows for smart information routing. Capsule traits are used to predict the traits of the capsules in the next layer, and information is sent to the next-layer capsules on which the predictions agree. This is called routing by agreement. This thesis investigates theoretical support of new and existing routing algorithms as well as evaluates their performance on the MNIST [16] and CIFAR-10 [8] datasets. A variation of the dynamic routing algorithm presented in the original paper [11] achieved the highest accuracy and fastest execution time. Computer Vision Deep Learning Capsule Networks Signal Processing Signalbehandling
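Routing by agreement, as summarized in this abstract, can be sketched in a few lines of plain Python. The sketch follows the general shape of the dynamic routing procedure from the original capsule paper (softmax coupling coefficients, a squashing nonlinearity, agreement measured by a dot product). It is not the variant evaluated in the thesis, and the prediction vectors in the example are hypothetical.

```python
import math


def squash(s):
    """Shrink a vector so its length lies in [0, 1) while keeping direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is lower capsule i's prediction vector for upper capsule j.

    Returns the upper capsule outputs and the coupling coefficients that
    produced them.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    for _ in range(iterations):
        # Each lower capsule distributes its output over the upper capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_out):
            s = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            outputs.append(squash(s))
        # Predictions that agree with an upper capsule's output gain weight.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(u_hat[i][j][d] * outputs[j][d]
                                    for d in range(dim))
    return outputs, coupling


# Three lower capsules: predictions for upper capsule 0 agree,
# predictions for upper capsule 1 point in different directions.
predictions = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[0.9, 0.1], [0.0, 1.0]],
]
outputs, coupling = dynamic_routing(predictions)
```

In this example the agreeing predictions reinforce each other, so upper capsule 0 ends up with the longer output vector and receives most of the coupling weight from every lower capsule.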
230	Scene Understanding for Mobile Robots exploiting Deep Learning Techniques Rangel, José Carlos 05 September 2017 (has links) Every day robots are becoming more common in society. Consequently, they must have certain basic skills in order to interact with humans and the environment. One of these skills is the capacity to understand the places where they are able to move. Computer vision is one of the ways commonly used for achieving this purpose. Current technologies in this field offer outstanding solutions that improve data quality every day, therefore producing more accurate results in the analysis of an environment. With this in mind, the main goal of this research is to develop and validate an efficient object-based scene understanding method that will be able to help solve problems related to scene identification for mobile robotics. We seek to analyze state-of-the-art methods to find the most suitable one for our goals, as well as to select the kind of data most convenient for dealing with this issue. Another primary goal of the research is to determine the most suitable data input for analyzing scenes in order to find an accurate representation for the scenes by means of semantic labels or point cloud feature descriptors. As a secondary goal we will show the benefits of using semantic descriptors generated with pre-trained models for mapping and scene classification problems, as well as the use of deep learning models in conjunction with 3D feature description procedures to build a 3D object classification model that is directly related to the representation goal of this work. The research described in this thesis was motivated by the need for a robust system capable of understanding the locations where a robot usually interacts. 
In the same way, the advent of better computational resources has made it possible to implement some already defined techniques that demand high computational capacity and that offer a possible solution for dealing with scene understanding issues. One of these techniques is Convolutional Neural Networks (CNNs). These networks have the capacity to classify an image based on its visual appearance. They then generate a list of lexical labels and the probability for each label, representing the likelihood of the presence of an object in the scene. Labels are derived from the training sets that the networks learned to recognize. Therefore, we could use this list of labels and probabilities as an efficient representation of the environment and then assign a semantic category to the regions where a mobile robot is able to navigate, and at the same time construct a semantic or topological map based on this semantic representation of the place. After analyzing the state of the art in Scene Understanding, we identified a set of approaches in order to develop a robust scene understanding procedure. Among these approaches we identified an almost unexplored gap in the topic of understanding scenes based on the objects present in them. Consequently, we propose to perform an experimental study of this approach aimed at finding a way of fully describing a scene considering the objects lying in place. As the Scene Understanding task involves object detection and annotation, one of the first steps is to determine the kind of data to use as input data in our proposal. With this in mind, our proposal considers evaluating the use of 3D data. This kind of data suffers from the presence of noise; therefore, we propose to use the Growing Neural Gas (GNG) algorithm to reduce the effect of noise in the object recognition procedure. GNGs have the capacity to grow and adapt their topology to represent 2D information, producing a smaller representation with a slight noise influence from the input data. 
Applied to 3D data, the GNG presents a good approach able to cope with noise. However, using 3D data poses a set of problems such as the lack of a 3D object dataset with enough models to generalize methods and adapt them to real situations, as well as the fact that processing three-dimensional data is computationally expensive and requires a huge storage space. These problems led us to explore new approaches for developing object recognition tasks. Therefore, considering the outstanding results obtained by the CNNs in the latest ImageNet challenge, we propose to carry out an evaluation of the former as an object detection system. These networks were initially proposed in the 90s and are nowadays easily implementable due to hardware improvements in recent years. CNNs have shown satisfying results when tested on problems such as detection of objects, pedestrians, and traffic signals, sound wave classification, and medical image processing, among others. Moreover, an added value of CNNs is the semantic description capability provided by the categories/labels that the network is able to identify and that could be translated into a semantic explanation of the input image. Consequently, we propose using the evaluation of these semantic labels as a scene descriptor for building a supervised scene classification model. Having said that, we also propose using semantic descriptors to generate topological maps and test the description capabilities of lexical labels. In addition, semantic descriptors could be suitable for unsupervised place or environment labeling, so we propose using them to deal with this kind of problem in order to achieve a robust scene labeling method. Finally, for tackling the object recognition problem we propose to develop an experimental study for unsupervised object labeling. This will be applied to the objects present in a point cloud and labeled using a lexical labeling tool. 
Then, objects will be used as the training instances of a classifier, mixing their 3D features with the label assigned by the external tool. Scene Understanding Deep Learning Robotics
