  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

South African Sign Language Hand Shape and Orientation Recognition on Mobile Devices Using Deep Learning

Jacobs, Kurt January 2017 (has links)
Magister Scientiae - MSc / In order to classify a South African Sign Language signed gesture, five fundamental parameters need to be considered: hand shape, hand orientation, hand motion, hand location and facial expressions. The research in this thesis utilises Deep Learning techniques, specifically Convolutional Neural Networks, to recognise hand shapes in various hand orientations. It focuses on two of the five fundamental parameters, recognising six South African Sign Language hand shapes in each of five different hand orientations. These hand shape and orientation combinations are recognised from a video stream captured on a mobile device. The efficacy of Convolutional Neural Networks for gesture recognition is judged with respect to classification accuracy and classification speed in both a desktop and an embedded context. The research methodology employed was Design Science Research, a set of analytical techniques and perspectives for performing research in Information Systems and Computer Science; it necessitates the design of an artefact and the analysis thereof in order to better understand its behaviour in that context. / National Research Foundation (NRF)
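The classification task this abstract describes — a CNN mapping a hand image to one of 6 shapes × 5 orientations = 30 classes — can be sketched as a minimal forward pass. This is an illustrative toy, not the thesis's actual network: the kernel count, input resolution, and single-layer architecture are assumptions.

```python
import numpy as np

N_SHAPES, N_ORIENTATIONS = 6, 5
N_CLASSES = N_SHAPES * N_ORIENTATIONS  # 30 shape/orientation combinations

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(image, kernels, weights, bias):
    """One conv layer -> ReLU -> global average pool -> linear -> softmax."""
    feats = np.array([np.maximum(conv2d_valid(image, k), 0.0).mean()
                      for k in kernels])
    return softmax(weights @ feats + bias)
```

A real system would stack several such layers and learn the kernels and weights by backpropagation; here they would simply be supplied.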
162

Learning visually grounded meaning representations

Silberer, Carina Helga January 2015 (has links)
Humans possess a rich semantic knowledge of words and concepts which captures the perceivable physical properties of their real-world referents and their relations. Encoding this knowledge or some of its aspects is the goal of computational models of semantic representation and has been the subject of considerable research in cognitive science, natural language processing, and related areas. Existing models have placed emphasis on different aspects of meaning, depending ultimately on the task at hand. Typically, such models have been used in tasks addressing the simulation of behavioural phenomena, e.g., lexical priming or categorisation, as well as in natural language applications, such as information retrieval, document classification, or semantic role labelling. A major strand of research popular across disciplines focuses on models which induce semantic representations from text corpora. These models are based on the hypothesis that the meaning of words is established by their distributional relation to other words (Harris, 1954). Despite their widespread use, distributional models of word meaning have been criticised as ‘disembodied’ in that they are not grounded in perception and action (Perfetti, 1998; Barsalou, 1999; Glenberg and Kaschak, 2002). This lack of grounding contrasts with many experimental studies suggesting that meaning is acquired not only from exposure to the linguistic environment but also from our interaction with the physical world (Landau et al., 1998; Bornstein et al., 2004). This criticism has led to the emergence of new models aiming at inducing perceptually grounded semantic representations. Essentially, existing approaches learn meaning representations from multiple views corresponding to different modalities, i.e. linguistic and perceptual input. To approximate the perceptual modality, previous work has relied largely on semantic attributes collected from humans (e.g., is round, is sour), or on automatically extracted image features. 
Semantic attributes have a long-standing tradition in cognitive science and are thought to represent salient psychological aspects of word meaning including multisensory information. However, their elicitation from human subjects limits the scope of computational models to a small number of concepts for which attributes are available. In this thesis, we present an approach which draws inspiration from the successful application of attribute classifiers in image classification, and represent images and the concepts depicted by them by automatically predicted visual attributes. To this end, we create a dataset comprising nearly 700K images and a taxonomy of 636 visual attributes and use it to train attribute classifiers. We show that their predictions can act as a substitute for human-produced attributes without any critical information loss. In line with the attribute-based approximation of the visual modality, we represent the linguistic modality by textual attributes which we obtain with an off-the-shelf distributional model. Having first established this core contribution of a novel modelling framework for grounded meaning representations based on semantic attributes, we show that these can be integrated into existing approaches to perceptually grounded representations. We then introduce a model which is formulated as a stacked autoencoder (a variant of multilayer neural networks), which learns higher-level meaning representations by mapping words and images, represented by attributes, into a common embedding space. In contrast to most previous approaches to multimodal learning using different variants of deep networks and data sources, our model is defined at a finer level of granularity—it computes representations for individual words and is unique in its use of attributes as a means of representing the textual and visual modalities. 
We evaluate the effectiveness of the representations learnt by our model by assessing its ability to account for human behaviour on three semantic tasks, namely word similarity, concept categorisation, and typicality of category members. With respect to the word similarity task, we focus on the model’s ability to capture similarity in both the meaning and appearance of the words’ referents. Since existing benchmark datasets on word similarity do not distinguish between these two dimensions and often contain abstract words, we create a new dataset in a large-scale experiment where participants are asked to give two ratings per word pair expressing their semantic and visual similarity, respectively. Experimental results show that our model learns meaningful representations which are more accurate than models based on individual modalities or different modality integration mechanisms. The presented model is furthermore able to predict textual attributes for new concepts given their visual attribute predictions only, which we demonstrate by comparing model output with human generated attributes. Finally, we show the model’s effectiveness in an image-based task on visual category learning, in which images are used as a stand-in for real-world objects.
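The attribute-based bimodal representation described above can be sketched, under heavy simplification, as an encoder over concatenated visual and textual attribute vectors, with cosine similarity standing in for the word similarity evaluation. The encoder weights here are random placeholders for illustration, not the trained stacked autoencoder of the thesis.

```python
import numpy as np

# Words are represented by attribute vectors: visual attributes
# (e.g. "is_round") and textual attributes from a distributional model.
# A shared embedding is formed by a nonlinear map over the concatenation,
# loosely mirroring one hidden layer of an autoencoder.

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed(visual, textual, W):
    """Map concatenated modality vectors into a common space.
    W is a hypothetical (here untrained) encoder matrix."""
    x = np.concatenate([visual, textual])
    return np.tanh(W @ x)
```

In the thesis the mapping is learnt so that nearby points in the common space correspond to semantically and visually similar concepts; this sketch only shows the data flow.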
163

3D Body Tracking using Deep Learning

Xu, Qingguo 01 January 2017 (has links)
This thesis introduces a 3D body tracking system based on neural networks and 3D geometry, which can robustly estimate body poses and accurately locate body joints. The system takes RGB-D data as input. Body poses and joints are first extracted from the color image using a deep learning approach. The estimated joints and skeletons are then translated to 3D space using camera calibration information. The system runs at a rate of 3-4 frames per second. It can be used with any RGB-D sensor, such as the Kinect, Intel RealSense [14], or any customized system with color and depth calibrated. Compared to state-of-the-art 3D body tracking systems, this system is more robust and obtains much more accurate joint locations, which benefits projects requiring precise joints, such as virtual try-on, body measurement, and real-time avatar driving.
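The geometric step of translating an estimated 2D joint into 3D space via camera calibration is, at its core, pinhole back-projection. A minimal sketch (the intrinsic values used in the example are illustrative, not those of any particular sensor):

```python
# Lift a 2D joint detection (u, v) with measured depth Z into
# camera-space 3D using pinhole intrinsics (fx, fy, cx, cy).

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel coordinate + depth -> 3D point (same units as depth)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

For example, a joint detected exactly at the principal point (cx, cy) maps to (0, 0, Z) on the optical axis.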
164

On-the-fly visual category search in web-scale image collections

Chatfield, Ken January 2014 (has links)
This thesis tackles the problem of large-scale visual search for categories within large collections of images. Given a textual description of a visual category, such as 'car' or 'person', the objective is to retrieve images containing that category from the corpus quickly and accurately, and without the need for auxiliary meta-data or, crucially and in contrast to previous approaches, expensive pre-training. The general approach to identifying different visual categories within a dataset is to train classifiers over features extracted from a set of training images. The performance of such classifiers relies heavily on sufficiently discriminative image representations, and many methods have been proposed which involve the aggregating of local appearance features into rich bag-of-words encodings. We begin by conducting a comprehensive evaluation of the latest such encodings, identifying best-of-breed practices for training powerful visual models using these representations. We also contrast these methods with the latest breed of Convolutional Network (ConvNet) based features, thus developing a state-of-the-art architecture for large-scale image classification. Following this, we explore how a standard classification pipeline can be adapted for use in a real-time setting. One of the major issues, particularly with bag-of-words based methods, is the high dimensionality of the encodings, which causes ranking over large datasets to be prohibitively expensive. We therefore assess different methods for compressing such features, and further propose a novel cascade approach to ranking which both reduces ranking time and improves retrieval performance. Finally, we explore the problem of training visual models on-the-fly, making use of visual data dynamically collected from the web to train classifiers on demand. 
On this basis, we develop a novel GPU architecture for on-the-fly visual category search which is capable of retrieving previously unknown categories over unannotated datasets of millions of images in just a few seconds.
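The cascade ranking idea — score everything cheaply with compressed features, then re-rank only a shortlist with the full features — can be sketched as follows. The 1-bit sign compression and the shortlist size k are illustrative choices, not the thesis's exact scheme.

```python
import numpy as np

def cascade_rank(query, feats, k=10):
    """Two-stage cascade: cheap binarised scoring, then exact re-ranking.

    feats: (n, d) full-precision feature matrix for the corpus.
    Returns the indices of the top-k images, best first.
    """
    binary = np.sign(feats)          # 1-bit compression of each feature
    bq = np.sign(query)
    coarse = binary @ bq             # cheap stage-1 scores over the corpus
    top = np.argsort(-coarse)[:k]    # shortlist passed to stage 2
    fine = feats[top] @ query        # exact stage-2 scores on survivors
    return top[np.argsort(-fine)]
```

Stage 1 touches every image but only with bit operations, so memory traffic and compute drop sharply; stage 2 restores ranking quality on the small shortlist.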
165

Supervised and unsupervised learning for plant and crop row detection in precision agriculture

Varshney, Varun January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / The goal of this research is to present a comparison between different clustering and segmentation techniques, both supervised and unsupervised, to detect plant and crop rows. Aerial images, taken by an Unmanned Aerial Vehicle (UAV), of a corn field at various stages of growth were acquired in RGB format through the Agronomy Department at Kansas State University. Several segmentation and clustering approaches were applied to these images, namely K-Means clustering, the Excess Green (ExG) index algorithm, Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and a deep learning approach based on Fully Convolutional Networks (FCN), to detect the plants present in the images. A Hough Transform (HT) approach was used to detect the orientation of the crop rows and rotate the images so that the rows became parallel to the x-axis. The result of applying different segmentation methods to the images was then used in estimating the location of crop rows in the images by using a template creation method based on Green Pixel Accumulation (GPA) that calculates the intensity profile of green pixels present in the images. Connected component analysis was then applied to find the centroids of the detected plants. Each centroid was associated with a crop row, and centroids lying outside the row templates were discarded as weeds. A comparison between the various segmentation algorithms based on the Dice similarity index and average run-times is presented at the end of the work.
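The Excess Green step can be sketched as follows. The normalisation to chromatic coordinates and the threshold value are common choices assumed here for illustration, not necessarily those used in this work.

```python
import numpy as np

def exg_mask(rgb, threshold=0.1):
    """Segment vegetation with the Excess Green index, ExG = 2g - r - b,
    computed on chromatic (sum-normalised) coordinates.

    rgb: (H, W, 3) image array. Returns a boolean vegetation mask.
    """
    rgb = rgb.astype(float)
    total = rgb.sum(axis=2) + 1e-9                    # avoid division by zero
    r, g, b = (rgb[..., i] / total for i in range(3))
    exg = 2.0 * g - r - b
    return exg > threshold
```

A pure-green pixel gives ExG = 2, while any grey pixel gives exactly 0, which is why a small positive threshold separates plants from soil.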
166

Composing Recommendations Using Computer Screen Images: A Deep Learning Recommender System for PC Users

Shapiro, Daniel January 2017 (has links)
A new way to train a virtual assistant with unsupervised learning is presented in this thesis. Rather than integrating with a particular set of programs and interfaces, this new approach involves shallow integration between the virtual assistant and computer through machine vision. In effect the assistant interprets the computer screen in order to produce helpful recommendations to assist the computer user. In developing this new approach, called AVRA, the following methods are described: an unsupervised learning algorithm which enables the system to watch and learn from user behavior, a method for fast filtering of the text displayed on the computer screen, a deep learning classifier used to recognize key onscreen text in the presence of OCR translation errors, and a recommendation filtering algorithm to triage the many possible action recommendations. AVRA is compared to a similar commercial state-of-the-art system, to highlight how this work adds to the state of the art. AVRA is a deep learning image processing and recommender system that can collaborate with the computer user to accomplish various tasks. This document presents a comprehensive overview of the development and possible applications of this novel virtual assistant technology. It detects onscreen tasks based upon the context it perceives by analyzing successive computer screen images with neural networks. AVRA is a recommender system, as it assists the user by producing action recommendations regarding onscreen tasks. In order to simplify the interaction between the user and AVRA, the system was designed to only produce action recommendations that can be accepted with a single mouse click. These action recommendations are produced without integration into each individual application executing on the computer. Furthermore, the action recommendations are personalized to the user’s interests utilizing a history of the user’s interaction.
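The "key onscreen text in the presence of OCR translation errors" problem can be approximated, in a far simpler form than AVRA's deep learning classifier, by fuzzy string matching. This stdlib sketch is illustrative only; the vocabulary and cutoff are assumptions.

```python
import difflib

def match_ocr(token, vocabulary, cutoff=0.7):
    """Return the best vocabulary match for an OCR-garbled token, or None.

    Tolerates character-level OCR errors (e.g. 'l' read for 'i') by
    ratio-based similarity rather than exact comparison.
    """
    hits = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

For instance, the garbled token "Exceptlon" still resolves to the keyword "exception", while unrelated noise falls below the cutoff and is rejected.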
167

Real-Time Instance and Semantic Segmentation Using Deep Learning

Kolhatkar, Dhanvin 10 June 2020 (has links)
In this thesis, we explore the use of Convolutional Neural Networks for semantic and instance segmentation, with a focus on studying the application of existing methods with cheaper neural networks. We modify a fast object detection architecture for the instance segmentation task, and study the concepts behind these modifications both in the simpler context of semantic segmentation and the more difficult context of instance segmentation. Various instance segmentation branch architectures are implemented in parallel with a box prediction branch, using its results to crop each instance's features. We negate the imprecision of the final box predictions and eliminate the need for bounding box alignment by using an enlarged bounding box for cropping. We report and study the performance, advantages, and disadvantages of each. We achieve fast speeds with all of our methods.
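The enlarged-bounding-box cropping described above can be sketched as a small geometric helper: grow a predicted box about its centre, then clip to the image bounds, so that small errors in the final box prediction do not cut off parts of the instance. The 1.5x scale factor is an assumed illustrative value.

```python
def enlarge_box(x1, y1, x2, y2, img_w, img_h, scale=1.5):
    """Scale a box (x1, y1, x2, y2) about its centre and clip to the image."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    nx1 = max(0.0, cx - w / 2)
    ny1 = max(0.0, cy - h / 2)
    nx2 = min(float(img_w), cx + w / 2)
    ny2 = min(float(img_h), cy + h / 2)
    return nx1, ny1, nx2, ny2
```

The segmentation branch then predicts a mask inside this enlarged crop, which removes the need for precise box alignment.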
168

Automatic Programming Code Explanation Generation with Structured Translation Models

January 2020 (has links)
abstract: Learning programming involves a variety of complex cognitive activities, from abstract knowledge construction to structural operations, which include program design, modifying, debugging, and documenting tasks. In this work, the objective was to explore and investigate the barriers and obstacles that programming novice learners encountered and how the learners overcame them. Several lab and classroom studies were designed and conducted; the results showed that novice students had different behavior patterns compared to experienced learners, which indicates the obstacles encountered. The studies also showed that proper assistance could help novices find helpful materials to read. However, novices still suffered from a lack of background knowledge and limited cognitive capacity while learning, which resulted in challenges in understanding programming-related materials, especially code examples. Therefore, I further proposed to use a natural language generator (NLG) to generate code explanations for educational purposes. The natural language generator is designed based on Long Short-Term Memory (LSTM), a deep-learning translation model. To establish the model, a data set was collected from Amazon Mechanical Turk (AMT), recording explanations from human experts for programming code lines. To evaluate the model, a pilot study was conducted and showed that the readability of the machine-generated (MG) explanations was comparable with human explanations, while their accuracy was still not ideal, especially for complicated code lines. Furthermore, a code-example-based learning platform was developed to utilize the explanation-generating model in programming teaching. To examine the effect of code example explanations on different learners, two lab-class experiments were conducted separately in a programming novices’ class and an advanced students’ class.
The experiment results indicated that when learning programming concepts, the MG code explanations significantly improved learning predictability for novices compared to the control group, and the explanations also extended the novices’ learning time by generating more material to read, which potentially leads to a better learning gain. In addition, a complete correlation model was constructed from the experiment results to illustrate the connections between different factors and the learning effect. / Dissertation/Thesis / Doctoral Dissertation Engineering 2020
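The front end of such an LSTM translation model — turning each code line into the token-id sequence an encoder consumes — can be sketched as follows. The token pattern and special symbols are assumptions for illustration, not this dissertation's actual preprocessing.

```python
import re
from collections import Counter

def tokenize(code_line):
    """Split a code line into identifiers, numbers, and punctuation tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code_line)

def build_vocab(lines, specials=("<pad>", "<unk>")):
    """Build a token -> id map from a corpus of code lines."""
    counts = Counter(tok for line in lines for tok in tokenize(line))
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in counts.most_common():
        vocab[tok] = len(vocab)
    return vocab

def encode(line, vocab):
    """Map a code line to the integer sequence the encoder would read."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(line)]
```

The paired human explanations from AMT would be tokenised and encoded the same way on the decoder side, giving the (source, target) sequences the translation model is trained on.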
169

Limitations of Classical Tomographic Reconstructions from Restricted Measurements and Enhancing with Physically Constrained Machine Learning

January 2020 (has links)
abstract: This work is concerned with how best to reconstruct images from limited angle tomographic measurements. An introduction to tomography and to limited angle tomography will be provided and a brief overview of the many fields to which this work may contribute is given. The traditional tomographic image reconstruction approach involves Fourier domain representations. The classic Filtered Back Projection algorithm will be discussed and used for comparison throughout the work. Bayesian statistics and information entropy considerations will be described. The Maximum Entropy reconstruction method will be derived and its performance in limited angular measurement scenarios will be examined. Many new approaches become available once the reconstruction problem is placed within an algebraic form of Ax=b in which the measurement geometry and instrument response are defined as the matrix A, the measured object as the column vector x, and the resulting measurements by b. It is straightforward to invert A. However, for the limited angle measurement scenarios of interest in this work, the inversion is highly underconstrained and has an infinite number of possible solutions x consistent with the measurements b in a high dimensional space. The algebraic formulation leads to the need for high performing regularization approaches which add constraints based on prior information of what is being measured. These are constraints beyond the measurement matrix A added with the goal of selecting the best image from this vast uncertainty space. It is well established within this work that developing satisfactory regularization techniques is all but impossible except for the simplest pathological cases. There is a need to capture the "character" of the objects being measured. 
The novel result of this effort will be in developing a reconstruction approach that will match whatever reconstruction approach has proven best for the types of objects being measured given full angular coverage. However, when confronted with limited angle tomographic situations or early in a series of measurements, the approach will rely on a prior understanding of the "character" of the objects measured. This understanding will be learned by a parallel Deep Neural Network from examples. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2020
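The algebraic formulation Ax = b and the role of regularisation can be illustrated with the simplest possible prior: Tikhonov (L2) regularisation, x* = argmin ||Ax - b||^2 + lam ||x||^2, which has a closed-form solution and picks a unique x even when the limited-angle system is underdetermined. This stands in for, and is much weaker than, the learned priors the thesis develops; A, b, and lam below are illustrative.

```python
import numpy as np

def tikhonov_reconstruct(A, b, lam=1e-2):
    """Closed-form Tikhonov-regularised solution of Ax = b:
    x = (A^T A + lam I)^{-1} A^T b.

    With fewer measurements than unknowns, A^T A alone is singular;
    the lam I term makes the system solvable and selects the
    minimum-norm-biased solution among the infinitely many candidates.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
```

Replacing the lam ||x||^2 term with a data-driven penalty (e.g. one encoding the learned "character" of the measured objects) is exactly the kind of upgrade the chapter argues for.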
170

Physical layer security in emerging wireless transmission systems

Bao, Tingnan 06 July 2020 (has links)
Traditional cryptographic encryption techniques at higher layers require a certain form of information sharing between the transmitter and the legitimate user to achieve security. They also assume that the eavesdropper has insufficient computational capability to decrypt the ciphertext without the shared information. However, traditional cryptographic encryption techniques may be insufficient or even not suitable in wireless communication systems. Physical layer security (PLS) can enhance the security of wireless communications by leveraging the physical nature of wireless transmission. Thus, in this thesis, we study the PLS performance in emerging wireless transmission systems. The thesis consists of two main parts. We first consider the PLS design and analysis for ground-based networks employing a random unitary beamforming (RUB) scheme at the transmitter. With the RUB technique, the transmitter serves multiple users with pre-designed beamforming vectors, selected using limited channel state information (CSI). We study the multiple-input single-output single-eavesdropper (MISOSE) transmission system, the multi-user multiple-input multiple-output single-eavesdropper (MU-MIMOSE) transmission system, and the massive multiple-input multiple-output multiple-eavesdropper (massive MIMOME) transmission system. Closed-form expressions of the ergodic secrecy rate and the secrecy outage probability (SOP) for these transmission scenarios are derived. Besides, the effect of artificial noise (AN) on the secrecy performance of RUB-based transmission is also investigated. Numerical results are presented to illustrate the trade-off between performance and complexity of the resulting PLS design. We then investigate the PLS design and analysis for unmanned aerial vehicle (UAV)-based networks. We first study the secrecy performance of UAV-assisted relaying transmission systems in the presence of a single ground eavesdropper.
We derive the closed-form expressions of ergodic secrecy rate and intercept probability. When multiple aerial and ground eavesdroppers are located in the UAV-assisted relaying transmission system, directional beamforming technique is applied to enhance the secrecy performance. Assuming the most general κ-μ shadowed fading channel, the SOP performance is obtained in the closed-form expression. Exploiting the derived expressions, we investigate the impact of different parameters on secrecy performance. Besides, we utilize a deep learning approach in UAV-based network analysis. Numerical results show that our proposed deep learning approach can predict secrecy performance with high accuracy and short running time. / Graduate
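The secrecy quantities analysed above build on the standard Gaussian wiretap result: the secrecy capacity is the gap between the legitimate user's and the eavesdropper's channel capacities, floored at zero. A minimal computation of this textbook formula (not a derived result of the thesis):

```python
import math

def secrecy_capacity(snr_legit, snr_eve):
    """Secrecy capacity in bits/s/Hz for a Gaussian wiretap channel:
    Cs = max(0, log2(1 + snr_b) - log2(1 + snr_e)).

    Positive only when the legitimate channel is better than the
    eavesdropper's; otherwise no rate can be kept secret.
    """
    return max(0.0, math.log2(1.0 + snr_legit) - math.log2(1.0 + snr_eve))
```

Beamforming, artificial noise, and UAV placement in the thesis all work by widening this SNR gap; the ergodic secrecy rate and SOP are then averages and tail probabilities of this quantity over the fading distributions.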
