131

A Machine Learning approach to Febrile Classification

Kostopouls, Theodore P 25 April 2018 (has links)
General health screening is needed to decrease the risk of pandemics in high-volume areas. Thermal characterization via infrared imaging is an effective technique for fever detection; however, strict use requirements combined with highly controlled environmental conditions compromise the practicality of such a system. Applying advanced processing techniques to thermograms of individuals can remove some of these requirements, allowing for more flexible classification algorithms. The purpose of this research was to identify individuals with febrile status using modern thermal imaging and machine learning techniques in a minimally controlled setting. Two methods were evaluated on data that contained environmental and acclimation noise due to the data-gathering technique. The first was a pretrained VGG16 convolutional neural network, found to have an F1 score of 0.77 (accuracy of 76%) on a balanced dataset. The second was a VGG16 feature extractor whose outputs feed a principal components analysis and a support vector machine for classification. This technique obtained an F1 score of 0.84 (accuracy of 85%) on balanced datasets. These results demonstrate that machine learning is a viable technique for classifying febrile status despite the associated noise.
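A minimal sketch of the second pipeline described above (pretrained VGG16 as a fixed feature extractor, followed by PCA and an SVM). The data, component count, and kernel choice are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Pretrained VGG16 as a fixed feature extractor (global-average-pooled).
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

def extract_features(images):
    """images: float array of shape (n, 224, 224, 3) in [0, 255]."""
    return extractor.predict(preprocess_input(images.copy()), verbose=0)

# Hypothetical balanced set of thermograms with febrile labels (0/1).
X_train = np.random.rand(32, 224, 224, 3) * 255.0  # stand-in data
y_train = np.random.randint(0, 2, size=32)

clf = make_pipeline(PCA(n_components=16), SVC(kernel="rbf"))
clf.fit(extract_features(X_train), y_train)
```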
132

Why did they cite that?

Lovering, Charles 26 April 2018 (has links)
We explore a machine learning task, evidence recommendation (ER): the extraction of evidence from a source document to support an external claim. This task is an instance of question answering. We apply ER to academic publications because they cite other papers for the claims they make. Reading cited papers to corroborate claims is time-consuming, and an automated ER tool could expedite it. Thus, we propose a methodology for collecting a dataset of academic papers and their references. We explore deep learning models for ER and achieve 77% accuracy with pairwise models and 75% pairwise accuracy with document-wise models.
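A hedged sketch of the pairwise formulation: score each candidate sentence from a cited paper against the claim and return the best match. TF-IDF cosine similarity stands in here for the thesis's deep pairwise models, purely for illustration; the claim and candidates are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

claim = "Dropout reduces overfitting in deep neural networks."
candidates = [
    "We show that dropout improves generalization on image benchmarks.",
    "The dataset contains 1.2 million labeled photographs.",
    "Our experiments used a cluster of eight GPUs.",
]

# Score every (claim, candidate) pair and rank the candidates.
vec = TfidfVectorizer().fit([claim] + candidates)
scores = cosine_similarity(vec.transform([claim]), vec.transform(candidates))[0]
best = max(range(len(candidates)), key=lambda i: scores[i])
print(f"Best evidence: {candidates[best]!r} (score {scores[best]:.2f})")
```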
133

Parameter Continuation with Secant Approximation for Deep Neural Networks

Pathak, Harsh Nilesh 03 December 2018 (has links)
Non-convex optimization of deep neural networks is a well-researched problem. We present a novel application of continuation methods to deep learning optimization that can potentially arrive at a better solution. In our method, we first decompose the original optimization problem into a sequence of problems using a homotopy method. To achieve this in neural networks, we derive the Continuation (C)-Activation function: a homotopic formulation of existing activation functions such as Sigmoid, ReLU, or Tanh. We then apply a method that is standard in the parameter continuation domain but, to the best of our knowledge, novel to the deep learning domain. In particular, we use Natural Parameter Continuation with Secant approximation (NPCS), an effective training strategy that may find a superior local minimum for a non-convex optimization problem. Additionally, we extend our work on Step-up GANs, a data continuation approach, by deriving a method called Continuous (C)-SMOTE, which is an extension of standard oversampling algorithms. We demonstrate the improvements made by our methods and establish a categorization of recent work on continuation methods in the context of deep learning.
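A sketch of the homotopy idea behind such a C-Activation: continuously deform an easy problem (identity activation, t = 0) into the target one (ReLU, t = 1). The thesis's exact C-Activation formula is not reproduced here; this linear homotopy is an illustrative assumption.

```python
import numpy as np

def c_activation(x, t):
    """Homotopy between identity (t=0) and ReLU (t=1)."""
    return (1.0 - t) * x + t * np.maximum(x, 0.0)

# As t sweeps 0 -> 1, the network's activation morphs toward ReLU,
# yielding the sequence of optimization problems a continuation
# method steps through.
x = np.linspace(-2, 2, 5)
for t in (0.0, 0.5, 1.0):
    print(t, c_activation(x, t))
```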
134

Image processing and forward propagation using binary representations, and robust audio analysis using deep learning

Pedersoli, Fabrizio 15 March 2019 (has links)
The work presented in this thesis consists of three main topics: document segmentation and classification into text and score, efficient computation with binary representations, and deep learning architectures for polyphonic music transcription and classification. In the case of musical documents, an important problem is separating text from musical score by detecting the corresponding bounding boxes. A new algorithm is proposed for pixel-wise classification of digital documents into musical score and text, based on a bag-of-visual-words approach and random forest classification. A robust technique for identifying bounding boxes of text and music score from the pixel-wise classification is also proposed. For efficient processing of learned models, we turn our attention to binary representations. When dealing with binary data, the use of bit-packing and bit-wise computation can reduce computational time and memory requirements considerably; efficiency is a key factor when processing large-scale datasets and in industrial applications. We propose a bit-packed representation for binary images that encodes both pixels and square neighborhoods, and design SPmat, an optimized framework for binary image processing, around it. Bit-packing and bit-wise computation can also be used for efficient forward propagation in deep neural networks. Quantized deep neural networks have recently been proposed with the goal of improving computational time and memory requirements while maintaining classification performance as much as possible. A particular type of quantized neural network is the binary neural network, in which the weights and activations are constrained to $-1$ and $+1$. In this thesis, we describe and evaluate Espresso, a novel optimized framework for fast inference of binary neural networks that takes advantage of bit-packing and bit-wise computations. Espresso is self-contained, written in C/CUDA, and provides optimized implementations of all the building blocks needed to perform forward propagation. Following their recent success, we further investigate deep neural networks. They have achieved state-of-the-art results and outperformed traditional machine learning methods in many applications, such as computer vision, speech recognition, and machine translation. However, in the case of music information retrieval (MIR) and audio analysis, shallow neural networks are commonly used, and the effectiveness of deep and very deep architectures for MIR and audio tasks has not been explored in detail. It is also not clear what the best input representation is for a particular task. We therefore investigate deep neural networks for the following audio analysis tasks: polyphonic music transcription, musical genre classification, and urban sound classification. We analyze the performance of common classification network architectures using different input representations, paying specific attention to residual networks. We also evaluate the robustness of these models on degraded audio using different combinations of training/testing data. Through experimental evaluation we show that residual networks provide consistent performance improvements when analyzing degraded audio across different representations and tasks. Finally, we present a convolutional architecture based on U-Net that can improve the polyphonic music transcription performance of different baseline transcription networks. / Graduate
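A hedged sketch of the XNOR-plus-popcount trick that binary inference frameworks such as Espresso build on: with weights and activations in {-1, +1} packed into machine words, a dot product becomes a bitwise XNOR followed by a population count. The packing layout below is an illustrative choice, not Espresso's actual one.

```python
import numpy as np

def pack_pm1(v):
    """Pack a {-1,+1} vector into bits (+1 -> 1, -1 -> 0)."""
    return np.packbits((v > 0).astype(np.uint8))

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1,+1} vectors of length n."""
    xnor = np.bitwise_not(np.bitwise_xor(a_bits, b_bits))
    matches = int(np.unpackbits(xnor)[:n].sum())  # positions where signs agree
    return 2 * matches - n                        # agreements minus disagreements

rng = np.random.default_rng(0)
n = 64
a = rng.choice([-1, 1], size=n)
b = rng.choice([-1, 1], size=n)
# The bit-packed dot product matches the ordinary one exactly.
assert binary_dot(pack_pm1(a), pack_pm1(b), n) == int(a @ b)
```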
135

Action recognition using deep learning

Palasek, Petar January 2017 (has links)
In this thesis we study deep learning architectures for the problem of human action recognition in image sequences, i.e. the problem of automatically recognizing what people are doing in a given video. As unlabeled video data is easily accessible these days, we first explore models that can learn meaningful representations of sequences without having to know what is happening in the sequences at hand. More specifically, we first explore the convolutional restricted Boltzmann machine (RBM) and show how a stack of convolutional RBMs can be used to learn and extract features from sequences in an unsupervised way. Using the classical Fisher vector pipeline to encode the extracted features, we apply them to the task of action classification. We then move on to feature extraction using larger, deep convolutional neural networks and propose a novel architecture which expresses the processing steps of the classical Fisher vector pipeline as network layers. In contrast to other methods, where these steps are performed consecutively and the corresponding parameters are learned in an unsupervised manner, defining them as a single neural network allows us to refine the whole model discriminatively in an end-to-end fashion. We show that our method achieves significant improvements over the classical Fisher vector extraction chain and performs comparably to other convolutional networks, while greatly reducing the number of required trainable parameters. Finally, we explore how the proposed architecture can be modified into a hybrid network that combines the benefits of both unsupervised and supervised training methods, resulting in a model that learns a semi-supervised Fisher vector descriptor of the input data. We evaluate the proposed model on image classification and action recognition problems and show how the model's classification performance improves as the amount of unlabeled data increases during training.
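A hedged sketch of the classical Fisher vector encoding step that the thesis re-expresses as network layers: fit a GMM on local features, then encode a sequence's feature set by the gradients with respect to the Gaussian means (the improved-FV power and L2 normalizations are omitted; feature dimensions are stand-ins).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
local_feats = rng.normal(size=(500, 8))          # stand-in local descriptors
gmm = GaussianMixture(n_components=4, covariance_type="diag").fit(local_feats)

def fisher_vector_means(X, gmm):
    """Mean-gradient part of the Fisher vector for a feature set X (T, D)."""
    gamma = gmm.predict_proba(X)                          # (T, K) soft assignments
    diff = X[:, None, :] - gmm.means_[None, :, :]         # (T, K, D)
    fv = (gamma[:, :, None] * diff / np.sqrt(gmm.covariances_)[None]).sum(0)
    return (fv / (X.shape[0] * np.sqrt(gmm.weights_)[:, None])).ravel()

video_feats = rng.normal(size=(120, 8))           # features from one sequence
print(fisher_vector_means(video_feats, gmm).shape)  # (K * D,) = (32,)
```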
136

Gesture passwords: concepts, methods and challenges

Wu, Jonathan 21 June 2016 (has links)
Biometrics are a convenient alternative to traditional forms of access control such as passwords and pass-cards since they rely solely on user-specific traits. Unlike alphanumeric passwords, biometrics cannot be given or told to another person, and unlike pass-cards, they are always “on-hand.” Perhaps the most well-known biometrics with these properties are face, speech, iris, and gait. This dissertation proposes a new biometric modality: gestures. A gesture is a short body motion that contains static anatomical information and changing behavioral (dynamic) information. This work considers both full-body gestures, such as a large wave of the arms, and hand gestures, such as a subtle curl of the fingers and palm. For access control, a specific gesture can be selected as a “password” and used for identification and authentication of a user. If this particular motion were somehow compromised, a user could readily select a new motion as a “password,” effectively changing and renewing the behavioral aspect of the biometric. This thesis describes a novel framework for acquiring, representing, and evaluating gesture passwords for the purpose of general access control. The framework uses depth sensors, such as the Kinect, to record gesture information, from which depth maps or pose features are estimated. First, various distance measures, such as the log-Euclidean distance between feature covariance matrices and distances based on feature sequence alignment via dynamic time warping, are used to compare two gestures and train a classifier to either authenticate or identify a user. In authentication, this framework yields an equal error rate on the order of 1-2% for body and hand gestures in non-adversarial scenarios. Next, through a novel decomposition of gestures into posture, build, and dynamic components, the relative importance of each component is studied. The dynamic portion of a gesture is shown to have the largest impact on biometric performance, with its removal causing a significant increase in error. In addition, the effects of two types of threats are investigated: one due to self-induced degradations (personal effects and the passage of time) and the other due to spoof attacks. For body gestures, both spoof attacks (with only the dynamic component) and self-induced degradations increase the equal error rate, as expected. Further, the benefits of adding additional sensor viewpoints to this modality are empirically evaluated. Finally, a novel framework that leverages deep convolutional neural networks to learn a user-specific “style” representation from a set of known gestures is proposed and compared to a similar representation for gesture recognition. This deep convolutional neural network yields significantly improved performance over prior methods. A byproduct of this work is the creation and release of multiple publicly available, user-centric (as opposed to gesture-centric) datasets based on both body and hand gestures.
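A minimal sketch of the dynamic-time-warping comparison mentioned above, assuming each gesture is a (time, feature) array of pose features; the data and threshold logic are illustrative, not the dissertation's exact alignment or classifier.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between sequences a (n, d) and b (m, d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(1)
enrolled = rng.normal(size=(30, 6))    # stored gesture "password"
attempt = enrolled[::2] + 0.05 * rng.normal(size=(15, 6))  # faster, noisy replay
print(dtw_distance(enrolled, attempt))  # small distance -> accept
```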
137

Artificial intelligence system for continuous affect estimation from naturalistic human expressions

Abd Gaus, Yona Falinie January 2018 (has links)
The analysis and automatic estimation of affect from human expression has been acknowledged as an active research topic in the computer vision community. Most reported affect recognition systems, however, only consider subjects performing well-defined acted expressions under very controlled conditions, so they are not robust enough for real-life recognition tasks with subject variation, acoustic surroundings, and illumination change. In this thesis, an artificial intelligence system is proposed to continuously (represented along a continuum, e.g., from -1 to +1) estimate affective behaviour in terms of latent dimensions (e.g., arousal and valence) from naturalistic human expressions. To tackle these issues, feature representation and machine learning strategies are addressed. In feature representation, human expression is represented by modalities such as audio, video, physiological signals, and text. Hand-crafted features are extracted from each modality per frame in order to match the consecutive affect labels. However, the extracted features may be missing information due to factors such as background noise or lighting conditions. The Haar Wavelet Transform is employed to determine whether a noise cancellation mechanism in feature space should be considered in the design of the affect estimation system. Besides hand-crafted features, deep learning features are also analysed layer-wise, at the convolutional and fully connected layers. Convolutional Neural Networks such as AlexNet, VGGFace, and ResNet are selected as deep learning architectures for feature extraction on facial expression images. Then, a multimodal fusion scheme is applied, fusing deep learning features and hand-crafted features together to improve performance. In machine learning strategies, a two-stage regression approach is introduced. In the first stage, baseline regression methods such as Support Vector Regression are applied to estimate each affect dimension per time step. In the second stage, a subsequent model such as a Time Delay Neural Network, Long Short-Term Memory network, or Kalman Filter is proposed to model the temporal relationships between consecutive estimates of each affect dimension. In doing so, the temporal information employed by the subsequent model is not biased by the high variability present in consecutive frames, and at the same time it allows the network to exploit the slowly changing emotional dynamics more efficiently. Following the two-stage regression approach for unimodal affect analysis, the fusion of information from different modalities is elaborated. Continuous emotion recognition in-the-wild is leveraged by investigating mathematical modelling for each emotion dimension: Linear Regression, Exponent Weighted Decision Fusion, and Multi-Gene Genetic Programming are implemented to quantify the relationship between modalities. In summary, the research work presented in this thesis provides a fundamental approach to automatically and continuously estimating affect values from naturalistic human expression. The proposed system, which consists of feature smoothing, deep learning features, a two-stage regression framework, and fusion between modalities using mathematical equations, is demonstrated. It offers a strong basis for the development of artificial intelligence systems for continuous affect estimation, and more broadly for building real-time emotion recognition systems for human-computer interaction.
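A hedged sketch of the two-stage regression idea: frame-wise SVR estimates of one affect dimension (e.g., valence), followed by a second temporal model. A simple moving average stands in for the thesis's TDNN/LSTM/Kalman options, and the data is synthetic.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 10))            # stand-in per-frame features
valence = np.sin(np.linspace(0, 6, 200))       # stand-in continuous labels

# Stage 1: independent frame-wise estimates (noisy, high variability).
stage1 = SVR(kernel="rbf").fit(frames, valence)
raw = stage1.predict(frames)

# Stage 2: temporal model over the stage-1 outputs (moving average here).
def smooth(x, k=9):
    return np.convolve(x, np.ones(k) / k, mode="same")

print(np.abs(raw - valence).mean(), np.abs(smooth(raw) - valence).mean())
```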
138

ScatterNet hybrid frameworks for deep learning

Singh, Amarjot January 2019 (has links)
Image understanding is the task of interpreting images by effectively solving the individual tasks of object recognition and semantic image segmentation. An image understanding system must have the capacity to distinguish between similar-looking image regions while being invariant in its response to regions that have been altered by appearance-altering transformations. The fundamental challenge for any such system lies in this simultaneous requirement for both invariance and specificity. Many image understanding systems have been proposed that capture geometric properties such as shapes, textures, motion, and 3D perspective projections using filtering, non-linear modulus, and pooling operations. Deep learning networks ignore these geometric considerations and instead compute descriptors with suitable invariance and stability to geometric transformations using (end-to-end) learned multi-layered network filters. In recent years these deep learning networks have come to dominate the previously separate fields of research in machine learning, computer vision, natural language understanding, and speech recognition. Despite the success of these deep networks, there remains a fundamental lack of understanding in their design and optimization, which makes it difficult to develop them. Also, training these networks requires large labeled datasets, which in numerous applications may not be available. In this dissertation, we propose the ScatterNet Hybrid Framework for Deep Learning, inspired by the circuitry of the visual cortex. The framework uses a hand-crafted front-end, an unsupervised-learning-based middle section, and a supervised back-end to rapidly learn hierarchical features from unlabelled data. Each layer in the proposed framework is automatically optimized to produce the desired computationally efficient architecture. The term 'Hybrid' is coined because the framework uses both unsupervised and supervised learning. We propose two hand-crafted front-ends that can extract locally invariant features from the input signals. Next, two ScatterNet Hybrid Deep Learning (SHDL) networks (one generative and one deterministic) are introduced by combining the proposed front-ends with two unsupervised learning modules which learn hierarchical features. These hierarchical features are finally used by a supervised learning module to solve the task of either object recognition or semantic image segmentation. The proposed front-ends are also shown to improve the performance and learning of current deep supervised learning networks (VGG, NIN, ResNet) with reduced computing overhead.
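A hedged sketch of the hybrid structure described above: a fixed, hand-crafted front-end (here a tiny Gabor-like filter bank with a modulus nonlinearity standing in for a scattering front-end), an unsupervised middle (PCA), and a supervised back-end (logistic regression). All filter parameters and data are illustrative assumptions, not the SHDL networks themselves.

```python
import numpy as np
from scipy.signal import convolve2d
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def front_end(img, n_orientations=4):
    """Hand-crafted front-end: fixed oriented filters + modulus + pooling."""
    feats = []
    x = np.arange(-3, 4)
    xx, yy = np.meshgrid(x, x)
    for k in range(n_orientations):
        theta = np.pi * k / n_orientations
        u = xx * np.cos(theta) + yy * np.sin(theta)
        filt = np.cos(2 * np.pi * u / 4) * np.exp(-(xx**2 + yy**2) / 8)
        feats.append(np.abs(convolve2d(img, filt, mode="valid")).mean())
    return feats

rng = np.random.default_rng(0)
imgs = rng.normal(size=(40, 16, 16))            # stand-in unlabelled images
y = rng.integers(0, 2, size=40)                 # stand-in labels
X = np.array([front_end(im) for im in imgs])
# Unsupervised middle (PCA) feeding a supervised back-end.
clf = LogisticRegression().fit(PCA(n_components=2).fit_transform(X), y)
```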
139

Using Capsule Networks for Image and Speech Recognition Problems

January 2018 (has links)
In recent years, conventional convolutional neural networks (CNNs) have achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNNs discards spatial information that is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves on the capabilities of traditional CNNs. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network. In this work, we first use the capsule network for the overlapping digit recognition problem. We evaluate the performance of the network with respect to recognition accuracy, convergence, and training time per epoch. We show that the capsule network achieves higher accuracy when the training set is small; when the training set is larger, the capsule network and a conventional CNN have comparable recognition accuracy. The training time per epoch for the capsule network is longer than for a conventional CNN because of the dynamic routing algorithm. An analysis of the GPU timing shows that adjusting the capsule structure can decrease the time complexity of the dynamic routing algorithm significantly. Next, we design a capsule network for speech recognition, specifically overlapping word recognition. We use both a capsule network and a conventional CNN to recognize 2 overlapping words in speech files created from 5 word classes. We show that the capsule network achieves a considerably higher recognition accuracy (96.92%) than the conventional CNN (85.19%). Our results show that the capsule network recognizes overlapping words by recognizing each individual word in the speech. We also verify the scalability of the capsule network by increasing the number of word classes from 5 to 10: the capsule network still shows a high recognition accuracy of 95.42% with 10 words, while the accuracy of the conventional CNN decreases sharply to 73.18%. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2018
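A hedged numpy sketch of the dynamic routing-by-agreement mechanism the abstract credits for capsule networks' stability (after Sabour et al., "Dynamic Routing Between Capsules"); the capsule counts and dimensions are illustrative.

```python
import numpy as np

def squash(s, axis=-1):
    """Capsule nonlinearity: shrinks short vectors, preserves direction."""
    norm2 = (s ** 2).sum(axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + 1e-9)

def dynamic_routing(u_hat, iterations=3):
    """u_hat: (n_in, n_out, d) prediction vectors from lower-level capsules."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # routing logits
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        v = squash((c[:, :, None] * u_hat).sum(axis=0))       # (n_out, d)
        b = b + (u_hat * v[None]).sum(axis=-1)     # agreement update
    return v

v = dynamic_routing(np.random.default_rng(0).normal(size=(8, 3, 4)))
print(v.shape)  # (3, 4) -> 3 output capsules of dimension 4
```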
140

Systematic generation of datasets and benchmarks for modern computer vision

Malireddi, Sri Raghu 03 April 2019 (has links)
Deep learning is dominant in the field of computer vision thanks to its high performance, which is driven by large annotated datasets and proper evaluation benchmarks. However, two important areas in computer vision, depth-based hand segmentation and local features, respectively lack a large well-annotated dataset and a benchmark protocol that properly demonstrates practical performance. Therefore, in this thesis, we focus on these two problems. For hand segmentation, we create a novel systematic way to easily produce automatic semantic segmentation annotations for large datasets. We achieve this with the help of traditional computer vision techniques and a minimal hardware setup of one RGB-D camera and two distinctly colored skin-tight gloves. Our method allows easy creation of large-scale datasets with high annotation quality. For local features, we create a new modern benchmark that reveals their different aspects, specifically wide-baseline stereo matching and Multi-View Stereo (MVS), in a more practical setup, namely Structure-from-Motion (SfM). We believe that through our new benchmark we will be able to spur research on learned local features in a more practical direction. In this respect, the benchmark developed for the thesis will be used to host a challenge on local features. / Graduate
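A hedged sketch of the colored-glove annotation idea: with distinctly colored gloves, a simple color threshold on the RGB frame yields hand masks that can label the aligned depth frame automatically. The HSV ranges and cleanup step below are illustrative assumptions, not the thesis's exact pipeline.

```python
import cv2
import numpy as np

def glove_mask(rgb_frame, lower_hsv, upper_hsv):
    """Binary hand mask from a distinctly colored glove."""
    hsv = cv2.cvtColor(rgb_frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)
    # Remove small speckles so the mask transfers cleanly to the depth frame.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

frame = np.zeros((480, 640, 3), np.uint8)          # stand-in RGB-D color frame
# Hypothetical blue-glove range for one hand; the second glove would use
# a second, distinct range.
left = glove_mask(frame, np.array([100, 80, 80]), np.array([130, 255, 255]))
```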
