11

Exploring Latent Structure in Data: Algorithms and Implementations

January 2014 (has links)
abstract: Feature representation for raw data is one of the most important components of a machine learning system. Traditionally, features are hand-crafted by domain experts, which can often be a time-consuming process. Furthermore, such features do not generalize well to unseen data and novel tasks. Recently, there have been many efforts to generate data-driven representations using clustering and sparse models. This dissertation focuses on building data-driven unsupervised models for analyzing raw data and developing efficient feature representations. Simultaneous segmentation and feature extraction approaches for silicon-pore sensor data are considered. Aggregating the data into a matrix and performing low-rank and sparse matrix decompositions with additional smoothness constraints are proposed to solve this problem. Comparisons of several variants of the approaches, along with results for signal de-noising and translocation/trapping event extraction, are presented. Algorithms to improve transform-domain features for ion-channel time-series signals based on matrix completion are presented; the improved features achieve better performance in classification tasks and reduce false alarm rates when applied to analyte detection. Developing representations for multimedia is an important and challenging problem, with applications ranging from scene recognition, multimedia retrieval, and personal life-logging systems to field robot navigation. In this dissertation, we present a new framework for feature extraction for challenging natural environment sounds; the proposed features outperform traditional spectral features on challenging environmental sound datasets. Several algorithms are proposed that perform supervised tasks such as recognition and tag annotation, and ensemble methods are proposed to improve the tag annotation process. To facilitate the use of large datasets, fast implementations are developed for sparse coding, the key component in our algorithms. Several strategies to speed up the Orthogonal Matching Pursuit algorithm using CUDA kernels on a GPU are proposed. Implementations are also developed for a large-scale image retrieval system, in which image-based "exact search" and "visually similar search" using image patch sparse codes are performed. Results demonstrate a large speed-up over CPU implementations, and good retrieval performance is also achieved. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2014
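The abstract names sparse coding via the Orthogonal Matching Pursuit (OMP) algorithm as the computational core that the GPU implementations accelerate. For background, a minimal CPU-side sketch of OMP in NumPy follows; the dictionary, signal, and sparsity level are illustrative placeholders, not the dissertation's actual CUDA implementation.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal Matching Pursuit: approximate x as a k-sparse combination
    of the columns of dictionary D (columns assumed unit-norm)."""
    residual = x.copy()
    support = []                              # indices of selected atoms
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected atoms jointly via least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs

# Toy usage: a random unit-norm dictionary and an exactly 3-sparse signal.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
true = np.zeros(256)
true[[3, 40, 100]] = [1.0, -2.0, 0.5]
x = D @ true
print(np.nonzero(omp(D, x, 3))[0])            # typically recovers {3, 40, 100}
```

GPU implementations of OMP typically parallelize the correlation step (a matrix multiplication) across many input signals at once; the specific CUDA strategies evaluated in the dissertation are not detailed in the abstract.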
12

Feature learning with deep neural networks for keystroke biometrics : A study of supervised pre-training and autoencoders

Hellström, Erik January 2018 (has links)
Computer security is becoming an increasingly important topic in today's society, with ever-increasing connectivity between devices and services. Stolen passwords have the potential to cause severe damage to companies and individuals alike, leading to the requirement that a security system must be able to detect and prevent fraudulent logins. Keystroke biometrics is the study of typing behavior in order to identify the typist, using features extracted during typing. The features traditionally used in keystroke biometrics are linear combinations of the timestamps of the keystrokes. This work focuses on feature learning methods and is based on the Carnegie Mellon keystroke data set. The aim is to investigate whether other feature extraction methods can enable improved classification of users. Two methods are employed to extract latent features in the data: pre-training of an artificial neural network classifier, and an autoencoder. Several tests are devised to assess the impact of pre-training and to compare the results with those of a similar network without pre-training. The effect of feature extraction with an autoencoder is investigated by training a classifier on the autoencoder features in combination with the conventional features. Using pre-training, I find that the classification accuracy does not improve when an adaptive learning rate optimizer is used. However, when a stochastic gradient descent optimizer is used, the accuracy improves by about 8%. Used in conjunction with the conventional features, the features extracted with an autoencoder improve the accuracy of the classifier by about 2%. However, a classifier based on the autoencoder features alone is not better than a classifier based on conventional features.
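As an illustration of the autoencoder-based pipeline described above, the sketch below builds a small fully connected autoencoder in Keras, trains it to reconstruct the timing vectors, and concatenates the learned codes with the conventional features for a downstream classifier. The layer sizes, the 31-dimensional feature vector, and the placeholder data are assumptions for illustration, not the thesis' actual architecture.

```python
import numpy as np
import tensorflow as tf

# Assumed input: an (n_samples, 31) matrix of conventional keystroke timing
# features (hold times and key-to-key latencies).
n_features, code_dim = 31, 8

inputs = tf.keras.Input(shape=(n_features,))
h = tf.keras.layers.Dense(16, activation="relu")(inputs)
code = tf.keras.layers.Dense(code_dim, activation="relu", name="code")(h)
h = tf.keras.layers.Dense(16, activation="relu")(code)
outputs = tf.keras.layers.Dense(n_features, activation="linear")(h)

autoencoder = tf.keras.Model(inputs, outputs)
encoder = tf.keras.Model(inputs, code)
autoencoder.compile(optimizer="adam", loss="mse")

# Train unsupervised: the autoencoder learns to reconstruct its own input.
X = np.random.rand(1000, n_features).astype("float32")   # placeholder data
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

# Combine the learned latent features with the conventional features, as in
# the experiments described above, and feed them to a classifier.
X_combined = np.hstack([X, encoder.predict(X, verbose=0)])
print(X_combined.shape)                                    # (1000, 39)
```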
13

Temporal Feature Selection with Symbolic Regression

Fusting, Christopher Winter 01 January 2017 (has links)
Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only for increasing the predictive power of a model but also for illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and on a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find that symbolic regression with the Range Terminal outperforms standard symbolic regression and Lasso regression. On the Arctic data set it outperforms standard symbolic regression and fails to beat Lasso regression, but it finds useful features describing the interaction between land surface temperature, snow, and seasonal vegetative growth in the Arctic.
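The "Range Terminal" is described as a terminal node that lets the evolved programs aggregate a variable over a span of time. A hypothetical, simplified version of that idea is sketched below as a plain NumPy feature transform; in the actual method the window bounds and the enclosing expression are evolved by the genetic program rather than fixed by hand.

```python
import numpy as np

def range_terminal(lagged, start, end, agg=np.mean):
    """Simplified 'range terminal': aggregate a contiguous window of lagged
    observations so it can be used as a single input to a regression model.
    `lagged` has shape (n_samples, n_lags)."""
    return agg(lagged[:, start:end], axis=1)

# Toy example: the target depends on the mean of lags 2 through 5 of a driver.
rng = np.random.default_rng(1)
lagged = rng.standard_normal((200, 12))           # 12 lagged observations
target = 3.0 * lagged[:, 2:6].mean(axis=1) + 0.1 * rng.standard_normal(200)

feature = range_terminal(lagged, 2, 6)            # a candidate evolved feature
corr = np.corrcoef(feature, target)[0, 1]
print(f"correlation of range-terminal feature with target: {corr:.2f}")
```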
14

Sentiment Analysis on Multi-view Social Data

Niu, Teng January 2016 (has links)
With the proliferation of social networks, people are likely to share their opinions about news, social events, and products on the Web. There is an increasing interest in understanding users' attitudes or sentiment from this large repository of opinion-rich data, which can benefit many commercial and political applications. Early research concentrated primarily on textual documents, such as users' comments on purchased products. Recent work shows that visual appearance also conveys rich human affect that can be predicted. While great effort has been devoted to single media, either text or image, few attempts have been made at the joint analysis of multi-view data, which is becoming a prevalent form in social media. For example, paired with the textual messages they post on Twitter, users are likely to upload images and videos that may carry their affective states. One common obstacle is the lack of sufficient manually annotated instances for model learning and performance evaluation. To promote research on this problem, we introduce a multi-view sentiment analysis dataset (MVSA) consisting of manually annotated image-text pairs collected from Twitter. The dataset can be utilized as a valuable benchmark for both single-view and multi-view sentiment analysis. In this thesis, we further conduct a comprehensive study on the computational analysis of sentiment from multi-view data. State-of-the-art approaches on single-view (image or text) and multi-view (image and text) data are introduced and compared through extensive experiments conducted on our constructed dataset and other public datasets. More importantly, the effectiveness of the correlation between different views is also studied using widely used fusion strategies and advanced multi-view feature extraction methods.
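Among the fusion strategies compared in multi-view sentiment analysis, the two most common baselines are early (feature-level) and late (decision-level) fusion. The sketch below illustrates both with scikit-learn on randomly generated stand-ins for the text and image feature matrices; it is illustrative only and not the experimental pipeline of the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-ins for per-sample text features, image features, and sentiment labels.
rng = np.random.default_rng(0)
n = 600
X_text = rng.standard_normal((n, 50))
X_image = rng.standard_normal((n, 30))
y = rng.integers(0, 3, size=n)                    # negative / neutral / positive

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Early fusion: concatenate the two views and train a single classifier.
X_early = np.hstack([X_text, X_image])
early = LogisticRegression(max_iter=1000).fit(X_early[idx_tr], y[idx_tr])

# Late fusion: train one classifier per view and average their probabilities.
clf_t = LogisticRegression(max_iter=1000).fit(X_text[idx_tr], y[idx_tr])
clf_i = LogisticRegression(max_iter=1000).fit(X_image[idx_tr], y[idx_tr])
proba = (clf_t.predict_proba(X_text[idx_te]) +
         clf_i.predict_proba(X_image[idx_te])) / 2
late_pred = proba.argmax(axis=1)

print("early fusion accuracy:", (early.predict(X_early[idx_te]) == y[idx_te]).mean())
print("late fusion accuracy: ", (late_pred == y[idx_te]).mean())
```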
15

Benchmarking authorship attribution techniques using over a thousand books by fifty Victorian era novelists

Gungor, Abdulmecit 03 April 2018 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Authorship attribution (AA) is the process of identifying the author of a given text and, from the machine learning perspective, can be seen as a classification problem. The literature offers many classification methods, each paired with its own feature extraction techniques. In this thesis, we explore techniques such as Word2Vec and paragraph2vec, along with other useful feature selection and extraction techniques, for a given text with different classifiers. We have performed experiments on novels extracted from the GDELT database using different features such as bag-of-words, n-grams, and newly developed techniques like Word2Vec. To improve our success rate, we have combined several useful features, including a diversity measure of the text, bag-of-words, bigrams, and specific words that are written differently by English and American authors. Support vector machine classifiers of the nu-SVC type are observed to give the best success rates on the stacked feature set. The main purpose of this work is to lay the foundations of feature extraction techniques in AA, namely lexical, character-level, syntactic, semantic, and application-specific features. We have also aimed to offer a new data resource for the authorship attribution research community and to demonstrate how it can be used to extract features for any kind of AA problem. The dataset we introduce consists of works by Victorian-era authors, and the main feature extraction techniques are shown with exemplary code snippets for audiences in different knowledge domains. Feature extraction approaches and their implementation with different classifiers are presented in simple ways, so that the work can also serve as a beginner's step into AA. Some feature extraction techniques introduced in this work are also meant to be employed in other NLP tasks, such as sentiment analysis with Word2Vec or text summarization. Using the introduced NLP tasks and feature extraction techniques, one can start implementing them on our dataset. We have also introduced several methods to implement the extracted features in different methodologies, such as feature stack engineering with different classifiers, or using Word2Vec to create sentence-level vectors.
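As a concrete, simplified illustration of the kind of pipeline the abstract describes, character n-gram TF-IDF features combined with a nu-SVC classifier can be assembled in a few lines of scikit-learn. The toy texts and author labels below are placeholders for segments of the Victorian-era corpus, not the actual data or code from the thesis.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import NuSVC

# Placeholder snippets standing in for fixed-length segments of the corpus;
# labels are author identifiers.
texts = [
    "It was the best of times, it was the worst of times.",
    "Whilst I was yet a boy, the moor lay grey beneath the rain.",
    "The carriage rattled over the cobbles toward the dim chapel.",
    "She had been reading by the fire when the letter arrived.",
]
authors = [0, 1, 0, 1]

# Character 2-4-grams are a classic stylometric feature set for AA.
pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    NuSVC(nu=0.5, kernel="linear"),
)
pipeline.fit(texts, authors)
print(pipeline.predict(["The rain fell grey upon the moor all night."]))
```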
16

Novel representation learning methodologies for consensus module detection, candidate gene prioritization, and biomarker discovery.

Ghandikota, Sudhir 31 May 2023 (has links)
No description available.
17

Improved Feature-Selection for Classification Problems using Multiple Auto-Encoders

Guo, Xinyu 29 May 2018 (has links)
No description available.
18

ON CONVOLUTIONAL NEURAL NETWORKS FOR KNOWLEDGE GRAPH EMBEDDING AND COMPLETION

Shen, Chen, 0000-0002-8465-6204 January 2020 (has links)
Data plays a key role in almost every field of computer science, including the knowledge graph field. The type of data varies across fields: in the knowledge graph field it is knowledge triples, while in computer vision it is visual data such as images and videos, and in natural language processing it is textual data such as articles and news. Raw data cannot be utilized directly by machine learning models; thus, data representation learning and feature design for various types of data are two critical tasks in many fields of computer science. Researchers develop various models and frameworks to learn and extract features, aiming to represent information in defined embedding spaces. Classic models usually embed the data in a low-dimensional space, while in recent years neural network models have been able to generate more meaningful and complex high-dimensional deep features. In the knowledge graph field, almost every approach represents entities and relations in a low-dimensional space, because real-world knowledge graphs contain a very large number of entities and triples. Recently, a few approaches have applied neural networks to knowledge graph learning; however, these models are only able to capture local and shallow features. We observe three important issues in the development of feature learning with neural networks. On the one hand, neural networks are not black boxes that work well in every case without specific design; there is still a lot of work to do on how to design and propose more powerful and robust neural networks for different types of data. On the other hand, more studies on utilizing these representations and features in applications are necessary. Moreover, traditional representations and features work better in some domains, while deep representations and features perform better in others; transfer learning is introduced to bridge the gap between domains and adapt various types of features for many tasks. In this dissertation, we aim to solve the above issues. For the knowledge graph learning task, we present several important observations, both theoretical and practical, about current knowledge graph learning approaches, especially those based on Convolutional Neural Networks. Beyond the work in the knowledge graph field, we not only develop different types of feature and representation learning frameworks for various data types, but also develop an effective transfer learning algorithm to utilize the resulting features and representations, which are applied successfully in multiple fields. Firstly, we analyze current issues with knowledge graph learning models and present eight observations about existing knowledge graph embedding approaches, especially those based on Convolutional Neural Networks. Secondly, we propose a novel unsupervised heterogeneous domain adaptation framework that can deal with features of various types; multimedia features can be adapted, and the proposed algorithm can bridge the representation gap between the source and target domains. Thirdly, we propose a novel framework to learn and embed user comments and online news data in units of sessions, and we predict the article of interest for users with deep neural networks and attention models. Lastly, we design and analyze a large number of features to represent the dynamics of user comments and news articles. The features span a broad spectrum of facets, including news article and comment contents, temporal dynamics, sentiment/linguistic features, and user behaviors. Our main insight is that the early dynamics of user comments contribute the most to an accurate prediction, while article-specific factors have surprisingly little influence. / Computer and Information Science
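For readers unfamiliar with convolutional knowledge graph embedding, ConvE is the representative architecture that this kind of observation speaks to: the head-entity and relation embeddings are reshaped into a 2D grid, convolved, and the flattened feature map is projected back into entity space to score every candidate tail. The PyTorch sketch below condenses that scoring function for background; the dimensions and layer choices are illustrative, not the author's model.

```python
import torch
import torch.nn as nn

class ConvEScorer(nn.Module):
    """ConvE-style scorer: convolve reshaped (head, relation) embeddings and
    project back to entity space to score all candidate tail entities."""
    def __init__(self, n_entities, n_relations, dim=200, h=10, w=20):
        super().__init__()
        assert dim == h * w
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.h, self.w = h, w
        self.conv = nn.Conv2d(1, 32, kernel_size=3)
        conv_out = 32 * (2 * h - 2) * (w - 2)     # flattened size after the conv
        self.fc = nn.Linear(conv_out, dim)

    def forward(self, head_idx, rel_idx):
        e = self.ent(head_idx).view(-1, 1, self.h, self.w)
        r = self.rel(rel_idx).view(-1, 1, self.h, self.w)
        x = torch.cat([e, r], dim=2)              # stack into one 2D "image"
        x = torch.relu(self.conv(x)).flatten(1)
        x = torch.relu(self.fc(x))
        return x @ self.ent.weight.t()            # one score per candidate tail

scorer = ConvEScorer(n_entities=1000, n_relations=50)
scores = scorer(torch.tensor([3, 7]), torch.tensor([1, 4]))
print(scores.shape)                               # torch.Size([2, 1000])
```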
19

On the Use of Convolutional Neural Networks for Specific Emitter Identification

Wong, Lauren J. 12 June 2018 (has links)
Specific Emitter Identification (SEI) is the association of a received signal with an emitter, and is made possible by the unique and unintentional characteristics an emitter imparts onto each transmission, known as its radio frequency (RF) fingerprint. SEI systems are of vital importance to the military for applications such as early warning systems, emitter tracking, and emitter location. More recently, cognitive radio systems have started making use of SEI systems to enforce Dynamic Spectrum Access (DSA) rules. The use of pre-determined, expert-defined signal features to characterize the RF fingerprint of emitters of interest limits current state-of-the-art SEI systems in numerous ways. Recent work in RF Machine Learning (RFML) and Convolutional Neural Networks (CNNs) has shown the capability to perform signal processing tasks such as modulation classification without the need for pre-defined expert features. Given this success, the work presented in this thesis investigates the ability to use CNNs, in place of a traditional expert-defined feature extraction process, to improve upon traditional SEI systems, by developing and analyzing two distinct approaches for performing SEI with CNNs. Neither approach assumes a priori knowledge of the emitters of interest. Further, both approaches use only raw IQ data as input and are designed to be easily tuned or modified for new operating environments. Results show that CNNs can be used both to estimate expert-defined features and to learn emitter-specific features to effectively identify emitters. / Master of Science
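For context on what "raw IQ data as input" looks like in practice, the Keras sketch below defines a small 1D CNN that consumes a window of complex samples represented as two real channels (I and Q) and outputs a softmax over candidate emitters. The window length, layer sizes, and number of emitters are illustrative assumptions, not the architectures developed in the thesis.

```python
import numpy as np
import tensorflow as tf

window_len, n_emitters = 1024, 8                  # illustrative values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window_len, 2)),        # I and Q as two channels
    tf.keras.layers.Conv1D(64, kernel_size=7, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(n_emitters, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder training data: random IQ windows and emitter labels.
X = np.random.randn(256, window_len, 2).astype("float32")
y = np.random.randint(0, n_emitters, size=256)
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
print(model.predict(X[:2], verbose=0).shape)      # (2, n_emitters)
```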
20

Discriminative object categorization with external semantic knowledge

Hwang, Sung Ju 25 September 2013 (has links)
Visual object category recognition is one of the most challenging problems in computer vision. Even assuming that we can obtain a near-perfect instance-level representation with advances in visual input devices and low-level vision techniques, object categorization still remains a difficult problem because it requires drawing boundaries between instances in a continuous world, where the boundaries are solely defined by human conceptualization. Object categorization is essentially a perceptual process that takes place in a human-defined semantic space. In this semantic space, the categories reside not in isolation, but in relation to others: some categories are similar, grouped, or co-occurring, and some are not. However, despite this semantic nature of object categorization, most of today's automatic visual category recognition systems rely only on the category labels when training discriminative recognition models with statistical machine learning techniques. In many cases, this can result in the recognition model being misled into learning incorrect associations between visual features and the semantic labels, essentially overfitting to training set biases, which limits the model's prediction power when new test instances are given. Using semantic knowledge has great potential to benefit object category recognition. First, semantic knowledge can guide the training model to learn correct associations between visual features and the categories. Second, semantics provide much richer information beyond the membership information given by the labels, in the form of inter-category and category-attribute distances, relations, and structures. Finally, semantic knowledge scales well, as the relations between categories grow with an increasing number of categories. My goal in this thesis is to learn discriminative models for categorization that leverage semantic knowledge for object recognition, with a special focus on the semantic relationships among different categories and concepts. To this end, I explore three semantic sources, namely attributes, taxonomies, and analogies, and I show how to incorporate them into the original discriminative model as a form of structural regularization. In particular, for each form of semantic knowledge I present a feature learning approach that defines a semantic embedding to support the object categorization task. The regularization penalizes models that deviate from the known structures according to the semantic knowledge provided. The first semantic source I explore is attributes, which are human-describable semantic characteristics of an instance. While existing work treated them as mid-level features that did not introduce new information, I focus on their potential as a means to better guide the learning of object categories, by enforcing the object category classifiers to share features with attribute classifiers in a multitask feature learning framework. This approach essentially discovers the common low-dimensional features that support predictions in both semantic spaces. Then, I move on to the semantic taxonomy, which is another valuable source of semantic knowledge. The merging and splitting criteria for the categories in a taxonomy are human-defined, and I aim to exploit this implicit semantic knowledge.
Specifically, I propose a tree of metrics (ToM) that learns metrics capturing granularity-specific similarities at different nodes of a given semantic taxonomy, and uses a regularizer to isolate granularity-specific disjoint features. This approach captures the intuition that the features used for discriminating the parent class should be different from the features used for the children classes. Such learned metrics can be used for hierarchical classification. The use of a single taxonomy can be limited in that its structure is not optimal for hierarchical classification, and there may exist no single optimal semantic taxonomy that perfectly aligns with visual distributions. Thus, I next propose a way to overcome this limitation by leveraging multiple taxonomies as semantic sources and combining the acquired complementary information across multiple semantic views and granularities. This allows us, for example, to synthesize semantics from both 'Biological' and 'Appearance'-based taxonomies when learning the visual features. Finally, as a further exploration of more complex semantic relations, different from the previous two pairwise similarity-based models, I exploit analogies, which encode the relational similarities between two related pairs of categories. Specifically, I use analogies to regularize a discriminatively learned semantic embedding space for categorization, such that the displacements between the two category embeddings in both category pairs of the analogy are enforced to be the same. Such a constraint allows a more confusable pair of categories to benefit from the clear separation in the matched pair of categories that shares the same relation. All of these methods are evaluated on challenging public datasets and are shown to effectively improve recognition accuracy over purely discriminative models, while also guiding the recognition to be more semantically aligned with human perception. Further, the applications of the proposed methods are not limited to visual object categorization in computer vision; they can be applied to any classification problem where there exists some domain knowledge about the relationships or structures between the classes. Possible applications of my methods outside the visual recognition domain include document classification in natural language processing, and gene-based animal or protein classification in computational biology. / text
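The analogy constraint described above amounts to penalizing the difference between the displacement vectors of the two category pairs in the learned embedding. The NumPy sketch below shows only that regularization term in isolation; the embeddings and category indices are placeholders, and in the thesis the term is added to a discriminative categorization objective rather than minimized on its own.

```python
import numpy as np

def analogy_penalty(E, analogies):
    """Sum of squared differences between the displacement vectors of
    analogous category pairs: for (a, b, c, d) meaning a:b :: c:d,
    penalize ||(E[a] - E[b]) - (E[c] - E[d])||^2."""
    total = 0.0
    for a, b, c, d in analogies:
        diff = (E[a] - E[b]) - (E[c] - E[d])
        total += float(diff @ diff)
    return total

# Toy embedding matrix for 6 categories in a 10-dimensional semantic space,
# with one analogy constraint, e.g. (lion : cat) :: (wolf : dog).
rng = np.random.default_rng(0)
E = rng.standard_normal((6, 10))
analogies = [(0, 1, 2, 3)]

reg_weight = 0.1
print(f"analogy regularization term: {reg_weight * analogy_penalty(E, analogies):.3f}")
# During training this term would be added to the classification loss, pulling
# the two displacement vectors toward one another.
```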
