Global ETD Search

131	Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis January 2016 (has links) abstract: The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomenon -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L-2 norm. However in many cases this assumption is violated, when the data lies on non Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to \emph{represent, compare, and manipulate} such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the \emph{true} space of such trajectories. In applications such as stroke rehabilitation where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization to the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2016 Computer science Mathematics Electrical engineering activity recognition dimensionality reduction human movement analysis machine learning manifolds riemannian geometry
132	Distinct Feature Learning and Nonlinear Variation Pattern Discovery Using Regularized Autoencoders January 2016 (has links) abstract: Feature learning and the discovery of nonlinear variation patterns in high-dimensional data is an important task in many problem domains, such as imaging, streaming data from sensors, and manufacturing. This dissertation presents several methods for learning and visualizing nonlinear variation in high-dimensional data. First, an automated method for discovering nonlinear variation patterns using deep learning autoencoders is proposed. The approach provides a functional mapping from a low-dimensional representation to the original spatially-dense data that is both interpretable and efficient with respect to preserving information. Experimental results indicate that deep learning autoencoders outperform manifold learning and principal component analysis in reproducing the original data from the learned variation sources. A key issue in using autoencoders for nonlinear variation pattern discovery is to encourage the learning of solutions where each feature represents a unique variation source, which we define as distinct features. This problem of learning distinct features is also referred to as disentangling factors of variation in the representation learning literature. The remainder of this dissertation highlights and provides solutions for this important problem. An alternating autoencoder training method is presented and a new measure motivated by orthogonal loadings in linear models is proposed to quantify feature distinctness in the nonlinear models. Simulated point cloud data and handwritten digit images illustrate that standard training methods for autoencoders consistently mix the true variation sources in the learned low-dimensional representation, whereas the alternating method produces solutions with more distinct patterns. Finally, a new regularization method for learning distinct nonlinear features using autoencoders is proposed. Motivated in-part by the properties of linear solutions, a series of learning constraints are implemented via regularization penalties during stochastic gradient descent training. These include the orthogonality of tangent vectors to the manifold, the correlation between learned features, and the distributions of the learned features. This regularized learning approach yields low-dimensional representations which can be better interpreted and used to identify the true sources of variation impacting a high-dimensional feature space. Experimental results demonstrate the effectiveness of this method for nonlinear variation pattern discovery on both simulated and real data sets. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2016 Industrial engineering Artificial intelligence Computer science Autoassociative neural network Autoencoder Dimensionality reduction Distinct features Nonlinear principal component analysis Nonlinear variation
133	3D - Patch Based Machine Learning Systems for Alzheimer’s Disease classification via 18F-FDG PET Analysis January 2017 (has links) abstract: Alzheimer’s disease (AD), is a chronic neurodegenerative disease that usually starts slowly and gets worse over time. It is the cause of 60% to 70% of cases of dementia. There is growing interest in identifying brain image biomarkers that help evaluate AD risk pre-symptomatically. High-dimensional non-linear pattern classification methods have been applied to structural magnetic resonance images (MRI’s) and used to discriminate between clinical groups in Alzheimers progression. Using Fluorodeoxyglucose (FDG) positron emission tomography (PET) as the pre- ferred imaging modality, this thesis develops two independent machine learning based patch analysis methods and uses them to perform six binary classification experiments across different (AD) diagnostic categories. Specifically, features were extracted and learned using dimensionality reduction and dictionary learning & sparse coding by taking overlapping patches in and around the cerebral cortex and using them as fea- tures. Using AdaBoost as the preferred choice of classifier both methods try to utilize 18F-FDG PET as a biological marker in the early diagnosis of Alzheimer’s . Addi- tional we investigate the involvement of rich demographic features (ApoeE3, ApoeE4 and Functional Activities Questionnaires (FAQ)) in classification. The experimental results on Alzheimer’s Disease Neuroimaging initiative (ADNI) dataset demonstrate the effectiveness of both the proposed systems. The use of 18F-FDG PET may offer a new sensitive biomarker and enrich the brain imaging analysis toolset for studying the diagnosis and prognosis of AD. / Dissertation/Thesis / Thesis Defense Presentation / Masters Thesis Computer Science 2017 Computer science Medical imaging Neurosciences adaboost classifier alzheimer's disease dimensionality reduction machine learning max-pooling sparse coding
134	Investigating Gene-Gene and Gene-Environment Interactions in the Association Between Overnutrition and Obesity-Related Phenotypes Tessier, François January 2017 (has links) Introduction – Animal studies suggested that NFKB1, SOCS3 and IKBKB genes could be involved in the association between overnutrition and obesity. This study aims to investigate interactions involving these genes and nutrition affecting obesity-related phenotypes. Methods – We used multifactor dimensionality reduction (MDR) and penalized logistic regression (PLR) to better detect gene/environment interactions in data from the Toronto Nutrigenomics and Health Study (n=1639) using dichotomized body mass index (BMI) and waist circumference (WC) as obesity-related phenotypes. Exposure variables included genotypes on 54 single nucleotide polymorphisms, dietary factors and ethnicity. Results – MDR identified interactions between SOCS3 rs6501199 and rs4969172, and IKBKB rs3747811 affecting BMI in whites; SOCS3 rs6501199 and NFKB1 rs1609798 affecting WC in whites; and SOCS3 rs4436839 and IKBKB rs3747811 affecting WC in South Asians. PLR found a main effect of SOCS3 rs12944581 on BMI among South Asians. Conclusion – MDR and PLR gave different results, but support some results from previous studies. Epidemiology Genetics Obesity Nutrition Interaction Overnutrition Multifactor dimensionality reduction Penalized logistic regression Gene-gene interaction Gene-environment interaction Genetic epidemiology
135	Metric Learning via Linear Embeddings for Human Motion Recognition Kong, ByoungDoo 18 December 2020 (has links) We consider the application of Few-Shot Learning (FSL) and dimensionality reduction to the problem of human motion recognition (HMR). The structure of human motion has unique characteristics such as its dynamic and high-dimensional nature. Recent research on human motion recognition uses deep neural networks with multiple layers. Most importantly, large datasets will need to be collected to use such networks to analyze human motion. This process is both time-consuming and expensive since a large motion capture database must be collected and labeled. Despite significant progress having been made in human motion recognition, state-of-the-art algorithms still misclassify actions because of characteristics such as the difficulty in obtaining large-scale leveled human motion datasets. To address these limitations, we use metric-based FSL methods that use small-size data in conjunction with dimensionality reduction. We also propose a modified dimensionality reduction scheme based on the preservation of secants tailored to arbitrary useful distances, such as the geodesic distance learned by ISOMAP. We provide multiple experimental results that demonstrate improvements in human motion classification. Human Motion Recognition Few-Shot learning Metric learning Dimensionality Reduction Classification Computer Vision Computer Engineering Computer Sciences Electrical and Computer Engineering
136	Efektivní tagování fotografií / Efficient Image Tagging Procházka, Václav January 2013 (has links) This thesis investigates efficient manual image tagging approaches. It specifically focuses on organising images into clusters depending on their content, and thus on simplifying the selection of similar photos. Such selections may be efficiently tagged with common tags. The thesis investigates known techniques for visualisation of image collections according to the image content, together with dimensionality reduction methods. The most suitable methods are considered and evaluated. The thesis proposes a novel method for presenting image collections on 2D displays which combines a timeline with similarity grouping (Timeline projection). This method utilizes t-Distributed Stochastic Neighbour Embedding (t-SNE) for otpimally projecting groupings in high dimensional feature spaces onto the low-dimensional screen. Various modifications of t-SNE and ways to combine it with the timeline are discussed and chosen combination is implemented as a web interface and is qualitatively evaluated in a user study. Possible directions of further research on the subject are suggested.
137	Optimizing Deep Neural Networks for Classification of Short Texts Pettersson, Fredrik January 2019 (has links) This master's thesis investigates how a state-of-the-art (SOTA) deep neural network (NN) model can be created for a specific natural language processing (NLP) dataset, the effects of using different dimensionality reduction techniques on common pre-trained word embeddings and how well this model generalize on a secondary dataset. The research is motivated by two factors. One is that the construction of a machine learning (ML) text classification (TC) model is typically done around a specific dataset and often requires a lot of manual intervention. It's therefore hard to know exactly what procedures to implement for a specific dataset and how the result will be affected. The other reason is that, if the dimensionality of pre-trained embedding vectors can be lowered without losing accuracy, and thus saving execution time, other techniques can be used during the time saved to achieve even higher accuracy. A handful of deep neural network architectures are used, namely a convolutional neural network (CNN), long short-term memory neural network (LSTM) and a bidirectional LSTM (Bi-LSTM) architecture. These deep neural network architectures are combined with four different word embeddings: GoogleNews-vectors-negative300, glove.840B.300d, paragram_300_sl999 and wiki-news-300d-1M. Three main experiments are conducted in this thesis. In the first experiment, a top-performing TC model is created for a recent NLP competition held at Kaggle.com. Each implemented procedure is benchmarked on how the accuracy and execution time of the model is affected. In the second experiment, principal component analysis (PCA) and random projection (RP) are applied to the pre-trained word embeddings used in the top-performing model to investigate how the accuracy and execution time is affected when creating lower-dimensional embedding vectors. In the third experiment, the same model is benchmarked on a separate dataset (Sentiment140) to investigate how well it generalizes on other data and how each implemented procedure affects the accuracy compared to on the original dataset. The first experiment results in a bidirectional LSTM model and a combination of the three embeddings: glove, paragram and wiki-news concatenated together. The model is able to give predictions with an F1 score of 71% which is good enough to reach 9th place out of 1,401 participating teams in the competition. In the second experiment, the execution time is improved by 13%, by using PCA, while lowering the dimensionality of the embeddings by 66% and only losing half a percent of F1 accuracy. RP gave a constant accuracy of 66-67% regardless of the projected dimensions compared to over 70% when using PCA. In the third experiment, the model gained around 12% accuracy from the initial to the final benchmarks, compared to 19% on the competition dataset. The best-achieved accuracy on the Sentiment140 dataset is 86% and thus higher than the 71% achieved on the Quora dataset. Machine learning Text classification state-of-the-art SOTA Deep neural network Dimensionality reduction Word embeddings Computer and Information Sciences Data- och informationsvetenskap
138	Mera sličnosti između modela Gausovih smeša zasnovana na transformaciji prostora parametara Krstanović Lidija 25 September 2017 (has links) <p>Predmet istraživanja ovog rada je istraživanje i eksploatacija mogućnosti da parametri Gausovih komponenti korišćenih Gaussian mixture modela  (GMM) aproksimativno leže na niže dimenzionalnoj površi umetnutoj u konusu pozitivno definitnih matrica. U tu svrhu uvodimo novu, mnogo efikasniju meru sličnosti između GMM-ova projektovanjem LPP-tipa parametara komponenti iz više dimenzionalnog parametarskog originalno konfiguracijskog prostora u prostor značajno niže dimenzionalnosti. Prema tome, nalaženje distance između dva GMM-a iz originalnog prostora se redukuje na nalaženje distance između dva skupa niže dimenzionalnih euklidskih vektora, ponderisanih odgovarajućim težinama. Predložena mera je pogodna za primene koje zahtevaju visoko dimenzionalni prostor obeležja i/ili veliki ukupan broj Gausovih komponenti. Razrađena metodologija je primenjena kako na sintetičkim tako i na realnim eksperimentalnim podacima.</p> / <p>This thesis studies the possibility that the parameters of Gaussian components of a<br />particular Gaussian Mixture Model (GMM) lie approximately on a lower-dimensional<br />surface embedded in the cone of positive definite matrices. For that case, we deliver<br />novel, more efficient similarity measure between GMMs, by LPP-like projecting the<br />components of a particular GMM, from the high dimensional original parameter space,<br />to a much lower dimensional space. Thus, finding the distance between two GMMs in<br />the original space is reduced to finding the distance between sets of lower<br />dimensional euclidian vectors, pondered by corresponding weights. The proposed<br />measure is suitable for applications that utilize high dimensional feature spaces and/or<br />large overall number of Gaussian components. We confirm our results on artificial, as<br />well as real experimental data.</p>
139	VISUAL INTERPRETATION TO UNCERTAINTIES IN 2D EMBEDDING FROM PROBABILISTIC-BASED NON-LINEAR DIMENSIONALITY REDUCTION METHODS Junhan Zhao (11024559) 25 June 2021 (has links) Enabling human understanding of high-dimensional (HD) data is critical for scientific research but highly challenging. To deal with large datasets, probabilistic-based non-linear DR models, like UMAP and t-SNE, lead the performance on reducing the high dimensionality. However, considering the trade-off between global and local structure preservation and the randomness initialized for computation, applying non-linear models in different parameter settings to unknown high-dimensional structure data may return different 2D visual forms. Much critical neighborhood relationship may be falsely imposed, and uncertainty may be introduced into the low-dimensional embedding visualizations, so-called distortion. In this work, a survey has been conducted to illustrate the most state-of-the-art layout enrichment works for interpreting dimensionality reduction methods and results. Responding to the lack of visual interpretation techniques to probabilistic-based DR methods, we propose a visualization technique called ManiGraph, which facilitates users to explore multi-view 2D embeddings via mesoscopic structure graphs. A dynamic mesoscopic structure first subsets HD data by a hexagonal grid in visual space from non-linear embedding (e.g., UMAP). Then, it measures the regional adapted trustworthiness/continuity and visualizes the restored missing and highlighted false connections between subsets from high-dimensional space to the low-dimensional in a node-linkage manner. The visualization helps users understand and interpret the distortion from both visualization and model stages. We further demonstrate the user cases tested on intuitive 3D toy datasets, fashion-MNIST, and single-cell RNA sequencing with domain experts in unsupervised scenarios. This work will potentially benefit the data science community, from toolkit users to DR algorithm developers.<br> Computer graphics Human-computer interaction data visualization methods dimensionality reduction analysis Uncertainty Unsupervised Learning Computer-Human Interaction Computer Graphics
140	Data analysis for Systematic Literature Reviews Chao, Roger January 2021 (has links) Systematic Literature Reviews (SLR) are a powerful research tool to identify and select literature to answer a certain question. However, an approach to extract inherent analytical data in Systematic Literature Reviews’ multi-dimensional datasets was lacking. Previous Systematic Literature Review tools do not incorporate the capability of providing said analytical insight. Therefore, this thesis aims to provide a useful approach comprehending various algorithms and data treatment techniques to provide the user with analytical insight on their data that is not evident in the bare execution of a Systematic Literature Review. For this goal, a literature review has been conducted to find the most relevant techniques to extract data from multi-dimensional data sets and the aforementioned approach has been tested on a survey regarding Self-Adaptive Systems (SAS) using a web-application. As a result, we find out what are the most adequate techniques to incorporate into the approach this thesis will provide. data analysis systematic literature reviews clustering dimensionality reduction self-adaptive systems multi-dimensional data Computer Engineering Datorteknik

Search results