About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

Classification in High Dimensional Feature Spaces through Random Subspace Ensembles

Pathical, Santhosh P. January 2010 (has links)
No description available.
162

Organization of Electronic Dance Music by Dimensionality Reduction / Organisering av Elektronisk Dans Musik genom Dimensionsreducering

Tideman, Victor January 2022 (has links)
This thesis aims to produce a similarity metric for tracks in the Electronic Dance Music genre by taking a high-dimensional data representation of each track and projecting it to a low-dimensional embedded space (2D and 3D) with two Dimensionality Reduction (DR) techniques: t-distributed stochastic neighbor embedding (t-SNE) and Pairwise Controlled Manifold Approximation (PaCMAP). A content-based approach is taken to identify similarity, which is defined as the distance between points in the embedded space. This work explores the connection between the extractable content and the feel of a track. Features are extracted from every track over a 30-second window with digital signal processing tools. Three evaluation methods were conducted to establish ground truth in the data. The first established expected similarity sub-clusters and tuned the DR techniques until the expected clusters appeared in visualisations of the embedded space. The second generated new tracks with a controlled level of separation by applying various distortion techniques of increasing magnitude to copies of a track. The third introduced a data set with annotated valence and arousal scores for music snippets, which was used to train estimators that estimate the feeling of tracks and perform classification. Lastly, a similarity metric was computed from distances in the embedded space. Findings suggest that certain contextual groups, such as remixes and tracks by the same artist, can be identified with this metric, and that tracks with small distortions (similar tracks) lie closer together in the embedded space than tracks with large distortions.
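A minimal sketch of the embedding-and-distance idea described above, assuming a feature matrix has already been extracted per track; the array names, feature dimensions, and parameter values are illustrative placeholders, not the thesis's actual pipeline:

```python
# Illustrative sketch: embed per-track feature vectors with t-SNE and PaCMAP,
# then use distances in the embedded space as a similarity metric.
# `features` stands in for real extracted features (e.g., spectral descriptors
# computed over a 30-second window); all names and values are assumptions.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances
import pacmap  # pip install pacmap

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 40))            # placeholder for (n_tracks, n_features)

emb_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
emb_pacmap = pacmap.PaCMAP(n_components=2).fit_transform(features)

dists = pairwise_distances(emb_pacmap)           # pairwise Euclidean distances in 2D
nearest_to_track0 = np.argsort(dists[0])[1:6]    # five most similar tracks to track 0
print(nearest_to_track0)
```

Under this scheme, smaller distance in the embedded space is read directly as greater similarity between tracks.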
163

Spatial-spectral analysis in dimensionality reduction for hyperspectral image classification

Shah, Chiranjibi 13 May 2022 (has links)
This dissertation develops new algorithms that use spatial and spectral information for hyperspectral image classification. Because hyperspectral imagery consists of a large number of spatial pixels along with hundreds of spectral dimensions, spatial-spectral analysis and dimensionality reduction (DR) are necessary for effective feature extraction. The first proposed method employs spatial-aware collaboration-competition preserving graph embedding, imposing a spatial regularization term along with Tikhonov regularization in the objective function for DR of hyperspectral imagery. Collaborative representation (CR) is an efficient classifier, but it does not exploit spatial information; thus, structure-aware collaborative representation (SaCRT) is introduced to utilize spatial information for more appropriate data representations, and this work demonstrates that SaCRT offers better classification performance. For DR, a collaborative and low-rank representation-based graph for discriminant analysis of hyperspectral imagery is proposed. It generates a more informative graph by combining collaborative and low-rank representation terms: the collaborative term incorporates within-class atoms, while the low-rank term preserves global data structure. Because the representation coefficients are estimated with a collaborative term, the closed-form solution has lower computational complexity than sparse representation. The proposed collaborative and low-rank representation-based graph outperforms the existing sparse and low-rank representation-based graph for DR of hyperspectral imagery. Finally, tree-based techniques and deep neural networks are combined through an interpretable canonical deep tabular data learning architecture (TabNet), which uses sequential attention to choose appropriate features at different decision steps. An efficient TabNet for hyperspectral image classification is developed in this dissertation, in which the performance of TabNet is enhanced by incorporating a 2-D convolution layer inside an attentive transformer; additional gains are obtained by utilizing structure profiles with TabNet.
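As an illustration of the closed-form collaborative-representation idea mentioned above, the following is a minimal generic CRC sketch with Tikhonov (ridge) regularization; it is not the spatial- or structure-aware variants proposed in the dissertation, and all data and parameter values are placeholders:

```python
# Minimal sketch of a collaborative-representation classifier (CRC) with
# Tikhonov regularization and a closed-form coefficient solution.
# X_train, y_train, x_test are illustrative placeholders, not real imagery.
import numpy as np

def crc_predict(X_train, y_train, x_test, lam=0.01):
    # X_train: (n_samples, n_bands) labeled pixel spectra used as the dictionary
    D = X_train.T                                    # (n_bands, n_samples)
    # Closed-form coefficients: alpha = (D^T D + lam*I)^-1 D^T x
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ x_test)
    residuals = {}
    for c in np.unique(y_train):
        mask = (y_train == c)
        residuals[c] = np.linalg.norm(x_test - D[:, mask] @ alpha[mask])
    return min(residuals, key=residuals.get)         # class with smallest residual

rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 30))                  # 60 labeled pixels, 30 bands
y_train = rng.integers(0, 3, size=60)                # 3 classes
print(crc_predict(X_train, y_train, rng.normal(size=30)))
```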
164

An Investigation of Unidimensional Testing Procedures under Latent Trait Theory using Principal Component Analysis

McGill, Michael T. 11 December 2009 (has links)
There are several generally accepted rules for detecting unidimensionality, but none are well tested. This simulation study investigated well-known methods, including but not limited to the Kaiser (k > 1) criterion, Percentage of Measure Validity (greater than 50%, 40%, or 20%), Ratio of Eigenvalues, and the Kelley method, and compared them to each other and to a new method proposed by the author (the McGill method) for assessing unidimensionality. By applying principal component analysis (PCA) to the residuals of a Latent Trait Test Theory (LTTT) model, this study addressed three purposes: determining the Type I error rates associated with various criterion values for assessing unidimensionality; determining the Type II error rates and statistical power associated with various rules of thumb when assessing dimensionality; and determining whether more suitable criterion values could be established for the methods of the study by accounting for various characteristics of the measurement context. For those methods based on criterion values, new modified values are proposed; for those methods without criterion values for dimensionality decisions, criterion values are modeled and presented. The methods compared in this study were investigated using PCA on residuals from the Rasch model. The sample size, test length, ability distribution variability, and item distribution variability were varied, and the resulting Type I and Type II error rates of each method were examined. The results imply that certain conditions can cause improper diagnoses of the dimensionality of instruments. Adjusted methods are suggested to induce a more stable condition with respect to the Type I and Type II error rates. The nearly ubiquitous Kaiser method was found to be biased towards signaling multidimensionality whether or not it exists. The modified version of the Kaiser method and the McGill method proposed by the author were shown to be among the best at detecting unidimensionality when it was present. In short, methods that take into account changes in variables such as sample size, test length, item variability, and person variability are better than methods that use a single, static criterion value in decision making with respect to dimensionality. / Ph. D.
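A minimal sketch of how such criterion-based rules are applied to a PCA of model residuals; a full study would first calibrate a Rasch model and compute standardized residuals, whereas here the residual matrix is a random placeholder:

```python
# Illustrative sketch: eigenvalues of the item-by-item correlation matrix of
# model residuals, with two decision rules (Kaiser k > 1 and eigenvalue ratio).
# `residuals` is an assumed placeholder for persons-by-items Rasch residuals.
import numpy as np

rng = np.random.default_rng(0)
residuals = rng.normal(size=(500, 20))          # 500 persons x 20 items

corr = np.corrcoef(residuals, rowvar=False)     # item correlation of residuals
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

kaiser_flags_multidim = np.sum(eigvals > 1.0) > 0    # Kaiser (k > 1) rule
eigen_ratio = eigvals[0] / eigvals[1]                # ratio of first to second eigenvalue
print(eigvals[:3], kaiser_flags_multidim, eigen_ratio)
```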
165

Toward a General Novelty Detection Framework in Structural Health Monitoring; Challenges and Opportunities in Deep Learning

Soleimani-Babakamali, Mohammad Hesam 17 October 2022 (has links)
Structural health monitoring (SHM) is an anomaly detection process. Data-driven SHM has gained much attention compared to the model-based strategy, especially with current state-of-the-art machine learning routines. Model-based methods require structural information and time-consuming model updating, and they may fail with noisy data, a persistent condition in real-time SHM problems. However, there are several hindrances in both the supervised and unsupervised settings of machine learning-based SHM. This study identifies and addresses these hindrances with the versatility of state-of-the-art deep learning strategies. While managing those complications, we aim to propose a general, structure-independent (i.e., requiring no prior information) SHM framework. Developing such techniques plays a crucial role in the SHM of smart cities. In the supervised SHM and sensor output validation (SOV) category, data class imbalance results from the lack of data from nuanced structural states. Employing Long Short-Term Memory (LSTM) units, we developed a general technique that handles both SHM and SOV. The developed architecture accepts high-dimensional features, enabling the training of Generative Adversarial Networks (GANs) for data generation and addressing the complications of data imbalance. GAN-generated SHM data improved accuracy for low-sampled classes from 44.77% to 64.58% and from 73.39% to 90.84% in the SOV and SHM case studies, respectively. Arguing that unsupervised SHM is a practical category since it identifies novelties (i.e., unseen states), we investigate the current application of dimensionality reduction (DR) in unsupervised SHM. Due to the curse of dimensionality, classical unsupervised techniques cannot function with high-dimensional features, driving the use of DR techniques. Our investigations highlighted the importance of avoiding DR in unsupervised SHM, as the data dimensions that DR suppresses may contain damage-sensitive features for novelties. With DR, novelty detection accuracy declined by up to 60% in two benchmark SHM datasets. Other obstacles in the unsupervised SHM area are case-dependent features, the lack of dynamic-class novelty detection, and the impact of user-defined detection parameters on novelty detection accuracy. We chose the fast Fourier transform (FFT) of raw signals, with no dimensionality reduction, as the feature representation for the SHM framework. A deep neural network scheme is developed to perform pattern recognition on that high-dimensional data. The framework does not require prior information and, with GAN models implemented, offers robustness to sensor placement in structures. These characteristics make the framework suitable for developing general unsupervised SHM techniques. / Doctor of Philosophy / Detecting abnormal behaviors in structures from the input signals of sensors is called structural health monitoring (SHM). The vibrational characteristics of signals or direct pattern recognition techniques can be applied to detect anomalies in a data-driven scheme. Machine learning (ML) tools are suitable for data-driven methods; however, challenges exist in both supervised and unsupervised ML-based SHM. Recent improvements in deep learning are employed in this study to address such obstacles after their identification. In supervised learning, data points for the normal state of structures are abundant and datasets are usually imbalanced; the same issue affects sensor output validation (SOV). SOV must take place before SHM in order to remove anomalous sensor outputs.
First, a unified decision-making system for SHM and SOV problems is proposed, and then data imbalance is alleviated by generating new data objects from low-sampled classes. The proposed unified method is based on recurrent neural networks, and the generation mechanism is a Generative Adversarial Network (GAN). Results indicate improvements in accuracy metrics for the minority data classes. For unsupervised SHM, four major issues are identified; the first two are data loss during feature extraction and the case-dependency of such extraction schemes. These two issues are addressed by using the fast Fourier transform (FFT) of signals as features, with no reduction in their dimensionality. The other two obstacles are the lack of discrimination between different novel classes (i.e., only two classes, damaged and undamaged) and the effect of user-defined detection parameters on the SHM analysis. These two predicaments are addressed by generating new data objects online from the incoming signal stream with a GAN and by tuning the detection system, using GAN-generated data, so that its performance is consistent across user-defined parameters. The proposed unsupervised technique is further improved to be insensitive to sensor placement on structures by employing recurrent neural network layers in the GAN architecture, with a GAN whose discriminator is overfitted.
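A minimal sketch of the frequency-domain feature extraction referred to above (FFT magnitudes of raw signals used directly as high-dimensional features, with no dimensionality reduction); signal lengths, sampling rate, and names are illustrative assumptions:

```python
# Illustrative sketch: magnitude FFT of raw sensor signals as high-dimensional
# features, with per-sensor normalization and the DC component dropped.
import numpy as np

def fft_features(signals, fs=100.0):
    """signals: (n_sensors, n_samples) raw time series -> normalized magnitude spectra."""
    spectra = np.abs(np.fft.rfft(signals, axis=-1))[:, 1:]          # drop DC bin
    freqs = np.fft.rfftfreq(signals.shape[-1], d=1.0 / fs)[1:]
    return freqs, spectra / spectra.max(axis=-1, keepdims=True)     # per-sensor scaling

rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 1024))             # 4 sensors, 1024 samples each (placeholder)
freqs, feats = fft_features(raw)
print(feats.shape)                           # (4, 512) high-dimensional feature vectors
```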
166

Dimensionality Reduction, Feature Selection and Visualization of Biological Data

Ha, Sook Shin 14 September 2012 (has links)
Due to the high dimensionality of most biological data, it is difficult to directly analyze, model and visualize the data to gain biological insight. Thus, dimensionality reduction becomes an imperative pre-processing step in analyzing and visualizing high-dimensional biological data. Two major approaches to dimensionality reduction in genomic analysis and biomarker identification studies are feature extraction, creating new features by combining existing ones based on a mapping technique, and feature selection, choosing an optimal subset of all features based on an objective function. In this dissertation, we show how our reduction schemes effectively reduce the dimensionality of DNA gene expression data to extract biologically interpretable and relevant features, which enhances the biomarker identification process. To construct biologically interpretable features and facilitate Muscular Dystrophy (MD) subtype classification, we extract molecular features from MD microarray data by constructing sub-networks using a novel integrative scheme which utilizes protein-protein interaction (PPI) networks, functional gene set information and mRNA profiling data. The workflow includes three major steps: first, by combining PPI network structure and gene-gene co-expression relationships into a new distance metric, we apply affinity propagation clustering (APC) to build gene sub-networks; second, we further incorporate functional gene set knowledge to complement the physical interaction information; finally, based on the constructed sub-network and gene set features, we apply a multi-class support vector machine (MSVM) for MD sub-type classification and highlight the biomarkers contributing to the sub-type prediction. The experimental results show that our scheme constructs sub-networks that are more relevant to MD than those constructed by the conventional approach. Furthermore, our integrative strategy substantially improved prediction accuracy, especially for the 'hard-to-classify' sub-types. Conventionally, pathway-based analysis assumes that genes in a pathway contribute equally to a biological function, thus assigning uniform weight to genes. However, this assumption has been proven incorrect, and applying uniform weight in pathway analysis may not be an adequate approach for tasks like molecular classification of diseases, as genes in a functional group may have different differential power. Hence, we propose to use different weights in pathway analysis, which resulted in the development of four weighting schemes. We applied them in two existing pathway analysis methods using both real and simulated gene expression data for pathways. Weighting changes pathway scoring and brings up new significant pathways, leading to the detection of disease-related genes that are missed under uniform weight. To help us understand our MD expression data better and derive scientific insight from it, we explored a suite of visualization tools. In particular, for selected top-performing MD sub-networks, we displayed the network view using Cytoscape; functional annotations using the IPA and DAVID functional analysis tools; expression patterns using heat-maps and parallel coordinate plots; and MD-associated pathways using KEGG pathway diagrams.
We also performed weighted MD pathway analysis and identified overlapping sub-networks across different weight schemes and different MD subtypes using Venn diagrams, which resulted in the identification of a new sub-network significantly associated with MD. All of this graphically displayed data and information helped us understand our MD data and the MD subtypes better, resulting in the identification of several potentially MD-associated biomarker pathways and genes. / Ph. D.
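A minimal sketch of the first workflow step described above (fusing co-expression similarity with PPI adjacency and clustering genes with affinity propagation); the fusion weight and the toy matrices are illustrative assumptions, not the dissertation's actual distance metric:

```python
# Illustrative sketch: combine gene-gene co-expression similarity with PPI
# adjacency into one similarity matrix, then cluster genes with affinity
# propagation to form candidate sub-networks.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 25))                     # 40 genes x 25 samples (toy mRNA profiles)
ppi = (rng.random((40, 40)) > 0.9).astype(float)     # toy PPI adjacency (1 = interaction)
ppi = np.maximum(ppi, ppi.T)

coexpr = np.abs(np.corrcoef(expr))                   # gene-gene co-expression similarity
alpha = 0.5                                          # assumed fusion weight
similarity = alpha * coexpr + (1 - alpha) * ppi

ap = AffinityPropagation(affinity="precomputed", damping=0.9, random_state=0)
labels = ap.fit(similarity).labels_                  # each cluster ~ one candidate sub-network
print(np.unique(labels, return_counts=True))
```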
167

On the Effectiveness of Dimensionality Reduction for Unsupervised Structural Health Monitoring Anomaly Detection

Soleimani-Babakamali, Mohammad Hesam 19 April 2022 (has links)
Dimensionality reduction (DR) techniques enhance data interpretability and reduce space complexity, though at the cost of information loss. Such methods have been prevalent in the Structural Health Monitoring (SHM) anomaly detection literature. While DR is favorable in supervised anomaly detection, where possible novelties are known a priori, its efficacy is less clear in unsupervised detection. In this work, we perform a detailed assessment of the DR performance trade-offs to determine whether the information loss imposed by DR can impact SHM performance for previously unseen novelties. As a basis for our analysis, we rely on an SHM anomaly detection method operating on the fast Fourier transform (FFT) of input signals. The FFT is regarded as a raw, frequency-domain feature that allows studying various DR techniques. We design extensive experiments comparing various DR techniques, including neural autoencoder models, to capture their impact on two SHM benchmark datasets. Results imply that the loss of information is the more detrimental factor, reducing novelty detection accuracy by up to 60% with autoencoder-based DR. Regularization can alleviate some of these challenges, though its effect is unpredictable. Dimensions carrying substantial vibrational information mostly survive DR; thus, the regularization results suggest that these dimensions are not reliable damage-sensitive features for unseen faults. Consequently, we argue that designing new SHM anomaly detection methods that can work with high-dimensional raw features is a necessary research direction, and we present open challenges and future directions. / M.S. / Structural health monitoring (SHM) aids the timely maintenance of infrastructure, saving human lives and natural resources. Infrastructure will undergo unseen damage in the future, so data-driven SHM techniques for handling unlabeled data (i.e., unsupervised learning) are suitable for real-world usage. Lacking labels and defined data classes, data instances are categorized through similarities, i.e., distances. Still, distance metrics in high-dimensional spaces can become meaningless. As a result, methods to reduce data dimensions are commonly applied, yet at the cost of information loss. Naturally, a trade-off exists between the loss of information and the increased interpretability of low-dimensional spaces induced by dimensionality reduction procedures. This study proposes an unsupervised SHM technique that works with both low- and high-dimensional data to assess that trade-off. Results show the negative impacts of dimensionality reduction to be more severe than its benefits. Developing unsupervised SHM methods that use raw data is thus encouraged for real-world applications.
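A minimal sketch of the trade-off examined above, comparing novelty detection on raw FFT features against PCA-reduced features; the one-class SVM detector and the synthetic data are assumed stand-ins, not the thesis's detection framework, and results on toy data need not reflect the reported findings:

```python
# Illustrative sketch: novelty detection accuracy on raw FFT features vs.
# PCA-reduced features, using a one-class SVM as a generic detector.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
healthy = np.abs(np.fft.rfft(rng.normal(size=(200, 512)), axis=1))              # baseline spectra
damaged = np.abs(np.fft.rfft(rng.normal(loc=0.3, size=(50, 512)), axis=1))      # shifted spectra

def novelty_accuracy(train, test_novel, reducer=None):
    if reducer is not None:
        train = reducer.fit_transform(train)
        test_novel = reducer.transform(test_novel)
    det = OneClassSVM(nu=0.05).fit(train)
    return np.mean(det.predict(test_novel) == -1)    # fraction of novelties flagged

print("no DR :", novelty_accuracy(healthy, damaged))
print("PCA-2 :", novelty_accuracy(healthy, damaged, PCA(n_components=2)))
```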
168

WHAT A WASTE : A dialogue between maker and material towards woven textile sculptures

Taken, Joanne Jasmijn January 2024 (has links)
As soon as a material is viewed as waste, the value of this resource diminishes, leading to a lack of concern for its preservation. By presenting alternative perspectives on such materials, their beauty and potential can be shown. In WHAT A WASTE, new light is shone on discarded fishing ropes from the Swedish seas. The project focuses on the unique qualities of these discarded materials, showcasing their worth despite being worn out. A material-driven approach is employed, wherein the behaviour of the selected materials guides the creation of shape and form. In a dialogue between maker and material, the different characters of the collection take form through a playful and intuitive process. Weaving is the main technique, combined with a construction principle in which the unique properties of the discarded ropes lead to discoveries in shape, colour and texture. The work contemplates utilising an unconventional material in the loom to highlight innovative visual and tactile characteristics. This project proposes new design methods for reusing discarded materials and explores how textile design can help people value waste again. Exploring ways of working with the ropes, such as detangling, in combination with other yarns can create new aesthetics for three-dimensional woven forms. The aim is to create woven sculptures of varying scales and shapes in a spatial context through a material-led process.
169

Neue Indexingverfahren für die Ähnlichkeitssuche in metrischen Räumen über großen Datenmengen / New indexing techniques for similarity search in metric spaces

Guhlemann, Steffen 06 July 2016 (has links)
Ein zunehmend wichtiges Thema in der Informatik ist der Umgang mit Ähnlichkeit in einer großen Anzahl unterschiedlicher Domänen. Derzeit existiert keine universell verwendbare Infrastruktur für die Ähnlichkeitssuche in allgemeinen metrischen Räumen. Ziel der Arbeit ist es, die Grundlage für eine derartige Infrastruktur zu legen, die in klassische Datenbankmanagementsysteme integriert werden könnte. Im Rahmen einer Analyse des State of the Art wird der M-Baum als am besten geeignete Basisstruktur identifiziert. Dieser wird anschließend zum EM-Baum erweitert, wobei strukturelle Kompatibilität mit dem M-Baum erhalten wird. Die Abfragealgorithmen werden im Hinblick auf eine Minimierung notwendiger Distanzberechnungen optimiert. Aufbauend auf einer mathematischen Analyse der Beziehung zwischen Baumstruktur und Abfrageaufwand werden Freiheitsgrade in Baumänderungsalgorithmen genutzt, um Bäume so zu konstruieren, dass Ähnlichkeitsanfragen mit einer minimalen Anzahl an Anfrageoperationen beantwortet werden können. / A topic of growing importance in computer science is the handling of similarity in multiple heterogeneous domains. Currently, there is no common infrastructure to support this for general metric spaces. The goal of this work is to lay the foundation for such an infrastructure, which could be integrated into classical database management systems. After an analysis of the state of the art, the M-Tree is identified as the most suitable base structure and extended in multiple ways into the EM-Tree while retaining structural compatibility. The query algorithms are optimized to reduce the number of necessary distance calculations. On the basis of a mathematical analysis of the relation between tree structure and query performance, degrees of freedom in the tree edit algorithms are used to build trees optimized for answering similarity queries with a minimal number of distance calculations.
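A minimal sketch of the pruning principle underlying metric index structures such as the M-Tree/EM-Tree: precomputed distances to a routing object (pivot) plus the triangle inequality let a range query discard objects without computing their distance to the query. This is a generic pivot-filtering illustration, not the EM-Tree's node layout or algorithms:

```python
# Illustrative sketch: triangle-inequality pruning for a metric range query.
# Since d(q, o) >= |d(q, p) - d(p, o)|, objects whose lower bound already
# exceeds the radius can be skipped without a distance computation.
import numpy as np

def range_query(query, objects, pivot, pivot_dists, radius, dist):
    d_qp = dist(query, pivot)                       # one distance to the pivot
    results, computed = [], 0
    for obj, d_po in zip(objects, pivot_dists):
        if abs(d_qp - d_po) > radius:               # lower bound exceeds radius
            continue                                # pruned, no distance computed
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed

euclid = lambda a, b: float(np.linalg.norm(a - b))
rng = np.random.default_rng(0)
objects = rng.normal(size=(1000, 16))
pivot = objects[0]
pivot_dists = [euclid(pivot, o) for o in objects]   # precomputed at build time
hits, computed = range_query(rng.normal(size=16), objects, pivot, pivot_dists, 3.0, euclid)
print(len(hits), "matches with", computed, "of", len(objects), "distance computations")
```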
170

Regularisation and variable selection using penalized likelihood / Régularisation et sélection de variables par le biais de la vraisemblance pénalisée

El anbari, Mohammed 14 December 2011 (has links)
Dans cette thèse nous nous intéressons aux problèmes de la sélection de variables en régression linéaire. Ces travaux sont en particulier motivés par les développements récents en génomique, protéomique, imagerie biomédicale, traitement de signal, traitement d’image, en marketing, etc… Nous regardons ce problème selon les deux points de vue fréquentielle et bayésienne. Dans un cadre fréquentiel, nous proposons des méthodes pour faire face au problème de la sélection de variables, dans des situations pour lesquelles le nombre de variables peut être beaucoup plus grand que la taille de l’échantillon, avec présence possible d’une structure supplémentaire entre les variables, telle qu’une forte corrélation ou un certain ordre entre les variables successives. Les performances théoriques sont explorées ; nous montrons que sous certaines conditions de régularité, les méthodes proposées possèdent de bonnes propriétés statistiques, telles que des inégalités de parcimonie, la consistance au niveau de la sélection de variables et la normalité asymptotique. Dans un cadre bayésien, nous proposons une approche globale de la sélection de variables en régression construite sur les lois à priori g de Zellner dans une approche similaire mais non identique à celle de Liang et al. (2008). Notre choix ne nécessite aucune calibration. Nous comparons les approches de régularisation bayésienne et fréquentielle dans un contexte peu informatif où le nombre de variables est presque égal à la taille de l’échantillon. / We are interested in variable selection in linear regression models. This research is motivated by recent developments in microarrays, proteomics, brain imaging, among others. We study this problem from both the frequentist and Bayesian viewpoints. In a frequentist framework, we propose methods to deal with the problem of variable selection when the number of variables is much larger than the sample size, possibly with additional structure in the predictor variables, such as high correlations or an order between successive variables. The performance of the proposed methods is theoretically investigated; we prove that, under regularity conditions, the proposed estimators possess good statistical properties, such as sparsity oracle inequalities, variable selection consistency and asymptotic normality. In a Bayesian framework, we propose a global noninformative approach for Bayesian variable selection. In this thesis, we pay special attention to two calibration-free hierarchical Zellner g-priors. The first is the Jeffreys prior, which is not location invariant. The second avoids this problem by considering only models with at least one variable. The practical performance of the proposed methods is illustrated through numerical experiments on simulated and real-world datasets, with a comparison between the Bayesian and frequentist approaches under a low-information constraint where the number of variables is almost equal to the number of observations.
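A minimal sketch of penalized-likelihood variable selection in the frequentist setting where the number of variables exceeds the sample size, using the Lasso as the canonical example; this is illustrative only and not the structured estimators or the Zellner g-prior approach developed in the thesis:

```python
# Illustrative sketch: Lasso variable selection with p >> n on simulated data.
# Data dimensions, signal strengths, and the cross-validated penalty are assumptions.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 80, 200                                   # more variables than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]           # only 5 truly active variables
y = X @ beta + rng.normal(scale=0.5, size=n)

model = LassoCV(cv=5).fit(X, y)                  # penalty level chosen by cross-validation
selected = np.flatnonzero(model.coef_ != 0)      # indices of selected variables
print("selected variables:", selected[:10], "...", len(selected), "total")
```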
