1 |
On error bounds for linear feature extraction / Thangavelu, Madan Kumar. January 1900
Thesis (M.S.)--Oregon State University, 2010. / Printout. Includes bibliographical references (leaves 67-71). Also available on the World Wide Web.
|
2 |
Immunologically amplified knowledge and intentions dimensionality reduction in cooperative multi-agent systems / Coulter, Duncan Anthony 08 October 2014
Ph.D. (Computer Science) / The development of software systems is a relatively recent field of human endeavour. Even so, it has followed a steady progression of dominant paradigms which have incrementally improved the ease with which developers are able to express the logic and structure of their systems. The initially unstructured era of free-form spaghetti code gave way to structured programming in which the entry and exit points of functional units were well defined through the creation of abstractions such as procedures, sub-routines and functions. The problem of correctly associating data with the set of operations which are legal on this data was addressed through the concept of encapsulation with the onset of object-oriented programming. Object orientation also introduced a set of abstractions for safe code reuse through inheritance and dynamic polymorphism as well as composition/aggregation and delegation. The agent-oriented software development paradigm, when viewed as an extension of object orientation, adds the capacity of agent autonomy to an object, which allows it to select for itself which of its operations it will execute at any point in time. In addition, the separation between an agent and the environment within which it is embedded must be well defined. Agent autonomy allows for the modelling and development of loosely coupled systems with the capacity for complex emergent behaviour. The mapping of a given set of environmental percepts to an agent's operation selection defines its agent function and hence its emergent behaviour. Furthermore, agents may also be embedded into a shared environment together with other agents forming a multi-agent system. The emergent characteristics of such systems are defined not only through changes in environment state but also via agent to agent interactions. Multi-agent systems are categorised into cooperative or competitive based on whether all the agents within the system share a common goal. 
An argument is presented that even within cooperative multi-agent systems selfishness will emerge as a direct consequence of computational intractability. The core of the argument centres on the finite nature of the computational resources available to an agent, which must be divided between evaluating the usefulness of other agents' knowledge and intentions for improving the collective utility of the system and directly acting upon its own. As a direct result of the halting problem, it is impossible for an agent to ascertain in general whether another agent's plans are even feasible (i.e. will result in the system reaching a goal state). As a direct consequence of this limitation, agents will in general favour their own courses of action over those of others, and hence an emergent selfishness occurs even in ostensibly cooperative systems...
|
3 |
Study of Single and Ensemble Machine Learning Models on Credit Data to Detect Underlying Non-performing Loans / Li, Qiongzhu January 2016
In this paper, we compare the performance of two feature dimension reduction methods, the LASSO and PCA. Both a simulation study and an empirical study show that the LASSO is superior to PCA when selecting significant variables. We apply Logistic Regression (LR), Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT) and their corresponding ensemble machines constructed by bagging and adaptive boosting (AdaBoost) in our study. Three experiments are conducted to explore the impact of class-unbalanced data sets on all models. The empirical study indicates that when the percentage of performing loans exceeds 83.3%, the models should be applied with care. When the data set is class-balanced, ensemble machines do perform better than single machines. The weaker the single machine, the more obvious the improvement.
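One reason the LASSO lends itself to variable selection is that its penalty sets weak coefficients to exactly zero, whereas PCA returns linear combinations of all variables. The following is a minimal sketch of that behaviour, not the author's code: it solves the LASSO by cyclic coordinate descent (a standard solver, assumed here for illustration) on hypothetical toy data with three orthogonal predictors. The function names, the data, and the penalty value `lam=0.5` are all assumptions.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=50):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                # Prediction that leaves feature j out of the current fit.
                pred_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (z / n)
    return b


# Hypothetical toy data: three mutually orthogonal +/-1 predictors.
x0 = [1, -1, 1, -1, 1, -1, 1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
x2 = [1, 1, 1, 1, -1, -1, -1, -1]
X = [list(row) for row in zip(x0, x1, x2)]
# The response depends strongly on x0, weakly on x1, and not at all on x2.
y = [2.0 * a + 0.1 * b for a, b in zip(x0, x1)]

coef = lasso_coordinate_descent(X, y, lam=0.5)
print(coef)
```

With this penalty the coefficient of `x0` is shrunk to about 1.5, while the weak predictor `x1` and the irrelevant predictor `x2` are dropped entirely, which is the selection effect the abstract refers to.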
|
4 |
Dimension Reduction and LASSO using Pointwise and Group Norms / Jutras, Melanie A 11 December 2018
Principal Components Analysis (PCA) is a statistical procedure commonly used for analyzing high-dimensional data. It is often used for dimensionality reduction, which is accomplished by determining orthogonal components that contribute most to the underlying variance of the data. While PCA is widely used for identifying patterns and capturing variability of data in lower dimensions, it has some known limitations. In particular, PCA represents its results as linear combinations of data attributes. PCA is therefore often seen as difficult to interpret, and because of the underlying optimization problem being solved, it is not robust to outliers. In this thesis, we examine extensions to PCA that address these limitations. Specific techniques researched in this thesis include variations of Robust and Sparse PCA, as well as novel combinations of these two methods that result in a structured low-rank approximation that is robust to outliers. Our work is inspired by the well-known machine learning method of the Least Absolute Shrinkage and Selection Operator (LASSO) as well as pointwise and group matrix norms. Practical applications, including robust and non-linear methods for anomaly detection in Domain Name System network data and interpretable feature selection for a website classification problem, are discussed along with implementation details and techniques for analysis of regularization parameters.
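The difference between a pointwise and a group matrix norm is easiest to see through their proximal (shrinkage) operators. The sketch below is an illustration under stated assumptions, not the thesis implementation: it uses the entrywise L1 norm as the pointwise norm and the row-wise L2,1 norm as the group norm, treating each matrix row as one group. The function names, the matrix `M`, and the threshold are hypothetical.

```python
import math


def prox_pointwise_l1(M, lam):
    """Soft-threshold every entry independently (proximal operator of lam*||M||_1)."""
    return [[math.copysign(max(abs(v) - lam, 0.0), v) for v in row] for row in M]


def prox_group_l21(M, lam):
    """Shrink each row by its Euclidean norm (proximal operator of lam*||M||_{2,1}).

    A row whose norm is at most lam is zeroed as a whole; otherwise every
    entry of the row is scaled by the same factor.
    """
    out = []
    for row in M:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


# Hypothetical matrix: one large row, one small row, one mixed row.
M = [[3.0, 4.0],
     [0.3, 0.4],
     [2.0, 0.1]]

pointwise = prox_pointwise_l1(M, lam=1.0)
grouped = prox_group_l21(M, lam=1.0)
print(pointwise)
print(grouped)
```

In the mixed third row, the pointwise operator removes only the small entry, while the group operator keeps both entries and scales them together. The small second row is removed entirely by both. This row-level behaviour is what makes group norms attractive when whole observations or whole features should be kept or discarded as a unit.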
|
5 |
Dimension Reduction and Clustering of High Dimensional Data using a Mixture of Generalized Hyperbolic Distributions / Pathmanathan, Thinesh January 2018
Model-based clustering is a probabilistic approach that views each cluster as a component in an appropriate mixture model. The Gaussian mixture model is one of the most widely used model-based methods. However, this model tends to perform poorly when clustering high-dimensional data due to the over-parametrized solutions that arise in high-dimensional spaces. This work instead considers the approach of combining dimension reduction techniques with clustering via a mixture of generalized hyperbolic distributions. The dimension reduction techniques principal component analysis and factor analysis, along with their extensions, were reviewed. These dimension reduction techniques were then individually paired with the mixture of generalized hyperbolic distributions in order to demonstrate the clustering performance achieved under each method using both simulated and real data sets. For a majority of the data sets, the clustering method utilizing principal component analysis exhibited better classification results than the clustering method based on extending the factor analysis model. / Thesis / Master of Science (MSc)
|
6 |
ImageSI: Interactive Deep Learning for Image Semantic Interaction / Lin, Jiayue 04 June 2024
Interactive deep learning frameworks are crucial for effectively exploring and analyzing complex image datasets in visual analytics. However, existing approaches often face challenges related to inference accuracy and adaptability. To address these issues, we propose ImageSI, a framework integrating deep learning models with semantic interaction techniques for interactive image data analysis. Unlike traditional methods, ImageSI directly incorporates user feedback into the image model, updating underlying embeddings through customized loss functions, thereby enhancing the performance of dimension reduction tasks. We introduce three variations of ImageSI: ImageSI$_{\text{MDS}^{-1}}$, which prioritizes explicit pairwise relationships from user interaction, and ImageSI$_{\text{DRTriplet}}$ and ImageSI$_{\text{PHTriplet}}$, which emphasize clustering by defining groups of images based on user input. Through usage scenarios and quantitative analyses centered on algorithms, we demonstrate the superior performance of ImageSI$_{\text{DRTriplet}}$ and ImageSI$_{\text{MDS}^{-1}}$ in terms of inference accuracy and interaction efficiency. Moreover, ImageSI$_{\text{PHTriplet}}$ shows competitive results. The baseline model, WMDS$^{-1}$, generally exhibits lower performance metrics. / Master of Science
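The abstract does not spell out the customized loss functions, so the following is only a generic sketch of the kind of loss the "Triplet" variant names suggest: a standard triplet margin loss over embedding vectors. It assumes that images a user groups together act as anchor/positive pairs and images from another group act as negatives. The function names, the margin, and the example coordinates are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss.

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = (0.0, 0.0)

# Before any update: the same-group image is far away, the other-group image is close.
loss_before = triplet_loss(anchor, positive=(3.0, 4.0), negative=(0.0, 1.0))

# After a hypothetical embedding update the two have swapped places.
loss_after = triplet_loss(anchor, positive=(0.0, 1.0), negative=(3.0, 4.0))

print(loss_before, loss_after)
```

Minimising such a loss over the embedding parameters pulls images from the same user-defined group together and pushes other groups apart, which matches the clustering emphasis described for the triplet variants.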
|
7 |
Explainable Interactive Projections for Image Data / Han, Huimin 12 January 2023
Making sense of large collections of images is difficult. Dimension reduction (DR) techniques assist by organizing images in a 2D space based on similarities, but provide little support for explaining why images were placed together or apart in the 2D space. Additionally, they do not provide support for modifying and updating the 2D space to explore new relationships and organizations of images. To address these problems, we present an interactive DR method for images that uses visual features extracted by a deep neural network to project the images into 2D space and provides visual explanations of image features that contributed to the 2D location. In addition, it allows people to directly manipulate the 2D projection space to define alternative relationships and explore subsequent projections of the images. With an iterative cycle of semantic interaction and explainable-AI feedback, people can explore complex visual relationships in image data. Our approach to human-AI interaction integrates visual knowledge from both human mental models and pre-trained deep neural models to explore image data. Two usage scenarios are provided to demonstrate that our method is able to capture human feedback and incorporate it into the model. Our visual explanations help bridge the gap between the feature space and the original images to illustrate the knowledge learned by the model, creating a synergy between human and machine that facilitates a more complete analysis experience. / Master of Science / High-dimensional data is everywhere: spreadsheets with many columns, text documents, images, and so on. Exploring and visualizing high-dimensional data can be challenging. Dimension reduction (DR) techniques can help. High-dimensional data can be projected into 3D or 2D space and visualized as a scatter plot. Additionally, DR tools can be interactive to help users better explore data and understand the underlying algorithms. Designing such an interactive DR tool is challenging for images.
To address this problem, this thesis presents a tool that visualizes images in a 2D plot in which data points considered similar are projected close to each other and dissimilar ones far apart. Users can manipulate images directly on this scatterplot-like visualization, based on their own knowledge, to update the display, and saliency maps are provided to reflect the model's re-projection reasoning.
|
8 |
Dimension Reduction and Clustering for Interactive Visual Analytics / Wenskovitch Jr, John Edward 06 September 2019
When exploring large, high-dimensional datasets, analysts often utilize two techniques for reducing the data to make exploration more tractable. The first technique, dimension reduction, reduces the high-dimensional dataset into a low-dimensional space while preserving high-dimensional structures. The second, clustering, groups similar observations while simultaneously separating dissimilar observations. Existing work presents a number of systems and approaches that utilize these techniques; however, these techniques can cooperate or conflict in unexpected ways.
The core contribution of this work is the systematic examination of the design space at the intersection of dimension reduction and clustering when building intelligent, interactive tools in visual analytics. I survey existing techniques for dimension reduction and clustering algorithms in visual analytics tools, and I explore the design space for creating projections and interactions that include dimension reduction and clustering algorithms in the same visual interface. Further, I implement and evaluate three prototype tools that implement specific points within this design space. Finally, I run a cognitive study to understand how analysts perform dimension reduction (spatialization) and clustering (grouping) operations. Contributions of this work include surveys of existing techniques, three interactive tools and usage cases demonstrating their utility, design decisions for implementing future tools, and a presentation of complex human organizational behaviors. / Doctor of Philosophy / When an analyst is exploring a dataset, they seek to gain insight from the data. With data sets growing larger, analysts require techniques to help them reduce the size of the data while still maintaining its meaning. Two commonly-utilized techniques are dimension reduction and clustering. Dimension reduction seeks to eliminate unnecessary features from the data, reducing the number of columns to a smaller number. Clustering seeks to group similar objects together, reducing the number of rows to a smaller number. The contribution of this work is to explore how dimension reduction and clustering are currently being used in interactive visual analytics systems, as well as to explore how they could be used to address challenges faced by analysts in the future. To do so, I survey existing techniques and explore the design space for creating visualizations that incorporate both types of computations. 
I look at methods by which an analyst could interact with those projections in order to communicate their interests to the system, thereby producing visualizations that better match the needs of the analyst. I develop and evaluate three tools that incorporate both dimension reduction and clustering in separate computational pipelines. Finally, I conduct a cognitive study to better understand how users think about these operations, in order to create guidelines for better systems in the future.
|
9 |
On Sufficient Dimension Reduction via Asymmetric Least Squares / Soale, Abdul-Nasah, 0000-0003-2093-7645 January 2021
Accompanying the advances in computer technology is an increased collection of high-dimensional data in many scientific and social studies. Sufficient dimension reduction (SDR) is a statistical method that enables us to reduce the dimension of predictors without loss of regression information. In this dissertation, we introduce principal asymmetric least squares (PALS) as a unified framework for linear and nonlinear sufficient dimension reduction. Classical methods such as sliced inverse regression (Li, 1991) and principal support vector machines (Li, Artemiou and Li, 2011) often do not perform well in the presence of heteroscedastic error, while our proposal addresses this limitation by synthesizing different expectile levels. Through extensive numerical studies, we demonstrate the superior performance of PALS in terms of both computation time and estimation accuracy. For the asymptotic analysis of PALS for linear sufficient dimension reduction, we develop new tools to compute the derivative of an expectation of a non-Lipschitz function.
PALS is not designed to handle a symmetric link function between the response and the predictors. As a remedy, we develop expectile-assisted inverse regression estimation (EA-IRE) as a unified framework for moment-based inverse regression. We propose to first estimate the expectiles through kernel expectile regression, and then carry out dimension reduction based on random projections of the regression expectiles. Several popular inverse regression methods in the literature, including sliced inverse regression, sliced average variance estimation, and directional regression, are extended under this general framework. The proposed expectile-assisted methods outperform existing moment-based dimension reduction methods in both numerical studies and an analysis of the Big Mac data. / Statistics
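Both PALS and EA-IRE build on expectiles, which are the minimisers of an asymmetric least squares loss. The sketch below illustrates only that building block for a plain sample (no predictors), not PALS or EA-IRE themselves: squared residuals above the current estimate are weighted by `tau` and those below by `1 - tau`, and the estimate is updated as a weighted mean until it stops changing. The function name, the iteration scheme, and the sample data are assumptions made for illustration.

```python
def expectile(values, tau, n_iter=100, tol=1e-12):
    """Sample expectile at level tau (0 < tau < 1) via iterated weighted means.

    Minimises sum_i w_i * (v_i - m)^2 with w_i = tau if v_i > m else 1 - tau.
    """
    m = sum(values) / len(values)  # start from the mean (the tau = 0.5 expectile)
    for _ in range(n_iter):
        weights = [tau if v > m else 1.0 - tau for v in values]
        new_m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


# Hypothetical right-skewed sample.
data = [1.0, 2.0, 3.0, 4.0, 10.0]

low = expectile(data, tau=0.1)
mid = expectile(data, tau=0.5)
high = expectile(data, tau=0.9)
print(low, mid, high)
```

At `tau = 0.5` the weights are symmetric and the expectile is the ordinary mean; moving `tau` toward 0 or 1 shifts the estimate toward the lower or upper part of the distribution. Looking at several levels at once is what lets expectile-based methods pick up heteroscedastic structure that a single mean regression would miss.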
|
10 |
Sufficient Dimension Reduction in Complex Datasets / Yang, Chaozheng January 2016
This dissertation focuses on two problems in dimension reduction. The first is using a permutation approach to test predictor contribution. The permutation approach applies to marginal coordinate tests based on dimension reduction methods such as SIR, SAVE and DR, and it no longer requires calculation of the method-specific weights to determine the asymptotic null distribution. The second is combining a clustering method with robust regression (least absolute deviation) to estimate the dimension reduction subspace. Compared with ordinary least squares, the proposed method is more robust to outliers; it also replaces the global linearity assumption with the more flexible local linearity assumption through k-means clustering. / Statistics
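The appeal of a permutation approach is that the null distribution is obtained by reshuffling the data rather than from an asymptotic formula. The sketch below shows that general idea only. It is not the dissertation's marginal coordinate test: as a stand-in statistic it uses the absolute sample covariance between one predictor and the response instead of a SIR-, SAVE- or DR-based statistic, and it enumerates all permutations, which is feasible only for very small samples. All names and data are hypothetical.

```python
from itertools import permutations


def centred(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def exact_permutation_p_value(x, y):
    """Exact permutation p-value for 'x contributes nothing to y'.

    Stand-in test statistic: |sum of products of centred x and centred y|.
    The p-value is the share of permutations of x whose statistic is at
    least as large as the observed one.
    """
    xc, yc = centred(x), centred(y)
    observed = abs(sum(a * b for a, b in zip(xc, yc)))
    n_total = 0
    n_extreme = 0
    for shuffled in permutations(xc):
        n_total += 1
        stat = abs(sum(a * b for a, b in zip(shuffled, yc)))
        if stat >= observed:
            n_extreme += 1
    return n_extreme / n_total


# Hypothetical data in which the response is an exact multiple of the predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

p_value = exact_permutation_p_value(x, y)
print(p_value)
```

Only the original ordering and its exact reverse reach the observed statistic here, so the p-value is 2 out of 120 permutations. For realistic sample sizes one would draw a random subset of permutations instead of enumerating them all.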
|