Global ETD Search

11	Integration of computational methods and visual analytics for large-scale high-dimensional data Choo, Jae gul 20 September 2013 (has links) With the increasing amount of collected data, large-scale high-dimensional data analysis is becoming essential in many areas. These data can be analyzed either by using fully computational methods or by leveraging human capabilities via interactive visualization. However, each method has its drawbacks. While a fully computational method can deal with large amounts of data, it lacks depth in its understanding of the data, which is critical to the analysis. With the interactive visualization method, the user can give a deeper insight on the data but suffers when large amounts of data need to be analyzed. Even with an apparent need for these two approaches to be integrated, little progress has been made. As ways to tackle this problem, computational methods have to be re-designed both theoretically and algorithmically, and the visual analytics system has to expose these computational methods to users so that they can choose the proper algorithms and settings. To achieve an appropriate integration between computational methods and visual analytics, the thesis focuses on essential computational methods for visualization, such as dimension reduction and clustering, and it presents fundamental development of computational methods as well as visual analytic systems involving newly developed methods. The contributions of the thesis include (1) the two-stage dimension reduction framework that better handles significant information loss in visualization of high-dimensional data, (2) efficient parametric updating of computational methods for fast and smooth user interactions, and (3) an iteration-wise integration framework of computational methods in real-time visual analytics. The latter parts of the thesis focus on the development of visual analytics systems involving the presented computational methods, such as (1) Testbed: an interactive visual testbed system for various dimension reduction and clustering methods, (2) iVisClassifier: an interactive visual classification system using supervised dimension reduction, and (3) VisIRR: an interactive visual information retrieval and recommender system for large-scale document data. Dimension reduction Clustering High-dimensional data Visualization Visual analytics Dimensional analysis Data structures (Computer science) Information visualization Visual analytics Mathematical statistics Data processing
12	A nonparametric Bayesian perspective for machine learning in partially-observed settings Akova, Ferit 31 July 2014 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Robustness and generalizability of supervised learning algorithms depend on the quality of the labeled data set in representing the real-life problem. In many real-world domains, however, we may not have full knowledge of the underlying data-generating mechanism, which may even have an evolving nature introducing new classes continually. This constitutes a partially-observed setting, where it would be impractical to obtain a labeled data set exhaustively defined by a fixed set of classes. Traditional supervised learning algorithms, assuming an exhaustive training library, would misclassify a future sample of an unobserved class with probability one, leading to an ill-defined classification problem. Our goal is to address situations where such assumption is violated by a non-exhaustive training library, which is a very realistic yet an overlooked issue in supervised learning. In this dissertation we pursue a new direction for supervised learning by defining self-adjusting models to relax the fixed model assumption imposed on classes and their distributions. We let the model adapt itself to the prospective data by dynamically adding new classes/components as data demand, which in turn gradually make the model more representative of the entire population. In this framework, we first employ suitably chosen nonparametric priors to model class distributions for observed as well as unobserved classes and then, utilize new inference methods to classify samples from observed classes and discover/model novel classes for those from unobserved classes. This thesis presents the initiating steps of an ongoing effort to address one of the most overlooked bottlenecks in supervised learning and indicates the potential for taking new perspectives in some of the most heavily studied areas of machine learning: novelty detection, online class discovery and semi-supervised learning. Statistical decision Nonparametric statistics -- Research Mathematical statistics Stochastic processes Boosting (Algorithms) Statistics -- Data processing Machine learning Computational linguistics Data mining Computational intelligence

Search results

Integration of computational methods and visual analytics for large-scale high-dimensional data

A nonparametric Bayesian perspective for machine learning in partially-observed settings