Bridging Cognitive Gaps Between User and Model in Interactive Dimension Reduction

High-dimensional data is prevalent in all domains but is challenging to explore. Analysis and exploration of high-dimensional data are important for people in numerous fields. To help people explore and understand high-dimensional data, Andromeda, an interactive visual analytics tool, has been developed. However, our analysis uncovered several cognitive gaps relating to the Andromeda system: users do not realize the necessity of explicitly highlighting all the relevant data points; users are not clear about the dimensional information in the Andromeda visualization; and the Andromeda model cannot capture user intentions when constructing and deconstructing clusters. In this study, we designed and implemented solutions to address these gaps. Specifically, for the gap in highlighting all the relevant data points, we introduced foreground and background views and distance lines. Our user study with a group of undergraduate students revealed that the foreground and background views and distance lines could significantly alleviate the highlighting issue. For the gap in understanding visualization dimensions, we implemented a dimension-assist feature. The results of a second user study with students from various backgrounds suggested that the dimension-assist feature could make it easier for users to find the extremum in one dimension and to describe correlations among multiple dimensions; however, the dimension-assist feature had only a small impact on characterizing the data distribution and assisting users in understanding the meanings of the weighted multidimensional scaling (WMDS) plot axes. Regarding the gap in constructing and deconstructing clusters, we implemented a solution utilizing random sampling. A quantitative analysis of the random sampling strategy was performed, and the results demonstrated that the strategy improved Andromeda's capabilities in constructing and deconstructing clusters. We also applied random sampling to two-point manipulations, making the Andromeda system more flexible and adaptable to differing data exploration tasks. Limitations are discussed, and potential future research directions are identified.

/ Master of Science /

High-dimensional data refers to datasets with hundreds or thousands of features. The animal dataset used in this study is an example of a high-dimensional dataset, since animals can be characterized by many features, such as size, fur, and behavior. High-dimensional data is prevalent but difficult for people to analyze. For example, it is hard to determine the similarity among dozens of animals or to find relationships between different characteristics of animals. To help people without statistical knowledge analyze high-dimensional datasets, our group developed web-based visualization software called Andromeda, which displays data as points (such as animal data points) on a screen and allows people to interact with these points, expressing similarity by dragging them on the screen (e.g., dragging "Lion," "Wolf," and "Killer Whale" together because all three are hunters, forming a cluster of three animals). It thereby enables people to interactively analyze hidden patterns in high-dimensional data. However, we identified several cognitive gaps that have limited Andromeda's effectiveness in helping people understand high-dimensional data. In this work, we therefore aimed to improve the original Andromeda system to bridge these gaps, both by designing new visual features to help people better understand how Andromeda processes and interacts with high-dimensional data and by improving the underlying algorithm so that the Andromeda system can better understand people's intentions during the data exploration process. We extensively evaluated our designs through both qualitative and quantitative analysis (e.g., user studies with both undergraduate and graduate students and statistical testing) on our animal dataset, and the results confirmed that the improved Andromeda system significantly outperformed the original version in a series of high-dimensional data understanding tasks. Finally, limitations and potential future research directions are discussed.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/106390
Date: 05 May 2020
Creators: Wang, Ming
Contributors: Computer Science, North, Christopher L., Polys, Nicholas F., House, Leanna L.
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Thesis
Format: ETD, application/pdf, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/