
Mining Complex High-Order Datasets

Selection of an appropriate structure for storage and analysis of complex datasets is a vital but often overlooked decision in the design of data mining and machine learning experiments. Most current techniques impose a matrix structure on the dataset, with rows representing observations and columns representing features. This assumption is reasonable when features are scalar and do not exhibit co-dependence. The matrix data model becomes inappropriate, however, when dependencies between non-target features must be modeled in parallel, or when features naturally take the form of higher-order multilinear structures. Such datasets are particularly abundant in functional medical imaging modalities such as fMRI, where accurate integration of both spatial and temporal information is critical. Tensor analysis methodologies are necessary to take full advantage of the high-order structure of these datasets and are built on well-studied mathematical tools. Even so, they have only recently entered widespread use in the data mining community and remain relatively absent from the biomedical literature. Furthermore, naive tensor approaches suffer from fundamental efficiency problems that limit their practical use in large-scale high-order mining, and they do not capture the local neighborhoods necessary for accurate spatiotemporal analysis. To address these issues, a comprehensive framework based on wavelet analysis, tensor decomposition, and the WaveCluster algorithm is proposed for preprocessing, classification, clustering, compression, feature extraction, and latent concept discovery on large-scale high-order datasets, with particular emphasis on applications in computer-assisted diagnosis.
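WaveCluster, one component of the framework above, quantizes the feature space into a grid of cells and applies a wavelet transform to that grid. It then labels connected components of significant cells in the transformed space as clusters. The following is a minimal sketch of that original grid-count idea in two dimensions, using a single level of Haar low-pass filtering. It is not the enhanced real-valued WaveCluster developed in this thesis. The function name `wavecluster_2d`, the unit-square input range, 4-connectivity, and the `threshold` parameter are illustrative assumptions.

```python
from collections import deque


def wavecluster_2d(points, grid_size, threshold):
    """Hypothetical helper: cluster 2-D points in [0, 1) x [0, 1) WaveCluster-style.

    Returns (labels, n_clusters); label -1 marks points in insignificant cells.
    """
    if grid_size % 2:
        raise ValueError("grid_size must be even for one Haar level")
    # 1. Quantize the feature space and count points per cell.
    counts = [[0] * grid_size for _ in range(grid_size)]
    cells = []
    for x, y in points:
        i = min(int(x * grid_size), grid_size - 1)
        j = min(int(y * grid_size), grid_size - 1)
        counts[i][j] += 1
        cells.append((i, j))
    # 2. One Haar low-pass level: the 2x2 block mean equals the approximation
    #    coefficient up to a constant scale factor.
    half = grid_size // 2
    approx = [[(counts[2 * a][2 * b] + counts[2 * a + 1][2 * b]
                + counts[2 * a][2 * b + 1] + counts[2 * a + 1][2 * b + 1]) / 4.0
               for b in range(half)] for a in range(half)]
    # 3. Label 4-connected components of significant cells by breadth-first search.
    labels = [[-1] * half for _ in range(half)]
    n_clusters = 0
    for a in range(half):
        for b in range(half):
            if approx[a][b] < threshold or labels[a][b] != -1:
                continue
            labels[a][b] = n_clusters
            queue = deque([(a, b)])
            while queue:
                u, v = queue.popleft()
                for p, q in ((u + 1, v), (u - 1, v), (u, v + 1), (u, v - 1)):
                    if (0 <= p < half and 0 <= q < half and labels[p][q] == -1
                            and approx[p][q] >= threshold):
                        labels[p][q] = n_clusters
                        queue.append((p, q))
            n_clusters += 1
    # 4. Map each point back to the label of its coarse cell.
    return [labels[i // 2][j // 2] for i, j in cells], n_clusters
```

Raising `threshold` discards sparser cells as noise. Applying further Haar levels coarsens the grid, so nearby clusters can merge.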
Our framework is evaluated on a 9.3 GB fMRI motor task dataset of both high dimensionality and high order. It performs favorably against traditional voxelwise and spectral methods of analysis, discovers latent concepts suggestive of subject handedness, and reduces space and time complexities by up to two orders of magnitude. New wavelet and tensor tools are derived in the course of this work. These include a novel formulation of an r-dimensional wavelet transform in terms of elementary tensor operations and an enhanced WaveCluster algorithm capable of clustering real-valued as well as binary data. Sparsity-exploiting properties are demonstrated, and variations of the core algorithms for specialized tasks such as image segmentation are presented. / Computer and Information Science
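A separable wavelet transform can be written with elementary tensor operations in a standard way. Each one-dimensional wavelet step is a matrix, and applying it along mode n of a tensor is a mode-n product. An r-dimensional transform is therefore r successive mode-n products. The sketch that follows shows one level of an orthonormal Haar transform in this form. It is a generic illustration under stated assumptions, not necessarily the thesis's exact formulation. The helper names (`mode_n_product`, `haar_matrix`, `haar_wavelet_nd`) and the flat row-major list representation are hypothetical, chosen only to keep the example self-contained.

```python
import itertools
import math


def _strides(shape):
    """Row-major strides for a flat list holding a tensor of the given shape."""
    strides = [1] * len(shape)
    for k in range(len(shape) - 2, -1, -1):
        strides[k] = strides[k + 1] * shape[k + 1]
    return strides


def mode_n_product(data, shape, matrix, mode):
    """Return T x_mode M for a flat row-major tensor T and a matrix M.

    M must have shape[mode] columns; the result has len(M) entries along `mode`.
    """
    strides = _strides(shape)
    out_shape = list(shape)
    out_shape[mode] = len(matrix)
    out = []
    # itertools.product over the output index ranges yields row-major order.
    for idx in itertools.product(*(range(s) for s in out_shape)):
        row = matrix[idx[mode]]
        # Offset of the input fiber along `mode` that this output entry reads.
        base = sum(i * st for k, (i, st) in enumerate(zip(idx, strides)) if k != mode)
        out.append(sum(c * data[base + j * strides[mode]] for j, c in enumerate(row)))
    return out, tuple(out_shape)


def haar_matrix(n):
    """Single-level orthonormal Haar analysis matrix: n/2 averages, then n/2 details."""
    if n % 2:
        raise ValueError("Haar step needs an even length")
    s = 1.0 / math.sqrt(2.0)
    h = [[0.0] * n for _ in range(n)]
    for k in range(n // 2):
        h[k][2 * k] = h[k][2 * k + 1] = s      # low-pass (approximation) row
        h[n // 2 + k][2 * k] = s               # high-pass (detail) row
        h[n // 2 + k][2 * k + 1] = -s
    return h


def haar_wavelet_nd(data, shape):
    """One level of a separable r-dimensional Haar transform as r mode-n products."""
    for mode in range(len(shape)):
        data, shape = mode_n_product(data, shape, haar_matrix(shape[mode]), mode)
    return data, shape
```

For an fMRI series with shape (x, y, z, t), r is 4 and the same loop applies unchanged. Because the Haar matrix is orthonormal, the transform preserves the sum of squared values. A practical implementation would replace the flat list with an array library.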

Identifier: oai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/746
Date: January 2010
Creators: Barnathan, Michael
Contributors: Megalooikonomou, Vasilis; Obradovic, Zoran; Lakaemper, Rolf; Mohamed, Feroze B.; Faro, Scott H.
Publisher: Temple University. Libraries
Source Sets: Temple University
Language: English
Detected Language: English
Type: Thesis/Dissertation, Text
Format: 167 pages
Rights: IN COPYRIGHT - This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available., http://rightsstatements.org/vocab/InC/1.0/
Relation: http://dx.doi.org/10.34944/dspace/728, Theses and Dissertations
