NEW BIOINFORMATIC TECHNIQUES FOR THE ANALYSIS OF LARGE DATASETS

A new era of chemical analysis is upon us. In the past, a small number of samples was selected from a population to serve as a statistical representation of the whole. More recently, advances in data collection rates, computer memory, and processing speed have made it possible to sample and analyze entire populations. The result is massive amounts of data that may contain a great deal of information yet convey relatively little of it directly. These large quantities of data have already begun to cause bottlenecks in areas such as genetics, drug development, and chemical imaging. The problem is straightforward: condense a large quantity of data down to its useful portions without ignoring or discarding anything important. Performing this condensation in the instrument's hardware, before the data ever reach a computer, is better still. The proposed research tests the hypothesis that clusters of data can be rapidly identified by linear fitting of quantile-quantile plots produced from each principal component of a principal component analysis. Integrated Sensing and Processing (ISP) is tested as a means of generating clusters of principal component scores from samples in a hyperspectral near-field scanning optical microscope. Distances from the centers of these multidimensional clusters to all other points in hyperspace can then be calculated. The result is a novel digital staining technique for identifying anomalies in hyperspectral microscopic and nanoscopic imaging of human atherosclerotic tissue. This general method can be applied to other analytical problems as well.
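As a rough illustration of the pipeline the abstract describes, the following is a minimal sketch in Python. It substitutes synthetic two-class spectra for real hyperspectral microscope data; the component count, the R-squared cutoff, the use of k-means to locate cluster centers, and the 3-sigma anomaly rule are all illustrative assumptions, not parameters taken from the dissertation.

```python
import numpy as np
from scipy import stats
from scipy.cluster.vq import kmeans2
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for hyperspectral pixel spectra: two spectral classes,
# 300 pixels each, 50 wavelength channels (all assumed for illustration).
class_a = rng.normal(loc=0.0, scale=1.0, size=(300, 50))
class_b = rng.normal(loc=3.0, scale=1.0, size=(300, 50))
spectra = np.vstack([class_a, class_b])

pca = PCA(n_components=5)
scores = pca.fit_transform(spectra)

# Step 1: for each principal component, fit a line to the normal Q-Q plot of
# its scores. A poor linear fit (low correlation) suggests the scores are
# multimodal, i.e. the samples cluster along that component.
R2_THRESHOLD = 0.99  # assumed cutoff; not taken from the dissertation
clustered = []
for i in range(scores.shape[1]):
    (osm, osr), (slope, intercept, r) = stats.probplot(scores[:, i], dist="norm")
    if r**2 < R2_THRESHOLD:
        clustered.append(i)

# Step 2: "digital staining" -- compute each point's distance in score space
# from its cluster center and flag unusually distant points as anomalies.
if clustered:
    sub = scores[:, clustered]
    centers, labels = kmeans2(sub, 2, minit="++", seed=0)
    dists = np.linalg.norm(sub - centers[labels], axis=1)
    stained = dists > dists.mean() + 3.0 * dists.std()  # assumed 3-sigma rule
    print("components with cluster structure:", clustered)
    print("points flagged as anomalous:", int(stained.sum()))
```

In the dissertation's setting, the same distance map would presumably be computed for every pixel of the hyperspectral image and rendered as a false-color stain, with large distances marking anomalous tissue regions.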

Identifier: oai:union.ndltd.org:uky.edu/oai:uknowledge.uky.edu:gradschool_diss-1547
Date: 01 January 2007
Creators: Harris, Justin Clay
Publisher: UKnowledge
Source Sets: University of Kentucky
Detected Language: English
Type: text
Format: application/pdf
Source: University of Kentucky Doctoral Dissertations
