1 |
Integrating Sequence and Structure for Annotating Proteins in the Twilight Zone: A Machine Learning Approach
Isye Arieshanti Unknown Date
Determining protein structure and function experimentally is both costly and time consuming. Transferring function-related annotations with homology-based methods is relatively straightforward for proteins that have sequence identity of more than 40%. However, there are many proteins in the "twilight zone", where sequence similarity with any other protein is very weak even though the protein is structurally similar to several others. Such cases require methods that are capable of exploiting both sequence and structural similarity. How such methods can and should be designed is the focus of this study. In this thesis, models that use both sequence and structure features are applied to two protein prediction problems that are particularly challenging when relying on sequence alone. Enzyme classification benefits from both kinds of features because, on the one hand, enzymes can have identical function with limited sequence similarity while, on the other hand, proteins with similar fold may have disparate enzyme class annotations. This thesis shows that the full integration of protein sequence and structure-related features (via the use of kernels) automatically places proteins with similar biological properties closer together, leading to superior classification accuracy using Support Vector Machines. Disulfide bonds link residues in a protein structure that may appear distant in sequence. Sequence similarity reflecting such structural properties is thus very hard to detect. It is sufficient for the structure to be similar for accurate prediction of disulfide bonds, but such information is very scarce, and predictors that rely on protein structure are not nearly as useful as those operating on sequence alone. This thesis proposes a novel approach based on Kernel Canonical Correlation Analysis that uses structural features during training only. It does so by finding sequence representations that correlate with structural features that are essential for a disulfide bond. The resulting representations enable high prediction accuracy for a range of disulfide-bond problems. The proposed model thus taps the advantage of structural features without requiring protein structure to be available in the prediction process. The merits of this approach should apply to a number of open protein structure prediction problems.
|
2 |
Inferring facial and body language
Shan, Caifeng January 2008
Machine analysis of human facial and body language is a challenging topic in computer vision, with an impact on important applications such as human-computer interaction and visual surveillance. In this thesis, we present research building towards computational frameworks capable of automatically understanding facial expression and behavioural body language. The thesis work commences with a thorough examination of issues surrounding facial representation based on Local Binary Patterns (LBP). Extensive experiments with different machine learning techniques demonstrate that LBP features are efficient and effective for person-independent facial expression recognition, even in low-resolution settings. We then present and evaluate a conditional mutual information based algorithm to efficiently learn the most discriminative LBP features, and show that the best recognition performance is obtained by using SVM classifiers with the selected LBP features. However, the recognition is performed on static images without exploiting the temporal behaviour of facial expression. Subsequently we present a method to capture and represent temporal dynamics of facial expression by discovering the underlying low-dimensional manifold. Locality Preserving Projections (LPP) is exploited to learn the expression manifold in the LBP based appearance feature space. By deriving a universal discriminant expression subspace using a supervised LPP, we can effectively align manifolds of different subjects on a generalised expression manifold. Different linear subspace methods are comprehensively evaluated in expression subspace learning. We formulate and evaluate a Bayesian framework for dynamic facial expression recognition employing the derived manifold representation. However, the manifold representation only addresses temporal correlations of the whole face image, and does not consider spatial-temporal correlations among different facial regions. We then employ Canonical Correlation Analysis (CCA) to capture correlations among face parts. To overcome the inherent limitations of classical CCA for image data, we introduce and formalise a novel Matrix-based CCA (MCCA), which can better measure correlations in 2D image data. We show this technique can provide superior performance in regression and recognition tasks, whilst requiring significantly fewer canonical factors. All the above work focuses on facial expressions. However, the face is usually perceived not as an isolated object but as an integrated part of the whole body, and the visual channel combining facial and bodily expressions is most informative. Finally we investigate two understudied problems in body language analysis: gait-based gender discrimination and affective body gesture recognition. To effectively combine face and body cues, CCA is adopted to establish the relationship between the two modalities and derive a semantic joint feature space for feature-level fusion. Experiments on large data sets demonstrate that our multimodal systems achieve superior performance in gender discrimination and affective state analysis.
|
3 |
High-dimensional statistical data integration
January 2019
Modern biomedical studies often collect multiple types of high-dimensional data on a common set of objects. A representative model for the integrative analysis of multiple data types is to decompose each data matrix into a low-rank common-source matrix generated by latent factors shared across all data types, a low-rank distinctive-source matrix corresponding to each data type, and an additive noise matrix. We propose a novel decomposition method, called the decomposition-based generalized canonical correlation analysis, which appropriately defines those matrices by imposing a desirable orthogonality constraint on distinctive latent factors that aims to sufficiently capture the common latent factors. To further delineate the common and distinctive patterns between two data types, we propose another new decomposition method, called the common and distinctive pattern analysis. This method takes into account the common and distinctive information between the coefficient matrices of the common latent factors. We develop consistent estimation approaches for both proposed decompositions under high-dimensional settings, and demonstrate their finite-sample performance via extensive simulations. We illustrate the superiority of the proposed methods over the state of the art with real-world data examples obtained from The Cancer Genome Atlas and the Human Connectome Project. / Zhe Qu
|
4 |
Relationships between Hospital-Centered and Multihospital-Centered Factors and Perceived Effectiveness: A Canonical Study of Nonprofit Hospitals
Yavas, Ugur, Romanova, Natalia 01 December 2003
This article reports on the results of a survey which investigated the nature of relationships between hospital and multihospital organization-centered factors and background characteristics, and multihospital organization effectiveness. Canonical correlation is employed in analyzing the data. Results and their implications are discussed.
|
5 |
Canonical Correlation and Clustering for High Dimensional Data
Ouyang, Qing January 2019
Multi-view datasets arise naturally in statistical genetics when the genetic
and trait profile of an individual is portrayed by two feature vectors.
A motivating problem concerning the Skin Intrinsic Fluorescence (SIF)
study on the Diabetes Control and Complications Trial (DCCT) subjects
is presented. A widely applied quantitative method to explore the correlation
structure between two domains of a multi-view dataset is the
Canonical Correlation Analysis (CCA), which seeks the canonical loading
vectors such that the transformed canonical covariates are maximally
correlated. In the high dimensional case, regularization of the dataset is
required before CCA can be applied. Furthermore, the nature of genetic
research suggests that sparse output is more desirable. In this thesis, two
regularized CCA (rCCA) methods and a sparse CCA (sCCA) method
are presented. When correlation sub-structure exists, a stand-alone CCA
method will not perform well. To tackle this limitation, a mixture of
local CCA models can be employed. In this thesis, I review a correlation
clustering algorithm proposed by Fern, Brodley and Friedl (2005),
which seeks to group subjects into clusters such that features are identically
correlated within each cluster. An evaluation study is performed
to assess the effectiveness of CCA and correlation clustering algorithms
using artificial multi-view datasets. Both sCCA and sCCA-based correlation
clustering exhibited superior performance compared to the rCCA and
rCCA-based correlation clustering. The sCCA and the sCCA-clustering
are applied to the multi-view dataset consisting of PrediXcan imputed gene
expression and SIF measurements of DCCT subjects. The stand-alone
sparse CCA method identified 193 of 11538 genes as being correlated
with SIF#7. Further investigation of these 193 genes with simple linear
regression and t-test revealed that only two genes, ENSG00000100281.9
and ENSG00000112787.8, were significantly associated with SIF#7. No
plausible clustering scheme was detected by the sCCA based correlation
clustering method. / Thesis / Master of Science (MSc)
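For reference, the CCA objective described in this abstract can be written as below. This is the standard textbook formulation, not notation taken from the thesis: the loading vectors a and b, the covariance blocks \Sigma_{xx}, \Sigma_{yy}, \Sigma_{xy}, the ridge parameters \lambda_x, \lambda_y and the sparsity bounds c_x, c_y are all symbols assumed here for illustration. The ridge and l1 forms shown are only common examples of regularized and sparse CCA and are not necessarily the specific rCCA and sCCA methods the thesis presents.

```latex
% Classical CCA: loading vectors a, b that maximize the correlation
% between the canonical covariates a^T x and b^T y.
\rho \;=\; \max_{a,\,b}\;
  \frac{a^{\top}\Sigma_{xy}\,b}
       {\sqrt{a^{\top}\Sigma_{xx}\,a}\;\sqrt{b^{\top}\Sigma_{yy}\,b}}

% One common ridge-type regularization for the high dimensional case,
% where the sample covariance blocks are singular:
\Sigma_{xx} \;\to\; \Sigma_{xx} + \lambda_x I,
\qquad
\Sigma_{yy} \;\to\; \Sigma_{yy} + \lambda_y I,
\qquad \lambda_x,\ \lambda_y > 0

% One common sparse variant adds l1 bounds on the loadings:
\lVert a \rVert_1 \le c_x,
\qquad
\lVert b \rVert_1 \le c_y
```

With more features than subjects, the sample estimates of \Sigma_{xx} and \Sigma_{yy} are not invertible, which is why some form of regularization is needed before the maximization is well posed.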
|
6 |
The Effects of Cognitive Styles on Summarization of Expository Text
Mast, Cynda Overton 08 1900
The study investigated the relationship between three cognitive styles and summarization abilities. Both summarization products and processes were examined. Summarizing products were scored and a canonical correlation analysis was performed to determine their relationship with three cognitive styles. Summarizing processes were examined by videotaping students as they provided think-aloud protocols. Their processes were recorded on composing style sheets and analyzed qualitatively.
Subjects were sixth-grade students in self-contained classes in a suburban school district. Summarizing products were collected over a two week period in the fall. Summarizing processes were collected over an eight week period in the spring of the same school year.
The results of the summarizing products analysis suggest that cognitive styles are related to summarization abilities. Two canonical correlations between the two variable sets (.33 and .29) were statistically significant at the .05 level. The results further suggest that students who are field independent, reflective, and flexible in their attentional style may be more adept at organizing their ideas and using written mechanics while summarizing. Students who are impulsive and constricted in attentional style may exhibit strength in expressing their ideas while summarizing.
Results of the summarizing processes analysis suggest that students of one cognitive style combination may exhibit different behaviors while summarizing than those of other cognitive style combinations. Students who are field independent, reflective, and flexible in their attentional style seem to display more mature, interactive behaviors while summarizing than their peers of other cognitive style combinations.
|
7 |
Computational Medical Image Analysis: With a Focus on Real-Time fMRI and Non-Parametric Statistics
Eklund, Anders January 2012
Functional magnetic resonance imaging (fMRI) is a prime example of multi-disciplinary research. Without the beautiful physics of MRI, there would not be any images to look at in the first place. To obtain images of good quality, it is necessary to fully understand the concepts of the frequency domain. The analysis of fMRI data requires understanding of signal processing, statistics and knowledge about the anatomy and function of the human brain. The resulting brain activity maps are used by physicians, neurologists, psychologists and behaviourists, in order to plan surgery and to increase their understanding of how the brain works.
This thesis presents methods for real-time fMRI and non-parametric fMRI analysis. Real-time fMRI places high demands on the signal processing, as all the calculations have to be made in real-time in complex situations. Real-time fMRI can, for example, be used for interactive brain mapping. Another possibility is to change the stimulus that is given to the subject, in real-time, such that the brain and the computer can work together to solve a given task, yielding a brain computer interface (BCI). Non-parametric fMRI analysis, for example, concerns the problem of calculating significance thresholds and p-values for test statistics without a parametric null distribution.
Two BCIs are presented in this thesis. In the first BCI, the subject was able to balance a virtual inverted pendulum by thinking of activating the left or right hand or resting. In the second BCI, the subject in the MR scanner was able to communicate with a person outside the MR scanner, through a virtual keyboard.
A graphics processing unit (GPU) implementation of a random permutation test for single subject fMRI analysis is also presented. The random permutation test is used to calculate significance thresholds and p-values for fMRI analysis by canonical correlation analysis (CCA), and to investigate the correctness of standard parametric approaches. The random permutation test was verified by using 10 000 noise datasets and 1484 resting state fMRI datasets. The random permutation test is also used for a non-local CCA approach to fMRI analysis.
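The idea of a random permutation test, as mentioned in this abstract, can be illustrated with a minimal CPU sketch. It is not the thesis's GPU implementation and does not use CCA: a plain Pearson correlation stands in as the test statistic, and the function names, the toy data and the default of 999 permutations are all assumptions made for this illustration. Plain shuffling also assumes the samples are exchangeable under the null hypothesis, which autocorrelated time series do not satisfy without further preprocessing.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Two-sided p-value for |r|, estimated from random permutations of y.

    The null distribution is built empirically by shuffling y, so no
    parametric form of the null distribution is assumed.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations,
    # which keeps the estimated p-value strictly above zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical toy data: y_related is an exact linear function of x.
x = [float(i) for i in range(12)]
y_related = [2.0 * v + 1.0 for v in x]
p_related = permutation_p_value(x, y_related)
```

Each permutation is independent of the others, which is the property that makes this kind of test a natural fit for parallel hardware such as a GPU.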
|
8 |
Customer Satisfaction Analysis
Funa, Laura January 2011
The objective of this master thesis is to identify “key-drivers” embedded in customer satisfaction data. The data was collected by a large transportation sector corporation over five years and in four different countries. The questionnaire comprised several sections of questions, ranging from demographic information to satisfaction attributes concerning the vehicle, the dealer and several problem areas. Various regression, correlation and cooperative game theory approaches were used to identify the key satisfiers and dissatisfiers. The theoretical and practical advantages of using the Shapley value, Canonical Correlation Analysis and Hierarchical Logistic Regression have been demonstrated and applied to market research.
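The Shapley value named in this abstract comes from cooperative game theory and splits a total "worth" fairly among contributors. A minimal sketch of the exact computation follows. The attribute names and the worth assigned to each coalition are hypothetical numbers invented for illustration (for example, they could stand for the explained variance of a model fitted on that subset of attributes); none of them come from the thesis, and the thesis may compute the value differently.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps each frozenset of players to the worth of that coalition.
    The cost grows factorially with the number of players, so this direct
    form is only practical for a handful of attributes.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value[with_p] - value[coalition]
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical satisfaction attributes and hypothetical coalition worths.
players = ("vehicle", "dealer", "service")
value = {
    frozenset(): 0.0,
    frozenset({"vehicle"}): 0.30,
    frozenset({"dealer"}): 0.20,
    frozenset({"service"}): 0.20,
    frozenset({"vehicle", "dealer"}): 0.40,
    frozenset({"vehicle", "service"}): 0.40,
    frozenset({"dealer", "service"}): 0.30,
    frozenset(players): 0.50,
}
shares = shapley_values(players, value)
```

The shares always add up to the worth of the full set of attributes, and attributes that contribute identically to every coalition receive identical shares, which is what makes the value attractive for ranking correlated drivers.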
|
9 |
Infinite dimensional discrimination and classification
Shin, Hyejin 17 September 2007
Modern data collection methods are now frequently returning observations that should
be viewed as the result of digitized recording or sampling from stochastic processes rather
than vectors of finite length. In spite of great demands, only a few classification methodologies
for such data have been suggested and supporting theory is quite limited. The focus of
this dissertation is on discrimination and classification in this infinite dimensional setting.
The methodology and theory we develop are based on the abstract canonical correlation
concept of Eubank and Hsing (2005), and motivated by the fact that Fisher's discriminant
analysis method is intimately tied to canonical correlation analysis. Specifically, we have
developed a theoretical framework for discrimination and classification of sample paths
from stochastic processes through use of the Loève-Parzen isomorphism that connects a
second order process to the reproducing kernel Hilbert space generated by its covariance
kernel. This approach provides a seamless transition between the finite and infinite dimensional
settings and lends itself well to computation via smoothing and regularization. In
addition, we have developed a new computational procedure and illustrated it with simulated
data and Canadian weather data.
|
10 |
Multi-Label Dimensionality Reduction
January 2011
Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information while considering the correlation among different labels. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The effect of regularization on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first is a direct least squares approach, which allows the use of different regularization penalties but is applicable only under a certain assumption; the second is a two-stage approach, which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed for data that arrive sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully to Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms.
/ Dissertation/Thesis / Ph.D. Computer Science 2011
|