Global ETD Search

1. Evaluating PrediXcan’s Ability to Predict Differential Expression Between Alcoholics and Non-Alcoholics
Drake, John E, Jr; 01 January 2019
PrediXcan is a recent software tool for imputing gene expression from genotype data alone. We used overlapping transcriptome datasets from postmortem brain tissue of donors with alcohol use disorder and neurotypical controls, generated on two different platforms (Arraystar and Affymetrix), plus an additional unrelated transcriptome dataset from lung tissue. With these data, we sought to evaluate PrediXcan’s ability to impute gene expression and to identify differentially expressed genes. On the Arraystar platform, only 1.3% of genes matched between the measured and imputed expression had a Pearson correlation ≥ 0.5. Our attempt to replicate this finding using expression data from the Affymetrix platform led to a similarly poor outcome (2.7%). Our third attempt, using transcriptome data from lung tissue, produced similar results (1.1%). However, performance improved markedly after filtering out genes with a low predicted R², a model metric provided by the PrediXcan authors. For example, filtering out genes with a predicted R² below 0.6 left 16 genes, with a Pearson correlation of 0.365 between the measured and imputed expression. We were unable to reproduce similar performance gains by filtering the Arraystar or Affymetrix alcohol use disorder datasets. PrediXcan can impute only a narrow portion of the transcriptome, and filtering reduces that portion significantly further. We therefore believe caution is warranted when interpreting results derived from PrediXcan.
Keywords: PrediXcan, bioinformatics, limma, alcoholism, imputation, Bioinformatics
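The evaluation above compares measured and imputed expression gene by gene, counts the share of genes reaching r ≥ 0.5, and then restricts attention to genes whose model R² passes a cutoff. The following is a minimal sketch of that bookkeeping, not PrediXcan's own code. It assumes both expression matrices are already aligned as samples × genes NumPy arrays and that a per-gene model R² is available as a vector. The function names are hypothetical. The abstract does not say how the single post-filter correlation (0.365) was summarized, so the demo reports the mean per-gene r of the retained genes as only one possible choice.

```python
import numpy as np


def per_gene_pearson(measured: np.ndarray, imputed: np.ndarray) -> np.ndarray:
    """Pearson r for each gene (column) between two samples x genes matrices."""
    if measured.shape != imputed.shape:
        raise ValueError("measured and imputed must have the same shape")
    # Center each gene's values across samples.
    m = measured - measured.mean(axis=0)
    p = imputed - imputed.mean(axis=0)
    denom = np.sqrt((m ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (m * p).sum(axis=0) / denom
    return r  # NaN where a gene has zero variance in either matrix


def fraction_well_predicted(r: np.ndarray, threshold: float = 0.5) -> float:
    """Share of genes (NaN ignored) whose correlation is >= threshold."""
    valid = r[~np.isnan(r)]
    return float((valid >= threshold).mean()) if valid.size else 0.0


def filter_by_model_r2(r: np.ndarray, model_r2: np.ndarray, min_r2: float = 0.6):
    """Keep only genes whose prediction-model R^2 is >= min_r2 (hypothetical helper)."""
    keep = model_r2 >= min_r2
    return r[keep], keep


if __name__ == "__main__":
    # Toy data only: the first 20 genes are "imputed" well, the rest are noise.
    rng = np.random.default_rng(0)
    n_samples, n_genes = 50, 200
    measured = rng.normal(size=(n_samples, n_genes))
    imputed = rng.normal(size=(n_samples, n_genes))
    imputed[:, :20] = measured[:, :20] + 0.3 * rng.normal(size=(n_samples, 20))
    model_r2 = np.where(np.arange(n_genes) < 20, 0.8, 0.1)

    r = per_gene_pearson(measured, imputed)
    print("fraction with r >= 0.5:", fraction_well_predicted(r))
    kept, mask = filter_by_model_r2(r, model_r2, 0.6)
    print("genes kept:", int(mask.sum()))  # 20 genes kept
    print("mean r of kept genes:", float(np.nanmean(kept)))
```

Computing r column-wise on centered matrices avoids a Python loop over thousands of genes, and returning NaN for constant genes keeps them from silently counting as failures or successes.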
2. Canonical Correlation and Clustering for High Dimensional Data
Ouyang, Qing; January 2019
Multi-view datasets arise naturally in statistical genetics when an individual's genetic and trait profiles are described by two feature vectors. A motivating problem from the Skin Intrinsic Fluorescence (SIF) study of Diabetes Control and Complications Trial (DCCT) subjects is presented. A widely applied quantitative method for exploring the correlation structure between the two domains of a multi-view dataset is Canonical Correlation Analysis (CCA). CCA seeks canonical loading vectors such that the transformed canonical covariates are maximally correlated. In the high-dimensional case, the dataset must be regularized before CCA can be applied. Furthermore, the nature of genetic research makes sparse output more desirable. In this thesis, two regularized CCA (rCCA) methods and a sparse CCA (sCCA) method are presented. When correlation sub-structure exists, a stand-alone CCA model will not perform well. To tackle this limitation, a mixture of local CCA models can be employed. In this thesis, I review a correlation clustering algorithm proposed by Fern, Brodley and Friedl (2005), which seeks to group subjects into clusters such that features are identically correlated within each cluster. An evaluation study using artificial multi-view datasets assesses the effectiveness of the CCA and correlation clustering algorithms. Both sCCA and sCCA-based correlation clustering performed better than rCCA and rCCA-based correlation clustering. sCCA and sCCA-based clustering are then applied to a multi-view dataset consisting of PrediXcan-imputed gene expression and SIF measurements of DCCT subjects. The stand-alone sparse CCA method identified 193 of 11538 genes as correlated with SIF#7. Further investigation of these 193 genes with simple linear regression and t-tests revealed that only two genes, ENSG00000100281.9 and ENSG00000112787.8, were significantly associated with SIF#7. The sCCA-based correlation clustering method detected no plausible clustering scheme.
Thesis, Master of Science (MSc)
Keywords: Machine Learning, Correlation Clustering, Sparse Canonical Correlation Analysis, Skin Intrinsic Fluorescence, Multi-view dataset, Lasso, Dimensionality reduction, PrediXcan, High dimensional data
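The abstract does not specify which sCCA algorithm the thesis uses. Purely as an illustration of how sparsity enters CCA, here is a sketch of a swapped-in, simplified technique: an L1-penalized (soft-thresholded) alternating power iteration on the cross-correlation matrix of standardized data. It treats the within-view covariance matrices as identity, an assumption in the spirit of penalized matrix decomposition approaches, and uses fixed soft-threshold levels rather than an explicit L1 bound. The function names and the parameters `lam_u` and `lam_v` are hypothetical.

```python
import numpy as np


def standardize(a: np.ndarray) -> np.ndarray:
    """Center each column and scale it to unit (population) standard deviation."""
    a = a - a.mean(axis=0)
    sd = a.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return a / sd


def soft_threshold(z: np.ndarray, lam: float) -> np.ndarray:
    """Shrink entries toward zero by lam; entries with |z| <= lam become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def unit(z: np.ndarray) -> np.ndarray:
    """Scale to unit L2 norm, leaving an all-zero vector unchanged."""
    n = np.linalg.norm(z)
    return z / n if n > 0 else z


def sparse_cca(X, Y, lam_u=0.0, lam_v=0.0, n_iter=200, tol=1e-8):
    """Leading sparse canonical pair (u, v) and the correlation of X u with Y v.

    Assumption: within-view covariances are treated as identity, so the
    iteration works on K = Xs^T Ys / n, the cross-correlation matrix.
    """
    Xs, Ys = standardize(X), standardize(Y)
    K = Xs.T @ Ys / Xs.shape[0]
    # Start from the leading right singular vector of K (unpenalized solution).
    _, _, vt = np.linalg.svd(K, full_matrices=False)
    v = vt[0]
    u = np.zeros(K.shape[0])
    for _ in range(n_iter):
        u_new = unit(soft_threshold(K @ v, lam_u))
        v_new = unit(soft_threshold(K.T @ u_new, lam_v))
        done = (np.linalg.norm(u_new - u) < tol
                and np.linalg.norm(v_new - v) < tol)
        u, v = u_new, v_new
        if done or not u.any() or not v.any():
            break
    if not u.any() or not v.any():
        return u, v, 0.0  # penalty too strong: every loading was zeroed out
    rho = np.corrcoef(Xs @ u, Ys @ v)[0, 1]
    return u, v, float(rho)


if __name__ == "__main__":
    # Toy data: one shared latent signal drives X[:, 0] and Y[:, 0] only.
    rng = np.random.default_rng(1)
    n = 500
    z = rng.normal(size=n)
    X = rng.normal(size=(n, 20))
    Y = rng.normal(size=(n, 15))
    X[:, 0] = z + 0.5 * rng.normal(size=n)
    Y[:, 0] = z + 0.5 * rng.normal(size=n)
    u, v, rho = sparse_cca(X, Y, lam_u=0.3, lam_v=0.3)
    print("nonzero loadings:", np.flatnonzero(u), np.flatnonzero(v), "rho:", rho)
```

With `lam_u = lam_v = 0` the loop reduces to a power iteration for the leading singular pair of K, giving dense loadings. Raising the penalties zeros out more entries, so the nonzero entries of `u` and `v` mark the features selected for the first canonical pair.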
