About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Poisson multiscale methods for high-throughput sequencing data

Xing, Zhengrong 21 December 2016 (has links)
In this dissertation, we focus on the problem of analyzing data from high-throughput sequencing experiments. With the emergence of more capable hardware and more efficient software, these sequencing data provide information at an unprecedented resolution. However, statistical methods developed for such data rarely tackle the data at such high resolutions, and often make approximations that only hold under certain conditions.

We propose a model-based approach to dealing with such data, starting from a single sample. By taking into account the inherent structure present in such data, our model can accurately capture important genomic regions. We also present the model in a way that makes it easily extensible to more complicated and biologically interesting scenarios.

Building upon the single-sample model, we then turn to the statistical question of detecting differences between multiple samples. Such questions often arise in the context of expression data, where much emphasis has been placed on the problem of detecting differential expression between two groups. By extending the single-sample framework to incorporate additional group covariates, our model provides a systematic approach to estimating and testing for such differences. We then apply our method to several empirical datasets, and discuss the potential for further applications to other biological tasks.

We also seek to address a different statistical question, where the goal is to perform exploratory analysis to uncover hidden structure within the data. We incorporate the single-sample framework into a commonly used clustering scheme, and show that the enhanced clustering approach is superior to the original in many ways. We then apply our clustering method to a few empirical datasets and discuss our findings.

Finally, we apply the shrinkage procedure used within the single-sample model to tackle a completely different statistical issue: nonparametric regression with heteroskedastic Gaussian noise. We propose an algorithm that accurately recovers both the mean and variance functions given a single set of observations, and demonstrate its advantages over state-of-the-art methods through extensive simulation studies.
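The abstract does not spell out the single-sample model, but multiscale treatments of Poisson count data typically rest on a standard identity: conditional on the sum of two independent Poisson counts, the left count is binomial. Recursively splitting a count vector this way turns Poisson rate estimation into a stack of independent binomial problems, one per scale and location. A minimal sketch of that decomposition, assuming a power-of-two number of bins (function name and data are illustrative, not taken from the dissertation):

```python
import numpy as np

def multiscale_split(counts):
    """Recursively split a length-2^J vector of counts into (parent-total,
    left-child) pairs at each scale. Since X_left | X_left + X_right is
    Binomial when both are Poisson, each pair is an independent binomial
    observation, which is what multiscale Poisson models operate on."""
    counts = np.asarray(counts)
    levels = []
    while len(counts) > 1:
        left = counts[0::2]          # left child of each adjacent pair
        right = counts[1::2]         # right child
        parents = left + right       # totals passed up to the next scale
        levels.append((parents.copy(), left.copy()))  # (n, k) per node
        counts = parents
    return levels  # finest scale first, coarsest (grand total) last

# toy example: 8 bins of sequencing read counts
levels = multiscale_split([3, 1, 0, 2, 5, 4, 0, 1])
```

At the finest scale this yields pair totals `[4, 2, 9, 1]` with left-child counts `[3, 0, 5, 0]`, and the last level carries the grand total, so no count information is lost by the transform.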
2

Bayesian lasso: An extension for genome-wide association study

Joo, LiJin 24 March 2017 (has links)
In genome-wide association studies (GWAS), variable selection has been used for prioritizing candidate single-nucleotide polymorphisms (SNPs). Relating densely located SNPs to a complex trait requires a method that is robust under various genetic architectures, yet sensitive enough to detect the marginal difference between null and non-null factors. For this problem, ordinary Lasso produced too many false positives, and Bayesian Lasso via Gibbs sampling became too conservative when the selection criterion was posterior credible sets. My proposals to improve Bayesian Lasso have two aspects: using a stochastic approximation, variational Bayes, to increase computational efficiency, and using a Dirichlet-Laplace prior to better separate small effects from nulls. Both the double-exponential prior of Bayesian Lasso and the Dirichlet-Laplace prior have a global-local mixture representation, and variational Bayes can effectively handle the model hierarchies that arise from this representation. In analyses of simulated and real sequencing data, the proposed methods showed meaningful improvements in both efficiency and accuracy.
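The global-local mixture representation mentioned above is the standard identity that a double-exponential (Laplace) prior is a normal scale mixture with exponentially distributed local variances; this is what gives both Gibbs samplers and variational schemes conditionally Gaussian updates. A small numerical sketch of the identity (names and parameters are illustrative):

```python
import numpy as np

def laplace_via_mixture(lam, size, rng):
    # Global-local representation of the double-exponential prior:
    #   tau_j ~ Exponential(rate = lam^2 / 2)   (local variance)
    #   beta_j | tau_j ~ Normal(0, tau_j)
    # Marginally, beta_j ~ Laplace(0, 1/lam).
    tau = rng.exponential(scale=2.0 / lam**2, size=size)
    return rng.normal(0.0, np.sqrt(tau))

rng = np.random.default_rng(0)
beta = laplace_via_mixture(1.0, 200_000, rng)
# the marginal variance of Laplace(0, 1/lam) is 2 / lam^2
```

With `lam = 1.0` the sample variance of `beta` should be close to 2, matching the Laplace marginal, even though every draw was made from a Gaussian given its local scale.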
3

Computational approaches for intelligent processing of biomedical data

Mirina, Alexandra 18 December 2015 (has links)
The rapid development of novel experimental techniques has led to an abundance of biological data, which holds great potential for elucidating many scientific problems. The analysis of such complex, heterogeneous information requires appropriate state-of-the-art analytical methods. Here we demonstrate how an unconventional approach and intelligent data processing can lead to meaningful results.

This work includes three major parts. In the first part we describe a correction methodology for genome-wide association studies (GWAS). We demonstrate the existing bias toward the selection of larger genes for downstream analyses in GWA studies and propose a method to adjust for this bias, effectively showing the need for data preprocessing in order to obtain biologically relevant results. In the second part, building on the results of the first, we attempt to elucidate the underlying mechanisms of aging and longevity by conducting a longevity GWAS. Here we took an unconventional approach to the GWAS analysis by applying the idea of genetic buffering, which allowed us to identify pairs of genetic markers that play a role in longevity; we were able to confirm some of them by means of a downstream network analysis. In the third and final part, we discuss the characteristics of chronic lymphocytic leukemia (CLL) B-cells and perform clustering analysis based on immunoglobulin (Ig) mutation patterns. By comparing the Ig sequences of CLL patients and healthy donors, we show that different Ig heavy chain (IGHV) regions in CLL exhibit similarities with different B-cell subtypes of healthy donors, which raises questions about a single origin of CLL cases.
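The dissertation's specific correction procedure is not given in the abstract. One standard way to see the larger-gene bias it describes is to note that scoring a gene by the minimum p-value among its SNPs favors genes containing more SNPs, since the minimum of more draws is smaller by chance alone; a Sidak-style rescaling by SNP count is one textbook adjustment. A hedged sketch of that idea, with illustrative names and data (not the method used in the work itself):

```python
import numpy as np

def gene_level_p(snp_pvals_per_gene):
    """Compare a naive gene score (minimum SNP p-value) with a
    Sidak-style size correction. Under the null with m independent SNPs,
    P(min p <= x) = 1 - (1 - x)^m, so larger genes get smaller minima by
    chance; the corrected value restores a uniform null scale."""
    out = {}
    for gene, pvals in snp_pvals_per_gene.items():
        p = np.asarray(pvals)
        m = len(p)
        p_min = p.min()
        out[gene] = {
            "naive": p_min,                       # biased toward large genes
            "sidak": 1.0 - (1.0 - p_min) ** m,    # size-adjusted
        }
    return out

# a 1-SNP gene and a 3-SNP gene sharing the same best SNP p-value
res = gene_level_p({"small": [0.01], "large": [0.01, 0.5, 0.9]})
```

Both genes have the same naive score of 0.01, but after correction the larger gene's evidence is weaker, which is exactly the direction of the bias being adjusted for.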
4

Subgroup Analysis of Patients with Hepatocellular Carcinoma: A Quest for Statistical Algorithms for the Tissue Classification Problem

Ong, Vy Quoc 06 November 2018 (has links)
Hepatocellular carcinoma (HCC) is the most common type of liver cancer. It has been observed to be the third leading cause of death from cancer worldwide and the ninth leading cause of cancer mortality in the United States. People with hepatitis B or C are considered to be at high risk for this cancer. HCC patients with remarkably poor prognoses and low survival rates commonly possess intra-hepatic metastases, either tumor thrombi in the portal vein or intra-hepatic spread; it is uncommon for them to die of extra-hepatic metastases. Therefore, identifying metastatic HCC has become vital and clinically challenging in the effort to intervene early enough to improve the survival rate of patients who suffer from this disease.

To date, studies seeking an accurate molecular profiling model have been developed to identify these patients in advance for better treatment or intervention. One approach has been to focus on identifying individual candidate genes characterizing metastatic HCC. Another direction has been to find a global, genome-scale solution by using microarray technology to obtain gene expression profiles for this carcinoma. Among research following the latter direction is the work of Qing-Hai Ye et al., Nature Medicine, Volume 9, Number 4, April 2003. They applied cDNA microarray-based gene expression profiling with compound covariate predictors to binary classification tasks among primary HCC, metastatic HCC, and metastasis-free HCC, on a dataset of 87 observations and 9984 features taken from 40 hepatitis B-positive Chinese patients. Notably, a robust 153-gene model was generated that successfully separated tumor-thrombi-in-the-portal-vein samples from metastasis-free samples. However, they acknowledged that distinguishing primary tumor samples from their matched metastatic lesions was still a challenge. In this molecular signature, a gene named osteopontin, a secreted phosphoprotein, served as the lead gene in diagnosing HCC metastasis.

The analysis is based on the metastatic status of HCC, which is clinically predetermined. However, validation of the class definition is needed to investigate whether the data are sufficient to support the three predefined classes. We will use statistical clustering algorithms to validate the defined classes. After that, we will conduct variable selection to find markers that are differentially expressed among the validated clinical groups. Next, using the compound markers found by this research, we will develop a statistical model to predict a new patient's HCC type for intervention. The generalization performance of the prediction model will be evaluated via a cross-validation test. This study aims to build a highly accurate model that renders a better classification of the aforementioned clinical groups of HCC and thus enhances the rate of predicting metastatic patients.
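The cross-validation step described above can be sketched as a simple fold loop: each fold of patients is held out once while the model is fit on the rest, so every prediction is made on samples the model never saw. Here a nearest-centroid rule stands in for the classifier; the actual model, features, and data in the study differ, so this is an illustrative skeleton only:

```python
import numpy as np

def kfold_accuracy(X, y, k, rng):
    """Estimate generalization accuracy by k-fold cross-validation:
    class centroids are fit on k-1 folds, and the held-out fold is
    classified by nearest centroid, mimicking prediction on new patients."""
    n = len(y)
    idx = rng.permutation(n)          # shuffle before splitting into folds
    folds = np.array_split(idx, k)
    correct = 0
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        centroids = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y[train])}
        for i in fold:                # predict each held-out sample
            pred = min(centroids,
                       key=lambda c: np.linalg.norm(X[i] - centroids[c]))
            correct += int(pred == y[i])
    return correct / n

# toy, well-separated two-class data (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 5)),
               rng.normal(5.0, 0.1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
acc = kfold_accuracy(X, y, k=5, rng=rng)
```

Because the folds are disjoint and every sample is held out exactly once, the returned accuracy is an out-of-sample estimate rather than a resubstitution score.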
