Global ETD Search

91	Finding functional groups of genes using pairwise relational data : methods and applications Brumm, Jochen 05 1900 Genes, the fundamental building blocks of life, act together (often through their derived proteins) in modules such as protein complexes and molecular pathways to achieve a cellular function such as DNA repair or cellular transport. A current emphasis in genomics research is to identify gene modules from gene profiles, which are measurements (such as a mutant phenotype or an expression level) associated with the individual genes under conditions of interest; genes in modules often have similar gene profiles. Clustering groups of genes with similar profiles can hence deliver candidate gene modules. Pairwise similarity measures derived from these profiles are used as input to the popular hierarchical agglomerative clustering algorithms; however, these algorithms offer little guidance on how to choose candidate modules and how to improve a clustering as new data become available. As an alternative, there are methods based on thresholding the similarity values to obtain a graph; such a graph can be analyzed through (probabilistic) methods developed in the social sciences. However, thresholding the data discards valuable information, and choosing the threshold is difficult. Extending binary relational analysis, we exploit ranked relational data as the basis for two distinct approaches for identifying modules from genomic data, both based on the theory of random graph processes. We propose probabilistic models for ranked relational data that allow candidate modules to be accompanied by objective confidence scores and that permit an elegant integration of external information on gene-gene relationships. First, we follow theoretical work by Ling to objectively select exceptionally isolated groups as candidate gene modules. 
Secondly, inspired by stochastic block models used in the social sciences, we construct a novel model for ranked relational data, where all genes have hidden module parameters which govern the strength of all gene-gene relationships. Adapting a classical likelihood often used for the analysis of horse races, clustering is performed by estimating the module parameters using standard Bayesian methods. The method allows the incorporation of prior information on gene-gene relationships; the utility of using prior information in the form of protein-protein interaction data in clustering of yeast mutant phenotype profiles is demonstrated. / Science, Faculty of / Statistics, Department of / Graduate Statistics Genomics Bayesian methods
92	Statistical methods for integrative analysis of genomic data Ming, Jingsi 24 August 2018 Thousands of risk variants underlying complex phenotypes (quantitative traits and diseases) have been identified in genome-wide association studies (GWAS). However, several challenges remain in deepening our understanding of the genetic architectures of complex phenotypes. First, the majority of GWAS hits are in non-coding regions, and their biological interpretation is still unclear. Second, most complex traits are suggested to be highly polygenic, i.e., they are affected by a vast number of risk variants with individually small or moderate effects, and a large proportion of risk variants with small effects remain unknown. Third, accumulating evidence from GWAS suggests the pervasiveness of pleiotropy, a phenomenon in which some genetic variants are associated with multiple traits, but there is no unified, scalable framework that reveals the relationships among a large number of traits and simultaneously prioritizes genetic variants with functional annotations integrated. In this thesis, we propose two statistical methods to address these challenges using integrative analysis of summary statistics from GWASs and functional annotations. In the first part, we propose a latent sparse mixed model (LSMM) to integrate functional annotations with GWAS data. It not only increases the statistical power to identify risk variants but also offers more biological insight by detecting relevant functional annotations. To make LSMM scalable to millions of variants and hundreds of functional annotations, we developed an efficient variational expectation-maximization (EM) algorithm for model parameter estimation and statistical inference. We first conducted comprehensive simulation studies to evaluate the performance of LSMM. 
We then applied it to analyze 30 GWASs of complex phenotypes integrated with nine genic category annotations and 127 cell-type specific functional annotations from the Roadmap project. The results demonstrate that our method possesses more statistical power than conventional methods and can help researchers achieve a deeper understanding of the genetic architecture of these complex phenotypes. In the second part, we propose a latent probit model (LPM), which combines summary statistics from multiple GWASs with functional annotations to characterize relationships among traits and increase the statistical power to identify risk variants. LPM can also perform hypothesis testing for pleiotropy and annotation enrichment. To keep LPM scalable as the number of GWASs increases, we developed an efficient parameter-expanded EM (PX-EM) algorithm that can run in parallel. We first validated the performance of LPM through comprehensive simulations, then applied it to analyze 44 GWASs with nine genic category annotations. The results demonstrate the benefits of LPM and can offer new insights into disease etiology. Genomics ; Statistical methods
93	Identification of New Sources of Resistance to Anthracnose in Climbing Bean Germplasm from Guatemala Maldonado Mota, Carlos Raul January 2017 Anthracnose, caused by Colletotrichum lindemuthianum (Sacc. and Magnus) Briosi and Cavara, is a fungal disease that affects common bean worldwide. Seed yield losses sometimes reach 100% when the seed is infected and environmental conditions favor the disease. Climbing beans in Guatemala represent the main source of protein for the inhabitants of this region (9.4 kg/person/year). Unfortunately, anthracnose threatens climbing bean production in the region. Six races were found among samples collected in the Guatemala Highlands using the standard common bean differential lines. In addition, a germplasm collection from ICTA Guatemala was evaluated for resistance to C. lindemuthianum race 73, which is the predominant race in the U.S. Approximately 10% of 369 climbing bean accessions showed no symptoms (score of 1). GWAS results using 78,754 SNP markers indicated that genomic regions for resistance to C. lindemuthianum exist in Pv04 and Pv07. / USAID-Legume Innovation Lab / ICTA (Guatemala) Anthracnose. Colletotrichum lindemuthianum. Genomics.
94	Shining light on the dark matter of the genome January 2019 archives@tulane.edu / These studies make strides toward a better understanding of retroelement L1 expression and regulation at the locus-specific level using a combination of sequencing technologies. They demonstrate tissue-specific patterns of L1 expression when expressed L1s are identified stringently and confidently with our EL-Seq approach. Because expressed L1s were also found to correlate significantly with regions of open chromatin, these tissue-specific patterns of L1 expression are most likely explained by tissue-specific chromatin architecture. Evidence is also presented that L1s in different tissues respond differently to genomic stresses and perturbations, as seen in the case of aging, indicating that the risk associated with L1 damage and mutagenesis is related to cell type and tissue. This is particularly notable when considering the genesis and promotion of age-related somatic diseases such as epithelial cancers. L1s are commonly referred to as the dark matter of the genome, but here we illuminate their biology and regulation to better understand L1-associated damage and risk to human health. / 0 / Tiffany Kaul Genomics LINE-1 Retroelements
95	Genome-wide expression and genomic data integration analyses in sporadic Parkinson disease Dumitriu, Alexandra January 2012 Thesis (Ph.D.)--Boston University / Parkinson disease (PD) is the second most common neurodegenerative disorder, affecting an estimated 2% of the population above 65 years of age. Although familial forms of PD have been linked to specific mutations responsible for the onset of the disease, the majority of PD cases are still of unknown etiology. PD has traditionally been studied using individual genetic methods, such as linkage analysis, genome-wide association studies (GWAS), or microarray expression studies. Nevertheless, the intrinsic genetic variability of the disease and the unilateral analysis of available datasets have made the detection of robust gene or pathway signals difficult. Studies of PD that combine a range of systems genetics approaches and integrate complementary disease-relevant genetic datasets represent a promising way to reconcile previously inconsistent and diverse results. To investigate the genetics of idiopathic PD, I performed the largest genome-wide expression study in brain tissue to date. The study was carried out on the 1-color Agilent 60-mer Whole Human Genome Microarray and included 26 neurologically healthy control and 27 PD samples from the frontal cortex Brodmann 9 area (BA9). The selected brain samples were of high quality (high pH and RNA integrity, no significant signs of Alzheimer disease pathology) and had rich documentation of neuropathological and clinical information available. I analyzed the microarray expression results in combination with genotyping data for PD-associated single nucleotide polymorphisms obtained for the microarray brain samples, and detected a pathway of interest for PD involving the FOXO1 (Forkhead box protein O1) gene. This result was verified in additional publicly available expression datasets. 
I then performed a network-based canonical pathway analysis of PD, combining results from available GWAS, microarray expression, and animal model expression studies. The analysis framework was a human functional-linkage network (FLN), consisting of genes as nodes and weighted links indicating the confidence that a gene pair is involved in similar biological processes. I demonstrated the relevance of this FLN for studying PD. Additionally, I ranked genes and pathways based on the available disease datasets. The frontal cortex BA9 study and an additional non-PD microarray study were used as the positive and negative controls, respectively, for the obtained results. Parkinson's disease Genomics
96	Computational Methods to Analyze Next-generation Sequencing Data in Genomics and Metagenomics Wang, Saidi 01 January 2022 This thesis focuses on two important computational problems in genomics and metagenomics using publicly available next-generation sequencing data. One concerns gene regulation, for which we explore how distal regulatory elements may interact with proximal regulatory elements. The other concerns metagenomics, in which we study how to reconstruct bacterial strain genomes from shotgun reads. Studying gene regulation, especially distal gene regulation, is important because regulatory elements, including those in distal regulatory regions, orchestrate when, where, and how much a gene is activated under every experimental condition. Their dysfunction results in various types of diseases. Moreover, research on distal gene regulation is still developing. The study of bacterial strains is also vital, as bacterial strains are the main source of drug resistance, mixed infection, reinfection, etc. The study of novel bacterial strains is still in its infancy: only one tool can work with multiple metagenomic samples, and its performance is suboptimal. We identified hundreds of pairs of regulatory elements that are biologically sound and are likely to contribute to the interaction of distal and proximal regulatory regions. We demonstrated for the first time that ribosomal protein genes share common distal regulatory regions under the same experimental conditions and might be differentially regulated across different experimental conditions. In addition, we developed a novel approach called SMS to reconstruct novel bacterial strains from multiple shotgun metagenomic samples. Testing on 702 simulated and 195 experimental datasets, we showed that SMS has high accuracy in inferring the strains present, including the strain number, strain abundance, strain variations, etc. 
Compared with the two existing approaches, SMS shows much better performance. Our studies shed new light on genomics and generated novel tools in metagenomics. Computer Sciences Genomics
97	Computational Study of Gene Transcription Initialization and Regulation Zheng, Hansi 01 January 2022 MicroRNAs (miRNAs) are post-transcriptional regulators of gene expression and play an essential role in phenotype development. The regulatory mechanisms behind miRNAs offer insight into gene expression and gene regulation. The transcription start site (TSS) is key to studying gene expression. However, the TSSs of miRNAs can be thousands of nucleotides away from the precursor miRNAs, which makes them hard to detect with conventional RNA-Seq experiments. Some previous methods tried to take advantage of sequencing data using sequence features or integrated epigenetic markers, but produced predictions that were either not condition-specific or low-resolution. Furthermore, the availability of a large amount of single-cell RNA-Seq (scRNA-Seq) data provides remarkable opportunities for studying gene regulatory mechanisms at single-cell resolution. Incorporating gene regulatory mechanisms can assist with cell type identification and state discovery from scRNA-Seq data. In this dissertation, we studied computational modeling of gene transcription initialization and expression, including two novel approaches to identify TSSs under various types of conditions and one case study at the single-cell level. Firstly, we studied how TSSs can be identified from Cap Analysis Gene Expression (CAGE) experiment data using a deep learning neural network. We used a control model to study the DeepBind binding score features and to show that the protein binding motif model can improve overall prediction performance. Furthermore, on data from unseen cell lines, the approach performed better than existing tools. Secondly, to better predict the TSSs of miRNAs in a condition-specific manner, we built D-miRT, a two-stream convolutional neural network based on integrated low-resolution epigenetic features and high-resolution sequence features. 
D-miRT outperformed all baseline models and demonstrated high accuracy on miRNA TSS prediction tasks. Compared with the most recent approaches to cell-specific miRNA TSS identification, using cell lines that were unseen during model training, D-miRT also showed superior performance. Thirdly, to study gene transcription initialization and regulation from a single-cell perspective, we developed INSISTC, an unsupervised machine learning-based approach that incorporates network structure information for single-cell type classification. In contrast to other clustering algorithms, we showed that INSISTC with the SC3 algorithm provides cluster number estimation. Future studies on gene expression and regulation will benefit from INSISTC's adaptability with regard to the kinds of biological networks that can be used. Computer Sciences Genetics and Genomics
98	Detection of Bacterial Retroelements Using Genomics Mu, Sen 01 May 2013 The reverse flow of genetic information can occur when a special DNA polymerase called reverse transcriptase (RT) copies the genetic information in an RNA molecule back into a complementary DNA. One type of RT-encoding gene found in bacteria is called a retron element. Recent bacterial genome sequencing projects have revealed many examples of retron RT genes. This gene assignment is based on comparison with a few known retron RT proteins. However, RT proteins are highly diverse in their amino acid sequences, and thus the assigned identity of these RT proteins as retrons in genome databases is questionable. One way to prove that these postulated RTs are indeed from retron elements is to see whether they can produce msDNA. Retron RTs are known to synthesize a structurally unique satellite DNA called msDNA in the bacterial cells that contain them. Based on GenBank database matches to a known protein, 7 proteins designated as retron RTs were tested for their ability to synthesize msDNA. Five of these retron RTs showed evidence of producing msDNA and are from very different bacterial hosts. The other 2 RT proteins did not show any evidence of producing msDNA. Retroelements Genomics Other Microbiology
99	From DNA Logic Gates to DNA Nanorobots Molden, Tatiana 01 January 2022 Due to their biocompatibility and parallel data processing, DNA computational devices are highly desired for applications in the diagnosis and treatment of cancer and of infectious and genetic diseases. Much as in modern electronic devices, DNA computation is based on logic gates: by directly interacting with DNA or RNA input molecules, they produce a specific output depending on their embedded logic function. This work is devoted to the development of functional parts of a future DNA nanorobot for biomedical applications. Specifically, we used DNA nanotechnology and the concept of multicomponent DNA probes to develop three parts of the DNA nanorobot: computing, actuating, and sensing. The computing function was addressed by developing a construct with two integrated NAND gates, capable of processing three different DNA or RNA inputs. The second, "smart" construct performs the actuating function: cleavage of the RNA of a housekeeping gene in response to recognition of RNA inputs generated by cancer cells. The third construct is an original DNA "Cephalopod-tile" with an improved sensing function, capable of recognizing highly structured biological analytes, such as the 16S rRNA of E. coli, and of accelerating hybridization with targets by up to 465 times. These nanoconstructs contributed to the development of an original DNA nanomachine with an OR logic function for the treatment of cancer. Theses for Defense 1. Computing: It is possible to design tile-associated DNA NAND logic gates that can be integrated in communicating circuits. 2. Actuating: It is possible to create a DNA nano-construction that can cut out a marker fragment from a longer RNA sequence and use it as an activator for triggering cleavage of another RNA sequence. 3. Sensing: A DNA probe equipped with an analyte capture function can increase hybridization rates between DNA and RNA analytes. 4. 
Sensing, computation, and actuating: It is possible to design multifunctional DNA nanomachines with sensing, computation, and therapeutic modules. Genetics and Genomics Nanotechnology
100	Computational Study of Target Gene Interactions - Enhancers and microRNAs Talukder, Amlan 01 January 2021 Gene expression is an essential mechanism for the physical and mental development of humans. Aberrant regulation of gene expression creates abnormalities in the human body that can lead to complicated diseases. Gene expression can be regulated at any stage, from chromatin unfolding to the post-translational stage of the protein. In this study, we focused on two important factors in gene expression regulation that act at the transcriptional and post-transcriptional stages: enhancer-promoter interactions and miRNA-mRNA interactions. Enhancer-promoter interactions are difficult to detect because of the large distance between the enhancer and promoter regions and the cell-specific activity of the interactions. Cell-specific interactions have not been well studied because feature availability is inconsistent across cells. We designed a tool that considers a large variety of enhancer-promoter interaction features in different cell lines, can deal with missing features, and can predict cell-specific interactions with better accuracy than the available tools. By analyzing cell-specific interactions from different sources, we also found that enhancer-promoter interactions are shared in groups. MiRNA-mRNA interactions are more complicated in humans than in other organisms because of the imperfect nature of the interactions and the smaller size and complex target-selection strategy of the miRNAs. Available miRNA target prediction tools, designed on canonical features, often suffer from low accuracy on new experimentally supported datasets. These tools do not consider the position-wise binding preference or the relationships between adjacent positions and regions of the miRNA sequence. 
Here, we designed a Markov model-based feature to capture this position-wise information from experimental datasets; it can be incorporated into any prediction tool to improve that tool's performance. Computer Sciences Genetics and Genomics
