Global ETD Search

11	Network based analysis of genetic disease associations Gilman, Sarah Roche January 2013 (has links) Despite extensive efforts and many promising early findings, genome-wide association studies have explained only a small fraction of the genetic factors contributing to common human diseases. There are many theories about where this "missing heritability" might lie, but increasingly the prevailing view is that common variants, the target of GWAS, are not solely responsible for susceptibility to common diseases and a substantial portion of human disease risk will be found among rare variants. Relatively new, such variants have not been subject to purifying selection, and therefore may be particularly pertinent for neuropsychiatric disorders and other diseases with greatly reduced fecundity. Recently, several researchers have made great progress towards uncovering the genetics behind autism and schizophrenia. By sequencing families, they have found hundreds of de novo variants occurring only in affected individuals, both large structural copy number variants and single nucleotide variants. Despite the study of large cohorts, there has been little recurrence among the implicated genes, suggesting that many hundreds of genes may underlie these complex phenotypes. The question becomes how to tie these rare mutations together into a cohesive picture of disease risk. Biological networks represent an intuitive answer, as different mutations which converge on the same phenotype must share some underlying biological process. Network-based analysis offers three major advantages: it allows easy integration of both common and rare variants, it allows us to assign significance to collections of genes where individual genes may not be significant due to rarity, and it allows easier identification of the biological processes underlying physical consequences. This work presents the construction of a novel phenotype network and a method for the analysis of disease-associated variants. 
This method has been applied to de novo mutations and GWAS results associated with both autism and schizophrenia, and it found clusters of genes strongly connected by shared function for both diseases. The results help elucidate the real physical consequences of putative disease mutations, leading to a better understanding of the pathophysiology of the diseases. Bioinformatics
12	Analysis of trans eSNPs infers regulatory network architecture Kreimer, Anat January 2014 (has links) eSNPs are genetic variants associated with transcript expression levels. The characteristics of such variants highlight their importance and present a unique opportunity for studying gene regulation. eSNPs affect most genes and their cell type specificity can shed light on different processes that are activated in each cell. They can identify functional variants by connecting SNPs that are implicated in disease to a molecular mechanism. Examining eSNPs that are associated with distal genes can provide insights regarding the inference of regulatory networks but also presents challenges due to the high statistical burden of multiple testing. Such association studies allow: simultaneous investigation of many gene expression phenotypes without assuming any prior knowledge and identification of unknown regulators of gene expression while uncovering directionality. This thesis will focus on such distal eSNPs to map regulatory interactions between different loci and expose the architecture of the regulatory network defined by such interactions. We develop novel computational approaches and apply them to genetics-genomics data in human. We go beyond pairwise interactions to define network motifs, including regulatory modules and bi-fan structures, showing them to be prevalent in real data and exposing distinct attributes of such arrangements. We project eSNP associations onto a protein-protein interaction network to expose topological properties of eSNPs and their targets and highlight different modes of distal regulation. Overall, our work offers insights concerning the topological structure of human regulatory networks and the role genetics plays in shaping them. Bioinformatics
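As a rough illustration of the bi-fan structure mentioned in the abstract above, the sketch below counts bi-fans (two regulators that both point at the same two targets) in a list of eSNP-to-gene associations. The function name, the edge list, and the SNP and gene labels are hypothetical; the thesis's own motif-detection method and statistics are not described in this listing, so this is only a minimal counting example.

```python
from itertools import combinations


def count_bifans(edges):
    """Count bi-fan motifs in a directed regulator -> target edge list.

    A bi-fan is a pair of regulators plus a pair of targets such that
    both regulators are linked to both targets.
    """
    # Collect the set of targets for each regulator.
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    total = 0
    for a, b in combinations(sorted(targets), 2):
        shared = len(targets[a] & targets[b])
        # Every unordered pair of shared targets forms one bi-fan.
        total += shared * (shared - 1) // 2
    return total


# Hypothetical toy associations (not data from the thesis).
edges = [
    ("snp1", "geneA"), ("snp1", "geneB"),
    ("snp2", "geneA"), ("snp2", "geneB"), ("snp2", "geneC"),
    ("snp3", "geneC"),
]
print(count_bifans(edges))
```

In this toy input only snp1 and snp2 share two targets, so a single bi-fan is counted. Assessing whether such motifs are more prevalent than expected would additionally require a null model, which is outside this sketch.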
13	Structure-Based Genome Scale Function Prediction and Reconstruction of the Mycobacterium tuberculosis Metabolic Network Konate, Mariam January 2014 (has links) Due to vast improvements in sequencing methods over the past few decades, the availability of genomic data is rapidly increasing, thus bringing about the need for functional characterization tools. Considering the breadth of data involved, functional assays would be impractical and only a computational method could afford fast and cost-effective functional annotations. Therefore, homology-based computational methods are routinely used to assign putative molecular functions that can later be confirmed with targeted experiments. These methods are particularly well suited to predict the function of enzymes because most metabolic pathways are conserved across organisms. However, the current methods have limitations, especially when considering enzymes that have very low sequence and structure homology to well-annotated enzymes. We hypothesized that two enzymes with the same molecular function shared significant sequence homology in the region surrounding the active site, even if they appear diverged at the global sequence level. First, we have investigated the limits of sequence and structure conservation for enzymes with the same function during divergent evolution. The goal of this was to determine the sequence identity threshold beyond which functional annotations should not be transferred between two sequences; that is the level of homology beyond which the pair of proteins would not be expected to have the same function. Our analysis, which compares several models of sequence evolution, shows that the sequences of orthologous proteins catalyzing the same reaction rarely diverge beyond 30% identity, even after approximately 3.5 billion years of evolution. As for structure conservation, enzymes catalyzing the same reactions rarely diverge beyond 3 Å root-mean-square distance (RMSD). 
We have also explored sequence conservation constraints as a function of the distance to the active site. Although residues closer to the protein active site (within a radius of 10 Å around the catalytic residues) mutate significantly more slowly, the requirement to preserve the molecular function also constrains residues in other parts of the protein. From these results, we have developed a structure-based function prediction method where we employ active site conservation in addition to global sequence homology for functional characterization. We then integrated this method with a probabilistic whole-genome function prediction framework previously developed in the Vitkup group, GLOBUS. The original version of GLOBUS uses sampling of probability space to assign functions to all putative metabolic genes in an input genome by considering sequence homology to known enzymes, gene-gene context and EC co-occurrence. Applying this novel method to the whole-genome metabolic reconstruction of Mycobacterium tuberculosis, we made several novel predictions for genes with apparent links to pathogenesis. Notably, our predictions allowed us to reconstruct the cholesterol degradation pathway in M. tuberculosis, which has been implicated in bacterial persistence in the literature but remains to be fully characterized. This pathway is absent from previously published metabolic models of M. tuberculosis. Our new model can now be used to simulate different environments and conditions in order to gain a better understanding of the metabolic adaptability of M. tuberculosis during pathogenesis. Bioinformatics
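To make the notion of a sequence identity threshold concrete, here is a minimal sketch that computes percent identity over two pre-aligned sequences and compares it with a cutoff. The function names, the toy sequences, and the use of 30% as the default cutoff are assumptions made for illustration; the abstract does not state how identity was computed or how the threshold is applied in the actual method.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def above_identity_cutoff(identity, cutoff=30.0):
    """Illustrative check only: the 30% default is an assumption of this sketch."""
    return identity >= cutoff


# Hypothetical pre-aligned fragments (not real enzyme sequences).
identity = percent_identity("MKT-AYIAK", "MKTQAYLAR")
print(identity, above_identity_cutoff(identity))
```

The same function could be applied to only the alignment columns near an active site to contrast local and global conservation, though selecting those columns requires structural information that this sketch does not model.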
14	Understanding and Reducing Clinical Data Biases Fort, Daniel January 2015 (has links) The vast amount of clinical data made available by pervasive electronic health records presents a great opportunity for reusing these data to improve the efficiency and lower the costs of clinical and translational research. A risk to reuse is potential hidden biases in clinical data. While specific studies have demonstrated benefits in reusing clinical data for research, there are significant concerns about potential clinical data biases. This dissertation research contributes original understanding of clinical data biases. Using research data carefully collected from a patient community served by our institution as the reference standard, we examined the measurement and sampling biases in the clinical data for selected clinical variables. Our results showed that the clinical data and research data had similar summary statistical profiles, but that there were detectable differences in definitions and measurements for variables such as height, diastolic blood pressure, and diabetes status. One implication of these results is that research data can complement clinical data for clinical phenotyping. We further supported this hypothesis using diabetes as an example clinical phenotype, showing that integrated clinical and research data improved the sensitivity and positive predictive value. Bioinformatics
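For readers unfamiliar with the two metrics named in the abstract above, the following sketch computes sensitivity and positive predictive value from binary phenotype labels. The function name and the label vectors are hypothetical toy values, not data or code from the dissertation.

```python
def sensitivity_and_ppv(predicted, actual):
    """Return (sensitivity, PPV) for two equal-length lists of 0/1 labels."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have equal length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    # Sensitivity: fraction of true cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # PPV: fraction of predicted cases that are true cases.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


# Hypothetical reference labels and phenotype calls.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 0, 1, 0]
print(sensitivity_and_ppv(predicted, actual))
```

Running the same function on calls derived from clinical data alone and on calls derived from integrated clinical and research data would give the kind of side-by-side comparison the abstract describes.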
15	Combining Heterogeneous Databases to Detect Adverse Drug Reactions Li, Ying January 2015 (has links) Adverse drug reactions (ADRs) cause a global and substantial burden accounting for considerable mortality, morbidity and extra costs. In the United States, over 770,000 ADR-related injuries or deaths occur each year in hospitals, which may cost up to $5.6 million each year per hospital. Unanticipated ADRs may occur after a drug has been approved due to its use or prolonged use on large, diverse populations. Therefore, the post-marketing surveillance of drugs is essential for generating more complete drug safety profiles and for providing a decision making tool to help governmental drug administration agencies take action on marketed drugs. Analysis of spontaneous reports of suspected ADRs has traditionally served as a valuable tool in pharmacovigilance. However, because of well-known limitations of spontaneous reports, observational healthcare data, such as electronic health records (EHRs) and administrative claims data, are starting to be used to complement the spontaneous reporting system. Synthesizing ADR evidence from multiple data sources has been conducted by human experts on an ad hoc basis. However, the amount of data from both spontaneous reporting systems (SRSs) and observational healthcare databases is growing exponentially. The revolution in the ability of machines to access, process, and mine databases makes it advantageous to develop an automatic system to obtain integrated evidence by combining them. Towards this goal, this dissertation proposes a framework consisting of three components that generates signal scores based on data from an EHR system and from an SRS system, and then integrates the two signal scores into a composite one. The first component is a data-driven and regression-based method that aims to alleviate confounding effects and detect ADRs based on EHRs. 
The results demonstrate that this component achieves comparable or slightly higher accuracy than those trained with experts and existing automatic methods. The second component is also a data-driven and regression-based method that aims to reduce the effect of confounding by co-medication and confounding by indication using primary suspected, secondary suspected, concomitant medications and indications on the basis of an SRS. This study demonstrates that it could accomplish comparable or slightly better accuracy than the cutting edge algorithm Gamma Poisson Shrinkage (GPS), which uses primary suspected medications only. The third component is a computational integration method that normalizes signal scores from each data source and integrates them into a composite signal score. The results achieved by the method demonstrate that the combined ADR evidence achieves better accuracy of drug-ADR detection than individual systems based on either an SRS or an EHR. Furthermore, component three is explored as a tool to assist clinical assessors in pharmacovigilance practice. The research presented in this dissertation has produced several novel insights and provided new solutions towards the challenging problem of pharmacovigilance. The method of reducing confounding effects can be generalized to other EHR systems, and the method for integrating ADR evidence can be generalized to include other data sources. In conclusion, this dissertation develops a method to reduce confounding effects in both EHRs and SRSs, and a combined system to synthesize evidence, which could potentially unveil drug safety profiles and novel adverse events in a timely fashion. Bioinformatics
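The third component described above normalizes per-source signal scores and merges them into one composite score. The sketch below shows one simple way this could look, assuming z-score normalization within each source and an unweighted average across sources. Both choices, along with the function names and the toy drug-ADR pairs, are assumptions of this example; the abstract does not specify the normalization or the combination rule used in the dissertation.

```python
from statistics import mean, pstdev


def zscores(scores):
    """Standardize a {pair: score} mapping to zero mean and unit variance."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        return {pair: 0.0 for pair in scores}
    return {pair: (value - mu) / sigma for pair, value in scores.items()}


def composite_scores(ehr_scores, srs_scores):
    """Average the normalized EHR and SRS scores for pairs present in both."""
    ehr_z = zscores(ehr_scores)
    srs_z = zscores(srs_scores)
    shared = ehr_z.keys() & srs_z.keys()
    return {pair: (ehr_z[pair] + srs_z[pair]) / 2 for pair in shared}


# Hypothetical signal scores on different scales (not results from the study).
ehr = {("drugA", "rash"): 2.0, ("drugB", "nausea"): 0.5, ("drugC", "rash"): 1.0}
srs = {("drugA", "rash"): 30.0, ("drugB", "nausea"): 10.0, ("drugC", "rash"): 20.0}

combined = composite_scores(ehr, srs)
for pair, score in sorted(combined.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Normalizing first keeps a source with a larger numeric range from dominating the composite. A weighted average or a rank-based normalization would be equally plausible alternatives.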
16	Using high throughput data to understand microbes and their interaction with the environment Kar, Amrita 05 March 2017 (has links) The human microbiome ecosystem plays numerous, yet poorly understood beneficial roles in human health. It can shape the immune response and provide essential vitamins and enzymes to the host. The different environments present in the human host are a major determinant of community composition. Conversely, the presence of certain bacteria in specific parts of the human body is sometimes associated with an increased chance of pathologies. Advances in DNA sequencing have increased our understanding of the relationship of microbes with the environment. However, sequencing data alone is unlikely to provide such understanding without the help of appropriate computational models and analyses. For the first part of this thesis, I applied to the infant gut microbiome an approach previously used to understand the order of colonization of microbial biofilms. Available metagenomic sequencing data from infant fecal samples collected for 2.5 years was queried to test whether or not the gut colonization process is a multi-step process, in which the organisms that are prevalent at a given time are closely related, in their metabolic capabilities, to the organisms present at the previous time step. I further used network expansion algorithms previously developed for the study of large-scale biogeochemical evolution, to explore the dynamics and diet-dependency of the gut microbiome. These analyses suggest that metabolic relatedness among organisms is an important factor in the colonization process. The second part of my thesis explores the role of H. pylori in gastric cancer. I analyzed public microarray data for gastric AGS cancer cell lines infected with different strains of H. pylori differing in pathogenicity. Relative to uninfected AGS cell lines, low-pathogenic H. 
pylori strain displayed no major metabolic dysregulation, consistent with the fact that H. pylori does not cause inflammation/gastric cancer in a majority of the human population. However, gastric AGS cell lines infected with highly pathogenic strains showed more significant differences, including the upregulation of purine metabolism, possibly consistent with an inflammatory response. The results in this dissertation thus offer insights into how the interplay between metabolic activity of human-associated microbes and their surrounding environment plays an important role in the colonization process as well as in pathogenesis. Bioinformatics
17	Geos-chem adjoint inversion of SO2 and NOx emissions with multi-sensor (OMPS, OMI, and VIIRS) data over China Wang, Yi 01 August 2019 (has links) Accurate and timely SO2 and NOx emission inventories are required to simulate and forecast SO2 and NO2 concentrations in the atmosphere. However, bottom-up emission inventories have a time lag of at least one year, as it takes time to collect necessary activity rates and emission factors. This thesis focuses on using satellite data from Ozone Monitoring Instrument (OMI), Ozone Mapper and Profile Suite (OMPS), and Visible Infrared Imaging Radiometer Suite (VIIRS) to optimize SO2 and NOx emissions through the GEOS-Chem adjoint model. The optimized emission inventories are further applied to improve air quality simulation and forecasts. We firstly integrate OMI SO2 satellite measurements and GEOS-Chem adjoint model simulations to constrain monthly anthropogenic SO2 emissions. The effectiveness of this approach is demonstrated for 14 months over China; resultant posterior emissions not only capture a 20% SO2 emission reduction in Beijing during the 2008 Olympic Games but also improve agreement between modeled and in situ surface measurements. Further analysis reveals that posterior emissions estimates, compared to the prior, lead to significant improvements in forecasting monthly surface and columnar SO2. SO2 and NO2 observations from the newer sensor OMPS are used to optimize SO2 and NOx emissions over China for October 2013 through GEOS-Chem adjoint model. OMPS SO2 and NO2 observations are assimilated separately to optimize corresponding emissions, respectively, and posterior emissions, compared to the prior, yield improvements in simulating columnar SO2 and NO2, which are validated with both OMI and OMPS observations. 
The posterior emissions from assimilating OMPS SO2 and NO2 simultaneously are within -3% to 15% of separate assimilations for SO2 emissions and ±1% for NOx, and the joint assimilation saves about 50% computational time. Changes in NH3 emissions modify NO2 lifetime, hence affecting posterior NOx emissions in separate assimilations, and affecting both posterior SO2 and NOx emissions in joint assimilation. All these assimilation experiments are conducted at coarse (2°×2.5°) spatial resolution to save computational time, but coarse-resolution simulations underestimate hot spots of surface SO2 and NO2. Thus, the posterior coarse-resolution emissions are further efficiently downscaled to fine resolution (0.25°×0.3125°) according to spatial distributions of prior MIX emissions or VIIRS nighttime lights. Posterior fine-resolution simulation and forecasts, validated against in situ surface SO2 and NO2 measurements, improve on the prior ones. Bioinformatics
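The downscaling step described above can be pictured as splitting each coarse grid cell's posterior emission among its fine cells in proportion to a spatial proxy. The sketch below assumes a simple mass-conserving proportional allocation, with an even split when the proxy is zero everywhere. The function name, the fallback rule, and the numbers are assumptions for illustration; the abstract says only that downscaling follows the spatial distribution of prior MIX emissions or VIIRS nighttime lights.

```python
def downscale(coarse_total, fine_weights):
    """Split one coarse-cell emission total across fine cells by proxy weight.

    fine_weights holds a non-negative proxy value per fine cell (for example
    a prior emission or nighttime-light intensity); the result sums to
    coarse_total.
    """
    if not fine_weights:
        raise ValueError("at least one fine cell is required")
    total_weight = sum(fine_weights)
    if total_weight == 0:
        # No proxy signal: fall back to an even split (assumption of this sketch).
        return [coarse_total / len(fine_weights)] * len(fine_weights)
    return [coarse_total * w / total_weight for w in fine_weights]


# Hypothetical coarse-cell total and proxy weights for four fine cells.
print(downscale(100.0, [1.0, 3.0, 0.0, 4.0]))
```

Because the weights are normalized within each coarse cell, the coarse-resolution posterior total is preserved while the fine-scale pattern follows the proxy.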
18	Knowledge transfer: what, how, and why Chin, Si-Chi 01 May 2013 (has links) People learn from prior experiences. We first learn how to use a spoon and then know how to use a different size of spoon. We first learn how to sew and then learn how to embroider. Transferring knowledge from one situation to another related situation often increases the speed of learning. This observation is relevant to human learning, as well as machine learning. This thesis focuses on the problem of knowledge transfer -- an area of study in machine learning. The goal of knowledge transfer is to train a system to recognize and apply knowledge acquired from previous tasks to new tasks or new domains. An effective knowledge transfer system facilitates learning processes for novel tasks, where little information is available. For example, the ability to transfer knowledge from a model that identifies writers born in the U.S. to identify writers born in Kiribati, a much lesser known country, would increase the speed of learning to identify writers born in Kiribati from scratch. In this thesis, we investigate three dimensions of knowledge transfer: what, how, and why. We present and elaborate on these questions: What type of knowledge to transfer? How to transfer knowledge across entities? Why a certain pattern of knowledge transfer is observed? We first propose Segmented Transfer -- a novel knowledge transfer model -- to identify and learn from the most informative partitions from prior tasks. The proposed model is applied to Wikipedia vandalism detection problem and to entity search and retrieval problem and improves the predictions. Based on the foundation of knowledge transfer and network theory, we propose Knowledge Transfer Network (KTN), a novel type of network describing transfer learning relationships among problems. KTN is not only a knowledge representation, but also a framework to select an effective and efficient ensemble of learners to improve a predictive model. 
This novel type of network provides insights on identifying ontological connections that were initially obscured. For example, we may observe knowledge transfer occurs among dissimilar tasks, such as transferring from using a knife and fork to using chopsticks. Bioinformatics
19	A Framework for Implementing Bioinformatics Knowledge-Exploration Systems Hayes, John A. 01 January 2004 (has links) No description available. Bioinformatics
20	Evolutionary Factors Shaping Haplotype and Nucleotide Diversity in Humans and Malaria McGee, Kate 08 February 2008 (has links) Cheaper and more rapid DNA sequencing has led to the accumulation of large amounts of genetic data and has fueled the development of new methods to analyze this data. Using population genetics theory and computational methods we can explore the evolutionary forces that shape genetic variation within and among populations of humans and malaria parasites. Demographic events such as population size change influence current patterns of genetic variation. Accounting for the demographic history of a population is critical in the interpretation of population genetic analyses, particularly in detecting regions under selection and in making inferences about linkage disequilibrium. Characterizing how recombination rates evolve is critical for the efficient design of association studies and, in turn, the understanding of the genetics behind complex phenotypes. In malaria parasites, recombination is a key element in the creation of a wide array of antigens, which help invade host cells. We examine patterns of genetic variation in humans and malaria and explore how demographic history and recombination rates affect these patterns. Bioinformatics
