Global ETD Search

41	Computational development of regulatory gene set networks for systems biology applications Suphavilai, Chayaporn January 2014 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / In systems biology, biological networks are used to gain insights into biological systems. While the traditional approach to studying biological networks is based on the identification of interactions among genes or the identification of a gene set ranking according to differentially expressed gene lists, little is known about interactions between higher-order biological systems, that is, networks of gene sets. Several types of gene set network have been proposed, including co-membership, linkage, and co-enrichment human gene set networks. However, to our knowledge, none of them contains directionality information. Therefore, in this study we proposed a method to construct a regulatory gene set network, a directed network that reveals novel relationships among gene sets. A regulatory gene set network was constructed by using publicly available gene regulation data. A directed edge in a regulatory gene set network represents a regulatory relationship from one gene set to another. A regulatory gene set network was compared with another type of gene set network to show that the regulatory network provides additional information. To show that a regulatory gene set network is useful for understanding the underlying mechanism of a disease, an Alzheimer's disease (AD) regulatory gene set network was constructed. In addition, we developed the Pathway and Annotated Gene-set Electronic Repository (PAGER), an online systems biology tool for constructing and visualizing gene and gene set networks from multiple gene set collections. PAGER is available at http://discern.uits.iu.edu:8340/PAGER/. Global regulatory and global co-membership gene set networks were pre-computed. 
PAGER contains 166,489 gene sets, 92,108,741 co-membership edges, 697,221,810 regulatory edges, 44,188 genes, 651,586 unique gene regulations, and 650,160 unique gene interactions. PAGER provides several unique features, including constructing regulatory gene set networks, generating expanded gene set networks, and constructing gene networks within a gene set. However, tissue-specific or disease-specific information was not considered when constructing disease-specific networks, so the resulting networks might not accurately represent the high-level relationships among gene sets in the disease context. Therefore, our framework can be improved by collecting higher-resolution data, such as tissue-specific and disease-specific gene regulations and gene sets. In addition, experimental gene expression data can be applied to add more information to the gene set network. In the current version of PAGER, the size of gene and gene set networks is limited to 100 nodes due to browser memory constraints. Our future plan is to integrate internal gene or protein interactions inside pathways in order to support future systems biology studies. regulatory networks gene set networks systems biology Systems biology -- Methodology Gene regulatory networks -- Research Genomes -- Data processing Bioinformatics Structural bioinformatics -- Research Genetic regulation Biological systems -- Research
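As a rough illustration of the directed gene set edges described in the abstract above, the following minimal sketch lifts gene-level regulations to gene set level. The abstract does not state how PAGER scores or filters edges, so the rule used here (draw an edge from set A to set B when at least `min_support` regulator-target pairs run from a gene in A to a gene in B) and the names `regulatory_gene_set_network` and `min_support` are assumptions made purely for illustration.

```python
from collections import defaultdict


def regulatory_gene_set_network(gene_sets, regulations, min_support=1):
    """Build directed gene-set edges from gene-level regulations.

    gene_sets:   dict mapping gene set name -> set of gene identifiers
    regulations: iterable of (regulator_gene, target_gene) pairs
    min_support: hypothetical cut-off on supporting regulations (assumption)

    Returns a dict mapping (source_set, target_set) -> supporting pair count.
    """
    # Index each gene by the gene sets that contain it.
    membership = defaultdict(set)
    for name, genes in gene_sets.items():
        for gene in genes:
            membership[gene].add(name)

    # Count regulations that run from a member of one set to a member of another.
    support = defaultdict(int)
    for regulator, target in regulations:
        for src in membership.get(regulator, ()):
            for dst in membership.get(target, ()):
                if src != dst:
                    support[(src, dst)] += 1

    # Keep only edges with enough supporting regulations.
    return {edge: n for edge, n in support.items() if n >= min_support}


# Toy data, not taken from PAGER.
toy_sets = {"A": {"g1", "g2"}, "B": {"g3"}, "C": {"g4"}}
toy_regulations = [("g1", "g3"), ("g2", "g3"), ("g3", "g4")]
edges = regulatory_gene_set_network(toy_sets, toy_regulations)
```

Unlike a co-membership edge, the pair ("A", "B") here is ordered, which is the directionality the abstract says earlier gene set networks lack.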
42	Extraction de connaissances pour la modélisation tri-dimensionnelle de l'interactome structural / Knowledge-based approaches for modelling the 3D structural interactome Ghoorah, Anisah W. 22 November 2012 (has links) L'étude structurale de l'interactome cellulaire peut conduire à des découvertes intéressantes sur les bases moléculaires de certaines pathologies. La modélisation par homologie et l'amarrage de protéines ("protein docking") sont deux approches informatiques pour modéliser la structure tri-dimensionnelle (3D) d'une interaction protéine-protéine (PPI). Des études précédentes ont montré que ces deux approches donnent de meilleurs résultats quand des données expérimentales sur les PPIs sont prises en compte. Cependant, les données PPI ne sont souvent pas disponibles sous une forme facilement accessible, et donc ne peuvent pas être re-utilisées par les algorithmes de prédiction. Cette thèse présente une approche systématique fondée sur l'extraction de connaissances pour représenter et manipuler les données PPI disponibles afin de faciliter l'analyse structurale de l'interactome et d'améliorer les algorithmes de prédiction par la prise en compte des données PPI. Les contributions majeures de cette thèse sont de : (1) décrire la conception et la mise en oeuvre d'une base de données intégrée KBDOCK qui regroupe toutes les interactions structurales domaine-domaine (DDI); (2) présenter une nouvelle méthode de classification des DDIs par rapport à leur site de liaison dans l'espace 3D et introduit la notion de site de liaison de famille de domaines protéiques ("domain family binding sites" ou DFBS); (3) proposer une classification structurale (inspirée du système CATH) des DFBSs et présenter une étude étendue sur les régularités d'appariement entre DFBSs en terme de structure secondaire; (4) introduire une approche systématique basée sur le raisonnement à partir de cas pour modéliser les structures 3D des complexes protéiques à partir des DDIs connus. 
Une interface web (http://kbdock.loria.fr) a été développée pour rendre accessible le système KBDOCK / Understanding how the protein interactome works at a structural level could provide useful insights into the mechanisms of diseases. Comparative homology modelling and ab initio protein docking are two computational methods for modelling the three-dimensional (3D) structures of protein-protein interactions (PPIs). Previous studies have shown that both methods give significantly better predictions when they incorporate experimental PPI information. However, PPI information is often not available in an easily accessible form and so cannot be re-used by 3D PPI modelling algorithms. Hence, there is currently a need to develop a reliable framework to facilitate the reuse of PPI data. This thesis presents a systematic knowledge-based approach for representing, describing and manipulating 3D interactions to study PPIs on a large scale and to facilitate knowledge-based modelling of protein-protein complexes. The main contributions of this thesis are: (1) it describes an integrated database of non-redundant 3D hetero domain interactions; (2) it presents a novel method of describing and clustering domain-domain interactions (DDIs) according to the spatial orientations of the binding partners, thus introducing the notion of "domain family-level binding sites" (DFBS); (3) it proposes a structural classification of DFBSs similar to the CATH classification of protein folds, and it presents a study of secondary structure propensities of DFBSs and interaction preferences; (4) it introduces a systematic case-based reasoning approach to model on a large scale the 3D structures of protein complexes from existing structural DDIs. 
All these contributions have been made publicly available through a web server (http://kbdock.loria.fr) Fouille de données Classification Base de données relationnelle Programmation logique Bioinformatique structurale Interaction protéine-protéine Protein docking KBDOCK Knowledge discovery in databases (KDD) Data mining Classification Relational database Logic programming Structural bioinformatics Protein-protein interactions Protein docking KBDOCK 005.756 005.74
43	Multi-scale Modelling of HLA Diversity and Its Effect on Cytotoxic Immune Responses in Influenza H1N1 Infection Mukherjee, Sumanta January 2015 (has links) (PDF) Cytotoxic T-lymphocytes (CTLs) are important components of the adaptive immune system and function by scanning the intracellular environment so as to detect and destroy infected cells. CTL responses play a major role in controlling virus-infected cells, such as in HIV or influenza, and cells infected with intracellular bacteria, such as in tuberculosis. To do so they require the antigens to be presented to them, a role fulfilled by the major histocompatibility complex (MHC), commonly known as human leukocyte antigen or HLA molecules in humans. Recognition of antigenic peptides by class-1 HLA molecules is a prerequisite for triggering CTL immune responses. Individuals differ significantly in their ability to respond to an infection. Among the factors that govern the outcome of an infection, HLA polymorphism in the host is one of the most important. Despite a large body of work on HLA molecules, much remains to be understood about the relationship between HLA diversity and disease susceptibility. High complexity arises due to HLA allele polymorphism, extensive antigen cross-presentability, and host-pathogen heterogeneity. A given allele can recognize a number of different peptides from various pathogens and a given peptide can also bind to a number of different individuals. Thus, given the plurality in peptide-allele pairs and the large number of alleles, understanding the differences in recognition profiles and the implications that follow for disease susceptibilities requires mathematical modelling and computational analysis. The main objectives of the thesis were to understand heterogeneity in antigen presentation by HLA molecules at different scales and how that heterogeneity translates to variations in disease susceptibilities and finally the disease dynamics in different populations. 
Towards this goal, first the variations in HLA alleles need to be characterized systematically and their recognition properties understood. A structure-based classification of all known HLA class-1 alleles was therefore attempted. In the process, it was also of interest to see if understanding of sub-structures at the binding grooves of HLA molecules could help in high-confidence prediction of epitopes for different alleles. Next, the goal was to understand how HLA heterogeneity affects disease susceptibilities and disease spread in populations. This was studied at two different levels. The first involved modelling the HLA genotypes and CTL responses in different populations and assessing how they recognized epitopes from a given virus. The second involved modelling the disease dynamics given the predicted susceptibilities in different populations. Influenza H1N1 infection was used as a case study. The specific objectives addressed are: (a) to develop a classification scheme for all known HLA class-1 alleles that can explain epitope recognition profiles, and further to dissect the physicochemical features responsible for differences in peptide specificities; (b) to derive a statistical model from a large dataset of HLA-peptide complexes and use it to identify the interdependencies of residues at different peptide positions, thereby rationalizing HLA class-I allele binding specificity in greater detail; (c) to understand the effect of HLA heterogeneity on CTL-mediated disease response, for which a model of HLA genotypes for different populations was constructed and used to estimate disease response to H1N1 via the prediction of epitopes; and (d) to model disease dynamics in different populations with the knowledge of the CTL response-grouping and to evaluate the effect of heterogeneity on different vaccination strategies. 
Each of the four objectives listed above is described in turn in Chapters 2 to 5, followed by Chapter 6, which summarises the findings from the thesis and presents future directions. Chapter 1 presents an introduction to the importance of the function of HLA molecules, and describes structural bioinformatics as a discipline and the methods that are available for it. The chapter also describes different mathematical modelling strategies available to study host immune responses. Chapter 2 describes a novel method for structure-based hierarchical classification of HLA alleles. Presently, more than 2000 HLA class-I alleles are reported, and they vary only across peptide-binding grooves. The polymorphism they exhibit enables them to bind to a wide range of peptide antigens from diverse sources. HLA molecules and peptides present a complex molecular recognition pattern due to multiplicity in their associations. Thus, a powerful grouping scheme that not only provides an insightful classification, but is also capable of dissecting the physicochemical basis of recognition specificity, is necessary to address this complexity. The study reports a hierarchical classification of 2010 class-I alleles by using a systematic divisive clustering method. All-pair distances of alleles were obtained by comparing binding pockets in the structural models. By varying the similarity thresholds, a multilevel classification with 7 supergroups was derived, each further categorized to yield a total of 72 groups. An independent clustering scheme based only on the similarities in their epitope pools correlated highly with pocket-based clustering. Physicochemical feature combinations that best explain the basis for the observed clustering are identified. Mutual information calculated for the set of peptide ligands enables identification of binding site residues that contribute to peptide specificity. 
The grouping of HLA molecules achieved here will be useful for rational vaccine design, understanding disease susceptibilities and predicting risk of organ transplants. The results are presented in an interactive web server, http://proline.iisc.ernet.in/hlaclassify. In Chapter 3, the knowledge of structural features responsible for generating peptide recognition specificities is first analysed and then utilized for predicting T-cell epitopes for any class-1 HLA allele. Since identification of epitopes is critical and central to many of the questions in immunology, a study of several HLA-peptide complexes is carried out at the structural level and factors are identified that discriminate good binder peptides from those that are not. T-cell epitopes serve as molecular keys to initiate adaptive immune responses. Identification of T-cell epitopes is also a key step in rational vaccine design. Most available methods are driven by informatics and are critically dependent on experimentally obtained training data. Analysis of the training set from IEDB for several alleles indicates that sampling of the peptide space is extremely sparse, covering only a tiny fraction of all possible nonamers, and also heavily skewed, thus restricting the range of epitope prediction. A new epitope prediction method is therefore developed. The method has four distinct modules: (a) structural modelling, estimating statistical pair-potentials and constraint derivation, (b) implicit modelling and interaction profiling, (c) binding affinity prediction through feature representation and (d) use of graphical models to extract peptide sequence signatures to predict epitopes for HLA class I alleles. HLaffy is a novel and efficient epitope prediction method that predicts epitopes for any HLA class-1 allele by estimating binding strengths of peptide-HLA complexes, which is achieved through learning pair-potentials important for peptide binding. 
It stands on the strength of mechanistic understanding of HLA-peptide recognition and provides an estimate of the total ligand space for each allele. The method is made accessible through a web server, http://proline.biochem.iisc.ernet.in/HLaffy. In Chapter 4, the effect of genetic heterogeneity on disease susceptibilities is investigated. Individuals differ significantly in their ability to respond to an infection. Among the factors that govern the outcome of an infection, HLA polymorphism in the host is one of the most important. Despite a large body of work on HLA molecules, much remains to be understood about how host HLA diversity affects disease susceptibilities. The high complexity due to polymorphism, extensive cross-presentability among HLA alleles, and host and pathogen heterogeneity demands investigation through computational approaches. Host heterogeneity in a population is modelled through a molecular systems approach, starting with mining ‘big data’ from the literature. The insights derived from this are used to investigate the effect of heterogeneity in a population in terms of the impact it makes on recognizing a pathogen. A case study of influenza virus H1N1 infection is presented. For this, a comprehensive CTL immunome is defined by taking a consensus prediction from three distinct methods. Next, HLA genotypes are constructed for different populations using a probabilistic method. Epidemic incidences in general are observed to correlate with poor CTL response in populations. From this study, it is seen that large populations can be classified into a small number of groups called response-types, specific to a given viral strain. Individuals of a response type are expected to exhibit similar CTL responses. The extent of CTL responses varies significantly across different populations and increases with increasing genetic heterogeneity. 
Overall, the study presents a conceptual advance towards understanding how genetic heterogeneity influences disease susceptibility in individuals and in populations. Lists of top-ranking epitopes and proteins are also derived, ranked on the basis of conservation, antigenic cross-reactivity and population coverage, which provide ready short-lists for rational vaccine design (Flutope). Next, in Chapter 5, the effect of genetic heterogeneity on disease dynamics is investigated. A mathematical framework has been developed to incorporate the heterogeneity information in the form of response-types described in the previous chapter. The spread of a disease in a population is a complex process, controlled by various factors ranging from molecular-level recognition events to socio-economic causes. The ‘response-typing’ described in the previous chapter allows identification of distinct groups of individuals, each with a different extent of susceptibility to a given strain of the virus. Three different approaches are used for modelling. The first is an SIR model in which different response types are considered as partitions of each S, I and R compartment. Initially, SIR models are developed such that the S compartment is sub-divided into further groups based on the ‘response-types’ obtained in the previous chapter. This analysis shows an effect on infection sweep time, i.e., how long the infection stays in the population. A stochastic model incorporates the environmental noise due to random variation in population influx from birth, death or migration. The system is observed to show higher stability in the presence of genetic heterogeneity. As the contagion spreads only through direct host-to-host contact, the topology of the contact network plays a major role in determining the extent of disease dynamics. 
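The SIR model with a partitioned S compartment mentioned above can be sketched roughly as follows. This is not the thesis's model: the per-group susceptibility weights, the single shared I and R compartments, the forward-Euler integration and all parameter values are assumptions chosen only to show the idea of response-type partitions.

```python
def simulate_sir_response_types(s0, i0, r0, susceptibility, beta, gamma,
                                dt=0.1, steps=1000):
    """Forward-Euler SIR simulation with S split into response-type groups.

    s0:             initial susceptible count for each response type
    susceptibility: hypothetical relative susceptibility of each type (assumption)
    beta, gamma:    transmission and recovery rates (illustrative values only)

    Returns the final (s, i, r), where s is a list with one entry per type.
    """
    s = [float(x) for x in s0]
    i, r = float(i0), float(r0)
    n = sum(s) + i + r  # closed population: no births, deaths or migration
    for _ in range(steps):
        # New infections per group scale with that group's susceptibility.
        new_infections = [beta * w * sk * i / n * dt
                          for w, sk in zip(susceptibility, s)]
        recoveries = gamma * i * dt
        s = [sk - x for sk, x in zip(s, new_infections)]
        i += sum(new_infections) - recoveries
        r += recoveries
    return s, i, r


# Two response types of nearly equal size; the second is assumed less susceptible.
final_s, final_i, final_r = simulate_sir_response_types(
    s0=[500, 499], i0=1, r0=0, susceptibility=[1.0, 0.3], beta=0.5, gamma=0.1)
```

In this toy run the more susceptible group is depleted faster than the less susceptible one, which is the kind of group-level difference that response-typing is meant to capture.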
An agent-based computational framework has been developed for modelling disease spread by considering the spatial distribution of the agents, their movement patterns and the resulting contact probabilities. The agent-based model (ABM) incorporates the temporal patterns of contacts. The ABM is based on a city block model and captures movement of individuals parametrically. A new concept of system ‘characteristic time’ has been introduced in the context of a time-evolving network. ‘Characteristic time’ is the minimum time required to ensure that every individual is connected to all other individuals in the time-aggregated contact network. For any given temporal system, disease time must exceed ‘characteristic time’ in order for the disease to spread throughout the population. A shorter ‘characteristic time’ of the system is suggestive of faster spread of the disease. A disease spread network is constructed which shows how the disease spreads from one infected individual to others in the city, given the contact rules and their relative susceptibilities to that viral strain. A high degree of population heterogeneity is seen to result in longer disease residence time. Susceptible individuals preferentially get infected first, thereby exposing more susceptible individuals to the disease. Vaccination strategies are derived from the model, which indicates that vaccinating only 20% of the agents, those who are hub nodes or highly central nodes and who also have a high degree to susceptible agents, leads to high levels of herd immunity and can confer protection on the rest of the population. Overall, the thesis has provided a biologically meaningful classification of all known HLA class-1 alleles and has unravelled the physicochemical basis for their peptide recognition specificities. The thesis also presents a new algorithm for estimating peptide binding affinities and consequently predicting epitopes for all alleles. 
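The ‘characteristic time’ defined above lends itself to a short sketch. The version below reads "connected to all other individuals" as "reachable through some path in the time-aggregated contact network", which is an interpretation rather than something the abstract spells out. The function name and the `(time, a, b)` contact format are hypothetical, and at least two individuals are assumed.

```python
def characteristic_time(num_individuals, contacts):
    """Earliest time at which the time-aggregated contact network is connected.

    contacts: iterable of (time, a, b) with individuals numbered 0..n-1
              (hypothetical input format).
    Returns the time of the contact that joins the last two components,
    or None if the aggregated network never becomes connected.
    """
    parent = list(range(num_individuals))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_individuals
    # Add contacts in chronological order, merging components as they occur.
    for t, a, b in sorted(contacts):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
            if components == 1:
                return t
    return None


# Toy contact list, not taken from the thesis.
toy_contacts = [(3, 0, 1), (1, 1, 2), (5, 2, 3), (2, 0, 2)]
t_char = characteristic_time(4, toy_contacts)
```

Under this reading, an infection whose duration is shorter than `t_char` cannot reach every individual, in line with the abstract's statement that disease time must exceed the characteristic time.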
Finally, the thesis presents a conceptual advance in relating HLA diversity to disease susceptibilities and explains how different populations can respond differently to a given infection. A case study with the influenza H1N1 virus identified the populations that are most susceptible and those that are least susceptible, in the process identifying important epitopes and responder alleles and providing important pointers for vaccine design. The influence of heterogeneity and response-typing on disease dynamics is also presented for influenza H1N1 infection, which has led to the rational identification of effective vaccination strategies. The methods and concepts developed here are fairly generic and can be adapted easily for studying other infectious diseases as well. Three new web resources, (a) HLAclassify, (b) HLaffy and (c) Flutope, have been developed, which host pre-computed results and also allow a user to run interactive queries for a specific allele, peptide or pathogenic genome sequence. Influenza H1N1 Infection Cytotoxic T-lymphocytes (CTLs) Human Leucocyte Antigen (HLA) Cytotoxic Immune Responses Modeling Human Immune System Peptide Binding Genetic Heterogeneity Structural Bioinformatics HLAClassify HLAffy Flutope Disease Spreader Network (DSN) Disease Dynamics Mathematics
44	Structural and Evolutionary Studies on Bio-Molecular Complexes Sudha, G January 2014 (has links) (PDF) No description available. Bio-Molecular Complexes Protein Complexes Protein-Protein Interactions Human Casein Kinase Hepatitis C Virus (HCV) Proteins Adeno Associated Virus (AAV) Proteins Protein Structural Bioinformatics Homomeric Proteins Heteromers Paralogous Proteins Biomolecular Structure Computational Structural Biology AAV2 Capsid NS3 protease HCV IRES Molecular Biophysics
45	Critical assessment of predicted interactions at atomic resolution Mendez Giraldez, Raul 21 September 2007 (has links) Molecular Biology has allowed the characterization and manipulation of the molecules of life in the wet lab. The structures of those macromolecules are also being continuously elucidated. During the last decades of the past century, there was an increasing interest in studying how the different genes are organized in different organisms (‘genomes’) and how those genes are expressed into proteins to achieve their functions. Currently the sequences for many genes over several genomes have been determined. In parallel, the efforts to determine the structure of the proteins coded by those genes go on. However, it is experimentally much harder to obtain the structure of a protein than just its sequence. For this reason, the number of protein structures available in databases is an order of magnitude or so lower than the number of protein sequences. Furthermore, in order to understand how living organisms work at the molecular level we need information about the interactions of those proteins. Elucidating the structure of protein macromolecular assemblies is more difficult still. To that end, the use of computers to predict the structure of these complexes has gained interest over the last decades. The main subject of this thesis is the evaluation of currently available computational methods to predict protein – protein interactions and build an atomic model of the complex. The core of the thesis is the evaluation protocol I have developed at Service de Conformation des Macromolécules Biologiques et de Bioinformatique, Université Libre de Bruxelles, and its computer implementation. This method has been used extensively to evaluate the results of blind protein – protein interaction prediction in the context of the world-wide experiment CAPRI, which have been thoroughly reviewed in several publications [1-3]. 
In this experiment the structure of a protein complex (‘the target’) had to be modeled starting from the coordinates of the isolated molecules, prior to the release of the structure of the complex (this is commonly referred to as ‘docking’). The assessment protocol lets us compute some parameters to rank docking models according to their quality, into four categories: ‘Highly Accurate’, ‘Medium Accurate’, ‘Acceptable’ and ‘Incorrect’. The efficiency of our evaluation and ranking is clearly shown, even for borderline cases between categories. The correlation of the ranking parameters is analyzed further. In the same section where the evaluation protocol is presented, the ranking that participants give to their predictions is also studied, since good solutions are often not easily recognized among the pool of computer-generated decoys. An overview of the CAPRI results is given per target structure and per participant, with regard to the computational method used and the difficulty of the complex. Also in CAPRI there is a new ongoing experiment about scoring models previously and anonymously generated by other participants (the ‘Scoring’ experiment). Its promising results are also analyzed with respect to the original CAPRI experiment. The Scoring experiment was a step towards the use of combined methods to predict the structure of protein – protein complexes. We discuss here its possible application to predict the structure of protein complexes, based on a clustering study of the different results. In the last chapter of the thesis, I present the preliminary results of an ongoing study on the conformational changes in protein structures upon complexation, as those rearrangements pose serious limitations to current computational methods for predicting the structure of protein complexes. Protein structures are classified according to the magnitude of their conformational rearrangement, and the involvement of interfaces and particular secondary structure elements is discussed. 
At the end of the chapter, some guidelines and future work are proposed to complete the survey. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished Sciences exactes et naturelles Chimie Structural bioinformatics Molecular structure Proteins -- Conformation Proteins -- Structure Amino acids Bio-informatique structurale Structure moléculaire Protéines -- Conformation Protéines -- Structure Acides aminés protein - protein complex protein - protein interaction root mean square deviation docking solvent accessible area conformational change rmsd
46	Variable selection and structural discovery in joint models of longitudinal and survival data He, Zangdong January 2014 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Joint models of longitudinal and survival outcomes have been used with increasing frequency in clinical investigations. Correct specification of fixed and random effects, as well as their functional forms is essential for practical data analysis. However, no existing methods have been developed to meet this need in a joint model setting. In this dissertation, I describe a penalized likelihood-based method with adaptive least absolute shrinkage and selection operator (ALASSO) penalty functions for model selection. By reparameterizing variance components through a Cholesky decomposition, I introduce a penalty function of group shrinkage; the penalized likelihood is approximated by Gaussian quadrature and optimized by an EM algorithm. The functional forms of the independent effects are determined through a procedure for structural discovery. Specifically, I first construct the model by penalized cubic B-spline and then decompose the B-spline to linear and nonlinear elements by spectral decomposition. The decomposition represents the model in a mixed-effects model format, and I then use the mixed-effects variable selection method to perform structural discovery. Simulation studies show excellent performance. A clinical application is described to illustrate the use of the proposed methods, and the analytical results demonstrate the usefulness of the methods. Joint models Mixed effect selection Structural discovery Adaptive LASSO Gaussian quadrature EM algorithm Regression analysis -- Data processing Numerical analysis -- Data processing Spectral theory (Mathematics) Estimation theory -- Analysis Statistics -- Data processing Calculus of variations Structural bioinformatics Parameter estimation
47	Joint models for longitudinal and survival data Yang, Lili 11 July 2014 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Epidemiologic and clinical studies routinely collect longitudinal measures of multiple outcomes. These longitudinal outcomes can be used to establish the temporal order of relevant biological processes and their association with the onset of clinical symptoms. In the first part of this thesis, we proposed to use bivariate change point models for two longitudinal outcomes with a focus on estimating the correlation between the two change points. We adopted a Bayesian approach for parameter estimation and inference. In the second part, we considered the situation in which a time-to-event outcome is also collected along with multiple longitudinal biomarkers measured until the occurrence of the event or censoring. Joint models for longitudinal and time-to-event data can be used to estimate the association between the characteristics of the longitudinal measures over time and survival time. We developed a maximum-likelihood method to jointly model multiple longitudinal biomarkers and a time-to-event outcome. In addition, we focused on predicting conditional survival probabilities and evaluating the predictive accuracy of multiple longitudinal biomarkers in the joint modeling framework. We assessed the performance of the proposed methods in simulation studies and applied the new methods to data sets from two cohort studies. / National Institutes of Health (NIH) Grants R01 AG019181, R24 MH080827, P30 AG10133, R01 AG09956. joint models longitudinal data survival data bivariate change point models prediction Bayesian method EM algorithm Biologically-inspired computing Probability measures Expectation-maximization algorithms Failure time data analysis Numerical analysis -- Data processing Clinical trials -- Statistical methods
