191
A Rank Score Model of Variants Prioritization for Rare Disease
Liu, Nanxing (January 2023)
The diagnosis of genetic illnesses has undergone a revolution with advancements in sequencing technology. Next-generation sequencing (NGS) has become a standard practice in genetic diagnostics, enabling the identification of various genetic variations. However, distinguishing causative variants from a vast number of benign background variants presents a significant challenge. This study focuses on improving the rank score model used in genetic rare-disease diagnostics at a clinical genomics facility in Stockholm. The objective is to develop a more effective and optimized model through the utilization of exploratory data analysis techniques and machine learning methods, investigating the strengths and weaknesses of various existing annotation scores to identify suitable features and enhance the model's classification performance. The research methodology involved analyzing publicly available ClinVar data, utilizing statistical methods such as principal component analysis (PCA), heatmap, Welch's t-test, and Chi-Square test to evaluate the correlation, patterns, and classification abilities of different variant types. In addition, the study employed a machine learning approach that combines allele frequency filtering and logistic regression trained on both public and in-house datasets to prioritize single nucleotide variants (SNVs) and insertions/deletions (InDels). The resulting model assigns binary class labels (benign or pathogenic) and provides scores for variant classification. Promising performance was observed in both the ClinVar dataset and the unique patient datasets, demonstrating the model's potential for clinical application. The findings of this study hold the potential to enhance genetic rare-disease diagnostics and contribute to advancements in rare disease research.
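As an illustration of one of the statistical tools named in the abstract, the following is a minimal sketch of Welch's unequal-variance t-test applied to a single annotation score. The score values and the variable names `benign` and `pathogenic` are hypothetical and are not taken from the thesis; the sketch only shows the general form of the test, not the author's pipeline.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / se2 ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical annotation scores for two variant classes (illustrative only)
benign = [0.10, 0.20, 0.15, 0.30, 0.25]
pathogenic = [0.70, 0.90, 0.80, 0.60]

t_stat, dof = welch_t(pathogenic, benign)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

A large absolute t value would suggest that the score separates the two classes; a p-value would additionally require the t distribution, which is omitted here to keep the sketch dependency-free.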

192
Identifying Graph Characteristics in Growing Vascular Networks
Plummer, Christopher Finn (January 2024)
One of the ways that a vascular network grows is through the process of angiogenesis, whereby a new blood vessel forms as a branch from an existing vessel towards an area which is stimulating vascular growth. Due to the demands for nutrients and waste transport, growing tumour cells will access the surrounding vascular network by inducing angiogenesis. Once the tumour is connected with the vascular system it can grow further and colonize distant organs. Given the critical nature of this step in tumour development, there is a demand for mathematical and computational models to provide an understanding of the process for treatment in predictive medicine. These models allow us to generate vascular networks that demonstrate similar behaviour to that of the observed networks; however, there is a lack of quantifiable measures of similarity between generated networks, or between a generated and a real network. Furthermore, there is not an established way to determine which measures hold the most relevance to distinguishing similarity. To construct such a measure we transform our generated vascular networks into an abstract graph representation which allows exploration of the plethora of graph centralities. We propose to determine the relevance of a centrality by finding one that acts as a synthetic likelihood function for estimating the model's parameters with minimal error. Evaluating the relevance of many centralities, it is then possible to suggest which centralities should be used to quantitatively determine similarity. This allows for a way to measure how realistic a model's growth is, and if given sufficient data, to distinguish between regular and tumour-induced angiogenesis and use it within cancer screening.

193
Signal Transduction in Diabetic Nephropathy
Simonson, Michael Scott (27 August 2012)
No description available.

194
Discovery and Analysis of Genomic Patterns: Applications to Transcription Factor Binding and Genome Rearrangement
SINHA, AMIT U. (22 April 2008)
No description available.

195
Lateral Gene Transfer in Operons and Its Effects on Neighbouring Genes
Pasha, Asher (10 1900)
Prokaryotes evolve, in part, by lateral gene transfer (LGT). This transfer of genetic material is likely important in the evolution of operons, groups of genes that are transcribed as a single mRNA. Genes that are transferred may then be integrated into genomes by homologous recombination. In this thesis, it was proposed that homologous recombination is the mechanism of integration of laterally transferred genes into operons. To investigate this proposal, a phylogenetic tree of Bacillus was inferred using DNA sequence alignments. LGT was inferred using a parsimony algorithm, and operons were inferred using OperonDB. Homologous recombination breakpoints were identified by permutation tests, GENECONV, and the maximum chi-square algorithm. The results indicate that there is evidence for integration of functionally annotated genes into operons by homologous recombination. Several laterally transferred genes have recombination breakpoints before the start codon or after the stop codon of the genes. It was also proposed in this thesis that LGT causes an increase in the rate of evolution of genes that are neighbours of laterally transferred genes. To investigate this proposal, genes that are neighbours of laterally transferred genes in Bacillus were identified. These genes were classified as upstream or downstream of the LGT event. Genes that are not neighbours of laterally transferred genes were also identified as a control. Selection and the rate of evolution were studied using maximum likelihood models implemented in CodeML of PAML. Genes under positive selection were inferred using likelihood ratio tests. The results indicate that only a few neighbouring genes were under positive selection, and the rate of evolution of the neighbouring genes was slightly higher than that of the non-neighbouring genes. The higher rates of evolution of the neighbouring genes are likely due to relaxed selection on these genes. / Master of Biological Science (MBioSci)

196
Stochastic Heuristic Program for Target Motif Identification
Zhang, Xian (12 August 1999)
Identifying motifs that are "close" to one or more substrings in each sequence in a given set of sequences and hence characterize that set is an important problem in computational biology. The target motif identification problem requires motifs that characterize one given set of sequences but are far from every substring in another given set of sequences. This problem is NP-hard and hence is unlikely to have efficient optimal solution algorithms. In this thesis, we propose a set of modifications to one of the most popular stochastic heuristics for finding motifs, Gibbs Sampling [LAB+93], which allow this heuristic to detect target motifs. We also present the results of four simulation studies and tests on real protein datasets which suggest that these modified heuristics are very good at (and are even, in some cases, necessary for) detecting target motifs. / Thesis / Master of Science (MSc)
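The target motif condition described above can be sketched as a simple check. This is only an illustration under stated assumptions: the abstract does not say which distance measure or thresholds are used, so Hamming distance and the parameters `close` and `far` are hypothetical choices, and every sequence is assumed to be at least as long as the motif. It verifies a candidate motif; it is not the modified Gibbs Sampling heuristic itself.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(motif, seq):
    """Smallest Hamming distance between the motif and any substring of seq."""
    k = len(motif)
    return min(hamming(motif, seq[i:i + k]) for i in range(len(seq) - k + 1))


def is_target_motif(motif, positives, negatives, close, far):
    """True if the motif is within `close` of some substring in every positive
    sequence and at least `far` from every substring of every negative one."""
    near_all = all(min_distance(motif, s) <= close for s in positives)
    far_all = all(min_distance(motif, s) >= far for s in negatives)
    return near_all and far_all


# Hypothetical toy sequences (illustrative only)
positives = ["TTACGTT", "ACCGG"]
negatives = ["TTTTTT"]
print(is_target_motif("ACG", positives, negatives, close=1, far=2))
```

Checking a given motif in this way is cheap; the hardness noted in the abstract lies in searching the space of candidate motifs, which is what motivates a stochastic heuristic.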

197
Computational Tools for Molecular Networks in Biological Systems
Zwolak, Jason W. (07 January 2005)
Theoretical molecular biologists try to understand the workings of cells through mathematics. Some theoreticians use systems of ordinary differential equations (ODEs) as the basis for mathematical modelling of molecular networks. This thesis develops algorithms for estimating molecular reaction rate constants within those mathematical models by fitting the models to experimental data. An additional step is taken to fit non-timecourse experimental data: transformations must be performed on the ODE solutions before the experimental and simulation data are similar and therefore comparable. VTDIRECT, a deterministic direct search method, is used for global estimation, and ODRPACK, a trust-region Levenberg-Marquardt-based method, is used for local estimation of rate constants. One such transformation performed on the ODE solutions determines the value of the steady state of the ODE solutions. A new algorithm was developed that finds all steady state solutions of the ODE system given that the system has a special structure (e.g., the right-hand sides of the ODEs are rational functions). Also, since the rate constants in the models cannot be negative and may have other restrictions on their values, ODRPACK was modified to handle bound constraints. The new Fortran 95 version of ODRPACK is named ODRPACK95. / Ph. D.

198
In silico cell biology and biochemistry: a systems biology approach
Camacho, Diogo Mayo (29 June 2007)
In the post-"omic" era the analysis of high-throughput data is regarded as one of the major challenges faced by researchers. One focus of this data analysis is uncovering biological network topologies and dynamics. It is believed that this kind of research will allow the development of new mathematical models of biological systems as well as aid in the improvement of already existing ones. The work that is presented in this dissertation addresses the problem of the analysis of highly complex data sets with the aim of developing a methodology that will enable the reconstruction of a biological network from time series data through an iterative process.
The first part of this dissertation relates to the analysis of existing methodologies that aim at inferring network structures from experimental data. This spans the use of statistical tools such as correlations analysis (presented in Chapter 2) to more complex mathematical frameworks (presented in Chapter 3). A novel methodology that focuses on the inference of biological networks from time series data by least squares fitting will then be introduced. Using a set of carefully designed inference rules one can gain important information about the system which can aid in the inference process. The application of the method to a data set from the response of the yeast Saccharomyces cerevisiae to cumene hydroperoxide is explored in Chapter 5. The results show that this method can be used to generate a coarse-level mathematical model of the biological system at hand. Possible developments of this method are discussed in Chapter 6. / Ph. D.

199
Unstable Communities in Network Ensembles
Rahman, Md Ahsanur (07 January 2016)
Ensembles of graphs arise naturally in many applications, for example, the temporal evolution of social contacts or computer communications, tissue-specific protein interaction networks, annual citation or co-authorship networks in a field, or a family of high-likelihood Bayesian networks inferred from systems biology data. Several techniques have been developed to analyze such ensembles. A canonical problem is that of computing communities that are persistent across the ensemble. This problem is usually formulated as one of computing dense subgraphs (communities) that are frequent, i.e., appear in many graphs in the ensemble.
In this thesis, we seek to find "unstable communities" which are the antithesis of frequent, dense subgraphs. Informally, an unstable community is a set of nodes that induces highly-varying subgraphs in the ensemble. In other words, the graphs in the ensemble disagree about the precise pairwise connections among these nodes. The primary contribution of this dissertation is to introduce the concept of unstable communities as a novel problem in the field of graph mining. Specifically, it presents three approaches to mathematically formulate the concept of unstable communities, devises algorithms for computing such communities in a given ensemble of networks, and shows the usefulness of this concept in a variety of settings.
Our first definition of unstable community relies on two parameters: the first ensures that a node set induces several different subgraphs in the ensemble and the second guarantees that each of these subgraphs occurs in a large number of graphs in the ensemble. We present two algorithms to enumerate unstable communities that match this definition. The first approach, ClustMiner, is a heuristic that transforms the problem into one of computing dense subgraphs in a single graph that summarizes the ensemble. The second approach, UCMiner, is guaranteed to enumerate all maximal unstable communities correctly. We apply both approaches to systems biology datasets to demonstrate that UCMiner is superior to ClustMiner in the sense that ClustMiner's output contains node sets that are not unstable while also missing several communities computed by UCMiner. We find several node sets that capture the uncertain connectivity of genes in relevant protein complexes, suggesting that further experiments may be required to precisely discern their interaction patterns.
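The two-parameter idea in the first definition can be sketched as follows. This is a hedged illustration, not the thesis's formal definition or either of its algorithms: the parameter names `min_patterns` and `min_support`, the edge representation, and the toy ensemble are all assumptions made here. The sketch only tests one given node set; it does not enumerate maximal unstable communities as UCMiner is said to do.

```python
from collections import Counter


def edge(u, v):
    """Undirected edge represented as a frozenset of its two endpoints."""
    return frozenset((u, v))


def induced_edges(edges, nodes):
    """Edges of one graph whose endpoints both lie inside `nodes`."""
    return frozenset(e for e in edges if e <= nodes)


def is_unstable(ensemble, nodes, min_patterns=2, min_support=2):
    """True if `nodes` induces at least `min_patterns` distinct subgraphs,
    each of which occurs in at least `min_support` graphs of the ensemble."""
    nodes = frozenset(nodes)
    counts = Counter(induced_edges(g, nodes) for g in ensemble)
    frequent = [p for p, c in counts.items() if c >= min_support]
    return len(frequent) >= min_patterns


# Hypothetical ensemble of four small graphs (illustrative only)
ensemble = [
    {edge("a", "b"), edge("b", "c")},
    {edge("a", "b"), edge("b", "c"), edge("c", "d")},
    {edge("a", "c")},
    {edge("a", "c"), edge("c", "d")},
]

print(is_unstable(ensemble, {"a", "b", "c"}))  # graphs disagree on these nodes
print(is_unstable(ensemble, {"b", "d"}))       # all graphs agree (no edge)
```

In this toy ensemble the nodes a, b, and c are wired in two different ways, each seen in two graphs, whereas b and d are never connected in any graph, so only the first node set meets both thresholds.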
Our second and third definitions of unstable community rely on a novel concept of (scaled) subgraph divergence, a formulation that uses the concept of relative entropy to measure the instability of a community. We propose another algorithm, SDMiner, that can exactly enumerate all maximal unstable communities with small (scaled) subgraph divergence. We perform extensive experiments on social network datasets to show that we can discover UCs that capture the main structural variations of the given set of networks and also provide us with interesting and relevant insights about these datasets. / Ph. D.
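Relative entropy, on which the abstract says subgraph divergence is built, can be sketched for discrete distributions as below. How the thesis constructs the distributions and applies the scaling is not stated in the abstract, so this shows only the standard Kullback-Leibler quantity; the example distributions are hypothetical, and `q` is assumed to be positive wherever `p` is.

```python
from math import log2


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    `p` and `q` are dicts mapping outcomes to probabilities; q[x] must be
    positive for every outcome x with p[x] > 0.
    """
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)


# Hypothetical distributions over two edge states (illustrative only)
p = {"present": 0.5, "absent": 0.5}
q = {"present": 0.25, "absent": 0.75}

print(round(relative_entropy(p, q), 4))
```

The divergence is zero when the two distributions coincide and grows as they disagree, which is the property that makes it a natural ingredient for quantifying how much the graphs in an ensemble differ on a node set.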

200
Deterministic Parallel Global Parameter Estimation for a Model of the Budding Yeast Cell Cycle
Panning, Thomas D. (18 August 2006)
Two parallel deterministic direct search algorithms are combined to find improved parameters for a system of differential equations designed to simulate the cell cycle of budding yeast. Comparing the model simulation results to experimental data is difficult because most of the experimental data is qualitative rather than quantitative. An algorithm to convert simulation results to mutant phenotypes is presented. Vectors of the 143 parameters defining the differential equation model are rated by a discontinuous objective function. Parallel results on a 2200 processor supercomputer are presented for a global optimization algorithm, DIRECT, a local optimization algorithm, MADS, and a hybrid of the two. A second formulation is presented that uses a system of smooth inequalities to evaluate the phenotype of a mutant. Preliminary results of this formulation are given. / Master of Science