1 |
Design of information systems in computational genomics / Croft, Larry. January 2002 (has links) (PDF)
Thesis (Ph.D.) - University of Queensland, 2004. / Includes bibliography.
|
2 |
Charting the single-cell transcriptional landscape of haematopoiesis / Hamey, Fiona Kathryn January 2019 (has links)
High turnover in the haematopoietic system is sustained by stem and progenitor cells, which divide and mature to produce the range of cell types present in the blood. This complex system has long served as a model of differentiation in adult stem cell systems and its study has important clinical relevance. Maintaining a healthy blood system requires regulation of haematopoietic cell fate decisions, with severe dysregulation of these fate choices observed in diseases such as leukaemia. As transcriptional regulation is known to play a role in this regulation, the gene expression of many haematopoietic progenitors has been measured. However, many of the classic populations are actually extremely heterogeneous in both expression and function, highlighting the need for characterising the haematopoietic progenitor compartment at the level of individual cells. The first aim of this work was to chart the single-cell transcriptional landscape of the haematopoietic stem and progenitor cell (HSPC) compartment. To build a comprehensive map of this landscape, 1,654 HSPCs from mouse bone marrow were profiled using single-cell RNA-sequencing. Analysis of these data generated a useful resource, and reconstructed changes in gene expression, cell cycle and RNA content along differentiation trajectories to three blood lineages. To investigate how single-cell gene expression can be used to learn about regulatory relationships, data measuring the expression of 41 genes (including 31 transcription factors) in 2,167 stem and progenitor cells were used to construct Boolean gene regulatory network models describing the regulation of differentiation from stem cells to two different progenitor populations. The inferred relationships revealed positive regulation of Nfe2 and Cbfa2t3h by Gata2 that was unique to differentiation towards megakaryocyte-erythroid progenitors, which was subsequently experimentally validated. 
The next study focused on investigating the link between transcriptional and functional heterogeneity within blood progenitor populations. Single-cell profiles of human cord blood progenitors revealed a continuum of lympho-myeloid gene expression. Culture assays performed to assess the functional output of single cells found both unilineage and bilineage output and, by investigating the link between surface marker expression and function, a new sorting strategy was devised that was able to enrich for function within conventional lympho-myeloid progenitor sorting gates. The final project aimed to study changes to the HSPC compartment in a perturbed state. A droplet-based single-cell RNA-sequencing dataset of 44,802 cells was analysed to identify entry points to eight blood lineages and to characterise gene expression changes in this transcriptional landscape. Mapping single-cell data from W41/W41 Kit mutant mice highlighted quantitative shifts in progenitor populations such as a reduction in mast cell progenitors and an increase towards more mature progenitors along the erythroid trajectory. Differential gene expression identified upregulation of stress response and a reduction of apoptosis during erythropoiesis as potential compensatory mechanisms in the Kit mutant progenitors. Together this body of work characterises the HSPC compartment at single-cell level and provides methods for how single-cell data can be used to discover regulatory relationships, link expression heterogeneity to function, and investigate changes in the transcriptional landscape in a perturbed environment.
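The Boolean gene regulatory network models mentioned in this abstract can be illustrated with a minimal sketch. Only the two activations by Gata2 (of Nfe2 and Cbfa2t3h) come from the abstract; the rule that holds Gata2 at its current value, the synchronous update scheme, and the names `RULES`, `step` and `run_to_fixed_point` are assumptions made for illustration and are not the thesis's inferred network.

```python
from typing import Callable, Dict

State = Dict[str, bool]

# Hypothetical update rules. The activation of Nfe2 and Cbfa2t3h by Gata2 is
# taken from the abstract; keeping Gata2 constant is an assumption.
RULES: Dict[str, Callable[[State], bool]] = {
    "Gata2": lambda s: s["Gata2"],
    "Nfe2": lambda s: s["Gata2"],
    "Cbfa2t3h": lambda s: s["Gata2"],
}


def step(state: State) -> State:
    """Synchronous update: every gene's next value is computed from the current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def run_to_fixed_point(state: State, max_steps: int = 10) -> State:
    """Apply step() until the state stops changing or max_steps is reached."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    return state
```

In this toy network, switching Gata2 on drives both targets on after one update, and switching it off drives both off.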
|
3 |
Finding motif pairs from protein interaction networks / Siu, Man-hung. January 2008 (has links)
Thesis (M. Phil.)--University of Hong Kong, 2008. / Includes bibliographical references (leaf 63-69) Also available in print.
|
4 |
Metabolic pathway analysis via integer linear programming / Planes, Francisco J. January 2008 (has links)
The understanding of cellular metabolism has been an intriguing challenge in classical cellular biology for decades. Essentially, cellular metabolism can be viewed as a complex system of enzyme-catalysed biochemical reactions that produces the energy and material necessary for the maintenance of life. In modern biochemistry, it is well known that these reactions group into metabolic pathways so as to accomplish a particular function in the cell. The identification of these metabolic pathways is a key step to fully understanding the metabolic capabilities of a given organism. Typically, metabolic pathways have been elucidated via experimentation on different organisms. However, experimental findings are generally limited and fail to provide a complete description of all pathways. For this reason it is important to have mathematical models that allow us to identify and analyze metabolic pathways in a computational fashion. This is precisely the main theme of this thesis. We first describe, review and discuss existing mathematical/computational approaches to metabolic pathways, namely stoichiometric and path finding approaches. Then, we present our initial mathematical model named the Beasley-Planes (BP) model, which significantly improves on previous stoichiometric approaches. We also illustrate a successful application of the BP model to optimally disrupt metabolic pathways. The main drawback of the BP model is that it needs extra pathway knowledge as input. This is especially inappropriate if we wish to detect unknown metabolic pathways. As opposed to the BP model and stoichiometric approaches, this issue is not found in path finding approaches. For this reason a novel path finding approach is built and examined in detail. This analysis serves as the inspiration for the Improved Beasley-Planes (IBP) model. The IBP model incorporates elements of both stoichiometric and path finding approaches. 
Though somewhat less accurate than the BP model, the IBP model solves the issue of extra pathway knowledge. Our research clearly demonstrates that there is a significant chance of developing a mathematical optimisation model that underlies many/all metabolic pathways.
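As background to the stoichiometric approaches mentioned in this abstract, the sketch below shows the generic steady-state condition they usually build on: a flux vector v is balanced when S·v = 0 for every internal metabolite, where S is the stoichiometric matrix. It is not the BP or IBP model. The three-reaction toy network `TOY_S` and the function names are hypothetical.

```python
from typing import List, Sequence


def net_production(S: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Return S·v: the net production rate of each internal metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S: Sequence[Sequence[float]], v: Sequence[float], tol: float = 1e-9) -> bool:
    """A flux vector is balanced when no internal metabolite accumulates or depletes."""
    return all(abs(x) <= tol for x in net_production(S, v))


# Hypothetical toy network: r1 imports A, r2 converts A to B, r3 exports B.
# Rows are the internal metabolites A and B; columns are r1, r2, r3.
TOY_S = [
    [1, -1, 0],   # A: produced by r1, consumed by r2
    [0, 1, -1],   # B: produced by r2, consumed by r3
]
```

With equal flux through all three reactions the toy network is balanced. Setting the export flux to zero leaves B accumulating, so the condition fails.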
|
5 |
Integrative methods for gene data analysis and knowledge discovery on the case study of KEDRI's brain gene ontology: a thesis submitted to Auckland University of Technology in partial fulfilment of the requirements for the degree of Master of Computer and Information Sciences, 2008 / Wang, Yuepeng January 2008 (has links)
Thesis (MCIS) -- AUT University, 2008. / Includes bibliographical references. Also held in print ( 131 leaves : ill. ; 30 cm.) in the Archive at the City Campus (T 616.99404200285 WAN)
|
6 |
Efficient Algorithms for Comparing, Storing, and Sharing Large Collections of Phylogenetic Trees / Matthews, Suzanne 2012 May 1900 (has links)
Evolutionary relationships between a group of organisms are commonly summarized in a phylogenetic (or evolutionary) tree. The goal of phylogenetic inference is to infer the best tree structure that represents the relationships between a group of organisms, given a set of observations (e.g. molecular sequences). However, popular heuristics for inferring phylogenies output tens to hundreds of thousands of equally weighted candidate trees. Biologists summarize these trees into a single structure called the consensus tree. The central assumption is that the information discarded has less value than the information retained. But, what if this assumption is not true?
In this dissertation, we demonstrate the value of retaining and studying tree collections. We also conduct an extensive literature search that highlights the rapid growth of trees produced by phylogenetic analysis. Thus, high performance algorithms are needed to accommodate this increasing production of data. We created several efficient algorithms that allow biologists to easily compare, store and share tree collections over tens to hundreds of thousands of phylogenetic trees. Universal hashing is central to all these approaches, allowing us to quickly identify the shared evolutionary relationships contained in tree collections. Our algorithms MrsRF and Phlash are the fastest in the field for comparing large collections of trees. Our algorithm TreeZip is the most efficient way to store large tree collections. Lastly, we developed Noria, a novel version control system that allows biologists to seamlessly manage and share their phylogenetic analyses.
Our work has far-reaching implications for both the biological and computer science communities. We tested our algorithms on four large biological datasets, each consisting of 20,000 to 150,000 trees over 150 to 525 taxa. Our experimental results on these datasets indicate the long-term applicability of our algorithms to modern phylogenetic analysis, and underscore their ability to help scientists easily exchange and analyze their large tree collections. In addition to contributing to the reproducibility of phylogenetic analysis, our work enables the creation of test beds for improving phylogenetic heuristics and applications. Lastly, our data structures and algorithms can be applied to managing other tree-like data (e.g. XML).
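The idea of identifying shared evolutionary relationships by hashing can be sketched with the Robinson-Foulds (RF) distance, a standard measure that counts the bipartitions present in one tree but not the other. This sketch is not MrsRF, Phlash or TreeZip: it relies on Python's built-in `frozenset` hashing rather than universal hashing, and the nested-tuple tree encoding and function names are assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is a taxon name (leaf) or a tuple of subtrees (internal node).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Collect the taxon names under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain the
    alphabetically smallest taxon, so equal splits hash to the same key.
    """
    taxa = leaves(tree)
    anchor = min(taxa)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = taxa - clade if anchor in clade else clade
        # Skip trivial splits that every tree over these taxa shares.
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Number of bipartitions found in exactly one of the two trees."""
    return len(bipartitions(a) ^ bipartitions(b))
```

Because each bipartition becomes a hashable key, comparing two trees reduces to set operations. The same keys could index a table shared across a whole collection of trees.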
|
7 |
Protein loop structure prediction / Choi, Yoonjoo January 2011 (has links)
This dissertation concerns the study and prediction of loops in protein structures. Proteins perform crucial functions in living organisms. Despite their importance, we are currently unable to predict their three-dimensional structure accurately. Loops are segments that connect regular secondary structures of proteins. They tend to be located on the surface of proteins and often interact with other biological agents. As loops are generally subject to more frequent mutations than the rest of the protein, their sequences and structural conformations can vary significantly even within the same protein family. Although homology modelling is the most accurate computational method for protein structure prediction, difficulties still arise in predicting protein loops. Protein loop structure prediction is therefore a bottleneck in solving the protein structure prediction problem. Reflecting on the success of homology modelling, I implement an improved version of a database search method, FREAD. I show how sequence similarity as quantified by environment specific substitution scores can be used to significantly improve loop prediction. FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than ab initio methods; FREAD's predictive ability is length independent. In general, it produces results within 2 Å root mean square deviation (RMSD) from the native conformations, compared to an average of over 10 Å for loop length 20 for any of the other tested ab initio methods. I then examine FREAD’s predictive ability on a specific type of loop called complementarity determining regions (CDRs) in antibodies. CDRs consist of six hypervariable loops and form the majority of the antigen binding site. I examine CDR loop structure prediction as a general case of the loop structure prediction problem. FREAD achieves accuracy similar to specific CDR predictors. 
However, it fails to accurately predict CDR-H3, which is known to be the most challenging CDR. Various FREAD versions including FREAD with contact information (ConFREAD) are examined. The FREAD variants improve predictions for CDR-H3 on homology models and docked structures. Lastly, I focus on the local properties of protein loops and demonstrate that the protein loop structure prediction problem is a local protein folding problem. The end-to-end distance of loops (loop span) follows a distinctive frequency distribution, regardless of secondary structure elements connected or the number of residues in the loop. I show that the loop span distribution follows a Maxwell-Boltzmann distribution. Based on my research, I propose future directions in protein loop structure prediction including estimating experimentally undetermined local structures using FREAD, multiple loop structure prediction using contact information and a novel ab initio method which makes use of loop stretch.
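For reference, the standard Maxwell-Boltzmann density that the loop span distribution is said to follow can be written as below. The symbols are assumptions made for illustration: d is the loop span and a is a positive scale parameter. The abstract gives neither a parameterisation nor fitted values.

```latex
f(d; a) = \sqrt{\frac{2}{\pi}} \, \frac{d^{2}}{a^{3}} \exp\!\left(-\frac{d^{2}}{2a^{2}}\right), \qquad d \ge 0,\ a > 0
```

Under this form the density vanishes at d = 0, peaks at d = \sqrt{2}\,a, and decays rapidly for larger spans.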
|