Global ETD Search

11	Bayesian Inference for Genomic Data Analysis Ogundijo, Oyetunji Enoch January 2019 High-throughput genomic data contain a vast amount of information that is influenced by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data-generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data. Specifically, the characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: the presence or absence of the copy-neutrality assumption. Under copy-neutrality, it is assumed that the genome contains mutational variability and that the three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variabilities such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), a state-space modeling framework is employed to describe the two processes, and sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters. Moreover, the problem of estimating a gene regulatory network (GRN) from measurements with missing values is presented. Specifically, gene expression time series data may be missing all expression values at a single time point or at a set of consecutive time points. However, complete data are often needed to make inference on the underlying GRN. 
To account for the missing measurements, a dynamic stochastic model is used to describe the evolution of gene expression, and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used and the SMC method for static models is applied to estimate the cell-type specific expressions and the cell type proportions in the heterogeneous samples. Electrical engineering Genomics--Data processing Genomics--Mathematical models Tumors Gene expression
12	Topics in Signal Processing: applications in genomics and genetics Elmas, Abdulkadir January 2016 The information in genomic or genetic data is influenced by various complex processes, and appropriate mathematical modeling is required for studying the underlying processes and the data. This dissertation focuses on the formulation of mathematical models for certain problems in genomics and genetics studies and the development of algorithms that solve them efficiently. A Bayesian approach to transcription factor (TF) motif discovery is examined, and extensions are proposed to deal with the many interdependent parameters of TF-DNA binding. The problem is described in statistical terms and a sequential Monte Carlo sampling method is employed for the estimation of unknown parameters. In particular, a class-based resampling approach is applied for the accurate estimation of a set of intrinsic properties of the DNA binding sites. Through statistical analysis of gene expression, a motif-based computational approach is developed for the inference of novel regulatory networks in a given bacterial genome. To deal with high false-discovery rates in genome-wide TF binding predictions, discriminative learning approaches are examined in the context of sequence classification, and a novel mathematical model is introduced to the family of kernel-based Support Vector Machine classifiers. Furthermore, the problem of haplotype phasing is examined based on genetic data obtained from cost-effective genotyping technologies. Based on the identification and augmentation of a small and relatively more informative genotype set, a sparse dictionary selection algorithm is developed to infer the haplotype pairs for the sampled population. In a related context, to detect redundant information in single nucleotide polymorphism (SNP) sites, the problem of representative (tag) SNP selection is introduced. 
An information-theoretic heuristic is designed for the accurate selection of tag SNPs that capture the genetic diversity in a large sample set from multiple populations. The method is based on a multi-locus mutual information measure that reflects linkage disequilibrium, a biological principle in population genetics. Signal processing Signal processing--Statistical methods Genetics--Data processing Genomics--Data processing Bioinformatics Transcription factors Electrical engineering
13	Genomic data mining for the computational prediction of small non-coding RNA genes Tran, Thao Thanh Thi 20 January 2009 The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative methods require knowledge of closely related organisms in order to search for sequence and structural similarities. This approach imposes constraints on the type of ncRNAs, the organism, and the regions where the ncRNAs can be found. We have developed a novel approach for ncRNA gene prediction without the limitations of current comparative methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. The ability to automatically identify ncRNAs on a genome-wide scale is immensely valuable when incorporated into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies. Bioinformatics Non-coding RNA genes Operon prediction Neural networks Computational biology Non-coding RNA Data mining Genomics Data processing Genomes Data processing
14	A novel framework for binning environmental genomic fragments Yang, Bin, 杨彬 January 2010 Computer Science / Master / Master of Philosophy Genomics - Data processing. Genomes - Data processing. Microbial ecology - Data processing. Cluster analysis - Data processing. Cluster analysis - Computer programs.
15	The role of parallel computing in bioinformatics Akhurst, Timothy John January 2005 The need to intelligibly capture, manage and analyse the ever-increasing amount of publicly available genomic data is one of the challenges facing bioinformaticians today. Such analyses are in fact impractical using uniprocessor machines, which has led to an increasing reliance on clusters of commodity-priced computers. An existing network of cheap, commodity PCs was utilised as a single computational resource for parallel computing. The performance of the cluster was investigated using a whole genome-scanning program written in the Java programming language. The TSpaces framework, based on the Linda parallel programming model, was used to parallelise the application. Maximum speedup was achieved at between 30 and 50 processors, depending on the size of the genome being scanned. Together with this, the associated significant reductions in wall-clock time suggest that both parallel computing and Java have a significant role to play in the field of bioinformatics. Bioinformatics Parallel programming (Computer science) LINDA (Computer system) Java (Computer program language) Genomics -- Data processing
16	Genome-wide analyses of single cell phenotypes using cell microarrays Narayanaswamy, Rammohan, 1978- 29 August 2008 The past few decades have witnessed a revolution in recombinant DNA and nucleic acid sequencing technologies. Recently, however, technologies capable of massively high-throughput, genome-wide data collection, combined with computational and statistical tools for data mining, integration and modeling, have enabled the construction of predictive networks that capture cellular regulatory states, paving the way for ‘Systems biology’. Consequently, protein interactions can be captured in the context of a cellular interaction network, and emergent ‘system’ properties can be identified that may not have been accessible through conventional biology. The ability to generate data from multiple, non-redundant experimental sources is one of the important facets of systems biology. Towards this end, we have established a novel platform called ‘spotted cell microarrays’ for conducting image-based genetic screens. We have subsequently used spotted cell microarrays for studying multidimensional phenotypes in yeast under different regulatory states. In particular, we studied the response to mating pheromone using a cell microarray comprising the yeast non-essential deletion library and analyzed morphology changes to identify novel genes involved in mating. An important aspect of the mating response pathway is large-scale spatiotemporal changes to the proteome, an aspect of proteomics that is still largely obscure. In our next study, we used an imaging screen and a computational approach to predict and validate the complement of proteins that polarize and change localization towards the mating projection tip. By adopting such hybrid approaches, we have been able to study not only proteins involved in specific pathways but also their behavior in a systemic context, leading to a broader comprehension of cell function. 
Lastly, we have performed a novel metabolic starvation-based screen using the GFP-tagged collection to study proteome dynamics in response to nutrient limitation and are currently rationalizing our observations through follow-up experiments. We believe this study has implications for evolutionarily conserved cellular mechanisms such as protein turnover, quiescence and aging. Our technique has therefore been applied to several interesting aspects of yeast cellular physiology and behavior and is now being extended to mammalian cells. Genomics--Databases Genomics--Data processing DNA microarrays--Databases Phenotype--Data processing Pheromones--Data processing Proteomics--Data processing Proteins--Data processing
17	Identification of Publications on Disordered Proteins from PubMed Sirisha, Peyyeti 07 August 2012 Indiana University-Purdue University Indianapolis (IUPUI) / The literature on disordered proteins has been on the rise. As the number of publications increases, the time and effort needed to manually identify the relevant publications and protein information to add to a centralized repository (called DisProt) have become a critical burden. Existing search facilities on PubMed can retrieve a seemingly large number of publications based on keywords but do not support ranking them by the probability that the protein names mentioned in a given abstract will be added to DisProt. This thesis explores a novel system that uses disorder predictors and context-based dictionary methods to quickly identify publications on disordered proteins from the PubMed database. NLProt, which is built around Support Vector Machines, is used to identify protein names, and PONDR-FIT, an Artificial Neural Network based meta-predictor, is used to identify protein disorder. The work done in this thesis is of immediate significance in identifying disordered protein names. We tested the new system on 100 abstracts from DisProt (these abstracts were found to be relevant to disordered proteins and were added to DisProt manually by the annotators). The system had an accuracy of 87% on this test set. We then took another 100 recently added abstracts from PubMed and ran our algorithm on them. This time it had an accuracy of 68%. We suggest improvements to increase the accuracy and believe that this system can be applied to identifying disordered proteins from the literature. DisProt, Database, Software Tool Proteins -- Analysis Bioinformatics Database searching Genomics -- Data processing
18	Assessment of genome visualization tools relevant to HIV genome research: development of a genome browser prototype. Boardman, Anelda Philine January 2004 Over the past two decades of HIV research, effective vaccine candidates have been elusive. Traditionally viral research has been characterized by a gene-by-gene approach, but in the light of the availability of complete genome sequences and the tractable size of the HIV genome, a genomic approach may improve insight into the biology and epidemiology of this virus. A genomic approach to finding HIV vaccine candidates can be facilitated by the use of genome sequence visualization. Genome browsers have been used extensively by various groups to shed light on the biology and evolution of several organisms including human, mouse, rat, Drosophila and C. elegans. Application of a genome browser to HIV genomes and related annotations can yield insight into forces that drive evolution, identify highly conserved regions as well as regions that yield a strong immune response in patients, and track mutations that appear over the course of infection. Access to graphical representations of such information is bound to support the search for effective HIV vaccine candidates. This study aimed to determine whether a tool or application exists that can be modified to serve as a platform for developing an HIV visualization application, and to assess the viability of such an implementation. Existing applications can only be assessed for their suitability as a basis for development of an HIV genome browser once a well-defined set of assessment criteria has been compiled. AIDS (Disease), Genetic aspects HIV (Viruses), Genetic aspects HIV (Viruses), Data processing AIDS (Disease), Data processing Genomics, Data processing.
20	Analysis of integration sites of transgenic sheep generated by lentiviral vectors using next-generation sequencing technology Chen, Yu-Hsiang 31 July 2014 Indiana University-Purdue University Indianapolis (IUPUI) / The development of new methods to carry out gene transfer benefits several fields, such as gene therapy, agriculture and animal health. The newly established lentiviral vector systems further increase the efficiency of gene transfer dramatically. Some studies have shown that lentiviral vector systems are more than 10-fold more efficient than traditional pronuclear injection. However, the timing of lentiviral vector integration remains unclear. Integration at different stages of embryogenesis might lead to different integration patterns between tissues. Moreover, in our previous study we found that the vector copy number in transgenic sheep varied, with some animals having one or more copies per cell while others had less than one copy per cell, suggesting mosaicism. Here I hypothesized that injection of a lentiviral vector into a single-cell embryo can lead to integration very early in embryogenesis but that integration can also occur after several cell divisions. In this study, we focus on investigating integration sites in tissues developing from different germ layers as well as extraembryonic tissues to determine when integration occurs. In addition, we are also interested in insertional mutagenesis caused by viral sequence integration in or near gene regions. We utilize linear amplification-mediated polymerase chain reaction (LAM-PCR) and next-generation sequencing (NGS) technology to determine possible integration sites. In this study, we found evidence from a series of experiments that supports my hypothesis, suggesting that integration events also happen after several cell divisions. 
For insertional mutagenesis analysis, the closest genes can be found according to integration sites, but they are likely too far away from the integration sites to be influenced. A well-annotated sheep genome database is needed for insertional mutagenesis analysis. Genetic transformation -- Methodology Genetic vectors -- Research Genetic engineering -- Methodology Gene expression -- Analysis Sheep -- Physiology Mosaicism Lentiviruses Molecular cloning Cell division Mutagenesis Gene mapping Nucleotide sequence Genomics -- Data processing Genomics -- Technique Embryology
