91 |
A bioinformatic study on the feasibility of a cross-species proteomics analyses of mycobacteria
Rajaonarifara, Elinambinina January 2013
Includes abstract.
Includes bibliographical references.
|
92 |
New Approaches of Differential Gene Expression Analysis and Cancer Immune Evasion Mechanism Identification
Unknown Date
Background: Genomic and epigenomic data analysis has been a popular research area in the 21st century. Common research problems include detecting differentially expressed genes between groups, clustering and classification using genomic data in order to study the heterogeneity of a disease, and dividing a sequence of measurements along a genome into segments to identify different functional regions of the genome. This study gives a comprehensive investigation of the aforementioned tasks, with emphasis on developing new computational methodologies. Normalization is an important data preparation step in gene expression analyses: it removes various kinds of systematic noise, and thereby reduces sample variance and increases the power of subsequent statistical analyses. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which inevitably introduces bias into the data. A question of interest is how to avoid the inflation of the type I error rate and the loss of statistical power incurred by this bias. Breast cancer (BRCA) can escape immune surveillance using 6 known evasion mechanisms, yet the complexity of the combinations of these mechanisms used by subsets of human BRCA patients is not fully understood. In the era of immunotherapy and personalized medication, there is an urgent need for advancing the knowledge of immune evasion clusters (IEC) in BRCA and identifying reliable biomarkers, which is essential for a better understanding of patients’ response to immunotherapies and for rational clinical trial design of combination immunotherapies. Identification of functionally enriched regions of a genome often requires dividing a sequence of measurements along the genome into segments where adjacent segments have different properties (e.g. mean values). Despite dozens of algorithms developed to address this issue, accuracy and computational efficiency still need to be improved to tackle both existing and emerging segmentation problems in genomic and epigenomic research.
Results: In chapter 1 of this study we propose a new differential gene expression analysis pipeline, Super-delta, that pairs a modified t-test derived from large-sample theory with a robust multivariate extension of global normalization, designed to minimize the bias introduced by DEGs. In simulation studies, Super-delta was compared to four commonly used normalization methods (global, median-IQR, quantile, and cyclic loess normalization) and was shown to have better statistical power with tighter type I error control. We then applied all methods to a microarray gene expression dataset on BRCA patients who received neoadjuvant chemotherapy. Super-delta was able to identify marginally more DEGs than its competitors, in addition to the substantial overlap of DEGs identified by all of them. Adaptations are under active development to extend this framework to RNA-Seq data and to more general between-group comparison problems.
In chapter 2, we developed a sequential biclustering (SBiC) method based on an existing biclustering approach that uses the plaid model, and applied it to the log2-normalized RNA-seq data of immune-related genes of BRCA patients from The Cancer Genome Atlas (TCGA). We identified seven clusters for 81% of the studied samples. We found that 78.8% of these samples evade through TGF-β immunosuppression, 57.75% through DcR3 counterattack, 48% through CTLA4, and 27.8% through PD-1. Interestingly, the combination of TGF-β and DcR3 was pronounced in 57.75% of patients, and evasion through DcR3 was exclusive to the lobular invasive subgroup. In addition, triple negative breast cancer (TNBC) patients split equally into 2 clusters: one with impaired antigen presentation and another with high leukocyte recruitment but a combination of 4 evasion mechanisms. We also identified biomarkers that play important roles in distinguishing immune evasion mechanisms. These findings provide a better understanding of patients’ response to immunotherapies and shed light on the rational design of novel combination immunotherapies.
In chapter 3, we designed an efficient algorithm called iSeg for segmentation of genomic and epigenomic profiles. It first utilizes dynamic programming to identify candidate significant segments, then uses a novel data structure based on coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during the searching and refinement stages. Merging of significant segments is performed at the end to generate the final set of segments. The algorithm can serve as a general computational framework that works with different model assumptions of the data. As a general procedure, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, and (differential) nuclease sensitivity. We evaluated iSeg using both simulated and experimental datasets and showed that it performs satisfactorily when compared with some popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is very computationally efficient and well suited for long sequences and a large number of input data profiles.
A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / July 11, 2018. / differential gene expression analysis, immune evasion mechanism, robust data normalization, segmentation, sequential biclustering, Super-delta / Includes bibliographical references. / Jinfeng Zhang, Professor Directing Dissertation; Qing-Xiang (Amy) Sang, University Representative; Qing Mai, Committee Member; Yiyuan She, Committee Member.
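The abstract above contrasts global and median-IQR normalization and notes that outlying genes can bias a normalization step. The following is a minimal Python sketch of those two baseline methods only, assuming one sample is given as a list of log-expression values. It is not the Super-delta procedure, and the function names are hypothetical.

```python
from statistics import mean, median, quantiles


def global_normalize(sample):
    """Center one sample's log-expression values on their mean."""
    m = mean(sample)
    return [x - m for x in sample]


def median_iqr_normalize(sample):
    """Center on the median and scale by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the sample cannot be scaled")
    med = median(sample)
    return [(x - med) / iqr for x in sample]


# Toy sample (made-up values): four ordinary genes and one extreme gene.
toy_sample = [1.0, 2.0, 3.0, 4.0, 50.0]
mean_centered = global_normalize(toy_sample)
median_scaled = median_iqr_normalize(toy_sample)
```

On this toy sample the mean is pulled to 12 by the single extreme value, so mean centering shifts every ordinary gene by a large amount, while the median (3) and IQR (2) are unaffected by it. This illustrates, in a simplified way, why the choice of normalization matters when some genes are strongly differentially expressed.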
|
93 |
Applications of Machine Learning to Precision Medicine
Unknown Date
Work is presented from two projects, each involving an application of machine learning to precision medicine. The first project was for the Document Triage Task of the BioCreative VI Precision Medicine Track. Teams were asked to build machine learning models to identify journal abstracts that contain at least one mention of a protein-protein interaction (PPI) affected by a mutation. The second project is an analysis of gene expression data from a group of breast cancer patients receiving neoadjuvant chemotherapy, searching for biomarkers that predict the outcome of treatment. The model developed for the BioCreative challenge did not use state-of-the-art methods but achieved results only slightly worse than modern deep learning techniques. My contribution to this project was in feature engineering, model tuning and model validation. The feature engineering process will be presented along with a discussion of difficulties due to scarcity of data. The data for the second project were collected from breast cancer patients at the Sun Yat-sen University Cancer Center in Guangzhou, China. RNA-seq data and clinical information were collected from patients before and after undergoing neoadjuvant chemotherapy. Genes and pathways of potential relevance to the outcome of neoadjuvant therapy were identified for further study. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / 2019 / June 12, 2019. / Biomarkers, Genomics, Machine Learning, Neoadjuvant Chemotherapy, Precision Medicine, Text Mining / Includes bibliographical references. / Jinfeng Zhang, Professor Directing Dissertation; Tingting Zhao, University Representative; Mingjing Tao, Committee Member; Wei Wu, Committee Member.
|
94 |
Network-based approach for post genome-wide association study analysis in admixed populations
Mbiyavanga, Mamana January 2014
Includes abstract. / Includes bibliographical references. / In this project, we review some existing pathway-based approaches for GWA study analyses by exploring different implemented methods for combining the effects of multiple modest genetic variants at the gene and pathway levels. We then propose a graph-based method, ancGWAS, that incorporates the GWA study signal and locus-specific ancestry into the human protein-protein interaction (PPI) network to identify significant sub-networks or pathways associated with the trait of interest. This network-based method applies centrality measures within linkage disequilibrium (LD) on the network to search for pathways, and applies a scoring summary statistic to the resulting pathways to identify the most enriched pathways associated with complex diseases.
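The abstract above mentions applying centrality measures to a PPI network. As a rough illustration of what a centrality measure is, the Python sketch below computes plain degree centrality from an undirected edge list. It is not the ancGWAS method, which the abstract says also uses the GWA study signal, locus-specific ancestry and LD. The function name and the gene labels are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return degree / (n - 1) for every node of an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical toy network: G1 interacts with every other protein.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
toy_centrality = degree_centrality(toy_edges)
```

In the toy network, `G1` gets the maximum score of 1.0 because it touches all three other nodes, while `G4` has a single neighbor and scores 1/3.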
|
95 |
Identification of the virulence gene of Mycobacterium tuberculosis
Rabiu, Halimah Adenike January 2007
Includes bibliographical references (leaves [103]-119). / The major thrust of this project is to identify and characterize potential virulence genes from M. tuberculosis. To this end, we have compiled and integrated information from various public databases to catalogue 246,573 microbial genes from 84 organisms, including pathogens and non-pathogenic microbes. We determined the phylogenetic distributions by grouping the proteins into families based on sequence similarity with the aid of BLASTP and the NCBI BLASTClust program.
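Grouping proteins into families from pairwise similarity hits can be sketched as single-linkage clustering with a union-find structure. The Python example below is such a sketch, assuming the hits are already available as `(protein_a, protein_b, percent_identity)` tuples. It does not run BLASTP or reimplement BLASTClust, and the function name, threshold and protein labels are assumptions for illustration.

```python
def cluster_families(proteins, hits, min_identity=30.0):
    """Single-linkage grouping: two proteins share a family if a chain of
    hits at or above min_identity connects them.

    Every protein named in hits must also appear in proteins.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the family representative,
        # shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, identity in hits:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    return sorted(sorted(members) for members in families.values())


# Hypothetical hits: p1-p2 and p2-p3 pass the threshold, p3-p4 does not.
toy_proteins = ["p1", "p2", "p3", "p4"]
toy_hits = [("p1", "p2", 85.0), ("p2", "p3", 40.0), ("p3", "p4", 12.0)]
toy_families = cluster_families(toy_proteins, toy_hits)
```

With these toy inputs, `p1`, `p2` and `p3` end up in one family even though `p1` and `p3` have no direct hit, which is the defining behavior of single linkage. `p4` stays alone.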
|
96 |
Data mining of host transcriptome and microbiome in pulmonary disease
Zhao, Yue 28 October 2020
Pulmonary disease is one of the most common and serious medical conditions in the world, and the correct diagnosis and prediction of incipient pulmonary diseases such as tuberculosis (TB) and lung cancer can greatly decrease the number of pulmonary disease-related deaths. In this thesis, I studied the transcriptome and microbiome differences between pulmonary disease patients and healthy controls, and developed and applied several pipelines incorporating bioinformatics methods, statistics and machine learning models to identify patterns in human transcriptome and microbiome data for pulmonary disease prediction. On the host transcriptome side, I first evaluated the performance of existing TB disease and TB progression biomarkers, created a bulk RNA-seq gene-expression-based biomarker selection pipeline, and then identified a 29-gene signature that can correctly predict TB progression as far as 6 years before the TB diagnosis. On the microbiome side, I developed Animalcules, an R package for microbiome data analysis such as diversity comparison and differential abundance analysis, which supports both a graphical user interface and command-line functions. I then applied Animalcules to two microbiome case studies: identifying TB-related and asthma-related microbes. After working on the host transcriptome and the microbiome separately, I discussed the computational framework for identifying host-microbe interactions and its significant potential for studying pulmonary disease pathogenesis, diagnosis and treatment.
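Diversity comparison, one of the analyses named in the abstract above, usually starts from a per-sample diversity index. The Python sketch below computes the Shannon index H = -Σ p_i ln p_i from a list of taxon counts. It is a generic illustration and does not use or reproduce the Animalcules R package. The function name and the counts are made up.

```python
import math


def shannon_diversity(counts):
    """Shannon index (natural log) for one sample's taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # p * ln(1 / p) equals -p * ln(p); taxa with zero counts contribute nothing.
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


# Made-up samples: an even community and one dominated by a single taxon.
even_sample = shannon_diversity([10, 10, 10, 10])
dominated_sample = shannon_diversity([40, 0, 0, 0])
```

Four equally abundant taxa give the maximum value for four taxa, ln 4, while a sample containing a single taxon gives 0.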
|
97 |
Prevalence and frequency spectra of single nucleotide polymorphisms at exon-intron junctions of human genes
Lupindo, Bukiwe January 2008
Includes bibliographical references (leaves 92-112). / In humans and other higher eukaryotes, the observation of multiple splice isoforms for a given gene is common. However, it is not clear whether all of these alternatively spliced isoforms are products of true alternative splicing or whether some are due to DNA sequence variation in human populations. Genetic variations that affect splicing have been shown to cause variation in splicing patterns and are potentially an important source of phenotypic variability among humans. Furthermore, variation in disease susceptibility and manifestation between individuals is often associated with genetic polymorphisms that determine the way in which genes are spliced. Hence, identification of genetic polymorphisms that might affect the way in which pre-mRNAs are spliced is an area of great interest.
|
98 |
Dynamics of Microbial Genome Evolution
Hooper, Sean January 2003
The success of microbial life on Earth can be attributed not only to environmental factors, but also to the surprising hardiness, adaptability and flexibility of the microbes themselves. They are able to quickly adapt to new niches or circumstances through gene evolution and also by sheer strength of numbers, where statistics favor otherwise rare events.
An integral part of adaptation is the plasticity of the genome: losing and acquiring genes depending on whether they are needed or not. Genomes can also be the birthplace of new gene functions, by duplicating and modifying existing genes. Genes can also be acquired from outside, transcending species boundaries. In this work, the focus is set primarily on duplication, deletion and import (lateral transfer) of genes – three factors contributing to the versatility and success of microbial life throughout the biosphere.
We have developed a compositional method of identifying genes that have been imported into a genome, and the rate of import/deletion turnover has been appreciated in a number of organisms. Furthermore, we propose a model of genome evolution by duplication, where through the principle of gene amplification, novel gene functions are discovered within genes with weak or secondary protein functions. Subsequently, the novel function is maintained by selection and eventually optimized. Finally, we discuss a possible synergic link between lateral transfer and duplicative processes in gene innovation.
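A compositional screen for imported genes rests on the idea that a laterally transferred gene may differ in sequence composition from the rest of the genome. The Python sketch below illustrates that idea with the simplest possible statistic, GC content, and a z-score cutoff. The abstract does not specify the statistic actually used in the thesis, so the choice of GC content, the function names, the cutoff and the toy sequences are all assumptions.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_genes(genes, z_cutoff=2.0):
    """Return the names of genes whose GC content lies more than z_cutoff
    standard deviations from the mean over all genes."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sd = pstdev(gc.values())
    if sd == 0:
        return []  # all genes have identical composition
    return sorted(name for name, value in gc.items()
                  if abs(value - mu) / sd > z_cutoff)


# Toy genome (made-up sequences): nine genes at 50% GC, one AT-only gene.
toy_genes = {f"gene{i}": "ATGC" * 3 for i in range(9)}
toy_genes["foreign"] = "ATAT" * 3
atypical = flag_atypical_genes(toy_genes)
```

In the toy genome only the AT-only gene is flagged. A real screen would need a richer composition model and a correction for gene length, since short genes have noisy composition estimates.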
|
99 |
Predicting Function of Genes and Proteins from Sequence, Structure and Expression Data
Hvidsten, Torgeir R. January 2004
Functional genomics refers to the task of determining gene and protein function for whole genomes, and requires computational analysis of large amounts of biological data including DNA and protein sequences, protein structures and gene expression. Machine learning methods provide a powerful tool to this end by first inducing general models from such data and already characterized genes or proteins, and then by providing hypotheses on the functions of the remaining, uncharacterized cases.
This study contains four parts giving novel contributions to functional genomics through the analysis of different biological data and different aspects of biological function. Gene Ontology played an important part in this research, providing a controlled vocabulary for describing the cellular roles of genes and proteins in terms of specific molecular functions and broad biological processes.
The first part used gene expression time profiles to learn models capable of predicting the participation of genes in biological processes. The models consist of IF-THEN rules associating biological processes with minimal sets of discrete changes in expression level over limited periods of time. The models were used to hypothesize new biological processes for both characterized and uncharacterized genes.
The second part investigated the combinatorial nature of gene regulation by inducing IF-THEN rules associating minimal combinations of sequence motifs common to genes with similar expression profiles. Such combinations were shown to be significantly correlated to function, and provided hypotheses on the mechanisms behind the regulation of gene expression in several biological responses.
The third part used a novel concept of local descriptors of protein structure to investigate sequence patterns governing protein structure at a local level and to predict the topological class (fold) of protein domains from sequence. Finally, the fourth part used local descriptors to represent protein structure and induced IF-THEN rule models predicting molecular function from structure.
|