Revisiting and re-computing the X-score scoring function
Mambo, Hilaire Mobele (January 2014)
Includes bibliographical references. / Scoring functions compute protein-ligand binding energies, in different ways, by summing the individual pairwise atomic interaction energies observed in crystal structures between the protein and the bound ligand. To date, however, accurate prediction remains a major challenge: existing scoring functions fail to reproduce known binding energies with sufficient accuracy and robustness. To address this problem, we assign a discrete weighting to each individual atomic interaction to account for entropic desolvation effects on ligand binding. We then re-compute the revised scoring function and test its output against multiple datasets to examine the robustness of the heuristic weightings used.
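The weighting idea can be illustrated with a minimal Python sketch, assuming each protein-ligand atom pair has already been assigned an interaction type and a raw pairwise energy. The interaction-type names and weight values in `WEIGHTS` are hypothetical placeholders, not the weightings used in this thesis or in X-score itself.

```python
from typing import Dict, List, Tuple

# Hypothetical interaction types and discrete weights (illustration only).
WEIGHTS: Dict[str, float] = {
    "hydrogen_bond": 1.0,
    "van_der_waals": 0.5,
    "hydrophobic": 1.5,
}


def weighted_score(pairs: List[Tuple[str, float]], weights: Dict[str, float]) -> float:
    """Sum pairwise atomic interaction energies, each scaled by its type's weight."""
    total = 0.0
    for interaction_type, energy in pairs:
        # Interaction types without a weight contribute nothing (weight 0).
        total += weights.get(interaction_type, 0.0) * energy
    return total


if __name__ == "__main__":
    pairs = [("hydrogen_bond", -1.2), ("van_der_waals", -0.4), ("hydrophobic", -0.8)]
    print(weighted_score(pairs, WEIGHTS))
```

Testing robustness would then amount to re-running the score over several benchmark sets and comparing predicted against measured binding energies for each set.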
Evaluating the predictive performance of cytotoxic T lymphocyte epitope prediction tools using Elispot assay data
Meraba, Rebone Leboreng (January 2018)
Computational T-cell epitope prediction tools have been devised to predict potential human leukocyte antigen (HLA)-binding peptides from protein sequences. These tools complement enzyme-linked immunosorbent spot (ELISpot) assays, a very commonly applied immunological technique used both to identify regions of pathogen genomes that trigger an immune response and to characterize the relationships between an individual's complement of HLA alleles and the degree of immunity that they display. If computational tools could accurately predict HLA-peptide binding, they might be usable as a cheap and reliable alternative to ELISpot assays. A web-based IFN-γ ELISpot assay data-sharing resource, called IMMUNO-SHARE, was developed to enable simple and straightforward storage and dissemination of large volumes of IFN-γ ELISpot assay data among researchers. These experimental data were then used to make HLA-peptide binding predictions with four frequently used T-cell epitope prediction tools: netMHC 3.2, IEDB_ANN, IEDB_ARB Matrix and IEDB_SMM. The predictive performance of all four tools, individually and collectively, was statistically assessed using non-parametric Spearman rank-order correlation tests. None of the four tools yielded binding affinity predictions that were detectably correlated with the observed ELISpot data. All four methods showed high false positive rates, in which high predicted binding affinities between peptides and patient HLAs corresponded to no appreciable immune response in those patients. The low correlation between ELISpot data and HLA-peptide binding predictions, and in particular the high false positive rates and relatively low true positive and true negative rates, indicate that the four tested tools would require substantial improvement before they could be considered a viable alternative to ELISpot assays.
Because the predictive accuracy of each of the four methods depends largely on both the quantity and the quality of the known true-binder and true-non-binder datasets used to train the HLA-peptide binding prediction methods implemented by the tools, it is plausible that their accuracy could be increased with larger training datasets. Retraining either the current methods or the next generation of prediction tools would therefore be greatly facilitated by the availability of large quantities of publicly available HLA-peptide binding interaction data. It is hoped that IMMUNO-SHARE or some other ELISpot data-sharing resource could eventually meet this need.
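The evaluation described above rests on two calculations: Spearman's rank-order correlation between predicted binding affinities and ELISpot responses, and true/false positive rates once predictions are dichotomised into binders and non-binders. The following is a minimal pure-Python sketch of both, not the thesis's code; the function names are hypothetical, ties receive average ranks, and how a prediction is called a "binder" (for example an affinity cut-off) is left to the caller as an assumption.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


def confusion_rates(predicted: Sequence[bool], observed: Sequence[bool]) -> Dict[str, float]:
    """TPR, FPR and TNR of binder calls against observed ELISpot responses."""
    pairs = list(zip(predicted, observed))
    tp = sum(p and o for p, o in pairs)
    fp = sum(p and not o for p, o in pairs)
    tn = sum(not p and not o for p, o in pairs)
    fn = sum(not p and o for p, o in pairs)
    nan = float("nan")
    return {
        "TPR": tp / (tp + fn) if tp + fn else nan,
        "FPR": fp / (fp + tn) if fp + tn else nan,
        "TNR": tn / (tn + fp) if tn + fp else nan,
    }
```

In practice `scipy.stats.spearmanr` computes the same statistic together with a p-value; the hand-rolled version only makes explicit what is being measured.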
Influence of gut microbiota on immune system in infants
Kachambwa, Paidamoyo (January 2017)
Background and Methods: Microbiota play many significant roles in humans, direct or indirect and beneficial or detrimental. The microbiome is established in infancy, when diet plays a directive role in the proliferation of gut microbes. The presence of a defined set of microbes has been shown to increase overall immunological capacity, on which vaccines depend to be effective. To date, little work has been done on the effect of the microbiota on the immune system in infancy, so an analysis of the microbial ecology of the infant gut and its correlation with immune activation is needed. Expression of genes involved in mediating and regulating immunity can be measured as an indicator of immune activity. Vaccines work by stimulating an immune response, which can be measured through gene expression levels; this response affects the infant's ability to establish a strong immune system, which is also determined in infancy. 16S rRNA sequence data generated from 134 infant stool samples, collected at the 0-, 6- and 14-week vaccination time points from infants who were either breast- or formula-fed, were analysed with the Quantitative Insights Into Microbial Ecology (QIIME) pipeline to detect the taxonomic groups that make up each microbiome. Statistical analysis in R was used to quantify the diversity of the different microbial groups in the gut. Expression levels of immune-related genes were measured in blood samples stimulated with a Bacillus Calmette–Guérin (BCG) antigen and correlated with microbiota composition.

Results and Conclusion: Microbiome data showed initial differentiation between breast- and mixed-fed infants. 15% of 5 of the most abundant bacteria in breast-fed infants were Bifidobacteriales, which are known for their probiotic properties. The data did not fully cluster, because the oldest samples were taken quite early, at 14 weeks. Individual bacteria were correlated with individual gene expression levels.
The study shows the relative abundance of particular bacteria by feeding modality and demonstrates how the microbiota correlates with gene expression levels. At week 14, Bifidobacterium abundances below 0 (log₁₀ heatmap scale) generally correlated with high CASP3 expression in breast-fed babies, while abundances above 1 correlated with low expression. Gene expression at abnormal levels usually has undesirable effects, resulting in dysfunctional immune reactions that lead to conditions ranging from autoimmune diseases to cancer.
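Quantifying the diversity of microbial groups usually means computing alpha-diversity indices per sample. The sketch below, in Python rather than the R used in the study, computes the Shannon and Gini-Simpson indices from a vector of per-taxon read counts; it illustrates these standard indices and is not the study's analysis. The natural logarithm is an assumption (some tools default to base 2).

```python
import math
from typing import List, Sequence


def _proportions(counts: Sequence[int]) -> List[float]:
    """Relative abundances of the taxa actually observed in a sample."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts if c > 0]


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over observed taxa."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def simpson_index(counts: Sequence[int]) -> float:
    """Gini-Simpson diversity 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in _proportions(counts))
```

For four equally abundant taxa, `shannon_index([5, 5, 5, 5])` equals ln 4 (about 1.386) and `simpson_index([5, 5, 5, 5])` equals 0.75; a sample dominated by one taxon scores lower on both.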
Characterisation of the metabolome of Mycobacterium tuberculosis to identify new pathways and pathway holes
Wolfenden, Kristen Marie (January 2014)
Includes abstract. / Includes bibliographical references. / Because of high incidence rates and the emergence of new drug-resistant and multidrug-resistant strains, new medicines and treatments for tuberculosis (TB) are needed. Developing these drugs requires a more complete understanding of Mycobacterium tuberculosis (Mtb); this study characterises the metabolome of Mtb and compares it across the phylogenetic profile to identify notable pathways.
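A "pathway hole" is commonly taken to be a reaction in an otherwise present pathway for which no gene has been annotated. A minimal Python sketch of that bookkeeping, plus a simple phylogenetic profile (the fraction of a pathway's reactions found in each organism), follows; the pathway names, reaction identifiers and data structures are hypothetical and do not come from the thesis.

```python
from typing import Dict, List, Set


def find_pathway_holes(
    pathways: Dict[str, Set[str]], annotated: Set[str]
) -> Dict[str, List[str]]:
    """Map each partially present pathway to its reactions lacking an annotated gene."""
    holes: Dict[str, List[str]] = {}
    for name, reactions in pathways.items():
        present = reactions & annotated
        missing = reactions - annotated
        # A pathway with no annotated reactions is more plausibly absent than
        # "holed", so only partially present pathways are reported.
        if present and missing:
            holes[name] = sorted(missing)
    return holes


def phylogenetic_profile(
    reactions: Set[str], organisms: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Fraction of one pathway's reactions annotated in each organism."""
    if not reactions:
        raise ValueError("pathway has no reactions")
    return {
        org: len(reactions & annotated) / len(reactions)
        for org, annotated in organisms.items()
    }
```

Comparing such profiles across related organisms is one way to flag pathways that are conserved, lineage-specific, or apparently incomplete in a given genome.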
A bioinformatic study on the feasibility of a cross-species proteomics analyses of mycobacteria
Rajaonarifara, Elinambinina (January 2013)
Includes abstract. Includes bibliographical references.
New Approaches of Differential Gene Expression Analysis and Cancer Immune Evasion Mechanism Identification
(Unknown Date)
Background: Genomic and epigenomic data analyses have been a popular research area in the 21st century. Common research problems include detecting differentially expressed genes between groups, clustering and classification using genomic data to study the heterogeneity of a disease, and dividing a sequence of measurements along a genome into segments to identify different functional regions of the genome. This study gives a comprehensive investigation of these tasks, with emphasis on developing new computational methodologies. Normalization is an important data preparation step in gene expression analyses: it removes various sources of systematic noise, thereby reducing sample variance and increasing the power of subsequent statistical analyses. However, this variance reduction is achieved by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which inevitably introduces bias into the data. A question of interest is how to avoid the inflation of the type I error rate and the loss of statistical power caused by this bias. Breast cancer (BRCA) can escape immune surveillance through 6 known evasion mechanisms, yet the complex combinations of these mechanisms used by subsets of human BRCA patients are not fully understood. In the era of immunotherapy and personalized medicine, there is an urgent need to advance knowledge of immune evasion clusters (IEC) in BRCA and to identify reliable biomarkers, which is essential for better understanding patients' responses to immunotherapies and for the rational design of clinical trials of combination immunotherapies. Identifying functionally enriched regions of a genome often requires dividing a sequence of measurements along the genome into segments such that adjacent segments have different properties (e.g. mean values).
Although dozens of algorithms have been developed to address this problem, their accuracy and computational efficiency still need to be improved to tackle both existing and emerging segmentation problems in genomic and epigenomic research. Results: In chapter 1 of this study we propose a new differential gene expression analysis pipeline, super-delta, which pairs a modified t-test derived from large-sample theory with a robust multivariate extension of global normalization, designed to minimize the bias introduced by DEGs. In simulation studies, super-delta was compared with four commonly used normalization methods (global, median-IQR, quantile and cyclic loess normalization) and shown to have better statistical power with tighter type I error control. We then applied all methods to a microarray gene expression dataset from BRCA patients who received neoadjuvant chemotherapy. Super-delta identified marginally more DEGs than its competitors, in addition to the substantial overlap of DEGs identified by all of them. Adaptations are under active development to extend this framework to RNA-Seq data and to more general between-group comparison problems. In chapter 2, we developed a sequential biclustering (SBiC) method based on an existing biclustering approach using the plaid model and applied it to log2-normalized RNA-seq data of immune-related genes from BRCA patients in The Cancer Genome Atlas (TCGA). We identified seven clusters for 81% of the studied samples. We found that 78.8% of these samples evade through TGF-β immunosuppression, 57.75% through DcR3 counterattack, 48% through CTLA4, and 27.8% through PD-1. Interestingly, the combination of TGF-β and DcR3 was pronounced in 57.75% of patients, and evasion through DcR3 was exclusive to the lobular invasive subgroup.
In addition, triple-negative breast cancer (TNBC) patients split equally into 2 clusters: one with impaired antigen presentation and another with high leukocyte recruitment but a combination of 4 evasion mechanisms. We also identified biomarkers that play important roles in distinguishing immune evasion mechanisms. These findings provide a better understanding of patients' responses to immunotherapies and shed light on the rational design of novel combination immunotherapies. In chapter 3, we designed an efficient algorithm, iSeg, for the segmentation of genomic and epigenomic profiles. It first uses dynamic programming to identify candidate significant segments, then uses a novel data structure based on coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during the searching and refinement stages. Significant segments are merged at the end to generate the final set of segments. The algorithm can serve as a general computational framework that works with different model assumptions about the data. As a general procedure, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, and (differential) nuclease sensitivity. We evaluated iSeg using both simulated and experimental datasets and showed that it performs satisfactorily compared with some popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is very computationally efficient and well suited to long sequences and large numbers of input data profiles. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / July 11, 2018. / differential gene expression analysis, immune evasion mechanism, robust data normalization, segmentation, sequential biclustering, Super-delta / Includes bibliographical references.
/ Jinfeng Zhang, Professor Directing Dissertation; Qing-Xiang (Amy) Sang, University Representative; Qing Mai, Committee Member; Yiyuan She, Committee Member.
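iSeg's own search combines dynamic programming with coupled balanced binary trees, which are not reproduced here. As a smaller illustration of dynamic-programming segmentation in general, the Python sketch below implements the classic penalised "optimal partitioning" recurrence for piecewise-constant means: it minimises within-segment squared error plus a fixed `penalty` per segment. It is not iSeg, and the squared-error cost and the penalty value are assumptions.

```python
from typing import List, Sequence, Tuple


def segment_means(y: Sequence[float], penalty: float) -> List[Tuple[int, int]]:
    """Optimal partitioning of y into constant-mean segments.

    Minimises sum of within-segment squared errors + penalty * (#segments)
    with an O(n^2) dynamic program. Returns half-open (start, end) pairs.
    """
    n = len(y)
    # Prefix sums of y and y^2 give each segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(a: int, b: int) -> float:
        # Squared error of y[a:b] around its own mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost of y[:j]
    last = [0] * (n + 1)               # last[j]: start of the final segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], last[j] = cost, i

    # Backtrack from the end to recover the segment boundaries.
    segments: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        segments.append((last[j], j))
        j = last[j]
    segments.reverse()
    return segments
```

For `[0, 0, 0, 0, 5, 5, 5, 5]` with `penalty=1.0`, splitting at index 4 costs 2.0 versus 51.0 for a single segment, so the sketch returns `[(0, 4), (4, 8)]`. Larger penalties yield fewer, longer segments.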
Applications of Machine Learning to Precision Medicine
(Unknown Date)
Work is presented from two projects, each involving an application of machine learning to precision medicine. The first project was for the Document Triage Task of the BioCreative VI Precision Medicine Track. Teams were asked to build machine learning models to identify journal abstracts that contain at least one mention of a protein-protein interaction (PPI) affected by a mutation. The second project is an analysis of gene expression data from a group of breast cancer patients receiving neoadjuvant chemotherapy, searching for biomarkers that predict the outcome of treatment. The model developed for the BioCreative challenge did not use state-of-the-art methods but achieved results only slightly worse than modern deep learning techniques. My contribution to this project was in feature engineering, model tuning and model validation. The feature engineering process is presented along with a discussion of the difficulties caused by the scarcity of data. The data for the second project were collected from breast cancer patients at the Sun Yat-sen University Cancer Center in Guangzhou, China. RNA-Seq data and clinical information were collected from patients before and after neoadjuvant chemotherapy. Genes and pathways of potential relevance to the outcome of neoadjuvant therapy were identified for further study. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / 2019 / June 12, 2019. / Biomarkers, Genomics, Machine Learning, Neoadjuvant Chemotherapy, Precision Medicine, Text Mining / Includes bibliographical references. / Jinfeng Zhang, Professor Directing Dissertation; Tingting Zhao, University Representative; Mingjing Tao, Committee Member; Wei Wu, Committee Member.
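Feature engineering for the triage task can be illustrated with a few hand-written lexical features. The Python sketch below counts protein point-mutation mentions (such as `R175H` or `p.R175H`), interaction vocabulary and mutation vocabulary in an abstract; the regular expressions and feature names are hypothetical examples, not the features actually used in the BioCreative submission.

```python
import re
from typing import Dict

# Hypothetical feature patterns for illustration only.
# Protein point mutations such as "R175H" or "p.R175H"; this also matches
# some cell-line names (e.g. "T47D"), one source of noisy features.
POINT_MUTATION = re.compile(r"\b(?:p\.)?[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]\b")
INTERACTION_WORDS = re.compile(r"\b(?:interact|bind|complex)\w*", re.IGNORECASE)
MUTATION_WORDS = re.compile(r"\b(?:mutat|variant|substitut)\w*", re.IGNORECASE)


def triage_features(abstract: str) -> Dict[str, int]:
    """Count simple lexical cues for 'PPI affected by a mutation' abstracts."""
    n_point = len(POINT_MUTATION.findall(abstract))
    n_interact = len(INTERACTION_WORDS.findall(abstract))
    n_mutation = len(MUTATION_WORDS.findall(abstract))
    return {
        "n_point_mutations": n_point,
        "n_interaction_words": n_interact,
        "n_mutation_words": n_mutation,
        # 1 when both a mutation cue and an interaction cue are present.
        "mutation_and_interaction": int((n_point + n_mutation) > 0 and n_interact > 0),
    }
```

Counts like these would typically be combined with bag-of-words features before being fed to a classifier; with few labelled abstracts, compact hand-crafted cues can matter more than they would with abundant data.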
Network-based approach for post genome-wide association study analysis in admixed populations
Mbiyavanga, Mamana (January 2014)
Includes abstract. / Includes bibliographical references. / In this project, we review existing pathway-based approaches to GWA study analysis, exploring implemented methods that combine the effects of multiple genetic variants of modest effect at the gene and pathway levels. We then propose a graph-based method, ancGWAS, that integrates the GWA study signal and locus-specific ancestry into the human protein-protein interaction (PPI) network to identify significant sub-networks or pathways associated with the trait of interest. This network-based method applies centrality measures within linkage disequilibrium (LD) on the network to search for pathways, and applies a summary scoring statistic to the resulting pathways to identify the most enriched pathways associated with complex diseases.
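Combining many modest per-variant signals into one gene- or pathway-level score is often done with Fisher's method. The Python sketch below shows that standard technique; it is not necessarily what ancGWAS does. It assumes independent p-values, an assumption that SNPs in linkage disequilibrium violate, so on correlated SNPs this naive version is anti-conservative.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine k independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom
    under the global null. For even degrees of freedom the upper tail has the
    closed form P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, tail_sum = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i  # (x/2)^i / i!, built incrementally
        tail_sum += term
    return math.exp(-half_x) * tail_sum
```

A single p-value is returned unchanged, and two p-values of 0.05 combine to roughly 0.017, showing how several weak signals can add up to a stronger gene- or pathway-level one.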
Identification of the virulence gene of Mycobacterium tuberculosis
Rabiu, Halimah Adenike (January 2007)
Includes bibliographical references (leaves -119). / The major thrust of this project is to identify and characterize potential virulence genes of M. tuberculosis. To this end, we compiled and integrated information from various public databases to catalogue 246573 microbial genes from 84 organisms, including pathogenic and non-pathogenic microbes. We determined their phylogenetic distributions by grouping the proteins into families based on sequence similarity, using BLASTP and the NCBI BLASTClust program.
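Grouping proteins into families from pairwise similarity hits amounts to finding connected components (single-linkage clustering) in a similarity graph. The Python sketch below does this with a union-find structure and then reports each family's phylogenetic distribution. It assumes the hit pairs have already been filtered by some similarity threshold upstream (for example from BLASTP output); it illustrates the idea and is not BLASTClust's implementation or parameters.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cluster_proteins(
    proteins: Iterable[str], hits: Iterable[Tuple[str, str]]
) -> List[Set[str]]:
    """Single-linkage families: connected components of the hit graph."""
    parent = {p: p for p in proteins}

    def find(p: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in hits:  # every id in hits must also appear in proteins
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families: Dict[str, Set[str]] = defaultdict(set)
    for p in parent:
        families[find(p)].add(p)
    return list(families.values())


def phylogenetic_distribution(
    families: List[Set[str]], organism_of: Dict[str, str]
) -> List[Set[str]]:
    """The set of organisms in which each protein family occurs."""
    return [{organism_of[p] for p in family} for family in families]
```

Families found only in pathogens, and absent from the non-pathogenic organisms, would then be natural candidates for closer inspection as potential virulence-associated genes.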
DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS
Wang, Kai (04 January 2019)
No description available.