81 |
RNA sequencing differential expression and small RNA analyses of obesity and BMI with post-mortem human brain
Wake, Christian 29 September 2019
Obesity, the accumulation of body fat to excess, may cause serious negative health effects including increased risk of heart disease, type 2 diabetes, stroke and certain cancers. RNA sequencing studies in the human brain related to obesity have not been previously undertaken. I conducted both large and small RNA sequencing of hypothalamus (207 samples) and nucleus accumbens (276 samples) from individuals defined as consistently obese (124 samples), consistently normal weight as controls (148 samples) or selected without respect to BMI and falling within neither case nor control definition (211 samples), based on longitudinal BMI measures. The samples were provided by three cohort studies with brain donation programs: the Framingham Heart Study, the Religious Orders Study and the Memory and Aging Project. For each brain region and large/small RNA sequencing set, differential expression analysis by obesity, BMI, brain region and sex was performed. Sixteen mRNAs and five microRNAs are differentially expressed (adjusted p < 0.05) by obesity or BMI in these tissues. Some genes, such as APOBR and CES1, and some gene sets, such as Reactome’s “opioid signaling”, yielded findings with interesting implications.
The small RNA sequencing data was used for novel analyses of microRNAs (miRNAs), discovering novel miRNAs and characterizing post-transcriptionally edited miRNAs (isomiRs). A custom miRNA identification analysis pipeline was built, which utilizes miRDeep* miRNA identification and result filtering based on false positive rate estimates. With this analysis I discovered over 300 novel miRNAs. The isomiR analysis included isomiR-specific read filtering based on genome alignment, and generated a set of isomiR reads which show editing patterns that are non-random with respect to the position and nucleotide of the edit. Specifically, purine substitution, pyrimidine substitution and 3’ polyadenylation and polyuridylation are commonly observed. The patterns of editing revealed that some miRNAs are almost always edited while others are very rarely edited. I developed a novel statistical test to determine differences in the isomiR profiles of individual miRNAs between two sets of samples. This method revealed 58 miRNAs with differentially edited isomiRs between the two brain regions, but none when comparing obese with control samples or male with female samples.
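The "adjusted p < 0.05" threshold quoted in this abstract implies a multiple-testing correction. The abstract does not say which correction was used; the sketch below assumes the Benjamini-Hochberg procedure, a common choice in differential expression work. The function names are illustrative and are not taken from the dissertation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as an example of an adjustment;
    the dissertation's actual method is not stated in the abstract.
    """
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, alpha=0.05):
    """Indices of tests whose adjusted p-value falls below alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < alpha]
```

Calling `significant(raw_p_values)` on the per-gene raw p-values would return the positions of genes that pass the 0.05 cut-off under this assumed correction.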
|
82 |
Genomic biomarker development to impact clinical management of patients at risk for lung cancer
Zhang, Jiarui 20 June 2020
Lung cancer is the leading cause of cancer mortality in the US and the world, largely due to the challenges with early detection and precision management of aggressive cancer. We previously derived and validated bronchial and nasal epithelial gene expression biomarkers to detect lung cancer among individuals undergoing clinical workup for suspected lung cancer. However, there are continuing challenges and needs to better understand lung cancer airway biology and ultimately impact clinical management: 1. Whether airway genomic classifiers could be developed to detect cancer among patients with indeterminate pulmonary nodules; 2. What the airway cellular and molecular subtypes are and whether they can improve lung cancer diagnosis; 3. Whether molecular and histological subtype profiling based on lung adenocarcinoma gene expression would impact pre-/post-surgical management by indolence and aggressiveness prediction.
To address these goals, I first developed a cancer biomarker based on nasal airway gene expression alterations, and improved clinical model prediction among patients with indeterminate pulmonary nodules. Next, I leveraged both bulk and single-cell bronchial airway gene expression data from patients with different lung cancer subtypes, and identified the molecular and cellular changes associated with adenocarcinoma vs. squamous cell carcinoma. This finding facilitated the development of a lung cancer subtype biomarker that improved the diagnostic accuracy of the previous lung cancer classifier. Finally, I leveraged tumor gene expression data from clinical stage I lung adenocarcinomas from a screening population, and identified solid-, micropapillary- and cribriform-specific gene signatures. A classifier predictive of aggressive histologic features was developed with potential to predict histologic aggressiveness from pre-surgical tumor biopsies where all histologic patterns may not be represented. Such a biomarker may be useful in guiding clinical decision making, including the extent of surgical resection.
This dissertation discusses the potential for these biomarkers to have clinical utility in patients with or at risk for lung cancer. / 2022-06-19T00:00:00Z
|
83 |
The evolutionary impacts of secondary structures within genomes of eukaryote-infecting single-stranded DNA viruses
Muhire, Brejnev Muhizi January 2015
Includes bibliographical references / Secondary structures forming through base-pairing in virus genomes have been proven to regulate several processes during viral replication cycles, including genome replication, transcription, post-transcriptional activities, protein synthesis, genome packaging, generation of viral sub-genomes and evasion of host-cell immune responses. Although computational DNA/RNA folding methods based on free energy minimisation approaches are capable of predicting structures that form within virus genomes, these methods are not entirely accurate. Notably, many of the structures that are accurately predicted will likely have no biological importance within the genomes in which they reside because even randomly generated single-stranded RNA/DNA sequences will form stable secondary structures. Nevertheless, with additional genome evolution analyses involving the detection of natural selection, sequence co-evolution, and genetic recombination, it is possible to both validate the existence of, and infer the biological importance of, computationally predicted structures. Here I implement and deploy free bioinformatics tools to (1) automate the classification of nucleotide and protein sequences into datasets useful for downstream molecular evolution analyses; (2) improve the accuracy of computational virus-genome-scale secondary structure prediction; (3) enable the identification of biologically relevant secondary structures using signals of purifying selection, coevolution and recombination within aligned sequence datasets; and (4) enable efficient visualisation of structural and selection data for better characterisation of individual secondary structural elements.
Using these tools I carried out large-scale studies that predicted and characterised novel functional secondary structures that potentially regulate transcription, translation, gene splicing, and replication within the genomes of eukaryote-infecting ssDNA viruses (Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae). I show that purifying selection tends to be stronger at base-paired sites than it is at unpaired sites and, wherever mutations are tolerable within paired regions, I demonstrate that there exist strong associations between base-pairing and complementary coevolution. Finally, I show that the recombinant genomes of some, but not all, eukaryote-infecting ssDNA virus groups display weak evidence of both homologous and non-homologous recombination break-points preferentially occurring at genome sites that minimally disrupt secondary structures. Altogether, these results suggest that natural selection acting to maintain important biologically functional secondary structural elements has been a major process during the evolution of eukaryote-infecting ssDNA viruses.
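The association between base-pairing and complementary coevolution described in this abstract can be illustrated with a very small check on an alignment. The sketch below is not the method used in the thesis: it only asks, for two alignment columns predicted to pair, how often the bases are Watson-Crick complementary and how many different complementary pair types occur (more than one type hints at compensatory change). The function name, the restriction to Watson-Crick DNA pairs and the toy alignment are all assumptions made for illustration.

```python
# Watson-Crick pairs for DNA only; wobble pairs are deliberately ignored
# in this simplified, hypothetical example.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def pair_support(alignment, i, j):
    """Summarise how well columns i and j of an alignment support a base pair.

    Returns a tuple:
      (fraction of sequences with a complementary pair at i and j,
       number of distinct complementary pair types observed).
    """
    pairs = [(seq[i].upper(), seq[j].upper()) for seq in alignment]
    complementary = [p for p in pairs if p in WATSON_CRICK]
    fraction = len(complementary) / len(pairs) if pairs else 0.0
    return fraction, len(set(complementary))


# Toy alignment (made up): columns 0 and 4 pair in two of three sequences,
# using two different pair types (G-C and C-G).
toy_alignment = ["GATTC", "CATTG", "GATTA"]
support, pair_types = pair_support(toy_alignment, 0, 4)
```

A real analysis would also need to account for phylogenetic relatedness among the sequences, which this sketch ignores.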
|
84 |
Statistical and computational methods for addressing heterogeneity in genomic data
Zhang, Yuqing 16 July 2020
Heterogeneity describes any variability across different datasets. In genomic studies which profile gene expression levels, heterogeneity is ubiquitous and may bring challenges to the integrative analysis of multiple datasets. Considerable effort is therefore needed to understand and address its impact. In this dissertation, I have developed novel statistical models and computational software for this purpose. I derived reference-batch ComBat and ComBat-Seq, two improved models based on the state-of-the-art method, ComBat, for addressing one particular type of heterogeneity known as “batch effects”. I showed their benefits compared to the existing methods in several data types and situations, and implemented these models in publicly available software. Then, I created systematic simulations to explore the impact of common study heterogeneity on the independent validation of genomic prediction models, showing that the most identifiable sources of heterogeneity are not the primary ones affecting the validation of genomic predictors. Finally, I adapted a solution using cross-study ensemble learning to train predictors with generalizable independent performance, to address the unwanted impact of batch effects on prediction. I compared this new framework with the traditional approach for batch correction, showing that cross-study learning may provide a more robust-performing model in independent validation. Results in this dissertation provide insights and guidelines for working with heterogeneous gene expression profiling datasets in practice, and encourage further investigation on understanding and addressing heterogeneity in genomic studies.
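To make the idea of a batch effect concrete, the following sketch applies a naive per-gene location and scale adjustment: values are standardised within each batch and then mapped back to the pooled mean and standard deviation. This is explicitly not ComBat, reference-batch ComBat or ComBat-Seq, which use more elaborate models; the function name and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def center_scale_by_batch(values, batches):
    """Naive batch adjustment for one gene (illustration only, not ComBat).

    values  : expression values for one gene, one per sample
    batches : batch label for each sample, same length as values
    """
    grand_mean = mean(values)
    grand_sd = pstdev(values)
    adjusted = [0.0] * len(values)
    for label in set(batches):
        idx = [k for k, b in enumerate(batches) if b == label]
        batch_vals = [values[k] for k in idx]
        m = mean(batch_vals)
        s = pstdev(batch_vals) or 1.0  # guard against zero variance
        for k in idx:
            # Standardise within the batch, then restore the pooled scale.
            adjusted[k] = (values[k] - m) / s * grand_sd + grand_mean
    return adjusted


# Toy data (made up): batch "b" is shifted upward by about 10 units.
toy_values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
toy_batches = ["a", "a", "a", "b", "b", "b"]
toy_adjusted = center_scale_by_batch(toy_values, toy_batches)
```

After the adjustment both batches share the same mean, which removes the shift but would also remove any genuine biological difference confounded with batch; that limitation is one reason more careful models are used in practice.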
|
85 |
Integration and visualisation of data in bioinformatics
Salazar, Gustavo A January 2015
Includes bibliographical references / The most recent advances in laboratory techniques aimed at observing and measuring biological processes are characterised by their ability to generate large amounts of data. The more data we gather, the greater the chance of finding clues to understand the systems of life. This, however, is only true if the methods that analyse the generated data are efficient, effective, and robust enough to overcome the challenges intrinsic to the management of big data. The computational tools designed to overcome these challenges should also take into account the requirements of current research. Science demands specialised knowledge for understanding the particularities of each study; in addition, it is seldom possible to describe a single observation without considering its relationship with other processes, entities or systems. This thesis explores two closely related fields: the integration and visualisation of biological data. We believe that these two branches of study are fundamental in the creation of scientific software tools that respond to the ever increasing needs of researchers. The distributed annotation system (DAS) is a community project that supports the integration of data from federated sources and its visualisation on web and stand-alone clients. We have extended the DAS protocol to improve its search capabilities and also to support feature annotation by the community. We have also collaborated on the implementation of MyDAS, a server to facilitate the publication of biological data following the DAS protocol, and contributed to the design of the protein DAS client called DASty. Furthermore, we have developed a tool called probeSearcher, which uses the DAS technology to facilitate the identification of microarray chips that include probes for regions on proteins of interest. Another community project in which we participated is BioJS, an open source library of visualisation components for biological data.
This thesis includes a description of the project, our contributions to it and some of the components we developed for it. Finally, and most importantly, we combined several BioJS components over a modular architecture to create PINV, a web-based visualiser of protein-protein interaction (PPI) networks that takes advantage of the features of modern web technologies in order to explore PPI datasets on an almost ubiquitous platform (the web) and facilitates collaboration between scientific peers. This thesis includes a description of the design and development processes of PINV, as well as current use cases that have benefited from the tool and whose feedback has been the source of several improvements to PINV. Collectively, this thesis describes novel software tools that, by using modern web technologies, facilitate the integration, exploration and visualisation of biological data, which has the potential to contribute to our understanding of the systems of life.
|
86 |
Regulated T cell pre-mRNA splicing as genetic marker of T cell suppression
Mofolo, Boitumelo January 2012
Includes abstract.
Includes bibliographical references.
|
87 |
Revisiting and re-computing the X-score scoring function
Mambo, Hilaire Mobele January 2014
Includes bibliographical references. / Scoring functions seek to compute in different ways protein-ligand binding energies by summing together the individual pairwise atomic interaction energies observed in crystal structures between the protein and the bound ligand. To date though, accurate prediction remains a big challenge since existing scoring functions fail to reproduce known binding energies with a sufficient degree of accuracy and robustness. To overcome this problem, we assign a discrete weighting to the individual atomic interaction to account for entropic desolvation factors on ligand binding. We thereafter re-compute the revised scoring function and test the output against multiple sets of data to examine the robustness of the heuristic weightings used.
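The weighting idea in this abstract can be sketched as a weighted sum of pairwise interaction energies. The example below is a schematic only: the interaction types, energies and weights are invented for illustration and do not reproduce the terms of X-score or the weightings derived in the thesis.

```python
def score_complex(interactions, weights, default_weight=1.0):
    """Weighted sum of pairwise atomic interaction energies.

    interactions   : list of (interaction_type, energy) tuples
    weights        : dict mapping interaction_type to a discrete weight
    default_weight : weight applied to types absent from `weights`

    All names and values here are hypothetical placeholders.
    """
    return sum(
        weights.get(kind, default_weight) * energy
        for kind, energy in interactions
    )


# Made-up example: hydrogen bonds are up-weighted, other terms keep weight 1.
toy_interactions = [("hbond", -2.0), ("vdw", -0.5), ("hydrophobic", -1.0)]
toy_weights = {"hbond": 1.5, "vdw": 1.0}
toy_score = score_complex(toy_interactions, toy_weights)
```

Testing robustness, as the abstract describes, would then amount to recomputing such scores over several reference datasets and comparing them with known binding energies.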
|
88 |
Evaluating the predictive performance of cytotoxic T lymphocyte epitope prediction tools using Elispot assay data
Meraba, Rebone Leboreng January 2018
Computational T-cell epitope prediction tools have been previously devised to predict potential human leukocyte antigen (HLA) binding peptides from protein sequences. These tools complement enzyme-linked immunosorbent spot (ELISpot) assays - a very commonly applied immunological technique that is used both to identify regions of pathogen genomes that trigger an immune response and to characterize the relationships between an individual's complement of HLA alleles and the degree of immunity that they display. If computational tools could accurately predict HLA-peptide binding, then these tools might be usable as a cheap and reliable alternative to ELISpot assays. A web-based IFN γ ELISpot assay dataset sharing resource, called IMMUNO-SHARE, was developed to enable the simple and straightforward storage and dissemination amongst researchers of large volumes of IFN γ ELISpot assay data. Such experimental data was next used to make HLA-peptide binding predictions with four frequently used T-cell epitope prediction tools - netMHC 3.2, IEDB_ANN, IEDB_ARB Matrix and IEDB_SMM. The predictive performances of all four tools, individually and collectively, were statistically assessed using non-parametric Spearman rank-order correlation tests. It was found that none of the four tested tools yielded binding affinity predictions that were detectably correlated with the observed ELISpot data. High false positive rates, where high predicted binding affinities between peptides and patient HLAs corresponded in these patients with no appreciable immune responses, were apparent for all four of the tested methods. The low degree of correlation between ELISpot data and HLA-peptide binding predictions and, in particular, high false positive rates and relatively low true positive and true negative rates, indicate that the four tested tools would require substantial improvement before they could be seen as a viable alternative to ELISpot assays.
Given that the accuracy of predictions of each of the four methods tested is largely dependent on both the quantity and quality of known true binder and true non-binder datasets that were used to train the HLA-peptide binding prediction methods implemented by the tools, it is plausible that the accuracy of these tools could be increased with larger training datasets. Retraining either the current methods or the next generation of prediction tools would therefore be greatly facilitated by the availability of large quantities of publicly available HLA-peptide binding interaction information. It is hoped that IMMUNO-SHARE or some other ELISpot data sharing resource could eventually meet this need.
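The Spearman rank-order correlation used in this evaluation is the Pearson correlation of the ranks of two variables, with tied values given their average rank. The sketch below implements that definition in plain Python as an illustration; it is not the code used in the thesis, the function names are hypothetical, and it returns only the coefficient, not a significance test.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In the setting described above, `x` would hold predicted binding affinities and `y` the observed ELISpot responses for the same peptide and HLA combinations; a coefficient near zero corresponds to the lack of detectable correlation the study reports.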
|
89 |
Influence of gut microbiota on immune system in infants
Kachambwa, Paidamoyo January 2017
Background and Methods: Microbiota play many significant, direct or indirect, beneficial and detrimental roles in humans. Microbiome development is established at infancy, where diet plays a directive role in the proliferation of gut microbes. The presence of a defined set of microbes has been shown to increase the overall immunological capacity, which vaccines depend on to be effective. To date, little work has been done on the effect of the microbiota on the immune system at infancy; thus an analysis of the microbial ecology present in the infant's gut and its correlation with immune activation is needed. Expression of genes involved in mediating and regulating immunity can be measured as an indicator of immune activity. Vaccines work by stimulating an immune response which can be measured by gene expression levels. This affects the infant's ability to establish a strong immune system, which is also dictated at infancy. 16S rRNA sequence data generated from 134 infant stool samples, at vaccination points 0, 6 and 14 weeks from infants that were either breast or formula fed, was analysed using the Quantitative Insights Into Microbial Ecology (QIIME) pipeline to detect different taxonomic groups that make up a particular microbiome. Statistical analysis in R was used to quantify the diversity of the different microbial groups in the gut. Expression levels of immune-related genes were measured from blood samples that were stimulated by a Bacillus Calmette–Guérin (BCG) antigen and correlated with microbiota compositions. Results and Conclusion: Microbiome data showed initial differentiation between breast and mixed fed infants. 15% of 5 of the most abundant bacteria for breast fed infants were Bifidobacteriales, which are known for their probiotic properties. The data did not fully cluster as the oldest samples were taken quite early at 14 weeks. Individual bacteria were correlated with individual gene expression level data.
The study shows the relative abundance of particular bacteria, compares it against feeding modality and demonstrates how the microbiota correlates with gene expression levels. At week 14, Bifidobacterium of abundance below 0 (heatmap log₁₀ scale) generally correlated with high CASP3 gene expression levels in breast fed babies while abundances above 1 correlated with low gene expression levels. Gene expression at abnormal levels usually has undesirable effects which result in dysfunctional immune reactions that lead to conditions ranging from autoimmune diseases to cancer.
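Relative abundance and diversity, both mentioned in this abstract, can be computed from per-taxon read counts. The study itself used QIIME and R; the Python sketch below is only an illustration of the two quantities, using the Shannon index as an assumed example of a diversity measure (the abstract does not name the index used). The counts are invented.

```python
import math


def relative_abundance(counts):
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return {taxon: c / total for taxon, c in counts.items()}


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    proportions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Made-up counts for one stool sample; taxon names are used only as labels.
toy_counts = {"Bifidobacteriales": 15, "Lactobacillales": 45, "Other": 40}
toy_proportions = relative_abundance(toy_counts)
toy_diversity = shannon_index(toy_counts)
```

Per-sample proportions of this kind are what would then be correlated with gene expression levels, for example on a log₁₀ scale as in the heatmap the abstract refers to.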
|
90 |
Characterisation of the metabolome of Mycobacterium tuberculosis to identify new pathways and pathway holes
Wolfenden, Kristen Marie January 2014
Includes abstract. / Includes bibliographical references. / Due to high incidence rates and the development of new drug-resistant or multidrug-resistant strains of TB, the development of new medicines and treatments for tuberculosis is a necessity. In order to develop these drugs, Mycobacterium tuberculosis (Mtb) needs to be studied more completely; this study performs a characterisation of the metabolome of Mtb and comparison across the phylogenetic profile to identify notable pathways.
|