Global ETD Search

291	The IncP-9 plasmid group : characterisation of genomic sequences and development of tools for environmental monitoring Greated, Alicia January 2000 (has links) No description available. 572.8 Bioinformatics; Bioremediation
292	Purification and characterisation of plasmodium falciparum Hypoxanthine phosphoribosyltransferase. Murungi, Edwin Kimathi January 2007 (has links) <p>Malaria remains the most important parasitic disease worldwide. It is estimated that over 500 million infections and more than 2.7 million deaths arising from malaria occur each year. Most (90%) of the infections occur in Africa, with the most affected groups being children under five years of age and women. This dire situation is exacerbated by the emergence of drug-resistant strains of Plasmodium falciparum. The work reported in this thesis focuses on improving the purification of PfHPRT by investigating the characteristics of anion exchange DE-52 chromatography (the first stage of purification), developing an HPLC gel filtration method for examining the quaternary structure of the protein and for possible end-stage purification, and conducting initial crystallization trials. A homology model of the open, unliganded PfHPRT is constructed using the atomic structures of human, T. cruzi and S. typhimurium HPRT as templates.</p> Plasmodium falciparum Bioinformatics Malaria.
293	Positive-Unlabeled Learning in the Context of Protein Function Prediction Youngs, Noah 19 December 2014 (has links) <p> With the recent proliferation of large, unlabeled data sets, a particular subclass of semisupervised learning problems has become more prevalent. Known as positive-unlabeled learning (PU learning), this scenario provides only positive labeled examples, usually just a small fraction of the entire dataset, with the remaining examples unknown and thus potentially belonging to either the positive or negative class. Since the vast majority of traditional machine learning classifiers require both positive and negative examples in the training set, a new class of algorithms has been developed to deal with PU learning problems.</p><p> A canonical example of this scenario is topic labeling of a large corpus of documents. Once the size of a corpus reaches into the thousands, it becomes largely infeasible to have a curator read even a sizable fraction of the documents, and annotate them with topics. In addition, the entire set of topics may not be known, or may change over time, making it impossible for a curator to annotate which documents are NOT about certain topics. Thus a machine learning algorithm needs to be able to learn from a small set of positive examples, without knowledge of the negative class, and knowing that the unlabeled training examples may contain an arbitrary number of additional but as yet unknown positive examples. </p><p> Another example of a PU learning scenario recently garnering attention is the protein function prediction problem (PFP problem). While the number of organisms with fully sequenced genomes continues to grow, the progress of annotating those sequences with the biological functions that they perform lags far behind. 
Machine learning methods have already been successfully applied to this problem, but with many organisms having a small number of positive annotated training examples, and the lack of availability of almost any labeled negative examples, PU learning algorithms have the potential to make large gains in predictive performance.</p><p> The first part of this dissertation motivates the protein function prediction problem, explores previous work, and introduces novel methods that improve upon previously reported benchmarks for a particular type of learning algorithm, known as Gaussian Random Field Label Propagation (GRFLP). In addition, we present improvements to the computational efficiency of the GRFLP algorithm, and a modification to the traditional structure of the PFP learning problem that allows for simultaneous prediction across multiple species.</p><p> The second part of the dissertation focuses specifically on the positive-unlabeled aspects of the PFP problem. Two novel algorithms are presented, and rigorously compared to existing PU learning techniques in the context of protein function prediction. Additionally, we take a step back and examine some of the theoretical considerations of the PU scenario in general, and provide an additional novel algorithm applicable in any PU context. This algorithm is tailored for situations in which the labeled positive examples are a small fraction of the set of true positive examples, and where the labeling process may be subject to some type of bias rather than being a random selection of true positives (arguably some of the most difficult PU learning scenarios).</p><p> The third and fourth sections return to the PFP problem, examining the power of tertiary structure as a predictor of protein function, as well as presenting two case studies of function prediction performance on novel benchmarks. 
Lastly, we conclude with several promising avenues of future research into both PU learning in general, and the protein function prediction problem specifically. </p> Biology, Bioinformatics\|Computer Science
294	RNA-protein structure classifiers incorporated into second-generation statistical potentials Kimura, Takayuki 11 February 2017 (has links) <p> Computational modeling of RNA-protein interactions remains an important endeavor. However, exclusively all-atom approaches that model RNA-protein interactions via molecular dynamics are often problematic in their application. One possible alternative is the implementation of hierarchical approaches, first efficiently exploring configurational space with a coarse-grained representation of the RNA and protein. Subsequently, the lowest energy set of such coarse-grained models can be used as scaffolds for all-atom placements, a standard method in modeling protein 3D-structure. However, the coarse-grained modeling likely will require improved ribonucleotide-amino acid potentials as applied to coarse-grained structures. As a first step we downloaded 1,345 PDB files and clustered them with PISCES to obtain a non-redundant complex data set. The contacts were divided into nine types with DSSR according to the 3D structure of RNA and then 9 sets of potentials were calculated. The potentials were applied to score fifty thousand poses generated by FTDock for twenty-one standard RNA-protein complexes. The results compare favorably to existing RNA-protein potentials. Future research will optimize and test such combined potentials. </p> Biochemistry\|Bioinformatics\|Biophysics
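The abstract above does not state the functional form of its potentials. A common choice for knowledge-based potentials is the inverse Boltzmann relation, E = -ln(N_obs / N_exp), and the sketch below assumes that form. The function name, the independence-based reference state, and the toy contact counts are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def contact_potentials(contacts):
    """Return {(residue, nucleotide): energy} from a list of observed contact pairs.

    Assumes an inverse Boltzmann form, E = -ln(N_obs / N_exp), where N_exp is
    derived from the marginal counts of each partner (independence reference
    state). Both choices are assumptions made for illustration.
    """
    pair_counts = Counter(contacts)
    res_counts = Counter(res for res, _ in contacts)
    nuc_counts = Counter(nuc for _, nuc in contacts)
    total = len(contacts)

    potentials = {}
    for (res, nuc), n_obs in pair_counts.items():
        # Expected count if residue and nucleotide identities were independent.
        n_exp = res_counts[res] * nuc_counts[nuc] / total
        potentials[(res, nuc)] = -math.log(n_obs / n_exp)
    return potentials


# Toy data: arginine contacts guanine more often than chance, leucine less often.
toy_contacts = (
    [("ARG", "G")] * 6 + [("ARG", "U")] * 2
    + [("LEU", "G")] * 2 + [("LEU", "U")] * 6
)
potentials = contact_potentials(toy_contacts)
```

Pairs observed more often than expected receive negative (favorable) energies, and under-represented pairs receive positive ones. With nine contact types, as in the abstract, one such table would presumably be built per type from that type's contacts only.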
295	Venomics and Functional Analysis of Venom From the Emerald Jewel Wasp, Ampulex compressa Arvidson, Ryan Scott 07 September 2016 (has links) <p> My research involves biochemical analysis of venom from a fascinating parasitoid jewel wasp <i>Ampulex compressa.</i> Most parasitoid wasps envenomate the host by stinging into the body cavity to cause paralysis and developmental arrest, prior to deposition of eggs externally or within the body cavity. <i>A. compressa</i> instead uses a different subjugation strategy by injecting venom directly into the central nervous system, eliciting a behavioral sequence culminating in hypokinesia, a 7–10 day lethargy advantageous to wasp reproduction. Hypokinesia is a specific, venom-induced behavioral state characterized by suppression of the escape response and reduced spontaneous walking, leaving other motor functions unaffected. This specificity of action is particularly unique among venoms and interestingly, effects of the venom on the escape response are reversible as the cockroach may recover after 7–10 days if not consumed by the wasp larvae. Venom-induced hypokinesia raises an interesting biological question: How can such a potent biochemical cocktail cause such long-lasting, specific, yet reversible effects on behavior? I approached this question in two ways: objective one—bioinformatic analysis of the venom and venom gland tissue to determine what the venom is made of, and objective two—functional analysis of key venom components to determine how the venom works. To address objective 1, I used advanced bioinformatics techniques to generate transcriptomes of the venom tissue and proteomes of the venom and venom tissue. Next generation sequencing of venom gland RNA has yielded full-length coding sequences and quantification of venom transcript levels, while mass spectroscopy based protein analysis has validated the presence of venom proteins. These analyses will allow construction of a comprehensive <i>A. 
compressa</i> “venome” that will help inform functional analyses of the venom and its role in hypokinesia induction. For objective two, I focused on the characterization of the most abundant peptide in the venom, tentatively named Ampulexin 1, and pharmacological analysis of an interesting venom peptide neurotransmitter, called tachykinin. Analysis of venom tachykinin action on cockroach brain receptors may reveal an interesting case of the evolution of a neurotransmitter from one animal to target the nervous system of another.</p> Biochemistry\|Bioinformatics\|Parasitology
296	Leveraging Mathematical Models to Predict Allosteric Hotspots in the Age of Deep Sequencing Clarke, Declan 16 September 2016 (has links) <p> A mathematical model is an abstraction that distills quantifiable behaviors and properties into a well-defined formalism in order to learn or predict something about a system. Such models may be as light as pencil-and-paper calculations on the back of an envelope or as heavy as to entail modern super computers. They may be as simple as predicting the trajectory of a baseball or as complex as forecasting the weather. By using macromolecular protein structures as substrates, the objective of this thesis is to improve upon and leverage mathematical models in order to address what is both a growing challenge and a burgeoning opportunity in the age of next-generation sequencing. The rapidly growing volume of data being produced by emerging deep sequencing technologies is enabling more in-depth analyses of protein conservation than previously possible. Increasingly, deep sequencing is bringing to light many disease-associated loci and localized signatures of strong conservation. These signatures in sequence space are the "shadows" of selective pressures that have been acting on proteins over the course of many years. However, despite the rapidly growing abundance of available data on such signatures, as well as the finer resolution with which they may be detected, an intuitive biophysical or functional rationale behind such genomic shadows is often missing (such intuition may otherwise be provided, for instance, by the need to engage in protein-protein interactions, undergo post-translational modification, or achieve a close-packed hydrophobic core). Allostery may frequently provide the missing conceptual link. Allosteric mechanisms act through changes in the dynamic behavior of protein architectures. 
Because selective evolutionary pressures often act through processes that are intrinsically dynamic in nature, static renderings can fail to provide any plausible rationale for constraint. In the work outlined here, models of protein conformational change are used to predict allosteric residues that either <i>a)</i> act as essential cavities on the protein surface which serve as sources or sinks in allosteric communication; or <i>b)</i> function as important information flow bottlenecks within the allosteric communication pathways of the protein interior. Though most existing approaches entail computationally expensive methods (such as MD) or rely on less direct measures (such as sequence features), the framework discussed herein is simultaneously both computationally tractable and fundamentally structural in nature – conformational change and topology are directly included in the search for allosteric residues – thereby enabling allosteric site prediction across the Protein Data Bank. Large-scale (i.e., general) properties of the predicted allosteric residues are then evaluated with respect to conservation. Multiple threads of evidence (using different sources of data and employing a variety of metrics) are used to demonstrate that the predicted allosteric residues tend to be significantly conserved across diverse evolutionary time scales. In addition, specific examples in which these residues can help to explain previously poorly understood disease-associated variants are discussed. Finally, a practical and computationally rapid software tool that enables users to perform this analysis on their own proteins of interest has been made available to the scientific public.</p>
297	Characterization of Influenza A Virus Infection through Analysis of Intrahost Viral Evolution and Within-host Infection Dynamics Sobel Leonard, Ashley Elizabeth January 2016 (has links) <p>Influenza A virus is a major source of morbidity and mortality, annually resulting in over 9000 deaths in the United States alone. As a segmented, RNA virus, influenza has a high mutation rate, facilitating its evolution to overcome cross protective immunity through natural selection and adapt to new host species or sources of evolutionary pressure through reassortment. The high viral mutation rate also means that these processes affect not only evolution at the population level, but also at the intrahost level. While these processes have been well characterized for population-level viral evolution, viral evolution within a single host is far less well defined. In this dissertation, I characterize influenza infection at the intrahost level with respect to viral evolution and infection dynamics. In the second chapter, I critically evaluate methods for estimating the transmission bottleneck size for influenza A virus from viral sequencing data. The transmission bottleneck describes the infecting population size, a determinant for the level of genetic diversity present at the onset of infection. I show current methods may be biased, both by the criteria used to identify sequencing variants and the presence of demographic stochasticity. In response to these biases, I introduce a new method that (1) corrects for differences in variant calling criteria and (2) accommodates demographic stochasticity. Chapters 3-5 are based on data collected from an existing human challenge study with influenza A virus. In this challenge study, volunteers were experimentally infected with a heterogeneous viral inoculum that had adapted to the conditions in which it had been generated. 
In chapter 3, I show that transmission was governed by a selective bottleneck and that subsequent intrahost viral evolution was dominated by purifying selection. In chapter 4, I further characterize the observed intrahost viral evolution through the reconstruction of viral haplotypes to evaluate different models of selection. These models differed by the level at which selection was acting, whether selection is focused on individual loci, multiple loci within a single gene segment, or across gene segments. Model selection favored the third model, wherein selection acted across gene segments, thereby establishing that the effective viral reassortment rate was limited in these subjects. In chapter 5, I develop a mathematical model for within-host influenza infection linking viral replication and the host immune response with the development of disease symptoms. I fit this model to experimental data collected from the challenge study. Analysis of the model fits indicated that much of the heterogeneity in the data between subjects could be explained by interindividual variation in viral infectivity. This finding echoed the results of chapters 3 and 4, that there were quantifiable differences in the infecting viral populations between the study subjects. Taken together, these observations suggest that </p><p>intrahost viral genetics may underlie the differences between the subjects’ response to infection.</p> / Dissertation Biology Bioinformatics Virology
298	Genetic Determinants of Salmonella and Campylobacter Required for In Vitro Fitness Mandal, Rabindra Kumar 15 December 2016 (has links) <p>Non-typhoidal Salmonella (NTS) and Campylobacter play a major role in foodborne illness caused by the consumption of food contaminated by pathogens worldwide. A comprehensive understanding of the genetic factors that increase the survival fitness of these foodborne pathogens will effectively help us formulate mitigation strategies without affecting the nutrition ecology. The objective of this study was to identify the genetic determinants of Salmonella and Campylobacter that are required for fitness under various in vitro conditions. For this purpose, we used high-throughput transposon sequencing (Tn-seq), which uses next-generation sequencing (NGS) to screen hundreds of thousands of mutants simultaneously. In Chapter 1, we reviewed the technical aspects of different Tn-seq methods along with their pros and cons, and gave a comprehensive summary of recently published studies using Tn-seq methods. In Chapter 2, we exposed a complex Tn5 library of Salmonella Typhimurium 14028S (S. Typhimurium) to in vitro conditions mimicking host stressors: acidic pH (pH 3) found in the stomach, osmotic stress (3% NaCl) and short-chain fatty acids (SCFAs, 100 mM propionate) found in the intestine, and oxidative stress (1 mM H2O2) and starvation (12-day survival in PBS) found in macrophages. There was an overlapping set of 339 conditionally essential genes (CEGs) required by S. Typhimurium to overcome these host stressors. In Chapter 3, we screened the S. Typhimurium Tn5 library for desiccation survival. Salmonella spp. are the most notable and frequent cause of contamination in low-water-activity foods. We identified 61 genes and 6 intergenic regions required for fitness during desiccation stress. In Chapter 4, the essential genome of Campylobacter jejuni (C. jejuni) NCTC 11168 and C. jejuni 81-176 was investigated using Tn-seq. 
We identified 166 essential protein-coding genes and 20 essential transfer RNAs (tRNAs) in C. jejuni NCTC 11168 that were intolerant to Tn5 insertions during in vitro growth. The reconstructed C. jejuni 81-176 library had 384 protein-coding genes with zero Tn5 insertions. The genetic determinants of Salmonella and Campylobacter identified in this study have high potential to be explored as food safety intervention, therapeutic and vaccine targets to curb the spread of these foodborne pathogens, making the world a safer place.</p>
299	Petal - A New Approach to Construct and Analyze Gene Co-Expression Networks in R Petereit, Julia 17 February 2017 (has links) <p> <b>petal</b> is a network analysis method that includes and takes advantage of precise Mathematics, Statistics, and Graph Theory, but remains practical to the life scientist. <b>petal</b> is built upon the assumption that large complex systems follow a scale-free and small-world network topology. One main intention of creating this program is to eliminate unnecessary noise and imprecision introduced by the user. Consequently, no user input parameters are required, and the program is designed to allow the two structural properties, scale-free and small-world, to govern the construction of network models. </p><p> The program is implemented in the statistical language <b>R</b> and is freely available as a package for download. Its package includes several simple <b>R</b> functions that the researcher can use to construct co-expression networks and extract gene groupings from a biologically meaningful network model. More advanced <b>R</b> users may use other functions for further downstream analyses, if desired. </p><p> The <b>petal</b> algorithm is discussed and its application demonstrated on several datasets. <b>petal</b> results show that the technique is capable of detecting biologically meaningful network modules from co-expression networks. That is, scientists can use this technique to identify groups of genes with possible similar function based on their expression information. </p><p> While this approach is motivated by whole-system gene expression data, the fundamental components of the method are transparent and can be applied to large datasets of many types, sizes, and stemming from various fields. </p>
300	Discovering driver somatic mutations, copy number alterations and methylation changes using Markov Chain Monte Carlo Yahya, Bokhari 11 December 2013 (has links) A tremendous amount of genetic data now needs to be interpreted. Somatic mutations, copy number variations and methylation are examples of the genetic data we are dealing with. Discovering driver mutations from these combined data types is challenging. Mutations are unpredictable and highly heterogeneous, which makes this goal hard to accomplish. Many methods have been proposed to unravel the genetics of cancer. In this project we work with the above-mentioned genetic data types, using and modifying an existing method based on Markov Chain Monte Carlo (MCMC). The method introduced two properties, coverage and exclusivity. We obtained the data from The Cancer Genome Atlas (TCGA). We applied the MCMC method to three cancer types: Glioblastoma Multiforme (GBM) with 214 patients, Breast Invasive Carcinoma (BRCA) with 474 patients and Colon Adenocarcinoma (COAD) with 233 patients. Bioinformatics Life Sciences
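The abstract names coverage and exclusivity but does not give a scoring formula. One common way to combine the two is to reward the number of patients covered by a gene set and penalize patients altered in more than one of its genes. The sketch below assumes that formulation. The function name, the gene labels, and the patient identifiers are hypothetical and do not come from the thesis or from TCGA.

```python
def coverage_exclusivity_score(gene_set, mutated_in):
    """Score a gene set: higher when many patients are covered and few are hit twice.

    mutated_in maps each gene to the set of patients altered in that gene.
    The score is (patients covered) minus (coverage overlap); this particular
    weighting is an assumption, not a formula quoted from the abstract.
    """
    covered = set().union(*(mutated_in[gene] for gene in gene_set))
    total_hits = sum(len(mutated_in[gene]) for gene in gene_set)
    overlap = total_hits - len(covered)  # extra hits on already-covered patients
    return len(covered) - overlap


# Toy alteration data for three hypothetical genes across four patients.
mutated_in = {
    "GENE_A": {"p1", "p2"},
    "GENE_B": {"p3", "p4"},
    "GENE_C": {"p1", "p3"},
}

exclusive_pair = coverage_exclusivity_score(["GENE_A", "GENE_B"], mutated_in)
overlapping_pair = coverage_exclusivity_score(["GENE_A", "GENE_C"], mutated_in)
```

Here the mutually exclusive pair covers all four patients with no overlap and scores 4, while the overlapping pair covers three patients with one double hit and scores 2. An MCMC sampler would typically propose swapping genes in and out of the set and accept moves based on the change in such a score; that sampling step is not shown.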

Search results