Global ETD Search

71	Measuring deviation from a deeply conserved consensus in protein multiple sequence alignments Mokin, Sergey January 2008 No description available. Biology - Bioinformatics
72	The viral genomics revolution: Big data approaches to basic viral research, surveillance, and vaccine development Schobel, Seth Adam Micah 19 February 2016 <p> Since the decoding of the first RNA virus in 1976, the field of viral genomics has exploded, first through the use of Sanger sequencing technologies and later with the use of next-generation sequencing approaches. With the development of these sequencing technologies, viral genomics has entered an era of big data. New challenges for analyzing these data are now apparent. Here, we describe novel methods to extend the current capabilities of viral comparative genomics. Through the use of antigenic distancing techniques, we have examined the relationship between the antigenic phenotype and the genetic content of influenza virus to establish a more systematic approach to viral surveillance and vaccine selection. Distancing of Antigenicity by Sequence-based Hierarchical Clustering (DASH) was developed and used to perform a retrospective analysis of 22 influenza seasons. Our methods produced vaccine candidates identical to or with a high concordance of antigenic similarity with those selected by the WHO. In a second effort, we have developed VirComp and OrionPlot: two independent yet related tools. These tools first generate gene-based genome constellations, or genotypes, of viral genomes, and second create visualizations of the resultant genome constellations. VirComp utilizes sequence-clustering techniques to infer genome constellations and prepares genome constellation data matrices for visualization with OrionPlot. OrionPlot is a Java application for tailoring genome constellation figures for publication. OrionPlot allows for color selection of gene cluster assignments, customized box sizes to enable the visualization of gene comparisons based on sequence length, and label coloring.
We have provided five analyses designed as vignettes to illustrate the utility of our tools for performing viral comparative genomic analyses. Study three focused on the analysis of respiratory syncytial virus (RSV) genomes circulating during the 2012-2013 RSV season. We discovered a correlation between a recent tandem duplication within the G gene of RSV-A and a decrease in severity of infection. Our data suggest that this duplication is associated with a higher infection rate in female infants than is generally observed. Through these studies, we have extended the state of the art of genotype analysis and phenotype/genotype studies, and established correlations between clinical metadata and RSV sequence data.</p> Biology|Bioinformatics|Virology
73	Identification of drug sensitive gene motifs using "epigenetic profiles" derived from bioinformatics databases Nelson, Jonathan M. 14 June 2016 <p> The use of epigenetic modifying drugs such as DNA methyltransferase inhibitors (DNMTi) and histone deacetylase inhibitors (HDACi) is becoming more common in the treatment of cancer. Currently, there is a profound interest in determining predictive biomarkers for patient response and the efficacy of known and novel drugs. There are likely distinct “epigenetic profiles” defined by the location and abundance of DNA methylation patterns and histone modifications. Here we propose to investigate the response of a selected subset of genes to particular DNMTi and HDACi treatments in two human cancer cell lines, colorectal carcinoma HCT-116 and liver adenocarcinoma HepG2. In this study we identified unique epigenetic profiles, based on microarray- and bioinformatics-derived epigenetic data, that are predictive of the response to epigenetic drug treatment. Microarray studies were used to identify re-activated genes common to two different cancer cell types treated with epigenetic drugs. Bioinformatics data were compiled on these genes and correlated against re-expression to construct the genes’ “epigenetic profile”. We then verified the response of the select group of genes in HCT-116 and HepG2 upon treatment at varying concentrations of epigenetic drugs and illustrated the selective reactivation of the target gene. Additionally, two novel genes were introduced and one selectively activated over another. </p><p> Further research would prove invaluable for the medical and drug development communities, as a more extensive model would certainly be of use in determining patient response to drug treatment based on individual epigenetic profiles and could lead to more successful novel drug design.</p> Biology|Cellular biology|Bioinformatics
74	Positive-Unlabeled Learning in the Context of Protein Function Prediction Youngs, Noah 19 December 2014 <p> With the recent proliferation of large, unlabeled data sets, a particular subclass of semisupervised learning problems has become more prevalent. Known as positive-unlabeled learning (PU learning), this scenario provides only positive labeled examples, usually just a small fraction of the entire dataset, with the remaining examples unknown and thus potentially belonging to either the positive or negative class. Since the vast majority of traditional machine learning classifiers require both positive and negative examples in the training set, a new class of algorithms has been developed to deal with PU learning problems.</p><p> A canonical example of this scenario is topic labeling of a large corpus of documents. Once the size of a corpus reaches into the thousands, it becomes largely infeasible to have a curator read even a sizable fraction of the documents, and annotate them with topics. In addition, the entire set of topics may not be known, or may change over time, making it impossible for a curator to annotate which documents are NOT about certain topics. Thus a machine learning algorithm needs to be able to learn from a small set of positive examples, without knowledge of the negative class, and knowing that the unlabeled training examples may contain an arbitrary number of additional but as yet unknown positive examples. </p><p> Another example of a PU learning scenario recently garnering attention is the protein function prediction problem (PFP problem). While the number of organisms with fully sequenced genomes continues to grow, the progress of annotating those sequences with the biological functions that they perform lags far behind.
Machine learning methods have already been successfully applied to this problem, but with many organisms having a small number of positive annotated training examples, and the lack of availability of almost any labeled negative examples, PU learning algorithms have the potential to make large gains in predictive performance.</p><p> The first part of this dissertation motivates the protein function prediction problem, explores previous work, and introduces novel methods that improve upon previously reported benchmarks for a particular type of learning algorithm, known as Gaussian Random Field Label Propagation (GRFLP). In addition, we present improvements to the computational efficiency of the GRFLP algorithm, and a modification to the traditional structure of the PFP learning problem that allows for simultaneous prediction across multiple species.</p><p> The second part of the dissertation focuses specifically on the positive-unlabeled aspects of the PFP problem. Two novel algorithms are presented, and rigorously compared to existing PU learning techniques in the context of protein function prediction. Additionally, we take a step back and examine some of the theoretical considerations of the PU scenario in general, and provide an additional novel algorithm applicable in any PU context. This algorithm is tailored for situations in which the labeled positive examples are a small fraction of the set of true positive examples, and where the labeling process may be subject to some type of bias rather than being a random selection of true positives (arguably some of the most difficult PU learning scenarios).</p><p> The third and fourth sections return to the PFP problem, examining the power of tertiary structure as a predictor of protein function, as well as presenting two case studies of function prediction performance on novel benchmarks. 
Lastly, we conclude with several promising avenues of future research into both PU learning in general, and the protein function prediction problem specifically. </p> Biology, Bioinformatics|Computer Science
75	Dinoflagellate genomic organization and phylogenetic marker discovery utilizing deep sequencing data Mendez, Gregory Scott 01 October 2016 <p> Dinoflagellates possess large genomes in which most genes are present in many copies. This has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short read sequencing methods, and develops new techniques that utilize deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene Alcohol Dehydrogenase (AHD) was sequenced and analyzed. The organization of this clone was found to be counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline, incorporating methods that leverage the statistical power of large data sets, was written. A case study on Stramenopiles was undertaken to test its utility in resolving relationships between known groups as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles, and placed all unknown taxa on the tree with strong bootstrap support. This pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates. Phylogenetic analysis of 58 dinoflagellates, using this set of markers, produced a phylogeny with good support of all branches.
The <i>Suessiales</i> were found to be sister to the <i>Peridinales.</i> The <i>Prorocentrales</i> formed a monophyletic group with the <i>Dinophysiales</i> that was sister to the <i>Gonyaulacales.</i> The <i>Gymnodinales</i> was found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for the discovery of horizontally transferred genes, and for the separation of sequences in metagenomic data sets.</p> Biology|Molecular biology|Bioinformatics
76	The structural and functional landscape of protein superfamilies: From the thioredoxin fold to parasite peptidases Atkinson, Holly J. January 2009 Thesis (Ph.D.)--University of California, San Francisco, 2009. / Source: Dissertation Abstracts International, Volume: 70-06, Section: B, page: 3484. Adviser: Patricia C. Babbitt.
77	Enhanced bioinformatics data modeling concepts and their use in querying and integration Ji, Feng January 2008 Thesis (Ph.D.)--University of Texas at Arlington, 2008.
78	Evolutionary coupling in multisubunit membrane protein complexes Natarajan, Shreedhar January 2008 Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2008. / Source: Dissertation Abstracts International, Volume: 69-05, Section: B, page: 2701. Adviser: Eric Jakobsson. Includes bibliographical references (leaves 120-147). Available on microfilm from ProQuest Information and Learning.
79	Application of Graph Theoretic Clustering on Some Biomedical Data Sets Ahlert, Darla 11 June 2015 <p> Clustering algorithms have become a popular way to analyze biomedical data sets, and in particular gene expression data. Since these data sets are often large, it is difficult to gather useful information from them as a whole. Clustering is a proven method to extract knowledge about the data that can eventually lead to many discoveries in the biological world. Hierarchical clustering is used frequently to interpret gene expression data, but recently, graph-theoretic clustering algorithms have started to gain some traction for analysis of this type of data. We consider five graph-theoretic clustering algorithms run over a post-mortem gene expression dataset, as well as a few different biomedical data sets, in which the ground truth, or class label, is known for each data point. We then externally evaluate the algorithms based on the accuracy of the resulting clusters against the ground truth clusters. Comparing the results of each of the algorithms run over all of the datasets, we found that our algorithms are efficient on the real biomedical datasets but find gene expression data especially difficult to handle.</p> Biology, Bioinformatics|Computer Science
80	Identification of Dermacentor andersoni saliva proteins that modulate mammalian phagocyte function Mudenda, Lwiindi 13 August 2015 <p> Ticks are obligate blood-sucking parasites which transmit a wide range of pathogens worldwide, including protozoa, bacteria and viruses. Additionally, tick feeding alone may result in anemia, dermatosis and toxin-induced paralysis. <i>Dermacentor andersoni</i> is a species of tick found in the western United States that transmits pathogens of public health importance including <i>Rickettsia rickettsii, Francisella tularensis,</i> and Colorado Tick Fever Virus, as well as <i>Anaplasma marginale</i>, a rickettsial pathogen that causes economic losses in both the dairy and beef industries worldwide. <i>D. andersoni</i> ticks are obligate blood-sucking parasites that require a blood meal through all stages of their lifecycle. During feeding, ticks secrete factors that modulate both innate and acquired immune responses in the host, which enables them to feed for several days without detection. The pathogens transmitted by ticks exploit these immunomodulatory properties to facilitate invasion of and replication in the host. Molecular characterization of these immunomodulatory proteins secreted in tick saliva offers an opportunity to develop novel anti-tick vaccines as well as anti-inflammatory drug targets. To this end we performed deep sequence analysis on unfed ticks and ticks fed for 2 or 5 days. The pooled data generated a database of 21,797 consensus sequences. Salivary gland gene expression levels of unfed ticks were compared to 2- and 5-day fed ticks to identify genes upregulated early during tick feeding. Next we performed mass spectrometry on saliva from 2- and 5-day fed ticks and used the database to identify 677 proteins. We cross-referenced the protein data with the transcriptome data to identify 157 proteins of interest for immunomodulation and blood feeding.
Both proteins of unknown function and known immunomodulators were identified. We expressed four of these proteins and tested them for inhibition of macrophage activation and/or cytokine expression in vitro. The results showed diverse effects of the various test proteins on the inflammatory response of mouse macrophage cell lines. The proteins upregulated some cytokines while downregulating others. However, all the proteins upregulated the regulatory cytokine IL-10.</p>
