131 |
Motif Finding in Biological Sequences
Liao, Ying-Jer 21 August 2003
A huge amount of genomic information, including protein and DNA sequences, has been generated by the Human Genome Project. One way to decipher these biological sequences is to detect local residue patterns in them, but detecting unknown patterns across multiple sequences remains very difficult. In this thesis, we propose an algorithm, based on the Gibbs sampler method, for identifying local consensus patterns (motifs) in monomolecular sequences. We first design an ACO (ant colony optimization) algorithm to find a good initial solution and a set of better candidate positions for revising the motif. The Gibbs sampler method is then applied with these candidate positions as its input. Our algorithm drastically reduces the time required to find motifs: it takes only 20% of the time of the Gibbs sampler method while maintaining comparable quality.
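The Gibbs sampling step that this abstract builds on can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (profile_from, site_weight, gibbs_motif_search), the DNA alphabet, the pseudocount, and the optional start argument (standing in for the initial positions that an ACO stage would supply) are all assumptions made for illustration, and the ACO stage itself is not shown.

```python
import random
from collections import Counter


def profile_from(sites, alphabet="ACGT", pseudocount=1.0):
    """Column-wise residue probabilities, smoothed with pseudocounts."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    profile = []
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        profile.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return profile


def site_weight(segment, profile):
    """Probability of a segment under the profile (product over columns)."""
    weight = 1.0
    for residue, column in zip(segment, profile):
        weight *= column[residue]
    return weight


def gibbs_motif_search(seqs, width, iterations=200, seed=0, start=None):
    """Site sampler: resample one sequence's motif position at a time.

    Needs at least two sequences over the alphabet used by profile_from.
    `start` is a hypothetical hook for externally supplied initial positions.
    """
    rng = random.Random(seed)
    if start is not None:
        positions = list(start)
    else:
        positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the profile from every sequence except the held-out one.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, positions)) if j != i]
            profile = profile_from(others)
            # Weight each candidate start in the held-out sequence.
            weights = [site_weight(seq[k:k + width], profile)
                       for k in range(len(seq) - width + 1)]
            # Sample a new position in proportion to those weights.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Because the sampler is stochastic, the returned positions depend on the seed. Passing a good starting solution through start is the point at which a preprocessing stage such as the ACO step described above could plug in.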
|
132 |
Physically interpretable machine learning methods for transcription factor binding site identification using principled energy thresholds and occupancy
Drawid, Amar Mohan. January 2009
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computational Biology and Molecular Biophysics." Includes bibliographical references (p. 210-226).
|
133 |
Modeling the interaction and energetics of biological molecules with a polarizable force field
Shi, Yue, active 21st century 11 July 2014
Accurate prediction of protein-ligand binding affinity is essential to computational drug discovery. Current approaches are limited by the accuracy of the underlying potential energy model that describes atomic interactions. A more rigorous physical model is critical for evaluating molecular interactions to chemical accuracy. The objective of this thesis research is to develop a polarizable force field with an accurate representation of electrostatic interactions, to apply this model to protein-ligand recognition, and ultimately to solve practical problems in computer-aided drug discovery. By calculating the hydration free energies of a series of small organic molecules, an optimal protocol is established for deriving the electrostatic parameters from quantum mechanics calculations. Next, the systematic development and parameterization procedure of the AMOEBA protein force field is presented. The derived force field has undergone extensive validation in both the gas phase and the condensed phase. The last part of the thesis applies AMOEBA to the study of protein-ligand interactions. The binding free energies of benzamidine analogs to trypsin are calculated using molecular dynamics alchemical perturbation, with encouraging accuracy. AMOEBA is also used to study the thermodynamic effect of constraining and hydrophobicity on the binding energetics between phosphotyrosine (pY)-containing tripeptides and the SH2 domain of growth factor receptor-bound protein 2 (Grb2). The underlying mechanism of an "entropic paradox" associated with ligand preorganization is explored.
|
134 |
Reconciling gene family evolution and species evolution
Sjöstrand, Joel January 2013
Species evolution can often be adequately described with a phylogenetic tree. Interestingly, this is the case also for the evolution of homologous genes; a gene in an ancestral species may – through gene duplication, gene loss, lateral gene transfer (LGT), and speciation events – give rise to a gene family distributed across contemporaneous species. However, molecular sequence evolution and genetic recombination make the history – the gene tree – non-trivial to reconstruct from present-day sequences. This history is of biological interest, e.g., for inferring potential functional equivalences of extant gene pairs. In this thesis, we present biologically sound probabilistic models for gene family evolution guided by species evolution – effectively yielding a gene-species tree reconciliation. Using Bayesian Markov-chain Monte Carlo (MCMC) inference techniques, we show that by taking advantage of the information provided by the species tree, our methods achieve more reliable gene tree estimates than traditional species tree-uninformed approaches. Specifically, we describe a comprehensive model that accounts for gene duplication, gene loss, a relaxed molecular clock, and sequence evolution, and we show that the method performs admirably on synthetic and biological data. Furthermore, we present two expansions of the inference procedure, enabling it to provide (i) refined gene tree estimates with timed duplications, and (ii) probabilistic orthology estimates – i.e., that the origin of a pair of extant genes is a speciation. Finally, we present a substantial development of the model to account also for LGT. A sophisticated algorithmic framework of dynamic programming and numerical methods for differential equations is used to resolve the computational hurdles that LGT brings about. We apply the method to two bacterial datasets where LGT is believed to be prominent, in order to estimate genome-wide LGT and duplication rates.
We further show that traditional methods, in which gene trees are reconstructed and reconciled with the species tree in separate stages, tend to yield inferior gene tree estimates that overestimate the number of LGT events. / The evolution of species can in many cases be described with a tree, as Darwin's notebooks from HMS Beagle already attest. This also holds for homologous genes; a gene in an ancestral species can, through gene duplications, gene losses, lateral gene transfer (LGT), and speciations, give rise to a gene family spread across contemporary species. Reconstructing the emergence of the gene family (the gene tree) from the sequences of present-day species is non-trivial because of genetic recombination and sequence evolution. The gene tree is nevertheless of biological interest, in particular because it permits assumptions about functional kinship between present-day gene pairs. This thesis deals with biologically well-founded probabilistic models of gene family evolution. These models exploit the strong influence of species evolution on the history of the gene family, and essentially give rise to a reconciliation of gene tree and species tree. Through Bayesian inference based on Markov-chain Monte Carlo (MCMC), we show that our methods yield better gene tree estimates than traditional approaches that do not take the species tree into account. More specifically, we describe a model that encompasses gene duplications, gene losses, a relaxed molecular clock, and sequence evolution, and show that the method gives high-quality estimates on both synthetic and biological data. Furthermore, we present two extensions of this framework that enable (i) gene tree estimates with times for duplications, and (ii) probabilistic orthology estimates, i.e., that two present-day genes originate from a speciation. Finally, we present a model that includes LGT in addition to the mechanisms mentioned above.
The computational difficulties that LGT gives rise to are solved with an intricate framework of dynamic programming and numerical methods for differential equations. We apply the method to estimate the LGT and duplication rates of two bacterial datasets where LGT is presumed to have played a central role. We also show that traditional methods, in which gene trees are estimated and reconciled with the species tree in separate steps, tend to give worse gene tree estimates and thereby overestimate the number of LGT events. / At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Manuscript. Paper 5: Manuscript.
|
135 |
Deep Sequencing and Functional Analyses Identify a Role of Fusobacterium Species in Colorectal Tumorigenesis
Kostic, Aleksandar David 08 June 2015
The tumor microenvironment is a complex community consisting of neoplastic cells, surrounding stromal cells, a broad array of immune cells, and a microbiota. By sheer numbers, the microbiota has its greatest manifestation in colorectal cancer (CRC) because the colon contains up to 100 trillion bacteria, outnumbering human cells by a factor of 10 and encoding a gene content that is 100-fold larger than that of the human genome. Indeed, previous studies using germ-free mice in a variety of genetic backgrounds have demonstrated that the microbiota can impact colorectal tumorigenesis. In addition, specific strains of enterotoxigenic bacteria have been shown to promote colitis-associated cancer in mice. Here, we explore the composition of the tissue-associated microbiota in human CRC and evaluate the role of tumor-enriched microbes in potentiating colorectal tumorigenesis in mice. Advances in DNA sequencing technology have fueled a renaissance in the microbiome field. Deep sequencing metagenomics enables rapid, culture-independent characterization of a microbial community. We present PathSeq, a highly scalable software tool that performs computational subtraction on high-throughput sequencing data to identify nonhuman nucleic acids. PathSeq makes it possible to analyze sequence datasets as large as human whole genomes for the purpose of metagenomics and also to discover previously unsequenced microorganisms. We used PathSeq to characterize the composition of the microbiota in human CRC using whole-genome sequencing on nine tumor/normal pairs and 16S rDNA sequencing on an additional 95 pairs.
The genus Fusobacterium was highly enriched in tumors, while the Bacteroidetes and Firmicutes phyla were depleted. We show that in the \(Apc^{Min/+}\) mouse model of intestinal tumorigenesis, Fusobacterium nucleatum increases tumor multiplicity, selectively recruits tumor-infiltrating myeloid cells, and is associated with a pro-inflammatory expression signature that is shared with human fusobacteria-positive colorectal carcinomas. We find that Fusobacterium spp. are enriched in human colonic adenomas relative to surrounding tissues, and that fusobacterial abundance is increased in stool samples from patients with colorectal adenomas and carcinomas compared to healthy subjects. Collectively, these data suggest that fusobacteria may be involved in early stages of intestinal tumorigenesis and, through recruitment of tumor-infiltrating immune cells, may generate a pro-inflammatory tissue microenvironment conducive to colorectal neoplasia progression.
|
136 |
Molecular investigation of polypyrrole and surface recognition by affinity peptides
Fonner, John Michael 23 January 2012
Successful tissue engineering strategies in the nervous system must be carefully crafted to interact favorably with the complex biochemical signals of the native environment. To date, all chronic implants incorporating electrical conductivity degrade in performance over time as the foreign body reaction and subsequent fibrous encapsulation isolate them from the host tissue. Our goal is to develop a peptide-based interfacial biomaterial that will non-covalently coat the surface of the conducting polymer polypyrrole, allowing the implant to interact with the nervous system through both electrical and chemical cues. Starting with a candidate peptide sequence discovered through phage display, we used computational simulations of the peptide on polypyrrole to describe the bound peptide structure, explore the mechanism of binding, and suggest new, better-binding peptide sequences. After experimentally characterizing the polymer, we created a molecular mechanics model of polypyrrole using quantum mechanics calculations and compared its in silico properties to experimental observables such as density and chain packing. Using replica exchange molecular dynamics, we then modeled the behavior of affinity binding peptides on the surface of polypyrrole in explicit water and saline environments. The relative contribution of each amino acid was assessed using distance measurements and computational alanine scanning.
|
137 |
A study on predicting gene relationship from a computational perspective
Chan, Pui-yee., 陳沛儀. January 2004
Computer Science and Information Systems / Master / Master of Philosophy
|
138 |
Inverse Parametric Alignment for Accurate Biological Sequence Comparison
Kim, Eagu January 2008
For as long as biologists have been computing alignments of sequences, the question of what values to use for scoring substitutions and gaps has persisted. In practice, substitution scores are usually chosen by convention, and gap penalties are often found by trial and error. In contrast, a rigorous way to determine parameter values that are appropriate for aligning biological sequences is by solving the problem of Inverse Parametric Sequence Alignment. Given examples of biologically correct reference alignments, this is the problem of finding parameter values that make the examples score as close as possible to optimal alignments of their sequences. The reference alignments that are currently available contain regions where the alignment is not specified, which leads to a version of the problem with partial examples. In this dissertation, we develop a new polynomial-time algorithm for Inverse Parametric Sequence Alignment that is simple to implement, fast in practice, and can learn hundreds of parameters simultaneously from hundreds of examples. Computational results with partial examples show that the best possible values for all 212 parameters of the standard alignment scoring model for protein sequences can be computed from 200 examples in 4 hours of computation on a standard desktop machine. We also consider a new scoring model with a small number of additional parameters that incorporates predicted secondary structure for the protein sequences. By learning parameter values for this new secondary-structure-based model, we can improve on the alignment accuracy of the standard model by as much as 15% for sequences with less than 25% identity.
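The quantity that inverse alignment tries to drive toward zero, namely how far a reference alignment scores from an optimal alignment of the same sequences, can be sketched in Python. This is only an illustration of the objective, not the dissertation's polynomial-time learning algorithm: the three-parameter scoring model (match, mismatch, and a linear gap score), score maximization, and all function names are assumptions chosen for brevity, whereas the standard protein model mentioned above has 212 parameters.

```python
def pair_score(x, y, params):
    """Score one alignment column under a toy three-parameter model."""
    if x == "-" or y == "-":
        return params["gap"]
    return params["match"] if x == y else params["mismatch"]


def alignment_score(row_a, row_b, params):
    """Score a given (reference) alignment, supplied as two gapped rows."""
    return sum(pair_score(x, y, params) for x, y in zip(row_a, row_b))


def optimal_score(a, b, params):
    """Best global alignment score by Needleman-Wunsch dynamic programming."""
    gap = params["gap"]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + pair_score(a[i - 1], b[j - 1], params),  # substitution
                prev[j] + gap,                                         # gap in b
                cur[j - 1] + gap,                                      # gap in a
            ))
        prev = cur
    return prev[-1]


def score_gap(row_a, row_b, params):
    """How far the reference alignment falls short of optimal (never negative)."""
    a = row_a.replace("-", "")
    b = row_b.replace("-", "")
    return optimal_score(a, b, params) - alignment_score(row_a, row_b, params)


params = {"match": 1, "mismatch": -1, "gap": -2}
shortfall = score_gap("GATTACA", "GA-TACA", params)
```

In this framing, a good parameter setting is one under which score_gap is small for every reference example; the search over parameters is the part that the dissertation's algorithm addresses and that this sketch leaves out.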
|
139 |
Support vector machines for classification and regression
Shah, Rohan Shiloh. January 2007
In the last decade, Support Vector Machines (SVMs) have emerged as an important learning technique for solving classification and regression problems in various fields, most notably computational biology, finance, and text categorization. This is due in part to built-in mechanisms that ensure good generalization and hence accurate prediction, the use of kernel functions to model non-linear distributions, the ability to train relatively quickly on large data sets using novel mathematical optimization techniques, and, most significantly, the possibility of theoretical analysis using computational learning theory. In this thesis, we discuss the theoretical basis of Support Vector Machines and computational approaches to them.
|
140 |
A Computational Study of Proton Uptake Pathways in Cytochrome c Oxidase
Caplan, David 21 November 2012
Cytochrome c oxidase (CcO), the terminal enzyme in the electron transport chain, couples proton pumping to the reduction of dioxygen into water. The coupling mechanism remains to be elucidated. Previous studies have identified several mutations within CcO's primary proton uptake pathway (the D-channel) that decouple proton pumping from redox activity. Here, I examine the molecular basis for decoupling in single and double mutants of the highly conserved residues D132 and N139 in order to gain insight into the coupling mechanism. In particular, I use molecular dynamics and free energy simulations of a new, unconstrained model of bacterial CcO embedded in a solvated lipid bilayer to investigate how such mutations affect functional hydration and ionic selectivity in the D-channel. The results support earlier mechanistic insights obtained in our laboratory from simplified molecular models and suggest a new, testable hypothesis for how cations such as K+ may inhibit proton pumping in charged mutants of N139.
|