Global ETD Search

1	An Application for Downloading and Integrating Molecular Biology Data Fontaine, Burr R. 24 August 2005 Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Master of Sciences in the School of Informatics Indiana University July 2004 / Integrating large volumes of data from diverse sources is a formidable challenge for many investigators in the field of molecular biology. Developing efficient methods for accessing and integrating this data is a major focus of investigation in the field of bioinformatics. In early 2003, the Hereditary Genomics division of the department of Medical and Molecular Genetics at IUPUI recognized the need for a software application that would automate many of the manual processes that were being used to obtain data for their research. The two primary objectives for this project were: 1) an application that would provide large-scale, integrated output tables to help answer questions that frequently arose in the course of their research, and 2) a graphic user interface (GUI) that would minimize or eliminate the need for technical expertise in computer programming or database operations on the part of the end-users. In early 2003, Indiana University (IU), IBM, and the Indiana Genomics Initiative (INGEN) introduced a new resource called Centralized Life Sciences Data Services (CLSD). CLSD is a centralized data repository that provides programmatic access to biological data that is collected and integrated from multiple public, online databases. METHODS 1. an in-depth analysis was conducted to assess the department's data requirements and map these requirements to the data available at CLSD 2. CLSD incorporated new data as necessary 3. SQL was written to generate tables that would replace the targeted manual processes 4. a DB2 client was installed in Medical and Molecular Genetics to establish remote access to CLSD 5. a graphic user interface (GUI) was designed and implemented in HTML/CGI 6. a PERL program was written to accept parameters from the web input form, submit queries to CLSD, and generate HTML-based output tables 7. validation, updates, and maintenance procedures were conducted after early prototype implementation RESULTS AND CONCLUSIONS This application resulted in a substantial increase in efficiency over the manual methods that were previously used for data collection. The application also allows research teams to update their data much more frequently. A high level of accuracy in the output tables was confirmed by a thorough validation process. downloading molecular biology data integrating data
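The flow described in method steps 5 and 6 above (form parameter in, query out, HTML table back) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: it uses Python and an in-memory SQLite database in place of PERL and DB2, and the `gene` table, its columns, the toy rows, and the function names `run_query`, `to_html_table` and `demo` are all hypothetical.

```python
import html
import sqlite3


def run_query(conn, chromosome):
    """Run a parameterised query; `chromosome` stands in for a web-form field."""
    cur = conn.execute(
        "SELECT symbol, chromosome FROM gene "
        "WHERE chromosome = ? ORDER BY symbol",
        (chromosome,),  # bound parameter, never string-formatted into the SQL
    )
    headers = [col[0] for col in cur.description]
    return headers, cur.fetchall()


def to_html_table(headers, rows):
    """Render query results as an HTML table, escaping every cell."""
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def demo():
    """Build a toy database (hypothetical schema) and render one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT, chromosome TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [("BRCA1", "17"), ("TP53", "17"), ("CFTR", "7")],
    )
    headers, rows = run_query(conn, "17")
    conn.close()
    return to_html_table(headers, rows)


if __name__ == "__main__":
    print(demo())
```

Binding the form value as a query parameter and escaping each cell on output are choices made for this sketch; the abstract does not say how the original program handled either.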
2	Integration of kinetic models with data from 13C-metabolic flux experiments Schabort, Willem Petrus Du Toit 12 1900 Thesis (MSc (Biochemistry))--University of Stellenbosch, 2007. / A detailed mathematical description of all the processes in a cell could be an informative tool for investigating biological function. Detailed kinetic models could be built either by obtaining enzyme kinetic parameters in vitro, or by obtaining them from time series analyses of metabolite data from rapid pulse experiments. A genome scale in vitro enzyme kinetic assay project would be prohibitively laborious with the current technologies. Further, there are still uncertainties about the importance of in vivo effects such as metabolite channelling, spatial effects and molecular crowding which could make in vitro determined parameters invalid. Accordingly, there is much interest in in vivo experiments for kinetic modelling. In vivo experimental methods suffer from a number of technical and even fundamental problems. Technical problems are being solved by more sensitive metabolomics tools and rapid sampling technologies. However, the large number of effectors of each enzyme reaction makes it impossible to obtain models at the level of detail possible with the in vitro method. Ultimately, the solution to building a genome scale Silicon Cell is to make use of both strategies. As metabolomics technologies are rapidly improving, it would thus make sense to follow the parts-based in vitro kinetics methodology, and carry out a detailed accuracy assessment of the model with in vivo experiments. To address the problem of the fundamental limit of information from concentration time-series, other in vivo experiments will have to be carried out as well. 13C-metabolic flux analysis has recently undergone vast improvements with the use of better experimental protocols and powerful algorithms for flux calculation. 
Incorporation of this type of experiment in the validation protocol is the aim of this thesis, which represents an intermediary step towards using the genome-scale stoichiometric models as platforms for building genome-scale kinetic models. It is illustrated here how kinetic models can be combined with metabolic flux data in a special way which allows correct modelling of boundary conditions and validation using novel concepts. We used 13C-metabolic flux analysis and gas chromatography-mass-spectrometry to measure metabolic fluxes through the central metabolic pathways of the yeast Saccharomyces cerevisiae. This data was integrated with a previously constructed detailed kinetic model of fermentative glycolysis in the yeast to illustrate our approach. Various implications for such data integration with kinetic models were identified and a software program was designed for this purpose. Dissertations -- Biochemistry Theses -- Biochemistry Molecular biology -- Data processing
3	Algorithms on constrained sequence alignment Ho, Ngai-lam., 何毅林. January 2004 Computer Science and Information Systems / Master / Master of Philosophy Nucleotide sequence. Proteins - Analysis. Algorithms. Bioinformatics. Molecular biology - Data processing.
4	Development of graphical software tools for molecular biology Archer, Emory Scott. January 1997 Zoology / Master / Master of Philosophy Molecular biology - Data processing. Computer graphics. Computer software.
5	Approximate string alignment and its application to ESTs, mRNAs and genome mapping Yim, Cheuk-hon, Terence., 嚴卓漢. January 2004 Computer Science and Information Systems / Master / Master of Philosophy Gene mapping - Data processing Nucleotide sequence - Data processing. Molecular biology - Data processing. Algorithms.
6	Simulation and database software for computational systems biology : PySCeS and JWS Online Olivier, Brett Gareth 03 1900 Thesis (PhD)--Stellenbosch University, 2005. / ENGLISH ABSTRACT: Since their inception, biology and biochemistry have been spectacularly successful in characterising the living cell and its components. As the volume of information about cellular components continues to increase, we need to ask how we should use this information to understand the functioning of the living cell. Computational systems biology uses an integrative approach that combines theoretical exploration, computer modelling and experimental research to answer this question. Central to this approach is the development of computational models, new modelling strategies and computational tools. Against this background, this study aims to: (i) develop a new modelling package: PySCeS, (ii) use PySCeS to study discontinuous behaviour in a metabolic pathway in a way that was very difficult, if not impossible, with existing software, (iii) develop an interactive, web-based repository (JWS Online) of cellular system models. Three principles that, in our opinion, should form the basis of any new modelling software were laid down: accessibility (there should be as few barriers as possible to PySCeS use and distribution), flexibility (PySCeS should be extendable by the user, not only the developers) and usability (PySCeS should provide the tools we needed for our research). After evaluating various alternatives we decided to base PySCeS on the freely available programming language, Python, which, in combination with the large collection of science and engineering algorithms in the SciPy libraries, would give us a powerful modern, interactive development environment. / AFRIKAANS ABSTRACT (translated): Since their inception, biology and, more specifically, biochemistry have been extremely successful in characterising the components of the living cell. 
The amount of information about the molecular constituents of the cell still grows daily; we must therefore ask ourselves how we can integrate this information into an understandable description of the functioning of the living cell. To answer this question, computational systems biology uses an integrated approach that combines theory, computational modelling and experimental research. Central to this approach is the development of new models, modelling strategies and software. Against this background, the main aims of this project are: (i) the development of a new modelling package, PySCeS, (ii) the use of PySCeS to study discontinuous behaviour in a metabolic system (something that is rather difficult with currently available software), and (iii) the development of an interactive, internet-based database of cellular system models, JWS Online. In our opinion, new software should be based on three important principles: accessibility (the software must be easy to obtain and use), flexibility (users must be able to modify and develop PySCeS themselves) and usability (all the functionality we need for our research must be built into PySCeS). We considered various options and decided to use the freely available programming language, Python, together with the large collection of scientific algorithms, SciPy. This combination provides a powerful, interactive development and user environment. PySCeS was developed to run under both the Windows and Linux operating systems and, more specifically, to use a command line interface. This means that PySCeS will work on any interactive computer terminal that supports Python. This property also makes it possible to use PySCeS as a modelling component in a larger software package under any operating system that supports Python. 
PySCeS is based on a modular design, which makes it possible for the end user to develop the software's source code further. As an application, PySCeS was used to investigate the cause of hysteretic behaviour in a linear, end-product-inhibited metabolic pathway. We discovered this interesting behaviour in a previous study but could not continue that study with the software available to us at the time. With the built-in ability of PySCeS to perform parameter continuation, we were able to characterise the causes of this discontinuous behaviour fully. In addition, we developed a new method for visualising this behaviour as an interaction between the subcomponents of the complete system. During the development of PySCeS we noticed that it was very difficult to rebuild and study metabolic models published in the literature. This situation is largely a consequence of the fact that no central database for metabolic models exists (as one does for genomic data or protein structures). The JWS Online database was developed specifically to fill this gap. JWS Online allows users to study published models via the internet, without installing any specialised modelling software, and also to download them for use with other modelling packages such as PySCeS. JWS Online has already become an indispensable resource for systems biology research and teaching. Molecular biology -- Data processing Cytology -- Data processing Cytology -- Mathematical models Cytology -- Computer simulation Biochemistry -- Computer simulation Dissertations -- Biochemistry
7	Binding sites in protein structures: characterisation and relation with destabilising regions Dessailly, Benoît 20 September 2007 An increasing number of proteins with unknown function have their three-dimensional structure solved at high resolution. This situation, largely due to structural genomics initiatives, has been stimulating the development of automated structure-based function prediction methods. Knowledge of residues important for function, and more particularly for binding, can help automated prediction of function in different ways. The properties of a binding site such as its shape or amino acid composition can provide clues on the ligand that may bind to it. Also, having information on functionally important regions in similar proteins can refine the process of annotation transfer between homologues. Experimental results indicate that functional residues often have an unfavourable contribution to the stability of the folded state of a protein. This observation is the underlying principle of several computational methods for predicting the location of functional sites in protein structures. These methods search protein structures for destabilising residues, with the assumption that these are likely to be important for function. We have developed a method to detect clusters of destabilising residues which are in close spatial proximity within a protein structure. Individual residue contributions to protein stability are evaluated using detailed atomic models and an energy function based on fundamental physico-chemical principles. Our overall aim in this work was to evaluate the overlap between these clusters of destabilising residues and known binding sites in proteins. Unfortunately, reliable benchmark datasets of known binding sites in proteins are sorely lacking. Therefore, we have undertaken a comprehensive approach to define binding sites unambiguously from structural data. 
We have rigorously identified seven issues which should be considered when constructing datasets of binding sites to validate prediction methods, and we present the construction of two new datasets in which these problems are handled. In this regard, our work constitutes a major improvement over previous studies in the field. Our first dataset consists of 70 proteins with binding sites for diverse types of ligands (e.g. nucleic acids, metal ions) and was constructed using all available data, including literature curation. The second dataset contains 192 proteins with binding sites for small ligands and polysaccharides, does not require literature curation, and can therefore be automatically updated. We have used our dataset of 70 proteins to evaluate the overlap between destabilising regions and binding sites (the second dataset of 192 proteins was not used for that evaluation as it constitutes a later improvement). The overlap is on average limited but significantly larger than random. The extent of the overlap varies with the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for nucleic acid-binding sites. These differences are rationalised in terms of the geometry and energetics of the binding sites. Although destabilising regions, as detected in this work, cannot in general be used to predict all types of binding sites in protein structures, they can provide useful information, particularly on the location of binding sites for polysaccharides and small ligands. In addition, our datasets of binding sites in proteins should help other researchers to derive and validate new function prediction methods. We also hope that the criteria which we use to define binding sites may be useful in setting future standards in other analyses. 
/ Doctorat en Sciences / info:eu-repo/semantics/nonPublished Sciences exactes et naturelles Biologie Protein binding Molecular biology -- Data processing Proteins -- Structure Protéines -- Fixation Biologie moléculaire -- Informatique Protéines -- Structure protein structure bioinformatics protein ligand binding site molecular biology
8	Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functions Huang, Fei January 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Intrinsically disordered proteins (IDPs) are flexible proteins without defined 3D structures. Studies show that IDPs are abundant in nature and actively involved in numerous biological processes. Two crucial tasks in the study of IDPs are analyzing their functions and identifying them. We thus carried out three projects to better understand IDPs. In the 1st project, we propose a method that separates IDPs into different function groups. We used the approach of CH-CDF plot, which is based on the combined use of two predictors and subclassifies proteins into 4 groups: structured, mixed, disordered, and rare. Studies show different structural biases for each group. The mixed class has more order-promoting residues and more ordered regions than the disordered class. In addition, the disordered class is highly active in mitosis-related processes among others. Meanwhile, the mixed class is highly associated with signaling pathways, where having both ordered and disordered regions could possibly be important. The 2nd project addresses identifying whether an unknown protein is entirely disordered. One of the earliest predictors for this purpose, the charge-hydropathy plot (C-H plot), exploited the charge and hydropathy features of the protein. This algorithm is simple yet powerful, and its input parameters, charge and hydropathy, are informative and readily interpretable. We found that using different hydropathy scales significantly affects the prediction accuracy. Therefore, we sought to identify a new hydropathy scale that optimizes the prediction. This new scale achieves an accuracy of 91%, a significant improvement over the original 79%. In our 3rd project, we developed a per-residue C-H IDP predictor, in which three hydropathy scales are optimized individually. 
This is to account for the amino acid composition differences in three regions of a protein sequence (N, C terminus and internal). We then combined them into a single per-residue predictor that achieves an accuracy of 74% for per-residue predictions for proteins containing long IDP regions. Intrinsically disordered proteins Support vector machine Clustering Proteins -- Conformation -- Research Proteins -- Denaturation Protein folding -- Research Support vector machines Aggregation (Chemistry) Amino acids -- Analysis Cellular signal transduction Molecular biology -- Mathematics Algorithms
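As a rough illustration of the charge-hydropathy plot mentioned in the abstract above, the sketch below classifies a whole sequence from its mean net charge and mean hydropathy. It is a minimal sketch under stated assumptions, not the thesis's predictor: it uses the Kyte-Doolittle scale rescaled to the range 0 to 1 and the commonly cited boundary line R = 2.785 H - 1.151 from the original C-H plot, counts only K and R as positive and D and E as negative, and does not reproduce the optimized scale the thesis reports. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean hydropathy with the scale rescaled linearly from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge, treating K/R as +1 and D/E as -1 (assumption)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def ch_boundary(hydropathy):
    """Boundary line of the classic C-H plot: R = 2.785 * H - 1.151."""
    return 2.785 * hydropathy - 1.151


def predicted_disordered(seq):
    """True when the sequence lies on the high-charge, low-hydropathy side of the line."""
    seq = seq.upper()
    return mean_net_charge(seq) >= ch_boundary(mean_hydropathy(seq))


if __name__ == "__main__":
    for name, seq in [("poly-Glu", "E" * 10), ("poly-Ile", "I" * 10)]:
        print(name, predicted_disordered(seq))
```

Swapping `KYTE_DOOLITTLE` for another per-residue scale is the only change needed to compare scales, which is the kind of substitution the abstract says affects prediction accuracy.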
