1 |
An Application for Downloading and Integrating Molecular Biology Data
Fontaine, Burr R. 24 August 2005
Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Master of Sciences in the School of Informatics, Indiana University, July 2004 / Integrating large volumes of data from diverse sources is a formidable challenge for many investigators in the field of molecular biology. Developing efficient methods for accessing and integrating this data is a major focus of investigation in the field of bioinformatics.
In early 2003, the Hereditary Genomics division of the department of Medical and Molecular Genetics at IUPUI recognized the need for a software application that would automate many of the manual processes that were being used to obtain data for their research. The two primary objectives for this project were: 1) an application that would provide large-scale, integrated output tables to help answer questions that frequently arose in the course of their research, and 2) a graphical user interface (GUI) that would minimize or eliminate the need for technical expertise in computer programming or database operations on the part of the end-users.
In early 2003, Indiana University (IU), IBM, and the Indiana Genomics Initiative (INGEN) introduced a new resource called Centralized Life Sciences Data Services (CLSD). CLSD is a centralized data repository that provides programmatic access to biological data that is collected and integrated from multiple public, online databases.
METHODS
1. An in-depth analysis was conducted to assess the department's data requirements and map these requirements to the data available at CLSD.
2. CLSD incorporated new data as necessary.
3. SQL was written to generate tables that would replace the targeted manual processes.
4. A DB2 client was installed in Medical and Molecular Genetics to establish remote access to CLSD.
5. A graphical user interface (GUI) was designed and implemented in HTML/CGI.
6. A Perl program was written to accept parameters from the web input form, submit queries to CLSD, and generate HTML-based output tables.
7. Validation, updates, and maintenance procedures were conducted after early prototype implementation.
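The flow in steps 3 to 6 (take a parameter from a form, run a query, return an HTML table) can be pictured with a minimal sketch. This is not the thesis code: the original program was written in Perl against a DB2 client, whereas the sketch below uses Python with an in-memory SQLite database standing in for CLSD. The table `gene_annotation`, its columns, and the sample rows are hypothetical placeholders.

```python
import sqlite3
from html import escape


def build_demo_db():
    """Create an in-memory stand-in for the remote repository (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_annotation (symbol TEXT, chromosome TEXT, description TEXT)"
    )
    # Placeholder rows, invented purely for illustration.
    conn.executemany(
        "INSERT INTO gene_annotation VALUES (?, ?, ?)",
        [
            ("GENE_A", "1", "example record <a>"),
            ("GENE_B", "2", "example record b"),
            ("GENE_C", "1", "example record c"),
        ],
    )
    return conn


def query_to_html(conn, chromosome):
    """Run a parameterized query and render the result set as an HTML table."""
    cur = conn.execute(
        "SELECT symbol, chromosome, description "
        "FROM gene_annotation WHERE chromosome = ? ORDER BY symbol",
        (chromosome,),  # bound parameter, so form input is never spliced into the SQL
    )
    headers = [col[0] for col in cur.description]
    lines = ["<table>"]
    lines.append("<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>")
    for row in cur.fetchall():
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        lines.append("<tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(query_to_html(build_demo_db(), "1"))
```

Binding the form value as a query parameter and escaping every cell before output are the two habits worth carrying over to any real implementation, whatever the database driver.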
RESULTS AND CONCLUSIONS
This application resulted in a substantial increase in efficiency over the manual methods that were previously used for data collection. The application also allows research teams to update their data much more frequently. A high level of accuracy in the output tables was confirmed by a thorough validation process.
|
2 |
Integration of kinetic models with data from 13C-metabolic flux experiments
Schabort, Willem Petrus Du Toit 12 1900
Thesis (MSc (Biochemistry))--University of Stellenbosch, 2007. / A detailed mathematical description of all the processes in a cell could be an
informative tool for investigating biological function. Detailed kinetic models
could be built either by obtaining enzyme kinetic parameters in vitro, or
by obtaining them from time series analyses of metabolite data from rapid
pulse experiments. A genome scale in vitro enzyme kinetic assay project
would be prohibitively laborious with the current technologies. Further,
there are still uncertainties about the importance of in vivo effects such as
metabolite channelling, spatial effects and molecular crowding which could
make in vitro determined parameters invalid. Accordingly, there is much
interest in in vivo experiments for kinetic modelling. In vivo experimental
methods suffer from a number of technical and even fundamental problems.
Technical problems are being solved by more sensitive metabolomics tools
and rapid sampling technologies. However, the large number of effectors of
each enzyme reaction makes it impossible to obtain models at the level of detail
possible with the in vitro method. Ultimately, the solution to building a
genome scale Silicon Cell is to make use of both strategies. As metabolomics
technologies are rapidly improving, it would thus make sense to follow the
parts-based in vitro kinetics methodology, and carry out a detailed accuracy
assessment of the model with in vivo experiments. To address the problem
of the fundamental limit of information from concentration time-series, other
in vivo experiments will have to be carried out as well. 13C-metabolic flux
analysis has recently undergone vast improvements with the use of better experimental
protocols and powerful algorithms for flux calculation. Incorporation
of this type of experiment in the validation protocol is the aim of this thesis, which represents an intermediary step towards using the genome-scale
stoichiometric models as platforms for building genome-scale kinetic models.
It is illustrated here how kinetic models can be combined with metabolic
flux data in a special way which allows correct modelling of boundary conditions
and validation using novel concepts. We used 13C-metabolic flux
analysis and gas chromatography-mass spectrometry to measure metabolic
fluxes through the central metabolic pathways of the yeast Saccharomyces
cerevisiae. This data was integrated with a previously constructed detailed
kinetic model of fermentative glycolysis in the yeast to illustrate our approach.
Various implications for such data integration with kinetic models
were identified and a software program was designed for this purpose.
|
3 |
Algorithms on constrained sequence alignment
Ho, Ngai-lam., 何毅林. January 2004
Computer Science and Information Systems / Master / Master of Philosophy
|
4 |
Development of graphical software tools for molecular biology
Archer, Emory Scott. January 1997
Zoology / Master / Master of Philosophy
|
5 |
Approximate string alignment and its application to ESTs, mRNAs and genome mapping
Yim, Cheuk-hon, Terence., 嚴卓漢. January 2004
Computer Science and Information Systems / Master / Master of Philosophy
|
6 |
Simulation and database software for computational systems biology : PySCeS and JWS Online
Olivier, Brett Gareth 03 1900
Thesis (PhD)--Stellenbosch University, 2005. / ENGLISH ABSTRACT: Since their inception, biology and biochemistry have been spectacularly successful in
characterising the living cell and its components. As the volume of information about
cellular components continues to increase, we need to ask how we should use this information
to understand the functioning of the living cell.
Computational systems biology uses an integrative approach that combines theoretical
exploration, computer modelling and experimental research to answer this question.
Central to this approach is the development of computational models, new modelling
strategies and computational tools. Against this background, this study aims to: (i) develop
a new modelling package: PySCeS, (ii) use PySCeS to study discontinuous behaviour
in a metabolic pathway in a way that was very difficult, if not impossible, with
existing software, (iii) develop an interactive, web-based repository (JWS Online) of cellular
system models.
Three principles that, in our opinion, should form the basis of any new modelling
software were laid down: accessibility (there should be as few barriers as possible to
PySCeS use and distribution), flexibility (PySCeS should be extendable by the user, not
only the developers) and usability (PySCeS should provide the tools we needed for our
research). After evaluating various alternatives we decided to base PySCeS on the freely
available programming language, Python, which, in combination with the large collection
of science and engineering algorithms in the SciPy libraries, would give us a powerful,
modern, interactive development environment. / AFRIKAANS ABSTRACT (translated): Since their inception, biology and, more specifically, biochemistry have been extremely successful in characterising the components of the living cell. The amount of information about the molecular constituents of the cell still grows daily; we must therefore ask ourselves how this information can be integrated into an understandable description of how the living cell works.
To answer this question, computational systems biology uses an integrated approach that combines theory, computational modelling and experimental research. Central to the approach is the development of new models, modelling strategies and software. Against this background, the main aims of this project are: (i) the development of a new modelling package, PySCeS; (ii) the use of PySCeS to study discontinuous behaviour in a metabolic system (something that is rather difficult with currently available software); and (iii) the development of an interactive, internet-based database of cellular system models, JWS Online.
In our view, new software should be based on three important principles: accessibility (the software must be easy to obtain and use), flexibility (users must be able to modify and develop PySCeS themselves) and usability (all the functionality we need for our research must be built into PySCeS). We considered various options and decided to use the freely available programming language Python together with SciPy, a large collection of scientific algorithms. This combination provides a powerful, interactive development and user environment.
PySCeS was developed to run under both the Windows and Linux operating systems and, more specifically, to use a command line interface. This means that PySCeS will work on any interactive computer terminal that supports Python. This property also makes it possible to use PySCeS as a modelling component in a larger software package under any operating system that supports Python. PySCeS is based on a modular design, which makes it possible for the end user to develop the software's source code further.
As an application, PySCeS was used to investigate the cause of hysteretic behaviour in a linear, end-product-inhibited metabolic pathway. We discovered this interesting behaviour in a previous study, but could not continue that study with the software available to us at the time. With PySCeS's built-in ability to perform parameter continuation, we were able to characterise the causes of this discontinuous behaviour completely. We also developed a new method to visualise this behaviour as an interaction between the subcomponents of the complete system.
During the development of PySCeS we noticed that it was very difficult to rebuild and study metabolic models published in the literature. This situation is largely due to the fact that no central database for metabolic models exists (as one does for genomic data or protein structures). The JWS Online database was developed specifically to fill this gap. JWS Online allows users, via the internet and without installing any specialised modelling software, to study published models and also to download them for use with other modelling packages such as PySCeS. JWS Online has already become an indispensable resource for systems biology research and teaching.
|
7 |
Binding sites in protein structures: characterisation and relation with destabilising regions
Dessailly, Benoît 20 September 2007
An increasing number of proteins with unknown function have their three-dimensional structure solved at high resolution. This situation, largely due to structural genomics initiatives, has been stimulating the development of automated structure-based function prediction methods. Knowledge of residues important for function, and more particularly for binding, can help automated prediction of function in different ways. The properties of a binding site such as its shape or amino acid composition can provide clues on the ligand that may bind to it. Also, having information on functionally important regions in similar proteins can refine the process of annotation transfer between homologues.
Experimental results indicate that functional residues often have an unfavourable contribution to the stability of the folded state of a protein. This observation is the underlying principle of several computational methods for predicting the location of functional sites in protein structures. These methods search protein structures for destabilising residues, with the assumption that these are likely to be important for function.
We have developed a method to detect clusters of destabilising residues which are in close spatial proximity within a protein structure. Individual residue contributions to protein stability are evaluated using detailed atomic models and an energy function based on fundamental physico-chemical principles.
Our overall aim in this work was to evaluate the overlap between these clusters of destabilising residues and known binding sites in proteins.
Unfortunately, reliable benchmark datasets of known binding sites in proteins are sorely lacking. Therefore, we have undertaken a comprehensive approach to define binding sites unambiguously from structural data. We have rigorously identified seven issues which should be considered when constructing datasets of binding sites to validate prediction methods, and we present the construction of two new datasets in which these problems are handled. In this regard, our work constitutes a major improvement over previous studies in the field.
Our first dataset consists of 70 proteins with binding sites for diverse types of ligands (e.g. nucleic acids, metal ions) and was constructed using all available data, including literature curation. The second dataset contains 192 proteins with binding sites for small ligands and polysaccharides, does not require literature curation, and can therefore be automatically updated.
We have used our dataset of 70 proteins to evaluate the overlap between destabilising regions and binding sites (the second dataset of 192 proteins was not used for that evaluation as it constitutes a later improvement). The overlap is on average limited but significantly larger than random. The extent of the overlap varies with the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for nucleic acid-binding sites. These differences are rationalised in terms of the geometry and energetics of the binding sites.
Although destabilising regions, as detected in this work, can in general not be used to predict all types of binding sites in protein structures, they can provide useful information, particularly on the location of binding sites for polysaccharides and small ligands.
In addition, our datasets of binding sites in proteins should help other researchers to derive and validate new function prediction methods. We also hope that the criteria which we use to define binding sites may be useful in setting future standards in other analyses. / Doctorate in Sciences
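To make the idea of "clusters of destabilising residues in close spatial proximity" and their overlap with a binding site concrete, here is a small sketch. It is an illustration under stated assumptions, not the method of the thesis: the abstract does not say how residues are clustered or how overlap is scored, so single-linkage clustering with a distance cutoff and a simple coverage fraction are used as stand-ins. The function names, the cutoff value, and the coordinates are all hypothetical.

```python
from math import dist


def cluster_residues(coords, cutoff):
    """Group residues by single linkage: any two residues within `cutoff`
    of each other end up in the same cluster (assumed rule, not from the source).

    coords maps a residue id to an (x, y, z) position.
    Returns only clusters with at least two residues.
    """
    ids = list(coords)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return [members for members in clusters.values() if len(members) > 1]


def overlap_fraction(predicted, binding_site):
    """Fraction of binding-site residues that are also in the predicted set."""
    if not binding_site:
        return 0.0
    return len(predicted & binding_site) / len(binding_site)


if __name__ == "__main__":
    # Made-up positions for four destabilising residues.
    coords = {1: (0, 0, 0), 2: (3, 0, 0), 3: (6, 0, 0), 4: (50, 0, 0)}
    clusters = cluster_residues(coords, cutoff=4.0)
    print(clusters)
    print(overlap_fraction(clusters[0], {2, 3, 9, 10}))
```

Comparing such a fraction against the value obtained for randomly chosen residue sets of the same size is one way to phrase the "larger than random" question raised in the abstract.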
|
8 |
Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functions
Huang, Fei January 2014
Indiana University-Purdue University Indianapolis (IUPUI) / Intrinsically disordered proteins (IDPs) are flexible proteins without defined 3D structures. Studies show that IDPs are abundant in nature and actively involved in numerous biological processes. Two crucial subjects in the study of IDPs are analyzing their functions and identifying them. We thus carried out three projects to better understand IDPs. In the 1st project, we propose a method that separates IDPs into different function groups. We used the CH-CDF plot approach, which is based on the combined use of two predictors and subclassifies proteins into 4 groups: structured, mixed, disordered, and rare. Studies show different structural biases for each group. The mixed class has more order-promoting residues and more ordered regions than the disordered class. In addition, the disordered class is highly active in mitosis-related processes, among others. Meanwhile, the mixed class is highly associated with signaling pathways, where having both ordered and disordered regions could be important. The 2nd project is about identifying whether an unknown protein is entirely disordered. One of the earliest predictors for this purpose, the charge-hydropathy plot (C-H plot), exploited the charge and hydropathy features of the protein. Not only is this algorithm simple yet powerful, but its input parameters, charge and hydropathy, are also informative and readily interpretable. We found that using different hydropathy scales significantly affects the prediction accuracy. Therefore, we sought to identify a new hydropathy scale that optimizes the prediction. This new scale achieves an accuracy of 91%, a significant improvement over the original 79%. In our 3rd project, we developed a per-residue C-H IDP predictor, in which three hydropathy scales are optimized individually. This is to account for the amino acid composition differences in three regions of a protein sequence (N terminus, C terminus and internal). We then combined them into a single per-residue predictor that achieves an accuracy of 74% for per-residue predictions for proteins containing long IDP regions.
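The charge-hydropathy idea described in the 2nd project can be sketched in a few lines. The sketch below is only an illustration: it uses the Kyte-Doolittle scale rescaled to the range 0 to 1 and the linear boundary ⟨R⟩ = 2.785⟨H⟩ − 1.151 commonly quoted for the original C-H plot, both of which are assumptions brought in from the wider literature and not taken from this abstract. The optimized scale reported in the thesis is not reproduced here; the `scale` argument only shows where a different scale would be plugged in. Histidine is treated as uncharged, and input is assumed to be an upper-case sequence of the 20 standard amino acids.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in scale, not the thesis scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq, scale=KYTE_DOOLITTLE):
    """Mean hydropathy, with each residue value rescaled from the scale's own range to [0, 1]."""
    low, high = min(scale.values()), max(scale.values())
    return sum((scale[aa] - low) / (high - low) for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge, counting K/R as +1 and D/E as -1."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def predicted_disordered(seq, scale=KYTE_DOOLITTLE, slope=2.785, intercept=-1.151):
    """True if the sequence lies on the high-charge, low-hydropathy side of the boundary line."""
    boundary_charge = slope * mean_hydropathy(seq, scale) + intercept
    return mean_net_charge(seq) > boundary_charge


if __name__ == "__main__":
    for seq in ("EEEEKEEDDE", "LLLLIIVVAA"):  # toy sequences, not real proteins
        print(seq, predicted_disordered(seq))
```

Because the scale is an ordinary dictionary, trying a different hydropathy scale amounts to passing another mapping with the same 20 keys, which is the kind of substitution the abstract reports as having a large effect on accuracy.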
|