Global ETD Search

11	Approximate string alignment and its application to ESTs, mRNAs and genome mapping Yim, Cheuk-hon, Terence., 嚴卓漢. January 2004 Computer Science and Information Systems / Master / Master of Philosophy Gene mapping - Data processing. Nucleotide sequence - Data processing. Molecular biology - Data processing. Algorithms.
12	Algorithms for sequence alignment Powell, David Richard, 1973- January 2001 Abstract not available Algorithms Computational complexity Sequential analysis Sequences (Mathematics) Programming (Mathematics) Biology -- Data processing
13	Assessing the use of voting methods to improve Bayesian network structure learning Abu-Hakmeh, Khaldoon Emad 27 August 2012 Structure inference in learning Bayesian networks remains an active interest in machine learning due to the breadth of its applications across numerous disciplines. As newer algorithms emerge to better handle the task of inferring network structures from observational data, network and experiment sizes heavily impact the performance of these algorithms. Particularly difficult is the task of accurately learning large networks from a limited number of observations, as often encountered in biological experiments. This study evaluates the performance of several leading structure learning algorithms on large networks. The selected algorithms then serve as a committee, which votes on the final network structure. The result is a more selective final network that contains few false positives, at the cost of a reduced ability to detect all network features. Metabolomics Bioinformatics Machine learning Bayesian networks Artificial intelligence Algorithms Biology Data processing
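The committee step in this abstract can be illustrated with a small sketch. The abstract does not state the voting rule, so the code below assumes one: each learner proposes a set of directed edges, and an edge is kept only if at least `min_votes` learners proposed it. The function name `committee_vote` and the toy edge sets are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def committee_vote(structures, min_votes):
    """Return the directed edges proposed by at least `min_votes` learners.

    `structures` is a list of edge sets, one per structure learning
    algorithm; each edge is a (parent, child) tuple. The threshold rule
    is an assumption made for illustration.
    """
    # Count each edge at most once per learner.
    votes = Counter(edge for edges in structures for edge in set(edges))
    return {edge for edge, count in votes.items() if count >= min_votes}


# Toy output of three hypothetical learners on a four-node network.
learned = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "D")},
]

majority = committee_vote(learned, min_votes=2)
unanimous = committee_vote(learned, min_votes=3)
```

Raising `min_votes` makes the consensus network more selective: edges found by a single learner are dropped, which removes likely false positives but can also discard true edges that only one algorithm detected. This matches the trade-off the abstract reports.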
14	Application of bioinformatics in studies of sphingolipid biosynthesis Momin, Amin Altaf 17 May 2010 The studies in this dissertation demonstrate that gene expression pathway maps are useful tools for detecting alterations in different branches of the sphingolipid biosynthesis pathway based on microarray and other transcriptomic analyses. To facilitate the integrative analysis of gene expression and sphingolipid amounts, updated pathway maps were prepared using an open access visualization tool, PathVisio v1.1. The datasets were formatted using Perl scripts and visualized with the aid of color-coded pathway diagrams. Comparative analysis of transcriptomic and sphingolipid alterations from experimental studies and published literature revealed a 72.8% correlation between mRNA and sphingolipid differences (p-value < 0.0001 by Fisher's exact test). The high correlation between gene expression differences and sphingolipid alterations highlights the application of this tool to evaluate molecular changes associated with sphingolipid alterations, as well as to predict differences in specific metabolites that can be experimentally verified using sensitive approaches such as mass spectrometry. In addition, bioinformatics sequence analysis was used to identify transcripts for the sphingolipid biosynthesis enzyme 3-ketosphinganine reductase, and homology modeling studies helped in the evaluation of a cell line defective in sphingolipid metabolism due to a mutation in the enzyme serine palmitoyltransferase, the first enzyme of the de novo biosynthesis pathway. Hence, the combination of different bioinformatics approaches, including protein and DNA sequence analysis, structure modeling and pathway diagrams, can provide valuable inputs for biochemical and molecular studies of sphingolipid metabolism.
Pathway analysis Pathway maps Gene expression Metabolomics Cancer Sphingolipids Metabolism Mass spectrometry Biosynthesis Eukaryotic cells Prokaryotes Bioinformatics Biology Data processing
15	Explicitação de esquema orientada a contexto para promover interoperabilidade semântica / Promoting semantic interoperability by a context oriented approach to make schemas explicit Bernardo, Ivelize Rocha, 1982- 09 April 2012 Advisor: André Santanchè / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: The flexibility of spreadsheets lets authors customize them to fit their own mental models, which makes spreadsheets popular data management systems. There is a growing need to integrate and join data from different spreadsheets, and for machines to assist in this process the challenge is how to automatically interpret the implicit schema of a spreadsheet, which is designed for human interpretation. Some related works propose mapping spreadsheet contents to open interoperability standards, mainly Semantic Web standards. The main limitation of such proposals is the assumption that the schema and semantics of spreadsheets can be recognized and made explicit automatically, regardless of their domain. This work differs by treating the context and the domain in which the spreadsheet was conceived as essential for delineating the shared practices of the community in question, which establish construction patterns that our system recognizes automatically during data extraction and schema recognition. Our proposal involves a strategy for characterizing construction patterns associated with the conceptual models that authors follow when building spreadsheets, which results from an extensive survey of practices shared among spreadsheet authors in the Biology domain. In this document we present the result of a practical experiment involving such a system, in which we integrated data from hundreds of spreadsheets available on the Web. This integration was possible because of our approach's unique ability to recognize the nature of the analyzed spreadsheet within its creation context. / Master's / Computer Science / Master in Computer Science Electronic spreadsheets Semantic Web Information retrieval Biology - Data processing
16	Simulation and database software for computational systems biology : PySCes and JWS Online Olivier, Brett Gareth 03 1900 Thesis (PhD)--Stellenbosch University, 2005. / ENGLISH ABSTRACT: Since their inception, biology and biochemistry have been spectacularly successful in characterising the living cell and its components. As the volume of information about cellular components continues to increase, we need to ask how we should use this information to understand the functioning of the living cell. Computational systems biology uses an integrative approach that combines theoretical exploration, computer modelling and experimental research to answer this question. Central to this approach is the development of computational models, new modelling strategies and computational tools. Against this background, this study aims to: (i) develop a new modelling package, PySCeS; (ii) use PySCeS to study discontinuous behaviour in a metabolic pathway in a way that was very difficult, if not impossible, with existing software; and (iii) develop an interactive, web-based repository (JWS Online) of cellular system models. Three principles that, in our opinion, should form the basis of any new modelling software were laid down: accessibility (there should be as few barriers as possible to PySCeS use and distribution), flexibility (PySCeS should be extendable by the user, not only the developers) and usability (PySCeS should provide the tools we needed for our research). After evaluating various alternatives we decided to base PySCeS on the freely available programming language Python, which, in combination with the large collection of science and engineering algorithms in the SciPy libraries, would give us a powerful, modern, interactive development environment.
AFRIKAANS SUMMARY (translated, continuing from the point where the English abstract ends): PySCeS was developed to run under both the Windows and Linux operating systems and, more specifically, to use a command-line interface. This means that PySCeS will work on any interactive computer terminal that supports Python. This property also makes it possible to use PySCeS as a modelling component in a larger software package under any operating system that supports Python. PySCeS is based on a modular design, which makes it possible for the end user to develop the software's source code further.
As an application, PySCeS was used to investigate the cause of hysteretic behaviour in a linear, end-product-inhibited metabolic pathway. We discovered this interesting behaviour in an earlier study but could not continue that study with the software available to us at the time. With the built-in ability of PySCeS to perform parameter continuation, we could fully characterise the causes of this discontinuous behaviour. We also developed a new method to visualise this behaviour as an interaction between the subcomponents of the complete system.
During the development of PySCeS we noticed that it was very difficult to rebuild and study metabolic models published in the literature. This is largely because no central database for metabolic models exists (as one does for genomic data or protein structures). The JWS Online database was developed specifically to fill this gap. JWS Online enables users to study published models over the internet, without installing any specialised modelling software, and also to download them for use with other modelling packages such as PySCeS. JWS Online has already become an indispensable resource for systems biology research and teaching.
Molecular biology -- Data processing Cytology -- Data processing Cytology -- Mathematical models Cytology -- Computer simulation Biochemistry -- Computer simulation Dissertations -- Biochemistry
17	Binding sites in protein structures: characterisation and relation with destabilising regions Dessailly, Benoît 20 September 2007 An increasing number of proteins with unknown function have their three-dimensional structure solved at high resolution. This situation, largely due to structural genomics initiatives, has been stimulating the development of automated structure-based function prediction methods. Knowledge of residues important for function, and more particularly for binding, can help automated prediction of function in different ways. The properties of a binding site such as its shape or amino acid composition can provide clues on the ligand that may bind to it. Also, having information on functionally important regions in similar proteins can refine the process of annotation transfer between homologues.
Experimental results indicate that functional residues often have an unfavourable contribution to the stability of the folded state of a protein. This observation is the underlying principle of several computational methods for predicting the location of functional sites in protein structures. These methods search protein structures for destabilising residues, with the assumption that these are likely to be important for function.
We have developed a method to detect clusters of destabilising residues which are in close spatial proximity within a protein structure. Individual residue contributions to protein stability are evaluated using detailed atomic models and an energy function based on fundamental physico-chemical principles.
Our overall aim in this work was to evaluate the overlap between these clusters of destabilising residues and known binding sites in proteins.
Unfortunately, reliable benchmark datasets of known binding sites in proteins are sorely lacking. Therefore, we have undertaken a comprehensive approach to define binding sites unambiguously from structural data. We have rigorously identified seven issues which should be considered when constructing datasets of binding sites to validate prediction methods, and we present the construction of two new datasets in which these problems are handled. In this regard, our work constitutes a major improvement over previous studies in the field.
Our first dataset consists of 70 proteins with binding sites for diverse types of ligands (e.g. nucleic acids, metal ions) and was constructed using all available data, including literature curation. The second dataset contains 192 proteins with binding sites for small ligands and polysaccharides, does not require literature curation, and can therefore be automatically updated.
We have used our dataset of 70 proteins to evaluate the overlap between destabilising regions and binding sites (the second dataset of 192 proteins was not used for that evaluation as it constitutes a later improvement). The overlap is on average limited but significantly larger than random. The extent of the overlap varies with the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for nucleic acid-binding sites. These differences are rationalised in terms of the geometry and energetics of the binding sites.
Although destabilising regions, as detected in this work, cannot in general be used to predict all types of binding sites in protein structures, they can provide useful information, particularly on the location of binding sites for polysaccharides and small ligands.
In addition, our datasets of binding sites in proteins should help other researchers to derive and validate new function prediction methods. We also hope that the criteria which we use to define binding sites may be useful in setting future standards in other analyses.
Doctorate in Sciences / Exact and natural sciences Biology Protein binding Molecular biology -- Data processing Proteins -- Structure protein structure bioinformatics protein ligand binding site molecular biology
18	Computational biology approaches in drug repurposing and gene essentiality screening Philips, Santosh 20 June 2016 Indiana University-Purdue University Indianapolis (IUPUI) / Rapid innovations in biotechnology have led to an exponential growth of data and electronically accessible scientific literature. Knowledge can be extracted from this enormous body of scientific data, and novel discoveries can be made. In my dissertation, I have focused on discovering novel molecular mechanisms and therapies for complex diseases from big data. It is evident today that complex diseases have many contributing factors, including genetic and environmental effects. The discovery of these factors is challenging and critical in personalized medicine. The increasing cost and time needed to develop new drugs pose a further challenge in effectively treating complex diseases. In this dissertation, we want to demonstrate the use of existing data and literature as a potential resource for discovering novel therapies and repositioning existing drugs. The key to identifying novel knowledge is integrating information from decades of research across different scientific disciplines to uncover interactions that are not explicitly stated. This puts critical information at the fingertips of researchers and clinicians, who can take advantage of this newly acquired knowledge to make informed decisions. This dissertation utilizes computational biology methods to identify and integrate existing scientific data and literature resources in the discovery of novel molecular targets and drugs that can be repurposed. In chapter 1 of my dissertation, I extensively sifted through scientific literature and identified a novel interaction between Vitamin A and CYP19A1 that could lead to a potential increase in the production of estrogens.
In chapter 2, by exploring a microarray dataset from an estradiol gene sensitivity study, I was able to identify a potential novel anti-estrogenic indication for the commonly used urinary analgesic phenazopyridine. Both discoveries were experimentally validated in the laboratory. In chapter 3 of my dissertation, through the use of a manually curated corpus and machine learning algorithms, I identified and extracted genes that are essential for cell survival. These results show that novel knowledge with potential clinical applications can be discovered from existing data and literature by integrating information across various scientific disciplines. Drug repurposing Gene essentiality Literature mining Machine learning Biology -- Data processing Computational biology -- Methods Epidemiology -- Statistical methods Personalized medicine Genetic disorders -- Molecular diagnosis
19	Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functions Huang, Fei January 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Intrinsically disordered proteins (IDPs) are flexible proteins without defined 3D structures. Studies show that IDPs are abundant in nature and actively involved in numerous biological processes. Two crucial subjects in the study of IDPs are analyzing their functions and identifying them. We thus carried out three projects to better understand IDPs. In the 1st project, we propose a method that separates IDPs into different function groups. We used the CH-CDF plot approach, which is based on the combined use of two predictors and subclassifies proteins into 4 groups: structured, mixed, disordered, and rare. Studies show different structural biases for each group. The mixed class has more order-promoting residues and more ordered regions than the disordered class. In addition, the disordered class is highly active in mitosis-related processes, among others. Meanwhile, the mixed class is highly associated with signaling pathways, where having both ordered and disordered regions could be important. The 2nd project concerns identifying whether an unknown protein is entirely disordered. One of the earliest predictors for this purpose, the charge-hydropathy plot (C-H plot), exploits the charge and hydropathy features of the protein. This algorithm is simple yet powerful, and its input parameters, charge and hydropathy, are informative and readily interpretable. We found that using different hydropathy scales significantly affects the prediction accuracy. Therefore, we sought to identify a new hydropathy scale that optimizes the prediction. This new scale achieves an accuracy of 91%, a significant improvement over the original 79%. In our 3rd project, we developed a per-residue C-H IDP predictor, in which three hydropathy scales are optimized individually.
This accounts for the amino acid composition differences in three regions of a protein sequence (N-terminus, C-terminus and internal). We then combined them into a single per-residue predictor that achieves an accuracy of 74% for per-residue predictions on proteins containing long IDP regions. Intrinsically disordered proteins Support vector machine Clustering Proteins -- Conformation -- Research Proteins -- Denaturation Protein folding -- Research Support vector machines Aggregation (Chemistry) Amino acids -- Analysis Cellular signal transduction Molecular biology -- Mathematics Algorithms
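The whole-protein C-H plot mentioned in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis's predictor: it uses the Kyte-Doolittle hydropathy scale rescaled to the range 0 to 1 and the linear boundary commonly quoted for the original C-H plot (mean net charge = 2.785 × mean hydropathy − 1.151). The optimized scale developed in the thesis is not given in the abstract and is not reproduced here. The names `ch_features` and `predicted_disordered` are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not the thesis's optimized one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Side-chain charges at neutral pH; histidine is treated as neutral here.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def ch_features(sequence):
    """Return (mean net charge, mean hydropathy) for a protein sequence."""
    seq = sequence.upper()
    # Rescale hydropathy from [-4.5, 4.5] to [0, 1] before averaging.
    mean_hydropathy = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    mean_net_charge = abs(sum(CHARGE.get(aa, 0) for aa in seq)) / len(seq)
    return mean_net_charge, mean_hydropathy


def predicted_disordered(sequence, slope=2.785, intercept=-1.151):
    """Classify as disordered if the point lies above the C-H boundary line."""
    mean_net_charge, mean_hydropathy = ch_features(sequence)
    return mean_net_charge > slope * mean_hydropathy + intercept


charged = predicted_disordered("EEEEEEEEEE")      # highly charged, low hydropathy
hydrophobic = predicted_disordered("LLLLLLLLLL")  # uncharged, high hydropathy
```

Swapping a different hydropathy table into `KYTE_DOOLITTLE` changes every protein's position along the hydropathy axis, which is why the choice of scale affects accuracy in the way the abstract describes.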
20	Context specific text mining for annotating protein interactions with experimental evidence Pandit, Yogesh 03 January 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Proteins are the building blocks of a biological system. They interact with other proteins to produce distinct biological phenomena. Protein-protein interactions play a valuable role in understanding the molecular mechanisms occurring in any biological system. Protein interaction databases are a rich source of protein interaction information. They gather large amounts of information from published literature to enrich their data. Expert curators do most of this work manually. The amount of accessible and publicly available literature is growing very rapidly, and manual annotation is a time-consuming process. At the rate at which available information is growing, manual curation alone cannot keep up. Tools are needed to process this huge amount of data and extract the key information that can help curators proceed faster. When extracting protein-protein interaction evidence from literature, the mere mention of a certain protein, found by look-up approaches, cannot validate the interaction. Supporting protein interaction information with experimental evidence can help. In this study, we apply machine learning based classification techniques to classify any given protein interaction related document into an interaction detection method. We use biological attributes and experimental factors, different combinations of which define any particular interaction detection method. Then, using the predicted detection methods, proteins identified with named entity recognition techniques, and a decomposition of the parts-of-speech composition, we search for sentences with experimental evidence for a protein-protein interaction.
We report an accuracy of 75.1% with an F-score of 47.6% on a dataset containing 2035 training documents and 300 test documents. Data mining -- Analysis Systems biology -- Methodology Biology -- Data processing -- Research
