1. Formalizing biomedical concepts from textual definitions
Tsatsaronis, George; Ma, Yue; Petrova, Alina; Kissa, Maria; Distel, Felix; Baader, Franz; Schroeder, Michael. 04 January 2016
Background
Ontologies play a major role in the life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined, which ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that combines machine learning with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions.
Results
We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we analyze the following aspects: (1) How do definitions mined from the Web and literature differ from those mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., restrictions on the relations’ domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare with each other for the task of formal definition generation? (4) How does the size of the learning data influence the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choices of corpora, lexical features, learning algorithm and data size do not affect performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as the relations’ domain and range pairs do not overlap, this information is the most valuable in formalizing textual definitions.
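The role of semantic types can be illustrated with a minimal sketch. It assumes a classifier has already scored candidate relations for a pair of terms, and it discards every relation whose domain and range do not admit the semantic types of that pair. The relation names, type labels, scores and the function `filter_by_semantic_types` are hypothetical and chosen for illustration only; they are not taken from the published implementation or from SNOMED CT.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical (domain type, range type) pairs permitted for each relation.
# Illustrative values only; not taken from the paper or from SNOMED CT.
ALLOWED_TYPES: Dict[str, Set[Tuple[str, str]]] = {
    "finding_site": {("disorder", "body_structure")},
    "causative_agent": {("disorder", "organism"), ("disorder", "substance")},
    "associated_morphology": {("disorder", "morphologic_abnormality")},
}


def filter_by_semantic_types(
    scored: Dict[str, float], subject_type: str, object_type: str
) -> List[Tuple[str, float]]:
    """Keep relations whose domain/range admit the type pair, best score first."""
    pair = (subject_type, object_type)
    admissible = [
        (relation, score)
        for relation, score in scored.items()
        if pair in ALLOWED_TYPES.get(relation, set())
    ]
    return sorted(admissible, key=lambda item: item[1], reverse=True)


# Hypothetical classifier scores for one (disorder, body structure) term pair.
scores = {"finding_site": 0.40, "causative_agent": 0.45, "associated_morphology": 0.15}
candidates = filter_by_semantic_types(scores, "disorder", "body_structure")
```

In this toy table no two relations share a domain and range pair, so the type pair alone singles out `finding_site` even though the hypothetical classifier scored `causative_agent` higher. If the pairs overlapped, the filter would return several candidates and the scores would have to decide.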
Conclusions
The analysis presented in this manuscript implies that automated methods can make a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and publicly available at: https://github.com/alifahsyamsiyah/learningDL.
2. A genome-scale mining strategy for recovering novel rapidly-evolving nuclear single-copy genes for addressing shallow-scale phylogenetics in Hydrangea
Wanke, Stefan; Granados Mendoza, Carolina; Naumann, Julia; Samain, Marie-Stéphanie; Goetghebeur, Paul; De Smet, Yannick. 04 January 2016
Background
Identifying orthologous molecular markers that potentially resolve relationships at and below species level has been a major challenge in molecular phylogenetics over the past decade. Non-coding regions of nuclear low- or single-copy markers are a vast and promising source of data for shallow-scale phylogenetics. Taking advantage of public transcriptome data from the One Thousand Plant Project (1KP), we developed a genome-scale mining strategy for recovering potentially orthologous single-copy markers to address shallow-scale phylogenetics. Our marker design targeted the amplification of intron-rich nuclear single-copy regions from genomic DNA. As a case study we used Hydrangea section Cornidia, one of the most recently diverged lineages within Hydrangeaceae (Cornales), to compare the performance of three of these nuclear markers with that of other "fast"-evolving plastid markers.
Results
Our data mining and filtering process retrieved 73 putative nuclear single-copy genes which are potentially useful for resolving phylogenetic relationships at a range of divergence depths within Cornales. The three assessed nuclear markers showed considerably more phylogenetic signal for shallow evolutionary depths than conventional plastid markers. Phylogenetic signal in plastid markers increased less markedly towards deeper evolutionary divergences. Potential phylogenetic noise introduced by nuclear markers was lower than their respective phylogenetic signal across all evolutionary depths. In contrast, plastid markers showed higher probabilities for introducing phylogenetic noise than signal at the deepest evolutionary divergences within the tribe Hydrangeeae (Hydrangeaceae).
Conclusions
While nuclear single-copy markers are highly informative for shallow evolutionary depths without introducing phylogenetic noise, plastid markers might be more appropriate for resolving deeper-level divergences, such as the backbone relationships of the Hydrangeaceae family and deeper, at which non-coding parts of nuclear markers could potentially introduce noise due to elevated rates of evolution. The transcriptome-based mining strategy developed and demonstrated here has great potential for the design of novel and highly informative nuclear markers for a range of plant groups and evolutionary scales.
3. Convergent evolution of heat-inducibility during subfunctionalization of the Hsp70 gene family
Krenek, Sascha; Schlegel, Martin; Berendonk, Thomas U. 28 November 2013
Background: Heat-shock proteins of the 70 kDa family (Hsp70s) are essential chaperones required for key cellular functions. In eukaryotes, four subfamilies can be distinguished according to their function and localisation in different cellular compartments: cytosol, endoplasmic reticulum, mitochondria and chloroplasts. Generally, multiple cytosol-type Hsp70s can be found in metazoans that show constitutive expression and/or stress-inducibility, arguing for the evolution of different tasks and functions. Information about the hsp70 copy number and diversity in microbial eukaryotes is, however, scarce, and detailed knowledge about the differential gene expression in most protists is lacking. Therefore, we have characterised the Hsp70 gene family of Paramecium caudatum to gain insight into the evolution and differential heat stress response of the distinct family members in protists and to investigate the diversification of eukaryotic hsp70s, focusing on the evolution of heat-inducibility.
Results: Eleven putative hsp70 genes could be detected in P. caudatum, comprising homologs of three major Hsp70 subfamilies. Phylogenetic analyses revealed five evolutionarily distinct Hsp70 groups, each with a closer relationship to orthologous sequences of Paramecium tetraurelia than to another P. caudatum Hsp70 group. These highly diverse, paralogous groups resulted from duplications preceding Paramecium speciation, underwent divergent evolution and were subject to purifying selection. Heat-shock treatments were performed to test for differential expression patterns among the five Hsp70 groups as well as for functional conservation within Paramecium. These treatments induced exceptionally high mRNA up-regulation in one cytosolic group with a low basal expression, indicative of the major heat-inducible hsp70s. All other groups showed comparatively high basal expression levels and moderate heat-inducibility, signifying constitutively expressed genes. Comparative EST analyses for P. tetraurelia hsp70s unveiled a corresponding expression pattern, which supports a functionally conserved evolution of the Hsp70 gene family in Paramecium.
Conclusions: Our analyses suggest an independent evolution of the heat-inducible cytosol-type hsp70s in Paramecium and in its close relative Tetrahymena, as well as within higher eukaryotes. This result indicates convergent evolution during hsp70 subfunctionalization and implies that heat-inducibility evolved several times during the course of eukaryotic evolution.