Global ETD Search

1	Epistemologia da Informática em Saúde: entre a teoria e a prática / Epistemology of Medical Informatics: between theory and practice Colepícolo, Eliane [UNIFESP] 26 March 2008 (PDF) Made available in DSpace on 2015-07-22T20:50:02Z (GMT). Previous issue date: 2008-03-26 / CONTEXT. The aim of this research is to understand the epistemology of the field of Health Informatics (abbreviated IS, from the Portuguese "Informática em Saúde") through a comparative study of the theoretical and practical aspects of the discipline. MATERIALS AND METHODS. The study was divided into three stages: a statistical study, a terminological study, and an epistemological study. The statistical study involved developing and using a robot to extract metadata of scientific articles from the PubMed database, as well as text mining of the abstracts of these articles, which were used for statistics and later analysis. The terminological study aimed to develop a thesaurus specialized in IS, here named EpistemIS, which, integrated with MeSH, served as the basis for the statistical study. The epistemological study began with a study of the metaconcepts of human action and thought (MAPHs, from the Portuguese "metaconceitos da ação e pensamento humanos"), which are art, technique, science, technology, and technoscience. Next, an epistemological method based on the works of Mario Bunge was developed for the epistemological classification of concepts of the field drawn from the EpistemIS thesaurus. An opinion survey of the field's scientific community was conducted through a web questionnaire. RESULTS. The study obtained: a characterization of the MAPHs, maps systematizing knowledge in IS, epistemological and MAPH classifications of IS, a knowledge map of IS, and the community's consensus on the epistemology of IS.
Finally, statistics were calculated on the epistemological and MAPH classifications in IS and on the integration between the analysis corpus (437,289 PubMed articles) and the EpistemIS thesaurus. CONCLUSION. From theoretical and practical arguments, it was concluded that Health Informatics is a technoscience concerned with solving problems in the domains of the Life Sciences, the Health Sciences, and Health Care, through interdisciplinary scientific research and the development of technology for use in society. / TEDE Epistemology Statistics Medical Subject Headings (MeSH) Text Mining Terminology Thesaurus Health informatics
2	Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed Eisinger, Daniel 07 October 2013 The patent domain is a very important source of scientific information that is currently not used to its full potential. Searching for relevant patents is a complex task because the number of existing patents is very high and grows quickly, patent text is extremely complicated, and standard vocabulary is not used consistently or doesn’t even exist. As a consequence, pure keyword searches often fail to return satisfying results in the patent domain. Major companies employ patent professionals who are able to search patents effectively, but even they have to invest a lot of time and effort into their search. Academic scientists on the other hand do not have access to such resources and therefore often do not search patents at all, but they risk missing up-to-date information that will not be published in scientific publications until much later, if it is published at all. Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Similarly, professional patent searches expand beyond keywords by including class codes from various patent classification systems. However, classification-based searches can only be performed effectively if the user has very detailed knowledge of the system, which is usually not the case for academic scientists. Consequently, we investigated methods to automatically identify relevant classes that can then be suggested to the user to expand their query. Since every patent is assigned at least one class code, it should be possible for these assignments to be used in a similar way as the MeSH annotations in PubMed. 
Developing a system for this task requires a good understanding of the properties of both classification systems. To gain such knowledge, we perform an in-depth comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms and classes, respectively, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows that the hierarchies are structurally similar, but terms and annotations differ significantly. The most important differences are the considerably higher complexity of the IPC class definitions compared to MeSH terms and the far lower number of class assignments to the average patent compared to the number of MeSH terms assigned to PubMed documents. These differences cause problems for both inexperienced patent searchers and professionals. On the one hand, the complex term system makes it very difficult for members of the former group to find any IPC classes that are relevant for their search task. On the other hand, the low number of IPC classes per patent points to incomplete class assignments by the patent office, which limits the recall of the classification-based searches that are frequently performed by the latter group. We approach these problems from two directions: first, by automatically assigning additional patent classes to make up for the missing assignments, and second, by automatically retrieving relevant keywords and classes that are proposed to the user so they can expand their initial search. For the automated assignment of additional patent classes, we adapt to the patent domain an approach that was successfully used for the assignment of MeSH terms to PubMed abstracts. Each document is assigned a set of IPC classes by a large set of binary Maximum-Entropy classifiers. 
Our evaluation shows good performance by individual classifiers (precision/recall between 0.84 and 0.90), making the retrieval of additional relevant documents for specific IPC classes feasible. The assignment of additional classes to specific documents is more problematic, since the precision of our classifiers is not high enough to avoid false positives. However, we propose filtering methods that can help solve this problem. For the guided patent search, we demonstrate various methods to expand a user’s initial query. Our methods use both keywords and class codes that the user enters to retrieve additional relevant keywords and classes that are then suggested to the user. These additional query components are extracted from different sources such as patent text, IPC definitions, external vocabularies and co-occurrence data. The suggested expansions can help inexperienced users refine their queries with relevant IPC classes, and professionals can compose their complete query faster and more easily. We also present GoPatents, a patent retrieval prototype that incorporates some of our proposals and makes faceted browsing of a patent corpus possible. info:eu-repo/classification/ddc/004 ddc:004
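The co-occurrence-based part of the guided search described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy corpus `PATENTS`, the function name `suggest_classes`, and the simple counting rule are all assumptions made for illustration, and the keyword-to-class pairings are invented.

```python
from collections import Counter

# Hypothetical toy corpus: each patent has a set of keywords and a set of
# assigned IPC class codes. The pairings are invented for illustration.
PATENTS = [
    {"keywords": {"antibody", "tumor"}, "ipc": {"A61K", "C07K"}},
    {"keywords": {"antibody", "assay"}, "ipc": {"G01N", "C07K"}},
    {"keywords": {"engine", "piston"}, "ipc": {"F02B"}},
    {"keywords": {"tumor", "imaging"}, "ipc": {"A61B"}},
]


def suggest_classes(query_keywords, patents, top_n=3):
    """Suggest IPC classes that co-occur with any of the query keywords.

    Assumption: a class is scored by the number of patents that share at
    least one keyword with the query and carry that class.
    """
    counts = Counter()
    for patent in patents:
        if patent["keywords"] & query_keywords:
            counts.update(patent["ipc"])
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [code for code, _ in ranked[:top_n]]


suggestions = suggest_classes({"antibody"}, PATENTS)
```

A user who enters only the keyword "antibody" would be offered the classes in `suggestions` as candidate query expansions. A real system would draw on the other sources the abstract lists (patent text, IPC definitions, external vocabularies) as well.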
3	Word-sense disambiguation in biomedical ontologies Alexopoulou, Dimitra 11 June 2010 With the ever-increasing volume of biomedical literature, text mining has emerged as an important technology to support bio-curation and search. Word sense disambiguation (WSD), the correct identification of terms in text in the light of ambiguity, is an important problem in text mining. Since the late 1940s many approaches based on supervised (decision trees, naive Bayes, neural networks, support vector machines) and unsupervised machine learning (context-clustering, word-clustering, co-occurrence graphs) have been developed. Knowledge-based methods that make use of the WordNet computational lexicon have also been developed. However, only a few make use of ontologies, i.e. hierarchical controlled vocabularies, to solve the problem, and none exploit inference over ontologies or the use of metadata from publications. This thesis addresses the WSD problem in biomedical ontologies by suggesting different approaches for word sense disambiguation that use ontologies and metadata. The "Closest Sense" method assumes that the ontology defines multiple senses of the term; it computes the shortest path of co-occurring terms in the document to one of these senses. The "Term Cooc" method defines a log-odds ratio for co-occurring terms including inferred co-occurrences. The "MetaData" approach trains a classifier on metadata; it does not require any ontology, but requires training data, which the other methods do not. These approaches are compared to each other when applied to a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve an 80% success rate on average. The MetaData approach performs best with 96% when trained on high-quality data. Its performance deteriorates as the quality of the training data decreases. 
The Term Cooc approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy but rather a loose is-related-to hierarchy. The Closest Sense approach achieves an average success rate of 80%. Furthermore, the thesis showcases applications ranging from ontology design to semantic search where WSD is important. info:eu-repo/classification/ddc/004 ddc:004
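The idea behind the "Closest Sense" method, as this abstract describes it, can be sketched as a shortest-path search over an ontology graph. This is only an illustration under stated assumptions: the toy ontology `EDGES`, the helper names, and the rule of picking the sense with the single shortest path to any context term are hypothetical and are not taken from the thesis.

```python
from collections import deque

# Hypothetical toy ontology as undirected edges between term identifiers.
EDGES = [
    ("biological_process", "developmental_process"),
    ("developmental_process", "organ_development"),
    ("organ_development", "heart_development"),
    ("technology", "software_development"),
    ("software_development", "programming"),
]


def build_graph(edges):
    """Turn an edge list into an adjacency-set dictionary."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def closest_sense(graph, senses, context_terms):
    """Pick the candidate sense nearest to any co-occurring context term.

    Assumption: the minimum over all context terms decides; other
    aggregation rules (e.g. an average) would be equally plausible.
    """
    best = None  # (distance, sense)
    for sense in senses:
        for term in context_terms:
            dist = shortest_path_length(graph, term, sense)
            if dist is not None and (best is None or dist < best[0]):
                best = (dist, sense)
    return None if best is None else best[1]


GRAPH = build_graph(EDGES)
chosen = closest_sense(
    GRAPH,
    senses=["developmental_process", "software_development"],
    context_terms=["heart_development"],
)
```

Here the ambiguous term "development" has two candidate senses, and a document that also mentions heart development resolves to the biological one, because only that sense is reachable from the context term in the toy graph.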
