  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Abrangência nas estratégias de busca em Anestesiologia: descritores nas bases de dados MEDLINE e EMBASE / Comprehensiveness in search strategies in Anesthesiology: subheadings in MEDLINE and EMBASE databases

Volpato, Enilze de Souza Nogueira [UNESP] 24 July 2017
Introduction: A high-quality electronic search is essential to ensuring accuracy and comprehensiveness when identifying potentially relevant records for a systematic review. To help researchers identify terms when formulating a sensitive search strategy, librarians and educators instruct them to consult and include the preferred and non-preferred terms of the database's controlled vocabulary. However, when all available terms in the thesaurus (i.e., the controlled vocabulary) are used, strategies can become lengthy and very laborious, because some subject headings include many non-preferred terms. Objective: To identify the most efficient method for searching both MEDLINE (via PubMed) and EMBASE, covering search terms with different spellings and direct and indirect orders, with and without the association of MeSH and EMTREE terms. Method: In our cross-sectional study of search strategies, we selected and analysed 37 search strategies developed specifically for the field of anesthesiology. The original strategies included every term available in the controlled vocabularies, that is, all variations in spelling and in direct and indirect order. For comparison, modified versions excluded the terms that were only a spelling or word-order variant, and the strategies were adapted for submission to both databases. Results: In the analysed sample, the original strategies (including different spellings and direct and indirect orders) retrieved the same number of records as the modified strategies (without those variations) in MEDLINE (mean of 61.3%) and a higher number in EMBASE (mean of 63.9%). The number of results retrieved was not identical with and without the association of MeSH and EMTREE terms: combining terms from both controlled vocabularies retrieved a larger number of records than using terms from only one of them, in both databases. Conclusions: In view of these results, we recommend using all terms available in the controlled vocabularies, including preferred and non-preferred terms (i.e., different spellings and direct and indirect order of the same term), and associating MeSH terms with EMTREE terms, in order to develop highly sensitive search strategies for systematic reviews.
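The recommendation in this abstract (include every preferred and non-preferred term for a concept, from both vocabularies) can be illustrated with a minimal sketch. The function name `build_query`, the example terms, and the generic Boolean syntax are assumptions made for illustration; they are not taken from the thesis, and real PubMed or EMBASE strategies would also need database-specific field tags.

```python
def build_query(concepts):
    """OR together all term variants of each concept, then AND the concepts.

    `concepts` is a list of lists; each inner list holds the preferred and
    non-preferred terms (spelling and word-order variants) for one concept.
    """
    groups = []
    for terms in concepts:
        seen = set()
        unique = []
        for term in terms:
            key = term.casefold()  # treat case-only variants as duplicates
            if key not in seen:
                seen.add(key)
                unique.append(term)
        # Quote anything containing a space so it is searched as a phrase.
        quoted = ['"{}"'.format(t) if " " in t else t for t in unique]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Illustrative terms only: an indirect-order heading, its direct-order
# variant, and an alternative spelling, combined with a second concept.
concepts = [
    ["Anesthesia, General", "General Anesthesia", "general anaesthesia"],
    ["Propofol", "propofol"],
]
query = build_query(concepts)
print(query)
```

Keeping the variants in one OR group per concept makes the strategy longer but leaves its logic easy to audit, which is the trade-off the abstract discusses.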
12

Discovery of overlapping 1-closed biclusters

Banerjee, Abhik January 2012
No description available.
13

A comunicação científica em saúde: uma abordagem semiótica / Scientific communication in health: a semiotic approach

Maria do Carmo Avamilano Alvarez 06 March 2015
Within the scenario of an academic and scientific culture undergoing transformation, libraries ask HIV/AIDS researchers about the difficulties and ease they experience in searching for information. Bringing together the disciplines of information science, public health and the semiotics of culture, this study analyzes the researchers' understanding of strategies for searching technical and scientific information in bibliographic systems. It also identifies the central sign systems in the semiosphere of culture and describes the interactions between researchers and information systems and libraries, reflecting on communication and its present-day challenges. Based on the structural semiotic method of the Tartu-Moscow School and the work of its greatest exponent, Yuri Lotman, we conducted 25 interviews with two HIV/AIDS research groups in Brazil. The diversity of the researchers' strategies for seeking scientific information today sustains the dynamics and complexity of the sign systems. The differences and similarities, as reflected in the narratives, make explicit the different languages present in the semiosphere and the research traditions. These languages undergo continual transformation, modeling themselves in accordance with the variety of academic training, professional activity and experience of the researcher, thus revealing the cultural complexity involved. On the plane of the semiosphere, some sign systems are seen as central to the culture of the researchers, especially Google and the PubMed/MEDLINE database. Google gains in significance because it offers a simple and practical system that avoids wasting time. However, it proves to be a system controlled by algorithms, which tends toward the impoverishment of results and commercial domination. The presence of PubMed/MEDLINE is evident in the codes of the culture, although it shares space with the prevailing Google. A sign system that is central for libraries, though little known to researchers, is the MeSH thesaurus, a tool that works unperceived because it is complex to use. New technologies help, but should not be overvalued.

Time management and the selection of information relevant to the research context position information systems and libraries as important mediators of communication, or translators of languages. However, their roles are still not clear to the researcher. The irregularity of the semiosphere shows in the movements between center and periphery and in the asymmetries observed. Unpredictability promotes transformation and is part of the consciousness of the researcher, who seeks to generate new information. The library is a system present in the search for information, but it loses its centrality when projected into the future. Understanding the asymmetry and semiotic heterogeneity involved in academic culture is a means of survival for libraries. The struggle of professional groups for survival is evident in their efforts to keep the records and norms that identify the culture at the center of the semiosphere. However, survival is not ensured simply by closing in on oneself, but rather by dialogue with the opposite, which promotes creation. The areas of Health and Information Science are interwoven through their languages and modelings, and at the same time subdivide into other modelings, such as the area of service delivery to the population and the academic area. Communication and semiotics make it possible to decipher this rich diversity.
14

GoPubMed: Ontology-based literature search for the life sciences / GoPubMed: ontologie-basierte Literatursuche für die Lebenswissenschaften

Doms, Andreas 20 January 2009
Background: Most of our biomedical knowledge is only accessible through texts. The biomedical literature grows exponentially, and PubMed comprises over 18,000,000 literature abstracts. Recently, much effort has been put into the creation of biomedical ontologies that capture biomedical facts. The exploitation of ontologies to explore the scientific literature is a new area of research. Motivation: When people search, they have questions in mind. Answering questions in a domain requires knowledge of the terminology of that domain. Classical search engines do not provide background knowledge for the presentation of search results. Ontology-annotated structured databases allow for data mining. The hypothesis is that ontology-annotated literature databases allow for text mining. The central problem is to associate scientific publications with ontological concepts. This is a prerequisite for ontology-based literature search. The question then is how to answer biomedical questions using ontologies and a literature corpus. Finally, the task is to automate bibliometric analyses on a corpus of scientific publications. Approach: Recent joint efforts on automatically extracting information from free text showed that the applied methods are complementary. The idea is to employ the rich terminological and relational information stored in biomedical ontologies to mark up biomedical text documents. Based on established semantic links between documents and ontology concepts, the goal is to answer biomedical questions on a corpus of documents. The fully annotated literature corpus makes it possible, for the first time, to generate bibliometric analyses automatically for ontological concepts, authors and institutions. Results: This work includes a novel framework for annotating free texts with ontological concepts. The framework makes it possible to generate recognition pattern rules from the terminological and relational information in an ontology.
Maximum entropy models can be trained to distinguish the meanings of ambiguous concept labels. The framework was used to develop an annotation pipeline for PubMed abstracts with 27,863 Gene Ontology concepts. The evaluation of the recognition performance yielded a precision of 79.9% and a recall of 72.7%, improving on the previously used algorithm by 25.7% f-measure. The evaluation was done on a curation corpus of 689 PubMed abstracts with 18,356 curations of concepts, created manually by the original authors. Methods to reason over large numbers of documents with ontologies were developed. The ability to answer questions with the online system was shown on a set of biomedical questions from the TREC Genomics Track 2006 benchmark. This work includes the first ontology-based, large-scale, up-to-date bibliometric analysis available online for topics in molecular biology represented by GO concepts. The automatic bibliometric analysis is in line with existing, but often outdated, manual analyses. Outlook: A number of promising continuations of this work have been spun off. A freely available online search engine has a growing user community. A spin-off company, funded by the High-Tech Gründerfonds, commercializes the new ontology-based search paradigm. Several offshoots of GoPubMed have been developed, including GoWeb (general web search), Go3R (search in replacement, reduction and refinement methods for animal experiments) and GoGene (search in gene/protein databases).
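As a worked illustration of how the reported precision and recall relate to an f-measure, the following sketch computes the balanced F1 score. The abstract does not say which f-measure variant was used, so the balanced form (beta = 1) is an assumption, and the function name `f_measure` is hypothetical. The sketch does not attempt to reproduce the 25.7% improvement, since the baseline figures are not given.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta score).

    beta = 1.0 gives the balanced F1 score; this choice is an assumption,
    because the abstract does not name the variant it reports.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f_measure(0.799, 0.727)
print(round(f1, 3))  # approximately 0.761
```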
15

Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed

Eisinger, Daniel 08 September 2014
The patent domain is a very important source of scientific information that is currently not used to its full potential. Searching for relevant patents is a complex task because the number of existing patents is very high and grows quickly, patent text is extremely complicated, and standard vocabulary is not used consistently or doesn’t even exist. As a consequence, pure keyword searches often fail to return satisfying results in the patent domain. Major companies employ patent professionals who are able to search patents effectively, but even they have to invest a lot of time and effort into their search. Academic scientists on the other hand do not have access to such resources and therefore often do not search patents at all, but they risk missing up-to-date information that will not be published in scientific publications until much later, if it is published at all. Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Similarly, professional patent searches expand beyond keywords by including class codes from various patent classification systems. However, classification-based searches can only be performed effectively if the user has very detailed knowledge of the system, which is usually not the case for academic scientists. Consequently, we investigated methods to automatically identify relevant classes that can then be suggested to the user to expand their query. Since every patent is assigned at least one class code, it should be possible for these assignments to be used in a similar way as the MeSH annotations in PubMed. In order to develop a system for this task, it is necessary to have a good understanding of the properties of both classification systems. 
In order to gain such knowledge, we perform an in-depth comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms and classes, respectively, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows that the hierarchies are structurally similar, but terms and annotations differ significantly. The most important differences concern the considerably higher complexity of the IPC class definitions compared to MeSH terms and the far lower number of class assignments to the average patent compared to the number of MeSH terms assigned to PubMed documents. These differences cause problems for both inexperienced patent searchers and professionals. On the one hand, the complex term system makes it very difficult for members of the former group to find any IPC classes that are relevant for their search task. On the other hand, the low number of IPC classes per patent points to incomplete class assignments by the patent office, thereby limiting the recall of the classification-based searches that are frequently performed by the latter group. We approach these problems from two directions: first, by automatically assigning additional patent classes to make up for the missing assignments, and second, by automatically retrieving relevant keywords and classes that are proposed to the user so they can expand their initial search. For the automated assignment of additional patent classes, we adapt to the patent domain an approach that was successfully used for the assignment of MeSH terms to PubMed abstracts. Each document is assigned a set of IPC classes by a large set of binary Maximum-Entropy classifiers.
Our evaluation shows good performance by individual classifiers (precision/recall between 0.84 and 0.90), making the retrieval of additional relevant documents for specific IPC classes feasible. The assignment of additional classes to specific documents is more problematic, since the precision of our classifiers is not high enough to avoid false positives. However, we propose filtering methods that can help solve this problem. For the guided patent search, we demonstrate various methods to expand a user’s initial query. Our methods use both keywords and class codes that the user enters to retrieve additional relevant keywords and classes that are then suggested to the user. These additional query components are extracted from different sources such as patent text, IPC definitions, external vocabularies and co-occurrence data. The suggested expansions can help inexperienced users refine their queries with relevant IPC classes, and professionals can compose their complete query faster and more easily. We also present GoPatents, a patent retrieval prototype that incorporates some of our proposals and makes faceted browsing of a patent corpus possible.
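The idea of suggesting class codes from co-occurrence data can be sketched as follows. This is only a toy illustration under stated assumptions: the function `suggest_classes`, the tiny corpus, and the simple counting scheme are hypothetical and are not the method or data described in the thesis.

```python
from collections import Counter


def suggest_classes(query_keywords, corpus, top_n=3):
    """Suggest class codes that co-occur with the query keywords.

    `corpus` is a list of (keywords, class_codes) pairs, one per document.
    A class is counted once for every document that shares at least one
    keyword with the query.
    """
    wanted = {k.lower() for k in query_keywords}
    counts = Counter()
    for keywords, classes in corpus:
        if wanted & {k.lower() for k in keywords}:
            counts.update(set(classes))
    # Sort by descending count, then by code, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [code for code, _ in ranked[:top_n]]


# Toy documents (invented for illustration) with IPC-style class codes.
toy_corpus = [
    ({"antibody", "tumor"}, {"A61K", "C07K"}),
    ({"antibody", "assay"}, {"G01N", "C07K"}),
    ({"engine", "piston"}, {"F02B"}),
]
suggestions = suggest_classes(["antibody"], toy_corpus)
print(suggestions)  # ['C07K', 'A61K', 'G01N']
```

A real system would weight suggestions by more than raw counts, for example by drawing on the other sources the abstract lists, such as IPC definitions and external vocabularies.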
17

Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed

Eisinger, Daniel 07 October 2013
The patent domain is a very important source of scientific information that is currently not used to its full potential. Searching for relevant patents is a complex task because the number of existing patents is very high and grows quickly, patent text is extremely complicated, and standard vocabulary is not used consistently or doesn’t even exist. As a consequence, pure keyword searches often fail to return satisfying results in the patent domain. Major companies employ patent professionals who are able to search patents effectively, but even they have to invest a lot of time and effort into their search. Academic scientists on the other hand do not have access to such resources and therefore often do not search patents at all, but they risk missing up-to-date information that will not be published in scientific publications until much later, if it is published at all. Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Similarly, professional patent searches expand beyond keywords by including class codes from various patent classification systems. However, classification-based searches can only be performed effectively if the user has very detailed knowledge of the system, which is usually not the case for academic scientists. Consequently, we investigated methods to automatically identify relevant classes that can then be suggested to the user to expand their query. Since every patent is assigned at least one class code, it should be possible for these assignments to be used in a similar way as the MeSH annotations in PubMed. In order to develop a system for this task, it is necessary to have a good understanding of the properties of both classification systems. 
In order to gain such knowledge, we perform an in-depth comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms/classes respectively, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows that the hierarchies are structurally similar, but terms and annotations differ significantly. The most important differences concern the considerably higher complexity of the IPC class definitions compared to MeSH terms and the far lower number of class assignments to the average patent compared to the number of MeSH terms assigned to PubMed documents. As a result of these differences, problems are caused both for unexperienced patent searchers and professionals. On the one hand, the complex term system makes it very difficult for members of the former group to find any IPC classes that are relevant for their search task. On the other hand, the low number of IPC classes per patent points to incomplete class assignments by the patent office, therefore limiting the recall of the classification-based searches that are frequently performed by the latter group. We approach these problems from two directions: First, by automatically assigning additional patent classes to make up for the missing assignments, and second, by automatically retrieving relevant keywords and classes that are proposed to the user so they can expand their initial search. For the automated assignment of additional patent classes, we adapt an approach to the patent domain that was successfully used for the assignment of MeSH terms to PubMed abstracts. Each document is assigned a set of IPC classes by a large set of binary Maximum-Entropy classifiers. 
Our evaluation shows good performance by individual classifiers (precision/recall between 0.84 and 0.90), making the retrieval of additional relevant documents for specific IPC classes feasible. The assignment of additional classes to specific documents is more problematic, since the precision of our classifiers is not high enough to avoid false positives. However, we propose filtering methods that can help solve this problem. For the guided patent search, we demonstrate various methods to expand a user’s initial query. Our methods use both keywords and class codes that the user enters to retrieve additional relevant keywords and classes that are then suggested to the user. These additional query components are extracted from different sources such as patent text, IPC definitions, external vocabularies and co-occurrence data. The suggested expansions can help inexperienced users refine their queries with relevant IPC classes, and professionals can compose their complete query faster and more easily. We also present GoPatents, a patent retrieval prototype that incorporates some of our proposals and makes faceted browsing of a patent corpus possible.
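The co-occurrence-based query expansion described in this abstract can be illustrated with a minimal sketch. It assumes a toy corpus in which each patent is reduced to its set of IPC class codes. The function name `suggest_classes`, the sample data, and the scoring by raw co-occurrence counts are illustrative assumptions, not the method used in the thesis.

```python
from collections import Counter

# Hypothetical toy corpus: each patent is represented only by its IPC class codes.
PATENTS = [
    {"A61K", "A61P", "C07D"},
    {"A61K", "A61P"},
    {"A61K", "C12N"},
    {"G06F", "H04L"},
]

def suggest_classes(query_classes, patents, top_n=3):
    """Rank classes by how often they co-occur with any class in the query."""
    query = set(query_classes)
    counts = Counter()
    for classes in patents:
        if classes & query:                  # patent shares a class with the query
            counts.update(classes - query)   # count only classes not already queried
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [code for code, _ in ranked[:top_n]]

suggestions = suggest_classes({"A61K"}, PATENTS)
print(suggestions)  # ['A61P', 'C07D', 'C12N']
```

A real system would presumably weight such counts against overall class frequency and combine them with the other sources the abstract names (patent text, IPC definitions, external vocabularies); the raw counts here only show the basic idea.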
18

GoPubMed: Ontology-based literature search for the life sciences

Doms, Andreas 06 January 2009 (has links)
Background: Most of our biomedical knowledge is only accessible through texts. The biomedical literature grows exponentially and PubMed comprises over 18,000,000 literature abstracts. Recently much effort has been put into the creation of biomedical ontologies which capture biomedical facts. The exploitation of ontologies to explore the scientific literature is a new area of research. Motivation: When people search, they have questions in mind. Answering questions in a domain requires knowledge of the terminology of that domain. Classical search engines do not provide background knowledge for the presentation of search results. Ontology-annotated structured databases allow for data mining. The hypothesis is that ontology-annotated literature databases allow for text mining. The central problem is to associate scientific publications with ontological concepts. This is a prerequisite for ontology-based literature search. The question then is how to answer biomedical questions using ontologies and a literature corpus. Finally, the task is to automate bibliometric analyses on a corpus of scientific publications. Approach: Recent joint efforts on automatically extracting information from free text showed that the applied methods are complementary. The idea is to employ the rich terminological and relational information stored in biomedical ontologies to mark up biomedical text documents. Based on established semantic links between documents and ontology concepts, the goal is to answer biomedical questions on a corpus of documents. The entirely annotated literature corpus makes it possible, for the first time, to automatically generate bibliometric analyses for ontological concepts, authors and institutions. Results: This work includes a novel framework for annotating free texts with ontological concepts. The framework generates recognition pattern rules from the terminological and relational information in an ontology. 
Maximum entropy models can be trained to distinguish the meaning of ambiguous concept labels. The framework was used to develop an annotation pipeline for PubMed abstracts with 27,863 Gene Ontology concepts. The evaluation of the recognition performance yielded a precision of 79.9% and a recall of 72.7%, improving on the previously used algorithm by 25.7% f-measure. The evaluation was done on a manually created (by the original authors) curation corpus of 689 PubMed abstracts with 18,356 curations of concepts. Methods to reason over large amounts of documents with ontologies were developed. The ability to answer questions with the online system was shown on a set of biomedical questions from the TREC Genomics Track 2006 benchmark. This work includes the first ontology-based, large-scale, up-to-date bibliometric analysis available online for topics in molecular biology represented by GO concepts. The automatic bibliometric analysis is in line with existing, but often outdated, manual analyses. Outlook: A number of promising continuations starting from this work have been spun off. A freely available online search engine has a growing user community. A spin-off company that commercializes the new ontology-based search paradigm was funded by the High-Tech Gründerfonds. Several offshoots of GoPubMed, including GoWeb (general web search), Go3R (search in replacement, reduction, refinement methods for animal experiments) and GoGene (search in gene/protein databases), are being developed.
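For reference, the reported precision and recall can be combined into a single score. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall). The abstract does not state which F-measure variant was used, so the function `f_measure` and the resulting value are illustrative only.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as reported in the abstract (79.9% and 72.7%).
f1 = f_measure(0.799, 0.727)
print(round(f1, 3))  # 0.761
```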
19

Extragenic Accumulation of RNA Polymerase II Enhances Transcription by RNA Polymerase III

Neugebauer, Karla M., Grishina, Inna, Bledau, Anita S., Listerman, Imke 25 November 2015 (has links) (PDF)
Recent genomic data indicate that RNA polymerase II (Pol II) function extends beyond conventional transcription of primarily protein-coding genes. Among the five snRNAs required for pre-mRNA splicing, only the U6 snRNA is synthesized by RNA polymerase III (Pol III). Here we address the question of how Pol II coordinates the expression of spliceosome components, including U6. We used chromatin immunoprecipitation (ChIP) and high-resolution mapping by PCR to localize both Pol II and Pol III to snRNA gene regions. We report the surprising finding that Pol II is highly concentrated ∼300 bp upstream of all five active human U6 genes in vivo. The U6 snRNA, an essential component of the spliceosome, is synthesized by Pol III, whereas all other spliceosomal snRNAs are Pol II transcripts. Accordingly, U6 transcripts were terminated in a Pol III-specific manner, and Pol III localized to the transcribed gene regions. However, synthesis of both U6 and U2 snRNAs was α-amanitin-sensitive, indicating a requirement for Pol II activity in the expression of both snRNAs. Moreover, both Pol II and histone tail acetylation marks were lost from U6 promoters upon α-amanitin treatment. The results indicate that Pol II is concentrated at specific genomic regions from which it can regulate Pol III activity by a general mechanism. Consequently, Pol II coordinates expression of all RNA and protein components of the spliceosome.
20

Concept Based Knowledge Discovery from Biomedical Literature.

Radovanovic, Aleksandar. January 2009 (has links)
This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained by using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers’ own knowledge, experimentation and observations for optimal progression of scientific research.
