Global ETD Search

21	Estudo sobre o impacto da adição de vocabulários estruturados da área de ciências da saúde no Currículo Lattes / Study on the impact of adding structured health sciences vocabularies to the Currículo Lattes Araújo, Charles Henrique de January 2016 Searching for information in the databases of institutions that hold large volumes of data requires increasingly efficient processes. Problems of spelling, language, synonymy, abbreviation, and lack of standardization of terms, both in the search arguments and in the indexing of documents, directly affect the results. This study therefore aimed to evaluate the impact of adding structured vocabularies from the Health Sciences to the Currículo Lattes on the retrieval of similar profiles of researchers in the Biological Sciences and Health Sciences, using data mining techniques, query expansion, vector models of queries, and a trigram phrase matching algorithm. Keywords of published articles registered in the Currículo Lattes were cross-checked against the terms of the Medical Subject Headings (MeSH) and the Health Sciences Descriptors (DeCS), and the results of queries using the original keywords were compared with those of queries that added the terms produced by query expansion. The results show that the methodology adopted can qualitatively increase the set of retrieved profiles and may thus contribute to improving the information systems of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). Controlled vocabulary ; Recommendation systems ; Information retrieval ; Health sciences ; Query expansion ; Data mining
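The combination of trigram matching and vocabulary-based query expansion described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the padding convention, the Jaccard similarity, the `threshold` value, and the toy `VOCABULARY` dictionary are all assumptions made for illustration, and the terms are not taken from MeSH or DeCS.

```python
def trigrams(text):
    """Return the set of character trigrams of a padded, lowercased string."""
    # Assumed convention: two spaces of padding in front, one behind.
    padded = f"  {text.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap between the trigram sets of two strings (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def expand_keyword(keyword, vocabulary, threshold=0.5):
    """Match a free-text keyword to controlled terms and add their synonyms.

    `vocabulary` is a hypothetical dict: preferred term -> list of synonyms.
    """
    expanded = [keyword]
    for preferred, synonyms in vocabulary.items():
        candidates = [preferred, *synonyms]
        # A fuzzy match on any candidate pulls in the whole synonym group.
        if any(trigram_similarity(keyword, c) >= threshold for c in candidates):
            for term in candidates:
                if term not in expanded:
                    expanded.append(term)
    return expanded


# Toy vocabulary, invented for illustration only.
VOCABULARY = {
    "Neoplasms": ["Tumors", "Cancer"],
    "Hypertension": ["High Blood Pressure"],
}

if __name__ == "__main__":
    # A misspelled keyword still reaches its controlled term and synonyms.
    print(expand_keyword("hypertention", VOCABULARY))
```

Trigram overlap tolerates spelling variation in the keyword, while the synonym groups address the terminology mismatch; a lower `threshold` matches more loosely at the cost of more spurious expansions.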
23	Cell assemblies para expansão de consultas / Cell assemblies for query expansion Volpe, Isabel Cristina January 2011 One of the main tasks in Information Retrieval is to match a user query to the documents that are relevant to it. This matching is challenging because in many cases the keywords the user chooses differ from the words used by the authors of the relevant documents. Over the years, many approaches have been proposed to deal with this problem. One of the most popular is query expansion, which adds related terms to the query with the goal of retrieving more relevant documents. In this work, we propose a new method in which a Cell Assembly model is applied to query expansion. Cell Assemblies are groups of connected neurons whose firing patterns form reverberating circuits, so that activity persists long after the initial stimulus has ceased. They learn through Hebbian learning rules, which modify the synapses between neurons, and have been used to simulate the formation and use of human concepts. We adapted the Cell Assembly model to learn relationships between the terms in a document collection. These relationships are then used to augment the original queries with related terms. Our experiments use standard Information Retrieval test collections and show that some queries significantly improved their results with the proposed technique. Information retrieval ; Neural networks ; Query expansion ; Hebbian learning
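The idea of learning term relationships with a Hebbian rule and using them to expand a query can be sketched as follows. This is a deliberately simplified stand-in, not the Cell Assembly model of the thesis: there are no spiking neurons or persistent activity, only a weight that grows whenever two terms are active in the same document. The function names, `learning_rate`, `top_k`, and the toy `DOCS` are assumptions.

```python
from collections import defaultdict
from itertools import permutations


def hebbian_weights(documents, learning_rate=0.1):
    """Strengthen the link between every pair of terms that co-occur in a document."""
    weights = defaultdict(float)
    for doc in documents:
        terms = set(doc.lower().split())
        for a, b in permutations(sorted(terms), 2):
            # Hebbian update: both units are active (1 * 1), so the weight grows.
            weights[(a, b)] += learning_rate
    return weights


def expand_query(query, weights, top_k=2):
    """Append the top_k terms most strongly associated with the query terms."""
    query_terms = query.lower().split()
    scores = defaultdict(float)
    for (a, b), w in weights.items():
        if a in query_terms and b not in query_terms:
            scores[b] += w
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return query_terms + ranked[:top_k]


# Toy collection, invented for illustration only.
DOCS = [
    "neural network learning",
    "hebbian learning synapse",
    "neural synapse learning",
]

if __name__ == "__main__":
    print(expand_query("neural", hebbian_weights(DOCS)))
```

Terms that co-occur often with the query terms accumulate the largest weights and are added first, which mirrors the "related terms" augmentation described above in a much reduced form.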
24	Modelo fuzzy para recuperação de informação utilizando multiplas ontologias relacionadas / Fuzzy information retrieval model using multiple related ontologies Leite, Maria Angelica de Andrade 13 August 2018 Advisor: Ivan Luiz Marques Ricarte / Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Previous issue date: 2009 / Abstract: With the growing popularity of the World Wide Web, more people have access to information, and the volume of this information keeps expanding over time. The information retrieval area faces a new challenge: searching for information resources by their meaning. One way to retrieve information by its meaning is to use a knowledge base that encodes the concepts of a domain and their relationships. Ontologies are now used to model knowledge bases. To deal with the imprecision and uncertainty present in knowledge and in the information retrieval process, techniques from fuzzy set theory are employed. Previous work encodes the knowledge base using just one ontology. However, a document collection can cover themes from different domains, expressed by distinct ontologies, that may be related. In this work, a way of organizing and representing knowledge using multiple related ontologies was investigated, and a new method of query expansion was developed. The knowledge organization and the query expansion method were integrated into the fuzzy model for information retrieval based on multiple related ontologies. The model's performance was compared with another fuzzy-based approach to information retrieval and with the Apache Lucene search engine. In both cases the proposed model improved the precision and recall measures. / Doctorate / Computer Engineering / Doctor of Electrical Engineering Information retrieval ; Knowledge representation ; Ontology ; Fuzzy systems ; Query expansion ; Fuzzy information retrieval
25	Query Expansion Research and Application in Search Engine Based on Concepts Lattice Cui, Jun January 2009 Formal concept analysis is increasingly applied to query expansion and data mining problems. In this paper I analyze and compare current concept lattice construction algorithms and choose the iPred and Border algorithms to adapt for query expansion. After adapting these two concept lattice construction algorithms, I apply the resulting four algorithms to a query expansion prototype system. The calculation times of the four algorithms are recorded and analyzed. The adapted algorithms perform well. Moreover, I find that the efficiency of concept lattice construction is not consistent with the result of the complexity analysis. Instead, it depends heavily on the structure of the data set that serves as the data source of the concept lattice. Formal concept analysis ; Query expansion ; Concept lattice ; Computer Sciences
26	Rozšiřování dotazů pro vyhledávání medicínských informací / Query expansion for medical information retrieval Bibyna, Feraena January 2015 One of the challenges in medical information retrieval is the terminology gap between the documents (commonly written by medical professionals using medical jargon) and the queries (commonly composed by non-professionals using lay terms). In this thesis, we investigate the effect of query expansion using a domain-specific knowledge resource to deal with this challenge. We use the Unified Medical Language System (UMLS), a repository of biomedical vocabularies, and utilize two of its resources: the Metathesaurus and the Semantic Network. We use the query sets and document set provided by the CLEF eHealth organizers. The query sets, provided for the medical information retrieval shared task, represent two different use cases of medical information retrieval. We experiment with query expansion using synonymous terms and non-synonymous concepts, blind relevance feedback, field weighting, and linear interpolation of different systems.
27	Chinese-English cross-lingual information retrieval in biomedicine using ontology-based query expansion Wang, Xinkai January 2011 In this thesis, we propose a new approach to Chinese-English biomedical cross-lingual information retrieval (CLIR) using query expansion based on the eCMeSH Tree, a Chinese-English ontology extended from the Chinese Medical Subject Headings (CMeSH) Tree. The CMeSH Tree is not designed for information retrieval (IR), since it only includes heading terms and has no term weighting scheme for these terms. Therefore, we design an algorithm, which employs a rule-based parsing technique combined with the C-value term extraction algorithm and a filtering technique based on mutual information, to extract Chinese synonyms for the corresponding heading terms. We also develop a term-weighting mechanism. Following the hierarchical structure of CMeSH, we extend the CMeSH Tree to the eCMeSH Tree with synonymous terms and their weights. We propose an algorithm to implement CLIR using the eCMeSH Tree terms to expand queries. To evaluate the retrieval improvements obtained from our approach, the results of query expansion based on the eCMeSH Tree are compared individually with the results of query expansion using the CMeSH Tree terms, query expansion using pseudo-relevance feedback, and document translation. We also evaluate the combinations of these three approaches. This study also investigates the factors that affect CLIR performance, including the stemming algorithm, retrieval models, and word segmentation. 026.61
28	Automatisk synonymgenerering med Word2Vec for query expansion inom e-handel / Automatic synonym generation with Word2Vec for query expansion in e-commerce Kojic, Kemal, Petersson, Emil January 2018 In this thesis, we examine how well automatic synonym generation with the machine learning method Word2Vec, trained on a Google News data set of a hundred billion words, is suited to query expansion in e-commerce. We use product and event data from a well-known fashion company: synonyms are generated by different methods from search queries logged in the event data, and the resulting thesauri are used in future searches through query expansion. To answer the research questions, we first perform a quantitative analysis of data such as matched purchases, product hits, no-hits, and search time. This data is generated by a search simulator that simulates logged events from user sessions in an e-commerce system. The generated thesauri are then filtered by removing synonyms connected to search queries that produced worse results in the simulation with synonyms than without. To validate the quantitative results, we also perform a qualitative analysis of the differences in the search results produced by the different methods, examining which products the synonyms bring in and how relevant they are. Our tests show that a lower threshold leads to more product hits and fewer no-hits: the number of product hits increased by 4% to 10%, and the number of no-hits fell by 11% to 22%. When a search query is assigned good synonyms, relevance improves because more relevant products appear in the search results. When it is assigned poorer synonyms, relevance suffers because some irrelevant products appear that the user probably does not want to see. In all tests using automatically generated synonyms, the majority of purchased products appear in the first half of the search results; however, the number of purchased products in the first position of the search results decreased in every case. Synonym generation ; Query expansion ; E-commerce ; Word2Vec ; Text mining ; Engineering and Technology
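The role of the similarity threshold in this abstract can be illustrated with a small sketch: a term's nearest neighbours in an embedding space are accepted as synonyms only if their cosine similarity reaches the threshold, so lowering it admits more expansion terms. The three-dimensional `EMBEDDINGS` are invented toy vectors, not Word2Vec output, and the function names and default threshold are assumptions rather than the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def synonyms(term, embeddings, threshold):
    """Return the words whose similarity to `term` is at least `threshold`."""
    if term not in embeddings:
        return []
    target = embeddings[term]
    scored = [
        (cosine(target, vec), word)
        for word, vec in embeddings.items()
        if word != term
    ]
    # Most similar first.
    return [word for score, word in sorted(scored, reverse=True) if score >= threshold]


def expand_query(query, embeddings, threshold=0.8):
    """Follow each query term with its accepted synonyms."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(synonyms(term, embeddings, threshold))
    return expanded


# Toy vectors, invented for illustration only.
EMBEDDINGS = {
    "jacket": [0.9, 0.1, 0.0],
    "coat": [0.8, 0.2, 0.1],
    "blazer": [0.6, 0.5, 0.1],
    "sandal": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for threshold in (0.9, 0.8):
        print(threshold, expand_query("red jacket", EMBEDDINGS, threshold))
```

With these toy vectors the stricter threshold accepts only the closest neighbour, while the looser one also admits a weaker match. That is the trade-off the abstract reports: more product hits and fewer no-hits, with a risk of less relevant results.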
29	Utilising semantic technologies for intelligent indexing and retrieval of digital images Osman, T., Thakker, Dhaval, Schaefer, G. 15 October 2013 The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches because they rely in principle on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this paper we present a semantically enabled image annotation and retrieval engine designed to satisfy the requirements of the commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, allowing more intelligent reasoning about the image content and, in turn, a more accurate set of results and a richer set of alternatives matching the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as the exploitation of lexical databases for explicit semantic-based query expansion. Semantic matching ; Query expansion ; Image retrieval ; Ontology engineering ; Keyword search ; Knowledge management ; Web
30	Augmenting Dynamic Query Expansion in Microblog Texts Khandpur, Rupinder P. 17 August 2018 Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth of online media, such tools are essential for efficient search result refinement to track emerging themes in noisy, unstructured text streams. They are crucial for large-scale predictive analytics and decision-making systems, which use open source indicators to find meaningful information rapidly and accurately. The problems of information overload and semantic mismatch are systemic in the Information Retrieval (IR) tasks undertaken by such systems. In this dissertation, we develop dynamic query expansion algorithms that can help improve the efficacy of such systems using only a small set of seed queries and no training or labeled samples. We primarily investigate four significant problems related to the retrieval and assessment of event-related information: (1) How can we adapt the query expansion process to support rank-based analysis when tracking a fixed set of entities? A scalable framework is essential to allow relative assessment of emerging themes such as airport threats. (2) What visual knowledge discovery framework can incorporate users' feedback into the search result refinement process? This is a crucial step in efficiently integrating real-time `situational awareness' when monitoring specific themes using open source indicators. (3) How can we contextualize query expansions? We focus on capturing semantic relatedness between a query and reference text so that the expansion can quickly adapt to different target domains. (4) How can we synchronously perform knowledge discovery and characterization (unstructured to structured) during the retrieval process? We mainly aim to model high-order, relational aspects of event-related information from microblog texts. / Ph. D. 
/ Analysis of real-time social media can provide critical insights into ongoing societal events, whose consequences and implications include monetary losses, threats to critical infrastructure and national security, disruptions to daily life, and potential loss of life and physical property. Developing adequate data-driven information systems requires good ‘ground truth’, i.e., an authoritative record of events reported in the media, cataloged alongside important dimensions. The availability of high-quality ground truth events can support various analytic efforts, e.g., identifying precursors of attacks, developing predictive indicators using surrogate data sources, and tracking the progression of events over space and time. Dynamic search result refinement is useful for expanding a general set of user queries into a more relevant collection. The challenges of information overload and misalignment of context between the user query and retrieved results can overwhelm both human and machine. In this dissertation, we focus our efforts on these specific challenges. With the ever-increasing volume of user-generated data, large-scale analysis is a tedious task. Our first focus is to develop a scalable model that dynamically tracks and ranks evolving topics as they appear in social media. Then, to simplify the cognitive tasks involved in making sense of evolving themes, we take a visual approach to retrieving situationally critical and emergent information effectively. This visual analytics approach learns from the user’s interactions during the exploratory process and then generates a better representation of the data, thus improving situational understanding and the usability of the underlying data models. Such features are crucial for big-data-based decision support systems. 
To make the event-focused retrieval process more robust, we developed a context-rich procedure that adds new relevant key terms to the user’s original query by utilizing the linguistic structures in text. This context-awareness allows the algorithm to retrieve relevant characteristics that help users gain adequate information from social media about real-world events. Online social commentary about events is very informal and can be incomplete. To get the complete picture and adequately describe these events, we develop an approach that models the underlying relatedness of information and iteratively extracts meaning and denotations from event-related texts. We learn how to express the high-order relationships between events and entities and group them to identify the attributes that best explain the events the user is trying to uncover. In all the augmentations we develop, our strategy is to allow only minimal human supervision, using just a small set of seed event triggers and no training or labeled samples. We present a comprehensive evaluation of these augmentations on real-world domains: threats on airports, cyber attacks, and protests. We also demonstrate their applicability to real-time analysis that provides vital event characteristics and contextually consistent information, which can be a beneficial aid for emergency responders. Dynamic Query Expansion ; Microblog ; Event Retrieval ; Social Media Analytics ; Visual Knowledge Discovery

Search results