1 |
Ranking de relevância baseado em informações geográficas e sociais [Relevance ranking based on geographic and social information]. ROCHA, Júlio Henrique. 14 May 2018
Previous issue date: 2016 / Capes / Geographic Information Retrieval (GIR) is a research field that develops and enables the construction of search engines for Internet content that has some geographic context. Geographic search engines, the artifacts produced in the GIR field, can be specialized for many different contexts (e.g., sports, concerts), so that each type of document is handled appropriately. Currently, both the scientific community and industry are concentrating their efforts on building geographic search engines that find news published on the Internet. However, search engines focused on news (geographic or otherwise) should take the credibility of the information into account when ranking it. Unfortunately, in most cases this does not happen. Measuring the credibility of news is a complex and expensive task because it requires knowledge of the reported facts. As a result, search engines end up leaving it to the user to decide whether to trust what is being read. In this context, this dissertation proposes a relevance ranking method that focuses on news and uses information collected from social networks to estimate a degree of credibility and rank the news accordingly. The credibility value of a news item is calculated from the affinity that the users who shared it on their social network have with the locations mentioned in it. Finally, the proposed relevance ranking is integrated into a news search and reading tool called GeoSEn News, which supports queries through various spatial operations and allows the results to be viewed from different perspectives. This tool was used to evaluate the proposed method through experiments with data collected from the social network Twitter and from news media across Brazil. The evaluation showed promising results and confirmed the feasibility of building a relevance ranking based on information collected from social networks.
|
2 |
A Smart Patent Monitoring Assistant: Using Natural Language Processing / Ett smart verktyg för patentövervakning baserat på natural language processing. Fsha Nguse, Selemawit. January 2022
Patent monitoring is about tracking upcoming inventions in a particular field, predicting future trends, and following specific intellectual property rights of interest. It is the process of finding relevant patents on a particular topic based on a specific query. With patent monitoring, inventors and organizations can keep themselves updated on new technology in the market and find potential licensing opportunities for their inventions. The outputs of patent monitoring are essential for companies, academics, and inventors who want to use the latest patents to drive further innovation. Nevertheless, there is no widely accepted best approach to patent monitoring. Most patent monitoring systems are based on complex search queries, which often lead to low hit rates and require a high degree of human intervention. As the number of patents published each year increases massively, and with patents being critical to accelerating innovation, the current approach to patent monitoring has two main drawbacks. First, human-driven patent monitoring is a time-consuming and expensive process. Second, there is a risk of overlooking interesting documents because of inadequate search tools and processes, which could cost companies fortunes while hindering further innovation and creativity. This thesis presents a smart patent monitoring assistant tool that applies natural language processing. Several natural language processing methods are investigated for finding, classifying, and ranking relevant documents. The tool was trained on a dataset that contains the title, abstract, and claims of patent documents. Given a dataset of patent documents, the aim of this thesis is to create a tool that can classify patents into two classes, relevant and not relevant, and rank documents by relevance. In the evaluation, the tool gave satisfactory results in retrieving the expected patents. In addition, there was a significant improvement in memory usage and in the time needed to train the model and obtain results.
|