Global ETD Search

1	[en] AUTOMATIC GENERATION OF BENCHMARKS FOR EVALUATING KEYWORD AND NATURAL LANGUAGE INTERFACES TO RDF DATASETS / [pt] GERAÇÃO AUTOMÁTICA DE BENCHMARKS PARA AVALIAR INTERFACES BASEADAS EM PALAVRAS-CHAVE E LINGUAGEM NATURAL PARA DATASETS RDF ANGELO BATISTA NEVES JUNIOR 04 November 2022 (has links) [pt] Os sistemas de busca textual fornecem aos usuários uma alternativa amigável para acessar datasets RDF (Resource Description Framework). A avaliação de desempenho de tais sistemas requer benchmarks adequados, consistindo de datasets RDF, consultas e respectivas respostas esperadas. No entanto, os benchmarks disponíveis geralmente possuem poucas consultas e respostas incompletas, principalmente porque são construídos manualmente com a ajuda de especialistas. A contribuição central desta tese é um método para construir benchmarks automaticamente, com um maior número de consultas e com respostas mais completas. O método proposto aplica-se tanto a consultas baseadas em palavras-chave quanto em linguagem natural e possui duas partes: geração de consultas e geração de respostas. A geração de consultas seleciona um conjunto de entidades relevantes, chamadas de indutores, e, para cada uma, heurísticas orientam o processo de extração de consultas relacionadas. A geração de respostas recebe as consultas produzidas no passo anterior e computa geradores de solução (SG), subgrafos do dataset original contendo diferentes respostas às consultas. Heurísticas também orientam a construção dos SGs evitando o desperdiço de recursos computacionais na geração de respostas irrelevantes. / [en] Text search systems provide users with a friendly alternative to access Resource Description Framework (RDF) datasets. The performance evaluation of such systems requires adequate benchmarks, consisting of RDF datasets, text queries, and respective expected answers. However, available benchmarks often have small sets of queries and incomplete sets of answers, mainly because they are manually constructed with the help of experts. The central contribution of this thesis is a method for building benchmarks automatically, with larger sets of queries and more complete answers. The proposed method works for both keyword and natural language queries and has two steps: query generation and answer generation. The query generation step selects a set of relevant entities, called inducers, and, for each one, heuristics guide the process of extracting related queries. The answer generation step takes the queries and computes solution generators (SG), subgraphs of the original dataset containing different answers to the queries. Heuristics also guide the construction of SGs, avoiding the waste of computational resources in generating irrelevant answers. [pt] BENCHMARK [pt] INTERFACE EM LINGUAGEM NATURAL [pt] DATASETS [en] BENCHMARK [en] NATURAL LANGUAGE INTERFACE [en] DATASETS
2	[en] NAMED ENTITY RECOGNITION FOR PORTUGUESE / [pt] RECONHECIMENTO DE ENTIDADES MENCIONADAS PARA O PORTUGUÊS DANIEL SPECHT SILVA MENEZES 13 December 2018 (has links) [pt] A produção e acesso a quantidades imensas dados é um elemento pervasivo da era da informação. O volume de informação disponível é sem precedentes na história da humanidade e está sobre constante processo de expansão. Uma oportunidade que emerge neste ambiente é o desenvolvimento de aplicações que sejam capazes de estruturar conhecimento contido nesses dados. Neste contexto se encaixa a área de Processamento de Linguagem Natural (PLN) - Natural Language Processing (NLP) - , ser capaz de extrair informações estruturadas de maneira eficiente de fontes textuais. Um passo fundamental para esse fim é a tarefa de Reconhecimento de Entidades Mencionadas (ou nomeadas) - Named Entity Recognition (NER) - que consistem em delimitar e categorizar menções a entidades num texto. A construção de sistemas para NLP deve ser acompanhada de datasets que expressem o entendimento humano sobre as estruturas gramaticais de interesse, para que seja possível realizar a comparação dos resultados com o real discernimento humano. Esses datasets são recursos escassos, que requerem esforço humano para sua produção. Atualmente, a tarefa de NER vem sendo abordada com sucesso por meio de redes neurais artificiais, que requerem conjuntos de dados anotados tanto para avaliação quanto para treino. A proposta deste trabalho é desenvolver um dataset de grandes dimensões para a tarefa de NER em português de maneira automatizada, minimizando a necessidade de intervenção humana. Utilizamos recursos públicos como fonte de dados, nominalmente o DBpedia e Wikipédia. Desenvolvemos uma metodologia para a construção do corpus e realizamos experimentos sobre o mesmo utilizando arquiteturas de redes neurais de melhores performances reportadas atualmente. Exploramos diversas modelos de redes neurais, explorando diversos valores de hiperparâmetros e propondo arquiteturas com o foco específico de incorporar fontes de dados diferentes para treino. / [en] The production and access of huge amounts of data is a pervasive element of the Information Age. The volume of availiable data is without precedents in human history and it s in constant expansion. An oportunity that emerges in this context is the development and usage of applicationos that are capable structuring the knowledge of data. In this context fits the Natural Language Processing, being able to extract information efficiently from textual data. A fundamental step for this goal is the task of Named Entity Recognition (NER) which delimits and categorizes the mentions to entities. The development o systems for NLP tasks must be accompanied by datasets produced by humans in order to compare the system with the human discerniment for the NLP task at hand. These datasets are a scarse resource which the construction is costly in terms of human supervision. Recentlly, the NER task has been approached using artificial network models which needs datsets for both training and evaluation. In this work we propose the construction of a datasets for portuguese NER with an automatic approach using public data sources structured according to the principles of SemanticWeb, namely, DBpedia and Wikipédia. A metodology for the construction of this dataset was developed and experiments were performed using both the built dataset and the neural network architectures with the best reported results. Many setups for the experiments were evaluated, we obtained preliminary results for diverse hiperparameters values, also proposing architectures with the specific focus of incorporating diverse data sources for training. [pt] REDES NEURAIS [en] NEURAL NETWORKS [pt] APRENDIZADO DE MAQUINA [en] MACHINE LEARNING [pt] WIKIPEDIA [en] WIKIPEDIA [pt] PROCESSAMENTO DE LINGUAGEM NATURAL [en] NATURAL LANGUAGE PROCESSING [en] NAMED ENTITY RECOGNITION [pt] DATASETS [en] DATASETS

Search results

[en] AUTOMATIC GENERATION OF BENCHMARKS FOR EVALUATING KEYWORD AND NATURAL LANGUAGE INTERFACES TO RDF DATASETS / [pt] GERAÇÃO AUTOMÁTICA DE BENCHMARKS PARA AVALIAR INTERFACES BASEADAS EM PALAVRAS-CHAVE E LINGUAGEM NATURAL PARA DATASETS RDF

[en] NAMED ENTITY RECOGNITION FOR PORTUGUESE / [pt] RECONHECIMENTO DE ENTIDADES MENCIONADAS PARA O PORTUGUÊS