41. DrillBeyond: Processing Multi-Result Open World SQL Queries

Eberius, Julian; Thiele, Maik; Braunschweig, Katrin; Lehner, Wolfgang. 11 July 2022
In a traditional relational database management system, queries can only be defined over attributes defined in the schema, but are guaranteed to return a single, definitive answer structured exactly as specified in the query. In contrast, an information retrieval system allows the user to pose queries without knowledge of a schema, but the result will be a top-k list of possible answers, with no guarantees about the structure or content of the retrieved documents. In this paper, we present DrillBeyond, a novel IR/RDBMS hybrid system in which the user seamlessly queries a relational database together with a large corpus of tables extracted from a web crawl. The system allows full SQL queries over the relational database, but additionally lets the user reference arbitrary additional attributes that need not be defined in the schema. The system then processes this semi-specified query by computing a top-k list of possible query evaluations, each based on a different candidate web data source, thus mixing properties of RDBMS and IR systems. We design a novel plan operator that encapsulates a web data retrieval and matching system and allows direct integration of such systems into relational query processing. We then present methods for efficiently processing multiple variants of a query by producing plans that are optimized for large invariant intermediate results that can be reused across query evaluations. We demonstrate the viability of the operator and our optimization strategies by implementing them in PostgreSQL and evaluating them on a standard benchmark whose queries we extend with arbitrary additional attributes.
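Below is a minimal runnable sketch of the query semantics the abstract describes: an attribute (`population`) absent from the local schema is resolved against several candidate web tables, and each candidate yields one evaluation in a top-k result list. All names here (`nations`, `web_candidate`, `drill_beyond`, the match scores) are hypothetical illustrations, not the paper's actual operator or API.

```python
import sqlite3

# Hypothetical semi-specified query: `population` is not in the local
# schema and must come from a candidate web table.
QUERY_TEMPLATE = """
    SELECT n.name, n.gdp / w.population AS gdp_per_capita
    FROM nations n JOIN web_candidate w ON n.name = w.name
    ORDER BY gdp_per_capita DESC
"""

def drill_beyond(conn, candidate_tables, k=3):
    """Evaluate the query once per candidate web source; return the top-k
    evaluations ranked by a (hypothetical) source-matching score."""
    results = []
    for score, rows in candidate_tables:  # each candidate: (match score, data)
        conn.execute("DROP TABLE IF EXISTS web_candidate")
        conn.execute("CREATE TABLE web_candidate (name TEXT, population REAL)")
        conn.executemany("INSERT INTO web_candidate VALUES (?, ?)", rows)
        results.append((score, conn.execute(QUERY_TEMPLATE).fetchall()))
    return sorted(results, key=lambda r: r[0], reverse=True)[:k]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nations (name TEXT, gdp REAL)")
conn.executemany("INSERT INTO nations VALUES (?, ?)", [("A", 500.0), ("B", 300.0)])
# Two mock web sources with differing match/quality scores.
candidates = [(0.9, [("A", 10.0), ("B", 5.0)]),
              (0.6, [("A", 12.0), ("B", 6.0)])]
for score, rows in drill_beyond(conn, candidates, k=2):
    print(score, rows)
```

Note that this naive version re-executes the whole plan once per candidate source; the paper's optimization is precisely to avoid that by reusing large invariant intermediate results across the multiple query evaluations.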
42. A Generic Approach to Component-Level Evaluation in Information Retrieval

Kürsten, Jens. 19 November 2012
Research in information retrieval deals with the theories and models that constitute the foundations for any kind of service that provides access or pointers to particular elements of a collection of documents in response to a submitted information need. The specific field of information retrieval evaluation is concerned with the critical assessment of the quality of search systems. Empirical evaluation based on the Cranfield paradigm, using a specific collection of test queries in combination with relevance assessments in a laboratory environment, is the classic approach to comparing the impact of retrieval systems and their underlying models on retrieval effectiveness. In the past two decades, international campaigns like the Text REtrieval Conference (TREC) have led to huge advances in the design of experimental information retrieval evaluations. In general, however, the focus of this system-driven paradigm has remained on the comparison of end-to-end system results; that is, retrieval systems are treated as black boxes, an approach that has been criticised, with recent work proposing the study of system configurations and their individual components instead.

This thesis proposes a generic approach to the evaluation of retrieval systems at the component level. Its focus is on the key components needed to address typical ad-hoc search tasks, such as finding books on a particular topic in a large set of library records. A central contribution is the further development of the Xtrieval framework through the integration of widely used IR toolkits, in order to eliminate the limitations of any individual tool. Strong empirical results at international campaigns covering various types of evaluation tasks confirm both the validity of this approach and the flexibility of the Xtrieval framework.

Modern information retrieval systems contain various components that solve particular subtasks of the retrieval process. This thesis presents a detailed analysis of the important system components needed to address ad-hoc retrieval tasks; the design and implementation of the Xtrieval framework offers a variety of approaches for flexible system configuration. Xtrieval is designed as an open system and allows the integration of further components and tools, as well as addressing search tasks other than ad-hoc retrieval, which makes automated component-level evaluation of retrieval approaches possible. The scale and impact of these possibilities are demonstrated with an empirical experiment covering more than 13,000 individual system configurations, tested on four test collections for ad-hoc search. The results of this experiment are manifold. For instance, particular implementations of ranking models fail systematically on all tested collections, and an exploratory analysis of the ranking models empirically confirms the relationships between different implementations of models that share theoretical foundations. The results also suggest that the impact of most IR system components on retrieval effectiveness depends on the test collection used for evaluation. Due to the scale of the designed component-level evaluation experiment, not all possible interactions of the system components under examination could be analysed in this work.
For this reason, the resulting data set will be made publicly available to the entire research community.
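The scale of such a component-level grid is easy to underestimate: a few choices per component multiply into thousands of runs. The sketch below illustrates the combinatorics only; all component inventories are hypothetical stand-ins, not the thesis's actual Xtrieval configuration space.

```python
from itertools import product

# Hypothetical component inventories; the thesis's actual grid differs.
stopword_lists  = ["none", "smart", "snowball"]
stemmers        = ["none", "porter", "krovetz", "snowball-de"]
ranking_models  = ["bm25", "tf-idf", "dfr-bb2", "lm-dirichlet", "lm-jm"]
query_expansion = ["off", "rocchio", "kl-divergence"]
collections     = ["c1", "c2", "c3", "c4"]

configs = list(product(stopword_lists, stemmers, ranking_models, query_expansion))
runs = [(cfg, col) for cfg in configs for col in collections]
print(len(configs), "configurations,", len(runs), "runs")
# 3 * 4 * 5 * 3 = 180 configurations -> 720 runs here; larger per-component
# inventories are how an experiment reaches >13,000 configurations.

def evaluate(cfg, collection):
    """Placeholder: index, retrieve, and score (e.g. MAP) one configuration."""
    raise NotImplementedError

# results = {(cfg, col): evaluate(cfg, col) for cfg, col in runs}
```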
43. Design of a Comprehensive Data Base on Desert Plants

Kelly, Kathleen, 1942-. January 1982
No description available.
44. Network monitoring with focus on HTTP

Schmid, Andreas. 01 May 1998
Since its introduction in the early 1990s, the rapid growth of World Wide Web (WWW) traffic has raised the question of whether past Local Area Network (LAN) packet traces still reflect the current situation or whether they have become obsolete. For this thesis, several LAN packet traces were obtained by monitoring the LAN of a typical academic environment. The monitoring tools were a stand-alone HP LAN Protocol Analyzer and the freeware tool tcpdump. The main focus was on acquiring a low-level overview of the LAN traffic, making it possible to determine which protocols were mainly used and how packet sizes were distributed. In particular, this study aimed at establishing the amount of WWW traffic on the LAN and determining the MIME types of this traffic. The results indicate that in a typical academic environment, conventional sources of LAN traffic such as NFS are still predominant, whereas WWW traffic plays a rather marginal role. Furthermore, a large portion of the network packets contains little or no data at all, while another significant portion has sizes around the Maximum Transfer Unit (MTU). Consequently, research in the networking field has to direct its focus to issues beyond the WWW. / Graduation date: 1998
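As a rough modern analogue of the thesis's low-level traffic breakdown, the sketch below reads a capture (for example one produced with `tcpdump -i eth0 -s 0 -w trace.pcap`) and tallies the protocol mix and packet-size distribution. It assumes the third-party `scapy` package and a capture file path, neither of which comes from the original text, and uses port 80 as a crude proxy for WWW traffic.

```python
from collections import Counter
from scapy.all import rdpcap, TCP, UDP

packets = rdpcap("trace.pcap")  # hypothetical capture file

proto = Counter()
sizes = Counter()
for pkt in packets:
    if pkt.haslayer(TCP):
        # Port 80 as a crude stand-in for WWW/HTTP traffic.
        proto["http" if 80 in (pkt[TCP].sport, pkt[TCP].dport) else "tcp-other"] += 1
    elif pkt.haslayer(UDP):
        proto["udp"] += 1
    else:
        proto["other"] += 1
    # Bucket frame sizes: empty/small packets vs. near-MTU packets.
    size = len(pkt)
    sizes["<=64" if size <= 64 else (">=1400" if size >= 1400 else "mid")] += 1

total = len(packets) or 1
for name, count in proto.most_common():
    print(f"{name:10s} {100 * count / total:5.1f}%")
print("size buckets:", dict(sizes))
```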
45. SCAPULA system: a computerized retrieval system for archaeological data from the Upper Wabash Drainage

Sun, Pao-Kong. January 1984
The heart of this dissertation is the SCAPULA Information Retrieval System, used to create, maintain, and retrieve coded archaeological data for the Upper Wabash Drainage at the Archaeology Laboratory of Ball State University. Several existing archaeological data banks were first surveyed and classified; different file organizations and computer software and hardware were then reviewed, using the needs of archaeologists at Ball State as the major criterion, in order to determine the characteristics of the SCAPULA System. The encoding instructions and retrieval keywords are illustrated and listed, and the functions of SCAPULA are introduced. With its straightforward query instructions and examples, the SCAPULA Information Retrieval System, a relational data bank, is very easy to use.

The present study sought to examine the impact of victim-observer similarity, victim physical attractiveness, outcome severity, and sex of respondent on responsibility attributions made toward a rape victim. Perceived attitudinal similarity, victim physical attractiveness, and outcome severity were experimentally varied. In addition, this study sought to further examine sex differences, which prior research has indicated may influence how a rape victim is perceived. A modified version of Alexander's (1980) scale was used to measure the degree of responsibility attributed to the victim, to the assailant, to society, and to chance in each condition. A research design was developed using two levels of each of the four factors. The experiment was conducted during regular class periods with a population of 198 male and female undergraduate students. Prior to the actual experiment, subjects were randomly assigned to review an attitude questionnaire (supposedly completed by the victim) that was either similar or dissimilar to one they had completed previously themselves; the attitude survey used was the Important Issues Questionnaire (Novak & Lerner, 1968). The study was conducted such that subjects perceived the victim to be either like or unlike themselves in basic attitudes. Subjects were then asked to view a videotape in which a sexual assault victim (actually an actress reading a prepared script) was interviewed. Outcome severity was varied through written vignettes and the victim's narration of either an attempted rape or a rape with physical injuries; physical attractiveness was varied by the use of cosmetics and dress. Subjects were tested in groups, each group seeing only one of the four videotapes, and were debriefed following the experiment.

The study was designed to answer the following research questions:
1. Would subjects make significantly different responsibility attributions toward a victim they perceived as similar to themselves than toward a victim they perceived as dissimilar?
2. Would subjects make significantly different responsibility attributions toward a victim who suffered a non-severe outcome than toward a victim who suffered a severe outcome?
3. Would male subjects make significantly different attributions of responsibility toward a physically attractive victim than toward a physically unattractive victim?
4. Would the respondent's sex significantly affect the degree of responsibility attributed to the victim?

A 2 x 2 x 2 x 2 multivariate analysis of variance was used to test the four research hypotheses, with significance considered at an alpha level of .05.

Findings: The results indicated that no significant difference existed for similarity, outcome severity, sex of respondent, or physical attractiveness. There was, however, a tendency for subjects to attribute more responsibility to the victim who had suffered a severe outcome, and also for the assailant in that condition to be assigned a harsher penalty.

Conclusion: Prior research in the area of rape victim culpability has offered conflicting results. The present study sought to clarify the findings of previous research. Further research is needed in this area to gain a clearer understanding of the factors that influence how victims of sexual assault are perceived.
46. Kontextbasiertes Information-Retrieval: Modell, Konzeption und Realisierung kontextbasierter Information-Retrieval-Systeme [Context-based information retrieval: model, design, and realization of context-based information retrieval systems]

Morgenroth, Karlheinz. January 2006
Universität Bamberg, Diss., 2006.
47. Recuperação da Informação: estudo da usabilidade na base de dados Public Medical (PUBMED) [Information retrieval: a usability study of the Public Medical (PubMed) database]

Coelho, Odete Máyra Mesquita. January 2014
Coelho, Odete Máyra Mesquita; Pinto, Virgínia Bentes. Recuperação da Informação: estudo da usabilidade na base de dados Public Medical (PUBMED). Master's dissertation, Universidade Federal da Paraíba, Mestrado em Ciência da Informação, Paraíba, 2014. 172 f.

This work investigates resident physicians' understanding of the information retrieval process in the Public Medical (PubMed) database, taking into consideration usability aspects of human-computer interaction, the resources available, and the level of user satisfaction during the search process. The theoretical framework relates the concepts of information and information systems for healthcare, then addresses information retrieval systems and databases, entering the field of information architecture to evaluate the usability of these information sources. The methodology is exploratory; its first phase consisted of a heuristic evaluation of the PubMed interface using the guidelines proposed by Nielsen and Tahir (2002). The results of this analysis show that although these guidelines were designed for homepages, thirty-eight of them suited the PubMed interface, from which it is inferred that they can be used for the heuristic evaluation of databases in the health domain. Regarding usability, the interface was found to have a well-structured architecture, to be friendly and objective, and to present numerous possibilities for search and retrieval of information. The second, empirical phase applied prospective usability tests to measure user satisfaction with the database, via a semi-structured questionnaire administered to resident physicians in the Internal Medicine specialty of the Walter Cantídio University Hospital of the Federal University of Ceará, for a total of 36% of participants. The results of this phase show good performance and good user satisfaction with PubMed's usability, given that it enables users to achieve their research goals effectively and efficiently, even though they do not know all the search and retrieval resources this database offers.
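A heuristic evaluation of this kind reduces to checking an interface against a fixed guideline list and tallying which items apply and pass. The sketch below illustrates that bookkeeping only; the guideline entries are invented placeholders, not the actual Nielsen-Tahir checklist or the dissertation's findings.

```python
# Each guideline: (id, description, applicable_to_interface, passes).
# Entries are made-up placeholders, not the real checklist.
guidelines = [
    (1, "search box is prominently visible",         True,  True),
    (2, "homepage states the site's purpose",        False, None),  # homepage-specific
    (3, "results can be refined without restarting", True,  True),
    (4, "error messages suggest corrections",        True,  False),
]

applicable = [g for g in guidelines if g[2]]
passed = [g for g in applicable if g[3]]
print(f"{len(applicable)} of {len(guidelines)} guidelines apply; "
      f"{len(passed)} of those are satisfied")
```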
48. Usabilidade de software: um estudo do Catálogo Auslib [Software usability: a study of the Auslib Catalog]

França, Fabiana da Silva. 27 April 2011
The automation of information retrieval systems in libraries tries to reduce the cognitive workload of specific tasks that were previously performed manually with cards in a catalog. With library automation and the help of the Internet, catalogs are now available online. This study therefore examines the online catalog Auslib, with the objective of analyzing library automation software. It is grounded in a hybrid evaluation approach whose purpose is to identify the subjective satisfaction of users of the Sistemoteca of the Federal University of Campina Grande; the target was the public search module, since it was already deployed and in operation. Interaction trials were carried out with students and staff. Usability testing took place both in the field and in the laboratory, with 128 users, following the three evaluation perspectives of the hybrid approach: product compliance inspection, performance measurement, and a survey of user satisfaction. The compliance inspection used parts 14 (menu dialogues), 16 (direct-manipulation dialogues), and 17 (form-based dialogues) of ISO 9241, chosen because of the modes of interaction present in the evaluated product's human-computer interface. Part 11 of ISO 9241:1998 was used to select assessment strategies for measuring performance and surveying user satisfaction. The data collection instruments were questionnaires for profiling users and surveying their subjective satisfaction, an interview, a test script, video recording, and compliance inspection checklists; think-aloud verbalization was used so that users would comment on their actions during the test sessions. After quantitative and qualitative analysis of the data, the user tests made it possible to verify how users interacted with the Auslib Catalog interface. For both categories of users, in both test environments, the result was a minimal satisfaction index with the product, explained by the list of faults detected in the software's public search module. The research is believed to have contributed significantly through its proposed usability guidelines as a means of improving the product interface.
49. Extração de relações semânticas via análise de correlação de termos em documentos / Extracting semantic relations via analysis of correlated terms in documents

Botero, Sergio William. 12 December 2008
Advisor: Ivan Luiz Marques Ricarte / Master's dissertation - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação

Information retrieval systems are tools that automate the search for information. The first implementations were very simple, based exclusively on word syntax, and have evolved into systems that use semantic knowledge, such as those using ontologies. However, manual specification of ontologies is an expensive task and subject to human error, and methodologies that construct ontologies automatically have not reached good results, identifying false semantic relations between words. This work presents a natural language processing technique and a new clustering algorithm for the semi-automatic extraction of relations, which uses the content of the documents, a common-sense ontology, and user supervision to correctly identify semantic relations. The proposal comprises one stage that uses linguistic resources to extract terms and another that uses clustering algorithms to identify concepts and instance-of relations between terms and concepts. The proposed algorithm is based on possibilistic clustering and bi-clustering techniques and allows the interactive extraction of concepts and relations. The results are promising, similar to the most recent methodologies, with the advantage of allowing supervision of the extraction process. / Master's in Electrical Engineering (Computer Engineering)
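As a rough illustration of the term-correlation idea, terms that co-occur across documents can be grouped into candidate concepts. The sketch below substitutes plain k-means from scikit-learn for the thesis's possibilistic/bi-clustering algorithm, and the toy data is invented; only the overall pipeline shape (occurrence matrix, clustering, user confirmation of instance-of links) mirrors the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy term-document occurrence matrix (rows: terms, cols: documents).
terms = ["dog", "cat", "pet", "car", "engine", "wheel"]
X = np.array([
    [1, 1, 0, 0],  # dog
    [1, 1, 1, 0],  # cat
    [1, 0, 1, 0],  # pet
    [0, 0, 0, 1],  # car
    [0, 0, 1, 1],  # engine
    [0, 0, 0, 1],  # wheel
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
concepts = {}
for term, label in zip(terms, labels):
    concepts.setdefault(label, []).append(term)

# Candidate concepts; in the thesis the user confirms or rejects each
# term-concept (instance-of) assignment interactively.
for label, members in concepts.items():
    print(f"concept {label}: {members}")
```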
50. Bag of graphs: definition, implementation, and validation in classification tasks / Sacola de grafos: definição, implementação e validação em tarefas de classificação

Silva, Fernanda Brandão, 1988-. 25 August 2018
Advisor: Ricardo da Silva Torres / Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação

Nowadays, there is strong interest in solutions that allow the implementation of effective and efficient retrieval and classification services for large volumes of data. In this context, several studies have investigated new techniques based on the comparison of local structures within objects for implementing classification and retrieval services. Local structures may be characterized by different types of relationships (e.g., spatial distribution) among object primitives, and are commonly exploited in pattern recognition problems. In this dissertation, we propose the Bag of Graphs (BoG), a new approach based on the Bag-of-Words model that uses graphs to encode the local structures of a digital object. We present a formal definition of the proposed model, introducing concepts and rules that make it flexible and adaptable to different applications. In the proposed approach, a digital object is represented by a graph that models its local structures; using a pre-defined dictionary, the object is then described by a vector representation containing the frequency of occurrence of local patterns in the corresponding graph.
In this work, we present two BoG-based methods, the Bag of Singleton Graphs (BoSG) and the Bag of Visual Graphs (BoVG), which create vector representations for graphs and images, respectively. Both methods are validated in classification tasks. We evaluate BoSG for graph classification on four datasets of the IAM repository, obtaining significant results in terms of both accuracy and execution time. BoVG, which encodes the spatial distribution of visual words, is evaluated for image classification on the Caltech-101 and Caltech-256 datasets, achieving promising results with high accuracy scores. / Master's in Computer Science
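The core bookkeeping of such a model can be sketched in a few lines: each node's local structure is mapped to a codeword from a fixed dictionary, and the object becomes a histogram over codewords. Everything below (the adjacency-dict graph encoding, the particular pattern definition, the toy codebook) is an illustrative assumption, not the dissertation's actual formulation.

```python
from collections import Counter

# A toy labeled graph as an adjacency dict: node -> (label, neighbors).
graph = {
    "a": ("x", ["b", "c"]),
    "b": ("y", ["a"]),
    "c": ("y", ["a", "d"]),
    "d": ("x", ["c"]),
}

def local_pattern(node):
    """One simple choice of local structure: the node's label plus the
    sorted multiset of its neighbors' labels."""
    label, neighbors = graph[node]
    return (label, tuple(sorted(graph[n][0] for n in neighbors)))

# Pre-defined dictionary (codebook) of local patterns; in practice this
# would be built from a training set, e.g. by clustering patterns.
codebook = {("x", ("y", "y")): 0, ("y", ("x",)): 1, ("x", ("y",)): 2}

hist = Counter(codebook.get(local_pattern(n), len(codebook))  # last bin: unknown
               for n in graph)
vector = [hist[i] for i in range(len(codebook) + 1)]
print(vector)  # frequency of each codeword -> the object's BoG vector
```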
