Global ETD Search

11	Módulo de consultas distribuídas do Infinispan / Module that supports distributed queries in Infinispan Lacerra, Israel Danilo 26 November 2012 (has links) Com a grande quantidade de informações existentes nas aplicações computacionais hoje em dia, cada vez mais tornam-se necessários mecanismos que facilitem e aumentem o desempenho da recuperação dessas informações. Nesse contexto vem surgindo os bancos de dados chamados de NOSQL, que são bancos de dados tipicamente não relacionais que, em prol da disponibilidade e do desempenho em ambientes com enormes quantidades de dados, abrem mão de requisitos antes vistos como fundamentais. Neste trabalho iremos lidar com esse cenário ao implementar o módulo de consultas distribuídas do JBoss Infinispan, um sistema de cache distribuído que funciona também como um banco de dados NOSQL em memória. Além de apresentar a implementação desse módulo, iremos falar do surgimento do movimento NOSQL, de como se caracterizam esses bancos e de onde o Infinispan se insere nesse movimento. / With the big amount of data available to computer applications nowadays, there is an increasing need for mechanisms that facilitate the retrieval of such data and improve data access performance. In this context we see the emergence of so-called NOSQL databases, which are databases that are typically non-relational and that give up fulfilling some requirements previously seen as fundamental in order to achieve better availability and performance in big data environments. In this work we deal with the scenario above and implement a module that supports distributed queries in JBoss Infinispan, a distributed cache system that works also as an in-memory NOSQL database. Besides presenting the implementation of that module, we discuss the emergence of the NOSQL movement, the characterization of NOSQL databases, and where Infinispan fits in this context. cache cache cache distribuído consultas distribuídas data grid data grid distributed cache distributed queries Infinispan Infinispan Lucene. NOSQL NOSQL sistemas de grade de dados
12	Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus Lyall-Wilson, Jennifer Rae January 2013 (has links) The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of all the concepts within the document collection as well as the relationships these concepts have to one another. Because the representation is generated using data from the association thesaurus, a mapping will exist between the representation of the concepts and the terms used to describe these concepts. The research applies to search engines designed for use in an individual website with content focused on a specific conceptual domain. Therefore, both the document collection and the subject content must be well-bounded, which affords the ability to make use of techniques not currently feasible for general purpose search engine used on the entire web. automatic query expansion conceptual network information retrieval Lucene search engine Natural Language Processing (NLP) association thesaurus
13	Jak kvalita lemmatizace ovlivňuje výsledky vyhledávání dokumentů v českém jazyce / Effect of the Czech Stemming Algorithm on the Document Retrieval Pytelka, Petr January 2012 (has links) This thesis deals with the measurement of the quality of the stemming/lemmatization algo-rithm for the Czech language in document processing systems and provides an analysis of the results. The theoretical part of the thesis describes the principles of the full-text search, the possibilities of implementation as well as the common problems which have to be solved in connection with the processing of natural language. Methods of evaluating the quality of lemmatization, using recall and precision, are discussed. In addition, the theoret-ical part covers the method of measuring the index of under-stemming and over-stemming, which can be applied for the purposes of a more detailed evaluation. An experiment for evaluating the lemmatization algorithms is proposed in the second part of the thesis. A specialized application has been developed to perform the experiment in three different systems, namely Apache Lucene, the PostgreSQL database systems and the Microsoft SQL Server. The experiment is based on the Prague Dependency Treebank cor-pus. It has been carried out both for the corpus as a whole and for selected word classes separately. Further analysis of the results for Czech stemmer in Apache Lucene leads to a proposal for several modifications of the algorithm. Such modifications result in measurable improvements. The results achieved show how metrics discussed, together with the values measured, can be used for improving the lemmatization algorithms and thus to improve the full-text search for Czech language.
14	Módulo de consultas distribuídas do Infinispan / Module that supports distributed queries in Infinispan Israel Danilo Lacerra 26 November 2012 (has links) Com a grande quantidade de informações existentes nas aplicações computacionais hoje em dia, cada vez mais tornam-se necessários mecanismos que facilitem e aumentem o desempenho da recuperação dessas informações. Nesse contexto vem surgindo os bancos de dados chamados de NOSQL, que são bancos de dados tipicamente não relacionais que, em prol da disponibilidade e do desempenho em ambientes com enormes quantidades de dados, abrem mão de requisitos antes vistos como fundamentais. Neste trabalho iremos lidar com esse cenário ao implementar o módulo de consultas distribuídas do JBoss Infinispan, um sistema de cache distribuído que funciona também como um banco de dados NOSQL em memória. Além de apresentar a implementação desse módulo, iremos falar do surgimento do movimento NOSQL, de como se caracterizam esses bancos e de onde o Infinispan se insere nesse movimento. / With the big amount of data available to computer applications nowadays, there is an increasing need for mechanisms that facilitate the retrieval of such data and improve data access performance. In this context we see the emergence of so-called NOSQL databases, which are databases that are typically non-relational and that give up fulfilling some requirements previously seen as fundamental in order to achieve better availability and performance in big data environments. In this work we deal with the scenario above and implement a module that supports distributed queries in JBoss Infinispan, a distributed cache system that works also as an in-memory NOSQL database. Besides presenting the implementation of that module, we discuss the emergence of the NOSQL movement, the characterization of NOSQL databases, and where Infinispan fits in this context. cache cache distribuído consultas distribuídas data grid Infinispan Lucene. NOSQL sistemas de grade de dados cache data grid distributed cache distributed queries Infinispan NOSQL
15	Text-Based Information Retrieval Using Relevance Feedback Krishnan, Sharenya January 2011 (has links) Europeana, a freely accessible digital library with an idea to make Europe's cultural and scientific heritage available to the public was founded by the European Commission in 2008. The goal was to deliver a semantically enriched digital content with multilingual access to it. Even though they managed to increase the content of data they slowly faced the problem of retrieving information in an unstructured form. So to complement the Europeana portal services, ASSETS (Advanced Search Service and Enhanced Technological Solutions) was introduced with services that sought to improve the usability and accessibility of Europeana. My contribution is to study different text-based information retrieval models, their relevance feedback techniques and to implement one simple model. The thesis explains a detailed overview of the information retrieval process along with the implementation of the chosen strategy for relevance feedback that generates automatic query expansion. Finally, the thesis concludes with the analysis made using relevance feedback, discussion on the model implemented and then an assessment on future use of this model both as a continuation of my work and using this model in ASSETS. Information Retrieval Relevance Feedback Query Expansion Rocchio classification Probabilistic model Lucene Similarity scoring function Kullback-Leibler Divergence (KLD) Engineering and Technology Teknik och teknologier
16	Identifiering av anomalier i COSMIC genom analys av loggar / Identification of anomalies in COSMIC through log analysis Al-egli, Muntaher, Zeidan Nasser, Adham January 2015 (has links) Loggar är en viktig del av alla system, det ger en inblick i vad som sker. Att analysera loggar och extrahera väsentlig information är en av de största trenderna nu inom IT-branchen. Informationen i loggar är värdefulla resurser som kan användas för att upptäcka anomalier och hantera dessa innan det drabbar användaren. I detta examensarbete dyker vi in i grunderna för informationssökning och analysera undantagsutskrifter i loggar från COSMIC för att undersöka om det är möjligt att upptäcka anomalier med hjälp av retrospektivdata. Detta examensarbete ger även en inblick i möjligheten att visualisera data från loggar och erbjuda en kraftfull sökmotor. Därför kommer vi att fördjupa oss i de tre välkända program som adresserar frågorna i centraliserad loggning: Elasticsearch, Logstash och Kibana. Sammanfattningsvis visar resultatet att det är möjligt att upptäckta anomalier genom att tillämpa statistiska metoder både på retrospektiv- och realtidsdata. / Logs are an important part of any system; it provides an insight into what is happening. One of the biggest trends in the IT industry is analyzing logs and extracting essential information. The information in the logs are valuable resources that can be used to detect anomalies and manage them before it affects the user In this thesis we will dive into the basics of the information retrieval and analyze exceptions in the logs from COSMIC to investigate whether it is feasible to detect anomalies using retrospective data. This thesis also gives an insight into whether it’s possible to visualize data from logs and offer a powerful search engine. Therefore we will dive into the three well known applications that addresses the issues in centralized logging: Elasticsearch, Logstash and Kibana. In summary, our results shows that it’s possible to detected anomalies by applying statistical methods on both in retrospective and real time data. Information retrieval log management real-time monitoring classification normalization aggregation correlation vector space model Boolean retrieval Lucene Elasticsearch Logstash Kibana database COSMIC Cambio LIPS Informationssökning logghantering realtidsövervakning klassificering normalisering aggregering korrelation vektorrumsmodell boolesk hämtningsmodell Lucene Elasticserach Logstash Kibana databaser COSMIC Cambio LIPS

Page generated in 0.0352 seconds