1 |
Scalable Discovery and Analytics on Web Linked Data
Abdelaziz, Ibrahim 07 1900
The Resource Description Framework (RDF) provides a simple way to express facts across the web, leading to Web linked data. Several distributed and federated RDF systems have emerged to handle the massive amounts of RDF data now available. Distributed systems are optimized to query massive datasets that appear as a single graph, while federated systems are designed to query hundreds of decentralized and interlinked graphs.
This thesis starts with a comprehensive experimental study of state-of-the-art RDF systems. It identifies a set of research problems for improving the state of the art, including: supporting the emerging RDF analytics required by many modern applications, querying linked data at scale, and enabling discovery on linked data. Addressing these problems is the focus of this thesis.
First, we propose Spartex, a versatile framework for complex RDF analytics. Spartex extends SPARQL to seamlessly combine generic graph algorithms with SPARQL queries. Spartex implements a generic SPARQL operator as a vertex-centric program that interprets SPARQL queries and executes them efficiently using a built-in optimizer. We demonstrate that Spartex scales to datasets with billions of edges and is at least as fast as state-of-the-art specialized RDF engines. For analytical tasks, Spartex is an order of magnitude faster than existing alternatives.
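To illustrate what a vertex-centric program for a SPARQL pattern can look like, the sketch below evaluates a two-hop basic graph pattern (`?x p1 ?y . ?y p2 ?z`) by message passing in two supersteps. It is a minimal single-machine toy, not Spartex's operator or optimizer; the function name `match_two_hop`, the triple layout, and the sample predicates are assumptions made for illustration.

```python
from collections import defaultdict


def match_two_hop(edges, p1, p2):
    """Evaluate the pattern  ?x p1 ?y . ?y p2 ?z  in a vertex-centric style.

    `edges` is an iterable of (subject, predicate, object) triples.
    Hypothetical helper for illustration only.
    """
    # Each vertex only knows its own outgoing edges.
    out_edges = defaultdict(list)
    for s, p, o in edges:
        out_edges[s].append((p, o))

    # Superstep 1: every vertex sends its id along each outgoing p1 edge.
    inbox = defaultdict(list)
    for vertex, adjacency in out_edges.items():
        for predicate, target in adjacency:
            if predicate == p1:
                inbox[target].append(vertex)

    # Superstep 2: every vertex that received messages extends them
    # along its outgoing p2 edges and emits complete bindings.
    results = []
    for vertex, senders in inbox.items():
        for predicate, target in out_edges.get(vertex, []):
            if predicate == p2:
                for sender in senders:
                    results.append((sender, vertex, target))
    return sorted(results)


# Invented sample data.
triples = [
    ("alice", "advisor", "bob"),
    ("carol", "advisor", "bob"),
    ("bob", "worksAt", "lab1"),
    ("dave", "worksAt", "lab1"),
]
bindings = match_two_hop(triples, "advisor", "worksAt")
```

Here `bindings` holds one `(x, y, z)` tuple per match. In a distributed engine the two loops would run in parallel across partitions, with only the messages crossing the network.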
To address the scalability limitation of federated RDF engines, we propose Lusail, a scalable system for querying geo-distributed RDF graphs. Lusail follows a two-tier strategy: (i) locality-aware decomposition of the query into subqueries to maximize the computations at the endpoints and minimize intermediary results, and (ii) selectivity-aware execution to reduce network latency and increase parallelism. Our experiments on billions of triples show that Lusail outperforms existing systems by orders of magnitude in scalability and response time.
Finally, enabling discovery on linked data is challenging due to the prior knowledge required to formulate SPARQL queries. To address these challenges, we develop novel techniques to (i) predict semantically equivalent SPARQL queries from a set of keywords by leveraging word embeddings, and (ii) generate fine-grained and non-blocking query plans to get fast and early results.
|
2 |
Porovnání nástrojů pro Data Discovery / Data Discovery Tools Comparison
Kopecký, Martin January 2012
This diploma thesis focuses on Data Discovery tools, which have been growing in importance in the Business Intelligence (BI) field during the last few years. An increasing number of companies of all sizes include them in their BI environments. The main goal of this thesis is to compare QlikView, Tableau and PowerPivot using a defined set of criteria. The comparison is based on the development of a human resources report, which was modeled on a real-life business case from the banking sector. The main goal is supported by a number of minor goals, namely: analysis of existing comparisons, definition of a new set of criteria, basic description of the compared platforms, and documentation of the case study. The text can be divided into two major parts. The theoretical part describes basic BI architecture, discusses in-memory databases and data visualisation in the context of a BI solution, and analyses existing comparisons of Data Discovery tools and BI platforms in general. Eight different comparisons are analysed in total, including reports of consulting companies and diploma theses. The applied part of the thesis builds upon the previous analysis and defines comparison criteria divided into five groups: Data import, transformation and storage; Data analysis and presentation; Operations criteria; User friendliness and support; Business criteria. The subsequent chapter describes the selected platforms, their brief history, component architecture, available editions and licensing. The case study chapter documents the development of the report in each of the platforms and pinpoints their pros and cons. The final chapter applies the defined set of criteria to compare the selected Data Discovery platforms and so fulfil the main goal of this thesis. The results are presented both numerically, utilising the weighted sum model, and verbally.
The contribution of the thesis lies in the transparent confrontation of three Data Discovery tools, in the definition of a new set of comparison criteria, and in the documentation of the practical testing. The thesis offers an indirect answer to the question: "Which analytical tool should we use to supplement our existing BI solution?"
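The weighted sum model mentioned in the abstract scores each alternative as the sum of its criterion ratings multiplied by the criterion weights. The sketch below shows the arithmetic only; the weights, ratings, and tool labels (`tool_a`, `tool_b`) are invented placeholders, not the thesis's criteria values or results.

```python
def weighted_sum(ratings, weights):
    """Return the weighted-sum score of one alternative.

    `ratings` and `weights` map criterion name -> number.
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same criteria")
    return sum(weights[c] * ratings[c] for c in weights)


# Hypothetical weights for the five criteria groups (they sum to 1).
weights = {
    "data_import": 0.25,
    "analysis_presentation": 0.25,
    "operations": 0.125,
    "user_friendliness": 0.25,
    "business": 0.125,
}

# Hypothetical ratings on a 1-5 scale for two placeholder tools.
ratings = {
    "tool_a": {"data_import": 4, "analysis_presentation": 5, "operations": 2,
               "user_friendliness": 4, "business": 2},
    "tool_b": {"data_import": 3, "analysis_presentation": 4, "operations": 4,
               "user_friendliness": 3, "business": 4},
}

scores = {tool: weighted_sum(r, weights) for tool, r in ratings.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the ranking depends directly on the chosen weights, a comparison of this kind is usually reported together with the weights so that readers can re-run it with their own priorities.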
|
3 |
Self Service Business Intelligence Design: Guidelines for Designing a Customizable Qlik Sense Application
Hahr, Andreas; Åberg, Ludvig January 2016
With the increasing amount of valuable data that companies have access to, the need for tools that visualize this data has reached a wider group of users, many of whom are not tech-savvy. Self-service Business Intelligence applications aim to meet this need, and many guidelines regarding the general design of Business Intelligence have been produced in recent years. In this thesis some of these guidelines are interpreted and applied during the development of a Qlik Sense application for the Device Connection Platform department at Ericsson. The purpose of this thesis is to produce more specific guidelines that complement existing general guidelines on Self-service Business Intelligence design; guidelines that should be taken into account when developing Qlik Sense applications. As a result, five guidelines are presented that concern conditional dimensions, screen resolutions, naming conventions for master items, the data layer, and Qlik Sense conventions for visualizations. Pros and cons of these guidelines are discussed along with alternative approaches. The conclusion states that the general guidelines interpreted in this project were helpful for the workflow and readability of the application, but that more specific guidelines, such as the ones presented in the result, may well be needed when it comes to customizability and flexibility for end users.
|
4 |
Integration of Heterogeneous Databases: Discovery of Meta-Information and Maintenance of Schema-Restructuring Views
Koeller, Andreas 15 April 2002
In today's networked world, information is widely distributed across many independent databases in heterogeneous formats. Integrating such information is a difficult task and has been addressed by several projects. However, previous integration solutions, such as the EVE-Project, have several shortcomings. Database contents and structure change frequently, and users often have incomplete information about the data content and structure of the databases they use. When information from several such insufficiently described sources is to be extracted and integrated, two problems have to be solved: How can we discover the structure, contents, and interrelationships of unknown databases, and how can we provide durable integration views over several such databases? In this dissertation, we develop solutions to these key problems in information integration. The first part of the dissertation addresses the fact that knowledge about the interrelationships between databases is essential for any attempt at solving the information integration problem. We present an algorithm called FIND2, based on the clique-finding problem in graphs and k-uniform hypergraphs, to discover redundancy relationships between two relations. Furthermore, the algorithm is enhanced by heuristics that significantly reduce the search space when necessary. Extensive experimental studies on the algorithm, both with and without heuristics, illustrate its effectiveness on a variety of real-world data sets. The second part of the dissertation addresses the durable view problem and presents the first algorithm for incremental view maintenance in schema-restructuring views. Such views are essential for the integration of heterogeneous databases. They are typically defined in schema-restructuring query languages like SchemaSQL, which can transform schema into data and vice versa, making traditional view maintenance based on differential queries impossible.
Based on an existing algebra for SchemaSQL, we present an update propagation algorithm that propagates updates along the query algebra tree and prove its correctness. We also propose optimizations on our algorithm and present experimental results showing its benefits over view recomputation.
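As a rough illustration of what "transforming schema into data and vice versa" means, the sketch below unpivots column names into row values and pivots them back. It uses plain Python dictionaries rather than SchemaSQL, and the table, column names, and helper functions are hypothetical; it does not reproduce the dissertation's update propagation algorithm.

```python
def unpivot(rows, key, attr_name, value_name):
    """Turn column names (schema) into values of `attr_name` (data)."""
    out = []
    for row in rows:
        for column, value in row.items():
            if column != key:
                out.append({key: row[key], attr_name: column, value_name: value})
    return out


def pivot(rows, key, attr_name, value_name):
    """Inverse direction: turn values of `attr_name` back into columns."""
    grouped = {}
    for row in rows:
        wide_row = grouped.setdefault(row[key], {key: row[key]})
        wide_row[row[attr_name]] = row[value_name]
    return list(grouped.values())


# Invented sample table: one price column per store.
prices = [
    {"product": "bolt", "store_a": 10, "store_b": 12},
    {"product": "nut", "store_a": 3, "store_b": 4},
]
long_form = unpivot(prices, "product", "store", "price")
round_trip = pivot(long_form, "product", "store", "price")
```

One way to see the difficulty the abstract mentions: adding a single row with a new `store` value to `long_form` adds a whole new column to the pivoted result, so a data update in the source can change the schema of the view.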
|
5 |
Aplicação da mineração de opinião no planejamento turístico do município de Gramado / Application of Opinion Mining in the Tourism Planning of the Municipality of Gramado
Endres, Marco Antonio Trois 28 April 2016
The purpose of this study is to explore the knowledge discovery process and analyze the opportunities generated by Opinion Mining as a technique for obtaining feedback on the tourist experience of the products and services offered by a tourist destination. Understanding tourists' buying behavior and travel habits is essential to expanding the tourist market and improving the visitor's experience. Web users have the opportunity to record and share their ideas and opinions through posts on social networks. These opinions are available to organizations in high volume. In this context, the question is: what does Opinion Mining contribute to generating useful information for the management of tourism activities, in support of decision-making in planning and in improving their actions? The setting of this study is the municipality of Gramado/RS and the comments posted on social networks by the tourists who visit it. To achieve the purpose of this study, opinions were extracted from Twitter and Facebook and submitted to a sentiment analysis technique. The results of applying Opinion Mining are presented and discussed, consolidated according to the competitiveness dimensions on which the municipality is assessed.
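A sentiment analysis step of the kind described can be sketched with a tiny lexicon-based scorer. The abstract does not say which technique was used, so the lexicon approach, the word lists, and the sample comments below are assumptions for illustration only.

```python
import re

# Hypothetical mini-lexicon; a real study would use a much larger resource.
POSITIVE = {"great", "beautiful", "excellent", "friendly", "love"}
NEGATIVE = {"expensive", "crowded", "bad", "dirty", "slow"}


def classify(comment):
    """Label a comment as positive, negative, or neutral by word counts."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample comments.
comments = [
    "Beautiful town and friendly people",
    "Restaurants were expensive and crowded",
    "We arrived on Tuesday",
]
labels = [classify(c) for c in comments]
```

Labels like these could then be grouped by topic (for example lodging, food, or attractions) to align them with whatever competitiveness dimensions a destination is assessed on.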
|
6 |
Big data - použití v bankovní sféře / Big data - application in banking
Uřídil, Martin January 2012
The growing volume of global data offers new possibilities to market participants who know how to take advantage of it. Data, information and knowledge are a new, highly regarded commodity, especially in the banking industry. Traditional data analytics is intended for processing data with known structure and meaning. But how can we get knowledge from data with no such structure? The thesis focuses on Big Data analytics and its use in the banking and financial industry. Its main goals are to define specific applications in this area and to describe the benefits for international and Czech banking institutions. The thesis is divided into four parts. The first part defines the Big Data trend; the second part specifies activities and tools in banking. The third part applies Big Data analytics to those activities and shows its possible benefits. The last part focuses on the particularities of Czech banking and describes the current state of Big Data in Czech banks. The thesis gives a comprehensive description of the possibilities of using Big Data analytics. I see my personal contribution in the detailed characterization of its application to real banking activities.
|