291

Robust Real-time Query Processing with QStream

Schmidt, Sven, Legler, Thomas, Schär, Sebastian, Lehner, Wolfgang 08 August 2023 (has links)
Processing data streams with Quality-of-Service (QoS) guarantees is an emerging requirement in streaming applications. Although it is possible to negotiate the result quality and to reserve the required processing resources in advance, it remains a challenge to adapt the data stream management system (DSMS) to data stream characteristics which are not known in advance or are difficult to obtain. In this paper we present the second generation of our QStream DSMS, which addresses this challenge by using a real-time capable operating system environment for resource reservation and by applying an adaptation mechanism if the data stream characteristics change spontaneously.
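As a rough illustration of this kind of adaptation (a sketch under assumed simplifications, not the actual QStream implementation; all names are hypothetical), the following Python fragment shows an operator that keeps a reserved processing budget and falls back to sampling when the observed arrival rate exceeds it:

```python
import random

class AdaptiveStreamOperator:
    """Toy stream operator with a reserved processing budget.

    When the observed arrival rate exceeds the reservation, the
    operator degrades gracefully by sampling tuples instead of
    violating its latency guarantee.
    """

    def __init__(self, reserved_tuples_per_sec: float):
        self.budget = reserved_tuples_per_sec
        self.observed_rate = 0.0  # maintained by a monitoring component

    def update_rate(self, tuples_seen: int, window_sec: float) -> None:
        # Exponential smoothing of the measured arrival rate.
        measured = tuples_seen / window_sec
        self.observed_rate = 0.8 * self.observed_rate + 0.2 * measured

    def process(self, tup):
        # Admit every tuple while within budget; under overload, sample
        # so the expected processed rate matches the reservation.
        if self.observed_rate <= self.budget or \
                random.random() < self.budget / self.observed_rate:
            return self.do_work(tup)
        return None  # tuple shed to keep the deadline

    def do_work(self, tup):
        return tup  # placeholder for the real operator logic

op = AdaptiveStreamOperator(reserved_tuples_per_sec=1000)
for _ in range(20):                      # sustained overload
    op.update_rate(tuples_seen=2500, window_sec=1.0)
kept = [t for t in range(100) if op.process(t) is not None]
print(len(kept))  # roughly 40 of 100 tuples admitted under overload
```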
292

OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data

Braunschweig, Katrin, Eberius, Julian, Thiele, Maik, Lehner, Wolfgang 27 January 2023 (has links)
Government initiatives for more transparency and participation have led to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people’s awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support. On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. To do so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects of both classic approaches. In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data.
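To make the keyword-query idea concrete, here is a minimal sketch (not the OPEN system itself; the catalog entries and scoring are invented for illustration) of ranking open datasets against a keyword query over their metadata:

```python
def score_dataset(keywords, dataset):
    """Count how many query keywords appear in a dataset's metadata
    (title, column names, sample values). Real systems would use
    inverted indexes and semantic matching, not substring tests."""
    haystack = " ".join(
        [dataset["title"]] + dataset["columns"] + dataset["sample_values"]
    ).lower()
    return sum(1 for kw in keywords if kw.lower() in haystack)

datasets = [  # hypothetical Open Data catalog entries
    {"title": "City budget 2022", "columns": ["department", "amount"],
     "sample_values": ["education", "police"]},
    {"title": "Air quality measurements", "columns": ["station", "pm10"],
     "sample_values": ["downtown"]},
]

query = ["budget", "education"]
ranked = sorted(datasets, key=lambda d: score_dataset(query, d), reverse=True)
print(ranked[0]["title"])  # -> City budget 2022
```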
293

Compilation Techniques, Algorithms, and Data Structures for Efficient and Expressive Data Processing Systems

Supun Madusha Bandara Abeysinghe Tennakoon Mudiyanselage (17454786) 30 November 2023 (has links)
The proliferation of digital data, driven by factors like social media and e-commerce, has created an increasing demand for highly processed data at higher levels of fidelity, which puts increasing demands on modern data processing systems. In the past, data processing systems faced bottlenecks due to limited main memory availability. However, as main memory becomes more abundant, their optimization focus has shifted from disk I/O to optimized computation through techniques like compilation. This dissertation addresses several critical limitations within such compilation-based data processing systems.

In modern data analytics pipelines, combining workloads from various paradigms, such as traditional DBMS and machine learning, is common. These pipelines are typically managed by specialized systems designed for specific workload types. While these specialized systems optimize their individual performance, substantial performance loss occurs when they are combined to handle mixed workloads. This loss is mainly due to overheads at system boundaries, including data copying and format conversions, as well as the general inability to perform cross-system optimizations.

This dissertation tackles this problem from two angles. First, it proposes an efficient post-hoc integration of individual systems using generative programming via the construction of common intermediate layers. This approach preserves the best-of-breed performance of individual workloads while achieving state-of-the-art performance for combined workloads. Second, we introduce a high-level query language capable of expressing various workload types, acting as a general substrate for implementing combined workloads. This allows the generation of optimized code for end-to-end workloads through the construction of an intermediate representation (IR).

The dissertation then shifts focus to data processing systems used for incremental view maintenance (IVM). While existing IVM systems achieve high performance through compilation and novel algorithms, they have limitations in handling specific query classes. Notably, they are incapable of handling queries involving correlated nested aggregate subqueries. To address this, our work proposes a novel indexing scheme based on a new data structure and a corresponding set of algorithms that fully incrementalize such queries. This approach results in substantial asymptotic speedups and order-of-magnitude performance improvements for workloads of practical importance.

Finally, the dissertation explores efficient and expressive fixed-point computations, with a focus on Datalog, a language widely used for declarative program analysis. Although existing Datalog engines rely on compilation and specialized code generation to achieve performance, they lack the flexibility to support extensions required for complex program analysis. Our work introduces a new Datalog engine built using generative programming techniques that offers both flexibility and state-of-the-art performance through specialized code generation.
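As a tiny illustration of the generative-programming style such systems build on (a sketch, not the dissertation's actual intermediate representation; names are invented), the following Python snippet specializes a query predicate into straight-line code at runtime instead of interpreting it per row:

```python
def compile_filter(column: str, op: str, constant):
    """Generate a specialized row filter instead of interpreting the
    (column, op, constant) predicate on every call -- the essence of
    query compilation."""
    src = f"def _filter(row):\n    return row[{column!r}] {op} {constant!r}\n"
    namespace = {}
    exec(src, namespace)          # compile the generated source once
    return namespace["_filter"]   # reuse the specialized function per row

rows = [{"price": 5}, {"price": 42}, {"price": 7}]
cheap = compile_filter("price", "<", 10)
print([r for r in rows if cheap(r)])  # -> [{'price': 5}, {'price': 7}]
```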
294

Using Vocabulary Mappings for Federated RDF Query Processing / Att använda vokabulär mappning för federerad RDF frågebehandling

Winneroth, Juliette January 2023 (has links)
Federated RDF querying systems provide an interface to multiple autonomous RDF data sources, allowing a user to execute a SPARQL query on multiple data sources at once and get one unified result. When these autonomous data sources use different vocabularies, the SPARQL query must be rewritten to the vocabulary of the data source in order to get the desired results. This thesis describes how vocabulary mappings can be used to rewrite SPARQL queries for federated RDF query processing. In this thesis, different types of vocabulary mappings are explored to find a suitable vocabulary mapping representation to use in formulating an approach for query rewriting. The approach describes how the SPARQL subqueries and solution mappings can be rewritten in order to handle heterogeneous vocabularies. The thesis then presents how the query federation engine HeFQUIN is extended to rewrite the federated queries and their results. A final evaluation of the implementation shows how implementing a query rewriting approach can improve the federated query engine’s execution times.
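The core rewriting step can be illustrated with a small sketch (not HeFQUIN's actual code; the mapping and the triple patterns are invented for illustration): each triple pattern whose predicate belongs to the global vocabulary is rewritten to the equivalent term of the target data source, and the inverse mapping is later applied to the solution mappings so results from different sources unify.

```python
# Hypothetical vocabulary mapping: global predicate -> source-local predicate.
VOCAB_MAP = {
    "foaf:name": "ex:fullName",
    "foaf:knows": "ex:contact",
}
INVERSE_MAP = {v: k for k, v in VOCAB_MAP.items()}

def rewrite_pattern(triple):
    """Rewrite one (subject, predicate, object) pattern into the
    vocabulary of the target data source."""
    s, p, o = triple
    return (s, VOCAB_MAP.get(p, p), o)

def rewrite_solution(binding):
    """Translate vocabulary terms in a solution mapping back into the
    global vocabulary before merging results across sources."""
    return {var: INVERSE_MAP.get(term, term) for var, term in binding.items()}

query = [("?person", "foaf:name", "?name"), ("?person", "foaf:knows", "?friend")]
local_query = [rewrite_pattern(t) for t in query]
print(local_query)  # predicates now use the source's ex: vocabulary
```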
295

Highspeed Graph Processing Exploiting Main-Memory Column Stores

Hauck, Matthias, Paradies, Marcus, Fröning, Holger, Lehner, Wolfgang, Rauhe, Hannes 03 February 2023 (has links)
A popular belief in the graph database community is that relational database management systems are generally ill-suited for efficient graph processing. This might apply for analytic graph queries performing iterative computations on the graph, but does not necessarily hold true for short-running, OLTP-style graph queries. In this paper we argue that, instead of extending a graph database management system with traditional relational operators—predicate evaluation, sorting, grouping, and aggregations among others—one should consider adding a graph abstraction and graph-specific operations, such as graph traversals and pattern matching, to relational database management systems. We use an exemplary query from the interactive query workload of the LDBC social network benchmark and run it against our enhanced in-memory, columnar relational database system to support our claims. Our performance measurements indicate that a columnar RDBMS—extended by graph-specific operators and data structures—can serve as a foundation for high-speed graph processing on big memory machines with non-uniform memory access and a large number of available cores.
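To see why columnar storage suits such traversals, consider this minimal sketch (a simplification, not the authors' system): the adjacency information lives in two flat arrays in the style of compressed sparse row (CSR), so a breadth-first traversal becomes a sequence of cache-friendly array scans.

```python
from collections import deque

# CSR-style adjacency columns: offsets[v] .. offsets[v+1] delimits
# the slice of `targets` holding vertex v's outgoing edges.
offsets = [0, 2, 3, 5, 5]       # 4 vertices
targets = [1, 2, 3, 0, 3]       # edge targets, densely packed

def bfs(start: int) -> list:
    """Breadth-first traversal over the columnar adjacency arrays."""
    seen = {start}
    order, queue = [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in targets[offsets[v]:offsets[v + 1]]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order

print(bfs(0))  # -> [0, 1, 2, 3]
```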
296

Hereditary Colorectal Cancer: Information-Based Approach

Manilich, Elena A. January 2010 (has links)
No description available.
297

Relaxation of Subgraph Queries Delivering Empty Results

Vasilyeva, Elena, Thiele, Maik, Mocan, Adrian, Lehner, Wolfgang 16 September 2022 (has links)
Graph databases with the property graph model are used in multiple domains including social networks, biology, and data integration. They provide schema-flexible storage for data with varying degrees of structure and support complex, expressive queries such as subgraph isomorphism queries. The flexibility and expressiveness of graph databases make it difficult for users to express queries correctly and can lead to unexpected query results, e.g. empty results. Therefore, we propose a relaxation approach for subgraph isomorphism queries that is able to automatically rewrite a graph query such that the rewritten query is similar to the original query and returns a non-empty result set. In detail, we present relaxation operations applicable to a query, cardinality estimation heuristics, and strategies for prioritizing the graph query elements to be relaxed. To determine the similarity between the original query and its relaxed variants, we propose a novel cardinality-based graph edit distance. The feasibility of our approach is shown by using real-world queries from the DBpedia query log.
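A minimal sketch of the relaxation idea (the query representation and costs are invented; the paper's operations and cardinality-based edit distance are richer): generate relaxed variants of a query by dropping a property predicate or deleting an edge, then try variants in order of increasing edit cost until one returns a non-empty result.

```python
def relaxations(query):
    """Yield relaxed variants of a subgraph query, each paired with an
    edit cost (toy stand-in for the cardinality-based edit distance)."""
    edges, preds = query["edges"], query["predicates"]
    for i in range(len(preds)):  # drop one property predicate: cheap
        yield ({"edges": edges, "predicates": preds[:i] + preds[i+1:]}, 1.0)
    for i in range(len(edges)):  # delete one edge: more drastic
        yield ({"edges": edges[:i] + edges[i+1:], "predicates": preds}, 2.0)

def relax_until_nonempty(query, evaluate):
    """Try relaxed variants in order of increasing edit cost until one
    produces a non-empty result set."""
    for variant, _cost in sorted(relaxations(query), key=lambda x: x[1]):
        result = evaluate(variant)
        if result:
            return variant, result
    return None, []

query = {"edges": [("?a", "directed", "?m")],
         "predicates": [("?m", "year", 1999), ("?m", "genre", "noir")]}
# `evaluate` stands in for a hypothetical hook into the graph database.
fake_evaluate = lambda q: ["m1"] if len(q["predicates"]) < 2 else []
print(relax_until_nonempty(query, fake_evaluate)[1])  # -> ['m1']
```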
298

Multi Domain Semantic Information Retrieval Based on Topic Model

Lee, Sanghoon 07 May 2016 (has links)
Over the last decades, there have been remarkable shifts in the area of Information Retrieval (IR) as a huge amount of information is increasingly accumulated on the Web. This gigantic information explosion increases the need for new tools that retrieve meaningful knowledge from various complex information sources. Thus, techniques for searching and extracting important information from numerous data sources have been a key challenge in current IR systems. Topic modeling is one of the most recent techniques for discovering hidden thematic structures in large data collections without human supervision. Several topic models have been proposed in various fields of study and have been utilized extensively for many applications. Latent Dirichlet Allocation (LDA) is the most well-known topic model; it generates topics from a large corpus of resources such as text, images, and audio. It has been widely used in information retrieval and data mining, providing an efficient way of identifying latent topics among document collections. However, LDA has a drawback: topic cohesion within a concept is attenuated when estimating infrequently occurring words. Moreover, LDA does not consider the meaning of words, but rather infers hidden topics based on a statistical approach, which can cause either a reduction in the quality of topic words or loose relations between topics. To address these problems, we propose a domain-specific topic model that combines domain concepts with LDA. Two domain-specific algorithms are suggested for solving the difficulties associated with LDA. The main strength of our proposed model comes from the fact that it narrows semantic concepts from broad domain knowledge down to a specific domain, which solves the unknown-domain problem. Our proposed model is extensively tested on various applications (query expansion, classification, and summarization) to demonstrate its effectiveness. Experimental results show that the proposed model significantly increases the performance of these applications.
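For readers unfamiliar with the baseline, the standard LDA step looks roughly like this in scikit-learn; the toy corpus is invented, and this shows plain LDA rather than the author's proposed domain-specific model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # toy corpus
    "database query optimization index",
    "query index storage engine",
    "neural network training gradient",
    "gradient descent neural model",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)                      # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")  # e.g. database/query vs. neural/gradient
```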
299

Ontology Learning from Query Logs of Search Engines / 從搜尋引擎查詢紀錄中學習Ontology

陳茂富 Unknown Date (has links)
Ontologies can be used to organize, manage, and share knowledge. Ontology Engineering is the process of constructing an ontology; much of this work must be done manually, which makes it time-consuming and error-prone, so machine support for Ontology Engineering has become an important topic. Using Knowledge Discovery methods to assist Ontology Engineering in constructing an ontology is called Ontology Learning. The Ontology Learning method proposed in this thesis analyzes the behavior of users issuing keyword queries to search engines, combined with information from web pages related to those query terms, to support ontology construction. The ontology in this thesis is composed of the users’ query terms, and what we learn are the relations among these terms, such as hypernymy, hyponymy, and synonymy; the goal of this thesis is thus to automatically discover the relations among query terms to assist in constructing an ontology. In addition, we implemented a complete Ontology Learning system: from the initial collection of user query logs, through keyword extraction and analysis and the determination of relations between keywords, to the final generation of the ontology, everything is performed automatically by the system.
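As a sketch of one heuristic in this spirit (a simplification for illustration, not the thesis's exact method; the log data is invented): if the set of pages associated with term A strictly contains the pages associated with term B, A is a candidate hypernym of B, while identical page sets suggest synonymy.

```python
# Hypothetical query log: term -> set of result pages users visited.
pages = {
    "animal": {"p1", "p2", "p3", "p4"},
    "dog":    {"p1", "p2"},
    "canine": {"p1", "p2"},
}

def infer_relations(pages, min_overlap=2):
    """Guess taxonomic relations from page-set containment."""
    rels = []
    for a, pa in pages.items():
        for b, pb in pages.items():
            if a == b or len(pa & pb) < min_overlap:
                continue
            if pa == pb:
                rels.append((a, "synonym-of", b))
            elif pb < pa:            # strict subset: b is more specific
                rels.append((a, "hypernym-of", b))
    return rels

for rel in infer_relations(pages):
    print(rel)  # ('animal', 'hypernym-of', 'dog'), ('dog', 'synonym-of', 'canine'), ...
```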
300

Optimal and Robust Routing of Subscriptions for Unifying Access to the Past and the Future in Publish/Subscribe

Li, Guoli 18 February 2011 (has links)
A flexible, scalable, and asynchronous middleware abstraction is needed for business process management, which involves thousands of tasks and a large number of running instances of large business processes. The content-based publish/subscribe system is an ideal candidate to serve as an enterprise service bus for these applications. In the publish/subscribe paradigm, information providers called publishers disseminate publications to all subscribers who have expressed interest by registering subscriptions through a loosely coupled interface. However, the traditional publish/subscribe paradigm only supports stateless subscriptions; that is, event correlation is ignored. Moreover, subscribers can only receive publications issued after their subscriptions. There are many application contexts, however, where access to publications from the past is necessary, such as replaying a business process execution to debug it. Even more interesting uses arise when data from the past can be correlated with data in the future. Therefore, new languages and new functionality are needed in the standard publish/subscribe model in order to support business process management. We propose a new subscription language, PADRES SQL (PSQL), which can express event patterns and unify both historic and future views for subscribers. PADRES allows a subscriber to access data published both in the past and in the future. Furthermore, complex event detection happens in the broker network. The main difficulties of distributed event detection are routing a composite subscription, including where and how to decompose the composite subscription, and routing the individual parts of the subscription. Our composite subscription routing decisions are based on a cost model which minimizes the routing and detection delay. An adaptive subscription routing protocol is proposed to determine efficient subscription placement under dynamically changing workloads. PADRES also provides robust message delivery by exploring alternative paths in a cyclic overlay. Routing optimizations and efficient matching algorithms are studied to improve the performance of the extended publish/subscribe model. With the above features, we propose the Ninos system, a distributed business process execution architecture, as a case study; it uses light-weight activity agents to carry out business process execution in a distributed environment. Ninos demonstrates that decentralized business process execution is the trend for next-generation products, and that the publish/subscribe model is ideal to serve as an enterprise service bus (ESB) for distributed applications.
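A toy broker sketch (an invented API, far simpler than PADRES's real broker network) illustrates the unified historic/future view: publications are retained, and a new subscription is matched both against the retained past and against each incoming publication.

```python
class ToyBroker:
    """Minimal content-based broker that keeps a history of
    publications so new subscribers can see past events too."""

    def __init__(self):
        self.history = []
        self.subs = []  # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        # Replay matching historic publications, then register for
        # future ones -- the unified past/future view.
        for pub in self.history:
            if predicate(pub):
                callback(pub)
        self.subs.append((predicate, callback))

    def publish(self, pub):
        self.history.append(pub)
        for predicate, callback in self.subs:
            if predicate(pub):
                callback(pub)

broker = ToyBroker()
broker.publish({"task": "approve_order", "status": "done"})
seen = []
# Subscriber interested in completed tasks, past and future.
broker.subscribe(lambda p: p.get("status") == "done", seen.append)
broker.publish({"task": "ship_order", "status": "done"})
print(len(seen))  # -> 2: one historic match, one future match
```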
