231

Layout Optimization for Distributed Relational Databases Using Machine Learning

Patvarczki, Jozsef 23 May 2012 (has links)
A common problem when running Web-based applications is how to scale up the database. The solution usually involves having a smart database administrator determine how to spread the database tables across computers that will work in parallel. Laying out database tables across multiple machines so they can act together as a single efficient database is hard, and automated methods are needed to eliminate the time database administrators spend creating optimal configurations. We consider four operators that generate a search space of possible database layouts: 1) denormalization, 2) horizontal partitioning, 3) vertical partitioning, and 4) full replication. Textbooks offer general advice that is useful for extreme cases - for instance, fully replicate a table if the ratio of inserts to selects is close to zero. But even this seemingly obvious rule will not necessarily lead to a speedup once you take into account that some nodes may become a bottleneck, and complex interactions among the four operators make it even more difficult to predict the best choice. Instead of relying on best practices for database layout, we need a system that collects empirical data on when each of these four operators is effective. We implemented a state-based search technique to try different operators, and then used the empirically measured data to see whether any speedup occurred. We recognized that the cost of creating each physical database layout is potentially large, but doing so is necessary because we want to know the ground truth about what is effective and under what conditions. After creating a dataset in which these four operators have been applied to produce different databases, we can employ machine learning to induce rules that govern the physical design of the database across an arbitrary number of computer nodes. This learning process, in turn, allows the database placement algorithm to improve over time as it trains on a growing set of examples. The algorithm aims to learn 1) what a good database layout is for a particular application given a query workload, and 2) whether it can automatically improve its recommendations by using machine-learned rules to generalize when it makes sense to apply each operator. Considerable research has been done on parallelizing databases in which large amounts of data are shipped from one node to another to answer a single query. Because the cost of shipping data back and forth can be high, in this work we assume it is more efficient to create a database layout in which each query can be answered by a single node. This assumption requires that all incoming query templates be known beforehand - a requirement easily satisfied for a Web-based application, since users typically interact with the system through a fixed web interface such as web forms. In this case, unseen queries are not necessarily answerable without first reconstructing the data on a single machine. Prior knowledge of the exact query templates allows us to select the best possible table placements across multiple nodes.
A Web site provider trying to improve the efficiency of a Web-based application, however, may be willing to accept the inconvenience of being unable to answer arbitrary queries in exchange for a system that runs more efficiently.
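To make the search procedure concrete, here is a minimal sketch of the kind of state-based layout search the abstract describes; the operator names, the Layout representation, and the measure_latency callback are illustrative assumptions, not the thesis's actual implementation:

```python
# Illustrative sketch (not the thesis's code): greedy state-based search
# over layout operators, keeping a change only if measured latency improves.
from typing import Callable, FrozenSet, List, Tuple

OPERATORS = ("denormalize", "partition_horizontal",
             "partition_vertical", "replicate_fully")

Layout = FrozenSet[Tuple[str, str]]  # set of (operator, table) decisions

def greedy_layout_search(tables: List[str],
                         measure_latency: Callable[[Layout], float],
                         start: Layout = frozenset()) -> Layout:
    """Try each operator on each table; keep any measured improvement
    and repeat until no move yields a speedup."""
    best, best_cost = start, measure_latency(start)
    improved = True
    while improved:
        improved = False
        for table in tables:
            for op in OPERATORS:
                candidate = best | {(op, table)}
                cost = measure_latency(candidate)  # empirical, not modeled
                if cost < best_cost:
                    best, best_cost, improved = candidate, cost, True
    return best
```

A caller would supply measure_latency as a function that actually deploys the candidate layout and replays the known query templates against it, since the approach insists on empirically measured ground truth rather than an analytical cost model.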
232

Analytical Query Execution Optimized for all Layers of Modern Hardware

Polychroniou, Orestis January 2018 (has links)
Analytical database queries are at the core of business intelligence and decision support. To analyze the vast amounts of data available today, query execution needs to be orders of magnitude faster. Hardware advances have made a profound impact on database design and implementation. Large main-memory capacities allow queries to execute entirely in memory, shifting the bottleneck from disk access to memory bandwidth. In this new setting, to optimize query performance, databases must be aware of an unprecedented multitude of complicated hardware features. This thesis focuses on the design and implementation of highly efficient database systems by optimizing analytical query execution for all layers of modern hardware. These layers include the network across multiple machines; main memory and the NUMA interconnect across multiple processors; the multiple levels of caches across processor cores; and the execution pipeline within each core. For the network layer, we introduce a distributed join algorithm that minimizes network traffic. For the memory hierarchy, we describe partitioning variants aware of the dynamics of the CPU caches and the NUMA interconnect. To improve the memory access rate of linear scans, we optimize lightweight compression variants and evaluate their trade-offs. To accelerate query execution within the core pipeline, we introduce advanced SIMD vectorization techniques that generalize across multiple operators. We evaluate our algorithms and techniques on both mainstream hardware and many-integrated-core platforms, and combine them in a new query engine design that better utilizes the features of many-core CPUs. In an era of increasingly parallel hardware and consistently growing datasets, this thesis can serve as a compass for developing hardware-conscious databases with truly high-performance analytical query execution.
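As one concrete illustration of a hardware-conscious technique in the spirit of the memory-hierarchy work described above, here is a minimal sketch of multi-pass radix partitioning, which limits per-pass fanout so the working set of output buffers stays cache- and TLB-resident; the pass structure and parameters are illustrative assumptions, not the thesis's implementation:

```python
# Illustrative sketch: two-pass radix partitioning on the low 16 key bits,
# so that each final partition is small enough to process cache-resident.

def radix_partition(keys: list, bits: int, shift: int) -> list:
    """Scatter keys into 2**bits partitions using bits [shift, shift+bits)."""
    fanout = 1 << bits
    mask = fanout - 1
    parts = [[] for _ in range(fanout)]
    for k in keys:
        parts[(k >> shift) & mask].append(k)
    return parts

def two_pass_partition(keys: list, bits_per_pass: int = 8) -> list:
    """Limiting each pass's fanout keeps the set of active write buffers
    within the cache/TLB budget; the second pass refines each partition."""
    result = []
    for p in radix_partition(keys, bits_per_pass, shift=bits_per_pass):
        result.extend(radix_partition(p, bits_per_pass, shift=0))
    return result
```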
233

Keyword search in relational databases. / CUHK electronic theses & dissertations collection

January 2010 (has links)
In this thesis, for the schema-based approaches, we propose an efficient algorithm to generate all relational algebra expressions needed to find all the connected trees in an RDB, and an efficient algorithm to evaluate all the expressions using semijoins in an RDBMS. We show that our method can also be extended to answer continuous keyword queries over a relational data stream. We further propose novel algorithms that find sets of tuples reachable from a root tuple within a radius, and algorithms that find multi-center subgraphs within a radius. Our algorithms use SQL queries only, in order to make full use of the RDBMS. We show that current commercial RDBMSs are powerful enough to support such keyword queries in RDBs efficiently without building and maintaining any additional indexes. The main idea behind our approach is tuple reduction. For the graph-based approaches, we propose an efficient algorithm that finds all/top-K multi-center subgraphs with polynomial delay. We also introduce a new kind of keyword query, namely structural statistics by keywords, to summarize keyword search results along several dimensions. We conducted extensive performance studies using two large real datasets, IMDB and DBLP, to show the efficiency and effectiveness of all our approaches.

Keyword search in relational databases (RDBs) has been extensively studied in recent years. A keyword search (or keyword query) in RDBs is specified by a set of keywords and explores interconnected tuple structures in an RDB that cannot easily be identified using SQL on RDBMSs. In brief, it finds how the tuples containing the given keywords are connected via sequences of connections (foreign key references) among tuples in an RDB. Such interconnected tuple structures can be found as connected trees up to a certain size, sets of tuples reachable from a root tuple within a radius, or multi-center subgraphs within a radius. In the literature, there are two main approaches: schema-based approaches and graph-based approaches. The schema-based approaches generate a set of relational algebra expressions and evaluate each expression using SQL on an RDBMS, either directly or indirectly in a middleware layer on top of the RDBMS. Due to the large number of relational algebra expressions to process, most existing works take a middleware approach without fully utilizing the RDBMS. The graph-based approaches materialize an RDB as a graph and find the interconnected tuple structures using graph algorithms in memory.

Qin, Lu. / Adviser: Jeffrey Xu Yu. / Source: Dissertation Abstracts International, Volume: 73-02, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 133-138). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
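A minimal sketch of the tuple-reduction idea mentioned above, expressed as plain SQL issued from a driver program; the schema, column names, and pruning rule are illustrative assumptions, not the thesis's actual algorithms:

```python
# Illustrative sketch (assumed schema): one pass of tuple reduction.
# A tuple survives only if it matches a keyword itself or can still join
# with a surviving tuple in a referencing table; passes repeat over a
# working copy of the tables until no more rows are deleted.

def reduce_step(conn, table: str, text_col: str, keywords: list,
                fk_col: str, ref_table: str, ref_pk: str) -> int:
    """Delete rows of `table` that neither match a keyword nor join to
    any remaining row of `ref_table`; returns the number pruned.
    `conn` is any DB-API connection (e.g. sqlite3)."""
    kw_pred = " OR ".join(f"{text_col} LIKE '%{k}%'" for k in keywords)
    sql = (f"DELETE FROM {table} "
           f"WHERE NOT ({kw_pred}) "
           f"AND NOT EXISTS (SELECT 1 FROM {ref_table} r "
           f"WHERE r.{ref_pk} = {table}.{fk_col})")
    return conn.execute(sql).rowcount
```

Because pruning one table can strand tuples in its neighbors, a driver would apply such steps across all tables repeatedly until a fixpoint is reached, leaving only tuples that can participate in some connected answer.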
234

Indexing techniques for object-oriented databases.

January 1996 (has links)
by Frank Hing-Wah Luk. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 92-95).

Table of contents:
Abstract --- p.ii
Acknowledgement --- p.iii
1 Introduction --- p.1
1.1 Motivation --- p.1
1.2 The Problem in Object-Oriented Database Indexing --- p.2
1.3 Contributions --- p.3
1.4 Thesis Organization --- p.4
2 Object-oriented Data Model --- p.5
2.1 Object-oriented Data Model --- p.5
2.2 Object and Object Identifiers --- p.6
2.3 Complex Attributes and Methods --- p.6
2.4 Class --- p.8
2.4.1 Inheritance Hierarchy --- p.8
2.4.2 Aggregation Hierarchy --- p.8
2.5 Sample Object-Oriented Database Schema --- p.9
3 Indexing in Object-Oriented Databases --- p.10
3.1 Introduction --- p.10
3.2 Indexing on Inheritance Hierarchy --- p.10
3.3 Indexing on Aggregation Hierarchy --- p.13
3.4 Indexing on Integrated Support --- p.16
3.5 Indexing on Method Invocation --- p.18
3.6 Indexing on Overlapping Path Expressions --- p.19
4 Triple Node Hierarchy --- p.23
4.1 Introduction --- p.23
4.2 Triple Node --- p.25
4.3 Triple Node Hierarchy --- p.26
4.3.1 Construction of the Triple Node Hierarchy --- p.26
4.3.2 Updates in the Triple Node Hierarchy --- p.31
4.4 Cost Model --- p.33
4.4.1 Storage --- p.33
4.4.2 Query Cost --- p.35
4.4.3 Update Cost --- p.35
4.5 Evaluation --- p.37
4.6 Summary --- p.42
5 Triple Node Hierarchy in Both Aggregation and Inheritance Hierarchies --- p.43
5.1 Introduction --- p.43
5.2 Preliminaries --- p.44
5.3 Class-Hierarchy Tree --- p.45
5.4 The Nested CH-tree --- p.47
5.4.1 Construction --- p.47
5.4.2 Retrieval --- p.48
5.4.3 Update --- p.48
5.5 Cost Model --- p.49
5.5.1 Assumptions --- p.51
5.5.2 Storage --- p.52
5.5.3 Query Cost --- p.52
5.5.4 Update Cost --- p.53
5.6 Evaluation --- p.55
5.6.1 Storage Cost --- p.55
5.6.2 Query Cost --- p.57
5.6.3 Update Cost --- p.62
5.7 Summary --- p.63
6 Decomposition of Path Expressions --- p.65
6.1 Introduction --- p.65
6.2 Configuration on Path Expressions --- p.67
6.2.1 Single Path Expression --- p.67
6.2.2 Overlapping Path Expressions --- p.68
6.3 New Algorithm --- p.70
6.3.1 Example --- p.72
6.4 Evaluation --- p.75
6.5 Summary --- p.76
7 Conclusion and Future Research --- p.77
7.1 Conclusion --- p.77
7.2 Future Research --- p.78
A Evaluation of some Parameters in Chapter 5 --- p.79
B Cost Model for Nested-Inherited Index --- p.82
B.1 Storage --- p.82
B.2 Query Cost --- p.84
B.3 Update --- p.84
C Algorithm constructing a minimum auxiliary set of JIs --- p.87
D Estimation on the number of possible combinations --- p.89
Bibliography --- p.92
235

Gossip mechanisms for distributed database systems.

January 2007 (has links)
Yam, Shing Chung Jonathan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (leaves 75-79). / Abstracts in English and Chinese.

Table of contents:
Abstract / Acknowledgement / Contents / List of Figures / List of Tables
1 Introduction --- p.1
1.1 Motivation --- p.2
1.2 Thesis Organization --- p.5
2 Literature Review --- p.7
2.1 Data Sharing and Dissemination --- p.7
2.2 Data Aggregation --- p.12
2.3 Sensor Network Database Systems --- p.13
2.4 Data Routing and Networking --- p.23
2.5 Other Applications --- p.24
3 Preliminaries --- p.25
3.1 Probability Distribution and Gossipee-selection Schemes --- p.25
3.2 The Network Models --- p.28
3.3 Objective and Problem Statement --- p.30
3.4 Two-tier Gossip Mechanism --- p.31
3.5 Semantic-dependent Gossip Mechanism --- p.32
4 Results for Two-tier Gossip Mechanisms --- p.34
4.1 Background --- p.34
4.2 A Time Bound for Solving the Clustered Destination Problem with T (Theorem 1) --- p.39
4.3 Further Results (Theorem 2) --- p.49
4.4 Experimental Results for Two-tier and N-tier Gossip Mechanisms --- p.51
4.4.1 Performance Evaluation of Two-tier Gossip Mechanisms --- p.52
4.4.2 Performance Evaluation of N-tier Gossip Mechanisms --- p.56
4.5 Discussion --- p.60
5 Results for Semantic-dependent Gossip Mechanisms --- p.62
5.1 Background --- p.62
5.2 Theory --- p.65
5.3 Detection of Single Moving Heat Source with S ≥ max(2c1l, c1h) --- p.66
5.4 Detection of Multiple Static Heat Sources with Two-tier Gossip Mechanism --- p.69
5.5 Discussion --- p.72
6 Conclusion --- p.73
7 References --- p.75
Appendix: Proof of Result 4.3 --- p.80
236

Video Game Development Strategies for Creating Successful Cognitively Challenging Games

Williams, Walter K. 01 January 2018 (has links)
The video game industry is a global, multibillion-dollar industry with millions of players. The process of developing video games is essential for the continued growth of the industry, and developers need to employ effective strategies that will help them create successful games. The purpose of this exploratory qualitative single case study was to investigate the design strategies of video game developers who have successfully created games that are challenging, entertaining, and successful. The technology acceptance model served as the conceptual framework. The population for this study comprised the members of a video game development team at a small, successful video game development company in North Carolina. The data collection process included interviews with 7 video game developers and analysis of 7 organizational documents. Member checking was used to increase the validity of the findings. Through triangulation, 4 major themes were identified: the video game designer has a significant impact on the development process; the development process for successful video games follows iterative agile programming methods; programming to challenge cognition is not a target goal for developers; and receiving feedback is essential to the process. The findings may help future video game developers and organizations devise strategies for developing successful games that entertain and challenge players while ensuring the viability of the organization. The findings may also influence society by demonstrating where attention should be directed concerning the impact of video games on player behavior.
237

E-Business Strategy to Adopt Electronic Banking Services in Ethiopia

Gebreslassie, Teklebrhan Woldearegay 01 January 2017 (has links)
E-banking services in Ethiopia are increasing among low-income populations; however, with over 53 million mobile service users countrywide, more than 85% of the population still lacks access to banking services. A single case study was used to explore e-business strategies that bank managers use to promote the adoption of electronic banking services among the unbanked population in Ethiopia. The extended resource-based view of strategy served as the conceptual framework. Data were collected from interviews with 12 experienced bank managers from a leading commercial bank in Ethiopia, and were analyzed using coding techniques and word clustering with the help of qualitative data analysis software. After member checking and methodological triangulation, the data were sorted into 5 themes: ensuring leadership, creating accessibility, fostering customer acceptance, leveraging unique features and organizational resources, and building an e-banking ecosystem. The results showed that bank managers need to develop a customer-centric organizational posture and should focus on building an e-banking ecosystem both inside and outside the country in order to realize their vision of becoming global competitors. The findings may contribute to positive social change for unbanked communities in Ethiopia by informing bank managers of e-banking adoption strategy options, thereby improving the convenience and accessibility of banking services.
238

An informetric study of the distribution of bibliographic records in online databases: a case study using the literature of Fuzzy Set Theory (1965-1993)

Hood, William, School of Information Library & Archive Studies, UNSW January 1999 (has links)
This study investigated the distribution of bibliographic records amongst online bibliographic databases, using the topic of Fuzzy Set Theory over the period 1965 to 1993 as a case study. From the DIALOG database host, searches were conducted on 114 databases to determine the number of journal article records relating to the topic of Fuzzy Sets. Both the number of records in each database and the overlap of coverage between databases were calculated. Six counting techniques were developed to allocate records to databases, based on different methods for handling records duplicated between databases. When duplicate records are included, the top database accounts for 19% of the records; when duplicates are removed, the top database accounts for 37%. The distribution of records across databases was found to conform to the Bradford-Zipf hyperbolic distribution. Various other analyses were undertaken, including analyses of the duplicate records themselves, of the total size of the DIALOG database system over time, and of the density of Fuzzy Set records in databases over time. A secondary aim of this study was to perform an informetric study of the literature of Fuzzy Set Theory itself. Results include an analysis of the growth of the Fuzzy Set literature, of the journals covering the topic, and of the terminology used to describe topics related to Fuzzy Sets. The Ulrich's database was also used to provide a subject classification of the journals, in order to analyse the diffusion of Fuzzy Sets into other disciplines; apart from mathematics, the top disciplines into which Fuzzy Sets have diffused were found to be applied physics, systems, and computing. The third aim of the thesis was to refine and develop the methodology for performing large-scale informetric studies using data from a variety of online bibliographic databases. Commercially available software was used wherever possible; where this was not possible or was infeasible, custom programs were written to perform various steps in the methodology.
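To make the duplicate-handling idea concrete, here is a small sketch of two of the simpler counting rules one could apply; the rule names are illustrative assumptions, and the thesis defines six specific techniques of its own:

```python
# Illustrative sketch: two ways to allocate records to databases when the
# same record appears in several databases.
from collections import Counter

def count_with_duplicates(hits: dict) -> Counter:
    """hits maps database name -> set of record IDs it returned.
    Every occurrence counts, so shared records are credited repeatedly."""
    return Counter({db: len(ids) for db, ids in hits.items()})

def count_first_occurrence(hits: dict, order: list) -> Counter:
    """Credit each record only to the first database (in a fixed ranking)
    that contains it, so duplicates are removed from later databases."""
    seen, counts = set(), Counter()
    for db in order:
        fresh = hits[db] - seen
        counts[db] = len(fresh)
        seen |= fresh
    return counts
```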
239

Explorations In Searching Compressed Nucleic Acid And Protein Sequence Databases And Their Cooperatively-Compressed Indices

Gardner-Stephen, Paul Mark, paul.gardner-stephen@flinders.edu.au January 2008 (has links)
Nucleic acid and protein databases such as GenBank are growing at a rate that perhaps eclipses even Moore's Law of increase in computational power. This poses a problem for the biological sciences, which have become increasingly dependent on searching and manipulating these databases. It was once reasonably practical to perform exhaustive searches of these databases, for example using the algorithm described by Smith and Waterman, but it has been many years since this was the case. This has led to the development of a series of search algorithms, such as FASTA, BLAST and BLAT, that are each successively faster, but at a similarly successive cost in thoroughness. Attempts have been made to remedy this problem by devising search algorithms that are both fast and thorough. An example is CAFE, which seeks to construct a search system with a sub-linear relationship between search time and database size, and argues that this property must be present for any search system to be successful in the long term. This dissertation explores this notion by seeking to construct a search system that takes advantage of the growing redundancy in databases such as GenBank to reduce both the search time and the space required to store the databases and their indices, while preserving or increasing the thoroughness of the search. The result is the creation and implementation of new genomic sequence search and alignment, database compression, and index compression algorithms and systems that make progress toward reducing search time and space requirements while improving sensitivity. Success is tempered, however, by the need for databases with adequate local redundancy, and by the computational cost of these algorithms when servicing un-batched queries.
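As a toy illustration of coupling an index with compression (not the dissertation's actual data structures), here is a minimal k-mer inverted index whose posting lists are delta-encoded; redundant sequence data produces many small, highly compressible gaps:

```python
# Illustrative sketch: k-mer index with delta-encoded posting lists.
from collections import defaultdict

def build_index(seq: str, k: int = 11) -> dict:
    """Map each k-mer to the gaps between its successive positions;
    repetitive (redundant) sequence yields many small gaps, which a
    byte-oriented integer coder would then compress well."""
    postings = defaultdict(list)
    last = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        postings[kmer].append(i - last.get(kmer, 0))
        last[kmer] = i
    return dict(postings)

def lookup(index: dict, kmer: str) -> list:
    """Decode the delta-encoded postings back to absolute positions."""
    positions, pos = [], 0
    for gap in index.get(kmer, []):
        pos += gap
        positions.append(pos)
    return positions
```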
240

Serializable Isolation for Snapshot Databases

Cahill, Michael James January 2009 (has links)
PhD / Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that would maintain consistency if run serially. Until now, the only way to prevent these anomalies was to modify the applications by introducing explicit locking or artificial update conflicts, following careful analysis of conflicts between all pairs of transactions. This thesis describes a modification to the concurrency control algorithm of a database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation. The new algorithm preserves the properties that make snapshot isolation attractive, including that readers do not block writers and vice versa. An implementation of the algorithm in a relational database management system is described, along with a benchmark and performance study, showing that the throughput approaches that of snapshot isolation in most cases.
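A highly simplified sketch of the runtime check at the heart of this approach: track incoming and outgoing read-write antidependencies per transaction and abort when one transaction acquires both, since two consecutive rw-edges form the dangerous structure present in every snapshot isolation anomaly. Lock management, commit ordering, and false-positive handling are omitted here, so this is an illustration of the rule rather than the thesis's implementation:

```python
# Illustrative sketch of the core serializable-snapshot-isolation rule:
# abort a transaction that has both an incoming and an outgoing
# read-write antidependency (the "pivot" of a dangerous structure).

class Txn:
    def __init__(self, name: str):
        self.name = name
        self.in_conflict = False   # someone read a version I overwrote
        self.out_conflict = False  # I read a version someone overwrote
        self.aborted = False

def record_rw_conflict(reader: Txn, writer: Txn) -> None:
    """Called when `reader` read a version that `writer` later replaced."""
    reader.out_conflict = True
    writer.in_conflict = True
    for t in (reader, writer):
        if t.in_conflict and t.out_conflict and not t.aborted:
            t.aborted = True  # pivot detected: abort to preserve serializability
```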
