11 |
Document replication and distribution algorithms for load balancing in geographically distributed web server systems / Zhuo, Ling, 卓玲 January 2002
published_or_final_version / Computer Science and Information Systems / Master / Master of Philosophy
|
12 |
Layout Optimization for Distributed Relational Databases Using Machine Learning / Patvarczki, Jozsef 23 May 2012
A common problem when running Web-based applications is how to scale up the database. The usual solution relies on a skilled database administrator deciding how to spread the database tables across computers that work in parallel. Laying out tables across multiple machines so that they act together as a single efficient database is hard, and automated methods are needed to reduce the time administrators spend creating optimal configurations. We consider four operators that together define a search space of possible database layouts: 1) denormalizing, 2) horizontally partitioning, 3) vertically partitioning, and 4) fully replicating. Textbooks offer general advice that is useful for extreme cases; for instance, you should fully replicate a table if the ratio of inserts to selects is close to zero. But even this seemingly obvious rule does not necessarily produce a speedup once you take into account that some nodes might be a bottleneck. Complex interactions among the four operators make it even harder to predict the best choice. Instead of relying on best practices for database layout, we need a system that collects empirical data on when these four operators are effective. We have implemented a state-based search technique that tries different operators, and we used the empirically measured data to see whether any speedup occurred. We recognize that the cost of creating the physical database layouts is potentially large, but it is necessary because we want to know the "ground truth" about what is effective and under what conditions. After creating a dataset in which these four operators have been applied to produce different databases, we can employ machine learning to induce rules that help govern the physical design of the database across an arbitrary number of computer nodes.
This learning process, in turn, allows the database placement algorithm to improve over time as it trains on a set of examples. The algorithm tries to answer two questions: 1) "What is a good database layout for a particular application given a query workload?" and 2) "Can this algorithm automatically improve itself in making recommendations by using machine learned rules to try to generalize when it makes sense to apply each of these operators?" Considerable research has been done on parallelizing databases in which large amounts of data are shipped from one node to another to answer a single query. Because the cost of shipping data back and forth can be high, in this work we assume that it may be more efficient to create a database layout in which each query can be answered by a single node. This assumption requires that all incoming query templates be known beforehand. The requirement is easily satisfied for a Web-based application, because users typically interact with the system through a web interface such as web forms. Prior knowledge of these exact query templates allows us to select the best possible table placements across multiple nodes. Under such a layout, unseen queries are not necessarily answerable without first reconstructing the data on a single machine. However, a web site provider trying to improve the efficiency of a Web-based application may be willing to accept the inconvenience of not being able to answer an arbitrary query in exchange for a system that runs more efficiently.
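The search over layouts described in this abstract can be pictured with a small sketch. It is an illustration only and not the thesis's implementation: the abstract does not say which state-based search was used, so a simple hill climb stands in for it, and the names `hill_climb`, `neighbors`, and `toy_measure`, as well as the table names and cost numbers, are hypothetical. In a real system the measure would be an empirical timing of the query workload on a physically built layout.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# The four layout operators named in the abstract.
OPERATORS = (
    "denormalize",
    "horizontal_partition",
    "vertical_partition",
    "full_replicate",
)

# A layout (search state) is the set of (table, operator) decisions applied so far.
Layout = FrozenSet[Tuple[str, str]]


def neighbors(layout: Layout, tables: List[str]) -> Iterable[Layout]:
    """Yield every layout reachable by applying one more operator to one table."""
    for table in tables:
        for op in OPERATORS:
            step = (table, op)
            if step not in layout:
                yield layout | {step}


def hill_climb(tables: List[str],
               measure: Callable[[Layout], float],
               max_steps: int = 10) -> Tuple[Layout, float]:
    """Greedy search: keep applying the operator that lowers the measured cost most."""
    current: Layout = frozenset()
    current_cost = measure(current)
    for _ in range(max_steps):
        best, best_cost = None, current_cost
        for candidate in neighbors(current, tables):
            cost = measure(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
        if best is None:  # no operator improves on the current layout
            break
        current, current_cost = best, best_cost
    return current, current_cost


def toy_measure(layout: Layout) -> float:
    """Hypothetical stand-in for an empirical workload timing (made-up numbers)."""
    cost = 100.0
    if ("orders", "horizontal_partition") in layout:
        cost -= 30.0
    if ("products", "full_replicate") in layout:
        cost -= 20.0
    return cost + 5.0 * len(layout)  # every applied operator carries some overhead


if __name__ == "__main__":
    layout, cost = hill_climb(["orders", "products"], toy_measure)
    print(sorted(layout), cost)
```

With the made-up costs above, the search partitions `orders` horizontally, fully replicates `products`, and then stops because every further operator only adds overhead. The (layout, measured cost) pairs visited along the way are the kind of examples from which rules could later be learned.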
|
13 |
Gossip mechanisms for distributed database systems. January 2007
Yam, Shing Chung Jonathan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (leaves 75-79). / Abstracts in English and Chinese. / Abstract / Acknowledgement / Contents / List of Figures / List of Tables / Chapter 1 --- Introduction / Chapter 1.1 --- Motivation / Chapter 1.2 --- Thesis Organization / Chapter 2 --- Literature Review / Chapter 2.1 --- Data Sharing and Dissemination / Chapter 2.2 --- Data Aggregation / Chapter 2.3 --- Sensor Network Database Systems / Chapter 2.4 --- Data Routing and Networking / Chapter 2.5 --- Other Applications / Chapter 3 --- Preliminaries / Chapter 3.1 --- Probability Distribution and Gossipee-selection Schemes / Chapter 3.2 --- The Network Models / Chapter 3.3 --- Objective and Problem Statement / Chapter 3.4 --- Two-tier Gossip Mechanism / Chapter 3.5 --- Semantic-dependent Gossip Mechanism / Chapter 4 --- Results for Two-tier Gossip Mechanisms / Chapter 4.1 --- Background / Chapter 4.2 --- A Time Bound for Solving the Clustered Destination Problem with T-Theorem 1 / Chapter 4.3 --- Further Results - Theorem 2 / Chapter 4.4 --- Experimental Results for Two-tier and N-tier Gossip Mechanisms / Chapter 4.4.1 --- Performance Evaluation of Two-tier Gossip Mechanisms / Chapter 4.4.2 --- Performance Evaluation of N-tier Gossip Mechanisms / Chapter 4.5 --- Discussion / Chapter 5 --- Results for Semantic-dependent Gossip Mechanisms / Chapter 5.1 --- Background / Chapter 5.2 --- Theory / Chapter 5.3 --- "Detection of Single Moving Heat Source with S max(2c1l,c1h ))" / Chapter 5.4 --- Detection of Multiple Static Heat Sources with Two-tier Gossip mechanism / Chapter 5.5 --- Discussion / Chapter 6 --- Conclusion / Chapter 7 --- References / Appendix Prove of Result 4.3
|
14 |
Performance analysis of a distributed file system / Mukhopadhyay, Meenakshi 01 January 1990
An important design goal of a distributed file system, a component of many distributed systems, is to provide UNIX file access semantics, e.g., the result of any write system call is visible to all processes as soon as the call completes. In a distributed environment, these semantics are difficult to implement because processes on different machines do not share a kernel cache or kernel data structures. Strong data consistency guarantees can be provided only at the expense of performance.
This work investigates the time costs paid by AFS 3.0, which uses a callback mechanism to provide consistency guarantees, and those paid by AFS 4.0, which uses typed tokens for synchronization. AFS 3.0 provides moderately strong consistency guarantees, but they differ from UNIX semantics because data are written back to the server only after a file is closed. AFS 4.0 writes data back to the server whenever other clients want to access it, which yields behavior like UNIX file access semantics. Also, AFS 3.0 does not guarantee synchronization of multiple writers, whereas AFS 4.0 does.
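The difference in visibility between the two write-back policies described above can be seen in a toy model. This is a sketch under loud assumptions: it is not the AFS protocol, it ignores callbacks, caching of reads, and token types, and every class and function name here (`Server`, `WriteOnCloseClient`, `TokenClient`, `observed_before_close`) is hypothetical. It only contrasts "store on close" with "store when another client asks".

```python
class Server:
    """Toy file server: holds file contents and remembers who has unwritten data."""

    def __init__(self):
        self.data = {}          # file name -> contents stored on the server
        self.token_holder = {}  # file name -> client holding a (toy) write token

    def read(self, name, reader):
        holder = self.token_holder.get(name)
        if holder is not None and holder is not reader:
            holder.revoke(name)  # ask the writer to flush before serving the read
        return self.data.get(name, "")


class WriteOnCloseClient:
    """Buffers writes locally and stores them on the server only at close()."""

    def __init__(self, server):
        self.server = server
        self.dirty = {}  # file name -> locally buffered contents

    def write(self, name, contents):
        self.dirty[name] = contents

    def flush(self, name):
        if name in self.dirty:
            self.server.data[name] = self.dirty.pop(name)

    def close(self, name):
        self.flush(name)


class TokenClient(WriteOnCloseClient):
    """Registers as token holder so the server can demand an early flush."""

    def write(self, name, contents):
        super().write(name, contents)
        self.server.token_holder[name] = self

    def flush(self, name):
        super().flush(name)
        self.server.token_holder.pop(name, None)  # give the token back

    def revoke(self, name):
        self.flush(name)


def observed_before_close(client_cls):
    """Return what a second client reads while the writer still has the file open."""
    server = Server()
    writer, reader = client_cls(server), client_cls(server)
    writer.write("f", "new")
    seen = server.read("f", reader)
    writer.close("f")
    return seen


if __name__ == "__main__":
    print(repr(observed_before_close(WriteOnCloseClient)))
    print(repr(observed_before_close(TokenClient)))
```

In the write-on-close model the second client reads an empty file until the writer closes it, while in the token model the server recalls the token and the reader sees the new contents. The extra round trip hidden inside `revoke` is a cartoon of the kind of time cost the study measures.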
|
15 |
On indexing large databases for advanced data models / Samoladas, Vasilis. January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2001. / Vita. Includes bibliographical references. Available also from UMI Company.
|
16 |
On indexing large databases for advanced data models / Samoladas, Vasilis 04 April 2011
Not available / text
|
17 |
Distributed databases for Multi Mediation : Scalability, Availability & Performance / Kuruganti, NSR Sankaran January 2015
Context: Multi Mediation is the process of collecting data from networks and network elements, pre-processing this data, and distributing it to various systems such as Big Data analysis, billing systems, network monitoring systems, and service assurance. With the growing demand for networks and the emergence of new services, the volume of data collected from networks is growing. This data needs to be organized efficiently, which can be done using databases. Although an RDBMS offers scale-up solutions to handle voluminous data and concurrent requests, this approach is expensive, so alternatives such as distributed databases are attractive. A suitable distributed database for Multi Mediation needs to be investigated.
Objectives: In this research we analyze two distributed databases, MySQL Cluster 7.4.4 and Apache Cassandra 2.0.13, in terms of performance, scalability and availability. The inter-relations between performance, scalability and availability of distributed databases are also analyzed. Performance, scalability and availability are quantified, and measurements are made in the context of a Multi Mediation system.
Methods: The methods used in this research are both qualitative and quantitative. A qualitative study is made to select the databases for evaluation. A benchmarking harness application is designed to quantitatively evaluate the performance of a distributed database in the context of Multi Mediation. Several experiments are designed and performed using the benchmarking harness on the database cluster.
Results: Results collected include the average response time and average throughput of the distributed databases in various scenarios. The average throughput and average INSERT response time results favor Apache Cassandra in the low availability configuration. MySQL Cluster's average SELECT response time is better than Apache Cassandra's for greater numbers of client threads, in both high availability and low availability configurations.
Conclusions: Although Apache Cassandra outperforms MySQL Cluster, support for transactions and ACID compliance should not be forgotten when selecting a database. Apart from the contextual benchmarks, organizational choices, development costs, resource utilization, etc. are more influential parameters for the selection of a database within an organization. There is still a need for further evaluation of distributed databases. / I am indebted to my advisor Prof. Lars Lundberg and his valuable ideas, which helped in the completion of this work. In fact, he has guided me at every crucial and important stage of this research work. I sincerely thank Prof. Markus Fiedler and Prof. Kurt Tutschku for their endless support during the work. I am grateful to Neeraj Garg, Sourab, Saket and Kulbir at Ericsson for providing me with the necessary equipment and helping me financially during my work. To my family members and friends who one way or the other shared their support: thank you. Above all I would like to thank the Supreme Personality of Godhead, the author of everything.
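The two metrics reported in this abstract, average response time and average throughput under a given number of client threads, can be computed by a harness shaped roughly like the sketch below. This is not the thesis's benchmarking application: the function name `run_benchmark` and its parameters are hypothetical, and the database call is replaced by an arbitrary Python callable so that the example runs without a MySQL Cluster or Cassandra installation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(operation, client_threads, ops_per_thread):
    """Run `operation` from several client threads and summarize the timings.

    `operation` is any zero-argument callable; in a real harness it would
    issue one INSERT or SELECT against the database under test.
    """

    def worker(_thread_index):
        latencies = []
        for _ in range(ops_per_thread):
            start = time.perf_counter()
            operation()
            latencies.append(time.perf_counter() - start)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=client_threads) as pool:
        per_thread = list(pool.map(worker, range(client_threads)))
    elapsed = time.perf_counter() - wall_start

    latencies = [t for thread_latencies in per_thread for t in thread_latencies]
    return {
        "operations": len(latencies),
        # Mean time a single request took, as seen by the client.
        "avg_response_time_s": sum(latencies) / len(latencies),
        # Completed requests per second of wall-clock time, across all threads.
        "throughput_ops_per_s": len(latencies) / elapsed,
    }


if __name__ == "__main__":
    # Stand-in workload: sleep 1 ms to imitate a round trip to a database.
    result = run_benchmark(lambda: time.sleep(0.001), client_threads=4, ops_per_thread=50)
    print(result)
```

Repeating the run while varying `client_threads` gives the kind of curve the results refer to when they compare the two databases for greater numbers of client threads. The numbers from this stub say nothing about either database.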
|
18 |
An aggregate navigator for data warehouse / Khandelwal, Nileshkumar. January 2000
Thesis (M.S.)--Ohio University, June, 2000. / Title from PDF t.p.
|
19 |
Networked International Classification of Diseases, ninth revision (ICD-9) coder a .NET approach / Krishnaprasad, Balaji. January 2003
Thesis (M.S.)--University of Florida, 2003. / Title from title page of source document. Includes vita. Includes bibliographical references.
|
20 |
Representation and optimization for data integration / Friedman, Marc T., January 1999
Thesis (Ph. D.)--University of Washington, 1999. / Includes bibliographical references (p. 144-155).
|