61 |
Coping with value dependency for failure recovery in multidatabase systems / Sun, Yongmei, January 1997 (has links)
Thesis (M. Sc.), Memorial University of Newfoundland, 1998. / Restricted until June 1999. Bibliography: leaves 78-83.
|
62 |
Algorithmes décentralisés et asynchrones pour l'apprentissage statistique large échelle et application à l'indexation multimédia / Decentralized and asynchronous algorithms for large scale machine learning and application to multimedia indexing / Fellus, Jérôme 03 October 2017 (has links)
With the advent of the "data era", the amount of computational resources required by information processing systems has exploded, largely exceeding the technological evolution of modern processors. In machine learning in particular, massively distributed computation is the only practicable alternative.
Distributed algorithmics borrows most of its concepts from classical centralized and sequential algorithmics, where the system's behavior is defined as a sequence of instructions executed one after the other. The importance of communication between computation units is generally neglected and pushed back to implementation details. Yet, as the number of units grows, the impact of local operations vanishes behind the emergent effects of the large network of units. To preserve the desirable stability, predictability and programmability of centralized algorithmics, distributed computational paradigms must therefore encompass this graph-theoretical dimension.
This thesis proposes an algorithmic framework for large-scale machine learning that avoids two major drawbacks of classical methods, namely centralization and synchronization. We introduce several new algorithms based on decentralized and asynchronous Gossip protocols for clustering, density estimation, dimension reduction, classification and general convex optimization. These algorithms produce solutions identical to those of their centralized counterparts, while offering an appreciable speed-up on large networks at a very low communication cost.
These practical advantages are supported mathematically by a detailed convergence analysis. We finally illustrate the relevance of the proposed methods on multimedia indexing applications and real image classification tasks.
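The record itself contains no code; as a rough illustration of the kind of decentralized, asynchronous Gossip scheme such protocols build on, the following Python sketch simulates random pairwise gossip averaging. It is a toy under invented assumptions (node values, pairing scheme, round count), not the thesis's actual protocol.

```python
import random

def gossip_average(values, rounds=200, seed=0):
    """Simulate random pairwise gossip: at each step two nodes exchange
    and average their local estimates.  Every estimate converges to the
    global mean without any central coordinator."""
    rng = random.Random(seed)
    estimates = list(values)                  # one local estimate per node
    n = len(estimates)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)        # an asynchronous pairwise exchange
        mean = (estimates[i] + estimates[j]) / 2.0
        estimates[i] = estimates[j] = mean
    return estimates

if __name__ == "__main__":
    node_values = [1.0, 4.0, 7.0, 10.0]       # hypothetical per-node data
    print(gossip_average(node_values))        # all values close to 5.5
```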
|
63 |
Uma arquitetura para consultas a repositorios de biodiversidade na Web / An architecture to query biodiversity data on the Web / Gomes Junior, Luiz Celso, 1979- 18 May 2007 (has links)
Advisor: Claudia Maria Bauzer Medeiros / Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação
Abstract: Life on Earth forms a broad and complex network of interactions, which some experts estimate to comprise up to 80 million different species. Tackling biodiversity is essentially a distributed effort. A research institution, no matter how big, can only deal with a small fraction of this variety. Therefore, to carry out ecologically relevant biodiversity research, one must collect chunks of information on species and their habitats from a large number of institutions and correlate them using geographic, biological and ecological knowledge. The distribution and heterogeneity inherent to biodiversity data pose several challenges, such as how to find relevant information on the Web, how to resolve syntactic and semantic heterogeneity, and how to process a variety of ecological and spatial predicates. This dissertation presents an architecture that exploits advances in data interoperability and semantic Web technologies to meet these challenges. The solution relies on ontologies and annotated repositories to support data sharing, discovery and collaborative biodiversity research. A prototype using real data implements part of the architecture. / Master's / Databases / Master in Computer Science
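The record gives no implementation detail; the Python toy below only illustrates how ontology annotations can drive repository discovery of the kind the abstract describes. All terms, repository names and the subsumption rule are hypothetical.

```python
# Hypothetical fragment of an ontology: each term points to its broader term.
ONTOLOGY = {"jaguar": "felidae", "felidae": "carnivora", "carnivora": "mammalia"}

# Hypothetical repositories annotated with the ontology terms they cover.
REPOSITORIES = {
    "museum_a":    {"felidae"},
    "survey_b":    {"mammalia"},
    "herbarium_c": {"plantae"},
}

def broader_terms(term):
    """Return the query term together with all of its broader ancestors."""
    chain = [term]
    while term in ONTOLOGY:
        term = ONTOLOGY[term]
        chain.append(term)
    return set(chain)

def discover(term):
    """Find repositories whose annotations subsume the query term."""
    wanted = broader_terms(term)
    return [name for name, tags in REPOSITORIES.items() if tags & wanted]

print(discover("jaguar"))   # ['museum_a', 'survey_b']
```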
|
64 |
Distributed database support for networked real-time multiplayer games / Grimm, Henrik January 2002 (has links)
The focus of this dissertation is on large-scale and long-running networked real-time multiplayer games. In this type of game, each player controls one or more entities, which interact in a shared virtual environment. Three attributes - scalability, security, and fault tolerance - are considered essential for such games. The usual approaches to building them, using a client/server or peer-to-peer architecture, fail to achieve all three attributes. We propose a server-network architecture that supports these attributes. In this architecture, a cluster of servers collectively manages the game state, with each server managing a separate region of the virtual environment. We discuss how the architecture can be extended using proxies, and we compare it to other similar architectures. Further, we investigate how a distributed database management system can support the proposed architecture. Since efficiency is critical in these games, some properties of traditional database systems must be relaxed. We also show how methods for increasing scalability, such as interest management and dead reckoning, can be implemented in a database system. Finally, we suggest how the proposed architecture can be validated using a simulation of a large-scale game.
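Dead reckoning, mentioned above as a scalability method, sends a state update only when a peer's extrapolation would drift too far from the true state. The following Python sketch is a minimal, hypothetical illustration (2-D positions, a fixed drift threshold), not code from the dissertation.

```python
import math

class DeadReckoning:
    """Broadcast a state update only when peers' extrapolation of an
    entity's position would drift beyond a threshold; between updates,
    peers extrapolate from the last broadcast position and velocity."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.last_pos = (0.0, 0.0)    # last position sent to peers
        self.last_vel = (0.0, 0.0)    # last velocity sent to peers
        self.last_time = 0.0

    def predict(self, t):
        """Position peers would currently assume for this entity."""
        dt = t - self.last_time
        return (self.last_pos[0] + self.last_vel[0] * dt,
                self.last_pos[1] + self.last_vel[1] * dt)

    def maybe_send(self, t, pos, vel):
        """Return True (and record the new state) if an update is needed."""
        px, py = self.predict(t)
        drift = math.hypot(pos[0] - px, pos[1] - py)
        if drift > self.threshold:
            self.last_pos, self.last_vel, self.last_time = pos, vel, t
            return True
        return False

dr = DeadReckoning(threshold=1.0)
print(dr.maybe_send(1.0, (0.5, 0.0), (0.5, 0.0)))   # False: drift still small
print(dr.maybe_send(2.0, (3.0, 0.0), (1.5, 0.0)))   # True: extrapolation too far off
```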
|
65 |
Partition Aware Database Replication: A state-update transfer strategy based on PRiDe / Olby, Johan January 2007 (has links)
Distributed real-time databases can be used to support data sharing for applications in wireless ad-hoc networks. In such networks, the topology changes frequently and partitions may be unpredictable and last for an unbounded period. In this thesis, the existing database replication protocol PRiDe is extended to handle such long-lasting partitions. The protocol uses optimistic and detached replication to provide predictable response times in unpredictable networks, and forward conflict resolution to guarantee progress. The extension, pPRiDe, combines update-transfer and state-transfer strategies. Update transfer for intra-partition communication can reduce bandwidth usage and ease conflict resolution. State transfer for inter-partition conflicts removes the dependency on a common pre-merge state against which update messages would otherwise have to be applied. This makes resource usage independent of the life span of partitions, but at the cost of global data stability guarantees: pPRiDe can thus only provide per-partition guarantees. The protocol supports application-specific conflict resolution routines for both state and update conflicts. A basic simulator for mobile ad-hoc networks has been developed to validate that pPRiDe provides eventual consistency. pPRiDe shows that a hybrid change-propagation strategy can be beneficial in networks where collaboration through data sharing within long-lasting partitions and predictable resource usage are necessary. Such systems already require the conflict management routines that pPRiDe needs and can benefit from an existing protocol. In addition to pPRiDe and the simulator, this thesis provides a flexible object database suitable for future work and an implementation of PRiDe on top of that database.
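pPRiDe itself is not reproduced in this record; the Python sketch below only illustrates the hybrid idea of choosing update transfer inside a partition and state transfer when partitions merge, with an application-supplied conflict resolver. The data structures and version handling are invented for the example.

```python
def propagate(local, remote, same_partition, resolve):
    """Illustrative hybrid change propagation: update transfer inside a
    partition, state transfer when two partitions merge.  Replicas are
    modeled as dicts mapping key -> (version, value); `resolve` is an
    application-specific conflict resolution callback."""
    if same_partition:
        # Update transfer: replay the peer's updates, resolving write conflicts.
        for key, (version, value) in remote.items():
            if key not in local or local[key][0] < version:
                local[key] = (version, value)                    # newer update wins
            elif local[key][0] == version and local[key][1] != value:
                local[key] = (version + 1, resolve(local[key][1], value))
    else:
        # State transfer: no common pre-merge state is assumed, so whole
        # replica states are reconciled key by key.
        for key, (version, value) in remote.items():
            if key not in local:
                local[key] = (version, value)
            elif local[key][1] != value:
                merged = resolve(local[key][1], value)
                local[key] = (max(local[key][0], version) + 1, merged)
    return local

def newest_wins(ours, theirs):
    """Hypothetical application-specific resolver: keep the remote value."""
    return theirs

local = {"sensor_1": (3, 20.5)}
remote = {"sensor_1": (3, 21.0), "sensor_2": (1, 7.2)}
print(propagate(local, remote, same_partition=True, resolve=newest_wins))
# {'sensor_1': (4, 21.0), 'sensor_2': (1, 7.2)}
```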
|
66 |
Access Control and Storage of Distributed IoT Data / Mends, Diana 03 April 2018 (has links)
A class of databases known as Not only SQL (NoSQL) databases has grown rapidly in recent years. This growth has been fueled by high demand from businesses, since these databases offer a convenient way to store data and differ significantly from traditional relational databases. They make it easy to process unstructured data, offer a cloud-friendly approach, and scale by distributing data over many commodity computers. Most NoSQL databases are distributed across several locations, often spanning countries, and are known as geo-distributed cloud datastores.
We customize one of these datastores, Cassandra. Given the size of the database and of the applications accessing the stored data, it has been challenging to customize it to meet existing application Service Level Agreements (SLAs). We live in an era of data breaches, and even though some types of information are stripped of sensitive fields, there are often ways to re-identify the data and link it back to real persons or governments. Data stored in a given country is subject to that country's rules and regulations and to the security measures it requires for safeguarding consumer data.
In this thesis, we describe mechanisms for selectively replicating data in a large-scale NoSQL datastore in accordance with privacy and legal regulations. We introduce an easily extensible constraint language to express these policy constraints, implemented through a pluggable topology provider in Cassandra's configuration files. Experiments using the modified Cassandra trunk demonstrate that our techniques work well, respect response times, and improve read and write latencies.
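The thesis's constraint language is not reproduced in this record. As a rough sketch of the underlying idea - restricting which datacenters may hold replicas of a given class of data - the Python toy below uses invented tags, datacenter names and a fixed replication factor; it is not Cassandra's actual topology-provider API.

```python
# Hypothetical placement constraints: a data tag -> datacenters allowed
# to hold replicas of records carrying that tag.
CONSTRAINTS = {
    "eu_personal": {"dc_frankfurt", "dc_paris"},
    "public":      {"dc_frankfurt", "dc_paris", "dc_virginia"},
}

DATACENTERS = ["dc_frankfurt", "dc_paris", "dc_virginia"]

def place_replicas(record_tag, replication_factor=2):
    """Choose datacenters for a record so that every replica location
    satisfies the record's privacy/legal constraint."""
    allowed = [dc for dc in DATACENTERS if dc in CONSTRAINTS[record_tag]]
    if len(allowed) < replication_factor:
        raise ValueError("not enough compliant datacenters for " + record_tag)
    return allowed[:replication_factor]

print(place_replicas("eu_personal"))   # ['dc_frankfurt', 'dc_paris']
print(place_replicas("public"))        # ['dc_frankfurt', 'dc_paris']
```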
|
67 |
Data Mining-based Fragmentation for Query Optimization / Sridharan, Srilakshmi 27 October 2014 (has links)
No description available.
|
68 |
Developing distributed applications with distributed heterogeneous databases / Dixon, Eric Richard 19 May 2010 (has links)
This report identifies how Tuxedo fits into the scheme of distributed database processing. Tuxedo is an On-Line Transaction Processing (OLTP) system. Tuxedo was studied because it is the oldest and most widely used transaction processing system on UNIX. That means that it is established, extensively tested, and has the most tools available to extend its capabilities. The disadvantage of Tuxedo is that newer UNIX OLTP systems are often based on more advanced technology. For this reason, other OLTPs were examined to compare their additional capabilities with those offered by Tuxedo.
As discussed in Sections I and II, Tuxedo is modeled according to X/Open's Distributed Transaction Processing (DTP) model. The DTP model includes three pieces: Application Programs (APs), Transaction Monitors (TMs), and Resource Managers (RMs). Tuxedo provides a TM in the model and uses the XA specification to communicate with RMs (e.g. Informix). Tuxedo's TX specification, which defines communication between APs and TMs, is also being considered by X/Open as the standard interface between APs and TMs; there is currently no standard interface between those two pieces. Tuxedo conforms to all of X/Open's current standards related to the model.
Like the other major OLTPs for UNIX, Tuxedo is based on the client/server model. Tuxedo expands that support to include both synchronous and asynchronous service calls, an extension it calls the enhanced client/server model. Tuxedo also expands its OLTP support to allow distributed transactions to include databases on IBM-compatible Personal Computers (PCs) and proprietary mainframe (Host) systems. Tuxedo calls this extension Enterprise Transaction Processing (ETP). The name enterprise comes from the fact that, since Tuxedo supports database transactions spanning UNIX, PC, and Host systems, transactions can span the computer systems of entire businesses, or enterprises.
Tuxedo is not as robust as the distributed database system model presented by Date. Tuxedo requires programmer participation in providing the capabilities that Date says the distributed database manager should provide. The coordinating process is the process which is coordinating a global transaction. According to Date's model, agents exist on remote sites participating in the transaction in order to handle the calls to the local resource manager. In Tuxedo, the programmer must provide that agent code in the form of services.
Tuxedo does provide location transparency, but not in the form Date describes. Date describes location transparency as controlled by a global catalog. In Tuxedo, location transparency is provided by the location of servers as specified in the Tuxedo configuration file. Tuxedo also does not provide replication transparency as specified by Date. In Tuxedo, the programmer must write services which maintain replicated records.
Date also describes five problems faced by distributed database managers. The first problem is query processing. Tuxedo provides capabilities to fetch records from databases, but it does not support joins across distributed databases. The second problem is update propagation. Tuxedo does not provide replication transparency, but it does give programmers enough capabilities to reliably maintain replicated records. The third problem is concurrency control, which Tuxedo supports. The fourth problem is the commit protocol; Tuxedo uses the two-phase commit protocol. The fifth problem is the global catalog. Tuxedo does not have a global catalog.
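Since the report names two-phase commit as Tuxedo's commit protocol, a minimal Python sketch of the coordinator's side of that generic protocol follows. Participant names and voting behavior are hypothetical; this is the textbook protocol, not Tuxedo code.

```python
class Participant:
    """A resource manager taking part in a distributed transaction."""

    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit

    def prepare(self):
        """Phase 1: vote on whether this participant can commit."""
        return self.will_commit

    def commit(self):
        print(self.name, "committed")

    def rollback(self):
        print(self.name, "rolled back")

def two_phase_commit(participants):
    """Coordinator side: commit only if every participant votes yes in
    the prepare phase; otherwise roll back all participants."""
    if all(p.prepare() for p in participants):      # phase 1: prepare
        for p in participants:                       # phase 2: commit
            p.commit()
        return True
    for p in participants:                           # phase 2: abort
        p.rollback()
    return False

two_phase_commit([Participant("orders_db"), Participant("billing_db", will_commit=False)])
```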
The other comparison presented in the paper was between Tuxedo and the other major UNIX OLTPs: Transarc's Encina, Top End, and CICS. Tuxedo is the oldest and has the largest market share. This gives Tuxedo the advantage of being the most thoroughly tested and the most stable. Tuxedo also has the most tools available to extend its capabilities. The disadvantage Tuxedo has is that since it is the oldest, it is based on the oldest technology.
Transarc's Encina is the most advanced UNIX OLTP. Encina is based on DCE and supports multithreading. However, Encina has been slow to market and has had stability problems because of its advanced features. Also, since Encina is based on DCE, its success is tied to the success of DCE. Top End is less advanced than Encina, but more advanced than Tuxedo; it is also much more stable than Encina. However, Top End is only now being ported from the NCR machines on which it was originally built. CICS is not yet commercially available. CICS is good for companies with CICS code to port to UNIX and CICS programmers who are already experts. The disadvantage of CICS is that companies that already work with UNIX and do not use CICS will find its interface less natural than that of Tuxedo, which originated under UNIX. / Master of Science
|
69 |
A reference model for the process control domain of application / Dhevcharran, Nirvani 11 1900 (has links)
The process control domain is intrinsically complex and dynamic. It has proved difficult to construct and maintain process control systems under traditional software development methodologies. Object Orientation is the latest paradigm in software development. The reason for its widespread acceptance is that it allows the application of the principles of hierarchical structuring and component abstraction, which are essential in building large systems. It also promotes component reusability, which makes systems easier to maintain and modify. For the process control domain, these are important benefits. Furthermore, most process control systems have physical devices which can be modeled naturally as objects, with the timing and performance issues of each object directly addressed. A Target System Reference Model which addresses various aspects of the process control domain is proposed within this dissertation. The objective is to provide a frame of reference within which a process control system can function. / Computing / M. Sc. (Computer Science)
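The dissertation's reference model is not detailed in this record; the short Python sketch below only illustrates the idea of modeling a physical device as an object that carries its own timing constraint. The sensor, its sampling period and the deadline check are all invented for the example.

```python
import time

class TemperatureSensor:
    """A physical device modeled as an object; the sampling period is an
    explicit timing requirement attached to the object itself."""

    def __init__(self, name, period_s=0.5):
        self.name = name
        self.period_s = period_s          # required sampling period
        self.last_read = None             # timestamp of the previous sample

    def read(self):
        """Take a sample and report whether the timing requirement held."""
        now = time.monotonic()
        on_time = self.last_read is None or (now - self.last_read) <= self.period_s
        self.last_read = now
        value = 21.7                      # stand-in for real hardware I/O
        return value, on_time

boiler_probe = TemperatureSensor("boiler_probe", period_s=0.5)
print(boiler_probe.read())                # (21.7, True)
```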
|