11 |
Parallel itemset mining in massively distributed environments / Fouille de motifs en parallèle dans des environnements massivement distribués
Salah, Saber 20 April 2016 (has links)
Data volumes keep growing, to the point that we now speak of "Big Data". The main reason lies in advances in computing tools, which offer great flexibility for producing, but also for storing, ever larger quantities of data. Data analysis methods have always been confronted with quantities that strain, or exceed, available processing capacity. To overcome the technological barriers associated with these analysis questions, the community can turn to distributed computing techniques. In particular, pattern mining, one of the most studied problems in data mining, still often presents great difficulties in the context of massive distribution and parallelism. In this thesis, we address two major topics related to pattern mining: frequent patterns, and informative (i.e., high-entropy) patterns.
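As an illustration of the frequent-pattern problem the thesis addresses, a minimal centralized level-wise (Apriori-style) miner can be sketched as follows; this is only a sketch of the problem, not the thesis's distributed algorithm, and all names are illustrative:

```python
from collections import Counter

def frequent_itemsets(transactions, min_support):
    """Level-wise search: frequent k-itemsets are built from
    frequent (k-1)-itemsets, since any subset of a frequent
    itemset must itself be frequent."""
    transactions = [frozenset(t) for t in transactions]
    # frequent 1-itemsets
    counts = Counter(i for t in transactions for i in t)
    level = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(level)
    k = 2
    while level:
        # candidate k-itemsets: unions of frequent (k-1)-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                level[cand] = support
        result.update(level)
        k += 1
    return result
```

In a massively distributed setting the support-counting loop is the part that gets parallelized, with each worker counting over its partition of the transaction database and partial counts being aggregated.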
|
12 |
The distribution of geographic information systems data in a computer communications network.
Veenendaal, Bert January 1999 (has links)
Geographic information systems (GIS) are developing in a rapidly expanding distributed environment. With the ever-increasing growth of computer networks and the Internet in particular, it is imperative that GIS take advantage of distributed data technologies to provide users and applications with shared and improved access to geographic data.

Geographic data distribution design is concerned with determining what data gets placed at which computer network sites and involves the issues of data partitioning, allocation and dynamic migration. Partitioning is concerned with how data, or fragments of the data, are apportioned to partitions. These partitions must then be assigned to network sites in an allocation process. Because data usage and access by applications changes in a dynamic environment, migration strategies are necessary to redistribute the data. In order for data distribution to reflect current usage patterns of applications, the design process must obtain and accumulate data usage information from applications.

This dissertation first details the predicate fragmentation (PF) model. The core of the model is the PF tree, which has been developed and implemented to store and maintain usage information. User predicates, obtained from application queries, are inserted into the tree, and primitive predicates can be identified from it. These primitive predicates define the fragmentation from which a data distribution can be determined. Predicate insertion and pruning operations are essential to the maintenance of the tree structure.

A methodology that uses the PF model to obtain a partitioning, allocation and migration strategy is then outlined. The fragments identified from the PF trees are aggregated into partitions that are then assigned to individual network sites using a site access allocation strategy.
A dynamic migration strategy then uses changes in the PF trees to identify the data that must be migrated to a new site in order to accommodate the changing application environment.

The implementation of the geographic data distribution methodology is referred to as GEODDIS. The methodology was tested and evaluated using a mineral occurrence application called GEOMINE, which was developed with the ArcInfo GIS. The results indicate that geographic data distribution performs well when successive applications have similar data usage requirements. For applications with very different data usage patterns, the performance decreases to the worst case scenario where all the data must be transferred to the site where it is used. The tree pruning and data migration operations are essential to maintaining a PF tree structure and distribution that reflects the current data usage of applications.
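The internals of the PF model are not reproduced here, but its underlying idea, fragments defined by which predicates a record satisfies, allocated to the sites that access those predicates most, can be sketched as follows (a simplified illustration; the function names and data shapes are assumptions, not the dissertation's API):

```python
def fragment_and_allocate(records, predicates, site_access):
    """Group records into fragments by the set of predicates each one
    satisfies, then allocate each fragment to the network site whose
    observed accesses to those predicates are highest."""
    # fragment key = frozenset of names of predicates the record satisfies
    fragments = {}
    for rec in records:
        key = frozenset(name for name, pred in predicates.items() if pred(rec))
        fragments.setdefault(key, []).append(rec)
    # site_access[site][predicate_name] = accumulated access count
    allocation = {}
    for key in fragments:
        best = max(site_access,
                   key=lambda s: sum(site_access[s].get(p, 0) for p in key))
        allocation[key] = best
    return fragments, allocation
```

In the dissertation the predicates come from application queries and are maintained in the PF tree, so the fragmentation and the access statistics evolve as usage changes; this sketch shows only a single static snapshot of that process.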
|
13 |
Efficient Methods for Arbitrary Data Redistribution
Bai, Sheng-Wen 21 July 2005 (has links)
In many parallel programs, run-time data redistribution is usually required to enhance data locality and reduce remote memory access on distributed memory multicomputers. For heterogeneous computing environments, irregular data redistributions can be used to adjust data assignment. Since data redistribution is performed at run-time, there is a performance trade-off between the efficiency of the new data distribution for a subsequent phase of an algorithm and the cost of redistributing arrays among processors. Thus, efficient methods for performing data redistribution are of great importance for the development of distributed memory compilers for data-parallel programming languages.
For regular data redistribution, two approaches are presented in this dissertation: an indexing approach and a packing/unpacking approach. In the indexing approach, we propose a generalized basic-cycle calculation (GBCC) technique to efficiently generate the communication sets for a BLOCK-CYCLIC(s) over P processors to BLOCK-CYCLIC(t) over Q processors data redistribution. In the packing/unpacking approach, we present a User-Defined Types (UDT) method to perform BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution using MPI user-defined datatypes. This method reduces the required memory buffers and avoids unnecessary movement of data. For irregular data redistribution, an Essential Cycle Calculation (ECC) method is presented.
The above methods were originally developed for one-dimensional arrays. However, multi-dimensional arrays can also be redistributed by simply applying these methods dimension by dimension, starting from the first (last) dimension if the array is stored in column-major (row-major) order.
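The communication sets involved in a BLOCK-CYCLIC(s) over P to BLOCK-CYCLIC(t) over Q redistribution can be illustrated with a naive sketch that scans every global index; techniques such as GBCC avoid this O(n) scan by exploiting the periodicity of the mapping (the pattern repeats with period lcm(s·P, t·Q)), so this is only the definition those techniques optimize:

```python
from collections import defaultdict

def owner(g, blk, nprocs):
    """Owning processor of global index g under a BLOCK-CYCLIC(blk)
    distribution: block number modulo the processor count."""
    return (g // blk) % nprocs

def communication_sets(n, s, p, t, q):
    """For an array of n elements, list the global indices each source
    processor (BLOCK-CYCLIC(s) over p procs) must send to each
    destination processor (BLOCK-CYCLIC(t) over q procs)."""
    sets = defaultdict(list)
    for g in range(n):
        sets[(owner(g, s, p), owner(g, t, q))].append(g)
    return sets
```

For example, redistributing 8 elements from BLOCK-CYCLIC(2) over 2 processors to BLOCK-CYCLIC(1) over 2 processors yields four send sets of two indices each, and the whole pattern would repeat every lcm(2·2, 1·2) = 4 elements.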
|
14 |
Using TENA to Enable Next Generation Range Control and Data Distribution
Schmidt, Andrew, Wigent, Mark A. 10 1900 (has links)
ITC/USA 2014 Conference Proceedings / The Fiftieth Annual International Telemetering Conference and Technical Exhibition / October 20-23, 2014 / Town and Country Resort & Convention Center, San Diego, CA / There is a need for a capability that enables setup and execution of tests, including integration of new instrumentation into the T&E range environment more rapidly and reliably than with existing methods, and with reduced cost and effort. Moreover, because individual ranges have developed approaches to range control and data distribution which are often range-specific and which call for significant interface development when integrating new instrumentation and systems to the range environment, there is a need to develop a range control and data distribution mechanism that can be reused throughout the T&E community. The purpose of the Next Generation Range Control and Data Distribution (NGRC&DD) project, which is funded by the Test Resource Management Center's (TRMC) Central Test and Evaluation Investment Program (CTEIP), is to develop a capability that modernizes and enhances system control and data distribution in DoD ranges. The Test and Training Enabling Architecture (TENA) is an underlying technology used by NGRC&DD. Migrating to the TENA middleware requires a fundamental reexamination of what data is produced and how it is distributed. TENA offers some tools and mechanisms for ranges that are advantageous relative to traditional methods of data dissemination as well as other versions of middleware available to the community.
|
15 |
WINGS CONCEPT: PRESENT AND FUTURE
Harris, Jim, Downing, Bob 10 1900 (has links)
International Telemetering Conference Proceedings / October 20-23, 2003 / Riviera Hotel and Convention Center, Las Vegas, Nevada / The Western Aeronautical Test Range (WATR) of NASA’s Dryden Flight Research Center (DFRC) is facing a challenge in meeting the technology demands of future flight mission projects. Rapid growth in technology for aircraft has resulted in complexity often surpassing the capabilities of the current WATR real-time processing and display systems. These current legacy systems are based on an architecture that is over a decade old. In response, the WATR has initiated the development of the WATR Integrated Next Generation System (WINGS). The purpose of WINGS is to provide the capability to acquire data from a variety of sources and process that data for subsequent analysis and display to Project Users in the WATR Mission Control Centers (MCCs) in real-time, near real-time and subsequent post-mission analysis. The WINGS system architecture will bridge the continuing gap between new research flight test requirements and capability by distributing current system architectures to provide incremental and iterative system upgrades.
|
16 |
Dynamic Grid-Based Data Distribution Management in Large Scale Distributed Simulations
Roy, Amber Joyce 12 1900 (has links)
Distributed simulation is an enabling concept to support the networked interaction of models and real world elements that are geographically distributed. This technology has brought a new set of challenging problems to solve, such as Data Distribution Management (DDM). The aim of DDM is to limit and control the volume of the data exchanged during a distributed simulation, and reduce the processing requirements of the simulation hosts by relaying events and state information only to those applications that require them. In this thesis, we propose a new DDM scheme, which we refer to as dynamic grid-based DDM. A lightweight UNT-RTI has been developed and implemented to investigate the performance of our DDM scheme. Our results clearly indicate that our scheme is scalable and it significantly reduces both the number of multicast groups used, and the message overhead, when compared to previous grid-based allocation schemes using large-scale and real-world scenarios.
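The core of grid-based DDM, approximating region intersection by overlap of grid cells, can be sketched as follows (a static illustration with assumed names; the thesis's dynamic scheme additionally adjusts the grid and multicast-group assignments at run-time):

```python
def cells(region, cell_size):
    """Grid cells overlapped by an axis-aligned region (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return {(cx, cy)
            for cx in range(int(x0 // cell_size), int(x1 // cell_size) + 1)
            for cy in range(int(y0 // cell_size), int(y1 // cell_size) + 1)}

def matches(update_regions, subscription_regions, cell_size):
    """Pairs (publisher, subscriber) whose update and subscription
    regions share at least one grid cell; only these pairs need to
    exchange events, which is what limits the traffic."""
    pairs = set()
    for pub, ur in update_regions.items():
        uc = cells(ur, cell_size)
        for sub, sr in subscription_regions.items():
            if uc & cells(sr, cell_size):
                pairs.add((pub, sub))
    return pairs
```

In an RTI each grid cell typically maps to a multicast group, so a publisher sends only to the groups of the cells its update region overlaps; cell overlap can over-approximate true region intersection, which is the precision/overhead trade-off the cell size controls.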
|
17 |
Optimized Composition of Parallel Components on a Linux Cluster
Al-Trad, Anas January 2012 (has links)
We develop a novel framework for optimized composition of explicitly parallel software components with different implementation variants, given the problem size, data distribution scheme and processor group size on a Linux cluster. We consider two approaches (or two cases of the framework). In the first approach, dispatch tables are built using measurement data obtained offline by executions for some (sample) points in the ranges of the context properties. Inter-/extrapolation is then used to do the actual variant selection for a given execution context at run-time. In the second approach, a cost function for each component variant is provided by the component writer for variant selection. These cost functions can internally look up measurement tables, built either offline or at deployment time, for computation- and communication-specific primitives. In both approaches, the call to an explicitly parallel software component (with different implementation variants) is made via a dispatcher instead of calling a variant directly. As a case study, we apply both approaches to a parallel component for matrix multiplication with multiple implementation variants. We implemented our variants using the Message Passing Interface (MPI). The results show a reduction in execution time for the optimally composed applications compared to applications with hard-coded composition. In addition, the results compare estimated and measured times for each variant using different data distributions, processor group sizes and problem sizes.
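The first approach, a dispatch table with inter-/extrapolation over offline measurements, can be sketched as follows (illustrative names only; the real context also includes the data distribution scheme and processor group size, not just the problem size):

```python
import bisect

def interpolate(samples, n):
    """Linearly inter-/extrapolate the measured time at problem size n.
    samples: sorted list of (size, time) pairs, at least two entries."""
    sizes = [s for s, _ in samples]
    i = bisect.bisect_left(sizes, n)
    # clamp to an interior segment so extrapolation reuses the
    # nearest measured interval at either end of the range
    i = max(1, min(i, len(samples) - 1))
    (x0, y0), (x1, y1) = samples[i - 1], samples[i]
    return y0 + (y1 - y0) * (n - x0) / (x1 - x0)

def dispatch(variants, measurements, n):
    """Select the variant with the smallest predicted time at size n;
    the caller then invokes that variant instead of a hard-coded one."""
    return min(variants, key=lambda v: interpolate(measurements[v], n))
```

This mirrors the crossover behaviour the abstract alludes to: a simple variant can win at small sizes while a more scalable one wins at large sizes, and the dispatcher picks per call.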
|
18 |
Data Sharing And Access With A Corba Data Distribution Service Implementation
Dursun, Mustafa 01 September 2006 (has links) (PDF)
The Data Distribution Service (DDS) specification defines an API for the Data-Centric Publish-Subscribe (DCPS) model to achieve efficient data distribution in distributed computing environments. The lack of an interoperability architecture in the DDS specification obstructs data distribution between different, heterogeneous DDS implementations. In this thesis, DDS is implemented as a CORBA service to achieve interoperability, and a QoS policy is proposed to speed up data distribution using CORBA features.
|
19 |
Data Distribution Service for Industrial Automation
Yang, Jinsong January 2012 (has links)
In industrial automation systems, there is usually a large volume of data that needs to be delivered to the right places at the right time. In addition, the large number of nodes in such systems is usually distributed, which adds complexity because many point-to-point Ethernet connections are needed in the network. Hence, it is necessary to apply data-centric design and reduce the connection complexity. The Data Distribution Service for Real-Time Systems (DDS) is a data-centric middleware specification adopted by the Object Management Group (OMG). It uses the Real-Time Publish-Subscribe protocol as its wire protocol and targets mission- and business-critical systems. The IEC 61499 Standard defines an open architecture for the next generation of distributed control and automation systems. This thesis presents the structure and key features of DDS and builds a model of a real-time distributed system based on the IEC 61499 Standard. A performance evaluation of DDS communication based on this model is then carried out. Traditional socket-based communication is also evaluated to act as a reference for the DDS communication. The results of the evaluation largely show that DDS is a good solution for reducing the complexity of Ethernet connections in distributed systems and can be applied to some classes of industrial automation systems.
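The data-centric contrast with point-to-point sockets can be illustrated with a minimal in-process topic bus (a toy sketch only; a real DDS implementation adds discovery, QoS policies and the RTPS wire protocol):

```python
class Bus:
    """Minimal topic bus illustrating data-centric publish-subscribe:
    a writer publishes to a topic name and every reader of that topic
    receives the sample, so writers never address readers directly and
    no per-pair connection is configured."""

    def __init__(self):
        self.readers = {}  # topic name -> list of reader callbacks

    def subscribe(self, topic, callback):
        self.readers.setdefault(topic, []).append(callback)

    def publish(self, topic, sample):
        for cb in self.readers.get(topic, []):
            cb(sample)
```

With n writers and m readers of a topic, the socket approach needs up to n times m configured connections, while the topic decouples them to n plus m endpoints, which is the connection-complexity reduction the evaluation measures.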
|
20 |
Systém pro správu multimediálních dat a jejich distribuci / The system for multimedia data managing and their distribution
Paulech, Michal January 2016 (has links)
This thesis describes the design of a system for managing media files and distributing them. The system allows users to upload media files in different formats. The media files are distributed to devices on which they are played. The system creates an overview of playback based on the records that the devices send back to it. The thesis describes the technologies used to create the system. Furthermore, it describes the structure of the system, its functions, and its implementation.
|