11

Parallel itemset mining in massively distributed environments / Fouille de motifs en parallèle dans des environnements massivement distribués

Salah, Saber 20 April 2016 (has links)
The volume of data keeps growing, to the point that we now speak of "Big Data". The main reason lies in advances in computing tools, which offer great flexibility for producing, and also for storing, ever larger quantities of data. Data analysis methods have always faced quantities that strain, or exceed, the available processing capacity. To overcome the technological barriers raised by these analysis problems, the community can turn to distributed computing techniques. In particular, pattern mining, one of the most widely studied problems in data mining, still often raises great difficulties in the context of massive distribution and parallelism. In this thesis, we address two major topics related to pattern mining: frequent patterns, and informative (i.e., high-entropy) patterns.
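To make the kind of task concrete, here is a toy, map-reduce-style sketch of counting candidate itemset support across partitions of a transaction database. It is a generic illustration of parallel frequent-pattern mining, not the algorithms proposed in the thesis; the partitioning and the minimum support value are invented for the example.

```python
# Generic sketch: per-partition support counting followed by a merge step.
# Not the thesis's algorithms; only illustrates the distributed pattern.

from collections import Counter
from itertools import combinations

def map_partition(transactions, k):
    """Local support counts of all k-itemsets in one data partition."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), k))
    return counts

def reduce_counts(partial_counts, min_support):
    """Merge per-partition counts and keep the frequent itemsets."""
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return {itemset: n for itemset, n in total.items() if n >= min_support}

# Example: two partitions of a toy transaction database.
p1 = [{"a", "b", "c"}, {"a", "b"}]
p2 = [{"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
partials = [map_partition(p, 2) for p in (p1, p2)]
print(reduce_counts(partials, min_support=3))
```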
12

The distribution of geographic information systems data in a computer communications network.

Veenendaal, Bert January 1999 (has links)
Geographic information systems (GIS) are developing in a rapidly expanding distributed environment. With the ever-increasing growth of computer networks and the Internet in particular, it is imperative that GIS take advantage of distributed data technologies to provide users and applications with shared and improved access to geographic data.
Geographic data distribution design is concerned with determining what data gets placed at which computer network sites and involves the issues of data partitioning, allocation and dynamic migration. Partitioning is concerned with how data, or fragments of the data, are apportioned to partitions. These partitions must then be assigned to network sites in an allocation process. Because application data usage and access change in a dynamic environment, migration strategies are necessary to redistribute the data. In order for data distribution to reflect current usage patterns of applications, the design process must obtain and accumulate data usage information from applications.
This dissertation first details the predicate fragmentation (PF) model. The core of the model is the PF tree, which has been developed and implemented to store and maintain usage information. User predicates, obtained from application queries, are inserted into the tree, and primitive predicates can be identified from the tree. These primitive predicates define the fragmentation from which a data distribution can be determined. Predicate insertion and pruning operations are essential to the maintenance of the tree structure.
A methodology that uses the PF model to obtain a partitioning, allocation and migration strategy is then outlined. The fragments identified from the PF trees are aggregated into partitions that are then assigned to individual network sites using a site access allocation strategy. A dynamic migration strategy then uses changes in the PF trees to identify the data that must be migrated to a new site in order to accommodate the changing application environment.
The implementation of the geographic data distribution methodology is referred to as GEODDIS. The methodology was tested and evaluated using a mineral occurrence application called GEOMINE, which was developed with the ArcInfo GIS. The results indicate that geographic data distribution performs well when successive applications have similar data usage requirements. For applications with very different data usage patterns, the performance decreases to the worst-case scenario where all the data must be transferred to the site where it is used. The tree pruning and data migration operations are essential to maintaining a PF tree structure and distribution that reflects the current data usage of applications.
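As a rough illustration of the PF idea described in the abstract, the sketch below accumulates query predicates with usage counters, prunes rarely used ones, and reports the predicates that would define fragments. The class names and the flat placement rule are assumptions for illustration, not the dissertation's actual model.

```python
# Toy predicate-fragmentation (PF) tree: collect query predicates, track usage,
# prune stale ones, and expose the predicates that define fragments.
# A real PF tree places predicates by implication; here everything hangs off
# the root to keep the sketch short.

class PFNode:
    def __init__(self, label, predicate):
        self.label = label            # e.g. "area = 'north'"
        self.predicate = predicate    # callable: feature -> bool
        self.hits = 0                 # usage count accumulated from queries
        self.children = []

class PFTree:
    def __init__(self):
        self.root = PFNode("TRUE", lambda f: True)

    def insert(self, label, predicate):
        """Record a user predicate taken from an application query."""
        for child in self.root.children:
            if child.label == label:          # same predicate seen again
                child.hits += 1
                return child
        node = PFNode(label, predicate)
        node.hits = 1
        self.root.children.append(node)
        return node

    def prune(self, min_hits=1):
        """Drop predicates whose usage fell below a threshold, so the tree
        (and hence the fragmentation) reflects current usage."""
        self.root.children = [c for c in self.root.children if c.hits >= min_hits]

    def primitive_predicates(self):
        """Predicates that would define the fragments handed to allocation."""
        return [c.label for c in self.root.children]

# Example: two GIS queries touching different attributes.
tree = PFTree()
tree.insert("area = 'north'", lambda f: f["area"] == "north")
tree.insert("mineral = 'gold'", lambda f: f["mineral"] == "gold")
print(tree.primitive_predicates())
```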
13

Efficient Methods for Arbitrary Data Redistribution

Bai, Sheng-Wen 21 July 2005 (has links)
In many parallel programs, run-time data redistribution is usually required to enhance data locality and reduce remote memory accesses on distributed-memory multicomputers. In heterogeneous computing environments, irregular data redistributions can be used to adjust data assignment. Since data redistribution is performed at run-time, there is a performance trade-off between the efficiency of the new data distribution for a subsequent phase of an algorithm and the cost of redistributing the array among processors. Efficient methods for performing data redistribution are therefore of great importance for the development of distributed-memory compilers for data-parallel programming languages. For regular data redistribution, two approaches are presented in this dissertation: an indexing approach and a packing/unpacking approach. In the indexing approach, we propose a generalized basic-cycle calculation (GBCC) technique to efficiently generate the communication sets for a BLOCK-CYCLIC(s) over P processors to BLOCK-CYCLIC(t) over Q processors data redistribution. In the packing/unpacking approach, we present a User-Defined Types (UDT) method that performs BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution using MPI user-defined datatypes; this method reduces the required memory buffers and avoids unnecessary movement of data. For irregular data redistribution, an Essential Cycle Calculation (ECC) method is presented. These methods were originally developed for one-dimensional arrays; multi-dimensional arrays can be handled by applying them dimension by dimension, starting from the first (last) dimension if the array is stored in column-major (row-major) order.
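A brute-force sketch of the communication sets such a redistribution needs may help: for every global index it computes the source and target owners under the two BLOCK-CYCLIC distributions. The owner formula is standard; the GBCC/ECC techniques themselves are not reproduced here, and the example parameters are arbitrary.

```python
# Brute-force communication sets for BLOCK-CYCLIC(s) over P -> BLOCK-CYCLIC(t) over Q.
# Efficient methods avoid scanning every element, e.g. by exploiting the fact
# that the send/receive pattern repeats every lcm(s*P, t*Q) elements.

from collections import defaultdict
from math import lcm

def owner(i, block, nprocs):
    """Owning processor of global index i under BLOCK-CYCLIC(block)."""
    return (i // block) % nprocs

def communication_sets(n, s, P, t, Q):
    """Map (source processor, target processor) -> global indices to transfer."""
    sets = defaultdict(list)
    for i in range(n):
        sets[(owner(i, s, P), owner(i, t, Q))].append(i)
    return sets

# Example: BLOCK-CYCLIC(2) over 3 processors -> BLOCK-CYCLIC(4) over 2 processors.
# One period of the pattern covers lcm(2*3, 4*2) = 24 elements.
for (src, dst), idx in sorted(communication_sets(lcm(2 * 3, 4 * 2), 2, 3, 4, 2).items()):
    print(f"source P{src} -> target Q{dst}: {idx}")
```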
14

Using TENA to Enable Next Generation Range Control and Data Distribution

Schmidt, Andrew, Wigent, Mark A. 10 1900 (has links)
ITC/USA 2014 Conference Proceedings / The Fiftieth Annual International Telemetering Conference and Technical Exhibition / October 20-23, 2014 / Town and Country Resort & Convention Center, San Diego, CA / There is a need for a capability that enables tests to be set up and executed, and new instrumentation to be integrated into the T&E range environment, more rapidly and reliably than with existing methods, and with reduced cost and effort. Moreover, individual ranges have developed range-specific approaches to range control and data distribution that call for significant interface development whenever new instrumentation and systems are integrated into the range environment, so there is a need for a range control and data distribution mechanism that can be reused throughout the T&E community. The purpose of the Next Generation Range Control and Data Distribution (NGRC&DD) project, funded by the Test Resource Management Center's (TRMC) Central Test and Evaluation Investment Program (CTEIP), is to develop a capability that modernizes and enhances system control and data distribution in DoD ranges. The Test and Training Enabling Architecture (TENA) is an underlying technology used by NGRC&DD. Migrating to the TENA middleware requires a fundamental reexamination of what data is produced and how it is distributed. TENA offers tools and mechanisms for ranges that are advantageous relative to traditional methods of data dissemination as well as to other middleware available to the community.
15

WINGS CONCEPT: PRESENT AND FUTURE

Harris, Jim, Downing, Bob 10 1900 (has links)
International Telemetering Conference Proceedings / October 20-23, 2003 / Riviera Hotel and Convention Center, Las Vegas, Nevada / The Western Aeronautical Test Range (WATR) of NASA’s Dryden Flight Research Center (DFRC) is facing a challenge in meeting the technology demands of future flight mission projects. Rapid growth in aircraft technology has resulted in complexity that often surpasses the capabilities of the current WATR real-time processing and display systems. These legacy systems are based on an architecture that is over a decade old. In response, the WATR has initiated the development of the WATR Integrated Next Generation System (WINGS). The purpose of WINGS is to acquire data from a variety of sources and process that data for analysis and display to Project Users in the WATR Mission Control Centers (MCCs), in real time, in near real time, and in post-mission analysis. The WINGS system architecture will bridge the continuing gap between new research flight test requirements and current capability by distributing the current system architectures to provide incremental and iterative system upgrades.
16

Dynamic Grid-Based Data Distribution Management in Large Scale Distributed Simulations

Roy, Amber Joyce 12 1900 (has links)
Distributed simulation is an enabling concept to support the networked interaction of models and real world elements that are geographically distributed. This technology has brought a new set of challenging problems to solve, such as Data Distribution Management (DDM). The aim of DDM is to limit and control the volume of the data exchanged during a distributed simulation, and reduce the processing requirements of the simulation hosts by relaying events and state information only to those applications that require them. In this thesis, we propose a new DDM scheme, which we refer to as dynamic grid-based DDM. A lightweight UNT-RTI has been developed and implemented to investigate the performance of our DDM scheme. Our results clearly indicate that our scheme is scalable and it significantly reduces both the number of multicast groups used, and the message overhead, when compared to previous grid-based allocation schemes using large-scale and real-world scenarios.
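A minimal sketch of the grid-based matching that DDM schemes of this kind perform: cells of the routing space act as multicast groups, and an update is sent only on cells shared with at least one subscriber. The 1-D closed-interval regions and cell size below are illustrative assumptions, not the thesis's scenarios.

```python
# Toy grid-based DDM matching: map regions to grid cells and keep only the
# cells where an update region and some subscription region overlap.

def cells(region, cell_size):
    """Grid cells overlapped by a 1-D closed region (lo, hi)."""
    lo, hi = region
    return set(range(int(lo // cell_size), int(hi // cell_size) + 1))

def matching_groups(update_region, subscription_regions, cell_size):
    """Multicast groups (cells) on which an update must actually be sent;
    restricting sends to these cells is what limits the data exchanged."""
    update_cells = cells(update_region, cell_size)
    needed = set()
    for sub in subscription_regions:
        needed |= update_cells & cells(sub, cell_size)
    return needed

# Example: one federate updates positions in [10, 35]; two others subscribe.
print(matching_groups((10, 35), [(0, 12), (50, 60)], cell_size=10))   # {1}
```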
17

Optimized Composition of Parallel Components on a Linux Cluster

Al-Trad, Anas January 2012 (has links)
We develop a novel framework for optimized composition of explicitly parallel software components with different implementation variants, given the problem size, data distribution scheme and processor group size on a Linux cluster. We consider two approaches (or two cases of the framework). In the first approach, dispatch tables are built using measurement data obtained offline by executions at some (sample) points in the ranges of the context properties; inter-/extrapolation is then used to perform the actual variant selection for a given execution context at run-time. In the second approach, a cost function for each component variant is provided by the component writer and used for variant selection. These cost functions can internally look up measurement tables, built either offline or at deployment time, for computation- and communication-specific primitives. In both approaches, the call to an explicitly parallel software component (with different implementation variants) is made via a dispatcher instead of calling a variant directly. As a case study, we apply both approaches to a parallel component for matrix multiplication with multiple implementation variants, implemented using the Message Passing Interface (MPI). The results show the reduction in execution time for the optimally composed applications compared to applications with hard-coded composition. In addition, the results compare estimated and measured times for each variant across different data distributions, processor group sizes and problem sizes.
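The first approach can be illustrated with a toy dispatcher: per-variant measurement tables gathered offline are inter-/extrapolated at run-time, and the variant with the smallest predicted time is chosen. The variant names and sample timings below are invented for illustration and are not the framework's actual data.

```python
# Toy dispatch-table variant selection: offline measurements per variant,
# linear inter-/extrapolation at run-time, cheapest predicted variant wins.

import bisect

# (problem size -> measured time in seconds), gathered offline per variant.
measurements = {
    "row_block_mm": {256: 0.08, 1024: 1.9, 4096: 35.0},
    "cannon_mm":    {256: 0.12, 1024: 1.4, 4096: 22.0},
}

def predict(table, n):
    """Linearly inter-/extrapolate a measured table at problem size n."""
    sizes = sorted(table)
    i = bisect.bisect_left(sizes, n)
    if i == 0:
        return table[sizes[0]]
    if i == len(sizes):
        return table[sizes[-1]]
    lo, hi = sizes[i - 1], sizes[i]
    w = (n - lo) / (hi - lo)
    return (1 - w) * table[lo] + w * table[hi]

def dispatch(n):
    """Pick the variant with the smallest predicted time for this context."""
    return min(measurements, key=lambda v: predict(measurements[v], n))

print(dispatch(512))   # whichever variant interpolates as fastest at n = 512
```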
18

Data Sharing And Access With A Corba Data Distribution Service Implementation

Dursun, Mustafa 01 September 2006 (has links) (PDF)
The Data Distribution Service (DDS) specification defines an API for the Data-Centric Publish-Subscribe (DCPS) model to achieve efficient data distribution in distributed computing environments. The lack of an interoperability architecture in the DDS specification obstructs data distribution between different, heterogeneous DDS implementations. In this thesis, DDS is implemented as a CORBA service to achieve interoperability, and a QoS policy is proposed for faster data distribution using CORBA features.
19

Data Distribution Service for Industrial Automation

Yang, Jinsong January 2012 (has links)
In industrial automation systems, there is usually a large volume of data that needs to be delivered to the right places at the right time. In addition, the many nodes in such systems are usually distributed, which increases complexity because more point-to-point Ethernet connections are needed in the network. Hence, it is necessary to apply data-centric design and reduce the connection complexity. The Data Distribution Service for Real-Time Systems (DDS) is a data-centric middleware specification adopted by the Object Management Group (OMG). It uses the Real-Time Publish-Subscribe protocol as its wire protocol and targets mission- and business-critical systems. The IEC 61499 Standard defines an open architecture for the next generation of distributed control and automation systems. This thesis presents the structure and key features of DDS and builds a model of a real-time distributed system based on the IEC 61499 Standard. A performance evaluation of DDS communication based on this model is then carried out, with traditional socket-based communication evaluated as a reference. The results largely show that DDS is a good solution for reducing the complexity of Ethernet connections in distributed systems and can be applied to some classes of industrial automation systems.
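A generic publish-subscribe sketch (deliberately not the OMG DDS API) can illustrate the data-centric idea: nodes couple to named topics rather than to each other, so adding a reader does not add point-to-point connections to every writer. The topic name and values below are made up.

```python
# Minimal, generic topic-based publish-subscribe: writers publish samples to a
# topic and all registered readers receive them, without writer-reader wiring.

class Topic:
    """A named data channel; readers register interest, writers publish."""
    def __init__(self, name):
        self.name = name
        self.readers = []

    def subscribe(self, callback):
        self.readers.append(callback)

    def publish(self, sample):
        for deliver in self.readers:
            deliver(sample)

# Example: one sensor value fanned out to a controller and a logger.
temperature = Topic("boiler/temperature")
temperature.subscribe(lambda v: print("controller sees", v))
temperature.subscribe(lambda v: print("logger records", v))
temperature.publish(87.5)
```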
20

Systém pro správu multimediálních dat a jejich distribuci / The system for multimedia data managing and their distribution

Paulech, Michal January 2016 (has links)
This thesis describes the design of a system for managing and distributing media files. The system allows users to upload media files in different formats; the files are then distributed to devices on which they are played. The system creates an overview of playback based on records that the devices send back to it. The thesis describes the technologies used to create the system, the structure of the system and its functions, and the system's implementation.
