About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Efficient Synchronized Data Distribution Management in Distributed Simulations

Tacic, Ivan 10 February 2005 (has links)
Data distribution management (DDM) is a mechanism to interconnect data producers and data consumers in a distributed application. Data producers provide useful data to consumers in the form of messages. For each message produced, DDM determines the set of data consumers interested in receiving the message and delivers it to those consumers. We are particularly interested in DDM techniques for parallel and distributed discrete event simulations. Thus far, researchers have treated synchronization of events (i.e., time management) and DDM independently of each other. This research focuses on how to realize time-managed DDM mechanisms. The main reason for time-managed DDM is to ensure that changes in the routing of messages from producers to consumers occur in a correct sequence. Time-managed DDM also avoids non-determinism in the federation execution, which may result in non-repeatable executions. An optimistic approach to time-managed DDM is proposed in which DDM events are allowed to be processed out of time stamp order, and a detection and recovery procedure is used to recover from such errors. These mechanisms are tailored to the semantics of the DDM operations to ensure an efficient realization. A correctness proof is presented to verify that the algorithm correctly synchronizes DDM events. We have developed a fully distributed implementation of the algorithm within the framework of the Georgia Tech Federated Simulation Development Kit (FDK) software. A performance evaluation of the synchronized DDM mechanism has been completed in a loosely coupled distributed system consisting of a network of workstations connected over a local area network (LAN). We compare time-managed versus unsynchronized DDM for two applications that exercise different mobility patterns: one based on a military simulation and a second utilizing a synthetic workload. The experiments and analysis illustrate that synchronized DDM performance depends on several factors: the simulation model (e.g., lookahead), the application's mobility patterns, and the network hardware (e.g., the size of network buffers). Under certain mobility patterns, time-managed DDM is as efficient as unsynchronized DDM. There are also mobility patterns where time-managed DDM overheads become significant, and we show how they can be reduced.
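To make the routing and ordering ideas above concrete, the following minimal Python sketch (not the thesis's FDK implementation; the Region and DDMRouter names and the rollback callback are invented for illustration) routes a message to every consumer whose subscription region overlaps the update region and flags DDM events that arrive out of time stamp order, the situation an optimistic detection-and-recovery mechanism has to handle.

# Illustrative sketch only: interest-based routing plus a minimal straggler check.
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    lo: float
    hi: float
    def overlaps(self, other: "Region") -> bool:
        return self.lo <= other.hi and other.lo <= self.hi

class DDMRouter:
    def __init__(self, on_rollback):
        self.subscriptions = {}          # consumer id -> subscription Region
        self.last_processed_ts = float("-inf")
        self.on_rollback = on_rollback   # recovery hook for out-of-order events

    def handle_subscription(self, ts, consumer, region):
        # A DDM event with a timestamp in the "past" is a straggler: routing
        # decisions already made after last_processed_ts may be wrong.
        if ts < self.last_processed_ts:
            self.on_rollback(ts)
        self.subscriptions[consumer] = region
        self.last_processed_ts = max(self.last_processed_ts, ts)

    def route(self, ts, update_region, message):
        self.last_processed_ts = max(self.last_processed_ts, ts)
        # Return the consumers the message would be delivered to.
        return [c for c, r in self.subscriptions.items() if r.overlaps(update_region)]

router = DDMRouter(on_rollback=lambda ts: print(f"straggler at t={ts}, recovering"))
router.handle_subscription(1.0, "tank-42", Region(0.0, 10.0))
router.handle_subscription(1.5, "uav-7", Region(5.0, 6.0))      # in timestamp order
print(router.route(2.0, Region(8.0, 12.0), "position update"))  # ['tank-42']
router.handle_subscription(0.5, "late", Region(0.0, 1.0))       # straggler detected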
32

Data Distribution Management In Large-scale Distributed Environments

Gu, Yunfeng 15 February 2012 (has links)
Data Distribution Management (DDM) deals with two basic problems: how to distribute data generated at the application layer among the underlying nodes in a distributed system, and how to retrieve that data whenever it is necessary. This thesis explores DDM in two different network environments: peer-to-peer (P2P) overlay networks and cluster-based network environments. DDM in P2P overlay networks is treated as a broader concept of building and maintaining a P2P overlay architecture rather than a simple data-fetching scheme, and is closely related to the more commonly known associative searching or queries. DDM in the cluster-based network environment is one of the important services provided by simulation middleware to support real-time distributed interactive simulations. The only feature common to DDM in both environments is that both are built to provide a data indexing service. Because of these fundamental differences, we have designed and developed a novel distributed data structure, the Hierarchically Distributed Tree (HD Tree), to support range queries in P2P overlay networks. All the relevant problems of a distributed data structure, including scalability, self-organization, fault tolerance, and load balancing, have been studied. Both theoretical analysis and experimental results show that the HD Tree is able to give a complete view of system states when processing multi-dimensional range queries at different levels of selectivity and in various error-prone routing environments. For the cluster-based network environment, a novel DDM scheme, the Adaptive Grid-based DDM scheme, is proposed to improve DDM performance. This new DDM scheme evaluates the input size of a simulation based on probability models. The optimum DDM performance is best approached by adapting the simulation to run in the mode that is most appropriate to its size.
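As a rough illustration of the grid-based matching idea that the Adaptive Grid-based DDM scheme builds on (the adaptive, probability-model part of the thesis is not reproduced here; the cell size and federate names are assumptions), subscriptions and updates are mapped onto grid cells and matched per cell.

from collections import defaultdict

CELL = 10.0  # grid cell size; the thesis adapts the scheme to the simulation's input size

def cells(lo, hi):
    """Grid cells covered by a 1-D extent [lo, hi]."""
    return range(int(lo // CELL), int(hi // CELL) + 1)

class GridDDM:
    def __init__(self):
        self.cell_subs = defaultdict(set)   # cell index -> subscribing federates

    def subscribe(self, federate, lo, hi):
        for c in cells(lo, hi):
            self.cell_subs[c].add(federate)

    def match(self, lo, hi):
        # Union of subscribers over the cells an update region touches;
        # cell granularity trades exact matching for cheap lookups.
        out = set()
        for c in cells(lo, hi):
            out |= self.cell_subs[c]
        return out

ddm = GridDDM()
ddm.subscribe("radar-1", 0, 25)
ddm.subscribe("radar-2", 40, 55)
print(ddm.match(20, 35))   # {'radar-1'}: shares cell 2 with radar-1, none with radar-2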
34

Differential evolution technique on weighted voting stacking ensemble method for credit card fraud detection

Dolo, Kgaugelo Moses 12 1900 (has links)
Differential Evolution is a population-based stochastic search optimization technique that is powerful and efficient over continuous spaces for solving differentiable and non-linear optimization problems. The weighted voting stacking ensemble method is an important technique that combines various classifier models. However, selecting the appropriate weights of the classifier models for the correct classification of transactions is a problem. This research study is therefore aimed at exploring whether the Differential Evolution optimization method is a good approach for defining the weighting function. Manual and random selection of weights for voting on credit card transactions has previously been carried out; however, a large number of fraudulent transactions were not detected by the classifier models, which means that a technique to overcome the weaknesses of the classifier models is required. Thus, the problem of selecting the appropriate weights was viewed in this study as a weight optimization problem. The dataset was downloaded from the Kaggle competition data repository. Various machine learning algorithms were used to weight-vote the class of each transaction, and the Differential Evolution optimization technique was used as the weighting function. In addition, the Synthetic Minority Oversampling Technique (SMOTE) and Safe Level Synthetic Minority Oversampling Technique (SL-SMOTE) oversampling algorithms were modified to preserve the definition of SMOTE while improving performance. The results generated from this research study show that the Differential Evolution optimization method is a good weighting function, which can be adopted as a systematic weight function for the weighted voting stacking ensemble of various classification methods. / School of Computing / M. Sc. (Computing)
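A hedged sketch of the core idea, using placeholder data and base models rather than the study's Kaggle dataset and SMOTE pipeline: Differential Evolution searches for the voting weights that maximize the F1 score of the minority (fraud) class on a validation set.

import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced placeholder data standing in for the credit card transactions.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(max_depth=5),
          RandomForestClassifier(n_estimators=100, random_state=0)]
probas = np.column_stack([m.fit(X_tr, y_tr).predict_proba(X_val)[:, 1] for m in models])

def neg_f1(w):
    # Weighted soft vote; DE minimizes, so return the negative F1 of the fraud class.
    w = np.abs(w) / (np.abs(w).sum() + 1e-12)
    return -f1_score(y_val, (probas @ w >= 0.5).astype(int))

result = differential_evolution(neg_f1, bounds=[(0, 1)] * len(models), seed=0)
print("weights:", np.round(result.x, 3), "best F1:", round(-result.fun, 3))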
35

REAL-TIME INTEGRATION OF RADAR INFORMATION, AND GROUND AND RADIOSONDE METEOROLOGY WITH FLIGHT RESEARCH DATA

Billings, Don, Wei, Mei, Leung, Joseph, Aoyagi, Michio, Shigemoto, Fred, Honeyman, Rob 10 1900 (has links)
International Telemetering Conference Proceedings / October 26-29, 1998 / Town & Country Resort Hotel and Convention Center, San Diego, California / Although PCM/TDM framed data is one of the most prevalent formats handled by flight test ranges, ranges are often required to acquire and process other types of data as well. Examples of such non-standard data types are radar position information and meteorological data from both ground-based and radiosonde systems. To facilitate the processing and management of such non-standard data types, a microprocessor-based system was developed to acquire them and transform them into a standard PCM/TDM data frame. This obviated the expense of developing additional special software and hardware to handle such non-standard data types.
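As an illustration of the general approach only (the paper's actual hardware, word layout, scale factors, and sync pattern are not given here, so the ones below are assumptions), non-standard measurements can be packed into fixed word positions of a PCM/TDM minor frame so a standard decommutator can process them.

import struct

FRAME_SYNC = 0xFE6B2840  # assumed 32-bit frame sync pattern

def build_minor_frame(radar, met, frame_id):
    # radar: (azimuth_deg, elevation_deg, range_m); met: (temp_c, pressure_hpa)
    words = [
        int(radar[0] * 100) & 0xFFFF,      # azimuth in 0.01-degree counts
        int(radar[1] * 100) & 0xFFFF,      # elevation in 0.01-degree counts
        int(radar[2]) & 0xFFFF,            # slant range, metres (low word)
        int(met[0] * 10 + 1000) & 0xFFFF,  # temperature, offset-binary 0.1 C
        int(met[1] * 10) & 0xFFFF,         # pressure, 0.1 hPa
        frame_id & 0xFFFF,                 # subframe counter
    ]
    return struct.pack(">I6H", FRAME_SYNC, *words)

frame = build_minor_frame(radar=(123.45, 10.2, 4821), met=(21.7, 1013.2), frame_id=7)
print(len(frame), frame.hex())   # 16-byte frame: sync word plus six 16-bit data words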
36

Jobzentrisches Monitoring in Verteilten Heterogenen Umgebungen mit Hilfe Innovativer Skalierbarer Methoden / Job-centric monitoring in distributed heterogeneous environments using innovative scalable methods

Hilbrich, Marcus 24 June 2015 (has links) (PDF)
An increasing number of program executions (jobs) is an ongoing trend in scientific computing. Increasing numbers of available compute cores and lower access barriers, based on portal systems, workflow systems, or services, drive this trend. At the same time, the abstraction layers that enable grid and cloud solutions pose challenges in observing job behaviour; thus, observation and monitoring capabilities for large numbers of jobs are lacking. Job-centric monitoring offers a solution to present job executions in a transparent manner. This dissertation presents methods for scalable infrastructures that handle monitoring data of jobs in grid, cloud, and HPC (High Performance Computing) solutions. A layer-based organisation of servers with a distributed storage scheme enables task sharing that respects network bandwidths and storage capacities. Additionally, three proposed automatic analysis techniques enable the evaluation of huge data quantities. One of the developed algorithms is based on cross-correlation and uses a tree-based optimisation strategy to decrease both runtime and memory usage. These three methods are able to significantly reduce the number of jobs for manual analysis from many thousands to the few interesting jobs that exhibit outlier behaviour during job execution. Contributions of this thesis include a design, a prototype implementation, and an evaluation, via measurements and theoretical analysis, of the methods that analyse large amounts of job data, as well as of the scalable storage concept for such data.
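A hedged sketch of the cross-correlation analysis idea (not the dissertation's optimised, tree-based algorithm; the data, threshold, and metric are made up): each job's monitoring time series is compared against the group's median profile, and jobs with a low peak correlation are flagged as the few outliers worth manual inspection.

import numpy as np

rng = np.random.default_rng(0)
normal = np.sin(np.linspace(0, 6, 200))                 # typical CPU-load profile
jobs = {f"job-{i}": normal + rng.normal(0, 0.05, 200) for i in range(50)}
jobs["job-7"] = rng.normal(0.5, 0.3, 200)               # one misbehaving job

reference = np.median(np.stack(list(jobs.values())), axis=0)

def peak_xcorr(a, b):
    """Peak of the normalised cross-correlation between two series."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return np.correlate(a, b, mode="full").max()

scores = {name: peak_xcorr(series, reference) for name, series in jobs.items()}
outliers = [name for name, s in scores.items() if s < 0.8]
print(outliers)   # only the jobs worth inspecting by hand, e.g. ['job-7']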
37

Uma Análise Comparativa entre Sistemas de Combinação de Classificadores com Distribuição Vertical dos Dados / A comparative analysis of classifier combination systems with vertical data distribution

Santana, Laura Emmanuella Alves dos Santos 01 February 2008 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / In systems that combine the outputs of classification methods (combination systems), such as ensembles and multi-agent systems, one of the main constraints is that the base components (classifiers or agents) should be diverse among themselves. In other words, there is clearly no accuracy gain in a system composed of a set of identical base components. One way of increasing diversity is through the use of feature selection or data distribution methods in combination systems. In this work, an investigation of the impact of using data distribution methods among the components of combination systems is performed. In this investigation, different methods of data distribution are used and an analysis of the combination systems, using several different configurations, is carried out. As a result of this analysis, the aim is to detect which combination systems are more suitable for using feature distribution among their components.
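A minimal sketch of the vertical (feature) distribution idea, with a placeholder dataset and classifiers rather than the thesis's experimental setup: each ensemble member is trained on a disjoint subset of the features, and the members' votes are combined by majority.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Vertical distribution: disjoint feature groups, one per ensemble member.
feature_groups = np.array_split(np.random.default_rng(0).permutation(X.shape[1]), 3)
members = [(cols, GaussianNB().fit(X_tr[:, cols], y_tr)) for cols in feature_groups]

# Majority vote over the members' predictions.
votes = np.stack([clf.predict(X_te[:, cols]) for cols, clf in members])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble accuracy:", accuracy_score(y_te, ensemble_pred))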
38

Squelettes algorithmiques pour la programmation et l'exécution efficaces de codes parallèles / Algorithmic skeletons for efficient programming and execution of parallel codes

Legaux, Joeffrey 13 December 2013 (has links)
Parallel architectures have now reached every computing device, but software developers generally lack the skills to program them through explicit models such as MPI or the Pthreads. There is a need for more abstract models such as algorithmic skeletons, which are a structured approach. They can be viewed as higher-order functions that represent the behaviour of common parallel algorithms, and these are combined by the programmer to generate parallel programs. Programmers want to obtain better performance through the use of parallelism, but the development time involved is also an important factor. Algorithmic skeletons provide interesting results in both of these respects. The Orléans Skeleton Library (OSL) provides a set of algorithmic skeletons for data parallelism within the bulk synchronous parallel model for the C++ language. It uses advanced metaprogramming techniques to obtain good performance. We improved OSL in order to obtain better performance from its generated programs, and extended its expressivity. We wanted to analyse the ratio between the performance of programs and the development effort needed within OSL and other parallel programming models. The comparison between parallel programs written within OSL and their equivalents in low-level parallel models shows a much better productivity for high-level models: they are easy to use for the programmers while providing decent performance.
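To illustrate the higher-order-function view of skeletons in a few lines (a toy Python analogue, not OSL's C++ skeletons within the BSP model): the map and reduce skeletons hide the parallel machinery, and the user program is just their composition.

from functools import reduce
from multiprocessing import Pool

def par_map(f, data, workers=4):
    """Data-parallel map skeleton: applies f to every element using a process pool."""
    with Pool(workers) as pool:
        return pool.map(f, data)

def par_reduce(op, data):
    """Reduction skeleton (sequential here; a BSP version would tree-combine)."""
    return reduce(op, data)

def square(x):        # user code: a top-level function so it can be pickled
    return x * x

if __name__ == "__main__":
    # The program is a composition of skeletons: the sum of squares of 1..10**5.
    result = par_reduce(lambda a, b: a + b, par_map(square, range(1, 10**5 + 1)))
    print(result)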
39

Effective Automatic Computation Placement and Data Allocation for Parallelization of Regular Programs

Chandan, G January 2014 (has links) (PDF)
Scientific applications that operate on large data sets require huge amounts of computation power and memory. These applications are typically run on High Performance Computing (HPC) systems that consist of multiple compute nodes connected over a network interconnect such as InfiniBand. Each compute node has its own memory and does not share the address space with other nodes. A significant amount of work has been done in the past two decades on parallelizing for distributed-memory architectures. A majority of this work went into developing compiler technologies such as High Performance Fortran (HPF) and partitioned global address space (PGAS) languages. However, several steps involved in achieving good performance remained manual. Hence, the approach currently used to obtain the best performance is to rely on highly tuned libraries such as ScaLAPACK. The objective of this work is to improve automatic compiler and runtime support for distributed-memory clusters for regular programs. Regular programs typically use arrays as their main data structure, and array accesses are affine functions of outer loop indices and program parameters. Many scientific applications, such as linear-algebra kernels, stencils, partial differential equation solvers, data-mining applications, and dynamic programming codes, fall into this category. In this work, we propose techniques for finding the computation mapping and data allocation when compiling regular programs for distributed-memory clusters. Techniques for transformation and detection of parallelism, relying on the polyhedral framework, already exist. We propose automatic techniques to determine computation placements for the identified parallelism and the allocation of data. We model the problem of finding a good computation placement as a graph partitioning problem with constraints to minimize both communication volume and load imbalance for the entire program. We show that our approach for computation mapping is more effective than those that can be developed using vendor-supplied libraries. Our approach for data allocation is driven by tiling of data spaces along with a compiler-assisted runtime scheme to allocate and deallocate tiles on demand and reuse them. Experimental results on some sequences of BLAS calls demonstrate a mean speedup of 1.82× over versions written with ScaLAPACK. Besides enabling weak scaling for distributed memory, data tiling also improves locality for shared-memory parallelization. Experimental results on a 32-core shared-memory SMP system show a mean speedup of 2.67× over code that is not data tiled.
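A hedged sketch of the placement objective described above, not the polyhedral tooling itself: tiles of a 1-D stencil iteration space play the role of graph nodes, edges carry the halo data exchanged between neighbouring tiles, and a tile-to-node placement is scored by communication volume and load imbalance.

def block_placement(num_tiles, num_nodes):
    """Contiguous block mapping: tile t -> the compute node that owns it."""
    per_node = -(-num_tiles // num_nodes)          # ceiling division
    return [t // per_node for t in range(num_tiles)]

def score(placement, tile_work, halo_elems, num_nodes):
    # Communication volume: halo data crosses the network only when
    # neighbouring tiles have different owners.
    comm = sum(halo_elems for t in range(len(placement) - 1)
               if placement[t] != placement[t + 1])
    # Load imbalance: maximum node load relative to the average load.
    loads = [0] * num_nodes
    for t, n in enumerate(placement):
        loads[n] += tile_work[t]
    imbalance = max(loads) / (sum(loads) / num_nodes)
    return comm, imbalance

tiles, nodes = 16, 4
placement = block_placement(tiles, nodes)
print(placement)                                   # [0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3]
print(score(placement, tile_work=[1] * tiles, halo_elems=1024, num_nodes=nodes))
# (3072, 1.0): three inter-node halo exchanges, perfectly balanced load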
40

Vývoj datového skladu na platformě Teradata a Informatica v sektoru pojišťovnictví / Data warehousing on technological platform TERADATA and Informatica in the insurance industry

Šiler, Zdeněk January 2012 (has links)
This thesis focuses on data warehousing on the technological platform of TERADATA and Informatica PowerCenter (hereafter IFPC). TERADATA provides a robust database system for storing large volumes of data and processing queries over such data. Informatica PowerCenter is a tool for developing ETL processes. Both tools belong to a mature technology stack for developing large data warehouses that store large volumes of data across the enterprise. The thesis analyses both tools for building a data warehouse and the specifics of their use in the insurance sector. The thesis is divided into two main thematic sections, a theoretical and a practical part. The theoretical part describes the TERADATA database system and the IFPC ETL tool in detail, including an analysis of business intelligence architecture in the insurance segment, which often uses this platform for data warehouse development. The thesis describes the architecture of the TERADATA database system and its approach to data storage and query processing. It then characterizes the specific features that must be considered when developing a data warehouse on TERADATA, and analyses the platform's advantages and disadvantages. The TERADATA database system is also compared with competing database systems. The thesis further deals with the general characteristics of the IFPC ETL tool, namely its software architecture and components, and examines the advantages and disadvantages of IFPC compared to its competitors on the market. The conclusion of the theoretical part analyses the synergy between TERADATA and IFPC, and the thesis explains the real benefits of combining them. The practical part of the thesis demonstrates the use of these tools for data warehouse development on a real project, Unification of client data. This project covers the entire development process in a data warehouse, from business requirements through functional and technical design to the implementation of ETL mappings in Informatica PowerCenter, and deals with bug fixing during ETL development and with testing methods. The practical part focuses on the implementation of selected mappings in IFPC as deployed in the insurance sector. Part of this thesis is also a comparison of IFPC with the SSIS ETL tool integrated into MS SQL Server 2008 R2.
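Purely as an illustration of what a client-data unification step does (the real project implements this as ETL mappings in Informatica PowerCenter, not in Python; the matching key and the staging records below are invented), duplicate client records are grouped by a normalized key and merged into one master record before loading.

from collections import defaultdict

staging = [
    {"client_id": 101, "name": "Jan Novak", "birth": "1980-05-01", "policy": "P-1"},
    {"client_id": 245, "name": "Jan Novák", "birth": "1980-05-01", "policy": "P-9"},
    {"client_id": 310, "name": "Eva Svobodova", "birth": "1975-11-30", "policy": "P-3"},
]

def match_key(rec):
    # Simplistic unification key: normalized name plus birth date.
    name = rec["name"].lower().replace("á", "a").replace("é", "e")
    return (name, rec["birth"])

groups = defaultdict(list)
for rec in staging:
    groups[match_key(rec)].append(rec)

unified = [
    {"master_id": min(r["client_id"] for r in recs),
     "name": recs[0]["name"],
     "birth": recs[0]["birth"],
     "policies": [r["policy"] for r in recs]}
    for recs in groups.values()
]
print(unified)   # the two Jan Novak records collapse into one master client record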
