381
Content-based publish/subscribe systems over DHT-based Peer-to-Peer Networks
Aikaterinidis, Ioannis 18 April 2008 (has links)
In recent years, applications that continuously disseminate streams of information over the Internet have become particularly popular. With the ever-increasing rate at which new information items arrive, the need becomes ever more pressing for systems that deliver to each user only the information of interest, filtering out the huge volumes of information that are irrelevant to that user. The content-based publish/subscribe model is a dissemination model capable of providing exactly these properties, handling large volumes of data traffic in a distributed, fully decentralized manner.
Our main contribution in this area is the coupling of the content-based publish/subscribe model with structured (DHT-based) peer-to-peer networks, offering users high expressiveness in stating their interests while operating in a fully distributed and scalable environment. The core of the solutions proposed in this thesis is: (a) algorithms that process subscriptions, storing their keys at appropriate nodes of the network according to the content conditions they declare, and (b) algorithms that route publication events through the network so that they "meet" the nodes holding subscriptions that are satisfied by the publication's content.
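To make the idea concrete, the following is a minimal sketch, not the thesis's actual algorithms, of how a content-based subscription can be stored in a DHT by hashing an attribute name to the responsible node, and how a publication is then routed to that node for matching. A Chord-like DHT is assumed, and all class and function names here are illustrative.

```python
# Illustrative sketch only: subscriptions stored in a DHT by hashing
# attribute names; publications routed to the responsible nodes.
import hashlib

def dht_key(attribute: str, num_nodes: int) -> int:
    """Map an attribute name to the node responsible for it (Chord-style)."""
    digest = hashlib.sha1(attribute.encode()).hexdigest()
    return int(digest, 16) % num_nodes

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.subscriptions = []  # (sub_id, attribute, predicate)

    def store_subscription(self, sub_id, attribute, predicate):
        self.subscriptions.append((sub_id, attribute, predicate))

    def match(self, publication):
        """Return ids of stored subscriptions satisfied by the publication."""
        return [sub_id for sub_id, attr, pred in self.subscriptions
                if attr in publication and pred(publication[attr])]

nodes = [Node(i) for i in range(16)]

# A subscriber declares interest in publications with price < 100.
nodes[dht_key("price", 16)].store_subscription("s1", "price", lambda v: v < 100)

# A publication event is routed to the nodes responsible for its attributes.
event = {"price": 42, "symbol": "XYZ"}
for attr in event:
    hits = nodes[dht_key(attr, 16)].match(event)
    if hits:
        print(f"node {dht_key(attr, 16)} matched {hits}")
```

Hashing only attribute names, as in this toy version, would concentrate load on nodes responsible for popular attributes; fair load distribution under real workloads is one of the metrics the thesis evaluates.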
The proposed algorithms were implemented and thoroughly evaluated through detailed simulation-based experimentation. Their performance was studied using metrics such as: the fairness of the message processing load distributed across the network's nodes when handling subscriptions and publication events, the total number of messages generated, the total volume of additional traffic (network bandwidth) the algorithms introduce into the network, and the time required to match each publication event against the stored subscriptions.
382
Real-time monitoring of distributed real-time and embedded systems using Web
Puranik, Darshan Gajanan 03 January 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Asynchronous JavaScript and XML (AJAX) is the primary method for enabling asynchronous communication over the Web. Although AJAX provides warranted real-time capabilities to the Web, it requires unconventional programming methods at the expense of extensive resource usage. WebSockets, an emerging protocol, has the potential to address many challenges with implementing asynchronous communication over the Web. There has, however, been no in-depth study that quantitatively compares AJAX and WebSockets.

This thesis therefore provides two contributions to Web development. First, it provides an experience report for adding real-time monitoring support over the Web to the Open-source Architecture for Software Instrumentation of Systems (OASIS), which is open-source real-time instrumentation middleware for distributed real-time and embedded (DRE) systems. Second, it quantitatively compares using AJAX and WebSockets to stream collected instrumentation data over the Web in real time.

Results from the quantitative comparison between WebSockets and AJAX show that a WebSockets server consumes 50% less network bandwidth than an AJAX server; a WebSockets client consumes memory at a constant rate, not at an increasing rate; and WebSockets can send up to 215.44% more data samples when consuming the same amount of network bandwidth as AJAX.
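As a rough illustration of why WebSockets suit this workload, the sketch below pushes instrumentation samples to a connected client over a single long-lived connection, with no per-sample HTTP request/response overhead. It uses the third-party Python websockets package (version 13 or later is assumed for this server API); the metric name and rate are made up and are not OASIS's API.

```python
# Hypothetical push-based monitoring feed, sketched with the third-party
# "websockets" package (pip install "websockets>=13"). Unlike AJAX polling,
# the client performs one handshake and then receives every sample over
# the same TCP connection.
import asyncio
import json
import random

from websockets.asyncio.server import serve

async def stream_samples(connection):
    # Push a new (fabricated) instrumentation sample every 100 ms.
    while True:
        sample = {"metric": "cpu_util", "value": random.random()}
        await connection.send(json.dumps(sample))
        await asyncio.sleep(0.1)

async def main():
    async with serve(stream_samples, "localhost", 8765):
        await asyncio.get_running_loop().create_future()  # serve forever

if __name__ == "__main__":
    asyncio.run(main())
```

An AJAX client would instead poll with repeated HTTP requests, paying header overhead on every sample, which is consistent with the bandwidth and memory results reported above.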
383
Modeling, monitoring and optimization of discrete event systems using Petri nets
Yan, Jiaxiang 29 January 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / In recent decades, research on discrete event systems (DESs) has attracted more and more attention because of the fast development of intelligent control strategies. Such control measures combine conventional control strategies with discrete decision-making processes that simulate human decision making. Due to the scale and complexity of common DESs, dedicated models, monitoring methods, and optimal control strategies are necessary. Among the various DES models, Petri nets are well known for their advantages in dealing with asynchronous processes, and they have been widely applied in intelligent transportation systems (ITS) and communication technology in recent years. By encoding the Petri net state, we can also enable fault detection and identification capability in DESs and mitigate potential human errors. This thesis studies various problems in the context of DESs that can be modeled by Petri nets. In particular, we focus on systematic modeling, asynchronous monitoring, and the design of optimal control strategies for Petri nets.

The thesis starts by looking at the systematic modeling of ITS. A microscopic model of a signalized intersection and its two-layer timed Petri net representation are proposed, where the first layer represents the intersection and the second layer represents the traffic light system. Both deterministic and stochastic transitions are involved in this representation. The detailed operation of the representation is described, and its improvements over previous models are demonstrated by comparison.

We then study the asynchronous monitoring of sensor networks. An event sequence reconstruction algorithm for a given sensor network, based on asynchronous observations of its state changes, is proposed. We assume that the sensor network is modeled as a Petri net and that the asynchronous observations take the form of state (token) changes at different places in the net. More specifically, the observed sequences of state changes are provided by local sensors and are asynchronous, i.e., they contain only partial information about the ordering of the state changes that occur. We propose an approach that partitions the given net into several subnets and reconstructs the event sequence for each subnet. We then develop an algorithm that reconstructs the event sequences for the entire net that are consistent with: 1) the asynchronous observations of state changes; 2) the event sequences of each subnet; and 3) the structure of the given Petri net. We also discuss the algorithmic complexity.

The final problem studied in this thesis is the optimal design of Petri net controllers with fault-tolerant ability. In particular, we consider multiple-fault detection and identification in Petri nets that have state machine structures (i.e., every transition in the net has exactly one input place and one output place). We develop approximation algorithms to design fault-tolerant Petri net controllers that achieve the minimal number of connections with the original controller. A design example for an automated guided vehicle (AGV) system illustrates our approaches.
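For readers unfamiliar with the formalism, the sketch below shows a minimal Petri net with a marking vector and a token-based firing rule, plus a replay check of the kind an event sequence reconstruction must pass (a candidate sequence is consistent with the net's structure only if it is firable). This illustrates the model only, not the thesis's reconstruction algorithm.

```python
# Minimal Petri net sketch: a marking vector over places, and transitions
# defined by how many tokens they consume (pre) and produce (post).
import numpy as np

class PetriNet:
    def __init__(self, pre, post, marking):
        self.pre = np.array(pre)      # pre[t][p]: tokens consumed from place p by t
        self.post = np.array(post)    # post[t][p]: tokens produced into place p by t
        self.marking = np.array(marking)

    def enabled(self, t):
        return bool(np.all(self.marking >= self.pre[t]))

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        self.marking = self.marking - self.pre[t] + self.post[t]

    def replay(self, sequence):
        """Check a candidate event sequence for consistency with the net's
        structure: every transition must be enabled when it fires."""
        saved = self.marking.copy()
        try:
            for t in sequence:
                self.fire(t)
            return True
        except ValueError:
            return False
        finally:
            self.marking = saved  # restore the original marking

# Three places in a line: t0 moves a token p0 -> p1, t1 moves it p1 -> p2.
net = PetriNet(pre=[[1, 0, 0], [0, 1, 0]],
               post=[[0, 1, 0], [0, 0, 1]],
               marking=[1, 0, 0])
print(net.replay([0, 1]))  # True: a consistent ordering
print(net.replay([1, 0]))  # False: t1 cannot fire before t0
```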
384
Parallel acceleration of deadlock detection and avoidance algorithms on GPUs
Abell, Stephen W. 08 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Current mainstream computing systems have become increasingly complex. Most have Central Processing Units (CPUs) that invoke multiple threads for their computing tasks. The growing issue with these systems is resource contention, and with resource contention comes the risk of deadlock. Various software and hardware approaches exist that implement deadlock detection/avoidance techniques; however, they lack either the speed or the problem-size capability needed for real-time systems. The research conducted for this thesis aims to resolve issues present in past approaches by converging the two platforms (software and hardware) by means of the Graphics Processing Unit (GPU).

Presented in this thesis are two GPU-based deadlock detection algorithms and one GPU-based deadlock avoidance algorithm: (i) GPU-OSDDA, a GPU-based single-unit resource deadlock detection algorithm; (ii) GPU-LMDDA, a GPU-based multi-unit resource deadlock detection algorithm; and (iii) GPU-PBA, a GPU-based deadlock avoidance algorithm. Both GPU-OSDDA and GPU-LMDDA utilize the Resource Allocation Graph (RAG) to represent resource allocation status in the system, with the RAG represented as integer-length bit-vectors. The advantages of this approach are plenty: (i) less memory is required for the algorithm matrices, (ii) 32 computations are performed per instruction (in most cases), and (iii) the algorithms can handle large numbers of processes and resources. The detection algorithms also require minimal interaction with the CPU by keeping matrix storage and algorithm computations on the GPU, thus providing an interactive-service type of behavior. As a result, both algorithms achieved speedups of up to two orders of magnitude or more over their serial CPU implementations (3.17-317.42x for GPU-OSDDA and 37.17-812.50x for GPU-LMDDA). Lastly, GPU-PBA is the first parallel deadlock avoidance algorithm implemented on the GPU. While it does not achieve a two-orders-of-magnitude speedup over its CPU implementation, it provides a platform for future deadlock avoidance research on the GPU.
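The bit-vector encoding is the key trick: a row of the graph packs up to a machine word's worth of edges into one integer, so a single bitwise OR updates many edges at once. Below is a hedged, CPU-only Python sketch of that idea applied to cycle detection in a single-unit wait-for graph; it mirrors the spirit of GPU-OSDDA's representation but none of its GPU kernels.

```python
# Illustrative CPU sketch (not the GPU implementation): each row of the
# wait-for graph is packed into one Python integer, so OR-ing two rows
# updates a word's worth of edges per operation.
def detect_deadlock(wait_for):
    """wait_for[i] is an int whose bit j is set if process i waits on j.
    Returns True if the wait-for graph contains a cycle (a deadlock)."""
    n = len(wait_for)
    reach = list(wait_for)          # reach[i]: set of processes reachable from i
    for _ in range(n):              # iterate the closure to a fixed point
        changed = False
        for i in range(n):
            acc, bits, j = reach[i], reach[i], 0
            while bits:             # fold in the rows of all reachable nodes
                if bits & 1:
                    acc |= reach[j]
                bits >>= 1
                j += 1
            if acc != reach[i]:
                reach[i] = acc
                changed = True
        if not changed:
            break
    # A cycle exists iff some process can reach itself.
    return any((reach[i] >> i) & 1 for i in range(n))

# p0 waits on p1, p1 waits on p2, p2 waits on p0: a deadlock cycle.
print(detect_deadlock([0b010, 0b100, 0b001]))  # True
```

In the GPU versions, these row updates run in parallel across threads, with the matrices kept in GPU memory to minimize CPU interaction.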
385
High-performant, Replicated, Queue-oriented Transaction Processing Systems on Modern Computing Infrastructures
Thamir Qadah (11132985) 27 July 2021 (has links)
With the shifting landscape of computing hardware architectures and the emergence of new computing environments (e.g., large main-memory systems, hundreds of CPUs, distributed and virtualized cloud-based resources), state-of-the-art designs of transaction processing systems that rely on conventional wisdom suffer from lost performance optimization opportunities. This dissertation challenges conventional wisdom to rethink the design and implementation of transaction processing systems for modern computing environments.

We start by tackling the vertical hardware scaling challenge, and propose a deterministic approach to transaction processing on emerging multi-socket, many-core, shared-memory architectures to harness their unprecedented available parallelism. Our proposed priority-based, queue-oriented transaction processing architecture eliminates the transaction contention footprint and uses speculative execution to improve the throughput of centralized deterministic transaction processing systems. We build QueCC and demonstrate up to two orders of magnitude better performance over the state-of-the-art.

We further tackle the horizontal scaling challenge and propose a distributed queue-oriented transaction processing engine that relies on queue-oriented communication to eliminate the traditional overhead of commitment protocols for multi-partition transactions. We build Q-Store, and demonstrate up to 22x improvement in system throughput over state-of-the-art deterministic transaction processing systems.

Finally, we propose a generalized framework for designing distributed and replicated deterministic transaction processing systems. We introduce the concept of speculative replication to hide the latency overhead of replication. We prototype the speculative replication protocol in QR-Store and perform an extensive experimental evaluation using standard benchmarks. We show that QR-Store can achieve a throughput of 1.9 million replicated transactions per second in under 200 milliseconds, with a replication overhead of 8%-25% compared to non-replicated configurations.
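A loose sketch of the queue-oriented idea follows: a planning phase assigns each transaction in a batch a priority and splits its operations into per-partition queues, and an execution phase then drains each queue in priority order, so no locks or run-time contention arise. The field names and the modulo partitioning are hypothetical simplifications, not QueCC's actual design.

```python
# Hypothetical sketch of queue-oriented transaction processing: plan a
# batch into per-partition, priority-ordered operation queues, then
# execute each queue deterministically.
from collections import defaultdict

def plan(batch, num_partitions):
    """Turn a batch of transactions into per-partition operation queues."""
    queues = defaultdict(list)
    for priority, txn in enumerate(batch):    # batch order fixes priority
        for op in txn["ops"]:
            part = op["key"] % num_partitions # toy partitioning scheme
            queues[part].append((priority, op))
    for q in queues.values():
        q.sort(key=lambda entry: entry[0])    # deterministic execution order
    return queues

def execute(queues, store):
    for part, q in queues.items():
        for _, op in q:                       # one worker per partition, no locks
            if op["kind"] == "write":
                store[op["key"]] = op["value"]

store = {}
batch = [
    {"ops": [{"kind": "write", "key": 1, "value": "a"},
             {"kind": "write", "key": 2, "value": "b"}]},
    {"ops": [{"kind": "write", "key": 1, "value": "c"}]},
]
execute(plan(batch, num_partitions=2), store)
print(store)  # {1: 'c', 2: 'b'} -- the planned priority order always wins
```

Because the execution order is fixed at planning time, any replica given the same batch reaches the same state, which is the determinism that the replication work above builds on.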
386
Scalable Parallel Machine Learning on High Performance Computing Systems - Clustering and Reinforcement Learning
Weijian Zheng (14226626) 08 December 2022 (has links)
High-performance computing (HPC) and machine learning (ML) have been widely adopted by both academia and industry to address enormous data problems at extreme scales. While research has reported on the interactions of HPC and ML, achieving high performance and scalability for parallel and distributed ML algorithms is still a challenging task. This dissertation first summarizes the major challenges in applying HPC to ML applications: 1) poor performance and scalability, 2) loss of the convergence rate, 3) lower quality of the trained model, and 4) a lack of performance optimization techniques designed for specific applications. These challenges arise anew in each ML application; this dissertation shows how to solve them for two specific applications: 1) a clustering algorithm and 2) graph optimization algorithms that use reinforcement learning (RL).
As to the clustering algorithm, we first propose the simulated-annealing enhanced clustering algorithm. By combining a blocked data layout with asynchronous local optimization within each thread, the algorithm attains a convergence rate comparable to the K-means algorithm but with much higher performance. Experiments with synthetic and real-world datasets show that it is significantly faster than the MPI K-means library on up to 1024 cores; however, its optimization costs (Sum of Square Error, SSE) became higher than the original costs. To tackle this problem, we devise the full-step feel-the-way clustering algorithm, which takes L local steps within each block of data points and uses the first local step's results to compute accurate global optimization costs. Our results show that the full-step algorithm significantly reduces the number of global iterations needed to converge while obtaining low SSE costs, but the time spent on the local steps exceeds the benefit of the saved iterations. To improve this, we optimize the local-step time by incorporating a sampling-based method called reassignment-history-aware sampling. Extensive experiments with various synthetic and real-world datasets (e.g., MNIST, CIFAR-10, ENRON, and PLACES-2) show that our parallel algorithms can outperform the fastest open-source MPI K-means implementation by up to 110% on 4,096 CPU cores with comparable SSE costs.
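The following is a minimal sketch, under stated assumptions, of the two ingredients named above: a blocked pass over the data and an annealed local step that occasionally accepts a random reassignment so the search can escape poor local optima. The parameter names and the cooling schedule are illustrative; this is not the dissertation's parallel implementation.

```python
# Hedged sketch of a blocked, simulated-annealing style K-means step.
import numpy as np

def blocked_kmeans_step(X, centers, block_size, temperature, rng):
    """One pass over the data in blocks. Points are mostly assigned to the
    nearest center, but with probability 'temperature' a point is
    reassigned at random (the simulated-annealing element)."""
    k = len(centers)
    labels = np.empty(len(X), dtype=int)
    for start in range(0, len(X), block_size):       # blocked data layout
        block = X[start:start + block_size]
        d = ((block[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        jump = rng.random(len(block)) < temperature  # rare random moves
        nearest[jump] = rng.integers(0, k, jump.sum())
        labels[start:start + len(block)] = nearest
    for j in range(k):                               # recompute centers
        if (labels == j).any():
            centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(0)
X = rng.random((1000, 2))
centers = X[rng.choice(len(X), 4, replace=False)].copy()
for temperature in [0.05, 0.02, 0.0]:                # toy cooling schedule
    labels, centers = blocked_kmeans_step(X, centers, 256, temperature, rng)
```

In the dissertation's setting, each thread would own blocks and run its local optimization asynchronously; this sequential toy only shows the shape of the per-block computation.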
Our evaluations of the sampling-based feel-the-way algorithm establish the effectiveness of the local optimization strategy, the blocked data layout, and the sampling methods for addressing the challenges of applying HPC to ML applications. To explore more parallel strategies and optimization techniques, we focus on a more complex application: graph optimization problems solved using reinforcement learning (RL). RL has proved successful for automatically learning good heuristics to solve graph optimization problems. However, existing RL systems either do not support graph RL environments or do not support multiple or many GPUs in a distributed setting, which has compromised RL's ability to solve large-scale graph optimization problems due to the lack of parallelization and high scalability. To address the challenges of parallelization and scalability, we develop OpenGraphGym-MG, a high-performance distributed-GPU RL framework for solving graph optimization problems. OpenGraphGym-MG focuses on a class of computationally demanding RL problems in which both the RL environment and the policy model are highly computation-intensive. In this work, we distribute large-scale graphs across distributed GPUs and use spatial parallelism and data parallelism to achieve scalable performance; we compare and analyze the performance of spatial and data parallelism and highlight their differences. To support graph neural network (GNN) layers that take data samples partitioned across distributed GPUs as input, we design new parallel mathematical kernels to perform operations on distributed 3D sparse and 3D dense tensors. To handle costly RL environments, we design new parallel graph environments to scale up all RL-environment-related operations. By combining the scalable GNN layers with the scalable RL environments, we develop high-performance OpenGraphGym-MG training and inference algorithms in parallel.
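Spatial parallelism here means splitting the graph itself across GPUs rather than replicating it. A hedged, sequential mock of the idea (not OpenGraphGym-MG's kernels) is sketched below: a row-partitioned one-hop neighbor aggregation, the core operation of a GNN layer, reproduces the unpartitioned result.

```python
# Illustrative mock of spatial parallelism for a GNN-style aggregation:
# the graph's adjacency rows are split across "devices" (simulated
# sequentially here), each computing features for the nodes it owns.
import numpy as np

def spatial_aggregate(adj, feats, num_devices):
    """One-hop feature aggregation with rows of the graph partitioned
    across devices. In a real multi-GPU setting, the feature matrix
    would be communicated between devices instead of shared."""
    parts = np.array_split(np.arange(adj.shape[0]), num_devices)
    out = np.zeros_like(feats)
    for part in parts:                  # each slice maps to one GPU
        out[part] = adj[part] @ feats   # local rows, global features
    return out

rng = np.random.default_rng(0)
adj = (rng.random((8, 8)) < 0.3).astype(float)  # toy random graph
feats = rng.random((8, 4))
print(np.allclose(spatial_aggregate(adj, feats, 2), adj @ feats))  # True
```

Data parallelism, by contrast, would replicate the graph and split the training samples; the dissertation compares the two regimes and their communication costs.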
To summarize: after identifying the major challenges in applying HPC to ML applications, this thesis explores several parallel strategies and performance optimization techniques through two ML applications. Specifically, we propose a local optimization strategy, a blocked data layout, and sampling methods for accelerating the clustering algorithm; and we create a spatial parallelism strategy, a parallel graph environment, agent, and policy model, an optimized replay buffer, and a multi-node selection strategy for solving large optimization problems over graphs. Our evaluations prove the effectiveness of these strategies and demonstrate that our accelerations significantly outperform state-of-the-art ML libraries and frameworks without loss of quality in the trained models.