Global ETD Search

1	Mining constraints for Testing and Verification Wu, Weixin 06 February 2009 (has links) With the advances in VLSI and System-On-Chip (SOC) technologies, the complexity of hardware systems has increased manifold. The increasing complexity poses serious challenges to digital hardware design. Functional verification has become one of the most expensive and time-consuming components of the current product development cycle. Today, design verification alone often surpasses 70% of the total development cost and the situation has been projected to continue to worsen. The two most widely used formal methods for design verification are Equivalence Checking and Model Checking. During the design phase, hardware goes through several stages of optimizations for area, speed, power, etc. Determining the functional correctness of the design after each optimization step by means of exhaustive simulation can be prohibitively expensive. An alternative way to prove functional correctness of the optimized design is to determine the design's functional equivalence with respect to some golden model which is known to be functionally correct. Efficient techniques to perform this process are known as Equivalence Checking. Equivalence Checking requires that the implementation circuit be functionally equivalent to the specification circuit. The complexity of Equivalence Checking can be exponential in the circuit size in the worst case. Since Equivalence Checking of sequential circuits still remains a challenging problem, in this thesis, we first address this problem using efficient learning techniques. In contrast to the traditional learning methods, our method employs a mining algorithm to discover global constraints among several nodes efficiently in a sequential circuit. 
In a Boolean satisfiability (SAT) based framework for bounded sequential equivalence checking, by taking advantage of the repeated search space, our mining algorithm is performed only on a small window of the unrolled circuit, and the mined relations can be reused subsequently. These powerful relations, when added as new constraint clauses to the original formula, significantly increase the deductive power of the SAT engine, thereby pruning a larger portion of the search space. Consequently, the memory and time required to solve these problems are reduced. We also propose a pseudo-functional test generation method based on effective functional constraint extraction. We use mining techniques to extract a set of multi-node functional constraints, which consist of illegal states and internal signal correlations. The functional constraints are then imposed on an ATPG tool to generate pseudo-functional delay tests. / Master of Science Learning Simulation SAT Multi-node Constraint Mining
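The idea of adding mined relations as extra constraint clauses can be illustrated with a minimal sketch. The toy circuit, the variable numbering, and the brute-force enumeration below are all assumptions made for illustration; they stand in for a real SAT solver and are not the mining algorithm of the thesis. The sketch only checks the property that makes such clauses safe to add: a relation implied by the circuit leaves the set of satisfying assignments unchanged.

```python
from itertools import product

def satisfies(assignment, clauses):
    """True if every clause has at least one literal that holds."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def models(clauses, num_vars):
    """Enumerate all satisfying assignments by brute force (toy sizes only)."""
    found = set()
    for bits in product([False, True], repeat=num_vars):
        assignment = {v + 1: bits[v] for v in range(num_vars)}
        if satisfies(assignment, clauses):
            found.add(bits)
    return found

# Hypothetical toy circuit in DIMACS-style literals:
#   x3 = x1 AND x2, x4 = x1 OR x2
formula = [
    (-3, 1), (-3, 2), (3, -1, -2),   # x3 <-> (x1 AND x2)
    (4, -1), (4, -2), (-4, 1, 2),    # x4 <-> (x1 OR x2)
]

# A "mined" two-node relation: x3 -> x4, written as the clause (NOT x3 OR x4).
mined_clause = (-3, 4)

original = models(formula, 4)
strengthened = models(formula + [mined_clause], 4)
print(original == strengthened)  # the added clause removes no valid assignment
```

In a real flow the benefit is not a change in the solution set but earlier pruning: the solver can deduce x4 from x3 in one propagation step instead of rediscovering it during search.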
2	Optimization of Cross-Layer Network Data based on Multimedia Application Requirements Rahman, Tasnim 15 August 2019 (has links) This thesis proposes a convex network utility maximization (NUM) problem that can be solved to optimize a cross-layer network based on user and system defined requirements for quality and link capacity of multimedia applications. The problem can also be converged to a distributed solution using dual decomposition. Current techniques do not address the changing system's requirements for the network in addition to the user's requirements for an application when optimizing a cross-layer network, but rather focus on optimizing a dynamic network to conform to a real-time application or for a specific performance. Optimizing the cross-layer network for the changing system and user requirements allows a more accurate optimization of the overall cross-layer network of any given multi-node, ad-hoc wireless application for data transmission quality and link capacity to meet overall mission demands. Convex Optimization Network Utility Application Requirements Swarm Network Multi-node Network
3	Fast Static Learning and Inductive Reasoning with Applications to ATPG Problems Dsouza, Michael Dylan 03 March 2015 (has links) Relations among various nodes in the circuit, as captured by static and inductive invariants, have shown to have a positive impact on a wide range of EDA applications. Techniques such as boolean constraint propagation for static learning and assume-then-verify approach to reason about inductive invariants have been possible due to efficient SAT solvers. Although a significant amount of research effort has been dedicated to the development of effective invariant learning techniques over the years, the computation time for deriving powerful multi-node invariants is still a bottleneck for large circuits. Fast computation of static and inductive invariants is the primary focus of this thesis. We present a novel technique to reduce the cost of static learning by intelligently identifying redundant computations that may not yield new invariants, thereby achieving significant speedup. The process of inductive invariant reasoning relies on the assume-then-verify framework, which requires multiple iterations to complete, making it infeasible for cases with a large set of multi-node invariants. We present filtering techniques that can be applied to a diverse set of multi-node invariants to achieve a significant boost in performance of the invariant checker. Mining and reasoning about all possible potential multi-node invariants is simply infeasible. To alleviate this problem, strategies that narrow down the focus on specific types of powerful multi-node invariants are also presented. Experimental results reflect the promise of these techniques. As a measure of quality, the invariants are utilized for untestable fault identification and to constrain ATPG for path delay fault testing, with positive results. / Master of Science Static learning inductive reasoning multi-node invariants logic implications boolean constraint propagation
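As a rough illustration of static learning by Boolean constraint propagation, the sketch below assumes a value at one node, propagates unit clauses, and records every forced literal as an implication. The toy CNF and the function name `unit_propagate` are hypothetical, and this is the textbook procedure only, not the accelerated technique the thesis proposes.

```python
def unit_propagate(clauses, assumption):
    """Return the literals forced by BCP after assuming one literal.

    Literals are DIMACS-style integers. Returns None if propagation
    reaches a conflict (some clause has all literals falsified).
    """
    assigned = {assumption}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause already satisfied
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return None  # conflict: every literal is false
            if len(open_lits) == 1:
                assigned.add(open_lits[0])  # unit clause forces this literal
                changed = True
    return assigned - {assumption}

# Hypothetical toy circuit: x3 = x1 AND x2, x4 = x1 OR x2
formula = [
    (-3, 1), (-3, 2), (3, -1, -2),
    (4, -1), (4, -2), (-4, 1, 2),
]

# Each forced literal b after assuming a is a learned implication a -> b;
# its contrapositive (NOT b -> NOT a) comes for free.
forced_by_x3 = unit_propagate(formula, 3)
forced_by_not_x4 = unit_propagate(formula, -4)
print(sorted(forced_by_x3), sorted(forced_by_not_x4))
```

Repeating this for every node and both polarities is what makes static learning expensive on large circuits, which is the cost the abstract sets out to reduce.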
4	Sequential Equivalence Checking of Circuits with Different State Encodings by Pruning Simulation-based Multi-Node Invariants Yuan, Zeying 05 October 2015 (has links) Verification is an important step for Integrated Circuit (IC) design. In fact, literature has reported that up to 70% of the design effort is spent on checking if the design is functionally correct. One of the core verification tasks is Equivalence Checking (EC), which attempts to check if two structurally different designs are functionally equivalent for all reachable states. Powerful equivalence checking can also provide opportunities for more aggressive logic optimizations, meeting different goals such as smaller area, better performance, etc. The success of Combinational Equivalence Checking (CEC) has laid a foundation to industry-level combinational logic synthesis and optimization. However, Sequential Equivalence Checking (SEC) still faces much challenge, especially for those complex circuits that have different state encodings and few internal signal equivalences. In this thesis, we propose a novel simulation-based multi-node inductive invariant generation and pruning technique to check the equivalence of sequential circuits that have different state encodings and very few equivalent signals between them. By first grouping flip-flops into smaller subsets to make it scalable for large designs, we then propose a constrained logic synthesis technique to prune potential multi-node invariants without inadvertently losing important constraints. Our pruning technique guarantees the same conclusion for different instances (proving SEC or not) compared to previous approaches in which merging of such potential invariants might lose important relations if the merged relation does not turn out to be a true invariant. Experimental results show that the smaller invariant set can be very effective for sequential equivalence checking of such hard SEC instances. 
Our approach is up to 20x faster than previous mining-based methods for larger circuits. / Master of Science Sequential Equivalence Checking (SEC) Multi-node Inductive Invariants Constrained Logic Synthesis
5	Transforming and Optimizing Irregular Applications for Parallel Architectures Zhang, Jing 12 February 2018 (has links) Parallel architectures, including multi-core processors, many-core processors, and multi-node systems, have become commonplace, as it is no longer feasible to improve single-core performance through increasing its operating clock frequency. Furthermore, to keep up with the exponentially growing desire for more and more computational power, the number of cores/nodes in parallel architectures has continued to dramatically increase. On the other hand, many applications in well-established and emerging fields, such as bioinformatics, social network analysis, and graph processing, exhibit increasing irregularities in memory access, control flow, and communication patterns. While multiple techniques have been introduced into modern parallel architectures to tolerate these irregularities, many irregular applications still execute poorly on current parallel architectures, as their irregularities exceed the capabilities of these techniques. Therefore, it is critical to resolve irregularities in applications for parallel architectures. However, this is a very challenging task, as the irregularities are dynamic, and hence, unknown until runtime. To optimize irregular applications, many approaches have been proposed to improve data locality and reduce irregularities through computational and data transformations. However, there are two major drawbacks in these existing approaches that prevent them from achieving optimal performance. First, these approaches use local optimizations that exploit data locality and regularity locally within a loop or kernel. However, in many applications, there is hidden locality across loops or kernels. Second, these approaches use "one-size-fits-all'' methods that treat all irregular patterns equally and resolve them with a single method. 
However, many irregular applications have complex irregularities, which are mixtures of different types of irregularities and need differentiated optimizations. To overcome these two drawbacks, we propose a general methodology that includes a taxonomy of irregularities to help us analyze the irregular patterns in an application, and a set of adaptive transformations to reorder data and computation based on the characteristics of the application and architecture. By extending our adaptive data-reordering transformation on a single node, we propose a data-partitioning framework to resolve the load imbalance problem of irregular applications on multi-node systems. Unlike existing frameworks, which use "one-size-fits-all" methods to partition the input data by a single property, our framework provides a set of operations to transform the input data by multiple properties and generates the desired data-partitioning codes by composing these operations into a workflow. / Ph. D. / Irregular applications, which present unpredictable and irregular patterns of data accesses and computation, are increasingly important in well-established and emerging fields, such as biological data analysis, social network analysis, and machine learning, to deal with large datasets. On the other hand, current parallel processors, such as multi-core CPUs (central processing units), GPUs (graphics processing units), and computer clusters (i.e., groups of connected computers), are designed for regular applications and execute irregular applications poorly. Therefore, it is critical to optimize irregular applications for parallel processors. However, it is a very challenging task, as the irregular patterns are dynamic, and hence, unknown until application execution. 
To overcome this challenge, we propose a general methodology that includes a taxonomy of irregularities to help us analyze the irregular patterns in an application, and a set of adaptive transformations to reorder data and computation for exploring hidden regularities based on the characteristics of the application and processor. We apply our methodology to a couple of important and complex irregular applications as case studies to demonstrate that it is effective and efficient. Irregular Applications Parallel Architectures Multi-core Many-core Multi-node Bioinformatics
6	The design and implementation of adaptive videoconference topology in Learning Manager System and Access-Grid integrated environment. Chen, Shun-Keng 09 February 2007 (has links) Current Learning Management System (LMS) platforms provide limited bidirectional, interactive mechanisms that can handle only personal or small-scale distance learning. These mechanisms are designed for one-to-many online tutorials and rely on single-input, single-output video streams: the video and audio data must pass through one or more Multipoint Control Units (MCUs), which mix or convert them into a single output media stream. The MCU is therefore critical to the LMS; however, such a system is expensive, limited in capacity, and difficult to deploy at scale. Access-Grid (AG), an open-source program, lets users watch online audio-video content from all the interconnected nodes of the LMS through the Multicast protocol, and supports high-quality group-to-group interactive distance learning. It requires all the networks to support the Multicast protocol. The MBONE (Multicast Backbone) can be used to connect different Multicast groups via Unicast communication. However, if the number of groups involved in the distance learning is large, the host computers or routers of the network will be heavily loaded because they must handle the delivery of the media packets. Using a QuickBridge to aggregate and deliver packets is an alternative for the LMS, and requires a bandwidth of (N-1) × N × BW, where N is the number of nodes and BW is the data rate of each node. For example, in a 15-node online conference in which each node transmits audio-video content at 800 kbps, the aggregate bandwidth demand is 14 × 15 × 800 kbps = 168 Mbps. How the data flow is dispersed and controlled therefore becomes an important factor that greatly affects the quality of the AG online conference. 
This thesis modifies the procedures of AG and QuickBridge so that all AG clients can transmit both Unicast and Multicast packets in the online conference. It provides a Meeting Management Server that dynamically adjusts the topology and hub points, making the system more flexible. By modifying the VIC and RAT procedures, the system controls the outbound audio-video data flow from each node of the online conference and reduces the bandwidth demand. The system can provide end-to-end conferencing directly, using Unicast communication to connect nodes in different Multicast groups, or using Multicast on the backbone and Unicast communication to the local nodes. The functionality of the LMS is thereby improved so that it can support multi-window, multi-user interactive online conferences. The results of this thesis can be applied to real-time interactive distance learning, online video conferencing, and interactive online TV. They also help lower the cost of the system and reduce the network bandwidth requirement. Access-GRID online interactive distance learning video conference real-time distance learning adjustable online conference topology
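The QuickBridge bandwidth figure quoted in this abstract follows directly from the (N-1) × N × BW expression. A minimal sketch of the arithmetic is given below; the function name and the interpretation of N as the node count and BW as the per-node data rate are assumptions read off the abstract's own 15-node example.

```python
def aggregate_bandwidth_kbps(num_nodes: int, per_node_kbps: float) -> float:
    """Aggregate demand (N-1) * N * BW for N nodes each sending at BW kbps."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return (num_nodes - 1) * num_nodes * per_node_kbps

# The abstract's example: 15 nodes, 800 kbps each.
demand_kbps = aggregate_bandwidth_kbps(15, 800)
print(demand_kbps / 1000, "Mbps")  # 14 * 15 * 800 kbps = 168 Mbps
```

Because the demand grows quadratically with the number of nodes, doubling the conference to 30 nodes at the same rate would require 696 Mbps, which is why the abstract stresses controlling each node's outbound flow.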
7	Methodical Design Approaches to Multiple Node Collection Robustness for Flip-Flop Soft Error MItigation January 2015 (has links) abstract: The space environment comprises cosmic ray particles, heavy ions and high energy electrons and protons. Microelectronic circuits used in space applications such as satellites and space stations are prone to upsets induced by these particles. With transistor dimensions shrinking due to continued scaling, terrestrial integrated circuits are also increasingly susceptible to radiation upsets. Hence radiation hardening is a requirement for microelectronic circuits used in both space and terrestrial applications. This work begins by exploring the different radiation hardened flip-flops that have been proposed in the literature and classifies them based on the different hardening techniques. A reduced power delay element for the temporal hardening of sequential digital circuits is presented. The delay element single event transient tolerance is demonstrated by simulations using it in a radiation hardened by design master slave flip-flop (FF). Using the proposed delay element saves up to 25% total FF power at 50% activity factor. The delay element is used in the implementation of an 8-bit, 8051 designed in the TSMC 130 nm bulk CMOS. A single impinging ionizing radiation particle is increasingly likely to upset multiple circuit nodes and produce logic transients that contribute to the soft error rate in most modern scaled process technologies. The design of flip-flops is made more difficult with increasing multi-node charge collection, which requires that charge storage and other sensitive nodes be separated so that one impinging radiation particle does not affect redundant nodes simultaneously. We describe a correct-by-construction design methodology to determine a-priori which hardened FF nodes must be separated, as well as a general interleaving scheme to achieve this separation. 
We apply the methodology to radiation hardened flip-flops and demonstrate optimal circuit physical organization for protection against multi-node charge collection. Finally, the methodology is utilized to provide critical node separation for a new hardened flip-flop design that reduces the power and area by 31% and 35% respectively compared to a temporal FF with similar hardness. The hardness is verified and compared to other published designs via the proposed systematic simulation approach that comprehends multiple node charge collection and tests resiliency to upsets at all internal and input nodes. Comparison of the hardness, as measured by estimated upset cross-section, is made to other published designs. Additionally, the importance of specific circuit design aspects to achieving hardness is shown. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2015 Electrical engineering Flip-flop Methodology Multi node charge collection Radiation hardening by design Single Event Transient (SET) Single Event Upset (SEU)
8	Energy And Channel-Aware Power And Discrete Rate Adaptation And Access In Energy Harvesting Wireless Networks Khairnar, Parag S 05 1900 (has links) (PDF) Energy harvesting (EH) nodes, which harvest energy from the environment in order to communicate over a wireless link, promise perpetual operation of wireless networks. The primary focus of the communication system design shifts from being as energy conservative as possible to judiciously handling the randomness in the energy harvesting process in order to enhance the system performance. This engenders a significant redesign of the physical and multiple access layers of communication. In this thesis, we address the problem of maximizing the throughput of a system that consists of rate-adaptive EH nodes that transmit data to a common sink node. We consider the practical case of discrete rate adaptation in which a node selects its transmission power from a set of finitely many rates and adjusts its transmit power to meet a bit error rate (BER) constraint. When there is only one EH node in the network, the problem involves determining the rate and power at which the node should transmit as a function of its channel gain and battery state. For the system with multiple EH nodes, which node should be selected also needs to be determined. We first prove that the energy neutrality constraint, which governs the operation of an EH node, is tighter than the average power constraint. We then propose a simple rate and power adaptation scheme for a system with a single EH node and prove that its throughput approaches the optimal throughput arbitrarily closely. We then arrive at the optimal selection and rate adaptation rules for a multi-EH node system that opportunistically selects at most one node to transmit at any time. The optimal scheme is shown to significantly outperform other ad hoc selection and transmission schemes. 
The effect of energy overheads, such as battery storage inefficiencies and the energy required for sensing and processing, on the transmission scheme and its overall throughput is also analytically characterized. Further, we show how the time and energy overheads incurred by the opportunistic selection process itself affect the adaptation and selection rules and the overall system throughput. Insights into the scaling behavior of the average system throughput in the asymptotic regime, in which the number of nodes tend to infinity, are also obtained. We also optimize the maximum time allotted for selection, so as to maximize the overall system throughput. For systems with EH nodes or non-EH nodes, which are subject to an average power constraint, the optimal rate and power adaptation depends on a power control parameter, which hitherto has been calculated numerically. We derive novel asymptotically tight bounds and approximations for the same, when the average rate of energy harvesting is large. These new expressions are analytically insightful, computationally useful, and are also quite accurate even in the non-asymptotic regime when average rate of energy harvesting is relatively small. In summary, this work develops several useful insights into the design of selection and transmission schemes for a wireless network with rate-adaptive EH nodes. Electric Power Networks Wireless Communication Networks Energy Harvesting Nodes Energy Harvesting Systems Energy Harvesting Wireless Networks Energy Harvesting Wireless Nodes Multi-node Systems EH Nodes Communication Engineering
9	Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed Architectures Patel, Parita January 2017 (has links) (PDF) 1. Main Contributions made by the supplicant: This thesis proposes an Open Computing Language (OpenCL) framework to address the challenges of implementing graph algorithms on parallel architectures and of large-scale graph processing. The proposed framework uses the front-end of the existing Falcon DSL compiler, and so programmers enjoy a conventional, imperative, shared-memory programming style. The back-end of the framework generates implementations of graph algorithms in OpenCL to target single-device architectures. The generated OpenCL code is portable across various platforms, e.g., CPU and GPU, and also vendors, e.g., NVIDIA, Intel and AMD. The framework automatically generates code for thread management and memory management for the devices. It hides all the lower-level programming details from the programmers. A few optimizations are applied to reduce the execution time. The large graph processing challenge is tackled through graph partitioning over multiple devices of a single node and multiple nodes of a distributed cluster. The programmer codes a graph algorithm in Falcon assuming that the graph fits into single-machine memory, and the framework handles graph partitioning without any intervention by the programmer. The framework analyses the Abstract Syntax Tree (AST) generated by Falcon to find all the necessary information about communication and synchronization. It automatically generates code for message passing to hide the complexity of programming in a distributed environment. The framework also applies a set of optimizations to minimize the communication latency. The thesis reports results of several experiments conducted on widely used graph algorithms: single source shortest path, pagerank and minimum spanning tree, to name a few. 
Experimental evaluations show that the reported results are comparable to the state-of-the-art non-portable graph DSLs and frameworks on a single node. Experiments in a distributed environment to show the scalability and efficiency of the framework are also described. 2. Summary of the Referees' Written Comments: Extracts from the referees' reports are provided below. A copy of the written replies to the clarifications sought by the external examiner is appended to this report. Referee 1: This thesis extends the Falcon framework with OpenCL for parallel graph processing on multi-device and multi-node architectures. The thesis makes important contributions. Processing large graphs in a short time is very important, and making use of multiple nodes and devices is perhaps the only way to achieve this. Towards this, the thesis makes good contributions for easy programming, compiler transformations and efficient runtime systems. One of the commendable aspects of the thesis is that it demonstrates results with graphs that cannot be accommodated in the memory of a single device. The thesis is generally written well. The related work coverage is very good. The magnitude of the thesis is excellent for a Masters work. The experimental setup is very comprehensive, with a good set of graphs, good experimental comparisons with state-of-the-art works and good platforms. Particularly, the demonstration with a GPU cluster with multiple GPU nodes (Chapter 5) is excellent. The attempt to demonstrate scalability with 2, 4 and 8 nodes is also noteworthy. However, the contributions on optimizations are weak. Most of the optimizations and compiler transformations are straightforward. There should be summary observations on the results in Chapter 3, especially given that the results are mixed and do not clearly convey the advantages of the work. The same is the case with the multi-device results in Chapter 4, where the results are once again mixed. 
Similarly, the speedups and scalability achieved with multiple nodes are not great. The problem size justification in the multi-node results is not clear. (Referee 1 also indicates a couple of minor changes to the thesis). Referee 2: The thesis uses the OpenCL framework to address the problem of programming graph algorithms on distributed systems. The use of OpenCL ensures that the generated code is platform-agnostic and vendor-agnostic. Sufficient experimentation with large-scale graphs and reasonably sized clusters has been conducted to demonstrate the scalability and portability of the code generated by the framework. The automatically generated code is almost as efficient as manually written code. The thesis is well written and is of high quality. The related work section is well organized and displays a good knowledge of the subject matter under consideration. The author has made important contributions to a good publication as well. 3. An Account of the Open Oral Examination: The oral examination of Ms. Parita Patel took place between 10 AM and 11 AM on 27th November 2017, in the Seminar Hall of the Department of Computer Science and Automation. The members of the Oral Examination Board present were Prof. Sathish Vadhiyar, external examiner, and Prof. Y. N. Srikant, research supervisor. The candidate presented the work in an open defense seminar highlighting the problem domain, the methodology used, the investigations carried out by her, and the resulting contributions documented in the thesis before an audience consisting of the examiners, some faculty members, and students. Some of the questions posed by the examiners and the members of the audience during the oral examination are listed below. 1. How much is the overlap between Falcon work and this thesis? Response: We have used the Falcon front end in our work. Further, the existing Falcon compiler was useful to us to test our own implementation of algorithms in Falcon. 2. 
Why are speedup and scalability not very high with multiple nodes? Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen. 3. Do you have plans of making the code available for use by the community? Response: The code includes some part of Falcon implementation (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community. 4. How can a graph that does not fit into a single device fit into a single node in the case of multiple nodes? Response: Single node machine used in the experiments of “multi-device architecture” contains multiple devices while each node used in experiments of “multi-node architecture” contains only a single device. So, the graph which does not fit into single-node-single-device memory can fit into single-node-multi-device after partitioning. 5. Is there a way to permit morph algorithms to be coded in your framework? Response: Currently, our framework does not translate morph algorithms. Supporting morph algorithms will require some kind of runtime system to manage memory on GPU since morph algorithms add and remove the vertices and edges to the graph dynamically. This can be further explored in future work. 6. Is it possible to accommodate FPGA devices in your framework? Response: Yes, we can support FPGA devices (or any other device that is compatible for OpenCL) just by specifying the device type in the command line argument. We did not work with other devices because CPU and GPU are generally used to process graph algorithms. The candidate provided satisfactory answers to all the questions posed and the clarifications sought by the audience and the examiners during the presentation. 
The candidate's overall performance during the open defense and the oral examination was very satisfactory to the oral examination board. 4. Certificate of Corrections and Changes: All the necessary corrections and changes suggested by the examiners have been made in the thesis and these have been verified by the members of the oral examination board. The thesis has been recommended for acceptance in its revised form. 5. Final Recommendation: In view of the recommendations of the referees and the satisfactory performance of the candidate in the oral examination, the oral examination board recommends that the thesis of Ms. Parita Patel be accepted for the award of the M.Sc(Engg.) Degree of the Institute. Response to the comments by the external examiner on the M.Sc(Engg.) thesis “Compilation of Graph Algorithms for Hybrid, Cross-Platform, and Distributed Architectures” by Parita Patel 1. Comment: The contributions on optimizations are weak. Response: The novelty of this thesis is to make Falcon platform-agnostic, and additionally to process large-scale graphs on multiple devices of a single node and on multi-node clusters seamlessly. Our framework performs similarly to the existing frameworks, but at the same time, it targets several types of architectures, which is not possible in the existing works. Advanced optimizations are beyond the scope of this thesis. 2. Comment: The translation of Falcon to OpenCL is simple. Response: While the translation of Falcon to OpenCL was not hard, figuring out the details of the translation for multi-device and multi-node architectures was not simple. For example, the design of implementations for collection, set, global variables, concurrency, etc., was non-trivial. These designs have already been explained in the appropriate places in the thesis. Further, such large software introduced its own intricacies during development. 3. Comment: Lines between Falcon work and this work are not clear. 
Response: Appendix-A shows the falcon implementation of all the algorithms which we used to run the experiments. We compiled these falcon implementations through our framework and subsequently ran the generated code on different types of target architectures and compared the results with other framework's generated code. These falcon programs were written by us. We have also used the front-end of the Falcon compiler and this has already been stated in the thesis (page 16). 4. Comment: There should be a summary of observations in chapter 3. Response: Summary of observations have been added to chapter 3 (pages 35-36), chapter 4 (page 46), and chapter 5 (page 51) of the thesis. 5. Comment: Speedup and scalability achieved with multiple nodes are not great. Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen. 6. Comment: It will be good to separate the related work coverage into a separate chapter. Response: The related work is coherent with the flow in chapter 1. It consists of just 4.5 pages and separating it into a separate chapter would make both (rest of) chapter 1 and the new chapter very small. Therefore, we do not recommend it. 7. Comment: The code should be made available for use by the community. Response: The code includes some part of Falcon code (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community. 8. Comment: Page 28: Shouldn’t the else part be inside the kernel? Response: There was some missing text and a few minor changes in Figure 3.14 (page 28) which have been incorporated in the corrected thesis. 9. Comment: Figure 4.1 needs to be explained better. Response: Explanation for Figure 4.1 (pages 38-39) has been added to the thesis. 
10. Comment: The problem size justification in the multi-node results is not clear. Response: The single-node machine used in the “multi-device architecture” experiments contains multiple devices, while each node used in the “multi-node architecture” experiments contains only a single device. So, a graph that does not fit into single-node-single-device memory can fit into single-node-multi-device memory after partitioning.
Name of the Candidate: Parita Patel (S.R. No. 04-04-00-10-21-14-1-11610)
Degree Registered: M.Sc(Engg.)
Department: Computer Science & Automation
Title of the Thesis: Compilation of Graph Algorithms for Hybrid, Cross-Platform and
Graph algorithms are abundantly used in various disciplines. These algorithms perform poorly due to random memory access and negligible spatial locality. To improve performance, the parallelism exhibited by these algorithms can be exploited by leveraging modern high-performance parallel computing resources. Implementing graph algorithms for these parallel architectures requires manual thread management and memory management, which becomes tedious for a programmer. Large-scale graphs cannot fit into the memory of a single machine. One solution is to partition the graph either across multiple devices of a single node or across multiple nodes of a distributed network. All the available frameworks for such architectures demand unconventional programming, which is difficult and error-prone. To address these challenges, we propose a framework for compilation of graph algorithms written in an intuitive graph domain-specific language, Falcon. The framework targets shared-memory parallel architectures, computational accelerators and distributed architectures (CPU and GPU clusters). First, it analyses the abstract syntax tree (generated by Falcon) and gathers essential information. 
Subsequently, it generates optimized code in OpenCL for shared-memory parallel architectures and computational accelerators, and OpenCL coupled with MPI code for distributed architectures. The motivation for generating OpenCL code is its platform-agnostic and vendor-agnostic behavior, i.e., it is portable to all kinds of devices. Our framework makes memory management, thread management, message passing, etc., transparent to the user. None of the available domain-specific languages, frameworks or parallel libraries handles portable implementations of graph algorithms. Experimental evaluations demonstrate that the generated code performs comparably to state-of-the-art non-portable implementations and hand-tuned implementations. The results also show the portability and scalability of our framework.
Large Scale Graph Processing Hybrid Architecture High Performance Compute Resources Distributed Architecture Portability Graph Algorithms Bulk Synchronous Parallel (BSP) Model Open Computing Language (OpenCL) Multi-Node Architectures Multi-Device Architectures Computer Science
10	Performance Analysis of Opportunistic Selection and Rate Adaptation in Time Varying Channels Kona, Rupesh Kumar January 2016 (has links) (PDF) Opportunistic selection and rate adaptation play a vital role in improving the spectral and power efficiency of current multi-node wireless systems. However, time variations in wireless channels affect the performance of opportunistic selection and rate adaptation in the following ways. Firstly, the selected node can become sub-optimal by the time data transmission commences. Secondly, the choice of transmission parameters such as rate and power for the selected node becomes sub-optimal. Lastly, the channel changes during data transmission. In this thesis, we develop a comprehensive and tractable analytical framework that accurately accounts for these effects. It differs from the extensive existing literature, which primarily focuses on time variations until the data transmission starts. Firstly, we develop a novel concept of a time-invariant effective signal-to-noise ratio (TIESNR), which tractably and accurately captures the time variations during the data transmission phase with partial channel state information available at the receiver. Secondly, we model the joint distribution of the signal-to-noise ratio at the time of selection and the TIESNR during the data transmission using a generalized bivariate gamma distribution. The above analytical steps facilitate the analysis of the outage probability and average packet error rate (PER) for a given modulation and coding scheme, and of the average throughput with rate adaptation. We also present extensive numerical results to verify the accuracy of each step of our approach and show that ignoring the correlated time variations during the data transmission phase can significantly underestimate the outage probability and average PER, whereas it overestimates the average throughput even for packet durations as low as 1 msec. 
Time Varying Channels Multi-Node Wireless Systems Signal-To-Noise Ratio Opportunistic Selection Rate Adaptation Wireless Channels Modulation and Coding Scheme Data Transmission TIESNR Packet Error Rate Electrical Communication Engineering
