1

Bus Topology Exploration and Memory Allocation for Heterogeneous Systems

Wu, Jhih-Yong 02 August 2007 (has links)
As semiconductor processes continue to improve, the complexity of systems-on-chip rises steadily and more and more elements can be placed on the same chip area. System designers have therefore been searching for methodologies that can handle such complex systems and for environments that can simulate a system-on-chip quickly. The proposed remedy is to raise the level of abstraction, as in the Electronic System Level (ESL) design methodology. System designers, however, still need to decide the system architecture (the bus and processing-element connections) and judge from simulation results whether the system meets its performance and cost constraints. For very complex systems, the growing design space means designers spend ever more time finding the best architecture. In this thesis, we propose a synthesis method that supports automatic ESL design and helps system designers select a system architecture from a large design space in a short time. The method uses fast estimation to evaluate the bus topology and memory allocation, which affect communication among the processing elements. With this method, we can find a better system architecture that meets all constraints with the same number of processing elements.
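As a rough illustration of this kind of design-space exploration, the sketch below enumerates processing-element-to-bus and memory-to-bus assignments and ranks them with a fast, additive communication-cost estimate. The cost model, the bridge penalty, and all function names are assumptions made for this example, not the estimator proposed in the thesis.

```python
from itertools import product

# Hypothetical fast estimator for bus-topology / memory-allocation exploration.
# traffic[i][m] = words that processing element i exchanges with shared memory m.

def estimate_cycles(pe_to_bus, mem_to_bus, traffic):
    """Rough cost model: same-bus accesses cost 1 cycle/word,
    accesses that must cross a bus bridge cost 3 cycles/word."""
    cycles = 0
    for i, row in enumerate(traffic):
        for m, words in enumerate(row):
            if words:
                cycles += words * (1 if pe_to_bus[i] == mem_to_bus[m] else 3)
    return cycles

def explore(n_pe, n_mem, n_bus, traffic, cycle_budget):
    """Enumerate PE-to-bus and memory-to-bus assignments and keep the
    cheapest candidate that stays within the cycle budget."""
    best = None
    for pe_map in product(range(n_bus), repeat=n_pe):
        for mem_map in product(range(n_bus), repeat=n_mem):
            cost = estimate_cycles(pe_map, mem_map, traffic)
            if cost <= cycle_budget and (best is None or cost < best[0]):
                best = (cost, pe_map, mem_map)
    return best

# Three PEs, two shared memories, up to two buses.
traffic = [[4096, 0], [4096, 1024], [0, 1024]]
print(explore(n_pe=3, n_mem=2, n_bus=2, traffic=traffic, cycle_budget=20000))
```

A real flow would replace the exhaustive loops with a heuristic search, but the fast estimate playing the role of simulation is the point of the illustration.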
2

Effects of Development Platform Heterogeneity in Testing of Heterogeneous systems: An Industrial Survey

V N ANJANAYA UDAY, MAJETI January 2016 (has links)
Context. Over the years, software has evolved into large and complex systems of systems. According to the literature, a heterogeneous system is defined as “a system comprised of n number of subsystems where at least one subsystem exhibits heterogeneity with respect to other subsystem”. Research on heterogeneous systems has received considerable attention in recent years as a result of shifts in technology and customer needs. In heterogeneous systems, heterogeneity may occur in different dimensions for different systems. Objectives. The main aim of this thesis is to investigate the effects of development platform heterogeneity in heterogeneous systems on the test process. The objectives in support of this aim are to determine the influence of platform heterogeneity on software testing and to investigate best practices for testing heterogeneous systems with different types of heterogeneity. Methods. An industrial survey and interviews with practitioners are used as the research methods in this thesis. The purpose of the survey is to help testers understand how platform heterogeneity affects the test process. Results. The researcher gathered data on the effects of platform heterogeneity and on best practices in heterogeneous systems from both the survey and the interviews. Conclusions. The thesis investigates the effects of development platform heterogeneity in heterogeneous systems on the test process and identifies best practices for testing heterogeneous systems that exhibit different types of heterogeneity. In addition, it identifies the different types of development platforms used in industry to develop heterogeneous systems.
3

OPTIMIZATIONS FOR N-BODY PROBLEMS ON HETEROGENOUS SYSTEMS

Jianqiao Liu (6636020) 14 May 2019 (has links)
N-body problems, such as simulating the motion of stars in a galaxy or evaluating spatial statistics through the n-point correlation function, are widely studied. The naive approaches to n-body problems are typically O(n^2) algorithms. Tree codes exploit the fact that a group of bodies can be skipped, or approximated as a single aggregate, if it is sufficiently far from the body under consideration; this reduces the complexity from O(n^2) to O(n log n). However, tree codes rely on pointer chasing and contain many branch instructions. They are highly irregular, which prevents them from being easily parallelized.

GPUs offer the promise of massive, power-efficient parallelism. Exploiting this parallelism, however, requires the code to be carefully structured to deal with the limitations of the SIMT execution model. This dissertation focuses on optimizations for n-body problems on heterogeneous systems. A general inspector-executor framework is proposed that automatically schedules GPU threads to achieve high performance. Essentially, the framework lets the GPU execute part of the tree traversal while profiling thread behavior; the CPU then re-organizes the threads to minimize divergence before the remaining portion of the traversal executes on the GPU. We apply this framework to six tree traversal algorithms, achieving significant speedups over optimized GPU code that does not perform application-specific scheduling. Further, we show that in many cases our hybrid approach delivers better performance even than GPU code that uses hand-tuned, application-specific scheduling.

For large-scale input, ChaNGa is the best-of-breed n-body platform. It uses an asymptotically efficient tree traversal strategy known as a dual-tree walk to quickly provide an accurate simulation result. On GPUs, ChaNGa uses a hybrid strategy in which the CPU performs the tree walk to determine which bodies interact while the GPU performs the force computation. In this dissertation, we show that a highly optimized single-tree walk approach achieves better GPU performance by significantly accelerating the tree walk and reducing CPU/GPU communication. Our experiments show that this new design achieves an 8.25x speedup over baseline ChaNGa in a one-node, one-process-per-node configuration. We also point out that ChaNGa's implementation does not satisfy the inclusion condition, so a GPU-centric remote tree walk does not perform well.
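To make the tree-code idea concrete, here is a minimal Barnes-Hut-style sketch: a cluster of bodies is replaced by its centre of mass whenever it subtends a small enough angle. The θ opening parameter, the crude one-axis split, and the class names are illustrative assumptions, not the dissertation's traversal framework or its inspector-executor scheduler.

```python
import math

# Illustrative Barnes-Hut-style sketch: a distant group of bodies is replaced
# by its centre of mass. Class and parameter names are invented for this example.

class Node:
    def __init__(self, bodies):
        self.bodies = bodies                         # list of (x, y, mass)
        self.mass = sum(b[2] for b in bodies)
        self.cx = sum(b[0] * b[2] for b in bodies) / self.mass
        self.cy = sum(b[1] * b[2] for b in bodies) / self.mass
        xs, ys = [b[0] for b in bodies], [b[1] for b in bodies]
        self.size = max(max(xs) - min(xs), max(ys) - min(ys))
        self.children = []
        if len(bodies) > 1:                          # crude split along x
            ordered = sorted(bodies)
            mid = len(bodies) // 2
            self.children = [Node(ordered[:mid]), Node(ordered[mid:])]

def accel(body, node, theta=0.5, eps=1e-3):
    """Accumulate acceleration on `body`, opening a node only when it is close."""
    if len(node.bodies) == 1 and node.bodies[0] == body:
        return (0.0, 0.0)                            # skip self-interaction
    dx, dy = node.cx - body[0], node.cy - body[1]
    dist = math.hypot(dx, dy) + eps
    if not node.children or node.size / dist < theta:
        f = node.mass / dist**3                      # far enough: approximate
        return (f * dx, f * dy)
    ax = ay = 0.0                                    # too close: recurse
    for child in node.children:
        cax, cay = accel(body, child, theta, eps)
        ax, ay = ax + cax, ay + cay
    return (ax, ay)

bodies = [(0.0, 0.0, 1.0), (1.0, 0.2, 1.0), (5.0, 5.0, 2.0), (5.2, 4.9, 1.0)]
root = Node(bodies)
print([accel(b, root) for b in bodies])              # tree walk vs naive O(n^2)
```

The recursion in `accel` is exactly the pointer-chasing, branch-heavy control flow the dissertation is about: which bodies open which nodes differs per thread, and that divergence is what the inspector-executor scheduling tries to tame.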
4

Mechanically flexible interconnects (MFIs) for large scale heterogeneous system integration

Zhang, Chaoqi 07 April 2015 (has links)
In this research, wafer-level flexible input/output interconnection technologies, Mechanically Flexible Interconnects (MFIs), have been developed. First, Au-NiW MFIs with a 65 µm vertical elastic range of motion are designed and fabricated. The gold passivation layer is experimentally verified not only to lower the electrical resistance but also to significantly extend the lifetime of the MFIs. In addition, a photoresist spray-coating based fabrication process is developed to scale the in-line pitch of MFIs from 150 µm to 50 µm. By adding a contact tip, Au-NiW MFIs can realize a rematable assembly on a substrate with uniform pads and a robust assembly on a substrate with 45 µm surface variation. Last but not least, multi-pitch multi-height MFIs (MPMH MFIs) are formed using double-lithography and double-reflow processes, which can realize an MFI array containing MFIs with various heights and pitches. Using these advanced MFIs, large-scale heterogeneous systems that provide high-performance system-level interconnections are demonstrated. For example, the demonstrated 3D interposer stacking enabled by MPMH MFIs is promising for realizing a low-profile, cavity-free, robust stacking system. Moreover, a bridged multi-interposer system is developed to address the reticle and yield limitations of realizing a large-scale system with current 2.5D integration technologies: the high-bandwidth interconnection available within an interposer is extended by using a silicon chip to bridge adjacent interposers. MFI-assisted thermal isolation is also developed to alleviate thermal coupling in a high-performance 3D stacking system.
5

Prototyping methodology of image processing applications on heterogeneous parallel systems

Zhang, Jinglin 19 December 2013 (has links)
The work presented in this thesis takes place in a context of growing demand for image and video applications on parallel embedded systems. The limitations and lack of flexibility of current designs for parallel embedded systems make it increasingly complicated to implement applications, particularly on heterogeneous systems. Open Computing Language (OpenCL) is a new framework for fully exploiting the computing capability of general-purpose or embedded processors. In the meantime, rapid prototyping tools have been proposed to generate a reliable prototype or to automatically implement image and video applications on embedded systems.
The goal of this thesis was to evaluate and improve design processes for embedded systems, especially those based on the dataflow approach (high level of abstraction) and the OpenCL approach (intermediate level of abstraction). This challenge is tackled in several projects, including the collaborative COMPA project, which studies a framework based on the Orcc, Preesm and HMPP tools. In this context, the thesis aims to validate and evaluate this framework with motion estimation and stereo matching algorithms. The algorithms have been described in the high-level RVC-CAL language. With the Orcc, Preesm and HMPP tools, we generated and verified C, OpenCL and CUDA code for heterogeneous platforms based on multi-core CPUs and GPUs. We also studied implementations of these algorithms on the latest generation of embedded many-core processor, the MPPA developed by KALRAY. We propose three algorithms. The first is a parallelized motion estimation method for a heterogeneous system consisting of one CPU and one GPU, for which we developed a basic method to balance the workload distribution between the two devices. The second is a real-time stereo matching method that combines cost functions with cost aggregation over square-size steps, implemented on a laptop GPU; our experimental results outperform other baseline methods in the trade-off between matching accuracy and time efficiency. The third is a joint motion-based video stereo matching method that uses the motion vectors computed by the first algorithm to build the support region for the second; our experimental results outperform existing stereo video matching methods on test sequences with abundant movement, even under large amounts of noise.
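A minimal sketch of the kind of static CPU/GPU workload balancing described for the first algorithm is given below: image rows are split in proportion to each device's measured throughput so that both finish at roughly the same time. The throughput numbers and helper functions are placeholders, not the thesis's actual implementation.

```python
# Hypothetical sketch of static CPU/GPU load balancing for block-matching
# motion estimation: split image rows in proportion to measured throughput.

def split_rows(n_rows, cpu_rate, gpu_rate):
    """Return (cpu_rows, gpu_rows) so both devices finish at about the same time.
    cpu_rate / gpu_rate are rows processed per second, measured on a warm-up frame."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    gpu_rows = round(n_rows * gpu_share)
    return n_rows - gpu_rows, gpu_rows

def process_frame(frame_rows, cpu_rate, gpu_rate):
    cpu_rows, gpu_rows = split_rows(len(frame_rows), cpu_rate, gpu_rate)
    # In a real system these two calls would run concurrently
    # (e.g. an OpenCL kernel enqueue plus a CPU thread pool).
    cpu_part = [estimate_motion_cpu(r) for r in frame_rows[:cpu_rows]]
    gpu_part = estimate_motion_gpu(frame_rows[cpu_rows:])
    return cpu_part + gpu_part

# Placeholder kernels, standing in for the real block-matching code.
def estimate_motion_cpu(row):  return ("cpu", row)
def estimate_motion_gpu(rows): return [("gpu", r) for r in rows]

print(split_rows(n_rows=1080, cpu_rate=200.0, gpu_rate=1600.0))  # -> (120, 960)
```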
6

Enhancing capacity and coverage for heterogeneous cellular systems

Mahmud, Azwan Bin January 2014 (has links)
The thesis is concerned with capacity and coverage enhancement of OFDMA heterogeneous cellular systems, with a specific focus on fractional frequency reuse (FFR), femtocells and amplify-and-forward (AF) relay systems. The main aim of the thesis is to develop new mathematical analysis of the spectral efficiency and outage probability of multi-cell, multi-tier systems in diverse traffic, interference and fading scenarios. In the first part of the thesis, a new unified mathematical framework for performance analysis of FFR and soft frequency reuse (SFR) schemes is developed. This leads to new exact expressions for FFR and SFR area spectral efficiency in downlink and uplink scenarios, which account for a mixture of frequency reuse factors in a homogeneous cellular system. The mathematical framework is extended to include modelling and performance analysis of FFR systems with elastic data traffic. Further analysis is carried out on the performance of FFR and/or SFR schemes in terms of energy efficiency and base station cooperation. The proposed analytical framework can lead to a better understanding and computationally efficient performance analysis of next-generation heterogeneous cellular systems. Next-generation cellular systems are characterized by an increase in spatial node density to improve spectral efficiency and coverage, especially for users at home and at the cell edges. In this regard, relays and femtocells play a major role and are therefore the focus of the second part of the thesis. Firstly, we present a new and unified spectral efficiency analysis of dual-hop fixed-gain AF relay systems over generalised interference models, based either on Nakagami-m fading with arbitrary distances or on a spatial Poisson point process in the case of randomly deployed heterogeneous interferers; these models have previously been treated separately in the open literature owing to the complexity of the mathematical analysis. Secondly, the outage probability is used to deduce the femtocell exclusion region for the FFR system, and a new static resource allocation scheme is proposed for femtocells which improves capacity. The work presented in the thesis has resulted in the publication of seven scientific papers in prestigious IEEE journals and conferences.
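For orientation, the two quantities analysed throughout the thesis can be written in their standard textbook forms; these are generic definitions, not the thesis's exact FFR/SFR expressions.

```latex
% Generic definitions: outage probability for a target SINR threshold
% \gamma_{th}, and area spectral efficiency over a cell of area A with
% N scheduled users.
\begin{align}
  P_{\mathrm{out}} &= \Pr\!\left[\mathrm{SINR} < \gamma_{\mathrm{th}}\right],\\
  \mathrm{ASE} &= \frac{1}{A}\sum_{k=1}^{N}
     \mathbb{E}\!\left[\log_2\!\left(1 + \mathrm{SINR}_k\right)\right]
     \quad \text{[bit/s/Hz/m}^2\text{]}.
\end{align}
```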
7

JOB SCHEDULING FOR STREAMING APPLICATIONS IN HETEROGENEOUS DISTRIBUTED PROCESSING SYSTEMS

Al-Sinayyid, Ali 01 December 2020 (has links)
The colossal amounts of data generated daily are increasing exponentially at an unprecedented pace. A variety of applications—including stock trading, banking systems, health care, the Internet of Things (IoT), and social media networks, among others—have created an unprecedented volume of real-time stream data, estimated to reach billions of terabytes in the near future. As a result, we are currently living in the so-called Big Data era and witnessing a transition to the so-called IoT era. Enterprises and organizations are tackling the challenge of interpreting this enormous amount of raw data streams to achieve an improved understanding of the data and thus make efficient, well-informed (i.e., data-driven) decisions. Researchers have designed distributed data stream processing systems that can process data directly in near real time. To extract valuable information from raw data streams, analysts create and implement data stream processing applications structured as directed acyclic graphs (DAGs). The infrastructure of distributed data stream processing systems, as well as the varied requirements of stream applications, impose new challenges. Cluster heterogeneity in a distributed environment results in different cluster resources for task execution and data transmission, which makes optimal scheduling an NP-complete problem. Scheduling streaming applications plays a key role in optimizing system performance, particularly in maximizing the frame rate, i.e., how many instances of data sets can be processed per unit of time. The scheduling algorithm must consider data locality, resource heterogeneity, and communication and computation latencies; the latency of the computation or transmission bottleneck must be minimized when the application is mapped onto heterogeneous, distributed cluster resources. Recent work on task scheduling for distributed data stream processing systems has a number of limitations. Most current schedulers are not designed to manage heterogeneous clusters, and they lack the ability to consider both task and machine characteristics in scheduling decisions. Furthermore, current default schedulers do not allow the user to control data locality aspects in application deployment. In this thesis, we investigate the problem of scheduling streaming applications on a heterogeneous cluster environment and develop the maximum throughput scheduler algorithm (MT-Scheduler) for streaming applications. The proposed algorithm uses a dynamic programming technique to efficiently map the application topology onto a heterogeneous distributed system based on computing and data transfer requirements, while also taking into account the capacity of the underlying cluster resources. The proposed approach maximizes system throughput by identifying and minimizing the time incurred at the computing/transfer bottleneck. The MT-Scheduler supports scheduling applications that are structured as a DAG, such as Amazon Timestream, Google MillWheel, and Twitter Heron. We conducted experiments using three Storm micro-benchmark topologies in both simulated and real Apache Storm environments. To evaluate performance, we compared the proposed MT-Scheduler with a simulated round-robin scheduler and the default Storm scheduler. The results indicate that the MT-Scheduler outperforms the default round-robin approach in terms of both average system latency and throughput.
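The bottleneck view of throughput that MT-Scheduler optimises can be sketched as follows: given a mapping of DAG operators to machines, the sustainable rate is limited by the slowest compute or transfer stage. The cost model, dictionaries, and names below are illustrative assumptions; the thesis's dynamic program searches over such mappings, which this sketch does not show.

```python
# Hypothetical sketch of the bottleneck view of streaming throughput:
# once each operator of a DAG is mapped to a machine, the sustainable rate
# is limited by the slowest compute or transfer stage.

def bottleneck_period(dag_edges, op_work, node_speed, link_bw, mapping):
    """Return the per-tuple period (seconds) of the slowest stage.
    dag_edges:  {(src_op, dst_op): bytes per tuple}
    op_work:    {op: work units per tuple}
    node_speed: {node: work units per second}
    link_bw:    {(node_a, node_b): bytes per second}
    mapping:    {op: node}
    Throughput is 1 / bottleneck_period."""
    compute = [op_work[op] / node_speed[mapping[op]] for op in op_work]
    transfer = []
    for (src, dst), nbytes in dag_edges.items():
        a, b = mapping[src], mapping[dst]
        if a != b:                      # co-located operators pay no network cost
            transfer.append(nbytes / link_bw[(a, b)])
    return max(compute + transfer)

dag_edges  = {("spout", "filter"): 2_000, ("filter", "sink"): 500}
op_work    = {"spout": 1.0, "filter": 4.0, "sink": 0.5}
node_speed = {"fast": 1_000.0, "slow": 250.0}
link_bw    = {("fast", "slow"): 1_000_000.0, ("slow", "fast"): 1_000_000.0}
mapping    = {"spout": "fast", "filter": "fast", "sink": "slow"}
period = bottleneck_period(dag_edges, op_work, node_speed, link_bw, mapping)
print(period, 1.0 / period)   # slowest stage and the resulting maximum throughput
```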
8

Functional Programming and Metamodeling frameworks for System Design

Mathaikutty, Deepak Abraham 19 May 2005 (has links)
System-on-Chip (SoC) and other complex distributed hardware/software systems contain heterogeneous components whose behaviors are best captured by different models of computation (MoCs). As a result, any design framework for such systems requires the capability to express heterogeneous MoCs. Although a number of system-level design languages (SLDLs) and frameworks have proliferated over the last few years, most of them are lacking in multiple ways. Some of the SLDLs and system design frameworks we have worked with are SpecC, Ptolemy II, and SystemC-H. From our analysis of these, we identify the following shortcomings: first, their dependence on specific programming-language artifacts (Java or C/C++) makes them less amenable to formal analysis; second, the refinement strategies proposed in the design flows based on these languages lack formal semantic underpinnings, making it difficult to prove that refinements preserve correctness; and third, none of the available SLDLs is easily customizable by users. In our work, we address these problems as follows. To alleviate the first problem, we follow Axel Jantsch's paradigm of function-based semantic definitions of MoCs and formulate a functional programming framework called SML-Sys. We illustrate through a number of examples how to model heterogeneous computing systems using SML-Sys. Our framework provides for formal reasoning due to the formal semantic underpinning it inherits from SML's precise denotational semantics. To handle the second problem and apply refinement strategies at a higher level, we propose a refinement methodology and provide a semantics-preserving transformation library within our framework. To address the third shortcoming, we have developed EWD, which allows users to customize MoC-specific visual modeling syntax defined as a metamodel. EWD is built on the metamodeling framework GME (Generic Modeling Environment) and allows automatic design-time syntactic and semantic checks on models for conformance to their metamodel. Models created in EWD can be saved in an XML-based interoperability language (IML) we defined for this purpose. The IML format is in turn automatically translated into Standard ML or Haskell models, which may then be executed and analyzed either by our existing model analysis tool SML-Sys or by the ForSyDe environment. We also generate SMV-based templates from the XML representation to obtain verification models. / Master of Science
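The thesis formulates function-based MoC definitions in Standard ML; purely as an illustration, the toy below renders the same general idea in Python using ForSyDe-style process constructors. The constructor names and the example system are assumptions made for this sketch, not the SML-Sys API.

```python
# A toy rendering of function-based MoC modeling: processes are functions on
# signals, built from a few generic process constructors. Names such as
# map_sy / zip_with_sy / delay_sy follow ForSyDe-style conventions and are
# used here purely for illustration.

def map_sy(f):
    """Synchronous-MoC process constructor: apply f to every event of a signal."""
    return lambda signal: [f(x) for x in signal]

def zip_with_sy(f):
    """Combine two synchronous signals event by event."""
    return lambda s1, s2: [f(a, b) for a, b in zip(s1, s2)]

def delay_sy(initial):
    """Unit delay: prepend an initial token (this is where state comes from)."""
    return lambda signal: [initial] + signal[:-1]

# Compose processes into a small system: a scaled, delayed adder.
scale    = map_sy(lambda x: 2 * x)
adder    = zip_with_sy(lambda a, b: a + b)
feedback = delay_sy(0)

inp = [1, 2, 3, 4]
scaled = scale(inp)
state = feedback(scaled)          # one feedback step unrolled by hand
print(adder(scaled, state))       # -> [2, 6, 10, 14]
```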
9

Ein Betriebssystem für konfigurierbare Hardware (An Operating System for Configurable Hardware)

Krutz, David 22 January 2007 (has links)
This work investigates how hardware design in VHDL can be supported by a hardware operating system: the reuse of operating-system modules is intended to shorten development time, improve the reusability of designs, and increase reliability. In addition, a change of development platform no longer affects the application algorithm. Applying an operating-system concept places special requirements on the description language that VHDL does not fulfil; a structure compiler was therefore developed that meets these additional requirements while retaining VHDL syntax. The structure compiler connects the application program with the operating-system modules and produces a VHDL program that can be simulated or synthesized with the typical FPGA development tools. During the development of the operating system for configurable hardware it became clear that such a concept is only useful when embedded in an overall methodology for designing heterogeneous systems; this work therefore also discusses a design methodology for heterogeneous systems based on a declarative representation, a signal flow graph. The operating-system concept was applied on several FPGA boards, both commercial ones and in-house designs. The operating system built for these boards contains modules for communication between the FPGA and a PC over different interfaces as well as modules for attaching various external peripherals, such as memory. The integration of processors, as part of the configurable hardware, into the operating-system concept was also investigated.
Numerous example applications were studied: they were used to verify the structure compiler and the operating-system modules, and the operating-system concept for configurable hardware was also applied in several projects.
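As a toy illustration of what a structure compiler might emit, the sketch below generates a top-level VHDL wrapper that instantiates an application entity next to reusable operating-system modules. Every entity, port, and signal name here is invented for the example; the real tool consumes VHDL application code and its own module library.

```python
# Toy generator standing in for a "structure compiler": stitch an application
# entity together with operating-system modules in a generated top level.
# All names are invented for this sketch.

OS_MODULES = {
    "uart_bridge": ["rx", "tx", "data_out", "data_valid"],
    "mem_ctrl":    ["addr", "wdata", "rdata", "we"],
}

def emit_top(app_name, app_ports, used_modules):
    sigs = sorted({f"{m}_{p}" for m in used_modules for p in OS_MODULES[m]}
                  | set(app_ports) | {"clk", "rst"})
    out = ["library ieee;", "use ieee.std_logic_1164.all;", "",
           "entity top is end entity top;", "",
           "architecture gen of top is"]
    out += [f"  signal {s} : std_logic;" for s in sigs]
    out.append("begin")
    for m in used_modules:
        ports = ", ".join(["clk => clk", "rst => rst"] +
                          [f"{p} => {m}_{p}" for p in OS_MODULES[m]])
        out.append(f"  u_{m} : entity work.{m} port map ({ports});")
    ports = ", ".join(["clk => clk", "rst => rst"] +
                      [f"{p} => {p}" for p in app_ports])
    out.append(f"  u_app : entity work.{app_name} port map ({ports});")
    out.append("end architecture gen;")
    return "\n".join(out)

print(emit_top("fir_filter", ["sample_in", "sample_out"],
               ["uart_bridge", "mem_ctrl"]))
```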
