  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

ChipCflow: tool for convert C code in a static dataflow architecture in reconfigurable hardware / ChipCflow: ferramenta para conversão de código C em uma arquitetura a fluxo de dados estática em harware reconfigurável

Antonio Carlos Fernandes da Silva 19 February 2015
Interest in alternative architectures and software has grown in recent years. This interest is driven by advances in hardware technology, which must be complemented by innovations in design methodologies and in test and verification techniques if the technology is to be used effectively. Alternative architectures and software generally exploit the parallelism of applications, in contrast to the von Neumann model. Among high-performance alternative architectures is the dataflow architecture. In this kind of architecture, program execution is determined by data availability, so parallelism is intrinsic to the system. The dataflow model therefore expresses parallelism implicitly, removing the need for the programmer to mark the parallel sections of the code explicitly. Dataflow architectures have again become an active research area because of hardware advances, in particular advances in Reconfigurable Computing and Field Programmable Gate Arrays (FPGAs). The ChipCflow project is a tool for executing algorithms using a dynamic dataflow graph on an FPGA. This thesis describes the development of a code conversion tool that generates applications for a static dataflow architecture. It also presents the ChipCflow project, of which the code conversion tool is a part. The algorithm to be converted is specified in C and converted to a hardware description language, following the model proposed by the ChipCflow project. The results are a proof of concept of converting high-level language code to a dataflow architecture to be configured on an FPGA.
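The idea that execution is driven by data availability can be illustrated with a minimal sketch. This is not the ChipCflow tool or its hardware model; the `Node` class, the `run` function and the arc names are hypothetical, and the sketch assumes each arc feeds exactly one node and holds at most one token, as in a static dataflow graph.

```python
import operator

class Node:
    """A dataflow operator (hypothetical helper, not part of ChipCflow)."""
    def __init__(self, name, op, inputs):
        self.name = name      # also used as the name of the node's output arc
        self.op = op          # function applied to the input tokens
        self.inputs = inputs  # names of the arcs feeding this node

def run(nodes, tokens):
    """Fire every node whose input arcs all hold a token, until none can fire."""
    tokens = dict(tokens)
    fired = []
    progress = True
    while progress:
        progress = False
        for node in nodes:
            ready = all(arc in tokens for arc in node.inputs)
            # Static dataflow: do not fire if the output arc is still occupied.
            if ready and node.name not in tokens:
                args = [tokens.pop(arc) for arc in node.inputs]  # consume inputs
                tokens[node.name] = node.op(*args)               # produce output
                fired.append(node.name)
                progress = True
    return tokens, fired

# (a + b) * (c - d): the listing order of the nodes does not dictate execution order.
graph = [
    Node("prod", operator.mul, ["sum", "diff"]),
    Node("sum", operator.add, ["a", "b"]),
    Node("diff", operator.sub, ["c", "d"]),
]
result, order = run(graph, {"a": 2, "b": 3, "c": 7, "d": 4})
print(result, order)
```

Although `prod` is listed first, it fires last, because its operands only become available after `sum` and `diff` have fired; the two independent nodes could fire in parallel on real hardware.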
72

ChipCFlow - Partição e protocolo de comunicação no grafo a fluxo de dados dinâmico / ChipCFlow - partioning and communication protocol in the dynamic dataflow graph

Lucas Barbosa Sanches 14 May 2010
This work describes a proof of concept of an approach that combines the inherently parallel dataflow model of computation with partially and dynamically reconfigurable computing, in order to obtain high-performance computing systems. More specifically, it presents a model for partitioning dynamic dataflow graphs and a protocol for communication between the partitioned sectors of a CDFG (Control Data Flow Graph), so that these graphs can be mapped onto dynamically reconfigurable architectures, in particular Xilinx Virtex II/II-Pro FPGAs. The work is part of the broader ChipCFlow project, which aims to build a tool that automatically generates synthesizable hardware descriptions from high-level code written in C, using the dataflow approach to extract the parallelism implicit in the original applications. The proposed model is explained in detail and applied to an example CDFG, and its feasibility is discussed by means of simulations.
73

High-level synthesis of elasticity : from models to circuits

Jelodari Mamaghani, Mahdi January 2016
The forward-looking design trend in Very Large Scale Integration (VLSI) is the System-on-Chip (SoC). An SoC integrates multiple computation, communication and storage components on a single chip and targets high-performance systems by eliminating most on-chip communication costs. It is agreed that running SoC components under the control of a single clock is not feasible, and clock distribution has emerged as a critical obstacle. Asynchronous techniques can be exploited to relax the strict timing constraints of traditional design methodologies. A less radical solution is Globally Asynchronous Locally Synchronous (GALS) systems, which offer potential advantages in this respect because they preserve system modularity and concentrate on communication aspects. The problem with GALS design is that traditional designers are relatively unfamiliar with the approach. To deal with this, a methodology is proposed that allows designers to implement GALS systems at a higher abstraction level, independent of technology, protocol, data encoding or any other details of circuit design. With recent advances in concurrent programming, Communicating Sequential Processes (CSP) has regained popularity. CSP-based programming languages such as Go have emerged to let software designers exploit the model to implement scalable software. CSP has been used in the hardware domain since the 1990s, mainly by the asynchronous community. This thesis proposes a novel high-level synthesis framework, called eTeak, which enables designers to implement GALS-like systems in a CSP-based language (Balsa) without being concerned about timing issues at the system level. The proposed approach takes advantage of synchronous elasticity to introduce a common timing discipline to the circuit, which transforms it into a latency-insensitive system. A latency-insensitive system is able to tolerate dynamic changes in computation and communication delays. This feature enables eTeak to raise the level of abstraction to the data-flow representation, where functionality is separated from timing details. A designer can therefore specify a large-scale system by concentrating only on its functionality and postpone timing complexity until synthesis takes place. Unlike many previous systems, the proposed design flow employs a data-driven synthesis style to distribute controllers through the network, which contributes to its modularity and enhanced concurrency. This facilitates partitioning into elastic blocks and is expected to pave the way for further optimisations, such as retiming and re-synthesis, using commercial EDA tools.
74

Dimensional Analysis of Data Flow Programs

Shennat, Abdulmonem Ibrahim 24 May 2022
Our main objective is to design Dimensional Analysis (DA) algorithms for the multidimensional dialect PyLucid of Lucid, the equational data flow language. The significance is that the DA is indispensable for an efficient implementation of multidimensional Lucid and should aid the implementation of other data flow systems, such as Google’s TensorFlow. Data flow is a form of computation in which components of multidimensional datasets (MDDs) travel on communication lines in a network of processing stations. Each processing station incrementally transforms its input MDDs to its output, another (possibly very different) MDD. MDDs are very common in Health Information Systems and data science in general. An important concept is that of relevant dimension. A dimension is relevant if the coordinate of that dimension is required to extract a value. It is very important that in calculating with MDDs we avoid non-relevant dimensions, otherwise we duplicate entries (say, in a cache) and waste time and space. Suppose, for example, that we are measuring rainfall in a region. Each individual measurement (say, of an hour’s worth of rain) is determined by location (one dimension), day, (a second dimension) and time of day (a third dimension). All three dimensions are a priori relevant. Now suppose we want the total rainfall for each day. In this MDD (call it N) the relevant dimensions are location and day, but time of day is no longer relevant and must be removed. Normally this is done manually. However, can this process be automated? We answer this question affirmatively by devising and testing algorithms that produce useful and reliable approximations (specifically, upper bounds) for the dimensionalities of the variables in a program. By dimensionality we mean the set of relevant dimensions. For example, if M is the MDD of raw rain measurements, its dimensionality is {location, day, hour}, and that of N is {location, day}. 
Note that the dimensionality is more than just the rank, which is simply the number of dimensions. There is extensive previous research on dataflow itself, which we summarize. However, an exhaustive literature search uncovered no relevant previous DA work other than that of the GLU (Granular Lucid) project in the 1990s. Unfortunately, the GLU project was funded privately and remains proprietary; not even the author has access to it. Our methodology was incremental: we solved increasingly difficult instances of DA corresponding to increasingly sophisticated language features. We solved the case of one dimension (time), two dimensions (time and space), and multiple dimensions. We also solved the difficult problem (which the GLU team never solved) of determining the dimensionality of programs that include user-defined functions, including recursively defined functions. We do this by adapting the PyLucid interpreter (to produce the DAM interpreter) to evaluate the entire program over the (finite) domain of dimensionalities. As a result, the experimentally validated algorithms in our dissertation can produce useful upper bounds for the dimensionalities of the variables in multidimensional PyLucid programs, including those with user-defined functions. / Graduate
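The rainfall example above can be sketched as a small evaluator that works over sets of dimensions instead of over data. This is only an illustration under stated assumptions: it is not PyLucid or the DAM interpreter, and the tuple-based expression format and the `dimensionality` function are hypothetical. It assumes that pointwise operators may depend on every dimension of their operands and that aggregating along a dimension makes it irrelevant.

```python
def dimensionality(expr, env):
    """Return an upper bound on the set of relevant dimensions of expr.

    expr is a nested tuple (hypothetical format):
      ("const", value)            a constant, relevant in no dimension
      ("var", name)               a variable looked up in env
      ("pointwise", left, right)  e.g. left + right
      ("reduce", dim, body)       aggregate body along dimension dim
    """
    kind = expr[0]
    if kind == "const":
        return frozenset()
    if kind == "var":
        return env[expr[1]]
    if kind == "pointwise":
        # The result may vary in any dimension in which either operand varies.
        return dimensionality(expr[1], env) | dimensionality(expr[2], env)
    if kind == "reduce":
        # Aggregating along a dimension removes it from the result.
        return dimensionality(expr[2], env) - {expr[1]}
    raise ValueError(f"unknown expression kind: {kind}")

# M: raw hourly rain measurements; N: total rainfall per location and day.
env = {"M": frozenset({"location", "day", "hour"})}
N = ("reduce", "hour", ("var", "M"))
N_in_mm = ("pointwise", N, ("const", 25.4))  # unit conversion adds no dimension

print(sorted(dimensionality(N, env)))
print(sorted(dimensionality(N_in_mm, env)))
```

Because the evaluator only ever takes unions and removes reduced dimensions, the result is a safe over-approximation: it may list a dimension that turns out to be irrelevant at run time, but under these assumptions it does not omit a relevant one.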
75

Utilizing Heterogeneity in Manycore Architectures for Streaming Applications

Savas, Süleyman January 2017
In the last decade, we have seen a transition from single-core to manycore in computer architectures due to performance requirements and limitations in power consumption and heat dissipation. The first manycores had homogeneous architectures consisting of a few identical cores. However, the applications, which are executed on these architectures, usually consist of several tasks requiring different hardware resources to be executed efficiently. Therefore, we believe that utilizing heterogeneity in manycores will increase the efficiency of the architectures in terms of performance and power consumption. However, development of heterogeneous architectures is more challenging and the transition from homogeneous to heterogeneous architectures will increase the difficulty of efficient software development due to the increased complexity of the architecture. In order to increase the efficiency of hardware and software development, new hardware design methods and software development tools are required. Additionally, there is a lack of knowledge on the performance of applications when executed on manycore architectures. The transition began with a shift from single-core architectures to homogeneous multicore architectures consisting of a few identical cores. It now continues with a shift from homogeneous architectures with identical cores to heterogeneous architectures with different types of cores specialized for different purposes. However, this transition has increased the complexity of architectures and hence the complexity of software development and execution. In order to decrease the complexity of software development, new software tools are required. Additionally, there is a lack of knowledge on what kind of heterogeneous manycore design is most efficient for different applications and what are the performances of these applications when executed on current commercial manycores. 
This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and to facilitate the choice of architecture for software and hardware developers. It defines a taxonomy for manycore architectures based on the levels of heterogeneity they contain and discusses the benefits and drawbacks of these levels. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture. Finally, it presents a methodology for developing heterogeneous manycore architectures that target specific application domains. Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications. If we add specialized hardware blocks to a core, the performance increases by 15x for the target application while the core size increases by 40-50%, which can be optimized further. Other results show that dataflow languages, together with software development tools, decrease software development effort significantly (25-50%) while having a small impact (2-17%) on performance. / HiPEC (High Performance Embedded Computing) / NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
76

Compiling for a multithreaded dataflow architecture : algorithms, tools, and experience / Compilation pour une architecture multi-thread à flot de données : algorithmes, outils et retour d'expérience

Li, Feng 20 May 2014
Across the wide range of multiprocessor architectures, all seem to share one common problem: they are hard to program. It is a general belief that parallelism is a software problem, and that we need more sophisticated compilation techniques to partition an application into concurrent threads. Many experts also make the point that the underlying architecture plays an equally important role: a fundamental change in processor architecture is needed before one may expect significant progress in the programmability of multiprocessors. Our approach favors a convergence of these viewpoints. The convergence of dataflow and von Neumann architectures promises latency tolerance, the exploitation of a high degree of parallelism, and low thread-switching cost. Multithreaded dataflow architectures require a high degree of parallelism to tolerate latency. On the other hand, it is error-prone for programmers to partition a program into a large number of fine-grain threads. To reconcile these facts, we aim to advance the state of the art in automatic thread partitioning, in combination with programming language support for coarse-grain, functionally deterministic concurrency. This thesis presents a general thread partitioning algorithm for transforming sequential code into a parallel data-flow program targeting a multithreaded dataflow architecture. Our algorithm operates on the program dependence graph (PDG) and on the static single assignment (SSA) form, extracting task, pipeline, and data parallelism from arbitrary control flow, and coarsening its granularity using a generalized form of typed fusion. We design a new intermediate representation to ease code generation for an explicit token match dataflow execution model. We also implement a GCC-based prototype and contribute to the development of a simulation platform for exploring dataflow parallelization at large scale. We evaluate coarse-grain dataflow extensions of OpenMP in the context of a large-scale, 1024-core simulated multithreaded dataflow architecture. These extensions and the simulated architecture allow the exploration of innovative memory models for dataflow computing. We evaluate these tools and models on realistic applications.
77

An Application Framework for a Power-Aware Processor Architecture

Mandlekar, Anup Shrikant 31 August 2012
Instruction-set-based general-purpose processors are not energy-efficient for event-driven applications. The E-textiles group at Virginia Tech proposed a novel data-flow processor architecture design to bridge the gap between event-driven applications and the target architecture. The architecture, although promising in terms of performance and energy efficiency, had been explored for only a limited number of applications. This thesis presents a model-driven approach to the design of an application framework that facilitates rapid development of software applications to test the performance of the architecture. The application framework is integrated with the prior automation framework, bringing software applications to the right level of abstraction. The processor architecture design is made flexible and scalable, making it suitable for a wide range of applications. Additionally, an architecture design based on embedded flash memory is proposed to reduce static power consumption. The thesis estimates a significant reduction in overall power consumption when flash memory is incorporated. / Master of Science
78

Hardware Synthesis of Synchronous Data Flow Models

Koecher, Matthew R. 06 April 2004
Synchronous Dataflow (SDF) graphs are a convenient way to represent many signal processing and dataflow operations. Nodes within SDF graphs represent computation while arcs represent dependencies between nodes. Using a graph representation, SDF graphs formally specify a dataflow algorithm without any assumptions on the final implementation. This allows an SDF model to be synthesized into a variety of implementation techniques including both software and hardware. This thesis presents a technique for generating an abstract hardware representation from SDF models. The techniques presented here operate on SDF models defined structurally within the Ptolemy modeling environment. The behavior of the nodes within Ptolemy SDF models is specified in software and can be simple, such as a single arithmetic operation, or arbitrarily complex. This thesis presents a technique for extracting the behavior of a limited class of SDF nodes defined in software and generating a structural description of the SDF model based on primitive arithmetic and logical operations. This synthesized graph can be used for subsequent hardware synthesis transformations.
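As background on why SDF graphs can be analysed before any implementation is chosen, the sketch below computes a repetition vector from the standard SDF balance equations: for an arc on which the source produces p tokens per firing and the destination consumes c, the firing counts must satisfy r_src * p = r_dst * c. This is a generic illustration, not the synthesis technique of the thesis and not Ptolemy code; the `repetition_vector` function and the arc tuple format are hypothetical, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

def repetition_vector(arcs):
    """Smallest positive firing counts that leave every arc's buffer unchanged.

    arcs: list of (src, produced, dst, consumed) tuples (hypothetical format).
    Assumes a connected graph; raises ValueError if the rates are inconsistent.
    """
    rates = {arcs[0][0]: Fraction(1)}  # fix one node, derive the rest
    changed = True
    while changed:
        changed = False
        for src, p, dst, c in arcs:
            if src in rates and dst not in rates:
                rates[dst] = rates[src] * p / c
                changed = True
            elif dst in rates and src not in rates:
                rates[src] = rates[dst] * c / p
                changed = True
            elif src in rates and dst in rates:
                # Both ends known: the balance equation must hold.
                if rates[src] * p != rates[dst] * c:
                    raise ValueError("inconsistent rates: no periodic schedule")
    # Scale the rational solution to the smallest integer one.
    scale = lcm(*(r.denominator for r in rates.values()))
    return {node: int(r * scale) for node, r in rates.items()}

# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
arcs = [("A", 2, "B", 3), ("B", 1, "C", 2)]
print(repetition_vector(arcs))
```

In one period A fires three times and produces six tokens, which B consumes in two firings; B's two output tokens are consumed by a single firing of C, so every buffer returns to its initial state.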
79

Automating dataflow for a machine learning algorithm

Gunneström, Albert, Bauer, Erik January 2019
Machine learning algorithms can be used to predict the future demand for heat in buildings. District heating plants can use such predictions as a basis for deciding an appropriate heat output. This project is based on an existing machine learning model that uses temperature data and the previous heat demand as input data. To be useful for heating plant operators, the model has to make new predictions and display the results continuously. In this project a program was developed that automatically collects input data, feeds the data to the machine learning model and displays the predicted heat demand in a graph. One of the input data sources does not always provide reliable data, so missing data has to be approximated to ensure that the program runs continuously and robustly. The result is a program that runs continuously, but with some constraints on the input data: the input data must provide at least some correct values within the last two days for the program to keep running. A comparison between the calculated predictions and the actual measured heat demand showed that the predictions were in general higher than the actual values. Some possible causes and solutions were identified but are left for future work.
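The abstract does not say how missing values are approximated, so the following is only one possible sketch of the constraint it describes. It assumes hourly readings with `None` marking a missing value, fills gaps by carrying the last valid reading forward, and gives up when no valid value exists within the last 48 readings. The `fill_missing` function and the `max_gap` parameter are hypothetical.

```python
def fill_missing(readings, max_gap=48):
    """Replace None entries with the most recent valid reading.

    readings: list of floats or None (assumed hourly, oldest first).
    max_gap: longest run of missing values tolerated (assumption: 48 hours,
             matching the two-day window mentioned in the abstract).
    Raises ValueError when no valid reading is available within the window.
    """
    filled = []
    last_valid = None
    gap = 0
    for value in readings:
        if value is None:
            gap += 1
            if last_valid is None or gap > max_gap:
                raise ValueError("no valid reading within the allowed window")
            filled.append(last_valid)  # carry the last valid value forward
        else:
            last_valid = value
            gap = 0
            filled.append(value)
    return filled

print(fill_missing([4.0, None, None, 5.5, None]))
```

Carrying the last value forward is the simplest choice; interpolation between neighbouring valid readings would be an alternative when the data is processed after the fact rather than continuously.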
80

HyFlow: A High Performance Distributed Software Transactional Memory Framework

Saad Ibrahim, Mohamed Mohamed 14 June 2011
We present HyFlow, a distributed software transactional memory (D-STM) framework for distributed concurrency control. Lock-based concurrency control suffers from drawbacks including deadlocks, livelocks, and scalability and composability challenges. These problems are exacerbated in distributed systems, where their distributed versions (e.g., distributed deadlocks) are more complex to cope with. STM and D-STM are promising alternatives to lock-based and distributed lock-based concurrency control for centralized and distributed systems, respectively, that overcome these difficulties. HyFlow is a Java framework for D-STM, with pluggable support for directory lookup protocols, transactional synchronization and recovery mechanisms, contention management policies, cache coherence protocols, and network communication protocols. HyFlow exports a simple distributed programming model that excludes locks: using (Java 5) annotations, atomic sections are defined as transactions, in which reads and writes to shared, local and remote objects appear to take effect instantaneously. No changes are needed to the underlying virtual machine or compiler. We describe HyFlow's architecture and implementation, and report on experimental studies comparing HyFlow against competing models including Java remote method invocation (RMI) with mutual exclusion and read/write locks, distributed shared memory (DSM), and directory-based D-STM. / Master of Science
