21

Máquina de cláusulas : arquitetura e modelo de execução de cláusulas Prolog / Clause machines : architecture and prolog clauses execution model

Bins Filho, Jose Carlos January 1990 (has links)
This work defines an execution model for Prolog clauses, based on the abstract model of clause machines, and the design of a parallel architecture that supports the proposed model. Some background on logic languages and Prolog machines is also introduced, since these topics are closely related to both the proposed model and the proposed architecture. The execution model defines a representation for the elements of the abstract model (predicates, arcs, and clauses) and a set of algorithms that make the model operational, so that both the parallelism and the concurrency inherent in the abstract model are fully exploited. The architecture proposal first discusses some options for the basic architecture and then describes the chosen architecture, both at the block level and at the level of its main components: the memory interface, the processor, and the interconnection network. For each component, the main instructions are described along with the algorithms that implement them. Together with the description of the architecture, a data structure that implements the representation defined in the execution model is specified, as is the unification algorithm that traverses the proposed structure. For validation, the maximum bandwidth achieved by the proposed architecture is calculated, based on the unification algorithm described. The performance gain of the proposed architecture over a single processor is also evaluated, and the chosen number of processors is justified by comparing the performance of the proposed architecture with that achieved by larger and smaller sets of processors. Finally, the conclusion comments on the goals achieved and on possible extensions of this work.
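
The thesis defines its own unification algorithm over the proposed data structure; purely as an illustration of what syntactic unification involves, here is a minimal Python sketch of textbook Robinson-style unification over toy terms. The term encoding and names are invented here (and the occurs check is omitted for brevity); this is not the thesis's representation.

```python
# Minimal sketch of syntactic unification over Prolog-like terms.
# Terms: variables are strings starting with an uppercase letter;
# compound terms and atoms are (functor, [args...]) tuples.
# Illustrative toy only; the occurs check is omitted.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings until a non-variable or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return a substitution unifying a and b, or None on failure."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    (fa, args_a), (fb, args_b) = a, b
    if fa != fb or len(args_a) != len(args_b):
        return None  # clash: different functor or arity
    for x, y in zip(args_a, args_b):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst

# Example: unify f(X, b) with f(a, Y) -> {'X': ('a', []), 'Y': ('b', [])}
print(unify(('f', ['X', ('b', [])]), ('f', [('a', []), 'Y'])))
```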
22

Contribuição ao controle experimental de robôs paralelos. / Contribution to the experimental control of parallel robots.

Vítor Neves Hartmann 06 July 2018 (has links)
Global spending on innovation is growing steadily, with an increasing share going to areas such as computing, electronics, healthcare, and the automotive and industrial sectors. Unconventional, asymmetric mechanisms, such as the one studied in this work, require further investigation, including standardized testing. To contribute to this scenario, the analysis of a robotic system with a parallel architecture, intended for multi-purpose experiments, is proposed. This work covers the construction of a parallel machine with an asymmetric topology, its control, and the collection of information about this new architecture. Its construction is divided into five interrelated subsystems: mechanical, electrical, actuation, control, and interface. Seven control strategies were compared according to four criteria: tracking accuracy, stability range, dispersion of the paths over different time periods, and energy consumption. The results were generated by recording the trajectory of the machine's end-effector on a sheet of paper; the resulting curves were digitized and compared. The results showed that dynamic controllers can make the machine behave appropriately, even at speeds higher than those observed with the decentralized PID controller. The main challenge in this particular work was the low value of the smallest damped natural frequency, which resulted in low control efforts. In decreasing order, the best results were achieved by the decentralized PID, feedforward computed-torque control, and feedforward sliding-mode control.
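
As a hedged illustration of the simplest strategy in the comparison, a decentralized PID controller runs one independent loop per actuated joint, ignoring the coupled dynamics that the computed-torque and sliding-mode controllers account for. The gains, loop rate, and interfaces in this Python sketch are placeholders, not the thesis's tuned values.

```python
# Minimal sketch of a decentralized PID controller: one independent
# PID loop per actuated joint; each joint's command depends only on
# that joint's own tracking error.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One controller per joint; gains here are placeholders, not tuned values.
dt = 0.001  # 1 kHz control loop
joints = [PID(kp=50.0, ki=5.0, kd=1.0, dt=dt) for _ in range(3)]

def control_step(q_desired, q_measured):
    return [c.update(qd, qm) for c, qd, qm in zip(joints, q_desired, q_measured)]

print(control_step([0.5, 0.2, -0.1], [0.0, 0.0, 0.0]))
```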
23

Gerenciamento de tags na arquitetura ChipCflow - uma máquina a fluxo de dados dinâmica / Tag management in ChipCflow architecture - a dynamic dataflow machine

Bruno de Abreu Silva 15 April 2011 (has links)
In recent years there has been a growing search for alternative software and architectures. This search is driven by advances in hardware technology, which must be complemented by innovations in design, testing, and verification methodologies if the technology is to be used effectively. Many alternative architectures and software systems adopt models that exploit the parallelism of applications, in contrast to the von Neumann model. Among high-performance alternative architectures is the dataflow architecture, in which program execution is driven by data availability, so parallelism is intrinsic to the nature of the system. The dataflow model thus expresses parallelism inherently, freeing the programmer from having to mark in the code the regions where parallelism should occur. Dataflow architectures have again become a research topic due to hardware advances, in particular the advances in Reconfigurable Computing and FPGAs (Field-Programmable Gate Arrays). The ChipCflow project is a tool for executing algorithms under the dynamic dataflow model on an FPGA. This work presents the format of ChipCflow's tagged tokens and the operators that manipulate token tags, together with their implementations, providing a proof of concept for these operators in the ChipCflow architecture.
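
In a dynamic dataflow machine, an operator fires only when tokens carrying matching tags have arrived on all of its inputs, the tag identifying the loop iteration or activation a token belongs to. The following Python sketch of a two-input matching unit illustrates that general mechanism only; the token format and names are assumptions for illustration, not ChipCflow's actual design.

```python
# Illustrative tagged-token matching for a two-input dataflow operator.
# A token carries a tag (e.g., iteration number) and a value; the operator
# fires only when both input ports hold tokens with the same tag.
from collections import namedtuple

Token = namedtuple("Token", ["tag", "value"])

class MatchingUnit:
    def __init__(self, op):
        self.op = op          # function applied when a tag pair completes
        self.waiting = {}     # tag -> partially filled [left, right]

    def receive(self, port, token):
        pair = self.waiting.setdefault(token.tag, [None, None])
        pair[port] = token.value
        if None not in pair:  # both operands present: fire
            del self.waiting[token.tag]
            return Token(token.tag, self.op(pair[0], pair[1]))
        return None           # still waiting for the partner token

# Tokens from different iterations may arrive out of order:
add = MatchingUnit(lambda a, b: a + b)
print(add.receive(0, Token(tag=2, value=10)))  # None (waiting)
print(add.receive(0, Token(tag=1, value=3)))   # None (waiting)
print(add.receive(1, Token(tag=2, value=5)))   # Token(tag=2, value=15)
print(add.receive(1, Token(tag=1, value=4)))   # Token(tag=1, value=7)
```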
25

Information representation on a universal neural Chip

Galluppi, Francesco January 2013 (has links)
How can science possibly understand the organ through which the Universe knows itself? The scientific method can be used to study how electro-chemical signals represent information in the brain. However, modelling it by simulating its structures and functions is a computation- and communication-intensive task. Whilst supercomputers offer great computational power, brain-scale models are challenging in terms of communication overheads and power consumption. Dedicated neural hardware can be used to enhance simulation performance, but it is often optimised for specific models. While performance and flexibility are desirable simulation features, there is no perfect modelling platform, and the choice is subordinate to the specific research question being investigated. In this context SpiNNaker constitutes a novel parallel architecture, with communication and memory accesses optimised for spike-based computation, permitting simulation of large spiking neural networks in real time. To exploit SpiNNaker's performance and reconfigurability fully, a neural network model must be translated from its conceptual form into data structures for a parallel system. This thesis presents a flexible approach to distributing and mapping neural models onto SpiNNaker, within the constraints introduced by its specialised architecture. The conceptual map underlying this approach characterizes the interaction between the model and the system: during the build phase the model is placed on SpiNNaker; at runtime, placement information mediates communication with devices and instrumentation for data analysis. Integration within the computational neuroscience community is achieved by interfaces to two domain-specific languages: PyNN and Nengo. The real-time, event-driven nature of the SpiNNaker platform is explored using address-event representation sensors and robots, performing visual processing using a silicon retina, and navigation on a robotic platform based on a cortical, basal ganglia and hippocampal place cells model. The approach has been successfully exploited to run models on all iterations of SpiNNaker chips and development boards to date, and demonstrated live in workshops and conferences.
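
PyNN, one of the two domain-specific interfaces mentioned above, lets a single model description target different backends by swapping an import. The following minimal sketch uses the modern PyNN 0.8-style API (the thesis-era API differed, and the backend module names here are the conventional ones), so treat it as an assumption-laden illustration rather than the thesis's code.

```python
# Minimal PyNN example (modern PyNN 0.8+ style); the same script can
# target another backend by changing the import, e.g. pyNN.spiNNaker.
import pyNN.nest as sim  # or: import pyNN.spiNNaker as sim

sim.setup(timestep=1.0)  # ms

# 100 Poisson spike sources driving 100 leaky integrate-and-fire neurons.
stimulus = sim.Population(100, sim.SpikeSourcePoisson(rate=20.0))
neurons = sim.Population(100, sim.IF_curr_exp())
sim.Projection(stimulus, neurons,
               sim.OneToOneConnector(),
               synapse_type=sim.StaticSynapse(weight=2.0, delay=1.0))

neurons.record("spikes")
sim.run(1000.0)          # simulate one second
spikes = neurons.get_data().segments[0].spiketrains
sim.end()
```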
26

Utilizing Heterogeneity in Manycore Architectures for Streaming Applications

Savas, Süleyman January 2017 (has links)
In the last decade, computer architectures have transitioned from single-core to manycore designs, driven by performance requirements and by limitations in power consumption and heat dissipation. The first manycores were homogeneous, consisting of a few identical cores, but the applications executed on them usually consist of several tasks that require different hardware resources to execute efficiently. We therefore believe that exploiting heterogeneity in manycores will increase the efficiency of these architectures in terms of both performance and power consumption. However, heterogeneous architectures are more challenging to develop, and the shift from identical cores to different types of cores specialized for different purposes increases the complexity of the architecture, and hence of software development and execution. Increasing the efficiency of hardware and software development requires new hardware design methods and software development tools. There is also a lack of knowledge about which kinds of heterogeneous manycore designs are most efficient for different applications, and about how those applications perform on current commercial manycores. This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and to facilitate the choice of architecture for software and hardware developers. It defines a taxonomy of manycore architectures based on the levels of heterogeneity they contain and discusses the benefits and drawbacks of each level. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture and finally presents a methodology for developing heterogeneous manycore architectures that target specific application domains. Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications: adding specialized hardware blocks to a core easily increases performance by 15x for the target application, while the core size grows by 40-50%, which can be optimized further. Further results show that dataflow languages, together with software development tools, reduce software development effort significantly (25-50%) while having only a small impact (2-17%) on performance. / HiPEC (High Performance Embedded Computing) / NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
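
CAL, the dataflow language evaluated above, describes computations as actors that fire actions when tokens are available on their input ports. As a loose illustration of that firing-rule model, rendered in Python with invented names rather than actual CAL or Cal2Many output:

```python
# Loose Python model of a CAL-style dataflow actor: it consumes tokens
# from input FIFOs and fires an action only when its firing rule is met.
from collections import deque

class SumActor:
    """Consumes one token from each input port and emits their sum."""
    def __init__(self):
        self.in_a, self.in_b, self.out = deque(), deque(), deque()

    def fireable(self):
        # Firing rule: at least one token available on each input port.
        return bool(self.in_a) and bool(self.in_b)

    def fire(self):
        self.out.append(self.in_a.popleft() + self.in_b.popleft())

# A trivial scheduler repeatedly fires the actor while its rule holds.
actor = SumActor()
actor.in_a.extend([1, 2, 3])
actor.in_b.extend([10, 20])
while actor.fireable():
    actor.fire()
print(list(actor.out))  # [11, 22] -- the third input token keeps waiting
```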
27

High Performance Applications for the Single-Chip Message-Passing Parallel Computer

Dickenson, William Wesley 05 May 2004 (has links)
Computer architects continue to push the limits of modern microprocessors. By using techniques such as out-of-order execution, branch prediction, and dynamic scheduling, designers have found ways to speed up execution. However, growing architectural complexity has led to unsustainable development and testing times, and shrinking feature sizes are increasing wire resistance and signal propagation delay, limiting a design's scalability. Indeed, the approach of exploiting instruction-level parallelism (ILP) within applications is reaching a point of diminishing returns. One answer to these challenges is the Single-Chip Message-Passing (SCMP) Parallel Computer, developed at Virginia Tech. SCMP is a unique, tiled architecture aimed at thread-level parallelism (TLP): identical cores are replicated across the chip, global wire traces are eliminated, and the nodes, each containing a local memory bank, are connected via a 2-D grid network. This thesis presents the design and analysis of three high-performance applications for SCMP. The results show that the architecture is a formidable competitor to several current systems. / Master of Science
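
SCMP nodes communicate by passing messages over the 2-D grid. As a rough analogy using a standard message-passing API (mpi4py's Cartesian topology, not SCMP's own network instructions), a nearest-neighbour exchange on a process grid looks like this:

```python
# Nearest-neighbour exchange on a 2-D process grid with mpi4py, as a
# rough analogy for message passing on SCMP's 2-D mesh (this uses MPI,
# not SCMP's network hardware). Run with: mpiexec -n 4 python grid_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Arrange the processes into a 2-D, non-periodic Cartesian grid.
dims = MPI.Compute_dims(comm.Get_size(), 2)
cart = comm.Create_cart(dims, periods=[False, False])

# Each node sends its rank to its east neighbour and receives from the
# west; edge nodes get MPI.PROC_NULL and receive nothing.
west, east = cart.Shift(direction=1, disp=1)
received = cart.sendrecv(cart.Get_rank(), dest=east, source=west)
print(f"rank {cart.Get_rank()} at {cart.Get_coords(cart.Get_rank())} "
      f"received {received}")
```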
28

Transforming and Optimizing Irregular Applications for Parallel Architectures

Zhang, Jing 12 February 2018 (has links)
Parallel architectures, including multi-core processors, many-core processors, and multi-node systems, have become commonplace, as it is no longer feasible to improve single-core performance through increasing its operating clock frequency. Furthermore, to keep up with the exponentially growing desire for more and more computational power, the number of cores/nodes in parallel architectures has continued to dramatically increase. On the other hand, many applications in well-established and emerging fields, such as bioinformatics, social network analysis, and graph processing, exhibit increasing irregularities in memory access, control flow, and communication patterns. While multiple techniques have been introduced into modern parallel architectures to tolerate these irregularities, many irregular applications still execute poorly on current parallel architectures, as their irregularities exceed the capabilities of these techniques. Therefore, it is critical to resolve irregularities in applications for parallel architectures. However, this is a very challenging task, as the irregularities are dynamic, and hence, unknown until runtime. To optimize irregular applications, many approaches have been proposed to improve data locality and reduce irregularities through computational and data transformations. However, there are two major drawbacks in these existing approaches that prevent them from achieving optimal performance. First, these approaches use local optimizations that exploit data locality and regularity locally within a loop or kernel. However, in many applications, there is hidden locality across loops or kernels. Second, these approaches use "one-size-fits-all" methods that treat all irregular patterns equally and resolve them with a single method. However, many irregular applications have complex irregularities, which are mixtures of different types of irregularities and need differentiated optimizations. To overcome these two drawbacks, we propose a general methodology that includes a taxonomy of irregularities to help us analyze the irregular patterns in an application, and a set of adaptive transformations to reorder data and computation based on the characteristics of the application and architecture. By extending our adaptive data-reordering transformation on a single node, we propose a data-partitioning framework to resolve the load imbalance problem of irregular applications on multi-node systems. Unlike existing frameworks, which use "one-size-fits-all" methods to partition the input data by a single property, our framework provides a set of operations to transform the input data by multiple properties and generates the desired data-partitioning codes by composing these operations into a workflow. / Ph. D.
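
As a hedged sketch of the composable-operations idea described above, with operation names and the record format invented for illustration rather than taken from the framework:

```python
# Illustrative sketch of composing data-partitioning operations into a
# workflow, in the spirit described above; names are invented here.

def sort_by(key):
    def op(records):
        return sorted(records, key=key)
    return op

def chunk_even(n_parts):
    def op(records):
        # Split a sorted list into n contiguous, nearly equal chunks.
        k, r = divmod(len(records), n_parts)
        parts, start = [], 0
        for i in range(n_parts):
            end = start + k + (1 if i < r else 0)
            parts.append(records[start:end])
            start = end
        return parts
    return op

def workflow(*ops):
    def run(records):
        out = records
        for op in ops:
            out = op(out)
        return out
    return run

# Partition by two properties -- a locality hint and a per-record load --
# then split into 4 balanced parts for 4 nodes.
records = [{"id": i, "degree": (7 * i) % 13, "block": i % 3} for i in range(10)]
partition = workflow(sort_by(lambda r: (r["block"], -r["degree"])),
                     chunk_even(4))
for part in partition(records):
    print([r["id"] for r in part])
```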
29

Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing

Helal, Ahmed Elmohamadi Mohamed 29 January 2020 (has links)
In the last decade, there have been tectonic shifts in computer hardware as sequential CPU performance has reached its physical limits. As a consequence, current high-performance computing (HPC) systems integrate a wide variety of compute resources with different capabilities and execution models, ranging from multi-core CPUs to many-core accelerators. While such heterogeneous systems can enable dramatic acceleration of user applications, extracting optimal performance via manual analysis and optimization is a complicated and time-consuming process. This dissertation presents graph-structured program representations to reason about the performance bottlenecks on modern HPC systems and to guide novel automation frameworks for performance analysis, modeling, and runtime adaptation. The proposed program representations exploit domain knowledge and capture the inherent computation and communication patterns in user applications, at multiple levels of computational granularity, via compiler analysis and dynamic instrumentation. The empirical results demonstrate that the introduced modeling frameworks accurately estimate the realizable parallel performance and scalability of a given sequential code when ported to heterogeneous HPC systems. As a result, these frameworks enable efficient workload distribution schemes that utilize all the available compute resources in a performance-proportional way. In addition, the proposed runtime adaptation frameworks significantly improve the end-to-end performance of important real-world applications which suffer from limited parallelism and fine-grained data dependencies. Specifically, compared to the state-of-the-art methods, such an adaptive parallel execution achieves up to an order-of-magnitude speedup on the target HPC systems while preserving the inherent data dependencies of user applications. / Doctor of Philosophy / Current supercomputers integrate a massive number of heterogeneous compute units with varying speed, computational throughput, memory bandwidth, and memory access latency. This trend represents a major challenge to end users, as their applications have been designed from the ground up to primarily exploit homogeneous CPUs. While heterogeneous systems can deliver several orders of magnitude speedup compared to traditional CPU-based systems, end users need extensive software and hardware expertise as well as significant time and effort to efficiently utilize all the available compute resources. To streamline such a daunting process, this dissertation presents automated frameworks for analyzing and modeling performance on parallel architectures and for transforming the execution of user applications at runtime. The proposed frameworks incorporate domain knowledge and adapt to the input data and the underlying hardware using novel static and dynamic analyses. The experimental results show the efficacy of the introduced frameworks across many important application domains, such as computational fluid dynamics (CFD) and computer-aided design (CAD). In particular, the adaptive execution approach on heterogeneous systems achieves up to an order-of-magnitude speedup over the optimized parallel implementations.
30

Etude et optimisation d'algorithmes pour le suivi d'objets couleur / Analysis and optimisation of algorithms for color object tracking

Laguzet, Florence 27 September 2013 (has links)
This thesis focuses on improving and optimizing the Mean-Shift color object tracking algorithm, both in terms of tracking robustness and from an architectural point of view, to improve execution speed. The first part of the work improves the robustness of the tracking: the impact of the color space representation on tracking quality is studied, and a method is proposed for selecting the color space that best represents the object to be tracked. Since the target's environment changes over time, a strategy is put in place to reselect a color space at the appropriate moment. To improve robustness on particularly difficult sequences, the Mean-Shift tracker with this selection strategy is coupled with another, more computationally expensive algorithm: covariance tracking. The objective of this work is a complete system running in real time on multi-core SIMD processors. A study and optimization phase was therefore carried out to make the algorithms' complexity configurable, so that they can run in real time on different platforms and for different image and tracked-object sizes. With this speed/accuracy trade-off, real-time tracking becomes possible even on ARM Cortex-A9 class processors.
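
For context, the baseline Mean-Shift color-tracking loop that the thesis builds on can be assembled from OpenCV's stock primitives. This generic hue-histogram sketch assumes an input video file and initial window, and does not implement the thesis's color-space selection strategy or its SIMD optimizations.

```python
# Generic Mean-Shift color tracker using OpenCV's built-in primitives
# (a baseline like the one the thesis optimizes). The file name and
# initial window below are assumptions for illustration.
import cv2

cap = cv2.VideoCapture("input.mp4")          # assumed input video
ok, frame = cap.read()
x, y, w, h = 300, 200, 80, 80                # assumed initial window

# Build a hue histogram of the object from the first frame.
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, (0, 60, 32), (180, 255, 255))
hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
window = (x, y, w, h)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Back-project the model histogram, then shift the window to the mode.
    prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    _, window = cv2.meanShift(prob, window, criteria)
    x, y, w, h = window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:         # Esc to quit
        break
```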
