1. Systematic construction and mapping of parallel programs
Grant-Duff, Zulena Noemi, January 1997
No description available.

2. An investigation into the potential of Wafer-scale associative string processors
Sheridan, Norman Gerald, January 1992
No description available.

3. High speed image processing system using parallel DSPs
Kshirsagar, Shirish Purushottam, January 1994
No description available.

4. A structural approach to the mapping problem in parallel discrete event logic simulations
Davoren, Mark, January 1989
No description available.

5. The art of active memory
Merrall, Simon C., January 1994
No description available.

6. The use of libraries for numerical computation in distributed memory MIMD systems
Beattie, Bridget Joan Healy, January 1997
No description available.

7. Personalized Computer Architecture as Contextual Partitioning for Speech Recognition
Kent, Christopher Grant, 22 January 2010
Computing is entering an era of hundreds to thousands of processing elements per chip, yet no known form of parallelism scales to that degree. To address this problem, we investigate the foundation of a computer architecture in which processing elements and memory are contextually partitioned based upon facets of a user's life. Contextual Partitioning (CP), the situational handling of inputs, allocates resources in a way that is novel compared with the approaches used in today's architectures. Instead of focusing components on mutually exclusive parts of a task, as in Thread Level Parallelism, CP assigns different physical components to different versions of the same task, defining versions by contextual distinctions in device usage. Application data is thus processed differently depending on the situation of the user. Further, partitions may be user-specific, leading to personalized architectures. Our focus is mobile devices, which are, or can be, personalized to one owner. Our investigation centers on leveraging CP for accurate, real-time speech recognition on mobile devices that scales to large vocabularies, a highly desired capability for future user interfaces. By contextually partitioning a vocabulary and training the partitions as separate acoustic models with SPHINX, we demonstrate a maximum error reduction of 61% compared to a unified approach. CP also yields systems robust to changes in vocabulary, requiring up to 97% less training when updating old vocabulary entries with new words and incurring fewer errors from the replacement. Finally, CP has the potential to scale nearly linearly with increasing core counts, offering architectures effective with future processor designs. / Master of Science
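
The mechanism this abstract describes, routing each input to a context-specific model rather than one unified decoder, can be sketched briefly. Below is a minimal illustration with hypothetical class and context names, not the thesis's SPHINX-based implementation; it shows only how contextual partitioning splits one recognition task into per-context resources.

```python
from dataclasses import dataclass

@dataclass
class AcousticModel:
    """Stand-in for a context-specific acoustic model (the thesis trains
    these as separate SPHINX models; this class is purely illustrative)."""
    context: str
    vocabulary: set

    def recognize(self, audio_frames):
        # A real implementation would decode against this partition's
        # smaller, context-specific vocabulary.
        return f"<decoded with {self.context!r} model>"

class ContextualPartitioner:
    """Routes each utterance to the model trained for the user's current
    context, instead of decoding against one large unified vocabulary."""

    def __init__(self, models):
        self.models = {m.context: m for m in models}

    def recognize(self, audio_frames, user_context):
        model = self.models.get(user_context)
        if model is None:
            raise KeyError(f"no partition trained for context {user_context!r}")
        return model.recognize(audio_frames)

# Hypothetical contexts drawn from facets of a user's life.
partitioner = ContextualPartitioner([
    AcousticModel("work", {"meeting", "deadline", "budget"}),
    AcousticModel("home", {"dinner", "lights", "music"}),
])
print(partitioner.recognize(audio_frames=[], user_context="home"))
```

Because each partition is independent, partitions can in principle be pinned to separate cores, which is where the near-linear scaling potential comes from.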

8. Towards higher speed decoding of convolutional turbocodes
Sanchez Gonzalez, Oscar David, 15 March 2013
Turbo codes are a well-known channel coding technique, widely used because of their outstanding error-correction performance close to the Shannon limit. These codes were proposed using a clever pragmatic approach in which a set of previously introduced concepts, together with iterative processing of the data, are successfully combined to obtain close-to-optimal decoding performance. However, precisely because of this iterative processing, latency is high and the achievable decoder throughput is limited. At the beginning of our research activities, the fastest turbo decoder architecture in the literature achieved a peak throughput of around 700 Mbit/s, and several other works proposed architectures achieving around 100 Mbit/s. Research opportunities therefore existed to establish architectural solutions that enable decoding at a few Gbit/s, so that industrial requirements can be fulfilled and future high-performance digital communication systems conceived. The first part of this work is devoted to the study of turbo codes at the algorithmic level. Several SISO decoder algorithms are explored, and different parallel turbo decoding techniques are analyzed, with special attention to the convergence of parallel turbo decoders. To this end, EXtrinsic Information Transfer (EXIT) charts are used; conclusions drawn from these diagrams serve to propose a novel SISO decoder schedule for shuffled turbo decoder architectures. The architectural issues in implementing highly parallel turbo decoders are considered in the second part of this thesis. We propose a high-throughput, low-complexity radix-16 SISO decoder, intended to break the bottleneck caused by the recursive operations at the heart of the turbo decoding algorithm. The design of this architecture is made possible by the elimination of parallel paths in a radix-16 trellis transition. The proposed SISO decoder implements a high-speed radix-8 Add-Compare-Select (ACS) unit, which exhibits lower hardware complexity and a shorter critical path than a radix-16 ACS unit. Because our radix-16 SISO decoder degrades the error-correcting performance of the turbo decoder, we propose two techniques that make the architecture usable in practical applications. Architectural solutions for building highly parallel turbo decoders that integrate our SISO decoder are then presented. Finally, a methodology to efficiently explore the design space of parallel turbo decoder architectures is described; its main objective is to reduce time to market by designing turbo decoder architectures for a given target throughput.
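
The "recursive operations at the heart of the turbo decoding algorithm" are the Add-Compare-Select (ACS) updates of the state-metric recursions: each trellis stage depends on the previous one, so the recursion resists pipelining, and a radix-2^k design such as the radix-16 decoder above collapses k consecutive trellis steps into a single ACS step. Below is a minimal radix-2 sketch of one forward (alpha) update in the max-log-MAP style, using an assumed list-based trellis representation rather than any hardware description.

```python
def acs_step(alpha, transitions):
    """One Add-Compare-Select step of the forward (alpha) recursion.

    alpha       -- current path metric per trellis state
    transitions -- for each next state, a list of (prev_state, branch_metric)
    Returns the path metrics for the next trellis stage.
    """
    next_alpha = []
    for incoming in transitions:
        # ADD each branch metric to its source metric, COMPARE, SELECT best.
        next_alpha.append(max(alpha[s] + gamma for s, gamma in incoming))
    # Normalise so metrics stay bounded over long blocks.
    m = max(next_alpha)
    return [a - m for a in next_alpha]

# Two-state example: each next state has two incoming branches.
alpha = [0.0, -1.2]
transitions = [
    [(0, +0.5), (1, -0.3)],  # branches entering state 0
    [(0, -0.5), (1, +0.3)],  # branches entering state 1
]
print(acs_step(alpha, transitions))  # -> [0.0, -1.0]
```

A radix-16 unit performs the equivalent of four consecutive acs_step calls at once, which is why eliminating parallel paths in the merged trellis transition matters for hardware complexity and critical path.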

9. Information representation on a universal neural chip
Galluppi, Francesco, January 2013
How can science possibly understand the organ through which the Universe knows itself? The scientific method can be used to study how electro-chemical signals represent information in the brain. However, modelling it by simulating its structures and functions is a computation- and communication-intensive task. Whilst supercomputers offer great computational power, brain-scale models are challenging in terms of communication overheads and power consumption. Dedicated neural hardware can be used to enhance simulation performance, but it is often optimised for specific models. While performance and flexibility are desirable simulation features, there is no perfect modelling platform, and the choice is subordinate to the specific research question being investigated. In this context SpiNNaker constitutes a novel parallel architecture, with communication and memory accesses optimised for spike-based computation, permitting simulation of large spiking neural networks in real time. To exploit SpiNNaker's performance and reconfigurability fully, a neural network model must be translated from its conceptual form into data structures for a parallel system. This thesis presents a flexible approach to distributing and mapping neural models onto SpiNNaker, within the constraints introduced by its specialised architecture. The conceptual map underlying this approach characterises the interaction between the model and the system: during the build phase the model is placed on SpiNNaker; at runtime, placement information mediates communication with devices and instrumentation for data analysis. Integration within the computational neuroscience community is achieved by interfaces to two domain-specific languages: PyNN and Nengo. The real-time, event-driven nature of the SpiNNaker platform is explored using address-event representation sensors and robots: visual processing is performed with a silicon retina, and navigation on a robotic platform is driven by a model of cortex, basal ganglia and hippocampal place cells. The approach has been successfully exploited to run models on all iterations of SpiNNaker chips and development boards to date, and has been demonstrated live in workshops and conferences.
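
The build phase described here, translating a conceptual network into data structures placed on cores, can be illustrated with a toy placement routine. This is a hedged sketch with invented function names and an assumed per-core neuron budget; the real SpiNNaker toolchain and its PyNN/Nengo interfaces are considerably more involved.

```python
def partition_population(population_size, neurons_per_core):
    """Split one neural population into per-core slices, the kind of
    build-phase step performed before loading a model onto the machine."""
    slices = []
    for start in range(0, population_size, neurons_per_core):
        end = min(start + neurons_per_core, population_size)
        slices.append((start, end))
    return slices

def place(slices, cores):
    """Assign each slice to a (chip_x, chip_y, core) coordinate. The
    mapping is retained so runtime traffic can be related back to the
    neurons of the conceptual model."""
    if len(slices) > len(cores):
        raise ValueError("model needs more cores than are available")
    return dict(zip(slices, cores))

# Assumed 2x2 chip grid with 16 application cores per chip.
cores = [(x, y, c) for x in range(2) for y in range(2) for c in range(16)]
mapping = place(partition_population(1000, 256), cores)
for (lo, hi), (cx, cy, core) in mapping.items():
    print(f"neurons {lo}-{hi - 1} -> chip ({cx},{cy}), core {core}")
```

The returned mapping plays the role of the placement information the thesis keeps at runtime to mediate communication with devices and data-analysis instrumentation.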

10. Máquina de cláusulas: arquitetura e modelo de execução de cláusulas Prolog / Clause machines: architecture and Prolog clause execution model
Bins Filho, Jose Carlos, January 1990
This work defines an execution model for Prolog clauses, starting from the abstract Clause Machine model, and proposes a parallel architecture that supports the proposed model. Some aspects of logic languages and Prolog machines are also introduced, since these elements are closely related to both the model and the architecture proposed. In the execution model proposal, a representation for the elements of the abstract model (predicates, arcs and clauses) is defined, together with a set of algorithms that make the model operational, so that both the parallelism and the concurrency inherent in the abstract model can be fully exploited. In the architecture proposal, some options for the basic architecture are first discussed; the chosen architecture is then described at the block level as well as at the level of its main components, namely the memory interface, the processor and the interconnection network. For each of these components, the main instructions are described and the algorithms that implement them are presented. Together with the description of the architecture, a data structure that implements the representation defined in the execution model is specified, as is the unification algorithm that traverses the proposed structure. In the validation, the maximum bandwidth achieved by the proposed architecture is calculated, based on the unification algorithm described. The performance gain of the proposed architecture relative to a single processor is also evaluated, and the chosen number of processors is justified by comparing the performance of the proposed architecture with that of larger and smaller sets of processors. Finally, the conclusion comments on the goals achieved and on possible extensions to this work.
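
Since the validation above is driven by a unification algorithm traversing the clause representation, a reference point is useful. The sketch below is a generic Robinson-style unifier over tuple-encoded terms (occurs check omitted for brevity); it is not the thesis's data structure or its parallel formulation.

```python
def walk(term, subst):
    """Follow variable bindings until an unbound term is reached."""
    while isinstance(term, str) and term.startswith("?") and term in subst:
        term = subst[term]
    return term

def unify(t1, t2, subst=None):
    """Unify two Prolog-like terms. Variables are strings starting with
    '?'; compound terms are tuples such as ('likes', '?X', 'mary').
    Returns an extended substitution, or None on failure."""
    if subst is None:
        subst = {}
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith("?"):
        return {**subst, t1: t2}
    if isinstance(t2, str) and t2.startswith("?"):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

print(unify(('likes', '?X', 'mary'), ('likes', 'john', '?Y')))
# -> {'?X': 'john', '?Y': 'mary'}
```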
