491

Efficient Multi-ported Memories for FPGAs

LaForest, Charles Eric 15 February 2010
Multi-ported memories are challenging to implement on FPGAs since the provided block RAMs typically have only two ports. In this dissertation we present a thorough exploration of the design space of FPGA multi-ported memories, evaluating conventional solutions and introducing a new design that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches. For example, we build a 256-location, 32-bit, 12-ported (4-write, 8-read) memory that operates at 281 MHz on Altera Stratix III FPGAs while consuming an area equivalent to 3679 ALMs: a 43% speed improvement and 84% area reduction over a pure ALM implementation, and a 61% speed improvement over a pure "multipumped" implementation, although the multipumped implementation is 7.2-fold smaller.
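The abstract leaves the mechanism unstated; one established way to compose two-ported block RAMs into an m-write, n-read memory is to give each write port its own bank, replicate banks per read port, and keep a "live value table" (LVT) that records which bank last wrote each address. The Python model below is a minimal behavioral sketch of that idea, in software rather than HDL, and is not presented as the thesis's exact design.

```python
# Behavioral model of the LVT composition idea: one bank per write port,
# plus a live value table steering each read to the bank that holds the
# newest value. Real designs would also replicate each bank once per
# read port so reads never conflict; that is elided here.

class LVTMultiPortedMemory:
    def __init__(self, depth, n_write_ports):
        self.banks = [[0] * depth for _ in range(n_write_ports)]
        self.lvt = [0] * depth  # which bank wrote each address last

    def write(self, port, addr, value):
        self.banks[port][addr] = value
        self.lvt[addr] = port   # steer future reads to this bank

    def read(self, addr):
        return self.banks[self.lvt[addr]][addr]

mem = LVTMultiPortedMemory(depth=256, n_write_ports=4)
mem.write(0, 42, 0xCAFE)
mem.write(3, 42, 0xBEEF)    # a later write on another port wins
assert mem.read(42) == 0xBEEF
```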
492

Research in Network on Chip Architectures

Čepaitis, Modestas 16 August 2007
According to the International Technology Roadmap for Semiconductors (ITRS), before the end of this decade we will enter the era of a billion transistors on a single chip: a 50–100 nm chip comprising around 4 billion transistors operating at 10 GHz. Systems of this scale are hard to build; as the transistor count grows, so do die size, chip complexity, and the difficulty of integrating the various components. Moreover, today's systems-on-chip communicate over shared buses that cannot be subdivided, so the current interconnect structure scales poorly and becomes an ever greater obstacle for chips of billions of transistors. The Network on Chip (NoC) paradigm was proposed to address this problem: NoCs offer a communication infrastructure that copes with increasing design complexity and shrinking time-to-market, and they may well become the preferred interconnection approach for SoCs developed in the near future. This work applies reuse technologies to create a generic NoC component, examining the impact of component-reuse methods for parameterizing and testing such systems. A preprocessor-based reuse method is used to create a router metaspecification for a 2D torus interconnect network.
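The abstract fixes the topology (a 2D torus) but not the routing algorithm; dimension-order routing with wraparound links is the canonical minimal scheme for that network, sketched below in Python. The torus_step helper and its signature are illustrative assumptions, not part of the thesis.

```python
# Dimension-order (X-then-Y) routing on a 2D torus: route fully in X,
# then in Y, taking the wraparound link whenever it is the shorter way.

def torus_step(src, dst, width, height):
    """Return the next hop from src toward dst on a width x height torus."""
    (sx, sy), (dx, dy) = src, dst
    if sx != dx:
        fwd = (dx - sx) % width            # hops going "east"
        step = 1 if fwd <= width - fwd else -1
        return ((sx + step) % width, sy)
    if sy != dy:
        fwd = (dy - sy) % height
        step = 1 if fwd <= height - fwd else -1
        return (sx, (sy + step) % height)
    return src  # already at destination

# On a 4x4 torus, (0,0) -> (3,0) takes one wraparound hop west
# instead of three hops east.
assert torus_step((0, 0), (3, 0), width=4, height=4) == (3, 0)
```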
493

Compiling for a Multithreaded Horizontally-microcoded Soft Processor Family

Tili, Ilian 28 November 2013
Soft processing engines make FPGA programming simpler for software programmers. TILT is a multithreaded soft processing engine that contains multiple deeply pipelined functional units of varying latency. In this thesis, we present a compiler framework for compiling and scheduling for TILT. By using the compiler to generate schedules and manage the hardware, we create computationally dense designs (high throughput per unit of hardware area) that make compelling processing engines. High schedule density is achieved by mixing instructions from different threads and by prioritizing the longest paths of data-flow graphs. Averaged across benchmark kernels, we achieve 90% of the theoretical throughput, and reduce the performance gap relative to custom hardware from 543x for a scalar processor to only 4.41x by replicating TILT cores up to a comparable area cost. We also present methods of quickly navigating the design space and predicting the area of hardware configurations.
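A minimal sketch of the scheduling policy the abstract describes: each cycle, issue the data-ready instruction with the longest remaining dependence path, drawing from all threads so that one thread's long-latency operations are hidden by another's work. The one-slot-per-cycle machine model and all names below are illustrative assumptions, not TILT's actual microcode format.

```python
# Cycle-driven list scheduling across threads, prioritized by the
# latency-weighted critical path of each op's data-flow graph.

def critical_path(ops, succs, latency):
    """Longest latency-weighted path from each op to the end of its
    data-flow graph; `ops` must be topologically ordered."""
    dist = {}
    for op in reversed(ops):
        dist[op] = latency[op] + max((dist[s] for s in succs[op]), default=0)
    return dist

def schedule(threads, preds, succs, latency):
    ops = [op for t in threads for op in t]
    prio = critical_path(ops, succs, latency)
    finish, out, cycle = {}, [], 0
    while len(finish) < len(ops):
        ready = [op for op in ops if op not in finish
                 and all(finish.get(p, 1 << 30) <= cycle for p in preds[op])]
        if ready:  # issue the most critical ready op this cycle
            op = max(ready, key=prio.get)
            finish[op] = cycle + latency[op]
            out.append((cycle, op))
        cycle += 1
    return out

# Thread B's op fills the latency gap of thread A's 3-cycle op.
threads = [["a1", "a2"], ["b1"]]
succs = {"a1": ["a2"], "a2": [], "b1": []}
preds = {"a1": [], "a2": ["a1"], "b1": []}
latency = {"a1": 3, "a2": 1, "b1": 1}
print(schedule(threads, preds, succs, latency))
# -> [(0, 'a1'), (1, 'b1'), (3, 'a2')]
```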
495

Scalability issues in distributed and parallel databases

Gottemukkala, Vibby January 1996
No description available.
496

Asynchronous events: tools for distributed programming on concurrent object-based systems

Menon, Sathis N. 12 1900
No description available.
497

A trace-driven simulation study of cache memories

Xiong, Bo January 1989
The purpose of this study is to explore the relationship between the hit ratio of a cache memory and its design parameters. Cache memories are widely used in computer system architectures to match relatively slow memories to fast CPUs. A cache holds the segments of a program that are currently in use; since instructions and data in the cache can be referenced much faster than those in main memory, caching permits the execution rate of the machine to be substantially increased. To function effectively, cache memories must be carefully designed and implemented. In this study, a trace-driven simulation of direct-mapped, fully associative, and set-associative cache memories is carried out. The simulation varies the fetch algorithm, placement policy, cache size, and other design parameters, and investigates the resulting effects on system performance. The caches are simulated in C, and the simulation results are analyzed to guide the design and implementation of cache memories.
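The thesis's simulator was written in C; the Python sketch below replays the same trace-driven idea with an LRU set-associative cache (associativity 1 degenerates to direct-mapped, and a single set to fully associative). Parameter names and the toy trace are assumptions for illustration.

```python
# Trace-driven cache simulation: replay a trace of memory addresses
# through an LRU set-associative cache and report the hit ratio.

from collections import OrderedDict

def hit_ratio(trace, num_sets, associativity, block_size):
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        block = addr // block_size
        s = sets[block % num_sets]
        if block in s:
            hits += 1
            s.move_to_end(block)          # refresh LRU position
        else:
            if len(s) >= associativity:   # evict least recently used
                s.popitem(last=False)
            s[block] = True
    return hits / len(trace)

# Toy trace: ten passes over 64 words; after the first pass fills the
# cache, every reference hits, so the ratio approaches 1.
trace = list(range(0, 256, 4)) * 10
print(hit_ratio(trace, num_sets=16, associativity=4, block_size=16))
```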
498

Understanding Multicore Performance: Efficient Memory System Modeling and Simulation

Sandberg, Andreas January 2014
To increase performance, modern processors employ complex techniques such as out-of-order pipelines and deep cache hierarchies. While the increasing complexity has paid off in performance, it has become harder to accurately predict the effects of hardware/software optimizations in such systems. Traditional microarchitectural simulators typically execute code 10 000×–100 000× slower than native execution, which leads to three problems. First, high simulation overhead makes it hard to use microarchitectural simulators for tasks such as software optimization, where rapid turn-around is required. Second, when multiple cores share the memory system, the resulting performance is sensitive to how memory accesses from the different cores interleave. This requires that applications be simulated multiple times with different interleavings to estimate their performance distribution, which is rarely feasible with today's simulators. Third, the high overhead limits the size of the applications that can be studied; this is usually solved by simulating only a relatively small number of instructions near the start of an application, at the risk of reporting unrepresentative results. In this thesis we demonstrate three strategies to accurately model multicore processors without the overhead of traditional simulation. First, we show how microarchitecture-independent memory access profiles can be used to drive automatic cache optimizations and to qualitatively classify an application's last-level cache behavior. Second, we demonstrate how high-level performance profiles, which can be measured on existing hardware, can be used to model the behavior of a shared cache. Unlike previous models, we predict the effective amount of cache available to each application and the resulting performance distribution due to different interleavings without requiring a processor model. Third, in order to model future systems, we build an efficient sampling simulator. By using native execution to fast-forward between samples, we reach new samples much faster than a single sample can be simulated. This enables us to simulate multiple samples in parallel, resulting in almost linear scalability and a maximum simulation rate close to native execution.
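A minimal sketch of the third strategy, under stated assumptions: fast-forward cheaply to each sample point, simulate a short window in detail, and run the independent samples in parallel. fast_forward() and simulate_window() are toy stand-ins for native-speed execution and a detailed microarchitectural model; neither reflects the thesis's actual implementation.

```python
# Sampled simulation: cheap fast-forwarding between sample points makes
# each sample independent, so samples can be simulated in parallel.

from concurrent.futures import ProcessPoolExecutor
from statistics import mean

def fast_forward(start):
    # Stand-in: native execution would run to `start` and checkpoint
    # the architectural state here.
    return {"pc": start}

def simulate_window(state, window):
    # Stand-in: a detailed model would simulate `window` instructions
    # and report cycles per instruction (CPI).
    return 1.0 + (state["pc"] % 7) * 0.1

def simulate_sample(args):
    start, window = args
    return simulate_window(fast_forward(start), window)

if __name__ == "__main__":
    starts = range(0, 10_000_000, 1_000_000)   # ten evenly spaced samples
    with ProcessPoolExecutor() as pool:        # independent samples scale
        cpis = list(pool.map(simulate_sample,  # almost linearly
                             [(s, 10_000) for s in starts]))
    print(f"estimated CPI: {mean(cpis):.2f}")
```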
499

Improving Device Driver Reliability through Decoupled Dynamic Binary Analyses

Ruwase, Olatunji O. 01 May 2013
Device drivers are Operating System (OS) extensions that enable the use of I/O devices in computing systems. However, studies have identified drivers as an Achilles' heel of system reliability: their high fault rate accounts for a significant portion of system failures. Consequently, significant effort has been directed towards improving system robustness by protecting system components (e.g., the OS kernel, I/O devices, etc.) from the harmful effects of driver faults. In contrast to prior techniques, which focused on preventing unsafe driver interactions (e.g., with the OS kernel), my thesis is that checking a driver's execution for correctness violations results in the detection and mitigation of more faults. To validate this thesis, I present Guardrail, a flexible and powerful framework that enables instruction-grained dynamic analysis (e.g., data race detection) of unmodified kernel-mode driver binaries to safeguard I/O operations and devices from driver faults. Guardrail decouples the analysis tool from driver execution to improve performance, and runs it in user space to simplify the deployment of new tools. Moreover, Guardrail leverages virtualization to be transparent to both the driver and the device, and to support arbitrary driver/device combinations. To demonstrate Guardrail's generality, I implemented three novel dynamic checking tools within the framework for detecting memory faults, data races, and DMA faults in drivers. These tools found 25 serious bugs, including previously unknown bugs, in Linux storage and network drivers. Some of the bugs existed in several Linux (and driver) releases, suggesting their elusiveness to existing approaches; Guardrail easily detected them using common driver workloads. Finally, I present an evaluation of using Guardrail to protect network and storage I/O operations from memory faults, data races, and DMA faults in drivers. The results show that with hardware-assisted logging to decouple the heavyweight analyses from driver execution, standard I/O workloads generally experienced negligible slowdown in their end-to-end performance. In conclusion, Guardrail's high-fidelity fault detection and efficient monitoring make it a promising approach for improving the resilience of computing systems to the wide variety of driver faults.
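A toy sketch of the decoupling idea, under stated assumptions: the instrumented driver only appends events to a log, and a separate analysis consumer replays the log to flag violations (here, a write to a DMA buffer after it was unmapped). Event names and the check are illustrative; they are not Guardrail's actual instrumentation or tools.

```python
# Decoupled dynamic analysis: cheap logging on the driver's path,
# heavyweight checking in a separate thread consuming the log.

import queue, threading

log = queue.Queue()

def driver_execution():
    # Instrumented driver: append events, do no inline checking.
    log.put(("dma_map",   0x1000))
    log.put(("dma_write", 0x1000))
    log.put(("dma_unmap", 0x1000))
    log.put(("dma_write", 0x1000))   # bug: write after unmap
    log.put(("exit", None))

def analysis_tool():
    mapped = set()
    while True:
        event, addr = log.get()
        if event == "exit":
            break
        if event == "dma_map":
            mapped.add(addr)
        elif event == "dma_unmap":
            mapped.discard(addr)
        elif event == "dma_write" and addr not in mapped:
            print(f"DMA fault: write to unmapped buffer {addr:#x}")

t = threading.Thread(target=analysis_tool)
t.start()
driver_execution()
t.join()
```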
500

Coordinating the Design and Management of Heterogeneous Datacenter Resources

Guevara, Marisabel Alejandra January 2014
Heterogeneous design presents an opportunity to improve energy efficiency but raises a challenge in management. Whereas prior work separates the two, we coordinate heterogeneous design and management. We present a market-based resource allocation mechanism that navigates the performance and power trade-offs of heterogeneous architectures. Given this management framework, we explore a design space of heterogeneous processors and show a 12x reduction in response time violations when equipping a datacenter with three processor types over a homogeneous system that consumes the same power. To better understand trade-offs in large heterogeneous design spaces, we explore dozens of design strategies and present a risk taxonomy that classifies the reasons why a deployed system may underperform relative to design targets. We propose design strategies that explicitly mitigate risk, such as a strategy that minimizes the coefficient of variation in performance. In our experiments, we find that risk-aware design accounts for more than 70% of the strategies that produce systems with the best service quality. We also present a new datacenter management mechanism that fairly allocates processors to latency-sensitive applications. Tasks express value for performance using sophisticated piecewise-linear utility functions. With fairness in market allocations, we show how datacenters can mitigate envy amongst latency-sensitive users. We quantify the price of fairness and detail efficiency-fairness trade-offs. Finally, we extend the market to fairly allocate heterogeneous processors.
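A minimal sketch of the market mechanism's ingredients, assuming details the abstract does not give: tasks bid with piecewise-linear utility functions over performance, and a greedy clearing rule hands each processor unit to the bidder with the highest marginal utility. Both the utility shapes and the clearing rule are illustrative assumptions, not the dissertation's exact mechanism.

```python
# Piecewise-linear utilities plus a greedy market-clearing rule.

def piecewise_linear(points):
    """points: sorted (performance, utility) breakpoints."""
    def u(perf):
        if perf <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if perf <= x1:
                return y0 + (y1 - y0) * (perf - x0) / (x1 - x0)
        return points[-1][1]   # flat beyond the last breakpoint
    return u

# A latency-sensitive task values early performance steeply, then flattens.
tasks = {
    "web":   piecewise_linear([(0, 0), (1, 10), (4, 12)]),
    "batch": piecewise_linear([(0, 0), (4, 8)]),
}
alloc = {name: 0 for name in tasks}

# Greedy clearing: each of 6 identical processor units goes to the task
# whose utility rises most from one more unit.
for _ in range(6):
    winner = max(tasks, key=lambda n: tasks[n](alloc[n] + 1) - tasks[n](alloc[n]))
    alloc[winner] += 1
print(alloc)   # -> {'web': 2, 'batch': 4} under these toy utilities
```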
