611

Hessian-based occlusion-aware radiance caching

Zhao, Yangyang 10 1900
Efficiently simulating global illumination is one of the most important open problems in computer graphics. Accurately computing the effects of indirect illumination, caused by secondary bounces of light off surfaces in a 3D scene, is generally an expensive process, often solved using algorithms such as path tracing or photon mapping. These approaches numerically solve the rendering equation using stochastic Monte Carlo ray tracing. Ward et al. proposed irradiance caching to accelerate these techniques when computing the indirect illumination component on diffuse surfaces. Krivanek extended the approach of Ward and Heckbert to handle the more complex case of glossy surfaces, introducing an approach referred to as radiance caching. Jarosz et al. and Schwarzhaupt et al. proposed a more accurate visibility-aware Hessian-based model to greatly improve the placement of records in an irradiance caching context, significantly increasing the quality and performance of the baseline approach. In this thesis, we extend the approaches introduced in the aforementioned works to the problem of radiance caching in order to improve the placement of records. We also discovered a crucial problem overlooked in previous work due to the choice of test scenes. We did a preliminary study of this problem and found several potential solutions worth further investigation.
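As context for the record-placement problem this thesis addresses, here is a minimal sketch of the classic irradiance-cache query that radiance caching generalizes: cached records are reused wherever a Ward-style error weight is high enough, and a new record is created elsewhere. The record fields, the `alpha` threshold, and the scalar irradiance are illustrative simplifications, not the thesis' Hessian-based, occlusion-aware criterion.

```python
import numpy as np

def record_weight(p, n, rec):
    """Ward-style reuse weight of cached record rec at point p with normal n.

    rec: dict with position 'p', normal 'n', harmonic-mean distance 'R' to
    surrounding geometry, and stored irradiance 'E' (all assumed fields).
    """
    d = np.linalg.norm(p - rec["p"]) / rec["R"]         # translation error
    r = np.sqrt(max(0.0, 1.0 - float(n @ rec["n"])))    # rotation error
    return 1.0 / max(d + r, 1e-6)

def interpolate_irradiance(p, n, cache, alpha=0.25):
    """Blend nearby records whose weight exceeds 1/alpha.

    Returns the interpolated irradiance, or None when no record is usable,
    in which case the renderer shades p exactly and inserts a new record.
    """
    num = den = 0.0
    for rec in cache:
        w = record_weight(p, n, rec)
        if w > 1.0 / alpha:
            num += w * rec["E"]
            den += w
    return num / den if den > 0.0 else None
```

The Hessian-based line of work replaces the heuristic distance term with a second-order bound on the illumination error, which is what drives the improved record spacing discussed in the abstract.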
612

A PREDICTIVE CACHE SYSTEM FOR REAL-TIME PROCESSING OF LARGE 2D GRAPHICAL DATA

SERGIO ESTEVAO MACHADO LISBOA PINHEIRO 31 March 2004
Nowadays, many areas of computer graphics need to process a huge amount of data. In order to visualize the data in real time, it is necessary to solve two different problems. The first problem is the limited time available to perform rendering. The second arises from the restricted storage capacity of high-speed memories, such as RAM and texture memory. In order to solve the first problem, this work uses multi-resolution techniques. The multi-resolution representation allows the application to work with a roughly constant amount of data during the rendering process. The second problem is solved by a predictive memory management system based on the virtual memory model. This work proposes an architecture that allows any storage device to be incorporated into the system. Devices are organized sequentially. The heart of the system consists in allocating an area of memory for each device and managing this space optimally. The predictive system aims to load, in advance, the data that will probably be used by the application in the near future. This work proposes an adaptive prediction algorithm specific to the visualization problem. This algorithm exploits information about the camera parameter variations as well as the data transfer rate in order to decide what should be loaded. The camera parameters are used to determine which data will possibly be used by the application. The transfer rate is used to decide which resolution level of the data should be loaded in advance to the high-speed devices. The predictive memory management system has been tested for real-time visualization of satellite images and virtual panoramas.
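A sketch of the kind of decision the prediction algorithm has to make: given the measured transfer rate and the time until the predicted camera position needs a tile, choose the finest resolution level that can be loaded in time. All names are ours, and the factor-of-four level scaling assumes a standard image pyramid rather than the thesis' actual representation.

```python
def level_to_prefetch(tile_bytes, num_levels, transfer_rate, time_budget):
    """Finest multi-resolution level whose tile can arrive within the budget.

    tile_bytes: tile size at the finest level (level 0); each coarser level
    is assumed here to cost a quarter as much, as in an image pyramid.
    transfer_rate: measured throughput (bytes/s) of the slowest device hop.
    time_budget: seconds until the predicted camera pose needs the tile.
    """
    budget_bytes = transfer_rate * time_budget
    for level in range(num_levels):
        if tile_bytes / (4 ** level) <= budget_bytes:
            return level
    return num_levels - 1   # nothing fits in time: settle for the coarsest
```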
613

Multicore architectures and worst-case execution time

Lesage, Benjamin 21 May 2013
Critical tasks in real-time systems are subject to both timing and correctness constraints. Hence, the validation of a real-time system relies on the estimation of its tasks' worst-case execution times (WCET). Resource sharing, as it occurs on multicore architectures, hinders the computation of such estimates: the timing behaviour of a task is affected by its co-running tasks, whether through the arbitration of resource accesses or through concurrent modifications of a resource's state. This study focuses on estimating the contribution of the memory hierarchy to tasks' worst-case execution times. Existing analysis methods, defined for instruction caches, are extended to support private and shared data caches, thus allowing for the analysis of rich memory hierarchies. Cache bypass is then used to reduce the pressure laid by concurrent tasks on shared cache levels. We propose different bypass heuristics based on capturing the reuse of cache blocks between memory accesses. Our second proposal is the Preti partitioning scheme, which allocates to each task a cache space free from inter-task conflicts. Preti offers the added benefit of improving the average-case performance of non-critical tasks running alongside real-time tasks in mixed-criticality systems.
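For readers unfamiliar with the baseline being extended here: static cache analyses classify memory accesses by abstract interpretation. Below is a minimal sketch of the classic LRU "must" analysis (guaranteed-hit classification) for a single cache set, the style of analysis the thesis extends to data caches and shared levels; the function names and the 4-way default are illustrative.

```python
def must_join(a, b):
    """Join two abstract cache states at a control-flow merge point.

    a, b: dict mapping memory block -> upper bound on its LRU age.
    A block survives only if it is guaranteed cached on both incoming
    paths, with the worse (larger) of the two age bounds.
    """
    return {blk: max(a[blk], b[blk]) for blk in a.keys() & b.keys()}

def must_update(state, blk, ways=4):
    """Abstract state after accessing blk in a `ways`-way set."""
    age = state.get(blk, ways)        # `ways` means "not guaranteed cached"
    new = {}
    for b, g in state.items():
        g2 = g + 1 if g < age else g  # only blocks younger than blk age
        if g2 < ways:                 # bound >= ways: may have been evicted
            new[b] = g2
    new[blk] = 0                      # the accessed block is now youngest
    return new
```

An access to a block present in the incoming must-state is a guaranteed hit, which is what lets a WCET analysis avoid charging a miss penalty for it.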
614

Implementation of Cache Attack on Real Information Centric Networking System

Anto Morais, Faustina J. 01 January 2018
Network security is an ongoing major problem in today's Internet. Although there have been simulation studies of denial-of-service and cache attacks, studies of attacks on real networks are still lacking. In this thesis, the effects of cache attacks on real information-centric networking systems were investigated. Cache attacks were implemented in real networks with different cache sizes and with Least Recently Used, Random, and First In First Out replacement algorithms filling the caches in each node. The attacker floods the caches with unpopular content, forcing user requests to be fetched from the web servers. The cache hit rate, the time taken to get the result, and the number of hops needed to serve the request were measured with real network traffic. The results of the implementation are provided for different topologies and are compared with the simulation results.
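The pollution attack mounted in the thesis is easy to illustrate against a simulated LRU node: the attacker requests a stream of one-off unpopular names, evicting the popular content so that subsequent user requests must travel back to the producer. The cache size, name patterns, and request counts below are made-up illustration values.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, size):
        self.size, self.store = size, OrderedDict()

    def request(self, name):
        """Return True on a hit; on a miss, insert and evict the LRU entry."""
        if name in self.store:
            self.store.move_to_end(name)
            return True
        if len(self.store) >= self.size:
            self.store.popitem(last=False)
        self.store[name] = True
        return False

cache = LRUCache(size=100)
for i in range(100):
    cache.request(f"/popular/{i % 10}")       # warm up popular content
for i in range(1000):
    cache.request(f"/attack/{i}")             # pollution phase: unique names
hits = sum(cache.request(f"/popular/{i % 10}") for i in range(100))
print(f"popular-content hits after the attack: {hits}/100")
```

Swapping the eviction rule in `request` reproduces the FIFO and Random configurations the thesis also evaluates.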
615

Towards Low-Complexity Scalable Shared-Memory Architectures

Zeffer, Håkan January 2006
Plentiful research has addressed low-complexity software-based shared-memory systems since the idea was first introduced more than two decades ago. However, software-coherent systems have not been very successful in the commercial marketplace. We believe there are two main reasons for this: lack of performance and/or lack of binary compatibility.

This thesis studies multiple aspects of how to design future binary-compatible high-performance scalable shared-memory servers while keeping the hardware complexity at a minimum. It starts with a software-based distributed shared-memory system relying on no specific hardware support and gradually moves towards architectures with simple hardware support.

The evaluation is made in a modern chip-multiprocessor environment with both high-performance compute workloads and commercial applications. It shows that implementing the coherence-violation detection in hardware while solving the interchip coherence in software allows for high-performing binary-compatible systems with very low hardware complexity. Our second-generation hardware-software hybrid performs on par with, and often better than, traditional hardware-only designs.

Based on our results, we conclude that it is not only possible to design simple systems while maintaining performance and the binary-compatibility envelope, it is often possible to get better performance than in traditional and more complex designs.

We also explore two new techniques for evaluating a new shared-memory design throughout this work: adjustable simulation fidelity and statistical multiprocessor cache modeling.
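Of the two evaluation techniques mentioned, statistical cache modeling builds on reuse behaviour. As a reference point, here is a minimal exact stack-distance computation of the miss ratio of a fully associative LRU cache; a statistical model in the spirit of the thesis estimates this kind of quantity from sparse samples instead of full traces, and this sketch is ours rather than the thesis' algorithm.

```python
def lru_miss_ratio(trace, cache_blocks):
    """Miss ratio of a fully associative LRU cache over a block-address trace.

    A reference hits iff fewer than cache_blocks distinct blocks were
    touched since the previous reference to the same block (its stack
    distance). Written O(n * m) for clarity, not speed.
    """
    last_use, misses = {}, 0
    for t, blk in enumerate(trace):
        if blk in last_use:
            distance = sum(1 for u in last_use.values() if u > last_use[blk])
            if distance >= cache_blocks:
                misses += 1
        else:
            misses += 1               # cold (compulsory) miss
        last_use[blk] = t
    return misses / len(trace)

print(lru_miss_ratio([1, 2, 3, 1, 2, 3, 4, 1], cache_blocks=3))
```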
616

Efficient and Scalable Cache Coherence for Many-Core Chip Multiprocessors

Ros Bardisa, Alberto 24 September 2009
Chip multiprocessors (CMPs) constitute the new trend for increasing the performance of future computers. In the near future, chips with tens of cores will become more popular. Nowadays, directory-based protocols constitute the best alternative for keeping cache coherence in large-scale systems. Nevertheless, directory-based protocols have two important issues that prevent them from achieving better scalability: the directory memory overhead and the long cache miss latencies. This thesis focuses on these key issues. The first proposal is a scalable distributed directory organization that copes with the memory overhead of directory-based protocols. The second proposal presents the direct coherence protocols, which are aimed at avoiding the indirection problem of traditional directory-based protocols and, therefore, improve applications' performance. Finally, a novel mapping policy for distributed caches is presented. This policy reduces the long access latency while lessening the number of off-chip accesses, leading to improvements in applications' execution time.
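A sketch of the full bit-vector directory bookkeeping whose per-block cost motivates the first proposal, and whose home-node indirection motivates the second; the structure and method names are illustrative, and a real protocol also tracks transient states.

```python
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    """Directory state for one memory block, one presence bit per core.

    With N cores and one entry per memory block, the sharers vector is the
    memory overhead that scalable directory organizations try to reduce.
    """
    n_cores: int
    sharers: int = 0

    def on_read(self, core):
        self.sharers |= 1 << core            # record the new sharer

    def on_write(self, core):
        """Sharers the home node must invalidate before granting ownership.

        This detour through the home node is the indirection that direct
        coherence protocols avoid by sending requests straight to the owner.
        """
        victims = [c for c in range(self.n_cores)
                   if (self.sharers >> c) & 1 and c != core]
        self.sharers = 1 << core
        return victims
```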
617

Cache Prediction and Execution Time Analysis on Real-Time MPSoC

Neikter, Carl-Fredrik January 2008
Real-time systems do not only require that the logical operations are correct. Equally important is that the specified time constraints are always met. This has successfully been studied before for mono-processor systems. However, as the hardware in the systems gets more complex, the previous approaches become invalid. For example, multi-processor systems-on-chip (MPSoC) are becoming more and more common, and together with a shared memory, the bus access time is unpredictable in nature. This has recently been resolved, but a safe and not too pessimistic cache analysis approach for MPSoC has not been investigated before. This thesis has resulted in designed and implemented algorithms for cache analysis on real-time MPSoC with a shared communication infrastructure. An additional advantage is that the algorithms include improvements over previous approaches for mono-processor systems. The verification of these algorithms has been performed with the help of data-flow analysis theory. Furthermore, it is not known how different types of cache-miss characteristics of a task influence the worst-case execution time on MPSoC. Therefore, a program that generates randomized tasks according to different parameters has been constructed. The parameters can, for example, influence the complexity of the control flow graph and the average distance between cache misses.
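One way such a generator can expose the average distance between cache misses as a tunable parameter is sketched below; the actual generator also shapes the control flow graph, and everything here, names included, is our illustration rather than the thesis' code.

```python
import random

def generate_trace(n_accesses, avg_miss_distance, working_set=64, seed=0):
    """Synthetic block-address trace with tunable cache-miss spacing.

    With probability 1/avg_miss_distance an access touches a never-seen
    block (a guaranteed compulsory miss); otherwise it reuses a small hot
    working set that is assumed to fit in the cache (a likely hit).
    """
    rng = random.Random(seed)
    trace, next_fresh = [], working_set
    for _ in range(n_accesses):
        if rng.random() < 1.0 / avg_miss_distance:
            trace.append(next_fresh)          # cold block: forced miss
            next_fresh += 1
        else:
            trace.append(rng.randrange(working_set))
    return trace
```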
619

Memory hierarchy in embedded multiprocessor systems built around networks on chip

Belhadj Amor, Hela 05 October 2017
Multi/many-core parallel systems, which deliver high computing power at low energy cost, are nowadays a reality. However, exploiting the performance of these architectures depends on how efficiently the system manages data accesses. The aim of our work is to improve the efficiency of these accesses by exploiting the characteristics of the hardware architecture. In the first part, we propose a new cache hierarchy organization that maximizes the use of the storage space available at each level. This solution, based on non-uniform cache access (NUCA) architectures, supports inter- and intra-level transfers within the hierarchy. It requires a cache coherence protocol that suits its specifications. The transfer of data within the hierarchy is, of course, also a determinant of system performance. In the second part, we consider the specific communication needs of the protocol. We propose a virtualized network as an ad-hoc communication medium to manage coherence traffic at a lower cost. It links the caches of the same level to support intra-level transfers, which are a specificity of our protocol, in order to reduce the average access latency.
620

Multi-Core Memory System Design: Developing and using Analytical Models for Performance Evaluation and Enhancements

Dwarakanath, Nagendra Gulur January 2015
Memory system design is increasingly influencing modern multi-core architectures from both performance and power perspectives. Both main memory latency and bandwidth have improved at a rate that is slower than the increase in processor core count and speed. Off-chip memory, primarily built from DRAM, has received significant attention in terms of architecture and design for higher performance. These performance improvement techniques include sophisticated memory access scheduling, use of multiple memory controllers, mitigating the impact of DRAM refresh cycles, and so on. At the same time, new non-volatile memory technologies have become increasingly viable in terms of performance and energy. These alternative technologies offer different performance characteristics as compared to traditional DRAM. With the advent of 3D stacking, on-chip memory in the form of 3D stacked DRAM has opened up avenues for addressing the bandwidth and latency limitations of off-chip memory. Stacked DRAM is expected to offer abundant capacity (100s of MBs to a few GBs) at higher bandwidth and lower latency. Researchers have proposed to use this capacity as an extension to main memory, or as a large last-level DRAM cache. When leveraged as a cache, stacked DRAM provides opportunities and challenges for improving cache hit rate, access latency, and off-chip bandwidth. Thus, designing off-chip and on-chip memory systems for multi-core architectures is complex, compounded by the myriad architectural, design and technological choices, combined with the characteristics of application workloads. Applications have inherent spatial locality and access parallelism that influence the memory system response in terms of latency and bandwidth.

In this thesis, we construct an analytical model of the off-chip main memory system to comprehend this diverse space and to study the impact of memory system parameters and workload characteristics from latency and bandwidth perspectives. Our model, called ANATOMY, uses a queuing network formulation of the memory system parameterized with workload characteristics to obtain a closed-form solution for the average miss penalty experienced by the last-level cache. We validate the model across a wide variety of memory configurations on four-core, eight-core and sixteen-core architectures. ANATOMY is able to predict memory latency with average errors of 8.1%, 4.1% and 9.7% over quad-core, eight-core and sixteen-core configurations respectively. Further, ANATOMY identifies better-performing design points accurately, thereby allowing architects and designers to explore the more promising design points in greater detail. We demonstrate the extensibility and applicability of our model by exploring a variety of memory design choices such as the impact of clock speed, the benefit of multiple memory controllers, the role of banks and channel width, and so on. We also demonstrate ANATOMY's ability to capture architectural elements such as memory scheduling mechanisms and the impact of DRAM refresh cycles. In all of these studies, ANATOMY provides insight into sources of memory performance bottlenecks and is able to quantitatively predict the benefit of redressing them. An insight from the model suggests that provisioning multiple small row-buffers in each DRAM bank achieves better performance than the traditional design of one (large) row-buffer per bank. Multiple row-buffers also enable new performance improvement opportunities such as intra-bank parallelism between data transfers and row activations, and smart row-buffer allocation schemes based on workload demand. Our evaluation (both using the analytical model and detailed cycle-accurate simulation) shows that the proposed DRAM re-organization achieves significant speed-up as well as energy reduction.

Next we examine the role of on-chip stacked DRAM caches in improving performance by reducing the load on off-chip main memory. We extend ANATOMY to cover DRAM caches. ANATOMY-Cache takes into account all the key parameters and design issues governing DRAM cache organization, namely where the cache metadata is stored and accessed, the role of cache block size and set associativity, and the impact of block size on row-buffer hit rate and off-chip bandwidth. Yet the model is kept simple and provides a closed-form solution for the average miss penalty experienced by the last-level SRAM cache. ANATOMY-Cache is validated against detailed architecture simulations and shown to have latency estimation errors of 10.7% and 8.8% on average in quad-core and eight-core configurations respectively. An interesting insight from the model suggests that under high load, it is better to bypass the congested DRAM cache and leverage the available idle main memory bandwidth. We use this insight to propose a refresh reduction mechanism that virtually eliminates refresh overhead in DRAM caches. We implement a low-overhead hardware mechanism to record accesses to recent DRAM cache pages and refresh only these pages. Older cache pages are considered invalid and serviced from the (idle) main memory. This technique achieves an average refresh reduction of 90% with resulting memory energy savings of 9% and an overall performance improvement of 3.7%.

Finally, we propose a new DRAM cache organization that achieves higher cache hit rate, lower latency and lower off-chip bandwidth demand. Called the Bi-Modal Cache, our cache organization brings three independent improvements together: (i) it enables parallel tag and data accesses, (ii) it eliminates a large fraction of tag accesses entirely by use of a novel way locator, and (iii) it improves cache space utilization by organizing the cache sets as a combination of some big blocks (512B) and some small blocks (64B). The Bi-Modal Cache reduces hit latency by use of the way locator and parallel tag and data accesses. It improves hit rate by leveraging the cache capacity efficiently: blocks with low spatial reuse are allocated in the cache at 64B granularity, thereby reducing both wasted off-chip bandwidth and cache internal fragmentation. The increased cache hit rate leads to a reduction in off-chip bandwidth demand. Through detailed simulations, we demonstrate that the Bi-Modal Cache achieves overall performance improvements of 10.8%, 13.8% and 14.0% in quad-core, eight-core and sixteen-core workloads respectively over an aggressive baseline.
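ANATOMY's closed form comes from queuing theory. As a deliberately simplified illustration of the flavour of such a model, a single M/M/1 server already yields a closed-form mean time that a last-level-cache miss spends at a memory controller; the thesis' queuing network additionally accounts for banks, channels, row-buffer hit rates, and measured workload characteristics, so the sketch below is ours, not ANATOMY itself.

```python
def mm1_miss_penalty(arrival_rate, service_time):
    """Mean sojourn time W = 1 / (mu - lambda) at an M/M/1 memory server.

    arrival_rate: LLC misses per ns reaching this controller (lambda).
    service_time: mean DRAM service time per request in ns (1 / mu).
    """
    mu = 1.0 / service_time
    if arrival_rate >= mu:
        raise ValueError("unstable queue: utilization >= 1")
    return 1.0 / (mu - arrival_rate)

# e.g. 0.01 misses/ns against a 50 ns service time -> 100 ns mean penalty,
# double the unloaded latency purely from queuing delay.
print(mm1_miss_penalty(0.01, 50.0))
```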
