• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 91
  • 30
  • 11
  • 11
  • 8
  • 5
  • 3
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 185
  • 75
  • 52
  • 40
  • 29
  • 28
  • 24
  • 23
  • 23
  • 21
  • 19
  • 19
  • 18
  • 18
  • 15
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

IMPROVING PERFORMANCE AND ENERGY EFFICIENCY FOR THE INTEGRATED CPU-GPU HETEROGENEOUS SYSTEMS

Wen, Hao 01 January 2018 (has links)
Current heterogeneous CPU-GPU architectures integrate general purpose CPUs and highly thread-level parallelized GPUs (Graphic Processing Units) in the same die. This dissertation focuses on improving the energy efficiency and performance for the heterogeneous CPU-GPU system. Leakage energy has become an increasingly large fraction of total energy consumption, making it important to reduce leakage energy for improving the overall energy efficiency. Cache occupies a large on-chip area, which are good targets for leakage energy reduction. For the CPU cache, we study how to reduce the cache leakage energy efficiently in a hybrid SPM (Scratch-Pad Memory) and cache architecture. For the GPU cache, the access pattern of GPU cache is different from the CPU, which usually has little locality and high miss rate. In addition, GPU can hide memory latency more effectively due to multi-threading. Because of the above reasons, we find it is possible to place the cache lines of the GPU data caches into the low power mode more aggressively than traditional leakage management for CPU caches, which can reduce more leakage energy without significant performance degradation. The contention in shared resources between CPU and GPU, such as the last level cache (LLC), interconnection network and DRAM, may degrade both CPU and GPU performance. We propose a simple yet effective method based on probability to control the LLC replacement policy for reducing the CPU’s inter-core conflict misses caused by GPU without significantly impacting GPU performance. In addition, we develop two strategies to combine the probability based method for the LLC and an existing technique called virtual channel partition (VCP) for the interconnection network to further improve the CPU performance. For a specific graph application of Breadth first search (BFS), which is a basis for graph search and a core building block for many higher-level graph analysis applications, it is a typical example of parallel computation that is inefficient on GPU architectures. In a graph, a small portion of nodes may have a large number of neighbors, which leads to irregular tasks on GPUs. These irregularities limit the parallelism of BFS executing on GPUs. Unlike the previous works focusing on fine-grained task management to address the irregularity, we propose Virtual-BFS (VBFS) to virtually change the graph itself. By adding virtual vertices, the high-degree nodes in the graph are divided into groups that have an equal number of neighbors, which increases the parallelism such that more GPU threads can work concurrently. This approach ensures correctness and can significantly improve both the performance and energy efficiency on GPUs.
62

Query Execution on Modern CPUs

Zeuch, Steffen 13 July 2018 (has links)
Über die letzten Jahrzehnte haben sich Datenbanken von festplatten-basierten zu hauptspeicher-basierten Datenbanksystemen entwickelt. Um diese Herausforderungen anzugehen und das volle Potenzial moderner Prozessoren zu erschließen, stellt diese Dissertation vier Ansätze vor um den Einfluss der „Memory Wall“ zu reduzieren. Der erste Ansatz zeigt auf, wie spezielle Prozessorinstruktionen (sogenannte SIMD Instruktionen) die Ausnutzung von Caches erhöhen und gleichzeitig die Anzahl der Instruktionen verringern. In dieser Arbeit werden dazu vorhandene Baumstrukturen so angepasst, dass diese SIMD Instruktionen verwendet werden können und somit die benötigte Hauptspeicherbandbreite verringert wird. Der zweite Ansatz dieser Arbeit führt ein Model ein, welches es ermöglicht die Anfrageausführung in verschiedenen Datenbanksystemen zu vereinheitlichen und dadurch vergleichbar zu machen. Durch diese Vereinheitlichung wird es möglich, die Hardwareausnutzung durch Hinzunahme von Wissen über die auszuführende Hardware zu optimieren. Der dritte Ansatz analysiert verschiedene Datenbankoperatoren bezüglich ihres Verhaltens auf verschiedenen Hardwareumgebungen. Diese Analyse ermöglicht es, Datenbankoperatoren besser zu verstehen und Kostenmodelle für ihr Verhalten zu entwickeln. Der vierte Ansatz dieser Arbeit baut auf der Analyse der Operatoren auf und führt einen progressiven Optimierungsalgorithmus ein, der die Ausführung von Anfragen zur Laufzeit auf die jeweiligen Bedingungen wie z.B. Daten- oder Hardwareeigenschaften anpasst. Dazu werden zur Laufzeit prozessorinterne Zähler verwendet, die das Verhalten des Operators auf der jeweiligen Hardware widerspiegeln. / Over the last decades, database systems have been migrated from disk to memory architectures such as RAM, Flash, or NVRAM. Research has shown that this migration fundamentally shifts the performance bottleneck upwards in the memory hierarchy. Whereas disk-based database systems were largely dominated by disk bandwidth and latency, in-memory database systems mainly depend on the efficiency of faster memory components, e.g., RAM, caches, and registers. To encounter these challenges and enable the full potential of the available processing power of modern CPUs for database systems, this thesis proposes four approaches to reduce the impact of the Memory Wall. First, SIMD instructions increase the cache line utilization and decrease the number of executed instructions if they operate on an appropriate data layout. Thus, we adapt tree structures for processing with SIMD instructions to reduce demands on the memory bus and processing units are decreased. Second, by modeling and executing queries following a unified model, we are able to achieve high resource utilization. Therefore, we propose a unified model that enables us to utilize knowledge about the query plan and the underlying hardware to optimize query execution. Third, we need a fundamental knowledge about the individual database operators and their behavior and requirements to optimally distribute the resources among available computing units. We conduct an in-depth analysis of different workloads using performance counters create these insights. Fourth, we propose a non-invasive progressive optimization approach based on in-depth knowledge of individual operators that is able to optimize query execution during run-time. In sum, using additional run-time statistics gathered by performance counters, a unified model, and SIMD instructions, this thesis improves query execution on modern CPUs.
63

Improving and Extending a High Performance Processor Optimized for FPGAs / Förbättring och utökning av en högpresterande processor anpassad för FPGAer

Källming, Daniel, Hultenius, Kristoffer January 2010 (has links)
<p>This thesis is about a number of improvements and additions done to a soft CPU optimized for field programmable gate arrays (FPGAs). The goal has been to implement the changes without substantially lowering the CPU's ability to operate at high clock frequencies. The result of the thesis is a number of high clock frequency modules, which when added completes the CPU hardware functionality in certain areas. The maximum frequency of the CPU is however somewhat lowered after the modules have been added.</p> / <p>Detta examensarbete handlar om ett antal förbättringar och utökningar av en mjuk processor speciellt anpassad för fältprogrammerbara grindmatriser (FPGA). Målet har varit att göra förändringarna utan att göra större avkall på processorns förmåga att operera i höga klockfrekvenser. Resultatet av examensarbetet är ett antal moduler som klarar av höga frekvenser och kompletterar processorns hårdvarufunktioner. Dock reduceras maxfrekvensen på processorn något med modulerna tillagda.</p>
64

Cfd Analysis Of A Notebook Computer Thermal Management Solution

Yalcin, Fidan Seza 01 May 2008 (has links) (PDF)
In this study, the thermal management system of a notebook computer is investigated by using a commercial finite volume Computational Fluid Dynamics (CFD) software. After taking the computer apart, all dimensions are measured and all major components are modeled as accurately as possible. Heat dissipation values and necessary characteristics of the components are obtained from the manufacturer&#039 / s specifications. The different heat dissipation paths that are utilized in the design are investigated. Two active fans and aluminum heat dissipation plates as well as the heat pipe system are modeled according to their specifications. The first and second order discretization schemes as well as two different mesh densities are investigated as modeling choices. Under different operating powers, adequacy of the existing thermal management system is observed. Average and maximum temperatures of the internal components are reported in the form of tables. Thermal resistance networks for five different operating conditions are obtained from the analysis of the CFD simulation results. Temperature distributions on the top surface of the chassis where the keyboard and touchpad are located are investigated considering the user comfort.
65

Design, Fabrication, And Experimental Evaluation Of Microchannel Heat Sinks In Cpu Cooling

Koyuncuoglu, Aziz 01 September 2010 (has links) (PDF)
A novel complementary metal oxide semiconductor (CMOS) compatible microchannel heat sink is designed, fabricated, and tested for electronic cooling applications. The proposed microchannel heat sink requires no design change of the electronic circuitry underneath. Therefore, microchannels can be fabricated on top of the finished CMOS wafers by just adding a few more steps to the fabrication flow. Combining polymer (parylene C) and metal (copper) structures, a high performance microchannel heat sink can be easily manufactured on top of the electronic circuits, forming a monolithic cooling system. In the design stage, computer simulations of the microchannels with several different dimensions have been performed. Microchannels made of only parylene showed poor heat transfer performance as expected since the thermal conductivity of parylene C is very low. Therefore an alternative design comprising structural parylene layer and embedded metal layers has been modeled. Copper is selected as the metal due to its simple fabrication and very good thermal properties. The results showed that the higher the copper surface area the better the thermal performance of the heat sinks. Based on the modeling results, the final test structures are designed with full copper sidewalls with a parylene top wall. Several different microchannel test chips have been fabricated in METU-MEMS Research &amp / Application Center cleanroom facilities. The devices are tested with different flow rates and heat loads. During the tests, it was shown that the test devices can remove about 126 W/cm2 heat flux from the chip surface while keeping the chip temperature at around 90&deg / C with a coolant flow rate of 500 &mu / l/min per channel.
66

Maintenance of a 3D Visualization System

Wang, Cishen January 2008 (has links)
<p>Vizz3D is a powerful 3D visualization system. The current version is neither perfect nor up-to-date. Furthermore, some important features are missing. In order to keep the tool valuable it needs to be maintained. I implemented a new feature allowing to save and load the view port in the graph to control the camera position. I also improved the CPU utilization and the navigation system to solve the limitations in Vizz3D and to improve the overall performance.</p>
67

Predictive power management for multi-core processors

Bircher, William Lloyd 07 February 2011 (has links)
Energy consumption by computing systems is rapidly increasing due to the growth of data centers and pervasive computing. In 2006 data center energy usage in the United States reached 61 billion kilowatt-hours (KWh) at an annual cost of 4.5 billion USD [Pl08]. It is projected to reach 100 billion KWh by 2011 at a cost of 7.4 billion USD. The nature of energy usage in these systems provides an opportunity to reduce consumption. Specifically, the power and performance demand of computing systems vary widely in time and across workloads. This has led to the design of dynamically adaptive or power managed systems. At runtime, these systems can be reconfigured to provide optimal performance and power capacity to match workload demand. This causes the system to frequently be over or under provisioned. Similarly, the power demand of the system is difficult to account for. The aggregate power consumption of a system is composed of many heterogeneous systems, each with a unique power consumption characteristic. This research addresses the problem of when to apply dynamic power management in multi-core processors by accounting for and predicting power and performance demand at the core-level. By tracking performance events at the processor core or thread-level, power consumption can be accounted for at each of the major components of the computing system through empirical, power models. This also provides accounting for individual components within a shared resource such as a power plane or top-level cache. This view of the system exposes the fundamental performance and power phase behavior, thus making prediction possible. This dissertation also presents an extensive analysis of complete system power accounting for systems and workloads ranging from servers to desktops and laptops. The analysis leads to the development of a simple, effective prediction scheme for controlling power adaptations. The proposed Periodic Power Phase Predictor (PPPP) identifies patterns of activity in multi-core systems and predicts transitions between activity levels. This predictor is shown to increase performance and reduce power consumption compared to reactive, commercial power management schemes by achieving higher average frequency in active phases and lower average frequency in idle phases. / text
68

Optimized Graphics for Handheld Real-time CG Applications

Powell, Robin January 2014 (has links)
The purpose of this theoretical thesis is to research the problem: producing optimized content for handheld real-time CG applications. This thesis is based on existing literature studiesregarding the subject. It will explore the topic on how to deal with draw calls, shaderoptimizations, hard/smooth edges and how engines deal with polygons and textures and howthis correlates with current technical limitations of current game engines for handheld realtime applications such as the iPhone. On these grounds, the thesis will expand on the subject of how those topics affect the hardware; more specifically targeted the GPU and CPU onmobile devices. Today, the hardware used on mobile devices on the market is limited; it is important tounderstand that a lot of factors come in to play and making optimization a part of the design, not a final step regarding the creation of content is vital when creating CG applications formobile devices. A closer look at the available evidences suggests that when dealing with the limited resources available every performance aspect should be closely looked into. It is vital to optimize the content and correlate your game budget accordingly to the marketed device. In the end it will make for a better game. This thesis sheds a light on the development strategiesfor CG applications on handheld devices.
69

Modèles de programmation et d'exécution pour les architectures parallèles et hybrides. Applications à des codes de simulation pour la physique.

Ospici, Matthieu 03 July 2013 (has links) (PDF)
Nous nous intéressons dans cette thèse aux grandes architectures parallèles hybrides, c'est-à-dire aux architectures parallèles qui sont une combinaison de processeurs généraliste (Intel Xeon par exemple) et de processeurs accélérateur (GPU Nvidia). L'exploitation efficace de ces grappes hybrides pour le calcul haute performance est au cœur de nos travaux. L'hétérogénéité des ressources de calcul au sein des grappes hybrides pose de nombreuses problématiques lorsque l'on souhaite les exploiter efficacement avec de grandes applications scientifiques existantes. Deux principales problématiques ont été traitées. La première concerne le partage des accélérateurs pour les applications MPI et la seconde porte sur la programmation et l'exécution concurrente de code entre CPU et accélérateur. Les architectures hybrides sont très hétérogènes : en fonction des architectures, le ratio entre le nombre d'accélérateurs et le nombre de coeurs CPU est très variable. Ainsi, nous avons tout d'abord proposé une notion de virtualisation d'accélérateur, qui permet de donner l'illusion aux applications qu'elles ont la capacité d'utiliser un nombre d'accélérateurs qui n'est pas lié au nombre d'accélérateurs physiques disponibles dans le matériel. Un modèle d'exécution basé sur un partage des accélérateurs est ainsi mis en place et permet d'exposer aux applications une architecture hybride plus homogène. Nous avons également proposé des extensions aux modèles de programmation basés sur MPI / threads afin de traiter le problème de l'exécution concurrente entre CPU et accélérateurs. Nous avons proposé pour cela un modèle basé sur deux types de threads, les threads CPU et accélérateur, permettant de mettre en place des calculs hybrides exploitant simultanément les CPU et les accélérateurs. Dans ces deux cas, le déploiement et l'exécution du code sur les ressources hybrides est crucial. Nous avons pour cela proposé deux bibliothèques logicielles S_GPU 1 et S_GPU 2 qui ont pour rôle de déployer et d'exécuter les calculs sur le matériel hybride. S_GPU 1 s'occupant de la virtualisation, et S_GPU 2 de l'exploitation concurrente CPU -- accélérateurs. Pour observer le déploiement et l'exécution du code sur des architectures complexes à base de GPU, nous avons intégré des mécanismes de traçage qui permettent d'analyser le déroulement des programmes utilisant nos bibliothèques. La validation de nos propositions a été réalisée sur deux grandes application scientifiques : BigDFT (simulation ab-initio) et SPECFEM3D (simulation d'ondes sismiques). Nous les avons adapté afin qu'elles puissent utiliser S_GPU 1 (pour BigDFT) et S_GPU 2 (pour SPECFEM3D).
70

Comparing performance between plain JavaScript and popular JavaScript frameworks

Ladan, Zlatko January 2015 (has links)
JavaScript is used on the web together with HTML and CSS, in many cases using frameworks for JavaScript such as jQuery and Backbone.js. This project is comparing the speed and memory allocation of the programming language JavaScript and its two most used frameworks as well as the language on its own. Since JavaScript is not very fast and it has some missing features or features that differ from browser to browser and frameworks solve this problem but at the cost of speed and memory allocation, the aim is to find out how well JavaScript and the two frameworks jQuery and Backbone.js are doing this on Google Chrome Canary. The results varied (mostly) between the implementations and show that the to-do application is a good enough example to use when comparing the results of heap allocation and CPU time of methods. The results where compared with their mean values and using ANOVA. JavaScript was the fastest, but it might not be enough for a developer to completely stop using frameworks. With JavaScript a developer can choose to create a custom framework, or use an existing one based on the results of this project.

Page generated in 0.0397 seconds