1

Database server workload characterization in an e-commerce environment

Liu, Fujian 19 December 2005 (has links)
A typical e-commerce system deployed on the Internet has multiple layers, including Web users, Web servers, application servers, and a database server. As system use and user request frequency increase, Web/application servers can be scaled up by replication, with a load-balancing proxy routing user requests to individual machines that perform the same function.

To absorb the increasing workload without replicating the database server, various dynamic caching policies have been proposed to reduce the database workload in e-commerce systems. However, the nature of the changes the database server sees as a result of dynamic caching remains unknown. A good understanding of these changes is fundamental to tuning a database server for better performance.

In this study, TPC-W (a transactional Web e-commerce benchmark) workloads on a database server are characterized under two dynamic caching mechanisms, generalized and implemented as a query-result cache and a table cache. The characterization focuses on response time, CPU computation, buffer pool references, disk I/O references, and workload classification.

This thesis combines a variety of analysis techniques: simulation, real-time measurement, and data mining. The experimental results reveal several effects that dynamic caching has on the database server's workload characteristics. The main observations are: (a) a dynamic cache can considerably reduce the database server's CPU usage and the number of database page references when the server is heavily loaded; (b) a dynamic cache also reduces database reference locality, but to a smaller degree than reported for file servers. The data classification results show that, with a dynamic cache, the database server sees TPC-W profiles that look more like on-line transaction processing workloads.
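The query-result cache generalized above can be made concrete with a small sketch: a map from query text to cached rows, where each entry records the tables the query read, and a write to any such table invalidates the dependent entries. This is an illustrative sketch only, not the implementation studied in the thesis; the LRU eviction policy and the table-tagging interface are assumptions.

```python
# Illustrative sketch of a query-result cache with table-based invalidation.
# Assumption: each cached SELECT is tagged with the tables it reads, and any
# write to one of those tables invalidates the cached result.
from collections import OrderedDict

class QueryResultCache:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.results = OrderedDict()   # query text -> result rows (LRU order)
        self.tables = {}               # query text -> set of referenced tables

    def get(self, query):
        """Return cached rows for a read query, or None on a miss."""
        if query in self.results:
            self.results.move_to_end(query)   # refresh LRU position
            return self.results[query]
        return None

    def put(self, query, rows, referenced_tables):
        """Cache the result of a read query, evicting the LRU entry if full."""
        if len(self.results) >= self.capacity:
            oldest, _ = self.results.popitem(last=False)
            self.tables.pop(oldest, None)
        self.results[query] = rows
        self.tables[query] = set(referenced_tables)

    def invalidate(self, written_table):
        """Drop every cached result that read the table just written."""
        stale = [q for q, ts in self.tables.items() if written_table in ts]
        for q in stale:
            del self.results[q]
            del self.tables[q]
```

In use, the application would consult get() before hitting the database and fall back to the database on a miss, while every write path calls invalidate() with the written table, which is what shifts page references and CPU work away from the database server.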
2

A Novel Cache Migration Scheme in Network-on-Chip Devices

Nafziger, Jonathan W. 06 December 2010 (has links)
No description available.
3

Design Space Exploration and Optimization of Embedded Memory Systems

Rabbah, Rodric Michel 11 July 2006 (has links)
Recent years have witnessed the emergence of microprocessors embedded within a plethora of devices used in everyday life. Embedded architectures are customized through a meticulous and time-consuming design process to satisfy stringent constraints on performance, area, power, and cost. In embedded systems, these constraints fundamentally limit the physical size and complexity of the memory system, so the memory hierarchy cannot play as central a role as it does in general-purpose systems. Ultimately, application developers and system engineers carry the heavy burden of reducing the memory requirements of an application.

This thesis offers the intriguing possibility that compilers can play a significant role in the automatic design space exploration and optimization of embedded memory systems. This insight is founded upon a new analytical model and novel compiler optimizations that are specifically designed to increase the synergy between the processor and the memory system. The analytical model serves to characterize intrinsic program properties, quantify the impact of compiler optimizations on the memory system, and provide deep insight into the trade-offs that affect memory system design.
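As a hedged illustration of what automated design space exploration over a memory system can look like, the sketch below sweeps several cache sizes against an address trace and reports an estimated miss rate per configuration. The trace-driven, direct-mapped cache model is an assumption made for brevity; the thesis's approach is analytical rather than simulation-based.

```python
# Illustrative design-space sweep: estimate miss rates for several cache
# configurations against one address trace. A direct-mapped cache with
# 32-byte lines is assumed for simplicity.

LINE_SIZE = 32  # bytes

def miss_rate(trace, cache_bytes):
    """Simulate a direct-mapped cache and return its miss rate."""
    num_lines = cache_bytes // LINE_SIZE
    lines = [None] * num_lines          # block tag stored per cache line
    misses = 0
    for addr in trace:
        block = addr // LINE_SIZE
        index = block % num_lines
        if lines[index] != block:       # tag mismatch -> miss and fill
            misses += 1
            lines[index] = block
    return misses / len(trace)

# Sweep the design space: the smallest cache meeting a miss-rate budget wins.
trace = [i * 4 for i in range(4096)] * 8      # toy sequential trace
for size in (1024, 2048, 4096, 8192, 16384):
    print(f"{size:6d} B cache: miss rate {miss_rate(trace, size):.3f}")
```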
4

Methods for Creating and Exploiting Data Locality

Wallin, Dan January 2006 (has links)
The gap between processor speed and memory latency has led to the use of caches in the memory systems of modern computers. Programs must use the caches efficiently and exploit data locality for maximum performance. Multiprocessors, built from many processing units, are becoming commonplace not only in large servers but also in smaller systems such as personal computers. Multiprocessors require careful data locality optimizations, since accesses from other processors can lead to invalidations and false sharing cache misses.

This thesis explores hardware and software approaches for creating and exploiting temporal and spatial locality in multiprocessors. We propose the capacity prefetching technique, which efficiently reduces the number of cache misses while avoiding false sharing by distinguishing, at run-time, cache lines involved in communication from non-communicating cache lines. Prefetching techniques often lead to increased coherence and data traffic. The new bundling technique avoids one of these drawbacks and reduces the coherence traffic in multiprocessor prefetchers. This is especially important in snoop-based systems, where coherence bandwidth is a scarce resource.

Most of the studies have been performed on advanced scientific algorithms. This thesis demonstrates that a cc-NUMA multiprocessor with hardware data migration and replication optimizations efficiently exploits the temporal locality in such codes. We further present a method of parallelizing a multigrid Gauss-Seidel partial differential equation solver, which creates temporal locality at the expense of increased communication. Our conclusion is that on modern chip multiprocessors it is more important to optimize algorithms for data locality than to avoid communication, since communication can take place through a shared cache.
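The run-time distinction at the heart of capacity prefetching can be sketched as follows: lines that have appeared in coherence (invalidation) traffic are treated as communicating and excluded from prefetching, so sequential prefetching only widens fetches for capacity misses. The classification rule and prefetch degree below are illustrative assumptions, not the thesis's hardware design.

```python
# Illustrative sketch of the idea behind capacity prefetching: prefetch
# aggressively only for cache lines not involved in communication, so
# false sharing is not amplified. The rule "a line ever invalidated by
# another CPU counts as communicating" is a simplifying assumption.

class CapacityPrefetcher:
    def __init__(self, degree=4):
        self.degree = degree            # lines fetched ahead on a miss
        self.communicating = set()      # lines seen in coherence traffic

    def on_invalidation(self, line):
        """Coherence hook: another CPU wrote this line."""
        self.communicating.add(line)

    def on_miss(self, line):
        """Return the extra lines to prefetch for this miss, if any."""
        if line in self.communicating:
            return []                   # communication miss: do not prefetch
        return [line + i for i in range(1, self.degree + 1)
                if line + i not in self.communicating]

pf = CapacityPrefetcher()
pf.on_invalidation(101)                 # line 101 is shared/communicating
print(pf.on_miss(100))                  # -> [102, 103, 104] (skips 101)
print(pf.on_miss(101))                  # -> [] (communication miss)
```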
5

Packet Order Matters!: Improving Application Performance by Deliberately Delaying Packets

Ghasemirahni, Hamid January 2021 (has links)
Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates depends crucially on how the incoming traffic interacts with the system's caches. When packets that need to be processed in the same way arrive consecutively, i.e., exhibit high temporal and spatial locality, CPU caches deliver great benefits.

This licentiate thesis systematically studies the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. The results show that (i) the performance of a variety of widely deployed applications degrades substantially with even a slight loss of traffic locality, and (ii) a traffic trace from our organization's link to/from its upstream provider reveals poor traffic locality, as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time.

To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing µs-scale delays for some packets, Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
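The delay-and-reorder idea can be sketched in a few lines: hold arriving packets for a short window, group them by flow, and release each batch with same-flow packets adjacent, so the per-flow state touched by the processing code stays cache-resident across consecutive packets. This is a minimal model of the idea, not Reframer's actual implementation; the window length and flow key are assumptions.

```python
# Minimal sketch of the delay-and-reorder idea behind Reframer: hold
# packets for a short window, group them by flow, and release them so
# that same-flow packets reach the processing code back-to-back. The
# window length and flow key are illustrative assumptions.
import time
from collections import defaultdict

class Reorderer:
    def __init__(self, window_us=50):
        self.window = window_us / 1e6   # hold time in seconds
        self.flows = defaultdict(list)  # flow key -> buffered packets
        self.deadline = None

    def enqueue(self, pkt):
        """Buffer a packet; pkt is (src, dst, payload)."""
        if self.deadline is None:
            self.deadline = time.monotonic() + self.window
        self.flows[(pkt[0], pkt[1])].append(pkt)

    def flush_if_due(self):
        """Past the deadline, emit packets grouped per flow."""
        if self.deadline is None or time.monotonic() < self.deadline:
            return []
        batch = [p for pkts in self.flows.values() for p in pkts]
        self.flows.clear()
        self.deadline = None
        return batch                    # same-flow packets are now adjacent
```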
6

Iterative and Adaptive PDE Solvers for Shared Memory Architectures

Löf, Henrik January 2006 (has links)
Scientific computing is used frequently in an increasing number of disciplines to accelerate scientific discovery. Many such computing problems involve the numerical solution of partial differential equations (PDE). In this thesis we explore and develop methodology for high-performance implementations of PDE solvers for shared-memory multiprocessor architectures.

We consider three realistic PDE settings: solution of the Maxwell equations in 3D using an unstructured grid and the method of conjugate gradients, solution of the Poisson equation in 3D using a geometric multigrid method, and solution of an advection equation in 2D using structured adaptive mesh refinement. We apply software optimization techniques to increase both parallel efficiency and the degree of data locality. In our evaluation we use several different shared-memory architectures, ranging from symmetric multiprocessors and distributed shared-memory architectures to chip multiprocessors.

For distributed shared-memory systems we explore methods of data distribution to increase the amount of geographical locality. We evaluate automatic and transparent page migration based on runtime sampling, user-initiated page migration using a directive with an affinity-on-next-touch semantic, and algorithmic optimizations for page-placement policies. Our results show that page migration increases the amount of geographical locality and that the parallel overhead related to page migration can be amortized over the iterations needed to reach convergence. This is especially true for the affinity-on-next-touch methodology, whereby page migration can be initiated at an early stage in the algorithms.

We also develop and explore methodology for other forms of data locality and conclude that the effect on performance is significant and that this effect will increase for future shared-memory architectures. Our overall conclusion is that, if the involved locality issues are addressed, the shared-memory programming model provides an efficient and productive environment for solving many important PDE problems.
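The amortization argument for affinity-on-next-touch can be illustrated with a toy cost model: once the directive arms migration, each page moves to the node of the next thread that touches it, so a one-time migration cost replaces a remote-access cost paid on every later iteration. The access costs and node layout below are made-up numbers for illustration, not measurements from the thesis.

```python
# Illustrative model of affinity-on-next-touch page migration: after the
# directive is issued, each page migrates to the NUMA node of the next
# thread that touches it, so later iterations pay only local-access costs.
# Costs and layout are assumptions chosen for illustration.

LOCAL_COST, REMOTE_COST, MIGRATE_COST = 1, 3, 20

def run(iterations, pages, owner_node, migrate_on_next_touch=True):
    """Each node repeatedly touches its own slice of the pages."""
    location = {p: 0 for p in pages}        # all pages start on node 0
    armed = migrate_on_next_touch           # directive issued once, up front
    total = 0
    for _ in range(iterations):
        for p in pages:
            node = owner_node[p]            # node whose slice this page is in
            if armed and location[p] != node:
                location[p] = node          # migrate on this first touch
                total += MIGRATE_COST
            total += LOCAL_COST if location[p] == node else REMOTE_COST
        armed = False                       # each page's next touch has passed
    return total

pages = range(1000)
owner = {p: p % 4 for p in pages}           # 4 NUMA nodes, striped ownership
print(run(50, pages, owner, True))          # migration cost amortized: 65000
print(run(50, pages, owner, False))         # remote cost every iteration: 125000
```

Over enough iterations the one-time migration cost vanishes relative to the per-iteration savings, which is the amortization effect the abstract describes.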
