41

Managing XML data in a relational warehouse: on query translation, warehouse maintenance, and data staleness

Kanna, Rajesh. January 2001 (has links) (PDF)
Thesis (M.S.)--University of Florida, 2001. / Title from first page of PDF file. Document formatted into pages; contains x, 75 p.; also contains graphics. Vita. Includes bibliographical references (p. 71-74).
42

Shared resource management for efficient heterogeneous computing

Lee, Jaekyu 13 January 2014 (has links)
The demand for heterogeneous computing, driven by its performance and energy efficiency, has made on-chip heterogeneous chip multi-processors (HCMPs) the mainstream computing platform, as the recent trend shows across a wide spectrum of platforms from smartphone application processors to desktop and low-end server processors. The performance of on-chip GPUs is not yet comparable to that of discrete GPU cards, but vendors have been integrating more powerful GPUs, and this trend will continue in upcoming processors. In this architecture, several system resources are shared between CPUs and GPUs. Sharing system resources enables easier and cheaper data transfer between CPUs and GPUs, but it also causes resource contention between cores. The resource sharing problem has existed since the homogeneous (CPU-only) chip multi-processor (CMP) was introduced; however, resource sharing in HCMPs shows different aspects because of the different nature of CPU and GPU cores. To solve the resource sharing problem in HCMPs, we consider efficient shared resource management schemes, in particular tackling the problem in the shared last-level cache and the interconnection network. In this thesis, we propose four resource sharing mechanisms. First, we propose an efficient cache sharing mechanism that exploits the different characteristics of CPU and GPU cores to effectively share cache space between them. Second, we propose adaptive virtual channel partitioning for the on-chip interconnection network to isolate inter-application interference. By partitioning virtual channels between CPUs and GPUs, we can prevent interference while guaranteeing quality-of-service (QoS) for both types of cores. Third, we propose a dynamic frequency control mechanism to share system resources efficiently. When both types of cores are active, the degree of resource contention as well as the system throughput is affected by the operating frequencies of the CPUs and GPUs; the proposed mechanism searches for operating frequencies that reduce resource contention while improving system throughput. Finally, we propose a second cache sharing mechanism that exploits GPU-semantic information. The programming and execution models of GPUs are stricter and simpler than those of CPUs, and programmers are asked to provide more information to the hardware. By exploiting these characteristics, GPUs can exercise the cache more energy-efficiently, and simpler but more effective cache partitioning becomes possible for HCMPs.
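
As a rough illustration of the kind of CPU/GPU cache sharing the first mechanism targets, the sketch below is a hypothetical way-partitioning policy, not the mechanism proposed in the thesis: the ways of each shared last-level cache set are split between CPU and GPU requests, so a fill from one core type can only evict blocks within its own ways.

```c
/* Hypothetical sketch of way-partitioning in a shared last-level cache.
 * Each set has NUM_WAYS ways; CPU requests may only allocate into the
 * first cpu_ways ways, GPU requests into the remainder. Victim selection
 * is LRU restricted to the requester's partition. Illustrative only. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_WAYS 16

typedef enum { REQ_CPU, REQ_GPU } requester_t;

typedef struct {
    uint64_t tag[NUM_WAYS];
    bool     valid[NUM_WAYS];
    uint32_t lru[NUM_WAYS];   /* larger value = more recently used */
} cache_set_t;

/* Pick a victim way for a fill, restricted to the requester's partition. */
static int select_victim(const cache_set_t *set, requester_t who, int cpu_ways)
{
    int lo = (who == REQ_CPU) ? 0 : cpu_ways;
    int hi = (who == REQ_CPU) ? cpu_ways : NUM_WAYS;
    int victim = lo;

    for (int w = lo; w < hi; w++) {
        if (!set->valid[w])                    /* prefer an empty way */
            return w;
        if (set->lru[w] < set->lru[victim])    /* otherwise evict the LRU way */
            victim = w;
    }
    return victim;
}
```

An adaptive scheme would adjust cpu_ways at runtime based on the observed cache utility of the CPU and GPU workloads, which is closer in spirit to the sharing mechanisms the abstract describes.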
43

Accelerating Java on Embedded GPU

P. Joseph, Iype 10 March 2014 (has links)
Multicore CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are omnipresent in today's market-leading smartphones and tablets. With CPUs and GPUs getting more complex, maximizing hardware utilization is becoming problematic. The challenges faced in GPGPU (General-Purpose computing using GPU) computing on embedded platforms differ from their desktop counterparts because of memory and computational limitations. This thesis evaluates the performance and energy efficiency achieved by offloading Java applications to an embedded GPU. The existing solutions in the literature address various techniques and benefits of offloading Java to desktop- or server-grade GPUs, not to embedded GPUs. Our research focuses on providing a framework for accelerating Java programs on embedded GPUs. Our experiments were conducted on a Freescale i.MX6Q SabreLite board, which combines a quad-core ARM Cortex-A9 CPU with a Vivante GC2000 GPU that supports the OpenCL 1.1 Embedded Profile. We successfully accelerated Java code and reduced energy consumption by employing two approaches, namely JNI-OpenCL and JOCL, a popular Java binding for OpenCL. These approaches can be easily adopted on other platforms by embedded Java programmers to exploit the computational power of GPUs. Our results show up to an 8x improvement in performance and a 3x reduction in energy consumption compared to embedded CPU-only execution of the Java programs. To the best of our knowledge, this is the first work on accelerating Java on an embedded GPU.
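
For readers unfamiliar with the JNI-OpenCL approach the abstract mentions, the fragment below is a hypothetical sketch of the native side of such a bridge; the Java class name GpuBridge and its scale method are invented for illustration, and the OpenCL context, queue, kernel, and buffer are assumed to have been created during initialization. It is not the thesis's actual code.

```c
/* Hypothetical JNI-OpenCL bridge: the Java side calls
 * GpuBridge.scale(float[] data, float factor) and this native function
 * forwards the array to an OpenCL kernel on the embedded GPU.
 * Error handling and the one-time OpenCL setup are omitted. */
#include <jni.h>
#include <CL/cl.h>

extern cl_command_queue queue;   /* created at init time (not shown) */
extern cl_kernel        kernel;  /* e.g. a "scale" kernel, built at init */
extern cl_mem           buf;     /* device buffer sized for the array */

JNIEXPORT void JNICALL
Java_GpuBridge_scale(JNIEnv *env, jobject self, jfloatArray data, jfloat factor)
{
    jsize   n      = (*env)->GetArrayLength(env, data);
    jfloat *host   = (*env)->GetFloatArrayElements(env, data, NULL);
    size_t  bytes  = (size_t)n * sizeof(jfloat);
    size_t  global = (size_t)n;

    /* Copy input to the device, run the kernel, copy the result back. */
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, host, 0, NULL, NULL);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(cl_float), &factor);
    clSetKernelArg(kernel, 2, sizeof(cl_int), &n);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, bytes, host, 0, NULL, NULL);

    /* Write results back into the Java array and release the native copy. */
    (*env)->ReleaseFloatArrayElements(env, data, host, 0);
}
```

The JOCL approach replaces this hand-written glue with the library's ready-made bindings, so equivalent clEnqueue calls are issued directly from Java.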
44

A performance evaluation of dynamic transport switching for multi-transport devices

Wang, Lei, January 2006 (has links) (PDF)
Thesis (M.S.)--Brigham Young University. Dept. of Computer Science, 2006. / Includes bibliographical references (p. 199-200).
45

Statically configured heterogeneous SMT processor

Vellore Suriyakumar, Avinankumar. January 2009 (has links)
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Computer Science, 2009. / Includes bibliographical references.
46

Pervasive hypermedia

Anderson, Kenneth M. January 1997 (has links)
Thesis (Ph. D., Information and Computer Science)--University of California, Irvine, 1997. / Includes bibliographical references.
47

A hardware/software codesign for the chemical reactivity of BRAMS

Carlos Alberto Oliveira de Souza Junior 05 June 2017 (has links)
Several critical human activities depend on weather forecasting, among them transportation, health, work, safety, and agriculture. Such activities require computational solutions for weather forecasting through numerical models. These numerical models must be accurate and must allow computers to process them quickly. In this project, we aim at migrating a small part of the software of Brazil's weather forecasting model, BRAMS (Brazilian developments on the Regional Atmospheric Modelling System), to a heterogeneous system composed of Xeon (Intel) processors coupled to a reprogrammable circuit (FPGA) via a PCIe bus. According to studies in the literature, the chemistry term of the mass continuity equation is the most computationally demanding part; it requires solving several linear systems of the form Ax = b. We therefore implemented these equations in hardware and provided a portable, highly parallel design in the OpenCL language. The OpenCL framework also allowed us to couple our circuit to the BRAMS legacy code in Fortran90. Although the development tools presented several problems, the designed solution proved viable through the exploration of parallelism techniques; however, its performance was below what we expected.
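
To make concrete what solving "several linear systems Ax = b" looks like computationally, here is a hypothetical OpenCL C kernel, not the thesis's FPGA design, in which each work-item solves one small dense system by Gaussian elimination without pivoting; batching many independent systems this way is the kind of parallelism an FPGA or GPU accelerator can exploit. The system size N is an assumption chosen only for illustration.

```c
/* Hypothetical OpenCL C sketch: each work-item solves one N x N dense
 * system A*x = b by Gaussian elimination without pivoting (adequate only
 * for well-conditioned systems). A, b and x hold the independent systems
 * back to back; launch with one work-item per system. */
#define N 8   /* assumed system size, for illustration only */

__kernel void solve_batch(__global float *A,   /* num_sys * N * N */
                          __global float *b,   /* num_sys * N     */
                          __global float *x)   /* num_sys * N     */
{
    int sys = get_global_id(0);
    __global float *a   = A + sys * N * N;
    __global float *rhs = b + sys * N;
    __global float *sol = x + sys * N;

    /* Forward elimination. */
    for (int k = 0; k < N; k++) {
        for (int i = k + 1; i < N; i++) {
            float m = a[i * N + k] / a[k * N + k];
            for (int j = k; j < N; j++)
                a[i * N + j] -= m * a[k * N + j];
            rhs[i] -= m * rhs[k];
        }
    }
    /* Back substitution. */
    for (int i = N - 1; i >= 0; i--) {
        float s = rhs[i];
        for (int j = i + 1; j < N; j++)
            s -= a[i * N + j] * sol[j];
        sol[i] = s / a[i * N + i];
    }
}
```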
49

Accelerated Iterative Algorithms with Asynchronous Accumulative Updates on a Heterogeneous Cluster

Gubbi Virupaksha, Sandesh 23 March 2016 (has links)
In recent years, with the exponential growth in web-based applications, the amount of data generated has increased tremendously. Quick and accurate analysis of this 'big data' is indispensable for making better business decisions and reducing operational cost. The challenges faced by modern data centers in processing big data are manifold: keeping pace with increased data volume and velocity, scaling the system, and reducing energy costs. Today's data centers employ a variety of distributed computing frameworks running on clusters of commodity hardware built around general-purpose processors. Though existing distributed computing frameworks have improved big data processing speed, there is still room to increase it further. FPGAs, which are designed for computationally intensive tasks, are promising processing elements for doing so. In this thesis, we discuss how FPGAs can be integrated into a cluster of general-purpose processors running iterative algorithms to obtain high performance. We designed a heterogeneous cluster comprising FPGAs and CPUs and ran benchmarks such as PageRank, Katz, and Connected Components to measure its performance. Performance improvement in terms of execution time was evaluated against a homogeneous cluster of general-purpose processors and a homogeneous cluster of FPGAs. We built multiple four-node heterogeneous clusters with different configurations by varying the number of CPUs and FPGAs, and we studied the effects of load balancing between CPUs and FPGAs. We obtained speedups of 20X, 11.5X, and 2X for the PageRank, Katz, and Connected Components benchmarks on a 2 CPU + 2 FPGA cluster configuration with an unbalanced work ratio, compared against a 4-node homogeneous CPU cluster. We also studied the effect of input graph partitioning and showed that a Multilevel-KL partitioned input graph yields improvements of 11%, 26%, and 9% over a randomly partitioned graph for the Katz, PageRank, and Connected Components benchmarks on a 2 CPU + 2 FPGA cluster.
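
As background for the "asynchronous accumulative updates" in the title, the sketch below is illustrative only and not the thesis's CPU/FPGA implementation: it shows the core of delta-based (accumulative) PageRank, where each vertex accumulates incoming deltas and, when processed, spreads its accumulated delta to its out-neighbors, with no global synchronization barrier between iterations.

```c
/* Illustrative sketch of accumulative (delta-based) PageRank in plain C;
 * the thesis distributes this pattern asynchronously across CPU and FPGA
 * nodes, which is not shown here. The graph is in CSR form. */
#define DAMPING 0.85f
#define EPSILON 1e-6f

typedef struct {
    int        num_vertices;
    const int *row_ptr;   /* size num_vertices + 1 */
    const int *col_idx;   /* out-neighbors, size row_ptr[num_vertices] */
} graph_t;

/* Process one vertex: fold its accumulated delta into its rank and
 * propagate a share of that delta to each out-neighbor's accumulator.
 * Returns nonzero if the vertex had a delta worth propagating. */
static int process_vertex(const graph_t *g, int v, float *rank, float *delta)
{
    float d = delta[v];
    if (d < EPSILON)
        return 0;              /* negligible work pending at this vertex */
    delta[v] = 0.0f;
    rank[v] += d;

    int begin = g->row_ptr[v], end = g->row_ptr[v + 1];
    int out_degree = end - begin;
    if (out_degree == 0)
        return 0;

    float share = DAMPING * d / (float)out_degree;
    for (int e = begin; e < end; e++)
        delta[g->col_idx[e]] += share;   /* accumulate at the neighbor */
    return 1;
}
```

Because vertices with small accumulated deltas are simply skipped, workers on different nodes can proceed at different rates, which is what makes the asynchronous, barrier-free execution model possible.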
50

Popcorn Linux: enabling efficient inter-core communication in a Linux-based multikernel operating system

Shelton, Benjamin H. 31 May 2013 (has links)
As manufacturers introduce new machines with more cores, more NUMA-like architectures, and more tightly integrated heterogeneous processors, the traditional abstraction of a monolithic OS running on an SMP system is encountering new challenges. One proposed path forward is the multikernel operating system. Previous efforts have shown promising results both in scalability and in support for heterogeneity. However, one effort's source code is not freely available (FOS), and the other effort is not self-hosting and does not support a majority of existing applications (Barrelfish). In this thesis, we present Popcorn, a Linux-based multikernel operating system. While Popcorn was a group effort, the boot layer code and the memory partitioning code are the author's work, and we present them in detail here. To our knowledge, we are the first to support multiple instances of the Linux kernel on a 64-bit x86 machine and to support more than 4 kernels running simultaneously. We demonstrate that existing subsystems within Linux can be leveraged to meet the design goals of a multikernel OS. Taking this approach, we developed a fast inter-kernel network driver and messaging layer. We demonstrate that the network driver can share a 1 Gbit/s link without degraded performance and that, in combination with guest kernels, it meets or exceeds the performance of SMP Linux with an event-based web server. We evaluate the messaging layer with microbenchmarks and conclude that it performs well given the limitations of current x86-64 hardware. Finally, we use the messaging layer to provide live process migration between cores. / Master of Science
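
To give a flavor of what an inter-kernel messaging layer involves, the sketch below is a generic single-producer/single-consumer ring buffer over a shared memory window, not Popcorn's actual protocol: it shows the basic mechanism by which one kernel instance can post fixed-size messages to another without locks.

```c
/* Generic sketch of a single-producer/single-consumer message ring in a
 * shared physical-memory window between two kernel instances. This is an
 * illustration of the idea only, not Popcorn Linux's messaging protocol. */
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define RING_SLOTS 64
#define MSG_BYTES  64

struct msg_ring {
    volatile uint32_t head;               /* next slot the producer writes */
    volatile uint32_t tail;               /* next slot the consumer reads  */
    uint8_t slots[RING_SLOTS][MSG_BYTES]; /* fixed-size message payloads   */
};

/* Producer side: returns false if the ring is full or the message too big. */
static bool ring_send(struct msg_ring *r, const void *msg, size_t len)
{
    uint32_t head = r->head;
    if (((head + 1) % RING_SLOTS) == r->tail || len > MSG_BYTES)
        return false;                     /* caller retries later */
    memcpy(r->slots[head], msg, len);
    __sync_synchronize();                 /* publish payload before head */
    r->head = (head + 1) % RING_SLOTS;
    return true;
}

/* Consumer side: returns false if no message is pending. */
static bool ring_recv(struct msg_ring *r, void *msg)
{
    uint32_t tail = r->tail;
    if (tail == r->head)
        return false;                     /* ring is empty */
    memcpy(msg, r->slots[tail], MSG_BYTES);
    __sync_synchronize();                 /* consume payload before tail */
    r->tail = (tail + 1) % RING_SLOTS;
    return true;
}
```

A real implementation would also need a cross-core notification path (for example an inter-processor interrupt) so the consumer does not have to poll the ring.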
