141 |
Robust Service Provisioning in Network Function Virtualization / ネットワーク機能仮想化における堅牢なサービスプロビジョニング / ZHANG, YUNCAN 24 September 2021 (has links)
Kyoto University / New doctoral program / Doctor of Informatics / 甲第23550号 / 情博第780号 / 新制||情||133(附属図書館) / Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University / (Examiners) Professor Eiji Oki, Professor Hiroshi Harada, Professor Shin-ichi Minato / Eligible under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
|
142 |
Evaluating energy-efficient cloud radio access networks for 5G / Sigwele, Tshiamo, Alam, Atm S., Pillai, Prashant, Hu, Yim Fun 04 February 2016 (has links)
Yes / Next-generation cellular networks such as fifth-generation (5G) will experience tremendous growth in traffic. To accommodate such traffic demand, the network capacity must increase, which ultimately requires deploying more base stations (BSs). Nevertheless, BSs are very expensive and consume a significant amount of energy. Meanwhile, the cloud radio access network (C-RAN) has been proposed as an energy-efficient architecture that leverages cloud computing technology, performing baseband processing in the cloud, i.e., the computing servers or baseband processing units (BBUs) are located in the cloud. With such an arrangement, further energy savings can be achieved by reducing the number of BBUs used. This paper proposes a bin packing scheme with three variants, First-fit (FF), First-fit decreasing (FFD), and Next-fit (NF), for minimizing energy consumption in 5G C-RAN. The number of BBUs is reduced by matching the right amount of baseband computing capacity to the traffic load. In the proposed scheme, BS traffic items, mapped into processing requirements, are packed into computing servers, called bins, such that the number of bins used is minimized; idle servers can then be switched off to save energy. Simulation results demonstrate that the proposed bin packing scheme achieves better energy performance than the existing distributed BS architecture.
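The three heuristics named in the abstract are classic one-dimensional bin-packing algorithms. A minimal sketch in Python (our illustration, with made-up server capacity and per-BS loads, not the paper's simulation code):

```python
# Packing per-BS baseband loads into BBU servers (bins); fewer bins means
# more idle servers that can be switched off. Capacity and loads are invented.

def first_fit(loads, capacity):
    """Place each load into the first server with room; open a new one otherwise."""
    bins = []
    for load in loads:
        for b in bins:
            if sum(b) + load <= capacity:
                b.append(load)
                break
        else:  # no existing bin had room
            bins.append([load])
    return bins

def first_fit_decreasing(loads, capacity):
    """First-fit applied to loads sorted largest-first; usually needs fewer bins."""
    return first_fit(sorted(loads, reverse=True), capacity)

def next_fit(loads, capacity):
    """Only the most recently opened server is ever considered."""
    bins = [[]]
    for load in loads:
        if sum(bins[-1]) + load <= capacity:
            bins[-1].append(load)
        else:
            bins.append([load])
    return bins

loads = [4, 8, 1, 4, 2, 1]  # per-BS processing demand (arbitrary units)
cap = 10                    # per-server processing capacity
print(len(first_fit(loads, cap)),
      len(first_fit_decreasing(loads, cap)),
      len(next_fit(loads, cap)))  # → 2 2 3
```

With this toy input, FF and FFD consolidate the load onto two servers while NF needs three, illustrating why the choice of variant matters for how many servers can be powered down.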
|
143 |
Exploration and Integration of File Systems in LlamaOS / Craig, Kyle January 2014 (has links)
No description available.
|
144 |
An evaluation of GPU virtualization / Vilestad, Josef January 2024 (has links)
There has been extensive research and progress on CPU virtualization for decades. More recently, the focus on GPU virtualization has increased as GPU processing power doubles roughly every 2.5 years. Coupled with advances in memory management and the PCIe standard, the first hardware-assisted virtualization solutions became available in the 2010s. Very recently, a new virtualization mode called Multi-Instance GPU (MIG) has made it possible to isolate partitions, including their memory, in hardware rather than just software. This thesis focuses on virtual GPU performance and capabilities for AI training in a multi-tenant setting. It explores the technologies currently used for GPU virtualization, including Single Root I/O Virtualization (SR-IOV) and mediated devices. It also covers SIOV, a proposed new standard for I/O virtualization that addresses some of the limitations of SR-IOV. The main limitations of time-sliced virtualization are the lack of per-partition customization compared to CPU virtualization and the problem of overhead. MIG virtualization is more customizable in how compute power and memory can be allocated; its biggest limitation is that fast intercommunication between partitions is not currently possible, making MIG better suited to applications that can run on a single partition. It is also unsuited to graphical applications, as it currently supports no graphical APIs. The experimental results showed that in compute workloads the overhead of time-sliced virtualization is around 5%, while the maximum intercommunication bandwidth is lowered by 11% and latency is increased by 25%. Time-slice windows of 4 ms rather than 2 ms can decrease scheduling overhead to nearly 0.5% at the cost of increased latency for the end user, which can be beneficial for applications where user interactivity is unimportant.
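The time-slice trade-off reported above (a longer window cutting scheduling overhead at the cost of latency) can be pictured with a toy amortization model. The 0.02 ms context-switch cost and the tenant count below are our assumptions for illustration, not measurements from the thesis:

```python
# A longer time-slice window amortizes the fixed context-switch cost over
# more useful work, but other tenants must wait longer for their next slice.

def overhead(window_ms, switch_ms):
    """Fraction of GPU time lost to context switching."""
    return switch_ms / (window_ms + switch_ms)

def worst_case_wait_ms(window_ms, switch_ms, tenants):
    """Longest a tenant may wait before its next slice begins (round-robin)."""
    return (tenants - 1) * (window_ms + switch_ms)

for w in (2.0, 4.0):
    print(f"{w} ms window: overhead {overhead(w, 0.02):.2%}, "
          f"worst-case wait {worst_case_wait_ms(w, 0.02, 4):.2f} ms")
```

With these invented numbers, doubling the window from 2 ms to 4 ms roughly halves the scheduling overhead (to about 0.5%) while doubling the worst-case interactive latency, matching the qualitative trade-off the abstract describes.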
|
145 |
On Optimizing and Leveraging Distributed Shared Memory for High Performance, Resource Aggregation, and Cache-coherent Heterogeneous-ISA Processors / Chuang, Ho-Ren 28 June 2022 (has links)
This dissertation focuses on the problem space of heterogeneous-ISA multiprocessors – an architectural design point that is being studied by the academic research community and increasingly available in commodity systems. Since such architectures usually lack globally coherent shared memory, software-based distributed shared memory (DSM) is often used to provide the illusion of such a memory. The DSM abstraction typically provides this illusion using a reader-replicate, writer-invalidate memory consistency protocol that operates at the granularity of memory pages and is usually implemented as a first-class operating system abstraction. This enables symmetric multiprocessing (SMP) programming frameworks, augmented with a heterogeneous-ISA compiler, to use CPU cores of different ISAs for parallel computations as if they are of the same ISA, improving programmability, especially for legacy SMP applications which therefore can run unmodified on such hardware.
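The reader-replicate, writer-invalidate protocol at page granularity can be sketched as follows. This is our minimal illustration of the general protocol class, not code from the dissertation; node names and the payload are invented:

```python
# Any number of nodes may hold a read replica of a page, but before a node
# writes, every other replica is invalidated so the writer holds the page
# exclusively. Real DSMs do this in the page-fault handler at page granularity.

class DSMPage:
    def __init__(self, data=b""):
        self.data = data
        self.readers = set()   # nodes holding a valid replica
        self.writer = None     # node with exclusive write access, if any

    def read(self, node):
        """Replicate the page to the reader (a read fault on first access)."""
        if self.writer is not None and self.writer != node:
            self.readers.add(self.writer)  # demote exclusive copy to shared
            self.writer = None
        self.readers.add(node)
        return self.data

    def write(self, node, data):
        """Invalidate all other replicas, then grant exclusive access."""
        self.readers = {node}
        self.writer = node
        self.data = data

page = DSMPage(b"old")
page.read("arm_node")
page.read("x86_node")
page.write("x86_node", b"new")  # arm_node's replica is invalidated
print(page.readers, page.writer)
```

A subsequent read fault on `arm_node` would demote the writer back to shared and re-replicate the updated page, which is exactly the traffic pattern that prefetching schemes such as Xfetch try to get ahead of.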
Past DSMs have been plagued by poor performance, in part due to the high latency and low bandwidth of interconnect network infrastructures. The dissertation revisits DSM in light of modern interconnects that reverse this performance trend. The dissertation presents Xfetch, a bulk page prefetching mechanism designed for the DEX DSM system. Xfetch exploits spatial locality, and aggressively and sequentially prefetches pages before potential read faults, improving DSM performance. Our experimental evaluations reveal that Xfetch achieves up to ≈142% speedup over the baseline DEX DSM that does not prefetch page data.
SMP programming models often provide primitives that permit weaker memory consistency semantics, where synchronization updates can be delayed, permitting greater parallelism and thereby higher performance. Inspired by such primitives, the dissertation presents a DSM protocol called MWPF that trades off memory consistency for higher performance in select SMP code regions, targeting heterogeneous-ISA multiprocessor systems. MWPF also overcomes performance bottlenecks of past DSM systems for heterogeneous-ISA multiprocessors, such as those due to a significant number of invalidation messages, false page sharing, a large number of read page faults, and large synchronization overheads, by using efficient protocol primitives that delay and batch invalidation messages, aggressively prefetch data pages, and perform cross-domain synchronization with low overhead. Our experimental evaluations reveal that MWPF achieves, on average, an 11% speedup over the baseline DSM implementation.
The dissertation presents PuzzleHype, a distributed hypervisor that enables a single virtual machine (VM) to use fragmented resources in distributed virtualized settings, such as the CPU cores, memory, and devices of different physical hosts, thereby decreasing resource fragmentation and increasing resource utilization. PuzzleHype leverages DSM implemented in host operating systems to present a unified and consistent view of a contiguous pseudo-physical address space to guest operating systems. To transparently utilize CPU and I/O resources, PuzzleHype integrates multiple physical CPUs into a single VM by migrating threads, forwarding interrupts, and delegating I/O. Our experimental evaluations reveal that PuzzleHype yields speedups in the range of 173%–355% over baseline over-provisioning scenarios that are otherwise necessary due to resource fragmentation.
To enable a distributed hypervisor to adapt to resource and workload changes, the dissertation proposes the concept of CPU borrowing, which allows a VM's virtual CPU (vCPU) to migrate to an available physical CPU (pCPU) and release it when it is no longer needed, i.e., CPU returning. CPU borrowing can thus be used when a node is over-committed, and CPU returning can be used when the borrowed CPU resource is no longer necessary. To transparently migrate a vCPU at runtime without incurring significant downtime, the dissertation presents a suite of techniques, including leveraging thread migration, saving/restoring vCPU states in KVM, maintaining a global vCPU location table, and creating a DSM kernel thread for handling on-demand paging. Our experimental evaluations reveal that migrating vCPUs to resource-available nodes achieves a speedup of 1.4x over running the vCPUs on distributed nodes.
When a VM spans multiple nodes, its likelihood of failure increases. To mitigate this, the dissertation presents a distributed checkpoint/restart mechanism that allows a distributed VM to tolerate failures. A user interface is introduced for sending/receiving checkpoint/restart commands to a distributed VM. We implement the checkpoint/restart technique in the native KVM tool and extend it to a distributed mode by converting Inter-Process Communication (IPC) into message passing between nodes, pausing/resuming distributed vCPU executions, and loading/restoring runtime states on the correct set of nodes. Our experimental evaluations indicate that the overhead of checkpointing a distributed VM is ≈10% or less relative to the native KVM tool with our checkpoint support. Restarting a distributed VM is faster than native KVM with our restart support because no additional page faults occur during restart.
The dissertation's final contribution is PopHype, a system software stack that allows simulation of cache-coherent, shared memory heterogeneous-ISA hardware. PopHype includes a Linux operating system that implements DSM as an OS abstraction for processes, i.e., it allows multiple processes running on multiple (ISA-different) machines to share memory. With KVM enabled, this OS becomes a hypervisor that allows multiple, process-based instances of an architecture emulator such as QEMU to execute in a shared address space, allowing multiple QEMU instances to emulate different ISAs in shared memory, i.e., to emulate shared memory heterogeneous-ISA hardware. PopHype also includes a modified QEMU that uses process-level DSM and an optimized guest OS kernel for improved performance. Our experimental studies confirm PopHype's effectiveness, and reveal that PopHype achieves an average speedup of 7.32x over a baseline that runs multiple QEMU instances in shared memory atop a single host OS. / Doctor of Philosophy / Computing devices are ubiquitous around us. Each of these devices is powered by specialized chips called processors. These processors take in instructions, process them, and produce output. Such processing is what enables us, humans, to send messages to our loved ones, take photographs, and carry out various business functions such as using spreadsheet software. The kinds of instructions these processors execute are classified into so-called Instruction Set Architectures, or ISAs. Chip designers build processors adopting different ISAs for various applications, ranging from computing on mobile phones to the cloud computing data centers used by large technology companies.
Within a data center, there are typically hundreds of thousands of computing devices that serve an organization's purpose of supporting millions or even billions of users. Programming these computers individually to serve a collective goal is an arduous task requiring hundreds of software engineering experts. To simplify programming these computers on a large scale, this thesis envisions an abstraction where tens of devices appear as one computing unit to the programmer, allowing them to program multiple computers as if they were one. This allows for better resource utilization in the sense that the power of multiple computing devices can be pooled together without the need to acquire newer, larger, and more expensive computers.
Furthermore, such pooling allows the software to leverage multiple different ISAs on different computers instead of a single ISA on one computer. This thesis also envisions a way for software to run on multiple computers with potentially different ISAs without exposing the difficulty of managing them to the software engineers.
|
146 |
On the Enhancement of Remote GPU Virtualization in High Performance Clusters / Reaño González, Carlos 01 September 2017 (has links)
Graphics Processing Units (GPUs) are being adopted in many computing facilities given their extraordinary computing power, which makes it possible to accelerate many general-purpose applications from different domains. However, GPUs also present several side effects, such as increased acquisition costs and larger space requirements. They also require more powerful power supplies. Furthermore, GPUs still consume some amount of energy while idle, and their utilization is usually low for most workloads.
In a similar way to virtual machines, the use of virtual GPUs may address the aforementioned concerns. In this regard, the remote GPU virtualization mechanism allows an application executing on one node of the cluster to transparently use the GPUs installed at other nodes. Moreover, this technique allows the GPUs present in the computing facility to be shared among the applications executing in the cluster. In this way, several applications running on different (or the same) cluster nodes can share one or more GPUs located on other nodes of the cluster. Sharing GPUs should increase overall GPU utilization, thus reducing the negative impact of the side effects mentioned before. Reducing the total number of GPUs installed in the cluster may also become possible.
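Remote GPU virtualization of this kind is typically built on API forwarding: the client side intercepts GPU library calls and ships them to the node that physically hosts the GPU. The sketch below is our own simplification with an invented `vector_add` operation and an in-process stand-in for the network; it does not reflect rCUDA's actual API or wire protocol:

```python
import json

class GPUServer:
    """Runs on the node that physically hosts the GPU (simulated here)."""
    def handle(self, message):
        call = json.loads(message)
        if call["op"] != "vector_add":
            raise ValueError("unsupported op")
        # Stand-in for a real kernel launch on the local GPU.
        result = [a + b for a, b in zip(call["x"], call["y"])]
        return json.dumps({"result": result})

class RemoteGPUStub:
    """Client-side library exposing the same interface as a local GPU call."""
    def __init__(self, server):
        self.server = server  # in reality: a network connection to the GPU node
    def vector_add(self, x, y):
        request = json.dumps({"op": "vector_add", "x": x, "y": y})
        return json.loads(self.server.handle(request))["result"]

gpu = RemoteGPUStub(GPUServer())
print(gpu.vector_add([1, 2, 3], [10, 20, 30]))  # → [11, 22, 33]
```

The application calls `vector_add` exactly as it would a local library, which is what makes the remote GPU transparent; the serialization and network round-trip are also where the interconnect-dependent overhead discussed in the abstract comes from.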
In this dissertation we enhance rCUDA, a framework offering remote GPU virtualization capabilities, for its use in high-performance clusters. While the initial prototype version of rCUDA demonstrated its functionality, it also revealed concerns with respect to usability, performance, and support for new GPU features, which prevented its use in production environments. These issues motivated this thesis, in which all the research is primarily conducted with the aim of turning rCUDA into a production-ready solution for eventually transferring it to industry. The new version of rCUDA resulting from this work reduces the execution time of the applications analyzed by up to 35% with respect to the initial version. Compared to the use of local GPUs, the overhead of this new version of rCUDA is below 5% for the applications studied when using the latest high-performance computing networks available. / Reaño González, C. (2017). On the Enhancement of Remote GPU Virtualization in High Performance Clusters [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86219 / Premios Extraordinarios de tesis doctorales
|
147 |
Elastic call admission control using fuzzy logic in virtualized cloud radio base stations / Sigwele, Tshiamo, Pillai, Prashant, Hu, Yim Fun January 2015 (has links)
No / Conventional Call Admission Control (CAC) schemes are based on stand-alone Radio Access Network (RAN) Base Station (BS) architectures, in which each BS has its own fixed spectral and computing resources that are not shared with other BSs to address varied traffic needs, causing poor resource utilization and high call blocking and dropping probabilities. It is envisaged that future communication systems like 5G will adopt Cloud RAN (C-RAN) to share spectrum and computing resources between BSs and thereby further improve Quality of Service (QoS) and network utilization. In this paper, an intelligent elastic CAC scheme using fuzzy logic in C-RAN is proposed. In the proposed scheme, BS resources are consolidated into the cloud using virtualization technology and dynamically provisioned, using the elasticity concept of cloud computing, in accordance with traffic demands. Simulation results show that the proposed CAC algorithm achieves a higher call acceptance rate than conventional CAC.
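A fuzzy-logic admission decision of the general flavor described above can be sketched with triangular membership functions. The rule base, thresholds, and membership shapes below are our own toy choices for illustration, not the paper's controller:

```python
# Fuzzify current cloud utilization into low/medium/high memberships, apply a
# small rule base weighted by the new call's resource demand, and defuzzify
# into an admit/reject verdict.

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def admit_call(utilization, demand):
    """utilization and demand are in [0, 1]; returns True to admit the call."""
    low  = tri(utilization, -0.5, 0.0, 0.5)
    med  = tri(utilization,  0.2, 0.5, 0.8)
    high = tri(utilization,  0.5, 1.0, 1.5)
    # Rules: admit eagerly at low load, cautiously at medium load,
    # and only very small calls at high load (weighted-score defuzzification).
    score = low * 1.0 + med * (1.0 - demand) + high * (0.2 - demand)
    return score > 0.3

print(admit_call(0.3, 0.1), admit_call(0.95, 0.4))  # → True False
```

The elastic part of the scheme would then feed the same utilization signal into a provisioning loop that spins BBU capacity up or down, so admission decisions and resource scaling share one fuzzy view of load.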
|
148 |
Secure and Trusted Execution Framework for Virtualized Workloads / Kotikela, Srujan D 08 1900 (has links)
In this dissertation, we have analyzed various security and trust solutions for modern computing systems and proposed a framework that provides holistic security and trust for the entire lifecycle of a virtualized workload. The framework consists of three novel techniques and a set of guidelines. The three techniques provide the necessary elements for a secure and trusted execution environment, while the guidelines ensure that the virtualized workload remains in a secure and trusted state throughout its lifecycle. We have successfully implemented the framework and demonstrated that it provides security and trust guarantees at launch time, at any time during execution, and during an update of the virtualized workload. Given the proliferation of virtualization from cloud servers to embedded systems, the techniques presented in this dissertation can be implemented on most computing systems.
|
149 |
Enhancing storage performance in virtualized environments: a pro-active approach / Sivathanu, Sankaran 17 May 2011 (has links)
Efficient storage and retrieval of data is critical in today's computing environments, and storage systems need to keep pace with the evolution of other system components such as the CPU and memory to build an overall efficient system. With virtualization becoming pervasive in enterprise and cloud-based infrastructures, it becomes vital to build I/O systems that better account for the changed conditions in virtualized systems. However, the evolution of storage systems has been limited significantly by adherence to legacy interface standards between the operating system and the storage subsystem. Even though storage systems have become more powerful in recent times, hosting large processors and memory, the thin interface to the file system prevents vital information contained in the storage system from being used by higher layers. Virtualization compounds this problem by adding new indirection layers that make the underlying storage systems even more opaque to the operating system.

This dissertation addresses the problem of inefficient use of disk information by identifying storage-level opportunities and developing pro-active techniques for storage management. We present a new class of storage systems called pro-active storage systems (PaSS), which, in addition to being compatible with the existing I/O interface, exert a limited degree of control over file system policies by leveraging their internal information. In this dissertation, we present our PaSS framework, which includes two new I/O interfaces called push and pull, in the context of both traditional and virtualized systems. We demonstrate the usefulness of our PaSS framework through a series of case studies that exploit the information available in the underlying storage system layer for overall improvement in I/O performance. We also built a framework to evaluate the performance and energy of modern storage systems by implementing a novel I/O trace replay tool and an analytical model for measuring the performance and energy of complex storage systems. We believe that our PaSS framework and the suite of evaluation tools help in better understanding modern storage system behavior, and thereby in implementing efficient policies in the higher layers for better performance, data reliability, and energy efficiency by making use of the new interfaces in our framework.
|
150 |
Virtualization services: scalable methods for virtualizing multicore systems / Raj, Himanshu 10 January 2008 (has links)
Multi-core technology is bringing parallel processing capabilities from servers to laptops and even handheld devices. At the same time, platform support for system virtualization is making it easier to consolidate server and client resources, when and as needed by applications. This consolidation is achieved by dynamically mapping the virtual machines on which applications run to underlying physical machines and their processing cores. Low-cost processor and I/O virtualization methods that scale efficiently to different numbers of processing cores and I/O devices are key enablers of such consolidation.

This dissertation develops and evaluates new methods for scaling virtualization functionality to multi-core and future many-core systems. Specifically, it re-architects virtualization functionality to improve scalability and better exploit multi-core system resources. Results from this work include a self-virtualized I/O abstraction, which virtualizes I/O so as to flexibly use different platforms' processing and I/O resources. This flexibility affords improved performance and resource usage and, most importantly, better scalability than that offered by current I/O virtualization solutions. Further, by describing system virtualization as a service provided to virtual machines and the underlying computing platform, this service can be enhanced to provide new and innovative functionality. For example, a virtual device may provide obfuscated data to guest operating systems to maintain data privacy; it could mask differences in device APIs or properties to deal with heterogeneous underlying resources; or it could control access to data based on the 'trust' properties of the guest VM.

This thesis demonstrates that extended virtualization services are superior to existing operating-system or user-level implementations of such functionality, for multiple reasons. First, this solution technique makes more efficient use of the key performance-limiting resources in multi-core systems, namely memory and I/O bandwidth. Second, it better exploits the parallelism inherent in multi-core architectures and exhibits good scalability properties, in part because at the hypervisor level there is greater control over precisely which resources are used, and how, to realize extended virtualization services. Improved control over resource usage makes it possible to provide value-added functionality for both guest VMs and the platform.

Specific instances of virtualization services described in this thesis are a network virtualization service that exploits heterogeneous processing cores, a storage virtualization service that provides location-transparent access to block devices by extending the functionality provided by the network virtualization service, a multimedia virtualization service that allows efficient media-device sharing based on semantic information, and an object-based storage service with enhanced access control.
|