  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Dynamic Scheduling of Parallel Applications on Shared-Memory Multiprocessors

Martorell Bofill, Xavier 09 July 1999 (has links)
No description available.
142

On the programmability of heterogeneous massively-parallel computing systems

Gelado Fernández, Isaac 02 July 2010 (has links)
Heterogeneous parallel computing combines general purpose processors with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. Existing programming models for heterogeneous parallel computing impose added coding complexity when compared to traditional sequential shared-memory programming models for homogeneous systems. This extra code complexity is acceptable in supercomputing environments, where programmability is sacrificed in pursuit of high performance. However, heterogeneous parallel systems are reaching the desktop market en masse (e.g., 425.4 million GPU cards were sold in 2009), where the trade-off between performance and programmability is the opposite. The code complexity required to use accelerators, and the lack of compatibility, prevent programmers from exploiting the full computing capabilities of heterogeneous parallel systems in general purpose applications. This dissertation aims to increase the programmability of CPU-accelerator systems without introducing major performance penalties. The key insight is that general purpose application programmers tend to favor programmability at the cost of system performance. This is illustrated by the tendency to use high-level programming languages, such as C++, to ease the task of programming at the cost of minor performance penalties. Moreover, many general purpose applications are now developed in interpreted languages, such as Java, C# or Python, which raise the abstraction level even further and introduce relatively large performance overheads. This dissertation likewise raises the level of abstraction for accelerators to improve programmability, and investigates hardware and software mechanisms to efficiently implement these high-level abstractions without introducing major performance overheads.
Heterogeneous parallel systems typically implement separate memories for CPUs and accelerators, although commodity systems might use a shared memory at the cost of lower performance. In these commodity shared memory systems, however, coherence between accelerators and CPUs is not guaranteed. This system architecture implies that CPUs can only access system memory, and accelerators can only access their own local memory. This dissertation assumes separate system and accelerator memories and shows that low-level abstractions for these disjoint address spaces are the source of the poor programmability of heterogeneous parallel systems. A first consequence of having separate system and accelerator memories is the current set of data transfer models for heterogeneous parallel systems. This dissertation identifies two data transfer paradigms: per-call and double-buffered. In both models, data structures used by accelerators are allocated in both system and accelerator memories; the models differ in how data is moved between the two. The per-call model transfers the input data needed by an accelerator before each accelerator call, and transfers the output data back when the call returns. The per-call model is quite simple, but can impose unacceptable performance penalties due to data transfer overheads. The double-buffered model aims to overlap data communication with CPU and accelerator computation. This model requires relatively complex code due to the parallel execution and the need for synchronization between data communication and processing tasks. The extra data transfer code in both models stems from the lack of by-reference parameter passing to accelerators. This dissertation presents a novel accelerator-hosted data transfer model.
In this model, data used by accelerators is hosted in the accelerator memory, so when the CPU accesses this data it is effectively accessing the accelerator memory. Such a model cleanly supports by-reference parameter passing to accelerator calls, removing the need for explicit data transfers. The second consequence of separate system and accelerator memories is that current programming models export separate virtual system and accelerator address spaces to application programmers. This dissertation identifies the double-pointer problem as a direct consequence of these separate virtual memory spaces: data structures used by both accelerators and CPUs are referenced by different virtual memory addresses (pointers) in the CPU and accelerator code. The double-pointer problem requires programmers to add extra code to ensure that both pointers contain consistent values (e.g., when reallocating a data structure). Keeping system and accelerator pointers consistent can penalize accelerator performance and increase the accelerator memory requirements when pointers are embedded within data structures (e.g., a linked list). For instance, the double-pointer problem doubles the number of global memory accesses in a GPU code that reconstructs a linked list. This dissertation argues that a unified virtual address space covering both system and accelerator memories is an efficient solution to the double-pointer problem. Moreover, such a unified virtual address space cleanly complements the accelerator-hosted data model discussed above. This dissertation introduces the Non-Uniform Accelerator Memory Access (NUAMA) architecture as a hardware implementation of the accelerator-hosted data transfer model and the unified virtual address space. In NUAMA, an Accelerator Memory Collector (AMC) is included within the system memory controller to identify memory requests for accelerator-hosted data.
The AMC buffers and coalesces such memory requests to efficiently transfer data from the CPU to the accelerator memory. NUAMA also implements a hybrid L2 cache memory. The L2 cache in NUAMA follows a write-through/write-non-allocate policy for accelerator-hosted data. This policy ensures that the contents of the accelerator memory are updated eagerly and, therefore, that most of the data has already been transferred by the time the accelerator is called. The eager update of the accelerator memory effectively overlaps data communication and CPU computation. A write-back/write-allocate policy is used for data hosted by the system memory, so the performance of applications that do not use accelerators is not affected. In NUAMA, accelerator-hosted data is identified using a TLB-assisted mechanism: page table entries are extended with a bit that is set for memory pages hosted by the accelerator memory. NUAMA increases the average bandwidth requirements of the L2 cache and of the interconnection network between the CPU and accelerators, but its instantaneous bandwidth requirements, which are the limiting factor, are lower than in traditional DMA-based architectures. The NUAMA architecture is compared to traditional DMA systems using cycle-accurate simulations. Experimental results show that NUAMA and traditional DMA-based architectures perform equally well, but the application source code complexity of NUAMA is much lower. A software implementation of the accelerator-hosted model and the unified virtual address space is also explored. This dissertation presents the Asymmetric Distributed Shared Memory (ADSM) model. ADSM maintains a shared logical memory space in which CPUs can access data in the accelerator physical memory, but not vice versa. This asymmetry allows light-weight implementations that avoid common pitfalls of symmetric distributed shared memory systems.
ADSM allows programmers to assign data structures to performance-critical methods. When a method is selected for accelerator execution, its associated data objects are allocated within the shared logical memory space, which is hosted in the accelerator physical memory and transparently accessible to methods executed on CPUs. ADSM reduces the programming effort for heterogeneous parallel computing systems and enhances application portability. The design and implementation of an ADSM run-time, called GMAC, on top of CUDA in a GNU/Linux environment is presented. Experimental results show that applications written in ADSM and running on top of GMAC achieve performance comparable to their counterparts using programmer-managed data transfers. This dissertation presents the GMAC system, evaluates different design choices, and further suggests additional architectural support that would likely allow GMAC to achieve higher application performance than the current CUDA model. Finally, the execution model of heterogeneous parallel systems is considered. Accelerator execution is abstracted in different ways by existing programming models; this dissertation examines three such approaches. OpenCL and the NVIDIA CUDA driver API use file descriptor semantics: user processes access accelerators through descriptors. This approach increases the complexity of using accelerators because accelerator descriptors are needed in every call involving the accelerator (e.g., allocating memory or passing a parameter to the accelerator). The IBM Cell SDK abstracts accelerators as separate execution threads. This approach requires adding code to create execution threads and synchronization primitives in order to use accelerators. Finally, the NVIDIA CUDA run-time API abstracts accelerators through Remote Procedure Call (RPC) semantics.
This approach is fundamentally incompatible with ADSM because it assumes separate virtual address spaces for accelerator and CPU code. The Heterogeneous Parallel Execution (HPE) model is presented in this dissertation. This model extends the execution thread abstraction to incorporate different execution modes. Execution modes define the capabilities (e.g., accessible virtual address space, code ISA, etc.) of the code being executed. In this execution model, accelerator calls are implemented as execution mode switches, analogous to system calls. Accelerator calls in HPE are synchronous, unlike in CUDA, OpenCL and the IBM Cell SDK. Synchronous accelerator calls provide full compatibility with the sequential execution model provided by most operating systems. Moreover, abstracting accelerator calls as execution mode switches allows applications that use accelerators to run on systems without accelerators: the execution mode switch falls back to an emulation layer, which emulates the accelerator execution on the CPU. This dissertation further presents different design and implementation choices for the HPE model in GMAC, along with the hardware support necessary for an efficient implementation. Experimental results show that HPE introduces a low execution-time overhead while offering a clean and simple programming interface to applications.
143

Architectural explorations for streaming accelerators with customized memory layouts

Shafiq, Muhammad 21 May 2012 (has links)
The basic concept behind the architecture of a general purpose CPU core conforms well to a serial programming model. The integration of more cores on a single chip has helped CPUs run parts of a program in parallel. However, the huge parallelism available in many high performance applications, and in the corresponding data, is hard to exploit on these general purpose multi-cores. Streaming accelerators and the corresponding programming models improve this situation by providing throughput-oriented architectures. The basic idea behind the design of these architectures matches the ever-growing requirement of processing huge data sets. These high-performance, throughput-oriented devices enable fast processing of data through efficient parallel computation and streaming-based communication. Throughput-oriented streaming accelerators, like other processors, consist of numerous micro-architectural components, including memory structures, compute units, control units, and I/O channels and controllers. However, the throughput requirements add some special features and impose other restrictions that affect performance. These devices normally offer a large number of compute resources, but require applications to arrange their data into parallel, maximally independent sets that feed the compute resources in the form of streams. Arranging data into independent sets of parallel streams is not a simple task: it may require changing the structure of an algorithm as a whole, or even rewriting the algorithm from scratch for the target application. However, all these efforts to rearrange an application's data access patterns may still not suffice to achieve optimal performance.
This is because of possible micro-architectural constraints of the target platform: the hardware pre-fetching mechanisms, the size and granularity of the local storage, and the flexibility of data marshaling inside the local storage. The constraints a general purpose streaming platform places on pre-fetching, storing and marshaling data into parallel and independent streams can be removed by micro-architectural level design approaches, including the use of application-specific customized memories in the front-end of a streaming architecture. The focus of this thesis is to present architectural explorations for streaming accelerators using customized memory layouts. The thesis covers three main aspects of such streaming accelerators: i) design of application-specific accelerators with customized memory layouts, ii) template-based design support for customized memory accelerators, and iii) design space explorations for throughput-oriented devices with standard and customized memories. This thesis concludes with a conceptual proposal for a Blacksmith Streaming Architecture (BSArc). Blacksmith computing allows the hardware-level adoption of an application-specific front-end with a GPU-like streaming back-end. This gives an opportunity to exploit the maximum possible data locality and data-level parallelism of an application while providing a powerful throughput-oriented back-end. We consider that the design of these specialized memory layouts for the front-end of the device is provided by application domain experts in the form of templates. These templates are adjustable to the device and the problem size at the device's configuration time. The physical availability of such an architecture may still take time.
However, a simulation framework supports architectural explorations that give insight into the proposal and predict its potential performance benefits.
144

Transparent management of scratchpad memories in shared memory programming models

Álvarez Martín, Lluc 16 December 2015 (has links)
Cache-coherent shared memory has traditionally been the favorite memory organization for chip multiprocessors thanks to its high programmability. In this organization the cache hierarchy is in charge of moving data and keeping it coherent across all the caches, enabling the use of shared memory programming models in which the programmer does not need to carry out any data management operations. Unfortunately, performing all data management in hardware causes severe problems, the primary concerns being the power consumed in the caches and the amount of coherence traffic in the interconnection network. A good solution is to introduce ScratchPad Memories (SPMs) alongside the cache hierarchy, forming a hybrid memory hierarchy. SPMs are more power-efficient than caches and do not generate coherence traffic, but they degrade programmability: SPMs require the programmer to partition the data, to program data transfers, and to keep coherence between different copies of the data. A promising way to exploit the benefits of SPMs without harming programmability is to let programmers use shared memory programming models and to automatically generate code that manages the SPMs. Unfortunately, current compilers and runtime systems face serious limitations when automatically generating code for hybrid memory hierarchies from shared memory programming models. This thesis proposes to transparently manage the SPMs of hybrid memory hierarchies in shared memory programming models. To achieve this goal, it proposes a combination of hardware and compiler techniques to manage the SPMs in fork-join programming models, and a set of runtime system techniques to manage the SPMs in task programming models.
The proposed techniques allow hybrid memory hierarchies to be programmed with these two well-known and easy-to-use forms of shared memory programming models, capitalizing on the benefits of hybrid memory hierarchies in power consumption and network traffic without harming programmability. The first contribution of this thesis is a hardware/software co-designed coherence protocol to transparently manage the SPMs of hybrid memory hierarchies in fork-join programming models. The solution allows the compiler to always generate code that manages the SPMs with tiling software caches, even in the presence of unknown memory aliasing hazards between memory references to the SPMs and to the cache hierarchy. On the software side, the compiler generates a special form of memory instruction for memory references with possible aliasing hazards. On the hardware side, the special memory instructions are diverted to the correct copy of the data using a set of directories that track what data is mapped to the SPMs. The second contribution is a set of runtime system techniques to manage the SPMs of hybrid memory hierarchies in task programming models. These techniques exploit the characteristics of such programming models to map the data specified in task dependences to the SPMs. Different policies are proposed to mitigate the cost of the data transfers, overlapping them with other execution phases such as task scheduling or the execution of the previous task. The runtime system can also reduce the number of data transfers by using a task scheduler that exploits data locality in the SPMs. In addition, the proposed techniques are combined with mechanisms that reduce the impact of fine-grained tasks, such as hardware runtime systems or large SPM sizes. The overall achievement of this thesis is that hybrid memory hierarchies can be programmed with fork-join and task programming models.
Consequently, architectures with hybrid memory hierarchies can be exposed to the programmer as shared memory multiprocessors, taking advantage of the benefits of the SPMs while maintaining the programming simplicity of shared memory programming models.
145

SIMD@OpenMP : a programming model approach to leverage SIMD features

Caballero de Gea, Diego Luis 11 December 2015 (has links)
SIMD instruction sets are a key feature in current general purpose and high performance architectures. SIMD instructions apply the same operation in parallel to a group of data, commonly known as a vector. A single SIMD/vector instruction can thus replace a sequence of scalar instructions, so the number of instructions can be greatly reduced, leading to improved execution times. However, SIMD instructions are not widely exploited by the vast majority of programmers. In many cases, taking advantage of these instructions relies on the compiler; nevertheless, compilers struggle with the automatic vectorization of codes. Advanced programmers are then compelled to exploit SIMD units by hand, using low-level hardware-specific intrinsics. This approach is cumbersome, error-prone and not portable across SIMD architectures. This thesis targets OpenMP to tackle the underuse of SIMD instructions from three main areas of the programming model: language constructs, compiler code optimizations and runtime algorithms. We choose the Intel Xeon Phi coprocessor (Knights Corner) and its 512-bit SIMD instruction set for our evaluation. We make four contributions aimed at improving the exploitation of SIMD instructions in this scope. Our first contribution describes a compiler vectorization infrastructure suitable for OpenMP, targeting for-loops and whole functions. We define a set of attributes for expressions that determine how the code is vectorized. The infrastructure also implements support for several advanced vector features; it proves effective in the vectorization of complex codes and is the basis upon which we build the following two contributions. The second contribution introduces a proposal to extend OpenMP 3.1 with SIMD parallelism. Essential parts of this work have become key features of the SIMD proposal included in OpenMP 4.0.
We define the "simd" and "simd for" directives that allow programmers to describe SIMD parallelism in loops and whole functions. Furthermore, we propose a set of optional clauses that lead the compiler to generate more efficient vector code. These SIMD extensions improve programming efficiency when exploiting SIMD resources. Our evaluation on the Intel Xeon Phi coprocessor shows that our SIMD proposal allows the compiler to efficiently vectorize codes that the Intel C/C++ compiler vectorizes poorly or not at all automatically. In the third contribution, we propose a vector code optimization that enhances overlapped vector loads: vector loads that redundantly read scalar elements from memory already loaded by other vector loads. Our optimization improves the memory behavior of these accesses by building a vector register cache and exploiting register-to-register instructions. The proposal also includes a new clause (overlap), in the context of our SIMD extensions for OpenMP, that allows enabling, disabling and tuning this optimization on demand. The last contribution tackles the exploitation of SIMD instructions in the OpenMP barrier and reduction primitives. We propose a new combined barrier and reduction tree scheme specifically designed to make the most of SIMD instructions. Our barrier algorithm takes advantage of simultaneous multi-threading (SMT) technology and utilizes SIMD memory instructions in the synchronization process. The four contributions of this thesis are an important step toward a more common and generalized use of SIMD instructions. Our work is having an outstanding impact on the whole OpenMP community, ranging from users of the programming model to compiler and runtime implementations.
Our proposals in the context of OpenMP improve the programmability of the model and, through better use of SIMD, reduce the overhead of runtime services and the execution time of applications.
Esta infraestructura es la base de las dos propuestas siguientes. En la segunda contribución proponemos una extensión SIMD para de OpenMP 3.1. Partes esenciales de este trabajo se han convertido en características clave de la propuesta sobre SIMD incluida en OpenMP 4.0. Definimos las directivas ‘simd’ y ‘simd for’ que permiten a los programadores describir paralelismo SIMD de bucles y funciones. Además, proponemos un conjunto de cláusulas opcionales que permiten que el compilador genere código vectorial más eficiente. Nuestra evaluación muestra que nuestra propuesta SIMD permite al compilador vectorizar eficientemente códigos pobremente o no vectorizados automáticamente con el compilador Intel C/C++.
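The `simd` and `simd for` directives proposed in this work were precursors of the SIMD constructs standardized in OpenMP 4.0. As a rough illustration of the programming style (shown here with the standardized OpenMP 4.0 spelling and the optional `simdlen` tuning clause, not necessarily the thesis' exact syntax):

```c
#include <stddef.h>

/* Function-level SIMD parallelism: annotating a whole function so the
 * compiler can emit a vector version of it ("#pragma omp declare simd"
 * in OpenMP 4.0). */
#pragma omp declare simd
static float axpy(float a, float x, float y) { return a * x + y; }

/* Loop-level SIMD parallelism with an optional tuning clause. Without an
 * OpenMP-capable compiler the pragmas are ignored and the loop runs
 * scalar, producing the same results. */
void saxpy(size_t n, float a, const float *x, float *y) {
    #pragma omp simd simdlen(16)
    for (size_t i = 0; i < n; ++i)
        y[i] = axpy(a, x[i], y[i]);
}
```

The directive expresses the programmer's guarantee that the loop is safe to vectorize, removing the dependence analysis burden that makes automatic vectorization fail on complex codes.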
146

MPI layer techniques to improve network energy efficiency

Dickov, Branimir 10 December 2015 (has links)
Interconnection networks represent the backbone of large-scale parallel systems. In order to build ultra-scale supercomputers, larger interconnection networks are being designed and deployed. As compute nodes become more energy-efficient, the interconnect accounts for an increasing proportion of the total system energy consumption. The interconnect's energy consumption is, however, only starting to receive serious attention. Most of this power consumption is due to the interconnection links. The problem, in terms of power, of an interconnect link is that its power consumption is almost constant, whether or not it is actively exchanging data, since both ends stay active to maintain synchronization. This thesis complements ongoing efforts related to power reduction and energy proportionality of the interconnection network. The thesis considers two directions for power savings in the interconnection network: one is the possibility of using lower-bandwidth links during communication phases and thus saving energy, while the second is shifting links to low-power mode during computation phases, when they are unused. To address the first, we investigate the potential benefits of MPI data compression. When compression of MPI data is possible, the link bandwidth can be reduced without incurring any performance penalty. Consequently, lower bandwidth leads to lower link energy consumption. In the past, several compression techniques have been proposed as a way to improve the performance and scalability of parallel applications. These works showed significant speed-ups when applying compressors to the MPI transfers of certain algorithmic kernels. However, such techniques have not seen widespread adoption in current supercomputers. In this thesis we show that although data compression naturally leads to improved performance, the benefit is small for modern high-performance networks and varies greatly between applications.
In contrast, combining data compression with switching to low-power mode preserves performance while delivering effective and consistent energy savings, in proportion to the reduction in data rate. In general, application developers view time spent in communication as overhead, and therefore strive to keep it to a minimum. This leads to high peak bandwidth demand and latency sensitivity, but low average utilization, which provides significant opportunities for energy savings. It is therefore possible to save energy using low-power modes, but link wake-up latencies must not lead to a loss in performance. Thus, we propose a mechanism that can accurately predict when links are idle, allowing them to be switched to a more power-efficient mode. Our runtime system, called the Pattern Prediction System (PPS), can accurately predict not only when a link will become unused but also when it will become active again, allowing links to be switched off during idle periods and switched back on in time to avoid significant performance degradation. Many HPC applications benefit from prediction, since they have repetitive computation and communication phases. By implementing the energy-saving mechanisms inside the MPI library, existing MPI programs do not need to be modified. We also develop a more advanced version of the prediction system, the Self-Tuned Pattern Prediction System (SPPS), which automatically tunes itself to the current application's communication characteristics and shapes the switching on/off of the links in the most appropriate way. The proposed compression and prediction techniques are evaluated using an event-driven simulator that replays traces from real executions of MPI applications. Experimental results show significant energy savings in the InfiniBand (IB) links, while the performance overhead due to wake-up latencies and additional computation time has a negligible effect on final application performance.
/ In recent years, the energy consumption of the interconnection network has come to be regarded as one of the factors that may condition the race towards Exascale systems. In the interconnection network, most of this energy consumption is due to the network links, whose consumption remains constant regardless of whether data are being actively exchanged, since both ends must stay active to maintain synchronization. This thesis complements the research efforts currently under way internationally with the goal of reducing power and achieving energy consumption proportional to the bandwidth required by the communications. Two complementary directions are considered to reach these goals: on the one hand, the possibility of using only the necessary bandwidth during communication phases; and, on the other, using low-power mode during computation phases in which the interconnection network is not required. To address the first, we investigate the potential benefits of compressing the data transferred in MPI messages. When this is possible, communication can be performed with less link bandwidth without necessarily incurring a penalty in application performance. Several compression techniques have been proposed in the literature with the aim of reducing communication time and improving the scalability of parallel applications. Although these techniques have shown significant potential in certain computational kernels, they have not been adopted in real systems.
In this thesis, we show how data compression in MPI messages can enable a reduction in energy consumption by reducing the number of active links required to perform the communication, in proportion to the reduction in the bytes that must be transferred. In general, application developers regard time spent in communication as an unnecessary expense, and therefore strive to keep it to a minimum. This leads to a demand for bandwidth that can handle peak traffic, and to latency sensitivity, but with low average utilization, which offers significant opportunities for energy savings. It is therefore possible to save energy by relying on low-power modes, but link reactivation latencies must not cause a loss of performance. This doctoral thesis proposes a mechanism that accurately predicts the idle periods of the links, allowing them to be switched to the most energy-efficient mode the network infrastructure provides. The proposal operates at runtime and is called the Pattern Prediction System (PPS). PPS accurately predicts not only when a link becomes unused, but also when its reactivation will be required, allowing links to enter low-power mode during idle periods and to become active again in time to avoid causing significant performance degradation. Many HPC (High-Performance Computing) applications can benefit from this prediction, since they have repetitive computation and communication phases. By implementing the energy-saving mechanisms inside the MPI library, existing MPI programs require no modification.
In the thesis, we also develop a more advanced version of the prediction system, the Self-Tuned Pattern Prediction System (SPPS), which additionally tunes, autonomously, one of the important parameters of PPS: the one that determines the degree of message aggregation in the prediction algorithm.
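A minimal sketch of the prediction idea behind PPS (the data structure, function names and safety margin below are our own assumptions, not the thesis' implementation): for applications with repetitive phases, the last observed idle time of a communication phase is a good predictor of the next one, so a link is powered down only when that prediction comfortably exceeds the link wake-up latency.

```c
#define MAX_PHASES 64

/* Last observed idle duration per application phase, in microseconds. */
typedef struct {
    double last_idle_us[MAX_PHASES];
} pps_t;

/* Record how long the link stayed idle the last time this phase ran. */
void pps_record(pps_t *p, int phase, double idle_us) {
    p->last_idle_us[phase] = idle_us;
}

/* Return 1 if the link may enter low-power mode during this phase.
 * The 2x safety margin over the wake-up latency is an assumption of
 * this sketch, chosen so that the wake-up cost cannot dominate. */
int pps_should_sleep(const pps_t *p, int phase, double wakeup_us) {
    return p->last_idle_us[phase] > 2.0 * wakeup_us;
}
```

A real system would additionally predict the reactivation instant so the link can be woken up ahead of time, hiding the wake-up latency entirely.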
147

Semantic resource management and interoperability between distributed computing platforms

Ejarque Artigas, Jorge 14 December 2015 (has links)
Distributed Computing is the paradigm where application execution is distributed across different computers connected by a communication network. Distributed Computing platforms have evolved very fast during the last decades: starting from Clusters, where a set of computers work together in a single location; then evolving to Grids, where computing resources are shared by different entities, creating a global computing infrastructure available to different user communities; and finally becoming what is currently known as the Cloud, where computing and data resources are provided on demand, in a very dynamic fashion, following the Utility Computing model in which you pay only for what you consume. Different types of companies and institutions are exploring the potential benefits of moving their IT services and applications to Cloud infrastructures, in order to decouple the management of computing resources from their core business processes and become more productive. Nevertheless, migrating software to Clouds is not an easy task, since it requires deep knowledge of the technology in order to decompose the application, as well as of the capabilities offered by providers and how to use them. Besides this complex deployment process, the current cloud marketplace has several providers offering resources with different capabilities, prices and quality, and each provider uses its own properties and APIs for describing and accessing its resources. Therefore, when customers want to execute an application on a provider's resources, they must understand the different providers' descriptions, compare them and select the most suitable resources for their interests. Once the provider and resources have been selected, developers have to interoperate with the different providers' interfaces to perform the application execution steps.
To carry out all these steps, application developers have to deal with the design and implementation of complex integration procedures. This thesis presents several contributions to overcome the aforementioned problems by providing a platform that facilitates and automates the integration of applications in different providers' infrastructures, lowering the barrier to adopting new distributed computing infrastructures such as Clouds. The achievement of this objective has been split into several parts. In the first part, we have studied how semantic web technologies can help to describe applications and to automatically infer a model for deploying them on a distributed platform. Once the application deployment model has been inferred, the second step is finding the resources to deploy and execute the different application components. Regarding this topic, we have studied how semantic web technologies can be applied to the resource allocation problem. Once the different components have been allocated to the providers' resources, it is time to deploy and execute the application components on those resources by invoking a workflow of provider API calls. However, every provider defines its own management interfaces, so the workflow that performs the same actions differs depending on the selected provider. In this thesis, we propose a framework to automatically infer the workflow of provider interface calls required to perform any resource management task. In the last part of the thesis, we have studied how to introduce the benefits of software agents for coordinating application management in distributed platforms. We propose a multi-agent system that is in charge of coordinating the different steps of the application deployment in a distributed way, as well as monitoring the correct execution of the application on the computing resources. The different contributions have been validated with a prototype implementation and a set of use cases.
/ Distributed Computing is a paradigm where application execution is distributed across different computers connected through a communication network. Distributed computing platforms have evolved rapidly over the last decades: starting with Clusters, where several computers are connected by a local network; moving on to Grids, where computational resources are shared by several institutions, creating a global computing network; and finally arriving at what we now know as Clouds, where resources can be provisioned dynamically, on demand, paying only for what is consumed. Nowadays, several companies are discovering the benefits of moving their applications to Cloud infrastructures, decoupling the management of computational resources from their core business in order to be more productive. However, migrating software to the Cloud is not an easy task, because it requires exhaustive knowledge of the technology and of how to use the services offered by the different providers. Moreover, each provider offers resources with different capabilities, prices and quality, with its own interface to access them. Consequently, when users want to execute an application in the Cloud, they must understand what each provider offers and how to use it, and once they have chosen they must program the different steps of their application's deployment. If, in addition, they want to use several providers, or change to another one, this process must be repeated several times. This thesis presents several contributions to mitigate these problems by designing a platform that facilitates and automates the integration of applications with the different providers. These contributions are divided into several parts. First, the study of how semantic technologies can help to describe applications and automatically infer how they can be deployed on a distributed platform.
Once we obtain this deployment model, the second contribution presents how these same technologies can be used to assign the different parts of the application deployment to the providers' resources. Once the assignment is known, the next contribution addresses how AI planning can be used to find the sequence of services that must be executed to carry out the desired deployment. Finally, the last part of the thesis presents how the deployment and execution of applications can be coordinated by a multi-agent system in a scalable and distributed manner. The different contributions of the thesis have been validated through the implementation of prototypes and use cases.
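The resource allocation step can be caricatured as follows (a toy numeric matcher of our own invention; the thesis performs this matching over semantic descriptions precisely so that heterogeneous provider vocabularies can be compared automatically, which plain numbers cannot show):

```c
/* A provider offer, already normalized into comparable fields. In the
 * semantic approach, this normalization is what ontology-based matching
 * provides across providers' differing vocabularies. */
typedef struct {
    const char *name;
    int cpus;       /* offered cores */
    double mem_gb;  /* offered memory */
    double price;   /* price per hour (arbitrary currency) */
} offer_t;

/* Toy allocation: pick the cheapest offer satisfying the requirements.
 * Returns the index of the chosen offer, or -1 if none fits. */
int pick_offer(const offer_t *offers, int n, int need_cpus, double need_mem) {
    int best = -1;
    for (int i = 0; i < n; ++i) {
        if (offers[i].cpus >= need_cpus && offers[i].mem_gb >= need_mem &&
            (best < 0 || offers[i].price < offers[best].price))
            best = i;
    }
    return best;
}
```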
148

Disseny i modelització d'un sistema de gestió multiresolució de sèries temporals [Design and modelling of a multiresolution time-series management system]

Llusà Serra, Aleix 04 December 2015 (has links)
Nowadays, it is possible to acquire a huge amount of data, mainly because it is easy to build monitoring systems with large sensor networks. However, these data have to be managed accordingly, which is not trivial. Furthermore, the storage for all these data also has to be considered. On the one hand, a time series is the formalisation of the process of acquiring the values of a variable over time. There is a wealth of algorithms and methodologies for analysing time series, describing how information can be extracted from data. On the other hand, Database Management Systems (DBMS) are the formalisation of the systems that store and manage data; that is, computer systems devoted to inferring the information that a given user may query. These systems are formally defined by logical models, of which the relational model is the main reference. This thesis is a dissertation on the hypothesis of storing only those parts of the original data that contain selected information. This information selection involves summarising data at different resolutions, mainly by aggregating data over periodic time intervals. We call this technique multiresolution. Multiresolution operates on time series; the results are time subseries of bounded size that summarise the information. Particular DBMS are used for managing time series, and are then called Time Series Management Systems (TSMS). In this context, we define TSMS with multiresolution capabilities (MTSMS). Similarly to what is done for DBMS, we formalise a model for TSMS and for MTSMS. The acquisition of time series presents troublesome properties owing to the nature of a variable acquired over time. In MTSMS we consider some of these properties, such as: clock synchronisation across different acquisition systems; unknown data, when data have not been acquired or are erroneous; and a huge amount of data to be managed.
Moreover, the amount of data keeps increasing as more data are acquired, and queries may involve data that have not been acquired regularly over time. MTSMS are defined as systems that store data by selecting information, and thus by discarding data that are not considered important. Therefore, the parameters for selecting information must be decided before the data are stored. Information theory is the basis for measuring the quality of these systems, which depends on the parameters chosen. In this regard, multiresolution can be considered a lossy compression technique. We introduce some hypotheses on measuring the error caused by multiresolution in comparison with keeping all the original data. Paraphrasing a current opinion in the DBMS field, the same system cannot be adequate for all the different contexts. In addition, systems must consider performance for a variety of resources apart from computing time, such as energy consumption, storage capacity or network transmission. In this regard, we design different implementations of the MTSMS model. These implementations experiment with various computing methodologies: incremental computing along the data stream, parallel computing and relational database computing. Summarising, in this thesis we formalise a model for MTSMS. MTSMS are useful for storing time series in bounded-capacity systems and for precomputing the multiresolution; in this way they can provide immediate queries and graphical visualisations of summarised time series. However, they imply an information selection that has to be decided before storage. In this thesis we consider the limits of the multiresolution technique. / Nowadays it is possible to acquire a huge amount of data, mainly thanks to the ease of deploying monitoring systems with large sensor networks. However, managing all these data afterwards is not so simple. In addition, how these data are stored must also be considered.
On the one hand, the acquisition of the values of a variable over time is formalized as a time series, and there is a multitude of algorithms and analysis methodologies describing how to extract information from the data. On the other hand, the storage and management of data is formalized as database management systems (DBMS): computer systems devoted to inferring the information a user wants to query, described by formal logical models, among which the relational model is the main reference. In this thesis we discuss storing only that part of the original data which contains certain selected information. This information selection is carried out by summarizing the data at different resolutions, each of which is basically an aggregation of the data at periodic time intervals. We call this technique multiresolution. Multiresolution is applied to time series; as a result, temporal subseries of finite size containing summarized information are obtained. To manage time series, specific DBMS called time series management systems (TSMS) are used. We therefore propose TSMS with multiresolution capabilities (MTSMS). As with DBMS, we formalize a model for TSMS and for MTSMS. Because of the nature of a variable captured over time, problematic properties appear when acquiring time series. MTSMS take some of these properties into account, such as: clock synchronization across the different acquisition systems, the appearance of unknown data that could not be acquired or are erroneous, the management of an enormous amount of data that keeps growing over time, and queries over data not collected uniformly in time.
MTSMS, however, are systems that store data according to an information selection and discard what is not considered important. Therefore, the information selection parameters must be decided before storage. To evaluate the quality of these systems, depending on the parameters chosen, information theory can be used. In this sense, multiresolution can be considered a lossy compression technique. We thus introduce a reflection on how to evaluate the error incurred by multiresolution compared with keeping all the original data. As is currently said in the DBMS field, a single system cannot be adequate for all contexts. Moreover, systems must take into account good performance with respect to resources other than computing time, such as finite capacity, energy consumption or network transmission. We therefore design several implementations of the MTSMS model, exploring various computing techniques: incremental computation following the data stream, parallel computation and relational database computation. In summary, in this thesis we design MTSMS and formalize a model for them. MTSMS are useful for storing time series in systems with finite capacity and for precomputing the multiresolution; in this way, they allow immediate queries and visualizations of summarized time series. They imply, however, an information selection that must be decided beforehand. In this thesis we offer considerations and reflections on the limits of multiresolution.
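The aggregation at periodic intervals with bounded storage can be sketched, for a single resolution level, as a circular buffer of summaries (a minimal illustration in our own notation, not the thesis' formal model; mean aggregation is just one possible aggregate function):

```c
#define BUCKETS 4  /* subseries length kept per resolution (bounded) */

/* One resolution level: samples are aggregated (here, averaged) over
 * periodic intervals of `period` seconds, and only the last BUCKETS
 * intervals are retained, so storage stays bounded no matter how long
 * acquisition runs. */
typedef struct {
    double period;
    double sum[BUCKETS];
    int    count[BUCKETS];
    long   id[BUCKETS];  /* which time interval each slot currently holds */
} resolution_t;

void res_init(resolution_t *r, double period) {
    r->period = period;
    for (int i = 0; i < BUCKETS; ++i) {
        r->sum[i] = 0.0; r->count[i] = 0; r->id[i] = -1;
    }
}

/* Add sample (t, v): it lands in interval t/period; stale slots are
 * reused circularly, discarding the oldest summarized interval. */
void res_add(resolution_t *r, double t, double v) {
    long b = (long)(t / r->period);
    int i = (int)(b % BUCKETS);
    if (r->id[i] != b) { r->sum[i] = 0.0; r->count[i] = 0; r->id[i] = b; }
    r->sum[i] += v;
    r->count[i]++;
}

/* Mean of interval b, or 0.0 if it was never stored or已 discarded. */
double res_mean(const resolution_t *r, long b) {
    int i = (int)(b % BUCKETS);
    return (r->id[i] == b && r->count[i]) ? r->sum[i] / r->count[i] : 0.0;
}
```

An MTSMS would keep several such levels with different periods (e.g., minute, hour, day), feeding every incoming sample to all of them.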
149

Hardware thread scheduling algorithms for single-ISA asymmetric CMPs

Markovic, Nikola 22 December 2015 (has links)
Over the past several decades, following Moore's law, the semiconductor industry doubled the number of transistors on a single chip roughly every eighteen months. For a long time this continuous increase in the transistor budget drove the increase in performance, as processors continued to exploit the instruction-level parallelism (ILP) of sequential programs. This pattern hit the wall in the early years of the twenty-first century, when designing larger and more complex cores became difficult for power and complexity reasons. Computer architects responded by integrating many cores on the same die, thereby creating Chip Multicore Processors (CMP). In the last decade, computing technology experienced tremendous developments, and Chip Multiprocessors (CMP) expanded from symmetric and homogeneous designs to asymmetric or heterogeneous multiprocessors. Having cores of different types in a single processor enables optimizing performance, power and energy efficiency for a wider range of workloads. It enables chip designers to employ specialization (that is, each type of core can be used for the type of computation where it delivers the best performance/energy trade-off). The benefits of Asymmetric Chip Multiprocessors (ACMP) are intuitive, as it is well known that different workloads have different resource requirements. CMPs improve the performance of applications by exploiting Thread-Level Parallelism (TLP). Parallel applications relying on multiple threads must be efficiently managed and dispatched for execution if the parallelism is to be properly exploited. Since more and more applications become multi-threaded, we expect to find a growing number of threads executing on a machine. Consequently, the operating system will require increasingly large amounts of CPU time to schedule these threads efficiently.
Thus, dynamic thread scheduling techniques are of paramount importance in ACMP designs, since they can make or break the performance benefits derived from the asymmetric hardware or parallel software. Several thread scheduling methods have been proposed and applied to ACMPs. In this thesis, we first study state-of-the-art thread scheduling techniques and identify the main reasons limiting thread-level parallelism in ACMP systems. We propose three novel approaches to schedule and manage threads and exploit thread-level parallelism, implemented in hardware, instead of perpetuating the trend of performing ever more complex thread scheduling in the operating system. Our first goal is to improve the performance of ACMP systems by improving thread scheduling at the hardware level. We also show that hardware thread scheduling reduces the energy consumption of ACMP systems by allowing better utilization of the underlying hardware. / Over the past several decades, following Moore's law, the semiconductor industry has doubled the number of transistors on a chip roughly every eighteen months. For a long time, this continuous increase in transistor count drove processor performance simply by exploiting instruction-level parallelism (ILP) and raising processor frequency, improving the performance of sequential programs. This pattern reached its limit in the early years of the twenty-first century, when designing larger and more complex processors became a difficult task because of the required power consumption. Computer architects responded to this problem by integrating many cores on the same chip, thereby creating Chip Multicore Processors (CMP).
In the last decade, computing technology has experienced enormous advances, above all in chip multiprocessors (CMP), which have moved from symmetric, homogeneous designs to asymmetric, heterogeneous systems. Having cores of different types in a single processor makes it possible to optimize performance, power and energy efficiency for a wide range of workloads. It allows chip designers to employ specialization (that is, a different core type can be used for different kinds of computation, depending on the performance/power trade-off). The benefits of asymmetric chip multiprocessors (ACMP) are intuitive, since it is well known that different workloads have different resource requirements. CMPs improve application performance by exploiting thread-level parallelism (TLP). In parallel applications that rely on multiple threads, the threads must be managed and dispatched for execution, and the parallelism must be exploited efficiently. More and more applications are multi-threaded, so we will find a growing number of threads executing on a machine. Consequently, the operating system will require ever larger amounts of CPU time to organize and execute these threads efficiently. Therefore, dynamic techniques for organizing thread execution are of paramount importance in ACMP designs, since they can increase or decrease the performance of the asymmetric hardware or the parallel software. Several methods for organizing and executing threads have been proposed and applied to ACMPs. In this thesis, we first study the state of the art in thread management techniques and identify the main reasons limiting parallelism in ACMP systems.
We propose three new approaches to schedule and manage threads and exploit parallelism at the hardware level, instead of perpetuating the current trend of leaving this increasingly complex management to the operating system. Our first goal is to improve the performance of an ACMP system by improving thread management at the hardware level. We also show that hardware-level thread management reduces the energy consumption of ACMP systems by allowing better utilization of the underlying hardware.
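As a toy illustration of an asymmetry-aware placement decision (not any of the three hardware schedulers proposed in the thesis), a scheduler might place on the big core the thread that benefits most from it, measured as its observed big-core speedup:

```c
/* Given each thread's measured speedup when run on a big core relative
 * to a small core, return the index of the thread that should occupy
 * the big core; all others go to the small cores. A hardware scheduler
 * would make this decision from performance counters, without OS help. */
int pick_big_core_thread(const double *speedup, int n) {
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (speedup[i] > speedup[best])
            best = i;
    return best;
}
```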
150

Incentives for sharing heterogeneous resources in distributed systems : a participatory approach

Vega D'Aurelio, Davide 21 December 2015 (has links)
Contributory and volunteer computing ecosystems built around a community of participants need, like any other common-pool resource, an adaptive governance mechanism to guarantee the sustainability of the ecosystem. Reciprocity incentive mechanisms based on economic principles have proved to be efficient solutions for regulating resource sharing and allocation in large computing architectures, guaranteeing a direct reward for each individual contribution even in the presence of misbehaving users. However, while these mechanisms preserve the macro-equilibrium of the computational shared resources (e.g., CPU or memory), participants with fewer resources face problems competing for the attention of members with more resources to cooperate with, making it difficult to apply such principles in practice. Additionally, active members of the community contributing in other ways (e.g., doing administrative tasks or developing software) are not contemplated in traditional schemes, although their time and effort are also part of the common-pool resource and hence should be rewarded somehow. The aim of this thesis is to revisit some of the architectural aspects of current systems and propose a framework to govern contributory and volunteer computing ecosystems in a fairer way, based on principles of participatory economics. Our main contributions in this thesis are threefold. First, we examine the mechanisms ruling resource sharing and propose a new reciprocal incentive mechanism that measures participants' effort in sharing resources instead of their direct contribution, thereby increasing the collaboration opportunities of users with fewer resources in heterogeneous scenarios. Second, we propose a regulation mechanism for allocating new computational devices and distributing new resources within them, with the objective of increasing their impact on the common-pool resources when the demand for resources is supplied by the community.
Third, we propose new methods to detect and analyze the social positions and roles of community members, enabling the governance mechanism to be adapted by taking into account members' effort in several tasks not considered otherwise. The main contributions of this thesis form a single framework that has been tested experimentally, using simulations, in a resource-sharing environment with non-strategic participants. Potentially, the mechanisms developed in this thesis will open new opportunities to apply political-economic and social ideas to the new generation of volunteer, contributory or grid computing systems, as well as to other common-pool resource scenarios. / Volunteer and contributory computing systems built around communities of participants need, like any other common-pool resource, adaptive governance mechanisms that guarantee the sustainability of the ecosystem. Reciprocal incentives based on economic principles have proved to be efficient solutions for regulating the sharing and allocation of resources in large-scale architectures, guaranteeing a direct reward for each contribution, even in the presence of malicious users. However, while these mechanisms preserve the macro-equilibrium of the shared resources (e.g., CPU or memory), participants with fewer resources have trouble competing for the attention of other members with more resources when they want to cooperate with them, making these principles difficult to apply in practice. Furthermore, active community members contributing in other ways (e.g., performing administrative tasks or developing software) are not contemplated in traditional schemes, even though their time and effort are also part of the common-pool resource and should therefore be compensated.
The aim of this thesis is to revisit some of the architectural aspects that prevent these strategies from working, and to propose a framework for governing volunteer and contributory computing ecosystems in a fairer way, using principles of economic participation. First, we examine the mechanisms that control resource sharing and propose a new reciprocal incentive mechanism that measures participants' effort while sharing resources instead of their direct contribution, so that cooperation opportunities increase for users with fewer resources. Second, we propose a mechanism to regulate the allocation of new computing machines and resources, with the goal of improving their impact in common-pool resource scenarios when the demand for resources must be supplied collectively. Third, we propose new methods to detect and analyze the roles and social positions of community members, allowing the governance mechanisms to adapt by taking into account participants' effort in other kinds of tasks not previously contemplated. The main contributions of this thesis form a single framework that has been tested experimentally, using simulations, in a cooperative scenario with non-strategic participants. Potentially, the mechanisms developed in this thesis will open new opportunities to apply politico-economic and social ideas to the new generation of volunteer, cooperative or grid computing systems, as well as to common-pool resource scenarios.
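The effort-based idea can be sketched as follows (our own formulation, not the thesis' exact metric): score participants by the fraction of their capacity they actually share, so a small node sharing everything outranks a large node sharing little, even though the latter contributes more resources in absolute terms.

```c
/* Effort = fraction of capacity actually shared, rather than the
 * absolute amount contributed. */
double effort(double contributed, double capacity) {
    return capacity > 0.0 ? contributed / capacity : 0.0;
}

/* Under effort-based reciprocity, return 1 if participant A should be
 * preferred as a cooperation partner over participant B. */
int prefer_a(double contrib_a, double cap_a, double contrib_b, double cap_b) {
    return effort(contrib_a, cap_a) > effort(contrib_b, cap_b);
}
```

For example, a node sharing 2 of its 2 units (effort 1.0) would be preferred over one sharing 10 of its 100 units (effort 0.1), which is the opposite of what an absolute-contribution incentive would choose.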
