  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Contributions to the security and privacy of electronic ticketing systems

Vives Guasch, Arnau 09 July 2013
An electronic ticket is a digital contract between two parties, the user and the service provider, that records their agreement on the service the user is to receive. Electronic tickets are used in many kinds of services, such as sports or entertainment events, and especially in transport. In transport, they reduce costs given the high volume of users and make it easier to identify travel flows, information that allows transport systems to be forecast and planned more dynamically. The security of electronic tickets is key to their deployment in real scenarios, as is the privacy of their users. Privacy includes both the anonymity of users, meaning that an action cannot easily be attributed to a particular user, and the unlinkability of the different movements of that user. This thesis proposes electronic ticketing protocols that keep the properties of paper tickets while offering the advantages of digital tickets. First, we survey the state of the art in related proposals and analyse the security requirements they meet. We then present an electronic ticketing protocol that adds the new security requirements of exculpability and reusability, which differ from those analysed in the survey, while still guaranteeing privacy for users. Next, we present an electronic ticketing proposal adapted to use-dependent payment systems, mainly focused on transport, which provides both anonymity for users and short-term linkability: the actions belonging to the same journey (e.g. entrance and exit) can be linked to one another, but not to the other movements of the same user.
Finally, by evolving the cryptographic technique used in the use-dependent payment system to shorten the verification time for many tickets at once (batch verification), we present a proposal that can serve several kinds of mass message verification systems, where many messages have to be verified in a short time. We take vehicular networks as the use case; the proposal guarantees privacy for users with short-term linkability and verifies these messages efficiently.
12

Loop pipelining with resource and timing constraints

Sánchez Carracedo, Fermín 12 January 1996
Developing efficient programs for many current parallel computers is not easy because of the architectural complexity of those machines. The wide variety of machine organizations often makes it more difficult to port an existing program than to reprogram it completely. Powerful translators are therefore necessary to generate effective code and free the programmer from concerns about the specific characteristics of the target machine. This work focuses on techniques for an important class of translators whose objective is to transform sequential programs into equivalent, more parallel programs. The transformations are performed at instruction level in order to exploit low-level parallelism and increase memory locality. Most current applications are programmed in languages that cannot express parallelism between high-level statements (such as Pascal, C or Fortran). Furthermore, many applications written ten or more years ago are still used today, and rewriting them is not feasible for both technical and economic reasons. Translators enable programmers to write the application in a familiar sequential programming language without concerning themselves with the architecture of the target machine. Current compilers for parallel architectures not only translate a program written in a high-level language into the appropriate machine language, but also transform the final code so that the program executes in a more parallel way. These transformations improve execution performance by using the compiler's knowledge of the machine architecture. The semantics of the program remain intact after any transformation. Experiments show that restricting parallelization to basic blocks not included in loops limits the maximum speedup, because loops often account for a large portion of the parallelism available in a program.
For this reason, a lot of effort has been devoted in recent years to parallelizing loop execution. Several parallel computer architectures and compilation techniques have been proposed to exploit such parallelism at different granularities. Multiprocessors exploit coarse-grained parallelism by distributing entire loop iterations to different processors. Systems oriented to the high-level synthesis (HLS) of VLSI circuits, superscalar processors and very long instruction word (VLIW) processors exploit fine-grained parallelism at instruction level. This work addresses fine-grained parallelization of loops targeted at the HLS of VLSI circuits. Two algorithms are proposed for resource constraints and for timing constraints. An algorithm to reduce the number of registers required to execute a loop in a given architecture is also proposed.
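To make the idea of a resource constraint in loop pipelining concrete, the sketch below computes the usual resource-constrained lower bound on the initiation interval (the number of cycles between the starts of consecutive iterations). This bound comes from the general modulo scheduling literature and is not the algorithm proposed in the thesis; the unit names and operation counts are hypothetical, and every operation is assumed to occupy its functional unit for exactly one cycle.

```python
from math import ceil


def resource_min_ii(op_counts, unit_counts):
    """Lower bound on the initiation interval imposed by resources alone.

    op_counts:   operations per loop iteration, keyed by unit type (assumed data)
    unit_counts: number of available functional units of each type (assumed data)
    Assumes every operation occupies its unit for exactly one cycle.
    """
    return max(ceil(op_counts[kind] / unit_counts[kind]) for kind in op_counts)


# Hypothetical loop body: 4 additions and 3 multiplications per iteration,
# scheduled on a datapath with 2 adders and 1 multiplier.
ops = {"add": 4, "mul": 3}
units = {"add": 2, "mul": 1}
print(resource_min_ii(ops, units))
```

With these made-up numbers the single multiplier is the bottleneck: it needs three cycles per iteration, so no schedule can start a new iteration more often than every three cycles, regardless of how the additions are placed.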
13

Affordable kilo-instruction processors

Pericàs Gleim, Miquel 09 December 2008
Several factors explain the slowdown in the development of high-performance single-thread processors. On the one hand, aggressive techniques such as superpipelining or out-of-order execution have a considerable impact on power consumption and design complexity. On the other hand, increases in processor frequency have led to a large disparity between processor speed and main memory access time. Although cache memories considerably reduce the number of accesses to main memory, the remaining accesses introduce latencies large enough to decrease performance considerably. Conventional techniques such as out-of-order execution, while effective in hiding L2 cache accesses, cannot hide latencies this large. Queues of hundreds of entries and thousands of registers would be necessary to prevent execution from stalling in the event of an L2 cache miss. Unfortunately, current technology cannot efficiently implement such structures monolithically, as access latency, power consumption and area would all increase considerably.
In this thesis we study techniques that allow the processor to continue processing instructions in the event of main memory accesses. For such a processor to be implementable, it should be based on structures of conventional size and feature simple control logic. The challenge lies in reconciling a distributed processor model with simple control. The design has been approached by analyzing the behavior of a processor with infinite resources. We have observed that execution follows a very interesting pattern based on execution locality. In numerical codes, over 70% of all instructions do not depend on main memory accesses. This is important because it shows that there is always a large portion of instructions that can be executed shortly after decode. This allows us to propose a new kind of processor with two execution units. The first unit, the Cache Processor, processes instructions that are independent of main memory at high speed. The second unit, the Memory Processor, processes instructions that depend on main memory accesses, but using relaxed scheduling logic, which allows it to scale to thousands of in-flight instructions. This proposal, named the Decoupled KILO-Instruction Processor (D-KIP), has several advantages.
On the one hand, it allows the construction of a kilo-instruction processor based on conventional structures and, on the other hand, it simplifies the design because the interaction between the two execution units is minimal. Two implementations of this kind of decoupled processor are presented: the original D-KIP and the Flexible Heterogeneous MultiCore (FMC). The performance of these proposals is analyzed and compared with other techniques that increase memory-level parallelism, such as prefetching or runahead execution. The evaluation shows that the FMC processor performs at the level of a conventional processor with a window of around 1500 in-flight instructions. Further, the integration of the FMC processor into multicore/multiprogrammed environments is studied. The thesis concludes with the proposal of a two-level Load/Store Queue (LSQ) for this kind of processor.
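The split between instructions that depend on main memory accesses and those that do not can be pictured with a small software sketch. It is only an illustration under simplifying assumptions: the instruction format, the function name `steer` and the labels are hypothetical, the set of loads that miss to main memory is given up front, and a real design makes this decision in hardware at run time. Here a load that misses, and everything that transitively reads its result, is labelled for the Memory Processor.

```python
def steer(instructions, missing_loads):
    """Label each instruction "cache" or "memory" (hypothetical helper).

    instructions:  list of (dest_register, [source_registers]) in program order
    missing_loads: set of indices of loads assumed to miss to main memory
    """
    tainted = set()  # registers whose value currently depends on a missing load
    plan = []
    for index, (dest, sources) in enumerate(instructions):
        slow = index in missing_loads or any(src in tainted for src in sources)
        if slow:
            tainted.add(dest)
        else:
            tainted.discard(dest)  # register overwritten with a fast value
        plan.append("memory" if slow else "cache")
    return plan


program = [
    ("r1", ["r9"]),        # 0: load that misses to main memory
    ("r2", ["r1", "r3"]),  # 1: uses the missing value
    ("r4", ["r3", "r5"]),  # 2: independent of the miss
    ("r1", ["r4", "r4"]),  # 3: overwrites r1 with a fast value
    ("r6", ["r1", "r2"]),  # 4: r2 still depends on the miss
]
print(steer(program, missing_loads={0}))
```

In this made-up trace, instructions 2 and 3 can proceed immediately even though an earlier load is still outstanding, which is the kind of execution locality the abstract describes.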
14

Architecture support for intrusion detection systems

Sreekar Shenoy, Govind 30 October 2012
System security is a prerequisite for efficient day-to-day transactions. As a consequence, Intrusion Detection Systems (IDS) are commonly used to provide an effective security ring around the systems in a network. An IDS operates by inspecting packets flowing in the network for malicious content. To do so, an IDS like Snort[49] compares the bytes in a packet with a database of previously reported attacks. This functionality can also be viewed as string matching of the packet bytes against the attack string database. Snort commonly uses the Aho-Corasick algorithm[2] to detect attacks in a packet. The Aho-Corasick algorithm works by first constructing a Finite State Machine (FSM) from the attack string database. The FSM is then traversed with the packet bytes. The main advantage of this algorithm is that it provides a linear-time search irrespective of the number of strings in the database. The issue, however, lies in devising a practical implementation. The FSM thus constructed becomes very large in terms of storage, and so is area inefficient. The larger memory footprint also hurts its performance. Another issue is the limited scope for exploiting parallelism, due to the inherently sequential nature of an FSM traversal. This thesis explores hardware and software techniques to accelerate attack detection using the Aho-Corasick algorithm. In the first part of this thesis, we investigate techniques to improve the area and performance efficiency of an IDS. Notable among our contributions is a pipelined architecture that accelerates accesses to the most frequently accessed node in the FSM. The second part of this thesis studies the resilience of an IDS to evasion attempts. In an evasion attempt, an adversary saturates the performance of an IDS to disable it, and thereby gains access to the network. We explore an evasion attempt that significantly degrades the performance of the Aho-Corasick algorithm used in an IDS.
As a countermeasure, we propose a parallel architecture that improves the resilience of an IDS to an evasion attempt. The final part of this thesis explores techniques to exploit the characteristics of network traffic. In our study, we observe significant redundancy in the payload bytes, so we propose a mechanism to leverage this redundancy in the FSM traversal of the Aho-Corasick algorithm. We have also implemented our proposed redundancy-aware FSM traversal in Snort.
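The two phases described in the abstract, building an FSM from the attack strings and then traversing it with the packet bytes, can be sketched as a textbook Aho-Corasick matcher. This is a minimal illustration, not Snort's implementation or the architecture proposed in the thesis; the function names and the sample patterns are assumptions chosen for the example.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables from a list of byte strings."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(pattern)
    # Breadth-first pass: failure links of depth-1 states stay at the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(payload, automaton):
    """Traverse the FSM once over the payload; return (offset, pattern) hits."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, byte in enumerate(payload):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


automaton = build_automaton([b"he", b"she", b"his", b"hers"])
print(search(b"ushers", automaton))
```

The traversal consumes the payload one byte at a time, which is what gives the search its linear-time behaviour and also why it is hard to parallelize. The per-state dictionaries used here favour clarity; the storage blow-up mentioned in the abstract arises when each state instead holds a full next-state entry for every possible byte value.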
15

Mitosis based speculative multithreaded architectures

Madriles Gimeno, Carles 23 July 2012
In the last decade, industry made a right-hand turn and shifted towards multi-core processor designs, also known as Chip-Multi-Processors (CMPs), in order to provide further performance improvements under a reasonable power budget, design complexity, and validation cost. Over the years, several processor vendors have introduced multi-core chips in their product lines, and these chips have become mainstream, with the number of cores increasing in each processor generation. Multi-core processors improve the performance of applications by exploiting Thread Level Parallelism (TLP), while the Instruction Level Parallelism (ILP) exploited by each individual core is limited. These architectures are very efficient when multiple threads are available for execution. However, single-thread sections of code (single-thread applications and serial sections of parallel applications) pose important constraints on the benefits achieved by parallel execution, as pointed out by Amdahl’s law. Parallel programming, even with the help of recently proposed techniques like transactional memory, has proven to be a very challenging task. On the other hand, automatically partitioning applications into threads may be a straightforward task in regular applications, but becomes much harder for irregular programs, where compilers usually fail to discover sufficient TLP. In this scenario, two main directions have been followed in the research community to benefit from multi-core platforms: Speculative Multithreading (SpMT) and Non-Speculative Clustered architectures. The former splits a sequential application into speculative threads, while the latter partitions the instructions among the cores based on data dependences but avoids a large degree of speculation. Despite the large amount of research on both approaches, the techniques proposed so far have shown marginal performance improvements.
In this thesis we propose novel schemes to speed up sequential or lightly threaded applications on multi-core processors that effectively address the main unresolved challenges of previous approaches. In particular, we propose an SpMT architecture, called Mitosis, that leverages a powerful software value prediction technique to manage inter-thread dependences, based on pre-computation slices (p-slices). Thanks to the accuracy and low cost of this technique, Mitosis is able to effectively parallelize applications even in the presence of frequent dependences among threads. We also propose a novel architecture, called Anaphase, that combines the best of SpMT schemes and clustered architectures. Anaphase effectively exploits ILP, TLP and Memory Level Parallelism (MLP), thanks to its unique fine-grain thread decomposition algorithm that adapts to the available parallelism in the application.
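The constraint that Amdahl's law places on multi-core speedup, mentioned in the abstract, is easy to quantify. The sketch below evaluates the standard formula, speedup = 1 / ((1 - p) + p / n), for a program whose parallelizable fraction is p when run on n cores. The 90% figure is a hypothetical value picked for illustration and does not come from the thesis.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical program in which 90% of the execution time can be parallelized.
for cores in (2, 4, 8, 64):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

However many cores are added, the speedup of this hypothetical program stays below 10x, because the serial 10% never shrinks. Shortening those single-thread sections is what speculative multithreading schemes aim at.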
16

Towards lightweight and high-performance hardware transactional memory

Tomić, Sasa 13 July 2012
Conventional lock-based synchronization serializes accesses to critical sections guarded by the same lock. Using multiple locks brings the possibility of a deadlock or a livelock in the program, making parallel programming a difficult task. Transactional Memory (TM) is a promising paradigm for parallel programming, offering an alternative to lock-based synchronization. TM eliminates the risk of deadlocks and livelocks, while providing the desirable semantics of Atomicity, Consistency, and Isolation for critical sections. TM speculatively executes a series of memory accesses as a single, atomic transaction. The speculative changes of a transaction are kept private until the transaction commits. If a transaction conflicts with another (that is, one of them writes an address that the other has read or written), or could cause a deadlock or livelock, the TM system aborts the transaction and rolls back the speculative changes. To be effective, a TM implementation should provide high performance and scalability. While implementations of TM in pure software (STM) do not provide the desired performance, Hardware TM (HTM) implementations introduce much smaller overhead and have relatively good scalability, due to their better control of hardware resources. However, many HTM systems support only transactions that fit in limited hardware resources (for example, private caches), and fall back to software mechanisms if hardware limits are reached. These HTM systems, called best-effort HTMs, are not desirable since they force a programmer to think in terms of hardware limits, to use both HTM and STM, and to manage concurrent transactions in HTM and STM. In contrast with best-effort HTMs, unbounded HTM systems support overflowed transactions, which do not fit into private caches. Unbounded HTM systems often require complex protocols or expensive hardware mechanisms for conflict detection between overflowed transactions. In addition, an execution with overflowed transactions is often much slower than an execution that has only regular transactions.
This is typically due to the restrictive or approximate conflict management mechanisms used for overflowed transactions. In this thesis, we study hardware implementations of transactional memory and make three main contributions. First, we improve the general performance of HTM systems by proposing a scalable protocol for conflict management. The protocol has precise conflict detection, in contrast with the often-employed inexact Bloom-filter-based conflict detection, which often falsely reports conflicts between transactions. Second, we propose a best-effort HTM that utilizes the new scalable conflict detection protocol, termed EazyHTM. EazyHTM allows parallel commits for all non-conflicting transactions, and generally simplifies transaction commits. Finally, we propose an unbounded HTM that extends and improves the initial protocol for conflict management, and we name it EcoTM. EcoTM features precise conflict detection, and it efficiently supports large as well as small and short transactions. The key idea of EcoTM is to leverage the observation that very few locations are actually conflicting, even if applications have high contention. In EcoTM, each core locally detects whether a cache line is non-conflicting, and the conflict detection mechanism is invoked only for the few potentially conflicting cache lines.
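The difference between precise and Bloom-filter-based conflict detection mentioned in the abstract can be shown with a toy address signature. This is a sketch under stated assumptions, not the EazyHTM or EcoTM protocol: the class name, the signature sizes and the use of SHA-256 as a stand-in for the simple hash circuits that hardware would use are all hypothetical.

```python
import hashlib


class AddressSignature:
    """Toy Bloom-filter summary of the addresses a transaction has written."""

    def __init__(self, bits=64, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit vector stored as an integer

    def _positions(self, address):
        for k in range(self.hashes):
            digest = hashlib.sha256(f"{k}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, address):
        for position in self._positions(address):
            self.field |= 1 << position

    def may_contain(self, address):
        return all((self.field >> p) & 1 for p in self._positions(address))


written = {0x1000, 0x1040}             # exact write set of transaction A
signature = AddressSignature(bits=64)  # inexact summary of the same set
for address in written:
    signature.add(address)

# A signature never misses a real conflict (no false negatives).
print(all(signature.may_contain(a) for a in written))

# Degenerate 1-bit signature: every address maps to the same bit, so an
# unrelated read by transaction B is reported as a conflict.
tiny = AddressSignature(bits=1)
tiny.add(0x1000)
print(tiny.may_contain(0x2000), 0x2000 in {0x1000})
```

The exact set answers the membership query correctly, while the undersized signature reports a conflict that does not exist. Real signatures are much larger than one bit, but the same effect appears once a transaction touches enough addresses, and each false conflict costs an unnecessary abort or stall.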
17

CPU accounting in multi-threaded processors

Ruiz Luque, José Carlos 29 May 2014
In recent years, multi-threaded processors have become more and more popular in industry as a way to increase aggregate system performance and per-application performance, overcoming the limitations imposed by limited instruction-level parallelism and by power and thermal constraints. Multi-threaded processors are widely used in servers, desktop computers, laptops, and mobile devices. However, multi-threaded processors introduce complexities when accounting CPU (computation) capacity (CPU accounting), since the CPU capacity accounted to an application depends not only upon the time that the application is scheduled onto a CPU, but also on the amount of hardware resources it receives during that period. Given that hardware resources in a multi-threaded processor are dynamically shared between applications, the CPU capacity accounted to an application depends upon the workload in which it executes. This is inconvenient because the same application with the same input data set may be accounted significantly differently depending upon the workload in which it executes. Deploying systems with accurate CPU accounting mechanisms is necessary to increase fairness among running applications. Moreover, it will allow users to be fairly charged on a shared data center, facilitating server consolidation in future systems. This thesis analyses the concepts of CPU capacity and CPU accounting for multi-threaded processors. In this study, we demonstrate that current CPU accounting mechanisms are not as accurate as they should be in multi-threaded processors. For this reason, we present two novel CPU accounting mechanisms that improve the accuracy of measuring CPU capacity for multi-threaded processors with low hardware overhead. We focus our attention on several current multi-threaded processors, including chip multiprocessors and simultaneous multithreading processors.
Finally, we analyse the impact of shared resources in multi-threaded processors on the operating system CPU scheduler, and we propose several schedulers that improve the knowledge of shared hardware resources at the software level.
18

Sistema de diseño de lentes progresivas asistido por ordenador (Computer-aided design system for progressive lenses)

Dürsteler, Juan Carlos 09 December 1991
No description available.
19

Atomic dataflow model

Gajinov, Vladimir 20 November 2014
With the recent switch in the design of general purpose processors from frequency scaling of a single processor core towards increasing the number of processor cores, parallel programming became important not only for scientific programming but also for general purpose programming. This also stressed the importance of programmability of existing parallel programming models which were primarily designed for performance. It was soon recognized that new programming models are needed that will make parallel programming possible not only to experts, but to a general programming community. Transactional Memory (TM) is an example which follows this premise. It improves dramatically over any previous synchronization mechanism in terms of programmability and composability, at the price of possibly reduced performance. The main source of performance degradation in Transactional Memory is the overhead of transactional execution. Our work on parallelizing Quake game engine is a clear example of this problem. We show that Software Transactional Memory is superior in terms of programmability compared to lock based programming, but that performance is hindered due to extreme amount of overhead introduced by transactional execution. In the meantime, a significant research effort has been invested in overcoming this problem. Our approach is aimed towards improving the performance of transactional code by reducing transactional data conflicts. The idea is based on the organization of the code in which highly conflicting data is promoted to dataflow tokens that coordinate the execution of transactions. The main contribution of this thesis is Atomic Dataflow model (ADF), a new task-based parallel programming model for C/C++ that integrates dataflow abstractions into the shared memory programming model. The ADF model provides language constructs that allow a programmer to delineate a program into a set of tasks and to explicitly define data dependencies for each task. 
The task dependency information is conveyed to the ADF runtime system, which constructs a dataflow task graph that governs the execution of the program. Additionally, the ADF model allows tasks to share data. The key idea is that computation is triggered by dataflow between tasks but that, within a task, execution occurs by making atomic updates to common mutable state. To that end, the ADF model employs transactional memory, which guarantees the atomicity of shared-memory updates. The second contribution of this thesis is DaSH, the first comprehensive benchmark suite for hybrid dataflow and shared-memory programming models. DaSH features 11 benchmarks, each representing one of the Berkeley dwarfs, which capture patterns of communication and computation common to a wide range of emerging applications. DaSH includes sequential implementations and shared-memory implementations based on OpenMP and TBB to facilitate comparison between hybrid dataflow implementations and traditional shared-memory implementations. We use DaSH not only to evaluate the ADF model but also to compare it with two other hybrid dataflow models, in order to identify the advantages and shortcomings of such models and to motivate further research on their characteristics. Finally, we study the applicability of hybrid dataflow models to the parallelization of the game engine. We show that hybrid dataflow models decrease the complexity of the parallel game engine implementation by eliminating or restructuring the explicit synchronization that is necessary in shared-memory implementations. The corresponding implementations also exhibit good scalability and better speedup than the shared-memory parallel implementations, especially in the case of a highly congested game world that contains a large number of game objects. Ultimately, on an eight-core machine we obtained a 4.72x speedup over the sequential baseline and a 49% improvement over the lock-based parallel implementation based on work-sharing. 
/ With the recent shift in the design of general-purpose processors from increasing clock frequency to increasing the number of cores, parallel programming has become important not only for the scientific community but also for programming in general. This has emphasized the importance of the programmability of current parallel programming models, whose goal was performance. The need for new programming models that make parallel programming feasible for the whole community was soon recognized. Transactional Memory (TM) is an example of this goal. It is a great improvement over any previous synchronization method in terms of programmability, at the cost of a possible reduction in performance. The main reason for this degradation is the overhead of transactional execution. Our work on parallelizing the Quake game engine is a clear example of this problem. We show that Software Transactional Memory is superior to lock-based programming models in terms of programmability, but that performance is hindered by the overhead introduced by TM. Meanwhile, significant research effort has been invested in overcoming this problem. Our solution aims to improve the performance of transactional code by reducing conflicts on the data accessed within transactions. The idea is based on organizing the code so that conflicting data is promoted to dataflow tokens that coordinate the execution of transactions. The main contribution of this thesis is the Atomic Dataflow (ADF) model, a new task-based programming model for C/C++ that integrates dataflow abstractions into the shared-memory programming model. 
The ADF model provides language constructs that let the programmer define a program as a set of tasks and explicitly define the data dependencies of each task. The task dependency information is passed to the ADF runtime, which builds a task graph that controls the execution of the program. Additionally, the ADF model allows tasks to share data. The main idea is that computation is triggered by the flow of data between tasks, but that within a task execution proceeds by making atomic updates to common mutable state. To this end, the ADF model uses TM, which guarantees the atomicity of modifications to shared memory. The second contribution is DaSH, the first benchmark suite for hybrid dataflow and shared-memory programming models. DaSH contains 11 benchmarks, each representative of one of the Berkeley dwarfs, which capture communication and computation patterns common to a wide range of emerging applications. DaSH includes sequential and shared-memory implementations based on OpenMP and TBB that facilitate comparison between hybrid dataflow models and shared-memory implementations. We use DaSH not only to evaluate ADF but also to compare it with two other hybrid models in order to identify their advantages. Finally, we study the applicability of these hybrid models to the parallelization of the game engine. We show that they reduce the complexity of the parallel implementation by eliminating or restructuring the explicit synchronization that is necessary in shared-memory implementations. Good scalability and better speedup are also observed, especially in the case of a heavily loaded game environment. 
Ultimately, on an eight-core machine a speedup of 4.72x over the sequential code was obtained, along with a 49% improvement over the lock-based parallel implementation.
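
The execution model described in this abstract (tasks fired by dataflow tokens, with atomic updates to shared state inside each task) can be illustrated with a small Python sketch. This is not the ADF API, which the abstract does not show: the class `MiniDataflow`, its `task` and `atomic` methods, the token buffers, and the game-like example are all hypothetical, and a plain lock stands in for transactional memory.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class MiniDataflow:
    """Toy runtime (hypothetical): a task fires when all its input buffers hold a token."""

    def __init__(self):
        self._tasks = []                # list of (input buffers, task body)
        self._lock = threading.Lock()   # assumption: a lock stands in for a transaction

    def task(self, *inputs):
        """Decorator that registers a task together with its input token buffers."""
        if not inputs:
            raise ValueError("a task needs at least one input token buffer")

        def register(body):
            self._tasks.append((inputs, body))
            return body

        return register

    def atomic(self):
        """Context manager guarding updates to shared mutable state."""
        return self._lock

    def run(self, workers=4):
        """Repeatedly fire every ready task instance until no tokens remain."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            while True:
                ready = []
                for inputs, body in self._tasks:
                    # One firing per complete set of input tokens.
                    while all(buf for buf in inputs):
                        args = [buf.popleft() for buf in inputs]
                        ready.append((body, args))
                if not ready:
                    return
                futures = [pool.submit(body, *args) for body, args in ready]
                for future in futures:
                    future.result()     # wait for completion, re-raise task errors


rt = MiniDataflow()
damage = deque()   # token buffer filled by the main program
scores = deque()   # token buffer filled by the first task
world = {"health": 100, "hits": 0, "score": 0}   # shared mutable state


@rt.task(damage)
def apply_hit(amount):
    with rt.atomic():                   # atomic update of shared state
        world["health"] -= amount
        world["hits"] += 1
    scores.append(amount * 10)          # produce a token for the next task


@rt.task(scores)
def add_score(points):
    with rt.atomic():
        world["score"] += points


for amount in (5, 10, 15):
    damage.append(amount)
rt.run()
print(world)
```

The two tasks never call each other: `add_score` runs only because `apply_hit` produced tokens, while the shared `world` dictionary is touched only inside the atomic sections.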
20

Resilience mechanisms for carrier-grade networks

Ramírez, Wilson 18 November 2014 (has links)
In recent years, the advent of new Future Internet (FI) applications has been creating ever more demanding requirements. These requirements are pushing network carriers toward high transport capacity, energy efficiency, and high-availability services with low latency. A widespread practice for providing FI services is the adoption of a multi-layer network model that combines IP/MPLS with optical technologies such as Wavelength Division Multiplexing (WDM). Indeed, optical transport technologies are the foundation of current telecommunication network backbones, because of the high transmission bandwidth achieved in fiber-optic networks. Traditional optical networks use a fixed 50 GHz grid, resulting in low Optical Spectrum (OS) utilization, specifically at transmission rates above 100 Gbps. Recently, optical networks have been undergoing significant changes with the purpose of providing a flexible grid that can fully exploit the potential of optical networks. This has led to a new network paradigm termed the Elastic Optical Network (EON). 
Recently, a new protection scheme referred to as Network Coding Protection (NCP) has emerged as an innovative solution for proactively enabling protection in an agile and efficient manner by means of throughput-improvement techniques such as Network Coding. It is intuitive that the throughput advantages of NCP might be magnified by the flexible grid provided by EONs. The goal of this thesis is three-fold. The first goal is to study the advantages of NCP schemes in planning scenarios. For this purpose, this thesis focuses on the performance of NCP assuming both a fixed and a flexible spectrum grid. However, unlike in planning scenarios, in dynamic scenarios the accuracy of Network State Information (NSI) is crucial, since inaccurate NSI might substantially affect the performance of an NCP scheme. The second contribution of this thesis is to study the performance of protection schemes in dynamic scenarios considering inaccurate NSI. For this purpose, this thesis explores prediction techniques in order to mitigate the negative effects of inaccurate NSI. On the other hand, Internet users are continuously demanding new requirements that cannot be supported by the current host-oriented communication model. This communication model is not suitable for future Internet architectures such as the so-called Internet of Things (IoT). Fortunately, there is a new trend in network research referred to as ID/Locator Split Architectures (ILSAs), a non-disruptive technique for mitigating the issues related to host-oriented communications. Moreover, a new routing architecture referred to as the Path Computation Element (PCE) has emerged with the aim of overcoming the well-known issues of current routing schemes. 
Undoubtedly, routing and protection schemes need to be enhanced to fully exploit the advantages provided by new network architectures. In light of this, the third goal of this thesis is to introduce a novel PCE-like architecture termed Context-Aware PCE. In a context-aware PCE scenario, the driver of a path computation is not a host/location, as in conventional PCE architectures; rather, it is an interest in a service defined within a context. / In recent years, the arrival of new applications of the so-called Future Internet (FI) has been creating extremely demanding requirements. These requirements are pushing network providers to increase their transport capacity, energy efficiency, and their provision of high-availability, low-latency services. A very widespread practice for providing FI services is the adoption of a multi-layer model that uses IP/MPLS technologies together with optical technologies such as Wavelength Division Multiplexing (WDM). Indeed, transport technologies are the foundation of the backbone of current telecommunication networks, owing to the high bandwidth that fiber-optic networks provide. Traditional optical networks use a fixed 50 GHz spectrum grid. This results in low utilization of the optical spectrum, specifically at transmission rates above 100 Gbps. Recently, optical networks have been undergoing significant changes aimed at providing a flexible grid that can exploit the potential of optical networks. This has led to a new paradigm called Elastic Optical Networks (EON). On the other hand, a new protection scheme called Network Coding Protection (NCP) has emerged as an innovative solution for proactively enabling efficient and agile protection using throughput-improvement techniques such as Network Coding (NC). 
It is logical to expect that the throughput-related advantages of NCP can be magnified by the flexible grid provided by EONs. The goal of this thesis is three-fold. The first goal is to study the advantages of NCP schemes in a planning scenario. For this purpose, this thesis focuses on the performance of NCP assuming both a fixed and a flexible spectrum grid. However, unlike in planning scenarios, in dynamic scenarios the accuracy of Network State Information (NSI) is crucial, since inaccurate NSI can substantially affect the performance of an NCP scheme. The second contribution of this thesis is the study of the performance of protection schemes in dynamic scenarios considering inaccurate NSI. For this purpose, this thesis explores predictive techniques in order to mitigate the negative effects of inaccurate NSI. On the other hand, Internet users are continuously demanding new requirements that cannot be supported by the host-oriented communication model. This communication model is not feasible for FI architectures such as the Internet of Things (IoT). Fortunately, there is a new line of research called ID/Locator Split Architectures (ILSAs), a non-disruptive technique for mitigating the problems related to the host-oriented communication model. In addition, a new routing scheme called the Path Computation Element (PCE) has emerged with the aim of overcoming the well-known problems of traditional routing schemes. Undoubtedly, routing and protection schemes must be improved so that they can exploit the advantages introduced by new network architectures. In light of this, the third goal of this thesis is to introduce a new PCE architecture called Context-Aware PCE. 
In a context-aware PCE scenario, the target of a path computation is not a host or location, as is the case in traditional PCE schemes. Rather, it is an interest in a service defined within context information.
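
The abstract does not spell out how NCP encodes traffic. As a minimal sketch of the general idea behind network-coding-based protection, assume a single shared protection path that carries the XOR of equal-length payloads sent on several disjoint working paths, so that any one lost payload can be rebuilt at the receiver. The function names and sample payloads below are illustrative assumptions, not the scheme evaluated in the thesis.

```python
from functools import reduce


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length payloads."""
    if len(x) != len(y):
        raise ValueError("payloads must have equal length")
    return bytes(p ^ q for p, q in zip(x, y))


def encode(payloads):
    """Coded payload sent on the shared protection path (XOR of all flows)."""
    return reduce(xor_bytes, payloads)


def recover(surviving, coded):
    """Rebuild the single lost payload from the survivors and the coded payload."""
    return reduce(xor_bytes, surviving, coded)


# Three flows on disjoint working paths (hypothetical payloads of equal length).
flow_a = b"AAAA-data"
flow_b = b"BBBB-data"
flow_c = b"CCCC-data"

coded = encode([flow_a, flow_b, flow_c])

# Assume the working path carrying flow_b fails: the receiver still gets
# flow_a, flow_c and the coded payload, and rebuilds flow_b without rerouting.
recovered_b = recover([flow_a, flow_c], coded)
print(recovered_b == flow_b)
```

In this sketch one protection path covers three working paths against any single failure, instead of one dedicated backup per flow, which is the kind of capacity saving that motivates coding-based protection.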
