• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 367
  • 83
  • 46
  • 1
  • Tagged with
  • 497
  • 486
  • 125
  • 96
  • 77
  • 45
  • 44
  • 44
  • 42
  • 40
  • 40
  • 40
  • 40
  • 39
  • 36
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Energy-aware home area networking

Khan, Rafiullah 16 December 2014 (has links)
Cotutela Universitat Politècnica de Catalunya i Università degli Studi di Genova / A study by Lawrence Berkeley National Laboratory (LBNL) revealed that about 60% of the office PCs are left powered-up 24/7 only to maintain the network connectivity for remote access, Voice-over-IP (VOIP) clients, Instant Messaging (IM) and other administrative management reasons. The Advanced Configuration and Power Interface (ACPI) proposed several low power states for PCs as effective mechanism to reduce energy waste, but unfortunately they are seldomly used due to their incapability to maintain network presence. Thus, Billions of dollars of electricity is wasted every year to keep idle or unused network devices fully powered-up only to maintain the network connectivity.This dissertation addresses the Network Connectivity Proxy (NCP), a concept recently been proposed as an optimal strategy to reduce energy waste due to idle network devices. The NCP is a software entity running on a low power network device (such as home gateway, switch or router) and impersonates presence for high power devices (such as PCs) during their sleeping periods. It wakes-up a sleeping device only when its resources are required. In short, the NCP impersonates link layer, network layer, transport layer and application layer presence on behalf of sleeping devices. In this dissertation, we presented the design and implementation of our NCP prototype. The NCP concept faces several issues and challenges that we tried to address in the most effective way in our implementations. Knowing when to start or stop proxying presence on behalf of sleeping devices is critically important for the NCP operations. To achieve this objective in a seamless way without requiring any user intervention, we developed a kernel module that monitors the power state transitions of the device and immediately informs the NCP over a suitable communication protocol in case of any update. An important challenge for the NCP is its ability to proxy a huge and ever increasing number of applications and networking protocols on behalf of sleeping devices. To tackle with this challenge in an efficient way, we implemented a quite generalized set of behavioral rules in our NCP framework that can be suitable for any protocol or application. We also incorporated deployment flexibility in our NCP software that enables us to operate it on on-board NIC, switch/router or on a standalone PC. On-board NIC and switch/router are the optimal locations for the NCP software in home/small office environment (very limited number of devices) or a standalone PC with enough resources is a good choice if high scalability is desirable e.g., medium or large size organizations. A communication protocol is required for information exchange between the NCP and client devices e.g., for power state notifications, registration/de-registration of client devices/behavioral rules etc. To avoid any configuration issues, we developed a flexible and reliable communication framework based on the Universal Plug & Play (UPnP) architecture that provides interesting features such as auto-discovery, zero-configuration and seamless communication between the NCP and client devices. We expanded the NCP coverage beyond LAN boundaries in order to exploit its full potential in terms of energy savings by covering for thousands of client devices. A single global powerful NCP instance located anywhere in the Internet can make easier the implementation of complex tasks and boosts up the energy savings by also shutting down the unused access links and the packets forwarding equipments whenever possible. Further, we also extended the NCP concept for mobile devices to help in improving the battery life. Another important contribution of this dissertation includes the extensive evaluation of the NCP performance on different low power hardwares. We performed large number of experiments and evaluated the effectiveness of NCP prototype in different
92

The use of computational intelligence for security in named data networking

Karami, Amin 16 February 2015 (has links)
Information-Centric Networking (ICN) has recently been considered as a promising paradigm for the next-generation Internet, shifting from the sender-driven end-to-end communication paradigma to a receiver-driven content retrieval paradigm. In ICN, content -rather than hosts, like in IP-based design- plays the central role in the communications. This change from host-centric to content-centric has several significant advantages such as network load reduction, low dissemination latency, scalability, etc. One of the main design requirements for the ICN architectures -since the beginning of their design- has been strong security. Named Data Networking (NDN) (also referred to as Content-Centric Networking (CCN) or Data-Centric Networking (DCN)) is one of these architectures that are the focus of an ongoing research effort that aims to become the way Internet will operate in the future. Existing research into security of NDN is at an early stage and many designs are still incomplete. To make NDN a fully working system at Internet scale, there are still many missing pieces to be filled in. In this dissertation, we study the four most important security issues in NDN in order to defense against new forms of -potentially unknown- attacks, ensure privacy, achieve high availability, and block malicious network traffics belonging to attackers or at least limit their effectiveness, i.e., anomaly detection, DoS/DDoS attacks, congestion control, and cache pollution attacks. In order to protect NDN infrastructure, we need flexible, adaptable and robust defense systems which can make intelligent -and real-time- decisions to enable network entities to behave in an adaptive and intelligent manner. In this context, the characteristics of Computational Intelligence (CI) methods such as adaption, fault tolerance, high computational speed and error resilient against noisy information, make them suitable to be applied to the problem of NDN security, which can highlight promising new research directions. Hence, we suggest new hybrid CI-based methods to make NDN a more reliable and viable architecture for the future Internet. / Information-Centric Networking (ICN) ha sido recientemente considerado como un paradigma prometedor parala nueva generación de Internet, pasando del paradigma de la comunicación de extremo a extremo impulsada por el emisora un paradigma de obtención de contenidos impulsada por el receptor. En ICN, el contenido (más que los nodos, como sucede en redes IPactuales) juega el papel central en las comunicaciones. Este cambio de "host-centric" a "content-centric" tiene varias ventajas importantes como la reducción de la carga de red, la baja latencia, escalabilidad, etc. Uno de los principales requisitos de diseño para las arquitecturas ICN (ya desde el principiode su diseño) ha sido una fuerte seguridad. Named Data Networking (NDN) (también conocida como Content-Centric Networking (CCN) o Data-Centric Networking (DCN)) es una de estas arquitecturas que son objetode investigación y que tiene como objetivo convertirse en la forma en que Internet funcionará en el futuro. Laseguridad de NDN está aún en una etapa inicial. Para hacer NDN un sistema totalmente funcional a escala de Internet, todavía hay muchas piezas que faltan por diseñar. Enesta tesis, estudiamos los cuatro problemas de seguridad más importantes de NDN, para defendersecontra nuevas formas de ataques (incluyendo los potencialmente desconocidos), asegurar la privacidad, lograr una alta disponibilidad, y bloquear los tráficos de red maliciosos o al menos limitar su eficacia. Estos cuatro problemas son: detección de anomalías, ataques DoS / DDoS, control de congestión y ataques de contaminación caché. Para solventar tales problemas necesitamos sistemas de defensa flexibles, adaptables y robustos que puedantomar decisiones inteligentes en tiempo real para permitir a las entidades de red que se comporten de manera rápida e inteligente. Es por ello que utilizamos Inteligencia Computacional (IC), ya que sus características (la adaptación, la tolerancia a fallos, alta velocidad de cálculo y funcionamiento adecuado con información con altos niveles de ruido), la hace adecuada para ser aplicada al problema de la seguridad NDN
93

A dependency-aware parallel programming model

Pérez Cáncer, Josep Maria 11 February 2015 (has links)
Designing parallel codes is hard. One of the most important roadblocks to parallel programming is the presence of data dependencies. These restrict parallelism and, in general, to work them around requires complex analysis and leads to convoluted solutions that decrease the quality of the code. This thesis proposes a solution to parallel programming that incorporates data dependencies into the model. The programming model can handle that information and to dynamically find parallelism that otherwise would be hard to find. This approach improves both programmability and parallelism, and thus performance. While this problem has already been solved in OpenMP 4 at the time of this publication, this research begun before the problem was even being considered for OpenMP 3. In fact, some of the contributions of this thesis have had an influence on the approach taken in OpenMP 4. However, the contributions go beyond that and cover aspects that have not been considered yet in OpenMP 4. The approach we propose is based on function-level dependencies across disjoint blocks of contiguous memory. While finding dependencies under those constraints is simple, it is much harder to do so over strided and possibly partially overlapping sets of data. This thesis also proposes a solution to this problem. By doing so, we increase the range of applicability of the original solution and increase the span of applicability of the programming model. OpenMP4 does not currently cover this aspect. Finally, we present a solution to take advantage of the performance characteristics of Non-Uniform Memory Access architectures. Our proposal is at the programming model level and does not require changes in the code. It automatically distributes the data and does not rely on data migration nor replication. Instead, it is based exclusively on scheduling the computations. While this process is automatic, it can be tuned through minor changes in the code that do not require any change in the programming model. Throughout the thesis, we demonstrate the effectiveness of the proposal through benchmarks that are either hard to program using other paradigms or that have different solutions. In most cases, our solutions perform either on par or better than already existing solutions. This includes the implementations available in well-known high-performance parallel libraries. / Dissenyar codis paral·lels es complex. Un dels principals esculls a l'hora de programar aplicacions paral·leles és la presència de dependències. Aquestes constrenyen el paral·lelisme, i en general, per evitar-les es requereix realitzar anàlisis complicades que donen lloc a solucions complexes que redueixen la qualitat del codi. Aquesta tesi proposa una solució a la programació paral·lela que incorpora al model les dependències de dades. El model de programació és capaç d'utilitzar aquesta informació per a trobar paral·lelisme que altrament seria molt difícil de detectar i d'extreure. Aquest enfoc augmenta la programabilitat i el paral·lelisme, i per tant també el rendiment. Tot i que al moment de la publicació d'aquesta tesi, el problema ja ha estat resolt a OpenMP 4, la recerca d'aquesta tesi va començar abans de que el problema s'hagués plantejat en l'àmbit d'OpenMP 3. De fet, algunes de les contribucions de la tesi han influït en la solució emprada a OpenMP 4. Tanmateix, les contribucions van més enllà i cobreixen aspectes que encara no han estat considerats a OpenMP 4. La proposta es basa en dependències a nivell de funció entre blocs de memòria continus i sense intersecció. Tot i que trobar dependències sota aquestes condicions és senzill, fer-ho sobre dades no contínues amb possibles interseccions parcials és molt més complex. Aquesta tesi també proposa una solució a aquest problema. Fent això, es millora el rang d'aplicació de la solució original i per tant el del model de programació. Aquest és un dels aspectes que encara no es contemplen a OpenMP 4. Finalment, es presenta una solució que té en compte les característiques de rendiment de les arquitectures NUMA (Accés No Uniforme a la Memòria). La proposta es planteja a nivell del model de programació i no precisa de canvis al codi ja que les dades es distribueixen automàticament. En lloc de basar-se en la migració i la replicació de les dades, es basa exclusivament en la planificació de l'execució de les computacions. Tot i que aquest procés és automàtic, es pot afinar mitjançant petits canvis en el codi que no arriben a alterar el model de programació. Al llarg d'aquesta tesi es demostra la efectivitat de les propostes a través de bancs de proves que son difícils de programar amb altres paradigmes o que tenen solucions diferents. A la majoria dels casos les nostres solucions tenen un rendiment similar o millor que les solucions preexistents, que inclouen implementacions en ben reconegues biblioteques paral·leles d'alt rendiment.
94

Enhancing low-level features with mid-level cues

Trulls Fortuny, Eduard 20 February 2015 (has links)
Local features have become an essential tool in visual recognition. Much of the progress in computer vision over the past decade has built on simple, local representations such as SIFT or HOG. SIFT in particular shifted the paradigm in feature representation. Subsequent works have often focused on improving either computational efficiency, or invariance properties. This thesis belongs to the latter group. Invariance is a particularly relevant aspect if we intend to work with dense features. The traditional approach to sparse matching is to rely on stable interest points, such as corners, where scale and orientation can be reliably estimated, enforcing invariance; dense features need to be computed on arbitrary points. Dense features have been shown to outperform sparse matching techniques in many recognition problems, and form the bulk of our work. In this thesis we present strategies to enhance low-level, local features with mid-level, global cues. We devise techniques to construct better features, and use them to handle complex ambiguities, occlusions and background changes. To deal with ambiguities, we explore the use of motion to enforce temporal consistency with optical flow priors. We also introduce a novel technique to exploit segmentation cues, and use it to extract features invariant to background variability. For this, we downplay image measurements most likely to belong to a region different from that where the descriptor is computed. In both cases we follow the same strategy: we incorporate mid-level, "big picture" information into the construction of local features, and proceed to use them in the same manner as we would the baseline features. We apply these techniques to different feature representations, including SIFT and HOG, and use them to address canonical vision problems such as stereo and object detection, demonstrating that the introduction of global cues yields consistent improvements. We prioritize solutions that are simple, general, and efficient. Our main contributions are as follows: (a) An approach to dense stereo reconstruction with spatiotemporal features, which unlike existing works remains applicable to wide baselines. (b) A technique to exploit segmentation cues to construct dense descriptors invariant to background variability, such as occlusions or background motion. (c) A technique to integrate bottom-up segmentation with recognition efficiently, amenable to sliding window detectors. / Les "features" locals s'han convertit en una eina fonamental en el camp del reconeixement visual. Gran part del progrés experimentat en el camp de la visió per computador al llarg de l'última decada es basa en representacions locals de baixa complexitat, com SIFT o HOG. SIFT, en concret, ha canviat el paradigma en representació de característiques visuals. Els treballs que l'han succeït s'acostumen a centrar o bé a millorar la seva eficiencia computacional, o bé propietats d'invariança. El treball presentat en aquesta tesi pertany al segon grup. L'invariança es un aspecte especialment rellevant quan volem treballab amb "features" denses, és a dir per a cada pixel. La manera tradicional d'atacar el problema amb "features" de baixa densitat consisteix en seleccionar punts d'interés estables, com per exemple cantonades, on l'escala i l'orientació poden ser estimades de manera robusta. Les "features" denses, per definició, han de ser calculades en punts arbitraris de la imatge. S'ha demostrat que les "features" denses obtenen millors resultats en tècniques de correspondència per a molts problemes en reconeixement, i formen la major part del nostre treball. En aquesta tesi presentem estratègies per a enriquir "features" locals de baix nivell amb "cues" o dades globals, de mitja complexitat. Dissenyem tècniques per a construïr millors "features", que usem per a atacar problemes tals com correspondències amb un grau elevat d'ambigüetat, oclusions, i canvis del fons de la imatge. Per a atacar ambigüetats, explorem l'ús del moviment per a imposar consistència espai-temporal mitjançant informació d'"optical flow". També presentem una tècnica per explotar dades de segmentació que fem servir per a extreure "features" invariants a canvis en el fons de la imatge. Aquest mètode consisteix en atenuar els components de la imatge (i per tant les "features") que probablement corresponguin a regions diferents a la del descriptor que estem calculant. En ambdós casos seguim la mateixa estratègia: la nostra voluntat és incorporar dades globals d'un nivell de complexitat mitja a la construcció de "features" locals, que procedim a utilitzar de la mateixa manera que les "features" originals. Aquestes tècniques són aplicades a diferents tipus de representacions, incloent SIFT i HOG, i mostrem com utilitzar-les per a atacar problemes fonamentals en visió per computador tals com l'estèreo i la detecció d'objectes. En aquest treball demostrem que introduïnt informació global en la construcció de "features" locals podem obtenir millores consistentment. Donem prioritat a solucions senzilles, generals i eficients. Aquestes són les principals contribucions de la tesi: (a) Una tècnica per a reconstrucció estèreo densa mitjançant "features" espai-temporals, amb l'avantatge respecte a treballs existents que podem aplicar-la a càmeres en qualsevol configuració geomètrica ("wide-baseline"). (b) Una tècnica per a explotar dades de segmentació dins la construcció de descriptors densos, fent-los invariants a canvis al fons de la imatge, i per tant a problemes com les oclusions en estèreo o objectes en moviment. (c) Una tècnica per a integrar segmentació de manera ascendent ("bottom-up") en problemes de reconeixement d'una manera eficient, dissenyada per a detectors de tipus "sliding window".
95

A pragmatic approach toward securing inter-domain routing

Siddiqui, Muhammad Shuaib 19 December 2014 (has links)
Internet security poses complex challenges at different levels, where even the basic requirement of availability of Internet connectivity becomes a conundrum sometimes. Recent Internet service disruption events have made the vulnerability of the Internet apparent, and exposed the current limitations of Internet security measures as well. Usually, the main cause of such incidents, even in the presence of the security measures proposed so far, is the unintended or intended exploitation of the loop holes in the protocols that govern the Internet. In this thesis, we focus on the security of two different protocols that were conceived with little or no security mechanisms but play a key role both in the present and the future of the Internet, namely the Border Gateway Protocol (BGP) and the Locator Identifier Separation Protocol (LISP). The BGP protocol, being the de-facto inter-domain routing protocol in the Internet, plays a crucial role in current communications. Due to lack of any intrinsic security mechanism, it is prone to a number of vulnerabilities that can result in partial paralysis of the Internet. In light of this, numerous security strategies were proposed but none of them were pragmatic enough to be widely accepted and only minor security tweaks have found the pathway to be adopted. Even the recent IETF Secure Inter-Domain Routing (SIDR) Working Group (WG) efforts including, the Resource Public Key Infrastructure (RPKI), Route Origin authorizations (ROAs), and BGP Security (BGPSEC) do not address the policy related security issues, such as Route Leaks (RL). Route leaks occur due to violation of the export routing policies among the Autonomous Systems (ASes). Route leaks not only have the potential to cause large scale Internet service disruptions but can result in traffic hijacking as well. In this part of the thesis, we examine the route leak problem and propose pragmatic security methodologies which a) require no changes to the BGP protocol, b) are neither dependent on third party information nor on third party security infrastructure, and c) are self-beneficial regardless of their adoption by other players. Our main contributions in this part of the thesis include a) a theoretical framework, which, under realistic assumptions, enables a domain to autonomously determine if a particular received route advertisement corresponds to a route leak, and b) three incremental detection techniques, namely Cross-Path (CP), Benign Fool Back (BFB), and Reverse Benign Fool Back (R-BFB). Our strength resides in the fact that these detection techniques solely require the analytical usage of in-house control-plane, data-plane and direct neighbor relationships information. We evaluate the performance of the three proposed route leak detection techniques both through real-time experiments as well as using simulations at large scale. Our results show that the proposed detection techniques achieve high success rates for countering route leaks in different scenarios. The motivation behind LISP protocol has shifted over time from solving routing scalability issues in the core Internet to a set of vital use cases for which LISP stands as a technology enabler. The IETF's LISP WG has recently started to work toward securing LISP, but the protocol still lacks end-to-end mechanisms for securing the overall registration process on the mapping system ensuring RLOC authorization and EID authorization. As a result LISP is unprotected against different attacks, such as RLOC spoofing, which can cripple even its basic functionality. For that purpose, in this part of the thesis we address the above mentioned issues and propose practical solutions that counter them. Our solutions take advantage of the low technological inertia of the LISP protocol. The changes proposed for the LISP protocol and the utilization of existing security infrastructure in our solutions enable resource authorizations and lay the foundation for the needed end-to-end security.
96

Energy-efficient mobile GPU systems

Arnau Montañés, Jose Maria 24 April 2015 (has links)
The design of mobile GPUs is all about saving energy. Smartphones and tablets are battery-operated and thus any type of rendering needs to use as little energy as possible. Furthermore, smartphones do not include sophisticated cooling systems due to their small size, making heat dissipation a primary concern. Improving the energy-efficiency of mobile GPUs will be absolutely necessary to achieve the performance required to satisfy consumer expectations, while maintaining operating time per battery charge and keeping the GPU in its thermal limits. The first step in optimizing energy consumption is to identify the sources of energy drain. Previous studies have demonstrated that the register file is one of the main sources of energy consumption in a GPU. As graphics workloads are highly data- and memory-parallel, GPUs rely on massive multithreading to hide the memory latency and keep the functional units busy. However, aggressive multithreading requires a huge register file to keep the registers of thousands of simultaneous threads. Such a big register file exceeds the power budget typically available for an embedded graphics processors and, hence, more energy-efficient memory latency tolerance techniques are necessary. On the other hand, prior research showed that the off-chip accesses to system memory are one of the most expensive operations in terms of energy in a mobile GPU. Therefore, optimizing memory bandwidth usage is a primary concern in mobile GPU design. Many bandwidth saving techniques, such as texture compression or ARM's transaction elimination, have been proposed in both industry and academia. The purpose of this thesis is to study the characteristics of mobile graphics processors and mobile workloads in order to propose different energy saving techniques specifically tailored for the low-power segment. Firstly, we focus on energy-efficient memory latency tolerance. We analyze several techniques such as multithreading and prefetching and conclude that they are effective but not energy-efficient. Next, we propose an architecture for the fragment processors of a mobile GPU that is based on the decoupled access/execute paradigm. The results obtained by using a cycle-accurate mobile GPU simulator and several commercial Android games show that the decoupled architecture combined with a small degree of multithreading provides the most energy efficient solution for hiding memory latency. More specifically, the decoupled access/execute-like design with just 4 SIMD threads/processor is able to achieve 97% of the performance of a larger GPU with 16 SIMD threads/processor, while providing 20.5% energy savings on average. Secondly, we focus on optimizing memory bandwidth in a mobile GPU. We analyze the bandwidth usage in a set of commercial Android games and find that most of the bandwidth is employed for fetching textures, and also that consecutive frames share most of the texture dataset as they tend to be very similar. However, the GPU cannot capture inter-frame texture re-use due to the big size of the texture dataset for one frame. Based on this analysis, we propose Parallel Frame Rendering (PFR), a technique that overlaps the processing of multiple frames in order to exploit inter-frame texture re-use and save bandwidth. By processing multiple frames in parallel textures are fetched once every two frames instead of being fetched in a frame basis as in conventional GPUs. PFR provides 23.8% memory bandwidth savings on average in our set of Android games, that result in 12% speedup and 20.1% energy savings. Finally, we improve PFR by introducing a hardware memoization system on top. We analyze the redundancy in mobile games and find that more than 38% of the Fragment Program executions are redundant on average. We thus propose a task-level hardware-based memoization system that provides 15% speedup and 12% energy savings on average over a PFR-enabled GPU. / El diseño de las GPUs (Graphics Procesing Units) móviles se centra fundamentalmente en el ahorro energético. Los smartphones y las tabletas son dispositivos alimentados mediante baterías y, por lo tanto, cualquier tipo de renderizado debe utilizar la menor cantidad de energía posible. Mejorar la eficiencia energética de las GPUs móviles será absolutamente necesario para alcanzar el rendimiento requirido para satisfacer las expectativas de los usuarios, sin reducir el tiempo de vida de la batería. El primer paso para optimizar el consumo energético consiste en identificar qué componentes son los principales consumidores de la batería. Estudios anteriores han identificado al banco de registros y a los accessos a memoria principal como las mayores fuentes de consumo energético en una GPU. El propósito de esta tesis es estudiar las características de los procesadores gráficos móviles y de las aplicaciones móviles con el objetivo de proponer distintas técnicas de ahorro energético. En primer lugar, la investigación se centra en desarrollar métodos energéticamente eficientes para ocultar la latencia de la memoria principal. El resultado de la investigación es una arquitectura desacoplada para los Fragment Processors de la GPU. Los resultados experimentales utilizando un simulador de ciclo y distintos juegos de Android muestran que una arquitectura desacoplada, combinada con un nivel de multithreading moderado, proporciona la solución más eficiente desde el punto de vista energético para ocultar la latencia de la memoria prinicipal. Más específicamente, la arquitectura desacoplada con sólo 4 SIMD threads/processor es capaz de alcanzar el 97% del rendimiento de una GPU más grande con 16 SIMD threads/processor, al tiempo que se reduce el consumo energético en un 20.5%. En segundo lugar, el trabajo de investigación se centró en optimizar el ancho de banda en una GPU móvil. Se realizó un estudio del uso del ancho de banda en distintos juegos de Android y se observó que la mayor parte del ancho de banda se utiliza para leer texturas. Además, se observó que frames consecutivos comparten una gran parte de las texturas. Sin embargo, la GPU no puede capturar el reuso de texturas entre frames dado que el tamaño de las texturas utilizadas por un frame es mucho mayor que la caché de segundo nivel. Basándose en este análisis, se desarrolló Parallel Frame Rendering (PFR), una técnica que solapa el procesado de multiples frames consecutivos con el objetivo de explotar el reuso de texturas entre frames y ahorrar así ancho de bando. Al procesar múltiples frames en paralelo las texturas se leen de memoria principal una vez cada dos frames en lugar de leerse en cada frame como sucede en una GPU convencional. PFR proporciona un ahorro del 23.8% en ancho de banda en promedio para distintos juegos de Android, este ahorro de ancho de banda redunda en un incremento del rendimiento del 12% y un ahorro energético del 20.1%. Por último, se mejoró PFR introduciendo un sistema hardware capaz de evitar cómputos redundantes. Un análisis de distintos juegos de Android reveló que más de un 38% de las ejecuciones del Fragment Program eran redundantes en promedio. Así pues, se propuso un sistema hardware capaz de identificar y eliminar parte de los cómputos y accessos a memoria redundantes, dicho sistema proporciona un incremento del rendimiento del 15% y un ahorro energético del 12% en promedio con respecto a una GPU móvil basada en PFR.
97

Techniques to improve concurrency in hardware transactional memory

Armejach Sanosa, Adrià 13 June 2014 (has links)
Transactional Memory (TM) aims to make shared memory parallel programming easier by abstracting away the complexity of managing shared data. The programmer defines sections of code, called transactions, which the TM system guarantees that will execute atomically and in isolation from the rest of the system. The programmer is not required to implement such behaviour, as happens in traditional mutual exclusion techniques like locks - that responsibility is delegated to the underlying TM system. In addition, transactions can exploit parallelism that would not be available in mutual exclusion techniques; this is achieved by allowing optimistic execution assuming no other transaction operates concurrently on the same data. If that assumption is true the transaction commits its updates to shared memory by the end of its execution, otherwise, a conflict occurs and the TM system may abort one of the conflicting transactions to guarantee correctness; the aborted transaction would roll-back its local updates and be re-executed. Hardware and software implementations of TM have been studied in detail. However, large-scale adoption of software-only approaches have been hindered for long due to severe performance limitations. In this thesis, we focus on identifying and solving hardware transactional memory (HTM) issues in order to improve concurrency and scalability. Two key dimensions determine the HTM design space: conflict detection and speculative version management. The first determines how conflicts are detected between concurrent transactions and how to resolve them. The latter defines where transactional updates are stored and how the system deals with two versions of the same logical data. This thesis proposes a flexible mechanism that allows efficient storage and access to two versions of the same logical data, improving overall system performance and energy efficiency. Additionally, in this thesis we explore two solutions to reduce system contention - circumstances where transactions abort due to data dependencies - in order to improve concurrency of HTM systems. The first mechanism provides a suitable design to apply prefetching to speed-up transaction executions, lowering the window of time in which such transactions can experience contention. The second is an accurate abort prediction mechanism able to identify, before a transaction's execution, potential conflicts with running transactions. This mechanism uses past behaviour of transactions and locality in memory references to infer predictions, adapting to variations in workload characteristics. We demonstrate that this mechanism is able to manage contention efficiently in single-application and multi-application scenarios. Finally, this thesis also analyses initial real-world HTM protocols that recently appeared in market products. These protocols have been designed to be simple and easy to incorporate in existing chip-multiprocessors. However, this simplicity comes at the cost of severe performance degradation due to transient and persistent livelock conditions, potentially preventing forward progress. We show that existing techniques are unable to mitigate this degradation effectively. To deal with this issue we propose a set of techniques that retain the simplicity of the protocol while providing improved performance and forward progress guarantees in a wide variety of transactional workloads.
98

Multi-constraint scheduling of MapReduce workloads

Polo Bardès, Jordà 15 July 2014 (has links)
In recent years there has been an extraordinary growth of large-scale data processing and related technologies in both, industry and academic communities. This trend is mostly driven by the need to explore the increasingly large amounts of information that global companies and communities are able to gather, and has lead the introduction of new tools and models, most of which are designed around the idea of handling huge amounts of data. A good example of this trend towards improved large-scale data processing is MapReduce, a programming model intended to ease the development of massively parallel applications, and which has been widely adopted to process large datasets thanks to its simplicity. While the MapReduce model was originally used primarily for batch data processing in large static clusters, nowadays it is mostly deployed along with other kinds of workloads in shared environments in which multiple users may be submitting concurrent jobs with completely different priorities and needs: from small, almost interactive, executions, to very long applications that take hours to complete. Scheduling and selecting tasks for execution is extremely relevant in MapReduce environments since it governs a job's opportunity to make progress and determines its performance. However, only basic primitives to prioritize between jobs are available at the moment, constantly causing either under or over-provisioning, as the amount of resources needed to complete a particular job are not obvious a priori. This thesis aims to address both, the lack of management capabilities and the increased complexity of the environments in which MapReduce is executed. To that end, new models and techniques are introduced in order to improve the scheduling of MapReduce in the presence of different constraints found in real-world scenarios, such as completion time goals, data locality, hardware heterogeneity, or availability of resources. The focus is on improving the integration of MapReduce with the computing infrastructures in which it usually runs, allowing alternative techniques for dynamic management and provisioning of resources. More specifically, it is focused in three scenarios that are incremental in its scope. First, it studies the prospects of using high-level performance criteria to manage and drive the performance of MapReduce applications, taking advantage of the fact that MapReduce is executed in controlled environments in which the status of the cluster is known. Second, it examines the feasibility and benefits of making the MapReduce runtime more aware of the underlying hardware and the characteristics of applications. And finally, it also considers the interaction between MapReduce and other kinds of workloads, proposing new techniques to handle these increasingly complex environments. Following these three items described above, this thesis contributes to the management of MapReduce workloads by 1) proposing a performance model for MapReduce workloads and a scheduling algorithm that leverages the proposed model and is able to adapt depending on the various needs of its users in the presence of completion time constraints; 2) proposing a new resource model for MapReduce and a placement algorithm aware of the underlying hardware as well as the characteristics of the applications, capable of improving cluster utilization while still being guided by job performance metrics; and 3) proposing a model for shared environments in which MapReduce is executed along with other kinds of workloads such as transactional applications, and a scheduler aware of these workloads and its expected demand of resources, capable of improving resource utilization across machines while observing completion time goals.
99

Techniques for improving the performance of software transactional memory

Stipić, Srđan 21 July 2014 (has links)
Transactional Memory (TM) gives software developers the opportunity to write concurrent programs more easily compared to any previous programming paradigms and gives a performance comparable to lock-based synchronizations. Current Software TM (STM) implementations have performance overheads that can be reduced by introducing new abstractions in Transactional Memory programming model. In this thesis we present four new techniques for improving the performance of Software TM: (i) Abstract Nested Transactions (ANT), (ii) TagTM, (iii) profile-guided transaction coalescing, and (iv) dynamic transaction coalescing. ANT improves performance of transactional applications without breaking the semantics of the transactional paradigm, TagTM speeds up accesses to transactional meta-data, profile-guided transaction coalescing lowers transactional overheads at compile time, and dynamic transaction coalescing lowers transactional overheads at runtime. Our analysis shows that Abstract Nested Transactions, TagTM, profile-guided transaction coalescing, and dynamic transaction coalescing improve the performance of the original programs that use Software Transactional Memory.
100

Information dissemination in mobile networks

Trullols Cruces, Oscar 17 July 2014 (has links)
This thesis proposes some solutions to relieve, using Wi-Fi wireless networks, the data consumption of cellular networks using cooperation between nodes, studies how to make a good deployment of access points to optimize the dissemination of contents, analyzes some mechanisms to reduce the nodes' power consumption during data dissemination in opportunistic networks, as well as explores some of the risks that arise in these networks. Among the applications that are being discussed for data off-loading from cellular networks, we can find Information Dissemination in Mobile Networks. In particular, for this thesis, the Mobile Networks will consist of Vehicular Ad-hoc Networks and Pedestrian Ad-Hoc Networks. In both scenarios we will find applications with the purpose of vehicle-to-vehicle or pedestrian-to-pedestrian Information dissemination, as well as vehicle-to-infrastructure or pedestrian-to-infrastructure Information dissemination. We will see how both scenarios (vehicular and pedestrian) share many characteristics, while on the other hand some differences make them unique, and therefore requiring of specific solutions. For example, large car batteries relegate power saving techniques to a second place, while power-saving techniques and its effects to network performance is a really relevant issue in Pedestrian networks. While Cellular Networks offer geographically full-coverage, in opportunistic Wi-Fi wireless solutions the short-range non-fullcoverage paradigm as well as the high mobility of the nodes requires different network abstractions like opportunistic networking, Disruptive/Delay Tolerant Networks (DTN) and Network Coding to analyze them. And as a particular application of Dissemination in Mobile Networks, we will study the malware spread in Mobile Networks. Even though it relies on similar spreading mechanisms, we will see how it entails a different perspective on Dissemination.

Page generated in 0.0424 seconds