Global ETD Search

81	Adaptive Prefetching and Cache Partitioning for Multicore Processors Selfa Oliver, Vicent 13 November 2018 (has links) Accessing main memory is a major performance bottleneck in current processors: the cores compete with one another for the limited off-chip bandwidth, which further aggravates the so-called memory wall. Several techniques have been applied to narrow the core-memory performance gap, the most prominent being prefetching and hierarchical caching. Hierarchical caches leverage the temporal and spatial locality of the accessed data, mitigating the huge main memory access latencies. To limit the number of accesses to off-chip DRAM, current processors feature large Last Level Caches (LLCs). These caches are shared among all the cores to improve the utilization of the cache space and reduce cost. This approach significantly improves the performance of most applications compared to using smaller private caches. Cache sharing, however, has an important shortcoming: interference between applications. Prefetching, on the other hand, brings data blocks into the caches before they are requested, hiding the main memory latency. Unfortunately, since prefetching is a speculative technique, inaccurate prefetches may pollute the cache with blocks that will not be used. In addition, prefetches interfere with regular memory requests, both those of the application running on the core that issued the prefetches and those of the other cores. This thesis focuses on reducing inter-application interference, both in the shared cache and in the access to main memory. To reduce inter-application interference in the access to main memory, the proposed approach regulates the aggressiveness of each core's prefetcher and selectively activates or deactivates some of them, depending on their individual performance and the main memory bandwidth requirements of the other cores. With respect to interference in shared caches, this thesis proposes two LLC partitioning techniques that give more cache space to the applications whose progress is slowed most by inter-application interference. The first cache partitioning proposal requires dedicated hardware not available in commercial processors, so it has been evaluated using a simulation framework. The second proposal presents a family of partitioning policies that overcome the limitations in the number of partitions and the number of available ways by grouping applications into clusters and overlapping cache partitions, so that multiple applications share the same ways. Since it has been implemented using the cache partitioning features of modern Intel processors, it has been evaluated on a real machine. Experimental results show that the proposed selective prefetching mechanism reduces the number of main memory requests by 20%, which translates into improvements in fairness, performance, and energy consumption. Regarding the proposed partitioning schemes, compared to a system with no partitioning, both reduce unfairness by more than 25% on average, regardless of the number of applications running on the multicore, and this reduction in unfairness does not negatively affect performance. / Selfa Oliver, V. (2018). Adaptive Prefetching and Cache Partitioning for Multicore Processors [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/112423
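The idea of overlapping cache partitions described in entry 81 can be illustrated with way bitmasks. The following is a minimal sketch and not the policy proposed in the thesis: the 11-way cache, the cluster sizes, and the helper names (`contiguous_mask`, `overlapping_masks`, `shared_ways`) are all assumptions made for illustration. The only hardware property assumed is that each partition is described by a contiguous run of ways.

```python
def contiguous_mask(first_way: int, n_ways: int) -> int:
    """Bitmask with n_ways consecutive bits set, starting at first_way."""
    if n_ways < 1 or first_way < 0:
        raise ValueError("need at least one way and a non-negative offset")
    return ((1 << n_ways) - 1) << first_way


def overlapping_masks(total_ways: int, cluster_ways: list) -> list:
    """Give each cluster a contiguous mask anchored at way 0.

    Anchoring every mask at the same way makes the partitions nested:
    a cluster with more ways also covers the ways of every smaller cluster.
    """
    for n in cluster_ways:
        if n > total_ways:
            raise ValueError("cluster asks for more ways than the cache has")
    return [contiguous_mask(0, n) for n in cluster_ways]


def shared_ways(mask_a: int, mask_b: int) -> int:
    """Number of ways that two partitions have in common."""
    return bin(mask_a & mask_b).count("1")


# Hypothetical example: an 11-way LLC and three clusters of applications.
masks = overlapping_masks(11, [11, 6, 2])
for cluster, mask in enumerate(masks):
    print(f"cluster {cluster}: {mask:011b}")
```

In this sketch the number of clusters, rather than the number of applications, determines how many masks are needed, which is one way to stay within a small hardware limit on partitions.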
82	Intégration du système réparti CHORUS dans le langage de haut niveau Pascal / Integration of the CHORUS distributed system into the high-level language Pascal Guillemont, Marc 04 March 1982 (has links) (PDF) A study of the operating systems of several academic and industrial distributed systems, and a comparison with CHORUS. The general principles for implementing the CHORUS operating system are presented, along with the architecture of the operating system and its relationship to applications. An actual implementation written in Pascal on Intel 8086 microprocessors is described, together with the main difficulties encountered, on the one hand with the chosen development system and on the other with the language; the specifically hardware-related aspects are not covered. The thesis closes with an assessment of this experience. CHORUS Pascal operating system language program programming interface input/output distributed systems Intel 8086 kernel
83	Generation of dynamic control-dependence graphs for binary programs Pogulis, Jakob January 2014 (has links) Dynamic analysis of binary files is an area of computer science that has many purposes. It is useful when debugging software in a development environment, where the developer needs to know which statements affected the value of a specific variable. It is also useful when analyzing software for potential vulnerabilities, where data controlled by a malicious user could result in the software executing adverse commands or malicious code. In this thesis a tool has been developed to perform dynamic analysis of x86 binaries in order to generate dynamic control-dependence graphs over the execution. These graphs can be used to determine which conditional statements resulted in a certain outcome. The tool has been developed for x86 Linux systems using the dynamic binary instrumentation framework PIN, developed and maintained by Intel. Techniques that use the additional control-flow information available during dynamic analysis to improve the control-flow graph have been implemented and tested. The basic theory of dynamic analysis and dynamic slicing is discussed, and a basic overview of the implementation of a dynamic analysis tool is presented. The techniques used to improve the control-flow graph have a significant impact on the performance of the tool, but approaches to improving the performance are discussed. control-dependencies control-dependence graph intel-pin instrumentation dynamic analysis program analysis dependence graph dynamic dependence graph control dependency control flow analysis control flow Computer Sciences Datavetenskap (datalogi)
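To make the notion of a dynamic control dependence in entry 83 concrete, here is a minimal sketch of one well-known stack-based approach. It is not taken from the thesis and may differ from what the tool does. It assumes the execution trace, the set of conditional branch nodes, and the immediate post-dominator table are already available; the function name and the tiny example program are hypothetical.

```python
def dynamic_control_deps(trace, branches, ipdom):
    """Return, for each executed node, the trace index of the branch it depends on.

    trace:    list of node ids in execution order
    branches: set of node ids that are conditional branches
    ipdom:    dict mapping each branch to its immediate post-dominator
    A result of None means the node was not governed by any open branch.
    """
    open_branches = []  # stack of (trace index of branch, node that closes it)
    deps = []
    for index, node in enumerate(trace):
        # Reaching a branch's immediate post-dominator ends that branch's region.
        while open_branches and open_branches[-1][1] == node:
            open_branches.pop()
        deps.append(open_branches[-1][0] if open_branches else None)
        if node in branches:
            open_branches.append((index, ipdom[node]))
    return deps


# Hypothetical program: node 1 is "if (a)", node 2 is the then-branch,
# node 3 is the merge point that post-dominates node 1.
print(dynamic_control_deps([1, 2, 3], {1}, {1: 3}))
```

Each non-None entry corresponds to one edge of a dynamic control-dependence graph, pointing from an executed node to the specific branch instance that decided whether it ran.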
84	Modeling Intel® Cilk™ Plus Programs with Unified Modeling Languages Ata-Ul-Nasar, Mansoor January 2015 (has links) Multi-core processors have recently become very popular in computer systems. They allow multiple threads to be executed simultaneously. The advantage of multi-core processors is realized by parallelizing code so that the work is spread across the hardware. This can be done with a parallel environment called Intel Cilk Plus, which grew out of the Cilk project at M.I.T. and is designed to provide an easy and well-structured approach to parallel programming. Intel Cilk Plus is an extension of the C and C++ programming languages that expresses data parallelism. Compared with other languages in this field, the extension is helpful and easy to use. Its features include keywords, reducers, and array notations. In general, this article describes Intel Cilk Plus and its features. In addition, Unified Modelling Language activity diagrams are used for graphical modelling of Intel Cilk Plus programs: they describe the process of a system, capture its dynamic behaviour, and represent the flow from one activity to another using control flow. Finally, Intel Cilk Plus keywords and UML diagram tools are evaluated, and a comparison of different UML modelling tools is provided. Parallel Programming Intel Cilk Plus Unified Modelling Languages Activity Models Computer Sciences Datavetenskap (datalogi) Computer and Information Sciences Data- och informationsvetenskap Software Engineering Programvaruteknik
85	PRACTICAL CONFIDENTIALITY-PRESERVING DATA ANALYTICS IN UNTRUSTED CLOUDS Savvas Savvides (9113975) 27 July 2020 (has links) Cloud computing offers a cost-efficient data analytics platform. This is enabled by constant innovations in tools and technologies for analyzing large volumes of data through distributed batch processing systems and real-time data through distributed stream processing systems. However, due to the sensitive nature of data, many organizations are reluctant to analyze their data in public clouds. To address this stalemate, both software-based and hardware-based solutions have been proposed, yet all have substantial limitations in terms of efficiency, expressiveness, and security. In this thesis, we present solutions that enable practical and expressive confidentiality-preserving batch and stream-based analytics. We achieve this by performing computations over encrypted data using Partially Homomorphic Encryption (PHE) and Property-Preserving Encryption (PPE) in novel ways, and by utilizing remote or Trusted Execution Environment (TEE) based trusted services where needed. We introduce a set of extensions and optimizations to PHE and PPE schemes and propose the novel abstraction of Secure Data Types (SDTs), which enables the application of PHE and PPE schemes in ways that improve performance and security. These abstractions are leveraged to enable a set of compilation techniques that make data analytics over encrypted data more practical. When PHE alone is not expressive enough to perform analytics over encrypted data, we use a novel planner engine to decide the most efficient way of utilizing client-side completion, remote re-encryption, or trusted hardware re-encryption based on Intel Software Guard eXtensions (SGX) to overcome the limitations of PHE. We also introduce two novel symmetric PHE schemes that allow arithmetic operations over encrypted data. Being symmetric, our schemes are more efficient than the state-of-the-art asymmetric PHE schemes without compromising the level of security or the range of homomorphic operations they support. We apply the aforementioned techniques in the context of batch data analytics and demonstrate the improvements over previous systems. Finally, we present techniques designed to enable the use of PHE and PPE in resource-constrained Internet of Things (IoT) devices and demonstrate the practicality of stream processing over encrypted data. Distributed Computing Computer System Security Data Encryption Cloud computing Distributed processing of data Applied Cryptography Encrypted Databases Trusted Execution Environments Intel SGX Homomorphic Encryption Confidentiality
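As a rough illustration of what a symmetric, additively homomorphic scheme in the sense of entry 85 looks like, the sketch below uses a textbook pad-and-add construction. It is a toy and is not one of the schemes introduced in the thesis: the modulus, the HMAC-based pad derivation, and all function names are assumptions of this example, and each message id must be used only once for the pad to hide the value.

```python
import hashlib
import hmac

MODULUS = 2**64  # plaintext and ciphertext space (an assumption of this sketch)


def pad(key: bytes, msg_id: int) -> int:
    """Derive a per-message pseudorandom pad from the secret key and an id."""
    digest = hmac.new(key, msg_id.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def encrypt(key: bytes, msg_id: int, value: int):
    """A ciphertext is (masked value, ids of the pads folded into it)."""
    return (value + pad(key, msg_id)) % MODULUS, [msg_id]


def add(c1, c2):
    """Add two ciphertexts without the key; the pads add up as well."""
    return (c1[0] + c2[0]) % MODULUS, c1[1] + c2[1]


def decrypt(key: bytes, ciphertext) -> int:
    """Remove every pad that was folded into the ciphertext."""
    masked, ids = ciphertext
    total_pad = sum(pad(key, i) for i in ids)
    return (masked - total_pad) % MODULUS


key = b"demo key, not for real use"
total = add(encrypt(key, 1, 1200), encrypt(key, 2, 34))
print(decrypt(key, total))  # prints 1234
```

The untrusted party only ever runs `add`, which uses modular addition and never touches the key. Because only cheap symmetric primitives are involved, this kind of construction avoids the large modular exponentiations typical of asymmetric schemes.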
86	Neuronové sítě pro klasifikaci typu a kvality průmyslových výrobků / Neural networks for visual classification and inspection of the industrial products Míček, Vojtěch January 2020 (has links) The aim of this master's thesis is to enable evaluation of the quality or type of a product in industrial applications using artificial neural networks, especially in applications where the classical machine-vision approach is too complicated. The resulting system is implemented on a specific hardware platform and then optimized for that platform to achieve the best performance.
87	Sledování osob v záznamu z dronu / Tracking People in Video Captured from a Drone Lukáč, Jakub January 2020 (has links) This thesis addresses the possibility of recording the position of people in footage from a drone camera and determining their location. The absolute position of a tracked person is derived relative to the position of the camera, that is, relative to the location of the drone, which is equipped with the appropriate sensors. After processing, the acquired data are plotted as the corresponding paths. The thesis further aims to make use of available solutions to the sub-problems: detecting people in an image, identifying individual people over time, determining the distance of an object from the camera, and processing the necessary sensor data. The examined methods are then used to design a solution that works on the stated problem in real time. The implementation uses the Intel NCS accelerator together with a Raspberry Pi mounted directly on the drone. The resulting system is able to generate output about the location of people in the camera's field of view and present it appropriately.
88	Sledování osob ve videu z dronu / Tracking People in Video Captured from a Drone Lukáč, Jakub January 2021 (has links) This thesis addresses the possibility of recording the position of people in footage from a drone camera and determining their location. The absolute position of a tracked person is derived relative to the position of the camera, that is, relative to the location of the drone, which is equipped with the appropriate sensors. After processing, the acquired data are plotted as the corresponding paths in a graph. The thesis further aims to make use of available solutions to the sub-problems: detecting people in an image, identifying individual people over time, determining the distance of an object from the camera, and processing the necessary sensor data. The examined methods are then used to design a solution that works on the stated problem in real time. The implementation uses the Intel NCS accelerator together with a Raspberry Pi mounted directly on the drone. The resulting system is able to generate output about the location of detected people in the camera's field of view and present it appropriately.
89	Optimalizace polohy propelerové turbíny v kašně / Optimization of the propeller turbine position in a pit Duda, Petr January 2014 (has links) The thesis contains basic information about propeller turbines. It deals with the correct placement of the turbine in the pit so as to ensure the highest possible performance. Part of the work is devoted to the all-weather resulting blade to blade channels and their impact on the room is filled with diffuser.
90	Bezsnímkové renderování / Frameless Rendering Najman, Pavel January 2012 (has links) The aim of this work is to create a simple raytracer with the IPP library that uses the frameless rendering technique. The first part of this work focuses on the raytracing method. The next part analyzes the frameless rendering technique and its adaptive version, with a focus on adaptive sampling. The third part describes the IPP library and the implementation of a simple raytracer using this library. The last part evaluates the speed and rendering quality of the implemented system.
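The core idea of frameless rendering mentioned in entry 90 can be sketched in a few lines. This is a toy under stated assumptions and not the implementation from the thesis: it does not use the IPP library, the `shade` function is a stand-in for tracing a ray, and the image size and sample counts are arbitrary.

```python
import random

WIDTH, HEIGHT = 8, 4


def shade(x: int, y: int, t: float) -> float:
    """Stand-in for tracing one ray: a bright column that moves right over time."""
    return 1.0 if x == int(t) % WIDTH else 0.0


def frameless_step(image, t, samples, rng):
    """Refresh `samples` randomly chosen pixels using the scene at time t.

    Pixels that are not picked keep their older value, so the displayed image
    mixes samples from several moments instead of showing one complete frame.
    """
    for _ in range(samples):
        x, y = rng.randrange(WIDTH), rng.randrange(HEIGHT)
        image[y][x] = shade(x, y, t)


rng = random.Random(0)
image = [[0.0] * WIDTH for _ in range(HEIGHT)]
for step in range(20):
    frameless_step(image, t=step * 0.25, samples=8, rng=rng)
```

An adaptive variant would replace the uniform random choice of pixels with one biased toward regions that changed recently; this sketch keeps the uniform choice for brevity.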
