141

Co-projeto de hardware e software de um escalonador de processos para arquiteturas multicore heterogêneas baseadas em computação reconfigurável / Hardware and software co-design of a process scheduler for heterogeneous multicore architectures based on reconfigurable computing

Maikon Adiles Fernandez Bueno 05 November 2013
Heterogeneous multiprocessor architectures aim to extract higher performance from running processes by assigning each process to the core best suited to its demands. Extracting that performance, however, depends on an efficient scheduling mechanism, one able to identify the demands of processes in real time and, based on them, to designate the most appropriate processor according to its resources. This work proposes and implements, as a proof of concept, a software/hardware scheduler model for heterogeneous multiprocessor architectures, applied to the Linux operating system and the SPARC Leon3 processor. Performance monitors were implemented inside the processors to identify the demands of processes in real time. Each process's demand is projected onto the other processors of the architecture, and a balancing step then distributes the processes among the processors so as to maximize total system performance by reducing the overall processing time of all processes. The Hungarian maximization algorithm used in the scheduler's balancing step was implemented in hardware, providing parallelism and higher performance in the algorithm's execution. The scheduler was validated through the parallel execution of several benchmarks, which showed reduced execution times compared with a scheduler without heterogeneity support.
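The balancing step can be illustrated with the classic Hungarian (assignment) algorithm. Below is a minimal software sketch of that step, assuming a matrix of projected per-core performance values such as the monitors described above might produce; the numbers and the use of scipy are our illustration, not the thesis's hardware implementation.

```python
# A minimal sketch (not the thesis code) of the balancing step: given a
# matrix of projected performance of each process on each core type, the
# Hungarian algorithm picks the assignment that maximizes total throughput.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical projected performance (e.g., instructions per cycle) of
# 4 processes on 4 heterogeneous cores, as hardware monitors might report.
projected_perf = np.array([
    [3.1, 1.2, 0.9, 1.0],   # process 0: strongly favors core 0
    [1.1, 2.8, 1.0, 1.2],   # process 1: favors core 1
    [0.8, 0.9, 1.7, 1.6],   # process 2
    [1.0, 1.1, 1.5, 2.2],   # process 3
])

# maximize=True turns the classic min-cost assignment into maximization,
# matching the scheduler's goal of maximizing total system performance.
procs, cores = linear_sum_assignment(projected_perf, maximize=True)
for p, c in zip(procs, cores):
    print(f"process {p} -> core {c} (projected {projected_perf[p, c]:.1f})")
```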
142

Analyzing Symbiosis on SMT Processors Using a Simple Co-scheduling Scheme

Lundmark, Elias, Persson, Chris January 2017
Simultaneous multithreading (SMT) allows for more efficient processor utilization by co-executing multiple threads on a single processing core, increasing system efficiency and throughput. Co-executing threads share the core's functional units, and if the threads use the same functional units, efficiency decreases: this is a scenario in which SMT cannot convert thread-level parallelism (TLP) into instruction-level parallelism (ILP). In previous work, de Blanche and Lundqvist propose a simple co-scheduling principle: co-scheduling multiple instances of the same job should be considered a bad co-schedule, since the instances are likely to use the same resources. In this thesis, we apply their principle to SMT processors, with the rationale that identical threads use the same functional units within a processing core. We demonstrate that disallowing jobs to co-execute with copies of themselves lets SMT convert TLP into ILP more often, and that this holds when jobs cannot exploit ILP by themselves. Intuitively, we also show that doing the opposite, slowing down the conversion to ILP, can alleviate stress on the memory system.
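The scheduling principle itself is easy to state in code. The following is a minimal sketch, under our own simplifying assumptions (jobs identified by program name, two hardware threads per core), of a pairing pass that refuses to co-schedule a job with an instance of the same program:

```python
# A minimal sketch (our illustration, not the authors' implementation) of the
# co-scheduling rule: never pair a job with an instance of the same program
# on one SMT core, since identical threads contend for the same functional units.
from collections import deque

def pair_for_smt(jobs):
    """jobs: list of program names; returns 2-way SMT pairings."""
    queue = deque(jobs)
    pairs = []
    while queue:
        first = queue.popleft()
        # find the first queued job that is a *different* program
        partner = next((j for j in queue if j != first), None)
        if partner is None:
            pairs.append((first, None))   # run alone rather than with a clone
        else:
            queue.remove(partner)
            pairs.append((first, partner))
    return pairs

print(pair_for_smt(["gcc", "gcc", "mcf", "mcf", "gcc"]))
# [('gcc', 'mcf'), ('gcc', 'mcf'), ('gcc', None)]
```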
143

Holographic Cross-connection for Optical Ising Machine Based on Multi-core Fiber Laser

Liu, Lichuan January 2017
A method of holographic cross-connection is proposed for an optical Ising machine. The designed optical Ising machine based on a multi-core fiber laser is introduced, covering the theory of computation, the history of optical computing, the concept of the Ising model, the significance of an optical Ising machine, and the method for realizing an Ising machine optically. The cross-connection is based on computer-generated holograms (CGHs) produced with the Gerchberg-Saxton algorithm; the coupling coefficient between two channels as well as the phase change are controlled by the CGHs. The design of the holograms is discussed. The holograms are displayed on a phase-only liquid crystal spatial light modulator (SLM) from HOLOEYE. The optical system needed for this project, such as the collimation and relay lenses, was designed in Zemax; the system was first evaluated in Zemax simulation and then constructed experimentally. The results show that we can control the amplitude and phase of the beam reinjected into the multi-core fiber. Further experiments are needed to confirm that the cross-coupling between channels can be controlled by displaying different holograms.
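The Gerchberg-Saxton algorithm referred to above is well documented; a minimal numpy sketch of the iteration, with a toy two-spot target standing in for two fiber channels, looks like this (the grid size and target pattern are our assumptions, not the thesis's parameters):

```python
# A minimal numpy sketch of the Gerchberg-Saxton loop used to compute a
# phase-only hologram: iterate between the SLM plane and the far field,
# keeping the phase and enforcing the known amplitudes in each plane.
import numpy as np

def gerchberg_saxton(source_amp, target_amp, iterations=50):
    """Return an SLM phase pattern whose far field approximates target_amp."""
    rng = np.random.default_rng(0)
    phase = rng.uniform(-np.pi, np.pi, source_amp.shape)   # random start
    for _ in range(iterations):
        far = np.fft.fft2(source_amp * np.exp(1j * phase))
        far = target_amp * np.exp(1j * np.angle(far))      # impose target amplitude
        near = np.fft.ifft2(far)
        phase = np.angle(near)                             # keep only the phase
    return phase

# Toy example: uniform illumination, target = two bright spots (two "channels").
n = 64
source = np.ones((n, n))
target = np.zeros((n, n)); target[16, 16] = target[48, 48] = 1.0
phi = gerchberg_saxton(source, target)
result = np.abs(np.fft.fft2(source * np.exp(1j * phi)))**2
print("energy fraction in the two spots:",
      (result[16, 16] + result[48, 48]) / result.sum())
```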
144

Porting a Real-Time Operating System to a Multicore Platform

Sjöström Thames, Sixten January 2012
This thesis is part of the European MANY project, whose goal is to provide developers with tools for developing software for multi- and many-core hardware platforms. It is the first thesis within MANY at Enea and aims to build a knowledge base about many-core software within Enea's student research group. Beyond building that knowledge base, part of the thesis is to port Enea's operating system OSE to Tilera's many-core processor TILEpro64 and to investigate the Tilera processor's memory hierarchy and interconnection network. The knowledge-base work was constrained to the shared memory model and operating systems for many-core, and was carried out by surveying prominent academic research on operating systems for many-core processors. The conclusion was that a shared memory model does not scale and that operating systems should be designed with scalability as one of their most important requirements. The thesis implemented the hardware abstraction layer required to execute a single-core version of OSE on the TILEpro architecture. This was done in three steps: the Tilera hardware and the OSE software platform were investigated, an existing OSE target port was chosen as the reference architecture, and finally the hardware-dependent parts of the reference software were modified. A foundation has been laid for future development.
145

Amplification fibrée multivoie avec décomposition spectrale pour la synthèse d’impulsions femtosecondes / Multichannel fiber amplification with spectral splitting for femtosecond pulse synthesis

Rigaud, Philippe 28 November 2014
Femtosecond (fs) pulses are used to produce athermal light-matter interactions of interest to industrial, medical, and scientific activities. Lasers producing higher peak power at higher repetition rates are required, and ytterbium-doped fiber sources are good candidates. However, the amplified pulse duration remains high (~300 fs) owing to spectral narrowing at high gain levels, which limits the attainable peak power. Amplification in an array of fiber amplifiers with spectral splitting, followed by fs pulse synthesis through coherent spectral combining, is proposed as a solution. Spectral components are amplified separately in parallel before the amplifier outputs are coherently recombined into a single beam. Managing the phase relations between the radiations from the different channels ensures reconstruction of the pulse after amplification. Different architectures are considered. After choosing and dimensioning one of them, we amplified and synthesized 280 fs pulses through 12 uncoupled cores of a multicore fiber, without stretcher/compressor devices. We demonstrated the power enhancement of this architecture compared with a single-channel amplifier, proportional to the square of the number of channels used. Compatibility of the setup with broadband amplification (≈ 40 nm) was demonstrated. As prospects, the attainable peak-power scaling is first developed, and the transposition of this amplification scheme to oscillators, in view of producing broadband, high-energy fs pulses, is then discussed.
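To make the gain-narrowing argument concrete, here is a small numerical sketch, entirely our own illustration with made-up Gaussian spectrum and gain shapes, showing why amplifying sub-bands in parallel can preserve a bandwidth that a single amplifier would shrink:

```python
# A toy model (our sketch, not the thesis model): a bell-shaped gain applied
# repeatedly to the whole spectrum narrows it, while amplifying 12 sub-bands
# separately, each seeing a gain centered on its own band, keeps roughly the
# full input bandwidth after recombination.
import numpy as np

nu = np.linspace(-60, 60, 2001)           # frequency offset, arbitrary units
spectrum = np.exp(-(nu / 20.0)**2)        # broadband input spectrum
def rms_width(s):
    return np.sqrt(np.sum(nu**2 * s) / np.sum(s))

# Single amplifier: ten passes of a Gaussian gain profile narrow the spectrum.
single = spectrum * np.exp(-(nu / 15.0)**2)**10
print("input width:            ", rms_width(spectrum))
print("single-amplifier width: ", rms_width(single))

# 12 sub-bands amplified in parallel, each near its own gain peak.
edges = np.linspace(nu.min(), nu.max(), 13)
combined = np.zeros_like(spectrum)
for lo, hi in zip(edges[:-1], edges[1:]):
    band = spectrum * ((nu >= lo) & (nu < hi))
    gain = np.exp(-((nu - (lo + hi) / 2) / 15.0)**2)  # gain centered per channel
    combined += band * gain**10
print("split-and-combined width:", rms_width(combined))
```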
146

Fast and flexible compilation techniques for effective speculative polyhedral parallelization / Techniques de compilation flexibles et rapides pour la parallelization polyédrique et spéculative

Martinez Caamaño, Juan Manuel 29 September 2016
In this thesis, we present our contributions to APOLLO, an automatic parallelization compiler that combines polyhedral optimization with thread-level speculation to optimize dynamic codes on the fly. Thanks to an online profiling phase and a speculative model of the target code's memory behavior, Apollo is able to select an optimization and to generate code based on it. During execution of the optimized code, Apollo constantly verifies the validity of the speculative model. The main contribution of this thesis is a code generation mechanism, now used inside Apollo, that can instantiate any polyhedral transformation at runtime without incurring a major time overhead. We call this mechanism Code-Bones; it provides significant performance benefits compared with other approaches.
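The speculative scheme can be pictured schematically: memory accesses are speculated to follow an affine function of the loop counter, and the optimized execution is validated against that prediction. The sketch below is our simplified rendering of this general idea, not Apollo's code generation mechanism; the one-dimensional linear model and the rollback granularity are our assumptions:

```python
# A minimal sketch of speculative parallelization with runtime validation:
# profile a few iterations, fit a linear model of accessed addresses, then
# verify each access against the prediction and roll back on mispeculation.
def run_speculative(addresses_of, n_iters):
    # Online profiling: observe the first two iterations.
    a0, a1 = addresses_of(0), addresses_of(1)
    base, stride = a0, a1 - a0          # speculated model: base + stride * i

    for i in range(n_iters):            # "optimized" execution with validation
        predicted = base + stride * i
        actual = addresses_of(i)
        if actual != predicted:         # speculation failed: roll back here
            return ("rollback at iteration", i)
    return ("speculation held for all", n_iters)

# A regular access pattern validates; an irregular one triggers rollback.
print(run_speculative(lambda i: 0x1000 + 8 * i, 100))
print(run_speculative(lambda i: 0x1000 + 8 * i + (i == 50), 100))
```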
147

Improved composability of software components through parallel hardware platforms for in-car multimedia systems

Knirsch, Andreas January 2015
Recent years have witnessed a significant change to vehicular user interfaces (UIs), the result of increased functionality triggered by the continuous proliferation of vehicular software and computer systems. The UI represents the integration point that must fulfil particular usability requirements despite the increased functionality. A concurrent trend is the replacement of federated systems by integrated architectures. The steadily rising number of interacting functional components and the increasing integration density imply a growing complexity that affects system development. This evolution raises demands for concepts that aid the composition of such complex and interactive embedded software systems, operated within safety-critical environments. This thesis explores the requirements related to composability of software components, using In-Car Multimedia (ICM) as its example, and proposes a novel software architecture that provides an integration path for next-generation ICM. The investigation begins with an examination of characteristics, existing frameworks, and applied practice regarding the development and composition of ICM systems. From this, constructive aspects are identified as potential means for improving the composability of independently developed software components that differ in criticality and in temporal and computational characteristics. The research examines the feasibility of partitioning software components by exploiting parallel hardware architectures. Experimental evaluations demonstrate the applicability of encapsulated scheduling domains, achieved through multiple complementary technologies that provide different levels of containment while featuring efficient communication to preserve adequate interoperability. Even with dedicated computational resources allocated to software components, certain resources are still shared and require concurrent access. Particular attention is paid to managing concurrent access to shared resources in a way that respects the software components' individual criticality and derived priority: a software-based resource arbiter is specified and evaluated to improve the system's determinism. Within the context of automotive interactive systems the UI is of vital importance, as it must conceal inherent complexity to minimise driver distraction. The architecture is therefore enhanced with a UI compositing infrastructure to facilitate a homogeneous and comprehensive look and feel despite the segregation of functionality. The core elements of the novel architecture are validated both individually and in combination through a proof-of-concept prototype. The proposed integral architecture supports the development, and in particular the integration, of mixed-critical interactive systems.
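A criticality-aware resource arbiter of the kind mentioned above can be pictured as a priority queue keyed by criticality. The sketch below is our illustration of the general idea, not the thesis's arbiter; the priority scale and component names are invented:

```python
# A minimal sketch of a software resource arbiter that grants a shared
# resource by component criticality, so a safety-relevant component is
# never starved by multimedia functionality.
import heapq
import itertools

class Arbiter:
    def __init__(self):
        self._waiting = []              # min-heap: (priority, seq, component)
        self._seq = itertools.count()   # tie-breaker keeps FIFO order per priority

    def request(self, component, priority):
        """Lower number = more critical (e.g., 0 = safety, 9 = infotainment)."""
        heapq.heappush(self._waiting, (priority, next(self._seq), component))

    def grant_next(self):
        if not self._waiting:
            return None
        _, _, component = heapq.heappop(self._waiting)
        return component

arb = Arbiter()
arb.request("media-browser", priority=7)
arb.request("rear-view-camera", priority=0)   # critical, arrived later
arb.request("navigation", priority=3)
print([arb.grant_next() for _ in range(3)])
# ['rear-view-camera', 'navigation', 'media-browser']
```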
148

Performance Analysis of kNN Query Processing on large datasets using CUDA & Pthreads : comparing between CPU & GPU

Kalakuntla, Preetham January 2017
Telecom companies run extensive analytics to provide consumers with better service and to stay competitive. These companies accumulate big data with the potential to provide inputs for business decisions, and query processing is one of the major tools for running analytics on that data. Traditional query processing techniques based on in-memory algorithms cannot cope with the large data volumes of telecom operators. The k-nearest-neighbour (kNN) technique is well suited to classification and regression on large datasets. Our research focuses on implementing kNN as a query processing algorithm and evaluating its performance on large datasets on a single core, on multiple cores, and on a GPU. This thesis presents an experimental implementation of kNN query processing on a single-core CPU, a multi-core CPU, and a GPU using Python, Pthreads, and CUDA, respectively. We considered different dataset sizes, dimensionalities, and values of k as inputs to evaluate performance. The experiments show that, across these input levels, the GPU performs better than the single-core CPU by a factor of 1.4 to 3 and better than the multi-core CPU by a factor of 5.8 to 16.
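For reference, the brute-force kNN computation that all three implementations parallelize can be sketched in a few lines of numpy; the dataset shape, labels, and value of k below are our own toy choices, not the thesis's experimental inputs:

```python
# A minimal single-core sketch (our illustration, not the thesis code) of
# brute-force kNN query processing: for each query point, rank all data
# points by distance and vote among the k nearest labels. The Pthreads and
# CUDA versions parallelize exactly this distance computation.
import numpy as np

def knn_classify(data, labels, queries, k=5):
    # Pairwise squared Euclidean distances, shape (n_queries, n_data).
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]          # indices of k nearest
    votes = labels[nearest]                          # labels of the neighbours
    # Majority vote per query.
    return np.array([np.bincount(v).argmax() for v in votes])

rng = np.random.default_rng(42)
data = rng.normal(size=(10_000, 8))                  # 10k points, 8 dimensions
labels = (data[:, 0] > 0).astype(int)                # toy two-class labels
queries = rng.normal(size=(100, 8))
print(knn_classify(data, labels, queries, k=5)[:10])
```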
149

SYSTEMS SUPPORT FOR DATA ANALYTICS BY EXPLOITING MODERN HARDWARE

Hongyu Miao (11751590) 03 December 2021 (has links)
A large volume of data is continuously being generated by data centers, humans, and the Internet of Things (IoT). To extract useful insights, this enormous amount of data must be processed in time, with high throughput, low latency, and high accuracy. To meet such performance demands, vendors are shipping a large body of new hardware, such as multi-core CPUs, 3D-stacked memory, embedded microcontrollers, and other accelerators.

However, traditional operating systems (OSes) and data analytics frameworks, the key layer bridging high-level data processing applications and low-level hardware, fail to deliver these requirements in the face of quickly evolving hardware and exploding data volumes. For instance, general-purpose OSes are unaware of the unique characteristics and demands of data processing applications. Data analytics engines for stream processing, e.g., Apache Spark and Beam, add more machines to deal with more data but leave every single machine underutilized, without fully exploiting the underlying hardware features, which leads to poor efficiency. Data analytics frameworks for machine learning inference on IoT devices cannot run neural networks that exceed SRAM size, which disqualifies many important use cases.

To bridge the gap between the performance demands of data analytics and the features of emerging hardware, this thesis explores runtime system designs for high-level data processing applications that exploit low-level modern hardware features. We study two important data analytics applications, real-time stream processing and on-device machine learning inference, on three important hardware platforms across the Cloud and the Edge: multicore CPUs, a hybrid memory system combining 3D-stacked memory with general DRAM, and embedded microcontrollers with limited resources.

To speed up and enable these two applications on the three platforms, this thesis contributes three related research projects. In StreamBox, we exploit the parallelism and memory hierarchy of modern multicore hardware on single machines for stream processing, achieving scalable and highly efficient performance. In StreamBox-HBM, we exploit hybrid memories to balance bandwidth and latency, achieving memory scalability and highly efficient performance. StreamBox and StreamBox-HBM both offer orders-of-magnitude performance improvements over the prior state of the art, opening up new applications with higher data processing needs. In SwapNN, we investigate a system solution that lets microcontrollers (MCUs) execute neural network (NN) inference out of core without losing accuracy, enabling new use cases and significantly expanding the scope of NN inference on tiny MCUs.

We report the system designs, implementations, and experimental results. Based on our experience building these systems, we provide general guidance on designing runtime systems across the hardware/software stack for a wider range of new applications on future hardware platforms.
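The out-of-core inference idea behind SwapNN can be sketched as layer-by-layer execution with only one layer's weights resident at a time. The code below is our reading of that idea in plain numpy, with a hypothetical three-layer MLP and a stand-in weight loader; it is not the SwapNN implementation, which targets resource-limited MCUs:

```python
# A minimal sketch of out-of-core NN inference: when the weights exceed
# on-chip memory, stream one layer's weights at a time from slower storage
# into a small working buffer, so peak memory is bounded by the largest layer.
import numpy as np

def load_layer_weights(layer_id):
    # Stand-in for reading a layer's weights from flash/external storage.
    rng = np.random.default_rng(layer_id)
    sizes = [(16, 32), (32, 32), (32, 4)]   # hypothetical 3-layer MLP
    return rng.normal(size=sizes[layer_id]) * 0.1

def infer_out_of_core(x, n_layers=3):
    for layer_id in range(n_layers):
        w = load_layer_weights(layer_id)    # only one layer resident at a time
        x = np.maximum(x @ w, 0.0)          # ReLU layer
        del w                               # weights evicted before the next layer
    return x

print(infer_out_of_core(np.ones(16)))
```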
150

Paralelní genetický algoritmus pro vícejádrové systémy / The Parallel Genetic Algorithm for Multicore Systems

Vrábel, Lukáš January 2010
The genetic algorithm is an optimization method aimed at efficiently finding solutions to various problems. It is based on the principle of evolution and the natural selection of the fittest individuals in nature. Since the method is computationally demanding, many ways of parallelizing it have been devised; however, most of these are, for historical reasons, based on supercomputers or large-scale computer systems. Modern developments in information technology are bringing ever cheaper and more powerful multicore systems to the personal computer market. This thesis deals with the design of new methods for parallelizing the genetic algorithm that aim to fully exploit the capabilities of precisely these systems. The methods are implemented in the C programming language using the OpenMP parallelization library. The implementation is then used for an experimental evaluation of various characteristics of each of the presented methods (speedup over the sequential version, dependence of the convergence of the resulting values on the degree of parallelization or on processor load, ...). The last part of the thesis presents comparisons of the measured values and the conclusions drawn from them. Possible improvements of the methods following from these conclusions are then discussed, as well as the possibility of collecting a larger set of characteristics for a more precise evaluation of the effectiveness of parallelizing genetic algorithms.
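One common shape for such a parallel genetic algorithm, evaluating the population's fitness across worker processes, can be sketched as follows. The thesis works in C with OpenMP; this Python/multiprocessing version only mirrors the overall structure and uses a toy one-max fitness, so every specific here is our assumption:

```python
# A minimal sketch of a multicore GA: fitness evaluation, typically the
# dominant cost, is farmed out to a pool of worker processes, while
# selection, crossover, and mutation stay sequential.
import random
from multiprocessing import Pool

def fitness(individual):                 # toy one-max: count the 1-bits
    return sum(individual)

def evolve(pop, generations=50, workers=4):
    with Pool(workers) as pool:
        for _ in range(generations):
            scores = pool.map(fitness, pop)          # parallel evaluation
            ranked = [ind for _, ind in
                      sorted(zip(scores, pop), reverse=True)]
            parents = ranked[:len(pop) // 2]         # truncation selection
            children = []
            while len(children) < len(pop) - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, len(a))
                child = a[:cut] + b[cut:]            # one-point crossover
                i = random.randrange(len(child))
                child[i] ^= 1                        # bit-flip mutation
                children.append(child)
            pop = parents + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    population = [[random.randint(0, 1) for _ in range(32)] for _ in range(40)]
    print("best fitness:", fitness(evolve(population)))
```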
