21 |
Modèles de programmation et d'exécution pour les architectures parallèles et hybrides. Applications à des codes de simulation pour la physique. / Programming models and execution models for parallel and hybrid architectures. Application to physics simulations.
Ospici, Matthieu 03 July 2013
This thesis focuses on large parallel hybrid architectures, that is, parallel architectures that combine general-purpose processors (e.g. Intel Xeon) with accelerators (e.g. Nvidia GPUs). Exploiting these hybrid clusters efficiently for high-performance computing is central to our work. The heterogeneity of computing resources in hybrid clusters raises many issues when running large existing scientific applications on them efficiently. Two main issues are addressed. The first concerns the sharing of accelerators among MPI applications; the second concerns the programming and concurrent execution of code across CPUs and accelerators. Hybrid architectures are very heterogeneous: the ratio between the number of accelerators and the number of CPU cores varies widely from one cluster to another. We therefore first propose a notion of accelerator virtualization, which gives applications the illusion that they can use a number of accelerators that is independent of the number of physical accelerators in the hardware. An execution model based on accelerator sharing is thus put in place, exposing a more homogeneous hybrid architecture to applications. We also propose extensions to programming models based on MPI + threads to address concurrent execution between CPUs and accelerators. To that end, we propose a model based on two types of threads, CPU threads and accelerator threads, which makes it possible to set up hybrid computations that exploit CPUs and accelerators simultaneously. In both cases, deploying and executing code on the hybrid resources is critical. We therefore propose two software libraries, S_GPU 1 and S_GPU 2, whose role is to deploy and execute computations on the hybrid hardware: S_GPU 1 handles virtualization, and S_GPU 2 handles concurrent use of CPUs and accelerators. To observe the deployment and execution of code on complex GPU-based architectures, we integrated tracing mechanisms that make it possible to analyze the execution of programs using our libraries. Our proposals were validated on two large scientific applications, BigDFT (ab initio simulation) and SPECFEM3D (seismic wave simulation), which we adapted to use S_GPU 1 (BigDFT) and S_GPU 2 (SPECFEM3D).
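As a rough illustration of the accelerator virtualization idea in this abstract, the Python sketch below maps a number of virtual accelerators onto a smaller set of physical ones. It is not the S_GPU 1 API: the function name `map_virtual_to_physical` and the round-robin sharing policy are assumptions made purely for illustration.

```python
from __future__ import annotations


def map_virtual_to_physical(num_virtual: int, num_physical: int) -> dict[int, int]:
    """Assign each virtual accelerator id to a physical accelerator id.

    Hypothetical round-robin policy: virtual id v shares physical device v % num_physical.
    """
    if num_physical <= 0:
        raise ValueError("at least one physical accelerator is required")
    return {v: v % num_physical for v in range(num_virtual)}


# Example: 8 MPI processes each see their own virtual accelerator,
# while the node only has 2 physical GPUs to share among them.
mapping = map_virtual_to_physical(num_virtual=8, num_physical=2)
```

With such a mapping, the application code can be written against one accelerator per process regardless of how many devices a given cluster node actually has.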
|
22 |
Performance Analysis of kNN Query Processing on large datasets using CUDA & Pthreads : comparing between CPU & GPU
Kalakuntla, Preetham January 2017
Telecom companies perform extensive analytics to provide consumers with better service and to stay competitive. These companies accumulate large volumes of specialized data that can provide valuable business input. Query processing is one of the major tools for running analytics on this data. Traditional query processing techniques based on in-memory algorithms cannot cope with the large amounts of data held by telecom operators. The k-nearest-neighbour (kNN) technique is well suited to classification and regression on large datasets. Our research focuses on implementing kNN as a query processing algorithm and evaluating its performance on large datasets using a single CPU core, multiple CPU cores, and a GPU. This thesis presents an experimental implementation of kNN query processing on a single-core CPU, a multi-core CPU, and a GPU, using Python, Pthreads, and CUDA respectively. We considered different dataset sizes, dimensions, and values of k as inputs to evaluate performance. The experiments show that the GPU performs better than the single-core CPU by a factor of 1.4 to 3, and better than the multi-core CPU by a factor of 5.8 to 16, for the different input levels.
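For readers unfamiliar with kNN queries, a minimal single-core sketch in Python is given here. It is a brute-force illustration only, not the thesis's implementation; the function name `knn_query` and the sample points are assumptions.

```python
import heapq
import math


def knn_query(points, query, k):
    """Return the k points closest to `query` by Euclidean distance, nearest first."""
    # Brute force: compute the distance to every point and keep the k smallest.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Hypothetical 2-D dataset and query point.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
nearest = knn_query(points, query=(1.2, 0.9), k=2)
```

Each distance computation is independent of the others, which is why this kind of query lends itself to multi-core and GPU parallelization.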
|
23 |
Adaptation du calcul de la Transformée de Fourier Rapide sur une architecture mixte CPU/GPU intégrée / Adaptation of the Fast Fourier Transform processing on hybride integrated CPU/GPU architecture
Bergach, Mohamed Amine 02 October 2015
Intel Core multicore architectures (IvyBridge, Haswell, ...) contain both general-purpose CPU cores (4) and dedicated GPU cores embedded on the same chip (16 and 40 respectively). As part of the activity of Kontron (the company partially funding this CIFRE scholarship), an important objective is to efficiently compute arrays and sequences of fast Fourier transforms (FFTs) on this architecture, such as those found in radar applications. While native (but proprietary) libraries exist from Intel for the CPU, nothing comparable is currently available for the GPU part. The aim of the thesis was therefore to define an efficient placement of FFT modules by studying, at the theoretical level, the optimal way to group the computation stages of such an FFT according to data locality on a single compute core. This a priori choice is expected to yield efficient processing by matching the size of the available memory to the size of the data required. The multiple cores can then be used to compute several FFTs in parallel without interference (except for possible bus contention between the CPU and the GPU). We obtained significant results both for the implementation of a 1024-point FFT on a SIMD CPU core, written in C, and for the implementation of an FFT of the same size on a SIMT GPU core, written in OpenCL. In addition, our results make it possible to define rules for automatically synthesizing such solutions, based solely on the size of the FFT (more precisely, its number of stages) and the size of the local memory of a given compute core. The performance obtained is better than that of the native Intel library for the CPU, and shows a significant gain in consumption on the GPU. All these points are detailed in the thesis document. These results are expected to be put to use within Kontron.
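To make the relationship between FFT size, number of stages, and local memory more concrete, the Python sketch below splits the radix-2 stages of an FFT into groups that each fit in local memory. It relies on the standard observation that a block of 2^m points can carry out m consecutive radix-2 stages locally. The function name `fft_stage_groups` and the greedy grouping rule are assumptions, not the synthesis rules derived in the thesis.

```python
def fft_stage_groups(n_points: int, local_points: int) -> list:
    """Split the log2(n_points) radix-2 stages into groups that fit in local memory.

    n_points must be a power of two; local_points is the number of points
    that fit in the local memory of one compute core (hypothetical parameter).
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    if local_points < 2:
        raise ValueError("local memory must hold at least 2 points")
    stages = n_points.bit_length() - 1         # log2(n_points)
    per_group = local_points.bit_length() - 1  # floor(log2(local_points))
    groups = []
    while stages > 0:
        take = min(per_group, stages)          # greedy: fill local memory first
        groups.append(take)
        stages -= take
    return groups


# A 1024-point FFT has 10 stages; with room for 64 points locally,
# they are processed as one group of 6 stages and one group of 4.
groups = fft_stage_groups(1024, 64)
```

Fewer groups mean fewer round trips to global memory, which is the locality effect the abstract refers to.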
|
24 |
Ghoul: A cache-friendly programming language
Temmel, Adam January 2020
Performance has historically always been important in computing, and processor developers have therefore devised several methods to squeeze more processing power out of the processor. One of these is the CPU cache, whose responsibility is to hold data the processor expects to need soon. Utilizing the cache well means that the processor can process data at a much higher rate, which has a direct impact on performance. It is therefore in the developer's best interest to write code that utilizes the cache to its full extent. This is not always an easy task, however, as the programming patterns and style the developer may need to adopt can come off as cumbersome. This study explores the possibility of merging cache-friendly programming concepts with a developer-friendly syntax, resulting in a language that is readable, writable, and efficient with regard to the processor cache. To accomplish this, studies of memory access patterns, existing programming languages, and compiler design were performed. The end product is a language called Ghoul, which implements cache-friendly concepts at the syntactic level, complete with a working compiler. Output from this compiler was then benchmarked to determine whether the concepts introduced had a measurable impact on the performance of programs written in Ghoul. The tests showed that the aforementioned syntactic concepts directly influence the speed at which data can be processed.
|
25 |
COMPARISON OF BUDGET BORROWING AND BUDGET ADAPTATION IN HIERARCHICAL SCHEDULING FRAMEWORK
Wenkai, Wang January 2016
System virtualization technology is widely used in computing nowadays. In the embedded domain, it is used as a solution for resource sharing among independent applications. One application area is real-time embedded systems with timing constraints. The multi-level adaptive hierarchical scheduling (AdHierSched) framework is a virtualized real-time framework that runs on the Linux operating system. This virtualized framework can adapt CPU partition sizes to their needs by monitoring their demand at run time, which yields more appropriate processor assignment. However, the performance of the virtualized framework when a budget borrowing mechanism is enabled is still unknown. To this end, in this thesis we explore a new direction for adapting CPU partitions. We design and implement a budget borrowing mechanism for dynamic adaptation of resource parameters in the AdHierSched framework. Extensive simulations are performed to study and compare different adaptation mechanisms with our approach. From the experimental results, we conclude that when the framework works with only the budget borrowing controller, the results are not as good as when running only a budget controller in the AdHierSched framework. However, when both controllers run at the same time, the experimental results are good enough. We also analyze the overhead of the framework at the end of the evaluation. Finally, we conclude the thesis by presenting possible future work.
|
26 |
On the Prevention of Cache-Based Side-Channel Attacks in a Cloud Environment
Godfrey, Michael 26 September 2013
As Cloud services become more commonplace, recent works have uncovered vulnerabilities unique to such systems. Specifically, the paradigm promotes a risk of information leakage across virtual machine isolation via side-channels. Unlike conventional computing, the infrastructure supporting a Cloud environment allows mutually distrusting clients simultaneous access to the underlying hardware, a seldom met requirement for a side-channel attack. This thesis investigates the current state of side-channel vulnerabilities involving the CPU cache, and identifies the shortcomings of traditional defenses in a Cloud environment. It explores why solutions to non-Cloud cache-based side-channels cease to work in Cloud environments, and describes new mitigation techniques applicable for Cloud security. Specifically, it separates canonical cache-based side-channel attacks into two categories, Sequential and Parallel attacks, based on their implementation and devises a unique mitigation technique for each. Applying these solutions to a canonical Cloud environment, this thesis demonstrates the validity of these Cloud-specific, cache-based side-channel mitigation techniques. Furthermore, it shows that they can be implemented, together, as a server-side approach to improve security without inconveniencing the client. Finally, it conducts a comparison of our solutions to the current state-of-the-art. / Thesis (Master, Computing) -- Queen's University, 2013-09-25 18:03:47.737
|
27 |
Métodos para caracterização de desempenho de CPUs industriais
Nacul, Andre Costi January 2002
Performance characterization is a fundamental activity in industrial control. Because most applications in this area are real-time applications, performance characterization becomes even more necessary and important. However, there is currently no established methodology for carrying out this characterization. There is not even a set of parameters that should be evaluated in control equipment used in industrial processes. To address this gap, this work proposes metrics and workloads for evaluating the performance of control systems based on PLCs and industrial CPUs. The performance evaluation process is discussed at every stage, from the study of the application to the execution of the performance characterization steps. To illustrate the application of the proposed metrics, techniques, and procedures, three industrial CPUs are evaluated, and the results are presented at the end of the work. This is expected to contribute to the establishment of a standardized methodology for evaluating the performance of industrial control equipment.
|
28 |
Measuring and Analysing Execution Time in an Automotive Real-Time Application / Exekveringstid i ett Realtidssystem för Fordon
Liljeroth, Henrik January 2009
Autoliv has developed the Night Vision system, a safety system for use in cars to improve the driver's situational awareness in night conditions. It is a real-time system that can detect pedestrians in the traffic environment and issue warnings when there is a risk of collision. The timing behaviour of programs running on real-time systems is vital information when developing and optimising both hardware and software. As part of further developing the Night Vision system, Autoliv wanted to examine the detailed timing behaviour of a specific part of the Night Vision algorithm, namely the Tracking module, which tracks detected pedestrians. In parallel, they also wanted a reliable method for obtaining timing data that would work for other parts of the system as well, or even for other applications.

A preliminary study was conducted to determine the most suitable method of obtaining the desired timing data. This resulted in a measurement-based approach using software profiling, in which the Tracking module was measured using various input data. The measurements were performed on simulated hardware using both a cycle-accurate simulator and measurement tools from the system CPU manufacturer, as well as tools implemented specifically to handle input and output data.

The measurements produced large amounts of data, which were used to compile performance statistics. Using different scenarios in the input data, we were able to obtain timing characteristics for several typical situations the system may encounter during operation. By manipulating the input data we were also able to observe general behaviour and achieve artificially high execution times, which serve as indications of how the system responds to irregular and unexpected input data.

The method used for collecting timing information was well suited to this particular project. It made it possible to analyse behaviour in a better way than other, more theoretical, approaches would have. The method is also easily adaptable to other parts of the Night Vision system, or to other systems, with only minor adjustments to the measurement environment and tools.
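The measurement-based approach described in this abstract can be sketched in Python as a small harness that times a function on several named input scenarios and compiles simple statistics. This is only an illustration on host wall-clock time; the thesis measured on simulated hardware with a cycle-accurate simulator. The names `measure` and `track`, and the two scenarios, are hypothetical.

```python
import statistics
import time


def measure(func, inputs, repeats=5):
    """Time func on each named input; return min/mean/max per input in nanoseconds."""
    results = {}
    for name, data in inputs.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter_ns()
            func(data)
            samples.append(time.perf_counter_ns() - start)
        results[name] = {
            "min": min(samples),
            "mean": statistics.mean(samples),
            "max": max(samples),
        }
    return results


def track(detections):
    """Hypothetical stand-in for the module under test."""
    return sorted(detections)


# One typical scenario and one manipulated input meant to provoke long execution times.
stats = measure(track, {
    "typical": list(range(100)),
    "stress": list(range(10000, 0, -1)),
})
```

Keeping the scenarios in a dictionary makes it straightforward to reuse the same harness for other modules by swapping the function and the input data.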
|
29 |
A Domain Specific DSP Processor / En domänspecifik DSP-processor
Tell, Eric January 2001
This thesis describes the design of a domain specific DSP processor. The thesis is divided into two parts. The first part gives some theoretical background, describes the different steps of the design process (both for DSP processors in general and for this project) and motivates the design decisions made for this processor.

The second part is a nearly complete design specification.

The intended use of the processor is as a platform for hardware acceleration units. Support for this has however not yet been implemented.
|
30 |
Utveckling av CPU-hållare och Kabelkanal för sitta/stå skrivbord
Olsson, Charlotte, Henrysson, Louise January 2010
This degree project was carried out in collaboration with ROL Ergo in Hovslätt, which manufactures height-adjustable table frames. ROL Ergo assembles and sells only the table frame, not the tabletop. The problem is that most CPU holders and cable trays are attached to the tabletop. A CPU holder supports the computer's case, and a cable tray is used to gather the cables. Using a CPU holder and a cable tray gives the office workstation a tidier and more welcoming appearance. The goal of the work is to produce a concept with a CPU holder and a cable tray that can be combined with the various table frames, along with CAD drawings and a competitor analysis. The chapter Theoretical Framework describes the methods used in the product development process. The chapter Implementation applies these methods to reach a result. The implementation part is divided into Project Planning, Definition, Concept Development, and Design, following the product development process. The result consists of a CPU holder and a cable tray adapted to ROL Ergo's table frames. The holder consists of a bent tube and two rods. The cable tray consists of two hooks and a tray. The products are designed to fit existing holes in the frame. The open design of the CPU holder makes it user-friendly, since cables and the like are easy to reach. The design of the cable tray makes it interesting and exciting, and since it can be unhooked and lowered, it too is user-friendly.
|