1 |
Performance Analysis of kNN on large datasets using CUDA & Pthreads : Comparing between CPU & GPU
Kankatala, Sriram, January 2015
Many organizations maintain large databases that grow rapidly from day to day and need regular maintenance. Content-based searches are similarity searches based on features extracted from multimedia data. In applications such as multimedia content retrieval, data mining, and pattern recognition, nearest neighbor search over multidimensional data is a challenging task. The important factors in k-nearest-neighbor (kNN) search are search speed and accuracy. Implementing kNN on GPUs has been an active research topic for the last few years, with a focus on improving performance. Our research started from these considerations and identified a gap in this area. This master's thesis demonstrates effective and efficient parallelism on multi-core CPUs and on a GPU and compares their performance with that of a single CPU core. It presents an experimental implementation of kNN on a single-core CPU, a multi-core CPU, and a GPU using C, Pthreads, and CUDA respectively. We considered different input levels (size, dimensions) to evaluate performance. The experiments show that, for kNN, the GPU outperforms the single-core CPU by a factor of approximately 5.8 to 16 and the multi-core CPU by a factor of approximately 1.2 to 3 across the input levels.
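The abstract does not say which kNN variant was implemented. As a minimal sketch, assuming an exhaustive (brute-force) search with Euclidean distance, the single-core baseline could look like the following; the function name and the tiny dataset are made up for illustration.

```python
import heapq
import math


def knn_brute_force(points, query, k):
    """Return the k (distance, index) pairs closest to `query`, nearest first."""
    distances = (
        (math.dist(p, query), i)  # Euclidean distance to each reference point
        for i, p in enumerate(points)
    )
    return heapq.nsmallest(k, distances)


# Tiny illustrative dataset (hypothetical, not from the thesis).
reference = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
neighbours = knn_brute_force(reference, (0.0, 0.1), k=2)
```

Every distance computation is independent of the others, which is why this kind of search is a natural candidate for Pthreads and CUDA parallelization.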
|
2 |
Optimum Microarchitectures for Neuromorphic Algorithms
Wang, Shu, January 2011
No description available.
|
3 |
Detecção de ataques por controle de fluxo de execução em sistemas embarcados : uma abordagem em hardware / Detection of execution control-flow attacks in embedded systems : a hardware approach
Porcher, Bruno Casagrande, 31 March 2017
Submitted by PPG Engenharia Elétrica (engenharia.pg.eletrica@pucrs.br) on 2017-10-31T20:06:42Z. Approved for entry into archive by Caroline Xavier (caroline.xavier@pucrs.br) on 2017-11-16T15:58:29Z (GMT). Made available in DSpace on 2017-11-16T16:03:02Z (GMT).
No. of bitstreams: 1. Dissertação_Bruno_Porcher.pdf: 1303682 bytes, checksum: 7373a048257a3ff06aef91a7ce86e8d8 (MD5)
Previous issue date: 2017-03-31 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
Computer systems are present in the most diverse environments in which we live, and this rapid expansion exposes the population to many types of vulnerabilities. Errors in critical systems may result in financial loss, data theft, environmental damage, or may even endanger human life. This work was developed to make it more difficult for malicious users to take control of computer systems. It proposes a hardware-based approach to detect attacks that cause any change to a program's execution flow, without requiring any modification to, or even prior knowledge of, the application's source code. The purpose of this work is thus to ensure the reliability of a critical system by guaranteeing that the software running on the processor is identical to the software developed by the programmer. To do so, checkpoints in the program verify the integrity of the system during its execution. The proposed technique is implemented by software, which is responsible for the prior identification of the basic blocks from the critical system's executable file, and by dedicated hardware, named Watchdog, instantiated together with the processor of the critical system. The technique was validated by functional simulations and evaluated using code sections capable of exposing vulnerabilities from the Common Vulnerabilities and Exposures database (CVE, 2017). Both validation and evaluation were carried out on a soft-core version of the LEON3 processor. The experimental results demonstrate the proposed technique's efficiency in detecting corruption in code snippets and the execution of code snippets that do not belong to the original program. Finally, an analysis of the main overheads added by the technique is performed.
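The abstract does not specify what the checkpoints verify or how the Watchdog stores its reference data. The sketch below is only a software model under stated assumptions: an offline step records, for each basic block, a digest of its bytes and its allowed successors, and an online step checks an execution trace against that table. All names, the SHA-256 digest, and the three-block program are hypothetical and are not taken from the thesis.

```python
import hashlib


def build_block_table(blocks):
    """Offline step: map each basic-block address to (digest, allowed successors)."""
    return {
        addr: (hashlib.sha256(code).hexdigest(), frozenset(successors))
        for addr, (code, successors) in blocks.items()
    }


def check_trace(table, memory, trace):
    """Online step: return the address of the first violation in `trace`, or None."""
    previous = None
    for addr in trace:
        if addr not in table:
            return addr  # code that does not belong to the original program
        digest, _ = table[addr]
        if hashlib.sha256(memory[addr]).hexdigest() != digest:
            return addr  # block contents were modified
        if previous is not None and addr not in table[previous][1]:
            return addr  # transfer not allowed by the recorded control flow
        previous = addr
    return None


# Hypothetical program: address -> (block bytes, allowed successor addresses).
program = {
    0x100: (b"\x01\x02", {0x110, 0x120}),
    0x110: (b"\x03", {0x120}),
    0x120: (b"\x04", set()),
}
table = build_block_table(program)
memory = {addr: code for addr, (code, _) in program.items()}
```

The three checks correspond to the two detection results the abstract reports (corrupted code snippets and execution of foreign code), plus an illegal-transfer check that is an assumption of this sketch.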
|
4 |
Performance Analysis of kNN Query Processing on large datasets using CUDA & Pthreads : comparing between CPU & GPU
Kalakuntla, Preetham, January 2017
Telecom companies perform extensive analytics to provide consumers with better service and to stay competitive. These companies accumulate big data that has the potential to provide inputs for business decisions. Query processing is one of the major tools for running analytics on this data. Traditional query processing techniques that rely on in-memory algorithms cannot cope with the large volumes of data held by telecom operators. The k-nearest-neighbour (kNN) technique is the most suitable method for classification and regression on large datasets. Our research focuses on implementing kNN as a query processing algorithm and evaluating its performance on large datasets using a single core, multiple cores, and a GPU. This thesis presents an experimental implementation of kNN query processing on a single-core CPU, a multi-core CPU, and a GPU using Python, Pthreads, and CUDA respectively. We considered different levels of size, dimension, and k as inputs to evaluate the performance. The experiments show that the GPU performs better than the single-core CPU by a factor of 1.4 to 3 and better than the multi-core CPU by a factor of 5.8 to 16 across the input levels.
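How the work was divided among cores is not described in the abstract. One common decomposition, shown here purely as an assumption, splits the dataset into chunks, finds the k nearest points in each chunk independently, and merges the partial results. Python threads stand in for Pthreads; in CPython they show the structure of the decomposition only and should not be expected to give a speed-up for this pure-Python workload.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def local_top_k(chunk, offset, query, k):
    """k nearest (distance, global_index) pairs within one chunk."""
    return heapq.nsmallest(
        k, ((math.dist(p, query), offset + i) for i, p in enumerate(chunk))
    )


def parallel_knn(points, query, k, workers=4):
    """Split the data, search each part concurrently, then merge the partial results."""
    size = max(1, math.ceil(len(points) / workers))
    chunks = [(points[s:s + size], s) for s in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda c: local_top_k(c[0], c[1], query, k), chunks)
        candidates = [pair for part in partial for pair in part]
    return heapq.nsmallest(k, candidates)


# Hypothetical 2-D dataset and query point.
data = [(float(i), float(i % 7)) for i in range(100)]
result = parallel_knn(data, (10.2, 3.1), k=3)
```

The merge step is correct because every global top-k point must also be in the top k of its own chunk.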
|
5 |
Improving Last-Level Cache Performance in Single and Multi-Core Processors
Manikanth, R, January 2013
With off-chip memory accesses taking hundreds of processor cycles, getting data to the processor in a timely fashion remains one of the key performance bottlenecks in current systems. With increasing core counts, this problem is aggravated and memory access latency becomes even more critical in multi-core systems. Thus the Last Level Cache (LLC) is of particular importance, as any miss experienced at the LLC translates into a costly off-chip memory access. A combination of on-chip caches and prefetchers is used to hide the off-chip memory access latency. While a hierarchy of caches focuses on exploiting locality by retaining useful data, prefetchers complement them by initiating data accesses early for blocks that are likely to be accessed in the future. In the first half of this thesis, we focus on improving the performance of the LLC in single-core processors by focusing on prefetchers. In the case of multi-cores, the LLC is shared across many cores and therefore by the many programs running on them. Thus, in the second half of this thesis, we focus on novel and efficient management mechanisms for the shared LLC to improve the performance of programs running on the various cores.
Prefetchers observe a training stream of primary misses in the cache and rely on the regularity present in them to predict and avoid future misses. We quantify the regularity present in the training stream using the information-theoretic measure of entropy and study the impact on regularity of extending the training stream to include secondary misses and accesses. We also consider triggering prefetches on secondary misses. We find that the extended histories are more regular in general and that it is beneficial to trigger prefetches on secondary misses as well. However, the best design choice varies per benchmark and per prefetcher, necessitating a dynamic approach to identify the best prefetcher configuration. We propose an inexpensive Bloom-filter-based dynamic mechanism to identify the best-performing prefetch design point at run time. The adaptive scheme improves performance in terms of Instructions Per Cycle (IPC) by 4.6% on average over a baseline prefetcher. This performance improvement is achieved along with a reduction in memory traffic requirements.
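The abstract does not state which symbols the entropy is computed over. As a rough illustration, assuming the symbols are the deltas between consecutive miss addresses, Shannon entropy separates a regular stream from an irregular one; the function name and both address streams are hypothetical.

```python
import math
from collections import Counter


def stream_entropy(addresses):
    """Shannon entropy (bits) of the deltas between consecutive miss addresses."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if not deltas:
        return 0.0
    counts = Counter(deltas)
    total = len(deltas)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


strided = [0x1000 + 64 * i for i in range(16)]        # constant 64-byte stride
irregular = [0x1000, 0x9040, 0x1080, 0x5000, 0x20C0]  # no repeating delta
```

A lower value means fewer distinct patterns for a prefetcher to learn, so the strided stream scores lower than the irregular one.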
It is well known that aggressive prefetching can harm performance due to increased contention for memory bandwidth and cache pollution. Prefetchers treat all loads as equal and try to eliminate as many misses as possible, while certain (static) load instructions are known to be more performance critical. As our second contribution, we propose Focused Prefetching, a generic mechanism to introduce performance awareness in prefetching. We identify that a small number of static loads, referred to as Loads Incurring Majority of Commit Stalls (LIMCOS), account for a majority of the commit stalls in processors. We propose a simple history-based classifier to identify LIMCOS with high accuracy. We use the classifier to focus the prefetching efforts on LIMCOS. This is achieved in a generic prefetcher-agnostic fashion by filtering the history used by the prefetchers. Focused Prefetching improves performance in terms of IPC by 9.8% for a set of memory-intensive SPEC2000 workloads. This performance gain is achieved along with a reduction in memory traffic and an improvement in prefetch accuracy.
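The thesis identifies LIMCOS with a history-based hardware classifier whose details are not given in the abstract. The sketch below is not that classifier. It is an offline stand-in that shows the underlying idea: rank static loads by the commit stalls attributed to them and keep the few that cover a majority. The 50% threshold and the profile data are assumptions.

```python
from collections import Counter


def select_limcos(stall_events, coverage=0.5):
    """Pick load PCs in descending stall order until they exceed `coverage` of all stalls."""
    stalls = Counter()
    for pc, cycles in stall_events:
        stalls[pc] += cycles
    total = sum(stalls.values())
    selected, covered = [], 0
    for pc, cycles in stalls.most_common():
        if covered > coverage * total:
            break  # the loads chosen so far already account for a majority
        selected.append(pc)
        covered += cycles
    return selected


# Hypothetical profile: (load PC, commit-stall cycles attributed to it).
events = [(0x400A10, 700), (0x400B24, 150), (0x400A10, 500),
          (0x400C08, 100), (0x400D3C, 50)]
```

A prefetcher-agnostic filter would then pass only misses from the selected PCs into the prefetcher's training history.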
In the second part of the thesis, we focus on improving the performance of shared caches in multi-core systems. Last level caches are affected by a lack of temporal locality in the access stream as the locality gets filtered out by caches above it. In the case of multi-cores, the interleaving of accesses from the various cores further adds to the problem. To overcome this, we propose a PC-Centric Next-Use Aware Cache Organization (NUcache) for shared caches in multi-cores, with an ability to retain a subset of cache blocks longer. This is achieved by a logical partitioning of the associative ways of a cache set into Main Ways and Deli Ways. While all the blocks have access to the Main Ways, blocks that are likely to be accessed in the near future (with shorter Next-Use distance) are candidates to be retained longer in the Deli Ways to eliminate future misses. We make use of the fact that a small number of PCs, referred to as delinquent PCs, bring in a majority of the cache blocks and learn the Next-Use characteristic of blocks brought in by them. We propose an intelligent cost-benefit based PC-selection mechanism to identify the best set of delinquent PCs that should have access to the Deli Ways to maximize the cache hits. Performance evaluation reveals that NUcache improves the performance (in terms of Average Normalized Turnaround Time, ANTT) of multi-programmed workloads by 6.2%, 13.9%, 15.8% and 19.6% in dual, quad, eight and sixteen core machines respectively. NUcache also performs better than some of the state-of-the-art cache partitioning mechanisms.
The last part of the thesis deals with effective shared cache management in multi-core systems to achieve various performance objectives. Explicitly controlling the shared cache occupancy of competing applications is a flexible and practical way to achieve a variety of high-level performance goals. Existing solutions control cache occupancy at a coarser granularity, do not scale well to large core counts and, in some cases, lack the flexibility to support a variety of performance goals. To overcome this, we propose Probabilistic Shared Cache Management (PriSM), a framework to manage the cache occupancy of different cores at cache block granularity by controlling their eviction probabilities. The proposed framework requires only simple hardware changes to implement, scales to larger core counts and is flexible enough to support a variety of performance goals such as hit-maximization, fairness and QoS. PriSM with Hit-Maximization improves the performance of multi-programmed workloads in terms of ANTT by 16.5%, 18.7% and 12.7% over baseline LRU in eight-, sixteen- and thirty-two-core machines respectively.
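The abstract says PriSM controls occupancy through per-core eviction probabilities but does not say how they are computed. The sketch below therefore takes them as given inputs and only illustrates a probabilistic victim choice within one cache set. Restricting the draw to cores present in the set and falling back to plain LRU are assumptions of this sketch.

```python
import random


def choose_victim(cache_set, eviction_prob, rng=random):
    """Sample a core by eviction probability, then evict that core's LRU block.

    `cache_set` is ordered from LRU to MRU as (core_id, tag) pairs.
    """
    present = sorted({core for core, _ in cache_set})
    weights = [eviction_prob.get(core, 0.0) for core in present]
    if sum(weights) == 0:
        return cache_set[0]  # assumed fallback: plain LRU
    core = rng.choices(present, weights=weights, k=1)[0]
    # First match in LRU-to-MRU order is the chosen core's least recently used block.
    return next(block for block in cache_set if block[0] == core)


blocks = [(0, "a"), (1, "b"), (0, "c"), (1, "d")]  # LRU ... MRU
probs = {0: 0.0, 1: 1.0}  # hypothetical: core 1 is over its target occupancy
victim = choose_victim(blocks, probs, random.Random(0))
```

Raising a core's eviction probability shrinks its share of the cache over time, which is how a policy such as hit-maximization or fairness could be expressed on top of this mechanism.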
|
6 |
Analyse électrothermique des faisceaux de câbles de puissance : une contribution à l’optimisation des systèmes de distribution d’énergie dans les véhicules routiers à propulsion électrique / Electro-thermal analysis of power cable harnesses : a contribution for the optimization of energy distribution systems in road vehicles with electric drives
Holyk, Christophe, 04 December 2014
In the context of growing ecological concerns, road transport vehicle development is shifting toward less polluting vehicles with electric drives, such as Hybrid Electric Vehicles (HEVs) and full Electric Vehicles (EVs). With rising power requirements and shrinking available space, thermal management is becoming an increasingly important concern during the development of on-board vehicle components such as electric motors/generators, power inverters, battery packs and cable harnesses. Among them, the power cable harness, which is typically composed of electrical cables, connectors and power distribution boxes, can only be designed properly after a detailed thermal, electrical, chemical and mechanical analysis. This thesis aims to contribute to the optimization of the electro-thermal design of cable harnesses through simulations, in order to reduce the amount of experimental testing needed during their development. Theoretical models for the prediction of the electrical and thermal behavior of electric cables and cable harnesses are reviewed and adapted to automotive requirements. Validation is accomplished by comparing simulation results with Finite Element Analysis (FEA) and measurement data. A major part of this thesis addresses the thermal simulation of electrical cables of infinite length installed in air, taking into account the temperature dependence of the conductor resistances and the non-linearity of the total heat transfer coefficient at the cable surface. The influence of shielding currents and arbitrary current loads in the conductors on the temperature rise within electric cables is also considered using thermal ladder networks and illustrated by practical examples.
Because shielding currents in vehicles are not only caused by induced currents but also by functional electrical currents generated by low-voltage power sources, new theoretical studies and experimental observations for the estimation of these currents as a function of the vehicle electrical architecture and circuit characteristics are presented. A primary finding reveals that keeping the resistance of grounding connections low compared to that of the shielding connections is an appropriate but expensive means for limiting the transfer of functional currents in the shielding circuits. Finally, a complete and modular model for the prediction of transient temperatures along the length of cable harness sections is developed and validated based on the outcomes of all previous findings.
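The thermal simulation of a cable in air couples two effects named in the abstract: conductor resistance rises with temperature, and the surface heat transfer coefficient depends on the temperature rise. The following is a minimal single-node sketch, assuming a linear resistance law R(T) = R20 (1 + alpha (T - 20)), an isothermal cable (insulation thermal resistance ignored), and a user-supplied function `h` for the heat transfer coefficient. All numeric values are illustrative and not from the thesis.

```python
import math


def steady_state_temperature(current, r20_per_m, diameter, t_ambient,
                             h=lambda dt: 10.0, alpha=0.00393,
                             tol=1e-9, max_iter=1000):
    """Cable temperature (deg C) at which Joule heating equals surface heat loss, per metre."""
    t = t_ambient
    for _ in range(max_iter):
        resistance = r20_per_m * (1.0 + alpha * (t - 20.0))  # ohm/m at temperature t
        heat = current ** 2 * resistance                      # W/m generated
        # Heat lost per metre is h * (pi * d) * (T - T_ambient); solve for T.
        t_next = t_ambient + heat / (h(t - t_ambient) * math.pi * diameter)
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical case: 150 A, 4.91e-4 ohm/m at 20 deg C, 11 mm outer diameter,
# 20 deg C ambient, constant h of 12 W/(m^2 K).
t_cable = steady_state_temperature(150.0, 4.91e-4, 0.011, 20.0, h=lambda dt: 12.0)
```

Passing a temperature-dependent `h`, for example `lambda dt: 8.0 + 0.05 * dt`, exercises the non-linear case; the fixed-point iteration handles it without changes, although convergence is not guaranteed for every choice of `h`.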
|