1 |
OpenMPBench: An Open-Source Benchmark for Multiprocessor Based Embedded Systems / OpenMPBench : en Open-Source riktmärke för multiprocessor baserade inbyggda system
Liang, Yuchen, Iqbal, Syed Muhammad Zeeshan January 2010
OpenMPBench is a new open-source benchmark for multiprocessor-based embedded systems. It comprises a set of parallel implementations of seven classical algorithms that cover different computing features of general-purpose processors. Performance data, including tables and figures, is provided to guide potential users in evaluating the design of multiprocessor-based embedded systems. The parallel implementations of the seven applications fall into four categories:

Automation and Industry Control
* Bitcount
* SUSAN
* BASICMATH
Network
* Patricia
* Dijkstra
Office
* Stringsearch
Security
* SHA

Among them, Bitcount and Dijkstra each involve more than one parallel application, implemented for different functions or using different strategies. Bitcount consists of three parallel applications (parallel Bitcnt_1, parallel Bitstring and parallel Bitcnts) that implement bit counting with different strategies. Three parallel applications are implemented for Dijkstra: one for the all-pairs shortest paths problem, and two for the single-source shortest paths problem, using a single-queue strategy and a multiple-queue strategy respectively. Stringsearch consists of Pratt-Boyer-Moore, case-sensitive Boyer-Moore-Horspool, case-insensitive Boyer-Moore-Horspool, and Boyer-Moore-Horspool (case-insensitive with accented character translation) implementations. The source code of the sequential versions of these applications was downloaded from MiBench, together with the standard output based on x86-linux. For OpenMPBench, all parallel applications have been implemented in ANSI C using POSIX threads. All libraries used by the implementations are based on the GNU standard library. The development environment is Ubuntu 9.04 with the 2.6.28-generic Linux kernel, the GCC 4.2.4 compiler, and the Emacs 22.1 editor. Given the hardware currently available, a workstation with 8 processors, shipped with Ubuntu 4.2.4, was selected as the experiment environment.
Ubuntu is a free GNU/Linux distribution that offers the full GNU standard library and has GCC installed by default. In conclusion, we consider this experiment environment suitable for simulating multiprocessor-based embedded systems.
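The bit-counting workload described in this abstract lends itself to a simple data-parallel split: divide the input into chunks, count the set bits of each chunk in its own thread, then add the partial counts. The following is a minimal sketch of that idea only, not the OpenMPBench code. It is written in Python with a thread pool rather than ANSI C with POSIX threads, and the names `count_bits_chunk`, `parallel_bitcount` and the `workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bits_chunk(values):
    """Count the set bits in one chunk of non-negative integers."""
    return sum(bin(v).count("1") for v in values)


def parallel_bitcount(values, workers=4):
    """Split the input into contiguous chunks, count each chunk in its own
    worker thread, then add the partial counts together.

    Illustrative only: the chunking scheme is an assumption, not the
    strategy used by the benchmark.
    """
    if not values:
        return 0
    chunk_size = (len(values) + workers - 1) // workers  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_bits_chunk, chunks)
        return sum(partial_counts)


if __name__ == "__main__":
    data = list(range(1000))
    print(parallel_bitcount(data, workers=8))
```

In CPython the global interpreter lock keeps these threads from running the counting loop simultaneously, so the sketch shows the decomposition rather than a speedup. A C version with POSIX threads would follow the same split, count and sum structure.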
|
2 |
Real time image processing : algorithm parallelization on multicore multithread architecture / Imagerie temps réel : parallélisation d’algorithmes sur plate-forme multi-processeurs
Mahmoudi, Ramzi 13 December 2011
Topological features of an object are fundamental in image processing. In many applications, including medical imaging, it is important to maintain or control the topology of the image. However, designing transformations that preserve both the topology and the geometric characteristics of the input image is a complex task, especially in the case of parallel processing.

Parallel processing is applied to accelerate computation by sharing the workload among multiple processors. In terms of algorithm design, parallel computing strategies profit from the natural parallelism (also called the partial order of algorithms) present in the algorithm, which provides two main sources of parallelism: data parallelism and functional parallelism. Concerning architectural design, it is essential to link the spectacular evolution of parallel architectures with parallel processing. Indeed, if parallelization strategies have become necessary, it is thanks to the considerable improvements in multiprocessing systems and the rise of multi-core processors. All these reasons make multiprocessing very practical. In the case of SMP machines, immediate sharing of data provides more flexibility in designing such strategies and in exploiting data and functional parallelism, notably with the evolution of the interconnection system between processors.

In this perspective, we propose a new parallelization strategy, called SD&M (Split, Distribute and Merge), that covers a large class of topological operators. SD&M has been developed in order to provide parallel processing for many topological transformations.

Based on this strategy, we propose a series of parallel topological algorithms (new or adapted versions). Our main contributions are:
(1) A new approach to compute the watershed transform based on the MSF transform, that is parallel, preserves the topology, does not need prior minima extraction and is suited to SMP machines. The proposed algorithm makes use of Jean Cousty's streaming approach and does not require any sorting step or any hierarchical queue. This contribution came after an intensive study of existing watershed transform algorithms in the discrete case.
(2) A similar study on thinning transforms was conducted. It concerns sixteen parallel thinning algorithms that preserve topology. In addition to performance criteria, we introduce two qualitative criteria to compare and classify them. The new classification criteria are based on the relationship between the medial axis and the obtained homotopic skeleton. After this classification, we tried to get better results by proposing a new adapted version of Couprie's filtered thinning algorithm, obtained by applying our strategy.
(3) An enhanced computation method for topological smoothing that combines parallel computation of the Euclidean distance transform, using Meijster's algorithm, with parallel thinning and thickening processes, using the adapted version of Couprie's algorithm already mentioned.
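The abstract names the three phases of SD&M but gives no implementation detail, so the following is only a generic sketch of a split, distribute and merge pattern applied to a local image operator. Every choice in it is an assumption: the operator is a plain binary erosion (which is not topology-preserving, unlike the operators studied in the thesis), the image is split into horizontal bands, and the names `erode_rows` and `split_distribute_merge` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def erode_rows(image, first, last):
    """Binary erosion (4-neighbourhood) of rows first..last-1.

    A pixel stays 1 only if it and its four neighbours are 1; pixels
    outside the image count as 0. The full input image is read, so each
    band also sees the rows just above and below it.
    """
    height, width = len(image), len(image[0])

    def at(r, c):
        return image[r][c] if 0 <= r < height and 0 <= c < width else 0

    return [
        [
            1 if at(r, c) and at(r - 1, c) and at(r + 1, c)
            and at(r, c - 1) and at(r, c + 1) else 0
            for c in range(width)
        ]
        for r in range(first, last)
    ]


def split_distribute_merge(image, workers=2):
    """Apply the erosion band by band in worker threads (assumed scheme)."""
    if not image:
        return []
    height = len(image)
    band = (height + workers - 1) // workers  # rows per band, rounded up
    # Split: compute the row range of each band.
    bounds = [(start, min(start + band, height)) for start in range(0, height, band)]
    # Distribute: each band is processed by its own task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: erode_rows(image, b[0], b[1]), bounds))
    # Merge: concatenate the bands back in their original order.
    return [row for part in parts for row in part]
```

Because every task reads the unmodified input and writes only its own rows, the merged result equals the sequential one. Operators that update the image in place, as many thinning algorithms do, need extra care at band borders, which is the kind of problem a dedicated strategy has to address.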
|
3 |
DirectX 12: Performance Comparison Between Single- and Multithreaded Rendering when Culling Multiple Lights
J'lali, Yousra January 2020
Background. As newer computers are built, they come with more advanced and powerful hardware. Corporations in turn enhance program attributes and features to take advantage of that hardware and thereby improve performance. A relatively new API that serves to facilitate this is Microsoft DirectX 12. Opinions about this API differ widely, and to get a somewhat better understanding of how well it utilizes hardware, this research puts it to the test.

Objectives. The aim of this article is to perform tests and comparisons to find out which method performs better when using DirectX 12: single-threading or multithreading. For the performance measurements, the average CPU and GPU utilizations are gathered, as well as the average FPS and the time it takes to perform the Render function. When all results have been collected, the two methods are compared.

Methods. The main method used in this research is experimentation. To find the performance differences between the two methods, each undergoes different trials while data is gathered. There are four experiments each for the single-threaded and the multithreaded application. Each test varies the number of lights and objects rendered in the simulation environment, escalating from 50 to 100, 1000, and finally 5000.

Results. A similar pattern was found across all four tests: the multithreaded application used considerably more of the CPU than the single-threaded version. Although the GPU did less simultaneous work in the single-threaded program, that program showed higher GPU utilization than the multithreaded one. Furthermore, the multithreaded system tended to perform the Render function faster than its counterpart, regardless of which test was executed. Nevertheless, the two applications never differed in FPS.

Conclusion. Half of the hypotheses stated in this article were contradicted by an unexpected turn of events. It was believed that the multithreaded system would utilize less of the CPU and more of the GPU, but the outcome showed the opposite. Another hypothesis held that the system with multiple threads would execute the Render function faster than the other version, and this was strongly supported by the results. In addition, inserting more objects and lights into the scene did increase the applications' CPU and GPU utilization, which supported another hypothesis. In conclusion, the multithreaded program performs the Render function faster but still gains nothing in FPS compared to single-threading. The multithreaded version also utilizes more CPU and less GPU.
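The abstract does not describe how lights are culled or how the work is divided among threads. As a rough illustration of the single-threaded versus multithreaded comparison, the sketch below culls lights with a sphere overlap test, once in a single pass and once split across worker threads. It is written in Python, not against the DirectX 12 API, and the culling test, the data layout (`((x, y, z), radius)` tuples) and the names `light_visible`, `cull_single` and `cull_multi` are all assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def light_visible(light, view_center, view_radius):
    """Keep a light whose sphere of influence overlaps the view sphere."""
    position, radius = light
    return math.dist(position, view_center) <= radius + view_radius


def cull_single(lights, view_center, view_radius):
    """Single-threaded culling: test every light in one pass."""
    return [l for l in lights if light_visible(l, view_center, view_radius)]


def cull_multi(lights, view_center, view_radius, workers=4):
    """Multithreaded culling: each worker culls one contiguous chunk and
    the surviving lights are concatenated in the original order."""
    if not lights:
        return []
    size = (len(lights) + workers - 1) // workers  # ceiling division
    chunks = [lights[i:i + size] for i in range(0, len(lights), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda chunk: cull_single(chunk, view_center, view_radius), chunks
        )
        return [light for part in parts for light in part]


if __name__ == "__main__":
    lights = [((0, 0, 0), 1.0), ((10, 0, 0), 1.0), ((3, 0, 0), 1.5), ((0, 5, 0), 0.5)]
    print(cull_multi(lights, (0, 0, 0), 2.0, workers=2))
```

Both functions return the same set of lights, so a timing harness (for example around `time.perf_counter`) could compare them on scenes of 50 to 5000 lights as in the experiments. Python threads do not run this test in parallel because of the global interpreter lock, so any timings from this sketch would not be comparable to the thesis results.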
|