1 |
CLUE: A Cluster Evaluation Tool
Parker, Brandon S. 12 1900
Modern high-performance computing depends on parallel processing systems. Most current benchmarks report only high-level computational throughput metrics, which may be sufficient for single-processor systems but can misrepresent the true capability of parallel systems. A new benchmark is therefore proposed. CLUE (Cluster Evaluator) uses a cellular automata algorithm to evaluate the scalability of parallel processing machines. The benchmark also uses algorithmic variations to evaluate the impact of individual system components on the overall serial fraction and efficiency. CLUE is not a replacement for other performance-centric benchmarks; rather, it shows the scalability of a system and provides metrics that reveal where overall performance can be improved. It offers a better comparison among different parallel systems than existing benchmarks and can diagnose where a particular parallel system can be optimized.
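The abstract does not say how CLUE computes serial fraction and efficiency. As a minimal sketch of how such metrics are commonly derived from timings, the following uses the standard definitions of speedup and efficiency, the Karp-Flatt experimentally determined serial fraction, and Gustafson's scaled speedup. The function names and the sample timings are hypothetical and are not taken from the thesis.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(s, p):
    """Parallel efficiency E = S / p."""
    return s / p


def karp_flatt(s, p):
    """Karp-Flatt serial fraction e = (1/S - 1/p) / (1 - 1/p), defined for p > 1."""
    if p < 2:
        raise ValueError("Karp-Flatt metric needs at least 2 processors")
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)


def gustafson_scaled_speedup(serial_fraction, p):
    """Gustafson's law: S = p - f * (p - 1), with f measured on the parallel run."""
    return p - serial_fraction * (p - 1)


if __name__ == "__main__":
    # Hypothetical timings: 100 s on one processor, 16 s on 8 processors.
    p = 8
    s = speedup(100.0, 16.0)
    print("speedup:", s)
    print("efficiency:", efficiency(s, p))
    print("serial fraction (Karp-Flatt):", karp_flatt(s, p))
    print("scaled speedup (Gustafson):", gustafson_scaled_speedup(0.04, p))
```

A serial fraction that grows as processors are added usually points to overhead such as communication or synchronization rather than inherently serial code, which is the kind of diagnosis the abstract describes.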
|
2 |
Efektivní implementace genetického algoritmu s využitím vícejádrových CPU / The Efficient Implementation of the Genetic Algorithm Using Multicore Processors
Kouřil, Miroslav January 2010
This diploma thesis deals with the acceleration of an advanced genetic algorithm. Discrete and continuous versions of the UMDA genetic algorithm were chosen for implementation. The main part of the acceleration is the use of the SSE instruction set, which was applied in particular to the functions that calculate fitness and sample the new population. A pseudorandom number generator that also uses the SSE instruction set was then implemented. After these changes, the discrete algorithm reached a speedup of up to 4.6. Finally, the algorithms were modified to use OpenMP, which allows blocks of code to run in multiple threads. The continuous version of the algorithm is not well suited to parallelization because its computational complexity is low. In contrast, the discrete versions of the algorithm are well suited to parallelization. The two implemented versions reached total speedups of up to 4.9 and 7.2.
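For readers unfamiliar with UMDA (Univariate Marginal Distribution Algorithm), the following is a minimal, unoptimized sketch of a discrete variant on the OneMax problem. It is not the thesis implementation, which the abstract says relies on SSE and OpenMP. The fitness function, truncation selection, probability clamping, and all parameter names are assumptions chosen for illustration. The sampling and fitness-evaluation loops correspond to the steps the abstract names as the main targets of acceleration.

```python
import random


def onemax(bits):
    """Illustrative fitness function: the number of ones in the bit string."""
    return sum(bits)


def umda_binary(n_bits, pop_size, n_select, generations, seed=0):
    """Discrete UMDA sketch: sample, select the best, re-estimate marginals."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # independent per-bit probabilities of a 1
    # Clamp the marginals so that no bit becomes permanently fixed.
    low, high = 1.0 / n_bits, 1.0 - 1.0 / n_bits
    best = None
    for _ in range(generations):
        # Sample a new population from the univariate model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Evaluate fitness and keep the top n_select (truncation selection).
        population.sort(key=onemax, reverse=True)
        selected = population[:n_select]
        if best is None or onemax(selected[0]) > onemax(best):
            best = selected[0]
        # Re-estimate each marginal as the frequency of ones among the selected.
        probs = [
            min(high, max(low, sum(ind[i] for ind in selected) / n_select))
            for i in range(n_bits)
        ]
    return best, probs


if __name__ == "__main__":
    best, probs = umda_binary(n_bits=20, pop_size=100, n_select=30, generations=30)
    print("best fitness:", onemax(best))
```

Because each individual is sampled and evaluated independently, both loops are natural candidates for vectorization and multithreading.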
|