1 |
Cluster load balancing using process migration
Nuttall, Mark Patrick January 1997
No description available.
|
2 |
An Efficient Numerical Scheme for Simulating Unidirectional Irregular Waves Based on a Hybrid Wave Model
Jia, Dongxing 1984- 14 March 2013
The Unidirectional Hybrid Wave Model (UHWM) predicts irregular wave kinematics and pressure more accurately than its linear counterpart and modifications of it, especially near the free surface. Hence, together with the Morison equation, it has been used to compute wave loads on moored floating structures, such as a Spar or TLP (Tension Leg Platform), which can be approximated by a slender body or a number of slender components. Dr. Jun Zhang, with his former and current graduate students, has developed a numerical code known as COUPLE over the past two decades; it simulates the six-degree-of-freedom (6-DOF) motions of a moored floating structure interacting with waves, current and wind. COUPLE employs UHWM as a module for computing wave loads on a floating structure. However, when the simulated wave-structure interaction is long, say 3 hours (typically required by the offshore industry for extreme storm cases), the computation time with UHWM increases significantly in comparison with its counterpart based on linear wave theory.
This study develops a numerical scheme that may significantly reduce the CPU time required by UHWM and COUPLE. In simulating irregular (or random) waves following a JONSWAP spectrum with a given cut-off frequency, the number of free wave components generally grows linearly with the simulation duration. The CPU time of a linear spectral method for simulating irregular waves is roughly proportional to N^2, where N is the number of free wave components, while that of a nonlinear wave model such as UHWM is roughly proportional to N^3. Therefore, to reduce the CPU time, the total simulation duration is divided into a number of segments. However, due to the nature of the Fast Fourier Transform (FFT), the connection between two neighboring surface-elevation segments is likely to be discontinuous. To avoid the discontinuity, an overlapped duration between the two neighboring segments is adopted. For demonstration, a free-wave spectrum is input to COUPLE to simulate the 6-DOF motions of a floating 5-MW wind turbine installed on an OC3 moored Spar and the tensions in the mooring lines. It is shown that the CPU time for this simulation, for a duration of 2048 seconds, is reduced from more than 16 hours, when the irregular wave elevation and kinematics are calculated without segmentation, to less than three hours when they are calculated in five segments.
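The scaling argument above can be illustrated with a small idealized model. The sketch below assumes that cost grows as N^3 and that N is proportional to the duration of each segment; the function name `relative_cost` and the `overlap_fraction` parameter are hypothetical and do not come from COUPLE. It ignores fixed costs and the rest of the coupled simulation, so it indicates a trend only and does not predict the measured run times.

```python
def relative_cost(num_segments: int, overlap_fraction: float = 0.0,
                  exponent: float = 3.0) -> float:
    """Cost of a segmented run relative to one unsegmented run.

    Assumptions (not taken from the thesis code): cost ~ N**exponent and
    N is proportional to the segment duration. overlap_fraction is the
    overlap added to each segment, as a fraction of the total duration.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    if overlap_fraction < 0.0:
        raise ValueError("overlap_fraction must be non-negative")
    # A single segment needs no overlap with a neighbor.
    overlap = overlap_fraction if num_segments > 1 else 0.0
    segment_fraction = 1.0 / num_segments + overlap
    return num_segments * segment_fraction ** exponent


# Five segments without overlap versus five segments with 5% overlap each.
ideal = relative_cost(5)
with_overlap = relative_cost(5, overlap_fraction=0.05)
```

Under these assumptions, splitting into m segments without overlap reduces the wave-model cost by a factor of m^2, and any overlap gives part of that saving back. The reported reduction (more than 16 hours to less than three) is smaller than this ideal, which is consistent with overlap and other costs that the model leaves out.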
|
3 |
CPU product line lifecycles: econometric duration analysis using parametric and non-parametric estimators
Fisher, Mischa 30 April 2018
This thesis provides a comprehensive history of the statistical background and uses of survival analysis, and then applies econometric duration analysis to examine the lifecycles of product lines within the microprocessor industry. Using data from Stanford University's CPUDB, covering Intel and AMD processors introduced between 1971 and 2014, the duration analysis uses both parametric and non-parametric estimators to construct survival and hazard functions for estimated product line lifetimes within microprocessor product families. The well-known and widely applied non-parametric Kaplan-Meier estimator is applied both to the entire sample and in a segmented estimate that considers the product line lifecycles of Intel and AMD separately, with a median survival time of 456 days. The parametric duration analysis uses both the semi-parametric Cox proportional hazards model and the fully parametric accelerated failure time model with the Weibull, exponential and log-logistic distributions. These models find a modest association between higher clock speed and transistor count and a shorter expected time in the marketplace for microprocessors, while the number of cores and other attributes have no predictive power over expected survival times. The negative effect of a processor's transistor count and clock speed on expected duration likely captures the co-trending of growth in transistor count with a larger marketplace and broader product categories. / Graduate
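For readers unfamiliar with the Kaplan-Meier estimator mentioned above, here is a minimal sketch in plain Python. It is the textbook product-limit formula, S(t) = product over event times t_i <= t of (1 - d_i / n_i), and is not the thesis's own code. The function name `kaplan_meier` and the toy data are illustrative assumptions, not CPUDB values.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the product line was discontinued or censored.
    observed:  True if the discontinuation was seen, False if censored.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have equal length")
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # events and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (hypothetical): four product lines, one censored at time 2.
curve = kaplan_meier([1, 2, 2, 3], [True, True, False, True])
```

An observation censored at the same time as an event is still counted as at risk at that time, which is the usual convention. The median survival time is the first time at which the estimated curve drops to 0.5 or below.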
|
4 |
Performance Analysis of kNN on large datasets using CUDA & Pthreads : Comparing between CPU & GPU
Kankatala, Sriram January 2015
Many organizations have large databases that grow rapidly day by day and need to be maintained regularly. Content-based searches are similarity searches based on features extracted from multimedia data. For applications such as multimedia content retrieval, data mining and pattern recognition, performing a nearest neighbor search in multidimensional data is a challenging task. The important factors in k-nearest-neighbor (kNN) search are search speed and accuracy. Implementing kNN on GPUs has been an active research topic over the last few years, with a focus on improving the performance of kNN. Considering these aspects, we identified a gap in this research area. This master's thesis demonstrates effective and efficient parallelism on multi-core CPUs and on GPUs and compares their performance with that of a single-core CPU. It presents an experimental implementation of kNN on a single-core CPU, a multi-core CPU and a GPU using C, Pthreads and CUDA respectively. We considered different levels of input (size, dimensions) to evaluate the performance. The experiments show that, for kNN, the GPU outperforms the single-core CPU by a factor of approximately 5.8 to 16 and the multi-core CPU by a factor of approximately 1.2 to 3, depending on the input level.
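As a point of reference for the search being parallelized, the following is a minimal sequential sketch of brute-force kNN. The thesis implementations are in C, Pthreads and CUDA; this Python version, the function name `knn_brute_force`, and the choice of a brute-force variant with squared Euclidean distance are assumptions made for illustration only.

```python
import heapq


def knn_brute_force(references, query, k):
    """Return the indices of the k reference points closest to query.

    Uses squared Euclidean distance; results are ordered nearest first.
    """
    if k < 1 or k > len(references):
        raise ValueError("k must be between 1 and the number of references")

    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))

    # Each distance depends on one reference point only, so this loop is
    # the part that can be spread across CPU threads or GPU threads.
    return heapq.nsmallest(
        k, range(len(references)),
        key=lambda i: squared_distance(references[i]),
    )


# Hypothetical 2-D data set.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
nearest = knn_brute_force(points, (0.0, 0.0), k=2)
```

The distance computations are independent of one another, which is why the problem maps well onto many threads; the selection of the k smallest distances is the step that needs coordination between them.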
|
5 |
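Ανάπτυξη διαδικτυακής εφαρμογής για την εξομοίωση της λειτουργίας ενός επεξεργαστή με διευρυμένο ρεπερτόριο εντολών / Development of a web application for simulating the operation of a processor with an extended instruction set
Κάτσενος, Χρήστος 26 July 2012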
The purpose of this study is to simulate the operation of a processor with an extended instruction set over the Internet. More specifically, a web-based tool was developed that accepts a sequence of instructions, checks them, assembles them, and stores the resulting code in the application's memory.
Once the program has been checked and stored in memory, the graphical part of the application simulates the operation of the processor, displaying the values that the registers take at each moment as well as the sequence of data transferred to and from them.
|
6 |
Efektivní paralelizace evolučních algoritmů / Effective Parallelization of Evolutionary Algorithms
Záboj, Petr January 2020
Evolutionary algorithms are often used for hard optimization problems. Solving these problems takes a long time, so we want to parallelize these algorithms effectively. Unfortunately, classical methods of parallelization do not work very well in cases where individual evaluations take significantly different times. In this project, we try to extend the evolutionary algorithm with interleaving generations, which makes better use of computational resources than classical parallel evolutionary algorithms, with speculative evaluation. Speculative evaluation means estimating an individual's fitness function and predicting the following steps, which are reused later if the estimate proves correct. We compare the algorithm with speculative evaluation against the original version in a series of experiments and examine the effect of the accuracy of the speculative step on the performance of the algorithm.
|
7 |
Simulation des réseaux à grande échelle sur les architectures de calculs hétérogènes / Large-scale network simulation over heterogeneous computing architecture
Ben Romdhanne, Bilel 16 December 2013
Simulation is a primary step in the evaluation of modern networked systems. Given the increasing complexity of emerging networks, notably new wireless networks, the scalability and efficiency of simulation tools are key to deriving valuable results.
Discrete event simulation is recognized as the most scalable model and copes with both parallel and distributed architectures. Nevertheless, existing software architectures do not take advantage of recent hardware, which provides heterogeneous computing resources, such as parallel processors and graphics coprocessors, that can be exploited in parallel. The main scope of this thesis is to provide new mechanisms and optimizations that enable efficient and scalable parallel simulation using a heterogeneous computing node architecture including multicore CPUs and GPUs. To address efficiency, we propose to describe events that differ only in their data as a single entry, moving from one descriptor per event to one descriptor per group of events, which reduces the event management cost and increases the capacity to execute events massively in parallel. At run time, the proposed hybrid scheduler dispatches and injects the events on the most appropriate computing target based on the event descriptor and the current load, obtained through a feedback mechanism, so that the hardware usage rate is maximized. Results show a gain of about 100 times in simulation time compared with traditional CPU-only approaches. To increase the scalability of the system, we propose a new simulation model based on three actors, denoted general purpose coordinator-master-worker (GP-CMW), that jointly addresses the challenges of distributed and parallel simulation at different levels. The performance of a distributed simulation that relies on the GP-CMW architecture tends toward the maximal theoretical efficiency in a homogeneous deployment. The scalability of this simulation model is validated on the largest European GPU-based supercomputer.
|
8 |
Impact of Network Address Translation on Router Performance
Chugh, Sarabjeet Singh 22 October 2003
Network Address Translation (NAT) is a method by which Internet Protocol (IP) addresses are translated from one group to another, in a manner transparent to the end users. It translates the source and destination addresses and ports in the Internet Protocol datagram. There are several benefits of using NAT: it can be installed without changes to hosts or routers; it allows reuse of globally routable addresses; it facilitates easy migration or addition of new networks; and it provides a method to keep private network addresses hidden from the outside world.
NAT, however, is a processor- and memory-intensive activity for any device that implements it, because it involves reading from and writing to the header and payload information of every IP packet to perform the address translation. It increases Central Processing Unit (CPU) and memory utilization and may reduce throughput and increase the latency experienced by a packet. Thus, understanding the performance impact of NAT on a network device (in particular, a router) becomes an important factor when implementing NAT in any live network.
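To make the per-packet work concrete, the sketch below shows a simplified source NAT with port translation. Packets are modelled as plain dictionaries, and the class name `SourceNat`, its methods, and the starting port 40000 are hypothetical choices for illustration. A real router must also recompute checksums, expire idle mappings and handle port exhaustion, all of which this sketch omits.

```python
class SourceNat:
    """Toy source NAT: maps (private IP, port) pairs to public ports."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, packet):
        """Rewrite the source address and port of a packet leaving the LAN."""
        key = (packet["src_ip"], packet["src_port"])
        if key not in self.outbound:
            # First packet of a flow: allocate a public port and record it.
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return {**packet, "src_ip": self.public_ip,
                "src_port": self.outbound[key]}

    def translate_inbound(self, packet):
        """Rewrite the destination of a reply, or return None if unmapped."""
        key = self.inbound.get(packet["dst_port"])
        if key is None or packet["dst_ip"] != self.public_ip:
            return None
        private_ip, private_port = key
        return {**packet, "dst_ip": private_ip, "dst_port": private_port}


# Documentation-range addresses, used here purely as examples.
nat = SourceNat("203.0.113.1")
sent = nat.translate_outbound({"src_ip": "192.168.0.10", "src_port": 5000,
                               "dst_ip": "198.51.100.7", "dst_port": 80})
reply = nat.translate_inbound({"src_ip": "198.51.100.7", "src_port": 80,
                               "dst_ip": "203.0.113.1", "dst_port": sent["src_port"]})
```

Every packet requires at least one table lookup and a header rewrite, and new flows also require a table insertion. This is the kind of per-packet CPU and memory cost whose effect on a router the thesis sets out to measure.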
This thesis aims to understand and quantify the impact of Network Address Translation on a network router by doing a series of performance tests after specifying the performance parameters to measure and, then, clearly defining the performance testing methodology that is used to study each of the performance parameters. After a discussion of previous research, the measurement system and subsequent measurement results are described. / Master of Science
|
9 |
GPU computing of Heat Equations
Zhang, Junchi 29 April 2015
There is increasing evidence in scientific research and industrial engineering that the graphics processing unit (GPU) is more efficient and more capable than the CPU for certain computations. The heat equation is one of the best-known partial differential equations, with well-developed theory and applications in engineering. Thus, we chose in this report to use the heat equation and to solve numerically for the heat distributions at different time points using both GPU and CPU programs. The heat equation with three different boundary conditions (Dirichlet, Neumann and periodic) was solved on the given domain and discretized by finite difference approximations. The programs solving the linear system arising from the heat equation with the different boundary conditions were implemented on both GPU and CPU. A convergence analysis and a stability analysis of the finite difference method were performed to guarantee the success of the program. Iterative methods and direct methods for solving the linear system on the GPU are also discussed. The results show that the GPU has a substantial advantage over the CPU in run time for large problems.
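As a small illustration of a finite difference discretization of the heat equation, the sketch below uses the explicit forward-time, centered-space (FTCS) scheme in one dimension with fixed Dirichlet boundary values. The abstract refers to solving a linear system, which suggests an implicit scheme, so this explicit variant is a substituted simpler technique and not the report's method. The function name `ftcs_heat_1d` and all parameter values are assumptions.

```python
import math


def ftcs_heat_1d(u0, alpha, dx, dt, steps):
    """Advance u_t = alpha * u_xx with the explicit FTCS scheme.

    The first and last entries of u0 are held fixed (Dirichlet values).
    Raises ValueError if the stability condition r <= 1/2 is violated.
    """
    r = alpha * dt / dx ** 2
    if r > 0.5:
        raise ValueError("unstable step: alpha * dt / dx**2 must be <= 0.5")
    u = list(u0)
    for _ in range(steps):
        nxt = u[:]  # endpoints are copied unchanged
        for i in range(1, len(u) - 1):
            nxt[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = nxt
    return u


# Hypothetical test case: u(x, 0) = sin(pi * x) on [0, 1], zero at both ends.
# The analytic solution at x = 0.5 and t = 0.1 is exp(-pi**2 * 0.1).
dx = 0.05
u0 = [math.sin(math.pi * i * dx) for i in range(21)]
u0[0] = u0[-1] = 0.0
u = ftcs_heat_1d(u0, alpha=1.0, dx=dx, dt=0.001, steps=100)
```

Each interior update reads only the previous time level, so all grid points can be updated independently within a step, which is the property that makes such stencils a natural fit for a GPU. The stability check mirrors the kind of constraint that a stability analysis of the scheme yields.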
|
10 |
Portning och utökning av processor för ASIC och FPGA / Port and extension of processor for ASIC and FPGA
Olsson, Martin January 2009
In this master's thesis, the possibilities of customizing a low-cost microprocessor to replace an existing microprocessor solution are investigated. A brief survey of suitable processors is carried out, from which a replacement is chosen. The replacement processor is then analyzed and extended with accelerators in order to meet the set requirements.
The result is a port of the Lattice Mico32 processor to the Xilinx Virtex-5 FPGA circuit, replacing an earlier solution based on Xilinx MicroBlaze. To meet the set requirements, accelerators for floating-point arithmetic and FIR filtering have been developed. The toolchain for the processor has been modified to support the added accelerated floating-point arithmetic.
A final evaluation of the presented solution shows that it fulfills the set requirements and constitutes a functional replacement for the previous solution.
|