301

Parallelizing the Method of Conjugate Gradients for Shared Memory Architectures

Löf, Henrik January 2004 (has links)
Solving Partial Differential Equations (PDEs) is an important problem in many fields of science and engineering. For most real-world problems modeled by PDEs, we can only approximate the solution using numerical methods. Many of these numerical methods result in very large systems of linear equations. A common way of solving these systems is to use an iterative solver such as the method of conjugate gradients. Furthermore, due to the size of these systems we often need parallel computers to be able to solve them in a reasonable amount of time. Shared memory architectures represent a class of parallel computer systems commonly used both in commercial applications and in scientific computing. To be able to provide cost-efficient computing solutions, shared memory architectures come in a large variety of configurations and sizes. From a programming point of view, we do not want to spend a lot of effort optimizing an application for a specific computer architecture. We want to find methods and principles of optimizing our programs that are generally applicable to a large class of architectures. In this thesis, we investigate how to implement the method of conjugate gradients efficiently on shared memory architectures. We seek algorithmic optimizations that result in efficient programs for a variety of architectures. To study this problem, we have implemented the method of conjugate gradients using OpenMP and we have measured the runtime performance of this solver on a variety of both uniform and non-uniform shared memory architectures. The input data used in the experiments come from a finite-element discretization of the Maxwell equations in three dimensions on a fighter-jet geometry. Our results show that, for all architectures studied, optimizations targeting the memory hierarchy exhibited the largest performance increase. Improving the load balance, by balancing the arithmetical work and minimizing the number of global barriers, proved to be of lesser importance. Overall, bandwidth minimization of the iteration matrix proved to be the most efficient optimization. On non-uniform architectures, proper data distribution proved to be very important. In our experiments we used page migration to improve the data distribution during runtime. Our results indicate that page migration can be very efficient if we can keep the migration cost low. Furthermore, we believe that page migration can be introduced in a portable way into OpenMP in the form of a directive with an affinity-on-next-touch semantic.
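As a loose illustration of the bandwidth-minimization idea highlighted above (a sequential sketch, not the thesis's OpenMP solver), the snippet below reorders a sparse symmetric positive definite system with reverse Cuthill-McKee before running conjugate gradients; the 2D Laplacian stand-in matrix and the solver settings are assumptions made for the example.

```python
# Minimal sketch: reduce matrix bandwidth with reverse Cuthill-McKee, then
# solve with conjugate gradients. The matrix is a hypothetical 2D Laplacian
# stand-in for the FEM systems discussed in the abstract.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import cg

n = 50
T = sp.diags([-1, 2, -1], [-1, 0, 1], (n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsr()  # 5-point Laplacian, SPD
b = np.ones(A.shape[0])

perm = reverse_cuthill_mckee(A, symmetric_mode=True)  # bandwidth-reducing permutation
A_rcm = A[perm][:, perm]                              # symmetrically permuted matrix
b_rcm = b[perm]

x_rcm, info = cg(A_rcm, b_rcm, maxiter=2000)          # info == 0 means CG converged
x = np.empty_like(x_rcm)
x[perm] = x_rcm                                       # undo the permutation
```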
302

Techniques for finite element methods on modern processors

Ljungkvist, Karl January 2015 (has links)
In this thesis, methods for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for speeding up the execution of finite-element methods. One of the greatest challenges in finite-element computation is how to perform the system matrix assembly efficiently in parallel, due to its complicated memory access pattern. The main difficulty lies in the fact that many entries of the matrix are being updated concurrently by several parallel threads. We consider transactional memory, an exotic hardware feature for concurrent update of shared variables, and conduct benchmarks on a prototype processor supporting it. Our experiments show that transactions can both simplify programming and provide good performance for concurrent updates of floating point data. Furthermore, we study a matrix-free approach to finite-element computation which avoids the matrix assembly. Motivated by its computational properties, we implement the matrix-free method for execution on graphics processors, using either atomic updates or a mesh coloring approach to handle the concurrent updates. A performance study shows that on the GPU, the matrix-free method is faster than a matrix-based implementation for many element types, and allows for the solution of considerably larger problems. This suggests that the matrix-free method can speed up the execution of large realistic simulations. / UPMARC / eSSENCE
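The matrix-free idea can be illustrated with a minimal sketch (not the thesis's GPU implementation): the action of a 1D linear-FEM stiffness matrix is computed by looping over elements and applying the local element matrix, and the result is cross-checked against an explicitly assembled matrix. The mesh size and element type here are assumptions for the example.

```python
# Minimal sketch of a matrix-free operator application for 1D linear FEM.
import numpy as np

n_el = 8                      # number of elements on a uniform 1D mesh
h = 1.0 / n_el
K_loc = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])  # local stiffness matrix

def matfree_apply(u):
    """y = K @ u computed element by element, with no global matrix stored."""
    y = np.zeros_like(u)
    for e in range(n_el):
        dofs = [e, e + 1]              # the two nodes of element e
        y[dofs] += K_loc @ u[dofs]     # concurrent threads would collide here; the
                                       # thesis handles this with atomics or coloring
    return y

# Cross-check against an explicitly assembled matrix (feasible for small sizes only).
K = np.zeros((n_el + 1, n_el + 1))
for e in range(n_el):
    K[np.ix_([e, e + 1], [e, e + 1])] += K_loc
u = np.random.default_rng(0).standard_normal(n_el + 1)
assert np.allclose(matfree_apply(u), K @ u)
```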
303

Scientific computing on hybrid architectures

Holm, Marcus January 2013 (has links)
Modern computer architectures, with multicore CPUs and GPUs or other accelerators, make stronger demands than ever on writers of scientific code. As a rule of thumb, the fastest, most efficient program consists of labor-intensive code written by expert programmers for a certain application on a particular computer. This thesis deals with several algorithmic and technical approaches towards effectively satisfying the demand for high-performance parallel programming without incurring such a high cost in expert programmer time. Effective programming is accomplished by writing performance-portable code, where performance-critical functionality is either provided by external software or implemented so as to strike a balance between maintainability/generality and efficiency. / UPMARC / eSSENCE
304

Numerical Simulations of Linear Stochastic Oscillators : driven by Wiener and Poisson processes

Berglund, André January 2017 (has links)
The main component of this essay is the numerical analysis of stochastic differential equations driven by Wiener and Poisson processes. In order to do this, we focus on two model problems, the geometric Brownian motion and the linear stochastic oscillator, studied in the literature for stochastic differential equations only driven by a Wiener process. This essay covers theoretical as well as numerical investigations of jump - or more specifically, Poisson - processes and how they influence the above model problems.
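As a hedged illustration of the first model problem, the sketch below applies an Euler-type scheme to a geometric Brownian motion driven by both a Wiener and a Poisson process; all parameters are hypothetical and the scheme is a generic jump-diffusion Euler step, not necessarily one of the methods analyzed in the essay.

```python
# Minimal jump-diffusion Euler sketch for geometric Brownian motion with jumps.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, lam, gamma = 0.05, 0.2, 2.0, -0.1   # drift, diffusion, jump rate, jump size (placeholders)
T, n_steps = 1.0, 1000
dt = T / n_steps

S = np.empty(n_steps + 1)
S[0] = 1.0
for k in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt))          # Wiener increment
    dN = rng.poisson(lam * dt)                 # Poisson increment (number of jumps in this step)
    S[k + 1] = S[k] + mu * S[k] * dt + sigma * S[k] * dW + gamma * S[k] * dN
```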
305

Convergence rates of adaptive algorithms for stochastic and partial differential equations

von Schwerin, Erik January 2005 (has links)
No description available.
306

Conservative high order collocation methods for nonlinear Schrödinger equations

Riera, Pau January 2021 (has links)
In this thesis, we investigate the numerical solution of time-dependent nonlinear Schrödinger equations (more specifically, the Gross-Pitaevskii equation) that appear in the modeling of Bose-Einstein condensates. Since the model is known to conserve important physical invariants, such as the mass and energy of the condensate, our goal is to study the importance of reproducing this conservation on the discrete level. The reliability of conservative methods, compared to non-conservative ones, shall be studied through high order collocation methods for the time discretization and finite element-based space discretizations. In particular, this includes symplectic discontinuous Galerkin time-stepping methods, as well as continuous Petrov-Galerkin methods. The methods shall be tested on a problem with a known analytical solution, namely two interacting solitons in 1D. This problem is a suitable choice due to its high sensitivity to oscillations of the energy and the difficulty of approximating it over long time scales.
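To make the notion of discrete conservation concrete, the sketch below monitors the discrete mass of a 1D Gross-Pitaevskii solution advanced with a standard split-step Fourier method; this is a simpler stand-in for the collocation and Galerkin methods studied in the thesis, and the trap, interaction strength, and initial condition are assumptions for the example.

```python
# Minimal sketch: monitor discrete mass for the 1D Gross-Pitaevskii equation
# advanced with split-step Fourier time stepping (mass-conserving by construction).
import numpy as np

L, n, dt, g = 20.0, 256, 1e-3, 1.0                 # domain size, grid points, time step, interaction
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)            # angular wavenumbers
V = 0.5 * x**2                                     # harmonic trap (assumed)
psi = np.exp(-x**2).astype(complex)                # assumed initial condition

def mass(psi):
    return np.sum(np.abs(psi)**2) * dx             # discrete version of the mass invariant

m0 = mass(psi)
for _ in range(1000):
    psi *= np.exp(-0.5j * dt * (V + g * np.abs(psi)**2))                # half potential/nonlinear step
    psi = np.fft.ifft(np.exp(-0.5j * dt * k**2) * np.fft.fft(psi))      # full kinetic step
    psi *= np.exp(-0.5j * dt * (V + g * np.abs(psi)**2))                # half potential/nonlinear step

print(abs(mass(psi) - m0))   # conserved up to roundoff for this splitting
```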
307

Ocean rogue wave analysis for the development of safer navigation systems : A Thesis submitted to the University of Gävle for the degree of Bachelor of Mathematics

Manzetti, Sergio January 2023 (has links)
Rogue waves are unexpectedly high waves, of 2.5 times the significant wave height, which occur in nearly all phases of nature, from oceans to fiber-optic cables and atmospheric air masses. In the ocean, rogue waves pose a significant danger to shipping and fishing vessels; they have been found to reach 27.8 meters in height and attain velocities of up to 100 km/h. Mechanisms on naval structures for the real-time prediction of rogue waves are currently non-existent, and their development requires a) a good equation for simulating rogue waves and b) a deep study of the wave trains of rogue waves. In this work, we consider the time series of four rogue wave trains collected from various sources, including the U.S. Coastal Data Information Program. The method of study encompasses the development of piecewise constant functions from the rogue wave readings made by lasers/buoys. We use these piecewise constant functions to form regularized functions as Fourier series, which we consider as weak solutions to the stationary nonlinear Schrödinger equation. The resulting force functions are quantified and compared to physical data of the rogue wave trains. The results show a good correlation between the norms of the obtained force functions and both the rogue wave height $H_{max}$ and the wave velocity. The methods developed in the study build a potentially useful foundation for the development of a prediction model in a future study.
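A minimal sketch of the regularization step described above follows: a piecewise constant record built from hypothetical buoy readings is smoothed by truncating its Fourier series. The sample values and the truncation cutoff are placeholders, not data from the study.

```python
# Minimal sketch: regularize a piecewise-constant wave record via Fourier truncation.
import numpy as np

readings = np.array([0.8, 1.1, -0.5, 2.4, 7.9, 3.1, -1.2, 0.4])   # hypothetical elevations (m)
n_samples = 512
eta = np.repeat(readings, n_samples // len(readings))             # piecewise-constant surface record

coeffs = np.fft.rfft(eta)
coeffs[12:] = 0.0                                 # keep only the lowest modes (cutoff is arbitrary here)
eta_smooth = np.fft.irfft(coeffs, n=n_samples)    # regularized, smooth wave-train approximation
```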
308

Novel Algorithms for Optimal Transport via Splitting Methods

Lindbäck, Jacob January 2023 (has links)
This thesis studies how the Douglas–Rachford splitting technique can be leveraged for scalable computational optimal transport (OT). By carefully splitting the problem, we derive an algorithm with several advantages. First, the algorithm enjoys global convergence rates comparable to the state-of-the-art while benefiting from accelerated local rates. In contrast to other methods, it does not depend on hyperparameters that can cause numerical instability. This feature is particularly advantageous when low-precision floating points are used or if the data is noisy. Moreover, the updates can efficiently be carried out on GPUs and, therefore, benefit from the high degree of parallelization achieved via GPU computations. Furthermore, we show that the algorithm can be extended to handle a broad family of regularizers and constraints while enjoying the same theoretical and numerical properties. These factors combined result in a fast algorithm that can be applied to large-scale OT problems and regularized versions thereof, which we illustrate in several numerical experiments. In the first part of the main body of the thesis, we present how Douglas-Rachford splitting can be adapted for the unregularized OT problem to derive a fast algorithm. We present two global convergence guarantees for the resulting algorithm: a 1/k-ergodic rate and a linear rate. We also show that the stopping criteria of the algorithm can be computed on the fly with virtually no extra costs. Further, we specify how a GPU kernel can be efficiently implemented to carry out the operations needed for the algorithm. To show that the algorithm is fast, accurate, and robust, we run a series of numerical benchmarks that demonstrate the advantages of our algorithm. We then extend the algorithm to handle regularized OT using sparsity-promoting regularizers. The generalized algorithm will enjoy the same sublinear rate derived for the unregularized formulation. We also complement the global rate with local guarantees, establishing that, under non-degeneracy assumptions on the solution, the algorithm will identify the correct sparsity pattern of the solution in finitely many iterations. When the sparsity pattern is identified, a faster linear rate typically dominates. We also specify how to extend to the GPU implementation and the stopping criteria to handle regularized OT, and we subsequently specify how to backpropagate through the solver. We end this part of the thesis by presenting some numerical results, including performance on quadratically regularized OT and group Lasso regularized OT for domain adaptation, showing a substantial improvement compared to the state-of-the-art. In the last part of the thesis, we provide a more detailed analysis of the local behavior of the algorithm when applied to unregularized OT and quadratically regularized OT. We subsequently outline how to extend this analysis to several other sparsity-promoting regularizers. In the former two cases, we show that the update that constitutes the algorithm converges to a linear operator in finitely many iterations. By analyzing the spectral properties of these linear operators, we gain insights into the local behavior of the algorithm, and specifically, these results suggest how to tune stepsizes to obtain better local rates.
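The core Douglas-Rachford iteration can be sketched generically (this is not the thesis's specialized OT solver): for a sum of two convex indicator functions the proximal operators reduce to projections, and the iteration below finds a point in the intersection of a box and a hyperplane. The particular sets and step choices are assumptions made for illustration.

```python
# Generic Douglas-Rachford splitting sketch on a convex feasibility problem.
import numpy as np

n = 5

def proj_box(x):            # prox of the indicator of the box [0, 0.4]^n
    return np.clip(x, 0.0, 0.4)

def proj_hyperplane(x):     # prox of the indicator of {x : sum(x) = 1}
    return x - (x.sum() - 1.0) / x.size

z = np.zeros(n)
for _ in range(200):
    x = proj_box(z)                    # first proximal step
    y = proj_hyperplane(2 * x - z)     # second proximal step on the reflected point
    z = z + y - x                      # Douglas-Rachford update of the driver sequence

print(proj_box(z))                     # the x iterate approaches a point in the intersection
```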
309

Rigid Body Simulation of MacroMolecules

Svebilius, Christian January 2008 (has links)
A computer model for simulating experiments done on surface organelles (so-called pili) on the Escherichia coli bacterium has been developed and implemented. The objective of the computer simulation was to mimic the results of experiments done with optical tweezers and to display a graphical, three dimensional, representation of these experiments. The experiments measured the force response to elongation of pili. This force response can be divided into three regions of elongation, regions I, II and III, each with different properties. Region I is characterized by a constant increase in force, in region II the pilus is unfolded under constant force, and in region III the force versus elongation curve assumes a non-trivial shape with increasing force. The pili are also able to retract to their original length, giving a similar force response curve. The computer model should be able to handle all these properties. The developed model could handle elongation in regions I and II. In region III, the force response given by the simulation differed from the one given by the experiments.
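As a toy illustration of the three-region force response described above (the parameters and the region III functional form are hypothetical, not the thesis's model), a piecewise force-versus-elongation function might look as follows.

```python
# Toy piecewise force-response sketch: linear rise (I), constant-force plateau (II),
# and some increasing, non-trivial shape (III). All values are placeholders.
import numpy as np

k_I, F_plateau = 200.0, 25.0       # hypothetical stiffness (pN/um) and plateau force (pN)
x_I, x_II = F_plateau / k_I, 2.0   # hypothetical region boundaries (um)

def force(x):
    if x <= x_I:                           # region I: constant increase in force
        return k_I * x
    if x <= x_II:                          # region II: unfolding under constant force
        return F_plateau
    return F_plateau * np.cosh(x - x_II)   # region III: placeholder for the non-trivial rise

elongations = np.linspace(0.0, 3.0, 7)
print([round(force(x), 1) for x in elongations])
```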
310

Balancing Accuracy and Complexity: Predictive Models for Proactive Scaling of Financial Workloads in Cloud Environments

Eriksson, Axel January 2024 (has links)
Predicting the future is an essential element in many fields, as it can lead to significant cost savings through more efficient resource allocation. Modern cloud systems are composed of large numbers of processing and storage resources, giving the potential to dynamically scale the resources provided to workloads throughout the day. If the amount of resources required is known in advance, proactive scaling can be implemented, with the aim that only the required amount of resources is allocated, resulting in optimal performance and cost efficiency. This study aims to accurately and efficiently forecast the workload of a financial system, characterized by high frequency, noise, and unpredictability. Based on a dataset describing the historical workload of individual tasks within the market system, one model from each of three categories (statistical, machine learning, and artificial neural network) was implemented. The models with the most promising qualities for such data within each category were SARIMA, XGBoost, and LSTM. These three models were compared across different scenarios, with a focus on the trade-off between accuracy and complexity. An optimal model has low complexity, meaning it is efficient, while still achieving high accuracy. The results of this study show that the workload within this specific system can be predicted using the three models. The optimal model varies depending on the scaling requirements: for short-term high-accuracy predictions LSTM is the best, with an R2 score of 0.92, but it is also the most complex. XGBoost is less complex than LSTM and has better overall accuracy across the different scenarios. SARIMA, though simpler, exhibits the best accuracy for long-term predictions, with an R2 score of 0.75. This study concludes that it is possible to predict certain financial workloads in advance, paving the way for further research into proactive scaling in such scenarios.
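A minimal sketch of this kind of comparison is given below, using a synthetic placeholder series rather than the financial workload data of the thesis; the SARIMA orders, lag count, and XGBoost settings are assumptions, and the LSTM model is omitted for brevity.

```python
# Minimal sketch: SARIMA vs. XGBoost-on-lag-features forecast comparison, scored with R2.
import numpy as np
import xgboost as xgb
from sklearn.metrics import r2_score
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
t = np.arange(600)
y = 10 + 3 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, t.size)  # synthetic daily-cycle stand-in
train, test = y[:-48], y[-48:]

# Statistical model: seasonal ARIMA with an assumed 24-step season.
sarima = SARIMAX(train, order=(1, 0, 1), seasonal_order=(1, 1, 1, 24)).fit(disp=False)
pred_sarima = sarima.forecast(steps=48)

# Machine-learning model: gradient-boosted trees on lagged values, rolled forward one step at a time.
lags = 24
X_train = np.column_stack([train[i:len(train) - lags + i] for i in range(lags)])
y_train = train[lags:]
model = xgb.XGBRegressor(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)

history = list(train)
pred_xgb = []
for _ in range(48):
    x_next = np.array(history[-lags:]).reshape(1, -1)
    y_hat = float(model.predict(x_next)[0])
    pred_xgb.append(y_hat)
    history.append(y_hat)

print("SARIMA  R2:", round(r2_score(test, pred_sarima), 3))
print("XGBoost R2:", round(r2_score(test, pred_xgb), 3))
```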
