  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Effect Of Jacobian Evaluation On Direct Solutions Of The Euler Equations

Onur, Omer 01 December 2003
A direct method is developed for solving the 2-D planar/axisymmetric Euler equations. The Euler equations are discretized using a finite-volume method with upwind flux splitting schemes, and the resulting nonlinear system of equations is solved using Newton's method. Both analytical and numerical methods are used for Jacobian calculations. The numerical method has the advantage of keeping the Jacobian consistent with the numerical flux vector without extremely complex or impractical analytical differentiation. However, the numerical method may suffer from accuracy problems and may need longer execution time. To improve the accuracy of the numerical method, detailed error analyses were performed. It was demonstrated that the finite-difference perturbation magnitude and computer precision are the most important parameters that affect the accuracy of numerical Jacobians. A relation was developed for the optimum perturbation magnitude that minimizes the error in numerical Jacobians. Results show that very accurate numerical Jacobians can be calculated with the optimum perturbation magnitude. The effects of the accuracy of numerical Jacobians on the convergence of the flow solver are also investigated. To reduce the execution time for numerical Jacobian evaluation, flux vectors with perturbed flow variables are calculated only for the related cells. A sparse matrix solver based on LU factorization is used for the solution, and some strategies are considered to improve the Jacobian matrix solution. The effects of different flux splitting methods, higher-order discretizations, and several parameters on the performance of the solver are analyzed.
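The role of the perturbation magnitude described in this abstract can be illustrated with a minimal sketch. It assumes a forward-difference Jacobian and the common textbook step size h ≈ sqrt(machine epsilon) · max(|x_j|, 1), which balances truncation error against round-off error. This is not the optimum-perturbation relation derived in the thesis, and the function names and the 2x2 test system are hypothetical.

```python
import sys


def numerical_jacobian(f, x, eps=sys.float_info.epsilon):
    """Forward-difference Jacobian of f: R^n -> R^m at the point x.

    Assumption: the step is the textbook rule h = sqrt(eps) * max(|x_j|, 1),
    not the relation developed in the thesis.
    """
    fx = f(x)
    n, m = len(x), len(fx)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        h = (eps ** 0.5) * max(abs(x[j]), 1.0)
        xp = list(x)
        xp[j] += h
        h = xp[j] - x[j]  # use the step that is actually representable
        fp = f(xp)
        for i in range(m):
            jac[i][j] = (fp[i] - fx[i]) / h
    return jac


def residual(x):
    # Hypothetical nonlinear system standing in for a discretized flux residual.
    return [x[0] ** 2 + x[1], x[0] * x[1]]


def analytical_jacobian(x):
    # Hand-derived Jacobian of `residual`, used only for comparison.
    return [[2.0 * x[0], 1.0], [x[1], x[0]]]


x0 = [1.0, 2.0]
J = numerical_jacobian(residual, x0)
J_exact = analytical_jacobian(x0)
max_error = max(abs(J[i][j] - J_exact[i][j]) for i in range(2) for j in range(2))
```

Repeating the comparison with a much larger or much smaller `h` is a simple way to observe the trade-off between truncation and round-off error that the abstract refers to.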
32

Code Optimization on GPUs

Hong, Changwan 30 October 2019
No description available.
33

Minimum Cost Distributed Computing using Sparse Matrix Factorization / Minsta-kostnads Distribuerade Beräkningar genom Gles Matrisfaktorisering

Hussein, Seif January 2023
Distributed computing is an approach where computationally heavy problems are broken down into more manageable sub-tasks, which can then be distributed across a number of different computers or servers, allowing for increased efficiency through parallelization. This thesis explores an established distributed computing setting, in which the computationally heavy task involves a number of users requesting a linearly separable function to be computed across several servers. This setting results in a condition for feasible computation and communication that can be described by a matrix factorization problem. Moreover, the costs associated with computation and communication are directly related to the number of nonzero elements of the matrix factors, making sparse factors desirable for minimal costs. The Alternating Direction Method of Multipliers (ADMM) is explored as a possible method of solving the sparse matrix factorization problem. To obtain convergence results, extensive convex analysis is conducted on the ADMM iterates, resulting in a theorem that characterizes the limiting points of the iterates as KKT points for the sparse matrix factorization problem. Using the results of the analysis, an algorithm is devised from the ADMM iterates, which can be applied to the sparse matrix factorization problem. Furthermore, an additional implementation is considered for a noisy scenario, in which existing theoretical results are used to justify convergence. Finally, numerical implementations of the devised algorithms are used to perform sparse matrix factorization.
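The link between cost and sparsity stated in this abstract can be made concrete with a small sketch. It assumes that the feasibility condition takes the form F = D · E and that the cost is simply the total number of nonzero entries in the two factors. The matrices, the names `F`, `D_a`, `E_a`, `D_b`, `E_b`, and the helper functions are hypothetical illustrations, not taken from the thesis, and no ADMM step is shown.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists-of-lists."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def nnz(m):
    """Number of nonzero entries, used here as a stand-in for cost."""
    return sum(1 for row in m for v in row if v != 0)


# Hypothetical demand matrix: 2 users requesting combinations of 3 subfunctions.
F = [[1, 1, 0],
     [0, 1, 1]]

# Factorization A: sparse factors.
D_a = [[1, 0], [0, 1]]
E_a = [[1, 1, 0], [0, 1, 1]]

# Factorization B: also exact, but with denser factors.
D_b = [[1, 1], [1, -1]]
E_b = [[0.5, 1, 0.5], [0.5, 0, -0.5]]

feasible_a = matmul(D_a, E_a) == F
feasible_b = matmul(D_b, E_b) == F
cost_a = nnz(D_a) + nnz(E_a)
cost_b = nnz(D_b) + nnz(E_b)
```

Both factorizations satisfy the assumed feasibility condition, but the first uses fewer nonzero entries, which is the property a sparse factorization method would aim for.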
34

Finite element modeling of electromagnetic radiation and induced heat transfer in the human body

Kim, Kyungjoo 24 September 2013
This dissertation develops adaptive hp-Finite Element (FE) technology and a parallel sparse direct solver enabling the accurate modeling of the absorption of Electro-Magnetic (EM) energy in the human head. With a large and growing number of cell phone users, the adverse health effects of EM fields have raised public concerns. Most research that attempts to explain the relationship between exposure to EM fields and its harmful effects on the human body identifies temperature changes due to the EM energy as the dominant source of possible harm. The research presented here focuses on determining the temperature distribution within the human body exposed to EM fields with an emphasis on the human head. Major challenges in accurately determining the temperature changes lie in the dependence of EM material properties on the temperature. This leads to a formulation that couples the BioHeat Transfer (BHT) and Maxwell equations. The mathematical model is formed by the time-harmonic Maxwell equations weakly coupled with the transient BHT equation. This choice of equations reflects the relevant time scales. With a mobile device operating at a single frequency, EM fields arrive at a steady-state in the micro-second range. The heat sources induced by EM fields produce a transient temperature field converging to a steady-state distribution on a time scale ranging from seconds to minutes; this necessitates the transient formulation. Since the EM material properties depend upon the temperature, the equations are fully coupled; however, the coupling is realized weakly due to the different time scales for Maxwell and BHT equations. The BHT equation is discretized in time with a time step reflecting the thermal scales. After multiple time steps, the temperature field is used to determine the EM material properties and the time-harmonic Maxwell equations are solved. The resulting heat sources are recalculated and the process continued. 
Due to the weak coupling of the problems, the corresponding numerical models are established separately. The BHT equation is discretized with H¹ conforming elements, and Maxwell equations are discretized with H(curl) conforming elements. The complexity of the human head geometry naturally leads to the use of tetrahedral elements, which are commonly employed by unstructured mesh generators. The EM domain, including the head and a radiating source, is terminated by a Perfectly Matched Layer (PML), which is discretized with prismatic elements. The use of high-order elements of different shapes and discretization types has motivated the development of a general 3D hp-FE code. In this work, we present new generic data structures and algorithms to perform adaptive local refinements on a hybrid mesh composed of differently shaped elements. A variety of isotropic and anisotropic refinements that preserve conformity of discretization are designed. The refinement algorithms support one-irregular meshes with the constrained approximation technique. The algorithms are experimentally proven to be deadlock free. A second contribution of this dissertation lies with a new parallel sparse direct solver that targets linear systems arising from hp-FE methods. The new solver interfaces to the hierarchy of a locally refined mesh to build an elimination ordering for the factorization that reflects the h-refinements. By following mesh refinements, not only the computation of element matrices but also their factorization is restricted to new elements and their ancestors. The solver is parallelized by exploiting two-level task parallelism: tasks are first generated from a parallel post-order tree traversal on the assembly tree; next, those tasks are further refined by using algorithms-by-blocks to gain fine-grained parallelism. The resulting fine-grained tasks are asynchronously executed after their dependencies are analyzed.
This approach effectively reduces scheduling overhead and increases flexibility to handle irregular tasks. The solver outperforms the conventional general sparse direct solver for a class of problems formulated by high order FEs. Finally, numerical results for a 3D coupled BHT with Maxwell equations are presented. The solutions of this Maxwell code have been verified using the analytic Mie series solutions. Starting with simple spherical geometry, parametric studies are conducted on realistic head models for a typical frequency band (900 MHz) of mobile phones.
35

Recommender System for Gym Customers

Sundaramurthy, Roshni January 2020
Recommender systems provide new opportunities for retrieving personalized information on the Internet. Due to the availability of big data, the fitness industry is now focusing on building efficient recommender systems for its end-users. This thesis investigates the possibilities of building an efficient recommender system for gym users. BRP Systems AB has provided the gym data for evaluation, which consists of approximately 896,000 customer interactions with 8 features. Four matrix factorization methods based on implicit feedback are applied to the data: latent semantic analysis using singular value decomposition, alternating least squares, Bayesian personalized ranking, and logistic matrix factorization. These methods decompose the implicit data matrix of user-gym group activity interactions into the product of two lower-dimensional matrices. They are used to calculate the similarities between the user and activity interactions, and the top-k recommendations are provided based on the resulting scores. The methods are evaluated with ranking metrics such as Precision@k, mean average precision (MAP) @k, area under the curve (AUC) score, and normalized discounted cumulative gain (NDCG) @k. A qualitative analysis is also performed to evaluate the results of the recommendations. For this specific dataset, the best-performing method is alternating least squares, which achieved around 90% AUC for the overall system and managed to give personalized recommendations to the users.
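The scoring and evaluation steps described in this abstract can be sketched as follows. The sketch assumes that user and activity factors have already been learned by some matrix factorization method, scores each activity by a dot product, and evaluates the top-k list with Precision@k. The factor values, the relevant set, and the function names are made up for illustration and do not come from the thesis or its dataset.

```python
def top_k(user_vec, item_factors, k):
    """Rank items by dot-product score and return the indices of the best k."""
    scored = [(sum(u * v for u, v in zip(user_vec, item)), idx)
              for idx, item in enumerate(item_factors)]
    # Sort by descending score; break ties by the smaller item index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommendations that are in the relevant set."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Hypothetical 2-dimensional latent factors for one user and four activities.
user_vec = [1.0, 0.0]
item_factors = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]

recommended = top_k(user_vec, item_factors, k=2)
relevant = {0, 2}  # hypothetical held-out interactions for this user
p_at_2 = precision_at_k(recommended, relevant, k=2)
```

Here the two highest-scoring activities are 0 and 3, and only activity 0 is in the held-out set, so Precision@2 is 0.5.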
36

ACCELERATING SPARSE MACHINE LEARNING INFERENCE

Ashish Gondimalla (14214179) 17 May 2024
Convolutional neural networks (CNNs) have become important workloads due to their impressive accuracy in tasks like image classification and recognition. Convolution operations are compute intensive, and this cost profoundly increases with newer and better CNN models. However, convolutions come with characteristics such as sparsity which can be exploited. In this dissertation, we propose three different works to capture sparsity for faster performance and reduced energy.
The first work is an accelerator design called SparTen for improving two-sided sparsity (i.e., sparsity in both filters and feature maps) convolutions with fine-grained sparsity. SparTen identifies efficient inner join as the key primitive for hardware acceleration of sparse convolution. In addition, SparTen proposes load balancing schemes for higher compute unit utilization. SparTen performs 4.7x, 1.8x and 3x better than a dense architecture, a one-sided architecture and SCNN, the previous state-of-the-art accelerator, respectively. The second work, BARISTA, scales up SparTen (and SparTen-like proposals) to large-scale implementation with as many compute units as recent dense accelerators (e.g., Google's Tensor Processing Unit) to achieve the full speedups afforded by sparsity. However, at such large scales, buffering, on-chip bandwidth, and compute utilization are highly intertwined: optimizing for one factor strains another and may invalidate some optimizations proposed in small-scale implementations. BARISTA proposes novel techniques to balance the three factors in large-scale accelerators. BARISTA performs 5.4x, 2.2x, 1.7x and 2.5x better than dense, one-sided, naively scaled two-sided and an iso-area two-sided architecture, respectively. The last work, EUREKA, builds an efficient tensor core to execute dense, structured and unstructured sparsity without losing efficiency. EUREKA achieves this by proposing novel techniques to improve compute utilization by slightly tweaking operand stationarity. EUREKA achieves speedups of 5x and 2.5x, along with energy reductions of 3.2x and 1.7x, over dense and structured sparse execution, respectively. EUREKA only incurs area and power overheads of 6% and 11.5%, respectively, over Ampere.
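The inner-join primitive mentioned in this abstract can be illustrated in software. The sketch below assumes both operands are stored as index-sorted lists of (index, value) pairs and multiplies only the positions where both a filter weight and a feature-map value are nonzero. It is a plain two-pointer merge chosen for readability; it is not a description of the SparTen hardware design, and the data and names are hypothetical.

```python
def sparse_inner_join_dot(a, b):
    """Dot product of two sparse vectors given as index-sorted (index, value) pairs.

    Returns the dot product and the number of matching index pairs, i.e. the
    multiplications that two-sided sparsity cannot skip.
    """
    i = j = 0
    total = 0.0
    matches = 0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        ib, vb = b[j]
        if ia == ib:
            total += va * vb
            matches += 1
            i += 1
            j += 1
        elif ia < ib:
            i += 1  # nonzero in a only: the product would be zero
        else:
            j += 1  # nonzero in b only: the product would be zero
    return total, matches


# Hypothetical sparse filter and feature-map slices of dense length 6.
filter_nz = [(0, 2.0), (3, 1.0), (5, -1.0)]
fmap_nz = [(3, 4.0), (4, 7.0), (5, 2.0)]

dot, useful_multiplies = sparse_inner_join_dot(filter_nz, fmap_nz)
```

For these inputs only indices 3 and 5 match, so two multiplications are performed instead of the six a dense dot product would need.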
37

CUDA-based Scientific Computing / Tools and Selected Applications

Kramer, Stephan Christoph 22 November 2012
No description available.
