Global ETD Search

31	Algorithm/architecture codesign of low power and high performance linear algebra compute fabrics Pedram, Ardavan 27 September 2013
In the past, we could rely on technology scaling and new micro-architectural techniques to improve the performance of processors. Nowadays, both of these methods are reaching their limits. The primary concern in future architectures with billions of transistors on a chip and limited power budgets is power/energy efficiency. Full-custom design of application-specific cores can yield up to two orders of magnitude better power efficiency than conventional general-purpose cores. However, a tremendous design effort is required to integrate a new accelerator for each new application. In this dissertation, we present the design of specialized compute fabrics that maintain the efficiency of full-custom hardware while providing enough flexibility to execute a whole class of coarse-grain operations. The broad vision is to develop integrated and specialized hardware/software solutions that are co-optimized and co-designed across all layers, ranging from the basic hardware foundations all the way to the application programming support through standard linear algebra libraries. We try to address these issues specifically in the context of dense linear algebra applications. In the process, we pursue the main questions that architects will face while designing such accelerators. How broad is the class of applications that the accelerator can support? What are the limiting factors that prevent utilization of these accelerators on the chip? What is the maximum achievable performance/efficiency? Answering these questions requires expertise and careful codesign of the algorithms and the architecture to select the best possible components, datapaths, and data movement patterns, resulting in a more efficient hardware-software codesign.
In some cases, codesign reduces complexities that are imposed on the algorithm side due to the initial limitations in the architectures. We design a specialized Linear Algebra Processor (LAP) architecture and discuss the details of mapping matrix-matrix multiplication onto it. We further verify the flexibility of our design for computing a broad class of linear algebra kernels. We conclude that this architecture can perform a broad range of matrix-matrix operations as complex as matrix factorizations, and even Fast Fourier Transforms (FFTs), while maintaining its ASIC-level efficiency. We present a power-performance model that compares state-of-the-art CPUs and GPUs with our design. Our power-performance model reveals sources of inefficiencies in CPUs and GPUs. We demonstrate how to overcome such inefficiencies in the process of designing our LAP. As we progress through this dissertation, we introduce modifications of the original matrix-matrix multiplication engine to facilitate the mapping of more complex operations. We observe the resulting performance and efficiencies on the modified engine using our power estimation methodology. When compared to other conventional architectures for linear algebra applications and FFT, our LAP is over an order of magnitude better in terms of power efficiency. Based on our estimations, up to 55 and 25 GFLOPS/W single- and double-precision efficiencies are achievable on a single chip in standard 45 nm technology.
Low-power design Energy-aware systems Performance analysis and design aids Matrix multiplication Memory hierarchy Level-3 BLAS Special-purpose hardware Matrix factorization Fast Fourier transform
32	Kuna Yala – Effekter av lokal involvering inom destinationsutvecklingen : B-uppsats [Kuna Yala: Effects of local involvement in destination development: B-level essay] Jakobsson, Adam; Lindberg, Pontus January 2014
This essay examines how the local community is involved in tourism development on the San Blas Islands and how this affects the destination's development. The method was a qualitative interview with an operator that offers trips to the destination, complemented by a literature study presenting previous research on San Blas and the chosen subject areas. During the writing process it became obvious to us that the local community (Kuna Yala) is very much involved in everything that concerns the destination. We can thereby establish that the local community of San Blas can be placed on the top step of Arnstein's (1969) “Ladder of Participation”. This comprehensive involvement has both positive and negative effects on the destination's development. On the negative side, we can see that the service level is sometimes low due to a lack of knowledge and communication among the Kunas. The positive effects, however, show that visitors perceive the destination as more unique when they get to experience the real Kuna lifestyle, which also creates a sustainable social environment. Many Kunas make their living from the tourism industry, which ensures that visitors' money stays at the destination. The destination itself has not followed any traditional curve of destination development, which makes San Blas problematic to fit into Butler's model of the destination life cycle.
[Translated from the Swedish abstract] This essay has investigated how the local community on the San Blas Islands is involved in the tourism industry and how this affects the destination's tourism development. The chosen method was a qualitative interview with an operator offering trips to the destination. In addition, a literature study was carried out presenting previous research on the destination and the subject area. During the work on the essay it became clear that the local population (Kuna Yala) is involved to the highest degree in San Blas and influences all decisions concerning the destination. We can therefore conclude that the local population of San Blas can be placed on the upper rungs of Arnstein's (1969) model of citizen participation. This extensive involvement brings both positive and negative consequences for destination development. The negative consequence is that the service level sometimes becomes very low as a result of insufficient knowledge and communication among local residents. Among the positive effects mentioned is that visitors experience the destination as unique, since they get close to the Kuna Yala and experience their way of life, which creates a sustainable social relationship. Employment also rises because many within Kuna Yala make their living from tourism, and the high degree of participation reduces leakage, so that more money stays at the destination. The destination itself has not followed any traditional tourism development curve, which makes San Blas problematic to fit into Butler's model of the destination life cycle.
Community based tourism Destination development San Blas Islands Kuna Yala Social Sciences Interdisciplinary
33	PART I: TWO PIECES FOR ORCHESTRA: LOS NIÑOS HEROES AND EL PORFIRIATO. PART II: TWO COMPOSERS, BLAS GALINDO AND JOSE PABLO MONCAYO: AN ANALYSIS OF TWO WORKS WRITTEN DURING THE HEIGHT OF MEXICAN NATIONALISM Hernandez, Guillermo Alexandro, III 13 May 2015
No description available.
Music Mexican music Mexican nationalism Jose Pablo Moncayo Blas Galindo regional Mexican folk music mariachi son jarocho son jalisciense Huapango Sones de mariachi
34	Algorithmique hiérarchique parallèle haute performance pour les problèmes à N-corps [High-performance parallel hierarchical algorithms for N-body problems] Fortin, Pierre 27 November 2006
This thesis deals with the fast multipole method, which solves the N-body problem hierarchically with linear complexity for any precision. For the Laplace equation, we aim to handle efficiently all the particle distributions encountered in astrophysics and molecular dynamics.
We first study two distinct expressions of the main operator ("multipole-to-local") together with the associated error bounds. For both expressions, we present a matrix formulation whose implementation with BLAS (Basic Linear Algebra Subprograms) routines greatly improves computational efficiency. In the range of precisions of interest to us, this approach proves more efficient than the existing improvements (FFT, rotations, and plane waves), for both uniform and non-uniform distributions.
In addition to a new data structure for the underlying octree and algorithmic contributions to the adaptive version, we have also efficiently parallelized our method for shared memory and for distributed memory. Finally, comparisons with dedicated codes justify the value of our code for astrophysics simulations.
N-body problem Fast multipole method Barnes & Hut algorithm Laplace equation Poisson equation Astrophysics Molecular dynamics Error bound Fast Fourier Transform Rotations Plane waves BLAS routines Octree Parallelism Shared memory Distributed memory
35	Sur la validation numérique des codes de calcul industriels [On the numerical validation of industrial computational codes] Montan, Séthy 25 October 2013
Studying numerical quality is crucial for industrial codes such as those developed at EDF R&D. This is all the more important in the current context, where numerical simulations run on architectures capable of executing billions of floating-point operations per second. Studies have shown that the CADNA library is a suitable tool for the numerical validation of industrial codes. However, CADNA cannot simply be applied to large industrial codes, because these rely on external libraries (MPI, BLACS, BLAS, LAPACK). It is therefore necessary to develop extensions compatible with the CADNA tool. Implementing these extensions raises a performance problem, since the algorithmic complexity and the size of numerical computing software imply long execution times. For example, the direct implementation of CADNA in the BLAS matrix product routine DGEMM introduces an overhead of more than a factor of 1000 for a square matrix of size 1024. The reasons for this overhead are explained in this dissertation. We also present, through our DgemmCADNA routine, the methodology for reducing it. This routine reduced the overhead from a factor of 1100 to a factor of 35 relative to the GotoBLAS version. A second part of our work was devoted to studying the numerical quality of the Telemac-2D code. To validate the code fully, we implemented an extension of CADNA for the MPI standard. Numerical debugging with CADNA showed that more than 30% of the detected instabilities appear in dot products. Using compensated dot product algorithms improves the accuracy of the results without degrading the performance of the code.
Numerical simulation Industrial codes Numerical validation CADNA Compensated algorithms Scientific libraries BLAS MPI
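The closing remark of the abstract above, that compensated dot products improve accuracy, can be illustrated with a minimal sketch. It assumes the Dot2 scheme of Ogita, Rump and Oishi, built from the error-free transformations TwoSum and TwoProduct (the latter using Veltkamp splitting). The abstract does not say which compensated variant the thesis uses, and the function names below are illustrative only; they are not taken from CADNA or Telemac-2D.

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and s + e = a + b."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp splitting of a double into high and low parts (a = hi + lo)."""
    c = 134217729.0 * a  # 2**27 + 1, the splitting constant for IEEE doubles
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Error-free product: returns (p, e) with p = fl(a * b) and p + e = a * b."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = a_lo * b_lo - (((p - a_hi * b_hi) - a_lo * b_hi) - a_hi * b_lo)
    return p, e


def dot2(x, y):
    """Compensated dot product: rounding errors are accumulated separately."""
    p = 0.0  # running floating-point sum of products
    s = 0.0  # running sum of the rounding errors
    for xi, yi in zip(x, y):
        h, r = two_prod(xi, yi)
        p, q = two_sum(p, h)
        s += q + r
    return p + s


def naive_dot(x, y):
    """Plain left-to-right dot product, for comparison."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc
```

For the ill-conditioned input `x = [1e16, 1.0, -1e16]`, `y = [1.0, 1.0, 1.0]`, the naive loop loses the middle term to cancellation, while `dot2` recovers it from the accumulated error terms. The explicit loop in `naive_dot` is deliberate: recent Python versions compensate the built-in `sum` for floats, which would hide the effect.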
36	El daño al proyecto de vida en los casos de Gabriela Blas y del matrimonio igualitario en Chile [Damage to the life project in the cases of Gabriela Blas and of marriage equality in Chile] Turkieltaub del Fierro, Manuela January 2019
Thesis for the degree of Licenciado en Ciencias Jurídicas y Sociales (Licentiate in Legal and Social Sciences).
Damage to the Life Project (Daño al Proyecto de Vida) is a concept frequently used by the Inter-American Human Rights System that has gradually been developed through legal scholarship. However, it is not codified in positive law at either the international or the domestic level, which hinders its application and interpretation. This work seeks to consolidate the available information on the conceptualization of this category of damage in order to analyze its possible impact on the Chilean legal system. This is done by reviewing two paradigmatic national cases: that of the Aymara shepherdess Gabriela Blas, criminally convicted for the abandonment of her son, and that of the initiative led by the Movimiento de Integración y Liberación Homosexual to promote the incorporation of marriage equality into Chilean legislation. The main question and challenge of this research is to analyze the degree of flexibility that the concept of Damage to the Life Project allows and, a contrario sensu, to delimit its scope adequately. The task is not simple. It is approached by applying the requirements for establishing Damage to the Life Project, set mainly by legal scholarship, to situations not yet addressed by the Inter-American Court of Human Rights, such as collective rights and omission as the source of the violation.
These objectives necessarily involve relating our domestic legal system, in both its public and private facets, to the international human rights system in order to establish how the concept could be included in and applied to situations covered by Chilean law.
Human rights (International law) Inter-American system Due process Chile Damages Chile Same-sex marriage Marriage equality Damage to the life project Gabriela Blas case
37	Finite element modeling of electromagnetic radiation and induced heat transfer in the human body Kim, Kyungjoo 24 September 2013
This dissertation develops adaptive hp-Finite Element (FE) technology and a parallel sparse direct solver enabling the accurate modeling of the absorption of Electro-Magnetic (EM) energy in the human head. With a large and growing number of cell phone users, the adverse health effects of EM fields have raised public concerns. Most research that attempts to explain the relationship between exposure to EM fields and its harmful effects on the human body identifies temperature changes due to the EM energy as the dominant source of possible harm. The research presented here focuses on determining the temperature distribution within the human body exposed to EM fields, with an emphasis on the human head. Major challenges in accurately determining the temperature changes lie in the dependence of EM material properties on the temperature. This leads to a formulation that couples the BioHeat Transfer (BHT) and Maxwell equations. The mathematical model is formed by the time-harmonic Maxwell equations weakly coupled with the transient BHT equation. This choice of equations reflects the relevant time scales. With a mobile device operating at a single frequency, EM fields reach a steady state in the microsecond range. The heat sources induced by EM fields produce a transient temperature field converging to a steady-state distribution on a time scale ranging from seconds to minutes; this necessitates the transient formulation. Since the EM material properties depend upon the temperature, the equations are fully coupled; however, the coupling is realized weakly due to the different time scales for Maxwell and BHT equations. The BHT equation is discretized in time with a time step reflecting the thermal scales.
After multiple time steps, the temperature field is used to determine the EM material properties and the time-harmonic Maxwell equations are solved. The resulting heat sources are recalculated and the process is continued. Due to the weak coupling of the problems, the corresponding numerical models are established separately. The BHT equation is discretized with H¹ conforming elements, and Maxwell equations are discretized with H(curl) conforming elements. The complexity of the human head geometry naturally leads to the use of tetrahedral elements, which are commonly employed by unstructured mesh generators. The EM domain, including the head and a radiating source, is terminated by a Perfectly Matched Layer (PML), which is discretized with prismatic elements. The use of high order elements of different shapes and discretization types has motivated the development of a general 3D hp-FE code. In this work, we present new generic data structures and algorithms to perform adaptive local refinements on a hybrid mesh composed of different shaped elements. A variety of isotropic and anisotropic refinements that preserve conformity of discretization are designed. The refinement algorithms support one-irregular meshes with the constrained approximation technique. The algorithms are shown experimentally to be deadlock-free. A second contribution of this dissertation lies with a new parallel sparse direct solver that targets linear systems arising from hp-FE methods. The new solver interfaces to the hierarchy of a locally refined mesh to build an elimination ordering for the factorization that reflects the h-refinements. By following mesh refinements, not only the computation of element matrices but also their factorization is restricted to new elements and their ancestors.
The solver is parallelized by exploiting two-level task parallelism: tasks are first generated from a parallel post-order tree traversal on the assembly tree; next, those tasks are further refined by using algorithms-by-blocks to gain fine-grained parallelism. The resulting fine-grained tasks are asynchronously executed after their dependencies are analyzed. This approach effectively reduces scheduling overhead and increases flexibility to handle irregular tasks. The solver outperforms the conventional general sparse direct solver for a class of problems formulated by high order FEs. Finally, numerical results for a 3D problem coupling the BHT and Maxwell equations are presented. The solutions of this Maxwell code have been verified using the analytic Mie series solutions. Starting with simple spherical geometry, parametric studies are conducted on realistic head models for a typical frequency band (900 MHz) of mobile phones.
hp-FEM Hybrid mesh Local mesh refinement algorithms Electromagnetics Specific Absorption Rate Dielectric heating Gaussian elimination Directed Acyclic Graph Direct method LU Multi-core Multi-frontal OpenMP Sparse matrix Supernodes Task parallelism Unassembled HyperMatrix GPU Dense linear algebra Algorithms-by-blocks Heterogeneous architectures BLAS
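The weak (staggered) coupling described in the abstract above, where the thermal problem advances for several time steps before the electromagnetic problem is re-solved with updated material properties, can be sketched as a simple loop. This is a toy illustration under loud assumptions: both "solvers" are hypothetical lumped (zero-dimensional) stand-ins rather than finite element solves, every function name is invented for this sketch, and all constants are arbitrary placeholders with no physical meaning.

```python
# Arbitrary placeholder constants (NOT physical data from the dissertation).
T_REF = 37.0         # reference temperature
SIGMA_REF = 1.0      # conductivity at T_REF
SIGMA_SLOPE = 0.02   # relative conductivity change per degree
E_AMPLITUDE = 2.0    # field amplitude
PERFUSION = 0.5      # cooling coefficient pulling the temperature back to T_REF
HEAT_CAPACITY = 4.0  # lumped heat capacity


def em_heat_source(temperature):
    """Hypothetical stand-in for a time-harmonic Maxwell solve.

    Returns the absorbed power for a temperature-dependent conductivity.
    """
    sigma = SIGMA_REF * (1.0 + SIGMA_SLOPE * (temperature - T_REF))
    return 0.5 * sigma * E_AMPLITUDE ** 2


def thermal_step(temperature, source, dt):
    """Hypothetical stand-in for one transient bioheat step (explicit Euler)."""
    cooling = PERFUSION * (temperature - T_REF)
    return temperature + dt * (source - cooling) / HEAT_CAPACITY


def staggered_solve(t_initial, dt, n_steps, steps_per_em_update):
    """Advance the thermal problem, refreshing the EM source only periodically."""
    temperature = t_initial
    source = em_heat_source(temperature)  # initial EM solve
    em_solves = 1
    for step in range(1, n_steps + 1):
        temperature = thermal_step(temperature, source, dt)
        if step % steps_per_em_update == 0:
            # Material properties changed with temperature: re-solve the EM side.
            source = em_heat_source(temperature)
            em_solves += 1
    return temperature, em_solves
```

Calling `staggered_solve(37.0, 0.1, 100, 10)` performs 100 thermal steps but only 11 evaluations of the EM stand-in. That ratio is the point of the staggered scheme: the expensive field solve is repeated only as often as the slowly changing temperature requires.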
38	Algorithm-Architecture Co-Design for Dense Linear Algebra Computations Merchant, Farhad January 2015
Achieving high computation efficiency, in terms of Cycles per Instruction (CPI), for high-performance computing kernels is an interesting and challenging research area. Dense Linear Algebra (DLA) computation is a representative high-performance computing application, which is used, for example, in LU and QR factorizations. Unfortunately, modern off-the-shelf microprocessors fall significantly short of achieving the theoretical lower bound in CPI for high-performance computing applications. In this thesis, we perform an in-depth analysis of the available parallelism and propose suitable algorithmic and architectural variations to significantly improve the computation efficiency. There are two standard approaches for improving the computation efficiency: first, to perform application-specific architecture customization, and second, to do algorithmic tuning. In the same manner, we first perform a graph-based analysis of selected DLA kernels. From the various forms of parallelism thus identified, we design a custom processing element for improving the CPI. The processing elements are used as building blocks for a commercially available Coarse-Grained Reconfigurable Architecture (CGRA). By performing detailed experiments on a synthesized CGRA implementation, we demonstrate that our proposed algorithmic and architectural variations are able to achieve lower CPI compared to off-the-shelf microprocessors. We also benchmark against state-of-the-art custom implementations and report a higher energy-performance-area product. DLA computations are encountered in many engineering and scientific computing applications ranging from Computational Fluid Dynamics (CFD) to eigenvalue problems.
Traditionally, these applications are written using highly tuned High Performance Computing (HPC) software packages such as Linear Algebra Package (LAPACK) and/or Scalable Linear Algebra Package (ScaLAPACK). The basic building block for these packages is Basic Linear Algebra Subprograms (BLAS). Algorithms pertaining to LAPACK/ScaLAPACK are written in terms of BLAS to achieve high throughput. Despite extensive intellectual efforts in the development and tuning of these packages, there still exists scope for further tuning. In this thesis, we revisit the most prominent and widely used compute-bound algorithms, like General Matrix Multiplication (GMM), for further exploitation of Instruction Level Parallelism (ILP). We further look into LU and QR factorizations for generalizations and exhibit higher ILP in these algorithms. We first accelerate the sequential performance of the algorithms in BLAS and LAPACK and then focus on the parallel realization of these algorithms. Major contributions in algorithmic tuning in this thesis are as follows:
Algorithms: We present a graph-based analysis of GMM and discuss the different types of parallelism available in GMM. We present an analysis of Givens Rotation (GR) based QR factorization, where we improve GR and derive Column-wise GR (CGR), which can annihilate multiple elements of a column of a matrix simultaneously. We show that CGR requires fewer multiplications than GR. We generalize CGR further and derive Generalized GR (GGR), which can annihilate multiple elements of the columns of a matrix simultaneously. We show that the parallelism exhibited by GGR is much higher than that of GR and the Householder Transform (HT). We extend the generalizations to Square root Free GR (also known as Fast Givens Rotation) and Square root and Division Free GR (SDFG) and derive Column-wise Fast Givens and Column-wise SDFG. We also extend the generalization to complex matrices and derive Complex Column-wise Givens Rotation.
Coarse-Grained Reconfigurable Architectures (CGRAs) have gained popularity in the last decade due to their power and area efficiency. Furthermore, CGRAs like REDEFINE also exhibit support for domain customizations. REDEFINE is an array of Tiles where each Tile consists of a Compute Element and a Router. The Routers are responsible for on-chip communication, while Compute Elements in REDEFINE can be domain customized to accelerate the applications pertaining to the domain of interest. In this thesis, we consider the REDEFINE base architecture as a starting point and design a Processing Element (PE) that can execute algorithms in BLAS and LAPACK efficiently. We perform several architectural enhancements in the PE to approach the lower bound of the CPI. For parallel realization of BLAS and LAPACK, we attach this PE to the Router of REDEFINE. We achieve better area and power performance compared to earlier customized architectures for DLA. Major contributions in architecture in this thesis are as follows:
Architecture: We present the design of a PE for acceleration of GMM, which is a Level-3 BLAS operation. We methodically enhance the PE with different features to improve the performance of GMM. For efficient realization of LAPACK, we use a PE that can efficiently execute GMM and show better performance. For further acceleration of LU and QR factorizations in LAPACK, we identify macro operations encountered in LU and QR factorizations and realize them on a reconfigurable data-path, resulting in 25-30% lower run-time.
Dense Linear Algebra (DLA) Algorithms-Applications Parallelising Algorithmic Structural Variations Dense Linear Algebra General Matrix Multiplication (GMM) Column-wise Givens Rotation (CGR) QR Factorization Basic Linear Algebra Subprograms (BLAS) Linear Algebra Package (LAPACK) Electronic Systems Engineering
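As background for the Givens Rotation (GR) based QR factorization mentioned in the abstract above, the following minimal sketch shows the classical scheme, in which each rotation annihilates a single sub-diagonal element. It is deliberately the textbook baseline and not the column-wise (CGR) or generalized (GGR) variants derived in the thesis, whose details the abstract does not give. The function names are illustrative, and plain Python lists are used to keep the example dependency-free.

```python
import math


def givens(a, b):
    """Return (c, s) so that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def qr_givens(A):
    """Classical Givens QR of an m x n matrix (list of rows); returns (Q, R)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Work upward from the bottom row, zeroing one entry per rotation.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            # Apply the rotation to rows i-1 and i of R.
            for k in range(n):
                upper, lower = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * upper + s * lower
                R[i][k] = -s * upper + c * lower
            # Accumulate the transposed rotation into columns i-1 and i of Q,
            # so that the product Q * R stays equal to A.
            for k in range(m):
                left, right = Q[k][i - 1], Q[k][i]
                Q[k][i - 1] = c * left + s * right
                Q[k][i] = -s * left + c * right
    return Q, R
```

Each pass of the inner loop touches only two rows, which is why rotations acting on disjoint row pairs can in principle run concurrently. In this baseline every rotation removes exactly one element, and each one costs a square root and divisions in `givens`; those are the costs that the multi-element and square-root-free variants named in the abstract target.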
