Global ETD Search

351	Coralai: Emergent Ecosystems of Neural Cellular Automata Barbieux, Aidan A 01 March 2024 Artificial intelligence has traditionally been approached through centralized architectures and optimization of specific metrics on large datasets. However, the frontiers of fields spanning cognitive science, biology, physics, and computer science suggest that intelligence is better understood as a multi-scale, decentralized, emergent phenomenon. As such, scaling up approaches that mirror the natural world may be one of the next big advances in AI. This thesis presents Coralai, a framework for efficiently simulating the emergence of diverse artificial life ecosystems integrated with modular physics. The key innovations of Coralai include: 1) Hosting diverse Neural Cellular Automata organisms in the same simulation that can interact and evolve; 2) Allowing user-defined physics and weather that organisms adapt to and can utilize to enact environmental changes; 3) Hardware-acceleration using Taichi, PyTorch, and HyperNEAT, enabling interactive evolution of ecosystems with 500k evolved parameters on a grid of 1m+ 16-channel physics-governed cells, all in real-time on a laptop. Initial experiments with Coralai demonstrate the emergence of diverse ecosystems of organisms that employ a variety of strategies to compete for resources in dynamic environments. Key observations include competing mobile and sessile organisms, organisms that exploit environmental niches like dense energy sources, and cyclic dynamics of greedy dominance out-competed by resilience. Artificial Life Emergent Intelligence Neural Cellular Automata Open-ended Evolution Multi-scale Complexity Parallel Computing Integrative Biology
352	Linear Exact Repair Schemes for Distributed Storage and Secure Distributed Matrix Multiplication Valvo, Daniel William 08 May 2023 In this thesis we develop exact repair schemes capable of repairing or circumventing unavailable servers of a distributed network in the context of distributed storage and secure distributed matrix multiplication. We develop the (Λ, Γ, W, ⊙)-exact repair scheme framework for discussing both of these contexts and develop a multitude of explicit exact repair schemes utilizing decreasing monomial-Cartesian codes (DMC codes). Specifically, we construct novel DMC codes in the form of augmented Cartesian codes and rectangular monomial-Cartesian codes, as well as design exact repair schemes utilizing these constructions inspired by the schemes from Guruswami and Wootters [16] and Chen and Zhang [6]. In the context of distributed storage we demonstrate the existence of both high rate and low bandwidth systems based on these schemes, and we develop two methods to extend them to the l-erasure case. Additionally, we develop a family of hybrid schemes capable of attaining high rates, low bandwidths, and a balance in between which proves to be competitive compared to existing schemes. In the context of secure distributed matrix multiplication we develop similarly impactful schemes which have very competitive communication costs. We also construct an encoding algorithm based on multivariate interpolation and prove it is T-secure. / Doctor of Philosophy / Distributed networks may be thought of as networks of computers and/or servers which are capable of transmitting and receiving data from one another. For many applications it is possible for distributed networks to perform better than the sum of their constituent parts. In this thesis we will focus on the particular applications of distributed storage and secure distributed matrix multiplication.
A distributed storage system is a system that is capable of storing a single data file over every server in a distributed network. Distributed storage systems often come with exact repair schemes, which are algorithms designed to reconstruct the data from a server in the network given the data from the other servers. In particular, if a server on the network ever fails or is otherwise unavailable, an exact repair scheme can be used to repair the lost data from the server and maintain the original file. A distributed matrix multiplication scheme, on the other hand, is a process by which two matrices stored on a source server can be multiplied using a distributed network of helper servers. Again, if a helper server becomes unavailable during this process, we may use an exact repair scheme to circumvent this delay. The main goal of this thesis is to develop exact repair schemes for the distributed storage and secure distributed matrix multiplication contexts utilizing a mathematical object known as an evaluation code. We will develop several families of exact repair schemes which may be finely tuned to fit particular situations within these contexts, and we will compare these schemes to the existing schemes in the field. coding theory erasure recovery locally recoverable code linear exact repair scheme distributed storage matrix multiplication parallel computing field trace
353	In Situ Visualization of Performance Data in Parallel CFD Applications Falcao do Couto Alves, Rigel 19 January 2023 This thesis summarizes the work of the author on visualization of performance data in parallel Computational Fluid Dynamics (CFD) simulations. Current performance analysis tools are unable to show their data on top of complex simulation geometries (e.g. an aircraft engine). But in CFD simulations, performance is expected to be affected by the computations being carried out, which in turn are tightly related to the underlying computational grid. Therefore it is imperative that performance data is visualized on top of the same computational geometry from which it originates. However, performance tools have no native knowledge of the underlying mesh of the simulation. This scientific gap can be filled by merging the branches of HPC performance analysis and in situ visualization of CFD simulation data, which is done by integrating existing, well-established, state-of-the-art tools from each field. In this context, an extension for the open-source performance tool Score-P was designed and developed, which intercepts an arbitrary number of manually selected code regions (mostly functions) and sends their respective measurements (number of executions and cumulative time spent) to the visualization software ParaView through its in situ library, Catalyst, as if they were any other flow-related variable. Subsequently the tool was extended with the capacity to also show communication data (messages sent between MPI ranks) on top of the CFD mesh. Testing and evaluation are done with two industry-grade codes: Rolls-Royce’s CFD code, Hydra, and Onera, DLR and Airbus’ CFD code, CODA. It has also been noticed that current performance tools have limited capacity to display their data on top of three-dimensional, framed (i.e. time-stepped) representations of the cluster’s topology.
Parallel to that, in order for the approach not to be limited to codes which already have the in situ adapter, it was extended to take the performance data and display it, also in codes without in situ, on a three-dimensional, framed representation of the hardware resources being used by the simulation. Testing is done with the Multi-Grid and Block Tri-diagonal NAS Parallel Benchmarks (NPB), as well as with Hydra and CODA again. The benchmarks are used to explain how the new visualizations work, while real performance analyses are done with the industry-grade CFD codes. The proposed solution is able to provide concrete performance insights, which would not have been reached with the current performance tools and which motivated beneficial changes in the respective source code in real life. Finally, its overhead is discussed and proven to be suitable for usage with CFD codes. The dissertation provides a valuable addition to the state of the art of highly parallel CFD performance analysis and serves as a basis for further suggested research directions. info:eu-repo/classification/ddc/006 ddc:006
354	Cumulus - translating CUDA to sequential C++ : Simplifying the process of debugging CUDA programs / Cumulus - översätter CUDA till sekventiell C++ : En studie i hur felsökande av CUDA-program kan förenklas Blomkvist Karlsson, Vera January 2021 Due to their highly parallel architecture, Graphics Processing Units (GPUs) offer increased performance for programs benefiting from parallel execution. A range of technologies exist which allow GPUs to be used for general-purpose programming; NVIDIA’s CUDA platform is one example. CUDA makes it possible to combine source code written for GPUs and Central Processing Units (CPUs) in the same program. Those sections that benefit from parallel execution can be written as CUDA kernels and will be executed on the GPU. With CUDA it is common to have tens, or even hundreds, of thousands of threads running in parallel. While the high level of parallelism can offer significant performance increases for executed programs, it can also make CUDA programs hard to debug. Although debuggers for CUDA exist, they cannot be used in the same way as standard debuggers, and they do not reduce the difficulties of reasoning about parallel execution. As a result, developers may feel compelled to fall back to inefficient debugging methods, such as relying on print statements. This project examines two possible approaches for creating a tool which simplifies the process of debugging CUDA programs, by transforming a parallel CUDA program to a sequential program in another high level language: one method centered around the Clang Abstract Syntax Tree (AST), and the other method centered around LLVM Intermediate Representation (IR) code. The method using Clang was found to be the most suitable for the purpose of translating CUDA, as it enables modifying only select parts, such as kernels, of the input program. Thus, the tool Cumulus was developed as a Clang plugin.
Cumulus translates parallel CUDA code into sequential C++ code, allowing developers to use any method available for C++ debugging to debug their CUDA program. Cumulus is indicated to be a potential aid in debugging CUDA programs, by providing developers with increased flexibility. / Thanks to their highly parallel architecture, graphics processors can offer increased performance for programs that benefit from parallel execution. A range of technologies exist that allow graphics processors to be used not only for graphics computations but also for general-purpose computations. NVIDIA’s CUDA platform is one such technology. CUDA makes it possible to combine, in the same program, source code written to execute on a central processor with source code written to execute on a graphics processor. Code sections in a program that benefit from running in parallel can be written as a CUDA kernel, which is a function executed on the graphics processor. With CUDA it is not unusual to have tens of thousands, or even hundreds of thousands, of threads running in parallel. The very high level of parallelism can offer markedly increased performance for executed programs, but can at the same time make CUDA programs hard to debug. Dedicated debuggers for CUDA exist, but they cannot be used in the same way as standard debuggers, and they do not reduce the difficulty of reasoning about parallel computations. Because of this, developers may feel compelled to use inefficient debugging methods, such as relying on print statements. This project examines two possible methods for creating a tool that simplifies debugging of CUDA programs by translating a parallel CUDA program into a sequential program in a classic high-level programming language. One method is centered around Clang's AST, the other around LLVM IR code.
The method using Clang was found to be the most suitable for the purpose of translating CUDA code, since it allows translating only selected parts of the original program, such as kernels. Thus, the tool Cumulus was developed as a Clang plugin. Cumulus translates parallel CUDA code into serialized C++ code, which lets developers use all the methods available for debugging C++ programs to debug their CUDA programs. The evaluation of Cumulus indicates that the tool can serve as a possible aid in debugging CUDA programs by offering developers increased flexibility. Clang Code generation CUDA Debugging Parallel computing Computer Sciences
355	Parallel and Distributed Implementation of A Multilayer Perceptron Neural Network on A Wireless Sensor Network Gao, Zhenning 11 April 2014 No description available. Artificial Intelligence Computer Science Artificial Intelligence Artificial Neural Network Machine Learning Multilayer Perceptron Wireless Sensor Network Parallel Computing Distributed Computing
356	Fluid dynamics for the anisotropically expanding quark-gluon plasma Bazow, Dennis P. 11 October 2017 No description available. Physics Relativistic fluid dynamics Quark-gluon plasma Anisotropic dynamics Viscous hydrodynamics Boltzmann equation GPU CUDA Parallel computing
357	Digital Morphologies: Environmentally-Influenced Generative Forms Jenson, Sage 26 July 2017 No description available. Computer Science
358	A Parallel Algorithm for Query Adaptive, Locality Sensitive Hash Search Carraher, Lee A. 17 September 2012 No description available. Computer Science Locality Sensitive Hashing Approximate Nearest Neighbors CUDA GPU Image Search Distance Adaptive LSH Parallel Computing
359	Exploring High Performance SQL Databases with Graphics Processing Units Hordemann, Glen J. 26 November 2013 No description available. Computer Science GPU Graphics Processing Unit Database SQL SQLite Visualization Data Mining CUDA GPGPU Parallel Computing High Performance Computing NVIDIA
360	GPU-based Parallel Computing for Nonlinear Finite Element Deformation Analysis Mafi, Ramin 04 1900 Computer-based surgical simulation and non-rigid medical image registration in image-guided interventions are examples of applications that would benefit from real-time deformation simulation of soft tissues. The physics of deformation for biological soft tissue is best described by nonlinear continuum mechanics-based models, which can then be discretized by the Finite Element Method (FEM) for a numerical solution. Computational complexity of nonlinear FEM-based models has limited their use in real-time applications. The data-parallel nature and intense arithmetic operations in nonlinear FEM models are suitable for massive parallelization of the computations, in order to meet the response time requirements in such applications. This thesis is concerned with computational aspects of complex nonlinear deformation analysis problems with an emphasis on the speed of response using a parallel computing philosophy. It proposes a fast, accurate and scalable Graphics Processing Unit (GPU)-based implementation of the total Lagrangian FEM using implicit time integration for dynamic nonlinear deformation analysis. This is a general formulation valid for large deformations and strains and can account for material nonlinearities. A penalty method is used to satisfy the physical boundary constraints due to contact between deformable objects. The proposed set of optimized GPU kernels for computing the FEM matrices achieves more than 100 GFLOPS on a GTX 470 GPU device. The use of a novel vector assembly kernel and memory optimization strategies results in a performance gain of up to 25 GFLOPS in the PCG computations. / Doctor of Philosophy (PhD) Parallel Computing total Lagrangian Nonlinear Finite Element Real-time Surgical Simulation GPGPU Biomedical
