11 |
Numerical Study of Coherent Structures within a legacy LES code and development of a new parallel Frame Work for their computation. Giammanco, Raimondo R. 22 December 2005 (has links)
Understanding the physics of Coherent Structures and their interaction with the remaining fluid motions is of paramount interest in Turbulence Research.
Indeed, it has recently been suggested that separating and understanding the different physical behavior of Coherent Structures and the "incoherent" background might very well be the key to understanding and predicting Turbulence. Available understanding of Coherent Structures shows that their size is considerably larger than the turbulent macro-scale, which makes it permissible to apply Large Eddy Simulation to their simulation and study, with the advantage of being able to study their behavior at higher Re and in more complex geometries than a Direct Numerical Simulation would normally allow. The original purpose of the present work was therefore to validate the use of Large Eddy Simulation for the study of Coherent Structures in Shear Layers and to apply it to different flow cases to study the effect of the flow topology on the nature of the Coherent Structures.
However, during the investigation of the presence of Coherent Structures in numerically generated LES flow fields, the aging in-house Large Eddy Simulation (LES) code of the Environmental & Applied Fluid Dynamics Department showed a series of limitations and shortcomings that led to the decision to relegate it to the status of Legacy Code (from now on indicated as the VKI LES legacy code) and to discontinue its development. A new natively parallel LES solver was then developed in the VKI Environmental & Applied Fluid Dynamics Department, in which all the shortcomings of the legacy code have been addressed and modern software technologies have been adopted both for the solver and for the surrounding infrastructure, delivering a complete framework based exclusively on Free and Open Source Software (FOSS) to maximize portability and avoid any dependency on commercial products. The new parallel LES solver retains some basic characteristics of the old legacy code to provide continuity with the past (Finite Differences, Staggered Grid arrangement, Multi-Domain technique, grid conformity across domains), but improves on almost all the remaining aspects: the flow can now be inhomogeneous in all three directions, against only two in the past; the pressure equation can be solved using a three-point stencil for improved accuracy; and the viscous and convective terms can be computed from discretized formulas derived automatically with the Computer Algebra System Maxima.
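The abstract does not reproduce the Maxima scripts, but the idea of letting a computer algebra system derive discretized formulas can be illustrated. The sketch below uses Python with SymPy rather than Maxima, and the three-point stencil for a first derivative is a generic example, not one of the solver's actual formulas:

```python
# A sketch only: the thesis uses the Maxima CAS; this SymPy version shows the
# same idea of deriving a finite-difference formula automatically.
import sympy as sp

x, h, a, b, c = sp.symbols("x h a b c")
f = sp.Function("f")

# Ansatz on the stencil {x-h, x, x+h}, with each neighbor expanded as a
# Taylor series around x up to second order.
approx = a * f(x - h) + b * f(x) + c * f(x + h)
series = sp.expand(approx.subs({
    f(x - h): f(x) - h * f(x).diff(x) + h**2 / 2 * f(x).diff(x, 2),
    f(x + h): f(x) + h * f(x).diff(x) + h**2 / 2 * f(x).diff(x, 2),
}))

# Match coefficients so the combination approximates f'(x).
sol = sp.solve([
    sp.Eq(series.coeff(f(x)), 0),             # no f(x) term
    sp.Eq(series.coeff(f(x).diff(x)), 1),     # unit coefficient on f'(x)
    sp.Eq(series.coeff(f(x).diff(x, 2)), 0),  # cancel the f''(x) term
], [a, b, c])
print(sol)  # {a: -1/(2*h), b: 0, c: 1/(2*h)}, i.e. (f(x+h) - f(x-h)) / (2h)
```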
For the convective terms, High Resolution Central Schemes have been adapted to the three-dimensional Staggered Grid Arrangement from a collocated bi-dimensional one, and a system of Master-Slave simulations has been developed to run in parallel a Slave simulation (on 1 Processing Element) that generates the inlet data for the Master simulation (on the remaining n - 1 Processing Elements). The code can perform Automatic Run-Time Load Balancing and Domain Auto-Partitioning, has embedded documentation (doxygen), and has a CVS repository (version management) for ease of use by new and old developers.
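The master-slave arrangement described here is a common message-passing pattern. As a rough sketch of the idea only, assuming an MPI-style layer (mpi4py below) and with placeholder functions standing in for the two simulations:

```python
# Illustrative sketch of the master/slave inlet-generation pattern; the actual
# solver is not public, so the sizes, tags and functions here are invented.
# Run with e.g.: mpiexec -n 4 python inlet_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
NPTS = 64        # assumed number of inlet-plane points
INLET_TAG = 7    # arbitrary message tag

def generate_inlet_plane(step):
    """Stand-in for the slave simulation producing one inlet plane."""
    return np.random.default_rng(step).standard_normal(NPTS)

def advance_master(inlet, step):
    """Stand-in for one time step of the master simulation."""
    pass

if rank == 0:
    # Slave: one Processing Element devoted to producing inlet data.
    for step in range(100):
        comm.Send(generate_inlet_plane(step), dest=1, tag=INLET_TAG)
elif rank == 1:
    # Master simulation on the remaining n-1 Processing Elements; rank 1
    # receives the inlet plane each step (and would share it with ranks 2..n-1).
    for step in range(100):
        plane = np.empty(NPTS)
        comm.Recv(plane, source=0, tag=INLET_TAG)
        advance_master(plane, step)
```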
As part of the new Framework, a set of Visual Programs has been provided for IBM Open Data eXplorer (OpenDX), a powerful FOSS flow visualization and analysis tool, intended as a replacement for the commercial Tecplot™, together with a bug-tracking mechanism via Bugzilla and cooperative forum resources (phpBB) for developers and users alike. The new M.i.O.m.a. (MiOma) Solver is ready to be used for Coherent Structures analysis in the near future.
|
12 |
Modeling and algorithm adaptation for a novel parallel DSP processor / Modellering och algorithm-anpassning för en ny parallell DSP-processor. Kraigher, Olof; Olsson, Johan. January 2009 (has links)
The P3RMA (Programmable, Parallel, and Predictable Random Memory Access) processor, currently being developed at Linköping University, Sweden, is an attempt to solve the problems of parallel computing by utilizing a parallel memory subsystem and splitting the complexity of address computations from the complexity of data computations. It is targeted at embedded low-power, low-cost computing for mobile phones, handsets and base stations, among many others. By studying the radix-2 FFT using the P3RMA concept we have shown that even algorithms with a complex addressing pattern can be adapted to fully utilize a parallel datapath while only requiring additional simple addressing hardware. By supporting this algorithm with a SIMT instruction, almost 100% utilization of the datapath can be achieved. A simulator framework for this processor has been proposed and implemented. The simulator has a very flexible structure, featuring modular addition of new instructions and configurable hardware parameters, and may be used by hardware and firmware developers in the future.
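The radix-2 FFT claim can be made concrete: every butterfly performs the same two-input arithmetic, and only the indices it touches follow a non-trivial pattern. The sketch below separates that address generation from the data computation; it is a plain Python model of the idea, not P3RMA code, and all names are illustrative:

```python
# Radix-2 DIT FFT with address generation (pure index arithmetic, the part a
# simple addressing unit could do) split from the uniform butterfly arithmetic
# (the part a parallel datapath could run several of at once).
import cmath

def butterfly_addresses(n):
    """Yield (i, j, twiddle_index) for each butterfly, stage by stage."""
    stages = n.bit_length() - 1
    for s in range(1, stages + 1):
        m = 1 << s
        for k in range(0, n, m):
            for t in range(m // 2):
                yield k + t, k + t + m // 2, t * (n // m)

def fft_radix2(x):
    """In-place iterative FFT driven by the address stream above."""
    n = len(x)
    # Bit-reversal permutation: again pure address computation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    # Data computation: identical two-input butterflies, SIMD/SIMT-friendly.
    for ia, ib, tw in butterfly_addresses(n):
        u, v = x[ia], x[ib] * w[tw]
        x[ia], x[ib] = u + v, u - v
    return x

print(fft_radix2([1, 1, 1, 1, 0, 0, 0, 0]))
```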
|
14 |
Cylindriska litiumjonbatterier – koncept för kommersiella fordon / Cylindrical cell format Lithium-Ion Battery concept for Commercial Vehicles. Willgård, Carl. January 2018 (has links)
In the process of optimizing and electrifying vehicles that use batteries, lithium-ion battery cells have been introduced into the vehicles. Most commonly, the manufacturer installs large battery cells (> 10 Ah) in the vehicles. A large cell has many advantages over a small cell: for example, it is easier to handle, the equipment required to monitor the cell is smaller, and no connections between multiple cells are required. On the other hand, there are many advantages to having smaller cells (< 5 Ah). The smaller cells could contribute to a lower cost and a more even heat distribution across the system, and above all they are easier to install mechanically in the vehicle. Companies most commonly use the larger cells, but there are a few examples in the private vehicle sector where manufacturers use the smaller cells. Using the smaller cells requires different thinking when it comes to cooling, packaging in the vehicles, and monitoring of the cells' hardware and software.
This project focused on the electrical and thermal aspects of implementing parallel-connected small lithium-ion cells in heavy vehicles, such as buses and lorries. Performance tests were carried out in which temperature, voltage and current were monitored across the cells. The aim was to increase knowledge in the area of these small cells, to see whether they have a potential place in the commercial market in the future. The goal of the project was to measure the spread of current between the parallel-connected cells while the temperature difference between the cells was varied. The experiments clearly show that the current does spread unevenly between the cells. However, the temperature difference tested does not affect the current distribution enough to show any measurable difference between the cells. The conclusion is therefore that a current spread occurs between parallel-connected cells, but that a temperature difference of ten degrees Celsius is not sufficient to affect the cells enough for the spread to grow larger. The project faced many challenges and difficulties, which left very little time for the experimental phase. A minimal number of tests were therefore completed, which means that the data collected is not as extensive as initially intended.
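As a back-of-the-envelope illustration of why current spreads between parallel-connected cells, consider identical cells whose internal resistance varies with temperature. The model below is not from the thesis: the resistance value, the Arrhenius-style temperature term and all parameter values are illustrative assumptions:

```python
# Toy model: identical-OCV cells in parallel share current inversely to their
# internal resistance, and resistance drops with temperature. All numbers are
# assumed for illustration, not measured values from the thesis.
import math

def cell_resistance(r_ref_ohm, temp_c, t_ref_c=25.0, e_a_over_r=2000.0):
    """Internal resistance with an assumed Arrhenius-like temperature term."""
    t, t_ref = temp_c + 273.15, t_ref_c + 273.15
    return r_ref_ohm * math.exp(e_a_over_r * (1.0 / t - 1.0 / t_ref))

def parallel_currents(total_current_a, resistances_ohm):
    """Ideal split: branch current proportional to branch conductance."""
    g = [1.0 / r for r in resistances_ohm]
    return [total_current_a * gi / sum(g) for gi in g]

# Two assumed small cells, one 10 degrees C warmer than the other.
r_cold = cell_resistance(0.020, 25.0)
r_warm = cell_resistance(0.020, 35.0)
print(parallel_currents(10.0, [r_cold, r_warm]))
# The warmer (lower-resistance) cell carries a slightly larger share of 10 A.
```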
|
15 |
Coil Sensitivity Estimation and Intensity Normalisation for Magnetic Resonance Imaging / Spolkänslighetsbestämning och intensitetsnormalisering för magnetresonanstomografi. Herterich, Rebecka; Sumarokova, Anna. January 2019 (has links)
The quest for improved efficiency in magnetic resonance imaging has motivated the development of strategies such as parallel imaging, where arrays of multiple receiver coils operate simultaneously. The objective of this project was to estimate the sensitivity profiles of phased-array coils from magnetic resonance images of the human body. These sensitivity maps can then be used to correct intensity inhomogeneity in the images. Through investigative work in Matlab, a script was developed that uses data embedded in the raw data from a magnetic resonance scan to generate coil sensitivities for each voxel of the volume of interest and to recalculate them into two-dimensional sensitivity maps of the corresponding diagnostic images. The resulting mapped sensitivity profiles can be used in Sensitivity Encoding, where a more exact reconstruction can be obtained using the carefully estimated sensitivity maps of the images.
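The thesis script itself (Matlab, driven by calibration data embedded in the scanner raw data) is not reproduced in the abstract. A common textbook estimate with the same flavor divides each coil image by the sum-of-squares combination; the numpy sketch below shows that approach and is an assumption, not necessarily the authors' method:

```python
# Sketch of a standard coil-sensitivity estimate. It relies on the relation
# coil_image = sensitivity * true_image, using the sum-of-squares (SoS)
# combination as a stand-in for the unknown true magnitude image.
import numpy as np

def estimate_sensitivities(coil_images, eps=1e-12):
    """coil_images: complex array of shape (n_coils, ny, nx)."""
    sos = np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
    return coil_images / (sos + eps)  # eps avoids division by zero in air

# Toy example: 4 coils, 64x64 complex images of random data.
rng = np.random.default_rng(0)
imgs = rng.standard_normal((4, 64, 64)) + 1j * rng.standard_normal((4, 64, 64))
sens = estimate_sensitivities(imgs)
print(sens.shape)  # (4, 64, 64); in SENSE these maps enter the unfolding solve
```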
|
16 |
Framgång genom modern luftmaktsteori? [Success through modern air power theory?] Rexling, Stefan. January 2016 (has links)
In the post-war period, a number of air power operations have been carried out. These wars have taken different courses: some operations achieved relatively easy success, while others had greater difficulty in achieving success effectively, despite superior resources. Through a case study of two modern air operations, Operation Desert Storm in Iraq in 1991 and Operation Allied Force in Kosovo in 1999, the significance of three air power theory variables for these operations' path to success is examined. The variables examined are drawn from the modern air power theory that John Warden helped to form, the purpose of which is to help explain how strategic paralysis of the opponent arises. The results of the study show that a systems perspective likely contributes to air operational success, whereas parallel attack cannot be shown to matter for achieving success. The third variable examined, control of the air, is found to be significant as a precondition for the functioning of the other two variables.
|
17 |
Prestanda- och beteendeanalys av parallella köer med iterator / Performance and behavior analysis of concurrent queues with iterator. Lodin, Viktor; Olovsson, Magnus. January 2014 (has links)
In modern hardware development there is a strong focus on producing processors with more and more cores. The software therefore also needs to evolve to make the best use of all this parallel potential. A large part of this is being able to share data between several parallel processes, which is achieved with concurrent collection data types. A common operation on a collection is to iterate over it. The goal of this study was to analyze the performance and behavior of a number of well-known algorithms for iterating over the queue collection. How different conditions affect iterator performance has also been evaluated; examples of such conditions are the number of worker threads operating on the queue, the initial size of the queue, and different pinning strategies. The initial size is the number of elements in the queue when an experiment starts, and a pinning strategy describes which core each thread binds itself to. Some iterator algorithms guarantee that the state returned is an atomic snapshot of the queue, that is, a picture of the queue as it looked at some fixed point in time. A further goal has therefore been to measure the cost of obtaining this guarantee. In addition, the performance of the enqueue and dequeue operations of each queue has been tested to get an overall view of queue performance.
To measure performance, a benchmark program was implemented. It provides an interface for all the queues to implement and tests the performance of each queue through that interface. The program runs microbenchmarks that measure the performance of each individual queue operation. The way the queues are stressed in these benchmarks is not realistic for how a queue would be used in production; instead, performance is measured under the highest possible load, which makes it easiest to compare the queues against each other.
The study tested the performance of four queues with iterators; the experiments were carried out in C# with .NET 4.5 in a Windows environment. The concurrent queue in the .NET library was one of the queues tested, partly because it is interesting to see how well Microsoft has optimized it, and partly to obtain a baseline to compare with the other queues. The Michael and Scott queue was also tested, with two different iterators added: Scan and Return and Double Collect. Finally, a concurrent queue constructed using universal methods for building concurrent data objects from sequential ones, based on the immutable queue in the .NET library, was tested. An immutable queue is a queue that cannot be modified after initialization.
The benchmark results show that the Michael and Scott queue with the Scan and Return iterator is the fastest at iteration, with the Double Collect iterator in second place. The fastest enqueue and dequeue operations are found in the .NET library's concurrent queue. The queue built on the immutable queue turns out to be the slowest at iteration in most cases, and the slowest at enqueue and dequeue in all cases. We measure the cost of obtaining an atomic snapshot guarantee as the difference between the Scan and Return and Double Collect iterators, since these are the two fastest iterators and Scan and Return does not provide the guarantee while Double Collect does. This cost turns out to be relatively large: Scan and Return performs up to three times as fast as Double Collect. With the results of this study, developers can now make well-informed choices about which queue and iterator algorithm to pick to optimize their systems. This is perhaps most important when developing larger systems, but can be useful for smaller ones as well. / Program: Systemarkitekturutbildningen
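The Double Collect iterator mentioned above relies on a classic snapshot idea: read the structure twice and accept the result only if nothing changed in between. The toy Python sketch below illustrates the retry loop with a version counter; the thesis implementations are C# iterators on the Michael and Scott queue, which this simplified model does not reproduce:

```python
# Toy double-collect snapshot: mutations bump a version counter, and the
# snapshot retries until two consecutive reads ("collects") see the same
# version, i.e. no mutation slipped in between.
import threading

class VersionedQueue:
    def __init__(self):
        self._items = []
        self._version = 0            # bumped on every mutation
        self._lock = threading.Lock()

    def enqueue(self, item):
        with self._lock:
            self._items.append(item)
            self._version += 1

    def dequeue(self):
        with self._lock:
            self._version += 1
            return self._items.pop(0) if self._items else None

    def _collect(self):
        return self._version, list(self._items)

    def snapshot(self):
        """Double collect: retry until two consecutive collects agree."""
        while True:
            v1, first = self._collect()
            v2, second = self._collect()
            if v1 == v2:             # unchanged between collects: atomic view
                return second

q = VersionedQueue()
q.enqueue(1); q.enqueue(2)
print(q.snapshot())  # [1, 2]
```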
|
18 |
Parallell beräkning av omslutande volymer / Parallel Computation of Bounding Volumes. Winberg, Olov; Karlsson, Mattias. January 2010 (has links)
This paper presents techniques for speeding up commonly used algorithms for bounding volume (BV) computation, such as the AABB, sphere and k-DOP. By exploiting the possibilities of parallelism in modern processors, the results exceed the expected theoretical speed-up. The methods focus on data-level parallelism (DLP) using Intel's SSE instructions for operations on 4 parallel independent single-precision floating-point values, with a theoretical speed-up factor of 4 on data throughput. Still, speed-up factors between 7 and 9 are shown in the computation of AABBs and k-DOPs. For the computation of tight-fitting spheres the speed-up factor halts at approximately 4 due to a limiting data dependency. In addition, further parallelization by multithreading the algorithms on multi-core CPUs shows speed-up factors of 14 on 2 cores, reaching 25 on 4 cores, compared to non-parallel algorithms.
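The data-level-parallelism idea can be sketched without SSE intrinsics: an AABB is a per-axis min/max reduction, so processing many points per operation instead of one at a time is what buys the speed-up. The numpy version below stands in for the authors' SSE code and makes no claim about their implementation:

```python
# Scalar reference loop vs. a vectorized reduction. numpy's min/max over the
# whole point array plays the role of the 4-wide SSE operations in the paper.
import numpy as np

def aabb_scalar(points):
    """Reference: one point per iteration, one component at a time."""
    lo = [float("inf")] * 3
    hi = [float("-inf")] * 3
    for p in points:
        for k in range(3):
            lo[k] = min(lo[k], p[k])
            hi[k] = max(hi[k], p[k])
    return lo, hi

def aabb_vectorized(points):
    """DLP analogue: a single per-axis reduction over all points."""
    pts = np.asarray(points, dtype=np.float32)
    return pts.min(axis=0), pts.max(axis=0)

pts = np.random.default_rng(1).random((100000, 3)).astype(np.float32)
print(aabb_vectorized(pts))  # (lo[x,y,z], hi[x,y,z]) of the point cloud
```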
|
19 |
Object Based Concurrency for Data Parallel Applications : Programmability and Effectiveness. Diaconescu, Roxana Elena. January 2002 (has links)
Increased programmability for concurrent applications in distributed systems requires automatic support for some of the concurrent computing aspects. These are: the decomposition of a program into parallel threads, the mapping of threads to processors, the communication between threads, and synchronization among threads.
Thus, a highly usable programming environment for data parallel applications strives to conceal data decomposition, data mapping, data communication, and data access synchronization.
This work investigates the problem of programmability and effectiveness for scientific, data parallel applications with irregular data layout. The complicating factor for such applications is the recursive, or indirection, data structure representation. That is, an efficient parallel execution requires a data distribution and mapping that ensure data locality. However, the recursive and indirect representations yield poor physical data locality. We examine the techniques for efficient, load-balanced data partitioning and mapping for irregular data layouts. Moreover, in the presence of non-trivial parallelism and data dependences, a general data partitioning procedure complicates arbitrarily locating distributed data across address spaces. We formulate the general data partitioning and mapping problems and show how a general data layout can be used to access data across address spaces in a location-transparent manner.
Traditional data parallel models promote instruction-level, or loop-level, parallelism. Compiler transformations and optimizations for discovering and/or increasing parallelism for Fortran programs apply to regular applications. However, many data-intensive applications are irregular (sparse matrix problems, applications that use general meshes, etc.). Discovering and exploiting fine-grain parallelism for applications that use indirection structures (e.g. indirection arrays, pointers) is very hard, or even impossible.
The work in this thesis explores a concurrent programming model that enables coarse-grain parallelism in a highly usable, efficient manner. Hence, it explores the issues of implicit parallelism in the context of objects as a means for encapsulating distributed data. The computation model results in a trivial SPMD (Single Program Multiple Data) structure, where the non-trivial parallelism aspects are solved automatically.
This thesis makes the following contributions:
- It formulates the general data partitioning and mapping problems for data parallel applications. Based on these formulations, it describes an efficient distributed data consistency algorithm.
- It describes a data parallel object model suitable for regular and irregular data parallel applications. Moreover, it describes an original technique to map data to processors so as to preserve locality. It also presents an inter-object consistency scheme that tries to minimize communication.
- It brings evidence of the efficiency of the data partitioning and consistency schemes. It describes a prototype implementation of a system supporting implicit data parallelism through distributed objects. Finally, it presents results showing that the approach is scalable on various architectures (e.g. Linux clusters, SGI Origin 3800).
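As a toy illustration of the partitioning and location-transparent access the abstract describes, the sketch below splits an irregular global index set into parts and builds owner and local-offset maps so a caller never needs to know where a datum lives. It is a single-process Python model with invented names, not the thesis system:

```python
# Toy partition-and-map model: global IDs are split into load-balanced chunks,
# and two lookup tables (owner, local offset) make reads location-transparent.
def partition(global_indices, n_parts):
    """Greedy block split of an irregular index list into n_parts chunks."""
    chunk = (len(global_indices) + n_parts - 1) // n_parts
    parts = [global_indices[i * chunk:(i + 1) * chunk] for i in range(n_parts)]
    owner = {g: p for p, idxs in enumerate(parts) for g in idxs}
    local = {g: i for idxs in parts for i, g in enumerate(idxs)}
    return parts, owner, local

def read(owner, local, storage, g):
    """The caller asks for global ID g; the maps hide which part owns it."""
    return storage[owner[g]][local[g]]

indices = [3, 17, 4, 42, 8, 23, 15, 16]   # irregular global IDs
parts, owner, local = partition(indices, 3)
storage = [[f"value-{g}" for g in idxs] for idxs in parts]
print(read(owner, local, storage, 42))     # -> value-42
```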
|