121
Massively Parallel Cartesian Discrete Ordinates Method for Neutron Transport Simulation / SN cartésien massivement parallèle pour la simulation neutronique. Moustafa, Salli, 15 December 2015 (has links)
La simulation haute-fidélité des coeurs de réacteurs nucléaires nécessite une évaluation précise du flux neutronique dans le coeur du réacteur. Ce flux est modélisé par l’équation de Boltzmann ou équation du transport neutronique. Dans cette thèse, on s’intéresse à la résolution de cette équation par la méthode des ordonnées discrètes (SN) sur des géométries cartésiennes. Cette méthode fait intervenir un schéma d’itérations à source, incluant un algorithme de balayage sur le domaine spatial qui regroupe l’essentiel des calculs effectués. Compte tenu du très grand volume de calcul requis par la résolution de l’équation de Boltzmann, de nombreux travaux antérieurs ont été consacrés à l’utilisation du calcul parallèle pour la résolution de cette équation. Jusqu’ici, ces algorithmes de résolution parallèles de l’équation du transport neutronique ont été conçus en considérant la machine cible comme une collection de processeurs mono-coeurs indépendants, et ne tirent donc pas explicitement profit de la hiérarchie mémoire et du parallélisme multi-niveaux présents sur les super-calculateurs modernes. Ainsi, la première contribution de cette thèse concerne l’étude et la mise en oeuvre de l’algorithme de balayage sur les super-calculateurs massivement parallèles modernes. Notre approche combine à la fois la vectorisation par des techniques de la programmation générique en C++, et la programmation hybride par l’utilisation d’un support d’exécution à base de tâches: PaRSEC. Nous avons démontré l’intérêt de cette approche grâce à des modèles de performances théoriques, permettant également de prédire le partitionnement optimal. Par ailleurs, dans le cas de la simulation des milieux très diffusifs tels que le coeur d’un REP, la convergence du schéma d’itérations à source est très lente. Afin d’accélérer sa convergence, nous avons implémenté un nouvel algorithme (PDSA), adapté à notre implémentation hybride. La combinaison de ces techniques nous a permis de concevoir une version massivement parallèle du solveur SN Domino. Les performances de la partie Sweep du solveur atteignent 33.9% de la performance crête théorique d’un super-calculateur à 768 cores. De plus, un calcul critique d’un réacteur de type REP 900MW à 26 groupes d’énergie mettant en jeu 10^12 DDLs a été résolu en 46 minutes sur 1536 coeurs. / High-fidelity nuclear reactor core simulations require a precise knowledge of the neutron flux inside the reactor core. This flux is modeled by the linear Boltzmann equation, also called the neutron transport equation. In this thesis, we focus on solving this equation using the discrete ordinates method (SN) on Cartesian meshes. This method involves a source iteration scheme including a sweep over the spatial mesh, which gathers the vast majority of the computations in the SN method. Due to the large amount of computations performed in the resolution of the Boltzmann equation, numerous research works have focused on optimizing the time to solution by developing parallel algorithms for solving the transport equation. However, these algorithms were designed by considering a super-computer as a collection of independent cores, and therefore do not explicitly take into account the memory hierarchy and multi-level parallelism available inside modern super-computers.
Therefore, we first proposed a strategy for designing an efficient parallel implementation of the sweep operation on modern architectures by combining the use of the SIMD paradigm, thanks to C++ generic programming techniques, with an emerging task-based runtime system: PaRSEC. We demonstrated the need for such an approach using theoretical performance models, which also predict optimal partitionings. Then we studied the challenge of converging the source iteration scheme in highly diffusive media such as PWR cores. We have implemented and studied the convergence of a new acceleration scheme (PDSA) that naturally suits our hybrid parallel implementation. The combination of all these techniques has enabled us to develop a massively parallel version of the SN Domino solver. It is capable of tackling the challenges posed by neutron transport simulations and compares favorably with state-of-the-art solvers such as Denovo. The performance of the PaRSEC implementation of the sweep operation reaches 6.1 Tflop/s on 768 cores, corresponding to 33.9% of the theoretical peak performance of this set of computational resources. For a typical 26-group PWR calculation involving 1.02×10^12 DoFs, the time to solution required by the Domino solver is 46 min using 1536 cores.
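To make the dependency structure of the sweep concrete, the following sketch processes a 2D Cartesian mesh anti-diagonal by anti-diagonal: every cell depends only on its upstream neighbours, so all cells of one diagonal can be updated in parallel. It is only an illustration in standard C++17 (the actual Domino solver expresses the sweep as a PaRSEC task graph, which is not reproduced here); the grid size, the cell update formula and the vacuum boundary are assumptions made for the example.

    // Hedged sketch: wavefront (KBA-style) ordering of a 2D transport sweep.
    // Cell (i,j) can be processed once its upstream neighbours (i-1,j) and
    // (i,j-1) are done, so all cells on one anti-diagonal i+j = d are
    // independent and can run in parallel.
    #include <algorithm>
    #include <execution>
    #include <utility>
    #include <vector>

    int main() {
        const int nx = 64, ny = 64;
        std::vector<double> psi(nx * ny, 0.0), source(nx * ny, 1.0);
        auto at = [&](int i, int j) -> double& { return psi[i * ny + j]; };

        for (int d = 0; d < nx + ny - 1; ++d) {            // sweep anti-diagonals in order
            std::vector<std::pair<int, int>> cells;
            for (int i = 0; i < nx; ++i) {
                int j = d - i;
                if (j >= 0 && j < ny) cells.emplace_back(i, j);
            }
            // Cells on one diagonal have no mutual dependencies: process them in parallel.
            std::for_each(std::execution::par, cells.begin(), cells.end(), [&](auto c) {
                auto [i, j] = c;
                double in_x = (i > 0) ? at(i - 1, j) : 0.0;  // upstream fluxes (vacuum boundary)
                double in_y = (j > 0) ? at(i, j - 1) : 0.0;
                at(i, j) = 0.5 * (in_x + in_y) + source[i * ny + j]; // stand-in for the cell balance
            });
        }
        return 0;
    }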
122
Vers un langage synchrone sûr et sécurisé / Towards a safe and secure synchronous language. Attar, Pejman, 12 December 2013 (has links)
Cette thèse propose une nouvelle approche du parallélisme et de la concurrence, posant les bases d'un langage de programmation à la fois sûr et "secure" (garantissant la sécurité des données), fondé sur une sémantique formelle claire et simple, tout en étant adapté aux architectures multi-cœur. Nous avons adopté le paradigme synchrone, dans sa variante réactive, qui fournit une alternative simple à la programmation concurrente standard en limitant l'impact des erreurs dépendant du temps ("data-races"). Dans un premier temps, nous avons considéré un langage réactif d'orchestration, DSL, dans lequel on fait abstraction de la mémoire (Partie 1). Dans le but de pouvoir traiter la mémoire et la sécurité, nous avons ensuite étudié (Partie 2) un noyau réactif, CRL, qui utilise un opérateur de parallélisme déterministe. Nous avons prouvé la réactivité bornée des programmes de CRL. Nous avons ensuite équipé CRL de mécanismes pour contrôler le flux d'information (Partie 3). Pour cela, nous avons d'abord étendu CRL avec des niveaux de sécurité pour les données, puis nous avons défini dans le langage étendu, SSL, un système de types permettant d'éviter les fuites d'information. Parallèlement (Partie 4), nous avons ajouté la mémoire à CRL, en proposant le modèle DSLM. En utilisant une notion d'agent, nous avons structuré la mémoire de telle sorte qu'il ne puisse y avoir de "data-races". Nous avons également étudié l'implémentation de DSLM sur les architectures multi-cœur, fondée sur la notion de site et de migration d'un agent entre les sites. L'unification de SSL et de DSLM est une piste pour un travail futur. / This thesis proposes a new approach to parallelism and concurrency, laying the basis for the design of a programming language with a clear and simple formal semantics, enjoying both safety and security properties, while lending itself to an implementation on multicore architectures. We adopted the synchronous programming paradigm, in its reactive variant, which provides a simple alternative to standard concurrent programming by limiting the impact of time-dependent errors ("data-races"). As a first step (Part 1), we considered a reactive orchestration language, DSL, which abstracts away from the memory. To set the basis for a formal treatment of memory and security, we then focussed on a reactive kernel, CRL, equipped with a deterministic parallel operator (Part 2). We proved bounded reactivity of CRL programs. Next, we enriched CRL with mechanisms for information flow control (Part 3). To this end, we first extended CRL with security levels for data. We then defined a type system on the extended language, SSL, which ensures the absence of information leaks. Finally, we added memory to CRL, as well as the notions of agent and site, thus obtaining the model DSLM (Part 4). We structured the memory in such a way that data-races cannot occur, neither within nor among agents. We also investigated the implementation of DSLM on multicore architectures, using the possibility of agent migration between sites. The unification of SSL and DSLM is left for future work.
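As a rough illustration of the synchronous-reactive execution model described above (not the formal semantics of CRL or DSLM), the sketch below runs a set of agents once per instant over agent-private state, with signals emitted at one instant becoming visible only at the next; the agent names and behaviours are invented for the example.

    // Hedged sketch: a synchronous-reactive round where every agent runs
    // exactly once per instant over its own private state, and signals emitted
    // during instant t are only visible at instant t+1. Because agents never
    // share mutable memory inside an instant, their execution order cannot
    // introduce data races (a parallel run would merge per-agent emit buffers).
    #include <cstdio>
    #include <functional>
    #include <set>
    #include <string>
    #include <vector>

    struct Environment {
        std::set<std::string> present;   // signals emitted at the previous instant
        std::set<std::string> emitted;   // signals emitted at the current instant
    };

    struct Agent {
        std::string name;
        int local_counter;               // private memory: only this agent touches it
        std::function<void(Agent&, Environment&)> step;
    };

    int main() {
        Environment env;
        std::vector<Agent> agents;
        agents.push_back({"producer", 0, [](Agent& self, Environment& e) {
            if (++self.local_counter % 2 == 0) e.emitted.insert("tick");
        }});
        agents.push_back({"consumer", 0, [](Agent& self, Environment& e) {
            if (e.present.count("tick")) std::printf("%s saw tick\n", self.name.c_str());
        }});

        for (int instant = 0; instant < 6; ++instant) {
            for (auto& a : agents) a.step(a, env);   // order of agents does not matter
            env.present = env.emitted;               // signals become visible next instant
            env.emitted.clear();
        }
        return 0;
    }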
123
Design and Implementation of an Audio Codec (AMR-WB) using Dataflow Programming Language CAL in the OpenDF Environment. Ali, Hazem; Patoary, Mohammad Nazrul Ishlam, January 2010 (has links)
Over the last three decades, computer architects have been able to achieve an increase in performance for single processors by, e.g., increasing clock speed, introducing cache memories and using instruction level parallelism. However, because of power consumption and heat dissipation constraints, this trend is going to cease. In recent times, hardware engineers have instead moved to new chip architectures with multiple processor cores on a single chip. With multi-core processors, applications can complete more total work than with one core alone. To take advantage of multi-core processors, we have to develop parallel applications that assign tasks to different cores. On each core, pipeline, data and task parallelization can be used to achieve higher performance. Dataflow programming languages are attractive for achieving parallelism because of their high-level, machine-independent, implicitly parallel notation and because of their fine-grain parallelism. These features are essential for obtaining effective, scalable utilization of multi-core processors.

In this thesis work we have parallelized an existing audio codec - Adaptive Multi-Rate Wideband (AMR-WB) - written in the C language for a single-core processor. The target platform is a multi-core ARM11 MP developer board. The final result of the effort is a working AMR-WB encoder implemented in CAL and running in the OpenDF simulator. The C specification of the AMR-WB encoder was analysed with respect to dataflow and parallelism. The final implementation was developed in the CAL Actor Language, with the goal of exposing available parallelism - different dataflows - as well as removing unwanted data dependencies. Our thesis work discusses the mapping techniques and guidelines that we followed, which can be used in any future work on mapping C-based applications to CAL. We also propose solutions for some specific dependencies that were revealed in the AMR-WB encoder analysis and suggest further investigation of possible modifications to the encoder to enable a more efficient implementation on a multi-core target system.
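The sketch below is a hedged C++ analogue of the dataflow style the thesis uses (the encoder itself is written in CAL, not C++): actors exchange tokens only through FIFO queues and fire when their input rules are satisfied, which is what allows a runtime to place actors on different cores. The arithmetic and the block size are placeholders, not parts of AMR-WB.

    // Hedged analogue of a CAL-style dataflow network in plain C++: actors
    // communicate only through FIFO token queues and fire whenever enough
    // input tokens are available, so each actor could run on its own core.
    #include <cstdio>
    #include <deque>

    using Fifo = std::deque<int>;

    // Actor 1: per-sample stage (illustrative arithmetic, not AMR-WB's).
    void preprocess(Fifo& in, Fifo& out) {
        while (!in.empty()) {             // firing rule: one input token per firing
            int sample = in.front(); in.pop_front();
            out.push_back(sample * 2);
        }
    }

    // Actor 2: consumes a fixed-size block of tokens per firing, like a subframe.
    void encode_block(Fifo& in, Fifo& out, int block = 4) {
        while ((int)in.size() >= block) { // firing rule: needs a full block
            int acc = 0;
            for (int k = 0; k < block; ++k) { acc += in.front(); in.pop_front(); }
            out.push_back(acc);           // one "encoded" token per block
        }
    }

    int main() {
        Fifo source, mid, sink;
        for (int i = 0; i < 16; ++i) source.push_back(i);  // pretend audio samples
        // A sequential schedule of the network; a multi-core runtime would
        // instead run the actors concurrently, synchronizing only on the FIFOs.
        preprocess(source, mid);
        encode_block(mid, sink);
        for (int token : sink) std::printf("%d ", token);
        std::printf("\n");
        return 0;
    }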
124
PARALLEL COMPUTING ALGORITHMS FOR TANDEM. 2013 April 1900 (has links)
Tandem mass spectrometry, also known as MS/MS, is an analytical technique to measure the mass-to-charge ratio of charged ions; it is widely used in genomics, proteomics and metabolomics. There are two types of automatic methods to interpret tandem mass spectra: de novo methods and database searching methods. Both of them require massive computational resources and complicated comparison algorithms. The real-time peptide-spectrum matching (RT-PSM) algorithm is a database searching method that interprets tandem mass spectra under strict time constraints. Restricted by the hardware and architecture of an individual workstation, the RT-PSM algorithm has to sacrifice accuracy in order to provide the prerequisite processing speed. The peptide-spectrum similarity scoring module is the most time-consuming of the four modules in the RT-PSM algorithm, and it is also the core of the algorithm.
In this study, a multi-core computing algorithm is developed for individual workstations. Moreover, a distributed computing algorithm is designed for a cluster. The improved algorithms achieve the speed requirement of RT-PSM without sacrificing accuracy. With some expansion, the distributed computing algorithm can also support different PSM algorithms. Simulation results show that, compared with the original RT-PSM, the parallelized version achieves a 25 to 34 times speed-up depending on the individual workstation. A cluster with 240 CPU cores accelerates the similarity scoring module 210 times compared with the single-threaded similarity scoring module, and the whole peptide identification process 85 times compared with the single-threaded peptide identification process.
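A minimal sketch of the multi-core idea, assuming a simple dot-product stand-in for the real peptide-spectrum similarity score: candidate peptides are split cyclically across worker threads, each thread keeps its own best match, and the results are reduced after the join. The spectrum sizes and thread-local bookkeeping are illustrative only, not the thesis's implementation.

    // Hedged sketch of parallel peptide-spectrum similarity scoring.
    #include <algorithm>
    #include <cstdio>
    #include <thread>
    #include <vector>

    using Spectrum = std::vector<double>;

    double similarity(const Spectrum& a, const Spectrum& b) {
        double s = 0.0;
        for (size_t i = 0; i < std::min(a.size(), b.size()); ++i) s += a[i] * b[i];
        return s;  // stand-in for a real peptide-spectrum match score
    }

    int main() {
        Spectrum query(64, 1.0);
        std::vector<Spectrum> candidates(10000, Spectrum(64, 0.5));
        unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
        std::vector<double> best(nthreads, -1.0);
        std::vector<size_t> best_idx(nthreads, 0);
        std::vector<std::thread> pool;

        for (unsigned t = 0; t < nthreads; ++t)
            pool.emplace_back([&, t] {
                for (size_t i = t; i < candidates.size(); i += nthreads) {  // cyclic split
                    double s = similarity(query, candidates[i]);
                    if (s > best[t]) { best[t] = s; best_idx[t] = i; }      // thread-local best
                }
            });
        for (auto& th : pool) th.join();

        unsigned w = std::max_element(best.begin(), best.end()) - best.begin();
        std::printf("best candidate %zu, score %.2f\n", best_idx[w], best[w]);
        return 0;
    }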
125
Parallel algorithms for target tracking on multi-core platform with mobile LEGO robots. Wahlberg, Fredrik, January 2011 (has links)
The aim of this master thesis was to develop a versatile and reliable experimental platform of mobile robots, solving tracking problems, for education and research. Evaluation of parallel bearings-only tracking and control algorithms on a multi-core architecture has been performed. The platform was implemented as a mobile wireless sensor network using multiple mobile robots, each using a mounted camera for data acquisition. Data processing was performed on the mobile robots and on a server, which also played the role of network communication hub. A major focus was to implement this platform in a flexible manner to allow for education and future research in the fields of signal processing, wireless sensor networks and automatic control. The implemented platform was intended to act as a bridge between the ideal world of simulation and the non-ideal real world of full-scale prototypes.

The implemented algorithms estimated the positions of the robots, estimated a non-cooperating target's position and regulated the positions of the robots. The tracking algorithms implemented were the Gaussian particle filter, the globally distributed particle filter and the locally distributed particle filter. The regulator tried to move the robots to give the highest possible sensor information under given constraints. The regulators implemented used model predictive control algorithms. Code for communicating with filters in external processes was implemented, together with tools for data extraction and statistical analysis.

Both implementation details and evaluation of the different tracking algorithms are presented. Some algorithms have been tested as examples of the platform's capabilities, among them the scalability and accuracy of some particle filtering techniques. The filters performed with sufficient accuracy and showed a close to linear speedup using up to 12 processor cores. Performance of parallel particle filtering with constraints on network bandwidth was also studied, measuring breakpoints on filter communication to avoid weight starvation. Quality of the sensor readings, network latency and hardware performance are discussed. Experiments showed that the platform was a viable alternative for data acquisition in algorithm development and for benchmarking on a multi-core architecture. The platform was shown to be flexible enough to be used as a framework for future algorithm development and education in automatic control.
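As an illustration of one bearings-only tracking step (a bootstrap-style filter with made-up noise levels, not the thesis's Gaussian or distributed particle filters), the sketch below predicts each particle, weights it by how well its bearing from the sensor matches the measured bearing, and resamples; in a parallel or distributed setting the particle set would be partitioned across cores or robots.

    // Hedged single-step sketch of a bearings-only particle filter.
    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <vector>

    struct Particle { double x, y, w; };

    int main() {
        std::mt19937 rng(42);
        std::normal_distribution<double> motion_noise(0.0, 0.1);
        const double sensor_x = 0.0, sensor_y = 0.0;
        const double measured_bearing = 0.6;      // radians, e.g. from the camera
        const double bearing_sigma = 0.05;

        std::vector<Particle> p(1000, {2.0, 1.0, 1.0});
        double wsum = 0.0;
        for (auto& q : p) {                                    // predict + weight
            q.x += motion_noise(rng);
            q.y += motion_noise(rng);
            double predicted = std::atan2(q.y - sensor_y, q.x - sensor_x);
            double err = predicted - measured_bearing;
            q.w = std::exp(-0.5 * err * err / (bearing_sigma * bearing_sigma));
            wsum += q.w;
        }
        std::vector<double> weights;                           // multinomial resampling
        for (auto& q : p) weights.push_back(q.w / wsum);
        std::discrete_distribution<int> pick(weights.begin(), weights.end());
        std::vector<Particle> resampled;
        for (size_t i = 0; i < p.size(); ++i) resampled.push_back(p[pick(rng)]);

        double ex = 0.0, ey = 0.0;                             // posterior mean estimate
        for (auto& q : resampled) { ex += q.x; ey += q.y; }
        std::printf("estimate: (%.2f, %.2f)\n", ex / resampled.size(), ey / resampled.size());
        return 0;
    }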
126
Effiziente Mehrkernarchitektur für eingebettete Java-Bytecode-Prozessoren / Efficient multi-core architecture for embedded Java bytecode processors. Zabel, Martin, 21 February 2012 (has links) (PDF)
The Java platform offers many advantages for the rapid development of complex software. Java (bytecode) processors, which support Java bytecode as their native instruction set, are particularly well suited for executing Java bytecode on embedded systems. This thesis examines in detail the design of a multi-core architecture for Java processors that efficiently exploits the thread-level parallelism already present in Java programs. A dedicated trace architecture is used for the functional and performance evaluation of a prototype. A high performance gain is achieved at only a small additional hardware cost, and a higher performance than known alternative approaches is reached.
127
Extraktion und Identifikation von Entitäten in Textdaten im Umfeld der Enterprise Search / Extraction and identification of entities in text data in the field of enterprise search. Brauer, Falk, January 2010 (has links)
Die automatische Informationsextraktion (IE) aus unstrukturierten Texten ermöglicht völlig neue Wege, auf relevante Informationen zuzugreifen und deren Inhalte zu analysieren, die weit über bisherige Verfahren zur Stichwort-basierten Dokumentsuche hinausgehen. Die Entwicklung von Programmen zur Extraktion von maschinenlesbaren Daten aus Texten erfordert jedoch nach wie vor die Entwicklung von domänenspezifischen Extraktionsprogrammen.
Insbesondere im Bereich der Enterprise Search (der Informationssuche im Unternehmensumfeld), in dem eine große Menge von heterogenen Dokumenttypen existiert, ist es oft notwendig, ad-hoc Programmmodule zur Extraktion von geschäftsrelevanten Entitäten zu entwickeln, die mit generischen Modulen in monolithischen IE-Systemen kombiniert werden. Dieser Umstand ist insbesondere kritisch, da potentiell für jeden einzelnen Anwendungsfall ein von Grund auf neues IE-System entwickelt werden muss.
Die vorliegende Dissertation untersucht die effiziente Entwicklung und Ausführung von IE-Systemen im Kontext der Enterprise Search und effektive Methoden zur Ausnutzung bekannter strukturierter Daten im Unternehmenskontext für die Extraktion und Identifikation von geschäftsrelevanten Entitäten in Dokumenten. Grundlage der Arbeit ist eine neuartige Plattform zur Komposition von IE-Systemen auf Basis der Beschreibung des Datenflusses zwischen generischen und anwendungsspezifischen IE-Modulen. Die Plattform unterstützt insbesondere die Entwicklung und Wiederverwendung von generischen IE-Modulen und zeichnet sich durch eine höhere Flexibilität und Ausdrucksmächtigkeit im Vergleich zu vorherigen Methoden aus.
Ein in der Dissertation entwickeltes Verfahren zur Dokumentverarbeitung interpretiert den Datenaustausch zwischen IE-Modulen als Datenströme und ermöglicht damit eine weitgehende Parallelisierung von einzelnen Modulen. Die autonome Ausführung der Module führt zu einer wesentlichen Beschleunigung der Verarbeitung von Einzeldokumenten und verbesserten Antwortzeiten, z. B. für Extraktionsdienste. Bisherige Ansätze untersuchen lediglich die Steigerung des durchschnittlichen Dokumentendurchsatzes durch verteilte Ausführung von Instanzen eines IE-Systems.
Die Informationsextraktion im Kontext der Enterprise Search unterscheidet sich z. B. von der Extraktion aus dem World Wide Web dadurch, dass in der Regel strukturierte Referenzdaten z. B. in Form von Unternehmensdatenbanken oder Terminologien zur Verfügung stehen, die oft auch die Beziehungen von Entitäten beschreiben. Entitäten im Unternehmensumfeld haben weiterhin bestimmte Charakteristiken: Eine Klasse von relevanten Entitäten folgt bestimmten Bildungsvorschriften, die nicht immer bekannt sind, auf die aber mit Hilfe von bekannten Beispielentitäten geschlossen werden kann, so dass unbekannte Entitäten extrahiert werden können. Die Bezeichner der anderen Klasse von Entitäten haben eher umschreibenden Charakter. Die korrespondierenden Umschreibungen in Texten können variieren, wodurch eine Identifikation derartiger Entitäten oft erschwert wird.
Zur effizienteren Entwicklung von IE-Systemen wird in der Dissertation ein Verfahren untersucht, das alleine anhand von Beispielentitäten effektive Reguläre Ausdrücke zur Extraktion von unbekannten Entitäten erlernt und damit den manuellen Aufwand in derartigen Anwendungsfällen minimiert. Verschiedene Generalisierungs- und Spezialisierungsheuristiken erkennen Muster auf verschiedenen Abstraktionsebenen und schaffen dadurch einen Ausgleich zwischen Genauigkeit und Vollständigkeit bei der Extraktion. Bekannte Regellernverfahren im Bereich der Informationsextraktion unterstützen die beschriebenen Problemstellungen nicht, sondern benötigen einen (annotierten) Dokumentenkorpus.
Eine Methode zur Identifikation von Entitäten, die durch Graph-strukturierte Referenzdaten vordefiniert sind, wird als dritter Schwerpunkt untersucht. Es werden Verfahren konzipiert, welche über einen exakten Zeichenkettenvergleich zwischen Text und Referenzdatensatz hinausgehen und Teilübereinstimmungen und Beziehungen zwischen Entitäten zur Identifikation und Disambiguierung heranziehen. Das in der Arbeit vorgestellte Verfahren ist bisherigen Ansätzen hinsichtlich der Genauigkeit und Vollständigkeit bei der Identifikation überlegen. / The automatic information extraction (IE) from unstructured texts enables new ways to access relevant information and analyze text contents, which goes beyond existing technologies for keyword-based search in document collections. However, the development of systems for extracting machine-readable data from text still requires the implementation of domain-specific extraction programs.
In particular in the field of enterprise search (the retrieval of information in enterprise settings), in which a large amount of heterogeneous document types exists, it is often necessary to develop ad-hoc program modules and to combine them with generic program components in order to extract business-relevant entities. This is particularly critical, as potentially for each individual application a new IE system must be developed from scratch.
In this work we examine efficient methods to develop and execute IE systems in the context of enterprise search and effective algorithms to exploit pre-existing structured data in the business context for the extraction and identification of business entities in documents. The basis of this work is a novel platform for composition of IE systems through the description of the data flow between generic and application-specific IE modules. The platform supports in particular the development and reuse of generic IE modules and is characterized by a higher flexibility as compared to previous methods.
A technique developed in this work interprets the document processing as data stream between IE modules and thus enables an extensive parallelization of individual modules. The autonomous execution of each module allows for a significant runtime improvement for individual documents and thus improves response times, e.g. for extraction services. Previous parallelization approaches focused only on an improved throughput for large document collections, e.g., by leveraging distributed instances of an IE system.
Information extraction in the context of enterprise search differs for instance from the extraction from the World Wide Web by the fact that usually a variety of structured reference data (corporate databases or terminologies) is available, which often describes the relationships among entities. Furthermore, entity names in a business environment usually follow special characteristics: On the one hand relevant entities such as product identifiers follow certain patterns that are not always known beforehand, but can be inferred using known sample entities, so that unknown entities can be extracted. On the other hand many designators have a more descriptive character (concatenation of descriptive words). The respective references in texts might differ due to the diversity of potential descriptions, often making the identification of such entities difficult.
To address IE applications in the presence of available structured data, we study in this work the inference of effective regular expressions from given sample entities. Various generalization and specialization heuristics are used to identify patterns at different syntactic abstraction levels and thus generate regular expressions which promise both high recall and precision. Compared to previous rule learning techniques in the field of information extraction, our technique does not require any annotated document corpus.
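A toy version of this idea, under the assumption that generalization simply replaces runs of letters and digits by character classes (the thesis's heuristics are considerably richer and balance precision against recall), could look as follows; the sample identifiers are invented.

    // Hedged sketch: abstract sample entities into a candidate regular
    // expression and check that it still covers all of the samples.
    #include <cctype>
    #include <cstdio>
    #include <regex>
    #include <string>
    #include <vector>

    // Abstract a literal string into a regex of character-class runs,
    // e.g. "AB-1234" -> "[A-Za-z]+\-[0-9]+".
    std::string generalize(const std::string& s) {
        std::string out;
        char prev = 0;
        for (char c : s) {
            char cls = std::isdigit((unsigned char)c) ? 'd'
                     : std::isalpha((unsigned char)c) ? 'a' : c;
            if (cls == prev) continue;                    // merge runs of one class
            if (cls == 'd') out += "[0-9]+";
            else if (cls == 'a') out += "[A-Za-z]+";
            else { out += '\\'; out += c; }
            prev = cls;
        }
        return out;
    }

    int main() {
        // Hypothetical product identifiers serving as sample entities.
        std::vector<std::string> samples = {"AB-1234", "XY-77", "QRS-90210"};
        std::string pattern = generalize(samples[0]);
        std::regex re(pattern);
        bool covers_all = true;
        for (const auto& s : samples)
            covers_all = covers_all && std::regex_match(s, re);
        std::printf("candidate pattern %s covers all samples: %s\n",
                    pattern.c_str(), covers_all ? "yes" : "no");
        return 0;
    }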
A method for the identification of entities that are predefined by graph structured reference data is examined as a third contribution. An algorithm is presented which goes beyond an exact string comparison between text and reference data set. It allows for an effective identification and disambiguation of potentially discovered entities by exploitation of approximate matching strategies. The method leverages further relationships among entities for identification and disambiguation. The method presented in this work is superior to previous approaches with regard to precision and recall.
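A minimal sketch of the approximate-matching part, assuming plain edit distance with a fixed tolerance (the relationship-based disambiguation described above is omitted); the reference entities and the noisy mention are made up.

    // Hedged sketch: link a noisy text mention to the closest reference
    // entity if the edit distance stays below a threshold.
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    size_t edit_distance(const std::string& a, const std::string& b) {
        std::vector<size_t> prev(b.size() + 1), cur(b.size() + 1);
        for (size_t j = 0; j <= b.size(); ++j) prev[j] = j;
        for (size_t i = 1; i <= a.size(); ++i) {
            cur[0] = i;
            for (size_t j = 1; j <= b.size(); ++j)
                cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1,
                                   prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)});
            std::swap(prev, cur);
        }
        return prev[b.size()];
    }

    int main() {
        // Made-up reference data set and a misspelled mention from a document.
        std::vector<std::string> reference = {"Acme Corporation", "Acme Labs GmbH", "Globex GmbH"};
        std::string mention = "Acme Coporation";
        size_t best = 0, best_d = SIZE_MAX;
        for (size_t i = 0; i < reference.size(); ++i) {
            size_t d = edit_distance(mention, reference[i]);
            if (d < best_d) { best_d = d; best = i; }
        }
        if (best_d <= 2)                                   // tolerance for small spelling variations
            std::printf("'%s' identified as '%s' (distance %zu)\n",
                        mention.c_str(), reference[best].c_str(), best_d);
        return 0;
    }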
128
High-performance direct solution of finite element problems on multi-core processors. Guney, Murat Efe, 04 May 2010 (has links)
A direct solution procedure is proposed and developed which exploits the parallelism that exists in current symmetric multiprocessing (SMP) multi-core processors. Several algorithms are proposed and developed to improve the performance of the direct solution of FE problems. A high-performance sparse direct solver is developed which allows experimentation with the newly developed and existing algorithms. The performance of the algorithms is investigated using a large set of FE problems. Furthermore, operation count estimates are developed to further assess the various algorithms. An out-of-core version of the solver is developed to reduce the memory requirements of the solution. I/O is performed asynchronously, without blocking the thread that makes the I/O request. Asynchronous I/O allows overlapping factorization and triangular solution computations with I/O. The performance of the developed solver is demonstrated on a large number of test problems. A problem with nearly 10 million degrees of freedom is solved on a low-priced desktop computer using the out-of-core version of the direct solver. Furthermore, the developed solver usually outperforms a commonly used shared-memory solver.
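The overlap of factorization with asynchronous I/O can be pictured with a simple double-buffering loop: while one block is being processed, the read of the next block is already in flight. The sketch below uses std::async as a stand-in for the solver's asynchronous I/O layer; the block contents and the "factorize" kernel are placeholders, not the thesis's solver.

    // Hedged sketch of overlapping computation with asynchronous reads.
    #include <cstdio>
    #include <future>
    #include <numeric>
    #include <vector>

    using Block = std::vector<double>;

    Block read_block(int k) {              // stand-in for reading one block from disk
        Block b(1 << 20, 1.0 + k);
        return b;
    }

    double factorize(const Block& b) {     // stand-in for the numerical work on a block
        return std::accumulate(b.begin(), b.end(), 0.0);
    }

    int main() {
        const int nblocks = 8;
        double checksum = 0.0;
        std::future<Block> next = std::async(std::launch::async, read_block, 0);
        for (int k = 0; k < nblocks; ++k) {
            Block current = next.get();                   // wait only if I/O is slower than compute
            if (k + 1 < nblocks)                          // issue the next read before computing
                next = std::async(std::launch::async, read_block, k + 1);
            checksum += factorize(current);               // compute overlaps with the pending read
        }
        std::printf("checksum %.1f\n", checksum);
        return 0;
    }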
129
Analyse von Test-Pattern für SoC Multiprozessortest und -debugging mittels Test Access Port (JTAG) / Analysis of test patterns for SoC multiprocessor test and debugging via the Test Access Port (JTAG). Vogelsang, Stefan; Köhler, Steffen; Spallek, Rainer G., 11 June 2007 (links) (PDF)
When developing System-on-Chip (SoC) debuggers, it is unfortunately quite often necessary to examine the debugger itself for possible faults. Since every serious debugger is, by construction, itself an embedded system, there is a need to design simple and reliably controllable diagnostic hardware that provides access to the debugger's behaviour through its outputs. Currently, the Test Access Port (TAP, as defined by the IEEE 1149.1 standard) is, for many integrators, the basis for accessing their instantiated hardware. Even research-oriented multi-core System-on-Chip architectures such as ARM's ARM11MP still use this mechanism. In this contribution we present a special-purpose tool for analysing the TAP communication protocol, which makes the use of expensive analysis equipment (logic analysers) unnecessary and, beyond that, offers convenient, extended support for multi-core systems. Starting from the problem of sampling and capturing the signal states at the TAP by means of an FPGA, the various visualisation and analysis aspects of the TAP protocol phases in a multi-core processor target-system environment are discussed. The solution presented here was developed within a collaborative R&D project. The project is funded under the technology promotion programme with resources from the European Regional Development Fund (EFRE) 2000-2006 and from the Free State of Saxony.
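One small, self-contained piece of such a TAP analysis is turning the sampled TMS bit stream into IEEE 1149.1 controller states, which is sketched below; the state table follows the standard's TAP state diagram, while the sampled bit sequence is invented and the real tool captures TCK/TMS/TDI/TDO with an FPGA.

    // Hedged sketch: decode sampled TMS values into IEEE 1149.1 TAP states.
    #include <cstdio>

    enum State { TLR, RTI, SELDR, CAPDR, SHDR, EX1DR, PSDR, EX2DR, UPDR,
                 SELIR, CAPIR, SHIR, EX1IR, PSIR, EX2IR, UPIR };

    const char* names[] = {"Test-Logic-Reset", "Run-Test/Idle", "Select-DR-Scan",
        "Capture-DR", "Shift-DR", "Exit1-DR", "Pause-DR", "Exit2-DR", "Update-DR",
        "Select-IR-Scan", "Capture-IR", "Shift-IR", "Exit1-IR", "Pause-IR",
        "Exit2-IR", "Update-IR"};

    // next_state[state][TMS] as defined by the IEEE 1149.1 state diagram.
    const State next_state[16][2] = {
        /*TLR  */ {RTI,   TLR},   /*RTI  */ {RTI,   SELDR},
        /*SELDR*/ {CAPDR, SELIR}, /*CAPDR*/ {SHDR,  EX1DR},
        /*SHDR */ {SHDR,  EX1DR}, /*EX1DR*/ {PSDR,  UPDR},
        /*PSDR */ {PSDR,  EX2DR}, /*EX2DR*/ {SHDR,  UPDR},
        /*UPDR */ {RTI,   SELDR}, /*SELIR*/ {CAPIR, TLR},
        /*CAPIR*/ {SHIR,  EX1IR}, /*SHIR */ {SHIR,  EX1IR},
        /*EX1IR*/ {PSIR,  UPIR},  /*PSIR */ {PSIR,  EX2IR},
        /*EX2IR*/ {SHIR,  UPIR},  /*UPIR */ {RTI,   SELDR}};

    int main() {
        // TMS values sampled on rising TCK edges: reset, then a short DR scan.
        int tms[] = {1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0};
        State s = TLR;
        for (int bit : tms) {
            s = next_state[s][bit];
            std::printf("TMS=%d -> %s\n", bit, names[s]);
        }
        return 0;
    }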
130
Jack Rabbit: an effective Cell BE programming system for high performance parallelism. Ellis, Apollo Isaac Orion, 08 July 2011 (has links)
The Cell processor is an example of the trade-offs made when designing a mass-market, power-efficient multi-core machine, but the exposed machine architecture and raw communication mechanisms of Cell are hard for a programmer to manage. Cell's design is simple, which drives up software complexity in the areas of achieving low threading overhead, good bandwidth efficiency, and load balance. Several attempts have been made to produce efficient and effective programming systems for Cell, but these attempts have been too specialized and thus fall short. We present Jack Rabbit, an efficient thread pool work queue implementation, with load balancing mechanisms and double buffering. Our system incurs low threading overhead, achieves good load balance, and achieves bandwidth efficiency. Our system represents a step towards an effective way to program Cell and any similar current or future processors.
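A hedged sketch of the thread pool work queue pattern named above, with ordinary C++ threads standing in for Cell's SPEs (the real system feeds SPEs through DMA and mailboxes, which is not shown): idle workers pull the next chunk of work from a shared queue, which is what provides dynamic load balance.

    // Hedged sketch of a thread pool fed by a shared work queue.
    #include <condition_variable>
    #include <cstdio>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class WorkQueue {
        std::queue<std::function<void()>> tasks;
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
    public:
        void push(std::function<void()> t) {
            { std::lock_guard<std::mutex> lk(m); tasks.push(std::move(t)); }
            cv.notify_one();
        }
        void shutdown() {
            { std::lock_guard<std::mutex> lk(m); done = true; }
            cv.notify_all();
        }
        bool pop(std::function<void()>& t) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return done || !tasks.empty(); });
            if (tasks.empty()) return false;          // shut down and drained
            t = std::move(tasks.front()); tasks.pop();
            return true;
        }
    };

    int main() {
        WorkQueue q;
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < 4; ++i)
            workers.emplace_back([&q] {
                std::function<void()> task;
                while (q.pop(task)) task();           // idle workers grab the next chunk
            });
        for (int j = 0; j < 16; ++j)
            q.push([j] { std::printf("chunk %d processed\n", j); });
        q.shutdown();
        for (auto& w : workers) w.join();
        return 0;
    }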