1

Implementing CAL Actor Component on Massively Parallel Processor Array

Khanfar, Husni January 2010 (has links)
No description available.
2

Décomposition des jeux dans le domaine du General Game Playing / Game Decomposition for General Game Playing

Hufschmitt, Aline 04 October 2018 (has links)
This PhD thesis presents a robust and general approach for the decomposition of games described in Game Description Language (GDL). In the General Game Playing framework (GGP), players can drastically decrease game search cost if they hold a decomposed version of the game. Previous works on decomposition rely on the syntactic structure of the rules and on GDL writing habits, or on the disjunctive normal form of the rules, which is very costly to compute. We offer a more general approach to decompose single-player or multi-player games. First, we consider an approach based on logical rule analysis; this requires the use of heuristics, which limits its robustness, and the costly calculation of the disjunctive normal form of the rules. A second, more efficient approach is based on information gathered during simulations of the game (playouts). The latter allows the detection of causal links between the actions and the fluents of a game. It can handle the different classes of compound games and can process some difficult cases such as synchronous parallel games with compound moves and serial games. We tested our approach on a panel of 597 GGP games. Given 5k playouts, 70% of the games are decomposed in less than one minute, and we show that for 87% of them, 1k playouts are sufficient to obtain a correct decomposition. We also sketch an original approach to playing these decomposed games; preliminary tests on some single-player games are promising. Another contribution of this thesis is the evaluation of the MPPA architecture for the parallelization of a GGP player (LeJoueur of Jean-Noël Vittaut).
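To make the playout-based idea concrete, the following Python sketch (an illustration under assumed interfaces, not the algorithm from the thesis; the game object and methods such as random_joint_move and fluents are hypothetical placeholders) records which fluents each move is observed to change during random playouts, then groups moves and fluents into connected components, each component being a candidate subgame:

    from collections import defaultdict

    def decompose_by_playouts(game, num_playouts=1000):
        """Group moves and fluents into candidate subgames from playout statistics."""
        links = defaultdict(set)  # move -> fluents it was observed to change
        for _ in range(num_playouts):
            state = game.initial_state()
            while not game.is_terminal(state):
                joint_move = game.random_joint_move(state)       # hypothetical API
                next_state = game.next_state(state, joint_move)  # hypothetical API
                changed = game.fluents(next_state) ^ game.fluents(state)
                for move in joint_move:
                    links[move] |= changed
                state = next_state

        # Union-find over moves and fluents: connected components are candidate subgames.
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for move, fluents in links.items():
            for f in fluents:
                parent[find(('move', move))] = find(('fluent', f))
        components = defaultdict(set)
        for node in list(parent):
            components[find(node)].add(node)
        return list(components.values())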
3

Linear Algebra for Array Signal Processing on a Massively Parallel Dataflow Architecture

Savaş, Süleyman January 2009 (has links)
This thesis discusses the implementation of the Gentleman-Kung systolic array for QR decomposition using Givens rotations in the context of radar signal processing. The systolic array of Givens rotations is implemented and analysed on a massively parallel processor array (MPPA), the Ambric Am2045, and the tools dedicated to this MPPA are evaluated in terms of engineering efficiency. aDesigner, an Eclipse-based environment developed for the Ambric chip family, is used for programming, simulation and performance analysis. Two parallel matrix multiplications were implemented first to become familiar with the architecture and the tools; systolic arrays of different sizes were then implemented and compared with each other. Programming is done in the provided aJava and aStruct languages, which do not support floating-point numbers, so fixed-point arithmetic is used in the systolic-array implementation of the Givens rotations. The algorithms produce stable and precise numerical results, but the performance analysis results are not reliable because of limitations in the performance analysis tools.
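For background on the arithmetic at the heart of the array, here is a floating-point Python sketch of QR factorisation by Givens rotations; it illustrates the rotations themselves, not the fixed-point format or the systolic scheduling used on the Am2045:

    import math

    def givens(a, b):
        """Return (c, s) such that [[c, s], [-s, c]] applied to (a, b) yields (r, 0)."""
        if b == 0.0:
            return 1.0, 0.0
        r = math.hypot(a, b)
        return a / r, b / r

    def qr_by_givens(A):
        """Reduce a list-of-lists matrix A to upper-triangular R in place (Q not kept)."""
        m, n = len(A), len(A[0])
        for j in range(n):                  # eliminate below the diagonal, column by column
            for i in range(m - 1, j, -1):   # rotate rows i-1 and i to zero A[i][j]
                c, s = givens(A[i - 1][j], A[i][j])
                for k in range(j, n):
                    upper = c * A[i - 1][k] + s * A[i][k]
                    lower = -s * A[i - 1][k] + c * A[i][k]
                    A[i - 1][k], A[i][k] = upper, lower
        return A

    R = qr_by_givens([[6.0, 5.0], [3.0, 1.0], [2.0, 2.0]])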
4

Cache-conscious off-line real-time scheduling for multi-core platforms : algorithms and implementation / Ordonnanceur hors-ligne temps-réel et conscient du cache ciblant les architectures multi-coeurs : algorithmes et implémentations

Nguyen, Viet Anh 22 February 2018 (has links)
Nowadays, real-time applications are more compute-intensive as more functionalities are introduced. Multi-core platforms have been released to satisfy this computing demand while reducing size, weight, and power requirements. The most significant challenge when deploying real-time systems on multi-core platforms is to guarantee the real-time constraints of hard real-time applications on such platforms. This is caused by interdependent problems, referred to as a chicken-and-egg situation, explained as follows. Due to the effects of multi-core hardware, such as local caches and shared hardware resources, the timing behavior of tasks is strongly influenced by their execution context (i.e., co-located tasks, concurrent tasks), which is determined by the scheduling strategy. Symmetrically, scheduling algorithms require the worst-case execution time (WCET) of tasks as prior knowledge to determine their allocation and their execution order. Most schedulability analysis techniques for multi-core architectures assume a single WCET per task, valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local caches: in such architectures, the WCET of a task depends on the cache contents at the beginning of its execution, which in turn depend on the task executed before the task under study. In this thesis, we address the issue by proposing scheduling algorithms that take into account context-sensitive WCETs of tasks, reflecting the effect of private caches. We propose two scheduling techniques for multi-core architectures equipped with local caches. Both schedule a parallel application modeled as a task graph and generate a static, partitioned, non-preemptive schedule. We propose an optimal method using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. Experimental results show that by taking into account the effect of private caches on tasks' WCETs, the length of the generated schedules is significantly reduced compared to schedules generated by cache-unaware scheduling methods. Furthermore, we implement time-driven cache-conscious schedules on the Kalray MPPA-256 machine, a clustered many-core platform. We first identify the practical challenges arising when implementing such schedules on the machine, including cache pollution caused by the scheduler, shared-bus contention, delays to the start times of tasks introduced by the scheduler, and data-cache inconsistency. We then propose strategies to address these factors, including an ILP formulation for adapting cache-conscious schedules to them, and a method for generating the code of the applications to be executed on the machine. Experimental validation shows the functional and temporal correctness of our implementation. Additionally, shared-bus contention is observed to be the factor with the largest impact on the length of the adapted cache-conscious schedules.
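To give a flavour of the heuristic side, the Python sketch below is a generic cache-aware list scheduler under assumed data structures, not the exact algorithm of the thesis; in particular the wcet callable, which models context-sensitive WCETs, is an assumption. Each ready task is placed on the core where it finishes earliest, using a WCET that depends on which task last ran on that core:

    def list_schedule(tasks, preds, succs, wcet, num_cores):
        """Greedy partitioned non-preemptive schedule with context-sensitive WCETs.

        tasks: list of task ids; preds/succs: dicts mapping a task to its sets of
        predecessors/successors; wcet(task, prev_task): assumed callable giving the
        WCET of `task` when `prev_task` ran just before it on the same core
        (prev_task is None for a cold cache).
        """
        core_time = [0.0] * num_cores     # when each core becomes free
        core_last = [None] * num_cores    # last task run on each core
        finish = {}                       # task -> finish time
        remaining_preds = {t: len(preds[t]) for t in tasks}
        ready = [t for t in tasks if remaining_preds[t] == 0]
        schedule = []
        while ready:
            # Priority rule: longest cold-cache WCET first.
            task = max(ready, key=lambda t: wcet(t, None))
            ready.remove(task)
            best = None
            for core in range(num_cores):
                start = max([core_time[core]] + [finish[p] for p in preds[task]])
                end = start + wcet(task, core_last[core])
                if best is None or end < best[0]:
                    best = (end, core, start)
            end, core, start = best
            core_time[core], core_last[core], finish[task] = end, task, end
            schedule.append((task, core, start, end))
            for s in succs[task]:
                remaining_preds[s] -= 1
                if remaining_preds[s] == 0:
                    ready.append(s)
        return schedule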
5

In vitro Studies of Improvement in Treatment Efficiency of Photodynamic Therapy of Cancers through Near-Infrared/Bioluminescent Activation

Luo, Ting 22 May 2015 (has links)
Cancer is a leading cause of death that affects millions of people across the globe each year. Photodynamic therapy (PDT) is a relatively new treatment approach for cancer in which anticancer drugs are activated by light at an appropriate wavelength to generate highly cytotoxic reactive oxygen species (ROS) and achieve tumor destruction. Compared with conventional chemo- and radiotherapy, PDT can be performed with minimal invasiveness, local targeting and reduced side effects. However, most of the currently available PDT drugs absorb mainly in the visible part of the spectrum, where light penetration depth into human tissues is very limited. Therefore, increasing the treatment depth of PDT has been considered an important approach to improving its effectiveness for treating larger and thicker tumor masses. In this thesis, we present our investigation into the potential of two-photon activated PDT (2-γ PDT), combination therapy of PDT and chemotherapy, and bioluminescence-activated PDT as a means to increase the treatment depth of this modality. In 2-γ PDT, the photosensitizing agents are activated through simultaneous absorption of two photons. This approach allows the use of near-infrared (NIR) light that can penetrate deeper into tissues and thus has the potential of treating deep-seated tumors and reducing side effects, while the non-linear nature of two-photon excitation (TPE) may improve tumor targeting. We have evaluated the PDT efficacy of a second-generation photosensitizer derived from chlorophyll a, pyropheophorbide a methyl ester (MPPa), through both one- and two-photon activation. We observed that MPPa had high one-photon (1-γ) PDT efficacy against both cisplatin-sensitive human cervical (HeLa) and cisplatin-resistant human lung (A549) and ovarian (NIH:OVCAR-3) cancer cells when activated by femtosecond (fs) laser pulses at 674 nm. At a low light dose of 0.06 J cm⁻², the MPPa concentration required to produce a 50% cell killing effect (IC50) was determined to be 5.3 ± 0.3, 3.4 ± 0.3 and 3.6 ± 0.4 μM in HeLa, A549 and NIH:OVCAR-3 cells, respectively. More significantly, we also found that MPPa could be effectively activated at the optimal tissue-penetrating wavelength of 800 nm through TPE. At a light dose of 886 J cm⁻², where no measurable photodamage was observed in the absence of MPPa, the IC50 values were measured to be 4.1 ± 0.3, 9.6 ± 1.0 and 1.6 ± 0.3 μM in HeLa, A549 and NIH:OVCAR-3 cells, respectively. We obtained corresponding LD50 (the light dose required to produce a 50% killing effect) values of 576 ± 13, 478 ± 18 and 360 ± 16 J cm⁻² for 10 μM MPPa, which were approximately 3-5 times lower than the published 2-γ LD50 of Visudyne® and 20-30 times lower than that of Photofrin®. These results indicate that MPPa may serve as a photosensitizer for both 1- and 2-γ activated PDT treatment of tumors that are difficult to treat with conventional therapies. Indocyanine green (ICG), a dye with an absorption maximum near 800 nm, has been considered a potential NIR PDT agent. However, the PDT efficacy of ICG has been found to be very limited, probably due to the low yield of cytotoxic ROS. In the present work, we have evaluated the combination effects of ICG-mediated PDT with conventional chemotherapy mediated by two types of chemotherapeutic drugs, namely the type II topoisomerase (TOPII) poisons etoposide (VP-16)/teniposide (VM-26) and the platinum-based drugs cisplatin (CDDP)/oxaliplatin (OXP).
Synergistic enhancement of cytotoxicity and increased yields of DNA double strand breaks (DSBs) were observed in HeLa, A549 and NIH:OVCAR-3 cancer cells treated with the combination of ICG-PDT and VP-16. The presence of VP-16 during the laser irradiation process was found to be critical for producing a synergistic effect. An electron-transfer-based mechanism, in which ICG could increase the yield of highly cytotoxic VP-16 metabolites, was proposed for the observed synergistic effects, although direct spectroscopic detection of the reaction products was found to be very challenging. Moreover, we observed a much lower degree of synergy in the human normal fibroblast GM05757 cells than that in the three cancer cell lines investigated. Synergistic effects were also observed in A549 cells treated with the combination of ICG-PDT and VM-26 (i.e. an analog of VP-16). Furthermore, the combination of low-dose CDDP/OXP and ICG-PDT was demonstrated to produce an additive or synergistic effect in selected cancer cell lines. These preliminary results suggest that the combination of ICG-PDT with VP-16/VM-26 or CDDP/OXP chemotherapy may offer the advantages of enhancing the therapeutic effectiveness of ICG-PDT and lowering the side effects associated with the chemotherapeutic drugs. Bioluminescence, the generation of light in living organisms through chemical reactions, has been explored as an internal light source for PDT in recent years. This approach, in principle, does not suffer from the limited tissue penetration depth of light. In the present project, we have evaluated the effectiveness of luminol bioluminescence in activating the porphyrin photosensitizers meso-tetra(4-sulfonatophenyl)porphine dihydrochloride (TPPS4) and Fe(III) meso-tetra(4-sulfonatophenyl)porphine chloride (FeTPPS). The combination treatment induced significant killing of HeLa cells, while additive effects were observed in two normal human fibroblast cell lines (GM05757 and MRC-5). Our observations indicate that bioluminescence of luminol may generate sufficient light for intracellular activation of PDT sensitizers. Furthermore, the combination treatment may have intrinsic selectivity towards cancerous tissues. In summary, we have demonstrated effective killing of cancer cells by MPPa-mediated 1- and 2-γ PDT, combination of ICG-PDT and VP-16/VM-26 or CDDP/OXP chemotherapy, and bioluminescence of luminol activated PDT mediated by TPPS4/FeTPPS. These positive preliminary results indicate that all these three approaches have the potential of increasing the treatment depth of PDT and facilitating the development of more effective PDT treatment strategies.
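As an aside on how figures of this kind are typically derived (a generic sketch with hypothetical data, not the fitting procedure used in this work), an IC50 can be estimated from a measured dose-response curve by interpolating, on a logarithmic concentration axis, between the two points that bracket 50% survival:

    import math

    def estimate_ic50(concentrations, viabilities):
        """Interpolate the concentration at 50% viability on a log-concentration axis.

        concentrations: increasing drug concentrations (e.g. in uM);
        viabilities: surviving fraction (1.0 = no killing) at each concentration.
        """
        points = list(zip(concentrations, viabilities))
        for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
            if v_lo >= 0.5 >= v_hi:
                frac = (v_lo - 0.5) / (v_lo - v_hi)
                log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
                return 10 ** log_ic50
        return None  # 50% killing not reached within the tested range

    # Hypothetical dose-response data, for illustration only
    print(estimate_ic50([1, 2, 4, 8, 16], [0.95, 0.80, 0.60, 0.35, 0.15]))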
6

Implementation and Evaluation of MPEG-4 Simple Profile Decoder on a Massively Parallel Processor Array

Savas, Suleyman January 2011 (has links)
The high computational demand of video decoding has pushed developers to implement decoders on parallel architectures. This thesis discusses the implementation of an MPEG-4 decoder on a massively parallel processor array (MPPA), the Ambric Am2045, by converting the CAL actor language implementation of the decoder. The decoder is the Xilinx model of the MPEG-4 Simple Profile decoder and consists of four main blocks: parser, acdc, idct2d and motion. The parser block was developed in another thesis work [20]; the remaining three blocks are implemented in this thesis work and then combined with the parser to complete the decoder. Several methods are developed for the conversion, and a number of additional methods are developed to work around constraints of the Ambric architecture, such as the lack of division support. For debugging, the decoder is first implemented on a simulator designed for the Ambric architecture. Finally, the implementation is uploaded to the Ambric Am2045 chip and tested with different input streams. The performance of the implementation is analyzed, and the results are satisfactory compared with the standards currently in use in the market; they can also be considered satisfactory for real-time applications. Furthermore, the results are compared, in terms of speed and efficiency, with those of the CAL implementation running on a single 2 GHz Intel i7 processor. The Ambric implementation runs 4.7 times faster than the CAL implementation on a small input stream (300 frames at a resolution of 176x144), and approximately 32 times faster on a large input stream (384 frames at a resolution of 720x480) in terms of decoding speed and throughput. The performance advantage may grow further with the size of the input stream, up to some point.
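To illustrate the kind of workaround that the lack of division support calls for (an illustrative Python sketch, not the method actually used in this work), integer division by a constant can be replaced by multiplication with a precomputed fixed-point reciprocal followed by a shift:

    def make_divider(divisor, frac_bits=16):
        """Return a function computing n // divisor using only multiply and shift.

        Exact only while n times the rounding error of the reciprocal stays below 1;
        with 16 fraction bits and divisor 5 this holds for n up to about 16000.
        Use more fraction bits for wider input ranges.
        """
        reciprocal = (1 << frac_bits) // divisor + 1  # rounded-up fixed-point 1/divisor
        def divide(n):
            return (n * reciprocal) >> frac_bits
        return divide

    div_by_5 = make_divider(5)
    print(div_by_5(200))    # 40
    print(div_by_5(12345))  # 2469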