31

Applying tree knapsack approaches to general network design : a case study / T. Baitshenyetsi

Baitshenyetsi, Tumo January 2010 (has links)
There are many practical decision problems that fall into the category of network flow problems: numerous examples of applications can be found in areas such as telecommunications, logistics, distribution, engineering, computer science and so on. One of the most popular and valuable tools to solve network flow problems of a topological nature is the use of linear programming models. An important extension of these models is that of integer programming models that deal with problems where some, or all, of the variables are required to assume integer values. A significant application in this class of problems is the knapsack problem that arises in different contexts such as loading containers in aircraft or satisfying the demand for various lengths of cloth which must be cut from fixed length bolts of fabric. In this study, the feasibility of representing a network flow model as a tree network model and subsequently solving it using a tree knapsack approach is investigated. To compare and validate the proposed technique, a specific case study was chosen from the literature as a basis for the research project. The said study is an oil pipeline design problem addressed by Brimberg et al. (2003), which focuses on the optimal design of an oil pipeline network for the South Gabon oil field in Africa. The objective was to reduce oil transportation costs to a major port. Following an overview of different network flow and knapsack models, an overview of this case study is presented. A description of the proposed tree knapsack approach and the application of this approach to the given problem is given. Results indicate that it is feasible to apply a tree knapsack approach to solve network flow problems. / Thesis (M.Sc. (Computer Science))--North-West University, Potchefstroom Campus, 2011.
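To make the parent-dependent (tree) knapsack recursion concrete, here is a minimal C++ sketch: a node may only be packed if its parent is packed, as when a pipeline segment is only useful if it connects to the rest of the network. The data, names and dynamic program are invented for illustration and are not the solver developed in the thesis.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Tree knapsack: dp[v][c] = best value in v's subtree with capacity c,
// assuming v itself is packed (so children can only be packed "under" v).
struct TreeKnapsack {
    int n, capacity;
    std::vector<int> weight, value;
    std::vector<std::vector<int>> children, dp;

    TreeKnapsack(int n, int capacity) : n(n), capacity(capacity),
        weight(n), value(n), children(n),
        dp(n, std::vector<int>(capacity + 1, -1)) {}  // -1 == infeasible

    void solve(int v) {
        for (int c = weight[v]; c <= capacity; ++c) dp[v][c] = value[v];
        for (int child : children[v]) {
            solve(child);
            // Knapsack-style merge of the child's subtree into v's table,
            // iterating capacity downwards so each subtree is used at most once.
            for (int c = capacity; c >= weight[v]; --c)
                for (int used = 1; used <= c; ++used)
                    if (dp[child][used] >= 0 && dp[v][c - used] >= 0)
                        dp[v][c] = std::max(dp[v][c], dp[v][c - used] + dp[child][used]);
        }
    }
};

int main() {
    TreeKnapsack tk(4, 10);
    tk.weight = {2, 3, 4, 5};          // node 0 is the root (e.g. the port)
    tk.value  = {0, 6, 5, 9};
    tk.children[0] = {1, 2};
    tk.children[1] = {3};
    tk.solve(0);
    std::cout << "best value: " << tk.dp[0][10] << "\n";  // expect 15 (nodes 0, 1, 3)
    return 0;
}
```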
32

Concurrency Analysis and Mining Techniques for APIs

Santhiar, Anirudh January 2017 (has links) (PDF)
Software components expose Application Programming Interfaces (APIs) as a means to access their functionality, and facilitate reuse. Developers use APIs supplied by programming languages to access the core data structures and algorithms that are part of the language framework. They use the APIs of third-party libraries for specialized tasks. Thus, APIs play a central role in mediating a developer's interaction with software, and the interaction between different software components. However, APIs are often large, complex and hard to navigate. They may have hundreds of classes and methods, with incomplete or obsolete documentation. They may encapsulate concurrency behaviour that the developer is unaware of. Finding the right functionality in a large API, using APIs correctly, and maintaining software that uses a constantly evolving API are challenges that every developer runs into. In this thesis, we design automated techniques to address two problems pertaining to APIs: (1) Concurrency analysis of APIs, and (2) API mining. Specifically, we consider the concurrency analysis of asynchronous APIs, and mining of math APIs to infer the functional behaviour of API methods. The problem of concurrency bugs such as race conditions and deadlocks has been well studied for multi-threaded programs. However, developers have been eschewing a pure multi-threaded programming model in favour of asynchronous programming models supported by asynchronous APIs. Asynchronous programs and multi-threaded programs have different semantics, due to which existing techniques to analyze the latter cannot detect bugs present in programs that use asynchronous APIs. This thesis addresses the problem of concurrency analysis of programs that use asynchronous APIs in an end-to-end fashion. We give operational semantics for important classes of asynchronous and event-driven systems. The semantics are designed by carefully studying real software and serve to clarify subtleties in scheduling. We use the semantics to inform the design of novel algorithms to find races and deadlocks. We implement the algorithms in tools, and show their effectiveness by finding serious bugs in popular open-source software. To begin with, we consider APIs for asynchronous event-driven systems supporting programmatic event loops. Here, event handlers can spin event loops programmatically in addition to the runtime's default event loop. This concurrency idiom is supported by important classes of APIs including GUI, web browser, and OS APIs. Programs that use these APIs are prone to interference between a handler that is spinning an event loop and another handler that runs inside the loop. We present the first happens-before based race detection technique for such programs. Next, we consider the asynchronous programming model of modern languages like C#. In spite of providing primitives for the disciplined use of asynchrony, C# programs can deadlock because of incorrect use of blocking APIs along with non-blocking (asynchronous) APIs. We present the first deadlock detection technique for asynchronous C# programs. We formulate necessary conditions for deadlock using a novel program representation that represents procedures and continuations, control flow between them and the threads on which they may be scheduled. We design a static analysis to construct the program representation and use it to identify deadlocks. Our ideas have resulted in research tools with practical impact. 
SparseRacer, our tool to detect races, found 13 previously unknown use-after-free bugs in KDE Linux applications. DeadWait, our deadlock detector, found 43 previously unknown deadlocks in asynchronous C# libraries. Developers have fixed 43 of these races and deadlocks, indicating that our techniques are useful in practice to detect bugs that developers consider worth fixing. Using large APIs effectively entails finding the right functionality and calling the methods that implement it correctly, possibly composing many API elements. Automatically inferring the information required to do this is a challenge that has attracted the attention of the research community. In response, the community has introduced many techniques to mine APIs and produce information ranging from usage examples and patterns, to protocols governing the API method calling sequences. We show how to mine unit tests to match API methods to their functional behaviour, for the specific but important class of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations, and are supplied by many competing libraries. In contrast to obtaining usage examples or identifying correct call sequences, the challenge in this domain is to infer API methods required to perform a particular mathematical computation, and to compose them correctly. We let developers specify mathematical computations naturally, as a math expression in the notation of interpreted languages (such as Matlab). Our unit test mining technique maps subexpressions to math API methods such that the method's functional behaviour matches the subexpression's executable semantics, as defined by the interpreter. We apply our technique, called MathFinder, to math API discovery and migration, and validate it in a user study. Developers who used MathFinder finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, and IDE code completion. We also demonstrate the use of MathFinder to assist in the migration of Weka, a popular machine learning library, to a different linear algebra library.
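The race-detection side of this work rests on happens-before reasoning. Below is a minimal, self-contained sketch of the kind of vector-clock check such detectors build on; it is illustrative only, with invented data, and does not model the event loops and asynchronous tasks the thesis actually analyzes.

```cpp
#include <iostream>
#include <vector>

// Minimal vector-clock machinery of the kind happens-before race detectors
// build on: two accesses race if they target the same location, at least one
// is a write, and neither access happens-before the other.
using VClock = std::vector<unsigned>;

bool happensBefore(const VClock& a, const VClock& b) {
    bool strictlyLess = false;
    for (size_t i = 0; i < a.size(); ++i) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strictlyLess = true;
    }
    return strictlyLess;
}

struct Access { int location; bool isWrite; VClock at; };

bool isRace(const Access& x, const Access& y) {
    return x.location == y.location && (x.isWrite || y.isWrite) &&
           !happensBefore(x.at, y.at) && !happensBefore(y.at, x.at);
}

int main() {
    // Two handlers touching the same field; their clocks are incomparable,
    // so the accesses are concurrent and the write/read pair is flagged.
    Access write{42, true,  {2, 0}};
    Access read {42, false, {0, 3}};
    std::cout << std::boolalpha << isRace(write, read) << "\n";  // true
    return 0;
}
```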
33

Modèles de calculs flot de données avec paramètres entiers et booléens. Modélisation - Analyses - Mise en oeuvre / Boolean Parametric Data Flow Modeling - Analyses - Implementation

Bempelis, Evangelos 26 February 2015 (has links)
Les applications de gestion de flux sont responsables de la majorité des calculs des systèmes embarqués (vidéo conférence, vision par ordinateur). Leurs exigences de haute performance rendent leur mise en œuvre parallèle nécessaire. Par conséquent, il est de plus en plus courant que les systèmes embarqués modernes incluent des processeurs multi-cœurs qui permettent un parallélisme massif. La mise en œuvre des applications de gestion de flux sur des multi-cœurs est difficile à cause de leur complexité, qui tend à augmenter, et de leurs exigences strictes à la fois qualitatives (robustesse, fiabilité) et quantitatives (débit, consommation d'énergie). Ceci est observé dans l'évolution de codecs vidéo qui ne cessent d'augmenter en complexité, tandis que leurs exigences de performance demeurent les mêmes. Les modèles de calcul (MdC) flot de données ont été développés pour faciliter la conception de ces applications qui sont typiquement composées de filtres qui échangent des flux de données via des liens de communication. Ces modèles fournissent une représentation intuitive des applications de gestion de flux, tout en exposant le parallélisme de tâches de l'application. En outre, ils fournissent des analyses statiques pour la vivacité et l'exécution en mémoire bornée. Cependant, les applications de gestion de flux modernes comportent des filtres qui échangent des quantités de données variables, et des liens de communication qui peuvent être activés / désactivés. Dans cette thèse, nous présentons un nouveau MdC flot de données, le Boolean Parametric Data Flow (BPDF), qui permet le paramétrage de la quantité de données échangées entre les filtres en utilisant des paramètres entiers et l'activation et la désactivation de liens de communication en utilisant des paramètres booléens. De cette manière, BPDF est capable de exprimer des applications plus complexes, comme les décodeurs vidéo modernes. Malgré l'augmentation de l'expressivité, les applications BPDF restent statiquement analysables pour la vivacité et l'exécution en mémoire bornée. Cependant, l'expressivité accrue complique grandement la mise en œuvre. Les paramètres entiers entraînent des dépendances de données de type paramétrique et les paramètres booléens peuvent désactiver des liens de communication et ainsi éliminer des dépendances de données. Pour cette raison, nous proposons un cadre d'ordonnancement qui produit des ordonnancements de type ``aussi tôt que possible'' (ASAP) pour un placement statique donné. Il utilise des contraintes d'ordonnancement, soit issues de l'application (dépendance de données) ou de l'utilisateur (optimisations d'ordonnancement). Les contraintes sont analysées pour la vivacité et, si possible, simplifiées. De cette façon, notre cadre permet une grande variété de politiques d'ordonnancement, tout en garantissant la vivacité de l'application. Enfin, le calcul du débit d'une application est important tant avant que pendant l'exécution. Il permet de vérifier que l'application satisfait ses exigences de performance et il permet de prendre des décisions d'ordonnancement à l'exécution qui peuvent améliorer la performance ou la consommation d'énergie. Nous traitons ce problème en trouvant des expressions paramétriques pour le débit maximum d'un sous-ensemble de BPDF. Enfin, nous proposons un algorithme qui calcule une taille des buffers suffisante pour que l'application BPDF ait un débit maximum. 
/ Streaming applications are responsible for the majority of the computation load in many embedded systems (video conferencing, computer vision etc.). Their high performance requirements make parallel implementations a necessity. Hence, more and more modern embedded systems include many-core processors that allow massive parallelism. Parallel implementation of streaming applications on many-core platforms is challenging because of their complexity, which tends to increase, and their strict requirements both qualitative (e.g., robustness, reliability) and quantitative (e.g., throughput, power consumption). This is observed in the evolution of video codecs that keep increasing in complexity, while their performance requirements remain the same or even increase. Data flow models of computation (MoCs) have been developed to facilitate the design process of such applications, which are typically composed of filters exchanging streams of data via communication links. Data flow MoCs provide an intuitive representation of streaming applications, while exposing the available parallelism of the application. Moreover, they provide static analyses for liveness and boundedness. However, modern streaming applications feature filters that exchange variable amounts of data, and communication links that are not always active. In this thesis, we present a new data flow MoC, the Boolean Parametric Data Flow (BPDF), that allows parametrization of the amount of data exchanged between the filters using integer parameters and the enabling and disabling of communication links using boolean parameters. In this way, BPDF is able to capture more complex streaming applications, like video decoders. Despite the increase in expressiveness, BPDF applications remain statically analyzable for liveness and boundedness. However, increased expressiveness greatly complicates implementation. Integer parameters result in parametric data dependencies and the boolean parameters disable communication links, effectively removing data dependencies. We propose a scheduling framework that facilitates the scheduling of BPDF applications. Our scheduling framework produces as-soon-as-possible (ASAP) schedules for a given static mapping. It takes as input scheduling constraints that derive either from the application (data dependencies) or from the user (schedule optimizations). The constraints are analyzed for liveness and, if possible, simplified. In this way, our framework provides flexibility, while guaranteeing the liveness of the application. Finally, calculation of the throughput of an application is important both at compile-time and at run-time. It allows one to verify at compile-time that the application meets its performance requirements, and to take scheduling decisions at run-time that can improve performance or power consumption. We approach this problem by finding parametric throughput expressions for the maximum throughput of a subset of BPDF graphs. Finally, we provide an algorithm that calculates sufficient buffer sizes for the BPDF graph to operate at maximum throughput.
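To give a feel for what integer and boolean parameters mean operationally, the toy C++ loop below simulates a single parametric producer/consumer channel. It is a deliberately simplified, invented illustration of the flavour of such models, not the formal BPDF semantics or its analyses.

```cpp
#include <deque>
#include <iostream>

// Toy parametric data flow channel: actor A produces `p` tokens per firing on
// a channel that is only active when the boolean parameter `b` is true;
// actor B consumes 2 tokens per firing.
int main() {
    int p = 3;        // integer parameter: tokens produced per firing of A
    bool b = true;    // boolean parameter: is the A->B channel active?
    std::deque<int> channel;

    for (int firing = 0; firing < 4; ++firing) {
        if (b)                                        // channel enabled
            for (int i = 0; i < p; ++i) channel.push_back(firing);
        while (channel.size() >= 2) {                 // B needs 2 tokens to fire
            channel.pop_front();
            channel.pop_front();
            std::cout << "B fired, buffer now holds " << channel.size()
                      << " token(s)\n";
        }
    }
    // With p = 3 the buffer occupancy stays bounded (at most 1 leftover token);
    // boundedness of this kind is what the static analyses establish symbolically.
    return 0;
}
```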
34

Débogage des systèmes embarqués multiprocesseur basé sur la ré-exécution déterministe et partielle / Deterministic and partial replay debugging of multiprocessor embedded systems

Georgiev, Kiril 04 December 2012 (has links)
Les plates-formes MPSoC permettent de satisfaire les contraintes de performance, de flexibilité et de consommation énergétique requises par les systèmes embarqués émergents. Elles intègrent un nombre important de processeurs, des blocs de mémoire et des périphériques, hiérarchiquement organisés par un réseau d'interconnexion. Le développement du logiciel est réputé difficile, notamment dû à la gestion d'un grand nombre d'entités (tâches/threads/processus). L'exécution concurrente de ces entités permet d'exploiter efficacement l'architecture mais complexifie le processus de mise au point et notamment l'analyse des erreurs. D'une part, les exécutions peuvent être non-déterministes notamment dû à la concurrence, c'est à dire qu'elles peuvent se dérouler d'une manière différente à chaque reprise. En conséquence, il n'est pas garanti qu'une erreur se produirait durant la phase de mise au point. D'autre part, la complexité de l'architecture et de l'exécution peut rendre trop important le nombre d'éléments à analyser afin d'identifier une erreur. Il pourrait donc être difficile de se focaliser sur des éléments potentiellement fautifs. Un des défis majeurs du développement logiciel MPSoC est donc de réduire le temps de la mise au point. Dans cette thèse, nous proposons une méthodologie de mise au point qui aide le développeur à identifier les erreurs dans le logiciel MPSoC. Notre premier objectif est de déboguer une même exécution plusieurs fois afin d'analyser des sources potentielles de l'erreur jusqu'à son identification. Nous avons donc identifié les sources de non-déterminisme MPSoC et proposé des mécanismes de ré-exécution déterministe les plus adaptés. Notre deuxième objectif vise à minimiser les ressources pour reproduire une exécution afin de satisfaire la contrainte MPSoC de maîtrise de l'intrusion. Nous avons donc utilisé des mécanismes efficaces de ré-exécution déterministe et considéré qu'une partie du comportement non-déterministe. Le troisième objectif est de permettre le passage à l'échelle, c'est à dire de déboguer des exécutions caractérisées par un nombre d'éléments de plus en plus croissant. Nous avons donc proposé une méthode qui permet de circonscrire et de déboguer qu'une partie de l'exécution. De plus, cette méthode s'applique aux différents types d'architectures et d'applications MPSoC.
/ MPSoC platforms provide the high performance, low power consumption and flexibility required by emerging embedded systems. They incorporate many processing units, memory blocks and peripherals, hierarchically organized by an interconnection network. Software development is known to be difficult, namely due to the management of multiple entities (tasks/threads/processes). The concurrent execution of these entities makes it possible to exploit the architecture efficiently but complicates the refinement process of the software and especially the debugging activity. On the one hand, the executions of the software can be non-deterministic, namely due to the concurrency, i.e. they perform differently each time. Consequently, there is no guarantee that an error will occur during the debugging activity. On the other hand, the complexity of the architecture and the execution can increase the number of elements to be analyzed in the debugging process. As a result, it can be difficult to concentrate on the potentially faulty elements. Therefore, one of the most important challenges in the development process of MPSoC software is to reduce the time of the refinement process. In this thesis, we propose a new methodology to refine MPSoC software which helps developers in the debugging activity. Our first objective is to be able to debug the same execution several times in order to analyze potential sources of the error. To do so, we identified the sources of non-determinism in MPSoC software executions and propose the most appropriate methods to record and replay them. Our second objective is to reduce the execution overhead required by the record mechanisms to limit the intrusiveness, which is an important MPSoC constraint. To accomplish this objective, we consider only a part of the non-deterministic behaviour and selected efficient record-replay methods. The third objective is to provide a scalable solution, i.e. to be able to debug more and more complex executions, characterized by an increasing number of elements. Therefore, we propose a partial replay method which makes it possible to isolate and debug a fraction of the execution elements. Moreover, this method applies to different types of MPSoC architectures and applications.
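One classic building block behind deterministic record/replay is logging the outcome of a non-deterministic scheduling decision and enforcing it again on a later run. The sketch below records and then replays the winner order of a contended lock; it is an invented, minimal illustration, and the thesis targets MPSoC-specific sources of non-determinism and intrusiveness constraints that this toy ignores.

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Record mode: remember the order in which threads win the lock.
// Replay mode: block each thread until it is its recorded turn.
class ReplayLock {
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<int> log_;            // recorded winner order (thread ids)
    size_t next_ = 0;                 // index into log_ during replay
    bool replaying_ = false;
public:
    void startReplay(std::vector<int> log) { log_ = std::move(log); replaying_ = true; next_ = 0; }
    std::vector<int> recordedLog() const { return log_; }

    void acquire(int tid) {
        std::unique_lock<std::mutex> lk(m_);
        if (replaying_)               // wait until it is this thread's turn
            cv_.wait(lk, [&] { return log_[next_] == tid; });
        else
            log_.push_back(tid);      // record mode: just remember who won
        std::cout << "thread " << tid << " entered critical section\n";
        if (replaying_) ++next_;
        cv_.notify_all();
    }
};

int main() {
    ReplayLock lock;                                           // record run (order may vary)
    std::thread t1([&] { lock.acquire(1); }), t2([&] { lock.acquire(2); });
    t1.join(); t2.join();

    ReplayLock replay;
    replay.startReplay(lock.recordedLog());                    // force the recorded order
    std::thread r1([&] { replay.acquire(1); }), r2([&] { replay.acquire(2); });
    r1.join(); r2.join();
    return 0;
}
```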
35

Exploration of parallel graph-processing algorithms on distributed architectures / Exploration d’algorithmes de traitement parallèle de graphes sur architectures distribuées

Collet, Julien 06 December 2017 (has links)
Avec l'explosion du volume de données produites chaque année, les applications du domaine du traitement de graphes ont de plus en plus besoin d'être parallélisées et déployées sur des architectures distribuées afin d'adresser le besoin en mémoire et en ressource de calcul. Si de telles architectures larges échelles existent, issue notamment du domaine du calcul haute performance (HPC), la complexité de programmation et de déploiement d’algorithmes de traitement de graphes sur de telles cibles est souvent un frein à leur utilisation. De plus, la difficile compréhension, a priori, du comportement en performances de ce type d'applications complexifie également l'évaluation du niveau d'adéquation des architectures matérielles avec de tels algorithmes. Dans ce contexte, ces travaux de thèses portent sur l’exploration d’algorithmes de traitement de graphes sur architectures distribuées en utilisant GraphLab, un Framework de l’état de l’art dédié à la programmation parallèle de tels algorithmes. En particulier, deux cas d'applications réelles ont été étudiées en détails et déployées sur différentes architectures à mémoire distribuée, l’un venant de l’analyse de trace d’exécution et l’autre du domaine du traitement de données génomiques. Ces études ont permis de mettre en évidence l’existence de régimes de fonctionnement permettant d'identifier des points de fonctionnements pertinents dans lesquels on souhaitera placer un système pour maximiser son efficacité. Dans un deuxième temps, une étude a permis de comparer l'efficacité d'architectures généralistes (type commodity cluster) et d'architectures plus spécialisées (type serveur de calcul hautes performances) pour le traitement de graphes distribué. Cette étude a démontré que les architectures composées de grappes de machines de type workstation, moins onéreuses et plus simples, permettaient d'obtenir des performances plus élevées. Cet écart est d'avantage accentué quand les performances sont pondérées par les coûts d'achats et opérationnels. L'étude du comportement en performance de ces architectures a également permis de proposer in fine des règles de dimensionnement et de conception des architectures distribuées, dans ce contexte. En particulier, nous montrons comment l’étude des performances fait apparaitre les axes d’amélioration du matériel et comment il est possible de dimensionner un cluster pour traiter efficacement une instance donnée. Finalement, des propositions matérielles pour la conception de serveurs de calculs plus performants pour le traitement de graphes sont formulées. Premièrement, un mécanisme est proposé afin de tempérer la baisse significative de performance observée quand le cluster opère dans un point de fonctionnement où la mémoire vive est saturée. Enfin, les deux applications développées ont été évaluées sur une architecture à base de processeurs basse-consommation afin d'étudier la pertinence de telles architectures pour le traitement de graphes. Les performances mesurés en utilisant de telles plateformes sont encourageantes et montrent en particulier que la diminution des performances brutes par rapport aux architectures existantes est compensée par une efficacité énergétique bien supérieure. / With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. 
Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graph-processing algorithms, whose parallelization and complexity are highly data-dependent, hampers usability. Moreover, the difficult evaluation of performance behaviors of these applications complexifies the assessment of the relevance of the used architecture. With this in mind, this thesis work deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state-of-the-art graph-processing framework. Two use-cases are considered. For each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can eventually be leveraged to appropriately select a relevant operating point with respect to the datasets processed and used cluster nodes. Further study enables a performance comparison of commodity cluster architectures and higher-end compute servers using the two use-cases previously developed. This study highlights the particular relevance of using clustered commodity workstations, which are considerably cheaper and simpler with respect to node architecture, over higher-end systems in this applicative context. Then, this thesis work explores how performance studies are helpful in cluster design for graph-processing. In particular, studying throughput performances of a graph-processing system gives fruitful insights for further node architecture improvements. Moreover, this work shows that a more in-depth performance analysis can lead to guidelines for the appropriate sizing of a cluster for a given workload, paving the way toward resource allocation for graph-processing. Finally, hardware improvements for next generations of graph-processing servers are proposed and evaluated. A flash-based victim-swap mechanism is proposed for the mitigation of unwanted overloaded operations. Then, the relevance of ARM-based microservers for graph-processing is investigated with a port of GraphLab on a NVIDIA TX2-based architecture.
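For readers unfamiliar with the vertex-centric style that frameworks such as GraphLab popularized, here is a tiny sequential PageRank sketch in that spirit: each sweep scatters every vertex's rank along its out-edges and applies the standard update. It is illustrative only, with an invented four-vertex graph; it uses neither the GraphLab API nor the distributed execution studied in the thesis.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    const int n = 4;
    std::vector<std::vector<int>> out = {{1, 2}, {2}, {0}, {0, 2}};  // out-adjacency lists
    std::vector<double> rank(n, 1.0 / n), next(n);
    const double d = 0.85;                       // damping factor

    for (int iter = 0; iter < 20; ++iter) {
        std::fill(next.begin(), next.end(), (1.0 - d) / n);
        for (int u = 0; u < n; ++u)              // scatter u's rank to its targets
            for (int v : out[u])
                next[v] += d * rank[u] / out[u].size();
        rank.swap(next);
    }
    for (int v = 0; v < n; ++v)
        std::cout << "vertex " << v << ": " << rank[v] << "\n";
    return 0;
}
```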
36

Comparison of Shared memory based parallel programming models

Ravela, Srikar Chowdary January 2010 (has links)
Parallel programming models are a challenging and emerging topic in the parallel computing era. These models allow a developer to port a sequential application onto a platform with a larger number of processors so that the problem or application can be solved more easily. Adapting applications in this manner using parallel programming models is influenced by the type of the application, the type of the platform and many other factors. Several parallel programming models have been developed, and the two main variants are shared memory and distributed memory based parallel programming models. The recognition of computing applications that entail immense computing requirements leads to the challenge of developing efficient programming models that bridge the gap between the hardware's ability to perform the computations and the software's ability to support that performance for those applications [25][9]. A better programming model is therefore needed, one that facilitates easy development while delivering high performance. To answer this challenge, this thesis compares four different shared memory based parallel programming models with respect to the development time of an application under each model and the performance achieved by that application in the same model. The programming models are evaluated in this thesis by considering data parallel applications and verifying their ability to support data parallelism with respect to the development time of those applications. The data parallel applications are borrowed from the Dense Matrix dwarfs, and the dwarfs used are Matrix-Matrix multiplication, Jacobi Iteration and Laplace Heat Distribution. The experimental method consists of selecting three data parallel benchmarks and developing them under the four shared memory based parallel programming models considered for the evaluation. The performance of those applications under each programming model is also measured, and the results are finally used to compare the parallel programming models analytically. Results of the study show that, by sacrificing development time, better performance is achieved for the chosen data parallel applications developed in Pthreads. On the other hand, by sacrificing a little performance, data parallel applications are extremely easy to develop in task based parallel programming models. The directive models are moderate from both perspectives and are rated in between the tasking models and the threading models. / From this study it is clear that the threading model Pthreads is identified as a dominant programming model by supporting high speedups for two of the three different dwarfs, while the tasking models are dominant in development time and in reducing the number of errors, supporting high growth in speedup for the applications without any communication and less growth in self-relative speedup for the applications involving communication. The degradation of performance by the tasking models for the problems based on communication arises because task based models are designed and bound to execute tasks in parallel without any interruptions or preemptions during their computations; introducing communication violates this purpose and thereby results in lower performance. The directive model OpenMP is moderate in both aspects and stands in between these models. 
In general, the directive models and tasking models offer better speedup than the other models for task based problems that rely on the divide and conquer strategy. For data parallelism, however, the speedup growth achieved is low (i.e. they are less scalable for data parallel applications), although their execution times remain comparable with the threading models. Their development times are also considerably lower for data parallel applications because of the ease of development supported by those models, which require fewer functional routines to parallelize the applications. This thesis is concerned with the comparison of shared memory based parallel programming models in terms of speedup. This type of work acts as a guide that programmers can consult during the development of applications under shared memory based parallel programming models. We suggest that this work can be extended in two different ways: one is from the developer's perspective and the other is a cross-referential study of the parallel programming models. The former can be done by having a different programmer carry out a similar study and comparing this study with the new one. The latter can be done by including multiple data points in the same programming model or by using a different set of parallel programming models for the study.
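As an example of the directive model applied to one of the dwarfs named above, a short OpenMP Jacobi sweep is sketched below (compile with -fopenmp). The grid size and boundary values are invented for illustration; this is not the benchmark code used in the thesis.

```cpp
#include <cstdio>
#include <vector>

// Jacobi iteration on a 2-D grid written with the directive model (OpenMP):
// the interior update is data parallel, so a single pragma exposes the
// parallelism that a Pthreads version would have to manage by hand.
int main() {
    const int n = 512, iters = 100;
    std::vector<double> grid(n * n, 0.0);
    for (int j = 0; j < n; ++j) grid[j] = 100.0;          // hot top boundary row
    std::vector<double> next = grid;                      // same fixed boundary

    for (int it = 0; it < iters; ++it) {
        #pragma omp parallel for collapse(2)
        for (int i = 1; i < n - 1; ++i)
            for (int j = 1; j < n - 1; ++j)
                next[i * n + j] = 0.25 * (grid[(i - 1) * n + j] + grid[(i + 1) * n + j] +
                                          grid[i * n + j - 1] + grid[i * n + j + 1]);
        grid.swap(next);                                   // double buffering
    }
    std::printf("centre value after %d sweeps: %f\n", iters, grid[(n / 2) * n + n / 2]);
    return 0;
}
```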
37

Passage à l'echelle d'un support d'exécution à base de tâches pour l'algèbre linéaire dense / Scalability of a task-based runtime system for dense linear algebra applications

Sergent, Marc 08 December 2016 (has links)
La complexification des architectures matérielles pousse vers l’utilisation de paradigmes de programmation de haut niveau pour concevoir des applications scientifiques efficaces, portables et qui passent à l’échelle. Parmi ces paradigmes, la programmation par tâches permet d’abstraire la complexité des machines en représentant les applications comme des graphes de tâches orientés acycliques (DAG). En particulier, le modèle de programmation par tâches soumises séquentiellement (STF) permet de découpler la phase de soumission des tâches, séquentielle, de la phase d’exécution parallèle des tâches. Même si ce modèle permet des optimisations supplémentaires sur le graphe de tâches au moment de la soumission, il y a une préoccupation majeure sur la limite que la soumission séquentielle des tâches peut imposer aux performances de l’application lors du passage à l’échelle. Cette thèse se concentre sur l’étude du passage à l’échelle du support d’exécution StarPU (développé à Inria Bordeaux dans l’équipe STORM), qui implémente le modèle STF, dans le but d’optimiser les performances d’un solveur d’algèbre linéaire dense utilisé par le CEA pour faire de grandes simulations 3D. Nous avons collaboré avec l’équipe HiePACS d’Inria Bordeaux sur le logiciel Chameleon, qui est une collection de solveurs d’algèbre linéaire portés sur supports d’exécution à base de tâches, afin de produire un solveur d’algèbre linéaire dense sur StarPU efficace et qui passe à l’échelle jusqu’à 3 000 coeurs de calcul et 288 accélérateurs de type GPU du supercalculateur TERA-100 du CEA-DAM. / The ever-increasing supercomputer architectural complexity emphasizes the need for high-level parallel programming paradigms to design efficient, scalable and portable scientific applications. Among such paradigms, the task-based programming model abstracts away much of the architecture complexity by representing an application as a Directed Acyclic Graph (DAG) of tasks. Among them, the Sequential-Task-Flow (STF) model decouples the task submission step, sequential, from the parallel task execution step. While this model allows for further optimizations on the DAG of tasks at submission time, there is a key concern about the performance hindrance of sequential task submission when scaling. This thesis’ work focuses on studying the scalability of the STF-based StarPU runtime system (developed at Inria Bordeaux in the STORM team) for large scale 3D simulations of the CEA which uses dense linear algebra solvers. To that end, we collaborated with the HiePACS team of Inria Bordeaux on the Chameleon software, which is a collection of linear algebra solvers on top of task-based runtime systems, to produce an efficient and scalable dense linear algebra solver on top of StarPU up to 3,000 cores and 288 GPUs of CEA-DAM’s TERA-100 cluster.
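A minimal sketch of the Sequential-Task-Flow idea follows: tasks are submitted sequentially with declared data accesses, and only then executed in parallel once the dependencies implied by those accesses are known. This is a toy dependency tracker written for illustration; it is not the StarPU or Chameleon API, and real STF runtimes handle much more (write-after-write hazards, heterogeneous scheduling, distributed memory).

```cpp
#include <functional>
#include <future>
#include <iostream>
#include <vector>

struct Task {
    std::function<void()> body;
    std::vector<int> reads, writes;   // indices of the data pieces touched
};

int main() {
    std::vector<double> tiles(4, 1.0);
    std::vector<Task> submitted;

    // Sequential submission, as an application written against an STF runtime
    // would do; nothing executes yet.
    for (int i = 0; i < 4; ++i)
        submitted.push_back({[&tiles, i] { tiles[i] *= 2.0; }, {}, {i}});       // independent
    submitted.push_back({[&tiles] { tiles[0] += tiles[1] + tiles[2] + tiles[3]; },
                         {1, 2, 3}, {0}});                                       // depends on all

    // Naive "runtime": a task may run in the current wave if no earlier,
    // still-pending task writes data it reads (true dependencies only;
    // other hazards are omitted in this toy example).
    std::vector<bool> done(submitted.size(), false);
    while (true) {
        std::vector<size_t> wave;
        for (size_t t = 0; t < submitted.size(); ++t) {
            if (done[t]) continue;
            bool ready = true;
            for (size_t p = 0; p < t && ready; ++p)
                if (!done[p])
                    for (int w : submitted[p].writes)
                        for (int x : submitted[t].reads)
                            if (x == w) ready = false;
            if (ready) wave.push_back(t);
        }
        if (wave.empty()) break;
        std::vector<std::future<void>> running;
        for (size_t t : wave)
            running.push_back(std::async(std::launch::async, submitted[t].body));
        for (auto& f : running) f.get();
        for (size_t t : wave) done[t] = true;
    }
    std::cout << "tiles[0] = " << tiles[0] << "\n";   // 2 + (2 + 2 + 2) = 8
    return 0;
}
```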
38

Modelos de programação matemática para o gerenciamento de energia em modernos sistemas de distribuição de energia elétrica / Models of mathematical programming for energy management in modern electricity distribution systems

Ñahuis, Fernando Vladimir Cerna [UNESP] 17 February 2017 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / Nesta tese são apresentados três modelos de programação matemática que abordam os problemas de otimização relacionados ao gerenciamento da energia nos sistemas de distribuição de energia elétrica (SDEE), como: 1) Programação ótima das entregas e carregamento dos veículos elétricos (VEs) durante a navegação em um mapa de cidade, 2) Gerenciamento ótimo pelo lado da demanda considerando um sistema fotovoltaico híbrido (SFH) em uma residência em baixa tensão (RBT) no SDEE, e 3) O melhoramento do fator de carga (FC) do SDEE através do controle da demanda. O primeiro problema visa minimizar os custos relacionados com a manutenção e geração de horas extra durante a operação de uma frota de VEs, levando em conta um conjunto de entregas pre-especificadas, assim como, pontos de carregamento alocados ao longo de cada via urbana (principal e/ou secundária) pertencente ao mapa da cidade. No segundo problema, para uma residência em baixa tensão é planejado um perfil ótimo de consumo para o dia seguinte. Este perfil de consumo é obtido através de um programa de gerenciamento pelo lado da demanda (GLD) que considera uma estrutura tarifária e um esquema de operação que otimiza os recursos energéticos vindos de um SFH e o SDEE. Para cada problema de otimização é apresentado o seu correspondente modelo de programação não linear inteiro misto (PNLIM). O terceiro problema visa minimizar os custos por compra de energia (consumo e perdas de potência ativa) da concessionária, levando em conta, o controle da demanda dos usos-finais, presentes nas unidades consumidoras (residenciais, comerciais, e industriais) no SDEE. As incertezas na utilização dos usos-finais nas unidades consumidoras são simuladas através de um algoritmo Monte Carlo. Além disso, o modelo proposto PIMRQ é rodado dentro de um processo iterativo, que visa a melhoria do FC do SDEE. Por outro lado, através destes modelos não-lineares, a solução ótima global não é garantida, enquanto o uso de modelos equivalentes (para o primeiro e segundo problema, sendo um modelo aproximado para o terceiro) de programação linear inteira mista (PLIM) resolvidos por ferramentas de otimização clássica existentes garantem a convergência para a solução ótima global. Por conseguinte, para resolver este inconveniente, os seus modelos MILP equivalentes são obtidos e explicados em detalhe. Os modelos propostos foram implementados na linguagem de modelagem algébrica AMPL e resolvidos usando o solver comercial CPLEX. Além disso, algoritmos de simulação para representar as incertezas dos tempos de demora na operação dos VEs e os hábitos de utilização dos usos-finais durante o dia, são desenvolvidos. 
Um grafo unidirecional de 71 nós, uma rede elétrica IEEE de 34 nós, e 21 usos-finais (incluído um VE plug-in para o carregamento na residência) residenciais são utilizados para testar a precisão e a eficiência, assim como, também técnica de solução dos modelos propostos para cada problema. / This thesis presents three mathematical programming models to address the optimization problems related to energy management in electricity distribution systems (EDSs), namely: 1) Optimal delivery scheduling and charging of electric vehicles (EVs) in the navigation of a city map, 2) Optimal demand side management of an EDS considering a hybrid photovoltaic system (HPS) in a low voltage residence (RLV), and 3) Load factor improvement through demand control in the EDS. The first problem aims at minimizing the costs related to maintenance and the generation of extra hours during the operation of an EV fleet, taking into account a number of prespecified deliveries, as well as charging points allocated along each urban road (main or secondary) belonging to the city map. In the second problem, for a RLV, an optimal day-ahead consumption profile is planned. This consumption profile is obtained through a demand side management (DSM) program that considers a tariff structure and an operating scheme that optimizes the energy resources coming from the HPS and the EDS. The third problem aims at minimizing the costs of energy purchase (consumption and active energy losses) of the company, taking into account the demand control of the end-uses present in the consumer units (residential, commercial, and industrial) in the EDS. Uncertainties in the use of the end-uses in the different consumer units are simulated through a Monte Carlo algorithm that determines a habitual consumption profile for EDSs. Based on this habitual profile, the proposed MIPRQ model determines an optimal profile for EDSs. This model uses an iterative process that aims to improve the load factor of the EDS. For each optimization problem the corresponding non-linear mixed integer programming (NLMIP) model is presented. On the other hand, via these nonlinear models, the global optimal solution is not guaranteed, while using the equivalent mixed-integer linear (MILP) models (for the first and second problems, being an approximate model for the third) and solving them by existing classical optimization tools ensures convergence to the global optimal solution. Therefore, in order to address this drawback, their equivalent mixed integer linear programming (MILP) models are obtained and explained in detail. The proposed models are implemented in the algebraic modeling language AMPL and solved using the commercial CPLEX solver. Moreover, simulation algorithms to represent the uncertainties of delay times in the operation of EVs and the usage habits of end-uses during the day are developed. A multidirectional graph with 71 nodes, an IEEE 34-node electrical network, and 21 residential end-uses (including a plug-in EV for residential charging) are used to test the precision and the efficiency, as well as the solution technique, of the models proposed for each problem. / CNPq: 141462/2013-2
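Since the third problem turns on the load factor (average demand divided by peak demand), the short C++ sketch below shows, with an invented hourly demand profile, how shifting a flexible end-use out of the peak hour improves that ratio. It is a back-of-the-envelope illustration of the objective, not one of the MILP models developed in the thesis.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Load factor = average demand / peak demand; shifting a flexible end-use out
// of the peak hour raises the factor even though total energy is unchanged.
double loadFactor(const std::vector<double>& hourly) {
    double peak = *std::max_element(hourly.begin(), hourly.end());
    double avg = 0.0;
    for (double d : hourly) avg += d;
    avg /= hourly.size();
    return avg / peak;
}

int main() {
    std::vector<double> demand = {2, 2, 2, 3, 6, 9, 4, 2};   // kW per hour, peak at hour 5
    std::cout << "before: " << loadFactor(demand) << "\n";   // ~0.42

    demand[5] -= 3;            // shift a 3 kW flexible appliance...
    demand[1] += 3;            // ...to an off-peak hour
    std::cout << "after:  " << loadFactor(demand) << "\n";   // ~0.63, higher load factor
    return 0;
}
```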
39

Squelettes algorithmiques pour la programmation et l'exécution efficaces de codes parallèles / Algorithmic skeletons for efficient programming and execution of parallel codes

Legaux, Joeffrey 13 December 2013 (has links)
Les architectures parallèles sont désormais présentes dans tous les matériels informatiques, mais les programmeurs ne sont généralement pas formés à leur programmation dans les modèles explicites tels que MPI ou les Pthreads. Il y a un besoin important de modèles plus abstraits tels que les squelettes algorithmiques qui sont une approche structurée. Ceux-ci peuvent être vus comme des fonctions d’ordre supérieur synthétisant le comportement d’algorithmes parallèles récurrents que le développeur peut ensuite combiner pour créer ses programmes. Les développeurs souhaitent obtenir de meilleures performances grâce aux programmes parallèles, mais le temps de développement est également un facteur très important. Les approches par squelettes algorithmiques fournissent des résultats intéressants dans ces deux aspects. La bibliothèque Orléans Skeleton Library ou OSL fournit un ensemble de squelettes algorithmiques de parallélisme de données quasi-synchrones dans le langage C++ et utilise des techniques de programmation avancées pour atteindre une bonne efficacité. Nous avons amélioré OSL afin de lui apporter de meilleures performances et une plus grande expressivité. Nous avons voulu analyser le rapport entre les performances des programmes et l’effort de programmation nécessaire sur OSL et d’autres modèles de programmation parallèle. La comparaison rigoureuse entre des programmes parallèles dans OSL et leurs équivalents de bas niveau montre une bien meilleure productivité pour les modèles de haut niveau qui offrent une grande facilité d’utilisation tout en produisant des performances acceptables. / Parallel architectures have now reached every computing device, but software developers generally lack the skills to program them through explicit models such as MPI or the Pthreads. There is a need for more abstract models such as the algorithmic skeletons, which are a structured approach. They can be viewed as higher order functions that represent the behaviour of common parallel algorithms, and those are combined by the programmer to generate parallel programs. Programmers want to obtain better performances through the usage of parallelism, but the development time implied is also an important factor. Algorithmic skeletons provide interesting results in both those fields. The Orléans Skeleton Library or OSL provides a set of algorithmic skeletons for data parallelism within the bulk synchronous parallel model for the C++ language. It uses advanced metaprogramming techniques to obtain good performances. We improved OSL in order to obtain better performances from its generated programs, and extended its expressivity. We wanted to analyze the ratio between the performance of programs and the development effort needed within OSL and other parallel programming models. The comparison between parallel programs written within OSL and their equivalents in low level parallel models shows a better productivity for high level models: they are easy to use for the programmers while providing decent performances.
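To make the "skeleton as higher-order function" idea concrete, here is a small C++ sketch of a data-parallel map skeleton followed by a reduction, written for illustration with plain std::thread. It is not the OSL API and omits OSL's metaprogramming and BSP cost model; the user only supplies the sequential element function.

```cpp
#include <algorithm>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// A map skeleton: captures the recurring "apply f to every element in
// parallel" pattern so callers never touch threads directly.
template <typename T, typename F>
std::vector<T> parallel_map(const std::vector<T>& in, F f,
                            unsigned workers = std::thread::hardware_concurrency()) {
    if (workers == 0) workers = 1;
    std::vector<T> out(in.size());
    std::vector<std::thread> pool;
    size_t chunk = (in.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        size_t lo = w * chunk, hi = std::min(in.size(), lo + chunk);
        if (lo >= hi) break;
        pool.emplace_back([&, lo, hi] {
            for (size_t i = lo; i < hi; ++i) out[i] = f(in[i]);
        });
    }
    for (auto& t : pool) t.join();
    return out;
}

int main() {
    std::vector<double> v(1'000'000, 2.0);
    auto squared = parallel_map(v, [](double x) { return x * x; });
    double sum = std::accumulate(squared.begin(), squared.end(), 0.0);  // reduce step
    std::cout << "sum of squares: " << sum << "\n";                     // 4e6
    return 0;
}
```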
40

Etude de performances sur processeurs multicoeur : environnement d'exécution événementiel efficace et étude comparative de modèles de programmation / Performance studies on multicore processors : efficient event-driven runtime and programming models comparison

Geneves, Sylvain 05 April 2013 (has links)
Cette thèse traite des performances des serveurs de données en multi-cœur. Plus précisément nous nous intéressons au passage à l'échelle avec le nombre de cœurs. Dans un premier temps, nous étudions le fonctionnement interne d'un support d'exécution événementiel multi-cœur. Nous montrons tout d'abord que le faux-partage ainsi que les mécanismes de communications inter-cœurs dégradent fortement les performances et empêchent le passage à l'échelle des applications. Nous proposons alors plusieurs optimisations pour pallier ces comportements. Dans un second temps, nous comparons les performances en multi-cœur de trois serveurs Web chacun représentatif d'un modèle de programmation. Nous remarquons que les différences de performances observées entre les serveurs varient lorsque le nombre de cœurs augmente. Après une analyse approfondie des performances observées, nous identifions la cause de la limitation du passage à l'échelle des serveurs étudiés. Nous présentons une proposition ainsi qu'un ensemble de pistes pour lever cette limitation. / This thesis studies the performances of data servers on multicores. More precisely, we focus on the scalability with the number of cores. First, we study the internals of an event-driven multicore runtime. We demonstrate that false sharing and inter-core communications hurt performances badly, and prevent applications from scaling. We then propose several optimisations to fix these issues. In a second part, we compare the multicore performances of three Web servers, each representative of a programming model. We observe that the differences between each server's performances vary as the number of cores increases. We are able to pinpoint the cause of the scalability limitation observed. We present one approach and some perspectives to overcome this limit.
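False sharing, called out above as one of the scalability killers, is easy to demonstrate in isolation: the sketch below times two threads incrementing two counters that either share a cache line or are padded onto separate lines. It is an illustrative micro-benchmark assuming 64-byte cache lines, unrelated to the thesis's actual runtime.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

// Two independent per-thread counters: when they share a cache line the
// threads fight over that line; aligning each counter to its own 64-byte
// line removes the interference.
struct Packed { long a = 0; long b = 0; };                          // same cache line
struct Padded { alignas(64) long a = 0; alignas(64) long b = 0; };  // one line each

template <typename Counters>
long long run() {
    Counters c;
    auto start = std::chrono::steady_clock::now();
    std::thread t1([&] { for (int i = 0; i < 50'000'000; ++i) ++c.a; });
    std::thread t2([&] { for (int i = 0; i < 50'000'000; ++i) ++c.b; });
    t1.join(); t2.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - start).count();
}

int main() {
    std::cout << "packed (false sharing): " << run<Packed>() << " ms\n";
    std::cout << "padded (independent)  : " << run<Padded>() << " ms\n";
    return 0;
}
```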
