1

Compiler-driven data layout transformations for network applications

Fenacci, Damon January 2012 (has links)
This work approaches the little-studied topic of compiler optimisations directed at network applications. It starts by investigating whether there are fundamental differences between application domains that justify the development and tuning of domain-specific compiler optimisations. It presents an automated approach capable of identifying domain-specific workload characterisations and presenting them in a readily interpretable format based on decision trees. The generated workload profiles summarise key resource utilisation issues and enable compiler engineers to address the highlighted bottlenecks. Applying this methodology to data-intensive network infrastructure applications shows that data organisation is the key obstacle to overcome in order to achieve high performance. The thesis therefore proposes and evaluates three specialised data transformations (structure splitting, array regrouping, and software caching) against the industrial EEMBC networking benchmarks and real-world data sets. It demonstrates, on the one hand, that speedups of up to 2.62 can be achieved but, on the other, that no single solution performs equally well across different network traffic scenarios. To address this issue, an adaptive software caching scheme for high-frequency route lookup operations is introduced and evaluated, again against the EEMBC networking benchmarks and real-world data sets, achieving speedups of up to 3.30 and 2.27. The results clearly demonstrate that adaptive data organisation schemes are necessary to ensure optimal performance under varying network loads. Finally, this research addresses a further issue introduced by data transformations such as array regrouping and software caching: the need for static analysis to allow efficient resource allocation. The thesis proposes a static code analyser that enables automatic resource analysis of source code containing list and tree structures. The tool applies a combination of amortised analysis and separation logic to real code and is able to evaluate the type and resource usage of existing data structures, which can be used to compute global resource consumption values for full data-intensive network applications.
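Structure splitting, the first of the three transformations above, separates the fields accessed on the hot path from those accessed rarely, so that each cache line fetched during a lookup holds more useful data. A minimal sketch in C, with illustrative field names and a routing-table context that are assumptions rather than details taken from the thesis:

    #include <stdint.h>

    /* Before: every route lookup drags the rarely used fields into cache. */
    struct route_entry {
        uint32_t prefix;      /* hot: compared on every lookup           */
        uint32_t next_hop;    /* hot: returned on a match                */
        uint64_t byte_count;  /* cold: only touched by statistics code   */
        char     ifname[16];  /* cold: only touched when printing routes */
    };

    /* After: hot fields are packed densely, so each cache line holds
     * more usable entries during the lookup loop. */
    struct route_hot  { uint32_t prefix; uint32_t next_hop; };
    struct route_cold { uint64_t byte_count; char ifname[16]; };

    struct route_table {
        struct route_hot  *hot;   /* indexed in the lookup loop */
        struct route_cold *cold;  /* same index, touched rarely */
    };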
2

Optimised Ray Tracing for the SuperNEC Implementation of the Uniform Theory of Diffraction

Hartleb, Robert 26 February 2007 (has links)
Student Number : 0006329K - MSc(Eng) Dissertation - School of Electrical and Information Engineering - Faculty of Engineering and the Built Environment / Geometric optimisations are presented for the UTD in SuperNEC, a commercial electromagnetic software package. Path-finding optimisations rapidly find the propagation paths of electromagnetic waves by using back-face culling to determine the visible plates of polyhedral structures, and by using reflection and diffraction zones, which apply image theory and the law of diffraction to determine the illuminated spatial regions. An octree reduces the number of intersection tests performed during the shadow tests. Numerical results show that, overall, the optimisations halve the run time of the software for models consisting of plates and cylinders. The path-finding optimisations do not scale with model size, are limited to plates, and introduce errors: the mean absolute error due to them is on average 0.02 dB for first-order rays and 0.17 dB for second-order rays. The octree optimisation scales with model size, can be used with any geometry and any type of ray, and does not cause errors.
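The back-face culling step can be pictured with a simple geometric test: a flat plate can only be illuminated by the source if the source lies on the side its outward normal points towards. A minimal sketch in C (the vector type and plate representation are assumptions, not SuperNEC's actual data structures):

    typedef struct { double x, y, z; } vec3;

    static double dot(vec3 a, vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    static vec3 sub(vec3 a, vec3 b) { return (vec3){a.x-b.x, a.y-b.y, a.z-b.z}; }

    /* Returns 1 if the plate faces the source and must be kept; plates
     * failing this test are culled before any ray-path search. */
    static int plate_faces_source(vec3 plate_centre, vec3 plate_normal, vec3 source)
    {
        vec3 to_source = sub(source, plate_centre);
        return dot(plate_normal, to_source) > 0.0;  /* front-facing */
    }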
3

Compiler optimisations and relaxed memory consistency models

Morisset, Robin 05 April 2017 (has links)
Modern multiprocessor architectures and programming languages exhibit weakly consistent memories. Their behaviour is formalised by the memory model of the architecture or programming language; it precisely defines which write operation can be returned by each shared-memory read. This is not always the latest store to the same variable, because of optimisations in the processors, such as speculative execution of instructions, the complex effects of caches, and optimisations in the compilers. In this thesis we focus on the C11 memory model, defined by the 2011 edition of the C standard. Our contributions are threefold. First, we focused on the theory surrounding the C11 model, formally studying which compiler optimisations it enables. We show that many common compiler optimisations are allowed but, surprisingly, some important ones are forbidden. Secondly, building on these results, we developed a random-testing methodology for detecting when mainstream compilers such as GCC or Clang perform an optimisation that is incorrect with respect to the memory model. We found several bugs in GCC, all promptly fixed. We also implemented a novel optimisation pass in LLVM that looks for the special instructions that restrict processor optimisations - called fence instructions - and eliminates the redundant ones. Finally, we developed a user-level scheduler for lightweight threads communicating through first-in first-out single-producer single-consumer queues. This programming model is known as Kahn process networks, and we show how to implement it efficiently using C11 synchronisation primitives. This demonstrates that, despite its flaws, C11 is usable in practice.
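The first-in first-out single-producer single-consumer queues mentioned above can be built directly on C11 atomics. A minimal sketch, assuming a power-of-two capacity and an int payload; the names and layout are illustrative, not taken from the thesis:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define QCAP 1024              /* capacity, must be a power of two */

    typedef struct {
        _Atomic size_t head;       /* next slot to read  (consumer-owned) */
        _Atomic size_t tail;       /* next slot to write (producer-owned) */
        int buf[QCAP];
    } spsc_queue;

    /* Producer side: write the element, then publish it. */
    static bool spsc_push(spsc_queue *q, int v) {
        size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
        size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
        if (t - h == QCAP) return false;          /* queue full */
        q->buf[t % QCAP] = v;
        /* release: the write to buf happens-before the consumer's acquire */
        atomic_store_explicit(&q->tail, t + 1, memory_order_release);
        return true;
    }

    /* Consumer side: the acquire load pairs with the producer's release. */
    static bool spsc_pop(spsc_queue *q, int *out) {
        size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
        size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h == t) return false;                 /* queue empty */
        *out = q->buf[h % QCAP];
        atomic_store_explicit(&q->head, h + 1, memory_order_release);
        return true;
    }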
4

Erbium : Reconciling languages, runtimes, compilation and optimizations for streaming applications

Miranda, Cupertino 11 February 2013 (has links)
As transistor scaling and power limitations hit the computer industry, hardware parallelism emerged as the solution, bringing old, long-forgotten problems back into the equation. Compilers regain focus as the most relevant piece of the puzzle in the quest for the performance improvements predicted by Moore's law, which are no longer attainable without parallelism. Research on parallel systems has mainly focused on language and architectural aspects, without giving the necessary attention to compiler problems; this is why many parallel languages and architectures have weak compiler support and cannot exploit the hardware to the fullest. This thesis addresses these problems by presenting: Erbium, a low-level streaming data-flow language supporting multiple-producer multiple-consumer task communication; a very efficient runtime implementation for x86 architectures, with variants for other types of architectures; an integration of the language into a compiler, illustrated as an intermediate representation in GCC; and a study of the dependences of the language primitives, allowing compilers to further optimise Erbium code, not only through parallel-specific transformations but also through generalised forms of traditional compiler optimisations, such as partial redundancy elimination and dead code elimination.
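As an illustration of the classical optimisations the thesis generalises to parallel code, here is a before/after view of partial redundancy elimination in plain C (a hand-written example, not Erbium code):

    /* Before: a+b is computed on one path and again unconditionally,
     * so the expression is partially redundant. */
    int before(int a, int b, int cond) {
        int r = 0;
        if (cond)
            r = a + b;        /* a+b computed here ...                  */
        return r + (a + b);   /* ... and here on every path             */
    }

    /* After: PRE hoists the expression to a single evaluation. */
    int after(int a, int b, int cond) {
        int t = a + b;
        int r = 0;
        if (cond)
            r = t;
        return r + t;
    }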
5

Low-cost memory analyses for efficient compilers

Maalej Kammoun, Maroua 26 September 2017 (has links)
This thesis was motivated by the emergence of massively parallel processing and supercomputing. Speed, power consumption, and the efficiency of both software and hardware are nowadays the main concerns of the information systems community. Handling memory in a correct and efficient way is a step towards less complex and better-performing programs and architectures. This thesis falls into this context and contributes to the fields of memory analysis and compilation, in both their theoretical and experimental aspects. Besides a deep study of the current state of the art of memory analyses and their limitations, our theoretical results consist in designing new algorithms that recover part of the imprecision that published techniques still show. Among the present limitations, we focus our research on pointer arithmetic, to disambiguate pointers within the same data structure. We develop our analyses in the abstract interpretation framework. The key idea behind this choice is correctness and scalability: two requisite criteria for analyses to be embedded in a compiler. The first alias analysis we design is based on the range lattice of integer variables: given a pair of pointers defined from a common base pointer, they are disjoint if their offsets cannot have values that intersect at runtime. The second pointer analysis we develop is inspired by the Pentagon abstract domain: we conclude that two pointers do not alias whenever we are able to build a strict relation between them, valid at program points where the two variables are simultaneously alive. In a third algorithm, we combine the first and second analyses and enhance them with a coarse-grained but efficient analysis to deal with unrelated pointers. We implement these analyses on top of the LLVM compiler. We experiment with and evaluate their performance based on two metrics: the number of disambiguated pairs of pointers compared to common analyses of the compiler, and the optimisations further enabled thanks to the extra precision they introduce.
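The first analysis reduces alias disambiguation to an interval-intersection test. A minimal sketch, assuming element offsets from a common base pointer whose possible values have been bounded by a prior range analysis (the interval type is an invention for this sketch):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { int64_t lo, hi; } range;   /* inclusive interval */

    /* p = base + i, q = base + j, with i in ri and j in rj.
     * If the two intervals are disjoint, p and q can never point to the
     * same element, so accesses through them may be reordered or
     * vectorised safely. */
    static bool offsets_may_alias(range ri, range rj)
    {
        return ri.lo <= rj.hi && rj.lo <= ri.hi;  /* intervals intersect */
    }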
6

Optimization of the synthesis of a key intermediate in glucocorticoids preparation

Jouve, Romain 03 January 2017 (has links)
This thesis aims to reduce the industrial production cost of decortidiene, a key intermediate in the preparation of several active pharmaceutical ingredients manufactured at the SANOFI Vertolaye plant. After a review of the state of the art of decortidiene synthesis, and in light of the defined specifications, two routes were explored. The first synthetic route studied produced decortidiene in two steps, whereas the industrial synthesis requires four; however, the encouraging yield obtained fell short of the target. A molecular modelling study allowed a reaction mechanism to be proposed for this route, during which we also identified the formation of a new corticosteroid with anti-inflammatory activity comparable to that of prednisolone. The second synthetic route reduces the number of steps in the decortidiene synthesis by 25% and achieves a yield acceptable with respect to the specifications. The process was refined over several series using a design-of-experiments approach to reach these objectives.
7

Use of directional modes in resolution

Oudot, Olivier 30 November 1987 (has links) (PDF)
This study demonstrates the usefulness of directional modes for optimising resolution in the Prolog language. Their key characteristic is that they make it possible to distinguish between the different possible uses of a single predicate. An algorithm for producing these modes automatically is described. The study culminates in the Starlog compiler, which operates on a particular case of directional modes.
8

Semantically-enabled stream processing and complex event processing over RDF graph streams

Gillani, Syed 04 November 2016 (has links)
There is a paradigm shift in the nature and processing of today's data: data used to be mostly static, stored in large databases to be queried. Today, with the advent of new applications and means of collecting data, most applications on the Web and in enterprises produce data in a continuous manner, in the form of streams, and the users of these applications expect to process large volumes of data with fresh, low-latency results. This has led to the introduction of Data Stream Management Systems (DSMSs) and the Complex Event Processing (CEP) paradigm - each with a distinctive aim: DSMSs are mostly employed to process traditional query operators (mostly stateless), while CEP systems focus on temporal pattern matching (stateful operators) to detect changes in the data that can be thought of as events. In the past decade or so, a number of scalable, performance-intensive DSMSs and CEP systems have been proposed. Most of them, however, are based on relational data models, which raises the question of support for heterogeneous data sources, i.e., the variety of the data. Work on RDF stream processing (RSP) systems partly addresses the challenge of variety by promoting the RDF data model. Nonetheless, challenges like volume and velocity are overlooked by existing approaches. These challenges require customised optimisations that treat RDF as a first-class citizen and scale the process of continuous graph pattern matching. To gain insight into these problems, this thesis focuses on developing scalable RDF graph stream processing and semantically-enabled CEP systems (i.e., Semantic Complex Event Processing, SCEP). In addition to our optimised algorithmic and data-structure methodologies, we also contribute the design of a new query language for SCEP. Our contributions in these two fields are as follows:
• RDF Graph Stream Processing. We first propose an RDF graph stream model, where each data item/event within a stream comprises an RDF graph (a set of RDF triples). Second, we implement customised indexing techniques and data structures to continuously process RDF graph streams in an incremental manner.
• Semantic Complex Event Processing. We extend the idea of RDF graph stream processing to enable SCEP over such RDF graph streams, i.e., temporal pattern matching. Our first contribution in this context is a new query language that encompasses the RDF graph stream model and employs a set of expressive temporal operators such as sequencing, Kleene-+, negation, optional, conjunction, disjunction, and event-selection strategies. Based on this, we implement a scalable system that employs a non-deterministic finite automaton model to evaluate these operators in an optimised manner.
We leverage techniques from diverse fields, such as relational query optimisation, incremental query processing, and sensor and social networks, in order to solve real-world problems. We have applied our proposed techniques to a wide range of real-world and synthetic datasets to extract knowledge from RDF-structured data in motion. Our experimental evaluations confirm our theoretical insights and demonstrate the viability of our proposed methods.
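The automaton-based evaluation of a sequencing operator can be pictured as a set of partial matches advanced in parallel. A minimal C illustration of SEQ matching with a skip-till-next-match policy; the event type, the character-coded pattern, and all names are inventions for this sketch, not the thesis's query language:

    #include <stdio.h>

    typedef struct { char kind; int payload; } event;

    #define MAXRUNS 64
    static int runs[MAXRUNS];   /* each run holds the index of the next symbol it expects */
    static int nruns = 0;

    /* Feed one event; assumes the pattern has at least two symbols. */
    static void feed(const char *pattern, int plen, event e)
    {
        int kept = 0;
        /* Advance partial matches; under skip-till-next-match a
         * non-matching event leaves a run waiting rather than killing it. */
        for (int i = 0; i < nruns; i++) {
            int next = runs[i] + (pattern[runs[i]] == e.kind);
            if (next == plen)
                printf("match completed by event (%c,%d)\n", e.kind, e.payload);
            else
                runs[kept++] = next;    /* keep the (possibly advanced) run */
        }
        nruns = kept;
        /* Non-determinism: every event may also start a fresh run. */
        if (e.kind == pattern[0] && nruns < MAXRUNS)
            runs[nruns++] = 1;
    }

    int main(void)
    {
        event stream[] = {{'a',1},{'x',2},{'b',3},{'a',4},{'c',5},{'b',6},{'c',7}};
        for (size_t i = 0; i < sizeof stream / sizeof stream[0]; i++)
            feed("abc", 3, stream[i]);   /* reports two completed matches */
        return 0;
    }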
9

Continuous steepest descent path for traversing non-convex regions

Beddiaf, Salah January 2016 (has links)
In this thesis, we investigate methods of finding a local minimum for unconstrained problems of non-convex functions with n variables, by following the solution curve of a system of ordinary differential equations. The motivation for this was the fact that existing methods (e.g. those based on Newton methods with line search) sometimes terminate at a non-stationary point when applied to functions f(x) whose Hessian is not positive definite (i.e. ∇²f(x) ≻ 0 does not hold for all x). Even when such methods do terminate at a stationary point, it could be a saddle point or a maximum rather than a minimum. The only method which makes intuitive sense in a non-convex region is the trust region approach, in which we seek a step that minimises a quadratic model subject to a restriction on the two-norm of the step size. This gives a well-defined search direction, but at the expense of a costly evaluation. The algorithms derived in this thesis are gradient-based methods which require systems of equations to be solved at each step but which do not use a line search in the usual sense. Progress along the Continuous Steepest Descent Path (CSDP) is governed both by the decrease in the function value and by measures of accuracy of a local quadratic model. Numerical results on specially constructed test problems and a number of standard test problems from CUTEr [38] show that the approaches we have considered are more promising than routines in the optimization toolbox of MATLAB [46], namely the trust region method and the quasi-Newton method. In particular, they perform well in comparison with the superficially similar gradient-flow method proposed by Behrman [7].
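Following the continuous steepest descent path amounts to integrating the gradient flow x'(t) = -∇f(x(t)) until a stationary point is reached. A minimal sketch using explicit Euler steps on Himmelblau's non-convex test function; the adaptive step-size control and quadratic-model accuracy tests described above are omitted:

    #include <math.h>
    #include <stdio.h>

    /* f(x,y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2  (Himmelblau's function) */
    static void grad(double x, double y, double g[2])
    {
        g[0] = 4*x*(x*x + y - 11) + 2*(x + y*y - 7);
        g[1] = 2*(x*x + y - 11) + 4*y*(x + y*y - 7);
    }

    int main(void)
    {
        double x = 0.0, y = 0.0, g[2];
        const double h = 0.01;                     /* fixed Euler step along the path */
        for (int k = 0; k < 10000; k++) {
            grad(x, y, g);
            if (hypot(g[0], g[1]) < 1e-8) break;   /* stationary point reached */
            x -= h * g[0];                         /* follow -grad f */
            y -= h * g[1];
        }
        printf("converged near (%.4f, %.4f)\n", x, y);
        return 0;
    }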
