Spelling suggestions: "subject:"pode optimization"" "subject:"mode optimization""
1 |
Code optimizations for narrow bitwidth architecturesBhagat, Indu 23 February 2012 (has links)
This thesis takes a HW/SW collaborative approach to tackle the problem of computational inefficiency in a holistic manner.
The hardware is redesigned by restraining the datapath to merely 16-bit datawidth (integer datapath only) to provide an
extremely simple, low-cost, low-complexity execution core which is best at executing the most common case efficiently. This
redesign, referred to as the Narrow Bitwidth Architecture, is unique in that although the datapath is squeezed to 16-bits, it
continues to offer the advantage of higher memory addressability like the contemporary wider datapath architectures. Its
interface to the outside (software) world is termed as the Narrow ISA. The software is responsible for efficiently mapping the
current stack of 64-bit applications onto the 16-bit hardware. However, this HW/SW approach introduces a non-negligible
penalty both in dynamic code-size and performance-impact even with a reasonably smart code-translator that maps the 64-
bit applications on to the 16-bit processor.
The goal of this thesis is to design a software layer that harnesses the power of compiler optimizations to assuage this
negative performance penalty of the Narrow ISA. More specifically, this thesis focuses on compiler optimizations targeting the
problem of how to compile a 64-bit program to a 16-bit datapath machine from the perspective of Minimum Required
Computations (MRC). Given a program, the notion of MRC aims to infer how much computation is really required to generate
the same (correct) output as the original program.
Approaching perfect MRC is an intrinsically ambitious goal and it requires oracle predictions of program behavior. Towards
this end, the thesis proposes three heuristic-based optimizations to closely infer the MRC. The perspective of MRC unfolds
into a definition of productiveness - if a computation does not alter the storage location, it is non-productive and hence, not
necessary to be performed. In this research, the definition of productiveness has been applied to different granularities of the
data-flow as well as control-flow of the programs.
Three profile-based, code optimization techniques have been proposed :
1. Global Productiveness Propagation (GPP) which applies the concept of productiveness at the granularity of a function.
2. Local Productiveness Pruning (LPP) applies the same concept but at a much finer granularity of a single instruction.
3. Minimal Branch Computation (MBC) is an profile-based, code-reordering optimization technique which applies the
principles of MRC for conditional branches.
The primary aim of all these techniques is to reduce the dynamic code footprint of the Narrow ISA. The first two optimizations
(GPP and LPP) perform the task of speculatively pruning the non-productive (useless) computations using profiles. Further,
these two optimization techniques perform backward traversal of the optimization regions to embed checks into the nonspeculative
slices, hence, making them self-sufficient to detect mis-speculation dynamically.
The MBC optimization is a use case of a broader concept of a lazy computation model. The idea behind MBC is to reorder the
backslices containing narrow computations such that the minimal necessary computations to generate the same (correct)
output are performed in the most-frequent case; the rest of the computations are performed only when necessary.
With the proposed optimizations, it can be concluded that there do exist ways to smartly compile a 64-bit application to a 16-
bit ISA such that the overheads are considerably reduced. / Esta tesis deriva su motivación en la inherente ineficiencia computacional de los procesadores actuales: a pesar de que
muchas aplicaciones contemporáneas tienen unos requisitos de ancho de bits estrechos (aplicaciones de enteros, de red y
multimedia), el hardware acaba utilizando el camino de datos completo, utilizando más recursos de los necesarios y
consumiendo más energía.
Esta tesis utiliza una aproximación HW/SW para atacar, de forma íntegra, el problema de la ineficiencia computacional. El
hardware se ha rediseñado para restringir el ancho de bits del camino de datos a sólo 16 bits (únicamente el de enteros) y
ofrecer así un núcleo de ejecución simple, de bajo consumo y baja complejidad, el cual está diseñado para ejecutar de
forma eficiente el caso común. El rediseño, llamado en esta tesis Arquitectura de Ancho de Bits Estrecho (narrow bitwidth
en inglés), es único en el sentido que aunque el camino de datos se ha estrechado a 16 bits, el sistema continúa
ofreciendo las ventajas de direccionar grandes cantidades de memoria tal como procesadores con caminos de datos más
anchos (64 bits actualmente). Su interface con el mundo exterior se denomina ISA estrecho. En nuestra propuesta el
software es responsable de mapear eficientemente la actual pila software de las aplicaciones de 64 bits en el hardware de
16 bits. Sin embargo, esta aproximación HW/SW introduce penalizaciones no despreciables tanto en el tamaño del código
dinámico como en el rendimiento, incluso con un traductor de código inteligente que mapea las aplicaciones de 64 bits en
el procesador de 16 bits.
El objetivo de esta tesis es el de diseñar una capa software que aproveche la capacidad de las optimizaciones para reducir
el efecto negativo en el rendimiento del ISA estrecho. Concretamente, esta tesis se centra en optimizaciones que tratan el
problema de como compilar programas de 64 bits para una máquina de 16 bits desde la perspectiva de las Mínimas
Computaciones Requeridas (MRC en inglés). Dado un programa, la noción de MRC intenta deducir la cantidad de cómputo
que realmente se necesita para generar la misma (correcta) salida que el programa original.
Aproximarse al MRC perfecto es una meta intrínsecamente ambiciosa y que requiere predicciones perfectas de
comportamiento del programa. Con este fin, la tesis propone tres heurísticas basadas en optimizaciones que tratan de
inferir el MRC. La utilización de MRC se desarrolla en la definición de productividad: si un cálculo no altera el dato que ya
había almacenado, entonces no es productivo y por lo tanto, no es necesario llevarlo a cabo.
Se han propuesto tres optimizaciones del código basadas en profile:
1. Propagación Global de la Productividad (GPP en inglés) aplica el concepto de productividad a la granularidad de función.
2. Poda Local de Productividad (LPP en inglés) aplica el mismo concepto pero a una granularidad mucho más fina, la de
una única instrucción.
3. Computación Mínima del Salto (MBC en inglés) es una técnica de reordenación de código que aplica los principios de
MRC a los saltos condicionales.
El objetivo principal de todas esta técnicas es el de reducir el tamaño dinámico del código estrecho. Las primeras dos
optimizaciones (GPP y LPP) realizan la tarea de podar especulativamente las computaciones no productivas (innecesarias)
utilizando profiles. Además, estas dos optimizaciones realizan un recorrido hacia atrás de las regiones a optimizar para
añadir chequeos en el código no especulativo, haciendo de esta forma la técnica autosuficiente para detectar,
dinámicamente, los casos de fallo en la especulación.
La idea de la optimización MBC es reordenar las instrucciones que generan el salto condicional tal que las mínimas
computaciones que general la misma (correcta) salida se ejecuten en la mayoría de los casos; el resto de las
computaciones se ejecutarán sólo cuando sea necesario.
|
2 |
Code optimization and detection of script conflicts in video gamesYang, Yi 11 1900 (has links)
Scripting languages have gained popularity in video games for specifying the interactive content in a story. Game designers do not necessarily possess programming skills and often demand code-generating tools that can transform textual or graphical descriptions of interactions into scripts interpreted by the game engine. However, in event-based games, this code generation process may lead to potential inefficiencies and conflicts if there are multiple independent sources generating scripts for the same event. This thesis presents solutions to both perils: transformations to eliminate redundancies in the generated scripts and an advisory tool to provide assistance in detecting unintended conflicts. By incorporating traditional compiler techniques with an original code-redundancy-elimination approach, the code transformation is able to reduce code size by 25% on scripts and 14% on compiled byte-codes. With the proposed alternative view, the advisory tool is suitable for offering aid to expose potential script conflicts.
|
3 |
Code optimization and detection of script conflicts in video gamesYang, Yi Unknown Date
No description available.
|
4 |
Optimalizace rozsáhlých aplikací / Optimizing large applicationsLiška, Martin January 2013 (has links)
Both uppermost open source compilers, GCC and LLVM, are mature enough to link-time optimize large applications. In case of large applications, we must take into account, except standard speed efficiency and memory consumption, different aspects. We focus on size of the code, cold start-up time, etc. Developers of applications often come up with ad-hoc solutions such as Elfhack utility, start-up of an application via a pre-loading utility and dlopen; prelinking and variety of different tools that reorder functions to fit the order of execution. The goal of the thesis is to analyse all existing techniques of optimization, evaluate their efficiency and design new solutions based on the link-time optimization platform. Powered by TCPDF (www.tcpdf.org)
|
5 |
RISC-V Compiler Performance:A Comparison between GCC and LLVM/clangBjäreholt, Johan January 2017 (has links)
RISC-V is a new open-source instruction set architecture (ISA) that in De-cember 2016 manufactured its rst mass-produced processors. It focuses onboth eciency and performance and diers from other open-source architec-tures by not having a copyleft license permitting vendors to freely design,manufacture and sell RISC-V chips without any fees nor having to sharetheir modications on the reference implementations of the architecture.The goal of this thesis is to evaluate the performance of the GCC andLLVM/clang compilers support for the RISC-V target and their ability tooptimize for the architecture. The performance will be evaluated from ex-ecuting the CoreMark and Dhrystone benchmarks are both popular indus-try standard programs for evaluating performance on embedded processors.They will be run on both the GCC and LLVM/clang compilers on dierentoptimization levels and compared in performance per clock to the ARM archi-tecture which is mature yet rather similar to RISC-V. The compiler supportfor the RISC-V target is still in development and the focus of this thesis willbe the current performance dierences between the GCC and LLVM com-pilers on this architecture. The platform we will execute the benchmarks onwil be the Freedom E310 processor on the SiFive HiFive1 board for RISC-Vand a ARM Cortex-M4 processor by Freescale on the Teensy 3.6 board. TheFreedom E310 is almost identical to the reference Berkeley Rocket RISC-Vdesign and the ARM Coretex-M4 processor has a similar clock speed and isaimed at a similar target audience.The results presented that the -O2 and -O3 optimization levels on GCCfor RISC-V performed very well in comparison to our ARM reference. Onthe lower -O1 optimization level and -O0 which is no optimizations and -Oswhich is -O0 with optimizations for generating a smaller executable code sizeGCC performs much worse than ARM at 46% of the performance at -O1,8.2% at -Os and 9.3% at -O0 on the CoreMark benchmark with similar resultsin Dhrystone except on -O1 where it performed as well as ARM. When turn-ing o optimizations (-O0) GCC for RISC-V was 9.2% of the performanceon ARM in CoreMark and 11% in Dhrystone which was unexpected andneeds further investigation. LLVM/clang on the other hand crashed whentrying to compile our CoreMark benchmark and on Dhrystone the optimiza-tion options made a very minor impact on performance making it 6.0% theperformance of GCC on -O3 and 5.6% of the performance of ARM on -O3, soeven with optimizations it was still slower than GCC without optimizations.In conclusion the performance of RISC-V with the GCC compiler onthe higher optimization levels performs very well considering how young theRISC-V architecture is. It does seems like there could be room for improvement on the lower optimization levels however which in turn could also pos-sibly increase the performance of the higher optimization levels. With theLLVM/clang compiler on the other hand a lot of work needs to be done tomake it competetive in both performance and stability with the GCC com-piler and other architectures. Why the -O0 optimization is so considerablyslower on RISC-V than on ARM was also very unexpected and needs furtherinvestigation.
|
6 |
ML4JIT- um arcabouço para pesquisa com aprendizado de máquina em compiladores JIT. / ML4JIT - a framework for research on machine learning in JIT compilers.Mignon, Alexandre dos Santos 27 June 2017 (has links)
Determinar o melhor conjunto de otimizações para serem aplicadas a um programa tem sido o foco de pesquisas em otimização de compilação por décadas. Em geral, o conjunto de otimizações é definido manualmente pelos desenvolvedores do compilador e aplicado a todos os programas. Técnicas de aprendizado de máquina supervisionado têm sido usadas para o desenvolvimento de heurísticas de otimização de código. Elas pretendem determinar o melhor conjunto de otimizações com o mínimo de interferência humana. Este trabalho apresenta o ML4JIT, um arcabouço para pesquisa com aprendizado de máquina em compiladores JIT para a linguagem Java. O arcabouço permite que sejam realizadas pesquisas para encontrar uma melhor sintonia das otimizações específica para cada método de um programa. Experimentos foram realizados para a validação do arcabouço com o objetivo de verificar se com seu uso houve uma redução no tempo de compilação dos métodos e também no tempo de execução do programa. / Determining the best set of optimizations to be applied in a program has been the focus of research on compile optimization for decades. In general, the set of optimization is manually defined by compiler developers and apply to all programs. Supervised machine learning techniques have been used for the development of code optimization heuristics. They intend to determine the best set of optimization with minimal human intervention. This work presents the ML4JIT, a framework for research with machine learning in JIT compilers for Java language. The framework allows research to be performed to better tune the optimizations specific to each method of a program. Experiments were performed for the validation of the framework with the objective of verifying if its use had a reduction in the compilation time of the methods and also in the execution time of the program.
|
7 |
ML4JIT- um arcabouço para pesquisa com aprendizado de máquina em compiladores JIT. / ML4JIT - a framework for research on machine learning in JIT compilers.Alexandre dos Santos Mignon 27 June 2017 (has links)
Determinar o melhor conjunto de otimizações para serem aplicadas a um programa tem sido o foco de pesquisas em otimização de compilação por décadas. Em geral, o conjunto de otimizações é definido manualmente pelos desenvolvedores do compilador e aplicado a todos os programas. Técnicas de aprendizado de máquina supervisionado têm sido usadas para o desenvolvimento de heurísticas de otimização de código. Elas pretendem determinar o melhor conjunto de otimizações com o mínimo de interferência humana. Este trabalho apresenta o ML4JIT, um arcabouço para pesquisa com aprendizado de máquina em compiladores JIT para a linguagem Java. O arcabouço permite que sejam realizadas pesquisas para encontrar uma melhor sintonia das otimizações específica para cada método de um programa. Experimentos foram realizados para a validação do arcabouço com o objetivo de verificar se com seu uso houve uma redução no tempo de compilação dos métodos e também no tempo de execução do programa. / Determining the best set of optimizations to be applied in a program has been the focus of research on compile optimization for decades. In general, the set of optimization is manually defined by compiler developers and apply to all programs. Supervised machine learning techniques have been used for the development of code optimization heuristics. They intend to determine the best set of optimization with minimal human intervention. This work presents the ML4JIT, a framework for research with machine learning in JIT compilers for Java language. The framework allows research to be performed to better tune the optimizations specific to each method of a program. Experiments were performed for the validation of the framework with the objective of verifying if its use had a reduction in the compilation time of the methods and also in the execution time of the program.
|
8 |
Intercepting functions for memoization / Interception de fonctions pour la mémoïsationSuresh, Arjun 10 May 2016 (has links)
Nous avons proposé des mécanismes pour mettre en œuvre la mémoïsation de fonction au niveau logiciel dans le cadre de nos efforts pour améliorer les performances du code séquentiel. Nous avons analysé le potentiel de la mémoïsation de fonction sur des applications et le gain de performance qu'elle apporte sur des architectures actuelles. Nous avons proposé trois approches - une approche simple qui s'applique au chargement et qui fonctionne pour toute fonction de bibliothèque liée dynamiquement, une approche à la compilation utilisant LLVM qui peut permettre la mémoïsation pour toute fonction du programme, ainsi qu'une proposition d'implémentation de la mémoïsation en matériel et ses avantages potentiels. Nous avons démontré avec les fonctions transcendantales que l'approche au chargement est applicable et donne un bon avantage, même avec des architectures et des compilateurs (avec la restriction qu'elle ne peut être appliquée que pour les fonctions liées dynamiquement) modernes. Notre approche à la compilation étend la portée de la mémoïsation et en augmente également les bénéfices. Cela fonctionne pour les fonctions définies par l’utilisateur ainsi que pour les fonctions de bibliothèque. Nous pouvons gérer certains types de fonctions non pures comme les fonctions avec des arguments de type pointeur et l'utilisation de variables globales. La mémoïsation en matériel abaisse encore le seuil de profitabilité de la mémoïsation et donne plus de gain de performance en moyenne. / We have proposed mechanisms to implement function memoization at a software level as part of our effort to improve sequential code performance. We have analyzed the potential of function memoization on applications and its performance gain on current architectures. We have proposed three schemes - a simple load time approach which works for any dynamically linked function, a compile time approach using LLVM framework which can enable memoization for any program function and also a hardware proposal for doing memoization in hardware and its potential benefits. Demonstration of the link time approach with transcendental functions showed that memoization is applicable and gives good benefit even under modern architectures and compilers (with the restriction that it can be applied only for dynamically linked functions). Our compile time approach extends the scope of memoization and also increases the benefit due to memoization. This works for both user defined functions as well as library functions. It can handle certain kind of non pure functions like those functions with pointer arguments and global variable usage. Hardware memoization lowers the threshold for a function to be memoized and gives more performance gain on average.
|
9 |
Qit – A Web-Based Sign-Up ApplicationGutzen, Lucas, Wolf-Waltz, Björn, Björkman, Greta, Abramsson, Elias, Sahlström, Anton, Widigssoon, Eric, Lundin, Johanna, Dahlander, Alex, Albrekt, David January 2022 (has links)
Students wishing to attend student events, as well as the event organisers, currentlyface several inconveniences at the sign-up occasion, due to prevailing solutions’ lack ofdeep consideration for efficiency. This report investigates how a web-based sign-up appli-cation for student events can be implemented to handle high server traffic, by developingand testing a solution that allows for a quick sign-up process, payment directly at the sign-up occasion and the gathering of relevant attendee information.By considering code optimization strategies and the change of database from SQLite toMongoDB, based on the prevailing literature on the topic, a test script together with hightraffic testing was utilised to measure the relative performance of each of these implemen-tations, relative to the initial web-application. Results show that all four aspects tested weresignificant in terms of improving the handling of high traffic. Based on this, the conclusiondrawn was that all four aspects: time optimization, space optimization, minification, and adocument-based database, should be considered when improving a web-application thatallows for handling of high traffic. This finding confirms the prevailing literature on thetopic, however, the implementation phase also raised suggestions for future and improvedresearch on the topic. / Studenter som önskar gå på studentevent, likväl som eventorganisatörerna, möts idagsläget av flertalet olägenheten vid anmälningstillfället, eftersom nuvarande lösningarsaknar djupare hänsyn till effektivitet. Denna rapport undersöker hur en web-baserad an-mälningsapplikation för studentevent kan implementeras för att hantera hög servertrafik,genom att utveckla och testa en lösning som tillåter en snabb anmälningsprocess, betalningdirekt vid anmälningstillfället och insamling av relevant deltagarinformation. Genom att tahänsyn till kodoptimeringsstrategier och byte av databas från SQLite till MongoDB, baseratpå tidigare literatur om området, så används ett testskript och ett test för hög servertrafikför att mäta den relativa prestandan av vardera implementation, relativt till den initialaimplementationen. Resultatet visar att alla fyra undersökta aspekterna var signifikanta föratt förbättra hanteringen av hög trafik. Baserat på detta drogs slutsatsen att alla fyra aspek-terna: “time optimation”, “space optimization”, “minification” och en dokumentbaseraddatabas, borde tas hänsyn till när man implementerar en webbapplikation som tillåter förhantering av hög trafik. Denna upptäckt bekräftar den tidigare litteraturen om områden,men, implementationsfasen lyfte även förslag för framtida och förbättrad forskning på området.
|
10 |
Code optimization of speckle reduction algorithms for image processing of rocket motor hologramsKaeser, Dana S. 12 1900 (has links)
Approved for public release; distribution is unlimited / This thesis supplements and updates previous research completed in the digital analysis
of rocket motor combustion chamber holographic images. In particular this thesis deals with
the software code optimization of existing automatic data retrieval algorithms that are used to
extract useful particle information from the holograms using a microcomputer-based imaging
system. Two forms of optimization were accomplished, the application of an optimizing
FORTRAN compiler to the existing FORTRAN programs and the complete rewrite of the
programs in the C language using an optimizing compiler. The overall results achieved were
a reduction in executable program size and a significant decrease in program execution speed. / http://archive.org/details/codeoptimization00kaes / Lieutenant Commander, United States Navy
|
Page generated in 0.1461 seconds