21

FORTRAN Optimizations at the Source Code Level

Barber, Willie D. 08 1900 (has links)
This paper discusses FORTRAN optimizations that the user can perform manually at the source code level to improve object code performance. Descriptive examples are given throughout the text for explanation. The paper identifies key areas in writing a FORTRAN program and recommends ways to improve efficiency in each.
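As a flavor of the hand rewrites such a paper recommends, the sketch below shows loop-invariant code motion. It is an illustrative example written in C rather than FORTRAN, and is not taken from the paper itself:

    /* Before: x * y is recomputed on every iteration even though neither
       operand changes inside the loop. */
    void scale_before(double *a, int n, double x, double y) {
        for (int i = 0; i < n; i++)
            a[i] = a[i] * (x * y);
    }

    /* After: hoisting the invariant product out of the loop performs the
       multiply once instead of n times. Many compilers do this on their
       own, but doing it in the source guarantees it. */
    void scale_after(double *a, int n, double x, double y) {
        double xy = x * y;
        for (int i = 0; i < n; i++)
            a[i] = a[i] * xy;
    }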
22

Source code optimizations to reduce multi core and many core performance bottlenecks

Serpa, Matheus da Silva January 2018 (has links)
Nowadays, several different architectures are available not only to industry but also to final consumers. Traditional multi-core processors, GPUs, accelerators such as the Xeon Phi, and even energy-efficiency-driven processors such as the ARM family present very different architectural characteristics. This wide range of characteristics is a challenge for application developers, who must deal with different instruction sets, memory hierarchies, and even different programming paradigms when programming for these architectures. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. Related work offers a wide variety of solutions: most of it focuses on improving memory performance alone, while other work addresses load balancing, vectorization, and thread and data mapping, but performs them separately, losing optimization opportunities. In this master's thesis, we propose several optimization techniques to improve the performance of a real-world seismic exploration application provided by Petrobras, a multinational corporation in the petroleum industry. Our experiments show that loop interchange is a useful technique for improving the performance of the different cache levels, improving performance by up to 5.3× and 3.9× on the Intel Broadwell and Intel Knights Landing architectures, respectively. By changing the code to enable vectorization, performance was increased by up to 1.4× and 6.5×. Load balancing improved performance by up to 1.1× on Knights Landing. Thread and data mapping techniques were also evaluated, with a performance improvement of up to 1.6× and 4.4×. Comparing the best version on each architecture against a naive version, we improved the performance of Broadwell by 22.7× and of Knights Landing by 56.7×; in the end, Broadwell was 1.2× faster than Knights Landing.
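The loop-interchange gains above come from matching traversal order to memory layout. The following sketch is a generic illustration of the transformation for a row-major C array, not code from the thesis:

    #define N 1024

    /* Before: the inner loop walks down a column, so consecutive
       iterations touch addresses N doubles apart and miss in cache.
       col_sum is assumed zeroed by the caller. */
    void sum_cols_before(double a[N][N], double *col_sum) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                col_sum[j] += a[i][j];      /* stride-N access */
    }

    /* After interchanging the loops, the inner loop walks along a row:
       unit-stride accesses that the prefetcher and the vectorizer both
       like. The result is identical because the sums commute. */
    void sum_cols_after(double a[N][N], double *col_sum) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                col_sum[j] += a[i][j];      /* stride-1 access */
    }

The interchanged form also hands the compiler a unit-stride inner loop to vectorize, the same property the abstract's vectorization changes exploit.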
23

Run-time optimization of adaptive irregular applications

Yu, Hao 15 November 2004 (has links)
Compared to traditional compile-time optimization, run-time optimization can offer significant performance improvements when parallelizing and optimizing adaptive irregular applications, because it performs program analysis and adaptive optimizations during program execution. Run-time techniques can succeed where static techniques fail because they exploit the characteristics of the input data, the program's dynamic behavior, and the underlying execution environment. When optimizing adaptive irregular applications for parallel execution, a common observation is that the effectiveness of the optimizing transformations depends on the program's input data and its dynamic phases. This dissertation presents a set of run-time optimization techniques that match the characteristics of a program's dynamic memory access patterns with the appropriate optimization (parallelization) transformations. First, we present a general adaptive algorithm selection framework that automatically and adaptively selects, at run-time, the best performing, functionally equivalent algorithm for each of its execution instances. The selection process is based on off-line, automatically generated prediction models and on characteristics of the algorithm's input data that are collected and analyzed dynamically. In this dissertation, we specialize this framework for the automatic selection of reduction algorithms. We identified a small set of machine-independent, high-level characterization parameters and deployed an off-line, systematic experimental process to generate prediction models; these models, in turn, match the parameters to the best optimization transformations for a given machine. The technique has been evaluated thoroughly in terms of applications, platforms, and dynamic program behavior. Specifically, for reduction algorithm selection, the selected performance is within 2% of optimal and is on average 60% better than "Replicated Buffer," the default parallel reduction algorithm specified by the OpenMP standard. To reduce the overhead of speculative run-time parallelization, we developed an adaptive run-time parallelization technique that dynamically chooses efficient shadow structures to record a program's dynamic memory access patterns for parallelization. This technique complements the original speculative run-time parallelization technique, the LRPD test, in parallelizing loops with sparse memory accesses. The techniques presented in this dissertation have been implemented in an optimizing research compiler and can be viewed as effective building blocks for comprehensive run-time optimization systems, e.g., feedback-directed optimization systems and dynamic compilation systems.
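To make the reduction-algorithm selection concrete, here is a minimal sketch in C with OpenMP. The density threshold and the two candidate algorithms are hypothetical placeholders; the dissertation drives the choice with off-line prediction models rather than a fixed cutoff:

    #include <omp.h>
    #include <stdlib.h>

    /* Replicated-buffer histogram reduction: each thread accumulates into
       a private copy, then the copies are merged. Fast for dense updates,
       memory-hungry when bins greatly outnumber updates. */
    static void hist_replicated(const int *idx, int n, long *bins, int nbins) {
        int nt = omp_get_max_threads();
        long *priv = calloc((size_t)nt * nbins, sizeof *priv);
        #pragma omp parallel
        {
            long *mine = priv + (size_t)omp_get_thread_num() * nbins;
            #pragma omp for
            for (int i = 0; i < n; i++)
                mine[idx[i]]++;
        }
        for (int t = 0; t < nt; t++)          /* serial merge of the copies */
            for (int b = 0; b < nbins; b++)
                bins[b] += priv[(size_t)t * nbins + b];
        free(priv);
    }

    /* Atomic-update variant: no extra memory, but contended when many
       threads hit the same bins. */
    static void hist_atomic(const int *idx, int n, long *bins) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            #pragma omp atomic
            bins[idx[i]]++;
        }
    }

    /* Run-time selection on one crude input characteristic (updates per
       bin). A fixed threshold stands in for the learned prediction model. */
    void hist_select(const int *idx, int n, long *bins, int nbins) {
        if ((double)n / nbins > 4.0)          /* dense: replication pays off */
            hist_replicated(idx, n, bins, nbins);
        else                                  /* sparse: atomics skip the merge */
            hist_atomic(idx, n, bins);
    }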
24

Outage Capacity and Code Design for Dying Channels

Zeng, Meng 2011 August 1900 (has links)
In wireless networks, communication links may be subject to random fatal impacts: for example, sensor networks under sudden power losses, or cognitive radio networks with unpredictable primary-user spectrum occupancy. Under such circumstances, it is critical to quantify how fast and how reliably information can be collected over the attacked links. For a single point-to-point channel subject to a random attack, termed a dying channel, we model the link as a block-fading (BF) channel with a finite and random channel length. First, we study the outage probability when the coding length K is fixed and uniform power allocation is assumed; we then discuss optimizing over K and the power allocation vector P_K to minimize the outage probability. In addition, we extend the single point-to-point dying channel to the parallel multi-channel case, where each sub-channel is a dying channel, and investigate the asymptotic behavior of the overall outage probability under two attack models: the independent-attack case and the m-dependent-attack case. We show that the overall outage probability diminishes to zero in both cases as the number of sub-channels increases, provided the rate per unit cost is below a certain threshold. The outage exponents are also studied to reveal how fast the outage probability improves with the number of sub-channels. Beyond these information-theoretic results, we study a practical coding scheme for the dying binary erasure channel (DBEC), a binary erasure channel (BEC) subject to a random fatal failure. We consider rateless codes and optimize the degree distribution to maximize the average recovery probability. In particular, we first derive an upper bound on the average recovery probability, then define the objective function as the gap between this upper bound and the average recovery probability achieved by a particular degree distribution, and seek the optimal degree distribution by minimizing it. A simple heuristic approach is also proposed that yields a suboptimal but good degree distribution.
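As a rough sketch of the rateless-coding side, the C fragment below emits LT-style encoded symbols from a degree distribution supplied as a CDF table. The table contents and the with-replacement neighbor sampling are simplifications; the thesis's optimized degree distribution would be plugged in where noted:

    #include <stdlib.h>

    /* Sample an encoding-symbol degree from a CDF table, where cdf[d-1]
       holds P(degree <= d). The optimized distribution from the thesis
       would be supplied here; any valid CDF works. */
    static int sample_degree(const double *cdf, int dmax) {
        double u = (double)rand() / RAND_MAX;
        for (int d = 1; d <= dmax; d++)
            if (u <= cdf[d - 1])
                return d;
        return dmax;
    }

    /* Produce one rateless encoded symbol: the XOR of deg source symbols
       chosen at random (with replacement, for brevity; real LT codes pick
       distinct neighbors). Because the channel may die at any moment, the
       encoder simply keeps emitting symbols until it is cut off. */
    static unsigned char encode_symbol(const unsigned char *src, int k,
                                       const double *cdf, int dmax) {
        int deg = sample_degree(cdf, dmax);
        unsigned char out = 0;
        for (int j = 0; j < deg; j++)
            out ^= src[rand() % k];
        return out;
    }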
25

Binary Redundancy Elimination

Fernández Gómez, Manuel 13 April 2005 (has links)
Two of the most important performance limiters in today's processors come from memory operations and control dependencies. To address these issues, cache memories and branch predictors are well-known hardware proposals that exploit, among other things, temporal memory reuse and branch correlation. In other words, they try to exploit the dynamic redundancy existing in programs. This redundancy comes partly from the way programmers write source code, but also from limitations in the traditional compilation model, which introduces unnecessary memory and conditional branch instructions. We believe that optimizing compilers should be very aggressive and should therefore be expected to eliminate a significant part of this redundancy. On the other hand, optimizations performed at link time or applied directly to the final program executable have received increasing attention in recent years, due to limitations in the traditional compilation model. Even with sophisticated interprocedural analyses and transformations, a traditional compiler does not have the opportunity to optimize the program as a whole. A similar problem arises when applying profile-directed compilation techniques: large projects are forced to rebuild every source file to take advantage of profile information. By contrast, it would be more convenient to build the full application, instrument it to obtain profile data, and then re-optimize the final binary without recompiling a single source file. In this thesis we present new profile-guided compiler optimizations for eliminating the redundancy found in executable programs at the binary level (that is, binary redundancy), even when these programs have been compiled with full optimizations by a state-of-the-art commercial compiler. Our Binary Redundancy Elimination (BRE) techniques target both redundant memory operations and redundant conditional branches, the most important contributors to the performance issues mentioned above. These proposals are based on path-sensitive Partial Redundancy Elimination (PRE) techniques. Our results show that, by applying our optimizations, we achieve a 14% execution-time reduction on our benchmark suite. We also revisit the problem of alias analysis at the executable-program level, identifying why memory disambiguation is one of the weak points of object-code modification, and we propose several alias analyses to be applied in the context of link-time or executable-code optimizers. First, we present a must-alias analysis that recognizes memory dependencies in a path-sensitive fashion, which is used in our optimization for eliminating redundant memory operations. Next, we propose two speculative may-alias data-flow algorithms that recognize memory independencies. These may-alias analyses introduce unsafe speculation at analysis time, which increases alias precision on important portions of code while keeping the analysis reasonably cost-efficient. Our results show that these analyses are very useful for increasing the memory-disambiguation accuracy of binary code, which translates into opportunities for applying optimizations. All our algorithms, both the analyses and the optimizations, have been implemented in a binary optimizer, which overcomes most of the limitations of traditional source-code compilers; our work therefore also points out the most relevant issues of applying such algorithms at the executable-code level, where most of the high-level information available to traditional compilers is lost.
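A minimal source-level picture of the memory-operation redundancy that BRE removes (the thesis performs the equivalent transformation on binaries, where the loads are explicit instructions):

    struct pt { int x; };

    /* Before: the load of p->x at the return is partially redundant; on
       the path where c is true it has already been performed inside the
       if. */
    int before(struct pt *p, int c) {
        int a = 0;
        if (c)
            a = p->x + 1;    /* first load of p->x on this path */
        return a + p->x;     /* p->x loaded again */
    }

    /* After PRE of the load: the value is kept in a temporary, so every
       path performs exactly one load. The transformation is only safe if
       nothing between the two loads can write p->x, which is precisely
       what the thesis's must-alias analysis has to prove. */
    int after(struct pt *p, int c) {
        int a = 0;
        int t = p->x;        /* one load, shared by both paths */
        if (c)
            a = t + 1;
        return a + t;
    }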
27

Characterization and optimization of JavaScript programs for mobile systems

Srikanth, Aditya 09 October 2013 (has links)
JavaScript has permeated every aspect of today's web experience, making it crucial to process it as quickly as possible. With the proliferation of HTML5 and its associated mobile web applications, the world is slowly but surely moving into an age where the majority of webpages will involve complex computations and manipulations within the JavaScript engine. Techniques like just-in-time (JIT) compilation have become commonplace in popular browsers like Chrome and Firefox, and there is an ongoing effort to optimize them further in the context of mobile systems. To fully exploit JavaScript-heavy webpages, it is important to first characterize how these webpages (both existing pages and modern HTML5 pages) interact with the different components of the JavaScript engine: the interpreter, the method JIT, the optimizing compiler, and the garbage collector. In this thesis, that characterization work was leveraged to identify the limits of JavaScript optimizations. One optimization, register allocation heuristics, was then explored in detail on different types of JavaScript programs, primarily because register allocation alone accounts for the majority of time spent in the optimizing compiler (52.81% on average). By varying the heuristics for register assignment, interval priority, and spill selection, a clear picture emerges of how register allocation affects certain types of programs more than others. The thesis also gives a preliminary insight into JavaScript applications and benchmarks, showing that they tend to be register-intensive, with large live intervals and sparse uses, and sensitive to array and string manipulations. A statically selected optimal register allocation scheme outperforms the default scheme, yielding a 9.1% performance improvement and an 11.23% reduction in execution time on a representative mobile system.
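As an illustration of the spill-selection dimension the thesis varies, here is a sketch of the classic linear-scan choice (evict the active interval that ends furthest away); the engine's actual data structures and heuristics may differ:

    /* A live interval for one virtual register; reg < 0 means spilled. */
    typedef struct { int start, end, reg; } Interval;

    /* When no physical register is free at interval `cur`, pick a victim
       among the active intervals: the one with the longest remaining
       lifetime. Swapping this rule for, say, a use-density or priority
       rule is the kind of heuristic variation explored in the thesis. */
    int pick_spill(const Interval *active, int nactive, const Interval *cur) {
        int victim = -1;
        int furthest = cur->end;
        for (int i = 0; i < nactive; i++) {
            if (active[i].end > furthest) {
                furthest = active[i].end;
                victim = i;            /* evict this one, keep cur in a reg */
            }
        }
        return victim;  /* -1: no active interval outlives cur, spill cur */
    }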
28

Software for the Canadian Advanced Nanospace eXperiment-4/5

Leonard, Matthew Leigh 20 November 2012 (has links)
The CanX-4 and CanX-5 mission, currently under development at the University of Toronto Institute for Aerospace Studies Space Flight Laboratory (UTIAS/SFL), is a challenging formation-flying technology demonstration; its requirement of sub-metre control accuracy has yet to be realized with nanosatellites. Many large technical challenges must be addressed to ensure the success of the CanX-4/5 mission, including the development of software for an intersatellite communication system, the integration and optimization of key formation-flying algorithms on the Payload On-Board Computer, and the development of a hardware-in-the-loop simulator for full on-orbit mission simulations. This thesis provides background on the Space Flight Laboratory and its activities and on the CanX-4/5 mission, and finally highlights the author's contributions to overcoming each of these technical challenges and ensuring the success of the CanX-4 and CanX-5 mission.
29

Optimization of the educational network while striving to achieve higher quality of education: a comparative analysis of the Vilnius city and Telšiai district situation

Jurkonienė, Audrutė, Butienė, Daiva 04 August 2011 (has links)
The optimization of the schools' network provoked highly controversial reactions in society, among teachers, students' parents, and the students themselves. The reform of the network highlighted diverging views on the use of funds and other resources designated for education. This final Bachelor's thesis analyses the optimization of the schools' network in a city and in a district over the period 1995-2010. It reviews the structure of the educational-institution network in Vilnius city and in Telšiai district during that period and compares the reform processes of the two schools' networks. The essence and expression of the schools' network reform are revealed: the situation of the educational reform in Lithuania and in the two municipalities is analysed from a theoretical perspective, and the factors determining the reform are identified. The research revealed the attitudes of teachers and students' parents in Vilnius city and Telšiai district towards the Lithuanian education system. An analysis of the theoretical works of foreign and Lithuanian scholars on the Lithuanian educational reform is provided, the main concepts are highlighted, and the results of the research are reviewed. The thesis concludes that the reform of the schools' network has dragged on for too long and that the work already begun must be completed as soon as possible if the goals set are to be achieved.