11

Metodologia dinâmica para avaliação da efetividade de otimização e exploração de localidade de valor. / Dynamic methodology for optimization effectiveness evaluation and value locality exploitation.

Costa, Carlos Henrique Andrade 24 September 2012 (has links)
Software performance relies on the many optimizations modern compilers apply to remove redundant computation. Identifying redundant computation is, in general, undecidable at compile time, which prevents obtaining an ideal reference both for measuring the unexploited potential of redundancy removal and for evaluating the effectiveness of code optimization. This work presents a methodology for optimization-effectiveness analysis that observes the complete dynamic stream of executed instructions and memory references over a whole program execution, developing and applying a dynamic value numbering algorithm as instructions execute. The method reduces interprocedural analysis to the analysis of one large basic block and detects redundant memory and scalar operations that are visible only at run time. In this way, the work extends instruction-reuse analysis and provides both a more accurate approximation of the upper bound of exploitable optimization in a program and a reference point for evaluating optimization effectiveness. The method also yields a clear picture of unexploited redundancy hotspots and a measure of value locality across the whole execution.
A framework implementing the method is developed and integrated with a full-system simulator based on the 64-bit Power ISA (version 2.06). A case study applies the method to executables of a representative benchmark (SPECint 2006) built at each optimization level of the GNU C/C++ compiler. The proposed analysis provides a practical evaluation of code-optimization effectiveness and reveals a significant amount of remaining unexploited redundancy even at the highest optimization level available. Sources of inefficiency are identified through the evaluation of hotspots and value locality, information that is useful for compiler and application tuning. The thesis also presents an efficient mechanism for exploiting hardware support for redundancy elimination.
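The heart of the method, value numbering applied dynamically to the executed instruction stream, can be sketched in a few lines of C. This is a minimal illustration under simplifying assumptions, not the thesis implementation: an instruction is reduced to an opcode plus the value numbers of its two source operands, memory versioning and the simulator interface are omitted, and every name below is hypothetical. Because the whole execution is treated as one large basic block, the table is never invalidated at branch boundaries.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified dynamic instruction: an opcode plus the   */
/* value numbers already assigned to its two source operands.         */
typedef struct { uint32_t op, vn1, vn2; } Key;

#define TABSZ 4096                /* power of two for cheap masking */
static Key keys[TABSZ];
static int vals[TABSZ];           /* value number per key; 0 = empty slot */
static int next_vn = 1;

static uint32_t hash(Key k) {
    return (k.op * 2654435761u ^ k.vn1 * 40503u ^ k.vn2 * 9973u) & (TABSZ - 1);
}

/* Returns the value number of (op, vn1, vn2) and sets *redundant when */
/* the same computation already occurred in the dynamic stream.        */
int value_number(uint32_t op, int vn1, int vn2, int *redundant) {
    Key k = { op, (uint32_t)vn1, (uint32_t)vn2 };
    uint32_t i = hash(k);
    while (vals[i]) {                                   /* linear probing */
        if (memcmp(&keys[i], &k, sizeof k) == 0) { *redundant = 1; return vals[i]; }
        i = (i + 1) & (TABSZ - 1);
    }
    keys[i] = k;
    vals[i] = next_vn;
    *redundant = 0;
    return next_vn++;
}

int main(void) {
    int red;
    int a = value_number('+', 1, 2, &red);  /* first occurrence: new VN */
    int b = value_number('+', 1, 2, &red);  /* same computation: redundant */
    printf("a=%d b=%d redundant=%d\n", a, b, red);
    return 0;
}
```

A later instruction whose opcode and operand value numbers map to an existing entry is flagged as redundant; counting such hits over a complete run is what yields the reuse upper bound described above.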
13

Performance evaluation of code optimizations in FPGA accelerators

Leite, Gustavo January 2019 (has links)
Advisor: Alexandro José Baldassin / Abstract: With the ever-increasing power wall in microprocessor design, computer scientists and engineers have shifted their attention to heterogeneous architectures, in which several classes of devices are used for different kinds of computation. Among them are FPGAs (Field-Programmable Gate Arrays), whose hardware can be reconfigured after manufacturing. These devices offer performance comparable to conventional processors while consuming only a fraction of the energy. The use of FPGAs has proliferated in recent years, and their adoption is expected to keep growing. Still, programming FPGAs and tuning those programs for higher performance remain non-trivial tasks. This work presents a compilation of the most prominent code transformations for optimizing programs aimed at FPGAs. It also evaluates the performance of programs executing on FPGAs: more specifically, a subset of the code transformations is applied to an OpenCL kernel and execution times are measured on an Intel® device. The results show that, without these transformations, performance is poor and the device is underutilized. / Master's
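As a concrete instance of the kind of transformation surveyed, loop unrolling can be shown on a toy OpenCL C kernel. Intel's FPGA OpenCL compiler honors `#pragma unroll` by replicating the loop body in hardware; the kernel below is purely illustrative and is not one of the kernels evaluated in this work.

```c
/* Toy single work-item OpenCL C kernel: element-wise vector add.      */
__kernel void vadd(__global const float *restrict a,
                   __global const float *restrict b,
                   __global float *restrict c,
                   const int n) {
    /* Replicating the body lets several elements issue per clock,     */
    /* at the cost of extra logic and memory bandwidth.                */
    #pragma unroll 8
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Since unrolling multiplies the memory traffic demanded per cycle, in practice the unroll factor is chosen together with the width of the memory interfaces.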
14

Code optimization and analysis for multiple-input and multiple-output communication systems

Yue, Guosen 01 November 2005 (has links)
Design and analysis of random-like codes for various multiple-input and multiple-output (MIMO) communication systems are addressed in this work. Random-like codes have drawn significant interest because they offer capacity-achieving performance. We first consider the analysis and design of low-density parity-check (LDPC) codes for turbo multiuser detection in multipath CDMA channels. We develop techniques for computing the probability density function (pdf) of the extrinsic messages at the output of the soft-input soft-output (SISO) multiuser detectors as a function of the pdf of the input extrinsic messages, the user spreading codes, the channel impulse responses, and the signal-to-noise ratios. Using these techniques, we can accurately compute thresholds for LDPC codes and design good irregular LDPC codes. We then apply density evolution with mixture-Gaussian approximations to optimize irregular LDPC codes and to compute minimum operational signal-to-noise ratios for ergodic MIMO OFDM channels. In particular, the optimization is carried out for various MIMO OFDM system configurations, including different numbers of antennas, channel models, and demodulation schemes. We also study the coding-spreading tradeoff in LDPC-coded CDMA systems employing multiuser joint decoding, solving the coding-spreading optimization based on the extrinsic-information SNR evolution curves of the SISO multiuser detectors and the SISO LDPC decoders. Both single-cell and multi-cell scenarios are considered; for each case, we characterize the extrinsic information both for finite-size systems and for so-called large systems, where asymptotic performance results must be invoked. Finally, we consider the design optimization of irregular repeat-accumulate (IRA) codes for MIMO communication systems employing iterative receivers. We present a density-evolution-based procedure with Gaussian approximation for optimizing the IRA code ensemble, and adopt an approximation method based on linear programming to design an IRA code whose extrinsic information transfer (EXIT) chart is matched to that of the soft MIMO demodulator.
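The density-evolution machinery mentioned above can be sketched for a regular (dv, dc) LDPC ensemble on a binary-input AWGN channel using the plain single-Gaussian approximation, a simpler relative of the mixture-Gaussian version used in the thesis. The mean-evolution recursion and the two-piece φ-function approximation below, including its constants, are the ones commonly cited in the literature (Chung et al.), not values taken from this work.

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* phi(m) = 1 - E[tanh(u/2)], u ~ N(m, 2m); two-piece approximation   */
/* with the constants usually cited in the literature (assumed here). */
static double phi(double x) {
    if (x < 1e-12) return 1.0;
    if (x < 10.0)  return exp(-0.4527 * pow(x, 0.86) + 0.0218);
    return sqrt(M_PI / x) * exp(-x / 4.0) * (1.0 - 10.0 / (7.0 * x));
}

static double phi_inv(double y) {         /* bisection; phi is decreasing */
    double lo = 1e-12, hi = 500.0;
    for (int i = 0; i < 100; i++) {
        double mid = 0.5 * (lo + hi);
        if (phi(mid) > y) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Returns 1 when the message means diverge (decoding succeeds) for a */
/* regular (dv,dc) ensemble at channel noise level sigma.             */
static int converges(int dv, int dc, double sigma) {
    double mu0 = 2.0 / (sigma * sigma);   /* mean of channel LLRs */
    double mu = 0.0;                      /* mean of check-to-variable msgs */
    for (int iter = 0; iter < 2000; iter++) {
        double mv = mu0 + (dv - 1) * mu;  /* variable-node update */
        mu = phi_inv(1.0 - pow(1.0 - phi(mv), dc - 1)); /* check-node update */
        if (mu > 400.0) return 1;
    }
    return 0;
}

int main(void) {
    int dv = 3, dc = 6;                   /* rate-1/2 regular ensemble */
    double lo = 0.5, hi = 1.5;            /* bracket for threshold sigma* */
    for (int i = 0; i < 40; i++) {
        double mid = 0.5 * (lo + hi);
        if (converges(dv, dc, mid)) lo = mid; else hi = mid;
    }
    printf("GA threshold for (%d,%d): sigma* ~= %.4f\n", dv, dc, lo);
    return 0;
}
```

For the rate-1/2 (3,6) ensemble this search settles near σ* ≈ 0.87, roughly the figure usually quoted for the Gaussian approximation.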
16

CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST

Palki, Anand B. 01 January 2006 (has links)
This research focuses on evaluating and enhancing the performance of an in-house, structured, 2D CFD code, GHOST, on modern commodity clusters. The basic philosophy of this work is to optimize the cache performance of the code by splitting the grid into smaller blocks and carrying out the required calculations on these smaller blocks, which in turn enhances code performance on commodity clusters. Accordingly, this work presents a discussion and detailed description of two data-access optimization techniques: external and internal blocking. These techniques were tested on steady, unsteady, laminar, and turbulent test cases, and the results are presented. The critical hardware parameters that influence code performance were identified; a detailed study investigating the effect of these parameters on performance was conducted, and the results are presented. The modified version of the code was also ported to current state-of-the-art architectures with successful results.
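The blocking idea itself is the classic loop-tiling transformation: restructure the sweep over the grid so that each block's working set stays in cache across the inner loops. The sketch below shows it on a generic 2D five-point stencil; the stencil and the block sizes are illustrative, not GHOST's.

```c
#include <stddef.h>

#define NI 2048
#define NJ 2048
#define BI 64          /* block sizes tuned so a block of both arrays */
#define BJ 64          /* fits in cache; illustrative values only     */

/* Untiled sweep: for large grids the working set streams through     */
/* the cache and rows are evicted before they are reused.             */
void sweep(double (*u)[NJ], double (*v)[NJ]) {
    for (size_t i = 1; i < NI - 1; i++)
        for (size_t j = 1; j < NJ - 1; j++)
            v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);
}

/* Tiled sweep: identical arithmetic, but each BIxBJ block is reused  */
/* from cache before the sweep moves on.                              */
void sweep_blocked(double (*u)[NJ], double (*v)[NJ]) {
    for (size_t ii = 1; ii < NI - 1; ii += BI)
        for (size_t jj = 1; jj < NJ - 1; jj += BJ)
            for (size_t i = ii; i < ii + BI && i < NI - 1; i++)
                for (size_t j = jj; j < jj + BJ && j < NJ - 1; j++)
                    v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                    + u[i][j-1] + u[i][j+1]);
}
```

The external and internal blocking techniques studied here both build on this principle of splitting the grid into cache-sized blocks, differing in where the decomposition is applied.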
17

PERFORMANCE EVALUATION AND OPTIMIZATION OF THE UNSTRUCTURED CFD CODE UNCLE

Gupta, Saurabh 01 January 2006 (has links)
Numerous advancements in the field of computational sciences have made CFD a viable solution to modern-day fluid dynamics problems, and progress in computer performance allows us to solve complex flow fields in practical CPU time. Commodity clusters are also gaining popularity as a computational research platform in various CFD communities. This research focuses on evaluating and enhancing the performance of an in-house, unstructured, 3D CFD code on modern commodity clusters. The fundamental idea is to tune the code to optimize the cache behavior of the individual nodes of a commodity cluster and thereby achieve enhanced code performance. Accordingly, this work presents a discussion of the available techniques for data-access optimization and a detailed description of those that yielded improved code performance. These techniques were tested on various steady, unsteady, laminar, and turbulent test cases, and the results are presented. The critical hardware parameters that influence code performance were identified; a detailed study investigating the effect of these parameters on performance was conducted, and the results are presented. The successful single-node improvements were also tested on parallel platforms, and the modified version of the code was ported to different hardware architectures with successful results. Loop blocking is established as a predictor of code performance.
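Beyond blocking, a representative data-access optimization for unstructured solvers is switching a node array from array-of-structs to struct-of-arrays, so that a loop touching a single field streams through contiguous memory. The layout below is an illustrative sketch, not UNCLE's actual data structures.

```c
#include <stddef.h>

enum { NNODES = 1000000 };

/* Array-of-structs: a loop that reads only pressure still drags the  */
/* whole 40-byte record through the cache, line by line.              */
struct NodeAoS { double rho, u, v, w, p; };

double sum_p_aos(const struct NodeAoS *n) {
    double s = 0.0;
    for (size_t i = 0; i < NNODES; i++) s += n[i].p;
    return s;
}

/* Struct-of-arrays: the same loop now reads one dense array and uses */
/* every byte of every cache line it fetches.                         */
struct NodesSoA { double *rho, *u, *v, *w, *p; };

double sum_p_soa(const struct NodesSoA *n) {
    double s = 0.0;
    for (size_t i = 0; i < NNODES; i++) s += n->p[i];
    return s;
}
```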
18

Space-time turbo coded modulation for wireless communication systems

Tujkovic, D. (Djordje) 23 April 2003 (has links)
Abstract: High computational complexity constrains truly exhaustive computer searches for good space-time (ST) coded modulations mostly to low-constraint-length space-time trellis codes (STTrCs). Such codes are primarily devised to achieve maximum transmit diversity gain; due to their low memory order, optimization based on design criteria of secondary importance typically yields rather modest coding gains. As another consequence of the limited freedom, low-memory-order STTrCs are almost exclusively constructed for either slow or fast fading channels, so in practical applications characterized by highly variable Doppler frequencies they typically fail to demonstrate the desired robustness. On the other hand, the main drawback of increased constraint lengths is the prohibitively large decoding complexity, which may grow exponentially if optimal maximum-likelihood decoding (MLD) is applied at the receiver. Robust ST coded modulation schemes with large equivalent memory orders, structured so as to allow sub-optimal, low-complexity, iterative decoding, are therefore needed. To address these issues, this thesis proposes parallel concatenated space-time turbo coded modulation (STTuCM), among the earliest multiple-input multiple-output (MIMO) coded modulation designs built on the intersection of ST coding and turbo coding. A systematic procedure for building an equivalent recursive STTrC (Rec-STTrC) from the trellis diagram of an arbitrary non-recursive STTrC is first introduced. The parallel concatenation of punctured constituent Rec-STTrCs designed upon the non-recursive Tarokh et al. STTrCs (Tarokh-STTrCs) is evaluated under different narrowband frequency-flat block-fading channels. Combined with novel transceiver designs, applications to future wideband code division multiple access (WCDMA) and orthogonal frequency division multiplexing (OFDM) based broadband radio communication systems are considered. The distance spectrum (DS) interpretation of the STTuCM and a union bound (UB) performance analysis over slow and fast fading channels reveal the importance of multiplicities in ST coding design. Modified design criteria for space-time codes (STCs) are introduced that capture the joint effects of error coefficients and multiplicities in the two-dimensional DS of a code. Applied to STTuCM, such DS optimization resulted in a new set of constituent codes (CCs) with improved and robust performance over both slow and fast fading channels. A recursive systematic form with a primitive equivalent feedback polynomial is assumed for the CCs to assure good convergence in iterative decoding. To justify these assumptions, an iterative decoding convergence analysis based on a Gaussian approximation of the extrinsic information is performed. The DS interpretation, introduced with respect to arbitrarily defined effective Hamming distance (EHD) and effective product distance (EPD), is applicable to the general class of geometrically non-uniform (GNU) CCs. With no constraints on the information interleaving, the STTuCM constructed from the newly designed CCs achieves full spatial diversity over quasi-static fading channels, the condition commonly identified as the most restrictive for robust performance over a variety of Doppler spreads. Finally, the impact of bit-wise and symbol-wise information interleaving on the performance of STTuCM is studied.
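The role of recursiveness in the constituent codes can be illustrated with the scalar analogue: a rate-1/2 recursive systematic convolutional (RSC) encoder. The memory-2 polynomial pair (7,5) in octal below is the textbook binary turbo-code example, standing in for the space-time constituent codes actually designed in the thesis.

```c
#include <stdio.h>

/* Rate-1/2 recursive systematic convolutional encoder, memory 2:     */
/* feedback 1+D+D^2 (7 octal), feedforward 1+D^2 (5 octal).           */
void rsc_encode(const int *in, int *sys, int *par, int n) {
    int s1 = 0, s2 = 0;                 /* shift-register state (D, D^2) */
    for (int i = 0; i < n; i++) {
        int fb = in[i] ^ s1 ^ s2;       /* feedback bit: the recursive part */
        sys[i] = in[i];                 /* systematic output */
        par[i] = fb ^ s2;               /* parity via feedforward taps 1, D^2 */
        s2 = s1;                        /* clock the register */
        s1 = fb;
    }
}

int main(void) {
    int in[8] = {1,0,1,1,0,0,1,0}, sys[8], par[8];
    rsc_encode(in, sys, par, 8);
    for (int i = 0; i < 8; i++) printf("%d/%d ", sys[i], par[i]);
    printf("\n");
    return 0;
}
```

Parallel concatenation then runs a second copy of the encoder over an interleaved copy of the input and punctures the two parity streams, which is, in spirit, the STTuCM construction with symbols in place of bits.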
19

Combining Conditional Constant Propagation And Interprocedural Alias Analysis

Nandakumar, K S 05 1900 (has links) (PDF)
No description available.
20

Video Flow Classification: A Runtime Performance Study

Västlund, Filip January 2017 (has links)
Because it is increasingly common for users' data to be encrypted, Internet service providers today find it difficult to adapt their service to users' needs. Previously popular methods of classifying user data no longer work as well, so new alternatives are desired to give users an optimal experience. This study focuses specifically on classifying data flows into video and non-video flows with machine learning algorithms, with a focus on runtime performance. The tested algorithms, random forest and gradient boosting trees, are created in Python and then exported to a C implementation. The goal is to find the algorithm with the fastest classification time relative to its accuracy, making classification as fast as possible and the classification model as small as possible. The results show that random forest classified significantly faster than gradient boosting trees, with initial tests showing it to be roughly 7 times faster after compiler optimization. After optimizing the C code, random forest could classify more than 250,000 data flows per second with decent accuracy. Neither of the two algorithms required much space (under 3 megabytes). / HITS, 4707
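The export path described, training in Python and classifying in C, amounts to emitting each tree as a flat node table and walking the tables per flow. Below is a minimal sketch with a hypothetical two-tree forest; the features and thresholds are made up for illustration.

```c
#include <stdio.h>

/* One flattened decision-tree node: feat < 0 marks a leaf, in which  */
/* case `left` holds the class label (0 = non-video, 1 = video).      */
typedef struct { int feat; double thresh; int left, right; } Node;

static int tree_predict(const Node *t, const double *x) {
    int i = 0;
    while (t[i].feat >= 0)
        i = (x[t[i].feat] <= t[i].thresh) ? t[i].left : t[i].right;
    return t[i].left;
}

/* Hypothetical two-tree forest over two flow features, e.g.          */
/* x[0] = mean packet size, x[1] = mean packet inter-arrival time.    */
static const Node tree0[] = {
    { 0, 900.0, 1, 2 },   /* node 0: split on mean packet size */
    { -1, 0.0, 0, 0 },    /* node 1: leaf -> non-video */
    { -1, 0.0, 1, 0 },    /* node 2: leaf -> video */
};
static const Node tree1[] = {
    { 1, 0.02, 1, 2 },    /* node 0: split on inter-arrival time */
    { -1, 0.0, 1, 0 },    /* node 1: leaf -> video */
    { -1, 0.0, 0, 0 },    /* node 2: leaf -> non-video */
};

int forest_predict(const double *x) {
    int votes = tree_predict(tree0, x) + tree_predict(tree1, x);
    return votes * 2 > 2;                 /* strict majority of 2 trees */
}

int main(void) {
    double flow[2] = { 1200.0, 0.01 };    /* made-up feature values */
    printf("video=%d\n", forest_predict(flow));
    return 0;
}
```

Because each classification is just a handful of table lookups with no allocation, throughput on the order of hundreds of thousands of flows per second is plausible, consistent with the figures reported above.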
