• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 223
  • 59
  • 56
  • 55
  • 29
  • 24
  • 23
  • 18
  • 4
  • 3
  • 3
  • 3
  • 3
  • 2
  • 2
  • Tagged with
  • 609
  • 157
  • 115
  • 105
  • 90
  • 89
  • 76
  • 63
  • 57
  • 55
  • 52
  • 52
  • 50
  • 49
  • 48
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

The effect of the number of log sorts on mechanised log processing productivity and value recovery in landing-based cable yarder harvesting operations

Tolan, Alexander Charles January 2014 (has links)
The New Zealand forest industry produces a diverse range of log grades and sorts to meet domestic and export market demands and to maximise returns to the forest grower. An implication for the supply chain is the number of log grades and sorts a harvesting operation is expected to produce from one species, radiata pine (Pinus radiata). The number of log grades and sorts can impact on landing size and layout requirements, value recovery, log-making complexity, machine utilisation and quality control requirements. A study was conducted to investigate if the number of log sorts affects mechanised log processing productivity and value recovery. This would determine if any gross value gains derived from producing a higher number of sorts are offset by losses in log processing productivity. Two landing-based mechanised log processors at cable yarder harvesting operations were studied using different cutting scenarios producing five, nine, twelve and fifteen log sorts. The study collected data from over 26 hours of mechanised processing which included the processing of 578 stems at an average piece size of approximately 1.6 m³. Machine utilisation results showed processors spending 84% of total time on productive tasks and that 49% of total time was spent on the primary productive tasks of log processing. Quadratic regressions were used to model log processing productivity trends which showed piece size and cutting scenario as significant predictor variables (p-value <0.01). There was a significant difference between cutting scenario with five log sorts and the cutting scenarios with twelve and fifteen log sorts (p-values <0.05), as well as a significant difference between the nine and fifteen log sort cutting scenarios (p-value <0.01). There was not enough evidence to suggest productivity was different between cutting scenarios producing five and nine log sorts. Based on this analysis, it was likely that the null hypothesis that the number of log sorts does not affect log processing productivity should be rejected. At a piece size of 2 m³, the productivity model estimated processing productivity was 10% higher producing nine log sorts compared to producing fifteen log sorts. A linear regression model showed a strong relationship between gross value recovery, piece size and cutting scenario (p-value <0.01). Gross value recovery increased as the number of log sorts increased. A significant model suggested it is likely null hypothesis 2, that the number of log sorts does not affect gross value recovery, should be rejected. There were only some differences in variances between cutting scenarios which were statistically significant. Both the average results and regression estimates showed the five log sort cutting scenario recovering 94% of the value of the cutting scenario with fifteen log sorts. Incremental gains in value recovery as the number of log sorts increased were marginal, which appeared to be due to log prices for many major log grades trading in a close range in relation to historic price trends. Regression trends for productivity and gross value recovery indicated that the most optimal cutting scenario, in terms of processing value outturn per productive machine hour, was the cutting scenario producing nine log sorts. This suggests that declines in processor productivity offset gains in gross value recovery when producing twelve and fifteen log sorts. Market sensitivity analysis suggested that differentials in log prices impact on the number of log sorts which optimise the value outturn per productive machine hour from log processing.
82

Effective use of partitioned cache memories

Page, Daniel Stephen January 2001 (has links)
No description available.
83

Energy efficient branch prediction

Hicks, Michael Andrew January 2010 (has links)
Energy efficiency is of the utmost importance in modern high-performance embedded processor design. As the number of transistors on a chip continues to increase each year, and processor logic becomes ever more complex, the dynamic switching power cost of running such processors increases. The continual progression in fabrication processes brings a reduction in the feature size of the transistor structures on chips with each new technology generation. This reduction in size increases the significance of leakage power (a constant drain that is proportional to the number of transistors). Particularly in embedded devices, the proportion of an electronic product’s power budget accounted for by the CPU is significant (often as much as 50%). Dynamic branch prediction is a hardware mechanism used to forecast the direction, and target address, of branch instructions. This is essential to high performance pipelined and superscalar processors, where the direction and target of branches is not computed until several stages into the pipeline. Accurate branch prediction also acts to increase energy efficiency by reducing the amount of time spent executing mis-speculated instructions. ‘Stalling’ is no longer a sensible option when the significance of static power dissipation is considered. Dynamic branch prediction logic typically accounts for over 10% of a processor’s global power dissipation, making it an obvious target for energy optimisation. Previous approaches at increasing the energy efficiency of dynamic branch prediction logic has focused on either fully dynamic or fully static techniques. Dynamic techniques include the introduction of a new cache-like structure that can decide whether branch prediction logic should be accessed for a given branch, and static techniques tend to focus on scheduling around branch instructions so that a prediction is not needed (or the branch is removed completely). This dissertation explores a method of combining static techniques and profiling information with simple hardware support in order to reduce the number of accesses made to a branch predictor. The local delay region is used on unconditional absolute branches to avoid prediction, and, for most other branches, Adaptive Branch Bias Measurement (through profiling) is used to assign a static prediction that is as accurate as a dynamic prediction for that branch. This information is represented as two hint-bits in branch instructions, and then interpreted by simple hardware logic that bypasses both the lookup and update phases for appropriate branches. The global processor power saving that can be achieved by this Combined Algorithm is around 6% on the experimental architectures shown. These architectures are based upon real contemporary embedded architecture specifications. The introduction of the Combined Algorithm also significantly reduces the execution time of programs on Multiple Instruction Issue processors. This is attributed to the increase achieved in global prediction accuracy.
84

Smart PCM Encoder

Bondurant, Philip D., Driesman, Andrew 11 1900 (has links)
International Telemetering Conference Proceedings / October 30-November 02, 1995 / Riviera Hotel, Las Vegas, Nevada / In this paper, a new concept in PCM telemetry encoding equipment is described. Existing "programmable" PCM encoders allow only simple changes in the functionality of the hardware, such as input gain, offset, and word formatting. More importantly, these encoders do not provide capability for "in-flight" processing of signals and in general have not taken advantage of existing hardware and software digital signal processing technology. In-flight processing of signals can provide a significant reduction in the required transmission bandwidth, allowing additional data that may not have otherwise been transmitted to be sent on the telemetry channel. A modular digital signal processor (DSP) based PCM encoder architecture is described that has a set of on-board processing algorithms configurable via a simple-to-use graphical user interface. Algorithms included are compression (lossy and lossless), Fourier transforms of various resolutions (typically followed by peak detection to provide a data rate reduction), extreme values (max, min, rms), time filtering, regression, trajectory prediction, and serial data stream processing. Custom algorithms can be developed and included as part of the suite of processing algorithms. The preprocessing algorithms exist as firmware on the DSPs and can accommodate as many different signals as the processing bandwidth of the DSP can handle. Typically one DSP can handle many input signals and different algorithms. The encoder is programmable via a standard RS-232 serial interface allowing the signal input configuration, telemetry frame layout, and on-board processing algorithms to be changed quickly.
85

An On-Line Macro Processor for the Motorola 6800 Microprocessor

Hsieh, Chang-Boe 05 1900 (has links)
The first chapter discusses the concept of macros: its definition, structure, usage, design goals, and the related prior work. This thesis principally concerns my work on OLMP (an On-Line Macro Processor for the Motorola 6800 Microprocessor), which is a macro processor which interacts with the user. It takes Motorola assembler source code and macro definitions as its input; after the appropriate editing and expansions, it outputs the expanded assembler source statements. The functional objectives, the design for implementation of OLMP, the basic macro format, and the macro definition construction are specified in Chapter Two. The software and the hardware environment of OLMP are discussed in the third chapter. The six modules of OLMP are the main spine of the fourth chapter. The comments on future improvement and how to link OLMP with the Motorola 6800 assembler are the major concern of the final chapter.
86

Embedded Processor Selection/Performance Estimation using FPGA-based Profiling

Obeidat, Fadi 26 July 2010 (has links)
In embedded systems, modeling the performance of the candidate processor architectures is very important to enable the designer to estimate the capability of each architecture against the target application. Considering the large number of available embedded processors, the need has increased for building an infrastructure by which it is possible to estimate the performance of a given application on a given processor with a minimum of time and resources. This dissertation presents a framework that employs the softcore MicroBlaze processor as a reference architecture where FPGA-based profiling is implemented to extract the functional statistics that characterize the target application. Linear regression analysis is implemented for mapping the functional statistics of the target application to the performance of the candidate processor architecture. Hence, this approach does not require running the target application on each candidate processor; instead, it is run only on the reference processor which allows testing many processor architectures in very short time.
87

Accelerating Computational Algorithms

Risley, Michael 10 December 2013 (has links)
Mathematicians and computational scientists are often limited in their ability to model complex phenomena by the time it takes to run simulations. This thesis will inform interested researchers on how the development of highly parallel computer graphics hardware and the compiler frameworks to exploit it are expanding the range of algorithms that can be explored on affordable commodity hardware. We will discuss the complexities that have prevented researchers from exploiting advanced hardware as well as the obstacles that remain for the non-computer scientist.
88

On the simulation and design of manycore CMPs

Thompson, Christopher Callum January 2015 (has links)
The progression of Moore’s Law has resulted in both embedded and performance computing systems which use an ever increasing number of processing cores integrated in a single chip. Commercial systems are now available which provide hundreds of cores, and academics have proposed architectures for up to 1024 cores. Embedded multicores are increasingly popular as it is easier to guarantee hard-realtime constraints using individual cores dedicated for tasks, than to use traditional time-multiplexed processing. However, finding the optimal hardware configuration to meet these requirements at minimum cost requires extensive trial and error approaches to investigate the design space. This thesis tackles the problems encountered in the design of these large scale multicore systems by first addressing the problem of fast, detailed micro-architectural simulation. Initially addressing embedded systems, this work exploits the lack of hardware cache-coherence support in many deeply embedded systems to increase the available parallelism in the simulation. Then, through partitioning the NoC and using packet counting and cycle skipping reduces the amount of computation required to accurately model the NoC interconnect. In combination, this enables simulation speeds significantly higher than the state of the art, while maintaining less error, when compared to real hardware, than any similar simulator. Simulation speeds reach up to 370MIPS (Million (target) Instructions Per Second), or 110MHz, which is better than typical FPGA prototypes, and approaching final ASIC production speeds. This is achieved while maintaining an error of only 2.1%, significantly lower than other similar simulators. The thesis continues by scaling the simulator past large embedded systems up to 64-1024 core processors, adding support for coherent architectures using the same packet counting techniques along with low overhead context switching to enable the simulation of such large systems with stricter synchronisation requirements. The new interconnect model was partitioned to enable parallel simulation to further improve simulation speeds in a manner which did not sacrifice any accuracy. These innovations were leveraged to investigate significant novel energy saving optimisations to the coherency protocol, processor ISA, and processor micro-architecture. By introducing a new instruction, with the name wait-on-address, the energy spent during spin-wait style synchronisation events can be significantly reduced. This functions by putting the core into a low-power idle state while the cache line of the indicated address is monitored for coherency action. Upon an update or invalidation (or traditional timer or external interrupts) the core will resume execution, but the active energy of running the core pipeline and repeatedly accessing the data and instruction caches is effectively reduced to static idle power. The thesis also shows that existing combined software-hardware schemes to track data regions which do not require coherency can adequately address the directory-associativity problem, and introduces a new coherency sharer encoding which reduces the energy consumed by sharer invalidations when sharers are grouped closely together, such as would be the case with a system running many tasks with a small degree of parallelism in each. The research concludes by using the extremely fast simulation speeds developed to produce a large set of training data, collecting various runtime and energy statistics for a wide range of embedded applications on a huge diverse range of potential MPSoC designs. This data was used to train a series of machine learning based models which were then evaluated on their capacity to predict performance characteristics of unseen workload combinations across the explored MPSoC design space, using only two sample simulations, with promising results from some of the machine learning techniques. The models were then used to produce a ranking of predicted performance across the design space, and on average Random Forest was able to predict the best design within 89% of the runtime performance of the actual best tested design, and better than 93% of the alternative design space. When predicting for a weighted metric of energy, delay and area, Random Forest on average produced results within 93% of the optimum result. In summary this thesis improves upon the state of the art for cycle accurate multicore simulation, introduces novel energy saving changes the the ISA and microarchitecture of future multicore processors, and demonstrates the viability of machine learning techniques to significantly accelerate the design space exploration required to bring a new manycore design to market.
89

Contribution à l’optimisation de densité de code pour Processeur Embarqué / Contribution to the optimization of Embedded processor code density

Fahmi, Youssef 13 June 2013 (has links)
Les systèmes embarqués prennent une place de plus en plus grande dans le marché actuelavec des dispositifs basée sur des systèmes on-chip. Ces systèmes embarqués ont descontraintes très fortes concernant leurs coût, taille, consommation, fiabilité et dimensions.Dans ce contexte la densité de code d'un processeur devient un critère important.Dans cette thèse l'idée était de prendre un processeur RISC(l'APS3 de la société Cortus)qui a de bonne performance pour le monde embarqué et d'augmenter sa densité de code.Plusieurs méthodes ont été testé :– compression à base de Huffman.– compression à base de dictionnaire.– modification du jeu d'instructions.Les méthodes de compression ont montrée leur limites dans notre cas car soit ellesn'étaient pas compatible avec nos objectifs , soit elles offraient un gain pas assez importantcomparé aux surplus en terme de taille et de cycle en plus lors de l'exécution. Ce qui nousa poussé vers la modification du jeu d'instructions.Le résultat obtenu est une augmentation de la taille du code de 25% dans la phase derecherche et de 20.8% dans la version finale du processeur car il aura fallu faire un compromispour garder une petite taille et de bonnes performances.L'APS3CD est le résultat de cette thèse. il a une surface de 49605m2, une fréquencemaximale de 444 MHZ, un score de 2.16 DMIPS/MHZ et une consommation de12 W/MHZ(UMC90). il offre 20.8% de gain par rapport à l'APS3 et 40% par rapport aucortex-m3 (avec gcc) qui est une référence en terme de densité de code dans le marché.Toutefois le gain obtenu peut être augmente en travaillant sur le compilateur car lecompilateur actuel (gcc) n'utilise pas pleinement les instructions complexes ajoutés (dansquelque cas). Une continuation possible serait de travailler sur un compilateur qui soitmeilleur que gcc qui à la base n'est pas destinée aux systèmes embarqué avec des demandesde densité de code. Un exemple est la différence de taille du code entre gcc etiar ou keil pour les processeurs ARM. / Since the market is moving toward portable devices with a one device System on-Chip(SoC), code density of a processor becomes an important criteria.The idea of this thesis was to improve the code density of the Cortus processor theAPS3, which is an embedded RISC processor with good performances.Several methods were tried :– Huffman compression.– Dictionnary based compression.– Instruction set modification.Compression methods have shown their limits in this case either because they werenot compatible with our goals or did not provided a gain large enough compared to surplusesin terms of size and cycle number when running. This prompted us to modifie theinstruction set.The result was 25% of code density improvement in the research phase and 20.8% ofcode density improvement in the final version of the processor because we had to keepgood perfomances and small size of the APS3.APS3CD is the result of this thesis. It has an area of 49605m2, a maximum frequencyof 444 MHZ, a score of 2.16 DMIPS/MHz and a consumption of 12W/MHZ(UMC90). itoffers 20.8% gain over the APS3 and 40% compared to the cortex-m3 (with gcc) which is arefrence in termof code density in the market.However, the gain can be increased by working on the compiler because the currentcompiler (gcc) does not fully utilize the complex instructions added (in some cases). Apossible continuation would be to work on a compiler better than gcc wich is not designedfor embedded systems applications with code density at the base. An example is the codesize difference between gcc and keil or iar for ARM processors.
90

Adaptable VLIW microprocessor for energy efficiency / Microprocessador VLIW para a eficiência energética

Giraldo, Juan Sebastian Piedrahita January 2016 (has links)
O consumo de energia tem sido uma variável cada vez mais importante nos projetos de implementação de microprocessadores nas últimas décadas. A arquitetura VLIW é um exemplo representativo desta tendência, devido ao seu design simples e desempenho competitivo, resultado da exploração do paralelismo entre instruções (ILP) em tempo de compilação. Neste trabalho, é realizada uma análise da economia de energia obtida através da adaptação da microarquitetura dos processadores VLIW de acordo com as diferentes fases dos programas executados. Primeiramente, o potencial de otimização é abordado, através da execução de um grupo de benchmarks no processador configurável ρ-vex, e estudando o impacto da largura do processador (i.e.: número de issues) na performance, consumo de energia, e área. A partir desta informação, um experimento levando em conta o caso ótimo (usando um oráculo) foi realizado com o objetivo de variar dinamicamente a largura do processador de acordo com a fase do programa, considerando duas granularidades diferentes. A economia de energia usando este tipo de adaptação pode ser de até 81,5% comparado com uma versão estática do mesmo processador executando o grupo de benchmarks MiBench. Com base nestes resultados, duas técnicas de power gating nas unidades funcionais são propostas. A primeira é baseada em lógica adicional, inserida no processador, para controlar os circuitos de power gating associados com cada unidade funcional. Mostra-se que estas unidades podem ser desabilitadas em até 63% do tempo de execução para os multiplicadores e 30% para as ALUs, com um custo em performance de 13%, em média. A segunda técnica proposta propõe uma técnica para ser usada em conjunto com o compilador para aplicar power gating nas unidades funcionais, assim como nos blocos do banco de registradores. Esta operação é realizada inserindo instruções específicas em tempo de compilação, tendo em conta a análise das probabilidades de instruções de saltos e informação dos blocos básicos, obtidos através de instrumentação de código. Utilizando este tipo de estratégia, é possível economizar até 20% em energia com perda marginal de desempenho. / The development of energy efficient hardware has been a trend in microprocessor design for the last two decades. VLIW processors are a representative example, since they have a simpler design and competitive performance, due to their static ILP exploitation. In this work, we study the energy savings that could be obtained by adapting such microarchitecture according to the current program phase. First we analyze the potential of optimization, by executing a set of benchmarks on the ρ-vex configurable softcore VLIW processor, and by modifying the number of issues. With this data in hand, we develop an oracle experiment to dynamically vary the issue width of the processor according to the phase behavior, considering two different phase granularities. The potential energy savings using this policy could be as high as 81.5% when compared with the static version, executing the MiBench set. Taking into account this information, two techniques for power gating the functional units are proposed. The first approach is based on additional hardware logic to control the power gating circuitry of each Functional Unit. Our results show that these units can be put to sleep on average 63% of the execution cycles for the multipliers and 30% for the ALUs, at a performance loss of 13%. The second approach handles intelligent use of the compiler for power gating the Functional Units as well as blocks of the Register File. We do so by inserting customized instructions at compile time, based on the analysis that involves probabilities of conditional branches and basic block information obtained via dynamic profiling. By using this technique, it is possible to save up of 20% in the total energy consumption with marginal losses in performance.

Page generated in 0.0311 seconds