41. Using Software Thread Integration with TinyOS. Purvis, Zane Dustin. 13 November 2007.
Wireless sensor network nodes (motes) spend much of their time busy-waiting while communicating with peripherals such as radios, analog-to-digital converters (ADCs), memory devices, and sensors. These periods of busy-waiting waste time and energy, both of which are limited resources for many mote applications. This document presents techniques for using software thread integration (STI) in TinyOS applications to reclaim that idle time and use it for useful processing. The TinyOS scheduler is modified to support the selection and execution of integrated threads, and the impact of integration on task response time is analyzed. A microphone array sampling application demonstrates the savings: integrated tasks in the sample application finish 17.7% faster, and the application's active time is reduced by 6.3%.
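The core idea, reclaiming busy-wait cycles for useful work, can be illustrated with a toy model. This is a hypothetical Python sketch, not the actual STI transformation (which interleaves instructions from two threads at compile time, on a real mote); the `Peripheral` class and poll counts are invented for the example:

```python
class Peripheral:
    """Toy peripheral that reports ready after a fixed number of polls."""
    def __init__(self, polls_until_ready):
        self.remaining = polls_until_ready

    def ready(self):
        self.remaining -= 1
        return self.remaining <= 0

def busy_wait(dev):
    """Baseline: spin until the peripheral is ready, doing nothing useful."""
    wasted = 0
    while not dev.ready():
        wasted += 1            # idle spins: time and energy thrown away
    return wasted

def integrated_wait(dev, work):
    """STI-style version: fold useful work items into the polling loop."""
    done = []
    while not dev.ready():
        if work:
            done.append(work.pop(0) * 2)   # stand-in for real processing
    return done
```

The integrated version finishes the same wait but leaves completed work behind, which is the source of the reported speedup.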

42. WELD for Itanium Processors. Sharma, Saurabh. 3 December 2002.
This dissertation extends the WELD architecture, introduced by Emre Özer in his Ph.D. thesis, to Itanium processors. WELD integrates multithreading support into an Itanium processor to hide run-time latencies that cannot be determined by the compiler. It also proposes a hardware technique called operation welding, which merges operations from different threads to better utilize the hardware resources. Hardware contexts such as program counters and fetch units are duplicated to support multithreading. The experimental results show that dual-thread WELD attains a maximum speedup of 11% over a single-threaded Itanium architecture while maintaining the hardware simplicity of the EPIC architecture.
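Operation welding can be pictured as filling the unused issue slots of one thread's bundle with operations from another. The sketch below is an illustrative model only (real welding happens in hardware per cycle, subject to resource and dependence constraints the toy ignores):

```python
def weld(bundle_a, bundle_b, issue_width):
    """Fill unused issue slots of thread A's bundle with ops from thread B.

    bundle_a, bundle_b: lists of opcode strings for the two threads.
    Returns (merged bundle, ops from B that did not fit this cycle).
    """
    merged = list(bundle_a)
    spill = []
    for op in bundle_b:
        if len(merged) < issue_width:
            merged.append(op)      # a free slot exists: weld the op in
        else:
            spill.append(op)       # no slot left: defer to a later cycle
    return merged, spill
```

With a 4-wide machine, a 2-op bundle from thread A leaves two slots that thread B's ops can occupy, which is the utilization gain WELD targets.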

43. Performance Analysis of System-on-Chip Applications of Three-Dimensional Integrated Circuits. Schoenfliess, Kory Michael. 1 March 2006.
In the research community, three-dimensional integrated circuit (3DIC) technology has garnered attention for its potential as a solution to the scaling gap between MOSFET device characteristics and interconnects. The purpose of this work is to examine the performance advantages offered by 3DICs. A 3D microprocessor-based test case has been designed using an automated 3DIC design flow developed by researchers at North Carolina State University. The test case is based on an open architecture representative of future complex System-on-Chip (SoC) designs. Specialized partitioning and floorplanning procedures were integrated into the design flow to realize the performance gains of vertical interconnect structures called 3D vias. For post-design characterization of the 3DIC, temperature-dependent models describing circuit performance over temperature variations were developed. Together with a thermal model of the 3DIC, the performance scaling with temperature was used to predict the degradation in delay and power dissipation of the 3D test case. Using realistic microprocessor workloads, it was shown that the temperatures of the 3DIC thermal model converge to a final value. The increase in delay and power dissipation from the thermal analysis was found to be negligibly small compared to the performance improvements of the 3DIC. Timing analysis of the 3D design and its 2D version revealed a critical-path delay reduction of 26.59% for the 3D implementation. In addition, the 3D design offered average power dissipation savings of 3% while running at a proportionately higher clock frequency.
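The thermal convergence the abstract mentions comes from a feedback loop: power raises temperature, and temperature mildly raises (leakage) power. A minimal fixed-point sketch, with invented coefficients and a one-node thermal model far simpler than the thesis's:

```python
def converge_temperature(p0, alpha, theta, t_amb=25.0, max_iter=200, tol=1e-9):
    """Iterate T <- T_amb + theta * P(T) until the change is below tol.

    P(T) = p0 * (1 + alpha * (T - t_amb)): power grows mildly with temperature.
    Converges whenever theta * p0 * alpha < 1 (the feedback is a contraction).
    """
    t = t_amb
    for _ in range(max_iter):
        t_new = t_amb + theta * p0 * (1 + alpha * (t - t_amb))
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t
```

For p0 = 2 W, theta = 20 K/W, alpha = 0.01 /K, the closed-form fixed point is t_amb + theta*p0 / (1 - theta*p0*alpha) = 25 + 40/0.6 ≈ 91.67 °C, which the iteration reaches.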

44. Socket API Extensions to Extract Packet Header Information List (PHIL). Narayan, Ravindra. 26 May 1999.
This thesis focuses on the current standards and practices employed in the IP security domain. It presents an overview of the existing protocols and proposes extensions to the socket API so that the currently available IP security mechanisms can be employed by a wider range of user applications, spanning both present-generation application programs and future-generation security applications (such as intrusion detection systems). Further, an implementation of the proposed extensions on the Linux operating system platform is presented, and the use of the API is demonstrated with example applications based on the API extensions.
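PHIL's socket options are the thesis's own API and are not sketched here. As a loose illustration of the kind of information such an API surfaces to applications, here is a parser for the fixed 20-byte IPv4 header (standard RFC 791 layout, not the PHIL interface):

```python
import socket
import struct

def parse_ipv4_header(raw):
    """Decode the fixed portion of an IPv4 header from raw bytes."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,          # high nibble: IP version
        "ihl": ver_ihl & 0x0F,            # header length in 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,                # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```

An application receiving such decoded fields alongside payload data can apply security policy (e.g., flag unexpected sources) without reimplementing raw packet parsing.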

45. Power Analysis and Instruction Scheduling for Reduced di/dt in the Execution Core of High-Performance Microprocessors. Toburen, Mark C. 29 June 1999.
Power dissipation is becoming a first-order design issue in high-performance microprocessors as clock speeds and transistor densities continue to increase. As power dissipation levels rise, the cooling and reliability of high-performance processors become major issues, which calls for significant research into architectural techniques for reducing power dissipation.

One major contributor to a processor's average peak power dissipation is high di/dt in its execution core. High-energy instructions scheduled together in a single cycle can produce large current spikes during execution. In heavily weighted regions of code, these current spikes raise the processor's average peak power dissipation. However, if the compiler produces large enough regions, a certain amount of schedule slack should exist, providing opportunities for scheduling optimizations based on per-cycle energy constraints.

This thesis proposes a novel approach to instruction scheduling based on the concept of schedule slack, which builds energy-efficient schedules by limiting the energy dissipated in any single cycle. The result is a more uniform di/dt curve and a decrease in the execution core's average peak power dissipation.
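The scheduling idea can be sketched as a greedy bin-packer with a per-cycle energy budget. This is an illustrative toy (instruction energies are invented, and real scheduling must also honor data dependencies, which the sketch omits):

```python
def energy_capped_schedule(instrs, width, energy_cap):
    """Assign (name, energy) instructions to cycles, greedily filling the
    earliest cycle that has a free issue slot AND room in its energy budget."""
    cycles = []
    for name, energy in instrs:
        for cycle in cycles:
            if (len(cycle) < width and
                    sum(e for _, e in cycle) + energy <= energy_cap):
                cycle.append((name, energy))
                break
        else:
            cycles.append([(name, energy)])   # open a new cycle
    return cycles
```

Two 6-unit instructions cannot share a cycle under a 10-unit cap even on a 2-wide machine, so the peak per-cycle energy (and hence di/dt) is flattened at the cost of possible extra cycles.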

46. Wavelet Transform Adaptive Signal Detection. Huang, Wensheng. 21 November 1999.
Wavelet transform adaptive signal detection is a signal detection method that uses the wavelet transform adaptive filter (WTAF): adaptive filtering applied to the subband signals obtained by wavelet decomposition and reconstruction. The WTAF is an adaptive filtering technique with good convergence and low computational complexity. It can effectively adapt to non-stationary signals and thus finds practical use for transient signals.

This dissertation proposes and studies different architectures for implementing the WTAF. In terms of the type of wavelet transform used, we present the DWT-based WTAF and the wavelet-tree-based WTAF. In terms of the position of the adaptive filter in the signal paths of the system, we present the before-reconstruction WTAF, in which the adaptive filter is placed before the reconstruction filter, and the after-reconstruction WTAF, in which it is placed after the reconstruction filter. These can also be viewed as adaptive filtering in different domains, with the before-reconstruction structure corresponding to filtering in the scale domain and the after-reconstruction structure to filtering in the time domain. In terms of the error signal used, we present the output-error-based WTAF, in which the output error signal drives the LMS algorithm, and the subband-error-based WTAF, in which the error signal in each subband drives the LMS algorithm. The WTAF algorithms are also generalized in this work: to speed up the calculation, we developed a block-LMS-based WTAF, which updates the adaptive filter weights block by block instead of sample by sample.

Experimental studies examined the performance of the different implementation schemes. Simulations were performed on the WTAF algorithms with sinusoidal and pulse inputs, and the speed and stability of each structure were studied experimentally and theoretically. Different WTAF structures exhibited different tradeoffs among stability, performance, computational complexity, and convergence speed. The WTAF algorithms were applied to an online measurement system for fabric compressional behavior with encouraging results: a 3-stage DWT-based WTAF and a block WTAF based on a 3-stage DWT were employed to process the noisy force-displacement signal acquired from the online measurement system. The signal-to-noise ratio was greatly increased, making a lower sampling rate possible, and the reduced time for data sampling and processing improves system speed to meet faster testing requirements. The WTAF algorithm could also be used in other applications requiring fast processing, such as real-time communications, measurement, and control.
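The building block of every WTAF variant is the LMS update; the WTAF applies it per subband rather than to the full-band signal. A minimal time-domain LMS in Python (the filter order, step size, and the test system below are illustrative choices, not the dissertation's parameters):

```python
import numpy as np

def lms_filter(x, d, taps=8, mu=0.01):
    """Standard LMS: adapt weights w so that the filtered input x tracks d."""
    w = np.zeros(taps)
    y = np.zeros(len(x))
    for n in range(taps - 1, len(x)):
        window = x[n - taps + 1 : n + 1][::-1]   # x[n], x[n-1], ..., x[n-taps+1]
        y[n] = w @ window                        # filter output
        e = d[n] - y[n]                          # instantaneous error
        w += 2 * mu * e * window                 # LMS weight update
    return y, w
```

Identifying an unknown 2-tap FIR system, the weights converge to the system's coefficients and the residual error shrinks toward zero, which is the convergence behavior the dissertation exploits subband by subband.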

47. Dynamic Optimization Infrastructure and Algorithms for IA-64. Hazelwood, Kim Michelle. 29 June 2000.
Dynamic optimization refers to any program optimization performed after the initial static compile time. While typically not designed as a replacement for static optimization, dynamic optimization is a complementary opportunity that leverages a vast amount of information unavailable until runtime. It opens the door to machine- and user-specific optimizations without the need for original source code.

This thesis makes three contributions to the field of dynamic optimization. The first is a survey of several current approaches to dynamic optimization and its related topics: dynamic compilation, the postponement of some or all compilation until runtime; and dynamic translation, the translation of an executable from one instruction-set architecture (ISA) to another.

The second is the proposal of a new infrastructure for dynamic optimization on EPIC architectures. Several salient features of the EPIC ISA not only make it a good candidate for dynamic optimization but make such optimization essential for scalability on par with superscalar processors. By extending many existing approaches to allow for offline optimization, a new dynamic optimization system is proposed for EPIC architectures. For compatibility reasons, this system is almost entirely software-based, yet it utilizes the hardware profiling counters planned for future EPIC processors.

Finally, the third contribution is a set of original optimization algorithms designed specifically for implementation in a dynamic optimization infrastructure. Dynamic if-conversion is a lightweight runtime algorithm that converts control dependencies to data dependencies and vice versa, based on branch misprediction rates, achieving speedups of up to 17% on the SpecInt95 benchmarks. Several other algorithms, including predicate profiling, predicate promotion, and false predicate path collapse, are designed to aid offline instruction rescheduling.
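The decision at the heart of dynamic if-conversion is a cost comparison: predicate (execute both paths) when the expected misprediction penalty exceeds the cost of the extra always-executed operations. A hedged sketch of that decision rule; the cost model and its inputs are simplified stand-ins for what a real runtime system measures:

```python
def should_if_convert(mispredict_rate, branch_penalty, hammock_ops):
    """Decide whether to replace a branch hammock with predicated code.

    mispredict_rate: observed fraction of mispredictions (from profiling).
    branch_penalty:  pipeline cycles lost per misprediction.
    hammock_ops:     extra operations executed when both paths are predicated.
    """
    expected_branch_cost = mispredict_rate * branch_penalty
    predication_cost = hammock_ops          # always paid once converted
    return expected_branch_cost > predication_cost
```

A hard-to-predict branch (40% mispredictions, 10-cycle penalty) costs 4 cycles on average, so predicating a 3-op hammock wins; a well-predicted branch does not, and the reverse transformation applies.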

48. Exploiting Program Redundancy to Improve Performance, Cost and Power Consumption in Embedded Systems. Larin, Sergei Yurievich. 19 July 2000.
During the last 15 years, embedded systems have grown rapidly in complexity and performance, to the point where they now rival the design challenges of desktop systems. Embedded systems face contradictory requirements: they are expected to occupy little physical space (e.g., a low package count), be inexpensive, consume little power, and be highly reliable. Despite decades of intensive research and development, some areas still promise significant benefits if researched further. One such area is the quality of the data upon which an embedded system operates, encompassing both the code and data segments of an embedded application. This work presents a unified, compiler-driven approach to the redundancy problem: it increases the quality of the data stream that embedded systems operate upon while preserving the original functionality. Code size reduction is achieved by Huffman-compressing or tailor-encoding the ISA of the original program; data segment size reduction is accomplished by a modified discrete dynamic Huffman encoding. This work is the first such study to also detail the design of instruction fetch mechanisms for the proposed compression schemes.
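The code-compression idea rests on classic Huffman coding over opcode frequencies: common instructions get short codes. A minimal sketch (the opcode mix is invented; the thesis's actual encodings and fetch hardware are far more involved):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a prefix-free Huffman code from a {symbol: frequency} map."""
    tie = count()                       # unique tiebreaker; never compare dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)     # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

For a skewed mix of 100 instructions over 4 opcodes, the Huffman code spends 190 bits where a fixed 2-bit encoding spends 200; real instruction streams are far more skewed, so the savings are larger.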

49. Evaluating Placement Algorithms with the DREAM Framework for Reconfigurable Hardware Devices. Eatmon, Dedra. 9 August 2000.
The field-programmable gate array (FPGA) has become one of the most widely used configurable devices in reconfigurable computing. FPGAs offer a large amount of flexibility and a high degree of parallel computing capability. Since their introduction in the 1980s, these configurable logic devices have seen dramatic increases in programming capability and performance, and both factors have been significant in the changing roles of configurable devices in custom-computing machines. However, these improvements have not eliminated the issues surrounding efficient placement of applications on such devices.

This thesis presents a tool that evaluates placement algorithms for configurable logic devices. Written in Java, the tool is a framework in which various placement algorithms can be executed, with the performance and quality of each placement evaluated using a cost function. Targeting devices that support relocatable hardware modules (RHMs), the tool places modules with user-specified placement algorithms and provides feedback for comparative analysis. The framework manages module mappings to the logic device that are independent of each other and do not require pin-to-pin routing connections. Such a tool is valuable for identifying effective algorithms for real-time placement of RHMs in run-time reconfigurable systems.

The Dynamic Resource Allocation and Management (DREAM) framework has been designed and developed to evaluate FPGA placement algorithms and heuristics. A portion of the evaluation is based on a simple cost function that calculates the amount of contiguous unused space remaining on the device in two dimensions. In our experiments, we use an FPGA logic core generator to produce several rectangular RHMs; in addition, the framework can handle arbitrary circuit profiles. Several scenarios, each consisting of approximately 500 insertions and deletions of both rectangular and non-rectangular RHMs, serve as test data sets for placement. Three placement algorithms demonstrate the flexibility of the framework: a random placement algorithm, an adaptation of a traditional best-fit algorithm that we call exhaustive search, and a modified version of first-fit. Future work will involve additional placement algorithms and the incorporation of placement issues arising from requests for central reconfigurable computing resources that originate from a remote site.

The DREAM framework answers the call for a sorely needed tool to identify placement algorithms that can be used effectively for real-time placement. In addition to providing results that benchmark the real-time performance of placement algorithms on a configurable system, the tool lets the end user store and load placements for future optimization. By taking full advantage of the partial and full dynamic reconfiguration capabilities of logic devices currently used in run-time reconfigurable systems, DREAM aims to provide a tool with which the quality of placement algorithms can be quantified and compared.
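A first-fit placer of the kind DREAM evaluates can be sketched on a toy 2-D grid. This is an illustrative model of the heuristic, not DREAM's Java implementation; the grid size and module shapes are invented, and real RHM placement also tracks deletions and device-specific constraints:

```python
def first_fit_place(grid_w, grid_h, modules):
    """Place rectangular modules (w, h) at the first free position found
    by a row-major scan; returns a (x, y) or None per module."""
    occupied = [[False] * grid_w for _ in range(grid_h)]
    placements = []
    for w, h in modules:
        pos = None
        for y in range(grid_h - h + 1):
            for x in range(grid_w - w + 1):
                if all(not occupied[y + dy][x + dx]
                       for dy in range(h) for dx in range(w)):
                    pos = (x, y)
                    break
            if pos is not None:
                break
        placements.append(pos)
        if pos is not None:          # mark the module's cells as used
            px, py = pos
            for dy in range(h):
                for dx in range(w):
                    occupied[py + dy][px + dx] = True
    return placements
```

A cost function like DREAM's contiguous-free-space metric would then score `occupied` after each insertion, letting first-fit be compared against best-fit or random placement on identical request streams.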

50. IA-64 Code Generation. Rao, Vikram S. 11 August 2000.
This work presents an approach to code generation for IA-64, a new 64-bit Explicitly Parallel Instruction Computing (EPIC) architecture from Intel. The major contribution is the design of a machine-independent optimizer, the munger, that transforms code generated for a Very Long Instruction Word (VLIW) processor called Tinker into code that can run on the IA-64 architecture. The munger performs this transformation by reading a set of rules that specify a mapping from Tinker-specific code to IA-64-specific code. The aim is to perform the transformation outside the compiler back-end, thereby taking advantage of any optimizations the back-end performs on the code and avoiding a significant rewrite of the existing back-end to support the new architecture. The primary motivation for this approach is the substantial similarity between the Tinker and IA-64 architectures. Moreover, Tinker is an experimental VLIW architecture that supports a number of features for exploiting instruction-level parallelism (ILP) and can easily be extended with new features. This makes the Tinker back-end an ideal compiler to retarget to the IA-64 architecture, since it already performs most of the ILP optimizations supported on the IA-64.
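A rule-driven rewriter like the munger can be sketched as pattern-to-template substitution over assembly text. The Tinker-style source syntax below is hypothetical (the thesis's actual rule language is not reproduced here); the target forms follow IA-64's destination-on-the-left convention (`add r1 = r2, r3`):

```python
import re

# (source pattern, target template): each rule maps one instruction shape.
RULES = [
    (re.compile(r"mov\s+(r\d+),\s*(r\d+)"), r"mov \1 = \2"),
    (re.compile(r"add\s+(r\d+),\s*(r\d+),\s*(r\d+)"), r"add \1 = \2, \3"),
]

def munge(lines):
    """Rewrite each line with the first matching rule; pass others through."""
    out = []
    for line in lines:
        for pattern, template in RULES:
            m = pattern.fullmatch(line.strip())
            if m:
                out.append(m.expand(template))
                break
        else:
            out.append(line)          # no rule matched: leave unchanged
    return out
```

Because the rules operate on already-optimized back-end output, the rewriter inherits the Tinker back-end's ILP optimizations for free, which is the approach's selling point.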