Global ETD Search

371	An Embedded Garbage Collection Module with Support for Multiple Mutators and Weak References Preußer, Thomas B., Reichel, Peter, Spallek, Rainer G. 14 November 2012 (has links) (PDF) This report details the design of a garbage collection (GC) module, which introduces modern GC features to the domain of embedded implementations. The described design supports weak references and feeds reference queues. Its architecture allows multiple concurrent application cores operating as mutators on the shared memory managed by the GC module. The garbage collection is exact and fully concurrent so as to enable the uninterrupted computational progress of the mutators. It combines a distributed root marking with a centralized heap scan of the managed memory. It features a novel mark-and-copy GC strategy on a segmented memory, which thereby overcomes both the tremendous space overhead of two-space copying and the compaction race of mark-and-compact approaches. The proposed GC architecture has been practically implemented and proven using the embedded bytecode processor SHAP as a sample testbed. The synthesis results for settings up to three SHAP mutator cores are given and online functional measurements are presented. Basic performance dependencies on the system configuration are evaluated. SHAP Programmierung Java garbage collection embedded systems ddc:004 rvk:SS 5514
372	Storage Management for Embedded SIMD Processors Ryu, Soojung 17 December 2003 (has links) SIMD parallelism offers a high performance and efficient execution approach for today's broad range of portable multimedia consumer products. However, new methods are needed to meet the complex demands of high performance, embedded systems. This research explores new storage management techniques for this focused but critical application. These techniques include memory design exploration based on the application retargeting technique, storage-based systolic instruction broadcast, and systolic virtual memory to improve both the performance and efficiency of embedded SIMD systems. For an efficient storage usage by memory design space exploration in embedded SIMD systems, an analysis method for assessing storage needs and costs of a given application automatically retargeted across a spectrum of storage configuration designs was developed. Using this technique, a SIMD processing element achieves optimal area and energy efficiency with a register file containing between 8 and 12 words for given workload. This configuration is between 15% and 25% more area and energy efficient than other memory configurations being considered. Systolic instruction broadcast is a high performance and area efficient instruction broadcasting scheme with short-wire interconnects by eliminating of wire latency bottleneck found in global instruction broadcast. Three implementation methods are defined and evaluated - software method, 2-write port register file method, and bypass method. In our evaluations, due to the system's short clock cycle time and scheduler, a speedup in system performance of up to 7.5 can be achieved by the year 2010. In addition, speedup of area efficiency also can be achieved up to 7.2 for a given workload. The ability of minimizing off-chip memory access latency while maximizing access frequency by scheduling techniques along with data prefetch techniques in systolic virtual memory mechanism was evaluated using our SIMD-systolic architecture simulator. Results show that, systolic virtual off-chip memory with shared address space can achieve over 50% higher area efficiency than that of an on-chip only system for a matrix multiplication application. Storage management SIMD architectures Embedded systems Embedded computer systems Memory management (Computer science)
373	Dynamic Memory Management for Embedded Real-Time Multiprocessor System-on-a-Chip Shalan, Mohamed A. 25 November 2003 (has links) The aggressive evolution of the semiconductor industry smaller process geometries, higher densities, and greater chip complexity has provided design engineers the means to create complex, high-performance System-on-a-Chip (SoC) designs. Such SoC designs typically have more than one processor and huge (tens of Mega Bytes) amount of memory, all on the same chip. Dealing with the global on-chip memory allocation/deallocation in a dynamic yet deterministic way is an important issue for upcoming billion transistor multiprocessor SoC designs. To achieve this, we propose a memory management hierarchy we call Two-Level Memory Management. To implement this memory management scheme which presents a shift in the way designers look at on-chip dynamic memory allocation we present the System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as Level Two memory management (Level One is the management of memory allocated to a particular on-chip Processing Element, e.g., an operating systems management of memory allocated to a particular processor). In this way, processing elements (heterogeneous or non-heterogeneous hardware or software) in an SoC can request and be granted portions of the global memory in a fast and deterministic time. A new tool is introduced to generate a custom optimized version of the SoCDMMU hardware. Also, a real-time operating system is modified support the new proposed SoCDMMU. We show an example where shared memory multiprocessor SoC that employs the Two-Level Memory Management and utilizes the SoCDMMU has an overall average speedup in application transition time as well as normal execution time. SoC Embedded systems Real-time allocation SoCDMMU Hardware allocator Systems on a chip
374	Architectural Enhancements for Color Image and Video Processing on Embedded Systems Kim, Jongmyon 21 April 2005 (has links) As emerging portable multimedia applications demand more and more computational throughput with limited energy consumption, the need for high-efficiency, high-throughput embedded processing is becoming an important challenge in computer architecture. In this regard, this dissertation addresses application-, architecture-, and technology-level issues in existing processing systems to provide efficient processing of multimedia in many, or ideally all, of its form. In particular, this dissertation explores color imaging in multimedia while focusing on two architectural enhancements for memory- and performance-hungry embedded applications: (1) a pixel-truncation technique and (2) a color-aware instruction set (CAX) for embedded multimedia systems. The pixel-truncation technique differs from previous techniques (e.g., 4:2:2 and 4:2:0 subsampling) used in image and video compression applications (e.g., JPEG and MPEG) in that it reduces the information content in individual pixel word sizes rather than in each dimension. Thus, this technique drastically reduces the bandwidth and memory required to transport and store color images without perceivable distortion in color. At the same time, it maintains the pixel storage format of color image processing in which each pixel computation is performed simultaneously on 3-D YCbCr components, which are widely used in the image and video processing community. CAX supports parallel operations on two-packed 16-bit (6:5:5) YCbCr data in a 32-bit datapath processor, providing greater concurrency and efficiency for processing color image sequences. This dissertation presents the impact of CAX on processing performance and on both area and energy efficiency for color imaging applications in three major processor architectures: dynamically scheduled (superscalar), statically scheduled (very long instruction word, VLIW), and embedded single instruction multiple data (SIMD) array processors. Unlike typical multimedia extensions, CAX obtains substantial performance and code density improvements through direct support for color data processing rather than depending solely on generic subword parallelism. In addition, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. In summary, CAX, coupled with the pixel-truncation technique, provides an efficient mechanism that meets the computational requirements and cost goals for future embedded multimedia products. Data parallel architectures Superscalar processors Embedded systems Subword parallelism Computer architecture Color image and video processing
375	A Methodology of SSA&D Modeling for Embedded Systems Hsu, Wen-cheng 22 July 2010 (has links) Structured technique is the traditional and the popular systems analysis and design language. With the rapid progress and development of information technology, embedded systems have penetrated into most of the equipments which we used daily. Over the past few years a considerable effort has been made in modeling the platform independent model (PIM) for business information systems. However, the detailed guideline for modeling the PIM of embedded systems is lacking. This study proposed a PIM modeling methodology with structured technique for embedded systems. The structured modeling process is consisted of three parts: requirement modeling, process modeling and module modeling. For each part, its modeling tool, modeling processes and rules are provided. The research methodology is articulated using the design science research methodology. A usability evaluation is performed to demonstrate its applicability with a real-world embedded system case. The evaluation results indicated that with this proposed method, the system developer can easily and effectively analyze and design the embedded systems with structured technique. Structured Technique Modules Modeling Process Modeling Requirement Modeling Structured Modeling Embedded Systems
376	DESIGNING COST-EFFECTIVE COARSE-GRAINED RECONFIGURABLE ARCHITECTURE Kim, Yoonjin 2009 May 1900 (has links) Application-specific optimization of embedded systems becomes inevitable to satisfy the market demand for designers to meet tighter constraints on cost, performance and power. On the other hand, the flexibility of a system is also important to accommodate the short time-to-market requirements for embedded systems. To compromise these incompatible demands, coarse-grained reconfigurable architecture (CGRA) has emerged as a suitable solution. A typical CGRA requires many processing elements (PEs) and a configuration cache for reconfiguration of its PE array. However, such a structure consumes significant area and power. Therefore, designing cost-effective CGRA has been a serious concern for reliability of CGRA-based embedded systems. As an effort to provide such cost-effective design, the first half of this work focuses on reducing power in the configuration cache. For power saving in the configuration cache, a low power reconfiguration technique is presented based on reusable context pipelining achieved by merging the concept of context reuse into context pipelining. In addition, we propose dynamic context compression capable of supporting only required bits of the context words set to enable and the redundant bits set to disable. Finally, we provide dynamic context management capable of reducing reduce power consumption in configuration cache by controlling a read/write operation of the redundant context words In the second part of this dissertation, we focus on designing a cost-effective PE array to reduce area and power. For area and power saving in a PE array, we devise a costeffective array fabric addresses novel rearrangement of processing elements and their interconnection designs to reduce area and power consumption. In addition, hierarchical reconfigurable computing arrays are proposed consisting of two reconfigurable computing blocks with two types of communication structure together. The two computing blocks have shared critical resources and such a sharing structure provides efficient communication interface between them with reducing overall area. Based on the proposed design approaches, a CGRA combining the multiple design schemes is shown to verify the synergy effect of the integrated approach. Experimental results show that the integrated approach reduces area by 23.07% and power by up to 72% when compared with the conventional CGRA. System-on-Chip Embedded Systems Configuration Cache Context Word PE Array
377	Effects Of Parallel Programming Design Patterns On The Performance Of Multi-core Processor Based Real Time Embedded Systems Kekec, Burak 01 June 2010 (has links) (PDF) Increasing usage of multi-core processors has led to their use in real time embedded systems (RTES). This entails high performance requirements which may not be easily met when software development follows traditional techniques long used for single processor systems. In this study, parallel programming design patterns especially developed and reported in the literature will be used to improve RTES implementations on multi-core systems. Specific performance parameters will be selected for assessment, and performance of traditionally developed software will be compared with that of software developed using parallel programming patterns.
378	Performance Evaluation of Embedded Microcomputers for Avionics Applications Bilen, Celal Can, Alcalde, John January 2010 (has links) <p>Embedded microcomputers are used in a wide range of applications nowadays. Avionics is one of these areas and requires extra attention regarding reliability and determinism. Thus, these issues should also be born in mind in addition to performance when evaluating embedded microcomputers.</p><p>This master thesis suggests a framework for performance evaluation of two members of the PowerPC microprocessor family, namely the MPC5554 from Freescale and PPC440EPx from AMCC, and analyzes the results within and between these processors. The framework can be generalized to be used in any microprocessor family, if required.</p><p>Apart from performance evaluation, this thesis also suggests also a new terminology by introducing the concept of determinism levels to be able to estimate determinism issues in avionics applications more clearly, which is crucial regarding the requirements and working conditions of this very application. Such estimation does not include any practical results as in performance evaluation, but rather remains theoretical. Similar to Automark™ used by AutoBench™ in the EEMBC Benchmark Suite, we introduce a new performance metric score that we call ”Aviomark” and we carry out a detailed comparison of Aviomark with the traditional Automark™ score to be able to see how Aviomark differs from Automark™ in behavior.</p><p>Finally, we have developed a graphical user interface (GUI) which works in parallel with the Green Hills MULTI Integrated Development Environment (IDE) in order to simplify and automate the evaluation process. By the help of the GUI, the users will be able to easily evaluate their specific PowerPC processors by starting the debugging from MULTI IDE.</p> Microprocessor Avionics PowerPC Performance Evaluation Determinism Embedded Systems Computer Architecture Computer engineering Datorteknik
379	Model and tool integration in high level design of embedded systems Shi, Jianlin January 2007 (has links) <p>The development of advanced embedded systems requires a systematic approach as well as advanced tool support in dealing with their increasing complexity. This complexity is due to the increasing functionality that is implemented in embedded systems and stringent (and conflicting) requirements placed upon such systems from various stakeholders. The corresponding system development involves several specialists employing different modeling languages and tools. Integrating their work and the results thereof then becomes a challenge. In order to facilitate system architecting and design integration of different models, an approach that provides dedicated workspaces/views supported by structured information management and information exchange between domain models and tools is required.</p><p>This work is delimited to the context of embedded systems design and taking a model based approach. The goal of the work is to study possible technical solutions for integrating different models and tools, and to develop knowledge, support methods and a prototype tool platform.</p><p>To this end, this thesis examines a number of approaches that focus on the integration of multiple models and tools. Selected approaches are compared and characterized, and the basic mechanisms for integration are identified. Several scenarios are identified and further investigated in case studies. Two case studies have been performed with model transformations as focus. In the first one, integration of Matlab/Simulink® and UML2 are discussed with respect to the motivations, technical possibilities, and challenges. A preliminary mapping strategy, connecting a subset of concepts and constructs of Matlab/Simulink® and UML2, is presented together with a prototype implementation in the Eclipse environment. The second case study aims to enable safety analysis based on system design models in a UML description. A safety analysis tool, HiP-HOPS (Hierarchically Performed Hazard Origin and Propagation Studies), is partially integrated with a UML tool where an EAST-ADL2 based architecture model is developed. The experience and lessons learned from the experiments are reported in this thesis.</p><p>Multiple specific views are involved in the development of embedded systems. This thesis has studied the integration between system architecture design, function development and safety analysis through using UML tools, Matlab/Simulink, and HiP-HOPS. The results indicate that model transformations provide a feasible and promising solution for integrating multiple models and tools. The contributions are believed to be valid for a large class of advanced embedded systems. However, the developed transformations so far are not really scalable. A systematic approach for efficient development of model transformations is desired to standardize the design process and reuse developed transformations. To this end, future studies will be carried out to develop guidelines for model and tool integration and to provide support for structured information at both meta level and instance level.</p> Model integration Model transformation Tool integration Model based development Embedded Systems Engineering mechanics Teknisk mekanik
380	Functional Self-Test of DSP cores in a SOC Dahir, Sarmad Jamal January 2007 (has links) <p>The rapid progress made in integrating enormous numbers of transistors on a single chip is making it possible for hardware designers to implement more complex hardware architectures in their designs. Nowadays digital telecommunication systems are implementing several forms of SOC (System-On-Chip) structures. These SOCs usually contain a microprocessor, several DSP cores (Digital-Signal-Processors), other hardware blocks, on-chip memories and peripherals.</p><p>As new IC process technologies are deployed, with decreasing geometrical dimensions, the probabilities of hardware faults to occur during operation are increasing. Testing SOCs is becoming a very complex issue due to the increasing complexity of the design and the increasing need of a test mechanism that is able to achieve acceptable fault coverage in a short test application time with low power consumption without the use of external logic testers.</p><p>As a part of the overall test strategy for a SOC, functional self-testing of a DSP core is considered in this project to be applied in the field. This test is used to verify whether fault indications in systems are caused by permanent hardware faults in the DSP. If so, the DSP where the fault is located needs to be taken out of operation, and the board it sits on will be later replaced. If not, the operational state can be restored, and the system will become fully functional again.</p><p>The main purpose of this project is to develop a functional self-test of a DSP core, and to evaluate the characteristics of the test. This project also involves proposing a scheme on how to apply a functional test on a DSP core in an embedded environment, and how to retrieve results from the test. The test program shall run at system speed.</p><p>To develop and measure the quality of the test program, two different coverage metrics were used. The first is the code coverage metric achieved by simulating the test program on the RTL representation of the DSP. The second metric used was the fault coverage achieved. The fault coverage of the test was calculated using a commercial Fault Simulator working on a gate-level representation of the DSP. The results achieved in this report show that this proposed approach can achieve acceptable levels of fault coverage in short execution time without the need for external testers which makes it possible to perform the self-test in the field. This approach has the unique property of not requiring any hardware modifications in the DSP design, and the ability of testing several DSPs in parallel.</p> functional testing SOC DSP Self test Embedded systems testing Informatik, data- och systemvetenskap

Search results