121

Memory Study and Dataflow Representations for Rapid Prototyping of Signal Processing Applications on MPSoCs

Desnos, Karol 26 September 2014 (has links)
The development of embedded Digital Signal Processing (DSP) applications for Multiprocessor Systems-on-Chips (MPSoCs) is a complex task requiring the consideration of many constraints, including real-time requirements, power consumption restrictions, and limited hardware resources. To satisfy these constraints, it is critical to understand the general characteristics of a given application: its behavior and its requirements in terms of MPSoC resources. In particular, the memory requirements of an application strongly impact the quality and performance of an embedded system, as the silicon area occupied by memory can be as large as 80% of a chip and may be responsible for a major part of its power consumption. Despite this large overhead, limited memory resources remain an important constraint that considerably increases the development time of embedded systems. Dataflow Models of Computation (MoCs) are widely used for the specification, analysis, and optimization of DSP applications. The popularity of dataflow MoCs is due to their strong analyzability and their natural expression of the parallelism of a DSP application. The abstraction of time in dataflow MoCs is particularly suitable for exploiting the parallelism offered by heterogeneous MPSoCs. In this thesis, we propose a complete method to study the memory characteristics of a DSP application modeled with a dataflow graph. The proposed method spans from theoretical, architecture-independent memory characterization to quasi-optimal static memory allocation of an application in the shared memory of a real MPSoC. The method, implemented as part of a rapid prototyping framework, is extensively tested on a set of state-of-the-art applications from the computer vision, telecommunication, and multimedia domains. Then, because the dataflow MoC used in our method cannot model applications with dynamic behavior, we introduce a new dataflow meta-model to address the challenge of managing dynamics in DSP-oriented representations. This new meta-model enables the modeling of reconfigurable and composable applications while preserving the predictability, conciseness, and readability of dataflow descriptions.
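To make the kind of static analysis that dataflow MoCs enable concrete, here is a minimal sketch (not the thesis's actual allocation technique) that solves the synchronous dataflow balance equations for a toy graph and derives a simple per-edge buffer bound; the graph topology and rates are invented for the example.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src]*prod == q[dst]*cons.

    edges: list of (src, prod_rate, dst, cons_rate) tuples.
    Assumes a connected graph; returns the smallest positive integer
    firing counts, or raises if the graph admits no valid schedule.
    """
    q = {a: None for a in actors}
    q[actors[0]] = Fraction(1)
    changed = True
    while changed:                      # propagate rational firing ratios
        changed = False
        for src, prod, dst, cons in edges:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * prod / cons
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * cons / prod
                changed = True
            elif (q[src] is not None and q[dst] is not None
                  and q[src] * prod != q[dst] * cons):
                raise ValueError("inconsistent SDF graph: no valid schedule")
    scale = lcm(*(f.denominator for f in q.values()))
    return {a: int(f * scale) for a, f in q.items()}

# Toy graph: A --(prod 2, cons 3)--> B --(prod 1, cons 2)--> C
actors = ["A", "B", "C"]
edges = [("A", 2, "B", 3), ("B", 1, "C", 2)]
q = repetition_vector(actors, edges)    # {'A': 3, 'B': 2, 'C': 1}

# A safe, non-sharing memory bound: each edge holds its full
# per-iteration token traffic (real allocators do much better
# by reusing and overlapping buffers).
bound = sum(q[src] * prod for src, prod, dst, cons in edges)
print(q, bound)                         # {'A': 3, 'B': 2, 'C': 1} 8
```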
122

Designing Optimized Parallel Interleaver Architectures for Turbo and LDPC Decoders

Rehman, Saeed Ur 24 September 2014 (has links)
Turbo and LDPC codes are two families of codes that are extensively used in current communication standards due to their excellent error-correction capabilities. To achieve high performance, parallel architectures are required. However, these architectures suffer from memory conflicts, which increase the latency of memory accesses due to the conflict-management mechanisms in the communication network, and consequently decrease system throughput while increasing system cost. To tackle the memory conflict problem, different types of approaches have been proposed in the literature. In this thesis, we aim to design optimized parallel interleaver architectures. For this purpose, we present two categories of approaches. In the first category, we propose design-time, off-chip approaches of two kinds: one based on network customization, and a second based on in-place memory mapping, each generating an optimized architecture. In the second category, memory mapping algorithms are embedded on-chip and executed at runtime to solve the conflict problem. The dedicated architecture is composed of an embedded processor and RAM banks that store the generated command words; a polynomial-time memory mapping approach and a routing algorithm (based on a Benes network) are embedded on-chip to solve the memory conflict problem. Experiments have been performed with the memory mapping approaches executed on several embedded processors.
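As an illustration of the underlying memory-mapping problem (not the algorithms developed in the thesis), the following sketch builds a conflict graph from parallel access patterns and greedily assigns data to banks so that items read in the same cycle never share a bank; the access schedule and bank count are invented for the example.

```python
from itertools import combinations

def map_to_banks(access_steps, num_banks):
    """Greedy bank assignment for conflict-free parallel access.

    access_steps: list of tuples; the items in one tuple are accessed
    in the same cycle by different processing elements, so they must
    land in distinct memory banks.
    Returns {item: bank} or None if the greedy pass fails (the general
    problem is hard; real mapping tools use stronger algorithms).
    """
    # Conflict graph: an edge joins two items accessed together.
    conflicts = {}
    for step in access_steps:
        for a, b in combinations(step, 2):
            conflicts.setdefault(a, set()).add(b)
            conflicts.setdefault(b, set()).add(a)
    bank = {}
    # Color highest-degree items first (classic greedy heuristic).
    for item in sorted(conflicts, key=lambda i: -len(conflicts[i])):
        taken = {bank[n] for n in conflicts[item] if n in bank}
        free = [b for b in range(num_banks) if b not in taken]
        if not free:
            return None
        bank[item] = free[0]
    return bank

# Two PEs reading in natural order, then in a permuted (interleaved) order.
steps = [(0, 1), (2, 3), (3, 0), (1, 2)]
print(map_to_banks(steps, num_banks=2))   # {0: 0, 1: 1, 2: 0, 3: 1}
```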
123

High Performance Architecture using Speculative Threads and Dynamic Memory Management Hardware

Li, Wentong 12 1900 (has links)
With the advances in very large scale integration (VLSI) technology, hundreds of billions of transistors can be packed into a single chip. With this increased hardware budget, how to take advantage of the available hardware resources becomes an important research area. Some researchers have shifted from the control-flow von Neumann architecture back to dataflow architectures in order to explore scalable designs, leading to multi-core systems with several hundred processing elements. In this dissertation, I address how the performance of modern processing systems can be improved while reducing hardware complexity and energy consumption. My research tackles both central processing unit (CPU) performance and memory subsystem performance. More specifically, I describe the design of an innovative decoupled multithreaded architecture that can be used in multi-core processor implementations. I also address how memory management functions can be off-loaded from processing pipelines to further improve system performance and eliminate the cache pollution caused by runtime management functions.
124

A Quantitative Analysis of Memory Controller Page Policies

Blackmore, Matthew 28 February 2013 (has links)
Two common goals in computing system design are increasing performance and decreasing power consumption. DRAM-based memory subsystems are a major component of both system performance and power consumption. Memory controllers employ strategies to efficiently schedule DRAM operations to reduce latency and to utilize DRAM low-power modes when possible. One of the most important of these is the page policy, which determines when to close pages in DRAM. An effective memory controller page policy is important for minimizing power consumption and increasing system performance. This thesis explores the impact memory controller page policy has on performance, as measured by the number of page-hits minus page-misses and by the estimated average memory access latency. I captured real-time DDR3 command and address memory traces for the SPEC CPU2006 benchmarks under three memory controller page policies: closed-page, fixed open-page, and Intel's adaptive open-page [1]. Traces were captured using a programmable memory traffic analyzer (PMTA), a device interposed between the DIMM slot and the DDR3 DIMM on the motherboard. The memory traces for each benchmark were analyzed to determine the absolute number of page-hits and page-misses that occurred. In software post-processing I simulated a theoretically perfect "oracle" page policy for each captured trace to compare the efficiency of existing policies. Under the oracle policy, the SPEC CPU2006 benchmarks exhibited an average increase in the number of page-hits minus page-misses of 280.3% and an average decrease in average memory latency of 11.1%. Two new adaptive open-page policies are proposed and simulated using the captured memory traces. These proposed policies result in average increases of 74.8% and 62.4% in the number of page-hits minus page-misses over Intel's adaptive open-page policy, and average decreases in average memory latency of 3.8% and 3.4%.
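A minimal sketch of the page-hit/page-miss bookkeeping such an analysis rests on, assuming a toy trace of (bank, row) accesses; it ignores DRAM timing, refresh, and the page-empty/page-conflict distinction that a real controller model would track.

```python
def page_hits_and_misses(trace, policy="open"):
    """Count page-hits and page-misses for a simplified DRAM trace.

    trace: list of (bank, row) pairs in arrival order.
    policy: "open" keeps the last row of each bank open until a
    conflicting access arrives; "closed" precharges after every access.
    A hit is an access to an already-open row; everything else is
    counted as a miss.
    """
    open_row = {}            # bank -> currently open row (None if closed)
    hits = misses = 0
    for bank, row in trace:
        if policy == "open" and open_row.get(bank) == row:
            hits += 1
        else:
            misses += 1
        open_row[bank] = row if policy == "open" else None
    return hits, misses

# Toy trace with some row locality in banks 0 and 1.
trace = [(0, 5), (0, 5), (0, 7), (1, 2), (1, 2), (0, 7)]
print(page_hits_and_misses(trace, "open"))     # (3, 3)
print(page_hits_and_misses(trace, "closed"))   # (0, 6)
```

The thesis's figure of merit, page-hits minus page-misses, would be 0 for the open policy and -6 for the closed policy on this toy trace.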
125

A Functional Approach to Memory-Safe Operating Systems

Leslie, Rebekah 01 January 2011 (has links)
Purely functional languages, with static type systems and dynamic memory management using garbage collection, are a known tool for helping programmers reduce the number of memory errors in programs. By using such languages, we can establish correctness properties relating to memory-safety through our choice of implementation language alone. Unfortunately, the language characteristics that make purely functional languages safe also make them more difficult to apply in a low-level domain like operating systems construction. The low-level features that support the kinds of hardware manipulations required by operating systems are not typically available in memory-safe languages with garbage collection, and those that are provided may have the ability to violate memory- and type-safety, destroying the guarantees that motivate using such languages in the first place. This work demonstrates that it is possible to bridge the gap between the requirements of operating system implementations and the features of purely functional languages without sacrificing type- and memory-safety. In particular, we show that this can be achieved by isolating the potentially unsafe memory operations required by operating systems in an abstraction layer that is well integrated with a purely functional language. The salient features of this abstraction layer are that the operations it exposes are memory-safe and yet sufficiently expressive to support the implementation of realistic operating systems. The abstraction layer enables systems programmers to perform all of the low-level tasks necessary in an OS implementation, such as manipulating an MMU and executing user-level programs, without compromising the static memory-safety guarantees of programming in a purely functional language. A specific contribution of this work is an analysis of memory-safety for the abstraction layer, formalizing a meaning for memory-safety in the presence of virtual memory through a novel application of noninterference security policies. In addition, we evaluate the expressiveness of the abstraction layer by implementing the L4 microkernel API, which has a flexible set of virtual memory management operations.
126

Hardware related optimizations in a Java virtual machine

Gu, Dayong. January 2007 (has links)
No description available.
127

A design methodology for robust, energy-efficient, application-aware memory systems

Chatterjee, Subho 28 August 2012 (has links)
Memory design is a crucial component of VLSI system design from the area, power, and performance perspectives. To meet increasingly challenging system specifications, architecture-, circuit-, and device-level innovations are required for existing memory technologies, and emerging memory solutions are widely explored to meet strict budgets. This thesis presents design methodologies for custom memory design with the objective of power-performance benefits for specific applications. Taking STT-RAM (spin-transfer torque random access memory) as an example of an emerging memory candidate, the design space is explored to find the optimal-energy design solution. A thorough thermal reliability study is performed to estimate detection-reliability challenges, and circuit solutions are proposed to ensure reliable operation. Adopting the application-specific optimal-energy solution is shown to yield considerable energy benefits in a read-heavy application called memory-based computing (MBC). Circuit-level customizations are then studied for volatile SRAM (static random access memory), providing an improved energy-delay product (EDP) for the same MBC application. Memory design must also anticipate challenges arising not only from the nature of the application but also from packaging. Taking 3D die-folding as an example, the SRAM performance shift under die-folding is illustrated. Overall, the thesis demonstrates how knowledge of the system and packaging can help in achieving power-efficient, high-performance memory design.
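As a toy illustration of this kind of design-space exploration (with entirely invented numbers, not data from the thesis), the sketch below picks the design point with the best energy-delay product among those meeting a reliability constraint.

```python
# Hypothetical STT-RAM read-design points:
# (label, read_energy_pJ, latency_ns, read_error_rate)
points = [
    ("low-current",  0.8, 4.0, 1e-7),
    ("nominal",      1.2, 3.0, 1e-9),
    ("high-current", 2.0, 2.5, 1e-12),
]

def best_edp(points, max_error_rate):
    """Return the design point with the lowest energy-delay product
    among those meeting the reliability constraint, or None."""
    feasible = [p for p in points if p[3] <= max_error_rate]
    return min(feasible, key=lambda p: p[1] * p[2]) if feasible else None

print(best_edp(points, max_error_rate=1e-8))
# ('nominal', 1.2, 3.0, 1e-09) -- lowest EDP among reliable points
```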
128

Memory Management and Garbage Collection Algorithms for Java-Based Prolog

Zhou, Qinan 08 1900 (has links)
Implementing a Prolog runtime system in a language like Java, which provides its own automatic memory management and safety features such as built-in index checking and array initialization, requires a consistent approach to memory management based on a simple ultimate goal: minimizing the total memory management time and the extra space involved. The total memory management time for Jinni is made up of garbage collection time both for Java and for Jinni itself, and extra space is usually requested at Jinni's garbage collection. This goal motivates us to find a simple and practical garbage collection algorithm and implementation for our Prolog engine. In this thesis we survey various algorithms already proposed and offer our own contribution to the study of garbage collection through improvements and optimizations of some classic algorithms. We implemented these algorithms based on the dynamic-array algorithm for an all-dynamic Prolog engine (JINNI 2000). Comparing our implementations with the originally proposed algorithms allows us to draw informative conclusions about their theoretical complexity and their empirical effectiveness.
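For reference, here is a minimal sketch of the classic mark-and-sweep baseline that such surveys start from (not Jinni's dynamic-array collector); the heap layout is an invented toy example.

```python
def mark_sweep(heap, roots):
    """Minimal mark-and-sweep collection over an explicit object graph.

    heap: {address: list of addresses referenced by that object}
    roots: addresses directly reachable from the mutator (stack, registers).
    Mutates heap in place, deleting unreachable cells, and returns the
    set of surviving addresses.
    """
    marked = set()
    stack = list(roots)
    while stack:                     # mark phase: trace from the roots
        addr = stack.pop()
        if addr in marked or addr not in heap:
            continue
        marked.add(addr)
        stack.extend(heap[addr])
    for addr in list(heap):          # sweep phase: reclaim unmarked cells
        if addr not in marked:
            del heap[addr]
    return marked

heap = {0: [1], 1: [2], 2: [], 3: [4], 4: [3]}   # 3 and 4 form a dead cycle
print(mark_sweep(heap, roots=[0]))               # {0, 1, 2}
```

Unlike reference counting, the trace from the roots reclaims the dead cycle between cells 3 and 4, which is one reason tracing collectors suit Prolog's heavily shared term structures.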
129

Cache design and timing analysis for preemptive multi-tasking real-time uniprocessor systems

Tan, Yudong 18 April 2005 (has links)
In this thesis, we propose an approach to estimate the Worst Case Response Time (WCRT) of each task in a preemptive multi-tasking, single-processor real-time system utilizing an L1 cache. The approach combines inter-task cache eviction analysis and intra-task cache access analysis to estimate the Cache Related Preemption Delay (CRPD). The CRPD caused by preempting tasks is then incorporated into the WCRT analysis. We also propose a prioritized cache that reduces CRPD by exploiting cache partitioning. Our WCRT analysis approach is then applied to analyze the behavior of the prioritized cache. Four sets of applications, each with up to six concurrent tasks, are used to test our WCRT analysis approach and the prioritized cache. The experimental results show that our WCRT analysis approach can tighten the WCRT estimate by up to 32% (1.4X) over the prior state of the art. By using a prioritized cache, we can reduce the WCRT estimate of tasks by up to 26% compared to a conventional set-associative cache.
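The standard response-time recurrence underlying such an analysis can be sketched as follows, with the CRPD folded in as a fixed per-preemption charge — a simplification relative to the thesis, which bounds that term more tightly via its combined inter-/intra-task analysis; the task set is invented.

```python
from math import ceil

def wcrt(task, higher_prio, crpd):
    """Fixed-point response-time recurrence with a CRPD term:

        R = C + sum over higher-priority tasks j of
                ceil(R / T_j) * (C_j + crpd_j)

    task: (C, T) of the task under analysis; higher_prio: list of (C, T)
    for higher-priority tasks; crpd: per-preemption delay charged for
    each higher-priority task. Returns the WCRT, or None if the
    iteration exceeds the task's period (unschedulable).
    """
    c, t = task
    r = c
    while True:
        r_next = c + sum(ceil(r / tj) * (cj + gj)
                         for (cj, tj), gj in zip(higher_prio, crpd))
        if r_next == r:
            return r                 # fixed point reached
        if r_next > t:
            return None              # deadline (assumed = period) missed
        r = r_next

# Toy task set: two higher-priority tasks, 1 time unit of CRPD each.
print(wcrt(task=(10, 50), higher_prio=[(3, 12), (5, 25)], crpd=[1, 1]))
# 24 -- converges after a few iterations, within the period of 50
```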
130

IMPRESS: improving multicore performance and reliability via efficient software support for monitoring

Nagarajan, Vijayanand. January 2009 (has links)
Thesis (Ph.D.)--University of California, Riverside, 2009. Includes abstract. Title from first page of PDF file (viewed March 12, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 151-158). Also issued in print.
