131 |
Autonomous Control in Advanced Life Support Systems: Air Revitalisation within the Micro-Ecological Life Support System Alternative / Autonom styrning i avancerade livsuppehållande system: Återupplivning av luft inom det Micro-Ecological Life Support System Alternative. Demey, Lukas. January 2023.
In recent years, international space agencies have become increasingly explicit about long-term lunar and Martian missions. With the Terrae Novae space program, the European Space Agency focuses on the development of Human & Robotic Exploration technologies essential for enabling such long-term missions. An integral component of this program is Advanced Life Support Systems. Life support systems provide astronauts with necessities of life such as oxygen, water and food. Conventional life support systems often have a linear supply design, relying on resources shipped from Earth, with limited onboard reuse. For extended space missions, however, this linear supply model becomes impractical due to the dry-mass constraints of space travel. Given this need, the European Space Agency initiated the MELiSSA (Micro-Ecological Life Support System Alternative) project, aimed at the development of a bioregenerative life support system. In previous works, the MELiSSA Loop has been proposed: a system design inspired by terrestrial ecosystems that consists of multiple compartments performing specific biological functions such as nitrification and biosynthesis. Due to the complex interdependence of the individual compartments and general space-system requirements, the control of this cyber-physical system poses a significant challenge. This thesis proposes a previously undescribed architecture for the MELiSSA Loop controller that coordinates the resource distribution between the compartments and establishes atmosphere revitalisation. The architecture meets control objectives specified at a high level while satisfying the physical and operational constraints.
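To make the coordination task concrete, here is a minimal sketch of a supervisory layer allocating a shared resource between compartments as a constrained optimization. The compartment roles, yields, bounds and demand figures are illustrative assumptions, not values from the thesis.

```python
# A minimal sketch of supervisory resource allocation for a closed-loop
# life support system. All names, yields, and bounds are illustrative
# assumptions, not values from the thesis.
from scipy.optimize import linprog

# Decision variables: CO2 mass flow [kg/day] routed to two compartments.
o2_yield = [0.72, 0.10]   # assumed O2 produced per kg of CO2 routed
co2_available = 1.0       # assumed total CO2 produced by the crew [kg/day]
o2_required = 0.84        # assumed crew O2 demand [kg/day]

# Objective: maximize O2 production (linprog minimizes, hence the sign flip).
c = [-y for y in o2_yield]
# Physical constraint: cannot route more CO2 than is available.
A_ub, b_ub = [[1.0, 1.0]], [co2_available]
# Operational constraints: per-compartment throughput limits.
bounds = [(0.0, 0.9), (0.0, 0.3)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
produced = -res.fun
print(f"flows [kg/day]: {res.x.round(3)}, O2 produced: {produced:.2f} kg/day "
      f"(demand: {o2_required} kg/day)")
```

In a real loop controller this allocation would be re-solved as compartment states and crew demands evolve, with the high-level objectives and constraints supplied by the supervisory layer.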
|
132 |
Automatic Design Space Exploration of Fault-tolerant Embedded Systems Architectures. Tierno, Antonio. 26 January 2023.
Embedded systems may have competing design objectives, such as maximizing reliability, increasing functional safety, minimizing product cost, and minimizing energy consumption. Architectures must therefore be configured to meet varied requirements and multiple design objectives. In particular, reliability and safety are receiving increasing attention. Consequently, the configuration of fault-tolerance mechanisms is a critical design decision. This work proposes a method for the automatic selection of appropriate fault-tolerant design patterns that simultaneously optimizes multiple objective functions. First, we present an exact method that leverages the power of Satisfiability Modulo Theories to encode the problem with a symbolic technique. It is based on a novel assessment of reliability which is part of the evaluation of alternative designs. We then empirically evaluate a near-optimal approximate variant that solves the problem even when the instance size makes the exact method intractable in terms of computing resources. The efficiency and scalability of the method are validated with a series of experiments of different sizes and characteristics, and by comparing it with existing methods on a test problem that is widely used in the reliability optimization literature.
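As a hedged illustration of such an SMT-based encoding (a sketch, not the thesis's actual formulation), the following uses the z3 solver to pick one fault-tolerance pattern per component under a cost budget while maximizing a reliability proxy. Component names, costs and reliabilities are assumptions, and the multiple objectives are handled lexicographically for simplicity.

```python
# A minimal sketch of SMT-based selection of fault-tolerant design patterns.
# Components, patterns, costs, and reliabilities are illustrative assumptions;
# log-reliabilities keep the objective additive for the solver.
import math
from z3 import Optimize, Bool, If, Sum, sat, is_true

components = ["sensor", "controller", "actuator"]
patterns = {  # pattern -> (cost, reliability)
    "none": (0, 0.95),
    "duplication": (2, 0.99),
    "tmr": (3, 0.999),
}

opt = Optimize()
choice = {(c, p): Bool(f"{c}_{p}") for c in components for p in patterns}

# Exactly one fault-tolerance pattern per component.
for c in components:
    opt.add(Sum([If(choice[c, p], 1, 0) for p in patterns]) == 1)

# Assumed cost budget: total pattern cost must stay within 6 units.
cost = Sum([If(choice[c, p], patterns[p][0], 0)
            for c in components for p in patterns])
opt.add(cost <= 6)

# Maximize total log-reliability (scaled to integers for the solver),
# then minimize cost as a secondary, lexicographic objective.
rel = Sum([If(choice[c, p], int(1e6 * math.log(patterns[p][1])), 0)
           for c in components for p in patterns])
opt.maximize(rel)
opt.minimize(cost)

if opt.check() == sat:
    m = opt.model()
    for (c, p), v in choice.items():
        if is_true(m.evaluate(v)):
            print(f"{c}: {p}")
```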
|
133 |
Assessment of a Low Cost IR Laser Local Tracking Solution for Robotic Operations. Du, Minzhen. 14 May 2021.
This thesis aimed to assess the feasibility of using an off-the-shelf virtual reality tracking system as a low-cost, precise pose estimation solution for robotic operations in both indoor and outdoor environments. Such a tracking solution has the potential to assist critical operations related to planetary exploration missions, parcel handling and delivery, and wildfire detection and early-warning systems. The boom in virtual reality experiences has accelerated the development of various low-cost, precise indoor tracking technologies. For this thesis we chose to adapt the SteamVR Lighthouse system developed by Valve, which uses photodiodes on the trackers to detect the rotating IR laser sheets emitted from the anchored base stations, also known as lighthouses. Previous research had been completed using the first generation of lighthouses, which has a few limitations in the communication from lighthouses to the tracker, and a NASA study cited poor tracking performance under sunlight. We chose to use the second-generation lighthouses, which improved the communication from lighthouses to the tracker, and performed various experiments to assess their performance outdoors, including under sunlight. The studies in this thesis had two stages. The first stage focused on a controlled indoor environment, having an Unmanned Aerial System (UAS) perform repeatable flight patterns while being tracked simultaneously by the Lighthouse system and a reference indoor tracking system; this showed the lighthouse's tracking precision to be comparable to an industry-standard indoor tracking solution. The second stage focused on outdoor experiments with the tracking system, comparing UAS flights between day and night conditions as well as positioning-accuracy assessments with a CNC machine under indoor and outdoor conditions. The results showed matching performance between day and night, still comparable to an industry-standard indoor tracking solution down to centimeter precision, and matching the simulated CNC trajectory down to millimeter precision. There remains room for improvement in the experimental method and equipment used, as well as in the tracking system itself, before adoption in real-world applications.
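To illustrate the geometry underlying lighthouse tracking, here is a simplified sketch. The sweep model (one horizontal and one vertical laser sheet per station) is an assumption for illustration rather than Valve's exact mechanism: each photodiode's two sweep angles define a ray from the base station, and rays from two stations triangulate a position.

```python
# A minimal sketch of converting lighthouse sweep angles into a direction
# ray and triangulating a position from two base stations. The sweep model
# and all poses/angles below are simplifying assumptions.
import numpy as np

def sweep_angles_to_ray(azimuth, elevation):
    """Unit direction of the photodiode as seen from the base station."""
    d = np.array([np.tan(azimuth), np.tan(elevation), 1.0])
    return d / np.linalg.norm(d)

def triangulate(p1, d1, p2, d2):
    """Least-squares closest point between two rays p_i + t_i * d_i."""
    # Solve for t1, t2 minimizing |(p1 + t1*d1) - (p2 + t2*d2)|^2.
    A = np.stack([d1, -d2], axis=1)            # 3x2 system matrix
    t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
    q1, q2 = p1 + t[0] * d1, p2 + t[1] * d2
    return (q1 + q2) / 2                        # midpoint of closest approach

# Hypothetical base-station positions and measured sweep angles (radians).
p1, p2 = np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0])
d1 = sweep_angles_to_ray(0.10, 0.05)
d2 = sweep_angles_to_ray(-0.08, 0.05)
print(triangulate(p1, d1, p2, d2))
```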
|
134 |
Evaluation of techniques for handling luminescence in Raman spectroscopy for space application with regard to the search for extraterrestrial life / A comparison of five different methods for identifying space-relevant luminescent biological and mineralogical samples. Hanke, Franziska. 18 February 2020.
Die Ergebnisse dieser Arbeit zeigen, dass es keine universelle Lösung gibt um das Problem der Lumineszenz in der RS zu überwinden. Allerdings weist die Verwendung unterschiedlicher Laserwellenlängen großes Potenzial für die erfolgreiche Handhabung der Lumineszenz in der RS auf. In Kombination mit SERDS und/oder Photobleichen steigt die Wahrscheinlichkeit verwertbare Spektren für die Probencharakterisierung zu erhalten. / Raman spectroscopy (RS) is an analytical technique conveying material-specific information about a material’s molecular vibrations and crystal structure in succession of an optical excitation of the material. Due to the fact that mineralogical as well as biological material can be examined, RS is of special interest for space research. For instance, two Mars rovers (ExoMars and Mars 2020) will each carry along a Raman spectrometer in the year 2020, with the aim of detecting inter alia traces of extant or extinct extraterrestrial life.
One of the biggest challenges in conventional RS is the characterization of strongly luminescent biological or mineralogical material; therefore, the dissertation at hand deals with the problem of luminescence in RS. For this purpose, the potential of five different Raman spectroscopic techniques for the handling of luminescence will be evaluated. These techniques include
(i) the selection of different excitation wavelengths (325 nm, 532 nm, 785 nm and 1064 nm), which is based on the concept of the spectral separation of the luminescence signals as well as Raman signals.
(ii) Photobleaching provides an alternative whereby the luminescence is suppressed by long exposure.
(iii) A further method for the spectral separation of Raman photons as well as luminescence photons is provided by the anti-Stokes RS.
(iv) The SERDS technique uses two slightly shifted excitation wavelengths.
(v) Finally the examination of inelastic scattering and emission takes place in the time domain.
The results of this dissertation show that there is no universal solution to overcome the problem of luminescence in RS. However, the usage of different excitation wavelengths offers great potential for handling luminescence in RS successfully. In combination with SERDS and/or photobleaching the probability to obtain exploitable spectra for sample characterization increases
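To illustrate the SERDS principle named in (iv), here is a minimal sketch on synthetic data (peak positions, widths and shift are illustrative assumptions, not the dissertation's processing chain): a small excitation shift leaves the luminescence background unchanged while the Raman bands move with it, so the difference of the two spectra cancels the background, and integrating the difference recovers the Raman spectrum.

```python
# A minimal sketch of shifted-excitation Raman difference spectroscopy
# (SERDS) on synthetic data; all peak parameters are illustrative.
import numpy as np

wn = np.linspace(0, 2000, 2000)                 # Raman shift axis [1/cm]
gauss = lambda x, mu, s: np.exp(-0.5 * ((x - mu) / s) ** 2)

luminescence = 10.0 * gauss(wn, 1000, 800)      # broad, slowly varying background
raman = lambda shift: gauss(wn, 520 + shift, 5) + 0.6 * gauss(wn, 1340 + shift, 8)

delta = 10.0                                    # excitation shift, in 1/cm
s1 = luminescence + raman(0.0)                  # spectrum at wavelength 1
s2 = luminescence + raman(delta)                # spectrum at shifted wavelength

diff = s1 - s2                                  # the background cancels here
# The difference approximates delta * d(raman)/d(wn); integrating it
# recovers a Raman-like spectrum up to a scale factor and baseline.
recovered = np.cumsum(diff) * (wn[1] - wn[0]) / delta
print(f"strongest recovered band near {wn[np.argmax(recovered)]:.0f} 1/cm")
```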
|
135 |
Em busca da cultura espacial [In Search of Space Culture]. Borges, Fabiane Morais. 14 June 2013.
This thesis arose from the complex Internet networks intent on building new paradigms for space, based on the practices of free software, free hardware and open-source systems as they have appeared, roughly, since the turn of the century. In order to examine these new paradigms, I consider prior processes connected to space culture. The text revisits the history of the Space Race: the first rockets, the first satellites, and some tenets of the international politics that guided the Cold War in the years after the Second World War. I bring up elements of the space programs of both the US and the USSR, as well as the main technicians and scientists behind the engineering of the rockets. The research delves into Nazi Germany, which first invested heavily in the production of rockets, and moves on to communist Russia as well as to liberal post-war America.
The thesis takes up ideas concerning space utopias and science fiction in literature and cinema, and engages with the difference between space exploration undertaken by humans and by robots. It examines the first rocket flights and the first artificial satellites placed in outer space, paying attention to the particulars of each of those first endeavors, to their purpose, and to how well they accomplished their missions. The thesis then questions the importance of the Space Race to the human imagination and analyses the realm of space dreams from the late 19th century up to now.
The last part of the thesis is concerned with the groups that are pursuing space travel independently, moved either by ideological or by commercial reasons. The investigation uncovers the ideas of each of those groups concerning space exploration. It then considers the relation between the makers behind such exploration and a possible industrial revolution. Finally, the thesis raises some criticisms of the creative processes of individuals, groups, networks and social movements concerned with outer space.
|
136 |
Design, Analysis, and Applications of Approximate Arithmetic Modules. Ullah, Salim. 06 April 2022.
From the initial computing machines, Colossus of 1943 and ENIAC of 1945, to modern high-performance data centers and Internet of Things (IoT) devices, four design goals, namely high performance, energy efficiency, resource utilization, and ease of programmability, have remained a beacon of development for the computing industry. During this period, the computing industry has exploited the advantages of technology scaling and microarchitectural enhancements to achieve these goals. However, with the end of Dennard scaling, these techniques offer diminishing energy and performance advantages. Therefore, it is necessary to explore alternative techniques for satisfying the computational and energy requirements of modern applications. Towards this end, one promising technique is relaxing the strict notion of correctness in various layers of the computation stack. Most modern applications across the computing spectrum, from data centers to IoT devices, interact with and analyze real-world data and make decisions accordingly. These applications are broadly classified as Recognition, Mining, and Synthesis (RMS). Instead of producing a single golden answer, these applications can produce several feasible answers. They possess an inherent error resilience to the inexactness of the processed data and the corresponding operations. Utilizing this inherent error resilience, the paradigm of Approximate Computing relaxes the strict notion of computational correctness to realize high-performance and energy-efficient systems with outputs of acceptable quality.
Prior work on circuit-level approximations has mainly focused on Application-Specific Integrated Circuits (ASICs). However, ASIC-based solutions suffer from long time-to-market and high-cost development cycles. These limitations can be overcome by utilizing the reconfigurable nature of Field Programmable Gate Arrays (FPGAs). However, due to architectural differences between ASICs and FPGAs, applying ASIC-based approximation techniques to FPGA-based systems does not yield proportional performance and energy gains. Therefore, to exploit the principles of approximate computing in FPGA-based hardware accelerators for error-resilient applications, FPGA-optimized approximation techniques are required. Further, most state-of-the-art approximate arithmetic operators lack a generic approximation methodology for implementing new approximate designs as an application's accuracy and performance requirements change. These works also lack a methodology by which a machine learning model can correlate an approximate operator with its impact on an application's output quality. This thesis addresses these research challenges by designing and exploring FPGA-optimized, logic-based approximate arithmetic operators. As multiplication is one of the most computationally complex and frequently used arithmetic operations in modern applications, such as Artificial Neural Networks (ANNs), it is the focus of most of the approximation techniques proposed in this thesis.
The primary focus of the work is to provide a framework for generating FPGA-optimized approximate arithmetic operators and efficient techniques to explore approximate operators for implementing hardware accelerators for error-resilient applications.
Towards this end, we first present various designs of resource-optimized, high-performance, and energy-efficient accurate multipliers. Although modern FPGAs host high-performance DSP blocks to perform multiplication and other arithmetic operations, our analysis and results show that the orthogonal approach of having resource-efficient and high-performance multipliers is necessary for implementing high-performance accelerators. Due to the differences in the types of data processed by various applications, the thesis presents individual designs for unsigned, signed, and constant multipliers. Compared to the multiplier IPs provided by the FPGA synthesis tool, our proposed designs provide significant performance gains. We then explore the designed accurate multipliers to provide a library of approximate unsigned and signed multipliers. The proposed approximations target reductions in the multipliers' total utilized resources, critical-path delay, and energy consumption. We explore various statistical error metrics to characterize the approximation-induced accuracy degradation of the approximate multipliers, and we use the designed multipliers in various error-resilient applications to evaluate their impact on the applications' output quality and performance.
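To make this characterization concrete, below is a hedged sketch of a generic truncation-based approximate multiplier together with two of the usual statistical error metrics, mean error distance and mean relative error distance. The design is an illustrative assumption, not one of the operators proposed in the thesis.

```python
# A minimal sketch of characterizing an approximate multiplier with
# statistical error metrics. The truncation-based design is a generic
# illustration, not one of the thesis's proposed operators.
from itertools import product

def approx_mul(a, b, trunc_bits=2):
    """Multiply while dropping the low-by-low partial products (assumed design)."""
    mask = ~((1 << trunc_bits) - 1)
    return (a & mask) * b + (a & ~mask) * (b & mask)

def error_metrics(bits=8, trunc_bits=2):
    n = 1 << bits
    ed_sum, red_sum, nonzero = 0, 0.0, 0
    for a, b in product(range(n), repeat=2):   # exhaustive for small widths
        exact, approx = a * b, approx_mul(a, b, trunc_bits)
        ed_sum += abs(exact - approx)
        if exact:
            red_sum += abs(exact - approx) / exact
            nonzero += 1
    return ed_sum / n**2, red_sum / nonzero    # mean ED, mean relative ED

med, mred = error_metrics()
print(f"mean error distance: {med:.2f}, mean relative error distance: {mred:.4%}")
```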
Based on our analysis of the designed approximate multipliers, we identify the need for a framework to design application-specific approximate arithmetic operators. An application-specific approximate arithmetic operator intends to implement only the logic that can satisfy the application's overall output accuracy and performance constraints.
Towards this end, we present a generic design methodology for implementing FPGA-based application-specific approximate arithmetic operators from their accurate implementations, according to an application's accuracy and performance requirements. In this regard, we utilize machine learning models to identify feasible approximate arithmetic configurations, and we combine machine learning models with optimization techniques to efficiently explore the large design space of individual operators and of their utilization in various applications. In this thesis, we have used the proposed methodology to design approximate adders and multipliers.
This thesis also explores other layers of the computation stack (cross-layer) for possible approximations to satisfy an application's accuracy and performance requirements. Towards this end, we first present a low bit-width and highly accurate quantization scheme for pre-trained Deep Neural Networks (DNNs). The proposed quantization scheme does not require re-training (fine-tuning the parameters) after quantization. We also present a resource-efficient FPGA-based multiplier that utilizes our proposed quantization scheme. Finally, we present a framework to allow the intelligent exploration and highly accurate identification of the feasible design points in the large design space enabled by cross-layer approximations. The proposed framework utilizes a novel Polynomial Regression (PR)-based method to model approximate arithmetic operators. The PR-based representation enables machine learning models to better correlate an approximate operator's coefficients with their impact on an application's output quality. (A small illustrative sketch of the quantization idea follows the table of contents below.)

Table of contents:
1. Introduction
1.1 Inherent Error Resilience of Applications
1.2 Approximate Computing Paradigm
1.2.1 Software Layer Approximation
1.2.2 Architecture Layer Approximation
1.2.3 Circuit Layer Approximation
1.3 Problem Statement
1.4 Focus of the Thesis
1.5 Key Contributions and Thesis Overview
2. Preliminaries
2.1 Xilinx FPGA Slice Structure
2.2 Multiplication Algorithms
2.2.1 Baugh-Wooley’s Multiplication Algorithm
2.2.2 Booth’s Multiplication Algorithm
2.2.3 Sign Extension for Booth’s Multiplier
2.3 Statistical Error Metrics
2.4 Design Space Exploration and Optimization Techniques
2.4.1 Genetic Algorithm
2.4.2 Bayesian Optimization
2.5 Artificial Neural Networks
3. Accurate Multipliers
3.1 Introduction
3.2 Related Work
3.3 Unsigned Multiplier Architecture
3.4 Motivation for Signed Multipliers
3.5 Baugh-Wooley’s Multiplier
3.6 Booth’s Algorithm-based Signed Multipliers
3.6.1 Booth-Mult Design
3.6.2 Booth-Opt Design
3.6.3 Booth-Par Design
3.7 Constant Multipliers
3.8 Results and Discussion
3.8.1 Experimental Setup and Tool Flow
3.8.2 Performance comparison of the proposed accurate unsigned multiplier
3.8.3 Performance comparison of the proposed accurate signed multiplier with the state-of-the-art accurate multipliers
3.8.4 Performance comparison of the proposed constant multiplier with the state-of-the-art accurate multipliers
3.9 Conclusion
4. Approximate Multipliers
4.1 Introduction
4.2 Related Work
4.3 Unsigned Approximate Multipliers
4.3.1 Approximate 4 × 4 Multiplier (Approx-1)
4.3.2 Approximate 4 × 4 Multiplier (Approx-2)
4.3.3 Approximate 4 × 4 Multiplier (Approx-3)
4.4 Designing Higher Order Approximate Unsigned Multipliers
4.4.1 Accurate Adders for Implementing 8 × 8 Approximate Multipliers from 4 × 4 Approximate Multipliers
4.4.2 Approximate Adders for Implementing Higher-order Approximate Multipliers
4.5 Approximate Signed Multipliers (Booth-Approx)
4.6 Results and Discussion
4.6.1 Experimental Setup and Tool Flow
4.6.2 Evaluation of the Proposed Approximate Unsigned Multipliers
4.6.3 Evaluation of the Proposed Approximate Signed Multiplier
4.7 Conclusion
5. Designing Application-specific Approximate Operators
5.1 Introduction
5.2 Related Work
5.3 Modeling Approximate Arithmetic Operators
5.3.1 Accurate Multiplier Design
5.3.2 Approximation Methodology
5.3.3 Approximate Adders
5.4 DSE for FPGA-based Approximate Operators Synthesis
5.4.1 DSE using Bayesian Optimization
5.4.2 MOEA-based Optimization
5.4.3 Machine Learning Models for DSE
5.5 Results and Discussion
5.5.1 Experimental Setup and Tool Flow
5.5.2 Accuracy-Performance Analysis of Approximate Adders
5.5.3 Accuracy-Performance Analysis of Approximate Multipliers
5.5.4 AppAxO MBO
5.5.5 ML Modeling
5.5.6 DSE using ML Models
5.5.7 Proposed Approximate Operators
5.6 Conclusion
6. Quantization of Pre-trained Deep Neural Networks
6.1 Introduction
6.2 Related Work
6.2.1 Commonly Used Quantization Techniques
6.3 Proposed Quantization Techniques
6.3.1 L2L: Log_2_Lead Quantization
6.3.2 ALigN: Adaptive Log_2_Lead Quantization
6.3.3 Quantitative Analysis of the Proposed Quantization Schemes
6.3.4 Proposed Quantization Technique-based Multiplier
6.4 Results and Discussion
6.4.1 Experimental Setup and Tool Flow
6.4.2 Image Classification
6.4.3 Semantic Segmentation
6.4.4 Hardware Implementation Results
6.5 Conclusion
7. A Framework for Cross-layer Approximations
7.1 Introduction
7.2 Related Work
7.3 Error-analysis of approximate arithmetic units
7.3.1 Application Independent Error-analysis of Approximate Multipliers
7.3.2 Application Specific Error Analysis
7.4 Accelerator Performance Estimation
7.5 DSE Methodology
7.6 Results and Discussion
7.6.1 Experimental Setup and Tool Flow
7.6.2 Behavioral Analysis
7.6.3 Accelerator Performance Estimation
7.6.4 DSE Performance
7.7 Conclusion
8. Conclusions and Future Work
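As an illustration of the Log_2_Lead idea named in the abstract, here is a minimal sketch under the assumed reading that the scheme stores a value's leading-one position plus the next few bits; the bit widths and truncation behavior are illustrative, not the thesis's exact definition.

```python
# A minimal sketch of a leading-one ("Log_2_Lead"-style) quantization of a
# positive fixed-point value: keep the leading one and the k bits after it.
# Bit widths and truncation behavior are assumed for illustration.
def l2l_quantize(x: int, k: int = 3) -> int:
    """Keep the leading one and the k bits after it; zero the rest."""
    if x == 0:
        return 0
    lead = x.bit_length() - 1          # position of the leading one
    drop = max(lead - k, 0)            # low bits outside the kept window
    return (x >> drop) << drop         # truncate the low bits

for v in (5, 100, 1000, 65535):
    q = l2l_quantize(v)
    print(f"{v:6d} -> {q:6d} (rel. error {abs(v - q) / v:.3%})")
```

Because the kept window tracks the leading one, the relative error stays roughly constant across magnitudes, which suits parameters with a wide dynamic range such as DNN weights.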
|
137 |
Processor design-space exploration through fast simulation / Exploration de l'espace de conception de processeurs via simulation accélérée. Khan, Taj Muhammad. 12 May 2011.
Simulation is a vital tool used by architects to develop new architectures. However, because of the complexity of modern architectures and the length of recent benchmarks, detailed simulation of programs can take extremely long. This impedes the exploration of the processor design space, which architects must perform to find the optimal configuration of processor parameters. Sampling is one technique that reduces simulation time without adversely affecting the accuracy of the results. It rests on the observation that a program's execution is composed of repeating parts of code, the phases: instead of simulating the entire program, one can simulate each phase just once and compute the whole program's performance from the phases' performance. Two important questions arise: which parts of the program should be simulated, and how should the state of the system be restored before each sample? For sample selection there are two approaches: representative sampling, which analyzes the program's execution in terms of phases and simulates each phase once, and statistical sampling, which chooses the samples randomly. For state restoration, techniques have recently been developed that warm up the system state adaptively, according to the needs of the simulated code. Yet existing sample-selection techniques either ignore the warm-up issue entirely or propose alternatives that require significant modification of the simulator, and the adaptive warm-up techniques are incompatible with most sampling techniques. In this thesis we tackle the problem of reconciling state-of-the-art warm-up techniques with the latest sampling mechanisms, with the triple objective of keeping user effort minimal, achieving good accuracy, and being agnostic to software and hardware changes. We adapt both representative and statistical sampling to use warm-up mechanisms that accommodate the underlying architecture's warm-up requirements on the fly, in a single mechanism. Our experimental results show accuracy and speed comparable to the latest research while relieving the user of warm-up concerns and hiding the details of the simulation. We also found that statistical sampling gives better results than representative sampling.
Also, we leverage statistical calculations to provide an estimate of the robustness of the final results.
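As a sketch of how sampled simulation turns per-sample measurements into a whole-program estimate, and how statistical calculations quantify its robustness (all numbers are illustrative, not the thesis's tooling):

```python
# A minimal sketch of estimating whole-program CPI from simulation samples,
# with a confidence interval for statistical sampling. All numbers are
# illustrative, not results from the thesis.
import math
import random

random.seed(0)
# Statistical sampling: CPI measured on n randomly placed samples.
samples = [random.gauss(1.4, 0.2) for _ in range(100)]
n = len(samples)
mean = sum(samples) / n
stderr = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1)) / math.sqrt(n)
print(f"statistical: CPI = {mean:.3f} +/- {1.96 * stderr:.3f} (95% CI)")

# Representative sampling: one sample per phase, weighted by the fraction
# of the execution each phase covers (weights assumed here).
phase_cpi = {"phase_A": 1.1, "phase_B": 1.8, "phase_C": 1.3}
weights = {"phase_A": 0.5, "phase_B": 0.2, "phase_C": 0.3}
cpi = sum(phase_cpi[p] * weights[p] for p in phase_cpi)
print(f"representative: CPI = {cpi:.3f}")
```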
|
138 |
Dynamic instruction set extension of microprocessors with embedded FPGAs. Bauer, Heiner. 13 April 2017.
Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors which can perform tasks more quickly and more energy-efficiently. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application-specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application-specific hardware, however with the significant advantage of post-fabrication flexibility. Not only does this offer similar gains in performance, but also the flexibility to configure each device individually.
This thesis explored the benefit of a tightly coupled, fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design-space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) was performed in the context of a commercial 22nm process technology. Other research projects either reused general-purpose architectures or spent little effort to design and characterize custom fabrics, which are critical to system performance and to the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. The results of this exploration revealed tradeoffs and trends similar to those described in previous studies. The number of lookup-table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area-delay product. However, the results suggested a much larger region of efficient architectures than previously reported. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, which was based on an industry-proven instruction set architecture, and to its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing and encryption to the computation of elementary functions.
For these benchmarks, a significant increase in performance, with speedups from 3 to 15 relative to the baseline microprocessor, was achieved with the extended instruction set. Except in one case, the application speedup clearly outweighed the area overhead of the extended system, even though the modeled fabric architecture was primitive and contained no explicit arithmetic enhancements. The insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and the concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimization was identified on multiple levels, and numerous directions for future research were described.
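A hedged sketch of the exploration loop described above: rank candidate logical architectures (LUT size K, cluster size N) by their area-delay product. The toy area and delay models below are placeholders for the detailed circuit implementations and custom area model used in the thesis.

```python
# A minimal sketch of ranking logical FPGA architectures (LUT size K,
# cluster size N) by area-delay product. The area and delay models are
# deliberately crude placeholders, not the thesis's calibrated models.
def area(K, N):
    # LUT area grows roughly as 2^K; add a per-cluster routing overhead term.
    return N * (2 ** K) + 40 * (N ** 0.5) * K

def delay(K, N):
    # Bigger LUTs absorb logic levels but are slower; big clusters slow muxes.
    logic_levels = 12.0 / K
    return logic_levels * (0.4 + 0.05 * K) + 0.02 * N

candidates = [(K, N) for K in range(3, 8) for N in (4, 6, 8, 10)]
ranked = sorted(candidates, key=lambda a: area(*a) * delay(*a))
for K, N in ranked[:5]:
    print(f"K={K} N={N}  area-delay product = {area(K, N) * delay(K, N):8.1f}")
```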
|
139 |
Methods for parameterizing and exploring Pareto frontiers using barycentric coordinates. Daskilewicz, Matthew John. 08 April 2013.
The research objective of this dissertation is to create and demonstrate methods for parameterizing the Pareto frontiers of continuous multi-attribute design problems using barycentric coordinates, and in doing so, to enable intuitive exploration of optimal trade spaces. This work is enabled by two observations about Pareto frontiers that have not been previously addressed in the engineering design literature. First, the observation that the mapping between non-dominated designs and Pareto efficient response vectors is a bijection almost everywhere suggests that points on the Pareto frontier can be inverted to find their corresponding design variable vectors. Second, the observation that certain common classes of Pareto frontiers are topologically equivalent to simplices suggests that a barycentric coordinate system will be more useful for parameterizing the frontier than the Cartesian coordinate systems typically used to parameterize the design and objective spaces.
By defining such a coordinate system, the design problem may be reformulated from y = f(x) to (y,x) = g(p) where x is a vector of design variables, y is a vector of attributes and p is a vector of barycentric coordinates. Exploration of the design problem using p as the independent variables has the following desirable properties: 1) Every vector p corresponds to a particular Pareto efficient design, and every Pareto efficient design corresponds to a particular vector p. 2) The number of p-coordinates is equal to the number of attributes regardless of the number of design variables. 3) Each attribute y_i has a corresponding coordinate p_i such that increasing the value of p_i corresponds to a motion along the Pareto frontier that improves y_i monotonically.
The primary contribution of this work is the development of three methods for forming a barycentric coordinate system on the Pareto frontier, two of which are entirely original. The first method, named "non-domination level coordinates," constructs a coordinate system based on the (k-1)-attribute non-domination levels of a discretely sampled Pareto frontier. The second method is based on a modification to an existing "normal boundary intersection" multi-objective optimizer that adaptively redistributes its search basepoints in order to sample from the entire frontier uniformly. The weights associated with each basepoint can then serve as a coordinate system on the frontier. The third method, named "Pareto simplex self-organizing maps," uses a modified self-organizing-map training algorithm with a barycentric-grid node topology to iteratively conform a coordinate grid to the sampled Pareto frontier.
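A minimal sketch of the idea behind the second method: treat scalarization weights as barycentric coordinates p with p1 + p2 = 1, each p mapping to one Pareto-efficient design. The convex toy problem and the weighted-sum scalarization are simplifying assumptions; the thesis's methods also cover frontiers where weighted sums fail.

```python
# A minimal sketch of parameterizing a Pareto frontier by barycentric
# coordinates p = (p1, p2), p1 + p2 = 1, via weighted-sum scalarization.
# Valid for this convex toy problem; a simplification of the thesis's methods.
import numpy as np
from scipy.optimize import minimize_scalar

def attributes(x):
    """Two attributes to minimize; the trade-off is controlled by x."""
    return np.array([x ** 2, (x - 1) ** 2])

def pareto_point(p1):
    """Map barycentric coordinate p1 (weight on attribute 1) to a design."""
    w = np.array([p1, 1.0 - p1])
    res = minimize_scalar(lambda x: w @ attributes(x),
                          bounds=(0, 1), method="bounded")
    return res.x, attributes(res.x)

# Increasing p1 improves attribute y1 monotonically (property 3 above).
for p1 in (0.1, 0.5, 0.9):
    x, y = pareto_point(p1)
    print(f"p = ({p1:.1f}, {1 - p1:.1f}) -> x = {x:.3f}, "
          f"y = ({y[0]:.3f}, {y[1]:.3f})")
```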
|
140 |
Design, Implementation and Evaluation of a Configurable NoC for AcENoCs FPGA Accelerated Emulation Platform. Lotlikar, Swapnil Subhash. August 2010.
The heterogeneous nature of modern applications and their demand for extensive parallel processing have resulted in the widespread use of multicore System-on-Chip (SoC) architectures. The emerging Network-on-Chip (NoC) architecture provides an energy-efficient and scalable communication solution for multicore SoCs, serving as a powerful replacement for traditional bus-based solutions. The key to the successful realization of such architectures is a flexible, fast and robust emulation platform for rapid design-space exploration. In this research, we present the design and evaluation of a highly configurable NoC used in AcENoCs (Accelerated Emulation platform for NoCs), a flexible and cycle-accurate field programmable gate array (FPGA) emulation platform for validating NoC architectures. Along with the implementation details, we also discuss the various design optimizations and tradeoffs, and assess the performance improvements of AcENoCs over existing simulators and emulators. We design a hardware library consisting of routers and links using the Verilog hardware description language (HDL). The router is parameterized and has a configurable number of physical ports, virtual channels (VCs) and pipeline depth. A packet-switched NoC is constructed by connecting the routers in either a 2D-mesh or 2D-torus topology. The NoC is integrated into the AcENoCs platform and prototyped on a Xilinx Virtex-5 FPGA. The NoC was evaluated under various synthetic and realistic workloads generated by AcENoCs' traffic generators, implemented on the Xilinx MicroBlaze embedded processor. To validate the NoC design, performance metrics like average latency and throughput were measured and compared against results obtained using standard network simulators. FPGA implementation of the NoC using Xilinx tools indicated 76% LUT utilization for a 5x5 2D-mesh network. The VC allocator was found to be the single largest consumer of hardware resources within a router. The router design synthesized at frequencies of 135MHz, 124MHz and 109MHz for the 3-port, 4-port and 5-port configurations, respectively. The operational frequency of the router in the AcENoCs environment was limited only by the software execution latency, even though the hardware itself could be clocked at a much higher rate. The AcENoCs emulator showed speedups of 10,000-12,000X over HDL simulators and 5-15X over software simulators, without sacrificing cycle accuracy.
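To illustrate how such configuration parameters feed into the latency metrics an emulation platform measures, here is a back-of-the-envelope sketch (not AcENoCs' measurement code; the parameter values are illustrative): the zero-load latency of a k x k mesh under uniform random traffic follows from the average hop count, the router pipeline depth, the link traversal time and packet serialization.

```python
# A minimal sketch of zero-load latency for a k x k 2D-mesh NoC under
# uniform random traffic. Parameter values are illustrative, not the
# AcENoCs configuration.
def zero_load_latency(k, router_pipeline, link_cycles, flits_per_packet):
    # Mean Manhattan distance between two uniform points on a k x k mesh:
    # E|i - j| = (k^2 - 1) / (3k) per dimension, doubled for two dimensions.
    avg_hops = 2 * (k ** 2 - 1) / (3 * k)
    per_hop = router_pipeline + link_cycles        # cycles per router + link
    serialization = flits_per_packet - 1           # body flits drain behind the head
    return avg_hops * per_hop + serialization

latency = zero_load_latency(k=5, router_pipeline=4, link_cycles=1,
                            flits_per_packet=4)
print(f"zero-load latency: {latency:.1f} cycles")
```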
|