111

Quality-of-Service Aware Design and Management of Embedded Mixed-Criticality Systems

Ranjbar, Behnaz 06 December 2022 (has links)
Nowadays, implementing a complex system that executes various applications with different levels of assurance is a growing trend in modern embedded real-time systems to meet cost, timing, and power consumption requirements. The medical device, automotive, and avionics industries are the most common safety-critical domains exploiting these systems, known as Mixed-Criticality (MC) systems. MC applications are real-time, and to ensure their correctness, it is essential to meet strict timing requirements as well as functional specifications. The correct design of such MC systems requires a thorough understanding of the system's functions and their importance to the system. A failure or deadline miss has a different impact depending on the criticality level of the function, ranging from no effect to catastrophic consequences. Failure in the execution of tasks with higher criticality levels (HC tasks) may lead to system failure and cause irreparable damage, whereas Low-Criticality (LC) tasks assist the system in carrying out its mission successfully, but their failure has less impact on the system's functionality and does not cause the system itself to fail. In order to guarantee MC system safety, tasks are analyzed under different assumptions to obtain different Worst-Case Execution Times (WCETs) corresponding to the multiple criticality levels and the operation mode of the system. If the execution time of at least one HC task exceeds its low WCET, the system switches from low-criticality mode (LO mode) to high-criticality mode (HI mode). Then, all HC tasks continue executing under the high WCET to guarantee the system's safety. In this HI mode, all or some LC tasks are dropped/degraded in favor of HC tasks to ensure the correct execution of HC tasks. Determining an appropriate low WCET for each HC task is crucial in designing efficient MC systems and maximizing Quality of Service (QoS). However, even when the low WCETs are set correctly, dropping/degrading LC tasks in the HI mode is not recommended, due to its negative impact on other functions or on the entire system's ability to accomplish its mission correctly. Therefore, how to analyze task dropping in the HI mode is a significant challenge in designing efficient MC systems: it must be considered to guarantee the successful execution of all HC tasks and prevent catastrophic damage while improving the QoS. Due to the continuous rise in the computational demand of MC tasks in safety-critical applications, such as autonomous driving control, designers are motivated to deploy MC applications on multi-core platforms. Although the parallel execution offered by multi-core platforms helps to improve QoS and meet real-time requirements, the high power consumption and temperature of the cores may make the system more susceptible to failures and instability, which is not desirable in MC applications. Therefore, improving the QoS while managing the power consumption and guaranteeing real-time constraints is the critical issue in designing such MC systems on multi-core platforms. This thesis addresses the challenges associated with efficient MC system design. We first focus on application analysis, proposing a novel approach to determine appropriate low WCETs that provides a reasonable trade-off between the number of LC tasks scheduled at design-time and the probability of mode switching at run-time, thereby improving system utilization and QoS.
The approach presents an analytic-based scheme to obtain low WCETs based on the Chebyshev theorem at design-time. We also show the relationship between the low WCETs and mode switching probability, and formulate and solve the problem for improving resource utilization and reducing the mode switching probability. Further, we analyze the LC task dropping in the HI mode to improve QoS. We first propose a heuristic in which a new metric is defined that determines the number of allowable drops in the HI mode. Then, the task schedulability analysis is developed based on the new metric. Since the occurrence of the worst-case scenario at run-time is a rare event, a learning-based drop-aware task scheduling mechanism is then proposed, which carefully monitors the alterations in the behavior of MC systems at run-time to exploit the dynamic slacks for improving the QoS. Another critical design challenge is how to improve QoS using the parallel feature of multi-core platforms while managing the power consumption and temperature of these platforms. We develop a tree of possible task mapping and scheduling at design-time to cover all possible scenarios of task overrunning and reduce the LC task drop rate in the HI mode while managing the power and temperature in each scenario of task scheduling. Since the dynamic slack is generated due to the early execution of tasks at run-time, we propose an online approach to reduce the power consumption and maximum temperature by using low-power techniques like DVFS and task re-mapping, while preserving the QoS. Specifically, our approach examines multiple tasks ahead to determine the most appropriate task for the slack assignment that has the most significant effect on power consumption and temperature. However, changing the frequency and selecting a proper task for slack assignment and a suitable core for task re-mapping at run-time can be time-consuming and may cause deadline violation. Therefore, we analyze and optimize the run-time scheduler.:1. Introduction 1.1. Mixed-Criticality Application Design 1.2. Mixed-Criticality Hardware Design 1.3. Certain Challenges and Questions 1.4. Thesis Key Contributions 1.4.1. Application Analysis and Modeling 1.4.2. Multi-Core Mixed-Criticality System Design 1.5. Thesis Overview 2. Preliminaries and Literature Reviews 2.1. Preliminaries 2.1.1. Mixed-Criticality Systems 2.1.2. Fault-Tolerance, Fault Model and Safety Requirements 2.1.3. Hardware Architectural Modeling 2.1.4. Low-Power Techniques and Power Consumption Model 2.2. Related Works 2.2.1. Mixed-Criticality Task Scheduling Mechanisms 2.2.2. QoS Improvement Methods in Mixed-Criticality Systems 2.2.3. QoS-Aware Power and Thermal Management in Multi-Core Mixed-Criticality Systems 2.3. Conclusion 3. Bounding Time in Mixed-Criticality Systems 3.1. BOT-MICS: A Design-Time WCET Adjustment Approach 3.1.1. Motivational Example 3.1.2. BOT-MICS in Detail 3.1.3. Evaluation 3.2. A Run-Time WCET Adjustment Approach 3.2.1. Motivational Example 3.2.2. ADAPTIVE in Detail 3.2.3. Evaluation 3.3. Conclusion 4. Safety- and Task-Drop-Aware Mixed-Criticality Task Scheduling 4.1. Problem Objectives and Motivational Example 4.2. FANTOM in detail 4.2.1. Safety Quantification 4.2.2. MC Tasks Utilization Bounds Definition 4.2.3. Scheduling Analysis 4.2.4. System Upper Bound Utilization 4.2.5. A General Design Time Scheduling Algorithm 4.3. Evaluation 4.3.1. Evaluation with Real-Life Benchmarks 4.3.2. Evaluation with Synthetic Task Sets 4.4. Conclusion 5. 
Learning-Based Drop-Aware Mixed-Criticality Task Scheduling 5.1. Motivational Example and Problem Statement 5.2. Proposed Method in Detail 5.2.1. An Overview of the Design-Time Approach 5.2.2. Run-Time Approach: Employment of SOLID 5.2.3. LIQUID Approach 5.3. Evaluation 5.3.1. Evaluation with Real-Life Benchmarks 5.3.2. Evaluation with Synthetic Task Sets 5.3.3. Investigating the Timing and Memory Overheads of ML Technique 5.4. Conclusion 6. Fault-Tolerance and Power-Aware Multi-Core Mixed-Criticality System Design 6.1. Problem Objectives and Motivational Example 6.2. Design Methodology 6.3. Tree Generation and Fault-Tolerant Scheduling and Mapping 6.3.1. Making Scheduling Tree 6.3.2. Mapping and Scheduling 6.3.3. Time Complexity Analysis 6.3.4. Memory Space Analysis 6.4. Evaluation 6.4.1. Experimental Setup 6.4.2. Analyzing the Tree Construction Time 6.4.3. Analyzing the Run-Time Timing Overhead 6.4.4. Peak Power Management and Thermal Distribution for Real-Life and Synthetic Applications 6.4.5. Analyzing the QoS of LC Tasks 6.4.6. Analyzing the Peak Power Consumption and Maximum Temperature 6.4.7. Effect of Varying Different Parameters on Acceptance Ratio 6.4.8. Investigating Different Approaches at Run-Time 6.5. Conclusion 7. QoS- and Power-Aware Run-Time Scheduler for Multi-Core Mixed-Criticality Systems 7.1. Research Questions, Objectives and Motivational Example 7.2. Design-Time Approach 7.3. Run-Time Mixed-Criticality Scheduler 7.3.1. Selecting the Appropriate Task to Assign Slack 7.3.2. Re-Mapping Technique 7.3.3. Run-Time Management Algorithm 7.3.4. DVFS governor in Clustered Multi-Core Platforms 7.4. Run-Time Scheduler Algorithm Optimization 7.5. Evaluation 7.5.1. Experimental Setup 7.5.2. Analyzing the Relevance Between a Core Temperature and Energy Consumption 7.5.3. The Effect of Varying Parameters of Cost Functions 7.5.4. The Optimum Number of Tasks to Look-Ahead and the Effect of Task Re-mapping 7.5.5. The Analysis of Scheduler Timings Overhead on Different Real Platforms 7.5.6. The Latency of Changing Frequency in Real Platform 7.5.7. The Effect of Latency on System Schedulability 7.5.8. The Analysis of the Proposed Method on Peak Power, Energy and Maximum Temperature Improvement 7.5.9. The Analysis of the Proposed Method on Peak power, Energy and Maximum Temperature Improvement in a Multi-Core Platform Based on the ODROID-XU3 Architecture 7.5.10. Evaluation of Running Real MC Task Graph Model (Unmanned Air Vehicle) on Real Platform 7.6. Conclusion 8. Conclusion and Future Work 8.1. Conclusions 8.2. Future Work
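The Chebyshev-based scheme mentioned in this abstract can be illustrated with a small sketch. Assuming only the mean and standard deviation of a HC task's profiled execution times are known, Cantelli's one-sided form of the Chebyshev inequality gives a candidate low WCET for a chosen per-job overrun (mode-switch) probability. This is only an illustration of the idea, with hypothetical numbers, not the thesis's BOT-MICS formulation.

```python
import math

def low_wcet_bound(mean: float, std: float, p_switch: float) -> float:
    """Illustrative low-WCET candidate from Cantelli's (one-sided Chebyshev) inequality.

    For any distribution with the given mean and std,
    P(C > mean + k*std) <= 1 / (1 + k^2).  Choosing k = sqrt(1/p_switch - 1)
    bounds the overrun (mode-switch) probability of a single job by p_switch.
    """
    k = math.sqrt(1.0 / p_switch - 1.0)
    return mean + k * std

# Hypothetical profiled execution times (ms) of one HC task.
mean_ms, std_ms = 4.0, 0.5
for p in (0.1, 0.01, 0.001):
    print(f"target overrun prob {p:>6}: low WCET ~ {low_wcet_bound(mean_ms, std_ms, p):.2f} ms")
```

Smaller target probabilities yield larger low WCETs, which reduces mode switching but leaves less utilization for scheduling LC tasks at design-time; that is the trade-off the abstract refers to.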
112

Automatic methods for distribution of data-parallel programs on multi-device heterogeneous platforms

Moreń, Konrad 07 February 2024 (has links)
This thesis deals with the problem of finding effective methods for programming and distributing data-parallel applications on heterogeneous multiprocessor systems. These systems are ubiquitous today, ranging from embedded devices with low power consumption to high-performance distributed systems, and demand for them is growing steadily due to the increasing number of data-intensive applications and the general growth of digital applications. Systems with multiple devices offer higher performance but unfortunately add complexity to software development. Programming heterogeneous multiprocessor systems presents several unique challenges compared to single-device systems. The first challenge is the programmability of such systems. Despite constant innovations in programming languages and frameworks, they are still limited: they are either platform specific, like CUDA, which supports only NVIDIA GPUs, or applied at a low level of abstraction, such as OpenCL. Application developers who design OpenCL programs must manually distribute data to the different devices and synchronize the distributed computations. These manual steps reduce developer productivity. To reduce the programming complexity and the development time, this thesis introduces two approaches that automatically distribute and synchronize data-parallel workloads. Another challenge is multi-device hardware utilization. In contrast to single-device platforms, the application optimization process for a multi-device system is even more complicated: application designers must not only apply optimization strategies specific to a single-device architecture, but also focus on careful workload balancing across all the platform processors. For the balancing problem, this thesis proposes a method based on a platform model created with machine learning techniques. Using machine learning, the thesis automatically builds a reliable platform model that is portable and adaptable to different platform setups, with minimal manual involvement from programmers.
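A common way to realize the workload balancing described here is to split a data-parallel index range among devices in proportion to the throughput predicted by a platform model. The sketch below assumes hypothetical per-device throughput values (for example, produced by a learned platform model) and only computes chunk sizes; it illustrates the balancing idea, not the thesis's actual framework.

```python
def partition_workload(total_items: int, predicted_throughput: dict[str, float]) -> dict[str, int]:
    """Split a data-parallel range among devices proportionally to predicted throughput."""
    total = sum(predicted_throughput.values())
    shares = {dev: int(total_items * t / total) for dev, t in predicted_throughput.items()}
    # Hand any rounding remainder to the fastest device.
    fastest = max(predicted_throughput, key=predicted_throughput.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares

# Hypothetical throughputs (work items per second) coming from a learned platform model.
print(partition_workload(1_000_000, {"cpu": 120.0, "igpu": 300.0, "dgpu": 900.0}))
```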
113

Parallel Viterbi Search For Continuous Speech Recognition On A Multi-Core Architecture

Parihar, Naveen 11 December 2009 (has links)
State-of-the-art speech-recognition systems can successfully perform simple tasks in real-time on most computers, when the tasks are performed in controlled and noise-free environments. However, current algorithms and processors are not yet powerful enough for real-time large-vocabulary conversational speech recognition in noisy, real-world environments. Parallel processing can improve the real-time performance of speech recognition systems and increase their applicability, and developing an effective approach to parallelization is especially important given the recent trend toward multi-core processor design. In this dissertation, we introduce methods for parallelizing a single-pass across-word n-gram lexical-tree based Viterbi recognizer, which is the most popular architecture for Viterbi-based large-vocabulary continuous speech recognition. We parallelize two different open-source implementations of such a recognizer, one developed at Mississippi State University and the other developed at Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen in Germany. We describe three methods for parallelization. The first, called parallel fast likelihood computation, parallelizes likelihood computations by decomposing mixtures among CPU cores, so that each core computes the likelihood of the set of mixtures allocated to it. A second method, lexical-tree division, parallelizes the search-management component of a speech recognizer by dividing the lexical tree among the cores. A third and alternative method for parallelizing the search-management component, called lexical-tree copies decomposition, dynamically distributes the active lexical-tree copies among the cores. All parallelization methods were tested on two and four cores of an Intel Core2 Quad processor and significantly improved real-time performance. Several challenges for parallelizing a lexical-tree based Viterbi speech recognizer are also identified and discussed.
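The "parallel fast likelihood computation" idea can be sketched as follows: the Gaussian mixtures are partitioned into blocks, and each worker evaluates the log-likelihoods of its block for the current feature vector. This is a schematic illustration (diagonal-covariance Gaussians, a generic process pool rather than threads pinned to cores), not the recognizers' actual implementation; the dimensions in the demo are hypothetical.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def mixture_loglikes(args):
    """Log-likelihoods of one feature vector under a block of diagonal-covariance Gaussians."""
    x, means, variances, log_weights = args
    diff = x - means                                                  # (n_mix_block, dim)
    ll = -0.5 * np.sum(diff * diff / variances + np.log(2 * np.pi * variances), axis=1)
    return ll + log_weights                                           # (n_mix_block,)

def parallel_likelihoods(x, means, variances, log_weights, n_workers=4):
    """Split the mixture set into n_workers blocks and score them in parallel."""
    blocks = [(x, m, v, w) for m, v, w in zip(np.array_split(means, n_workers),
                                              np.array_split(variances, n_workers),
                                              np.array_split(log_weights, n_workers))]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return np.concatenate(list(pool.map(mixture_loglikes, blocks)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_mix = 39, 4096                     # e.g., 39-dim features, 4096 Gaussians (hypothetical)
    x = rng.standard_normal(dim)
    means = rng.standard_normal((n_mix, dim))
    variances = np.full((n_mix, dim), 1.0)
    log_weights = np.full(n_mix, -np.log(n_mix))
    print(parallel_likelihoods(x, means, variances, log_weights)[:4])
```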
114

Power-Efficient Nanophotonic Architectures for Intra- and Inter-Chip Communication

Kennedy, Matthew D. 15 July 2016 (has links)
No description available.
115

ANALYSIS AND MITIGATION OF THE NONLINEAR IMPAIRMENTS IN FIBER-OPTIC COMMUNICATION SYSTEMS

NADERI, SHAHI SINA 10 1900 (has links)
Fiber-optic communication systems have revolutionized the telecommunications industry and have played a major role in the advent of the Information Age. Thousands of kilometers of optical fiber are used by telecommunications companies to transmit telephone signals, Internet communication, and cable television signals throughout the world, so working in this area has always been interesting. This thesis analyzes the nonlinearity of fiber-optic systems and proposes a system to mitigate fiber nonlinear effects. The topics of this thesis can be categorized into two parts. In the first part of the thesis (Chapters 2, 3, and 4), analytical models are developed for fiber-optic nonlinear effects. It is important to have an accurate analytical model so that the impact of a specific system/signal parameter on the performance can be assessed quickly without doing time-consuming Monte-Carlo simulations. In the second part (Chapters 5 and 6), a multi-core/fiber architecture is proposed to reduce the nonlinear effects.
In Chapter 2, intrachannel nonlinear impairments are studied and an analytical model for the calculation of the power spectral density (PSD) and variance of the nonlinear distortion is obtained for a quadrature phase-shift keying (QPSK) signal. For QPSK signals, intrachannel four-wave mixing (IFWM) is the only stochastic nonlinear distortion. To develop the analytical model, first-order perturbation theory is used. For a Gaussian pulse shape, a closed-form formula is obtained for the PSD of IFWM. For non-Gaussian pulses, it is not possible to find the PSD analytically; however, using a stationary phase approximation, convolutions become multiplications and a simple analytical expression for the PSD of the nonlinear distortion can be found. The total PSD is obtained by adding the amplified spontaneous emission (ASE) PSD to that of the nonlinear distortion. Using the total PSD, the bit error ratio (BER) can be obtained analytically for a QPSK system. The analytically estimated BER is found to be in good agreement with numerical simulations. Significant computational effort can be saved using the analytical model as compared to numerical simulations, without sacrificing much accuracy.
In Chapter 3, the same approach as in Chapter 2 is used to find an analytical expression for the PSD of the intrachannel nonlinear distortion of a fiber-optic system based on quadrature amplitude modulation (QAM) signals. Unlike the QPSK signal, intrachannel cross-phase modulation (IXPM) is a stochastic process for the QAM signal, which leads to an increase of the nonlinear distortion variance. In this chapter, analytical expressions for the PSDs of self-phase modulation (SPM), IXPM, IFWM, and their correlations are obtained for the QAM signal. Simulation results show good agreement between the analytical model and numerical simulation.
In Chapter 4, inter-channel nonlinear impairment is studied. This time, a first-order perturbation technique is used to develop an analytical model for SPM and cross-phase modulation (XPM) distortions in a wavelength division multiplexing (WDM) system based on QAM. In this case, SPM distortion is deterministic and does not contribute to the nonlinear noise variance. On the other hand, XPM is stochastic and contributes to the noise variance. In this chapter, the effects of input launch power, fiber dispersion, system reach, and channel spacing on the nonlinear noise variance are investigated as well.
In Chapter 5, a single-channel multi-core/fiber architecture is proposed to reduce intrachannel fiber nonlinear effects. Based on the analytical model obtained in the first part of the thesis, the nonlinear distortion variance scales as P^3, where P is the fiber input launch power, which suggests that decreasing the fiber input power can reduce the nonlinear distortion significantly. In this system, the input power is divided between multiple cores/fibers by a power splitter at the input of each span, and a power combiner adds the output fields of the multiple cores/fibers so that one amplifier can be used for each span. In this case, each core/fiber receives less power and hence adds less nonlinear distortion to the signal. In a practical system, individual fiber parameters are not identical, so the optical pulses propagating in the fibers undergo different amounts of phase shift and timing delay due to fluctuations of the fibers' propagation constants and inverse group speeds. Optical and electrical equalizers are proposed to compensate for these inter-core/fiber dispersions. In the case of an optical equalizer, adaptive time shifters and phase shifters are adjusted such that the maximum power is obtained at the output of the power combiner. Our numerical simulation results show that for unrepeatered systems, the performance (Q factor) is improved by 6.2 dB using an 8-core/fiber configuration as compared to a single-core fiber system. In addition, for a multi-span system, the transmission reach at a BER of 2.1×10^-3 is quadrupled in the 8-core/fiber configuration.
In Chapter 6, a multi-channel multi-core/fiber architecture is proposed to reduce the inter-channel nonlinear distortions. In this architecture, different channels of a WDM system are interleaved between multiple cores/fibers, which increases the channel spacing in each core/fiber. Higher channel spacing decreases the inter-channel nonlinear impairments in each core/fiber, which leads to system performance improvement. At the end of each span, a multiplexer adds the channels from different cores/fibers so that one amplifier can be used for all of the channels. Unlike the single-channel multi-core/fiber system, the WDM multi-core/fiber system does not require equalizers, since different cores/fibers carry channels with different frequencies. Simulation results show that for a 39-span system, the 4-core/fiber system with negligible crosstalk outperforms the single-core system by 2.2 dBQ20. The impact of crosstalk between cores of a multi-core fiber (MCF) on the system performance is studied. The simulation results show that the performance of the multi-core WDM system is less sensitive to the crosstalk effect compared to conventional multi-core systems, since the propagating channels in the cores are not correlated in the frequency domain. / Doctor of Philosophy (PhD)
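The power-scaling argument behind the multi-core/fiber architecture can be summarized as follows. This is a simplified illustration of the scaling only; it ignores splitter/combiner losses and how the per-core distortion fields recombine at the output.

```latex
\[
\sigma^{2}_{\mathrm{NL}}(P)\;\propto\;P^{3}
\qquad\Longrightarrow\qquad
\sigma^{2}_{\mathrm{NL}}\!\left(\tfrac{P}{N}\right)\;\propto\;\frac{P^{3}}{N^{3}}
\]
```

Splitting the launch power P equally over N cores/fibers thus reduces the nonlinear distortion generated within each core by roughly a factor of N^3 relative to launching the full power into one core; the net system benefit (for example, the reported 6.2 dB Q-factor gain for N = 8) is smaller, since the per-core contributions recombine at the combiner and other impairments such as ASE and inter-core mismatch remain.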
116

Quality-of-Service Aware Design and Management of Embedded Mixed-Criticality Systems

Ranjbar, Behnaz 12 April 2024 (has links)
Nowadays, implementing a complex system that executes various applications with different levels of assurance is a growing trend in modern embedded real-time systems to meet cost, timing, and power consumption requirements. The medical device, automotive, and avionics industries are the most common safety-critical domains exploiting these systems, known as Mixed-Criticality (MC) systems. MC applications are real-time, and to ensure their correctness, it is essential to meet strict timing requirements as well as functional specifications. The correct design of such MC systems requires a thorough understanding of the system's functions and their importance to the system. A failure or deadline miss has a different impact depending on the criticality level of the function, ranging from no effect to catastrophic consequences. Failure in the execution of tasks with higher criticality levels (HC tasks) may lead to system failure and cause irreparable damage, whereas Low-Criticality (LC) tasks assist the system in carrying out its mission successfully, but their failure has less impact on the system's functionality and does not cause the system itself to fail. In order to guarantee MC system safety, tasks are analyzed under different assumptions to obtain different Worst-Case Execution Times (WCETs) corresponding to the multiple criticality levels and the operation mode of the system. If the execution time of at least one HC task exceeds its low WCET, the system switches from low-criticality mode (LO mode) to high-criticality mode (HI mode). Then, all HC tasks continue executing under the high WCET to guarantee the system's safety. In this HI mode, all or some LC tasks are dropped/degraded in favor of HC tasks to ensure the correct execution of HC tasks. Determining an appropriate low WCET for each HC task is crucial in designing efficient MC systems and maximizing Quality of Service (QoS). However, even when the low WCETs are set correctly, dropping/degrading LC tasks in the HI mode is not recommended, due to its negative impact on other functions or on the entire system's ability to accomplish its mission correctly. Therefore, how to analyze task dropping in the HI mode is a significant challenge in designing efficient MC systems: it must be considered to guarantee the successful execution of all HC tasks and prevent catastrophic damage while improving the QoS. Due to the continuous rise in the computational demand of MC tasks in safety-critical applications, such as autonomous driving control, designers are motivated to deploy MC applications on multi-core platforms. Although the parallel execution offered by multi-core platforms helps to improve QoS and meet real-time requirements, the high power consumption and temperature of the cores may make the system more susceptible to failures and instability, which is not desirable in MC applications. Therefore, improving the QoS while managing the power consumption and guaranteeing real-time constraints is the critical issue in designing such MC systems on multi-core platforms. This thesis addresses the challenges associated with efficient MC system design. We first focus on application analysis, proposing a novel approach to determine appropriate low WCETs that provides a reasonable trade-off between the number of LC tasks scheduled at design-time and the probability of mode switching at run-time, thereby improving system utilization and QoS.
The approach presents an analytic-based scheme to obtain low WCETs based on the Chebyshev theorem at design-time. We also show the relationship between the low WCETs and mode switching probability, and formulate and solve the problem for improving resource utilization and reducing the mode switching probability. Further, we analyze the LC task dropping in the HI mode to improve QoS. We first propose a heuristic in which a new metric is defined that determines the number of allowable drops in the HI mode. Then, the task schedulability analysis is developed based on the new metric. Since the occurrence of the worst-case scenario at run-time is a rare event, a learning-based drop-aware task scheduling mechanism is then proposed, which carefully monitors the alterations in the behavior of MC systems at run-time to exploit the dynamic slacks for improving the QoS. Another critical design challenge is how to improve QoS using the parallel feature of multi-core platforms while managing the power consumption and temperature of these platforms. We develop a tree of possible task mapping and scheduling at design-time to cover all possible scenarios of task overrunning and reduce the LC task drop rate in the HI mode while managing the power and temperature in each scenario of task scheduling. Since the dynamic slack is generated due to the early execution of tasks at run-time, we propose an online approach to reduce the power consumption and maximum temperature by using low-power techniques like DVFS and task re-mapping, while preserving the QoS. Specifically, our approach examines multiple tasks ahead to determine the most appropriate task for the slack assignment that has the most significant effect on power consumption and temperature. However, changing the frequency and selecting a proper task for slack assignment and a suitable core for task re-mapping at run-time can be time-consuming and may cause deadline violation. Therefore, we analyze and optimize the run-time scheduler.:1. Introduction 1.1. Mixed-Criticality Application Design 1.2. Mixed-Criticality Hardware Design 1.3. Certain Challenges and Questions 1.4. Thesis Key Contributions 1.4.1. Application Analysis and Modeling 1.4.2. Multi-Core Mixed-Criticality System Design 1.5. Thesis Overview 2. Preliminaries and Literature Reviews 2.1. Preliminaries 2.1.1. Mixed-Criticality Systems 2.1.2. Fault-Tolerance, Fault Model and Safety Requirements 2.1.3. Hardware Architectural Modeling 2.1.4. Low-Power Techniques and Power Consumption Model 2.2. Related Works 2.2.1. Mixed-Criticality Task Scheduling Mechanisms 2.2.2. QoS Improvement Methods in Mixed-Criticality Systems 2.2.3. QoS-Aware Power and Thermal Management in Multi-Core Mixed-Criticality Systems 2.3. Conclusion 3. Bounding Time in Mixed-Criticality Systems 3.1. BOT-MICS: A Design-Time WCET Adjustment Approach 3.1.1. Motivational Example 3.1.2. BOT-MICS in Detail 3.1.3. Evaluation 3.2. A Run-Time WCET Adjustment Approach 3.2.1. Motivational Example 3.2.2. ADAPTIVE in Detail 3.2.3. Evaluation 3.3. Conclusion 4. Safety- and Task-Drop-Aware Mixed-Criticality Task Scheduling 4.1. Problem Objectives and Motivational Example 4.2. FANTOM in detail 4.2.1. Safety Quantification 4.2.2. MC Tasks Utilization Bounds Definition 4.2.3. Scheduling Analysis 4.2.4. System Upper Bound Utilization 4.2.5. A General Design Time Scheduling Algorithm 4.3. Evaluation 4.3.1. Evaluation with Real-Life Benchmarks 4.3.2. Evaluation with Synthetic Task Sets 4.4. Conclusion 5. 
Learning-Based Drop-Aware Mixed-Criticality Task Scheduling 5.1. Motivational Example and Problem Statement 5.2. Proposed Method in Detail 5.2.1. An Overview of the Design-Time Approach 5.2.2. Run-Time Approach: Employment of SOLID 5.2.3. LIQUID Approach 5.3. Evaluation 5.3.1. Evaluation with Real-Life Benchmarks 5.3.2. Evaluation with Synthetic Task Sets 5.3.3. Investigating the Timing and Memory Overheads of ML Technique 5.4. Conclusion 6. Fault-Tolerance and Power-Aware Multi-Core Mixed-Criticality System Design 6.1. Problem Objectives and Motivational Example 6.2. Design Methodology 6.3. Tree Generation and Fault-Tolerant Scheduling and Mapping 6.3.1. Making Scheduling Tree 6.3.2. Mapping and Scheduling 6.3.3. Time Complexity Analysis 6.3.4. Memory Space Analysis 6.4. Evaluation 6.4.1. Experimental Setup 6.4.2. Analyzing the Tree Construction Time 6.4.3. Analyzing the Run-Time Timing Overhead 6.4.4. Peak Power Management and Thermal Distribution for Real-Life and Synthetic Applications 6.4.5. Analyzing the QoS of LC Tasks 6.4.6. Analyzing the Peak Power Consumption and Maximum Temperature 6.4.7. Effect of Varying Different Parameters on Acceptance Ratio 6.4.8. Investigating Different Approaches at Run-Time 6.5. Conclusion 7. QoS- and Power-Aware Run-Time Scheduler for Multi-Core Mixed-Criticality Systems 7.1. Research Questions, Objectives and Motivational Example 7.2. Design-Time Approach 7.3. Run-Time Mixed-Criticality Scheduler 7.3.1. Selecting the Appropriate Task to Assign Slack 7.3.2. Re-Mapping Technique 7.3.3. Run-Time Management Algorithm 7.3.4. DVFS governor in Clustered Multi-Core Platforms 7.4. Run-Time Scheduler Algorithm Optimization 7.5. Evaluation 7.5.1. Experimental Setup 7.5.2. Analyzing the Relevance Between a Core Temperature and Energy Consumption 7.5.3. The Effect of Varying Parameters of Cost Functions 7.5.4. The Optimum Number of Tasks to Look-Ahead and the Effect of Task Re-mapping 7.5.5. The Analysis of Scheduler Timings Overhead on Different Real Platforms 7.5.6. The Latency of Changing Frequency in Real Platform 7.5.7. The Effect of Latency on System Schedulability 7.5.8. The Analysis of the Proposed Method on Peak Power, Energy and Maximum Temperature Improvement 7.5.9. The Analysis of the Proposed Method on Peak power, Energy and Maximum Temperature Improvement in a Multi-Core Platform Based on the ODROID-XU3 Architecture 7.5.10. Evaluation of Running Real MC Task Graph Model (Unmanned Air Vehicle) on Real Platform 7.6. Conclusion 8. Conclusion and Future Work 8.1. Conclusions 8.2. Future Work
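The run-time slack-assignment step described above can be pictured with a small greedy sketch: when dynamic slack appears, look at the next few ready tasks and give the slack (that is, run at a lower DVFS frequency) to the task whose slowdown saves the most dynamic energy while still finishing within its deadline. The cubic power-frequency model, the discrete frequency levels, and the task fields used here are illustrative assumptions, not the thesis's actual cost functions or scheduler.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    wcet_at_fmax: float   # remaining worst-case execution time at f_max (ms)
    deadline: float       # deadline relative to 'now' (ms)

def energy_at(freq_ratio: float, exec_time_at_fmax: float) -> float:
    """Dynamic energy ~ f^3 * t = f^3 * (t_fmax / f) = f^2 * t_fmax (normalized units)."""
    return (freq_ratio ** 2) * exec_time_at_fmax

def pick_task_for_slack(ready: list[Task], slack: float, lookahead: int = 3,
                        freq_levels=(1.0, 0.8, 0.6)):
    """Among the next `lookahead` ready tasks, choose the (task, frequency) pair that
    saves the most energy while the stretched execution still meets the deadline."""
    best = None
    for task in ready[:lookahead]:
        for f in freq_levels:
            stretched = task.wcet_at_fmax / f
            if stretched <= min(task.deadline, task.wcet_at_fmax + slack):
                saving = energy_at(1.0, task.wcet_at_fmax) - energy_at(f, task.wcet_at_fmax)
                if best is None or saving > best[2]:
                    best = (task, f, saving)
    return (best[0], best[1]) if best else None

ready = [Task("t1", 4.0, 9.0), Task("t2", 2.0, 5.0), Task("t3", 6.0, 20.0)]
print(pick_task_for_slack(ready, slack=3.0))   # picks t1 at 0.6 under this toy model
```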
117

Quantitative phase imaging through an ultra-thin lensless fiber endoscope

Sun, Jiawei, Wu, Jiachen, Wu, Song, Goswami, Ruchi, Girardo, Salvatore, Cao, Liangcai, Guck, Jochen, Koukourakis, Nektarios, Czarske, Juergen W. 08 April 2024 (has links)
Quantitative phase imaging (QPI) is a label-free technique providing both morphology and quantitative biophysical information in biomedicine. However, applying such a powerful technique to in vivo pathological diagnosis remains challenging. Multi-core fiber bundles (MCFs) enable ultra-thin probes for in vivo imaging, but current MCF imaging techniques are limited to amplitude imaging modalities. We demonstrate a computational lensless microendoscope that uses an ultra-thin bare MCF to perform quantitative phase imaging with microscale lateral resolution and nanoscale axial sensitivity of the optical path length. The incident complex light field at the measurement side is precisely reconstructed from the far-field speckle pattern at the detection side, enabling digital refocusing in a multi-layer sample without any mechanical movement. The accuracy of the quantitative phase reconstruction is validated by imaging the phase target and hydrogel beads through the MCF. With the proposed imaging modality, three-dimensional imaging of human cancer cells is achieved through the ultra-thin fiber endoscope, promising widespread clinical applications.
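Once the complex field at the fiber facet has been recovered, digital refocusing of the kind mentioned here is typically done by numerically propagating the field with the angular spectrum method. The sketch below is a generic angular-spectrum propagator in NumPy following the standard textbook formulation, not the authors' specific reconstruction pipeline; the field and distances in the example are placeholders.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, dz):
    """Propagate a sampled complex field by a distance dz (same units as wavelength and dx)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    k = 2 * np.pi / wavelength
    kz_sq = k**2 - (2 * np.pi * FX)**2 - (2 * np.pi * FY)**2
    kz = np.sqrt(np.maximum(kz_sq, 0.0))              # evanescent components are suppressed
    transfer = np.exp(1j * kz * dz) * (kz_sq > 0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Example: refocus a (hypothetical) reconstructed facet field by 50 micrometres.
facet_field = np.ones((256, 256), dtype=complex)       # placeholder complex field
refocused = angular_spectrum_propagate(facet_field, wavelength=0.532, dx=1.0, dz=50.0)
phase_map = np.angle(refocused)                        # quantitative phase at the new plane
```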
118

Application du concept des transactions pour la modélisation et la simulation multicoeur des systèmes sur puce / Applying the transaction concept to multi-core modeling and simulation of systems-on-chip

Anane, Amine 01 1900 (has links)
Avec la complexité croissante des systèmes sur puce, de nouveaux défis ne cessent d'émerger dans la conception de ces systèmes en matière de vérification formelle et de synthèse de haut niveau. Plusieurs travaux autour de SystemC, considéré comme la norme pour la conception au niveau système, sont en cours afin de relever ces nouveaux défis. Cependant, à cause du modèle de concurrence complexe de SystemC, relever ces défis reste toujours une tâche difficile. Ainsi, nous pensons qu'il est primordial de partir sur de meilleures bases en utilisant un modèle de concurrence plus efficace. Par conséquent, dans cette thèse, nous étudions une méthodologie de conception qui offre une meilleure abstraction pour modéliser des composants parallèles en se basant sur le concept de transaction. Nous montrons comment, grâce au raisonnement simple que procure le concept de transaction, il devient plus facile d'appliquer la vérification formelle, le raffinement incrémental et la synthèse de haut niveau. Dans le but d'évaluer l'efficacité de cette méthodologie, nous avons fixé l'objectif d'optimiser la vitesse de simulation d'un modèle transactionnel en profitant d'une machine multicoeur. Nous présentons ainsi l'environnement de modélisation et de simulation parallèle que nous avons développé. Nous étudions différentes stratégies d'ordonnancement en matière de parallélisme et de surcoût de synchronisation. Une expérimentation faite sur un modèle du transmetteur Wi-Fi 802.11a a permis d'atteindre une accélération d'environ 1.8 en utilisant deux threads. Avec 8 threads, bien que la charge de travail des différentes transactions n'était pas importante, nous avons pu atteindre une accélération d'environ 4.6, ce qui est un résultat très prometteur. / With the increasing complexity of SoCs, new challenges continue to emerge in the design of these systems in terms of formal verification and high-level synthesis. Several research efforts around SystemC, considered the de facto standard for system-level design, are underway to meet these new challenges. However, because of the complex concurrency model of SystemC, these challenges remain difficult tasks. Thus, we believe it is important to start on a better footing by using a more effective concurrency model. Therefore, in this thesis, we study a design methodology that provides a better abstraction for modeling parallel components based on the concept of transaction. We show how, through simple reasoning about transactions, it becomes easier to apply formal verification, incremental refinement and high-level synthesis. In order to evaluate the effectiveness of this methodology, we set the goal of optimizing the simulation speed of a transactional model by taking advantage of a multicore machine. We present the modeling and parallel simulation environment that we developed, and study different scheduling strategies in terms of parallelism and synchronization overhead. An experiment made on a Wi-Fi 802.11a transmitter model achieved a speedup of about 1.8 using two threads. With 8 threads, although the workload of individual transactions was not significant, we could reach a speedup of 4.6, which is a very promising result.
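The reported speedups are consistent with a simple Amdahl's-law estimate: a 1.8x speedup on two threads implies a parallel fraction of roughly 0.89, which predicts about 4.5x on eight threads, close to the 4.6x observed. The snippet below is just that back-of-the-envelope check, not part of the thesis.

```python
def parallel_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law S = 1 / ((1 - f) + f / n) to recover the parallel fraction f."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

def amdahl_speedup(f: float, n: int) -> float:
    return 1.0 / ((1.0 - f) + f / n)

f = parallel_fraction(1.8, 2)            # ~0.889 from the two-thread measurement
print(f, amdahl_speedup(f, 8))           # predicts ~4.5 on eight threads
```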
119

Stratégie de placement et d'ordonnancement de tâches logicielles pour architectures reconfigurables sous contrainte énergétique / Mapping and scheduling strategy of OS tasks into reconfigurable architectures under energy constraint

Gammoudi, Aymen 26 June 2018 (has links)
La conception de systèmes temps-réel embarqués se développe de plus en plus avec l'intégration croissante de fonctionnalités critiques pour les applications de surveillance, notamment dans le domaine biomédical, environnemental, domotique, etc. Le développement de ces systèmes doit relever divers défis en termes de minimisation de la consommation énergétique. Gérer de tels dispositifs embarqués, entièrement autonomes, nécessite cependant de résoudre différents problèmes liés à la quantité d'énergie disponible dans la batterie, à l'ordonnancement temps-réel des tâches qui doivent être exécutées avant leurs échéances, aux scénarios de reconfiguration, particulièrement dans le cas d'ajout de tâches, et à la contrainte de communication pour pouvoir assurer l'échange des messages entre les processeurs, de façon à assurer une autonomie durable jusqu'à la prochaine recharge et ce, tout en maintenant un niveau de qualité de service acceptable du système de traitement. Pour traiter cette problématique, nous proposons dans ces travaux une stratégie de placement et d'ordonnancement de tâches permettant d'exécuter des applications temps-réel sur une architecture contenant des cœurs hétérogènes. Dans cette thèse, nous avons choisi d'aborder cette problématique de façon incrémentale pour traiter progressivement les problèmes liés aux contraintes temps-réel, énergétique et de communications. Tout d'abord, nous nous intéressons particulièrement à l'ordonnancement des tâches sur une architecture mono-cœur. Nous proposons une stratégie d'ordonnancement basée sur le regroupement des tâches dans des packs pour pouvoir calculer facilement les nouveaux paramètres des tâches afin de réobtenir la faisabilité du système. Puis, nous l'avons étendu pour traiter le cas de l'ordonnancement sur une architecture multi-cœurs homogènes. Finalement, une extension de ce dernier sera réalisée afin d'arriver à l'objectif principal qui est l'ordonnancement des tâches pour les architectures hétérogènes. L'idée est de prendre progressivement en compte des contraintes d'exécution de plus en plus complexes. Nous formalisons tous les problèmes en utilisant la formulation ILP afin de pouvoir produire des résultats optimaux. L'idée est de pouvoir situer nos solutions proposées par rapport aux solutions optimales produites par un solveur et par rapport aux autres algorithmes de l'état de l'art. Par ailleurs, la validation par simulation des stratégies proposées montre qu'elles engendrent un gain appréciable vis-à-vis des critères considérés importants dans les systèmes embarqués, notamment le coût de la communication entre cœurs et le taux de rejet des tâches. / The design of embedded real-time systems is developing more and more with the increasing integration of critical functionalities for monitoring applications, particularly in the biomedical, environmental, and home automation domains. The development of these systems faces various challenges, particularly in terms of minimizing energy consumption. Managing such fully autonomous embedded devices requires solving various problems related to the amount of energy available in the battery, the real-time scheduling of tasks that must be executed before their deadlines, the reconfiguration scenarios, especially in the case of adding tasks, and the communication constraint needed to ensure message exchange between cores, so as to ensure lasting autonomy until the next recharge while maintaining an acceptable level of quality of service for the processing system.
To address this problem, we propose in this work a new task placement and scheduling strategy for executing real-time applications on an architecture containing heterogeneous cores. In this thesis, we have chosen to tackle the problem in an incremental manner in order to deal progressively with the real-time, energy, and communication constraints. First, we focus on the scheduling of tasks for a single-core architecture. We propose a new scheduling strategy based on grouping tasks into packs, so that the new task parameters can easily be computed in order to restore the feasibility of the system. We then extend it to address task scheduling on a homogeneous multi-core architecture. Finally, this is extended further to achieve the main objective, which is the scheduling of tasks for heterogeneous architectures. The idea is to take increasingly complex execution constraints into account step by step. We formalize the proposed strategy as an optimization problem using integer linear programming (ILP) and compare the proposed solutions with the optimal results provided by the CPLEX solver. In addition, validation by simulation shows that the proposed strategies provide an appreciable gain with respect to criteria considered important in embedded systems, in particular the cost of inter-core communication and the rejection rate of new tasks.
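A minimal version of the kind of ILP used to obtain such reference optimal solutions could look as follows, with binary variables assigning each task to a core, per-core utilization bounds, and an inter-core communication cost in the objective. This is a simplified illustration under assumed notation (C_ij: WCET of task i on core j, T_i: period, U_j: schedulable utilization bound of core j, c_ik: data exchanged between communicating tasks i and k); the thesis's actual formulation also covers energy and reconfiguration constraints.

```latex
\[
\begin{aligned}
\min_{x,\,z}\;& \sum_{(i,k)\in E} c_{ik}\Bigl(1-\sum_{j\in J} z_{ikj}\Bigr)
  && \text{(inter-core communication cost)}\\
\text{s.t. }& \sum_{j\in J} x_{ij}=1 && \forall i\in\mathcal{T}
  && \text{(each task mapped to exactly one core)}\\
& \sum_{i\in\mathcal{T}} \frac{C_{ij}}{T_i}\,x_{ij}\le U_j && \forall j\in J
  && \text{(per-core utilization bound)}\\
& z_{ikj}\le x_{ij},\quad z_{ikj}\le x_{kj} && \forall (i,k)\in E,\ j\in J
  && \text{(linearization of ``same core'')}\\
& x_{ij}\in\{0,1\},\quad z_{ikj}\in[0,1]
\end{aligned}
\]
```

Because the objective rewards making the z variables large, the solver sets z_ikj = 1 exactly when tasks i and k share core j, so the standard lower-bounding linearization constraints are not needed.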
120

Massively Parallel Cartesian Discrete Ordinates Method for Neutron Transport Simulation / SN cartésien massivement parallèle pour la simulation neutronique

Moustafa, Salli 15 December 2015 (has links)
La simulation haute-fidélité des coeurs de réacteurs nucléaires nécessite une évaluation précise du flux neutronique dans le coeur du réacteur. Ce flux est modélisé par l'équation de Boltzmann ou équation du transport neutronique. Dans cette thèse, on s'intéresse à la résolution de cette équation par la méthode des ordonnées discrètes (SN) sur des géométries cartésiennes. Cette méthode fait intervenir un schéma d'itérations à source, incluant un algorithme de balayage sur le domaine spatial qui regroupe l'essentiel des calculs effectués. Compte tenu du très grand volume de calcul requis par la résolution de l'équation de Boltzmann, de nombreux travaux antérieurs ont été consacrés à l'utilisation du calcul parallèle pour la résolution de cette équation. Jusqu'ici, ces algorithmes de résolution parallèles de l'équation du transport neutronique ont été conçus en considérant la machine cible comme une collection de processeurs mono-coeurs indépendants, et ne tirent donc pas explicitement profit de la hiérarchie mémoire et du parallélisme multi-niveaux présents sur les super-calculateurs modernes. Ainsi, la première contribution de cette thèse concerne l'étude et la mise en oeuvre de l'algorithme de balayage sur les super-calculateurs massivement parallèles modernes. Notre approche combine à la fois la vectorisation par des techniques de la programmation générique en C++, et la programmation hybride par l'utilisation d'un support d'exécution à base de tâches: PaRSEC. Nous avons démontré l'intérêt de cette approche grâce à des modèles de performances théoriques, permettant également de prédire le partitionnement optimal. Par ailleurs, dans le cas de la simulation des milieux très diffusifs tels que le coeur d'un REP, la convergence du schéma d'itérations à source est très lente. Afin d'accélérer sa convergence, nous avons implémenté un nouvel algorithme (PDSA), adapté à notre implémentation hybride. La combinaison de ces techniques nous a permis de concevoir une version massivement parallèle du solveur SN Domino. Les performances de la partie Sweep du solveur atteignent 33.9% de la performance crête théorique d'un super-calculateur à 768 cores. De plus, un calcul critique d'un réacteur de type REP 900MW à 26 groupes d'énergie mettant en jeu 10^12 DDLs a été résolu en 46 minutes sur 1536 coeurs. / High-fidelity nuclear reactor core simulations require a precise knowledge of the neutron flux inside the reactor core. This flux is modeled by the linear Boltzmann equation, also called the neutron transport equation. In this thesis, we focus on solving this equation using the discrete ordinates method (SN) on Cartesian meshes. This method involves a source iteration scheme including a sweep over the spatial mesh that gathers the vast majority of the computations in the SN method. Due to the large amount of computation required to solve the Boltzmann equation, numerous research works have focused on optimizing the time to solution by developing parallel algorithms for the transport equation. However, these algorithms were designed by considering a super-computer as a collection of independent cores, and therefore do not explicitly take into account the memory hierarchy and multi-level parallelism available inside modern super-computers.
Therefore, we first proposed a strategy for designing an efficient parallel implementation of the sweep operation on modern architectures, combining the SIMD paradigm, through C++ generic programming techniques, with an emerging task-based runtime system: PaRSEC. We demonstrated the need for such an approach using theoretical performance models that also predict optimal partitionings. We then studied the challenge of converging the source iteration scheme in highly diffusive media such as PWR cores, and implemented and studied the convergence of a new acceleration scheme (PDSA) that naturally suits our hybrid parallel implementation. The combination of all these techniques has enabled us to develop a massively parallel version of the SN Domino solver. It is capable of tackling the challenges posed by neutron transport simulations and compares favorably with state-of-the-art solvers such as Denovo. The performance of the PaRSEC implementation of the sweep operation reaches 6.1 Tflop/s on 768 cores, corresponding to 33.9% of the theoretical peak performance of this set of computational resources. For a typical 26-group PWR calculation involving 1.02×10^12 DoFs, the time to solution required by the Domino solver is 46 minutes using 1536 cores.
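The sweep that dominates this computation can be illustrated, in its simplest serial form, by a one-dimensional diamond-difference discrete-ordinates solver with source iteration (isotropic scattering, vacuum boundaries, made-up cross sections). This toy version only shows the structure, directional sweeps nested inside a scattering-source loop, that the thesis parallelizes in 3-D over thousands of cores.

```python
import numpy as np

def sn_slab(nx=100, n_angles=8, sigma_t=1.0, sigma_s=0.5, q_ext=1.0,
            length=10.0, tol=1e-8, max_iters=500):
    """1-D diamond-difference SN solver with source iteration (vacuum boundaries)."""
    dx = length / nx
    mu, w = np.polynomial.legendre.leggauss(n_angles)    # Gauss-Legendre ordinates and weights
    phi = np.zeros(nx)                                   # scalar flux
    for _ in range(max_iters):
        q = 0.5 * (sigma_s * phi + q_ext)                # isotropic scattering + fixed source
        phi_new = np.zeros(nx)
        for n in range(n_angles):
            psi_in = 0.0                                 # vacuum incoming angular flux
            cells = range(nx) if mu[n] > 0 else range(nx - 1, -1, -1)
            for i in cells:                              # the sweep along direction mu[n]
                am = abs(mu[n])
                psi_cell = (q[i] + 2.0 * am / dx * psi_in) / (sigma_t + 2.0 * am / dx)
                psi_out = 2.0 * psi_cell - psi_in        # diamond-difference closure
                phi_new[i] += w[n] * psi_cell
                psi_in = psi_out
        if np.max(np.abs(phi_new - phi)) < tol:          # source iteration converged
            return phi_new
        phi = phi_new
    return phi

print(sn_slab()[:5])
```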
