Global ETD Search

11	Energy and Design Cost Efficiency for Streaming Applications on Systems-on-Chip Zhu, Jun January 2009 (has links) With the increasing capacity of today's integrated circuits, a number of heterogeneous system-on-chip (SoC) architectures in embedded systems have been proposed. In order to achieve energy and design cost efficient streaming applications on these systems, new design space exploration frameworks and performance analysis approaches are required. This thesis considers three state-of-the-art SoC architectures, i.e., the multi-processor SoCs (MPSoCs) with network-on-chip (NoC) communication, the hybrid CPU/FPGA architectures, and the run-time reconfigurable (RTR) FPGAs. The main topic of the author's research is to model and capture the application scheduling, architecture customization, and buffer dimensioning problems, according to the real-time requirement. Since these problems are NP-complete, heuristic algorithms and a constraint programming solver are used to compute a solution. For NoC communication based MPSoCs, an approach to optimize the real-time streaming applications with customized processor voltage-frequency levels and memory sizes is presented. A multi-clocked synchronous model of computation (MoC) framework is proposed in heterogeneous timing analysis and energy estimation. Using heuristic searching (i.e., greedy and taboo search), the experiments show an energy reduction (up to 21%) without any loss in application throughput compared with an ad-hoc approach. On hybrid CPU/FPGA architectures, the buffer minimization scheduling of real-time streaming applications is addressed. 
Based on event models, the problem has been formalized declaratively as constraint-based scheduling, and solved by the public domain constraint solver Gecode. Compared with the traditional PAPS method, the proposed method needs significantly smaller buffers (2.4% of PAPS in the best case), while high throughput guarantees can still be achieved. Furthermore, a novel compile-time analysis approach based on iterative timing phases is proposed for run-time reconfigurations in adaptive real-time streaming applications on RTR FPGAs. Finally, the reconfigurations analysis and design trade-offs analysis capabilities of the proposed framework have been exemplified with experiments on both example and industrial applications. / Andres Streaming applications Systems-on-chip Synchronous dataflow energy efficiency buffer minimization performance analysis Informatik, data- och systemvetenskap
12	Des réseaux de processus cyclo-statiques à la génération de code pour le pipeline multi-dimensionnel / From Cyclo-Static Process Networks to Code Generation for Multidimensional Software Pipelining Fellahi, Mohammed 22 April 2011 (has links) Les applications de flux de données sont des cibles importantes de l’optimisation de programme en raison de leur haute exigence de calcul et la diversité de leurs domaines d’application: communication, systèmes embarqués, multimédia, etc. L’un des problèmes les plus importants et difficiles dans la conception des langages de programmation destinés à ce genre d’applications est comment les ordonnancer à grain fin afin d’exploiter les ressources disponibles de la machine. Dans cette thèse on propose un "framework" pour l’ordonnancement à grain fin des applications de flux de données et des boucles imbriquées en général. Premièrement on essaye de paralléliser le nombre maximum de boucles en appliquant le pipeline logiciel. Après on merge le prologue et l’épilogue de chaque boucle (phase) parallélisée pour éviter l’augmentation de la taille du code. Ce processus est un pipeline multidimensionnel, quelques occurrences (ou instructions) sont décalées par des itérations de la boucle interne et d’autres occurrences (instructions) par des itérations de la boucle externe. Les expériences montrent que l’application de cette technique permet l’amélioration des performances, extraction du parallélisme sans augmenter la taille du code, à la fois dans le cas des applications de flux de données et des boucles imbriquées en général. / Applications based on streams, ordered sequences of data values, are important targets of program optimization because of their high computational requirements and the diversity of their application domains: communication, embedded systems, multimedia, etc. 
One of the most important and difficult problems in special-purpose stream language design and implementation is how to schedule these applications in a fine-grained way to exploit available machine resources. In this thesis we propose a framework for fine-grained scheduling of streaming applications and nested loops in general. First, we try to pipeline steady-state phases (inner loops) by finding the repeated kernel pattern and executing actor occurrences in parallel as much as possible. Then we merge the kernel prolog and epilog of pipelined phases to move them out of the outer loop. Merging the kernel prolog and epilog means that we shift actor occurrences, or instructions, from one phase iteration to another and from one outer loop iteration to another, a multidimensional shifting. Experiments show that our framework can improve performance and parallelism extraction without increasing the code size, in streaming applications and nested loops in general. Applications de flux de données Boucles imbriquées Pipeline logiciel Ordonnancement multidimensionnel Streaming applications Nested loops Multidimensional scheduling Software Pipelining
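The prolog/kernel/epilog structure mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's framework: the two stages `stage_a` and `stage_b` are hypothetical, and Python runs them one after the other, so the sketch only shows how a software-pipelined loop reorders work across iterations (one-dimensional shifting only).

```python
def stage_a(x):
    # Hypothetical first stage of the loop body.
    return x + 1


def stage_b(x):
    # Hypothetical second stage; depends on the result of stage_a.
    return x * 2


def sequential(data):
    # Reference loop: both stages of iteration i run back to back.
    return [stage_b(stage_a(x)) for x in data]


def pipelined(data):
    # Software-pipelined version: in the kernel, stage_b of iteration i-1
    # and stage_a of iteration i sit in the same loop body, so a machine
    # with enough resources could issue them in parallel.
    out = []
    if not data:
        return out
    a = stage_a(data[0])            # prolog: fill the pipeline
    for i in range(1, len(data)):   # kernel: the repeated steady-state pattern
        out.append(stage_b(a))      # iteration i-1, second stage
        a = stage_a(data[i])        # iteration i, first stage
    out.append(stage_b(a))          # epilog: drain the pipeline
    return out
```

Both functions return the same values; only the order in which stage occurrences are issued differs.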
13	Memory Optimizations for Distributed Stream-based Applications Harel, Nissim 01 November 2006 (has links) Distributed stream-based applications manage large quantities of data and exhibit unique production and consumption patterns that set them apart from general-purpose applications. This dissertation examines possible ways of creating more efficient memory management schemes. Specifically, it looks at the memory reclamation problem. It takes advantage of special traits of streaming applications to extend the definition of the garbage collection problem for those applications and include not only data items that are not reachable but also items that have no effect on the final outcome of the application. Streaming applications typically fully process only a portion of the data, and resources directed towards the remaining data items (i.e., those that don't affect the final outcome) can be viewed as wasted resources that should be minimized. Two complementary approaches are suggested: (1) Garbage Identification and (2) Adaptive Resource Utilization. Garbage Identification is concerned with an analysis of dynamic data dependencies to infer those items that the application is no longer going to access. Several garbage identification algorithms are examined. Each one of the algorithms uses a set of application properties (possibly distinct from one another) to reduce the memory consumption of the application. The performance of these garbage identification algorithms is compared to the performance of an ideal garbage collector, using a novel logging/post-mortem analyzer. The results indicate that the algorithms that achieve a low memory footprint (close to that of an ideal garbage collector) perform their garbage identification decisions locally; however, they base these decisions on best-effort global information obtained from other components of the distributed application. 
The Adaptive Resource Utilization (ARU) algorithm analyzes the dynamic relationships between the production and consumption of data items. It uses this information to infer the capacity of the system to process data items and adjusts data generation accordingly. The ARU algorithm makes local capacity decisions based on best-effort global information. This algorithm is found to be as effective as the most successful garbage identification algorithm in reducing the memory footprint of stream-based applications, thus confirming the observation that using best-effort global information to perform local decisions is fundamental in reducing memory consumption for stream-based applications. Simulations Resource management Streaming applications Garbage collection Memory management Computer simulation Electronic data processing
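The idea of making a local garbage decision from best-effort global information can be sketched as follows. This is an illustration under stated assumptions, not one of the dissertation's algorithms: it assumes items carry integer timestamps and that each consumer reports a "low mark", the smallest timestamp it may still request. The names `identify_garbage` and `consumer_low_marks` are hypothetical.

```python
def identify_garbage(buffered_timestamps, consumer_low_marks):
    """Return the buffered timestamps that no known consumer can still request.

    buffered_timestamps: iterable of integer timestamps held in a local buffer.
    consumer_low_marks: dict mapping consumer name -> smallest timestamp that
        consumer may still ask for (best-effort, possibly stale information).
    """
    if not consumer_low_marks:
        # Without any information, conservatively keep everything.
        return []
    # An item is garbage only if it lies below every consumer's low mark.
    horizon = min(consumer_low_marks.values())
    return sorted(ts for ts in buffered_timestamps if ts < horizon)
```

A stale low mark makes the horizon smaller than it could be, so the decision stays safe but reclaims less memory.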
14	Scheduling for Reliability : complexity and Algorithms Dufossé, Fanny 06 September 2011 (has links) (PDF) This thesis deals with the mapping and scheduling of workflows. In this context, we consider unreliable platforms, with processors subject to failures. In the first part, we consider a particular model of streaming applications: filtering services. In this context, we aim at minimizing period and latency. We first neglect communication costs. In this model, we study scheduling problems on homogeneous and heterogeneous platforms. Then, the impact of communication costs on the scheduling problems of a filtering application is studied. Finally, we consider the scheduling problem of such an application on a chain of processors. The theoretical complexity of every variant of this problem is established. This filtering property can model the reliability of processors: some computations complete successfully, while the results of others are lost. We consider the most frequent failure type: transient failures. We aim at efficient and reliable schedules. The complexity of many variants of this problem is established. Two heuristics are proposed and compared using simulations. Even if transient failures are the most common failures in classical grids, some particular types of platform are more affected by other types of problems. Desktop grids are especially unstable. In this context, we want to execute iterative applications. All tasks are executed, then a synchronization occurs, and so on. Two variants of this problem are considered: applications of independent tasks, and applications where all tasks need to be executed at the same speed. In both cases, the problem is first studied theoretically, then heuristics are proposed and compared using simulations. [INFO:INFO_OH] Computer Science/Other Streaming applications Multicriteria optimization Heuristics Linear programs Heterogeneous platforms Complexity results Desktop grids Transient failures Reliability
15	Energy and Design Cost Efficiency for Streaming Applications on Systems-on-Chip Zhu, Jun January 2009 (has links) With the increasing capacity of today's integrated circuits, a number of heterogeneous system-on-chip (SoC) architectures in embedded systems have been proposed. In order to achieve energy and design cost efficient streaming applications on these systems, new design space exploration frameworks and performance analysis approaches are required. This thesis considers three state-of-the-art SoC architectures, i.e., the multi-processor SoCs (MPSoCs) with network-on-chip (NoC) communication, the hybrid CPU/FPGA architectures, and the run-time reconfigurable (RTR) FPGAs. The main topic of the author's research is to model and capture the application scheduling, architecture customization, and buffer dimensioning problems, according to the real-time requirement. Since these problems are NP-complete, heuristic algorithms and a constraint programming solver are used to compute a solution. For NoC communication based MPSoCs, an approach to optimize the real-time streaming applications with customized processor voltage-frequency levels and memory sizes is presented. A multi-clocked synchronous model of computation (MoC) framework is proposed in heterogeneous timing analysis and energy estimation. Using heuristic searching (i.e., greedy and taboo search), the experiments show an energy reduction (up to 21%) without any loss in application throughput compared with an ad-hoc approach. On hybrid CPU/FPGA architectures, the buffer minimization scheduling of real-time streaming applications is addressed. 
Based on event models, the problem has been formalized declaratively as constraint-based scheduling, and solved by the public domain constraint solver Gecode. Compared with the traditional PAPS method, the proposed method needs significantly smaller buffers (2.4% of PAPS in the best case), while high throughput guarantees can still be achieved. Furthermore, a novel compile-time analysis approach based on iterative timing phases is proposed for run-time reconfigurations in adaptive real-time streaming applications on RTR FPGAs. Finally, the reconfigurations analysis and design trade-offs analysis capabilities of the proposed framework have been exemplified with experiments on both example and industrial applications. / Andres Streaming applications Systems-on-chip Synchronous dataflow energy efficiency buffer minimization performance analysis Computer and Information Sciences Data- och informationsvetenskap
16	On The Issues Of Supporting On-Demand Streaming Application Over Peer-to-Peer Networks Kalapriya, K 06 1900 (has links) Bandwidth and resource constraints at the server side are a limitation for the deployment of streaming media applications. Resource constraints at the server side often lead to saturation of resources during sudden increases in requests. End System Multicast (ESM) is used to overcome the problem of resource saturation. Resources such as storage and bandwidth available at the end systems are utilized to deliver streaming media. In ESM, the end systems (also known as peers) form a network commonly known as a Peer-to-Peer (P2P) network. The peers that receive the stream in turn act as routable components and forward the stream to serve other requests. These peers do not possess server-like characteristics. The peers differ from the server in the following ways: (a) they join and exit the system at will and (b) unlike servers, they are not a reliable source of media. This induces instability in the network. Therefore, a streaming media solution over such an unstable peer network is a challenging task. Two kinds of media streaming are supported by ESM, namely, live streaming media and on-demand streaming media. ESM is well studied for live streaming media. In this thesis we explore the effectiveness of using ESM to support on-demand streaming media over a P2P network. There are two major issues in supporting on-demand streaming video: (a) unlike live streaming, every request should be served from the beginning of the stream and (b) instability in the network due to peer characteristics (particularly the transience of peers). In our work, late-arriving peers can join the existing stream if the initial segments can be served to these peers. In this scheme, a single stream is used to serve multiple requests and therefore the throughput increases. We propose a patching mechanism in which the initial segments of media are temporarily cached in the peers as patches. 
As peers join, they contribute storage, and this storage space is used to cache the initial segments. The patching mechanism is controlled by the Expanding Window Control Protocol (EWCP). EWCP defines a “virtual window” that logically represents the aggregated cache contributed by the peers. The window expands as the peers contribute more resources. The larger the window, the more clients can be served by a single stream. A GAP is formed when contiguous segments of media are lost. A GAP limits the expansion of the virtual window. We explore the conditions that lead to the formation of a GAP. A GAP is formed due to the transience and non-cooperation of peers. The transience of peers, coupled with the real-time nature of the application, requires fast failure recovery algorithms and methods to overcome the loss of media segments. We propose an efficient peer management protocol that provides constant failure recovery time. We explore several redundancy techniques to overcome the problem of loss of video segments during transience of peers. Peer characteristics (duration, resource contribution, etc.) have a significant impact on performance. The design of a peer management protocol must include peer characteristics to increase its effectiveness. In this thesis we present a detailed analysis of the relationship between peer characteristics and performance. Our results indicate that peer characteristics and the real-time nature of the application control the performance of the system. Based on our study, we propose algorithms that consider these parameters and increase the performance of the system. Finally, we bring all the pieces of our work together into a comprehensive system architecture for streaming media over P2P networks. We have implemented a prototype Black-Board System (BBS), a distance program utility that reflects the main concepts of our work. We show that algorithms that exploit peer characteristics perform well in P2P networks. 
Peer to Peer Networks P2P Networks End System Multicast (ESM) On-Demand Streaming Applications Black Board System P2P Networks - Clustering Algorithms Expanding Window Control Protocol (EWCP) Computer Science
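The virtual window and the effect of a GAP can be sketched in a few lines of Python. This is a simplified reading of the abstract, not the EWCP protocol itself: it assumes the media is split into numbered segments starting at 0, that the window is the run of contiguous initial segments currently cached by peers, and that a late peer can join when every segment it has missed is still cached. The function names are hypothetical.

```python
def virtual_window(cached_segments):
    """Count the contiguous initial segments (0, 1, 2, ...) held in peer caches.

    cached_segments: set of segment indices cached by at least one peer.
    A missing segment (a GAP) stops the window from expanding past it.
    """
    window = 0
    while window in cached_segments:
        window += 1
    return window


def can_join(stream_position, cached_segments):
    """A late peer can join the running stream if segments 0..stream_position-1,
    which it has missed, can all be patched from the peer caches."""
    return stream_position <= virtual_window(cached_segments)
```

For example, with segments {0, 1, 2, 4, 5} cached, the loss of segment 3 limits the window to 3, so a peer arriving when the stream is at segment 5 cannot be patched from the caches.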
17	Modèles de calculs flot de données avec paramètres entiers et booléens. Modélisation - Analyses - Mise en oeuvre / Boolean Parametric Data Flow Modeling - Analyses - Implementation Bempelis, Evangelos 26 February 2015 (has links) Les applications de gestion de flux sont responsables de la majorité des calculs des systèmes embarqués (vidéo conférence, vision par ordinateur). Leurs exigences de haute performance rendent leur mise en œuvre parallèle nécessaire. Par conséquent, il est de plus en plus courant que les systèmes embarqués modernes incluent des processeurs multi-cœurs qui permettent un parallélisme massif. La mise en œuvre des applications de gestion de flux sur des multi-cœurs est difficile à cause de leur complexité, qui tend à augmenter, et de leurs exigences strictes à la fois qualitatives (robustesse, fiabilité) et quantitatives (débit, consommation d'énergie). Ceci est observé dans l'évolution de codecs vidéo qui ne cessent d'augmenter en complexité, tandis que leurs exigences de performance demeurent les mêmes. Les modèles de calcul (MdC) flot de données ont été développés pour faciliter la conception de ces applications qui sont typiquement composées de filtres qui échangent des flux de données via des liens de communication. Ces modèles fournissent une représentation intuitive des applications de gestion de flux, tout en exposant le parallélisme de tâches de l'application. En outre, ils fournissent des analyses statiques pour la vivacité et l'exécution en mémoire bornée. Cependant, les applications de gestion de flux modernes comportent des filtres qui échangent des quantités de données variables, et des liens de communication qui peuvent être activés / désactivés. 
Dans cette thèse, nous présentons un nouveau MdC flot de données, le Boolean Parametric Data Flow (BPDF), qui permet le paramétrage de la quantité de données échangées entre les filtres en utilisant des paramètres entiers et l'activation et la désactivation de liens de communication en utilisant des paramètres booléens. De cette manière, BPDF est capable d'exprimer des applications plus complexes, comme les décodeurs vidéo modernes. Malgré l'augmentation de l'expressivité, les applications BPDF restent statiquement analysables pour la vivacité et l'exécution en mémoire bornée. Cependant, l'expressivité accrue complique grandement la mise en œuvre. Les paramètres entiers entraînent des dépendances de données de type paramétrique et les paramètres booléens peuvent désactiver des liens de communication et ainsi éliminer des dépendances de données. Pour cette raison, nous proposons un cadre d'ordonnancement qui produit des ordonnancements de type « aussi tôt que possible » (ASAP) pour un placement statique donné. Il utilise des contraintes d'ordonnancement, soit issues de l'application (dépendance de données) ou de l'utilisateur (optimisations d'ordonnancement). Les contraintes sont analysées pour la vivacité et, si possible, simplifiées. De cette façon, notre cadre permet une grande variété de politiques d'ordonnancement, tout en garantissant la vivacité de l'application. Enfin, le calcul du débit d'une application est important tant avant que pendant l'exécution. Il permet de vérifier que l'application satisfait ses exigences de performance et il permet de prendre des décisions d'ordonnancement à l'exécution qui peuvent améliorer la performance ou la consommation d'énergie. Nous traitons ce problème en trouvant des expressions paramétriques pour le débit maximum d'un sous-ensemble de BPDF. Enfin, nous proposons un algorithme qui calcule une taille des buffers suffisante pour que l'application BPDF ait un débit maximum. 
/ Streaming applications are responsible for the majority of the computation load in many embedded systems (video conferencing, computer vision etc). Their high performance requirements make parallel implementations a necessity. Hence, more and more modern embedded systems include many-core processors that allow massive parallelism. Parallel implementation of streaming applications on many-core platforms is challenging because of their complexity, which tends to increase, and their strict requirements both qualitative (e.g., robustness, reliability) and quantitative (e.g., throughput, power consumption). This is observed in the evolution of video codecs that keep increasing in complexity, while their performance requirements remain the same or even increase. Data flow models of computation (MoCs) have been developed to facilitate the design process of such applications, which are typically composed of filters exchanging streams of data via communication links. Data flow MoCs provide an intuitive representation of streaming applications, while exposing the available parallelism of the application. Moreover, they provide static analyses for liveness and boundedness. However, modern streaming applications feature filters that exchange variable amounts of data, and communication links that are not always active. In this thesis, we present a new data flow MoC, the Boolean Parametric Data Flow (BPDF), that allows parametrization of the amount of data exchanged between the filters using integer parameters and the enabling and disabling of communication links using boolean parameters. In this way, BPDF is able to capture more complex streaming applications, like video decoders. Despite the increase in expressiveness, BPDF applications remain statically analyzable for liveness and boundedness. However, increased expressiveness greatly complicates implementation. 
Integer parameters result in parametric data dependencies, and the boolean parameters can disable communication links, effectively removing data dependencies. We propose a scheduling framework that facilitates the scheduling of BPDF applications. Our scheduling framework produces as-soon-as-possible (ASAP) schedules for a given static mapping. It takes as input scheduling constraints that derive either from the application (data dependencies) or from the user (schedule optimizations). The constraints are analyzed for liveness and, if possible, simplified. In this way, our framework provides flexibility, while guaranteeing the liveness of the application. Finally, calculation of the throughput of an application is important both at compile-time and at run-time. It makes it possible to verify at compile-time that the application meets its performance requirements and to take scheduling decisions at run-time that can improve performance or power consumption. We approach this problem by finding parametric throughput expressions for the maximum throughput of a subset of BPDF graphs. Finally, we provide an algorithm that calculates sufficient buffer sizes for the BPDF graph to operate at maximum throughput. Modèles de programmation Flots de données Systèmes embarqués Ordonnancement Débit Applications de gestion de flux Programming models Dataflow Embedded systems Scheduling Throughput Streaming applications 004
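Static boundedness analysis in data flow models typically rests on balance equations: on every link, the tokens produced per iteration must equal the tokens consumed. The sketch below solves these equations for the simpler case of fixed integer rates (classical synchronous data flow), not for BPDF's integer and boolean parameters. The function name and the edge format `(src, dst, produced, consumed)` are assumptions made for illustration.

```python
from fractions import Fraction
from math import gcd


def repetition_vector(actors, edges):
    """Solve the balance equations of a connected fixed-rate data flow graph.

    actors: list of actor names.
    edges: list of (src, dst, produced, consumed) with positive integer rates.
    Returns a dict actor -> firings per iteration (smallest positive integers),
    or None if the rates are inconsistent or the graph is not connected.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        # Propagate relative firing rates along links in both directions.
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    if len(rate) != len(actors):
        return None  # some actor is unreachable: not handled in this sketch
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            return None  # no bounded-memory periodic schedule exists
    scale = 1
    for r in rate.values():
        # Least common multiple of all denominators.
        scale = scale * r.denominator // gcd(scale, r.denominator)
    return {actor: int(r * scale) for actor, r in rate.items()}
```

For a chain A -> B -> C with rates (2, 3) and (1, 2), the result is A: 3, B: 2, C: 1, so each iteration moves 6 tokens over the first link and 2 over the second.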
19	Architectures parallèles reconfigurables pour le traitement vidéo temps-réel / Parallel reconfigurable hardware architectures for video processing applications Ali, Karim Mohamed Abedallah 08 February 2018 (has links) Les applications vidéo embarquées sont de plus en plus intégrées dans des systèmes de transport intelligents tels que les véhicules autonomes. De nombreux défis sont rencontrés par les concepteurs de ces applications, parmi lesquels : le développement des algorithmes complexes, la vérification et le test des différentes contraintes fonctionnelles et non-fonctionnelles, la nécessité d’automatiser le processus de conception pour augmenter la productivité, la conception d’une architecture matérielle adéquate pour exploiter le parallélisme inhérent et pour satisfaire la contrainte temps-réel, réduire la puissance consommée pour prolonger la durée de fonctionnement avant de recharger le véhicule, etc. Dans ce travail de thèse, nous avons utilisé les technologies FPGAs pour relever certains de ces défis et proposer des architectures matérielles reconfigurables dédiées pour des applications embarquées de traitement vidéo temps-réel. Premièrement, nous avons implémenté une architecture parallèle flexible avec deux contributions principales : (1) Nous avons proposé un modèle générique de distribution/collecte de pixels pour résoudre le problème de transfert de données à haut débit à travers le système. Les paramètres du modèle requis sont tout d’abord définis puis la génération de l’architecture a été automatisée pour minimiser le temps de développement. (2) Nous avons appliqué une technique d’ajustement de la fréquence pour réduire la consommation d’énergie. Nous avons dérivé les équations nécessaires pour calculer le niveau maximum de parallélisme ainsi que les équations utilisées pour calculer la taille des FIFO pour le passage d’un domaine de l’horloge à un autre. 
Au fur et à mesure que le nombre de cellules logiques sur une seule puce FPGA augmente, passer à des niveaux d’abstraction plus élevés devient inévitable pour réduire la contrainte de « time-to-market » et augmenter la productivité des concepteurs. Pendant la phase de conception, l’espace de solutions architecturales présente un grand nombre d’alternatives avec des performances différentes en termes de temps d’exécution, ressources matérielles, consommation d’énergie, etc. Face à ce défi, nous avons développé l’outil ViPar avec deux contributions principales : (1) Un modèle empirique a été introduit pour estimer la consommation d’énergie basé sur l’utilisation du matériel (Slice et BRAM) et la fréquence de fonctionnement ; en plus de cela, nous avons dérivé les équations pour estimer les ressources matérielles et le temps d’exécution pour chaque alternative au cours de l’exploration de l’espace de conception. (2) En définissant les principales caractéristiques de l’architecture parallèle comme le niveau de parallélisme, le nombre de ports d’entrée/sortie, le modèle de distribution des pixels, ..., l’outil ViPar génère automatiquement l’architecture matérielle pour les solutions les plus pertinentes. Dans le cadre d’une collaboration industrielle avec NAVYA, nous avons utilisé l’outil ViPar pour implémenter une solution matérielle parallèle pour l’algorithme de stéréo matching « Multi-window Sum of Absolute Difference ». Dans cette implémentation, nous avons présenté un ensemble d’étapes pour modifier le code de description de haut niveau afin de l’adapter efficacement à l’implémentation matérielle. Nous avons également exploré l’espace de conception pour différentes alternatives en termes de performance, ressources matérielles, fréquence, et consommation d’énergie. Au cours de notre travail, les architectures matérielles ont été implémentées et testées expérimentalement sur la plateforme d’évaluation Xilinx Zynq ZC706. 
/ Embedded video applications are now involved in sophisticated transportation systems like autonomous vehicles. Designers of these applications face many challenges, among them: complex algorithms must be developed, verified and tested under tight time-to-market constraints; design automation tools are needed to increase design productivity; high computing rates are required, which calls for exploiting the inherent parallelism to satisfy the real-time constraints; and power consumption must be reduced to extend the operating duration before recharging the vehicle. In this thesis work, we used FPGA technologies to tackle some of these challenges and to design parallel reconfigurable hardware architectures for embedded video streaming applications. First, we implemented a flexible parallel architecture with two main contributions: (1) We proposed a generic model for pixel distribution/collection to tackle the problem of high-throughput data transfer through the system. The required model parameters were defined, then the architecture generation was automated to minimize the development time. (2) We applied frequency scaling as a technique for reducing power consumption. We derived the required equations for calculating the maximum level of parallelism as well as the ones used for calculating the depth of the inserted FIFOs for clock domain crossing. As the number of logic cells on a single FPGA chip increases, moving to higher abstraction design levels becomes inevitable to shorten the time-to-market and to increase the design productivity. During the design phase, it is common to have a space of design alternatives that differ from each other in hardware utilization, power consumption and performance. We developed the ViPar tool with two main contributions to tackle this problem: (1) An empirical model was introduced to estimate the power consumption based on the hardware utilization (Slice and BRAM) and the operating frequency. 
In addition, we derived the equations for estimating the hardware resources and the execution time for each point during the design space exploration. (2) Given the main characteristics of the parallel architecture, such as the parallelism level, the number of input/output ports and the pixel distribution pattern, the ViPar tool automatically generates the parallel architecture for the designs selected for implementation. In the context of an industrial collaboration, we used high-level synthesis tools to implement a parallel hardware architecture for the Multi-window Sum of Absolute Difference stereo matching algorithm. In this implementation, we presented a set of guiding steps for modifying the high-level description code so that it maps efficiently to hardware, and we explored the design space for different alternatives in terms of hardware resources, performance, frequency and power consumption. During the thesis work, our designs were implemented and tested experimentally on the Xilinx Zynq ZC706 (XC7Z045-FFG900) evaluation board. Applications vidéo temps-réel Synthèse de haut niveau Exploration de l’espace de conception FPGA Video streaming applications Parallel reconfigurable architectures High-level synthesis Design space exploration FPGA
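To make the idea of design space exploration with a utilization- and frequency-based power estimate more concrete, here is a small Python sketch. It is not ViPar and does not reproduce the thesis's empirical model or equations: the linear form of the power estimate, every coefficient, the per-element resource counts and the names `DesignPoint` and `explore` are hypothetical placeholders chosen only so the example runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    parallelism: int   # number of parallel processing elements (PEs)
    freq_mhz: float    # clock needed to sustain the pixel rate
    slices: int        # estimated slice usage
    brams: int         # estimated BRAM usage
    power_mw: float    # estimated power


def explore(pixel_rate_mpx, max_parallelism, max_freq_mhz,
            base_slices=1000, slices_per_pe=500, brams_per_pe=2,
            static_mw=100.0, mw_per_slice_mhz=0.001, mw_per_bram_mhz=0.02):
    """Enumerate parallelism levels and estimate resources and power.

    All default coefficients are made-up placeholders, not measured values.
    Assumes each PE consumes one pixel per clock cycle.
    """
    points = []
    for p in range(1, max_parallelism + 1):
        freq = pixel_rate_mpx / p  # frequency scaling: more PEs, slower clock
        if freq > max_freq_mhz:
            continue  # this design cannot meet the real-time constraint
        slices = base_slices + p * slices_per_pe
        brams = p * brams_per_pe
        # Assumed model: static power plus a term linear in
        # (resource utilization x frequency).
        power = static_mw + freq * (slices * mw_per_slice_mhz
                                    + brams * mw_per_bram_mhz)
        points.append(DesignPoint(p, freq, slices, brams, power))
    return points


# Hypothetical requirement: 200 Mpixel/s, at most 4 PEs, clock capped at 150 MHz.
candidates = explore(pixel_rate_mpx=200, max_parallelism=4, max_freq_mhz=150)
lowest_power = min(candidates, key=lambda pt: pt.power_mw)
```

Under these placeholder numbers, raising the parallelism level lowers the required clock and the estimated power while increasing resource usage, which is the kind of trade-off an exploration tool has to expose.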
20	A Novel Cloud Broker-based Resource Elasticity Management and Pricing for Big Data Streaming Applications Runsewe, Olubisi A. 28 May 2019 The pervasive availability of streaming data from various sources is driving today’s enterprises to acquire low-latency big data streaming applications (BDSAs) for extracting useful information. In parallel, recent advances in technology have made it easier to collect, process and store these data streams in the cloud. For most enterprises, gaining insights from big data is immensely important for maintaining competitive advantage. However, the majority of enterprises have difficulty managing the multitude of BDSAs and the complex issues cloud technologies present, giving rise to the incorporation of cloud service brokers (CSBs). Generally, the main objective of the CSB is to maintain the heterogeneous quality of service (QoS) of BDSAs while minimizing costs. In pursuing this goal, the cloud, despite its many desirable features, poses two major challenges for CSBs: resource prediction and resource allocation. First, most stream processing systems allocate a fixed amount of resources at runtime, which can lead to under- or over-provisioning as BDSA demands vary over time. Thus, obtaining an optimal trade-off between QoS violation and cost requires an accurate demand prediction methodology to prevent waste, degradation or shutdown of processing. Second, coordinating resource allocation and pricing decisions for self-interested BDSAs to achieve fairness and efficiency can be complex. This complexity is exacerbated by the recent introduction of containers. 
This dissertation addresses the cloud resource elasticity management issues for CSBs as follows: First, we provide two contributions to the resource prediction challenge; we propose a novel layered multi-dimensional hidden Markov model (LMD-HMM) framework for managing time-bounded BDSAs and a layered multi-dimensional hidden semi-Markov model (LMD-HSMM) to address unbounded BDSAs. Second, we present a container resource allocation mechanism (CRAM) for optimal workload distribution to meet the real-time demands of competing containerized BDSAs. We formulate the problem as an n-player non-cooperative game among a set of heterogeneous containerized BDSAs. Finally, we incorporate a dynamic incentive-compatible pricing scheme that coordinates the decisions of self-interested BDSAs to maximize the CSB’s surplus. Experimental results demonstrate the effectiveness of our approaches. Cloud Computing Big Data Resource Prediction Resource Allocation Stream Processing Game Theory Layered Hidden Markov Model Resource Management Container-Clusters Virtual Machines Streaming Applications Nash Equilibrium Queuing Theory Dynamic Pricing Resource scaling
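For readers unfamiliar with hidden Markov models as a prediction tool, the Python sketch below shows the basic mechanism on a plain, single-layer, discrete HMM: the standard forward (filtering) recursion followed by a one-step-ahead prediction of the hidden state. It is deliberately not the layered multi-dimensional model (LMD-HMM) or the semi-Markov variant proposed in the dissertation, and the two hidden states, the observation symbols and all probabilities are invented for illustration.

```python
def forward_filter(init, trans, emit, observations):
    """Return P(state_t | obs_1..t) at the last time step of a discrete HMM.

    init:  init[s]     = P(first state is s)
    trans: trans[i][j] = P(next state is j | current state is i)
    emit:  emit[s][o]  = P(observing symbol o | state is s)
    """
    n = len(init)
    # Weight the prior by the likelihood of the first observation.
    belief = [init[s] * emit[s][observations[0]] for s in range(n)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for obs in observations[1:]:
        # Propagate the belief through the transition model ...
        predicted = predict_next(belief, trans)
        # ... then correct it with the new observation and renormalize.
        belief = [predicted[j] * emit[j][obs] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


def predict_next(belief, trans):
    """One-step-ahead distribution over hidden states."""
    n = len(belief)
    return [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]


# Hypothetical model: state 0 = low demand, state 1 = high demand;
# observation 0 = low measured utilization, 1 = high measured utilization.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]

belief = forward_filter(init, trans, emit, observations=[0, 0, 1])
next_state = predict_next(belief, trans)  # e.g. input to a scaling decision
```

A broker could, under these assumptions, scale resources up when the predicted probability of the high-demand state crosses a threshold; how the dissertation actually turns predictions into scaling decisions is not stated in the abstract.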
