• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 19
  • 4
  • Tagged with
  • 23
  • 23
  • 10
  • 7
  • 6
  • 6
  • 6
  • 5
  • 5
  • 5
  • 5
  • 4
  • 4
  • 4
  • 4
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Energy and Design Cost Efficiency for Streaming Applications on Systems-on-Chip

Zhu, Jun January 2009 (has links)
<p>With the increasing capacity of today's integrated circuits, a number ofheterogeneous  system-on-chip (SoC)  architectures  in embedded  systemshave been proposed. In order to achieve energy and design cost efficientstreaming applications  on these  systems, new design  space explorationframeworks  and  performance  analysis  approaches are  required.   Thisthesis  considers three state-of-the-art  SoCs architectures,  i.e., themulti-processor SoCs (MPSoCs)  with network-on-chip (NoC) communication,the hybrid CPU/FPGA architectures, and the run-time reconfigurable (RTR)FPGAs.  The main topic of the  author?s research is to model and capturethe  application  scheduling,  architecture  customization,  and  bufferdimensioning  problems, according to  the real-time  requirement.  Sincethese  problems  are NP-complete,  heuristic  algorithms and  constraintprogramming solver are used to compute a solution.For  NoC  communication  based  MPSoCs,  an  approach  to  optimize  thereal-time    streaming    applications    with   customized    processorvoltage-frequency levels and memory  sizes is presented. A multi-clockedsynchronous  model  of  computation   (MoC)  framework  is  proposed  inheterogeneous  timing analysis and  energy estimation.   Using heuristicsearching  (i.e., greedy  and  taboo search),  the  experiments show  anenergy reduction (up to 21%)  without any loss in application throughputcompared with an ad-hoc approach.On hybrid CPU/FPGA architectures,  the buffer minimization scheduling ofreal-time streaming  applications is addressed.  Based  on event models,the  problem  has  been  formalized  decoratively  as  constraint  basescheduling,  and  solved  by  public domain  constraint  solver  Gecode.Compared  with  traditional  PAPS  method,  the  proposed  method  needssignificantly smaller  buffers (2.4%  of PAPS in  the best  case), whilehigh throughput guarantees can still be achieved.Furthermore, a  novel compile-time analysis approach  based on iterativetiming  phases is  proposed  for run-time  reconfigurations in  adaptivereal-time   streaming   applications  on   RTR   FPGAs.   Finally,   thereconfigurations analysis and design trade-offs analysis capabilities ofthe proposed  framework have been  exemplified with experiments  on bothexample and industrial applications.</p> / Andres
12

Des réseaux de processus cyclo-statiques à la génération de code pour le pipeline multi-dimensionnel / From Cyclo-Static Process Networks to Code Generation for Multidimensional Software Pipelining

Fellahi, Mohammed 22 April 2011 (has links)
Les applications de flux de données sont des cibles importantes de l’optimisation de programme en raison de leur haute exigence de calcul et la diversité de leurs domaines d’application: communication, systèmes embarqués, multimédia, etc. L’un des problèmes les plus importants et difficiles dans la conception des langages de programmation destinés à ce genre d’applications est comment les ordonnancer à grain fin à fin d’exploiter les ressources disponibles de la machine.Dans cette thèse on propose un "framework" pour l’ordonnancement à grain fin des applications de flux de données et des boucles imbriquées en général. Premièrement on essaye de paralléliser le nombre maximum de boucles en appliquant le pipeline logiciel. Après on merge le prologue et l’épilogue de chaque boucle (phase) parallélisée pour éviter l’augmentation de la taille du code. Ce processus est un pipeline multidimensionnel, quelques occurrences (ou instructions) sont décalées par des iterations de la boucle interne et d’autres occurrences (instructions) par des iterationsde la boucle externe. Les expériences montrent que l’application de cette technique permet l’amélioration des performances, extraction du parallélisme sans augmenter la taille du code, à la fois dans le cas des applications de flux des donnée et des boucles imbriquées en général. / Applications based on streams, ordered sequences of data values, are important targets of program optimization because of their high computational requirements and the diversity of their application domains: communication, embedded systems, multimedia, etc. One of the most important and difficult problems in special purpose stream language design and implementation is how to schedule these applications in a fine-grain way to exploit available machine resources In this thesis we propose a framework for fine-grain scheduling of streaming applications and nested loops in general. First, we try to pipeline steady state phases (inner loops), by finding the repeated kernel pattern, and executing actor occurrences in parallel as much as possible. Then we merge the kernel prolog and epilog of pipelined phases to move them out of the outer loop. Merging the kernel prolog and epilog means that we shift acotor occurrences, or instructions, from one phase iteration to another and from one outer loop iteration to another, a multidimensional shifting. Experimental shows that our framwork can imporove perfomance, prallelism extraction without increasing the code size, in streaming applications and nested loops in general.
13

Memory Optimizations for Distributed Stream-based Applications

Harel, Nissim 01 November 2006 (has links)
Distributed stream-based applications manage large quantities of data and exhibit unique production and consumption patterns that set them apart from general-purpose applications. This dissertation examines possible ways of creating more efficient memory management schemes. Specifically, it looks at the memory reclamation problem. It takes advantage of special traits of streaming applications to extend the definition of the garbage collection problem for those applications and include not only data items that are not reachable but also items that have no effect on the final outcome of the application. Streaming applications typically fully process only a portion of the data, and resources directed towards the remaining data items (i.e., those that dont affect the final outcome) can be viewed as wasted resources that should be minimized. Two complementary approaches are suggested: 1. Garbage Identification 2. Adaptive Resource Utilization Garbage Identification is concerned with an analysis of dynamic data dependencies to infer those items that the application is no longer going to access. Several garbage identification algorithms are examined. Each one of the algorithms uses a set of application properties (possibly distinct from one another) to reduce the memory consumption of the application. The performance of these garbage identification algorithms is compared to the performance of an ideal garbage collector, using a novel logging/post-mortem analyzer. The results indicate that the algorithms that achieve a low memory footprint (close to that of an ideal garbage collector) perform their garbage identification decisions locally; however, they base these decisions on best-effort global information obtained from other components of the distributed application. The Adaptive Resource Utilization (ARU) algorithm analyzes the dynamic relationships between the production and consumption of data items. It uses this information to infer the capacity of the system to process data items and adjusts data generation accordingly. The ARU algorithm makes local capacity decisions based on best-effort global information. This algorithm is found to be as effective as the most successful garbage identification algorithm in reducing the memory footprint of stream-based applications, thus confirming the observation that using best-effort global information to perform local decisions is fundamental in reducing memory consumption for stream-based applications.
14

Scheduling for Reliability : complexity and Algorithms

Dufossé, Fanny 06 September 2011 (has links) (PDF)
This thesis deals with the mapping and the scheduling of workflows. In this context, we consider unreliable platforms, with processors subject to failures. In a first part, we consider a particular model of streaming applications : the filtering services. In this context, we aim at minimizing period and latency. We first neglect communication costs. In this model, we study scheduling problems on homogeneous and heterogeneous platforms. Then, the impact of communication costs on scheduling problems of a filtering application is studied. Finally, we consider the scheduling problem of such an application on a chain of processors. The theoretical complexity of any variant of this problem is proved. This filtering property can model the reliability of processors. The results of some computations are successfully computed, and some other ones are lost. We consider the more frequent failure types : transient failures. We aim efficient and reliable schedules. The complexity of many variants of this problem is proved. Two heuristics are proposed and compared using using simulations. Even if transient failures are the most common failures in classical grids, some particular type of platform are more concerned by other type of problems. Desktop grids are especially unstable. In this context, we want to execute iterative applications. All tasks are executed, then a synchronization occurs, and so on. Two variants of this problem are considered : applicationsof independent tasks, and applications where all tasks need to be executed at same speed. In both cases, the problem is first theoretically studied, then heuristics are proposed and compared using simulations.
15

Energy and Design Cost Efficiency for Streaming Applications on Systems-on-Chip

Zhu, Jun January 2009 (has links)
With the increasing capacity of today's integrated circuits, a number ofheterogeneous  system-on-chip (SoC)  architectures  in embedded  systemshave been proposed. In order to achieve energy and design cost efficientstreaming applications  on these  systems, new design  space explorationframeworks  and  performance  analysis  approaches are  required.   Thisthesis  considers three state-of-the-art  SoCs architectures,  i.e., themulti-processor SoCs (MPSoCs)  with network-on-chip (NoC) communication,the hybrid CPU/FPGA architectures, and the run-time reconfigurable (RTR)FPGAs.  The main topic of the  author?s research is to model and capturethe  application  scheduling,  architecture  customization,  and  bufferdimensioning  problems, according to  the real-time  requirement.  Sincethese  problems  are NP-complete,  heuristic  algorithms and  constraintprogramming solver are used to compute a solution.For  NoC  communication  based  MPSoCs,  an  approach  to  optimize  thereal-time    streaming    applications    with   customized    processorvoltage-frequency levels and memory  sizes is presented. A multi-clockedsynchronous  model  of  computation   (MoC)  framework  is  proposed  inheterogeneous  timing analysis and  energy estimation.   Using heuristicsearching  (i.e., greedy  and  taboo search),  the  experiments show  anenergy reduction (up to 21%)  without any loss in application throughputcompared with an ad-hoc approach.On hybrid CPU/FPGA architectures,  the buffer minimization scheduling ofreal-time streaming  applications is addressed.  Based  on event models,the  problem  has  been  formalized  decoratively  as  constraint  basescheduling,  and  solved  by  public domain  constraint  solver  Gecode.Compared  with  traditional  PAPS  method,  the  proposed  method  needssignificantly smaller  buffers (2.4%  of PAPS in  the best  case), whilehigh throughput guarantees can still be achieved.Furthermore, a  novel compile-time analysis approach  based on iterativetiming  phases is  proposed  for run-time  reconfigurations in  adaptivereal-time   streaming   applications  on   RTR   FPGAs.   Finally,   thereconfigurations analysis and design trade-offs analysis capabilities ofthe proposed  framework have been  exemplified with experiments  on bothexample and industrial applications. / Andres
16

On The Issues Of Supporting On-Demand Streaming Application Over Peer-to-Peer Networks

Kalapriya, K 06 1900 (has links)
Bandwidth and resource constraints at the server side is a limitation for deployment of streaming media applications. Resource constraints at the server side often leads to saturation of resources during sudden increase in requests. End System Multicast (ESM) is used to overcome the problem of resource saturation. Resources such as storage, bandwidth available at the end systems are utilized to deliver streaming media. In ESM, the end-systems (also known as peers) form a network which is commonly known as Peer-to-Peer (P2P) network. These peers that receive the stream in turn act as routable components and forward the stream to other requests. These peers do not possess server like characteristics. The peers differ from the server in the following ways: (a) they join and exit the system at will (b) unlike servers, they are not reliable source of media. This induces instability in the network. Therefore, streaming media solution over such unstable peer network is a challenging task. Two kinds of media streaming is supported by ESM, namely, live streaming media and on-demand streaming media. ESM is well studied to support live streaming media. In this thesis we explore the effectiveness of using ESM to support on-demand streaming media over P2P network. There are two major issues to support on-demand streaming video.They are: (a)unlike live streaming, every request should be served from the beginning of the stream and (b) instability in the network due to peer characteristics (particularly transience of peers). In our work, late arriving peers can join the existing stream if the initial segments can be served to these peers. In this scheme, a single stream is used to serve multiple requests and therefore the throughput increases. We propose patching mechanism in which the initial segments of media are temporarily cached in the peers as patches. The peers as they join, contribute storage and this storage space is used to cache the initial segments. The patching mechanism is controlled by Expanding Window Control Protocol (EWCP). EWCP defines a “virtual window” that logically represents the aggregated cache contributed by the peers. The window expands as the peer contribute more resources. Larger the window size more is the number of clients that can be served by a single stream. GAP is formed when contiguous segments of media is lost. GAP limits the expansion of the virtual window. We explore the conditions that lead to the formation of GAP. GAP is formed due to the transience and non-cooperation of peers. Transience of peers coupled with real time nature of the application requires fast failure recovery algorithms and methods to overcome loss of media segments. We propose an efficient peer management protocol that provides constant failure recovery time. We explore several redundancy techniques to overcome the problem of loss of video segments during transience of peers. Peer characteristics (duration, resource contribution etc.) have significant impact on performance.The design of peer management protocol must include peer characteristics to increase its effectiveness. In this thesis we present detailed analysis of the relationship between the peer characteristics and performance. Our results indicate that peer characteristics and realtime nature of the application control the performance of the system. Based on our study, we propose algorithms that considers these parameters and increase the performance of the system. Finally, we bring all the pieces of our work together into a comprehensive system architecture for streaming media over P2P networks. We have implemented a prototype Black-Board System (BBS), a distance program utility that reflects the main concepts of our work. We show that algorithms that exploit peer characteristics performs well in P2P networks.
17

Modèles de calculs flot de données avec paramètres entiers et booléens. Modélisation - Analyses - Mise en oeuvre / Boolean Parametric Data Flow Modeling - Analyses - Implementation

Bempelis, Evangelos 26 February 2015 (has links)
Les applications de gestion de flux sont responsables de la majorité des calculs des systèmes embarqués (vidéo conférence, vision par ordinateur). Leurs exigences de haute performance rendent leur mise en œuvre parallèle nécessaire. Par conséquent, il est de plus en plus courant que les systèmes embarqués modernes incluent des processeurs multi-cœurs qui permettent un parallélisme massif. La mise en œuvre des applications de gestion de flux sur des multi-cœurs est difficile à cause de leur complexité, qui tend à augmenter, et de leurs exigences strictes à la fois qualitatives (robustesse, fiabilité) et quantitatives (débit, consommation d'énergie). Ceci est observé dans l'évolution de codecs vidéo qui ne cessent d'augmenter en complexité, tandis que leurs exigences de performance demeurent les mêmes. Les modèles de calcul (MdC) flot de données ont été développés pour faciliter la conception de ces applications qui sont typiquement composées de filtres qui échangent des flux de données via des liens de communication. Ces modèles fournissent une représentation intuitive des applications de gestion de flux, tout en exposant le parallélisme de tâches de l'application. En outre, ils fournissent des analyses statiques pour la vivacité et l'exécution en mémoire bornée. Cependant, les applications de gestion de flux modernes comportent des filtres qui échangent des quantités de données variables, et des liens de communication qui peuvent être activés / désactivés. Dans cette thèse, nous présentons un nouveau MdC flot de données, le Boolean Parametric Data Flow (BPDF), qui permet le paramétrage de la quantité de données échangées entre les filtres en utilisant des paramètres entiers et l'activation et la désactivation de liens de communication en utilisant des paramètres booléens. De cette manière, BPDF est capable de exprimer des applications plus complexes, comme les décodeurs vidéo modernes. Malgré l'augmentation de l'expressivité, les applications BPDF restent statiquement analysables pour la vivacité et l'exécution en mémoire bornée. Cependant, l'expressivité accrue complique grandement la mise en œuvre. Les paramètres entiers entraînent des dépendances de données de type paramétrique et les paramètres booléens peuvent désactiver des liens de communication et ainsi éliminer des dépendances de données. Pour cette raison, nous proposons un cadre d'ordonnancement qui produit des ordonnancements de type ``aussi tôt que possible'' (ASAP) pour un placement statique donné. Il utilise des contraintes d'ordonnancement, soit issues de l'application (dépendance de données) ou de l'utilisateur (optimisations d'ordonnancement). Les contraintes sont analysées pour la vivacité et, si possible, simplifiées. De cette façon, notre cadre permet une grande variété de politiques d'ordonnancement, tout en garantissant la vivacité de l'application. Enfin, le calcul du débit d'une application est important tant avant que pendant l'exécution. Il permet de vérifier que l'application satisfait ses exigences de performance et il permet de prendre des décisions d'ordonnancement à l'exécution qui peuvent améliorer la performance ou la consommation d'énergie. Nous traitons ce problème en trouvant des expressions paramétriques pour le débit maximum d'un sous-ensemble de BPDF. Enfin, nous proposons un algorithme qui calcule une taille des buffers suffisante pour que l'application BPDF ait un débit maximum. / Streaming applications are responsible for the majority of the computation load in many embedded systems (video conferencing, computer vision etc). Their high performance requirements make parallel implementations a necessity. Hence, more and more modern embedded systems include many-core processors that allow massive parallelism. Parallel implementation of streaming applications on many-core platforms is challenging because of their complexity, which tends to increase, and their strict requirements both qualitative (e.g., robustness, reliability) and quantitative (e.g., throughput, power consumption). This is observed in the evolution of video codecs that keep increasing in complexity, while their performance requirements remain the same or even increase. Data flow models of computation (MoCs) have been developed to facilitate the design process of such applications, which are typically composed of filters exchanging streams of data via communication links. Data flow MoCs provide an intuitive representation of streaming applications, while exposing the available parallelism of the application. Moreover, they provide static analyses for liveness and boundedness. However, modern streaming applications feature filters that exchange variable amounts of data, and communication links that are not always active. In this thesis, we present a new data flow MoC, the Boolean Parametric Data Flow (BPDF), that allows parametrization of the amount of data exchanged between the filters using integer parameters and the enabling and disabling of communication links using boolean parameters. In this way, BPDF is able to capture more complex streaming applications, like video decoders. Despite the increase in expressiveness, BPDF applications remain statically analyzable for liveness and boundedness. However, increased expressiveness greatly complicates implementation. Integer parameters result in parametric data dependencies and the boolean parameters disable communication links, effectively removing data dependencies. We propose a scheduling framework that facilitates the scheduling of BPDF applications. Our scheduling framework produces as soon as possible schedules for a given static mapping. It takes us input scheduling constraints that derive either from the application (data dependencies) or from the user (schedule optimizations). The constraints are analyzed for liveness and, if possible, simplified. In this way, our framework provides flexibility, while guaranteeing the liveness of the application. Finally, calculation of the throughput of an application is important both at compile-time and at run-time. It allows to verify at compile-time that the application meets its performance requirements and it allows to take scheduling decisions at run-time that can improve performance or power consumption. We approach this problem by finding parametric throughput expressions for the maximum throughput of a subset of BPDF graphs. Finally, we provide an algorithm that calculates sufficient buffer sizes for the BPDF graph to operate at maximum throughput.
18

Modèles de calculs flot de données avec paramètres entiers et booléens. Modélisation - Analyses - Mise en oeuvre / Boolean Parametric Data Flow Modeling - Analyses - Implementation

Bempelis, Evangelos 26 February 2015 (has links)
Les applications de gestion de flux sont responsables de la majorité des calculs des systèmes embarqués (vidéo conférence, vision par ordinateur). Leurs exigences de haute performance rendent leur mise en œuvre parallèle nécessaire. Par conséquent, il est de plus en plus courant que les systèmes embarqués modernes incluent des processeurs multi-cœurs qui permettent un parallélisme massif. La mise en œuvre des applications de gestion de flux sur des multi-cœurs est difficile à cause de leur complexité, qui tend à augmenter, et de leurs exigences strictes à la fois qualitatives (robustesse, fiabilité) et quantitatives (débit, consommation d'énergie). Ceci est observé dans l'évolution de codecs vidéo qui ne cessent d'augmenter en complexité, tandis que leurs exigences de performance demeurent les mêmes. Les modèles de calcul (MdC) flot de données ont été développés pour faciliter la conception de ces applications qui sont typiquement composées de filtres qui échangent des flux de données via des liens de communication. Ces modèles fournissent une représentation intuitive des applications de gestion de flux, tout en exposant le parallélisme de tâches de l'application. En outre, ils fournissent des analyses statiques pour la vivacité et l'exécution en mémoire bornée. Cependant, les applications de gestion de flux modernes comportent des filtres qui échangent des quantités de données variables, et des liens de communication qui peuvent être activés / désactivés. Dans cette thèse, nous présentons un nouveau MdC flot de données, le Boolean Parametric Data Flow (BPDF), qui permet le paramétrage de la quantité de données échangées entre les filtres en utilisant des paramètres entiers et l'activation et la désactivation de liens de communication en utilisant des paramètres booléens. De cette manière, BPDF est capable de exprimer des applications plus complexes, comme les décodeurs vidéo modernes. Malgré l'augmentation de l'expressivité, les applications BPDF restent statiquement analysables pour la vivacité et l'exécution en mémoire bornée. Cependant, l'expressivité accrue complique grandement la mise en œuvre. Les paramètres entiers entraînent des dépendances de données de type paramétrique et les paramètres booléens peuvent désactiver des liens de communication et ainsi éliminer des dépendances de données. Pour cette raison, nous proposons un cadre d'ordonnancement qui produit des ordonnancements de type ``aussi tôt que possible'' (ASAP) pour un placement statique donné. Il utilise des contraintes d'ordonnancement, soit issues de l'application (dépendance de données) ou de l'utilisateur (optimisations d'ordonnancement). Les contraintes sont analysées pour la vivacité et, si possible, simplifiées. De cette façon, notre cadre permet une grande variété de politiques d'ordonnancement, tout en garantissant la vivacité de l'application. Enfin, le calcul du débit d'une application est important tant avant que pendant l'exécution. Il permet de vérifier que l'application satisfait ses exigences de performance et il permet de prendre des décisions d'ordonnancement à l'exécution qui peuvent améliorer la performance ou la consommation d'énergie. Nous traitons ce problème en trouvant des expressions paramétriques pour le débit maximum d'un sous-ensemble de BPDF. Enfin, nous proposons un algorithme qui calcule une taille des buffers suffisante pour que l'application BPDF ait un débit maximum. / Streaming applications are responsible for the majority of the computation load in many embedded systems (video conferencing, computer vision etc). Their high performance requirements make parallel implementations a necessity. Hence, more and more modern embedded systems include many-core processors that allow massive parallelism. Parallel implementation of streaming applications on many-core platforms is challenging because of their complexity, which tends to increase, and their strict requirements both qualitative (e.g., robustness, reliability) and quantitative (e.g., throughput, power consumption). This is observed in the evolution of video codecs that keep increasing in complexity, while their performance requirements remain the same or even increase. Data flow models of computation (MoCs) have been developed to facilitate the design process of such applications, which are typically composed of filters exchanging streams of data via communication links. Data flow MoCs provide an intuitive representation of streaming applications, while exposing the available parallelism of the application. Moreover, they provide static analyses for liveness and boundedness. However, modern streaming applications feature filters that exchange variable amounts of data, and communication links that are not always active. In this thesis, we present a new data flow MoC, the Boolean Parametric Data Flow (BPDF), that allows parametrization of the amount of data exchanged between the filters using integer parameters and the enabling and disabling of communication links using boolean parameters. In this way, BPDF is able to capture more complex streaming applications, like video decoders. Despite the increase in expressiveness, BPDF applications remain statically analyzable for liveness and boundedness. However, increased expressiveness greatly complicates implementation. Integer parameters result in parametric data dependencies and the boolean parameters disable communication links, effectively removing data dependencies. We propose a scheduling framework that facilitates the scheduling of BPDF applications. Our scheduling framework produces as soon as possible schedules for a given static mapping. It takes us input scheduling constraints that derive either from the application (data dependencies) or from the user (schedule optimizations). The constraints are analyzed for liveness and, if possible, simplified. In this way, our framework provides flexibility, while guaranteeing the liveness of the application. Finally, calculation of the throughput of an application is important both at compile-time and at run-time. It allows to verify at compile-time that the application meets its performance requirements and it allows to take scheduling decisions at run-time that can improve performance or power consumption. We approach this problem by finding parametric throughput expressions for the maximum throughput of a subset of BPDF graphs. Finally, we provide an algorithm that calculates sufficient buffer sizes for the BPDF graph to operate at maximum throughput.
19

Architectures parallèles reconfigurables pour le traitement vidéo temps-réel / Parallel reconfigurable hardware architectures for video processing applications

Ali, Karim Mohamed Abedallah 08 February 2018 (has links)
Les applications vidéo embarquées sont de plus en plus intégrées dans des systèmes de transport intelligents tels que les véhicules autonomes. De nombreux défis sont rencontrés par les concepteurs de ces applications, parmi lesquels : le développement des algorithmes complexes, la vérification et le test des différentes contraintes fonctionnelles et non-fonctionnelles, la nécessité d’automatiser le processus de conception pour augmenter la productivité, la conception d’une architecture matérielle adéquate pour exploiter le parallélisme inhérent et pour satisfaire la contrainte temps-réel, réduire la puissance consommée pour prolonger la durée de fonctionnement avant de recharger le véhicule, etc. Dans ce travail de thèse, nous avons utilisé les technologies FPGAs pour relever certains de ces défis et proposer des architectures matérielles reconfigurables dédiées pour des applications embarquées de traitement vidéo temps-réel. Premièrement, nous avons implémenté une architecture parallèle flexible avec deux contributions principales : (1) Nous avons proposé un modèle générique de distribution/collecte de pixels pour résoudre le problème de transfert de données à haut débit à travers le système. Les paramètres du modèle requis sont tout d’abord définis puis la génération de l’architecture a été automatisée pour minimiser le temps de développement. (2) Nous avons appliqué une technique d’ajustement de la fréquence pour réduire la consommation d’énergie. Nous avons dérivé les équations nécessaires pour calculer le niveau maximum de parallélisme ainsi que les équations utilisées pour calculer la taille des FIFO pour le passage d’un domaine de l’horloge à un autre. Au fur et à mesure que le nombre de cellules logiques sur une seule puce FPGAaugmente, passer à des niveaux d’abstraction plus élevés devient inévitable pour réduire la contrainte de « time-to-market » et augmenter la productivité des concepteurs. Pendant la phase de conception, l’espace de solutions architecturales présente un grand nombre d’alternatives avec des performances différentes en termes de temps d’exécution, ressources matérielles, consommation d’énergie, etc. Face à ce défi, nous avons développé l’outil ViPar avec deux contributions principales : (1) Un modèle empirique a été introduit pour estimer la consommation d’énergie basé sur l’utilisation du matériel (Slice et BRAM) et la fréquence de fonctionnement ; en plus de cela, nous avons dérivé les équations pour estimer les ressources matérielles et le temps d’exécution pour chaque alternative au cours de l’exploration de l’espace de conception. (2) En définissant les principales caractéristiques de l’architecture parallèle comme le niveau de parallélisme, le nombre de ports d’entrée/sortie, le modèle de distribution des pixels, ..., l’outil ViPar génère automatiquement l’architecture matérielle pour les solutions les plus pertinentes. Dans le cadre d’une collaboration industrielle avec NAVYA, nous avons utilisé l’outil ViPar pour implémenter une solution matérielle parallèle pour l’algorithme de stéréo matching « Multi-window Sum of Absolute Difference ». Dans cette implémentation, nous avons présenté un ensemble d’étapes pour modifier le code de description de haut niveau afin de l’adapter efficacement à l’implémentation matérielle. Nous avons également exploré l’espace de conception pour différentes alternatives en termes de performance, ressources matérielles, fréquence, et consommation d’énergie. Au cours de notre travail, les architectures matérielles ont été implémentées et testées expérimentalement sur la plateforme d’évaluation Xilinx Zynq ZC706. / Embedded video applications are now involved in sophisticated transportation systems like autonomous vehicles. Many challenges faced the designers to build those applications, among them: complex algorithms should be developed, verified and tested under restricted time-to-market constraints, the necessity for design automation tools to increase the design productivity, high computing rates are required to exploit the inherent parallelism to satisfy the real-time constraints, reducing the consumed power to extend the operating duration before recharging the vehicle, etc. In this thesis work, we used FPGA technologies to tackle some of these challenges to design parallel reconfigurable hardware architectures for embedded video streaming applications. First, we implemented a flexible parallel architecture with two main contributions: (1)We proposed a generic model for pixel distribution/collection to tackle the problem of the huge data transferring through the system. The required model parameters were defined then the architecture generation was automated to minimize the development time. (2) We applied frequency scaling as a technique for reducing power consumption. We derived the required equations for calculating the maximum level of parallelism as well as the ones used for calculating the depth of the inserted FIFOs for clock domain crossing. As the number of logic cells on a single FPGA chip increases, moving to higher abstraction design levels becomes inevitable to shorten the time-to-market constraint and to increase the design productivity. During the design phase, it is common to have a space of design alternatives that are different from each other regarding hardware utilization, power consumption and performance. We developed ViPar tool with two main contributions to tackle this problem: (1) An empirical model was introduced to estimate the power consumption based on the hardware utilization (Slice and BRAM) and the operating frequency. In addition to that, we derived the equations for estimating the hardware resources and the execution time for each point during the design space exploration. (2) By defining the main characteristics of the parallel architecture like parallelism level, the number of input/output ports, the pixel distribution pattern, etc. ViPar tool can automatically generate the parallel architecture for the selected designs for implementation. In the context of an industrial collaboration, we used high-level synthesis tools to implement a parallel hardware architecture for Multi-window Sum of Absolute Difference stereo matching algorithm. In this implementation, we presented a set of guiding steps to modify the high-level description code to fit efficiently for hardware implementation as well as we explored the design space for different alternatives in terms of hardware resources, performance, frequency and power consumption. During the thesis work, our designs were implemented and tested experimentally on Xilinx Zynq ZC706 (XC7Z045- FFG900) evaluation board.
20

A Novel Cloud Broker-based Resource Elasticity Management and Pricing for Big Data Streaming Applications

Runsewe, Olubisi A. 28 May 2019 (has links)
The pervasive availability of streaming data from various sources is driving todays’ enterprises to acquire low-latency big data streaming applications (BDSAs) for extracting useful information. In parallel, recent advances in technology have made it easier to collect, process and store these data streams in the cloud. For most enterprises, gaining insights from big data is immensely important for maintaining competitive advantage. However, majority of enterprises have difficulty managing the multitude of BDSAs and the complex issues cloud technologies present, giving rise to the incorporation of cloud service brokers (CSBs). Generally, the main objective of the CSB is to maintain the heterogeneous quality of service (QoS) of BDSAs while minimizing costs. To achieve this goal, the cloud, although with many desirable features, exhibits major challenges — resource prediction and resource allocation — for CSBs. First, most stream processing systems allocate a fixed amount of resources at runtime, which can lead to under- or over-provisioning as BDSA demands vary over time. Thus, obtaining optimal trade-off between QoS violation and cost requires accurate demand prediction methodology to prevent waste, degradation or shutdown of processing. Second, coordinating resource allocation and pricing decisions for self-interested BDSAs to achieve fairness and efficiency can be complex. This complexity is exacerbated with the recent introduction of containers. This dissertation addresses the cloud resource elasticity management issues for CSBs as follows: First, we provide two contributions to the resource prediction challenge; we propose a novel layered multi-dimensional hidden Markov model (LMD-HMM) framework for managing time-bounded BDSAs and a layered multi-dimensional hidden semi-Markov model (LMD-HSMM) to address unbounded BDSAs. Second, we present a container resource allocation mechanism (CRAM) for optimal workload distribution to meet the real-time demands of competing containerized BDSAs. We formulate the problem as an n-player non-cooperative game among a set of heterogeneous containerized BDSAs. Finally, we incorporate a dynamic incentive-compatible pricing scheme that coordinates the decisions of self-interested BDSAs to maximize the CSB’s surplus. Experimental results demonstrate the effectiveness of our approaches.

Page generated in 0.5791 seconds