11 |
Extending FreeRTOS to support dynamic and distributed task mapping in multiprocessor systems / Extensão do FreeRTOS para Suporte ao mapeamento dinâmico e distribuído de tarefas em sistemas multiprocessados
Abich, Geancarlo January 2017
Embedded multiprocessor systems are a reality in both industry and academia. Such devices offer parallel processing capabilities, aiming to meet the growing requirements of complex applications. Application workloads are susceptible to variation at runtime which, if not properly handled, may degrade performance and energy efficiency. The continuous increase in the complexity of application workloads and in the size of emerging multiprocessor systems calls for dynamic and distributed mapping solutions. The majority of proposed mapping techniques are bespoke implementations that assume an in-house operating system developed for a particular processor architecture. This practice restricts their adoption on other platforms, leading to extra design time, re-validation and, consequently, a hidden cost that may well be quite high. In this scenario, this dissertation proposes a FreeRTOS extension that supports dynamic and distributed task mapping in multiprocessor systems. FreeRTOS is portable to more than 30 embedded processor architectures, which increases software portability and reduces development time. The proposed extension employs mapping techniques that allow FreeRTOS to handle high demands of application mapping at runtime. Another contribution of this work is a framework that enables the exploration of large systems while providing debugging facilities. The proposed framework automatically generates multiprocessor platforms from parameters such as size, processor architecture, and application set. The resulting platform description is highly scalable while allowing runtime data extraction and extensive debugging. These features made it possible to validate the proposed FreeRTOS extension on more than one processor architecture from the ARM Cortex-M family. Test cases were executed on large-scale platforms and at different levels of abstraction, with cases of more than 120 applications comprising more than 600 tasks. The results show that the proposed extension achieves results better than or equal to those in the literature.
|
12 |
Predictable Real-Time Applications on Multiprocessor Systems-on-Chip
Rosén, Jakob January 2011
Being predictable with respect to time is, by definition, a fundamental requirement for any real-time system. Modern multiprocessor systems impose a challenge in this context, due to resource sharing conflicts causing memory transfers to become unpredictable. In this thesis, we present a framework for achieving predictability for real-time applications running on multiprocessor system-on-chip platforms. Using a TDMA bus, worst-case execution time analysis and scheduling are done simultaneously. Since the worst-case execution times are directly dependent on the bus schedule, bus access design is of special importance. Therefore, we provide an efficient algorithm for generating bus schedules, resulting in a minimized worst-case global delay. We also present a new approach considering the average-case execution time in a predictable context. Optimization techniques for improving the average-case execution time of tasks, for which predictability with respect to time is not required, have been investigated for a long time in many different contexts. However, this has traditionally been done without paying attention to the worst-case execution time. For predictable real-time applications, on the other hand, the focus has been solely on worst-case execution time optimization, ignoring how this affects the execution time in the average case. In this thesis, we show that having a good average-case global delay can be important also for real-time applications, for which predictability is required. Furthermore, for real-time applications running on multiprocessor systems-on-chip, we present a technique for optimizing for the average case and the worst case simultaneously, allowing for a good average case execution time while still keeping the worst case as small as possible. The proposed solutions in this thesis have been validated by extensive experiments. The results demonstrate the efficiency and importance of the presented techniques.
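The dependence of worst-case delay on the TDMA bus schedule can be illustrated with a small sketch. It is not the algorithm from the thesis: it assumes a hypothetical discrete-time model in which the schedule is a periodic list of `(owner, length)` slots, requests arrive only at integer times, and a transfer must fit entirely inside one slot owned by the requesting core. The function name `worst_case_bus_delay` and the slot tables are illustrative assumptions.

```python
def worst_case_bus_delay(slots, core, transfer_len):
    """Worst-case delay (wait + transfer) for `core` under a periodic TDMA schedule.

    slots: list of (owner, length) pairs in integer time units, repeated forever.
    Assumption: a transfer may not be split across slots.
    """
    period = sum(length for _, length in slots)

    # Collect the windows owned by `core` over two periods so that
    # requests late in the first period can wrap into the next one.
    windows = []
    start = 0
    for _ in range(2):
        for owner, length in slots:
            if owner == core and length >= transfer_len:
                windows.append((start, start + length))
            start += length
    if not windows:
        raise ValueError("no slot is long enough for this core")

    worst = 0
    for t in range(period):  # every integer request time in one period
        # Earliest finish time among windows where the transfer still fits.
        finish = min(
            max(t, w_start) + transfer_len
            for w_start, w_end in windows
            if max(t, w_start) + transfer_len <= w_end
        )
        worst = max(worst, finish - t)
    return worst


# Same bandwidth share for core "A", but different slot placement.
coarse = [("A", 2), ("B", 2)]
fine = [("A", 1), ("B", 1), ("A", 1), ("B", 1)]
print(worst_case_bus_delay(coarse, "A", 1))
print(worst_case_bus_delay(fine, "A", 1))
```

Under these assumptions the two schedules give core A the same share of the bus, yet the interleaved one yields a smaller worst-case delay, which is the kind of effect that makes bus-schedule design matter for worst-case execution time analysis.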
|
13 |
Exploration d'architectures et allocation/affectation mémoire dans les systèmes multiprocesseurs mono puce = Architectures exploration and memory allocation/assignment in multiprocessor SoC
Meftali, S. 06 September 2002
Recent years have seen major advances in integrated-circuit manufacturing technology. Integrated circuits are increasingly complex: they combine so-called software parts (processors plus programs) with dedicated or application-specific hardware parts for computation or storage.
Many applications have emerged in the multimedia and telecommunications domains. They require memories of different types and sizes to be integrated into these multiprocessor architecture models. In such embedded applications, system performance is closely tied to that of the memory subsystem. Memory occupies more than 90% of the system area, and the system's energy consumption and timing performance are essentially determined by the storage and exchange of data between the different components.
Despite this growing presence of memory in systems-on-chip, there is currently no systematic, optimized methodology for designing such systems with an application-specific memory architecture.
In this thesis we propose a design flow for an application-specific memory architecture for systems-on-chip. The memory architecture is obtained with an exact method based on an integer linear programming model. This model yields a distributed shared memory architecture that is optimal for the application, minimizing the overall cost of accesses to shared data and the cost of the memory. The architecture and the application code are then transformed automatically according to the chosen memory architecture. The new system specification (architecture plus application code) remains simulable.
The feasibility and performance of this flow were tested on a VDSL application.
|
14 |
Dedicated Hardware Context-Switch Services for Real-Time Multiprocessor Systems
Allard, Yannick 07 November 2017
Computers are widely present in our daily life and are used in critical applications such as cars, planes, and pacemakers. These real-time systems are nowadays based on processors of increasing complexity that provide specific hardware services designed to reduce task preemption and migration overheads. However, in some cases using those services can add unpredictable overheads when the system has to switch from one task to another.
This document surveys existing solutions used in commonly available processors to ease preemption and migration, highlighting their strengths and weaknesses. A new hardware service (HwCS) is proposed to speed up task switching at the L1 cache level, to reduce context-switch overheads, and to improve system predictability. Although this thesis focuses on the L1 cache, the concept can also be applied to other memory levels and to any context-dependent block.
The solution presented is based on stacking several identical cache memories at the L1 level. Each layer is able to save and restore its complete state independently to and from main memory. One layer can be used by the active task running on the processor while other layers are restored or saved concurrently. The active task can remain in execution until the preempting task is ready in another layer after restoration from main memory. The context switch between tasks can then be performed in a very short time by switching to the other layer, which is now ready to run the preempting task. Furthermore, the task is resumed with exactly the L1 cache state that was saved at its previous preemption. The state of the previous task can be sent back to main memory for future use. This mechanism can minimise the time required for migrations and preemptions, consequently lowering overheads and limiting the cache misses caused by preemptions that are usually accounted for in cache-related migration and preemption delays. Isolation between tasks is also provided, as each task executes from a dedicated layer.
Both uniprocessor and multiprocessor designs are presented, along with the implications of this hardware service for real-time theory. An implementation of the system is characterized, and results show improvements to the maximum and average execution time of a set of various tasks. When the same size is used for the baseline cache and the HwCS layers, 94% of the tasks have a better execution time (up to 67%) and 80% have a better Worst Case Execution Time (WCET). 80% of the tasks are more predictable and the remaining 20% still have a better execution time. When the baseline cache size is split among the layers of the HwCS, measurements show that 75% of the tasks have a better execution time (up to 67%), leading to 50% of the tasks having a better WCET. Only 6% of the tasks suffer from worse execution time and worse predictability, while 75% of the tasks remain more predictable when using the HwCS compared to the baseline cache. / Doctorat en Sciences de l'ingénieur et technologie (Doctorate in Engineering Sciences and Technology) / info:eu-repo/semantics/nonPublished
|
15 |
Bounds For Scheduling In Non-Identical Uniform Multiprocessor Systems
Darera, Vivek N 06 1900
With multiprocessors and multicore processors becoming ubiquitous, research focus has shifted from uniprocessors to multiprocessors. Unfortunately, results derived for the uniprocessor case do not always extend directly to the multiprocessor case. This necessitates a paradigm shift in the approach used to design and analyse the behaviour of such processors. In real-time systems, that is, systems characterised by explicit timing constraints, analysis and performance guarantees are even more important, as failure is unacceptable. Scheduling algorithms used in real-time systems have to be carefully designed, as the performance of the system depends critically on them. Efficient tests for determining whether a set of tasks can be feasibly scheduled on such a computing system using a particular scheduling algorithm are therefore important. Traditionally, the ‘task utilization’ parameter has been used for devising such tests. Utilization-based tests for the Earliest Deadline First (EDF) and Rate Monotonic (RM) scheduling algorithms are known and well understood for uniprocessor systems. In our work, we derive limits on similar bounds for the multiprocessor case. Our work differs from previous literature in that we explore the case in which the individual processors constituting the multiprocessor need not be identical. Each processor in such a system is characterised by a capacity, or speed, and the time taken by a task to execute on a processor is inversely proportional to that speed. Such systems may arise during system upgrades, when faster processors are added to the system, making it a non-identical multiprocessor, or during processor design, when the different cores on the chip have different processing power to handle dynamic workloads. We derive results for the partitioned paradigm of multiprocessor scheduling, that is, when tasks are partitioned among the processors and interprocessor migration after part of the execution is completed is not allowed. Results are derived for both fixed-priority algorithms (RM) and dynamic-priority algorithms (EDF) used on individual processors. A maximum and a minimum limit on the bounds for a ‘greedy’ class of algorithms are established, since the actual value of the bound depends on the algorithm that allocates the tasks. We also derive the utilization bound of an algorithm whose bound is close to the upper limit in both cases. We find that an expression for the utilization bound can be obtained when EDF is used as the uniprocessor scheduling algorithm, but when RM is the uniprocessor scheduling algorithm, an O(mn) algorithm is required to find the utilization bound, where m is the number of tasks in the system and n is the number of processors. Knowledge of such bounds allows us to carry out very fast schedulability tests, with the limitation that the tests are sufficient but not necessary to ensure schedulability. We also compare the value of the bounds with those achievable in ‘equivalent’ identical multiprocessor systems and find that the performance guarantees provided by the non-identical multiprocessor system are far higher than those offered by the equivalent identical system.
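To make the idea of a utilization-based test on processors of different speeds concrete, here is a minimal sketch. It does not reproduce the bounds derived in the thesis. It assumes implicit-deadline periodic tasks whose utilizations are measured on a unit-speed processor, a first-fit-decreasing greedy allocation, and the classic per-processor tests: total utilization at most `s` for EDF on a processor of speed `s`, and at most `s * n * (2**(1/n) - 1)` (the Liu and Layland bound) for RM with `n` tasks on that processor. The names `rm_bound` and `first_fit` are illustrative.

```python
def rm_bound(n):
    """Liu and Layland utilization bound for n tasks under Rate Monotonic."""
    return n * (2 ** (1 / n) - 1)


def first_fit(utilizations, speeds, policy="EDF"):
    """Greedy partitioning of tasks onto processors of different speeds.

    Returns a list with the task indices assigned to each processor,
    or None if this particular greedy allocation fails. A None result
    does not prove the task set is unschedulable: the test is sufficient only.
    """
    load = [0.0] * len(speeds)
    assigned = [[] for _ in speeds]
    # Consider the heaviest tasks first (first-fit decreasing).
    order = sorted(range(len(utilizations)), key=lambda i: -utilizations[i])
    for i in order:
        for p, speed in enumerate(speeds):
            n = len(assigned[p]) + 1  # task count if task i is placed here
            limit = speed if policy == "EDF" else speed * rm_bound(n)
            if load[p] + utilizations[i] <= limit + 1e-12:
                load[p] += utilizations[i]
                assigned[p].append(i)
                break
        else:
            return None  # task i fits on no processor
    return assigned


tasks = [0.9, 0.6, 0.5, 0.4]
speeds = [2.0, 1.0]  # one fast and one unit-speed processor
print(first_fit(tasks, speeds, "EDF"))
print(first_fit(tasks, speeds, "RM"))
```

With these example numbers the EDF variant finds an allocation while the RM variant does not, which reflects the lower per-processor bound of fixed-priority scheduling rather than any result specific to the thesis.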
|
16 |
Securing Multiprocessor Systems-on-Chip
Biswas, Arnab Kumar 16 August 2016
MHRD PhD scholarship / With Multiprocessor Systems-on-Chip (MPSoCs) pervading our lives, security issues are emerging as a serious problem, and attacks against these systems are becoming more critical and sophisticated. We have designed and implemented different hardware-based solutions to ensure the security of an MPSoC. Security-assisting modules can be implemented at different abstraction levels of an MPSoC design. We propose solutions at both the circuit level and the system level of abstraction. At the VLSI circuit level, we consider the problem of noise voltage in input signals coming from the outside world. This noise voltage disturbs normal circuit operation inside a chip, causing false logic reception. If the disturbance is caused intentionally, the security of the chip may be compromised by a glitch/transient attack. We propose an input receiver with a hysteresis characteristic that can work at voltage levels between 0.9V and 5V. The circuit can protect the MPSoC from glitch/transient attacks. At the system level, we propose solutions targeting the Network-on-Chip (NoC) as the on-chip communication medium. We survey the possible attack scenarios on present-day MPSoCs and investigate a new attack scenario, the router attack, targeted at NoC-enabled MPSoCs. We propose different monitoring-based countermeasures against routing-table-based router attacks in an MPSoC having multiple Trusted Execution Environments (TEEs). Software attacks, the most common type of attack, mainly exploit vulnerabilities such as buffer overflows. This is possible if proper access control to memory is absent in the system. We propose four hardware-based mechanisms to implement the Role Based Access Control (RBAC) model in NoC-based MPSoCs.
|
17 |
Design and Programming Methods for Reconfigurable Multi-Core Architectures using a Network-on-Chip-Centric Approach
Rettkowski, Jens 12 July 2022
A current trend in the semiconductor industry is the use of Multi-Processor Systems-on-Chip (MPSoCs) for a wide variety of applications such as image processing, automotive, multimedia, and robotic systems. Most applications gain performance advantages by executing parallel tasks on multiple processors due to the inherent parallelism. Moreover, heterogeneous structures provide high performance/energy efficiency, since application-specific processing elements (PEs) can be exploited. The increasing number of heterogeneous PEs leads to challenging communication requirements. To overcome this challenge, Networks-on-Chip (NoCs) have emerged as scalable on-chip interconnect. Nevertheless, NoCs have to deal with many design parameters such as virtual channels, routing algorithms and buffering techniques to fulfill the system requirements.
This thesis contributes to the state of the art in FPGA-based MPSoCs and NoCs. Its three major contributions are introduced below.
As a first major contribution, a novel router concept is presented that efficiently utilizes communication times by performing sequences of arithmetic operations on the data that is transferred. The internal input buffers of the routers are exchanged with processing units that are capable of executing operations. Two different architectures of such processing units are presented. The first architecture provides multiply and accumulate operations which are often used in signal processing applications. The second architecture introduced as Application-Specific Instruction Set Routers (ASIRs) contains a processing unit capable of executing any operation and hence, it is not limited to multiply and accumulate operations. An internal processing core located in ASIRs can be developed in C/C++ using high-level synthesis.
The second major contribution comprises application and performance explorations of the novel router concept. Models that approximate the achievable speedup and the end-to-end latency of ASIRs are derived and discussed to show the benefits in terms of performance. Furthermore, two applications using an ASIR-based MPSoC are implemented and evaluated on a Xilinx Zynq SoC. The first application is an image processing algorithm consisting of a Sobel filter, an RGB-to-Grayscale conversion, and a threshold operation. The second application is a system that helps visually impaired people by navigating them through unknown indoor environments. A Light Detection and Ranging (LIDAR) sensor scans the environment, while Inertial Measurement Units (IMUs) measure the orientation of the user to generate an audio signal that makes the distance as well as the orientation of obstacles audible. This application consists of multiple parallel tasks that are mapped to an ASIR-based MPSoC. Both applications show the performance advantages of ASIRs compared to a conventional NoC-based MPSoC. Furthermore, dynamic partial reconfiguration in terms of relocation and security aspects are investigated.
The third major contribution concerns development and programming methodologies for NoC-based MPSoCs. A software-defined approach is presented that combines the design and programming of heterogeneous MPSoCs. In addition, a Kahn Process Network (KPN)-based model is designed to describe parallel applications for MPSoCs using ASIRs. The KPN-based model is extended to support not only the mapping of tasks to NoC-based MPSoCs but also the mapping to ASIR-based MPSoCs. A static mapping methodology is presented that assigns tasks to ASIRs and processors for a given KPN model. The impact of external hardware components such as sensors, actuators, and accelerators connected to the processors is also discussed, which makes the approach of high interest for embedded systems.
|
18 |
High Level Design and Control of Adaptive Multiprocessor Systems-on-Chip
An, Xin 16 October 2013
The design of modern embedded systems is increasingly complex as more functionality is integrated into them. At the same time, MPSoCs have emerged as the main solution for such embedded systems, meeting computational requirements while keeping energy consumption low. Moreover, embedded systems are increasingly adaptive, since adaptivity can bring a number of benefits such as software flexibility and energy efficiency. This thesis targets the safe design of such adaptive MPSoCs. First, each system configuration must be analyzed with respect to its functional and non-functional properties. We present an abstract design and analysis framework that allows fast and cost-effective implementation decisions. This framework is intended as an intermediate reasoning support for system-level software/hardware co-design environments. It can prune the design space as far as possible and identify candidate design solutions quickly and efficiently. In this framework, we use an abstract-clock-based encoding to model system behaviors. Different scenarios for mapping and scheduling applications on MPSoCs are analyzed via clock traces that represent system simulations. The properties of interest are functional behavioral correctness, temporal performance, and energy consumption. Second, the management of reconfiguration in adaptive MPSoCs must be addressed. We are particularly interested in MPSoCs implemented on reconfigurable architectures (e.g., FPGAs), which offer good flexibility and computational efficiency for adaptive MPSoCs. We propose a general design framework based on the discrete controller synthesis (DCS) technique to address this problem. The main advantage of this technique is that it allows a controller to be synthesized automatically from a specification of the control objectives. In this framework, the reconfiguration behavior of the system is modeled in terms of parallel synchronous automata. The problem of computing a reconfiguration manager under multiple objectives concerning, for example, resource usage, performance, and energy consumption is encoded as a DCS problem. The existing BZR programming language and the Sigali tool are used to perform DCS and generate a controller that satisfies the system requirements. Finally, we study two different ways of combining the two proposed design frameworks for adaptive MPSoCs. First, they are combined to build a complete design flow for adaptive MPSoCs. Second, they are combined to show how the run-time manager computed by the second framework can be integrated into the first framework in order to carry out combined simulations and analyses of adaptive MPSoCs.
|
19 |
ESCALONAMENTO DE TAREFAS E FLUXOS DE COMUNICAÇÃO PARA SISTEMAS SEMI-PARTICIONADOS EM ARQUITETURAS NOC / SEMI-PARTITIONED SCHEDULING OF TASKS AND COMMUNICATION FLOWS ON NOC ARCHTECTURES
Bonilha, Iaê Santos 24 March 2014
Although many scheduling models that are theoretically capable of high system resource utilization have been proposed as multi-processor real-time systems became popular, industry still uses one of the first scheduling models proposed for such systems, the partitioned scheduling model. This model can only guarantee the schedulability of task sets up to around 69% processor utilization, which pales in comparison to recent scheduling models that can guarantee schedulability up to 97% processor utilization. Partitioned scheduling remains the industrial model of choice because of the large body of studies on it and the availability of scheduling analyses capable of providing temporal guarantees for it in a real system environment. Recent scheduling models, such as semi-partitioned scheduling, offer the possibility of higher system resource utilization, but they still lack studies and scheduling analyses capable of providing temporal guarantees under real conditions. The current scheduling analyses for most of the more recent models rely on a series of abstractions and so fail to provide guarantees under real circumstances. In this context, this work focuses on Network-on-Chip architectures, which exhibit explicit communication that is abstracted away in the works found in the literature. The primary objective of this work is to produce a new scheduling analysis for semi-partitioned scheduling that provides temporal guarantees while taking into account some of the previously abstracted factors, such as communication between tasks and the impact of task migration on their communication flows, bringing the scheduling model closer to real conditions. The development of this analysis enables a preliminary study of heuristic task-mapping algorithms capable of mapping task sets while taking task migrations and inter-task communication into account in a semi-partitioned scheduling model.
|
20 |
Frequent itemset mining on multiprocessor systems
Schlegel, Benjamin 08 May 2014
Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is also required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads and also other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism.
In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can also be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their cost by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms’ intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data’s size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined.
To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that are used for mining other types of itemsets.
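One of the building blocks named above, intersecting lists of sorted integers, can be sketched as follows. This is a plain sequential merge-style intersection, not the parallel or vectorized version developed in the thesis. The layout in which each item maps to a sorted list of transaction ids, and the names `intersect_sorted`, `support`, and `tidlists`, are assumptions made for illustration.

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of integers in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b
        else:
            j += 1  # b[j] cannot appear later in a
    return out


def support(itemset, tidlists):
    """Number of transactions that contain every item of `itemset`."""
    ids = tidlists[itemset[0]]
    for item in itemset[1:]:
        ids = intersect_sorted(ids, tidlists[item])
    return len(ids)


# Hypothetical vertical layout: item -> sorted transaction ids.
tidlists = {
    "bread": [1, 2, 4, 5],
    "milk": [2, 3, 4],
    "eggs": [4, 5],
}
print(support(["bread", "milk"], tidlists))
print(support(["bread", "milk", "eggs"], tidlists))
```

An itemset is frequent when its support reaches a chosen threshold, so a mining run repeats this intersection many times, which is why it shows up as a hot spot.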
|