Global ETD Search

11	Locality Conscious Scheduling Strategies for High Performance Data Analysis Applications Vydyanathan, Nagavijayalakshmi 20 August 2008 (has links) No description available. Computer Science scheduling resource management task-parallelism data-parallelism pipelined-parallelism
12	A runtime system for data-flow task programming on multicore architectures with accelerators / Uma ferramenta para programação com dependência de dados em arquiteturas multicore com aceleradores / Vers un support exécutif avec dépendance de données pour les architectures multicoeur avec des accélérateurs Lima, João Vicente Ferreira January 2014 (has links) Dans cette thèse , nous proposons d’étudier des questions sur le parallélism de tâche avec dépendance de données dans le cadre de machines multicoeur avec des accélérateurs. La solution proposée a été développée en utilisant l’interface de programmation haute niveau XKaapi du projet MOAIS de l’INRIA Rhône-Alpes. D’abord nous avons étudié des questions liés à une approche d’exécution totalement asyncrone et l’ordonnancement par vol de travail sur des architectures multi-GPU. Le vol de travail avec localité de données a montré des résultats significatifs, mais il ne prend pas en compte des différents ressources de calcul. Ensuite nous avons conçu une interface et une modèle de coût qui permettent d’écrire des politiques d’ordonnancement sur XKaapi. Finalement on a évalué XKaapi sur un coprocesseur Intel Xeon Phi en mode natif. Notre conclusion est double. D’abord nous avons montré que le modèle de programmation data-flow peut être efficace sur des accélérateurs tels que des GPUs ou des coprocesseurs Intel Xeon Phi. Ensuite, le support à des différents politiques d’ordonnancement est indispensable. Les modèles de coût permettent d’obtenir de performance significatifs sur des calculs très réguliers, tandis que le vol de travail permet de redistribuer la charge en cours d’exécution. / Esta tese investiga os desafios no uso de paralelismo de tarefas com dependências de dados em arquiteturas multi-CPU com aceleradores. Para tanto, o XKaapi, desenvolvido no grupo de pesquisa MOAIS (INRIA Rhône-Alpes), é a ferramenta de programação base deste trabalho. Em um primeiro momento, este trabalho propôs extensões ao XKaapi a fim de sobrepor transferência de dados com execução através de operações concorrentes em GPU, em conjunto com escalonamento por roubo de tarefas em multi-GPU. Os resultados experimentais sugerem que o suporte a asincronismo é importante à escalabilidade e desempenho em multi-GPU. Apesar da localidade de dados, o roubo de tarefas não pondera a capacidade de processamento das unidades de processamento disponíveis. Nós estudamos estratégias de escalonamento com predição de desempenho em tempo de execução através de modelos de custo de execução. Desenvolveu-se um framework sobre o XKaapi de escalonamento que proporciona a implementação de diferentes algoritmos de escalonamento. Esta tese também avaliou o XKaapi em coprocessodores Intel Xeon Phi para execução nativa. A conclusão desta tese é dupla. Primeiramente, nós concluímos que um modelo de programação com dependências de dados pode ser eficiente em aceleradores, tais como GPUs e coprocessadores Intel Xeon Phi. Não obstante, uma ferramenta de programação com suporte a diferentes estratégias de escalonamento é essencial. Modelos de custo podem ser usados no contexto de algoritmos paralelos regulares, enquanto que o roubo de tarefas poder reagir a desbalanceamentos em tempo de execução. / In this thesis, we propose to study the issues of task parallelism with data dependencies on multicore architectures with accelerators. We target those architectures with the XKaapi runtime system developed by the MOAIS team (INRIA Rhône-Alpes). We first studied the issues on multi-GPU architectures for asynchronous execution and scheduling. Work stealing with heuristics showed significant performance results, but did not consider the computing power of different resources. Next, we designed a scheduling framework and a performance model to support scheduling strategies over XKaapi runtime. Finally, we performed experimental evaluations over the Intel Xeon Phi coprocessor in native execution. Our conclusion is twofold. First we concluded that data-flow task programming can be efficient on accelerators, which may be GPUs or Intel Xeon Phi coprocessors. Second, the runtime support of different scheduling strategies is essential. Cost models provide significant performance results over very regular computations, while work stealing can react to imbalances at runtime. Programmation parallèle Accélérateur Parallélisme de tâche Dépendance de données Vol de travail Arquitetura : Computadores Processamento paralelo Parallel programming Accelerators Task parallelism Data flow dependencies Work stealing
13	A runtime system for data-flow task programming on multicore architectures with accelerators / Uma ferramenta para programação com dependência de dados em arquiteturas multicore com aceleradores / Vers un support exécutif avec dépendance de données pour les architectures multicoeur avec des accélérateurs Lima, João Vicente Ferreira January 2014 (has links) Dans cette thèse , nous proposons d’étudier des questions sur le parallélism de tâche avec dépendance de données dans le cadre de machines multicoeur avec des accélérateurs. La solution proposée a été développée en utilisant l’interface de programmation haute niveau XKaapi du projet MOAIS de l’INRIA Rhône-Alpes. D’abord nous avons étudié des questions liés à une approche d’exécution totalement asyncrone et l’ordonnancement par vol de travail sur des architectures multi-GPU. Le vol de travail avec localité de données a montré des résultats significatifs, mais il ne prend pas en compte des différents ressources de calcul. Ensuite nous avons conçu une interface et une modèle de coût qui permettent d’écrire des politiques d’ordonnancement sur XKaapi. Finalement on a évalué XKaapi sur un coprocesseur Intel Xeon Phi en mode natif. Notre conclusion est double. D’abord nous avons montré que le modèle de programmation data-flow peut être efficace sur des accélérateurs tels que des GPUs ou des coprocesseurs Intel Xeon Phi. Ensuite, le support à des différents politiques d’ordonnancement est indispensable. Les modèles de coût permettent d’obtenir de performance significatifs sur des calculs très réguliers, tandis que le vol de travail permet de redistribuer la charge en cours d’exécution. / Esta tese investiga os desafios no uso de paralelismo de tarefas com dependências de dados em arquiteturas multi-CPU com aceleradores. Para tanto, o XKaapi, desenvolvido no grupo de pesquisa MOAIS (INRIA Rhône-Alpes), é a ferramenta de programação base deste trabalho. Em um primeiro momento, este trabalho propôs extensões ao XKaapi a fim de sobrepor transferência de dados com execução através de operações concorrentes em GPU, em conjunto com escalonamento por roubo de tarefas em multi-GPU. Os resultados experimentais sugerem que o suporte a asincronismo é importante à escalabilidade e desempenho em multi-GPU. Apesar da localidade de dados, o roubo de tarefas não pondera a capacidade de processamento das unidades de processamento disponíveis. Nós estudamos estratégias de escalonamento com predição de desempenho em tempo de execução através de modelos de custo de execução. Desenvolveu-se um framework sobre o XKaapi de escalonamento que proporciona a implementação de diferentes algoritmos de escalonamento. Esta tese também avaliou o XKaapi em coprocessodores Intel Xeon Phi para execução nativa. A conclusão desta tese é dupla. Primeiramente, nós concluímos que um modelo de programação com dependências de dados pode ser eficiente em aceleradores, tais como GPUs e coprocessadores Intel Xeon Phi. Não obstante, uma ferramenta de programação com suporte a diferentes estratégias de escalonamento é essencial. Modelos de custo podem ser usados no contexto de algoritmos paralelos regulares, enquanto que o roubo de tarefas poder reagir a desbalanceamentos em tempo de execução. / In this thesis, we propose to study the issues of task parallelism with data dependencies on multicore architectures with accelerators. We target those architectures with the XKaapi runtime system developed by the MOAIS team (INRIA Rhône-Alpes). We first studied the issues on multi-GPU architectures for asynchronous execution and scheduling. Work stealing with heuristics showed significant performance results, but did not consider the computing power of different resources. Next, we designed a scheduling framework and a performance model to support scheduling strategies over XKaapi runtime. Finally, we performed experimental evaluations over the Intel Xeon Phi coprocessor in native execution. Our conclusion is twofold. First we concluded that data-flow task programming can be efficient on accelerators, which may be GPUs or Intel Xeon Phi coprocessors. Second, the runtime support of different scheduling strategies is essential. Cost models provide significant performance results over very regular computations, while work stealing can react to imbalances at runtime. Programmation parallèle Accélérateur Parallélisme de tâche Dépendance de données Vol de travail Arquitetura : Computadores Processamento paralelo Parallel programming Accelerators Task parallelism Data flow dependencies Work stealing
14	A runtime system for data-flow task programming on multicore architectures with accelerators / Uma ferramenta para programação com dependência de dados em arquiteturas multicore com aceleradores / Vers un support exécutif avec dépendance de données pour les architectures multicoeur avec des accélérateurs Lima, João Vicente Ferreira January 2014 (has links) Dans cette thèse , nous proposons d’étudier des questions sur le parallélism de tâche avec dépendance de données dans le cadre de machines multicoeur avec des accélérateurs. La solution proposée a été développée en utilisant l’interface de programmation haute niveau XKaapi du projet MOAIS de l’INRIA Rhône-Alpes. D’abord nous avons étudié des questions liés à une approche d’exécution totalement asyncrone et l’ordonnancement par vol de travail sur des architectures multi-GPU. Le vol de travail avec localité de données a montré des résultats significatifs, mais il ne prend pas en compte des différents ressources de calcul. Ensuite nous avons conçu une interface et une modèle de coût qui permettent d’écrire des politiques d’ordonnancement sur XKaapi. Finalement on a évalué XKaapi sur un coprocesseur Intel Xeon Phi en mode natif. Notre conclusion est double. D’abord nous avons montré que le modèle de programmation data-flow peut être efficace sur des accélérateurs tels que des GPUs ou des coprocesseurs Intel Xeon Phi. Ensuite, le support à des différents politiques d’ordonnancement est indispensable. Les modèles de coût permettent d’obtenir de performance significatifs sur des calculs très réguliers, tandis que le vol de travail permet de redistribuer la charge en cours d’exécution. / Esta tese investiga os desafios no uso de paralelismo de tarefas com dependências de dados em arquiteturas multi-CPU com aceleradores. Para tanto, o XKaapi, desenvolvido no grupo de pesquisa MOAIS (INRIA Rhône-Alpes), é a ferramenta de programação base deste trabalho. Em um primeiro momento, este trabalho propôs extensões ao XKaapi a fim de sobrepor transferência de dados com execução através de operações concorrentes em GPU, em conjunto com escalonamento por roubo de tarefas em multi-GPU. Os resultados experimentais sugerem que o suporte a asincronismo é importante à escalabilidade e desempenho em multi-GPU. Apesar da localidade de dados, o roubo de tarefas não pondera a capacidade de processamento das unidades de processamento disponíveis. Nós estudamos estratégias de escalonamento com predição de desempenho em tempo de execução através de modelos de custo de execução. Desenvolveu-se um framework sobre o XKaapi de escalonamento que proporciona a implementação de diferentes algoritmos de escalonamento. Esta tese também avaliou o XKaapi em coprocessodores Intel Xeon Phi para execução nativa. A conclusão desta tese é dupla. Primeiramente, nós concluímos que um modelo de programação com dependências de dados pode ser eficiente em aceleradores, tais como GPUs e coprocessadores Intel Xeon Phi. Não obstante, uma ferramenta de programação com suporte a diferentes estratégias de escalonamento é essencial. Modelos de custo podem ser usados no contexto de algoritmos paralelos regulares, enquanto que o roubo de tarefas poder reagir a desbalanceamentos em tempo de execução. / In this thesis, we propose to study the issues of task parallelism with data dependencies on multicore architectures with accelerators. We target those architectures with the XKaapi runtime system developed by the MOAIS team (INRIA Rhône-Alpes). We first studied the issues on multi-GPU architectures for asynchronous execution and scheduling. Work stealing with heuristics showed significant performance results, but did not consider the computing power of different resources. Next, we designed a scheduling framework and a performance model to support scheduling strategies over XKaapi runtime. Finally, we performed experimental evaluations over the Intel Xeon Phi coprocessor in native execution. Our conclusion is twofold. First we concluded that data-flow task programming can be efficient on accelerators, which may be GPUs or Intel Xeon Phi coprocessors. Second, the runtime support of different scheduling strategies is essential. Cost models provide significant performance results over very regular computations, while work stealing can react to imbalances at runtime. Programmation parallèle Accélérateur Parallélisme de tâche Dépendance de données Vol de travail Arquitetura : Computadores Processamento paralelo Parallel programming Accelerators Task parallelism Data flow dependencies Work stealing
15	Stream Computing on FPGAs Plavec, Franjo 01 September 2010 (has links) Field Programmable Gate Arrays (FPGAs) are programmable logic devices used for the implementation of a wide range of digital systems. In recent years, there has been an increasing interest in design methodologies that allow high-level design descriptions to be automatically implemented in FPGAs. This thesis describes the design and implementation of a novel compilation flow that implements circuits in FPGAs from a streaming programming language. The streaming language supported is called FPGA Brook, and is based on the existing Brook and GPU Brook languages, which target streaming multiprocessors and graphics processing units (GPUs), respectively. A streaming language is suitable for targeting FPGAs because it allows system designers to express applications in a way that exposes parallelism, which can then be exploited through parallel hardware implementation. FPGA Brook supports replication, which allows the system designer to trade-off area for performance, by specifying the parts of an application that should be implemented as multiple hardware units operating in parallel, to achieve desired application throughput. Hardware units are interconnected through FIFO buffers, which effectively utilize the small memory modules available in FPGAs. The FPGA Brook design flow uses a source-to-source compiler, and combines it with a commercial behavioural synthesis tool to generate hardware. The source-to-source compiler was developed as a part of this thesis and includes novel algorithms for implementation of complex reductions in FPGAs. The design flow is fully automated and presents a user-interface similar to traditional software compilers. A suite of benchmark applications was developed in FPGA Brook and implemented using our design flow. Experimental results show that applications implemented using our flow achieve much higher throughput than the Nios II soft processor implemented in the same FPGA device. Comparison to the commercial C2H compiler from Altera shows that while simple applications can be effectively implemented using the C2H compiler, complex applications achieve significantly better throughput when implemented by our system. Performance of many applications implemented using our design flow would scale further if a larger FPGA device were used. The thesis demonstrates that using an automated design flow to implement streaming applications in FPGAs is a promising methodology. Field Programmable Gate Array FPGA Streaming Behavioral synthesis Behavioural synthesis Data parallelism Task parallelism Replication Throughput Scalability C2H High-level synthesis Reduction Parallel reduction FIFO Strength reduction 0544 0984
16	Stream Computing on FPGAs Plavec, Franjo 01 September 2010 (has links) Field Programmable Gate Arrays (FPGAs) are programmable logic devices used for the implementation of a wide range of digital systems. In recent years, there has been an increasing interest in design methodologies that allow high-level design descriptions to be automatically implemented in FPGAs. This thesis describes the design and implementation of a novel compilation flow that implements circuits in FPGAs from a streaming programming language. The streaming language supported is called FPGA Brook, and is based on the existing Brook and GPU Brook languages, which target streaming multiprocessors and graphics processing units (GPUs), respectively. A streaming language is suitable for targeting FPGAs because it allows system designers to express applications in a way that exposes parallelism, which can then be exploited through parallel hardware implementation. FPGA Brook supports replication, which allows the system designer to trade-off area for performance, by specifying the parts of an application that should be implemented as multiple hardware units operating in parallel, to achieve desired application throughput. Hardware units are interconnected through FIFO buffers, which effectively utilize the small memory modules available in FPGAs. The FPGA Brook design flow uses a source-to-source compiler, and combines it with a commercial behavioural synthesis tool to generate hardware. The source-to-source compiler was developed as a part of this thesis and includes novel algorithms for implementation of complex reductions in FPGAs. The design flow is fully automated and presents a user-interface similar to traditional software compilers. A suite of benchmark applications was developed in FPGA Brook and implemented using our design flow. Experimental results show that applications implemented using our flow achieve much higher throughput than the Nios II soft processor implemented in the same FPGA device. Comparison to the commercial C2H compiler from Altera shows that while simple applications can be effectively implemented using the C2H compiler, complex applications achieve significantly better throughput when implemented by our system. Performance of many applications implemented using our design flow would scale further if a larger FPGA device were used. The thesis demonstrates that using an automated design flow to implement streaming applications in FPGAs is a promising methodology. Field Programmable Gate Array FPGA Streaming Behavioral synthesis Behavioural synthesis Data parallelism Task parallelism Replication Throughput Scalability C2H High-level synthesis Reduction Parallel reduction FIFO Strength reduction 0544 0984
17	A Runtime System for Data-Flow Task Programming on Multicore Architectures with Accelerators / Vers un support exécutif avec dépendance de données pour les architectures multicoeur avec des accélérateurs / Uma Ferramenta para Programação com Dependência de Dados em Arquiteturas Multicore com Aceleradores Lima, Joao Vicente Ferreira 05 May 2014 (has links) Dans cette thèse , nous proposons d’étudier des questions sur le parallélism de tâcheavec dépendance de données dans le cadre de machines multicoeur avec des accélérateurs.La solution proposée a été développée en utilisant l’interface de programmation hauteniveau XKaapi du projet MOAIS de l’INRIA Rhône-Alpes.D’abord nous avons étudié des questions liés à une approche d’exécution totalementasyncrone et l’ordonnancement par vol de travail sur des architectures multi-GPU. Le volde travail avec localité de données a montré des résultats significatifs, mais il ne prend pasen compte des différents ressources de calcul. Ensuite nous avons conçu une interface etune modèle de coût qui permettent d’écrire des politiques d’ordonnancement sur XKaapi.Finalement on a évalué XKaapi sur un coprocesseur Intel Xeon Phi en mode natif.Notre conclusion est double. D’abord nous avons montré que le modèle de programma-tion data-flow peut être efficace sur des accélérateurs tels que des GPUs ou des coproces-seurs Intel Xeon Phi. Ensuite, le support à des différents politiques d’ordonnancement estindispensable. Les modèles de coût permettent d’obtenir de performance significatifs surdes calculs très réguliers, tandis que le vol de travail permet de redistribuer la charge encours d’exécution. / In this thesis, we propose to study the issues of task parallelism with data dependencies onmulticore architectures with accelerators. We target those architectures with the XKaapiruntime system developed by the MOAIS team (INRIA Rhône-Alpes).We first studied the issues on multi-GPU architectures for asynchronous execution andscheduling. Work stealing with heuristics showed significant performance results, but didnot consider the computing power of different resources. Next, we designed a schedulingframework and a performance model to support scheduling strategies over XKaapi runtime.Finally, we performed experimental evaluations over the Intel Xeon Phi coprocessor innative execution.Our conclusion is twofold. First we concluded that data-flow task programming canbe efficient on accelerators, which may be GPUs or Intel Xeon Phi coprocessors. Second,the runtime support of different scheduling strategies is essential. Cost models providesignificant performance results over very regular computations, while work stealing canreact to imbalances at runtime. / Esta tese investiga os desafios no uso de paralelismo de tarefas com dependências dedados em arquiteturas multi-CPU com aceleradores. Para tanto, o XKaapi, desenvolvidono grupo de pesquisa MOAIS (INRIA Rhône-Alpes), é a ferramenta de programação basedeste trabalho.Em um primeiro momento, este trabalho propôs extensões ao XKaapi a fim de sobre-por transferência de dados com execução através de operações concorrentes em GPU, emconjunto com escalonamento por roubo de tarefas em multi-GPU. Os resultados experimen-tais sugerem que o suporte a asincronismo é importante à escalabilidade e desempenho emmulti-GPU. Apesar da localidade de dados, o roubo de tarefas não pondera a capacidadede processamento das unidades de processamento disponíveis. Nós estudamos estratégiasde escalonamento com predição de desempenho em tempo de execução através de modelosde custo de execução. Desenvolveu-se um framework sobre o XKaapi de escalonamentoque proporciona a implementação de diferentes algoritmos de escalonamento. Esta tesetambém avaliou o XKaapi em coprocessodores Intel Xeon Phi para execução nativa.A conclusão desta tese é dupla. Primeiramente, nós concluímos que um modelo deprogramação com dependências de dados pode ser eficiente em aceleradores, tais comoGPUs e coprocessadores Intel Xeon Phi. Não obstante, uma ferramenta de programaçãocom suporte a diferentes estratégias de escalonamento é essencial. Modelos de custo podemser usados no contexto de algoritmos paralelos regulares, enquanto que o roubo de tarefaspoder reagir a desbalanceamentos em tempo de execução. Programmation parallèle Accélérateur Parallélisme de tâche Dépendance de données Vol de travail Parallel programming Accelerators Task parallelism Data flow dependencies Work stealing Programação paralela Aceleradores Paralelismo de tarefas Dependência de dados Roubo de tarefas 004
18	Modèles de programmation des applications de traitement du signal et de l'image sur cluster parallèle et hétérogène / Programming models for signal and image processing on parallel and heterogeneous architectures Mansouri, Farouk 14 October 2015 (has links) Depuis une dizaine d'année, l'évolution des machines de calcul tend vers des architectures parallèles et hétérogènes. Composées de plusieurs nœuds connectés via un réseau incluant chacun des unités de traitement hétérogènes, ces grilles offrent de grandes performances. Pour programmer ces architectures, l'utilisateur doit s'appuyer sur des modèles de programmation comme MPI, OpenMP, CUDA. Toutefois, il est toujours difficile d'obtenir à la fois une bonne productivité du programmeur, qui passe par une abstraction des spécificités de l'architecture et performances. Dans cette thèse, nous proposons d'exploiter l'idée qu'un modèle de programmation spécifique à un domaine applicatif particulier permet de concilier ces deux objectifs antagonistes. En effet, en caractérisant une famille d'applications, il est possible d'identifier des abstractions de haut niveau permettant de les modéliser. Nous proposons deux modèles spécifiques au traitement du signal et de l'image sur cluster hétérogène. Le premier modèle est statique. Nous lui apportons une fonctionnalité de migration de tâches. Le second est dynamique, basé sur le support exécutif StarPU. Les deux modèles offrent d'une part un haut niveau d'abstraction en modélisant les applications de traitement du signal et de l'image sous forme de graphe de flot de données et d'autre part, ils permettent d'exploiter efficacement les différents niveaux de parallélisme tâche, données, graphe. Ces deux modèles sont validés par plusieurs implémentations et comparaisons incluant deux applications de traitement de l'image du monde réel sur cluster CPU-GPU. / Since a decade, computing systems evolved to parallel and heterogeneous architectures. Composed of several nodes connected via a network and including heterogeneous processing units, clusters achieve high performances. To program these architectures, the user must rely on programming models such as MPI, OpenMP or CUDA. However, it is still difficult to conciliate productivity provided by abstracting the architectural specificities, and performances. In this thesis, we exploit the idea that a programming model specific to a particular domain of application can achieve these antagonist goals. In fact, by characterizing a family of application, it is possible to identify high level abstractions to efficiently model them. We propose two models specific to the implementation of signal and image processing applications on heterogeneous clusters. The first model is static. We enrich it with a task migration feature. The second model is dynamic, based on the StarPU runtime. Both models offer firstly a high level of abstraction by modeling image and signal applications as a data flow graph and secondly they efficiently exploit task, data and graph parallelisms. We validate these models with different implementations and comparisons including two real-world applications of images processing on a CPU-GPU cluster. Modèle de programmation Modèle de calcul DFG Traitement de signal et d'images Parallel programming models Parallel and heterogeneous cluster DFG model of computation Task parallelism data parallelism Digital signal processing 620
19	Finite element modeling of electromagnetic radiation and induced heat transfer in the human body Kim, Kyungjoo 24 September 2013 (has links) This dissertation develops adaptive hp-Finite Element (FE) technology and a parallel sparse direct solver enabling the accurate modeling of the absorption of Electro-Magnetic (EM) energy in the human head. With a large and growing number of cell phone users, the adverse health effects of EM fields have raised public concerns. Most research that attempts to explain the relationship between exposure to EM fields and its harmful effects on the human body identifies temperature changes due to the EM energy as the dominant source of possible harm. The research presented here focuses on determining the temperature distribution within the human body exposed to EM fields with an emphasis on the human head. Major challenges in accurately determining the temperature changes lie in the dependence of EM material properties on the temperature. This leads to a formulation that couples the BioHeat Transfer (BHT) and Maxwell equations. The mathematical model is formed by the time-harmonic Maxwell equations weakly coupled with the transient BHT equation. This choice of equations reflects the relevant time scales. With a mobile device operating at a single frequency, EM fields arrive at a steady-state in the micro-second range. The heat sources induced by EM fields produce a transient temperature field converging to a steady-state distribution on a time scale ranging from seconds to minutes; this necessitates the transient formulation. Since the EM material properties depend upon the temperature, the equations are fully coupled; however, the coupling is realized weakly due to the different time scales for Maxwell and BHT equations. The BHT equation is discretized in time with a time step reflecting the thermal scales. After multiple time steps, the temperature field is used to determine the EM material properties and the time-harmonic Maxwell equations are solved. The resulting heat sources are recalculated and the process continued. Due to the weak coupling of the problems, the corresponding numerical models are established separately. The BHT equation is discretized with H¹ conforming elements, and Maxwell equations are discretized with H(curl) conforming elements. The complexity of the human head geometry naturally leads to the use of tetrahedral elements, which are commonly employed by unstructured mesh generators. The EM domain, including the head and a radiating source, is terminated by a Perfectly Matched Layer (PML), which is discretized with prismatic elements. The use of high order elements of different shapes and discretization types has motivated the development of a general 3D hp-FE code. In this work, we present new generic data structures and algorithms to perform adaptive local refinements on a hybrid mesh composed of different shaped elements. A variety of isotropic and anisotropic refinements that preserve conformity of discretization are designed. The refinement algorithms support one- irregular meshes with the constrained approximation technique. The algorithms are experimentally proven to be deadlock free. A second contribution of this dissertation lies with a new parallel sparse direct solver that targets linear systems arising from hp-FE methods. The new solver interfaces to the hierarchy of a locally refined mesh to build an elimination ordering for the factorization that reflects the h-refinements. By following mesh refinements, not only the computation of element matrices but also their factorization is restricted to new elements and their ancestors. The solver is parallelized by exploiting two-level task parallelism: tasks are first generated from a parallel post-order tree traversal on the assembly tree; next, those tasks are further refined by using algorithms-by-blocks to gain fine-grained parallelism. The resulting fine-grained tasks are asynchronously executed after their dependencies are analyzed. This approach effectively reduces scheduling overhead and increases flexibility to handle irregular tasks. The solver outperforms the conventional general sparse direct solver for a class of problems formulated by high order FEs. Finally, numerical results for a 3D coupled BHT with Maxwell equations are presented. The solutions of this Maxwell code have been verified using the analytic Mie series solutions. Starting with simple spherical geometry, parametric studies are conducted on realistic head models for a typical frequency band (900 MHz) of mobile phones. / text hp-FEM Hybrid mesh Local mesh refinement algorithms Electromagnetics Specific Absorption Rate Dielectric heating Gaussian elimination Directed Acyclic Graph Direct method LU Multi-core Multi-frontal OpenMP Sparse matrix Supernodes Task parallelism Unassembled HyperMatrix GPU Dense linear algebra Algorithms-by-blocks Heterogeneous architectures BLAS

Search results