Global ETD Search

11	A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms / Um sistema de escalonamento dinâmico e tuning em tempo de execução para plataformas desktop heterogêneas de múltiplos núcleos
Binotto, Alécio Pedro Delazari, January 2011
A modern personal computer can now be considered a one-node heterogeneous cluster that simultaneously processes tasks from several applications. It can be composed of asymmetric Processing Units (PUs), such as the multi-core Central Processing Unit (CPU), the many-core Graphics Processing Unit (GPU), which has become one of the main co-processors contributing to high-performance computing, and other PUs. In this way, a powerful heterogeneous execution platform is built on a desktop for data-intensive calculations.
From the perspective of this thesis, distributing the workload over the PUs plays a key role in improving application performance and exploiting this heterogeneity. This is challenging because the execution cost of a task on a PU is non-deterministic and can be affected by a number of parameters not known a priori, such as the size of the problem domain and the precision of the solution. Within this scope, this doctoral research introduces a context-aware runtime and performance tuning system based on a trade-off between reducing the execution time of the applications, through appropriate dynamic scheduling of high-level tasks, and the cost of computing that scheduling, applied to a platform composed of a CPU and GPUs. The approach combines a model for a first scheduling, based on an offline task performance profile benchmark, with a runtime model that keeps track of the tasks' real execution time and efficiently schedules new instances of the high-level tasks dynamically over the CPU/GPU execution platform. To that end, a set of heuristics is proposed to schedule tasks over one CPU and one GPU, together with a generic and efficient scheduling strategy that considers several processing units. The proposed approach is applied in a case study using a CPU-GPU execution platform to compute iterative solvers for systems of linear equations, using a stencil code specially designed to exploit the characteristics of modern GPUs. The solution uses the number of unknowns as the main parameter for the assignment decision. By scheduling tasks to both the CPU and the GPU, a performance gain of 21.77% is achieved in comparison to the static assignment of all tasks to the GPU (which is what current programming models, such as OpenCL and CUDA for Nvidia, do), with a scheduling error of only 0.25% compared to exhaustive search.
Parallel processing; Microelectronics; Image processing; High-performance processing; High-performance computing; Scheduling; Dynamic load-balancing; Heterogeneous systems; Graphics processors; Solvers for systems of linear equations
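As a rough illustration of the scheduling idea summarized in the abstract above (an offline profile for the first assignment, corrected at run time by measured execution times, with the number of unknowns as the deciding parameter), here is a minimal Python sketch. It is not the thesis's heuristics, which the abstract does not spell out. The class `ProcessingUnit`, the function `schedule`, the linear scaling of cost with problem size, the blending factor `alpha`, and the greedy earliest-finish rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessingUnit:
    """Hypothetical model of one PU (for example a CPU or a GPU)."""
    name: str
    # Offline profile: number of unknowns -> measured run time in seconds.
    profile: dict[int, float]
    busy_until: float = 0.0  # estimated time at which queued work finishes

    def estimate(self, unknowns: int) -> float:
        """Estimate the cost from the nearest profiled size (assumed linear scaling)."""
        nearest = min(self.profile, key=lambda n: abs(n - unknowns))
        return self.profile[nearest] * unknowns / nearest

    def record(self, unknowns: int, measured: float, alpha: float = 0.5) -> None:
        """Blend a measured run time into the profile (the online correction)."""
        old = self.estimate(unknowns)
        self.profile[unknowns] = (1 - alpha) * old + alpha * measured


def schedule(unknowns: int, units: list[ProcessingUnit]) -> ProcessingUnit:
    """Greedy rule (an assumption): pick the unit with the earliest estimated finish."""
    best = min(units, key=lambda u: u.busy_until + u.estimate(unknowns))
    best.busy_until += best.estimate(unknowns)
    return best


if __name__ == "__main__":
    # Made-up profile numbers, purely for demonstration.
    cpu = ProcessingUnit("CPU", {1_000: 0.010, 100_000: 1.5})
    gpu = ProcessingUnit("GPU", {1_000: 0.050, 100_000: 0.30})
    for n in (1_000, 100_000, 100_000):
        print(n, "->", schedule(n, [cpu, gpu]).name)
```

With these invented profile values, small systems go to the CPU and large ones to the GPU; calling `record` after each run would shift later decisions as real timings arrive.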
12	Entwurf einer fehlerüberwachten Modellreduktion basierend auf Krylov-Unterraumverfahren und Anwendung auf ein strukturmechanisches Modell / Implementation of an error-controlled model reduction based on Krylov-subspace methods and application to a mechanical model
Bernstein, David, 17 October 2014
FEM-MKS coupling (the coupling of finite element and multibody simulation) requires model order reduction methods that reproduce the frequency response of mechanical structures with a smaller, reduced representation of the original system.
Most rational Krylov-subspace methods are based on Arnoldi algorithms. They allow the frequency response to be represented over freely selectable, wide frequency ranges. The subject of this thesis is the implementation of an error-controlled model order reduction based on Krylov-subspace methods and its application to a mechanical model. Based on the MORPACK software, a first-order Arnoldi function is extended by an interpolative start vector, the elimination of rigid-body motion, and reorthogonalization. Incorporating these operations, a rational, interpolative Second Order Arnoldi (SOAR) method is designed; a rational Block-SOAR method proves inferior in comparison. Interpolative equal weighting is used. The first-order Arnoldi method requires less computational effort, whereas the rational, interpolative SOAR yields a smaller reduced system dimension for the same frequency range of interest. The methods are applied to models of a frame, a gear case, and a drive shaft. For error control, an H2 integration range is defined based on the eigenfrequencies, and the relative H2 error is computed from the frequency response function. For solving linear systems of equations in Matlab, solver functions based on permutation and factorization are implemented.
rational interpolative SOAR; second order Arnoldi; model order reduction; error control; H2 error; systems of linear equations; Matlab; application; mechanical systems
ddc:620; rvk:SK 910; rvk:UF 1500; order reduction; rational; error estimation; system of linear equations; MATLAB; application; mechanical system
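To make the Arnoldi step mentioned in the abstract above more concrete, here is a minimal pure-Python sketch of the standard (non-rational, first-order) Arnoldi iteration with an optional second Gram-Schmidt pass as reorthogonalization. It is not MORPACK code and not the thesis's rational or SOAR variants, which additionally involve frequency shifts and second-order systems. The function names `arnoldi`, `matvec`, and `dot`, and the breakdown tolerance `1e-12`, are assumptions chosen for illustration.

```python
import math


def dot(x, y):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def arnoldi(A, b, m, reorthogonalize=True):
    """Return an orthonormal basis of span{b, Ab, ..., A^(m-1) b} as a list of vectors."""
    norm_b = math.sqrt(dot(b, b))
    V = [[x / norm_b for x in b]]
    for _ in range(m - 1):
        w = matvec(A, V[-1])
        # Modified Gram-Schmidt; the optional second pass is the reorthogonalization.
        for _pass in range(2 if reorthogonalize else 1):
            for v in V:
                h = dot(w, v)
                w = [wi - h * vi for wi, vi in zip(w, v)]
        norm_w = math.sqrt(dot(w, w))
        if norm_w < 1e-12:  # breakdown: the Krylov subspace is already invariant
            break
        V.append([x / norm_w for x in w])
    return V


if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
    V = arnoldi(A, [1.0, 0.0, 0.0], m=3)
    print(len(V), "basis vectors")
```

A reduced model would then be obtained by projecting the system matrices onto the basis `V`; the sketch stops at the basis because the projection details depend on the system form used in the thesis.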
13	Entwurf einer fehlerüberwachten Modellreduktion basierend auf Krylov-Unterraumverfahren und Anwendung auf ein strukturmechanisches Modell
Bernstein, David, 04 June 2014
FEM-MKS coupling (the coupling of finite element and multibody simulation) requires model order reduction methods that reproduce the frequency response of mechanical structures with a smaller, reduced representation of the original system. Most rational Krylov-subspace methods are based on Arnoldi algorithms.
They allow the frequency response to be represented over freely selectable, wide frequency ranges. The subject of this thesis is the implementation of an error-controlled model order reduction based on Krylov-subspace methods and its application to a mechanical model. Based on the MORPACK software, a first-order Arnoldi function is extended by an interpolative start vector, the elimination of rigid-body motion, and reorthogonalization. Incorporating these operations, a rational, interpolative Second Order Arnoldi (SOAR) method is designed; a rational Block-SOAR method proves inferior in comparison. Interpolative equal weighting is used. The first-order Arnoldi method requires less computational effort, whereas the rational, interpolative SOAR yields a smaller reduced system dimension for the same frequency range of interest. The methods are applied to models of a frame, a gear case, and a drive shaft. For error control, an H2 integration range is defined based on the eigenfrequencies, and the relative H2 error is computed from the frequency response function. For solving linear systems of equations in Matlab, solver functions based on permutation and factorization are implemented.
Contents:
1. Introduction
  1.1. Motivation
  1.2. Context
  1.3. Structure of the thesis
2. Theory
  2.1. Simulation methods
    2.1.1. Finite element method
    2.1.2. Multibody simulation
    2.1.3. Coupling of the simulation methods
  2.2. State-space representation and reduction
  2.3. Krylov subspace methods
  2.4. First-order Arnoldi algorithms
  2.5. Second-order Arnoldi algorithms
  2.6. Correlation criteria
    2.6.1. Eigenfrequency-based criteria
    2.6.2. Eigenvector-based criteria
    2.6.3. Transfer-function-based criteria
    2.6.4. Error assessment
    2.6.5. Application to systems of very large dimension
3. Numerics of linear systems of equations
  3.1. Fundamentals
  3.2. Singularity of the coefficient matrix
    3.2.1. Boundary conditions of the system
    3.2.2. Use of a general diagonal perturbation
  3.3. Iterative solution methods
  3.4. Factorization methods
    3.4.1. Cholesky factorization
    3.4.2. LU factorization
    3.4.3. Fill-in reduction by permutation
    3.4.4. Conclusion
  3.5. Direct solution methods
  3.6. Use of external linear-system solvers
  3.7. Summary
4. Implementation
  4.1. Structure of MORPACK
  4.2. Requirements for reduction functions
  4.3. Properties and options of the KSM functions
    4.3.1. First-order Arnoldi function
    4.3.2. Rational SOAR functions
  4.4. Correlation criteria
    4.4.1. Eigenfrequency-based
    4.4.2. Eigenvector-based
    4.4.3. Transfer-function-based
  4.5. Solver functions for linear systems of equations
    4.5.1. Requirements and structure
    4.5.2. Use of the linear-system solvers
    4.5.3. Notes on implementing linear-system solvers
5. Application
  5.1. Test models
    5.1.1. Small test models
    5.1.2. Gear case
    5.1.3. Drive shaft
  5.2. Validation of the reduction methods on a small model
    5.2.1. Modified first-order Arnoldi function
    5.2.2. Rational SOAR functions
    5.2.3. Summary
  5.3. Application of the KSM to large models
    5.3.1. Gear case
    5.3.2. Drive shaft
  5.4. Evaluation
6. Summary and outlook
  6.1. Summary
  6.2. Outlook
info:eu-repo/classification/ddc/620; ddc:620
