• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 31
  • 24
  • 18
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 84
  • 18
  • 14
  • 13
  • 13
  • 12
  • 12
  • 12
  • 11
  • 10
  • 10
  • 10
  • 9
  • 8
  • 8
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Influência da intrusão marinha numa ETAR : o caso da ETAR do Ave

Hernández Manzano, Alejandro January 2012 (has links)
Tese de mestrado integrado. Engenharia Civil (Área de Especialização de Hidráulica). Faculdade de Engenharia. Universidade do Porto. 2012
42

Ordonnancement de processus légers sur architectures multiprocesseurs hiérarchiques : BubbleSched, une approche exploitant la structure du parallélisme des applications

Thibault, Samuel 06 December 2007 (has links) (PDF)
La tendance des constructeurs pour le calcul scientifique est à l'imbrication de technologies permettant un degré de parallélisme toujours plus fort au sein d'une même machine : architecture NUMA, puces multicœurs, SMT. L'efficacité de l'exécution d'une application parallèle irrégulière sur de telles machines hiérarchiques repose alors sur la qualité de l'ordonnancement des tâches et du placement des données, pour éviter le plus possible les pénalités NUMA et les défauts de cache. Les systèmes d'exploitation actuels, pris au dépourvu car trop généralistes, laissent les concepteurs d'application contraints à « câbler » leurs programmes pour une machine donnée.<br /><br />Dans cette thèse, pour garantir une certaine portabilité des performances, nous définissons la notion de /bulle/ permettant d'exprimer la nature structurée du parallélisme du calcul, et nous modélisons l'architecture de la machine cible par une hiérarchie de listes de tâches. Une interface de programmation et des outils de débogage de haut niveau permettent alors de développer simplement des ordonnanceurs dédiés, efficaces et portables. Différents ordonnanceurs mettant en œuvre des approches variées ont été développés, en partie notamment par des stagiaires encadrés au sein de l'équipe, ce qui montre à la fois la puissance et la simplicité de l'interface. C'est ainsi une véritable plate-forme de développement et d'expérimentation d'ordonnanceurs à bulles qui a été intégrée au sein de la bibliothèque de threads utilisateur marcel. Le support OpenMP du compilateur GCC, GOMP, a été étendu pour utiliser cette bibliothèque et exprimer la nature structurée des sections parallèles imbriquées à l'aide de bulles. Avec la couche de compatibilité POSIX de marcel, ces supports ont permis de tester les différents ordonnanceurs à bulles développés, sur différentes applications. Les gains obtenus, de l'ordre de 20 à 40%, montrent l'intérêt de notre approche.
43

Ordonnancement hybride statique-dynamique en algèbre linéaire creuse pour de grands clusters de machines NUMA et multi-cœurs

Faverge, Mathieu 07 December 2009 (has links) (PDF)
Les nouvelles architectures de calcul intensif intègrent de plus en plus de microprocesseurs qui eux-mêmes intègrent un nombre croissant de cœurs de calcul. Cette multiplication des unités de calcul dans les architectures ont fait apparaître des topologies fortement hiérarchiques. Ces architectures sont dites NUMA. Les algorithmes de simulation numérique et les solveurs de systèmes linéaires qui en sont une brique de base doivent s'adapter à ces nouvelles architectures dont les accès mémoire sont dissymétriques. Nous proposons dans cette thèse d'introduire un ordonnancement dynamique adapté aux architectures NUMA dans le solveur PaStiX. Les structures de données du solveur, ainsi que les schémas de communication ont dû être modifiés pour répondre aux besoins de ces architectures et de l'ordonnancement dynamique. Nous nous sommes également intéressés à l'adaptation dynamique du grain de calcul pour exploiter au mieux les architectures multi-cœurs et la mémoire partagée. Ces développements sont ensuite validés sur un ensemble de cas tests sur différentes architectures.
44

De l'exécution structurée d'applications scientifiques OpenMP sur les architectures hiérarchiques.

Broquedis, François 09 December 2010 (has links) (PDF)
Le domaine applicatif de la simulation numérique requiert toujours plus de puissance de calcul. La technologie multicœur aide à satisfaire ces besoins mais impose toutefois de nouvelles contraintes aux programmeurs d'applications scientifiques qu'ils devront respecter s'ils souhaitent en tirer la quintessence. En particulier, il devient plus que jamais nécessaire de structurer le parallélisme des applications pour s'adapter au relief imposé par la hiérarchie mémoire des architectures multicœurs. Les approches existantes pour les programmer ne tiennent pas compte de cette caractéristique, et le respect de la structure du parallélisme reste à la charge du programmeur. Il reste de ce fait très difficile de développer une application qui soit à la fois performante et portable.La contribution de cette thèse s'articule en trois axes. Il s'agit dans un premier temps de s'appuyer sur le langage OpenMP pour générer du parallélisme structuré, et de permettre au programmeur de transmettre cette structure au support exécutif ForestGOMP. L'exécution structurée de ces flots de calcul est ensuite laissée aux ordonnanceurs Cacheet Memory développés au cours de cette thèse, permettant respectivement de maximiser la réutilisation des caches partagés et de maximiser la bande passante mémoire accessible par les programmes OpenMP. Enfin, nous avons étudié la composition de ces ordonnanceurs, et plus généralement de bibliothèques parallèles, en considérant cette voie comme une piste sérieuse pour exploiter efficacement les multiples unités de calcul des architectures multicœurs.Les gains obtenus sur des applications scientifiques montrent l'intérêt d'une communication forte entre l'application et le support exécutif, permettant l'ordonnancement dynamique et portable de parallélisme structuré sur les architectures hiérarchiques.
45

The Numan tradition and its uses in the literature Rome's 'Golden Age' /

Otis, Lise. January 2001 (has links)
This dissertation presents a critical analysis of literary texts that recount fully or briefly the life and legend of King Numa Pompilius. Focusing on the 'Golden Age', it comprises the Numan accounts of Cicero, Livy, Dionysius of Halicarnassus and Ovid. These authors lived at a time when Rome was trying to reconcile for herself and for her subjects the price of her military world domination with the belief in her foreordained supremacy. This reconciliation was to be achieved by a reacquaintance with the Roman ancestral values whose observance had merited Rome her dominion and whose neglect had driven the state to civil war. The question of Roman national identity is at the heart of the Numan accounts of the chosen prose-writers. In his portrayal of Numa, who combines the civilizing virtues of classical Athens with native Roman virtue, Cicero offers a rebuttal for Greek critics who questioned Rome's supremacy because of her lack of civilizing virtues. Livy investigates the leading causes of Rome's world domination and identifies the national values and institutions that many generations of leaders forged. Numa is one such leader, having established laws, religious rite and a peaceful way of life. Dionysius represents Numa as the Greek ideal of kingship in order to establish for the Greek world the excellence of the Roman national identity founded on Greek virtue. The Numan accounts of Livy and Dionysius, composed in Augustus' principate, do not draw direct parallels between Numa and Augustus, although the narration sometimes suggests a special relevance to Augustan rule. Finally, Ovid, the only poet, recounting traditional Numan tales, offers analogies and allegories of certain Augustan ideas and measures that may be seen to flatter the ruler.
46

Regulation of Asymmetric Cell Divisions in the Developing Epidermis

Poulson, Nicholas January 2012 (has links)
<p>During development, oriented cell divisions are crucial for correctly organizing and shaping a tissue. Mitotic spindle orientation can be coupled with cell fate decisions to provide cellular diversity through asymmetric cell divisions (ACDs), in which the division of a progenitor cell results in two daughters with different cell fates. Proper tissue morphogenesis relies on the coupling of these two phenomena being highly regulated. The development of the mouse epidermis provides a powerful system in which to study the many levels that regulate ACDs. Within the basal layer of the epidermis, both symmetric and asymmetric cell divisions occur. While symmetric divisions allow for an increase in surface area and progenitor cell number, asymmetric divisions drive the stratification of the epidermis, directly contributing additional cell layers (Lechler and Fuchs 2005; Poulson and Lechler 2010; Williams, Beronja et al. 2011). </p><p>Utilizing genetic lineage tracing to label individual basal cells I show that individual basal cells can undergo both symmetric and asymmetric divisions. Therefore, the balance of symmetric:asymmetric divisions is provided by the sum of individual cells' choices. In addition, I define two control points for determining a cell's mode of division. First is the expression of the mInscuteable gene, which is sufficient to drive ACDs. However, there is robust control of division orientation as excessive ACDs are prevented by a change in the localization of NuMA, an effector of spindle orientation. Finally, I show that p63, a transcriptional regulator of stratification, does not control either of these processes, rather it controls ACD indirectly by promoting cell polarity. </p><p>Given the robust control on NuMA localization to prevent excess ACDs, I sought to determine how targeting of NuMA to the cortex is regulated. First, I determined which regions within the protein were necessary and sufficient for cortical localization. NuMA is a large coiled- coil protein that binds many factors important for ACDs, which include but are not limited to: microtubules, 4.1, and LGN. Interestingly, while the LGN binding domain was necessary, it was not sufficient for proper NuMA localization at the cortex. However, a fragment of NuMA containing both the 4.1 and LGN binding domains was able to localize to the cortex. Additionally, the NuMA-binding domain of 4.1 was able to specifically disrupt NuMA localization at the cortex. These data suggested an important role for a NuMA-4.1 interaction at the cortex. While the 4.1 binding domain was not necessary for the cortical localization of NuMA, it was important for the overall stability of NuMA at the cortex. I hypothesize that 4.1 acts to anchor/stabilize NuMA at the cortex to provide resistance against pulling forces on the mitotic spindle to ensure proper spindle orientation.</p><p>Finally, to determine if post-translational modifications of NuMA could regulate its localization I tested the importance of a conserved Cdk-1 phosphorylation site. Interestingly, a non-phosphorylatable form of NuMA localized predominately to the cortex while the phosphomimetic protein localized strongly to spindle poles. In agreement with these data, use of a CDK-1 inhibitor was able to enhance the cortical localization of NuMA. Unexpectedly, the non-phosphorylatable form of NuMA did not require LGN to localize to the cortex. Additionally, restoration of cortical localization of the phosphomimetic form of NuMA was accomplished by the overexpression of either LGN or 4.1. Thus, phosphorylation of NuMA may alter its overall affinity for the cortex. </p><p>Overall, my studies highlight two important regulatory mechanisms controlling asymmetric cell division in the epidermis. Additionally, I show a novel role for the interaction between NuMA and 4.1 in providing stability at the cortex. This will ultimately provide a framework for analysis of how external cues control the important choice between asymmetric and symmetric cell divisions.</p> / Dissertation
47

Contributions au contrôle de l'affinité mémoire sur architectures multicoeurs et hiérarchiques

Ribeiro, Christiane 29 June 2011 (has links) (PDF)
Les plates-formes multi-coeurs avec un accès mémoire non uniforme (NUMA) sont devenu des ressources usuelles de calcul haute performance. Dans ces plates-formes, la mémoire partagée est constituée de plusieurs bancs de mémoires physiques organisés hiérarchiquement. Cette hiérarchie est également constituée de plusieurs niveaux de mémoires caches et peut être assez complexe. En raison de cette complexité, les coûts d'accès mémoire peuvent varier en fonction de la distance entre le processeur et le banc mémoire accédé. Aussi, le nombre de coeurs est très élevé dans telles machines entraînant des accès mémoire concurrents. Ces accès concurrents conduisent à des ponts chauds sur des bancs mémoire, générant des problèmes d'équilibrage de charge, de contention mémoire et d'accès distants. Par conséquent, le principal défi sur les plates-formes NUMA est de réduire la latence des accès mémoire et de maximiser la bande passante. Dans ce contexte, l'objectif principal de cette thèse est d'assurer une portabilité des performances évolutives sur des machines NUMA multi-coeurs en contrôlant l'affinité mémoire. Le premier aspect consiste à étudier les caractéristiques des plates-formes NUMA que sont à considérer pour contrôler efficacement les affinités mémoire, et de proposer des mécanismes pour tirer partie de telles affinités. Nous basons notre étude sur des benchmarks et des applications de calcul scientifique ayant des accès mémoire réguliers et irréguliers. L'étude de l'affinité mémoire nous a conduit à proposer un environnement pour gérer le placement des données pour les différents processus des applications. Cet environnement s'appuie sur des informations de compilation et sur l'architecture matérielle pour fournir des mécanismes à grains fins pour contrôler le placement. Ensuite, nous cherchons à fournir des solutions de portabilité des performances. Nous entendons par portabilité des performances la capacité de l'environnement à apporter des améliorations similaires sur des plates-formes NUMA différentes. Pour ce faire, nous proposons des mécanismes qui sont indépendants de l'architecture machine et du compilateur. La portabilité de l'environnement est évaluée sur différentes plates-formes à partir de plusieurs benchmarks et des applications numériques réelles. Enfin, nous concevons des mécanismes d'affinité mémoire qui peuvent être facilement adaptés et utilisés dans différents systèmes parallèles. Notre approche prend en compte les différentes structures de données utilisées dans les différentes applications afin de proposer des solutions qui peuvent être utilisées dans différents contextes. Toutes les propositions développées dans ce travail de recherche sont mises en œuvre dans une framework nommée Minas (Memory Affinity Management Software). Nous avons évalué l'adaptabilité de ces mécanismes suivant trois modèles de programmation parallèle à savoir OpenMP, Charm++ et mémoire transactionnelle. En outre, nous avons évalué ses performances en utilisant plusieurs benchmarks et deux applications réelles de géophysique.
48

Multithreaded PDE Solvers on Non-Uniform Memory Architectures

Nordén, Markus January 2006 (has links)
A trend in parallel computer architecture is that systems with a large shared memory are becoming more and more popular. A shared memory system can be either a uniform memory architecture (UMA) or a cache coherent non-uniform memory architecture (cc-NUMA). In the present thesis, the performance of parallel PDE solvers on cc-NUMA computers is studied. In particular, we consider the shared namespace programming model, represented by OpenMP. Since the main memory is physically, or geographically distributed over several multi-processor nodes, the latency for local memory accesses is smaller than for remote accesses. Therefore, the geographical locality of the data becomes important. The focus of the present thesis is to study multithreaded PDE solvers on cc-NUMA systems, in particular their memory access pattern with respect to geographical locality. The questions posed are: (1) How large is the influence on performance of the non-uniformity of the memory system? (2) How should a program be written in order to reduce this influence? (3) Is it possible to introduce optimizations in the computer system for this purpose? The main conclusion is that geographical locality is important for performance on cc-NUMA systems. This is shown experimentally for a broad range of PDE solvers as well as theoretically using a model involving characteristics of computer systems and applications. Geographical locality can be achieved through migration directives that are inserted by the programmer or — possibly in the future — automatically by the compiler. On some systems, it can also be accomplished by means of transparent, hardware initiated migration and replication. However, a necessary condition that must be fulfilled if migration is to be effective is that the memory access pattern must not be "speckled", i.e. as few threads as possible shall make accesses to each memory page. We also conclude that OpenMP is competitive with MPI on cc-NUMA systems if care is taken to get a favourable data distribution.
49

Automatic task and data mapping in shared memory architectures / Mapeamento automático de processos e dados em arquiteturas de memória compartilhada

Diener, Matthias January 2015 (has links)
Arquiteturas paralelas modernas têm hierarquias de memória complexas, que consistem de vários níveis de memórias cache privadas e compartilhadas, bem como Non-Uniform Memory Access (NUMA) devido a múltiplos controladores de memória por sistema. Um dos grandes desafios dessas arquiteturas é melhorar a localidade e o balanceamento de acessos à memória de tal forma que a latência média de acesso à memória é reduzida. Dessa forma, o desempenho e a eficiência energética de aplicações paralelas podem ser melhorados. Os acessos podem ser melhorados de duas maneiras: (1) processos que acessam dados compartilhados (comunicação entre processos) podem ser alocados em unidades de execução próximas na hierarquia de memória, a fim de melhorar o uso das caches. Esta técnica é chamada de mapeamento de processos. (2) Mapear as páginas de memória que cada processo acessa ao nó NUMA que ele está sendo executado, assim, pode-se reduzir o número de acessos a memórias remotas em arquiteturas NUMA. Essa técnica é conhecida como mapeamento de dados. Para melhores resultados, os mapeamentos de processos e dados precisam ser realizados de forma integrada. Trabalhos anteriores nesta área executam os mapeamentos separadamente, o que limita os ganhos que podem ser alcançados. Além disso, a maioria dos mecanismos anteriores exigem operações caras, como traços de acessos à memória, para realizar o mapeamento, além de exigirem mudanças no hardware ou na aplicação paralela. Estes mecanismos não podem ser considerados soluções genéricas para o problema de mapeamento. Nesta tese, fazemos duas contribuições principais para o problema de mapeamento. Em primeiro lugar, nós introduzimos um conjunto de métricas e uma metodologia para analisar aplicações paralelas, a fim de determinar a sua adequação para um melhor mapeamento e avaliar os possíveis ganhos que podem ser alcançados através desse mapeamento otimizado. Em segundo lugar, propomos um mecanismo que executa o mapeamento de processos e dados online. Este mecanismo funciona no nível do sistema operacional e não requer alterações no hardware, os códigos fonte ou bibliotecas. Uma extensa avaliação com múltiplos conjuntos de carga de trabalho paralelos mostram consideráveis melhorias em desempenho e eficiência energética. / Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to a Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways to improve the memory affinity: First, by mapping tasks that share data (communicate) to cores with a shared cache, cache usage and communication performance are improved. Second, by mapping memory pages to memory controllers that perform the most accesses to them and are not overloaded, the average cost of accesses is reduced. We call these two techniques task mapping and data mapping, respectively. For optimal results, task and data mapping need to be performed in an integrated way. Previous work in this area performs the mapping only separately, which limits the gains that can be achieved. Furthermore, most previous mechanisms require expensive operations, such as communication or memory access traces, to perform the mapping, require changes to the hardware or to the parallel application, or use a simple static mapping. These mechanisms can not be considered generic solutions for the mapping problem. In this thesis, we make two contributions to the mapping problem. First, we introduce a set of metrics and a methodology to analyze parallel applications in order to determine their suitability for an improved mapping and to evaluate the possible gains that can be achieved using an optimized mapping. Second, we propose two automatic mechanisms that perform task mapping and combined task/data mapping, respectively, during the execution of a parallel application. These mechanisms work on the operating system level and require no changes to the hardware, the applications themselves or their runtime libraries. An extensive evaluation with parallel applications from multiple benchmark suites as well as real scientific applications shows substantial performance and energy efficiency improvements that are significantly higher than simple mechanisms and previous work, while maintaining a low overhead.
50

Online thread and data mapping using the memory management unit / Mapeamento dinâmico de threads e dados usando a unidade de gerência de memória

Cruz, Eduardo Henrique Molina da January 2016 (has links)
Conforme o paralelismo a nível de threads aumenta nas arquiteturas modernas devido ao aumento do número de núcleos por processador e processadores por sistema, a complexidade da hierarquia de memória também aumenta. Tais hierarquias incluem diversos níveis de caches privadas ou compartilhadas e tempo de acesso não uniforme à memória. Um desafio importante em tais arquiteturas é a movimentação de dados entre os núcleos, caches e bancos de memória primária, que ocorre quando um núcleo realiza uma transação de memória. Neste contexto, a redução da movimentação de dados é um dos pilares para futuras arquiteturas para manter o aumento de desempenho e diminuir o consumo de energia. Uma das soluções adotadas para reduzir a movimentação de dados é aumentar a localidade dos acessos à memória através do mapeamento de threads e dados. Mecanismos de mapeamento do estado-da-arte aumentam a localidade de memória mapeando threads que compartilham um grande volume de dados em núcleos próximos na hierarquia de memória (mapeamento de threads), e mapeando os dados em bancos de memória próximos das threads que os acessam (mapeamento de dados). Muitas propostas focam em mapeamento de threads ou dados separadamente, perdendo oportunidades de ganhar desempenho. Outras propostas dependem de traços de execução para realizar um mapeamento estático, que podem impor uma sobrecarga alta e não podem ser usados em aplicações cujos comportamentos de acesso à memória mudam em diferentes execuções. Há ainda propostas que usam amostragem ou informações indiretas sobre o padrão de acesso à memória, resultando em informação imprecisa sobre o acesso à memória. Nesta tese de doutorado, são propostas soluções inovadoras para identificar um mapeamento que otimize o acesso à memória fazendo uso da unidade de gerência de memória para monitor os acessos à memória. As soluções funcionam dinamicamente em paralelo com a execução da aplicação, detectando informações para o mapeamento de threads e dados. Com tais informações, o sistema operacional pode realizar o mapeamento durante a execução das aplicações, não necessitando de conhecimento prévio sobre o comportamento da aplicação. Como as soluções funcionam diretamente na unidade de gerência de memória, elas podem monitorar a maioria dos acessos à memória com uma baixa sobrecarga. Em arquiteturas com TLB gerida por hardware, as soluções podem ser implementadas com pouco hardware adicional. Em arquiteturas com TLB gerida por software, algumas das soluções podem ser implementadas sem hardware adicional. As soluções aqui propostas possuem maior precisão que outros mecanismos porque possuem acesso a mais informações sobre o acesso à memória. Para demonstrar os benefícios das soluções propostas, elas são avaliadas com uma variedade de aplicações usando um simulador de sistema completo, uma máquina real com TLB gerida por software, e duas máquinas reais com TLB gerida por hardware. Na avaliação experimental, as soluções reduziram o tempo de execução em até 39%. O ganho de desempenho se deu por uma redução substancial da quantidade de faltas na cache, e redução do tráfego entre processadores. / As thread-level parallelism increases in modern architectures due to larger numbers of cores per chip and chips per system, the complexity of their memory hierarchies also increase. Such memory hierarchies include several private or shared cache levels, and Non-Uniform Memory Access nodes with different access times. One important challenge for these architectures is the data movement between cores, caches, and main memory banks, which occurs when a core performs a memory transaction. In this context, the reduction of data movement is an important goal for future architectures to keep performance scaling and to decrease energy consumption. One of the solutions to reduce data movement is to improve memory access locality through sharing-aware thread and data mapping. State-of-the-art mapping mechanisms try to increase locality by keeping threads that share a high volume of data close together in the memory hierarchy (sharing-aware thread mapping), and by mapping data close to where its accessing threads reside (sharing-aware data mapping). Many approaches focus on either thread mapping or data mapping, but perform them separately only, losing opportunities to improve performance. Some mechanisms rely on execution traces to perform a static mapping, which have a high overhead and can not be used if the behavior of the application changes between executions. Other approaches use sampling or indirect information about the memory access pattern, resulting in imprecise memory access information. In this thesis, we propose novel solutions to identify an optimized sharing-aware mapping that make use of the memory management unit of processors to monitor the memory accesses. Our solutions work online in parallel to the execution of the application and detect the memory access pattern for both thread and data mappings. With this information, the operating system can perform sharing-aware thread and data mapping during the execution of the application, without any prior knowledge of their behavior. Since they work directly in the memory management unit, our solutions are able to track most memory accesses performed by the parallel application, with a very low overhead. They can be implemented in architectures with hardwaremanaged TLBs with little additional hardware, and some can be implemented in architectures with software-managed TLBs without any hardware changes. Our solutions have a higher accuracy than previous mechanisms because they have access to more accurate information about the memory access behavior. To demonstrate the benefits of our proposed solutions, we evaluate them with a wide variety of applications using a full system simulator, a real machine with software-managed TLBs, and a trace-driven evaluation in two real machines with hardware-managed TLBs. In the experimental evaluation, our proposals were able to reduce execution time by up to 39%. The improvements happened to a substantial reduction in cache misses and interchip interconnection traffic.

Page generated in 0.4824 seconds