31

Un modèle de programmation à grain fin pour la parallélisation de solveurs linéaires creux / A fine grain model programming for parallelization of sparse linear solver

Rossignon, Corentin 17 July 2015 (has links)
Solving large sparse linear systems is an essential part of numerical simulations, and these solves can account for up to 80% of a simulation's total computing time, so an efficient parallelization of sparse linear algebra kernels leads directly to better performance. In distributed memory, these kernels are usually parallelized by modifying the numerical scheme; in shared memory, a more efficient form of parallelism can be used instead. It is therefore important to exploit two levels of parallelism: a first level between the nodes of a cluster, and a second level inside each node. When using iterative methods in shared memory, task graphs naturally describe the parallelism, taking the work on one row of the matrix as the task granularity. Unfortunately, this granularity is too fine and does not yield good performance because of the overhead of the task scheduler. In this thesis, we study the granularity problem of task-graph parallelization. We propose to increase the granularity of the computational tasks by creating aggregates of tasks, which themselves become tasks. The set of these aggregates, together with the new dependencies between them, forms a coarser-grained graph that is then handed to a task scheduler to obtain better performance. Using the incomplete LU factorization of a sparse matrix as an example, we show the improvements brought by this method. We then focus on NUMA machines: for algorithms limited by memory bandwidth, it is worthwhile to reduce the NUMA effects of the architecture by placing the data explicitly. We show how to take these effects into account in a task-based runtime in order to improve the performance of a parallel program.
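The aggregation idea summarized above — grouping fine-grained tasks into coarser aggregates whose dependencies are inherited from their members — can be sketched in a few lines. This is a hypothetical illustration only, not the thesis's actual algorithm or code; the graph and the blocking heuristic are invented for the example:

```python
from collections import defaultdict

def coarsen(task_deps, aggregate_of):
    """Build a coarser task graph from fine-grained tasks.

    task_deps: dict mapping each task to the set of tasks it depends on.
    aggregate_of: dict mapping each task to its aggregate id.
    Returns a dict mapping each aggregate to the set of aggregates
    it depends on (intra-aggregate dependencies are dropped).
    """
    coarse = defaultdict(set)
    for task, deps in task_deps.items():
        agg = aggregate_of[task]
        coarse[agg]             # ensure every aggregate appears, even without deps
        for dep in deps:
            dep_agg = aggregate_of[dep]
            if dep_agg != agg:  # edges inside an aggregate disappear
                coarse[agg].add(dep_agg)
    return dict(coarse)

# Fine-grained graph: one task per matrix row, row i depends on row i-1.
fine = {i: ({i - 1} if i > 0 else set()) for i in range(8)}
# Group rows in blocks of 4 to enlarge the grain.
blocks = {i: i // 4 for i in range(8)}
print(coarsen(fine, blocks))  # {0: set(), 1: {0}}
```

The scheduler then sees two tasks with one dependency instead of eight tasks with seven, which is the overhead reduction the abstract describes.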
32

Ordonnancement hybride statique-dynamique en algèbre linéaire creuse pour de grands clusters de machines NUMA et multi-coeurs / Static-dynamic hybrid scheduling in sparse linear algebra for large clusters of NUMA multi-core machines

Faverge, Mathieu 07 December 2009 (has links)
New supercomputers incorporate ever more microprocessors, which themselves contain a growing number of computing cores. These architectures induce strongly hierarchical topologies and are called NUMA architectures. Sparse direct solvers are a basic building block of many numerical simulation algorithms, and they must be adapted to these new architectures with their non-uniform memory accesses. In this thesis we introduce a dynamic scheduler designed for NUMA architectures in the PaStiX solver. The data structures of the solver, as well as its communication patterns, have been modified to meet the needs of these architectures and of dynamic scheduling. We are also interested in dynamically adapting the computation grain to make the best use of multi-core architectures and shared memory. These developments are then validated on a set of test cases on different architectures.
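Dynamic scheduling of the kind described above typically rests on per-core ready queues with stealing from a neighbour when a queue runs dry. The toy sketch below shows only that policy; PaStiX's actual scheduler is far more elaborate, and the `Worker` class and task names here are invented for illustration:

```python
from collections import deque

class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.queue = deque()

def next_task(worker, workers):
    """Pop a local task, or steal from the most loaded worker."""
    if worker.queue:
        return worker.queue.popleft()        # local work first: good locality
    victim = max(workers, key=lambda w: len(w.queue))
    if victim.queue:
        return victim.queue.pop()            # steal from the opposite end
    return None

workers = [Worker(i) for i in range(3)]
workers[0].queue.extend(["t0", "t1"])
executed = []
while True:
    task = next_task(workers[2], workers)    # worker 2 starts idle
    if task is None:
        break
    executed.append(task)
print(executed)  # ['t1', 't0']
```

Stealing from the tail of the victim's queue is a common design choice: the victim keeps the tasks it enqueued most recently, which are the ones most likely to be hot in its caches.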
33

Mouvement de données et placement des tâches pour les communications haute performance sur machines hiérarchiques / Data movement and task placement for high-performance communication on hierarchical machines

Moreaud, Stéphanie 12 October 2011 (has links)
The emergence of multicore processors has made modern servers increasingly complex, with many cores, distributed memory banks, and multiple I/O buses. The execution time of parallel applications depends on the efficiency of the communication between the participating tasks, and on recent architectures the communication cost is strongly impacted by hardware characteristics such as NUMA and cache effects. In this thesis, we study and optimize high-performance communication on modern hierarchical architectures. We first evaluate the impact of hardware affinities on data movement, inside servers and across high-speed networks, for multiple transfer strategies, hardware technologies, and platforms. We then propose to take the affinities between hardware and communicating tasks into account inside communication libraries, so as to improve performance and ensure its portability. To do so, we either adapt task binding to the transfer method and the machine topology, or conversely adjust the data transfer strategies to a given task distribution. These approaches, integrated into major MPI implementations, significantly reduce communication costs and thereby improve application performance. The results highlight the need to take the hardware characteristics of modern machines into account in order to exploit them fully.
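Affinity-aware placement can be pictured as minimizing communication volume weighted by topological distance. The brute-force sketch below is purely illustrative: real communication libraries discover the topology at runtime (e.g. via hwloc) rather than using a hand-written distance matrix, and the matrices here are invented:

```python
from itertools import permutations

def placement_cost(mapping, comm, dist):
    """Total cost: pairwise communication volume times core distance."""
    return sum(comm[a][b] * dist[mapping[a]][mapping[b]]
               for a in range(len(comm)) for b in range(len(comm)))

def best_placement(comm, dist):
    """Exhaustively search task-to-core mappings (fine for tiny examples)."""
    n = len(comm)
    return min(permutations(range(n)),
               key=lambda m: placement_cost(m, comm, dist))

# Tasks 0 and 1 exchange a lot of data; cores 0 and 1 share a socket
# (distance 1) while core 2 sits on another NUMA node (distance 4).
comm = [[0, 10, 1],
        [10, 0, 1],
        [1, 1, 0]]
dist = [[0, 1, 4],
        [1, 0, 4],
        [4, 4, 0]]
m = best_placement(comm, dist)
print(m)  # (0, 1, 2): the two chatty tasks land on the same socket
```

Exhaustive search is exponential, which is why practical systems use heuristics; the point here is only the cost model that task binding tries to minimize.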
34

Garbage collector for memory intensive applications on NUMA architectures / Ramasse-miette pour les applications avec forte utilisation de la mémoire sur architectures NUMA

Gidra, Lokesh 28 September 2015 (has links)
Large-scale multicore architectures create new challenges for garbage collectors (GCs). On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the GC: as it scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. In this thesis, we address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a purely distributed design, which tends to decrease parallelism and increase memory access imbalance, by allowing threads to steal from other nodes when they are idle. NumaGiC strives to strike a balance between local access, memory access balance, and parallelism. We compare NumaGiC with Parallel Scavenge and some of its incrementally improved variants on two different ccNUMA architectures running the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications with heap sizes ranging from 160 GB to 350 GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 94% over Parallel Scavenge and increases the performance of the collector itself by up to 5.4× over Parallel Scavenge. In terms of scalability of GC throughput with an increasing number of NUMA nodes, NumaGiC scales substantially better than Parallel Scavenge for all applications; for SPECjbb2005, which has the fewest inter-node object references of them all, NumaGiC scales almost linearly.
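The mostly-distributed policy — trace locally, forward remote references as messages, steal only when idle — can be caricatured with per-node work lists. This is a toy model with invented data structures, not NumaGiC's implementation, which operates on real heaps inside the JVM:

```python
from collections import deque

def trace(node_of, refs_of, roots, num_nodes):
    """Mark reachable objects while keeping each GC 'thread' on its own node.

    node_of: object -> NUMA node holding it; refs_of: object -> references.
    Remote references are forwarded to the owning node's queue instead of
    being chased directly; an idle node steals from the busiest one.
    """
    queues = [deque() for _ in range(num_nodes)]
    for r in roots:
        queues[node_of[r]].append(r)
    marked = set()
    while any(queues):
        for n in range(num_nodes):
            if not queues[n]:                     # idle: steal remote work
                victim = max(queues, key=len)
                if victim:
                    queues[n].append(victim.pop())
            while queues[n]:
                obj = queues[n].popleft()
                if obj in marked:
                    continue
                marked.add(obj)
                for ref in refs_of.get(obj, []):
                    home = node_of[ref]
                    # local refs are chased here; remote ones are "messaged"
                    queues[n if home == n else home].append(ref)
    return marked

node_of = {"a": 0, "b": 0, "c": 1}
refs_of = {"a": ["b", "c"], "c": []}
print(sorted(trace(node_of, refs_of, ["a"], 2)))  # ['a', 'b', 'c']
```

The tension the abstract describes is visible even here: forwarding keeps accesses local, while stealing trades some locality back for parallelism when a node runs out of work.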
35

Partitioning Strategy Selection for In-Memory Graph Pattern Matching on Multiprocessor Systems

Krause, Alexander, Kissinger, Thomas, Habich, Dirk, Voigt, Hannes, Lehner, Wolfgang 19 July 2023 (has links)
Pattern matching on large graphs is the foundation for a variety of application domains. The continuously increasing size of the underlying graphs requires highly parallel in-memory graph processing engines that must consider non-uniform memory access (NUMA) and concurrency issues to scale up on modern multiprocessor systems. To tackle these aspects, fine-grained graph partitioning becomes increasingly important. Hence, in this paper we present a classification of graph partitioning strategies and evaluate representative algorithms on medium- and large-scale NUMA systems. As a scalable pattern matching processing infrastructure, we leverage a data-oriented architecture that preserves data locality and minimizes concurrency-related bottlenecks on NUMA systems. Our in-depth evaluation reveals that the optimal partitioning strategy depends on a variety of factors; consequently, we derive a set of indicators for selecting the partitioning strategy best suited to a given graph and workload.
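Two of the simplest vertex-partitioning strategies such a classification covers are range and hash partitioning. A minimal sketch of each follows, purely for illustration; the paper evaluates considerably more sophisticated schemes on real graph data:

```python
def range_partition(vertices, parts):
    """Contiguous ranges: preserves locality of neighbouring vertex ids."""
    size = -(-len(vertices) // parts)          # ceiling division
    return [vertices[i * size:(i + 1) * size] for i in range(parts)]

def hash_partition(vertices, parts):
    """Modulo on vertex id: balances load, but scatters neighbourhoods."""
    buckets = [[] for _ in range(parts)]
    for v in vertices:
        buckets[v % parts].append(v)
    return buckets

vs = list(range(8))
print(range_partition(vs, 2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(hash_partition(vs, 2))   # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

The trade-off visible even in this toy example — locality versus balance — is exactly why the optimal strategy depends on the graph and the workload.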
36

The Role of Spindle Orientation in Epidermal Development and Homeostasis

Seldin, Lindsey January 2015 (has links)
Robust regulation of spindle orientation is essential for driving asymmetric cell divisions (ACDs), which generate cellular diversity within a tissue. During the development of the multilayered mammalian epidermis, mitotic spindle orientation in the proliferative basal cells is crucial not only for dictating daughter cell fate but also for initiating stratification of the entire tissue. A conserved protein complex, including LGN, Nuclear mitotic apparatus (NuMA) and dynein/dynactin, plays a key role in establishing proper spindle orientation during ACDs. Two of these proteins, NuMA and dynein, interact directly with astral microtubules (MTs) that emanate from the mitotic spindle. While the contribution of these MT-binding interactions to spindle orientation remains unclear, they implicate apical NuMA and dynein as strong candidates for the machinery required to transduce pulling forces onto the spindle to drive perpendicular spindle orientation.

In my work, I first investigated the requirements for the cortical recruitment of NuMA and dynein, which had never been thoroughly addressed. I revealed that NuMA is required to recruit the dynein/dynactin complex to the cell cortex of cultured epidermal cells. In addition, I found that interaction with LGN is necessary but not sufficient for cortical NuMA recruitment. This led me to examine the role of additional NuMA-interacting proteins in spindle orientation. Notably, I identified a role for the 4.1 protein family in stabilizing NuMA's association with the cell cortex using a FRAP (fluorescence recovery after photobleaching)-based approach. I also showed that NuMA's spindle orientation activity is perturbed in the absence of 4.1 interactions. This effect was demonstrated in culture using both a cortical NuMA/spindle alignment assay and a cell stretch assay. Interestingly, I also noted a significant increase in cortical NuMA localization as cells enter anaphase. I found that inhibition of Cdk1 or mutation of a single residue on NuMA mimics this effect. I also revealed that this anaphase localization is independent of LGN and 4.1 interactions, thus revealing two independent mechanisms responsible for NuMA cortical recruitment at different stages of mitosis.

After gaining a deeper understanding of how NuMA is recruited and stabilized at the cell cortex, I then sought to investigate how cortical NuMA functions during spindle orientation. NuMA contains binding domains in its N- and C-termini that facilitate its interactions with the molecular motor dynein and MTs, respectively. In addition to its known role in recruiting dynein, I was interested in determining whether NuMA's ability to interact directly with MTs is critical for its function in spindle orientation. Surprisingly, I revealed that direct interactions between NuMA and MTs are required for spindle orientation in cultured keratinocytes. I also discovered that NuMA can specifically interact with MT ends and remain attached to depolymerizing MTs. To test the role of NuMA/MT interactions in vivo, I generated mice with an epidermal-specific in-frame deletion of the NuMA MT-binding domain. I determined that this deletion causes randomization of spindle orientation in vivo, resulting in defective epidermal differentiation and barrier formation, as well as neonatal lethality. In addition, conditional deletion of the NuMA MT-binding domain in adult mice results in severe hair growth defects. I found that NuMA is required for proper spindle positioning in hair follicle matrix cells and that differentiation of matrix-derived progeny is disrupted when NuMA is mutated, thus revealing an essential role for spindle orientation in hair morphogenesis. Finally, I discovered hyperproliferative regions in the interfollicular epidermis of these adult mutant mice, which is consistent with a loss of ACDs and perturbed differentiation. Based on these data, I propose a novel mechanism for force generation during spindle positioning whereby cortically-tethered NuMA plays a critical dynein-independent role in coupling MT depolymerization energy with cortical tethering to promote robust spindle orientation accuracy.

Taken together, my work highlights the complexity of NuMA localization and demonstrates the importance of NuMA cortical stability for productive force generation during spindle orientation. In addition, my findings validate the direct role of NuMA in spindle positioning and reveal that spindle orientation is used reiteratively in multiple distinct cell populations during epidermal morphogenesis and homeostasis.
37

Um processo de geração automática de código paralelo para arquiteturas híbridas com afinidade de memória / An automatic parallel code generation process for hybrid architectures using memory affinity

Raeder, Mateus 27 August 2014 (has links)
Over the last years, technological advances have produced machines with several levels of parallelism, with a great impact on the high-performance computing area, allowing developers to further improve the performance of large-scale applications. In this context, clusters of multiprocessor machines with Non-Uniform Memory Access (NUMA) are a trend in parallel processing. In NUMA architectures the access time to a piece of data depends on where it is placed in memory, so managing data location is essential on this type of machine. Software for a cluster of NUMA machines must therefore exploit both the inter-node part (multicomputer, with distributed memory) and the intra-node part (multiprocessor, with shared memory) of the architecture. This kind of hybrid programming takes advantage of all the features NUMA architectures provide. However, rewriting a sequential application so that it correctly exploits the parallelism of the environment is not a trivial task, although it can be facilitated by an automated process. In this sense, our work presents an automatic parallel code generation process for hybrid architectures with which users do not need to know the low-level routines of parallel programming libraries. To this end, we developed a graphical tool in which users can dynamically and intuitively create their parallel models, making it possible to build parallel programs without being familiar with the libraries commonly used by high-performance computing professionals (such as MPI). Using the tool, the user draws a directed graph to indicate the number of processes (nodes of the graph) and the communication between them (edges), then inserts the sequential code of each process defined in the graphical interface, and the tool automatically generates the corresponding parallel code. Moreover, mappings of heavyweight processes and of memory were defined and tested on a cluster of NUMA machines, as well as a hybrid mapping. The tool was written in Java and generates C++ parallel code with MPI; it also applies memory affinity policies for NUMA machines through the Memory Affinity Interface (MAI) library. Several applications were developed with and without our model. The results show that the proposed mapping is valid, providing performance gains over the sequential versions and behaving very similarly to traditional parallel implementations.
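Generating a communication skeleton from such a process graph can be pictured as emitting one send/receive pair per edge. The sketch below is hypothetical: the thesis's tool emits real MPI/C++ code, while the graph and the emitted strings here are invented for illustration:

```python
def emit_skeleton(edges, num_procs):
    """Emit pseudo-MPI send/recv statements for each edge of a process graph.

    edges: iterable of (src, dst) process pairs drawn by the user.
    Returns one list of statements per process rank.
    """
    code = [[] for _ in range(num_procs)]
    for src, dst in edges:
        code[src].append(f"MPI_Send(buf, ..., dest={dst})")
        code[dst].append(f"MPI_Recv(buf, ..., source={src})")
    return code

# A 3-process pipeline drawn as a directed graph: 0 -> 1 -> 2
skeleton = emit_skeleton([(0, 1), (1, 2)], 3)
print(skeleton[1])  # ['MPI_Recv(buf, ..., source=0)', 'MPI_Send(buf, ..., dest=2)']
```

The user's sequential code for each process would then be spliced around these generated communication calls, which is the step that spares the user from learning the MPI API directly.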
38

Comportamento das curvas resultantes da interação das leituras visuais e digitais na triagem de auto-anticorpos (FAN) pela imunofluorescência indireta / Behavior of the curves resulting from the interaction of visual and digital readings in autoantibody (FAN) screening by indirect immunofluorescence

Francescantonio, Paulo Luiz Carvalho 12 May 2005 (has links)
The indirect immunofluorescence test is widely used in the clinical pathology laboratory. Among the tests in this group is the FAN test, the search for antibodies against components of the nucleus, nucleolus, cytoplasm, and mitotic apparatus. This test, used in autoantibody screening, presents several practical difficulties: it requires two highly specialized technicians to read the slides, which prevents automation of the process; standardization is poor; documentation is scarce; the fluorescence decays while the slide is being read; the optical system is not standardized; substrates vary greatly; and the result depends on the experience of the observer. These factors limit the use of the test to a small group of laboratories worldwide. This work aims at minimizing those difficulties by establishing basic knowledge of the behavior of the curves resulting from the interaction between visual and digital readings. The images read visually are captured digitally and the intensity of their fluorescent green is evaluated. We analyze the curves produced for five FAN patterns: homogeneous nuclear, centromeric speckled nuclear, mixed fine speckled nuclear with NuMA-1 mitotic apparatus, NuMA-2 mitotic spindle, and homogeneous nucleolar. The behavior of the curves shows that the digital readings of dilutions within a single titer of the same pattern can be differentiated in 96.7% of cases. In a second intra-pattern analysis, homologous dilutions across the different titers of the same pattern were distinguished in 99.2% of cases. The inter-pattern analysis, which assesses whether the calibration curve obtained with the homogeneous nuclear pattern can be used for the other patterns, showed 98.8% concordance. The creation of a new technology from this knowledge may cause what we call "technological unemployment", a phenomenon that can be aggravated if society is in a declining economic cycle. In Brazil, a set of public policies directs the Worker Support Fund toward worker qualification, minimizing the impact caused by the introduction of a new technology.
39

Online thread and data mapping using the memory management unit / Mapeamento dinâmico de threads e dados usando a unidade de gerência de memória

Cruz, Eduardo Henrique Molina da January 2016 (has links)
As thread-level parallelism increases in modern architectures due to larger numbers of cores per chip and chips per system, the complexity of their memory hierarchies also increases. Such memory hierarchies include several private or shared cache levels, and Non-Uniform Memory Access nodes with different access times. One important challenge for these architectures is the data movement between cores, caches, and main memory banks, which occurs when a core performs a memory transaction. In this context, the reduction of data movement is an important goal for future architectures to keep performance scaling and to decrease energy consumption. One of the solutions to reduce data movement is to improve memory access locality through sharing-aware thread and data mapping. 
State-of-the-art mapping mechanisms try to increase locality by keeping threads that share a high volume of data close together in the memory hierarchy (sharing-aware thread mapping), and by mapping data close to the threads that access it (sharing-aware data mapping). Many approaches focus on either thread mapping or data mapping alone, losing opportunities to improve performance. Some mechanisms rely on execution traces to perform a static mapping, which imposes a high overhead and cannot be used if the behavior of the application changes between executions. Other approaches use sampling or indirect information about the memory access pattern, resulting in imprecise memory access information. In this thesis, we propose novel solutions that identify an optimized sharing-aware mapping by using the memory management unit of the processor to monitor memory accesses. Our solutions work online, in parallel with the execution of the application, and detect the memory access pattern for both thread and data mapping. With this information, the operating system can perform sharing-aware thread and data mapping during the execution of the application, without any prior knowledge of its behavior. Since they work directly in the memory management unit, our solutions are able to track most memory accesses performed by the parallel application with very low overhead. They can be implemented in architectures with hardware-managed TLBs with little additional hardware, and some can be implemented in architectures with software-managed TLBs without any hardware changes. Our solutions achieve higher accuracy than previous mechanisms because they have access to more accurate information about the memory access behavior. 
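As an illustration of the sharing-aware thread mapping described above, the sketch below builds a thread-to-thread sharing matrix from `(thread, page)` access samples and greedily pairs the heaviest-sharing threads, so that each pair can be placed on cores that share a cache. This is a minimal user-level sketch under assumed inputs; the sample format and the greedy heuristic are illustrative choices, not the thesis's actual MMU-based mechanism.

```python
from collections import defaultdict
from itertools import combinations

def sharing_matrix(accesses, n_threads):
    """Build a thread-to-thread sharing matrix from (thread_id, page)
    access samples: two threads 'share' a page if both touch it."""
    page_users = defaultdict(set)
    for tid, page in accesses:
        page_users[page].add(tid)
    m = [[0] * n_threads for _ in range(n_threads)]
    for tids in page_users.values():
        for a, b in combinations(sorted(tids), 2):
            m[a][b] += 1
            m[b][a] += 1
    return m

def greedy_pairing(m):
    """Greedily pair the two unpaired threads that share the most pages;
    each pair is a candidate for sibling cores under a shared cache."""
    unpaired = set(range(len(m)))
    pairs = []
    while len(unpaired) > 1:
        a, b = max(combinations(sorted(unpaired), 2),
                   key=lambda p: m[p[0]][p[1]])
        pairs.append((a, b))
        unpaired -= {a, b}
    return pairs
```

A real mechanism would feed the matrix from page-table or TLB events rather than an explicit sample list, and would map pairs onto the machine topology reported by the OS.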
To demonstrate the benefits of our proposed solutions, we evaluate them with a wide variety of applications using a full system simulator, a real machine with software-managed TLBs, and a trace-driven evaluation on two real machines with hardware-managed TLBs. In the experimental evaluation, our proposals reduced execution time by up to 39%. The improvements resulted from a substantial reduction in cache misses and inter-chip interconnection traffic.
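The complementary sharing-aware data mapping can be sketched in the same spirit: given a thread-to-NUMA-node placement, each page is homed on the node whose threads access it most often, instead of relying on default first-touch placement. The function name and the access-count format below are illustrative assumptions, not the thesis's implementation.

```python
from collections import Counter, defaultdict

def page_homes(accesses, thread_to_node):
    """Sharing-aware data mapping sketch: assign each page to the NUMA
    node whose threads access it most, from (thread_id, page) samples."""
    counts = defaultdict(Counter)  # page -> {node: access count}
    for tid, page in accesses:
        counts[page][thread_to_node[tid]] += 1
    return {page: c.most_common(1)[0][0] for page, c in counts.items()}
```

On Linux, the resulting placement could be enforced with page-migration system calls such as `move_pages`; the sketch stops at computing the target node per page.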
40

Análise de fiabilidade de permutadores de calor de unidades de tratamento de ar / Reliability analysis of heat exchangers in air handling units

Pereira, Antonino José Dias de Castro January 2012 (has links)
Integrated master's thesis. Mechanical Engineering. Faculdade de Engenharia, Universidade do Porto. 2012
