21.
The GraphGrind framework : fast graph analytics on large shared-memory systems
Sun, Jiawen, January 2018 (has links)
As shared memory systems support terabyte-sized main memory, they provide an opportunity to perform efficient graph analytics on a single machine. Graph analytics is characterised by frequent synchronisation, which is addressed in part by shared memory systems. However, performance is limited by load imbalance and poor memory locality, which originate in the irregular structure of small-world graphs. This dissertation demonstrates how graph partitioning can be used to optimise (i) load balance, (ii) Non-Uniform Memory Access (NUMA) locality and (iii) temporal locality of graph processing in shared memory systems. The developed techniques are implemented in GraphGrind, a new shared memory graph analytics framework. First, this dissertation shows that heuristic edge-balanced partitioning results in an imbalance in the number of vertices per partition. Thus, load imbalance exists between partitions, either for loops iterating over vertices or for loops iterating over edges. To address this issue, this dissertation introduces a classification of algorithms to distinguish whether they algorithmically benefit from edge-balanced or vertex-balanced partitioning. This classification supports the adaptation of partitions to the characteristics of graph algorithms. Evaluation in GraphGrind shows that this approach outperforms state-of-the-art shared memory graph analytics frameworks, including Ligra by 1.46x on average and Polymer by 1.16x on average, across a variety of graph algorithms and datasets. Secondly, this dissertation demonstrates that increasing the number of graph partitions is effective in improving temporal locality due to smaller working sets. However, a larger number of partitions results in vertex replication in some graph data structures.
This dissertation therefore adopts a graph layout that is immune to vertex replication, and designs an automatic graph traversal algorithm that extends the previously established traversal heuristics to a 3-way graph layout choice. This new algorithm furthermore depends upon the classification of graph algorithms introduced in the first part of the work. These techniques achieve an average speedup of 1.79x over Ligra and 1.42x over Polymer. Finally, this dissertation presents a graph ordering algorithm that challenges the widely accepted heuristic of balancing the number of edges per partition while minimising edge or vertex cut. The proposed algorithm balances the number of edges per partition as well as the number of unique destinations of those edges, balancing both edges and vertices for graphs with a power-law degree distribution. Moreover, this dissertation shows that the performance of graph ordering depends upon the characteristics of graph analytics frameworks, such as NUMA-awareness. This graph ordering algorithm achieves an average speedup of 1.87x over Ligra and 1.51x over Polymer.
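The core partitioning trade-off can be sketched in a few lines of Python (an illustrative toy, not GraphGrind's actual code): greedy edge-balanced partitioning of a skewed degree sequence equalises work for edge loops but leaves vertex counts per partition very uneven, which is exactly the imbalance the algorithm classification above is meant to handle.

```python
# Illustrative sketch (not GraphGrind code): contrast edge-balanced and
# vertex-balanced partitioning on a small power-law-like degree sequence.

def edge_balanced_partitions(degrees, k):
    """Assign consecutive vertices to k partitions, greedily balancing edges."""
    target = sum(degrees) / k
    parts, current, acc = [], [], 0
    for v, d in enumerate(degrees):
        current.append(v)
        acc += d
        if acc >= target and len(parts) < k - 1:
            parts.append(current)
            current, acc = [], 0
    parts.append(current)
    return parts

def vertex_balanced_partitions(n, k):
    """Assign vertices to k partitions with (nearly) equal vertex counts."""
    size = (n + k - 1) // k
    return [list(range(i, min(i + size, n))) for i in range(0, n, size)]

# A skewed degree sequence: one hub vertex plus many low-degree vertices.
degrees = [64] + [1] * 16
edge_parts = edge_balanced_partitions(degrees, 2)
vertex_parts = vertex_balanced_partitions(len(degrees), 2)

print([len(p) for p in edge_parts])                      # vertices per partition
print([sum(degrees[v] for v in p) for p in edge_parts])  # edges per partition
print([len(p) for p in vertex_parts])                    # vertex-balanced counts
```

Loops over edges favour the first layout; loops over vertices favour the second, motivating the per-algorithm choice described above.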
22.
Estudo e implementação da otimização de Preload de dados usando o processador XScale / Study and implementation of data Preload optimization using XScale
Oliveira, Marcio Rodrigo de, 08 October 2005 (has links)
Orientador: Guido Costa Souza Araujo / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação
Previous issue date: 2005 / Resumo: Atualmente existe um grande mercado para o desenvolvimento de aplicações para sistemas embutidos, pois estes estão fazendo parte crescente do cotidiano das pessoas em produtos de eletrônica de consumo como telefones celulares, palmtops, agendas eletrônicas, etc.
Os produtos de eletrônica de consumo possuem grandes restrições de projeto, tais como custo reduzido, baixo consumo de potência e muitas vezes alto desempenho. Deste modo, o código produzido pelos compiladores para os programas executados nestes produtos deve executar rapidamente, economizando energia de suas baterias. Estes melhoramentos são alcançados através de transformações no programa fonte chamadas de otimizações de código. A otimização preload de dados consiste em mover dados de um alto nível da hierarquia de memória para um baixo nível dessa hierarquia antes deste dado ser usado. Este é um método que pode reduzir a penalidade da latência de memória. Este trabalho mostra o desenvolvimento da otimização de preload de dados no compilador Xingo para a plataforma Pocket PC, cuja arquitetura possui um processador XScale. A arquitetura XScale possui a instrução preload, cujo objetivo é fazer uma pré-busca de dados para a cache. Esta otimização insere (através de previsões) a instrução preload no código intermediário do programa fonte, tentando prever quais dados serão usados e que darão miss na cache (trazendo-os para esta cache antes de seu uso). Com essa estratégia, tenta-se minimizar a porcentagem de misses na cache de dados, reduzindo o tempo gasto em acessos à memória. Foram usados neste trabalho vários programas de benchmarks conhecidos para a avaliação dos resultados, dentre eles destacam-se DSPstone e o MiBench. Os resultados mostram que esta otimização de preload de dados para o Pocket PC produz um aumento considerável de desempenho para a maioria dos programas testados, sendo que em vários programas observou-se uma melhora de desempenho maior que 30%! / Abstract: Nowadays, there is a big market for applications for embedded systems, in products such as cellular phones, palmtops, electronic schedulers, etc. Consumer electronics are designed under stringent design constraints, like reduced cost, low power consumption and high performance.
This way, the code produced by compiling programs to execute on these products must execute quickly, and should also save power. In order to achieve that, code optimizations must be performed at compile time. Data preload consists of moving data from a higher level of the memory hierarchy to a lower level before the data is actually needed, thus reducing the memory latency penalty. This dissertation shows how data preload optimization was implemented into the Xingo compiler for the Pocket PC platform, an XScale-based processor. The XScale architecture has a preload instruction, whose main objective is to prefetch program data into the cache. This optimization inserts (through heuristics) preload instructions into the program's intermediate code, in order to anticipate which data will be used and will miss in the cache (bringing it into the cache before use). This strategy minimizes cache misses, reducing the time spent in memory accesses while running the program code. Several well-known benchmark programs have been used for evaluation, among them DSPstone and MiBench. The results show a considerable performance improvement for almost all tested programs subject to the preload optimization; many of the tested programs achieved performance improvements larger than 30%. / Mestrado / Otimização de Código / Mestre em Ciência da Computação
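The scheduling decision at the heart of such a prefetching pass can be illustrated with the classic prefetch-distance calculation (a hedged sketch; the actual Xingo pass operates on intermediate code and uses its own heuristics): issue the preload enough loop iterations ahead that the memory latency is hidden by the loop body.

```python
# Illustrative sketch of prefetch scheduling, in the style of classic
# software-prefetching analyses; not the Xingo compiler's actual code.
import math

def prefetch_distance(mem_latency_cycles, loop_body_cycles):
    """Iterations ahead a preload must be issued so the prefetched
    cache line arrives before the loop uses it."""
    return math.ceil(mem_latency_cycles / loop_body_cycles)

def insert_preloads(n, stride, distance):
    """Sketch: for a loop touching a[i], a[i+stride], ..., emit at each
    iteration a preload of the element `distance` iterations ahead."""
    schedule = []
    for i in range(n):
        target = i + distance * stride
        if target < n:                      # don't preload past the array
            schedule.append((i, target))    # (iteration, element to preload)
    return schedule

# Example: ~100-cycle miss latency, ~12-cycle loop body -> preload 9 ahead.
d = prefetch_distance(100, 12)
print(d)                                    # 9
print(insert_preloads(20, 1, d)[:3])        # [(0, 9), (1, 10), (2, 11)]
```

Too small a distance leaves latency exposed; too large a distance risks evicting the prefetched line before use, which is the tension the pass's predictions must navigate.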
23.
Caching Strategies And Design Issues In CD-ROM Based Multimedia Storage
Shastri, Vijnan, 04 1900 (has links) (PDF)
No description available.
24.
Genotypic Handedness, Memory, and Cerebral Lateralization
Perotti, Laurence Peter, 08 1900 (has links)
The relationship of current manual preference (phenotypic handedness) and family history of handedness (genotypic handedness) to memory for imageable stimuli was studied. The purpose of the study was to test the hypothesis that genotypic handedness was related to lessened cerebral lateralization of Paivio's (1969) dual memory systems. The structure of memory was not at issue, but the mediation of storage and retrieval in memory has been explained with reference to verbal or imaginal processes. Verbal mediation theories and supporting data were reviewed along with imaginal theories and supporting data for these latter theories. Paivio's (1969) dual coding and processing theory was considered a conceptual bridge between the competing positions.
25.
Exploring new boundaries in team cognition: Integrating knowledge in distributed teams
Zajac, Stephanie, 01 January 2014 (links)
Distributed teams continue to emerge in response to the complex organizational environments brought about by globalization, technological advancements, and the shift toward a knowledge-based economy. These teams are composed of members who hold the disparate knowledge necessary to take on cognitively demanding tasks. However, knowledge coordination between team members who are not co-located is a significant challenge, often resulting in process loss and decrements to the effectiveness of team-level knowledge structures. The current effort explores the configuration dimension of distributed teams, specifically how subgroup formation based on geographic location may impact the effectiveness of a team's transactive memory system and subsequent team process. In addition, the role of task cohesion as a buffer against negative intergroup interaction is explored.
26.
Modeling and Runtime Systems for Coordinated Power-Performance Management
Li, Bo, 28 January 2019 (links)
Emerging systems in high-performance computing (HPC) must maximise efficiency to meet the power budget of 20-40 megawatts for one exaflop set by the Department of Energy. To optimize efficiency, emerging systems provide multiple power-performance control techniques to throttle different system components and scale concurrency. In this dissertation, we focus on three throttling techniques: CPU dynamic voltage and frequency scaling (DVFS), dynamic memory throttling (DMT), and dynamic concurrency throttling (DCT). We first conduct an empirical analysis of the performance and energy trade-offs of different architectures under the throttling techniques. We show the impact on performance and energy consumption on Intel x86 systems with accelerators of Intel Xeon Phi and an Nvidia general-purpose graphics processing unit (GPGPU). We show the trade-offs and potentials for improving efficiency. Furthermore, we propose a parallel performance model for coordinating DVFS, DMT, and DCT simultaneously. We present a multivariate linear regression-based approach to approximate the impact of DVFS, DMT, and DCT on performance for performance prediction. Validation using 19 HPC applications/kernels on two architectures (i.e., Intel x86 and IBM BG/Q) shows up to 7% and 17% prediction error, respectively. Thereafter, we develop metrics for capturing the performance impact of DVFS, DMT, and DCT. We apply an artificial neural network model to approximate the nonlinear effects on performance and present a runtime control strategy for power capping. Our validation using 37 HPC applications/kernels shows up to a 20% performance improvement under a given power budget compared with the Intel RAPL-based method. / Ph. D. / System efficiency on high-performance computing (HPC) systems is key to achieving the power budget goal for exascale supercomputers.
Techniques for adjusting the performance of different system components can help accomplish this goal by dynamically controlling system performance according to application behavior. In this dissertation, we focus on three techniques: adjusting CPU performance, adjusting memory performance, and varying the number of threads for running parallel applications. First, we profile the performance and energy consumption of different HPC applications on both Intel systems with accelerators and IBM BG/Q systems. We explore the trade-offs of performance and energy under these techniques and provide optimization insights. Furthermore, we propose a parallel performance model that accurately captures the impact of these techniques on performance in terms of job completion time, along with an approximation approach for performance prediction. The approximation has up to 7% and 17% prediction error on Intel x86 and IBM BG/Q systems respectively, across 19 HPC applications. Thereafter, we apply the performance model in a runtime system design for improving performance under a given power budget. Our runtime strategy achieves up to 20% performance improvement over the baseline method.
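As a toy illustration of the trade-off such a runtime navigates (a deliberately simplified analytic model, not the dissertation's regression or neural-network models): split execution time into a frequency-sensitive CPU part and a frequency-insensitive memory part, then pick the fastest CPU frequency whose modeled power stays under the cap.

```python
# Toy DVFS power-capping sketch; coefficients are invented for illustration.

def exec_time(f_ghz, t_cpu_at_fmax, t_mem, f_max=2.0):
    """CPU-bound time scales with 1/f; memory-bound time does not."""
    return t_cpu_at_fmax * (f_max / f_ghz) + t_mem

def power(f_ghz, p_static=20.0, c=10.0):
    """Static power plus a dynamic term growing roughly with f^3."""
    return p_static + c * f_ghz ** 3

def best_under_cap(freqs, cap_watts, t_cpu, t_mem):
    """Pick the admissible frequency with the shortest predicted runtime."""
    admissible = [f for f in freqs if power(f) <= cap_watts]
    return min(admissible, key=lambda f: exec_time(f, t_cpu, t_mem))

freqs = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
# Memory-bound workload: lowering frequency barely hurts runtime,
# so a tight power cap costs relatively little performance.
f = best_under_cap(freqs, cap_watts=60.0, t_cpu=2.0, t_mem=8.0)
print(f, round(exec_time(f, 2.0, 8.0), 2), round(power(f), 1))
```

The real models in the dissertation learn these sensitivities from measurements per application rather than assuming them, which is what makes coordinated DVFS/DMT/DCT control possible.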
27.
The Crash Consistency, Performance, and Security of Persistent Memory Objects
Greenspan, Derrick Alex, 01 January 2024 (PDF)
Persistent memory (PM) is expected to augment or replace DRAM as main memory. PM combines byte-addressability with non-volatility, providing an opportunity to host byte-addressable data persistently. There are two main approaches to utilizing PM: as memory-mapped files or as persistent memory objects (PMOs). Memory-mapped files require that programmers reconcile two different semantics (file system and virtual memory) for the same underlying data, and that they use complicated transaction semantics to keep data crash-consistent.
The first part of this dissertation designs, implements, and evaluates a new PMO abstraction that addresses these problems by hosting data in pointer-rich data structures without the backing of a filesystem. It introduces a new primitive, psync, which when invoked renders data crash-consistent while concealing the implementation details from the programmer via shadowing. This new approach outperforms a state-of-the-art memory-mapped design by 3.2 times, depending on the workload. It also addresses the security of at-rest PMOs by providing encryption and integrity verification; performing these on the entire PMO adds an overhead of 3-46%, depending on the level of protection.
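The shadowing idea behind psync can be modeled in a few lines (an illustrative Python sketch; the real primitive operates on persistent memory with cache-line flushes and atomic pointer switches): updates mutate a working copy, and psync atomically publishes it, so a crash before psync leaves the previous consistent state intact.

```python
# Toy model of shadow-based crash consistency; not the dissertation's code.
import copy

class PMOSketch:
    """A persistent memory object with a shadow copy and a psync primitive."""
    def __init__(self, data):
        self.persistent = copy.deepcopy(data)  # the crash-consistent state
        self.working = copy.deepcopy(data)     # shadow the program mutates

    def write(self, key, value):
        self.working[key] = value              # update is not yet durable

    def psync(self):
        # Atomically publish the working copy; on real PM this would be
        # cache-line flushes plus an atomic root-pointer switch.
        self.persistent = copy.deepcopy(self.working)

    def crash_and_recover(self):
        # A crash discards un-psynced updates; recovery sees persistent state.
        self.working = copy.deepcopy(self.persistent)

pmo = PMOSketch({"balance": 100})
pmo.write("balance", 250)
pmo.psync()                    # 250 is now durable
pmo.write("balance", 999)      # never psynced
pmo.crash_and_recover()
print(pmo.working["balance"])  # 250: the last psynced, consistent value
```

The point of the abstraction is that the programmer only calls psync; the shadowing that makes the publish atomic is hidden behind it.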
The second part of this dissertation demonstrates how crash consistency, security, and integrity verification can be preserved while reducing the overall overhead by decrypting individual memory pages instead of the entire PMO, yielding performance improvements of 2.62 times over the original whole-PMO design, depending on the workload.
The final part of this dissertation improves the performance of PMOs even further by mapping userspace pages to volatile memory and copying them into PM, rather than directly writing to PM. Bundling this design with a stream buffer predictor to decrypt pages into DRAM ahead of time improves performance by 1.9 times.
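The stream-buffer idea in the final design can be illustrated with a simple sequential-page predictor (a sketch with assumed names, not the dissertation's implementation): on access to page p, the next few pages are decrypted into a DRAM-side cache ahead of time, so sequential scans mostly find their pages already decrypted.

```python
# Toy stream-buffer predictor for ahead-of-time page decryption.

def run_accesses(pages, depth):
    """Count how many page accesses hit the ahead-of-time 'decrypted' cache."""
    decrypted = set()
    hits = 0
    for p in pages:
        if p in decrypted:
            hits += 1
        else:
            decrypted.add(p)          # demand decryption on a miss
        for ahead in range(1, depth + 1):
            decrypted.add(p + ahead)  # stream buffer: predict and pre-decrypt
    return hits

sequential = list(range(16))
print(run_accesses(sequential, depth=4))   # 15: every access after the first hits
print(run_accesses(sequential, depth=0))   # 0: pure demand decryption
```

The predictor pays off exactly when access patterns are sequential; random access gains nothing, which is why such prefetching is bundled with, rather than replacing, demand decryption.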
28.
Multiple memory systems in instrumental music learning
Heath, Karen Louise, 30 October 2024
Playing a musical instrument involves the simultaneous expression or performance of several cognitive functions, including motor actions, visual and auditory processing, working memory, temporal-spatial processing, and sensorimotor awareness. To explore relationships between discrete skills in music learning and how their performance can all occur at the same time, this constructivist grounded theory (GT) study explored the learning phenomena of beginner instrumental music students (n = 15) through the lens of multiple memory systems theory and its two major memory classes, explicit and implicit memory. In addition to the multiple memory system model, special focus was given to working memory, an explicit memory operant in which conscious processing and synthesis of information occurs, and to automaticity, immediate recall or action through the implicit memory system. Three major themes emerged in the analysis phase of the study, resulting in the synthesis of a new theory for instrumental music education: the multiple memory music learning (MMML) framework. The first theme central to MMML, automatic music learning, illustrates how automaticity appears to occur within the short-term memory paradigm when learning an instrument. This phenomenon challenges the current viewpoint in the neuroscientific and psychological literature that automaticity exists only as a long-term memory function. The second theme, contextual music learning, relates to context-dependent learning outcomes, and the third theme, music learning sequencing and attentional behavior, pertains to the order in which learning events took place and how these orders influence performance outcomes. Although further research is recommended, the results of the study suggest that MMML could be a valuable framework for understanding cognitive and memory functioning in instrumental music students.
29.
How do teams learn? Shared mental models and transactive memory systems as determinants of team learning and effectiveness
Nandkeolyar, Amit Kumar, 01 January 2008
Shared mental models (SMM) and transactive memory systems (TMS) have been advocated as the main team learning mechanisms. Despite multiple appeals for collaboration, research in these two fields has progressed in parallel, and little effort has been made to integrate the theories. The purpose of this study was to test the relationship between SMM and TMS in a field setting and examine their influence on team effectiveness outcomes such as team performance, team learning, team creativity, team members' satisfaction, and team viability.
Contextual factors relevant to an organizational setting were also tested, including team size, tenure, country of origin, team reward, and organizational support. Based on responses from 41 teams in 7 industries across two countries (the US and India), results indicate that team size, country of origin, and team tenure impact team performance and team learning. In addition, team reward and organizational support predicted team viability and satisfaction.
Results indicated that TMS components (specialization, coordination and credibility) were better predictors of team outcomes than the omnibus TMS construct. In particular, TMS credibility predicted team performance and creativity while TMS coordination predicted team viability and satisfaction. SMM was measured in two different ways: an average deviation index and a 6-item scale. Both methods resulted in a conceptually similar interpretation although average deviation indices provided slightly better results in predicting effectiveness outcomes.
TMS components moderated the relationship between SMM and team outcomes. Team performance was lowest when both SMM and TMS were low. However, contrary to expectations, high levels of SMM did not always result in effective team outcomes (performance, learning and creativity) especially when teams exhibited high TMS specialization and credibility. An interaction pattern was observed under conditions of low levels of SMM such that high TMS resulted in higher levels of team outcomes. The theoretical and practical implications of these results are discussed.
30.
Coordination de systèmes de mémoire : modèles théoriques du comportement animal et humain / Coordination of memory systems : theoretical models of human and animal behavior
Viejo, Guillaume, 28 November 2016
Durant ce doctorat financé par l'observatoire B2V des mémoires, nous avons réalisé une modélisation mathématique du comportement dans trois tâches distinctes (avec des sujets humains, des sujets singes et des rongeurs), mais qui supposent toutes une coordination entre systèmes de mémoire. Dans la première expérience, nous avons reproduit le comportement de sujets humains (choix et temps de réaction) en combinant les modèles mathématiques d'une mémoire de travail et d'une mémoire inflexible. Nous avons associé pour un sujet son comportement au meilleur modèle possible en comparant des modèles génériques de coordination de ces deux mémoires issues de la littérature actuelle ainsi que notre propre proposition d'une interaction dynamique entre les mémoires. Au final, c'est notre proposition d'une interaction au lieu d'une séparation stricte qui s'est avérée la plus efficace dans la majorité des cas pour expliquer le comportement des sujets. Dans une deuxième expérience, les mêmes modèles de coordination ont été testés dans une tâche chez le singe. Considérée comme un test de transférabilité, cette expérience démontre principalement la nécessité de coordination de mémoires pour expliquer le comportement de certains singes. Dans une troisième expérience, nous avons modélisé le comportement d'un groupe de souris confronté à l'apprentissage d'une séquence d'action motrice dans un labyrinthe sans indices externes. En comparant avec deux autres stratégies d'apprentissages (intégration de chemin et planification dans un graphe), la combinaison d'une mémoire épisodique avec une mémoire inflexible s'est révélée être le meilleur modèle pour reproduire le comportement des souris. / During this PhD funded by the B2V Memories Observatory, we performed a mathematical modeling of behavior in three distinct tasks (with human subjects, monkeys and rodents), all involving coordination between memory systems. 
In the first experiment, we reproduced the behavior of human subjects (choice and reaction time) by combining the mathematical models of working memory and procedural memory. For each subject, we associated their behavior to the best possible model by comparing generic models of coordination of these two memories from the current literature as well as our own proposal of a dynamic interaction between memories. In the end, it was our proposal of an interaction instead of a strict separation which proved most effective in the majority of cases to explain the behavior of the subjects. In a second experiment, the same coordination models were tested in a monkey task. Considered as a transferability test, this experiment mainly demonstrates the need for coordination of memories to explain the behavior of certain monkeys. In a third experiment, we modeled the behavior of a group of mice confronted with the learning of a motor action sequence in a labyrinth without visual cues. Comparing with two other learning strategies (path integration and graph planning), the combination of an episodic memory with a procedural memory proved to be the best model to reproduce the behavior of mice.