Global ETD Search

521	Μελέτη της διαχείρισης της κρυφής μνήμης σε πραγματικό περιβάλλον Περγαντής, Μηνάς 19 January 2010 (has links) Στη σύγχρονη εποχή το κενό απόδοσης μεταξύ του επεξεργαστή και της μνήμης ενός σύγχρονου υπολογιστικού συστήματος συνεχώς μεγαλώνει. Είναι λοιπόν σημαντικό να ερευνηθούν νέοι τρόποι για να καλυφθεί η αδυναμία της κύριας μνήμης να ακολουθήσει τον επεξεργαστή. Η μνήμη cache ήταν ανέκαθεν ένα χρήσιμο εργαλείο προς αυτήν την κατεύθυνση. Χρειάζεται όμως πλέον να προχωρήσει πέρα από την απλοϊκή μορφή της και τον αλγόριθμο LRU Η παρούσα διπλωματική έχει σαν σκοπό την μελέτη της cache σε πραγματικό περιβάλλον και την ανάλυση της δυνατότητας και της χρησιμότητας της πρόβλεψης της συμπεριφοράς ενός σύγχρονου προγράμματος όσον αφορά την προσπέλαση της μνήμης. Η εργασία επικεντρώνεται στην χρήση τεχνικών dynamic instrumentation για την υλοποίηση ενός μηχανισμού πρόβλεψης της απόστασης επαναχρησιμοποίησης μιας θέσης μνήμης, μέσω της ανάλυσης και μελέτης της συμπεριφοράς της εντολής, που ζητά να προσπελάσει την συγκεκριμένη θέση μνήμης. Αναλύεται εκτενώς η λειτουργία ενός τέτοιου μηχανισμού και παρέχονται στατιστικές μετρήσεις που επιβεβαιώνουν την χρησιμότητα και ευστοχία μιας τέτοιας πρόβλεψης. / In contemporary times the performance gap between the CPU and the main memory of a modern computer system grows larger. So it is important to find new ways to cover the inability of the main memory to cope with the CPU’s performance. Cache memory has always been a useful tool towards this goal. However the need arises for it to move beyond simplistic implementations and algorithms like LRU. The present end year project aims towards the study of cache memory in a real time environment and the analysis of the capability and usefulness of prediction of the memory access behaviour of a modern program. The thesis puts weight on the use of dynamic instrumentation techniques for the creation of a prediction mechanism of the reuse distance of a memory address, through the analysis and study of the behavior of the instruction that accessed this memory address. The function of such a mechanism is analyzed in depth and statistical measures are provided to prove the usefulness and accuracy of such a prediction. Κρυφή μνήμη Πρόβλεψη 005.435 Computer architecture Cache memory Prediction Reuse distance
522	Διαχείριση κρυφής μνήμης επεξεργαστών με πρόβλεψη Σπηλιωτακάρας, Αθανάσιος 11 May 2010 (has links) Στον διαρκώς μεταβαλλόμενο τομέα της αρχιτεκτονικής των υπολογιστών, τα τελευταία 30 τουλάχιστον χρόνια οι αλλαγές έρχονται με εκθετικό ρυθμό. Οι κρυφές μνήμες αποτελούν πλέον το κέντρο του ενδιαφέροντος, αφού οι επεξεργαστές γίνονται ολοένα και ταχύτεροι, ολοένα και αποδοτικότεροι, αλλά τα κυκλώματα μνήμης αδυνατούν να τους ακολουθήσουν. Το επιστημονικό αυτό πεδίο στρέφεται πλέον σε έξυπνες λύσεις που έχουν ως στόχο την μείωση του κόστους επικοινωνίας μεταξύ των δύο υποσυστημάτων. Οι τρόποι διαχείρισης της κρυφής μνήμης αποτελούν έκφανση της πραγματικότητας αυτής και ένα από τα βασικότερα μέρη της είναι οι αλγόριθμοι αντικατάστασης. Η μελέτη εστιάζει στη σχέση ανάμεσα σε δύο, ήδη εφαρμοσμένων, νέων πολιτικών αντικατάστασης, καθώς και το βαθμό στον οποίο μπορεί να υπάρξει συγχώνευση τους σε μία καινούργια. Οι νέοι αλγόριθμοι που μελετάμε είναι ο αλγόριθμος αντικατάστασης IbRdPrediction (Instruction-based Reuse-Distance Prediction – Πρόβλεψης απόστασης επαναχρησιμοποίησης βασισμένης σε εντολή) και ο αλγόριθμος MLP-Aware (Memory level parallelism aware – επίγνωσης επιπέδου παραλληλισμού μνήμης). Εξετάζουμε κατά πόσο είναι δυνατόν να δημιουργηθεί ένας νέος μηχανισμός πρόβλεψης βασισμένος σε εντολη (instruction-based) που να λαμβάνει υπόψιν του τα χαρακτηριστικά του παραλληλισμού επιπέδου μνήμης (MLP) και κατα πόσο βελτιώνει τις ήδη υπάρχουσες τεχνικές ως προς την απόδοση του συστήματος. / In the continiously altering field of computer architecture, changes occur with exponential rate the last 30 years. Cache memories have become the pole of interest, as processors are growing all faster, all efficient, but memory circuits fail to follow them. The scientific community is now turning to clever solutions which aim to limit the two subsytem communication cost. Cache management consists the expression of this reality, and one of its most basic parts is cache replacement algorithms. The thesis focuses on the relation between two, already applied, recent replacement policies, and the degree in which their coalescence in a new policy can exist. We study the IbRdPrediction (Instruction-based Reuse-Distance Prediction) replacement algorithm and the MLP-Aware (Memory level parallelism aware) replacement algorithm. We thoroughly examine if it is possible to create a novel prediction mecahnism, based on instruction, that takes into account the MLP ((Memory level parallelism) characteristics, and how much it improves the existing techniques concerning system performance. Κρυφές μνήμες Πρόβλεψη Διαχείριση Εντολές 004.5 Cache memories Prediction Algorithms Replacement
523	Αρχιτεκτονική προσομοίωση σε επεξεργαστικές μονάδες υψηλού βαθμού παραλληλίας Στρίκος, Νικόλαος 11 January 2011 (has links) Η πρόσφατη εξάπλωση που είδε το μοντέλο της παράλληλης επεξεργασίας στους μικροεπεξεργαστές γενικής χρήσης με την εισαγωγή περισσότερων από έναν πυρήνες εντός του ολοκληρωμένου κυκλώματος έφερε νέες απαιτήσεις στις μεθόδους προσομοίωσης που παραδοσιακά χρησιμοποιήθηκαν για την εξερεύνηση νέων αρχιτεκτονικών. Στην εργασία αυτή προτείνεται ένα πλαίσιο και ένα προγραμματιστικό μοντέλο που κάνει χρήση της αρχιτεκτονικής υψηλού βαθμού παραλληλίας CUDA για να επιτύχει επιτάχυνση στην αρχιτεκτονική προσομοίωση πρωτοκόλλων συνοχής κρυφής μνήμης. / The recent adoption of the parallel computing model in general-use microprocessors with the inclusion of more than one cores in the IC has raised new demands for the simulation methodologies that have been traditionally used. In this work, a framework and a programming model are proposed that make use of the highly parallel CUDA platform to accelerate architectural simulation of cache coherency protocols. Μικροεπεξεργαστές Κρυφή μνήμη 005.275 GPU CUDA Cache coherency protocols Parallel simulation
524	A Structured Design Methodology for High Performance VLSI Arrays January 2012 (has links) abstract: The geometric growth in the integrated circuit technology due to transistor scaling also with system-on-chip design strategy, the complexity of the integrated circuit has increased manifold. Short time to market with high reliability and performance is one of the most competitive challenges. Both custom and ASIC design methodologies have evolved over the time to cope with this but the high manual labor in custom and statistic design in ASIC are still causes of concern. This work proposes a new circuit design strategy that focuses mostly on arrayed structures like TLB, RF, Cache, IPCAM etc. that reduces the manual effort to a great extent and also makes the design regular, repetitive still achieving high performance. The method proposes making the complete design custom schematic but using the standard cells. This requires adding some custom cells to the already exhaustive library to optimize the design for performance. Once schematic is finalized, the designer places these standard cells in a spreadsheet, placing closely the cells in the critical paths. A Perl script then generates Cadence Encounter compatible placement file. The design is then routed in Encounter. Since designer is the best judge of the circuit architecture, placement by the designer will allow achieve most optimal design. Several designs like IPCAM, issue logic, TLB, RF and Cache designs were carried out and the performance were compared against the fully custom and ASIC flow. The TLB, RF and Cache were the part of the HEMES microprocessor. / Dissertation/Thesis / Ph.D. Electrical Engineering 2012 Electrical engineering Design Arrayed Structures Cache Directed placement Register File (RF) Translation lookaside buffer (TLB)
525	Využití nových forem v rekreační pohybové aktivitě se zaměřením na geocaching v lokalitě "Česká Kanada" / Theuse of newforms in recreationalactivitywith a focus on geocaching in " Česká Kanada" JŮDA, Jiří January 2015 (has links) BACKGROUND: Recreational physical activity is of great importance to human health. Reduces the risk of disease and positively beneficial to the human body. Recreational physical activity is influenced by many factors, eg. Age, sex, place of residence, but also its form. OBJECTIVES: Create a project and presentation of a study on new forms of recreational physical activities focused on geocaching. METHODS: The research involved five families with children permanently living in the Czech Canada. Their task was a regular recreational tourism for the purpose of hunting caches. To obtain data was used informal conversation and as a complementary data collection and record sheet output questionnaire. RESULTS: All the families surveyed had a positive shift in recreational physical activity. The most common form of recreational physical activity was hiking as well as cycling. Thanks geocaching was for all investigated deepened relationship to nature and cultural sites.
526	Estudo sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC Silva, Gustavo Girão Barreto da January 2009 (has links) Ao longo dos últimos anos, os sistemas embarcados vêm se tornando cada vez mais complexos tanto em termos de hardware quanto de software. Ultimamente têm-se adotado como solução o uso de MPSoCs (sistemas multiprocessados integrados em chip) para uma maior eficiência energética e computacional nestes sistemas. Com o uso de diversos elementos de processamento, redes-em-chip (NoC - networks-on-chip) aparecem como soluções de melhor desempenho do que barramentos. Nestes ambientes cujo desempenho depende da eficiência do modelo de comunicação, a hierarquia de memória se torna um elemento chave. Baseando-se neste cenário, este trabalho realiza uma investigação sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC. Dentro deste escopo foi desenvolvida uma nova organização de memória fisicamente centralizada com diferentes espaços de endereçamentos denominada nDMA. Este trabalho também apresenta uma comparação entre a nova organização e outras três organizações bastante difundidas tais como memória distribuída, memória compartilhada e memória compartilhada distribuída. Estas duas ultimas adotam um modelo de coerência de cache baseado em diretório completamente desenvolvido em hardware. Os modelos de memória foram implementados na plataforma virtual SIMPLE (SIMPLE Multiprocessor Platform Environment). Resultados experimentais mostram uma forte dependência com relação à carga de comunicação gerada pelas aplicações. O modelo de memória distribuída apresenta melhores resultados conforme a carga de comunicação das aplicações é baixa. Por outro lado, o novo modelo de memória fisicamente compartilhado com diferentes espaços de endereçamento apresenta melhores resultados conforme a carga de comunicação das aplicações é alta. Também foram realizados experimentos objetivando analisar o desempenho dos modelos de memória em situações de alta latência de comunicação na rede. Resultados mostram melhores resultados do modelo de memória distribuída quando a carga de comunicação das aplicações é alta e, caso contrário, o modelo nDMA apresenta melhores resultados. Por fim, foram analisados os desempenhos dos modelos de memória durante o processo de migração de tarefas. Neste caso, os modelos de memória compartilhada e compartilhada distribuída apresentaram melhores resultados devido ao fato de que não se faz necessária o envio dos dados da aplicação nestes modelos e também devido ao menor tamanho de código se comparado com os outros modelos. / In the past few the years, embedded systems have become even more complex both on terms of hardware and software. Lately, the use of MPSoCs (Multi-Processor Systems-on-Chip) has been adopted on these systems for a better energetic and computational efficiency. Due to the use of several processing elements, Networks-on-Chip arise as better performance solutions than buses. Considering this scenario, this work performs an investigation on the impact of memory hierarchy in NoC-based MPSoCs. In this context, a new physically centralized and shared memory organization with different address spaces named nDMA was developed. This work also presents a comparison between the new memory organization and three different well-known memory hierarchy models such as distributed memory and shared and distributed shared memories that make use of a fully hardware cache coherence solution. The memory models were implemented in the SIMPLE (SIMPLE Multiprocessor Platform Environment) virtual platform. Experimental results shows a strong dependency on the application communication workload. The distributed memory model presents better results as the application communication workload is low. On the other hand, the new memory model (physically shared with different address spaces) presents better results as the application communication workload is high. There were also experiments aiming at observing the performance of the memory models in situations where the communication latency on the network is high. Results show better results of the distributed memory model when the application communication workload is high, and the nDMA model presents better results otherwise. Finally, the performance of the memory models during a task migration process were evaluated. In this case, the shared memory and distributed shared memory models presented better results due to the fact that in this case the data memory does not need to be transferred from one point to another and also due to the low size of the memory code in these cases if compared to other memory models. Microeletrônica MPSoC NoC Embedded systems Multiprocessor system-on-chip Network-on-chip Cache coherence Task migration
527	Substituição de objetos em cache na internet usando modelo vetorial para comparação semântica da informação / Igor de Souza Paiva ; orientador, Alcides Calsavara Paiva, Igor de Souza January 2005 (has links) Dissertação (mestrado) - Pontifícia Universidade Católica do Paraná, Curitiba, 2005 / Inclui bibliografia / Esta dissertação de mestrado apresenta uma nova abordagem para a comparação de objetos armazenados em mecanismos de cache através da análise da semântica da informação contida nestes objetos. A comparação semântica é utilizada como critério para a substit / This research work presents a new approach to compare objects stored in cache engines through semantic analysis of object contents. Semantic comparison between objects is used as a replacement strategy, differently from classical approaches that use objec 004 20 Informática - Dissertações Memória cache Semântica
528	Cache cooperativo aplicado ao protocolo GIP em redes AD HOC móveis / Rodrigo Cantú Polo ; orientador, Luiz Lima Júnior Polo, Rodrigo Cantu, 1977- January 2009 (has links) Dissertação (mestrado) - Pontifícia Universidade Católica do Paraná, Curitiba, 2009 / Bibliografia: f. 82-86 / Redes sem fio móveis estão cada vez mais populares no nosso cotidiano. Estas redes dinâmicas, que não necessitam de nenhuma infra-estrutura de operação ou gerenciamento centralizado, são conhecidas como redes ad hoc móveis (Mobile Ad hoc Networks, MANETS) / Mobile wireless networks are becoming increasingly popular nowadays. These dynamic networks, which do not need any operational infrastructure or centralized management, are known as mobile ad hoc networks (MANETs). Due to route instability on this kind of 004 20 Informática Dissertações Memória cache Redes de computação Protocolos CORBA (Arquitetura de computador) Sistemas de comunicação móvel
529	Online thread and data mapping using the memory management unit / Mapeamento dinâmico de threads e dados usando a unidade de gerência de memória Cruz, Eduardo Henrique Molina da January 2016 (has links) Conforme o paralelismo a nível de threads aumenta nas arquiteturas modernas devido ao aumento do número de núcleos por processador e processadores por sistema, a complexidade da hierarquia de memória também aumenta. Tais hierarquias incluem diversos níveis de caches privadas ou compartilhadas e tempo de acesso não uniforme à memória. Um desafio importante em tais arquiteturas é a movimentação de dados entre os núcleos, caches e bancos de memória primária, que ocorre quando um núcleo realiza uma transação de memória. Neste contexto, a redução da movimentação de dados é um dos pilares para futuras arquiteturas para manter o aumento de desempenho e diminuir o consumo de energia. Uma das soluções adotadas para reduzir a movimentação de dados é aumentar a localidade dos acessos à memória através do mapeamento de threads e dados. Mecanismos de mapeamento do estado-da-arte aumentam a localidade de memória mapeando threads que compartilham um grande volume de dados em núcleos próximos na hierarquia de memória (mapeamento de threads), e mapeando os dados em bancos de memória próximos das threads que os acessam (mapeamento de dados). Muitas propostas focam em mapeamento de threads ou dados separadamente, perdendo oportunidades de ganhar desempenho. Outras propostas dependem de traços de execução para realizar um mapeamento estático, que podem impor uma sobrecarga alta e não podem ser usados em aplicações cujos comportamentos de acesso à memória mudam em diferentes execuções. Há ainda propostas que usam amostragem ou informações indiretas sobre o padrão de acesso à memória, resultando em informação imprecisa sobre o acesso à memória. Nesta tese de doutorado, são propostas soluções inovadoras para identificar um mapeamento que otimize o acesso à memória fazendo uso da unidade de gerência de memória para monitor os acessos à memória. As soluções funcionam dinamicamente em paralelo com a execução da aplicação, detectando informações para o mapeamento de threads e dados. Com tais informações, o sistema operacional pode realizar o mapeamento durante a execução das aplicações, não necessitando de conhecimento prévio sobre o comportamento da aplicação. Como as soluções funcionam diretamente na unidade de gerência de memória, elas podem monitorar a maioria dos acessos à memória com uma baixa sobrecarga. Em arquiteturas com TLB gerida por hardware, as soluções podem ser implementadas com pouco hardware adicional. Em arquiteturas com TLB gerida por software, algumas das soluções podem ser implementadas sem hardware adicional. As soluções aqui propostas possuem maior precisão que outros mecanismos porque possuem acesso a mais informações sobre o acesso à memória. Para demonstrar os benefícios das soluções propostas, elas são avaliadas com uma variedade de aplicações usando um simulador de sistema completo, uma máquina real com TLB gerida por software, e duas máquinas reais com TLB gerida por hardware. Na avaliação experimental, as soluções reduziram o tempo de execução em até 39%. O ganho de desempenho se deu por uma redução substancial da quantidade de faltas na cache, e redução do tráfego entre processadores. / As thread-level parallelism increases in modern architectures due to larger numbers of cores per chip and chips per system, the complexity of their memory hierarchies also increase. Such memory hierarchies include several private or shared cache levels, and Non-Uniform Memory Access nodes with different access times. One important challenge for these architectures is the data movement between cores, caches, and main memory banks, which occurs when a core performs a memory transaction. In this context, the reduction of data movement is an important goal for future architectures to keep performance scaling and to decrease energy consumption. One of the solutions to reduce data movement is to improve memory access locality through sharing-aware thread and data mapping. State-of-the-art mapping mechanisms try to increase locality by keeping threads that share a high volume of data close together in the memory hierarchy (sharing-aware thread mapping), and by mapping data close to where its accessing threads reside (sharing-aware data mapping). Many approaches focus on either thread mapping or data mapping, but perform them separately only, losing opportunities to improve performance. Some mechanisms rely on execution traces to perform a static mapping, which have a high overhead and can not be used if the behavior of the application changes between executions. Other approaches use sampling or indirect information about the memory access pattern, resulting in imprecise memory access information. In this thesis, we propose novel solutions to identify an optimized sharing-aware mapping that make use of the memory management unit of processors to monitor the memory accesses. Our solutions work online in parallel to the execution of the application and detect the memory access pattern for both thread and data mappings. With this information, the operating system can perform sharing-aware thread and data mapping during the execution of the application, without any prior knowledge of their behavior. Since they work directly in the memory management unit, our solutions are able to track most memory accesses performed by the parallel application, with a very low overhead. They can be implemented in architectures with hardwaremanaged TLBs with little additional hardware, and some can be implemented in architectures with software-managed TLBs without any hardware changes. Our solutions have a higher accuracy than previous mechanisms because they have access to more accurate information about the memory access behavior. To demonstrate the benefits of our proposed solutions, we evaluate them with a wide variety of applications using a full system simulator, a real machine with software-managed TLBs, and a trace-driven evaluation in two real machines with hardware-managed TLBs. In the experimental evaluation, our proposals were able to reduce execution time by up to 39%. The improvements happened to a substantial reduction in cache misses and interchip interconnection traffic. Processamento paralelo Memoria : Computadores Data movement Thread and data mapping Cache memory NUMA
530	Cache-Related Delay Server for Aperiodic Job Handling in Real-Time Systems Pukhraj Jain, Vardhman Jain 01 December 2010 (has links) Embedded/real-time systems are becoming ubiquitous in today's world and their pervasive nature is increasing with the advent of cyber-physical systems. Providing temporal guarantees is paramount in such systems. Most of the normal operation in real-time systems is modelled using periodic tasks. Event-driven behaviour is modelled using aperiodic jobs. To ensure an acceptable Quality of Service for aperiodic jobs without jeopardizing safety of periodic tasks, aperiodic servers were introduced [2], [3]. Aperiodic servers are used to reserve a quota for the execution of aperiodic jobs. However, they do not take into account, cache-related delays that the execution of aperiodic jobs could impose on periodic tasks, thereby making their use in systems with caches unsafe. In this thesis, we introduce Cache Related Delay Servers to solve this problem. Statically, every periodic task's worst-case execution time includes a pre-determined delay quota for delay caused by aperiodic jobs. During system operation, the aperiodic server is allowed to execute only if periodic jobs that may be affected by it have sufficient delay quota to accommodate its execution. Otherwise, the priority of the aperiodic server is temporarily decreased to the level of the lowest-priority periodic job with insufficient quota, thereby ensuring safe execution of periodic tasks. aperiodic Job Handling aperiodic Servers cache related delay servers caches delay real time systems vardhman

Search results