Global ETD Search

1	Increasing TLB reach using TCAM cells Kumar, Anuj 17 February 2005 (has links) We propose dynamic aggregation of virtual tags in the TLB to increase its coverage and improve the overall miss ratio during address translation. Dynamic aggregation exploits both the spatial and temporal locality inherent in most application programs. To support dynamic aggregation, we introduce the use of ternary-CAM (TCAM) cells at the second-level TLB. The modified TLB architecture results in an increase of TLB reach without additional CAM entries. We also adopt bulk prefetching concurrently with the aggregation technique to enhance the benefits due to spatial locality. The performance of the proposed TLB architecture is evaluated using SPEC2000 benchmarks, concentrating on those that show high data TLB miss ratios. Simulation results indicate a reduction in miss ratios of between 59% and 99.99% for all the considered benchmarks except one, which has a reduction of 10%. We show that the L2 TLB, when enhanced using TCAM cells, is an attractive solution to the high miss ratios exhibited by applications. TLB TCAM Aggregation
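The aggregation idea in this abstract can be illustrated with a minimal Python sketch. It assumes (the abstract does not specify this) that two entries are merged only when their virtual page numbers differ in exactly one compared bit and their physical frame numbers differ in that same bit, so the differing bit can become a TCAM "don't care". The names `TcamEntry`, `try_merge`, `VPN_BITS`, and `FULL_MASK` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Optional

VPN_BITS = 20                       # assumed tag width, for illustration only
FULL_MASK = (1 << VPN_BITS) - 1     # every tag bit is compared


@dataclass(frozen=True)
class TcamEntry:
    vpn: int    # virtual page number tag (don't-care bits stored as 0)
    mask: int   # 1 = bit is compared, 0 = "don't care"
    pfn: int    # base physical frame number (don't-care bits stored as 0)

    def matches(self, vpn: int) -> bool:
        # Ternary match: only the bits selected by the mask are compared.
        return (vpn & self.mask) == (self.vpn & self.mask)

    def translate(self, vpn: int) -> int:
        # Don't-care bits of the VPN pass straight through to the frame number.
        return self.pfn | (vpn & ~self.mask & FULL_MASK)


def try_merge(a: TcamEntry, b: TcamEntry) -> Optional[TcamEntry]:
    """Merge two entries into one if they map an aligned, contiguous range."""
    if a.mask != b.mask:
        return None
    diff = (a.vpn ^ b.vpn) & a.mask
    # Exactly one compared bit may differ between the two tags.
    if diff == 0 or (diff & (diff - 1)) != 0:
        return None
    # The frames must differ in that same bit, with matching polarity.
    if (a.pfn ^ b.pfn) != diff:
        return None
    if bool(a.vpn & diff) != bool(a.pfn & diff):
        return None
    new_mask = a.mask & ~diff
    return TcamEntry(vpn=a.vpn & new_mask, mask=new_mask, pfn=a.pfn & ~diff)
```

Under these assumptions, entries mapping virtual pages 0x100 and 0x101 to frames 0x200 and 0x201 would merge into a single entry covering both pages, which doubles that entry's reach without using another CAM slot.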
3	USING RUNTIME INFORMATION TO IMPROVE MEMORY SYSTEM PERFORMANCE MIN, RUI January 2005 (has links) No description available. TLB Cache Virtual Memory OS Set Associativity
4	Auto-Determination of Cache/TLB parameters Kommanaboina, Kishor Yadav 23 August 2013 (has links) No description available. Computer Science Cache TLB Microbenchmarks inclusive caches exclusive caches
5	Impact of Increased Cache Misses on Runtime Performance of MPX-enabled Programs Sharma, Niti 10 June 2019 (has links) Low-level languages like C and C++ provide high performance and direct control over memory management, but they are prone to memory safety violations. Intel introduced a new ISA extension, Memory Protection Extensions (MPX), a hardware-assisted full-stack solution, to protect against memory safety violations. While MPX efficiently prevents memory errors like buffer overflows and out-of-bounds memory accesses, it comes at the cost of high performance overheads. Cache locality also worsens in MPX-protected applications. In our research, we analyze whether there is a correlation between the increase in cache misses and runtime degradation in programs compiled with MPX support. We analyze 15 SPEC CPU benchmark programs for different input sizes on the Windows platform, compiled with Intel's ICC compiler. We find that for input sizes train (medium) and ref (large), the average performance overheads are 140% and 144% respectively. We find that 5 out of 15 benchmarks do not have any runtime overheads and do not show any change in cache misses at any level. However, for the remaining 10 benchmarks, we find a strong correlation between runtime overheads and cache miss overheads, with the correlation coefficients ranging from 0.8 to 0.36 for different input sizes. Based on our findings, we conclude that there is a direct correlation between runtime overheads and the increase in cache misses. We also find that instruction overheads and runtime overheads have a positive correlation, with the coefficient values ranging from 0.7 to 0.33 for different input sizes. / Master of Science / Low-level programming languages like C and C++ are primary choices for writing low-level systems software such as operating systems, virtual machines, embedded software, and performance-critical applications. 
But these languages are considered unsafe and prone to memory safety errors. Intel introduced a new technique, Memory Protection Extensions (MPX), to protect against these memory errors. But prior research found that applications supported with MPX have increased runtimes (slowdowns). In our research, we analyze these slowdowns for different input sizes (medium and large) in 15 benchmark applications. Based on the input sizes, the average slowdowns range from 140% to 144%. We then examine whether there is a correlation between the increase in cache misses under MPX and the slowdowns. A hardware cache is a component that stores data so that future requests for that data can be served faster. A cache miss is a state where the data requested for processing by a component or application is not found in the cache. Whenever a cache miss happens, the processor waits for the data to be fetched from the next cache level or from main memory before it can continue to execute. This wait influences the runtime performance of the application. Our evaluations find that the 10 out of 15 applications that have increased runtimes also have an increase in cache misses. This shows a positive correlation between these two parameters. Along with that, we also found that the increase in instruction size in MPX-protected applications has a direct correlation with the runtime degradation. We also quantify these relationships with a statistical measure called the correlation coefficient. Spatial Security Memory Protection Extensions Caches Benchmarks Runtime Overheads TLB
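The correlation coefficient mentioned in this abstract is presumably Pearson's r; the abstract does not say which coefficient was used, so that is an assumption. A minimal Python sketch of the computation follows. The function name `pearson` is hypothetical, and the two input lists are placeholder values chosen only for illustration; they are not measurements from the thesis.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the per-sample deviation norms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Placeholder per-benchmark overheads (fractions), purely illustrative.
    runtime_overhead = [0.10, 0.50, 1.20, 2.00]
    cache_miss_overhead = [0.05, 0.40, 0.90, 2.10]
    print(f"r = {pearson(runtime_overhead, cache_miss_overhead):.2f}")
```

A value near 1 would indicate that benchmarks with larger cache miss overheads also tend to have larger runtime overheads; a value near 0 would indicate little linear relationship.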
6	Big Data causing Big (TLB) Problems: Taming Random Memory Accesses on the GPU Karnagel, Tomas, Ben-Nun, Tal, Werner, Matthias, Habich, Dirk, Lehner, Wolfgang 13 June 2022 (has links) GPUs are increasingly adopted for large-scale database processing, where data accesses represent the major part of the computation. If the data accesses are irregular, like hash table accesses or random sampling, the GPU performance can suffer. Especially when scaling such accesses beyond 2GB of data, a performance decrease of an order of magnitude is encountered. This paper analyzes the source of the slowdown through extensive micro-benchmarking, attributing the root cause to the Translation Lookaside Buffer (TLB). Using the micro-benchmarks, the TLB hierarchy and structure are fully analyzed on two different GPU architectures, identifying never-before-published TLB sizes that can be used for efficient large-scale application tuning. Based on the gained knowledge, we propose a TLB-conscious approach to mitigate the slowdown for algorithms with irregular memory access. The proposed approach is applied to two fundamental database operations - random sampling and hash-based grouping - showing that the slowdown can be dramatically reduced, and resulting in a performance increase of up to 13×. info:eu-repo/classification/ddc/004 ddc:004
7	Síndrome de intestino corto como factor desencadenante de translocación bacteriana y del fallo multiorgánico, El [Short bowel syndrome as a triggering factor of bacterial translocation and multiple organ failure] Zurita Romero, Manuel 08 November 1993 (has links) It is becoming clear that the gastrointestinal tract is not a passive organ but has important endocrine, metabolic, immunological and barrier functions in addition to nutrient absorption. Under normal conditions the intact gastrointestinal mucosa acts as a barrier against intraluminal bacteria and endotoxins, but under certain circumstances these bacteria can first reach the mesenteric lymph nodes and then invade extraintestinal tissues, constituting what is known as bacterial translocation (TLB). This concept, already put forward by FINE in the 1960s as a reservoir of systemic infections in high-risk patients and the origin of multiple organ failure (FMO), was not taken into consideration at the time. CONCEPT OF SHORT BOWEL SYNDROME Short bowel syndrome (SIC) is the name given to the set of disorders that appear in the body after massive resections of the small intestine, which may include all or part of the colon. Authors disagree on the definition of massive intestinal resection; the most widely accepted is a resection exceeding 2/3 of the intestinal length, which can be tolerated, whereas one exceeding 3/4 produces the clinical picture called short bowel syndrome (SIC). To lead a reasonable life a remnant of 90 to 120 cm must be left, and preserving the ileocecal valve becomes very important because of its metabolic consequences, which affect: a) the frequency and qualitative composition of the stools; b) intestinal secretion; c) gastric secretion; d) biliopancreatic secretion; and e) the absorptive processes of these organs. 
Processes that can cause SIC - Massive resection of the small intestine: mesenteric vascular occlusion, volvulus, Crohn's disease, neoplasms, trauma, internal hernias. - Intestinal bypass operations: for morbid obesity and/or hypercholesterolemia, inadvertent gastroileostomy, and internal fistulas. - Extensive disease of the small intestine: Crohn's disease, carcinomatosis, multiple intestinal atresias. While resection of short segments is well tolerated and has no detectable consequences, extensive resections produce specific alterations that can endanger the patient's life unless some additional method of maintaining an adequate nutritional status is provided. Thus the outcome of an intestinal resection depends on: 1. the extent of the resection; 2. removal of the terminal ileum (which has very specific absorptive functions); 3. the function of the residual intestine, with or without the ileocecal valve; and 4. the capacity of the residual intestine to adapt morphologically and functionally. Because the actual mean length of the intestine is considerable and highly variable, the absolute length of the unresected segment may be unimportant. In fact, the most important factor is the length of the residual intestine as a percentage of total intestinal length. Proximal and distal resection Approximately half of the available mucosa, about 100 m2 (SCHMIDT 1965), is found in the proximal quarter of the small intestine. 
The surface area decreases progressively from the proximal to the distal region. The jejunum is essentially important for the absorption of iron, calcium, folic acid, vitamins, etc. The ileum contains important transport mechanisms for the active absorption of bile salts and vitamin B12, so depending on the extent of the resection there may be malabsorption of this vitamin and loss of bile acids into the colon (because the enterohepatic circulation is interrupted), which has several consequences: 1. Diarrhea (bile acid diarrhea). 2. Steatorrhea. 3. Increased gallstone formation (DOWLING et al, 1972). 4. Urolithiasis due to hyperoxaluria. Every intestinal resection leads to structural, kinetic and functional changes in the residual intestine. These phenomena depend on factors that are not yet fully understood, involving intrinsic regulation, intraluminal factors or humoral factors. Clinical picture The main symptom is diarrhea, which is initially very severe and has several causes: a) Carbohydrates can be converted by colonic bacteria into short-chain fatty acids, causing diarrhea as a result of the high osmolarity and low pH of the intestinal contents. b) The passage of bile acids (AB) into the colon in their dehydroxylated forms alters colonic absorption and can trigger diarrhea. c) Bacterial overgrowth resulting from loss of the ileocecal valve can also cause bile acid diarrhea and steatorrhea (MEKHJIAN et al, 1971). d) Acid hypersecretion can aggravate the condition. Postoperative nutrition SIC manifests as severe diarrhea and malnutrition following loss of the absorptive capacity of the residual intestine. It has three postoperative phases. The first is a period of fluid and electrolyte loss through severe diarrhea. It may not appear until oral feeding is started. 
The watery diarrhea gradually decreases over one to three months, although fecal volume may remain high. The second phase is the period in which most of the adaptation of the remaining intestine occurs. Diarrhea usually stabilizes and a positive fluid and electrolyte balance can be achieved through intake. However, fat is poorly absorbed and calcium and magnesium deficiencies may appear. In the third phase, total adaptation, a positive balance of all nutrients can be achieved with oral feeding. Not all cases reach this last phase, and when adaptation is achieved parenteral nutrition is usually required for 3 to 12 months or even longer. During the first phase the only possible means of feeding is total parenteral nutrition (NPT). Short bowel syndrome (SIC) Bacterial translocation (TLB) Gastrointestinal mucosa Health Sciences 616.3
8	Cost-effective Designs for Supporting Correct Execution and Scalable Performance in Many-core Processors Romanescu, Bogdan Florin January 2010 (has links) Many-core processors offer new levels of on-chip performance by capitalizing on the increasing rate of device integration. Harnessing the full performance potential of these processors requires that hardware designers not only exploit the advantages, but also consider the problems introduced by the new architectures. Such challenges arise from both the processor's increased structural complexity and the reliability issues of the silicon substrate. In this thesis, we address these challenges in a framework that targets correct execution and performance on three coordinates: 1) tolerating permanent faults, 2) facilitating static and dynamic verification through precise specifications, and 3) designing scalable coherence protocols. First, we propose CCA, a new design paradigm for increasing the processor's lifetime performance in the presence of permanent faults in cores. CCA chips rely on a reconfiguration mechanism that allows cores to replace faulty components with fault-free structures borrowed from neighboring cores. In contrast with existing solutions for handling hard faults that simply shut down cores, CCA aims to maximize the utilization of defect-free resources and increase the availability of on-chip cores. We implement three-core and four-core CCA chips and demonstrate that they offer a cumulative lifetime performance improvement of up to 65% for industry-representative utilization periods. In addition, we show that CCA benefits systems that employ modular redundancy to guarantee correct execution by increasing their availability. Second, we target the correctness of the address translation system. 
Current processors often exhibit design bugs in their translation systems, and we believe one cause for these faults is a lack of precise specifications describing the interactions between address translation and the rest of the memory system, especially memory consistency. We address this aspect by introducing a framework for specifying translation-aware consistency models. As part of this framework, we identify the critical role played by address translation in supporting correct memory consistency implementations. Consequently, we propose a set of invariants that characterizes address translation. Based on these invariants, we develop DVAT, a dynamic verification mechanism for address translation. We demonstrate that DVAT is efficient in detecting translation-related faults, including several that mimic design bugs reported in processor errata. By checking the correctness of the address translation system, DVAT supports dynamic verification of translation-aware memory consistency. Finally, we address the scalability of translation coherence protocols. Current software-based solutions for maintaining translation coherence adversely impact performance and do not scale. We propose UNITD, a hardware coherence protocol that supports scalable performance and architectural decoupling. UNITD integrates translation coherence within the regular cache coherence protocol, such that TLBs participate in the cache coherence protocol similar to instruction or data caches. We evaluate snooping and directory UNITD coherence protocols on processors with up to 16 cores and demonstrate that UNITD reduces the performance penalty of translation coherence to almost zero. / Dissertation Computer Engineering Address translation Error detection Many-core Processors Memory consistency Permanent faults TLB coherence
9	Semantics-oriented low power architecture Ballapuram, Chinnakrishnan S. 01 April 2008 (has links) Innovations in the microarchitecture and prominent advances in the semiconductor process technology enable sophisticated and powerful microprocessors. However, they also lead to increased power consumption. The main contribution of the thesis is the demonstration of Semantics-Oriented Low Power Architecture techniques that use the semantics of memory references and variables used in an application program to reduce the power consumption in the memory sub-system of a microprocessor. The Semantic-Aware Multilateral Partitioning (SAM) technique reduces the cache and TLB power consumption by decoupling the data TLB lookups and the data cache accesses, based on the semantic regions defined by the programming languages and the software convention, into discrete reference sub-streams, namely, stack, global static, and heap. To reduce the power consumed by the snoops in Chip Multiprocessor, we propose a hardware technique called Selective Snoop Probe (SSP) and a compiler-based hardware supported technique called Essential Snoop Probe (ESP) that use the properties of the program variables. By selectively sending the snoop probes, the SSP and ESP techniques relax the conservative nature of the cache coherency protocol and its implementation to reduce power and improve performance. Semantics Snoop Low-power TLB Cache Computer architecture Microcomputing Energy conservation Memory management (Computer science)
10	A Structured Design Methodology for High Performance VLSI Arrays January 2012 (has links) abstract: With the geometric growth of integrated circuit technology driven by transistor scaling, together with the system-on-chip design strategy, the complexity of integrated circuits has increased manifold. Achieving a short time to market with high reliability and performance is one of the most competitive challenges. Both custom and ASIC design methodologies have evolved over time to cope with this, but the high manual labor in custom design and the statistic design in ASIC are still causes of concern. This work proposes a new circuit design strategy that focuses mostly on arrayed structures such as TLB, RF, cache, and IPCAM; it reduces the manual effort to a great extent and makes the design regular and repetitive while still achieving high performance. The method proposes making the complete design a custom schematic but using standard cells. This requires adding some custom cells to the already exhaustive library to optimize the design for performance. Once the schematic is finalized, the designer places these standard cells in a spreadsheet, placing the cells on the critical paths close together. A Perl script then generates a Cadence Encounter compatible placement file. The design is then routed in Encounter. Since the designer is the best judge of the circuit architecture, placement by the designer allows the most optimal design to be achieved. Several designs, including IPCAM, issue logic, TLB, RF and cache, were carried out and their performance was compared against the fully custom and ASIC flows. The TLB, RF and cache were part of the HEMES microprocessor. / Dissertation/Thesis / Ph.D. Electrical Engineering 2012 Electrical engineering Design Arrayed Structures Cache Directed placement Register File (RF) Translation lookaside buffer (TLB)
