by Chi-sum, Ho. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. / Includes bibliographical references (leaves 110-113). / Abstract --- p.i / Acknowledgement --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Cache Memories --- p.1 / Chapter 1.3 --- Improving Cache Performance --- p.3 / Chapter 1.4 --- Improving System Performance --- p.4 / Chapter 1.5 --- Organization of the dissertation --- p.6 / Chapter 2 --- Related Work --- p.8 / Chapter 2.1 --- Cache Performance --- p.8 / Chapter 2.2 --- Non-Blocking Cache --- p.9 / Chapter 2.3 --- Cache Prefetching --- p.10 / Chapter 2.3.1 --- Hardware Prefetching --- p.10 / Chapter 2.3.2 --- Software-assisted Prefetching --- p.13 / Chapter 2.3.3 --- Improving Cache Effectiveness --- p.22 / Chapter 2.4 --- Other Techniques to Reduce and Hide Memory Latencies --- p.25 / Chapter 2.4.1 --- Register Preloading --- p.25 / Chapter 2.4.2 --- Write Policies --- p.26 / Chapter 2.4.3 --- Small Specialized Cache --- p.26 / Chapter 2.4.4 --- Program Transformation --- p.27 / Chapter 3 --- Stride CAM Prefetching --- p.30 / Chapter 3.1 --- Introduction --- p.30 / Chapter 3.2 --- Architectural Model --- p.32 / Chapter 3.2.1 --- Compiler Support --- p.33 / Chapter 3.2.2 --- Hardware Support --- p.35 / Chapter 3.2.3 --- Model Details --- p.39 / Chapter 3.3 --- Optimization Issues --- p.39 / Chapter 3.3.1 --- Eliminating Reductant Prefetching --- p.40 / Chapter 3.3.2 --- Code Motion --- p.40 / Chapter 3.3.3 --- Burst Mode --- p.44 / Chapter 3.3.4 --- Stride CAM Overflow --- p.45 / Chapter 3.3.5 --- Effects of Loop Optimizations --- p.46 / Chapter 3.4 --- Practicability --- p.50 / Chapter 3.4.1 --- Evaluation Methodology --- p.51 / Chapter 3.4.2 --- Prefetch Accuracy --- p.54 / Chapter 3.4.3 --- Stride CAM Size --- p.56 / Chapter 3.4.4 --- Software Overhead --- p.60 / Chapter 4 --- Stride Register Prefetching --- p.67 / Chapter 4.1 --- Motivation --- p.67 / Chapter 4.2 --- Architectural Model --- p.67 / Chapter 4.2.1 --- Stride Register --- p.69 / Chapter 4.2.2 --- Compiler Support --- p.70 / Chapter 4.2.3 --- Prefetch Bits --- p.72 / Chapter 4.2.4 --- Operation Details --- p.77 / Chapter 4.3 --- Practicability and Optimizations --- p.78 / Chapter 4.3.1 --- Practicability on NASA7 Benchmark Programs --- p.78 / Chapter 4.3.2 --- Optimization Issues --- p.81 / Chapter 4.4 --- Comparison Between Stride CAM and Stride Register Models --- p.84 / Chapter 5 --- Small Software-Driven Array Cache --- p.87 / Chapter 5.1 --- Introduction --- p.87 / Chapter 5.2 --- Cache Pollution in MXM --- p.88 / Chapter 5.3 --- Architectural Model --- p.89 / Chapter 5.3.1 --- Operation Details --- p.91 / Chapter 5.4 --- Effectiveness of Array Cache --- p.92 / Chapter 6 --- Conclusion --- p.96 / Chapter 6.1 --- Conclusion --- p.96 / Chapter 6.2 --- Future Research: An Extension of the Stride CAM Model --- p.97 / Chapter 6.2.1 --- Background --- p.97 / Chapter 6.2.2 --- Reference Address Series --- p.98 / Chapter 6.2.3 --- Extending the Stride CAM Model --- p.100 / Chapter 6.2.4 --- Prefetch Overhead --- p.109 / Bibliography --- p.110 / Appendix --- p.114 / Chapter A --- Simulation Results - Stride CAM Model --- p.114 / Chapter A.l --- Execution Time --- p.114 / Chapter A.1.1 --- BTRIX --- p.114 / Chapter A.1.2 --- CFFT2D --- p.115 / Chapter A.1.3 --- CHOLSKY --- p.116 / Chapter A.1.4 --- EMIT --- p.117 / Chapter A.1.5 --- GMTRY --- p.118 / Chapter A.1.6 --- MXM --- p.119 / Chapter A.1.7 --- VPENTA --- p.120 / Chapter A.2 --- Memory Delay --- p.122 / Chapter A.2.1 --- BTRIX --- p.122 / Chapter A.2.2 --- CFFT2D --- p.123 / Chapter A.2.3 --- CHOLSKY --- p.124 / Chapter A.2.4 --- EMIT --- p.125 / Chapter A.2.5 --- GMTRY --- p.126 / Chapter A.2.6 --- MXM --- p.127 / Chapter A.2.7 --- VPENTA --- p.128 / Chapter A.3 --- Overhead --- p.129 / Chapter A.3.1 --- BTRIX --- p.129 / Chapter A.3.2 --- CFFT2D --- p.130 / Chapter A.3.3 --- CHOLSKY --- p.131 / Chapter A.3.4 --- EMIT --- p.132 / Chapter A.3.5 --- GMTRY --- p.133 / Chapter A.3.6 --- MXM --- p.134 / Chapter A.3.7 --- VPENTA --- p.135 / Chapter A.4 --- Hit Ratio --- p.136 / Chapter A.4.1 --- BTRIX --- p.136 / Chapter A.4.2 --- CFFT2D --- p.137 / Chapter A.4.3 --- CHOLSKY --- p.137 / Chapter A.4.4 --- EMIT --- p.138 / Chapter A.4.5 --- GMTRY --- p.139 / Chapter A.4.6 --- MXM --- p.139 / Chapter A.4.7 --- VPENTA --- p.140 / Chapter B --- Simulation Results - Array Cache --- p.141 / Chapter C --- NASA7 Benchmark --- p.145 / Chapter C.1 --- BTRIX --- p.145 / Chapter C.2 --- CFFT2D --- p.161 / Chapter C.2.1 --- cfft2dl --- p.161 / Chapter C.2.2 --- cfft2d2 --- p.169 / Chapter C.3 --- CHOLSKY --- p.179 / Chapter C.4 --- EMIT --- p.192 / Chapter C.5 --- GMTRY --- p.205 / Chapter C.6 --- MXM --- p.217 / Chapter C.7 --- VPENTA --- p.220
Identifer | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_320686 |
Date | January 1995 |
Contributors | Ho, Chi-sum., Chinese University of Hong Kong Graduate School. Division of Computer Science. |
Publisher | Chinese University of Hong Kong |
Source Sets | The Chinese University of Hong Kong |
Language | English |
Detected Language | English |
Type | Text, bibliography |
Format | print, xii, 226 leaves : ill. ; 30 cm. |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |
Page generated in 0.0015 seconds