Global ETD Search

21	Optimizing cache utilization in modern cache hierarchies Huang, Cheng-Chieh January 2016 The memory wall is one of the major performance bottlenecks in modern computer systems. SRAM caches have been used to successfully bridge the performance gap between the processor and the memory. However, an SRAM cache’s latency grows with its size. Therefore, simply increasing the size of caches could have a negative impact on performance. To solve this problem, modern processors employ multiple levels of caches, each of a different size, forming the so-called memory hierarchy. Upon a miss at one level, the processor looks up the data in the next, proceeding from the highest level (L1 cache) to the lowest level (main memory). Such a design can effectively reduce the negative performance impact of simply using a large cache. However, because SRAM has lower storage density than other volatile storage, the size of an SRAM cache is restricted by the available on-chip area. With modern applications requiring more and more memory, researchers continue to look at techniques for increasing the effective cache capacity. In general, researchers approach this problem from two angles: maximizing the utilization of current SRAM caches or exploiting new technology to support larger capacity in cache hierarchies. The first part of this thesis focuses on how to maximize the utilization of existing SRAM caches. In our first work, we observe that not all words belonging to a cache block are accessed around the same time. In fact, a subset of words is consistently accessed sooner than the others. We call this subset the critical words. In our study, we found that these critical words can be predicted using the access footprint. Based on this observation, we propose the critical-words-only cache (co-cache). Unlike a conventional cache, which stores all words that belong to a block, the co-cache stores only the words that we predict to be critical.
In this work, we convert an L2 cache to a co-cache and use the L1’s access footprint information to predict critical words. Our experiments show that the co-cache can outperform a conventional L2 cache on workloads whose working-set sizes are greater than the L2 cache size. To handle workloads whose working-set sizes fit in the conventional L2, we propose the adaptive co-cache (acocache), which allows the co-cache to be configured back to a conventional cache. The second part of this thesis focuses on how to efficiently enable a large-capacity on-chip cache. In the near future, 3D stacking technology will allow us to stack one or multiple DRAM chip(s) onto the processor. The total size of these chips is expected to be on the order of hundreds of megabytes or even a few gigabytes. Recent works have proposed using this space as an on-chip DRAM cache. However, the tags of the DRAM cache create a classic space/time trade-off. On the one hand, we would like the latency of a tag access to be small, as it contributes to both hit and miss latencies. Accordingly, we would like to store these tags in a faster medium such as SRAM. On the other hand, with hundreds of megabytes of die-stacked DRAM cache, the space overhead of the tags would be huge. For example, it would cost around 12 MB of SRAM space to store all the tags of a 256 MB DRAM cache (if we used conventional 64 B blocks). Clearly this is too large, considering that some current chip multiprocessors have an L3 that is smaller. Prior works have proposed storing these tags along with the data in the stacked DRAM array (tags-in-DRAM). However, this scheme increases the access latency of the DRAM cache. To optimize access latency in the DRAM cache, we propose the aggressive tag cache (ATCache). Like a conventional cache, the ATCache caches recently accessed tags to exploit temporal locality; it exploits spatial locality by prefetching tags from nearby cache sets.
In addition, we address the high miss latency and cache pollution caused by excessive prefetching. To reduce this overhead, we propose cost-effective prefetching, a combination of dynamic prefetching granularity tuning and hit-prefetching, to throttle the number of sets prefetched. Our proposed ATCache (which consumes 0.4% of the overall tag size) can satisfy over 60% of DRAM cache tag accesses on average. The last proposed work in this thesis is a DRAM-Cache-Aware (DCA) DRAM controller. In this work, we first address the challenge of scheduling requests in the DRAM cache. While many recent DRAM works have built their techniques on a tags-in-DRAM scheme, storing the tags in the DRAM array increases the complexity of a DRAM cache request. In contrast to a conventional request to DRAM main memory, a request to the DRAM cache now translates into multiple DRAM cache accesses (tag and data). We address the challenge of how to schedule these DRAM cache accesses. We start by exploring whether or not a conventional DRAM controller works well in this scenario. We introduce two potential designs and study their limitations. From this study, we derive a set of design principles that an ideal DRAM cache controller must satisfy. We then propose a DRAM-cache-aware (DCA) DRAM controller that is based on these design principles. Our experimental results show that DCA can outperform the baseline by over 14%. 004.5 cache ; DRAM ; memory hierarchy
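The tag-overhead figures quoted in this abstract can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the 3-byte tag entry is an assumption inferred from the quoted numbers (12 MB of tags for a 256 MB cache with 64 B blocks), not a value stated in the thesis, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the tag-storage figures quoted above.
MIB = 1024 * 1024

def tag_store_bytes(cache_bytes, block_bytes, bytes_per_tag_entry):
    """Total tag storage: one tag entry per cache block."""
    num_blocks = cache_bytes // block_bytes
    return num_blocks * bytes_per_tag_entry

cache_size = 256 * MIB   # DRAM cache capacity from the abstract
block_size = 64          # conventional 64 B blocks
entry_bytes = 3          # ASSUMPTION: inferred from 12 MB / 4 Mi blocks

num_blocks = cache_size // block_size
tag_bytes = tag_store_bytes(cache_size, block_size, entry_bytes)

# The abstract says the ATCache consumes 0.4% of the overall tag size.
atcache_bytes = tag_bytes * 0.004

print(f"blocks:        {num_blocks}")
print(f"tag storage:   {tag_bytes / MIB:.1f} MiB")
print(f"ATCache size:  {atcache_bytes / 1024:.1f} KiB")
```

Under that assumption, a full SRAM tag store would be as large as many L3 caches, while a tag cache holding 0.4% of the tags needs only tens of kilobytes.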
22	A Hierarchy Based Interface for Integration of Scientific Applications Doddamani, Niranjana Sharma 10 May 2003 Computational Field Simulation processes are typically complex and involve executing multiple software tools in the form of pipelines to perform simulations successfully. Very often, handling the input and output communication between the tools and allocating computing resources for the processes becomes an essential but secondary task for the user. A well-written script can often reduce these peripheral tasks and allow the user to concentrate on the analysis. This thesis studies the design and implementation of a framework called the Integrated Simulation Environment, or ISE, that not only forms a scripted environment for high-level integration of simulation software tools, but is also flexible enough to accommodate new tools on the fly, while maintaining ease of use and reliability. A hierarchy-based design methodology was used to implement the ISE. Hierarchies give the framework the flexibility to decompose the complexities of simulation process pipelines and physical entities such as grids and geometries into manageable components. Hierarchies are also easily translated into standards such as XML for saving and restoring, and for external communication. An Overset CFD simulation process pipeline was integrated into the framework and tested for ease of use, reliability and extensibility. Both simple and complex tools, such as a curve extraction tool, a surface grid generation tool, a volume grid generation tool and tools for preparing flow solver inputs, were integrated into the system and tested successfully. ISE GUI CFD Interface Hierarchy
23	The Status, Survival, and Current Dilemma of a Female Dalit Cobbler of India Kamen, Gale Ellen 15 April 2004 Historically, oppression has been and continues to be a serious issue of concern worldwide in both developed and underdeveloped countries. The structure of Indian society, with its hierarchies and power structures, is an ideal place to better understand the experience of oppression. Women throughout the long-established Indian hierarchy, and members of the lower castes and classes, have traditionally borne the force of oppression generated by the Indian social structure. This research explored the way class, caste, and gender hierarchies coalesce to influence the life choices and experiences of an Indian woman born into the lowest level of the caste and class structure. This research specifically addressed the female Dalit cobbler (leatherworker), who exists among a caste and class of people who have been severely oppressed throughout Indian history. One female Dalit cobbler from a rural village was studied. Her life represents three levels of oppression: females (gender), Dalits (caste), and cobblers (class). This study was based on three interconnected research questions that attempted to uncover the way class, caste, and gender hierarchies influence the lives of Dalit female cobblers: what the Dalit female cobbler has experienced in terms of economic, personal, and social struggle; how the Dalit female cobbler manages to get through her day-to-day struggles; and where the Dalit female cobbler sees herself in the future. Participant observation and triangulation were major components in the design of this study, as it was important to view the local daily life of this individual. Detailed field notes were collected and recorded, interviews based on open-ended questions were conducted, and site documents were gathered.
The findings that emerged throughout this observation increasingly exposed one continuous theme in particular: the "lived" experience and position that one must accept his or her station in life without question. This dissertation, however, has shown that acceptance does not mean that one stops trying to thrive. On the contrary, the life of this particular female Dalit cobbler exemplifies the ingenuity and perseverance of people who are not members of the dominant social structure. It demonstrates how one individual had the ability to negotiate multiple levels of oppression and succeed in sustaining herself, her family, and her community. / Ph. D. Dalit oppression hierarchy caste untouchable
24	Analysis of a multiple dispatch algorithm Holmberg, Johannes January 2004 The development of the new programming language Scream, within the project Software Renaissance, created the need for a good multiple dispatch algorithm. A multiple dispatch algorithm called Compressed n-dimensional table with row sharing (CNT-RS) was developed from the algorithm Compressed n-dimensional table (CNT). The purpose of CNT-RS was to create a more efficient algorithm. This report is the result of the work to analyse the CNT-RS algorithm. In this report the domain of multiple dispatch, the multiple dispatch algorithm CNT and the new extended algorithm CNT-RS are presented. The correctness of the CNT-RS algorithm is shown, and it’s proven that the CNT-RS algorithm is at least as good as the CNT algorithm with regard to the space complexity of the dispatch structure. Datalogi dispatch multiple dispatch dispatch table pole multipole influence type hierarchy pole hierarchy multipole hierarchy Datalogi Computer science Datalogi
25	Analysis of a multiple dispatch algorithm Holmberg, Johannes January 2004 The development of the new programming language Scream, within the project Software Renaissance, created the need for a good multiple dispatch algorithm. A multiple dispatch algorithm called Compressed n-dimensional table with row sharing (CNT-RS) was developed from the algorithm Compressed n-dimensional table (CNT). The purpose of CNT-RS was to create a more efficient algorithm. This report is the result of the work to analyse the CNT-RS algorithm. In this report the domain of multiple dispatch, the multiple dispatch algorithm CNT and the new extended algorithm CNT-RS are presented. The correctness of the CNT-RS algorithm is shown, and it’s proven that the CNT-RS algorithm is at least as good as the CNT algorithm with regard to the space complexity of the dispatch structure. Datalogi dispatch multiple dispatch dispatch table pole multipole influence type hierarchy pole hierarchy multipole hierarchy Datalogi Computer Sciences Datavetenskap (datalogi)
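For readers unfamiliar with table-based multiple dispatch, the sketch below shows the general idea: the method is selected by the runtime types of all arguments, the choice is precomputed into a table, and identical table rows are stored only once. It is a generic illustration and not the CNT or CNT-RS algorithm from the report; the classes, method names, and the simple specificity rule (which ignores ambiguous cases) are all assumptions made for the example.

```python
# Toy two-argument dispatch table with shared rows (NOT the CNT-RS algorithm).
class A: pass
class B(A): pass
class C(A): pass

def m_generic(x, y): return "A,A"
def m_b_any(x, y): return "B,A"
def m_b_c(x, y): return "B,C"

# Hypothetical method definitions, keyed by their parameter types.
METHODS = {(A, A): m_generic, (B, A): m_b_any, (B, C): m_b_c}
TYPES = [A, B, C]

def most_specific(arg_types):
    """Pick the applicable method with the most specific parameter types.

    Specificity is approximated by summed MRO depth; ambiguity is ignored.
    """
    applicable = [sig for sig in METHODS
                  if all(issubclass(t, p) for t, p in zip(arg_types, sig))]
    best = max(applicable, key=lambda sig: sum(len(p.__mro__) for p in sig))
    return METHODS[best]

# Precompute one row per first-argument type; identical rows are shared.
shared_rows = {}   # row contents -> the single stored copy of that row
row_of = {}        # first-argument type -> its (possibly shared) row
for t1 in TYPES:
    row = tuple(most_specific((t1, t2)) for t2 in TYPES)
    row_of[t1] = shared_rows.setdefault(row, row)

COLUMN = {t: i for i, t in enumerate(TYPES)}

def dispatch(x, y):
    """Constant-time lookup: row by type of x, column by type of y."""
    return row_of[type(x)][COLUMN[type(y)]](x, y)

print(dispatch(B(), C()))   # selects m_b_c
print(dispatch(C(), B()))   # falls back to m_generic
print(len(shared_rows), "distinct rows for", len(TYPES), "types")
```

Here the rows for `A` and `C` are identical, so only two rows are stored for three types; reducing the space taken by the dispatch structure is the property the report analyses for CNT-RS.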
26	Optimizing Performance in Highly Utilized Multicores with Intelligent Prefetching Khan, Muneeb January 2016 Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefetching, to increase performance. Such complex hardware structures have helped improve performance in general; however, their full potential is not realized, as software often utilizes the memory hierarchy inefficiently. Performance can be improved further by ensuring careful interaction between software and hardware. Performance can typically improve by increasing the cache utilization and by conserving the DRAM bandwidth, i.e., retaining more useful data in the caches and lowering data requests to the DRAM. One way to achieve this is to conserve space across the cache hierarchy and increase the opportunity for temporal reuse of cached data. Similarly, conserving the DRAM bandwidth is essential for performance in highly utilized multicores, as it can easily become a critical resource. When multiple cores are active and the per-core share of DRAM bandwidth shrinks, its efficient utilization plays an important role in improving the overall performance. Together the cache hierarchy and the DRAM bandwidth play a significant role in defining the overall performance in multicores. Based on deep insight from memory behavior modeling of software, this thesis explores five software-only methods to analyze and increase performance in multicores. The underlying philosophy that drives these techniques is to increase cache utilization and conserve DRAM bandwidth by 1) focusing on making data prefetching more accurate, and 2) lowering the miss rate in the cache hierarchy, either by preserving useful data longer by cache-bypassing the less useful data or via code size compaction using compiler options. First, we show how microarchitecture-independent memory access profiles can be used to analyze the instruction cache performance of software.
We use this information in a compiler pass to recompile application phases (with a large instruction cache miss rate) for smaller code size in an effort to improve the application's instruction cache behavior. Second, we demonstrate how a resource-efficient software prefetching method can be combined with hardware prefetching to improve performance in multicores when running software that exhibits irregular memory access patterns. Third, we show that hardware prefetching on high-performance commodity multicores is sub-optimal and demonstrate how a resource-efficient software-only prefetching method can perform better in fully utilized multicores. Fourth, we present an adaptive prefetching approach that dynamically combines software and hardware prefetching in a runtime system to improve performance in highly utilized multicores. Finally, in the fifth work we develop a method to predict per-core prefetching configurations that deliver near-optimal overall multicore performance. These software techniques enable us to tap greater performance in multicores (up to 50%), without requiring more processing resources. Performance Optimization Prefetching multicore memory hierarchy
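The claim that cache-bypassing less useful data preserves useful data longer can be illustrated with a toy LRU cache model. This is a minimal sketch, not the method from the thesis: the trace, the cache capacity, and the `count_misses` helper are invented for illustration, and a real system would bypass at the hardware level (for example with non-temporal accesses) rather than in a simulator.

```python
from collections import OrderedDict

def count_misses(trace, capacity, bypass=frozenset()):
    """Count misses in a fully associative LRU cache.

    Addresses in `bypass` are fetched on a miss but never allocated,
    so they cannot evict other lines.
    """
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)      # refresh LRU position on a hit
            continue
        misses += 1
        if addr in bypass:
            continue                     # bypass: do not allocate
        cache[addr] = True
        if len(cache) > capacity:
            cache.popitem(last=False)    # evict the least recently used line
    return misses

# Hypothetical workload: 4 hot lines reused every round, interleaved with
# streaming lines that are touched exactly once.
hot = list(range(4))
stream = list(range(100, 140))
trace = []
for i in range(10):
    trace += hot
    trace += stream[4 * i: 4 * i + 4]

baseline = count_misses(trace, capacity=6)
with_bypass = count_misses(trace, capacity=6, bypass=frozenset(stream))
print(baseline, with_bypass)
```

In this trace the streaming lines keep evicting the hot lines from the 6-line cache, so every access misses; with bypassing, the hot lines stay resident and only their first accesses and the unavoidable streaming accesses miss.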
27	Hierarchical structures in medium-sized manufacturing companies and their lower boundaries Cebi, Ali Can, Bauer, Tobias January 2016 Low-hierarchy structures are becoming increasingly popular because they enhance the job satisfaction and productivity of employees. On the other hand, the formation of hierarchy appears to be natural and beneficial in many cases. This study explores how low hierarchies could become and where the boundaries regarding job satisfaction lie, as well as how these differ depending on the formal position of employees. The inquiry is undertaken with a focus on medium-sized companies in the manufacturing industry in Germany, where job satisfaction and productivity via such applications are vital. Extensive qualitative data was collected with a single-case approach; analysis was likewise conducted qualitatively. The lower limits of hierarchy are found to lie in various aspects, mainly relating to supervision, recognition of good performance and promotion opportunities, and to differ significantly with formal position. The study is believed to be unique and to help shed light on the area of beneficial and practical low-hierarchy applications.
28	From quantum many body systems to nonlinear Schrödinger Equations Xie, Zhihui 06 November 2014 The derivation of nonlinear dispersive PDE, such as the nonlinear Schrödinger (NLS) or nonlinear Hartree equations, from many body quantum dynamics is a central topic in mathematical physics, which has been approached by many authors in a variety of ways. In particular, one way to derive NLS is via the Gross-Pitaevskii (GP) hierarchy, which is an infinite system of coupled linear non-homogeneous PDE. In this thesis we present two types of results related to obtaining NLS via the GP hierarchy. In the first part of the thesis, we derive an NLS with a linear combination of power type nonlinearities in R^d for d = 1, 2. In the second part of the thesis, we focus on solutions to the cubic GP hierarchy and we prove unconditional uniqueness of low regularity solutions to the cubic GP hierarchy in R^d with d ≥ 1: the regularity of solutions in our result coincides with the known regularity of solutions to the cubic NLS for which unconditional uniqueness holds. / text GP hierarchy Nonlinear Schrödinger equation Unconditional uniqueness
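For orientation, the cubic NLS and the cubic GP hierarchy mentioned in this abstract are commonly written as follows. The block uses one standard convention from the literature; the sign and the coupling constant \lambda are assumptions here and may differ from the notation used in the thesis itself.

```latex
% Cubic NLS on R^d (convention assumed, not taken from the thesis)
i\partial_t \phi = -\Delta \phi + \lambda |\phi|^2 \phi,
\qquad \phi : \mathbb{R} \times \mathbb{R}^d \to \mathbb{C}

% Cubic GP hierarchy for marginal density matrices \gamma^{(k)}, k = 1, 2, \dots
i\partial_t \gamma^{(k)}
  = \sum_{j=1}^{k} \bigl[ -\Delta_{x_j}, \gamma^{(k)} \bigr]
  + \lambda\, B_{k+1} \gamma^{(k+1)}

% Interaction term coupling level k to level k+1
B_{k+1} \gamma^{(k+1)}
  = \sum_{j=1}^{k} \operatorname{Tr}_{k+1}
    \bigl[ \delta(x_j - x_{k+1}), \gamma^{(k+1)} \bigr]

% Factorized solutions built from a solution \phi of the cubic NLS
\gamma^{(k)}(t) = \bigl( |\phi(t)\rangle \langle \phi(t)| \bigr)^{\otimes k}
```

Each equation in the hierarchy is linear in the unknowns but involves the next level \gamma^{(k+1)}, which is why the system is infinite and coupled; the factorized ansatz in the last line is how a solution of the NLS gives rise to a solution of the hierarchy.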
29	Discovering lexical generalisations : a supervised machine learning approach to inheritance hierarchy construction Sporleder, Caroline January 2004 Grammar development over the last decades has seen a shift away from large inventories of grammar rules to richer lexical structures. Many modern grammar theories are highly lexicalised. But simply listing lexical entries typically results in an undesirable amount of redundancy. Lexical inheritance hierarchies, on the other hand, make it possible to capture linguistic generalisations and thereby reduce redundancy. Inheritance hierarchies are usually constructed by hand but this is time-consuming and often impractical if a lexicon is very large. Constructing hierarchies automatically or semi-automatically facilitates a more systematic analysis of the lexical data. In addition, lexical data is often extracted automatically from corpora and this is likely to increase over the coming years. Therefore it makes sense to go a step further and automate the hierarchical organisation of lexical data too. Previous approaches to automatic lexical inheritance hierarchy construction tended to focus on minimality criteria, aiming for hierarchies that minimised one or more criteria such as the number of path-value pairs, the number of nodes or the number of inheritance links (Petersen 2001, Barg 1996a, and in a slightly different context: Light 1994). Aiming for minimality is motivated by the fact that the conciseness of inheritance hierarchies is a main reason for their use. However, I will argue that there are several problems with minimality-based approaches. First, minimality is not well defined in the context of lexical inheritance hierarchies as there is a tension between different minimality criteria. Second, minimality-based approaches tend to underestimate the importance of linguistic plausibility.
While such approaches start with a definition of minimal redundancy and then try to prove that this leads to plausible hierarchies, the approach suggested here takes the opposite direction. It starts with a manually built hierarchy to which a supervised machine learning algorithm is applied with the aim of finding a set of formal criteria that can guide the construction of plausible hierarchies. Taking this direction means that it is more likely that the selected criteria do in fact lead to plausible hierarchies. Using a machine learning technique also has the advantage that the set of criteria can be much larger than in hand-crafted definitions. Consequently, one can define conciseness in very broad terms, taking into account interdependencies in the data as well as simple minimality criteria. This leads to a more fine-grained model of hierarchy quality. In practice, the method proposed here consists of two components: Galois lattices are used to define the search space as the set of all generalisations over the input lexicon. Maximum entropy models which have been trained on a manually built hierarchy are then applied to the lattice of the input lexicon to distinguish between plausible and implausible generalisations based on the formal criteria that were found in the training step. An inheritance hierarchy is then derived by pruning implausible generalisations. The hierarchy is automatically evaluated by matching it to a manually built hierarchy for the input lexicon. Automatically constructing lexical hierarchies is a hard task, partly because what is considered the best hierarchy for a lexicon is to some extent subjective. Supervised learning methods also suffer from a lack of suitable training data. Hence, a semi-automatic architecture may be best suited for the task. 
Therefore, the performance of the system has been tested using a semi-automatic as well as an automatic architecture and it has also been compared to the performance achieved by the pruning algorithm suggested by Petersen (2001). The findings show that the method proposed here is well suited for semi-automatic hierarchy construction. 410
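The search space described in this abstract, the Galois lattice of all generalisations over a lexicon, can be sketched for a toy lexicon. The example below enumerates the formal concepts (pairs of an entry set and the feature set those entries share) by brute force. The lexicon and its feature names are invented for illustration, the enumeration is exponential in the number of entries, and the maximum entropy scoring and pruning steps of the thesis are not shown.

```python
from itertools import combinations

# Hypothetical toy lexicon: each entry maps to its path-value features.
LEXICON = {
    "walk":   {"cat=verb", "tense=pres"},
    "walked": {"cat=verb", "tense=past"},
    "dog":    {"cat=noun", "num=sg"},
    "dogs":   {"cat=noun", "num=pl"},
}
ALL_FEATURES = frozenset().union(*LEXICON.values())

def intent(entries):
    """Features shared by every entry in `entries` (all features if empty)."""
    feature_sets = [LEXICON[e] for e in entries]
    if not feature_sets:
        return ALL_FEATURES
    return frozenset(set.intersection(*feature_sets))

def extent(features):
    """Entries that carry every feature in `features`."""
    return frozenset(e for e, f in LEXICON.items() if features <= f)

def concepts():
    """All formal concepts (extent, intent) of the lexicon, by brute force."""
    found = set()
    entries = list(LEXICON)
    for r in range(len(entries) + 1):
        for subset in combinations(entries, r):
            shared = intent(subset)
            found.add((extent(shared), shared))
    return found

# Print the lattice nodes from the most general to the most specific.
for ext, feats in sorted(concepts(), key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(feats))
```

Each printed pair is a candidate node of an inheritance hierarchy; for instance, the node grouping `walk` and `walked` under `cat=verb` is a generalisation that a plausible hierarchy would likely keep, while others would be pruned.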
30	Victorian Women and the Carnivalesque in Six Novels Threlkeld-Dent, Debra 10 May 2017 This analysis will explore the progression and transformation of carnivalesque theory in six novels. The carnivalesque analysis will focus on Victorian women and the working class over a time period beginning around 1830 and ending in 1910. The novels that comprise this study are Thomas Hardy’s The Return of the Native and Jude the Obscure; Elizabeth Gaskell’s North and South and Charlotte Bronte’s Shirley; and finally Arnold Bennett’s The Old Wives’ Tale and E. M. Forster’s Howards End. The study intends to show a progression in the role of women that utilizes carnivalesque display as a vehicle. Women in the Hardy novels represent those who rebel against prescriptive Victorian mores in the midst of carnivalesque scenes. Hardy intends to use transgressive women and the suffering they endure to illustrate how Victorian rules of decorum and the institution of marriage are confining to the point of being destructive. Gaskell and Bronte’s novels represent industrial or condition-of-England novels that show how Victorian women gain greater access to and understanding of the working class and poor through spending time with these groups while performing charitable works. The carnivalesque has indeed undergone a partial transformation because scenes that overturn authority occur not only in public settings like the marketplace, but they also show up in the form of worker strikes and uprisings. Because the females in these novels have a greater understanding of the plight of the poor workers, they are able to advocate on their behalf and exert influence upon the managers and owners that helps to bring about reform in the workers’ situation.
Finally the last two novels represent the culmination of this study as they reveal how carnivalesque scenes, both public and private, frame the experiences of two sets of sisters, both of which occupy the liminal space between the Victorian Age and Modernism. Women have progressed to the point of being able to overcome adversity and personal failure and grow into strong, independent individuals who speak for themselves, live independently, exert their own authority, and finally vote. carnivalesque carnival transgression hierarchy female victorian

Search results