Global ETD Search

1	A Different Threshold Approach to Data Replication in Data Grids Huang, Yen-Wei 21 January 2008 Certain scientific application domains, such as High-Energy Physics or Earth Observation, are expected to produce several Petabytes (2^20 Gigabytes) of data that is analyzed and evaluated by scientists all over the world. In the context of data grid technology, data replication is mostly used to reduce access latency and bandwidth consumption. In this thesis, we adopt the typical Data Grid architecture, which has three kinds of nodes: server, cache, and client nodes. A server node represents a main storage site. A client node represents a site where data access requests are generated, and a cache node represents an intermediate storage site. However, the access latency of the hierarchical storage system may range from seconds up to hours. A static replication strategy can be used to reduce such long delays; however, it cannot adapt to changes in users' behavior. Therefore, dynamic data replication strategies are used in Data Grids. Three fundamental design issues in a dynamic replication strategy are: (1) when to create the replicas, (2) which files to replicate, and (3) where to place the replicas. Two well-known replication strategies are Fast-Spread and Cascading, each of which works well for a particular kind of access pattern. For example, the Fast-Spread strategy works well for random access patterns, and the Cascading strategy works well for patterns that exhibit locality. However, given so many different access patterns, using one strategy for one kind of access pattern and another strategy for another kind may make the system too complex. Therefore, in this thesis, we propose a single strategy that can work for any kind of access pattern.
We propose the Different Threshold (DT) approach to data replication in Data Grids, which adapts dynamically to several kinds of access patterns and provides better performance than the Cascading and Fast-Spread strategies. In our approach, different layers have different thresholds. Based on this approach, we first propose a static DT strategy in which the threshold at each layer is fixed. By carefully adjusting the difference between the thresholds Ti, where i denotes the i-th layer of the tree structure, we can achieve better performance than the two well-known strategies above. Moreover, among a large number of data files, some may be hot, that is, requested far more often than the rest. To reduce the number of requests for hot files, we next propose the dynamic DT strategy, in which each data file has its own threshold. We let replication of hot files occur earlier than that of other files by decreasing the thresholds of hot files earlier than those of normal files. Our simulation results show that the response time of the static DT strategy is lower than that of the Cascading and Fast-Spread strategies, and that the dynamic DT strategy performs better than the static DT strategy. Data Grids Access patterns Data Replication Thresholds
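The abstract above does not spell out the replication rule, so the following is only a minimal sketch of a static per-layer threshold scheme, written under stated assumptions: the `Node` class, the `request` function, the layer numbering (server at layer 0), and the threshold values are all hypothetical and chosen for illustration. Each cache node on the path between the client and the node that serves a request counts the request, and it stores a replica once its count reaches the threshold of its layer.

```python
from collections import defaultdict


class Node:
    """A node in the tree-structured grid (hypothetical model)."""

    def __init__(self, name, layer, parent=None):
        self.name = name
        self.layer = layer          # 0 = server, larger = closer to clients
        self.parent = parent
        self.files = set()          # files stored (or replicated) here
        self.hits = defaultdict(int)  # per-file request counts seen here


def request(client, file_id, thresholds):
    """Serve one request and apply the assumed per-layer threshold rule.

    Returns the name of the node that served the file. The server (root)
    is assumed to hold every file.
    """
    path = []
    node = client.parent
    while file_id not in node.files:
        path.append(node)
        node = node.parent
    serving_node = node
    # Every cache node that missed counts the request; it replicates the
    # file once the count reaches the threshold assigned to its layer.
    for cache in path:
        cache.hits[file_id] += 1
        if cache.hits[file_id] >= thresholds[cache.layer]:
            cache.files.add(file_id)
    return serving_node.name


server = Node("server", 0)
server.files.add("f")
upper_cache = Node("c1", 1, server)
lower_cache = Node("c2", 2, upper_cache)
client = Node("client", 3, lower_cache)

# Illustrative thresholds only: layer 1 replicates after 3 misses, layer 2 after 2.
thresholds = {1: 3, 2: 2}
served_by = [request(client, "f", thresholds) for _ in range(3)]
```

In this toy run the first two requests travel to the server, the second one triggers a replica at the lower cache, and the third is served from that cache. A dynamic variant in the spirit of the abstract would keep a separate threshold per file and lower it for hot files; that is not shown here.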
2	Data Access Mechanisms for Skewed Access Patterns in Wireless Information Systems Shen, Jun-Hong 16 June 2008 Wireless data broadcast is an efficient way to disseminate digital information to clients equipped with mobile devices. It allows a huge number of mobile clients to access data simultaneously, anytime and anywhere in a wireless environment. Applications that use wireless data broadcast to disseminate information include stock activity and traffic condition services. Applying index techniques to the broadcast file, i.e., selective tuning, can greatly reduce the energy consumption of mobile devices without significantly increasing client waiting time. Most research on selective tuning assumes that the data items broadcast on the wireless channel are accessed fairly evenly by mobile clients. In real-life applications, popular data may be accessed more frequently than less popular data, i.e., access patterns are skewed. In this dissertation, to efficiently support selective tuning with skewed access patterns in single-channel wireless environments, we first propose a skewed distributed index, SDI, for uniform data broadcast, in which each data item is broadcast once in a broadcast cycle. Second, we propose a skewed index, SI, for nonuniform data broadcast, in which a few popular data items are broadcast more frequently in a broadcast cycle than the others. The first proposed algorithm, SDI, considers the access probabilities of data items and the replication of index nodes. It traverses a balanced tree and decides whether an index node should be replicated by considering the access probability of its child node. Our performance analysis and simulation results show that the proposed algorithm outperforms the variant-fanout tree index and the distributed index.
The second proposed algorithm, SI, applies Acharya et al.'s Broadcast Disks to generate a broadcast program, in which the popular data items are broadcast more times than the others, in order to reduce client waiting time. Moreover, the proposed algorithm builds a skewed tree for these data items and allocates index nodes for the popular data items more times than those for the less popular ones in a broadcast cycle. From our performance analysis and simulation results, we have shown that our proposed SI outperforms the flexible index and the flexible distributed index. Power Conservation Data Broadcast Selective Tuning Skewed Access Patterns Wireless Network
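For readers unfamiliar with Broadcast Disks, the program generation step that SI builds on can be sketched as follows. This is a minimal illustration of the chunk-interleaving procedure described by Acharya et al., not the SI index itself; the function names, the page labels, and the relative frequencies are assumptions made for the example.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def split(pages, n):
    """Split a list of pages into exactly n chunks (trailing chunks may be empty)."""
    size = -(-len(pages) // n)  # ceiling division
    return [pages[k * size:(k + 1) * size] for k in range(n)]


def broadcast_program(disks, rel_freqs):
    """Build one broadcast cycle from disks ordered hottest to coldest.

    disks[i] is the list of pages on disk i and rel_freqs[i] is how many
    times that disk is broadcast per cycle.
    """
    max_chunks = reduce(lcm, rel_freqs)
    # A disk broadcast f times per cycle is split into max_chunks / f chunks.
    chunked = [split(pages, max_chunks // f) for pages, f in zip(disks, rel_freqs)]
    program = []
    for i in range(max_chunks):          # one minor cycle per iteration
        for chunks in chunked:           # take the next chunk of every disk
            program.extend(chunks[i % len(chunks)])
    return program


# Hypothetical example: one hot page, two warm pages, four cold pages.
program = broadcast_program([["a"], ["b", "c"], ["d", "e", "f", "g"]], [4, 2, 1])
```

Here page `a` appears four times per cycle, `b` and `c` twice, and the cold pages once each, which is the nonuniform broadcast on which a skewed index is then allocated.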
3	Practical transparent persistence Ibrahim, Ali Hussein, 1980- 23 March 2011 Many enterprise applications persist data beyond their lifetimes, usually in a database management system. Orthogonal persistence provides a clean programming model for communicating with databases. A program using orthogonal persistence operates over persistent and non-persistent data uniformly. However, a straightforward implementation of orthogonal persistence results in a large number of small queries, each of which incurs a large overhead when accessing a remote database. In addition, the program cannot take advantage of a database's query optimizations for large and complex queries. Instead, most programs explicitly compose smaller queries into a single large query and send it to the database through a command-level interface. These explicit queries compromise the modularity of programs because they do not compose well and they contain information about the program's future data access patterns. Consequently, programs with explicit queries are harder to maintain and reason about. In this thesis, we first define transparent persistence, a relaxation of orthogonal persistence. We show how transparent persistence in current tools can be made more practical by developing AutoFetch. The key idea in AutoFetch is to dynamically observe a program's data access patterns and use that information to reduce the number of queries. While AutoFetch is constrained by existing Java technology and tools, Remote Batch Invocation (RBI) adds the batch statement to the Java language. The batch statement is a general-purpose mechanism for optimizing distributed communication using batching. RBI-DB specializes the ideas in RBI for databases. Both of these ideas help bridge the performance gap between orthogonally persistent systems and traditional database interfaces.
Transparent persistence AutoFetch Data access patterns Java language Remote Batch Invocation Distributed communication
4	Improving Storage Performance Through Layout Optimizations Bhadkamkar, Medha 28 July 2009 Disk drives are the bottleneck in processing the large amounts of data used in almost all common applications. File systems attempt to reduce this bottleneck by storing data sequentially on disk drives, thereby reducing access latencies. Although this strategy is useful when data is retrieved sequentially, the access patterns in real-world workloads are not necessarily sequential, and this mismatch degrades storage I/O performance. This thesis demonstrates that one way to improve storage performance is to reorganize data on disk drives in the way in which it is mostly accessed. We identify two classes of accesses: static, where access patterns do not change over the lifetime of the data, and dynamic, where access patterns change frequently over short durations of time. We propose, implement, and evaluate layout strategies for each of these. Our strategies are implemented so that they can be seamlessly integrated into or removed from the system as desired. We evaluate our layout strategies for static access patterns using tree-structured XML data, where accesses to the storage device are mostly of two kinds: parent-to-child or child-to-sibling. Our results show that for a specific class of deep-focused queries, the existing file system layout policy performs better by 5-54X. For the non-deep-focused queries, our native layout mechanism shows an improvement of 3-127X. To improve performance for dynamic access patterns, we implement a self-optimizing storage system that rearranges frequently accessed blocks on a dedicated partition based on the observed workload characteristics. Our evaluation shows an improvement of over 80% in disk busy times over a range of workloads. These results show that applying knowledge of data access patterns to allocation decisions can substantially improve I/O performance.
storage systems on-disk data layout disk access patterns disk performance storage optimizations tree/graph layout
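The idea of rearranging frequently accessed blocks on a dedicated partition can be illustrated with a small sketch. It is not the thesis's mechanism, whose details the abstract does not give; the function name `plan_hot_partition`, the trace format (a list of block numbers), and the simple top-k selection are assumptions for illustration only.

```python
from collections import Counter


def plan_hot_partition(trace, capacity):
    """Map the most frequently accessed blocks to contiguous slots.

    trace is an observed sequence of block numbers; capacity is the number
    of slots in the (hypothetical) dedicated partition. Returns a dict from
    original block number to slot index, hottest block first.
    """
    counts = Counter(trace)
    hot_blocks = [block for block, _ in counts.most_common(capacity)]
    return {block: slot for slot, block in enumerate(hot_blocks)}


# Hypothetical trace: block 7 is accessed three times, block 3 twice, block 9 once.
mapping = plan_hot_partition([7, 3, 7, 9, 7, 3], capacity=2)
```

With this mapping, blocks 7 and 3 would be copied to adjacent slots so that later accesses to them stay within a small region of the disk. A real system would also need to redirect I/O, keep the copies consistent, and refresh the plan as the workload changes.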
5	Parallel Garbage Collection in Solid State Drives Kolla, Purushotham Pothu Raju 20 September 2012 No description available. Computer Engineering Solid State Drives SSD Garbage Collection RAID Flash Memory Access Patterns
6	Método otimizado de arquitetura de coerência de cache baseado em sistemas embarcados multinúcleos. / Optimized method for cache coherence architecture based on multicore embedded systems. Kofuji, Jussara Marândola 01 December 2011 This thesis presents a cache coherence architecture method specialized for embedded systems. One of its main contributions is a proposal for a shared-memory CMP architecture oriented to memory access patterns, together with a hybrid coherence protocol. The principal contribution is the specification of a new hardware component, called the pattern table, which is validated by a formal representation and by an implementation of the pattern table structure. From this table, a message transaction model was developed for the hybrid protocol that distinguishes classical messages from speculative ones. The final contribution is an analytical model of the effective performance cost of the hybrid protocol. Cache coherent protocol Chip design Concepção de processador Descrição de hardware Hardware description Memory access patterns Padrões de acesso à memória Protocolo de coerência de cache
7	Método otimizado de arquitetura de coerência de cache baseado em sistemas embarcados multinúcleos. / Optimized method for cache coherence architecture based on multicore embedded systems. Jussara Marândola Kofuji 01 December 2011 This thesis presents a cache coherence architecture method specialized for embedded systems. One of its main contributions is a proposal for a shared-memory CMP architecture oriented to memory access patterns, together with a hybrid coherence protocol. The principal contribution is the specification of a new hardware component, called the pattern table, which is validated by a formal representation and by an implementation of the pattern table structure. From this table, a message transaction model was developed for the hybrid protocol that distinguishes classical messages from speculative ones. The final contribution is an analytical model of the effective performance cost of the hybrid protocol. Concepção de processador Descrição de hardware Padrões de acesso à memória Protocolo de coerência de cache Cache coherent protocol Chip design Hardware description Memory access patterns
8	MINING USER ACCESS PATTERNS FROM NETWORK FLOW ON THE INTERNET Chang, Shih-Ta 18 July 2000 This thesis focuses on mining user access patterns from a netflow database collected from the core router of a regional network center. We use the attributed relational graph representation to formulate user access patterns on the Internet, and then propose a procedure to generalize common connection patterns and detect deviation patterns with such methods as large graph generalization, error-correcting graph matching, frontier identification, and pattern base recognition. The major contributions of this thesis are representing network connections with attributed relational graphs and developing data mining techniques for identifying access patterns and detecting deviations. The results can be used to better manage a regional network in order to improve user satisfaction with regional network services. generalization access patterns large graph graph similarity and distance error-correcting graph matching deviation detection data mining attributed relational graph
9	An Incremental Approach to Discovering Regional Network Access Patterns Tzeng, Yung-Shuen 18 July 2001 This thesis proposes an incremental algorithm to discover regional network access patterns from the traffic data of a regional network. Because the network traffic database is very large, we need a fast association rule algorithm in order to generate user access patterns efficiently. An attributed relational graph is used to represent user access patterns on the network. A change in the relational graph indicates that the access pattern of the regional network has changed. In order to keep the network access pattern up to date without incurring large computation costs, we propose an incremental procedure to generalize network access patterns from time to time. The results can help network administrators keep track of network usage patterns and better manage regional networks. access patterns attributed relational graph association rules incremental large access graph large access graph network traffic data mining
10	Performance impact of programmer-inserted Data Prefetches for irregular access patterns with a case study of FMM VList algorithm Tondon, Abhishek 22 April 2014 Data prefetching is a well-known technique for speeding up applications, in which hardware prefetchers or compilers speculatively prefetch data into caches closer to the processor so that it is readily available when the processor demands it. Since incorrect speculation leads to prefetching useless data, which in turn wastes memory bandwidth and pollutes caches, prefetch mechanisms are usually conservative and prefetch only on spotting fairly regular access patterns. This gives a programmer with knowledge of the application an opportunity to insert fine-grained software prefetches into the code to precisely prefetch data that is certain to be demanded but whose access pattern is not obvious enough for hardware prefetchers or compilers to detect. In this study, the author demonstrates the performance improvement obtained by such programmer-inserted prefetches with a case study of an FMM (Fast Multipole Method) VList application kernel run with several different configurations. The VList computation requires computing the Hadamard product of matrices. However, the way each node of the octree is stored in memory leads to indirect accesses to elements: the memory accesses themselves are not sequential, but the pointers to those memory locations are still stored sequentially. Since compilers do not insert prefetches for indirect accesses, and the access pattern appears random to hardware, programmer-inserted prefetching is the only solution in such a case. The author demonstrates the performance gain obtained with different prefetching choices, in terms of which structures in the code to prefetch and which level of cache to prefetch them to, and also presents an analysis of the impact of different configuration parameters on the performance gain.
The author shows that several prefetching combinations always bring a performance gain without ever hurting performance, and identifies prefetching all the data structures in question into the L1 cache as the best prefetching recommendation for this application kernel. This one combination achieves the highest performance gain for most run configurations and an average performance gain of 10.14% across all run configurations. Data prefetching Software prefetching Programmer-inserted prefetches Irregular access patterns High performance applications FMM VList Performance enhancement
