Global ETD Search

261	Enabling Efficient Big Data Services on HPC Systems with SHMEM-Based Programming Stack Unknown Date (has links) Thesis abstract With the continuous expansion of the Big Data universe, researchers have been relentlessly searching for ways to improve the efficiency of big data services, including data analytics and data infrastructures. In the meantime, there has also been an increasing interest in leveraging High-Performance Computing (HPC) capabilities for big data analytics. Symmetric Hierarchical Memory (SHMEM) is a popular parallel programming model that has thrived in the HPC realm. For many Partitioned Global Address Space (PGAS) systems and applications, SHMEM libraries are widely used as a high-performance communication layer between the applications and the underlying high-speed interconnects. SHMEM features a one-sided communication interface. It allows remote data to be accessed in a shared-memory manner, in contrast to conventional two-sided communication, where remote data must be accessed through an explicit handshake protocol. We reveal that SHMEM offers a number of benefits for developing parallel and distributed applications and frameworks on tightly coupled, high-end HPC systems, such as its shared-memory-style addressing model and the flexibility of its communication model. This dissertation focuses on improving the performance of big data services by leveraging a lightweight, flexible and balanced SHMEM-based programming stack. To realize this goal, we have studied representative data infrastructures and data analytic frameworks. Specifically, key-value stores are a very popular form of data infrastructure deployed for many large-scale web services. Unfortunately, a key-value store usually adopts an inefficient communication design in a traditional server-client architecture, where the server can easily become a bottleneck when processing a huge number of requests. Because of this, both latency and throughput can be seriously affected. 
Moreover, graph processing is an emerging type of data analytics that deals with large-scale graph data. Because they are unsuitable for traditional MapReduce, graph analytic algorithms are often written and run with programming models that are specifically designed for graph processing. However, there is an imbalance issue in state-of-the-art graph processing programming models, which has drastically affected the performance of graph processing. There is a critical need to revisit the conventional design of graph processing as the volume of useful real-world graph data keeps increasing every day. Furthermore, although we reveal that a SHMEM-based programming stack helps solve the aforementioned issues, there is still a lack of understanding about how portable this stack can be, both for the specific data infrastructures and frameworks being optimized and for other distributed systems in general. This includes understanding the potential performance gain or loss, limitations of usage, and portability across different platforms. This dissertation has centered on addressing these research challenges and carried out three studies, each tackling a unique challenge but all focusing on using a SHMEM-based programming stack to enable and accelerate big data services. Firstly, we use a popular SHMEM standard called OpenSHMEM to build a high-performance key-value store called SHMEMCache, which overcomes several issues in enabling direct access to key-value pairs, including race conditions, remote pointer chasing and unawareness of remote access. We have then thoroughly evaluated SHMEMCache and shown that it has accomplished significant performance improvements over other contemporary key-value stores, and also achieved good scalability over a thousand nodes on a leadership-class supercomputer. 
Secondly, to understand the implications of using various SHMEM models and one-sided communication libraries for big data services, we revisit the design of SHMEMCache, extend it with a portable communication interface, and develop Portable-SHMEMCache. Portable-SHMEMCache is able to support a variety of one-sided communication libraries. Based on this new framework, we have supported both OpenSHMEM and MPI-RMA for SHMEMCache as a proof of concept. We have conducted an extensive experimental analysis to evaluate the performance of Portable-SHMEMCache on two different platforms. Thirdly, we have thoroughly studied the issues that exist in state-of-the-art graph processing frameworks. We have proposed salient design features to tackle their serious inefficiency and imbalance issues. The design features have been incorporated in a new graph processing framework called SHMEMGraph. Our comprehensive experiments for SHMEMGraph have demonstrated its significant performance advantages compared to state-of-the-art graph processing frameworks. This dissertation has pushed forward the big data evolution by enabling efficient representative data infrastructure and analytic frameworks on HPC systems with SHMEM-based programming models. The performance improvements compared to state-of-the-art frameworks have demonstrated the efficacy of our solution designs and the potential of leveraging HPC capabilities for big data. We believe that our work has better prepared contemporary data infrastructures and analytic frameworks for addressing the big data challenge. / A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2018. / December 6, 2018. / Graph Processing, Key-value Store, One-sided Communication, PGAS, SHMEM / Includes bibliographical references. 
/ Weikuan Yu, Professor Directing Dissertation; Ming Ye, University Representative; Zhenhai Duan, Committee Member; Manjunath Gorentla Venkata, Committee Member; Michael Mascagni, Committee Member. Computer science
262	Machine Learning Approach for Generalizing Traffic Pattern-Based Adaptive Routing in Dragonfly Networks Unknown Date (has links) Universal Global Adaptive routing (UGAL) is a common routing scheme used in systems based on the Dragonfly interconnect topology. UGAL uses information about local link-loads to make adaptive routing decisions. Traffic Pattern-based Adaptive Routing (TPR) enhances UGAL by incorporating additional network statistics into the routing process. Contemporary switches are designed to accommodate an expansive set of network performance metrics. Distinguishing between significant, predictive metrics and insignificant metrics is critical to the process of designing an adaptive routing algorithm. We propose the use of recurrent neural networks to assess the relative predictive power of various network statistics. Using this method we rank the predictive power of network statistics using data collected from a network simulator. Both UGAL and TPR require tuning of hyper-parameters to achieve optimal performance, with TPR having more than 20 parameters for the Cray Cascade architecture. We demonstrate that the optimal value of these parameters can vary significantly based on the size of the architecture, the arrangement of global links chosen for the Dragonfly topology, and the traffic that the system will likely encounter. We propose and evaluate using a neural network to simplify the tuning of hyper-parameters used in TPR. We find that this approach is able to match or exceed the performance of TPR across several synthetic traffic patterns using a network simulator. / A Thesis submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Master of Science. / Spring Semester 2019. / April 22, 2019. / adaptive routing, adaptive systems, dragonfly network, machine learning, network topology, routing / Includes bibliographical references. 
/ Xin Yuan, Professor Co-Directing Thesis; Xiuwen Liu, Professor Co-Directing Thesis; Piyush Kumar, Committee Member. Computer science
263	Towards Ubiquitous Sensing Using Commodity WiFi Unknown Date (has links) Recently, the prevalence of WiFi devices and the ubiquitous coverage of WiFi networks have provided the opportunity to extend WiFi capabilities beyond communication, particularly to sensing the physical environment. Most existing systems that enable human sensing with commodity WiFi devices simply rely on profile-training-based techniques. Such techniques suffer from performance degradation when the configuration changes after training. Furthermore, those systems cannot work in multi-user scenarios. To overcome the limitations of existing solutions, this dissertation introduces the design and implementation of three systems. First, we propose MultiTrack, a multi-user indoor tracking and activity recognition system. It leverages multiple transmission links and all the available bandwidth at 5GHz of commodity WiFi to track multiple users simultaneously. Second, we present WiFinger, a fine-grained finger gesture recognition system, which utilizes a single RF device and does not require per-user or per-location training. Lastly, we present FruitSense, an RF-based fruit ripeness level detection system that achieves environment-independent sensing. This system demonstrates that wireless sensing can be extended beyond human sensing to the biosensing field. / A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2019. / April 1, 2019. / activity recognition, fruit sensing, gesture recognition, indoor tracking, localization, wireless sensing / Includes bibliographical references. / Jie Yang, Professor Directing Dissertation; Sachin Shanbhag, University Representative; An-I Andy Wang, Committee Member; Zhenhai Duan, Committee Member. Computer science
264	Topology Aggregation for Networks with Two Additive Metrics Unknown Date (has links) Topology aggregation is concerned with summarizing a network domain in a concise manner. This thesis deals with topology aggregation for networks with two additive metrics. Summarizing such a network domain is difficult for a number of reasons. First, computing paths between two nodes with two additive metrics is NP-hard. Second, it is unclear how the quality of two paths with two additive metrics can be compared, which makes it difficult to determine the quality of topology aggregation schemes. In this thesis, we develop a method to evaluate the quality of aggregation schemes for networks with two additive metrics, propose to compute the full mesh representation of a domain using the limited path heuristic, and demonstrate that the information carried in the full mesh representation is very close to that in the original network representation. We also develop and study a number of schemes to reduce the full mesh representation to a spanning-tree-based representation. The performance of the proposed schemes is studied through simulation. The results show that minimum-spanning-tree-based schemes yield reasonable performance. / A Thesis submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Master of Science. / Degree Awarded: Summer Semester, 2004. / Date of Defense: July 9, 2004. / Topology aggregation, multiple additive metrics / Includes bibliographical references. / Xin Yuan, Professor Directing Thesis; Lois Hawkes, Committee Member; Sudhir Aggarwal, Committee Member. Computer science
265	A METHODOLOGICAL APPROACH TO A RE-USABLE FUZZY EXPERT SYSTEM Unknown Date (has links) A methodology for the development of expert systems is proposed and developed. Expert systems are examined and classified into three major groups. The imprecision inherent in many experts' domains is addressed with the use of fuzzy set theory. Fuzzy relations are used for knowledge representation and the inference process. Knowledge acquisition is also addressed in the relational context. All aspects of the system may be seen in the context of fuzzy relations, which provide a powerful consistency. The developed methodology allows the construction of an expert system which has no domain knowledge built into it. This allows the system to be used in different domains with only the knowledge base being updated. We have applied the methodology to develop a multi-knowledge source expert system which incorporates fuzzy reasoning techniques. A blackboard is used for communication and the system may be distributed across several processors. Successful examples of the system's operation are shown. / Source: Dissertation Abstracts International, Volume: 47-05, Section: B, page: 2059. / Thesis (Ph.D.)--The Florida State University, 1986. Computer Science
266	FUZZY ALGORITHM FOR PRIORITY ESTIMATION IN IMMEDIATE AREA NETWORKS Unknown Date (has links) The advent of lower cost computers has accelerated the use of distributed computer networks. These systems vary in architecture from those that are tightly coupled by extensive parallel busses or shared resources, such as memory, to those that are loosely coupled and connected by long busses, that are usually serial in nature. Both of these extremes serve their particular applications well, but a middle ground exists where neither of the techniques quite satisfies the requirements. Situations exist where the processors may be in close proximity to each other but still have the requirement for high transmission speeds and the requirements for the flexibility and modularity of a loosely coupled network. The proposed architecture for an Immediate Area Network (IAN), that falls into this category, is being investigated in this project. / In an application of this nature, the importance or priority of the message may be a significant factor. In those cases, it is necessary that the high priority messages be delayed as little as possible before transmission. / We may also expect very heavy loads on the bus where bursts of messages far exceed the capacity of the bus for a short time. During such bursts, messages may be delayed for extended periods while multiple collisions and message delays occur. A protocol algorithm for back-off determination using fuzzy set theory is presented to minimize the delay of high priority messages. This priority algorithm will also work with the Ethernet Local Area Network (LAN) and testing is provided to show its performance there. / Source: Dissertation Abstracts International, Volume: 48-07, Section: B, page: 2027. / Thesis (Ph.D.)--The Florida State University, 1987. Computer Science
267	COFESS--cooperative fuzzy expert systems for intelligent recognition Unknown Date (has links) COFESS is a pattern recognition system composed of three cooperating fuzzy expert systems (denoted by COFES1, COFES2 and COFES3) which utilizes fuzzy set theory and fuzzy logic in its decision making mechanisms. COFESS employs recursion in the process of pattern recognition. Decisions related to the nature of the recognition, such as which feature to recognize next, need to be made along the way. In order to solve this problem, an inference engine is constructed that examines a knowledge base and determines the next step in the recognition process. Another problem arises when we have to decide how one feature is related to the rest of the features that construct an object. Consider the problem of recognizing an object containing five identical squares--how can we prevent the system from recognizing the same square five times? To solve this problem (as well as other related problems) we defined two types of relations between features. / The first type of relation determines the relative location of a feature with regard to other features and thus enables the system to distinguish between features. Moreover, by finding the area in which a certain element is expected to be found we are able to reduce the search space and increase the speed of the recognition process. The second type of relation is developed to help the system determine whether the feature recognized is indeed the feature that we intended to recognize. These are physical relations between the features (such as how the length of one feature is related to the length of another feature), and are designed to help distinguish between a feature and accidental noise that resembles this feature. / Upon successful localization of the designated area for recognition, a recognizer is activated to perform the actual pattern matching. 
/ Thus, the recognition of a feature involves four steps: (1) Deciding which element to recognize. (2) Finding the local area in which this element can be found. (3) Performing the pattern matching. (4) Checking whether or not this element is really the element which was expected to be recognized. (Abstract shortened with permission of author.) / Source: Dissertation Abstracts International, Volume: 49-06, Section: B, page: 2268. / Major Professor: Abraham Kandel. / Thesis (Ph.D.)--The Florida State University, 1987. Computer Science
268	An informal reasoning technique and truth maintenance subsystem for global diagnosis in an instructional system Unknown Date (has links) Intelligent tutoring systems (ITS) have become increasingly important in light of the concerns that have arisen regarding public education and technical training. The main objective of an ITS is to provide individualized or adaptive instruction. In order to provide this type of instruction, an ITS must have access to information about a student's problem-solving abilities. The system's representation of this information is referred to as the student model, and the process by which this structure is created and maintained is known as student modelling or diagnosis. / The research presented in this dissertation has led to the development of a diagnostic system which functions as the student modelling component of an ITS. This system addresses three key issues which are necessary to guarantee the satisfactory performance of the diagnostic function. These are: (1) how to assign credit or blame to individual skills, (2) how to handle noise in the diagnostic process, and (3) how to handle the problem of combinatorial explosion common to many AI applications. / Furthermore, this system offers a unique approach for modelling a student's performance abilities through the use of three mechanisms. First, an informal reasoning technique is used for determining the performance levels of individual skills in the student model. Another feature is the use of an assumption-based truth maintenance procedure for maintaining consistent information. The third feature is the inclusion of historical information in the diagnostic process. / Source: Dissertation Abstracts International, Volume: 52-06, Section: B, page: 3148. / Major Professor: Lois W. Hawkes. / Thesis (Ph.D.)--The Florida State University, 1991. Computer Science
269	Predicting execution time on contemporary computer architectures Unknown Date (has links) Predicting the execution times of straight-line code sequences is a fundamental problem in the design and analysis of hard-real-time systems. A survey of the hardware and software factors that make predicting execution time difficult is presented, along with the results of experiments that evaluate the degree of variation in execution time that may be caused by these factors. The traditional methods of measuring and predicting execution time are examined, and their strengths and weaknesses discussed. / A new technique is presented for predicting point-to-point execution times on contemporary microprocessors. This technique is called Micro-analysis. It uses a machine description, which is in the form of a set of translation rules similar to those that have proven useful for code generation and peephole optimization, to translate compiled object code into a sequence of very low level instructions. The stream of micro-instructions is then analyzed for timing, via a three-level pattern matching scheme. At this low level, the effects of advanced features such as caching and instruction overlap can be taken into account. The technique is compiler- and language-independent, and easily retargetable. / We have implemented a software timing prediction tool based on Micro-analysis, as well as a hybrid timing measurement tool using a programmable HP 1650A logic analyzer and a Sun workstation. Using these tools, we have compared the performance of the Micro-analysis technique against several traditional methods. We have also tested the retargetability of the Micro-analysis tool. The implementations of the tools, the retargeting effort, the experiments performed, and the results of the experiments are reported. / Source: Dissertation Abstracts International, Volume: 52-06, Section: B, page: 3147. / Major Professor: T. P. Baker. / Thesis (Ph.D.)--The Florida State University, 1991. 
Computer Science
270	Exploiting parallel processing in large scientific applications Unknown Date (has links) This dissertation presents programming strategies and tools for exploiting parallel processing of large scientific applications in distributed memory computers. The tools form a parallel software library called Salam, which is useful for parallel software development. It has been implemented on SUN 3 and Intel iPSC/2 computers. It was tested by implementing the Lanczos method, commonly used in nuclear physics applications. Two approaches were used to predict the performance of this parallel program on large data sets from an analysis of the code and its performance on small data sets. / As with all modern scientific applications, scientists need machine independent and efficient parallel software to achieve high performance for large applications. Salam is designed to support the development of such programs. It is an efficient core and basic step for developing a standard parallel library for different parallel architectures. / One achievement of this dissertation is a parallel implementation of the Lanczos algorithm on distributed memory computers which has the advantage of reducing the execution time of the algorithm and increasing the size of the problem that can be solved. A second is analyzing and predicting the performance of this implementation. The algorithm will achieve nearly linear speedup whenever the amount of data per processor is sufficiently large--even when communication costs are high, as they are in networks of workstations. A third is the design of Salam which exploits domain parallelism and provides a collection of arithmetic and communication operations. A fourth is the design of control and node programs that emphasize optimization for distributed memory computers and networks of high-performance workstations. / Source: Dissertation Abstracts International, Volume: 52-06, Section: B, page: 3156. / Major Professor: Gregory A. Riccardi. 
/ Thesis (Ph.D.)--The Florida State University, 1991. Computer Science
