331

Performance and Cost Optimization for Distributed Cloud-native Systems

Ashraf Y Mahgoub (13169517) 28 July 2022
NoSQL datastores provide a set of features demanded by high-performance computing (HPC) applications, such as scalability, availability, and schema flexibility. HPC applications, such as metagenomics and other big-data systems, need to store and analyze huge volumes of semi-structured data, and they often rely on NoSQL-based datastores. Optimizing these databases is a challenging endeavor, with over 50 configuration parameters in Cassandra alone. As the application executes, database workloads can change rapidly over time (e.g., from read-heavy to write-heavy), and a system tuned for one phase of the workload becomes suboptimal when the workload changes.
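As a rough illustration of the workload-phase tracking such tuning relies on, the hypothetical sketch below classifies a sliding window of operations as read-heavy, write-heavy, or mixed; the thresholds, window size, and class are invented and have no connection to Cassandra's real tuning interface. A tuner would react to phase transitions by reconfiguring the datastore.

```cpp
#include <cstddef>
#include <deque>
#include <iostream>
#include <string>

// Hypothetical monitor: classify the current workload phase from the
// read/write mix in a sliding window of recent operations. Thresholds
// and window size are invented; nothing here is Cassandra's real API.
class WorkloadMonitor {
    std::deque<bool> window_;  // true = read, false = write
    std::size_t capacity_;
    std::size_t reads_ = 0;
public:
    explicit WorkloadMonitor(std::size_t capacity) : capacity_(capacity) {}

    void record(bool is_read) {
        window_.push_back(is_read);
        if (is_read) ++reads_;
        if (window_.size() > capacity_) {   // evict the oldest sample
            if (window_.front()) --reads_;
            window_.pop_front();
        }
    }

    // A tuner would watch for transitions (e.g. read-heavy -> write-heavy)
    // and retune configuration parameters when the phase changes.
    std::string phase() const {
        if (window_.empty()) return "unknown";
        double ratio = static_cast<double>(reads_) / window_.size();
        if (ratio > 0.7) return "read-heavy";
        if (ratio < 0.3) return "write-heavy";
        return "mixed";
    }
};

int main() {
    WorkloadMonitor m(1000);
    for (int i = 0; i < 900; ++i) m.record(true);
    for (int i = 0; i < 100; ++i) m.record(false);
    std::cout << m.phase() << '\n';  // prints "read-heavy"
}
```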
332

EFFICIENT AND PRODUCTIVE GPU PROGRAMMING

Mengchi Zhang (13109886) 28 July 2022
Productive programmable accelerators, like GPUs, have been developed over generations to support programming features. Ever-increasing performance improves the usability of programming features on GPUs, and these features in turn ease the porting of code and data structures from CPU to GPU. However, GPU programming features, such as function calls and runtime polymorphism, have not been well explored or optimized.

I identify efficient and productive GPU programming as a potential area to exploit. Although many programming paradigms are well studied and efficiently supported on CPU architectures, their performance on novel accelerators, like GPUs, has received little study or evaluation. For instance, programming with functions is a commonplace paradigm that gives software modularity and simplifies code through reusability. A large body of work has been proposed to alleviate function-calling overhead on CPUs, but far less has addressed its deficiencies on GPUs. Likewise, polymorphism lets an object's behavior vary at runtime; a body of work targets efficient polymorphism on CPUs, but none has examined this feature in a GPU context.

In this dissertation, I discuss these two programming features on GPU architectures. First, I performed the first study to identify the deficiencies of GPU polymorphism. I created micro-benchmarks to evaluate virtual function overhead in controlled settings, and the first GPU polymorphic benchmark suite, ParaPoly, to investigate real-world scenarios. The micro-benchmarks indicated that virtual function overhead is usually negligible but can cause up to a 7x slowdown, and virtual functions in ParaPoly show a geometric mean of 77% overhead on GPUs compared to the functions' inlined versions. Second, I proposed two novel techniques that determine an object's type from its address pointer alone to improve GPU polymorphism. The first technique, Coordinated Object Allocation and function Lookup (COAL), is a software-only technique that uses the object's address to determine its type. The second technique, TypePointer, requires a hardware modification to embed the object's type information in its address pointer. COAL achieves 80% and 6% improvements, and TypePointer 90% and 12%, over contemporary CUDA and our type-based SharedOA, respectively.

Considering the growth of GPU programs, function calls are becoming a pervasive paradigm on GPUs. I also identified the overhead of excessive register spilling caused by function calls on GPUs. To diminish this cost, I proposed a novel Massively Multithreaded Register Windowing technique with Variable-Size Register Windows and Register-Conscious Warp Scheduling. Our techniques improve representative workloads by a geometric mean of 1.18x with only 1.8% hardware storage overhead.
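The sketch below illustrates the general idea behind TypePointer-style dispatch under assumptions the abstract does not spell out: a type tag packed into the unused high bits of a 64-bit user-space pointer lets the type be recovered from the address alone, replacing a per-object vtable lookup with a switch. The types, tag layout, and CPU-side C++ are all illustrative; the actual techniques target GPU object allocation and hardware support.

```cpp
#include <cstdint>
#include <iostream>

constexpr int kTagShift = 48;  // high 16 bits, unused in x86-64 user space
constexpr std::uintptr_t kAddrMask = (std::uintptr_t(1) << kTagShift) - 1;

struct Circle { double r; };
struct Square { double s; };
enum class Tag : std::uintptr_t { kCircle = 1, kSquare = 2 };

// Pack a type tag into the pointer's unused high bits.
void* tag_ptr(void* p, Tag t) {
    return reinterpret_cast<void*>(
        (reinterpret_cast<std::uintptr_t>(p) & kAddrMask) |
        (static_cast<std::uintptr_t>(t) << kTagShift));
}

// Dispatch on the tag recovered from the address alone: no vtable load.
double area(void* tagged) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(tagged);
    void* raw = reinterpret_cast<void*>(bits & kAddrMask);
    switch (static_cast<Tag>(bits >> kTagShift)) {
        case Tag::kCircle: {
            auto* c = static_cast<Circle*>(raw);
            return 3.14159 * c->r * c->r;
        }
        case Tag::kSquare: {
            auto* s = static_cast<Square*>(raw);
            return s->s * s->s;
        }
    }
    return 0.0;
}

int main() {
    Circle c{2.0};
    std::cout << area(tag_ptr(&c, Tag::kCircle)) << '\n';  // ~12.566
}
```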
333

Scalable and Energy-Efficient SIMT Systems for Deep Learning and Data Center Microservices

Mahmoud Khairy A. Abdallah (12894191) 04 July 2022
Moore's law is dead. The physical and economic principles that enabled an exponential rise in transistors per chip have reached their breaking point. As a result, the High-Performance Computing (HPC) domain and cloud data centers are encountering significant energy, cost, and environmental hurdles that have led them to embrace custom hardware/software solutions. Single Instruction Multiple Thread (SIMT) accelerators, like Graphics Processing Units (GPUs), are compelling solutions for achieving considerable energy efficiency while still preserving programmability in the twilight of Moore's Law.

In the HPC and Deep Learning (DL) domains, the death of single-chip GPU performance scaling will usher in a renaissance of multi-chip Non-Uniform Memory Access (NUMA) scaling. Advances in silicon interposers and other inter-chip signaling technologies will enable single-package systems composed of multiple chiplets that continue to scale even as per-chip transistor counts do not. Given this evolving, massively parallel NUMA landscape, the placement of data on each chiplet or discrete GPU card, and the scheduling of the threads that use that data, are critical factors in system performance and power consumption.

Outside the supercomputer space, general-purpose compute units are still the main driver of a data center's total cost of ownership (TCO). CPUs consume 60% of the total data center power budget, half of which comes from the CPU pipeline's frontend. Coupled with this hardware efficiency crisis is an increased desire for programmer productivity, flexible scalability, and nimble software updates, which has led to the rise of software microservices. Consequently, single servers are now packed with many threads executing the same, relatively small task on different data.

In this dissertation, I discuss these new paradigm shifts and address the following concerns: (1) how do we overcome non-uniform memory access overhead for next-generation multi-chiplet GPUs in the era of DL-driven workloads? (2) how can we improve the energy efficiency of data center CPUs in light of the evolution of microservices and request similarity? and (3) how can we study such rapidly evolving systems with accurate and extensible SIMT performance modeling?
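To make the NUMA placement concern concrete, here is a minimal cost-model sketch; the latencies, page counts, and policy are invented for illustration and are not the dissertation's simulator or placement scheme. Given how often each chiplet's threads touch a page, placing the page on the chiplet that touches it most reduces the assumed remote-access penalty.

```cpp
#include <array>
#include <iostream>
#include <vector>

constexpr int kChiplets = 4;
constexpr double kLocalCost = 1.0, kRemoteCost = 3.0;  // assumed latencies

// Per-page access counts from each chiplet, plus where the page lives.
struct Page { std::array<long, kChiplets> accesses; int placed_on; };

// Total access cost of a placement: remote touches pay the NUMA penalty.
double total_cost(const std::vector<Page>& pages) {
    double cost = 0.0;
    for (const Page& p : pages)
        for (int c = 0; c < kChiplets; ++c)
            cost += p.accesses[c] *
                    (c == p.placed_on ? kLocalCost : kRemoteCost);
    return cost;
}

int main() {
    // One page touched mostly by chiplet 2: placing it there is cheaper.
    std::vector<Page> pages = {{{10, 5, 80, 5}, 0}};
    std::cout << total_cost(pages) << '\n';  // 280 (remote-heavy)
    pages[0].placed_on = 2;
    std::cout << total_cost(pages) << '\n';  // 140
}
```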
334

Multitasking for sensor based systems

Reddy, Srinivas T. January 1985
Multitasking systems are being used increasingly for real-time applications. Multitasking is well suited to real-time systems, since events in the real world do not occur in strict sequence but rather tend to overlap. Multitasking operating systems coordinate the activities of the different overlapping functions and give the user the appearance of concurrent activity. The coordination and scheduling are performed according to a user-defined order of importance, or priority. There are many multitasking operating systems available for all the popular microprocessors. One such multitasking executive is VRTX/86 for the 8086 microprocessor. This executive comes in a PROM and is independent of any specific hardware configuration. Using this executive, the IBM PC has been converted into a multitasking environment, and multitasking test programs have been executed on the PC. A general methodology for defining tasks and assigning priorities to these tasks has been defined. Using this methodology, a typical real-time application called a Vehicle Instrumentation System was developed. / M.S.
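A minimal sketch of the priority-dispatch idea follows; the task names, priority values, and queue are invented for illustration and bear no relation to the actual VRTX/86 API.

```cpp
#include <iostream>
#include <queue>
#include <string>
#include <vector>

// Lower number = higher priority; the scheduler always dispatches the
// highest-priority ready task, as in a priority-driven executive.
struct Task { int priority; std::string name; };
struct ByPriority {
    bool operator()(const Task& a, const Task& b) const {
        return a.priority > b.priority;
    }
};

int main() {
    std::priority_queue<Task, std::vector<Task>, ByPriority> ready;
    // In a vehicle instrumentation system, safety-critical sensor reads
    // would plausibly outrank logging and display refresh.
    ready.push({3, "update_display"});
    ready.push({1, "read_speed_sensor"});
    ready.push({2, "log_data"});
    while (!ready.empty()) {
        std::cout << "dispatch: " << ready.top().name << '\n';
        ready.pop();
    }
    // Output order: read_speed_sensor, log_data, update_display.
}
```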
335

A technology reference model for client/server software development

Nienaber, R. C. (Rita Charlotte) 06 1900
In today's highly competitive global economy, information resources representing enterprise-wide information are essential to the survival of an organization. The development of, and increase in the use of, personal computers and data communication networks are supporting or, in many cases, replacing the traditional computer mainstay of corporations. The client/server model combines mainframe programming with desktop applications on personal computers. The aim of the research is to compile a technology model for the development of client/server software. A comprehensive overview of the individual components of the client/server system is given. The different methodologies, tools and techniques that can be used are reviewed, as well as client/server-specific design issues. The research is intended to create a road map in the form of a Technology Reference Model for Client/Server Software Development. / Computing / M. Sc. (Information Systems)
336

A semi-formal comparison between the Common Object Request Broker Architecture (CORBA) and the Distributed Component Object Model (DCOM)

Conradie, Pieter Wynand 06 1900
The way in which application systems and software are built has changed dramatically over the past few years. This is mainly due to advances in hardware technology and programming languages, as well as the requirement to build better software application systems in less time. The importance of worldwide communication between systems is also growing exponentially. People use network-based applications daily, communicating not only locally but also globally. The Internet, the global network, therefore plays a significant role in the development of new software. Distributed object computing is one of the computing paradigms that promise to meet the need to develop client/server application systems communicating over heterogeneous environments. This study, of limited scope, concentrates on one crucial element without which distributed object computing cannot be implemented: the communication software, also called middleware, which allows objects situated on different hardware platforms to communicate over a network. Two of the most important middleware standards for distributed object computing today are the Common Object Request Broker Architecture (CORBA) from the Object Management Group, and the Distributed Component Object Model (DCOM) from Microsoft Corporation. Each of these standards is implemented in commercially available products, allowing distributed objects to communicate over heterogeneous networks. In studying each of the middleware standards, a formal way of comparing CORBA and DCOM is presented, namely meta-modelling. For each of these two distributed object infrastructures (middleware), meta-models are constructed. Based on this uniform and unbiased approach, a comparison of the two distributed object infrastructures is then performed. The results are given as a set of tables in which the differences and similarities of each distributed object infrastructure are exhibited. By adopting this approach, errors caused by misunderstanding or misinterpretation are minimised. Consequently, an accurate and unbiased comparison between CORBA and DCOM is made possible, which constitutes the main aim of this dissertation. / Computing / M. Sc. (Computer Science)
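As a toy illustration of the comparison step only: once each middleware's concepts have been enumerated in a meta-model, the similarity and difference tables fall out of simple set operations. The element names below are invented examples, not the dissertation's actual meta-model contents.

```cpp
#include <iostream>
#include <set>
#include <string>

int main() {
    // Invented meta-model elements for each middleware.
    std::set<std::string> corba = {"interface definition language",
                                   "object reference", "object adapter"};
    std::set<std::string> dcom  = {"interface definition language",
                                   "object reference", "class factory"};
    // Tabulate shared concepts, then those unique to each side.
    for (const auto& e : corba)
        if (dcom.count(e))  std::cout << "both:       " << e << '\n';
    for (const auto& e : corba)
        if (!dcom.count(e)) std::cout << "CORBA only: " << e << '\n';
    for (const auto& e : dcom)
        if (!corba.count(e)) std::cout << "DCOM only:  " << e << '\n';
}
```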
337

Contributions for improving debugging of kernel-level services in a monolithic operating system

Bissyande, Tegawende 12 March 2013
Despite the existence of an overwhelming amount of research on the quality of system software, operating systems are still plagued with reliability issues mainly caused by defects in kernel-level services such as device drivers and file systems. Studies have indeed shown that each release of the Linux kernel contains between 600 and 700 faults, and that the propensity of device drivers to contain errors is up to seven times higher than any other part of the kernel. These numbers suggest that kernel-level service code is not sufficiently tested and that many faults remain unnoticed or are hard to fix by non-expert programmers, who account for the majority of service developers. This thesis proposes a new approach to the debugging and testing of kernel-level services, focused on the interaction between the services and the core kernel. The approach tackles the issue of safety holes in the implementation of kernel API functions. For Linux, we have instantiated the Diagnosys automated approach, which relies on static analysis of kernel code to identify, categorize and expose the different safety holes of API functions that can turn into runtime faults when the functions are used in service code by developers with limited knowledge of the intricacies of kernel code. To illustrate our approach, we have implemented Diagnosys for Linux 2.6.32 and shown its benefits in supporting developers in their testing and debugging tasks.
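A schematic illustration of the safety-hole idea, in user-space C++ rather than kernel C and with invented function names: an API call whose result may be NULL is a latent fault in unchecked service code, and a generated debugging wrapper of the kind Diagnosys produces can log the unsafe condition with context before it becomes a crash.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Stand-in for a kernel API function whose contract allows failure:
// returning NULL here is a "safety hole" if service code never checks.
// All names are invented; this is not the Linux kernel API.
void* api_alloc_buffer(std::size_t n) {
    return (n > 4096) ? nullptr : std::malloc(n);
}

// A generated debugging wrapper logs the unsafe condition with context,
// so the fault surfaces at the service/kernel boundary, not as a crash.
void* debug_api_alloc_buffer(std::size_t n, const char* call_site) {
    void* p = api_alloc_buffer(n);
    if (p == nullptr)
        std::fprintf(stderr,
                     "[diagnosys-like] alloc of %zu bytes failed at %s\n",
                     n, call_site);
    return p;
}

int main() {
    // Service code built against the wrapper gets an explanatory log line
    // instead of a mysterious crash when the hole is triggered.
    void* buf = debug_api_alloc_buffer(8192, "my_driver.c:42");
    if (buf) std::free(buf);
}
```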
338

Using the concept of quality of memory contents in paged memory management algorithms

Silva, Ricardo Leandro Piantola da 17 July 2015
When it comes to memory management in operating systems, many research groups have been developing work on memory management algorithms, and several page replacement algorithms have been proposed in the recent literature. These proposals have not produced an algorithm that performs satisfactorily as far as memory management is concerned. There is no consensus among researchers about how this problem can be treated efficiently, and the algorithms proposed carry high overhead because of their complexity. The objective of this work is to propose an efficient form of memory management composed of page fetch, placement and replacement techniques. The thesis's hypothesis is that, to treat the memory management problem, it is better to consume computational resources determining which pages must be in memory at a given time than to spend resources determining which page should be evicted from memory. This work presents a reanalysis of the main works whose objective is memory management performance, making it possible to draw conclusions and ideas about which factors positively influence system performance. From this study, the concept of quality of memory contents is defined and a metric to measure it is created. Applying this concept, a systemic method for constructing memory management algorithms is devised. The method is then applied, creating the RR+ng and RRlock+ng algorithms. In the final phase of the method, the metric is applied in simulations, proving adequate for performing the analysis. The results show that the hypothesis of treating the memory management problem by consuming computational resources to determine which pages must be in memory, instead of which ones must leave it, holds true and seems promising.
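One plausible reading of such a metric, sketched below with invented numbers (the thesis's exact definition may differ): score the resident set by the fraction of its pages that the reference string will actually touch within the next window of accesses.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <set>
#include <vector>

// A "quality of memory contents" style score: the fraction of resident
// pages referenced within the next `window` accesses of the reference
// string. A higher score means memory holds more useful pages.
double quality(const std::set<int>& resident,
               const std::vector<int>& future_refs, std::size_t window) {
    if (resident.empty()) return 0.0;
    std::set<int> soon(future_refs.begin(),
                       future_refs.begin() +
                           std::min(window, future_refs.size()));
    std::size_t useful = 0;
    for (int page : resident)
        if (soon.count(page)) ++useful;  // this page will be used soon
    return static_cast<double>(useful) / resident.size();
}

int main() {
    std::set<int> resident = {1, 2, 3, 4};       // pages currently in memory
    std::vector<int> refs = {2, 3, 2, 9, 9, 3};  // upcoming references
    // Only pages 2 and 3 are touched in the next 6 accesses: quality 0.5.
    std::cout << quality(resident, refs, 6) << '\n';
}
```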
339

Scheduling optimisation under feasibility constraint

Grenier, Mathieu 26 October 2007
Our goal is to design feasible (i.e., all required time constraints are met) on-line real-time scheduling algorithms. These algorithms have to optimise (1) the utilisation of the execution platform (i.e., meet time constraints while using the platform at its fullest potential) and/or (2) application-dependent performance criteria. We study two cases: independent periodic tasks scheduled on a processor, and periodic traffic streams scheduled on a priority bus. To deal with these two problems, we propose: • configuration methods that optimise the utilisation of the execution platform (i.e., maximise feasibility) by setting the parameters of the policies or of the activities of the considered system appropriately; two studies are conducted in this setting, the allocation of offsets in "offset free" systems (i.e., systems where offsets can be chosen off-line) and the allocation of priorities, policies and quanta in systems compliant with the POSIX 1003.1b standard; • a new class of scheduling policies that optimise application-dependent performance criteria, together with a generic schedulability analysis for this class.
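As a toy illustration of offset allocation only (a naive greedy heuristic, not the method developed in the thesis): choosing each task's offset to minimise the peak number of simultaneous job releases over the hyperperiod spreads the load and tends to help feasibility.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // Invented task set: periods in time units; hyperperiod is their lcm.
    std::vector<int> periods = {4, 4, 8};
    const int hyperperiod = 8;
    std::vector<int> releases(hyperperiod, 0);  // job releases per slot
    std::vector<int> offsets;
    for (int T : periods) {
        int best = 0, best_peak = 1 << 30;
        for (int off = 0; off < T; ++off) {     // try every candidate offset
            auto trial = releases;
            for (int t = off; t < hyperperiod; t += T) ++trial[t];
            int peak = *std::max_element(trial.begin(), trial.end());
            if (peak < best_peak) { best_peak = peak; best = off; }
        }
        offsets.push_back(best);                // commit the best offset
        for (int t = best; t < hyperperiod; t += T) ++releases[t];
    }
    for (std::size_t i = 0; i < offsets.size(); ++i)
        std::cout << "task " << i << " offset " << offsets[i] << '\n';
    // Prints offsets 0, 1, 2: releases never coincide in this example.
}
```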
340

Experimental implementation of the new prototype in Linux

Unknown Date
The Transmission Control Protocol (TCP) is one of the core protocols of the Internet protocol suite. In wired networks, TCP performs remarkably well due to its scalability and distributed end-to-end congestion control algorithms. However, many studies have shown that unmodified standard TCP performs poorly in networks with large bandwidth-delay products and/or lossy wireless links. In this thesis, we analyze the problems TCP exhibits in wireless communication and develop a TCP congestion control algorithm for mobile applications. We show that the optimal TCP congestion control and link scheduling scheme amounts to window-control oriented implicit primal-dual solvers for the underlying network utility maximization. Based on this idea, we use a scalable congestion control algorithm called QUeueIng-Control (QUIC) TCP, which utilizes the queueing-delay-based MaxWeight-type scheduler for wireless links developed in [34]. Simulation and test results are provided to evaluate the proposed schemes in practical networks. / by Gee Won Han. / Thesis (M.S.C.S.)--Florida Atlantic University, 2013. / Includes bibliography.
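For flavor, here is a generic delay-based window update in the spirit of queueing-delay-driven congestion control; it is a Vegas-style sketch with illustrative constants, not the thesis's QUIC-TCP algorithm or its MaxWeight link scheduler.

```cpp
#include <algorithm>
#include <iostream>

// Delay-based window update: grow while queueing delay is low, back off
// as it rises. alpha/beta are illustrative slack thresholds in packets.
double update_cwnd(double cwnd, double rtt_ms, double base_rtt_ms) {
    const double alpha = 1.0, beta = 1.0;
    double queueing_delay = rtt_ms - base_rtt_ms;   // delay beyond propagation
    double diff = cwnd * queueing_delay / rtt_ms;   // est. packets queued
    if (diff < alpha)      cwnd += 1.0;             // little queueing: probe up
    else if (diff > beta)  cwnd -= 1.0;             // queue building: back off
    return std::max(cwnd, 2.0);                     // keep a minimum window
}

int main() {
    double cwnd = 10.0;
    const double base_rtt = 50.0;                   // propagation-delay estimate
    const double rtts[] = {50.0, 52.0, 60.0, 80.0, 80.0};
    // As measured RTT rises above the base RTT, growth stops and reverses.
    for (double rtt : rtts) {
        cwnd = update_cwnd(cwnd, rtt, base_rtt);
        std::cout << "rtt=" << rtt << "ms cwnd=" << cwnd << '\n';
    }
}
```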
