31

Nutrient and Carbon-Dioxide Requirements for Large-Scale Microalgae Biofuel Production

Shurtz, Benjamin K. 01 August 2013 (has links)
Growing demand for energy worldwide has increased interest in the production of renewable fuels, with microalgae representing a promising feedstock. The large-scale feasibility of microalgae-based biofuels has previously been evaluated through technoeconomic and environmental impact assessments, with limited work performed on resource requirements. This study presents the use of a modular engineering system process model, founded on literature, to evaluate the nutrient (nitrogen and phosphorus) and carbon dioxide resource demand of five large-scale microalgae-to-biofuels production systems. The baseline scenario, representative of a near-term large-scale production system, includes process models for growth, dewatering, lipid extraction, anaerobic digestion, and biofuel conversion. Optimistic and conservative process scenarios are simulated to represent practical best- and worst-case system performance and to bound the total resource demand of large-scale production. Baseline modeling results combined with current US nutrient availability from fertilizer and wastewater are used to perform a scalability assessment. Results show that nutrient requirements represent a major barrier to the development of microalgae-based biofuels to meet the US Department of Energy 2030 renewable fuel goal of 30% of transportation fuel, or 60 billion gallons per year. Specifically, results from the baseline and optimistic fuel production systems show that wastewater sources can provide sufficient nutrients to produce 3.8 billion gallons and 13 billion gallons of fuel per year, corresponding to 6% and 22% of the DOE goal, respectively. The high resource demand necessitates nutrient recovery from the lipid-extracted algae, thus limiting its use as a value-added co-product. Discussion focuses on system scalability, comparison of results to previous resource assessments, and the sensitivity of modeled nutrient and carbon dioxide resource requirements to system parameter inputs.
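The scalability percentages quoted above follow directly from the DOE target; a minimal sketch, using only the fuel volumes stated in the abstract and no additional model inputs, reproduces them:

```python
# Back-of-the-envelope check of the scalability figures in the abstract;
# the only inputs are the fuel volumes stated there.
DOE_GOAL_GAL_PER_YR = 60e9          # 30% of US transportation fuel by 2030

wastewater_supported = {
    "baseline": 3.8e9,              # gallons/yr supportable by wastewater nutrients
    "optimistic": 13e9,
}

for scenario, gallons in wastewater_supported.items():
    share = gallons / DOE_GOAL_GAL_PER_YR
    print(f"{scenario}: {gallons/1e9:.1f} B gal/yr -> {share:.0%} of DOE goal")
# baseline -> ~6%, optimistic -> ~22%
```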
32

SCALABLE HYBRID DATA DISSEMINATION FOR INTERNET HOT SPOTS

Zhang, Wenhui 17 January 2007 (has links)
No description available.
33

Scalable Analysis of Large Dynamic Dependence Graphs

Singh, Shashank 01 September 2015 (has links)
No description available.
34

Design of Secure Scalable Frameworks for Next Generation Cellular Networks

Atalay, Tolga Omer 06 June 2024 (has links)
Leveraging Network Functions Virtualization (NFV), Fifth Generation (5G) core and Radio Access Network (RAN) functions are implemented as Virtual Network Functions (VNFs) on Commercial-off-the-Shelf (COTS) hardware. The use of virtualized micro-services to implement these 5G VNFs enables the flexible and scalable construction of end-to-end logically isolated network fragments denoted as network slices. The goal of this dissertation is to design more scalable, flexible, secure, and visible 5G networks. Each chapter therefore presents a design and evaluation that addresses one or more of these aspects. The first objective is to understand the limits of 5G core micro-service virtualization when using lightweight containers to construct various network slicing models with different service guarantees. The initial deployment model consists of the OpenAirInterface (OAI) 5G core in a containerized setting to create a universally deployable testbed. Operational and computational stress tests are performed on individual 5G core VNFs, and different network slicing models applicable to real-life scenarios are created. The analysis captures the increase in compute resource consumption of individual VNFs during various core network procedures. Furthermore, using different network slicing models, the progressive increase in resource consumption can be seen as the service guarantees of the slices become more demanding. The framework created using this testbed is the first to provide such analytics on lightweight virtualized 5G core VNFs with large-scale end-to-end connections. Moving into the cloud-native ecosystem, 5G core deployments will be orchestrated by intermediary Network-Slice-as-a-Service (NSaaS) providers. These NSaaS providers will consume Infrastructure-as-a-Service (IaaS) offerings and offer network slices to Mobile Virtual Network Operators (MVNOs). To investigate this future model, end-to-end emulated 5G deployments are conducted to offer insight into the cost implications surrounding such NSaaS offerings in the cloud. The deployment features real-life traffic patterns corresponding to practical use cases, which are matched with specific network slicing models. These models are implemented in a 5G testbed to gather compute resource consumption metrics. The obtained data are used to formulate infrastructure procurement costs for popular cloud providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. The results show steady patterns in compute consumption across multiple use cases, which are used to make high-scale cost projections for public cloud deployments. In the end, the trade-off between cost and throughput is balanced by decentralizing the network slices and offloading the user plane. The next step is the demystification of 5G traffic patterns using an Over-the-Air (OTA) testbed. An open-source OTA testbed is constructed leveraging advanced features of 5G radio access and core networks developed by OAI. The achievable Quality of Service (QoS) is evaluated to provide visibility into the compute consumption of individual components. Additionally, a method is presented to utilize WiFi devices for experimenting with 5G QoS. Resource consumption analytics are collected from the 5G user plane in correlation with raw traffic patterns. The results show that the open-source 5G testbed can sustain sub-20ms latency with up to 80Mbps throughput over a 25m range using COTS devices.
Device connection remains stable while supporting different use cases such as AR/VR, online gaming, video streaming, and Voice-over-IP (VoIP). The evaluation illustrates how these popular use cases affect CPU utilization in the user plane, providing insight into the capabilities of existing 5G solutions by demystifying the resource needs of specific use cases. The move into public cloud-based deployments creates a growing demand for general-purpose compute resources as 5G deployments continue to expand. Given their existing infrastructures, cloud providers such as AWS are attractive platforms to address this need. Therefore, it is crucial to understand the control- and user-plane QoS implications associated with deploying the 5G core on top of AWS. To this end, a 5G testbed is constructed using open-source components spanning multiple global locations within the AWS infrastructure. Using different core deployment strategies that shuffle VNFs into AWS edge zones, an operational breakdown of the latency overhead is conducted for 5G procedures. The results show that moving specific VNFs into edge regions reduces the latency overhead for key 5G operations. Multiple user plane connections are instantiated between availability zones and edge regions with different traffic loads. As more data sessions are instantiated, it is observed that the deterioration of connection quality varies depending on traffic load. Ultimately, the findings provide new insights for MVNOs to determine favorable placements of their 5G core entities in the cloud. The transition into cloud-native deployments has encouraged the development of supportive platforms for 5G. One such framework is the OpenRAN initiative, led by the O-RAN Alliance. The OpenRAN initiative promotes an open Radio Access Network (RAN) and offers operators fine-grained control over the radio stack. To that end, O-RAN introduces new components to the 5G ecosystem, such as the near real-time RAN Intelligent Controller (near-RT RIC) and the accompanying Extensible Applications (xApps). The introduction of these entities expands the 5G threat surface. Furthermore, with the movement from proprietary hardware to virtual environments enabled by NFV, attack vectors that exploit the existing NFV attack surface pose additional threats. To deal with these threats, the xApp Repository Function (XRF) framework is constructed for scalable authentication, authorization, and discovery of xApps. In order to harden the XRF microservices, deployments are isolated using Intel Software Guard Extensions (SGX). The XRF modules are individually benchmarked to compare how different microservices behave in terms of computational overhead when deployed in virtual and hardware-based isolation sandboxes. The evaluation shows that the XRF framework scales efficiently in a multi-threaded Kubernetes environment. Isolation of the XRF microservices introduces different amounts of processing overhead depending on the sandboxing strategy. A security analysis is conducted to show how the XRF framework addresses chosen key issues from the O-RAN and 5G standardization efforts. In the final chapter of the dissertation, the focus shifts towards the development and evaluation of 5G-STREAM, a service mesh tailored for rapid, efficient, and authorized microservice communication in cloud-based 5G core networks. 5G-STREAM addresses critical scalability and efficiency challenges in the 5G core control plane by optimizing traffic and reducing signaling congestion across distributed cloud environments.
The framework enhances the topology awareness of Virtual Network Function (VNF) service chains, enabling dynamic configuration of communication pathways, which significantly reduces discovery and authorization signaling overhead. A prototype of 5G-STREAM was developed and tested, showing a reduction of up to 2× in inter-VNF latency per HTTP transaction in core network service chains, particularly benefiting larger service chains with extensive messaging. Additionally, 5G-STREAM's deployment strategies for VNF placement are explored to further optimize performance and cost efficiency in cloud-based infrastructures, ultimately providing a scalable solution that can adapt to increasing network demands while maintaining robust service levels. This approach signifies a pivotal advancement in managing 5G core networks, paving the way for more dynamic, efficient, and cost-effective cellular network infrastructures. Overall, this dissertation is devoted to designing, building, and evaluating scalable and secure 5G deployments. / Doctor of Philosophy / Ever since the emergence of the Global System for Mobile Communications (GSM), humanity has relied on cellular communications for the fast and efficient exchange of information. Today, with the Fifth Generation (5G) of mobile networks, what may have passed for science fiction 40 years ago is now slowly becoming reality. In addition to enabling extremely fast data rates and low latency for user handsets, 5G networks promise to deliver a very rich and integrated ecosystem. This includes a plethora of interconnected devices ranging from smart home sensors to Augmented/Virtual Reality equipment. To that end, the stride from the Fourth Generation (4G) of mobile networks to 5G is arguably the biggest evolutionary step yet in cellular networks. In 4G, the backbone entities that glued the base stations together were deployed on proprietary hardware. With 5G, these entities have been moved to Commercial off-the-shelf (COTS) hardware which can be hosted by cloud providers (e.g., Amazon, Google, Microsoft) or various Small to Medium Enterprises (SMEs). This substantial paradigm shift in cellular network deployments has introduced a variety of security, flexibility, and scalability concerns around the deployment of 5G networks. Thus, this thesis is a culmination of a wide range of studies that seek to collectively facilitate the secure, scalable, and flexible deployment of 5G networks in different types of environments. Starting with small-scale optimizations and building up towards the analysis of global 5G deployments, the goal of this work is to demystify the scalability implications of deploying 5G networks. On this journey, several security flaws are identified within the 5G ecosystem, and frameworks are constructed to address them in a coherent manner.
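As an illustration of how per-slice compute metrics can be turned into cloud procurement cost projections, a minimal sketch is shown below. The slice profiles, vCPU/memory figures, and hourly rates are hypothetical placeholders, not measurements from the dissertation or real cloud price-list values.

```python
# Hypothetical sketch: projecting monthly public-cloud cost for 5G core network
# slices from measured compute consumption. All figures below are illustrative
# assumptions introduced for this example only.
SLICE_PROFILES = {                      # per-slice steady-state consumption (assumed)
    "eMBB":  {"vcpu": 4.0, "mem_gib": 8.0},
    "URLLC": {"vcpu": 2.0, "mem_gib": 4.0},
    "mMTC":  {"vcpu": 1.0, "mem_gib": 2.0},
}
HOURLY_RATE = {"vcpu": 0.04, "mem_gib": 0.005}   # $/hour, placeholder values
HOURS_PER_MONTH = 730

def monthly_cost(profile, n_slices):
    per_slice = (profile["vcpu"] * HOURLY_RATE["vcpu"]
                 + profile["mem_gib"] * HOURLY_RATE["mem_gib"]) * HOURS_PER_MONTH
    return per_slice * n_slices

for name, profile in SLICE_PROFILES.items():
    print(name, f"${monthly_cost(profile, n_slices=100):,.0f}/month for 100 slices")
```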
35

The Visual Scalability of Integrated and Multiple View Visualizations for Large, High Resolution Displays

Yost, Beth Ann 19 April 2007 (has links)
Geospatial intelligence analysts, epidemiologists, sociologists, and biologists are all faced with trying to understand massive datasets that require integrating spatial and multidimensional data. Information visualizations are often used to aid these scientists, but designing the visualizations is challenging. One aspect of the visualization design space is the choice of when to use a single complex integrated view and when to use multiple simple views. Because of the many tradeoffs involved with this decision, it is not always clear which design to use. Additionally, as the cost of display technologies continues to decrease, large, high resolution displays are gradually becoming a more viable option for single users. These large displays offer new opportunities for scaling up visualization to very large datasets. Visualizations that are visually scalable are able to effectively display large datasets in terms of both graphical scalability (the number of pixels required) and perceptual scalability (the effectiveness of a visualization, measured in terms of user performance, as the amount of data being visualized is scaled up). The purpose of this research was to compare information visualization designs for integrating spatial and multidimensional data in terms of their visual scalability for large, high resolution displays. Toward that goal, a hierarchical design space was articulated and a series of user experiments were performed. A baseline was established by comparing user performance with opposing visualizations on a desktop monitor. Then, visualizations were compared as more information was added using the additional pixels available with a large, high resolution display. Results showed that integrated views were more visually scalable than multiple-view visualizations. The visualizations tested were scalable even beyond the limits of visual acuity. User performance on certain tasks improved due to the additional information that was visualized, even on a display with enough pixels to require physical navigation to visually distinguish all elements. The reasons for the benefits of integrated views on large, high resolution displays include a reduction in navigation due to spatial grouping, and visual aggregation resulting in the emergence of patterns. These findings can help with the design of information visualizations for large, high resolution displays. / Ph. D.
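One crude way to make the graphical-scalability notion concrete is to compare the pixel budget of a desktop monitor against a tiled high-resolution display. The mark sizes, view counts, and resolutions below are illustrative assumptions, not figures from the experiments.

```python
# Illustrative-only sketch of graphical scalability: how many data items fit on
# a display before pixels run out. All numbers are assumptions for the example.
def max_items(display_pixels, pixels_per_mark, views_showing_each_item):
    # Each item costs one mark per view that displays it.
    return display_pixels // (pixels_per_mark * views_showing_each_item)

DESKTOP = 1920 * 1200             # single desktop monitor
LARGE_DISPLAY = 12 * 1920 * 1200  # e.g., a 4x3 tiled display wall

for name, px in (("desktop", DESKTOP), ("large display", LARGE_DISPLAY)):
    print(name,
          "integrated view:", max_items(px, pixels_per_mark=25, views_showing_each_item=1),
          "four coordinated views:", max_items(px, pixels_per_mark=25, views_showing_each_item=4))
```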
36

POSEIDON: The First Safe and Scalable Persistent Memory Allocator

Demeri, Anthony K. 20 May 2020 (has links)
With the advent of byte-addressable Non-Volatile Main Memory (NVMM), the need for a safe, scalable and high-performing memory allocator is inevitable. A slow memory allocator can bottleneck the entire application stack, while an insecure memory allocator can render underlying systems and applications inconsistent upon program bugs or system failure. Unlike DRAM-based memory allocators, it is indispensable for an NVMM allocator to guarantee its heap metadata safety from both internal and external errors. An effective NVMM memory allocator should be 1) safe, 2) scalable, and 3) high performing. Unfortunately, none of the existing persistent memory allocators achieve all three requisites; critically, we also note that the de facto NVMM allocator, Intel's Persistent Memory Development Kit (PMDK), is vulnerable to silent data corruption and persistent memory leaks as a result of a simple heap overflow. We closely investigate the existing de facto NVMM memory allocators, especially PMDK, to study their vulnerability to metadata corruption and the reasons for poor performance and scalability. We propose Poseidon, which is safe, fast and scalable. The premise of Poseidon revolves around providing a user application with per-CPU sub-heaps for scalability, while managing the heap metadata in a segregated fashion and efficiently protecting the metadata using a scalable hardware-based protection scheme, Intel's Memory Protection Keys (MPK). We evaluate Poseidon with a wide array of microbenchmarks and real-world benchmarks, noting that Poseidon outperforms the state-of-the-art allocators by a significant margin, showing improved scalability and performance while also guaranteeing metadata safety. / Master of Science / Since the dawn of time, civilization has revolved around effective communication. From smoke signals to telegraphs and beyond, communication has continued to be a cornerstone of successful societies. Today, communication and collaboration occur, daily, on a global scale, such that even sub-second units of time are critical to successful societal operation. Naturally, many forms of modern communication revolve around our digital systems, such as personal computers, email servers, and social networking database applications. There is, thus, a never-ending surge of digital system development, constantly striving toward increased performance. For some time, increasing a system's dynamic random-access memory, or DRAM, has been able to provide performance gains; unfortunately, due to thermal and power constraints, such an increase is no longer feasible. Additionally, loss of power on a DRAM system causes bothersome loss of data, since DRAM storage is volatile. Now, we are on the advent of an entirely new physical memory system, termed non-volatile main memory (NVMM), which has near-identical performance properties to DRAM but is operational in much larger quantities, thus allowing increased overall system speed. Alas, such a system also imposes additional requirements upon software developers, since, for NVMM, all memory updates are permanent, such that a failed update can cause persistent memory corruption. Regrettably, the existing software standard, led by Intel's Persistent Memory Development Kit (PMDK), is insecure (allowing permanent memory corruption with ease), low performing, and a bottleneck for multicore systems. Here, we present a secure, high-performing solution, termed Poseidon, which harnesses the full potential of NVMM.
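A conceptual sketch of the per-CPU sub-heap idea follows. It is written in Python purely to illustrate the structure; Poseidon itself is a native allocator, and the arena sizes, bookkeeping, and locking below are simplifications introduced for this example, not its implementation.

```python
# Conceptual sketch only (not the Poseidon implementation): per-CPU sub-heaps
# with allocator metadata kept in a segregated structure, so an overflow in
# user data cannot silently corrupt the bookkeeping. The bump-pointer policy
# and fixed arenas are simplifications chosen for illustration.
import os
import threading

class SubHeap:
    """One sub-heap per CPU: a bump allocator over a fixed-size arena."""
    def __init__(self, arena_size):
        self.arena_size = arena_size
        self.offset = 0
        self.live = {}                    # offset -> size: segregated metadata
        self.lock = threading.Lock()      # per-sub-heap lock -> little cross-CPU contention

    def alloc(self, size):
        with self.lock:
            if self.offset + size > self.arena_size:
                return None               # arena exhausted; a real allocator would refill
            offset, self.offset = self.offset, self.offset + size
            self.live[offset] = size      # record the allocation in the metadata region
            return offset

class PerCpuAllocator:
    def __init__(self, ncpus, arena_size=1 << 20):
        self.subheaps = [SubHeap(arena_size) for _ in range(ncpus)]

    def alloc(self, size, cpu=None):
        # Route the request to the caller's sub-heap; the thread id stands in
        # for the CPU id a native allocator would query.
        cpu = cpu if cpu is not None else threading.get_ident() % len(self.subheaps)
        return cpu, self.subheaps[cpu].alloc(size)

allocator = PerCpuAllocator(ncpus=os.cpu_count() or 4)
print(allocator.alloc(64, cpu=0))   # (0, 0)
print(allocator.alloc(128, cpu=1))  # (1, 0)
```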
37

Scalable Robust Models Under Adversarial Data Corruption

Zhang, Xuchao 04 April 2019 (has links)
The presence of noise and corruption in real-world data can be caused by accidental outliers, transmission loss, or even adversarial data attacks. Unlike traditional random noise, which is usually assumed to follow a specific distribution with a low corruption ratio, data collected from crowdsourcing or labeled by weak annotators can contain adversarial corruption. More challenging still, adversarial data corruption can be arbitrary and unbounded, and need not follow any specific distribution. In addition, in the era of data explosion, the fast-growing amount of data makes it more difficult for robust models to handle large-scale data sets. This thesis focuses on the development of methods for scalable robust models under adversarial data corruption assumptions. Four methods are proposed: robust regression via heuristic hard-thresholding, online and distributed robust regression with adversarial noise, self-paced robust learning for leveraging clean labels in noisy data, and robust regression via online feature selection with adversarial noise. Moreover, I extended the self-paced robust learning method to a distributed version for scalability, named distributed self-paced learning with the alternating direction method of multipliers. Last, a robust multi-factor personality prediction model is proposed to handle correlated data noise. For the first method, existing solutions for robust regression lack a rigorous recovery guarantee for regression coefficients under adversarial data corruption with no prior knowledge of the corruption ratio. The proposed contributions of our work include: (1) Propose efficient algorithms to address the robust least-squares regression problem; (2) Design effective approaches to estimate the corruption ratio; (3) Provide a rigorous robustness guarantee for regression coefficient recovery; and (4) Conduct extensive experiments for performance evaluation. For the second method, existing robust learning methods typically focus on modeling the entire dataset at once; however, they may hit memory and computation bottlenecks as more and more datasets become too large to be handled integrally. The proposed contributions of our work for this task include: (1) Formulate a framework for the scalable robust least-squares regression problem; (2) Propose online and distributed algorithms to handle the adversarial corruption; (3) Provide a rigorous robustness guarantee for regression coefficient recovery; and (4) Conduct extensive experiments for performance evaluations. For the third method, leveraging prior knowledge of clean labels in noisy data is a crucial issue in practice, but existing robust learning methods typically focus more on eliminating noisy data. However, the data collected by "weak annotators" or crowdsourcing can be too noisy for existing robust methods to train an accurate model. Moreover, existing work that utilizes additional clean labels is usually designed for specific problems such as image classification. These methods typically utilize clean labels in large-scale noisy data based on additional domain knowledge; however, they have difficulty handling extremely noisy data and rely heavily on that domain knowledge, which makes them hard to apply to more general problems.
The proposed contributions of our work for this task include: (1) Formulating a framework to leverage the clean labels in noisy data; (2) Proposing a self-paced robust learning algorithm to train models under the supervision of clean labels; (3) Providing a theoretical analysis for the convergence of the proposed algorithm; and (4) Conducting extensive experiments for performance evaluations. For the fourth method, the presence of data corruption in user-generated streaming data, such as social media, motivates a new fundamental problem: learning reliable regression coefficients when features are not entirely accessible at one time. Until now, several important challenges could not be handled concurrently: 1) corrupted data estimation when only partial features are accessible; 2) online feature selection when data contains adversarial corruption; and 3) scaling to a massive dataset. This paper proposes a novel RObust regression algorithm via Online Feature Selection (RoOFS) that concurrently addresses all of the above challenges. Specifically, the algorithm iteratively updates the regression coefficients and the uncorrupted set via a robust online feature substitution method. We also prove that our algorithm has a restricted error bound compared to the optimal solution. Extensive empirical experiments on both synthetic and real-world data sets demonstrated that our new method is superior to existing methods in the recovery of both the selected features and the regression coefficients, with very competitive efficiency. For the fifth method, existing self-paced learning approaches typically focus on modeling the entire dataset at once; however, this may introduce a bottleneck in terms of memory and computation, as today's fast-growing datasets are becoming too large to be handled integrally. The proposed contributions of our work for this task include: (1) Reformulating the self-paced learning problem in a distributed setting; (2) Proposing a distributed self-paced learning algorithm based on consensus ADMM to solve the SPL problem in a distributed setting; (3) Providing a theoretical analysis for the convergence of our proposed DSPL algorithm; and (4) Conducting extensive experiments utilizing both synthetic and real-world data based on a robust regression task. For the last method, personality prediction across multiple factors, such as openness and agreeableness, is growing in interest, especially in the context of social media, which contains massive online posts or likes that can potentially reveal an individual's personality. However, the data collected from social media inevitably contains massive amounts of noise and corruption. Traditional robust methods addressing this problem still suffer from several important challenges, including 1) the existence of correlated corruption among multiple factors, 2) difficulty in estimating the corruption ratio in multi-factor data, and 3) scalability to massive datasets. This paper proposes a novel robust multi-factor personality prediction model that concurrently addresses all of the above challenges by developing a distributed robust regression algorithm. Specifically, the algorithm optimizes the regression coefficients of each factor in parallel with a heuristically estimated corruption ratio and then consolidates the uncorrupted set from multiple factors using two strategies: global consensus and majority voting.
We also prove that our algorithm benefits from strong guarantees in terms of convergence rates and coefficient recovery, and that it can be utilized as a generic framework for the multi-factor robust regression problem with correlated corruption. Extensive experiments on synthetic and real datasets demonstrate that our algorithm is superior to existing methods in both effectiveness and efficiency. / Doctor of Philosophy / Social media has experienced rapid growth during the past decade. Millions of users of sites such as Twitter have been generating and sharing a wide variety of content including texts, images, and other metadata. In addition, social media can be treated as a social sensor that reflects different aspects of our society. Event analytics in social media has enormous significance for applications like disease surveillance, business intelligence, and disaster management. Social media data possesses a number of important characteristics including dynamics, heterogeneity, noisiness, timeliness, big volume, and network properties. These characteristics cause various new challenges and hence invoke many interesting research topics, which will be addressed here. This dissertation focuses on the development of five novel methods for social media-based spatiotemporal event detection and forecasting. The first of these is a novel unsupervised approach for detecting the dynamic keywords of spatial events in targeted domains. This method has been deployed in a practical project for monitoring civil unrest events in several Latin American regions. The second builds on this by discovering the underlying development progress of events, jointly considering the structural contexts and spatiotemporal burstiness. The third seeks to forecast future events using social media data. The basic idea here is to search for subtle patterns in specific cities as indicators of ongoing or future events, where each pattern is defined as a burst of context features (keywords) that are relevant to a specific event. For instance, an initial expression of discontent about gas price increases could actually be a potential precursor to a more general protest about government policies. Beyond social media data, in the fourth method proposed here, multiple data sources are leveraged to reflect different aspects of the society for event forecasting. This addresses several important problems, including the common phenomenon that different sources may come from different geographical levels and have different available time periods. The fifth study is a novel flu forecasting method based on epidemics modeling and social media mining. A new framework is proposed to integrate prior knowledge of disease propagation mechanisms and real-time information from social media.
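To make the first method's core idea concrete, here is a minimal sketch of robust least-squares regression via iterative hard-thresholding. It illustrates the general technique (alternately fitting on a trusted subset and re-selecting the points with the smallest residuals) rather than the thesis's exact algorithm, and the corruption ratio `eps` is supplied by hand rather than heuristically estimated as the thesis proposes.

```python
# A minimal sketch of robust least squares via iterative hard-thresholding,
# in the spirit of the first method above (not the thesis's exact algorithm).
# The corruption ratio `eps` is an assumed input here.
import numpy as np

def robust_regression_ht(X, y, eps=0.2, iters=20):
    n, d = X.shape
    keep = n - int(eps * n)                 # number of points treated as clean
    S = np.arange(n)                        # start by trusting everything
    beta = np.zeros(d)
    for _ in range(iters):
        beta, *_ = np.linalg.lstsq(X[S], y[S], rcond=None)  # fit on the trusted set
        r = np.abs(y - X @ beta)                            # residuals on all points
        S = np.argsort(r)[:keep]            # hard-threshold: keep the smallest residuals
    return beta

# Toy data with adversarial corruption on 15% of the labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
beta_true = rng.normal(size=5)
y = X @ beta_true + 0.01 * rng.normal(size=500)
y[:75] += 10 * rng.normal(size=75)          # arbitrary, unbounded corruption
print(np.linalg.norm(robust_regression_ht(X, y) - beta_true))
```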
38

Controlling Scalability in Distributed Virtual Environments

Singh, Hermanpreet 01 May 2013 (has links)
A Distributed Virtual Environment (DVE) system provides a shared virtual environment where physically separated users can interact and collaborate over a computer network. More simultaneous DVE users could result in intolerable system performance degradation. We address the three major challenges to improving DVE scalability: effective DVE system performance measurement, understanding the controlling factors of system performance and quality, and determining the consequences of DVE system changes. We propose a DVE Scalability Engineering (DSE) process that addresses these three major challenges for DVE design. DSE allows us to identify, evaluate, and leverage trade-offs among DVE resources, the DVE software, and the virtual environment. DSE has three stages. First, we show how to simulate different numbers and types of users on DVE resources. Collected user study data is used to identify representative user types. Second, we describe a modeling method to discover the major trade-offs between quality of service and DVE resource usage. The method makes use of a new instrumentation tool called ppt. ppt collects atomic blocks of developer-selected instrumentation at high rates and saves them for offline analysis. Finally, we integrate our load simulation and modeling method into a single process to explore the effects of changes in DVE resources. We use the simple Asteroids DVE as a minimal case study to describe the DSE process. The larger, commercial Torque and Quake III DVE systems provide realistic case studies and demonstrate DSE usage. The Torque case study shows the impact of many users on a DVE system. We apply the DSE process to significantly enhance the Quality of Experience given the available DVE resources. The Quake III case study shows how to identify the DVE network needs and evaluate network characteristics when using a mobile phone platform. We analyze the trade-offs between power consumption and quality of service. The case studies demonstrate the applicability of DSE for discovering and leveraging trade-offs between Quality of Experience and DVE resource usage. Each of the three stages can be used individually to improve DVE performance. The DSE process enables fast and effective DVE performance improvement. / Ph. D.
39

Scalability of Stepping Stones and Pathways

Venkatachalam, Logambigai 30 May 2008 (has links)
Information Retrieval (IR) plays a key role in serving large communities of users who need relevant answers to their search queries. IR encompasses various search models to address different requirements and has introduced a variety of supporting tools to improve effectiveness and efficiency. "Search" is the key focus of IR. The classic search methodology takes an input query, processes it, and returns the result as a ranked list of documents. However, this approach is not the most effective way to support the task of finding document associations (relationships between concepts or queries), whether direct or indirect. The Stepping Stones and Pathways (SSP) retrieval methodology supports retrieval of ranked chains of documents that support valid relationships between any two given concepts. SSP has many potential practical and research applications that need a tool for finding connections between two concepts. The early SSP "proof-of-concept" implementation could handle only 6000 documents, whereas commercial search applications have to deal with millions of documents. Hence, addressing this scalability limitation in the current SSP implementation is essential for handling large datasets. Research on various commercial search applications and their scalability indicates that the Lucene search toolkit is widely used due to its scalability, performance, and extensibility. Many web-based and desktop applications have used this toolkit to great success, including Wikipedia search, job search sites, digital libraries, e-commerce sites, and the Eclipse Integrated Development Environment (IDE). The goal of this research is to re-implement SSP in a scalable way, so that it can work on larger datasets and can also be deployed commercially. This work explains the approach adopted for the re-implementation, focusing on scalable indexing and searching components, new ways to process citations (references), a new approach for query expansion, document clustering, and document similarity calculation. Experiments measuring factors such as runtime and storage showed that the system can scale to handle millions of documents. / Master of Science
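For readers unfamiliar with the ranked-retrieval model the abstract contrasts SSP against, a tiny TF-IDF/cosine-similarity sketch is shown below. It is only an illustration of the classic "query in, ranked list out" approach and of a document-similarity calculation; it is not part of the SSP re-implementation, which builds on Lucene. The corpus is made up for the example.

```python
# Minimal illustration of classic ranked retrieval: TF-IDF weighting plus
# cosine similarity over a made-up three-document corpus.
import math
from collections import Counter

docs = {
    "d1": "stepping stones pathways ranked chains of documents",
    "d2": "scalable indexing and searching with lucene",
    "d3": "query expansion document clustering similarity calculation",
}

# Inverse document frequency computed once over the corpus.
df = Counter(term for text in docs.values() for term in set(text.split()))
IDF = {term: math.log(len(docs) / count) for term, count in df.items()}

def vectorize(text):
    # Terms unseen in the corpus get zero weight.
    return {t: tf * IDF.get(t, 0.0) for t, tf in Counter(text.split()).items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc_vecs = {name: vectorize(text) for name, text in docs.items()}
query = vectorize("scalable lucene searching")
print(sorted(docs, key=lambda d: cosine(query, doc_vecs[d]), reverse=True))
```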
40

Multicore Scalability Through Asynchronous Work

Mathew, Ajit 13 January 2020 (has links)
With the end of Moore's Law, computer architects have turned to multicore architectures to provide high performance. Unfortunately, to achieve higher performance, multicores require programs to be parallelized, which is an untamed problem. Amdahl's law states that the maximum theoretical speedup of a program is dictated by the size of its non-parallelizable section. Hence, to achieve higher performance, programmers need to reduce the amount of sequential code in the program. This thesis explores asynchronous work as a means to reduce the sequential portions of a program. Using asynchronous work, a programmer can remove tasks that do not affect data consistency from the critical path and perform them using a background thread. Using this idea, the thesis introduces two systems. First, a synchronization mechanism, Multi-Version Read-Log-Update (MV-RLU), which extends Read-Log-Update (RLU) through multi-versioning. At the core of the MV-RLU design is a concurrent garbage collection algorithm that reclaims obsolete versions asynchronously, reducing thread blocking. Second, Hydralist, a concurrent and highly scalable index structure for multicores. The key idea behind the design of Hydralist is that an index structure can be divided into two components (a search layer and a data layer): updates to the data layer are done synchronously, while updates to the search layer are propagated asynchronously using background threads. / Master of Science / Up until the mid-2000s, Moore's law meant that CPU performance doubled every two years. This was because improvements in transistor technology allowed smaller transistors that could switch at higher frequencies, leading to faster CPU clocks. But faster clocks lead to higher heat dissipation, and as chips reached their thermal limits, computer architects could no longer increase clock speeds. Hence they moved to multicore architectures, wherein a single die contains multiple CPUs, to allow higher performance. Now programmers are required to parallelize their code to take advantage of all the CPUs in a chip, which is a non-trivial problem. The theoretical speedup achieved by a program on a multicore architecture is dictated by Amdahl's law, which identifies the non-parallelizable code in a program as the limiting factor for speedup. For example, a program with 99% parallelizable code can achieve a speedup of 20, whereas a program with 50% parallelizable code can only achieve a speedup of 2. Therefore, to achieve a high speedup, programmers need to reduce the size of the serial section in their program. One way to do this is to remove non-critical tasks from the sequential section and perform them asynchronously using a background thread. This thesis explores this technique in two systems. First, Multi-Version Read-Log-Update (MV-RLU), a synchronization mechanism used to coordinate access to shared resources. MV-RLU achieves high performance by removing garbage collection from the critical path and performing it asynchronously using a background thread. Second, an index structure, Hydralist, which is based on the insight that an index structure can be decomposed into two components, a search layer and a data layer, and decouples updates to the two layers, allowing higher performance. Updates to the data layer are done synchronously, while updates to the search layer are propagated asynchronously using background threads. Evaluation shows that both systems perform better than state-of-the-art competitors on a variety of workloads.
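The Amdahl's law argument above can be made concrete with the standard bound S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of cores; the core counts in the sketch below are arbitrary examples.

```python
# A small illustration of Amdahl's law as described above: the speedup on n
# cores is capped by the serial fraction, S(n) = 1 / ((1 - p) + p / n).
def amdahl_speedup(p, n):
    """p: parallelizable fraction of the program, n: number of cores."""
    return 1.0 / ((1.0 - p) + p / n)

# With only 50% of the code parallelizable, speedup never exceeds 2,
# no matter how many cores are added; at 99% the ceiling is far higher.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.50, n), 2), round(amdahl_speedup(0.99, n), 1))
```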
