31

A scalability evaluation on CockroachDB

Lifhjelm, Tobias January 2021 (has links)
Databases are a cornerstone of data storage: they store and organize large amounts of data while allowing users to access specific parts of it easily. Databases must, however, adapt to an increasing number of users without negatively affecting end-users. CockroachDB (CRDB) is a distributed SQL database that combines the consistency associated with relational database management systems with the scalability to handle more user requests simultaneously. This paper presents a study that evaluates the scalability properties of CRDB by measuring how latency is affected by the addition of more nodes to a CRDB cluster. The findings show that latency can decrease as nodes are added to a cluster; however, there are cases in which more nodes increase the latency.
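A latency evaluation of this kind can be sketched as a small timing harness. The workload below is a stand-in, not the thesis's actual benchmark; a real run would issue SQL queries against CRDB clusters of varying size via a PostgreSQL-compatible driver (CRDB speaks the PostgreSQL wire protocol):

```python
import statistics
import time

def measure_latency(op, n_samples=1000):
    """Time repeated invocations of `op` and report latency statistics in ms."""
    samples = []
    for _ in range(n_samples):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean": statistics.mean(samples),
        "p50": samples[len(samples) // 2],
        "p99": samples[int(len(samples) * 0.99)],
    }

# Stand-in workload; a real evaluation would time the same query against
# clusters of 3, 5, 7, ... nodes and compare the latency distributions.
stats = measure_latency(lambda: sum(range(1000)))
```

Comparing the per-cluster-size distributions (rather than single timings) is what lets the study observe both the decreases and the increases in latency reported above.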
32

Nutrient and Carbon-Dioxide Requirements for Large-Scale Microalgae Biofuel Production

Shurtz, Benjamin K. 01 August 2013 (has links)
Growing demand for energy worldwide has increased interest in the production of renewable fuels, with microalgae representing a promising feedstock. The large-scale feasibility of microalgae-based biofuels has previously been evaluated through technoeconomic and environmental impact assessments, with limited work performed on resource requirements. This study presents the use of a modular engineering system process model, founded on the literature, to evaluate the nutrient (nitrogen and phosphorus) and carbon dioxide resource demand of five large-scale microalgae-to-biofuels production systems. The baseline scenario, representative of a near-term large-scale production system, includes process models for growth, dewatering, lipid extraction, anaerobic digestion, and biofuel conversion. Optimistic and conservative process scenarios are simulated to represent practical best- and worst-case system performance and to bound the total resource demand of large-scale production. Baseline modeling results, combined with current US nutrient availability from fertilizer and wastewater, are used to perform a scalability assessment. Results show that nutrient requirements represent a major barrier to the development of microalgae-based biofuels to meet the US Department of Energy 2030 renewable fuel goal of 30% of transportation fuel, or 60 billion gallons per year. Specifically, results from the baseline and optimistic fuel production systems show that wastewater sources can provide sufficient nutrients to produce 3.8 billion gallons and 13 billion gallons of fuel per year, corresponding to 6% and 22% of the DOE goal, respectively. High resource demand necessitates nutrient recovery from the lipid-extracted algae, thus limiting its use as a value-added co-product. Discussion focuses on system scalability, comparison of results to previous resource assessments, and the sensitivity of modeled nutrient and carbon dioxide requirements to system parameter inputs.
33

Scalable Hybrid Data Dissemination for Internet Hot Spots

Zhang, Wenhui 17 January 2007 (has links)
No description available.
34

Scalable Analysis of Large Dynamic Dependence Graphs

Singh, Shashank 01 September 2015 (has links)
No description available.
35

Scalable Robust Models Under Adversarial Data Corruption

Zhang, Xuchao 04 April 2019 (has links)
The presence of noise and corruption in real-world data can be caused by accidental outliers, transmission loss, or even adversarial data attacks. Unlike traditional random noise, which is usually assumed to follow a specific distribution with a low corruption ratio, data collected from crowdsourcing or labeled by weak annotators can contain adversarial corruption. More challenging still, adversarial data corruption can be arbitrary and unbounded, and need not follow any specific distribution. In addition, in the era of data explosion, the fast-growing amount of data makes it harder for robust models to handle large-scale data sets. This thesis focuses on the development of methods for scalable robust models under adversarial data corruption. Four methods are proposed: robust regression via heuristic hard-thresholding, online and distributed robust regression with adversarial noise, self-paced robust learning for leveraging clean labels in noisy data, and robust regression via online feature selection with adversarial noise. Moreover, I extended the self-paced robust learning method to a distributed version for scalability, named distributed self-paced learning via the alternating direction method of multipliers. Last, a robust multi-factor personality prediction model is proposed to handle correlated data noise. For the first method, existing solutions for robust regression lack a rigorous recovery guarantee for regression coefficients under adversarial data corruption when no prior knowledge of the corruption ratio is available. The contributions of our work include: (1) efficient algorithms to address the robust least-squares regression problem; (2) effective approaches to estimate the corruption ratio; (3) a rigorous robustness guarantee for regression coefficient recovery; and (4) extensive experiments for performance evaluation.
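The hard-thresholding idea behind the first method can be sketched as follows. This is a simplified illustration rather than the thesis's exact algorithm (in particular, the corruption ratio is taken as given here, whereas the thesis estimates it heuristically): alternate between fitting least squares on a presumed-clean subset and re-selecting the samples with the smallest residuals.

```python
import numpy as np

def robust_regression_ht(X, y, corruption_ratio, n_iters=50):
    """Iterative hard-thresholding: alternate between (a) ordinary least
    squares on the presumed-clean subset and (b) re-selecting the k samples
    with the smallest absolute residuals."""
    n = X.shape[0]
    k = int(n * (1 - corruption_ratio))   # presumed-clean sample count
    clean = np.arange(n)                  # start by trusting every sample
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        beta, *_ = np.linalg.lstsq(X[clean], y[clean], rcond=None)
        resid = np.abs(y - X @ beta)
        clean = np.argsort(resid)[:k]     # hard-threshold: keep k smallest
    return beta, clean
```

Because adversarially corrupted points cannot be fit by any single coefficient vector that also fits the clean majority, their residuals stay large and they are excluded after a few iterations.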
For the second method, existing robust learning methods typically focus on modeling the entire dataset at once; however, they may hit memory and computation bottlenecks as datasets become too large to be handled in their entirety. The contributions of our work for this task include: (1) a framework for the scalable robust least-squares regression problem; (2) online and distributed algorithms to handle adversarial corruption; (3) a rigorous robustness guarantee for regression coefficient recovery; and (4) extensive experiments for performance evaluation. For the third method, leveraging prior knowledge of clean labels in noisy data is a crucial issue in practice, but existing robust learning methods typically focus on eliminating noisy data. However, data collected by "weak annotators" or crowdsourcing can be too noisy for existing robust methods to train an accurate model. Moreover, existing work that utilizes additional clean labels is usually designed for specific problems such as image classification. These methods typically exploit clean labels in large-scale noisy data based on additional domain knowledge; however, they struggle with extremely noisy data and rely heavily on that domain knowledge, which makes them hard to apply to more general problems. The contributions of our work for this task include: (1) a framework to leverage clean labels in noisy data; (2) a self-paced robust learning algorithm to train models under the supervision of clean labels; (3) a theoretical analysis of the convergence of the proposed algorithm; and (4) extensive experiments for performance evaluation.
For the fourth method, the presence of data corruption in user-generated streaming data, such as social media, motivates a new fundamental problem: learning reliable regression coefficients when features are not entirely accessible at one time. Until now, several important challenges could not be handled concurrently: (1) corrupted-data estimation when only partial features are accessible; (2) online feature selection when the data contains adversarial corruption; and (3) scaling to massive datasets. This work proposes a novel RObust regression algorithm via Online Feature Selection (RoOFS) that addresses all of these challenges concurrently. Specifically, the algorithm iteratively updates the regression coefficients and the uncorrupted set via a robust online feature substitution method. We also prove that our algorithm has a restricted error bound compared to the optimal solution. Extensive empirical experiments on both synthetic and real-world data sets demonstrate that our new method is superior to existing methods in recovering both the feature selection and the regression coefficients, with very competitive efficiency. For the fifth method, existing self-paced learning approaches typically focus on modeling the entire dataset at once; however, this may introduce memory and computation bottlenecks, as today's fast-growing datasets are becoming too large to be handled in their entirety. The contributions of our work for this task include: (1) reformulating the self-paced learning problem in a distributed setting; (2) a distributed self-paced learning algorithm based on consensus ADMM to solve the SPL problem in a distributed setting; (3) a theoretical analysis of the convergence of our proposed DSPL algorithm; and (4) extensive experiments utilizing both synthetic and real-world data based on a robust regression task.
For the last method, personality prediction across multiple factors, such as openness and agreeableness, is of growing interest, especially in the context of social media, which contains massive numbers of online posts and likes that can potentially reveal an individual's personality. However, data collected from social media inevitably contains massive amounts of noise and corruption. Here, traditional robust methods still suffer from several important challenges, including (1) the existence of correlated corruption among multiple factors, (2) the difficulty of estimating the corruption ratio in multi-factor data, and (3) scalability to massive datasets. This work proposes a novel robust multi-factor personality prediction model that concurrently addresses all of these challenges by developing a distributed robust regression algorithm. Specifically, the algorithm optimizes the regression coefficients of each factor in parallel with a heuristically estimated corruption ratio, and then consolidates the uncorrupted set from multiple factors using two strategies: global consensus and majority voting. We also prove that our algorithm enjoys strong guarantees in terms of convergence rates and coefficient recovery, and it can be used as a generic framework for multi-factor robust regression with correlated corruption. Extensive experiments on synthetic and real datasets demonstrate that our algorithm is superior to existing methods in both effectiveness and efficiency. / Doctor of Philosophy / Social media has experienced rapid growth during the past decade. Millions of users of sites such as Twitter have been generating and sharing a wide variety of content including texts, images, and other metadata. In addition, social media can be treated as a social sensor that reflects different aspects of our society.
Event analytics in social media has enormous significance for applications like disease surveillance, business intelligence, and disaster management. Social media data possesses a number of important characteristics, including dynamics, heterogeneity, noisiness, timeliness, big volume, and network properties. These characteristics pose various new challenges and hence open many interesting research topics, which are addressed here. This dissertation focuses on the development of five novel methods for social media-based spatiotemporal event detection and forecasting. The first of these is a novel unsupervised approach for detecting the dynamic keywords of spatial events in targeted domains; this method has been deployed in a practical project for monitoring civil unrest events in several Latin American regions. The second builds on this by discovering the underlying development progress of events, jointly considering structural contexts and spatiotemporal burstiness. The third seeks to forecast future events using social media data. The basic idea here is to search for subtle patterns in specific cities as indicators of ongoing or future events, where each pattern is defined as a burst of context features (keywords) relevant to a specific event. For instance, an initial expression of discontent over gas price increases could be a precursor to a more general protest about government policies. Beyond social media data, the fourth method proposed here leverages multiple data sources that reflect different aspects of society for event forecasting. This addresses several important problems, including the common phenomenon that different sources may come from different geographical levels and have different available time periods. The fifth study is a novel flu forecasting method based on epidemic modeling and social media mining.
A new framework is proposed to integrate prior knowledge of disease propagation mechanisms and real-time information from social media.
36

Controlling Scalability in Distributed Virtual Environments

Singh, Hermanpreet 01 May 2013 (has links)
A Distributed Virtual Environment (DVE) system provides a shared virtual environment where physically separated users can interact and collaborate over a computer network. More simultaneous DVE users can result in intolerable degradation of system performance. We address the three major challenges to improving DVE scalability: effective DVE system performance measurement, understanding the factors controlling system performance and quality, and determining the consequences of DVE system changes. We propose a DVE Scalability Engineering (DSE) process that addresses these three major challenges for DVE design. DSE allows us to identify, evaluate, and leverage trade-offs among DVE resources, the DVE software, and the virtual environment. DSE has three stages. First, we show how to simulate different numbers and types of users on DVE resources; collected user-study data is used to identify representative user types. Second, we describe a modeling method to discover the major trade-offs between quality of service and DVE resource usage. The method makes use of a new instrumentation tool called ppt, which collects atomic blocks of developer-selected instrumentation at high rates and saves them for offline analysis. Finally, we integrate our load simulation and modeling method into a single process to explore the effects of changes in DVE resources. We use the simple Asteroids DVE as a minimal case study to describe the DSE process. The larger, commercial Torque and Quake III DVE systems provide realistic case studies and demonstrate DSE usage. The Torque case study shows the impact of many users on a DVE system; we apply the DSE process to significantly enhance the Quality of Experience given the available DVE resources. The Quake III case study shows how to identify DVE network needs and evaluate network characteristics when using a mobile phone platform; here we analyze the trade-offs between power consumption and quality of service.
The case studies demonstrate the applicability of DSE for discovering and leveraging tradeoffs between Quality of Experience and DVE resource usage. Each of the three stages can be used individually to improve DVE performance. The DSE process enables fast and effective DVE performance improvement. / Ph. D.
37

Towards a Scalable Docker Registry

Littley, Michael Brian 29 June 2018 (has links)
Containers are an alternative to virtual machines and are rapidly increasing in popularity due to their minimal overhead. To help facilitate their adoption, containers use management systems with central registries to store and distribute container images. However, these registries rely on other, preexisting services for load balancing and storage, which limits their scalability. This thesis introduces a new registry design for Docker, the most prevalent container management system. The new design coalesces all of these services into a single, highly scalable registry. By increasing the scalability of the registry, the new design greatly decreases the distribution time for container images. This work also describes a new Docker registry benchmarking tool, the trace player, which uses real Docker registry workload traces to test the performance of new registry designs and setups. / Master of Science
38

Scalability of Stepping Stones and Pathways

Venkatachalam, Logambigai 30 May 2008 (has links)
Information Retrieval (IR) plays a key role in serving large communities of users who need relevant answers to their search queries. IR encompasses various search models to address different requirements and has introduced a variety of supporting tools to improve effectiveness and efficiency. "Search" is the key focus of IR. The classic search methodology takes an input query, processes it, and returns the result as a ranked list of documents. However, this approach is not the most effective way to support the task of finding document associations (relationships between concepts or queries), for either direct or indirect relationships. The Stepping Stones and Pathways (SSP) retrieval methodology supports retrieval of ranked chains of documents that support valid relationships between any two given concepts. SSP has many potential practical and research applications that need a tool to find connections between two concepts. The early SSP "proof-of-concept" implementation could handle only 6000 documents, whereas commercial search applications have to deal with millions of documents. Hence, addressing this scalability limitation in the current SSP implementation is essential to overcome its limits on handling large datasets. Research on various commercial search applications and their scalability indicates that the Lucene search toolkit is widely used due to its scalability, performance, and extensibility. Many web-based and desktop applications have used this toolkit to great success, including Wikipedia search, job search sites, digital libraries, e-commerce sites, and the Eclipse Integrated Development Environment (IDE). The goal of this research is to re-implement SSP in a scalable way, so that it can work on larger datasets and can be deployed commercially.
This work explains the approach adopted for the re-implementation, focusing on scalable indexing and searching components, new ways to process citations (references), a new approach to query expansion, document clustering, and document similarity calculation. Experiments testing factors such as runtime and storage showed that the system can scale to handle millions of documents. / Master of Science
39

Multicore Scalability Through Asynchronous Work

Mathew, Ajit 13 January 2020 (has links)
With the end of Moore's Law, computer architects have turned to multicore architectures to provide high performance. Unfortunately, to achieve higher performance, multicores require programs to be parallelized, which is an untamed problem. Amdahl's law states that the maximum theoretical speedup of a program is dictated by the size of its non-parallelizable section. Hence, to achieve higher performance, programmers need to reduce the amount of sequential code in the program. This thesis explores asynchronous work as a means of reducing the sequential portions of a program. Using asynchronous work, a programmer can remove tasks that do not affect data consistency from the critical path and perform them using a background thread. Using this idea, the thesis introduces two systems. First, a synchronization mechanism, Multi-Version Read-Log-Update (MV-RLU), which extends Read-Log-Update (RLU) through multi-versioning. At the core of the MV-RLU design is a concurrent garbage collection algorithm that reclaims obsolete versions asynchronously, reducing the blocking of threads. Second, a concurrent and highly scalable index structure for multicores called Hydralist. The key idea behind the design of Hydralist is that an index structure can be divided into two components (a search layer and a data layer): updates to the data layer can be done synchronously, while updates to the search layer can be propagated asynchronously using background threads. / Master of Science / Until the mid-2000s, Moore's law predicted that CPU performance doubled every two years. Improvements in transistor technology allowed smaller transistors that could switch at higher frequencies, leading to faster CPU clocks. But faster clocks lead to higher heat dissipation, and as chips reached their thermal limits, computer architects could no longer increase clock speeds. Hence, they moved to multicore architectures, wherein a single die contains multiple CPUs, to allow higher performance.
Now programmers are required to parallelize their code to take advantage of all the CPUs in a chip, which is a non-trivial problem. The theoretical speedup achieved by a program on a multicore architecture is dictated by Amdahl's law, which identifies the non-parallelizable code in a program as the limiting factor for speedup. For example, a program with 99% parallelizable code can achieve a speedup of 20, whereas a program with 50% parallelizable code can only achieve a speedup of 2. Therefore, to achieve high speedup, programmers need to reduce the size of the serial section of their program. One way to do so is to remove non-critical tasks from the sequential section and perform them asynchronously using a background thread. This thesis explores this technique in two systems. The first is a synchronization mechanism used to coordinate access to shared resources, called Multi-Version Read-Log-Update (MV-RLU). MV-RLU achieves high performance by removing garbage collection from the critical path and performing it asynchronously using a background thread. The second is an index structure, Hydralist, based on the insight that an index structure can be decomposed into two components, a search layer and a data layer, and that decoupling updates to the two layers allows higher performance: updates to the data layer are done synchronously, while updates to the search layer are done asynchronously using background threads. Evaluation shows that both systems perform better than state-of-the-art competitors on a variety of workloads.
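The worked numbers in this lay abstract can be checked directly against Amdahl's law, speedup(p, n) = 1 / ((1 − p) + p/n): with 50% parallelizable code the speedup limit is exactly 2, while the quoted speedup of 20 for 99% parallelizable code corresponds to a finite core count of roughly 24 (the limit for unbounded cores is 100). A minimal sketch:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup on n cores when only `parallel_fraction`
    of the program's work can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

def max_speedup(parallel_fraction):
    """Limit of amdahl_speedup as the core count grows without bound."""
    return 1.0 / (1.0 - parallel_fraction)
```

For example, `amdahl_speedup(0.99, 24)` is about 19.5, matching the abstract's figure of 20, and `max_speedup(0.5)` is 2.0 regardless of core count.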
40

POSEIDON: The First Safe and Scalable Persistent Memory Allocator

Demeri, Anthony K. 20 May 2020 (has links)
With the advent of byte-addressable Non-Volatile Main Memory (NVMM), the need for a safe, scalable, and high-performing memory allocator is inevitable. A slow memory allocator can bottleneck the entire application stack, while an insecure memory allocator can render the underlying systems and applications inconsistent upon program bugs or system failure. Unlike DRAM-based memory allocators, an NVMM allocator must guarantee the safety of its heap metadata against both internal and external errors. An effective NVMM memory allocator should be (1) safe, (2) scalable, and (3) high performing. Unfortunately, none of the existing persistent memory allocators achieve all three requisites; critically, we also note that the de facto NVMM allocator, Intel's Persistent Memory Development Kit (PMDK), is vulnerable to silent data corruption and persistent memory leaks as a result of a simple heap overflow. We closely investigate the existing de facto NVMM memory allocators, especially PMDK, to study their vulnerability to metadata corruption and the reasons for their poor performance and scalability. We propose Poseidon, which is safe, fast, and scalable. The premise of Poseidon revolves around providing a user application with per-CPU sub-heaps for scalability, while managing the heap metadata in a segregated fashion and efficiently protecting it using a scalable hardware-based protection scheme, Intel's Memory Protection Keys (MPK). We evaluate Poseidon with a wide array of microbenchmarks and real-world benchmarks, noting that Poseidon outperforms state-of-the-art allocators by a significant margin, showing improved scalability and performance while also guaranteeing metadata safety. / Master of Science / Since the dawn of time, civilization has revolved around effective communication. From smoke signals to telegraphs and beyond, communication has continued to be a cornerstone of successful societies.
Today, communication and collaboration occur daily on a global scale, such that even sub-second units of time are critical to successful societal operation. Naturally, many forms of modern communication revolve around our digital systems, such as personal computers, email servers, and social networking database applications. There is thus a never-ending surge of digital system development, constantly striving toward increased performance. For some time, increasing a system's dynamic random-access memory, or DRAM, was able to provide performance gains; unfortunately, due to thermal and power constraints, such increases are no longer feasible. Additionally, loss of power on a DRAM system causes bothersome loss of data, since the memory storage is volatile. Now, we are at the advent of an entirely new physical memory technology, termed non-volatile main memory (NVMM), which has near-identical performance properties to DRAM but is available in much larger quantities, thus allowing increased overall system speed. Alas, such a system also imposes additional requirements upon software developers: since, for NVMM, all memory updates are permanent, a failed update can cause persistent memory corruption. Regrettably, the existing software standard, led by Intel's Persistent Memory Development Kit (PMDK), is insecure (allowing permanent memory corruption with ease), low performing, and a bottleneck for multicore systems. Here, we present a secure, high-performing solution, termed Poseidon, which harnesses the full potential of NVMM.
