311 | Flexible Computing with Virtual Machines | Lagar Cavilla, Horacio Andres | 30 March 2011
This thesis is predicated upon a vision of the future of computing in which functionality is split between a core and edges, much as in the Internet itself. In this vision, the core of our computing infrastructure is made up of vast server farms with an abundance of storage and processing cycles. Centralizing
computation in these farms, coupled with high-speed wired or wireless connectivity, allows pervasive access to a highly available and well-maintained repository for data, configurations, and applications. Computation at the edges is concerned with provisioning application state and user data to rich clients, notably mobile devices equipped with powerful displays and graphics processors.
We define flexible computing as systems support for applications that dynamically leverage the resources available in the core
infrastructure, or cloud. The work in this thesis focuses on two instances of flexible computing that are crucial to the
realization of the aforementioned vision. Location flexibility aims to, transparently and seamlessly, migrate applications between
the edges and the core based on user demand. This enables performing the interactive tasks on rich edge clients and the computational tasks on powerful core servers. Scale flexibility is the ability of
applications executing in cloud environments, such as parallel jobs or
clustered servers, to swiftly grow and shrink their footprint according to execution demands.
This thesis shows how we can use system virtualization to implement systems that provide scale and location flexibility. To that effect we build and evaluate two system prototypes: Snowbird and SnowFlock. We present techniques for manipulating virtual machine state that turn running software into a malleable entity which is easily manageable, is decoupled from the underlying hardware, and is capable of dynamic relocation and scaling. This thesis demonstrates that virtualization technology is a powerful and suitable tool to
enable solutions for location and scale flexibility.
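To make scale flexibility concrete, the sketch below shows how an application might drive SnowFlock-style VM cloning from code; the `cloud` module and its `vm_fork`/`vm_join`/`vm_exit` calls are hypothetical names used only for illustration and are not the actual SnowFlock interface.

```python
# Hypothetical sketch of scale flexibility: a parallel job forks worker VM
# clones on demand and shrinks back when the burst of work is done.
# The `cloud` module and the vm_fork/vm_join/vm_exit calls are illustrative
# names only; they are not the actual SnowFlock interface.

import cloud  # hypothetical VM-cloning API

def process_batch(tasks, max_clones=32):
    """Fan a batch of tasks out to short-lived VM clones, then collect results."""
    n = min(len(tasks), max_clones)
    clone_id = cloud.vm_fork(n)             # clone the running VM n times
    if clone_id == 0:
        return cloud.vm_join()              # parent: wait for clones, gather results
    my_tasks = tasks[clone_id - 1::n]       # clone: take an interleaved slice of work
    cloud.vm_exit([t() for t in my_tasks])  # report results and terminate this clone
```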
312 | Data Quality By Design: A Goal-oriented Approach | Jiang, Lei | 13 August 2010
A successful information system is one that meets its design goals. Expressing these goals and subsequently translating them into a working solution is a major challenge for information systems engineering. This thesis adopts concepts and techniques from goal-oriented (software)
requirements engineering research for conceptual database design, with a focus on data quality issues. Based on a real-world case study, a goal-oriented process is proposed for database requirements analysis and modeling. It spans from the analysis of high-level stakeholder goals to the detailed design of a conceptual database schema. This process is then extended specifically to deal with data quality issues: data of low quality may be detected and corrected by performing various quality assurance activities; to support these activities, the schema needs to be revised to accommodate additional data requirements. The extended process therefore focuses on analyzing and modeling quality assurance data requirements.
A quality assurance activity supported by a revised schema may involve manual work,
and/or rely on automatic techniques, which often depend on the specification and enforcement of data quality (DQ) rules. To address the constraint aspect of conceptual database design, data quality rules are classified according to a number of domain- and application-independent properties. This classification can be used to guide rule designers and to facilitate the building of a
rule repository. A quantitative framework is then proposed for measuring and comparing DQ
rules according to one of these properties, effectiveness; this framework relies on the derivation of formulas that represent the effectiveness of DQ rules under different probabilistic assumptions.
A semi-automatic approach is also presented to derive these effectiveness formulas.
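As a rough illustration of what an effectiveness calculation for a data quality rule can look like under simple probabilistic assumptions (these are not the formulas derived in the thesis), consider the following sketch:

```python
# Illustrative sketch only: effectiveness of a data-quality rule under
# simple probabilistic assumptions (not the formulas derived in the thesis).

def rule_effectiveness(p_error, p_flag_given_error, p_flag_given_clean):
    """Return (precision, recall) of a DQ rule.

    p_error             -- assumed probability that a value is erroneous
    p_flag_given_error  -- probability the rule flags an erroneous value
    p_flag_given_clean  -- probability the rule flags a clean value
    """
    p_flag = (p_error * p_flag_given_error
              + (1 - p_error) * p_flag_given_clean)
    precision = (p_error * p_flag_given_error) / p_flag if p_flag else 0.0
    recall = p_flag_given_error
    return precision, recall

# Example: rare errors (2%), a sensitive rule (90%) with a 5% false-alarm rate.
print(rule_effectiveness(0.02, 0.90, 0.05))  # precision ~0.27, recall 0.90
```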
313 | Using System Structure and Semantics for Validating and Optimizing Performance of Multi-tier Storage Systems | Soundararajan, Gokul | 01 September 2010
Modern persistent storage systems must balance two competing imperatives: they must meet strict application-level performance goals, and they must reduce operating costs. The current approaches, manual tuning by administrators and over-provisioning of resources, are either time-consuming or expensive. Therefore, to reduce the costs of management, automated performance-tuning solutions are needed.
To address this need, we develop and evaluate algorithms centered around the key thesis that a holistic, semantic-aware view of the application and system is needed for automatically tuning and validating the performance of multi-tier storage systems. We obtain this global system view by leveraging structural and semantic information available at each tier and by making this information available to all tiers. Specifically, we develop two key building blocks: (i) context-awareness, where information about the application structure and semantics is exchanged between the tiers, and (ii) dynamic performance models that use the structure of the system to quickly build lightweight resource-to-performance mappings. We implement a prototype storage system, called Akash, based on commodity components. This prototype enables us to study all of the above scenarios in a realistic rendering of a modern multi-tier storage system. We also develop a runtime tool, Dena, to analyze the performance and behaviour of multi-tier server systems.
We apply these tools and techniques in three real-world scenarios. First, we leverage application context-awareness at the storage server in order to improve the performance of I/O prefetching. Tracking application access patterns per context enables us to improve
prediction accuracy for future accesses over existing algorithms, for which the heavy interleaving of I/O accesses from different contexts makes access patterns hard to recognize. Second, we build and leverage dynamic performance models for resource allocation, providing consistent and predictable performance corresponding to pre-determined application goals. We show that our dynamic resource allocation algorithms minimize the interference effects between e-commerce applications sharing a common infrastructure. Third, we introduce a high-level paradigm for the interactive validation of system performance by the system administrator. The administrator leverages existing performance models and other semantic knowledge about the system in order to discover bottlenecks and other opportunities for performance improvement. Our evaluation shows that our techniques enable significant improvements in performance over current approaches.
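A minimal sketch of the context-aware prefetching idea, not the Akash implementation: by keeping a separate sequential-stream detector per application context, interleaved I/O from different contexts no longer obscures each context's pattern.

```python
# Minimal sketch of context-aware sequential prefetching: accesses are
# de-interleaved by context id before stream detection. Illustrative only.

from collections import defaultdict

class ContextAwarePrefetcher:
    def __init__(self, depth=4):
        self.depth = depth                           # blocks to prefetch per hit
        self.last_block = defaultdict(lambda: None)  # per-context history

    def access(self, context_id, block):
        """Record an access and return the blocks to prefetch (possibly empty)."""
        prev = self.last_block[context_id]
        self.last_block[context_id] = block
        if prev is not None and block == prev + 1:
            # Sequential stream detected within this context: prefetch ahead.
            return list(range(block + 1, block + 1 + self.depth))
        return []

# Two contexts interleaved: each is still recognized as sequential.
p = ContextAwarePrefetcher()
for ctx, blk in [(1, 10), (2, 70), (1, 11), (2, 71), (1, 12)]:
    print(ctx, blk, p.access(ctx, blk))
```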
314 | Predicative Quantum Programming | Tafliovich, Anya | 01 September 2010
This work presents Quantum Predicative Programming --- a theory of quantum programming that encompasses many aspects of quantum computation and quantum communication. The theory provides a
methodology to specify, implement, and analyse quantum algorithms, the paradigm of quantum non-locality, quantum pseudotelepathy
games, computing with mixed states, and quantum communication protocols that use both quantum and classical communication channels.
315 | Constant-RMR Implementations of CAS and Other Synchronization Primitives Using Read and Write Operations | Golab, Wojciech | 15 February 2011
We consider asynchronous multiprocessors where processes communicate only by reading or writing shared memory. We show how to implement consensus, all comparison
primitives (such as CAS and TAS), and load-linked/store-conditional using only a constant number of remote memory references (RMRs), in both the cache-coherent and the
distributed-shared-memory models of such multiprocessors. Our implementations are
blocking, rather than wait-free: they ensure progress provided all processes that invoke
the implemented primitive are live.
Our results imply that any algorithm using read and write operations, comparison
primitives, and load-linked/store-conditional can be simulated by an algorithm that uses only read and write operations, with at most a constant-factor increase in RMR complexity.
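As a toy illustration of building a comparison primitive from reads and writes only (far simpler than the constant-RMR constructions in the thesis, but blocking in the same sense), the sketch below guards a shared register with a two-process Peterson lock, itself built from reads and writes, and performs compare-and-swap inside the critical section.

```python
# Toy sketch: a *blocking* compare-and-swap built from reads and writes only,
# for two processes, using Peterson's mutual-exclusion algorithm. This is a
# conceptual illustration, not the constant-RMR construction from the thesis.

import threading

class PetersonCAS:
    def __init__(self, initial=0):
        self.value = initial
        self.flag = [False, False]   # flag[i]: process i wants to enter
        self.turn = 0                # tie-breaker register

    def _lock(self, i):
        j = 1 - i
        self.flag[i] = True
        self.turn = j
        while self.flag[j] and self.turn == j:
            pass                     # busy-wait (blocking, not wait-free)

    def _unlock(self, i):
        self.flag[i] = False

    def cas(self, i, expected, new):
        """Process i (0 or 1) attempts CAS; returns the old value."""
        self._lock(i)
        old = self.value
        if old == expected:
            self.value = new
        self._unlock(i)
        return old

obj = PetersonCAS(0)
t = threading.Thread(target=lambda: obj.cas(1, 0, 42))
t.start(); t.join()
print(obj.cas(0, 42, 7), obj.value)   # old value 42, new value 7
```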
316 | Low and Mid-level Shape Priors for Image Segmentation | Levinshtein, Alex | 15 February 2011
Perceptual grouping is essential to manage the complexity of real world scenes. We explore bottom-up grouping at three different levels. Starting from low-level grouping, we propose a novel method for oversegmenting an image into compact superpixels, reducing the complexity of many high-level tasks. Unlike most low-level segmentation techniques, our geometric flow formulation enables us to impose additional compactness constraints, resulting in a fast method with minimal undersegmentation. Our subsequent work utilizes compact superpixels to detect two important mid-level shape regularities: closure and symmetry. Unlike the majority of closure detection approaches, we transform the closure detection problem into one of finding a subset of superpixels whose collective boundary has strong edge support in the image. Building on superpixels, we define a closure cost which is the ratio of a novel learned boundary gap measure to area, and show how it can be globally minimized to recover a small set of promising shape hypotheses. In our final contribution, motivated by the success of shape skeletons, we recover and group symmetric parts without assuming a prior figure-ground segmentation. Further exploiting superpixel compactness, we this time use superpixels as an approximation to the deformable maximal discs that comprise a medial axis. A learned measure of affinity between neighboring superpixels and between symmetric parts enables the purely bottom-up recovery of a skeleton-like structure, facilitating indexing and generic object recognition in complex real images.
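To make the closure cost concrete, here is a minimal sketch that replaces the learned boundary gap measure with a simple unsupported-boundary count; the cost is the ratio of that gap to the enclosed area.

```python
# Illustrative sketch of a superpixel-based closure cost: the ratio of
# unsupported boundary length to enclosed area. The thesis learns the gap
# measure; here it is simply the count of boundary pixels without an edge.

def closure_cost(region_boundary, region_area, edge_support):
    """Closure cost = unsupported boundary length / enclosed area.

    region_boundary -- pixels on the outer boundary of the candidate region
                       (the union of the selected superpixels)
    region_area     -- number of pixels enclosed by the region
    edge_support    -- maps a pixel to True if it lies on a strong image edge
    """
    gap = sum(1 for p in region_boundary if not edge_support.get(p, False))
    return gap / region_area if region_area else float("inf")

# Example: a 100-pixel region whose 40-pixel boundary has edge support on 36 pixels.
boundary = [(i, 0) for i in range(40)]
support = {p: (i < 36) for i, p in enumerate(boundary)}
print(closure_cost(boundary, 100, support))  # 4 / 100 = 0.04
```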
317 | Hierarchical Bayesian Models of Verb Learning in Children | Parisien, Christopher | 11 January 2012
The productivity of language lies in the ability to generalize linguistic knowledge to new situations. To understand how children can learn to use language in novel, productive ways, we must investigate how children can find the right abstractions over their input, and how these abstractions can actually guide generalization. In this thesis, I present a series of hierarchical Bayesian models that provide an explicit computational account of how children can acquire and generalize highly abstract knowledge of the verb lexicon from the language around them. By applying the models to large, naturalistic corpora of child-directed speech, I show that these models capture key behaviours in child language development. These models offer the power to investigate developmental phenomena with a degree of breadth and realism unavailable in existing computational accounts of verb learning.
By most accounts, children rely on strong regularities between form and meaning to help them acquire abstract verb knowledge. Using a token-level clustering model, I show that by attending to simple syntactic features of potential verb arguments in the input, children can acquire abstract representations of verb argument structure that can reasonably distinguish the senses of a highly polysemous verb.
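A minimal sketch of the kind of token-level clustering involved, in the spirit of a Chinese-restaurant-process mixture over simple syntactic features (not the thesis's full hierarchical model): each verb usage is assigned to an existing argument-structure cluster in proportion to its fit, or to a new cluster.

```python
# Minimal sketch of token-level clustering of verb usages by syntactic
# features, in the spirit of a CRP mixture (not the thesis's full
# hierarchical Bayesian model). Each usage is a set of argument-frame features.

import random

def crp_cluster(usages, alpha=1.0, noise=0.05):
    clusters = []                    # each cluster: list of usages (feature sets)
    assignments = []
    for u in usages:
        weights = []
        for c in clusters:
            # Likelihood of u under cluster c: per-feature agreement with
            # smoothed empirical frequencies inside the cluster.
            lik = 1.0
            feats = set().union(*c) | u
            for f in feats:
                p = (sum(f in x for x in c) + noise) / (len(c) + 2 * noise)
                lik *= p if f in u else (1 - p)
            weights.append(len(c) * lik)
        weights.append(alpha * 0.5 ** len(u))     # new-cluster weight (base measure)
        total = sum(weights)
        r, k = random.random() * total, 0
        while r > weights[k]:
            r -= weights[k]
            k += 1
        if k == len(clusters):
            clusters.append([])
        clusters[k].append(u)
        assignments.append(k)
    return assignments

usages = [{"SUBJ", "OBJ"}, {"SUBJ", "OBJ"}, {"SUBJ", "PP_to"},
          {"SUBJ", "OBJ", "OBJ2"}, {"SUBJ", "PP_to"}]
print(crp_cluster(usages))
```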
I develop a novel hierarchical model that acquires probabilistic representations of verb argument structure, while also acquiring classes of verbs with similar overall patterns of usage. In a simulation of verb learning within a broad, naturalistic context, I show how this abstract, probabilistic knowledge of alternations can be generalized to new verbs to support learning.
I augment this verb class model to acquire associations between form and meaning in verb argument structure, and to generalize this knowledge appropriately via the syntactic and semantic aspects of verb alternations. The model captures children's ability to use the alternation pattern of a novel verb to infer aspects of the verb's meaning, and to use the meaning of a novel verb to predict the range of syntactic forms in which the verb may participate. These simulations also provide new predictions of children's linguistic development, emphasizing the value of this model as a useful framework to investigate verb learning in a complex linguistic environment.
318 | Mining User-generated Content for Insights | Angel, Albert-David | 20 August 2012
The proliferation of social media, such as blogs, micro-blogs and social networks, has led to a plethora of readily available user-generated content. The latter offers a unique, uncensored window into emerging stories and events, ranging from politics and revolutions to product perception and the zeitgeist.
Importantly, structured information is available for user-generated content, by dint of its metadata, or can be surfaced via recently commoditized information extraction tools. This wealth of information, in the form of real-world entities and facts mentioned in a document, author demographics, and so on, provides exciting opportunities for mining insights from this content.
Capitalizing upon these, we develop Grapevine, an online system that distills information from the social media collective on a daily basis and facilitates its interactive exploration. To further this goal, we address important research problems that are also of independent interest. The sheer scale of the data being processed necessitates that our solutions be highly efficient.
We propose efficient techniques for mining important stories, on a per-user-demographic basis, based on named entity co-occurrences in user-generated content. Building upon these, we propose efficient techniques for identifying emerging stories as-they-happen, by identifying dense structures in an evolving entity graph.
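A small sketch of the underlying idea, much simplified relative to the incremental algorithms in the thesis: maintain an entity co-occurrence graph over a batch of posts and report tightly connected entity groups as candidate stories.

```python
# Simplified sketch: build an entity co-occurrence graph from posts and
# report heavily co-occurring entity pairs as candidate stories. The thesis's
# algorithms are incremental and far more efficient; this is illustrative.

from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(posts):
    """posts: iterable of sets of entity names mentioned in each post."""
    weight = defaultdict(int)
    for entities in posts:
        for a, b in combinations(sorted(entities), 2):
            weight[(a, b)] += 1
    return weight

def dense_pairs(weight, min_weight=2):
    """Return entity pairs whose co-occurrence weight suggests a story."""
    return sorted((w, pair) for pair, w in weight.items() if w >= min_weight)

posts = [{"election", "candidateA"}, {"election", "candidateA", "debate"},
         {"candidateA", "debate"}, {"storm", "coast"}]
print(dense_pairs(cooccurrence_graph(posts)))
```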
To facilitate the exploration of these stories, we propose efficient techniques for filtering them, based on users’ textual descriptions of the entities involved.
These gathered insights need to be presented to users in a useful manner, via a diverse set of representative documents; we thus propose efficient techniques for addressing this problem.
Recommending related stories to users is important for navigation purposes. As the way in which these are related to the story being explored is not always clear, we propose efficient techniques for generating recommendation explanations via entity relatedness queries.
319 | Predictor Virtualization: Teaching Old Caches New Tricks | Burcea, Ioana Monica | 20 August 2012
To improve application performance, current processors rely on prediction-based hardware optimizations, such as data prefetching and branch prediction. These hardware optimizations store application metadata in on-chip predictor tables and use the metadata to anticipate and optimize for future application behavior. As application footprints grow, the predictor tables need to scale for predictors to remain effective.
One important challenge in processor design is to decide which hardware optimizations to implement and how many resources to dedicate to a specific optimization. Traditionally, processor architects employ a one-size-fits-all approach when designing predictor-based hardware optimizations: for each optimization, a fixed portion of the on-chip resources is allocated to predictor storage. This approach often leads to sub-optimal designs where: 1) resources are wasted on applications that do not benefit from a particular predictor or require only small predictor tables, or 2) predictors under-perform for applications that need larger predictor tables that cannot be built due to area, latency, and power constraints.
This thesis introduces Predictor Virtualization (PV), a framework that uses the traditional processor memory hierarchy to store application metadata used in speculative hardware optimizations. This makes it possible to emulate large, more accurate predictor tables, which, in turn, leads to higher application performance. PV exploits the current trend of unprecedentedly large on-chip secondary caches and allocates on demand a small portion of the cache capacity to store application metadata used in hardware optimizations, adjusting to the application's need for predictor resources. As a consequence, PV is a pay-as-you-go technique that emulates large predictor tables without increasing the dedicated storage overhead.
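A highly simplified sketch of the virtualization idea (not the actual PV microarchitecture): a small dedicated predictor table acts as a cache of metadata entries whose backing store is modeled as a region of the secondary cache, so entries are fetched on demand rather than held in dedicated storage.

```python
# Toy model of Predictor Virtualization: a tiny dedicated table fronts a
# larger metadata store modeled as living in the secondary cache.
# Illustrative only; real PV operates at the microarchitecture level.

class VirtualizedPredictor:
    def __init__(self, dedicated_entries=8):
        self.dedicated = {}                  # small on-predictor table
        self.capacity = dedicated_entries
        self.backing = {}                    # metadata spilled to the L2 cache

    def lookup(self, pc):
        if pc in self.dedicated:
            return self.dedicated[pc]        # fast path: dedicated table hit
        if pc in self.backing:
            self._install(pc, self.backing[pc])   # fetch metadata on demand
            return self.dedicated[pc]
        return None                          # no prediction available

    def update(self, pc, target):
        self._install(pc, target)

    def _install(self, pc, target):
        if len(self.dedicated) >= self.capacity and pc not in self.dedicated:
            victim, meta = self.dedicated.popitem()   # evict: spill to cache
            self.backing[victim] = meta
        self.dedicated[pc] = target

p = VirtualizedPredictor(dedicated_entries=2)
for pc in range(4):
    p.update(pc, pc + 100)       # fill beyond the dedicated capacity
print(p.lookup(0), len(p.dedicated), len(p.backing))
```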
To demonstrate the benefits of virtualizing hardware predictors, we present virtualized designs for three different hardware optimizations: a state-of-the-art data prefetcher, conventional branch target buffers, and an object-pointer prefetcher. While each of these hardware predictors exhibits different characteristics that lead to different virtualized designs, virtualization improves the cost-performance trade-off for all of these optimizations.
PV increases the utility of traditional processor caches: in addition to being accelerators for slow off-chip memories, on-chip caches are leveraged for increasing the effectiveness of predictor-based hardware optimizations.
320 | Inferring the Binding Preferences of RNA-binding Proteins | Kazan, Hilal | 17 December 2012
Post-transcriptional regulation is carried out by RNA-binding proteins (RBPs) that bind to specific RNA molecules and control their processing, localization, stability and degradation. Experimental studies have successfully identified RNA targets associated with specific RBPs. However, because the locations of the binding sites within the targets are unknown and because RBPs recognize both sequence and structure elements in their binding sites, identification of RBP binding preferences from these data remains challenging.
The unifying theme of this thesis is to identify RBP binding preferences from experimental data. First, we propose a protocol to design a complex RNA pool that represents diverse sets of sequence and structure elements to be used in an in vitro assay to efficiently measure RBP binding preferences. This design has been implemented in the RNAcompete method, and applied genome-wide to human and Drosophila RBPs. We show that RNAcompete-derived motifs are consistent with established binding preferences.
We developed two computational models to learn the binding preferences of RBPs from large-scale data. Our first model, RNAcontext, uses a novel representation of secondary structure to infer both the sequence and structure preferences of RBPs, and is optimized for use with in vitro binding data on short RNA sequences. We show that including structure information significantly improves prediction accuracy. Our second model, MaLaRKey, extends RNAcontext to fit motif models to sequences of arbitrary length and to incorporate a richer set of structure features to better model in vivo RNA secondary structure. We demonstrate that MaLaRKey infers detailed binding models that accurately predict the binding of full-length transcripts.
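As a rough illustration of combining sequence and structure preferences (a toy scoring function, not the RNAcontext or MaLaRKey models), the sketch below scores candidate binding sites with a position weight matrix plus a weight for the structural context of each position; all numbers are assumed for illustration.

```python
# Toy sketch of sequence-plus-structure scoring of candidate RBP binding
# sites: PWM score weighted by the structural context (paired/unpaired) of
# each position. Not the RNAcontext/MaLaRKey models from the thesis.

import math

PWM = [  # columns: probability of A, C, G, U at each of 4 motif positions
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
]
STRUCT_PREF = {"unpaired": 0.8, "paired": 0.2}   # assumed preference for loops

def site_score(seq, struct):
    """Log-odds-style score for a site; struct marks each base paired/unpaired."""
    score = 0.0
    for base, ctx, col in zip(seq, struct, PWM):
        score += math.log(col[base] / 0.25)           # sequence contribution
        score += math.log(STRUCT_PREF[ctx] / 0.5)     # structure contribution
    return score

print(site_score("AUCU", ["unpaired"] * 4))   # strong site in a loop
print(site_score("AUCU", ["paired"] * 4))     # same sequence, paired context
```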