Global ETD Search

491	Sliding Window Query Processing over Data Streams Golab, Lukasz January 2006 (has links) Database management systems (DBMSs) have been used successfully in traditional business applications that require persistent data storage and an efficient querying mechanism. Typically, it is assumed that the data are static, unless explicitly modified or deleted by a user or application. Database queries are executed when issued and their answers reflect the current state of the data. However, emerging applications, such as sensor networks, real-time Internet traffic analysis, and on-line financial trading, require support for processing of unbounded data streams. The fundamental assumption of a data stream management system (DSMS) is that new data are generated continually, making it infeasible to store a stream in its entirety. At best, a sliding window of recently arrived data may be maintained, meaning that old data must be removed as time goes on. Furthermore, as the contents of the sliding windows evolve over time, it makes sense for users to ask a query once and receive updated answers over time. <br /><br /> This dissertation begins with the observation that the two fundamental requirements of a DSMS are dealing with transient (time-evolving) rather than static data and answering persistent rather than transient queries. One implication of the first requirement is that data maintenance costs have a significant effect on the performance of a DSMS. Additionally, traditional query processing algorithms must be re-engineered for the sliding window model because queries may need to re-process expired data and "undo" previously generated results. The second requirement suggests that a DSMS may execute a large number of persistent queries at the same time, therefore there exist opportunities for resource sharing among similar queries. <br /><br /> The purpose of this dissertation is to develop solutions for efficient query processing over sliding windows by focusing on these two fundamental properties. In terms of the transient nature of streaming data, this dissertation is based upon the following insight. Although the data keep changing over time as the windows slide forward, the changes are not random; on the contrary, the inputs and outputs of a DSMS exhibit patterns in the way the data are inserted and deleted. It will be shown that the knowledge of these patterns leads to an understanding of the semantics of persistent queries, lower window maintenance costs, as well as novel query processing, query optimization, and concurrency control strategies. In the context of the persistent nature of DSMS queries, the insight behind the proposed solution is that various queries may need to be refreshed at different times, therefore synchronizing the refresh schedules of similar queries creates more opportunities for resource sharing. Computer Science
492	Reducing Interaction Cost: A Mechanism Deisgn Approach Yunqi, Zhang 27 August 2007 (has links) In this thesis we study the problem of requiring self-interested agents need to interact with some centralized mechanism where this interaction is costly. To improve their utility, agents may choose to interact with \emph{neighbours} in order to coordinate their actions, potentially resulting in savings with respect to total interaction costs for all involved. We highlight the issues that arise in such a setting for the mechanism as well as for the agents. We use a mechanism-design approach to study this problem and present a model for self-interested agents to form groups with neighbours in order to reduce the total interaction cost. Our model focuses on two aspects: reward-distribution and cost-sharing. We look at two scenarios for reward-distribution mechanisms and proposed a core-stable payoff as well as a fair payoff mechanism. We then propose a cost-sharing mechanism that agents can use to coordinate and reduce their interaction costs. We prove this mechanism to be incentive-compatible, cost-recovery and fair. We also discuss how agents might form groups in order to save on cost. We study how our final outcome (the total percentage of savings as a group) depends on the agents' interaction topology and analyze different topologies. In addition we carry out experiments which further validate our proposal. Computer Science
493	Algorithms and Design Principles for Rural Kiosk Networks Guo, Shimin January 2007 (has links) The KioskNet project aims to provide extremely low-cost Internet access to rural kiosks in developing countries, where conventional access technologies, \eg\, DSL, CDMA and dial-up, are currently economically infeasible. In the KioskNet architecture, an Internet-based proxy gathers data from the Internet and sends it to a set of edge nodes, called ``gateways'' from which ferries, such as buses and cars, opportunistically pick up the data using short-range WiFi as they drive past, and deliver it wirelessly to kiosks in remote villages. The first part of this thesis studies the downlink scheduling problem in the context of KioskNet. We pose the following question: assuming knowledge of the bus schedules, when and to which gateway should the proxy send each data bundle so that 1) the bandwidth is shared fairly and 2) given 1), the overall delay is minimized? We show that an existing schedule-aware scheme proposed in the literature, \ie\, EDLQ~\cite{JainFP04}, while superficially appearing to perform well, has some inherent limitations which could lead to poor performance in some situations. Moreover, EDLQ does not provide means to enforce desired bandwidth allocations. To remedy these problems, we employ a token-bucket mechanism to enforce fairness and decouple fairness and delay-minimization concerns. We then describe a utility-based scheduling algorithm which repeatedly computes an optimal schedule for all eligible bundles as they come in. We formulate this optimal scheduling problem as a minimum-cost network-flow problem, for which efficient algorithms exist. Through simulations, we show that the proposed scheme performs at least as well as EDLQ in scenarios that favour EDLQ and achieves up to 40\% reduction in delay in those that do not. Simulation results also indicate that our scheme is robust against the randomness in actual timing of buses. The second part of the thesis shares some of our experience with building and testing the software for KioskNet. We subjected a prototype of the KioskNet system, built on top of the DTN reference implementation, to stress tests and were able to identify and fix several software defects which severely limited the performance. From this experience, we abstract some general principles common to software that deals with opportunistic communication. Computer Science
494	Direct User Calls from the Kernel: Design and Implementation Wang, Weihan January 2007 (has links) Traditional, general-purpose operating systems strictly separate user processes from the kernel. Processes can only communicate with the kernel through system calls. As a means to ensure system security, system calls inevitably involve performance overhead. Direct User Callback from the Kernel, or DUCK, is a framework that improves the performance of network-centric applications by executing a part of application code directly from inside the kernel. Because the code runs in kernel mode and can access kernel memory directly, DUCK is able to eliminate two important sources of system call overhead, namely mode switches and data copying. One issue with DUCK is how to design an application programming interface (API) that is general, efficient, and easy to use. In this thesis, we present the design of the DUCK API, which includes functions for both direct user code execution and zero-copy buffer management. We have implemented DUCK prototypes on the Solaris/SPARC platform. An efficient way to implement direct user code invocation is through memory sharing between the kernel and user processes. However, because Solaris/SPARC separates the user and kernel address spaces, achieving memory sharing is difficult. In the thesis, we study the SPARC architecture and the Solaris virtual memory subsystem, and discuss three potential approaches to support memory sharing required by DUCK. We proceed to present micro-benchmark experiments demonstrating that our DUCK prototype implementation is capable of improving the peak throughput of a simple UDP forwarder by 28% to 44%. Computer Science
495	Computational Complexity of Bi-clustering. Hassanpour Ghady, Saeed 26 September 2007 (has links) Bi-clustering, i.e. simultaneously clustering the rows and columns of matrices based on their entries, covers a large variety of techniques in data mining. The goal of all bi-clustering techniques is finding the partitions of the rows and the columns in which sub-rows and sub-columns show a similar behavior. Currently existing algorithms for bi-clustering problems are either heuristic, or try to solve approximations of the original problems. There is no efficient algorithm for exact bi-clustering problems. The computational complexity of bi-clustering problems depends on the exact problem formulation, and particularly on the merit function used to evaluate the quality of a given bi-clustering partition. The computational complexity of most of the common bi-clustering problems is unknown. In this thesis, we present a formal definition for the homogeneous cover problem. This problem has many applications from bio-informatics to targeted marketing. We analyze its computational complexity and show that the problem is NP-hard. Computer Science
496	Process Models for Distributed Event-Based Systems Blanco, Rolando Maldonado January 2010 (has links) Distributed Event-Based Systems (DEBSs) are middleware supporting the interaction of publisher and subscriber components via events. In DEBSs, the subscribers to be notified when an event is announced are decided at run-time without requiring publisher components to know the name or locations of the subscribers, nor the subscribers to know the name or locations of the publishers. This low coupling between components makes DEBSs suitable for applications with a large or unpredictable number of autonomous components. The development of applications in DEBSs is an ad hoc process poorly supported by current software engineering methodologies. Moreover, the behaviours exhibited by these systems and their applications are not well understood, and no suitable models exist where these behaviours can be described and analyzed. The main concern of this thesis is the development of such models. Specifically, we develop formalisms and models supporting the specification, prediction, and validation of the behaviour exhibited by the middleware and the applications executing on it. Our main contributions to the area are: new formalisms for the representation of DEBSs and their applications, and for the specification of both, system and application properties; a categorization of the features related to the definition, announcement, and notification of events in DEBSs and, in general, event-based systems; models representing the categorized DEBS features; case studies detailing models and properties for specific systems; a prototype tool for the verification of DEBSs and applications. The formalisms developed expose the location of the actions in the modelled systems and support the specification of several forms of location-awareness and adaptive behaviour. Computer Science
497	Designing a Privacy-Aware Location Proof Architecture Luo, Wanying January 2010 (has links) Although location-based applications have existed for several years, verifying the correctness of a user's claimed location is a challenge that has only recently gained attention in the research community. Existing architectures for the generation and verification of such location proofs have limited flexibility. For example, they do not support the proactive gathering of location proofs, where, at the time of acquiring a location proof, a user does not yet know for which application or service she will use this proof. Supporting proactive location proofs is challenging because these proofs might enable proof issuers to track a user or they might violate a user's location privacy by revealing more information about a user's location than strictly necessary to an application. In addition, none of the existing architectures possesses an effective cheat detection mechanism to spot users who cheat about their location. We present seven essential design goals that a flexible location proof architecture should meet. Furthermore, we introduce a lightweight location proof architecture that realizes a subset of our design goals and that includes user anonymity and location privacy as key design components, as opposed to previous proposals. We then present a complete architecture that meets all of the design goals and demonstrate how some of the design goals can be achieved by adopting proper cryptographic techniques. Note that the reason of having a lightweight architecture that meets a subset of our design goals is explained in section 2.4.6. Finally, we provide an implementation, experimental results and a deployment strategy of our location proof architecture, and present three real-world location-proof-based applications to further demonstrate the practicality of our architecture. Computer Science
498	Exact solutions to combinatorial optimizations and the traveling baseball fan problem Terrell, Neal D. 10 January 2013 Exact solutions to combinatorial optimizations and the traveling baseball fan problem Computer Science
499	Optimizing Machine Translation by Learning to Search Galron, Daniel 12 January 2013 Optimizing Machine Translation by Learning to Search Computer Science
500	Linear vs. branching time: A semantical perspective January 2009 (has links) The discussion of the relative merits of linear versus branching-time goes back to early 1980s. The dominating belief has been that the linear-time framework is not expressive enough semantically, marking linear-time logics as weak. Here we examine this issue from the perspective of process equivalence, one of the most fundamental notions in concurrency theory. We postulate three principles that we view as fundamental to any discussion of process equivalence. First, we take contextual equivalence as the primary notion of equivalence. Second, we require the description of a process to fully specify all relevant behavioral aspects of the process. Finally, we require observable process behavior to be reflected in input/output behavior. Under these postulates the distinctions between the linear and branching semantics tend to evaporate. Applying them to the framework of transducers, we show that our postulates result in a unique notion of process equivalence, which is trace based, rather than tree based. Computer Science

Search results