11

Entity resolution for large relational datasets

Guo, Zhaochen 06 1900
As the volume of data on the Web and in databases grows, data integration is becoming more expensive and challenging than ever before. One of the challenges is entity resolution when integrating data from different sources: references with different representations but referring to the same underlying entity need to be resolved, while references with similar descriptions but referring to different entities need to be distinguished from one another. Correctly de-duplicating and disambiguating these entities is an essential task in preparing high-quality data. Traditional approaches focus mainly on the attribute similarity of references, but they do not always work for datasets with insufficient attribute information. In relational datasets such as social networks, however, references are associated with one or more relationships, and these relationships can provide additional evidence for identifying duplicates. In this thesis, we solve the entity resolution problem by exploiting the relationships in relational datasets. We implement a relational entity resolution algorithm based on an existing algorithm, greatly improving its efficiency and performance. We also generalize the single-type entity resolution algorithm to a multi-type entity resolution algorithm for applications that need to resolve multiple types of references simultaneously, and demonstrate its advantage over the single-type algorithm. To improve the efficiency of the entity resolution process, we implement two blocking approaches that reduce the number of redundant comparisons. In addition, we implement a disk-based clustering algorithm that addresses the scalability problem, and apply it to a large academic social network dataset.
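To make the relational idea concrete, here is a minimal, hypothetical Python sketch of collective entity resolution, blending attribute similarity with the Jaccard similarity of the neighbors' cluster labels; the data layout, weights, and threshold are illustrative assumptions rather than the thesis's implementation:

```python
from difflib import SequenceMatcher

def attr_sim(a, b):
    # String similarity of reference names (any string metric would do).
    return SequenceMatcher(None, a["name"], b["name"]).ratio()

def rel_sim(a, b, cluster_of):
    # Jaccard similarity of the cluster labels of the references' neighbors:
    # references whose neighbors resolve to the same entities look alike.
    na = {cluster_of[r] for r in a["neighbors"]}
    nb = {cluster_of[r] for r in b["neighbors"]}
    if not na and not nb:
        return 0.0
    return len(na & nb) / len(na | nb)

def resolve(references, alpha=0.6, threshold=0.8):
    # Each reference starts in its own cluster; merge greedily while any
    # pair's combined attribute + relational similarity clears the threshold.
    # alpha and threshold are illustrative tuning knobs, not thesis values.
    cluster_of = {r["id"]: r["id"] for r in references}
    merged = True
    while merged:
        merged = False
        for i, a in enumerate(references):
            for b in references[i + 1:]:
                if cluster_of[a["id"]] == cluster_of[b["id"]]:
                    continue
                sim = alpha * attr_sim(a, b) + (1 - alpha) * rel_sim(a, b, cluster_of)
                if sim >= threshold:
                    old, new = cluster_of[b["id"]], cluster_of[a["id"]]
                    for rid, c in cluster_of.items():
                        if c == old:
                            cluster_of[rid] = new
                    merged = True
    return cluster_of

# Toy data: two name variants sharing a co-author end up co-clustered.
refs = [
    {"id": 1, "name": "J. Smith",   "neighbors": [3]},
    {"id": 2, "name": "John Smith", "neighbors": [3]},
    {"id": 3, "name": "A. Jones",   "neighbors": [1, 2]},
]
print(resolve(refs))
```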
12

Capacity and scale-free dynamics of evolving wireless networks

Iyer, Bharat Vishwanathan 17 February 2005
Many large-scale random graphs (e.g., the Internet) exhibit complex topology, non-homogeneous spatial node distribution, and preferential attachment of new nodes. Current topology models for ad-hoc networks mostly assume a uniform spatial distribution of nodes and do not capture the dynamics of evolving, real-world graphs, in which nodes "gravitate" toward popular locations and self-organize into non-uniform clusters. In this thesis, we first investigate two constraints on the scalability of ad-hoc networks: network reliability and node capacity. Unlike other studies, we analyze network resilience to node and link failure with an emphasis on the growth (i.e., evolution) dynamics of the entire system. Along the way, we also study important graph-theoretic properties of ad-hoc networks (including the clustering coefficient and the expected path length) and strengthen our general understanding of these systems. Finally, recognizing that under existing uniform models future ad-hoc networks cannot scale beyond trivial sizes, we argue that ad-hoc networks should be modeled from an evolution standpoint that takes into account the well-known clustering phenomena observed in real-world graphs. Such a model is likely to describe how future ad-hoc networks will self-organize, since it is well documented that information content distribution among end-users (as well as among spatial locations) is non-uniform and often heavy-tailed. Results show that node capacity in the proposed evolution model scales to larger network sizes than in traditional approaches, which suggests that non-uniformly clustered, self-organizing, very large-scale ad-hoc networks may become feasible in the future.
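The "gravitation" toward popular locations described above is the preferential-attachment mechanism; a minimal sketch of degree-proportional growth (all parameters illustrative) could look like this:

```python
import random
from collections import Counter

def evolve_network(n, m=2, seed=42):
    # Grow a graph by preferential attachment: each new node links to m
    # existing nodes chosen with probability proportional to their degree,
    # so popular nodes accumulate links and degrees become heavy-tailed.
    random.seed(seed)
    edges = [(0, 1)]
    targets = [0, 1]  # node ids repeated once per incident edge
    for new in range(2, n):
        chosen = set()
        while len(chosen) < min(m, new):
            chosen.add(random.choice(targets))  # degree-biased sampling
        for t in chosen:
            edges.append((new, t))
            targets += [new, t]
    return edges

edges = evolve_network(1000)
deg = Counter(u for e in edges for u in e)
print(max(deg.values()), min(deg.values()))  # hubs vs. leaves
```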
13

Placement By Marriage

Bian, Huimin 30 July 2008
As the field-programmable gate array (FPGA) industry grows device capacity in line with Moore's law and expands its market to high-performance computing, the scalability of its key CAD algorithms emerges as a new priority for delivering a user experience competitive with parallel processors. Among the many walls to overcome, placement stands out because of its critical impact on both frontend synthesis and backend routing. To construct a scalable placement flow, we present three innovations in detailed placement: a legalizer that works well under low whitespace, a wirelength optimizer based on bipartite matching, and a cache-aware annealer. When applied to the hundred-thousand-cell IBM benchmark suite, our detailed placer achieves 27% better wirelength and 8X faster runtime than FastDP, the fastest academic detailed placer reported, and our full placement flow achieves 101X faster runtime, with a 5% wirelength overhead, compared with VPR, the de facto standard in FPGA placement.
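The "marriage" in the title refers to bipartite matching. As a rough sketch, one wirelength-optimization pass can be cast as an assignment problem between cells and free slots; the cost model and data below are assumptions for illustration, not the placer's actual formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_cells_to_slots(cells, slots):
    # Model one detailed-placement pass as a bipartite matching problem:
    # cost[i][j] is a wirelength proxy for putting cell i at slot j (here,
    # Manhattan distance from the cell's ideal position to the slot).
    cost = np.array([[abs(cx - sx) + abs(cy - sy) for (sx, sy) in slots]
                     for (cx, cy) in cells])
    rows, cols = linear_sum_assignment(cost)  # optimal assignment
    return {i: slots[j] for i, j in zip(rows, cols)}

# Hypothetical example: three cells with ideal positions, three free slots.
cells = [(0.2, 0.8), (1.9, 1.1), (0.9, 0.1)]
slots = [(0, 1), (2, 1), (1, 0)]
print(match_cells_to_slots(cells, slots))
```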
15

Improving web server efficiency on commodity hardware

Beltrán Querol, Vicenç 03 October 2008
The unstoppable growth of the World Wide Web requires a huge amount of computational resources that must be used efficiently. Nowadays, commodity hardware is the preferred platform for running web server systems because it offers the best performance/cost ratio. The work presented in this thesis aims to improve the efficiency of current web server systems, allowing them to make the most of the available hardware resources. To this end, we first characterize current web server systems in a range of representative environments and identify the problems and bottlenecks that limit their performance. From this study we identified two main issues that prevent web server systems from efficiently using current hardware. The first is the evolution of the HTTP protocol to include connection persistence and security, which degrades performance and increases the configuration complexity of traditional multi-threaded web servers. The second is the memory-bound or disk-bound nature of some web workloads, which prevents full utilization of the abundant CPU resources available on current multiprocessor hardware. We propose two novel techniques to overcome these problems. First, we propose a hybrid web server architecture, an evolution of the multi-threaded architecture that can be easily implemented in any multi-threaded web server, which notably improves the management of client connections and reduces the configuration complexity of the whole system. Second, we describe a main-memory compression technique implemented in the Linux kernel that improves the performance of memory-bound web applications, making better use of the available resources. The thesis is supported by an exhaustive experimental evaluation that proves the effectiveness and feasibility of our proposals. It is worth noting that the hybrid architecture proposed in this thesis has recently been implemented by popular web servers such as Apache, Tomcat and Glassfish.
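As a hedged illustration of the hybrid idea (an event loop watches many possibly idle, persistent connections cheaply, while ready requests are handed to a bounded worker pool), a toy sketch might look like the following; protocol handling and sizes are simplified assumptions:

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor

sel = selectors.DefaultSelector()
pool = ThreadPoolExecutor(max_workers=8)  # bounded workers, as in a multi-threaded server

def handle_request(conn, data):
    # Blocking work (disk I/O, dynamic content) runs in the pool, so slow
    # requests never stall the event loop. `data` holds the raw request;
    # this toy response ignores it.
    conn.setblocking(True)  # the worker may block; the event loop never does
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    conn.close()

def on_readable(conn):
    data = conn.recv(4096)
    sel.unregister(conn)
    if data:
        pool.submit(handle_request, conn, data)  # hand off to a worker
    else:
        conn.close()  # client closed the connection

def serve(port=8080):
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen()
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ)
    while True:  # one event loop tracks all idle connections cheaply
        for key, _ in sel.select():
            if key.fileobj is srv:
                conn, _ = srv.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                on_readable(key.fileobj)

# serve()  # left commented: running it would block this sketch
```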
16

Dynamic Scale-out Mechanisms for Partitioned Shared-Nothing Databases

Karyakin, Alexey January 2011
For a database system used in pay-per-use cloud environments, elastic scaling becomes an essential feature, allowing costs to be minimized while accommodating fluctuations in load. One approach to scalability involves horizontal database partitioning and dynamic migration of partitions between servers. We define a scale-out operation as the combination of provisioning a new server followed by the migration of one or more partitions to the newly allocated server. In this thesis we study the efficiency of different implementations of the scale-out operation in the context of online transaction processing (OLTP) workloads. We designed and implemented three migration mechanisms featuring different data-transfer strategies. The first is based on SnowFlock, a modification of the Xen hypervisor, and uses on-demand block transfers for both server provisioning and partition migration. The second is implemented in a database management system (DBMS) and uses bulk transfers for partition migration, optimized for higher bandwidth utilization. The third is a conventional application that uses SQL commands to copy partitions between servers. We perform an experimental comparison of these scale-out mechanisms for disk-bound and CPU-bound configurations, analyzing their impact on whole-system performance and on the experience of individual clients.
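A scale-out operation as defined above can be sketched as a small orchestration step that times its two phases separately; the provisioning and migration callables below are hypothetical stubs standing in for the three mechanisms studied:

```python
import time

def scale_out(cluster, partitions, provision, migrate):
    # A scale-out = provision a new server, then migrate the chosen
    # partitions to it. Timing each phase separates provisioning cost
    # from data-transfer cost when comparing migration mechanisms.
    t0 = time.monotonic()
    new_server = provision()
    t1 = time.monotonic()
    for p in partitions:
        migrate(p, new_server)  # on-demand, bulk, or SQL-based transfer
    t2 = time.monotonic()
    cluster.append(new_server)
    return {"provision_s": t1 - t0, "migrate_s": t2 - t1}

# Hypothetical stubs; a real mechanism would move actual partition data.
cluster = ["server-0"]
stats = scale_out(cluster, ["part-3"],
                  provision=lambda: "server-1",
                  migrate=lambda p, s: time.sleep(0.01))
print(cluster, stats)
```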
17

Scalable application-aware router mechanisms

Awad, Ashraf A. 01 December 2003
No description available.
18

System Support for Scalable, Reliable and Highly Manageable Internet Services

Luo, Mon-Yen 13 September 2002
The Internet is increasingly being used as the basic infrastructure for a variety of services, and a high-performance server system is key to the success of these services. However, the explosive growth of the Internet has placed heavy demands on Internet servers and has raised great concerns about the performance, scalability and availability of the associated services. A monolithic server hosting a service is usually not sufficient to handle these challenges. A distributed server architecture, consisting of multiple heterogeneous computers that appear as a single high-performance system, has proven a successful and cost-effective alternative, and more and more Internet service providers run their services on clusters of servers, a trend that is accelerating. The distributed server architecture alone, however, is an insufficient answer to the challenges faced by Internet service providers today. This thesis presents an integrated system for supporting scalable and highly reliable Internet services on a distributed server architecture. The system consists of two major parts: a server load balancer and a distributed server management system. The server load balancer intelligently routes incoming requests to the appropriate server node, while the Java-based management system relieves the administrator's burden of managing such a distributed server system. With these mechanisms, we provide an integrated system that consolidates a group of heterogeneous computers into a powerful, adaptive and reliable Internet server system.
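One common policy such a load balancer could apply is weighted least-connections routing; the sketch below is an illustrative assumption, not the actual routing algorithm of the system described:

```python
class LeastConnectionsBalancer:
    # Route each request to the backend with the fewest active connections,
    # weighted by node capacity, so heterogeneous servers receive a
    # proportionate share of the load.
    def __init__(self, backends):
        # backends: {name: capacity_weight}
        self.backends = backends
        self.active = {name: 0 for name in backends}

    def pick(self):
        return min(self.backends,
                   key=lambda b: self.active[b] / self.backends[b])

    def start(self):
        b = self.pick()
        self.active[b] += 1
        return b

    def finish(self, b):
        self.active[b] -= 1

lb = LeastConnectionsBalancer({"fast-node": 2.0, "slow-node": 1.0})
print([lb.start() for _ in range(3)])  # fast-node receives roughly 2x the requests
```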
20

On the information flow required for the scalability of the stability of motion of approximately rigid formation

Yadlapalli, Sai Krishna 29 August 2005
It is known in the literature on Automated Highway Systems that information flow can significantly affect the propagation of spacing errors in a collection of vehicles. This thesis investigates this issue further for a homogeneous collection of vehicles. Specifically, we consider the effect of information flow on the propagation of errors in spacing and velocity in a collection of vehicles trying to maintain a rigid formation. The motion of each vehicle is modeled as a Linear Time Invariant (LTI) system. We consider undirected, connected information flow graphs and assume that each vehicle can communicate with at most q(n) other vehicles, where q(n) may vary with the size n of the collection. The feedback controller of each vehicle acts on the aggregate errors in position and velocity relative to the vehicles with which it is in direct communication. The controller is chosen so that the resulting closed-loop system is a Type-2 system, which implies that the loop transfer function has at least two poles at the origin. We then show that if the loop transfer function has three or more poles at the origin and the formation is sufficiently large, the motion of the collection is unstable. Let l be the number of poles at the origin of the complex plane of the transfer function relating a vehicle's position to its control input. If q(n)^(l+1)/n^l -> 0 as n -> infinity, we show that there is a low-frequency sinusoidal disturbance of unit maximum amplitude acting on each vehicle such that the maximum spacing errors grow at least as fast as sqrt(n^l / q(n)^(l+1)). A consequence of these results is that the maximum error in spacing and velocity of any vehicle can be made insensitive to the size of the collection only if at least one vehicle in the collection communicates with at least O(sqrt(n)) other vehicles.
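A plausible formalization of this setup, in assumed (not quoted) notation where P is the vehicle transfer function, C the controller, D_{ij} the desired spacing, e_i the spacing error, and q(n) the maximum degree:

```latex
% Assumed notation; a sketch of the model, not the thesis's exact statement.
\[
  X_i(s) = P(s)\,U_i(s), \qquad P(s) = \frac{1}{s^{\,l}},
\]
\[
  U_i(s) = C(s) \sum_{j \in N_i} \bigl[\, X_j(s) - X_i(s) + D_{ij}(s) \,\bigr].
\]
% Type-2 condition: the loop transfer function P(s)C(s) has at least two
% poles at s = 0. The scaling claim restated: if q(n)^{l+1}/n^l \to 0 as
% n \to \infty, then for some constant c > 0,
\[
  \max_i \, \lVert e_i \rVert_{\infty} \;\geq\; c \sqrt{\frac{n^{\,l}}{q(n)^{\,l+1}}}.
\]
```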
