In many network applications, such as the Internet and infrastructure networks, nodes fail or get congested dynamically, but tracking this information about all the nodes in a network where some dynamical processes are taking place is a fundamental problem. In this work, we study the problem of inferring the complete set of failed nodes, when only a sample of the node failures are known---we will be referring to this particular problem as prob{} . We consider the setting in which there exists correlations between node failures in networks, which has been studied in the case of many infrastructure networks. We formalize the prob{} problem using the Minimum Description Length (MDL) principle and we show that, in general, finding solutions that minimize the MDL cost is hard, and develop efficient algorithms with rigorous performance guarantees for finding near-optimal MDL cost solutions. We evaluate our methods on both synthetic and real world datasets, which includes the one from WAZE. WAZE is a crowd-sourced road navigation tool, that collects and presents the traffic incident reports. We found that the proposed greedy algorithm for this problem is able to recover $80%$, on average, of the failed nodes in a network for a given partial sample of input failures, which are sampled from the true set of failures at some predefined rate. Furthermore, we have also proved that this algorithm will find a solution that has MDL cost with an additive approximation guarantee of log(n) from the optimal. / Master of Science / In many real-world networks, such as Internet and Transportation networks, there will be some dynamical processes taking place. Due to the activity of these processes some of the elements in these networks may fail at random. For example service node failures in Internet, traffic congestion in road networks are some such scenarios. Identifying the complete state information of such networks is a fundamental problem. In this work, we study the problem of identifying unknown node failures in a network based on the partial observations – we referr to this problem as NetStateInf. Similar to some of the previous studies in this area we assume the settings where node failures in these networks are correlated. We approached this problem using Minimum Description Length (MDL) principle, which states that the information learned from a given data can be maximized by compressing it i.e., by identifying maximum number of patterns in the data. Using these concepts we have developed a mathematical representation of NetStateInf problem and proposed efficient algorithms with rigorous performance guarantees for finding the best set of failed nodes in the network that can best explain the observed faiures. We evaluated our algorithms against both synthetic – artificial network with failures generated based on a predefined mathematical model – and real-world data, for example traffic alerts data collected by WAZE, a crowdsourced navigation tool, for Boston road network. Using this approach we are able to recover around 80% of the failured nodes in the network from the given partial failure data. Furthermore, we have proved that our algorithm will find a solution that has a maximum cost difference of <i>log(n)</i> when compared with the optimal solution, where cost of a solution is the MDL way of representing its allignment with desired requirements.
Identifer | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/74983 |
Date | 09 February 2017 |
Creators | Rangudu, Venkata Pavan Kumar |
Contributors | Computer Science, Marathe, Madhav Vishnu, Vullikanti, Anil Kumar S., Bisset, Keith R. |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Page generated in 0.002 seconds