Global ETD Search

1	Space-efficient data sketching algorithms for network applications. Hua, Nan. 06 July 2012. Sketching techniques are widely adopted in network applications. Sketching algorithms "encode" data into succinct data structures that can later be accessed and "decoded" for various purposes, such as network measurement, accounting, and anomaly detection. Bloom filters and counter braids are two well-known representatives of this category. These sketching algorithms usually need to strike a tradeoff between performance (how much information can be revealed, and how fast) and cost (storage, transmission, and computation). This dissertation is dedicated to the research and development of several sketching techniques, including improved forms of stateful Bloom filters, statistical counter arrays, and error estimating codes. A Bloom filter is a space-efficient randomized data structure that approximately represents a set in order to support membership queries. Bloom filters and their variants have found widespread use in many networking applications, where it is important to minimize the cost of storing and communicating network data. In this thesis, we propose a family of Bloom filter variants augmented by a rank-indexing method. We show that such augmentation significantly reduces both the space and the number of memory accesses, especially when deletions of set elements from the Bloom filter must be supported. An exact active counter array is another important building block in many sketching algorithms, where the storage cost of the array is of paramount concern. Previous approaches reduce storage costs but either lose accuracy or support only passive measurements. In this thesis, we propose an exact statistics counter array architecture that supports active measurements (real-time reads and writes). It also leverages the aforementioned rank-indexing method and exploits statistical multiplexing to minimize the storage cost of the counter array.
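The abstract stresses the cost of supporting deletions in a Bloom filter. As a point of reference only, the following minimal Python sketch shows the classic counting Bloom filter, the textbook way to allow deletions by replacing each bit with a small counter. It is not the dissertation's rank-indexing method; the class name, the SHA-256 double-hashing scheme, and the parameters `m` and `k` are illustrative assumptions.

```python
import hashlib


class CountingBloomFilter:
    """Classic counting Bloom filter: m counters, k hash functions.

    Illustrative baseline only; NOT the rank-indexed variant from the thesis.
    """

    def __init__(self, m: int, k: int) -> None:
        self.m = m                  # number of counters
        self.k = k                  # number of hash functions
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Derive k indices by double hashing (h1 + i*h2 mod m) from one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Only call for items that were added; removing a never-inserted
        # item could decrement shared counters and cause false negatives.
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        # False positives are possible; false negatives are not,
        # provided remove() is only applied to inserted items.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

For example, after `cbf = CountingBloomFilter(1024, 4)` and `cbf.add("10.0.0.1")`, the test `"10.0.0.1" in cbf` is true. Replacing bits with counters is what makes deletion possible, but it multiplies memory by the counter width, which is the kind of space overhead that motivates more compact deletable designs.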
Error estimating coding (EEC) has recently been established as an important tool for estimating bit error rates in the transmission of packets over wireless links. In essence, EEC is also a sketching problem, since EEC codes can be viewed as a sketch of the transmitted packet, which the receiver decodes to estimate the bit error rate. In this thesis, we first investigate the asymptotic bound of error estimating coding by viewing the problem from a two-party computation perspective, and then investigate its coding/decoding efficiency using Fisher information analysis. Further, we develop several sketching techniques, including the Enhanced Tug-of-War (EToW) sketch and the generalized EEC (gEEC) sketch family, which achieve roughly a 70% reduction in sketch size with similar estimation accuracy. For all the solutions proposed above, we use theoretical tools such as information theory and communication complexity to investigate how far our proposed solutions are from the theoretical optimum. We show that the proposed techniques are asymptotically or empirically very close to the theoretical bounds. Subjects: Sketching algorithms; Bloom filter; Statistics counter; Error estimating codes; Rank-indexing; Anomaly detection (Computer security); Hashing (Computer science); Algorithms
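To illustrate why an error estimating code can be viewed as a sketch, the following minimal Python sketch uses the classic tug-of-war (AMS) estimator, not the dissertation's EToW or gEEC constructions. The sender transmits t signed sums of the packet bits; the receiver recomputes them on the bits it actually received, and the mean squared difference estimates the number of flipped bits, because for 0/1 vectors the squared L2 distance equals the Hamming distance. The shared seed, the function names, and the assumption that the sketch itself arrives uncorrupted are hypothetical simplifications.

```python
import random


def tug_of_war_sketch(bits: list[int], t: int, seed: int) -> list[int]:
    """Return t counters s_j = sum_i sign_j(i) * bits[i], signs in {-1, +1}.

    Sender and receiver reproduce identical signs by seeding the same PRNG
    (the shared seed is an assumption of this sketch).
    """
    rng = random.Random(seed)
    sketch = []
    for _ in range(t):
        signs = [rng.choice((-1, 1)) for _ in range(len(bits))]
        sketch.append(sum(s * b for s, b in zip(signs, bits)))
    return sketch


def estimate_ber(sent_sketch: list[int], received_bits: list[int],
                 t: int, seed: int) -> float:
    """Estimate the bit error rate from the sender's sketch and received bits."""
    recv_sketch = tug_of_war_sketch(received_bits, t, seed)
    # E[(s_j - s'_j)^2] = ||x - y||_2^2, i.e. the number of flipped bits.
    squared = [(a - b) ** 2 for a, b in zip(sent_sketch, recv_sketch)]
    flipped = sum(squared) / t
    return flipped / len(received_bits)
```

As a sanity check, flipping exactly one bit makes every squared difference equal to 1, so `estimate_ber` returns exactly 1/n. With d flipped bits, each squared difference has mean d and variance 2d^2 - 2d, so averaging t of them brings the relative standard deviation down to roughly sqrt(2/t).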
2	Towards Ideal Network Traffic Measurement: A Statistical Algorithmic Approach. Zhao, Qi. 03 October 2007. With the emergence of computer networks as one of the primary platforms of communication, and with their adoption for an increasingly broad range of applications, there is a growing need for high-quality network traffic measurements to better understand, characterize, and engineer network behavior. Because the original design of the Internet inherently lacks fine-grained measurement capabilities, it does not provide enough data or information to compute, or even approximate, some traffic statistics such as traffic matrices and per-link delay. While it is possible to infer these statistics from indirect aggregate measurements that are widely supported by network measurement devices (e.g., routers), obtaining the best possible inferences is often a challenging research problem. We call this the "too little data" problem, after its root cause. Interestingly, while "too little data" is clearly a problem, "too much data" is not a blessing either. With the rapid increase of network link speeds, even keeping sampled, summarized network traffic (for inferring various network statistics) at low sampling rates produces too much data to be stored, processed, and transmitted by measurement devices. In summary, high-quality measurement in today's Internet is very challenging due to resource limitations and the lack of built-in support, manifested as either "too little data" or "too much data".
We present some new practices and proposals to alleviate these two problems. The contribution is fourfold: (i) designing universal methodologies towards ideal network traffic measurements; (ii) providing accurate estimates of several critical traffic statistics, guided by the proposed methodologies; (iii) offering multiple useful and extensible building blocks that can be used to construct a universal network measurement system in the future; and (iv) obtaining some notable mathematical results, such as a new large deviation theorem that finds applications in various areas. Subjects: Network measurements; Traffic analysis; Data streaming; Performance evaluation; Statistics counter; Computer networks; Routing (Computer network management); Inference; Network performance (Telecommunication); Telecommunication Traffic
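As a concrete picture of the sampling side of the "too much data" problem, here is a minimal Python sketch of uniform (Bernoulli) packet sampling with the standard inverse-probability estimate of per-flow packet counts. It is a generic textbook technique, not one of the estimators proposed in the dissertation; the function names and the representation of traffic as a list of flow IDs are assumptions.

```python
import random
from collections import Counter


def sample_packets(flow_ids: list[str], p: float, seed: int = 0) -> list[str]:
    """Keep each packet independently with probability p (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [f for f in flow_ids if rng.random() < p]


def estimate_flow_sizes(sampled: list[str], p: float) -> dict[str, float]:
    """Inverse-probability (Horvitz-Thompson) estimate: sampled count / p."""
    return {flow: count / p for flow, count in Counter(sampled).items()}
```

Dividing by p keeps each estimate unbiased, but at low sampling rates small flows are frequently missed altogether, which is one reason accurate statistics are hard to recover from sampled data.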
3	Design of an RDF vocabulary for representing COUNTER usage statistics: within the Electronic Resource Management System of Leipzig University Library (original German title: Konzeption eines RDF-Vokabulars für die Darstellung von COUNTER-Nutzungsstatistiken: innerhalb des Electronic Resource Management Systems der Universitätsbibliothek Leipzig). Domin, Annika. 04 July 2014. This master's thesis documents the creation of an RDF-based vocabulary for representing usage statistics of electronic resources that were produced according to the COUNTER standard. The concrete application of this vocabulary is the Electronic Resource Management System (ERMS) currently being developed by Leipzig University Library within the cooperative project AMSL. The ERMS is based on Linked Data; it is intended to reflect the changed administrative processes for electronic resources while remaining vendor-independent and flexible. The COUNTER vocabulary, however, is also meant to be usable beyond this application. The thesis is divided into two parts: foundations and modeling. The first part begins by establishing why libraries need ERM systems and then narrows the focus to usage statistics and COUNTER standardization. It then examines the technical foundations of the modeling, so that the thesis is also accessible to readers unfamiliar with Linked Data. This is followed by the modeling part, which begins with a requirements analysis and an analysis of the XML schema underlying the COUNTER files. The vocabulary is then modeled using RDFS and OWL. Building on considerations about transferring XML statistics to RDF and about assigning URIs, real sample files are then converted manually and successfully verified in a short test. The thesis closes with a conclusion and an outlook on further work with the results.
The resulting RDF vocabulary is available for reuse on GitHub at the following URL: https://github.com/a-nnika/counter.vocab
Table of contents: List of figures; List of tables; List of abbreviations; 1 Introduction; 1.1 Problem, objective and scope; 1.2 Target audience, methodology and structure; 1.3 State of research and sources; Part I: Foundations; 2 Library context; 2.1 Electronic resource management; 2.2 Usage data of electronic resources; 2.3 The AMSL project; 3 Technical background; 3.1 XML; 3.2 Linked Data and the Semantic Web; 3.3 Basic concepts of modeling; 3.4 RDF; 3.4.1 Data model; 3.4.2 Serializations; 3.5 RDFS; 3.6 OWL; Part II: Modeling; 4 Preliminary work; 4.1 Requirements analysis; 4.2 Analysis of the COUNTER XML schema; 4.2.1 Basic structure; 4.2.2 Details; 4.3 Basic design; 4.4 Software used; 4.4.1 Notepad++; 4.4.2 Raptor; 4.4.3 OntoWiki; 5 Implementation of the RDF vocabulary; 5.1 Basic modeling: RDFS; 5.2 Extension: OWL; 5.3 Transferring XML data to RDF; 5.4 URI assignment; 6 Testing the vocabulary; 6.1 Test planning; 6.2 Creating test datasets; 6.3 Test results; 7 Conclusion and outlook; Bibliography and sources; Declaration of independent authorship; Appendices. Classification: info:eu-repo/classification/ddc/020 ddc:020
