Global ETD Search

11	Towards Malleable Distributed Storage Systems: From Models to Practice / Malléabilité des Systèmes de Stockage Distribués : Des Modèles à la Pratique Cheriere, Nathanaël 05 November 2019 The Cloud, with its business model, makes elastic resource management possible: users can rent resources according to their needs. This elasticity reduces energy and financial costs and helps applications adapt to variable workloads. Data-intensive applications running in the Cloud or on supercomputers are often co-located with a distributed storage system to guarantee fast access to data. Although many works have proposed dynamically resizing compute capacity to match the workload, storage is not considered malleable (able to be resized dynamically), because the required transfers of large amounts of data are considered too slow. However, storage hardware and techniques have evolved, and this assumption must be re-evaluated. In this thesis, we study the resizing operations of distributed storage systems from several angles. We begin by modeling the minimal duration of these operations to assess their potential speed. Then, we develop a benchmark designed to measure the viability of storage system malleability on a given platform. Finally, we implement a resizing operations manager for distributed storage systems that decides and organizes the data transfers these operations require. / The Cloud, with its pay-as-you-go model, makes elastic resource management possible; users can claim and release resources as needed. 
This elasticity leads to financial and energy cost reductions, and helps applications cope with varying workloads. Distributed cloud and HPC applications processing large amounts of data are often co-located with a distributed storage system in order to ensure fast data access. Although many works have proposed dynamically rescaling the processing part of such systems to match their workload, storage is never considered malleable (able to be dynamically rescaled), since moving massive amounts of data around is assumed to be too slow in practice. However, in recent years hardware and storage techniques have evolved, and this assumption needs to be revisited. In this thesis, we present a study of the rescaling operations in distributed storage systems approached from different angles. We start by modeling the minimal duration of rescaling operations to estimate their potential speed. Then, we develop a benchmark to measure the viability of distributed storage system malleability on a given platform. Last, we implement a rescaling manager for distributed storage systems that decides and organizes the data transfers required during a rescaling operation. Systèmes de Stockage Distribués Malléabilité Elasticité Modélisation Benchmark Distributed Storage Systems Malleability Elasticity Modeling Benchmarking
12	Towards a Flexible High-efficiency Storage System for Containerized Applications Zhao, Nannan 08 October 2020 Due to their tight isolation, low overhead, and efficient packaging of the execution environment, Docker containers have become a prominent solution for deploying modern applications. Consequently, a large number of Docker images are created, and this massive image dataset presents challenges to the registry and container storage infrastructure that have so far remained largely unexplored. Hence, there is a need for Docker image characterization that can help optimize and improve the storage systems for containerized applications. Moreover, existing deduplication techniques significantly degrade the performance of registries, which slows down container startup. Therefore, there is growing demand for registry storage systems that combine high storage efficiency with high performance. Last but not least, different storage systems can be integrated with containers as backend storage systems and provide persistent storage for containerized applications. It is therefore important to analyze the performance of different backend storage systems and storage drivers and to draw out the implications for container storage system design. These observations and challenges motivate my dissertation. In this dissertation, we aim to improve the flexibility, performance, and efficiency of the storage systems for containerized applications. To this end, we focus on three important aspects: Docker images, the Docker registry storage system, and Docker container storage drivers with their backend storage systems. Specifically, this dissertation adopts three steps: (1) analyzing the Docker image dataset; (2) deriving the design implications; (3) designing a new storage framework for Docker registries and proposing different optimizations for container storage systems. 
In the first part of this dissertation (Chapter 3), we analyze over 167TB of uncompressed Docker Hub images, characterize them using multiple metrics, and evaluate the potential of file-level deduplication in Docker Hub. In the second part of this dissertation (Chapter 4), we conduct a comprehensive performance analysis of container storage systems based on the key insights from our image characterizations, and derive several design implications. In the third part of this dissertation (Chapter 5), we propose DupHunter, a new Docker registry architecture, which not only natively deduplicates layers for space savings but also reduces layer restore overhead. DupHunter supports several configurable deduplication modes, which provide different levels of storage efficiency, durability, and performance, to support a range of uses. In the fourth part of this dissertation (Chapter 6), we explore an innovative holistic approach, Chameleon, that employs data redundancy techniques such as replication and erasure coding, coupled with endurance-aware write offloading, to mitigate wear-level imbalance in distributed SSD-based storage systems. This high-performance flash cluster can be used for registries to speed up performance. / Doctor of Philosophy / The number of Docker images stored in Docker registries is increasing rapidly and presents challenges for the underlying storage infrastructures. Before optimizing the storage system, we should first analyze this large Docker image dataset. To this end, in this dissertation we perform the first large-scale characterization and redundancy analysis of the images and layers stored in the Docker Hub registry. Based on the findings, this dissertation presents a series of practical and efficient techniques, algorithms, and optimizations to achieve a high-performance, flexible, and space-efficient storage system for containerized applications. 
The experimental evaluation demonstrates the effectiveness of our optimizations and techniques in making storage systems flexible and space-efficient. Containers Distributed storage systems Deduplication Wear Balancing Flash memory Docker registry Docker images
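To illustrate what file-level deduplication across image layers means in entry 12, here is a minimal sketch. It is not DupHunter and not the Docker registry API: the function names (`dedup_layers`, `restore_layer`, `dedup_ratio`) and the in-memory layer representation (a dict of path to bytes) are assumptions made purely for illustration.

```python
import hashlib


def dedup_layers(layers):
    """Store each unique file once, keyed by its SHA-256 digest.

    `layers` maps a (hypothetical) layer name to a dict of path -> bytes.
    Returns (store, manifests): the unique-content store and, per layer,
    a manifest mapping each path to the digest of its content.
    """
    store = {}
    manifests = {}
    for name, files in layers.items():
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            store.setdefault(digest, data)  # keep only the first copy
            manifest[path] = digest
        manifests[name] = manifest
    return store, manifests


def restore_layer(store, manifest):
    """Rebuild a layer's files from the store; this is the restore overhead."""
    return {path: store[digest] for path, digest in manifest.items()}


def dedup_ratio(layers, store):
    """Raw bytes across all layers divided by bytes actually stored."""
    raw = sum(len(d) for files in layers.values() for d in files.values())
    unique = sum(len(d) for d in store.values())
    return raw / unique if unique else 1.0
```

The sketch also shows the trade-off the abstract mentions: space is saved because shared files are stored once, but serving a layer now requires a restore step that reassembles it from the store.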
13	Efficient Resource Allocation Schemes for Wireless Networks with Diverse Quality-of-Service Requirements Kumar, Akshay 16 August 2016 Quality-of-Service (QoS) to users is a critical requirement of resource allocation in wireless networks and has drawn significant research attention over a long time. However, the QoS requirements differ vastly based on the wireless network paradigm. At one extreme, we have a millimeter wave small-cell network for streaming data that requires very high throughput and low latency. At the other end, we have Machine-to-Machine (M2M) uplink traffic with low throughput and low latency. In this dissertation, we investigate and solve QoS-aware resource allocation problems for diverse wireless paradigms. We first study cross-layer dynamic spectrum allocation in an LTE macro-cellular network with fractional frequency reuse to improve the spectral efficiency for cell-edge users. We show that the resultant optimization problem is NP-hard and propose a low-complexity layered spectrum allocation heuristic that strikes a balance between rate maximization and fairness of allocation. Next, we develop an energy-efficient downlink power control scheme in an energy-harvesting small-cell base station equipped with local cache and wireless backhaul. We also study the tradeoff between the cache size and the energy harvesting capabilities. We next analyze the file read latency in Distributed Storage Systems (DSS). We propose a heterogeneous DSS model wherein the stored data is categorized into multiple classes based on the arrival rate of read requests, fault tolerance for storage, etc. Using a queuing theoretic approach, we establish bounds on the average read latency for different scheduling policies. We also show that erasure coding in DSS serves the dual purpose of reducing read latency and increasing the energy efficiency. 
Lastly, we investigate the problem of delay-efficient packet scheduling in M2M uplink with heterogeneous traffic characteristics. We classify the uplink traffic into multiple classes and propose a proportionally-fair delay-efficient heuristic packet scheduler. Using a queuing theoretic approach, we next develop a delay optimal multiclass packet scheduler and later extend it to joint medium access control and packet scheduling for M2M uplink. Using extensive simulations, we show that the proposed schedulers perform better than state-of-the-art schedulers in terms of average delay and packet delay jitter. / PHD Quality-of-Service Dynamic Resource Allocation Cross-Layer Optimization Distributed Storage M2M Communication Delay-Optimal Scheduler
14	Repairing Cartesian Codes with Linear Exact Repair Schemes Valvo, Daniel William 10 June 2020 In this paper, we develop a scheme to recover a single erasure when using a Cartesian code, in the context of a distributed storage system. In particular, we develop a scheme designed to minimize the associated bandwidth and maximize the associated dimension. The problem of recovering a missing node's data exactly in a distributed storage system is known as the exact repair problem. Previous research has studied the exact repair problem for Reed-Solomon codes. We focus on Cartesian codes, and show we can enact the recovery using a linear exact repair scheme framework, similar to the one outlined by Guruswami and Wootters in 2017. / Master of Science / Distributed storage systems are systems which store a single data file over multiple storage nodes. Each storage node has a certain storage efficiency, the "space" required to store the information on that node. The value of these systems is their ability to safely store data for extended periods of time. We want to design distributed storage systems such that if one storage node fails, we can recover it from the data in the remaining nodes. Recovering a node from the data stored in the other nodes requires the nodes to communicate data with each other. Ideally, these systems are designed to minimize the bandwidth, the inter-nodal communication required to recover a lost node, as well as maximize the storage efficiency of each node. A great mathematical framework to build these distributed storage systems on is erasure codes. In this paper, we will specifically develop distributed storage systems that use Cartesian codes. We will show that in the right setting, these systems can have a very similar bandwidth to systems built from Reed-Solomon codes, without much loss in storage efficiency. Cartesian code Reed-Solomon code exact repair schemes finite fields distributed storage networks multivariate polynomials
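The idea of recovering one failed node from the data held by the remaining nodes (entry 14) can be illustrated with the simplest possible erasure code, a single XOR parity block. This is a deliberately minimal sketch and not the Cartesian-code or Reed-Solomon schemes of the thesis; the function names are hypothetical, and this naive repair downloads every surviving block in full, which is exactly the bandwidth cost that exact repair schemes aim to reduce.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_blocks):
    """Spread k data blocks over k + 1 nodes by appending one XOR parity block."""
    parity = reduce(xor_bytes, data_blocks)
    return list(data_blocks) + [parity]


def repair(nodes, lost):
    """Rebuild the block of node `lost` exactly from all surviving nodes.

    Because the XOR of all k + 1 blocks is zero, the missing block equals
    the XOR of the survivors. Repair bandwidth here is k full blocks.
    """
    survivors = [blk for i, blk in enumerate(nodes) if i != lost]
    return reduce(xor_bytes, survivors)
```

For example, with `nodes = encode([b"abcd", b"efgh", b"ijkl"])`, calling `repair(nodes, 1)` reads the three other blocks and returns the lost block of node 1.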
15	On Codes for Private Information Retrieval and Ceph Implementation of a High-Rate Regenerating Code Vinayak, R January 2017 Error-control codes, which are used extensively in communication systems, have also proved very useful in data storage during the past decade. This thesis deals with two types of codes for data storage, one pertaining to the issue of privacy and the other to reliability. In many scenarios, a user accessing some critical data from a server would not want the server to learn the identity of the data retrieved. This problem, called Private Information Retrieval (PIR), was first formally introduced by Chor et al., who gave protocols for PIR in the case where multiple copies of the same data are stored in non-communicating servers. The PIR protocols that came up later also followed this replication model. The problem with data replication is the high storage overhead involved, which leads to large storage costs. Later, Fazeli, Vardy and Yaakobi came up with the notion of a PIR code that enables information-theoretic PIR with low storage overhead. In the first part of this thesis, a construction of PIR codes for certain parameter values is presented. These constructions are based on a variant of conventional Reed-Muller (RM) codes called binary Projective Reed-Muller (PRM) codes. A lower bound on the block length of systematic PIR codes is derived, and the PRM-based PIR codes are shown to be optimal with respect to this bound in some special cases. The codes constructed here have smaller block lengths than the short block length PIR codes known in the literature. The generalized Hamming weights of binary PRM codes are also studied. Another work described here is the implementation and evaluation of an erasure code called Coupled Layer (CL) code in the Ceph distributed storage system. Erasure codes are used in distributed storage to ensure reliability. 
An additional desirable feature required for codes used in this setting is the ability to handle node repair efficiently. The Minimum Storage Regenerating (MSR) version of the CL code downloads the optimal amount of data from other nodes during repair of a failed node, and even the disk reads during this process are optimal for that storage overhead. The CL-Near-MSR code, which is a variant of CL-MSR, can also efficiently handle a restricted set of multiple node failures. Four example CL codes were evaluated using a 26-node Amazon cluster, and performance metrics such as network bandwidth, disk read, and repair time were measured. A repair-time reduction by a factor on the order of 3 was observed for one of those codes, in comparison with a Reed-Solomon code having the same parameters. To the best of our knowledge, such large gains in repair performance have never been demonstrated before. Error Control Codes Regenerating Code Ceph Implementation Private Information Retrieval Distributed Storage Reliability PIR Codes PRM Codes Coupled Layer Code Ceph Distributed Storage System Minimum Storage Regenerating (MSR) Codes CL-MSR Code CL-NMSR Code Electrical Communication Engineering
16	Distributed Data Storage System for Data Survivability in Wireless Sensor Networks Al-Awami, Louai 03 October 2013 Wireless Sensor Networks (WSNs) that use tiny wireless devices capable of communicating, processing, and sensing promise to have applications in virtually all fields. Smart homes and smart cities are just a few of the examples that WSNs can enable. Despite their potential, WSNs suffer from reliability and energy limitations. In this study, we address the problem of designing Distributed Data Storage Systems (DDSSs) for WSNs using decentralized erasure codes. A unique aspect of WSNs is that their data is inherently decentralized. This calls for a decentralized mechanism for encoding and decoding. We propose a distributed data storage framework to increase data survivability in WSNs. The framework utilizes Decentralized Erasure Codes for Data Survivability (DEC-DS), which allow for determining the amount of redundancy required in both hardware and data to allow sensed data to survive failures in the network. To address the energy limitations, we show two approaches to implement the proposed solution in an energy-efficient manner. The two approaches employ Random Linear Network Coding (RLNC) to exploit coding opportunities in order to save energy and in turn prolong network life. A routing-based scheme, called DEC Encode-and-Forward (DEC-EaF), applies to networks with routing capability, while the second, DEC Encode-and-Disseminate (DEC-EaD), uses a variation of random walk to build the target code in a decentralized fashion. We also introduce a new decentralized approach to implement Luby Transform (LT)-Codes based DDSSs. The scheme is called Decentralized Robust Soliton Storage (DRSS); it operates in a decentralized fashion and requires no coordination between sensor nodes. The schemes are tested through extensive simulations to evaluate their performance. We also compare the proposed schemes to similar schemes in the literature. 
The comparison considers energy efficiency as well as coding related aspects. Using the proposed schemes can greatly improve the reliability of WSNs especially under harsh working conditions. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2013-09-30 22:43:04.509 wireless sensor networks data survivability fountain codes distributed storage rateless codes decentralized erasure codes network coding erasure codes
17	Linear Exact Repair Schemes for Distributed Storage and Secure Distributed Matrix Multiplication Valvo, Daniel William 08 May 2023 In this thesis we develop exact repair schemes capable of repairing or circumventing unavailable servers of a distributed network in the context of distributed storage and secure distributed matrix multiplication. We develop the (Λ, Γ, W, ⊙)-exact repair scheme framework for discussing both of these contexts and develop a multitude of explicit exact repair schemes utilizing decreasing monomial-Cartesian codes (DMC codes). Specifically, we construct novel DMC codes in the form of augmented Cartesian codes and rectangular monomial-Cartesian codes, as well as design exact repair schemes utilizing these constructions inspired by the schemes from Guruswami and Wootters [16] and Chen and Zhang [6]. In the context of distributed storage we demonstrate the existence of both high rate and low bandwidth systems based on these schemes, and we develop two methods to extend them to the l-erasure case. Additionally, we develop a family of hybrid schemes capable of attaining high rates, low bandwidths, and a balance in between which proves to be competitive compared to existing schemes. In the context of secure distributed matrix multiplication we develop similarly impactful schemes which have very competitive communication costs. We also construct an encoding algorithm based on multivariate interpolation and prove it is T-secure. / Doctor of Philosophy / Distributed networks may be thought of as networks of computers and/or servers which are capable of transmitting and receiving data from one another. For many applications it is possible for distributed networks to perform better than the sum of their constituent parts. In this thesis we will focus on the particular applications of distributed storage and secure distributed multiplication. 
A distributed storage system is a system that is capable of storing a single data file over every server in a distributed network. Distributed storage systems often come with exact repair schemes which are algorithms designed to reconstruct the data from a server in the network given the data from the other servers. In particular, if a server on the network ever fails or is otherwise unavailable an exact repair scheme can be used to repair the lost data from the server and maintain the original file. A distributed matrix multiplication scheme on the other hand is a process by which two matrices stored on a source server can be multiplied using a distributed network of helper servers. Again if a helper server becomes unavailable during this process we may use an exact repair scheme to circumvent this delay. The main goal of this thesis is to develop exact repair schemes for the distributed storage and secure distributed matrix multiplication contexts utilizing a mathematical object known as an evaluation code. We will develop several families of exact repair schemes which may be finely tuned to fit particular situations within these contexts, and we will compare these schemes to the existing schemes in the field. coding theory erasure recovery locally recoverable code linear exact repair scheme distributed storage matrix multiplication parallel computing field trace
18	Information-Theoretically Secure Communication Under Channel Uncertainty Ly, Hung Dinh 2012 May 1900 Secure communication under channel uncertainty is an important and challenging problem in physical-layer security and cryptography. In this dissertation, we take a fundamental information-theoretic view of three concrete settings and use them to shed light on efficient secure communication techniques for different scenarios under channel uncertainty. First, we consider a multi-input multi-output (MIMO) Gaussian broadcast channel with two receivers and two messages: a common message intended for both receivers (i.e., channel uncertainty for decoding the common message at the receivers) and a confidential message intended for one of the receivers but needing to be kept asymptotically perfectly secret from the other. A matrix characterization of the secrecy capacity region is established via a channel-enhancement argument and an extremal entropy inequality previously established for characterizing the capacity region of a degraded compound MIMO Gaussian broadcast channel. Second, we consider a multilevel security wiretap channel where there is one possible realization for the legitimate receiver channel but multiple possible realizations for the eavesdropper channel (i.e., channel uncertainty at the eavesdropper). A coding scheme is designed such that the number of secure bits delivered to the legitimate receiver depends on the actual realization of the eavesdropper channel. More specifically, when the eavesdropper channel realization is weak, all bits delivered to the legitimate receiver need to be secure. In addition, when the eavesdropper channel realization is strong, a prescribed part of the bits needs to remain secure. We call such codes security embedding codes, referring to the fact that high-security bits are now embedded into the low-security ones. 
We show that the key to achieving efficient security embedding is to jointly encode the low-security and high-security bits. In particular, the low-security bits can be used as (part of) the transmitter randomness to protect the high-security ones. Finally, motivated by the recent interest in building secure, robust and efficient distributed information storage systems, the problem of secure symmetrical multilevel diversity coding (S-SMDC) is considered. This is a setting where there are channel uncertainties at both the legitimate receiver and the eavesdropper. The problem of encoding individual sources is first studied. A precise characterization of the entire admissible rate region is established via a connection to the problem of secure coding over a three-layer wiretap network and utilizing some basic polyhedral structure of the admissible rate region. Building on this result, it is then shown that the simple coding strategy of separately encoding individual sources at the encoders can achieve the minimum sum rate for the general S-SMDC problem. Channel uncertainty secure communication MIMO secure communication security embedding secure distributed storage systems physical-layer security
19	Tromos : a software development kit for virtual storage systems / Tromos : un cadre pour la construction de systèmes de stockage distribués Nikolaidis, Fotios 22 May 2019 Modern applications tend to diverge in both their I/O profile and their storage requirements. Pairing a scientific or commercial application with a general-purpose system will likely yield a suboptimal result. Even when purpose-specific systems are available, applications with multiple classes of workloads still need to distribute the work to the right system. This strategy is not trivial, however, because different platforms pursue different goals and therefore require the application to integrate multiple code paths. Implementing these code paths is not trivial either: it requires considerable effort and programming skill. The problem grows when applications need to benefit from several data stores in parallel. In this dissertation, we introduce "storage containers" as the next logical, yet revolutionary, step. A "storage container" is a virtual infrastructure that decouples an application from its corresponding data stores in the same way that Docker decouples the application runtime from the physical servers. In particular, a "storage container" is middleware that separates changes made to application code by scientific users from changes made to I/O actions by developers or administrators. To facilitate the development and deployment of a "storage container", we introduce a framework called Tromos. Through its lens, all an application architect needs to do to build a storage solution is to model the desired environment in a definition file and leave the rest to the software. 
Tromos ships with a repository of plugins from which the architect can choose in order to optimize the container for the application at hand. Available options include data transformations, data placement policies, data reconstruction methods, namespace management, and on-demand consistency handling. As a proof of concept, we use Tromos to create customized storage environments that are readily compared with Gluster, a well-established and versatile storage system. The results show that application-tailored "storage containers", even if auto-produced, can outperform the most sophisticated general-purpose systems simply by removing the unnecessary overhead of unused features. / Modern applications tend to diverge both in I/O profile and in storage requirements. Matching a scientific or commercial application with a general-purpose system will most likely yield suboptimal performance. Even in the presence of purpose-specific systems, applications with multiple classes of workloads still need to direct each workload to the right system. This strategy, however, is not trivial, as different platforms aim at diversified goals and therefore require the application to incorporate multiple codepaths. Implementing such codepaths is non-trivial, requires a lot of effort and programming skill, and is error-prone. The hurdles get worse when applications need to leverage multiple data-stores in parallel. In this dissertation, we introduce "storage containers" as the next logical step in the storage evolution. A "storage container" is a virtual infrastructure that decouples the application from the underlying data-stores in the same way Docker decouples the application runtime from the physical servers. 
In other words, it is middleware that separates changes made to application codes by science users from changes made to I/O actions by developers or administrators. To facilitate the development and deployment of a "storage container" we introduce a framework called Tromos. Through its lens, all that it takes for an application architect to spin up a custom storage solution is to model the target environment in a definition file and let the framework handle the rest. Tromos comes with a repository of plugins from which the architect can choose so as to optimize the container for the application at hand. Available options include data transformations, data placement policies, data reconstruction methods, namespace management, and on-demand consistency handling. As a proof of concept, we use Tromos to prototype customized storage environments, which we compare against Gluster, a well-established and versatile storage system. The results show that application-tailored "storage containers", even if they are auto-produced, can outperform more mature "general-purpose" systems merely by removing the unnecessary overhead of unused features. “Blobs storage” Stockage distribué Hpc Stockage sur le nuage Storage containers Cloud storage and technologies Distributed storage. Storage blobs Hpc Storage containers 005.435
20	Méthodes et outils d'analyse de données de signalisation mobile pour l'étude de la mobilité humaine / Methods and analysis tools for human mobility study, based on mobile network signaling data Sultan, Alexis 28 September 2016 This thesis aims to study human activities through the analysis of the signaling flow of the cellular data network (GTP). To do so, we built a set of tools for collecting, storing, and analyzing this signaling data, based on an architecture that is as independent as possible of hardware manufacturers. From the data extracted by this platform we made three contributions. In the first contribution, we present the architecture of the platform for capturing and analyzing GTP signaling in an operator network. This work aims to inventory the different elements that trigger updates and to estimate the temporal and spatial accuracy of the collected data. Next, we present a series of measurements highlighting the main characteristics of human mobility observed through mobile signaling (the inter-arrival time of update messages, the observed distance of hops between cells as customers move). Finally, we present an analysis of the trade-offs made between write/read speed and ease of use in the file format used to exchange information between the capture probes and the storage system. Second, we were able to design a trip reconstruction algorithm. From sparse data issued by the cellular network, this algorithm builds trips on transport routes. It takes the subsampled trip data and deduces the customer's positions on the transport routes. 
We built an intermodal transport graph covering the metro, the train, and the road network. It connects the points within each transport layer and interconnects the transport modes at intersections. Our algorithm relies on a hidden Markov model to place on the graph the probable positions of individuals between observations. The contribution of this work is the use of the topological properties of the transport network to set the emission and transition probabilities in an unsupervised model. This work led to one publication and one patent. Finally, our last contribution uses the signaling data to dimension the operator's mobile network. The goal is to dynamically dimension a mobile network using the so-called TV Whitespace frequency bands. These bands are released under certain conditions in the USA and are subject to auction. We propose a system based on a quality-of-experience (QoE) algorithm and on the cost of the radio resource to choose where to deploy additional femtocells and where to remove them according to variations in population per unit of space. In conclusion, this thesis offers an overview of the potential of analyzing a network's signaling metadata in a context broader than the mere supervision of an operator network. / The aim of this thesis is to study human activities through the analysis of the signaling flow in the cellular data network (GTP). In order to achieve this goal, we implemented a set of tools allowing us to collect, store and analyze this signaling data. We created an architecture that is as independent as possible of hardware manufacturers and network operators. Using data extracted by this platform we made three main contributions. 
In our first contribution, we present the GTP capture and analysis platform in a mobile operator network. This work aims to list the different elements that trigger updates and to estimate the temporal and spatial accuracy of the collected data. Next, we present a set of measurements that highlight the main characteristics of human mobility observed through mobile signaling data (the inter-arrival time of update messages, and the observed distances of cell-to-cell hops made by moving users). Finally, we analyze the trade-offs made between write/read speed and ease of use in the file format used to exchange data between the capture probes and the storage system. In our second contribution, we propose CT-Mapper, an unsupervised algorithm that maps mobile phone traces onto a multimodal transport network. One of the main strengths of CT-Mapper is its ability to map noisy, sparse, multimodal cellular trajectories onto a multilayer transportation network whose layers have different physical properties, rather than only trajectories confined to a single layer. Such a network is modeled by a large multilayer graph in which the nodes correspond to metro/train stations or road intersections and the edges correspond to connections between them. The mapping problem is modeled by an unsupervised HMM in which the observations correspond to sparse mobile user trajectories and the hidden states to the nodes of the multilayer graph. The HMM is unsupervised because the transition and emission probabilities are inferred from the physical properties of the transport network and from the spatial coverage of the antenna base stations, respectively. Finally, in our last contribution we propose a method for cellular resource planning that takes user mobility into account. Since users move, the bandwidth resource should move accordingly. 
We design a score-based method that uses TV Whitespace and user experience to determine which cells resources should be removed from and which cells they should be added to. Combined with traffic history, it computes a score for each cell. Bandwidth is reallocated on a half-day basis. Before that, we present real traces from cellular networks in urban districts, which confirm that static network planning is no longer optimal. A dynamic femtocell architecture is then presented. It is based on mesh-interconnected elements and designed to support the score-based bandwidth allocation algorithm. Both the scoring method and the architecture are simulated, and the results are presented. They confirm the expected improvement in bandwidth and delay per user while maintaining a low operating cost for the operator. In conclusion, this thesis provides an overview of the potential of analyzing the signaling metadata of a network in a broader context than the supervision of an operator network. Réseau cellulaire Mobilité humaine Dimensionnement de réseau Capture passive de données Stockage distribué Analyse distribuée Cellular network Human mobility Network capacity planning Passive network capture Distributed storage Distributed analysis
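
The HMM-based mapping described in the abstract above can be illustrated with a minimal sketch. This is not CT-Mapper itself: it assumes a toy single-layer graph, a uniform start distribution, transitions that are uniform over "stay or move to a neighbour", and a Gaussian-shaped emission score around each observed antenna position. The names `NODES`, `EDGES`, `SIGMA`, `TRACE` and `map_trace` are hypothetical and chosen for illustration only. Decoding uses the standard Viterbi algorithm in log space.

```python
import math

# Toy single-layer transport graph (hypothetical): node -> (x, y) position.
NODES = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0), "D": (3.0, 0.0)}
# Undirected adjacency between nodes (assumed topology).
EDGES = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
SIGMA = 0.5  # assumed spatial uncertainty of an antenna observation


def log_start(state):
    # Uniform prior over nodes (assumption).
    return -math.log(len(NODES))


def log_trans(prev, state):
    # Transitions come from topology only: stay put or move to a neighbour,
    # each with equal probability; any other jump is impossible.
    if state == prev or state in EDGES[prev]:
        return -math.log(len(EDGES[prev]) + 1)
    return -math.inf


def log_emit(state, obs):
    # Unnormalised Gaussian score on the node-to-antenna distance.
    # The normalising constant is the same for every state, so it does
    # not change which path is the most likely one.
    nx, ny = NODES[state]
    ox, oy = obs
    return -((nx - ox) ** 2 + (ny - oy) ** 2) / (2 * SIGMA ** 2)


def map_trace(observations):
    """Viterbi decoding: most likely node sequence for a cellular trace."""
    # best[s] = (log-score of the best path ending in s, that path)
    best = {s: (log_start(s) + log_emit(s, observations[0]), [s]) for s in NODES}
    for obs in observations[1:]:
        step = {}
        for s in NODES:
            # Choose the predecessor that maximises the score of reaching s.
            score, path = max(
                ((best[p][0] + log_trans(p, s), best[p][1]) for p in NODES),
                key=lambda item: item[0],
            )
            step[s] = (score + log_emit(s, obs), path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical noisy antenna positions observed along the line A-B-C-D.
TRACE = [(0.1, 0.3), (1.2, -0.2), (1.8, 0.4), (3.1, 0.1)]
print(map_trace(TRACE))
```

No parameter is learned from labelled trajectories here: both probability tables are derived from the graph and from the assumed antenna coverage, which is the sense in which the abstract calls the model unsupervised. A multilayer version would add inter-layer edges at interchange nodes and per-layer transition weights.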
