  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Raciocínio baseado em casos aplicado ao gerenciamento de falhas em redes de computadores / Case-based reasoning applied to fault management in computer networks

Melchiors, Cristina January 1999
With the growing number and heterogeneity of devices in today's computer networks, managing these resources effectively has become critical. This activity requires network managers to have a large amount of information about their equipment, the technologies involved, and the problems associated with them. Trouble ticket systems (TTS) have been used to record incidents, serving as a historical memory of the network and accumulating the knowledge derived from diagnosing and resolving problems. However, the growing number of stored tickets makes manually searching these systems for similar past situations slow and imprecise. An appropriate way to consolidate the network's historical memory is therefore to develop an expert system that uses the knowledge stored in trouble ticket systems to propose solutions to a current problem. Case-based reasoning (CBR), an artificial intelligence approach that has attracted considerable attention in recent years, can be used for this purpose. This reasoning paradigm proposes solutions to new problems by retrieving a similar case from the past whose solution can be reused in the new situation. Its benefits also include the ability to learn from experience: new problems are incorporated and become available for use in future situations, increasing the knowledge held in the system. This work presents a system that applies case-based reasoning to a trouble ticket system to propose solutions to a new network problem. The system was developed to assist in diagnosing and resolving network problems. The typical problems of this domain, the approach adopted, and the results obtained with the prototype are described.
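The retrieve-and-retain cycle that this abstract describes can be illustrated with a minimal sketch. The abstract does not say how tickets are represented or compared, so everything here is an assumption made for illustration: the `Ticket` type, symptoms modelled as sets of strings, and Jaccard overlap as the similarity measure.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """A past incident: hypothetical structure, not the thesis's schema."""
    symptoms: frozenset
    solution: str


def jaccard(a, b):
    """Overlap between two symptom sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(case_base, symptoms):
    """Return the stored ticket most similar to the new problem."""
    query = frozenset(symptoms)
    return max(case_base, key=lambda ticket: jaccard(ticket.symptoms, query))


def retain(case_base, symptoms, solution):
    """Store a solved problem so it can be reused in future situations."""
    case_base.append(Ticket(frozenset(symptoms), solution))


case_base = [
    Ticket(frozenset({"link down", "no carrier"}), "replace cable"),
    Ticket(frozenset({"high latency", "packet loss"}), "check congested uplink"),
]
best = retrieve(case_base, {"packet loss", "high latency", "jitter"})
print(best.solution)
```

A real system would likely weight attributes and adapt the retrieved solution before reuse; the sketch only shows why each retained ticket widens what later retrievals can match.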
13

Caractérisation et modélisation électrothermique compacte étendue du MOSFET SiC en régime extrême de fonctionnement incluant ses modes de défaillance : application à la conception d'une protection intégrée au plus proche du circuit de commande / Extensive compact electrothermal characterization and modeling of the SiC MOSFET under extreme operating conditions including failure modes : application to the design of an integrated protection as close as possible to the gate driver

Boige, François 27 September 2019
The transition to carbon-free energy relies on the systematic use of electrical energy, with power electronics at the heart of the exchanges. To meet this challenge, power electronics requires increasingly high-performance devices that provide a high level of integration, high energy efficiency, and high reliability. Today, the silicon carbide (SiC) power MOSFET is a breakthrough technology that addresses integration and efficiency through low losses and high switching speed. However, its insufficiently controlled reliability and low robustness under extreme operating conditions, such as repetitive short circuits, are hindering its adoption in industrial applications.
In this thesis, an in-depth study of the short-circuit behaviour of an exhaustive set of commercial devices, covering all the structural and technological variants involved, was carried out on a dedicated test bench developed during the thesis in order to quantify their short-circuit withstand capability. The study highlighted properties, some generic and some specific to SiC MOSFETs, such as a dynamic gate leakage current and a failure mode through a gate-source short circuit that leads, under certain conditions of use and for certain MOSFET structures, to drain-source self-blocking. The physical understanding of the observed phenomena was pursued systematically through an approach combining internal technological analysis of the failed devices with fine electrothermal modelling. A compact electrothermal model, extended to account for the failure modes, was established and implemented in circuit-type simulation software. The model was compared against a large number of experimental results over all the time sequences of a short-circuit cycle up to failure. It provides useful support for analysis and an aid to the design of protection circuits. As an application, a gate driver with a digital processing stage was designed and validated for detecting several short-circuit scenarios, and potentially for detecting degradation of the power device's gate. More exploratory work was also carried out in partnership with the University of Nottingham to study the impact of repeated pulsed short-circuit regimes on the ageing of parallel chips with parameter dispersion. The propagation of a first failure mode originating from a "weak" device was also studied. This work paves the way for the design of intrinsically safe and available converters that take advantage of the atypical and original properties of SiC semiconductors, and of the MOSFET in particular.
14

VoIP Operators : From a Carrier Point of View

Sidiropoulou, Christina January 2011
Voice over Internet Protocol (VoIP) is a service that has recently gained a lot of attention in the telecommunications (telecom) world, since both Internet service providers (ISPs) and telecommunications operators have realized the important advantages it can offer. Although traditional telephony is well established both in the telecom world and in our daily lives, VoIP now competes with it by offering cost savings and simplicity and by introducing new ways of communicating. Internet service providers have already started deploying efficient VoIP services for their customers, and carriers are transforming their network infrastructures to accommodate the requirements of VoIP traffic. Both providers and carriers have to take many essential factors into consideration in order to build and operate VoIP technologies efficiently. Proper service planning and well-established monitoring and troubleshooting procedures are vital for a successful VoIP service. This thesis focuses on commercial VoIP implementation on the carrier's side. It investigates how a carrier can efficiently maintain and troubleshoot its VoIP infrastructure so as to comply with the Service Level Agreements (SLAs) it has signed with its customers (ISPs), and analyses proactive actions that can be taken to minimize the resources required for customer support. As an outcome, this thesis presents efficient ways of network planning and monitoring, and draws conclusions about efficient methods for troubleshooting the carrier's VoIP products at both the technical and organizational levels.
15

Méthodes d'autoréparation proactives pour les réseaux d'opérateurs / Proactive self-healing methods for carrier networks

Vidalenc, Bruno 28 June 2012
Network operators attach great importance to fault management. Indeed, availability and quality of service are highly important parameters in the competition between network operators. The involvement of humans in the decision-making process and in the analysis of a huge number of alarms and information, together with the reactive nature of fault management mechanisms, does not allow the responsiveness required for optimal incident management. To address this problem, this thesis focuses on proactive mechanisms that anticipate failures in order to improve the effectiveness of their management. Indeed, failures are often preceded by alarms or symptomatic behaviors. Implementing, in the equipment, autonomous components capable of continuously analyzing the health of the network would provide real-time information on the risk of failure, which is required to deploy new proactive self-healing mechanisms. The first part of this thesis is devoted to defining the architectural components necessary for introducing proactive self-healing functions. In a second step, we study and analyze in detail three proactive self-healing mechanisms that exploit failure-risk information. The first mechanism is designed to accelerate the convergence of link-state routing protocols by adjusting the frequency at which failure detection messages are sent as a function of the risk of failure. The second mechanism dynamically tunes routing metrics in order to divert traffic away from risky equipment and to minimize the impact of a failure on traffic. Finally, the last mechanism addresses the protection and restoration mechanisms of the GMPLS protocol, dynamically adapting resource consumption to the risks incurred.
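The first two mechanisms in this abstract both map a failure-risk estimate onto a protocol parameter. The abstract gives no formulas, so the following is only a sketch under stated assumptions: risk is taken to be a number in [0, 1], the detection interval is interpolated linearly between two hypothetical bounds, and the routing metric is inflated by a hypothetical multiplicative penalty.

```python
def hello_interval(risk, fastest=0.05, slowest=1.0):
    """Seconds between failure-detection messages.

    Higher risk gives a shorter interval, so a failure on risky
    equipment is noticed sooner. Bounds are illustrative assumptions.
    """
    risk = min(max(risk, 0.0), 1.0)  # clamp to [0, 1]
    return fastest + (1.0 - risk) * (slowest - fastest)


def adjusted_metric(base_metric, risk, penalty=10.0):
    """Routing metric inflated with risk, steering traffic off risky links.

    The linear penalty is an assumption, not the thesis's rule.
    """
    risk = min(max(risk, 0.0), 1.0)
    return base_metric * (1.0 + penalty * risk)


for risk in (0.0, 0.5, 1.0):
    print(risk, hello_interval(risk), adjusted_metric(10, risk))
```

Sending detection messages more often costs bandwidth and processing, which is presumably why the interval is shortened only where the estimated risk justifies it.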
16

Probabilistic Fault Management in Networked Systems

Steinert, Rebecca January 2014
Technical advances in network communication systems (e.g. radio access networks) combined with evolving concepts based on virtualization (e.g. clouds), require new management algorithms in order to handle the increasing complexity in the network behavior and variability in the network environment. Current network management operations are primarily centralized and deterministic, and are carried out via automated scripts and manual interventions, which work for mid-sized and fairly static networks. The next generation of communication networks and systems will be of significantly larger size and complexity, and will require scalable and autonomous management algorithms in order to meet operational requirements on reliability, failure resilience, and resource-efficiency. A promising approach to address these challenges includes the development of probabilistic management algorithms, following three main design goals. The first goal relates to all aspects of scalability, ranging from efficient usage of network resources to computational efficiency. The second goal relates to adaptability in maintaining the models up-to-date for the purpose of accurately reflecting the network state. The third goal relates to reliability in the algorithm performance in the sense of improved performance predictability and simplified algorithm control. This thesis is about probabilistic approaches to fault management that follow the concepts of probabilistic network management (PNM). An overview of existing network management algorithms and methods in relation to PNM is provided. The concepts of PNM and the implications of employing PNM-algorithms are presented and discussed. Moreover, some of the practical differences of using a probabilistic fault detection algorithm compared to a deterministic method are investigated. Further, six probabilistic fault management algorithms that implement different aspects of PNM are presented. 
The algorithms are highly decentralized, adaptive and autonomous, and cover several problem areas, such as probabilistic fault detection and controllable detection performance; distributed and decentralized change detection in modeled link metrics; root-cause analysis in virtual overlays; event-correlation and pattern mining in data logs; and, probabilistic failure diagnosis. The probabilistic models (for a large part based on Bayesian parameter estimation) are memory-efficient and can be used and re-used for multiple purposes, such as performance monitoring, detection, and self-adjustment of the algorithm behavior.
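To make "Bayesian parameter estimation" with "controllable detection performance" concrete, here is a small sketch. It is not one of the thesis's six algorithms. It assumes a Beta-Bernoulli model of probe loss, independent losses, and a rule that declares a fault only after enough consecutive lost probes to keep the false-alarm probability under a chosen bound. All function names are hypothetical.

```python
import math


def loss_probability(lost, delivered, prior_a=1.0, prior_b=1.0):
    """Posterior mean of the per-probe loss probability.

    Uses a Beta(prior_a, prior_b) prior updated with observed counts.
    """
    return (prior_a + lost) / (prior_a + prior_b + lost + delivered)


def probes_needed(p_loss, false_alarm_rate):
    """Smallest n with p_loss ** n <= false_alarm_rate.

    Assumes losses on a healthy link are independent, so n consecutive
    losses occur by chance with probability p_loss ** n.
    """
    return math.ceil(math.log(false_alarm_rate) / math.log(p_loss))


p = loss_probability(lost=1, delivered=9)
print(p, probes_needed(p, false_alarm_rate=0.01))
```

The two counts are the only state kept per link, and the same estimate serves both monitoring and threshold selection, in the spirit of the memory-efficient, reusable models the abstract mentions.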
17

Analyse et intercomparaison des choix techniques majeurs en terme de structures de réseau et de règles d'exploitation parmi les grands distributeurs d'électricité / Analysis and intercomparison of major technical choices in terms of grid structure and operation practices among large distribution companies

Emelin, Samuel 31 March 2014
Faced with a global stagnation in electricity consumption, but with significant potential for the development of new uses, and with the appearance of generation units dispersed across the territory, the main French distribution system operator needs to make explicit its major choices regarding grid structure and operating rules, and to compare them with those of foreign distributors. Grid construction principles affect the ability to integrate new consumption and generation facilities at low cost while meeting society's requirements, notably continuity of supply. This thesis compares these major technical choices with practices found abroad, in order to situate the French grid and its specific features in an international technical environment.
After setting out a perspective on the development of uses and generation in France, based mainly on legislation, the architecture of the French distribution grid is described. Functional differences in structural choices around the world are then analysed, and their consequences for equipment sizing relative to the French case are underlined. The balance between voltage levels is then questioned, as is the effect of territorial characteristics on the grid. Finally, new technical choices are proposed after an analysis of the strengths and weaknesses of the variants that exist around the world.
18

Integrated Network Management Using Extended Blackboard Architecture

Prem Kumar, G 07 1900
No description available.
19

On monitoring and fault management of next generation networks

Shi, Lei 04 November 2010
No description available.
20

Efficient end-to-end monitoring for fault management in distributed systems / La surveillance efficace de bout-à-bout pour la gestion des pannes dans les systèmes distribués

Feng, Dawei 27 March 2014
In this dissertation, we present our work on fault management in distributed systems, motivated mainly by the monitoring of faults and abrupt changes in large computing systems such as the grid and the cloud. Instead of building complete a priori knowledge of the software and hardware infrastructures, as in conventional detection or diagnosis methods, we propose to use appropriate techniques to perform end-to-end monitoring of such large-scale systems, leaving the inaccessible details of the components involved in a black box.
For fault monitoring of a distributed system, we first model this probe-based application as a static collaborative prediction (CP) task, and experimentally demonstrate the effectiveness of CP methods using max-margin matrix factorization. We further introduce active learning into the CP framework and show its critical advantage in dealing with highly imbalanced data, which is especially useful for identifying the minority fault class.
We then extend static fault monitoring to the sequential case by proposing the sequential matrix factorization (SMF) method. SMF takes a sequence of partially observed matrices as input and produces predictions using information from both the current and past time windows. Active learning is also applied to SMF so that highly imbalanced data can be handled properly. In addition to the sequential methods, smoothing the estimation sequence has proven to be a practically useful trick for improving sequential prediction performance.
Since the stationarity assumption used in static and sequential fault monitoring becomes unrealistic in the presence of abrupt changes, we propose a semi-supervised online change detection (SSOCD) framework to detect intended changes in time series data. In this way, the static model of the system can be recomputed once an abrupt change is detected. In SSOCD, an unsupervised offline method is proposed to analyze a sample data series. The change points thus detected are used to train a supervised online model, which gives an online decision about whether a change is present in the arriving data sequence. State-of-the-art change detection methods are employed to demonstrate the usefulness of the framework.
All presented work is verified on real-world datasets. Specifically, the fault monitoring experiments are conducted on a dataset collected from the Biomed grid infrastructure within the European Grid Initiative, and the abrupt change detection framework is verified on a dataset concerning performance changes of a high-traffic online site.
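For readers unfamiliar with online change detection, the sketch below shows a classic one-sided CUSUM detector. This is a stand-in chosen for brevity: it is not the SSOCD framework from the dissertation, and the baseline, slack, and threshold values are hypothetical parameters that would need tuning on real data.

```python
def cusum_alarm(samples, baseline, slack, threshold):
    """Return the index of the first sample at which the upward CUSUM
    statistic exceeds `threshold`, or None if it never does.

    The statistic accumulates how far each sample lies above
    `baseline + slack` and resets to zero when it would go negative.
    """
    score = 0.0
    for index, value in enumerate(samples):
        score = max(0.0, score + (value - baseline - slack))
        if score > threshold:
            return index
    return None


# Response times that jump from about 0 to about 5 at index 4.
series = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
print(cusum_alarm(series, baseline=0.0, slack=1.0, threshold=6.0))
```

Once such an alarm fires, a monitoring system in the style described above could recompute its static model on data collected after the change point.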
