
Fault Diagnosis in Enterprise Software Systems Using Discrete Monitoring Data

Success for many businesses depends on their information software systems.
Keeping them operational is critical, as failures are costly. Such systems
are in many cases sophisticated, distributed and dynamically composed.

To ensure high availability and correct operation, it is essential that
failures be detected promptly, their causes diagnosed and remedial actions
taken. Although automated recovery approaches exist for specific problem
domains, the problem-resolution process is in many cases manual and painstaking.
Computer support personnel put a great deal of effort into resolving reported
failures. The growing size and complexity of these systems create the need to
automate this process.

The primary focus of our research is on automated fault diagnosis and recovery
using discrete monitoring data such as log files and notifications. Our goal is
to quickly pinpoint the root cause of a failure. Our contributions are:
(1) modelling discrete monitoring data for automated analysis; (2) using such
models to automatically leverage common symptoms of failures from historic
monitoring data to pinpoint faults; and (3) providing a model for
decision-making under uncertainty such that appropriate recovery actions are
chosen.

Failures in such systems are caused by software defects, human error, hardware
failures, environmental conditions and malicious behaviour. Our primary focus
in this thesis is on software defects and misconfiguration.

Identifier: oai:union.ndltd.org:WATERLOO/oai:uwspace.uwaterloo.ca:10012/6757
Date: 18 May 2012
Creators: Reidemeister, Thomas
Source Sets: University of Waterloo Electronic Theses Repository
Language: English
Detected Language: English
Type: Thesis or Dissertation
