Using semantic knowledge to improve compression on log files

With the move towards global and multinational companies, information technology infrastructure requirements are increasing. As these computer networks grow, it becomes increasingly difficult to monitor, control, and secure them. Networks consist of a number of diverse devices, sensors, and gateways, which are often spread over large geographical areas. Each of these devices produces log files which need to be analysed and monitored to provide network security and to satisfy regulations. Data compression programs such as gzip and bzip2 are commonly used to reduce the quantity of data for archival purposes after the log files have been rotated. However, many other compression programs exist, each with its own advantages and disadvantages: they use different amounts of memory and take different compression and decompression times to achieve different compression ratios. System log files also contain redundancy which is not necessarily exploited by standard compression programs. Log messages usually follow a similar format with a defined syntax, not all ASCII characters are used in the log files, and the messages contain certain "phrases" which are often repeated.

This thesis investigates the use of compression as a means of data reduction and how semantic knowledge can improve data compression, also applying the results to different scenarios that can occur in a distributed computing environment. It presents the results of a series of tests performed on different log files. It also examines the semantic knowledge which exists in maillog files and how it can be exploited to improve the compression results. The results from a series of text preprocessors which exploit this knowledge are presented and evaluated. These preprocessors include one which replaces timestamps and IP addresses with their binary equivalents, and one which replaces words from a dictionary with unused ASCII characters.
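The two preprocessors described above can be sketched as a single reversible transform. This is a minimal illustration under stated assumptions, not the thesis's implementation: the marker bytes IP_MARK and TS_MARK, the use of byte values 0x80-0xFF as the "unused" characters, the classic syslog timestamp format, and the four-phrase DICTIONARY are all hypothetical choices. One regular expression matches all three token types in a single pass, so the raw binary payload written for an IP address is never rescanned for dictionary phrases, and the decoder simply skips each marker's fixed-length payload.

```python
import re

# Hypothetical marker bytes. They lie outside 7-bit ASCII, so they are
# assumed never to occur in the plain-ASCII log input.
IP_MARK = 0x80
TS_MARK = 0x81
FIRST_WORD_CODE = 0x82  # dictionary codes occupy 0x82..0xFF

MONTHS = [b"Jan", b"Feb", b"Mar", b"Apr", b"May", b"Jun",
          b"Jul", b"Aug", b"Sep", b"Oct", b"Nov", b"Dec"]

# Hypothetical phrase dictionary; a real one would be mined from maillog files.
DICTIONARY = [b"postfix/smtpd", b"connect from", b"disconnect from", b"status=sent"]
assert FIRST_WORD_CODE + len(DICTIONARY) <= 0x100
WORD_CODE = {w: FIRST_WORD_CODE + i for i, w in enumerate(DICTIONARY)}

# Octets without leading zeros, so decoding reproduces the exact original text.
OCTET = rb"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IP = rb"(?P<ip>(?<![\d.])" + OCTET + rb"(?:\." + OCTET + rb"){3}(?!\d))"
# Classic syslog timestamp with a space-padded day, e.g. "Nov  9 01:02:03".
TS = (rb"(?P<mon>" + b"|".join(MONTHS) + rb") (?P<day> [1-9]|[12]\d|3[01]) "
      rb"(?P<hh>[01]\d|2[0-3]):(?P<mm>[0-5]\d):(?P<ss>[0-5]\d)")
# Longest phrases first, so a phrase beats any shorter phrase that is its prefix.
WORD = rb"(?P<word>" + b"|".join(
    re.escape(w) for w in sorted(DICTIONARY, key=len, reverse=True)) + rb")"
TOKEN_RE = re.compile(IP + b"|" + TS + b"|" + WORD)


def encode(text: bytes) -> bytes:
    """Replace IPv4 addresses, timestamps and dictionary phrases with short tokens."""
    if not text.isascii():
        raise ValueError("input must be 7-bit ASCII so that bytes 0x80-0xFF are free")

    def token(m):
        if m.group("ip") is not None:
            # marker + 4 raw octet bytes (5 bytes instead of up to 15 characters)
            return bytes([IP_MARK, *(int(o) for o in m.group("ip").split(b"."))])
        if m.group("mon") is not None:
            # marker + month, day, hour, minute, second (6 bytes instead of 15)
            return bytes([TS_MARK, MONTHS.index(m.group("mon")),
                          int(m.group("day").strip()), int(m.group("hh")),
                          int(m.group("mm")), int(m.group("ss"))])
        return bytes([WORD_CODE[m.group("word")]])

    return TOKEN_RE.sub(token, text)


def decode(data: bytes) -> bytes:
    """Invert encode() by scanning left to right and expanding each token."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == IP_MARK:
            out += b".".join(str(o).encode() for o in data[i + 1:i + 5])
            i += 5
        elif b == TS_MARK:
            mon, day, hh, mm, ss = data[i + 1:i + 6]
            out += MONTHS[mon] + b" %2d %02d:%02d:%02d" % (day, hh, mm, ss)
            i += 6
        elif b >= FIRST_WORD_CODE:
            out += DICTIONARY[b - FIRST_WORD_CODE]
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    line = b"Nov 19 12:34:56 mail postfix/smtpd[123]: connect from unknown[192.168.1.20]\n"
    packed = encode(line)
    print(len(line), "->", len(packed), "bytes; round trip ok:", decode(packed) == line)
```

On this sample line the 76 input bytes become 37 before any general-purpose compressor runs; whether that gain survives gzip or bzip2 on a real corpus is exactly the kind of question the thesis measures empirically.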
In this thesis, data compression is shown to be an effective method of data reduction, producing up to a 98 percent reduction in file size on a corpus of log files. The use of preprocessors which exploit semantic knowledge results in up to a 56 percent improvement in overall compression time and up to a 32 percent reduction in compressed size.
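A size reduction of this kind is conventionally computed as 100 * (original - compressed) / original. The sketch below, an assumption rather than the thesis's test harness, measures that figure together with compression and decompression time, using Python's standard-library gzip, bz2 and lzma modules as stand-ins for the command-line programs; the sample data is synthetic and far more repetitive than a real log corpus.

```python
import bz2
import gzip
import lzma
import time

# Standard-library codecs standing in for the command-line tools (assumption).
CODECS = {
    "gzip": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def percent_reduction(original: int, reduced: int) -> float:
    """Size saving as a percentage of the original size."""
    return 100.0 * (original - reduced) / original


def benchmark(data: bytes) -> dict:
    """Return {codec: (compressed size, % reduction, compress s, decompress s)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        assert restored == data  # every codec here is lossless
        results[name] = (len(packed), percent_reduction(len(data), len(packed)),
                         t1 - t0, t2 - t1)
    return results


if __name__ == "__main__":
    # Synthetic sample: one maillog-style line repeated many times.
    line = b"Nov 19 12:34:56 mail postfix/smtpd[123]: connect from unknown[192.168.1.20]\n"
    sample = line * 10_000
    for name, (size, pct, tc, td) in benchmark(sample).items():
        print(f"{name:6s} {size:8d} B  {pct:5.1f}% smaller  "
              f"compress {tc * 1e3:.1f} ms  decompress {td * 1e3:.1f} ms")
```

The same percent_reduction helper also expresses the second kind of figure quoted above when it is applied to compressed sizes, for example percent_reduction(size_without_preprocessing, size_with_preprocessing).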

Computer networks

Data compression (Computer science)

Semantics--Data processing

Identifier	oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:rhodes/vital:4650
Date	19 November 2008
Creators	Otten, Frederick John
Publisher	Rhodes University, Faculty of Science, Computer Science
Source Sets	South African National ETD Portal
Language	English
Detected Language	English
Type	Thesis, Masters, MSc
Format	240 p., pdf
Rights	Otten, Frederick John
