This thesis is submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Ph.D) in the FASTAR group of the Department of Computer Science, University of Pretoria, South Africa. I present a number of algorithms for constructing minimal acyclic deterministic finite automata (MADFAs), most of which I originally derived/designed or co-discovered. Being acyclic, such automata represent finite languages and have proven useful in applications such as spellchecking, virus-searching and text indexing. In many of those applications, the automata grow to billions of states, making them difficult to store without using various compression techniques — the most important of which is minimization. Results from the late 1950’s show that minimization yields a unique automaton (for a given language), and later results show that minimization of acyclic automata is possible in time linear in the number of states. These two results make for a rich area of algorithmics research; automata and algorithmics research are relatively old fields of computing science and the discovery/invention of new algorithms in the field is an exciting result. I present both incremental and nonincremental algorithms. With nonincremental techniques, the unminimized acyclic deterministic finite automaton (ADFA) is first constructed and then minimized. As mentioned above, the unminimized ADFA can be very large indeed — often even too large to fit within the virtual memory space of the computer. As a result, incremental techniques for minimization (i.e. the ADFA is minimized during its construction) become interesting. Incremental algorithms frequently have some overhead: if the unminimized ADFA fits easily within physical memory, it may still be faster to use nonincremental techniques. The presentation used in this thesis has a few unusual characteristics: <ul><li> Few other presentations follow a correctness-by-construction style for presenting and deriving algorithms. The presentations given here include correctness arguments or sketches thereof. </li><li> The presentation is taxonomic — emphasizing the similarities and differences between the algorithms at a fundamental level. </li><li> While it is possible to present these algorithms in a formal-language-theoretic setting, this thesis remains somewhat closer to the actual implementation issues. </li><li> In several chapters, new algorithms and interesting new variants of existing algorithms are presented. </li><li> It gives new presentations of many existing algorithms — all in a common format with common examples. </li><li> There are extensive links to the existing literature. </li></ul> / Thesis (PhD)--University of Pretoria, 2010. / Computer Science / unrestricted
Identifer | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:up/oai:repository.up.ac.za:2263/23648 |
Date | 30 March 2011 |
Creators | Watson, Bruce William |
Contributors | Kourie, Derrick G., bruce@fastar.org |
Publisher | University of Pretoria |
Source Sets | South African National ETD Portal |
Detected Language | English |
Type | Thesis |
Rights | © 2010 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. |
Page generated in 0.0021 seconds