An important sub-field of machine learning is the inductive formation of a pertinent class description. Given a collection of positive and negative examples of the concept, the aim is to create a description not only capable of correctly classifying the training examples, but one able to be used predictively on unseen examples. This thesisi nvestigatesth e problemo f inductivec onceptf ormation in poorly understood domains.M any well-understoodp roblemse xist wheret he individual attribute-valueuss ed to describee xamplesv ary systematicallyw ith categorym embership.O ften this meanst hat such descriptions are sufficient to identify significant regularities in the concept. In contrastm, anyr eal-worldp roblemsa rep oorly understoodi,. e . examplesa re describedb y a relatively large number of seemingly irrelevant attributes (because expertise is often unavailablet o specify a suitable level of abstractionw hen measurementas re initially recorded). The fundamentaal ssumptionb eing that when combinedi n somew ay, these attributes are complete enough to identify the target concept. This initial language insufficiency,o ften causedb y concealeda ttributei nteractionp resentsp roblemsf or many current induction algorithms which typically reply on uncovering simpler correlations. For all but the simplestp roblems,t he combinatoriale xplosiona ssociatedw ith unconstrained hypothesis generation means that the inductive process must employ more intelligent mechanisms. A two-stage solution is proposed based on first identifying whether the initial problem formulation has the potential to cause difficulties for typical inductive learners. A qualitative measure based on a novel information theoretic function is used to gauge the absence of conditional dependencies between attributes. This approach is compared to other current identification measures, in particular a bias towards misleading estimates of concept difficulty due to irrelevant attributes is addressed. Once the level of attribute interaction has been estimated one of two learning components is selected for acquiring the relevant concept. For low to moderate degrees of attribute interaction, a general-to-specific beam search is utilised. However this mechanism focuses the induction process on the most promising hypotheses by utilising relative assessment measures i. e. the degree with which a specialised hypothesis improves with respect to its constituent parts. This relative improvement becomes increasingly important as conditional dependencies increase. In addition, a pair of relative bounds are calculated for each hypothesis based on the assessmenht euristic used for validation whilst learning. These bounds place limits on the number of negative examples a hypothesis can cover and still outperform its best constituent part. These bounds result in a substantial reduction in the number of poor hypotheses generated during concept formation. For extremel evels of featurei nteraction,a specific-to-generagl reedy searcht echniquei s employed. This approach is more likely to uncover hidden interactions than approaches that begin hypothesisf ormationb asedo n one-dimensionapl rojections. This combination of search direction and a heuristic based on Minimum Description Length, ensures that highly conditional dependenciecsa n be pinpointed. In addition a number of speedup operatorsa re developedw hich curtail the numbero f tentativeh ypothesesg enerateda nd alsor esulti n fewerp roblemsa ssociatedw ith local searchs pacem inima.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:343396 |
Date | January 1999 |
Creators | Nazar, Kamal |
Publisher | University of Portsmouth |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Page generated in 0.0016 seconds