Machine learning, data mining, and the World Wide Web : design of special-purpose search engines

Thesis (MSc)--Stellenbosch University, 2003. / ENGLISH ABSTRACT: We present DEADLINER, a special-purpose search engine that indexes conference and workshop
announcements, and which extracts a range of academic information from the Web. SVMs provide
an efficient and highly accurate mechanism for obtaining relevant web documents. DEADLINER
currently extracts speakers, locations (e.g. countries), dates, paper submission (and other) deadlines,
topics, program committees, abstracts, and affiliations. Complex and detailed searches are
possible on these fields. The niche search engine was constructed by employing a methodology
for rapid implementation of specialised search engines. The methodology rests on Bayesian integration
of simple extractors, which avoids complex hand-tuned text extraction methods. The simple
extractors exploit loose formatting and keyword conventions. The Bayesian framework further produces
a search engine where each user can control each field's false alarm rate in an intuitive and
rigorous fashion, thus providing easy-to-use metadata. / AFRIKAANS ABSTRACT (English translation): We introduce DEADLINER, a search engine that catalogues conference and workshop
announcements and that will eventually monitor and extract a wide range of academic meeting
material from the Web. DEADLINER currently recognises and extracts speakers, locations (e.g. country names), dates,
including deadlines for the submission of academic papers, topics, programme committees, overviews
or abstracts, and affiliations. Thorough searches are possible across and within these fields. The
niche search engine was built using a methodology for the rapid construction of
specialised search engines. The methodology avoids complex hand-tuning of
text extraction by using Bayesian integration of simple extractors. The
extractors exploit formatting and keyword conventions. The Bayesian framework thereby yields a
search engine that lets users control each field's probability of a wrong selection in an intuitive and
rigorous way.
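
The abstract's idea of integrating simple extractors in a Bayesian way and letting the user tune how often a field is wrong can be illustrated with a minimal sketch. This is not the thesis's implementation: the extractor names, their likelihoods, the prior, and the conditional-independence (naive Bayes) assumption are all hypothetical, and the `max_error` threshold on the posterior is only a stand-in for the per-field false alarm control described above.

```python
from math import exp, log

# Hypothetical simple extractors for a "deadline" field.
# Each entry: name -> (P(fires | text is a deadline), P(fires | text is not a deadline)).
# These numbers are illustrative assumptions, not values from the thesis.
EXTRACTORS = {
    "has_deadline_keyword": (0.90, 0.10),
    "matches_date_pattern": (0.80, 0.20),
}

PRIOR = 0.5  # assumed prior probability that a candidate is a deadline


def posterior(fired, prior=PRIOR):
    """Combine binary extractor outputs into P(deadline | outputs).

    Assumes the extractors are conditionally independent given the class,
    so their log-likelihood ratios simply add to the prior log-odds.
    """
    log_odds = log(prior / (1.0 - prior))
    for name, (p_target, p_other) in EXTRACTORS.items():
        if fired[name]:
            log_odds += log(p_target / p_other)
        else:
            log_odds += log((1.0 - p_target) / (1.0 - p_other))
    return 1.0 / (1.0 + exp(-log_odds))


def accept(fired, max_error):
    """Keep a candidate only if its estimated error probability is at most max_error."""
    return posterior(fired) >= 1.0 - max_error


if __name__ == "__main__":
    both = {"has_deadline_keyword": True, "matches_date_pattern": True}
    keyword_only = {"has_deadline_keyword": True, "matches_date_pattern": False}
    print(round(posterior(both), 3), accept(both, max_error=0.05))
    print(round(posterior(keyword_only), 3), accept(keyword_only, max_error=0.05))
```

Lowering `max_error` makes the field stricter (fewer wrong extractions, more misses); raising it does the opposite, which is the kind of trade-off a per-field user control would expose.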

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/53492
Date: 04 1900
Creators: Kruger, Andries F
Contributors: Omlin, Christian W, Stellenbosch University. Faculty of Science. Department of Mathematical Sciences.
Publisher: Stellenbosch : Stellenbosch University
Source Sets: South African National ETD Portal
Language: en_ZA
Detected Language: English
Type: Thesis
Format: 1 v. (various pagings) : illustrations
Rights: Stellenbosch University