Global ETD Search

21	Evaluation of Internet search tools instrument design Saunders, Tana 03 1900 (has links) Thesis (MPhil)--Stellenbosch University, 2004. / ENGLISH ABSTRACT: This study investigated Internet search tools / engines to identify desirable features that can be used as a benchmark or standard to evaluate web search engines. In the past, the Internet was thought of as a big spider's web, ultimately connecting all the bits of information. It has now become clear that this is not the case, and that the bow tie analogy is more accurate. This analogy suggests that there is a central core of well-connected pages, with links IN and OUT to other pages, tendrils and orphan pages. This emphasizes the importance of selecting a search tool that is well connected and linked to the central core. Searchers must take into account that not all search tools search the Invisible Web and this will reflect on the search tool selected. Not all information found on the Web and Internet is reliable, current and accurate, and Web information must be evaluated in terms of authority, currency, bias, purpose of the Web site, etc. Different kinds of search tools are available on the Internet, such as search engines, directories, library gateways, portals, intelligent agents, etc. These search tools were studied and explored. A new categorization for online search tools consisting of Intelligent Agents, Search Engines, Directories and Portals / Hubs is suggested. This categorization distinguishes the major differences between the 21 kinds of search tools studied. Search tools / engines consist of spiders, crawlers, robots, indexes and search tool software. These search tools can be further distinguished by their scope, internal or external searches and whether they search Web pages or Web sites. Most search tools operate within a relationship with other search tools, and they often share results, spiders and databases. This relationship is very dynamic. 
The major international search engines have identifiable search features. The features of Google, Yahoo, Lycos and Excite were studied in detail. Search engines search for information in different ways, and present their results differently. These characteristics are critical to the Recall/Precision ratio. A well-planned search strategy will improve the Precision/Recall ratio and consider the web-user capabilities and needs. Internet search tools/engines are not a panacea for all information needs, and have pros and cons. The Internet search tool evaluation instrument was developed based on desirable features of the major search tools, and is considered a benchmark or standard for Internet search tools. This instrument, applied to three South African search tools, provided insight into the capabilities of the local search tools compared to the benchmark suggested in this study. The study concludes that the local search engines compare favorably with the major ones, but not enough so to use them exclusively. Further research into this aspect is needed. Intelligent agents are likely to become more popular, but the only certainty in the future of Internet search tools is change, change, and change. / AFRIKAANS ABSTRACT (in English translation): This study investigated Internet search tools/engines with the aim of identifying desirable features that can serve as a standard for evaluating search engines. In the past, the Internet was seen as a large spider's web that ultimately connects all the pieces of information. It has now become clear, however, that this is not the case at all, and that the bow tie analogy is more accurate. This analogy suggests that there is a central core of well-connected pages, with links IN and OUT to other pages, tendrils and orphan pages. This emphasizes the importance of choosing the right search tool, namely one that is well connected and linked to the central core of documents. 
Searchers must keep in mind that not all search engines search the Invisible Web, and this should be reflected in the choice of search tool. Not all information found on the Web and Internet is reliable, up to date and accurate, and Web information must be evaluated in terms of authority, currency, bias, purpose of the Web site, etc. Different kinds of search tools are available on the Internet, such as search engines, directories, library gateways, portals, intelligent agents, etc. These search tools were studied and explored. A new categorization for online search tools, consisting of Intelligent Agents, Search Tools, Directories and Portals/Hubs, is proposed. This categorization distinguishes the main differences between the 21 kinds of search tools studied. Search tools/engines consist of spiders, crawlers, robots, indexes and search tool software. These search tools can be further distinguished by their scope, internal or external searches, and whether they search Web pages or Web sites. Most search tools operate in a relationship with other search tools, and they often share results, spiders and databases. This relationship is very dynamic. The major international search engines have identifiable search features. The features of Google, Yahoo and Excite were studied in detail. Search engines search for information in different ways, and present their results differently. These characteristics are critical to the Recall/Precision ratio. A well-planned search strategy will improve the Recall/Precision ratio. Internet search tools/engines are not the panacea for all information needs, and have advantages and disadvantages. The development of the Internet search tool evaluation instrument was based on desirable features of the major search tools, and it is regarded as a standard for Internet search tools. 
This instrument, applied to three South African search engines, provided insight into the effectiveness of the local search tools as compared with the standard proposed in this study. The study concludes that the local search engines compare favourably with the major search engines, but not sufficiently for them to be used exclusively. Further research on this aspect is needed. Intelligent Agents will probably become more popular, but the only certainty for the future of Internet search tools is change, change and yet more change. Internet Internet searching Search engines Web search engines Dissertations -- Information science Theses -- Information science
22	Capturing semantics using a link analysis based concept extractor approach Kulkarni, Swarnim January 1900 (has links) Master of Science / Department of Computing and Information Sciences / Doina Caragea / The web contains a massive amount of information and is continuously growing every day. Extracting information that is relevant to a user is an uphill task. Search engines such as Google™ and Yahoo!™ have made the task a lot easier and have indeed made people much "smarter". However, most of the existing search engines still rely on traditional keyword-based searching techniques, i.e. returning documents that contain the keywords in the query. They do not take the associated semantics into consideration. To incorporate semantics into search, one could proceed in at least two ways. Firstly, we could plunge into the world of the "Semantic Web", where the information is represented in formal formats such as RDF, N3, etc., which can effectively capture the associated semantics in the documents. Secondly, we could try to explore a new semantic world in the existing structure of the World Wide Web (WWW). While the first approach can be very effective when semantic information is available in RDF/N3 formats, for many web pages such information is not readily available. This is why we consider the second approach in this work. In this work, we attempt to capture the semantics associated with a query by first extracting the concepts relevant to the query. For this purpose, we propose a novel Link Analysis based Concept Extractor (LACE) that extracts the concepts associated with the query by exploiting the metadata of a web page. Next, we propose a method to determine relationships between a query and its extracted concepts. Finally, we show how LACE can be used to compute a statistical measure of semantic similarity between concepts. 
At each step, we evaluate our approach by comparison with other existing techniques (on benchmark data sets, when available) and show that our results are competitive with existing state of the art results or even outperform them. Semantic Web Concept Search Engines Semantic Relationships Computer Science (0984)
23	Semantically-enhanced image tagging system Rahuma, Awatef January 2013 (has links) In multimedia databases, data are images, audio, video, texts, etc. Research interest in these types of databases has increased in the last decade or so, especially with the advent of the Internet and the Semantic Web. Fundamental research issues range from unified data modelling and retrieval of data items to the dynamic nature of updates. The thesis builds on findings in Semantic Web and retrieval techniques and explores novel tagging methods for identifying data items. Tagging systems, which enable users to add tags to Internet resources such as images, video and audio to make them more manageable, have become popular. Collaborative tagging is concerned with the relationship between people and resources. Most of these resources have metadata in machine-processable format and enable users to use free-text keywords (so-called tags) as search techniques. This research references some tagging systems, e.g. Flickr, delicious and myweb2.0. The limitations of such techniques include polysemy (one word with different meanings), synonymy (different words with one meaning), different lexical forms (singular, plural, and conjugated words) and misspelling errors or alternate spellings. The work presented in this thesis introduces a semantic characterization of web resources that describes the structure and organization of tagging, aiming to extend the existing Multimedia Query using similarity measures to cater for collaborative tagging. In addition, we discuss the semantic difficulties of tagging systems, suggesting improvements in their accuracy. The scope of our work is classified as follows: (i) Increase the accuracy and confidence of multimedia tagging systems. (ii) Increase the similarity measures of images by integrating varieties of measures. 
To address the first shortcoming, we use WordNet, based on a tagging system for social sharing and retrieval of images, as a semantic lingual ontology resource. For the second shortcoming, we use the similarity measures in different ways to recognise the multimedia tagging system. Fundamental to our work is the novel information model that we have constructed for our computation. This is based on the fact that an image is a rich object that can be characterised and formulated in n dimensions, where each dimension contains valuable information that will help to increase the accuracy of the search. For example, an image of a tree in a forest contains more information than an image of the same tree in a different environment. In this thesis we characterise a data item (an image) by a primary description, followed by n secondary descriptions. As n increases, the accuracy of the search improves. We give various techniques to analyse data and its associated query. To increase the accuracy of the tagging system, we have performed different experiments on many images using similarity measures and various techniques from VoI (Value of Information). The findings show the linkage/integration between similarity measures, and that VoI improves searches and helps guide a tagger in choosing the most adequate tags. 005.1
24	Adaptive Comparison-Based Algorithms for Evaluating Set Queries Mirzazadeh, Mehdi January 2004 (has links) In this thesis we study a problem that arises in answering boolean queries submitted to a search engine. Usually a search engine stores the set of IDs of documents containing each word in a pre-computed sorted order, and to evaluate a query like "computer AND science" the search engine has to evaluate the intersection of the sets of documents containing the words "computer" and "science". More complex queries will result in more complex set expressions. In this thesis we consider the problem of evaluating a set expression with union and intersection as operators and ordered sets as operands. We explore properties of comparison-based algorithms for the problem. A proof of a set expression is the set of comparisons that a comparison-based algorithm performs before it can determine the result of the expression. We discuss the properties of the proofs of set expressions and, based on how complex the smallest proofs of a set expression E are, we define a measure of how difficult it is for E to be computed. Then, we design an algorithm that is adaptive to the difficulty of the input expression, and we show that the running time of the algorithm is roughly proportional to the difficulty of the input expression, where the factor is roughly logarithmic in the number of operands of the input expression. Computer Science Adaptive algorithm comparison-based algorithm search engines algorithms
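As an illustration of the kind of problem this record describes, a query such as "computer AND science" comes down to intersecting two sorted lists of document IDs. The sketch below is not the thesis's algorithm; it is a minimal, assumed example of one well-known adaptive technique, galloping (exponential) search, in which the number of comparisons depends on how the two lists interleave rather than only on their lengths. The function names `gallop` and `intersect` and the sample posting lists are hypothetical.

```python
from bisect import bisect_left


def gallop(seq, target, lo):
    """Return the first index i >= lo with seq[i] >= target.

    Probes positions at exponentially growing distances, then finishes
    with a binary search inside the last bracket.
    """
    step = 1
    hi = lo
    while hi < len(seq) and seq[hi] < target:
        lo = hi + 1          # everything up to hi is known to be too small
        hi += step
        step *= 2
    return bisect_left(seq, target, lo, min(hi, len(seq)))


def intersect(a, b):
    """Intersect two sorted lists of distinct document IDs."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = gallop(a, b[j], i)   # skip ahead in a
        else:
            j = gallop(b, a[i], j)   # skip ahead in b
    return result


# Hypothetical posting lists for the two query words.
computer = [2, 5, 9, 14, 21, 30]
science = [5, 7, 14, 30, 41]
print(intersect(computer, science))
```

When one list is much shorter than the other, or the matches are clustered, the galloping steps skip long runs with few comparisons, which is the sense in which such an algorithm adapts to the difficulty of its input.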
25	Finding structure and characteristic of web documents for classification. January 2000 (has links) by Wong, Wai Ching. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 91-94). / Abstracts in English and Chinese. / Abstract / Acknowledgments / Chapter 1 --- Introduction / Chapter 1.1 --- Semistructured Data / Chapter 1.2 --- Problem Addressed in the Thesis / Chapter 1.2.1 --- Labels and Values / Chapter 1.2.2 --- Discover Labels for the Same Attribute / Chapter 1.2.3 --- Classifying A Web Page / Chapter 1.3 --- Organization of the Thesis / Chapter 2 --- Background / Chapter 2.1 --- Related Work on Web Data / Chapter 2.1.1 --- Object Exchange Model (OEM) / Chapter 2.1.2 --- Schema Extraction / Chapter 2.1.3 --- Discovering Typical Structure / Chapter 2.1.4 --- Information Extraction of Web Data / Chapter 2.2 --- Automatic Text Processing / Chapter 2.2.1 --- Stopwords Elimination / Chapter 2.2.2 --- Stemming / Chapter 3 --- Web Data Definition / Chapter 3.1 --- Web Page / Chapter 3.2 --- Problem Description / Chapter 4 --- Hierarchical Structure / Chapter 4.1 --- Types of HTML Tags / Chapter 4.2 --- Tag-tree / Chapter 4.3 --- Hierarchical Structure Construction / Chapter 4.4 --- Hierarchical Structure Statistics / Chapter 5 --- Similar Labels Discovery / Chapter 5.1 --- Expression of Hierarchical Structure / Chapter 5.2 --- Labels Discovery Algorithm / Chapter 5.2.1 --- Phase 1: Remove Non-label Nodes / Chapter 5.2.2 --- Phase 2: Identify Label Nodes / Chapter 5.2.3 --- Phase 3: Discover Similar Labels / Chapter 5.3 --- Performance Evaluation of Labels Discovery Algorithm / Chapter 5.3.1 --- Phase 1 Results / Chapter 5.3.2 --- Phase 2 Results / Chapter 5.3.3 --- Phase 3 Results / Chapter 5.4 --- Classifying a Web Page / Chapter 5.4.1 --- Similarity Measurement / Chapter 5.4.2 --- Performance Evaluation / Chapter 6 --- Conclusion World Wide Web Information organization Web search engines
26	A Nearest-Neighbor Approach to Indicative Web Summarization Petinot, Yves January 2016 (has links) Through their role of content proxy, in particular on search engine result pages, Web summaries play an essential part in the discovery of information and services on the Web. In their simplest form, Web summaries are snippets based on a user-query and are obtained by extracting from the content of Web pages. The focus of this work, however, is on indicative Web summarization, that is, on the generation of summaries describing the purpose, topics and functionalities of Web pages. In many scenarios — e.g. navigational queries or content-deprived pages — such summaries represent a valuable commodity to concisely describe Web pages while circumventing the need to produce snippets from inherently noisy, dynamic, and structurally complex content. Previous approaches have identified linking pages as a privileged source of indicative content from which Web summaries may be derived using traditional extractive methods. To be reliable, these approaches require sufficient anchortext redundancy, ultimately showing the limits of extractive algorithms for what is, fundamentally, an abstractive task. In contrast, we explore the viability of abstractive approaches and propose a nearest-neighbors summarization framework leveraging summaries of conceptually related (neighboring) Web pages. We examine the steps that can lead to the reuse and adaptation of existing summaries to previously unseen pages. Specifically, we evaluate two Text-to-Text transformations that cover the main types of operations applicable to neighbor summaries: (1) ranking, to identify neighbor summaries that best fit the target; (2) target adaptation, to adjust individual neighbor summaries to the target page based on neighborhood-specific template-slot models. 
For this last transformation, we report on an initial exploration of the use of slot-driven compression to adjust adapted summaries based on the confidence associated with token-level adaptation operations. Overall, this dissertation explores a new research avenue for indicative Web summarization and shows the potential value, given the diversity and complexity of the content of Web pages, of transferring, and, when necessary, of adapting, existing summary information between conceptually similar Web pages. Information retrieval Web search engines Internet searching Computer science
27	Cross-media meta-search engine. January 2005 (has links) Cheng Tung Yin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 136-141). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Overview / Chapter 1.1.1 --- Information Retrieval / Chapter 1.1.2 --- Search Engines / Chapter 1.1.3 --- Data Merging / Chapter 1.2 --- Meta-search Engines / Chapter 1.2.1 --- Framework and Techniques Employed / Chapter 1.2.2 --- Advantages of meta-searching / Chapter 1.3 --- Contribution of the Thesis / Chapter 1.4 --- Organization of the Thesis / Chapter 2 --- Literature Review / Chapter 2.1 --- Preliminaries / Chapter 2.2 --- Fusion Methods / Chapter 2.2.1 --- Fusion methods based on a document's score / Chapter 2.2.2 --- Fusion methods based on a document's ranking position / Chapter 2.2.3 --- Fusion methods based on a document's URL title and snippets / Chapter 2.2.4 --- Fusion methods based on a document's entire content / Chapter 2.3 --- Comparison of the Fusion Methods / Chapter 2.4 --- Relevance Feedback / Chapter 3 --- Research Methodology / Chapter 3.1 --- Investigation of the features of the retrieved results from the search engines / Chapter 3.2 --- Types of relationships / Chapter 3.3 --- Order of Strength of the Relationships / Chapter 3.3.1 --- Derivation of the weight for each kind of relationship (criterion) / Chapter 3.4 --- Observation of the relationships between retrieved objects and the effects of these relationships on the relevance of objects / Chapter 3.4.1 --- Observation on the relationships existed in items that are irrelevant and relevant to the query / Chapter 3.5 --- Proposed re-ranking algorithms / Chapter 3.5.1 --- Original re-ranking algorithm (before modification) / Chapter 3.5.2 --- Modified re-ranking algorithm (after modification) / Chapter 4 --- Evaluation Methodology and Experimental Results / Chapter 4.1 --- Objective / Chapter 4.2 --- Experimental Design and Setup / Chapter 4.2.1 --- Preparation of data / Chapter 4.3 --- Evaluation Methodology / Chapter 4.3.1 --- Evaluation of the relevance of a document to the corresponding query / Chapter 4.3.2 --- Performance Measures of the Evaluation / Chapter 4.4 --- Experimental Results and Interpretation / Chapter 4.4.1 --- Precision / Chapter 4.4.2 --- Recall / Chapter 4.4.3 --- F-measure / Chapter 4.4.4 --- Overall evaluation results for the ten queries for each evaluation tool / Chapter 4.4.5 --- Discussion / Chapter 4.5 --- Degree of difference between the performance of systems / Chapter 4.5.1 --- Analysis using One-Way ANOVA / Chapter 4.5.2 --- Analysis using paired samples T-test / Chapter 5 --- Conclusion / Chapter 5.1 --- Implications, Limitations, and Future Work / Chapter 5.2 --- Conclusions / Bibliography / Chapter A --- Paired samples T-test for F-measures of systems retrieving all media's items Web search engines Multimedia systems Internet searching Computer algorithms
28	Unsupervised extraction and normalization of product attributes from web pages. January 2010 (has links) Xiong, Jiani. / "July 2010." / Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (p. 59-63). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Background / Chapter 1.2 --- Motivation / Chapter 1.3 --- Our Approach / Chapter 1.4 --- Potential Applications / Chapter 1.5 --- Research Contributions / Chapter 1.6 --- Thesis Organization / Chapter 2 --- Literature Survey / Chapter 2.1 --- Supervised Extraction Approaches / Chapter 2.2 --- Unsupervised Extraction Approaches / Chapter 2.3 --- Attribute Normalization / Chapter 2.4 --- Integrated Approaches / Chapter 3 --- Problem Definition and Preliminaries / Chapter 3.1 --- Problem Definition / Chapter 3.2 --- Preliminaries / Chapter 3.2.1 --- Web Pre-processing / Chapter 3.2.2 --- Overview of Our Framework / Chapter 3.2.3 --- Background of Graphical Models / Chapter 4 --- Our Proposed Framework / Chapter 4.1 --- Our Proposed Graphical Model / Chapter 4.2 --- Inference / Chapter 4.3 --- Product Attribute Information Determination / Chapter 5 --- Experiments and Results / Chapter 6 --- Conclusion / Bibliography / Chapter A --- Dirichlet Process / Chapter B --- Hidden Markov Models Data mining--Mathematical models Search engines
29	Doctoral students’ mental models of a web search engine : an exploratory study Li, Ping, 1965- January 2007 (has links) No description available. Google College students -- Psychology. Search engines.
30	Efficient Index Maintenance for Text Databases Lester, Nicholas, nml@cs.rmit.edu.au January 2006 (has links) All practical text search systems use inverted indexes to quickly resolve user queries. Offline index construction algorithms, where queries are not accepted during construction, have been the subject of much prior research. As a result, current techniques can invert virtually unlimited amounts of text in limited main memory, making efficient use of both time and disk space. However, these algorithms assume that the collection does not change during the use of the index. This thesis examines the task of index maintenance, the problem of adapting an inverted index to reflect changes in the collection it describes. Existing approaches to index maintenance are discussed, including proposed optimisations. We present analysis and empirical evidence suggesting that existing maintenance algorithms either scale poorly to large collections, or significantly degrade query resolution speed. In addition, we propose a new strategy for index maintenance that trades a strictly controlled amount of querying efficiency for greatly increased maintenance speed and scalability. Analysis and empirical results are presented that show that this new algorithm is a useful trade-off between indexing and querying efficiency. In scenarios described in Chapter 7, the use of the new maintenance algorithm reduces the time required to construct an index to under one sixth of the time taken by algorithms that maintain contiguous inverted lists. In addition to work on index maintenance, we present a new technique for accumulator pruning during ranked query evaluation, as well as providing evidence that existing approaches are unsatisfactory for collections of large size. Accumulator pruning is a key problem in both querying efficiency and overall text search system efficiency. 
Existing approaches either fail to bound the memory footprint required for query evaluation, or suffer loss of retrieval accuracy. In contrast, the new pruning algorithm can be used to limit the memory footprint of ranked query evaluation, and in our experiments gives retrieval accuracy not worse than previous alternatives. The results presented in this thesis are validated with robust experiments, which utilise collections of significant size, containing real data, and tested using appropriate numbers of real queries. The techniques presented in this thesis allow information retrieval applications to efficiently index and search changing collections, a task that has been historically problematic. text indexing search engines index construction index update accumulator pruning
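To make the index-maintenance problem in this record concrete, the sketch below shows a toy inverted index that accepts new documents while remaining queryable. It is not the algorithm proposed in the thesis, whose details the abstract does not give; it assumes a simple, generic strategy in which new postings are collected in a small in-memory buffer and periodically merged into the main index. The class name `UpdatableIndex`, the `buffer_limit` parameter and the sample documents are all hypothetical.

```python
from collections import defaultdict


class UpdatableIndex:
    """Toy inverted index: term -> sorted list of document IDs."""

    def __init__(self, buffer_limit=2):
        self.main = {}                     # stands in for the large on-disk index
        self.buffer = defaultdict(list)    # recent postings held in memory
        self.buffered_docs = 0
        self.buffer_limit = buffer_limit   # documents buffered before a merge
        self.next_id = 0

    def add_document(self, text):
        """Index a new document and return its ID."""
        doc_id = self.next_id
        self.next_id += 1
        for term in set(text.lower().split()):
            self.buffer[term].append(doc_id)
        self.buffered_docs += 1
        if self.buffered_docs >= self.buffer_limit:
            self.flush()
        return doc_id

    def flush(self):
        """Merge buffered postings into the main index.

        IDs only grow, so appending keeps every list sorted.
        """
        for term, postings in self.buffer.items():
            self.main.setdefault(term, []).extend(postings)
        self.buffer.clear()
        self.buffered_docs = 0

    def postings(self, term):
        """Answer a query from the main index plus the unmerged buffer."""
        term = term.lower()
        return self.main.get(term, []) + self.buffer.get(term, [])


index = UpdatableIndex(buffer_limit=2)
index.add_document("text search")
index.add_document("inverted index search")
index.add_document("index update")
print(index.postings("search"), index.postings("index"))
```

The trade-off the abstract refers to shows up even here: a larger `buffer_limit` means fewer merges into the main index, while every query has to consult both structures.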
