Global ETD Search

31	Finding structure and characteristic of web documents for classification. January 2000 by Wong, Wai Ching. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 91-94). / Abstracts in English and Chinese. / Abstract / Acknowledgments / Chapter 1 --- Introduction / Chapter 1.1 --- Semistructured Data / Chapter 1.2 --- Problem Addressed in the Thesis / Chapter 1.2.1 --- Labels and Values / Chapter 1.2.2 --- Discover Labels for the Same Attribute / Chapter 1.2.3 --- Classifying A Web Page / Chapter 1.3 --- Organization of the Thesis / Chapter 2 --- Background / Chapter 2.1 --- Related Work on Web Data / Chapter 2.1.1 --- Object Exchange Model (OEM) / Chapter 2.1.2 --- Schema Extraction / Chapter 2.1.3 --- Discovering Typical Structure / Chapter 2.1.4 --- Information Extraction of Web Data / Chapter 2.2 --- Automatic Text Processing / Chapter 2.2.1 --- Stopwords Elimination / Chapter 2.2.2 --- Stemming / Chapter 3 --- Web Data Definition / Chapter 3.1 --- Web Page / Chapter 3.2 --- Problem Description / Chapter 4 --- Hierarchical Structure / Chapter 4.1 --- Types of HTML Tags / Chapter 4.2 --- Tag-tree / Chapter 4.3 --- Hierarchical Structure Construction / Chapter 4.4 --- Hierarchical Structure Statistics / Chapter 5 --- Similar Labels Discovery / Chapter 5.1 --- Expression of Hierarchical Structure / Chapter 5.2 --- Labels Discovery Algorithm / Chapter 5.2.1 --- Phase 1: Remove Non-label Nodes / Chapter 5.2.2 --- Phase 2: Identify Label Nodes / Chapter 5.2.3 --- Phase 3: Discover Similar Labels / Chapter 5.3 --- Performance Evaluation of Labels Discovery Algorithm / Chapter 5.3.1 --- Phase 1 Results / Chapter 5.3.2 --- Phase 2 Results / Chapter 5.3.3 --- Phase 3 Results / Chapter 5.4 --- Classifying a Web Page / Chapter 5.4.1 --- Similarity Measurement / Chapter 5.4.2 --- Performance Evaluation / Chapter 6 --- Conclusion World Wide Web Information organization Web search engines
32	A Nearest-Neighbor Approach to Indicative Web Summarization Petinot, Yves January 2016 Through their role of content proxy, in particular on search engine result pages, Web summaries play an essential part in the discovery of information and services on the Web. In their simplest form, Web summaries are snippets based on a user-query and are obtained by extracting from the content of Web pages. The focus of this work, however, is on indicative Web summarization, that is, on the generation of summaries describing the purpose, topics and functionalities of Web pages. In many scenarios (e.g. navigational queries or content-deprived pages) such summaries represent a valuable commodity to concisely describe Web pages while circumventing the need to produce snippets from inherently noisy, dynamic, and structurally complex content. Previous approaches have identified linking pages as a privileged source of indicative content from which Web summaries may be derived using traditional extractive methods. To be reliable, these approaches require sufficient anchortext redundancy, ultimately showing the limits of extractive algorithms for what is, fundamentally, an abstractive task. In contrast, we explore the viability of abstractive approaches and propose a nearest-neighbors summarization framework leveraging summaries of conceptually related (neighboring) Web pages. We examine the steps that can lead to the reuse and adaptation of existing summaries to previously unseen pages. Specifically, we evaluate two Text-to-Text transformations that cover the main types of operations applicable to neighbor summaries: (1) ranking, to identify neighbor summaries that best fit the target; (2) target adaptation, to adjust individual neighbor summaries to the target page based on neighborhood-specific template-slot models.
For this last transformation, we report on an initial exploration of the use of slot-driven compression to adjust adapted summaries based on the confidence associated with token-level adaptation operations. Overall, this dissertation explores a new research avenue for indicative Web summarization and shows the potential value, given the diversity and complexity of the content of Web pages, of transferring, and, when necessary, of adapting, existing summary information between conceptually similar Web pages. Information retrieval Web search engines Internet searching Computer science
33	Cross-media meta-search engine. January 2005 Cheng Tung Yin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 136-141). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Overview / Chapter 1.1.1 --- Information Retrieval / Chapter 1.1.2 --- Search Engines / Chapter 1.1.3 --- Data Merging / Chapter 1.2 --- Meta-search Engines / Chapter 1.2.1 --- Framework and Techniques Employed / Chapter 1.2.2 --- Advantages of meta-searching / Chapter 1.3 --- Contribution of the Thesis / Chapter 1.4 --- Organization of the Thesis / Chapter 2 --- Literature Review / Chapter 2.1 --- Preliminaries / Chapter 2.2 --- Fusion Methods / Chapter 2.2.1 --- Fusion methods based on a document's score / Chapter 2.2.2 --- Fusion methods based on a document's ranking position / Chapter 2.2.3 --- Fusion methods based on a document's URL title and snippets / Chapter 2.2.4 --- Fusion methods based on a document's entire content / Chapter 2.3 --- Comparison of the Fusion Methods / Chapter 2.4 --- Relevance Feedback / Chapter 3 --- Research Methodology / Chapter 3.1 --- Investigation of the features of the retrieved results from the search engines / Chapter 3.2 --- Types of relationships / Chapter 3.3 --- Order of Strength of the Relationships / Chapter 3.3.1 --- Derivation of the weight for each kind of relationship (criterion) / Chapter 3.4 --- Observation of the relationships between retrieved objects and the effects of these relationships on the relevance of objects / Chapter 3.4.1 --- Observation on the relationships existed in items that are irrelevant and relevant to the query / Chapter 3.5 --- Proposed re-ranking algorithms / Chapter 3.5.1 --- Original re-ranking algorithm (before modification) / Chapter 3.5.2 --- Modified re-ranking algorithm (after modification) / Chapter 4 --- Evaluation Methodology and Experimental Results / Chapter 4.1 --- Objective / Chapter 4.2 --- Experimental Design and Setup / Chapter 4.2.1 --- Preparation of data / Chapter 4.3 --- Evaluation Methodology / Chapter 4.3.1 --- Evaluation of the relevance of a document to the corresponding query / Chapter 4.3.2 --- Performance Measures of the Evaluation / Chapter 4.4 --- Experimental Results and Interpretation / Chapter 4.4.1 --- Precision / Chapter 4.4.2 --- Recall / Chapter 4.4.3 --- F-measure / Chapter 4.4.4 --- Overall evaluation results for the ten queries for each evaluation tool / Chapter 4.4.5 --- Discussion / Chapter 4.5 --- Degree of difference between the performance of systems / Chapter 4.5.1 --- Analysis using One-Way ANOVA / Chapter 4.5.2 --- Analysis using paired samples T-test / Chapter 5 --- Conclusion / Chapter 5.1 --- Implications, Limitations, and Future Work / Chapter 5.2 --- Conclusions / Bibliography / Chapter A --- Paired samples T-test for F-measures of systems retrieving all media's items Web search engines Multimedia systems Internet searching Computer algorithms
34	Unsupervised extraction and normalization of product attributes from web pages. January 2010 Xiong, Jiani. / "July 2010." / Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (p. 59-63). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Background / Chapter 1.2 --- Motivation / Chapter 1.3 --- Our Approach / Chapter 1.4 --- Potential Applications / Chapter 1.5 --- Research Contributions / Chapter 1.6 --- Thesis Organization / Chapter 2 --- Literature Survey / Chapter 2.1 --- Supervised Extraction Approaches / Chapter 2.2 --- Unsupervised Extraction Approaches / Chapter 2.3 --- Attribute Normalization / Chapter 2.4 --- Integrated Approaches / Chapter 3 --- Problem Definition and Preliminaries / Chapter 3.1 --- Problem Definition / Chapter 3.2 --- Preliminaries / Chapter 3.2.1 --- Web Pre-processing / Chapter 3.2.2 --- Overview of Our Framework / Chapter 3.2.3 --- Background of Graphical Models / Chapter 4 --- Our Proposed Framework / Chapter 4.1 --- Our Proposed Graphical Model / Chapter 4.2 --- Inference / Chapter 4.3 --- Product Attribute Information Determination / Chapter 5 --- Experiments and Results / Chapter 6 --- Conclusion / Bibliography / Chapter A --- Dirichlet Process / Chapter B --- Hidden Markov Models Data mining--Mathematical models Search engines
35	Doctoral students’ mental models of a web search engine : an exploratory study Li, Ping, 1965- January 2007 No description available. Google College students -- Psychology. Search engines.
36	Efficient Index Maintenance for Text Databases Lester, Nicholas, nml@cs.rmit.edu.au January 2006 All practical text search systems use inverted indexes to quickly resolve user queries. Offline index construction algorithms, where queries are not accepted during construction, have been the subject of much prior research. As a result, current techniques can invert virtually unlimited amounts of text in limited main memory, making efficient use of both time and disk space. However, these algorithms assume that the collection does not change during the use of the index. This thesis examines the task of index maintenance, the problem of adapting an inverted index to reflect changes in the collection it describes. Existing approaches to index maintenance are discussed, including proposed optimisations. We present analysis and empirical evidence suggesting that existing maintenance algorithms either scale poorly to large collections, or significantly degrade query resolution speed. In addition, we propose a new strategy for index maintenance that trades a strictly controlled amount of querying efficiency for greatly increased maintenance speed and scalability. Analysis and empirical results are presented that show that this new algorithm is a useful trade-off between indexing and querying efficiency. In scenarios described in Chapter 7, the use of the new maintenance algorithm reduces the time required to construct an index to under one sixth of the time taken by algorithms that maintain contiguous inverted lists. In addition to work on index maintenance, we present a new technique for accumulator pruning during ranked query evaluation, as well as providing evidence that existing approaches are unsatisfactory for collections of large size. Accumulator pruning is a key problem in both querying efficiency and overall text search system efficiency.
Existing approaches either fail to bound the memory footprint required for query evaluation, or suffer loss of retrieval accuracy. In contrast, the new pruning algorithm can be used to limit the memory footprint of ranked query evaluation, and in our experiments gives retrieval accuracy not worse than previous alternatives. The results presented in this thesis are validated with robust experiments, which utilise collections of significant size, containing real data, and tested using appropriate numbers of real queries. The techniques presented in this thesis allow information retrieval applications to efficiently index and search changing collections, a task that has been historically problematic. text indexing search engines index construction index update accumulator pruning
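To make the index maintenance problem in the abstract above concrete, the following is a minimal in-memory sketch of an inverted index that accepts new documents after construction. It is not the thesis's algorithm: the class name `InvertedIndex` and its methods are hypothetical, whitespace tokenization is an assumption, and the disk-layout concerns the thesis addresses (contiguous versus non-contiguous inverted lists) are not modeled.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index (hypothetical): term -> sorted list of document IDs."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.next_doc_id = 0

    def add_document(self, text):
        """Index one document and return its ID; the index stays queryable."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        # Assumption: terms are lowercase whitespace-separated tokens.
        for term in set(text.lower().split()):
            # IDs are assigned in increasing order, so appending keeps
            # every posting list sorted without re-sorting.
            self.postings[term].append(doc_id)
        return doc_id

    def search(self, query):
        """Return sorted IDs of documents that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.postings.get(term, []))
        return sorted(result)


index = InvertedIndex()
index.add_document("efficient index maintenance")
index.add_document("index update for text databases")
print(index.search("index update"))
```

In memory, appending to a posting list is cheap. The difficulty the abstract describes arises on disk, where growing a list may force it to be moved or fragmented.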
37	Search Engine Optimisation Using Past Queries Garcia, Steven, steven.garcia@student.rmit.edu.au January 2008 World Wide Web search engines process millions of queries per day from users all over the world. Efficient query evaluation is achieved through the use of an inverted index, where, for each word in the collection, the index maintains a list of the documents in which the word occurs. Query processing may also require access to document specific statistics, such as document length; access to word statistics, such as the number of unique documents in which a word occurs; and collection specific statistics, such as the number of documents in the collection. The index maintains individual data structures for each of these sources of information, and repeatedly accesses each to process a query. A by-product of a web search engine is a list of all queries entered into the engine: a query log. Analyses of query logs have shown repetition of query terms in the requests made to the search system. In this work we explore techniques that take advantage of the repetition of user queries to improve the accuracy or efficiency of text search. We introduce an index organisation scheme that favours those documents that are most frequently requested by users and show that, in combination with early termination heuristics, query processing time can be dramatically reduced without reducing the accuracy of the search results. We examine the stability of such an ordering and show that an index based on as little as 100,000 training queries can support at least 20 million requests. We show the correlation between frequently accessed documents and relevance, and attempt to exploit the demonstrated relationship to improve search effectiveness. Finally, we deconstruct the search process to show that query time redundancy can be exploited at various levels of the search process.
We develop a model that illustrates the improvements that can be achieved in query processing time by caching different components of a search system. This model is then validated by simulation using a document collection and query log. Results on our test data show that a well-designed cache can reduce disk activity by more than 30%, with a cache that is one tenth the size of the collection. Information retrieval search engines access-ordering caching efficiency
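The idea of validating a caching model by replaying a query log can be illustrated with a small simulation. This is a sketch under stated assumptions, not the model from the thesis: the least-recently-used eviction policy, the `LRUCache` class and the `simulate` function are all hypothetical choices made for illustration, and the cached value is a placeholder string rather than real search results.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical least-recently-used cache keyed by query string."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        """Return the cached value for key, computing it on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = compute(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value


def simulate(query_log, capacity):
    """Replay a query log through the cache and return (hits, misses)."""
    cache = LRUCache(capacity)
    for query in query_log:
        # Placeholder for real query evaluation against an index.
        cache.get(query, lambda q: f"results for {q}")
    return cache.hits, cache.misses


log = ["a", "b", "a", "c", "b", "a"]
print(simulate(log, capacity=3))
```

Each miss stands in for work that would otherwise reach the index on disk, so the hit count is a rough proxy for the disk activity a cache could save on a repetitive query stream.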
38	Web-based distributed applications for cytosensor Liew, Ji Seok 17 March 2003 To protect the environment and save human lives, the detection of various hazardous toxins of biological or chemical origin has been a major challenge to the researchers at Oregon State University. Living fish cells can indicate the presence of a wide range of toxins by reactions such as changes in color and shape. A research team in the Electrical and Computer Engineering Department is developing a hybrid detection device (Cytosensor) that combines biological reaction and digital technology. The functions of Cytosensor can be divided into three parts: real-time image acquisition, data processing, and statistical data analysis. User-friendly Web-Based Distributed Applications (WBDA) for Cytosensor offer various utilities. WBDA allow the users to control and observe the local Cytosensor, search and retrieve data acquired by the sensor network, and process the acquired images remotely using only a web browser. Additionally, these applications minimize the user's exposure to dangerous chemicals or biological products. This thesis describes the design of a remote controller, system observer, remote processor, and search engine using JAVA applets, XML, Perl, MATLAB, and Peer-to-Peer models. Furthermore, the implementations of an image segmentation technique in MATLAB and the Machine Vision Algorithm in JAVA for independent web-based processing are investigated. / Graduation date: 2003 Application software Cytosensor Search engines -- Programming Biosensors -- Computer programs
39	Google advanced search Unruh, Miriam, McLean, Cheryl, Tittenberger, Peter, Schor, Dario 21 March 2006 After completing this tutorial you will be able to use multiple search terms and other advanced features in "Google." This flash tutorial requires a screen resolution of 1024 x 768 or higher. google search web research Google (Firm) Internet searching Search Engines
40	Adaptive Comparison-Based Algorithms for Evaluating Set Queries Mirzazadeh, Mehdi January 2004 In this thesis we study a problem that arises in answering boolean queries submitted to a search engine. Usually a search engine stores the set of IDs of documents containing each word in a pre-computed sorted order, and to evaluate a query like "computer AND science" the search engine has to evaluate the intersection of the sets of documents containing the words "computer" and "science". More complex queries will result in more complex set expressions. In this thesis we consider the problem of evaluation of a set expression with union and intersection as operators and ordered sets as operands. We explore properties of comparison-based algorithms for the problem. A proof of a set expression is the set of comparisons that a comparison-based algorithm performs before it can determine the result of the expression. We discuss the properties of the proofs of set expressions and based on how complex the smallest proofs of a set expression E are, we define a measurement for determining how difficult it is for E to be computed. Then, we design an algorithm that is adaptive to the difficulty of the input expression and we show that the running time of the algorithm is roughly proportional to the difficulty of the input expression, where the factor is roughly logarithmic in the number of the operands of the input expression. Computer Science Adaptive algorithm comparison-based algorithm search engines algorithms
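As an illustration of what "adaptive" can mean for the simplest set expression, the sketch below intersects two sorted ID lists using galloping (exponential) search, a well-known comparison-based technique whose cost depends on how the two lists interleave rather than only on their lengths. It is not claimed to be the algorithm designed in the thesis, and the function names `gallop` and `intersect` are hypothetical.

```python
from bisect import bisect_left


def gallop(sorted_ids, target, start):
    """Return the first index i >= start with sorted_ids[i] >= target.

    Probes at exponentially growing distances from start, then finishes
    with a binary search inside the last bracket.
    """
    n = len(sorted_ids)
    lo = hi = start
    step = 1
    while hi < n and sorted_ids[hi] < target:
        lo = hi + 1  # everything up to hi is known to be too small
        hi += step
        step *= 2
    return bisect_left(sorted_ids, target, lo, min(hi, n))


def intersect(a, b):
    """Intersect two sorted lists of distinct document IDs."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = gallop(a, b[j], i)  # skip the run of a that is below b[j]
        else:
            j = gallop(b, a[i], j)  # skip the run of b that is below a[i]
    return result


computer = [1, 3, 5, 7, 9, 11]
science = [2, 3, 4, 9, 10, 20]
print(intersect(computer, science))  # IDs matching "computer AND science"
```

When one list contains long runs that fall between consecutive elements of the other, galloping skips each run in logarithmic time, so "easy" inputs need far fewer comparisons than a linear merge.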
