21 |
The implementation and use of a logic based approach to assist retrieval from a relational database / Jones, P. January 1988 (has links)
No description available.
|
22 |
A concept-space based multi-document text summarizer. January 2001 (has links)
by Tang Ting Kap. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 88-94). / Abstracts in English and Chinese. / Contents: 1. Introduction (information overloading and low utilization; problem to solve; research contributions: using concept space in summarization, new extraction method, experiments on the new system; organization of the thesis). 2. Literature Review (classical approach: Luhn's algorithm, Edmundson's algorithm; statistical approach; natural language processing approach). 3. Proposed Summarization Approach (direction of summarization; overview of the summarization algorithm: document pre-processing, vector space model, sentence extraction; evaluation method: recall, precision and F-measure; advantage of the concept space approach). 4. System Architecture (converge process; diverge process; backward search). 5. Converge Process (document merging; word phrase extraction; automatic indexing; cluster analysis; Hopfield net classification). 6. Diverge Process (concept terms refinement; sentence selection; backward searching). 7. Experiment and Research Findings (system-generated summary vs. source documents: compression ratio, information loss; system-generated summary vs. human-generated summary: background of EXTRACTOR, evaluation method; evaluation of different system-generated summaries by human experts). 8. Conclusions and Future Research. Appendices A-I: EXTRACTOR system flow and ten-step procedure; summary generated by MS Word 2000; summary generated by EXTRACTOR software; summary generated by our system; system-generated word phrases from test sample; word phrases identified by subjects; sample of questionnaire; result of questionnaire; evaluation for diverge process. Bibliography.
|
23 |
A tightness continuum measure of Chinese semantic units, and its application to information retrieval / Xu, Ying 06 1900 (has links)
Chinese differs from alphabetic languages such as English in that there are no delimiters between words, so word segmentation is an important first step for most Chinese natural language processing (NLP) tasks.
We propose a tightness continuum for Chinese semantic units, constructed from statistical information. Based on this continuum, character sequences can be segmented dynamically, and the resulting segmentation can be exploited in a number of information retrieval (IR) tasks.
To show that the tightness continuum is useful for NLP tasks, we propose two methods for exploiting it within IR systems: the first refines the output of a general Chinese word segmenter, and the second embeds the tightness value into IR scoring functions. Experimental results show that our tightness measure is reasonable and does improve the performance of IR systems.
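As a rough illustration of the second method, the sketch below folds a per-phrase tightness value into a BM25-style scoring function. The tightness table, the weighting parameter alpha, and the overall formulation are illustrative assumptions, not the thesis's actual score function.

```python
import math
from collections import Counter

# Hypothetical tightness values for candidate Chinese word phrases (0 = loose, 1 = tight).
# In practice these would come from corpus statistics; the entries here are made up.
TIGHTNESS = {"信息检索": 0.92, "检索系统": 0.55, "信息": 0.40}

def bm25_weight(tf, df, doc_len, avg_len, n_docs, k1=1.2, b=0.75):
    """Standard BM25 weight for one term in one document."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm

def tightness_bm25(query_units, doc_tokens, df, n_docs, avg_len, alpha=1.0):
    """Score a document, boosting each matched query unit in proportion to its tightness.

    alpha (assumed parameter) controls how strongly tightness influences the score.
    """
    tf = Counter(doc_tokens)
    score = 0.0
    for unit in query_units:
        if tf[unit] == 0:
            continue
        w = bm25_weight(tf[unit], df.get(unit, 1), len(doc_tokens), avg_len, n_docs)
        score += w * (1 + alpha * TIGHTNESS.get(unit, 0.0))
    return score
```

A tighter unit such as "信息检索" thus contributes more to the document score than a loosely bound sequence with the same term statistics.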
|
24 |
Effectiveness of index size reduction techniques / Jacobson, Bryan L. 19 February 1992 (has links)
Index size savings from three techniques are measured. The three techniques are: 1) eliminating common, low-information words found in a "stop list" (such as: of, the, at, etc.), 2) truncating terms by removing word endings (such as: -s, -ed, -ing, etc.), and 3) simple data compression. Savings are measured on two moderately large collections of text. The index size savings that result from using the techniques individually and in combination are reported, and the impact on query performance in terms of speed, recall, and precision is estimated. / Graduation date: 1992
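A minimal sketch of how such savings might be measured on a toy in-memory index; the stop list, the crude suffix-stripping rule, and the use of zlib compression are stand-ins for the actual techniques and collections evaluated in the thesis.

```python
import json
import zlib
from collections import defaultdict

STOP_WORDS = {"of", "the", "at", "a", "an", "and", "in", "to"}  # toy stop list
SUFFIXES = ("ing", "ed", "s")  # crude ending removal, not a real stemmer

def strip_suffix(term):
    """Remove a common word ending if the remaining stem stays reasonably long."""
    for suf in SUFFIXES:
        if term.endswith(suf) and len(term) > len(suf) + 2:
            return term[: -len(suf)]
    return term

def build_index(docs, use_stop_list=False, use_stemming=False):
    """Build a simple term -> sorted list of document ids inverted index."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            if use_stop_list and term in STOP_WORDS:
                continue
            if use_stemming:
                term = strip_suffix(term)
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

def index_size(index, compress=False):
    """Serialized index size in bytes, optionally after zlib compression."""
    data = json.dumps(index).encode("utf-8")
    return len(zlib.compress(data)) if compress else len(data)

docs = ["the cat sat at the mat", "cats sitting at mats", "the dog barked"]
baseline = index_size(build_index(docs))
reduced = index_size(build_index(docs, use_stop_list=True, use_stemming=True), compress=True)
print(f"baseline: {baseline} bytes, all three techniques combined: {reduced} bytes")
```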
|
25 |
Focused Retrieval / Itakura, Kalista Yuki January 2010 (has links)
Traditional information retrieval applications, such as Web search, return atomic units of retrieval, which are generically called "documents". Depending on the application, a document may be a Web page, an email message, a journal article, or any similar object. In contrast to this traditional approach, focused retrieval helps users better pinpoint their exact information needs by returning results at the sub-document level. These results may consist of predefined document components, such as pages, sections, and paragraphs, or they may consist of arbitrary passages comprising any sub-string of a document. If a document is marked up with XML, a focused retrieval system might return individual XML elements or ranges of elements. This thesis proposes and evaluates a number of approaches to focused retrieval, including methods based on XML markup and methods based on arbitrary passages. It considers the best unit of retrieval, explores methods for efficient sub-document retrieval, and evaluates formulae for sub-document scoring. Focused retrieval is also considered in the specific context of Wikipedia, where methods for automatic vandalism detection and automatic link generation are developed and evaluated.
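As a simple illustration of arbitrary-passage retrieval, the sketch below slides a fixed-size word window over a document and scores each window by query-term overlap; the window size, stride, and scoring function are assumptions made for this example rather than the formulae evaluated in the thesis.

```python
def passage_candidates(tokens, window=50, stride=25):
    """Yield (start, end, passage_tokens) for overlapping word windows over a document."""
    for start in range(0, max(len(tokens) - window + 1, 1), stride):
        end = min(start + window, len(tokens))
        yield start, end, tokens[start:end]

def score_passage(passage_tokens, query_terms):
    """Toy relevance score: fraction of query terms present in the passage."""
    present = sum(1 for q in query_terms if q in passage_tokens)
    return present / len(query_terms) if query_terms else 0.0

def best_passage(text, query):
    """Return the highest-scoring passage of a document for the given query."""
    tokens = text.lower().split()
    query_terms = set(query.lower().split())
    scored = [(score_passage(set(p), query_terms), start, end)
              for start, end, p in passage_candidates(tokens)]
    score, start, end = max(scored)
    return score, " ".join(tokens[start:end])

doc = "focused retrieval returns sub-document results such as xml elements or arbitrary passages"
print(best_passage(doc, "focused retrieval passages"))
```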
|
26 |
Inverted Index Partitioning Strategies for a Distributed Search Engine / Patel, Hiren 17 December 2010 (has links)
One of the greatest challenges in information retrieval is to develop an intelligent system for user and machine interaction that supports users in their quest for relevant information. The dramatic increase in the amount of Web content gives rise to the need for a large-scale distributed information retrieval system, targeted to support millions of users and terabytes of data. To retrieve information from such a large amount of data efficiently, the index is split among the servers in a distributed information retrieval system. Thus, partitioning the index among these collaborating nodes plays an important role in enhancing the performance of a distributed search engine. The two widely known inverted index partitioning schemes for a distributed information retrieval system are document partitioning and term partitioning. In a document-partitioned index, each server hosts a subset of the documents in the collection and executes every query against its local sub-collection. In a term-partitioned index, each node is responsible for a subset of the terms in the collection, and serves them to a central node as they are required for query evaluation.
In this thesis, we introduce the Document over Term inverted index distribution scheme, which splits a set of nodes into several groups (sub-clusters) and then performs document partitioning between the groups and term partitioning within each group. Because this approach combines the term and document index partitioning approaches, we also refer to it as a Hybrid Inverted Index. It retains the disk-access benefits of term partitioning together with the computational load sharing, scalability, maintainability, and availability benefits of document partitioning. We also introduce the Document over Document index partitioning scheme, based on the document partitioning approach. In this approach, a set of nodes is split into groups, and documents in the collection are partitioned between groups and also within each group. This strategy retains all the benefits of the document partitioning approach, but reduces the computational load more effectively and uses resources more efficiently.
We compare the distributed index approaches experimentally and show that, in terms of efficiency and scalability, document-partition-based approaches perform significantly better than the others. The Document over Term partitioning offers efficient utilization of search servers and lower disk access, but suffers from load imbalance. The Document over Document partitioning emerged as the preferred method under high workload.
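A minimal sketch of the assignment logic behind the two schemes, assuming hash-based placement; the group count, nodes per group, and posting-list layout are illustrative assumptions rather than the configurations studied in the thesis.

```python
from collections import defaultdict

NUM_GROUPS = 3        # document partitioning across groups (assumed)
NODES_PER_GROUP = 4   # nodes inside each group (assumed)

def doc_over_term_assign(doc_id, postings):
    """Document over Term: choose a group by document, then spread that
    document's terms over the nodes inside the group (term partitioning)."""
    group = hash(doc_id) % NUM_GROUPS
    per_node = defaultdict(list)
    for term, positions in postings.items():
        node = hash(term) % NODES_PER_GROUP
        per_node[(group, node)].append((term, doc_id, positions))
    return per_node

def doc_over_doc_assign(doc_id, postings):
    """Document over Document: choose a group by document, then place the whole
    document's postings on a single node inside that group."""
    group = hash(doc_id) % NUM_GROUPS
    node = hash(("doc", doc_id)) % NODES_PER_GROUP
    return {(group, node): [(term, doc_id, pos) for term, pos in postings.items()]}
```

Under the first scheme a query touches every group but only the nodes holding its query terms; under the second, each group evaluates the query against the local sub-collections held by its nodes.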
|
27 |
Simultaneously searching with multiple algorithm settings: an alternative to parameter tuning for suboptimal single-agent search / Valenzano, Richard. January 2009 (has links)
Thesis (M. Sc.)--University of Alberta, 2009. / Title from PDF file main screen (viewed on Nov. 27, 2009). "A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Science, Department of Computing Science, University of Alberta." Includes bibliographical references.
|
28 |
Statistical physics of information retrieval / Wu, Bin. January 2002 (has links)
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002. / Includes bibliographical references. Also available in electronic version. Access restricted to campus users.
|
29 |
Adaptive video segmentation / Banda, Nagamani. January 2004 (has links)
Thesis (M.S.)--West Virginia University, 2004. / Title from document title page. Document formatted into pages; contains vi, 52 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 50-52).
|
30 |
WINISIS - A Practical Guide: In Hindi Language / Chauhan, Buddhi P; Kapoor, Rachna; Singh, Shivendra; Das, Anup Kumar. January 2007 (has links)
This WINISIS Training Manual in the Hindi language contains three self-learning modules: WINISIS - A Practical Guide; Creating Web Interface for CDS/ISIS Databases using GenisisWeb; and Publishing CDS/ISIS Databases on CD-ROM using GenisisCD. These self-learning modules are the outcome of the Advanced Workshop on CDS-ISIS for Windows, held at Thapar University on 14-18 May 2007. The Training Manual covers all aspects of WINISIS: installation of the software, creation of databases, database operations, customization of search interfaces, and the display formatting language. Advanced features, such as hyper-linking, web interfacing, full-text document processing and library automation, are also covered in this document. The target audience of this Manual is library professionals working in academic, special and public libraries, as well as students of library science courses. The Manual will also be helpful to small organizations that are building digital archives in a local library setup or on CD-ROMs. After practicing the laboratory exercises given in the Manual, learners will be able to install the WINISIS software and its web application tools GenisisWeb and GenisisCD, and to create and manage bibliographic or full-text databases. This Manual is particularly useful in the South Asian region, where the availability of training material in local languages is crucial for providing public information services with the help of free and open source software (FOSS).
|