1

Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing.

Csomai, Andras 05 1900
This research addresses the problem of automatic keyphrase extraction from large documents and back of the book indexing. The potential benefits of automating this process are far reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back of the book indexes. The dissertation introduces a new methodology for evaluating automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system against the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back of the book indexing system built on the keyphrase extraction system is shown to produce back of the book indexes closely resembling those created by human experts.
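The abstract does not spell out how the chi-based informativeness measure is computed. As a rough illustration only, the sketch below assumes the familiar Pearson chi-square test of independence on a 2x2 table that compares a phrase's count in the target document with its count in a reference corpus. The function name, parameters, and example counts are hypothetical and are not taken from the dissertation.

```python
def chi_square_informativeness(count_in_doc, doc_size, count_in_ref, ref_size):
    """Pearson chi-square score for a phrase, from a 2x2 contingency table.

    Assumed setup (not from the dissertation): rows are "this phrase" vs.
    "all other phrases", columns are "target document" vs. "reference corpus".
    """
    a = count_in_doc              # phrase occurrences in the document
    b = count_in_ref              # phrase occurrences in the reference corpus
    c = doc_size - count_in_doc   # other phrases in the document
    d = ref_size - count_in_ref   # other phrases in the reference corpus
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    # Closed form of the chi-square statistic for a 2x2 table.
    return n * (a * d - b * c) ** 2 / denominator


# A phrase that is as frequent in the document as in the reference corpus
# scores zero; one that is relatively more frequent in the document scores higher.
typical = chi_square_informativeness(10, 100, 100, 1000)
distinctive = chi_square_informativeness(10, 100, 10, 1000)
```

Note that the statistic is symmetric: a phrase that is unusually rare in the document also receives a high score, so a practical ranking would likely check the direction of the deviation as well.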
2

Large Scale Image Retrieval From Books

Zhao, Mao 01 January 2012
Search engines play a very important role in daily life. As multimedia products become more and more popular, people have developed search engines for images and videos. In the first part of this thesis, I propose a prototype of a book image search engine. I discuss tag representation for the book images, as well as how to apply a probabilistic model to generate image tags. I then propose a random walk refinement method that uses a tag similarity graph. The image search system is built on the Galago search engine developed in the UMass CIIR lab. Considering the large amount of data that search engines need to process, I bring in a cloud environment for large-scale distributed computing in the second part of this thesis. I discuss two models: the MapReduce model, which is currently one of the most popular technologies in the IT industry, and the Maiter model. The asynchronous accumulative update mechanism of the Maiter model is a great fit for the random walk refinement process, which takes up 84% of the entire run time, and it accelerates the refinement process by a factor of 46.
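The abstract names random walk refinement over a tag similarity graph without giving details. The following is a minimal sketch of one common formulation, a random walk with restart, in which each tag's score is repeatedly mixed with the scores of similar tags. The function name, the `alpha` restart weight, the iteration count, and the toy tags are all assumptions made for illustration; this is a plain synchronous loop, not the thesis's Maiter or MapReduce implementation.

```python
def random_walk_refine(initial, similarity, alpha=0.85, iterations=50):
    """Refine tag scores with a random walk with restart (illustrative only).

    initial:    dict mapping tag -> initial relevance score (positive sum)
    similarity: dict mapping tag -> {neighbor tag: similarity weight}
    alpha:      weight on scores propagated through the graph; (1 - alpha)
                pulls each tag back toward its normalized initial score
    """
    tags = list(initial)
    # Row-normalize the similarity weights into transition probabilities.
    transition = {}
    for tag in tags:
        row = similarity.get(tag, {})
        total = sum(row.get(other, 0.0) for other in tags)
        if total > 0:
            transition[tag] = {other: row.get(other, 0.0) / total for other in tags}
        else:
            transition[tag] = {}  # tag has no similar neighbors

    initial_total = sum(initial.values())
    prior = {tag: initial[tag] / initial_total for tag in tags}
    scores = dict(prior)
    for _ in range(iterations):
        updated = {}
        for tag in tags:
            incoming = sum(scores[src] * transition[src].get(tag, 0.0) for src in tags)
            updated[tag] = alpha * incoming + (1 - alpha) * prior[tag]
        scores = updated
    return scores


# Toy example: "river" and "water" reinforce each other, "car" is unrelated.
initial_scores = {"river": 0.4, "water": 0.3, "car": 0.3}
tag_similarity = {"river": {"water": 0.9}, "water": {"river": 0.9}}
refined = random_walk_refine(initial_scores, tag_similarity)
```

In this toy run the isolated tag "car" receives no support from the graph and ends up ranked below "water", even though both started with the same initial score.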
