
Building a search engine for music and audio on the World Wide Web

The main contribution of this dissertation is a system for locating and indexing audio files on the World Wide Web. The idea behind this system is that combining web page analysis with audio file analysis can produce more relevant information for locating audio files on the web than the information used by full-text search engines.

The most important part of this system is a web crawler that finds materials by following hyperlinks between web pages. The crawler is distributed: it operates across multiple networked computers and stores its results in a database. It has two main components: a set of retrievers that retrieve pages and audio files from the web, and a central crawl manager that coordinates the retrievers and handles data storage tasks.

The crawler is designed to locate three types of audio files: AIFF, WAVE, and MPEG-1 (MP3), but other types can be easily added to the system. Once audio files are located, both the audio files and the web pages that link to them are analyzed. Information extracted by the crawler can be used to build search indexes for resolving user queries. A set of results demonstrating aspects of the crawler's performance is presented, along with some statistics and points of interest regarding the nature of audio files on the web.
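The retriever and crawl manager split described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: it runs sequentially in one process rather than across networked machines, keeps results in a dictionary rather than a database, and recognises audio files only by file extension. The names `retrieve`, `crawl`, `is_audio`, and `AUDIO_EXTENSIONS`, and the injected `fetch` function, are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed extension list for the three formats named in the abstract.
AUDIO_EXTENSIONS = (".aiff", ".aif", ".wav", ".mp3")


class LinkParser(HTMLParser):
    """Collects the href targets of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_audio(url):
    """True if the URL path ends in one of the assumed audio extensions."""
    return urlparse(url).path.lower().endswith(AUDIO_EXTENSIONS)


def retrieve(url, fetch):
    """Retriever role: fetch one page and return its links as absolute URLs.

    `fetch` is a hypothetical callable mapping a URL to its HTML text, so the
    sketch can run without network access.
    """
    parser = LinkParser()
    parser.feed(fetch(url))
    return [urljoin(url, link) for link in parser.links]


def crawl(seed, fetch, max_pages=100):
    """Crawl manager role: schedule pages, skip repeats, record audio links.

    Returns a dict mapping each audio URL to the pages that link to it.
    """
    frontier = deque([seed])
    seen = {seed}
    audio = {}
    pages = 0
    while frontier and pages < max_pages:
        page = frontier.popleft()
        pages += 1
        for link in retrieve(page, fetch):
            if is_audio(link):
                audio.setdefault(link, []).append(page)
            elif link not in seen:
                seen.add(link)
                frontier.append(link)
    return audio
```

Recording the linking pages alongside each audio URL reflects the abstract's point that the associated web pages are analysed together with the audio files. Supporting another audio type in this sketch would only require extending `AUDIO_EXTENSIONS`.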

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.85177
Date: January 2005
Creators: Knopke, Ian
Publisher: McGill University
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: English
Detected Language: English
Type: Electronic Thesis or Dissertation
Format: application/pdf
Coverage: Doctor of Philosophy (Faculty of Music.)
Rights: All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated.
Relation: alephsysno: 002211452, proquestno: AAINR12871, Theses scanned by UMI/ProQuest.
