
Document geolocation using language models built from lexical and geographic similarity

This thesis investigates the automatic identification of the location of documents. This process of geolocation aids in toponym resolution, document summarization, and geography-based marketing. I focus on minimally supervised methods that examine both the lexical similarities and the geographic similarities between documents. The method predicts the location of a document as a single point on the earth's surface. Three data sets are used to evaluate it: a set of geotagged Wikipedia articles and two sets of Twitter feeds. For Wikipedia, the combined lexical and geographic method obtains a median error of 12.1 kilometers and improves the mean error to 164 kilometers. The large Twitter data set shows the greatest improvement, with a median error of 333 kilometers, down from the previous best of 463 kilometers.
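The abstract does not specify the model in detail, so the following is only a minimal sketch under stated assumptions. What it takes from the abstract is the task framing: a document is assigned a single (latitude, longitude) point, and predictions are scored by median and mean error in kilometers. The scoring uses great-circle (haversine) distance with an assumed mean Earth radius of 6371 km. The predictor is a hypothetical stand-in, not the thesis's combined lexical and geographic method: it builds an add-one-smoothed unigram language model for each training document and returns the location of the document that best explains the test text. The names `haversine_km`, `log_likelihood`, `predict_location`, and `error_summary` are illustrative only.

```python
import math
from collections import Counter
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def log_likelihood(doc_tokens, model_counts, vocab_size, alpha=1.0):
    """Add-alpha smoothed unigram log-likelihood of doc_tokens under model_counts."""
    total = sum(model_counts.values())
    return sum(
        math.log((model_counts[w] + alpha) / (total + alpha * vocab_size))
        for w in doc_tokens
    )


def predict_location(doc_tokens, training):
    """Hypothetical lexical nearest-neighbour rule: return the (lat, lon) of the
    training document whose smoothed unigram model best explains doc_tokens.

    training is a list of (tokens, (lat, lon)) pairs.
    """
    # Shared vocabulary (training plus test tokens) so every model smooths alike.
    vocab = set(doc_tokens)
    for tokens, _ in training:
        vocab.update(tokens)
    best_tokens, best_point = max(
        training,
        key=lambda item: log_likelihood(doc_tokens, Counter(item[0]), len(vocab)),
    )
    return best_point


def error_summary(predicted, gold):
    """Median and mean great-circle error (km) over paired predicted/gold points."""
    errors = [haversine_km(p, g) for p, g in zip(predicted, gold)]
    return median(errors), mean(errors)


if __name__ == "__main__":
    # Toy data invented for illustration; not from the thesis's data sets.
    training = [
        ("austin texas longhorns barton springs".split(), (30.27, -97.74)),
        ("seattle rain space needle ferry".split(), (47.61, -122.33)),
    ]
    test_docs = ["barton springs austin swim".split(), "ferry to seattle".split()]
    gold = [(30.25, -97.77), (47.60, -122.34)]
    preds = [predict_location(d, training) for d in test_docs]
    med, avg = error_summary(preds, gold)
    print(f"median error {med:.2f} km, mean error {avg:.2f} km")
```

Reporting both statistics matters because a few badly misplaced documents pull the mean far above the median; the gap between the 12.1 km median and 164 km mean for Wikipedia reflects that kind of skew.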

Geolocation

Identifier	oai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/ETD-UT-2012-05-5717
Date	16 August 2012
Creators	Skiles, Erik David
Source Sets	University of Texas
Language	English
Detected Language	English
Type	thesis
Format	application/pdf
