Global ETD Search

1	Three Essays on the Determinants of and Returns to Volunteering Seifi, Forough January 2017 (has links) This thesis consists of three essays on the determinants of and returns to volunteering. The first essay, ‘volunteer opportunities and volunteering’, examines the relationship between physical access to charitable organizations and volunteering. Formal volunteer activities usually take place within a charitable or non-profit organization. While the physical presence of these organizations is required for citizens who want to contribute to their communities, the availability of charitable organizations (number and type) varies from neighbourhood to neighbourhood. Until now, no one has examined the role played by charity proximity in volunteer decisions. In this paper I use information on the location of registered charities in Canada (from the CRA T3010 registered charity returns) merged with survey information on volunteering (from General Social Surveys conducted by Statistics Canada) to examine how physical access affects volunteer behaviour. Careful attention is paid to the possibility that the measure of access might be endogenous: organizations and individuals may respond to the same unobservable factors when deciding where to locate. Various strategies, including an instrumental variables procedure, are undertaken to deal with this possibility. My results suggest that access does matter for the decision to volunteer as well as for the amount of time devoted to volunteering. My estimates imply that increasing the number of charitable organizations within a one-kilometre buffer around an individual’s place of residence by 6% (the growth rate of the number of charities in Canada between 2003 and 2009) increases the predicted probability of volunteering by 5%. The second essay, ‘the returns to working for free’, examines the relationship between volunteering and income. 
Previous studies have shown volunteering to be associated with an earnings premium, but many of these studies fail to take into account the possible endogeneity between volunteering and income. Using data from the General Social Surveys (2003, 2005, 2008, 2010 and 2013), I investigate the causal relationship between volunteering and income. I employ a novel instrument, a measure of access to charitable organizations around an individual’s place of residence, along with more conventional ones, like membership or participation in different groups or organizations, to examine this relationship and try to understand how volunteering might affect earned income. Identifying the effect of volunteering for the different subgroups affected by the different instruments yields a (surprisingly) large range of estimates. For example, estimates in the upper range found in the literature (53%) are found for individuals who are induced to volunteer because of their membership or participation in sport or recreational organizations, no returns are found for those induced to volunteer because of their membership or participation in school or civic groups, negative returns (22%) are found for those induced to volunteer because of their membership or participation in religiously affiliated groups, and very large (47%) but imprecise estimates are found for those induced to volunteer because of proximity to charitable organizations. The third essay, ‘doing good, feeling good: causal evidence from Canadian volunteers’, examines the relationships between volunteering and health, and volunteering and life satisfaction. A body of literature suggests that volunteers are healthier and happier than their non-volunteering counterparts. But this ‘observation’ is fraught with problems of endogeneity. Some papers have addressed the endogeneity problem with an instrumental variable technique, mostly relying on measures of ‘religiosity’ as instruments. 
However, no studies of this nature have been conducted in Canada. Using data from the General Social Surveys, I again employ the measure of physical access to charitable organizations within a three-kilometer radius of an individual’s place of residence as the main identifying instrument to examine the causal relationship between volunteering, health and life satisfaction for individuals aged 15 and over. Employing a conditional mixed process (CMP) to estimate the model, I conclude that volunteering is a significant predictor of health, and it has a statistically significant effect on life satisfaction for female and middle-aged individuals. Volunteer Income Health Geo-coding Endogeneity Charitable organizations
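The instrumental-variables idea that runs through the three essays above can be illustrated with a toy calculation. The sketch below is not the thesis's estimation procedure; it uses made-up numbers and the simplest single-instrument (Wald-type) estimator, with hypothetical variable names: z stands in for an access-type instrument, x for the endogenous regressor (volunteering), y for the outcome, and u for an unobserved factor that affects both x and y.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Single-instrument IV slope: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

# Toy data (assumed, not from any survey). The true effect of x on y is 2.
z = [0, 1, 0, 1]            # instrument, uncorrelated with u by construction
u = [1, 1, -1, -1]          # unobserved confounder
x = [zi + ui for zi, ui in zip(z, u)]          # x depends on z and u
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]  # y depends on x and u

print("OLS slope:", ols_slope(x, y))    # biased away from 2 by the confounder
print("IV slope:", iv_slope(z, x, y))   # recovers the true effect in this toy setup
```

Because u drives both x and y, the OLS slope overstates the effect here, while the instrument isolates only the variation in x that comes from z.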
2	Toponym resolution in text Leidner, Jochen Lothar January 2007 (has links) Background. In the area of Geographic Information Systems (GIS), a shared discipline between informatics and geography, the term geo-parsing is used to describe the process of identifying names in text, which in computational linguistics is known as named entity recognition and classification (NERC). The term geo-coding is used for the task of mapping from implicitly geo-referenced datasets (such as structured address records) to explicitly geo-referenced representations (e.g., using latitude and longitude). However, present-day GIS systems provide no automatic geo-coding functionality for unstructured text. In Information Extraction (IE), processing of named entities in text has traditionally been seen as a two-step process comprising a flat text span recognition sub-task and an atomic classification sub-task; relating the text span to a model of the world has been ignored by evaluations such as MUC or ACE (Chinchor (1998); U.S. NIST (2003)). However, spatial and temporal expressions refer to events in space-time, and the grounding of events is a precondition for accurate reasoning. Thus, automatic grounding can improve many applications such as automatic map drawing (e.g. for choosing a focus) and question answering (e.g. for questions like How far is London from Edinburgh?, given a story in which both occur and can be resolved). Whereas temporal grounding has received considerable attention in the recent past (Mani and Wilson (2000); Setzer (2001)), robust spatial grounding has long been neglected. Concentrating on geographic names for populated places, I define the task of automatic Toponym Resolution (TR) as computing the mapping from occurrences of names for places as found in a text to a representation of the extensional semantics of the location referred to (its referent), such as a geographic latitude/longitude footprint. 
The task of mapping from names to locations is hard due to insufficient and noisy databases, and a large degree of ambiguity: common words need to be distinguished from proper names (geo/non-geo ambiguity), and the mapping between names and locations is ambiguous (London can refer to the capital of the UK or to London, Ontario, Canada, or to about forty other Londons on earth). In addition, names of places and the boundaries referred to change over time, and databases are incomplete. Objective. I investigate how referentially ambiguous spatial named entities can be grounded, or resolved, with respect to an extensional coordinate model robustly on open-domain news text. I begin by comparing the few algorithms proposed in the literature, and, comparing semiformal, reconstructed descriptions of them, I factor out a shared repertoire of linguistic heuristics (e.g. rules, patterns) and extra-linguistic knowledge sources (e.g. population sizes). I then investigate how to combine these sources of evidence to obtain a superior method. I also investigate the noise effect introduced by the named entity tagging step that toponym resolution relies on in a sequential system pipeline architecture. Scope. In this thesis, I investigate a present-day snapshot of terrestrial geography as represented in the gazetteer defined and, accordingly, a collection of present-day news text. I limit the investigation to populated places; geo-coding of artifact names (e.g. airports or bridges), compositional geographic descriptions (e.g. 40 miles SW of London, near Berlin), for instance, is not attempted. Historic change is a major factor affecting gazetteer construction and ultimately toponym resolution. However, this is beyond the scope of this thesis. Method. 
While a small number of previous attempts have been made to solve the toponym resolution problem, these were either not evaluated, or evaluation was done by manual inspection of system output instead of curating a reusable reference corpus. Since the relevant literature is scattered across several disciplines (GIS, digital libraries, information retrieval, natural language processing) and descriptions of algorithms are mostly given in informal prose, I attempt to systematically describe them and aim at a reconstruction in a uniform, semi-formal pseudo-code notation for easier re-implementation. A systematic comparison leads to an inventory of heuristics and other sources of evidence. In order to carry out a comparative evaluation procedure, an evaluation resource is required. Unfortunately, to date no gold standard has been curated in the research community. To this end, a reference gazetteer and an associated novel reference corpus with human-labeled referent annotation are created. These are subsequently used to benchmark a selection of the reconstructed algorithms and a novel re-combination of the heuristics catalogued in the inventory. I then compare the performance of the same TR algorithms under three different conditions, namely applying them to (i) the output of human named entity annotation, (ii) automatic annotation using an existing Maximum Entropy sequence tagging model, and (iii) a naïve toponym lookup procedure in a gazetteer. Evaluation. The algorithms implemented in this thesis are evaluated in an intrinsic or component evaluation. To this end, we define a task-specific matching criterion to be used with traditional Precision (P) and Recall (R) evaluation metrics. 
This matching criterion is lenient with respect to numerical gazetteer imprecision in situations where one toponym instance is marked up with different gazetteer entries in the gold standard and the test set, respectively, but where these refer to the same candidate referent, caused by multiple near-duplicate entries in the reference gazetteer. Main Contributions. The major contributions of this thesis are as follows: • A new reference corpus in which instances of location named entities have been manually annotated with spatial grounding information for populated places, and an associated reference gazetteer, from which the assigned candidate referents are chosen. This reference gazetteer provides numerical latitude/longitude coordinates (such as 51320 North, 0 50 West) as well as hierarchical path descriptions (such as London > UK) with respect to a world wide-coverage, geographic taxonomy constructed by combining several large, but noisy gazetteers. This corpus contains news stories and comprises two sub-corpora, a subset of the REUTERS RCV1 news corpus used for the CoNLL shared task (Tjong Kim Sang and De Meulder (2003)), and a subset of the Fourth Message Understanding Contest (MUC-4; Chinchor (1995)), both available pre-annotated with gold-standard. 
This corpus will be made available as a reference evaluation resource; • a new method and implemented system to resolve toponyms that is capable of robustly processing unseen text (open-domain online newswire text) and grounding toponym instances in an extensional model using longitude and latitude coordinates and hierarchical path descriptions, using internal (textual) and external (gazetteer) evidence; • an empirical analysis of the relative utility of various heuristic biases and other sources of evidence with respect to the toponym resolution task when analysing free news genre text; • a comparison between a replicated method as described in the literature, which functions as a baseline, and a novel algorithm based on minimality heuristics; and • several exemplary prototypical applications to show how the resulting toponym resolution methods can be used to create visual surrogates for news stories, a geographic exploration tool for news browsing, geographically-aware document retrieval and to answer spatial questions (How far...?) in an open-domain question answering system. These applications only have demonstrative character, as a thorough quantitative, task-based (extrinsic) evaluation of the utility of automatic toponym resolution is beyond the scope of this thesis and left for future work. 621.382
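The evaluation described in this abstract (Precision and Recall under a matching criterion that tolerates near-duplicate gazetteer entries) can be sketched roughly as follows. This is an illustration only, not the thesis's actual criterion: the tolerance value, the flat degree-distance comparison, the instance identifiers, and the coordinates are all assumptions made for the example.

```python
from math import hypot

def same_referent(a, b, tol_deg=0.1):
    """Assumed lenient match: two (lat, lon) pairs count as the same
    referent if they lie within tol_deg degrees of each other."""
    return hypot(a[0] - b[0], a[1] - b[1]) <= tol_deg

def precision_recall(gold, predicted, tol_deg=0.1):
    """gold and predicted map a toponym instance id to (lat, lon)."""
    correct = sum(
        1
        for key, coord in predicted.items()
        if key in gold and same_referent(coord, gold[key], tol_deg)
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotations with approximate coordinates.
gold = {
    "doc1:London#1": (51.50, -0.12),   # London, UK
    "doc1:Paris#1": (48.85, 2.35),
    "doc2:London#1": (42.98, -81.25),  # London, Ontario
}
predicted = {
    "doc1:London#1": (51.52, -0.10),   # near-duplicate entry for London, UK
    "doc2:London#1": (51.50, -0.12),   # wrong London chosen
}

p, r = precision_recall(gold, predicted)
print(p, r)
```

In this toy case the near-duplicate entry still counts as correct, the wrong London does not, and the unresolved Paris instance lowers recall but not precision.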
3	Geo-Locating Tweets with Latent Location Information Lee, Sunshin 13 February 2017 (has links) As part of our work on the NSF funded Integrated Digital Event Archiving and Library (IDEAL) project and the Global Event and Trend Archive Research (GETAR) project, we collected over 1.4 billion tweets using over 1,000 keywords, key phrases, mentions, or hashtags, starting from 2009. Since many tweets talk about events (with useful location information), such as natural disasters, emergencies, and accidents, it is important to geo-locate those tweets whenever possible. Due to possible location ambiguity, finding a tweet's location often is challenging. Many distinct places have the same geoname, e.g., "Greenville" matches 50 different locations in the U.S.A. Frequently, in tweets, explicit location information, like geonames mentioned, is insufficient, because tweets are often brief and incomplete. They have a small fraction of the full location information of an event due to the 140 character limitation. Location indicative words (LIWs) may include latent location information, for example, "Water main break near White House" does not have any geonames but it is related to a location "1600 Pennsylvania Ave NW, Washington, DC 20500 USA" indicated by the key phrase 'White House'. To disambiguate tweet locations, we first extracted geospatial named entities (geonames) and predicted implicit state (e.g., Virginia or California) information from entities using machine learning algorithms including Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest (RF). Implicit state information helps reduce ambiguity. We also studied how location information of events is expressed in tweets and how latent location indicative information can help to geo-locate tweets. We then used a machine learning (ML) approach to predict the implicit state using geonames and LIWs. 
We conducted experiments with tweets (e.g., about potholes), and found significant improvement in disambiguating tweet locations using a ML algorithm along with the Stanford NER. Adding state information predicted by our classifiers increased the possibility to find the state-level geo-location unambiguously by up to 80%. We also studied over 6 million tweets (3 mid-size and 2 big-size collections about water main breaks, sinkholes, potholes, car crashes, and car accidents), covering 17 months. We found that up to 91.1% of tweets have at least one type of location information (geo-coordinates or geonames), or LIWs. We also demonstrated that in most cases adding LIWs helps geo-locate tweets with less ambiguity using a geo-coding API. Finally, we conducted additional experiments with the five different tweet collections, and found significant improvement in disambiguating tweet locations using a ML approach with geonames and all LIWs that are present in tweet texts as features. / Ph. D. / As part of our work on the projects “Integrated Digital Event Archiving and Library (IDEAL)” and “Global Event and Trend Archive Research (GETAR),” funded by NSF, we collected over 1.4 billion tweets using over 1,000 keywords, key phrases, mentions, or hashtags, starting from 2009. Since many tweets talk about events (with useful location information), such as natural disasters, emergencies, and accidents, it is important to geolocate those tweets whenever possible. Due to possible location ambiguity, finding a tweet’s location often is challenging. Many distinct places have the same geoname, e.g., “Greenville” matches 50 different locations in the U.S.A. Frequently, in tweets, explicit location information, like geonames mentioned, is insufficient, because tweets are often brief and incomplete. They have a small fraction of the full location information of an event due to the 140 character limitation. 
Location indicative words (LIWs) may include latent location information, for example, “Water main break near White House” does not have any geonames but it is related to a location “1600 Pennsylvania Ave NW, Washington, DC 20500 USA” indicated by the key phrase ‘White House’. To disambiguate tweet locations, we first extracted geonames, and then predicted implicit state (e.g., Virginia or California) information from entities using machine learning (ML) algorithms (wherein computers learn from examples what state is appropriate). Implicit state information helps reduce ambiguity. We also studied how location information of events is expressed in tweets and how latent location indicative information can help to geo-locate tweets. We then used a ML approach to predict the implicit state using geonames and LIWs. We conducted experiments with tweets (e.g., about potholes), and found significant improvement in disambiguating tweet locations using a ML algorithm along with the Stanford Named Entity Recognizer. Adding state information predicted by our classifiers increased the ability to find the state-level geo-location unambiguously by up to 80%. We also studied over 6 million tweets (in three mid-size and two big collections, about water main breaks, sinkholes, potholes, car crashes, and car accidents), covering 17 months. We found that up to 91.1% of tweets have at least one type of location information (geocoordinates or geonames), or LIWs. We also demonstrated that in most cases adding LIWs helps geo-locate tweets with less ambiguity using a geo-coding Web application (that converts addresses into geographic coordinates). Finally, we conducted additional experiments with the five different tweet collections, and found significant improvement in disambiguating tweet locations using a ML approach wherein the features considered are the geonames and all LIWs that are present in the tweet texts. 
Classification Events Geo-coding Geo-locating Geo-parsing Google Geo-coding API Hadoop cluster Location Indicative Words (LIWs) Machine learning Naïve Bayes Named Entity Recognition Natural Language Processing
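The claim in this abstract that implicit state information reduces geoname ambiguity can be illustrated with a minimal sketch. The tiny gazetteer, its approximate coordinates, and the function name below are hypothetical; in the thesis the state is predicted by a trained classifier, whereas here it is simply passed in.

```python
# Hypothetical mini-gazetteer: geoname -> list of (state, lat, lon),
# with approximate coordinates for three of the many U.S. Greenvilles.
GAZETTEER = {
    "Greenville": [
        ("SC", 34.85, -82.39),
        ("NC", 35.61, -77.37),
        ("TX", 33.14, -96.11),
    ],
}

def candidates(geoname, state=None):
    """Return gazetteer entries for geoname, optionally filtered by a
    state code (assumed to come from some upstream state predictor)."""
    entries = GAZETTEER.get(geoname, [])
    if state is None:
        return list(entries)
    return [entry for entry in entries if entry[0] == state]

print(len(candidates("Greenville")))   # ambiguous without state information
print(candidates("Greenville", "NC"))  # a single candidate once the state is known
```

The design point is only that a state-level prediction acts as a filter over gazetteer candidates; how well it works depends entirely on the accuracy of the predictor, which this sketch does not model.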
4	Fighting for Aid : Foreign Funding and Civil Conflict Intensity Strandow, Daniel January 2014 (has links) This dissertation focuses on the sub-national impact of foreign aid on civil conflicts by asking the question: How does foreign aid committed to contested areas affect the intensity of violence in those areas? The main theoretical contribution is to focus on how aid influences warring parties’ decisions to engage in contests over territorial control and how that in turn influences violence intensity. The study introduces two concepts: funding concentration and barriers to exploiting aid. A contested area has greater concentration of funding if warring parties expect a high value of aid to be distributed to only a few locations. Funding is instead diffused if the parties expect aid to be spread over many locations. A low barrier to exploiting aid is present if it is of a type that both state and non-state actors could potentially misuse. There is a high barrier if territorial control is required in order to exploit funding channels. The theory introduces three testable implications: First, greater funding concentration encourages conventional contests over territorial control, which increases military fatalities. The second proposal is that if there is a low barrier to exploiting aid (e.g. humanitarian and food aid) then there will be increased competition between warring parties and civilians, and hence more civilian fatalities. Third, high barrier funding (e.g. education aid) will motivate contests over territorial control and increase military fatalities. This dissertation uses geo-coded aid commitments data and introduces data of warring parties’ battleground control in sub-Saharan Africa, 1989–2008. The research design relies on propensity score matching where pairs of observations are matched based on a range of covariates. The results concerning barriers to exploitation are partially supported. 
High barrier aid increases military fatalities whereas low barrier aid has little impact on violence. Greater funding concentration increases military fatalities substantially compared to if there is low or no funding concentration. In line with theory, greater funding concentration does not increase civilian fatalities. Aid foreign aid foreign assistance relief humanitarian conflict civil war civil conflict geographic concentration intra-state violence military contest low-intensity guerrilla irregular conventional decision theory contest success function geo-coding geo-referencing territorial control propensity score Africa South of the Sahara
