Identifying Expression Fingerprints using Linguistic Information

This thesis presents a technology to complement taxation-based policy proposals aimed at addressing the digital copyright problem. The approach presented facilitates identification of intellectual property using expression fingerprints.

Copyright law protects expression of content. Recognizing literary works for copyright protection requires identification of the expression of their content. The expression fingerprints described in this thesis use a novel set of linguistic features that capture both the content presented in documents and the manner of expression used in conveying this content. These fingerprints consist of both syntactic and semantic elements of language. Examples of the syntactic elements of expression include structures of embedding and embedded verb phrases. The semantic elements of expression consist of high-level, broad semantic categories.

Syntactic and semantic elements of expression enable generation of models that correctly identify books and their paraphrases 82% of the time, providing a significant (approximately 18%) improvement over models that use tfidf-weighted keywords. The performance of models built with these features is also better than models created with standard features used in stylometry (e.g., function words), which yield an accuracy of 62%.

In the non-digital world, copyright holders collect revenues by controlling distribution of their works. Current approaches to the digital copyright problem attempt to provide copyright holders with the same kind of control over distribution by employing Digital Rights Management (DRM) systems. However, DRM systems also enable copyright holders to control and limit fair use, to inhibit others' speech, and to collect private information about individual users of digital works.

Digital tracking technologies enable alternate solutions to the digital copyright problem; some of these solutions can protect creative incentives of copyright holders in the absence of control over distribution of works. Expression fingerprints facilitate digital tracking even when literary works are DRM- and watermark-free, and even when they are paraphrased. As such, they enable metering popularity of works and make practicable solutions that encourage large-scale dissemination and unrestricted use of digital works and that protect the revenues of copyright holders, for example through taxation-based revenue collection and distribution systems, without imposing limits on distribution.

natural language processing

syntactic information

content

expression

Identifier	oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/30587
Date	18 November 2005
Creators	Uzuner, Ozlem
Source Sets	M.I.T. Theses and Dissertation
Language	en_US
Detected Language	English
Format	216 p., 179019584 bytes, 5410679 bytes, application/postscript, application/pdf
Relation	Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
