Current unsupervised approaches to keyphrase extraction compute a single importance score for each candidate word by considering the number and quality of its associated words in the graph, and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns that graph-based ranking methods do not capture. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
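For readers unfamiliar with the graph-based ranking baseline that the abstract contrasts itself with, the following is a minimal sketch of that general idea, not the thesis's own model. It assumes a PageRank-style score over an undirected word co-occurrence graph; the function names `build_word_graph` and `rank_words`, the window size, the damping factor, and the token list are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_word_graph(words, window=2):
    """Link each word to the distinct words within `window` positions after it.

    Hypothetical helper: the edges are undirected and unweighted.
    """
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                graph[word].add(words[j])
                graph[words[j]].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration yielding a single score per word.

    A word scores highly when it has many neighbours (number) that are
    themselves highly scored (quality).
    """
    nodes = list(graph)
    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[m] / len(graph[m]) for m in graph[node])
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


# Illustrative token sequence (assumed, not taken from the thesis).
tokens = ["keyphrase", "extraction", "keyphrase", "graph",
          "keyphrase", "ranking", "keyphrase", "score"]
scores = rank_words(build_word_graph(tokens, window=1))
top_word = max(scores, key=scores.get)
```

Because each word ends up with exactly one number, a scheme like this has no natural place for additional signals such as differing connectivity patterns, which is the limitation the abstract points to.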
Identifier | oai:union.ndltd.org:unt.edu/info:ark/67531/metadc1538730 |
Date | 08 1900 |
Creators | Florescu, Corina Andreea |
Contributors | Jin, Wei, Nielsen, Rodney, Huang, Yan, Fu, Song, Blanco, Eduardo |
Publisher | University of North Texas |
Source Sets | University of North Texas |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | vii, 85 pages, Text |
Rights | Use restricted to UNT Community, Florescu, Corina Andreea, Copyright, Copyright is held by the author, unless otherwise noted. All rights Reserved. |