• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • No language data
  • Tagged with
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

News Feeds Clustering Research Study

Abuel-Futuh, Haytham 01 April 2015 (has links)
With over 0.25 billion web pages hosted in the World Wide Web, it is virtually impossible to navigate through the Internet. Many applications try to help users achieve this task. For example, search engines build indexes to make the entire World Wide Web searchable, and news curators allow users to browse topics of interest on different structured sites. One problem that arises for these applications and others with similar goals is identifying documents with similar contents. This helps the applications show users documents with unique contents as well as group various similar documents under similar topics. There has been a lot of effort into algorithms that can achieve that task. Prior research include Yang, Pierce & Carbonell (1998) research where they looked at the problem of identifying news events exploiting chronology order, Nallapati, et al (2004) research who built a dependency model for news events and Shah & Elbahesh (2004) research where they used Jaccard coefficient to generate a flat list of topics. This research will identify training and testing datasets, and it will train and evaluate (Pera & Ng) algorithm. The chosen algorithm is a hierarchical clustering algorithm that incorporates many of the ideas researched earlier. In evaluation phase, error will be measured in the ratio of miss-categorized documents to the total number of documents. The research will show error can be as low as 0.03 with a model built on a single node processing 1000 random distinct documents. In evaluation of the algorithm, the experiments will show that (Pera & Ng)’s fuzzy equivalence algorithm does produce acceptable results when compared to Google News as a reference. The algorithm, however, requires a huge amount of memory to hold the trained model. This renders it not suitable to run on portable devices.
2

Eliminating Redundant and Less-informative RSS News Articles Based on Word Similarity and A Fuzzy Equivalence Relation

Garcia, Ian 10 January 2007 (has links) (PDF)
The Internet has marked this era as the information age. There is no precedent in the amazing amount of information, especially network news, that can be accessed by Internet users these days. As a result, the problem of seeking information in online news articles is not the lack of them but being overwhelmed by them. This brings huge challenges regarding processing of online news feeds, i.e., how to determine which news article is important, how to determine the quality of each news article, and how to filter irrelevant and redundant information. In this thesis, we propose a method for filtering redundant and less-informative RSS news articles that solves the problem of excessive number of news feeds observed in RSS news aggregators. Our filtering approach measures similarity among RSS news entries by using the Fuzzy-Set Information Retrieval model and a fuzzy equivalent relation for computing word/sentence similarity to detect redundant and less-informative news articles.

Page generated in 0.068 seconds