Predictive data mining in a collaborative editing system: the Wikipedia articles for deletion process.

Master of Science / Department of Computing and Information Sciences / William H. Hsu

In this thesis, I examine the Articles for Deletion (AfD) system in Wikipedia, a large-scale collaborative editing project. Articles in Wikipedia can be nominated for deletion by registered users, who are expected to cite criteria for deletion from the Wikipedia deletion policy. For example, an article can be nominated for deletion if it contains copyright violations, vandalism, or advertising or other spam without relevant content. Articles whose subject matter does not meet the notability criteria, or that contain any other content not suitable for an encyclopedia, are also subject to deletion.
The AfD page for an article is where Wikipedians (users of Wikipedia) discuss whether the article should be deleted. Listed articles are normally discussed for at least seven days, after which the deletion process proceeds based on community consensus. The page may then be kept, merged or redirected, transwikied (i.e., copied to another Wikimedia project), renamed or moved to another title, userfied (i.e., migrated to a user subpage), or deleted per the deletion policy. Users can vote to keep, delete, or merge the nominated article, and these votes can be viewed on the article’s AfD page. However, this polling does not necessarily determine the outcome of the AfD process; in fact, Wikipedia policy specifically stipulates that a vote tally alone should not be considered a sufficient basis for a decision to delete or retain a page.
In this research, I apply machine learning methods to determine how the final outcome of an AfD process is affected by factors such as the difference between versions of an article, the number of edits, and the number of disjoint edits (according to some contiguity constraints). My goal is to predict the outcome of an AfD by analyzing the AfD page and the editing history of the article. The technical objectives are to extract features from the AfD discussion and from the version history, as shown on the edit history page, that capture factors such as those listed above, can be tested for relevance, and provide a basis for inductive generalization over past AfDs. Applications of such feature analysis include prediction and recommendation, with the performance goal of improving the precision and recall of AfD outcome prediction.
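The kinds of edit-history features and evaluation measures named above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Revision` record, the `extract_features` and `precision_recall` helpers, the use of `difflib` to measure version differences, and the reading of "disjoint edits" as maximal runs of consecutive revisions by the same editor are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import groupby


@dataclass
class Revision:
    """Hypothetical record for one saved version of an article."""
    editor: str
    text: str


def extract_features(revisions):
    """Compute simple edit-history features for one article."""
    # Dissimilarity between consecutive versions: 0.0 = identical, 1.0 = nothing shared.
    diffs = [
        1.0 - SequenceMatcher(None, a.text, b.text).ratio()
        for a, b in zip(revisions, revisions[1:])
    ]
    # Assumed contiguity constraint: consecutive revisions by the same
    # editor are collapsed into a single "disjoint" edit.
    disjoint = sum(1 for _ in groupby(r.editor for r in revisions))
    return {
        "n_edits": len(revisions),
        "n_disjoint_edits": disjoint,
        "mean_diff": sum(diffs) / len(diffs) if diffs else 0.0,
    }


def precision_recall(y_true, y_pred, positive="delete"):
    """Precision and recall for one outcome class (here, 'delete')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    history = [Revision("a", "x"), Revision("a", "xy"), Revision("b", "xyz")]
    print(extract_features(history))
    print(precision_recall(["delete", "keep", "delete", "keep"],
                           ["delete", "delete", "keep", "keep"]))
```

Feature dictionaries of this kind, one per past AfD, could be paired with the recorded outcomes to train any standard classifier; the precision and recall helper then scores its predictions for a chosen outcome class.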

Identifier: oai:union.ndltd.org:KSU/oai:krex.k-state.edu:2097/12026
Date: January 1900
Creators: Ashok, Ashish Kumar
Publisher: Kansas State University
Source Sets: K-State Research Exchange
Language: en_US
Detected Language: English
Type: Thesis