  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Classifying Websites into Non-topical Categories

Thapa, Chaman Unknown Date
No description available.
2

Functionality Classification Filter for Websites

Järvstråt, Lotta January 2013
The objective of this thesis is to evaluate different models and methods for website classification. The websites are classified by functionality, specifically whether they are forums, news sites or blogs. The analysis addresses a search engine problem: knowing which categories the results of an information search come from. The data consist of two datasets, extracted from the web in January and April 2013. Together they contain approximately 40,000 observations, each observation being the text extracted from a website. Approximately 7,000 new word variables were then created from this text, as were variables based on Latent Dirichlet Allocation. One variable (the number of links) was created from the HTML code of the website. These datasets are used both in multinomial logistic regression with Lasso regularization and to create a Naive Bayes classifier. The best classifier for the data studied was multinomial logistic regression fitted on all variables, with Lasso used to reduce the number of variables. The accuracy of this model is 99.70%. When the time dependency of the models is considered, however, using the first dataset to fit the model and the second for testing, the accuracy is only 90.74%. This indicates that the data are time dependent and that website topics change over time.
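
As a rough illustration of the evaluation setup described in the abstract (fit on the earlier dataset, test on the later one), the sketch below trains a small multinomial Naive Bayes classifier on word counts. It is not the thesis's implementation: the toy texts, the labels, and the function names (`train_naive_bayes`, `predict`, `accuracy`) are invented for this example, and it covers only the Naive Bayes side, not the Lasso-regularized logistic regression or the Latent Dirichlet Allocation variables.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization; a real pipeline would do much more cleaning.
    return text.lower().split()


def train_naive_bayes(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            if word in model["vocab"]:  # words unseen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    return sum(predict(model, text) == label for text, label in docs) / len(docs)


# Hypothetical toy data standing in for the January (training) and
# April (testing) extractions; none of it comes from the thesis.
january = [
    ("reply quote thread posted by member", "forum"),
    ("thread reply members online post new topic", "forum"),
    ("breaking report government minister said today", "news"),
    ("reporter said officials announced today report", "news"),
    ("my diary today i baked and wrote my thoughts", "blog"),
    ("i wrote about my trip my thoughts and photos", "blog"),
]
april = [
    ("new topic reply posted by member", "forum"),
    ("minister announced report today", "news"),
    ("my thoughts about my garden", "blog"),
]

model = train_naive_bayes(january)
in_sample = accuracy(model, january)    # evaluated on the training period
out_of_time = accuracy(model, april)    # evaluated on the later period
```

Comparing `in_sample` with `out_of_time` mirrors the comparison the abstract reports between the two accuracy figures; on real data a gap between the two would point to vocabulary drifting between extraction dates.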
