Global ETD Search

11	Clickstream Analysis Kliegr, Tomáš January 2007 The thesis introduces current research trends in clickstream analysis and proposes a new heuristic for dimensionality reduction of semantically enriched data in Web Usage Mining (WUM). Click fraud and conversion fraud are identified as key prospective application areas for WUM. The thesis documents a conversion-fraud vulnerability of Google Analytics and proposes a defense: new clickstream acquisition software that collects data with sufficient granularity and structure to allow data mining approaches to fraud detection. Three variants of the K-means clustering algorithm and three association rule mining systems are evaluated and compared on real-world web usage data.
12	Improving opinion mining with feature-opinion association and human computation. / 利用特徵意見結合及人類運算改進意見挖掘 / Li yong te zheng yi jian jie he ji ren lei yun suan gai jin yi jian wa jue January 2009 Chan, Kam Tong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves [101]-113). / Abstracts in English and Chinese. / Abstract / Acknowledgement / Chapter 1 --- Introduction / Chapter 1.1 --- Major Topic / Chapter 1.1.1 --- Opinion Mining / Chapter 1.1.2 --- Human Computation / Chapter 1.2 --- Major Work and Contributions / Chapter 1.3 --- Thesis Outline / Chapter 2 --- Literature Review / Chapter 2.1 --- Opinion Mining / Chapter 2.1.1 --- Feature Extraction / Chapter 2.1.2 --- Sentiment Analysis / Chapter 2.2 --- Social Computing / Chapter 2.2.1 --- Social Bookmarking / Chapter 2.2.2 --- Social Games / Chapter 3 --- Feature-Opinion Association for Sentiment Analysis / Chapter 3.1 --- Motivation / Chapter 3.2 --- Problem Definition / Chapter 3.2.1 --- Definitions / Chapter 3.3 --- Closer look at the problem / Chapter 3.3.1 --- Discussion / Chapter 3.4 --- Proposed Approach / Chapter 3.4.1 --- Nearest Opinion Word (DIST) / Chapter 3.4.2 --- Co-Occurrence Frequency (COF) / Chapter 3.4.3 --- Co-Occurrence Ratio (COR) / Chapter 3.4.4 --- Likelihood-Ratio Test (LHR) / Chapter 3.4.5 --- Combined Method / Chapter 3.4.6 --- Feature-Opinion Association Algorithm / Chapter 3.4.7 --- Sentiment Lexicon Expansion / Chapter 3.5 --- Evaluation / Chapter 3.5.1 --- Corpus Data Set / Chapter 3.5.2 --- Test Data set / Chapter 3.5.3 --- Feature-Opinion Association Accuracy / Chapter 3.6 --- Summary / Chapter 4 --- Social Game for Opinion Mining / Chapter 4.1 --- Motivation / Chapter 4.2 --- Social Game Model / Chapter 4.2.1 --- Definitions / Chapter 4.2.2 --- Social Game Problem / Chapter 4.2.3 --- Social Game Flow / Chapter 4.2.4 --- Answer Extraction Procedure / Chapter 4.3 --- Social Game Properties / Chapter 4.3.1 --- Type of Information / Chapter 4.3.2 --- Game Structure / Chapter 4.3.3 --- Verification Method / Chapter 4.3.4 --- Game Mechanism / Chapter 4.3.5 --- Player Requirement / Chapter 4.4 --- Design Guideline / Chapter 4.5 --- Opinion Mining Game Design / Chapter 4.5.1 --- OpinionMatch / Chapter 4.5.2 --- FeatureGuess / Chapter 4.6 --- Summary / Chapter 5 --- Tag Sentiment Analysis for Social Bookmark Recommendation System / Chapter 5.1 --- Motivation / Chapter 5.2 --- Problem Statement / Chapter 5.2.1 --- Social Bookmarking Model / Chapter 5.2.2 --- Social Bookmark Recommendation (SBR) Problem / Chapter 5.3 --- Proposed Approach / Chapter 5.3.1 --- Social Bookmark Recommendation Framework / Chapter 5.3.2 --- Subjective Tag Detection (STD) / Chapter 5.3.3 --- Similarity Matrices / Chapter 5.3.4 --- User-Website matrix / Chapter 5.3.5 --- User-Tag matrix / Chapter 5.3.6 --- Website-Tag matrix / Chapter 5.4 --- Pearson Correlation Coefficient / Chapter 5.5 --- Social Network-based User Similarity / Chapter 5.6 --- User-oriented Website Ranking / Chapter 5.7 --- Evaluation / Chapter 5.7.1 --- Bookmark Data / Chapter 5.7.2 --- Social Network / Chapter 5.7.3 --- Subjective Tag List / Chapter 5.7.4 --- Subjective Tag Detection / Chapter 5.7.5 --- Bookmark Recommendation Quality / Chapter 5.7.6 --- System Evaluation / Chapter 5.8 --- Summary / Chapter 6 --- Conclusion and Future Work / Chapter A --- List of Symbols and Notations / Chapter B --- List of Publications / Bibliography Data mining--Mathematical models Web usage mining Information retrieval
13	Extraction de données et apprentissage automatique pour les sites web adaptatifs [Data extraction and machine learning for adaptive web sites] Murgue, Thierry 12 December 2006 (PDF) The work presented here falls within the field of knowledge discovery from data. A topical and interesting setting was chosen: adaptive web sites. To build such user-adapted sites as automatically as possible, we choose to learn models of users or, more precisely, of their types of navigation on a given web site. These models are learned by grammatical inference. The data available in a Web context are particularly difficult to collect cleanly. We choose to focus on server log files, removing the noise inherent in them. Grammatical inference can generalize its input data to obtain good language models. We work on similarity measures between languages in order to evaluate the quality of the learned models. Introducing a Euclidean measure between language models represented as automata overcomes the problems of existing metrics. Theoretical results show that this measure has the properties of a true distance. Finally, we present various experimental results on web data, which we preprocess before using them to learn user models by stochastic grammatical inference. The results obtained are noticeably better than those in the state of the art, notably on the task of predicting the next page in a user's navigation. [INFO:INFO_OH] Computer Science/Other Machine learning Web Usage Mining
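The abstract above does not state the formula for its Euclidean measure. One natural reading, offered here only as an assumption, is the ordinary L2 distance between the string distributions defined by two stochastic automata A and B over an alphabet Σ, where P_A(w) and P_B(w) are the probabilities each automaton assigns to the string w (all symbols are illustrative, not taken from the thesis):

```latex
d_2(A, B) = \sqrt{\sum_{w \in \Sigma^{*}} \bigl( P_A(w) - P_B(w) \bigr)^{2}}
```

Under this reading the measure is symmetric, is zero exactly when the two distributions coincide, and satisfies the triangle inequality, which is consistent with the abstract's claim that it behaves as a true distance.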
14	Design of Index Structures for Supporting Personalized Information Filtering on the Internet Chen, Tsu-I 25 July 2003 The rapid growth of the WWW creates many new challenges for information filtering. Information Filtering (IF) is an area of research that develops tools for discriminating between relevant and irrelevant information. IF finds good matches between web pages and users' information needs. Users first describe what they need, i.e., their user profiles, to start the service. A profile index is built on these profiles. A stream of incoming web pages is then fed into the matching process. Each incoming web page is represented in the same form as a user profile. In this way, the users who are interested in an incoming web page can be identified by comparing the description of the web page with each user profile. Finally, the web page is recommended to the users whose profiles are among the filtered results. A critical issue for an information filtering service is therefore how to index the user profiles for efficient matching. Indexing the user profiles can reduce both the storage space and the processing time for modifying the profiles. In this thesis, we first propose a count-based tree method, which takes the count of each keyword into consideration, to reduce the large storage space needed by the tree method. Next, three large-itemset-based methods are proposed to reduce the storage space: the count-major large itemset method, the weighted large itemset method, and the hybrid method. In these three methods, we first cluster profiles with similar interests into the same group. Then, for each cluster, we apply association rule mining techniques to construct the index.
We design the three methods using the idea of the Apriori algorithm, one of the well-known approaches to mining association rules, but we modify the minimum support and the goal of the algorithm. We may not always output the large itemset Lk; that is, we may only use Lw, where w < k. In summary, the storage cost of each of our four methods is less than that of the tree method proposed by Yan and Garcia-Molina. According to our simulation results, each of our four methods may provide the best result, depending on the input data set. Next, we propose a large-itemset-based approach to the incremental update of the index structure for storing keywords, to reduce the update cost. When a user's interests change often, the index structure must support low-cost updates. We take the weight of each keyword into consideration: a keyword whose weight is above the threshold is a long-term interest, and a keyword whose weight is below the threshold is a short-term interest. Because short-term interests are more likely to be modified than long-term interests, we can update the short-term interests locally. According to our simulation results, our method reduces the update cost required by Wu and Chen's methods. index information filtering personalization profile web usage mining
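The long-term/short-term split described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's index structure: the `Profile` class, the `build_profile` and `update_keyword` helpers, the example weights, and the choice to treat a weight equal to the threshold as short-term are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """One user's keywords, split by weight (hypothetical layout)."""
    long_term: dict[str, float] = field(default_factory=dict)
    short_term: dict[str, float] = field(default_factory=dict)


def build_profile(weights: dict[str, float], threshold: float) -> Profile:
    """Place each keyword in the long-term or short-term part by its weight."""
    profile = Profile()
    for keyword, weight in weights.items():
        # Assumption: a weight exactly equal to the threshold counts as short-term.
        target = profile.long_term if weight > threshold else profile.short_term
        target[keyword] = weight
    return profile


def update_keyword(profile: Profile, keyword: str, weight: float,
                   threshold: float) -> bool:
    """Set a keyword's weight; return True if only the short-term part changed."""
    was_long = keyword in profile.long_term
    profile.long_term.pop(keyword, None)
    profile.short_term.pop(keyword, None)
    is_long = weight > threshold
    (profile.long_term if is_long else profile.short_term)[keyword] = weight
    # The update is "local" when the long-term part was never touched.
    return not was_long and not is_long


profile = build_profile({"python": 0.9, "sales": 0.2}, threshold=0.5)
local = update_keyword(profile, "sales", 0.3, threshold=0.5)
```

In this sketch an update that stays within the short-term part returns `True`, which stands in for the cheap local update the abstract describes; an update that moves a keyword into or out of the long-term part returns `False`.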
15	Mining novel Web user behavior models for access prediction / Wang, Hui. January 2003 Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 83-91). Also available in electronic version. Access restricted to campus users.
16	Effectively capturing user sessions on the Web using Web server logs Caldera, Amithalal, University of Western Sydney, College of Science, Technology and Environment, School of Computing and Information Technology January 2005 The usage of Web sites has been of interest to Web administrators and researchers ever since the Web started. Analysis of Web site usage data helps to understand the behaviour of its users, which is very important, as many important decisions can be made based on it. The user behaviour may be deduced by knowing all the activities each user does from the time s/he starts a session on the Web site until s/he leaves it, which is collectively called a user session. As Web server logs explicitly record the browsing behaviour of site users and are readily and economically available, this thesis explores the use of Web server logs in capturing user sessions on the Web. In order to protect users’ privacy, the standard Web server logs in general do not record the user identities or similar measures to uniquely identify the users. This thesis concentrates on heuristic strategies to infer user sessions. The heuristics exploit the background knowledge of user navigational behaviour recorded in the standard Web server logs without requiring additional information through cookies, logins and session ids. They identify relationships that may exist among the log data and make use of them to assess whether requests registered by the Web server can belong to the same individual and whether these requests were performed during the same visit. Researchers have proposed several heuristics, which were adversely affected by proxy servers, caching and undefined referrers. The thesis proposes new heuristics, which effectively address all the limitations, thus extending the work in this field. 
It also introduces a set of measures to quantify the performance of the heuristics, uses them to investigate the heuristics' efficiency on logs from three Web sites, and makes recommendations for Web sites to devise their own heuristics. The investigation has shown satisfactory results, and the new heuristics are applicable to a wider range of Web sites. / Doctor of Philosophy (PhD) Web usage mining Internet users statistics Web servers heuristic programming
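For readers unfamiliar with session inference from server logs, the following Python sketch shows a common baseline, not the new heuristics proposed in the thesis above. It assumes that requests sharing an IP address and user agent come from one visitor and that a gap of more than 30 minutes starts a new visit; the `sessionize` function, the tuple layout, and the timeout value are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group (ip, user_agent, timestamp, url) tuples into inferred sessions.

    Returns a list of ((ip, user_agent), [urls...]) pairs, one per session.
    """
    # Assumption: the (ip, user_agent) pair approximates one visitor.
    by_visitor = defaultdict(list)
    for ip, agent, ts, url in requests:
        by_visitor[(ip, agent)].append((ts, url))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort()  # chronological order within each visitor
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > timeout:
                # A long gap of inactivity closes the current session.
                sessions.append((visitor, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((visitor, current))
    return sessions


t0 = datetime(2005, 1, 1, 12, 0)
log = [
    ("1.2.3.4", "UA", t0, "/a"),
    ("1.2.3.4", "UA", t0 + timedelta(minutes=5), "/b"),
    ("1.2.3.4", "UA", t0 + timedelta(minutes=50), "/c"),
    ("5.6.7.8", "UA", t0, "/a"),
]
sessions = sessionize(log)
```

A baseline like this is exactly the kind of heuristic that proxy servers and caching distort: many users behind one proxy share an IP address, and cached pages never reach the server log.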
17	A partition based approach to approximate tree mining: a memory hierarchy perspective / Agarwal, Khushbu, January 2008 Thesis (M.S.)--Ohio State University, 2008. / Title from first page of PDF file. Includes bibliographical references (p. 57-60).
18	Automatic wrapper generation for the extraction of search result records from search engines Zhao, Hongkun. January 2007 Thesis (Ph. D.)--State University of New York at Binghamton, Dept. of Computer Science, 2007. / Includes bibliographical references.
19	Web mining from client side user activity log / Shun, Yeuk Kiu. January 2002 Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002. / Includes bibliographical references (leaves 85-90). Also available in electronic version. Access restricted to campus users.
20	Toward better website usage: leveraging data mining techniques and rough set learning to construct better-to-use websites / Khasawneh, Natheer Yousef. January 2005 Dissertation (Ph. D.)--University of Akron, Dept. of Electrical and Computer Engineering, 2005. / "August, 2005." Title from electronic dissertation title page (viewed 01/14/2006). Advisor, John Durkin; Committee members, John Welch, James Grover, Yueh-Jaw Lin, Yingcai Xiao, Chien-Chung Chan; Department Chair, Alex Jose De Abreu-Garcia; Dean of the College, George Haritos; Dean of the Graduate School, George R. Newkome. Includes bibliographical references.
