About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Web usage mining for click fraud detection

Neves, André Pacheco Pereira January 2010 (has links)
Internship carried out at AuditMark, supervised by Eng. Pedro Fortuna / Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 2010
2

A multidimensional data model for monitoring web usage and optimizing website topology

Wu, Hao-cun, 吳浩存. January 2004 (has links)
Published or final version / Abstract / Table of contents / Mathematics / Master / Master of Philosophy
3

Finding And Evaluating Patterns In Web Repository Using Database Technology And Data Mining Algorithms

Özakar, Belgin; Püskülcü, Halis. January 2002 (has links) (PDF)
Thesis (Master)--İzmir Institute of Technology, İzmir, 2002. / Includes bibliographical references (59-61).
4

Web mining techniques for query log analysis and expertise retrieval. / Web挖掘技術及其在搜索引擎查詢日誌和專家搜索中的應用 / CUHK electronic theses & dissertations collection / Web wa jue ji shu ji qi zai sou suo yin qing cha xun ri zhi he zhuan jia sou suo zhong de ying yong

January 2009 (has links)
Deng, Hongbo. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 156-175). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
5

Workload characterization and customer interaction at e-commerce web servers

Wang, Qing 27 October 2004
Electronic commerce servers have a significant presence in today's Internet. Corporations want to maintain high availability, sufficient capacity, and satisfactory performance for their E-commerce Web systems, and want to provide satisfactory services to customers. Workload characterization and the analysis of customers' interactions with Web sites are the bases upon which to analyze server performance, plan system capacity, manage system resources, and personalize services at the Web site. To date, little empirical evidence has been published that identifies the characteristics of E-commerce Web workloads and the behaviours of customers.

This thesis analyzes the Web access logs at public Web sites for three organizations: a car rental company, an IT company, and the Computer Science department of the University of Saskatchewan. In these case studies, the characteristics of Web workloads are explored at the request level, function level, resource level, and session level; customers' interactions with Web sites are analyzed by identifying and characterizing session groups. The main E-commerce Web workload characteristics and performance implications are: i) requests for dynamic Web objects are an important part of the workload and should be characterized separately, since the system processes them differently; ii) some popular image files embedded in the same Web page are always requested together, so requesting and sending these files in a bundle would greatly reduce the overhead of processing their requests; iii) the percentage of requests for each Web page category tends to be stable in the workload when the time scale is large enough, which is helpful in forecasting workload composition; iv) the Secure Socket Layer (SSL) protocol is heavily used, and most Web objects are either requested primarily through SSL or primarily not through SSL; and v) session groups with different characteristics are identified for all logs. The analysis of session groups may be helpful in improving system performance, maximizing revenue throughput of the system, providing better services to customers, and managing and planning system resources.

A hybrid clustering algorithm, which combines the minimum spanning tree method with the k-means clustering algorithm, is proposed to identify session clusters. Session clusters obtained using the three session representations Pages Requested, Navigation Pattern, and Resource Usage are similar enough that the different session representations can be used interchangeably to produce similar groupings. A grouping based on one session representation is believed to be sufficient to answer questions about server performance, resource management, capacity planning, and Web site personalization that would previously have required multiple different groupings. Grouping by Pages Requested is recommended since it is the simplest, and data on requested Web pages is relatively easy to obtain from HTTP logs.
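The hybrid clustering step mentioned in this abstract, combining the minimum spanning tree method with k-means to identify session clusters, can be illustrated with a rough sketch. The Python code below is not the thesis's implementation: it shows one common hybrid scheme in which the longest MST edges are cut to seed k initial groups and k-means then refines them, and the function name `hybrid_cluster` and the toy session features are assumptions made for the example.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

# Illustrative sketch only: one common MST + k-means hybrid, in which the
# longest MST edges are cut to obtain initial groups and k-means then refines
# them. The thesis's exact formulation may differ.

def hybrid_cluster(sessions, k):
    """Cluster session feature vectors (n_sessions x n_features) into k groups."""
    dist = squareform(pdist(sessions))            # pairwise distance matrix
    mst = minimum_spanning_tree(dist).toarray()   # MST as a dense edge matrix

    # Cut the k-1 longest MST edges, leaving k connected components.
    edges = np.argwhere(mst > 0)
    longest = edges[np.argsort(mst[edges[:, 0], edges[:, 1]])[-(k - 1):]]
    for i, j in longest:
        mst[i, j] = 0.0
    _, labels = connected_components(mst, directed=False)

    # Use the component means as initial centroids for k-means refinement.
    centroids = np.vstack([sessions[labels == c].mean(axis=0) for c in range(k)])
    return KMeans(n_clusters=k, init=centroids, n_init=1).fit_predict(sessions)

# Hypothetical usage: sessions described by per-category page-request counts.
rng = np.random.default_rng(0)
sessions = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(5, 1, (20, 3))])
print(hybrid_cluster(sessions, k=2))
```

In this scheme the MST supplies data-driven initial centroids, which is typically what makes the k-means refinement less sensitive to random initialization than plain k-means.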
6

Personalized web search re-ranking and content recommendation

Jiang, Hao, 江浩 January 2013 (has links)
In this thesis, I propose a method for establishing a personalized recommendation system for re-ranking web search results and recommending web contents. The method is based on personal reading interest, which can be reflected by the user's dwell time on each document or webpage. I acquire document-level dwell times via a customized web browser or a mobile device. To obtain better precision, I also explore the possibility of tracking gaze position and facial expression, from which I can determine the attractiveness of different parts of a document. Inspired by the idea of the Google Knowledge Graph, I also establish a graph-based ontology to maintain a user profile that describes the user's personal reading interest. Each node in the graph is a concept, which represents the user's potential interest in this concept. I also use dwell time to measure concept-level interest, which can be inferred from document-level dwell times. The graph is generated based on Wikipedia. According to the estimated concept-level user interest, my algorithm can estimate a user's potential dwell time over a new document, based on which personalized webpage re-ranking can be carried out. I compare the rankings produced by my algorithm with rankings generated by popular commercial search engines and a recently proposed personalized ranking algorithm. The results clearly show the superiority of my method.

I also use my personalized recommendation framework in other applications. A good example is personalized document summarization. The same knowledge graph is employed to estimate the weight of every word in a document; combined with a traditional document summarization algorithm focused on text mining, I can generate a personalized summary that emphasizes the user's interest in the document.

To deal with images and videos, I present a new image search and ranking algorithm for retrieving unannotated images by collaboratively mining online search results, which consist of online image and text search results. The online image search results are leveraged as reference examples to perform content-based image search over unannotated images. The online text search results are used to estimate each reference image's relevance to the search query, since not all of the online image search results are closely related to the query. Overall, the key contribution of my method lies in its ability to deal with unreliable online image search results by jointly mining the visual and textual aspects of online search results. Through such collaborative mining, my algorithm infers the relevance of an online search result image to a text query. Once I estimate a query relevance score for each online image search result, I can selectively use query-specific online search result images as reference examples for retrieving and ranking unannotated images. To explore the performance of my algorithm, I tested it both on a standard public image dataset and on several modestly sized personal photo collections. I also compared the performance of my method with that of two peer methods. The results are very positive, indicating that my algorithm is superior to existing content-based image search algorithms for retrieving and ranking unannotated images. Overall, the main advantage of my algorithm comes from its collaborative mining over online search results in both the visual and the textual domains. / Published or final version / Computer Science / Doctoral / Doctor of Philosophy
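The re-ranking idea described in this abstract, inferring concept-level interest from document-level dwell times and scoring unseen documents against that profile, can be sketched as follows. This Python sketch is illustrative only and is not the author's system: the even split of dwell time across a document's concepts, the additive scoring, and the names `update_profile`, `score_document`, and `rerank` are all assumptions.

```python
from collections import defaultdict

# Illustrative sketch only: infers concept-level interest from document-level
# dwell times and re-ranks new documents by predicted interest. The concept
# extraction and weighting are placeholders, not the thesis's actual method.

def update_profile(profile, doc_concepts, dwell_time):
    """Spread a document's dwell time evenly over its concepts."""
    if not doc_concepts:
        return
    share = dwell_time / len(doc_concepts)
    for concept in doc_concepts:
        profile[concept] += share

def score_document(profile, doc_concepts):
    """Predict interest in a new document as the sum of its concepts' weights."""
    return sum(profile.get(c, 0.0) for c in doc_concepts)

def rerank(profile, results):
    """Re-rank search results (each tagged with concepts) by predicted interest."""
    return sorted(results,
                  key=lambda doc: score_document(profile, doc["concepts"]),
                  reverse=True)

# Hypothetical usage: browsing history as (concepts, dwell seconds) pairs.
profile = defaultdict(float)
history = [
    (["machine learning", "search engines"], 120.0),
    (["search engines", "personalization"], 95.0),
    (["football"], 5.0),
]
for concepts, dwell in history:
    update_profile(profile, concepts, dwell)

results = [
    {"url": "a.example", "concepts": ["football", "news"]},
    {"url": "b.example", "concepts": ["personalization", "machine learning"]},
]
print([doc["url"] for doc in rerank(profile, results)])
```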
7

Characterizing Web linking and usage with hierarchical models /

Lou, Wenwu. January 2005 (has links)
Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2005. / Includes bibliographical references (leaves 142-151). Also available in electronic version.
8

Web workload analysis and session characterization using clustering

Jha, Deepak. 2006 (has links)
Thesis (M.S.)--West Virginia University, 2006. / Title from document title page. Document formatted into pages; contains ix, 108 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 105-108).
9

Semantic Analysis in Web Usage Mining

Norguet, Jean-Pierre E 20 March 2006 (has links)
With the emergence of the Internet and of the World Wide Web, the Web site has become a key communication channel in organizations. To satisfy the objectives of the Web site and of its target audience, adapting the Web site content to the users' expectations has become a major concern. In this context, Web usage mining, a relatively new research area, and Web analytics, the part of Web usage mining that has emerged most strongly in the corporate world, offer many Web communication analysis techniques. These techniques include prediction of the user's behaviour within the site, comparison between expected and actual Web site usage, adjustment of the Web site with respect to the users' interests, and mining and analyzing Web usage data to discover interesting metrics and usage patterns.

However, Web usage mining and Web analytics suffer from significant drawbacks when it comes to supporting the decision-making process at the higher levels of the organization. According to organization theory, the higher levels of an organization need summarized and conceptual information to take fast, high-level, and effective decisions. For Web sites, these levels include the organization managers and the Web site chief editors. At these levels, the results produced by Web analytics tools are mostly useless, because most of them target Web designers and Web developers. Summary reports such as the number of visitors and the number of page views can be of some interest to the organization manager, but these results are poor. Page-group and directory hits give the Web site chief editor conceptual results, but these are limited by several problems such as page synonymy (several pages contain the same topic), page polysemy (a page contains several topics), page temporality, and page volatility.

Web usage mining research projects, for their part, have mostly left aside Web analytics and its limitations and have focused on other research paths, such as usage pattern analysis, personalization, system improvement, site structure modification, marketing business intelligence, and usage characterization. A potential contribution to Web analytics can be found in research on reverse clustering analysis, a technique based on self-organizing feature maps that integrates Web usage mining and Web content mining in order to rank the Web site pages according to an original popularity score. However, the algorithm is not scalable and does not answer the page-polysemy, page-synonymy, page-temporality, and page-volatility problems. As a consequence, these approaches fail to deliver summarized and conceptual results. An interesting attempt to obtain such results has been the Information Scent algorithm, which produces a list of term vectors representing the visitors' needs. These vectors provide a semantic representation of the visitors' needs and can be easily interpreted. Unfortunately, the results suffer from term polysemy and term synonymy, are visit-centric rather than site-centric, and are not scalable to produce. Finally, according to a recent survey, no Web usage mining research project has proposed a satisfying solution for providing site-wide summarized and conceptual audience metrics.

In this dissertation, we present our solution to the need for summarized and conceptual audience metrics in Web analytics. We first describe several methods for mining the Web pages output by Web servers, including content journaling, script parsing, server monitoring, network monitoring, and client-side mining. These techniques can be used alone or in combination to mine the Web pages output by any Web site. The occurrences of taxonomy terms in these pages can then be aggregated to provide concept-based audience metrics. To evaluate the results, we implement a prototype and run a number of test cases with real Web sites. According to the first experiments with our prototype and SQL Server OLAP Analysis Service, concept-based metrics prove to be extremely summarized and much more intuitive than page-based metrics. As a consequence, concept-based metrics can be exploited at higher levels in the organization: organization managers can redefine the organization strategy according to the visitors' interests, and the metrics give an intuitive view of the messages delivered through the Web site, allowing the Web site communication to be adapted to the organization objectives. The Web site chief editor, for his part, can interpret the metrics to redefine the publishing orders and the sub-editors' writing tasks. As decisions at higher levels in the organization should be more effective, concept-based metrics should significantly contribute to Web usage mining and Web analytics.
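The aggregation behind the concept-based metrics described in this abstract, counting occurrences of taxonomy terms in the pages a site outputs and weighting them by how often each page is served, can be sketched as follows. This Python sketch is an illustration under stated assumptions, not the dissertation's prototype: the two-concept taxonomy, the hit-weighted summation, and the names `concept_counts` and `audience_metrics` are invented for the example.

```python
import re
from collections import Counter

# Illustrative sketch only: aggregates occurrences of taxonomy terms in the
# pages output by a Web site into concept-level audience metrics, weighted by
# how often each page was served. Taxonomy, weighting, and names are assumptions.

TAXONOMY = {
    "admissions": ["tuition", "application", "enrolment"],
    "research":   ["laboratory", "publication", "grant"],
}

def concept_counts(page_text):
    """Count taxonomy-term occurrences in one page, grouped by concept."""
    words = Counter(re.findall(r"[a-z]+", page_text.lower()))
    return {concept: sum(words[t] for t in terms)
            for concept, terms in TAXONOMY.items()}

def audience_metrics(pages, hits):
    """Weight each page's concept counts by its number of hits and sum them."""
    totals = Counter()
    for url, text in pages.items():
        for concept, n in concept_counts(text).items():
            totals[concept] += n * hits.get(url, 0)
    return totals

# Hypothetical served pages and hit counts taken from an access log.
pages = {"/apply": "Tuition and application deadlines for enrolment are listed here.",
         "/labs":  "Our laboratory publication list and grant awards."}
hits = {"/apply": 340, "/labs": 120}
print(audience_metrics(pages, hits).most_common())
```

In a real deployment, `hits` would come from the access log and `pages` from whichever of the page-mining methods listed above (content journaling, script parsing, server monitoring, network monitoring, or client-side mining) the site uses.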
