Spelling suggestions: "subject:"eeb log mining"" "subject:"beb log mining""
1 |
Enhancements of pre-processing, analysis and presentation techniques in web log mining / Žiniatinklio įrašų gavybos paruošimo, analizės ir rezultatų pateikimo naudotojui tobulinimasPabarškaitė, Židrina 13 July 2009 (has links)
As Internet is becoming an important part of our life, more attention is paid to the information quality and how it is displayed to the user. The research area of this work is web data analysis and methods how to process this data. This knowledge can be extracted by gathering web servers’ data – log files, where all users’ navigational patters about browsing are recorded.
The research object of the dissertation is web log data mining process. General topics that are related with this object: web log data preparation methods, data mining algorithms for prediction and classification tasks, web text mining. The key target of the thesis is to develop methods how to improve knowledge discovery steps mining web log data that would reveal new opportunities to the data analyst.
While performing web log analysis, it was discovered that insufficient interest has been paid to web log data cleaning process. By reducing the number of redundant records data mining process becomes much more effective and faster. Therefore a new original cleaning framework was introduced which leaves records that only corresponds to the real user clicks.
People tend to understand technical information more if it is similar to a human language. Therefore it is advantageous to use decision trees for mining web log data, as they generate web usage patterns in the form of rules which are understandable to humans. However, it was discovered that users browsing history length is different, therefore specific data... [to full text] / Internetui skverbiantis į mūsų gyvenimą, vis didesnis dėmesys kreipiamas į informacijos pateikimo kokybę, bei į tai, kaip informacija yra pateikta. Disertacijos tyrimų sritis yra žiniatinklio serverių kaupiamų duomenų gavyba bei duomenų pateikimo galutiniam naudotojui gerinimo būdai. Tam reikalingos žinios išgaunamos iš žiniatinklio serverio žurnalo įrašų, kuriuose fiksuojama informacija apie išsiųstus vartotojams žiniatinklio puslapius.
Darbo tyrimų objektas yra žiniatinklio įrašų gavyba, o su šiuo objektu susiję dalykai: žiniatinklio duomenų paruošimo etapų tobulinimas, žiniatinklio tekstų analizė, duomenų analizės algoritmai prognozavimo ir klasifikavimo uždaviniams spręsti. Pagrindinis disertacijos tikslas – perprasti svetainių naudotojų elgesio formas, tiriant žiniatinklio įrašus, tobulinti paruošimo, analizės ir rezultatų interpretavimo etapų metodologijas.
Darbo tyrimai atskleidė naujas žiniatinklio duomenų analizės galimybes. Išsiaiškinta, kad internetinių duomenų – žiniatinklio įrašų švarinimui buvo skirtas nepakankamas dėmesys. Parodyta, kad sumažinus nereikšmingų įrašų kiekį, duomenų analizės procesas tampa efektyvesnis. Todėl buvo sukurtas naujas metodas, kurį pritaikius žinių pateikimas atitinka tikruosius vartotojų maršrutus.
Tyrimo metu nustatyta, kad naudotojų naršymo istorija yra skirtingų ilgių, todėl atlikus specifinį duomenų paruošimą – suformavus fiksuoto ilgio vektorius, tikslinga taikyti iki šiol nenaudotus praktikoje sprendimų medžių algoritmus... [toliau žr. visą tekstą]
|
2 |
Caching Techniques For Dynamic Web ServersSuresha, * 07 1900 (has links)
Websites are shifting from static model to dynamic model, in order to deliver their users with dynamic, interactive, and personalized experiences. However, dynamic content generation comes at a cost – each request requires computation as well as communication across multiple components within the website and across the Internet. In fact, dynamic pages are constructed on the fly, on demand. Due to their construction overheads and non-cacheability, dynamic pages result in substantially increased user response times, server load and increased bandwidth consumption, as compared to static pages. With the exponential growth of Internet traffic and with websites becoming increasingly complex, performance and scalability have become major bottlenecks for dynamic websites.
A variety of strategies have been proposed to address these issues. Many of these solutions perform well in their individual contexts, but have not been analyzed in an integrated fashion. In our work, we have carried out a study of combining a carefully chosen set of these approaches and analyzed their behavior. Specifically, we consider solutions based on the recently-proposed fragment caching technique, since it ensures both correctness and freshness of page contents. We have developed mechanisms for reducing bandwidth consumption and dynamic page construction overheads by integrating fragment caching with various techniques such as proxy-based caching of dynamic contents, pre-generating pages, and caching program code.
We start with presenting a dynamic proxy caching technique that combines the benefits of both proxy-based and server-side caching approaches, without suffering from their individual limitations. This technique concentrates on reducing the bandwidth consumption due to dynamic web pages. Then, we move on to presenting mechanisms for reducing dynamic page construction times -- during normal loading, this is done through a hybrid technique of fragment caching and page pre-generation, utilizing the excess capacity with which web servers are typically provisioned to handle peak loads. During peak loading, this is achieved by integrating fragment-caching and code-caching, optionally augmented with page pre-generation.
In summary, we present a variety of methods for integrating existing solutions for serving dynamic web pages with the goal of achieving reduced bandwidth consumption from the web infrastructure perspective, and reduced page construction times from user perspective.
|
Page generated in 0.0871 seconds