Spelling suggestions: "subject:"olla""
121 |
Approximation of OLAP queries on data warehousesCao, Phuong Thao 20 June 2013 (has links) (PDF)
We study the approximate answers to OLAP queries on data warehouses. We consider the relative answers to OLAP queries on a schema, as distributions with the L1 distance and approximate the answers without storing the entire data warehouse. We first introduce three specific methods: the uniform sampling, the measure-based sampling and the statistical model. We introduce also an edit distance between data warehouses with edit operations adapted for data warehouses. Then, in the OLAP data exchange, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built by streams of different sources. We show a lower bound on the size of the memory necessary to approximate queries. In this case, we approximate OLAP queries with a finite memory. We describe also a method to discover the statistical dependencies, a new notion we introduce. We are looking for them based on the decision tree. We apply the method to two data warehouses. The first one simulates the data of sensors, which provide weather parameters over time and location from different sources. The second one is the collection of RSS from the web sites on Internet.
|
122 |
Řízení informačních toků malé softwarové společnosti / Management of Information Flow of Small Software CompanyKlimeš, Jiří January 2012 (has links)
This diploma thesis deals with the management of information flow of a small company using Business Intelligence tools and the data mart. There is defined the problem of working with information from the point of view of selected company in the first part. The next part presents selected theoretical background on the basis of which was the aim achieved. The fourth part analyses the current situation of the company. There is recommended complex improvement of the current situation in the practical part. Selected information management problem is accomplished factually. There are also introduced suitable software tools, which were used for the solution.
|
123 |
Faça no seu ritmo mas não perca a hora: tomada de decisão sob demandado usuário utilizando dados da Web / Take your time, but don´t be late: on-demand decision-making using web dataSilva, Manoela Camila Barbosa da 07 August 2017 (has links)
Submitted by Milena Rubi ( ri.bso@ufscar.br) on 2017-10-16T17:29:35Z
No. of bitstreams: 1
SILVA_Manoela_2017.pdf: 5765067 bytes, checksum: 241f86d72385de30ffe23c0f4d49a868 (MD5) / Approved for entry into archive by Milena Rubi ( ri.bso@ufscar.br) on 2017-10-16T17:29:46Z (GMT) No. of bitstreams: 1
SILVA_Manoela_2017.pdf: 5765067 bytes, checksum: 241f86d72385de30ffe23c0f4d49a868 (MD5) / Approved for entry into archive by Milena Rubi ( ri.bso@ufscar.br) on 2017-10-16T17:29:57Z (GMT) No. of bitstreams: 1
SILVA_Manoela_2017.pdf: 5765067 bytes, checksum: 241f86d72385de30ffe23c0f4d49a868 (MD5) / Made available in DSpace on 2017-10-16T17:30:06Z (GMT). No. of bitstreams: 1
SILVA_Manoela_2017.pdf: 5765067 bytes, checksum: 241f86d72385de30ffe23c0f4d49a868 (MD5)
Previous issue date: 2017-08-07 / Não recebi financiamento / In the current knowledge age, with the continuous growth of the web data volume and where business decisions must be made quickly, traditional BI mechanisms become increasingly inaccurate in order to help the decision-making process. In response to this scenario rises the BI 2.0 concept, which is a recent one and is mainly based on the Web evolution, having as one of the main characteristics the use of Web sources in decision-making. However, data from Web tend to be volatile to be stored in the DW, making them a good option for situational data. Situational data are useful for decision-making queries at a particular time and situation, and can be discarded after analysis. Many researches have been developed regarding to BI 2.0, but there are still many points to be explored. This work proposes a generic architecture for Decision Support Systems that aims to integrate situational data from Web to user queries at the right time; this is, when the user needs them for decision making. Its main contribution is the proposal of a new OLAP operator, called Drill-Conformed, enabling data integration in an automatic way and using only the domain of values from the situational data.In addition, the operator collaborates with the Semantic Web, by making available the semantics-related discoveries. The case study is a streamings provision system. The results of the experiments are presented and discussed, showing that is possible to make the data integration in a satisfactory manner and with good processing times for the applied scenario. / Na atual era do conhecimento, com o crescimento contínuo do volume de dados da Web e onde decisões de negócio devem ser feitas de maneira rápida, os mecanismos tradicionais de BI se tornam cada vez menos precisos no auxílio à tomada de decisão. Em resposta a este cenário surge o conceito de BI 2.0, que se trata de um conceito recente e se baseia principalmente na evolução da Web, tendo como uma das principais características a utilização de fontes Web na tomada de decisão. Porém, dados provenientes da Web tendem a ser voláteis para serem armazenados no DW, tornando-se uma boa opção para dados transitórios. Os dados transitórios são úteis para consultas de tomada de decisão em um determinado momento e cenário e podem ser descartados após a análise. Muitos trabalhos têm sido desenvolvidos em relação à BI 2.0, mas ainda existem muitos pontos a serem explorados. Este trabalho propõe uma arquitetura genérica para SSDs, que visa integrar dados transitórios, provenientes da Web, às consultas de usuários no momento em que o mesmo necessita deles para a tomada de decisão. Sua principal contribuição é a proposta de um novo operador OLAP , denominado Drill-Conformed, capaz de realizar a integração dos dados de maneira automática e fazendo uso somente do domínio de valores dos dados transitórios. Além disso, o operador tem o intuito de colaborar com a Web semântica, a partir da disponibilização das informações por ele descobertas acerca do domínio de dados utilizado. O estudo de caso é um sistema de disponibilização de streamings . Os resultados dos experimentos são apresentados e discutidos, mostrando que é possível realizar a integração dos dados de maneira satisfatória e com bons tempos de processamento para o cenário aplicado.
|
124 |
Mineração de dados em data warehouse para sistema de abastecimento de águaGouveia, Roberta Macêdo Marques 29 May 2009 (has links)
Made available in DSpace on 2015-05-14T12:36:29Z (GMT). No. of bitstreams: 1
parte1.pdf: 3149727 bytes, checksum: 501b860d063bed8174828ab1c9287d13 (MD5)
Previous issue date: 2009-05-29 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This work propose to use technologies of databases with the aim of providing decision
support for managers of sector of sanitation, given that the services of water supply for use of
the population are a key indicator of quality of life. The fundamental idea is to collect
operational data, reduce them to the scope of the problem, organize them into a repository of
data, and finally apply the techniques OLAP and Data Mining algorithms to obtain results that
give managers a better understanding of the behavior and profile of the company. To facilitate
the application of the techniques of Data Mining is necessary that the data are stored properly.
Accordingly, an alternative for increasing the efficiency in storage, management and
operation of data to support the decision based on the development of Data Warehouse. This
is source of strategic information of the business, creating a competitive differential for the
company. In this context, was required to implement the repository of data, Data Warehouse,
to store, integrate and carry out consultations on the multidimensional data from the company
of water supply. Therefore, this Master's thesis aims to design a Data Warehouse relating to
Departmental Business, also known as Data Mart; applied the technology on the OLAP
multidimensional cubes of data, and run the Data Mining algorithms to the generation of a
decision support system to minimize the apparent losses in the urban water supply system. / Esta dissertação se propõe a utilizar tecnologias de Banco de Dados com a finalidade de
oferecer apoio à decisão para os gestores do setor de saneamento, haja vista que os serviços de
abastecimento de água para uso da população se constituem em um dos principais indicadores
da qualidade de vida da humanidade. A idéia fundamental consiste em coletar os dados
operacionais, reduzi-los ao escopo de um problema, organizá-los em um repositório de dados,
e finalmente aplicar as tecnologias OLAP e os algoritmos de Mineração de Dados, a fim de
obter resultados que proporcionem aos gestores um melhor entendimento do comportamento e
perfil da companhia. Para facilitar a aplicação de técnicas de Mineração de Dados é
necessário que estes dados estejam armazenados apropriadamente. Neste sentido, uma das
alternativas para o aumento da eficiência no armazenamento, gestão e operação dos dados
para o suporte a decisão baseia-se no desenvolvimento do Data Warehouse. Este ambiente
constitui fontes de informações estratégicas do negócio, gerando um diferencial competitivo
para a companhia. Diante deste contexto, se fez necessário a implementação do repositório de
dados, o Data Warehouse, para armazenar, integrar e realizar as consultas multidimensionais
sobre os dados extraídos da companhia de abastecimento de água. Portanto, esta dissertação
de mestrado tem como objetivos projetar um Data Warehouse Departamental referente ao
setor comercial, também conhecido como Data Mart; aplicar as tecnologias OLAP sobre os
cubos de dados multidimensionais; e executar algoritmos de Mineração de Dados visando a
geração de um sistema de apoio à decisão para minimização das perdas aparentes no sistema
de abastecimento urbano de água.
|
125 |
Graphs enriched by Cubes (GreC) : a new approach for OLAP on information networks / Graphes enrichis par des Cubes (GreC) : une nouvelle approche pour l’OLAP sur des réseaux d’informationJakawat, Wararat 27 September 2016 (has links)
L'analyse en ligne OLAP (Online Analytical Processing) est une des technologies les plus importantes dans les entrepôts de données, elle permet l'analyse multidimensionnelle de données. Cela correspond à un outil d'analyse puissant, tout en étant flexible en terme d'utilisation pour naviguer dans les données, plus ou moins en profondeur. OLAP a été le sujet de différentes améliorations et extensions, avec sans cesse de nouveaux problèmes en lien avec le domaine et les données, par exemple le multimedia, les données spatiales, les données séquentielles, etc. A l'origine, OLAP a été introduit pour analyser des données structurées que l'on peut qualifier de classiques. Cependant, l'émergence des réseaux d'information induit alors un nouveau domaine intéressant qu'il convient d'explorer. Extraire des connaissances à partir de larges réseaux constitue une tâche complexe et non évidente. Ainsi, l'analyse OLAP peut être une bonne alternative pour observer les données avec certains points de vue. Différents types de réseaux d'information peuvent aider les utilisateurs dans différentes activités, en fonction de différents domaines. Ici, nous focalisons notre attention sur les réseaux d'informations bibliographiques construits à partir des bases de données bibliographiques. Ces données permettent d'analyser non seulement la production scientifique, mais également les collaborations entre auteurs. Il existe différents travaux qui proposent d'avoir recours aux technologies OLAP pour les réseaux d'information, nommé ``graph OLAP". Beaucoup de techniques se basent sur ce qu'on peut appeler cube de graphes. Dans cette thèse, nous proposons une nouvelle approche de “graph OLAP” que nous appelons “Graphes enrichis par des Cubes” (GreC). Notre proposition consiste à enrichir les graphes avec des cubes plutôt que de construire des cubes de graphes. En effet, les noeuds et/ou les arêtes du réseau considéré sont décrits par des cubes de données. Cela permet des analyses intéressantes pour l'utilisateur qui peut naviguer au sein d'un graphe enrichi de cubes selon différents niveaux d'analyse, avec des opérateurs dédiés. En outre, notons quatre principaux aspects dans GreC. Premièrement, GreC considère la structure du réseau afin de permettre des opérations OLAP topologiques, et pas seulement des opérations OLAP classiques et informationnelles. Deuxièmement, GreC propose une vision globale du graphe avec des informations multidimensionnelles. Troisièmement, le problème de dimension à évolution lente est pris en charge dans le cadre de l'exploration du réseau. Quatrièmement, et dernièrement, GreC permet l'analyse de données avec une évolution du réseau parce que notre approche permet d'observer la dynamique à travers la dimension temporelle qui peut être présente dans les cubes pour la description des noeuds et/ou arêtes. Pour évaluer GreC, nous avons implémenté notre approche et mené une étude expérimentale sur des jeux de données réelles pour montrer l'intérêt de notre approche. L'approche GreC comprend différents algorithmes. Nous avons validé de manière expérimentale la pertinence de nos algorithmes et montrons leurs performances. / Online Analytical Processing (OLAP) is one of the most important technologies in data warehouse systems, which enables multidimensional analysis of data. It represents a very powerful and flexible analysis tool to manage within the data deeply by operating computation. OLAP has been the subject of improvements and extensions across the board with every new problem concerning domain and data; for instance, multimedia, spatial data, sequence data and etc. Basically, OLAP was introduced to analyze classical structured data. However, information networks are yet another interesting domain. Extracting knowledge inside large networks is a complex task and too big to be comprehensive. Therefore, OLAP analysis could be a good idea to look at a more compressed view. Many kinds of information networks can help users with various activities according to different domains. In this scenario, we further consider bibliographic networks formed on the bibliographic databases. This data allows analyzing not only the productions but also the collaborations between authors. There are research works and proposals that try to use OLAP technologies for information networks and it is called Graph OLAP. Many Graph OLAP techniques are based on a cube of graphs.In this thesis, we propose a new approach for Graph OLAP that is graphs enriched by cubes (GreC). In a different and complementary way, our proposal consists in enriching graphs with cubes. Indeed, the nodes or/and edges of the considered network are described by a cube. It allows interesting analyzes for the user who can navigate within a graph enriched by cubes according to different granularity levels, with dedicated operators. In addition, there are four main aspects in GreC. First, GreC takes into account the structure of network in order to do topological OLAP operations and not only classical or informational OLAP operations. Second, GreC has a global view of a network considered with multidimensional information. Third, the slowly changing dimension problem is taken into account in order to explore a network. Lastly, GreC allows data analysis for the evolution of a network because our approach allows observing the evolution through the time dimensions in the cubes.To evaluate GreC, we implemented our approach and performed an experimental study on a real bibliographic dataset to show the interest of our proposal. GreC approach includes different algorithms. Therefore, we also validated the relevance and the performances of our algorithms experimentally.
|
126 |
Utilização da tecnologia Data Warehousing e da ferramenta OLAP para apoiar a captação de doadores de sangue: estudo de caso no Hemonúcleo Regional de Jáu / Use of the technology dates warehousing and of the ferramenta OLAP to support the donors\' of blood reception: study of case of \"Regional Blood Bank of Jaú\"Katia Milena Gonçalves Meira 21 December 2004 (has links)
As pesquisas de apoio a decisão na área de saúde, muitas vezes enfocam o diagnóstico e não o caráter gerencial da instituição. Portanto, as unidades de saúde com todas as suas particularidades, necessitam de informações confiáveis e precisas para auxiliá-las na tomada de decisão. Os hemonúcleos convivem com uma luta constante na captação de doadores de sangue para que possam garantir hemocomponentes em quantidade necessária e qualidade para a sua região de abrangência e, informações que possam auxiliá-los na manutenção desses estoques são imprescindíveis. A tecnologia Data Warehousing pode trazer muitos benefícios nesse sentido, por possibilitar o armazenamento de dados históricos que relacionados podem demonstrar tendências e identificar relacionamentos muitas vezes desconhecidos, além de utilizar ferramentas de fácil interação com o usuário. Dessa forma, essa pesquisa tem como objetivo desenvolver uma ferramenta de apoio à decisão que extraia dados do banco de dados transacional atual do Hemonúcleo Regional de Jaú e consista esses dados de forma que as informações possam ser acessadas de maneira simples e rápida pelo usuário final. / Many times, researches sorrounding the health area focus on the diagnostic instead of the managerial character of the institution. Because of this, the health units with their details need reliable and precise information to help them in their decision making. The blood banks live a constant battle to capture blood donors to guarantee a good quantity and quality of blood components for their region and they need information that can help them maintain storage of the blood. Data Warehouse technology brings a lot of benefits because they allow the storage of historic data that, when related, can present tendences and identify unknown relationships. Besides, Data Warehouse technology frequently uses user-friendly interface tools. Therefore, this research has a aim to develop a decision supporting tool that extract data from the current transaction database of Regional Blood Bank of Jau and that check these data so that the information may be easily and quickly accessed by the final user(s).
|
127 |
Proposta de um modelo de sistema de apoio à decisão em vendas: uma aplicação / Proposal of decision support system\'s model for sales: an applicationFranklin Seoane Stefanoli 20 March 2003 (has links)
Este trabalho teve por objetivo o desenvolvimento de uma proposta de um modelo de sistema de apoio à decisão em vendas e sua aplicação. O levantamento sobre o perfil das vendas no mercado corporativo - de empresas-para-empresas, as técnicas de vendas, informações necessárias para a realização de uma venda eficiente, tal qual o controle das ações e resultados dos vendedores com a ajuda de relatórios, tudo isso aliado às tecnologias de data warehouse, data mart, OLAP foram essenciais na elaboração de uma proposta de modelo genérico e sua implantação. Esse modelo genérico foi aplicado levando-se em conta uma editora de listas e guias telefônicos hipotética, e foi construído buscando-se suprir os profissionais de vendas com informações que poderão melhorar a efetividade de suas vendas e dar-lhes maior conhecimento sobre seus produtos, clientes, usuários de listas e o mercado como um todo, além de suprir os gerentes de uma ferramenta rápida e confiável de auxílio à análise e coordenação dos esforços de vendas. A possibilidade de visualização rápida, confiável e personalizada das diversas informações permitidas por esse sistema, tal qual o êxito em responder às perguntas de pesquisas apresentadas no trabalho, comprova que essa aplicação poderá ser útil à empresa e em específico aos profissionais de vendas e gerentes tomadores de decisão. / The objective of this study was a development of a decision support system\'s application for sales. Relevant information about business markets and business buying behavior, principles of selling and evaluation models for sales representatives, all of this supported by data warehouse, data mart and OLAP technologies was important to create a generic DSS model as well your application. This generic model was created due to match some information needs of sales professionals from a hypothetic publishing company (yellow page). This system and the information generated by it would improve the effectiveness of sales professionals (sales reps), giving them more information and knowledge about your product, market, customers, users, and offering to sales managers as well decision makers a reliable, fast and interactive tool for sales effort\'s analysis and monitoring. A fast, interactive, reliable and customized visualization of information - characteristics of this system, as well your effectiveness of answering questions presented in the beginning of this study reinforce the utility and applicability of this model in order to satisfy sales professionals, managers and publishing companies information needs.
|
128 |
Um data warehouse de publicações científicas: indexação automática da dimensão tópicos de pesquisa dos data marts / A Data warehouse for scientific publications: automatic indexing of the research topic dimension for using in data martsAugusto Kanashiro 04 May 2007 (has links)
Este trabalho de mestrado insere-se no contexto do projeto de uma Ferramenta Inteligente de Apoio à Pesquisa (FIP), sendo desenvolvida no Laboratório de Inteligência Computacional do ICMC-USP. A ferramenta foi proposta para recuperar, organizar e minerar grandes conjuntos de documentos científicos (na área de computação). Nesse contexto, faz-se necessário um repositório de artigos para a FIP. Ou seja, um Data Warehouse que armazene e integre todas as informações extraídas dos documentos recuperados de diferentes páginas pessoais, institucionais e de repositórios de artigos da Web. Para suportar o processamento analítico on-line (OLAP) das informações e facilitar a ?mineração? desses dados é importante que os dados estejam armazenados apropriadamente. Dessa forma, o trabalho de mestrado teve como objetivo principal projetar um Data Warehouse (DW) para a ferramenta FIP e, adicionalmente, realizar experimentos com técnicas de mineração e Aprendizado de Máquina para automatizar o processo de indexação das informações e documentos armazenados no data warehouse (descoberta de tópicos). Para as consultas multidimensionais foram construídos data marts de forma a permitir aos pesquisadores avaliar tendências e a evolução de tópicos de pesquisa / This dissertation is related to the project of an Intelligent Tool for Research Supporting (FIP), being developed at the Laboratory of Computational Intelligence at ICMC-USP. The tool was proposed to retrieve, organize, and mining large sets of scientific documents in the field of computer science. In this context, a repository of articles becomes necessary, i.e., a Data Warehouse that integrates and stores all extracted information from retrieved documents from different personal and institutional web pages, and from article repositories. Data appropriatelly stored is decisive for supporting online analytical processing (OLAP), and ?data mining? processes. Thus, the main goal of this MSc research was design the FIP Data Warehouse (DW). Additionally, we carried out experiments with Data Mining and Machine Learning techniques in order to automatize the process of indexing of information and documents stored in the data warehouse (Topic Detection). Data marts for multidimensional queries were designed in order to facilitate researchers evaluation of research topics trend and evolution
|
129 |
Processamento de consultas analíticas com predicados de similaridade entre imagens em ambientes de data warehousing / Processing of analytical with similarity search predicates over images in data warehousing environmentsJefferson William Teixeira 29 May 2015 (has links)
Um ambiente de data warehousing oferece suporte ao processo de tomada de decisão. Ele consolida dados de fontes de informação distribuições, autônomas e heterogêneas em um único componente, o data warehouse, e realiza o processamento eficiente de consultas analíticas, denominadas OLAP (on-line analytical processing). Um data warehouse convencional armazena apenas dados alfanuméricos. Por outro lado, um data warehouse de imagens armazena, além desses dados convencionais, características intrínsecas de imagens, permitindo a realização de consultas analíticas estendidas com predicados de similaridade entre imagens. Esses ambientes demandam, portanto, a criação de estratégias que possibilitem o processamento eficiente dessas consultas complexas e custosas. Apesar de haver na literatura trabalhos voltados a índices bitmap para ambientes de data warehousing e métodos de acesso métricos para melhorar o desempenho de consultas por similaridade entre imagens, no melhor do nosso conhecimento, não há uma técnica que investigue essas duas questões em um mesmo contexto. Esta dissertação visa preencher essa lacuna na literatura por meio das seguintes contribuições: (i) proposta do ImageDWindex, um mecanismo para a otimização de consultas analíticas estendidas com predicados de similaridade entre imagens; e (ii) definição de diferentes estratégias de processamento de consultas sobre data warehouses de imagens usando o ImageDW-index. Para validar as soluções propostas, foram desenvolvidas duas outras contribuições secundárias, que são: (iii) o ImageDW-Gen, um gerador de dados com o objetivo de povoar o data warehouse de imagens; e (iv) a proposta de quatro classes de consulta, as quais enfocam em diferentes custos de processamento dos predicados de similaridade entre imagens. Utilizando o ImageDW-Gen, foram realizados testes de desempenho para investigar as vantagens introduzidas pelas estratégias propostas, de acordo com as classes de consultas definidas. Comparado com o trabalho mais correlato existente na literatura, o uso do ImageDWindex proveu uma melhora no desempenho do processamento de consultas IOLAP que variou em média de 55,57% até 82,16%, considerando uma das estratégias propostas. / A data warehousing environment offers support to the decision-making process. It consolidates data from distributed, autonomous and heterogeneous information sources into one of its main components, the data warehouse. Furthermore, it provides effcient processing of analytical queries (i.e. OLAP queries). A conventional data warehouse stores only alphanumeric data. On the other hand, an image data warehouse stores not only alphanumeric data but also intrinsic features of images, thus allowing data warehousing environments to perform analytical similarity queries over images. This requires the development of strategies to provide efficient processing of these complex and costly queries. Although there are a number of approaches in the literature aimed at the development of bitmap index for data warehouses and metric access methods for the efficient processing of similarity queries over images, to the best of our knowledge, there is not an approach that investigate these two issues in the same setting. In this research, we fill this gap in the literature by introducing the following main contributions: (i) the proposal of the ImageDW-index, an optimization mechanism aimed at the efficient processing of analytical queries extended with similarity predicates over images; and (ii) definition of different processing strategies for image data warehouses using the ImageDW-index. In order to validate these main proposals, we also introduce two secondary contributions, as follows: (iii) the ImageDW-Gen, a data generator to populate image data warehouses; and (iv) the proposal of four query classes, each one enforcing different query processing costs associated to the similarity predicates in image data warehousing environments. Using the ImageDW-Gen, performance tests were carried out in order to investigate the advantages introduced by the proposed strategies, according to the query classes. Compared to the most related work available in the literature, the ImageDW-index provided a performance gain that varied from 55.57% to 82.16%, considering one of the proposed strategies.
|
130 |
Semantic analysis in web usage miningNorguet, Jean-Pierre 20 March 2006 (has links)
With the emergence of the Internet and of the World Wide Web, the Web site has become a key communication channel in organizations. To satisfy the objectives of the Web site and of its target audience, adapting the Web site content to the users' expectations has become a major concern. In this context, Web usage mining, a relatively new research area, and Web analytics, a part of Web usage mining that has most emerged in the corporate world, offer many Web communication analysis techniques. These techniques include prediction of the user's behaviour within the site, comparison between expected and actual Web site usage, adjustment of the Web site with respect to the users' interests, and mining and analyzing Web usage data to discover interesting metrics and usage patterns. However, Web usage mining and Web analytics suffer from significant drawbacks when it comes to support the decision-making process at the higher levels in the organization.<p><p>Indeed, according to organizations theory, the higher levels in the organizations need summarized and conceptual information to take fast, high-level, and effective decisions. For Web sites, these levels include the organization managers and the Web site chief editors. At these levels, the results produced by Web analytics tools are mostly useless. Indeed, most of these results target Web designers and Web developers. Summary reports like the number of visitors and the number of page views can be of some interest to the organization manager but these results are poor. Finally, page-group and directory hits give the Web site chief editor conceptual results, but these are limited by several problems like page synonymy (several pages contain the same topic), page polysemy (a page contains several topics), page temporality, and page volatility.<p><p>Web usage mining research projects on their part have mostly left aside Web analytics and its limitations and have focused on other research paths. Examples of these paths are usage pattern analysis, personalization, system improvement, site structure modification, marketing business intelligence, and usage characterization. A potential contribution to Web analytics can be found in research about reverse clustering analysis, a technique based on self-organizing feature maps. This technique integrates Web usage mining and Web content mining in order to rank the Web site pages according to an original popularity score. However, the algorithm is not scalable and does not answer the page-polysemy, page-synonymy, page-temporality, and page-volatility problems. As a consequence, these approaches fail at delivering summarized and conceptual results. <p><p>An interesting attempt to obtain such results has been the Information Scent algorithm, which produces a list of term vectors representing the visitors' needs. These vectors provide a semantic representation of the visitors' needs and can be easily interpreted. Unfortunately, the results suffer from term polysemy and term synonymy, are visit-centric rather than site-centric, and are not scalable to produce. Finally, according to a recent survey, no Web usage mining research project has proposed a satisfying solution to provide site-wide summarized and conceptual audience metrics. <p><p>In this dissertation, we present our solution to answer the need for summarized and conceptual audience metrics in Web analytics. We first described several methods for mining the Web pages output by Web servers. These methods include content journaling, script parsing, server monitoring, network monitoring, and client-side mining. These techniques can be used alone or in combination to mine the Web pages output by any Web site. Then, the occurrences of taxonomy terms in these pages can be aggregated to provide concept-based audience metrics. To evaluate the results, we implement a prototype and run a number of test cases with real Web sites. <p><p>According to the first experiments with our prototype and SQL Server OLAP Analysis Service, concept-based metrics prove extremely summarized and much more intuitive than page-based metrics. As a consequence, concept-based metrics can be exploited at higher levels in the organization. For example, organization managers can redefine the organization strategy according to the visitors' interests. Concept-based metrics also give an intuitive view of the messages delivered through the Web site and allow to adapt the Web site communication to the organization objectives. The Web site chief editor on his part can interpret the metrics to redefine the publishing orders and redefine the sub-editors' writing tasks. As decisions at higher levels in the organization should be more effective, concept-based metrics should significantly contribute to Web usage mining and Web analytics. <p> / Doctorat en sciences appliquées / info:eu-repo/semantics/nonPublished
|
Page generated in 0.1567 seconds