1 |
"Aplicação de técnicas de data mining em logs de servidores web" (Application of data mining techniques to web server logs)
Chiara, Ramon. 09 May 2003
With the advent of the Internet, companies gained the ability to present themselves to the world. The possibility of putting a business on the World Wide Web (WWW) created a new type of data that companies can use to further improve their knowledge of the market: the sequence of clicks a user makes on a site. This data can be stored in a kind of Data Warehouse and analyzed with techniques for knowledge discovery in databases. Research is therefore needed to show how knowledge can be extracted from these click sequences. This work discusses and analyzes some of the techniques used to achieve this goal. It proposes a tool in which the click-sequence data are mapped to the attribute-value format used by the Discover System, a system under development in our Laboratory for planning and running experiments on the learning algorithms used during the Data Mining phase of the knowledge discovery process. It also proposes using the Inductive Logic Programming system Progol to extract relational knowledge from the click-sequence sessions that characterize how users interact with the pages visited on the site. Initial experiments on a real click sequence were carried out using Progol and some of the facilities already implemented in the Discover System.
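As a rough illustration of what mapping click sequences to an attribute-value format can look like, the sketch below groups raw clicks into sessions and emits one row of attributes per session. It is not the tool described in the abstract: the 30-minute timeout, the `(user, timestamp, page)` record layout, the function names, and the chosen attributes (`n_clicks`, `duration_s`, `visited:<page>`) are all assumptions made for the example.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed cutoff, not taken from the thesis


def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, page) clicks into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        by_user[user].append((ts, page))

    sessions = []
    for user, events in by_user.items():
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            # A long gap between consecutive clicks starts a new session.
            if event[0] - prev[0] > timeout:
                sessions.append((user, current))
                current = []
            current.append(event)
        sessions.append((user, current))
    return sessions


def to_attribute_value(sessions):
    """Turn each session into one row of attribute -> value pairs."""
    pages = sorted({page for _, events in sessions for _, page in events})
    rows = []
    for user, events in sessions:
        visited = {page for _, page in events}
        row = {
            "user": user,
            "n_clicks": len(events),
            "duration_s": events[-1][0] - events[0][0],
        }
        # One boolean attribute per page seen anywhere in the log.
        for page in pages:
            row[f"visited:{page}"] = page in visited
        rows.append(row)
    return rows


# Hypothetical click log: (user, timestamp in seconds, page).
clicks = [
    ("u1", 0, "/home"),
    ("u1", 40, "/products"),
    ("u1", 5000, "/home"),
    ("u2", 10, "/home"),
]
rows = to_attribute_value(sessionize(clicks))
```

Each row has the same set of attributes, which is what a propositional learner expects; relational structure such as the order of page visits is lost in this flattening, which is the kind of information an Inductive Logic Programming system would be used to recover.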
|
2 |
How Much of It is Real? Analysis of Paid Placement in Web Search Engine Results
Nicholson, Scott; Sierra, Tito; Eseryel, U. Yeliz; Park, Ji-Hong; Barkow, Philip; Pozo, Erika J.; Ward, Jane. January 2005
Most Web search tools integrate sponsored results with results from their internal editorial database in providing results to users. The goal of this research is to get a better idea of how much of the screen real estate displays "real" editorial results as compared to sponsored results. The overall average results are that 40% of all results presented on the first screen are "real" results, and when the entire first Web page is considered, 67% of the results are non-sponsored results. For general search tools like Google, 56% of the first screen and 82% of the first Web page contain non-sponsored results. Other results include that query structure makes a significant difference in the percentage of non-sponsored results returned by a search. Similarly, the topic of the query can also have a significant effect on the percentage of sponsored results displayed by most Web search tools.
|
3 |
A Proposal for Categorization and Nomenclature for Web Search Tools
Nicholson, Scott. January 2000
Also published in Journal of Internet Cataloging, 2(3/4), 9-28, 2000 / Ambiguities in Web search tool (more commonly known as "search engine") terminology are problematic when conducting precise, replicable research or when teaching others to use search tools. Standardized terminology would enable Web searchers to be aware of subtle differences between Web search tools and the implications of these for searching. A categorization and nomenclature for standardized classifications of different aspects of Web search tools is proposed, and advantages and disadvantages of using tools in each category are discussed.
|
4 |
HelpfulMed: Intelligent Searching for Medical Information over the Internet
Chen, Hsinchun; Lally, Ann M.; Zhu, Bin; Chau, Michael. 05 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems which require human mediation for currency. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better at perceived cluster recall.
|
5 |
Design and evaluation of a multi-agent collaborative Web mining system
Chau, Michael; Zeng, Daniel; Chen, Hsinchun; Huang, Michael; Hendriawan, David. 04 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / Most existing Web search tools work only with individual users and do not help a user benefit from previous search experiences of others. In this paper, we present the Collaborative Spider, a multi-agent system designed to provide post-retrieval analysis and enable across-user collaboration in Web search and mining. This system allows the user to annotate search sessions and share them with other users. We also report a user study designed to evaluate the effectiveness of this system. Our experimental findings show that when subjects had access to only a limited number (e.g., 1 or 2) of earlier search sessions done by other users, their search performance was degraded compared to individual search scenarios in which users had no access to previous searches. However, search performance improved significantly when subjects had access to more search sessions. This indicates that gain from collaboration through collaborative Web searching and analysis does not outweigh the overhead of browsing and comprehending other users' past searches until a certain number of shared sessions have been reached. In this paper, we also catalog and analyze several different types of user collaboration behavior observed in the context of Web mining.
|
6 |
Indexing and Abstracting on the World Wide Web: An Examination of Six Web Databases
Nicholson, Scott. January 1997
Web databases, commonly known as search engines or web directories, are currently the most useful way to search the Internet. In this article, the author draws from library literature to develop a series of questions that can be used to analyze these web searching tools. Six popular web databases are analyzed using this method. Using this analysis, the author creates three categories for web databases and explores the most appropriate searches to perform with each. The work concludes with a proposal for the ideal web database.
|
8 |
Introduction to the JASIST Special Topic Section on Web Retrieval and Mining: A Machine Learning Perspective
Chen, Hsinchun. 05 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / Research in information retrieval (IR) has advanced significantly in the past few decades. Many tasks, such as indexing and text categorization, can be performed automatically with minimal human effort. Machine learning has played an important role in such automation by learning various patterns such as document topics, text structures, and user interests from examples. In recent years, it has become increasingly difficult to search for useful information on the World Wide Web because of its large size and unstructured nature. Useful information and resources are often hidden in the Web. While machine learning has been successfully applied to traditional IR systems, applying these algorithms to the Web poses new challenges due to its large size, link structure, diversity in content and languages, and dynamic nature. On the other hand, such characteristics of the Web also provide interesting patterns and knowledge that are not present in traditional information retrieval systems.
|
9 |
Comparison of Three Vertical Search Spiders
Chau, Michael; Chen, Hsinchun. 05 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / Spiders are the software agents that search engines use to collect content for their databases. We investigated algorithms to improve the performance of vertical search engine spiders. The investigation addressed three approaches: a breadth-first graph-traversal algorithm with no heuristics to refine the search process, a best-first traversal algorithm that used a hyperlink-analysis heuristic, and a spreading-activation algorithm based on modeling the Web as a neural network.
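To make the difference between the first two approaches concrete, here is a minimal sketch of breadth-first and best-first traversal over a toy in-memory link graph. It is not the authors' spider: the graph, the function names, and the scoring function are hypothetical, and the in-link count used as the heuristic is only a stand-in for the hyperlink-analysis heuristic mentioned in the abstract. A real crawler would not know global in-link counts in advance and would have to estimate scores from the pages fetched so far.

```python
import heapq
from collections import deque

# Toy link graph (hypothetical): page -> pages it links to.
GRAPH = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}


def breadth_first(graph, seed, limit):
    """Visit pages in FIFO order of discovery, with no heuristic."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)
        for nxt in graph[page]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def best_first(graph, seed, limit, score):
    """Always visit the highest-scoring page on the frontier next."""
    order, seen = [], {seed}
    # heapq is a min-heap, so scores are negated to pop the largest first.
    frontier = [(-score(seed), seed)]
    while frontier and len(order) < limit:
        _, page = heapq.heappop(frontier)
        order.append(page)
        for nxt in graph[page]:
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return order


def in_link_count(graph):
    """Count incoming links per page (assumed stand-in for link analysis)."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


counts = in_link_count(GRAPH)
bfs_order = breadth_first(GRAPH, "seed", 5)
best_order = best_first(GRAPH, "seed", 5, counts.get)
```

On this graph the breadth-first spider visits pages level by level, while the best-first spider jumps ahead to the more heavily linked pages `c` and `d` before returning to `b`. With a fetch budget smaller than the graph, that ordering determines which pages are collected at all.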
|
10 |
Special issue: "Web retrieval and mining"
Chen, Hsinchun. 04 1900
Artificial Intelligence Lab, Department of MIS, University of Arizona / Search engines and data mining are two research areas that have experienced significant progress over the past few years. Overwhelming acceptance of the Internet as a primary medium for content delivery and business transactions has created unique opportunities and challenges for researchers. The richness of the web's multimedia content, the reach and timeliness of web-based publication, the proliferation of e-commerce activities and the potential for wireless web delivery have generated many interesting research problems. Technical, system, organizational and social research approaches are all needed to address these research problems. Many interesting web retrieval and mining research topics have emerged recently. These include, but are not limited to, the following: text and data mining on the web, web visualization, web intelligence and agents, web-based decision support and knowledge management, wireless web retrieval and visualization, web-based usability methodology, web-based analysis for eCommerce applications. This special issue consists of nine papers that report research in web retrieval and mining.
|