31 |
Constructing hierarchical advisors and advisees relationships using web information
Hsi, Huei-Chan, 16 June 2003 (has links)
This research aims to build a social network system for college teachers by retrieving data from different sources available on the Internet; the characteristics of the constructed social network are represented in both graphic and numeric modes. By applying the relationships maintained in the social network, we can find the shortest path between any two researchers, or the ego-centric social network of an individual teacher, according to parameters given by a query user. That is to say, we have realized the knowledge-map concept for college teachers.
In this research, we focus on searching for the advisory relationships between advisors and students. Because a Ph.D. student may, after graduating, become an advisor guiding other students, we can apply the advisory relationship recursively to construct a multi-level family tree for a given advisor; the family tree can be viewed either in full or restricted to its non-leaf nodes. We also analyzed some interesting characteristics of the created family trees and compared them with human relationships in real society to evaluate and explain some phenomena observed in our academic community.
Furthermore, we combine the social network and the knowledge map to develop the ANIWEB system, which provides web-based query functions for users to search teachers' social networks. Two types of query are supported: one searches for a teacher's personal information, such as biography, educational background, specialty, and NSC projects; the other searches for social network information about a teacher of interest, such as multi-level advisory relationships, co-advisory relationships, the ego-centric social network, and the shortest path between any two teachers.
Users can apply different search patterns for different needs. For example, a user can first search for teachers with expertise in a given research topic, and then search the social network for the shortest path showing how to get in touch with the chosen expert.
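The shortest-path query described in this abstract can be sketched with a plain breadth-first search over the advisory network. This is an illustrative sketch, not the system's actual implementation; the graph representation and the researcher names used below are placeholder assumptions:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain of advisory
    relationships between two researchers.

    graph: dict mapping each researcher to the set of directly
    related researchers. Returns the path as a list of names,
    or None if no connection exists.
    """
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor == goal:
                return path + [neighbor]
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of relationships connects the two
```

An ego-centric network, in the same representation, would simply be the subgraph of researchers within a chosen distance of one teacher.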
|
32 |
MetaSpider: Meta-Searching and Categorization on the Web
Chen, Hsinchun; Fan, Haiyan; Chau, Michael; Zeng, Daniel, January 2001 (has links)
Artificial Intelligence Lab, Department of MIS, University of Arizona / It has become increasingly difficult to locate relevant information on the Web, even with the help of Web search engines. Two approaches to addressing the low precision and poor presentation of search results of current search tools are studied: meta-search and document categorization. Meta-search engines improve precision by selecting and integrating search results from generic or domain-specific Web search engines or other resources. Document categorization promises better organization and presentation of retrieved results. This article introduces MetaSpider, a meta-search engine that has real-time indexing and categorizing functions. We report in this paper the major components of MetaSpider and discuss related technical approaches. Initial results of a user evaluation study comparing MetaSpider, NorthernLight, and MetaCrawler in terms of clustering performance and of time and effort expended show that MetaSpider performed best in precision rate, but disclose no statistically significant differences in recall rate and time requirements. Our experimental study also reveals that MetaSpider exhibited a higher level of automation than the other two systems and facilitated efficient searching by providing the user with an organized, comprehensive view of the retrieved documents.
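The result-integration step that meta-search engines such as MetaSpider perform can be illustrated with reciprocal-rank fusion, one common way of merging ranked lists. The abstract does not specify MetaSpider's actual merging strategy, so this is a generic sketch rather than the system's algorithm:

```python
def merge_results(result_lists):
    """Merge ranked result lists from several search engines.

    Each list holds (url, title) tuples in rank order. A URL's score
    is the sum of its reciprocal ranks across engines, so results
    ranked highly by several engines rise to the top, and duplicates
    across engines are collapsed by URL.
    """
    scores, titles = {}, {}
    for results in result_lists:
        for rank, (url, title) in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
            titles.setdefault(url, title)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(url, titles[url]) for url in ranked]
```

A post-processing step would then categorize the merged list, as MetaSpider does with its real-time indexing and clustering.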
|
33 |
A sentiment-based meta search engine
Na, Jin-Cheon; Khoo, Christopher S.G.; Chan, Syin, January 2006 (has links)
This study is in the area of sentiment classification: classifying online review documents according to the overall sentiment expressed in them. This paper presents a prototype sentiment-based meta search engine developed to perform sentiment categorization of Web search results. It helps users quickly focus on recommended or non-recommended information by classifying Web search results into four categories: positive, negative, neutral, and non-review documents. It does this using an automatic classifier based on a supervised machine learning algorithm, the Support Vector Machine (SVM). The paper also discusses various issues encountered during prototype development and presents our approaches for resolving them. A user evaluation of the prototype was carried out, with positive responses from users.
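The paper's classifier is an SVM trained on labelled reviews. As a dependency-free stand-in, the sketch below trains a bag-of-words perceptron on a simplified two-class (positive/negative) version of the task; the training data and tokenisation are invented for illustration and are not from the paper:

```python
from collections import defaultdict

def train_perceptron(docs, labels, epochs=10):
    """Train a bag-of-words perceptron.

    docs: list of token lists; labels: +1 (positive) or -1 (negative).
    Returns a dict of per-word weights.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            score = sum(w[t] for t in tokens)
            if y * score <= 0:        # misclassified or on the boundary:
                for t in tokens:      # nudge weights toward the true label
                    w[t] += y
    return dict(w)

def classify(w, tokens):
    """Return +1 (positive) or -1 (negative) for a tokenised document."""
    return 1 if sum(w.get(t, 0.0) for t in tokens) > 0 else -1
```

The full system distinguishes four categories (positive, negative, neutral, non-review); a multi-class SVM such as scikit-learn's LinearSVC would be the closer match to the paper's approach.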
|
34 |
Visual Exploration of Web Spaces
Pascual Cid, Victor, 20 December 2010 (has links)
The vast amount of data that Web mining techniques generate from Web spaces is difficult to understand, suggesting the need to develop new techniques to gather insight from the data in order to assist in decision-making processes. This dissertation explores the use of InfoVis/VA techniques to assist in the exploration of Web spaces. More specifically, we present the development of a customisable prototype that has been used to analyse three different types of Web spaces with different information goals: the analysis of the usability of a website, the assessment of student behaviour in virtual learning environments, and the exploration of the structure of large asynchronous conversations in online forums. Echoing the call of the InfoVis/VA community for more research under realistic circumstances, we introduce the problems of analysing such Web spaces and explore the benefits of using the visualisations provided by our system with real users.
|
35 |
A Visualization Dashboard for Muslim Social Movements
January 2012 (has links)
Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a need to understand the origin, ideologies, and behavior of radical and counter-radical organizations, and how they evolve over time. Recognizing and supporting counter-radical organizations is one of the most important steps towards impeding radical organizations. Much research has already been done to categorize and recognize organizations, to understand their behavior, their interactions with other organizations, their target demographics, and their areas of influence, producing a huge amount of information. This thesis provides a powerful and interactive way to navigate through all this information using a visualization dashboard. The dashboard makes it easier for social scientists, policy analysts, military personnel, and others to visualize an organization's propensity towards violence and radicalism. It also tracks peaking religious, political, and socio-economic markers, as well as target demographics and locations. A powerful parametric search interface helps in narrowing down to specific scenarios and viewing the corresponding information about the organizations. This tool helps identify moderate counter-radical organizations and also has the potential to predict the orientation of various organizations based on current information. / Dissertation/Thesis / M.S. Computer Science 2012
|
36 |
Mejoramiento de una metodología para la identificación de website keyobject mediante la aplicación de tecnologías eye tracking, análisis de dilatación pupilar y algoritmos de web mining / Improvement of a methodology for identifying website keyobjects through the application of eye-tracking technology, pupil-dilation analysis, and web mining algorithms
Martínez Azocar, Gustavo Adolfo, January 2013 (has links)
Ingeniero Civil Industrial / The rapid growth of the Internet has produced a sustained increase in websites for all kinds of companies, organizations, and individuals, resulting in an immensely large supply. These sites are increasingly becoming an important channel both for direct communication with customers and for sales, so it is necessary to devise strategies that attract more users to a site and keep current users coming back. This raises the questions of what kind of information is useful to the end user, and how that information can be identified.
Previous work has approached this problem by applying web mining techniques to the content, structure, and usability of a website, in order to find patterns that generate information and knowledge from the data. These in turn would support better decisions about the structure and content of websites.
However, these techniques combined objective data (web logs) with subjective data (mainly surveys and focus groups), which exhibit high variability both within and between individuals. As a result, the subsequent analysis of the data may contain errors, leading to worse decisions.
To address this, this thesis project developed web mining algorithms that incorporate visual-exploration analysis and neuro-data. Since both data sources are objective, much of the variability in the results is eliminated, with a consequent improvement in the decisions to be made.
The main results of this project are web mining algorithms and user-behaviour models that incorporate visual-exploration analysis and data obtained through neuroscience techniques. A list of website keyobjects found on the test page for this project is also included.
Also included is a general review of the main topics on which the project is based: the Web and the Internet, the KDD process, web mining, eye-tracking systems, and website keyobjects. The technical and research scope of the thesis project is also specified.
It is concluded that the work was successful, even though the results of the algorithms are similar to those of the previous methodology. Nevertheless, a new avenue of site analysis is opened, given the relationships found between pupillary behaviour and site analysis. Some considerations and recommendations for continuing and improving this work are included.
|
37 |
Centralizace a správa distribuovaných informaci / Centralization and maintenance of distributed information
Valčák, Richard, January 2010 (has links)
The master's thesis deals with web mining: information sources, methods of unattended access to these sources, and a summary of available methods and tools. Web data mining is a very useful tool for acquiring required information for further processing. The work focuses on the design of a system created to gather required information from given sources. The master's thesis consists of three parts, all of which employ the developed library: an API used by programmers, a server application for gathering information over time (such as an exchange rate), and an example AWT application for processing tables available on the Internet.
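The thesis describes a Java library; purely as an illustration of the table-gathering idea, here is a minimal Python sketch that pulls the cells out of HTML tables using only the standard library (fetching the page itself is omitted, and the markup below is an invented example):

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], [], []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False
            self._row.append("".join(self._cell).strip())

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)

def extract_tables(html):
    """Return all table rows found in an HTML string as lists of cell texts."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.rows
```

A server application could run such an extractor on a schedule to track values like exchange rates over time, as the thesis describes.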
|
38 |
WebKnox: Web Knowledge Extraction
Urbansky, David, 21 August 2009 (has links) (PDF)
This thesis focuses on entity and fact extraction from the web. Different knowledge representations and techniques for information extraction are discussed before the design of a knowledge extraction system, called WebKnox, is introduced. The main contributions of this thesis are the trust ranking of extracted facts with a self-supervised learning loop, and the extraction system itself, with its composition of known and refined extraction algorithms. The techniques used show an improvement in precision and recall in most cases for entity and fact extraction, compared to the chosen baseline approaches.
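WebKnox's trust ranking uses a self-supervised learning loop whose details are beyond this abstract; the underlying intuition, that a fact extracted consistently from independent sources deserves more trust, can nonetheless be sketched as a simple agreement score (the data tuples below are invented, not WebKnox output):

```python
from collections import defaultdict

def trust_scores(extractions):
    """Score candidate facts by cross-source agreement.

    extractions: (entity, attribute, value, source) tuples. The trust of
    a value for a given (entity, attribute) pair is the fraction of
    distinct sources reporting it, so agreeing sources reinforce a fact
    and conflicting values split the trust between them.
    """
    supporters = defaultdict(set)   # (entity, attr, value) -> sources for it
    reporters = defaultdict(set)    # (entity, attr) -> all reporting sources
    for entity, attribute, value, source in extractions:
        supporters[(entity, attribute, value)].add(source)
        reporters[(entity, attribute)].add(source)
    return {fact: len(srcs) / len(reporters[fact[:2]])
            for fact, srcs in supporters.items()}
```

A self-supervised loop would go further, feeding high-trust facts back as training data to re-weight the extractors themselves.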
|
39 |
Hierarkisk klustring av klickströmmar: En metodik för identifiering av användargrupper / Hierarchical clustering of clickstreams: a methodology for identifying user groups
Schorn, Björn, January 2022 (has links)
Nasdaq develops and provides software solutions for clearing houses. There is an interest in developing an in-depth understanding of how the functionality of this product is used. One possibility is to use hierarchical clustering of clickstreams from the web interface. This report develops a methodology for such clustering and applies it to an existing dataset of clickstream logs. A Euclidean distance measure can work for simpler clusterings, such as grouping product pages. For a deeper analysis of user behavior through clustering of sessions, however, the Damerau–Levenshtein distance gives better results, as it also takes into account the order in which the pages within each session are visited.
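The Damerau–Levenshtein comparison of sessions can be sketched as follows. This is the restricted (optimal-string-alignment) variant, applied to sequences of page identifiers rather than characters; the page names in the test data are invented:

```python
def damerau_levenshtein(a, b):
    """Optimal-string-alignment distance between two page-view sequences.

    Unlike plain edit distance, an adjacent transposition (two pages
    visited in swapped order) counts as a single edit, which is why
    this distance suits session comparison.
    """
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]
```

The pairwise distance matrix produced this way can then be fed to an agglomerative clustering routine such as scipy.cluster.hierarchy.linkage to build the session hierarchy.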
|
40 |
Building the Dresden Web Table Corpus: A Classification Approach
Lehner, Wolfgang; Eberius, Julian; Braunschweig, Katrin; Hentsch, Markus; Thiele, Maik; Ahmadov, Ahmad, 12 January 2023 (has links)
In recent years, researchers have recognized relational tables on the Web as an important source of information. To support this research, we developed the Dresden Web Table Corpus (DWTC), a collection of about 125 million data tables extracted from the Common Crawl (CC), which contains 3.6 billion web pages and is 266 TB in size. As the vast majority of HTML tables are used for layout purposes and only a small share contains genuine tables with different surface forms, accurate table detection is essential for building a large-scale Web table corpus. Furthermore, correctly recognizing the table structure (e.g. horizontal listings, matrices) is important in order to understand the role of each table cell, distinguishing between label and data cells. In this paper, we present an extensive table layout classification that enables us to identify the main layout categories of Web tables with very high precision. To this end, we identify and develop a plethora of table features, different feature selection techniques, and several classification algorithms. We evaluate the effectiveness of the selected features and compare the performance of various state-of-the-art classification algorithms. Finally, the winning approach is employed to classify millions of tables, resulting in the Dresden Web Table Corpus (DWTC).
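The paper develops a large set of table features and compares several classifiers; the sketch below only illustrates the flavour of such features, with a toy extraction function and a deliberately naive decision rule whose thresholds are invented, not the paper's:

```python
def table_features(rows):
    """Extract simple layout features from a table given as rows of cell strings."""
    cells = [c for row in rows for c in row]
    n = len(cells) or 1
    numeric = sum(c.replace(".", "", 1).isdigit() for c in cells)
    empty = sum(not c.strip() for c in cells)
    return {
        "rows": len(rows),
        "cols": max((len(r) for r in rows), default=0),
        "numeric_ratio": numeric / n,   # genuine data tables tend to be numeric
        "empty_ratio": empty / n,       # layout tables often have empty filler cells
    }

def is_genuine_table(rows, min_rows=2, min_numeric=0.2):
    """Toy decision rule: layout tables tend to be narrow and non-numeric."""
    f = table_features(rows)
    return f["rows"] >= min_rows and f["cols"] >= 2 and f["numeric_ratio"] >= min_numeric
```

In the paper, features like these feed a trained classifier (the winning one chosen by evaluation) rather than fixed thresholds, and further classes distinguish layouts such as horizontal listings and matrices.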
|