11 |
The Academic Web Link Database Project / Thelwall, Mike, Binns, Ray, Harries, Gareth, Page-Kennedy, Teresa, Li, Xuemei, Musgrove, Peter, Price, Liz, Wilkinson, David / January 2002 (has links)
This project was created in response to the need for research into web links, including web link mining and the creation of link metrics. It aims to provide the raw data and software for researchers to analyse link structures without having to rely upon commercial search engines, and without having to run their own web crawler. The site will contain all of the following.
* Complete databases of the link structures of collections of academic web sites.
* Files of summary statistics about the link databases.
* Software tools for researchers to extract the information they are particularly interested in (see the sketch after this list).
* Descriptions of the methodologies used to crawl the web, so that the information provided can be critically evaluated.
* Files of information used in the web crawling process.
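A minimal Python sketch of the kind of analysis the link databases and tools are meant to support: computing in-degree and out-degree from source/target link records. The record format and the site names are illustrative assumptions, not the project's actual data format.

    from collections import Counter

    def link_metrics(pairs):
        """Compute in- and out-degree from (source, target) link records."""
        out_degree, in_degree = Counter(), Counter()
        for source, target in pairs:
            out_degree[source] += 1
            in_degree[target] += 1
        return out_degree, in_degree

    # Illustrative records; a real run would stream these from a link database.
    links = [("wlv.ac.uk", "ox.ac.uk"), ("wlv.ac.uk", "cam.ac.uk"),
             ("ox.ac.uk", "cam.ac.uk")]
    out_deg, in_deg = link_metrics(links)
    print(in_deg.most_common(2))  # the most-linked-to sites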
|
12 |
Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works / Nicholson, Scott / 12 1900 (has links)
Based on Nicholson's 2000 University of North Texas dissertation, "Creating a Criterion-Based Information Agent through Data Mining for Automated Identification of Scholarly Research on the World Wide Web," located at http://scottnicholson.com/scholastic/finaldiss.doc / This research creates an intelligent agent for automated collection development in a digital library setting. It uses a predictive model based on facets of each Web page to select scholarly works. The criteria came from the academic library selection literature, and a Delphi study was used to refine the list to 41 criteria. A Perl program was designed to analyze a Web page against each criterion and was applied to a large collection of scholarly and non-scholarly Web pages. Bibliomining, or data mining for libraries, was then used to create different classification models. Four techniques were used: logistic regression, non-parametric discriminant analysis, classification trees, and neural networks. Accuracy and return were used to judge the effectiveness of each model on test datasets. In addition, a set of problematic pages, difficult to classify because of their similarity to scholarly research, was gathered and classified using the models.
The resulting models could be used in the selection process to automatically create a digital library of Web-based scholarly research works. In addition, the technique can be extended to create a digital library of any type of structured electronic information.
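As a hedged illustration of the approach, the Python sketch below scores pages on a few stand-in criteria and fits one of the four model types named above, logistic regression. The three criteria, the training pages, and the labels are invented stand-ins for the study's 41-criterion list and page collection, and the original pipeline was written in Perl.

    from sklearn.linear_model import LogisticRegression

    def criterion_features(page_text):
        # Each element answers one selection criterion (all hypothetical here)
        text = page_text.lower()
        return [
            "references" in text or "bibliography" in text,  # cites sources?
            "abstract" in text,                              # has an abstract?
            "copyright" in text,                             # ownership statement?
        ]

    train_pages = ["abstract ... references ...", "buy now! copyright 2002",
                   "abstract and bibliography", "my vacation photos"]
    train_labels = [1, 0, 1, 0]  # 1 = scholarly, 0 = non-scholarly

    model = LogisticRegression().fit(
        [criterion_features(p) for p in train_pages], train_labels)
    print(model.predict([criterion_features("references and abstract included")]))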
|
13 |
Using Coplink to Analyze Criminal-Justice Data / Hauck, Roslin V., Atabakhsh, Homa, Ongvasith, Pichai, Gupta, Harsh, Chen, Hsinchun / 03 1900 (has links)
Artificial Intelligence Lab, Department of MIS, University of Arizona / As information technologies and applications become more overwhelming and diverse, persistent information-overload problems have become ever more urgent. Fallout from this trend has most affected government, specifically criminal-justice information systems. The explosive growth in the digital information maintained in the data repositories of federal, state, and local criminal-justice entities, and the spiraling need for cross-agency access to that information, have made utilizing it both increasingly urgent and increasingly difficult. The Coplink system applies a concept space, a statistics-based algorithmic technique that identifies relationships between suspects, victims, and other pertinent data, to accelerate criminal investigations and enhance law enforcement efforts.
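A rough Python sketch of a concept-space-style computation, under the simplifying assumption that entities (suspects, victims, vehicles) have already been extracted from each incident report: entities that co-occur in a report receive a weighted link, and the heaviest links suggest relationships worth investigating. The incident data are invented, and Coplink's actual algorithm is more elaborate.

    from collections import Counter
    from itertools import combinations

    incidents = [  # illustrative, pre-extracted entity sets per report
        {"J. Doe", "R. Roe", "white van"},
        {"J. Doe", "white van"},
        {"R. Roe", "A. Poe"},
    ]

    cooccurrence = Counter()
    for entities in incidents:
        for pair in combinations(sorted(entities), 2):
            cooccurrence[pair] += 1

    # Rank candidate relationships by co-occurrence weight
    for pair, weight in cooccurrence.most_common(3):
        print(pair, weight)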
|
14 |
Indexing the Internet / Hubbard, John / 11 1900 (has links)
This essay analyzes the question of how best to index the Internet.
|
15 |
Digital Library Archeology: A Conceptual Framework for Understanding Library Use through Artifact-Based Evaluation / Nicholson, Scott / January 2005 (has links)
Archeologists have used material artifacts found in a physical space to gain an understanding of the people who occupied that space. Likewise, as users wander through a digital library, they leave behind data-based artifacts of their activity in the virtual space. Digital library archeologists can gather these artifacts and employ inductive techniques, such as bibliomining, to create generalizations. These generalizations are the basis for hypotheses, which are tested to gain understanding about library services and users. In this article, the development of traditional archeological methods is presented and used to create a conceptual framework for artifact-based evaluation in digital libraries.
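As a hedged illustration of gathering such artifacts, the Python sketch below induces one simple generalization, which collections are used together within a session, from usage-log records. The log format and records are invented, not prescribed by the article; the resulting pattern would serve only as a hypothesis to test.

    from collections import defaultdict
    from itertools import combinations

    log = [  # (session id, collection accessed) -- illustrative records
        ("s1", "maps"), ("s1", "photographs"),
        ("s2", "maps"), ("s2", "photographs"),
        ("s3", "manuscripts"),
    ]

    sessions = defaultdict(set)
    for session_id, collection in log:
        sessions[session_id].add(collection)

    pair_counts = defaultdict(int)
    for used in sessions.values():
        for pair in combinations(sorted(used), 2):
            pair_counts[pair] += 1

    # The most frequent co-use pattern becomes a hypothesis about users
    print(max(pair_counts, key=pair_counts.get))  # ('maps', 'photographs')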
|
16 |
Special Issue Digital Government: technologies and practices / Chen, Hsinchun / 02 1900 (has links)
Artificial Intelligence Lab, Department of MIS, University of Arizona / The Internet is changing the way we live and do business. It also offers a tremendous opportunity for government to better deliver its content and services and to interact with its many constituents: citizens, businesses, and other government partners. In addition to providing information, communication, and transaction services, exciting and innovative transformations could occur with the new technologies and practices.
|
17 |
Web Searching, Search Engines and Information Retrieval / Lewandowski, Dirk / January 2005 (has links)
This article discusses Web search engines, mainly the challenges in indexing the World Wide Web, user behaviour, and the ranking factors used by these engines. Ranking factors are divided into query-dependent and query-independent factors, the latter of which have become more and more important in recent years. The possibilities of these factors are limited, however, especially for those based on the widely used link-popularity measures. The article concludes with an overview of factors that should be considered in determining the quality of Web search engines.
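A minimal Python sketch of how the two factor families might be combined: a query-dependent score from simple term matching plus a query-independent link-popularity value, mixed by a weight. The pages, the term-frequency scoring, and the mixing weight alpha are illustrative assumptions, not any engine's actual formula.

    def query_dependent_score(query, text):
        terms = query.lower().split()
        words = text.lower().split()
        return sum(words.count(t) for t in terms) / (len(words) or 1)

    def combined_score(query, page, alpha=0.7):
        # alpha balances topical relevance against link popularity
        return (alpha * query_dependent_score(query, page["text"])
                + (1 - alpha) * page["link_popularity"])

    pages = [
        {"url": "a.example", "text": "web search engines and ranking",
         "link_popularity": 0.2},
        {"url": "b.example", "text": "cooking recipes", "link_popularity": 0.9},
    ]
    results = sorted(pages, key=lambda p: combined_score("search ranking", p),
                     reverse=True)
    print([p["url"] for p in results])  # a.example ranks first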
|
18 |
Data Extraction from the Web (Extraction de données à partir du Web) / Achir, Badr / 07 1900 (has links) (PDF)
The Web has become rich in information circulating around the entire world via the Internet, which has driven the expansion of huge quantities of data. Moreover, these data are often unstructured and difficult to use in Web applications. On the one hand, users' interest in exploiting these data has grown competitively; on the other hand, the data are not easy for humans to consult. This interest has motivated researchers to consider approaches for extracting data from the Web, hence the emergence of wrappers. A wrapper is based on a set of extraction rules that define the location, within a document, of the data to be extracted. Several tools exist for building such rules. Our work addresses the problem of extracting data from the Web. In this document, we propose a Web data extraction method that uses machine learning to build the extraction rules. The extraction results of our approach show strong extraction precision and better performance in the learning process. Using our tool in a data-source querying application made it possible to meet users' needs in a very simple and automatic way.
______________________________________________________________________________
AUTHOR KEYWORDS: extraction, wrappers, extraction rules, machine learning, Web, Web applications
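A minimal Python sketch in the spirit of wrapper induction, under simplifying assumptions: from example pages with hand-labelled target values, learn left and right string delimiters (the extraction rules) and apply them to an unseen page. The page snippets are invented, and the thesis's actual learning method is not reproduced here.

    def learn_rule(examples):
        # examples: list of (page_text, target_value) pairs
        lefts, rights = [], []
        for page, value in examples:
            i = page.index(value)
            lefts.append(page[:i])
            rights.append(page[i + len(value):])
        # Longest common suffix of the left contexts...
        left = lefts[0]
        for ctx in lefts[1:]:
            while not ctx.endswith(left):
                left = left[1:]
        # ...and longest common prefix of the right contexts
        right = rights[0]
        for ctx in rights[1:]:
            while not ctx.startswith(right):
                right = right[:-1]
        return left, right

    def extract(page, rule):
        left, right = rule
        start = page.index(left) + len(left)
        return page[start:page.index(right, start)]

    examples = [
        ("<tr><td>Price:</td><td>12.50</td></tr>", "12.50"),
        ("<tr><td>Price:</td><td>8.99</td></tr>", "8.99"),
    ]
    rule = learn_rule(examples)
    print(extract("<tr><td>Price:</td><td>3.25</td></tr>", rule))  # -> 3.25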
|
19 |
Benefits of the application of web-mining methods and techniques for the field of analytical customer relationship management of the marketing function in a knowledge management perspective / Ertz, Myriam / 12 1900 (has links) (PDF)
Web Mining (WM) remains a relatively little-known technology, yet, when used appropriately, it proves highly useful for identifying the profiles and behaviours of prospective and existing customers in an Internet context. Technical advances in WM greatly improve the analytical side of Customer Relationship Management (CRM). This study follows an exploratory approach to determine whether WM alone achieves all the fundamental objectives of CRM, or whether it should instead be used jointly with traditional marketing research and classical analytical CRM (aCRM) methods to optimize CRM, and hence marketing, in an Internet context. The knowledge obtained through WM can then be managed within the organization in a Knowledge Management (KM) framework, in order to optimize relationships with new and/or existing customers, improve their customer experience, and ultimately deliver better value to them. Within an exploratory research design, semi-structured in-depth interviews were conducted to obtain the views of several (web) data mining experts. The study revealed that WM is well suited to segmenting prospective and existing customers, to understanding the online transactional behaviour of existing and prospective customers, and to determining the loyalty (or defection) status of existing customers. As such, it is a formidably effective tool, predictive through classification and estimation and descriptive through segmentation and association. On the other hand, WM performs less well at understanding the underlying, less obvious dimensions of customer behaviour. It is less appropriate for objectives related to describing how existing or prospective customers develop loyalty, satisfaction, defection, or attachment towards a brand on the Internet. This exercise is all the more difficult because the multichannel communication environment in which consumers operate strongly influences the relationships they develop with a brand: online behaviour may be merely a transposition, or at least an extension, of consumers' offline behaviour. WM is also a relatively incomplete tool for identifying the development of defection towards and from competitors, as well as the development of loyalty towards them. WM still needs to be complemented by traditional marketing research in order to achieve these more difficult but essential aCRM objectives. Finally, the conclusions of this research are directed mainly at firms and managers rather than at online customers, since the former rather than the latter possess the resources and processes needed to implement the WM research projects described.
______________________________________________________________________________
AUTHOR KEYWORDS: Web mining, knowledge management, customer relationship management, Internet data, consumer behaviour, data mining, consumer knowledge
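As a hedged illustration of the segmentation use case the study found WM well suited for, the Python sketch below clusters customers on a few session-level features with k-means. The features and values are invented; a real aCRM project would mine them from clickstream and transaction data.

    import numpy as np
    from sklearn.cluster import KMeans

    # rows: customers; columns: visits/month, avg. session minutes, purchases
    features = np.array([
        [20, 12.0, 5],
        [18, 10.5, 4],
        [2, 1.5, 0],
        [3, 2.0, 1],
    ])

    segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    print(segments)  # e.g. [0 0 1 1]: frequent buyers vs. occasional visitors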
|
20 |
Web Intelligence for Scaling Discourse of Organizations / January 2016 (has links)
abstract: The Internet and social media devices have created a new public space for debate on political and social topics (Papacharissi 2002; Himelboim 2010). Hotly debated issues span all spheres of human activity: from liberal vs. conservative politics, to radical vs. counter-radical religious debate, to the climate change debate in the scientific community, to the globalization debate in economics, and to the nuclear disarmament debate in security. Many prominent 'camps' have emerged within Internet debate rhetoric and practice (Dahlberg, n.d.).
In this research I utilized feature extraction and model fitting techniques to process the rhetoric found in the web sites of 23 Indonesian Islamic religious organizations, and later of 26 similar organizations from the United Kingdom, to profile their ideology and activity patterns along a hypothesized radical/counter-radical scale, and presented an end-to-end system that helps researchers visualize the data interactively on a timeline. The subject data of this study are the articles downloaded from the web sites of these organizations, dating from 2001 to 2011 and in 2013. I developed algorithms to rank these organizations by assigning them to probable positions on the scale. I showed that the developed Rasch model fits the data using Andersen's likelihood-ratio (LR) test. I created a gold standard of the ranking of these organizations through an expertise elicitation tool. Then, using my system, I computed expert-to-expert agreement and presented experimental results comparing the performance of three baseline methods, showing that the Rasch model not only outperforms the baselines but is also the only system that performs at expert-level accuracy.
I developed an end-to-end system that receives a list of organizations from experts, mines their web corpus, prepares discourse topic lists with expert support, ranks the organizations on scales with partial expert interaction, and finally presents them in an easy-to-use web-based analytic system. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016
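A minimal sketch, not the dissertation's implementation: joint maximum-likelihood estimation of a dichotomous Rasch model by gradient ascent, with organizations as rows (ability theta, the position on the latent scale) and discourse indicators as columns (difficulty b). The data are synthetic, and the model-fit check (Andersen's LR test) is omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(23, 10)).astype(float)  # 23 orgs x 10 indicators

    theta = np.zeros(X.shape[0])  # organization positions on the scale
    b = np.zeros(X.shape[1])      # indicator difficulties

    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        resid = X - p                      # observed minus expected
        theta += 0.01 * resid.sum(axis=1)  # log-likelihood gradient steps
        b -= 0.01 * resid.sum(axis=0)
        b -= b.mean()                      # fix the scale's origin

    print(np.argsort(theta))  # organizations ordered along the scale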
|