Global ETD Search

1	Information Aggregation using the Cameleon# Web Wrapper. Firat, Aykut; Madnick, Stuart; Yahaya, Nor Adnan; Kuan, Choo Wai; Bressan, Stéphane. 29 July 2005. Cameleon# is a web data extraction and management tool that provides information aggregation with advanced capabilities useful for developing value-added applications and services for electronic business and electronic commerce. To illustrate its features, we use an airfare aggregation example that collects data from eight online sites, including Travelocity, Orbitz, and Expedia. This paper covers the integration of Cameleon# with commercial database management systems, such as MS SQL Server, and with XML query languages, such as XQuery. Subjects: Cameleon#; web data extraction; web data management.
2	Reducing human effort in web data extraction. Guo, Jinsong. January 2017. The human effort required for large-scale web data extraction significantly affects both extraction flexibility and economic cost. Our work aims to reduce the human effort required by web data extraction tasks in three specific scenarios. (I) Data demand is unclear, and the user has to guide wrapper induction through annotations. To minimize annotation effort, wrappers should be robust, i.e., immune to changes in the webpage, so that they do not have to be regenerated, which would require re-annotation. Existing approaches primarily aim at generating accurate wrappers but rarely produce robust ones. We prove that the XPath wrapper induction problem is NP-hard and propose an approximate solution that estimates a set of top-k robust wrappers in polynomial time. Our method also meets an additional requirement: the induction process is noise-resistant, i.e., it tolerates slightly erroneous examples. (II) Data demand is clear, and user guidance should be avoided, i.e., wrapper generation should be fully unsupervised. Existing unsupervised methods that rely purely on repeated patterns in HTML structure or visual information are far from practical. Partially supervised methods, such as the state-of-the-art system DIADEM, work well for tasks involving only a small number of domains; however, the human effort of preparing annotators becomes a heavier burden as the number of domains increases. We propose RED (short for "redundancy"), an automatic approach that exploits content redundancy between a result page and its corresponding detail pages. RED requires no annotation (and thus no human effort), and its wrapper accuracy is significantly higher than that of previous unsupervised methods. (III) Data quality is unknown, and the user's related decisions are made blindly. Without knowing the error types and the number of errors of each type in the extracted data, extraction effort may be wasted on useless websites and, worse, human effort may be wasted on unnecessary or wrongly targeted data cleaning. Despite the importance of error estimation, no existing method addresses it sufficiently. We focus on two common types of errors in web data: duplicates and violations of integrity constraints. We propose a series of error-estimation approaches that adapt, extend, and synthesize recent innovations from diverse areas such as active learning, classifier calibration, F-measure estimation, and interactive training. Subjects: Computer science; Web data extraction.
3	Semantic Web: Introduction, Economic Significance, Perspectives (Semantic web: Einführung, wirtschaftliche Bedeutung, Perspektiven). Tusek, Jasna. January 2006. Also: diploma thesis, Wirtschaftsuniversität Wien (Vienna).
4	Improving the performance of wide area networks. Holt, Alan. January 1999. No description available. Subjects: 621.382.
5	An implementation of correspondence analysis in R and its application in the analysis of web usage. Nenadić, Oleg. January 2007. Also: dissertation, University of Göttingen, 2007.
6	Generating Data-Extraction Ontologies By Example. Zhou, Yuanqiu. 22 November 2005. Ontology-based data extraction is a resilient web data-extraction approach. A major limitation of this approach is that ontology experts must manually develop and maintain the data-extraction ontologies, which prevents ordinary users with little knowledge of conceptual models from using it. In this thesis we have designed and implemented a general framework, OntoByE, that generates data-extraction ontologies semi-automatically from a small set of examples collected by users. Experimental evidence shows that, with a limited amount of prior knowledge, OntoByE can interact with users to generate data-extraction ontologies for domains of interest to them. Subjects: Ontology; Web data; data extraction; Computer Sciences.
7	A causal approach to transitivity. Eu, Jinseung. January 2014. This thesis presents a causal approach to transitivity and proposes a model of transitivity based on the view that a single event is a single ‘causal impact’, consisting of a single causation and a single effect. It defines semantic intransitivity as events in which the effect is borne by and expressed through the actor, and semantic transitivity as events in which the effect is borne by and expressed through the patient. It finds evidence for this definition in the phenomenon of ‘selective specification’ of action or result by verbs with an actor and a patient. Furthermore, it proposes that the verb eat has dual event structures, intransitive and transitive, and tests and confirms this hypothesis using Web data. Subjects: 415.
8	Query-Based data mining for the web. Poblete Labra, Bárbara. 01 October 2009. The objective of this thesis is to study different applications of Web query mining for improving search engine ranking, Web information retrieval, and Web sites. The main motivation of this work is to take advantage of the implicit feedback that users leave behind as they navigate the Web. Throughout this work we seek to demonstrate the value of queries for extracting interesting rules, patterns, and information about the documents they reach. The models created in this doctoral work show that the "wisdom of the crowds" conveyed in queries has many applications that, overall, provide a better understanding of users' needs on the Web. This directly improves the general interaction of visitors with Web sites and search engines. Subjects: query mining; web data mining; information retrieval; 316.
9	Evaluating Query and Storage Strategies for RDF Archives. Fernandez Garcia, Javier David; Umbrich, Jürgen; Polleres, Axel; Knuth, Magnus. January 2018. There is an emerging demand for efficiently archiving and (temporally) querying different versions of evolving semantic Web data. As novel archiving systems start to address this challenge, foundations and standards for benchmarking RDF archives are needed to evaluate their storage space efficiency and the performance of different retrieval operations. To this end, we provide theoretical foundations for the design of data and queries to evaluate emerging RDF archiving systems. We then instantiate these foundations in a concrete set of queries over a real-world evolving dataset. Finally, we perform an empirical evaluation of various current archiving techniques and querying strategies on this data, meant to serve as a baseline for future developments in querying archives of evolving RDF data.
10	Integration of Heterogeneous Web-based Information into a Uniform Web-based Presentation. Janga, Prudhvi. 17 October 2014. No description available. Subjects: Computer Science; Tabular Web Data; Integration of Web Data; XML schema generation; schema mapping; Relational to XML; XML to relational.
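The content-redundancy idea summarized in result 2 (RED) can be illustrated with a minimal sketch. It assumes, hypothetically, that each result-page record has already been split into text segments in a consistent order and is paired with the plain text of its detail page; a segment position is then treated as a likely data field if its text reappears on the detail page for enough records. The function names, the `min_support` threshold, and the substring test are illustrative assumptions, not the algorithm from the thesis.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so layout differences do not block matches."""
    return " ".join(text.split()).lower()


def candidate_fields(pairs, min_support=0.5):
    """Return record positions whose text reappears on the corresponding detail page.

    pairs: list of (result-page record segments, detail-page text) tuples. Every
    record is assumed (hypothetically) to list its segments in the same order.
    min_support: fraction of pairs in which a position must be redundant.
    """
    if not pairs:
        return []
    hits = Counter()
    for record, detail in pairs:
        detail_norm = normalize(detail)
        for pos, segment in enumerate(record):
            seg_norm = normalize(segment)
            # Crude redundancy test: the normalized segment occurs verbatim
            # somewhere in the normalized detail page.
            if seg_norm and seg_norm in detail_norm:
                hits[pos] += 1
    return sorted(pos for pos, count in hits.items() if count / len(pairs) >= min_support)


if __name__ == "__main__":
    # Hypothetical example: two listings paired with their detail pages.
    pairs = [
        (["Cheap Hotel", "$120", "Book now"], "Cheap Hotel. Price: $120 per night."),
        (["Grand Inn", "$200", "Book now"], "Grand Inn. Rate: $200."),
    ]
    # "Book now" is page chrome that never appears on a detail page, so only
    # positions 0 (name) and 1 (price) are kept.
    print(candidate_fields(pairs))  # [0, 1]
```

Substring matching is deliberately simple here: very short segments (such as a single letter) match almost any page, and real result pages rarely line up segment by segment, so a practical system would need proper record segmentation and alignment.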
