Global ETD Search

201	Discovering web page communities for web-based data management Hou, Jingyu January 2002 (has links) The World Wide Web is a rich source of information and continues to expand in size and complexity. Mainly because data on the web lacks rigid and uniform data models or schemas, managing web data and retrieving information effectively and efficiently is becoming a challenging problem. Discovering web page communities, which capture the features of the web and web-based data to find intrinsic relationships among the data, is one effective way to solve this problem. A web page community is a set of web pages that has its own logical and semantic structures. In this work, we concentrate on web data in web page format and exploit hyperlink information to discover (construct) web page communities. Three main web page communities are studied in this work: the first consists of hub and authority pages, the second is composed of pages relevant to a given page (URL), and the last is a community with hierarchical cluster structures. To analyse hyperlinks, we establish a mathematical framework, in particular a matrix-based framework, to model hyperlinks. Within this framework, hyperlink analysis is placed on a solid mathematical base and the results are reliable. For the web page community that consists of hub and authority pages, we focus on eliminating noise pages from the page source concerned to obtain a better-quality page source, and in turn improve the quality of web page communities. We propose an innovative noise page elimination algorithm based on the hyperlink matrix model and mathematical operations, especially the singular value decomposition (SVD) of a matrix. The proposed algorithm exploits hyperlink information among the web pages, reveals page relationships at a deeper level, and numerically defines thresholds for noise page elimination.
The experimental results show the effectiveness and feasibility of the algorithm. This algorithm could also be used on its own in web-based data management systems to filter unnecessary web pages and reduce management cost. To construct a web page community that consists of pages relevant to a given page (URL), we propose two hyperlink-based relevant page finding algorithms. The first algorithm comes from the extended co-citation analysis of web pages. It is intuitive and easy to implement. The second takes advantage of linear algebra theory to reveal deeper relationships among the web pages and identify relevant pages more precisely and effectively. The corresponding page source construction for these two algorithms can prevent the results from being affected by malicious hyperlinks on the web. The experimental results show the feasibility and effectiveness of the algorithms. The research results could be used to enhance web search by caching the relevant pages for certain searched pages. For the purpose of clustering web pages to construct a community with hierarchical cluster structures, we propose an innovative web page similarity measurement that incorporates hyperlink transitivity and page importance (weight). Based on this similarity measurement, two types of hierarchical web page clustering algorithms are proposed. The first is an improvement of the conventional K-means algorithm. It is effective in improving page clustering, but is sensitive to the predefined similarity thresholds for clustering. The second type is the matrix-based hierarchical algorithm. Two algorithms of this type are proposed in this work: one takes cluster overlapping into consideration, the other does not. The matrix-based algorithms do not require predefined similarity thresholds for clustering, are independent of the order in which the pages are presented, and produce stable clustering results.
The matrix-based algorithms exploit intrinsic relationships among web pages within a uniform matrix framework, largely avoid human interference in the clustering procedure, and are easy to implement in applications. The experiments show the effectiveness of the new similarity measurement and the proposed algorithms in improving web page clustering. To better apply the above mathematical algorithms in practice, we generalize web page discovery as a special case of information retrieval and present a visualization system prototype, as well as technical details on visualization algorithm design, to support information retrieval based on linear algebra. The visualization algorithms could be smoothly applied to web applications. XML is a new standard for data representation and exchange on the Internet. In order to extend our research to cover this important web data, we propose an object representation model (ORM) for XML data. A set of transformation rules and algorithms is established to transform XML data (DTD and XML documents with or without DTD) into this model. This model encapsulates elements of XML data and data manipulation methods. A DTD-Tree is also defined to describe the logical structure of a DTD. It can also be used as an application program interface (API) for processing DTDs, such as transforming a DTD document into the ORM. With this data model, semantic meanings of the tags (elements) in XML data can be used for further research in XML data management and information retrieval, such as community construction for XML data. webpage XML object representation model (ORM) DTD
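The abstract of entry 201 mentions a relevant-page algorithm derived from co-citation analysis. As a rough illustration only, the sketch below implements plain co-citation counting (two pages are co-cited whenever some third page links to both). It is not the author's extended algorithm; the function names, the `min_count` threshold, and the toy link graph are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(links):
    """Count, for every pair of pages, how many pages link to both.

    `links` maps a page to the set of pages it links to (hypothetical input format).
    """
    counts = defaultdict(int)
    for targets in links.values():
        # Every pair of targets of the same source page is co-cited once.
        for a, b in combinations(sorted(targets), 2):
            counts[(a, b)] += 1
    return dict(counts)


def relevant_pages(links, page, min_count=1):
    """Rank pages by how often they are co-cited with `page`."""
    scored = []
    for (a, b), n in cocitation_counts(links).items():
        if n >= min_count and page in (a, b):
            other = b if a == page else a
            scored.append((other, n))
    # Highest co-citation count first; ties broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy link graph: p1..p4 are citing pages, a/b/c are cited pages.
links = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"a", "c"},
    "p4": {"b"},
}

print(relevant_pages(links, "b"))
```

Raising `min_count` drops weakly co-cited pages, which is one simple way to limit the effect of a single stray link.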
202	Génération et adaptation automatiques de mappings pour des sources de données XML [Automatic generation and adaptation of mappings for XML data sources] Xue, Xiaohui 11 December 2006 (has links) (PDF) Integrating information provided by multiple heterogeneous data sources is a growing need of current information systems. In this context, application needs are described by means of a target schema, and the way instances of the target schema are derived from the data sources is expressed by mappings. In this thesis, we address the automatic generation of mappings for XML data sources, as well as the adaptation of these mappings when changes occur in the target schema or in the data sources. We propose a three-phase approach to mapping generation: (i) decomposing the target schema into subtrees, (ii) searching for partial mappings for each of these subtrees, and finally (iii) generating mappings for the whole target schema from these partial mappings. The result of our approach is a set of mappings, each with its own semantics. When the information required by the target schema is not present in the sources, no mapping is produced. In that case, we propose relaxing certain constraints defined on the target schema so that mappings can be generated. We have developed a tool to support our approach. We have also proposed an approach for adapting existing mappings when changes occur in the sources or in the target schema. [INFO] Computer Science Integration XML Mappings
203	Automates pour l'analyse de documents XML compressés, applications à la sécurité d'accès [Automata for the analysis of compressed XML documents, with applications to access security] Fila, Barbara 03 November 2008 (has links) (PDF) The problem of extracting information from semi-structured documents, such as XML, is one of the most important areas of current research in computer science. It has generated a large body of work, both practical and theoretical. In this thesis, our study pursues two objectives: 1. evaluating queries on a document subject to an access control policy, 2. evaluating queries on a document that may be partially or fully compressed. Our study focuses mainly on the evaluation of unary queries, i.e., queries that select a set of nodes of the document satisfying the properties specified by the query. To express queries, we use XPath, the main selection language for XML documents. Thanks to its navigational axes and qualifying filters, XPath allows navigation in XML documents and the selection of nodes that answer the query. XPath expressions are the basis of several query formalisms such as XQuery and XSLT; they also make it possible to define access keys in XML Schema and XLink, and to reference elements of an external document in XPointer. [INFO] Computer Science automaton DAG XML
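To make the notion of a unary query in entry 203 concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which implements only a limited subset of XPath. The sample document, the `access` attribute, and the `select_ids` helper are hypothetical and are not taken from the thesis; the sketch shows node selection with a filter, not the automata-based evaluation the thesis studies.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the `access` attribute is invented for illustration.
DOC = """
<library>
  <book id="b1" access="public"><title>XML Basics</title></book>
  <book id="b2" access="restricted"><title>Tree Automata</title></book>
  <journal id="j1" access="public"><title>Data Letters</title></journal>
</library>
""".strip()

root = ET.fromstring(DOC)


def select_ids(tree_root, path):
    """Evaluate a unary query: return the ids of all nodes matching `path`."""
    return [node.get("id") for node in tree_root.findall(path)]


# Descendant axis plus an attribute filter on <book> elements only.
public_books = select_ids(root, ".//book[@access='public']")

# The same filter applied to elements of any name.
all_public = select_ids(root, ".//*[@access='public']")

print(public_books, all_public)
```

Each query returns a set of nodes from a single document, which is what distinguishes a unary query from one that returns tuples of nodes.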
204	Securing XML Web Services : using WS-security Antonsson, Martin January 2003 (has links) No description available. Fishbone Systems AB Datasäkerhet XML Web Services
205	A toolkit for managing XML data with a relational database management system Ramani, Ramasubramanian, January 2001 (has links) (PDF) Thesis (M.S.)--University of Florida, 2001. / Title from first page of PDF file. Document formatted into pages; contains x, 54 p.; also contains graphics. Vita. Includes bibliographical references (p. 50-53).
206	Incremental maintenance of materialized Xquery views El-Sayed, Maged F. January 2005 (has links) Thesis (Ph. D.)--Worcester Polytechnic Institute. / Keywords: XML; XQuery; incremental view maintenance. Includes bibliographical references (leaves 256-263).
207	Data hiding and detection in Office Open XML (OOXML) documents Raffay, Mohammad Ali 01 March 2011 (has links) With the rapid development and popularity of information technology, criminals and mischievous computer users are given avenues to commit crimes and malicious activities. One commonly used tactic, called steganography, is to hide information in a cover medium so that no one except the participants knows of its existence. Many techniques have been proposed for hiding data in images, video, and audio, but little research has been devoted to data hiding in the popular MS Office documents, which have recently adopted the Office Open XML (OOXML) format. In this research, we first focus on identifying several data hiding techniques for OOXML documents. Then, we design and develop a fast detection algorithm based on the unique internal structure of OOXML documents, which contain multiple XML files, using a multi-XML query technique. Experimental results show that the proposed detection algorithm outperforms the traditional one in terms of detection speed and completeness; performance is key to detecting hidden data in OOXML documents because millions of documents are generated and transferred over the internet every day. / UOIT Office Open XML Steganography OOXML Data hiding
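Entry 207 relies on the fact that an OOXML document is a ZIP package holding multiple XML files. The sketch below is a simplified heuristic, not the thesis's detection algorithm: it lists package entries that `[Content_Types].xml` does not account for, either by a `Default` extension or by an `Override` part name. The helper names and the demo package contents are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in OOXML packages.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def undeclared_parts(package_bytes):
    """Return ZIP entries not covered by any Default or Override declaration."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    default_exts = {d.get("Extension", "").lower() for d in root.iter(CT_NS + "Default")}
    overrides = {o.get("PartName", "").lstrip("/") for o in root.iter(CT_NS + "Override")}
    flagged = []
    for name in names:
        if name == "[Content_Types].xml":
            continue
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if name not in overrides and ext not in default_exts:
            flagged.append(name)
    return sorted(flagged)


def build_demo_package():
    """Build a tiny in-memory package with one undeclared entry (illustrative only)."""
    content_types = (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Override PartName="/word/document.xml" ContentType="application/xml"/>'
        "</Types>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("word/document.xml", "<document/>")
        zf.writestr("extra/notes.xml", "<notes>not declared anywhere</notes>")
    return buf.getvalue()


print(undeclared_parts(build_demo_package()))
```

Real documents often declare a `Default` for the `xml` extension, in which case this check alone flags nothing; a fuller check would also follow the relationship (`.rels`) files.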
208	DescribeX: A Framework for Exploring and Querying XML Web Collections Rizzolo, Flavio Carlos 26 February 2009 (has links) The nature of semistructured data in web collections is evolving. Even when XML web documents are valid with regard to a schema, the actual structure of such documents exhibits significant variations across collections for several reasons: an XML schema may be very lax (e.g., to accommodate the flexibility needed to represent collections of documents in RSS feeds), a schema may be large with different subsets used for different documents (e.g., this is common in industry standards like UBL), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). A schema alone may not provide sufficient information for many data management tasks that require knowledge of the actual structure of the collection. Web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly large web collections containing hundreds of thousands of XML documents. Even when tasks only need to deal with a single document at a time, developers benefit from understanding the behaviour of XPath expressions across multiple documents (e.g., what will a query return when run over the thousands of hourly feeds collected during the last few months?). Dealing with the (highly variable) structure of such web collections poses additional challenges. This thesis introduces DescribeX, a powerful framework that is capable of describing arbitrarily complex XML summaries of web collections, providing support for more efficient evaluation of XPath workloads. DescribeX permits the declarative description of document structure using all axes and language constructs in XPath, and generalizes many of the XML indexing and summarization approaches in the literature.
DescribeX supports the construction of heterogeneous summaries where different document elements sharing a common structure can be declaratively defined and refined by means of path regular expressions on axes, or axis path regular expressions (AxPREs). DescribeX can significantly help in the understanding of both the structure of complex, heterogeneous XML collections and the behaviour of XPath queries evaluated on them. Experimental results demonstrate the scalability of DescribeX summary refinements and stabilizations (the key enablers for tailoring summaries) with multi-gigabyte web collections. A comparative study suggests that using a DescribeX summary created from a given workload can produce query evaluation times orders of magnitude better than using existing summaries. DescribeX’s lightweight approach of combining summaries with a file-at-a-time XPath processor can be a very competitive alternative, in terms of performance, to conventional fully-fledged XML query engines that provide DB-like functionality such as security, transaction processing, and native storage. XML Summaries Framework Semistructured Web XPath 0984
209	Distributed Data Integration using Web Services and XML Mukker, Alka 20 December 2004 (has links) Data integration has been an active topic of research in the past. With the advances in web technology, data integration faces new challenges imposed by the heterogeneity of sources and by their autonomy and independence. Web services, which are universally accessible software components deployed on the web, are becoming the focus of recent research due to their ability to interconnect systems and optimize costs. At the same time, XML has also become one of the core technologies for business applications. By offering a standard, flexible and inherently extensible data format, XML significantly reduces the burden of deploying the many technologies needed to ensure the success of Web services. This thesis examines the opportunities for data integration in the context of the web services development paradigm. It examines the existing technologies and standards of web services and XML and provides an example of how web services can be used to unlock heterogeneous systems to extract and integrate data. The approach followed to illustrate this uses embedded web service calls inside XML documents. The main contributions of this paper are: 1) comprehensive research of existing technologies, 2) an architecture to support invocation of embedded web services, 3) implementation of an application to show the results, and 4) use of existing technologies to implement the proposed system. web services xml data integration Computer Sciences
