About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
801

Efficient Schema Extraction from a Collection of XML Documents

Parthepan, Vijayeandra 01 May 2011 (has links)
The eXtensible Markup Language (XML) has become the standard format for data exchange on the Internet, providing interoperability between different business applications. Such wide use results in large volumes of heterogeneous XML data, i.e., XML documents conforming to different schemas. Although schemas are important in many business applications, they are often missing from XML documents. In this thesis, we present a suite of algorithms for effectively extracting schema information from a large collection of XML documents. We propose using the cost of NFA simulation to compute the minimum description length (MDL) for ranking inferred schemas. We also study using the frequencies of the sample inputs to improve the precision of the schema extraction. Furthermore, we propose an evaluation framework to quantify the quality of the extracted schema. Experimental studies on various data sets demonstrate the efficiency and efficacy of our approach.
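Since the thesis text itself is not reproduced here, a minimal sketch may help illustrate the ranking idea. The following hypothetical Python example scores invented candidate content models against observed child-element sequences; the branching-based data cost is only a crude stand-in for the NFA simulation cost the abstract refers to:

```python
import math
import re

def mdl_score(pattern: str, samples: list[str]) -> float:
    """Crude MDL-style score: cost of the model (pattern length) plus
    the cost of encoding the samples under it. The per-symbol data cost
    grows with the pattern's branching factor, a rough stand-in for the
    cost of simulating the pattern's NFA over the samples."""
    if any(re.fullmatch(pattern, s) is None for s in samples):
        return math.inf  # a candidate schema must accept every sample
    model_cost = len(pattern)
    branching = pattern.count("|") + pattern.count("*") + 1
    data_cost = sum(len(s) for s in samples) * math.log2(1 + branching)
    return model_cost + data_cost

# Child-element sequences observed under an element, e.g. "ipd" for
# <item><price><date>, and a few hand-made candidate content models.
samples = ["ipd", "iipd", "iiipd"]
candidates = [r"i+pd", r"(i|p|d)+", r"ipd|iipd|iiipd"]
print(min(candidates, key=lambda p: mdl_score(p, samples)))  # i+pd
```

The overly general model and the overly specific enumeration both score worse than the tight generalization, which is the trade-off MDL-based ranking is meant to capture.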
802

Dynamic Scoping for Browser Based Access Control System

Nadipelly, Vinaykumar 25 May 2012 (has links)
The use of web applications has grown to the point where we rely on them for almost everything, making them an essential part of our everyday lives. As a result, strengthening the privacy and security policies of web applications is becoming increasingly essential. The importance and stateless nature of the web infrastructure have made the web a preferred target of attacks, and the current web access control system is one reason those attacks succeed. The web consists of two major components, the browser and the server, and an effective access control system must be implemented in both. For access control, the current web has adopted the same-origin policy in the browser and the same-session policy on the server. These policies were sufficient for the early web but are inadequate for the protection needs of today's web. In order to protect web applications from untrusted content, we provide an enhanced browser-based access control system by enabling dynamic scoping. Our security model for the browser allows the client and trusted web application contents to share a common library while protecting web contents from each other, even though they execute at different trust levels. We have implemented a working model of this enhanced browser-based access control system in Java, under the Lobo browser.
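The Lobo-based implementation is in Java and is not shown in the abstract; as a rough, hypothetical illustration of the underlying idea, this Python sketch shows how dynamic scoping can let a single shared library body behave differently depending on the trust level of the code currently on the call stack (all names are invented):

```python
import contextlib

# Dynamic scope: lookups walk the chain of active trust contexts, so a
# shared library resolves security-relevant bindings according to the
# trust level of whoever is currently calling it.
_scope_stack = [{"trust": "trusted", "cookie_access": True}]

@contextlib.contextmanager
def trust_context(**bindings):
    _scope_stack.append(bindings)
    try:
        yield
    finally:
        _scope_stack.pop()

def lookup(name):
    for frame in reversed(_scope_stack):  # innermost caller wins
        if name in frame:
            return frame[name]
    raise NameError(name)

def shared_library_get_cookie():
    # One library body, shared by all contents; its effective rights
    # depend on the dynamic scope, not on where it was defined.
    if not lookup("cookie_access"):
        raise PermissionError("untrusted caller may not read cookies")
    return "session=abc123"

print(shared_library_get_cookie())  # trusted top level: allowed
with trust_context(trust="untrusted", cookie_access=False):
    try:
        shared_library_get_cookie()  # third-party content: blocked
    except PermissionError as e:
        print("blocked:", e)
```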
803

Dynamic Data Extraction and Data Visualization with Application to the Kentucky Mesonet

Paidipally, Anoop Rao 01 May 2012 (has links)
There is a need to integrate large-scale databases, high-performance computing engines, and geographic information system technologies into a user-friendly web interface as a platform for data visualization and customized statistical analysis. We present concepts and design ideas for dynamic data storage and extraction using open-source computing and mapping technologies, and we apply our methods to the Kentucky Mesonet automated weather mapping workflow. The main components of the workflow include a web-based interface and a robust database and computing infrastructure designed for both general users and power users such as modelers and researchers.
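As a small, hypothetical illustration of the dynamic-extraction step (not the Mesonet code itself), the sketch below uses Python's standard sqlite3 module; the table name, columns, and parameters are invented:

```python
import json
import sqlite3

def extract(db_path, stations, start, end):
    """Pull observations for user-chosen stations and a time window,
    returning JSON ready for a web-based map or chart component."""
    conn = sqlite3.connect(db_path)
    placeholders = ",".join("?" * len(stations))
    rows = conn.execute(
        f"SELECT station, obs_time, temp_c FROM observations "
        f"WHERE station IN ({placeholders}) AND obs_time BETWEEN ? AND ? "
        f"ORDER BY obs_time",
        (*stations, start, end),
    ).fetchall()
    conn.close()
    return json.dumps(
        [{"station": s, "time": t, "temp_c": c} for s, t, c in rows]
    )
```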
804

Organizational Search in Email Systems

Pitla, Sruthi Bhushan 01 May 2012 (has links)
The storage space consumed by email has been increasing rapidly. Email systems serve as important data repositories where many users store different kinds of information used in their daily activities. Due to the rapidly increasing volume of email data, the data must be maintained efficiently. It is also important to provide intuitive and flexible search utilities for better access to the information in email repositories, especially in an enterprise or organizational setting. To this end, we present TESO, a tool for email search using organizational information. TESO is designed to improve the relevance of email search by integrating data from email servers with organizational information from directory services and other resources. We implement this functionality as an add-on for Mozilla Thunderbird, an open-source email client developed by the Mozilla Foundation. The results are evaluated using SQLite and XML data. This work builds on existing techniques for information integration and keyword search over relational databases, and it also helps provide efficient access to XML information.
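TESO's internals are not detailed in the abstract; the following hypothetical Python sketch illustrates the core idea of blending keyword relevance with organizational proximity taken from a directory tree (names, weights, and the scoring formula are all invented):

```python
def org_distance(directory, a, b):
    """Hops to the nearest common manager in a {person: manager} tree."""
    chain_a = []
    while a is not None:
        chain_a.append(a)
        a = directory.get(a)
    d = 0
    while b is not None:
        if b in chain_a:
            return chain_a.index(b) + d
        b = directory.get(b)
        d += 1
    return len(chain_a) + d  # no common manager found

def score(email, query_terms, directory, searcher):
    """Keyword hits, boosted when the sender is organizationally close."""
    text = (email["subject"] + " " + email["body"]).lower()
    keyword = sum(text.count(t.lower()) for t in query_terms)
    proximity = 1.0 / (1 + org_distance(directory, searcher, email["sender"]))
    return keyword * (0.5 + proximity)  # weights are arbitrary here

directory = {"alice": "carol", "bob": "carol", "carol": None}
mail = {"sender": "bob", "subject": "Q3 budget", "body": "budget draft attached"}
print(score(mail, ["budget"], directory, "alice"))
```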
805

Standardizing our perinatal language to facilitate data sharing

Massey, Kiran Angelina 05 1900 (has links)
Our ultimate goal as obstetric and neonatal care providers is to improve care for mothers and their babies. Continuous quality improvement (CQI) involves iterative cycles of practice change and audit of ongoing clinical care, identifying practices that are associated with good outcomes. A vital prerequisite to this evidence-based medicine is data collection. In Canada, much of the country is covered by separate, fragmented silos known as regional reproductive care databases or perinatal health programs. A more centralized system that includes collaborative efforts is required. Moving in this direction would serve many purposes: efficiency; economy in a setting of limited resources and shrinking budgets; and, lastly, interaction among data collection agencies. This interaction may facilitate the translation and transfer of knowledge to caregivers and patients. There are, however, many barriers to such collaborative efforts, including privacy, ownership, and the standardization of both digital technologies and semantics. After thoroughly examining the current perinatal data collection among Perinatal Health Programs (PHPs) and in the Canadian Perinatal Network (CPN) database, it was evident that there is little standardization of definitions, which is one of the most important barriers to data sharing. To communicate effectively and share data, researchers and clinicians alike must construct a common perinatal language. Communicative tools and programs such as SNOMED CT® offer a potential solution, but they still require much work because of their relative infancy. A standardized perinatal language would not only lay the definitional foundation in women's health and obstetrics but also serve as a major contribution towards a universal electronic health record.
806

A metadata integration system for multimedia

Amir, Samir 06 December 2011 (has links) (PDF)
My thesis addresses the interoperability of metadata at the level of schemas and description languages, achieved automatically through the development of a schema-matching tool. To this end, I propose a new matching approach, called MuMIe (Multilevel Metadata Integration), whose goal is to achieve interoperability at both levels (schemas and description languages). The proposed technique transforms schemas written in different languages into graphs, capturing only a few basic concepts. A matching methodology is then applied to these graphs to find correspondences between their nodes, using several kinds of semantic and structural information. The second part of my thesis is devoted to the semantic modeling of multimedia-related information (user profiles, characteristics of transmission networks, terminals, etc.). I developed a metamodel named CAM4Home (Collaborative Aggregated Multimedia for Digital Home) for metadata fusion. The specification of this metamodel was written in RDFS.
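As a rough illustration of the matching step (not the MuMIe implementation), the hypothetical Python sketch below represents two schema fragments as graphs and pairs their nodes by blending label similarity (semantic) with child-set similarity (structural); the node names, weights, and threshold are invented:

```python
from difflib import SequenceMatcher

def name_sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def child_overlap(k1, k2):
    """Average best label match between two lists of child names."""
    if not k1 and not k2:
        return 1.0
    best = [max((name_sim(c1, c2) for c2 in k2), default=0.0) for c1 in k1]
    return sum(best) / max(len(k1), 1)

def match(g1, g2, alpha=0.6, threshold=0.5):
    """Pair nodes of two schema graphs ({node: [children]}) using a
    blend of semantic (label) and structural (children) similarity."""
    def blend(n1, n2):
        return alpha * name_sim(n1, n2) + (1 - alpha) * child_overlap(g1[n1], g2[n2])
    pairs = []
    for n1 in g1:
        best = max(g2, key=lambda n2: blend(n1, n2))
        if blend(n1, best) >= threshold:
            pairs.append((n1, best, round(blend(n1, best), 2)))
    return pairs

g1 = {"Creator": ["GivenName", "FamilyName"], "Title": []}
g2 = {"author": ["firstName", "lastName"], "title": []}
print(match(g1, g2))  # Creator->author and Title->title correspondences
```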
807

Flexible techniques for heterogeneous XML data retrieval

Sanz Blasco, Ismael 31 October 2007 (has links)
The progressive adoption of XML by new communities of users has motivated the appearance of applications that require the management of large and complex collections, which present a large amount of heterogeneity. Some relevant examples are present in the fields of bioinformatics, cultural heritage, ontology management and geographic information systems, where heterogeneity is not only reflected in the textual content of documents, but also in the presence of rich structures which cannot be properly accounted for using fixed schema definitions. Current approaches for dealing with heterogeneous XML data are, however, mainly focused at the content level, whereas at the structural level only a limited amount of heterogeneity is tolerated; for instance, weakening the parent-child relationship between nodes into the ancestor-descendant relationship.

The main objective of this thesis is devising new approaches for querying heterogeneous XML collections. This general objective has several implications. First, a collection can present different levels of heterogeneity at different granularity levels; this fact has a significant impact on the selection of specific approaches for handling, indexing and querying the collection. Therefore, several metrics are proposed for evaluating the level of heterogeneity at different levels, based on information-theoretical considerations. These metrics can be employed for characterizing collections, and for clustering together those collections which present similar characteristics.

Second, the high structural variability implies that query techniques based on exact tree matching, such as the standard XPath and XQuery languages, are not suitable for heterogeneous XML collections. As a consequence, approximate querying techniques based on similarity measures must be adopted. Within the thesis, we present a formal framework for the creation of similarity measures, based on a study of the literature showing that most approaches for approximate XML retrieval (i) are highly tailored to very specific problems and (ii) use similarity measures for ranking that can be expressed as ad-hoc combinations of a set of 'basic' measures. Some examples of these widely used measures are tf-idf for textual information and several variations of edit distances. Our approach wraps these basic measures into generic, parametrizable components that can be combined into complex measures by exploiting the composite pattern, commonly used in software engineering. This approach also allows us to seamlessly integrate highly specific measures, such as protein-oriented matching functions.

Finally, these measures are employed for the approximate retrieval of data in a context of high structural heterogeneity, using a new approach based on the concepts of pattern and fragment. In our context, a pattern is a concise representation of the information needs of a user, and a fragment is a match of a pattern found in the database. A pattern consists of a set of tree-structured elements, basically an XML subtree that is intended to be found in the database, but with a flexible semantics that is strongly dependent on a particular similarity measure. For example, depending on the measure, the particular hierarchy of elements, or the ordering of siblings, may or may not be deemed relevant when searching for occurrences in the database. Fragment matching, as a query primitive, can deal with a much higher degree of flexibility than existing approaches.
In this thesis we provide exhaustive and top-k query algorithms. In the latter case, we adopt an approach that does not require the similarity measure to be monotonic, as all previous XML top-k algorithms (usually based on Fagin's algorithm) do. We also present two extensions which are important in practical settings: a specification for the integration of the aforementioned techniques into XQuery, and a clustering algorithm that is useful for managing complex result sets. All of the algorithms have been implemented as part of ArHeX, a toolkit for the development of multi-similarity XML applications, which supports fragment-based queries through an extension of the XQuery language, and includes graphical tools for designing similarity measures and querying collections. We have used ArHeX to demonstrate the effectiveness of our approach using both synthetic and real data sets, in the context of a biomedical research project.
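The composite-pattern idea from the abstract can be sketched briefly. This hypothetical Python example (not ArHeX code; the leaf measures are simplified stand-ins for the tf-idf and edit-distance components) wraps basic measures as leaves and combines them through a weighted composite:

```python
from difflib import SequenceMatcher

class Measure:
    def score(self, query, fragment):
        raise NotImplementedError

class LabelSim(Measure):  # leaf: tag-name similarity
    def score(self, q, f):
        return SequenceMatcher(None, q["tag"], f["tag"]).ratio()

class TextSim(Measure):  # leaf: textual-content similarity
    def score(self, q, f):
        return SequenceMatcher(None, q["text"], f["text"]).ratio()

class Weighted(Measure):  # composite: weighted mean of child measures
    def __init__(self, *children):  # children = (weight, measure) pairs
        self.children = children
    def score(self, q, f):
        total = sum(w for w, _ in self.children)
        return sum(w * m.score(q, f) for w, m in self.children) / total

# Composites nest freely, so highly specific leaves (e.g. a
# protein-oriented matcher) could be dropped in alongside generic ones.
measure = Weighted((2, LabelSim()), (1, TextSim()))
q = {"tag": "protein", "text": "kinase domain"}
f = {"tag": "Protein", "text": "kinase binding domain"}
print(round(measure.score(q, f), 2))
```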
808

An Ordered Bag Semantics for SQL

Chinaei, Hamid R. January 2007 (has links)
Semantic query optimization is an important issue in many database contexts, including information integration, view maintenance and data warehousing, and it can substantially improve performance, especially in today's database systems, which contain gigabytes of data. A crucial issue in semantic query optimization is query containment. Several papers have dealt with the problem of conjunctive query containment. In particular, some of the literature admits SQL-like query languages with aggregate operations such as sum/count. Moreover, since real SQL requires a richer semantics than set semantics, there has been work on bag semantics for SQL, essentially by introducing an interpreted column. One important technique for reasoning about query containment in the context of bag semantics is to translate the queries into alternatives that use aggregate functions and assume set semantics. Furthermore, in SQL, order by is the operator by which results are sorted on certain attributes, and ordering is clearly an important issue in query optimization. There has been work in support of ordering based on the application domain, but a final step is required to introduce a sufficiently rich supporting semantics. In this work, we integrate set and bag semantics to be able to reason about real SQL queries. We demonstrate an ordered bag semantics for SQL using a relational algebra with aggregates. We define a set algebra with various expressions of interest, then define syntax and semantics for a bag algebra, and finally extend these definitions to ordered bags. This is done by adding a pair of additional interpreted columns to computed relations, in which the first column is used in the standard fashion to capture duplicate tuples in query results, and the second adds an ordering priority to the output. We show that the relational algebra with aggregates can be used to compute these interpreted columns with sufficient flexibility to serve as a semantics for standard SQL queries, which are allowed to include order by and duplicate-preserving select clauses. The reduction of a workable ordered bag semantics for SQL to the relational algebra with aggregates, as we have developed it, can enable existing query containment theory to be applied in practical query containment.
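The pair of interpreted columns can be made concrete with a small sketch. The following hypothetical Python example (not the thesis's formal algebra) models an ordered bag as distinct rows plus a duplicate count and an ordering priority, with order by recomputing the priority column:

```python
def to_ordered_bag(rows):
    """Collapse a duplicate-preserving, ordered list of tuples into
    {row: multiplicity} plus {row: position} interpreted columns."""
    mult, pos = {}, {}
    for i, r in enumerate(rows):
        mult[r] = mult.get(r, 0) + 1
        pos.setdefault(r, i)  # first occurrence fixes the priority
    return mult, pos

def order_by(mult, key):
    """ORDER BY: keep multiplicities, recompute the position column."""
    ranked = sorted(mult, key=key)
    return mult, {r: i for i, r in enumerate(ranked)}

def to_rows(mult, pos):
    """Expand back to a duplicate-preserving, ordered result."""
    out = []
    for r in sorted(mult, key=pos.get):
        out.extend([r] * mult[r])
    return out

raw = [("b", 2), ("a", 1), ("b", 2)]  # raw SQL-style result
mult, pos = to_ordered_bag(raw)
print(to_rows(*order_by(mult, key=lambda r: r[0])))
# [('a', 1), ('b', 2), ('b', 2)]
```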
809

"Man kan ju hitta i princip allt man behöver på Google" : Högstadie- och gymnasielevers informationssökning i digitala medier / ”You can find practically everything you need on Google” : High school students’ information retrieval in digital media

Karlsson, Desirée January 2012 (has links)
The purpose of this essay is to examine how high school students (age 13 to 19) search for information on the web and in databases, and how critically they evaluate their sources. The questions asked were: How do the students search for information in digital media? Which kinds of sources do the students use? How do they evaluate the information they find? Do they receive any instruction in information retrieval and source evaluation? To answer these questions, students were interviewed in groups about their information retrieval behavior, and two school librarians were interviewed about their experience of the students' information retrieval.

During the interviews it became clear that the students had received quite sparse instruction in information retrieval and source criticism; some students did not even know what the latter was. Their methods for information retrieval were quite simple and, especially for the younger students, without much thought behind them. Their main strategy seemed to be Google and Wikipedia.

From the interviews with the school librarians I learned that one reason the students do not receive instruction in these subjects is a lack of interest from the teachers. The librarians thought the solution was to build a stronger relationship with the teachers.
810

Optimal Path Queries in Very Large Spatial Databases

Zhang, Jie January 2005 (has links)
Researchers have been investigating the optimal route query problem for a long time. Optimal route queries are categorized as either unconstrained or constrained queries. Many main-memory-based algorithms have been developed to deal with the optimal route query problem. Among these, Dijkstra's shortest path algorithm is one of the most popular algorithms for the unconstrained route query problem. The constrained route query problem is more complicated than the unconstrained one, and some constrained route query problems, such as the Traveling Salesman Problem and the Hamiltonian Path Problem, are NP-hard. There are many algorithms dealing with the constrained route query problem, but most of them only solve a specific case. In addition, all of them require that the entire graph reside in main memory. Recently, due to the needs of applications on very large graphs, such as the digital maps managed by Geographic Information Systems (GIS), several disk-based algorithms have been derived by using divide-and-conquer techniques to solve the shortest path problem in a very large graph. However, until now little research has been conducted on the disk-based constrained problem.

This thesis presents two algorithms: 1) a new disk-based shortest path algorithm (DiskSPNN), and 2) a new disk-based optimal path algorithm (DiskOP) that answers an optimal route query without passing a set of forbidden edges in a very large graph. Both algorithms fit within the same divide-and-conquer framework as the existing disk-based shortest path algorithms proposed by Ning Zhang and Heechul Lim. Several techniques, including query super graph, successor fragment, and open boundary node pruning, are proposed to improve the performance of the previous disk-based shortest path algorithms. Furthermore, these techniques are applied to the DiskOP algorithm with minor changes. The proposed DiskOP algorithm depends on the concept of collecting a set of boundary vertices and simultaneously relaxing their adjacent super edges. Even if the forbidden edges are distributed across all the fragments of a graph, the DiskOP algorithm requires little memory. Our experimental results indicate that the DiskSPNN algorithm performs better than the original ones with respect to both I/O cost and running time, and that the DiskOP algorithm successfully solves a specific constrained route query problem in a very large graph.
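DiskOP itself works over disk-resident graph fragments; as a simplified in-memory analogue of the constrained query it answers, here is a Python sketch of a shortest-path search that avoids a set of forbidden edges (the graph and all names are invented):

```python
import heapq

def shortest_path_avoiding(graph, src, dst, forbidden):
    """Dijkstra over graph = {u: [(v, w), ...]}, skipping any edge
    (u, v) listed in the forbidden set."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, ()):
            if (u, v) in forbidden:
                continue  # the constrained part: avoid forbidden edges
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(shortest_path_avoiding(g, "a", "c", {("b", "c")}))  # 4, not 2
```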
