11

Aggregation and Privacy in Multi-Relational Databases

Jafer, Yasser 11 April 2012 (has links)
Most existing data mining approaches perform data mining tasks on a single data table. However, increasingly, data repositories such as financial data and medical records, amongst others, are stored in relational databases. The inability to apply traditional data mining techniques directly to such relational databases thus poses a serious challenge. To address this issue, a number of researchers convert a relational database into one or more flat files and then apply traditional data mining algorithms. This process of transforming a relational database into one or more flat files usually involves aggregation. Aggregation functions such as maximum, minimum, average, standard deviation, count and sum are commonly used in such a flattening process. Our research aims to address the following question: Is there a link between aggregation and possible privacy violations during relational database mining? In this research we investigate whether, and how, applying aggregation functions affects the privacy of a relational database during supervised learning, or classification, where the target concept is known. To this end, we introduce the PBIRD (Privacy Breach Investigation in Relational Databases) methodology. The PBIRD methodology combines multi-view learning with feature selection to discover the potentially dangerous sets of features hidden within a database. Our approach creates a number of views, which consist of subsets of the data, with and without aggregation. Then, by identifying and investigating the set of selected features in each view, potential privacy breaches are detected. In this way, our PBIRD algorithm is able to discover those features that are correlated with the classification target and may also reveal sensitive information in the database. Our experimental results show that aggregation functions do, indeed, change the correlation between attributes and the classification target. We show that with aggregation, we obtain a set of features which can be accurately linked to the classification target and used to predict (with high accuracy) the confidential information. Without aggregation, on the other hand, we obtain a different set of potentially harmful features. By identifying the complete set of potentially dangerous attributes, the PBIRD methodology provides a solution whereby database designers and owners can be warned and can subsequently perform the adjustments necessary to protect the privacy of the relational database. In our research, we also perform a comparative study to investigate the impact of aggregation on classification accuracy and on the time required to build the models. Our results suggest that when a database consists only of categorical data, aggregation should be used with particular caution, because it decreases the overall accuracy of the resulting models. When the database contains mixed attributes, the results show that accuracies with and without aggregation are comparable; even in such scenarios, however, schemas without aggregation tend to perform slightly better. With regard to the impact of aggregation on model building time, the results show that, in general, models constructed with aggregation require less building time. However, when the database is small and consists of nominal attributes with high cardinality, aggregation slows model building.
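A minimal sketch of the flattening-by-aggregation step the abstract describes, using Python's sqlite3 module; the tables, columns, and target are illustrative, not taken from the thesis, and this is not the PBIRD implementation itself:

```python
import sqlite3

# Build a toy two-table relational database: one patient row relates
# to many visit rows (a one-to-many relationship).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, age INTEGER, diagnosis TEXT);
    CREATE TABLE visit   (patient_id INTEGER REFERENCES patient(id), cost REAL);
    INSERT INTO patient VALUES (1, 54, 'A'), (2, 37, 'B');
    INSERT INTO visit   VALUES (1, 120.0), (1, 300.0), (2, 80.0);
""")

# Flatten the one-to-many relation into a single table by aggregating
# each patient's visits; each derived column is a new feature whose
# correlation with the target a PBIRD-style analysis would inspect.
rows = conn.execute("""
    SELECT p.id, p.age,
           COUNT(v.cost) AS n_visits,
           AVG(v.cost)   AS avg_cost,
           MAX(v.cost)   AS max_cost,
           SUM(v.cost)   AS total_cost,
           p.diagnosis   AS target
    FROM patient p LEFT JOIN visit v ON v.patient_id = p.id
    GROUP BY p.id
""").fetchall()

for r in rows:
    print(r)
```

The point of the sketch is that the aggregate columns (count, average, maximum, sum) exist only after flattening, which is exactly why their correlation with the target can differ from that of the raw rows.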
13

A RELATIONAL APPROACH FOR MANAGING LARGE FLIGHT TEST PARAMETER LISTS

Penna, Sérgio D., Espeschit, Antônio Magno L. October 2005 (has links)
ITC/USA 2005 Conference Proceedings / The Forty-First Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2005 / Riviera Hotel & Convention Center, Las Vegas, Nevada / The number of aircraft parameters used in flight-testing has constantly increased over the years, and there is no sign that this situation will change in the near future. On the contrary, in modern, software-driven, digital avionic systems, all sorts of parameters circulate through digital buses and can be transferred to on-board data acquisition systems more easily than those converted from traditional analog transducers, fueling the demand for more and more parameters to be acquired, processed, visualized, stored and retrieved at any given time. The constant imbalance between the number of parameters engineers believe to be "sufficient" for developing and troubleshooting systems in a new aircraft, which tends to grow with aircraft complexity, and the associated cost of instrumenting a test prototype accordingly, which tends to grow beyond budget limits, pushes for new creative ways of handling both tendencies without compromising the ease of performing an engineering analysis directly from flight test data. This paper presents an alternative for handling large collections of flight test parameters through a relational approach, particularly in two important scenarios: the basic creation and administration of the traditional "Flight Test Parameter List" and the transmission of selected data over a telemetry link for visualization in a Ground Station.
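A hedged sketch of what such a relational parameter list might look like; the paper does not publish its schema, so every table and column name here is an assumption:

```python
import sqlite3

# Illustrative schema only. A parameter list becomes rows keyed by
# parameter, and per-test (or per-telemetry-stream) selections become
# a join table instead of another copied document.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parameter (
        id     INTEGER PRIMARY KEY,
        name   TEXT UNIQUE,   -- e.g. the mnemonic used on the data bus
        units  TEXT,
        source TEXT           -- analog transducer, digital bus, ...
    );
    CREATE TABLE flight_test (
        id        INTEGER PRIMARY KEY,
        test_name TEXT
    );
    CREATE TABLE test_parameter (
        test_id      INTEGER REFERENCES flight_test(id),
        parameter_id INTEGER REFERENCES parameter(id),
        PRIMARY KEY (test_id, parameter_id)
    );
""")

# Selecting parameters for a telemetry link is now a query, not a file edit.
print(conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall())
```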
14

Querying and extracting heterogeneous graphs from structured data and unstructured content

Soussi, Rania 22 June 2012 (has links) (PDF)
The present work introduces a set of solutions for extracting graphs from enterprise data and facilitating information search on these graphs. First of all, we define a new graph model called the SPIDER-Graph, which models complex objects and makes it possible to define heterogeneous graphs. Furthermore, we develop a set of algorithms to extract the content of an enterprise database and represent it in this new model. This representation allows us to discover relations that exist in the data but are hidden due to their poor compatibility with the classical relational model. Moreover, in order to unify the representation of all the data of the enterprise, we develop a second approach which extracts from unstructured data an enterprise ontology containing the most important concepts and relations found in a given enterprise. Having extracted the graphs from the relational databases and documents using the enterprise ontology, we propose an approach which allows users to extract an interaction graph between a set of chosen enterprise objects. This approach is based on a set of relation patterns extracted from the graph and the enterprise ontology's concepts and relations. Finally, information retrieval is facilitated by a new visual graph query language called GraphVQL, which allows users to query graphs by visually drawing a pattern for the query. This language covers different query types, from simple selection and aggregation queries to social network analysis queries.
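A minimal illustration of one idea underlying graph extraction from a relational database (rows as nodes, foreign-key references as edges); this is not the SPIDER-Graph algorithm, and the tables are invented for the example:

```python
import sqlite3

# Toy enterprise database with one foreign-key relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           dept_id INTEGER REFERENCES department(id));
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO department VALUES (1, 'R&D');
    INSERT INTO employee VALUES (10, 'Ana', 1), (11, 'Luc', 1);
""")

# Each row becomes a typed node; each foreign-key reference becomes
# a labeled edge between the referencing and referenced rows.
nodes, edges = [], []
for emp_id, name, dept_id in conn.execute("SELECT id, name, dept_id FROM employee"):
    nodes.append(("employee", emp_id, name))
    edges.append((("employee", emp_id), ("department", dept_id), "works_in"))
for dept_id, name in conn.execute("SELECT id, name FROM department"):
    nodes.append(("department", dept_id, name))

print(nodes)
print(edges)
```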
15

Policies Based Intrusion Response System for DBMS

Nayeem, Fatima, Vijayakamal, M. 01 December 2012 (has links)
Relational databases are built on the relational model proposed by Dr. E. F. Codd. The relational model has become the most consistent and widely used DBMS model in the world. Databases in this model are efficient at storing and retrieving data, besides providing authentication through credentials. However, there are many other attacks apart from stealing credentials and intruding into the database. Adversaries may always try to intrude into a relational database for monetary or other gains [1]. Relational databases are subject to malicious attacks as they hold valuable business data that is sensitive in nature. Given the importance of these databases, continuous monitoring is inevitable; Gartner research identifies it among the top five database strategies for eliminating data leaks in organizations [2]. Governments such as that of the US regulate secure data management, citing frameworks such as HIPAA, GLBA, and PCI as examples. / Intrusion detection systems play an important role in detecting online intrusions and providing necessary alerts. Intrusion detection can also be applied to relational databases. An intrusion response system for a relational database is essential to protect it from external and internal attacks. We propose a new intrusion response system for relational databases based on database response policies. We have developed an interactive language that helps database administrators determine the responses the system should provide to malicious requests encountered by the relational database. We also maintain a policy database that holds the response system's policies. Algorithms for searching for suitable policies are designed and implemented. Matching the right policies and policy administration are the two problems addressed in this paper, to ensure faster action and to prevent malicious changes to policy objects. Cryptography is also used in protecting the relational database from attacks. The experimental results reveal that the proposed response system is effective and useful.
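A toy sketch of policy-based response matching, assuming a simple first-match semantics; the paper's policy language and matching algorithms are not reproduced here, and all predicates and actions are invented examples:

```python
from dataclasses import dataclass

# Each policy pairs a predicate over a detected anomaly with a
# response action; the first policy whose predicate matches fires.
@dataclass
class Anomaly:
    user: str
    role: str
    table: str
    sensitivity: str   # e.g. "public" | "confidential"

POLICIES = [
    (lambda a: a.sensitivity == "confidential" and a.role != "dba",
     "revoke session and alert administrator"),
    (lambda a: a.table == "audit_log",
     "suspend request pending re-authentication"),
]

def respond(anomaly: Anomaly) -> str:
    for predicate, action in POLICIES:
        if predicate(anomaly):
            return action
    return "log only"   # default low-severity response

print(respond(Anomaly("eve", "analyst", "salaries", "confidential")))
```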
16

Analýza a problémy Top-K dotazu nad relační databází / Top-k querying over relational databases: analysis and problems

Čech, Martin January 2011 (has links)
Due to the increasing capacity of storage devices and the speed of computer networks in recent years, it is ever more important to sort and search data effectively. A query result containing thousands of rows is usually useless and unreadable. In that situation, users may prefer to define constraints and sorting priorities in the query and see only the top several rows of the result. This thesis deals with top-k query problems, the extension of relational algebra with new operators, and their implementation in a database system. It focuses on optimization of the join and sort operations. The thesis includes the implementation and comparison of several algorithms in the standalone .NET library NRank.
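A small illustration of the top-k idea (not the NRank library): rather than sorting an entire result, a size-k heap keeps only the best rows, turning an O(n log n) sort into an O(n log k) scan. The table and scoring weights below are arbitrary:

```python
import heapq
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel (name TEXT, price REAL, rating REAL)")
conn.executemany("INSERT INTO hotel VALUES (?, ?, ?)",
                 [("A", 120, 4.1), ("B", 80, 3.9), ("C", 200, 4.8), ("D", 90, 4.5)])

k = 2
# Score combining the user's sorting priorities (weights are illustrative).
score = lambda price, rating: rating - 0.01 * price

# Stream over rows, keeping only the k best under the scoring function.
top = heapq.nlargest(
    k,
    conn.execute("SELECT name, price, rating FROM hotel"),
    key=lambda row: score(row[1], row[2]),
)
print(top)
```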
17

Adaptation of Relational Database Schema

Chytil, Martin January 2012 (has links)
In the presented work we study the evolution of a database schema and its impact on related issues. The work reviews important problems related to changes in the underlying storage of the data, and describes existing approaches to these problems. In detail, the work analyzes the impact of database schema changes on the database queries that relate to the particular schema. The approach presented in this thesis makes it possible to model database queries together with a database schema model. The thesis describes a solution for adapting database queries to the evolved database schema. Finally, the work contains a number of experiments that verify the proposed solution.
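A minimal sketch of one schema-evolution case, propagating a column rename into stored queries; the thesis models queries far more richly, and the rename map and regex rewriting here are illustrative only:

```python
import re

# Assumed evolution step: customer.surname was renamed to last_name.
RENAMES = {("customer", "surname"): "last_name"}

def adapt_query(sql: str) -> str:
    """Rewrite qualified column references affected by a rename.
    Deliberately naive: ignores table aliases and unqualified columns,
    which a real query model would have to handle."""
    for (table, old), new in RENAMES.items():
        sql = re.sub(rf"\b{table}\.{old}\b", f"{table}.{new}", sql)
    return sql

q = "SELECT customer.surname FROM customer WHERE customer.surname LIKE 'A%'"
print(adapt_query(q))
# SELECT customer.last_name FROM customer WHERE customer.last_name LIKE 'A%'
```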
18

Classifier System Learning of Good Database Schema

Tanaka, Mitsuru 07 August 2008 (has links)
This thesis presents an implementation of a learning classifier system which learns good database schemas. The system is implemented in Java using the NetBeans development environment, which provides good control over the GUI components. The system contains four components: a user interface, a rule and message system, an apportionment-of-credit system, and genetic algorithms. The input to the system is a set of simple database schemas, and the objective of the classifier system is to keep the good database schemas, which are represented by classifiers. The learning classifier system is given some basic knowledge about database concepts and rules. The results showed that the system could weed out the bad schemas and keep the good ones.
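A toy genetic-algorithm loop in the spirit of the fourth component; the thesis's schema encoding, classifiers, and fitness function are richer than this bit-vector stand-in, which is purely illustrative:

```python
import random

# A schema is encoded as a bit vector of assumed design properties
# (e.g. "has primary key", "is in 3NF"); fitness counts desirable ones.
random.seed(0)
N_BITS, POP, GENS = 8, 20, 30

def fitness(schema):
    return sum(schema)   # more desirable properties -> better schema

def crossover(a, b):
    cut = random.randrange(1, N_BITS)   # single-point crossover
    return a[:cut] + b[cut:]

def mutate(s, rate=0.05):
    return [bit ^ (random.random() < rate) for bit in s]

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]           # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

print(max(pop, key=fitness))   # best schema found
```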
19

NOVEL APPROACH TO STORAGE AND SORTING OF NEXT GENERATION SEQUENCING DATA FOR THE PURPOSE OF FUNCTIONAL ANNOTATION TRANSFER

Candelli, Tito January 2012 (has links)
The problem of functional annotation of novel sequences has been a significant issue for many laboratories that decided to apply next generation sequencing techniques to less studied species. In particular, experiments such as transcriptome analysis suffer heavily from this problem due to the impossibility of placing their results in a relevant biological context. Several tools have been proposed to solve this problem through homology annotation transfer. The principle behind this strategy is that homologous genes share common functions in different organisms, and therefore annotations are transferable between these genes. Commonly, BLAST reports are used to identify a suitable homologous gene in a well annotated species, and the annotation is then transferred from the homologue to the novel sequence. Not all homologues, however, possess valid functional annotations. The aim of this project was to devise an algorithm to process BLAST reports and provide a criterion to discriminate between homologues with biologically informative and uninformative annotations, respectively. In addition, all data obtained from the BLAST report is stored in a relational database for ease of consultation and visualization. In order to test the solidity of the system, we utilized 750 novel sequences obtained through application of next generation sequencing techniques to Avena sativa samples. This species particularly suits our needs as it represents the typical target for homology annotation transfer: lack of a reference genome and difficulty in attributing functional annotation. The system was able to perform all the required tasks. Comparisons between best hits as determined by BLAST and best hits as determined by the algorithm showed a significant increase in the biological significance of the results when the algorithm's sorting system was applied.
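A hedged sketch of the discrimination-and-storage idea: flag hits whose descriptions look biologically uninformative and pick the best informative hit per query. The keyword list is an assumption for illustration, not the thesis's actual criterion:

```python
import sqlite3

# Descriptions matching these terms are treated as uninformative and
# down-ranked before annotation transfer (assumed heuristic).
UNINFORMATIVE = ("hypothetical", "uncharacterized", "unknown", "predicted protein")

def informative(description: str) -> bool:
    d = description.lower()
    return not any(term in d for term in UNINFORMATIVE)

# Tabular BLAST-style hits: (query, subject, bit_score, description).
hits = [
    ("contig42", "sp|P12345", 380.0, "hypothetical protein"),
    ("contig42", "sp|Q67890", 355.0, "cellulose synthase A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE hit (query TEXT, subject TEXT,
                                  bit_score REAL, description TEXT,
                                  informative INTEGER)""")
conn.executemany("INSERT INTO hit VALUES (?, ?, ?, ?, ?)",
                 [h + (int(informative(h[3])),) for h in hits])

# Best *informative* hit per query, rather than the raw best BLAST hit.
for row in conn.execute("""SELECT query, subject, MAX(bit_score)
                           FROM hit WHERE informative = 1 GROUP BY query"""):
    print(row)
```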
20

Performance Analysis of Relational Database over Distributed File Systems

Tsai, Ching-Tang 08 July 2011 (has links)
With the growth of the Internet, people use networks frequently. Many PC applications have moved to network-based environments, such as text processing, calendars, and photo management; users can even develop applications on the network. Google is a company providing web services. Its popular services are its search engine and Gmail, which attract people with short response times and large amounts of data storage. It also charges businesses to place their own advertisements. Another popular social network website is Facebook, which processes huge numbers of instant messages and social relationships between users. The power behind all of this comes from a new technique, cloud computing. Cloud computing maintains high-performance processing with short response times, and its kernel components are distributed data storage and distributed data processing. Hadoop is a famous open source project for building a cloud distributed file system and distributed data analysis. Hadoop is suitable for batch applications and write-once-read-many applications. Thus, only a few kinds of applications, such as pattern searching and log file analysis, have so far been implemented over Hadoop. However, almost all database applications still use relational databases. To port them to a cloud platform, it becomes necessary to run a relational database over HDFS. We therefore test FUSE-DFS, an interface that mounts HDFS into a system so that it can be used like a local filesystem. If FUSE-DFS performance can satisfy users' applications, then it becomes easier to persuade people to port their applications to a cloud platform with minimal overhead.
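A sketch of the kind of measurement such a test implies: the same SQLite workload timed against a local path and against a FUSE-DFS mount point. The mount path is an assumption and depends on how the HDFS mount was configured on the test machine:

```python
import sqlite3
import time

PATHS = {
    "local": "/tmp/bench.db",
    "fuse-dfs": "/mnt/hdfs/bench.db",   # assumed HDFS mount point
}

def run_workload(db_path: str, n_rows: int = 10_000) -> float:
    """Time a simple insert-and-scan workload against one database file."""
    start = time.perf_counter()
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     ((i, "x" * 100) for i in range(n_rows)))
    conn.commit()
    conn.execute("SELECT COUNT(*) FROM t").fetchone()
    conn.close()
    return time.perf_counter() - start

for name, path in PATHS.items():
    try:
        print(f"{name}: {run_workload(path):.3f} s")
    except sqlite3.OperationalError as err:   # e.g. mount not present
        print(f"{name}: skipped ({err})")
```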
