671

Database Forensics in the Service of Information Accountability

Pavlou, Kyriacos Eleftheriou January 2012 (has links)
Regulations and societal expectations have recently emphasized the need to mediate access to valuable databases, even by insiders. At one end of a spectrum is the approach of restricting access to information; at the other is information accountability. The focus of this work is on effecting information accountability of data stored in relational databases. One way to ensure appropriate use and thus end-to-end accountability of such information is through continuous assurance technology, via tamper detection in databases built upon cryptographic hashing. We show how to achieve information accountability by developing and refining the necessary approaches and ideas to support accountability in high-performance databases. These concepts include the design of a reference architecture for information accountability and several of its variants, the development of a sequence of successively more sophisticated forensic analysis algorithms and their forensic cost model, and a systematic formulation of forensic analysis for determining when the tampering occurred and what data were tampered with. We derive a lower bound for the forensic cost and prove that some of the algorithms are optimal under certain circumstances. We introduce a comprehensive taxonomy of the types of possible corruption events, along with an associated forensic analysis protocol that consolidates all extant forensic algorithms and the corresponding type(s) of corruption events they detect. Finally, we show how our information accountability solution can be used for databases residing in the cloud. In order to evaluate our ideas we design and implement an integrated tamper detection and forensic analysis system named DRAGOON. This work shows that information accountability is a viable alternative to information restriction for ensuring the correct storage, use, and maintenance of high-performance relational databases.
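
The tamper-detection scheme described above rests on cryptographic hashing of stored data. As a rough sketch of the underlying hash-chain idea — not the DRAGOON system itself, and with all names and the serialization below assumed purely for illustration — a validator can recompute a chain of digests over an append-only table and report the first row at which the recomputation disagrees with the notarized digests:

```python
import hashlib

def row_digest(prev_digest: str, row: tuple) -> str:
    """Chain the previous digest with a serialized row (illustrative serialization)."""
    payload = prev_digest + "|" + "|".join(map(str, row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(rows):
    """Return the cumulative digests for an append-only table."""
    digests, prev = [], ""
    for row in rows:
        prev = row_digest(prev, row)
        digests.append(prev)
    return digests

def find_tampering(rows, notarized_digests):
    """Recompute the chain and return the index of the first mismatch, or None."""
    prev = ""
    for i, row in enumerate(rows):
        prev = row_digest(prev, row)
        if prev != notarized_digests[i]:
            return i          # earliest row whose recomputed digest disagrees
    return None

# Digests are notarized at commit time; later validation detects a change.
rows = [(1, "alice", 100), (2, "bob", 250), (3, "carol", 75)]
notarized = build_chain(rows)
rows[1] = (2, "bob", 999)     # simulated tampering
print(find_tampering(rows, notarized))   # -> 1
```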
672

Metodas greitai duomenų paieškai duomenų bazėse / Full-text database search method

Balzaravičius, Povilas 13 August 2010 (has links)
This Master's thesis analyzes the performance of full-text search. The search engine was implemented with PHP and MySQL, together with the Memcached caching system. Data for the research were gathered from an RSS feed aggregator that periodically collects posts from Lithuanian blogs. The main focus is the execution time of two processes: index construction and search-result retrieval. For the indexing process, performance was measured on data sets of different sizes; with large amounts of data, indexing demands substantial computing resources and can take a long time. Indexing times were measured with and without the Memcached service and with and without a stop-word (ignored-word) list. For the search process, the study examined how the number of terms in a query affects result retrieval. Both processes were measured using InnoDB and MyISAM tables for data storage. The thesis concludes with recommendations for implementing or operating search engines built on similar principles.
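
The thesis measures a PHP/MySQL/Memcached engine; the sketch below is only a language-neutral illustration of the same ingredients — an inverted index, an ignored-word list, and a small in-process result cache standing in for Memcached — with all names assumed for illustration rather than taken from the thesis:

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "and", "of"}   # illustrative ignored-word list

class TinySearch:
    def __init__(self):
        self.index = defaultdict(set)    # term -> set of document ids
        self.cache = {}                  # query -> cached result (stands in for Memcached)

    def add_document(self, doc_id, text):
        for term in text.lower().split():
            if term not in STOP_WORDS:
                self.index[term].add(doc_id)
        self.cache.clear()               # indexed data changed; cached results are stale

    def search(self, query):
        if query in self.cache:
            return self.cache[query]
        terms = [t for t in query.lower().split() if t not in STOP_WORDS]
        if not terms:
            return set()
        result = set.intersection(*(self.index[t] for t in terms))
        self.cache[query] = result
        return result

engine = TinySearch()
engine.add_document(1, "full text search in databases")
engine.add_document(2, "the speed of full text indexing")
print(engine.search("full text"))        # -> {1, 2}
```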
673

Langage de mashup pour l'intégration autonomique de services de données dans des environements dynamiques / A mashup language for the autonomic integration of data services in dynamic environments

Othman-Abdallah, Mohamad 26 February 2014 (has links) (PDF)
In virtual communities, the integration of information (coming from data sources or data services) is user-centered, i.e., tied to information visualizations determined by users' needs. A virtual community can therefore be seen as a data space that each user can personalize by specifying which information is of interest and how it should be presented, while respecting security and quality-of-service (QoS) constraints. These constraints are defined or inferred and then interpreted to build personalized visualizations. No simple declarative mashup language currently exists for retrieving, integrating, and visualizing data provided by services according to spatio-temporal specifications. The aim of this thesis is to propose such a language. This work was carried out within the Redshine project, supported by a Bonus Qualité Recherche grant from Grenoble INP.
674

Optimal Path Queries in Very Large Spatial Databases

Zhang, Jie January 2005 (has links)
Researchers have been investigating the optimal route query problem for a long time. Optimal route queries are categorized as either unconstrained or constrained queries. Many main-memory-based algorithms have been developed to deal with the optimal route query problem. Among these, Dijkstra's shortest path algorithm is one of the most popular algorithms for the unconstrained route query problem. The constrained route query problem is more complicated than the unconstrained one, and some constrained route query problems such as the Traveling Salesman Problem and Hamiltonian Path Problem are NP-hard. There are many algorithms dealing with the constrained route query problem, but most of them only solve a specific case. In addition, all of them require that the entire graph resides in main memory. Recently, driven by applications involving very large graphs, such as the digital maps managed by Geographic Information Systems (GIS), several disk-based algorithms have been derived using divide-and-conquer techniques to solve the shortest path problem in a very large graph. However, until now little research has been conducted on the disk-based constrained problem.

This thesis presents two algorithms: 1) a new disk-based shortest path algorithm (DiskSPNN), and 2) a new disk-based optimal path algorithm (DiskOP) that answers an optimal route query without passing a set of forbidden edges in a very large graph. Both algorithms fit within the same divide-and-conquer framework as the existing disk-based shortest path algorithms proposed by Ning Zhang and Heechul Lim. Several techniques, including the query super graph, successor fragments, and open boundary node pruning, are proposed to improve the performance of the previous disk-based shortest path algorithms. Furthermore, these techniques are applied to the DiskOP algorithm with minor changes. The proposed DiskOP algorithm depends on the concept of collecting a set of boundary vertices and simultaneously relaxing their adjacent super edges. Even if the forbidden edges are distributed in all the fragments of a graph, the DiskOP algorithm requires little memory. Our experimental results indicate that the DiskSPNN algorithm performs better than the original ones with respect to the I/O cost as well as the running time, and the DiskOP algorithm successfully solves a specific constrained route query problem in a very large graph.
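
DiskOP itself is a disk-based, divide-and-conquer algorithm over graph fragments and super edges; the sketch below illustrates only the query semantics it answers — a shortest path that avoids a given set of forbidden edges — using an ordinary in-memory Dijkstra, with the graph representation assumed for illustration:

```python
import heapq

def shortest_path_avoiding(graph, source, target, forbidden):
    """Dijkstra over an adjacency dict {u: [(v, weight), ...]}, skipping forbidden (u, v) edges."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph.get(u, []):
            if (u, v) in forbidden:
                continue                  # the constrained query excludes this edge
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None                           # no path avoiding the forbidden edges

graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(shortest_path_avoiding(graph, "a", "c", forbidden={("b", "c")}))  # -> 4
```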
675

Novel storage architectures and pointer-free search trees for database systems

Vasaitis, Vasileios January 2012 (has links)
Database systems research is an old and well-established field in computer science. Many of the key concepts appeared as early as the 60s, while the core of relational databases, which have dominated the database world for a while now, was solidified during the 80s. However, the underlying hardware has not displayed such stability in the same period, which means that a lot of assumptions that were made about the hardware by early database systems are not necessarily true for modern computer architectures. In particular, over the last few decades there have been two notable consistent trends in the evolution of computer hardware. The first is that the memory hierarchy of mainstream computer systems has been getting deeper, with its different levels moving away from each other, and new levels being added in between as a result, in particular cache memories. The second is that, when it comes to data transfers between any two adjacent levels of the memory hierarchy, access latencies have not been keeping up with transfer rates. The challenge is therefore to adapt database index structures so that they become immune to these two trends. The latter is addressed by gradually increasing the size of the data transfer unit; the former, by organizing the data so that it exhibits good locality for memory transfers across multiple memory boundaries. We have developed novel structures that facilitate both of these strategies. We started our investigation with the venerable B+-tree, which is the cornerstone order-preserving index of any database system, and we have developed a novel pointer-free tree structure for its pages that optimizes its cache performance and makes it immune to the page size. We then adapted our approach to the R-tree and the GiST, making it applicable to multi-dimensional data indexes as well as generalized indexes for any abstract data type. Finally, we have investigated our structure in the context of main memory alone, and have demonstrated its superiority over the established approaches in that setting too. While our research has its roots in data structures and algorithms theory, we have conducted it with a strong experimental focus, as the complex interactions within the memory hierarchy of a modern computer system can be quite challenging to model and theorize about effectively. Our findings are therefore backed by solid experimental results that verify our hypotheses and prove the superiority of our structures over competing approaches.
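
The page-internal structure developed in the thesis is its own contribution; as a generic illustration of what a pointer-free layout can look like, the sketch below stores a small search tree in an array and computes child positions arithmetically (an implicit, Eytzinger-style layout), so no child pointers are stored. This is an assumed simplification, not the structure proposed in the thesis:

```python
def eytzinger(sorted_keys):
    """Lay out sorted keys in implicit (Eytzinger) order: node i has children 2i and 2i+1."""
    n = len(sorted_keys)
    out = [None] * (n + 1)               # 1-indexed to keep the child arithmetic simple
    it = iter(sorted_keys)

    def fill(i):
        if i <= n:
            fill(2 * i)                  # filling the left subtree first preserves sorted order
            out[i] = next(it)
            fill(2 * i + 1)
    fill(1)
    return out

def contains(tree, key):
    """Search without following stored pointers: positions are computed, not stored."""
    i = 1
    while i < len(tree):
        if tree[i] == key:
            return True
        i = 2 * i + (key > tree[i])      # descend left or right by arithmetic alone
    return False

tree = eytzinger([2, 3, 5, 7, 11, 13, 17])
print(contains(tree, 11), contains(tree, 4))   # -> True False
```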
676

Temporally Correct Algorithms for Transaction Concurrency Control in Distributed Databases

Tuck, Terry W. 05 1900 (has links)
Many activities comprise temporally dependent events that must be executed in a specific chronological order. Supportive software applications must preserve these temporal dependencies. Whenever the processing of this type of application includes transactions submitted to a database that is shared with other such applications, the transaction concurrency control mechanisms within the database must also preserve the temporal dependencies. A basis for preserving temporal dependencies is established by using (within the applications and databases) real-time timestamps to identify and order events and transactions. The use of optimistic approaches to transaction concurrency control can be undesirable in such situations, as they allow incorrect results for database read operations. Although the incorrectness is detected prior to transaction committal and the corresponding transaction(s) restarted, the impact on the application or entity that submitted the transaction can be too costly. Three transaction concurrency control algorithms are proposed in this dissertation. These algorithms are based on timestamp ordering, and are designed to preserve temporal dependencies existing among data-dependent transactions. The algorithms produce execution schedules that are equivalent to temporally ordered serial schedules, where the temporal order is established by the transactions' start times. The algorithms provide this equivalence while supporting concurrency to the extent of allowing out-of-order commits and reads. With respect to the stated concern with optimistic approaches, two of the proposed algorithms are risk-free and return to read operations only committed data-item values. Risk with the third algorithm is greatly reduced by its conservative bias. All three algorithms avoid deadlock while providing risk-free or reduced-risk operation. The performance of the algorithms is determined analytically and with experimentation. Experiments are performed using functional database management system models that implement the proposed algorithms and the well-known Conservative Multiversion Timestamp Ordering algorithm.
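
The dissertation's algorithms are specified in full there; the sketch below illustrates only the basic timestamp-ordering rule that such algorithms build on — each data item tracks the largest read and write timestamps, and a conflicting operation arriving from an older transaction is rejected, forcing a restart. All names are illustrative:

```python
class Item:
    def __init__(self):
        self.read_ts = 0      # largest timestamp of any transaction that read the item
        self.write_ts = 0     # largest timestamp of any transaction that wrote the item
        self.value = None

def read(item, ts):
    """Reject a read that arrives after a younger transaction already wrote the item."""
    if ts < item.write_ts:
        return False          # transaction must restart
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts, value):
    """Reject a write that a younger transaction has already read or overwritten."""
    if ts < item.read_ts or ts < item.write_ts:
        return False          # transaction must restart
    item.write_ts = ts
    item.value = value
    return True

x = Item()
print(write(x, ts=5, value="v1"))   # True: first write
print(read(x, ts=3))                # False: an older transaction reads after a newer write
print(read(x, ts=7))                # True, and raises x.read_ts to 7
print(write(x, ts=6, value="v2"))   # False: a younger transaction (ts=7) already read x
```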
677

Identifying Relationships between Scientific Datasets

Alawini, Abdussalam 03 May 2016 (has links)
Scientific datasets associated with a research project can proliferate over time as a result of activities such as sharing datasets among collaborators, extending existing datasets with new measurements, and extracting subsets of data for analysis. As such datasets begin to accumulate, it becomes increasingly difficult for a scientist to keep track of their derivation history, which complicates data sharing, provenance tracking, and scientific reproducibility. Understanding what relationships exist between datasets can help scientists recall their original derivation history. For instance, if dataset A is contained in dataset B, then the connection between A and B could be that A was extended to create B. We present a relationship-identification methodology as a solution to this problem. To examine the feasibility of our approach, we articulated a set of relevant relationships, developed algorithms for efficient discovery of these relationships, and organized these algorithms into a new system called ReConnect to assist scientists in relationship discovery. We also evaluated existing alternative approaches that rely on flagging differences between two spreadsheets and found that they were impractical for many relationship-discovery tasks. Additionally, we conducted a user study, which showed that relationships do occur in real-world spreadsheets, and that ReConnect can improve scientists' ability to detect such relationships between datasets. The promising results of ReConnect's evaluation encouraged us to explore a more automated approach for relationship discovery. In this dissertation, we introduce an automated end-to-end prototype system, ReDiscover, that identifies, from a collection of datasets, the pairs that are most likely related, and the relationship between them. Our experimental results demonstrate the overall effectiveness of ReDiscover in predicting relationships in a scientist's or a small group of researchers' collections of datasets, and the sensitivity of the overall system to the performance of its various components.
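
ReConnect and ReDiscover are described here only at a high level; as a toy illustration of the kind of relationship test involved, the sketch below checks row containment and column augmentation between two tabular datasets with pandas. The column names and the particular tests are assumptions made for illustration:

```python
import pandas as pd

def is_row_subset(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """True if every row of a (over a's columns) also appears in b."""
    cols = [c for c in a.columns if c in b.columns]
    if len(cols) != len(a.columns):
        return False
    merged = a.merge(b[cols].drop_duplicates(), on=cols, how="left", indicator=True)
    return (merged["_merge"] == "both").all()

def added_columns(a: pd.DataFrame, b: pd.DataFrame):
    """Columns present in b but not in a (a hint that a was extended into b)."""
    return [c for c in b.columns if c not in a.columns]

a = pd.DataFrame({"id": [1, 2], "temp": [20.1, 21.3]})
b = pd.DataFrame({"id": [1, 2, 3], "temp": [20.1, 21.3, 19.8], "humidity": [40, 42, 45]})
print(is_row_subset(a, b))     # True: a's rows are contained in b
print(added_columns(a, b))     # ['humidity']: b extends a with a new measurement
```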
678

An analysis of the use of medical applications required for complex humanitarian disasters and emergencies via Hastily Formed Networks (HFN) in the field

Kelley, Sean William 09 1900 (has links)
This thesis analyzes the feasibility, efficacy, and usability of medical operations working in concert with a Fly-Away Kit (FLAK) and the forming of Hastily Formed Networks (HFNs) in support of Humanitarian Assistance and Disaster Relief (HA/DR) operations. The initial focus of this research is on the requirements, situation, area of operations, and mission differences between nongovernmental organizations and governmental organizations. The thesis researches and discusses the possibilities for implementing medical technology in the field and the conditions and scenarios in HA/DR that may affect its success. This process also defines the requirements for medical operations and facilitates a methodology for ensuring those requirements are met. This thesis investigates the suitability of currently available COTS hardware and software components for medical operations. In addition, it includes a comprehensive review of the value of electronic medical records and telemedicine technologies. Virtually none of the organizations responding to the December 26, 2004 Southeast Asia tsunami had the benefit of large-scale medical information technology. For example, the ability to ascertain the real extent of injuries due to the tsunami was hampered by the lack of a central database. Initial media reports claimed a death toll of over 300,000 people, when in fact hindsight now provides a more accurate tally of just over 200,000 dead. This disparity resulted from an archaic system of tracking and accounting. Undoubtedly, humanitarian medical organizations will greatly benefit from the implementation of medical information technology capabilities. This thesis lays the groundwork for further research into medical technologies that can be deployed in the field with humanitarian medical teams in the near future.
679

Efficient Disk-Based Techniques for Manipulating Very Large String Databases

Allam, Amin 18 May 2017 (has links)
Indexing and processing strings are very important topics in database management. Strings can be database records, DNA sequences, protein sequences, or plain text. Various string operations are required for several application categories, such as bioinformatics and entity resolution. When the string count or sizes become very large, several state-of-the-art techniques for indexing and processing such strings may fail or behave very inefficiently. Modifying an existing technique to overcome these issues is not usually straightforward or even possible. A category of string operations can be facilitated by the suffix tree data structure, which basically indexes a long string to enable efficient finding of any substring of the indexed string, and can be used in other operations as well, such as approximate string matching. In this document, we introduce a novel efficient method to construct the suffix tree index for very long strings using parallel architectures, which is a major challenge in this category. Another category of string operations require clustering similar strings in order to perform application-specific processing on the resulting possibly-overlapping clusters. In this document, based on clustering similar strings, we introduce a novel efficient technique for record linkage and entity resolution, and a novel method for correcting errors in a large number of small strings (read sequences) generated by the DNA sequencing machines.
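
The dissertation constructs suffix trees in parallel for very long strings; the sketch below instead uses a naive suffix-array construction — far too slow for strings of that scale — solely to illustrate the substring-lookup capability such indexes provide. It is not the proposed construction method:

```python
def build_suffix_array(s: str):
    """Naive construction: sort all suffix start positions by the suffix text."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def contains_substring(s: str, sa, pattern: str) -> bool:
    """Binary-search the sorted suffixes for one that starts with the pattern."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and s[sa[lo]:sa[lo] + len(pattern)] == pattern

text = "ACGTACGGT"
sa = build_suffix_array(text)
print(contains_substring(text, sa, "GTAC"))   # True: occurs at position 2
print(contains_substring(text, sa, "GGA"))    # False
```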
680

A Proposed Frequency-Based Feature Selection Method for Cancer Classification

Pan, Yi 01 April 2017 (has links)
Feature selection is becoming an essential procedure in the data preprocessing step. The choice of feature selection method affects the efficiency and accuracy of classification models, and thus whether a classification model can perform reliably. In this study, we compared an original feature selection method with a proposed frequency-based feature selection method, using four classification models and three filter-based ranking techniques on a cancer dataset. The proposed method was implemented in WEKA, an open-source machine learning workbench. Performance is evaluated with two measures: recall and the Receiver Operating Characteristic (ROC). We found that the frequency-based feature selection method performed better than the original ranking method.
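
The exact frequency-based procedure is defined in the thesis; one plausible reading — count how often each feature appears in the top-k lists of several filter rankings and keep the features selected most frequently — is sketched below with scikit-learn, purely as an illustrative assumption:

```python
from collections import Counter
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
X_pos = X - X.min(axis=0)          # chi2 requires non-negative feature values

def top_k(scores, k=10):
    """Indices of the k highest-scoring features under one filter ranking."""
    return set(np.argsort(scores)[::-1][:k])

rankings = [
    top_k(f_classif(X, y)[0]),
    top_k(chi2(X_pos, y)[0]),
    top_k(mutual_info_classif(X, y, random_state=0)),
]

# Keep features selected by at least two of the three filter rankings.
counts = Counter(f for ranking in rankings for f in ranking)
selected = sorted(f for f, c in counts.items() if c >= 2)
print(selected)
```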
