Global ETD Search

361	Database alignment: fundamental limits and multiple databases setting K, Zeynep 13 September 2024 In modern data analysis, privacy is a critical concern when dealing with user-related databases. Ensuring user anonymity while extracting meaningful correlations from the data poses a significant challenge, especially when side information can potentially enable de-anonymization. This dissertation explores the standard information-theoretic problems in the correlated databases model. We define a "database" as a simple probabilistic model that contains a random feature vector for each user, with user labels shuffled to ensure anonymity. We first investigate correlation detection between two databases, formulating it as a composite binary hypothesis testing problem. Under the alternative hypothesis, there exists an unknown permutation that aligns users in the first database with those in the second, thereby matching correlated entries. The null hypothesis assumes that the databases are independent, with no such alignment. For the special case of Gaussian feature vectors, we derive both upper and lower bounds on the correlation required for detection to succeed or fail. Our results are tight up to a constant factor when the feature length exceeds the number of users. Regarding our achievability boundary, we draw connections to the user labeling recovery problem, highlighting significant parallels and insights. Additionally, for the two-database model, we examine the potential gaps in the statistical analysis conducted thus far for the regime with a large number of users by drawing parallels with similar problems in the literature. Motivated by these comparisons, we propose a novel approach to the detection problem, focusing on the hidden permutation structure and the intricate dependencies characterizing these relationships. Building on our research, we present a comprehensive model for handling multiple correlated databases. 
In this multiple-databases setting, we address another fundamental information-theoretic problem: user label recovery. We evaluate the performance of the typicality matching estimator in relation to the asymptotic behavior of feature length, demonstrating an impossibility result that holds up to a multiplicative constant factor. This exploration into multiple databases not only broadens the scope of our study but also underscores the complexity and richness of correlation detection in a more generalized framework. In conclusion, we summarize the statistical gaps identified in our findings, exploring their possible origins. We also discuss the limitations of our simple probabilistic model and propose strategies to address them. Finally, we outline potential future research directions, including the information-theoretic problem of change detection, which remains an open area of significant interest. Electrical engineering Correlated databases Hypothesis testing Recovery
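As a rough illustration of the two-database detection problem described in entry 361, the sketch below simulates the Gaussian model in plain Python: each user gets a feature vector, the second database holds a noisy copy with correlation `rho`, and its rows are shuffled by a hidden permutation. The function names, the parameters, and the choice of test statistic (the inner product of the column sums, which does not depend on row order) are assumptions made for this example; the abstract does not state which statistic the dissertation analyzes.

```python
import math
import random


def generate_databases(n, d, rho, seed=0):
    """Simulate two databases of n users with d standard Gaussian features each.

    Row perm[i] of Y is a copy of row i of X with per-feature correlation rho.
    With rho = 0 the two databases are independent (the null hypothesis).
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)  # hidden alignment between the two databases
    noise_scale = math.sqrt(1.0 - rho * rho)
    Y = [None] * n
    for i, row in enumerate(X):
        Y[perm[i]] = [rho * x + noise_scale * rng.gauss(0.0, 1.0) for x in row]
    return X, Y, perm


def sum_statistic(X, Y):
    """Inner product of the column sums of X and Y.

    Reordering the rows of Y changes the result only by floating-point
    rounding, so the hidden permutation does not need to be known. Its
    expectation is 0 for independent databases and n * d * rho for
    correlated ones.
    """
    d = len(X[0])
    col_x = [sum(row[k] for row in X) for k in range(d)]
    col_y = [sum(row[k] for row in Y) for k in range(d)]
    return sum(a * b for a, b in zip(col_x, col_y))


# Hypothetical sizes, chosen only for illustration.
X, Y, perm = generate_databases(n=50, d=200, rho=0.5, seed=1)
statistic = sum_statistic(X, Y)
```

A detector built on this statistic would compare it with a threshold between 0 and n * d * rho; where to place that threshold, and when any such test can succeed, is the kind of question the abstract's upper and lower bounds address.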
362	A Highly Fault-Tolerant Distributed Database System with Replicated Data Lin, Tsai S. (Tsai Shooumeei) 12 1900 Because of the high cost and impracticality of a high connectivity network, most recent research in transaction processing has focused on a distributed replicated database system. In such a system, multiple copies of a data item are created and stored at several sites in the network, so that the system is able to tolerate more crash and communication failures and attain higher data availability. However, the multiple copies also introduce a global inconsistency problem, especially in a partitioned network. In this dissertation a tree quorum algorithm is proposed to solve this problem, imposing a logical tree structure along with dynamic system reconfiguration on all the copies of each data item. The proposed algorithm can be viewed as a dynamic voting technique which, with the help of an appropriate concurrency control algorithm, exhibits the major advantages of quorum-based replica control algorithms and of the available copies algorithm, so that a single copy is read for a read operation and a quorum of copies is written for a write operation. In addition, read and write quorums are computed dynamically and independently. As a result, expensive read operations, like those that require several copies of a data item to be read in most quorum schemes, are eliminated. Furthermore, the message costs of read and write operations are reduced by the use of smaller quorum sizes. Quorum sizes can be reduced to a constant in a lightly loaded system, to log n in a failure-free network, and to [(n + 1)/2] in a partitioned network in a heavily loaded system. On average, our algorithm requires fewer messages than the best known tree quorum algorithm, while still maintaining the same upper bound on quorum size. One-copy serializability is guaranteed with higher data availability and the highest degree of fault tolerance (up to n - 1 site failures). 
databases tree quorum algorithms computer science Distributed databases. File organization (Computer science)
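The quorum sizes quoted in entry 362 can be made concrete with a small sketch. It assumes the copies are arranged as a complete binary tree numbered heap-style (node 1 is the root, node k has children 2k and 2k + 1), takes a root-to-leaf path as a quorum of roughly log n copies, and reads the abstract's bracketed (n + 1)/2 as a simple majority. These are illustrative assumptions with hypothetical function names, not the dissertation's algorithm, whose dynamic reconfiguration is not modeled here.

```python
def majority_quorum_size(n):
    """Smallest number of copies that forms a strict majority of n copies."""
    return n // 2 + 1


def root_to_leaf_path(n, leaf):
    """Return the nodes from the root down to `leaf` in a heap-numbered tree of n nodes."""
    if not 1 <= leaf <= n:
        raise ValueError("leaf must be between 1 and n")
    path = []
    node = leaf
    while node >= 1:
        path.append(node)
        node //= 2  # the parent of node k is k // 2; the root's parent is 0
    return path[::-1]


# Hypothetical system with 7 copies arranged as a tree of height 3.
n = 7
path_a = root_to_leaf_path(n, 5)
path_b = root_to_leaf_path(n, 7)
shared = set(path_a) & set(path_b)  # copies that both path quorums contain
```

Any two root-to-leaf paths share the root, and any two majorities overlap because 2 * (n // 2 + 1) > n. That overlap is the intersection property quorum schemes rely on to keep copies consistent.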
363	Bancos de dados geográficos: uma análise das arquiteturas dual (Spring) e integrada (Oracle Spatial). / Spatial databases: an analyse of the architectures dual (Spring) e integrated (Oracle Spatial). Silva, Rosângela 29 August 2002 The particular characteristics of geographic data are the reason why new data types must be structured and new forms of data storage and access must be designed. This work presents an analysis of the Dual and Integrated architectures with respect to how spatial information is managed and retrieved together with non-spatial information. It covers the fundamental concepts of geographic database management systems. To demonstrate how these concepts matter and directly influence the efficiency of such systems, the work concludes with functionality tests on two tools with distinct architectures: SPRING, with a Dual Architecture, and ORACLE SPATIAL, with an Integrated Architecture. The functionality tests aimed to verify whether and how the tools under study support certain types of spatial queries. To this end, an Urban Planning scenario was chosen, and some types of queries involving spatial components that are normally implemented in this kind of application were selected. The results allow the conclusion, chiefly, that the tools analyzed supported the spatial queries used in the tests, some of which involved the spatial object and the attribute at the same time, though with some restrictions. In addition, it was possible to make some observations about whether or not to use spatial indexes for query optimization, and about the geographic database architectures analyzed with respect to the integration of spatial and non-spatial data. 
/ The complexity of spatial data justifies the need to develop new spatial data types and to design new structures to store, query, and handle spatially referenced data inside a database management system (DBMS). This work presents an analysis of these issues considering the different architectures of geographic databases. The Dual and Integrated architectures are considered in relation to spatial data and attribute handling. The principal concepts concerning spatial DBMSs are presented. To demonstrate how these concepts are important and directly influence the efficiency of Geographic Information Systems (GIS) tools, this work concludes with the development of some functionality tests. Two GIS programs with distinct architectures were tested: SPRING, with a Dual Architecture, and ORACLE SPATIAL, with an Integrated Architecture. The functionality tests aimed to verify whether the tools under study support certain kinds of spatial queries, describing the necessary steps to perform these queries. To perform the tests, an Urban Planning application was chosen and some spatial queries were defined and executed. geographic information systems geoprocessamento sistemas de informação geográfica spatial databases spatial databases architectures
364	Mining Oncology Data: Knowledge Discovery in Clinical Performance of Cancer Patients Hayward, John T 16 August 2006 "Our goal in this research is twofold: to develop clinical performance databases of cancer patients, and to conduct data mining and machine learning studies on collected patient records. We use these studies to develop models for predicting cancer patient medical outcomes. The clinical database is developed in conjunction with surgeons and oncologists at UMass Memorial Hospital. Aspects of the database design and representation of patient narrative are discussed here. Current predictive model design in medical literature is dominated by linear and logistic regression techniques. We seek to show that novel machine learning methods can perform as well as or better than these traditional techniques. Our machine learning focus for this thesis is on pancreatic cancer patients. Classification and regression prediction targets include patient survival, wellbeing scores, and disease characteristics. Information research in oncology is often constrained by type variation, missing attributes, high dimensionality, skewed class distribution, and small data sets. We compensate for these difficulties using preprocessing, meta-learning, and other algorithmic methods during data analysis. The predictive accuracy and regression error of various machine learning models are presented as results, as are t-tests comparing these to the accuracy of traditional regression methods. In most cases, it is shown that the novel machine learning prediction methods offer comparable or superior performance. We conclude with an analysis of results and discussion of future research possibilities." Clinical Performance Databases Cancer oncology Knowledge Discovery in Databases data mining Cancer Treatment Data processing Data mining
365	A comprehensive RNA-RNA database with interaction prediction and data mining / CUHK electronic theses & dissertations collection January 2015 Non-coding RNAs (ncRNAs) have important biological functions such as regulation of gene expression and disease causality. These ncRNAs exert their functions by interacting with other molecules, such as messenger RNAs (mRNAs). Thus RNA-RNA interaction studies are important for understanding the gene regulation mechanism and for curing ncRNA-related diseases. This thesis contributes to the RNA-RNA interaction prediction problem, the construction of a comprehensive human microRNA (miRNA)-related database, and data mining on high-throughput RNA-RNA interaction data. / On the RNA-RNA interaction prediction problem, a novel energy model is proposed and a GA-based algorithm, namely RIPGA, is developed. The experimental results show that the novel energy model outperforms the state-of-the-art model, the Turner energy model. RIPGA with the novel energy model also outperforms two state-of-the-art programs, inRNAs and RactIP. / On the construction of a comprehensive human miRNA-related database, data are collected and cleansed from three state-of-the-art databases related to human miRNAs: miRTarBase, miRBase and HMDD v2.0. A network is constructed from these data to present the complete relationships of the miRNAs, because the relationships are only partially covered by the existing databases. A website and database are set up for data query, visualization and analysis functions to complement the existing databases. / On data mining on high-throughput RNA-RNA interaction data, four characteristics of RNA-RNA interaction are identified from the high-throughput data. We believe these characteristics are potential explanations of the high degree of connectivity of some miRNAs. 
These characteristics are also important scientific knowledge for future research on RNA-RNA interaction and control for biomedical applications. / 非編碼核糖核酸(ncRNAs)有重要的生物功能，例如：基因表現的調控和疾病的因果關係。這些ncRNAs透過與其他分子的互相作用來發揮作用，例如：信使核糖核酸(mRNAs)。因此，核糖核酸互相作用的研究對理解基因表現的調控和治愈與ncRNAs有關的疾病十分重要。本論文集中解決核糖核酸互相作用的預測問題、建設人類微核糖核酸(miRNAs)綜合數據庫和高通量核糖核酸互相作用數據的數據挖掘。 / 針對核糖核酸互相作用的預測問題，我們提出了一個新的能量模型和開發了一個基於遺傳算法的算法，即RIPGA。實驗數據顯示新的能量模型比最先進的模型﹐即特納能量模型做得更好。而RIPGA亦比最先進的inRNAs和RacIP做得更好。 / 針對建設人類微核糖核酸綜合數據庫，我們收集及潔淨來自三個最先進的數據庫的數據，即miRTarBase，miRBase和HMDDv2.0。我們由這些數據建設了一個網絡來表達完整的miRNAs關係，因為現存的數據庫只覆蓋了部份的關係。我們亦設置了網站和數據庫，並提供數據查詢、可視化及分析的功能，以補足現存的數據庫。 / 針對高通量核糖核酸互相作用數據的數據挖掘，我們從高通量核糖核酸互相作用數據中認出四個核糖核酸互相作用的特點。我們相信這些特點是部份高連通miRNAs的可能的解釋。這些特點對核糖核酸互相作用及控制的未來研究和生物醫學應用亦是重要的科學知識。 / Cheung, Kwan Yau. / Thesis M.Phil. Chinese University of Hong Kong 2015. / Includes bibliographical references (leaves 149-154). / Abstracts also in Chinese. / Title from PDF title page (viewed on 05, October, 2016). / Detailed summary in vernacular field only. Non-coding RNA--Databases Data mining RNA, Untranslated Databases, Genetic Data Mining
366	APPLICATION OF BLOCKCHAIN NETWORK FOR THE USE OF INFORMATION SHARING Unknown Date The Blockchain concept was originally developed to provide security in the Bitcoin cryptocurrency network, where trust is achieved through the provision of an agreed-upon and immutable record of transactions between parties. The use of a Blockchain as a secure, publicly distributed ledger is applicable to fields beyond finance, and is an emerging area of research across many other fields in the industry. This thesis considers the feasibility of using a Blockchain to facilitate secured information sharing between parties, where a lack of trust and absence of central control are common characteristics. A Blockchain information-sharing system will be designed and implemented on an existing Blockchain network, with party members communicating to share secured information. The benefits and risks associated with using a public Blockchain for information sharing will also be discussed. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection Blockchains (Databases) Data encryption (Computer science) Personal data protection Bitcoin
367	An Examination of Small Businesses' Propensity to Adopt Cloud-Computing Innovation Powelson, Steven E. 01 January 2011 The problem researched was small business leaders' early and limited adoption of cloud computing. Business leaders who do not use cloud computing may forfeit the benefits of its lower capital costs and ubiquitous accessibility. Anchored in diffusion of innovation theory, the purpose of this quantitative cross-sectional survey study was to examine whether there is a relationship between small business leaders' view of the cloud-computing attributes of compatibility, complexity, observability, relative advantage, results demonstrable, trialability, and voluntariness and their intent to use cloud computing. The central research question involved understanding the extent to which each cloud-computing attribute relates to small business leaders' intent to use cloud computing. A sample of 3,897 small business leaders was selected from a commerce authority e-mail list, yielding 151 completed surveys that were analyzed using regression. Significant correlations were found for the relationships between the independent variables of compatibility, complexity, observability, relative advantage, and results demonstrable and the dependent variable intent to use cloud computing. However, no significant correlation was found between the independent variable voluntariness and intent to use. The findings might provide new insights relating to cloud-computing deployment and commercialization strategies for small business leaders. Implications for positive social change include the need to prepare for new skills for workers affected by cloud computing adoption and cloud-computing ecosystem's reduced environmental consequences and policies. Databases and Information Systems
368	The relationship between cell phone use and identity theft Saunders, Lewis O. 01 January 2011 The growth of mobile phone use has paralleled increased reports of identity theft. Identity theft can result in financial loss and threats to a victim's personal safety. Although trends in identity theft are well-known, less is known about individual cell phone users' attitudes toward identity theft and the extent to which they connect it to cell phone use. The purpose of this qualitative study was to determine how cell phone use is affected by attitudes toward privacy and identity theft. The study was based on social impact theory, according to which people's attitudes and behavior are affected by the strength and immediacy of others' attitudes and behavior. The research questions concerned the extent to which participants connected cell phone use with decreasing privacy and increasing cybercrime, how the use of biometrics affected cell phone users' attitudes and behavior, and what steps can be taken to reduce the misuse of private information associated with cell phone use. Data collection consisted of personal interviews with representatives from 3 groups: a private biometrics company, individual cell phone users who earn more than $55,000 a year, and individual cell phone users who earn less than $55,000 a year. Interviews were transcribed and coded for themes and patterns. Findings showed that interviewees were more likely to see identity theft as a problem among the public at large than in the industries in which they worked. Participants recommended a variety of measures to improve cell phone security and to reduce the likelihood of identity theft: passwords, security codes, voice or fingerprint recognition, and encryption. 
The implications for positive social change include informing government officials and individual users about the use and abuse of cell phones in order to decrease violations of privacy and identity theft while still promoting national security. Databases and Information Systems Public Administration Public Policy
369	A geographic data model for groundwater systems Strassberg, Gil 28 August 2008 Not available / text Aquifers -- Simulation methods Aquifers -- Databases Groundwater -- Simulation methods Groundwater -- Databases Geodatabases
370	The need for object-oriented systems to extend or replace the relational database model to solve performance problems Gibson, Mark G. January 1992 The relational model has dominated the database field because of its reduced application development time and non-procedural data manipulation features. It has significant problems, however, including weak integrity constraints. This paper discusses the need for object oriented techniques to improve on these flaws. Three existing DBMS will be discussed: IRIS, ORION, and OZ. / Department of Computer Science Database design. Object-oriented databases. Relational databases.
