Global ETD Search

411	Analysis and Experimental Comparison of Graph Databases. Kolomičenko, Vojtěch. January 2013. In recent years a new type of NoSQL database, called graph databases (GDBs), has gained significant popularity due to the increasing need to process and store data in the form of a graph. The objective of this thesis is to research the possibilities and limitations of GDBs and to conduct an experimental comparison of selected GDB implementations. For this purpose, the requirements of a universal GDB benchmark have been formulated and an extensible benchmarking tool, named BlueBench, has been developed.
412	Query evaluation with constant delay / L'évaluation de requêtes avec un délai constant. Kazana, Wojciech. 16 September 2013. This thesis centers on the problem of query evaluation. Given a query q and a database D, the task is to compute the set q(D) of all tuples in the output of q on D. However, q(D) may be larger than the database itself, since its size can be of the order n^l, where n is the size of the database and l is the arity of the query; computing it entirely may therefore require more resources than are available. The main focus of the thesis is one particular solution to this problem: instead of computing q(D) outright, enumerating it with constant delay. Intuitively, this means that there is a two-phase algorithm: a preprocessing phase that runs in time linear in the size of the database, followed by an enumeration phase that outputs the elements of q(D) one by one, with a constant delay (independent of the size of the database) between any two consecutive outputs. Four further problems related to enumeration are also considered: model-checking (where the query q is Boolean), counting (where one wants only the size |q(D)| of the output set), testing (where one wants an efficient test of whether a given tuple belongs to the output of the query) and j-th solution (where one wants direct access to the j-th element of q(D)). The results presented in the thesis address these problems for:
- first-order queries over classes of structures with bounded degree,
- monadic second-order queries over classes of structures with bounded treewidth,
- first-order queries over classes of structures with bounded expansion.
Keywords: Databases; Query evaluation; Logic.
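As a toy illustration of the two-phase scheme (not the thesis's constructions for first-order or monadic second-order queries, which are far more involved), consider the hypothetical query q(x, y) := R(x) ∧ S(x, y). The sketch below assumes R is given as a set and S as a list of pairs. Preprocessing makes one pass over S; because only values of x with at least one partner are kept, the enumeration phase never scans a dead end, so the work between two consecutive outputs is bounded by a constant.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


def preprocess(r: Set[int], s: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Linear-time preprocessing for the toy query q(x, y) := R(x) AND S(x, y).

    For every x in R, collect the y values with S(x, y). Only x values with a
    non-empty list end up in the index, so enumeration never hits a dead end.
    """
    index: Dict[int, List[int]] = defaultdict(list)
    for x, y in s:            # a single pass over S
        if x in r:            # expected O(1) set membership test
            index[x].append(y)
    return dict(index)


def enumerate_answers(index: Dict[int, List[int]]) -> Iterator[Tuple[int, int]]:
    """Enumeration phase: constant work between two consecutive outputs."""
    for x, ys in index.items():
        for y in ys:          # every list is non-empty, so each x yields at least once
            yield (x, y)


def count_answers(index: Dict[int, List[int]]) -> int:
    """Counting |q(D)| from the preprocessed index, without enumerating it."""
    return sum(len(ys) for ys in index.values())


if __name__ == "__main__":
    idx = preprocess({1, 2, 3}, [(1, 10), (1, 11), (3, 30), (4, 40)])
    print(list(enumerate_answers(idx)), count_answers(idx))
```

The same preprocessed index also answers the counting problem mentioned above, which is one reason enumeration, counting and testing are often studied together.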
413	A GIS-Centric Approach for Modeling Vessel Management Behavior System Data to Determine Oyster Vessel Behavior on Public Oyster Grounds in Louisiana. Gallegos, David X. 18 December 2014. The satellite communications system called the Vessel Management System (VMS) was used to provide geospatial data on oyster fishing over the nearly 1.7 million acres of public water bottoms in Louisiana. An algorithm was developed to analyze these data and model vessel behaviors, including docked, gearing, fishing and traveling. Vessel speeds were calculated with the Haversine formula over short and long intervals, and the two were compared to derive a measure of linearity. The algorithm was implemented in Python and integrated into a PostgreSQL database with geospatial support. Queries were developed to report vessel activities and the daily effort expended per behavior. ArcGIS was used to display and interpret the patterns of vessel activity, revealing clusters of fishing activity and effort that indicated the location and productivity of oyster reefs. Keywords: VMS; Fisheries; Louisiana; GIS; Modeling; Behavior; Databases and Information Systems.
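The speed and linearity computation described above can be sketched as follows. This is a minimal sketch under stated assumptions: a spherical Earth with mean radius 6371 km (about 3440.065 nautical miles), pings given as hypothetical (hours, latitude, longitude) tuples, and linearity defined here as the ratio of the speed over the whole window (first to last ping) to the time-weighted mean of the short-interval speeds. The thesis's exact intervals and thresholds are not given in the abstract.

```python
import math
from typing import List, Tuple

# Mean Earth radius in nautical miles (6371 km / 1.852); spherical-Earth assumption.
EARTH_RADIUS_NM = 3440.065

# Hypothetical ping layout: (time in hours, latitude in degrees, longitude in degrees).
Ping = Tuple[float, float, float]


def haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two positions, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(min(1.0, a)))


def speed_knots(p: Ping, q: Ping) -> float:
    """Speed between two pings: haversine distance divided by elapsed hours."""
    return haversine_nm(p[1], p[2], q[1], q[2]) / (q[0] - p[0])


def linearity(pings: List[Ping]) -> float:
    """Ratio of long-interval speed to short-interval speed over one window.

    Values near 1 suggest straight-line travel; values near 0 suggest the vessel
    is working back and forth over a small area (or is stationary).
    """
    total_hours = pings[-1][0] - pings[0][0]
    # Summing segment distances equals the time-weighted sum of short-interval speeds.
    path_nm = sum(haversine_nm(a[1], a[2], b[1], b[2]) for a, b in zip(pings, pings[1:]))
    if path_nm == 0 or total_hours <= 0:
        return 0.0  # no movement in the window
    net_nm = haversine_nm(pings[0][1], pings[0][2], pings[-1][1], pings[-1][2])
    short_speed = path_nm / total_hours
    long_speed = net_nm / total_hours
    return long_speed / short_speed
```

A transit along a straight track gives a ratio close to 1.0, while a vessel that dredges out and back over the same ground gives a ratio close to 0.0 even at a similar short-interval speed, which is the kind of contrast that can separate traveling from fishing.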
414	Base de dados online na disseminação sobre lazer de idosos [Online database for disseminating information on leisure for older adults]. Teodoro, Ana Paula Evaristo Guizarde. January 2011. Advisor: Gisele Maria Schwartz. Examining committee: Jossett Campagna; Afonso Antonio Machado. Abstract: The evolution of the Internet has broadened human engagement with the virtual environment, leading the scientific community to invest effort in better understanding the qualities of this interaction. Within research on virtual leisure, little attention has been paid to the quality of the specific information conveyed, particularly for older adults, which motivated this study. This qualitative study aimed to build a database of links to information about the cultural contents of leisure, in this case for the older population, and to develop and apply the Inventory for Usability Evaluation of Leisure Websites (IAUSLA-21+). The instrument has two parts: the first characterizes the sample, and the second is a 5-point Likert-type scale. It was administered, after inclusive hands-on sessions with the site, to an intentional sample of 60 participants of both sexes, aged over 60, familiar with computer use and willing to take part, drawn from digital-inclusion programs in Rio Claro-SP and São Paulo-SP. The data were analyzed descriptively using the Thematic Content Analysis technique and summarized as percentages. Regarding layout, information and operation, the results confirm that the database met the objectives of the study and showed good usability, although it should be updated whenever necessary. It is suggested that new possibilities in the field of virtual leisure be offered to older adults, to broaden their experiences with new technologies. Degree: Master's. Keywords: Older people; Leisure; Databases.
415	Tratamento de condições especiais para busca por similaridade em bancos de dados complexos / Treatment of special conditional for similarity searching in complex data bases. Kaster, Daniel dos Santos. 23 April 2012. The amount of complex data (images, videos, time series and others) has been growing rapidly. Complex data are well suited to similarity retrieval, in which queries are defined according to a given similarity criterion. Moreover, complex data are usually associated with other information, typically of conventional data types, which must be used together with similarity operations to answer complex queries. Several works have proposed techniques for similarity searching; however, most approaches were not designed to be integrated into a DBMS, treating similarity queries as isolated operations detached from the query processor. The main objective of this thesis is to propose algebraic alternatives, data structures and algorithms that allow similarity queries to be widely combined with the other search operations provided by relational DBMSs, and to execute such composite queries efficiently. To reach this goal, the work makes two main contributions. The first is a new similarity operation, the condition-extended k-Nearest Neighbor query (ck-NNq), which extends the k-Nearest Neighbor query (k-NNq) with an additional condition that modifies the semantics of the operation. The proposed operation can represent queries required by several applications that could not be expressed before, and allows complementary filtering conditions to be integrated homogeneously into the k-NNq. The second contribution is FMI-SiR (user-defined Features, Metrics and Indexes for Similarity Retrieval), a database module that executes similarity queries integrated with the other DBMS operations. The module allows user-defined feature extraction methods and distance functions to be included in the database core, providing great flexibility, and it also offers special treatment for medical images. Experiments on real datasets showed that the implementation of FMI-SiR over the Oracle DBMS can efficiently search very large complex databases. Keywords: Databases; Similarity queries; Multimedia databases; Multimedia.
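The difference between filtering a plain k-NN result and evaluating a condition inside the search can be sketched as below. This is a simplified reading, assuming the condition is a per-element predicate on a conventional attribute; the ck-NNq operator defined in the thesis may be more general, and the item layout and function names here are hypothetical.

```python
import heapq
import math
from typing import Callable, List, Tuple

# Hypothetical item layout: (feature vector, conventional attribute such as a modality).
Item = Tuple[Tuple[float, float], str]


def _dist(item: Item, query: Tuple[float, float]) -> float:
    """Euclidean distance between an item's feature vector and the query."""
    return math.dist(item[0], query)


def knn_then_filter(data: List[Item], query: Tuple[float, float], k: int,
                    cond: Callable[[Item], bool]) -> List[Item]:
    """Plain k-NN followed by a filter: may return fewer than k items."""
    nearest = heapq.nsmallest(k, data, key=lambda it: _dist(it, query))
    return [it for it in nearest if cond(it)]


def condition_extended_knn(data: List[Item], query: Tuple[float, float], k: int,
                           cond: Callable[[Item], bool]) -> List[Item]:
    """Condition evaluated during the search: the k nearest items that satisfy cond."""
    candidates = (it for it in data if cond(it))
    return heapq.nsmallest(k, candidates, key=lambda it: _dist(it, query))


if __name__ == "__main__":
    data = [((0, 0), "ct"), ((1, 0), "mri"), ((2, 0), "mri"), ((3, 0), "ct")]
    is_mri = lambda it: it[1] == "mri"
    print(knn_then_filter(data, (0, 0), 2, is_mri))         # one item only
    print(condition_extended_knn(data, (0, 0), 2, is_mri))  # two items
```

The example shows why the placement of the condition changes the semantics: filtering after the search can silently return fewer than k answers, while integrating the condition into the search keeps the "k nearest" guarantee among qualifying items.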
416	Historisation de données dans les bases de données NoSQL orientées graphes / Historical management in NoSQL Graph Databases. Castelltort, Arnaud. 30 September 2014. This thesis deals with data historization in graph databases. Graph data have been studied for a long time, but their management by database engines, mainly NoSQL engines, is recent. This development is linked to the emergence of Big Data, whose intrinsic properties, often summarized as the 3V (variety, volume, velocity), have revealed the limits of classical relational databases. Historization, for its part, is a major issue for information systems (IS) that was long addressed only for technical reasons (backup, maintenance) and, more recently, for decision-making purposes (Business Intelligence application suites). It is now, however, taking a predominant place in management applications. In this context, graph databases, although increasingly used, have benefited very little from recent advances in historization. The first contribution studies the new weight of historized data in management information systems. This analysis rests on the hypothesis that management applications increasingly integrate historization concerns, a position discussed in light of how IS have evolved with respect to this issue. The second contribution, going beyond the study of IS evolution, proposes an original model for managing historization in NoSQL graph databases. This proposal consists, on the one hand, of a single generic system for representing history within NoSQL graph databases and, on the other hand, of query features. We show that the system supports both simple queries (what one first expects from a historization system: retrieving the previous versions of a data item) and more complex queries that exploit both historization and the capabilities of graph databases (for example, pattern recognition over time). Our contributions have been implemented and tested on synthetic and real databases. Keywords: Graph databases; Historization; Management.
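The "simple" historization queries mentioned above (retrieving previous versions of a data item) can be illustrated with a version chain, where each entity points to its current version and each version points to the one it replaced. This is a hypothetical in-memory sketch, not the model proposed in the thesis: the `Version`/`Entity` classes, the `previous` link (standing in for a graph edge) and the assumption that updates arrive in increasing timestamp order are all illustrative choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    props: Dict[str, object]
    valid_from: int                        # timestamp at which this version became current
    previous: Optional["Version"] = None   # hypothetical "PREVIOUS" edge to the older version


@dataclass
class Entity:
    key: str
    head: Optional[Version] = None         # edge to the current version

    def update(self, props: Dict[str, object], t: int) -> None:
        """Create a new current version; assumes t is later than all earlier updates."""
        self.head = Version(dict(props), t, self.head)

    def history(self) -> List[Version]:
        """All versions, newest first, by walking the chain of previous-version edges."""
        out: List[Version] = []
        v = self.head
        while v is not None:
            out.append(v)
            v = v.previous
        return out

    def as_of(self, t: int) -> Optional[Dict[str, object]]:
        """State of the entity at time t, or None if it did not exist yet."""
        v = self.head
        while v is not None and v.valid_from > t:
            v = v.previous
        return None if v is None else v.props
```

Because versions are themselves nodes linked by edges, a graph engine could in principle combine such traversals with ordinary pattern matching, which is the kind of richer temporal query the abstract alludes to.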
417	A hybrid machine learning approach to measuring sentiment, credibility and influence on Twitter. Heeley, Robert. January 2017. Current sentiment analysis on Twitter is hampered by two factors: not all accounts are genuine, and not all users have the same level of influence. Including non-credible and irrelevant Tweets in sentiment analysis dilutes the effectiveness of any sentiment produced. Similarly, treating a Tweet with a potential audience of 10 users as having the same impact as a Tweet that could reach 1 million users does not accurately reflect their relative importance. To mitigate these inherent problems, a novel method was devised to account for credibility and to measure influence. The existing definition of credibility on Twitter was expanded to incorporate the subtle nuances that exist beyond the simple distinction between human and bot accounts. Once basic sentiment was produced, it was filtered by removing non-credible Tweets, and the remaining sentiment was weighted by both the user's and the Tweet's influence scores. Measuring individual opinions directly is costly and has limited power; machine learning techniques, however, allow millions of opinions to be captured and analysed. Combining a Tweet's sentiment with the user's influence score and credibility rating greatly increases the understanding and usefulness of that sentiment. To gauge the impact of this research and highlight its generalisability, this thesis examined two distinct real-world datasets, the UK General Election 2015 and the Rugby World Cup 2015, which also served to validate the approach. A more accurate understanding of sentiment on Twitter has the potential for broad impact, from targeted advertising that is in tune with people's needs and desires to giving governments a better understanding of the will of the people. 004
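The filter-then-weight step described above can be sketched as a weighted average. The abstract does not give the weighting formula, so this sketch assumes a multiplicative weight (user influence times Tweet influence) and treats the credibility decision and influence scores as already computed by upstream models; the `Tweet` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Tweet:
    sentiment: float        # e.g. a score in [-1, 1] from any sentiment classifier
    credible: bool          # output of a credibility classifier (assumed given)
    user_influence: float   # hypothetical non-negative influence score of the author
    tweet_influence: float  # hypothetical non-negative influence score of the Tweet


def weighted_sentiment(tweets: Iterable[Tweet]) -> float:
    """Influence-weighted mean sentiment over credible Tweets only.

    Assumption: weight = user_influence * tweet_influence. Returns 0.0 when no
    credible Tweet carries positive weight.
    """
    num = 0.0
    den = 0.0
    for t in tweets:
        if not t.credible:
            continue  # drop non-credible Tweets before aggregating
        w = t.user_influence * t.tweet_influence
        num += w * t.sentiment
        den += w
    return num / den if den > 0 else 0.0
```

Removing non-credible Tweets before weighting matters: in the sketch, a bot account with very high apparent influence would otherwise dominate the aggregate.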
418	Function-specific schemes for verifiable computation. Papadopoulos, Dimitrios. 07 December 2016. An integral component of modern computing is the ability to outsource data and computation to powerful remote servers, for instance in the context of cloud computing or remote file storage. While participants can benefit from this interaction, a fundamental security issue that arises is the integrity of computation: how can the end user be certain that the result of a computation over the outsourced data has not been tampered with, even by a compromised or adversarial server? Cryptographic schemes for verifiable computation address this problem by accompanying each result with a proof that can be used to check the correctness of the performed computation. Recent advances in the field have led to the first implementations of schemes that can verify arbitrary computations. However, in practice the overhead of these general-purpose constructions remains prohibitive for most applications, with proof computation times (at the server) on the order of minutes or even hours for real-world problem instances. A different approach to designing such schemes targets specific types of computation and builds custom-made protocols, sacrificing generality for efficiency. An important representative of this function-specific approach is the authenticated data structure (ADS), a specialized protocol designed to support the query types associated with a particular outsourced dataset. This thesis presents three novel ADS constructions for the important query types of set operations, multi-dimensional range search, and pattern matching, and proves their security under cryptographic assumptions over bilinear groups. The scheme for set operations supports nested queries (e.g., two unions followed by an intersection of the results), extending previous works that only accommodate a single operation. The range search ADS provides an asymptotic improvement over previous schemes in storage and computation costs that is exponential in the number of attributes in the dataset. Finally, the pattern matching ADS supports text pattern and XML path queries with minimal cost; e.g., in all tested settings the overhead at the server is less than 4% compared to simply computing the result. The experimental evaluation of all three constructions shows significant improvements in proof-computation time over general-purpose schemes. Keywords: Computer science; Outsourced databases; Secure outsourcing; Verifiable computation.
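The general ADS pattern (the data owner publishes a short digest, the server returns an answer plus a proof, and the client checks the proof against the digest) can be illustrated with the classic Merkle hash tree for membership queries. Note that this is a different, much simpler technique than the bilinear-group constructions of the thesis; it only shows the shape of the protocol. The function names and the leaf/node prefix bytes are illustrative choices.

```python
import hashlib
from typing import List, Tuple


def _h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Owner side: hash the leaves and build all tree levels; the last level is the root."""
    level = [_h(b"\x00" + x) for x in leaves]    # 0x00 prefix marks leaf hashes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]          # odd level: pair the last node with itself
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Server side: sibling hashes from leaf to root; the flag is True if the sibling is on the left."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1
        if sib >= len(level):
            sib = index                          # node was paired with itself
        proof.append((level[sib], sib < index))
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, proof: List[Tuple[bytes, bool]]) -> bool:
    """Client side: recompute the root from the claimed leaf and the proof."""
    node = _h(b"\x00" + leaf)
    for sib, is_left in proof:
        node = _h(b"\x01" + sib + node) if is_left else _h(b"\x01" + node + sib)
    return node == root
```

The client only needs the 32-byte root; a proof has one hash per tree level, so verification costs grow logarithmically with the dataset, which is the efficiency argument for function-specific schemes in miniature.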
419	Sensored: The Quantified Self, Self-Tracking, and the Limits of Digital Transparency. Grinberg, Yuliya. January 2019. The idea that daily life overflows with data has entered our common sense. Digital sensors placed in phones, clothing, or household appliances to track how we walk, how much we sleep, or where we travel have heightened the sense that everything about our lives is rapidly being translated into data. Theorists writing about data overload have largely converged around questions of privacy and agency, focusing on the feelings of impotence produced by large quantities of data that now let corporations effortlessly monitor and regulate people's lives. By contrast, I am interested in moments of friction. Scholars point to real issues, but they overstate the efficacy of data gathering and discount the professional dynamics that motivate the proliferation of data. As I evaluate how data discourse operates and builds, I concentrate on the experiences of those involved in the business of self-tracking, mainly on the work of U.S.-based developers of wearable computing and the technology professionals who participate in the international forum for data enthusiasts called the Quantified Self. As I analyze how digital entrepreneurialism configures notions of data and transforms digital self-monitoring into meaningful work, I examine how the relationship of technology professionals to data opens onto wider debates about the politics of digital representation. Ultimately, by applying an anthropological lens to explore how the practices, beliefs, and views of marketers, engineers, and developers of self-tracking tools shape digital knowledge, this research challenges accounts of data based purely on transparency, anxiety, and fear, and reveals just how precarious the control exerted by digital companies and self-monitoring tools really is. Keywords: Ethnology; Anthropology; User-generated content; Technology; Detectors; Databases.
420	An algebraic approach to the information-lossless decomposition of relational databases. Lo, Ying Hang. CUHK electronic theses & dissertations collection. January 2008. Relational information systems, that is, systems that can be represented by tables of finite states, are widely used in areas such as logic circuits, finite state machines, and relational databases. Decomposition is a natural method for removing redundancy from complex systems: it divides a system into a network of simpler components. To preserve the original functionality of the system, any valid decomposition must be lossless. This work is divided into two parts. In the first part, we develop a mathematical model for lossless decompositions of relational information systems. Commutative partitions play an important role in decompositions; commutativity is essentially a general algebraic formulation of the independence of two partitions. We express the interdependency of two commutative partitions by a bipartite graph and classify the hierarchical independence structures by the topological properties of bipartite graphs. In particular, we show that two partitions are decomposable, the strongest kind of independence, if and only if the associated bipartite graph is uniform. Moreover, we adopt Shannon's entropy to quantify the amount of information contained in each partition and formulate information-lossless decompositions by entropy equalities. Under the assumption of the running intersection property, we show that the general formulation of information-lossless decompositions of relational information systems is given by the entropy inclusion-exclusion equality. We also apply these formulations to the engineering systems above to illustrate the information-lossless decomposition process. In the second part, we further investigate the algebraic structure of relational databases. The decomposition theory for relational databases is based on data dependencies. However, the set-theoretic representations of data dependencies in terms of the attributes of relation schemes are incompatible with partial ordering operations, which creates a gap between database decomposition theory and our theory. We identify the unique component constraint as a necessary condition for the binary decomposition of a relation, i.e., there is a unique component for every join key value in the bipartite graph. We generalize the running intersection property as its partial-ordering counterpart under the unique component constraint. It follows that the multivalued and acyclic join dependencies can be characterized in terms of commutativity and the unique component constraint, which shows that the decompositions specified by these dependencies are special cases of our theory. Furthermore, we propose a lossless decomposition method for the class of data dependencies based on commutativity, and demonstrate that existing relational operations are sufficient for this method. / Adviser: Tony T. Lee. / Source: Dissertation Abstracts International, Volume: 70-06, Section: B, page: 3606. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2008. / Includes bibliographical references (leaves 159-163). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307. Keywords: Database management--Mathematics; Decomposition (Mathematics); Relational databases--Mathematics.
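For the simplest case, a binary decomposition of a relation R(X, Y, Z) into R[XY] and R[YZ] with Y as the join key, the entropy inclusion-exclusion equality reads H(XYZ) = H(XY) + H(YZ) - H(Y), with entropies taken under the uniform distribution over the distinct tuples of R. The sketch below checks this equality and, for comparison, checks losslessness directly by joining the two projections. It is an illustration of the standard two-component case under these assumptions, not the thesis's general formulation; the function names and the attribute-index convention are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]


def entropy(rows: Sequence[Row], cols: List[int]) -> float:
    """Shannon entropy (bits) of the projection onto cols, uniform over the distinct rows."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def lossless_by_entropy(rows: Sequence[Row], x: List[int], y: List[int], z: List[int],
                        tol: float = 1e-9) -> bool:
    """Two-component inclusion-exclusion test: H(XYZ) == H(XY) + H(YZ) - H(Y)."""
    lhs = entropy(rows, x + y + z)
    rhs = entropy(rows, x + y) + entropy(rows, y + z) - entropy(rows, y)
    return abs(lhs - rhs) < tol


def lossless_by_join(rows: Sequence[Row], x: List[int], y: List[int], z: List[int]) -> bool:
    """Direct test: does the natural join of R[XY] and R[YZ] give back exactly R?"""
    def proj(r: Row, cols: List[int]) -> Tuple[object, ...]:
        return tuple(r[c] for c in cols)

    xy = {(proj(r, x), proj(r, y)) for r in rows}
    yz = {(proj(r, y), proj(r, z)) for r in rows}
    joined = {(xv, yv, zv) for (xv, yv) in xy for (yv2, zv) in yz if yv == yv2}
    original = {(proj(r, x), proj(r, y), proj(r, z)) for r in rows}
    return joined == original


if __name__ == "__main__":
    product = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b1", "c1"), ("a2", "b1", "c2")]
    diagonal = [("a1", "b1", "c1"), ("a2", "b1", "c2")]
    for rel in (product, diagonal):
        print(lossless_by_entropy(rel, [0], [1], [2]), lossless_by_join(rel, [0], [1], [2]))
```

In the first relation, each join-key value is associated with a full product of X and Z values, so both tests succeed; in the second, the join would create spurious tuples, and the entropy equality fails accordingly (1 bit on the left versus 2 bits on the right).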
