  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
551

Enriquecimento de dados: uma pré-etapa em relação à limpeza de dados / Data enrichment: a pre-step to data cleaning

Carreira, Juliano Augusto [UNESP] 12 July 2012 (has links) (PDF)
The incidence of duplicate tuples is a significant problem inherent in today's large databases. Duplicates are records that, in most cases, are represented differently in the database but refer to the same real-world entity, which makes identifying them an arduous task. The techniques designed to treat this kind of problem are usually generic, meaning they do not take into account the particular characteristics of each language, which inhibits the quantitative and qualitative maximization of the duplicate tuples identified. This dissertation proposes a pre-step, called "enrichment", for the duplicate-tuple identification process. The process favors the language of the data and works through predefined language rules, specified generically for each desired language. The input records, defined in any language, are thus enriched, and the orthographic approximation that enrichment provides increases the number of duplicate tuples found and/or improves the confidence level of the duplicate pairs identified by the process.
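The enrichment pre-step described in this abstract can be sketched as applying language-specific rewrite rules to records before fuzzy duplicate matching. The rule table, threshold, and function names below are illustrative assumptions, not the dissertation's actual rule set:

```python
import difflib

# Hypothetical language rules for Portuguese: expand common abbreviations
# so that differently written records converge orthographically.
PT_RULES = {"r.": "rua", "av.": "avenida", "dr.": "doutor"}

def enrich(record: str, rules: dict) -> str:
    """Apply language rules to each token of a record (the 'enrichment' pre-step)."""
    tokens = record.lower().split()
    return " ".join(rules.get(t, t) for t in tokens)

def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Compare enriched records with a string-similarity ratio."""
    ea, eb = enrich(a, PT_RULES), enrich(b, PT_RULES)
    return difflib.SequenceMatcher(None, ea, eb).ratio() >= threshold

# The two spellings refer to the same street; enrichment aligns them.
print(is_duplicate("R. das Flores 100", "Rua das Flores 100"))  # True
```

Without the enrichment step, the abbreviation "R." and the full word "Rua" would lower the similarity score and the pair could fall below the duplicate threshold.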
552

BenchXtend: uma ferramenta para medir a elasticidade de sistemas de banco de dados em nuvem / BenchXtend: a tool to measure the elasticity of cloud database systems

Rodrigo Félix de Almeida 27 September 2013 (has links)
In recent years, cloud computing has attracted attention from industry and academia, and it has become increasingly common to find reports of cloud adoption by companies and research institutions in the literature. Since the majority of cloud applications are data-driven, the database management systems powering these applications are critical components of the application stack. Many novel database systems have emerged to fulfill the requirements of highly scalable cloud applications, and they differ markedly from traditional relational databases. Moreover, since elasticity is a key, differentiating feature of cloud computing, these novel database systems must also provide it. Together with the emergence of these new systems comes the need to evaluate them. Traditional benchmark tools for database systems are not sufficient to analyze the specificities of these systems in a cloud, so new benchmark tools are required to evaluate them properly and to measure how elastic they are. Before actually benchmarking and measuring the elasticity of cloud database systems, it is necessary to define a model with elasticity metrics that make sense both for consumers and providers. In this work we present BenchXtend, a tool that extends the Yahoo! Cloud Serving Benchmark (YCSB) to benchmark cloud database systems and measure their elasticity. As part of this work, we propose a model with metrics from the consumer and provider perspectives. Finally, we evaluated our solution through experiments and verified that our tool could properly vary the load during execution, as expected, and that our elasticity model could capture the elasticity differences between the studied scenarios.
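A consumer/provider elasticity metric of the kind this abstract describes can be sketched as a penalty over measurement intervals: under-provisioning hurts the consumer, over-provisioning costs the provider. The formula and weights below are a toy assumption, not BenchXtend's actual model:

```python
def elasticity_penalty(demand, supplied, under_w=1.0, over_w=0.5):
    """Toy elasticity score: sum under-provisioned capacity (consumer-side
    penalty) and over-provisioned capacity (provider-side cost) across
    measurement intervals. Lower is more elastic."""
    under = sum(max(d - s, 0) for d, s in zip(demand, supplied))
    over = sum(max(s - d, 0) for d, s in zip(demand, supplied))
    return under_w * under + over_w * over

# Demand ramps up and down; a perfectly elastic system tracks it exactly,
# while a sluggish one lags behind by one interval.
demand   = [10, 20, 40, 40, 20]
elastic  = [10, 20, 40, 40, 20]
sluggish = [10, 10, 20, 40, 40]

print(elasticity_penalty(demand, elastic))   # 0.0
print(elasticity_penalty(demand, sluggish))  # 40.0
```

Splitting the score into the two weighted terms is what lets the same measurement serve both perspectives: a consumer cares mostly about `under_w`, a provider about `over_w`.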
553

Controle de acesso para bancos de dados geograficos multiversão / Access control in multiversion geographic databases

Pierre, Mateus Silva 12 November 2007 (has links)
Advisor: Claudia Maria Bauzer Medeiros / Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação / Abstract: Geographic applications increasingly influence our daily activities. Their development usually requires teamwork by experts with multiple profiles, with different views of and access rights to the data. As a result, several mechanisms have been proposed to control authorization in geographic databases or to provide the use of versions. These mechanisms, however, work in isolation, prioritizing either access rights or flexible versioning alone. This dissertation addresses this issue by proposing a unified authorization model for databases that attacks both problems together. The model handles access control in geographic databases while taking into account the existence of versioning mechanisms for the stored data. It can thus serve as a basis for cooperative, secure work in applications that use Geographic Information Systems (GIS). / Master's in Computer Science
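A unified check of the kind this abstract proposes — authorization that is aware of which version of an object is being accessed — can be sketched as follows. The class, field names, and rights vocabulary are illustrative assumptions, not the dissertation's model:

```python
from dataclasses import dataclass, field

@dataclass
class VersionedObject:
    """A geographic object whose versions may each carry their own ACL."""
    name: str
    acls: dict = field(default_factory=dict)  # version -> {user: set of rights}

    def grant(self, version, user, right):
        self.acls.setdefault(version, {}).setdefault(user, set()).add(right)

    def allowed(self, version, user, right):
        # Unified check: the right must be granted for this specific version,
        # so versioning and access control are decided together.
        return right in self.acls.get(version, {}).get(user, set())

parcel = VersionedObject("parcel-42")
parcel.grant("v1", "alice", "write")
parcel.grant("v2", "alice", "read")
print(parcel.allowed("v1", "alice", "write"))  # True
print(parcel.allowed("v2", "alice", "write"))  # False
```

The point of coupling the two dimensions is visible in the example: alice may rework the draft version `v1` while the published version `v2` stays read-only for her.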
554

Serial Annotator: managing annotations of time series / Serial Annotator: gerenciando anotações em séries temporais

Silva, Felipe Henriques da, 1978- 06 October 2013 (has links)
Advisor: Claudia Maria Bauzer Medeiros / Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação / Abstract: Time series are sequences of values measured at successive time instants. They are used in several domains such as agriculture, medicine, and economics. The analysis of these series is of utmost importance, providing experts the ability to identify trends and forecast possible scenarios. To facilitate their analyses, experts often associate annotations with time series. Such annotations can also be used to correlate distinct series or to look for specific series in a database. There are many challenges involved in managing annotations, from finding proper structures to associate them with series, to organizing and retrieving series based on their annotations. This work contributes to research on time series management. Its main contributions are the design and development of a framework for managing multiple annotations associated with one or more time series in a database. The framework also provides means for annotation versioning, so that previous states of an annotation are never lost. Serial Annotator, an application implemented for the Android smartphone platform, has been used to validate the proposed framework and has been tested with real data involving agriculture problems. / Master's in Computer Science
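The core idea of the framework — annotations attached to a series, with every previous state of an annotation preserved — can be sketched with a simple append-only history per annotation. The class and method names are assumptions for illustration, not the framework's API:

```python
import time

class AnnotatedSeries:
    """Time series with versioned annotations: earlier states of an
    annotation are never discarded, only superseded by newer ones."""
    def __init__(self, values):
        self.values = values
        self.annotations = {}  # annotation id -> list of (timestamp, text)

    def annotate(self, ann_id, text):
        # Appending rather than overwriting is what gives us versioning.
        self.annotations.setdefault(ann_id, []).append((time.time(), text))

    def current(self, ann_id):
        return self.annotations[ann_id][-1][1]

    def history(self, ann_id):
        return [text for _, text in self.annotations[ann_id]]

s = AnnotatedSeries([1.2, 1.5, 0.9])
s.annotate("frost", "possible frost event")
s.annotate("frost", "confirmed frost event")
print(s.current("frost"))   # confirmed frost event
print(s.history("frost"))   # ['possible frost event', 'confirmed frost event']
```

The timestamps kept alongside each state also make it possible to ask what an annotation said at any earlier point in time, which an overwrite-in-place design cannot answer.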
555

Database analysis and managing large data sets in a trading environment

Månsson, Per January 2014 (has links)
Start-up companies today often need to scale up quickly and smoothly to meet rapidly increasing demand for the services they create. It is also always necessary to save money and find a cost-efficient solution that can meet the demands of the company. This report uses Amazon Web Services for infrastructure. Databases hosted on Elastic Compute Cloud and on the Relational Database Service, as well as Amazon DynamoDB for NoSQL storage, are compared, benchmarked, and evaluated.
556

How the choice of Operating System can affect databases on a Virtual Machine

Karlsson, Jan, Eriksson, Patrik January 2014 (has links)
As databases grow in size, optimizing them becomes a necessity, and choosing the right operating system to support your database is paramount to ensuring that it is fully utilized. Furthermore, with the virtualization of operating systems becoming more commonplace, we face more choices than ever before. This paper demonstrates why the choice of operating system plays an integral part in choosing the right database for your system in a virtual environment. It contains an experiment that measured the benchmark performance of a database management system on various virtual operating systems, showing the effect a virtual operating system has on the database management system that runs upon it. These findings will help promote future research in this area as well as provide a foundation on which such research can be based.
557

Evaluation of using NoSQL databases in an event sourcing system

Rothsberg, Johan January 2015 (has links)
An event store is a database for storing events in an event sourcing system. Instead of storing the current state, a very common way to persist data, an event sourcing system captures all changes to an application's state as a sequence of events. Usually the event store is a relational database, but relational databases have several drawbacks, and NoSQL databases have been developed to address them. The purpose of this thesis is to explore the possibility of using a NoSQL database in an event sourcing system. We first see how data is stored in an event store and then evaluate different solutions to find a suitable database. The graph database Neo4j was selected for further investigation, and a Neo4j event store has been implemented. Finally, the implemented solution is evaluated against the existing event store, which uses a relational database. The conclusion of this thesis is that event store data can easily be modeled in Neo4j, but some queries became complex to implement. The performance tests showed that the implemented event store performed worse than the existing one based on a relational database.
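The append-and-replay mechanism the abstract describes is independent of the backing database, so it can be sketched with an in-memory store; the names and the bank-account event shape below are assumptions for illustration, not the thesis's Neo4j implementation:

```python
class EventStore:
    """Minimal in-memory event store: the current state is never stored,
    only the sequence of events; state is rebuilt by replaying them."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def replay(self, apply, initial):
        # Fold every recorded event over the initial state, in order.
        state = initial
        for event in self.events:
            state = apply(state, event)
        return state

def apply_account_event(balance, event):
    kind, amount = event
    return balance + amount if kind == "deposit" else balance - amount

store = EventStore()
store.append(("deposit", 100))
store.append(("withdraw", 30))
store.append(("deposit", 5))
print(store.replay(apply_account_event, 0))  # 75
```

Because the full event sequence is retained, any past state can be reconstructed by replaying only a prefix of the events — which is also why queries over an event store can grow complex, as the thesis found with Neo4j.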
558

Extensions to the self protecting object model to facilitate integrity in stationary and mobile hosts

Brandi, Wesley 13 March 2014 (has links)
M.Sc. (Computer Science) / In this dissertation we propose extensions to the Self Protecting Object (SPO) model to facilitate the sharing of information in a more effective manner. We see the sharing of information as the sharing of objects that provide services; sharing objects effectively means allowing the objects to be used in a secure environment, independent of their location, in the manner their usage was intended. The SPO model proposed by Olivier [32] allows objects in a federated database to be moved from one site to another and ensures that the security policy of an object will always be respected and enforced, regardless of its location. Although the SPO model does indeed allow objects (information) to be shared effectively, it fails to address the maintenance of integrity within objects. We therefore define the notion of maintaining integrity within the SPO model and propose a model to achieve it. We argue that ensuring an SPO is only used in the way its usage was intended does not suffice to ensure integrity. The model we propose is based on ensuring that modifications to an SPO are executed only if the modification does not violate the constraints defined for the SPO. The model allows an SPO to maintain its unique identity in addition to maintaining its integrity. The SPO model is designed to be used in a federated database on sites that are stationary. Having addressed the issue of maintaining integrity within SPOs on stationary sites in the federated database, we then introduce the notion of a mobile site: a site that will eventually disconnect from the federated database and become unreachable for some time. Introducing the mobile site into the federated database allows us to propose the Mobile Self Protecting Object (MSPO) and its associated architecture. Because of the nature of mobile sites, the original model for maintaining integrity cannot be applied to the MSPO architecture. We therefore propose a mechanism (to be implemented in unison with the original model) to ensure the integrity of MSPOs on mobile sites. We then discuss the JASPO prototype, whose aim was to determine whether the Self Protecting Object model is feasible using current development technologies. We examine the requirements identified for the prototype to be successful and discuss how they were satisfied. Several modifications were made to the original SPO model, including the addition of a new module and the exclusion of others; we discuss these modifications and examine why they were necessary.
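The integrity rule the abstract states — a modification executes only if it violates none of the constraints defined for the SPO — can be sketched as a guard on every state change. The class, constraint shape, and example domain are illustrative assumptions, not the dissertation's model:

```python
class SelfProtectingObject:
    """Sketch: the object itself vets every modification against its
    declared constraints, so its integrity is preserved wherever it lives."""
    def __init__(self, state, constraints):
        self.state = dict(state)
        self.constraints = constraints  # list of predicates over a candidate state

    def modify(self, **changes):
        candidate = {**self.state, **changes}
        if all(check(candidate) for check in self.constraints):
            self.state = candidate
            return True
        return False  # modification rejected; the object is unchanged

account = SelfProtectingObject(
    {"owner": "wesley", "balance": 100},
    constraints=[lambda s: s["balance"] >= 0],
)
print(account.modify(balance=40))    # True
print(account.modify(balance=-10))   # False: the constraint would be violated
print(account.state["balance"])      # 40
```

Checking the candidate state before committing it, rather than the object's current state, is what makes the guard atomic: a rejected modification leaves no partial update behind.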
559

A privacy protection model to support personal privacy in relational databases.

Oberholzer, Hendrik Johannes 02 June 2008 (has links)
The individual of today insists on more protection of his/her personal privacy than a few years ago. During the last few years, rapid technological advances, especially in the field of information technology, have directed most attention and energy to the privacy protection of the Internet user. Research has been and is still being done, covering a vast area, to protect the privacy of transactions performed on the Internet. However, almost no research has been done on protecting the privacy of personal data stored in the tables of a relational database. Until now the individual has had no say in the way his/her personal data might be used, no way of indicating who may or may not access the data, and no way to indicate the level of sensitivity of his/her personal data or exactly what he/she consented to. Therefore, the primary aim of this study was to develop a model to protect the personal privacy of the individual in relational databases in such a way that the individual can specify how sensitive he/she regards the privacy of his/her data. This aim culminated in the development of the Hierarchical Privacy-Sensitive Filtering (HPSF) model. A secondary aim was to test the model by implementing it in query languages, and thereby to determine the potential of query languages to support the implementation of the HPSF model. Oracle SQL served as an example of text- or command-based query languages, while Oracle SQL*Forms served as an example of a graphical user interface. The study showed that SQL could support implementation of the model only partially, but that SQL*Forms could support it completely. An overview of the research approach employed to realise the objectives of the study: First, the concepts of privacy were studied to narrow the field of study down to personal privacy and its definition. Problems that relate to the violation or abuse of the individual's personal privacy were researched. Secondly, the right to privacy was researched at national and international levels. Based on the guidelines set by organisations like the Organisation for Economic Co-operation and Development (OECD) and the Council of Europe (COE), requirements were determined to protect the personal privacy of the individual. Thirdly, existing privacy protection mechanisms like privacy administration, self-regulation, and automated regulation were studied to see what mechanisms are currently available and how they function in the protection of privacy. Probably the most sensitive data about an individual is his/her medical data. Therefore, to conclude the literature study, the privacy of electronic medical records and the mechanisms proposed to protect the personal privacy of patients were investigated. The protection of the personal privacy of patients seemed to serve as the best example to use in the development of a privacy model. Eventually, the Hierarchical Privacy-Sensitive Filtering model was developed and introduced, and the potential of Oracle SQL and Oracle SQL*Forms to implement the model was investigated. The conclusion at the end of the dissertation summarises the study and suggests further research topics. / Prof. M.S. Olivier
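The central idea of sensitivity-based filtering — each individual assigns a sensitivity level to each of his/her fields, and a query only returns fields at or below the requester's clearance — can be sketched as follows. The level scale, field names, and sample data are assumptions for illustration, not the HPSF model's actual hierarchy:

```python
# Hypothetical sensitivity scale: 0 = public .. 3 = highly sensitive.
def filter_record(record, sensitivity, requester_clearance):
    """Return only the fields whose owner-assigned sensitivity level
    does not exceed the requester's clearance; unknown fields are
    treated as highly sensitive by default."""
    return {f: v for f, v in record.items()
            if sensitivity.get(f, 3) <= requester_clearance}

patient = {"name": "J. Smith", "city": "Pretoria", "diagnosis": "..."}
levels  = {"name": 1, "city": 0, "diagnosis": 3}

print(filter_record(patient, levels, requester_clearance=1))
# {'name': 'J. Smith', 'city': 'Pretoria'}
```

Because the levels are stored per individual rather than per table, two patients in the same table can expose different fields to the same requester, which is the kind of owner-specified control the study argues plain SQL can only partially enforce.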
560

Secure object-oriented databases

Olivier, Martin Stephanus 07 October 2014 (has links)
D.Phil. (Computer Science) / The need for security in a database is obvious. Object-orientation enables databases to be used in applications where other database models are not adequate. It is thus clear that security of object-oriented databases must be investigated...
