881 |
Partitionnement dans les systèmes de gestion de données parallèles
Liroz, Miguel 17 December 2013
In recent years, the volume of data captured and generated has exploded. Advances in computing technology, which provide cheap storage and very high processing power, have enabled organizations to run complex analyses of their data and extract valuable knowledge from it. This trend has been very important not only for industry but also for science, where better instruments and more complex simulations require efficient management of enormous quantities of data.
Parallelism is a fundamental technique for managing extremely large data sets because it exploits the simultaneous use of several computing resources. To benefit from parallel computing, we need efficient data partitioning techniques, which are responsible for dividing the whole data set into several partitions and assigning them to the compute nodes. Data partitioning is a complex problem, because it has to take into account different and often conflicting concerns such as data locality, load balancing, and maximizing parallelism.
In this thesis, we study the data partitioning problem, in particular in scientific parallel databases that grow continuously. We also study partitioning in the MapReduce framework.
In the first case, we consider the partitioning of very large databases to which new items are added continuously, with an application to astronomical data as an example. Existing approaches are limited: the complexity of the workload and the continuous addition of new data restrict the use of traditional approaches. We propose two dynamic partitioning algorithms that assign new data to partitions using an affinity-based technique. Our algorithms produce very good data partitionings in a reduced execution time compared with traditional approaches.
We also study how to improve the performance of the MapReduce framework using data partitioning techniques. In particular, we are interested in the efficient partitioning of input data
882 |
A study on relational databases through mathematical theories of relations and logic
Yu, Chaoran January 1988
The purpose of this study is to explore how mathematics provides a convenient formalism for studying classical database management system problems. The study has two main parts, devoted respectively to using the mathematical theory of relations and using logical theory to study database management systems. The first part focuses on the relational model and relational algebra. The second part deals with the application of mathematical logic to database management systems, where logic may be used both as an inference system and as a representation language. The features and logical mechanisms of the Prolog programming language are studied. A sample logical database model is developed and tested using the logic programming language Prolog. / Department of Computer Science
883 |
This study considers the risk that future electricity demand cannot be met and the resulting need to change consumption behavior. In a smart grid context there are several possible ways to do this: increasing consumers' awareness, adding energy storage, or building smarter homes that can control appliances. To implement these, indications of future consumption could be useful. We therefore investigate how a framework for short-term consumption prediction can be created using electricity consumption data in relation to external factors. A literature study is carried out to identify relevant methods and the qualities that matter when choosing a good prediction method. Case-Based Reasoning (CBR) appeared to be a suitable method. It was examined further and implemented using relational databases. The method was then tested and evaluated using datasets and the evaluation measures CV, MBE and MAPE, which have previously been used in the domain of consumption prediction. The results were compared with those of the winning methods in the ASHRAE competition. The CBR method was expected to perform better than it did, and it did not match the winning methods from the ASHRAE competition. Nevertheless, the results showed that the CBR method can be used as a predictor and has the potential to make good energy consumption predictions, and that there is room for improvement in future studies.
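The abstract names CV, MBE and MAPE without defining them. As a rough illustration, the sketch below uses commonly cited definitions (CV as the root-mean-square error divided by the mean of the actual values, MBE as the mean signed error divided by that mean, MAPE as the mean absolute percentage error, all in percent). These formulas, the sign convention for MBE (predicted minus actual), and the function names are assumptions and may differ from what the study used.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is 0)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def mbe(actual, predicted):
    """Mean bias error relative to the mean actual value, in percent.
    Assumed sign convention: positive means over-prediction."""
    n = len(actual)
    mean_actual = sum(actual) / n
    return 100.0 * (sum(p - a for a, p in zip(actual, predicted)) / n) / mean_actual

def cv(actual, predicted):
    """Coefficient of variation of the RMSE, in percent."""
    n = len(actual)
    mean_actual = sum(actual) / n
    rmse = math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
    return 100.0 * rmse / mean_actual

# Hypothetical hourly consumption values (kWh) and predictions, for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(mape(actual, predicted), mbe(actual, predicted), cv(actual, predicted))
```

MBE can be zero even when individual errors are large, because over- and under-predictions cancel, which is one reason such studies report it alongside CV and MAPE.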
884 |
Framework for expressing prioritized constraints using infinitesimal logic
Agarwal, Ruchi 10 November 2009
In this thesis, we propose an extension to the multiple-valued infinitesimal logic framework to provide a simple representation for prioritized constraints. We introduce two unary operators, µ and w, to infinitesimal logic in order to define preferential constraints and backup constraints, respectively. The new framework naturally allows us to define a hierarchy of priorities among constraints. We also present a lazy algorithm for evaluating the multiple-valued prioritized constraint expressions of our representation. Our algorithm, which is similar to the alpha-beta pruning technique for minimax game tree evaluation, is based on a recursive depth-first traversal of the parse tree for the expression and works by evaluating operands within an increasingly narrow range of interest. Our implementation of this representation for querying a movies database demonstrates the expressive power and flexibility of our framework.
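To make the idea of "evaluating operands within an increasingly narrow range of interest" concrete, here is a minimal sketch of alpha-beta-style lazy evaluation over a parse tree. It is not the thesis's algorithm: it assumes plain numbers and only `min`/`max` operators in place of infinitesimal-logic values and the µ and w operators, and the tree encoding and function name are hypothetical.

```python
import math

def evaluate(node, lo=-math.inf, hi=math.inf, visited=None):
    """Depth-first evaluation of a min/max expression tree.

    Only values inside the range of interest [lo, hi] matter to the caller,
    so results are clamped to it and operands that cannot change the
    outcome are skipped (the same idea as alpha-beta pruning).
    A node is either a number (leaf) or a tuple (op, [operands]).
    """
    if visited is None:
        visited = []
    if not isinstance(node, tuple):
        visited.append(node)                 # record which leaves were evaluated
        return max(lo, min(hi, node))        # clamp the leaf to [lo, hi]
    op, operands = node
    if op == "max":
        best = lo
        for child in operands:
            # Later operands only matter if they exceed the best value so far.
            best = max(best, evaluate(child, best, hi, visited))
            if best >= hi:                   # already at the top of the range
                break
        return best
    if op == "min":
        best = hi
        for child in operands:
            # Later operands only matter if they fall below the best value so far.
            best = min(best, evaluate(child, lo, best, visited))
            if best <= lo:                   # already at the bottom of the range
                break
        return best
    raise ValueError(f"unknown operator: {op}")

expr = ("max", [("min", [3, 5]), ("min", [2, 9])])
leaves = []
print(evaluate(expr, visited=leaves), leaves)  # the leaf 9 is never evaluated
```

In this example the second `min` is entered with a lower bound of 3, so as soon as its first operand (2) shows it cannot exceed 3, its remaining operand is skipped.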
885 |
Distributed multi-source regular path queries
Shoaran, Maryam 06 April 2010
Regular path queries are the building block of almost any mechanism for querying semistructured data. Although the main applications of such data are distributed, only a few works deal with distributed evaluation of regular path queries. In this thesis we present a message-efficient and truly distributed algorithm for computing the answer to regular path queries in a multi-source semistructured database setting.
Our algorithm has several desirable properties. First, it is general, as it works for the larger class of weighted regular path queries on weighted semistructured databases. Second, it performs a progressive evaluation; that is, partial answers can be presented to the user as soon as they are computed, while she is waiting for new answers to arrive. Third, the proposed algorithm is symmetric among processes, i.e., they all run the same algorithm. Finally, it does not need a separate termination detection algorithm, as it can detect global termination simply by using a spanning tree.
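For readers unfamiliar with regular path queries, the sketch below shows the standard centralized way to answer one: explore pairs of (graph node, automaton state) breadth-first and report every node reached in an accepting state. It is a single-machine, unweighted illustration only, not the distributed algorithm of the thesis; the graph, the edge labels and the function name are hypothetical, and the query is assumed to be given as a deterministic automaton.

```python
from collections import deque

def regular_path_query(edges, source, start, delta, accepting):
    """Return all nodes reachable from `source` along a path whose label
    sequence is accepted by the automaton.

    edges:     dict mapping node -> list of (label, target) pairs
    delta:     dict mapping (state, label) -> next state (deterministic)
    accepting: set of accepting automaton states
    """
    seen = {(source, start)}
    queue = deque([(source, start)])
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, target in edges.get(node, []):
            nxt = delta.get((state, label))
            if nxt is None:                  # the automaton rejects this label here
                continue
            if (target, nxt) not in seen:
                seen.add((target, nxt))
                queue.append((target, nxt))
    return answers

# Hypothetical data: evaluate the query  knows* worksAt  starting from "a".
edges = {
    "a": [("knows", "b"), ("worksAt", "e")],
    "b": [("knows", "c")],
    "c": [("worksAt", "d")],
}
delta = {(0, "knows"): 0, (0, "worksAt"): 1}
print(regular_path_query(edges, "a", 0, delta, {1}))
```

Tracking visited (node, state) pairs rather than nodes alone is what keeps the search finite on cyclic graphs while still allowing a node to be revisited in a different automaton state.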
886 |
Design of a hyper-environment for tracing object-oriented requirements
Pinheiro, Francisco de Assis Cartaxo January 1997
Change is inevitable and unending in developing large, complex systems. Changes to requirements arise not only from changes in the social context of the system, but also from improved understanding of constraints and tradeoffs as system development proceeds. How to trace software requirements is the problem addressed by this thesis. We present a solution for requirements tracing in the context of object-oriented software development. Our solution consists of a traceability model and a tool to automate the tracing. TOOR, the tool to implement the model, uses a project specification written in FOOPS, a general purpose object-oriented language with specification capabilities, to set up the environment in which a project is carried out. The project specification defines the trace units and traces as objects and relations, respectively. The evolution of objects from requirements sources to requirements to design to code, and generally to any object taking part in the process is dealt with in a uniform way in TOOR: classes are declared for each kind of object we wish to control, and relations are defined between them. TOOR uses regular expressions to provide a selective tracing mode: the actual configuration of objects and relations is considered as a text and regular expressions are used to retrieve parts of the configuration matching the pattern described by them. TOOR enhances the flexibility of regular expressions by extending the pattern matching procedure by providing different ways of specifying how an object or relation is to be matched. Other modes of tracing in TOOR are the interactive tracing through modules and the non-guided tracing through several browsing mechanisms. TOOR modules are used to structure projects by providing hierarchical scopes for objects used in a project development. The tracing mechanisms of TOOR can use the project structure to order searches or to provide boundaries for searching. 
Browsing objects provides additional flexibility in situations where little is known about what has to be traced, and hyper-media features address the need to re-interpret data usually encoded in different formats. The user-definable features of a project specification provide much of the flexibility necessary for effective use of a software tracing tool. Also, the integration of regular expression tracing with other forms of tracing, such as browsing and interactive tracing, makes TOOR an extremely versatile tool. The user can select the most appropriate form of tracing depending on context and can switch from one form to another as convenient.
887 |
Système de Questions/Réponses dans un contexte de Business Intelligence
Kuchmann-Beauger, Nicolas 15 February 2013
The volume and complexity of the data generated by information systems are growing remarkably in data warehouses. The field of business intelligence (BI) aims to provide methods and tools that assist users in their information-seeking tasks. Data sources are generally not centralized, and it is often necessary to interact with several applications. Accessing information is therefore an arduous task, while a company's employees generally seek to reduce their workload. In response, the field of "Enterprise Search" has developed recently; it takes into account the different data sources belonging both to the private corporate network and to the public domain (such as Web pages). Yet users of current search engines still suffer from the excessive volume of information available. We believe that such systems could benefit from natural language processing methods combined with those of question answering systems. Natural language interfaces let users search for information in their own terms and obtain concise answers rather than a list of documents in which the correct answer, if any, must be identified. In this way, users do not need to use a fixed terminology or formulate queries in a very precise syntax, and they can also reach the desired information faster. One challenge in building such a system is to interact with the different applications, and therefore with the languages those applications use, on the one hand, and to be able to adapt easily to new application domains on the other.
This report details a question answering system that is configurable for enterprise use cases, and describes it in full. In traditional BI systems, user preferences are generally not taken into account, nor are users' situations or context. State-of-the-art systems in the field such as Soda or Safe do not generate results computed from an analysis of the users' situation. This report introduces a more personalized approach that better suits end users. Our main experiment takes the form of a search-style interface that displays results in a dashboard as charts, fact tables, or thumbnails of Web pages. Depending on the users' initial queries, query recommendations are also displayed, with the aim of reducing the overall response time of the system. In this sense, these recommendations are comparable to predictions. Our work results in the following contributions. First, an architecture implemented with parallelized algorithms that takes into account the diversity of data sources, namely structured and unstructured data, within a question answering framework that can easily be configured for different environments. Second, a translation approach based on constraint solving, which replaces the traditional pivot language with a conceptual model and leads to better personalized multidimensional queries. Third, a set of linguistic patterns used to translate BI questions into database queries, which can easily be adapted to different configurations.
Finally, we implemented an iPhone/iPad application and an "HTML" interface that demonstrate the feasibility of the approaches developed, through a set of evaluation measures for the main element (the translation component) and an evaluation scenario for the framework as a whole. To this end, we introduce a set of queries that can be used to evaluate other information retrieval systems in the domain, and we show that our system behaves similarly to the reference system WolframAlpha, depending on the evaluation parameters.
888 |
Record Linkage for Web Data
Hassanzadeh, Oktie 15 August 2013
Record linkage refers to the task of finding and linking records (in a single database or in a set of data sources) that refer to the same entity. Automating the record linkage process is a challenging problem, and has been the topic of extensive research for many years. However, the changing nature of the linkage process and the growing size of data sources create new challenges for this task.
This thesis studies the record linkage problem for Web data sources. Our hypothesis is that a generic and extensible set of linkage algorithms combined within an easy-to-use framework that integrates and allows tailoring and combining of these algorithms can be used to effectively link large collections of Web data from different domains.
To this end, we first present a framework for record linkage over relational data, motivated by the fact that many Web data sources are powered by relational database engines. This framework is based on declarative specification of the linkage requirements by the user and allows linking records in many real-world scenarios. We present algorithms for translation of these requirements to queries that can run over a relational data source, potentially using a semantic knowledge base to enhance the accuracy of link discovery.
Effective specification of requirements for linking records across multiple data sources requires understanding the schema of each source, identifying attributes that can be used for linkage, and their corresponding attributes in other sources. Schema or attribute matching is often done with the goal of aligning schemas, so attributes are matched if they play semantically related roles in their schemas. In contrast, we seek to find attributes that can be used to link records between data sources, which we refer to as linkage points. In this thesis, we define the notion of linkage points and present the first linkage point discovery algorithms.
We then address the novel problem of how to publish Web data in a way that facilitates record linkage. We hypothesize that careful use of existing, curated Web sources (their data and structure) can guide the creation of conceptual models for semi-structured Web data that in turn facilitate record linkage with these curated sources. Our solution is an end-to-end framework for data transformation and publication, which includes novel algorithms for identification of entity types and their relationships out of semi-structured Web data. A highlight of this thesis is showcasing the application of the proposed algorithms and frameworks in real applications and publishing the results as high-quality data sources on the Web.
890 |
Risk of Stroke in Older Women Treated for Early Invasive Breast Cancer, Tamoxifen vs. Aromatase Inhibitors: A Population based Retrospective Cohort Study
Wijeratne, Don Thiwanka Dilshan 30 December 2010
Tamoxifen and aromatase inhibitors are treatment options for women with breast cancer and evidence on the risk of stroke is important in choosing between these two options. A systematic review of two randomized controlled trials and their nine related trial reports showed different methods for adverse event reporting and inconsistent estimates of stroke risk. In an observational cohort study of 5443 Ontario women, aged 66 years or older with early stage breast cancer, 86 ischemic stroke events (1.6%) occurred during follow-up of 5 years. There was no statistically significant difference in the risk of stroke between the hormone therapy groups [adjusted HR for tamoxifen compared to AI 1.330 (0.810, 2.179)]. Results were similar across cardiovascular disease risk groups and were robust to different follow up periods and analytic methods. This study suggests that there is no significant difference in stroke between these treatment options.