1. The notion of 'information content of data' for databases. Xu, Kaibo (2009)
This thesis is concerned with a fundamental notion of information in the context of databases. The problem of the information content of a conceptual data schema appears elusive. The conventional definition of information is established upon the entropy-based quantitative theory proposed by Shannon (1948), which is widely used to measure the amount of information created and transmitted through a communication channel. However, such an approach seems to lack the capability to explain phenomena concerning the content aspect of information. Moreover, the question of how the information content of data in a database may be reasoned about does not appear to have been addressed adequately. We therefore believe that the notion of the information content of data should be fully investigated and formally defined. To this end, the notion of the information content of a signal is redefined by modifying the known definition of information content given by Dretske (1981, p. 65). Then what we call the information content inclusion relation (IIR), a partial order between random events, is defined. A set of inference rules is presented for reasoning about the information content of a random event, and we explore how these ideas and rules may be used in a database setting, including the derivation of otherwise hidden information by deriving new IIR from a given set of IIR. Furthermore, it is observed that the problem of whether the instances of a data schema may be recovered from those of another does not seem to have been well investigated, and this, we believe, is fundamental to the relationship between two schemata. In the literature, the works closest to this question are based upon the notion of relevant information capacity, which is concerned with whether one schema may replace another without losing the capacity of the system to store data. It is also observed that the rationale of such an approach is overly intuitive (even though the techniques involved are sophisticated): a convincing answer to this question should rest on whether one or more instances of a schema can tell us truly what an instance of another schema would be. This is a matter of one thing carrying information about another. To capture such a relationship, the notion of information carrying between states of affairs is introduced, through which we look at much more detailed levels of informational relationships than the conventional entropy-based approach, namely random events and particulars of random events. The validity of our ideas is demonstrated by applying them to schema transformations that preserve information-bearing capability. This includes, among others, some aspects of normalization for relational databases and schema transformation with Miller et al.'s (1994) Schema Intension Graph (SIG) model. To verify our ideas on reasoning about the information content of data, a prototype called IIR-Reasoning is presented, which shows how our ideas might be exploited in a real database setting, including how real-world events and database values are aligned.
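The inference rules themselves are not given in the abstract. Purely as a hedged sketch of the flavour of reasoning involved, the Python fragment below treats IIR as a binary relation over named events and derives new, initially hidden pairs by reflexivity and transitivity, two properties any partial order enjoys; the event names and this particular rule set are illustrative assumptions, not the thesis's actual rules.

```python
from itertools import product

# Hypothetical encoding: an IIR pair (a, b) asserts that event a carries
# (includes) the information content of event b.
iir = {("order_placed", "customer_exists"),
       ("customer_exists", "customer_id_valid")}

def close_iir(pairs):
    """Derive new IIR pairs via reflexivity and transitivity until fixpoint.

    These two rules are generic partial-order properties; the thesis's
    actual inference rules may differ.
    """
    closed = set(pairs)
    events = {e for p in pairs for e in p}
    closed |= {(e, e) for e in events}          # reflexivity
    changed = True
    while changed:                              # transitive closure
        changed = False
        for (a, b), (c, d) in product(closed, repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed

derived = close_iir(iir)
# "order_placed" now provably carries the information of "customer_id_valid",
# an IIR pair that was hidden in the original set.
assert ("order_placed", "customer_id_valid") in derived
```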

2. Modelling stress levels based on physiological responses to web contents. Isiaka, Fatima (2017)
Capturing data on user experience of web applications and browsing is important in many ways. For instance, web designers and developers may find such data quite useful in enhancing navigational features of web pages; rehabilitation therapists, mental-health specialists and other biomedical personnel regularly use computer simulations to monitor and control the behaviour of patients. Marketing and law enforcement agencies are probably two of the most common beneficiaries of such data - with the success of online marketing increasingly requiring a good understanding of customers' online behaviour. Law enforcement agents, on the other hand, have long used lie detection methods - typically relying on human physiological functions - to determine the likelihood of falsehood in interrogations. Quite often, online user experience is studied via tangible measures such as task completion time, surveys and comprehensive tests from which data attributes are generated. Prediction of users' stress levels and behaviour in some of these cases depends mostly on task completion time and the number of clicks per given time interval. However, such approaches are generally subjective and rely heavily on distributional assumptions, making the results prone to recording errors. We propose a novel method - PHYCOB I - that addresses the foregoing issues. Primary data were obtained from laboratory experiments during which forty-four volunteers had synchronized physiological readings - skin conductance response, skin temperature and eye-tracker readings - and user activity attributes taken by a specially designed sensing device. PHYCOB I then collects secondary data attributes from these synchronized physiological readings and uses them for two purposes: firstly, naturally arising structures in the data are detected by identifying optimal responses and high-level tonic phases; secondly, users are classified into three different stress levels. The method's novelty derives from its ability to integrate physiological readings and eye movement records to identify hidden correlates, by simply computing the delay for each increase in amplitude in reaction to webpage contents. This addresses the problem of latency faced in most physiological readings. Performance comparisons are made with conventional predictive methods such as Neural Networks and Logistic Regression, while multiple runs of the Forward Search algorithm and Principal Component Analysis are used to cross-validate the performance. Results show that PHYCOB I outperforms the conventional models in terms of both accuracy and reliability - that is, the average recoverable natural structures for the three models with respect to accuracy and reliability are more consistent within the PHYCOB I environment than with the other two. There are two main advantages of the proposed method - its resistance to over-fitting and its ability to automatically assess human stress levels while dealing with specific web contents. The latter is particularly important in that it can be used to predict which contents of webpages cause stress-induced emotions in users during online activities. There are numerous potential extensions of the model including, but not limited to, applications in law enforcement - detecting abnormal online behaviour; online shopping (marketing) - predicting what captures customers' attention; and palliative care in biomedical applications, such as detecting levels of stress in patients during physiotherapy sessions.
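The core computation, the delay between a content event and the next increase in signal amplitude, is not detailed in the abstract. Below is a minimal hedged sketch of one plausible reading: given a stimulus onset time and a sampled skin conductance trace, find the first subsequent rise in amplitude and report the latency. The threshold, window, units and function names are all assumptions, not PHYCOB I's actual parameters.

```python
import numpy as np

def response_latency(signal, fs, onset_s, min_rise=0.05, window_s=8.0):
    """Seconds from stimulus onset to the first amplitude rise >= min_rise.

    signal  : 1-D skin conductance samples (hypothetical microsiemens)
    fs      : sampling rate in Hz
    onset_s : stimulus (webpage content) onset time in seconds
    Returns None if no rise is found within window_s seconds.
    """
    start = int(onset_s * fs)
    stop = min(len(signal), start + int(window_s * fs))
    seg = signal[start:stop]
    rises = np.flatnonzero(np.diff(seg) >= min_rise)  # sample-to-sample increases
    if rises.size == 0:
        return None
    return rises[0] / fs

# Toy trace: flat for 2 s, then a ramp, sampled at 10 Hz.
fs = 10
trace = np.concatenate([np.full(20, 1.0), 1.0 + 0.1 * np.arange(20)])
print(response_latency(trace, fs, onset_s=1.0))  # ~1.0 s after onset
```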

3. A study into best practices and procedures used in the management of database systems. Holt, Victoria (2017)
There has been a vast expansion of data usage in recent years. The requirements placed on database systems to provide a variety of information have resulted in many more types of database engines and approaches (such as cloud computing). A once simple management task has become much more complex. Challenges exist for database managers in making the best choices of practices and procedures to satisfy the requirements of organisations. This research is aimed at understanding how the management of database systems is undertaken, how best practices and procedures form a part of the management process, and the complex nature of database systems. The study examined the adoption of best practices and how the complex interactions between components of the database system affect management and performance. The research followed a mixed-methods approach, using a sequential explanatory design. The quantitative research phase, using an online survey, highlighted the breadth of issues relevant to database management. It concluded that existing practices and procedures were not optimal, and revealed some of the complexities. The qualitative research phase that followed built on the survey findings to seek understanding of key areas through a number of focus groups. As part of this research, an innovative method was developed in which thematic analysis of the resulting data was deepened through the use of systems thinking and diagramming. Taking this holistic approach to database systems enabled a different understanding of best practices and the complexity of database systems. A ‘blueprint’, called a CODEX, was drawn up to support improvement and innovation of database systems. Based on a comprehensive assessment of the individual causal interactions between data components, a data map detailed the complex interactions.

4. Studies on modal logics of time and space. Gatto, Alberto (2016)
This dissertation presents original results in Temporal Logic and Spatial Logic. Part I concerns Branching-Time Logic. Since Prior (1967), two main semantics for Branching-Time Logic have been devised: Peircean and Ockhamist semantics. Zanardo (1998) proposed a general semantics, called Indistinguishability semantics, of which Peircean and Ockhamist semantics are limit cases. We provide a finite axiomatization of the Indistinguishability logic of upward endless bundled trees using a non-standard inference rule, and prove that this logic is strongly complete. In Part II, we study the temporal logic given by the tense operators F for future and P for past together with the derivative operator ⟨d⟩, interpreted on the real numbers. We prove that this logic is neither strongly nor Kripke complete, that it is PSPACE-complete, and that it is finitely axiomatizable. In Part III, we study the spatial logic given by the derivative operator ⟨d⟩ and the graded modalities {◇n | n ∈ ℕ}. We prove that this language, call it L, is as expressive as the first-order language Lt of Flum and Ziegler (1980) when interpreted on T3 topological spaces. Then, we give a general definition of modal operator: essentially, a modal operator will be defined by a formula of Lt with at most one free variable. If a modal operator is defined by a formula predicating only over points, then it is called a point-sort operator. We prove that L, even if enriched with all point-sort operators, and however further enriched with finitely many modal operators predicating also on open sets, cannot express Lt on T2 spaces. Finally, we axiomatize the logic of any class of spaces between all T1 and all T3 spaces and prove that it is PSPACE-complete.
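For orientation, the following is a sketch of the standard topological reading of the derivative operator, together with one common counting reading of the graded modalities; the graded reading in particular is an assumption here, not necessarily the semantics adopted in the dissertation.

```latex
% Standard topological semantics of the derivative operator:
% x satisfies <d>phi iff x is a limit point of the set of phi-points.
\mathfrak{M}, x \models \langle d \rangle \varphi
  \iff \forall U \ni x \text{ open},\ \exists y \in U,\ y \neq x,\
       \mathfrak{M}, y \models \varphi
% One common counting reading of the graded modality (an assumption):
% at least n distinct points satisfy phi.
\mathfrak{M}, x \models \Diamond_n \varphi
  \iff \big|\{\, y : \mathfrak{M}, y \models \varphi \,\}\big| \geq n
```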

5. Database query optimisation based on measures of regret. Alyoubi, Khaled Hamed (2016)
The query optimiser in a database management system (DBMS) is responsible for finding a good order in which to execute the operators in a given query. However, in practice the query optimiser does not usually guarantee to find the best plan. This is often due to the non-availability of precise statistical data or inaccurate assumptions made by the optimiser. In this thesis we propose a robust approach to logical query optimisation that takes into account the unreliability in database statistics during the optimisation process. In particular, we study the ordering problem for selection operators and for join operators, where selectivities are modelled as intervals rather than exact values. As a measure of optimality, we use a concept from decision theory called minmax regret optimisation (MRO). When using interval selectivities, the decision problem for selection operator ordering turns out to be NP-hard. After investigating properties of the problem and identifying special cases which can be solved in polynomial time, we develop a novel heuristic for solving the general selection ordering problem in polynomial time. Experimental evaluation of the heuristic using synthetic data, the Star Schema Benchmark and real-world data sets shows that it outperforms other heuristics (which take an optimistic, pessimistic or midpoint approach) and also produces plans whose regret is on average very close to optimal. The general join ordering problem is known to be NP-hard, even for exact selectivities. So, for interval selectivities, we restrict our investigation to sets of join operators which form a chain and to plans that correspond to left-deep join trees. We investigate properties of the problem and use these, along with ideas from the selection ordering heuristic and other algorithms in the literature, to develop a polynomial-time heuristic tailored for the join ordering problem. Experimental evaluation of the heuristic shows that, once again, it performs better than the optimistic, pessimistic and midpoint heuristics. In addition, the results show that the heuristic produces plans whose regret is on average even closer to the optimal than for selection ordering.
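As a hedged illustration of minmax regret applied to selection ordering (a toy brute force, not the thesis's polynomial-time heuristic): with interval selectivities, an ordering's regret under a scenario is its cost minus the cost of the best ordering for that scenario, and the MRO plan minimises the worst-case regret. The sketch below checks only interval-endpoint scenarios and assumes unit per-tuple costs, both simplifying assumptions.

```python
from itertools import permutations, product

# Interval selectivities [low, high] for three selection operators (toy data).
intervals = [(0.2, 0.8), (0.5, 0.6), (0.1, 0.9)]

def cost(order, sel):
    """Unit-cost model: operator k processes the tuples surviving
    the first k-1 selections (input cardinality normalised to 1)."""
    total, surviving = 0.0, 1.0
    for op in order:
        total += surviving
        surviving *= sel[op]
    return total

def max_regret(order, intervals):
    """Worst-case regret of `order`, scanning only endpoint scenarios
    (a simplification; extreme scenarios are the usual focus in MRO)."""
    worst = 0.0
    for sel in product(*intervals):   # each selectivity at low or high end
        best = min(cost(p, sel) for p in permutations(range(len(intervals))))
        worst = max(worst, cost(order, sel) - best)
    return worst

orders = list(permutations(range(len(intervals))))
mro_plan = min(orders, key=lambda o: max_regret(o, intervals))
print(mro_plan, max_regret(mro_plan, intervals))
```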

6. Easing access to relational databases: investigating access to relational databases in the context of both novice and would-be expert users. Garner, Philip (2016)
Relational databases are commonplace in a wide variety of applications and are used by a broad range of users with varying levels of technical understanding. Extracting information from relational databases can be difficult; novice users should be able to do so without any understanding of the database structure or query languages, while those who wish to become experts can find it difficult to learn the skills required. Many applications designed for the novice user demand some understanding of the underlying database and/or are limited in their ability to translate keywords to appropriate results. Educational applications often fail to provide assistance in key areas such as using joins, learning textual SQL and building queries from scratch. This thesis presents two applications: Context Aware Free Text ANalysis (CAFTAN), which aims to provide accurate keyword query interpretations for novices, and SQL in Steps (SiS), designed for students learning SQL. Both CAFTAN and SiS are subject to detailed evaluations: the former is shown to be capable of interpreting keyword queries in a way similar to humans; the latter was integrated into an undergraduate databases course and showed the potential benefits of introducing graphical aids into a student's learning process. The findings presented in this thesis have the potential to improve keyword search over relational databases in both generic and customised contexts, as well as easing the process of learning SQL for would-be experts.
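CAFTAN's interpretation pipeline is not described in the abstract. Purely as a hedged sketch of the general idea of keyword search over a relational schema, the code below maps each keyword to the table and column whose name it matches most closely and emits a candidate SQL query; the schema, the string-similarity matching rule and the output shape are all assumptions, and real systems must also infer join conditions (omitted here).

```python
import difflib

# Hypothetical schema: table -> columns.
schema = {
    "album":  ["title", "year", "artist_id"],
    "artist": ["id", "name", "country"],
}

def interpret(keywords):
    """Map each keyword to its closest table.column name (crude baseline;
    CAFTAN itself performs context-aware free-text analysis, not shown)."""
    names = [f"{t}.{c}" for t, cols in schema.items() for c in cols]
    matches = []
    for kw in keywords:
        best = difflib.get_close_matches(kw, names, n=1, cutoff=0.4)
        if best:
            matches.append(best[0])
    tables = sorted({m.split(".")[0] for m in matches})
    cols = ", ".join(matches) or "*"
    return f"SELECT {cols} FROM {', '.join(tables)}"  # join condition omitted

print(interpret(["titles", "artist name"]))
# e.g. SELECT album.title, artist.name FROM album, artist
```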

7. Web relation extraction with distant supervision. Augenstein, Isabelle (2016)
Being able to find relevant information about prominent entities quickly is the main reason to use a search engine. However, with large quantities of information on the World Wide Web, real-time search over billions of Web pages can waste resources and the end user's time. One solution is to store the answers to frequently asked general knowledge queries, such as the albums released by a musical artist, in a more accessible format: a knowledge base. Knowledge bases can be created and maintained automatically by using information extraction methods, particularly methods to extract relations between proper names (named entities). A group of approaches that has become popular in recent years is distant supervision, which allows relation extractors to be trained without text-bound annotation by instead taking known relations from a knowledge base and heuristically aligning them with a large textual corpus from an appropriate domain. This thesis focuses on researching distant supervision for the Web domain. A new setting for creating training and testing data for distant supervision from the Web with entity-specific search queries is introduced and the resulting corpus is published. Methods to recognise noisy training examples, as well as methods to combine extractions based on statistics derived from the background knowledge base, are researched. Using co-reference resolution methods to extract relations from sentences which do not contain a direct mention of the subject of the relation is also investigated. One bottleneck for distant supervision for Web data is identified as named entity recognition and classification (NERC), since relation extraction methods rely on it to identify relation arguments. Typically, existing pre-trained tools are used, which fail in diverse genres with non-standard language, such as the Web genre. The thesis explores what can cause NERC methods to fail in diverse genres and quantifies different reasons for NERC failure. Finally, a novel method for NERC for relation extraction is proposed, based on the idea of jointly training the named entity classifier and the relation extractor with imitation learning to reduce the reliance on external NERC tools. This thesis improves the state of the art in distant supervision for knowledge base population, and sheds light on and proposes solutions for issues arising in information extraction for domains not traditionally studied.
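The core alignment heuristic stated abstractly above is easy to sketch: any sentence mentioning both arguments of a known knowledge-base relation is taken as a (noisy) positive training example for that relation. The triples, sentences and exact-string matching rule below are illustrative assumptions.

```python
# Known relations from a toy knowledge base: (subject, relation, object).
kb = [("Pink Floyd", "released_album", "The Wall"),
      ("Radiohead", "released_album", "OK Computer")]

sentences = [
    "Pink Floyd released The Wall in 1979 to wide acclaim.",
    "The Wall was performed live by Pink Floyd in Berlin.",
    "Radiohead toured extensively during the 1990s.",
]

def distant_label(kb, sentences):
    """Heuristic alignment: a sentence containing both entity mentions is a
    (possibly noisy) positive example for the relation. Real systems add
    entity linking and noise filtering on top of this."""
    examples = []
    for subj, rel, obj in kb:
        for sent in sentences:
            if subj in sent and obj in sent:
                examples.append((sent, subj, rel, obj))
    return examples

for ex in distant_label(kb, sentences):
    print(ex)
# Both "The Wall" sentences get labelled released_album, although the second
# reports a live performance, not a release: exactly the kind of noisy
# training example the thesis's filtering methods target.
```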

8. Learning to rank and order answers to definition questions. Pandey, Shailesh (2012)
The task of ordering a set of ranked results returned by an online search engine or an offline information retrieval engine is termed reranking. It is called reranking because the candidate answer snippets are extracted by the information retrieval system using some scoring strategy, for example one based on the occurrence of query words. We therefore assume the results to be already ranked, and the subsequent ranking is termed reranking. Ranking drastically reduces the number of documents that will be processed further. Reranking usually involves deeper linguistic analysis and the use of expert knowledge resources to gain an even better understanding. The first task this thesis explores is the reranking of answers to definition questions. The answers are sentences returned by the Google search engine in response to the definition questions. This step is relevant to definition questions because the questions tend to be short, so the information need of the user is difficult to assess. This means the final result is not a single piece of information but an ordered set of relevant sentences. In this thesis we explore two approaches to reranking that use dependency tree statistics in a probabilistic setting. One is based on calculating edit distance between trees and tree statistics from the corpus; the other uses a tree kernel function and involves using the output from trained classifiers directly. The second task this thesis explores is sentence ordering for definition questions. The reranking part of the definition question answering pipeline is able to identify the sentences that are relevant to a given question. However, the answer to a definition question is a collection of sentences with some coherent ordering between them. In a way, this is not far from the characteristics observed in a good summary. We believe that by moving sentences around to form a more coherent chunk we can better meet the expectations of users by improving their reading experience. We present an approach that finds an ordering for the sentences based on knowledge extracted from observing the order of sentences in Wikipedia articles. Due to the popularity and acceptability of Wikipedia, evidenced by the fact that Wikipedia results are ranked highly by all major commercial search engines, it was chosen as the standard to be learnt from and compared against. We present a framework that uses the order of sentences extracted from Wikipedia articles to construct a single large graph of connected sentences. As a mechanism to select a node in the graph, we define a scoring function based on the relative position of candidate sentences.
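The abstract sketches, but does not define, the graph and scoring function. As a hedged illustration only, the code below counts how often one sentence (abstracted here to a topic label) precedes another across articles, then greedily emits an ordering that follows the majority precedence; the label abstraction and the greedy strategy are assumptions, not the thesis's framework.

```python
from collections import Counter
from itertools import combinations

# Orders of (abstracted) sentence labels observed in toy "articles".
articles = [
    ["definition", "history", "usage", "criticism"],
    ["definition", "usage", "history"],
    ["definition", "history", "criticism"],
]

precedes = Counter()
for art in articles:
    for a, b in combinations(art, 2):   # a appears before b in this article
        precedes[(a, b)] += 1

def order(labels):
    """Greedy ordering: repeatedly pick the label that the evidence says
    should precede the most remaining labels (ties broken alphabetically)."""
    remaining, out = set(labels), []
    while remaining:
        best = max(sorted(remaining),
                   key=lambda x: sum(precedes[(x, y)] - precedes[(y, x)]
                                     for y in remaining if y != x))
        out.append(best)
        remaining.remove(best)
    return out

print(order({"usage", "definition", "history"}))
# ['definition', 'history', 'usage'] on this toy evidence
```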

9. An object-oriented data and query model. Vella, Joseph (2013)
OODBs build effective databases, with their development peaking around 2000. One reason given for their subsequent neglect is the lack of a declarative and a procedural language offered together. Relevant declarative languages include ODMG OQL and Flora-2, a first-order, object-oriented logic programming system based on F-Logic. Few procedural object algebras have been proposed, and ours is one of them. The characteristics of the algebra proposed and implemented with Flora-2 are: it is closed; it is typed; its ranges and outputs are homogeneous sets of objects; operators work on either values or logical identifiers; a result set is asserted; and a query expression's meta details are asserted too. The algebra has ten operators, and most have algebraic properties. A framework was also developed, with its object base loaded with methods that maintain both the object base and our algebra. One function of the framework is to read an EERM diagram and assert the respective classes and referential constraints. The framework then sifts the EERM diagram for non-implementable constructs, converts them into implementable ones (e.g. n-ary relationships), and translates the object base design into ODMG ODL. This translation's correctness and completeness are studied. The framework implements run-time type checking, as Flora-2 lacks this. We develop type checking, through well-known and accepted techniques, for methods that are static, of single arity, or polymorphic (e.g. overloaded and bounded), and for recursive structures (e.g. lists). A procedure that converts a subset of OQL into an algebraic expression is given. Once created, the expression is manipulated to produce an optimised expression through a number of query rewriting methods, e.g. semantic rewriting, join reduction and view materialisation. These techniques are aided by the underlying object base constructs; e.g. the presence of a primary key constraint is used to avoid duplicate elimination on a result set. We show the importance of tight coupling in each translation step from an EERM to an algebraic expression. We also identify invariant constructs, e.g. a primary key through a select operand, which do not change from a query range to a query result set.
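To make the closure property concrete, here is a hedged Python sketch (not the Flora-2 implementation): each operator consumes and returns a homogeneous set of objects, so operator applications compose freely. The class name and the two operators shown are assumptions, not the algebra's actual ten operators.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    oid: int        # logical identifier
    attrs: tuple    # (name, value) pairs; immutable so objects are hashable

def select(objs, pred):
    """Closed operator: a homogeneous set of objects in, one out."""
    return {o for o in objs if pred(o)}

def project(objs, names):
    """Also closed: result elements are still objects."""
    return {Obj(o.oid, tuple((n, v) for n, v in o.attrs if n in names))
            for o in objs}

base = {Obj(1, (("name", "Ada"), ("dept", "CS"))),
        Obj(2, (("name", "Bob"), ("dept", "EE")))}

# Closure means expressions compose: the output of one operator is a
# legitimate input to the next.
result = project(select(base, lambda o: dict(o.attrs)["dept"] == "CS"),
                 {"name"})
print(result)   # {Obj(oid=1, attrs=(('name', 'Ada'),))}
```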

10. A relational data base management system. Hutt, A. T. F. (1976)
No description available.