Global ETD Search

1	Efficient Incremental View Maintenance for Data Warehousing Chen, Songting 20 December 2005 (has links) "Data warehousing and on-line analytical processing (OLAP) are essential elements for decision support applications. Since most OLAP queries are complex and are often executed over huge volumes of data, the solution in practice is to employ materialized views to improve query performance. One important issue for utilizing materialized views is to maintain the view consistency upon source changes. However, most prior work focused on simple SQL views with distributive aggregate functions, such as SUM and COUNT. This dissertation proposes to consider broader types of views than previous work. First, we study views with complex aggregate functions such as variance and regression. Such statistical functions are of great importance in practice. We propose a workarea function model and design a generic framework to tackle incremental view maintenance and answering queries using views for such functions. We have implemented this approach in a prototype system of IBM DB2. An extensive performance study shows significant performance gains by our techniques. Second, we consider materialized views with PIVOT and UNPIVOT operators. Such operators are widely used for OLAP applications and for querying sparse datasets. We demonstrate that the efficient maintenance of views with PIVOT and UNPIVOT operators requires more generalized operators, called GPIVOT and GUNPIVOT. We formally define and prove the query rewriting rules and propagation rules for such operators. We also design a novel view maintenance framework for applying these rules to obtain an efficient maintenance plan. Extensive performance evaluations reveal the effectiveness of our techniques. Third, materialized views are often integrated from multiple data sources. Due to source autonomicity and dynamicity, concurrency may occur during view maintenance. We propose a generic concurrency control framework to solve such maintenance anomalies. This solution extends previous work in that it solves the anomalies under both source data and schema changes and thus achieves full source autonomicity. We have implemented this technique in a data warehouse prototype developed at WPI. The extensive performance study shows that our techniques put little extra overhead on existing concurrent data update processing techniques while allowing for this new functionality." View Matching View Maintenance Materialized View Data Warehouse Information Integration Data warehousing Database searching
2	Cross-view learning Zhang, Li January 2018 (has links) Key to achieving more efficient machine intelligence is the capability to analysing and understanding data across different views - which can be camera views or modality views (such as visual and textual). One generic learning paradigm for automated understanding data from different views called cross-view learning which includes cross-view matching, cross-view fusion and cross-view generation. Specifically, this thesis investigates two of them, cross-view matching and cross-view generation, by developing new methods for addressing the following specific computer vision problems. The first problem is cross-view matching for person re-identification which a person is captured by multiple non-overlapping camera views, the objective is to match him/her across views among a large number of imposters. Typically a person's appearance is represented using features of thousands of dimensions, whilst only hundreds of training samples are available due to the difficulties in collecting matched training samples. With the number of training samples much smaller than the feature dimension, the existing methods thus face the classic small sample size (SSS) problem and have to resort to dimensionality reduction techniques and/or matrix regularisation, which lead to loss of discriminative power for cross-view matching. To that end, this thesis proposes to overcome the SSS problem in subspace learning by matching cross-view data in a discriminative null space of the training data. The second problem is cross-view matching for zero-shot learning where data are drawn from different modalities each for a different view (e.g. visual or textual), versus single-modal data considered in the first problem. This is inherently more challenging as the gap between different views becomes larger. Specifically, the zero-shot learning problem can be solved if the visual representation/view of the data (object) and its textual view are matched. Moreover, it requires learning a joint embedding space where different view data can be projected to for nearest neighbour search. This thesis argues that the key to make zero-shot learning models succeed is to choose the right embedding space. Different from most existing zero-shot learning models utilising a textual or an intermediate space as the embedding space for achieving crossview matching, the proposed method uniquely explores the visual space as the embedding space. This thesis finds that in the visual space, the subsequent nearest neighbour search would suffer much less from the hubness problem and thus become more effective. Moreover, a natural mechanism for multiple textual modalities optimised jointly in an end-to-end manner in this model demonstrates significant advantages over existing methods. The last problem is cross-view generation for image captioning which aims to automatically generate textual sentences from visual images. Most existing image captioning studies are limited to investigate variants of deep learning-based image encoders, improving the inputs for the subsequent deep sentence decoders. Existing methods have two limitations: (i) They are trained to maximise the likelihood of each ground-truth word given the previous ground-truth words and the image, termed Teacher-Forcing. This strategy may cause a mismatch between training and testing since at test-time the model uses the previously generated words from the model distribution to predict the next word. This exposure bias can result in error accumulation in sentence generation during test time, since the model has never been exposed to its own predictions. (ii) The training supervision metric, such as the widely used cross entropy loss, is different from the evaluation metrics at test time. In other words, the model is not directly optimised towards the task expectation. This learned model is therefore suboptimal. One main underlying reason responsible is that the evaluation metrics are non-differentiable and therefore much harder to be optimised against. This thesis overcomes the problems as above by exploring the reinforcement learning idea. Specifically, a novel actor-critic based learning approach is formulated to directly maximise the reward - the actual Natural Language Processing quality metrics of interest. As compared to existing reinforcement learning based captioning models, the new method has the unique advantage of a per-token advantage and value computation is enabled leading to better model training.

Search results

Efficient Incremental View Maintenance for Data Warehousing

Cross-view learning