41

Réordonnancement de candidats réponses pour un système de questions-réponses / Re-ranking of candidate answers in a question-answering system.

Bernard, Guillaume 06 June 2011 (has links)
The objective of this thesis was to propose a robust approach to the problem of finding the precise answer to a question. Our first contribution is the design and implementation of a robust model for representing information. Its aim is to enrich the sentences of documents, and the questions themselves, with structural information composed of typed groups of words (typed segments) and relations between these groups. This model was evaluated on several corpora (written, spoken, web) and achieved good results, demonstrating its robustness. Our second contribution is a method for re-ranking the candidate answers returned by a question-answering system. This method was also designed for robustness and builds on our first contribution. The idea is to compare a question with the passage from which a candidate answer was extracted, and to compute a similarity score, relying in particular on a modified edit distance that we propose. The re-ranker was evaluated on data from several evaluation campaigns, with particularly positive results on long, complex questions. These results demonstrate the interest of our method: the approach is well suited to long questions, whatever the type of data. The re-ranker was officially evaluated in the 2010 edition of the Quaero evaluation campaign, with positive results.
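To make the re-ranking idea concrete, here is a minimal sketch in Python that scores each candidate by an edit-distance-style similarity between the question and the passage the candidate was extracted from. The tokenization and the use of difflib's SequenceMatcher (whose ratio is closely related to an edit distance over token lists) are illustrative assumptions; the thesis relies on its own modified edit distance over typed segments, not this exact function.

from difflib import SequenceMatcher

def similarity(question, passage):
    # Compare token sequences; ratio() is derived from the number of
    # matching tokens, much like a normalized edit distance.
    q = question.lower().split()
    p = passage.lower().split()
    return SequenceMatcher(None, q, p).ratio()

def rerank(question, candidates):
    # candidates: list of (answer, source_passage) pairs from the QA system.
    return sorted(candidates,
                  key=lambda c: similarity(question, c[1]),
                  reverse=True)

candidates = [
    ("James Cameron", "Avatar was directed by James Cameron in 2009."),
    ("2009", "The film grossed over two billion dollars worldwide."),
]
print(rerank("Who directed Avatar?", candidates)[0][0])  # James Cameron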
42

Validation de réponses dans un système de questions réponses / Answer validation in a question-answering system

Grappy, Arnaud 08 November 2011 (has links)
With the growth of the knowledge available on the Internet has come the difficulty of actually finding a given piece of information. Search engines take keywords and return Web pages that are supposed to contain the desired information, but the user still has to find the right query and examine the returned documents. Question-answering systems aim to return a concise answer directly, given a question asked in natural language. The answer is generally accompanied by a text passage that is supposed to justify it. For example, for the question "Who directed Avatar?", the answer "James Cameron" can be returned along with the passage "James Cameron directed Avatar.". This thesis focuses on answer validation, which automatically determines whether an answer is valid, i.e., correct (it actually answers the question) and justified by the text passage. Validation improves question-answering systems by returning only valid answers to the user. Approaches to recognizing valid answers fall into two broad categories: approaches that use a specific representation formalism for the question and the passage and compare the resulting structures; and machine-learning approaches that combine various lexical and syntactic features. To identify the phenomena underlying answer validation, we took part in building a manually annotated corpus. These phenomena are of different kinds, such as paraphrase and coreference; the relevant information may also be spread over several sentences, or even be missing from the passages containing the answer. A second corpus study examined which pieces of information must be checked to establish that an answer is valid, and showed that the three most frequent are the answer type and the date and place contained in the question. These studies informed the design of our answer-validation system, which relies on a combination of criteria. Some criteria concern the presence of the question words in the passage, indicating whether the information in the question is covered; dates receive special treatment, an answer being marked invalid if the passage does not contain the date given in the question. Other criteria, including the proximity in the passage between the question words and the answer, concern the links between the question words within the passage. The second broad type of verification measures the compatibility between the answer and the question. Many questions expect an answer of a particular type: the question above expects a director, and "Which president succeeded Jacques Chirac?" expects an instance of president; if the answer is not of the expected type, it is incorrect. Since this type information may not appear in the justifying passage, it is searched for in other documents, using the structure of Wikipedia pages, syntactic patterns, named entity recognizers, and the co-occurrence frequencies of the type and the answer in documents. Type checking is particularly effective, achieving 80% correct detections.
Answer validation itself also proved its worth: in the AVE 2008 evaluation campaign, the system ranked among the best across all languages. The final contribution was to integrate the validation module into a question-answering system, QAVAL. In this setting, QAVAL extracts many candidate answers and orders them with the answer-validation module, which is no longer used to detect valid answers but to assign a confidence score to each answer. QAVAL can search both newspaper articles and articles from the Web. The results are quite good, exceeding those obtained by a simple ranking of the answers by nearly 50%.
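As an illustration of how such criteria can be combined, here is a small sketch. The specific features, the year-matching regular expression, and the thresholds in is_valid are invented for illustration; the actual system combines its criteria with machine learning rather than hand-set rules.

import re

def validation_features(question, answer, passage):
    # Question-word coverage in the passage (naive whitespace tokenization).
    stop = {"who", "what", "which", "the", "a", "is", "was", "did", "of"}
    q_words = set(question.lower().rstrip("?").split()) - stop
    p_words = passage.lower().split()
    coverage = len(q_words & set(p_words)) / max(len(q_words), 1)

    # Proximity: smallest token distance between the answer and a question word.
    a_pos = [i for i, w in enumerate(p_words) if w in answer.lower().split()]
    q_pos = [i for i, w in enumerate(p_words) if w in q_words]
    proximity = min((abs(a - q) for a in a_pos for q in q_pos),
                    default=len(p_words))

    # Hard date check: every year in the question must appear in the passage.
    years = lambda s: set(re.findall(r"\b(?:1[89]|20)\d\d\b", s))
    date_ok = years(question) <= years(passage)
    return coverage, proximity, date_ok

def is_valid(question, answer, passage):
    coverage, proximity, date_ok = validation_features(question, answer, passage)
    return date_ok and coverage >= 0.5 and proximity <= 10  # toy thresholds

print(is_valid("Who directed Avatar?", "James Cameron",
               "James Cameron directed Avatar."))  # True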
43

Vision and language understanding with localized evidence

Xu, Huijuan 16 February 2019 (has links)
Enabling machines to solve computer vision tasks with natural language components can greatly improve human interaction with computers. In this thesis, we address vision and language tasks with deep learning methods that explicitly localize relevant visual evidence. Spatial evidence localization in images enhances the interpretability of the model, while temporal localization in video is necessary to remove irrelevant content. We apply our methods to various vision and language tasks, including visual question answering, temporal activity detection, dense video captioning and cross-modal retrieval. First, we tackle the problem of image question answering, which requires the model to predict answers to questions posed about images. We design a memory network with a question-guided spatial attention mechanism which assigns higher weights to regions that are more relevant to the question. The visual evidence used to derive the answer can be shown by visualizing the attention weights in images. We then address the problem of localizing temporal evidence in videos. For most language/vision tasks, only part of the video is relevant to the linguistic component, so we need to detect these relevant events in videos. We propose an end-to-end model for temporal activity detection, which can detect arbitrary-length activities by coordinate regression with respect to anchors and contains a proposal stage to filter out background segments, saving computation time. We further extend activity category detection to event captioning, which can express richer semantic meaning compared to a class label. This leads to the problem of dense video captioning, which involves two sub-problems: localizing distinct events in long video and generating captions for the localized events. We propose an end-to-end hierarchical captioning model with vision and language context modeling in which the captioning training affects the activity localization. Lastly, the task of text-to-clip video retrieval requires one to localize the specified query instead of detecting and captioning all events. We propose a model based on the early fusion of words and visual features, outperforming standard approaches which embed the whole sentence before performing late feature fusion. Furthermore, we use queries to regulate the proposal network to generate query-related proposals. In conclusion, our proposed visual localization mechanism applies across a variety of vision and language tasks and achieves state-of-the-art results. Together with the inference module, our work can contribute to solving other tasks such as video question answering in future research.
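The core of a question-guided spatial attention mechanism like the one described can be sketched in a few lines of Python. The single-hop dot-product scoring and the feature shapes below are simplifying assumptions; the thesis's memory network may score and iterate differently.

import numpy as np

def question_guided_attention(region_feats, question_vec):
    # region_feats: (num_regions, d) visual features for image regions.
    # question_vec: (d,) encoding of the question.
    scores = region_feats @ question_vec          # relevance of each region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over regions
    attended = weights @ region_feats             # weighted visual evidence
    return attended, weights                      # weights can be visualized

rng = np.random.default_rng(0)
regions = rng.normal(size=(49, 512))   # e.g. a 7x7 grid of region features
question = rng.normal(size=512)
evidence, attn = question_guided_attention(regions, question)
print(attn.argmax())  # index of the region the question attends to most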
44

A computational framework for mixed-initiative dialog modeling.

January 2002 (has links)
Chan, Shuk Fong.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 114-122). Abstracts in English and Chinese.

Contents:
Chapter 1 --- Introduction --- p.1
  1.1 --- Overview --- p.1
  1.2 --- Thesis Contributions --- p.5
  1.3 --- Thesis Outline --- p.9
Chapter 2 --- Background --- p.10
  2.1 --- Mixed-Initiative Interactions --- p.11
  2.2 --- Mixed-Initiative Spoken Dialog Systems --- p.14
    2.2.1 --- Finite-state Networks --- p.16
    2.2.2 --- Form-based Approaches --- p.17
    2.2.3 --- Sequential Decision Approaches --- p.18
    2.2.4 --- Machine Learning Approaches --- p.20
  2.3 --- Understanding Mixed-Initiative Dialogs --- p.24
  2.4 --- Cooperative Response Generation --- p.26
    2.4.1 --- Plan-based Approach --- p.27
    2.4.2 --- Constraint-based Approach --- p.28
  2.5 --- Chapter Summary --- p.29
Chapter 3 --- Mixed-Initiative Dialog Management in the ISIS System --- p.30
  3.1 --- The ISIS Domain --- p.31
    3.1.1 --- System Overview --- p.31
    3.1.2 --- Domain-Specific Constraints --- p.33
  3.2 --- Discourse and Dialog --- p.34
    3.2.1 --- Discourse Inheritance --- p.37
    3.2.2 --- Mixed-Initiative Dialogs --- p.41
  3.3 --- Challenges and New Directions --- p.45
    3.3.1 --- A Learning System --- p.46
    3.3.2 --- Combining Interaction and Delegation Subdialogs --- p.49
  3.4 --- Chapter Summary --- p.57
Chapter 4 --- Understanding Mixed-Initiative Human-Human Dialogs --- p.59
  4.1 --- The CU Restaurants Domain --- p.60
  4.2 --- Task Goals, Dialog Acts, Categories and Annotation --- p.61
    4.2.1 --- Task Goals and Dialog Acts --- p.61
    4.2.2 --- Semantic and Syntactic Categories --- p.64
    4.2.3 --- Annotating the Training Sentences --- p.65
  4.3 --- Selective Inheritance Strategy --- p.67
    4.3.1 --- Category Inheritance Rules --- p.67
    4.3.2 --- Category Refresh Rules --- p.73
  4.4 --- Task Goal and Dialog Act Identification --- p.78
    4.4.1 --- Belief Networks Development --- p.78
    4.4.2 --- Varying the Input Dimensionality --- p.80
    4.4.3 --- Evaluation --- p.80
  4.5 --- Procedure for Discourse Inheritance --- p.83
  4.6 --- Chapter Summary --- p.86
Chapter 5 --- Cooperative Response Generation in Mixed-Initiative Dialog Modeling --- p.88
  5.1 --- System Overview --- p.89
    5.1.1 --- State Space Generation --- p.89
    5.1.2 --- Task Goal and Dialog Act Generation for System Response --- p.92
    5.1.3 --- Response Frame Generation --- p.93
    5.1.4 --- Text Generation --- p.100
  5.2 --- Experiments and Results --- p.100
    5.2.1 --- Subjective Results --- p.103
    5.2.2 --- Objective Results --- p.105
  5.3 --- Chapter Summary --- p.105
Chapter 6 --- Conclusions --- p.108
  6.1 --- Summary --- p.108
  6.2 --- Contributions --- p.110
  6.3 --- Future Work --- p.111
Bibliography --- p.113
Appendix A --- Domain-Specific Task Goals in CU Restaurants Domain --- p.123
Appendix B --- Full list of VERBMOBIL-2 Dialog Acts --- p.124
Appendix C --- Dialog Acts for Customer Requests and Waiter Responses in CU Restaurants Domain --- p.125
Appendix D --- The Two Grammars for Task Goal and Dialog Act Identification --- p.130
Appendix E --- Category Inheritance Rules --- p.143
Appendix F --- Category Refresh Rules --- p.149
Appendix G --- Full list of Response Trigger Words --- p.154
Appendix H --- Evaluation Test Questionnaire for Dialog System in CU Restaurants Domain --- p.159
Appendix I --- Details of the Statistical Testing Regarding Grice's Maxims and User Satisfaction --- p.161
45

Template-Based Question Answering over Linked Data using Recursive Neural Networks

January 2018 (has links)
The Semantic Web contains large amounts of related information in the form of knowledge graphs such as DBpedia. These knowledge graphs are typically enormous and are not easily accessible to users, as they require specialized knowledge of query languages (such as SPARQL) as well as deep familiarity with the ontologies used by these knowledge graphs. To make these knowledge graphs more accessible, even to non-experts, several question answering (QA) systems have been developed over the last decade. Due to the complexity of the task, these systems draw on techniques from natural language processing (NLP), information retrieval (IR), machine learning (ML) and the Semantic Web (SW). At a high level, most question answering systems approach the task as a conversion from the natural language question to its corresponding SPARQL query, and then use the query to retrieve the desired entities or literals. One approach to this problem, used by most systems today, is to apply deep syntactic and semantic analysis to the input question to derive the SPARQL query. This has resulted in natural language processing pipelines with common components such as answer type detection, segmentation, phrase matching, part-of-speech tagging, named entity recognition, named entity disambiguation, syntactic or dependency parsing, and semantic role labeling. This has led to NLP pipeline architectures that integrate components, each solving a specific aspect of the problem and passing its results on to subsequent components for further processing, e.g., DBpedia Spotlight for named entity recognition and RelMatch for relational mapping. A major drawback of this approach is error propagation, a common problem in NLP: mistakes early in the pipeline can adversely affect successive steps further down. Another approach is to use query templates, either manually generated or extracted from existing benchmark datasets such as Question Answering over Linked Data (QALD), to generate the SPARQL queries; a template is essentially a predefined query with slots that need to be filled. This approach shifts the question answering problem toward a classification task in which the system must match the input question to the appropriate template (class label). This thesis proposes a neural network approach that automatically learns to classify natural language questions into their corresponding templates using recursive neural networks. An obvious advantage of using neural networks is the elimination of the need for laborious feature engineering, which can be cumbersome and error-prone. The input question is encoded into a vector representation, and the model is trained and evaluated on the LC-QuAD dataset (Large-scale Complex Question Answering Dataset), which was created explicitly for machine learning based QA approaches to learning complex SPARQL queries. The dataset consists of 5000 questions along with their corresponding SPARQL queries over the DBpedia dataset, spanning 5042 entities and 615 predicates. These queries were annotated with 38 unique templates that the model attempts to classify. The resulting model is evaluated against both the LC-QuAD dataset and the Question Answering over Linked Data (QALD-7) dataset.
The recursive neural network achieves a template classification accuracy of 0.828 on the LC-QuAD dataset and an accuracy of 0.618 on the QALD-7 dataset. When the top-2 most likely templates are considered, the model achieves an accuracy of 0.945 on the LC-QuAD dataset and 0.786 on the QALD-7 dataset. After slot filling, the overall system achieves a macro F-score of 0.419 on the LC-QuAD dataset and a macro F-score of 0.417 on the QALD-7 dataset. / Dissertation/Thesis / Masters Thesis Software Engineering 2018
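The template-plus-slot-filling pipeline the abstract describes can be illustrated without the neural model itself. The two toy templates, the keyword stand-in for the classifier, and the DBpedia URIs below are assumptions for illustration; the real system chooses among 38 templates with a recursive neural network.

# Hypothetical templates in the style of LC-QuAD; the real set has 38.
TEMPLATES = {
    0: "SELECT ?x WHERE {{ <{e}> <{p}> ?x }}",
    1: "SELECT ?x WHERE {{ ?x <{p}> <{e}> }}",
}

def classify(question):
    # Stand-in for the recursive neural network classifier.
    return 0 if question.lower().startswith("who") else 1

def build_query(question, entity, predicate):
    # Slot filling: instantiate the chosen template with linked resources.
    return TEMPLATES[classify(question)].format(e=entity, p=predicate)

print(build_query("Who directed Avatar?",
                  "http://dbpedia.org/resource/Avatar_(2009_film)",
                  "http://dbpedia.org/ontology/director"))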
46

Efficient computation of advanced skyline queries.

Yuan, Yidong, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
Skyline has been proposed as an important operator for many applications, such as multi-criteria decision making, data mining and visualization, and user-preference queries. Due to its importance, skyline computation has recently received considerable attention from the database research community. All the existing techniques, however, focus on conventional databases and are not applicable to online computation environments, such as data streams. In addition, the existing studies consider only the efficiency of skyline computation, while the fundamental problem of the semantics of skylines remains open. In this thesis, we study three problems of skyline computation: (1) online skyline computation over data streams; (2) skyline cube computation and its analysis; and (3) the top-k most representative skyline. To tackle the problem of online skyline computation, we develop a novel framework which converts the more expensive multi-dimensional skyline computation into stabbing queries in 1-dimensional space. Based on this framework, a rigorous theoretical analysis of the time complexity of online skyline computation is provided, and efficient algorithms are proposed to support ad hoc and continuous skyline queries over data streams. Inspired by the idea of the data cube, we propose a novel concept of the skyline cube, which consists of the skylines of all possible non-empty subsets of a given full space. We identify the unique sharing strategies for skyline cube computation and develop two efficient algorithms which compute the skyline cube in a bottom-up and a top-down manner, respectively. Finally, a theoretical framework to answer the question about the semantics of the skyline, together with an analysis of multidimensional subspace skylines, is presented. Motivated by the fact that the full skyline may be less informative because it generally consists of a large number of skyline points, we propose a novel skyline operator -- the top-k most representative skyline. This operator selects the k skyline points such that the number of data points dominated by at least one of these k skyline points is maximized. To compute the top-k most representative skyline, two efficient algorithms and their theoretical analysis are presented.
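To make the last operator concrete: the top-k most representative skyline is the set of k skyline points that together dominate the most data points. The brute-force Python sketch below merely restates that objective (assuming smaller is better in every dimension); the thesis develops efficient algorithms rather than this exhaustive search.

from itertools import combinations

def dominates(p, q):
    # p dominates q if p is no worse in every dimension and strictly
    # better in at least one (smaller is better here).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

def top_k_representative(points, k):
    sky = skyline(points)
    best, best_cov = None, -1
    for subset in combinations(sky, min(k, len(sky))):
        # Count points dominated by at least one selected skyline point.
        cov = sum(any(dominates(s, p) for s in subset) for p in points)
        if cov > best_cov:
            best, best_cov = subset, cov
    return list(best)

pts = [(1, 9), (2, 4), (5, 3), (9, 1), (6, 6), (7, 7), (8, 5)]
print(skyline(pts))                  # [(1, 9), (2, 4), (5, 3), (9, 1)]
print(top_k_representative(pts, 2))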
48

Question Classification in Question Answering Systems

Sundblad, Håkan January 2007 (has links)
Question answering systems can be seen as the next step in information retrieval, allowing users to pose questions in natural language and receive succinct answers. In order for a question answering system as a whole to be successful, research has shown that the correct classification of questions with regard to the expected answer type is imperative. Question classification has two components: a taxonomy of answer types, and a machinery for making the classifications.

This thesis focuses on five different machine learning algorithms for the question classification task: k nearest neighbours, naïve Bayes, decision tree learning, sparse network of winnows, and support vector machines. These algorithms have been applied to two different corpora, one of which has been used extensively in previous work and was constructed for a specific agenda, while the other is drawn from a set of users' questions posed to a running online system. The results showed that the performance of the algorithms on the different corpora differs both in absolute terms and in the relative ranking of the algorithms. On the novel corpus, naïve Bayes, decision tree learning, and support vector machines perform on par with each other, while on the biased corpus there is a clear difference between them, with support vector machines being the best and naïve Bayes the worst.

The thesis also presents an analysis of questions that are problematic for all learning algorithms. The errors can roughly be attributed to categories with few members, variations in question formulation, the actual usage of the taxonomy, keyword errors, and spelling errors. A large portion of the errors were also hard to explain. / Report code: LiU-Tek-Lic-2007:29.
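A minimal version of the comparison the thesis performs, training several of the named classifiers on labelled questions and inspecting their predictions, can be written with scikit-learn. The toy training data and the TF-IDF pipeline are assumptions for illustration, not the thesis's corpora or feature set.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy training set: (question, expected answer type).
data = [
    ("Who directed Avatar?", "PERSON"),
    ("Who wrote Hamlet?", "PERSON"),
    ("When was the film released?", "DATE"),
    ("When did the war end?", "DATE"),
    ("Where is the Louvre?", "LOCATION"),
    ("Where was Mozart born?", "LOCATION"),
]
X, y = zip(*data)

for clf in (MultinomialNB(), DecisionTreeClassifier(), LinearSVC(),
            KNeighborsClassifier(n_neighbors=3)):
    model = make_pipeline(TfidfVectorizer(), clf).fit(X, y)
    print(type(clf).__name__, model.predict(["Who painted the Mona Lisa?"]))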
49

Ontology Learning and Question Answering (QA) Systems

Baskurt, Meltem 01 May 2010 (has links) (PDF)
Ontology learning requires deep specialization in the Semantic Web, knowledge representation, search engines, inductive learning, natural language processing, and information storage, extraction and retrieval. Huge amounts of domain-specific, unstructured online data need to be expressed in a machine-understandable and semantically searchable format. Currently, users are often forced to search manually through the results returned by keyword-based search services, and they also want to use their native languages to express what they are searching for. In this thesis we developed an ontology-based question answering system that addresses these needs, building on the research areas stated above. The system allows users to enter a question about a restricted domain in natural language and returns the exact answer to the question. A set of questions was collected from users in the domain and, in addition, corresponding question templates were generated on the basis of the domain ontology. When the user asks a question and hits the search button, the system chooses the suitable question template and builds a SPARQL query according to this template. The system is also capable of answering questions that require inference, using generic inference rules defined in a rule file. Our evaluation with ten users shows that the system is extremely simple to use without any training, with very good query performance.
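The described flow, choosing a question template that matches the user's question, instantiating a SPARQL query from it, and running the query against the domain ontology, can be sketched with rdflib. The tiny ontology, the single template, and the keyword matching rule are all illustrative assumptions; the real system selects among many templates derived from its domain ontology.

from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Avatar, EX.director, Literal("James Cameron")))  # toy domain ontology

def answer(question):
    # One question template: "who directed X?" maps to a SPARQL query
    # with the film as a slot.
    if question.lower().startswith("who directed"):
        film = question.rstrip("?").split()[-1]
        q = f"SELECT ?a WHERE {{ <{EX[film]}> <{EX.director}> ?a }}"
        return [str(row.a) for row in g.query(q)]
    return []

print(answer("Who directed Avatar?"))  # ['James Cameron']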
