161

Teleoperadoras ativas: estresse e expressividade oral / Active telemarketing operators: stress and oral expressiveness

Pimentel, Aline Tavares 09 October 2007
Background: the professional voice, oral expressiveness, and the way interpersonal communication unfolds can define working relationships. Relating oral expressiveness to stress, in order to understand how these aspects interact, is a recent concern in speech-language pathology. Telemarketing and its working conditions have been explored in research on the professional voice. Objective: to investigate the relation between stress symptoms and the characteristics of oral expressiveness in active telemarketing operators at a call center in Salvador. Method: 52 operators from the outbound sector were selected between March and June of 2007. Two protocols were applied to this group: the first investigated general health aspects, and the second addressed occupational questions. The Job Stress Scale questionnaire was also applied to assess stress in the work environment. The recordings of the two operators presenting the highest and lowest job strain were selected for analysis of oral expressiveness by a speech-language pathologist specialized in voice. Results: high job strain was found in 50% of the operators, with no statistically significant association between the operators' perception and stress. The operator with high strain reported a large number of health problems; her oral expressiveness was nevertheless positive, as she presented intonational variation, adequate pauses and emphases, and contextualized inflection. The operator with low strain reported fewer health complaints, and her oral expressiveness can be considered more negative, with fewer emphases and shorter phrases. Conclusion: stress affected the oral expressiveness of the high-strain operator positively and that of the low-strain operator negatively. Work groups and discussions about stress within the company are suggested, in order to raise awareness and to support the development of stress management.
162

Belief detection and temporal analysis of experts in question answering communities: case study on Stack Overflow / Détection et analyse temporelle des experts dans les réseaux communautaires de questions réponses : étude de cas Stack Overflow

Attiaoui, Dorra 01 December 2017
During the last decade, people have changed the way they seek information online. Between question answering communities, specialized websites, and social networks, the Web has become one of the most widespread platforms for information exchange and retrieval. Question answering communities provide an easy and quick way to search for information on any topic: a user only has to post a question and wait for other members of the community to respond, and anyone posting a question expects accurate and helpful answers. Within these platforms, we want to find experts: key users who share their knowledge with the other members of the community. Expert detection in question answering communities has become important for several reasons, such as providing high-quality content and obtaining valuable answers. In this thesis, we propose a general measure of expertise based on the theory of belief functions. Also called the mathematical theory of evidence, this theory is one of the best-known approaches for reasoning under uncertainty. In order to identify experts among the other users of the community, we first focus on finding the most important features that describe each individual. We then develop a model founded on the theory of belief functions to estimate the general expertise of contributors. This measure allows us to classify users and detect the most knowledgeable among them. Once this metric is defined, we turn to the temporal evolution of users' behavior, analyzing user activity over several months and describing how users evolve during their time on the platform. We are also interested in detecting potential experts during the first months after they join a site. The effectiveness of these approaches is evaluated on real data from Stack Overflow.
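The abstract does not spell out the expertise measure itself, but the formalism it builds on is standard. Below is a minimal sketch of belief functions in Python, assuming a hypothetical two-element frame {expert, novice} and made-up mass values for two evidence sources; Dempster's rule combines them. This illustrates the theory only, not the thesis's actual model.

```python
from itertools import product

# Frame of discernment for one user: expert or novice?
# (Hypothetical frame; the thesis's actual model is richer.)
THETA = frozenset({"expert", "novice"})

def belief(mass, hypothesis):
    """bel(A) = sum of masses of all focal sets B that are subsets of A."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)

def plausibility(mass, hypothesis):
    """pl(A) = sum of masses of all focal sets B that intersect A."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)

def combine(m1, m2):
    """Dempster's rule: conjunctive combination with conflict renormalization."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {focal: m / (1.0 - conflict) for focal, m in combined.items()}

# Two illustrative evidence sources about one user, e.g. derived from
# answer-acceptance rate and activity volume (made-up numbers).
m_accept = {frozenset({"expert"}): 0.6, THETA: 0.4}
m_volume = {frozenset({"expert"}): 0.3, frozenset({"novice"}): 0.2, THETA: 0.5}

m = combine(m_accept, m_volume)
A = frozenset({"expert"})
print(f"bel(expert) = {belief(m, A):.3f}, pl(expert) = {plausibility(m, A):.3f}")
```

Running this prints bel(expert) = 0.682 and pl(expert) = 0.909: the interval between belief and plausibility is what lets the framework represent the uncertainty about a user's expertise rather than forcing a single point estimate.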
163

Addressing the brittleness of knowledge-based question-answering

Chaw, Shaw Yi 02 April 2012
Knowledge base systems are brittle when the users of the knowledge base are unfamiliar with its content and structure. Querying a knowledge base requires users to state their questions in precise and complete formal representations that relate the facts in the question with relevant terms and relations in the underlying knowledge base. This requirement places a heavy burden on the users to become deeply familiar with the contents of the knowledge base and prevents novice users from effectively using the knowledge base for problem solving. As a result, the utility of knowledge base systems is often restricted to the developers themselves. The goal of this work is to help users, who may possess little domain expertise, to use unfamiliar knowledge bases for problem solving. Our thesis is that the difficulty in using unfamiliar knowledge bases can be addressed by an approach that funnels natural questions, expressed in English, into formal representations appropriate for automated reasoning. The approach uses a simplified English controlled language, a domain-neutral ontology, a set of mechanisms to handle a handful of well-known question types, and a software component, called the Question Mediator, to identify relevant information in the knowledge base for problem solving. With our approach, knowledge base users can work with a variety of unfamiliar knowledge bases by posing their questions in simplified English to retrieve the information relevant to solving their problems. We studied the thesis in the context of a system called ASKME. We evaluated ASKME on the task of answering exam questions for college-level biology, chemistry, and physics. The evaluation consists of successive experiments to test if ASKME can help novice users employ unfamiliar knowledge bases for problem solving. The initial experiment measures ASKME's level of performance under ideal conditions, where the knowledge base is built and used by the same knowledge engineers. Subsequent experiments measure ASKME's level of performance under increasingly realistic conditions. In the final experiment, we measure ASKME's level of performance under conditions where the knowledge base is independently built by subject matter experts and the users of the knowledge base are a group of novices who are unfamiliar with the knowledge base. Results from the evaluation show that ASKME works well on different knowledge bases and answers a broad range of questions that were posed by novice users in a variety of domains.
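To make the funneling idea concrete, here is a minimal Python sketch of how a single controlled-English question template can be mapped onto a formal lookup over a toy triple store. The template, the toy KB, and the relation names are all hypothetical; ASKME's actual controlled language, ontology, and Question Mediator are far richer.

```python
import re

# Toy knowledge base of (subject, relation, object) triples.
# Illustrative only; a real KB would be built by knowledge engineers.
KB = {
    ("mitochondrion", "is-part-of", "eukaryotic cell"),
    ("mitochondrion", "has-function", "ATP synthesis"),
    ("ribosome", "has-function", "protein synthesis"),
}

# One controlled-English template in the spirit of a simplified English
# controlled language: "What is the function of X?"
FUNCTION_Q = re.compile(r"^What is the function of (?:a |an |the )?(.+)\?$", re.I)

def answer(question: str) -> str:
    """Map a controlled-English question to a formal lookup over the KB."""
    m = FUNCTION_Q.match(question.strip())
    if not m:
        return "Question not in the controlled language."
    entity = m.group(1).lower()
    hits = [o for s, r, o in KB if s == entity and r == "has-function"]
    return ", ".join(hits) if hits else f"No function known for '{entity}'."

print(answer("What is the function of a mitochondrion?"))  # -> ATP synthesis
```

The value of the controlled language is visible even in this toy: because the question's surface form is restricted, mapping it to a formal query is deterministic, and the user never has to learn the KB's internal vocabulary of relations.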
164

Improvements to the complex question answering models

Imam, Md. Kaisar January 2011
In recent years the amount of information on the web has increased dramatically. As a result, it has become a challenge for researchers to find effective ways to query and extract meaning from these large repositories. Standard document search engines try to address the problem by presenting the user a ranked list of relevant documents. In most cases, this is not enough, as the end user has to go through the entire document to find the answer they are looking for. Question answering, the retrieval of answers to natural language questions from a document collection, tries to remove this onus from the end user by providing direct access to relevant information. This thesis is concerned with open-domain complex question answering. Unlike simple questions, complex questions cannot be answered easily, as they often require inferencing and synthesizing information from multiple documents. Hence, we treat the task of complex question answering as query-focused multi-document summarization. To improve complex question answering, we experimented with both empirical and machine learning approaches. We extracted several features of different types (lexical, lexical-semantic, syntactic, and semantic) for each sentence in the document collection in order to measure its relevance to the user query. We formulated the task of complex question answering in a reinforcement learning framework, which to the best of our knowledge has not been applied to this task before and has the potential to improve itself by fine-tuning the feature weights from user feedback. We also used unsupervised machine learning techniques (random walk, manifold ranking) and augmented them with semantic and syntactic information. Finally, we experimented with question decomposition: instead of trying to answer the complex question directly, we decomposed it into a set of simple questions and synthesized their answers to obtain the final result.
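A minimal sketch of the core scoring idea, feature-based sentence relevance with tunable weights, is shown below in Python. The two features and their weights are invented for illustration; the thesis's actual feature set (lexical, lexical-semantic, syntactic, semantic) and its learned weights are far more extensive.

```python
# Score candidate sentences for a query-focused summary as a weighted
# linear combination of features; weights could be tuned from feedback.
def lexical_overlap(sentence: str, query: str) -> float:
    """Fraction of query terms that also appear in the sentence."""
    s, q = set(sentence.lower().split()), set(query.lower().split())
    return len(s & q) / len(q) if q else 0.0

def length_score(sentence: str, target: int = 20) -> float:
    """Mildly prefer sentences close to a target length (in words)."""
    return 1.0 / (1.0 + abs(len(sentence.split()) - target) / target)

WEIGHTS = {"overlap": 0.8, "length": 0.2}  # illustrative; learnable from feedback

def score(sentence: str, query: str) -> float:
    feats = {"overlap": lexical_overlap(sentence, query),
             "length": length_score(sentence)}
    return sum(WEIGHTS[k] * v for k, v in feats.items())

sentences = [
    "Complex questions often require synthesizing facts from many documents.",
    "The weather was pleasant that day.",
]
query = "how are complex questions answered from multiple documents"
best = max(sentences, key=lambda s: score(s, query))
print(best)
```

The reinforcement-learning angle in the thesis amounts to adjusting values like WEIGHTS from user feedback instead of fixing them by hand.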
165

Approximation of OLAP queries on data warehouses

Cao, Phuong Thao 20 June 2013
We study approximate answers to OLAP queries on data warehouses. We consider the relative answers to OLAP queries on a schema as distributions with the L1 distance, and approximate the answers without storing the entire data warehouse. We first introduce three specific methods: uniform sampling, measure-based sampling, and the statistical model. We also introduce an edit distance between data warehouses, with edit operations adapted to data warehouses. Then, in the setting of OLAP data exchange, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built from streams of different sources; we show a lower bound on the size of the memory necessary to approximate queries, and approximate OLAP queries with a finite memory in this case. We also describe a method to discover statistical dependencies, a new notion we introduce, searching for them with decision trees. We apply the method to two data warehouses. The first one simulates sensor data, providing weather parameters over time and location from different sources. The second one is a collection of RSS feeds from websites on the Internet.
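A minimal Python sketch of the first of these methods: answer an OLAP-style group-by aggregate from a uniform sample instead of the full fact table, and measure the error as an L1 distance between the (normalized) exact and approximate answers. The weather-like table echoes the first case study; the names, numbers, and the simple normalization into a distribution are all illustrative.

```python
import random

random.seed(1)
# Toy fact table: 10,000 sensor readings (city, temperature).
facts = [{"city": random.choice(["Paris", "Lyon"]),
          "temp": random.gauss(15, 5)} for _ in range(10_000)]

def group_avg(rows):
    """OLAP-style query: average temperature grouped by city."""
    sums, counts = {}, {}
    for r in rows:
        sums[r["city"]] = sums.get(r["city"], 0.0) + r["temp"]
        counts[r["city"]] = counts.get(r["city"], 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

def as_distribution(answer):
    """View the answer as a distribution so answers can be L1-compared."""
    total = sum(answer.values())
    return {k: v / total for k, v in answer.items()}

exact = as_distribution(group_avg(facts))
sample = random.sample(facts, 200)        # uniform sample: no full scan needed
approx = as_distribution(group_avg(sample))

l1 = sum(abs(exact[c] - approx.get(c, 0.0)) for c in exact)
print(f"L1 distance between exact and approximate answers: {l1:.4f}")
```

With 200 of 10,000 rows the L1 distance is already small, which is the trade-off the thesis quantifies: bounded error without storing or scanning the entire warehouse.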
166

Developing an enriched natural language grammar for prosodically-improved concept-to-speech synthesis

Marais, Laurette 04 1900
The need for interacting with machines using spoken natural language is growing, along with the expectation that synthetic speech in this context sound natural. Such interaction includes answering questions, where prosody plays an important role in producing natural English synthetic speech by communicating the information structure of utterances. Combinatory Categorial Grammar (CCG) is a theoretical framework that exploits the notion that, in English, information structure, prosodic structure and syntactic structure are isomorphic. This provides a way to convert a semantic representation of an utterance into a prosodically natural spoken utterance. Grammatical Framework (GF) is a framework for writing grammars, where abstract tree structures capture the semantic structure and concrete grammars render these structures as linearised strings. This research combines these frameworks to develop a system that converts semantic representations of utterances into linearised strings of natural language that are marked up to inform the prosody-generating component of a speech synthesis system.
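The final step, emitting a linearised string with prosodic markup, can be illustrated with a short Python sketch. It assumes SSML-style emphasis tags as the markup and a hand-written theme/rheme annotation; the real system derives the information structure from CCG/GF grammars rather than a hard-coded list.

```python
# Render an utterance with known information structure into an SSML-style
# string for a speech synthesizer. The theme/rheme labels are illustrative.
def to_marked_speech(utterance):
    parts = []
    for word, role in utterance:
        if role == "rheme":
            # New (focused) information carries a pitch accent.
            parts.append(f'<emphasis level="strong">{word}</emphasis>')
        else:
            parts.append(word)
    return "<speak>" + " ".join(parts) + "</speak>"

# Answer to "Who wrote Hamlet?": "Shakespeare" is the rheme (new information),
# while "wrote Hamlet" is the theme (given by the question).
answer = [("Shakespeare", "rheme"), ("wrote", "theme"), ("Hamlet", "theme")]
print(to_marked_speech(answer))
```

The point of the isomorphism the abstract mentions is that the theme/rheme split needed here falls out of the syntactic derivation itself, so no separate prosody prediction step is required.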
167

Locating Information in Heterogeneous Log Files / Localisation d'information dans les fichiers logs hétérogènes

Saneifar, Hassan 02 December 2011
In this thesis, we present contributions to the challenging issues encountered in question answering and in locating information in complex textual data such as log files. Question answering systems (QAS) aim to find a relevant fragment of a document which could be regarded as the best possible concise answer to a user's question. In this work, we propose a complete solution to locate information in a special kind of textual data: log files generated by EDA design tools. Nowadays, in many application areas, modern computing systems are instrumented to generate huge reports about occurring events in the format of log files. Log files are generated in every computing field to report the status of systems, products, or even the causes of problems that can occur. Log files may also include data about critical parameters, sensor outputs, or a combination of those. Analyzing log files, as an attractive approach to automatic system management and monitoring, has been enjoying a growing amount of attention [Li et al., 2005]. Although the process of generating log files is quite simple and straightforward, log file analysis can be a tremendous task that requires enormous computational resources, long time and sophisticated procedures [Valdman, 2004]. Indeed, many kinds of log files generated in some application domains are not systematically exploited in an efficient way because of their special characteristics. In this thesis, we are mainly interested in log files generated by Electronic Design Automation (EDA) systems. Electronic design automation is a category of software tools for designing electronic systems such as printed circuit boards and Integrated Circuits (IC). In this domain, to ensure the design quality, there are quality check rules which should be verified, principally by analyzing the generated log files. For large designs, where the design tools may generate megabytes or gigabytes of log files each day, the problem is to wade through all of this data to locate the critical information needed to verify the quality check rules. These log files typically include a substantial amount of data, so manually locating information is a tedious and cumbersome process. Furthermore, the particular characteristics of log files, especially those generated by EDA design tools, raise significant challenges for retrieving information from them: their heterogeneous and evolving structures and their large, non-fixed vocabulary limit the usefulness of manual analysis techniques and static methods. Throughout this work we investigate the main concern "how can the specificities of log files influence information extraction and natural language processing methods?". In this context, a key challenge is to provide approaches that take the specificities of log files into account while considering the issues specific to QA in restricted domains. Our contributions are as follows:
> Proposing a novel method to recognize and identify the logical units in log files in order to perform a segmentation according to their structure. We characterize the complex logical units found in log files according to their syntactic characteristics and, within this approach, propose an original type of descriptor to model the textual structure and layout of text documents.
> Proposing an approach to locate the requested information in log files based on passage retrieval. To improve the performance of passage retrieval and overcome difficulties such as vocabulary mismatch, we propose a novel query expansion approach that adapts an initial query to all types of corresponding log files. It relies on two relevance feedback steps: in the first, we determine explicit relevance feedback by identifying the context of questions; the second consists of a novel type of pseudo relevance feedback. Our method is based on a new term weighting function introduced in this work, called TRQ (Term Relatedness to Query), which scores the terms of the corpus according to their relatedness to the query. We also investigate how to apply this query expansion approach to documents from general domains.
> Studying the use of morpho-syntactic knowledge in our approaches. For this purpose, we are interested in extracting the terminology of log files, and introduce our approach, named Exterlog (EXtraction of TERminology from LOGs), which extracts terms according to syntactic patterns. To evaluate the extracted terms and choose the most relevant ones, we propose an automatic candidate term evaluation protocol using a Web-based measure combined with statistical measures, while taking the specialized context of log files into account.
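The abstract names the TRQ weighting but does not give its formula, so the Python sketch below substitutes a deliberately simple stand-in: it credits each corpus term by how often it co-occurs with query terms in the same passage, then expands the query with the top-scoring terms. The toy EDA-log passages and the scoring rule are illustrative only.

```python
from collections import Counter

# Toy passages in the style of EDA tool logs (invented for illustration).
passages = [
    "error timing violation on clock net clk_core",
    "setup timing check failed for path to register bank",
    "power report generated without violation",
]
query = {"timing", "violation"}

def relatedness_to_query(passages, query):
    """Stand-in for TRQ: credit terms that co-occur with query terms."""
    scores = Counter()
    for p in passages:
        terms = set(p.split())
        overlap = len(terms & query)
        if overlap == 0:
            continue
        for t in terms - query:
            scores[t] += overlap   # more shared query terms -> more credit
    return scores

scores = relatedness_to_query(passages, query)
expansion = [t for t, _ in scores.most_common(3)]
print("expanded query:", sorted(query) + expansion)
```

Even this crude score pulls in domain terms like "clock" and "setup" that a user's question would likely not contain, which is exactly the vocabulary-mismatch problem the thesis's query expansion targets.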
168

Répondre efficacement aux requêtes Big Data en présence de contraintes / Efficient Big Data query answering in the presence of constraints

Bursztyn, Damián 15 December 2016
Constraints are the essential artefact for giving meaning to data, ensuring that it fits real-life application needs and that its meaning is correctly conveyed to the users. This thesis investigates two fundamental problems related to the efficient management of data in the presence of constraints. First, we address the problem of efficiently answering queries over data in the presence of deductive constraints, which lead to implicit data entailed (derived) from the explicit data and the constraints. Implicit data requires a reasoning step in order to compute complete query answers, and two main query answering techniques exist. Data saturation compiles the constraints into the database by making all implicit data explicit, while query reformulation compiles the constraints into a modified query which, evaluated over the explicit data only, computes all the answers due to explicit and/or implicit data. So far, reformulation-based query answering has received significantly less attention than saturation; in particular, reformulated queries may be complex, thus their evaluation may be very challenging. We study optimizing reformulation-based query answering in the setting of ontology-based data access, where SPARQL conjunctive queries are answered against a set of RDF facts on which constraints hold. When RDF Schema is used to express the constraints, the thesis makes the following contributions. (i) We generalize prior query reformulation languages, leading to a space of reformulated queries we call JUCQs (joins of unions of conjunctive queries), instead of a single fixed reformulation. (ii) We present effective and efficient cost-based algorithms for selecting from this space a reformulated query with the lowest estimated cost. (iii) We demonstrate through experiments that our technique drastically improves the performance of reformulation-based query answering while always avoiding “worst-case” performance. Moving beyond RDFS, we consider the large and useful set of ontology languages enjoying FOL reducibility of query answering: answering a query can be reduced to evaluating a certain first-order logic (FOL) formula (obtained from the query and ontology) against only the explicit facts. (iv) We generalize the above-mentioned JUCQ-based optimized reformulation technique to improve performance in any FOL-reducible setting, and (v) we instantiate this framework to the DL-LiteR Description Logic underpinning the W3C's OWL2 QL ontology language, demonstrating significant performance advantages in this setting as well. We also report on current work regarding the problem of providing efficient data access paths in Big Data stores. We consider a setting where a set of different, heterogeneous storage systems can be used side by side to provide better performance than any of them used individually. In such a setting, the data stored in each system can be described as views over the application data. Answering a query thus amounts to rewriting the query using the available views, and then decoding the rewriting into a set of queries to be executed on the systems holding the views, together with a query combining them appropriately.
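For the RDFS case, the essence of reformulation-based query answering can be shown in a few lines of Python: a class atom is rewritten into a union over the class and all of its subclasses, so that evaluating against the explicit triples alone also returns entailed answers. The toy ontology and data are illustrative, and the thesis's actual contribution, cost-based selection among many JUCQ reformulations, sits on top of this basic mechanism.

```python
# rdfs:subClassOf constraints: subclass -> superclass (toy ontology).
SUBCLASS_OF = {":Student": ":Person", ":PhDStudent": ":Student"}

# Explicit RDF triples only; alice is a :Person only implicitly.
TRIPLES = [
    ("alice", "rdf:type", ":PhDStudent"),
    ("bob", "rdf:type", ":Person"),
]

def subclasses_of(cls):
    """All classes whose instances are entailed to be instances of cls."""
    closure = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in closure and sub not in closure:
                closure.add(sub)
                changed = True
    return closure

def answer_type_query(cls):
    # Reformulation: one atom per (sub)class, unioned, evaluated over the
    # explicit data only -- no saturation of the database is required.
    union = subclasses_of(cls)
    return {s for s, p, o in TRIPLES if p == "rdf:type" and o in union}

print(answer_type_query(":Person"))  # {'alice', 'bob'}
```

The cost problem the thesis tackles is visible even here: the rewritten query grows with the subclass hierarchy, so choosing *which* reformulation to evaluate matters for performance.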
169

Vícejazyčný systém pro odpovídání na otázky nad otevřenou doménou / Multilingual Open-Domain Question Answering

Slávka, Michal January 2021
This thesis deals with automatic multilingual open-domain question answering. It proposes approaches to this still little-explored area, investigating in particular whether (i) relying on translation from English is sufficient, (ii) multilingual systems can benefit from translating the question into other languages, or (iii) it is preferable to use no translation at all. We compare an English system based on the T5 model, which relies on machine translation, with natively multilingual systems based on the multilingual MT5 model. The English system with machine translation slightly outperforms its monolingual counterparts on several tasks; however, even though this model was trained on more data, the improvement is not sufficiently significant. This shows that natively multilingual systems are a promising approach for future research. We also present a method for retrieving documents in multiple languages using the BM25 algorithm and compare it with English-only retrieval. Using multilingual evidence appears to be beneficial and improves system performance.
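Since BM25 is the named retrieval component, here is a self-contained Python sketch of the scoring function with the usual k1 and b defaults. The two-document corpus is invented for illustration; a real system would typically rely on an existing retrieval engine rather than this toy implementation.

```python
import math
from collections import Counter

K1, B = 1.5, 0.75  # common BM25 defaults

docs = ["otázka odpoveď systém stack overflow",
        "question answering system over open domains"]
tokenized = [d.split() for d in docs]
avgdl = sum(len(d) for d in tokenized) / len(tokenized)
df = Counter(t for d in tokenized for t in set(d))  # document frequencies
N = len(tokenized)

def idf(term):
    """Robust BM25 idf variant (always non-negative)."""
    return math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))

def bm25(query, doc):
    """Score one tokenized document against a whitespace-tokenized query."""
    tf = Counter(doc)
    score = 0.0
    for t in query.split():
        if t not in tf:
            continue
        norm = tf[t] + K1 * (1 - B + B * len(doc) / avgdl)
        score += idf(t) * tf[t] * (K1 + 1) / norm
    return score

query = "question answering system"
for text, doc in zip(docs, tokenized):
    print(f"{bm25(query, doc):.3f}  {text}")
```

For the multilingual setting the thesis describes, the same scorer is simply run over document collections in several languages (with the question translated as needed), and the per-language result lists are merged.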
170

Query Answering in Probabilistic Data and Knowledge Bases

Ceylan, Ismail Ilkan 29 November 2017
Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art for storing and processing such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations do not only lead to unwanted consequences, but also put such systems on weak footing in important tasks, query answering being a very central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means of querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical scalable query answering algorithms whenever possible. To achieve this, the current work brings together recent paradigms from logic, probabilistic inference, and database theory.
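The tuple-independence assumption the abstract criticizes can be made concrete with a short Python sketch: every fact carries its own probability, independently of all others, and for a "safe" Boolean query the answer probability factorizes into independent products. The movie-style facts and probabilities are invented for illustration.

```python
# A tuple-independent probabilistic database: each fact holds with its own
# probability, independently of every other fact (toy data).
likes = {("alice", "scifi"): 0.9, ("bob", "scifi"): 0.4}     # P(likes(p, g))
genre = {("dune", "scifi"): 0.8, ("solaris", "scifi"): 0.5}  # P(genre(f, g))

def p_exists(probs):
    """P(at least one of several independent events holds) = 1 - prod(1 - p)."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

# Boolean query Q := exists p, f : likes(p, "scifi") AND genre(f, "scifi").
# The likes-part and the genre-part touch disjoint sets of tuples, so the
# query is safe: P(Q) factorizes into a product of two "exists" events.
# (For unsafe queries, sharing tuples across derivations makes exact
# evaluation #P-hard, which is one root of the limitations discussed above.)
p_likes = p_exists(p for (person, g), p in likes.items() if g == "scifi")
p_genre = p_exists(p for (film, g), p in genre.items() if g == "scifi")
print(f"P(Q) = {p_likes * p_genre:.4f}")  # 0.94 * 0.90 = 0.8460
```

The sketch also hints at why relaxing tuple-independence, as the thesis does, matters: real extracted facts are often correlated, and the clean factorization above no longer applies.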
