521

Contribution à l'analyse et l'évaluation des requêtes expertes : cas du domaine médical / Contribution to the analysis and evaluation of clinical queries: medical domain

Znaidi, Eya 30 June 2016 (has links)
Information retrieval requires strategies that consist of (1) identifying the information need; (2) formulating the information need; (3) locating relevant sources; (4) identifying the tools to use for those sources; (5) querying the tools; and (6) evaluating the quality of the results. The field has continually evolved, producing techniques and approaches for selecting, from a corpus of documents, the relevant information that satisfies the need expressed by the user. Moreover, in the applicative context of biomedical IR, heterogeneous information sources are constantly evolving, in both structure and content. Likewise, information needs may be expressed by users with different profiles: medical experts such as practitioners, clinicians and health professionals, and novice users (without any domain expertise or knowledge) such as patients and their families. The biomedical IR task raises several challenges: (1) the variation and diversity of the information need, (2) the different types of medical knowledge, (3) the differences in linguistic competence between experts and novices, (4) the large volume of medical literature, and (5) the nature of the medical IR task. This makes it difficult to access relevant information specific to the search context, especially for domain experts, whom such information would assist in their medical decision-making. Our thesis work falls within the field of biomedical IR and addresses the challenges of formulating expert information needs and identifying relevant sources to better answer clinical needs. Concerning the formulation and analysis of expert queries, we propose exploratory analyses of query attributes that we defined, formalized and computed, namely: (1) two length attributes, in number of terms and in number of concepts; (2) two facets of specificity, term-document and hierarchical; (3) query clarity, both relevance-based and topic-based. We conducted statistical studies and analyses on collections from different medical evaluation campaigns (CLEF and TREC) in order to take the different IR tasks into account. After the descriptive analyses, we studied pairwise correlations between query attributes as well as multidimensional correlation analyses, and we studied the impact of these correlations on retrieval performance. We were thus able to compare and characterize queries according to the medical task in a more generalizable way. Concerning access to information, we propose semantic query matching and expansion techniques for IR based on clinical evidence. / The research topic of this document deals with a particular setting of medical information retrieval (IR), referred to as expert-based information retrieval. We are interested in information needs expressed by medical domain experts such as practitioners and physicians.
It is well known in the IR area that expressing queries that accurately reflect the information need is a difficult task, in both general and specialized domains, and even for expert users. Thus, identifying the intention hidden behind the queries that users submit to a search engine is a challenging issue. Moreover, the increasing amount of health information available from various sources such as government agencies, non-profit and for-profit organizations, and internet portals presents both opportunities and challenges for improving the delivery of health care information to medical professionals, patients and the general public. One critical issue is understanding users' search strategies and tactics in order to bridge the gap between their intention and the delivered information. In this thesis, we focus on two main aspects of expert medical information needs: (1) Understanding the intents behind users' queries is critically important to gain better insight into how to select relevant results. While many studies have investigated how users in general carry out exploratory health searches in digital environments, few have focused on how queries are formulated, specifically by domain experts. We address domain-expert health search through the analysis of query attributes, namely length, specificity and clarity, using appropriate proposed measures built from different sources of evidence. In this respect, we undertake an in-depth statistical analysis of queries issued from IR evaluation campaigns, namely the Text REtrieval Conference (TREC) and the Conference and Labs of the Evaluation Forum (CLEF), devoted to different medical tasks within controlled evaluation settings. (2) We address the issue of answering PICO (Population, Intervention, Comparison and Outcome) clinical queries formulated within the Evidence-Based Medicine framework. The contributions of this part include (1) a new algorithm for query elicitation based on the semantic mapping of each facet of the query to a reference terminology, and (2) a new document ranking model based on a prioritized aggregation operator. We tackle the retrieval of the best evidence that fits a PICO question, which is an under-explored research area, proposing a new document ranking algorithm that relies on semantic query expansion leveraged by each question facet. The expansion is further bounded by the local search context to better discard irrelevant documents. The experimental evaluation carried out on the CLIREC dataset shows the benefit of our approaches.
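Two of the query attributes analysed here, length in terms and length in concepts, are easy to make concrete. The following Python sketch is a minimal illustration, assuming a toy concept lexicon in place of a real reference terminology such as MeSH or UMLS; the specificity and clarity measures proposed in the thesis additionally require collection statistics and a concept hierarchy, and are not reproduced.

```python
# Hypothetical multi-word concept lexicon standing in for a reference terminology.
MEDICAL_CONCEPTS = {
    "myocardial infarction",
    "beta blocker",
    "atrial fibrillation",
}

def length_in_terms(query: str) -> int:
    """Query length as the number of whitespace-separated terms."""
    return len(query.split())

def length_in_concepts(query: str, concepts=MEDICAL_CONCEPTS) -> int:
    """Query length as the number of lexicon concepts it mentions (substring match)."""
    q = query.lower()
    return sum(1 for c in concepts if c in q)

query = "beta blocker therapy after myocardial infarction"
print(length_in_terms(query))     # 6 terms
print(length_in_concepts(query))  # 2 concepts
```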
522

Intelligent Data Layer: An approach to generating a data layer from a normalized database model.

Buzo, Amir January 2012 (has links)
Model View Controller (MVC) software architecture is widespread and commonly used in application development, so generating the data layer from the database model can reduce cost and time. Research on current Object Relational Mapping (ORM) tools such as Data Access Object (DAO) generators and Hibernate showed that their usage can cause inefficiency and slow performance, due to the many database connections they open and their set-up time. Most of these tools try to solve specific problems rather than generating a complete data layer, which is an important component and the bottom layer of database-centred applications. The proposed solution is an engineering approach in which we designed a tool named Generated Intelligent Data Layer (GIDL). The GIDL tool generates small models that form the main data layer of the system according to the database model. Its goal is to allow software developers to work only with objects, without deep knowledge of SQL. The tool handles transactions and commits, and constructs filter objects for filtering the database. GIDL reduces the number of database connections and maintains a cache in which object lists are stored and modified. Compared with Hibernate under the same environment, the tool showed better performance in terms of execution time for the same functions. The GIDL tool is beneficial for software developers because it generates the entire data layer.
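GIDL's generated code is not shown in the abstract, so the following Python sketch only illustrates the general shape of a generated data-layer class with two of the features highlighted above: object-level access without hand-written SQL at the call site, and a cache that avoids repeated database round trips. The table, column and class names are hypothetical.

```python
import sqlite3

class CustomerDAO:
    """Generated-style data access object for one table, with a simple object cache."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self._cache = {}  # id -> row; avoids repeated round trips to the database

    def find_by_id(self, customer_id: int):
        if customer_id in self._cache:  # served from the cache
            return self._cache[customer_id]
        row = self.conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is not None:
            self._cache[customer_id] = row
        return row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
dao = CustomerDAO(conn)
print(dao.find_by_id(1))  # first call hits the database
print(dao.find_by_id(1))  # second call is answered from the cache
```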
523

Balancing Money and Time for OLAP Queries on Cloud Databases

Sabih, Rafia January 2016 (has links) (PDF)
Enterprise Database Management Systems (DBMSs) have to contend with resource-intensive and time-varying workloads, making them well-suited candidates for migration to cloud platforms: specifically, they can dynamically leverage the resource elasticity while retaining affordability through the pay-as-you-go rental interface. The current design of database engine components lays emphasis on maximizing computing efficiency, but to fully capitalize on the cloud's benefits, the outlays of these computations also need to be factored into the planning exercise. In this thesis, we investigate this contemporary problem in the context of industrial-strength deployments of relational database systems on real-world cloud platforms. Specifically, we consider how the traditional metric used to compare query execution plans, namely response-time, can be augmented to incorporate monetary costs in the decision process. The challenge here is that execution-time and monetary costs are adversarial metrics, with a decrease in one entailing a rise in the other. For instance, a Virtual Machine (VM) with rich physical resources (RAM, cores, etc.) decreases the query response-time, but is expensive with regard to rental rates. In a nutshell, there is a tradeoff between money and time, and our goal therefore is to identify the VM that offers the best tradeoff between these two competing considerations. In our study, we profile the behavior of money versus time for a given query, and define the best tradeoff as the "knee", that is, the location on the profile with the minimum Euclidean distance from the origin. To study the performance of industrial-strength database engines on real-world cloud infrastructure, we have deployed a commercial DBMS on Google cloud services. On this platform, we have carried out extensive experimentation with the TPC-DS decision-support benchmark, an industry-wide standard for evaluating database system performance. Our experiments demonstrate that the choice of VM for hosting the database server is a crucial decision, because: (i) variation in time and money across VMs is significant for a given query, and (ii) no one VM offers the best money-time tradeoff across all queries. To efficiently identify the VM with the best tradeoff from a large suite of available configurations, we propose a technique to characterize the money-time profile for a given query. The core of this technique is a VM pruning mechanism that exploits the partially-ordered-set structure of the VMs with respect to their resources. It processes the minimal and maximal VMs of this poset for estimated query response-time. If the response-times on these extreme VMs are similar, then all the VMs sandwiched between them are pruned from further consideration. Otherwise, the already processed VMs are set aside, and the minimal and maximal VMs of the remaining unprocessed VMs are evaluated for their response-times. Finally, the knee VM is identified from the processed VMs as the one with the minimum Euclidean distance from the origin in the money-time space. We theoretically prove that this technique always identifies the knee VM; further, if it is acceptable to find a "near-optimal" knee by providing a relaxation factor on the response-time distance from the optimal knee, then the technique can find a satisfactory knee even more efficiently under these relaxed conditions.
We propose two flavors of this approach: the first prunes the VMs using complete plan information received from the database engine API, and is named Plan-based Identification of Knee (PIK). To further increase the efficiency of identifying the knee VM, we also propose a sub-plan-based pruning algorithm called Sub-Plan-based Identification of Knee (SPIK), which requires modifications in the query optimizer. We have evaluated PIK on a commercial system and found that it often requires processing only 20% of the total VMs. The efficiency of the algorithm increases significantly further when a 10-20% relaxation in response-time is allowed. For evaluating SPIK, we prototyped it on an open-source engine, PostgreSQL 9.3, and also implemented it as a Java wrapper program with the commercial engine. Experimentally, the processing done by SPIK is found to be only 40% of that of the PIK approach. From an overall perspective, therefore, this thesis facilitates the desired migration of enterprise databases to cloud platforms by identifying the VM(s) that offer competitive money-time tradeoffs for a given query.
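The knee definition above, the point on the money-time profile with the minimum Euclidean distance from the origin, can be sketched in a few lines of Python. The VM names below are real Google Cloud machine types but the time and cost figures are invented, and normalizing both axes before computing distances is an added assumption to keep the two metrics comparable; the PIK/SPIK pruning over the VM poset is not reproduced here.

```python
import math

# Hypothetical (response_time_sec, dollar_cost) profile of one query per VM.
vms = {
    "n1-standard-2": (420.0, 0.19),
    "n1-standard-8": (150.0, 0.38),
    "n1-highmem-16": (95.0, 0.90),
}

t_max = max(t for t, _ in vms.values())
c_max = max(c for _, c in vms.values())

def distance(tc):
    """Normalized Euclidean distance of a (time, cost) point from the origin."""
    t, c = tc
    return math.hypot(t / t_max, c / c_max)

knee = min(vms, key=lambda v: distance(vms[v]))
print(knee)  # -> "n1-standard-8" on these numbers: the best money-time tradeoff
```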
524

Suporte a consultas por similaridade unárias em SQL / Extending SQL to support unary similarity queries

Mônica Ribeiro Porto Ferreira 15 February 2008 (has links)
Conventional operators for comparing data by equality and by total order relations are not adequate for managing complex data such as multimedia data (images, audio, long texts), time series and genetic sequences. To compare data of these types, the degree of similarity between their instances is, in general, the most important factor, so query operations are best performed using so-called similarity operators. Similarity search operators can be unary or binary: unary operators implement selection operations, while binary operators are intended for join operations. The relational algebra used in Relational Database Management Systems does not provide support for expressing similarity search criteria. To supply this support, an extension to the relational algebra that allows similarity queries to be represented in algebraic expressions is under development in the Databases and Images Group (GBdI-ICMC-USP). This dissertation is part of that effort, addressing the treatment of unary similarity operators in the algebra, as well as the implementation of the similarity query optimizer in SIREN (Similarity Retrieval Engine), so that similarity queries can be answered by Relational Database Management Systems. / Conventional operators for data comparison based on exact matching and total order relations are not appropriate to manage complex data, such as multimedia data (e.g. images, audio and large texts), time series and genetic sequences. In fact, the most important aspect when comparing complex data is usually the similarity degree between instances, leading to the use of similarity operators to perform search and retrieval operations. Similarity operators can be classified as unary or binary, respectively used to implement selection operations and joins. However, the Relational Algebra employed in Relational Database Management Systems (DBMS) does not provide resources to express similarity search criteria. In order to fill this lack of support, an extension to the Relational Algebra is under development at GBdI-ICMC-USP (Grupo de Bases de Dados e Imagens), aiming to represent similarity queries in algebraic expressions. This work contributes to that effort by dealing with unary similarity operators in the Relational Algebra and by developing a similarity query optimizer for SIREN (Similarity Retrieval Engine), thereby allowing similarity queries to be answered by relational DBMSs.
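To make the two unary similarity operators concrete, the following Python sketch gives the semantics of a range selection and a k-nearest-neighbor selection over feature vectors. An engine such as SIREN evaluates these inside the DBMS with index support; this is only the declarative meaning, with Euclidean distance as an assumed metric.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def range_query(data, center, radius, dist=euclidean):
    """Unary similarity selection: all elements within `radius` of `center`."""
    return [x for x in data if dist(x, center) <= radius]

def knn_query(data, center, k, dist=euclidean):
    """Unary similarity selection: the k elements nearest to `center`."""
    return sorted(data, key=lambda x: dist(x, center))[:k]

vectors = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.25), (0.5, 0.5)]
print(range_query(vectors, (0.1, 0.2), radius=0.1))  # [(0.1, 0.2), (0.15, 0.25)]
print(knn_query(vectors, (0.1, 0.2), k=2))           # the two nearest vectors
```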
525

Traitement de requêtes SPARQL sur des données liées / SPARQL distributed query processing over linked data

Macina, Abdoul 17 December 2018 (has links)
More and more linked data sources are published across the Web using Semantic Web technologies, forming a large network of distributed data. However, it is difficult for data consumers to take advantage of the richness of these data, given their distribution, their growing volume, and the autonomy of the data sources. Federated query engines allow these data sources to be queried using distributed query processing techniques. However, a naive implementation of these techniques can generate a considerable number of remote requests and numerous intermediate results, leading to long query processing times and costly network communication. Moreover, the semantics of distributed queries is often ignored. Query expressiveness, data partitioning, and data replication are further challenges that query engines must face. To address these challenges, we first proposed a distributed query semantics compatible with the SPARQL and RDF standards that preserves the expressiveness of SPARQL. We then presented several optimization strategies for a federated query engine that transparently queries distributed data sources. The performance of these optimizations is evaluated on an implementation of a distributed SPARQL query engine. / Driven by the Semantic Web standards, an increasing number of RDF data sources are published and connected over the Web by data providers, leading to a large distributed linked data network. However, exploiting the wealth of these data sources is very challenging for data consumers, considering the data distribution, their volume growth and the autonomy of the data sources. In the Linked Data context, federation engines allow querying these distributed data sources by relying on Distributed Query Processing (DQP) techniques. Nevertheless, a naive implementation of the DQP approach may generate a tremendous number of remote requests towards the data sources and numerous intermediate results, thus leading to costly network communications. Furthermore, the distributed query semantics is often overlooked. Query expressiveness, data partitioning, and data replication are other challenges to be taken into account. To address these challenges, we first propose in this thesis a SPARQL- and RDF-compliant Distributed Query Processing semantics which preserves the SPARQL language expressiveness. Afterwards, we present several strategies for a federated query engine that transparently addresses distributed data sources, while managing data partitioning, query result completeness, data replication, and query processing performance. We implemented and evaluated our approach and optimization strategies in a federated query engine to demonstrate their effectiveness.
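For illustration, the standard SPARQL 1.1 SERVICE clause makes distributed evaluation explicit: part of the pattern is evaluated at a remote endpoint and joined with the local results. The sketch below sends such a query through the SPARQLWrapper Python library; the endpoints and the query itself are illustrative (public endpoints may restrict SERVICE), and a federation engine like the one described here would decompose and route the query automatically rather than requiring hand-written SERVICE blocks.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>

    SELECT ?film ?imdbId WHERE {
        ?film a dbo:Film ;
              owl:sameAs ?item .
        FILTER(STRSTARTS(STR(?item), "http://www.wikidata.org/entity/"))
        # This sub-pattern is shipped to and evaluated at the remote source,
        # then joined on ?item with the local results.
        SERVICE <https://query.wikidata.org/sparql> {
            ?item wdt:P345 ?imdbId .
        }
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["film"]["value"], row["imdbId"]["value"])
```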
526

Analytical Query Processing Based on Continuous Compression of Intermediates

Damme, Patrick 02 October 2020 (has links)
Nowadays, increasingly large amounts of data are being collected in numerous areas ranging from science to industry. To gain valuable insights from these data, the importance of Online Analytical Processing (OLAP) workloads is constantly growing. At the same time, the hardware landscape is continuously evolving. On the one hand, the increasing capacities of DRAM allow database systems to store their entire data in main memory. Furthermore, the performance of microprocessors has improved tremendously in recent years through the use of sophisticated hardware techniques, such as Single Instruction Multiple Data (SIMD) extensions promising hitherto unknown processing speeds. On the other hand, the main memory bandwidth has not increased proportionately, such that data access is now the main bottleneck for efficient data processing. To face these developments, in-memory column-stores have emerged as a new database architecture. These systems store each attribute of a relation separately in memory as a contiguous sequence of values. It is state-of-the-art to encode all values as integers and apply lossless lightweight integer compression to reduce the data size. This offers several advantages, ranging from lower transfer times between RAM and CPU, through better utilization of the cache hierarchy, to fast direct processing of compressed data. However, compression also incurs a certain computational overhead. State-of-the-art systems focus on the compression of base data. However, intermediate results generated during the execution of complex analytical queries can exceed the base data in number and total size. Since in in-memory systems accessing intermediates is as expensive as accessing base data, intermediates should be handled as efficiently as possible, too. While there are approaches that try to avoid intermediates whenever possible, we envision the orthogonal approach of efficiently representing intermediates using lightweight integer compression algorithms to reduce memory accesses. More precisely, our vision is a balanced query processing based on lightweight compression of intermediate results in in-memory column-stores. That means all intermediates shall be represented using a suitable lightweight integer compression algorithm and processed by compression-enabled query operators to avoid a full decompression, whereby compression shall be used in a balanced way to ensure that its benefits outweigh its costs. In this thesis, we address all important aspects of this vision. We provide an extensive overview of existing lightweight integer compression algorithms and conduct a systematic experimental survey of several of these algorithms to gain a deep understanding of their behavior. We propose a novel compression-enabled processing model for in-memory column-stores allowing a continuous compression of intermediates. Additionally, we develop novel cost-based strategies for a compression-aware secondary query optimization to make effective use of our processing model.
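As a minimal illustration of the lightweight integer compression family discussed above, the sketch below combines delta encoding with frame-of-reference on a sorted integer column. Real column-stores bit-pack the deltas with SIMD instructions, which is only indicated by a comment here; the scheme merely shows why sorted intermediates compress to a few bits per value.

```python
def for_delta_compress(values):
    """Encode a sorted run as (base, bit_width, deltas)."""
    base = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    width = max((d.bit_length() for d in deltas), default=0)
    # A real implementation would now bit-pack `deltas` at `width` bits each.
    return base, width, deltas

def for_delta_decompress(base, width, deltas):
    """Rebuild the original run by cumulative summation from the base."""
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out

column = [1000, 1003, 1004, 1010, 1011]
base, width, deltas = for_delta_compress(column)
print(width, deltas)  # 3 [3, 1, 6, 1] -> only 3 bits per value after packing
assert for_delta_decompress(base, width, deltas) == column
```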
Our end-to-end evaluation using the well-known Star Schema Benchmark shows that our envisioned compression of intermediates can improve both the memory footprint and the runtime of complex analytical queries significantly.

Contents:
1 Introduction
  1.1 Contributions
  1.2 Outline
2 Lightweight Integer Compression
  2.1 Foundations
    2.1.1 Disambiguation of Lightweight Integer Compression
    2.1.2 Overview of Lightweight Integer Compression
    2.1.3 State-of-the-Art in Lightweight Integer Compression
  2.2 Experimental Survey
    2.2.1 Related Work
    2.2.2 Experimental Setup and Methodology
    2.2.3 Evaluation of the Impact of the Data Characteristics
    2.2.4 Evaluation of the Impact of the Hardware Characteristics
    2.2.5 Evaluation of the Impact of the SIMD Extension
  2.3 Summary and Discussion
3 Processing Compressed Intermediates
  3.1 Processing Model for Compressed Intermediates
    3.1.1 Related Work
    3.1.2 Description of the Underlying Processing Model
    3.1.3 Integration of Compression into Query Operators
    3.1.4 Integration of Compression into the Overall Query Execution
    3.1.5 Efficient Implementation
    3.1.6 Evaluation
  3.2 Direct Integer Morphing Algorithms
    3.2.1 Related Work
    3.2.2 Integer Morphing Algorithms
    3.2.3 Example Algorithms
    3.2.4 Evaluation
  3.3 Summary and Discussion
4 Compression-Aware Query Optimization Strategies
  4.1 Related Work
  4.2 Compression-Aware Secondary Query Optimization
    4.2.1 Compression-Level: Selecting a Suitable Algorithm
    4.2.2 Operator-Level: Selecting Suitable Input/Output Formats
    4.2.3 QEP-Level: Selecting Suitable Formats for All Involved Columns
  4.3 Evaluation
    4.3.1 Compression-Level: Selecting a Suitable Algorithm
    4.3.2 Operator-Level: Selecting Suitable Input/Output Formats
    4.3.3 Lessons Learned
  4.4 Summary and Discussion
5 End-to-End Evaluation
  5.1 Experimental Setup and Methodology
  5.2 A Simple OLAP Query
  5.3 Complex OLAP Queries: The Star Schema Benchmark
  5.4 Summary and Discussion
6 Conclusion
  6.1 Summary of this Thesis
  6.2 Directions for Future Work
Bibliography
List of Figures
List of Tables
527

A Performance Comparison of Auto-Generated GraphQL Server Implementations / En jämförelse av automatiskt genererade GraphQL server implementationer

Larsson, Markus, Ångström, David January 2020 (has links)
As databases and internet traffic grow larger by the day, the performance of delivering information has become a target of great importance. In past years, other software architectural styles such as REST have been used, as REST is a reliable framework that works well over a dependable internet connection. In 2015, Facebook released the query language GraphQL to the public as an alternative to REST. GraphQL improved data fetching by, for example, removing the possibility of under- and overfetching: a client gets exactly the data it has requested, nothing more, nothing less. Creating a GraphQL schema and server implementation requires time, effort and knowledge, yet it is a prerequisite for running GraphQL over a current legacy database. For this reason, multiple server implementation tools have been created by vendors to reduce development time; they instead auto-generate a GraphQL schema and server implementation from an already existing database. This bachelor thesis picks, runs and compares benchmarks of two such server implementation tools, Hasura and PostGraphile, using a benchmark methodology based on technical difficulties (choke points). The results of our benchmark suggest that throughput is higher for Hasura than for PostGraphile, while query execution time and query response time are similar. PostGraphile is better at paging without offset as well as ordering, but in all other cases Hasura outperforms PostGraphile or shows similar results. / Linköping GraphQL Benchmark (LinGBM)
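The fetching property mentioned above is easy to illustrate: a GraphQL client names exactly the fields it wants. The Python sketch below posts such a query to a hypothetical auto-generated endpoint of the kind Hasura or PostGraphile exposes; the URL and the tracks/name/composer schema are assumptions for illustration.

```python
import requests

query = """
query {
  tracks(limit: 5) {
    name        # only these two fields are returned,
    composer    # never the full row as a generic REST endpoint might send
  }
}
"""

resp = requests.post(
    "https://example.org/v1/graphql",  # hypothetical auto-generated endpoint
    json={"query": query},
    timeout=10,
)
print(resp.json())
```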
528

Vyhledávání v hudebních signálech / Search in Music Signals

Skála, František January 2012 (has links)
This work contains an overview of methods used in the area of Music Information Retrieval, mainly for searching musical recordings. Several existing services for music identification and search are presented, and their methods for unique song identification are described. The work also examines possible modifications of these algorithms for finding cover versions of songs and for enabling search based on examples created by voice.
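As a rough illustration of how such services identify a recording, the sketch below hashes pairs of spectrogram peaks into landmark keys that are robust to noise, in the spirit of constellation-style fingerprinting. The peak picking and hash layout are simplified assumptions, not the algorithm of any particular service discussed in the work.

```python
import numpy as np

def spectrogram_peaks(signal, frame=1024, top=3):
    """Return (frame_index, freq_bin) of the strongest bins in each frame."""
    peaks = []
    for i in range(0, len(signal) - frame, frame):
        spectrum = np.abs(np.fft.rfft(signal[i:i + frame]))
        for f in np.argsort(spectrum)[-top:]:  # the `top` largest bins
            peaks.append((i // frame, int(f)))
    return peaks

def landmark_hashes(peaks, fan_out=3):
    """Hash anchor/target peak pairs: (f1, f2, time_delta) -> anchor time."""
    hashes = {}
    for j, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[j + 1 : j + 1 + fan_out]:
            hashes[(f1, f2, t2 - t1)] = t1
    return hashes

rng = np.random.default_rng(0)
clip = rng.standard_normal(8 * 1024)  # stand-in for a decoded audio clip
print(len(landmark_hashes(spectrogram_peaks(clip))))  # number of fingerprint keys
```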
529

Improvements of the syntax of the query language DQL / Förbättringar i syntax för query språket DQL

Diep, Mikael, Cheimonettos, Anestis January 2023 (has links)
This thesis focuses on improving the syntax of a query language named DQL (Dynamic Query Language) in order to enhance the user experience and productivity of its users. The study investigates the original state of the query language and identifies areas for improvement in terms of intuitiveness, efficiency, and consistency. Through an extensive review of existing literature and case studies, the thesis develops a set of guidelines for designing intuitive query languages that minimise the cognitive load on users. It also proposes several modifications to the syntax of DQL that aim to simplify its structure and improve the readability of queries. Finally, the thesis evaluates the effectiveness of the proposed modifications through semi-structured interviews comparing the original syntax with the proposed new one.
530

Parallel Query Systems: Demand-Driven Incremental Compilers / En arkitektur för parallella och inkrementella kompilatorer

Nolander, Christofer January 2023 (has links)
Query systems were recently introduced as an architecture for constructing compilers, and have been shown to enable fast and efficient incremental compilation, where results from previous builds are reused to accelerate future builds. With this architecture, a compiler is composed of several queries, each of which extracts a small piece of information about the source program. For example, one query might determine the type of a variable, and another the list of functions defined in some file. The dependencies of a query, which include other queries or files on disk, are automatically recorded at runtime. With these dependencies, query systems can detect changes in their inputs and incorporate them into the final output, while reusing old results from queries which have not changed. This reduces the amount of work needed to recompile code, which saves both time and energy. We present a new parallel execution model for query systems using work-stealing, which dynamically balances the workload across multiple threads. This is facilitated by various augmentations to existing algorithms to allow concurrent operations. Furthermore, we introduce a novel data structure that accelerates incremental compilation for common use cases. We evaluated the impact of these augmentations by implementing a compiler frontend capable of parsing and type-checking the Go programming language. We demonstrate a 10x reduction in compile times using the parallel execution mode. Finally, under certain common conditions, we show a 5x reduction in incremental compile times compared to the state-of-the-art. / Query systems are a new architecture that has been used to implement compilers for programming languages, with a focus on enabling fast and efficient incremental compilation. With this architecture, a compiler consists of several smaller functions, each answering a small question about the source program, such as the type of a variable or the list of functions in a file. By tracking how these functions call each other and the data they read, compilers can detect changes in their inputs and perform the minimal amount of work required to propagate those changes to the output. This reduces the amount of work needed to recompile code, saving both time and energy. In this report, we present a new execution model for query systems that enables parallelism using work-stealing. This is facilitated by several additions to existing algorithms that make it possible to perform all operations in parallel. Beyond this, we introduce a new data structure that makes incremental compilation faster for many common use cases. We evaluated the effect of these changes by implementing a compiler frontend capable of parsing and type-checking the Go programming language. The results show a 10x reduction in compile times with the parallel execution mode. We also demonstrate 5x lower compile times for incremental changes than was previously possible.
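The core query-system loop described above, memoize each query's result, record what it read, and invalidate only what is reachable from a changed input, can be sketched compactly. The Python sketch below is single-threaded and assumes string keys for both inputs and queries; the work-stealing parallel execution model and the new data structure contributed by the thesis are deliberately omitted.

```python
class QuerySystem:
    """Memoizes query results and tracks dependencies for invalidation."""

    def __init__(self):
        self.inputs = {}   # input key -> value (e.g. file contents)
        self.cache = {}    # query key -> (result, set of keys it read)
        self._stack = []   # dependency sets of queries currently executing

    def set_input(self, key, value):
        self.inputs[key] = value
        # Drop every cached query that transitively read this input.
        dirty = {key}
        while True:
            hit = {k for k, (_, deps) in self.cache.items() if deps & dirty}
            if not hit:
                break
            for k in hit:
                del self.cache[k]
            dirty |= hit

    def read_input(self, key):
        if self._stack:
            self._stack[-1].add(key)   # record the input dependency
        return self.inputs[key]

    def query(self, key, compute):
        if self._stack:
            self._stack[-1].add(key)   # the caller depends on this query
        if key in self.cache:
            return self.cache[key][0]  # reuse the previous result
        self._stack.append(set())
        result = compute(self)
        self.cache[key] = (result, self._stack.pop())
        return result

qs = QuerySystem()
qs.set_input("main.go", "package main")
name = qs.query("pkg_name", lambda q: q.read_input("main.go").split()[1])
print(name)  # "main"; recomputed only after main.go changes again
```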
