Spelling suggestions: "subject:"search queda""
1 |
Term selection in information retrievalMaxwell, Kylie Tamsin January 2016 (has links)
Systems trained on linguistically annotated data achieve strong performance for many language processing tasks. This encourages the idea that annotations can improve any language processing task if applied in the right way. However, despite widespread acceptance and availability of highly accurate parsing software, it is not clear that ad hoc information retrieval (IR) techniques using annotated documents and requests consistently improve search performance compared to techniques that use no linguistic knowledge. In many cases, retrieval gains made using language processing components, such as part-of-speech tagging and head-dependent relations, are offset by significant negative effects. This results in a minimal positive, or even negative, overall impact for linguistically motivated approaches compared to approaches that do not use any syntactic or domain knowledge. In some cases, it may be that syntax does not reveal anything of practical importance about document relevance. Yet without a convincing explanation for why linguistic annotations fail in IR, the intuitive appeal of search systems that ‘understand’ text can result in the repeated application, and mis-application, of language processing to enhance search performance. This dissertation investigates whether linguistics can improve the selection of query terms by better modelling the alignment process between natural language requests and search queries. It is the most comprehensive work on the utility of linguistic methods in IR to date. Term selection in this work focuses on identification of informative query terms of 1-3 words that both represent the semantics of a request and discriminate between relevant and non-relevant documents. Approaches to word association are discussed with respect to linguistic principles, and evaluated with respect to semantic characterization and discriminative ability. Analysis is organised around three theories of language that emphasize different structures for the identification of terms: phrase structure theory, dependency theory and lexicalism. The structures identified by these theories play distinctive roles in the organisation of language. Evidence is presented regarding the value of different methods of word association based on these structures, and the effect of method and term combinations. Two highly effective, novel methods for the selection of terms from verbose queries are also proposed and evaluated. The first method focuses on the semantic phenomenon of ellipsis with a discriminative filter that leverages diverse text features. The second method exploits a term ranking algorithm, PhRank, that uses no linguistic information and relies on a network model of query context. The latter focuses queries so that 1-5 terms in an unweighted model achieve better retrieval effectiveness than weighted IR models that use up to 30 terms. In addition, unlike models that use a weighted distribution of terms or subqueries, the concise terms identified by PhRank are interpretable by users. Evaluation with newswire and web collections demonstrates that PhRank-based query reformulation significantly improves performance of verbose queries up to 14% compared to highly competitive IR models, and is at least as good for short, keyword queries with the same models. Results illustrate that linguistic processing may help with the selection of word associations but does not necessarily translate into improved IR performance. Statistical methods are necessary to overcome the limits of syntactic parsing and word adjacency measures for ad hoc IR. As a result, probabilistic frameworks that discover, and make use of, many forms of linguistic evidence may deliver small improvements in IR effectiveness, but methods that use simple features can be substantially more efficient and equally, or more, effective. Various explanations for this finding are suggested, including the probabilistic nature of grammatical categories, a lack of homomorphism between syntax and semantics, the impact of lexical relations, variability in collection data, and systemic effects in language systems.
|
2 |
Google ekonometrie: Aplikace na Českou republiku / Google Econometrics: An Application to the Czech RepublicPlatil, Lukáš January 2014 (has links)
This thesis examines the applicability of Google Econometrics - the use of search volume data of particular queries as explanatory variables in time se- ries modeling - in the case of the Czech Republic. We analyze the contribu- tion of Google data by comparing out-of-sample nowcasting performance and in-sample fit with control variables in three related areas: using an auto- regressive model for unemployment, vector autoregression and logit models for GDP and household consumption, and Granger causality test for consum- er confidence. The improvement in quality of unemployment nowcasting is modest but statistically significant; sentiment index based on Google queries shows reciprocal relationship with the official Consumer Confidence Indicator, and it also provides superior nowcasts for household consumption as well as in- sample fit in logit models; its performance in GDP nowcasting is average among control variables. In overall, the results suggest that Google Econometrics is applicable also to the Czech Republic, despite the fact that the internet penetration rate and Google popularity was lower over the analyzed period compared with developed economies where these methods were usually tested. In the future, Google data may be used together with other leading and coincident indica- tors to...
|
3 |
Named Entity Recognition for Search Queries in the Music Domain / Identifiering av namngivna enheter för sökfrågor inom musikdomänenLiljeqvist, Sandra January 2016 (has links)
This thesis addresses the problem of named entity recognition (NER) in music-related search queries. NER is the task of identifying keywords in text and classifying them into predefined categories. Previous work in the field has mainly focused on longer documents of editorial texts. However, in recent years, the application of NER for queries has attracted increased attention. This task is, however, acknowledged to be challenging due to queries being short, ungrammatical and containing minimal linguistic context. The usage of NER for queries is especially useful for the implementation of natural language queries in domain-specific search applications. These applications are often backed by a database, where the query format otherwise is restricted to keyword search or the usage of a formal query language. In this thesis, two techniques for NER for music-related queries are evaluated; a conditional random field based solution and a probabilistic solution based on context words. As a baseline, the most elementary implementation of NER, commonly applied on editorial text, is used. Both of the evaluated approaches outperform the baseline and demonstrate an overall F1 score of 79.2% and 63.4% respectively. The experimental results show a high precision for the probabilistic approach and the conditional random field based solution demonstrates an F1 score comparable to previous studies from other domains. / Denna avhandling redogör för identifiering av namngivna enheter i musikrelaterade sökfrågor. Identifiering av namngivna enheter innebär att extrahera nyckelord från text och att klassificera dessa till någon av ett antal förbestämda kategorier. Tidigare forskning kring ämnet har framför allt fokuserat på längre redaktionella dokument. Däremot har intresset för tillämpningar på sökfrågor ökat de senaste åren. Detta anses vara ett svårt problem då sökfrågor i allmänhet är korta, grammatiskt inkorrekta och innehåller minimal språklig kontext. Identifiering av namngivna enheter är framför allt användbart för domänspecifika sökapplikationer där målet är att kunna tolka sökfrågor skrivna med naturligt språk. Dessa applikationer baseras ofta på en databas där formatet på sökfrågorna annars är begränsat till att enbart använda nyckelord eller användande av ett formellt frågespråk. I denna avhandling har två tekniker för identifiering av namngivna enheter för musikrelaterade sökfrågor undersökts; en metod baserad på villkorliga slumpfält (eng. conditional random field) och en probabilistisk metod baserad på kontextord. Som baslinje har den mest grundläggande implementationen, som vanligtvis används för redaktionella texter, valts. De båda utvärderade metoderna presterar bättre än baslinjen och ges ett F1-värde på 79,2% respektive 63,4%. De experimentella resultaten visar en hög precision för den probabilistiska implementationen och metoden ba- serad på villkorliga slumpfält visar på resultat på en nivå jämförbar med tidigare studier inom andra domäner.
|
4 |
User-Centric Privacy Preservation in Mobile and Location-Aware ApplicationsGuo, Mingming 10 April 2018 (has links)
The mobile and wireless community has brought a significant growth of location-aware devices including smart phones, connected vehicles and IoT devices. The combination of location-aware sensing, data processing and wireless communication in these devices leads to the rapid development of mobile and location-aware applications. Meanwhile, user privacy is becoming an indispensable concern. These mobile and location-aware applications, which collect data from mobile sensors carried by users or vehicles, return valuable data collection services (e.g., health condition monitoring, traffic monitoring, and natural disaster forecasting) in real time. The sequential spatial-temporal data queries sent by users provide their location trajectory information. The location trajectory information not only contains users’ movement patterns, but also reveals sensitive attributes such as users’ personal habits, preferences, as well as home and work addresses. By exploring this type of information, the attackers can extract and sell user profile data, decrease subscribed data services, and even jeopardize personal safety.
This research spans from the realization that user privacy is lost along with the popular usage of emerging location-aware applications. The outcome seeks to relive user location and trajectory privacy problems. First, we develop a pseudonym-based anonymity zone generation scheme against a strong adversary model in continuous location-based services. Based on a geometric transformation algorithm, this scheme generates distributed anonymity zones with personalized privacy parameters to conceal users’ real location trajectories. Second, based on the historical query data analysis, we introduce a query-feature-based probabilistic inference attack, and propose query-aware randomized algorithms to preserve user privacy by distorting the probabilistic inference conducted by attackers. Finally, we develop a privacy-aware mobile sensing mechanism to help vehicular users reduce the number of queries to be sent to the adversarial servers. In this mechanism, mobile vehicular users can selectively query nearby nodes in a peer-to-peer way for privacy protection in vehicular networks.
|
5 |
Hur sökfraser är utformadeClarinsson, Richard January 2006 (has links)
<p>Millions of people are using search engines every day when they are trying to find information on the Internet. The purpose of this report is to find out how people formulate search queries. The result in this report is based on an empirical study which is based on a search log from the Swedish search engine Seek.se.</p><p>One of the results in this thesis is that nearly all search queries are based on keywords.</p> / <p>Miljoner människor använder sökmotorer varje dag när de försöker hitta information på Internet. Syftet med den här uppsatsen är att ta reda på hur individer formulerar sökfraser. Resultatet i rapporten är baserad på en empirisk studie som är baserad på sökloggen för den svenska sökmotorn Seek.se.</p><p>Uppsatsen kommer bl.a. fram till att nästan alla sökningar som görs på Internet är nyckelordsbaserade.</p>
|
6 |
Hur sökfraser är utformadeClarinsson, Richard January 2006 (has links)
Millions of people are using search engines every day when they are trying to find information on the Internet. The purpose of this report is to find out how people formulate search queries. The result in this report is based on an empirical study which is based on a search log from the Swedish search engine Seek.se. One of the results in this thesis is that nearly all search queries are based on keywords. / Miljoner människor använder sökmotorer varje dag när de försöker hitta information på Internet. Syftet med den här uppsatsen är att ta reda på hur individer formulerar sökfraser. Resultatet i rapporten är baserad på en empirisk studie som är baserad på sökloggen för den svenska sökmotorn Seek.se. Uppsatsen kommer bl.a. fram till att nästan alla sökningar som görs på Internet är nyckelordsbaserade.
|
7 |
Vers un meilleur accès aux informations pertinentes à l’aide du Web sémantique : application au domaine du e-tourisme / Towards a better access to relevant information with Semantic Web : application to the e-tourism domainLully, Vincent 17 December 2018 (has links)
Cette thèse part du constat qu’il y a une infobésité croissante sur le Web. Les deux types d’outils principaux, à savoir le système de recherche et celui de recommandation, qui sont conçus pour nous aider à explorer les données du Web, connaissent plusieurs problématiques dans : (1) l’assistance de la manifestation des besoins d’informations explicites, (2) la sélection des documents pertinents, et (3) la mise en valeur des documents sélectionnés. Nous proposons des approches mobilisant les technologies du Web sémantique afin de pallier à ces problématiques et d’améliorer l’accès aux informations pertinentes. Nous avons notamment proposé : (1) une approche sémantique d’auto-complétion qui aide les utilisateurs à formuler des requêtes de recherche plus longues et plus riches, (2) des approches de recommandation utilisant des liens hiérarchiques et transversaux des graphes de connaissances pour améliorer la pertinence, (3) un framework d’affinité sémantique pour intégrer des données sémantiques et sociales pour parvenir à des recommandations qualitativement équilibrées en termes de pertinence, diversité et nouveauté, (4) des approches sémantiques visant à améliorer la pertinence, l’intelligibilité et la convivialité des explications des recommandations, (5) deux approches de profilage sémantique utilisateur à partir des images, et (6) une approche de sélection des meilleures images pour accompagner les documents recommandés dans les bannières de recommandation. Nous avons implémenté et appliqué nos approches dans le domaine du e-tourisme. Elles ont été dûment évaluées quantitativement avec des jeux de données vérité terrain et qualitativement à travers des études utilisateurs. / This thesis starts with the observation that there is an increasing infobesity on the Web. The two main types of tools, namely the search engine and the recommender system, which are designed to help us explore the Web data, have several problems: (1) in helping users express their explicit information needs, (2) in selecting relevant documents, and (3) in valuing the selected documents. We propose several approaches using Semantic Web technologies to remedy these problems and to improve the access to relevant information. We propose particularly: (1) a semantic auto-completion approach which helps users formulate longer and richer search queries, (2) several recommendation approaches using the hierarchical and transversal links in knowledge graphs to improve the relevance of the recommendations, (3) a semantic affinity framework to integrate semantic and social data to yield qualitatively balanced recommendations in terms of relevance, diversity and novelty, (4) several recommendation explanation approaches aiming at improving the relevance, the intelligibility and the user-friendliness, (5) two image user profiling approaches and (6) an approach which selects the best images to accompany the recommended documents in recommendation banners. We implemented and applied our approaches in the e-tourism domain. They have been properly evaluated quantitatively with ground-truth datasets and qualitatively through user studies.
|
Page generated in 0.0676 seconds