  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
521

Implications of Punctuation Mark Normalization on Text Retrieval

Kim, Eungi 08 1900 (has links)
This research investigated issues related to normalizing punctuation marks from a text retrieval perspective. A punctuation-centric approach was undertaken by exploring changes in meaning, whitespace, word retrievability, and other issues related to normalizing punctuation marks. To investigate punctuation normalization issues, various frequency counts of punctuation marks and punctuation patterns were conducted using text drawn from the Gutenberg Project archive and the Usenet Newsgroup archive. A number of useful punctuation mark types that could aid in analyzing punctuation marks were discovered. This study identified two types of punctuation normalization procedures: (1) lexical independent (LI) punctuation normalization and (2) lexical oriented (LO) punctuation normalization. Using these two procedures, this study discovered various effects of punctuation normalization for different search query types. Analyzing the punctuation normalization problem in this manner revealed a wide range of issues, such as the need to define different types of searching, to disambiguate the role of punctuation marks, to normalize whitespace, and to index punctuated terms. This study concluded that to achieve the most positive effect in a text retrieval environment, normalizing punctuation marks should be based on an extensive, systematic analysis of punctuation marks, punctuation patterns, and their related factors. The results indicate that many challenges remain due to the complexity of language. Accordingly, this study recommends avoiding a simplistic approach to punctuation normalization.
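The distinction the abstract draws between the two procedure types can be illustrated with a minimal sketch. The function names and the specific character rules below are our own illustration, not rules taken from the dissertation: LI normalization strips punctuation without regard to lexical context, while LO normalization keeps marks that are part of a token's identity (intra-word periods, apostrophes) and removes only delimiters.

```python
import re

def li_normalize(text):
    """Lexical independent (LI) normalization: remove every punctuation
    mark and collapse whitespace, ignoring the surrounding lexical context."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", text)).strip()

def lo_normalize(text):
    """Lexical oriented (LO) normalization (illustrative rule): keep a
    punctuation mark only when it is flanked by word characters on both
    sides, so intra-word marks like the periods in acronyms or the
    apostrophe in contractions survive, while sentence delimiters do not."""
    cleaned = re.sub(r"(?<!\w)[^\w\s]|[^\w\s](?!\w)", " ", text)
    return re.sub(r"\s+", " ", cleaned).strip()
```

Run on the same sentence, the two procedures index different terms: LI makes "U.S." unretrievable as a unit, while LO preserves it.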
522

A Study on Fine-Grained User Behavior Analysis in Web Search / Web検索における細粒度ユーザ行動の分析に関する研究

Umemoto, Kazutoshi 23 March 2016 (has links)
Kyoto University / 0048 / New-system doctoral program / Doctor of Informatics / Degree No. 甲第19852号 / 情博第603号 / 新制||情||105 (University Library) / 32888 / Department of Social Informatics, Graduate School of Informatics, Kyoto University / Examiners: Professor Katsumi Tanaka, Professor Toru Ishida, Professor Masatoshi Yoshikawa / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
523

Information Retrieval for Call Center Quality Assurance

McMurtry, William F. 02 October 2020 (has links)
No description available.
524

Toward an Effective Automated Tracing Process

Mahmoud, Anas Mohammad 17 May 2014 (has links)
Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both forward and backward directions, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use are still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhausting, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process.
Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.
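The IR-based trace link retrieval the abstract refers to is commonly realized by ranking target artifacts against a source artifact with a vector-space model. The sketch below (our illustration, not the dissertation's implementation) ranks candidate links by TF-IDF cosine similarity, leaving the engineer to verify the top-ranked candidates:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF vector (term -> weight) for each token list in docs."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_links(source_doc, target_docs):
    """Rank target artifacts (by index) against a source artifact,
    most similar first; the engineer then verifies the top candidates."""
    vecs = tfidf_vectors([source_doc] + target_docs)
    q, rest = vecs[0], vecs[1:]
    return sorted(range(len(rest)), key=lambda i: cosine(q, rest[i]), reverse=True)
```

The "considerable amount of time verifying" problem the abstract describes arises exactly here: the ranked list is only a candidate set, and its accuracy determines how much human vetting is needed.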
525

Information Retrieval Using Lucene and WordNet

Whissel, Jhon F. 23 December 2009 (has links)
No description available.
526

An Infrastructure for Performance Measurement and Comparison of Information Retrieval Solutions

Saunders, Gary 13 August 2008 (has links) (PDF)
The amount of information available on both public and private networks continues to grow at a phenomenal rate. This information is contained within a wide variety of objects, including documents, e-mail archives, medical records, manuals, pictures and music. To be of any value, this data must be easily searchable and accessible. Information Retrieval (IR) is concerned with the ability to find and gain access to relevant information. As electronic data repositories continue to proliferate, so too grows the variety of methods used to locate and access the information contained therein. Similarly, the introduction of innovative retrieval strategies—and the optimization of older strategies—emphasizes the need for an infrastructure capable of measuring and comparing the performance of competing Information Retrieval solutions, but such an environment does not yet exist. The purpose of this research is to develop an infrastructure wherein Information Retrieval solutions may be evaluated and compared. In 1979, an expert in the field believed the need for a system-independent benchmarking utility was long overdue—twenty-five years later, progress in this area has been minimal. Contrastingly, new theories have emerged; new techniques have been introduced; all with the goal of improving retrieval performance. The need for a system-independent analysis of retrieval performance is now more critical than ever.
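System-independent comparison of the kind this abstract calls for typically rests on standard effectiveness metrics computed from each system's ranked output against shared relevance judgments. A small, self-contained sketch of average precision and MAP (function names are ours, not the thesis's):

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of precision@k taken at
    each rank k where a relevant document appears, divided over all
    relevant documents (missed ones contribute zero)."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked_ids, relevant_ids) pairs, one pair per query —
    a standard system-independent figure for comparing IR solutions."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because both metrics depend only on the ranked list and the judgments, any two competing retrieval solutions can be scored on the same footing regardless of their internals.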
527

HyKSS: Hybrid Keyword and Semantic Search

Zitzelberger, Andrew J. 09 August 2011 (has links) (PDF)
The rapid production of digital information makes the task of locating relevant information increasingly difficult. Keyword search alleviates this difficulty by retrieving documents containing keywords of interest. However, keyword search suffers from a number of issues such as ambiguity, synonymy, and the inability to handle semantic constraints. Semantic search helps resolve these issues but is limited by the quality of annotations, which are likely to be incomplete or imprecise. Hybrid search, a search technique that combines the merits of both keyword and semantic search, appears to be a promising solution. In this work we introduce HyKSS, a hybrid search system driven by extraction ontologies for both annotation creation and query interpretation. HyKSS is not limited to a single domain, but rather allows queries to cross ontological boundaries. We show that our hybrid search system, which uses a query driven dynamic ranking mechanism, outperforms keyword and semantic search in isolation, as well as a number of other non-HyKSS hybrid ranking approaches, over data sets of short topical documents. We also find that there is not a statistically significant difference between using multiple ontologies for query generation and simply selecting and using the best matching ontology.
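To make the hybrid idea concrete, here is one simple way to blend the two evidence sources: a normalized keyword score plus the fraction of the query's semantic constraints that a document's annotations satisfy. This is a generic linear blend for illustration only; it is not HyKSS's query-driven dynamic ranking mechanism, and the `alpha` weight is an arbitrary assumption:

```python
def hybrid_score(keyword_score, semantic_matches, semantic_total, alpha=0.5):
    """Blend a keyword score in [0, 1] with a semantic-constraint
    satisfaction ratio. alpha weights the keyword component; documents
    with no applicable semantic constraints fall back to keywords alone."""
    semantic_score = semantic_matches / semantic_total if semantic_total else 0.0
    return alpha * keyword_score + (1 - alpha) * semantic_score
```

A document matching the query's keywords well (0.8) but satisfying only one of two semantic constraints would score 0.65 under this blend, below a document strong on both components.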
528

Exploring Privacy and Personalization in Information Retrieval Applications

Feild, Henry A. 01 September 2013 (has links)
A growing number of information retrieval applications rely on search behavior aggregated over many users. If aggregated data such as search query reformulations is not handled properly, it can allow users to be identified and their privacy compromised. Besides leveraging aggregate data, it is also common for applications to make use of user-specific behavior in order to provide a personalized experience for users. Unlike with aggregate data, privacy is not an issue in individual personalization, since users are the only consumers of their own data. The goal of this work is to explore the effects of personalization and privacy preservation methods on three information retrieval applications, namely search task identification, task-aware query recommendation, and searcher frustration detection. We pursue this goal by first introducing a novel framework called CrowdLogging for logging and aggregating data privately over a distributed set of users. We then describe several privacy mechanisms for sanitizing global data, including one novel mechanism based on differential privacy. We present a template for describing how local user data and global aggregate data are collected, processed, and used within an application, and apply this template to our three applications. We find that sanitizing feature vectors aggregated across users has a low impact on performance for classification applications (search task identification and searcher frustration detection). However, sanitizing free-text query reformulations is extremely detrimental to performance for the query recommendation application we consider. Personalization is useful to some degree in all the applications we explore when integrated with global information, achieving gains for search task identification, task-aware query recommendation, and searcher frustration detection.
Finally we introduce an open source system called CrowdLogger that implements the CrowdLogging framework and also serves as a platform for conducting in-situ user studies of search behavior, prototyping and evaluating information retrieval applications, and collecting labeled data.
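The abstract's finding that sanitized aggregate feature vectors remain useful for classification is easiest to see with the classic Laplace mechanism for epsilon-differential privacy, sketched below. This is the textbook mechanism, not necessarily the dissertation's exact variant, and the sensitivity and epsilon values are illustrative:

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Draw one Laplace(0, scale) sample via inverse-CDF sampling."""
    u = rng.random() - 0.5
    sign = -1.0 if u < 0 else 1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

def sanitize_counts(counts, epsilon, sensitivity=1.0, rng=random):
    """Add Laplace noise with scale = sensitivity / epsilon to each
    aggregated feature count. The noise is zero-mean, so aggregate
    statistics over many features stay close to their true values,
    which is why downstream classifiers degrade only slightly."""
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]
```

By contrast, free-text query reformulations cannot be "slightly noised" this way, which matches the abstract's observation that sanitizing them is far more damaging.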
529

Query-Dependent Selection of Retrieval Alternatives

Balasubramanian, Niranjan 01 September 2011 (has links)
The main goal of this thesis is to investigate query-dependent selection of retrieval alternatives for Information Retrieval (IR) systems. Retrieval alternatives include choices in representing queries (query representations), and choices in methods used for scoring documents. For example, an IR system can represent a user query without any modification, automatically expand it to include more terms, or reduce it by dropping some terms. The main motivation for this work is that no single query representation or retrieval model performs the best for all queries. This suggests that selecting the best representation or retrieval model for each query can yield improved performance. The key research question in selecting between alternatives is how to estimate the performance of the different alternatives. We treat query-dependent selection as a general problem of selecting between the result sets of different alternatives. We develop a relative effectiveness estimation technique using retrieval-based features and a learning formulation that directly predicts differences between the result sets. The main idea behind this technique is to aggregate the scores and features used for retrieval (retrieval-based features) as evidence towards the effectiveness of the result set. We apply this general technique to select among alternative reduced versions of long queries and to combine multiple ranking algorithms. Then, we investigate the extension of query-dependent selection under specific efficiency constraints. Specifically, we consider the black-box meta-search scenario, where querying all available search engines can be expensive and the features and scores used by the search engines are not available. We develop easy-to-compute features based on the results page alone to predict when querying an alternate search engine can be useful. Finally, we present an analysis of selection performance to better understand when query-dependent selection can be useful.
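The core idea — aggregating retrieval scores of each result set as evidence of its effectiveness, then picking the set predicted to be better — can be sketched as follows. The feature set and the linear estimator here are stand-ins for the learned formulation in the thesis, not its actual model:

```python
def score_features(result_scores, k=10):
    """Aggregate the top-k retrieval scores of one result set into
    simple effectiveness evidence: mean, max, and spread of the top-k."""
    top = sorted(result_scores, reverse=True)[:k]
    return [sum(top) / len(top), top[0], top[0] - top[-1]]

def select_result_set(candidates, weights):
    """Return the index of the candidate result set (each a list of
    retrieval scores) whose weighted feature score is highest — a
    stand-in for a learned relative-effectiveness estimator."""
    def predicted(scores):
        return sum(w * f for w, f in zip(weights, score_features(scores)))
    return max(range(len(candidates)), key=lambda i: predicted(candidates[i]))
```

In the black-box meta-search setting the abstract describes, the same selection step would have to use features computed from the results page alone, since the engines' internal scores are unavailable.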
530

Query-Time Optimization Techniques for Structured Queries in Information Retrieval

Cartright, Marc-Allen 01 September 2013 (has links)
The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective, this translates into an increasing computational cost to generate the final ranked list in response to a query. Therefore we encounter an increasing tension in the trade-off between retrieval effectiveness (quality of result list) and efficiency (the speed at which the list is generated). This tension creates a strong need for optimization techniques to improve the efficiency of ranking with respect to these more complex retrieval models. This thesis presents three new optimization techniques designed to deal with different aspects of structured queries. The first technique involves manipulation of interpolated subqueries, a common structure found across a large number of retrieval models today. We then develop an alternative scoring formulation to make retrieval models more responsive to dynamic pruning techniques. The last technique is delayed execution, which focuses on the class of queries that utilize term dependencies and term conjunction operations. In each case, we empirically show that these optimizations can significantly improve query processing efficiency without negatively impacting retrieval effectiveness. Additionally, we implement these optimizations in the context of a new retrieval system known as Julien. As opposed to implementing these techniques as one-off solutions hard-wired to specific retrieval models, we treat each technique as a "behavioral" extension to the original system. This allows us to flexibly stack the modifications to use the optimizations in conjunction, increasing efficiency even further.
By focusing on the behaviors of the objects involved in the retrieval process instead of on the details of the retrieval algorithm itself, we can recast these techniques to be applied only when the conditions are appropriate. Finally, the modular design of these components illustrates a system design that allows improvements to be implemented without disturbing the existing retrieval infrastructure.
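Dynamic pruning, which the abstract's second technique targets, skips fully scoring documents whose score upper bound cannot reach the current top-k. The sketch below is a simplified MaxScore-style illustration (real implementations traverse posting lists in order rather than iterating candidate documents; the data layout is our assumption):

```python
import heapq

def maxscore_topk(postings, max_scores, k):
    """Top-k retrieval with upper-bound pruning.
    postings: term -> {doc_id: score}; max_scores: term -> score bound.
    A document is skipped without scoring when the sum of bounds for the
    terms it contains cannot beat the current k-th best score."""
    docs = set(d for p in postings.values() for d in p)
    heap = []  # min-heap of (score, doc_id) for the current top k
    for d in sorted(docs):
        bound = sum(ub for t, ub in max_scores.items() if d in postings[t])
        if len(heap) == k and bound <= heap[0][0]:
            continue  # pruned: upper bound cannot enter the top k
        score = sum(postings[t].get(d, 0.0) for t in postings)
        if len(heap) < k:
            heapq.heappush(heap, (score, d))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, d))
    return sorted(heap, reverse=True)
```

The "alternative scoring formulation" mentioned in the abstract aims precisely at making bounds like `max_scores` tighter, so that the pruning branch fires more often.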
