1. Search Term Selection and Document Clustering for Query Suggestion
Zhang, Xiaomin, 06 1900
In order to improve a user's query and help the user quickly satisfy his/her information need, most search engines provide query suggestions that are meant to be relevant alternatives to the user's query. This thesis builds on the query suggestion system and evaluation methodology described in Shen Jiang's Master's thesis (2008). Jiang's system constructs query suggestions by searching for lexical aliases of web documents and then applying query search to the lexical aliases. A lexical alias for a web document is a list of terms that return the web document in a top-ranked position. Query search is a search process that finds useful combinations of search terms. The main focus of this thesis is to supply alternatives for the components of Jiang's system. We suggest three term scoring mechanisms and generalize Jiang's lexical alias search to be a general search for terms that are useful for constructing good query suggestions. We also replace Jiang's top-down query search by a bottom-up beam search method. We experimentally show that our query suggestion method improves Jiang's system by 30% for short queries and 90% for long queries using Jiang's evaluation method. In addition, we add new evidence supporting Jiang's conclusion that terms in the user's initial query are important to include in the query suggestions.
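As a rough illustration of what a bottom-up beam search over term combinations can look like, the sketch below grows candidate queries one term at a time and keeps only the best few at each length. It is not taken from the thesis: the function name `beam_search`, the `toy_score` function, and the term weights are all hypothetical stand-ins for whatever term scoring mechanism is actually used.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Query = FrozenSet[str]


def beam_search(
    terms: Iterable[str],
    score: Callable[[Query], float],
    beam_width: int = 2,
    max_len: int = 3,
) -> List[Tuple[Query, float]]:
    """Build queries bottom-up, keeping the `beam_width` best per length."""
    vocab = sorted(set(terms))
    beam: List[Query] = [frozenset()]  # start from the empty query
    kept: Dict[Query, float] = {}
    for _ in range(max_len):
        # Extend every query in the beam by one term it does not yet contain.
        candidates = {q | {t} for q in beam for t in vocab if t not in q}
        if not candidates:
            break
        # Rank by descending score; break ties alphabetically for determinism.
        ranked = sorted(candidates, key=lambda q: (-score(q), sorted(q)))
        beam = ranked[:beam_width]
        for q in beam:
            kept[q] = score(q)
    return sorted(kept.items(), key=lambda kv: (-kv[1], sorted(kv[0])))


# Hypothetical term weights and scoring rule, for illustration only.
WEIGHTS = {"jaguar": 3.0, "car": 2.0, "speed": 1.0, "cat": 0.5}


def toy_score(query: Query) -> float:
    """Sum of term weights with a small penalty for each extra term."""
    return sum(WEIGHTS[t] for t in query) - 0.8 * (len(query) - 1)


suggestions = beam_search(WEIGHTS, toy_score, beam_width=2, max_len=3)
best_query, best_score = suggestions[0]
print(sorted(best_query), round(best_score, 2))
```

The beam width trades search cost against the risk of pruning a term combination that would only have paid off at a greater length.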
In addition, we explore the usefulness of document clustering in creating query suggestions. Our experimental results are the opposite of what we expected: query suggestion based on clustering does not perform nearly as well, in terms of the "coverage" scores we are using for evaluation, as our best method that is not based on document clustering.
2. Search Term Selection and Document Clustering for Query Suggestion
Zhang, Xiaomin, Unknown Date
No description available.
3. A Content Analysis of Online HPV Immunization Information
Pappa, Sara T., January 2016
No description available.
4. The Impact of Database Querying Exactitude in Intellectual Property Law Practice in Brazil
Hemerly, Henrique, January 2020
In current business affairs, most executive professions require one or several kinds of data consultation in their practice. Nowadays, the majority of data either is or has been digitalized, and digital data is defined as information represented in a discrete and discontinuous manner. For accessibility purposes, data are often stored in databases that organize information via design and modeling techniques to facilitate querying. Data retrieval is crucial, and if this process lacks efficacy, users are either presented with incomplete information or forced to perform repetitive queries. Intellectual property (IP) lawyers in Brazil are among that group and must regularly access a private database for trademark information. While it contains all the data they require, the database’s querying mechanisms are not tailored for IP law practice. The existing filters and lack of replacement algorithms often yield incomplete results, increasing the time and resources expended. With millions of dollars in potential lawsuits and work-hours at stake, the purpose of this study is to investigate whether an IP-focused querying system could help mitigate this resource waste, facilitating the trademark comparison work of IP lawyers. For this, a new orthographically and phonetically focused querying logic was implemented. ANOVA tests and a questionnaire were used to compare the existing querying mechanism with the new one in terms of time, work satisfaction and querying accuracy. Results indicate that the new querying system significantly decreased the number of searches needed to execute a complete trademark analysis, while lawyers averaged the same amount of time to complete their work. Lawyers also reported higher work satisfaction levels and a perceived increase in work efficiency.
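The abstract does not describe how the orthographic and phonetic querying logic works, so the following is only a generic sketch of the idea, not the study's method. It combines a deliberately crude, made-up phonetic key (`phonetic_key`, with an illustrative replacement table) with the spelling-similarity ratio from Python's standard `difflib.SequenceMatcher`. The trademark names and the `min_ratio` threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, List

# Illustrative letter substitutions for similar sounds; not a standard algorithm.
_REPLACEMENTS = [("ph", "f"), ("ck", "k"), ("c", "k"), ("q", "k"),
                 ("z", "s"), ("y", "i"), ("w", "v")]


def phonetic_key(name: str) -> str:
    """Reduce a name to a rough sound signature."""
    text = re.sub(r"[^a-z]", "", name.lower())
    for old, new in _REPLACEMENTS:
        text = text.replace(old, new)
    text = re.sub(r"(.)\1+", r"\1", text)  # collapse doubled letters
    if not text:
        return ""
    # Keep the first letter, drop vowels from the remainder.
    return text[0] + re.sub(r"[aeiou]", "", text[1:])


def spelling_ratio(a: str, b: str) -> float:
    """Orthographic similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_similar(query: str, marks: Iterable[str], min_ratio: float = 0.8) -> List[str]:
    """Return marks that sound like the query or are spelled almost like it."""
    key = phonetic_key(query)
    return [
        mark for mark in marks
        if phonetic_key(mark) == key or spelling_ratio(query, mark) >= min_ratio
    ]


registered = ["Kodak", "Codac", "Kodiak", "Nestle"]  # made-up sample register
matches = find_similar("Kodak", registered)
print(matches)
```

A single query that returns both spelling and sound variants is one plausible way a system could reduce the number of repeated searches, though the study itself may have used a different mechanism.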
5. EXPLORING HEALTH WEBSITE USERS BY WEB MINING
Kong, Wei, 07 1900
Indiana University-Purdue University Indianapolis (IUPUI) / With the continuous growth of health information on the Internet, providing user-orientated health service online has become a great challenge to health providers. Understanding the information needs of the users is the first step to providing tailored health service. The purpose of this study is to examine the navigation behavior of different user groups by extracting their search terms and to make some suggestions to reconstruct a website for more customized Web service. This study analyzed five months of daily access weblog files from one local health provider’s website, discovered the most popular general topics and health-related topics, and compared the information search strategies of the patient/consumer and doctor groups. Our findings show that users are not searching for health information as much as was thought. The top two health topics that patients are concerned about are children’s health and occupational health. Another topic that both user groups are interested in is medical records. Also, patients and doctors have different search strategies when looking for information on this website. Patients return to the previous page more often, while doctors usually go to the final page directly and then leave the page without coming back. As a result, some suggestions to redesign and improve the website are discussed; a more intuitive portal and more customized links for both user groups are suggested.
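To make the idea of extracting search terms from access logs concrete, here is a minimal sketch that counts terms found in the query string of requested URLs. The log format, the sample request paths, and the search parameter name `q` are assumptions for illustration; the study's actual weblog layout is not given in the abstract.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import parse_qs, urlsplit


def extract_search_terms(request_paths: Iterable[str], param: str = "q") -> Counter:
    """Count lower-cased search terms found in the `param` query parameter."""
    counts: Counter = Counter()
    for path in request_paths:
        # parse_qs decodes percent-escapes and treats '+' as a space.
        query = parse_qs(urlsplit(path).query)
        for value in query.get(param, []):
            counts.update(value.lower().split())
    return counts


# Made-up request paths standing in for lines from an access log.
sample_paths = [
    "/search?q=children+health",
    "/search?q=Medical%20records",
    "/about",
    "/search?q=children+vaccine",
]
term_counts = extract_search_terms(sample_paths)
print(term_counts.most_common(3))
```

Grouping these counts by user type would additionally require some way to tell patient sessions from doctor sessions, which this sketch does not attempt.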
6. Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed
Eisinger, Daniel, 07 October 2013
The patent domain is a very important source of scientific information that is currently not used to its full potential. Searching for relevant patents is a complex task because the number of existing patents is very high and grows quickly, patent text is extremely complicated, and standard vocabulary is not used consistently or doesn’t even exist. As a consequence, pure keyword searches often fail to return satisfying results in the patent domain. Major companies employ patent professionals who are able to search patents effectively, but even they have to invest a lot of time and effort into their search. Academic scientists on the other hand do not have access to such resources and therefore often do not search patents at all, but they risk missing up-to-date information that will not be published in scientific publications until much later, if it is published at all.
Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Similarly, professional patent searches expand beyond keywords by including class codes from various patent classification systems. However, classification-based searches can only be performed effectively if the user has very detailed knowledge of the system, which is usually not the case for academic scientists. Consequently, we investigated methods to automatically identify relevant classes that can then be suggested to the user to expand their query. Since every patent is assigned at least one class code, it should be possible for these assignments to be used in a similar way as the MeSH annotations in PubMed.
In order to develop a system for this task, it is necessary to have a good understanding of the properties of both classification systems. In order to gain such knowledge, we perform an in-depth comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms/classes respectively, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows that the hierarchies are structurally similar, but terms and annotations differ significantly. The most important differences concern the considerably higher complexity of the IPC class definitions compared to MeSH terms and the far lower number of class assignments to the average patent compared to the number of MeSH terms assigned to PubMed documents.
These differences cause problems for both inexperienced patent searchers and professionals. On the one hand, the complex term system makes it very difficult for members of the former group to find any IPC classes that are relevant for their search task. On the other hand, the low number of IPC classes per patent points to incomplete class assignments by the patent office, therefore limiting the recall of the classification-based searches that are frequently performed by the latter group. We approach these problems from two directions: first, by automatically assigning additional patent classes to make up for the missing assignments, and second, by automatically retrieving relevant keywords and classes that are proposed to the user so they can expand their initial search.
For the automated assignment of additional patent classes, we adapt an approach to the patent domain that was successfully used for the assignment of MeSH terms to PubMed abstracts. Each document is assigned a set of IPC classes by a large set of binary Maximum-Entropy classifiers. Our evaluation shows good performance by individual classifiers (precision/recall between 0.84 and 0.90), making the retrieval of additional relevant documents for specific IPC classes feasible. The assignment of additional classes to specific documents is more problematic, since the precision of our classifiers is not high enough to avoid false positives. However, we propose filtering methods that can help solve this problem.
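For readers unfamiliar with how precision and recall are computed for one binary classifier in a multi-label setting, the sketch below evaluates a single class code against true and predicted label sets. The function name `precision_recall` and the sample class assignments are invented for illustration and are not results from the thesis.

```python
from typing import Iterable, Set, Tuple


def precision_recall(
    true_labels: Iterable[Set[str]],
    predicted_labels: Iterable[Set[str]],
    target: str,
) -> Tuple[float, float]:
    """Precision and recall of the binary decision 'document has class `target`'."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        in_truth, in_pred = target in truth, target in predicted
        if in_truth and in_pred:
            tp += 1  # correctly assigned
        elif in_pred:
            fp += 1  # assigned, but not in the reference labels
        elif in_truth:
            fn += 1  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up reference and predicted class sets for four documents.
truth = [{"A61K", "C07D"}, {"A61K"}, {"G06F"}, {"C07D"}]
predicted = [{"A61K"}, {"A61K", "G06F"}, {"G06F"}, {"A61K"}]

p, r = precision_recall(truth, predicted, "G06F")
print(p, r)
```

A false positive here corresponds to the problem the abstract mentions: an extra class attached to a specific document that does not belong to it.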
For the guided patent search, we demonstrate various methods to expand a user’s initial query. Our methods use both keywords and class codes that the user enters to retrieve additional relevant keywords and classes that are then suggested to the user. These additional query components are extracted from different sources such as patent text, IPC definitions, external vocabularies and co-occurrence data. The suggested expansions can help inexperienced users refine their queries with relevant IPC classes, and professionals can compose their complete query faster and more easily. We also present GoPatents, a patent retrieval prototype that incorporates some of our proposals and makes faceted browsing of a patent corpus possible.
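One simple way to picture class suggestion from co-occurrence data is to count which class codes appear on documents that share a keyword with the query. The sketch below does exactly that and nothing more; the function `suggest_classes`, the tiny corpus, and its keyword and class assignments are hypothetical, and the thesis may weight or filter co-occurrences quite differently.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Patent = Tuple[Set[str], Set[str]]  # (keywords, class codes)


def suggest_classes(query_terms: Iterable[str], patents: Iterable[Patent],
                    top_n: int = 3) -> List[str]:
    """Rank class codes by how many keyword-matching patents carry them."""
    query = {term.lower() for term in query_terms}
    counts: Counter = Counter()
    for keywords, classes in patents:
        # A patent counts if it shares at least one keyword with the query.
        if query & {k.lower() for k in keywords}:
            counts.update(set(classes))
    # Sort by descending count, then by code, so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [code for code, _ in ranked[:top_n]]


# Made-up corpus: keyword sets paired with class codes.
corpus = [
    ({"antibody", "cancer"}, {"A61K", "C07K"}),
    ({"antibody", "assay"}, {"G01N", "C07K"}),
    ({"engine"}, {"F02B"}),
]
suggested = suggest_classes(["antibody"], corpus)
print(suggested)
```

Raw counts favour very common classes, so a practical system would likely normalise by overall class frequency before ranking.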