501 |
Use of ontologies in information extraction / Wimalasuriya, Daya Chinthana 03 1900
xiii, 149 p. : ill. (some col.) / Information extraction (IE) aims to recognize and retrieve certain types of information from natural language text. For instance, an information extraction system may extract key geopolitical indicators about countries from a set of web pages while ignoring other types of information. IE has existed as a research field for a few decades, and ontology-based information extraction (OBIE) has recently emerged as one of its subfields. Here, the general idea is to use ontologies--which provide formal and explicit specifications of shared conceptualizations--to guide the information extraction process. This dissertation presents two novel directions for ontology-based information extraction in which ontologies are used to improve the information extraction process.
First, I describe how a component-based approach for information extraction can be designed through the use of ontologies in information extraction. A key idea in this approach is identifying components of information extraction systems which make extractions with respect to specific ontological concepts. These components are termed "information extractors". The component-based approach explores how information extractors as well as other types of components can be used in developing information extraction systems. This approach has the potential to make a significant contribution towards the widespread usage and commercialization of information extraction.
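The component idea described above can be illustrated with a minimal sketch. It assumes that an "information extractor" is simply a callable bound to one ontological concept; the concept names, the regular expressions, and the `ExtractionSystem` class are hypothetical and are not taken from the dissertation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Extraction:
    concept: str  # ontological concept the value is attached to
    value: str    # text span pulled from the document


# Assumption: an information extractor is any callable that takes text and
# returns extractions for exactly one ontological concept.
Extractor = Callable[[str], List[Extraction]]


def regex_extractor(concept: str, pattern: str) -> Extractor:
    """Build a hypothetical rule-based extractor for one concept."""
    compiled = re.compile(pattern)

    def extract(text: str) -> List[Extraction]:
        return [Extraction(concept, m.group(1)) for m in compiled.finditer(text)]

    return extract


class ExtractionSystem:
    """Composes independent extractors, one per ontological concept."""

    def __init__(self) -> None:
        self.extractors: Dict[str, Extractor] = {}

    def register(self, concept: str, extractor: Extractor) -> None:
        self.extractors[concept] = extractor

    def run(self, text: str) -> List[Extraction]:
        results: List[Extraction] = []
        for extractor in self.extractors.values():
            results.extend(extractor(text))
        return results


system = ExtractionSystem()
system.register(
    "Country.capital",
    regex_extractor("Country.capital", r"capital(?: city)? is (\w+)"),
)
system.register(
    "Country.population",
    regex_extractor("Country.population", r"population of ([\d,]+)"),
)
found = system.run("The capital is Lima. It has a population of 33,000,000 people.")
```

Because each extractor is keyed by a concept rather than by the system that hosts it, the same component could in principle be registered in another system, or against a concept from a second ontology.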
Second, I describe how an ontology-based information extraction system can make use of multiple ontologies. Almost all previous systems use a single ontology, although multiple ontologies are available for most domains. Using multiple ontologies in information extraction has the potential to extract more information from text and thus leads to an improvement in performance measures. The concept of information extractor, conceived in the component-based approach for information extraction, is used in designing the principles for accommodating multiple ontologies in an ontology-based information extraction system. / Committee in charge: Dr. Dejing Dou, Chair;
Dr. Arthur Farley, Member;
Dr. Michal Young, Member;
Dr. Monte Westerfield, Outside Member
|
502 |
Term selection in information retrieval / Maxwell, Kylie Tamsin January 2016
Systems trained on linguistically annotated data achieve strong performance for many language processing tasks. This encourages the idea that annotations can improve any language processing task if applied in the right way. However, despite widespread acceptance and availability of highly accurate parsing software, it is not clear that ad hoc information retrieval (IR) techniques using annotated documents and requests consistently improve search performance compared to techniques that use no linguistic knowledge. In many cases, retrieval gains made using language processing components, such as part-of-speech tagging and head-dependent relations, are offset by significant negative effects. This results in a minimal positive, or even negative, overall impact for linguistically motivated approaches compared to approaches that do not use any syntactic or domain knowledge. In some cases, it may be that syntax does not reveal anything of practical importance about document relevance. Yet without a convincing explanation for why linguistic annotations fail in IR, the intuitive appeal of search systems that ‘understand’ text can result in the repeated application, and mis-application, of language processing to enhance search performance. This dissertation investigates whether linguistics can improve the selection of query terms by better modelling the alignment process between natural language requests and search queries. It is the most comprehensive work on the utility of linguistic methods in IR to date. Term selection in this work focuses on identification of informative query terms of 1-3 words that both represent the semantics of a request and discriminate between relevant and non-relevant documents. Approaches to word association are discussed with respect to linguistic principles, and evaluated with respect to semantic characterization and discriminative ability. 
Analysis is organised around three theories of language that emphasize different structures for the identification of terms: phrase structure theory, dependency theory and lexicalism. The structures identified by these theories play distinctive roles in the organisation of language. Evidence is presented regarding the value of different methods of word association based on these structures, and the effect of method and term combinations. Two highly effective, novel methods for the selection of terms from verbose queries are also proposed and evaluated. The first method focuses on the semantic phenomenon of ellipsis with a discriminative filter that leverages diverse text features. The second method exploits a term ranking algorithm, PhRank, that uses no linguistic information and relies on a network model of query context. The latter focuses queries so that 1-5 terms in an unweighted model achieve better retrieval effectiveness than weighted IR models that use up to 30 terms. In addition, unlike models that use a weighted distribution of terms or subqueries, the concise terms identified by PhRank are interpretable by users. Evaluation with newswire and web collections demonstrates that PhRank-based query reformulation significantly improves performance of verbose queries up to 14% compared to highly competitive IR models, and is at least as good for short, keyword queries with the same models. Results illustrate that linguistic processing may help with the selection of word associations but does not necessarily translate into improved IR performance. Statistical methods are necessary to overcome the limits of syntactic parsing and word adjacency measures for ad hoc IR. As a result, probabilistic frameworks that discover, and make use of, many forms of linguistic evidence may deliver small improvements in IR effectiveness, but methods that use simple features can be substantially more efficient and equally, or more, effective. 
Various explanations for this finding are suggested, including the probabilistic nature of grammatical categories, a lack of homomorphism between syntax and semantics, the impact of lexical relations, variability in collection data, and systemic effects in language systems.
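To make the notion of ranking terms over a network model of query context more concrete, the following is a generic sketch of a PageRank-style walk over a word co-occurrence graph. It is not PhRank itself, whose details are not given in the abstract; the function name `rank_terms`, the damping value, and the toy contexts are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def rank_terms(contexts, damping=0.85, iterations=50):
    """Order words by a PageRank-style score on a co-occurrence graph."""
    # Build an undirected weighted graph: two words are linked each time
    # they appear together in the same context (e.g. a retrieved passage).
    weights = defaultdict(lambda: defaultdict(float))
    for words in contexts:
        for a, b in combinations(sorted(set(words)), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0

    nodes = sorted(weights)
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            # Each neighbour m passes on its score in proportion to the
            # weight of the edge m-n relative to all of m's edges.
            incoming = sum(
                scores[m] * weights[m][n] / sum(weights[m].values())
                for m in weights[n]
            )
            new_scores[n] = (1 - damping) / len(nodes) + damping * incoming
        scores = new_scores
    return sorted(scores, key=scores.get, reverse=True)


contexts = [
    ["cheap", "flights", "paris"],
    ["flights", "paris", "hotel"],
    ["paris", "weather"],
]
ranked = rank_terms(contexts)
```

A short query could then be formed from the top few entries of `ranked`, which mirrors the abstract's point that a handful of unweighted terms can stand in for a verbose request.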
|
503 |
Ontologia como interface de apresentação de resultados de busca: uma proposta baseada no modelo espaço vetorial [Ontology as an interface for presenting search results: a proposal based on the vector space model] / Lopes, Tatiane dos Santos de Freitas. January 2017
Advisor: Edberto Ferneda / Committee: Maria José Vicentini Jorente / Committee: Luciana Maria Vieira Pöttker / Abstract: An information retrieval system is a mediating element between a document collection and the users who are looking for relevant documents. In this context, interfaces play an important role: first, by helping users express their information need as a search expression and, second, by providing resources that help them select relevant documents from the results obtained. Information retrieval is a linguistic process whose efficiency depends on terminological matches between the user's search expression and the representation of the documents.
This work proposes an interface model in which the terminological structure of an ontology is used to assist the user in selecting relevant documents among those resulting from their search. The research is applied in nature, and exploratory and bibliographic in its procedures. It concludes that the visual presentation of an ontology allows the development of dynamic and interactive interfaces, giving the user stimulating and pleasant navigation among the documents resulting from their search, based on the terms of a given knowledge area. / Master's
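The dependence on terminological matches can be seen in a minimal vector space sketch: documents and the search expression become term-frequency vectors and are compared by cosine similarity. This is a textbook illustration under simplifying assumptions (whitespace tokenisation, raw term frequency, no ontology); the function names are hypothetical and do not describe the proposed interface.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as a bag of lower-cased terms with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return documents sharing at least one term with the query, best first."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored if score > 0]


documents = [
    "ontology based search interface",
    "vector space retrieval model",
    "music visualization",
]
results = rank("ontology interface", documents)
```

A document that uses different words for the same idea scores zero here, which is the gap that a terminological structure such as an ontology is meant to help the user bridge.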
|
504 |
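Tratamento de listas na linguagem FORTRAN-sistema SLIP. Implantacao no computador IBM 1620. Servico de calculo analogico e digital [List processing in the FORTRAN language, SLIP system. Implementation on the IBM 1620 computer. Analog and digital computing service] / SILVA, LUCIA F. 09 October 2014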
Dissertation (Master's) / IEA/D / Instituto de Pesquisas Energeticas e Nucleares - IPEN/CNEN-SP
|
505 |
Visualização de similaridades em bases de dados de música / Visualization of similarities in song data sets / Jorge Henrique Piazentin Ono 30 June 2015
Music collections are widely available on the internet and, thanks to growth in storage capacity and data transmission speed, users can access an almost unlimited number of songs. This has led to a growing demand for automated methods for organizing, retrieving and processing music data. Information visualization is a research area that enables the visual analysis of large data sets and is therefore a valuable tool for the exploration of music libraries. In this thesis, methodologies for the development of two music visualization techniques are proposed. The first, Similarity Graph, enables the exploration of data sets in terms of hierarchical similarities. The second, Concentric RadViz, represents the data in terms of classification tasks and enables users to alter the visualization according to their interests. Both techniques are able to reveal interesting structures in the data, facilitating its understanding and exploration.
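For readers unfamiliar with RadViz, the sketch below shows the standard (non-concentric) projection on which such techniques build: each feature gets an anchor on the unit circle, and each item is placed at the weighted average of the anchors, with its feature values as weights. The concentric variant proposed in the thesis is not reproduced here, and treating the inputs as non-negative per-class scores is an assumption.

```python
import math


def radviz(rows):
    """Project rows of non-negative feature values onto the unit disc."""
    n_features = len(rows[0])
    # Place one anchor per feature, evenly spaced on the unit circle.
    anchors = [
        (
            math.cos(2 * math.pi * j / n_features),
            math.sin(2 * math.pi * j / n_features),
        )
        for j in range(n_features)
    ]
    points = []
    for row in rows:
        total = sum(row)
        if total == 0:
            points.append((0.0, 0.0))  # no anchor pulls on this item
            continue
        # Weighted average of anchor positions, weights = feature values.
        x = sum(v * ax for v, (ax, _) in zip(row, anchors)) / total
        y = sum(v * ay for v, (_, ay) in zip(row, anchors)) / total
        points.append((x, y))
    return points


# Hypothetical per-genre scores for two songs over four genres.
points = radviz([[1, 0, 0, 0], [1, 1, 1, 1]])
```

An item dominated by one feature lands near that feature's anchor, while an item with balanced values falls near the centre, which is what makes the layout readable for classification-style data.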
|
506 |
Search and retrieval of source code using the faceted approach / Mendes, Rodrigo Cavalcante 31 January 2008
Previous issue date: 2008 / Software reuse has been considered a key concept for increasing the quality and productivity of software development through the reuse of existing artifacts, rather than building new ones from scratch. However, obtaining effective benefits from software reuse requires a set of complementary resources, such as education, active management support, and the introduction of appropriate processes and tools.
In fact, resources that ease access to reusable components, such as search and retrieval tools, are potential instruments for the adoption of reuse programs in organizations. One of the challenges for search and retrieval tools is ensuring that the components returned are significantly relevant.
In this sense, the faceted approach arises as a suitable alternative. This approach proposes the creation of a vocabulary supported by attributes, dividing the components into groups of classes based on predefined keywords, which increases precision and provides a more flexible classification.
Thus, this work presents an extension of a search and retrieval tool for reusable components, source code in particular, using the faceted classification approach. In addition, an auxiliary tool was developed to help the domain expert perform their activities using this approach. Finally, an experimental study evaluates the proposed solution.
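A minimal sketch of faceted classification applied to source code components follows. The facet names (`function`, `object`, `language`), the component names, and both helper functions are hypothetical; the abstract does not specify the vocabulary or the tool's interface.

```python
from typing import Dict, List

# Assumption: each component is described by one keyword per facet,
# drawn from a vocabulary predefined by a domain expert.
COMPONENTS: Dict[str, Dict[str, str]] = {
    "CsvReader.java": {"function": "parse", "object": "file", "language": "java"},
    "JsonWriter.java": {"function": "write", "object": "file", "language": "java"},
    "csv_reader.py": {"function": "parse", "object": "file", "language": "python"},
    "HttpClient.java": {"function": "send", "object": "request", "language": "java"},
}


def faceted_search(components, **selected) -> List[str]:
    """Return names of components whose facets match every selected value."""
    return sorted(
        name
        for name, facets in components.items()
        if all(facets.get(facet) == value for facet, value in selected.items())
    )


def facet_counts(components, facet) -> Dict[str, int]:
    """Count components per value of one facet, e.g. to display as filters."""
    counts: Dict[str, int] = {}
    for facets in components.values():
        counts[facets[facet]] = counts.get(facets[facet], 0) + 1
    return counts


matches = faceted_search(COMPONENTS, function="parse", language="java")
by_language = facet_counts(COMPONENTS, "language")
```

Each additional facet value narrows the result set, which is one way to read the abstract's claim that the approach increases precision while keeping the classification flexible.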
|
508 |
Seleção de notícias online para inteligência competitiva: uso de ontologia de domínio do negócio para expansão semântica da busca na internet / Selection of online news for competitive intelligence: use of business domain ontology for internet search semantic query expansion / Cleber Marchetti Duranti 02 September 2013
DURANTI, Cleber Marchetti. Seleção de notícias online para inteligência competitiva - Uso de ontologia de domínio do negócio para expansão semântica da busca na internet. São Paulo, 2013. Thesis (Doctorate in Administration) - Department of Administration, Faculdade de Economia, Administração e Contabilidade, Universidade de São Paulo.
/ The internet provides access to a growing volume of news and information about the environment in which companies operate, and companies need to keep up to date with the movements of the actors in their market and with the topics relevant to their business in order to remain competitive. The growing volume of data, however, leads to information overload, in which the amount of information available exceeds the processing capacity of its users. It then becomes necessary to develop methods and tools that help separate potentially useful information from irrelevant information. This research presents the development of a tool that uses a model of the business area, in the form of an ontology, to support the formulation of better internet searches through interactive semantic expansion of the keywords that users enter in a common internet search engine - still the most widely used method for collecting information from the internet. An ontology of the business domain "IT outsourcing" and an interface for using this ontology to expand searches within this domain are developed. The prototype is tested through search simulations and tests by users from the IT area, with whom a technology acceptance survey is conducted using the TAM-3 model, adapted to evaluate the prototype. The survey results show good acceptance of the solution in terms of usefulness, ease of use and the other dimensions of the TAM-3 model.
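Interactive semantic expansion of a keyword can be sketched as two steps: look the keyword up in the ontology to propose related terms, then build a search expression from the terms the user accepts. The tiny ontology, its `synonyms`/`narrower` fields, and the `OR` query syntax below are assumptions for illustration; the thesis's actual ontology and interface are not described in the abstract.

```python
# Hypothetical fragment of a business-domain ontology.
ONTOLOGY = {
    "it outsourcing": {
        "synonyms": ["it services outsourcing"],
        "narrower": ["cloud services", "managed services"],
    },
    "cloud services": {
        "synonyms": ["cloud computing"],
        "narrower": [],
    },
}


def suggest_expansions(keyword, ontology):
    """Propose related terms for the user to accept or reject."""
    entry = ontology.get(keyword.lower())
    if entry is None:
        return []  # keyword is not modelled in the ontology
    return entry["synonyms"] + entry["narrower"]


def build_query(keyword, accepted):
    """Combine the original keyword with the accepted terms using OR."""
    terms = [keyword] + list(accepted)
    return " OR ".join(f'"{term}"' for term in terms)


suggestions = suggest_expansions("IT outsourcing", ONTOLOGY)
query = build_query("IT outsourcing", ["managed services"])
```

Keeping the suggestion step separate from query construction reflects the interactive aspect: the ontology proposes, and the user decides which terms actually reach the search engine.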
|
509 |
Information seeking behaviour of postgraduate students: a study of Rhodes University and the University of Fort Hare / Monyela, Madireng Jane January 2013
Information is documented as being of value in the planning, decision making and evaluation of any programme; any informed decision is therefore based on the kind of information that the decision maker has. Information seeking behaviour can be described as an individual’s manner of gathering and sourcing information for personal use, knowledge update and development. In light of this, this study examined the information seeking behaviour of postgraduate students at the University of Fort Hare and Rhodes University. The study also sought to understand the impact that the introduction of new technology has on postgraduate students’ information seeking behaviour. The study was limited to postgraduate students in the faculties of Humanities, Social Sciences and Education at the University of Fort Hare and Rhodes University. These disciplines were selected because of Whitemire’s (2002:637) opinion that students studying humanities, social sciences and education carry out more information seeking activities than students studying hard sciences such as Mathematics and other Natural Sciences. The aim of the study was to establish how postgraduate students seek and gather information for academic use. The objectives of the study were as follows: to find out which information sources postgraduate students value the most and determine where they find such resources; to identify the activities postgraduate students engage in when seeking information; to establish the factors which influence postgraduate students’ information seeking behaviour; and to determine the methods that postgraduate students use to obtain relevant information. Both quantitative and qualitative research methodologies were employed in a survey. The main research instrument was a questionnaire, supported by focus groups and face-to-face interviews. The results showed that postgraduate students utilised different sources of information when seeking information for academic use.
The Internet, however, was established as the information source that postgraduate students valued and relied on most. Few respondents indicated that they still visited the library, browsed the shelves and found information that met their needs in books. Reports of consulting librarians for help were low. Although the study was not concerned with information seeking behaviour and age, the researcher noticed that mature students did not make effective use of information technologies and called themselves the “Born Before Technology” generation. The study also established that postgraduate students preferred, or found it more convenient, to access the Internet and other electronic sources of information in the libraries, even though the two universities have postgraduate computer laboratories and students could also access electronic sources of information at their residences through a wireless connection. The researcher also noted that postgraduate students relied more on lecturers and supervisors for the choice of information sources than on independent searching to find the most appropriate documents to use. Postgraduate students used keywords to obtain relevant information when searching electronic sources. The respondents strongly agreed that they felt frustrated, confused, disappointed and demotivated if they did not find relevant information in their searches. This validates Kuhlthau’s (1991) Information Seeking Process (ISP) model, as it focuses not only on the information seeking process but also on the emotions, thoughts and expressions of the user when searching for information. The study recommends the following: optional computer literacy programmes for postgraduate students, extended library orientation for postgraduate students, mentorship programmes, extended information literacy programmes, the appointment of research and subject librarians as well as more faculty librarians, and improvement in library marketing.
|
510 |
Information Retrieval Strategies of Millennial Undergraduate Students in Web and Library Database Searches / Porter, Brandi 01 January 2009
Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search techniques are different between the two information retrieval systems. Students often prefer searching the Web, but in doing so often miss higher quality materials that may be available only through their library. Furthermore, each system uses different information retrieval algorithms for producing results, so proficiency in one search system may not transfer to another.
Web-based information retrieval systems are unable to search and retrieve many resources available in libraries and other proprietary information retrieval systems, often referred to as the Invisible Web. These resources are not available to the general public and are password protected against anyone who is not an affiliated user of the organization concerned. They are often licensed to libraries by third-party vendors or publishers and include fee-based access to content. Therefore, millennial students who rely on Web-based information retrieval systems may miss many of the scholarly resources available to them.
Investigation of how millennial students approach searches for the same topic in both systems was conducted. The goal was to build upon theory of why students search using various techniques, why they often choose the Web for their searches, and what can be done to improve library online information retrieval systems. Mixed qualitative methods of data gathering were used to elicit this information.
The investigation showed that millennial undergraduate students lacked detailed search strategies, and often used the same search techniques regardless of system or subject. Students displayed greater familiarity and ease of use with Web based IR systems than online library IR systems. Results illustrated suggestions for search design enhancements to library online information retrieval systems such as better natural language searching and easier linking to full text articles. Design enhancements based on millennial search strategies should encourage students to use library-based information retrieval systems more often.
|