Global ETD Search

21	Estudo de vocabulário controlado na indexação automática : aplicação no processo de indexação do Sistema de Indización Semiautomática (SISA) / Study of controlled vocabulary in automatic indexing: application in the indexing process of the Semi-Automatic Indexing System (SISA) / Narukawa, Cristina Miyuki. January 2011 Abstract: Automatic indexing is a complex process, and delegating the assignment of terms to automatic systems requires analyzing both the methods and the characteristics of the indexing instruments. We therefore investigate the role of controlled vocabulary in this process by analyzing the results of applying the ThesAgro vocabulary in the Sistema de Indización SemiAutomatica (SISA, Semi-Automatic Indexing System). The objectives are to identify the characteristics that define and distinguish the types of vocabularies; to analyze methodological proposals and indexing systems; to apply ThesAgro in SISA and compare the results with the manual indexing of the Biblioteca Nacional de Agricultura (BINAGRI, National Library of Agriculture); and to analyze the intervening factors that point to the problems caused to automatic indexing. More generally, we seek to contribute to the development of the topic by gathering input for the adaptation of controlled vocabularies. We carried out a theoretical review of automatic indexing systems and an experiment applying ThesAgro in SISA to 100 articles on agriculture, specifically fruit production. As evaluation parameters we used the manual indexing performed by BINAGRI and a comparison with the results of earlier research that evaluated the performance of the Descritores em Ciências da Saúde (DeCS, Health Sciences Descriptors) vocabulary in the same system. The analysis of the results shows that the vocabulary conditions the results of the automatic indexing process. It is therefore necessary to understand the vocabulary, considering the methods for identifying the representative units of information, the application of linguistic treatment, the characteristics of the knowledge area, semantic relations, language, updating, use of vocabularies... / Advisor: Mariângela Spotti Lopes Fujita / Co-advisor: Isidoro Gil Leiva / Committee: Renato Rocha Souza / Committee: José Augusto Chaves Guimarães / Master's degree. Information science. Automatic indexing. Controlled vocabulary. Automatic indexing systems.
22	Building an Intelligent Filtering System Using Idea Indexing Yang, Li 08 1900 The widely used vector model remains popular because of its simplicity, its speed, and the appeal of using spatial proximity as a stand-in for semantic proximity. However, the model suffers from the vagueness caused by overlapping keywords. Efforts have been made to improve the vector model. Research on improving document representation has focused on four areas: statistical co-occurrence of related items, formation of term phrases, grouping of related words, and representation of document content. In this thesis, we propose the idea-indexing model to improve document representation for the filtering task in information retrieval (IR). The idea-indexing model matches document terms with the ideas they express and indexes the document with these ideas. This indexing scheme represents the document by its semantics instead of by sets of independent terms. We show in this thesis that indexing with ideas leads to better performance. Information retrieval. Automatic indexing. Information retrieval vector model term indexing idea indexing
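As background for the vector model mentioned in this abstract, the following is a minimal sketch and not the thesis's own implementation. It assumes raw term counts as weights and cosine similarity as the proximity measure, neither of which the abstract specifies; the names `term_vector` and `cosine_similarity` and the sample texts are illustrative only. It shows the keyword-overlap weakness: a document about finance and one about geology score identically against the same profile because each shares exactly one keyword with it.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words vector: lowercase tokens mapped to raw counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical filtering profile and two candidate documents.
profile = term_vector("river bank erosion")
doc_finance = term_vector("bank loan interest")
doc_geology = term_vector("shoreline erosion study")

# Both documents share exactly one keyword with the profile,
# so the vector model cannot tell which one is on topic.
print(cosine_similarity(profile, doc_finance))
print(cosine_similarity(profile, doc_geology))
```

Indexing by the ideas that terms express, as the abstract proposes, is aimed at separating such cases; how the thesis does so is not described in this record.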
23	Redundancy on content-based indexing. January 1997 by Cheung King Lum Kingly. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 108-110). / Abstract / Acknowledgement / Chapter 1 --- Introduction / Chapter 1.1 --- Motivation / Chapter 1.2 --- Problems in Content-Based Indexing / Chapter 1.3 --- Contributions / Chapter 1.4 --- Thesis Organization / Chapter 2 --- Content-Based Indexing Structures / Chapter 2.1 --- R-Tree / Chapter 2.2 --- R+-Tree / Chapter 2.3 --- R*-Tree / Chapter 3 --- Searching in Both R-Tree and R*-Tree / Chapter 3.1 --- Exact Search / Chapter 3.2 --- Nearest Neighbor Search / Chapter 3.2.1 --- Definition of Searching Metrics / Chapter 3.2.2 --- Pruning Heuristics / Chapter 3.2.3 --- Nearest Neighbor Search Algorithm / Chapter 3.2.4 --- Generalization to N-Nearest Neighbor Search / Chapter 4 --- An Improved Nearest Neighbor Search Algorithm for R-Tree / Chapter 4.1 --- Introduction / Chapter 4.2 --- New Pruning Heuristics / Chapter 4.3 --- An Improved Nearest Neighbor Search Algorithm / Chapter 4.4 --- Replacing Heuristics / Chapter 4.5 --- N-Nearest Neighbor Search / Chapter 4.6 --- Performance Evaluation / Chapter 5 --- Overlapping Nodes in R-Tree and R*-Tree / Chapter 5.1 --- Overlapping Nodes / Chapter 5.2 --- Problem Induced By Overlapping Nodes / Chapter 5.2.1 --- Backtracking / Chapter 5.2.2 --- Inefficient Exact Search / Chapter 5.2.3 --- Inefficient Nearest Neighbor Search / Chapter 6 --- Redundancy On R-Tree / Chapter 6.1 --- Motivation / Chapter 6.2 --- Adding Redundancy on Index Tree / Chapter 6.3 --- R-Tree with Redundancy / Chapter 6.3.1 --- Previous Models of R-Tree with Redundancy / Chapter 6.3.2 --- Redundant R-Tree / Chapter 6.3.3 --- Level List / Chapter 6.3.4 --- Inserting Redundancy to R-Tree / Chapter 6.3.5 --- Properties of Redundant R-Tree / Chapter 7 --- Searching in Redundant R-Tree / Chapter 7.1 --- Exact Search / Chapter 7.2 --- Nearest Neighbor Search / Chapter 7.3 --- Avoidance of Multiple Accesses / Chapter 8 --- Experiment / Chapter 8.1 --- Experimental Setup / Chapter 8.2 --- Exact Search / Chapter 8.2.1 --- Clustered Data / Chapter 8.2.2 --- Real Data / Chapter 8.3 --- Nearest Neighbor Search / Chapter 8.3.1 --- Clustered Data / Chapter 8.3.2 --- Uniform Data / Chapter 8.3.3 --- Real Data / Chapter 8.4 --- Discussion / Chapter 9 --- Conclusions and Future Research / Chapter 9.1 --- Conclusions / Chapter 9.2 --- Future Research / Bibliography / Automatic indexing Trees (Graph theory) Data structures (Computer Science)
24	Regensburger Verbundklassifikation und Schlagwortnormdatei im Tandem / Regensburg Union Classification and Subject Headings Authority File in tandem Probstmeyer, Judith 24 January 2011 In the catalog of the Südwestverbund, many publications carry both SWD subject headings and heading chains and notations of the Regensburger Verbundklassifikation (RVK). On this data basis, automatic correlations between SWD and RVK were generated at the Universitätsbibliothek Mannheim and then analyzed as part of a bachelor's thesis at the Hochschule der Medien Stuttgart. The talk presents the results of the analysis and considers possible practical applications of such correlations. RVK SWD Automatic indexing classification systems concordance ddc:020 Regensburger Verbundklassifikation correlation
25	On the construction and application of compressed text indexes Hon, Wing-kai, 韓永楷. January 2004 Computer Science and Information Systems / Doctoral / Doctor of Philosophy Text processing (Computer science) Automatic indexing Data structures (Computer science) Pattern recognition systems.
26	Parallelism and distribution for very large scale content-based image retrieval Gudmundsson, Gylfi Thor 12 September 2013 The scale of multimedia collections has grown very fast over the last few years. Facebook stores more than 100 billion images, and 200 million are added every day. To cope with this growth, methods for content-based image retrieval must adapt gracefully. The work presented in this thesis goes in this direction. Two observations drove the design of the high-dimensional indexing technique presented here. First, the collections are so large, typically several terabytes, that they must be kept on secondary storage. Addressing disk-related issues is thus central to our work. Second, all CPUs are now multi-core and clusters of machines are commonplace. Parallelism and distribution are both key for fast indexing and high-throughput batch-oriented searching. We describe in this manuscript a high-dimensional indexing technique called eCP. Its design takes into account the constraints associated with using disks, parallelism and distribution. At its core is a non-iterative, unstructured vector quantization scheme. eCP builds on an existing indexing scheme that is main-memory oriented. Our first contribution is a set of extensions for processing very large data collections, reducing indexing costs and making the best use of disks. The second contribution proposes multi-threaded algorithms for both building and searching, harnessing the power of multi-core processors. Datasets for evaluation contain about 25 million images, or over 8 billion SIFT descriptors. The third contribution addresses distributed computing. We adapt eCP to the MapReduce programming model and use the Hadoop framework and HDFS for our experiments. This time we evaluate eCP's ability to scale up with a collection of 100 million images, more than 30 billion SIFT descriptors, and its ability to scale out by running experiments on more than 100 machines.
[INFO:INFO_OH] Computer Science/Other Automatic indexing
27	On the construction and application of compressed text indexes Hon, Wing-kai. January 2004 Thesis (Ph. D.)--University of Hong Kong, 2005. / Title proper from title frame. Also available in printed format.
28	Rapid pre-indexing by machine. January 1968 Bibliography: p. 61. / Issued also as a Master of Science thesis in the Dept. of Electrical Engineering, 1968. / M.I.T. Project DSR 70054. Research Grant NSFC-472. TK7855.M41 E386 no.355 Automatic indexing M.I.T. Project Intrex
29	Recuperação de informação baseada em ontologia: uma proposta utilizando o modelo vetorial / Ontology based information retrieval: a proposal using the vector space model Janaite Neto, Jorge [UNESP] 30 May 2018 No funding received / Information retrieval works by comparing the representations of the documents in a collection with the representation of the user's information need. A document is retrieved when its representation matches, fully or partially, the representation of that need. The retrieval process can be seen as a linguistic problem in which the information content of the documents and the user's information need are each represented by a set of terms. Its efficiency depends on the quality of the document representations and on the terms the user chooses to express the information need: the more compatible these representations are, the more efficient the retrieval process. Based on exploratory and descriptive research grounded in the specific literature, this work proposes the use of computational ontologies in information retrieval systems based on the Vector Space Model. The ontologies serve as external terminological structures, used both to expand the indexing terms and to expand the terms that make up the query expression. Indexing-term expansion takes place during indexing, immediately after the most representative terms of the document under analysis have been extracted, and consists of adding new, conceptually related terms to enrich the document representation. Query expansion adds new terms related to those already present in the query expression in order to contextualize them better. The proposal uses only the terminological and hierarchical structure offered by an OWL computational ontology; the other possible types of relations and the logical restrictions that can be described are left for future work as a way to further improve retrieval efficiency. The proposal presented in this study can be implemented and eventually become a fully operational information retrieval system. Information retrieval. Ontology. Automatic indexing. Query expansion. OWL. OWL2.
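The term expansion this abstract describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the `PARENT` dictionary is a hypothetical stand-in for the subclass hierarchy of an OWL ontology, expansion only walks upward to broader concepts, and the reduced weight of 0.5 for added terms is an arbitrary choice. The function names `ancestors` and `expand` are likewise invented for the example.

```python
from collections import Counter

# Hypothetical stand-in for the subclass hierarchy of an OWL ontology:
# each term maps to its direct parent (broader) concept.
PARENT = {
    "apple": "fruit",
    "orange": "fruit",
    "fruit": "food",
}

def ancestors(term, parent=PARENT):
    """Return the broader concepts of a term, nearest first."""
    chain = []
    seen = {term}  # guards against cycles in a malformed hierarchy
    while term in parent and parent[term] not in seen:
        term = parent[term]
        chain.append(term)
        seen.add(term)
    return chain

def expand(terms, weight=0.5, parent=PARENT):
    """Build a weighted term vector: original terms at 1.0 each,
    plus their ancestor concepts at a reduced weight (assumed value)."""
    vector = Counter()
    for t in terms:
        vector[t] += 1.0
    for t in terms:
        for broader in ancestors(t, parent):
            vector[broader] += weight
    return vector

# The same expansion is applied to index terms and to query terms.
doc_index = expand(["apple", "harvest"])
query = expand(["orange"])

# Without expansion the document and the query share no term;
# after expansion they meet at the broader concepts.
shared = set(doc_index) & set(query)
print(sorted(shared))
```

The expanded vectors can then be compared with the usual Vector Space Model similarity measure; how the weights of added terms should be set is left open here.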
30	Ontologias no processo de indexação automática de documentos textuais / Ontologies in the automatic indexing process of textual documents / Pansani Junior, Eder Antonio. January 2016 Advisor: Edberto Ferneda / Committee: Mariângela Spotti Lopes Fujita / Committee: Elvis Fusco / Abstract: Despite the technological advances of recent decades, the search for relevant information is still an arduous task. Information retrieval involves, on the one hand, a document collection that must be represented by linguistic expressions summarizing its thematic content. On the other hand, people try to describe their information needs linguistically in order to obtain relevant documents that satisfy those needs. An information retrieval system is therefore a mediator between a document collection and its requesters. One of the aspects that directly affects its efficiency is how documents are represented. Research on automatic indexing therefore gains importance, particularly in environments with large-scale production and dissemination of documents, such as the Web. Controlled vocabularies are used as terminology standardization elements to improve the results of the indexing process. This study aims to propose, evaluate and develop a method for using ontologies in the automatic indexing of textual documents. It makes use of the logical and conceptual structure of domain ontologies and implements a method that enables automatic indexing systems to perform automatic inferences, favoring a more semantic and comprehensive representation of documents. The study concludes that the use of ontologies as controlled vocabularies in automatic indexing systems can offer promising results, allowing the automatic discovery of... / Master's degree. Automatic indexing. Subject headings. Information retrieval.
