  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

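La automatización de la indización: propuesta teórica-metodológica. Aplicación en el área de biblioteconomía y documentación [The automation of indexing: a theoretical-methodological proposal. Application in the field of library and information science]

Gil Leiva, Isidoro. 17 November 1997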
A conceptual framework for the automation of indexing is presented, covering its definition, the positions researchers in Library and Information Science have taken toward this line of research, the historical development of indexing automation, and the interdisciplinarity inherent in the process. A theoretical-methodological proposal is then presented for designing a semi-automatic procedure for indexing documents on Library and Information Science, made up of four modules. The first three modules prepare the sources used, select candidate terms for descriptors, and evaluate and weight those terms; in the fourth module, the user interactively validates and edits the proposed results. The system relies on a controlled vocabulary on Library and Information Science built for this purpose. The mean consistency obtained between the indexing of fifty articles by indexers of the ISOC database and by our proposal falls within the range reported for other automatic indexing systems.
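The middle modules of the four-module procedure above, together with a consistency check against human indexing, can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical vocabulary, whitespace tokenization, and relative-frequency weighting, none of which the abstract specifies. For comparison with human indexers it uses Hooper's inter-indexer consistency measure, one common formula; the abstract does not state which measure the thesis used.

```python
from collections import Counter

# Hypothetical controlled vocabulary (an assumption; not the vocabulary built for the thesis).
VOCABULARY = {"indexing", "thesaurus", "information retrieval", "cataloguing"}


def candidate_terms(text, vocabulary, max_ngram=2):
    """Module 2 (sketch): count word n-grams that match a vocabulary descriptor."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    counts = Counter()
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if gram in vocabulary:
                counts[gram] += 1
    return counts


def weight_terms(counts, top_k=3):
    """Module 3 (sketch): weight by relative frequency and keep the top_k descriptors."""
    total = sum(counts.values()) or 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, freq / total) for term, freq in ranked[:top_k]]


def hooper_consistency(terms_a, terms_b):
    """Hooper's measure: shared terms / (shared + only in A + only in B)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Module 4 (interactive validation by the user) is not modelled here.
text = "Automatic indexing uses a thesaurus. Indexing quality affects information retrieval."
proposed = weight_terms(candidate_terms(text, VOCABULARY))
print(proposed)  # [('indexing', 0.5), ('information retrieval', 0.25), ('thesaurus', 0.25)]
print(hooper_consistency([t for t, _ in proposed], ["indexing", "cataloguing"]))  # 0.25
```

Hooper's measure equals the Jaccard overlap of the two term sets. A system whose consistency with human indexers falls in the same range as human-to-human consistency is usually considered acceptable.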
12

WebDoc: an automated Web document indexing system

Tang, Bo. January 2002
Thesis (M.S.)--Mississippi State University. Department of Computer Science. / Title from title screen. Includes bibliographical references.
13

Automatic indexing and abstracting of document texts

Moens, Marie-Francine. January 2000
Dissertation--University of Leuven, 1999. / Includes bibliographical references (p. [237]-260) and index.
14

Discovering interpretable topics in free-style text: diagnostics, rare topics, and topic supervision

Zheng, Ning. January 2008
Thesis (Ph. D.)--Ohio State University, 2008. / Title from first page of PDF file. Includes bibliographical references (p. 105-108).
15

Succinct Data Structures

Gupta, Ankur. January 2007
Thesis (Ph. D.)--Duke University, 2007.
16

Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing.

Csomai, Andras. 05 1900
This research addresses the problem of automatic keyphrase extraction from large documents and back-of-the-book indexing. The potential benefits of automating this process are far-reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back-of-the-book indexes. The dissertation introduces a new methodology for evaluating automated systems, which allows a detailed, comparative analysis of several keyphrase extraction techniques. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system against the best achievable performance. Additionally, a number of novel features are proposed: a statistical informativeness measure based on chi-square statistics; an encyclopedic feature that draws on the vast knowledge base of Wikipedia to estimate the likelihood that a phrase refers to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back-of-the-book indexing system built on the keyphrase extraction system is shown to produce indexes closely resembling those created by human experts.
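As a rough illustration of a chi-square-based informativeness feature, the sketch below scores a candidate phrase with Pearson's chi-square statistic on a 2x2 contingency table: phrase vs. other n-grams, in the document vs. in a background corpus. This is a generic textbook formulation assumed for illustration and is not necessarily the exact measure used in the dissertation. Zeroing out phrases that are not over-represented in the document is also an assumption.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson's chi-square for the table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def informativeness(phrase_in_doc, doc_ngrams, phrase_in_bg, bg_ngrams):
    """Chi-square score of a phrase in a document against a background corpus.

    Returns 0.0 when the phrase is not relatively more frequent in the document
    than in the background (an assumed design choice: only over-representation counts).
    """
    if phrase_in_doc * bg_ngrams <= phrase_in_bg * doc_ngrams:
        return 0.0
    return chi_square_2x2(
        phrase_in_doc,               # phrase, document
        phrase_in_bg,                # phrase, background corpus
        doc_ngrams - phrase_in_doc,  # other n-grams, document
        bg_ngrams - phrase_in_bg,    # other n-grams, background corpus
    )


# Hypothetical counts: a phrase seen 10 times in a 1,000-n-gram document
# but only 10 times in a 100,000-n-gram background corpus.
print(round(informativeness(10, 1000, 10, 100000), 1))
print(informativeness(1, 1000, 100, 100000))  # same relative frequency, so 0.0
```

The chi-square statistic by itself is symmetric: it grows for phrases that are unusually rare as well as unusually frequent. For keyphrase ranking, some directional check like the one above is typically needed.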
17

Parallelism and distribution for very large scale content-based image retrieval / Parallélisme et distribution pour des bases d'images à très grande échelle

Gudmunsson, Gylfi Thor. 12 September 2013
The scale of multimedia collections has grown very fast over the last few years: Facebook stores more than 100 billion images, and 200 million are added every day. To cope with this growth, content-based image retrieval methods must adapt gracefully. The work presented in this thesis goes in this direction. Two observations drove the design of the high-dimensional indexing technique presented here. First, the collections are so large, typically several terabytes, that they must be kept on secondary storage; addressing disk-related issues is thus central to our work. Second, all CPUs are now multi-core and clusters of machines are commonplace, so parallelism and distribution are both key to fast indexing and high-throughput batch-oriented searching. This manuscript describes a high-dimensional indexing technique called eCP, whose design takes into account the constraints associated with using disks, parallelism, and distribution. At its core is a non-iterative, unstructured vector quantization scheme. eCP builds on an existing state-of-the-art indexing scheme that is main-memory oriented. Our first contribution is a set of extensions for processing very large data collections, reducing indexing costs and making the best use of disks. The second contribution proposes multi-threaded algorithms for both index construction and searching, harnessing the power of multi-core processors; the evaluation datasets contain about 25 million images, or over 8 billion SIFT descriptors. The third contribution addresses distributed computing: we adapt eCP to the MapReduce programming model and use the Hadoop framework and HDFS for our experiments. This time we evaluate eCP's ability to scale up with a collection of 100 million images (more than 30 billion SIFT descriptors) and its ability to scale out by running experiments on more than 100 machines.
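The core idea described above, a non-iterative, unstructured vector quantization used to prune the search, can be sketched in memory in the style of cluster pruning. Leaders are drawn at random rather than refined iteratively as in k-means, every vector is assigned to its nearest leader in a single pass, and a query scans only the b clusters whose leaders are nearest. This toy version deliberately ignores the disk layout, multi-threading, and MapReduce aspects that are the thesis's actual contributions. The function names and parameters are hypothetical.

```python
import math
import random


def build_index(vectors, n_leaders, seed=0):
    """Non-iterative quantization: pick random leaders, then assign every vector once."""
    rng = random.Random(seed)
    leaders = rng.sample(range(len(vectors)), n_leaders)  # no k-means refinement
    clusters = {leader: [] for leader in leaders}
    for i, v in enumerate(vectors):
        nearest = min(leaders, key=lambda j: math.dist(v, vectors[j]))
        clusters[nearest].append(i)
    return clusters


def search(vectors, clusters, query, k=1, b=1):
    """Approximate k-NN: scan only the b clusters whose leaders are closest to the query."""
    probed = sorted(clusters, key=lambda j: math.dist(query, vectors[j]))[:b]
    candidates = [i for j in probed for i in clusters[j]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]


# Toy 2-D "descriptors" in two well-separated groups (real SIFT vectors have 128 dimensions).
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = build_index(vectors, n_leaders=2)
print(search(vectors, clusters, (10.2, 10.1), k=1, b=2))  # [3]
```

With b equal to the number of leaders the search becomes exhaustive. Smaller values of b trade some accuracy for reading far fewer vectors, which is what makes this family of schemes attractive when clusters live on disk.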
18

Recuperação de informação baseada em ontologia: uma proposta utilizando o modelo vetorial [Ontology-based information retrieval: a proposal using the vector model]

Janaite Neto, Jorge. January 2018
Advisor: Edberto Ferneda / Committee: Rachel Cristina Vesu Alves / Committee: Rogério Aparecido Sá Ramalho / Abstract: Information retrieval works by matching the representations of the documents in a collection against the representation of the user's information need. A document is retrieved when its representation fully or partially matches the representation of the user's information need. Information retrieval can therefore be viewed as a linguistic problem in which both the information content of the documents and the user's information need are represented by sets of terms. Its efficiency depends on the quality of the document representations and of the terms the user chooses to express the information need: the more compatible these representations are, the more efficient the retrieval process. Based on exploratory and descriptive research grounded in the specialized literature, this work proposes using computational ontologies in information retrieval systems based on the Vector Space Model. The ontologies serve as an external terminological structure used both to expand the indexing terms and to expand the terms that make up the query expression. Indexing-term expansion is performed immediately after the most representative terms of the document under analysis are extracted during indexing, and consists of adding new, conceptually related terms to enrich the document representation. Query expansion adds new related terms to those already present in the query expression in order to place them in better context. In this proposal, only the terminological... (Complete abstract: click electronic access below) / Master's
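A minimal sketch of ontology-based query expansion in the Vector Space Model follows. The ontology is reduced to a hypothetical dictionary mapping each term to conceptually related terms. Documents and queries are compared by cosine similarity over raw term frequencies instead of the TF-IDF weights a real system would more likely use. Both simplifications are assumptions made for illustration, not the method proposed in the thesis.

```python
import math
from collections import Counter

# Hypothetical toy ontology: term -> conceptually related terms (an assumption).
ONTOLOGY = {
    "car": {"automobile", "vehicle"},
    "vehicle": {"car", "truck"},
}


def expand(terms, ontology):
    """Append related terms from the ontology that are not already present."""
    expanded, seen = list(terms), set(terms)
    for term in terms:
        for related in sorted(ontology.get(term, ())):
            if related not in seen:
                seen.add(related)
                expanded.append(related)
    return expanded


def cosine(terms_a, terms_b):
    """Cosine similarity of two term lists, using raw term frequencies as weights."""
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(sum(x * x for x in vb.values()))
    return dot / norm if norm else 0.0


document = ["automobile", "engine"]  # terms extracted during indexing
query = ["car"]
print(cosine(query, document))                              # no shared term without expansion
print(round(cosine(expand(query, ONTOLOGY), document), 3))  # match via "automobile"
```

The same `expand` function could be applied to the document's index terms at indexing time, which corresponds to the first of the two expansion points the abstract describes.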
19

Feature-based indexing in visual information systems. / CUHK electronic theses & dissertations collection

January 1997
by Donald Asogu Adjeroh. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (p. 202-216). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web.
20

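Estudo de vocabulário controlado na indexação automática: aplicação no processo de indexação do Sistema de Indización Semiautomática (SISA) [A study of controlled vocabulary in automatic indexing: application in the indexing process of the Sistema de Indización Semiautomática (SISA)]

Narukawa, Cristina Miyuki. January 2011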
Abstract: Automatic indexing is a complex process, and delegating the assignment of terms to automatic systems requires analyzing not only the methods but also the characteristics of the indexing tools. We therefore investigate the role of controlled vocabulary in this process by analyzing the results of applying the ThesAgro vocabulary in the Sistema de Indización SemiAutomática (SISA, Semi-Automatic Indexing System). Our objectives are to identify the characteristics that define and distinguish the types of vocabularies; to analyze methodological proposals and indexing systems; to apply ThesAgro in SISA and compare the results with the manual indexing performed by the Biblioteca Nacional de Agricultura (BINAGRI, National Library of Agriculture); and to analyze the intervening factors that cause problems in automatic indexing. More broadly, we seek to contribute to the topic by providing input for adapting controlled vocabularies. We carried out a theoretical review of automatic indexing systems and an experiment applying ThesAgro in SISA to 100 articles on agriculture, specifically on fruit growing. The evaluation baseline was the manual indexing performed by BINAGRI, together with a comparison against the results of a previous study that evaluated the performance of the Descritores em Ciências da Saúde (DeCS, Health Sciences Descriptors) vocabulary in the same system. The analysis shows that the vocabulary conditions the results of the automatic indexing process. It is therefore necessary to understand it, considering the methods for identifying the units that represent information, the application of linguistic processing, the characteristics of the knowledge domain, semantic relations, language, updating, the use of vocabularies... (Complete abstract: click electronic access below) / Advisor: Mariângela Spotti Lopes Fujita / Co-advisor: Isidoro Gil Leiva / Committee: Renato Rocha Souza / Committee: José Augusto Chaves Guimarães / Master's
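One way a controlled vocabulary conditions automatic indexing results can be sketched as follows. Extracted terms reach the index only if they are descriptors or non-preferred terms with a "use" reference to a descriptor, and the outcome is then compared with manual indexing. The thesaurus fragment is hypothetical and is not taken from ThesAgro or DeCS. The abstract says manual indexing served as the evaluation baseline but does not name the metrics, so precision and recall here are an assumption.

```python
# Hypothetical thesaurus fragment (an assumption; not taken from ThesAgro or DeCS).
DESCRIPTORS = {"fruit crops", "irrigation", "pest control"}
USE = {"fruit growing": "fruit crops", "orchards": "fruit crops"}  # non-preferred -> preferred


def map_to_descriptors(terms, use, descriptors):
    """Replace non-preferred terms by their descriptor and drop terms outside the vocabulary."""
    assigned = set()
    for term in terms:
        term = use.get(term, term)
        if term in descriptors:
            assigned.add(term)
    return assigned


def precision_recall(automatic, manual):
    """Compare automatic indexing against manual indexing treated as the reference."""
    hits = len(automatic & manual)
    precision = hits / len(automatic) if automatic else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall


extracted = ["fruit growing", "irrigation", "weather"]  # "weather" is not in the vocabulary
automatic = map_to_descriptors(extracted, USE, DESCRIPTORS)
manual = {"fruit crops", "pest control"}
print(sorted(automatic), precision_recall(automatic, manual))  # ['fruit crops', 'irrigation'] (0.5, 0.5)
```

In this toy setup, missing entry terms or outdated descriptors directly lower both scores. This mirrors the abstract's conclusion that coverage, updating, and semantic relations in the vocabulary shape the automatic indexing results.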
