811
Modelagem gerativa para sumarização automática multidocumento / Generative modeling for multi-document summarization
María Lucía Del Rosario Castro Jorge, 09 March 2015
Multi-document summarization consists of automatically producing a single summary from a set of source texts that share a common topic. This task is becoming increasingly important, since it supports the processing of large volumes of information and highlights the content most relevant to the user. In this work, generative modeling approaches are proposed and investigated, in which the multi-document summarization task is cast in the noisy-channel framework and its components (language model, transformation model, and decoder), each appropriately instantiated for the task. These models are formulated with both shallow and deep features.
In particular, three transformation models were defined, whose generative stories capture content-selection patterns from sets of source texts and their corresponding human-written multi-document summaries. The first model is the simplest, composed of traditional shallow features; the second is more complex, adding single-document discourse features (given by RST) to the features of the first; the third is the most complex, further adding multi-document semantic-discursive features (given by CST). Besides these transformation models, a coherence model (the noisy-channel language model) was also developed for multi-document summaries; it is designed to capture coherence patterns, treating some of the main multi-document phenomena that affect coherence, and builds on the entity-based model enriched with discourse information. Each of these models was trained on the CSTNews corpus of journalistic texts in Portuguese and their corresponding summaries. Finally, a decoder was developed to construct the summary from the estimated models: it selects the subset of sentences that maximizes the summary's probability according to the inferred content-selection and coherence models, and it includes a strategy for keeping redundant sentences out of the final summary. The produced summaries are compared with those of state-of-the-art statistical methods, which were also implemented, trained, and tested on the corpus. Using the area's traditional informativeness evaluations, the results show that the models developed in this work are competitive with the state-of-the-art statistical methods and, in some cases, outperform them.
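The decoding objective just described (maximize the summary's probability under the content-selection and coherence models while avoiding redundancy) can be sketched as follows. This is a minimal, hypothetical illustration: the function names, the log-linear combination, and the similarity-based redundancy filter are assumptions for exposition, not the thesis implementation.

```python
# Sketch of noisy-channel decoding for multi-document summarization:
# choose the subset of candidate sentences that maximizes the combined
# content-selection and coherence log-probabilities, skipping subsets
# that contain near-duplicate sentence pairs.
import math
from itertools import combinations

def decode(sentences, p_select, p_coherence, similarity,
           max_sents=5, redundancy_threshold=0.7):
    """sentences: candidate sentence strings.
    p_select: dict sentence -> selection probability (transformation model).
    p_coherence: function mapping a list of sentences to a coherence
        probability (language model).
    similarity: function (s1, s2) -> [0, 1] used to filter redundancy."""
    best_score, best_summary = float("-inf"), []
    for k in range(1, max_sents + 1):
        for subset in combinations(sentences, k):
            # Redundancy filter: reject subsets with near-duplicate pairs.
            if any(similarity(a, b) > redundancy_threshold
                   for a, b in combinations(subset, 2)):
                continue
            score = sum(math.log(p_select[s]) for s in subset)
            score += math.log(p_coherence(list(subset)))
            if score > best_score:
                best_score, best_summary = score, list(subset)
    return best_summary
```

Enumerating all subsets is exponential in the number of candidate sentences, so a practical decoder would apply the same objective with a greedy or beam-search strategy.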
812
Modelo de gestión documental para empresas que brinden servicios informáticos a proyectos de investigación académica / Enterprise content management model for companies that provide computer services to academic research projects
Huanachin Yancce, Edith Fiorela; Chaffo Vega, Renzo Mauricio Renato, 28 October 2019
Organizations that provide computer services for the development of academic research projects are not using Enterprise Content Management (ECM), despite the complexity of managing the documents associated with this activity. ECM makes it possible to store, deliver, and manage content more agilely, and thus to provide a better service.
This paper proposes a document management model that presents the phases for implementing this technology and thereby gaining new capabilities in the organization.
To validate the model, a tool was implemented in two virtual companies, analyzing the response times to service requests before and after the implementation. In addition, a Likert-scale survey was conducted to measure stakeholder satisfaction.
813
Analýza vlastností nástroje dashboard založená na automatické dekompozici obrazovky / Analysis of Dashboard Attributes Based on Automatic Decomposition of Screen
Mejía, Santiago, January 2018
The aim of this paper is to propose a method for the automatic segmentation of dashboards so that they can be analyzed by means of aesthetic metrics. These metrics evaluate the properties of the screen objects by region. The method is based on bottom-up segmentation and forms objects based on proximity. The main result is a faster and more accurate analysis of dashboards, since there is no need to segment the image manually.
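As a minimal sketch of what proximity-based, bottom-up grouping can look like (the bounding-box representation, the gap measure, and the threshold are illustrative assumptions, not the method developed in the thesis), primitive regions are merged into screen objects whenever the gap between their bounding boxes falls below a threshold:

```python
# Bottom-up grouping of screen regions by proximity. Boxes are
# (x1, y1, x2, y2) tuples; regions whose bounding boxes lie closer
# than `threshold` pixels are merged into a single object.

def gap(a, b):
    """Smallest horizontal/vertical separation of two boxes; 0 if they overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return max(dx, dy)

def merge(a, b):
    """Bounding box covering both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def group_by_proximity(boxes, threshold=10):
    """Merge boxes until every remaining pair is farther apart than threshold."""
    boxes = list(boxes)
    merged = True
    while merged and len(boxes) > 1:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if gap(boxes[i], boxes[j]) <= threshold:
                    boxes[j] = merge(boxes[i], boxes[j])
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

The resulting object boxes can then be fed to per-region aesthetic metrics without any manual segmentation of the image.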
814
A Gamma-Poisson topic model for short text
Mazarura, Jocelyn Rangarirai, January 2020
Most topic models are constructed under the assumption that documents follow a multinomial distribution. The Poisson distribution is an alternative for modelling count data: for topic modelling, it describes the number of occurrences of a word in documents of fixed length. The Poisson distribution has been successfully applied in text classification, but its application to topic modelling is not well documented, specifically in the context of a generative probabilistic model. Furthermore, the few Poisson topic models in the literature are admixture models, making the assumption that a document is generated from a mixture of topics.
In this study, we focus on short text. Many studies have shown that the simpler assumption of a mixture model fits short text better. With mixture models, as opposed to admixture models, the generative assumption is that a document is generated from a single topic. One topic model that makes this one-topic-per-document assumption is the Dirichlet-multinomial mixture model. The main contributions of this work are a new Gamma-Poisson mixture model (GPM), as well as a collapsed Gibbs sampler for the model. A benefit of the collapsed Gibbs sampler derivation is that the model is able to automatically select the number of topics contained in the corpus. The results show that the Gamma-Poisson mixture model performs better than the Dirichlet-multinomial mixture model at selecting the number of topics in labelled corpora. Furthermore, the Gamma-Poisson mixture produces better topic coherence scores than the Dirichlet-multinomial mixture model, thus making it a viable option for the challenging task of topic modelling of short text.
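A minimal generative sketch of this one-topic-per-document Gamma-Poisson assumption is given below; the hyperparameter names and values are illustrative assumptions, not the thesis's model specification. Each document draws a single topic, and its per-word counts are Poisson with topic-specific rates drawn from a Gamma prior:

```python
# Generate a toy corpus under a Gamma-Poisson mixture (GPM) story:
# one topic per document, Poisson word counts, Gamma-distributed rates.
import numpy as np

rng = np.random.default_rng(0)
n_topics, vocab_size, n_docs = 3, 100, 10
alpha, beta = 0.1, 1.0                       # Gamma shape and rate

# Topic-word rates: lam[k, v] ~ Gamma(alpha, scale=1/beta)
lam = rng.gamma(alpha, 1.0 / beta, size=(n_topics, vocab_size))
theta = rng.dirichlet(np.ones(n_topics))     # mixture weights over topics

docs = []
for _ in range(n_docs):
    z = rng.choice(n_topics, p=theta)        # a single topic per document
    counts = rng.poisson(lam[z])             # word counts over the vocabulary
    docs.append((z, counts))
```

A collapsed Gibbs sampler inverts this process: it integrates out the Gamma rates and mixture weights and resamples each document's topic assignment from its conditional distribution given all other assignments.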
The application of GPM was then extended to a further real-world task: distinguishing between semantically similar and dissimilar texts. The objective was to determine whether GPM could produce semantic representations that allow the user to determine the relevance of new, unseen documents to a corpus of interest. The challenge of addressing this problem in short text from small corpora was of key interest; corpora of small size are not uncommon (for example, at the start of the coronavirus pandemic, limited research was available on the topic). Handling short text is challenging not only because such text is sparse, but also because some corpora, such as chats between people, tend to be noisy. The performance of GPM was compared to that of word2vec under these challenging conditions on labelled corpora, and GPM produced better results in terms of accuracy, precision, and recall in most cases. In addition, unlike word2vec, GPM was shown to be applicable to unlabelled datasets, and a methodology for this was also presented. Finally, a relevance index metric was introduced, which translates the similarity distance between a corpus of interest and a test document into the probability that the test document is semantically similar to the corpus of interest. / Thesis (PhD (Mathematical Statistics)), University of Pretoria, 2020.
815
Suffix Trees for Document Retrieval
Reck, Ryan, 01 June 2012
This thesis presents a look at the suitability of suffix trees for full-text indexing and retrieval. Typically, suffix trees are built at the character level, where the tree records which characters follow each other. By building suffix trees for documents based on words instead of characters, the resulting tree effectively indexes every word or sequence of words that occurs in any of the documents. Ukkonen's algorithm is adapted to build word-level suffix trees, but the primary focus is on developing algorithms for searching the suffix tree for exact and approximate, or fuzzy, matches to arbitrary query strings. A proof-of-concept implementation is built and compared to a Lucene index for retrieval over a subset of the Reuters RCV1 data set.
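To make the word-level indexing idea concrete, here is a toy sketch. It uses a plain sorted word-level suffix array rather than a suffix tree for brevity (the thesis adapts Ukkonen's linear-time construction), and the example documents are illustrative:

```python
# Index every word-level suffix of each document so that any phrase
# can be located by binary search over the sorted suffixes.
from bisect import bisect_left

def build_word_suffix_array(docs):
    """docs: list of token lists. Returns sorted (suffix, doc_id) pairs."""
    suffixes = []
    for doc_id, words in enumerate(docs):
        for i in range(len(words)):
            suffixes.append((tuple(words[i:]), doc_id))
    suffixes.sort()
    return suffixes

def find_phrase(suffix_array, phrase):
    """Return the ids of documents containing the phrase (a tuple of words)."""
    hits = set()
    lo = bisect_left(suffix_array, (phrase,))
    for suffix, doc_id in suffix_array[lo:]:
        if suffix[:len(phrase)] != phrase:
            break
        hits.add(doc_id)
    return hits

docs = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]
sa = build_word_suffix_array(docs)
print(find_phrase(sa, ("the", "cat")))   # {0}
print(find_phrase(sa, ("sat",)))         # {0, 1}
```

A word-level suffix tree offers the same exact-match lookups in time proportional to the query length and, unlike this sorted-array shortcut, naturally supports the approximate-match traversals the thesis develops.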
816
Systém pro správu elektronického archivu / System for administering electronic archives
Balák, Václav, January 2010
The objectives of this project are to design and realize a system for the management and long-term archiving of multimedia documents. The opening chapters are devoted to a theoretical analysis of the expected characteristics of such a system and to their implementation in Alfresco, the open-source document management system used for the realization. The following chapters describe the modifications made to this system, which enhance its handling of multimedia content and its metadata; possibilities for connecting the system to other systems are also discussed. Finally, the document describes the testing of these changes and adjustments according to various criteria.
817
Tvorba vnitropodnikových směrnic ve vybrané firmě / Creation of Interdepartmental Directions in a Selected Firm
Josefíková, Šárka, January 2008
The master's thesis deals with the problems of internal company directives and focuses on their creation. The main task of this work is to prepare the most suitable directives for a selected territorially autonomous unit.
818
Aplikace nástrojů řízení a automatizace administrativních procesů / Deployment of automation and management tools for administrative processes
Brada, Jan, January 2008
The continual growth in the volume of data and information requires ever more sophisticated methods and procedures for processing it, and the complexity of information systems and their specialized components increases with the volume of processed data. Applications that support business process management and improve capacity and efficiency have formed a few basic categories for which this terminology has become common: Business Process Management, Enterprise Content Management, and Document Management System. These systems are well established in the commercial sphere as well as in the civil service and are used in many organizations. This diploma thesis investigates how such systems are actually used at universities, where they are being adopted very slowly. Its aim is to formulate recommendations, in the form of a project study, for implementing these systems in the environment of the Brno University of Technology.
819
Vicefaktorová autentizace elektronických dokumentů / Multifactoral Authentication of Electronic Documents
Gancarčík, Lukáš, January 2013
The aim of the thesis is to provide complete information regarding electronic documents and the possibilities of their use. The focus is on the area of authentication, which determines how authentication information can be obtained, and on describing the authentication processes themselves. The diploma thesis also proposes a multifactor authentication scheme for electronic documents for the selected company.
820
Management kvality v malém podniku / Quality management in small enterprise
Stodůlka, Michal, January 2008
This diploma thesis on the theme "Quality management in small enterprise" deals with the quality management system of a small, relatively recently established company engaged in metal production. The thesis should help the firm My&Ko, s.r.o. with the further development and improvement of its quality management system, and subsequently serve as a guide to the procedure for future certification under ISO 9001:2001.