1 |
The practical value of classification summaries in information management and integration
Rozman, Darija, 12 1900
The author discusses the value and importance of using short extracts from classification tables to support subject access management. While detailed classification is time-consuming, complex and costly, classifying documents into broader classes is a simpler way of achieving meaningful and useful subject organization. The paper outlines the role of this type of classification in bibliographic listings, in the organization and representation of physical documents, in the presentation of web resources, in statistical reports on collection development and use, and, last but not least, in information integration in a networked environment. This approach to subject classification is illustrated by the Slovenian union catalogue COBISS/OPAC, in which a standardized set of UDC codes is used. The author emphasizes the importance of this outline for the homogeneity and continuity of UDC use in Slovenia and explains how this may be weakened by changes in the top level of UDC.
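To make the broad-class idea concrete, here is a minimal sketch (the record structure and code values are hypothetical, not the COBISS/OPAC data model) that collapses detailed UDC codes to their top-level classes and tallies holdings per class, the kind of statistic the abstract mentions for collection reports:

```python
from collections import Counter

# Hypothetical records: each carries a full UDC code string.
records = [
    {"title": "Intro to librarianship", "udc": "027.7"},
    {"title": "Quantum mechanics primer", "udc": "530.145"},
    {"title": "Slovenian grammar", "udc": "811.163.6"},
]

# UDC main classes, keyed by the first digit of the code
# (labels abbreviated; the real scheme is far more detailed).
MAIN_CLASSES = {
    "0": "Science and knowledge. Organization. Information",
    "1": "Philosophy. Psychology",
    "2": "Religion. Theology",
    "3": "Social sciences",
    "5": "Mathematics. Natural sciences",
    "6": "Applied sciences. Medicine. Technology",
    "7": "The arts. Recreation. Sport",
    "8": "Language. Linguistics. Literature",
    "9": "Geography. Biography. History",
}

def broad_class(udc_code: str) -> str:
    """Collapse a detailed UDC code to its top-level class label."""
    return MAIN_CLASSES.get(udc_code[0], "Unknown")

# Count holdings per broad class for a simple collection report.
stats = Counter(broad_class(r["udc"]) for r in records)
for label, count in stats.most_common():
    print(f"{label}: {count}")
```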
|
2 |
DescribeX: A Framework for Exploring and Querying XML Web Collections
Rizzolo, Flavio Carlos, 26 February 2009
The nature of semistructured data in web collections is evolving. Even when XML web documents are valid with regard to a schema, the actual structure of such documents exhibits significant variations across collections for several reasons: an XML schema may be very lax (e.g., to accommodate the flexibility needed to represent collections of documents in RSS feeds), a schema may be large and different subsets used for different documents (e.g., this is common in industry standards like UBL), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). A schema alone may not provide sufficient information for many data management tasks that require knowledge of the actual structure of the collection.
Web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly large web collections containing hundreds of thousands of XML documents. Even when tasks only need to deal with a single document at a time, developers benefit from understanding the behaviour of XPath expressions across multiple documents (e.g., what will a query return when run over the thousands of hourly feeds collected during the last few months?). Dealing with the highly variable structure of such web collections poses additional challenges.
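As an illustration of the file-at-a-time usage pattern described here, the following sketch (a hypothetical directory layout and query, using Python's built-in ElementTree and its limited XPath subset) runs one path query over every feed in a collection:

```python
from pathlib import Path
from xml.etree import ElementTree as ET

# Hypothetical layout: one directory of collected hourly RSS feeds.
FEED_DIR = Path("feeds")          # assumption: *.xml files live here
QUERY = ".//item/title"           # ElementTree's limited XPath subset

def run_over_collection(feed_dir: Path, query: str):
    """Evaluate the same path query file-at-a-time over a whole collection."""
    hits_per_doc = {}
    for path in sorted(feed_dir.glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue                      # skip malformed feeds
        hits_per_doc[path.name] = [el.text for el in root.findall(query)]
    return hits_per_doc

if __name__ == "__main__":
    results = run_over_collection(FEED_DIR, QUERY)
    print(f"{len(results)} documents parsed")
```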
This thesis introduces DescribeX, a framework for describing arbitrarily complex XML summaries of web collections that supports more efficient evaluation of XPath workloads. DescribeX permits the declarative description of document structure using all axes and language constructs in XPath, and generalizes many of the XML indexing and summarization approaches in the literature. DescribeX supports the construction of heterogeneous summaries in which sets of document elements sharing a common structure can be declaratively defined and refined by means of path regular expressions on axes, or axis path regular expressions (AxPREs). DescribeX can significantly help in understanding both the structure of complex, heterogeneous XML collections and the behaviour of XPath queries evaluated on them.
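To make the notion of a structural summary concrete, this sketch (a deliberate simplification; it is not the DescribeX algorithm and does not implement AxPREs) partitions the elements of a small collection by their incoming label path, one classic criterion for grouping structurally similar nodes:

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

def label_path_partition(xml_documents):
    """Group element occurrences by their root-to-element label path.

    Each partition block plays the role of a summary node: all elements in a
    block share the same incoming label path (a very simple summary criterion;
    real DescribeX refinements are far more expressive).
    """
    partition = defaultdict(list)

    def walk(elem, path, doc_id):
        current = f"{path}/{elem.tag}"
        partition[current].append((doc_id, elem))
        for child in elem:
            walk(child, current, doc_id)

    for doc_id, xml_text in enumerate(xml_documents):
        walk(ET.fromstring(xml_text), "", doc_id)
    return partition

# Two small RSS-like documents with slightly different structure.
docs = [
    "<rss><channel><item><title>a</title></item></channel></rss>",
    "<rss><channel><item><title>b</title><enclosure/></item></channel></rss>",
]

for path, nodes in label_path_partition(docs).items():
    print(f"{path}: {len(nodes)} element(s)")
```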
Experimental results demonstrate the scalability of DescribeX summary refinements and stabilizations (the key enablers for tailoring summaries) with multi-gigabyte web collections. A comparative study suggests that using a DescribeX summary created from a given workload can produce query evaluation times orders of magnitude better than using existing summaries. DescribeX’s light-weight approach of combining summaries with a file-at-a-time XPath processor can be a very competitive alternative, in terms of performance, to conventional fully-fledged XML query engines that provide DB-like functionality such as security, transaction processing, and native storage.
|
4 |
Analyzing the Use of Plain Language in Brief Summaries on ClinicalTrials.gov
Eddington, Megan J, 01 January 2024
ClinicalTrials.gov is a database designed to help clinical researchers make their research publicly available. Each clinical trial registered in the database includes a brief summary, which is meant to be a short description that the public can easily understand. In September 2022, ClinicalTrials.gov published a "Plain Language Checklist for Lay Brief Summaries" on its website, which identifies plain language best practices intended to help investigators craft summaries that can be readily understood by the public. This thesis assesses the impact of the checklist on language use in brief summaries in the year following the checklist's publication. The analysis examines 62 brief summaries for Phase III and IV clinical trials posted on ClinicalTrials.gov between September 26, 2022, and September 26, 2023. It focuses particularly on summaries associated with rheumatoid arthritis, knee replacement, and conjunctivitis to gauge how well they complied with 4 of the 19 criteria on the Plain Language Checklist: keeping sentences and paragraphs short, aiming for a 6th to 8th grade reading level, writing out acronyms on first use, and providing both percentages and natural frequencies. It also examines rhetorical moves made in the summaries involving the use of jargon, key term definitions, headings, formatted lists, direct research questions, descriptions of study type, sentence fragments, and the placement of the purpose statement, to see how these moves affected the plain language. Although the summaries tended to comply with the paragraph length guidelines, they did not comply with the sentence length, reading level, or acronym guidelines. The variation in compliance could be attributed to researchers' lack of awareness of the guidelines, their lack of time to devote to creating brief summaries, or their being too immersed in the field to imagine the needs of a lay audience. It could also be attributed to the National Institutes of Health not enforcing the guidelines or to researchers not viewing the guidelines as relevant.
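Two of the four criteria examined here, sentence length and a 6th to 8th grade reading level, are easy to check automatically. The sketch below is only an illustration of how such a check could look (the syllable counter is a crude heuristic and the scoring is not the thesis's method); it reports average sentence length and the Flesch-Kincaid grade level of a brief summary:

```python
import re

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic; real readability tools use dictionaries."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability_report(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    # Flesch-Kincaid grade level formula.
    fk_grade = 0.39 * words_per_sentence + 11.8 * (syllables / len(words)) - 15.59
    return {
        "avg_sentence_length": round(words_per_sentence, 1),
        "fk_grade": round(fk_grade, 1),
        "meets_6_8_grade_target": fk_grade <= 8.0,
    }

# Hypothetical lay brief summary, not taken from ClinicalTrials.gov.
summary = ("This study tests a new knee brace. "
           "We want to learn if it lowers pain after surgery.")
print(readability_report(summary))
```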
|
5 |
IMPROVED DOCUMENT SUMMARIZATION AND TAG CLOUDS VIA SINGULAR VALUE DECOMPOSITION
Provost, James, 25 September 2008
Automated summarization is a difficult task. World-class summarizers can provide only "best guesses" of which sentences encapsulate the important content within a set of documents. Even as automated systems improve, users are still not given the means to observe complex relationships between seemingly independent concepts. In this research we used singular value decompositions to organize concepts and determine the best candidate sentences for an automated summary. The results of this straightforward attempt were comparable to those of world-class summarizers. We then included a clustered tag cloud, using a singular value decomposition to measure term "interestingness" with respect to the set of documents. The combination of best candidate sentences and tag clouds provided a more inclusive summary than a traditionally developed summarizer alone. / Thesis (Master, Computing) -- Queen's University, 2008-09-24 16:31:25.261
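A minimal sketch of the general LSA-style idea described above (build a term-sentence matrix, take its SVD, and score sentences and terms by their weight in the leading latent concepts) is given below. The scoring conventions are common choices, not necessarily the thesis's exact formulation:

```python
import numpy as np
from collections import Counter

def lsa_summary(sentences, n_pick=2, k=2):
    """Rank sentences and terms via SVD of a term-sentence count matrix."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w, c in Counter(s.lower().split()).items():
            A[index[w], j] = c

    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    # Sentence salience: weighted length of its projection onto the top-k concepts.
    sent_scores = np.sqrt((S[:k, None] ** 2 * Vt[:k, :] ** 2).sum(axis=0))
    # Term "interestingness" for the tag cloud: same idea on the term side.
    term_scores = np.sqrt((S[:k, None] ** 2 * U[:, :k].T ** 2).sum(axis=0))

    best_sents = [sentences[j] for j in np.argsort(-sent_scores)[:n_pick]]
    cloud = sorted(zip(vocab, term_scores), key=lambda t: -t[1])[:5]
    return best_sents, cloud

sents = [
    "The cat sat on the mat",
    "Dogs and cats are common pets",
    "The stock market fell sharply today",
]
summary, tags = lsa_summary(sents)
print(summary)
print(tags)
```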
|
6 |
O gênero resumo na universidade: dialogismo e responsividade em resumos de alunos ingressantes / The summary genre at university: dialogism and responsivity in first-year students' summaries
Costa, Cristina Fontes de Paula, 23 August 2018
Advisor: Raquel Salek Fiad / Master's dissertation - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem
Abstract: In this work, we discuss a genre that is very common in the school and academic spheres: the summary. Establishing a dialogue between Bakhtin's (1990, 1992) dialogic orientation and Street's (2003) concept of literacy practices, our goal is to observe, in summaries of an opinion article produced by first-year students who participated in a university program, indications of reading and writing practices prior to the academic sphere, and to detect, through the linguistic materiality of the texts, dialogues with other texts, genres and practices. With a Sherlock Holmes-like eye, we adopt as methodology the evidential paradigm proposed by Ginzburg (1989), which guided us toward the particular, toward the singularity of each text. In all the analyses we highlight the dialogic nature of language; accordingly, we also analyze the base text itself which, being an opinion article written by a scientist, already represents a dialogue between the journalistic and scientific spheres. In the summaries, we detect a main dialogue with three school writing practices: the summary genre itself, the school essay and science popularization genres.
We find that writing a summary is a complex activity that involves knowledge of and familiarity with the genre of the base text, dialogues with what has already been said (and with what is yet to be said), with other spheres, with reading and writing practices, and with the assessment to be made by the teacher, which is always present in any writing practice in the school sphere. Considering that any manifestation of singularity is always a response to the social context, these dialogues can be read as a response to the school institution and its practices. The dialogue with school practices reflects the historical moment of the students who, as first-year university students, do not shed earlier literacy practices overnight. We also observe that most of the analyzed summaries dialogue with authoritarian practices of reading and writing, in which the student is expected only to reproduce, to revoice what has already been said. This says a great deal about text production at school, which may be privileging reproduction rather than reflection. / Master's degree / Mother Tongue / Master in Applied Linguistics
|
7 |
Fouille de données par extraction de motifs graduels : contextualisation et enrichissement / Data mining based on gradual itemsets extraction: contextualization and enrichment
Oudni, Amal, 09 July 2014
This thesis is set in the framework of knowledge extraction and data mining applied to numerical or fuzzy data, with the aim of extracting linguistic summaries in the form of gradual itemsets: the latter express correlated co-variations of attribute values, of the form « the more the temperature increases, the more the pressure increases ». Our goal is to contextualize and enrich these gradual itemsets by proposing several types of additional information, so as to increase their quality and provide a better interpretation. We propose four forms of new itemsets. First, reinforced gradual itemsets perform, in the case of fuzzy data, a contextualization by integrating additional attributes, adding clauses linguistically introduced by the expression « all the more ». They can be illustrated by the example « the more the temperature decreases, the more the volume of air decreases, all the more so as its density increases ». Reinforcement is interpreted as increased validity of the gradual itemset. We also study the transposition of the notion of reinforcement to classical association rules, discussing its possible interpretations and showing its limited contribution.
We then address the problem of contradictory gradual itemsets, which arise for example when « the more the temperature increases, the more the humidity increases » and « the more the temperature increases, the more the humidity decreases » are extracted simultaneously. To manage these contradictions, we propose a constrained definition of the support of a gradual itemset which, in particular, depends not only on the considered itemset but also on its potential contradictors. We also propose two extraction methods: the first filters contradictory itemsets after all itemsets have been generated, and the second integrates the constrained support into the generation process. Finally, we introduce characterized gradual itemsets, defined by adding a clause linguistically introduced by the expression « especially if », as in « the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C »: the additional clause specifies value ranges on which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed on the identified interval, namely its size and its validity, together with an extension that takes data density into account. We propose an automatic extraction method based on mathematical morphology tools and the definition of an appropriate filter and transcription.
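One common way to quantify the support of such gradual itemsets, shown here purely as an illustrative convention rather than the constrained support defined in the thesis, is the fraction of object pairs whose values vary concordantly across all attributes:

```python
from itertools import combinations

def gradual_support(rows, attributes, directions):
    """Fraction of object pairs ordered consistently with every (attribute, direction).

    directions[i] is +1 for "the more ... increases" and -1 for "the more ... decreases".
    This pairwise-concordance view is one classical formalization of gradual support.
    """
    concordant = 0
    pairs = list(combinations(rows, 2))
    for a, b in pairs:
        diffs = [(b[attr] - a[attr]) * d for attr, d in zip(attributes, directions)]
        # A pair supports the itemset if, in one orientation, all co-variations agree.
        if all(x > 0 for x in diffs) or all(x < 0 for x in diffs):
            concordant += 1
    return concordant / len(pairs) if pairs else 0.0

# Toy data, not from the thesis.
data = [
    {"temperature": 10, "pressure": 1000},
    {"temperature": 15, "pressure": 1010},
    {"temperature": 20, "pressure": 1025},
    {"temperature": 25, "pressure": 1020},
]

# "The more the temperature increases, the more the pressure increases"
print(gradual_support(data, ["temperature", "pressure"], [+1, +1]))
```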
|
8 |
Semantics-driven Abstractive Document Summarization
Alambo, Amanuel, 02 August 2022
No description available.
|
9 |
Système symbolique de création de résumés de mise à jour (A symbolic system for producing update summaries)
Genest, Pierre-Étienne, January 2009
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal.
|
10 |
Resumo de artigo de opinião na perspectiva dos estudos linguísticos da microestrutura e da macroestrutura textual (Opinion-article summaries from the perspective of linguistic studies of textual microstructure and macrostructure)
Moraes, Otávio Brasil de, 07 August 2017
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Fundação de Amparo à Pesquisa do Estado do Amazonas (FAPEAM)
In this dissertation, our general objective is to propose the use of the notions of microstructure and macrostructure in the production of summaries of opinion articles. Throughout the research, carried out with secondary-school students at a state school in the city of Manaus, we highlight significant differences between summaries produced in the “traditional” manner and those produced according to our proposal. Our theoretical basis is the Textual Linguistics of text grammars, especially the proposal of van Dijk (1996); we also draw on the concept of text developed in the 1970s and 1980s by van Dijk and Kintsch, as well as on more recent studies of summary writing in this perspective, such as Marquesi (2004), Leite (2006), Delphino (1991) and Machado (2004). Methodologically, the research proceeded as follows: in the first class, we asked the students to produce a summary of an opinion article following the traditional approach to teaching summary writing, which generally emphasizes only identifying the main ideas of the text; in the second class, we worked on the concepts of microstructure and macrostructure for the production of summaries; and in the third class, we asked the students to produce a second summary using those notions. We then analyzed twenty-eight summaries produced by 14 students, which allowed us to verify the positive contributions of the proposal based on textual macrostructures. The data analysis chapter presents, as examples, 6 of the 28 summaries analyzed, 3 (three) written from the traditional perspective and 3 (three) written within the micro/macrostructure framework. Finally, we discuss the results, which clearly point to the contributions of the study of textual microstructure and macrostructure to the production of summaries.
|