Global ETD Search

1. A generic approach towards the collaborative construction of digital scholarly editions
Barrellon, Vincent. 27 November 2017.
Digital scholarly editions are critically annotated literary heritage resources in digital form. Such an edition takes the shape of a transcription of the original resources, augmented with a critical apparatus, that is, with structured data. In a collaborative setting, the structure of these data is explicitly defined by a schema, an interpretable document that constrains how editors may annotate the primary resources and thereby keeps their work consistent with a common editorial policy. Digital editing projects classically face two technical problems. The first is the limited expressiveness of annotation languages, which prevents certain useful kinds of information from being expressed. The second is that, in practice, the schemas underlying a scholarly edition have to evolve while the edition is being produced; however, amending a schema requires updating all the structured data it validates. This is usually done by hand by the editors, with ad hoc scripts, unless the editors, lacking time or resources, give up on evolving the data structure altogether. In this work, we define the theoretical foundations of an editing system dedicated to digital scholarly editions. We define eAG, a stand-off annotation model based on cyclic graphs, which allows greater expressiveness. We define SeAG, a novel schema language that validates eAG documents on the fly, while they are being built.
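The expressiveness problem described in this abstract can be illustrated with overlapping annotations, which a single XML tree cannot hold. The following Python is a generic sketch, not the eAG model: it assumes a hypothetical stand-off representation in which each annotation is a label over a half-open character range stored apart from the text, and it lists the pairs of annotations that partially overlap and therefore could not both be encoded as nested XML elements.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: a label over the half-open range [start, end) of the text."""
    label: str
    start: int
    end: int


def partially_overlap(a: Annotation, b: Annotation) -> bool:
    """True if the ranges intersect but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_contains_b = a.start <= b.start and b.end <= a.end
    b_contains_a = b.start <= a.start and a.end <= b.end
    return intersect and not (a_contains_b or b_contains_a)


def xml_conflicts(annotations: list[Annotation]) -> list[tuple[str, str]]:
    """Pairs of annotations that cannot both be elements of one well-formed XML tree."""
    return [
        (a.label, b.label)
        for a, b in combinations(annotations, 2)
        if partially_overlap(a, b)
    ]


# The text itself is never modified; annotations only point into it.
text = "the rose is red the sky is blue"
split = text.find(" the sky")  # end of the first verse line
phrase_start = text.find("red")
annotations = [
    Annotation("line1", 0, split),
    Annotation("line2", split + 1, len(text)),
    # A phrase running across the line break, overlapping both verse lines.
    Annotation("phrase", phrase_start, phrase_start + len("red the sky")),
]

print(xml_conflicts(annotations))  # [('line1', 'phrase'), ('line2', 'phrase')]
```

Nested annotations (a word inside a line) produce no conflict; only partial overlaps do, and a stand-off or graph-based representation can store them side by side.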
We also define an inline markup syntax for eAG, reminiscent of classic annotation languages such as XML but retaining the expressiveness of eAG. Finally, we propose a bidirectional algebra for eAG documents such that, when a SeAG schema S is amended into S', any eAG I validated by S is semi-automatically translated into an eAG I' validated by S', and any modification applied to I (resp. I') is semi-automatically propagated to I' (resp. I). The algebra thus serves as an assistance tool for the evolution of SeAG schemas and eAG annotations.
Keywords: Information technology; Multistructured documents; Digital humanities; Annotation; Bidirectional transformations
006.740 72
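Bidirectional transformations of the kind described in the first record are often formalized as lenses: a `get` function derives a view from a source, and a `put` function writes an edited view back into the source without losing what the view omits. The Python below is a hypothetical asymmetric lens over a single annotation record, not the eAG algebra: it assumes a schema change from S to S' that renames the tag `pers` to `person` and drops a `note` field, and it checks the two round-trip laws (GetPut and PutGet) that keep propagation consistent in both directions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OldAnnotation:
    """Record valid under the hypothetical old schema S."""
    tag: str
    text: str
    note: str  # field that the new schema S' no longer keeps


@dataclass(frozen=True)
class NewAnnotation:
    """Record valid under the hypothetical new schema S'."""
    tag: str
    text: str


# Hypothetical schema change: the tag "pers" is renamed "person".
# Assumes "person" is not a valid tag under S and "pers" is not valid under S'.
RENAME = {"pers": "person"}
UNRENAME = {new: old for old, new in RENAME.items()}


def get(source: OldAnnotation) -> NewAnnotation:
    """Forward direction: translate an S-valid record into an S'-valid one."""
    return NewAnnotation(RENAME.get(source.tag, source.tag), source.text)


def put(view: NewAnnotation, source: OldAnnotation) -> OldAnnotation:
    """Backward direction: push an edited view back, restoring the dropped note from the old source."""
    return OldAnnotation(UNRENAME.get(view.tag, view.tag), view.text, source.note)


def well_behaved(source: OldAnnotation, view: NewAnnotation) -> bool:
    """GetPut: an unedited view changes nothing. PutGet: an edit survives the round trip."""
    return put(get(source), source) == source and get(put(view, source)) == view


i = OldAnnotation("pers", "Achilles", "hero of the Iliad")
i_prime = get(i)                           # NewAnnotation(tag='person', text='Achilles')
edited = replace(i_prime, text="Achille")  # an editor corrects the view I'
print(put(edited, i))                      # OldAnnotation(tag='pers', text='Achille', note='hero of the Iliad')
print(well_behaved(i, edited))             # True
```

The laws only hold under the stated assumption that the rename is a bijection between the tags valid under S and those valid under S'; a symmetric setting, where both sides may carry information the other lacks, needs more machinery than this sketch.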
2. An XML document representation method based on structure and content: application in technical document classification
Chagheri, Samaneh. 27 September 2012.
The rapid growth in the number of electronically stored documents is a challenge for automatic document classification. Traditional classification systems treat documents as plain text, yet documents are increasingly structured; XML, for example, is the best-known and most widely used standard for representing structured documents. Such documents carry additional information about how their content is organized, expressed through elements such as titles, sections, and captions. To take the information held in the logical structure into account, we propose a representation of structured documents based on both their logical structure and their textual content. Our approach extends the traditional document representation model, the Vector Space Model (VSM), by integrating structural information into every phase of building the representation: feature extraction, feature selection, and feature weighting. Our second contribution applies this generic approach to a real domain, technical documentation: we use it to classify technical documents stored electronically at CONTINEW, a company specializing in technical document auditing. These documents are in legacy presentation formats in which the logical structure is not accessible, so we propose a document understanding approach that recovers their logical structure from their physical layout.
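One common way to make a vector space model structure-aware is to pair each term with the logical element it occurs in, so that the same word in a title and in a caption become distinct features, and to scale term weights by element. The Python below is a hypothetical sketch of that idea, covering feature extraction and tf-idf weighting with made-up element weights; it omits feature selection and is not the representation defined in the thesis.

```python
import math
from collections import Counter

# Hypothetical importance of each logical element; not values from the thesis.
ELEMENT_WEIGHT = {"title": 2.0, "section": 1.0, "caption": 0.5}


def extract_features(doc: list[tuple[str, str]]) -> Counter:
    """Count (element, term) pairs, so a term is distinguished by where it appears."""
    counts = Counter()
    for element, text in doc:
        for term in text.lower().split():
            counts[(element, term)] += 1
    return counts


def weight_features(docs: list[list[tuple[str, str]]]) -> list[dict[tuple[str, str], float]]:
    """Weight each feature by tf * idf * element weight (unknown elements get weight 1.0)."""
    features = [extract_features(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents in which each (element, term) pair occurs.
    df = Counter(feature for counts in features for feature in counts)
    return [
        {
            feature: tf * math.log(n_docs / df[feature]) * ELEMENT_WEIGHT.get(feature[0], 1.0)
            for feature, tf in counts.items()
        }
        for counts in features
    ]


docs = [
    [("title", "Pump maintenance"), ("section", "check the pump seal")],
    [("title", "Valve audit"), ("caption", "pump diagram")],
]
vectors = weight_features(docs)
# "pump" yields three distinct features, weighted about 1.39, 0.69 and 0.35.
print(vectors[0][("title", "pump")], vectors[0][("section", "pump")], vectors[1][("caption", "pump")])
```

A feature that occurs in every document gets an idf of zero, as in plain tf-idf; a real system would also need the selection step and a classifier on top of these vectors.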
Thus a heterogeneous collection of documents in various physical presentations and storage formats is transformed into a homogeneous collection of XML documents sharing the same logical schema. This step relies on supervised learning, in which each logical element is described by its physical characteristics. Our proposal therefore covers the whole document processing workflow, from a document's original format to its classification. SVM is used as the classification algorithm.
Keywords: Computer science; Structured documents; XML documents; Supervised classification; Logical structure; Restructuring
006.740 72
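The structure-recovery step of the second record, in which each logical element is described by its physical characteristics, can be illustrated with a very small supervised labeler. The Python below is a sketch under stated assumptions: the features (font size, bold flag, left indent) and training blocks are hypothetical, and a nearest-centroid classifier stands in for the learner purely for brevity. It is neither the SVM used for classification in the thesis nor the thesis's structure-recognition method.

```python
import math
from collections import defaultdict

# Hypothetical physical features of a text block: (font size in pt, bold flag, left indent in cm).
Block = tuple[float, float, float]


def train_centroids(examples: list[tuple[Block, str]]) -> dict[str, Block]:
    """Average the physical features of the training blocks of each logical label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def label_block(centroids: dict[str, Block], features: Block) -> str:
    """Assign the logical label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


training = [
    ((18.0, 1.0, 0.0), "title"),
    ((16.0, 1.0, 0.0), "title"),
    ((11.0, 0.0, 0.0), "paragraph"),
    ((10.0, 0.0, 0.0), "paragraph"),
    ((9.0, 0.0, 1.0), "caption"),
    ((8.0, 0.0, 1.0), "caption"),
]
centroids = train_centroids(training)
print(label_block(centroids, (17.5, 1.0, 0.0)))  # title
print(label_block(centroids, (8.0, 0.0, 1.0)))   # caption
```

With unscaled features the font size dominates the distance, so a practical version would normalize each feature first; a discriminative learner such as an SVM could then replace the centroid rule.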
