Global ETD Search

521	Schémas de classification et repérage des documents administratifs électroniques dans un contexte de gestion décentralisée des ressources informationnelles Mas, Sabine 05 1900 (has links) Les employés d’un organisme utilisent souvent un schéma de classification personnel pour organiser les documents électroniques qui sont sous leur contrôle direct, ce qui suggère la difficulté pour d’autres employés de repérer ces documents et la perte possible de documentation pour l’organisme. Aucune étude empirique n’a été menée à ce jour afin de vérifier dans quelle mesure les schémas de classification personnels permettent, ou même facilitent, le repérage des documents électroniques par des tiers, dans le cadre d’un travail collaboratif par exemple, ou lorsqu’il s’agit de reconstituer un dossier. Le premier objectif de notre recherche était de décrire les caractéristiques de schémas de classification personnels utilisés pour organiser et classer des documents administratifs électroniques. Le deuxième objectif consistait à vérifier, dans un environnement contrôlé, les différences sur le plan de l’efficacité du repérage de documents électroniques qui sont fonction du schéma de classification utilisé. Nous voulions vérifier s’il était possible de repérer un document avec la même efficacité, quel que soit le schéma de classification utilisé pour ce faire. Une collecte de données en deux étapes fut réalisée pour atteindre ces objectifs. Nous avons d’abord identifié les caractéristiques structurelles, logiques et sémantiques de 21 schémas de classification utilisés par des employés de l’Université de Montréal pour organiser et classer les documents électroniques qui sont sous leur contrôle direct. Par la suite, nous avons comparé, à partir d'une expérimentation contrôlée, la capacité d’un groupe de 70 répondants à repérer des documents électroniques à l’aide de cinq schémas de classification ayant des caractéristiques structurelles, logiques et sémantiques variées. 
Trois variables ont été utilisées pour mesurer l’efficacité du repérage : la proportion de documents repérés, le temps moyen requis (en secondes) pour repérer les documents et la proportion de documents repérés dès le premier essai. Les résultats révèlent plusieurs caractéristiques structurelles, logiques et sémantiques communes à une majorité de schémas de classification personnels : macro-structure étendue, structure peu profonde, complexe et déséquilibrée, regroupement par thème, ordre alphabétique des classes, etc. Les résultats des tests d’analyse de la variance révèlent des différences significatives sur le plan de l’efficacité du repérage de documents électroniques qui sont fonction des caractéristiques structurelles, logiques et sémantiques du schéma de classification utilisé. Un schéma de classification caractérisé par une macro-structure peu étendue et une logique basée partiellement sur une division par classes d’activités augmente la probabilité de repérer plus rapidement les documents. Au plan sémantique, une dénomination explicite des classes (par exemple, par utilisation de définitions ou en évitant acronymes et abréviations) augmente la probabilité de succès au repérage. Enfin, un schéma de classification caractérisé par une macro-structure peu étendue, une logique basée partiellement sur une division par classes d’activités et une sémantique qui utilise peu d’abréviations augmente la probabilité de repérer les documents dès le premier essai. / The employees of an organization often use a personal classification scheme to organize electronic documents residing on their own workstations. As this may make it hard for other employees to retrieve these documents, there is a risk for the organization of losing track of needed documentation. 
To this day, no empirical study has been conducted to verify whether personal classification schemes allow, or even facilitate, the retrieval of documents created and classified by someone else, in collaborative work, for example, or when it becomes necessary to reconstruct a “dossier”. The first objective of our research was to describe the characteristics of personal classification schemes used to organize and classify administrative electronic documents. Our second objective was to verify, in a controlled environment, differences in retrieval effectiveness linked to the characteristics of classification schemes. More precisely, we wanted to verify whether it was possible to find a document with the same effectiveness, whatever the classification scheme used. Data were collected in two stages to reach those objectives. We first identified the structural, logical and semantic characteristics of 21 classification schemes used by Université de Montréal employees to organize and classify electronic documents residing on their own workstations. We then compared, in a controlled experiment, the ability of 70 participants to find electronic documents with the help of five classification schemes exhibiting variations in their structural, logical and semantic characteristics. Three variables were used to measure retrieval effectiveness: the proportion of documents found, the average time needed (in seconds) to locate the documents and the proportion of documents found on the first try. Results revealed many structural, logical and semantic characteristics common to a majority of personal classification schemes: extended macro-structures, shallow, complex and unbalanced structures, thematic grouping, alphabetical order of classes, etc. An analysis of variance revealed significant differences in retrieval effectiveness that are related to the structural, logical and semantic characteristics of the classification scheme. 
A classification scheme characterized by a narrow macro-structure and a logic based partly on classes of activities increases the probability of finding documents more rapidly. On the semantic level, more explicit denominations of classes (for example, by using definitions or avoiding acronyms and abbreviations) increase the probability of success in finding documents. Finally, a classification scheme characterized by a narrow macro-structure, a logic based partly on classes of activities, and semantics that use few abbreviations minimizes the risk of error and failure in retrieval. Schéma de classification Classification Repérage Document électronique Document administratif Archives Gestion personnelle de l’information Organisation des documents Théorie de la classification Principes archivistiques Classification schemes Classification Electronic records Administrative records Archives Personal information management Document organization Classification theory Archival principles Retrieval
522	Scalable Detection and Extraction of Data in Lists in OCRed Text for Ontology Population Using Semi-Supervised and Unsupervised Active Wrapper Induction Packer, Thomas L 01 October 2014 (has links) (PDF) Lists of records in machine-printed documents contain much useful information. As one example, the thousands of family history books scanned, OCRed, and placed on-line by FamilySearch.org probably contain hundreds of millions of fact assertions about people, places, family relationships, and life events. Data like this cannot be fully utilized until a person or process locates the data in the document text, extracts it, and structures it with respect to an ontology or database schema. Yet, in the family history industry and other industries, data in lists goes largely unused because no known approach adequately addresses all of the costs, challenges, and requirements of a complete end-to-end solution to this task. The diverse information is costly to extract because many kinds of lists appear even within a single document, differing from each other in both structure and content. The lists' records and component data fields are usually not set apart explicitly from the rest of the text, especially in a corpus of OCRed historical documents. OCR errors and the lack of document structure (e.g. HTML tags) make list content hard to recognize by a software tool developed without a substantial amount of highly specialized, hand-coded knowledge or machine learning supervision. Making an approach that is not only accurate but also sufficiently scalable in terms of time and space complexity to process a large corpus efficiently is especially challenging. In this dissertation, we introduce a novel family of scalable approaches to list discovery and ontology population. Its contributions include the following. We introduce the first general-purpose methods of which we are aware for both list detection and wrapper induction for lists in OCRed or other plain text. 
We formally outline a mapping between in-line labeled text and populated ontologies, effectively reducing the ontology population problem to a sequence labeling problem, opening the door to applying sequence labelers and other common text tools to the goal of populating a richly structured ontology from text. We provide a novel admissible heuristic for inducing regular expression wrappers using an A* search. We introduce two ways of modeling list-structured text with a hidden Markov model. We present two query strategies for active learning in a list-wrapper induction setting. Our primary contributions are two complete and scalable wrapper-induction-based solutions to the end-to-end challenge of finding lists, extracting data, and populating an ontology. The first has linear time and space complexity and extracts highly accurate information at a low cost in terms of user involvement. The second has time and space complexity that are linear in the size of the input text and quadratic in the length of an output record and achieves higher F1-measures for extracted information as a function of supervision cost. We measure the performance of each of these approaches and show that they perform better than strong baselines, including variations of our own approaches and a conditional random field-based approach. information extraction data ontology conceptual modeling ontology population grammar induction wrapper induction hidden Markov model HMM regular expression regex OCR plain text OCRed text document list active learning unsupervised active learning document analysis and recognition historical document Computer Sciences
523	Die institusionele beeld en die impak daarvan op die kommunikasie binne die Universiteit van Stellenbosch Pienaar, Marguerite 03 1900 (has links) Thesis (MPhil (Afrikaans and Dutch. Document Analysis))--University of Stellenbosch, 2007. / In this study, research has been done on the institutional image of Stellenbosch University (SU). The impact of the true and desired image on the written communication of the SU has been researched to determine how much influence the SU can have on the forming of the institutional image and how it can be improved to correlate more closely with the desired image of the SU. The focus was, more specifically, on the written communication of the Registrar’s division of the SU and their institutional documents. The institutional image was tested by means of questionnaires filled in by the students of the SU. The groups were selected in accordance with the population profile of the SU to be statistically representative of the true population studying at the SU. Written communication -- Stellenbosch Business communication Dissertations -- Document design Theses -- Document design
524	Creating and Maintaining Consistent Documents with Elucidative Development Bartho, Andreas 20 September 2016 (has links) (PDF) Software systems usually consist of multiple artefacts, such as requirements, class diagrams, or source code. Documents, such as specifications and documentation, can also be viewed as artefacts. In practice, however, writing and updating documents is often neglected because it is expensive and brings no immediate benefit. Consequently, documents are often outdated and communicate wrong information about the software. The price is paid later when a software system must be maintained and much implicit knowledge that existed at the time of the original development has been lost. A simple way to keep documents up to date is generation. However, not all documents can be fully generated. Usually, at least some content must be written by a human author. This handwritten content is lost if the documents must be regenerated. In this thesis, Elucidative Development is introduced. It is an approach to create documents by partial generation. Partial generation means that some parts of the document are generated whereas others are handwritten. Elucidative Development retains manually written content when the document is regenerated. An integral part of Elucidative Development is a guidance system, which informs the author about changes in the generated content and helps him update the handwritten content. / Softwaresysteme setzen sich üblicherweise aus vielen verschiedenen Artefakten zusammen, zum Beispiel Anforderungen, Klassendiagrammen oder Quellcode. Dokumente, wie zum Beispiel Spezifikationen oder Dokumentation, können auch als Artefakte betrachtet werden. In der Praxis wird aber das Schreiben und Aktualisieren von Dokumenten oft vernachlässigt, weil es zum einen teuer ist und zum anderen keinen unmittelbaren Vorteil bringt. Dokumente sind darum häufig veraltet und vermitteln falsche Informationen über die Software. 
Den Preis muss man später zahlen, wenn die Software gepflegt wird, weil viel von dem impliziten Wissen, das zur Zeit der Entwicklung existierte, verloren ist. Eine einfache Möglichkeit, Dokumente aktuell zu halten, ist Generierung. Allerdings können nicht alle Dokumente generiert werden. Meist muss wenigstens ein Teil von einem Menschen geschrieben werden. Dieser handgeschriebene Inhalt geht verloren, wenn das Dokument neu generiert werden muss. In dieser Arbeit wird das Elucidative Development vorgestellt. Dabei handelt es sich um einen Ansatz zur Dokumenterzeugung mittels partieller Generierung. Das bedeutet, dass Teile eines Dokuments generiert werden und der Rest von Hand ergänzt wird. Beim Elucidative Development bleibt der handgeschriebene Inhalt bestehen, wenn das restliche Dokument neu generiert wird. Ein integraler Bestandteil von Elucidative Development ist darüber hinaus ein Hilfesystem, das den Autor über Änderungen an generiertem Inhalt informiert und ihm hilft, den handgeschriebenen Inhalt zu aktualisieren. Elucidative Development Redundanz Konsistenz Inkonsistenz Transkonsistenz Transklusion Dokumenterzeugung Dokumentverwaltung DEFT Elucidative Development Redundancy Consistency Inconsistency Transconsistency Transclusion Document Generation Document Management DEFT ddc:000 rvk:ST 230
525	L'écrit électronique Senécal, François 08 1900 (has links) Les technologies de l’information entraînent de profondes transformations dans nos façons d’apprendre et de socialiser ; de lire et d’écrire. Ces changements ne sont pas sans conséquence sur de nombreuses institutions, juridiques ou non. Créées au fil du temps et adaptées à une réalité qu’elles avaient internalisée, elles doivent aujourd’hui comprendre et s’adapter au changement. L’écrit est une de ces institutions. Sa place dans le droit civil est le fruit de centaines d’années de cohabitation et le droit y a vu un allié stable. Mais autrefois facilitateur, l’écrit devient obstacle alors que les technologies de l’information, affranchies du papier, sont utilisées dans des situations juridiques. Comment adapter la notion d’écrit – et celles de l’original et de la signature – alors qu’il n’est question que de données abstraites sous forme numérique ? C’est là l’objet de ce mémoire. Suite à une étude de la notion d’écrit dans le temps, de son affirmation à son bouleversement, nous étudierons les outils juridiques (traditionnels ou récents, comme les principes de neutralité technologique et d’équivalence fonctionnelle) à la disposition du droit civil pour constamment s’adapter à des situations changeantes. Enfin, dans une perspective plus pratique, nous verrons le traitement qu’ont fait divers législateurs, de l’écrit électronique. Nous terminerons par une analyse plus précise des dispositions québécoises relatives à l’écrit électronique. Les principes étudiés dans ce mémoire sont susceptibles de s’appliquer à d’autres situations similaires. / Information technology has completely modified our way of learning, socialising, reading and writing. These changes have also affected numerous institutions. Developed over many years and adapted to a reality they internalised, they now have to understand the nature of the changes taking place and adapt to them. The legal concept of “writing” is such an institution. 
Its place in the realm of civil law is the result of hundreds of years of cohabitation. The legal system has found a great ally in “writings”. However, although “writing” has been seen as an enabler in the past, the use of information technologies in legal circumstances has turned it into an obstacle. How are we going to adapt the notion of writing – and those of original and signature – when talking about digital data? This is the topic of our thesis. Following a historical study of the concept of “writing”, from its inception to its current state of crisis, we will analyse the legal tools made available to civil law (whether they be traditional or recent, such as the principles of technological neutrality and functional equivalence) in order to adapt to a constantly changing technological landscape. On a more practical level, we will study how different legislators have addressed electronic documents. Our study will conclude with an analysis of Quebec legislation pertaining to electronic documents. The principles studied in this thesis should be applicable to other similar situations. Écrit Writing Document électronique Electronic document Preuve Proof Formalisme Formalism Neutralité technologique Technological neutrality Équivalence fonctionnelle Functional equivalence
526	Émergence du fumisme dans la production d'un nouvel esprit littéraire Tremblay, Charles-Étienne 08 1900 (has links) La présente thèse se veut une relecture du fumisme en tant que concept et mouvement historique daté (années 1860-1880) et situé (la France), ou moment qui représente une économie de sens qui a bouleversé les habitudes perceptuelles et intellectuelles de la réception depuis la seconde moitié du dix-neuvième siècle. Selon la lecture habituelle du fumisme, les productions des poètes et artistes fumistes, qualifiées de « fumisteries », ne forment qu’un chapitre, ou une catégorie négligeable, de l’histoire littéraire. Cette histoire confond le fumisme en tant que mouvement littéraire éphémère avec les épisodes décadent et symboliste pour le réduire à un concours de mystifications de bourgeois par des bohèmes en marge par rapport à l’institution littéraire organisées par le comédien Sapeck et l’écrivain Alphonse Allais, tous deux nommés ironiquement chefs de « l’École fumiste » vers 1880. Or, en offusquant la conception positiviste du langage qu’elle lui applique afin de le réduire à une simple provocation sans but, et en assimilant Rimbaud aux « fumisteries » des « décadents », la critique littéraire nous donne l’outil principal de démystification du fumisme en tant que pratique ou mode de production d’une économie de sens. C’est cette économie qui constitue notre principal point d’intérêt. Contemporain des épisodes décadent et symboliste, le moment fumiste oblige la réception à reconfigurer la façon de produire du sens. Les productions fumistes (essentiellement des poèmes et des caricatures, comme dans l’Album zutique, notre corpus principal) sont fondées sur une économie du rébus. 
Exemplifiée par le sonnet de Rimbaud intitulé « Voyelles », cette économie, qui crée des « documents », des textes inséparables de leur matière, introduit l’économie artistique du vingtième siècle – en particulier, au mode de perception cinématographique tel que fabriqué par le fumiste Émile Cohl. / This thesis focuses on a particular period in literary history that goes under the name of fumism. This “fumist” moment, which occurred during the years 1860-1880 in Paris, introduces a new economy of meaning that, in the latter part of the nineteenth century, leads to a transformation in readers’ perceptual and intellectual habits. In the perspective of institutionalized literary history, critics conceive fumist productions as “fumisteries” (which might be rendered as “nonsense”) and lump this ephemeral movement or literary school (“l’École fumiste”) together with decadent and symbolist literary movements, reducing it to so-called mystification contests organized by the comedian named Sapeck and the writer Alphonse Allais, both designated as leaders of “l’École fumiste” around 1880. Yet, rather than viewing fumist productions as aimless provocations and treating Rimbaud’s work as an example of this “fumisterie” and decadence, this thesis examines the underlying presuppositions of language that are operative in the novel understanding of literature it entails. In this perspective, “fumism,” as a theory of discourse and literary practice, signals the emergence of a new vision of language and literary production. Against this background, this thesis presents a detailed historical reading of fumism in the context of literary debates in late nineteenth-century France. At the same time, this study shows how the reception of fumist works leads to a transformed economy of meaning and, above all, to a reconfiguration of literary understanding. 
As this study details, fumist productions (essentially poems and caricatures that can be viewed in the Album zutique, the main corpus) are based on rebuses. Remarkably exemplified by Rimbaud’s controversial sonnet “Voyelles”, this new meaning economy creates what are termed “documents”, which place the materiality of the text on a par with its potential meanings. In interpreting this transformation, the thesis concludes by demonstrating how this new understanding of meaning lays the groundwork for the artistic economy of the twentieth century – in particular, with regard to the dynamic mode of perception introduced by the father of the animated film, the fumist Emile Cohl. Fumisme Moment fumiste Economie Production de sens Reception Rebus Metatexte Document Fumism Fumist period Economy Meaning production Reception Rebuses Metatext Document
527	Extracting and exploiting word relationships for information retrieval Cao, Guihong January 2008 (has links) Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal. Recherche d'information Information retrieval Modèle de langue Language Modeling Relation entre termes Word relationship Expansion de document Document expansion Expansion de requête Query expansion
528	La preuve par métadonnées Dicecca, Christopher 11 1900 (has links) L’entrée en vigueur de la Loi concernant le cadre juridique des technologies de l’information (ci-après la Loi), est la concrétisation de la prise en compte par le droit, de la preuve technologique. La notion de document technologique est à la fois centrale dans la Loi et dans le Code civil du Québec. Il s’est parfaitement intégré aux divers moyens de preuve du Code civil. Nous allons nous intéresser à cette notion qu’est le document technologique, mais davantage à ses éléments structurants, les métadonnées. Nous allons nous pencher sur la notion, ses origines et ses domaines de prédilection, faisant d’elles, un objet a priori essentiellement technologique, avant de les envisager dans un contexte de preuve. Nous allons voir quel potentiel probatoire les métadonnées représentent, à l’appui d’un document technologique. Enfin, nous nous interrogerons sur leur rôle probatoire autour des notions de copie-transfert et des obligations posées par la Loi, afin que ces deux modes de reproduction des documents, puissent légalement tenir lieu du document original, soit la certification et la documentation. / The entry into force of the Act to establish a legal framework for information technology (hereafter «the Law») gives concrete form to the law's recognition of technological evidence. The notion of technological document is central both to this Law and to the Civil Code of Québec. It is fully integrated into the different means of evidence in the Civil Code. We will of course look at the notion of technological document, but even more so at its structuring element, metadata. We will study the notion, the origin and core areas of metadata. Metadata, an essentially technological element, will be studied within the context of evidence law. We will see what probative potential metadata can offer in support of a technological document. 
Finally, we will examine the evidentiary role of metadata in relation to copies and transfers and to the obligations the Law imposes, namely certification and documentation, so that these two modes of reproducing documents can legally stand in for the original document. Métadonnées preuve document technologique copie transfert transmission certification notaire documentation metadata evidence copy technology-based notary document
529	Le rôle des inscriptions documentaires dans la transmission des savoirs. Le cas de la psychologie comme discipline / The role of material inscriptions in knowledge transmission : a case study of the use of documents in the academic discipline of psychology Temperville, Véronique 24 June 2014 (has links) Cette étude porte sur l’évolution des pratiques info-documentaires des étudiants de psychologie en troisième année de licence. Nous nous intéressons aux modifications engendrées par le numérique sur la culture informationnelle des étudiants. Nous défendons l’idée que le web transforme les valeurs, les représentations et les pratiques attachées à la culture académique. Nous abordons cette hypothèse à partir des matérialités documentaires et notamment des aspects éditoriaux. Ce travail croise des données issues d’entretiens réalisés auprès des étudiants et des enseignants, des analyses des discussions et des dépôts sur la plateforme Moodle, et des analyses sémiotiques de documents et de sites. Il développe particulièrement les phénomènes de circulation et d’hybridation dans les pratiques. / This study is about the evolution of third-year undergraduate psychology students' information practices. We look at the changes the digital world has brought about in students' information culture. We defend the idea that the web transforms the values, representations, and practices attached to academic culture. We approach this hypothesis through the document as a material form, and in particular through its editorial dimension. This work cross-references data from interviews with students and teaching staff, analyses of discussions and uploads on the Moodle learning platform, and semiotic analyses of documents and websites. It focuses in particular on the phenomena of circulation and hybridization in information practices. 
Culture informationnelle Document Enonciation éditoriale Enseignement supérieur Etudiant Pratiques informationnelles Informational practices Information culture Undergraduate student Document Higher education Editorial enunciation
530	Identificação automática de relações multidocumento / Automatic identification of multidocument relations Maziero, Erick Galani 16 January 2012 (has links) O tratamento multidocumento mostra-se indispensável no cenário atual das mídias eletrônicas, em que são produzidos diversos documentos sobre um mesmo tópico, principalmente quando se considera a explosão de informação permitida pela web. Tanto leitores quanto aplicações computacionais se beneficiam da análise discursiva multidocumento por meio da qual são explicitadas relações entre as porções dos documentos, por exemplo, relações de equivalência, contradição ou de contextualização de alguma informação. A fim de realizar o tratamento automático multidocumento, adota-se neste trabalho a teoria linguístico-computacional CST (Cross-document Structure Theory, Radev, 2000). Esse tipo de conhecimento multidocumento permite que (i) se tratem mais apropriadamente fenômenos como redundância, complementariedade e contradição de informações e, consequentemente, (ii) produzam-se sistemas melhores de processamento textual, como buscadores web mais inteligentes e sumarizadores automáticos. Neste trabalho é apresentada uma metodologia de identificação dessas relações explorando-se técnicas de aprendizado automático do paradigma tradicional e hierárquico. Para relações que não são passíveis de identificação por aprendizado automático foram desenvolvidas regras para sua identificação. Por fim, um parser é gerado contendo classificadores e regras / Multi-document processing is essential in the current scenario of electronic media, in which many documents are produced on the same topic, especially given the explosion of information enabled by the web. Both readers and computational applications benefit from multi-document discourse analysis, which makes explicit the relations (for example, equivalence, contradiction or background relations) among portions of the documents. 
In order to achieve automatic multi-document processing, the CST (Cross-document Structure Theory, Radev, 2000) is adopted in this work. This kind of knowledge allows (i) the appropriate treatment of phenomena such as redundancy, complementarity and contradiction of information and, consequently, (ii) the production of better text processing systems, such as more intelligent web search engines and automatic summarizers. In this work, a methodology to identify these relations is presented, exploring machine learning techniques of the traditional and hierarchical paradigms. For relations with low frequency in the corpus, handcrafted rules were developed. Finally, a parser is generated containing classifiers and rules. Análise multidocumento Aprendizado automático Cross-document structure theory Machine learning Multidocument analysis Multidocument parsing Multidocument relationship Relações multidocumento Rules
