About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
101

Internal and in-context localization of commercial and open-source software

Fraisse, Amel, 10 June 2010
We propose a novel method enabling in-context localization of the majority of commercial and open-source software, namely programs written in Java and in C++/C#. Currently, the translation of technical documentation and of the interface strings of commercial software is entrusted exclusively to professionals, which lengthens the translation process, makes it costly, and sometimes yields poor quality, because professional translators have no access to the usage context of the textual elements. As soon as one leaves the small set of well-resourced languages and wants to localize software for "under-resourced languages", this process is no longer viable, for reasons of cost and above all the scarcity, expense, or outright absence of professional translators. Our method consists in involving beta testers and end users efficiently and dynamically in the localization process: while they use the application, users who know the software's original language (often but not always English) can act on the interface strings the application presents to them in their current usage context. They can thus translate buttons, menus, labels, tabs, etc. in context, or improve the translations proposed by machine translation (MT) systems or translation memories (TM). To put this new paradigm in place, we need to intervene very locally in the software's source code: it is therefore also a paradigm of internal localization. Implementing such a localization approach required integrating a translation workflow manager, SECTra_w. We thus obtain a new three-party localization process whose parties are the user, the software publisher, and the SECTra_w collaborative site. We carried out a complete experiment of the new localization process on two open-source programs: Notepad-plus-plus and Vuze.
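The in-context idea can be pictured with a small sketch: the application's string lookup gains a localization mode in which the user may override the displayed text, and corrections are queued for upload to a collaborative workflow server. This is only an illustration of the paradigm under assumed names; the function names and the queueing scheme below are hypothetical, not SECTra_w's actual interface.

```python
# A minimal sketch of in-context localization (assumed names throughout).
# The submit queue stands in for the collaborative site described above.

translations = {}   # locale string table, e.g. loaded from resource files
pending = []        # corrections to push to the collaborative server later

def tr(msgid: str, localization_mode: bool = False) -> str:
    """Look up a UI string; in localization mode, let the user fix it in context."""
    text = translations.get(msgid, msgid)   # fall back to the source language
    if localization_mode:
        suggestion = input(f"Translate '{text}' (Enter to keep): ").strip()
        if suggestion:
            translations[msgid] = suggestion
            pending.append((msgid, suggestion))  # queued for the workflow server
            return suggestion
    return text
```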
102

Fluency enhancement : applications to machine translation : thesis for Master of Engineering in Information & Telecommunications Engineering, Massey University, Palmerston North, New Zealand

Manion, Steve Lawrence, January 2009
The quality of Machine Translation (MT) output is often poor: it can appear incoherent and lack fluency. These problems include word ordering, awkward use of words and grammar, and overly literal translation. However, we should not consider such translations failures until we have done our best to enhance their quality or, more simply, their fluency. Just as various processes can be applied to touch up a photograph, various processes can be applied to touch up a translation. This research describes the improvement of MT quality through the application of Fluency Enhancement (FE), a process we have created that reforms and evaluates text to enhance its fluency. We have tested our FE process on our own MT system, which operates on what we call the SAM fundamentals: Simplicity - a simple design, so as to be portable across different language pairs; Adaptability - compensating for the evolution of language; and Multiplicity - determining a final set of translations from as many candidate translations as possible. Based on our research, the SAM fundamentals are the key to developing a successful MT system, and they are what have driven the success of our FE process.
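The "Multiplicity" fundamental - choosing the most fluent output among many candidates - can be illustrated with a toy sketch that ranks candidate translations with a crude add-one-smoothed bigram language model. The scoring model is a stand-in for illustration, not the thesis's actual FE process.

```python
# Toy illustration: score candidate translations for fluency with a bigram
# language model and keep the best. Training data and candidates are invented.
from collections import Counter
from math import log

def train_bigram(corpus_sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in corpus_sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def fluency(sentence, unigrams, bigrams):
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    v = len(unigrams)
    # add-one smoothed bigram log-probability as a crude fluency score
    return sum(log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
               for a, b in zip(words, words[1:]))

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
uni, bi = train_bigram(corpus)
candidates = ["the cat sat on the mat", "cat the on sat mat the"]
print(max(candidates, key=lambda c: fluency(c, uni, bi)))  # most fluent wins
```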
104

Methods and models for the automated construction of ontologies for specialized domains

Goncharova, Olena, 23 February 2017
The thesis was prepared under a joint-supervision agreement between Professors Jean-Hugues Chauchat (ERIC-Lyon2) and N.V. Charonova (National Polytechnic University of Kharkov, Ukraine). The results obtained can be summarized as follows. 1. State of the art: a retrospective of the theoretical foundations of knowledge formalization and natural language as precursors of ontology engineering; an updated survey of general approaches to ontology learning and of methods for extracting terms and semantic relations; an overview of platforms and tools for ontology construction and learning, and an inventory of lexical resources available online that can support ontology learning (learning of concepts and relations). 2. Methodological proposals: a method for learning morphosyntactic patterns and building partial taxonomies of terms; a method for forming semantic classes representing concepts and relations for the field of radiological safety; a framework organizing the stages of work leading to the construction of an ontology of the radiological-safety domain. 3. Implementation and experiments: construction of two specialized corpora on radiological protection, in French and in Russian, of 1,500,000 and 600,000 lexical units respectively; implementation of the three proposed methods and analysis of the results obtained. The results were presented in 13 publications in national and international journals and conference proceedings between 2010 and 2016, including IMS-2012, TIA-2013, TOTH-2014, Eastern-European Journal of Enterprise Technologies, Bionica Intellecta (Бионика интеллекта), and Herald of the NTU "KhPI" (Вестник НТУ «ХПИ»).
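Proposal 1 - learning morphosyntactic patterns to extract terms and build partial taxonomies - can be illustrated by matching part-of-speech patterns over tagged text. The patterns and the miniature tagged sentence below are invented for illustration; the thesis learns its patterns from the French and Russian corpora.

```python
# A minimal sketch of term extraction via morphosyntactic patterns.
# (word, part-of-speech) pairs as produced by any POS tagger:
tagged = [("radiological", "ADJ"), ("safety", "NOUN"), ("requires", "VERB"),
          ("continuous", "ADJ"), ("dose", "NOUN"), ("monitoring", "NOUN")]

# candidate term patterns: ADJ NOUN, NOUN NOUN, ADJ NOUN NOUN (illustrative)
patterns = [("ADJ", "NOUN"), ("NOUN", "NOUN"), ("ADJ", "NOUN", "NOUN")]

def extract_terms(tagged, patterns):
    terms = set()
    for p in patterns:
        for i in range(len(tagged) - len(p) + 1):
            window = tagged[i:i + len(p)]
            if tuple(tag for _, tag in window) == p:
                terms.add(" ".join(word for word, _ in window))
    return terms

print(extract_terms(tagged, patterns))
# e.g. {'radiological safety', 'continuous dose', 'dose monitoring',
#       'continuous dose monitoring'}
```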
105

Extracting semantic relations via analysis of correlated terms in documents

Botero, Sergio William, 12 December 2008
Advisor: Ivan Luiz Marques Ricarte. Master's thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Information retrieval systems are tools that automate the search for information. The first implementations were very simple, based exclusively on word syntax, and have evolved into systems that use semantic knowledge, such as those using ontologies. However, the manual specification of ontologies is an expensive task and subject to human error. Methodologies that construct ontologies automatically have been proposed to deal with this problem, but they have not reached good results, identifying false semantic relations between words. This work presents a natural-language-processing technique and a new clustering algorithm for the semi-automatic extraction of semantic relations that uses the content of the documents, a common-sense ontology, and the supervision of the user to identify semantic relations correctly. The proposal encompasses one stage that uses linguistic resources to extract terms and another that uses clustering algorithms to identify concepts and instance-of relations between terms and concepts. The proposed algorithm is based on possibilistic clustering and bi-clustering techniques and allows the interactive extraction of concepts and relations. The results are promising, similar to those of the most recent methodologies, with the advantage of allowing supervision of the extraction process.
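The core intuition - terms whose document co-occurrence profiles correlate are candidates for the same concept - can be sketched as follows. The thesis itself uses possibilistic clustering and bi-clustering with user supervision; this greedy cosine-threshold grouping over a toy term-document matrix is only illustrative.

```python
# A simplified sketch: group terms whose document occurrence profiles are
# strongly correlated into candidate concepts. Documents, terms, and the
# 0.5 threshold are invented for illustration.
from math import sqrt

docs = ["the cat chased the mouse", "the dog chased the cat",
        "stocks and bonds fell", "bonds and stocks rallied"]
terms = ["cat", "dog", "mouse", "stocks", "bonds"]

# term-document occurrence matrix
m = [[doc.split().count(t) for doc in docs] for t in terms]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# greedily attach each term to the first cluster whose seed it resembles
clusters = []
for i, t in enumerate(terms):
    for c in clusters:
        if cosine(m[i], m[terms.index(c[0])]) > 0.5:
            c.append(t)
            break
    else:
        clusters.append([t])
print(clusters)  # [['cat', 'dog', 'mouse'], ['stocks', 'bonds']]
```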
106

Text document plagiarism detector

Kořínek, Lukáš, January 2021
This diploma thesis is concerned with research on available methods of plagiarism detection and with the design and implementation of such a detector. The primary aim is to detect plagiarism within academic works and theses issued at BUT. The detector uses sophisticated preprocessing algorithms to store documents in its own corpus (document database). The implemented comparison algorithms are designed for parallel execution on graphics processing units; they compare a single subject document against all other documents in the corpus in the shortest time possible, enabling near real-time detection while maintaining acceptable output quality.
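A common baseline for the comparison step such a detector performs is word n-gram "shingling" scored with Jaccard similarity; the sketch below illustrates that baseline, not the thesis's GPU-parallel algorithms.

```python
# Baseline sketch: compare word n-gram shingles of a subject document against
# each corpus document with Jaccard similarity. Corpus contents are invented.

def shingles(text: str, n: int = 3) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

corpus = {"thesis_A": "machine learning methods for text analysis",
          "thesis_B": "a study of medieval trade routes in europe"}
subject = "novel machine learning methods for text analysis tasks"

subj = shingles(subject)
for name, doc in corpus.items():
    score = jaccard(subj, shingles(doc))
    print(f"{name}: {score:.2f}")   # a high score flags overlapping passages
```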
107

Determination of the basic form of words

Šanda, Pavel, January 2011
Lemmatization is an important preprocessing step for many text-mining applications. The lemmatization process is similar to stemming, with the difference that it determines not only the word stem but also tries to determine the basic form of the word, using the Brute Force and Suffix Stripping methods. The main aim of this paper is to present methods for the algorithmic improvement of Czech lemmatization. The training data sets created are part of this paper and can be freely used for student and academic work dealing with similar problems.
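A minimal sketch of the Suffix Stripping method named above, using a toy English suffix table (the thesis targets Czech, whose morphology is far richer, so its rule set and training data are much larger):

```python
# Toy Suffix Stripping lemmatizer: strip a known suffix and append the ending
# of the basic form. The rule table below is an invented English example.

SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def lemmatize(word: str) -> str:
    for suffix, replacement in SUFFIX_RULES:   # most specific rules first
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

for w in ["studies", "walking", "jumped", "cats"]:
    print(w, "->", lemmatize(w))   # studies -> study, walking -> walk, ...
```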
108

Feedback that provides processing: an empirical study of primary school teachers' feedback on students' texts

Rudin, Maja, Kunze, Sanna, January 2020
The aim of the present study is to investigate how practicing primary school teachers work with feedback so that pupils get the opportunity to revise their texts. The theoretical framework of the study is sociocultural theory, with scaffolding in the form of the teacher's feedback in focus as the means of developing the pupil's text production. The methods used in the study are interviews with teachers and analysis of teacher comments on pupils' texts. The results of the text analysis and the interviews show that the teachers' feedback is based on the knowledge requirements in Swedish for grades 1-3 in the compulsory school curriculum. Feedback to the pupil is usually oral, but teachers who work with digital tools most often give written feedback directly in the pupil's document. It also emerged that most teachers would like to give one-to-one feedback more often, so that pupils receive good feedback that helps them move forward in revising their texts, but lack of time and resources are obstacles to teachers adopting this way of working.
109

Risk analysis of implementing Machine Learning in construction projects

Roy, Aki, January 2024
Machine Learning has significantly influenced development across domains by leveraging incoming and existing data. However, despite its advancements, criticism persists regarding its failure to adequately address real-world problems, the construction domain being an example. The construction sector is crucial for global economic growth, yet it remains largely unexplored, lacking sufficient research on, and technological utilization of, its extensive data. Despite increasing publications on adapting technological advancements, the primary focus has been on urging industry innovation through digitization. Recently, adopting Machine Learning to address operational challenges has gained attention. While some studies have explored potential ML integration opportunities in construction, there is a gap in understanding the factors and barriers hindering its adoption across projects. This study investigates the factors restricting organizations from implementing ML in construction projects and their consequent operational impacts. It employs a comprehensive literature review of ML concepts and identifies gaps in construction data. Qualitative, semi-structured interviews were conducted with five industry professionals offering practical insights, followed by a thematic analysis of the interview data. Themes are analyzed and discussed in relation to the theoretical material to identify connections. Finally, a risk assessment based on the identified risks is carried out using a risk matrix. The results discuss the challenges and potential benefits of implementing ML within the construction industry. The study further emphasizes the necessity of knowledge for understanding project-specific datasets. With a primary focus on unstructured text and image data, the study uncovers challenges related to data inconsistency that affect data reliability. While recognizing ML's potential to streamline construction operations, the study underscores challenges such as data security and digitalization. In summary, this study emphasizes the importance of data quality, security, and cultural transformation in harnessing ML's capabilities to improve construction project management and operations.
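The risk-matrix evaluation mentioned at the end of the methodology can be sketched as likelihood and impact ratings whose product places each risk in a band. The risks and ratings below are invented placeholders, not the study's findings.

```python
# Sketch of a risk matrix: likelihood x impact (each 1-5) banded into
# low/medium/high. Entries are hypothetical examples only.

risks = {
    "inconsistent project data": (4, 4),
    "data security breach": (2, 5),
    "lack of ML expertise": (3, 3),
}

def band(score: int) -> str:
    return "high" if score >= 15 else "medium" if score >= 8 else "low"

for name, (likelihood, impact) in risks.items():
    score = likelihood * impact
    print(f"{name}: {likelihood}x{impact}={score} -> {band(score)}")
```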
110

The growing complexity of data and techniques in linguistics: NLP's contributions to the solutions and to the problems

Tanguy, Ludovic, 11 September 2012
This habilitation thesis is an opportunity to take stock of my work as a teacher-researcher in natural language processing (NLP) in a linguistics laboratory (CLLE-ERSS) and of the main evolutions of the computational tooling of linguistics over the past 15 years. My research focuses in particular on detecting morphosyntactic structures in texts, analyzing discourse structures, and acquiring lexical resources from corpora. Some of it is positioned within applied frameworks such as information retrieval and text classification, but also in more specific contexts linked to other disciplines (medicine, psychology, sociology, etc.). Drawing on the diversity of this work and of my collaborations, I identify four main dimensions of evolution: the growth in the mass of available language data, notably the increasing use of the Web as a corpus; the growing complexity of the computational tools available to manage the mass and variety of accessible data (tools for building and querying corpora); the growing complexity of the annotation of language data, whether manual, assisted, or automatic; and the rise, in NLP but also in descriptive linguistics, of quantitative methods (from statistical analysis to data-mining and machine-learning techniques). While the technical advances of NLP have considerably increased the potential for investigating language material, and in some cases have opened up new research questions, they have also contributed to widening the gap between the two components (computer science and linguistics) of the discipline. Through my own experience as an actor in, or companion to, these changes, and with an interdisciplinary bridge-building vocation, I seek to identify the main current challenges for tool-based linguistics: equipping descriptive linguistics with data-visualization tools for tackling complexity, exploiting the theoretical and technical advances of this new disciplinary field and adapting them to the specificities of language material; making the fundamental techniques of statistical analysis, as well as the machine-learning methods that alone can support the investigation and exploitation of massive and complex data, accessible to linguists; and repositioning linguistics within current NLP developments, notably through the use of rich linguistic descriptors in machine-learning-based tools, for mutual benefit.
