461 |
Natural language understanding in controlled virtual environments / Ye, Patrick. January 2009 (has links)
Thesis (Ph.D.)--University of Melbourne, Dept. of Computer Science and Software Engineering, 2009. / Typescript. Name on cover: Patrick Jing Ye. Includes bibliographical references.
|
462 |
Study and testing of the Python Natural Language Toolkit for the Greek language / Σταυλιώτης, Λεωνίδας. 14 May 2012 (has links)
This diploma dissertation presents an evaluation of the Python NLTK (Natural Language Toolkit). NLTK is an open-source function library for natural language processing and for building NLP applications. It is written in Python and was developed primarily for analysing the English language. This dissertation systematically studies and tests the NLTK functions on the Greek language, since there were indications that a significant portion of them works correctly. First, the input of Greek texts was studied, along with the preprocessing needed to bring them into a form the tool can handle. Then all functions were tested and categorised by their purpose. Finally, the aggregate results confirm the initial hypothesis that a large number of functions operate correctly: 87.9% of the tested functions appear to work correctly for Greek.
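To give a concrete flavour of the per-function checks described above, a minimal Python sketch; the Greek sample sentence and the choice of NLTK functions are illustrative assumptions, not the thesis's actual test suite:

```python
# -*- coding: utf-8 -*-
# Hypothetical check: run basic NLTK functions on Greek input and
# inspect whether the output is sensible.
import nltk

nltk.download("punkt", quiet=True)  # tokenizer models

text = "Η επεξεργασία φυσικής γλώσσας είναι ένα ενδιαφέρον πεδίο."

# Tokenization relies mostly on whitespace and punctuation, so it is
# plausibly language-independent and expected to pass for Greek.
tokens = nltk.word_tokenize(text)
print(tokens)

# Frequency distributions are purely symbolic and should behave the
# same for Greek tokens as for English ones.
fdist = nltk.FreqDist(tokens)
print(fdist.most_common(3))
```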
|
463 |
The use of systems engineering principles for the integration of existing models and simulations / Luff, Robert. January 2017 (has links)
With the rise in computational power, the prospect of simulating a complex engineering system with a high degree of accuracy, and in a meaningful way, is becoming a real possibility. Modelling and simulation have become ubiquitous throughout the engineering life cycle; as a consequence, there are many thousands of existing models and simulations that are potential candidates for integration. This work is concerned with ascertaining whether systems engineering principles can support virtual testing, from the initial desire to test, through designing experiments, specifying simulations, selecting models and simulations, and integrating component parts, to verifying that the work is as specified and validating that any outcomes are meaningful. A novel representation of a systems engineering framework is proposed and forms the basis for the methods that were developed. It takes the core systems engineering principles and expresses them in a way that can be implemented in a variety of ways. An end-to-end process for virtual testing with the potential to use existing models and simulations is proposed; it provides structure and order to the testing task. A key part of the proposed process is the recognition that model and simulation requirements are different from those of the system being designed, and hence a modelling-and-simulation-specific requirement writing guide is produced. The automation of any engineering task has the potential to reduce the time to market of the final product; for this reason, the potential of natural language processing (NLP) technology to hasten the proposed processes was investigated. Two case studies were selected to test and demonstrate the potential of the novel approach: the first an investigation into material selection for a squash ball, and the second automotive in nature, concerned with combining steering and braking systems. The processes and methods indicated their potential value, especially in the automotive case study, where inconsistencies were identified that could otherwise have compromised successful integration. This capability, combined with the verification stages, improves confidence in any model and simulation integration. The NLP proof-of-concept software also demonstrated that such technology has value in automating integration. With further testing and development there is the possibility of creating a software package to guide engineers through the difficult task of virtual testing. Such a tool would have the potential to drastically reduce the time to market of complex products.
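As a hedged sketch of how NLP heuristics might hasten such requirement checks (the ambiguous-term list and sample requirements below are invented for illustration; the thesis's writing guide is richer than this):

```python
# Toy requirement linter: flag statements that lack a clear obligation
# keyword or contain vague terms. Both lists are illustrative only.
AMBIGUOUS = {"appropriate", "adequate", "fast", "user-friendly", "etc"}

requirements = [
    "The simulation shall compute braking force at 100 Hz.",
    "The model should give adequate accuracy.",
]

for req in requirements:
    words = {w.strip(".,").lower() for w in req.split()}
    issues = []
    if "shall" not in words:
        issues.append("no 'shall' (weak obligation)")
    vague = sorted(words & AMBIGUOUS)
    if vague:
        issues.append("ambiguous terms: " + ", ".join(vague))
    print(req, "->", issues if issues else "OK")
```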
|
464 |
A Perception Based Question-Answering Architecture Derived from Computing with Words / Torres Parra, Jimena Cecilia. 01 December 2009 (has links)
Most search applications in use today employ a keyword-based search mechanism, which has no deductive abilities and is therefore unable to understand the human perceptions underlying a given search. This work proposes a framework for a fuzzy expert system for question-answer support while searching within a specific domain. Developing such a framework requires computing theories that can understand and manipulate the knowledge inherent in natural language documents. To this end, we can now employ the newly introduced theory of Computing with Words (CW). The recent introduction of CW, by Lotfi Zadeh, signifies a break from the traditional computing model and promises to enable analysis of natural-language-based information. To bridge raw natural language text and CW, the use of Probabilistic Context-Free Grammars (PCFGs) is proposed. Together the two theories form the core of the proposed framework, which allows search applications to be constructed with the capabilities of deduction and perception analysis using a natural language interface.
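As an illustration of the proposed PCFG bridge between raw text and CW, a toy sketch using NLTK; the grammar and its probabilities are invented for demonstration, whereas a real system would induce them from data:

```python
import nltk

# A tiny hand-written PCFG; rule probabilities per left-hand side sum to 1.
grammar = nltk.PCFG.fromstring("""
    S  -> NP VP    [1.0]
    NP -> 'users'  [0.5] | 'documents' [0.5]
    VP -> V NP     [1.0]
    V  -> 'search' [0.6] | 'rank'      [0.4]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("users search documents".split()):
    print(tree)  # the most probable parse, annotated with its probability
```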
|
465 |
Software Development Productivity Metrics, Measurements and Implications / Gupta, Shweta. 06 September 2018 (has links)
The rapidly increasing capabilities and complexity of numerical software present a growing challenge to software development productivity. While many open-source projects enable the community to share experiences, learn, and collaborate, estimating individual developer productivity becomes more difficult as projects expand. In this work, we analyze several HPC software Git repositories with issue trackers and compute productivity metrics that can be used to better understand and potentially improve development processes. Evaluating productivity in these communities presents additional challenges because bug reports and feature requests are often made on mailing lists instead of in issue trackers, resulting in unstructured data that is difficult to analyze. For such data, we investigate automatic tag generation using natural language processing techniques. We aim to produce metrics that help quantify productivity improvement or degradation over the projects' lifetimes. We also provide an objective measurement of productivity based on effort estimation for the developers' work.
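One of the simplest signals in this space is commit counts per developer, read directly from Git; a minimal sketch (the repository path is a placeholder, and the thesis's metrics are considerably richer than raw counts):

```python
import subprocess
from collections import Counter

# Author name of every commit in the repository's history.
# "/path/to/repo" is a placeholder path.
authors = subprocess.run(
    ["git", "-C", "/path/to/repo", "log", "--pretty=%an"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

per_author = Counter(authors)
for author, commits in per_author.most_common(5):
    print(f"{author}: {commits} commits")
```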
|
466 |
Characterizing Online Social Media: Topic Inference and Information Propagation / Rezayidemne, Seyedsaed. 31 October 2018 (links)
Word-of-mouth (WOM) communication is a well-studied phenomenon in the literature, and content propagation in Online Social Networks (OSNs) is one form of the WOM mechanism that has become prevalent in recent years, especially with the widespread surge of online communities and online social networks. The basic piece of information in most OSNs is a post (e.g., a tweet on Twitter or a post on Facebook). A post can contain different types of content, such as text, photos, or video, or a mixture of two or more of them. There are also various ways to enrich the text: mentioning other users, using hashtags, and adding URLs to external content. The goal of this study is to investigate which factors contribute to the propagation of messages in Google+. To answer this question, a multidimensional study will be conducted. On the one hand, the question can be viewed as a natural language processing problem, where the topic or sentiment of a post drives its dissemination. On the other hand, propagation can be an effect of graph properties, i.e., the popularity of message originators (node degree) or the activities of communities. Other aspects of the problem are time, external content, and external events. All of these factors are studied carefully to find the attributes most highly correlated with the propagation of posts.
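One of the graph-side questions above, whether originator popularity correlates with propagation, can be sketched in a few lines; the numbers below are synthetic placeholders, not Google+ data:

```python
from statistics import correlation  # Python 3.10+

# Synthetic example: follower counts of originators vs. reshare counts.
node_degree = [12, 340, 55, 1200, 8, 430]
reshares    = [1, 25, 3, 180, 0, 40]

print(f"degree/reshare correlation: {correlation(node_degree, reshares):.2f}")
```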
|
467 |
Named entity recognition by mining annotation rules: interpreting annotation markers as instructions for local structuring / Nouvel, Damien. 20 November 2012 (has links)
Over the last decades, the development of information and communication technologies has deeply changed the way we access knowledge. Facing the volume and diversity of data, robust and efficient technologies are needed to retrieve information. This work addresses the recognition and annotation of named entities within transcripts of radio and TV broadcasts. We interpret the annotation task as local structuring, which lets us mine data to empirically extract the rules that govern the presence of annotation markers (tags). The first part introduces the problem of processing named entities: we examine their status (related notions, typologies, evaluation, and annotation), propose properties to define their linguistic nature, survey state-of-the-art approaches, and present our contribution, which considers markers in isolation (the beginning or the end of an annotation). The second part presents the data-mining formalism: the framework used to enrich data, explore sequences, and extract annotation rules. We also propose an alternative segment-based formulation that limits the combinatorial cost of exploration; patterns correlated with one or more annotation markers are extracted as annotation rules. The last part describes the experimental setting, some implementation specifics of the resulting system (mXS), and the results obtained. We show the value of extracting annotation rules broadly and experiment with segment patterns. We report quantitative performance results on the Ester2 and Etape datasets, from several points of view and in several configurations. They show that our approach is competitive and that it opens up perspectives for the observation of natural languages and for automatic annotation.
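A much-simplified sketch of the marker-centred rule mining described above; the toy corpus, marker notation, and frequency threshold are illustrative assumptions:

```python
from collections import Counter

# Token streams where <pers>/</pers> mark the beginning and end of a
# person entity; markers are treated as items in their own right.
tagged = [
    ["mr", "<pers>", "smith", "</pers>", "said"],
    ["president", "<pers>", "chirac", "</pers>", "spoke"],
    ["mr", "<pers>", "nouvel", "</pers>", "added"],
]

# Count the token immediately preceding each begin-marker.
before_marker = Counter()
for sent in tagged:
    for i, tok in enumerate(sent):
        if tok == "<pers>" and i > 0:
            before_marker[sent[i - 1]] += 1

# Frequent contexts become annotation rules, e.g. "mr" => begin <pers>.
rules = [word for word, n in before_marker.items() if n >= 2]
print(rules)
```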
|
468 |
Detection of Naming Convention Violations in Process Models for Different Languages / Leopold, Henrik; Eid-Sabbagh, Rami-Habib; Mendling, Jan; Guerreiro Azevedo, Leonardo; Baião, Fernanda Araujo. 12 1900 (has links) (PDF)
Companies increasingly use business process modeling for documenting and redesigning their operations. However, due to the size of such modeling initiatives, they often struggle with the quality assurance of their model collections. While many model properties can already be checked automatically, there is a notable gap of techniques for checking linguistic aspects such as naming conventions of process model elements. In this paper, we address this problem by introducing an automatic technique for detecting violations of naming conventions. This technique is based on text corpora and independent of linguistic resources such as WordNet. Therefore, it can be easily adapted to the broad set of languages for which corpora exist. We demonstrate the applicability of the technique by analyzing nine process model collections from practice, including over 27,000 labels and covering three different languages. The results of the evaluation show that our technique yields stable results and can reliably deal with ambiguous cases. In this way, this paper provides an important contribution to the field of automated quality assurance of conceptual models.
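A hedged sketch of the corpus-based idea: decide whether a label follows verb-object style from how often its first word is used as a verb; the tiny frequency table below stands in for real corpus counts:

```python
# P(word occurs as a verb) according to some reference corpus; these
# numbers are invented stand-ins for real corpus statistics.
CORPUS_VERB_RATIO = {
    "create": 0.95, "check": 0.85, "order": 0.40, "invoice": 0.05,
}

def violates_verb_object(label, threshold=0.5):
    first = label.split()[0].lower()
    return CORPUS_VERB_RATIO.get(first, 0.0) < threshold

# "order" is the kind of ambiguous case (noun or verb) such a
# technique must deal with.
for label in ["Create invoice", "Invoice creation", "Order approval"]:
    print(label, "->", "violation" if violates_verb_object(label) else "ok")
```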
|
469 |
Modular generation of formal grammars / Petitjean, Simon. 11 December 2014 (has links)
The work presented in this thesis aims at facilitating the development of resources for natural language processing. Such resources take many forms, owing to the existence of several levels of linguistic description (syntax, morphology, semantics, ...) and of several formalisms proposed for describing natural languages at each of these levels. Since these formalisms involve different types of structures, a single description language is not enough: each formalism requires a dedicated domain-specific language (DSL) and a new tool implementing it, which is a long and complex task. For this reason, this thesis proposes a method to modularly assemble, and adapt, development frameworks dedicated to linguistic resource generation tasks. The assembled frameworks are built around the fundamental concepts of the XMG (eXtensible MetaGrammar) approach: a description language allowing the modular definition of abstractions over linguistic structures, and their non-deterministic combination (that is, by means of the logical operators of conjunction and disjunction). The method assembles a description language from reusable bricks, according to a single specification file; the entire processing chain for the resulting DSL is assembled automatically from that same specification. We first validated this approach by recreating the XMG tool from elementary bricks. Collaborations with linguists also led us to assemble compilers for describing the morphology of Ikota (a Bantu language) and semantics (by means of frame theory).
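A toy sketch of the core combination mechanism mentioned above: abstractions combined by conjunction and (non-deterministic) disjunction. Fragments here are plain dictionaries; real XMG descriptions operate on tree fragments:

```python
from itertools import product

def disj(*alternatives):
    # Non-deterministic choice: keep every alternative fragment.
    return list(alternatives)

def conj(*fragment_sets):
    # Conjunction: merge one fragment from each set, in all combinations.
    return [dict(kv for frag in combo for kv in frag.items())
            for combo in product(*fragment_sets)]

subject = disj({"subj": "canonical"}, {"subj": "relativized"})
object_ = disj({"obj": "canonical"}, {"obj": "extracted"})

for structure in conj(subject, object_):
    print(structure)  # four alternative descriptions
```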
|
470 |
A verb learning model driven by syntactic constructions / Machado, Mario Lúcio Mesquita. January 2008 (has links)
Since the second half of the last century, cognitive theories have brought some interesting views on language learning. Applying these theories in computational models has a double benefit: on the one hand, computational implementations can be used as a form of validation of the theories; on the other hand, computational models can achieve improved performance by adopting cognitively plausible learning strategies. Syntactic structures are said to provide an important cue for the acquisition of verb meaning. Moreover, for a particular subset of very frequent and general verbs, the so-called light verbs, there is a strong link between the syntactic structures in which they appear and their meanings. In this work, we use a computational model to investigate these proposals, in particular viewing the acquisition task as a mapping between an unknown verb and prototypical referents for verbal events, on the basis of the syntactic structure in which the verb appears. The experiments conducted highlight some requirements for successful learning, in terms of both the levels of information available to the learner and the learning strategies adopted.
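A bare-bones sketch of the mapping task: associate a syntactic frame with a prototypical event referent by counting co-occurrences across scenes; the frames and referents below are invented examples:

```python
from collections import defaultdict

# (syntactic frame of the utterance, event referent observed in the scene)
observations = [
    ("NP V NP", "cause-motion"), ("NP V NP", "cause-motion"),
    ("NP V",    "self-motion"),  ("NP V NP", "transfer"),
]

counts = defaultdict(lambda: defaultdict(int))
for frame, referent in observations:
    counts[frame][referent] += 1

def best_referent(frame):
    # The most frequently co-occurring referent for an unknown verb's frame.
    return max(counts[frame], key=counts[frame].get)

print(best_referent("NP V NP"))  # -> cause-motion
```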
|