1091

Automating the conversion of natural language fiction to multi-modal 3D animated virtual environments

Glass, Kevin Robert January 2009
Popular fiction books describe rich visual environments that contain characters, objects, and behaviour. This research develops automated processes for converting text sourced from fiction books into animated virtual environments and multi-modal films. This involves the analysis of unrestricted natural language fiction to identify appropriate visual descriptions, and the interpretation of the identified descriptions for constructing animated 3D virtual environments.

The goal of the text analysis stage is the creation of annotated fiction text, which identifies visual descriptions in a structured manner. A hierarchical rule-based learning system is created that induces patterns from example annotations provided by a human, and uses these for the creation of additional annotations. Patterns are expressed as tree structures that abstract the input text on different levels according to structural (token, sentence) and syntactic (parts-of-speech, syntactic function) categories. Patterns are generalized using pair-wise merging, where dissimilar sub-trees are replaced with wild-cards. The result is a small set of generalized patterns that are able to create correct annotations. A set of generalized patterns represents a model of an annotator's mental process regarding a particular annotation category; a sketch of the merging step follows this abstract.

Annotated text is interpreted automatically for constructing detailed scene descriptions. This includes identifying which scenes to visualize, and identifying the contents and behaviour in each scene. Entity behaviour in a 3D virtual environment is formulated using time-based constraints that are automatically derived from annotations. Constraints are expressed as non-linear symbolic functions that restrict the trajectories of a pair of entities over a continuous interval of time; solutions to these constraints specify precise behaviour. We create an innovative quantified constraint optimizer for locating sound solutions, which uses interval arithmetic to treat time and space as contiguous quantities. This optimization method uses a technique of constraint relaxation and tightening that allows solution approximations to be located even where constraint systems are inconsistent (an ability not previously explored in interval-based quantified constraint solving).

3D virtual environments are populated by automatically selecting geometric models or procedural geometry-creation methods from a library. 3D models are animated according to trajectories derived from constraint solutions. The final animated film is sequenced using a range of modalities including animated 3D graphics, textual subtitles, audio narrations, and foleys.

Hierarchical rule-based learning is evaluated over a range of annotation categories. Models are induced for different categories of annotation without modifying the core learning algorithms, and these models are shown to be applicable to different types of books. Models are induced automatically with accuracies ranging between 51.4% and 90.4%, depending on the category. We show that models are refined if further examples are provided, which supports a bootstrapping process for training the learning mechanism. The task of interpreting annotated fiction text and populating 3D virtual environments is successfully automated using the described techniques. Detailed scene descriptions are created accurately: between 83% and 96% of the automatically generated descriptions require no manual modification (depending on the type of description). The interval-based quantified constraint optimizer fully automates the behaviour specification process.

Sample animated multi-modal 3D films are created using extracts from fiction books that are unrestricted in terms of complexity or subject matter (unlike the inputs to existing text-to-graphics systems). These examples demonstrate that behaviour corresponding to the descriptions in the original text is visualized; that appropriate geometry is selected (or created) for visualizing the entities in each scene; that sequences of scenes are created for a film-like presentation of the story; and that multiple modalities are combined into a coherent multi-modal representation of the fiction text. This research demonstrates that visual descriptions in fiction text can be automatically identified, and that these descriptions can be converted into corresponding animated virtual environments. Unlike existing text-to-graphics systems, the techniques described here operate over unrestricted natural language text and perform the conversion without manually constructed repositories of world knowledge. This enables the rapid production of animated 3D virtual environments, allowing the human designer to focus on creative aspects.
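The pair-wise merging step lends itself to a short illustration. Below is a minimal sketch, assuming patterns are nested tuples of the form (label, child, ...); identical parts are kept and dissimilar sub-trees collapse into wild-cards. The names and example patterns are illustrative, not the thesis implementation.

```python
# Minimal sketch of pair-wise pattern generalization with wild-cards.
# Patterns are nested tuples: (label, child, child, ...); leaves are strings.

WILDCARD = "*"

def merge(a, b):
    """Generalize two pattern trees: keep identical parts,
    replace dissimilar sub-trees with a wild-card."""
    if a == b:
        return a
    # Internal nodes with the same label and arity: recurse on children.
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        return (a[0],) + tuple(merge(x, y) for x, y in zip(a[1:], b[1:]))
    return WILDCARD  # dissimilar sub-trees collapse to a wild-card

# Two example annotation patterns: S -> NP VP, with different NP heads.
p1 = ("S", ("NP", "the", "wizard"), ("VP", "walked"))
p2 = ("S", ("NP", "the", "castle"), ("VP", "walked"))
print(merge(p1, p2))
# ('S', ('NP', 'the', '*'), ('VP', 'walked'))
```

Repeated over many example pairs, this yields the small set of generalized patterns the abstract describes.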
1092

Vers une approche non orientée pour l'évaluation de la qualité des odeurs / Towards a non oriented approach of the evaluation of the odor quality

Medjkoune, Massissilia 30 March 2018
Characterizing the quality of a smell is a complex task that consists in identifying, during sensory analysis sessions, the set of descriptors that best summarizes the olfactory sensation. Generally, this characterization is a list of descriptors drawn from a vocabulary imposed by the industry of a given domain for its sensory analyses, and these analyses represent a significant annual cost. Such "oriented" approaches rely on vocabulary learning, severely limit the descriptors available to an uninitiated public, and require costly training phases. If this characterization could be entrusted to naive evaluators, the number of participants in a sensory analysis session could be significantly increased while reducing costs. In that setting, however, each free description is no longer associated with a set of unambiguous descriptors, but with a simple bag of terms expressed in natural language (NL).

Two issues are then attached to smell characterization. The first is how to translate such NL descriptions into structured descriptors; the second is how to summarize a set of individual characterizations into a single, consistent synthesis meaningful for industrial purposes. The first part of this work therefore focuses on the definition and evaluation of models that can be used to summarize a set of terms into unambiguous entity identifiers selected from a given ontology. Among the strategies explored in this contribution, we compare hybrid approaches that exploit both knowledge bases (symbolic representations) and word embeddings derived from the analysis of large text corpora. Our results highlight the substantial benefit of combining symbolic representations with word embeddings for this task. We then formally define the problem of summarizing a set of concepts and propose a model that mimics human-like intelligence to score alternative summaries against a given synthesis objective. The non-oriented approach proposed in this manuscript thus appears as a cognitive automation of the tasks currently performed by expert operators in sensory analysis sessions. It opens interesting perspectives for large-scale sensory analyses over large panels of evaluators, for instance when characterizing olfactory nuisances around an industrial site.
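As an illustration of the hybrid idea, the sketch below maps free-text odor terms to ontology descriptors by combining a symbolic synonym table (standing in for a knowledge base) with embedding similarity. The vectors, terms and descriptor names are invented; a real system would use embeddings trained on large corpora and a full ontology.

```python
# Hybrid term-to-descriptor mapping: symbolic match first, embeddings as fallback.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy word embeddings (hypothetical values).
emb = {
    "fruity": [0.9, 0.1, 0.0],
    "apple":  [0.8, 0.2, 0.1],
    "smoky":  [0.0, 0.9, 0.3],
    "burnt":  [0.1, 0.8, 0.4],
}
# Symbolic knowledge: descriptor -> known synonym labels.
ontology = {"FRUITY": {"fruity", "apple"}, "SMOKY": {"smoky", "burnt"}}

def score(term, descriptor):
    """Exact symbolic match wins; otherwise fall back to embedding similarity."""
    if term in ontology[descriptor]:
        return 1.0
    if term not in emb:
        return 0.0
    labels = [l for l in ontology[descriptor] if l in emb]
    return max((cosine(emb[term], emb[l]) for l in labels), default=0.0)

for term in ["apple", "burnt"]:
    best = max(ontology, key=lambda d: score(term, d))
    print(term, "->", best)  # apple -> FRUITY, burnt -> SMOKY
```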
1093

[en] DIRECT AND INDIRECT QUOTATION EXTRACTION FOR PORTUGUESE / [pt] EXTRAÇÃO DE CITAÇÕES DIRETAS E INDIRETAS PARA O PORTUGUÊS

RAFAEL DOS REIS SILVA 08 June 2017
[en] Quotation Extraction consists of identifying quotations in a text and associating them with their authors. In this work, we present a Direct and Indirect Quotation Extraction System for Portuguese. Quotation Extraction has been approached before with different techniques and for several languages. Our proposal differs from previous work because we build a Machine Learning model that recognizes not only direct quotations but also indirect ones. Indirect quotations are hard to identify in a text due to the lack of explicit delimitation; nevertheless, they occur more often than delimited ones and are therefore of great importance for information extraction. Because our model is based on Machine Learning, it can easily be adapted to other languages, requiring only a list of verbs of speech for the target language. Few previously proposed systems tackled indirect quotations, and none of them targeted Portuguese with a Machine Learning approach. We built a Quotation Extractor using a model for the Structured Perceptron algorithm. In order to train and evaluate the system, we built and annotated the QuoTrees 1.0 corpus with the indirect quotation problem in mind. The Structured Perceptron based on weighted interval scheduling obtains an F1 score of 66 percent on the QuoTrees 1.0 corpus.
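To make the learning setup concrete, below is a compact, self-contained sketch of a structured perceptron that tags tokens as inside (Q) or outside (O) a quotation. The features, tag set and single training sentence are toy illustrations, not the QuoTrees annotation scheme.

```python
# Structured-perceptron sketch for quotation tagging (toy setup).
from collections import defaultdict

TAGS = ["O", "Q"]
SPEECH_VERBS = {"said", "told", "argued"}  # stand-in for a verbs-of-speech list

def features(tokens, i, prev_tag):
    return [f"w={tokens[i]}", f"prev={prev_tag}",
            f"speech_verb_before={any(t in SPEECH_VERBS for t in tokens[:i])}"]

def decode(tokens, weights):
    tags, prev = [], "<s>"
    for i in range(len(tokens)):  # greedy left-to-right decoding for brevity
        best = max(TAGS, key=lambda t: sum(weights[(f, t)]
                                           for f in features(tokens, i, prev)))
        tags.append(best)
        prev = best
    return tags

def train(data, epochs=5):
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, gold in data:
            pred = decode(tokens, weights)
            prev_g = prev_p = "<s>"
            for i, (g, p) in enumerate(zip(gold, pred)):
                if g != p:  # perceptron update only on mistakes
                    for f in features(tokens, i, prev_g):
                        weights[(f, g)] += 1.0
                    for f in features(tokens, i, prev_p):
                        weights[(f, p)] -= 1.0
                prev_g, prev_p = g, p
    return weights

data = [(["Maria", "said", "that", "prices", "rose"], ["O", "O", "Q", "Q", "Q"])]
w = train(data)
print(decode(["Maria", "said", "that", "sales", "fell"], w))
# ['O', 'O', 'Q', 'Q', 'Q']
```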
1094

[en] AUTOMATIC INTERPRETATION OF EQUIPMENT OPERATION REPORTS / [pt] INTERPRETAÇÃO AUTOMÁTICA DE RELATÓRIOS DE OPERAÇÃO DE EQUIPAMENTOS

PEDRO HENRIQUE THOMPSON FURTADO 28 July 2017
[en] The operational units of the Exploration and Production (E&P) area at PETROBRAS use daily reports to register situations and events at Stationary Production Units (SPUs), the well-known offshore oil production platforms. One of these reports, SITOP (the Portuguese acronym for Operational Situation of the Offshore Units), is a daily free-text document that presents numerical information (production indices, some flow rates, etc.) and, mainly, textual information about the operational situation of the offshore units. The textual section, although unstructured, holds a valuable historical record of events in the production environment, such as valve breakages, failures in process equipment, the beginning and end of maintenance activities, maneuvers executed, responsibilities, and so on. The value of these data is high, but so is the cost of searching for information, since reading the huge number of documents demands many hours of attention from the company's technicians and engineers. The goal of this dissertation is to develop a natural language processing model that identifies named entities in SITOP texts and extracts relations between them, described formally in a domain ontology for events at offshore oil and gas processing units. The result is a method for automatically structuring the information contained in these operational reports. Our results show that the methodology is useful in this case, although open to improvement on several fronts. Relation extraction shows better results than named entity recognition, which can be explained by the difference in the number of classes in the two tasks. We also verify that increasing the amount of data is one of the most important factors for improving learning and the efficiency of the methodology as a whole.
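A minimal sketch of the extraction idea follows: gazetteer-based entity recognition plus a trigger-verb rule for relations. The lexicons and the hasEvent relation are invented stand-ins for the SITOP domain ontology.

```python
# Gazetteer-based entity recognition plus a trigger-verb relation rule.
import re

EQUIPMENT = {"valve", "pump", "compressor"}
EVENTS = {"failure", "breakage", "shutdown"}
TRIGGERS = {"suffered", "caused", "underwent"}

def extract(sentence):
    tokens = re.findall(r"[a-z]+", sentence.lower())
    entities = [(t, "Equipment") for t in tokens if t in EQUIPMENT]
    entities += [(t, "Event") for t in tokens if t in EVENTS]
    relations = []
    if any(t in TRIGGERS for t in tokens):  # a trigger verb links the pair
        equipment = [e for e, kind in entities if kind == "Equipment"]
        events = [e for e, kind in entities if kind == "Event"]
        relations = [(eq, "hasEvent", ev) for eq in equipment for ev in events]
    return entities, relations

print(extract("The injection pump suffered a failure during startup."))
# ([('pump', 'Equipment'), ('failure', 'Event')],
#  [('pump', 'hasEvent', 'failure')])
```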
1095

Predicting Software Defectiveness by Mining Software Repositories

Kasianenko, Stanislav January 2018
One of the important aims of the continuous software development process is to localize and remove all existing program bugs as fast as possible; this goal is closely related to software engineering and defectiveness estimation. Many large companies have started to store source code in software repositories as the latter grew in popularity. These repositories usually include static source code as well as detailed data about defects in software units, which makes it possible to analyze the data without interrupting the programming process. The main problem with large, complex software is the impossibility of controlling everything manually, while the price of an error can be very high: developers may miss defects at the testing stage, increasing maintenance costs. The general research goal is to find a way of predicting future software defectiveness with high precision; reducing maintenance and development costs will shorten time-to-market and increase software quality. To address the problem of estimating residual defects, an approach was devised to predict the residual defectiveness of software by means of machine learning. As the primary machine learning algorithm, a regression decision tree was chosen as a simple and reliable solution. Data for this tree is extracted from a static source code repository and divided into two parts: software metrics and defect data. Software metrics are computed from the static code, and defect data is extracted from issues reported in the repository. In addition to already reported bugs, the defect data is augmented with unreported bugs found in the repository's "discussions" section and parsed by a natural language processor. Metrics were filtered by a correlation algorithm to remove those unrelated to the defect data, and the remaining metrics were weighted so that the most correlated combination could be used as the training set for the decision tree. The resulting decision tree model forecasts defectiveness with 89% accuracy for the product studied. The experiment was conducted on a Java project in a GitHub repository and predicted the number of possible bugs in a single file (Java class). The experiment resulted in a method for predicting possible defectiveness from the static code of a single large software version (more than 1000 files).
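The core of the method can be illustrated in a few lines of scikit-learn: a regression decision tree mapping per-file static metrics to a defect count. The metric names and values below are invented placeholders, not the GitHub-derived dataset described above.

```python
# Sketch: regression decision tree over per-file software metrics.
from sklearn.tree import DecisionTreeRegressor

# Each row: [lines_of_code, cyclomatic_complexity, num_methods, churn]
X = [
    [120,   4,  6,  2],
    [950,  22, 31, 14],
    [300,   8, 12,  5],
    [1700, 35, 48, 30],
]
y = [0, 5, 1, 9]  # defects reported against each file

model = DecisionTreeRegressor(max_depth=3, random_state=0)
model.fit(X, y)

# Predict residual defectiveness for an unseen file (Java class).
print(model.predict([[800, 18, 25, 10]]))
```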
1096

Developing an enriched natural language grammar for prosodically-improved concept-to-speech synthesis

Marais, Laurette 04 1900
The need for interacting with machines using spoken natural language is growing, along with the expectation that synthetic speech in this context sound natural. Such interaction includes answering questions, where prosody plays an important role in producing natural English synthetic speech by communicating the information structure of utterances. Combinatory Categorial Grammar (CCG) is a theoretical framework that exploits the notion that, in English, information structure, prosodic structure and syntactic structure are isomorphic. This provides a way to convert a semantic representation of an utterance into a prosodically natural spoken utterance. Grammatical Framework (GF) is a framework for writing grammars, in which abstract tree structures capture semantic structure and concrete grammars render these structures as linearised strings. This research combines the two frameworks to develop a system that converts semantic representations of utterances into linearised strings of natural language, marked up to inform the prosody-generating component of a speech synthesis system (a toy illustration of such mark-up follows this abstract). / Computing / M. Sc. (Computing)
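As a rough illustration of the kind of output such a system targets, the sketch below linearises a toy information structure into a string with pitch-accent mark-up (L+H* on the theme focus, H* on the rheme focus, following common CCG-based accounts). It is an assumption-laden toy, not the GF grammar developed in the research.

```python
# Toy linearizer: information structure -> string with prosodic mark-up.

def linearize(theme, theme_focus, rheme, rheme_focus):
    """Mark focused words with pitch accents; leave the rest plain."""
    out = [f"{w}[L+H*]" if w == theme_focus else w for w in theme]
    out += [f"{w}[H*]" if w == rheme_focus else w for w in rheme]
    return " ".join(out)

# Q: "What did Mary buy?"  A: theme "Mary bought", rheme "apples".
print(linearize(["Mary", "bought"], "Mary", ["apples"], "apples"))
# Mary[L+H*] bought apples[H*]
```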
1097

Using formal logic to represent sign language phonetics in semi-automatic annotation tasks

Curiel Diaz, Arturo Tlacaélel 23 November 2015
This thesis presents a formal framework for the representation of Signed Languages (SLs), the languages of Deaf communities, in semi-automatic recognition tasks. SLs are complex visuo-gestural communication systems: through corporal gestures, signers achieve the same level of expressivity as sound-based languages like English or French. Unlike those, however, SL morphemes correspond to complex sequences of highly specific body postures, interleaved with postural changes; while signing, signers use several parts of their body simultaneously to combinatorially build phonemes. This situation, paired with an extensive use of three-dimensional space, makes SLs difficult to represent with the tools available in Natural Language Processing (NLP) for vocal languages. For this reason, the present work develops a formal representation framework intended to transform SL video repositories (corpora) into an intermediate representation layer over which automatic recognition algorithms can work under better conditions. The main idea is that corpora can be described with a specialized Labeled Transition System (LTS), which can then be annotated with logic formulae for its study. A multi-modal logic was chosen as the basis of the formal language: Propositional Dynamic Logic (PDL), originally created to specify and prove properties of computer programs. In particular, PDL uses the modal operators [a] and <a> to denote necessity and possibility, respectively. For SLs, a variant of the original formalism was developed: PDL for Sign Language (PDLSL). In PDLSL, body articulators (such as the hands or head) are interpreted as independent agents; each articulator has its own set of valid actions and propositions and executes them without influence from the others. The simultaneous execution of different actions by several articulators yields distinct situations, which can be searched over an LTS with formulae, using the semantic rules of the logic. Together, PDLSL and the proposed specialized data structures can help curb some of the current problems in SL study, notably the heterogeneity of corpora and the lack of automatic annotation aids. In the same vein, this may not only increase the size of the available datasets but also extend previous results to new corpora, since the framework inserts an intermediate representation layer that can model any corpus regardless of its technical limitations. Annotation then becomes possible by defining the characteristics to annotate with formulae; a formal verification algorithm can find those features in corpora, as long as they are represented as consistent LTSs. Finally, the development of the formal framework led to the creation of a semi-automatic annotator based on the presented theoretical principles. Broadly, the system receives an untreated corpus video, converts it automatically into a valid LTS (by way of predefined rules), and then verifies human-created PDLSL formulae over the LTS. The final product is an automatically generated sub-lexical annotation, which can later be corrected by human annotators for use in areas such as linguistics.
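The verification idea can be shown in miniature: a hand-written LTS over body postures and a checker for the two PDL modalities. The gesture names and propositions are invented; real PDLSL formulae range over several articulators at once.

```python
# Minimal LTS plus checkers for the PDL operators <a> and [a].

# transitions: state -> action -> set of successor states
trans = {
    "rest":      {"raise_hand": {"hand_up"}},
    "hand_up":   {"move_left": {"hand_left"}, "lower_hand": {"rest"}},
    "hand_left": {},
}
# atomic propositions true at each state
props = {
    "rest": set(),
    "hand_up": {"hand_raised"},
    "hand_left": {"hand_raised", "hand_left_of_torso"},
}

def diamond(state, action, prop):
    """<action>prop: some successor reached via `action` satisfies prop."""
    return any(prop in props[s] for s in trans[state].get(action, set()))

def box(state, action, prop):
    """[action]prop: every successor reached via `action` satisfies prop."""
    return all(prop in props[s] for s in trans[state].get(action, set()))

print(diamond("rest", "raise_hand", "hand_raised"))       # True
print(box("hand_up", "move_left", "hand_left_of_torso"))  # True
```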
1098

A study of the use of natural language processing for conversational agents

Wilkens, Rodrigo Souza January 2016
Language is a mark of humanity and consciousness, and conversation (or dialogue) is one of the most fundamental forms of communication we learn as children. One way to make a computer more attractive for interaction with users is therefore through the use of natural language. Among the systems developed with some degree of language capability, the Eliza chatterbot is probably the first with a focus on dialogue. To make the interaction more interesting and useful to the user there are approaches beyond chatterbots, such as conversational agents. These agents generally have, to some degree, properties such as: a body (with cognitive states, including beliefs, desires and intentions or objectives); interactive embodiment in the real or virtual world (including perception of events, communication, and the ability to manipulate the world and communicate with other agents); and human-like behaviour (including affective abilities). This type of agent has been called by several names, including animated agents or embodied conversational agents (ECAs).

A dialogue system has six basic components: (1) the speech recognition component translates the user's speech into text; (2) the natural language understanding component produces a semantic representation suitable for dialogues, usually using grammars and ontologies; (3) the task manager chooses the concepts to be expressed to the user; (4) the natural language generation component defines how to express these concepts in words; (5) the dialogue manager controls the structure of the dialogue; and (6) the speech synthesizer translates the agent's answer into speech. (A stubbed skeleton of this pipeline is sketched after this abstract.) However, there is no consensus about the resources necessary for developing conversational agents or about the difficulty involved, especially for resource-poor languages. This work focuses on the influence of the natural language components (understanding and dialogue management) and analyses in particular the use of syntactic parsing systems in the development of conversational agents with more flexible language capabilities. It examines which parsing resources contribute to conversational agents and discusses how to develop them, targeting Portuguese, a resource-poor language. To this end, we analyze approaches to natural language understanding, identify parsing approaches that offer good performance, and, based on this analysis, develop a prototype to evaluate the impact of using a parser in a conversational agent.
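Here is that skeleton: the six components as stubbed functions, so the data flow is visible end to end. Every function body is a placeholder, not a working speech system.

```python
# Skeleton of the six-component dialogue pipeline (all stubs).

def recognize_speech(audio):            # (1) speech -> text
    return "what time is it"

def understand(text):                   # (2) text -> semantic representation
    return {"intent": "ask_time"}

def manage_dialogue(semantics, state):  # (5) control dialogue structure
    state.append(semantics["intent"])
    return state

def manage_task(semantics):             # (3) choose concepts to express
    return {"concept": "current_time", "value": "14:30"}

def generate_language(concepts):        # (4) concepts -> words
    return f"It is {concepts['value']}."

def synthesize(text):                   # (6) text -> speech (stubbed)
    return f"<audio:{text}>"

state = []
sem = understand(recognize_speech(b"..."))
state = manage_dialogue(sem, state)
print(synthesize(generate_language(manage_task(sem))))
```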
1099

Uma abordagem semiautomática para identificação de elementos de processo de negócio em texto de linguagem natural / A semi-automatic approach to identify business process elements in natural language text

Ferreira, Renato César Borges January 2017
To enable effective business process management, the first step is the design of process models appropriate to the organization's objectives. These models are used to describe the roles and responsibilities of the employees in an organization. Moreover, process modeling is very important for documenting, understanding and automating processes. However, the documentation that organizations keep about such processes is mostly unstructured and difficult for analysts to understand. In this context, process modeling becomes time-consuming and expensive, and may generate process models that do not comply with the reality of the organizations. Extracting process models, or fragments of them, from textual descriptions can help minimize the effort required for process modeling. This dissertation therefore proposes a semi-automatic approach to identify business process elements in natural language text. Based on the study of natural language processing, a set of mapping rules was defined to identify process elements in textual descriptions. In addition, to evaluate the mapping rules and demonstrate the feasibility of the proposed approach, a prototype was developed that identifies process elements in text semi-automatically. To measure the performance of the prototype, information retrieval metrics were used, namely precision, recall and F-measure. Two surveys were also conducted to verify user acceptance. The evaluations show promising results: the analysis of 70 texts yielded, on average, 73.61% precision, 70.15% recall and 71.82% F-measure, and the two surveys showed, on average, 91.66% acceptance among participants. The main contribution of this work is a set of mapping rules for identifying process elements in natural language text, supporting process analysts and reducing the time required for process modeling.
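One possible mapping rule, sketched below over invented PoS tags: a noun followed by a verb yields an actor and an activity. This illustrates the style of rule, not one of the dissertation's actual rules.

```python
# Toy mapping rule over (token, PoS) pairs: NOUN + VERB (+ object NOUN)
# -> actor performs activity.

# Tagged sentence: "The analyst validates the request"
tagged = [("The", "DET"), ("analyst", "NOUN"),
          ("validates", "VERB"), ("the", "DET"), ("request", "NOUN")]

def apply_rule(tagged):
    """Return the first actor/activity pair matched by the rule, if any."""
    for i in range(len(tagged) - 1):
        word, pos = tagged[i]
        if pos == "NOUN" and tagged[i + 1][1] == "VERB":
            obj = next((t for t, p in tagged[i + 2:] if p == "NOUN"), "")
            return {"actor": word,
                    "activity": f"{tagged[i + 1][0]} {obj}".strip()}
    return None

print(apply_rule(tagged))
# {'actor': 'analyst', 'activity': 'validates request'}
```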
1100

Seleção de atributos para classificação de textos usando técnicas baseadas em agrupamento, PoS tagging e algoritmos evolutivos / Feature selection for text classification using techniques based on clustering, PoS tagging and evolutionary algorithms

Ferreira, Charles Henrique Porto January 2016
Advisor: Prof. Dr. Debora Maria Rossi de Medeiros / Master's dissertation - Universidade Federal do ABC, Graduate Program in Computer Science, 2016. / This work investigates feature selection techniques to be applied to the text classification task. Three different techniques are proposed and compared with traditional text preprocessing techniques. The first technique proposes that not all grammatical classes (parts of speech) of a given language are relevant when a text is subjected to the classification task. The second technique employs feature clustering and genetic algorithms for selecting groups of features. The third technique raises two hypotheses: the first assumes that words occurring more often in the dataset than in the language as a whole may be the most important words to compose the features; the second assumes that the relationship of each data instance with each class can compose a new set of features. The results suggest that the proposed approaches are promising and that the hypotheses may be valid. The experiments with the first approach show that there is a set of grammatical classes whose words can be removed from the feature set across different datasets while maintaining or even improving classification accuracy. The second approach achieves a strong reduction in the original number of features while still improving classification accuracy. The third approach yielded the most pronounced reduction in the number of features since, by the nature of the proposal, the final number of features equals the number of classes in the dataset, and the impact on accuracy was null or even positive.
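The second approach can be sketched briefly: features are clustered beforehand, and a genetic algorithm selects which clusters to keep. The fitness function below is a toy stand-in; in the dissertation it would be the classification accuracy obtained with the selected groups.

```python
# GA sketch for selecting feature clusters (toy fitness function).
import random

random.seed(0)
N_GROUPS = 8        # number of feature clusters produced beforehand
TARGET = {1, 3, 6}  # pretend these clusters are the genuinely useful ones

def fitness(mask):
    """Toy fitness: reward overlap with useful clusters, penalize size.
    In practice this would be cross-validated classification accuracy."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & TARGET) - 0.1 * len(chosen)

def evolve(pop_size=20, gens=30, p_mut=0.1):
    pop = [[random.randint(0, 1) for _ in range(N_GROUPS)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_GROUPS)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print([i for i, bit in enumerate(best) if bit])  # indices of selected clusters
```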
