Global ETD Search

121	Verblexpor : um recurso léxico com anotação de papéis semânticos para o português [VerbLexPor: a lexical resource with semantic role annotation for Portuguese] Zilio, Leonardo January 2015 This dissertation develops a lexical resource of verbs annotated with semantic roles, called VerbLexPor, based on resources such as VerbNet, PropBank, and FrameNet. The theoretical bases of the study are interdisciplinary and lie in Corpus Linguistics and Natural Language Processing (NLP), with the aim of contributing to both Linguistics and Computer Science. The hypotheses are: a) a single set of semantic roles can be applied to different genres; and b) the differences among genres show up in the ranking of semantic roles. VerbLexPor is built on two corpora: a specialized one, with more than 1.6 million words, composed of scientific papers in Cardiology from three Brazilian journals; and a non-specialized one, with more than 1 million words, composed of articles from the popular newspaper Diário Gaúcho. The corpora were annotated with the PALAVRAS parser, and sentence, verb, and argument information was extracted and stored in a database. VerbLexPor has 192 verbs and more than 15 thousand annotated arguments, distributed among more than 6 thousand sentences. The Diário Gaúcho corpus favors a direct syntax, with little use of passive voice and adjuncts, while the Cardiology corpus has more passive voice, more INSTRUMENTS in subject position, and fewer AGENTS. Some parallel experiments were also conducted, such as semantic role annotation by multiple annotators and automatic verb clustering. In the multiple-annotator task, every annotator annotated exactly the same 25 sentences. The annotators received an annotation manual and basic training (an explanation of the task and two annotation examples). Agreement among annotators was evaluated with multi-π, and the result was π = 0.25. This low agreement may be due to the lack of more thorough training. The verb clustering task showed that syntax and semantics are equally important for clustering. The study contributes to Linguistics with a semantically annotated verb lexicon, and to Computer Science with data that can be queried and processed for various NLP applications, especially because they are available in both XML and SQL formats. Portuguese language Computational linguistics Corpus Specialized language Semantic role labeling Lexical resource NLP Corpus linguistics
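The agreement figure reported for entry 121 can be made concrete with a small sketch. It assumes that multi-π here means the usual multi-annotator generalisation of Scott's π (the coefficient often called Fleiss' kappa); the function name, the role labels, and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def multi_pi(annotations):
    """Multi-annotator pi for a list of items.

    `annotations` holds one list of labels per item, one label per annotator;
    every item must have the same number of annotators (at least two).
    """
    n_items = len(annotations)
    n_raters = len(annotations[0])
    label_totals = Counter()
    observed_sum = 0.0
    for labels in annotations:
        counts = Counter(labels)
        label_totals.update(counts)
        # Share of annotator pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed_sum += agreeing_pairs / (n_raters * (n_raters - 1))
    observed = observed_sum / n_items
    # Chance agreement from the pooled label distribution of all annotators.
    total_labels = n_items * n_raters
    expected = sum((c / total_labels) ** 2 for c in label_totals.values())
    # Undefined (division by zero) when every annotation uses the same label.
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: three annotators, two argument slots.
toy = [["AGENT", "AGENT", "INSTRUMENT"], ["AGENT", "INSTRUMENT", "INSTRUMENT"]]
print(multi_pi(toy))
```

A value near 1 means the annotators almost always chose the same role, and a value near 0 means agreement no better than chance, which is why π = 0.25 is read as low.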
122	iGen: Toward Automatic Generation and Analysis of Indicators of Compromise (IOCs) using Convolutional Neural Network January 2017 The field of cyber threats is evolving rapidly, and every day a multitude of new information about malware and Advanced Persistent Threats (APTs) is generated in the form of malware reports, blog articles, forum posts, etc. However, current Threat Intelligence (TI) systems have several limitations. First, most TI systems rely on analysts to examine and interpret data manually. Second, some of them generate Indicators of Compromise (IOCs) directly with regular expressions, without understanding the contextual meaning of those IOCs in the data sources, which lets the tools include a lot of false positives. Third, many TI systems consider only one or two data sources for IOC generation and miss some of the most valuable IOCs from other sources. To overcome these limitations, we propose iGen, a novel approach to fully automate the process of IOC generation and analysis. The approach is based on the idea that a model can understand English text as a human does and extract IOCs from different data sources intelligently. IOCs are identified on the basis of the syntax and semantics of the sentence, as well as context words (e.g., "attacked", "suspicious") present in it, which helps the approach work on any kind of data source. The technique first removes words with no contextual meaning, such as stop words and punctuation. Then, using the remaining words in the sentence and the output label (IOC or non-IOC sentence), the model learns to classify sentences as IOC or non-IOC. Once IOC sentences are identified with this learned Convolutional Neural Network (CNN) based approach, the next step is to identify the IOC tokens (such as domains, IPs, and URLs) in those sentences. This CNN-based classification model helps remove false positives (such as IPs that are not malicious). Afterwards, IOCs extracted from different data sources are correlated to find links between thousands of apparently unrelated attack instances, particularly infrastructure shared between them. The approach fully automates IOC generation, from gathering data from different sources to creating rules (e.g., OpenIOC, Snort rules, STIX rules) for deployment on the security infrastructure. iGen has collected around 400K IOCs so far with a precision of 95%, better than any state-of-the-art method. / Dissertation/Thesis / Masters Thesis Computer Science 2017 Computer science CNN Indicators of Compromise Intrusion Detection Machine Learning NLP Security
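The two-stage flow described in entry 122 (filter sentences first, then pull IOC tokens out of the sentences that pass) can be sketched roughly as follows. This is not the iGen implementation: the trained CNN is replaced by a trivial context-word check purely as a stand-in, and the stop-word list, context-word list, patterns, and function names are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
STOP_WORDS = {"the", "a", "an", "was", "is", "to", "from", "of", "and", "by"}
CONTEXT_WORDS = {"attacked", "suspicious", "malicious", "malware"}

IPV4 = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")
URL = re.compile(r"https?://[^\s,;]+")


def preprocess(sentence):
    """Tokenise, drop sentence-final dots, and remove stop words."""
    tokens = re.findall(r"[A-Za-z0-9:/._-]+", sentence)
    tokens = [t.strip(".") for t in tokens]
    return [t for t in tokens if t and t.lower() not in STOP_WORDS]


def looks_like_ioc_sentence(tokens):
    """Stand-in for the trained CNN classifier: a plain context-word check."""
    return any(t.lower() in CONTEXT_WORDS for t in tokens)


def extract_iocs(sentence):
    """Return IOC-like tokens, but only from sentences flagged as IOC sentences."""
    tokens = preprocess(sentence)
    if not looks_like_ioc_sentence(tokens):
        return []
    return [t for t in tokens if IPV4.fullmatch(t) or URL.fullmatch(t)]


print(extract_iocs("The server was attacked from 10.0.0.5."))
print(extract_iocs("Our DNS resolver is 8.8.8.8."))
```

The second call returns nothing even though the sentence contains an IP address, which is the false-positive filtering the abstract attributes to the sentence classifier; a regex-only extractor would have reported it.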
123	Modèle de vérification grammaticale automatique gauche-droite / Model for automated left-right grammar checking Souque, Agnès 12 December 2014 This thesis presents a model for automated left-right grammar checking based on the analysis of a corpus of typescript errors. Studies in cognitive psychology have shown that the revision process works by confronting an expectation with a result. For humans, detecting a grammatical error would therefore rely on an unfulfilled expectation on the part of the reviser. The model presented here is based on this principle. To handle expectations computationally, two common concepts in NLP are called upon: the unification principle and chunk segmentation. The former is particularly suited to checking agreement, while the latter provides an intermediate computational unit that sets boundaries and thereby simplifies the search for grammatical inconsistencies. Finally, the model's originality lies in its left-right analysis, which is built up as the text is read or written. Grammar checking Chunk Unification Corpus NLP
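The interplay of expectation, unification, and chunks in entry 123 can be illustrated with a minimal sketch. It assumes flat feature structures (plain dictionaries) and a chunk given as a list of (word, features) pairs; the function names and the French feature values are hypothetical and do not come from the thesis.

```python
def unify(expected, observed):
    """Unify two flat feature structures; return the merge, or None on a clash."""
    merged = dict(expected)
    for feature, value in observed.items():
        if feature in merged and merged[feature] != value:
            return None
        merged[feature] = value
    return merged


def check_chunk(chunk):
    """Read a chunk left to right, carrying the expectation built so far.

    Returns the index of the first word whose features contradict the
    expectation, or None if the whole chunk agrees.
    """
    expected = {}
    for index, (_word, features) in enumerate(chunk):
        result = unify(expected, features)
        if result is None:
            return index
        expected = result
    return None


# Hypothetical noun chunk with an agreement error: "les petit chats".
faulty = [
    ("les", {"number": "pl"}),
    ("petit", {"gender": "m", "number": "sg"}),
    ("chats", {"gender": "m", "number": "pl"}),
]
print(check_chunk(faulty))
```

Because the expectation is reset at each chunk boundary, a clash is always searched for within a small window, which is the simplification the abstract credits to chunk segmentation.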
126	CUILESS2016: a clinical corpus applying compositional normalization of text mentions Osborne, John D., Neu, Matthew B., Danila, Maria I., Solorio, Thamar, Bethard, Steven J. 10 January 2018 Background: Traditionally, text mention normalization corpora have normalized concepts to single ontology identifiers ("pre-coordinated concepts"). Less frequently, normalization corpora have used concepts with multiple identifiers ("post-coordinated concepts"), but the additional identifiers have been restricted to a defined set of relationships to the core concept. This approach limits the ability of the normalization process to express semantic meaning. We generated a freely available corpus using post-coordinated concepts without a defined set of relationships, which we term "compositional concepts", to evaluate their use in clinical text. Methods: We annotated 5397 disorder mentions from the ShARe corpus to SNOMED CT that were previously normalized as "CUI-less" in the "SemEval-2015 Task 14" shared task because they lacked a pre-coordinated mapping. Unlike the previous normalization method, we do not restrict concept mappings to a particular set of Unified Medical Language System (UMLS) semantic types, and we allow normalization to multiple UMLS Concept Unique Identifiers (CUIs). We computed annotator agreement and assessed semantic coverage with this method. Results: We generated the largest clinical text normalization corpus to date with mappings to multiple identifiers and made it freely available. All but 8 of the 5397 disorder mentions were normalized using this methodology. Annotator agreement ranged from 52.4% using the strictest metric (exact matching) to 78.2% using a hierarchical agreement measure based on the overlap of shared ancestral nodes. Conclusion: Our results provide evidence that compositional concepts can increase semantic coverage in clinical text. To our knowledge, we provide the first freely available corpus of compositional concept annotation in clinical text. NLP Information extraction Concept normalization Concept recognition Fine grained named entity recognition
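The gap between exact matching and hierarchical agreement in entry 126 is easier to see on a toy example. The abstract does not give the formula, so the sketch below assumes one plausible reading: the Jaccard overlap of the two concepts' ancestor sets. The concept names and the single-parent hierarchy are hypothetical; SNOMED CT itself allows multiple parents.

```python
def ancestors(node, parent):
    """Return the node together with all of its ancestors in a single-parent tree."""
    seen = {node}
    while node in parent:
        node = parent[node]
        seen.add(node)
    return seen


def exact_match(a, b):
    """Strict agreement: 1.0 only when both annotators chose the same concept."""
    return 1.0 if a == b else 0.0


def hierarchical_overlap(a, b, parent):
    """Assumed metric: shared ancestral nodes divided by all ancestral nodes."""
    set_a, set_b = ancestors(a, parent), ancestors(b, parent)
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical miniature hierarchy (child -> parent).
toy_parent = {
    "fracture_of_femur": "fracture",
    "fracture_of_tibia": "fracture",
    "fracture": "disorder",
}
print(exact_match("fracture_of_femur", "fracture_of_tibia"))
print(hierarchical_overlap("fracture_of_femur", "fracture_of_tibia", toy_parent))
```

Two annotators who pick sibling concepts score zero under exact matching but receive partial credit under the hierarchical measure, which is consistent with the higher figure reported for the latter.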
127	Bilden av "Gameplay" i spelrecensioner : Lingvistisk och komparativ analys av gameplay-begreppet i professionella spelrecensioner och i spelforskningslitteraturen [The picture of "gameplay" in game reviews: a linguistic and comparative analysis of the gameplay concept in professional game reviews and in the game research literature] Räisänen, Kalle January 2011 No description available. Gameplay Games Game reviews Content analysis Lexical analysis NLP Computer and Information Sciences
128	Simulace uzivatele pro statisticke dialogove systemy / User simulation for statistical dialogue systems Michlíková, Vendula January 2015 The purpose of this thesis is to develop and evaluate user simulators for a spoken dialogue system. The simulators operate at the dialogue act level. We implemented a bigram simulator as a baseline system. Based on this baseline, we created another bigram simulator that is trained on dialogue acts without slot values. The third simulator resembles an implementation of a dialogue manager: it tracks its dialogue state and learns a dialogue strategy based on that state using supervised learning. The user simulators are implemented in Python 2.7, in the ALEX framework for dialogue system development. They are developed for the PTICS application, which operates in the domain of public transport information, and are trained and evaluated on real human-machine dialogues collected with PTICS.
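A bigram user simulator of the kind named in entry 128 can be sketched in a few lines. The sketch assumes the common formulation in which the next user dialogue act is sampled conditioned only on the preceding system act; it is written in Python 3 rather than the thesis's Python 2.7, it does not use the ALEX framework, and the function names and dialogue act strings are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(dialogues):
    """Count user acts following each system act.

    `dialogues` is a list of dialogues, each a list of
    (system_act, user_act) turns.
    """
    model = defaultdict(Counter)
    for dialogue in dialogues:
        for system_act, user_act in dialogue:
            model[system_act][user_act] += 1
    return model


def sample_user_act(model, system_act, rng=random):
    """Sample a user act in proportion to how often it followed `system_act`."""
    counts = model.get(system_act)
    if not counts:
        return "null()"  # hypothetical fallback for unseen system acts
    acts = list(counts)
    weights = [counts[act] for act in acts]
    return rng.choices(acts, weights=weights, k=1)[0]


# Hypothetical training data with slot values already stripped.
corpus = [
    [("request(from_stop)", "inform(from_stop)"), ("confirm(from_stop)", "affirm()")],
    [("request(from_stop)", "inform(from_stop)"), ("confirm(from_stop)", "negate()")],
]
model = train_bigram(corpus)
print(sample_user_act(model, "confirm(from_stop)", random.Random(0)))
```

Training on acts without slot values, as the second simulator in the abstract does, shrinks the table of conditioning contexts, so more system acts in new dialogues will have been seen during training.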
129	The role of Neuro Linguistic Programming in improving organisational leadership through intrapersonal communication development Oberholzer, Charl January 2014 In today’s rapidly changing world of work, where dramatic, unpredictable and complex change is redefining the way in which organisations are to be managed, a realisation has emerged that people’s intra- and interpersonal communication techniques are fundamental to organisational success. This study argues that Neuro Linguistic Programming (NLP) provides the techniques needed to have an impact on an organisation, while its communication model and leadership behaviour contribute to organisational success. Previous research on NLP has been done mostly in disciplines such as psychology and linguistics, but a call is made to apply NLP in an organisational context. Little empirical evidence exists with regard to the benefits of NLP techniques, and even less is available in a South African context. In this study, NLP’s relationship with Emotional Intelligence, the development of leadership, the corporate world and several communication theories is explored so as to understand the value it can contribute at a time when the concept of organisational success is being redefined. An integrated framework of organisational success, incorporating NLP, Emotional Intelligence and intrapersonal communication, is introduced as an additional guideline for measuring the elements of organisational success in organisations, leaders or communication models. This framework also leads to the conclusion that organisations, leaders and communication models that make use of NLP are better off than those that do not. The benefits of NLP include motivating employees, managing conflict, self-motivation, managing emotional states, communicating effectively, building trust, increasing productivity, improving customer care, strategic planning, setting goals, aligning visions and better flexibility. This study establishes that intelligent leadership, that is, the application of NLP techniques to the intra- and interpersonal communication behaviour and management approaches of leaders, can be correlated with organisational success. This is done by means of a case study on the Solidarity Movement, a large non-profit organisation in South Africa, in which five strategic leaders who are believed to be using NLP and to have implemented it in the organisation are analysed. The evidence supports the notion that applying NLP in leaders’ communication and leadership behaviour improves their intra- and interpersonal behaviour and, in turn, contributes to organisational success. NLP is often presented as a magic toolkit for the self-improvement of individuals and has recently relied more on presuppositions than on either qualitative or quantitative research. This study adds to the credibility of NLP as an increasingly important instrument for communication management as a discipline. / Dissertation (MCom)--University of Pretoria, 2014. / gm2014 / Communication Management / Unrestricted Interpersonal communication techniques Neuro Linguistic Programming (NLP) Organisational leadership South African context Relationship UCTD
130	Optimisation of plate/plate-fin heat exchanger design Guo, Kunpeng January 2015 With increasing global energy consumption and stringent environmental protection legislation and safety regulations in industrialised nations, energy saving has become a high priority. One of the most efficient ways to reduce energy use is heat transfer enhancement for additional heat recovery. Applying compact heat exchangers is one of the main strategies of heat transfer enhancement. However, the application of compact heat exchangers is hindered by the lack of design methodology. Therefore, the aim of this research is to develop optimisation methodologies for plate and plate-fin heat exchanger design. A mathematical model of plate-fin heat exchanger design is proposed that considers fin type selection with detailed geometry and imposed constraints simultaneously. The concept of mix-and-match fin type combinations is put forward to include all possible fin type combinations in a heat exchanger. The mixed integer nonlinear programming (MINLP) model can be converted to a nonlinear programming (NLP) model by employing continuous heat transfer and pressure drop correlations and treating the basic fin geometric parameters as continuous variables. The optimisation minimises either volume or capital cost and is solved with the CONOPT solver in GAMS. Case studies are carried out to demonstrate the effectiveness and benefits of the proposed methodology. For plate heat exchangers, the design methodology is developed on the basis of the plate-fin heat exchanger methodology and takes phase change, plate pattern selection, flow arrangement and pressure drop constraints into account simultaneously. The phase change problem is tackled by dividing the whole process into several subsections and assuming constant physical properties in each subsection. The performance of the various flow arrangements is evaluated with correction factors for the logarithmic mean temperature difference. For two-phase conditions, heat transfer and pressure drop performance are predicted with continuous two-phase Nusselt number and Fanning friction factor correlations to avoid the MINLP problem. This optimisation is solved with the CONOPT solver as well. The feasibility and accuracy of the proposed methodology are examined in case studies. 621.402
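The role of the correction factor mentioned in entry 130 follows from the standard sizing relation Q = U · A · F · ΔT_lm, where ΔT_lm is the logarithmic mean temperature difference and F ≤ 1 penalises flow arrangements that depart from pure counter-current flow. The sketch below uses that textbook relation only; the function names and numbers are hypothetical and are not taken from the thesis.

```python
import math


def lmtd(dt_end_1, dt_end_2):
    """Logarithmic mean of the temperature differences at the two exchanger ends."""
    if dt_end_1 <= 0 or dt_end_2 <= 0:
        raise ValueError("both end temperature differences must be positive")
    if math.isclose(dt_end_1, dt_end_2):
        return dt_end_1  # limit of the formula when both ends are equal
    return (dt_end_1 - dt_end_2) / math.log(dt_end_1 / dt_end_2)


def required_area(duty_w, u_w_per_m2k, dt_end_1, dt_end_2, correction_factor=1.0):
    """Heat transfer area in m^2 from Q = U * A * F * LMTD."""
    return duty_w / (u_w_per_m2k * correction_factor * lmtd(dt_end_1, dt_end_2))


# Hypothetical duty of 500 kW, U = 3000 W/(m^2 K), end differences of 40 K and 20 K.
print(required_area(500e3, 3000.0, 40.0, 20.0))
print(required_area(500e3, 3000.0, 40.0, 20.0, correction_factor=0.9))
```

A lower correction factor always raises the required area, so comparing F across flow arrangements gives a direct way to rank them for a fixed duty.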

Search results