281

Framtidens cybersäkerhet : en studie om hur Natural Language Processing påverkar dagens cybersäkerhetsarbete / The Future of Cybersecurity : A Study on How Natural Language Processing Impacts Today's Cybersecurity Efforts

Grönstedt Söderberg, Olle, Mattsson, Fredrik January 2024 (has links)
Since the launch of OpenAI's generative chatbot ChatGPT in late 2022, interest in artificial intelligence (AI), and specifically in Natural Language Processing (NLP), has grown markedly. Through its ability to interpret and generate human language, NLP has already transformed numerous industries and sparked debate among researchers: some regard AI as one of the most significant innovations ever, while others warn that the rapid pace of technological development brings new and altered risks. This study aims to examine cybersecurity experts' views on the risks related to the use of NLP and its impact on cybersecurity work. Through interviews and surveys, the study identified several risks that are made more efficient by the use of NLP-based services. The survey results show which risks cybersecurity experts rate highest in terms of likelihood and potential damage. These assessments are made with the CIA framework (Confidentiality, Integrity, Availability) in mind, a well-established security model used to maintain good information and cyber security. The interview results provide insight into the respondents' underlying reasoning and also emphasise the importance of awareness when using NLP-based services. Overall, the study gives the reader an understanding of the risks associated with Natural Language Processing and insight into the factors that cybersecurity experts take into account when assessing these risks. The three risks the study identified as particularly prominent were spear-phishing, malicious code and data leaks.
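As a rough illustration of the likelihood-and-impact ranking that the CIA framing implies, the Python sketch below scores the three prominent risks named in the abstract. All numeric values are invented placeholders for illustration; they are not the study's survey data.

    # Hypothetical CIA-based risk ranking. The risk names come from the
    # abstract; every score below is an invented placeholder.
    RISKS = {
        # risk: (likelihood, confidentiality, integrity, availability), 1-5 scale
        "spear-phishing": (4, 5, 3, 2),
        "malicious code": (3, 4, 5, 4),
        "data leaks":     (3, 5, 3, 2),
    }

    def risk_score(likelihood, c, i, a):
        # Classic likelihood x impact, with impact averaged over the CIA triad.
        return likelihood * (c + i + a) / 3

    for name, scores in sorted(RISKS.items(), key=lambda kv: -risk_score(*kv[1])):
        print(f"{name}: {risk_score(*scores):.1f}")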
282

Measuring Syntactic Development in L2 Writing: Fine Grained Indices of Syntactic Complexity and Usage-Based Indices of Syntactic Sophistication

Kyle, Kristopher 09 May 2016 (has links)
Syntactic complexity has been an area of significant interest in L2 writing development studies over the past 45 years. Despite the regularity with which syntactic complexity measures have been employed, the construct is still relatively under-developed, and, as a result, the cumulative results of syntactic complexity studies can appear opaque. At least three reasons exist for the current state of affairs, namely the lack of consistency and clarity with which indices of syntactic complexity have been described, the overly broad nature of the indices that have been regularly employed, and the omission of indices that focus on usage-based perspectives. This study seeks to address these three gaps through the development and validation of the Tool for the Automatic Assessment of Syntactic Sophistication and Complexity (TAASSC). TAASSC measures large- and fine-grained clausal and phrasal indices of syntactic complexity and usage-based frequency/contingency indices of syntactic sophistication. Using TAASSC, this study will address L2 writing development in two main ways: through the examination of syntactic development longitudinally and through the examination of human judgments of writing proficiency (e.g., expert ratings of TOEFL essays). This study will have important implications for second language acquisition, second language writing, and language assessment.
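TAASSC itself is not reproduced here, but one usage-based frequency index of the kind the abstract mentions can be sketched in a few lines of Python. The reference counts below are invented stand-ins; TAASSC computes such indices against real reference corpora.

    import math

    # Invented placeholder counts standing in for a large reference corpus.
    REFERENCE_VERB_FREQ = {"be": 1_000_000, "have": 400_000, "make": 90_000,
                           "utilize": 2_000, "ascertain": 500}

    def mean_log_frequency(verb_lemmas):
        # Usage-based sophistication: a lower mean log frequency means the
        # writer relies on rarer, more 'sophisticated' verbs.
        logs = [math.log10(REFERENCE_VERB_FREQ.get(v, 1)) for v in verb_lemmas]
        return sum(logs) / len(logs)

    print(mean_log_frequency(["be", "make", "have"]))    # frequent verbs -> high
    print(mean_log_frequency(["utilize", "ascertain"]))  # rare verbs -> low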
283

Efficient algorithms for infinite-state recursive stochastic models and Newton's method

Stewart, Alistair Mark January 2015 (has links)
Some well-studied infinite-state stochastic models give rise to systems of nonlinear equations. These systems of equations have solutions that are probabilities, generally probabilities of termination in the model. We are interested in finding efficient, preferably polynomial time, algorithms for calculating probabilities associated with these models. The chief tool we use to solve systems of polynomial equations will be Newton's method as suggested by [EY09]. The main contribution of this thesis is to the analysis of this and related algorithms. We give polynomial-time algorithms for calculating probabilities for broad classes of models for which none were known before. Stochastic models that give rise to such systems of equations include such classic and heavily-studied models as Multi-type Branching Processes, Stochastic Context-Free Grammars (SCFGs) and Quasi-Birth-Death Processes. We also consider models that give rise to infinite-state Markov Decision Processes (MDPs) by giving algorithms for approximating optimal probabilities and finding policies that give probabilities close to the optimal probability, in several classes of infinite-state MDPs. Our algorithms for analysing infinite-state MDPs rely on a non-trivial generalization of Newton's method that works for the max/min polynomial systems that arise as Bellman optimality equations in these models. For SCFGs, which are used in statistical natural language processing, in addition to approximating termination probabilities, we analyse algorithms for approximating the probability that a grammar produces a given string, or produces a string in a given regular language. In most cases, we show that we can calculate an approximation to the relevant probability in time polynomial in the size of the model and the number of bits of desired precision. We also consider more general systems of monotone polynomial equations. For such systems we cannot give a polynomial-time algorithm, which pre-existing hardness results render unlikely, but we can still give an algorithm with a complexity upper bound which is exponential only in some parameters that are likely to be bounded for the monotone polynomial equations that arise for many interesting stochastic models.
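As a one-dimensional illustration of the Newton approach (the thesis, following [EY09], treats multivariate monotone systems), consider a branching process in which each individual either dies or splits in two. Its termination probability is the least nonnegative solution of x = (1-p) + p*x^2, and Newton's method started at x = 0 converges to it monotonically. The Python sketch below is illustrative only.

    def newton_termination(p, iters=25):
        # Least nonnegative solution of x = (1-p) + p*x**2: the termination
        # probability when an individual dies with prob 1-p or splits in two
        # with prob p. Starting from x = 0, the iterates increase
        # monotonically to the least fixed point.
        x = 0.0
        for _ in range(iters):
            g = (1 - p) + p * x * x - x   # g(x) = f(x) - x
            dg = 2 * p * x - 1            # g'(x), nonzero along this trajectory
            x -= g / dg
        return x

    print(newton_termination(0.3))  # subcritical: terminates with probability 1
    print(newton_termination(0.7))  # supercritical: (1-p)/p = 3/7 ~ 0.4286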
284

Characterization of Prose by Rhetorical Structure for Machine Learning Classification

Java, James 01 January 2015 (has links)
Measures of classical rhetorical structure in text can improve accuracy in certain types of stylistic classification tasks such as authorship attribution. This research augments the relatively scarce work in the automated identification of rhetorical figures and uses the resulting statistics to characterize an author's rhetorical style. These characterizations of style can then become part of the feature set of various classification models. Our Rhetorica software identifies 14 classical rhetorical figures in free English text, with generally good precision and recall, and provides summary measures to use in descriptive or classification tasks. Classification models trained on Rhetorica's rhetorical measures paired with lexical features typically performed better at authorship attribution than either set of features used individually. The rhetorical measures also provide new stylistic quantities for describing texts, authors, genres, etc.
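The abstract does not detail Rhetorica's detection rules, but the flavour of rhetorical-figure identification can be shown with a deliberately crude Python detector for one of the classical figures, anaphora (successive sentences opening with the same word). This toy is not Rhetorica's method.

    import re

    def detect_anaphora(text, min_run=2):
        # Flag runs of successive sentences that open with the same word.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        openers = [s.split()[0].lower().strip(",;:") for s in sentences if s.split()]
        runs, start = [], 0
        for i in range(1, len(openers) + 1):
            if i == len(openers) or openers[i] != openers[start]:
                if i - start >= min_run:
                    runs.append((openers[start], i - start))
                start = i
        return runs

    text = ("We shall fight on the beaches. We shall fight on the landing "
            "grounds. We shall fight in the fields. They will not prevail.")
    print(detect_anaphora(text))  # [('we', 3)]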
285

Iterated learning framework for unsupervised part-of-speech induction

Christodoulopoulos, Christos January 2013 (has links)
Computational approaches to linguistic analysis have been used for more than half a century. The main tools come from the field of Natural Language Processing (NLP) and are based on rule-based or corpora-based (supervised) methods. Despite the undeniable success of supervised learning methods in NLP, they have two main drawbacks: on the practical side, it is expensive to produce the manual annotation (or the rules) required and it is not easy to find annotators for less common languages. A theoretical disadvantage is that the computational analysis produced is tied to a specific theory or annotation scheme. Unsupervised methods offer the possibility to expand our analyses into more resource-poor languages, and to move beyond the conventional linguistic theories. They are a way of observing patterns and regularities emerging directly from the data and can provide new linguistic insights. In this thesis I explore unsupervised methods for inducing parts of speech across languages. I discuss the challenges in evaluation of unsupervised learning and at the same time, by looking at the historical evolution of part-of-speech systems, I make the case that the compartmentalised, traditional pipeline approach of NLP is not ideal for the task. I present a generative Bayesian system that makes it easy to incorporate multiple diverse features, spanning different levels of linguistic structure, like morphology, lexical distribution, syntactic dependencies and word alignment information that allow for the examination of cross-linguistic patterns. I test the system using features provided by unsupervised systems in a pipeline mode (where the output of one system is the input to another) and show that the performance of the baseline (distributional) model increases significantly, reaching and in some cases surpassing the performance of state-of-the-art part-of-speech induction systems. I then turn to the unsupervised systems that provided these sources of information (morphology, dependencies, word alignment) and examine the way that part-of-speech information influences their inference. Having established a bi-directional relationship between each system and my part-of-speech inducer, I describe an iterated learning method, where each component system is trained using the output of the other system in each iteration. The iterated learning method improves the performance of both component systems in each task. Finally, using this iterated learning framework, and by using parts of speech as the central component, I produce chains of linguistic structure induction that combine all the component systems to offer a more holistic view of NLP. To show the potential of this multi-level system, I demonstrate its use 'in the wild'. I describe the creation of a vastly multilingual parallel corpus based on 100 translations of the Bible in a diverse set of languages. Using the multi-level induction system, I induce cross-lingual clusters, and provide some qualitative results of my approach. I show that it is possible to discover similarities between languages that correspond to 'hidden' morphological, syntactic or semantic elements.
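The iterated learning loop itself is simple to sketch: each component is retrained on the other's most recent output. In this Python skeleton the two train_* functions are trivial stand-ins, not the generative Bayesian inducer or the real morphology system used in the thesis.

    # Skeleton of iterated learning between a POS inducer and a morphology
    # learner. Both 'models' below are toy stand-ins for real systems.

    def train_pos_inducer(corpus, morph_features):
        # Stand-in: 'tag' each word by its morphology feature plus first letter.
        return {w: (morph_features.get(w, ""), w[0]) for w in corpus}

    def train_morphology(corpus, pos_tags):
        # Stand-in: a word's suffix, conditioned on its current POS signature.
        return {w: (w[-2:], pos_tags.get(w)) for w in corpus}

    def iterated_learning(corpus, iterations=3):
        pos, morph = {}, {}
        for _ in range(iterations):
            pos = train_pos_inducer(corpus, morph)  # POS uses morphology output
            morph = train_morphology(corpus, pos)   # morphology uses POS output
        return pos, morph

    pos, morph = iterated_learning(["walking", "walked", "talks", "talked"])
    print(pos)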
286

Automatic generation of factual questions from video documentaries

Skalban, Yvonne January 2013 (has links)
Questioning sessions are an essential part of teachers' daily instructional activities. Questions are used to assess students' knowledge and comprehension and to promote learning. The manual creation of such learning material is a laborious and time-consuming task. Research in Natural Language Processing (NLP) has shown that Question Generation (QG) systems can be used to efficiently create high-quality learning materials to support teachers in their work and students in their learning process. A number of successful QG applications for education and training have been developed, but these focus mainly on supporting reading materials. However, digital technology is always evolving; there is an ever-growing amount of multimedia content available, and more and more delivery methods for audio-visual content are emerging and easily accessible. At the same time, research provides empirical evidence that multimedia use in the classroom has beneficial effects on student learning. Thus, there is a need to investigate whether QG systems can be used to assist teachers in creating assessment materials from these different types of media that are being employed in classrooms. This thesis serves to explore how NLP tools and techniques can be harnessed to generate questions from non-traditional learning materials, in particular videos. A QG framework which allows the generation of factual questions from video documentaries has been developed and a number of evaluations to analyse the quality of the produced questions have been performed. The developed framework uses several readily available NLP tools to generate questions from the subtitles accompanying a video documentary. The reason for choosing video documentaries is two-fold: firstly, they are frequently used by teachers and secondly, their factual nature lends itself well to question generation, as will be explained within the thesis. The questions generated by the framework can be used as a quick way of testing students' comprehension of what they have learned from the documentary. As part of this research project, the characteristics of documentary videos and their subtitles were analysed and the methodology has been adapted to exploit these characteristics. An evaluation of the system output by domain experts showed promising results but also revealed that generating even shallow questions is a task which is far from trivial. To this end, the evaluation and subsequent error analysis contribute to the literature by highlighting the challenges QG from documentary videos can face. In a user study, it was investigated whether questions generated automatically by the system developed as part of this thesis and by a state-of-the-art system can successfully be used to assist multimedia-based learning. Using a novel evaluation methodology, the feasibility of using a QG system's output as 'pre-questions' was examined, with different types of pre-questions (text-based and with images) used. The psychometric parameters of the questions generated automatically by the two systems and of those generated manually were compared. The results indicate that the presence of pre-questions (preferably with images) improves the performance of test-takers, and they highlight that the psychometric parameters of the questions generated by the system are comparable to, if not better than, those of the state-of-the-art system. In another experiment, the productivity of questions in terms of time taken to generate questions manually vs. time taken to post-edit system-generated questions was analysed. A post-editing tool which allows for the tracking of several statistics such as edit distance measures, editing time, etc., was used. The quality of questions before and after post-editing was also analysed. Not only did the experiments provide quantitative data about automatically and manually generated questions, but qualitative data in the form of user feedback, which provides an insight into how users perceived the quality of questions, was also gathered.
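To make 'shallow' factual question generation concrete, here is a toy Python sketch that turns one common declarative pattern into a when-question; the framework in the thesis relies on full NLP tooling (parsing, named-entity recognition) rather than a regular expression.

    import re

    def generate_question(sentence):
        # Toy rule: 'X was <verb> in <year>.' -> 'When was X <verb>?'
        m = re.match(r"(.+?) was (\w+) in (\d{4})\.?$", sentence.strip())
        if m:
            subject, verb, year = m.groups()
            return f"When was {subject} {verb}?", year
        return None

    print(generate_question("The Hoover Dam was completed in 1936."))
    # ('When was The Hoover Dam completed?', '1936')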
287

Semantic annotation of Chinese texts with message structures based on HowNet

Wong, Ping-wai., 黃炳蔚. January 2007 (has links)
published_or_final_version / abstract / Humanities / Doctoral / Doctor of Philosophy
288

DEXTER: Generating Documents by means of computational registers

Oldham, Joseph D. 01 January 2000 (has links)
Software is often capable of efficiently storing and managing data on computers. However, even software systems that store and manage data efficiently often do an inadequate job of presenting data to users. A prototypical example is the display of raw data in the tabular results of SQL queries. Users may need a presentation that is sensitive to data values and sensitive to domain conventions. One way to enhance presentation is to generate documents that correctly convey the data to users, taking into account the needs of the user and the values in the data. I have designed and implemented a software approach to generating human-readable documents in a variety of domains. The software to generate a document is called a computational register, or "register" for short. A register system is a software package for authoring and managing individual registers. Registers generating documents in various domains may be managed by one register system. In this thesis I describe computational registers at an architectural level and discuss registers as implemented in DEXTER, my register system. Input to DEXTER registers is a set of SQL query results. DEXTER registers use a rule-based approach to create a document outline from the input. A register creates the output document by using flexible templates to express the document outline. The register approach is unique in several ways. Content determination and structural planning are carried out sequentially rather than simultaneously. Content planning itself is broken down into data re-representation followed by content selection. No advanced linguistic knowledge is required to understand the approach. Register authoring follows a course very similar to writing a single document. The internal data representation and content planning steps allow registers to use flexible templates, rather than more abstract grammar-based approaches, to render the final document. Computational registers are applicable in a variety of domains. What registers can be written is restricted not by domain, but by the original data representation. Finally, DEXTER shows that a single software suite can assist in the authoring and management of a variety of registers.
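The sequential pipeline the abstract describes (data re-representation, content selection, then template-based rendering) can be caricatured in a few lines of Python. The schema, rule and templates below are invented for illustration; they are not DEXTER's.

    import sqlite3

    # Toy 'register': SQL results -> rule-based outline -> templated document.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (station TEXT, temp_c REAL)")
    con.executemany("INSERT INTO readings VALUES (?, ?)",
                    [("North", 31.5), ("South", 12.0)])

    def plan_outline(rows):
        # Content planning: re-represent the rows, then select what to report.
        return [{"station": s, "temp": t, "alert": t > 30} for s, t in rows]

    TEMPLATES = {
        True:  "WARNING: station {station} reports {temp:.1f} C, above threshold.",
        False: "Station {station} reports a normal {temp:.1f} C.",
    }

    def render(outline):
        # Structural realization: one templated sentence per outline node.
        return "\n".join(TEMPLATES[node["alert"]].format(**node) for node in outline)

    rows = con.execute("SELECT station, temp_c FROM readings").fetchall()
    print(render(plan_outline(rows)))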
289

Advanced natural language processing for improved prosody in text-to-speech synthesis / G. I. Schlünz

Schlünz, Georg Isaac January 2014 (has links)
Text-to-speech synthesis enables the speech-impeded user of an augmentative and alternative communication system to partake in any conversation on any topic, because it can produce dynamic content. Current synthetic voices do not sound very natural, however, lacking in the areas of emphasis and emotion. These qualities are furthermore important to convey meaning and intent beyond that which can be achieved by the vocabulary of words alone. Put differently, speech synthesis requires a more comprehensive analysis of its text input beyond the word level to infer the meaning and intent that elicit emphasis and emotion. The synthesised speech then needs to imitate the effects that these textual factors have on the acoustics of human speech. This research addresses these challenges by commencing with a literature study on the state of the art in the fields of natural language processing, text-to-speech synthesis and speech prosody. It is noted that the higher linguistic levels of discourse, information structure and affect are necessary for the text analysis to shape the prosody appropriately for more natural synthesised speech. Discourse and information structure account for meaning, intent and emphasis, and affect formalises the modelling of emotion. The OCC model is shown to be a suitable point of departure for a new model of affect that can leverage the higher linguistic levels. The audiobook is presented as a text and speech resource for the modelling of discourse, information structure and affect because its narrative structure is prosodically richer than the random constitution of a traditional text-to-speech corpus. A set of audiobooks is selected and phonetically aligned for subsequent investigation. The new model of discourse, information structure and affect, called e-motif, is developed to take advantage of the audiobook text. It is a subjective model that does not specify any particular belief system in order to appraise its emotions, but defines only anonymous affect states. Its cognitive and social features rely heavily on the coreference resolution of the text, but this process is found not to be accurate enough to produce usable feature values. The research concludes with an experimental investigation of the influence of the e-motif features on human speech and synthesised speech. The aligned audiobook speech is inspected for prosodic correlates of the cognitive and social features, revealing that some activity occurs in the intonational domain. However, when the aligned audiobook speech is used in the training of a synthetic voice, the e-motif effects are overshadowed by those of structural features that come standard in the voice-building framework. / PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014
290

Automatic error detection in non-native English

De Felice, Rachele January 2008 (has links)
This thesis describes the development of DAPPER ('Determiner And PrePosition Error Recogniser'), a system designed to automatically acquire models of occurrence for English prepositions and determiners to allow for the detection and correction of errors in their usage, especially in the writing of non-native speakers of the language. Prepositions and determiners are focused on because they are parts of speech whose usage is particularly challenging to acquire, both for students of the language and for natural language processing tools. The work presented in this thesis proposes to address this problem by developing a system which can acquire models of correct preposition and determiner occurrence, and can use this knowledge to identify divergences from these models as errors. The contexts of these parts of speech are represented by a sophisticated feature set, incorporating a variety of semantic and syntactic elements. DAPPER is found to perform well on preposition and determiner selection tasks in correct native English text. Results on each preposition and determiner are discussed in detail to understand the possible reasons for variations in performance, and whether these are due to problems with the structure of DAPPER or to deeper linguistic reasons. An in-depth analysis of all features used is also offered, quantifying the contribution of each feature individually. This can help establish whether the decision to include complex semantic and syntactic features is justified in the context of this task. Finally, the performance of DAPPER on non-native English text is assessed. The system is found to be robust when applied to text which does not contain any preposition or determiner errors. On an error correction task, results are mixed: DAPPER shows promising results on preposition selection and determiner confusion (definite vs. indefinite) errors, but is less successful in detecting errors involving missing or extraneous determiners. Several characteristics of learner writing are described, to gain a clearer understanding of what problems arise when natural language processing tools are used with this kind of text. It is concluded that the construction of contextual models is a viable approach to the task of preposition and determiner selection, despite outstanding issues pertaining to the domain of non-native writing.
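The idea of a model of correct occurrence can be sketched with a toy count-based version in Python: learn which preposition correct text uses in a given (verb, noun) context, then flag divergences as candidate errors. The training pairs below are invented, and DAPPER's actual feature set is far richer than this.

    from collections import Counter, defaultdict

    # Invented training pairs standing in for parsed, correct native text.
    TRAINING = [("arrive", "airport", "at"), ("arrive", "city", "in"),
                ("depend", "weather", "on"), ("arrive", "station", "at")]

    model = defaultdict(Counter)
    for verb, noun, prep in TRAINING:
        model[(verb, noun)][prep] += 1

    def check(verb, noun, prep_used):
        # Flag a divergence from the most likely preposition for this context.
        counts = model.get((verb, noun))
        if not counts:
            return None  # unseen context: no judgement
        best = counts.most_common(1)[0][0]
        return None if best == prep_used else f"did you mean '{best}'?"

    print(check("arrive", "airport", "on"))  # did you mean 'at'?
    print(check("depend", "weather", "on"))  # None: matches the model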
