111

Controlled Languages in Software User Documentation

Steensland, Henrik, Dervisevic, Dina January 2005 (has links)
In order to facilitate comprehensibility and translation, the language used in software user documentation must be standardized. If the terminology and language rules are standardized and consistent, the time and cost of translation will be reduced. For this reason, controlled languages have been developed. Controlled languages are subsets of other languages, purposely limited by restricting the terminology and grammar that are allowed. The goal of this thesis is to investigate how using a controlled language can improve the comprehensibility and translatability of software user documentation written in English. In order to reach our goal, we have performed a case study at IFS AB. We specify a number of research questions that help satisfy some of the goals of IFS and, when generalized, fulfill the goal of this thesis. A major result of our case study is a list of sixteen controlled language rules. Examples of these rules include control of the maximum allowed number of words in a sentence, and control of when the author is allowed to use past participles. We have based our controlled language rules on existing controlled languages, style guides, research reports, and the opinions of technical writers at IFS. When we applied these rules to different user documentation texts at IFS, we managed to increase the readability score for each of the texts. Also, during an assessment test of readability and translatability, the rewritten versions were chosen in 85% of the cases by experienced technical writers at IFS. Another result of our case study is a prototype application, which shows that it is possible to develop a software checker that helps authors write documentation according to our suggested controlled language rules.
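As a rough illustration of how rules of this kind can be checked automatically, the sketch below flags overlong sentences and possible passive constructions. The 25-word limit, the regular expression, and the choice of rules are assumptions for illustration only; the thesis's sixteen rules and their exact thresholds are not reproduced here.

```python
import re

# Hypothetical threshold for illustration; not the thesis's actual limit.
MAX_WORDS_PER_SENTENCE = 25

def check_sentence_length(sentence: str) -> list[str]:
    """Flag sentences that exceed the allowed word count."""
    words = sentence.split()
    if len(words) > MAX_WORDS_PER_SENTENCE:
        return [f"Sentence has {len(words)} words (max {MAX_WORDS_PER_SENTENCE})."]
    return []

def check_past_participles(sentence: str) -> list[str]:
    """Naive check: flag likely passives ('is/was/been' + regular -ed form)."""
    pattern = re.compile(r"\b(?:is|are|was|were|been|being)\s+\w+ed\b", re.I)
    return [f"Possible passive: '{m.group(0)}'" for m in pattern.finditer(sentence)]

def check_text(text: str) -> dict[str, list[str]]:
    """Run all rule checks on each sentence of a documentation text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    report = {}
    for s in sentences:
        issues = check_sentence_length(s) + check_past_participles(s)
        if issues:
            report[s] = issues
    return report

if __name__ == "__main__":
    sample = "The file is saved when the button is pressed."
    print(check_text(sample))
```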
112

Exponerade hatkommentarer : En studie av svensk hatkommentarsklassificering

Johansson, Kim January 2016 (has links)
In this thesis, hateful comments on the internet are presented as a societal problem that we ought to do something about. The website Exponerat.net is presented as a source of hateful comments. Using the simplifying assumption that the comments found on Exponerat can serve as a good representation of hateful comments on the internet, we construct a classifier. The classifier is evaluated in two steps: one using ten-fold cross-validation and the other manual. The classifier shows acceptable precision/recall values in the first evaluation step but falls short in the manual one. The thesis ends with a discussion of the reasonableness of the simplifying assumption of using a single source. / Hate speech on the internet is a serious issue. This study asks the question: "Is it possible to use machine learning to do something about it?". By using crawled comments from the blog Exponerat.net as a representation of "hate" and comments from the blog Feber.se as "not-hate", we try to construct a classifier. Evaluation is done in two steps: one using 10-fold cross-validation and one using manual evaluation methods. The classifier produces an acceptable result in the first step but falls short in the second. The study ends with a discussion about whether it is even possible to train a classifier using only one source of data.
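A minimal sketch of the setup described above, using scikit-learn. The placeholder data stands in for the crawled comments from Exponerat.net and Feber.se, and TF-IDF with logistic regression is an assumed model choice for illustration, not necessarily the one used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for the crawled comments (not reproduced here):
# label 1 = comment from the "hate" source, 0 = comment from the "not-hate" source.
comments = [f"hatkommentar nummer {i}" for i in range(20)] + \
           [f"teknikkommentar nummer {i}" for i in range(20)]
labels = [1] * 20 + [0] * 20

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Step 1: ten-fold cross-validation on the automatically labeled data.
scores = cross_val_score(model, comments, labels, cv=10, scoring="f1")
print("mean 10-fold F1:", scores.mean())

# Step 2 in the study is manual: the fitted model is applied to unseen
# comments and its predictions are inspected by hand.
model.fit(comments, labels)
print(model.predict(["en ny kommentar att granska"]))
```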
113

Using Self-Organizing Maps as a Method for Creating Semantic Representations from Text / Användning av Self Organizing Maps som en metod att skapa semantiska representationer ur text

Fallgren, Per January 2015 (has links)
This study is a cognitive science thesis project that aims to create a model which builds semantic representations using a more biologically plausible approach than traditional methods. The model can be seen as a first step in investigating this approach. The study examines whether Self-Organizing Maps can be used to create semantic representations from large amounts of text, using a distributionally inspired approach. The results indicate a potentially working system, but one that needs further investigation in future studies for stronger verification.
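A minimal NumPy sketch of a Self-Organizing Map of the kind the thesis investigates, intended to be trained on word co-occurrence vectors. The map size, learning schedule, and the random toy data are illustrative assumptions, not the thesis's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0):
    """Fit a 2-D SOM; each map node ends up with a prototype vector."""
    n, dim = data.shape
    weights = rng.random((rows, cols, dim))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    steps = epochs * n
    for t in range(steps):
        x = data[rng.integers(n)]
        frac = t / steps
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 1e-3
        # Best-matching unit: the node whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(dists.argmin(), dists.shape)
        # Gaussian neighbourhood pulls nodes near the BMU toward x as well.
        grid_d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-grid_d2 / (2 * sigma ** 2))[..., None]
        weights += lr * h * (x - weights)
    return weights

def map_word(weights, vec):
    """Return the grid position a word's co-occurrence vector maps to."""
    d = np.linalg.norm(weights - vec, axis=-1)
    return np.unravel_index(d.argmin(), d.shape)

# Toy usage: 50 random "co-occurrence vectors" of dimension 30. Semantically
# similar words should end up on nearby map nodes after training.
data = rng.random((50, 30))
som = train_som(data)
print(map_word(som, data[0]))
```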
114

An eye-tracking study on synonym replacement / En ögonrörelsestudie på synonymutbyte

Svensson, Cassandra January 2015 (has links)
As the amount of information increases, the need for automatic text simplification also increases. There are some strategies for doing that, and this thesis has studied two basic synonym replacement strategies. The first one is called word length and is about always choosing a shorter synonym if possible. The second one is called word frequency and is about always choosing a more frequent synonym if possible. Three different versions of them were tried. The first was about just choosing the shortest or most frequent synonym. The second was about only choosing a synonym if it was extremely shorter or more frequent. The last was about only choosing a synonym if it met the requirements for being replaced and was on synonym level 5. Statistical analysis of the data revealed no significant difference. But small trends showed that always choosing a more frequent synonym that is of level 5 seemed to make the text a bit easier.
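A sketch of the two baseline strategies under stated assumptions: the toy synonym lexicon and frequency counts below are invented for illustration, and the thesis's "synonym level 5" grading is not modelled.

```python
# Hypothetical lexicon and corpus frequencies, for illustration only.
SYNONYMS = {"purchase": ["buy", "acquire"], "attempt": ["try", "endeavour"]}
FREQUENCY = {"purchase": 50, "buy": 400, "acquire": 30,
             "attempt": 80, "try": 600, "endeavour": 5}

def replace_by_length(word: str) -> str:
    """Word-length strategy: always pick the shortest synonym available."""
    candidates = [word] + SYNONYMS.get(word, [])
    return min(candidates, key=len)

def replace_by_frequency(word: str) -> str:
    """Word-frequency strategy: always pick the most frequent synonym."""
    candidates = [word] + SYNONYMS.get(word, [])
    return max(candidates, key=lambda w: FREQUENCY.get(w, 0))

def simplify(text: str, strategy) -> str:
    """Apply a replacement strategy to every word of a text."""
    return " ".join(strategy(w) for w in text.split())

print(simplify("attempt to purchase", replace_by_frequency))
# -> "try to buy"
```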
115

Cohesion and Comprehensibility in Swedish-English Machine Translated Texts

Askarieh, Sona January 2014 (has links)
Access to various texts in different languages creates an increasing demand for fast, multi-purpose, and cheap translators. Pervasive internet use intensifies the need for intelligent and cheap translators, since traditional translation methods are far too slow to translate all the texts involved. Over the years, scientists have carried out much research in order to add human and artificial intelligence to older machine translation systems; the idea of developing a machine translation system came into existence in the days of World War II (Koehn, 2010). These systems help human translators and the many other people who need to translate different types of texts according to their needs. Since machine translation systems vary in the quality of their output, their performance should be evaluated from a linguistic point of view in order to reach a fair judgment. To achieve this goal, two different Swedish texts were translated by two different machine translation systems in this thesis. The translated texts were evaluated to examine the extent to which errors affect the comprehensibility of the translations. The performance of the systems was evaluated using three approaches. Firstly, the most common linguistic errors appearing in the machine translation systems' output were analyzed (e.g. word alignment of the translated texts). Secondly, the influence of different types of errors on cohesion chains was evaluated. Finally, the effect of the errors on the comprehensibility of the translations was investigated. Numerical results showed that some types of errors affect the comprehensibility of the systems' output more than others. The obtained data illustrated that the subjects' comprehension of the translated texts depends on the type of error, but not on its frequency. The analysis also showed which translation system performed best.
116

Keeping an Eye on the Context : An Eye Tracking Study of Cohesion Errors in Automatic Text Summarization / Med ett öga på sammanhanget : En ögonrörelsestudie av kohesionsfel i automatiska textsammanfattningar

Rennes, Evelina January 2013 (has links)
Automatic text summarization is a growing field due to the modern world's Internet-based society, but automatically creating perfect summaries is not easy, and cohesion errors are common. Using an eye tracking camera, this thesis studies the nature of four different types of cohesion errors occurring in summaries. A total of 23 participants read and rated four different texts and marked the most difficult areas of each text. Statistical analysis of the data revealed that absent cohesion or context and broken anaphoric reference (pronouns) caused some disturbance in reading, but that the impact is restricted to the effort of reading rather than the comprehension of the text. Erroneous anaphoric reference (pronouns) was not detected by the participants, which poses a problem for automatic text summarizers. Other potentially disturbing factors were also detected. Finally, the question was raised whether it is meaningful to keep absent cohesion or context as a separate error type.
117

Detecting Rhetorical Figures Based on Repetition of Words: Chiasmus, Epanaphora, Epiphora

Dubremetz, Marie January 2017 (has links)
This thesis deals with the detection of three rhetorical figures based on repetition of words: chiasmus (“Fair is foul, and foul is fair.”), epanaphora (“Poor old European Commission! Poor old European Council.”) and epiphora (“This house is mine. This car is mine. You are mine.”). For a computer, locating all repetitions of words is trivial, but locating just those repetitions that achieve a rhetorical effect is not. How can we make this distinction automatically?  First, we propose a new definition of the problem. We observe that rhetorical figures are a graded phenomenon, with universally accepted prototypical cases, equally clear non-cases, and a broad range of borderline cases in between. This makes it natural to view the problem as a ranking task rather than a binary detection task. We therefore design a model for ranking candidate repetitions in terms of decreasing likelihood of having a rhetorical effect, which allows potential users to decide for themselves where to draw the line with respect to borderline cases.  Second, we address the problem of collecting annotated data to train the ranking model. Thanks to a selective method of annotation, we can reduce by three orders of magnitude the annotation work for chiasmus, and by one order of magnitude the work for epanaphora and epiphora. In this way, we prove that it is feasible to develop a system for detecting the three figures without an unsurmountable amount of human work.  Finally, we propose an evaluation scheme and apply it to our models. The evaluation reveals that, even with a very incompletely annotated corpus, a system for repetitive figure detection can be trained to achieve reasonable accuracy. We investigate the impact of different linguistic features, including length, n-grams, part-of-speech tags, and syntactic roles, and find that different features are useful for different figures. We also apply the system to four different types of text: political discourse, fiction, titles of articles and novels, and quotations. Here the evaluation shows that the system is robust to shifts in genre and that the frequencies of the three rhetorical figures vary with genre. / This thesis deals with three rhetorical figures based on repetition of words: chiasmus (“Om inte Muhammed kan komma till berget får berget komma till Muhammed.”), epanaphora (“Det är inte rimligt. Det är inte hållbart. Det är inte rättvist.”), and epiphora (“Den här stugan är min. Den här bilen är min. Du är min.”). A computer can easily identify repetitions of words in a text, but picking out only those repetitions that have a rhetorical effect is harder. How can we get computers to do this? First, we propose a new definition of the problem. We note that rhetorical figures are a gradable phenomenon, with prototypical cases on one side and clear non-cases on the other; in between lies a broad spectrum of borderline cases. This makes it natural to view the problem as a ranking task rather than binary classification. We therefore create a model that ranks repetitions by the likelihood that they have a rhetorical effect, allowing the system's users to decide for themselves how borderline cases should be handled. Second, we try to avoid potential difficulties in collecting annotated data for training the ranking model. By using a selective method, we can reduce the amount of annotation work a thousandfold for chiasmus and tenfold for epanaphora and epiphora. It is thus possible to develop a system for identifying these rhetorical figures without a large amount of manual annotation. Finally, we propose an evaluation method and apply it to our models. The evaluation shows that even with a corpus in which few examples are annotated, we can train a system for identifying repetitive figures with acceptable results. We examine the effect of various features based on, for example, length, n-grams, part-of-speech tags, and syntactic roles. One conclusion is that different features are useful to different degrees for different figures. We also test the system on further text types: political discourse, fiction, titles of articles and novels, and quotations. The evaluation shows that the system is robust to differences in genre. We also see that the frequency of the figures varies across genres.
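A sketch of what candidate extraction and ranking for chiasmus might look like: all inverse word repetitions (A…B…B…A) are enumerated, then scored so that likelier rhetorical cases come first. The features and weights in the toy scoring function are assumptions for illustration, not the trained model from the thesis.

```python
from itertools import combinations

def chiasmus_candidates(tokens, max_gap=15):
    """Yield positions (i, j, k, l) with tokens[i]==tokens[l], tokens[j]==tokens[k]."""
    positions = {}
    for idx, tok in enumerate(tokens):
        positions.setdefault(tok.lower(), []).append(idx)
    repeated = {t: p for t, p in positions.items() if len(p) >= 2}
    for (a, pa), (b, pb) in combinations(repeated.items(), 2):
        for i in pa:
            for l in pa:
                for j in pb:
                    for k in pb:
                        if i < j < k < l and l - i <= max_gap:
                            yield (i, j, k, l)

def score(tokens, cand):
    """Toy linear ranker: short spans and content-word repetitions rank higher."""
    i, j, k, l = cand
    stopwords = {"is", "and", "the", "a", "of"}
    span_penalty = -0.1 * (l - i)
    content_bonus = sum(1.0 for p in (i, j) if tokens[p].lower() not in stopwords)
    return span_penalty + content_bonus

tokens = "fair is foul and foul is fair".split()
ranked = sorted(chiasmus_candidates(tokens),
                key=lambda c: score(tokens, c), reverse=True)
print(ranked[:3])   # the fair...foul...foul...fair pattern ranks first
```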
118

Finding Synonyms in Medical Texts : Creating a system for automatic synonym extraction from medical texts

Cederblad, Gustav January 2018 (has links)
This thesis describes the work of creating an automatic system for identifying synonyms and semantically related words in medical texts. Prior to this work, as part of the project E-care@home, medical texts had been classified as either lay or specialized by both a lay annotator and an expert annotator. The lay annotator, in this case, is a person without any medical knowledge, whereas the expert annotator has professional knowledge in medicine. Using these texts made it possible to create co-occurrence matrices from which related words could be identified. Fifteen medical terms were chosen as system input. The Dice similarity of these words, in a context window of ten words around them, was calculated. As output, five candidate related terms for each medical term were returned. Only unigrams were considered. The candidate related terms were evaluated using a questionnaire, in which 223 healthcare professionals rated the similarity on a scale from one to five. A Fleiss kappa test showed that the agreement among these raters was 0.28, which counts as fair agreement. The evaluation further showed that there was a significant correlation between the human ratings and the relatedness score (Dice similarity); that is, words with a higher Dice similarity tended to get a higher human rating. However, the Dice similarity interval in which the words got the highest average human rating was 0.35-0.39. This result means that there is much room for improving the system. Further development of the system should remove the unigram limitation and expand the corpus to provide more accurate and reliable results.
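A sketch of the approach described above: co-occurrence counts within a ten-word window and a Dice score between a target term and candidate terms. The toy corpus and whitespace tokenization are illustrative assumptions.

```python
from collections import Counter

def make_dice(tokens, window=10):
    """Build frequency and windowed co-occurrence counts; return a Dice scorer."""
    freq = Counter(tokens)
    cooc = Counter()
    for i, w in enumerate(tokens):
        for c in tokens[i + 1: i + 1 + window]:   # pairs within the window
            cooc[frozenset((w, c))] += 1
    def dice(a, b):
        # Dice similarity: 2 * cooc(a, b) / (freq(a) + freq(b))
        denom = freq[a] + freq[b]
        return 2 * cooc[frozenset((a, b))] / denom if denom else 0.0
    return dice

corpus = ("the patient reports chronic pain in the knee and "
          "persistent pain in the shoulder").split()
dice = make_dice(corpus)
# Candidate related terms for a target can then be ranked by this score:
print(dice("chronic", "persistent"))
```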
119

Better cooperation through communication in multi-agent reinforcement learning

Kiseliou, Ivan January 2020 (has links)
Cooperative needs play a critical role in the organisation of natural communication systems. A number of recent studies in multi-agent reinforcement learning have established that artificial intelligence agents are similarly able to develop functional communication when required to complete a cooperative task. This thesis studies the emergence of communication in reinforcement learning agents, using a custom card game environment as a test-bed. Two contrasting approaches, encompassing continuous and discrete modes of communication, were appraised experimentally. Based on the average game completion rate, the agents provisioned with a continuous communication channel consistently exceeded the no-communication baseline. A qualitative analysis of the agents' behavioural strategies reveals a clearly defined communication protocol as well as the deployment of playing tactics unseen in the baseline agents. The agents equipped with the discrete channel, on the other hand, failed to learn to utilise it effectively, ultimately showing no improvement over the baseline.
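A minimal sketch of the continuous-channel architecture described above: a speaker agent maps its observation to a real-valued message, which is concatenated to the listener's observation before action selection. The network sizes and random weights are illustrative assumptions; the thesis's card game environment and RL training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS, MSG, ACTIONS = 8, 4, 5
W_msg = rng.normal(size=(OBS, MSG))            # speaker: observation -> message
W_act = rng.normal(size=(OBS + MSG, ACTIONS))  # listener: obs + message -> logits

def speak(obs):
    """Produce a continuous (and, in training, differentiable) message."""
    return np.tanh(obs @ W_msg)

def act(obs, msg):
    """Pick an action conditioned on the listener's obs and the received message."""
    logits = np.concatenate([obs, msg]) @ W_act
    return int(np.argmax(logits))

obs_speaker, obs_listener = rng.normal(size=OBS), rng.normal(size=OBS)
message = speak(obs_speaker)
print("listener action:", act(obs_listener, message))
```

One plausible reading of the thesis's result is visible in this structure: a continuous message keeps the channel differentiable end to end, whereas a discrete channel requires sampling and thus a harder credit-assignment problem.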
120

Studies of Cipher Keys from the 16th Century : Transcription, Systematisation and Analysis

Tudor, Crina January 2019 (has links)
In historical cryptography, a cipher key represents a set of rules by which we can convert between plaintext and ciphertext within an encryption system. Presently, there are not many studies that focus on analysing keys, especially not on a large scale or in a systematic manner. In this paper, we describe a uniform transcription standard for the keys in the DECODE database. In this way, we intend to lay a strong foundation for further studies on large sets of key transcriptions. We believe that a homogeneous set of transcriptions is an ideal starting point for comparative studies, especially from a chronological perspective, as this can reveal potential patterns in the evolution of encryption methods. We also build a script that can perform an in-depth analysis of the components of a key, using our standardized transcription files as input. Finally, we give a detailed account of our findings and show that our method can reliably extract valuable information from a transcription file, such as the method of encryption or the types of symbols used for encoding, without the need for additional manual analysis of the original key.
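A sketch of the kind of per-key analysis such a script might perform. The transcription format assumed here (one "plaintext TAB code" pair per line) is a hypothetical stand-in; the actual DECODE transcription standard is the one defined in the thesis.

```python
from collections import Counter

def analyse_key(lines):
    """Summarise a key transcription: symbol types used and codes per plaintext unit."""
    pairs = [line.split("\t", 1) for line in lines
             if line.strip() and not line.startswith("#")]
    codes = [code for _, code in pairs]
    symbol_types = Counter(
        "digit" if c.isdigit() else "alpha" if c.isalpha() else "graphic"
        for c in codes
    )
    plaintexts = {p for p, _ in pairs}
    return {
        "entries": len(pairs),
        "symbol_types": dict(symbol_types),
        # > 1.0 suggests homophonic substitution (several codes per letter).
        "avg_codes_per_plaintext": round(len(codes) / max(len(plaintexts), 1), 2),
    }

sample = ["# key from 1563 (hypothetical)", "a\t12", "a\t47", "b\t9", "et\t+"]
print(analyse_key(sample))
```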
