441

The Swedish translation of concessive conjuncts in Dan Brown’s Angels and Demons

Poltan, Andreas January 2007 (has links)
The purpose of this study is to present and analyze the translation of seven selected concessive conjuncts – anyway, however, although, though, still, nonetheless and yet – in Dan Brown’s novel Angels and Demons, translated by Ola Klingberg, by means of a comparative method combined with a qualitative analysis. The background and theory draw mainly on Altenberg (1999, 2002) for the conjuncts and on Ingo (1991) for translation strategies. The aim is fulfilled by answering three research questions: 1. How does Klingberg translate the seven selected concessive conjuncts into Swedish? 2. What factors influence the choice of translation alternative? 3. What kinds of strategies does Klingberg use? The main result is that the conjuncts are rendered by many different alternatives, most frequently by the Swedish adversative men, followed by Swedish concessives such as ändå. However, the analysis of anyway is inconclusive because there were too few tokens. The main conclusion is that translation is a difficult undertaking, since numerous factors affect the choice of translation alternative, even though it is shown that it is definitely possible to translate more or less ‘correctly’. A second conclusion is that some words are more likely than others to be translated with one particular word.
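A comparative study of this kind comes down to tallying, for each English conjunct, which Swedish alternatives render it in aligned sentence pairs. The following minimal sketch illustrates that bookkeeping only; the aligned examples and the list of candidate Swedish renderings are invented for illustration and are not data from the thesis.

from collections import Counter, defaultdict

# Hypothetical aligned (English, Swedish) sentence pairs; a real study would use
# sentences from the source novel and Klingberg's translation.
aligned_pairs = [
    ("He was tired; still, he kept walking.", "Han var trött; ändå fortsatte han att gå."),
    ("However, the door was locked.", "Men dörren var låst."),
    ("Though it was late, she called.", "Fast det var sent ringde hon."),
]

conjuncts = ["anyway", "however", "although", "though", "still", "nonetheless", "yet"]
candidates = ["men", "ändå", "fast", "trots det", "dock"]  # illustrative Swedish renderings

# Tally which candidate renderings co-occur with each conjunct in the aligned Swedish sentence.
tallies = defaultdict(Counter)
for en, sv in aligned_pairs:
    en_l, sv_l = en.lower(), sv.lower()
    for c in conjuncts:
        if c in en_l:
            for s in candidates:
                if s in sv_l:
                    tallies[c][s] += 1

for c, counts in tallies.items():
    print(c, counts.most_common())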
442

Berättande drag i argumenterande elevtexter / Narrative features in argumentative student texts

Fjellander, Johanna January 2012 (has links)
This essay deals with upper secondary students’ adaptation to different text types in writing. Its research questions are: 1) Which narrative features are found in upper secondary students’ argumentative texts? 2) Is there a connection between students’ awareness of the argumentative text type and the grade they receive on the assignment? The hypothesis put forward is that more narrative features occur in texts with lower grades, since the criteria for higher grades require an awareness of different text types. The study was carried out on opinion pieces written by 20 upper secondary students for task B1 of the national course test in Swedish B in the spring term of 2012, Dit vinden blåser (Skolverket 2012b). Based on the presence or absence of four narrative features, selected by the author of the essay, the student texts are classified according to how well they fulfil the argumentative text type. The criteria concern choice of pronouns, introduction, personal examples and choice of words. The results show that narrative features occur in student texts at all grade levels. The narrative features are most frequent in the texts that did not reach a passing grade; there are as many narrative features in the G texts as in the VG texts, and the lowest number of narrative features is found in the MVG texts. The hypothesis is thereby verified.
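The classification step described above – checking each text for the presence or absence of a fixed set of narrative features – can be pictured as a simple feature checklist. The sketch below is a hypothetical illustration with made-up indicator patterns; it is not the coding scheme used in the essay.

# Minimal sketch of a presence/absence checklist for narrative features.
# The indicator patterns below are invented for illustration only.
import re

FEATURES = {
    "first_person_pronouns": r"\b(jag|vi)\b",
    "narrative_opening": r"^(en gång|det var en)",
    "personal_example": r"\b(min kompis|min mamma|när jag var)\b",
    "informal_wording": r"\b(typ|liksom)\b",
}

def narrative_feature_count(text: str) -> int:
    """Count how many of the four narrative features appear in the text."""
    text = text.lower()
    return sum(bool(re.search(pattern, text)) for pattern in FEATURES.values())

sample = "En gång när jag var liten åkte jag och min kompis buss. Typ varje dag."
print(narrative_feature_count(sample))  # number of narrative features present (0-4)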
443

Úloha a použití řečových aktů v dialozích románu Pýcha a předsudek Jane Austenové / Role and Use of Speech Acts in the Dialogues of the Novel Pride and Prejudice by Jane Austen

Pellar, Jan January 2017 (has links)
This work from the field of pragmatics applies the concept of speech acts (see J. L. Austin, J. Searle) to a literary sample of 15 selected dialogues, i.e. 1122 sentences, from the novel Pride and Prejudice by the classic English author Jane Austen. It introduces an eight-member modified classification of speech acts: representatives, assertives, directives, commissives, expressives, interrogatives, requestives and declarations. Eight literary characters are included in the research, together with Charlotte Lucas in a marginal role, all of whom use speech acts to express their communicative intentions. The main heroine, Elizabeth, who occurs in 12 dialogues, uses mostly representatives, assertives and expressives. The remaining three dialogues involve Mrs Bennet and her husband Mr Bennet. Jane Austen's language is very rich and complex, with frequent occurrence of polite turns of phrase. Mixed and multiple categories also add to this complexity: of the 969 sentences counted, 55.8% are simple, 39.1% double, 4.6% triple and only 0.5% quadruple. The work also contains comments on stylistic analysis, featuring selected interesting literary and pragmatic aspects of the dialogue samples.
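Tallying speech-act labels per character and per sentence, including sentences that carry more than one label, is the kind of bookkeeping such a study relies on. A minimal sketch follows; the annotated examples are invented for illustration and do not come from the thesis.

from collections import Counter

# Each annotated sentence: (character, list of speech-act labels).
# A sentence with more than one label counts as a mixed (double/triple/...) category.
annotations = [
    ("Elizabeth", ["representative"]),
    ("Elizabeth", ["assertive", "expressive"]),
    ("Mrs Bennet", ["directive"]),
    ("Mr Bennet", ["interrogative"]),
]

per_character = Counter()
multiplicity = Counter()
for character, labels in annotations:
    per_character.update((character, label) for label in labels)
    multiplicity[len(labels)] += 1

total = sum(multiplicity.values())
for n, count in sorted(multiplicity.items()):
    print(f"{n}-label sentences: {100 * count / total:.1f}%")
print(per_character.most_common())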
444

Unsupervised discovery of relations for analysis of textual data in digital forensics

Louis, Anita Lily 23 August 2010 (has links)
This dissertation addresses the problem of analysing digital data in digital forensics. It will be shown that text mining methods can be adapted and applied to digital forensics to aid analysts to more quickly, efficiently and accurately analyse data to reveal truly useful information. Investigators who wish to utilise digital evidence must examine and organise the data to piece together events and facts of a crime. The difficulty with finding relevant information quickly using the current tools and methods is that these tools rely very heavily on background knowledge for query terms and do not fully utilise the content of the data. A novel framework in which to perform evidence discovery is proposed in order to reduce the quantity of data to be analysed, aid the analysts' exploration of the data and enhance the intelligibility of the presentation of the data. The framework combines information extraction techniques with visual exploration techniques to provide a novel approach to performing evidence discovery, in the form of an evidence discovery system. By utilising unrestricted, unsupervised information extraction techniques, the investigator does not require input queries or keywords for searching, thus enabling the investigator to analyse portions of the data that may not have been identified by keyword searches. The evidence discovery system produces text graphs of the most important concepts and associations extracted from the full text to establish ties between the concepts and provide an overview and general representation of the text. Through an interactive visual interface the investigator can explore the data to identify suspects, events and the relations between suspects. Two models are proposed for performing the relation extraction process of the evidence discovery framework. The first model takes a statistical approach to discovering relations based on co-occurrences of complex concepts. The second model utilises a linguistic approach using named entity extraction and information extraction patterns. A preliminary study was performed to assess the usefulness of a text mining approach to digital forensics as against the traditional information retrieval approach. It was concluded that the novel approach to text analysis for evidence discovery presented in this dissertation is a viable and promising approach. The preliminary experiment showed that the results obtained from the evidence discovery system, using either of the relation extraction models, are sensible and useful. The approach advocated in this dissertation can therefore be successfully applied to the analysis of textual data for digital forensics. / Dissertation (MSc)--University of Pretoria, 2010. / Computer Science / unrestricted
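The first relation-extraction model described above scores relations by how often complex concepts co-occur. A common way to turn raw co-occurrence counts into association strength is pointwise mutual information (PMI); the sketch below illustrates that general idea on toy data and is not the system built in the dissertation.

import math
from collections import Counter
from itertools import combinations

# Toy "documents": each is the set of concepts extracted from one text fragment.
docs = [
    {"suspect_a", "warehouse", "meeting"},
    {"suspect_a", "suspect_b", "meeting"},
    {"suspect_b", "payment", "warehouse"},
    {"suspect_a", "payment"},
]

concept_freq = Counter()
pair_freq = Counter()
for concepts in docs:
    concept_freq.update(concepts)
    pair_freq.update(frozenset(p) for p in combinations(sorted(concepts), 2))

n = len(docs)

def pmi(pair):
    """Pointwise mutual information of a concept pair, estimated from document counts."""
    a, b = tuple(pair)
    p_ab = pair_freq[pair] / n
    p_a, p_b = concept_freq[a] / n, concept_freq[b] / n
    return math.log2(p_ab / (p_a * p_b))

ranked = sorted(pair_freq, key=pmi, reverse=True)
for pair in ranked[:5]:
    print(sorted(pair), round(pmi(pair), 2))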
445

Information extraction from chemical patents

Jessop, David M. January 2011 (has links)
The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye - an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML) - is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye - 4444 reactions are extracted with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed, and an accuracy of 92% with regard to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication quality before they are presented to the community.
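Hearst patterns find hyponym candidates from lexical templates such as "X such as Y and Z". The regular-expression sketch below shows the general technique on an invented sentence; it is not the pattern set or implementation used in the thesis.

import re

# One classic Hearst pattern: "<hypernym> such as <hyponym>(, <hyponym>)* (and|or) <hyponym>"
PATTERN = re.compile(
    r"(?P<hypernym>\w+(?: \w+)?) such as (?P<hyponyms>\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)",
    re.IGNORECASE,
)

def hearst_relations(sentence):
    """Return (hyponym, hypernym) candidates matched by the pattern."""
    relations = []
    for m in PATTERN.finditer(sentence):
        hypernym = m.group("hypernym")
        hyponyms = re.split(r",\s*|\s+(?:and|or)\s+", m.group("hyponyms"))
        relations.extend((h.strip(), hypernym) for h in hyponyms if h.strip())
    return relations

text = "The mixture was washed with polar solvents such as methanol, ethanol and water."
print(hearst_relations(text))
# e.g. [('methanol', 'polar solvents'), ('ethanol', 'polar solvents'), ('water', 'polar solvents')]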
446

Extraction of chemical structures and reactions from the literature

Lowe, Daniel Mark January 2012 (has links)
The ever increasing quantity of chemical literature necessitates the creation of automated techniques for extracting relevant information. This work focuses on two aspects: the conversion of chemical names to computer readable structure representations and the extraction of chemical reactions from text. Chemical names are a common way of communicating chemical structure information. OPSIN (Open Parser for Systematic IUPAC Nomenclature), an open source, freely available algorithm for converting chemical names to structures, was developed. OPSIN employs a regular grammar to direct tokenisation and parsing, leading to the generation of an XML parse tree. Nomenclature operations are applied successively to the tree, with many requiring the manipulation of an in-memory connection table representation of the structure under construction. Areas of nomenclature supported are described, with attention being drawn to difficulties that may be encountered in name to structure conversion. Results on sets of generated names and names extracted from patents are presented. On generated names, recall of between 96.2% and 99.0% was achieved with a lower bound of 97.9% on precision, with all results either comparable or superior to the tested commercial solutions. On the patent names, OPSIN's recall was 2-10% higher than that of the tested solutions when the patent names were processed as found in the patents. The uses of OPSIN as a web service and as a tool for identifying chemical names in text are shown to demonstrate the direct utility of this algorithm. A software system for extracting chemical reactions from the text of chemical patents was developed. The system relies on the output of ChemicalTagger, a tool for tagging words and identifying phrases of importance in experimental chemistry text. Improvements to this tool required to facilitate this task are documented. The structures of chemical entities are, where possible, determined using OPSIN in conjunction with a dictionary of name to structure relationships. Extracted reactions are atom mapped to confirm that they are chemically consistent. 424,621 atom mapped reactions were extracted from 65,034 organic chemistry USPTO patents. On a sample of 100 of these extracted reactions, chemical entities were identified with 96.4% recall and 88.9% precision. Quantities could be associated with reagents in 98.8% of cases and with products in 64.9% of cases, whilst the correct role was assigned to chemical entities in 91.8% of cases. Qualitatively, the system captured the essence of the reaction in 95% of cases. This system is expected to be useful in the creation of searchable databases of reactions from chemical patents and in facilitating analysis of the properties of large populations of reactions.
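Name-to-structure conversion of the kind OPSIN performs begins by tokenising a systematic name into known morphemes before any structure is assembled. The greedy longest-match sketch below illustrates only that first step, against a tiny invented morpheme lexicon; it is not OPSIN's grammar or tokeniser.

# Minimal greedy longest-match tokeniser over an invented morpheme lexicon.
# Real systematic-nomenclature tokenisation is grammar-directed and far richer.
LEXICON = {
    "chloro", "bromo", "methyl", "ethyl", "propyl", "benz", "ene", "an", "ol",
    "di", "tri", "-", ",", "1", "2",
}

def tokenise(name: str):
    tokens, i = [], 0
    name = name.lower()
    while i < len(name):
        for j in range(len(name), i, -1):          # try the longest candidate first
            if name[i:j] in LEXICON:
                tokens.append(name[i:j])
                i = j
                break
        else:
            raise ValueError(f"untokenisable fragment at position {i}: {name[i:]!r}")
    return tokens

print(tokenise("1,2-dichlorobenzene"))  # ['1', ',', '2', '-', 'di', 'chloro', 'benz', 'ene']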
447

An Unsupervised Approach to Detecting and Correcting Errors in Text

Islam, Md Aminul January 2011 (has links)
In practice, most approaches to text error detection and correction are based on a conventional domain-dependent background dictionary that represents a fixed and static collection of correct words of a given language; as a result, satisfactory correction can only be achieved if the dictionary covers most tokens of the underlying correct text. Moreover, most approaches to text correction handle only one, or at best a very few, types of errors. The purpose of this thesis is to propose an unsupervised approach to detecting and correcting text errors that can compete with supervised approaches, and to answer the following questions: Can an unsupervised approach efficiently detect and correct a text containing multiple errors of both syntactic and semantic nature? What is the magnitude of error coverage, in terms of the number of errors that can be corrected? We conclude that (1) it is possible for an unsupervised approach to efficiently detect and correct a text containing multiple errors of both syntactic and semantic nature. Error types include: real-word spelling errors, typographical errors, lexical choice errors, unwanted words, missing words, prepositional errors, article errors, punctuation errors, and many of the grammatical errors (e.g., errors in agreement and verb formation). (2) The magnitude of error coverage, in terms of the number of errors that can be corrected, is almost double the number of correct words of the text. Although this is not the upper limit, it is what is practically feasible. We use engineering approaches to answer the first question and theoretical approaches to answer and support the second. We show that finding inherent properties of a correct text using a corpus in the form of an n-gram data set is more appropriate and practical than other approaches to detecting and correcting errors. Instead of using rule-based approaches and dictionaries, we argue that a corpus can effectively be used to infer the properties of these types of errors, and to detect and correct them. We test the robustness of the proposed approach separately for some individual error types, and then for all types of errors. The approach is language-independent; it can be applied to other languages as long as n-gram data are available. The results of this thesis thus suggest that unsupervised approaches, which are often dismissed in favor of supervised ones in the context of many Natural Language Processing (NLP) related tasks, may present an interesting array of NLP-related problem solving strengths.
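Correcting a real-word error with n-gram statistics amounts to asking which candidate replacement gives the surrounding n-grams the most corpus support. The trigram-scoring sketch below illustrates that idea with an invented frequency table; the thesis works with a large web-scale n-gram data set rather than a toy dictionary, and its scoring is considerably more sophisticated.

# Toy trigram frequency table standing in for a large n-gram data set.
TRIGRAM_FREQ = {
    ("piece", "of", "cake"): 9000,
    ("peace", "of", "cake"): 3,
    ("a", "piece", "of"): 12000,
    ("a", "peace", "of"): 40,
}

def trigram_score(tokens, i, candidate):
    """Sum the frequencies of all trigrams that cover position i with the candidate word."""
    t = tokens[:i] + [candidate] + tokens[i + 1:]
    score = 0
    for start in range(max(0, i - 2), min(i, len(t) - 3) + 1):
        score += TRIGRAM_FREQ.get(tuple(t[start:start + 3]), 0)
    return score

def correct_word(tokens, i, candidates):
    """Pick the candidate (including the original word) with the best n-gram support."""
    options = [tokens[i]] + candidates
    return max(options, key=lambda w: trigram_score(tokens, i, w))

sentence = ["a", "peace", "of", "cake"]
print(correct_word(sentence, 1, ["piece"]))  # -> 'piece'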
448

The Business Value of Text Mining

Stolt, Richard January 2017 (has links)
Text mining is an enabling technology that will change how businesses derive insights & knowledge from the textual data available to them. The current literature focuses on text mining algorithms and techniques, whereas the practical aspects of text mining are lacking. This study aims to help companies understand the business value of text mining with the help of a case study. Subsequently, an SMS-survey method was used to identify additional business areas from which text mining could be used to derive business value. A literature review was conducted to conceptualize the business value of text mining, and a concept matrix was established in which each business category is specified together with its related derived insights & knowledge, domain and data source. The concept matrix was then used to decide when information was of business value, in order to show that text mining could be used to derive information of business value. Text mining analyses were conducted on survey feedback data from a traffic school. The results revealed several patterns, with business value derived mainly for the categories of Quality Control & Quality Assurance. After comparing the results of the SMS-survey with the empirical material from the case study, some difficulties emerged in the categorization of derived information, implying that the categories need to become more specific and distinct. Furthermore, the concept matrix does not comprise all of the business categories that are likely to exist.
449

Mining patient journeys from healthcare narratives

Dehghan, Azad January 2015 (has links)
The aim of the thesis is to investigate the feasibility of using text mining methods to reconstruct patient journeys from unstructured clinical narratives. A novel method to extract and represent patient journeys is proposed and evaluated in this thesis. A set of methods was designed, developed and evaluated to this end, including health-related concept extraction, temporal information extraction, and concept clustering and automated workflow generation. A suite of methods to extract clinical information from healthcare narratives was proposed and evaluated in order to enable chronological ordering of clinical concepts. Specifically, we proposed and evaluated a data-driven method to identify key clinical events (i.e., medical problems, treatments, and tests) using a sequence labelling algorithm, CRF, with a combination of lexical and syntactic features, and a rule-based post-processing method including label correction, boundary adjustment and false positive filtering. The method was evaluated as part of the 2012 i2b2 challenge and achieved state-of-the-art performance, with strict and lenient micro F1-measures of 83.45% and 91.13% respectively. A method to extract temporal expressions using a hybrid knowledge-driven (dictionary and rules) and data-driven (CRF) approach was proposed and evaluated. The method demonstrated state-of-the-art performance at the 2012 i2b2 challenge: an F1-measure of 90.48% and an accuracy of 70.44% for identification and normalisation respectively. For temporal ordering of events we proposed and evaluated a knowledge-driven method, with an F1-measure of 62.96% (considering the reduced temporal graph) or 70.22% for extraction of temporal links. The method developed consisted of initial rule-based identification and classification components, which utilised contextual lexico-syntactic cues for inter-sentence links and string similarity for co-reference links, followed by a temporal closure component to calculate transitive relations of the extracted links. In a case study of survivors of childhood central nervous system tumours (medulloblastoma), qualitative evaluation showed that we were able to capture specific trends that form part of patient journeys. An overall quantitative evaluation score (average precision and recall) of 94-100% for individual and 97% for aggregated patient journeys was also achieved, indicating that text mining methods can be used to identify, extract and temporally organise key clinical concepts that make up a patient’s journey. We also presented an analysis of healthcare narratives, specifically exploring the content of clinical and patient narratives by using the methods developed to extract patient journeys. We found that health-related quality of life concepts are more common in patient narratives, while clinical concepts (e.g., medical problems, treatments, tests) are more prevalent in clinical narratives. In addition, while both aggregated sets of narratives contain all investigated concepts, clinical narratives contain, proportionally, more health-related quality of life concepts than clinical concepts found in patient narratives. These results demonstrate that automated concept extraction, in particular of health-related quality of life concepts, as part of standard clinical practice is feasible. The proposed method demonstrates that text mining methods can be efficiently used to identify, extract and temporally organise the key clinical concepts that make up a patient’s journey in a healthcare system.
Automated reconstruction of patient journeys can potentially be of value to clinical practitioners and researchers, to aid large-scale analyses of implemented care pathways, and subsequently to help monitor, compare, develop and adjust clinical guidelines, both in areas of chronic disease where there is plenty of data and in rare conditions where there may be no established guidelines.
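Sequence labelling of clinical events with a CRF, as described above, combines per-token lexical (and, in the thesis, syntactic) features with BIO-style event labels. The sketch below shows the general shape of such a model using the sklearn-crfsuite package as an assumed stand-in; the features, labels and training sentences are invented for illustration and the actual i2b2 system's feature set is far richer.

import sklearn_crfsuite  # assumed dependency: pip install sklearn-crfsuite

def token_features(tokens, i):
    """A few simple lexical features per token; the real system also uses syntactic cues."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Invented mini training set with BIO labels for PROBLEM / TREATMENT / TEST events.
sentences = [
    ("The patient denies chest pain after aspirin .".split(),
     ["O", "O", "O", "B-PROBLEM", "I-PROBLEM", "O", "B-TREATMENT", "O"]),
    ("A chest x-ray showed pneumonia .".split(),
     ["O", "B-TEST", "I-TEST", "O", "B-PROBLEM", "O"]),
]

X = [[token_features(toks, i) for i in range(len(toks))] for toks, _ in sentences]
y = [labels for _, labels in sentences]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)

test = "The MRI showed a small lesion .".split()
print(list(zip(test, crf.predict([[token_features(test, i) for i in range(len(test))]])[0])))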
450

Rozpoznávání rukou psaného textu / Handwriting Recognition

Zouhar, David January 2012 (has links)
This diploma thesis deals with handwriting recognition in real time. It describes how the input data are processed and focuses on the classification methods used for recognition, in particular hidden Markov models. It also presents an evaluation of recognition accuracy based on the implemented experiments. An alternative keyboard for the MeeGo system was also created for this thesis. The resulting system achieved an accuracy above 96%.
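Recognition with hidden Markov models ultimately comes down to scoring an observation sequence against each character model, with the Viterbi algorithm finding the most likely hidden state path. The pure-Python sketch below illustrates Viterbi decoding for a tiny discrete HMM with invented stroke-like states and parameters; real handwriting models use continuous stroke features and one trained model per symbol.

# Viterbi decoding for a small discrete HMM with invented parameters.
states = ["up_stroke", "down_stroke"]
start_p = {"up_stroke": 0.6, "down_stroke": 0.4}
trans_p = {
    "up_stroke": {"up_stroke": 0.7, "down_stroke": 0.3},
    "down_stroke": {"up_stroke": 0.4, "down_stroke": 0.6},
}
emit_p = {
    "up_stroke": {"short": 0.1, "long": 0.9},
    "down_stroke": {"short": 0.8, "long": 0.2},
}

def viterbi(observations):
    # v[state] = (probability of the best path ending in state, that path);
    # raw probabilities (no log-space) are used to keep the sketch short.
    v = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        v = {
            s: max(
                ((prob * trans_p[prev][s] * emit_p[s][obs], path + [s])
                 for prev, (prob, path) in v.items()),
                key=lambda x: x[0],
            )
            for s in states
        }
    return max(v.values(), key=lambda x: x[0])

prob, path = viterbi(["long", "long", "short"])
print(path, prob)  # most likely stroke-state sequence and its probability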
