441 |
Entity-Centric Text Mining for Historical Documents
Coll Ardanuy, Maria (07 July 2017)
No description available.
|
442 |
Jag är min egen lärare : En interventionsstudie om självständig textbearbetning av elever i årskurs 6 / I am my own teacher: An experimental study of independent text revision by pupils in grade 6
Joakim, Heining, Emmely, Heiman (January 2017)
The aim of the study was to investigate how an independent revision of a self-produced text for the national examinations in Swedish for grade 6 changes its quality. An additional aim was to study whether this change in quality led to a change in the grade awarded to the text. In this study the pupils were viewed as independent individuals with an ability to think and act on their own, and therefore a cognitive perspective was applied to the result. The study used the texts produced by the pupils during the national examination, and the accompanying grading matrix was used as a yardstick. After the pupils had revised their text, it was graded again and this grade was compared with the previous one to reveal the change. When the grading had been done, the texts were divided into one of three categories: improved, deteriorated, or unchanged quality, which enabled a quantification of the results of the study. This shows a general improvement in text quality after the revision. Of the 36 participants in the study, 28 improved the quality of their texts, and 6 of these were judged to have earned higher grades. Only one text was deemed to have declined in quality and earned a lower grade. It is relevant for teachers in today’s school to know that a revision should not just be viewed as a final correction but as part of the entire writing process. The study also shows that pupils who revise a text on their own improve its quality. In some cases it may be appropriate for a teacher to give a response in order to allow pupils a better chance to improve the quality of their texts.
|
443 |
The Swedish translation of concessive conjuncts in Dan Brown’s Angels and Demons
Poltan, Andreas (January 2007)
The purpose of this study is to present and analyze the translation of seven selected concessive conjuncts – anyway, however, although, though, still, nonetheless and yet – in Dan Brown’s novel Angels and Demons translated by Ola Klingberg, by means of a comparative method combined with a qualitative analysis. Background and theory are mainly based on Altenberg (1999, 2002) for the conjuncts and Ingo (1991) for translation strategies. The aim is fulfilled by answering the three research questions: 1. How does Klingberg translate the seven selected concessive conjuncts into Swedish? 2. What factors influence the choice of translation alternative? 3. What kinds of strategies does Klingberg use? The main result is that the conjuncts translate into many different alternatives, although most frequently into the Swedish adversative men, followed by a Swedish concessive like ändå. However, the analysis of anyway is inconclusive because there were not enough tokens. The main conclusion is that translation is a difficult area to be involved in since numerous aspects affect the choice of translation alternative, even though it is shown that it is definitely possible to translate more or less ‘correctly’. A second conclusion is that some words are more likely to be translated with a particular word than others.
|
444 |
Berättande drag i argumenterande elevtexter / Narrative features in argumentative pupil texts
Fjellander, Johanna (January 2012)
This essay deals with how upper secondary school pupils adapt to different text types in writing. The research questions are: 1) Which narrative features are found in upper secondary pupils' argumentative texts? 2) Is upper secondary pupils' awareness of the argumentative text type related to the grade they receive on the task? The hypothesis put forward is that more narrative features occur in texts with lower grades, since the goals for higher grades require an awareness of different text types. The study was carried out on debate articles written by 20 upper secondary pupils for task B1 of the national course examination in Svenska B, spring term 2012, Dit vinden blåser (Skolverket 2012b). Based on the presence or absence of four narrative features, selected by the author of the essay, the pupil texts are classified according to how well they fulfil the argumentative text type. The criteria concern pronoun choice, introduction, personal examples and word choice. The results show that narrative features are found in pupil texts at all grade levels. Narrative features are most frequent in the pupil texts that did not achieve a passing result, there are equally many narrative features in the G texts as in the VG texts, and the lowest number of narrative features is found in the MVG texts. The hypothesis is thereby verified.
|
445 |
Úloha a použití řečových aktů v dialozích románu Pýcha a předsudek Jane Austenové / Role and Use of Speech Acts in the Dialogues of the Novel Pride and Prejudice by Jane Austen
Pellar, Jan (January 2017)
This work from the field of pragmatics applies the concept of speech acts (see J. L. Austin, J. Searle) to a literary sample of 15 selected dialogues, i.e. 1122 sentences, from the novel Pride and Prejudice by the classic English author Jane Austen. It introduces a modified eight-member classification of speech acts: representatives, assertives, directives, commissives, expressives, interrogatives, requestives and declarations. Eight literary characters are included in the research, together with, marginally, Charlotte Lucas; they use speech acts to express their communicative intentions. The main heroine, Elizabeth, who occurs in 12 dialogues, uses mostly representatives, assertives and expressives. The remaining three dialogues involve Mrs Bennet and her husband, Mr Bennet. Jane Austen's language is very rich and complex, with frequent polite turns of phrase. Some mixed and multiple categories also add to this complexity (of the 969 sentences counted, 55.8% are simple, 39.1% double, 4.6% triple and only 0.5% quadruple). This work also contains some comments on stylistic analysis featuring selected interesting literary and pragmatic aspects of the dialogue samples.
|
446 |
Unsupervised discovery of relations for analysis of textual data in digital forensics
Louis, Anita Lily (23 August 2010)
This dissertation addresses the problem of analysing digital data in digital forensics. It will be shown that text mining methods can be adapted and applied to digital forensics to aid analysts to more quickly, efficiently and accurately analyse data to reveal truly useful information. Investigators who wish to utilise digital evidence must examine and organise the data to piece together events and facts of a crime. The difficulty with finding relevant information quickly using the current tools and methods is that these tools rely very heavily on background knowledge for query terms and do not fully utilise the content of the data. A novel framework in which to perform evidence discovery is proposed in order to reduce the quantity of data to be analysed, aid the analysts' exploration of the data and enhance the intelligibility of the presentation of the data. The framework combines information extraction techniques with visual exploration techniques to provide a novel approach to performing evidence discovery, in the form of an evidence discovery system. By utilising unrestricted, unsupervised information extraction techniques, the investigator does not require input queries or keywords for searching, thus enabling the investigator to analyse portions of the data that may not have been identified by keyword searches. The evidence discovery system produces text graphs of the most important concepts and associations extracted from the full text to establish ties between the concepts and provide an overview and general representation of the text. Through an interactive visual interface the investigator can explore the data to identify suspects, events and the relations between suspects. Two models are proposed for performing the relation extraction process of the evidence discovery framework. The first model takes a statistical approach to discovering relations based on co-occurrences of complex concepts. 
The second model utilises a linguistic approach using named entity extraction and information extraction patterns. A preliminary study was performed to assess the usefulness of a text mining approach to digital forensics as against the traditional information retrieval approach. It was concluded that the novel approach to text analysis for evidence discovery presented in this dissertation is a viable and promising approach. The preliminary experiment showed that the results obtained from the evidence discovery system, using either of the relation extraction models, are sensible and useful. The approach advocated in this dissertation can therefore be successfully applied to the analysis of textual data for digital forensics.
Dissertation (MSc)--University of Pretoria, 2010. / Computer Science / unrestricted
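The statistical model described in this abstract discovers relations from co-occurrences of concepts. The abstract does not say how co-occurrences are scored, so the following is only a minimal sketch under stated assumptions: concepts are taken as already extracted per sentence, and pointwise mutual information (PMI) is swapped in as one common association score. The function name `cooccurrence_relations` and the `min_count` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def cooccurrence_relations(sentences, min_count=2):
    """Score concept pairs by PMI over sentence-level co-occurrence.

    `sentences` is a list of sets of concept strings (an assumed input
    format; concept extraction itself is outside this sketch).
    """
    n = len(sentences)
    concept_freq = Counter()
    pair_freq = Counter()
    for concepts in sentences:
        concept_freq.update(concepts)
        # Sorting gives every unordered pair a single canonical key.
        pair_freq.update(combinations(sorted(concepts), 2))

    scored = []
    for (a, b), count in pair_freq.items():
        if count < min_count:
            continue  # ignore pairs seen too rarely to be reliable
        pmi = math.log2((count * n) / (concept_freq[a] * concept_freq[b]))
        scored.append((a, b, count, pmi))
    # Strongest associations first.
    return sorted(scored, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    docs = [
        {"alice", "bank account", "transfer"},
        {"alice", "bank account"},
        {"bob", "transfer"},
        {"alice", "bob"},
    ]
    for a, b, count, pmi in cooccurrence_relations(docs):
        print(f"{a} -- {b}: count={count}, pmi={pmi:.2f}")
```

The scored pairs could serve as edges of a text graph of the kind the abstract mentions, with concepts as nodes.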
|
447 |
Information extraction from chemical patents
Jessop, David M. (January 2011)
The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst Patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye, an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML), is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye: 4444 reactions are extracted with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed, and an accuracy of 92% with regard to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication quality before they are presented to the community.
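Hearst patterns, as named in this abstract, are lexical templates such as "X such as A, B and C" from which hyponym/hypernym pairs can be read off. The sketch below illustrates one such pattern with a regular expression. It is not the thesis's implementation: the function name `hearst_such_as` is hypothetical, and each term is assumed to be a single word, whereas real chemical names are often multi-word and would need a chunker or a name recogniser.

```python
import re

# A term is approximated as one word (letters, digits, hyphens); this is a
# simplifying assumption made only for illustration.
WORD = r"[A-Za-z][A-Za-z0-9\-]*"

# "<hypernym> such as <term>, <term>[,] and|or <term>"
SUCH_AS = re.compile(
    rf"\b({WORD}) such as "
    rf"({WORD}(?:, (?!and |or ){WORD})*(?:,? (?:and|or) {WORD})?)"
)


def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    relations = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        # Split the enumerated list on ", ", " and ", " or ", ", and ", ", or ".
        hyponyms = re.split(r",? (?:and|or) |, ", match.group(2))
        relations.extend((h, hypernym) for h in hyponyms if h)
    return relations


if __name__ == "__main__":
    sentence = "Polar solvents such as methanol, ethanol and water were tested."
    for hyponym, hypernym in hearst_such_as(sentence):
        print(f"{hyponym} is-a {hypernym}")
```

Pairs produced this way are only candidates; the abstract reports that the acquired relations were validated manually to measure precision.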
|
448 |
Extraction of chemical structures and reactions from the literature
Lowe, Daniel Mark (January 2012)
The ever-increasing quantity of chemical literature necessitates the creation of automated techniques for extracting relevant information. This work focuses on two aspects: the conversion of chemical names to computer-readable structure representations and the extraction of chemical reactions from text. Chemical names are a common way of communicating chemical structure information. OPSIN (Open Parser for Systematic IUPAC Nomenclature), an open source, freely available algorithm for converting chemical names to structures, was developed. OPSIN employs a regular grammar to direct tokenisation and parsing, leading to the generation of an XML parse tree. Nomenclature operations are applied successively to the tree, with many requiring the manipulation of an in-memory connection table representation of the structure under construction. Areas of nomenclature supported are described, with attention being drawn to difficulties that may be encountered in name to structure conversion. Results on sets of generated names and names extracted from patents are presented. On generated names, recall of between 96.2% and 99.0% was achieved with a lower bound of 97.9% on precision, with all results being either comparable or superior to the tested commercial solutions. On the patent names, OPSIN's recall was 2-10% higher than the tested solutions when the patent names were processed as found in the patents. The uses of OPSIN as a web service and as a tool for identifying chemical names in text are shown to demonstrate the direct utility of this algorithm. A software system for extracting chemical reactions from the text of chemical patents was developed. The system relies on the output of ChemicalTagger, a tool for tagging words and identifying phrases of importance in experimental chemistry text. Improvements to this tool required to facilitate this task are documented. 
The structures of chemical entities are, where possible, determined using OPSIN in conjunction with a dictionary of name to structure relationships. Extracted reactions are atom mapped to confirm that they are chemically consistent. 424,621 atom mapped reactions were extracted from 65,034 organic chemistry USPTO patents. On a sample of 100 of these extracted reactions, chemical entities were identified with 96.4% recall and 88.9% precision. Quantities could be associated with reagents in 98.8% of cases and with products in 64.9% of cases, whilst the correct role was assigned to chemical entities in 91.8% of cases. Qualitatively, the system captured the essence of the reaction in 95% of cases. This system is expected to be useful in the creation of searchable databases of reactions from chemical patents and in facilitating analysis of the properties of large populations of reactions.
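The recall and precision figures quoted in this abstract are the standard set-based measures. As a reminder of how such figures are obtained, here is a minimal sketch that compares a set of extracted entities with a manually annotated set. The function name `precision_recall` and the example entities are illustrative assumptions and are not data from the thesis.

```python
def precision_recall(extracted, gold):
    """Compute precision and recall of extracted items against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical annotation of one reaction paragraph.
    gold = {"ethanol", "sodium hydride", "THF", "water"}
    extracted = {"ethanol", "sodium hydride", "THF", "mixture"}
    p, r = precision_recall(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")
```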
|
449 |
An Unsupervised Approach to Detecting and Correcting Errors in Text
Islam, Md Aminul (January 2011)
In practice, most approaches for text error detection and correction are based on a conventional domain-dependent background dictionary that represents a fixed and static collection of correct words of a given language and, as a result, satisfactory correction can only be achieved if the dictionary covers most tokens of the underlying correct text. Again, most approaches for text correction are for only one or at best a very few types of errors.
The purpose of this thesis is to propose an unsupervised approach to detecting and correcting text errors that can compete with supervised approaches, and to answer the following questions:
(1) Can an unsupervised approach efficiently detect and correct a text containing multiple errors of both syntactic and semantic nature?
(2) What is the magnitude of error coverage, in terms of the number of errors that can be corrected?
We conclude that (1) it is possible for an unsupervised approach to efficiently detect and correct a text containing multiple errors of both syntactic and semantic nature. Error types include: real-word spelling errors, typographical errors, lexical choice errors, unwanted words, missing words, prepositional errors, article errors, punctuation errors, and many of the grammatical errors (e.g., errors in agreement and verb formation). (2) The magnitude of error coverage, in terms of the number of errors that can be corrected, is almost double the number of correct words of the text. Although this is not the upper limit, this is what is practically feasible.
We use engineering approaches to answer the first question and theoretical approaches to answer and support the second question. We show that finding inherent properties of a correct text using a corpus in the form of an n-gram data set is more appropriate and practical than using other approaches to detecting and correcting errors. Instead of using rule-based approaches and dictionaries, we argue that a corpus can effectively be used to infer the properties of these types of errors, and to detect and correct these errors. We test the robustness of the proposed approach separately for some individual error types, and then for all types of errors.
The approach is language-independent: it can be applied to other languages as long as n-grams are available.
The results of this thesis thus suggest that unsupervised approaches, which are often dismissed in favor of supervised ones in the context of many Natural Language Processing (NLP) related tasks, may present an interesting array of NLP-related problem solving strengths.
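To make the n-gram idea in this abstract concrete, the sketch below flags a word when its surrounding trigram is unattested in a corpus and replaces it with a similar-looking word whose trigram is attested. This is only an illustration under loud assumptions and not the thesis's algorithm: the trigram counts come from a toy corpus, candidate generation uses `difflib.get_close_matches` from the Python standard library as a stand-in for a string-similarity measure, and the function names `train_trigrams` and `correct` are hypothetical.

```python
from collections import Counter
from difflib import get_close_matches


def train_trigrams(corpus_sentences):
    """Count word trigrams and collect the vocabulary of a toy corpus."""
    counts = Counter()
    vocab = set()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        vocab.update(tokens)
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts, vocab


def correct(sentence, counts, vocab):
    """Replace words whose trigram context is unattested, if a better fit exists."""
    tokens = sentence.lower().split()
    for i in range(1, len(tokens) - 1):
        left, word, right = tokens[i - 1], tokens[i], tokens[i + 1]
        if counts[(left, word, right)] > 0:
            continue  # context attested in the corpus; leave the word alone
        # Candidate replacements: vocabulary words that look like `word`.
        candidates = get_close_matches(word, sorted(vocab), n=5, cutoff=0.6)
        best = max(candidates, key=lambda c: counts[(left, c, right)], default=word)
        if counts[(left, best, right)] > 0:
            tokens[i] = best  # only substitute when the new trigram is attested
    return " ".join(tokens)


if __name__ == "__main__":
    corpus = ["i want a piece of cake", "a piece of paper", "peace of mind"]
    counts, vocab = train_trigrams(corpus)
    print(correct("i want a peace of cake", counts, vocab))
```

Because "peace" is itself a valid word, a dictionary lookup would not flag it; the unattested trigram context is what reveals it as a real-word spelling error, which is the kind of error the abstract lists first.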
|
450 |
The Business Value of Text Mining
Stolt, Richard (January 2017)
Text mining is an enabling technology that will change how businesses derive insights and knowledge from the textual data available to them. The current literature focuses on text mining algorithms and techniques, whereas the practical aspects of text mining are neglected. This study aims to help companies understand the business value of text mining by means of a case study. Subsequently, an SMS-survey method was used to identify additional business areas where text mining could be used to derive business value. A literature review was conducted to conceptualize the business value of text mining, and from it a concept matrix was established. The matrix specifies each business category together with its related derived insights and knowledge, domain, and data source. The concept matrix was then used to decide when information was of business value, and so to show that text mining could be used to derive information of business value. Text mining analyses were conducted on survey feedback data from a traffic school. The results were several patterns, with business value derived mainly for the categories of Quality Control and Quality Assurance. After comparing the results of the SMS survey with the empirical findings of the case study, some difficulties emerged in the categorization of derived information, implying that the categories need to become more specific and distinct. Furthermore, the concept matrix does not cover all of the business categories that are sure to exist.
|