411

Statistical Text Analysis for Social Science

O'Connor, Brendan T. 01 August 2014 (has links)
What can text corpora tell us about society? How can automatic text analysis algorithms efficiently and reliably analyze the social processes revealed in language production? This work develops statistical text analyses of dynamic social and news media datasets to extract indicators of underlying social phenomena, and to reveal how social factors guide linguistic production. This is illustrated through three case studies: first, examining whether sentiment expressed in social media can track opinion polls on economic and political topics (Chapter 3); second, analyzing how novel online slang terms can be very specific to geographic and demographic communities, and how these social factors affect their transmission over time (Chapters 4 and 5); and third, automatically extracting political events from news articles, to assist analyses of the interactions of international actors over time (Chapter 6). We demonstrate a variety of computational, linguistic, and statistical tools employed for these analyses, and also contribute MiTextExplorer, an interactive system for exploratory analysis of text data against document covariates, whose design was informed by the experience of researching these and other similar works (Chapter 2). These case studies illustrate recurring themes in developing text analysis as a social science methodology: computational and statistical complexity, and domain knowledge and linguistic assumptions.
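The poll-tracking case study invites a compact illustration. The sketch below is not taken from the thesis; it assumes a naive keyword lexicon and invented daily message and poll series, and shows only the general shape of the analysis: score each day's messages, then correlate the daily sentiment ratio with the poll reading.

```python
from collections import Counter

# Hypothetical word lists; a real study would use a sentiment lexicon,
# but any positive/negative word sets fit the same pattern.
POSITIVE = {"good", "great", "hope", "recovery"}
NEGATIVE = {"bad", "fear", "jobless", "crisis"}

def day_sentiment_ratio(messages):
    """Smoothed ratio of positive to negative word counts in one day's messages."""
    counts = Counter()
    for msg in messages:
        for tok in msg.lower().split():
            if tok in POSITIVE:
                counts["pos"] += 1
            elif tok in NEGATIVE:
                counts["neg"] += 1
    return (1 + counts["pos"]) / (1 + counts["neg"])  # add-one smoothing

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Invented inputs: one message list per day, one poll reading per day.
days = [
    ["great recovery ahead", "jobless rate bad"],
    ["crisis fear grows"],
    ["hope for recovery", "good news today"],
]
polls = [50.0, 47.5, 53.0]
ratios = [day_sentiment_ratio(d) for d in days]
print(pearson(ratios, polls))
```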
412

Wo die Geschichte in Büchern sitzt

Schneider, Ulrich Johannes 22 July 2014 (has links) (PDF)
Books that move their readers are read with a pencil in hand: one dog-ears important pages, inserts slips of paper, and pastes newspaper clippings inside the front or back covers; in short, one appropriates them as if they were texts of one's own writing. Here, history sits at the edge of the type area in annotations; it sits in underlinings and comments just as in question marks and exclamation marks. The readers may be as dead as the authors, but these marks of attentive reading in earlier epochs always tell us that books are in motion: they carry the text from author to reader, where it meets an author-like intensity that in some cases produces a text of its own, but more often notes and comments, for which there is no better place than precisely those passages to which they refer. The following observations are those of a reader who has practised this writing intervention in printed texts often enough, and who now asks for indulgence in speaking only personally and exhibiting his own experiences in handling books. In opening up such intimacy, traces emerge of an often neglected history that sits in books and must, case by case, be freed from them.
413

Finite-state canonicalization techniques for historical German

Jurish, Bryan January 2011 (has links)
This work addresses issues in the automatic preprocessing of historical German input text for use by conventional natural language processing techniques. Conventional techniques cannot adequately account for historical input text due to conventional tools' reliance on a fixed application-specific lexicon keyed by contemporary orthographic surface form on the one hand, and the lack of consistent orthographic conventions in historical input text on the other. Historical spelling variation is treated here as an error-correction problem or "canonicalization" task: an attempt to automatically assign each (historical) input word a unique extant canonical cognate, thus allowing direct application-specific processing (tagging, parsing, etc.) of the returned canonical forms without need for any additional application-specific modifications. In the course of the work, various methods for automatic canonicalization are investigated and empirically evaluated, including conflation by phonetic identity, conflation by lemma instantiation heuristics, canonicalization by weighted finite-state rewrite cascade, and token-wise disambiguation by a dynamic Hidden Markov Model.
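The rewrite-cascade idea can be shown with a toy example. The sketch below is not Jurish's system: the rules, lexicon, and words are invented, and a real implementation uses weighted finite-state transducers rather than ordered regular-expression rules. But the control flow, generating candidate forms and accepting the first one attested in a contemporary lexicon, follows the canonicalization setup described above.

```python
import re

# Invented rewrite rules for historical German spelling variants.
RULES = [
    (re.compile(r"th"), "t"),   # "thun" -> "tun"
    (re.compile(r"ey"), "ei"),  # "seyn" -> "sein"
]

def canonicalize(word, lexicon):
    """Apply rewrite rules in order; accept the first candidate form
    that is attested in a contemporary lexicon."""
    candidates = [word]
    for pattern, repl in RULES:
        candidates.append(pattern.sub(repl, candidates[-1]))
    for cand in candidates:
        if cand in lexicon:
            return cand
    return word  # fall back to the input if nothing is attested

modern_lexicon = {"tun", "sein", "teil"}  # hypothetical lexicon fragment
print(canonicalize("seyn", modern_lexicon))   # -> "sein"
print(canonicalize("theyl", modern_lexicon))  # -> "teil"
```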
414

Šiaulių rajono Šakynos pagrindinės mokyklos kompiuterinio raštingumo tyrimas / The Computer Literacy Research of Šakyna Basic School in Šiauliai District

Kirtiklytė, Rita 01 August 2013 (has links)
Recently, Lithuania has paid increasing attention to the importance of computer literacy in developing the information society. Computer literacy strongly influences conventional literacy and written culture, so it is worth examining from the perspective of the Lithuanian language. This bachelor's thesis reviews the guidelines for computer literacy requirements and summarises the formal requirements for computer-typed text. The paper analyses texts typed by the administration and students of Šakyna basic school, as well as the published information on the school's website, and compares them with analogous texts from other basic schools in Šiauliai district. A qualitative analysis of questionnaires is used to ascertain the attitude of teachers at basic schools in Šiauliai district towards computer literacy in education. The analysis concludes that students of Šakyna school, and of the Šiauliai district basic schools generally, make a considerable number of errors when typing texts on a computer, most of them errors of spelling and punctuation. Teachers say they know the computer literacy requirements but do not demand the corresponding literacy from their students.
415

Automated Biomedical Text Fragmentation In Support Of Biomedical Sentence Fragment Classification

Salehi, Sara 29 September 2009 (has links)
The past decade has seen tremendous growth in the amount of biomedical literature, specifically in the area of bioinformatics. As a result, biomedical text categorization has become a central task for providing researchers with literature appropriate for their specific information needs. Pan et al. have explored a method that automatically identifies information-bearing sentence fragments within scientific text. Their proposed method aims to automatically classify sentence fragments into certain sets of categories defined to satisfy specific types of information needs. The categories are grouped into five different dimensions known as Focus, Polarity, Certainty, Evidence, and Trend. The reason that fragments are used as the unit of classification is that the class value along each of these dimensions can change mid-sentence. In order to automatically annotate sentence fragments along the five dimensions, automatically breaking sentences into fragments is a necessary step, and the performance of the classifier depends on the quality of those fragments. In this study, we investigate the problem of automatic fragmentation of biomedical sentences, which is a fundamental layer in multi-dimensional fragment classification. In addition, we believe that our proposed fragmentation algorithm can be used in other domains such as sentiment analysis. The goal of sentiment analysis is often to classify the polarity (positive or negative) of a given text. Sentiment classification can be conducted at different levels such as document, sentence, or phrase (fragment) level. Our proposed fragmentation algorithm can be used as a prerequisite for phrase-level sentiment categorization, which aims to automatically capture multiple sentiments within a sentence. / Thesis (Master, Computing) -- Queen's University, 2009.
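As a rough illustration of what sentence fragmentation means here, the sketch below splits a sentence at discourse connectives, since the class value along a dimension such as Polarity can flip exactly at such a boundary. The connective list and the example sentence are invented; the thesis's actual fragmentation criteria are more elaborate than this.

```python
import re

# Hypothetical connective inventory; a real fragmenter would use
# criteria derived from the five annotation dimensions.
CONNECTIVES = r"\b(?:but|however|although|whereas|while)\b"

def fragment(sentence):
    """Split a sentence into fragments at discourse connectives,
    keeping each connective with the fragment it introduces."""
    parts = re.split(f"(?=(?:{CONNECTIVES}))", sentence, flags=re.IGNORECASE)
    return [p.strip(" ,;") for p in parts if p.strip(" ,;")]

s = ("The mutation increased binding affinity, "
     "but no change in expression was observed.")
for frag in fragment(s):
    print(frag)
# -> "The mutation increased binding affinity"
#    "but no change in expression was observed."
```

In this invented example the first fragment would carry positive polarity and the second negative polarity, which is precisely why fragments rather than whole sentences are the unit of classification.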
416

Les différences entre la correction de textes manuscrits et la correction de textes dactylographiés et imprimés par ordinateur

Godin, Caroline January 2009 (has links)
Master's thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal.
417

Die leesbaarheid van akademiese tekste : 'n tekslinguistiese ondersoek / M. Pienaar

Pienaar, Mari-Leigh January 2009 (has links)
Research into the readability of texts shows an extensive problem with the academic skills of learners in all phases, in terms of reading comprehension and of writing texts in accordance with generally accepted academic standards. It is important that sufficient attention and research be devoted to possible solutions, since various theorists have found that effective reading and writing skills are of great importance for learners' academic progress and achievement. A reason for this is that most academic material made available to learners appears in written format, and learners have to report on their knowledge in the same format. Although educational institutions aim to formulate academic texts (of which study guides form part) at a high readability level, the current study identifies problems and shortcomings that appear in study guides. If the readability of this study material is improved in light of text-linguistic insights, learners should have better access to the textual content, which could impact positively on academic achievement. With the above in mind, this study takes a text-linguistic approach to investigating the readability of study guides written for first-year learners at a tertiary institution. The central problem is how lexical cohesive devices and conjunction markers can be included as part of a text-linguistic approach to writing study guides, and how authors of study guides can use this in practice as a tool to make the writing process more effective. The research draws on various theories, including Systemic Functional Linguistics, Halliday and Hasan's cohesion theory and Stotsky's adaptation thereof for written academic texts, as well as Hyland's theory of academic metadiscourse. Applicable insights regarding text-linguistic criteria for writing academic texts, identified through this literature study, are converted into a framework for text analysis and then used to investigate the effective use of specific textual markers in the study guides obtained. This is done in a descriptive and primarily qualitative manner. Ten Afrikaans study guides, from diverse subject groups and recently used at a tertiary institution as introductory study guides for first-year learners, form the data of this study. To ensure a reliable investigation, the data is analysed procedurally (with reference to the text-analytical model mentioned) both by hand and using WordSmith Tools. Based on both the literature study and the text analysis, guidelines that can be used when writing Afrikaans study guides are formulated from a text-linguistic point of view. These guidelines may supplement the existing guide used by writers of study guides at the particular institution, and may also help to improve and standardise the quality and readability of this learning material. / Thesis (M.A. (Afrikaans and Dutch))--North-West University, Vaal Triangle Campus, 2009.
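To make the analysis concrete, the sketch below counts conjunction markers per Halliday-and-Hasan-style category, normalised per 1,000 tokens; such a profile is one simple way to quantify how densely a study guide signals logical relations. The marker lists and the example sentence are invented, and the thesis performs its analysis by hand and with WordSmith Tools rather than with this code.

```python
# Invented Afrikaans conjunction markers grouped by Halliday and Hasan's
# broad categories; the thesis derives its own inventory from the literature.
MARKERS = {
    "additive":    {"en", "ook", "verder"},
    "adversative": {"maar", "egter", "tog"},
    "causal":      {"want", "omdat", "dus"},
    "temporal":    {"toe", "daarna", "uiteindelik"},
}

def conjunction_profile(text):
    """Count conjunction markers per category, normalised per 1,000 tokens."""
    tokens = text.lower().split()
    total = len(tokens) or 1
    profile = {}
    for category, markers in MARKERS.items():
        hits = sum(1 for tok in tokens if tok.strip(".,;:") in markers)
        profile[category] = 1000 * hits / total
    return profile

guide_text = "Die studie is belangrik want dit help leerders. Maar daar is ook probleme."
print(conjunction_profile(guide_text))
```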
419

An Unsupervised Approach to Detecting and Correcting Errors in Text

Islam, Md Aminul 01 June 2011 (has links)
In practice, most approaches to text error detection and correction are based on a conventional domain-dependent background dictionary that represents a fixed and static collection of correct words of a given language; as a result, satisfactory correction can only be achieved if the dictionary covers most tokens of the underlying correct text. Moreover, most text-correction approaches handle only one, or at best a very few, types of errors. The purpose of this thesis is to propose an unsupervised approach to detecting and correcting text errors that can compete with supervised approaches, and to answer the following questions: Can an unsupervised approach efficiently detect and correct a text containing multiple errors of both syntactic and semantic nature? What is the magnitude of error coverage, in terms of the number of errors that can be corrected? We conclude that (1) an unsupervised approach can efficiently detect and correct a text containing multiple errors of both syntactic and semantic nature. Error types include real-word spelling errors, typographical errors, lexical choice errors, unwanted words, missing words, prepositional errors, article errors, punctuation errors, and many grammatical errors (e.g., errors in agreement and verb formation). (2) The magnitude of error coverage, in terms of the number of errors that can be corrected, is almost double the number of correct words in the text; although this is not the upper limit, it is what is practically feasible. We use engineering approaches to answer the first question and theoretical approaches to answer and support the second. We show that finding the inherent properties of a correct text using a corpus in the form of an n-gram data set is more appropriate and practical than other approaches to detecting and correcting errors. Instead of using rule-based approaches and dictionaries, we argue that a corpus can effectively be used to infer the properties of these types of errors, and to detect and correct them. We test the robustness of the proposed approach separately for some individual error types, and then for all types of errors together. The approach is language-independent: it can be applied to other languages as long as n-grams are available. The results of this thesis thus suggest that unsupervised approaches, which are often dismissed in favor of supervised ones in many Natural Language Processing (NLP) tasks, may present an interesting array of NLP-related problem-solving strengths.
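The corpus-driven idea can be sketched in a few lines: look up the n-gram containing a word, and if the observed context is rare, propose the vocabulary item that makes the n-gram most frequent. The trigram counts, threshold, and example below are invented for illustration; the thesis works with a full n-gram data set and covers many more error types than real-word substitutions.

```python
# Invented trigram counts standing in for a large n-gram data set.
TRIGRAMS = {
    ("peace", "of", "mind"): 90000,
    ("piece", "of", "mind"): 800,
    ("piece", "of", "cake"): 40000,
}

def suggest(word, right1, right2, vocabulary, threshold=1000):
    """If the trigram (word, right1, right2) is rare, return the
    in-vocabulary alternative that makes it most frequent; otherwise
    keep the word unchanged."""
    if TRIGRAMS.get((word, right1, right2), 0) >= threshold:
        return word
    best = max(vocabulary, key=lambda w: TRIGRAMS.get((w, right1, right2), 0))
    return best if TRIGRAMS.get((best, right1, right2), 0) >= threshold else word

vocab = {"peace", "piece", "pace"}
print(suggest("piece", "of", "mind", vocab))  # -> "peace"
```

Because "piece of mind" is rare in the (invented) counts while "peace of mind" is frequent, the real-word error is detected and corrected without any dictionary or hand-written rule, which is the core of the unsupervised argument above.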
420

Design för lärande och multimodala texter i svenskämnet : En produktorienterad studie av två läromedel i svenska

Halonen, Maria January 2015 (has links)
This paper presents a study of educational materials used in Swedish language education. The aim of the study is to understand how multimodal resources can be used in texts to support the process of meaning-making among pupils in the nine-year compulsory school. The theoretical framework used for understanding and analysing these materials combines the social-semiotic multimodal perspective and the design-theoretical multimodal perspective. The study is a multimodal text analysis, but it also involves analyses of the syllabi connected to the subject of Swedish language education. The extended concept of text was introduced into the syllabus in the year 2000, and today multimodal texts are supposed to be part of Swedish language education. In the course of this study the researcher found that multimodal resources can be used in different ways to support meaning-making. The study shows that the use of resources is connected to the different aims of texts and to the affordances of meaning-making resources. The aims of texts differ among and within the educational materials connected to the different syllabi. The researcher also found that the range of texts meant to be included in Swedish language education has grown since the extended concept of text was introduced, but that pupils are not introduced to strategies for dealing with these new kinds of texts to the same extent.
