91

Release of the MySQL based implementation of the CTS protocol

Tiepmar, Jochen January 2016 (has links)
In a project called "A Library of a Billion Words" we needed an implementation of the CTS protocol that is capable of handling a text collection containing at least 1 billion words. Because the existing solutions did not work at this scale or were still in development, I started an implementation of the CTS protocol using methods that MySQL provides. Last year we published a paper that introduced a prototype with the core functionalities without being compliant with the specifications of CTS (Tiepmar et al., 2013). The purpose of this paper is to describe and evaluate the MySQL based implementation now that it fulfils the specifications version 5.0 rc.1, and to mark it as finished and ready to use. Further information, online instances of CTS for all described datasets, and binaries can be accessed via the project's website. Reference: Tiepmar J, Teichmann C, Heyer G, Berti M and Crane G. 2013. A new Implementation for Canonical Text Services. In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH).
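The CTS protocol addresses text passages through structured URNs of the general shape `urn:cts:namespace:work:passage`. As an illustration only (this is not taken from the paper's MySQL implementation; the function name and returned fields are assumptions), a minimal URN parser might look like:

```python
def parse_cts_urn(urn):
    """Split a CTS URN such as urn:cts:greekLit:tlg0012.tlg001:1.1
    into its namespace, work identifier, and optional passage reference."""
    parts = urn.split(":")
    if parts[:2] != ["urn", "cts"] or len(parts) < 4:
        raise ValueError("not a CTS URN: %r" % urn)
    namespace = parts[2]                              # e.g. greekLit
    work = parts[3]                                   # textgroup.work(.version)
    passage = parts[4] if len(parts) > 4 else None    # e.g. 1.1 or a range like 1.1-1.10
    return {"namespace": namespace, "work": work, "passage": passage}
```

A CTS service resolves such a URN to the stored passage; an implementation backed by a relational store would typically use the work identifier and passage reference as lookup keys.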
92

Extraction of Text Objects in Image and Video Documents

Zhang, Jing 01 January 2012 (has links)
The popularity of digital image and video is increasing rapidly. To help users navigate libraries of image and video, Content-Based Information Retrieval (CBIR) systems that can automatically index image and video documents are needed. However, due to the semantic gap between low-level machine descriptors and high-level semantic descriptors, existing CBIR systems are still far from perfect. Text embedded in multimedia data, as a well-defined model of concepts for human communication, contains much semantic information related to the content. This text information can provide a much truer form of content-based access to image and video documents if it can be extracted and harnessed efficiently. This dissertation addresses the problems of detecting text objects in images and video and of tracking text events in video. For the text detection problem, we propose a new unsupervised text detection algorithm. A new text model is constructed to describe text objects using a pictorial structure. Each character is a part in the model, and every two neighboring characters are connected by a spring-like link. Two characters and the link connecting them are defined as a text unit. We localize candidate parts by extracting closed boundaries and initialize the links by connecting two neighboring candidate parts based on the spatial relationship of characters. For every candidate part, we compute character energy using three new character features: averaged angle difference of corresponding pairs, fraction of non-noise pairs, and vector of stroke width. These are extracted based on our observation that the edge of a character can be divided into two sets with high similarities in length, curvature, and orientation. For every candidate link, we compute link energy based on our observation that the characters of a text typically align along a certain direction with similar color, size, and stroke width.
For every candidate text unit, we combine character and link energies to compute a text unit energy, which indicates the probability that the candidate text model is a real text object. The final text detection results are generated by thresholding on text unit energy. For the text tracking problem, we construct a text event model using a pictorial structure as well. In this model, the detected text object in each video frame is a part, and two neighboring text objects of a text event are connected by a spring-like link. Inter-frame link energy is computed for each link based on the character energy, the similarity of neighboring text objects, and motion information. After refining the model using inter-frame link energy, the remaining text event models are marked as text events. At the character level, because the proposed method is based on the assumption that the strokes of a character have uniform thickness, it can detect and localize characters from different languages in different styles, such as typewritten or handwritten text, provided the characters have approximately uniform stroke thickness. At the text level, however, because the spatial relationship between two neighboring characters is used to localize text objects, the proposed method may fail to detect and localize characters with multiple separate strokes or connected characters. For example, many East Asian characters, such as those in Chinese, Japanese, and Korean, consist of multiple strokes of a single character; the strokes must first be grouped into single characters, which are then grouped into text objects. Conversely, because the characters of some languages, such as Arabic and Hindi, are connected together, spatial information between neighboring characters cannot be extracted, since they are detected as a single character. Therefore, at the current stage the proposed method can detect and localize text objects composed of separate characters with connected strokes of approximately uniform thickness.
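The energy combination and thresholding step can be sketched numerically. The weighted average, the weights, and the threshold value below are illustrative assumptions, not the dissertation's actual formulas, which derive the energies from stroke-width and alignment features:

```python
def text_unit_energy(char_energy_a, char_energy_b, link_energy,
                     w_char=0.5, w_link=0.5):
    # Combine the two character energies and the link energy of one text unit.
    # The averaging scheme and the weights are assumptions for illustration.
    return w_char * (char_energy_a + char_energy_b) / 2.0 + w_link * link_energy

def detect_text_units(units, threshold=0.6):
    # Keep candidate text units whose combined energy clears the threshold;
    # each unit is a dict holding its two character energies and link energy.
    return [u for u in units
            if text_unit_energy(u["char_a"], u["char_b"], u["link"]) >= threshold]
```

For example, a unit with strong character and link evidence (energies 0.9, 0.8, 0.7) survives the default threshold, while a weak one (0.2, 0.3, 0.1) is discarded.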
We evaluated our method comprehensively using three English language-based image and video datasets: ICDAR 2003/2005 text locating dataset (258 training images and 251 test images), Microsoft Street View text detection dataset (307 street view images), and VACE video dataset (50 broadcast news videos from CNN and ABC). The experimental results demonstrate that the proposed text detection method can capture the inherent properties of text and discriminate text from other objects efficiently.
93

Literární text ve výuce francouzského jazyka na českých školách / Use of literary text in French teaching on Czech high schools

Brodinová, Anna January 2015 (has links)
Title: Literary text in French language teaching at Czech high schools Author: Bc. Anna Brodinová, anna.brodinova@gmail.com Department: Institute of Romance Studies, Faculty of Arts, Charles University Supervisor: PhDr. Hana Loucká, CSc. Number of characters: 147 252 Keywords: literary text, poetical text, prose, dramatic text, French teaching, high school Abstract: A literary text, meaning a text enriched with aesthetic quality, represents one means of conveying the culture of a country. Its role in foreign language teaching is to help strengthen communicative competency and develop socio-cultural competency. The way the literary text is used in didactics, and approached in general, has changed considerably across the individual methodical principles of foreign language teaching. In the period of the grammar-translation method, the literary text was of the utmost importance; by contrast, at the beginning of the period of direct methods it was completely abandoned. Nowadays, the literary text is understood as one of the effective supplements in foreign language teaching. The Common European Framework of Reference for Languages, considered a fundamental document for the principles of foreign language teaching at present, identifies for each level of proficiency the ability of reading comprehension. This ability...
94

An Early Childhood Expository Comprehension Measure: A Look At Validity

Robertson, MaryBeth Fillerup 01 March 2018 (has links)
Many have argued for more informational text to be incorporated into the curriculum, even in the earliest grades. However, it has traditionally been thought that narrative text should precede informational text when introducing children to literacy. Still, several studies have demonstrated that preschool children are capable of learning from these texts. Because informational texts are being introduced even in the earliest grades, preschool teachers need ways to assess their students' ability to handle early forms of informational texts. The Early Expository Text Comprehension Assessment (EECA) was developed to help teachers understand the comprehension abilities of preschool children across several informational text structures. As part of a larger study, the third iteration of this assessment measure, called the EECA-R3, was examined for concurrent validity with the Test of Story Comprehension (TSC), a subtest of the Narrative Language Measure (NLM). Data came from 108 preschool children between the ages of four and five who were attending one of six Title I preschools or one of four private preschool classrooms. Correlations run between the TSC and the EECA-R3 to determine concurrent validity were positive and significant, suggesting that the EECA-R3 is valid.
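The concurrent-validity check described above comes down to correlating two lists of scores, one per measure, over the same children. A minimal sketch of the Pearson correlation typically used for such checks (the score values in the example are made up, not the study's data):

```python
import math

def pearson_r(xs, ys):
    # Pearson product-moment correlation between two paired score lists,
    # e.g. each child's score on measure A and on measure B.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A positive, significant r between the two measures is what supports the claim of concurrent validity; significance testing (not shown) additionally depends on the sample size.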
95

The Role of Prior Knowledge and Elaboration in Text Comprehension and Memory: A Comparison of Self-Generated Elaboration versus Text-Provided Elaboration

Kim, Sung-il 01 May 1992 (has links)
A series of six experiments investigated the effect of text-provided elaborations and prior knowledge on memory for text. In all experiments, subjects read 28 episodes, half of which were associated with well-known individuals and the other half with unknown individuals. In Experiment 1, text-provided elaborations enhanced recall only when the reader did not possess a high level of prior knowledge. The findings from Experiment 1 were hypothesized to be the result of readers generating relevant elaborations during text comprehension. Experiment 2 supported this hypothesis by providing evidence of self-generated elaborations. Experiment 3 provided evidence that this generation process occurred "on-line." The results from Experiments 4 and 5 extended these findings by showing that readers with high prior knowledge automatically generate causally relevant elaborations when the sentences are only weakly related. The findings of Experiment 6 suggest that distinctive text-provided elaborations are more effective than normal text-provided elaborations only when readers have high prior knowledge.
96

Hearing and Reading Biblical Texts: A Study of Difference - Mark 6:30 - 8:27a

Waterford, William Bede January 2004 (has links)
The thesis records a study of difference - the difference between reading and hearing biblical texts. It shows that the types of interpretation people make when reading such texts often differ from those they make when hearing the same texts read aloud. The extent of the difference is demonstrated in ten studies where theories relating to reading and hearing are applied to the Greek text of Mark 6:30-8:27a. The biblical texts used in the studies vary in size, as do the themes and issues investigated. Despite this diversity, the results are consistent across all ten studies. Almost all the assessments made in these studies are verified by independent data, such as the published opinions of biblical scholars and literary analyses of the Greek text. As elucidated in the thesis, the results attained, the method utilised and the theories employed are relevant for assessing the types of interpretation people are likely to make when reading and listening to other biblical stories. Because the research encompasses a literary issue and concerns the processes that are used in communication, the approach adopted is a literary one, and the methodology incorporates media criticism and audience criticism. Other techniques, such as narrative criticism, rhetorical criticism, and reader-response criticism, are utilised extensively in the various analyses and assessments. The ten studies are preceded in the thesis by data on the processes people use in reading texts and in listening to non-reciprocal speech. This includes information on the experiments and studies into communicative processes that have been carried out over the past fifty years, as well as the theories scholars have developed from their results. These are the theories that are used in this thesis.
There are also several analyses in the thesis which collectively demonstrate that texts used in Church liturgies should be those that have been specifically translated to meet the needs of listeners. This is a very important issue, because, even in very literate communities, there are still more Christians who listen to biblical texts being read than those who read such texts for themselves.
97

Corpus construction based on Ontological domain knowledge

Benis, Nirupama, Kaliyaperumal, Rajaram January 2011 (has links)
The purpose of this thesis is to contribute a corpus for sentence-level interpretation of biomedical language. The available corpora for the biomedical domain are small in terms of the amount of text and the number of predicates. Besides that, these corpora have been developed rather intuitively. In this effort, which we call BioOntoFN, we created a corpus from the domain knowledge provided by an ontology. By doing this we believe we can provide a rough set of rules for creating corpora from ontologies. We also designed an annotation tool specifically for building our corpus. We built a corpus for biological transport events. The ontology we used is the piece of the Gene Ontology pertaining to transport: the term transport (GO:0006810) and all of its child concepts, which could be called a sub-ontology. The annotation of the corpus follows the rules of FrameNet, and the output is annotated text in an XML format similar to that of FrameNet. The text for the corpus is taken from abstracts of MEDLINE articles. The annotation tool is a GUI created using Java.
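Carving out the transport sub-ontology amounts to collecting every descendant of GO:0006810 along is_a links. A sketch of that traversal, using a tiny hand-made is_a table in place of the real Gene Ontology (the three child terms below are real GO identifiers, but the table is a fabricated fragment for illustration):

```python
# child -> parents, a minimal stand-in for the Gene Ontology's is_a relations
IS_A = {
    "GO:0006811": ["GO:0006810"],   # ion transport      is_a transport
    "GO:0006812": ["GO:0006811"],   # cation transport   is_a ion transport
    "GO:0015031": ["GO:0006810"],   # protein transport  is_a transport
}

def descendants(root, is_a):
    # Invert the child->parents map, then walk down from the root,
    # collecting every term whose ancestor chain reaches it.
    children = {}
    for child, parents in is_a.items():
        for p in parents:
            children.setdefault(p, set()).add(child)
    found, stack = set(), [root]
    while stack:
        term = stack.pop()
        for c in children.get(term, ()):
            if c not in found:
                found.add(c)
                stack.append(c)
    return found
```

The root term plus the returned set is exactly the "sub-ontology" the thesis annotates against; on the real ontology the same walk would run over the full is_a graph parsed from an OBO file.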
98

Using Text Mining Techniques for Automatically Classifying Public Opinion Documents

Chen, Kuan-hsien 19 January 2009 (has links)
In a democratic society, the number of public opinion documents increases daily, and there is a pressing need to classify these documents automatically. The traditional approach to classifying documents involves word segmentation and the use of stop words, corpora, and grammar analysis to retrieve the key terms of documents. However, with the emergence of new terms, traditional methods that rely on a dictionary or thesaurus may incur lower accuracy. Therefore, this study proposes a new method that does not require the prior establishment of a dictionary or thesaurus and is applicable to documents written in any language and to documents containing unstructured text. Specifically, the classification method employs a genetic algorithm. In this method, each training document is represented by several chromosomes, and based on the gene values of these chromosomes, the characteristic terms of the document are determined. The fitness function, which the genetic algorithm requires to evaluate an evolved chromosome, considers the similarity to the chromosomes of documents of other types. This study used FAQ data from the Taipei city mayor's e-mail box to evaluate the proposed method while varying the length of documents. The results show that the proposed method achieves an average accuracy rate of 89%, an average precision rate of 47%, and an average recall rate of 45%. In addition, the F-measure can reach up to 0.7. The results confirm that the number of training documents, the content of training documents, the similarity between document types, and the length of the documents all contribute to the effectiveness of the proposed method.
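The idea of evolving characteristic-term sets with a fitness function that penalizes similarity to other document types can be sketched as follows. This is a heavily simplified stand-in, not the thesis's algorithm: the chromosome encoding, the mutation-only evolution loop, and the own-minus-other fitness are all assumptions made for illustration:

```python
import random

random.seed(0)  # make the sketch reproducible

def fitness(chromosome, own_docs, other_docs):
    # Score a candidate set of characteristic terms: reward terms that occur
    # in the class's own documents and penalize terms shared with documents
    # of other types (a crude proxy for the thesis's similarity-based fitness).
    own = sum(1 for doc in own_docs for t in chromosome if t in doc)
    other = sum(1 for doc in other_docs for t in chromosome if t in doc)
    return own - other

def evolve(vocab, own_docs, other_docs, pop_size=20, generations=30, k=3):
    # Evolve k-term chromosomes: keep the fitter half each generation and
    # refill the population with mutated copies (swap one term at random).
    pop = [set(random.sample(vocab, k)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, own_docs, other_docs), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for c in survivors:
            child = set(c)
            child.discard(random.choice(list(child)))
            child.add(random.choice(vocab))
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda c: fitness(c, own_docs, other_docs))
```

Given toy "complaint" documents versus "tax" documents, the evolved term set settles on the words that separate the two classes; the real system works the same way at the level of chromosomes per training document.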
99

Über Tempel und Texte / On Temples and Texts

Schneider, Ulrich Johannes 17 July 2014 (has links) (PDF)
The epochal threshold between the 18th and 19th centuries consists in a step from historical reconstruction to hermeneutic interpretation; this, at least, is what the history of hermeneutics and the history of historiography show. Historical images (of philosophy, of mythology, in general) were drafted and revised in that period, and they can still be observed today in the way philosophy is handled. The dispute over the significance of texts for philosophy appears to have been decided at this threshold: a relation of immanence replaces a relation of transcendence. Texts are sites of philosophy, not means. But how does this replacement take shape? Is it a consequence or an inference; does it form a logic that is itself immanent, something like the logic of the historical image of philosophy? In what follows, a comparison of images will help clarify what philosophical texts are, even though the images in question show temples. Both images can be assigned to the epochal threshold that is decisive for our present philosophical self-understanding.
100

The textcat Package for n-Gram Based Text Categorization in R

Feinerer, Ingo, Buchta, Christian, Geiger, Wilhelm, Rauch, Johannes, Mair, Patrick, Hornik, Kurt 02 1900 (has links) (PDF)
Identifying the language used will typically be the first step in most natural language processing tasks. Among the wide variety of language identification methods discussed in the literature, the ones employing the Cavnar and Trenkle (1994) approach to text categorization based on character n-gram frequencies have been particularly successful. This paper presents the R extension package textcat for n-gram based text categorization which implements both the Cavnar and Trenkle approach as well as a reduced n-gram approach designed to remove redundancies of the original approach. A multi-lingual corpus obtained from the Wikipedia pages available on a selection of topics is used to illustrate the functionality of the package and the performance of the provided language identification methods. (authors' abstract)
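The Cavnar and Trenkle (1994) approach that textcat implements ranks character n-grams by frequency and compares documents to category profiles with an "out-of-place" distance. A compact sketch of that idea (a simplification, not the textcat package itself, which is written in R and adds a reduced n-gram variant):

```python
from collections import Counter

def ngram_profile(text, n_max=5, top=300):
    # Build a ranked character n-gram profile (n = 1..n_max), with spaces
    # normalized to "_" and the text padded, as in Cavnar and Trenkle (1994).
    counts = Counter()
    padded = "_" + text.replace(" ", "_") + "_"
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    ranked = [g for g, _ in counts.most_common(top)]
    return {g: rank for rank, g in enumerate(ranked)}

def out_of_place(doc_profile, cat_profile):
    # Sum the rank displacements of the document's n-grams relative to the
    # category profile; n-grams absent from the category get a fixed penalty.
    penalty = len(cat_profile)
    return sum(abs(rank - cat_profile.get(g, penalty))
               for g, rank in doc_profile.items())
```

Classification then picks the category profile with the smallest distance; with profiles built from an English and a German sample, a new English snippet scores closer to the English profile.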
