1 |
Word based off-line handwritten Arabic classification and recognition : design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches
AlKhateeb, Jawad Hasan Yasin, January 2010
The design of a machine that reads unconstrained words remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segmenting handwritten words is very difficult. To avoid this, words are treated as a whole. This research investigates a number of features computed from whole words for the recognition of handwritten words. Arabic text classification and recognition is a complicated process compared to Latin and Chinese text recognition. This is due to the cursive nature of Arabic text. The work presented in this thesis is proposed for word-based recognition of handwritten Arabic scripts. The work is divided into three main stages that together provide a recognition system. The first stage is pre-processing, which applies efficient methods that are essential for automatic recognition of handwritten documents. In this stage, techniques for detecting the baseline and segmenting words in handwritten Arabic text are presented. Connected components are then extracted, and the distances between different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. The second stage is feature extraction. This stage uses the normalized images to extract features that are essential for recognizing the images. Various methods of feature extraction are implemented and examined. The third and final stage is classification. Various classifiers are used, such as the k-nearest neighbour classifier (k-NN), the neural network classifier (NN), hidden Markov models (HMMs), and the dynamic Bayesian network (DBN). To test this concept, the particular pattern recognition problem studied is the classification of 32492 words using the IFN/ENIT database. 
The results were promising and very encouraging in terms of improved baseline detection and word segmentation for further recognition. Moreover, several feature subsets were examined, and a best recognition performance of 81.5% was achieved.
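The word segmentation step described in this abstract (measure the distances between connected components, then derive a threshold from their distribution) can be illustrated with a minimal sketch. The abstract does not say how the threshold is computed, so the Otsu-style two-class split below is an assumption rather than the thesis's method. The names `gaps_between`, `gap_threshold`, and `split_words` are hypothetical, and each component is reduced to the horizontal extent of its bounding box.

```python
from typing import List, Tuple

# Hypothetical representation: (left, right) horizontal extent of one
# connected component's bounding box on a single text line.
Box = Tuple[int, int]


def gaps_between(boxes: List[Box]) -> List[int]:
    """Horizontal gaps between consecutive components, ordered by left edge."""
    ordered = sorted(boxes)
    gaps = []
    reach = ordered[0][1]  # right-most pixel covered so far
    for left, right in ordered[1:]:
        gaps.append(max(0, left - reach))  # overlapping components give gap 0
        reach = max(reach, right)
    return gaps


def gap_threshold(gaps: List[int]) -> float:
    """Assumed rule: pick the split of the sorted gaps that maximizes the
    between-class variance (Otsu-style), and return the midpoint of that split."""
    if not gaps:
        raise ValueError("at least one gap is required")
    values = sorted(gaps)
    best_score, best_t = -1.0, float(values[-1])
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue  # no boundary between equal values
        low, high = values[:i], values[i:]
        mean_low = sum(low) / len(low)
        mean_high = sum(high) / len(high)
        score = len(low) * len(high) * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score = score
            best_t = (values[i - 1] + values[i]) / 2
    return best_t


def split_words(boxes: List[Box]) -> List[List[Box]]:
    """Group components into words: a gap above the threshold starts a new word."""
    ordered = sorted(boxes)
    if len(ordered) < 2:
        return [ordered] if ordered else []
    gaps = gaps_between(ordered)
    threshold = gap_threshold(gaps)
    words = [[ordered[0]]]
    for box, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            words.append([box])
        else:
            words[-1].append(box)
    return words
```

For example, `split_words([(0, 10), (12, 20), (22, 30), (50, 60), (62, 70)])` groups the first three components into one word and the last two into another, because the single 20-pixel gap stands apart from the 2-pixel gaps. If all gaps are equal, the sketch treats the whole line as one word.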
2 |
Loslabern. Über das (ungenierte) Brechen von Textmustern
Jach, Daniel, 01 August 2011
When it comes to the production and reception of texts, most linguists will readily agree that writers as well as readers constitute and follow typical text patterns as they produce and read texts. It has become common today to describe text patterns as typical sets of manifestations of formal, thematic, and pragmatic features that arise from resemblances between different texts and that are inscribed in the communicative memories of a language community. However, speakers also transcend text patterns every day and oscillate between following and overstepping textual rules. This thesis investigates how speakers categorize linguistic knowledge in text patterns, and follow and transcend these patterns in everyday communication, using the example of Rainald Goetz’ text Loslabern. Excerpts from reviews about Loslabern illustrate how readers perceive the text: Some readers consider Loslabern as a ‘ruined’ text that falls apart, whereas others describe Loslabern as ‘ocean-like’ and fluid. Based on these reader experiences, the thesis attempts to answer the following central question: How can we describe ‘Loslabern’ and connatural texts from the viewpoint of textual linguistics in accordance with the readers’ intuitions? The thesis proposes and discusses three options: Textual linguistics may describe Loslabern i) as a broken text, ii) in terms of different concepts of text pattern, and iii) in terms of a novel concept of text pattern. The analytical section focuses on a discussion of different concepts of text pattern: discrete structural and prototypical pragmatic concepts. It examines how these concepts fall short of describing Loslabern in accordance with the readers’ intuitions. Following Wittgenstein and his concept of family resemblance, it creates a multi-dimensional and open concept of text patterns. 
This concept enables textual linguistics to depict the intertextual embeddedness of Loslabern and other texts systematically, to gain insight into the mechanisms of forming and transcending text patterns, and to describe Loslabern in accordance with the diversity of readers’ intuitions. In doing so, the thesis points at new directions in linguistics as well as literary studies.
3 |
Saco-SR-konflikten 1971 – en analys av opinionsbildning i tidningsledare / The Saco-SR Conflict of 1971 – An Analysis of Influencing Opinion in Newspaper Leaders
Hellström, Gunilla, January 2011
The aim of this thesis is to study what means are used in newspaper leaders (editorials) to influence public opinion. In order to obtain a wide range of such means, I have chosen material that has a clear timeframe and illustrates strong political antagonism, concerning the 1971 conflict between the Saco and SR unions and the Swedish state. Leaders from eight different newspapers with different party affiliations are analysed – six morning and two evening newspapers. What type of message leaders convey is examined mainly at the sentence level. Writers report what happened, assess the situation and analyse the causes and explanations for there being a labour conflict. They express criticism of those involved in various ways and exhort them to take recommended courses of action to resolve the conflict. Paragraphs can also be categorised in this way. How criticism is expressed is studied in detail because the material is rich in critical utterances of different types. Various theories about text types and speech act theory provide a theoretical background that is applied to the material. A number of different theories about what defines a genre are presented and tested on the leaders. The results of the investigation indicate that a large number of leaders from the morning newspapers are structured in a similar way, with the paragraph as the unit. They reveal a pattern, the normal pattern, where information is presented in a given order in the majority of morning leaders and the greatest number of message types is used. There is also a pattern of analysis/criticism, with critical and analytical paragraphs alternating and the analysis substantiating the criticism, as a rule. The few leaders in the morning newspapers that do not form a pattern may be strongly critical or almost solely analytical. One of the morning newspapers has many critical leaders that argue or incite. 
No analysis is made of evening newspaper leaders at the paragraph level since the paragraphs are short; instead, they are analysed as a whole, as are the argumentative leaders. The analysis shows that many leaders are structured in a similar way while at the same time there is considerable variation in the material, which is attributable to there being different types of editorials.
4 |
La Commission européenne et ses pratiques communicatives : Étude des dimensions linguistiques et des enjeux politiques des communiqués de presse / Europeiska kommissionens kommunikativa praktiker : En studie av pressmeddelandenas språkliga och politiska dimensioner
Lindholm, Maria, January 2007
I den här avhandlingen studeras Europeiska kommissionens kommunikativa praktiker i ljuset av de pressmeddelanden som dagligen distribueras till världens största presskår i Bryssel, men också via internet till andra journalister och allmänheten. Övergripande syften med avhandlingen är att beskriva textproduktionen i denna en av världens största textproducenter och att lyfta fram den, hittills förvånansvärt osynliga, språkliga dimensionen av kommissionens kommunikation. Avhandlingen tar avstamp i ett dialogiskt perspektiv på kommunikation, där kommunikation förstås som en dynamisk process i vilken människor (sam)agerar i ett givet sammanhang. Avgörande blir således att se pressmeddelandena som en del av den produktions- och distributionskontext de ingår i, både på lokal nivå och på en mer övergripande institutionell nivå. Empiriskt bygger avhandlingen på fältstudier vid Europeiska kommissionen och textanalyser av pressmeddelanden från kommissionen och från franska och svenska departement. Pressmeddelandena studeras både som process och produkt: formuleringsprocesser å ena sidan och textmönster och tempusbruk å den andra. Som ett exempel detaljstuderas produktionen av två pressmeddelanden mot bakgrund av skribenternas förklaringar och motiveringar till sina ändringar. Med sin unika inblick i hur ett pressmeddelande blir till steg för steg och av olika aktörer utgör denna del ett viktigt bidrag till forskningen om pressmeddelanden, som först på senare år blivit mer processinriktad. De olika delstudierna ger alla vid handen att kommissionen, enkelt uttryckt, måste arbeta mer för att underbygga sin argumentation och för att göra sina initiativ mer begripliga, legitima och motiverade. Detta kan i stor utsträckning tillskrivas den mer komplicerade kommunikationssituationen som gäller för kommissionen i förhållande till de nationella departement som är jämförelsematerial i studien. 
/ The thesis investigates the European Commission’s communicative practices in the light of the press releases that are distributed daily to the world’s largest press corps in Brussels and on the Internet to other journalists and the general public. The overall aim of the thesis is to describe the text production of one of the largest text producers in the world and to highlight the linguistic dimensions of the Commission’s communicative practices, which until now have received little scholarly attention. The study adopts a dialogical perspective on communication, where communication is understood as a dynamic process in which people interact in a given context. This means that the press releases are seen as parts of the production and distribution context in which they are embedded, both on a local level and on a more general institutional level. The empirical data on which the study is based comprise field studies at the European Commission and text analyses of press releases issued by the Commission and by French and Swedish ministries. The press releases are analysed both as product and as process: text patterns and the use of tense on the one hand, and composition processes on the other. As an example, the production of two press releases is studied in detail, in view of the authors’ comments on and motivations for changes to the texts. With its unique insight into how a press release is drafted step by step and by the different parties involved, this part of the thesis is an important contribution to research on press releases, which only recently has become more oriented towards the production process. The results of the analyses highlight the fact that the Commission, to a greater extent than the national ministries, must substantiate its argumentation and make its initiatives more comprehensible, legitimate, and motivated. 
This finding may be ascribed to the more complex communication situation of the Commission, compared to the national ministries, which served as material for comparison in the study.