11

An investigation into the comprehension of formulaic sequences by Saudi EFL learners

Al-Mohizea, Monira Ibrahim January 2013 (has links)
This study set out to explore the comprehension of formulaic sequences - particularly body-part idioms - by Saudi EFL learners. The study is essentially empirical and is situated within a cognitive linguistics-inspired framework. This approach goes hand in hand with construction grammar (cf. Fillmore et al. 1988) and proved plausible in that it treats idioms as a central part of the language, in contrast to traditional approaches, e.g. generative grammar (e.g. Chomsky, 1980) or structural linguistics (e.g. Hockett, 1958), which marginalise idioms as language oddities or part of the 'periphery' but not the 'core'. A total of 91 Arabic-speaking female participants, majoring in Languages and Translation at King Saud University (KSU), Saudi Arabia, were recruited for this study. The performance of the participants on idioms - moderated by language proficiency - was assessed in relation to four variables: similarity of idioms to L1, their level of compositionality, their level of transparency, and their frequency in the BNC. Language proficiency was assessed using the Oxford Placement Test (OPT). A test of idioms in a Multiple-Choice Question (MCQ) format was devised, piloted and validated. The test aimed to assess the learners' receptive knowledge of idioms related to body parts, and to investigate the factors that affect the participants' performance on the test of idioms at the comprehension level. The initial criterion for item selection was similarity to the first language (L1) of the participants (Arabic). A total of 60 items were included, 30 of which were similar to Arabic, and 30 of which were dissimilar. The variables pertaining to the characterizing features of idioms were carefully operationalized. In relation to similarity to L1, two measures were undertaken. First, idioms were categorized based on: (1) the similarity/dissimilarity of the idiom at the linguistic level; (2) the conceptual level that motivates the idiom. Second, native speakers of Arabic were asked to judge how similar the idioms are to Arabic on a 5-point Likert scale. The same procedure of operationalization was followed to measure compositionality and transparency, following specific definitions and criteria. As for the frequency of idioms, this was checked in the BNC, following certain formulae so as to capture all possible instances. The data were analyzed following a mixed-method design combining quantitative and qualitative paradigms. The quantitative analyses included tests of correlation between language proficiency and the learners' overall scores on the test of idioms, as well as between the four variables and those overall scores. The qualitative analysis involved the use of think-aloud (TA) protocols, which aimed to tap into additional factors, explore the strategies employed to understand idioms, and triangulate the results. The findings reveal that language proficiency and the total scores on the test of idioms correlate significantly. Moreover, judgements of similarity to L1 and transparency judgements by Arabic speakers also correlate significantly with the overall scores on the test of idioms. However, compositionality judgements and frequency hits did not yield any statistically significant correlation.
Further, the TA protocols corroborate the quantitative analysis and reveal that language proficiency affects the performance of the participants, as do transparency and similarity to L1. Abundant cross-linguistic influence (CLI) was noted in the verbalizations of 10 participants, who also displayed some discrepancies across different proficiency levels. The findings also reveal interesting effects related to properties of body-part lexis, such as imageability, concreteness and translatability, as well as familiarity.
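To make the correlational design concrete, here is a minimal sketch in Python of the kind of analysis the abstract describes, correlating similarity-to-L1 judgements with item-level test scores. The data and variable names are invented illustrations, not the study's dataset, and Spearman's rho is assumed here as a plausible choice for ordinal Likert-scale judgements.

```python
# A hedged sketch: correlating judged idiom similarity with MCQ accuracy.
from scipy.stats import spearmanr

# Hypothetical per-item data: mean 5-point similarity-to-L1 ratings from
# native Arabic judges, and the proportion of learners answering each
# MCQ item correctly.
similarity_ratings = [4.6, 4.1, 3.8, 2.2, 1.9, 4.4, 2.8, 1.5]
item_accuracy      = [0.91, 0.85, 0.74, 0.48, 0.39, 0.88, 0.62, 0.33]

# Spearman's rho handles ordinal rating data without assuming normality.
rho, p = spearmanr(similarity_ratings, item_accuracy)
print(f"rho = {rho:.2f}, p = {p:.4f}")
```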
12

Text classification in the BNC using corpus and statistical methods

Mohamed, Ghada January 2011 (has links)
The main part of this thesis sets out to develop a system of categories within a text typology. Although there exist many different approaches to the classification of texts into categories, this research fills a gap in the literature, as most work on text classification is based on features external to the text, such as the text's purpose, the aim of discourse, and the medium of communication. Text categories that have been set up based on external features are not linguistically defined. In consequence, texts that belong to the same type are not necessarily similar in their linguistic forms. Even Biber's (1988) linguistically-oriented work was based on externally defined registers. Further, establishing text categories based on text-external features favours theoretical and qualitative approaches to text classification. These can be seen as top-down approaches, where external features are defined functionally in advance and patterns of linguistic features are subsequently described in relation to each function. In such a case, the process of linking texts with a particular type is not done in a systematic way. In this thesis, I show how a text typology based on similarities in linguistic form can be developed systematically using a multivariate statistical technique, namely cluster analysis. Following a review of various possible approaches to multivariate statistical analysis, I argue that cluster analysis is the most appropriate for systematising the study of text classification, because it has the distinctive feature of placing objects into distinct groupings based on their overall similarities across multiple variables. Cluster analysis identifies these groupings algorithmically. The objects to be clustered in my thesis are the written texts in the British National Corpus (BNC). I make use of the written part only, since previous research attempting to classify the texts of this dataset produced limited results. Takahashi (2006), for instance, identified merely a broad distinction between formal and informal styles in the written part, whereas in the spoken part he obtained more insightful results. Thus, it seems justifiable to look at the part of the BNC which Takahashi found intractable, using a different multivariate technique, to see if this methodology allows patterns to emerge in the dataset. Further, there are two other reasons to use the written BNC. First, some studies (e.g. Akinnaso 1982; Chafe and Danielewicz 1987) suggest that distinctions between text varieties based on frequencies of linguistic features can be identified even within one mode of communication, i.e. writing. Second, analysing written text varieties has direct implications for pedagogy (Biber and Conrad 2009). The variables measured in the written texts of the BNC are linguistic features that have functional associations. However, any linguistic feature can be interpreted functionally; hence, there is no easy way to decide on a list of linguistic features with which to investigate text varieties. In this thesis, the list of linguistic features is informed by some aspects of Systemic Functional Theory (SFT) and by characteristics identified in previous research on writing, as opposed to speech. SFT lends itself to the interpretation of how language is used through functional associations of linguistic features, treating meaning and form as two inseparable notions.
This characteristic of SFT is one source informing my research, which assumes that a model of text types can be established by investigating not only the linguistic features shared in each type, but also the functions served by these linguistic features in each type. However, there is no commitment in this study to aspects of SFT other than those I have discussed here. Similarly, the linguistic features that reflect characteristics of speech and writing identified in previous research also have a crucial role in distinguishing between different texts. For instance, writing is elaborate, and this is associated with linguistic features such as subordinate clauses, prepositional phrases, adjectives, and so on. However, these characteristics do not only reflect the distinction between speech and writing; they can also distinguish between different spoken texts or different written texts (see Akinnaso 1982). Thus, the linguistic features seen as important from these two perspectives are included in my list of linguistic features. To make the list more principled and exhaustive, I also consult a comprehensive corpus-based work on the English language, along with some microscopic studies examining individual features in different registers. The linguistic features include personal pronouns, passive constructions, prepositional phrases, nominalisation, modal auxiliaries, adverbs, and adjectives. Computing a cluster analysis based on these data is a complex process with many steps. At each step, several alternative techniques are available. Choosing among them is a non-trivial decision, as multiple alternatives are in common use by statisticians. I demonstrate a process of testing several combinations of clustering methods in order to determine the most useful/stable combination(s) for use in the classification of texts by their linguistic features. To test the robustness of the clustering techniques and to validate the cluster analysis, I use three validation techniques, namely the cophenetic coefficient, the adjusted Rand index, and the AV p-value. The findings of the cluster analysis represent a plausible attempt to systematise the study of the diversity of texts by means of automatic classification. Initially, the cluster analysis resulted in 16 clusters/text types. However, a thorough investigation of those 16 clusters reveals that some clusters represent quite similar text types. Thus, it is possible to establish overall headings for similar types, reflecting their shared linguistic features. The resulting typology contains six major text types: persuasion, narration, informational narration, exposition, scientific exposition, and literary exposition. Cluster analysis thus proves to be a powerful tool for structuring the data, if used with caution. The way it is implemented in this study constitutes an advance in the field of text typology. Finally, a small-scale case study of the validity of the text typology is carried out. A questionnaire is used to find out whether and to what extent my taxonomy corresponds to native speakers' understanding of textual variability, that is, whether the taxonomy has some mental reality for native speakers of English.
The results showed that native speakers of English, on the one hand, are good at explicitly identifying the grammatical features associated with scientific exposition and narration; but on the other hand, they are not so good at identifying the grammatical features associated with literary exposition and persuasion. The results also showed that participants seem to have difficulties in identifying grammatical features of informational narration. The results of this small-scale case study indicate that the text typology in my thesis is, to some extent, a phenomenon that native speakers are aware of, and thus we can justify placing our trust in the results - at least in their general pattern, if not in every detail.
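The clustering-and-validation workflow summarised in this abstract can be sketched briefly in Python. The feature matrix below is random stand-in data, not the BNC measurements, and Ward/average linkage are assumed as two of the candidate methods; the cophenetic coefficient and adjusted Rand index are two of the three validation measures named above (the AV p-value, from bootstrap clustering, is omitted for brevity).

```python
# A hedged sketch: hierarchical clustering of texts by feature frequencies,
# with two of the validation measures mentioned in the abstract.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cophenet
from scipy.spatial.distance import pdist
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.random((100, 7))  # 100 texts x 7 features (e.g. rates per 1,000 words)

dists = pdist(X, metric="euclidean")
Z_ward = linkage(dists, method="ward")

# Cophenetic coefficient: how faithfully the dendrogram preserves
# the original pairwise distances between texts.
coph_coeff, _ = cophenet(Z_ward, dists)

# Adjusted Rand index: chance-corrected agreement between two
# alternative solutions, here Ward vs. average linkage cut at 16 clusters.
Z_avg = linkage(dists, method="average")
labels_ward = fcluster(Z_ward, t=16, criterion="maxclust")
labels_avg = fcluster(Z_avg, t=16, criterion="maxclust")
ari = adjusted_rand_score(labels_ward, labels_avg)

print(f"cophenetic coefficient: {coph_coeff:.3f}, ARI: {ari:.3f}")
```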
13

The representation of the Arab Spring narrative in English and Arabic news media

Alian, Najat Hashem Mohammad January 2016 (has links)
Since its emergence in late December 2010, the Arab Spring narrative has sparked many controversies among researchers, commentators, analysts, and scholars from different disciplines around the world, in terms of the causes and reasons behind it and even its name: ‘Arab Spring’. This study explores the Arab Spring narrative from its emergence to its continuing dénouement in the English and Arabic mainstream news media, from corpus-linguistic and critical discourse analytic perspectives. The Arab Spring bilingual corpus consists of two main sub-corpora, English and Arabic, compiled from LexisNexis and other news websites. Totalling 15,088 articles and 11,522,846 words, the English sub-corpus consists of 7,018 texts with a total of 5,901,416 words, while the Arabic sub-corpus comprises 8,070 news texts and a total of 5,621,430 words. Taken from prominent news media outlets from Western, Arab and Islamic countries, and divided into two major text types (news, and editorials and opinions), with date coverage from 15 June 2010 until 31 August 2013, the corpus allows us to examine the discursive construction of the Arab Spring narrative both diachronically and synchronically. Combining quantitative and qualitative methods associated with Corpus Linguistics (CL) and Critical Discourse Analysis (CDA), the current study explores the key topics associated with the Arab Spring at both the linguistic and the semantic levels by means of frequency, keyword (the KKW list function in my case), collocation list functions and concordance. The analysis also identifies the main news actors and news values. Actors and events are represented negatively and positively by means of lexical choice, and the different presentation strategies indicate that many of the Arab Spring news stories are politically, socially, and ideologically polarized. Contrasting themes/concepts within the resulting semantic categories (by means of pairs of items with positive/negative connotations) are also prevalent. For example, at the lexical level the following contrasting pairs are revealed: democracy/dictatorship; religious, sectarian/secular; peace/violence; government/regime; allies/enemy; corruption/reform; opposition/support. Similarly, at the grammatical level, items such as pro/anti, non- and not also indicate the contrastive as well as the polarizing nature of the Arab Spring narrative.
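Concordancing of the kind this abstract relies on is easy to illustrate. Below is a minimal keyword-in-context (KWIC) sketch in Python; the sample sentence and window size are invented for illustration and are not taken from the Arab Spring corpus.

```python
# A hedged sketch: a tiny keyword-in-context (KWIC) concordancer.
def kwic(tokens, node, window=4):
    """Return (left context, node, right context) for each hit of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

text = "the regime denied the protests while the government promised reform".split()
for left, node, right in kwic(text, "regime"):
    print(f"{left:>35} | {node} | {right}")
```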
14

The BT Correspondence Corpus 1853-1982 : the development and analysis of archival language resources

Morton, R. January 2016 (has links)
This thesis reports on the construction and analysis of the British Telecom Correspondence Corpus (BTCC), a searchable database of business letters taken from the archives of British Telecom. The letters in the corpus cover the years 1853-1982. This is a crucial period in the development of business correspondence but is so far underrepresented in available historical corpora. This research contributes knowledge in two main areas. Firstly, a number of methodological issues are highlighted with regard to working with public archives to produce linguistic resources. The way in which archives are typically organised, particularly the lack of item-level metadata, presents a number of challenges in terms of locating relevant material and extracting the sort of metadata that is necessary for linguistic analysis. In this thesis I outline the approach that was taken in identifying and digitising the letters for the BTCC, the issues encountered, and the implications for future projects that make use of public archives as a source of linguistic material. Secondly, this study contributes new insights into the development of English business correspondence from the nineteenth to the twentieth century. The results show a notable decline in overtly deferential language and an increase in familiar forms. However, these more familiar forms also appear in fixed phrases and conventional patterns. This suggests that there was a move from formalised distance to formalised friendliness in the language of business correspondence in this period. We also see a shift away from the performance of institutional identity through phrases such as ‘I am directed by…’ towards an increased use of the pronoun ‘we’ to represent corporate positions. This shift in corporate identity seems to coincide with the decline in deferential language. Finally, an analysis of moves and strategies used in requests suggests that, as the twentieth century progressed, authors began to use a wider range of moves to contextualise and justify their requests. Furthermore, though the same request strategy types remain popular over the timeline of the BTCC, there is a degree of diversification in terms of how the most popular request strategies are expressed, and indirect strategies that rely more on implicature become somewhat more prevalent.
15

Stylistics versus statistics : a corpus linguistic approach to combining techniques in forensic authorship analysis using Enron emails

Wright, David January 2014 (has links)
This thesis empirically investigates how a corpus linguistic approach can address the main theoretical and methodological challenges facing the field of forensic authorship analysis. Linguists approach the problem of questioned authorship from the theoretical position that each person has their own distinctive idiolect (Coulthard 2004: 431). However, the notion of idiolect has come under scrutiny in forensic linguistics over recent years for being too abstract to be of practical use (Grant 2010; Turell 2010). At the same time, two competing methodologies have developed in authorship analysis. On the one hand, there are qualitative stylistic approaches, and on the other there are statistical ‘stylometric’ techniques. This study uses a corpus of over 60,000 emails and 2.5 million words written by 176 employees of the former American company Enron to tackle these issues in the contexts of both authorship attribution (identifying authors using linguistic evidence) and author profiling (predicting authors’ social characteristics using linguistic evidence). Analyses reveal that even in shared communicative contexts, and when using very common lexical items, individual Enron employees produce distinctive collocation patterns and lexical co-selections. In turn, these idiolectal elements of linguistic output can be captured and quantified by word n-grams (strings of n words). An attribution experiment is performed using word n-grams to identify the authors of anonymised email samples. Results of the experiment are encouraging, and it is argued that the approach developed here offers a means by which stylistic and statistical techniques can complement each other. Finally, quantitative and qualitative analyses are combined in the sociolinguistic profiling of Enron employees by gender and occupation. Current author profiling research is exclusively statistical in nature. However, the findings here demonstrate that when statistical results are augmented by qualitative evidence, the complex relationship between language use and author identity can be more accurately observed.
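The word n-gram attribution idea described in this abstract can be sketched compactly. The snippet below uses Jaccard similarity over n-gram sets as one simple scoring choice (the thesis itself may score differently); the email texts and author names are invented examples.

```python
# A hedged sketch: attributing an anonymous text by word n-gram overlap.
def word_ngrams(text, n=2):
    """Return the set of word n-grams (default: bigrams) in a text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def jaccard(a, b):
    """Set overlap: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

known = {
    "alice": "please let me know if you have any questions",
    "bob": "give me a call when you get a chance thanks",
}
anon = "let me know if you have questions about the attached"

anon_grams = word_ngrams(anon)
scores = {who: jaccard(word_ngrams(txt), anon_grams) for who, txt in known.items()}
print(max(scores, key=scores.get), scores)  # highest-scoring candidate author
```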
16

Voicing imperial subjects in British literature : a corpus analysis of literary dialect, 1768-1929

Brown, David January 2016 (has links)
This study investigates nonstandard dialect as it is used in fictional dialogue. The works included in it were produced by British authors between 1768 and 1929 – a period marking the expansion and height of the British Empire. One of the project’s aims is to examine the connections between dialect representation and the imperial project, to investigate how ventriloquizing African diasporic, Chinese, and Indian characters works with related forms of characterization to encode ideologies and relations of power. A related aim is to explore the emergence and evolution of these literary dialects over time and to compare their structures as they are used to impersonate different communities of speakers. In order to track such patterns of representation, a corpus was constructed from the dialogue of 126 novels, plays, and short stories. That dialogue was then annotated for more than 200 lexical, morphological, orthographic, and phonological features. These data enable statistical analyses that model variation in the voicing of speakers and how those voicings change over time. This modeling demonstrates, for example, an increase in the frequency of phonological features for African diasporic dialogue and a countervailing decrease in the frequency and complexity of coded features generally for Indian dialogue. Trends like these that are surfaced through quantitative methods are further contextualized using qualitative, archival data. The analysis ultimately rests on connecting patterns of representation to changes in the imperial political economy, to evolving language ideologies circulating in the Anglophone world, and to shifts in sociocultural anxieties that crosscut race and empire. The combined quantitative and qualitative analyses therefore expose representational systems – the apparatuses that propagate structures and the social attitudes that accrue to those structures. The study further demonstrates that in such propagation, structures and attitudes are complementary.
17

Meaning construction in popular science : an investigation into cognitive, digital, and empirical approaches to discourse reification

Alexander, Marc Gabriel January 2011 (has links)
This thesis uses cognitive linguistics and digital humanities techniques to analyse abstract conceptualization in a corpus of popular science texts. Combining techniques from Conceptual Integration Theory, corpus linguistics, data-mining, cognitive pragmatics and computational linguistics, it presents a unified approach to understanding cross-domain mappings in this area and, through case studies of key extracts, describes how concept integration in these texts operates. In more detail, Part I of the thesis describes and implements a comprehensive procedure for semantically analysing large bodies of text using the recently-completed database of the Historical Thesaurus of English. Using log-likelihood statistical measures and semantic annotation techniques on a 600,000-word corpus of abstract popular science, this part establishes both the existence and the extent of significant analogical content in the corpus. Part II then identifies samples from the corpus which are particularly high in analogical content, and proposes an adaptation of empirical and corpus methods to support and enhance conceptual integration (sometimes called conceptual blending) analyses, informed by Part I’s methodologies for the study of analogy on a wider scale. Finally, the thesis closes with a detailed analysis, using this methodology, of examples taken from the example corpus. This analysis illustrates the conclusions which can be drawn from such work, completing the methodological chain of reasoning from wide-scale corpora to narrow-focus semantics, and providing data about the nature of highly-abstract popular science as a genre. The thesis’ original contribution to knowledge is therefore twofold: while contributing to the understanding of the reification of abstractions in discourse, it also focuses on methodological enhancements to existing tools and approaches, aiming to contribute to the established tradition of both analytic and procedural work advancing the digital humanities in the area of language and discourse.
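The log-likelihood keyness statistic this abstract mentions is a standard corpus-comparison measure (Dunning 1993; Rayson and Garside 2000), and a short sketch makes it concrete. The counts below are illustrative, not drawn from the thesis corpus.

```python
# A hedged sketch: the log-likelihood (G2) keyness statistic for comparing
# a word's frequency in a study corpus against a reference corpus.
import math

def log_likelihood(o1, n1, o2, n2):
    """G2 for a word occurring o1 times in a corpus of n1 tokens
    versus o2 times in a reference corpus of n2 tokens."""
    e1 = n1 * (o1 + o2) / (n1 + n2)  # expected frequency in corpus 1
    e2 = n2 * (o1 + o2) / (n1 + n2)  # expected frequency in corpus 2
    ll = 0.0
    for o, e in ((o1, e1), (o2, e2)):
        if o > 0:  # terms with zero observed frequency contribute nothing
            ll += o * math.log(o / e)
    return 2 * ll

# e.g. 120 hits in a 600,000-word study corpus vs. 210 in a 10m-word reference
print(f"G2 = {log_likelihood(120, 600_000, 210, 10_000_000):.2f}")
```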
18

A multi-modal corpus approach to the analysis of backchanneling behaviour

Knight, Dawn January 2009 (has links)
Current methodologies in corpus linguistics have revolutionised the way we look at language. They allow us to make objective observations about written and spoken language in use. However, most corpora are limited in scope because they are unable to capture language and communication beyond the word. This is problematic given that interaction is in fact multi-modal: meaning is constructed through the interplay of text, gesture and prosody, a combination of verbal and non-verbal characteristics. This thesis outlines, then utilises, a multi-modal approach to corpus linguistics, and examines how it can be used to facilitate our explorations of backchanneling phenomena in conversation, such as gestural and verbal signals of active listenership. Although backchannels have been seen as highly conventionalised, they differ considerably in form, function, interlocutor and location (in context and co-text). Therefore their relevance at any given time in a given conversation is highly conditional. The thesis provides an in-depth investigation of the use of, and the relationship between, spoken and non-verbal forms of this behaviour, focusing on a particular sub-set of gestural forms: head nods. This investigation is undertaken by analysing the patterned use of specific forms and functions of backchannels within and across sentence boundaries, as evidenced in a five-hour sub-corpus of dyadic multi-modal conversational episodes taken from the Nottingham Multi-Modal Corpus (NMMC). The results from this investigation reveal 22 key findings regarding the collaborative and cooperative nature of backchannels, which both support and extend what is already known about such behaviours. Using these findings, the thesis presents an adapted pragmatic-functional linguistic coding matrix for the classification and examination of backchanneling phenomena. This fuses the different, dynamic properties of spoken and non-verbal forms of this behaviour into a single, integrated conceptual model, in order to provide the foundations, and a theoretical point of entry, for future research of this nature.
19

Evidence of lexical priming in spoken Liverpool English

Pace-Sigge, Michael January 2010 (has links)
This thesis is about two things. Firstly, drawing on Michael Hoey’s Lexical Priming, it aims to extend the research represented in that book – into the roots of the concept of priming and into how far Hoey’s claims are valid for spoken English corpora. The thesis traces the development of the concept of priming, initially the work of computational analysts, psychologists and psycholinguists, to present a clearer picture of what priming means and of how far the phenomenon of priming has been shown to be a salient model of how the human mind works. Moving on from that, I demonstrate how this model can be adapted to provide a model of language generation and use, as Sinclair (2004) and Hoey (2003 etc.) have done, leading to the linguistic theory of Lexical Priming. Secondly, throughout the thesis two speech communities are compared: a general community of English speakers throughout the UK and a specific community, namely the Liverpool English (Scouse) speakers of Liverpool, UK. In the course of this work, a socio-economic discussion highlights the notion of Liverpool Exceptionalism and, grounded in the theory of lexical priming, I aim to show through corpus-led research that this Exceptionalism manifests itself, linguistically, through (amongst other things) specific use of particular words and phrases. I thus research the lexical use of Liverpool speakers in direct comparison to the use by other UK English speakers. I explore the use of “I” and people, indefinite pronouns (anybody, someone etc.) and discourse markers (like, really, well, yeah etc.), amongst other key items of spoken discourse where features of the two varieties of English may systematically differ. The focus is on divergence found in their collocation, colligation, semantic preference and their lexically driven grammatical patterns. Comparing casual spoken Liverpool English with the casual spoken (UK) English found in the Macmillan and BNC subcorpora, this study finds primings in the patterns of language use that appear in all three corpora. Beyond that, there are primings of language use that appear to be specific to the Liverpool English corpus. With Scouse as the example under the microscope, this is an exploration into how speakers in different speech communities use the same language – but differently. It is not only the phonetic realisation, or the grammatical or lexical differences, that define them as a separate speech group – it is the fact that they use the same lexicon in a distinct way. This means that lexical use, rather than just lexical stock, is a characterising feature of dialects.
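The collocational comparisons this abstract describes rest on association measures, and a pointwise mutual information (MI) score is one standard option. The sketch below uses the span-scaled MI convention common in corpus tools; the counts are invented, not taken from the Liverpool or reference corpora.

```python
# A hedged sketch: a pointwise mutual information (MI) collocation score,
# one common measure for comparing collocational behaviour across corpora.
import math

def mi_score(f_node, f_collocate, f_pair, corpus_size, span=4):
    """MI = log2(observed pair frequency / frequency expected under
    independence), with the expectation scaled by the window size (span)."""
    expected = f_node * f_collocate * span / corpus_size
    return math.log2(f_pair / expected)

# e.g. a node word and collocate co-occurring 400 times in a 1m-word corpus
print(f"MI = {mi_score(f_node=5000, f_collocate=3000, f_pair=400, corpus_size=1_000_000):.2f}")
```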
20

Error correction through corpus consultation in EAP writing : an analysis of corpus use in a pre-sessional context

Bridle, Marcus January 2015 (has links)
This study investigates the effect of corpus consultation on the accuracy of learner written error revisions. It examines the conditions which cause a learner to consult the corpus in correcting errors, and whether these revisions are more effective than those made using other correction methods. Claims have been made for the potential usefulness of corpora in encouraging a better understanding of language through inductive learning (Johns, 1991; Benson, 2001; Watson Todd, 2003). The opportunity for learners to interact with the authentic language used to compile corpora has also been cited as a possible benefit (Thurstun and Candlin, 1998). However, the theoretical advantages of using corpus data have not always translated into actual benefits in real learning contexts. Learners frequently encounter difficulties in dealing with the volume of information available to them in concordances, and can reject corpus use because it adds to their learning load (Yoon and Hirvela, 2004; Frankenberg-Garcia, 2005; Lee and Swales, 2006). This has meant that practical employment of corpus data has sometimes been difficult. In this experiment, learners on a six-week pre-sessional English for Academic Purposes (EAP) course were shown how to use the BYU (Brigham Young University) website to access the BNC (British National Corpus) to address written errors. Through a draft/feedback/revision process using meta-linguistic error coding, the frequency, context and effectiveness of the corpus being used as a reference tool were measured. Use of the corpus was found to be limited to a small range of error types, largely involving queries of a pragmatic nature. In these contexts, the corpus was found to be a potentially more effective correction tool than dictionary reference or recourse to previous knowledge, and it may have a beneficial effect in encouraging top-down processing skills. However, its frequency of use over the course was low, and it accounted for only a small proportion of accurate error revisions as a whole. Learner response to the corpus corroborated the negative perception already noted in previous studies. These findings prompt recommendations for further investigation into effective mediation of corpus data within the classroom, and for continued technological development to make corpus data more accessible to non-specialists.
