1 |
The dynamics of collocation: a corpus-based study of the phraseology and pragmatics of the introductory-it construction
Mak, King Tong, 28 August 2008
Not available / text
|
2 |
A computerized content analysis of Oprah Winfrey's discourse during the James Frey controversy
Stephens, Maegan R., January 2008
This analysis utilizes the computer-based content analysis program DICTION to gain a better understanding of Oprah Winfrey's specific discourse types (praise, blame, and standard) and her language surrounding the James Frey controversy. Grounded in Social Influence Theory, this thesis argues that it is important to understand the language styles of such a significant rhetor in society because she has the potential to influence the public. The findings indicate that Oprah's discourse types differ in the level of Optimism her language represents, and that the two episodes of The Oprah Winfrey Show relating to the James Frey controversy differ in terms of Certainty. This thesis also provides a new application of the program DICTION, and the implications of such procedures are discussed. / Department of Communication Studies
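The dictionary-based scoring DICTION performs can be illustrated with a toy sketch: count hits from category word lists and normalise per 100 words. The word lists and the scoring formula below are invented stand-ins; DICTION's actual dictionaries and master-variable formulas are proprietary and far larger.

```python
# Toy dictionary-based content analysis in the spirit of DICTION's
# Optimism and Certainty variables. The word lists and the scoring
# formula are illustrative stand-ins, not DICTION's actual dictionaries.
OPTIMISM_WORDS = {"hope", "praise", "wonderful", "inspiring", "great"}
PESSIMISM_WORDS = {"blame", "failure", "lie", "betrayal", "wrong"}
CERTAINTY_WORDS = {"always", "never", "must", "definitely", "all"}

def score(text):
    """Per-100-word rates for the toy Optimism and Certainty variables."""
    tokens = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    n = max(len(tokens), 1)
    def rate(words):
        return 100.0 * sum(t in words for t in tokens) / n
    return {
        "optimism": rate(OPTIMISM_WORDS) - rate(PESSIMISM_WORDS),
        "certainty": rate(CERTAINTY_WORDS),
    }

praise_turn = "I must say this was a wonderful, inspiring story of hope."
blame_turn = "That was a lie, a betrayal; you must never do it again."
print(score(praise_turn))
print(score(blame_turn))
```

Comparing such scores across episodes is, in miniature, the kind of contrast the thesis draws between the two Frey-related broadcasts.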
|
3 |
You talking to me? : zero auxiliary constructions in British English
Caines, Andrew Paul, January 2011
No description available.
|
4 |
Infusing Automatic Question Generation with Natural Language Understanding
Mazidi, Karen, 12 1900
Automatically generating questions from text for educational purposes is an active research area in natural language processing. The automatic question generation system accompanying this dissertation is MARGE, a recursive acronym for: MARGE Automatically Reads, Generates, and Evaluates. MARGE generates questions from both individual sentences and the passage as a whole, and is the first question generation system to successfully generate meaningful questions from textual units larger than a sentence. Prior work in automatic question generation from text treats a sentence as a string of constituents to be rearranged into as many questions as English grammar rules allow. Consequently, such systems overgenerate and create mainly trivial questions. Further, none of these systems to date has been able to automatically determine which questions are meaningful and which are trivial, because the research focus has been placed on natural language generation (NLG) at the expense of natural language understanding (NLU). In contrast, the work presented here infuses the question generation process with natural language understanding. From the input text, MARGE creates a meaning analysis representation for each sentence in a passage via the DeconStructure algorithm presented in this work. Questions are generated from sentence meaning analysis representations using templates. The generated questions are automatically evaluated for question quality and importance via a ranking algorithm.
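The template stage can be sketched in miniature. The (subject, verb, object) input below is a drastic simplification of the meaning analysis representation that DeconStructure builds, and the templates and length-based ranking are invented for illustration, not MARGE's actual templates or ranker.

```python
# Toy template-based question generation: fill question templates from a
# (subject, verb, object) analysis, then rank. Both the templates and the
# length-based "importance" proxy are illustrative stand-ins.
def generate_questions(subject, verb, obj):
    """Fill question templates from a (subject, verb, object) triple."""
    return [
        f"What does {subject} {verb}?",
        f"Who or what {verb}s {obj}?",
    ]

def rank(questions):
    # Stand-in for a learned quality/importance ranker: prefer longer,
    # more specific questions.
    return sorted(questions, key=len, reverse=True)

qs = generate_questions("the mitochondrion", "produce", "energy for the cell")
print(rank(qs))
```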
|
5 |
Constructing Task-Oriented Dialogue Systems with Limited Resources
Qian, Kun, January 2024
Task-oriented dialogue systems have increasingly become integral to our daily lives, with applications such as customer service chatbots, virtual assistants, and automated scheduling tools. However, collecting dialogue data is notably expensive due to the necessity of human interaction. Given the critical role of these systems, it is essential to develop methods that leverage limited resources efficiently, especially in data-driven models such as neural networks, which have demonstrated superior performance and widespread adoption. This dissertation proposes systematic approaches to address the limited-data problem in both modeling and data aspects, aiming to enhance the effectiveness and efficiency of task-oriented dialogue systems even when data is scarce.
This dissertation is divided into three main parts. The first part introduces three modeling techniques to tackle limited-data challenges. As the base dialogue model evolves from traditional recurrent neural networks to advanced large language models, we explore meta-learning methods, meta-in-context learning, and pre-training sequentially. Besides modeling considerations, the second part of our discussion emphasizes evaluation benchmarks. We start by discussing our work on correcting MultiWOZ, one of the most popular task-oriented dialogue datasets, which enhances training and provides more accurate evaluations. We also investigate biases within this dataset and propose methods to mitigate them. Additionally, we aim to improve the dataset by extending it to a multilingual dataset, facilitating the development of task-oriented dialogue systems for a global audience. The last part examines how to adapt our methods to real-world applications. We address the issue of database-search-result ambiguity in Meta’s virtual assistants by constructing disambiguation dialogue turns in the training data. Furthermore, we aim to enhance Walmart’s shopping companion by synthesizing high-quality knowledge-based question-answer pairs and constructing dialogue data from the bottom up.
Throughout this dissertation, the consistent focus is on developing effective approaches to building task-oriented dialogue systems with limited resources. Our strategies include leveraging limited data more efficiently, utilizing data from other domains, improving data quality, and distilling knowledge from pre-trained models. We hope our approach will contribute to the field of dialogue systems and natural language processing, particularly in building applications involving real-world limited data and minimizing the need for manual data construction efforts. By addressing these challenges, this dissertation aims to lay the groundwork for creating more robust, efficient, and scalable task-oriented dialogue systems that better serve diverse user needs across various industrial applications.
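The core bookkeeping a task-oriented system performs, and that benchmarks like MultiWOZ evaluate, is dialogue state tracking: merging slot values detected in each user turn into a running state. A bare-bones sketch follows; the slot names and keyword matcher are illustrative stand-ins, not this dissertation's models.

```python
# Minimal dialogue state tracker: accumulate slot values across turns.
# The slots and the naive keyword matcher are invented for illustration.
def update_state(state, user_turn, slot_keywords):
    """Return a new state with slot values found in one user turn merged in."""
    new_state = dict(state)
    turn = user_turn.lower()
    for slot, keywords in slot_keywords.items():
        for kw in keywords:
            if kw in turn:
                new_state[slot] = kw
    return new_state

SLOTS = {
    "area": ["north", "south", "centre"],
    "food": ["italian", "chinese", "indian"],
}

state = {}
state = update_state(state, "I want an Italian restaurant", SLOTS)
state = update_state(state, "Somewhere in the centre, please", SLOTS)
print(state)  # {'food': 'italian', 'area': 'centre'}
```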
|
6 |
Text Mining and Topic Modeling for Social and Medical Decision Support
Unknown Date
Effective decision support plays a vital role in people's daily lives, as well as for professional practitioners such as health care providers. Without correct information and timely derived knowledge, a decision is often suboptimal and may result in significant financial loss or compromised performance. In this dissertation, we study text mining and topic modeling and propose to use text mining methods, in combination with topic models, to discover knowledge from texts widely available from a variety of sources, such as research publications, news, and medical diagnosis notes, and to employ the discovered knowledge to assist social and medical decision support. Examples of such decisions include hospital patient readmission prediction, which is a national initiative for health care cost reduction; academic research topic discovery and trend modeling; and social preference modeling for friend recommendation in social networks.
To carry out text mining, our research, in Chapter 3, first focuses on single-document analysis to investigate textual stylometric features for user profiling and recognition. Our research confirms that, by using properly designed features, it is possible to identify the author of an article, using a number of sample articles written by that author as training data. This study serves as the basis for asserting that text mining is a powerful tool for capturing knowledge in texts for better decision making.
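The stylometric approach can be sketched as follows; the function-word features and nearest-centroid matcher are simplistic stand-ins for the properly designed features and trained models the chapter describes, and the training texts are toy data.

```python
# Toy stylometric author identification: function-word rates plus mean
# sentence length as features, matched by nearest centroid. The feature
# set and matcher are illustrative stand-ins for a trained classifier.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it"]

def features(text):
    """Function-word rates plus average sentence length, as one vector."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    sentences = max(text.count("."), 1)
    return [tokens.count(w) / n for w in FUNCTION_WORDS] + [n / sentences]

def nearest_author(sample, training):
    """training maps author -> list of feature vectors; pick closest centroid."""
    def centroid(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    sample_vec = features(sample)
    return min(training, key=lambda a: dist(sample_vec, centroid(training[a])))

training = {
    "A": [features("The rise of the empire and the fall of the republic.")],
    "B": [features("Cats sleep. Dogs bark. Birds sing.")],
}
print(nearest_author("The history of the war and the peace.", training))
```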
In Chapter 4, we advance our research from single documents to documents with interdependency relationships, and propose to model and predict citation relationships between documents. Given a collection of documents with known linkage relationships, our research discovers effective features to train prediction models and predicts the likelihood that two documents share a citation relationship. This study helps accurately model social network linkage relationships, and can be used to assist effective decision making for friend recommendation in social networking and reference recommendation in scientific writing.
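The pairwise prediction can be sketched as a score over two of the feature families such a model might use, textual overlap and shared references. The hand-set weights stand in for a trained model, and the documents and citation sets are invented for illustration.

```python
# Sketch of feature-based citation-link prediction: score a candidate
# pair of documents by word overlap (Jaccard) and shared references.
# Hand-set weights stand in for a trained prediction model; toy data.
def link_score(doc_a, doc_b, cites):
    """cites maps doc id -> set of cited ids; higher score = likelier link."""
    words_a = set(doc_a["text"].lower().split())
    words_b = set(doc_b["text"].lower().split())
    jaccard = len(words_a & words_b) / max(len(words_a | words_b), 1)
    shared_refs = len(cites.get(doc_a["id"], set()) & cites.get(doc_b["id"], set()))
    return 0.7 * jaccard + 0.3 * shared_refs

a = {"id": "a", "text": "deep learning for topic modeling"}
b = {"id": "b", "text": "topic modeling with deep networks"}
c = {"id": "c", "text": "a survey of marine biology"}
cites = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"q"}}
print(link_score(a, b, cites) > link_score(a, c, cites))  # True
```

The same pair-scoring shape carries over to friend recommendation, with profile text in place of document text and friend lists in place of reference lists.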
In Chapter 5, we advance a topic discovery and trend prediction principle to discover meaningful topics from a data collection and to model the evolution trend of each topic. By proposing techniques to discover topics from text, and using temporal correlations between trends for prediction, our techniques can be used to summarize a large collection of documents as meaningful topics and to forecast the popularity of a topic in the near future. This study can help design systems to discover popular topics in social media, and further assist resource planning and scheduling based on the discovered topics and their evolution trends.
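The trend-modeling step can be sketched as follows: each discovered topic carries a per-year document count, and next year's popularity is extrapolated from the recent trend. Simple linear extrapolation stands in for the correlation-based prediction the chapter develops, and the counts are toy data.

```python
# Sketch of topic trend forecasting: one-step extrapolation of per-year
# topic counts. Linear extrapolation is an illustrative stand-in for the
# temporal-correlation model; the topics and counts are toy data.
def forecast_next(yearly_counts):
    """One-step forecast: continue the change between the last two years."""
    if len(yearly_counts) < 2:
        return yearly_counts[-1]
    return yearly_counts[-1] + (yearly_counts[-1] - yearly_counts[-2])

topic_trends = {"deep learning": [40, 55, 75], "support vector machines": [60, 45, 35]}
for topic, counts in topic_trends.items():
    print(topic, "->", forecast_next(counts))
```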
In Chapter 6, we apply both text mining and topic modeling to the medical domain for effective decision making. The goal is to discover knowledge from medical notes to predict the risk of a patient being readmitted in the near future. Our research emphasizes the challenge that readmitted patients are only a small portion of the patient population, although they bring significant financial loss. As a result, the datasets are highly imbalanced, which often results in poor accuracy for decision making. Our research proposes to use latent topic modeling to carry out localized sampling, and to combine models trained from multiple copies of sampled data for accurate prediction. This study can be directly used to assist hospital readmission assessment for early warning and decision support.
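The sampling-and-ensemble idea can be sketched in a few lines: build several balanced copies by undersampling the majority (non-readmitted) class, train one model per copy, and combine them by majority vote. The one-number "risk score" inputs and threshold models below are illustrative stand-ins for the topic-model-guided localized sampling and real classifiers.

```python
# Sketch of an undersampling ensemble for imbalanced readmission data.
# Risk scores, the threshold "model", and the sampling are toy stand-ins.
import random

def balanced_copies(pos, neg, n_copies, seed=0):
    """Pair all minority cases with equal-size random majority samples."""
    rng = random.Random(seed)
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_copies)]

def train_threshold(sample):
    # Classify as readmitted (1) when the risk score exceeds the sample mean.
    mean = sum(sample) / len(sample)
    return lambda x: 1 if x > mean else 0

def majority_vote(models, x):
    return 1 if sum(m(x) for m in models) > len(models) / 2 else 0

pos = [0.80, 0.90, 0.85]                                # readmitted patients
neg = [0.10, 0.20, 0.30, 0.15, 0.25, 0.05, 0.35, 0.40]  # non-readmitted
models = [train_threshold(c) for c in balanced_copies(pos, neg, n_copies=5)]
print(majority_vote(models, 0.75), majority_vote(models, 0.10))  # 1 0
```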
The text mining and topic modeling techniques investigated in this dissertation can be applied to many other domains involving texts and social relationships, towards pattern- and knowledge-based effective decision making. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2016. / FAU Electronic Theses and Dissertations Collection
|
7 |
Towards a corpus of Indian South African English (ISAE): an investigation of lexical and syntactic features in a spoken corpus of contemporary ISAE
Pienaar, Cheryl Leelavathie, January 2008
There is consensus among scholars that there is not just one English language but a family of “World Englishes”. The umbrella-term “World Englishes” provides a conceptual framework to accommodate the different varieties of English that have evolved as a result of the linguistic cross-fertilization attendant upon colonization, migration, trade and transplantation of the original “strain” or variety. Various theoretical models have emerged in an attempt to understand and classify the extant and emerging varieties of this global language. The hierarchically based model of English, which classifies world English as “First Language”, “Second Language” and “Foreign Language”, has been challenged by more equitably-conceived models which refer to the emerging varieties as New Englishes. The situation in a country such as multi-lingual South Africa is a complex one: there are 11 official languages, one of which is English. However the English used in South Africa (or “South African English”), is not a homogeneous variety, since its speakers include those for whom it is a first language, those for whom it is an additional language and those for whom it is a replacement language. The Indian population in South Africa is amongst the latter group, as theirs is a case where English has ousted the traditional Indian languages and become a de facto first language, which has retained strong community resonances. This study was undertaken using the methodology of corpus linguistics to initiate the creation of a repository of linguistic evidence (or corpus), of Indian South African English, a sub-variety of South African English (Mesthrie 1992b, 1996, 2002). Although small (approximately 60 000 words), and representing a narrow age band of young adults, the resulting corpus of spoken data confirmed the existence of robust features identified in prior research into the sub-variety.
These features include the use of ‘y’all’ as a second person plural pronoun, the use of ‘but’ in a sentence-final position, and ‘lakker’ /'lVk@/ as a pronunciation variant of ‘lekker’ (meaning ‘good’, ‘nice’ or ‘great’). An examination of lexical frequency lists revealed examples of general South African English such as the colloquially pervasive ‘ja’, ‘bladdy’ (for ‘bloody’) and ‘jol(ling)’ (for partying or enjoying oneself), together with neologisms such as ‘eish’, the latter previously associated with speakers of Black South African English. The frequency lists facilitated cross-corpora comparisons with data from the British National Corpus and the Corpus of London Teenage Language, and similarities and differences were noted and discussed. The study also used discourse analysis frameworks to investigate the role of high frequency lexical items such as ‘like’ in the data. In recent times ‘like’ has emerged globally as a lexicalized discourse marker, and its appearance in the corpus of Indian South African English confirms this trend. The corpus built as part of this study is intended as the first building block towards a full corpus of Indian South African English which could serve as a standard for referencing research into the sub-variety. Ultimately, it is argued that the establishment of similar corpora of other known sub-varieties of South African English could contribute towards the creation of a truly representative large corpus of South African English and a more nuanced understanding and definition of this important variety of World English.
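The frequency-list workflow behind such comparisons can be sketched in miniature: build a word frequency list, then compare the normalised rate of a marker like ‘like’ across corpora. The two one-sentence "corpora" below are invented stand-ins for the ISAE corpus and a reference corpus, not real data.

```python
# Miniature corpus frequency list plus a normalised cross-corpus
# comparison. The sample "corpora" are invented for illustration.
from collections import Counter
import re

def freq_list(corpus_text, top_n=5):
    """Top-n word frequency list for a corpus."""
    tokens = re.findall(r"[a-z']+", corpus_text.lower())
    return Counter(tokens).most_common(top_n)

def rate_per_thousand(corpus_text, word):
    """Frequency of a word per 1,000 tokens, for cross-corpus comparison."""
    tokens = re.findall(r"[a-z']+", corpus_text.lower())
    return 1000.0 * tokens.count(word) / max(len(tokens), 1)

isae_sample = "it was like so lakker ja and we were like jolling like mad"
reference_sample = "it was very good and we were enjoying ourselves greatly"
print(freq_list(isae_sample, 3))
print(rate_per_thousand(isae_sample, "like"), rate_per_thousand(reference_sample, "like"))
```

Normalising to a per-thousand rate is what makes lists from corpora of very different sizes, such as a 60 000-word corpus and the British National Corpus, directly comparable.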
|