Global ETD Search

681	Algorithmic Fidelity and the use of Large Language Models in Social Science Research Rytting, Christopher Michael 23 May 2023 In this dissertation, we argue that large language models (LLMs) exhibit a considerable degree of algorithmic fidelity, a property whereby they have modeled the ideas, behaviors, and attitudes of the population that generated their training data. This has important implications for social science, as this fidelity theoretically allows LLMs to serve as effective proxies for human beings in experiments and research. We demonstrate this empirically in various social science domains (political partisanship, demographic surveying, voting behavior, hot-button policy issues, news media, populism, congressional summaries), in various applications (replicating social science survey findings, assisting in the coding of text datasets, inferring demographics, automating interventions to improve conversations about divisive topics), and at various levels of granularity (from findings about the entire U.S. population down to specific demographics and down to the individual level). It is intrinsically interesting that LLMs could learn such behaviors from the unsupervised objective on which they are trained. It is also strategically useful to establish where and to what extent they have done so, so that we can study them in cheaper and previously impossible ways. This work serves as a preliminary study of these phenomena and an early, demonstrative methodology for drawing out the algorithmic fidelity of large language models. Natural Language Processing Generative Modeling Social Science Simulation Physical Sciences and Mathematics
682	Natural Language Programming for Controlled Object-Oriented English Zhan, Yue 11 July 2022 Natural language (NL) is a common medium humans use to express ideas and communicate with others, while programming languages (PL) are the "language" humans use to communicate with machines. Because NL and PL were designed for different purposes, they differ considerably in structure and capabilities. Learning to program in a PL can take novices months, whereas users are already familiar with NL. Natural language programming (NLPr) therefore holds great potential: it gives non-experts the ability to "program" in the language they already know, offering a Low-Code/No-Code development experience. However, many challenges in developing NLPr systems have yet to be addressed, namely how to disambiguate NL semantics, how to validate inputs and provide helpful feedback, and how to effectively generate executable programs from semantic meanings. This dissertation addresses these issues by proposing a Controlled Object-Oriented Language (COOL) model to disambiguate and analyze the semantic meaning of English inputs, and by implementing a LEGO robot NLPr platform. Two main approaches connecting current research in general-purpose NLP to NLPr are taken: (1) A domain-specific lexicon and function library serve as the syntactic and semantic space. Even though NL can be complex and expressive, the functions of the specific robot domain can be fulfilled by libraries built from a finite set of objects and functions. (2) An error-reporting and feedback mechanism detects erroneous sentences, explains possible reasons, and provides debugging and rewriting suggestions. The error-reporting and feedback systems are developed with a hybrid approach that combines rule-based methods, such as finite-state machines (FSMs) and dependency-based structural analysis, with a data-based multi-label classification (MLC) method. 
Experiment results and user studies show that, with the proposed model and approaches reducing ambiguity within the target domain, the NLPr system can process a relatively expressive controlled NL for robot motion control and generate executable code from the English input. When the system is confronted with erroneous sentences, it produces error messages, suggestions, and example sentences for users. With the proposed language model and system, NL's structural and semantic information can be transformed into the intermediate representations used for program synthesis, addressing situations where the large amount of data needed for a data-based model is unavailable. / Doctor of Philosophy / Natural language (NL) is one of the most common media humans use daily to express and explain ideas and to communicate with each other. In contrast, programming languages (PL) are the "language" humans use to communicate with machines. Because they differ in purpose, medium, and audience, they also differ considerably in structure and capabilities. NL is more expressive and natural, and can sometimes be rather complex, while PL is primarily short, straightforward, and less expressive than NL. The need for programming has increased in recent years. However, novice users can easily need months or more to learn a programming language, whereas all potential users are familiar with at least one NL. As such, natural language programming (NLPr), a technology that enables people to program with NL, holds great potential, since it gives non-experts the ability to "program" in the language they already know and offers a Low-Code or even No-Code development experience. 
However, despite recent research into NLPr, many challenges in developing NLPr systems have yet to be addressed, namely how to disambiguate natural language semantics, how to validate inputs and provide helpful feedback with a limited amount of data, and how to effectively generate executable programs from the semantic meanings. This dissertation addresses these issues by proposing a Controlled Object-Oriented Language (COOL) model to disambiguate and analyze the semantic meaning of English inputs, and by implementing a LEGO robot NLPr platform. Two main approaches connecting current research in general-purpose NLP techniques to NLPr are taken: (1) The first is developing a domain-specific lexicon and function library with the designed COOL model to serve as the syntactic and semantic space. Even though natural language can be extremely complex and expressive, the functions of the specific robot domain can be fulfilled by libraries built from a finite set of objects and functions. (2) The second is an error-reporting and feedback mechanism that detects erroneous sentences, explains possible reasons, and provides debugging and rewriting suggestions. The error-reporting and feedback systems are developed with a hybrid approach that combines rule-based methods, such as finite-state machines (FSMs) and dependency-based structural analysis, with a data-based multi-label classification (MLC) method. Experiment results and user studies show that, with the proposed language model and approaches reducing ambiguity within the target domain, the designed NLPr system can process a relatively expressive controlled natural language designed for robot motion control and generate executable code from the extracted semantic information. When the NLPr system is confronted with erroneous sentences, it produces detailed error messages and offers users suggestions and sample sentences for possible fixes. 
With the simple language model and system proposed, NL's structural and semantic information can be transformed into the intermediate representations used for program synthesis, addressing situations where the large amount of data needed for a data-based model is unavailable. Natural language programming Natural language processing Semantic extraction Multi-label classification LEGO Mindstorms EV3
683	A Person-Centered Approach to Understanding Perceived Deception in Job Advertisement Text Ristow, Teresa Lauren 09 May 2023 Regardless of industry or job type, most organizations aim to recruit large, qualified applicant pools via job advertisements or postings. With little control over which individuals choose to apply and which do not, organizations and their recruiters are likely to do what they can to enlarge their applicant pool, which allows for more options among potential hires during the selection process. To control the applicant pool as much as possible, recruiters can try to influence potential applicants through the posted job advertisement. It is therefore reasonable to assume that many recruiters will present a slightly inflated or overly positive view of the job in order to appeal to more applicants. However, job seekers may perceive this attempt as misleading or deceptive. To understand perceived deception in job advertisements, and which features of their text elicit an overall negative attitude towards the advertisement, this study takes a mainly exploratory approach to discover whether perceived deceptiveness is a homogeneous higher-level construct or whether a person-centered approach, via latent profile analysis (LPA), better explains what applicants perceive as deceptive. Once the nature of perceived deceptiveness is better understood, this study aims to use natural language processing (NLP) topic modeling to find common deceptive topics within different dimensions of the job posting, such as pay, benefits, and qualifications. Given the limited empirical guidance available to practitioners, the proposed study can help facilitate research on best practices in job advertisement writing to attract qualified, high-quality candidates. In turn, those candidates will tend to maintain positive attitudes towards the job and organization, which can persist even after they are hired. 
/ Doctor of Philosophy / In today's job market, organizations aim to attract qualified applicants through appealing job advertisements. However, some applicants may perceive these attempts as misleading or deceptive. This study explores whether there is a common view of what is deceptive within the text of a job advertisement or whether it varies with individual perceptions. This study aims to classify different types of applicants and their associated perceptions of deception in job ads. It also employs natural language processing techniques to analyze the language used in job advertisements, pinpointing common deceptive themes in various sections of the job posting, such as pay, benefits, and qualifications. By uncovering how people perceive deception in job ads, this study hopes to provide valuable insights that help organizations craft more honest and transparent job postings. Such postings can attract high-quality candidates who maintain positive attitudes towards the job and organization, ultimately contributing to improved hiring practices and a more positive work environment. Recruiting Job Advertisement Perception Deception Natural Language Processing Latent Profile Analysis
684	Investigation of Contractual Specification and Implementation of Relational Approaches in Public Private Partnership (PPP) Projects Khurana, Mayank 30 August 2021 Public-Private Partnerships (PPPs) have unique characteristics, such as a long time horizon and the involvement of multiple stakeholders. These can lead to common transaction hazards (uncertainty, asset specificity, imperfect information, and incomplete contracts), which can in turn promote opportunistic behavior between parties. Although contracts are designed to govern projects and curb opportunism, their efficacy is limited by these transaction hazards. Therefore, developing strong relationships and cooperative behavior among stakeholders is often emphasized as a complement to contractual provisions, since it can mitigate the impacts of transaction hazards. Relational contracting includes a set of principles, such as flexibility and effective communication, that promote cooperative behavior and advance mutually beneficial outcomes for stakeholders. A relational contract can include different relational approaches, such as informal resolution procedures, partnering practices, and incentives, to promote relational exchanges in a project. The extent to which these relational approaches are present in PPP contracts indicates the contracts' ability to further inter-organizational relationships. Although previous studies have summarized and further investigated relational approaches in construction projects using conventional delivery methods, similar investigations for PPP projects are limited. Furthermore, relational contracting theory suggests that including approaches in a contract does not assure their implementation in the field: stakeholders tend to form working relationships different from those intended in the contract. Therefore, examining the implementation of these approaches is an important precursor to exploring their effectiveness and capacity to promote stronger relationships between parties. 
Accordingly, this research presents three complementary studies to enhance understanding of the relational approaches employed in PPP projects. The first study focused on identifying relational approaches described in the literature that can be specified in PPP contracts to enhance cooperative behavior. A comprehensive literature review identified relational approaches that were grouped into six categories: communication/nature of negotiations, partnering, conflict resolution methods, monitoring, changes process, and risk allocation. The second study examined the extent to which the relational approaches identified in the first study were specified in 22 PPP transportation project contracts in the United States. This investigation characterized how relational these contracts are, which is indicative of their capacity to promote relational exchanges in a project. For instance, partnering practices were either included in a contract as a whole or not at all, while conflict resolution methods were included selectively. The third study investigated the implementation of relational approaches in practice. Semi-structured interviews with 13 subject matter experts were conducted to obtain perspectives on the implementation of different relational approaches. For example, all the interviewees emphasized handling conflicts through informal resolution methods to save the time and effort required by third-party methods such as mediation, arbitration, and dispute review boards. A framework intended to promote proactive management of stakeholder relationships is proposed based on the findings. Collectively, these three complementary studies shed light on the current state of contractual inclusion and implementation of relational approaches in PPP projects in the United States. 
Overall, this research contributes to the growing literature concerning the complementarity between contractual and relational governance, which is needed for improved project performance. These studies have advanced understanding of relational approaches in PPPs by establishing a baseline for their current contractual specification in PPP projects and identifying factors influencing their implementation in the field. Future research can explore their impact on project performance and counterparty relations. / Doctor of Philosophy / Multiple life-cycle phases and the involvement of many stakeholders are among the unique characteristics of Public Private Partnership (PPP) projects. These can lead to transaction hazards such as uncertainty, asset specificity, incomplete contracts, and imperfect information, which make contracts less effective in governing projects. Therefore, developing strong relationships between the stakeholders is necessary to complement contracts, which can lead to improved project performance. Relational contracting includes a set of principles that aim to develop cooperative behavior between the stakeholders through improved communication and flexibility. A contract that includes such principles is called a relational contract. Although projects with traditional delivery methods, such as design-bid-build and design-build, have been investigated regarding relational approaches in contracts, no similar review of PPP contracts has been found. On the other hand, relational contracting theory suggests that the working relationships between the stakeholders can turn out to be completely different from what is intended in the contract. Therefore, it is important to investigate the actual implementation of relational approaches in PPP projects, which had not been done until now. 
Based on the arguments above, three complementary studies were performed in this research to overcome the limitations mentioned and to better understand relational approaches in PPP projects. The first study aims to identify a comprehensive list of relational approaches from the literature that can be included in PPP contracts to enhance cooperative behavior. A robust literature review process was followed to identify relational approaches, which were further grouped into six categories: communication/nature of negotiations, partnering, conflict resolution methods, monitoring, changes process, and risk allocation. The second study further investigates the contracts of PPP transportation projects in the United States for the extent to which they include the relational approaches identified in the first study. Contracts from 22 PPP projects were investigated and compared. The findings provided insights into the ability of these contracts to promote relational exchanges in a project. For instance, partnering practices were either included in full or not at all, whereas conflict resolution methods were included selectively. The third study investigated the implementation of relational approaches in practice. Semi-structured interviews with 13 subject matter experts were conducted to gather insights into the implementation of different relational approaches. For example, all the interviewees emphasized the need to resolve conflicts through informal resolution methods to save the time and effort required by third-party methods such as mediation, arbitration, and dispute review boards. A framework with the objective of promoting proactive management of stakeholder relationships was proposed based on the findings. Collectively, these three studies provide insights into the current state of contractual inclusion and implementation of relational approaches in PPP projects. 
This research contributes to the growing literature concerning the complementarity between contractual and relational governance in PPP projects, which is needed for improved project performance. Public-Private Partnerships transportation relational approaches contract content analysis natural language processing implementation
685	Analysis of Information Diffusion through Social Media Khalili, Nastaran 16 June 2021 Changes in the way people communicate have transformed the world in many respects. Public participation on social media means the generation and diffusion of, and exposure to, a tremendous amount of unsupervised user-generated content. This four-essay dissertation analyzes information diffusion through social media, along with its opportunities and challenges, using management systems engineering and data analytics. First, we evaluate how information can be shared to reach maximum exposure in the case of online petitions. We use system dynamics modeling and propose policies for campaign managers to schedule the reminders they send so as to obtain the highest number of petition signatures. We find that sending reminders is more effective when the signature rate is increasing. In the second essay, we investigate how people build trust or mistrust in science during an emergency. We apply data analytics methods to more than 700,000 tweets containing the keywords hydroxychloroquine and chloroquine, two candidate medicines for preventing and treating COVID-19 infection. We show that people's opinions are concentrated in terms of polarity and spread out in terms of subjectivity. People also tend to share subjective tweets more than objective ones. In the third essay, building on the same dataset as essay two, we study changes in science communication during the coronavirus pandemic. We used topic modeling to cluster the tweets into seven different groups. Our analysis suggests that a highly scientific, health-related subject can become political in an emergency. We found that the medical information group and the research and study group contain fewer tweets than the political group. Fourth, we investigate fake news diffusion as one of the main challenges of user-generated content. 
We built a system dynamics model and analyzed the effects of competition and correction in combating fake news. We show that correcting misinformation and competing with fake news both require a high percentage of participation to be effective. / Doctor of Philosophy / The prevalence of social media has changed information diffusion in several ways, creating a variety of opportunities and challenges. We discuss instances of these in four main essays. In the first essay, we study online social and political campaigns. Because campaign managers' main goal is to gain the highest reach and number of signatures, we develop a model to show how sending reminders after the initial announcement, and their schedule, affect the final total number of signatures. We found that the best policy for online petition success is to send reminders while people are increasingly signing the petition rather than after they lose interest in it. In the second essay, we investigated how people build trust or mistrust in scientific information in emergencies. We analyzed public tweets about two candidate medicines for preventing and treating COVID-19. Our results suggest that people trust and retweet information based on emotions and judgments more than information containing facts. In the third essay, we evaluated science communication during this emergency by further investigating the same dataset. We clustered all the tweets into seven groups based on the words they used and labeled each group. Then, we focused on three groups: medical, research and study, and political. Our analysis suggests that although the subject is a health-related scientific one, the political group contains more tweets than the other clusters. 
In the fourth essay, we analyzed fake news diffusion through social media and the effects of correction and competition on it. In this context, correction means a reaction to misinformation that states its falsity or provides counter-facts based on truth. We created a model and simulated competition, considering novelty as one influential factor in sharing. The results of this study reveal that active participation in correction and competition is needed to combat fake news effectively. Social Media Information Misinformation Communication System Dynamics Data Analytics Natural Language Processing
686	Analyzing Large Language Models For Classifying Sexual Harassment Stories With Out-of-Vocabulary Word Substitution Seung Yeon Paik (18419409) 25 April 2024 Sexual harassment is regarded as a serious issue in society, with a particularly negative impact on young children and adolescents. Online sexual harassment has recently gained prominence as a significant share of communication has moved online. Because the internet is global, transcending geographical barriers and allowing people to communicate electronically, online sexual harassment can happen anywhere in the world. It can occur in a wide variety of environments, such as work email or chat apps in the workplace, social media, online communities, and games (Chawki & El Shazly, 2013). However, people, especially non-native English speakers, may vary in their understanding or interpretation of text-based sexual harassment because of cultural differences and language barriers (Welsh, Carr, MacQuarrie, & Huntley, 2006). To bridge this gap, previous studies have proposed large language models to detect and classify online sexual harassment, prompting a need to explore how language models comprehend the nuanced aspects of sexual harassment data. Before exploring the role of language models, it is critical to recognize the current gaps in knowledge that these models could potentially address in comprehending and interpreting the complex nature of sexual harassment.
Large language models (LLMs) have attracted significant attention recently due to their exceptional performance on a broad spectrum of tasks. However, these models are known to be very sensitive to input data (Fujita et al., 2022; Wei, Wang, et al., 2022). Thus, the purpose of this study is to examine how various LLMs interpret data in the domain of sexual harassment and how they comprehend it after Out-of-Vocabulary words are replaced.
This research examines the impact of Out-of-Vocabulary words on the performance of LLMs in classifying sexual harassment behaviors in text. The study compares the story classification abilities of cutting-edge LLMs before and after the replacement of Out-of-Vocabulary words. Through this investigation, the study provides insights into the flexibility and contextual awareness of LLMs when handling delicate narratives in sexual harassment stories and raises awareness of sensitive social issues. Crime and social justice Natural language processing Sexual harassment Large Language Models (LLMs) out-of-vocabulary (OOV) words
687	Augmenting Large Language Models with Humor Theory To Understand Puns Ryan Rony Dsilva (18429846) 25 April 2024 This research explores the application of large language models (LLMs) to the comprehension of puns. Leveraging the expansive capabilities of LLMs, this study examines pun classification through the prism of two humor theories: the Computational Model of Humor and the Benign Violation theory, which is an extension of the N+V Theory. The computational model posits that for a phrase to qualify as a pun, it must possess both ambiguity and distinctiveness, characterized by a word that can be interpreted in two plausible ways, each interpretation being supported by at least one unique word. The Benign Violation theory, on the other hand, posits that puns work by breaching one linguistic rule while conforming to another, thereby creating a "benign violation." Using LLMs, this research scrutinizes a curated collection of English-language puns. Our aim is to assess the validity and effectiveness of these theoretical frameworks for accurately classifying puns. We undertake controlled experiments on the dataset, selectively removing a condition specific to one theory and then evaluating the puns against the criteria of the other theory to see how well it classifies the altered inputs. This approach allows us to uncover deeper insights into the processes that facilitate the recognition of puns and to explore the practical implications of applying humor theories. 
The findings of our experiments, detailed in the subsequent sections, shed light on how altering specific conditions affects the ability of LLMs to accurately classify puns according to each theory; the components of each theory do not influence the result to the same extent. This contributes to our understanding of humor mechanics through the eyes of LLMs. Natural language processing Deep learning Computational linguistics Large Language Models (LLMs) puns wordplay humor
688	A temporal analysis of natural language narrative text Ramachandran, Venkateshwaran 12 March 2009 Written English texts in the form of narratives often describe events that occur in a definite chronological sequence. Understanding the concept of time in such texts is an essential aspect of text comprehension and forms the basis for answering time-related questions about the source text. We hypothesize that time in such texts is expressed in terms of temporal orderings of the situations described and can be modelled by a linear representation of these situations. This representation conforms to the traditional view of time as linear, regarded as a horizontal line called the timeline. Information indicating the temporal ordering of events is often explicitly specified in the source text. Where such indicators are missing, semantic relations between the events enforce temporal orderings. This thesis proposes and implements a practical model for automatically processing paragraphs of narrative fiction for explicit chronological information and for applying certain guidelines to infer such information in the absence of explicit indications. Although we cannot claim to have altogether eliminated the need for expensive semantic inferencing within our model, we have devised guidelines that avoid this expense in certain cases where explicit temporal indicators are missing. Through our test data, we have also characterized some cases where semantic inferencing proves necessary to augment the capabilities of our model. / Master of Science LD5655.V855 1990.R352
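The abstract does not spell out the model's internals. As a minimal sketch, assuming the explicit and inferred orderings have already been extracted as (earlier, later) pairs, a linear timeline of the kind described can be recovered with a topological sort (Kahn's algorithm). The function name `build_timeline`, the example event labels, and the tie-breaking rule (fall back to narrative order) are illustrative assumptions, not the thesis's method.

```python
from collections import defaultdict


def build_timeline(events, before):
    """Place events on a single linear timeline (topological sort).

    events: event labels in the order the narrative mentions them.
    before: (earlier, later) pairs, from explicit temporal indicators or
            inferred semantic relations (assumed already extracted).
    Among events whose predecessors are all placed, the one mentioned
    earliest in the text goes next (a hypothetical tie-breaking rule).
    """
    position = {event: i for i, event in enumerate(events)}
    successors = defaultdict(list)
    indegree = {event: 0 for event in events}
    for earlier, later in before:
        successors[earlier].append(later)
        indegree[later] += 1

    # Kahn's algorithm: repeatedly emit an event with no unplaced predecessor.
    ready = sorted((e for e in events if indegree[e] == 0), key=position.get)
    timeline = []
    while ready:
        event = ready.pop(0)
        timeline.append(event)
        for nxt in successors[event]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
        ready.sort(key=position.get)

    if len(timeline) != len(events):
        # Some events were never freed: the constraints contain a cycle.
        raise ValueError("contradictory temporal constraints (cycle)")
    return timeline


# "She left the house. Earlier, she had eaten breakfast. Then she caught the bus."
narrative_order = ["left_house", "ate_breakfast", "caught_bus"]
constraints = [("ate_breakfast", "left_house"), ("left_house", "caught_bus")]
print(build_timeline(narrative_order, constraints))
# prints ['ate_breakfast', 'left_house', 'caught_bus']
```

Falling back to narrative order reflects the common default that narration follows chronology unless an indicator says otherwise; contradictory constraints surface as an error rather than a silently wrong timeline.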
689	EMOTION DISCOVERY IN HINDI-ENGLISH CODE-MIXED CONVERSATIONS Monika Vyas (18431835) 28 April 2024 This thesis delves into emotion recognition in Hindi-English code-mixed dialogues, focusing particularly on romanized text, which is essential for understanding multilingual communication dynamics. Using a dataset from bilingual television shows, the study employs machine learning and natural language processing techniques, with models such as Support Vector Machine, Logistic Regression, and XLM-RoBERTa tailored to handle the nuances of code-switching and transliteration in romanized Hindi-English. To combat challenges such as data imbalance, SMOTE (Synthetic Minority Over-sampling Technique) is used, improving model training and generalization. The research also explores ensemble learning with methods such as VotingClassifier to improve emotion classification accuracy. Logistic regression stands out for its high accuracy and robustness, demonstrated through rigorous cross-validation. The findings underscore the potential of advanced machine learning models and advocate further exploration of deep learning and multimodal data to enhance emotion detection in diverse linguistic settings. Natural language processing Emotion detection textual data analysis romanization Code Switching
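SMOTE creates synthetic minority-class examples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal, dependency-free sketch of that idea, assuming feature vectors are plain lists of floats. The function `smote` and its parameters are hypothetical; it draws the base sample at random (a common variant, whereas the original formulation iterates over every minority sample), and in practice one would more likely use the `SMOTE` class from the imbalanced-learn library. The thesis's actual feature representation and settings are not stated in the abstract.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: interpolate between minority samples.

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns n_synthetic new vectors, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x by Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy 2-D minority class; real inputs would be text feature vectors.
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 2.0]]
new_points = smote(minority, n_synthetic=3, k=2)
```

Because each synthetic point lies between two existing minority points, oversampling fills in the minority region rather than duplicating samples; SMOTE is normally applied only to training folds so that cross-validation scores are not inflated.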
690	Greedy Inference Algorithms for Structured and Neural Models Sun, Qing 18 January 2018 A number of problems in Computer Vision, Natural Language Processing, and Machine Learning produce structured outputs in a high-dimensional space, which makes searching for the globally optimal solution extremely expensive. Thus, greedy algorithms, which trade precision for efficiency, are widely used. Unfortunately, they generally lack theoretical guarantees. In this thesis, we prove that greedy algorithms are effective and efficient for searching for multiple top-scoring hypotheses from structured (neural) models: 1) Entropy estimation. We aim to find deterministic samples that are representative of a Gibbs distribution via a greedy strategy. 2) Searching for a set of diverse and high-quality bounding boxes. We formulate this problem as the constrained maximization of a monotone submodular function, for which a greedy algorithm with a near-optimal guarantee exists. 3) Fill-in-the-blank. The goal is to generate missing words conditioned on context, given an image. We extend Beam Search, a greedy algorithm applicable to unidirectional expansion, to bidirectional neural models where both past and future information must be considered. We test our proposed approaches on a series of Computer Vision and Natural Language Processing benchmarks and show that they are effective and efficient. / Ph. D. greedy algorithm natural language processing graph models recurrent neural networks beam search
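For a monotone submodular objective f with f(∅) = 0 under a cardinality constraint, the standard greedy algorithm (repeatedly add the candidate with the largest marginal gain) is guaranteed to reach at least a (1 - 1/e) fraction of the optimum (Nemhauser, Wolsey, and Fisher, 1978). The abstract does not give the thesis's objective, so the sketch below uses a hypothetical stand-in: each candidate box is an integer grid rectangle with a nonnegative detector score, and f(S) sums, over grid cells, the highest score among selected boxes covering that cell. This rewards high-scoring boxes while discounting overlap, and it is monotone and submodular. The names `cells`, `coverage_score`, and `greedy_max` are illustrative only.

```python
def cells(box):
    """Integer grid cells covered by box = (x0, y0, x1, y1, score)."""
    x0, y0, x1, y1, _ = box
    return [(x, y) for x in range(x0, x1) for y in range(y0, y1)]


def coverage_score(boxes):
    """Hypothetical objective: per cell, keep the best covering box's score.

    Monotone (adding a box never lowers a cell's max) and submodular
    (a box gains less once overlapping boxes are already selected).
    coverage_score([]) == 0.
    """
    best = {}
    for box in boxes:
        for cell in cells(box):
            best[cell] = max(best.get(cell, 0.0), box[4])
    return sum(best.values())


def greedy_max(candidates, f, budget):
    """Greedily pick up to `budget` candidates by largest marginal gain."""
    selected = []
    for _ in range(budget):
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        base = f(selected)
        best = max(remaining, key=lambda c: f(selected + [c]) - base)
        if f(selected + [best]) - base <= 0:
            break  # nothing left adds value
        selected.append(best)
    return selected


A = (0, 0, 2, 2, 0.9)  # high-scoring box
B = (1, 0, 3, 2, 0.8)  # overlaps half of A
C = (4, 0, 6, 2, 0.7)  # lower score but disjoint from A
print(greedy_max([A, B, C], coverage_score, budget=2))
# prints [(0, 0, 2, 2, 0.9), (4, 0, 6, 2, 0.7)]
```

With a budget of 2, the greedy pass takes A first and then prefers the lower-scoring but disjoint C over B, because half of B's area is already covered by A; this is the diversity-versus-quality trade-off the submodular formulation is meant to capture.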
