1. Mental Illness and Suicide Ideation Detection Using Social Media Data
Kirinde Gamaarachchige, Prasadith Buddhitha, 12 November 2021
Mental disorders and suicide have become a global public health problem. Over the years, researchers in computational linguistics have extracted features from social media data for the early detection of users susceptible to mental disorders and suicide ideation. The lack of reliable and adequate data and the requirement of interpretability can be identified as the principal reasons for the low adoption of neural network architectures in recognizing individuals with mental disorders and suicide ideation. In recent years, the use of deep neural network architectures for detecting mental disorders and suicide ideation has gradually increased, making detection with low false positive and false negative rates feasible. Our research investigates the efficacy of using a shared representation to learn lower-level features common among mental disorders and between mental disorders and suicide ideation. In addition to discovering the features shared between users with suicidal thoughts and users who self-declared a single mental disorder, we further investigate the impact of comorbidities on suicide ideation and use two unseen datasets to assess the generalizability of the trained models. We use data from two different social media platforms to determine whether knowledge can be shared between suicide ideation and mental illness detection tasks across platforms. Through multiple experiments with different but related tasks, we demonstrate the effectiveness of multi-task learning (MTL) when predicting users with mental disorders and suicide ideation. We produce competitive results using MTL with hard parameter sharing when predicting neurotypical users, users who might have PTSD (Post-Traumatic Stress Disorder), and users with depression. The results were further improved by using auxiliary inputs such as emotion, age, and gender.
To predict users with suicide ideation or mental disorders (i.e., either single or multiple disorders), we use MTL with hard and soft parameter sharing and produce state-of-the-art results when predicting users with suicide ideation who require urgent attention. For similar tasks, but with data from two different social media platforms, we further improve the state-of-the-art results when predicting users with suicide ideation who require urgent attention. In addition, we improve the overall performance of the models by using different auxiliary inputs.
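To make the hard-parameter-sharing idea concrete, here is a minimal sketch in PyTorch: one shared encoder feeds separate heads for the suicide ideation and mental disorder tasks. The encoder choice, layer sizes, label sets, and training details are illustrative assumptions, not the architecture reported in the thesis.

import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Minimal multi-task model with hard parameter sharing:
    one shared encoder feeding separate task-specific heads."""
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=64):
        super().__init__()
        # Shared layers: updated by gradients from every task.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Task-specific heads (hypothetical label counts).
        self.suicide_head = nn.Linear(hidden_dim, 2)    # at-risk vs. control
        self.disorder_head = nn.Linear(hidden_dim, 3)   # e.g. control / PTSD / depression

    def forward(self, token_ids):
        emb = self.embedding(token_ids)
        _, (h_n, _) = self.encoder(emb)
        shared = h_n[-1]                                # shared representation
        return self.suicide_head(shared), self.disorder_head(shared)

model = HardSharingMTL()
tokens = torch.randint(1, 30000, (4, 50))              # dummy batch: 4 posts, 50 tokens each
suicide_logits, disorder_logits = model(tokens)
# Joint loss: the shared encoder receives gradients from both tasks.
loss = (nn.functional.cross_entropy(suicide_logits, torch.tensor([0, 1, 0, 1]))
        + nn.functional.cross_entropy(disorder_logits, torch.tensor([0, 2, 1, 0])))
loss.backward()

Auxiliary inputs such as predicted emotion, age, or gender could, for example, be concatenated to the shared representation before the task heads; soft parameter sharing would instead give each task its own encoder and tie the encoders together through a regularization term.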
2. Social Fairness in Semi-Supervised Toxicity Text Classification
Shayesteh, Shahriar, 11 July 2023
The rapid growth of user-generated text on social media platforms has made moderating toxic language manually an increasingly challenging task. Consequently, researchers have turned to artificial intelligence (AI) and machine learning (ML) models to detect and classify toxic comments automatically. However, these models often exhibit unintended bias against comments containing sensitive terms related to demographic groups, such as race and gender, which leads to unfair classifications of samples. In addition, most existing research on this topic focuses on fully supervised learning frameworks, so there is a growing need to explore fairness in semi-supervised toxicity detection, where annotating large amounts of data is not required. In this thesis, we aim to address this gap by developing a fair generative-based semi-supervised framework for mitigating social bias in toxicity text classification. The framework consists of two parts: first, we train a semi-supervised generative-based text classification model on benchmark toxicity datasets; second, we mitigate social bias in the trained classifier using adversarial debiasing to improve fairness. We use two semi-supervised generative-based text classification models, NDAGAN and GANBERT (the former adds negative data augmentation to address some of the problems in GANBERT), to propose two fair semi-supervised models, FairNDAGAN and FairGANBERT. Finally, we compare the proposed fair semi-supervised models against baselines in terms of accuracy and fairness (equalized odds difference) to clarify, for the first time, the challenges of social fairness in semi-supervised toxicity text classification.

Based on the experimental results, the key contributions of this research are as follows. First, we propose, for the first time, a fair semi-supervised generative-based framework for toxicity text classification. Second, we show that fairness can be achieved in semi-supervised toxicity text classification without a considerable loss of accuracy. Third, we demonstrate that achieving fairness at the coarse-grained level improves fairness at the fine-grained level but does not always guarantee it. Fourth, we analyze the impact of the labeled and unlabeled data on fairness and accuracy in the studied semi-supervised framework. Finally, we demonstrate the susceptibility of the supervised and semi-supervised models to data imbalance in terms of both accuracy and fairness.
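For readers unfamiliar with the fairness metric used above, the sketch below computes the equalized odds difference: the largest gap between demographic groups in either true positive rate or false positive rate. The groups, labels, and predictions are toy values for illustration only, not data or code from the thesis.

import numpy as np

def equalized_odds_difference(y_true, y_pred, group):
    """Largest gap across demographic groups in TPR or FPR.
    A classifier that satisfies equalized odds scores 0."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs, fprs = [], []
    for g in np.unique(group):
        m = group == g
        pos, neg = (y_true[m] == 1), (y_true[m] == 0)
        tprs.append(y_pred[m][pos].mean() if pos.any() else 0.0)  # true positive rate
        fprs.append(y_pred[m][neg].mean() if neg.any() else 0.0)  # false positive rate
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy example: 1 = predicted toxic; groups A and B stand in for identity terms.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(equalized_odds_difference(y_true, y_pred, group))  # 0.5 for these toy values

Fairness toolkits such as Fairlearn offer a comparable built-in metric, but the hand-rolled version makes the definition explicit.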
3. Course Quizard: Providing students additional ways to study
Hellgren, Daniel; Ljus, Simon; Wang, Tony, January 2016
Studies show that students today have problems learning. We have developed an application to address these issues. We call it Course Quizard, and it targets university students who want a supplemental way to study. Users can play course-specific quizzes against each other, containing questions created by other users or automatically generated from course material. A small group of students tested the application; they felt the idea was valuable but would rather recommend it to a friend than play it themselves. With further development, Course Quizard has the potential to become an excellent aid for studying course material in an additional way.
4. Using data-driven resources for optimising rule-based syntactic analysis for Modern Standard Arabic
Elbey, Mohamed, January 2014
This thesis is about optimising a rule-based parser for Modern Standard Arabic (MSA). Ambiguity is a major problem in NLP systems, and it is even worse in MSA, partly because written MSA omits short vowels, along with other reasons discussed in Chapter 1. Analysis of the original rule-based parser showed that much of its work was unnecessary: many edges were produced that were never used in the final analysis. The first part of this thesis investigates whether integrating a Part-of-Speech (POS) tagger helps speed up parsing. This is a well-known technique for Romance and Germanic languages, but its effectiveness has not been widely explored for MSA. The second part of the thesis uses statistics and machine learning techniques and investigates their effect on the parser. This thesis is not about the accuracy of the parser; it is about finding ways to improve its speed. A new approach is discussed that has not been explored in statistical parsing before: collecting statistics while parsing and using them to learn strategies to apply during the parsing process. The learning process involves all the parser's moves, both moves that lead to the final analysis (good moves) and moves that lead away from it (bad moves). The idea is that we learn not only from positive data but also from negative data. The questions to be asked are: why is a move good, so that we can encourage it, and why is a move bad, so that we can discourage it? In the final part of the thesis, both techniques, integrating a POS tagger and using the learning approach, were merged, and their combined effect on the parser was measured.
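As a rough illustration of the first idea, the toy sketch below shows how committing to one POS tag per word shrinks the number of lexical edges a chart parser has to seed before any phrase-level edges are built. The lexicon, tag preferences, and sentence are invented English placeholders, not the MSA resources or parser used in the thesis.

# Toy ambiguous lexicon: each word maps to every POS tag it could take.
LEXICON = {
    "the":  {"DET"},
    "old":  {"ADJ", "NOUN"},
    "man":  {"NOUN", "VERB"},
    "saw":  {"NOUN", "VERB"},
    "boat": {"NOUN", "VERB"},
}

def toy_tagger(word):
    """Stand-in for a trained POS tagger: picks a single tag by a fixed
    preference order; a real tagger would look at the context."""
    for tag in ("DET", "NOUN", "VERB", "ADJ"):
        if tag in LEXICON[word]:
            return tag

sentence = ["the", "old", "man", "saw", "the", "boat"]

# Without a tagger the parser must add one lexical edge per possible tag.
edges_without_tagger = sum(len(LEXICON[w]) for w in sentence)

# With a tagger it adds exactly one lexical edge per word.
edges_with_tagger = len([toy_tagger(w) for w in sentence])

print("lexical edges without tagger:", edges_without_tagger)  # 10
print("lexical edges with tagger:   ", edges_with_tagger)     # 6

Fewer lexical edges means fewer phrase-level edges later, which is where the speed-up comes from; the trade-off is that a tagger that commits to the wrong tag can prune the correct analysis away, which is why the thesis asks whether the tagger helps rather than assuming it does.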
5. The Effects of the Use of Natural Language Processing and Task Complexity on Jurors' Assessments of Auditor Negligence
Cui, Junnan, 08 1900
The purpose of my dissertation is to examine jurors' evaluation of auditor negligence in response to auditors' use of natural language processing (NLP). To test my research objective, I conducted a 2x2 between-subjects experiment with 175 jury-eligible individuals. In the online experiment, I manipulated whether the audit team analyzes contracts with NLP software or has human auditors read the contracts. I also manipulated task complexity as complex or simple. The dependent variables include a binary verdict variable and a scaled assessment of negligence. This dissertation makes several contributions to the accounting literature and practice. First, it contributes to the recent juror literature on emerging technologies by providing evidence that jurors attribute higher negligence assessments to auditors when auditors use NLP to examine contracts than when human auditors examine contracts. I also find that auditors' use of NLP leads to jurors' higher perceived causation, which, in turn, increases jurors' assessments of auditor liability. Second, this study answers the call of other researchers to examine the relationship between task complexity and negligence in different settings. I also find a marginally significant interaction effect: the effect of auditors' use of NLP rather than human review on negligence assessments is greater for complex tasks than for simple tasks. Third, this dissertation provides new insights for practitioners and accounting firms on using emerging algorithm-based AI technologies such as NLP. As more AI technologies are used in audit practice, the findings will provide helpful insights for audit practitioners to consider when they utilize technologies to design and implement audit procedures.
6. API pro ovládání robota v přirozeném jazyce / API for natural language robot control
Etenkowski, Bartlomiej, January 2012
No description available.
7. Simultaneously Acquiring the Syntax and Semantics of Spatial Referring Expressions
Wright, Jeremy Bryan, January 2014
To be useful for communication, language must be grounded in perceptions of the world, but acquiring such grounded language is a challenging task that increases in difficulty as the length and syntactic complexity of utterances grow. Several state-of-the-art methods exist to learn complex grounded language from unannotated utterances; however, each requires that the semantic system of the language be completely defined ahead of time. This expectation is problematic, as it assumes not only that agents must have complete semantic understanding before starting to learn language, but also that the human designers of these systems can accurately transcribe the semantics of human languages in great detail. This paper presents Reagent, a construction grammar framework for concurrently learning the syntax and semantics of complex English referring expressions, with an emphasis on spatial referring expressions. Rather than requiring fully predefined semantic representations, Reagent only requires access to a set of semantic primitives from which it can build appropriate representations. The results presented here demonstrate that Reagent can acquire constructions that are missing from its starting grammar by observing the contextual utterances of a fully fluent agent, can approach fluent accuracy at inferring the referent of such expressions, and learns meanings that are qualitatively similar to the constructions of the agent from which it is learning. We propose that this approach could be expanded to other types of expressions and languages, and that it forms a solid foundation for general natural language acquisition.
8. Using minimal recursion semantics in Japanese question answering
Dridan, Rebecca, Unknown Date
Question answering is a research field with the aim of providing answers to a user’s question, phrased in natural language. In this thesis I explore some techniques used in question answering, working towards the twin goals of using deep linguistic knowledge robustly as well as using language-independent methods wherever possible. While the ultimate aim is cross-language question answering, in this research experiments are conducted over Japanese data, concentrating on factoid questions. The two main focus areas, identified as the two tasks most likely to benefit from linguistic knowledge, are question classification and answer extraction.

In question classification, I investigate the issues involved in the two common methods used for this task—pattern matching and machine learning. I find that even with a small amount of training data (2000 questions), machine learning achieves better classification accuracy than pattern matching with much less effort. The other issue I explore in question classification is the classification accuracy possible with named entity taxonomies of different sizes and shapes. Results demonstrate that, although the accuracy decreases as the taxonomy size increases, the ability to use soft decision making techniques, as well as the high accuracies achieved in certain classes, makes larger, hierarchical taxonomies a viable option.

For answer extraction, I use Robust Minimal Recursion Semantics (RMRS) as a sentence representation to determine similarity between questions and answers, and then use this similarity score, along with other information discovered during comparison, to score and rank answer candidates. Results were slightly disappointing, but close examination showed that 40% of errors were due to answer candidate extraction, and the scoring algorithm worked very well. Interestingly, despite the lower accuracy achieved during question classification, the larger named entity taxonomies allowed much better accuracy in answer extraction than the smaller taxonomies.
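As a rough sketch of the machine-learning side of question classification, the snippet below trains a small classifier to map questions to answer types. It assumes scikit-learn, uses a handful of invented English questions, and collapses the taxonomy to four classes; the thesis works with Japanese data and considerably larger named entity taxonomies.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny invented training set: question text -> expected answer type.
train_questions = [
    "Who wrote The Tale of Genji?",
    "Who is the prime minister of Japan?",
    "When did the Meiji era begin?",
    "When was the Tokyo tower built?",
    "Where is Mount Fuji located?",
    "Where was the treaty signed?",
    "How many islands does Japan have?",
    "How many people live in Osaka?",
]
train_labels = ["PERSON", "PERSON", "DATE", "DATE",
                "LOCATION", "LOCATION", "NUMBER", "NUMBER"]

# Character n-grams need no language-specific tokenisation.
classifier = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
classifier.fit(train_questions, train_labels)

print(classifier.predict(["Who founded the company?",
                          "Where is the head office?"]))

Character n-grams are used here only because they avoid language-specific tokenisation, which is in the spirit of the language-independent methods the thesis aims for.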
9. Modèles statistiques pour la prédiction de cadres sémantiques / Statistical models for semantic frame prediction
Michalon, Olivier, 04 October 2017
In natural language processing, successive analysis steps have improved the way in which language can be modeled by machines. One analysis step that remains poorly mastered is semantic parsing. This type of analysis would enable many advances, such as better human-machine interaction or more reliable translation. Several meaning representation structures exist, such as PropBank, AMR, and FrameNet. FrameNet corresponds to the frame semantics framework whose theory was described by Charles Fillmore (1971). In this theory, each prototypical situation and the elements involved in it are represented in such a way that two similar situations are represented by the same object, called a semantic frame. The FrameNet project is an application of this theory in which several hundred prototypical situations are defined. The work described here continues previous work on the automatic prediction of frame semantic representations. We present four prediction systems, each of which validated a hypothesis about the properties required for effective prediction. We also show that semantic parsing can be improved by providing the prediction models with refined information as input: on the one hand, a syntactic analysis in which deep links are made explicit, and on the other, vector representations of the vocabulary learned beforehand.
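To make the notion of a semantic frame more concrete, the sketch below represents two sentences describing the same buying situation with one shared frame object. The data structure and the example sentences are illustrative; the frame and role names are borrowed from FrameNet's Commerce_buy frame, and this is not code from the thesis.

from dataclasses import dataclass, field

@dataclass
class FrameInstance:
    """One semantic frame evoked by a target word in a sentence."""
    sentence: str
    target: str                                   # the frame-evoking word
    frame: str                                    # prototypical situation, e.g. Commerce_buy
    elements: dict = field(default_factory=dict)  # role -> text span

# Two surface-different sentences mapped to the same frame, which is the
# point of the representation: similar situations share one object.
a = FrameInstance(
    sentence="Anna bought a bicycle from Tom for 50 euros",
    target="bought",
    frame="Commerce_buy",
    elements={"Buyer": "Anna", "Goods": "a bicycle", "Seller": "Tom", "Money": "50 euros"},
)
b = FrameInstance(
    sentence="The bicycle was purchased by Anna",
    target="purchased",
    frame="Commerce_buy",
    elements={"Buyer": "Anna", "Goods": "The bicycle"},
)
print(a.frame == b.frame)  # True: same situation, same frame

A frame prediction system produces structures like these automatically from raw sentences; the four systems described above differ in the inputs they rely on to do so, such as explicit deep syntactic links or pretrained vector representations of the vocabulary.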
10. Web Conference Summarization Through a System of Flags
Ankola, Annirudh M, 01 March 2020
In today’s world, we are always trying to find new ways to advance. This era has given rise to a global, distributed workforce, since technology allows people to communicate with individuals all over the world. With the rise of remote workers, the need for quality communication tools has risen significantly. These tools come in many forms, and web conference apps are among the most prominent. Developing a system to automatically summarize a web conference will save companies time and money, leading to more efficient meetings. Current approaches to summarizing multi-speaker web conferences tend to yield poor or incoherent results, since conversations do not flow in the same manner that monologues or well-structured articles do. This thesis proposes a system of flags used to extract information from sentences, where the flags are fed into machine learning models to determine the importance of the sentence with which they are associated. The system of flags shows promise for multi-speaker conference summaries.
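A minimal sketch of the flag idea follows: each utterance is converted to a small vector of flags, and a classifier learns which flagged sentences belong in the summary. The particular flags, the speaker-activity feature, and the tiny training set are invented for illustration and assume scikit-learn; they are not the flag set or models defined in the thesis.

import re
from sklearn.linear_model import LogisticRegression

ACTION_WORDS = {"will", "should", "need", "must", "deadline", "assign"}

def flags(utterance: str, speaker_turns: int) -> list:
    """Turn one utterance into a small vector of flags (invented examples)."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return [
        int(any(w in ACTION_WORDS for w in words)),  # mentions an action item
        int("?" in utterance),                       # is a question
        int(len(words) > 8),                         # longish, likely contentful
        int(bool(re.search(r"\d", utterance))),      # contains a number or date
        speaker_turns,                               # how active this speaker is
    ]

# Tiny invented training set: (utterance, speaker activity, include in summary?).
train = [
    ("We need to ship the release by Friday the 12th.", 5, 1),
    ("Ok.", 2, 0),
    ("Can you assign the login bug to Dana?", 5, 1),
    ("Haha, yeah.", 1, 0),
    ("The budget for Q3 will be 40k.", 3, 1),
    ("Sorry, you were muted.", 1, 0),
]
X = [flags(u, s) for u, s, _ in train]
y = [label for _, _, label in train]

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict([flags("We should set the next deadline for May 3rd.", 4)]))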