About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Mental Illness and Suicide Ideation Detection Using Social Media Data

Kirinde Gamaarachchige, Prasadith Buddhitha 12 November 2021 (has links)
Mental disorders and suicide have become a global public health problem. Over the years, researchers in computational linguistics have extracted features from social media data for the early detection of users susceptible to mental disorders and suicide ideation. The lack of reliable, adequate data and the requirement of interpretability can be identified as the principal reasons for the low adoption of neural network architectures in recognizing individuals with mental disorders and suicide ideation. In recent years, deep neural network architectures have gradually made it feasible to detect mental disorders and suicide ideation with low false-positive and false-negative rates. Our research investigates the efficacy of using a shared representation to learn lower-level features mutual among mental disorders and between mental disorders and suicide ideation. In addition to discovering the features shared between users with suicidal thoughts and users who self-declared a single mental disorder, we further investigate the impact of comorbidities on suicide ideation and use two unseen datasets to investigate the generalizability of the trained models. We use data from two different social media platforms to determine whether knowledge can be shared between suicide ideation and mental illness detection tasks across platforms. Through multiple experiments with different but related tasks, we demonstrate the effectiveness of multi-task learning (MTL) when predicting users with mental disorders and suicide ideation. We produce competitive results using MTL with hard parameter sharing when predicting neurotypical users, users who might have post-traumatic stress disorder (PTSD), and users with depression. The results were further improved by using auxiliary inputs such as emotion, age, and gender.
To predict users with suicide ideation or mental disorders (i.e., either single or multiple disorders), we use MTL with hard and soft parameter sharing and produce state-of-the-art results in predicting users with suicide ideation who require urgent attention. For similar tasks, but with data from two different social media platforms, we further improve the state-of-the-art results when predicting users with suicide ideation who require urgent attention. In addition, we improved the overall performance of the models by using different auxiliary inputs.
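The hard-parameter-sharing setup described above can be sketched as one shared encoder feeding several task-specific heads. The following is a minimal NumPy illustration of that shape only; the dimensions, task names, and single-layer encoder are invented for the example and are not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Hard parameter sharing: one shared encoder, one small head per task.
# Dimensions and task names are illustrative, not taken from the thesis.
DIM_IN, DIM_SHARED = 300, 64  # e.g. averaged text embedding in, shared features out
TASKS = ["depression", "ptsd", "suicide_ideation"]

# The shared weights would receive gradients from ALL tasks during training.
W_shared = rng.normal(0, 0.1, (DIM_IN, DIM_SHARED))
# Each head holds task-specific parameters (here: binary classification).
heads = {t: rng.normal(0, 0.1, (DIM_SHARED, 2)) for t in TASKS}

def forward(x, task):
    """Shared lower-level representation, then a task-specific softmax head."""
    h = relu(x @ W_shared)        # features mutual among the related tasks
    logits = h @ heads[task]
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=DIM_IN)       # one (fake) user representation
probs = {t: forward(x, t) for t in TASKS}
for t, p in probs.items():
    print(t, p.round(3))
```

Auxiliary inputs such as emotion, age, or gender would simply be concatenated onto `x` (or onto `h`) before the heads, which is one common way to realize the improvement the abstract reports.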
2

Social Fairness in Semi-Supervised Toxicity Text Classification

Shayesteh, Shahriar 11 July 2023 (has links)
The rapid growth of user-generated text on social media platforms has made moderating toxic language manually an increasingly challenging task. Consequently, researchers have turned to artificial intelligence (AI) and machine learning (ML) models to detect and classify toxic comments automatically. However, these models often exhibit unintended bias against comments containing sensitive terms related to demographic groups, such as race and gender, which leads to unfair classifications of samples. In addition, most existing research on this topic focuses on fully supervised learning frameworks. Therefore, there is a growing need to explore fairness in semi-supervised toxicity detection due to the difficulty of annotating large amounts of data. In this thesis, we aim to address this gap by developing a fair generative-based semi-supervised framework for mitigating social bias in toxicity text classification. This framework consists of two parts: first, we train a semi-supervised generative-based text classification model on benchmark toxicity datasets; second, we mitigate social bias in the trained classifier using adversarial debiasing to improve fairness. In this work, we use two different semi-supervised generative-based text classification models, NDAGAN and GANBERT (the former adds negative data augmentation to address some of the problems in GANBERT), to propose two fair semi-supervised models called FairNDAGAN and FairGANBERT. Finally, we compare the performance of the proposed fair semi-supervised models in terms of accuracy and fairness (equalized odds difference) against baselines to clarify, for the first time, the challenges of social fairness in semi-supervised toxicity text classification.
Based on the experimental results, the key contributions of this research are: first, we propose a novel fair semi-supervised generative-based framework for toxicity text classification. Second, we show that fairness can be achieved in semi-supervised toxicity text classification without considerable loss of accuracy. Third, we demonstrate that achieving fairness at the coarse-grained level improves fairness at the fine-grained level but does not always guarantee it. Fourth, we examine the impact of the labeled and unlabeled data on fairness and accuracy in the studied semi-supervised framework. Finally, we demonstrate the susceptibility of supervised and semi-supervised models to data imbalance in terms of accuracy and fairness.
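The fairness metric the thesis reports, equalized odds difference, is the largest gap in true-positive rate or false-positive rate across demographic groups. A minimal pure-Python sketch of that computation (the toy labels and the two groups "a"/"b" below are invented for illustration):

```python
def rates(y_true, y_pred, group, g):
    """True-positive and false-positive rates for one demographic group."""
    tp = fn = fp = tn = 0
    for yt, yp, gr in zip(y_true, y_pred, group):
        if gr != g:
            continue
        if yt == 1:
            if yp == 1: tp += 1
            else:       fn += 1
        else:
            if yp == 1: fp += 1
            else:       tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

def equalized_odds_difference(y_true, y_pred, group):
    """Max gap in TPR or FPR across groups; 0.0 means perfectly fair."""
    groups = sorted(set(group))
    tprs, fprs = zip(*(rates(y_true, y_pred, group, g) for g in groups))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy toxicity predictions (1 = toxic) for two groups "a" and "b"
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(equalized_odds_difference(y_true, y_pred, group))  # 0.5
```

Adversarial debiasing, as used in the thesis, trains the classifier so that an adversary cannot recover the group attribute from its predictions, which drives this gap toward zero.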
3

Course Quizard : Providing students additional ways to study

Hellgren, Daniel, Ljus, Simon, Wang, Tony January 2016 (has links)
Studies show that students today have problems learning. We have developed an application to combat these issues. We call it Course Quizard, and it targets university students who want a supplemental way to study. Users can play course-specific quizzes against each other, containing questions created by other users or automatically generated from course material. A small group of students tested the application; they felt the idea was valuable but would rather recommend it to a friend than play it themselves. With future development, Course Quizard has the potential to become an excellent study alternative.
4

Using data-driven resources for optimising rule-based syntactic analysis for modern standard Arabic

Elbey, Mohamed January 2014 (has links)
This thesis is about optimising a rule-based parser for Modern Standard Arabic (MSA). If ambiguity is a major problem in NLP systems, it is even worse in a language like MSA, because written MSA omits short vowels, and for other reasons that will be discussed in Chapter 1. Analysis of the original rule-based parser showed that much of its work was unnecessary: many edges were produced but never used in the final analysis. The first part of this thesis investigates whether integrating a part-of-speech (POS) tagger helps speed up parsing. This is a well-known technique for Romance and Germanic languages, but its effectiveness has not been widely explored for MSA. The second part of the thesis uses statistics and machine learning techniques and investigates their effects on the parser. This thesis is not about the accuracy of the parser; it is about finding ways to improve its speed. A new approach will be discussed, not previously explored in statistical parsing: collecting statistics while parsing, and using them to learn strategies to apply during the parsing process. The learning process involves all the moves of the parser — both moves that lead to the final analysis (good moves) and moves that lead away from it (bad moves). The idea is that we learn not only from positive data but also from negative data. The questions to be asked are: why is this move good, so that we can encourage it? Why is this move bad, so that we can discourage it? In the final part of the thesis, both techniques are merged — integrating a POS tagger and using the learning approach — and their combined effect on the parser is measured.
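The learn-from-good-and-bad-moves idea above can be sketched as simple counting: record, for each parse move, whether the edge it built survived into the final analysis, then score moves by their smoothed success rate. The move representation below (bare rule strings) and the Laplace smoothing are invented for illustration; they are not the thesis's actual feature set.

```python
from collections import defaultdict

class MoveScorer:
    """Score parser moves from statistics collected during parsing.

    A "move" here is any hashable description of a parser action.
    Moves whose edges reach the final analysis count as good; the
    rest count as bad. Laplace smoothing keeps unseen moves at 0.5.
    """

    def __init__(self):
        self.good = defaultdict(int)
        self.bad = defaultdict(int)

    def record(self, move, used_in_final_parse):
        if used_in_final_parse:
            self.good[move] += 1
        else:
            self.bad[move] += 1

    def score(self, move):
        g, b = self.good[move], self.bad[move]
        return (g + 1) / (g + b + 2)   # smoothed P(move is useful)

scorer = MoveScorer()
# Replay recorded parses: (move, did its edge survive into the final tree?)
for move, ok in [("NP->Det N", True), ("NP->Det N", True),
                 ("VP->V NP", True), ("NP->N N", False), ("NP->N N", False)]:
    scorer.record(move, ok)

print(scorer.score("NP->Det N"))   # high: encourage this move
print(scorer.score("NP->N N"))     # low: discourage this move
```

During parsing, such scores could order or prune the agenda so that edges unlikely to be used are built late or not at all, which is the speed-up mechanism the abstract describes.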
5

API pro ovládání robota v přirozeném jazyce / API for natural language robot control

Etenkowski, Bartlomiej January 2012 (has links)
No description available.
6

Simultaneously Acquiring the Syntax and Semantics of Spatial Referring Expressions

Wright, Jeremy Bryan January 2014 (has links)
To be useful for communication, language must be grounded in perceptions of the world, but acquiring such grounded language is a challenging task that increases in difficulty as the length and syntactic complexity of utterances grow. Several state-of-the-art methods exist to learn complex grounded language from unannotated utterances; however, each requires that the semantic system of the language be completely defined ahead of time. This expectation is problematic, as it assumes not only that agents must have complete semantic understanding before starting to learn language, but also that the human designers of these systems can accurately transcribe the semantics of human languages in great detail. This paper presents Reagent, a construction grammar framework for concurrently learning the syntax and semantics of complex English referring expressions, with an emphasis on spatial referring expressions. Rather than requiring fully predefined semantic representations, Reagent only requires access to a set of semantic primitives from which it can build appropriate representations. The results presented here demonstrate that Reagent can acquire constructions that are missing from its starting grammar by observing the contextual utterances of a fully fluent agent, can approach fluent accuracy at inferring the referent of such expressions, and learns meanings that are qualitatively similar to the constructions of the agent from which it is learning. We propose that this approach could be expanded to other types of expressions and languages, and forms a solid foundation for general natural language acquisition.
7

Using minimal recursion semantics in Japanese question answering

Dridan, Rebecca Unknown Date (has links) (PDF)
Question answering is a research field with the aim of providing answers to a user’s question, phrased in natural language. In this thesis I explore some techniques used in question answering, working towards the twin goals of using deep linguistic knowledge robustly as well as using language-independent methods wherever possible. While the ultimate aim is cross-language question answering, in this research experiments are conducted over Japanese data, concentrating on factoid questions. The two main focus areas, identified as the two tasks most likely to benefit from linguistic knowledge, are question classification and answer extraction.

In question classification, I investigate the issues involved in the two common methods used for this task—pattern matching and machine learning. I find that even with a small amount of training data (2000 questions), machine learning achieves better classification accuracy than pattern matching with much less effort. The other issue I explore in question classification is the classification accuracy possible with named entity taxonomies of different sizes and shapes. Results demonstrate that, although the accuracy decreases as the taxonomy size increases, the ability to use soft decision making techniques as well as high accuracies achieved in certain classes make larger, hierarchical taxonomies a viable option.

For answer extraction, I use Robust Minimal Recursion Semantics (RMRS) as a sentence representation to determine similarity between questions and answers, and then use this similarity score, along with other information discovered during comparison, to score and rank answer candidates. Results were slightly disappointing, but close examination showed that 40% of errors were due to answer candidate extraction, and the scoring algorithm worked very well. Interestingly, despite the lower accuracy achieved during question classification, the larger named entity taxonomies allowed much better accuracy in answer extraction than the smaller taxonomies.
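The score-and-rank step above can be illustrated with a much-simplified stand-in for RMRS comparison: treating each sentence as a bag of semantic predicates and ranking candidates by predicate overlap. The Dice overlap and the predicate names below are invented for the example; the thesis's actual algorithm also matches arguments and scopal information.

```python
def predicate_overlap(q_preds, a_preds):
    """Dice coefficient over semantic predicates: 2|Q∩A| / (|Q|+|A|)."""
    q, a = set(q_preds), set(a_preds)
    if not q or not a:
        return 0.0
    return 2 * len(q & a) / (len(q) + len(a))

def rank_candidates(question_preds, candidates):
    """Score each (text, predicates) candidate against the question, best first."""
    scored = [(predicate_overlap(question_preds, preds), text)
              for text, preds in candidates]
    return sorted(scored, reverse=True)

question = ["_win_v", "_prize_n", "named(nobel)"]
candidates = [
    ("Sentence about winning the Nobel prize",
     ["_win_v", "_prize_n", "named(nobel)", "named(oe)"]),
    ("Sentence about a prize committee",
     ["_prize_n", "_committee_n"]),
]
for score, text in rank_candidates(question, candidates):
    print(round(score, 2), text)
```

In the thesis, information discovered during this comparison (not just the final score) also feeds the ranking, which a flat overlap number cannot capture.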
8

Modèles statistiques pour la prédiction de cadres sémantiques / Statistical models for semantic frame prediction

Michalon, Olivier 04 October 2017 (has links)
In natural language processing, the usual analysis steps have successively improved the way language can be modeled by machines. One analysis step that remains poorly mastered is semantic analysis. This type of analysis could enable many advances, such as better human-machine interaction or more reliable translation. There exist several meaning representation structures, such as PropBank, AMR and FrameNet. FrameNet corresponds to the frame semantics framework whose theory was described by Charles Fillmore. In this theory, each prototypical situation and the different elements involved in it are represented in such a way that two similar situations are represented by the same object, called a semantic frame. The FrameNet project is an application of this theory in which several hundred prototypical situations are defined. The work described here continues previous work on the automatic prediction of semantic frames. We present four prediction systems, each of which validated a hypothesis about the properties necessary for effective prediction. We also show that our analysis can be improved by providing the prediction models with refined information as input: on one hand, a syntactic analysis in which deep links are made explicit, and on the other, vectorial representations of the vocabulary learned beforehand.
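To make the frame-prediction task concrete, the first step (frame identification) can be sketched as disambiguating which frame a predicate word evokes from its context. The tiny lexicon, frame names' definition words, and overlap heuristic below are invented for illustration; FrameNet's real lexicon and the thesis's statistical models are far richer.

```python
# A predicate word may evoke several candidate frames; pick the frame whose
# (invented) definition words overlap most with the sentence context.
LEXICON = {
    "ran": ["Self_motion", "Operating_a_system"],
}
FRAME_WORDS = {
    "Self_motion": {"move", "path", "road", "park", "distance"},
    "Operating_a_system": {"system", "server", "software", "machine"},
}

def identify_frame(predicate, context_words):
    """Return the best candidate frame for a predicate, or None if unknown."""
    candidates = LEXICON.get(predicate, [])
    if not candidates:
        return None
    return max(candidates,
               key=lambda f: len(FRAME_WORDS[f] & set(context_words)))

print(identify_frame("ran", ["she", "ran", "down", "the", "road"]))
print(identify_frame("ran", ["he", "ran", "the", "server", "software"]))
```

A full frame-semantic parser would follow this step with argument identification and role labeling, which is where the syntactic deep links and pre-trained word vectors mentioned in the abstract come in.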
9

Web Conference Summarization Through a System of Flags

Ankola, Annirudh M 01 March 2020 (has links)
In today’s world, we are always trying to find new ways to advance. This era has given rise to a global, distributed workforce, since technology has allowed people to access and communicate with individuals all over the world. With the rise of remote workers, the need for quality communication tools has risen significantly. These communication tools come in many forms, and web-conference apps are among the most prominent. Developing a system to automatically summarize a web conference will save companies time and money, leading to more efficient meetings. Current approaches to summarizing multi-speaker web conferences tend to yield poor or incoherent results, since conversations do not flow in the same manner that monologues or well-structured articles do. This thesis proposes a system of flags used to extract information from sentences, where the flags are fed into machine learning models to determine the importance of the sentence with which they are associated. The system of flags shows promise for multi-speaker conference summaries.
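The flag idea above amounts to extracting per-sentence features that a downstream classifier scores for importance. A minimal sketch, with flags invented for illustration (the thesis defines its own flag set):

```python
import re

def extract_flags(sentence, speaker, prev_speaker):
    """Boolean/numeric "flags" for one conference utterance.

    These would be fed to an ML model that predicts whether the
    sentence belongs in the meeting summary.
    """
    words = sentence.lower().split()
    return {
        "has_question": int("?" in sentence),
        "has_action_verb": int(any(w in {"schedule", "send", "review", "fix"}
                                   for w in words)),
        "has_number_or_date": int(bool(re.search(r"\d", sentence))),
        "speaker_changed": int(speaker != prev_speaker),
        "length": len(words),
    }

flags = extract_flags("Can you send the report by Friday the 12th?",
                      speaker="alice", prev_speaker="bob")
print(flags)
```

Encoding each sentence as such a fixed-length flag vector is what lets conventional classifiers handle conversational text that lacks the structure of a monologue or article.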
10

Comparing Text Classification Libraries in Scala and Python : A comparison of precision and recall

Garamvölgyi, Filip, Henning Bruce, August January 2021 (has links)
In today’s internet era, more text than ever is being uploaded online. The text comes in many forms, such as social media posts, business reviews, and many more. For various reasons, there is an interest in analyzing the uploaded text. For instance, an airline could ask its customers to review the service they have received; the feedback would be collected by asking the customer to leave a review and a score. A common scenario is a review with a good score that contains negative aspects, and it is preferable to avoid a situation where the entirety of the review is regarded as positive because of the score when negative aspects are mentioned. A solution would be to analyze each sentence of a review and classify it as negative, neutral, or positive depending on how the sentence is perceived. With the amount of text uploaded today, it is not feasible to analyze text manually. Automatically classifying text by a set of criteria is called text classification, and classifying text specifically by how it is perceived is a subcategory of text classification known as sentiment analysis; positive, neutral, and negative would be the sentiments to classify. The most popular frameworks for implementing sentiment analyzers are developed in the programming language Python. However, over the years, text classification has grown in popularity, and new frameworks have been developed in other programming languages. Scala is one of the languages for which new sentiment analysis frameworks have been developed; in comparison to Python, however, it has fewer available resources. Python has more libraries, more documentation, and more community support online. There are even fewer resources regarding sentiment analysis in a less common language such as Swedish.
The problem is that no one has compared a sentiment analyzer for Swedish text implemented in Scala against one implemented in Python. The purpose of this thesis is to compare the recall and precision of a sentiment analyzer implemented in Scala to one implemented in Python. The goal of this thesis is to increase the knowledge regarding the state of text classification for less common natural languages in Scala. To conduct the study, a qualitative approach with the support of quantitative data was used. Two kinds of sentiment analyzers were implemented in Scala and Python: the first classified text as either positive or negative (binary sentiment analysis); the second also classified text as neutral (multiclass sentiment analysis). To perform the comparative study, the implemented analyzers classified text with known sentiments, and the quality of the classifications was measured using their F1-score. The results showed that Python had better recall and quality for both tasks, though in the binary task the difference between the two implementations was smaller. The Python resources were more specialized for Swedish and did not seem to be as affected by the small dataset used as the Scala resources. Scala had an F1-score of 0.78 for binary sentiment analysis and 0.65 for multiclass sentiment analysis; Python had an F1-score of 0.83 for binary sentiment analysis and 0.78 for multiclass sentiment analysis.
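The F1-scores reported above combine precision and recall per class; for the multiclass task they are typically averaged over classes. A small pure-Python sketch of that computation (the toy labels below are invented, and macro-averaging is one common choice; the thesis may average differently):

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == positive == p for t, p in zip(y_true, y_pred))
    pred_pos = sum(p == positive for p in y_pred)
    true_pos = sum(t == positive for t in y_true)
    if not tp:
        return 0.0
    precision = tp / pred_pos
    recall = tp / true_pos
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, classes):
    """Unweighted average of per-class F1 over all classes."""
    return sum(f1_score(y_true, y_pred, c) for c in classes) / len(classes)

y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neu", "neg", "neu", "pos", "pos"]
print(round(macro_f1(y_true, y_pred, ["pos", "neg", "neu"]), 3))
```

Because F1 balances precision against recall per class, it is less forgiving than raw accuracy on the imbalanced sentiment distributions typical of review data, which makes it a sensible metric for this comparison.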
