Global ETD Search

111	Cyberbullying detection in Urdu language using machine learning Khan, Sara, Qureshi, Amna 11 January 2023 (has links) Yes / Cyberbullying has become a significant problem with the surge in the use of social media. The most basic way to prevent cyberbullying on these social media platforms is to identify and remove offensive comments. However, it is hard for humans to read and remove all the comments manually. Current research work focuses on using machine learning to detect and eliminate cyberbullying. Although most of the work has been conducted on English texts to detect cyberbullying, limited to no work can be found in Urdu. This paper aims to detect cyberbullying from the users' comments posted in Urdu on Twitter using machine learning and Natural Language Processing (NLP) techniques. To the best of our knowledge, cyberbullying detection on Urdu text comments has not been performed due to the lack of a publicly available standard Urdu dataset. In this paper, we created a dataset of offensive user-generated Urdu comments from Twitter. The comments in the dataset are classified into five categories. n-gram techniques are used to extract features at character and word levels. Various supervised machine-learning techniques are applied to the dataset to detect cyberbullying. Evaluation metrics such as precision, recall, accuracy and F1 scores are used to analyse the performance of machine learning techniques. Cyberbullying Machine learning Natural language processing Twitter
112	Musical training and semantic integration in sentence processing: Tales of the unexpected Featherstone, C.R., Morrison, Catriona M., Waterman, M.G., MacGregor, L.J. January 2014 (has links) no / Building on models of transfer effects between musical training and language processing and on evidence of similarities in the way the brain responds to unexpected elements in music and language, we investigated whether effects of musical training could be observed at the level of sentence processing. Using sentences that tax the semantic processes involved in natural comprehension and avoid outright anomalies, we showed a striking difference between musicians and non-musicians: contrary to non-musicians, musicians showed no N400 response to novel metaphorical words which were more difficult to integrate semantically into their context than literal controls. This difference between musicians and non-musicians in semantic processing in sentences shows an effect of musicianship at the highest level of music–language transfer effects demonstrated so far in the literature. As well as adding to the growing body of evidence surrounding the relationship between musical training and language processing, this work provides support for theories which suggest shared resources, computations, and neural areas underpinning the high-level processing of music and language.
113	NLP in Engineering Education - Demonstrating the use of Natural Language Processing Techniques for Use in Engineering Education Classrooms and Research Bhaduri, Sreyoshi 19 February 2018 (has links) Engineering Education is a developing field, with new research and ideas constantly emerging and contributing to the ever-evolving nature of this discipline. Textual data (such as publications, open-ended questions on student assignments, and interview transcripts) form an important means of dialogue between the various stakeholders of the engineering community. Analysis of textual data demands consumption of a lot of time and resources. As a result, researchers end up spending a lot of time and effort in analyzing such text repositories. While there is a lot to be gained through in-depth research analysis of text data, some educators or administrators could benefit from an automated system which could reveal trends and present broader overviews for given datasets in more time and resource efficient ways. Analyzing datasets using Natural Language Processing is one solution to this problem. The purpose of my doctoral research was two-pronged: first, to describe the current state of use of Natural Language Processing as it applies to the broader field of Education, and second, to demonstrate the use of Natural Language Processing techniques for two Engineering Education specific contexts of instruction and research respectively. Specifically, my research includes three manuscripts: (1) systematic review of existing publications on the use of Natural Language Processing in education research, (2) automated classification system for open-ended student responses to gauge metacognition levels in engineering classrooms, and (3) using insights from Natural Language Processing techniques to facilitate exploratory analysis of a large interview dataset led by a novice researcher. 
A common theme across the three tasks was to explore the use of Natural Language Processing techniques to enable the computer to extract meaningful information from textual data for Engineering Education related contexts. Results from my first manuscript suggested that researchers in the broader fields of Education used Natural Language Processing for a wide range of tasks, primarily serving to automate instruction in terms of creating content for examinations, automated grading or intelligent tutoring purposes. In manuscripts two and three I implemented some of the Natural Language Processing techniques, such as Part-of-Speech tagging and tf-idf (term frequency-inverse document frequency), that were found (through my systematic review) to be used by researchers, to (a) develop an automated classification system for student responses to gauge their metacognitive levels and (b) conduct an exploratory novice-led analysis of excerpts from interviews of students on career preparedness, respectively. Overall results of my research studies indicate that although the use of Natural Language Processing techniques in Engineering Education is not widespread, such research endeavors could facilitate research and practice in our field. Particularly, this type of approach to textual data could be of use to practitioners in large engineering classrooms who are unable to devote large amounts of time to data analysis but would benefit from algorithmic systems that could quickly present a summary based on information processed from available text data. / Ph. D. Natural Language Processing Engineering Education Education Research
114	Natural Language Driven Image Edits using a Semantic Image Manipulation Language Mohapatra, Akrit 04 June 2018 (has links) Language provides us with a powerful tool to articulate and express ourselves! Understanding and harnessing the expressions of natural language can open the doors to a vast array of creative applications. In this work we explore one such application - natural language based image editing. We propose a novel framework to go from free-form natural language commands to performing fine-grained image edits. Recent progress in the field of deep learning has motivated solving most tasks using end-to-end deep convolutional frameworks. Such methods have been shown to be very successful, even achieving super-human performance in some cases. Although such progress has shown significant promise for the future, we believe there is still progress to be made before these methods can be applied effectively to a task like fine-grained image editing. We approach the problem by dissecting the inputs (image and language query) and focusing on understanding the language input using traditional natural language processing (NLP) techniques. We start by parsing the input query to identify the entities, attributes and relationships and generate a command entity representation. We define our own high-level image manipulation language that serves as an intermediate programming language, translating natural language requests that represent a creative intent over an image into the lower-level operations needed to execute them. The semantic command entity representations are mapped into this high-level language to carry out the intended execution. / Master of Science Machine learning Natural language Processing Computer Vision
115	Leveraging Structure for Effective Question Answering Bonadiman, Daniele 25 September 2020 (has links) In this thesis, we focus on Answer Sentence Selection (A2S), which is the core task of retrieval-based question answering. A2S consists of selecting the sentences that answer user queries from a collection of documents retrieved by a search engine. Over more than two decades, several solutions based on machine learning have been proposed to solve this task, starting from simple approaches based on manual feature engineering to more complex Structural Tree Kernel models, and recently Neural Network architectures. In particular, the latter require little human effort as they can automatically extract relevant features from plain text. The development of neural architectures brought improvements in many areas of A2S, reaching unprecedented results. They substantially increase accuracy on almost all benchmark datasets for A2S. However, this has come at the cost of a huge increase in the number of parameters and in the computational costs of the models. A large number of parameters has two drawbacks: the model requires a massive amount of data to train effectively, and huge computational power to maintain an acceptable number of transactions per second in a production environment. Current state-of-the-art techniques for A2S use huge Transformer architectures, having up to 340 million parameters, pre-trained on a massive amount of data, e.g., BERT. The latter and related models in the same family, such as RoBERTa, are general architectures, i.e., they can be applied to many NLP tasks without any architectural change. In contrast to the trend above, we focus on specialized architectures for A2S that can effectively encode the local structure of the question and answer candidate and global information, i.e., the structure of the task and the context in which the answer candidate appears. 
In particular, we propose solutions to effectively encode both the local and the global structure of A2S in efficient neural network models. (i) We encode syntactic information in a fast CNN architecture, exploiting the capabilities of Structural Tree Kernels to encode the syntactic structure. (ii) We propose an efficient model that can use semantic relational information between question and answer candidates by pretraining word representations on a relational knowledge base. (iii) This efficient approach is further extended to encode each answer candidate's contextual information, encoding all answer candidates in the original context. Lastly, (iv) we propose a solution to encode task-specific structure that is available, for example, in the community Question Answering task. The final model, which encodes different aspects of the task, achieves state-of-the-art performance on A2S compared with other efficient architectures. The proposed model is more efficient than attention-based architectures and outperforms BERT by two orders of magnitude in terms of transactions per second during training and testing, i.e., it processes 700 questions per second compared to 6 questions per second for BERT when training on a single GPU.
116	BCC’ing AI: Using Modern Natural Language Processing to Detect Micro and Macro E-ggressions in Workplace Emails Cornett, Kelsi E. 24 May 2024 (has links) Subtle offensive statements in workplace emails, which I term "Micro E-ggressions," can significantly impact the psychological safety and subsequent productivity of work environments despite their often-ambiguous intent. This thesis investigates the prevalence and nature of both micro and macro e-ggressions within workplace email communications, utilizing state-of-the-art natural language processing (NLP) techniques. Leveraging a large dataset of workplace emails, the study aims to detect and analyze these subtle offenses, exploring their themes and the contextual factors that facilitate their occurrence. The research identifies common types of micro e-ggressions, such as questioning competence and work ethic, and examines the responses to these offenses. Results indicate a high prevalence of offensive content in workplace emails and reveal distinct thematic elements that contribute to the perpetuation of workplace incivility. The findings underscore the potential for NLP tools to bridge gaps in awareness and sensitivity, ultimately contributing to more inclusive and respectful workplace cultures. / Master of Science / 
The results show a high occurrence of offensive content in workplace emails and highlight patterns that help maintain a negative work environment. The study demonstrates that advanced language analysis tools can help raise awareness and sensitivity, ultimately fostering more inclusive and respectful workplace cultures. Microaggressions Workplace mistreatment Natural language processing Diversity
117	Distributed representations for compositional semantics Hermann, Karl Moritz January 2014 (has links) The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches—meaning distributed representations that exploit co-occurrence statistics of large corpora—have proved popular and successful across a number of tasks. However, natural language usually comes in structures beyond the word level, with meaning arising not only from the individual words but also the structure they are contained in at the phrasal or sentential level. Modelling the compositional process by which the meaning of an utterance arises from the meaning of its parts is an equally fundamental task of NLP. This dissertation explores methods for learning distributed semantic representations and models for composing these into representations for larger linguistic units. Our underlying hypothesis is that neural models are a suitable vehicle for learning semantically rich representations and that such representations in turn are suitable vehicles for solving important tasks in natural language processing. The contribution of this thesis is a thorough evaluation of our hypothesis, as part of which we introduce several new approaches to representation learning and compositional semantics, as well as multiple state-of-the-art models which apply distributed semantic representations to various tasks in NLP. Part I focuses on distributed representations and their application. In particular, in Chapter 3 we explore the semantic usefulness of distributed representations by evaluating their use in the task of semantic frame identification. Part II describes the transition from semantic representations for words to compositional semantics. Chapter 4 covers the relevant literature in this field. 
Following this, Chapter 5 investigates the role of syntax in semantic composition. For this, we discuss a series of neural network-based models and learning mechanisms, and demonstrate how syntactic information can be incorporated into semantic composition. This study allows us to establish the effectiveness of syntactic information as a guiding parameter for semantic composition, and answer questions about the link between syntax and semantics. Following these discoveries regarding the role of syntax, Chapter 6 investigates whether it is possible to further reduce the impact of monolingual surface forms and syntax when attempting to capture semantics. Asking how machines can best approximate human signals of semantics, we propose multilingual information as one method for grounding semantics, and develop an extension to the distributional hypothesis for multilingual representations. Finally, Part III summarizes our findings and discusses future work. 006.3
118	Lexical vagueness handling using fuzzy logic in human robot interaction Guo, Xiao January 2011 (has links) Lexical vagueness is a ubiquitous phenomenon in natural language. Most previous work in natural language processing (NLP) considers lexical ambiguity, rather than lexical vagueness, to be the main problem in natural language understanding. Lexical vagueness is usually regarded as a solution rather than a problem in natural language understanding, since conversations often fail to provide precise information. However, lexical vagueness is clearly an obstacle in human robot interaction (HRI), since robots are expected to understand their users' utterances precisely in order to provide reliable services. This research aims to develop novel lexical vagueness handling techniques that enable service robots to understand their users' utterances precisely so that they can provide reliable services to their users. A novel integrated system to handle lexical vagueness is proposed in this research, based on an in-depth understanding of lexical ambiguity and lexical vagueness, including why they exist, how they are presented, what the differences between them are, and the mainstream techniques for handling them. The integrated system consists of two blocks: the block of lexical ambiguity handling and the block of lexical vagueness handling. The block of lexical ambiguity handling first removes syntactic ambiguity and lexical ambiguity. The block of lexical vagueness handling is then used to model and remove lexical vagueness. Experimental results show that robots endowed with the developed integrated system are able to understand their users' utterances and can therefore provide reliable services to their users. 006.3
119	Linguistic-based Patterns for Figurative Language Processing: The Case of Humor Recognition and Irony Detection Reyes Pérez, Antonio 19 July 2012 (has links) Figurative language represents one of the most difficult tasks in natural language processing. Unlike literal language, figurative language makes use of linguistic devices such as irony, humor, sarcasm, metaphor, and analogy, among others, to communicate indirect meanings that most of the time cannot be interpreted in terms of syntactic or semantic information alone. Rather, figurative language reflects patterns of thought that acquire their full meaning in communicative and social contexts, which makes both its linguistic representation and its computational processing exceedingly complex tasks. In this context, this doctoral thesis addresses a problem related to the processing of figurative language by means of linguistic patterns. In particular, our efforts focus on building a system capable of automatically detecting instances of humor and irony in texts extracted from social media. Our main hypothesis rests on the premise that language reflects patterns of conceptualization; that is, by studying language, we study those patterns. Therefore, by analyzing these two domains of figurative language, we aim to provide arguments about how people conceive of them and, above all, about how that conception causes both humor and irony to be verbalized in a particular way across various social media. In this context, one of our main interests is to demonstrate how knowledge derived from the analysis of different levels of linguistic study can represent a set of relevant patterns for automatically identifying figurative uses of language. 
It should be noted that, contrary to most approaches that have focused on the study of figurative language, in our research we do not seek to provide arguments based solely on prototypical examples, but rather on texts whose characteristics / Reyes Pérez, A. (2012). Linguistic-based Patterns for Figurative Language Processing: The Case of Humor Recognition and Irony Detection [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16692 / Palancia Automatic humor recognition Irony detection Figurative language processing Natural language processing LENGUAJES Y SISTEMAS INFORMATICOS
120	DEFENDING BERT AGAINST MISSPELLINGS Nivedita Nighojkar (8063438) 06 April 2021 (has links) Defending Natural Language Processing models against adversarial attacks is a challenge because of the discrete nature of text data. However, given the variety of Natural Language Processing applications, it is important to make text processing models more robust and secure. This paper aims to develop techniques that will help text processing models such as BERT to combat adversarial samples that contain misspellings. The resulting models are more robust than off-the-shelf spelling checkers. Applied Computer Science Natural Language Processing Natural language processing Movie Reviews LSTM networks

Search results