101

Cyberbullying detection in Urdu language using machine learning

Khan, Sara, Qureshi, Amna 11 January 2023
Cyberbullying has become a significant problem with the surge in the use of social media. The most basic way to prevent cyberbullying on social media platforms is to identify and remove offensive comments, but it is impractical for humans to read and moderate all comments manually. Current research therefore focuses on using machine learning to detect and eliminate cyberbullying. Although most of this work has been conducted on English text, little to no work exists for Urdu. This paper aims to detect cyberbullying in users' Urdu-language comments on Twitter using machine learning and Natural Language Processing (NLP) techniques. To the best of our knowledge, cyberbullying detection on Urdu text comments has not been performed before, owing to the lack of a publicly available standard Urdu dataset. We therefore created a dataset of offensive user-generated Urdu comments from Twitter, classified into five categories. N-gram techniques are used to extract features at the character and word levels, and various supervised machine learning techniques are applied to the dataset to detect cyberbullying. Evaluation metrics such as precision, recall, accuracy and F1 score are used to analyse the performance of the machine learning techniques.
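The abstract names the general recipe (character- and word-level n-gram features plus supervised classifiers) without fixing an implementation. The sketch below shows one plausible shape of that pipeline in scikit-learn; the placeholder data, labels, and the choice of logistic regression are assumptions for illustration, not the paper's actual dataset or models.

```python
# A minimal sketch, assuming scikit-learn: character- and word-level n-gram
# features feeding one of many possible supervised classifiers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.metrics import classification_report

# Placeholder data: the paper's dataset consists of Urdu tweets in five categories.
comments = ["placeholder offensive comment", "placeholder harmless comment"]
labels = ["offensive", "none"]

features = FeatureUnion([
    ("word_ngrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char_ngrams", TfidfVectorizer(analyzer="char", ngram_range=(2, 5))),
])
model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),  # stand-in for the "various" classifiers tried
])
model.fit(comments, labels)
print(classification_report(labels, model.predict(comments)))
```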
102

Natural Language Driven Image Edits using a Semantic Image Manipulation Language

Mohapatra, Akrit 04 June 2018
Language provides us with a powerful tool to articulate and express ourselves! Understanding and harnessing the expressions of natural language can open the doors to a vast array of creative applications. In this work we explore one such application: natural language based image editing. We propose a novel framework that goes from free-form natural language commands to fine-grained image edits. Recent progress in deep learning has motivated solving most tasks using end-to-end deep convolutional frameworks, and such methods have proven very successful, even achieving super-human performance in some cases. Although this progress shows significant promise, we believe more is needed before such methods can be applied effectively to a task like fine-grained image editing. We approach the problem by dissecting the inputs (image and language query) and focusing on understanding the language input using traditional natural language processing (NLP) techniques. We start by parsing the input query to identify the entities, attributes and relationships, and generate a command entity representation. We define our own high-level image manipulation language that serves as an intermediate programming language, translating natural language requests that represent a creative intent over an image into the lower-level operations needed to execute them. The semantic command entity representations are mapped into this high-level language to carry out the intended execution. / Master of Science
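To make the intermediate-language idea concrete, here is a hedged sketch of what a command entity and its lowering into primitive operations might look like. The class, fields, and operation names are all hypothetical; the thesis does not publish this exact interface.

```python
# A hypothetical illustration of an intermediate image manipulation language:
# a parsed natural-language request becomes a command entity, which is then
# lowered into primitive image operations. All names here are assumptions.
from dataclasses import dataclass, field

@dataclass
class CommandEntity:
    action: str                      # e.g., "brighten"
    target: str                      # entity the edit applies to, e.g., "sky"
    attributes: dict = field(default_factory=dict)

def to_manipulation_program(cmd: CommandEntity) -> list[str]:
    """Lower a command entity into hypothetical primitive operations."""
    if cmd.action == "brighten":
        amount = cmd.attributes.get("amount", 0.2)
        return [f"SELECT region WHERE label = '{cmd.target}'",
                f"ADJUST brightness BY {amount}"]
    raise NotImplementedError(cmd.action)

# "Make the sky a bit brighter" -> parsed upstream into:
cmd = CommandEntity(action="brighten", target="sky", attributes={"amount": 0.2})
print(to_manipulation_program(cmd))
```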
103

Leveraging Structure for Effective Question Answering

Bonadiman, Daniele 25 September 2020
In this thesis, we focus on Answer Sentence Selection (A2S), the core task of retrieval-based question answering. A2S consists of selecting the sentences that answer user queries from a collection of documents retrieved by a search engine. Over more than two decades, several machine learning solutions have been proposed for this task, from simple approaches based on manual feature engineering, to more complex Structural Tree Kernel models, and recently to Neural Network architectures. In particular, the latter require little human effort, as they can automatically extract relevant features from plain text. The development of neural architectures brought improvements in many areas of A2S, reaching unprecedented results and substantially increasing accuracy on almost all benchmark datasets. However, this has come at the cost of a huge increase in the number of parameters and in the computational cost of the models. The large number of parameters has two drawbacks: such models require a massive amount of data to train effectively, and huge computational power to maintain an acceptable number of transactions per second in a production environment. Current state-of-the-art techniques for A2S use huge Transformer architectures with up to 340 million parameters, pre-trained on massive amounts of data, e.g., BERT. BERT and related models in the same family, such as RoBERTa, are general architectures, i.e., they can be applied to many NLP tasks without any architectural change. In contrast to the trend above, we focus on specialized architectures for A2S that can effectively encode the local structure of the question and answer candidate together with global information, i.e., the structure of the task and the context in which the answer candidate appears. In particular, we propose solutions to effectively encode both the local and the global structure of A2S in efficient neural network models. (i) We encode syntactic information in a fast CNN architecture, exploiting the capability of Structural Tree Kernels to encode syntactic structure. (ii) We propose an efficient model that can use semantic relational information between question and answer candidates by pretraining word representations on a relational knowledge base. (iii) This efficient approach is further extended to encode each answer candidate's contextual information, encoding all answer candidates in their original context. Lastly, (iv) we propose a solution to encode task-specific structure where it is available, for example in the community Question Answering task. The final model, which encodes these different aspects of the task, achieves state-of-the-art performance on A2S compared with other efficient architectures. It is more efficient than attention-based architectures and outperforms BERT by two orders of magnitude in transactions per second during training and testing, processing 700 questions per second compared to BERT's 6 questions per second when training on a single GPU.
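As a minimal illustration of the A2S task itself (not of the thesis's specialized neural models), the sketch below ranks candidate sentences against a question, with a simple TF-IDF cosine score standing in for the learned scorers described above. The example question and candidates are invented.

```python
# A toy A2S ranker: score each candidate sentence against the question and
# sort by relevance. TF-IDF cosine is a stand-in for a trained model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "Who wrote War and Peace?"
candidates = [
    "War and Peace was written by Leo Tolstoy.",
    "The novel is set during the Napoleonic wars.",
    "Tolstoy was born in 1828.",
]
vec = TfidfVectorizer().fit([question] + candidates)
q, cands = vec.transform([question]), vec.transform(candidates)
scores = cosine_similarity(q, cands).ravel()
for sent, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {sent}")
```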
104

BCC’ing AI: Using Modern Natural Language Processing to Detect Micro and Macro E-ggressions in Workplace Emails

Cornett, Kelsi E. 24 May 2024
Subtle offensive statements in workplace emails, which I term "Micro E-ggressions," can significantly impact the psychological safety and subsequent productivity of work environments despite their often-ambiguous intent. This thesis investigates the prevalence and nature of both micro and macro e-ggressions within workplace email communications, utilizing state-of-the-art natural language processing (NLP) techniques. Leveraging a large dataset of workplace emails, the study aims to detect and analyze these subtle offenses, exploring their themes and the contextual factors that facilitate their occurrence. The research identifies common types of micro e-ggressions, such as questioning competence and work ethic, and examines the responses to these offenses. Results indicate a high prevalence of offensive content in workplace emails and reveal distinct thematic elements that contribute to the perpetuation of workplace incivility. The findings underscore the potential for NLP tools to bridge gaps in awareness and sensitivity, ultimately contributing to more inclusive and respectful workplace cultures. / Master of Science
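The abstract does not specify the thesis's models, so the sketch below only illustrates the general shape of scoring email sentences with a pretrained offensive-language classifier. The Hugging Face pipeline API is real, but the particular model name is my assumption, not the one used in the thesis.

```python
# A hedged sketch: flag candidate micro e-ggressions in email sentences with
# a generic public offensive-language model (an assumption, not the thesis's).
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="cardiffnlp/twitter-roberta-base-offensive")
email_sentences = [
    "Thanks for catching that bug so quickly.",
    "I'm surprised you managed to finish this on your own.",
]
for sent in email_sentences:
    result = classifier(sent)[0]          # {'label': ..., 'score': ...}
    print(f"{result['label']:>15}  {result['score']:.2f}  {sent}")
```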
105

Distributed representations for compositional semantics

Hermann, Karl Moritz January 2014
The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches—meaning distributed representations that exploit co-occurrence statistics of large corpora—have proved popular and successful across a number of tasks. However, natural language usually comes in structures beyond the word level, with meaning arising not only from the individual words but also the structure they are contained in at the phrasal or sentential level. Modelling the compositional process by which the meaning of an utterance arises from the meaning of its parts is an equally fundamental task of NLP. This dissertation explores methods for learning distributed semantic representations and models for composing these into representations for larger linguistic units. Our underlying hypothesis is that neural models are a suitable vehicle for learning semantically rich representations and that such representations in turn are suitable vehicles for solving important tasks in natural language processing. The contribution of this thesis is a thorough evaluation of our hypothesis, as part of which we introduce several new approaches to representation learning and compositional semantics, as well as multiple state-of-the-art models which apply distributed semantic representations to various tasks in NLP. Part I focuses on distributed representations and their application. In particular, in Chapter 3 we explore the semantic usefulness of distributed representations by evaluating their use in the task of semantic frame identification. Part II describes the transition from semantic representations for words to compositional semantics. Chapter 4 covers the relevant literature in this field. Following this, Chapter 5 investigates the role of syntax in semantic composition. For this, we discuss a series of neural network-based models and learning mechanisms, and demonstrate how syntactic information can be incorporated into semantic composition. This study allows us to establish the effectiveness of syntactic information as a guiding parameter for semantic composition, and answer questions about the link between syntax and semantics. Following these discoveries regarding the role of syntax, Chapter 6 investigates whether it is possible to further reduce the impact of monolingual surface forms and syntax when attempting to capture semantics. Asking how machines can best approximate human signals of semantics, we propose multilingual information as one method for grounding semantics, and develop an extension to the distributional hypothesis for multilingual representations. Finally, Part III summarizes our findings and discusses future work.
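As a toy illustration of the composition problem the thesis studies, the sketch below combines word vectors with the simple additive and multiplicative composition functions from the prior literature it builds on; the random vectors stand in for learned representations, and the vocabulary is invented.

```python
# Composing word vectors into a phrase vector: the additive and
# multiplicative baselines from the compositional semantics literature.
import numpy as np

rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=8) for w in ["red", "car", "fast"]}  # toy embeddings

def compose_add(words):   # additive composition: p = u + v + ...
    return np.sum([vectors[w] for w in words], axis=0)

def compose_mult(words):  # multiplicative composition: p = u * v * ...
    return np.prod([vectors[w] for w in words], axis=0)

phrase = ["red", "car"]
print("additive:      ", compose_add(phrase))
print("multiplicative:", compose_mult(phrase))
```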
106

Forced Attention for Image Captioning

Hemanth Devarapalli 17 January 2019
Automatic generation of captions for a given image is an active research area in Artificial Intelligence. Architectures have evolved from classical machine learning over image metadata to neural networks. Two styles of neural architecture have emerged for image captioning: the Encoder-Attention-Decoder architecture and the transformer architecture. This study attempts to modify the attention mechanism to allow any object to be specified. An archetypal Encoder-Attention-Decoder architecture (Show, Attend, and Tell (Xu et al., 2015)) is employed as a baseline, and a modification of it is proposed. Both architectures are evaluated on the MSCOCO (Lin et al., 2014) dataset using seven metrics: BLEU-1, 2, 3, 4 (Papineni, Roukos, Ward & Zhu, 2002), METEOR (Banerjee & Lavie, 2005), ROUGE-L (Lin, 2004), and CIDEr (Vedantam, Lawrence & Parikh, 2015). Finally, the statistical significance of the results is evaluated with paired t-tests.
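For readers unfamiliar with the baseline, here is a hedged NumPy sketch of the soft attention step at the core of Show, Attend, and Tell: the decoder state scores each spatial image feature, and a softmax turns the scores into weights for a context vector. Shapes and parameters are illustrative; the study's "forced attention" modification is not reproduced here.

```python
# Soft attention over spatial image features, as in Show, Attend, and Tell.
import numpy as np

def soft_attention(features, hidden, w_f, w_h, v):
    """features: (L, D) spatial features; hidden: (H,) decoder state."""
    scores = np.tanh(features @ w_f + hidden @ w_h) @ v   # (L,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # softmax over the L regions
    context = weights @ features                          # (D,) attended context vector
    return context, weights

L, D, H, A = 49, 16, 12, 8   # e.g. a 7x7 feature grid; toy dimensions
rng = np.random.default_rng(1)
context, weights = soft_attention(
    rng.normal(size=(L, D)), rng.normal(size=H),
    rng.normal(size=(D, A)), rng.normal(size=(H, A)), rng.normal(size=A))
print(weights.shape, context.shape)   # (49,) (16,)
```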
107

Lexical vagueness handling using fuzzy logic in human robot interaction

Guo, Xiao January 2011
Lexical vagueness is a ubiquitous phenomenon in natural language. Most previous work in natural language processing (NLP) treats lexical ambiguity, rather than lexical vagueness, as the main problem in natural language understanding. Lexical vagueness is usually regarded as a solution rather than a problem, since precise information often cannot be provided in conversation. However, lexical vagueness is clearly an obstacle in human robot interaction (HRI), since robots are expected to understand their users' utterances precisely in order to provide reliable services. This research aims to develop novel lexical vagueness handling techniques that enable service robots to precisely understand their users' utterances so that they can provide reliable services. A novel integrated system to handle lexical vagueness is proposed, based on an in-depth understanding of lexical ambiguity and lexical vagueness: why they exist, how they are presented, what the differences between them are, and the mainstream techniques for handling each. The integrated system consists of two blocks: a block for lexical ambiguity handling and a block for lexical vagueness handling. The first block removes syntactic ambiguity and lexical ambiguity; the second then models and removes lexical vagueness. Experimental results show that robots endowed with the developed integrated system are able to understand their users' utterances and can therefore provide reliable services.
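As a small illustration of the fuzzy-logic treatment of a vague term (the thesis's actual membership functions and vocabulary are not given in this abstract, so the term and the numbers below are assumptions), a word like "near" can be modeled as a fuzzy set over distance rather than a crisp cutoff:

```python
# A fuzzy membership function for the vague lexical term "near":
# fully "near" under 1 m, not "near" past 5 m, graded in between.
def membership_near(distance_m: float) -> float:
    if distance_m <= 1.0:
        return 1.0
    if distance_m >= 5.0:
        return 0.0
    return (5.0 - distance_m) / 4.0  # linear fall-off between 1 m and 5 m

for d in (0.5, 2.0, 4.0, 6.0):
    print(f"near({d} m) = {membership_near(d):.2f}")
```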
108

Study and testing of the Python Natural Language Toolkit for the Greek language

Σταυλιώτης, Λεωνίδας 14 May 2012
This diploma dissertation presents an evaluation of Python's NLTK (Natural Language Toolkit). NLTK is an open-source function library for natural language processing and the development of NLP applications. It is written in Python and was developed primarily for analysing and building applications for the English language. The subject of this work is the systematic study and testing of NLTK's functions on the Greek language, as there were indications that a significant portion of them work correctly. First, we studied how to import Greek texts and process them appropriately so that they are in a form the toolkit can handle. Then all the functions were tested and categorised according to their behaviour. Finally, the aggregate results confirm the initial hypothesis that a large number of functions operate correctly: 87.9% of the functions appear to work correctly on Greek.
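A probe in the spirit of the dissertation might run NLTK functions directly on Greek text and inspect the output; tokenization and frequency counting are largely language-independent, which is consistent with the high success rate reported. The snippet below is my minimal reconstruction, not the dissertation's test harness.

```python
# Run NLTK tokenization and frequency counting on a Greek sentence.
import nltk
nltk.download("punkt", quiet=True)   # newer NLTK versions may need "punkt_tab"
from nltk.tokenize import word_tokenize
from nltk import FreqDist

text = "Η επεξεργασία φυσικής γλώσσας είναι ένα ενδιαφέρον πεδίο."
tokens = word_tokenize(text)
print(tokens)
print(FreqDist(tokens).most_common(3))
```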
109

All Purpose Textual Data Information Extraction, Visualization and Querying

January 2018
Since the advent of the internet, and even more so since the rise of social media platforms, the explosive growth of textual data and its availability has made analysis a tedious task. Information extraction systems exist, but they are generally too specific and often extract only the kinds of information they deem necessary and extraction-worthy. With data visualization theory and fast, interactive querying methods, leaving out information might not be necessary at all. This thesis explores textual data visualization techniques, intuitive querying, and a novel approach to all-purpose textual information extraction that encodes a large text corpus to improve human understanding of the information it contains. The thesis presents a modified traversal algorithm over the dependency-parse output of text that extracts all subject-predicate-object pairs while ensuring no information is missed. To support full-scale, all-purpose information extraction from large text corpora, a data preprocessing pipeline is recommended before extraction is run. The output format is designed to fit a node-edge-node model and form the building blocks of a network, which makes understanding the text and querying information from the corpus quick and intuitive. The approach attempts to reduce reading time and enhance understanding of the text using an interactive graph and timeline. / Dissertation/Thesis / Masters Thesis Software Engineering 2018
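A condensed sketch of subject-predicate-object extraction over a dependency parse is shown below, using spaCy (an assumption; the thesis does not name its parser in this abstract) and handling only the simplest direct case, whereas the thesis's modified traversal is designed to miss nothing. Running it requires the en_core_web_sm model.

```python
# Extract (subject, verb, object) triples from a dependency parse; the
# triples form node-edge-node building blocks for a queryable graph.
import spacy

nlp = spacy.load("en_core_web_sm")   # install: python -m spacy download en_core_web_sm

def extract_svo(doc):
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, token.lemma_, o.text))
    return triples

doc = nlp("The committee approved the proposal. Alice wrote the report.")
print(extract_svo(doc))  # [('committee', 'approve', 'proposal'), ('Alice', 'write', 'report')]
```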
110

Refinements in hierarchical phrase-based translation systems

Pino, Juan Miguel January 2015
The relatively recently proposed hierarchical phrase-based translation model for statistical machine translation (SMT) has achieved state-of-the-art performance in numerous recent translation evaluations. Hierarchical phrase-based systems comprise a pipeline of modules with complex interactions. In this thesis, we propose refinements to the hierarchical phrase-based model as well as improvements and analyses in various modules of hierarchical phrase-based systems. We take advantage of the increasing amounts of training data available for machine translation, together with existing frameworks for distributed computing, to build better infrastructure for the extraction, estimation and retrieval of hierarchical phrase-based grammars. We design and implement grammar extraction as a series of Hadoop MapReduce jobs, and store the resulting grammar in the HFile format, which offers competitive trade-offs in terms of efficiency and simplicity; we demonstrate improvements over two alternative solutions used in machine translation. The modular nature of the SMT pipeline, while allowing individual improvements, has the disadvantage that errors committed by one module are propagated to the next. This thesis alleviates this issue between the word alignment module and the grammar extraction and estimation module by considering richer statistics from word alignment models during extraction. We use alignment link and alignment phrase pair posterior probabilities for grammar extraction and estimation, and demonstrate translation improvements in Chinese to English translation. This thesis also proposes refinements in grammar and language modelling, both in the context of domain adaptation and in the context of the interaction between first-pass decoding and lattice rescoring. We analyse alternative strategies for grammar and language model cross-domain adaptation, study interactions between the first-pass and second-pass language models in terms of size and n-gram order, and analyse two smoothing methods for large 5-gram language model rescoring. The last two chapters are devoted to the application of phrase-based grammars to the string regeneration task, which we consider a means to study the fluency of machine translation output. We design and implement a monolingual phrase-based decoder for string regeneration and achieve state-of-the-art performance on this task. By applying our decoder to the output of a hierarchical phrase-based translation system, we are able to recover the same level of translation quality as the translation system.
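As a toy, in-memory stand-in for the grammar-extraction MapReduce described above: map each aligned sentence pair to rule counts, then reduce by key. The word-level "rules", tiny corpus, and alignments are placeholders; the thesis extracts full hierarchical grammars with Hadoop jobs and stores them in HFiles.

```python
# Map: emit (src, tgt) rule occurrences per sentence pair.
# Reduce: sum counts per rule (here, a Counter plays the reducer).
from collections import Counter

def map_phrase_pairs(src, tgt, alignment):
    """Emit word-pair 'rules' from word alignments (a toy stand-in for grammar rules)."""
    return [(src[i], tgt[j]) for i, j in alignment]

corpus = [
    ("le chat dort".split(), "the cat sleeps".split(), [(0, 0), (1, 1), (2, 2)]),
    ("le chien dort".split(), "the dog sleeps".split(), [(0, 0), (1, 1), (2, 2)]),
]
counts = Counter()
for src, tgt, align in corpus:
    counts.update(map_phrase_pairs(src, tgt, align))
print(counts.most_common(3))
```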
