111

Quality of SQL Code Security on StackOverflow and Methods of Prevention

Klock, Robert 29 July 2021 (has links)
No description available.
112

A Machine Learning Approach to Predicting Community Engagement on Social Media During Disasters

Alshehri, Adel 01 July 2019 (has links)
The use of social media is expanding significantly and can serve a variety of purposes. Over the last few years, users of social media have played an increasing role in the dissemination of emergency and disaster information. It is becoming more common for affected populations and other stakeholders to turn to Twitter to gather information about a crisis when decisions need to be made and action taken. However, social media platforms, especially Twitter, present some drawbacks when it comes to gathering information during disasters: information overload, messages written in an informal style, and the presence of noise and irrelevant information. These factors make gathering accurate information online challenging and confusing, which in turn may impair the ability of the public, communities, and organizations to prepare for, respond to, and recover from disasters. To address these challenges, we present an integrated three-part (clustering-classification-ranking) framework that helps users sift through the mass of Twitter data to find useful information. In the first part, we build standard machine learning models to automatically extract and identify topics present in a text and to derive hidden patterns exhibited by a dataset. In the second part, we develop binary and multi-class classification models of Twitter data to categorize each tweet as relevant or irrelevant and to further classify relevant tweets into four types of community engagement: reporting information, expressing negative engagement, expressing positive engagement, and asking for information. In the third part, we propose a binary classification model to categorize the collected tweets as high- or low-priority. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely textual content, term frequency-inverse document frequency, linguistic, sentiment, psychometric, temporal, and spatial features. Our framework also provides insights for researchers and developers building more robust socio-technical systems for identifying types of online community engagement and ranking high-priority tweets in disaster situations.
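As a rough illustration of the classification stage described in this abstract, the sketch below trains a multi-class tweet classifier with TF-IDF features and logistic regression. The tweets, labels, and feature choices are invented stand-ins for the dissertation's richer feature set, not its actual data or models.

```python
# Hypothetical sketch of the engagement-classification stage: TF-IDF features
# plus a linear classifier. Example tweets and labels are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

tweets = [
    "Bridge on Route 9 is flooded, avoid the area",
    "Does anyone know if the shelters downtown are open?",
    "So grateful to the volunteers helping at the evacuation center",
    "This response has been a complete disaster, no help anywhere",
]
labels = [
    "reporting_information",
    "asking_for_information",
    "positive_engagement",
    "negative_engagement",
]

# TF-IDF over word n-grams stands in for the richer feature set
# (linguistic, sentiment, psychometric, temporal, spatial) named above.
engagement_clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
engagement_clf.fit(tweets, labels)
print(engagement_clf.predict(["Where can I find drinking water?"]))
```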
113

Interpretability for Deep Learning Text Classifiers

Lucaci, Diana 14 December 2020 (has links)
The ubiquitous presence of automated decision-making systems that have a performance comparable to humans brought attention towards the necessity of interpretability for the generated predictions. Whether the goal is predicting the system’s behavior when the input changes, building user trust, or expert assistance in improving the machine learning methods, interpretability is paramount when the problem is not sufficiently validated in real applications, and when unacceptable results lead to significant consequences. While for humans, there are no standard interpretations for the decisions they make, the complexity of the systems with advanced information-processing capacities conceals the detailed explanations for individual predictions, encapsulating them under layers of abstractions and complex mathematical operations. Interpretability for deep learning classifiers becomes, thus, a challenging research topic where the ambiguity of the problem statement allows for multiple exploratory paths. Our work focuses on generating natural language interpretations for individual predictions of deep learning text classifiers. We propose a framework for extracting and identifying the phrases of the training corpus that influence the prediction confidence the most through unsupervised key phrase extraction and neural predictions. We assess the contribution margin that the added justification has when the deep learning model predicts the class probability of a text instance, by introducing and defining a contribution metric that allows one to quantify the fidelity of the explanation to the model. We assess both the performance impact of the proposed approach on the classification task as quantitative analysis and the quality of the generated justifications through extensive qualitative and error analysis. This methodology manages to capture the most influencing phrases of the training corpus as explanations that reveal the linguistic features used for individual test predictions, allowing humans to predict the behavior of the deep learning classifier.
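The following is a minimal sketch of the underlying idea of a contribution metric: score candidate training-corpus phrases by how much appending them shifts the classifier's confidence for a prediction. The toy model, corpus, and scoring function are assumptions for illustration, not the thesis's architecture.

```python
# Toy stand-in for ranking candidate key phrases by their influence on a
# prediction's confidence. Model and phrase list are placeholder assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_texts = ["the plot was gripping and moving",
               "dull pacing and a lifeless script",
               "a gripping story with wonderful acting",
               "lifeless, dull, and far too long"]
train_labels = ["pos", "neg", "pos", "neg"]

model = Pipeline([("tfidf", TfidfVectorizer()),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(train_texts, train_labels)

def contribution(text, phrase, target_class):
    """Confidence shift when a candidate phrase is appended to the input --
    a crude proxy for a contribution metric quantifying explanation fidelity."""
    idx = list(model.classes_).index(target_class)
    base = model.predict_proba([text])[0][idx]
    augmented = model.predict_proba([text + " " + phrase])[0][idx]
    return augmented - base

candidates = ["gripping story", "dull pacing"]  # e.g. from key phrase extraction
text = "the film was gripping"
pred = model.predict([text])[0]
ranked = sorted(candidates, key=lambda p: contribution(text, p, pred), reverse=True)
print(pred, ranked)
```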
114

Automated Extraction Of Associations Between Methylated Genes and Diseases From Biomedical Literature

Bin Res, Arwa A. 12 1900 (has links)
Associations between methylated genes and diseases have been investigated in several studies, and it is critical to have such information available for a better understanding of diseases and for clinical decisions. However, this information is scattered across a large number of electronic publications and is difficult to search for manually. Therefore, the goal of the project is to develop a machine learning model that can efficiently extract such information. Twelve machine learning algorithms were applied and compared on this problem using three approaches: document-term frequency matrices, position weight matrices, and a hybrid approach that combines the two. The best results were obtained with the hybrid approach and a random forest model, which, in 10-fold cross-validation, achieved an F-score and accuracy of nearly 85% and 84%, respectively. On a completely separate testing set, an F-score and accuracy of 89% and 88%, respectively, were obtained. Based on this model, we developed a tool that automates the extraction of associations between methylated genes and diseases from electronic text. Our study contributes an efficient method for extracting specific types of associations from free text, and the methodology developed here can be extended to other, similar association extraction problems.
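A hedged sketch of the hybrid setup described above: concatenate document-term frequency features with position-weight-style features and evaluate a random forest under 10-fold cross-validation. The feature matrices here are random stand-ins; the real ones are built from biomedical text.

```python
# Schematic hybrid-feature evaluation: random forest with 10-fold CV.
# Both feature blocks below are synthetic placeholders (assumptions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 200
doc_term = rng.poisson(1.0, size=(n_samples, 50))   # stand-in document-term counts
pwm_feats = rng.random(size=(n_samples, 20))        # stand-in position-weight features
X = np.hstack([doc_term, pwm_feats])                # the "hybrid" representation
y = rng.integers(0, 2, size=n_samples)              # 1 = association present

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
print("10-fold F1: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```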
115

Intelligent Prediction of Stock Market Using Text and Data Mining Techniques

Raahemi, Mohammad 04 September 2020 (has links)
The stock market undergoes many fluctuations on a daily basis. These changes can be challenging to anticipate. Understanding such volatility is beneficial to investors, as it empowers them to make informed decisions: to avoid losses and to invest when opportunities are predicted to be profitable. The objective of this research is to use text mining and data mining techniques to discover the relationship between news articles and stock price fluctuations. There are a variety of sources for news articles, including Bloomberg, Google Finance, Yahoo Finance, Factiva, Thomson Reuters, and Twitter. In our research, we use the Factiva and Intrinio news databases. These databases provide daily analytical articles about the general stock market, as well as daily changes in stock prices. The focus of this research is on understanding which news articles influence stock prices. We believe that different types of stocks in the market behave differently, and news articles could provide indications of different stock price movements. The goal of this research is to create a framework that uses text mining and data mining algorithms to correlate different types of news articles with stock fluctuations and predict whether to "Buy", "Sell", or "Hold" a specific stock. We train Doc2Vec models on 1 GB of financial news from Factiva to convert news articles into vectors of 100 dimensions. After preprocessing the data, including labeling and balancing it, we build five predictive models, namely neural networks, SVM, decision tree, KNN, and random forest, to predict stock movements (Buy, Sell, or Hold). We evaluate the performance of the predictive models in terms of accuracy and area under the ROC curve. We conclude that SVM provides the best performance among the five models for predicting stock movement.
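As an illustrative sketch (not the dissertation's actual pipeline), the code below trains a 100-dimension Doc2Vec model with gensim and fits an SVM on the inferred document vectors with Buy/Sell/Hold labels; the articles and labels are invented.

```python
# Doc2Vec (gensim) document vectors feeding an SVM classifier.
# Toy corpus and labels; real training used ~1 GB of Factiva news.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.svm import SVC

articles = [
    ("quarterly earnings beat expectations and guidance was raised", "Buy"),
    ("regulators opened an investigation into accounting practices", "Sell"),
    ("shares traded flat as markets awaited the fed decision", "Hold"),
    ("strong demand lifted revenue across all segments", "Buy"),
]
tagged = [TaggedDocument(words=text.split(), tags=[i])
          for i, (text, _) in enumerate(articles)]

# 100-dimension vectors, matching the abstract's setup.
d2v = Doc2Vec(tagged, vector_size=100, min_count=1, epochs=40, seed=1)

X = [d2v.infer_vector(text.split()) for text, _ in articles]
y = [label for _, label in articles]

svm = SVC(kernel="rbf").fit(X, y)
print(svm.predict([d2v.infer_vector("profit warning issued".split())]))
```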
116

Analyzing and evaluating security features in software requirements

Hayrapetian, Allenoush 28 October 2016 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Software requirements for complex projects often contain specifications of non-functional attributes (e.g., security-related features). The process of analyzing such requirements for standards compliance is laborious and error prone. Due to the inherent free-flowing nature of software requirements, it is tempting to apply Natural Language Processing (NLP) and Machine Learning (ML) techniques to analyze these documents. In this thesis, we propose a novel semi-automatic methodology that assesses the security requirements of a software system with respect to completeness and ambiguity, creating a bridge between the requirements documents and compliance with standards. Security standards, e.g., those introduced by ISO and OWASP, are compared against annotated software project documents for textual entailment relationships (NLP), and the results are used to train a neural network model (ML) for classifying security-based requirements. This approach aims to identify the appropriate structures that underlie software requirements documents. Once such structures are formalized and empirically validated, they will provide guidelines to software organizations for generating comprehensive and unambiguous requirements specification documents for security-oriented features. The proposed solution will assist organizations during the early phases of developing secure software and reduce overall development effort and costs.
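A minimal sketch under strong simplifying assumptions: the snippet below classifies requirement sentences as security-related or not using TF-IDF features and a small neural network (MLP). A plain sentence classifier stands in for the thesis's textual-entailment step against annotated standards; the example requirements are invented.

```python
# Simplified stand-in: security/other requirement classification with a
# small neural network. Not the thesis's entailment-based methodology.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

requirements = [
    "All passwords must be stored using a salted hash",
    "The system shall export monthly reports as PDF",
    "User sessions must expire after 15 minutes of inactivity",
    "The UI should remember the user's preferred language",
]
labels = ["security", "other", "security", "other"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                          random_state=0)),
])
clf.fit(requirements, labels)
print(clf.predict(["Access to audit logs shall require admin privileges"]))
```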
117

Translational drug interaction study using text mining technology

Wu, Heng-Yi 15 August 2017 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Drug-Drug Interaction (DDI) is one of the major causes of adverse drug reactions (ADR) and has been demonstrated to threaten public health. It causes an estimated 195,000 hospitalizations and 74,000 emergency room visits each year in the USA alone. Current DDI research investigates different scopes of drug interactions: molecular-level pharmacogenetic interactions (PG), pharmacokinetic interactions (PK), and clinical pharmacodynamic consequences (PD). All three types of experiments are important, but they play different roles in DDI research. As diverse disciplines and varied studies are involved, interaction evidence is often not available across all three types, which creates knowledge gaps that hinder both DDI and pharmacogenetics research. In this dissertation, we propose to distinguish the three types of DDI evidence (in vitro PK, in vivo PK, and clinical PD studies) and identify all knowledge gaps in the experimental evidence for them. This is a collective intelligence effort, whereby a text mining tool is developed for the large-scale mining and analysis of drug-interaction information, such that it can be applied to retrieve, categorize, and extract DDI information from published literature available on PubMed. To this end, three tasks are carried out in this research. First, the lexica, ontology, and corpora needed for distinguishing the three types of studies are prepared. Despite the lexica prepared in this work, a comprehensive dictionary of drug metabolites and reactions, which is critical to in vitro PK studies, is still lacking in public databases. Thus, second, a named entity recognition tool is proposed to identify drug metabolites and reactions in free text. Third, text mining tools for retrieving DDI articles and extracting DDI evidence are developed. With this work, the knowledge gaps across all three types of DDI evidence can be identified, and the gaps between knowledge of the molecular mechanisms underlying DDI and their clinical consequences can be closed using DDI predictions based on the retrieved drug-gene interaction information, exemplifying how these tools and methods can advance DDI pharmacogenetics research.
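As a toy stand-in for the named entity recognition tool proposed in the second task, the sketch below tags drug, metabolite, and enzyme mentions with a hand-built lexicon; the lexicon entries are illustrative examples, not the dissertation's curated lexica.

```python
# Dictionary-based entity tagger: a toy stand-in for metabolite/reaction NER.
# Lexicon terms and entity types below are illustrative assumptions.
import re

LEXICON = {
    "midazolam": "DRUG",
    "1-hydroxymidazolam": "METABOLITE",
    "ketoconazole": "DRUG",
    "cyp3a4": "ENZYME",
}

def tag_entities(sentence):
    """Return (surface form, entity type, span) for every lexicon hit."""
    hits = []
    for term, etype in LEXICON.items():
        for m in re.finditer(re.escape(term), sentence.lower()):
            hits.append((sentence[m.start():m.end()], etype, m.span()))
    return sorted(hits, key=lambda h: h[2])

text = "Ketoconazole increased midazolam exposure via CYP3A4 inhibition."
for surface, etype, span in tag_entities(text):
    print(etype, surface, span)
```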
118

Does Quality Management Practice Influence Performance in the Healthcare Industry?

Xie, Heng 08 1900 (has links)
This research examines the relationship between quality management (QM) practices and performance in the healthcare industry through three studies. The results contribute both to advancing QM theory and to developing a unique text mining method, illustrated by examining QM in the healthcare industry. Essay 1 explains the relationship between operational performance and QM practices in the healthcare industry. This study analyzed findings from the literature using meta-analysis, and we applied confirmatory semantic analysis (CSA) to examine Baldrige winners' applications. Essay 2 examines the benefits associated with an effective QM program in the healthcare industry, addressing the research question of how effective QM practice results in improved hospital performance. This study compares the performance of Baldrige Award-winning hospitals with matching hospitals, the state average, and the national average. The results show that the Baldrige Award can lead to an increase in patient satisfaction in certain periods. Essay 3 discusses the contribution of an online clinic appointment system (OCAS) to QM practices. An enhanced trust model was built on an understanding of the mechanism of patients' trust formation in the OCAS. Understanding the determinants of patients' trust in, and willingness to use, OCAS can provide valuable guidance for medical institutions establishing health information technology-based services in quality service improvement programs. This research makes three significant contributions. First, it analyzes the role of QM practices in the healthcare industry. Second, it develops a unique text mining method. Third, it provides a validated trust model and contributes to the body of research on trust in healthcare information technology.
119

TEXT MINER FOR HYPERGRAPHS USING OUTPUT SPACE SAMPLING

Tirupattur, Naveen 16 August 2011 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Text mining is the process of extracting high-quality knowledge from the analysis of textual data. Rapidly growing interest and research focus in many fields is producing an overwhelming amount of research literature. This literature is a vast source of knowledge, but due to its sheer volume, it is practically impossible for researchers to extract that knowledge manually. Hence, there is a need for an automated approach to extracting knowledge from unstructured data, and text mining is the right approach for it. The objective of this thesis is to mine research literature for novel associations among the entities appearing in it, using incremental mining. Traditional text mining approaches provide binary associations, but it is important to understand the context in which these associations occur. For example, entity A may be associated with entity B in the context of entity C. Such contexts can be viewed as multi-way associations among entities, which are represented by a hypergraph. This thesis describes extracting such multi-way associations using Frequent Itemset Mining and applying a new concept called output space sampling to extract them in a space- and time-efficient manner. We incorporated personalization into output space sampling so that a user can specify his or her interests as the frequent hyper-associations are extracted from the text.
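A small sketch of the core idea: mine frequent entity itemsets from document-level entity sets and treat each frequent itemset of size three or more as a candidate hyperedge, i.e., a multi-way association. Exhaustive enumeration stands in here for the incremental, output-space-sampled mining the thesis describes; the documents and support threshold are assumptions.

```python
# Frequent itemsets over entity sets as candidate hyperedges (toy version).
from itertools import combinations
from collections import Counter

# Each "document" reduced to the set of entities it mentions (invented data).
docs = [
    {"geneA", "diseaseX", "drugM"},
    {"geneA", "diseaseX", "drugM", "geneB"},
    {"geneA", "diseaseX", "drugM"},
    {"geneB", "diseaseY"},
]

MIN_SUPPORT = 3  # absolute support threshold (an assumption)

counts = Counter()
for entities in docs:
    for k in (2, 3):
        for itemset in combinations(sorted(entities), k):
            counts[itemset] += 1

# Multi-way associations: frequent itemsets of three or more entities.
hyperedges = {s: c for s, c in counts.items()
              if c >= MIN_SUPPORT and len(s) >= 3}
print(hyperedges)  # e.g. {('diseaseX', 'drugM', 'geneA'): 3}
```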
120

Bisecting Document Clustering Using Model-Based Methods

Davis, Aaron Samuel 09 December 2009 (has links) (PDF)
We all have access to large collections of digital text documents, which are useful only if we can make sense of them all and distill important information from them. Good document clustering algorithms that organize such information automatically in meaningful ways can make a difference in how effective we are at using that information. In this paper we use model-based document clustering algorithms as a base for bisecting methods in order to identify increasingly cohesive clusters from larger, more diverse clusters. We specifically use the EM algorithm and Gibbs Sampling on a mixture of multinomials as the base clustering algorithms on three data sets. Additionally, we apply a refinement step, using EM, to the final output of each clustering technique. Our results show improved agreement with human annotated document classes when compared to the existing base clustering algorithms, with marked improvement in two out of three data sets.
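A compact sketch of the bisecting scheme: repeatedly split the largest cluster in two using EM on a mixture of two multinomials over term counts. This bare-bones version omits the Gibbs sampling variant and the final EM refinement pass, and the count matrix is a toy example.

```python
# Bisecting document clustering with EM on a two-component multinomial mixture.
import numpy as np

def em_bisect(X, n_iter=50, seed=0):
    """Split count matrix X (docs x terms) into two clusters via EM."""
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet([1, 1], size=X.shape[0])          # responsibilities
    for _ in range(n_iter):
        # M-step: mixture weights and per-component term distributions
        pi = resp.mean(axis=0)
        theta = (resp.T @ X) + 1.0                         # Laplace smoothing
        theta /= theta.sum(axis=1, keepdims=True)
        # E-step: posterior over the two components (log domain)
        log_post = np.log(pi) + X @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1)

def bisecting_cluster(X, n_clusters):
    labels = np.zeros(X.shape[0], dtype=int)
    for next_label in range(1, n_clusters):
        biggest = np.bincount(labels).argmax()             # split largest cluster
        idx = np.flatnonzero(labels == biggest)
        split = em_bisect(X[idx], seed=next_label)
        labels[idx[split == 1]] = next_label
    return labels

# Toy term-count matrix: six documents over three terms.
X = np.array([[5, 0, 1], [4, 1, 0], [0, 5, 2], [1, 4, 3], [0, 0, 6], [1, 1, 5]])
print(bisecting_cluster(X, 3))
```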
