91

Extraction and representation of key characteristics from epidemiological literature

Karystianis, George January 2014 (has links)
Epidemiological studies are rich in information that could improve the understanding of the concept complexity of a health problem, and are important sources for evidence-based medicine. However, epidemiologists experience difficulties in recognising and aggregating key characteristics in related research due to the increasing number of published articles. The main aim of this dissertation is to explore how text mining techniques can assist epidemiologists in identifying important pieces of information and in detecting and integrating key knowledge for further research and exploration via concept maps. Concept maps are widely used in medicine for exploration and representation as a relatively formal, easy to design and understand knowledge representation model. To support this aim, we have developed a methodology for the extraction of key epidemiological characteristics from all types of epidemiological research articles in order to visualise, explore and aggregate concepts related to a health care problem. A generic rule-based approach was designed and implemented for the identification of mentions of six key characteristics: study design, population, exposure, outcome, covariate and effect size. The system also relies on automatic term recognition and biomedical dictionaries to identify concepts of interest. In order to facilitate knowledge integration and aggregation, extracted characteristics are further normalised and mapped to existing resources. Study design mentions are mapped to an expanded version of the Ontology of Clinical Research (OCRe), whereas exposure, outcome and covariate mentions are mapped to Unified Medical Language System (UMLS) semantic groups and categories. Population mentions are mapped to age groups, gender and nationality/ethnicity, and effect size mentions are normalised with regard to the metric used, the confidence interval and the related concept. The evaluation has shown reliable results, with an average micro F-score of 87% for recognition of epidemiological mentions and 91% for normalisation. Normalised concepts are further organised in an automatically generated concept map, which has three sections for exposures, outcomes and covariates. To demonstrate the potential of the developed methodology, it was applied to a large-scale corpus of epidemiological research abstracts related to obesity. Obesity was chosen as a case study since it has emerged as one of the most important global health problems of the 21st century. Using the concepts extracted from the corpus, we have built a searchable database of key epidemiological characteristics explored in obesity and an automatically generated concept map representing the normalised exposures, outcomes and covariates. An epidemiological workbench (EpiTeM) was designed to enable further exploration and inspection of the normalised extracted data, with direct links to the literature. The generated results also allow exploration of trends in obesity research and can facilitate understanding of its concept complexity. For example, we have noted the most frequent concepts and the most common pairs of characteristics that have been studied in obesity epidemiology. Finally, this thesis also discusses a number of challenges for text mining of epidemiological literature and suggests various opportunities for future work.
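As an illustration of the effect size normalisation described in this abstract, a minimal sketch might extract the metric, point estimate and confidence interval from a sentence with a regular expression. The pattern, field names and example sentence below are illustrative assumptions, not the rules used in the thesis.

```python
import re

# Illustrative pattern for effect size mentions such as
# "OR 1.52 (95% CI 1.21-1.90)" or "hazard ratio 2.3 (95% CI 1.8 to 2.9)".
EFFECT_RE = re.compile(
    r"(?P<metric>OR|RR|HR|odds ratio|relative risk|hazard ratio)\s*[:=]?\s*"
    r"(?P<estimate>\d+\.?\d*)\s*"
    r"\(\s*95%\s*CI[:,]?\s*(?P<low>\d+\.?\d*)\s*(?:-|to|–)\s*(?P<high>\d+\.?\d*)\s*\)",
    re.IGNORECASE,
)

METRIC_MAP = {"or": "odds ratio", "rr": "relative risk", "hr": "hazard ratio"}

def normalise_effect_sizes(sentence: str) -> list[dict]:
    """Return structured effect-size records found in a sentence."""
    records = []
    for m in EFFECT_RE.finditer(sentence):
        metric = m.group("metric").lower()
        records.append({
            "metric": METRIC_MAP.get(metric, metric),
            "estimate": float(m.group("estimate")),
            "ci_low": float(m.group("low")),
            "ci_high": float(m.group("high")),
        })
    return records

print(normalise_effect_sizes(
    "Obesity was associated with the outcome (OR 1.52 (95% CI 1.21-1.90))."))
```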
92

Automatic structure and keyphrase analysis of scientific publications

Constantin, Alexandru January 2014 (has links)
Purpose. This work addresses an escalating problem within the realm of scientific publishing that stems from accelerated publication rates of article formats that are difficult to process automatically. The amount of manual labour required to organise a comprehensive corpus of relevant literature has long been impractical. This has, in effect, reduced research efficiency and delayed scientific advancement. Two complementary approaches meant to alleviate this problem are detailed and improved upon beyond the current state-of-the-art, namely logical structure recovery of articles and keyphrase extraction. Methodology. The first approach targets the issue of flat-format publishing. It performs a structural analysis of the camera-ready PDF article and recognises its fine-grained organisation into logical units. The second approach is the application of a keyphrase extraction algorithm that relies on rhetorical information from the recovered structure to better contour an article’s true points of focus. An account of the scientific article’s function, content and structure is provided, along with insights into how different logical components such as section headings or the bibliography can be automatically identified and utilised for higher-quality keyphrase extraction. Findings. Structure recovery can be carried out independently of an article’s formatting specifics, by exploiting conventional dependencies between logical components. In addition, access to an article’s logical structure is beneficial across term extraction approaches, reducing input noise and facilitating the emphasis of regions of interest. Value. The first part of this work details a novel method for recovering the rhetorical structure of scientific articles that is competitive with state-of-the-art machine learning techniques, yet requires no layout-specific tuning or prior training. The second part showcases a keyphrase extraction algorithm that outperforms other solutions on an established benchmark, yet does not rely on collection statistics or external knowledge sources in order to be proficient.
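A minimal sketch of the second idea, weighting candidate keyphrases by the logical zone in which they occur, could look as follows. The zone weights and the crude candidate pattern are assumptions for illustration, not the algorithm developed in the thesis.

```python
import re
from collections import Counter

# Assumed weights: phrases in the title/abstract count more than body text,
# and bibliography text is ignored entirely.
ZONE_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0, "bibliography": 0.0}

CANDIDATE_RE = re.compile(r"\b[a-z]+(?: [a-z]+){0,2}\b")  # 1-3 word spans

def score_keyphrases(zones: dict[str, str], top_k: int = 5) -> list[tuple[str, float]]:
    """Score candidate phrases by frequency, weighted by the logical zone they appear in."""
    scores = Counter()
    for zone, text in zones.items():
        weight = ZONE_WEIGHTS.get(zone, 1.0)
        for phrase in CANDIDATE_RE.findall(text.lower()):
            scores[phrase] += weight
    return scores.most_common(top_k)

doc = {
    "title": "keyphrase extraction from scientific articles",
    "abstract": "we recover the logical structure and extract keyphrases",
    "body": "keyphrase extraction benefits from structure information",
    "bibliography": "keyphrase extraction survey",
}
print(score_keyphrases(doc))
```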
93

Mejora de proceso de evaluación y co-creación basada en técnica de text-analytics [Improvement of the evaluation and co-creation process based on a text-analytics technique]

Rojas Valenzuela, Manuel Humberto January 2016 (has links)
Master's in Business Engineering with Information Technology / Entrepreneurship worldwide, and particularly in Chile, faces a persistent problem: in addition to the difficulties inherent in starting a business, entrepreneurs need access to risk capital to carry a business idea forward. Considering that roughly 96% of formally registered companies in Chile are micro and small enterprises, which by their size lack direct access to traditional sources of financing, competing for and winning one of the contestable seed capital funds (CORFO, Indap, Capital Abeja or Crece) often determines whether a venture survives or is simply left forgotten for lack of financial resources. CSASESORES is an organisation created in 2011 in recognition of this need, with the goal of being an agent of change that contributes to the growth of start-ups and micro and small enterprises in Chile. In its short history the organisation has itself been awarded one of the entrepreneurship seed capital funds in the Metropolitan Region and has actively contributed to the development of more than 50 business ideas that went on to win SERCOTEC seed capital. To support this initiative, a project was designed to lay the foundations of the growing organisation's business-idea management processes and to implement technological solutions that automate one of the most time-consuming processes: the evaluation, understanding and improvement of ventures that are ultimately submitted to contestable seed capital funds. The preliminary results are encouraging: applying text mining and Latent Semantic Analysis identified around ten clusters, with their themes, during the evaluation of the strengths and weaknesses of seed capital initiatives. In addition, a set of close semantic relationships was uncovered, both among the strengths and among the weaknesses of the evaluated initiatives; these relationships are made visible and documented through the use of Latent Semantic Analysis. / 14/7/2021
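A minimal sketch of the Latent Semantic Analysis clustering described in this abstract, assuming a scikit-learn pipeline of TF-IDF, truncated SVD and k-means; the toy comments, cluster count and dimensionality are illustrative, not the thesis's actual setup.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Stand-in evaluation comments on seed-capital applications (strengths/weaknesses).
evaluations = [
    "strong sales channel but weak financial planning",
    "innovative product, founder lacks management experience",
    "solid cost structure, unclear target market",
]

# TF-IDF -> LSA (truncated SVD) -> length normalisation.
lsa = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2),  # ~100 in practice; kept tiny for the toy corpus
    Normalizer(copy=False),
)
X = lsa.fit_transform(evaluations)

# The abstract reports around ten thematic clusters; 2 suffices for this toy data.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for text, label in zip(evaluations, km.labels_):
    print(label, "-", text)
```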
94

Automatic Protein Function Annotation Through Text Mining

Toonsi, Sumyyah 25 August 2019 (has links)
The knowledge of a protein’s function is essential to many studies in molecular biology, genetic experiments and protein-protein interactions. The Gene Ontology (GO) captures gene products' functions in classes and establishes relationships between them. Manually annotating proteins with GO functions from the biomedical literature is a tedious process which calls for automation. We develop a novel, dictionary-based method to annotate proteins with functions from text. We extract text-based features from words matched against a dictionary of GO. Since classes are included upon any word match with their class description, the number of negative samples outnumbers the positive ones. To mitigate this imbalance, we apply strict rules before weakly labeling the dataset according to the curated annotations. Furthermore, we discard samples of low statistical evidence and train a logistic regression classifier. The results of a 5-fold cross-validation show a high precision of 91% and 96% accuracy in the best-performing fold. The worst fold showed a precision of 80% and an accuracy of 95%. We conclude by explaining how this method can be used for similar annotation problems.
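A sketch of the final classification step, a logistic regression over dictionary-match features evaluated with 5-fold cross-validation, assuming scikit-learn; the synthetic features below are invented stand-ins for the text-based features described in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)

# Toy stand-in features: e.g. fraction of GO-class words matched in the text,
# match span length, and distance to the protein mention.
X = rng.random((200, 3))
# Weak labels: 1 = (protein, GO class) pair supported by curated annotations.
y = (X[:, 0] + 0.3 * X[:, 1] > 0.8).astype(int)

clf = LogisticRegression(class_weight="balanced", max_iter=1000)
scores = cross_validate(clf, X, y, cv=5, scoring=["precision", "accuracy"])

for fold, (p, a) in enumerate(zip(scores["test_precision"], scores["test_accuracy"])):
    print(f"fold {fold}: precision={p:.2f} accuracy={a:.2f}")
```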
95

Quality of SQL Code Security on StackOverflow and Methods of Prevention

Klock, Robert 29 July 2021 (has links)
No description available.
96

A Machine Learning Approach to Predicting Community Engagement on Social Media During Disasters

Alshehri, Adel 01 July 2019 (has links)
The use of social media is expanding significantly and serves a variety of purposes. Over the last few years, users of social media have played an increasing role in disseminating emergency and disaster information. It is becoming more common for affected populations and other stakeholders to turn to Twitter to gather information about a crisis when decisions need to be made and action taken. However, social media platforms, especially Twitter, present some drawbacks when it comes to gathering information during disasters: information overload, messages written in an informal style, and the presence of noise and irrelevant information. These factors make gathering accurate information online very challenging and confusing, which in turn may hinder the ability of the public, communities, and organizations to prepare for, respond to, and recover from disasters. To address these challenges, we present an integrated three-part (clustering-classification-ranking) framework that helps users sift through the mass of Twitter data to find useful information. In the first part, we build standard machine learning models to automatically extract and identify topics present in a text and to derive hidden patterns exhibited by a dataset. In the second part, we develop binary and multi-class classification models of Twitter data that categorize each tweet as relevant or irrelevant and further classify relevant tweets into four types of community engagement: reporting information, expressing negative engagement, expressing positive engagement, and asking for information. In the third part, we propose a binary classification model that categorizes the collected tweets into high- or low-priority tweets. We evaluate the effectiveness of detecting events using a variety of features derived from Twitter posts, namely textual content, term frequency-inverse document frequency, and linguistic, sentiment, psychometric, temporal, and spatial features. Our framework also provides insights for researchers and developers building more robust socio-technical systems for identifying types of online community engagement and ranking high-priority tweets in disaster situations.
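A minimal sketch of the second part, classifying tweets into the four engagement types named in this abstract, assuming scikit-learn and TF-IDF features only; the example tweets and classifier choice are illustrative, not the models built in the thesis.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data labelled with the four engagement types from the abstract.
tweets = [
    "Highway 41 is closed due to flooding near the bridge",    # reporting information
    "This storm is terrifying, we lost power hours ago",        # negative engagement
    "Huge thanks to the volunteers handing out water",          # positive engagement
    "Does anyone know if the shelter on 5th street is open?",   # asking for information
]
labels = ["reporting", "negative", "positive", "asking"]

# TF-IDF features feeding a multi-class logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(tweets, labels)

print(clf.predict(["Is the evacuation route still open?"]))
```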
97

Interpretability for Deep Learning Text Classifiers

Lucaci, Diana 14 December 2020 (has links)
The ubiquitous presence of automated decision-making systems whose performance is comparable to that of humans has brought attention to the necessity of interpretability for the generated predictions. Whether the goal is predicting the system’s behavior when the input changes, building user trust, or assisting experts in improving the machine learning methods, interpretability is paramount when the problem is not sufficiently validated in real applications and when unacceptable results lead to significant consequences. While for humans there are no standard interpretations for the decisions they make, the complexity of systems with advanced information-processing capacities conceals the detailed explanations for individual predictions, encapsulating them under layers of abstraction and complex mathematical operations. Interpretability for deep learning classifiers thus becomes a challenging research topic where the ambiguity of the problem statement allows for multiple exploratory paths. Our work focuses on generating natural language interpretations for individual predictions of deep learning text classifiers. We propose a framework for extracting and identifying the phrases of the training corpus that influence the prediction confidence the most, through unsupervised key phrase extraction and neural predictions. We assess the contribution margin that the added justification has when the deep learning model predicts the class probability of a text instance, by introducing and defining a contribution metric that quantifies the fidelity of the explanation to the model. We assess both the performance impact of the proposed approach on the classification task, as a quantitative analysis, and the quality of the generated justifications, through extensive qualitative and error analysis. This methodology manages to capture the most influential phrases of the training corpus as explanations that reveal the linguistic features used for individual test predictions, allowing humans to predict the behavior of the deep learning classifier.
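The contribution metric mentioned in this abstract can be illustrated with a small sketch that compares the model's class probability for a text with and without a candidate explanation phrase. The masking strategy and the toy probability function below are assumptions for illustration.

```python
from typing import Callable

def phrase_contribution(predict_proba: Callable[[str], float],
                        text: str, phrase: str) -> float:
    """Drop in predicted class probability when the phrase is removed from the text.

    A large positive value suggests the phrase drives the prediction, so it is
    a faithful candidate explanation for this instance.
    """
    masked = text.replace(phrase, "")
    return predict_proba(text) - predict_proba(masked)

# Toy stand-in for a trained classifier's probability of the "positive" class.
def toy_predict_proba(text: str) -> float:
    return 0.9 if "highly recommend" in text else 0.4

review = "I would highly recommend this laptop for students"
print(phrase_contribution(toy_predict_proba, review, "highly recommend"))  # 0.5
```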
98

Automated Extraction Of Associations Between Methylated Genes and Diseases From Biomedical Literature

Bin Res, Arwa A. 12 1900 (has links)
Associations between methylated genes and diseases have been investigated in several studies, and it is critical to have such information available for a better understanding of diseases and for clinical decisions. However, such information is scattered across a large number of electronic publications, and it is difficult to search for it manually. Therefore, the goal of this project is to develop a machine learning model that can efficiently extract such information. Twelve machine learning algorithms were applied and compared on this problem using three approaches: document-term frequency matrices, position weight matrices, and a hybrid approach that combines the previous two. The best results were obtained by the hybrid approach with a random forest model that, in a 10-fold cross-validation, achieved an F-score and accuracy of nearly 85% and 84%, respectively. On a completely separate testing set, an F-score and accuracy of 89% and 88%, respectively, were obtained. Based on this model, we developed a tool that automates the extraction of associations between methylated genes and diseases from electronic text. Our study contributes an efficient method for extracting specific types of associations from free text, and the methodology developed here can be extended to other similar association extraction problems.
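A sketch of the hybrid approach under the assumption that the two feature sets (document-term frequencies and position-weight-matrix scores) are simply concatenated before a random forest; the sentences, labels and stand-in PWM scores are synthetic and purely illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score

sentences = [
    "promoter hypermethylation of MLH1 is associated with colorectal cancer",
    "BRCA1 methylation was frequently observed in ovarian tumours",
    "the patient cohort was recruited between 2005 and 2010",
    "no methylation difference was detected for this gene in controls",
] * 25
labels = np.array([1, 1, 0, 0] * 25)  # 1 = sentence states a gene-disease association

# Feature set 1: document-term frequencies.
dtm = CountVectorizer().fit_transform(sentences)
# Feature set 2: stand-in for position-weight-matrix scores (one toy column here).
pwm = csr_matrix(np.array([[len(s.split())] for s in sentences], dtype=float))

X = hstack([dtm, pwm]).tocsr()  # hybrid representation
scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                         X, labels, cv=10, scoring="f1")
print(f"mean F1 across 10 folds: {scores.mean():.2f}")
```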
99

Intelligent Prediction of Stock Market Using Text and Data Mining Techniques

Raahemi, Mohammad 04 September 2020 (has links)
The stock market undergoes many fluctuations on a daily basis. These changes can be challenging to anticipate. Understanding such volatility is beneficial to investors, as it empowers them to make informed decisions to avoid losses and to invest when opportunities to earn are predicted. The objective of this research is to use text mining and data mining techniques to discover the relationship between news articles and stock price fluctuations. There are a variety of sources for news articles, including Bloomberg, Google Finance, Yahoo Finance, Factiva, Thomson Reuters, and Twitter. In our research, we use the Factiva and Intrinio news databases. These databases provide daily analytical articles about the general stock market, as well as daily changes in stock prices. The focus of this research is on understanding the news articles which influence stock prices. We believe that different types of stocks in the market behave differently, and news articles could provide indications of different stock price movements. The goal of this research is to create a framework that uses text mining and data mining algorithms to correlate different types of news articles with stock fluctuations to predict whether to “Buy”, “Sell”, or “Hold” a specific stock. We train Doc2Vec models on 1GB of financial news from Factiva to convert news articles into vectors of 100 dimensions. After preprocessing the data, including labeling and balancing, we build five predictive models, namely Neural Networks, SVM, Decision Tree, KNN, and Random Forest, to predict stock movements (Buy, Sell, or Hold). We evaluate the performance of the predictive models in terms of accuracy and area under the ROC curve. We conclude that SVM provides the best performance among the five models for predicting stock movement.
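A minimal sketch of the described pipeline, assuming gensim's Doc2Vec and scikit-learn's SVC; the tiny corpus, labels and hyperparameters other than the 100-dimensional vectors are illustrative.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.svm import SVC

# Toy labelled news snippets; the thesis trained on ~1GB of Factiva articles.
news = [
    ("company beats earnings expectations and raises guidance", "Buy"),
    ("regulator opens investigation into accounting practices", "Sell"),
    ("quarterly results in line with analyst forecasts", "Hold"),
    ("new product launch drives record pre-orders", "Buy"),
    ("chief executive resigns amid profit warning", "Sell"),
    ("no significant news, trading volume unchanged", "Hold"),
]

# 100-dimensional document embeddings, as in the abstract.
corpus = [TaggedDocument(text.split(), [i]) for i, (text, _) in enumerate(news)]
d2v = Doc2Vec(corpus, vector_size=100, min_count=1, epochs=50, seed=0)

X = [d2v.infer_vector(text.split()) for text, _ in news]
y = [label for _, label in news]

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([d2v.infer_vector("earnings beat expectations again".split())]))
```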
100

Analyzing and evaluating security features in software requirements

Hayrapetian, Allenoush 28 October 2016 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Software requirements for complex projects often contain specifications of non-functional attributes (e.g., security-related features). The process of analyzing such requirements for standards compliance is laborious and error-prone. Due to the inherent free-flowing nature of software requirements, it is tempting to apply Natural Language Processing (NLP) and Machine Learning (ML) based techniques for analyzing these documents. In this thesis, we propose a novel semi-automatic methodology that assesses the security requirements of a software system with respect to completeness and ambiguity, creating a bridge between the requirements documents and compliance. Security standards, e.g., those introduced by ISO and OWASP, are compared against annotated software project documents for textual entailment relationships (NLP), and the results are used to train a neural network model (ML) for classifying security-based requirements. Hence, this approach aims to identify the appropriate structures that underlie software requirements documents. Once such structures are formalized and empirically validated, they will provide guidelines to software organizations for generating comprehensive and unambiguous requirements specification documents for security-oriented features. The proposed solution will assist organizations during the early phases of developing secure software and reduce overall development effort and costs.
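The compare-then-classify idea can be sketched as follows, with simple lexical-overlap features standing in for the textual entailment component and a small feed-forward network as the classifier; both substitutions, and all of the example clauses and requirements, are assumptions for illustration rather than the thesis's models.

```python
from sklearn.neural_network import MLPClassifier

# Hypothetical security-standard clauses the requirements are compared against.
SECURITY_CLAUSES = [
    "all user passwords shall be stored using a salted hash",
    "the system shall lock an account after repeated failed login attempts",
    "all communication shall be encrypted in transit",
]

def overlap_features(requirement: str) -> list[float]:
    """Word-overlap score against each standard clause (entailment stand-in)."""
    req = set(requirement.lower().split())
    return [len(req & set(c.split())) / len(set(c.split())) for c in SECURITY_CLAUSES]

requirements = [
    "passwords must be stored as salted hashes",
    "the UI shall use the corporate colour palette",
    "data sent to the server shall be encrypted",
    "reports can be exported as PDF",
]
labels = [1, 0, 1, 0]  # 1 = security-related requirement

X = [overlap_features(r) for r in requirements]
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, labels)
print(clf.predict([overlap_features("traffic between client and server must be encrypted")]))
```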
