Health Information Extraction from Social Media

abstract: Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be used for public health monitoring tasks such as pharmacovigilance by applying Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition (NER), where mentions of entities such as diseases are located in text and their entity types are identified. However, the language in social media is highly informal, and user-expressed health-related concepts are often non-technical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and advanced machine learning-based NLP techniques have been underutilized. This work explores the effectiveness of different machine learning techniques, particularly deep learning, in addressing the challenges associated with extracting health-related concepts from social media. Deep learning has recently attracted a lot of attention in machine learning research and has shown remarkable success in several applications, particularly imaging and speech recognition. Thus far, however, deep learning techniques remain relatively unexplored for biomedical text mining; in particular, this is the first attempt at applying deep learning to health information extraction from social media.

This work presents ADRMine, which uses a Conditional Random Field (CRF) sequence tagger to extract complex health-related concepts. It uses a large volume of unlabeled user posts to automatically learn embedding cluster features, a novel application of deep learning to modeling the similarity between tokens. ADRMine significantly improved medical NER performance compared to the baseline systems.

This work also presents DeepHealthMiner, a deep learning pipeline for health-related concept extraction. Most machine learning methods require sophisticated, task-specific manual feature design, which is a challenging step in processing the informal and noisy content of social media. DeepHealthMiner automatically learns classification features using neural networks and a large volume of unlabeled user posts. Using a relatively small labeled training set, DeepHealthMiner could accurately identify most of the concepts, including consumer expressions not observed in the training data or in standard medical lexicons, and outperformed the state-of-the-art baseline techniques.

Dissertation/Thesis, Doctoral Dissertation, Biomedical Informatics, 2016

http://hdl.handle.net/2286/R.I.40354

Bioinformatics

Artificial intelligence

Public health

Deep Learning

Information Extraction

Machine Learning

Natural Language Processing

Pharmacovigilance

Social Media Mining

Identifier	oai:union.ndltd.org:asu.edu/item:40354
Date	January 2016
Contributors	Nikfarjam, Azadeh (Author), Gonzalez, Graciela (Advisor), Greenes, Robert (Committee member), Scotch, Matthew (Committee member), Arizona State University (Publisher)
Source Sets	Arizona State University
Language	English
Detected Language	English
Type	Doctoral Dissertation
Format	105 pages
Rights	http://rightsstatements.org/vocab/InC/1.0/, All Rights Reserved

