Product Defect Discovery and Summarization from Online User Reviews

Product defects concern various groups of people, such as customers, manufacturers, and government officials. Thus, defect-related knowledge and information are essential. With the growth of social media, online forums, and Internet commerce, people post a vast amount of feedback on products, which forms a good source for the automatic acquisition of knowledge about defects. However, given the volume of online reviews, automatically identifying critical product defects and summarizing the related information is challenging, even when we target only the negative reviews. As a kind of opinion mining research, existing defect discovery methods mainly focus on classifying the type of product issue, which is not enough for users. People expect to see defect information in multiple facets, such as product model, component, and symptom, which are necessary to understand the defects and quantify their influence. In addition, people are eager to seek problem resolutions once they spot defects. These challenges cannot be solved by existing aspect-oriented opinion mining models, which seldom consider the defect entities mentioned above. Furthermore, users also want summaries that better capture the semantics of review text and describe product defects more accurately in natural language sentences. However, existing text summarization models, including neural networks, can hardly generalize to user review summarization due to the lack of labeled data.

In this research, we explore topic models and neural network models for product defect discovery and summarization from user reviews. First, we propose a generative Probabilistic Defect Model (PDM), which models the generation of user reviews from key defect entities including product Model, Component, Symptom, and Incident Date. Using the joint topics over these aspects produced by PDM, people can discover defects, each represented by those entities. Second, we devise a Product Defect Latent Dirichlet Allocation (PDLDA) model, which describes how negative reviews are generated from defect elements such as Component, Symptom, and Resolution. PDLDA also models the interdependency between these entities. It answers not only what the defects look like, but also how to address them using the crowd wisdom hidden in user reviews. Finally, we study how to summarize user reviews more accurately, and better capture their semantics, using deep neural networks, especially Hierarchical Encoder-Decoder Models.

For each of the research topics, comprehensive evaluations on heterogeneous datasets are conducted to demonstrate the effectiveness and accuracy of the proposed models. On the theoretical side, this research contributes to the research streams on product defect discovery, opinion mining, probabilistic graphical models, and deep neural network models. Regarding impact, these techniques will benefit related users such as customers, manufacturers, and government officials.

Ph. D.

Product defects concern various groups of people, such as customers, manufacturers, and government officials. Thus, defect-related knowledge and information are essential. With the growth of social media, online forums, and Internet commerce, people post a vast amount of feedback on products, which forms a good source for the automatic acquisition of knowledge about defects. However, given the volume of online reviews, automatically identifying critical product defects and summarizing the related information is challenging, even when we target only the negative reviews. People expect to see defect information in multiple facets, such as product model, component, and symptom, which are necessary to understand the defects and quantify their influence. In addition, people are eager to seek problem resolutions once they spot defects. Furthermore, users also want product defects summarized more accurately in the form of natural language sentences. These requirements cannot be satisfied by existing methods, which seldom consider the defect entities mentioned above and hardly generalize to user review summarization.

In this research, we develop novel Machine Learning (ML) algorithms for product defect discovery and summarization. First, we study how to identify product defects and their related attributes, such as Product Model, Component, Symptom, and Incident Date. Second, we devise a novel algorithm that can discover product defects and the related Component, Symptom, and Resolution from online user reviews. This method tells not only what the defects look like, but also how to address them using the crowd wisdom hidden in user reviews. Finally, we address the problem of how to summarize user reviews in the form of natural language sentences using a paraphrase-style method. On the theoretical side, this research contributes to multiple research areas in Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning. Regarding impact, these techniques will benefit related users such as customers, manufacturers, and government officials.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/85581
Date: 29 October 2018
Creators: Zhang, Xuan
Contributors: Computer Science, Fan, Weiguo, Fox, Edward A., Rozovskaya, Alla, Zhang, Zhongju, Huang, Bert, Wang, Gang Alan
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/