Machine learning-based approaches to data quality improvement in mobile crowdsensing and crowdsourcing

With the widespread adoption of smart devices such as smartphones, smartwatches, and smart cameras, Mobile Crowdsensing (MCS) and Crowdsourcing (CS) have been broadly applied for collecting data from large groups of ordinary participants. The quality of participants' contributed data, however, is hard to guarantee, so it is critical to develop efficient and effective methods that automatically improve data quality on MCS/CS platforms. In this thesis, we propose three machine learning-based solutions for data quality enhancement in different participatory MCS/CS scenarios. Our solutions target both the data extraction phase and the data collection phase of participatory MCS/CS, and cover (1) trustworthy information extraction from conflicting data, (2) recognition of learning patterns, and (3) worker recruitment based on interactive training and learning pattern extraction. The first is designed for the data extraction phase and the other two for the data collection phase.

First, to derive reliable data from diverse or even conflicting labels from the crowd, we design a mechanism to infuse knowledge from domain experts into the labels from the crowd to automatically make correct decisions on classification-based MCS tasks. Our solution, named EFusion, utilizes a probabilistic graphical model and the expectation maximization (EM) algorithm to infer the most likely expertise level of each crowd worker, the difficulty level of tasks, and the ground truth answers. Furthermore, we introduce a method to extend EFusion from solving binary classification problems to handling multi-class classification problems. We evaluate EFusion using real-world case studies as well as simulations. Evaluation results demonstrate that EFusion can return more accurate and stable classification results than the majority voting method and state-of-the-art methods.
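
To make the EM idea concrete, the following is a minimal sketch of a much simpler model than EFusion: a "one-coin" worker model for binary labels, in which each worker has a single accuracy and task difficulty and expert knowledge are ignored. The function name `em_binary_labels`, the uniform prior, and the clipping bounds are assumptions made for illustration and do not come from the thesis.

```python
def em_binary_labels(labels, n_iter=50):
    """Jointly estimate task truths and worker accuracies with EM.

    labels: dict mapping (worker, task) -> 0 or 1.
    Returns (post, acc): post[task] is the estimated P(truth == 1),
    acc[worker] is the estimated probability that the worker is correct.
    Simplified one-coin model; NOT the EFusion model from the thesis.
    """
    workers = sorted({w for w, _ in labels})
    tasks = sorted({t for _, t in labels})

    # Initialise each task posterior with its vote share (soft majority vote).
    post = {}
    for t in tasks:
        votes = [v for (_, tt), v in labels.items() if tt == t]
        post[t] = sum(votes) / len(votes)

    acc = {w: 0.5 for w in workers}
    for _ in range(n_iter):
        # M-step: accuracy = expected share of a worker's labels that are correct.
        for w in workers:
            agree, total = 0.0, 0
            for (ww, t), v in labels.items():
                if ww == w:
                    agree += post[t] if v == 1 else 1.0 - post[t]
                    total += 1
            # Clip so that no likelihood term below becomes exactly zero.
            acc[w] = min(max(agree / total, 0.01), 0.99)

        # E-step: posterior of truth == 1 under a uniform prior.
        for t in tasks:
            like1 = like0 = 1.0
            for (w, tt), v in labels.items():
                if tt == t:
                    like1 *= acc[w] if v == 1 else 1.0 - acc[w]
                    like0 *= acc[w] if v == 0 else 1.0 - acc[w]
            post[t] = like1 / (like1 + like0)
    return post, acc
```

Unlike plain majority voting, this sketch weights each worker's label by the accuracy inferred for that worker, so a consistently wrong worker ends up with a low estimated accuracy and little influence on the result.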

Second, we propose Goldilocks, an interactive learning pattern recognition framework that can identify suitable participants whose performance follows desired learning patterns. To accurately extract a participant's learning pattern, we first estimate the impact of previous training questions on the participant before she answers a new question. After the participant answers each new question, we adjust the estimate of her capability by considering a quantitative measure of the impact of previous questions and her answer to the new question. Based on the extracted learning curve of each participant, we recruit the candidates who have shown good learning capability and desired learning patterns for the formal MCS/CS task. We further develop a web service over Amazon Web Services (AWS) that automatically adjusts questions to maximize individual participants' learning performance. This web service also profiles the participants' learning patterns, which can be used for task assignment in MCS/CS.
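
As a rough illustration of updating a capability estimate after every answer, the sketch below uses an exponentially decayed running accuracy, so that older questions influence the estimate less than recent ones. This is a stand-in technique and not the Goldilocks estimator; the names `learning_curve` and `shows_learning` and the parameters `decay` and `min_gain` are hypothetical.

```python
def learning_curve(answers, decay=0.8):
    """Return one capability estimate per answered question.

    answers: sequence of truthy/falsy values (correct / incorrect), in order.
    decay:   assumed factor in (0, 1]; the weight of each earlier answer
             shrinks by this factor with every new question.
    """
    curve = []
    weighted_correct = 0.0
    weight_total = 0.0
    for correct in answers:
        # Discount the accumulated history, then add the newest answer.
        weighted_correct = decay * weighted_correct + (1.0 if correct else 0.0)
        weight_total = decay * weight_total + 1.0
        curve.append(weighted_correct / weight_total)
    return curve


def shows_learning(curve, min_gain=0.2):
    """Illustrative recruitment rule: the mean estimate over the second half
    of the curve must exceed the mean over the first half by min_gain."""
    half = len(curve) // 2
    if half == 0:
        return False
    early = sum(curve[:half]) / half
    late = sum(curve[half:]) / (len(curve) - half)
    return late - early >= min_gain
```

For example, `shows_learning(learning_curve([0, 0, 1, 0, 1, 1, 1, 1]))` accepts a participant who improves over the session, while a participant whose estimate stays flat is not flagged as showing a learning pattern under this particular rule.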

Third, we present HybrTraining, a hybrid deep learning framework that captures each candidate’s capability from a long-term perspective and excludes undesired candidates early in the training phase. Using two collaborative deep learning networks, HybrTraining can dynamically match participants and MCS/CS tasks. In detail, we build a deep Q-network (DQN) to match candidates with training batches in the training phase, and develop a long short-term memory (LSTM) model that extracts the learning patterns of different candidates and helps the DQN make better worker-task matching decisions. We build HybrTraining on Compute Canada and evaluate it over two scientific datasets. For each dataset, the learning data of candidates is collected with a Python-based Django website over Amazon Elastic Compute Cloud (Amazon EC2). Evaluation results show that HybrTraining can increase data collection efficiency and improve data quality in MCS/CS. / Graduate / 2022-08-19

Identifier: oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/13385
Date: 13 September 2021
Creators: Jiang, Jinghan
Contributors: Wu, Kui
Source Sets: University of Victoria
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
Rights: Available to the World Wide Web