The increasing volume and complexity of text data are challenging the cognitive capabilities of expert analysts. Machine learning and crowdsourcing present new opportunities for large-scale sensemaking, but it remains a challenge to model the overall process so that many distributed agents can contribute to suitable components asynchronously and meaningfully. In this work, I explore how to crowdsource sensemaking for intelligence analysis. Specifically, I focus on the complex processes that include developing hypotheses and theories from a raw dataset and iteratively refining the analysis. I first developed Connect the Dots, a web application that implements the concept of "context slices" and supports novice crowds in building relationship networks for exploratory analysis. Then I developed CrowdIA, a software platform that implements the entire crowd sensemaking pipeline and the context slicing for each step, to enable unsupervised crowd sensemaking. Using the pipeline as a testbed, I probed the errors and bottlenecks in crowdsourced sensemaking,and suggested design recommendations for integrated crowdsourcing systems. Building on these insights and to support iterative crowd sensemaking, I developed the concept of "crowd auditing" in which an auditor examines a pipeline of crowd analyses and diagnoses the problems to steer future refinement. I explored the design space to support crowd auditing and developed CrowdTrace, a crowd auditing tool that enables novice auditors to effectively identify the important problems with the crowd analysis and create microtasks for crowd workers to fix the problems.The core contributions of this work include a pipeline that enables distributed crowd collaboration to holistic sensemaking processes, two novel concepts of "context slices" and "crowd auditing", web applications that support crowd sensemaking and auditing, as well as design implications for crowd sensemaking systems. The hope is that the crowd sensemaking pipeline can serve to accelerate research on sensemaking, and contribute to helping people conduct in-depth investigations of large collections of information. / Doctor of Philosophy / In today's world, we have access to large amounts of data that provide opportunities to solve problems at unprecedented depths and scales. While machine learning offers powerful capabilities to support data analysis, to extract meaning from raw data is cognitively demanding and requires significant person-power. Crowdsourcing aggregates human intelligence, yet it remains a challenge for many distributed agents to collaborate asynchronously and meaningfully.
The contribution of this work is to explore how to use crowdsourcing to make sense of the copious and complex data. I first implemented the concept of ``context slices'', which split up complex sensemaking tasks by context, to support meaningful division of work. I developed a web application, Connect the Dots, which generates relationship networks from text documents with crowdsourcing and context slices. Then I developed a crowd sensemaking pipeline based on the expert sensemaking process. I implemented the pipeline as a web platform, CrowdIA, which guides crowds to solve mysteries without expert intervention. Using the pipeline as a testbed, I probed the errors and bottlenecks in crowd sensemaking and provided design recommendations for crowd intelligence systems. Finally, I introduced the concept of ``crowd auditing'', in which an auditor examines a pipeline of crowd analyses and diagnoses the problems to steer a top-down path of the pipeline and refine the crowd analysis. The hope is that the crowd sensemaking pipeline can serve to accelerate research on sensemaking, and contribute to helping people conduct in-depth investigations of large collections of data.
Identifer | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/99937 |
Date | 28 July 2020 |
Creators | Li, Tianyi |
Contributors | Computer Science, North, Christopher L., Luther, Kurt, Convertino, Gregorio, Kavanaugh, Andrea L., Wang, Gang Alan |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Page generated in 0.0023 seconds