271 |
Designing and Evaluating Object-Level Interaction to Support Human-Model Communication in Data Analysis. Self, Jessica Zeitz, 09 May 2016
High-dimensional data appear in every domain, and they are challenging to explore. As the number of dimensions in a dataset increases, it becomes harder to discover patterns and develop insights. Data analysis and exploration is an important skill given the amount of data collected in every field of work, but learning this skill is difficult without an understanding of high-dimensional data. Users naturally tend to characterize data in simplistic one-dimensional terms using metrics such as mean, median, and mode, yet real-world data is more complex. To gain the most insight from data, users need to recognize and create high-dimensional arguments. Data exploration methods can encourage thinking beyond traditional one-dimensional insights. Dimension reduction algorithms, such as multidimensional scaling, support data exploration by reducing datasets to two dimensions for visualization. Because these algorithms rely on underlying parameterizations, they can be manipulated to assess the data from multiple perspectives. Such manipulation is difficult for users without strong knowledge of the underlying algorithms. Visual analytics tools that afford object-level interaction (OLI) allow for the generation of more complex insights, despite inexperience with multivariate data or the underlying algorithm.
The goal of this research is to develop and test variations on types of interaction for interactive visual analytic systems that enable users to tweak model parameters, directly or indirectly, so that they may explore high-dimensional data. To study interactive data analysis, we present an interface, Andromeda, that enables non-experts in statistical modeling to explore domain-specific, high-dimensional data. This application implements interactive weighted multidimensional scaling (WMDS) and allows for both parametric and observation-level interaction to provide in-depth data exploration.
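As a rough illustration of the model underlying this kind of interface (not Andromeda's actual implementation), a weighted MDS layout can be sketched with classical (Torgerson) scaling over weighted Euclidean distances; the function name and API below are hypothetical:

```python
import numpy as np

def weighted_mds(X, weights, n_components=2):
    """Classical (Torgerson) MDS over weighted Euclidean distances.
    `weights` gives the relative importance of each data dimension,
    which is the parameter WMDS exposes for interaction."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                 # normalize dimension weights
    diff = X[:, None, :] - X[None, :, :]            # pairwise differences
    D2 = (w * diff ** 2).sum(axis=-1)               # squared weighted distances
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ D2 @ J                           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]     # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

Raising a dimension's weight stretches the layout along differences in that dimension, which corresponds to the parametric interaction described above; object-level interaction works in the inverse direction, inferring weights from where the user places points.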
We performed multiple user studies to examine how parametric and object-level interaction aid in data analysis. In each study we found usability issues and designed solutions for the next. Each critique uncovered design principles for effective, interactive, visual analytic tools. The final part of this research presents these principles, supported by the results of our multiple informal and formal usability studies. The established design principles focus on human-centered usability for developing interactive visual analytic systems that enable users to analyze high-dimensional data through object-level interaction. / Ph. D.
|
272 |
Compressive Sensing Approaches for Sensor-Based Predictive Analytics in Manufacturing and Service Systems. Bastani, Kaveh, 14 March 2016
Recent advancements in sensing technologies offer new opportunities for quality improvement and assurance in manufacturing and service systems. These sensor advances provide vast amounts of data, supporting quality improvement decisions such as fault diagnosis (root cause analysis) and real-time process monitoring. These decisions are typically made based on predictive analysis of the sensor data, so-called sensor-based predictive analytics. Sensor-based predictive analytics encompasses a variety of statistical, machine learning, and data mining techniques that identify patterns between the sensor data and historical facts. Given these patterns, predictions are made about the quality state of the process, and corrective actions are taken accordingly.
Although recent advances in sensing technologies have facilitated quality improvement decisions, they typically produce high-dimensional sensor data, making the use of sensor-based predictive analytics challenging due to its inherently intensive computation. This research begins in Chapter 1 by raising an interesting question: are all these sensor data required for making effective quality improvement decisions, and if not, is there a way to systematically reduce the number of sensors without affecting the performance of the predictive analytics? Chapter 2 addresses this question by reviewing related research in the area of signal processing, namely compressive sensing (CS), a novel sampling paradigm that departs from the traditional sampling strategy following the Shannon-Nyquist rate. By CS theory, a signal can be reconstructed from a reduced number of samples, which motivates developing CS-based approaches to facilitate predictive analytics using a reduced number of sensors. The proposed research methodology in this dissertation encompasses CS approaches developed to deliver two major contributions: (1) CS sensing to reduce the number of sensors while capturing the most relevant information, and (2) CS predictive analytics to conduct predictive analysis on the reduced sensor data.
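To illustrate the CS claim that a sparse signal can be recovered from far fewer measurements than the Shannon-Nyquist rate suggests, here is a minimal sketch using orthogonal matching pursuit, a standard CS decoder; it is purely illustrative and not the dissertation's own algorithm:

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse signal x
    from compressed linear measurements y = Phi @ x."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef          # orthogonalize residual
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k = 100, 60, 5                         # signal length, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)  # k-sparse signal
Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
y = Phi @ x                                  # only m < n measurements taken
x_hat = omp(Phi, y, k)
```

In the sensor-reduction setting described above, the rows of the sensing matrix play the role of the retained sensors: far fewer measurements than signal coefficients still suffice to reconstruct the sparse signal.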
The proposed methodology has a generic framework that can be utilized for numerous real-world applications. However, for the sake of brevity, its validity has been verified with real sensor data from multi-station assembly processes (Chapters 3 and 4), additive manufacturing (Chapter 5), and wearable sensing systems (Chapter 6). Chapter 7 summarizes the contributions of the research and outlines potential future research directions with applications to big data analytics. / Ph. D.
|
273 |
Towards Support of Visual Analytics for Synthetic Information. Agashe, Aditya Vidyanand, 15 September 2015
This thesis describes a scalable system for visualizing and exploring global synthetic populations. The implementation described here addresses the following limitations of the Synthetic Information Viewer (SIV): (i) it adds the ability to support synthetic populations for the entire globe by resolving data inconsistencies, (ii) it introduces opportunities to explore and find patterns in the data, and (iii) it allows the addition of new synthetic population centers with minimal effort. We propose the following extensions to the system: (i) a Data Registry, an abstraction layer for handling the heterogeneity of data across countries and for adding new population centers for visualization, and (ii) a Visual Query Interface for exploring and analyzing patterns to gain insights. With these additions, our system is capable of visual exploration and querying of heterogeneous, temporal, spatial, and social data for 14 countries with a total population of 830 million. The work in this thesis takes a step towards providing visual analytics capability for synthetic information. This system will assist urban planners, public health analysts, and any individuals interested in socially-coupled systems by empowering them to make informed decisions through exploration of synthetic information. / Master of Science
|
274 |
Efficient Spatio-Temporal Network Analytics in Epidemiological Studies using Distributed Databases. Khan, Mohammed Saquib Akmal, 26 January 2015
Real-time spatio-temporal analytics has become an integral part of epidemiological studies. The size of spatio-temporal data has been increasing tremendously over the years, gradually evolving into Big Data. Processing in such domains is highly data- and compute-intensive, and high-performance computing resources are actively being used to handle such workloads over massive datasets. This confluence of high-performance computing and datasets with Big Data characteristics poses great challenges in data handling and processing. The resource management of supercomputers is in conflict with the data-intensive nature of spatio-temporal analytics, which is further exacerbated by the fact that data management is decoupled from the computing resources. Problems of this nature have provided great opportunities for the growth and development of tools and concepts centered around MapReduce-based solutions. However, we believe that advanced relational concepts can still be employed to provide an effective solution to these issues and challenges.
In this study, we explore distributed databases to efficiently handle spatio-temporal Big Data for epidemiological studies. We propose DiceX (Data Intensive Computational Epidemiology using supercomputers), which couples high-performance, Big Data, and relational computing by embedding distributed data storage and processing engines within the supercomputer. It is characterized by scalable strategies for data ingestion, a unified framework to set up and configure various processing engines, and the ability to pause, materialize, and restore images of a data session. In addition, we have successfully configured DiceX to support approximation algorithms from the MADlib Analytics Library [54], primarily the Count-Min Sketch (CM Sketch) [33][34][35].
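For readers unfamiliar with CM Sketch, the data structure can be sketched in a few lines. This toy version (hash choice and table sizes are arbitrary) only illustrates the idea of approximate counting in sublinear space, not MADlib's implementation:

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: approximate frequency counts in sublinear
    space. Queries can only overestimate, never underestimate, and the
    error is bounded with high probability by the table dimensions."""
    def __init__(self, width=1000, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        # One independent-ish hash per row, derived from a keyed digest.
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._hash(item, row)] += count

    def query(self, item):
        # Min across rows bounds the effect of hash collisions.
        return min(self.table[row][self._hash(item, row)]
                   for row in range(self.depth))
```

In a setting like the spatio-temporal queries above, such a sketch can report approximate item frequencies over a massive stream without storing per-item counters.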
DiceX enables a new style of Big Data processing, centered around the use of clustered databases, that exploits supercomputing resources. It can effectively exploit the cores, memory, and compute nodes of supercomputers to scale the processing of spatio-temporal queries on large datasets. It thus provides a scalable and efficient tool for the management and processing of spatio-temporal data. Although DiceX has been designed for computational epidemiology, it can easily be extended to other data-intensive domains facing similar issues and challenges.
We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by DTRA CNIMS Contract HDTRA1-11-D-0016-0001, DTRA Validation Grant HDTRA1-11-1-0016, NSF - Network Science and Engineering Grant CNS-1011769, NIH and NIGMS - Models of Infectious Disease Agent Study Grant 5U01GM070694-11.
Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government. / Master of Science
|
275 |
Data-Driven Park Planning: Comparative Study of Survey with Social Media Data. Sim, Jisoo, 05 May 2020
The purpose of this study was (1) to identify visitors’ behaviors in and perceptions of linear parks, (2) to identify social media users’ behaviors in and perceptions of linear parks, and (3) to compare small data with big data. This chapter discusses the main findings and their implications for practitioners such as landscape architects and urban planners. It has three sections. The first addresses the main findings in the order of the research questions at the center of the study. The second describes implications and recommendations for practitioners. The final section discusses the limitations of the study and suggests directions for future work.
This study compares two methods of data collection, focusing on activities and benefits. The survey asked respondents to check all the activities they did in the park, while social media users' activities were detected by term frequency in social media data. Both results ordered the activities similarly. For example, social interaction and art viewing were most popular on the High Line, then the 606, then the High Bridge, according to both methods. Both methods also reported that High Line visitors engaged most in viewing from overlooks. As for benefits, both methods found that visitors to the 606 were more satisfied than High Line visitors with the parks' social and natural benefits. These results suggest social media analytics can replace surveys when the textual information is sufficient for analysis.
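As an illustration of the kind of term-frequency detection described above (the activity lexicon, example posts, and function are hypothetical, not the study's actual instrument):

```python
from collections import Counter
import re

# Hypothetical lexicon mapping each park activity to indicative terms.
ACTIVITY_TERMS = {
    "social interaction": {"friends", "meet", "together", "party"},
    "art viewing": {"art", "mural", "sculpture", "gallery"},
    "walking": {"walk", "stroll", "jog"},
}

def activity_frequencies(posts):
    """Count how often each activity's indicative terms appear in posts."""
    counts = Counter()
    for post in posts:
        tokens = set(re.findall(r"[a-z']+", post.lower()))
        for activity, terms in ACTIVITY_TERMS.items():
            counts[activity] += len(tokens & terms)
    return counts

posts = ["Met friends at the High Line to see the new mural",
         "A quiet stroll past the sculpture garden"]
print(activity_frequencies(posts).most_common())
```

Ranking activities by these counts yields the kind of ordering that the study compares against survey checkboxes.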
Social media analytics also differs from surveys in the accuracy of its results. For example, social media revealed that 606 users were interested in events and worried about housing prices and crime, facts the pre-designed survey could not capture. Social media analytics can also catch hidden and more general information: through cluster analysis, we found possible reasons for the High Line's success in the arts and in New York City itself. These results involve general information that would be hard to identify through a survey.
On the other hand, surveys provide specific information and can describe visitors' demographics, motivations, travel information, and specific benefits. For example, 606 users tend to be young, high-income, well-educated, white, and female. These data cannot be collected through social media. / Doctor of Philosophy / Turning unused infrastructure into green infrastructure, such as linear parks, is not a new approach to managing brownfields. In the last few decades, changes in industrial structure and the development of transportation have had a profound effect on urban spatial structure. As the need for infrastructure that played an important role in the development of past industry has decreased, many industrial sites, power plants, and military bases have fallen out of use. This study identifies new ways of collecting information about a new type of park, linear parks, using a new method, social media analytics. The results are then compared with survey results to establish the credibility of social media analytics. Lastly, shortcomings of social media analytics are identified. This study is meaningful in helping us understand the users of new types of parks and suggesting design and planning strategies. Regarding methodology, this study also evaluates the use of social media analytics and its advantages, disadvantages, and reliability.
|
276 |
Automated extraction of product feedback from online reviews: Improving efficiency, value, and total yield. Goldberg, David Michael, 25 April 2019
In recent years, the expansion of online media has presented firms with rich and voluminous new datasets with profound business applications. Among these, online reviews provide nuanced details on consumers' interactions with products. Analysis of these reviews has enormous potential, but the sheer volume of the data and the nature of unstructured text make mining these insights challenging and time-consuming. This dissertation presents three studies examining this problem and suggesting techniques for the automated extraction of vital insights.
The first study examines the problem of identifying mentions of safety hazards in online reviews. Discussions of hazards may have profound importance for firms and regulators as they seek to protect consumers. However, as most online reviews do not pertain to safety hazards, identifying this small portion of reviews is a challenging problem. Much of the literature in this domain focuses on selecting "smoke terms," or specific words and phrases closely associated with the mentions of safety hazards. We first examine and evaluate prior techniques to identify these reviews, which incorporate substantial human opinion in curating smoke terms and thus vary in their effectiveness. We propose a new automated method that utilizes a heuristic to curate smoke terms, and we find that this method is far more efficient than the human-driven techniques. Finally, we incorporate consumers' star ratings in our analysis, further improving prediction of safety hazard-related discussions.
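A minimal sketch of the kind of frequency-based heuristic described above, scoring candidate smoke terms by how much more often they occur in hazard-labeled reviews than elsewhere (the example reviews are invented and the dissertation's actual scoring function may differ):

```python
from collections import Counter
import re

def smoke_term_scores(hazard_reviews, other_reviews, min_count=2):
    """Score terms by their relative frequency in hazard-labeled reviews
    versus the rest; high-scoring terms are smoke-term candidates."""
    def freqs(reviews):
        c = Counter()
        for r in reviews:
            c.update(re.findall(r"[a-z']+", r.lower()))
        return c, max(sum(c.values()), 1)
    hz, hz_total = freqs(hazard_reviews)
    ot, ot_total = freqs(other_reviews)
    # Add-one smoothing in the denominator avoids division by zero for
    # terms that never appear in non-hazard reviews.
    return {t: (hz[t] / hz_total) / ((ot[t] + 1) / ot_total)
            for t in hz if hz[t] >= min_count}

hazard = ["the heater caught fire", "smoke and fire from the unit"]
other = ["great product the heater works well", "the unit looks nice"]
scores = smoke_term_scores(hazard, other)
```

A threshold on these scores would then flag new reviews for prioritized human review, which is the efficiency gain the study measures.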
The second study examines the identification of consumer-sourced innovation ideas and opportunities from online reviews. We build upon a widely accepted attribute mapping framework from the entrepreneurship literature for evaluating and comparing product attributes. We first adapt this framework for use in the analysis of online reviews. Then, we develop analytical techniques based on smoke terms for the automated identification of innovation opportunities mentioned in online reviews. These techniques can be used to profile products based on attributes that affect, or have the potential to affect, their competitive standing. In collaboration with a large countertop appliances manufacturer, we assess and validate the usefulness of these suggestions, tying together the theoretical value of the attribute mapping framework and the practical value of identifying innovation-related discussions in online reviews.
The third study addresses safety hazard monitoring for use cases in which a higher yield of safety hazards detected is desirable. We note a trade-off between the efficiency of the hazard techniques described in the first study and their depth: a high proportion of identified records refer to true hazards, but several important hazards may go undetected. We suggest several techniques for handling this trade-off, including alternate objective functions for heuristics and fuzzy term matching, which improve the total yield. We examine the efficacy of each of these techniques and contrast their merits with past techniques. Finally, we test the capability of these methods to generalize to online reviews across different product categories. / Doctor of Philosophy / This dissertation presents three studies that utilize text analytic methods to analyze and derive insights from online reviews. The first study aims to detect distinctive words and phrases particularly prevalent in online reviews that describe safety hazards. This study proposes algorithmic and heuristic methods for identifying words and phrases that are especially common in these reviews, allowing an automated process to prioritize these reviews for practitioners more efficiently. The second study extends these methods for use in detecting mentions of product innovation opportunities in online reviews. We show that these techniques can be used to profile products based on attributes that differentiate them from the competition or have the potential to do so in the future. Additionally, we validate that product managers find this attribute profiling useful to their innovation processes. Finally, the third study examines automated safety hazard monitoring for situations in which the yield, or total number of safety hazards detected, is an important consideration in addition to efficiency.
We propose a variety of new techniques for handling these situations and contrast them with the techniques used in prior studies. Lastly, we test these methods across diverse product categories.
|
277 |
Dynamic Behavior Visualizer: A Dynamic Visual Analytics Framework for Understanding Complex Networked Models. Maloo, Akshay, 04 February 2014
Dynamic Behavior Visualizer (DBV) is a visual analytics environment for visualizing the spatial and temporal movements and behavioral changes of an individual or a group, e.g., a family, within a realistic urban environment. DBV is specifically designed to visualize adaptive behavioral changes, as they pertain to interactions with multiple interdependent infrastructures, in the aftermath of a large crisis, e.g., a hurricane or the detonation of an improvised nuclear device. DBV is web-enabled and thus easily accessible to any user with a web browser. A novel aspect of the system is its scale and fidelity. The goal of DBV is to synthesize information and derive insight from it; to detect the expected and discover the unexpected; and to provide timely, easily understandable assessments and the ability to piece together all this information. / Master of Science
|
278 |
Solving Mysteries with Crowds: Supporting Crowdsourced Sensemaking with a Modularized Pipeline and Context Slices. Li, Tianyi, 28 July 2020
The increasing volume and complexity of text data are challenging the cognitive capabilities of expert analysts. Machine learning and crowdsourcing present new opportunities for large-scale sensemaking, but it remains a challenge to model the overall process so that many distributed agents can contribute to suitable components asynchronously and meaningfully. In this work, I explore how to crowdsource sensemaking for intelligence analysis. Specifically, I focus on the complex processes that include developing hypotheses and theories from a raw dataset and iteratively refining the analysis. I first developed Connect the Dots, a web application that implements the concept of "context slices" and supports novice crowds in building relationship networks for exploratory analysis. Then I developed CrowdIA, a software platform that implements the entire crowd sensemaking pipeline and the context slicing for each step, to enable unsupervised crowd sensemaking. Using the pipeline as a testbed, I probed the errors and bottlenecks in crowdsourced sensemaking, and suggested design recommendations for integrated crowdsourcing systems. Building on these insights, and to support iterative crowd sensemaking, I developed the concept of "crowd auditing," in which an auditor examines a pipeline of crowd analyses and diagnoses the problems to steer future refinement. I explored the design space to support crowd auditing and developed CrowdTrace, a crowd auditing tool that enables novice auditors to effectively identify the important problems with the crowd analysis and create microtasks for crowd workers to fix the problems. The core contributions of this work include a pipeline that enables distributed crowds to collaborate on holistic sensemaking processes, the two novel concepts of "context slices" and "crowd auditing," web applications that support crowd sensemaking and auditing, and design implications for crowd sensemaking systems.
The hope is that the crowd sensemaking pipeline can serve to accelerate research on sensemaking, and contribute to helping people conduct in-depth investigations of large collections of information. / Doctor of Philosophy / In today's world, we have access to large amounts of data that provide opportunities to solve problems at unprecedented depths and scales. While machine learning offers powerful capabilities to support data analysis, to extract meaning from raw data is cognitively demanding and requires significant person-power. Crowdsourcing aggregates human intelligence, yet it remains a challenge for many distributed agents to collaborate asynchronously and meaningfully.
The contribution of this work is to explore how to use crowdsourcing to make sense of copious and complex data. I first implemented the concept of "context slices," which split up complex sensemaking tasks by context, to support meaningful division of work. I developed a web application, Connect the Dots, which generates relationship networks from text documents with crowdsourcing and context slices. Then I developed a crowd sensemaking pipeline based on the expert sensemaking process. I implemented the pipeline as a web platform, CrowdIA, which guides crowds to solve mysteries without expert intervention. Using the pipeline as a testbed, I probed the errors and bottlenecks in crowd sensemaking and provided design recommendations for crowd intelligence systems. Finally, I introduced the concept of "crowd auditing," in which an auditor examines a pipeline of crowd analyses and diagnoses the problems to steer a top-down refinement of the crowd analysis. The hope is that the crowd sensemaking pipeline can serve to accelerate research on sensemaking, and contribute to helping people conduct in-depth investigations of large collections of data.
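The context-slicing idea above can be sketched as splitting an ordered document collection into overlapping, microtask-sized chunks, so each crowd worker sees local context without the full dataset. This is a hypothetical toy function, not CrowdIA's actual slicing logic:

```python
def make_context_slices(documents, slice_size=3, overlap=1):
    """Split an ordered document list into overlapping slices so each
    crowd microtask gets enough local context to be meaningful on its
    own, without requiring the worker to read the whole dataset."""
    slices, step = [], slice_size - overlap
    for start in range(0, len(documents), step):
        chunk = documents[start:start + slice_size]
        if chunk:
            slices.append(chunk)
        if start + slice_size >= len(documents):
            break
    return slices

docs = [f"doc{i}" for i in range(7)]
print(make_context_slices(docs))
```

The overlap between adjacent slices is what lets independently produced partial analyses be stitched back together in later pipeline steps.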
|
279 |
Sensemaking in Immersive Space to Think: Exploring Evolution, Expertise, Familiarity, and Organizational Strategies. Davidson, Kylie Marie, 20 August 2024
Sensemaking is the way in which we understand the world around us. Pirolli and Card developed a sensemaking model for intelligence analysis, which involves taking raw, unstructured data, analyzing it, and presenting a report of the findings. With lower-cost immersive technologies becoming more popular, new opportunities exist to leverage embodied and distributed cognition to better support sensemaking by providing vast, immersive space for creating meaningful schemas (organizational structures) during an analysis task. This work builds on prior work in immersive analytics on the concept of Immersive Space to Think (IST), which provides analysts with immersive space to physically navigate and use to organize information during a sensemaking task. We performed several studies aimed at understanding how IST supports sensemaking and how additional features can better aid analysts completing sensemaking in immersive analytics systems, focusing on non-quantitative data analysis. In a series of exploratory user studies, we examined how users' sensemaking processes evolve across multi-session analyses, identifying how participants refined their use of the immersive space in later stages of the sensemaking process. Another exploratory user study highlighted how professional analysts and novice users share many similarities in immersive analytic tool usage during sensemaking within IST. Beyond multi-session analysis tasks, we also explored how sensemaking strategies change as users become more familiar with the immersive analytics tool, in an exploratory study with multiple analysis tasks completed over three user study sessions. Lastly, we conducted a comparative user study to evaluate how the addition of new organizational features, clustering and linking, affects sensemaking within IST.
Overall, our studies expanded the IST tool set and gathered an enhanced understanding of how immersive space is utilized during analysis tasks within IST. / Doctor of Philosophy / Sensemaking is a process we engage in daily. It is how we understand the world around us, make decisions, and complete complex analyses, like journalists writing stories or detectives solving cases. Sensemaking involves gathering information, making sense of it, developing hypotheses, and drawing conclusions, similar to writing a report. This work builds on prior work in Immersive Space to Think (IST), a concept that uses immersive technologies (Virtual/Augmented Reality) to support sensemaking by providing vast 3D space for organizing the data used in a sensemaking task. Using these technologies to support sensemaking also provides benefits such as increased space for analysis, increased engagement, and natural user interaction, which allow us to interact with information in new ways during sensemaking tasks. In IST, users can move virtual documents around in the space surrounding them to support their analysis process. In this work, we ran a study focused on multi-session analysis within IST, revealing how users refined their document placements over time while completing sensemaking tasks. We also ran a study to understand how professional analysts and novice users differed in their IST tool usage. In another user study, we explored how users' sensemaking strategies and document layouts changed as they became more familiar with the IST tool. Lastly, we conducted a comparative user study to evaluate how new features like clustering and linking affected analysis within IST. Overall, our work contributed to an enhanced understanding of how immersive space is utilized during analysis tasks within IST.
|
280 |
The impact of big data analytics on firms' high value business performance. Popovic, A., Hackney, R., Tassabehji, Rana, Castelli, M., October 2016
Big Data Analytics (BDA) is an emerging phenomenon with the reported potential to transform how firms manage and enhance high-value business performance. The purpose of our study is to investigate the impact of BDA on operations management in the manufacturing sector, an acknowledged but infrequently researched context. Using an interpretive qualitative approach, this empirical study leverages a comparative case study of three manufacturing companies with varying levels of BDA usage (experimental, moderate, and heavy). The information technology (IT) business value literature and a resource-based view informed the development of our research propositions and the conceptual framework that illuminated the relationships between BDA capability and organizational readiness and design. Our findings indicate that BDA capability (in terms of data sourcing, access, integration, and delivery; analytical capabilities; and people's expertise), along with organizational readiness and design factors (such as BDA strategy, top management support, financial resources, and employee engagement), facilitated better use of BDA in manufacturing decision making and thus enhanced high-value business performance. Our results also highlight important managerial implications related to the impact of BDA on the empowerment of employees, and how BDA can be integrated into organizations to augment rather than replace management capabilities. Our research will benefit academics and practitioners by furthering understanding of BDA utilization in transforming operations and production management. It adds to the limited empirically based knowledge by highlighting the real business value of applying BDA in manufacturing firms, thus encouraging beneficial economic and societal changes.
|