About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Sample Size Determination for Subsampling in the Analysis of Big Data, Multiplicative Models for Confidence Intervals, and Free-Knot Changepoint Models

Sheng Zhang (18468615) 11 June 2024 (has links)
We studied the relationship between subsample size and the accuracy of the resulting estimation in the big data setting. We also proposed a novel approach to the construction of confidence intervals based on improved concentration inequalities. Lastly, we studied irregular change-point models using free-knot splines.
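The subsample-size/accuracy trade-off studied in the first part can be illustrated with uniform subsampling of a simple mean estimator. This is a toy sketch on synthetic Gaussian data, not the dissertation's estimator or data:

```python
import random
import statistics

def subsample_mean(data, m, seed=0):
    """Estimate the full-data mean from a uniform random subsample of size m."""
    rng = random.Random(seed)
    return statistics.fmean(rng.sample(data, m))

# Toy "big data": 100,000 draws with true mean 5.0 (illustrative only).
pop_rng = random.Random(42)
population = [pop_rng.gauss(5.0, 2.0) for _ in range(100_000)]
full_mean = statistics.fmean(population)

# The estimation error shrinks roughly like 1/sqrt(m) as the subsample grows.
for m in (100, 1_000, 10_000):
    print(m, round(abs(subsample_mean(population, m) - full_mean), 4))
```

In this setting the subsample of 10,000 typically estimates the full-data mean to within a few hundredths, while the subsample of 100 is an order of magnitude less accurate.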
262

Online Review Analytics: New Methods for Discovering Key Product Quality and Service Concerns

Zaman, Nohel 09 July 2019 (has links)
This dissertation aims to discover and categorize safety concern reports in online reviews by using key terms prevalent in sub-categories of safety concerns. It extends the literature on semi-automatic text classification methodology for monitoring and classifying product quality and service concerns. We develop various text classification methods for finding key concerns across a diverse set of product and service categories. Additionally, we generalize our results by testing the performance of our methodologies on online reviews collected from two different data sources (Amazon product reviews and Facebook hospital service reviews). Stakeholders such as product designers and safety regulators can use the semi-automatic classification procedure to subcategorize safety concerns by injury type and narrative type (Chapter 1). We enhance the text classification approach by proposing a Risk Assessment Model that allows quality management (QM) professionals, safety regulators, and product designers to estimate the overall risk level of specific products by analyzing consumer-generated content in online reviews (Chapter 2). Monitoring and prioritizing the hazard risk levels of products will help these stakeholders take appropriate actions to mitigate product safety risks. Lastly, the text classification approach discovers and ranks aspects of services that predict overall user satisfaction (Chapter 3). The key service terms help healthcare providers rapidly trace specific service concerns and improve hospital services. / Doctor of Philosophy / This dissertation extends past studies by examining safety surveillance of online reviews. We examine online reviews reporting specific categories of safety concerns and contrast them with reviews not reporting these specific safety concerns.
Businesses and regulators benefit from detecting, categorizing, and prioritizing safety concerns across product categories. We use key terms prevalent in domain-related safety concerns for granular analysis of consumer reviews. Secondly, beyond utilizing the key terms to discover specific hazard incidents, safety regulators and manufacturers may use the extended risk assessment framework to estimate the risk severity, risk likelihood, and overall risk level of a specific product. The model could be useful for product safety practitioners in product risk identification and mitigation. Finally, this dissertation identifies the aspects of service quality concerns present in online hospital reviews. This study uses a text analytics method based on key terms to detect these specific service concerns and hence determine the primary rationales for patient feedback on hospital services. Managerially, this information helps to prioritize the areas in greatest need of improvement in hospital services. Additionally, generating key terms for a particular service attribute aids health care policy makers and providers in rapidly monitoring specific concerns and adjusting policies or resources to better serve patients.
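The key-term approach described above can be sketched as a simple lexicon-based classifier. The sub-categories and term lists here are hypothetical stand-ins for the dissertation's curated key terms:

```python
# Hypothetical key-term lexicons for two safety sub-categories; the
# dissertation's actual term lists are not reproduced here.
LEXICONS = {
    "burn_hazard": {"burn", "overheat", "fire", "smoke"},
    "choking_hazard": {"choke", "swallow", "small parts"},
}

def classify_review(text, lexicons=LEXICONS):
    """Assign a review to every safety sub-category whose key terms it mentions."""
    text = text.lower()
    return sorted(cat for cat, terms in lexicons.items()
                  if any(term in text for term in terms))

print(classify_review("The charger started to overheat and smoke after a week."))
# → ['burn_hazard']
```

A review can land in several sub-categories at once, which mirrors the dissertation's goal of subcategorizing concerns rather than forcing a single label.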
263

Characterizing Human Driving Behavior Through an Analysis of Naturalistic Driving Data

Ali, Gibran 23 January 2023 (has links)
Reducing the number of motor vehicle crashes is one of the major challenges of our times. Current strategies to reduce crash rates can be divided into two groups: identifying risky driving behavior prior to crashes to proactively reduce risk, and automating some or all human driving tasks using intelligent vehicle systems such as Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS). For successful implementation of either strategy, a deeper understanding of human driving behavior is essential. This dissertation characterizes human driving behavior through an analysis of a large naturalistic driving study and offers four major contributions to the field. First, it describes the creation of the Surface Accelerations Reference, a catalog of all longitudinal and lateral surface accelerations found in the Second Strategic Highway Research Program Naturalistic Driving Study (SHRP 2 NDS). SHRP 2 NDS is the largest naturalistic driving study in the world, with 34.5 million miles of data collected from over 3,500 participants driving in six separate locations across the United States. An algorithm was developed to detect each acceleration epoch and summarize key parameters, such as the mean and maximum of the magnitude, roadway properties, and driver inputs. A statistical profile was then created for each participant describing their acceleration behavior in terms of rates, percentiles, and the magnitude of the strongest event within a distance threshold. The second major contribution is quantifying the effect of several factors that influence acceleration behavior. The rate of mild to harsh acceleration epochs was modeled using negative binomial distribution-based generalized linear mixed effect models. Roadway speed category, driver age, driver gender, vehicle class, and location were used as fixed effects, and a unique participant identifier was used as the random effect. Subcategories of each fixed effect were compared using incident rate ratios.
Roadway speed category was found to have the largest effect on acceleration behavior, followed by driver age, vehicle class, and location. This methodology accounts for the major influences while simultaneously ensuring that the comparisons are meaningful and not driven by coincidences of data collection. The third major contribution is the extraction of acceleration-based long-term driving styles and determining their relationship to crash risk. Rates of acceleration epochs experienced on ≤ 30 mph roadways were used to cluster the participants into four groups. The metrics to cluster the participants were chosen so that they represent long-term driving style and not short-term driving behavior being influenced by transient traffic and environmental conditions. The driving style was also correlated to driving risk by comparing the crash rates, near-crash rates, and speeding behavior of the participants. Finally, the fourth major contribution is the creation of a set of interactive analytics tools that facilitate quick characterization of human driving during regular as well as safety-critical driving events. These tools enable users to answer a large and open-ended set of research questions that aid in the development of ADAS and ADS components. These analytics tools facilitate the exploration of queries such as how often do certain scenarios occur in naturalistic driving, what is the distribution of key metrics during a particular scenario, or what is the relative composition of various crash datasets? Novel visual analytics principles such as video on demand have been implemented to accelerate the sense-making loop for the user. / Doctor of Philosophy / Naturalistic driving studies collect data from participants driving their own vehicles over an extended period. These studies offer unique perspectives in understanding driving behavior by capturing routine and rare events. 
Two important aspects of understanding driving behavior are longitudinal acceleration, which indicates how people speed up or slow down, and lateral acceleration, which shows how people take turns. In this dissertation, millions of miles of driving data were analyzed to create an open access acceleration database representing the driving profiles of thousands of drivers. These profiles are useful for understanding and modeling human driving behavior, which is essential for developing advanced vehicle systems and smart roadway infrastructure. The acceleration database was used to quantify the effect of various roadway properties, driver demographics, vehicle classification, and environmental factors on acceleration driving behavior. The acceleration database was also used to define distinct driving styles and their relationship to driving risk. A set of interactive analytics tools was developed that leverage naturalistic driving data by enabling users to ask a large set of questions and facilitate open-ended analysis. Novel visualization and data presentation techniques were developed to help users extract deeper insight about driving behavior faster than previously existing tools. These tools will aid in the development and testing of automated driving systems and advanced driver assistance systems.
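The incident rate ratios used above to compare subcategories of each fixed effect reduce to a ratio of event rates (events per unit of exposure). A minimal sketch with invented counts, not SHRP 2 NDS values:

```python
def incident_rate_ratio(events_a, exposure_a, events_b, exposure_b):
    """Ratio of event rates (events per unit exposure) between two groups."""
    return (events_a / exposure_a) / (events_b / exposure_b)

# Hypothetical counts: harsh-acceleration epochs and miles of exposure for
# two roadway speed categories (illustrative numbers only).
irr = incident_rate_ratio(120, 10_000, 40, 8_000)
print(round(irr, 2))  # → 2.4
```

An IRR above 1 means the first group experiences epochs more often per mile; in the dissertation these ratios come from a fitted negative binomial mixed model rather than raw counts, so this sketch only conveys the interpretation.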
264

Explainable and Network-based Approaches for Decision-making in Emergency Management

Tabassum, Anika 19 October 2021 (has links)
Critical Infrastructures (CIs), such as power, transportation, and healthcare, refer to systems, facilities, technologies, and networks vital to national security, public health, and the socio-economic well-being of people. CIs play a crucial role in emergency management. For example, Hurricane Ida, the Texas winter storm, and the Colonial Pipeline cyber-attack, all of which occurred in the US during 2021, show that CIs are highly inter-dependent, with complex interactions. Power system failures and the shutdown of natural gas pipelines, in turn, led to debilitating impacts on communication, waste systems, public health, etc. Consider power failures during a disaster, such as a hurricane. Subject Matter Experts (SMEs) such as emergency management authorities may be interested in several decision-making tasks. Can we identify disaster phases in terms of the severity of damage by analyzing changes in power failures? Can we tell the SMEs which power grids or regions are the most affected during each disaster phase and need immediate action to recover? Answering these questions can help SMEs respond quickly and send resources for fast recovery from damage. Can we systematically show how the failure of different power grids may impact the whole set of CIs due to inter-dependencies? This can help SMEs better prepare for and mitigate risks by improving system resiliency. In this thesis, we explore problems in efficiently supporting decision-making tasks during a disaster for emergency management authorities. Our research has two primary directions: guiding decision-making in resource allocation, and planning to improve system resiliency. Our work is done in collaboration with the Oak Ridge National Laboratory to contribute impactful research on real-life CIs and disaster power failure data. 1.
Explainable resource allocation: In contrast to current interpretable or explainable models that provide answers to help understand a model's output, we view explanations as answers that guide resource allocation decision-making. In this thesis, we focus on developing a novel model and algorithm to identify disaster phases from changes in power failures, and to pinpoint the regions that may be most affected at each disaster phase so the SMEs can send resources for fast recovery. 2. Networks for improving system resiliency: We view CIs as a large heterogeneous network with infrastructure components as nodes and dependencies as edges. Our goal is to construct a visual analytic tool and develop a domain-inspired model to identify the important components and connections on which the SMEs need to focus and better prepare to mitigate the risk of a disaster. / Doctor of Philosophy / Critical Infrastructure Systems (CIs) comprise multiple infrastructures vital for maintaining public life and national security, e.g., power, water, and transportation. The US Federal Emergency Management Agency (FEMA) aims to protect the nation and its citizens by mitigating all hazards during natural or man-made disasters. To do so, it must adopt different decision-making strategies efficiently: for example, during an ongoing disaster, when to quickly send resources, which regions to send resources to first, etc. FEMA also needs to plan how to prepare for a future disaster and determine which CIs need maintenance to improve system resiliency. We explore several data-mining problems that can guide FEMA towards developing efficient decision-making strategies. Our thesis emphasizes explainable and network-based models and algorithms that help decision-making operations for emergency management experts by leveraging critical infrastructure data.
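The inter-dependency view of CIs described above can be made concrete with a toy failure-propagation sketch over a dependency graph. The component names and dependencies here are hypothetical, not ORNL data or the thesis's model:

```python
def cascade(deps, initially_failed):
    """Propagate failures through a dependency graph: a component fails
    once any component it depends on has failed."""
    failed = set(initially_failed)
    changed = True
    while changed:
        changed = False
        for node, requires in deps.items():
            if node not in failed and any(r in failed for r in requires):
                failed.add(node)
                changed = True
    return failed

# Hypothetical CI dependency map (component -> components it relies on).
deps = {
    "power_grid": [],
    "gas_pipeline": [],
    "water_treatment": ["power_grid"],
    "communication": ["power_grid"],
    "hospital": ["water_treatment", "communication"],
}
print(sorted(cascade(deps, {"power_grid"})))
# → ['communication', 'hospital', 'power_grid', 'water_treatment']
```

Even this toy version shows why a single power grid failure can debilitate downstream services, which is the motivation for identifying the important components and connections.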
265

Bridging Cognitive Gaps Between User and Model in Interactive Dimension Reduction

Wang, Ming 05 May 2020 (has links)
High-dimensional data is prevalent in all domains but is challenging to explore. Analysis and exploration of high-dimensional data are important for people in numerous fields. To help people explore and understand high-dimensional data, Andromeda, an interactive visual analytics tool, has been developed. However, our analysis uncovered several cognitive gaps relating to the Andromeda system: users do not realize the necessity of explicitly highlighting all the relevant data points; users are not clear about the dimensional information in the Andromeda visualization; and the Andromeda model cannot capture user intentions when constructing and deconstructing clusters. In this study, we designed and implemented solutions to address these gaps. Specifically, for the gap in highlighting all the relevant data points, we introduced a foreground and background view and distance lines. Our user study with a group of undergraduate students revealed that the foreground and background views and distance lines could significantly alleviate the highlighting issue. For the gap in understanding visualization dimensions, we implemented a dimension-assist feature. The results of a second user study with students with various backgrounds suggested that the dimension-assist feature could make it easier for users to find the extremum in one dimension and to describe correlations among multiple dimensions; however, the dimension-assist feature had only a small impact on characterizing the data distribution and assisting users in understanding the meanings of the weighted multidimensional scaling (WMDS) plot axes. Regarding the gap in creating and deconstructing clusters, we implemented a solution utilizing random sampling. A quantitative analysis of the random sampling strategy was performed, and the results demonstrated that the strategy improved Andromeda's capabilities in constructing and deconstructing clusters. 
We also applied random sampling to two-point manipulations, making the Andromeda system more flexible and adaptable to differing data exploration tasks. Limitations are discussed, and potential future research directions are identified. / Master of Science / High-dimensional data is data with hundreds or thousands of features. The animal dataset used in this study is an example of a high-dimensional dataset, since animals can be categorized by many features, such as size, fur, behavior, and so on. High-dimensional data is prevalent but difficult for people to analyze. For example, it is hard to find the similarity among dozens of animals, or to find the relationship between different characterizations of animals. To help people with no statistical knowledge analyze high-dimensional datasets, our group developed a web-based visualization tool called Andromeda, which can display data as points (such as animal data points) on a screen and allow people to interact with these points to express similarity by dragging points on the screen (e.g., dragging "Lion," "Wolf," and "Killer Whale" together because all three are hunters, forming a cluster of three animals). It thereby enables people to interactively analyze the hidden patterns of high-dimensional data. However, we identified several cognitive gaps that have limited Andromeda's effectiveness in helping people understand high-dimensional data. Therefore, in this work, we made improvements to the original Andromeda system to bridge these gaps, including designing new visual features to help people better understand how Andromeda processes and interacts with high-dimensional data, and improving the underlying algorithm so that the Andromeda system can better understand people's intentions during the data exploration process.
We extensively evaluated our designs through both qualitative and quantitative analysis (e.g., user study on both undergraduate and graduate students and statistical testing) on our animal dataset, and the results confirmed that the improved Andromeda system outperformed the original version significantly in a series of high-dimensional data understanding tasks. Finally, the limitations and potential future research directions were discussed.
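Underlying the WMDS plots discussed above is a weighted distance between data points, with one weight per dimension that the user implicitly adjusts by dragging points. A minimal sketch of that weighted distance; the animal feature vectors are invented for illustration, not Andromeda's actual data:

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance of the kind used in WMDS-style layouts."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical animal feature vectors: (size, furriness, predator score).
lion = (0.8, 0.6, 1.0)
wolf = (0.5, 0.9, 1.0)

# Up-weighting the "predator" dimension pulls the two hunters together
# relative to a weighting that stresses size and fur.
print(round(weighted_distance(lion, wolf, (1.0, 1.0, 1.0)), 3))
print(round(weighted_distance(lion, wolf, (0.1, 0.1, 5.0)), 3))
```

Dragging "Lion" and "Wolf" together in Andromeda corresponds to solving for weights like the second set, under which the pair becomes close.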
266

Online Denoising Solutions for Forecasting Applications

Khadivi, Pejman 08 September 2016 (has links)
Dealing with noisy time series is a crucial task in many data-driven real-time applications. Due to inaccuracies in data acquisition, time series suffer from noise and instability, which lead to inaccurate forecasting results. Therefore, in order to improve the performance of time series forecasting, an important pre-processing step is denoising the data before performing any action. In this research, we propose various approaches to tackle noisy time series in forecasting applications. For this purpose, we use different machine learning methods and information-theoretic approaches to develop online denoising algorithms. In this dissertation, we propose four categories of time series denoising methods that can be used in different situations, depending on the noise and time series properties. In the first category, a seasonal regression technique is proposed for the denoising of time series with seasonal behavior. In the second category, multiple discrete universal denoisers are developed that can be used for the online denoising of discrete-valued time series. In the third category, we develop a noisy channel reversal model based on the similarities between time series forecasting and data communication, and use that model to deploy out-of-band noise filtering in forecasting applications. The last category of proposed methods is deep-learning-based denoisers. We use information-theoretic concepts to analyze a general feed-forward deep neural network and to prove theoretical bounds on deep neural network behavior. Furthermore, we propose a denoising deep neural network method for the online denoising of time series. Real-world and synthetic time series are used for numerical experiments and performance evaluations. Experimental results show that the proposed methods can efficiently denoise time series and improve their quality. / Ph. D.
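As an illustration of the first category, a seasonal-regression-style smoother can be sketched by averaging observations that share a position in the seasonal cycle. This is a deliberate simplification of the dissertation's method, run on invented toy data:

```python
import random
from statistics import fmean

def seasonal_denoise(series, period):
    """Replace each observation with the mean of all observations sharing
    its position in the seasonal cycle (a simple seasonal smoother)."""
    phase_means = [fmean(series[i::period]) for i in range(period)]
    return [phase_means[i % period] for i in range(len(series))]

# Toy period-4 seasonal signal plus Gaussian noise (illustrative data only).
clean = [(10.0, 20.0, 30.0, 20.0)[i % 4] for i in range(40)]
rng = random.Random(1)
noisy = [c + rng.gauss(0.0, 2.0) for c in clean]
denoised = seasonal_denoise(noisy, 4)
```

On this toy series, averaging the 10 observations at each seasonal phase cuts the per-point noise roughly by a factor of sqrt(10), so the denoised series tracks the clean signal much more closely than the raw one.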
267

Improving the Interoperability of the OpenDSA eTextbook System

Wonderly, Jackson Daniel 07 October 2019 (has links)
In recent years there has been considerable adoption of the IMS Learning Tools Interoperability (LTI) standard among both Learning Management Systems (LMS) and learning applications. The LTI standard defines a way to securely connect learning applications and tools with platforms like LMS, enabling content from external learning tools to appear as if it were a native part of the LMS, and enabling these learning tools to send users' scores directly to the gradebook in the LMS. An example of such a learning tool is the OpenDSA eTextbook system, which provides materials that cover a variety of Computer Science-related topics, incorporating hundreds of interactive visualizations and auto-graded exercises. Previous work turned OpenDSA into an LTI tool provider, allowing for OpenDSA eTextbooks to be integrated with the Canvas LMS. In this thesis, we further explore the problem of connecting educational systems while documenting challenges, issues, and design rationales. We expand upon the existing OpenDSA LTI infrastructure by turning OpenDSA into an LTI tool consumer, thus enabling OpenDSA to better integrate content from other LTI tool providers. We also describe how we expanded OpenDSA's LTI tool provider functionality to increase the level of granularity at which OpenDSA content can be served, and how we implemented support for several LMS, including challenges faced and remaining issues. Finally, we discuss the problem of sharing analytics data among educational systems, and outline an architecture that could be used for this purpose. / Master of Science / In recent years there has been considerable adoption of the IMS Learning Tools Interoperability (LTI) standard among Learning Management Systems (LMS) like Blackboard and Canvas, and among learning tools.
The LTI standard allows for learning tools to be securely connected with platforms like LMS, enabling content from external learning tools to appear as if it were built into the LMS, and enabling these learning tools to send users' scores directly to the gradebook in the LMS. An example of such a learning tool is the OpenDSA online textbook system, which provides materials that cover a variety of Computer Science-related topics, incorporating hundreds of interactive visualizations and auto-graded exercises. Previous work enabled OpenDSA textbooks to be connected with the Canvas LMS using LTI. In this thesis, we further explore the problem of connecting educational systems while documenting challenges, issues, and design rationales. We expand the existing OpenDSA system to allow OpenDSA to better integrate content from other learning tools. We also describe how we expanded OpenDSA's features to increase the number of ways that OpenDSA content can be consumed, and how we implemented support for adding OpenDSA content to several LMS, including challenges faced and remaining issues. Finally, we discuss the problem of sharing analytics data among educational systems, and outline a potential way to connect educational systems for this purpose.
268

Visual Representations and Interaction Technologies

Earnshaw, Rae A. January 2005 (has links)
This chapter discusses important aspects of visual representations and interaction techniques necessary to support visual analytics. It covers five primary topics. First, it addresses the need for scientific principles for depicting information. Next, it focuses on methods for interacting with visualizations and considers the opportunities available given recent developments in input and display technologies. Third, it addresses the research and technology needed to develop new visual paradigms that support analytical reasoning. Then, it discusses the impact of scale issues on the creation of effective visual representations and interactions. Finally, it considers alternative ways to construct visualization systems more efficiently.
269

On the Use of Grouped Covariate Regression in Oversaturated Models

Loftus, Stephen Christopher 11 December 2015 (has links)
As data collection techniques improve, the number of covariates often exceeds the number of observations. When this happens, regression models become oversaturated and, thus, inestimable. Many classical and Bayesian techniques have been designed to combat this difficulty, each with a different means of addressing the oversaturation. However, these techniques can be tricky to implement well, difficult to interpret, and unstable. We propose a technique that takes advantage of the natural clustering of variables often found in biological and ecological datasets known as omics datasets. Generally speaking, omics datasets attempt to classify host species structure or function by characterizing a group of biological molecules, such as genes (genomics), proteins (proteomics), and metabolites (metabolomics). By clustering the covariates and regressing on a single value for each cluster, the model becomes both estimable and stable. In addition, the technique can account for the variability within each cluster, allow for the inclusion of expert judgment, and provide a probability of inclusion for each cluster. / Ph. D.
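The core move described above — collapsing each cluster of covariates to a single score before regressing — can be sketched as follows. The cluster assignments and expression values are hypothetical, and the single-value summary here is a plain mean rather than the dissertation's full procedure:

```python
from statistics import fmean

def cluster_scores(x_row, clusters):
    """Collapse a p-dimensional covariate row to one mean score per cluster,
    turning an oversaturated design (p > n) into an estimable one (k < n)."""
    return [fmean(x_row[j] for j in idx) for idx in clusters]

# Hypothetical row of 6 covariates (e.g., gene expressions) in 2 clusters.
clusters = [(0, 1, 2), (3, 4, 5)]
row = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]
print([round(s, 6) for s in cluster_scores(row, clusters)])  # → [1.0, 5.0]
```

With p covariates reduced to k cluster scores per observation, an ordinary regression on the scores becomes estimable even when p far exceeds the sample size.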
270

Knowledge Creation Analytics for Online Engineering Learning

Teo, Hon Jie 25 July 2014 (has links)
The ubiquitous use of computers and greater accessibility of the Internet have triggered widespread use of educational innovations such as online discussion forums, wikis, Open Educational Resources, and MOOCs, to name a few. These advances have led to the creation of a wide range of instructional videos, written documents, and discussion archives by engineering learners seeking to expand their learning and advance their knowledge beyond the engineering classroom. However, it remains a challenging task to assess the quality of knowledge advancement on these learning platforms, particularly due to the informal nature of engagement as a whole and the massive amount of learner-generated data. This research addresses this broad challenge through a research approach based on the examination of the state of knowledge advancement, analysis of relationships between variables indicative of knowledge creation and participation in knowledge creation, and identification of groups of learners. The study site is an online engineering community, All About Circuits, which serves 31,219 electrical and electronics engineering learners who contributed 503,908 messages in 65,209 topics. The knowledge creation metaphor provides the guiding theoretical framework for this research. This metaphor is based on a set of related theories that conceptualize learning as a collaborative process of developing shared knowledge artifacts for the collective benefit of a community of learners. In a knowledge-creating community, the quality of learning and participation can be evaluated by examining the degree of collaboration and the advancement of knowledge artifacts over an extended period of time. Software routines were written in the Python programming language to collect and process more than half a million messages, and to extract user-produced data from 87,263 web pages to examine the use of engineering terms, social networks, and engineering artifacts.
Descriptive analysis found that the state of knowledge advancement varies across discussion topics and that the level of engagement in knowledge-creating activities varies across individuals. Non-parametric correlation analysis uncovered strong associations between topic length and knowledge-creating activities, and between the total interactions experienced by individuals and individual engagement in knowledge-creating activities. On the other hand, the variable of individual total membership period has weak associations with individual engagement in knowledge-creating activities. K-means clustering analysis identified the presence of eight clusters of individuals with varying lengths of participation and membership, and Kruskal-Wallis tests confirmed that there are significant differences between the clusters. Based on a comparative analysis of Kruskal-Wallis score means and the examination of descriptive statistics for each cluster, three groups of learners were identified: Disengaged (88% of all individuals), Transient (10%), and Engaged (2%). A comparison of Spearman correlations between pairs of variables suggests that the variable of individual active membership period exhibits a stronger association with knowledge creation activities for the Disengaged group, whereas the variable of individual total interactions exhibits a stronger association with knowledge creation activities for the Engaged group. Limitations of the study are discussed and recommendations for future work are made. / Ph. D.
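The non-parametric correlation used above (Spearman's rank correlation) can be sketched in pure Python. The rank-based formula below omits tie handling, and the toy participation counts are invented for illustration, not the community's data:

```python
def spearman(x, y):
    """Spearman rank correlation via the classic 1 - 6*sum(d^2)/(n(n^2-1))
    formula; assumes no tied values (illustrative only)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank + 1)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Toy data: messages posted vs. knowledge-creating actions per learner.
msgs = [3, 10, 1, 8, 5]
actions = [2, 9, 1, 6, 7]
print(round(spearman(msgs, actions), 2))  # → 0.9
```

Because it operates on ranks rather than raw values, this measure is robust to the heavy-tailed participation counts typical of online communities, which is why the analysis above relies on non-parametric statistics.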
