About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Sample Size Determination for Subsampling in the Analysis of Big Data, Multiplicative Models for Confidence Intervals, and Free-Knot Changepoint Models

Sheng Zhang (18468615) 11 June 2024 (has links)
We studied the relationship between subsample size and the accuracy of the resulting estimation in the big data setting. We also proposed a novel approach to the construction of confidence intervals based on improved concentration inequalities. Lastly, we studied irregular change-point models using free-knot splines.
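The subsample-size/accuracy trade-off studied in the first part can be illustrated with uniform subsampling of a simple mean estimator. This is a toy sketch on synthetic Gaussian data, not the dissertation's estimator or data:

```python
import random
import statistics

def subsample_mean(data, m, seed=0):
    """Estimate the full-data mean from a uniform random subsample of size m."""
    rng = random.Random(seed)
    return statistics.fmean(rng.sample(data, m))

# Toy "big data": 100,000 draws with true mean 5.0 (illustrative only).
pop_rng = random.Random(42)
population = [pop_rng.gauss(5.0, 2.0) for _ in range(100_000)]
full_mean = statistics.fmean(population)

# The estimation error shrinks roughly like 1/sqrt(m) as the subsample grows.
for m in (100, 1_000, 10_000):
    print(m, round(abs(subsample_mean(population, m) - full_mean), 4))
```

In this setting the subsample of 10,000 typically estimates the full-data mean to within a few hundredths, while the subsample of 100 is an order of magnitude less accurate.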
262

Online Review Analytics: New Methods for Discovering Key Product Quality and Service Concerns

Zaman, Nohel 09 July 2019 (has links)
This dissertation aims to discover and categorize safety concern reports in online reviews by using key terms prevalent in sub-categories of safety concerns. It extends the literature on semi-automatic text classification methodology for monitoring and classifying product quality and service concerns. We develop various text classification methods for finding key concerns across a diverse set of product and service categories. Additionally, we generalize our results by testing the performance of our methodologies on online reviews collected from two different data sources (Amazon product reviews and Facebook hospital service reviews). Stakeholders such as product designers and safety regulators can use the semi-automatic classification procedure to subcategorize safety concerns by injury type and narrative type (Chapter 1). We enhance the text classification approach by proposing a Risk Assessment Model that allows quality management (QM) professionals, safety regulators, and product designers to estimate the overall risk level of specific products by analyzing consumer-generated content in online reviews (Chapter 2). Monitoring and prioritizing the hazard risk levels of products will help these stakeholders take appropriate actions to mitigate product safety risks. Lastly, the text classification approach discovers and ranks aspects of services that predict overall user satisfaction (Chapter 3). The key service terms help healthcare providers rapidly trace specific service concerns and improve hospital services. / Doctor of Philosophy / This dissertation extends past studies by examining safety surveillance of online reviews. We examine online reviews reporting specific categories of safety concerns and contrast them with reviews not reporting these specific safety concerns.
Businesses and regulators benefit from detecting, categorizing, and prioritizing safety concerns across product categories. We use key terms prevalent in domain-related safety concerns for granular analysis of consumer reviews. Secondly, beyond utilizing the key terms to discover specific hazard incidents, safety regulators and manufacturers may use the extended risk assessment framework to estimate the risk severity, risk likelihood, and overall risk level of a specific product. The model could be useful for product safety practitioners in product risk identification and mitigation. Finally, this dissertation identifies the aspects of service quality concerns present in online hospital reviews. This study uses a text analytics method based on key terms to detect these specific service concerns and hence determine the primary rationales for patient feedback on hospital services. Managerially, this information helps to prioritize the areas in greatest need of improvement in hospital services. Additionally, generating key terms for a particular service attribute aids health care policy makers and providers in rapidly monitoring specific concerns and adjusting policies or resources to better serve patients.
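The key-term approach described above can be sketched as a simple lexicon-based classifier. The sub-categories and term lists here are hypothetical stand-ins for the dissertation's curated key terms:

```python
# Hypothetical key-term lexicons for two safety sub-categories; the
# dissertation's actual term lists are not reproduced here.
LEXICONS = {
    "burn_hazard": {"burn", "overheat", "fire", "smoke"},
    "choking_hazard": {"choke", "swallow", "small parts"},
}

def classify_review(text, lexicons=LEXICONS):
    """Assign a review to every safety sub-category whose key terms it mentions."""
    text = text.lower()
    return sorted(cat for cat, terms in lexicons.items()
                  if any(term in text for term in terms))

print(classify_review("The charger started to overheat and smoke after a week."))
# → ['burn_hazard']
```

A review can land in several sub-categories at once, which mirrors the dissertation's goal of subcategorizing concerns rather than forcing a single label.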
263

Characterizing Human Driving Behavior Through an Analysis of Naturalistic Driving Data

Ali, Gibran 23 January 2023 (has links)
Reducing the number of motor vehicle crashes is one of the major challenges of our times. Current strategies to reduce crash rates can be divided into two groups: identifying risky driving behavior prior to crashes to proactively reduce risk, and automating some or all human driving tasks using intelligent vehicle systems such as Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS). For successful implementation of either strategy, a deeper understanding of human driving behavior is essential. This dissertation characterizes human driving behavior through an analysis of a large naturalistic driving study and offers four major contributions to the field. First, it describes the creation of the Surface Accelerations Reference, a catalog of all longitudinal and lateral surface accelerations found in the Second Strategic Highway Research Program Naturalistic Driving Study (SHRP 2 NDS). SHRP 2 NDS is the largest naturalistic driving study in the world, with 34.5 million miles of data collected from over 3,500 participants driving in six separate locations across the United States. An algorithm was developed to detect each acceleration epoch and summarize key parameters, such as the mean and maximum of the magnitude, roadway properties, and driver inputs. A statistical profile was then created for each participant describing their acceleration behavior in terms of rates, percentiles, and the magnitude of the strongest event within a distance threshold. The second major contribution is quantifying the effect of several factors that influence acceleration behavior. The rate of mild to harsh acceleration epochs was modeled using negative binomial distribution-based generalized linear mixed effect models. Roadway speed category, driver age, driver gender, vehicle class, and location were used as fixed effects, and a unique participant identifier was used as the random effect. Subcategories of each fixed effect were compared using incident rate ratios.
Roadway speed category was found to have the largest effect on acceleration behavior, followed by driver age, vehicle class, and location. This methodology accounts for the major influences while simultaneously ensuring that the comparisons are meaningful and not driven by coincidences of data collection. The third major contribution is the extraction of acceleration-based long-term driving styles and determining their relationship to crash risk. Rates of acceleration epochs experienced on ≤ 30 mph roadways were used to cluster the participants into four groups. The metrics to cluster the participants were chosen so that they represent long-term driving style and not short-term driving behavior being influenced by transient traffic and environmental conditions. The driving style was also correlated to driving risk by comparing the crash rates, near-crash rates, and speeding behavior of the participants. Finally, the fourth major contribution is the creation of a set of interactive analytics tools that facilitate quick characterization of human driving during regular as well as safety-critical driving events. These tools enable users to answer a large and open-ended set of research questions that aid in the development of ADAS and ADS components. These analytics tools facilitate the exploration of queries such as how often do certain scenarios occur in naturalistic driving, what is the distribution of key metrics during a particular scenario, or what is the relative composition of various crash datasets? Novel visual analytics principles such as video on demand have been implemented to accelerate the sense-making loop for the user. / Doctor of Philosophy / Naturalistic driving studies collect data from participants driving their own vehicles over an extended period. These studies offer unique perspectives in understanding driving behavior by capturing routine and rare events. 
Two important aspects of understanding driving behavior are longitudinal acceleration, which indicates how people speed up or slow down, and lateral acceleration, which shows how people take turns. In this dissertation, millions of miles of driving data were analyzed to create an open access acceleration database representing the driving profiles of thousands of drivers. These profiles are useful for understanding and modeling human driving behavior, which is essential for developing advanced vehicle systems and smart roadway infrastructure. The acceleration database was used to quantify the effect of various roadway properties, driver demographics, vehicle classification, and environmental factors on acceleration driving behavior. The acceleration database was also used to define distinct driving styles and their relationship to driving risk. A set of interactive analytics tools was developed that leverage naturalistic driving data by enabling users to ask a large set of questions and facilitate open-ended analysis. Novel visualization and data presentation techniques were developed to help users extract deeper insight about driving behavior faster than previously existing tools. These tools will aid in the development and testing of automated driving systems and advanced driver assistance systems.
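The incident rate ratios used above to compare subcategories of each fixed effect reduce to a ratio of event rates (events per unit of exposure). A minimal sketch with invented counts, not SHRP 2 NDS values:

```python
def incident_rate_ratio(events_a, exposure_a, events_b, exposure_b):
    """Ratio of event rates (events per unit exposure) between two groups."""
    return (events_a / exposure_a) / (events_b / exposure_b)

# Hypothetical counts: harsh-acceleration epochs and miles of exposure for
# two roadway speed categories (illustrative numbers only).
irr = incident_rate_ratio(120, 10_000, 40, 8_000)
print(round(irr, 2))  # → 2.4
```

An IRR above 1 means the first group experiences epochs more often per mile; in the dissertation these ratios come from a fitted negative binomial mixed model rather than raw counts, so this sketch only conveys the interpretation.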
264

Explainable and Network-based Approaches for Decision-making in Emergency Management

Tabassum, Anika 19 October 2021 (has links)
Critical Infrastructures (CIs), such as power, transportation, and healthcare, refer to systems, facilities, technologies, and networks vital to national security, public health, and the socio-economic well-being of people. CIs play a crucial role in emergency management. For example, Hurricane Ida, the Texas winter storm, and the Colonial Pipeline cyber-attack, all of which occurred in the US during 2021, show that CIs are highly inter-dependent, with complex interactions. Power system failures and the shutdown of natural gas pipelines, in turn, led to debilitating impacts on communication, waste systems, public health, etc. Consider power failures during a disaster, such as a hurricane. Subject Matter Experts (SMEs) such as emergency management authorities may be interested in several decision-making tasks. Can we identify disaster phases in terms of the severity of damage by analyzing changes in power failures? Can we tell the SMEs which power grids or regions are the most affected during each disaster phase and need immediate action to recover? Answering these questions can help SMEs respond quickly and send resources for fast recovery from damage. Can we systematically show how the failure of different power grids may impact the whole set of CIs due to inter-dependencies? This can help SMEs better prepare for and mitigate risks by improving system resiliency. In this thesis, we explore problems in efficiently supporting decision-making tasks during a disaster for emergency management authorities. Our research has two primary directions: guiding decision-making in resource allocation, and planning to improve system resiliency. Our work is done in collaboration with the Oak Ridge National Laboratory to contribute impactful research on real-life CIs and disaster power failure data. 1.
Explainable resource allocation: In contrast to current interpretable or explainable models that provide answers to help understand a model's output, we view explanations as answers that guide resource allocation decision-making. In this thesis, we focus on developing a novel model and algorithm to identify disaster phases from changes in power failures, and to pinpoint the regions that may be most affected at each disaster phase so the SMEs can send resources for fast recovery. 2. Networks for improving system resiliency: We view CIs as a large heterogeneous network with infrastructure components as nodes and dependencies as edges. Our goal is to construct a visual analytic tool and develop a domain-inspired model to identify the important components and connections on which the SMEs need to focus and better prepare to mitigate the risk of a disaster. / Doctor of Philosophy / Critical Infrastructure Systems (CIs) comprise multiple infrastructures vital for maintaining public life and national security, e.g., power, water, and transportation. The US Federal Emergency Management Agency (FEMA) aims to protect the nation and its citizens by mitigating all hazards during natural or man-made disasters. To do so, it must adopt different decision-making strategies efficiently: for example, during an ongoing disaster, when to quickly send resources, which regions to send resources to first, etc. FEMA also needs to plan how to prepare for a future disaster and determine which CIs need maintenance to improve system resiliency. We explore several data-mining problems that can guide FEMA towards developing efficient decision-making strategies. Our thesis emphasizes explainable and network-based models and algorithms that help decision-making operations for emergency management experts by leveraging critical infrastructure data.
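The inter-dependency view of CIs described above can be made concrete with a toy failure-propagation sketch over a dependency graph. The component names and dependencies here are hypothetical, not ORNL data or the thesis's model:

```python
def cascade(deps, initially_failed):
    """Propagate failures through a dependency graph: a component fails
    once any component it depends on has failed."""
    failed = set(initially_failed)
    changed = True
    while changed:
        changed = False
        for node, requires in deps.items():
            if node not in failed and any(r in failed for r in requires):
                failed.add(node)
                changed = True
    return failed

# Hypothetical CI dependency map (component -> components it relies on).
deps = {
    "power_grid": [],
    "gas_pipeline": [],
    "water_treatment": ["power_grid"],
    "communication": ["power_grid"],
    "hospital": ["water_treatment", "communication"],
}
print(sorted(cascade(deps, {"power_grid"})))
# → ['communication', 'hospital', 'power_grid', 'water_treatment']
```

Even this toy version shows why a single power grid failure can debilitate downstream services, which is the motivation for identifying the important components and connections.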
265

Bridging Cognitive Gaps Between User and Model in Interactive Dimension Reduction

Wang, Ming 05 May 2020 (has links)
High-dimensional data is prevalent in all domains but is challenging to explore. Analysis and exploration of high-dimensional data are important for people in numerous fields. To help people explore and understand high-dimensional data, Andromeda, an interactive visual analytics tool, has been developed. However, our analysis uncovered several cognitive gaps relating to the Andromeda system: users do not realize the necessity of explicitly highlighting all the relevant data points; users are not clear about the dimensional information in the Andromeda visualization; and the Andromeda model cannot capture user intentions when constructing and deconstructing clusters. In this study, we designed and implemented solutions to address these gaps. Specifically, for the gap in highlighting all the relevant data points, we introduced a foreground and background view and distance lines. Our user study with a group of undergraduate students revealed that the foreground and background views and distance lines could significantly alleviate the highlighting issue. For the gap in understanding visualization dimensions, we implemented a dimension-assist feature. The results of a second user study with students with various backgrounds suggested that the dimension-assist feature could make it easier for users to find the extremum in one dimension and to describe correlations among multiple dimensions; however, the dimension-assist feature had only a small impact on characterizing the data distribution and assisting users in understanding the meanings of the weighted multidimensional scaling (WMDS) plot axes. Regarding the gap in creating and deconstructing clusters, we implemented a solution utilizing random sampling. A quantitative analysis of the random sampling strategy was performed, and the results demonstrated that the strategy improved Andromeda's capabilities in constructing and deconstructing clusters. 
We also applied random sampling to two-point manipulations, making the Andromeda system more flexible and adaptable to differing data exploration tasks. Limitations are discussed, and potential future research directions are identified. / Master of Science / High-dimensional data is data with hundreds or thousands of features. The animal dataset used in this study is an example of a high-dimensional dataset, since animals can be categorized by many features, such as size, fur, behavior, and so on. High-dimensional data is prevalent but difficult for people to analyze. For example, it is hard to find the similarity among dozens of animals, or to find the relationship between different characterizations of animals. To help people with no statistical knowledge analyze high-dimensional datasets, our group developed a web-based visualization tool called Andromeda, which can display data as points (such as animal data points) on a screen and allow people to interact with these points to express similarity by dragging points on the screen (e.g., dragging "Lion," "Wolf," and "Killer Whale" together because all three are hunters, forming a cluster of three animals). It thereby enables people to interactively analyze the hidden patterns of high-dimensional data. However, we identified several cognitive gaps that have limited Andromeda's effectiveness in helping people understand high-dimensional data. Therefore, in this work, we made improvements to the original Andromeda system to bridge these gaps, including designing new visual features to help people better understand how Andromeda processes and interacts with high-dimensional data, and improving the underlying algorithm so that the Andromeda system can better understand people's intentions during the data exploration process.
We extensively evaluated our designs through both qualitative and quantitative analysis (e.g., user study on both undergraduate and graduate students and statistical testing) on our animal dataset, and the results confirmed that the improved Andromeda system outperformed the original version significantly in a series of high-dimensional data understanding tasks. Finally, the limitations and potential future research directions were discussed.
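Underlying the WMDS plots discussed above is a weighted distance between data points, with one weight per dimension that the user implicitly adjusts by dragging points. A minimal sketch of that weighted distance; the animal feature vectors are invented for illustration, not Andromeda's actual data:

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance of the kind used in WMDS-style layouts."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical animal feature vectors: (size, furriness, predator score).
lion = (0.8, 0.6, 1.0)
wolf = (0.5, 0.9, 1.0)

# Up-weighting the "predator" dimension pulls the two hunters together
# relative to a weighting that stresses size and fur.
print(round(weighted_distance(lion, wolf, (1.0, 1.0, 1.0)), 3))
print(round(weighted_distance(lion, wolf, (0.1, 0.1, 5.0)), 3))
```

Dragging "Lion" and "Wolf" together in Andromeda corresponds to solving for weights like the second set, under which the pair becomes close.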
266

Online Denoising Solutions for Forecasting Applications

Khadivi, Pejman 08 September 2016 (has links)
Dealing with noisy time series is a crucial task in many data-driven real-time applications. Due to inaccuracies in data acquisition, time series suffer from noise and instability, which lead to inaccurate forecasting results. Therefore, in order to improve the performance of time series forecasting, an important pre-processing step is denoising the data before performing any action. In this research, we propose various approaches to tackle noisy time series in forecasting applications. For this purpose, we use different machine learning methods and information-theoretic approaches to develop online denoising algorithms. In this dissertation, we propose four categories of time series denoising methods that can be used in different situations, depending on the noise and time series properties. In the first category, a seasonal regression technique is proposed for the denoising of time series with seasonal behavior. In the second category, multiple discrete universal denoisers are developed that can be used for the online denoising of discrete-valued time series. In the third category, we develop a noisy channel reversal model based on the similarities between time series forecasting and data communication, and use that model to deploy out-of-band noise filtering in forecasting applications. The last category of proposed methods is deep-learning-based denoisers. We use information-theoretic concepts to analyze a general feed-forward deep neural network and to prove theoretical bounds on deep neural network behavior. Furthermore, we propose a denoising deep neural network method for the online denoising of time series. Real-world and synthetic time series are used for numerical experiments and performance evaluations. Experimental results show that the proposed methods can efficiently denoise time series and improve their quality. / Ph. D.
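As an illustration of the first category, a seasonal-regression-style smoother can be sketched by averaging observations that share a position in the seasonal cycle. This is a deliberate simplification of the dissertation's method, run on invented toy data:

```python
import random
from statistics import fmean

def seasonal_denoise(series, period):
    """Replace each observation with the mean of all observations sharing
    its position in the seasonal cycle (a simple seasonal smoother)."""
    phase_means = [fmean(series[i::period]) for i in range(period)]
    return [phase_means[i % period] for i in range(len(series))]

# Toy period-4 seasonal signal plus Gaussian noise (illustrative data only).
clean = [(10.0, 20.0, 30.0, 20.0)[i % 4] for i in range(40)]
rng = random.Random(1)
noisy = [c + rng.gauss(0.0, 2.0) for c in clean]
denoised = seasonal_denoise(noisy, 4)
```

On this toy series, averaging the 10 observations at each seasonal phase cuts the per-point noise roughly by a factor of sqrt(10), so the denoised series tracks the clean signal much more closely than the raw one.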
267

Improving the Interoperability of the OpenDSA eTextbook System

Wonderly, Jackson Daniel 07 October 2019 (has links)
In recent years there has been considerable adoption of the IMS Learning Tools Interoperability (LTI) standard among both Learning Management Systems (LMS) and learning applications. The LTI standard defines a way to securely connect learning applications and tools with platforms like LMS, enabling content from external learning tools to appear as if it were a native part of the LMS, and enabling these learning tools to send users' scores directly to the gradebook in the LMS. An example of such a learning tool is the OpenDSA eTextbook system, which provides materials that cover a variety of Computer Science-related topics, incorporating hundreds of interactive visualizations and auto-graded exercises. Previous work turned OpenDSA into an LTI tool provider, allowing for OpenDSA eTextbooks to be integrated with the Canvas LMS. In this thesis, we further explore the problem of connecting educational systems while documenting challenges, issues, and design rationales. We expand upon the existing OpenDSA LTI infrastructure by turning OpenDSA into an LTI tool consumer, thus enabling OpenDSA to better integrate content from other LTI tool providers. We also describe how we expanded OpenDSA's LTI tool provider functionality to increase the level of granularity at which OpenDSA content can be served, and how we implemented support for several LMS, including challenges faced and remaining issues. Finally, we discuss the problem of sharing analytics data among educational systems, and outline an architecture that could be used for this purpose. / Master of Science / In recent years there has been considerable adoption of the IMS Learning Tools Interoperability (LTI) standard among Learning Management Systems (LMS) like Blackboard and Canvas, and among learning tools.
The LTI standard allows for learning tools to be securely connected with platforms like LMS, enabling content from external learning tools to appear as if it were built into the LMS, and enabling these learning tools to send users' scores directly to the gradebook in the LMS. An example of such a learning tool is the OpenDSA online textbook system, which provides materials that cover a variety of Computer Science-related topics, incorporating hundreds of interactive visualizations and auto-graded exercises. Previous work enabled OpenDSA textbooks to be connected with the Canvas LMS using LTI. In this thesis, we further explore the problem of connecting educational systems while documenting challenges, issues, and design rationales. We expand the existing OpenDSA system to allow OpenDSA to better integrate content from other learning tools. We also describe how we expanded OpenDSA's features to increase the number of ways that OpenDSA content can be consumed, and how we implemented support for adding OpenDSA content to several LMS, including challenges faced and remaining issues. Finally, we discuss the problem of sharing analytics data among educational systems, and outline a potential way to connect educational systems for this purpose.
268

Visual Representations and Interaction Technologies

Earnshaw, Rae A. January 2005 (has links)
This chapter discusses important aspects of visual representations and interaction techniques necessary to support visual analytics. It covers five primary topics. First, it addresses the need for scientific principles for depicting information. Next, it focuses on methods for interacting with visualizations and considers the opportunities available given recent developments in input and display technologies. Third, it addresses the research and technology needed to develop new visual paradigms that support analytical reasoning. Then, it discusses the impact of scale issues on the creation of effective visual representations and interactions. Finally, it considers alternative ways to construct visualization systems more efficiently.
269

On the Use of Grouped Covariate Regression in Oversaturated Models

Loftus, Stephen Christopher 11 December 2015 (has links)
As data collection techniques improve, the number of covariates often exceeds the number of observations. When this happens, regression models become oversaturated and, thus, inestimable. Many classical and Bayesian techniques have been designed to combat this difficulty, each with a different means of addressing the oversaturation. However, these techniques can be tricky to implement well, difficult to interpret, and unstable. We propose a technique that takes advantage of the natural clustering of variables often found in biological and ecological datasets known as omics datasets. Generally speaking, omics datasets attempt to classify host species structure or function by characterizing a group of biological molecules, such as genes (genomics), proteins (proteomics), and metabolites (metabolomics). By clustering the covariates and regressing on a single value for each cluster, the model becomes both estimable and stable. In addition, the technique can account for the variability within each cluster, allow for the inclusion of expert judgment, and provide a probability of inclusion for each cluster. / Ph. D.
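The core move described above — collapsing each cluster of covariates to a single score before regressing — can be sketched as follows. The cluster assignments and expression values are hypothetical, and the single-value summary here is a plain mean rather than the dissertation's full procedure:

```python
from statistics import fmean

def cluster_scores(x_row, clusters):
    """Collapse a p-dimensional covariate row to one mean score per cluster,
    turning an oversaturated design (p > n) into an estimable one (k < n)."""
    return [fmean(x_row[j] for j in idx) for idx in clusters]

# Hypothetical row of 6 covariates (e.g., gene expressions) in 2 clusters.
clusters = [(0, 1, 2), (3, 4, 5)]
row = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]
print([round(s, 6) for s in cluster_scores(row, clusters)])  # → [1.0, 5.0]
```

With p covariates reduced to k cluster scores per observation, an ordinary regression on the scores becomes estimable even when p far exceeds the sample size.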
270

Knowledge Creation Analytics for Online Engineering Learning

Teo, Hon Jie 25 July 2014 (has links)
The ubiquitous use of computers and greater accessibility of the Internet have triggered widespread use of educational innovations such as online discussion forums, wikis, Open Educational Resources, and MOOCs, to name a few. These advances have led to the creation of a wide range of instructional videos, written documents, and discussion archives by engineering learners seeking to expand their learning and advance their knowledge beyond the engineering classroom. However, it remains a challenging task to assess the quality of knowledge advancement on these learning platforms, particularly due to the informal nature of engagement as a whole and the massive amount of learner-generated data. This research addresses this broad challenge through a research approach based on the examination of the state of knowledge advancement, analysis of relationships between variables indicative of knowledge creation and participation in knowledge creation, and identification of groups of learners. The study site is an online engineering community, All About Circuits, which serves 31,219 electrical and electronics engineering learners who contributed 503,908 messages in 65,209 topics. The knowledge creation metaphor provides the guiding theoretical framework for this research. This metaphor is based on a set of related theories that conceptualize learning as a collaborative process of developing shared knowledge artifacts for the collective benefit of a community of learners. In a knowledge-creating community, the quality of learning and participation can be evaluated by examining the degree of collaboration and the advancement of knowledge artifacts over an extended period of time. Software routines were written in the Python programming language to collect and process more than half a million messages, and to extract user-produced data from 87,263 web pages to examine the use of engineering terms, social networks, and engineering artifacts.
Descriptive analysis found that the state of knowledge advancement varies across discussion topics and that the level of engagement in knowledge-creating activities varies across individuals. Non-parametric correlation analysis uncovered strong associations between topic length and knowledge-creating activities, and between the total interactions experienced by individuals and individual engagement in knowledge-creating activities. On the other hand, the variable of individual total membership period has weak associations with individual engagement in knowledge-creating activities. K-means clustering analysis identified the presence of eight clusters of individuals with varying lengths of participation and membership, and Kruskal-Wallis tests confirmed that there are significant differences between the clusters. Based on a comparative analysis of Kruskal-Wallis score means and the examination of descriptive statistics for each cluster, three groups of learners were identified: Disengaged (88% of all individuals), Transient (10%), and Engaged (2%). A comparison of Spearman correlations between pairs of variables suggests that the variable of individual active membership period exhibits a stronger association with knowledge creation activities for the Disengaged group, whereas the variable of individual total interactions exhibits a stronger association with knowledge creation activities for the Engaged group. Limitations of the study are discussed and recommendations for future work are made. / Ph. D.
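The non-parametric correlation used above (Spearman's rank correlation) can be sketched in pure Python. The rank-based formula below omits tie handling, and the toy participation counts are invented for illustration, not the community's data:

```python
def spearman(x, y):
    """Spearman rank correlation via the classic 1 - 6*sum(d^2)/(n(n^2-1))
    formula; assumes no tied values (illustrative only)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank + 1)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Toy data: messages posted vs. knowledge-creating actions per learner.
msgs = [3, 10, 1, 8, 5]
actions = [2, 9, 1, 6, 7]
print(round(spearman(msgs, actions), 2))  # → 0.9
```

Because it operates on ranks rather than raw values, this measure is robust to the heavy-tailed participation counts typical of online communities, which is why the analysis above relies on non-parametric statistics.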
