Global ETD Search

1	Automated Machine Learning: Intelligent Binning Data Preparation and Regularized Regression Classifier Zhu, Jianbin 01 January 2023 Automated machine learning (AutoML), the process of automating the complete pipeline from raw dataset to fitted machine learning model, has become a new trend. It not only relieves data scientists' workload but also allows non-experts to complete the work without a solid understanding of statistical inference and machine learning. One limitation of AutoML frameworks is that data quality can differ significantly from batch to batch. Consequently, fitted model quality for some batches of data can be very poor due to distribution shift in some numerical predictors. In this dissertation, we develop an intelligent binning method to resolve this problem. In addition, various regularized regression classifiers (RRCs), including Ridge, Lasso and Elastic Net regression, have been tested to enhance model performance further after binning. We focus on the binary classification problem and have developed an AutoML framework in Python to handle the entire data preparation process, including data partition and intelligent binning. This system has been tested extensively on simulations and real datasets, and the results show that (1) all the models perform better with intelligent binning for both balanced and imbalanced binary classification problems; (2) regression-based methods are more sensitive than tree-based methods to intelligent binning, and RRCs can outperform tree-based methods when intelligent binning is used; (3) weighted RRC obtains the best results compared to other methods; and (4) our framework is an effective and reliable tool for conducting AutoML. Categorical Data Analysis
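The abstract above does not spell out how the intelligent binning works, so the following is only a generic sketch of the underlying idea: learn bin edges once on a training batch and reuse them on later batches, so that shifted or extreme numeric values fall into the end bins instead of distorting a regression fit. Equal-frequency (quantile) cut points are an assumption made for illustration, and the names `quantile_edges` and `assign_bin` are hypothetical, not taken from the dissertation.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Return n_bins - 1 interior cut points at roughly equal-frequency positions."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign_bin(value, edges):
    """Map a value to a bin index in 0..len(edges).

    Values below the first edge go to bin 0 and values at or above the
    last edge go to the final bin, so out-of-range inputs stay valid.
    """
    return bisect_right(edges, value)


# Edges are learned on one (toy) training batch ...
train_batch = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
edges = quantile_edges(train_batch, n_bins=4)

# ... and reused on a later batch whose values have drifted upward.
shifted_batch = [5, 9, 40, 1000]
shifted_bins = [assign_bin(v, edges) for v in shifted_batch]
```

In this sketch the bin index (or a one-hot encoding of it) would replace the raw numeric predictor before a regularized classifier is fitted; whether the dissertation does exactly this is not stated in the abstract.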
2	An Evaluation of the Performance of Proc ARIMA's Identify Statement: A Data-Driven Approach using COVID-19 Cases and Deaths in Florida Shahela, Fahmida Akter 01 January 2021 Understanding data on the novel coronavirus (COVID-19) pandemic and modeling such data over time are crucial for decision making in managing, fighting, and controlling the spread of this emerging disease. This thesis looks at some aspects of exploratory analysis and modeling of COVID-19 data obtained from the Florida Department of Health (FDOH). In particular, the present work is devoted to data collection, preparation, description, and modeling of COVID-19 cases and deaths reported by FDOH between March 12, 2020, and April 30, 2021. For modeling data on both cases and deaths, this thesis uses an autoregressive integrated moving average (ARIMA) time series model. The "IDENTIFY" statement of SAS PROC ARIMA suggests a few competing models with suggested values of the parameters p (the order of the autoregressive model), d (the order of differencing), and q (the order of the moving average model). All suggested models are then compared using AIC (Akaike Information Criterion), SBC (Schwarz Bayesian Criterion), and MAE (Mean Absolute Error) values, and the models with the smallest values of these criteria are chosen as best fitting. To evaluate the performance of the model selected under this approach, the procedure is repeated using the first six months' data to forecast the next 7 days, nine months' data to forecast the next 7 days, and then all reported FDOH data from March 12, 2020, to April 30, 2021, to forecast future data. The exploratory data analysis suggests higher COVID-19 case counts for females than for males and more deaths among males than among females; these findings are taken into account by evaluating the performance of the final models by gender for both the case and death data reported by FDOH. 
The gender-specific models appear to be better under the model comparison criteria Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) than models based on gender-aggregated data. It is observed that the fitted models reasonably predicted the future numbers of confirmed cases and deaths. Given similarities in reported COVID-19 data, the proposed modeling approach can be applied to data for the USA as a whole, for other states, and for many other countries around the world. Categorical Data Analysis
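The comparison criteria named in this abstract can be written down in a few lines. The sketch below uses the textbook definitions, AIC = -2 ln L + 2k and SBC = -2 ln L + k ln n, together with MAE and RMSE on a forecast window; SAS PROC ARIMA may differ in detail (for example in how n is counted), and the candidate models and log-likelihood values are made-up numbers for illustration only, not results from the thesis.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian Criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def mae(actual, forecast):
    """Mean absolute error over paired actual/forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error over paired actual/forecast values."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical candidates: name -> (log-likelihood, number of estimated parameters).
candidates = {
    "ARIMA(1,1,1)": (-520.0, 2),
    "ARIMA(2,1,0)": (-523.5, 2),
    "ARIMA(2,1,2)": (-519.2, 4),
}

# Smaller is better for AIC, so the best candidate minimizes it.
best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
```

Because SBC penalizes each extra parameter by ln n rather than 2, it tends to prefer smaller models than AIC once n exceeds about 7, which is one reason the criteria can disagree on the best (p, d, q).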
3	A Simulation-Based Task Analysis using Agent-Based, Discrete Event and System Dynamics Simulation Angelopoulou, Anastasia 01 January 2015 Recent advances in technology have increased the need for using simulation models to analyze tasks and obtain human performance data. A variety of task analysis approaches and tools have been proposed and developed over the years. Over 100 task analysis methods have been reported in the literature. However, most of the developed methods and tools allow for representation of the static aspects of the tasks performed by expert system-driven human operators, neglecting aspects of the work environment, i.e. physical layout, and dynamic aspects of the task. The use of simulation can help meet the new challenges in the field of task analysis, as it allows for simulation of the dynamic aspects of the tasks, the humans performing them, and their locations in the environment. Modeling and/or simulation task analysis tools and techniques have been proven to be effective in task analysis, workload, and human reliability assessment. However, most of the existing task analysis simulation models and tools lack features that allow for consideration of errors, workload, and the operator's level of expertise and skills, among others. In addition, the current task analysis simulation tools require basic training on the tool to allow for modeling the flow of the task analysis process and/or error and workload assessment. The modeling process is usually achieved using drag-and-drop functionality and, in some cases, programming skills. This research focuses on automating the modeling process and simulating individuals (or groups of individuals) performing tasks in a dynamic work environment in any domain. The main objective of this research is to develop a universal tool that allows for modeling and simulation of task analysis models in a short amount of time with limited need for training or knowledge of modeling and simulation theory. 
A Universal Task Analysis Simulation Modeling (UTASiMo) tool can be used for automatically generating simulation models that analyze the tasks performed by human operators. UTASiMo is a multi-method modeling and simulation tool developed as a combination of agent-based, discrete event, and system dynamics simulation models. A generic multi-method modeling and simulation framework, named 3M&S Framework, as well as the Unified Modeling Language have been used for the design of the conceptual model and the implementation of the simulation tool. UTASiMo-generated models are dynamically created during run-time based on user inputs. The simulation results include estimations of operator workload, task completion time, and probability of human errors based on human operator variability and task structure. Categorical Data Analysis
4	Graph Neural Networks for Improved Interpretability and Efficiency Pho, Patrick 01 January 2022 Attributed graphs are a powerful tool for modeling real-life systems in many domains such as social science, biology, e-commerce, etc. The behaviors of those systems are mostly defined by or dependent on their corresponding network structures. Graph analysis has become an important line of research due to the rapid integration of such systems into every aspect of human life and the profound impact they have on human behaviors. Graph-structured data contains a rich amount of information from the network connectivity and the supplementary input features of nodes. Machine learning algorithms and traditional network science tools are limited in their ability to make use of both network topology and node features. Graph Neural Networks (GNNs) provide an efficient framework combining both sources of information to produce accurate predictions for a wide range of tasks including node classification, link prediction, etc. The exponential growth of graph datasets drives the development of complex GNN models, causing concerns about processing time and interpretability of the results. Another issue arises from the cost and limitations of collecting a large amount of annotated data for training deep learning GNN models. Apart from the sampling issue, the presence of anomalous entities in the data might degrade the quality of the fitted models. In this dissertation, we propose novel techniques and strategies to overcome the above challenges. First, we present a flexible regularization scheme applied to the Simple Graph Convolution (SGC). The proposed framework inherits the fast and efficient properties of SGC while rendering a sparse set of fitted parameter vectors, facilitating the identification of important input features. 
Next, we examine efficient procedures for collecting training samples and develop indicative measures as well as quantitative guidelines to assist practitioners in choosing the optimal sampling strategy to obtain data. We then improve upon an existing GNN model for the anomaly detection task. Our proposed framework achieves better accuracy and reliability. Lastly, we experiment with adapting the flexible regularization mechanism to the link prediction task. Categorical Data Analysis Data Science
5	Change Point Detection for Streaming Data Using Support Vector Methods Harrison, Charles 01 January 2022 Sequential multiple change point detection concerns the identification of multiple points in time where the systematic behavior of a statistical process changes. A special case of this problem, called online anomaly detection, occurs when the goal is to detect the first change and then signal an alert to an analyst for further investigation. This dissertation concerns the use of methods based on kernel functions and support vectors to detect changes. A variety of support vector-based methods are considered, but the primary focus is Least Squares Support Vector Data Description (LS-SVDD). LS-SVDD constructs a hypersphere in a kernel space to bound a set of multivariate vectors using a closed-form solution. The mathematical tractability of LS-SVDD facilitates closed-form updates for the LS-SVDD Lagrange multipliers. The update formulae cover both adding a block of observations to and removing a block of observations from an existing LS-SVDD description; thus LS-SVDD can be constructed or updated sequentially, which makes it attractive for online problems with sequential data streams. LS-SVDD is applied to a variety of scenarios including online anomaly detection and sequential multiple change point detection. Categorical Data Analysis Data Science
6	Implementing a Class of Permutation Tests: The coin Package Zeileis, Achim, Wiel, Mark A. van de, Hornik, Kurt, Hothorn, Torsten 11 1900 The R package coin implements a unified approach to permutation tests providing a huge class of independence tests for nominal, ordered, numeric, and censored data as well as multivariate data at mixed scales. Based on a rich and flexible conceptual framework that embeds different permutation test procedures into a common theory, a computational framework is established in coin that likewise embeds the corresponding R functionality in a common S4 class structure with associated generic functions. As a consequence, the computational tools in coin inherit the flexibility of the underlying theory and conditional inference functions for important special cases can be set up easily. Conditional versions of classical tests - such as tests for location and scale problems in two or more samples, independence in two- or three-way contingency tables, or association problems for censored, ordered categorical or multivariate data - can easily be implemented as special cases using this computational toolbox by choosing appropriate transformations of the observations. The paper gives a detailed exposition of both the internal structure of the package and the provided user interfaces along with examples on how to extend the implemented functionality. (authors' abstract)
7	Implementing a Class of Permutation Tests: The coin Package Hothorn, Torsten, Hornik, Kurt, van de Wiel, Mark A., Zeileis, Achim January 2007 The R package coin implements a unified approach to permutation tests providing a huge class of independence tests for nominal, ordered, numeric, and censored data as well as multivariate data at mixed scales. Based on a rich and flexible conceptual framework that embeds different permutation test procedures into a common theory, a computational framework is established in coin that likewise embeds the corresponding R functionality in a common S4 class structure with associated generic functions. As a consequence, the computational tools in coin inherit the flexibility of the underlying theory and conditional inference functions for important special cases can be set up easily. Conditional versions of classical tests - such as tests for location and scale problems in two or more samples, independence in two- or three-way contingency tables, or association problems for censored, ordered categorical or multivariate data - can easily be implemented as special cases using this computational toolbox by choosing appropriate transformations of the observations. The paper gives a detailed exposition of both the internal structure of the package and the provided user interfaces. / Series: Research Report Series / Department of Statistics and Mathematics
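To make the idea of a conditional (permutation) test for a two-sample location problem concrete, here is a minimal sketch of an exact two-sided permutation test on the difference in group means. It is written in Python with the standard library only and does not reproduce the coin package's R interface or its S4 classes; the function name `exact_permutation_pvalue` is hypothetical, and full enumeration is only practical for small samples.

```python
from itertools import combinations


def exact_permutation_pvalue(x, y):
    """Exact two-sided permutation p-value for a difference in means.

    Enumerates every way of relabelling the pooled observations into a
    group of size len(x) and a group of size len(y), and returns the
    share of relabellings whose absolute mean difference is at least as
    large as the observed one.
    """
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(pooled) - n_x
    total = sum(pooled)

    def abs_mean_diff(sum_x):
        return abs(sum_x / n_x - (total - sum_x) / n_y)

    observed = abs_mean_diff(sum(x))
    extreme = 0
    n_relabellings = 0
    for idx in combinations(range(len(pooled)), n_x):
        stat = abs_mean_diff(sum(pooled[i] for i in idx))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
        n_relabellings += 1
    return extreme / n_relabellings


p_value = exact_permutation_pvalue([1, 2, 3], [4, 5, 6])
```

With three observations per group there are 20 relabellings, and only the observed split and its mirror image are as extreme, so no two-sided test at the 5% level can reject here regardless of the data. Swapping the mean difference for another statistic (ranks, scores, a transformation of the observations) gives other tests from the same family.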
8	Statistické srovnání výsledků perkutánních, ureteroskopických a robotických operací pro obstrukci ureteropelvické junkce. / Statistical evaluation of percutaneous, ureteroscopic and robotic surgeries for ureteropelvic junction obstruction Masarovičová, Martina January 2008 The aim of this diploma thesis is to statistically analyze a sample of patients hospitalized and treated for ureteropelvic junction obstruction at the urological department of ÚNV Prague over the last 20 years, and to determine the optimal treatment method. Evaluating the surgical techniques from both the surgical and the economic point of view creates a comprehensive picture of the advantages and disadvantages of each method and helps all parties involved to decide in case of doubt. Statistical analysis is a suitable instrument here: it helps find answers, and it also opens room for discussion.
9	Using Three Different Categorical Data Analysis Techniques to Detect Differential Item Functioning Stephens-Bonty, Torie Amelia 16 May 2008 Diversity in the population along with the diversity of testing usage has resulted in smaller identified groups of test takers. In addition, computer adaptive testing sometimes results in a relatively small number of items being used for a particular assessment. Statistical techniques are therefore needed that can effectively detect differential item functioning (DIF) when the population is small and/or the assessment is short. Identification of empirically biased items is a crucial step in creating equitable and construct-valid assessments. Parshall and Miller (1995) compared the conventional asymptotic Mantel-Haenszel (MH) with the exact test (ET) for the detection of DIF with small sample sizes. Several studies have since compared the performance of MH to logistic regression (LR) under a variety of conditions. Both Swaminathan and Rogers (1990) and Hidalgo and López-Pina (2004) demonstrated that MH and LR were comparable in their detection of items with DIF. This study followed by comparing the performance of MH, ET, and LR when both the sample size is small and the test length is short. The purpose of this Monte Carlo simulation study was to expand on the research done by Parshall and Miller (1995) by examining power and power with effect size measures for each of the three DIF detection procedures. The following variables were manipulated in this study: focal group sample size, percent of items with DIF, and magnitude of DIF. For each condition, a small reference group size of 200 was utilized as well as a short, 10-item test. The results demonstrated that, in general, LR was slightly more powerful in detecting items with DIF. In most conditions, however, power was well below the acceptable rate of 80%. 
As the size of the focal group and the magnitude of DIF increased, the three procedures were more likely to reach acceptable power. Also, all three procedures demonstrated the highest power for the most discriminating item. Collectively, the results from this research provide information in the area of small sample size and DIF detection. differential item functioning categorical data analysis exact test logistic regression MH Education Education Policy
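As a concrete illustration of the Mantel-Haenszel approach to DIF mentioned in this abstract, the sketch below computes the MH common odds ratio across matched score strata, alpha_MH = sum(A_k D_k / N_k) / sum(B_k C_k / N_k), and the ETS delta-scale transform MH D-DIF = -2.35 ln(alpha_MH). The per-stratum counts are invented for illustration, the tuple layout (reference correct, reference incorrect, focal correct, focal incorrect) is an assumption of this sketch, and it is not the simulation code used in the study.

```python
import math


def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def mh_d_dif(strata):
    """ETS delta-scale index; negative values favor the reference group."""
    return -2.35 * math.log(mh_odds_ratio(strata))


# Made-up counts for one item, with test takers matched on two score levels.
item_strata = [(40, 10, 30, 20), (30, 20, 20, 30)]
alpha_mh = mh_odds_ratio(item_strata)
delta_mh = mh_d_dif(item_strata)
```

An odds ratio of 1 (delta of 0) means the item behaves the same for matched reference and focal test takers. With small focal groups many strata are sparse, which is the setting in which the abstract compares the asymptotic MH test with the exact test and logistic regression.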
10	Integrated studies on structure and formation mechanism of environmental consciousness in rural and urban China / 中国農村部と都市部における環境意識の構造と形成のメカニズムに関する総合的研究 陳艶艶, Yanyan Chen 22 March 2016 Long-standing institutional and socioeconomic segmentation has made rural China a society distinct from urban China, and these differing backgrounds are thought to have produced distinctive forms of environmental consciousness. The marked rural-urban division in China provides a good context in which to explore the formation and diverse social facets of environmental consciousness. This study aims to clarify the specific structure and formation mechanism of environmental consciousness under the different social backgrounds of rural and urban China, based on statistical analysis of survey data collected in the field. Building on prior research and taking the social structure of urban and rural areas into account, it proposes three dimensions of environmental consciousness and an integrated theoretical framework that involves both social structural and social psychological variables. Based on the proposed theoretical framework and the empirical data analyses, the internal causes and external influencing factors of environmental consciousness were clarified. / Doctor of Culture and Information Science / Doshisha University Value Judgment Environmental Attitude Behavior Intention Social Survey Categorical Data Analysis
