11 |
Automated Machine Learning: Intelligent Binning Data Preparation and Regularized Regression Classifier
Zhu, Jianbin 01 January 2023 (has links) (PDF)
Automated machine learning (AutoML) has become a new trend: the process of automating the complete pipeline from the raw dataset to a developed machine learning model. It not only relieves data scientists' workload but also allows non-experts to complete these jobs without a solid grounding in statistical inference and machine learning. One limitation of AutoML frameworks is that data quality can differ significantly from batch to batch. Consequently, fitted model quality for some batches can be very poor due to distribution shift in some numerical predictors. In this dissertation, we develop an intelligent binning method to resolve this problem. In addition, various regularized regression classifiers (RRCs), including Ridge, Lasso, and Elastic Net regression, have been tested to further enhance model performance after binning. We focus on the binary classification problem and have developed an AutoML framework in Python that handles the entire data preparation process, including data partitioning and intelligent binning. This system has been tested extensively with simulations and real-data analyses, and the results show that (1) all models perform better with intelligent binning for both balanced and imbalanced binary classification problems; (2) regression-based methods are more sensitive to intelligent binning than tree-based methods, and with it RRCs can outperform tree-based methods; (3) weighted RRCs obtain the best results of all methods compared; and (4) our framework is an effective and reliable tool for conducting AutoML.
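As a rough illustration of the two ingredients the abstract names (not the author's implementation, which is not reproduced here), the sketch below bins numeric predictors by quantiles, one plausible stand-in for the "intelligent" binning, and fits a weighted Elastic Net logistic classifier with scikit-learn:

```python
# Hypothetical sketch: quantile binning of numeric predictors followed by a
# weighted, Elastic Net-regularized logistic classifier. The quantile rule is
# a stand-in; the dissertation's intelligent binning algorithm is not public.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                  # three numeric predictors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic binary target

# Quantile bins adapt their edges to the observed distribution, which makes
# the one-hot encoding more robust to distribution shift in a predictor.
clf = Pipeline([
    ("bins", KBinsDiscretizer(n_bins=5, encode="onehot-dense",
                              strategy="quantile")),
    ("rrc", LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, class_weight="balanced",
                               max_iter=5000)),
])
clf.fit(X, y)
print("training accuracy:", round(clf.score(X, y), 3))
```

Setting l1_ratio to 0 or 1 recovers Ridge- or Lasso-style penalties, and class_weight="balanced" is one way to realize the weighted RRC the abstract finds best.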
|
12 |
An Evaluation of the Performance of Proc ARIMA's Identify Statement: A Data-Driven Approach using COVID-19 Cases and Deaths in Florida
Shahela, Fahmida Akter 01 January 2021 (has links) (PDF)
Understanding data on the novel coronavirus (COVID-19) pandemic, and modeling such data over time, are crucial for decision making in managing, fighting, and controlling the spread of this emerging disease. This thesis looks at some aspects of exploratory analysis and modeling of COVID-19 data obtained from the Florida Department of Health (FDOH). In particular, the present work is devoted to the collection, preparation, description, and modeling of COVID-19 cases and deaths reported by FDOH between March 12, 2020, and April 30, 2021. For modeling both cases and deaths, this thesis uses an autoregressive integrated moving average (ARIMA) time series model. The IDENTIFY statement of SAS PROC ARIMA suggests a few competing models with candidate values of the parameters p (the autoregressive order), d (the order of differencing), and q (the moving-average order). The suggested models are then compared using AIC (Akaike Information Criterion), SBC (Schwarz Bayesian Criterion), and MAE (Mean Absolute Error), and the best-fitting models are chosen as those with the smallest values of these criteria. To evaluate this modeling approach, the procedure is repeated using the first six months' data to forecast the next 7 days, the first nine months' data to forecast the next 7 days, and finally all FDOH data reported from March 12, 2020, to April 30, 2021, to forecast future values. Because exploratory analysis suggests more COVID-19 cases among females and more deaths among males, the final models are also evaluated separately by gender for both cases and deaths. The gender-specific models perform better under Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) than models based on gender-aggregated data, and the fitted models reasonably predict future numbers of confirmed cases and deaths. Given similarities in reported COVID-19 data, the proposed modeling approach can be applied to data from other US states and from countries around the world.
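The thesis's workflow is SAS-based; as a hedged Python analogue of the same idea (candidate ARIMA orders ranked by an information criterion, then a 7-day forecast), one might write:

```python
# Python analogue of the thesis's SAS workflow: fit a small grid of candidate
# ARIMA(p, d, q) models to a daily count series and rank them by AIC. The
# thesis used PROC ARIMA's IDENTIFY statement to propose candidate orders;
# the brute-force grid below is an illustrative stand-in, on synthetic data.
import itertools
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
# Synthetic stand-in for FDOH daily cumulative case counts.
cases = pd.Series(np.cumsum(rng.poisson(50, 400)).astype(float))

results = []
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        fit = ARIMA(cases, order=(p, d, q)).fit()
        results.append(((p, d, q), fit.aic))
    except Exception:
        continue  # some orders fail to converge; skip them

best_order, best_aic = min(results, key=lambda r: r[1])
print("best order by AIC:", best_order, "AIC:", round(best_aic, 1))

# Refit the winner and forecast the next 7 days, mirroring the 7-day
# holdout evaluation described in the abstract.
print(ARIMA(cases, order=best_order).fit().forecast(steps=7))
```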
|
13 |
A Simulation-Based Task Analysis using Agent-Based, Discrete Event and System Dynamics Simulation
Angelopoulou, Anastasia 01 January 2015 (links)
Recent advances in technology have increased the need for simulation models that analyze tasks and produce human performance data. A variety of task analysis approaches and tools have been proposed and developed over the years, and over 100 task analysis methods have been reported in the literature. However, most of the developed methods and tools represent only the static aspects of the tasks performed by expert system-driven human operators, neglecting aspects of the work environment, such as physical layout, and the dynamic aspects of the task. Simulation can help address these challenges, as it captures the dynamic aspects of the tasks, the humans performing them, and their locations in the environment. Modeling and simulation task analysis tools and techniques have proven effective for task analysis, workload assessment, and human reliability assessment. However, most existing task analysis simulation models and tools lack features for representing errors, workload, and the operator's level of expertise and skill, among others. In addition, current task analysis simulation tools require basic training on the tool before the user can model the task analysis flow or assess errors and workload, and the modeling process usually relies on drag-and-drop functionality and, in some cases, programming skills. This research focuses on automating the modeling process and simulating individuals (or groups of individuals) performing tasks in a dynamic work environment in any domain. The main objective is to develop a universal tool that allows task analysis models to be built and simulated in a short amount of time with little need for training or knowledge of modeling and simulation theory. The Universal Task Analysis Simulation Modeling (UTASiMo) tool automatically generates simulation models that analyze the tasks performed by human operators. UTASiMo is a multi-method tool combining agent-based, discrete event, and system dynamics simulation models. A generic multi-method modeling and simulation framework, named the 3M&S Framework, along with the Unified Modeling Language, was used to design the conceptual model and implement the simulation tool. UTASiMo-generated models are dynamically created at run-time from user inputs. The simulation results include estimates of operator workload, task completion time, and probability of human error based on human operator variability and task structure.
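UTASiMo itself is not reproduced here; as a minimal sketch of just the discrete-event ingredient, assuming exponential arrival and service times, one operator's task queue could be simulated like this:

```python
# Minimal discrete-event sketch (a stand-in, not UTASiMo itself): one human
# operator serves tasks that arrive at random times, and we estimate the mean
# task completion time, one of the outputs the abstract describes.
import heapq
import random

random.seed(42)

def simulate(n_tasks=20, mean_interarrival=4.0, mean_service=3.0):
    arrivals = []
    t = 0.0
    for i in range(n_tasks):
        t += random.expovariate(1.0 / mean_interarrival)
        heapq.heappush(arrivals, (t, i))   # event list ordered by time
    busy_until = 0.0
    times_in_system = []
    while arrivals:
        arrive, _task = heapq.heappop(arrivals)
        start = max(arrive, busy_until)    # task waits if operator is busy
        busy_until = start + random.expovariate(1.0 / mean_service)
        times_in_system.append(busy_until - arrive)
    return sum(times_in_system) / len(times_in_system)

print("estimated mean task completion time:", round(simulate(), 2))
```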
|
14 |
Observations of the Copenhagen Networks Study
Cantrell, Michael A 01 June 2019 (links) (PDF)
Attribute-rich longitudinal datasets of any kind are extremely rare. In 2012 and 2013, the SensibleDTU project created such a dataset from approximately 1,000 university students, and a large number of studies have since used it to ask various questions about social dynamics. This thesis delves into the dataset to explore previously unanswered questions. First, we define and identify social encounters in order to ask questions about face-to-face interaction networks. Next, we isolate students who send and receive disproportionately high numbers of phone calls and text messages to see how these groups compare to the overall population. Finally, we attempt to identify individual class schedules based solely on Bluetooth scans collected by smartphones. Our results from analyzing the phone call and text message logs, as well as social encounters, indicate that our methods are effective for studying and understanding social behavior.
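The thesis's precise encounter definition is not reproduced here; as an illustrative sketch, assuming scan records of the form (time, scanner, seen device, RSSI) and made-up GAP and RSSI thresholds, encounters could be segmented like this:

```python
# Illustrative sketch: treat a Bluetooth scan record as evidence of proximity,
# and call two consecutive sightings of the same pair within GAP seconds part
# of one "social encounter". Thresholds and data below are assumptions.
from collections import defaultdict

GAP = 600          # assumed: max seconds between sightings in one encounter
MIN_RSSI = -80     # assumed: signal-strength cutoff for face-to-face range

scans = [          # (unix_time, scanner_id, seen_id, rssi) -- toy data
    (0, "a", "b", -60), (300, "a", "b", -70),
    (2000, "a", "b", -65), (2100, "b", "c", -90),
]

last_seen = {}
encounters = defaultdict(int)
for t, u, v, rssi in sorted(scans):
    if rssi < MIN_RSSI:
        continue                       # too weak to be face-to-face
    pair = tuple(sorted((u, v)))
    if pair not in last_seen or t - last_seen[pair] > GAP:
        encounters[pair] += 1          # a new encounter starts
    last_seen[pair] = t

print(dict(encounters))                # {('a', 'b'): 2} -- two encounters
```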
|
15 |
Improvements in and Relating to Processing Apparatus & Method
Noras, James M., Jones, Steven M.R., Rajamani, Haile S., Shepherd, Simon J., Van Eetvelt, Peter 25 May 2004 (links)
No
|
16 |
Large-scale data analysis using the Wigner function
Earnshaw, Rae A., Lei, Ci, Li, Jing, Mugassabi, Souad, Vourdas, Apostolos January 2012 (links)
No / Large-scale data are analysed using the Wigner function. It is shown that the ‘frequency variable’ provides important information, which is lost with other techniques. The method is applied to ‘sentiment analysis’ in data from social networks and also to financial data.
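For reference, a common definition of the Wigner (Wigner-Ville) distribution of a signal s(t) is given below; the 'frequency variable' the abstract highlights is the second argument f. The paper's own conventions may differ.

```latex
% Wigner-Ville distribution; sign and 2*pi conventions vary between authors.
W_s(t, f) = \int_{-\infty}^{\infty}
            s\!\left(t + \tfrac{\tau}{2}\right)\,
            s^{*}\!\left(t - \tfrac{\tau}{2}\right)\,
            e^{-2\pi i f \tau}\, d\tau
```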
|
17 |
Models for Univariate and Multivariate Analysis of Longitudinal and Clustered Data
Luo, Dandan Unknown Date
No description available.
|
18 |
Industrial Batch Data Analysis Using Latent Variable Methods
Rodrigues, Cecilia 09 1900 (links)
Currently most batch processes run in an open-loop manner with respect to final product quality, regardless of the performance obtained. This fact, allied with the increased industrial importance of batch processes, indicates a pressing need for the development and dissemination of automated batch quality control techniques that suit present industrial needs. Within this context, the main objective of the current work is to exemplify the use of empirical latent variable methods to reduce product quality variability in batch processes. These methods, known as multiway principal component analysis (MPCA) and multiway partial least squares (MPLS), were originally introduced by Nomikos and MacGregor (1992, 1994, 1995a, and 1995b). Their use is tied to the concepts of statistical process control (SPC) and leads to incremental process improvements. Throughout this thesis, three sets of industrial data originating from different batch processes were analyzed. The first section (Chapter 3) demonstrates how MPCA and multi-block, multiway partial least squares (MB-MPLS) methods can be used to troubleshoot an industrial batch unit and identify optimal process conditions with respect to quality; approaches to batch data laundering are also proposed. The second section (Chapter 4) elaborates on the use of an MPCA model to build a single, all-encompassing, on-line monitoring scheme for the heating phase of a multi-grade batch annealing process. The same data set is then used to present a simple alignment technique for batch data intended for on-line monitoring (Chapter 5); referred to as pre-alignment, it relies on a PLS model to predict the duration of new batches. Various methods for dealing with matrices containing different-sized observations are also proposed and evaluated. Finally, the last section (Chapter 6) deals with end-point prediction of a condensation polymerization process. / Thesis / Master of Applied Science (MASc)
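As a minimal sketch of the core MPCA step from Nomikos and MacGregor (batch-wise unfolding of a three-way array followed by principal component analysis), with all array sizes illustrative:

```python
# Sketch of the core MPCA step: unfold the (batch x variable x time) array
# batch-wise into a 2-D matrix, centre each column, and extract principal
# components. Names and sizes are illustrative, not from the thesis.
import numpy as np

rng = np.random.default_rng(7)
I, J, K = 30, 4, 100                 # batches, variables, time points
X = rng.normal(size=(I, J, K))       # toy batch trajectories

X2 = X.reshape(I, J * K)             # batch-wise unfolding: one row per batch
X2 = X2 - X2.mean(axis=0)            # centring removes the mean trajectory, so
                                     # components describe batch-to-batch
                                     # deviation, the SPC quantity of interest

# PCA via SVD; scores T and loadings P such that X2 ~ T @ P.T
U, s, Vt = np.linalg.svd(X2, full_matrices=False)
n_comp = 2
T = U[:, :n_comp] * s[:n_comp]       # batch scores
P = Vt[:n_comp].T                    # loadings over variable-time positions

explained = (s[:n_comp] ** 2) / (s ** 2).sum()
print("score matrix shape:", T.shape, "loadings shape:", P.shape)
print("variance explained by first two components:", explained.round(3))
```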
|
19 |
Application of extreme value theory
Hakimi Sibooni, J. January 1988 (links)
No description available.
|
20 |
Maps of the Magellanic Clouds from Combined South Pole Telescope and Planck Data
Crawford, T. M., Chown, R., Holder, G. P., Aird, K. A., Benson, B. A., Bleem, L. E., Carlstrom, J. E., Chang, C. L., Cho, H-M., Crites, A. T., Haan, T. de, Dobbs, M. A., George, E. M., Halverson, N. W., Harrington, N. L., Holzapfel, W. L., Hou, Z., Hrubes, J. D., Keisler, R., Knox, L., Lee, A. T., Leitch, E. M., Luong-Van, D., Marrone, D. P., McMahon, J. J., Meyer, S. S., Mocanu, L. M., Mohr, J. J., Natoli, T., Padin, S., Pryke, C., Reichardt, C. L., Ruhl, J. E., Sayre, J. T., Schaffer, K. K., Shirokoff, E., Staniszewski, Z., Stark, A. A., Story, K. T., Vanderlinde, K., Vieira, J. D., Williamson, R. 09 December 2016 (links)
We present maps of the Large and Small Magellanic Clouds from combined South Pole Telescope (SPT) and Planck data. The Planck satellite observes in nine bands, while the SPT data used in this work were taken with the three-band SPT-SZ camera. The SPT-SZ bands correspond closely to three of the nine Planck bands, namely those centered at 1.4, 2.1, and 3.0 mm. The angular resolution of the Planck data ranges from 5 to 10 arcmin, while the SPT resolution ranges from 1.0 to 1.7 arcmin. The combined maps take advantage of the high resolution of the SPT data and the long-timescale stability of the space-based Planck observations to deliver robust brightness measurements on scales from the size of the maps down to ~1 arcmin. In each band, we first calibrate and color-correct the SPT data to match the Planck data; then we use noise estimates from each instrument and knowledge of each instrument's beam to make the inverse-variance-weighted combination of the two instruments' data as a function of angular scale. We create maps assuming a range of underlying emission spectra and at a range of final resolutions. We perform several consistency tests on the combined maps and estimate the expected noise in measurements of features in them. We compare maps from this work to those from the Herschel HERITAGE survey, finding general consistency between the data sets. All data products described in this paper are available for download from the NASA Legacy Archive for Microwave Background Data Analysis server.
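As a much-simplified sketch of the scale-dependent, inverse-variance-weighted combination described above (beams, calibration, and color corrections are assumed already applied, and all noise spectra below are toy numbers):

```python
# Simplified sketch: in 2-D Fourier space, weight each instrument's map by
# the inverse of its noise power at that angular scale, so the combination
# follows the less noisy instrument scale by scale. All numbers are toys.
import numpy as np

rng = np.random.default_rng(3)
n = 256
sky = rng.normal(size=(n, n))                     # toy common sky signal
map_planck = sky + rng.normal(scale=0.5, size=(n, n))
map_spt = sky + rng.normal(scale=0.3, size=(n, n))

F_planck = np.fft.fft2(map_planck)
F_spt = np.fft.fft2(map_spt)

# Toy noise power spectra: Planck noisier at small scales (high k),
# SPT noisier at the largest scales, mirroring the paper's setup.
kx = np.fft.fftfreq(n)[None, :]
ky = np.fft.fftfreq(n)[:, None]
k = np.sqrt(kx**2 + ky**2) + 1e-6
N_planck = 0.25 * (1 + (k / 0.1) ** 2)            # rises toward small scales
N_spt = 0.09 * (1 + (0.02 / k) ** 2)              # rises toward large scales

w_planck = (1 / N_planck) / (1 / N_planck + 1 / N_spt)
combined = np.fft.ifft2(w_planck * F_planck + (1 - w_planck) * F_spt).real
print("residual rms vs. truth:", np.std(combined - sky).round(3))
```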
|