  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1001

An Evaluation of the Performance of Proc ARIMA's Identify Statement: A Data-Driven Approach using COVID-19 Cases and Deaths in Florida

Shahela, Fahmida Akter 01 January 2021 (has links) (PDF)
Understanding data on the novel coronavirus (COVID-19) pandemic, and modeling such data over time, are crucial for decision making in managing, fighting, and controlling the spread of this emerging disease. This thesis looks at some aspects of exploratory analysis and modeling of COVID-19 data obtained from the Florida Department of Health (FDOH). In particular, the present work is devoted to the collection, preparation, description, and modeling of COVID-19 cases and deaths reported by FDOH between March 12, 2020, and April 30, 2021. For modeling both cases and deaths, this thesis uses an autoregressive integrated moving average (ARIMA) time series model. The "IDENTIFY" statement of SAS PROC ARIMA suggests a few competing models with suggested values of the parameters p (the order of the autoregressive component), d (the order of differencing), and q (the order of the moving average component). The suggested models are then compared using AIC (Akaike Information Criterion), SBC (Schwarz Bayesian Criterion), and MAE (Mean Absolute Error) values, and the best-fitting models are chosen as those with the smallest values of these criteria. To evaluate the performance of the model selected under this approach, the procedure is repeated using the first six months' data to forecast the next 7 days, the first nine months' data to forecast the next 7 days, and finally all FDOH data reported from March 12, 2020, to April 30, 2021, to forecast future data. The exploratory data analysis suggests higher COVID-19 case counts for females than for males and higher deaths for males than for females, and this is taken into account by evaluating the performance of the final models by gender for both the case and death data reported by FDOH. The gender-specific models perform better under the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) criteria than models based on gender-aggregated data. The fitted models reasonably predicted the future numbers of confirmed cases and deaths. Given similarities in reported COVID-19 data, the proposed modeling approach can be applied to data from other US states and from countries around the world.
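As a rough illustration of the model-selection step this abstract describes, the sketch below fits several candidate ARIMA orders to a daily count series, keeps the one with the lowest AIC, and produces a 7-day forecast. The thesis itself uses SAS PROC ARIMA's IDENTIFY statement; this Python analogue with statsmodels, the synthetic `cases` series, and the candidate orders are illustrative assumptions rather than the author's code.

```python
import itertools
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Placeholder for daily case counts; the real series would come from the FDOH data.
rng = np.random.default_rng(0)
cases = pd.Series(np.cumsum(rng.poisson(50, 120)).astype(float))

def select_arima(series, orders):
    """Fit each candidate (p, d, q) order and keep the one with the lowest AIC."""
    best_order, best_aic, best_fit = None, float("inf"), None
    for order in orders:
        try:
            fit = ARIMA(series, order=order).fit()
        except Exception:
            continue  # skip orders that fail to converge
        if fit.aic < best_aic:
            best_order, best_aic, best_fit = order, fit.aic, fit
    return best_order, best_fit

# Candidate orders standing in for those the IDENTIFY statement might suggest.
candidates = list(itertools.product(range(3), [1], range(3)))
order, fit = select_arima(cases, candidates)
print("selected order:", order, "AIC:", round(fit.aic, 1))
print(fit.forecast(steps=7))  # 7-day-ahead forecast, mirroring the evaluation design
```

In the thesis's evaluation design, the same selection step would be repeated on the six-month, nine-month, and full-period windows before each 7-day forecast.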
1002

A Simulation-Based Task Analysis using Agent-Based, Discrete Event and System Dynamics Simulation

Angelopoulou, Anastasia 01 January 2015 (has links)
Recent advances in technology have increased the need for simulation models to analyze tasks and obtain human performance data. A variety of task analysis approaches and tools have been proposed and developed over the years, and over 100 task analysis methods have been reported in the literature. However, most of the developed methods and tools represent only the static aspects of the tasks performed by expert system-driven human operators, neglecting aspects of the work environment, such as the physical layout, and dynamic aspects of the task. The use of simulation can help address the new challenges in the field of task analysis, as it allows for simulation of the dynamic aspects of the tasks, the humans performing them, and their locations in the environment. Modeling and simulation task analysis tools and techniques have proven effective for task analysis, workload assessment, and human reliability assessment. However, most existing task analysis simulation models and tools lack features that account for errors, workload, and the operator's level of expertise and skill, among others. In addition, current task analysis simulation tools require basic training on the tool in order to model the flow of the task analysis process and/or assess errors and workload; the modeling process is usually achieved using drag-and-drop functionality and, in some cases, programming skills. This research focuses on automating the modeling process and simulating individuals (or groups of individuals) performing tasks in a dynamic work environment in any domain. The main objective of this research is to develop a universal tool that allows for modeling and simulation of task analysis models in a short amount of time, with limited need for training or knowledge of modeling and simulation theory. The Universal Task Analysis Simulation Modeling (UTASiMo) tool can be used to automatically generate simulation models that analyze the tasks performed by human operators. UTASiMo is a multi-method modeling and simulation tool developed as a combination of agent-based, discrete event, and system dynamics simulation models. A generic multi-method modeling and simulation framework, named the 3M&S Framework, as well as the Unified Modeling Language, have been used for the design of the conceptual model and the implementation of the simulation tool. UTASiMo-generated models are dynamically created at run-time based on user inputs. The simulation results include estimates of operator workload, task completion time, and probability of human error based on human operator variability and task structure.
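As a toy illustration of the kinds of estimates mentioned above (task completion time and a workload measure), the sketch below runs a Monte Carlo simulation over a user-specified task list. It is not UTASiMo: the task names, duration distributions, and time-weighted demand formula are invented for the example.

```python
import random
import statistics

# Hypothetical task list; in a real tool these would come from user inputs.
tasks = [
    {"name": "monitor display", "mean": 4.0, "sd": 1.0, "demand": 0.3},
    {"name": "enter command",   "mean": 2.5, "sd": 0.5, "demand": 0.6},
    {"name": "confirm alarm",   "mean": 1.0, "sd": 0.2, "demand": 0.8},
]

def simulate_once(tasks):
    """One pass through the task sequence: return (total time, mean weighted demand)."""
    total, weighted = 0.0, 0.0
    for t in tasks:
        dur = max(0.1, random.gauss(t["mean"], t["sd"]))  # sampled task duration
        total += dur
        weighted += dur * t["demand"]                      # time-weighted demand
    return total, weighted / total

runs = [simulate_once(tasks) for _ in range(10_000)]
times, workloads = zip(*runs)
print(round(statistics.mean(times), 2), round(statistics.mean(workloads), 3))
```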
1003

Observations of the Copenhagen Networks Study

Cantrell, Michael A 01 June 2019 (has links) (PDF)
Attribute-rich longitudinal datasets of any kind are extremely rare. In 2012 and 2013, the SensibleDTU project created such a dataset using approximately 1,000 university students. Since then, a large number of studies have been performed using this dataset to ask various questions about social dynamics. This thesis delves into this dataset in an effort to explore previously unanswered questions. First, we define and identify social encounters in order to ask questions about face-to-face interaction networks. Next, we isolate students who send and receive disproportionately high numbers of phone calls and text messages to see how these groups compare to the overall population. Finally, we attempt to identify individual class schedules based solely on Bluetooth scans collected by smart phones. Our results from analyzing the phone call and text message logs as well as social encounters indicate that our methods are effective in studying and understanding social behavior.
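One plausible way to operationalize the "social encounters" mentioned above is to merge repeated Bluetooth co-detections of the same pair of participants into time intervals. The sketch below does that; the record format and the 10-minute merge threshold are assumptions for illustration, not the thesis's actual definition.

```python
from collections import defaultdict

GAP = 600  # assumed threshold: merge co-detections less than 10 minutes apart

def find_encounters(scans):
    """scans: iterable of (user_a, user_b, unix_timestamp) Bluetooth co-detections.
    Returns (pair, start, end) intervals, one per contiguous encounter."""
    by_pair = defaultdict(list)
    for a, b, ts in scans:
        by_pair[tuple(sorted((a, b)))].append(ts)
    encounters = []
    for pair, times in by_pair.items():
        times.sort()
        start = prev = times[0]
        for ts in times[1:]:
            if ts - prev > GAP:                 # long silence: close this encounter
                encounters.append((pair, start, prev))
                start = ts
            prev = ts
        encounters.append((pair, start, prev))
    return encounters

print(find_encounters([("u1", "u2", 0), ("u2", "u1", 300), ("u1", "u2", 5000)]))
```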
1004

High-Performance Processing of Continuous Uncertain Data

Tran, Thanh Thi Lac 01 May 2013 (has links)
Uncertain data has arisen in a growing number of applications such as sensor networks, RFID systems, weather radar networks, and digital sky surveys. The fact that the raw data in these applications is often incomplete, imprecise, and even misleading has two implications: (i) the raw data is not suitable for direct querying, and (ii) feeding the uncertain data into existing systems produces results of unknown quality. This thesis presents a system for uncertain data processing that has two key functionalities: (i) capturing and transforming raw noisy data into rich, queryable tuples that carry the attributes needed for query processing with quantified uncertainty, and (ii) performing query processing on such tuples in a way that captures changes in uncertainty as data passes through various query operators. The proposed system considers data naturally captured by continuous distributions, which is prevalent in sensing and scientific applications. The first part of the thesis addresses data capture and transformation by proposing a probabilistic modeling and inference approach. Since this task is application-specific and requires domain knowledge, the approach is demonstrated on RFID data from mobile readers. More specifically, the proposed solution involves an inference and cleaning substrate that transforms raw RFID data streams into object-location tuple streams, where locations are inferred from raw noisy data and their uncertain values are captured by probability distributions. The second, and main, part of this thesis examines query processing for uncertain data modeled by continuous random variables. The proposed system includes new data models and algorithms for relational processing, with a focus on aggregation and conditioning operations. For operations of high complexity, optimizations including approximations with guaranteed error bounds are considered. Complex queries involving a mix of operations are then addressed by query planning, which, given a query, finds an efficient plan that meets user-defined accuracy requirements. Besides relational processing, this thesis also provides support for user-defined functions (UDFs) on uncertain data, which aims to compute the output distribution given uncertain input and a black-box UDF. The proposed solution employs a learning-based approach using Gaussian processes to compute approximate output with error bounds, along with a suite of optimizations for high performance in online settings such as data stream processing and interactive data analysis. The techniques proposed in this thesis are thoroughly evaluated using both synthetic data with controlled properties and various real-world datasets from the domains of severe weather monitoring, object tracking using RFID readers, and computational astrophysics. The experimental results show that these techniques can yield high accuracy, keep up with stream speeds, and outperform existing techniques such as Monte Carlo sampling for many important workloads.
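As a minimal illustration of how uncertainty can propagate through a relational aggregation operator when attributes are modeled by continuous distributions, the sketch below sums attributes treated as independent Gaussians: the result is again Gaussian, with the means and variances added. The thesis's actual data models, operators, and approximation machinery are much richer; the attribute values here are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class GaussianAttr:
    mean: float
    var: float

def uncertain_sum(attrs):
    """SUM over independent Gaussian-valued attributes is again Gaussian."""
    return GaussianAttr(mean=sum(a.mean for a in attrs),
                        var=sum(a.var for a in attrs))

# Hypothetical sensor readings with quantified uncertainty.
readings = [GaussianAttr(10.2, 0.5), GaussianAttr(9.8, 0.3), GaussianAttr(11.1, 0.7)]
total = uncertain_sum(readings)
print(total.mean, math.sqrt(total.var))  # mean and standard deviation of the sum
```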
1005

Trajectory Data Mining in the Design of Intelligent Vehicular Networks

Soares de Sousa, Roniel 02 November 2022 (has links)
Vehicular networks are a promising technology to help solve complex problems of modern society, such as urban mobility. However, the vehicular environment has characteristics that pose challenges for wireless communication not usually found in traditional networks. Therefore, the scientific community is still investigating alternative techniques to improve data delivery in vehicular networks. In this context, the recent and increasing availability of trajectory data offers valuable information in many research areas. These data comprise the so-called "big trajectory data" and represent a new opportunity for improving vehicular networks. However, there is a lack of specific data mining techniques to extract the hidden knowledge from these data. This thesis explores vehicle trajectory data mining for the design of intelligent vehicular networks. In the first part of this thesis, we deal with errors intrinsic to vehicle trajectory data that hinder their applicability. We propose a trajectory reconstruction framework composed of several preprocessing techniques to convert flawed GPS-based data into road-network constrained trajectories. Compared to raw GPS trajectories, this new data representation reduces trajectory uncertainty and removes problems such as noise and outliers. After that, we develop a novel and scalable cluster-based trajectory prediction framework that uses the enhanced big trajectory data. Besides the prediction framework, we propose a new hierarchical agglomerative clustering algorithm for road-network constrained trajectories that automatically detects the most appropriate number of clusters. The proposed clustering algorithm is one of the components that allow the prediction framework to process large-scale datasets. The second part of this thesis applies the enhanced trajectory representation and the prediction framework to improve the vehicular network. We propose VDDTP, a novel vehicle-assisted data delivery algorithm based on trajectory prediction. VDDTP creates an extended trajectory model and uses predicted road-network constrained trajectories to calculate packet delivery probabilities. It then applies the predicted trajectories and some proposed heuristics in a data forwarding strategy, aiming to improve the vehicular network's global metrics (i.e., delivery ratio, communication overhead, and delivery delay). In this part, we also propose the DisTraC protocol to demonstrate the applicability of vehicular networks for detecting traffic congestion and improving urban mobility. DisTraC uses V2V communication to measure road congestion levels cooperatively and reroutes vehicles to reduce travel time. We evaluate the proposed solutions through extensive experiments and simulations. For that, we prepare a new large-scale, real-world dataset based on the city of Rio de Janeiro, Brazil, and also use other publicly available real-world datasets. The results demonstrate the potential of the proposed data mining techniques (i.e., the trajectory reconstruction and prediction frameworks) and vehicular network algorithms.
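As a simplified sketch of hierarchical agglomerative clustering over road-network constrained trajectories, the example below represents each trajectory as a set of road-segment IDs, uses Jaccard distance, and cuts the dendrogram at a fixed threshold with SciPy. The distance measure, the threshold, and the thesis's automatic selection of the number of clusters are not reproduced; everything shown is an assumption for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def jaccard_distance(a: set, b: set) -> float:
    """Distance between two trajectories viewed as sets of road-segment IDs."""
    return 1.0 - len(a & b) / len(a | b)

# Toy road-network constrained trajectories (segment IDs are made up).
trajectories = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11, 12}, {11, 12, 13}]
n = len(trajectories)
# Condensed pairwise distance vector in the order expected by linkage().
dists = np.array([jaccard_distance(trajectories[i], trajectories[j])
                  for i in range(n) for j in range(i + 1, n)])
Z = linkage(dists, method="average")               # agglomerative clustering
labels = fcluster(Z, t=0.5, criterion="distance")  # cut at an assumed threshold
print(labels)                                      # e.g. [1 1 2 2]
```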
1006

Studies on Privacy-Aware Data Trading / プライバシーを考慮したデータ取引に関する研究

Zheng, Shuyuan 25 September 2023 (has links)
Kyoto University / New system, course-based doctorate / Doctor of Informatics / Kō No. 24933 / Jōhaku No. 844 / 新制||情||141 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Takayuki Ito, Professor Hisashi Kashima, Professor Yasuo Okabe, Masayuki Abe (NTT Social Informatics Laboratories) / Meets the requirements of Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DGAM
1007

The impact of EDI on bills of lading: a global perspective on the dynamics involved

Muthow, Erik Andy 13 September 2023 (has links) (PDF)
This paper aims to give some idea of the dynamics involved in implementing electronic bills of lading. The bill of lading is one of a compendium of documents used in the carriage of goods by sea. The writer has therefore not attempted to isolate the bill of lading, although the emphasis is clearly placed on substituting the traditional (tangible) bill of lading with EDI. To understand the complexity of adapting existing documentation to EDI, it is essential to place the bill of lading in an EDI context. The electronic transfer of documents is nothing new; it is the statutory requirements and the legal rights and obligations associated with the transfer that are currently stretching the boundaries of the law. Most of the legislation dealing with carriage and shipping documentation was drafted in an age when EDI was clearly not envisaged. Consequently, uncertainty exists regarding the legal recognition of electronic documentation.
1008

Do Procedure Codes within Population-based Administrative Datasets Accurately Identify Patients Undergoing Cystectomy with Urinary Diversion?

Ross, James 01 February 2024 (has links)
Introduction: Cystectomy with urinary diversion (i.e., bladder removal surgery) is commonly studied using large health administrative databases. These databases often use diagnosis or procedure codes of unknown accuracy to identify cystectomy patients, thereby resulting in significant misclassification bias. The primary objective of this study is to develop a predictive model that returns an accurate probability that patients recorded in the discharge abstract database have undergone cystectomy with urinary diversion, stratified by type of urinary diversion (continent vs. incontinent). Secondary objectives are: 1) to internally validate the predictive model and determine its accuracy using a cohort of all adults admitted to The Ottawa Hospital (TOH) within the study period; and 2) to compare the accuracy of this model to that of code-based algorithms used in previously published studies to identify cystectomy.

Methods: A gold standard reference cohort (GSC) of all patients who underwent cystectomy and urinary diversion at TOH between 2009 and 2019 was created using the SIMS registry within the TOH data warehouse, which captures all primary surgical procedures performed. The GSC was confirmed by manual chart review to ensure accuracy. Through ICES, the GSC was linked to the provincial Discharge Abstract Database (DAD), physician billing records (OHIP), and the Ontario Cancer Registry (OCR), and a new combined dataset containing all admissions at TOH during the study period was created. Clinical information, billing, and intervention codes within these databases were reviewed, and the co-variables thought to be predictive of cystectomy were selected a priori. A multinomial logistic regression model (the Ottawa Cystectomy Identification Model, or OCIM) was created using these co-variables to determine the probability that a patient underwent cystectomy, stratified by continent vs. incontinent diversion, during an admission in the DAD. Using the OCIM and bootstrap imputation methods, co-variable values and 95% confidence intervals were calculated. The values of the same co-variables were then measured using a code algorithm (the presence of either a procedure code or a billing code for cystectomy with incontinent or continent diversion). Misclassification bias was measured by comparing the co-variable values obtained using the OCIM or the code algorithm to the true values from the gold standard reference cohort.

Results: Five hundred patients were included in the GSC [median age 68.0 (IQR 13.0); 75.6% male; 55.6% incontinent diversion]. The prevalence of cystectomy within the DAD over the study period was 0.12% (500/428,697 total admissions). Sensitivity and positive predictive values of the cystectomy codes were 97.1% and 58.6% for incontinent diversions and 100.0% and 48.4% for continent diversions, respectively. The OCIM accurately predicted the probability of cystectomy with incontinent diversion (c-statistic [C] 0.999, Integrated Calibration Index [ICI] 0.000) and cystectomy with continent diversion (C 1.000, ICI 0.000). Misclassification bias was lower when identifying cystectomy patients using the OCIM with bootstrap imputation than when using the code algorithm alone.

Conclusions: A model using administrative data accurately returned the probability that cystectomy, by diversion type, occurred during a hospitalization. Using this model to impute cystectomy status minimized misclassification bias.
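For readers unfamiliar with the model class, the sketch below shows a multinomial logistic regression of the general kind the OCIM represents, using scikit-learn and invented binary co-variables (e.g., "cystectomy procedure code present"). The study's actual co-variables, linked data sources, and bootstrap imputation procedure are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per admission; columns are hypothetical binary co-variables such as
# "cystectomy procedure code present" or "urology billing code present".
X = np.array([[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]])
# Outcome classes: 0 = no cystectomy, 1 = incontinent diversion, 2 = continent diversion.
y = np.array([1, 1, 2, 0, 0, 0])

ocim_like = LogisticRegression(max_iter=1000).fit(X, y)  # handles the 3 classes multinomially
probs = ocim_like.predict_proba(X)                       # per-admission class probabilities
print(np.round(probs, 3))
```

In the study, per-admission probabilities of this kind were combined with bootstrap imputation to estimate co-variable values and confidence intervals, rather than relying on a hard code-based label.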
1009

Open Government Data and Value Creation: Exploring the Roles Canadian Data Intermediaries Play in the Value Creation Process

Merhi, Salah 03 January 2023 (has links)
Open Government Data, an initiative of the open government movement, is believed to have the potential to increase government transparency, accountability, and citizens' participation in government affairs. It is also posited that open data will contribute to economic growth and value creation. The Canadian federal, provincial, and local governments have been actively opening and releasing datasets on many subjects of interest to the public. However, evidence of the benefits of using open datasets is scant, with no empirical research undertaken to understand how the data are used and what value is being created. This study, based on a qualitative, grounded theory method, focuses on the work and experiences of 17 Canadian open data intermediary firms to discover patterns and themes that explain how the data were used, what resources were needed, what value was created, and what challenges were faced. Data collection was based on semi-structured interviews conducted virtually with each company's founder or executives. The data analysis provided insights into how open government data were used, the organizational challenges the open data intermediaries faced, the state of open government data, and the economic value created. The findings highlight key similarities and differences in the activities the open data intermediaries performed and the importance of resources and capabilities in developing products and services that contribute to economic value creation. The study concludes by listing five challenges affecting the use of open government data: (a) awareness, (b) quality of open government data, (c) competencies of users, (d) data standards, and (e) value creation.
1010

Feature Tracking in Two Dimensional Time Varying Datasets

Thampy, Sajjit 10 May 2003 (has links)
This study investigates methods that can be used for tracking features in computational fluid dynamics datasets. Two approaches are studied: overlap-based feature tracking and attribute-based feature tracking. Overlap-based techniques use the actual degree of overlap between successive time steps to conclude a match. Attribute-based techniques use characteristics native to the feature being studied, such as size, orientation, and speed, to conclude a match between candidate features. Due to limits on the number of time steps that can be held in a computer's memory, it may be possible to load only a time-subsampled dataset. This can decrease the overlap obtained, and hence the confidence of the match. This study looks into using specific attributes of features, such as rotational and linear velocity, to predict the presence of a feature in a future time step. The use of predictive techniques is tested on swirling features, i.e., vortices. An ellipse-like representation is assumed to be a good approximation of any such feature. The locations of a feature in previous time steps are used to predict its position in a future time step. The ellipse-like representation of the feature is translated to the predicted location and aligned in the predicted orientation, and an overlap test is then done. The use of predictive techniques helps increase the overlap, and subsequently the confidence in the match obtained. The techniques were tested on an artificial dataset for linear velocity and rotation, and on a real dataset from a simulation of flow past a cylinder. Regions of swirling flow, detected by computing the swirl parameter, were taken as the features of study. The degree of overlap obtained by basic overlap matching and by the predictive methods was tabulated. The results show that the use of predictive techniques improved the overlap.
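The prediction step described above can be sketched in a few lines: extrapolate a feature's centroid from its two most recent positions, translate its cell mask to the predicted location, and score candidate features in the next loaded time step by overlap. The grid-cell representation and Jaccard score below are illustrative stand-ins for the thesis's ellipse-based overlap test.

```python
import numpy as np

def predict_centroid(prev, curr):
    """Constant-velocity extrapolation of a feature centroid to the next time step."""
    return 2 * np.asarray(curr, dtype=float) - np.asarray(prev, dtype=float)

def predicted_overlap(feature_cells, candidate_cells, prev_centroid, curr_centroid):
    """Translate the feature's cells to the predicted position, return Jaccard overlap."""
    shift = np.round(predict_centroid(prev_centroid, curr_centroid)
                     - np.asarray(curr_centroid, dtype=float)).astype(int)
    moved = {(i + shift[0], j + shift[1]) for i, j in feature_cells}
    return len(moved & candidate_cells) / len(moved | candidate_cells)

# A feature moving two cells per step: prediction shifts the mask onto the candidate.
feature   = {(0, 0), (0, 1), (1, 0), (1, 1)}   # cells at the current time step
candidate = {(2, 0), (2, 1), (3, 0), (3, 1)}   # cells at the next loaded time step
print(predicted_overlap(feature, candidate,
                        prev_centroid=(-1.5, 0.5), curr_centroid=(0.5, 0.5)))  # -> 1.0
```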
