131

Dataset selection for aggregate model implementation in predictive data mining

Lutu, P.E.N. (Patricia Elizabeth Nalwoga) 15 November 2010 (has links)
Data mining has become a commonly used method for the analysis of organisational data, for purposes of summarizing data in useful ways and identifying non-trivial patterns and relationships in the data. Given the large volumes of data that are collected by business, government, non-government and scientific research organizations, a major challenge for data mining researchers and practitioners is how to select relevant data for analysis in sufficient quantities, in order to meet the objectives of a data mining task. This thesis addresses the problem of dataset selection for predictive data mining. Dataset selection was studied in the context of aggregate modeling for classification. The central argument of this thesis is that, for predictive data mining, it is possible to systematically select many dataset samples and employ different approaches (different from current practice) to feature selection, training dataset selection, and model construction. When a large amount of information in a large dataset is utilised in the modeling process, the resulting models will have a high level of predictive performance and should be more reliable. Aggregate classification models, also known as ensemble classifiers, have been shown to provide a high level of predictive accuracy on small datasets. Such models are known to achieve a reduction in the bias and variance components of the prediction error of a model. The research for this thesis was aimed at the design of aggregate models and the selection of training datasets from large amounts of available data. The objectives for the model design and dataset selection were to reduce the bias and variance components of the prediction error for the aggregate models. Design science research was adopted as the paradigm for the research. Large datasets obtained from the UCI KDD Archive were used in the experiments. 
Two classification algorithms, See5 for classification tree modeling and K-Nearest Neighbour, were used in the experiments. The two methods of aggregate modeling that were studied are One-Vs-All (OVA) and positive-Vs-negative (pVn) modeling. While OVA is an existing method that has been used for small datasets, pVn is a new method of aggregate modeling proposed in this thesis. Methods for feature selection from large datasets, and methods for training dataset selection from large datasets, for OVA and pVn aggregate modeling, were studied. The feature selection experiments revealed that the use of many samples, robust measures of correlation, and validation procedures results in the reliable selection of relevant features for classification. A new algorithm for feature subset search, based on the decision rule-based approach to heuristic search, was designed, and its performance was compared to two existing algorithms for feature subset search. The experimental results revealed that the new algorithm makes better decisions for feature subset search. The information provided by a confusion matrix was used as a basis for the design of OVA and pVn base models, which are combined into one aggregate model. A new construct called a confusion graph was used in conjunction with new algorithms for the design of pVn base models. A new algorithm for combining base model predictions and resolving conflicting predictions was designed and implemented. Experiments to study the performance of the OVA and pVn aggregate models revealed that the aggregate models provide a high level of predictive accuracy compared to single models. Finally, theoretical models to depict the relationships between the factors that influence feature selection and training dataset selection for aggregate models are proposed, based on the experimental results. / Thesis (PhD)--University of Pretoria, 2010. / Computer Science / unrestricted
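To make the OVA scheme concrete, here is a minimal, hypothetical sketch of one-vs-all aggregate classification: one binary scorer is trained per class, and the combiner picks the class whose scorer is most confident. A toy nearest-centroid scorer stands in for the See5 and K-Nearest Neighbour base learners actually used in the thesis; all names are illustrative.

```python
# Hypothetical sketch of One-Vs-All (OVA) aggregate classification.
# The base learner is a toy nearest-centroid scorer, not the thesis's actual
# See5 / k-NN base models.

def train_ova(samples, labels):
    """Train one binary scorer per class: each scores 'this class vs. rest'."""
    classes = sorted(set(labels))
    models = {}
    for c in classes:
        pos = [s for s, l in zip(samples, labels) if l == c]
        # Toy base model: the centroid of the positive class.
        models[c] = [sum(col) / len(pos) for col in zip(*pos)]
    return models

def predict_ova(models, x):
    """Combine base models: the class whose scorer is most confident wins."""
    def score(centroid):
        # Negative squared distance: closer to centroid = higher confidence.
        return -sum((a - b) ** 2 for a, b in zip(x, centroid))
    return max(models, key=lambda c: score(models[c]))
```

The pVn variant described in the thesis differs in how the binary subproblems are formed (guided by the confusion graph) rather than in this combine-by-confidence step.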
132

Predictive Modeling of Enrollment and Academic Success in Secondary Chemistry

Charnock, Nathan Lee 01 January 2016 (has links)
The aim of this study was to identify predictors of student enrollment and successful achievement in 10th-grade chemistry courses for a sample drawn from a single academic cohort in a single metropolitan school district in Florida. Predictors included, among others, letter grades for courses completed in academic classes for each grade level, sixth through 10th, as well as standardized test scores on the Florida Comprehensive Assessment Test (FCAT) and demographic variables. The predictive models demonstrated that it is possible to identify student attributes that result in either increased or decreased odds of enrollment in chemistry courses. The logistic models identified subsets of students who could potentially be candidates for academic interventions, which may increase the likelihood of enrollment and successful achievement in a 10th-grade chemistry course. Predictors in this study included grades achieved for each school year for coursework completed in mathematics, English, history, and science, as well as reported FCAT performance band scores for students from sixth through 10th grade. Demographics, socioeconomic status, special learning services, attendance rates, and number of suspensions were also considered. The results demonstrated that female students were more likely to enroll in and pass a chemistry course than their male peers. The results also demonstrated that prior science achievement (followed closely by mathematics achievement) was the strongest predictor of enrollment in, and passing of, a chemistry course. Additional analysis also demonstrated the relative stability of academic GPA per discipline from year to year; cumulative achievement was the best overall indicator of course enrollment and achievement.
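As an illustration of how such a logistic model turns predictors into enrollment odds, consider the following sketch. The coefficients and predictor names are invented for demonstration and are not the study's fitted values.

```python
import math

# Illustrative sketch only: a logistic model mapping predictors (prior science
# and mathematics grades, sex) to a probability of chemistry enrollment.
# Coefficients below are made up, not taken from the study.

COEFS = {"intercept": -3.0, "science_gpa": 0.9, "math_gpa": 0.7, "female": 0.3}

def enrollment_probability(science_gpa, math_gpa, female):
    """Apply the logistic (inverse-logit) transform to a linear predictor."""
    logit = (COEFS["intercept"]
             + COEFS["science_gpa"] * science_gpa
             + COEFS["math_gpa"] * math_gpa
             + COEFS["female"] * (1 if female else 0))
    return 1.0 / (1.0 + math.exp(-logit))
```

Exponentiating a coefficient gives the odds ratio for a one-unit change in that predictor, which is how "increased or decreased odds of enrollment" are read off such a model.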
133

Finding the Past in the Present: Modeling Prehistoric Occupation and Use of the Powder River Basin, Wyoming

Clark, Catherine Anne 01 January 2012 (has links)
In the Powder River Basin of Wyoming, our nation's interest in protecting its cultural heritage collides with the high demand for carbon fuels. "Clinker" deposits dot the basin. These distinctive buttes, created by the underground combustion of coal, are underlain by coal veins; they also provided the main lithic resources for prehistoric hunter-gatherers. These deposits signify both a likelihood of extractable carbon and high archaeological site density. Federal law requires that energy developers must identify culturally significant sites before mining can begin. The research presented here explains the need for and describes a statistical tool with the potential to predict sites where carbon and cultural resources co-occur, thus streamlining the process of identifying important heritage sites to protect them from adverse impacts by energy development. The methods used for this predictive model include two binary logistic regression models using known archaeological sites in the Powder River Basin. The model as developed requires further refinement; the results are nevertheless applicable to future research in this and similar areas, as I discuss in my conclusion.
134

Predictive Quality Analytics

Salim A Semssar (11823407) 03 January 2022 (has links)
Quality drives customer satisfaction, improved business performance, and safer products. Reducing waste and variation is critical to the financial success of organizations. Today, it is common to see Lean and Six Sigma used as the two main strategies for improving quality. As advancements in information technologies enable the use of big data, defect reduction and continuous improvement philosophies will benefit and even prosper. Predictive Quality Analytics (PQA) is a framework in which risk assessment and machine learning technology can help detect anomalies in the entire ecosystem, not just in the manufacturing facility. PQA serves as an early warning system that directs resources to where help and mitigation actions are most needed. In a world where limited resources are the norm, focused actions on the significant few defect drivers can be the difference between success and failure.
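The early-warning idea behind PQA can be sketched minimally as baseline-deviation detection: flag measurements that drift far from the established process baseline so attention goes to the significant few first. A production system would use richer machine learning models; the three-sigma threshold here is an illustrative assumption.

```python
import statistics

# Minimal early-warning sketch (illustrative, not the thesis's PQA framework):
# flag process readings more than `threshold` standard deviations from the
# baseline distribution.

def anomalies(baseline, new_readings, threshold=3.0):
    """Return readings that deviate from the baseline by > threshold sigmas."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return [x for x in new_readings if abs(x - mu) > threshold * sigma]
```

This is essentially a control-chart rule; the PQA framing extends the same principle beyond the manufacturing facility to the whole ecosystem.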
135

Large Eddy Simulations of a Back-step Turbulent Flow and Preliminary Assessment of Machine Learning for Reduced Order Turbulence Model Development

Biswaranjan Pati (11205510) 30 July 2021 (has links)
Accuracy in turbulence modeling remains a hurdle in the widespread use of Computational Fluid Dynamics (CFD) as a tool for furthering fluid dynamics research. Meanwhile, computational power remains a significant concern for solving real-life wall-bounded flows, which span a wide range of length and time scales. The tools for turbulence analysis at our disposal, in decreasing order of accuracy, include Direct Numerical Simulation (DNS), Large Eddy Simulation (LES), and Reynolds-Averaged Navier-Stokes (RANS) based models. While DNS and LES will remain exorbitantly expensive options for simulating high Reynolds number flows for the foreseeable future, RANS is and continues to be a viable option utilized in commercial and academic endeavors. In the first part of the present work, flow over the back-step test case was solved, and parametric studies for various quantities such as re-circulation length (X<sub>r</sub>), coefficient of pressure (C<sub>p</sub>), and coefficient of skin friction (C<sub>f</sub>) are presented and validated against experimental results. The back-step setup was chosen as the test case because turbulence modeling of flow past a backward-facing step has been pivotal to a better understanding of separated flows. Turbulence modeling was performed on the test case using RANS (k-ε and k-ω models) and LES, for different values of Reynolds number (Re ∈ {2, 2.5, 3, 3.5} × 10<sup>4</sup>) and expansion ratios (ER ∈ {1.5, 2, 2.5, 3}). The LES results show good agreement with experimental results, and the discrepancy between the RANS results and experimental data is highlighted. The results obtained in the first part reveal a pattern of under-prediction when using RANS-based models to analyze canonical setups such as the backward-facing step. The close agreement of the LES results with experimental data, as mentioned above, makes them an excellent source of training data for the machine learning analysis outlined in the second part.
The highlighted discrepancy and the inability of the RANS model to accurately predict significant flow properties create the need for a better model. The purpose of the second part of the present study is to make systematic efforts to minimize the error between flow properties from RANS modeling and experimental data, as seen in the first part. A machine learning model was constructed in the second part of the present study to predict the eddy viscosity parameter (μt) as a function of turbulent kinetic energy (TKE) and dissipation rate (ε) derived from LES data, effectively working as an ad hoc eddy-viscosity based turbulence model. The machine learning model does not work well with the flow domain as a whole, but a zonal analysis reveals a better prediction of eddy viscosity than the whole domain. Among the zones, the area in the vicinity of the re-circulation zone gives the best result. The obtained results point towards the need for a zonal analysis for the better performance of the machine learning model, which will enable us to improve RANS predictions by developing a reduced order turbulence model.
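For context, the algebraic closure that the machine-learned surrogate aims to improve on is the standard k-ε relation for eddy viscosity, μ<sub>t</sub> = ρ C<sub>μ</sub> k²/ε with C<sub>μ</sub> = 0.09. A minimal sketch of that baseline, which the ML model replaces with a data-driven mapping from TKE and dissipation rate:

```python
# Baseline RANS eddy-viscosity estimate from the standard k-epsilon model:
#   mu_t = rho * C_mu * k^2 / eps
# (rho in kg/m^3, k in m^2/s^2, eps in m^2/s^3). The ML surrogate in the
# study learns mu_t from LES-derived k and eps instead of this fixed formula.

C_MU = 0.09  # standard k-epsilon model constant

def eddy_viscosity(rho, k, eps):
    """Algebraic eddy viscosity: rho * C_mu * k^2 / eps."""
    return rho * C_MU * k * k / eps
```

The zonal finding above amounts to saying that no single learned mapping from (k, ε) to μ<sub>t</sub> fits the whole domain well; fitting separate mappings per zone, particularly near the re-circulation region, works better.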
136

Exploring the modulation of information processing by task context

Heisterberg, Lisa M. January 2021 (has links)
No description available.
137

Three essays of healthcare data-driven predictive modeling

Zhouyang Lou (15343159) 26 April 2023 (has links)
Predictive modeling in healthcare involves the development of data-driven and computational models that predict what will happen, whether for a single individual or for an entire system. The adoption of predictive models can guide various stakeholders' decision-making in the healthcare sector and consequently improve individual outcomes and the cost-effectiveness of care. With the rapid development of big data and Internet of Things technologies in healthcare, research in healthcare decision-making has grown in both importance and complexity. One of the complexities facing those who would build predictive models is the heterogeneity of patient populations, clinical practices, intervention outcomes, and health systems. There are many sub-domains of healthcare for which predictive modeling is useful, such as disease risk modeling, clinical intelligence, pharmacovigilance, precision medicine, hospitalization process optimization, digital health, and preventive care. In this dissertation, I focus on predictive modeling for applications in three broad and important domains of healthcare: clinical practice, public health, and healthcare systems. I present three papers, a collection of predictive modeling studies that address the challenge of modeling heterogeneity in health care. The first paper presents a decision-tree model to address clinicians' need to decide among various liver cirrhosis diagnosis strategies. The second paper presents a micro-simulation model that assesses the impact of food policies on cardiovascular disease (CVD), helping decision makers at government agencies develop cost-effective policies to prevent CVD, a public-health application. The third paper compares a set of data-driven prediction models, the best performing of which is paired with interpretable machine learning to facilitate coordinated optimization for hospital-discharged patients choosing skilled nursing facilities. This collection of studies addresses important modeling challenges in specific healthcare domains and also contributes broadly to research in medical decision-making, public health policy, and healthcare systems.
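As a sketch of the kind of decision-tree logic the first paper describes for choosing among cirrhosis diagnosis strategies, consider the following. The thresholds, test names, and strategy labels are invented for illustration and are not taken from the study.

```python
# Hypothetical two-level decision tree for selecting a diagnostic strategy.
# All thresholds and labels are illustrative assumptions, not the study's.

def choose_strategy(elastography_kpa, platelet_count):
    """Walk the tree from root to a leaf (a recommended strategy)."""
    if elastography_kpa >= 15.0:      # strong non-invasive signal
        return "treat as cirrhosis"
    if platelet_count < 150:          # indeterminate: escalate testing
        return "confirm with biopsy"
    return "routine monitoring"
```

A fitted decision tree is exactly this structure with thresholds learned from data, which is what makes such models directly readable by clinicians.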
138

Employee Churn Prediction in Healthcare Industry using Supervised Machine Learning

Gentek, Anna January 2022 (has links)
Given that employees are one of the most valuable assets of any organization, losing an employee has a detrimental impact on several aspects of business activities. Loss of competence, deteriorated productivity, and increased hiring costs are just a small fraction of the consequences associated with high employee churn. To deal with this issue, organizations in many industries rely on machine learning and predictive analytics to model, predict, and understand the causes of employee churn so that appropriate proactive retention strategies can be applied. However, to date, the problem of excessive churn prevalent in the healthcare industry has not been addressed. To fill this research gap, this study investigates the applicability of a machine learning-based employee churn prediction model for a Swedish healthcare organization. We start by extracting relevant features from real employee data, followed by a comprehensive feature analysis using the Recursive Feature Elimination (RFE) method. A wide range of prediction models, including traditional classifiers such as Random Forest, Support Vector Machine, and Logistic Regression, are then implemented. In addition, we explore the performance of an ensemble machine learning model, XGBoost, and of neural networks, specifically an Artificial Neural Network (ANN). The results of this study show the superiority of an SVM model, with a recall of 94.8% and a ROC-AUC of 91.1%. Additionally, to understand and identify the main churn contributors, model-agnostic interpretability methods are examined and applied on top of the predictions. The analysis has shown that wellness contribution, employment rate, number of vacation days, and number of sick days are strong indicators of churn among healthcare employees.
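The RFE procedure mentioned above can be sketched as follows: repeatedly drop the least important feature until the desired number remains. The importance measure here (absolute correlation between a feature column and the churn label) is a stand-in for the model-based importances a real RFE run uses.

```python
# Minimal Recursive Feature Elimination (RFE) sketch. Illustrative only:
# real RFE re-fits a model each round and ranks features by model importances;
# here absolute Pearson correlation with the label stands in for that.

def importance(xs, ys):
    """Absolute Pearson correlation between one feature column and the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (vx * vy)) if vx and vy else 0.0

def rfe(features, labels, keep):
    """features: dict name -> column. Drop the weakest until `keep` remain."""
    features = dict(features)
    while len(features) > keep:
        weakest = min(features, key=lambda f: importance(features[f], labels))
        del features[weakest]
    return sorted(features)
```

The recursion matters because feature importances shift as correlated features are removed, which is why RFE can outperform a single one-shot ranking.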
139

The dynamics of Autism therapy with preschool children: quantitative observation and computational methods

Bertamini, Giulio 05 April 2023 (has links)
Clinical and research practice in the context of Autism has evolved rapidly in recent decades. Finer diagnostic procedures, evidence-based models of intervention, and greater social inclusivity have significantly improved the possibility for autistic children to participate in the fabric of social life. In terms of health best practices, gold-standard procedures still need to be improved, and bridging research and clinical practice still presents several challenges. From the clinical standpoint, the role of process variables, predictors, mechanisms, and timing of change still requires extensive investigation in order to explain response variability and design optimized interventions, tailored to individual needs and maximally effective. Observational techniques represent the elective research methods in child development, especially in clinical contexts, due to their non-invasiveness. However, they still suffer from limited objectivity and poor quantification. Further, their main disadvantage is that they are highly time-consuming and labor-intensive. The aim of this thesis was to advance translational research in the clinical practice of Autism intervention with preschool children. First, we designed and applied quantitative observational techniques to longitudinally study treatment response trajectories during developmental intervention, characterizing different response profiles and identifying which baseline predictors were able to predict the response over time. Secondly, we investigated mechanisms of change. In particular, we focused on the role of child-therapist interaction dynamics as a possible active mediator of the process of intervention, especially in the developmental framework that stresses the importance of interpersonal aspects. We also aimed at understanding whether certain time windows during the intervention were particularly predictive of the response, as well as which specific interaction aspects played a role.
Finally, to promote the translational application of observational methods and to improve objective quantification, we proposed and validated an Artificial Intelligence (AI) system to automate data annotation of the child-therapist acoustic interaction in unconstrained clinical contexts, remaining completely non-invasive and dealing with the specific noisy data that characterize them. This effort represents a foundational building block that enables downstream computational techniques, greatly reducing the need for human annotation that usually prevents the application of observational research to large amounts of data. We discuss our findings, stressing the importance of assuming a developmental framework in Autism, the key role of the interpersonal experience in the clinical context, and the value of focusing on trajectories of change. We also underline the need to acquire large amounts of quantitative data from clinical contexts, exploiting AI-based systems to assist clinicians, improve objectivity, enable treatment monitoring, and produce valuable data-driven knowledge on treatment efficacy.
140

Feasibility of a long-term food-based prevention trial with black raspberries in a post-surgical oral cancer population: Adherence and modulation of biomarkers of DNA damage

Uhrig, Lana K. January 2014 (has links)
No description available.
