121

Superscalar Processor Models Using Statistical Learning

Joseph, P J 04 1900 (has links)
Processor architectures are becoming increasingly complex, and architects must therefore evaluate a large design space consisting of several parameters, each with a number of potential settings. To assist in guiding design decisions, we develop simple and accurate models of the superscalar processor design space using a detailed and validated superscalar processor simulator. First, we obtain precise estimates of all significant micro-architectural parameters and their interactions by building linear regression models from simulation-based experiments. We obtain good approximate models at low simulation cost using an iterative process in which Akaike's Information Criterion is used to extract a good linear model from a small set of simulations, and limited further simulation is guided by the model using D-optimal experimental designs. The iterative process is repeated until the desired error bounds are achieved. We use this procedure for model construction and show that it provides a cost-effective scheme to experiment with all relevant parameters. We also obtain accurate predictors of the processor's performance response across the entire design space by constructing radial basis function networks from sampled simulation experiments. We construct these models by simulating at a limited set of design points selected by Latin hypercube sampling and then deriving the radial basis function networks from the results. We show that these predictors provide accurate approximations to the simulator's performance response and hence offer a cheap alternative to simulation while searching for optimal processor design points.
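As a rough illustration of the sampling-and-surrogate idea described above (not the thesis's actual simulator or parameters), the following Python sketch draws design points by Latin hypercube sampling, evaluates a placeholder `run_simulator` function on them, and fits a radial basis function surrogate that can then be queried cheaply at unsimulated design points. All names and parameter ranges here are illustrative assumptions.

```python
# Hypothetical sketch: build an RBF surrogate of a simulator's performance
# response from design points chosen by Latin hypercube sampling.
import numpy as np
from scipy.stats import qmc
from scipy.interpolate import RBFInterpolator

def run_simulator(params):
    # Placeholder standing in for a detailed processor simulation (e.g., returning IPC).
    return float(np.sum(np.sin(params)))

# Three illustrative micro-architectural parameters, scaled to [0, 1].
n_params, n_samples = 3, 50
sampler = qmc.LatinHypercube(d=n_params, seed=0)
design = sampler.random(n=n_samples)            # n_samples x n_params in [0, 1)

responses = np.array([run_simulator(x) for x in design])

# Radial basis function surrogate fitted to the sampled simulations.
surrogate = RBFInterpolator(design, responses, kernel="thin_plate_spline")

# Cheap prediction at an unsimulated design point.
new_point = np.array([[0.2, 0.7, 0.5]])
print(surrogate(new_point))
```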
122

Automated construction of generalized additive neural networks for predictive data mining / Jan Valentine du Toit

Du Toit, Jan Valentine January 2006 (has links)
In this thesis Generalized Additive Neural Networks (GANNs) are studied in the context of predictive Data Mining. A GANN is a novel neural network implementation of a Generalized Additive Model. Originally, GANNs were constructed interactively by considering partial residual plots. This methodology involves subjective human judgment, is time-consuming, and can yield suboptimal models. The newly developed automated construction algorithm overcomes these difficulties by performing model selection based on an objective model selection criterion. Partial residual plots are only utilized after the best model is found, to gain insight into the relationships between inputs and the target. Models are organized in a search tree, and a greedy search procedure identifies good models in a relatively short time. The automated construction algorithm, implemented in the powerful SAS® language, is nontrivial, effective, and comparable to other model selection methodologies found in the literature. This implementation, called AutoGANN, has a simple, intuitive, and user-friendly interface. The AutoGANN system is further extended with an approximation to Bayesian Model Averaging. This technique accounts for uncertainty about the variables that must be included in the model and about the model structure. Model averaging utilizes in-sample model selection criteria and creates a combined model with better predictive ability than any single model. In the field of Credit Scoring, the standard theory of scorecard building is not tampered with, but a pre-processing step is introduced to arrive at a more accurate scorecard that discriminates better between good and bad applicants. The pre-processing step exploits GANN models to achieve significant reductions in marginal and cumulative bad rates. The time it takes to develop a scorecard may also be reduced by utilizing the automated construction algorithm. / Thesis (Ph.D. (Computer Science))--North-West University, Potchefstroom Campus, 2006.
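The automated construction idea (greedy search over candidate models scored by an objective criterion) can be sketched in simplified form. The snippet below is not the AutoGANN SAS implementation; it is a rough Python analogue that performs greedy forward selection of inputs for an ordinary linear model, using AIC as the selection criterion on synthetic data.

```python
# Simplified analogue of criterion-driven greedy model construction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)

def fit_aic(cols):
    # AIC of a linear model using the given input columns (intercept only if empty).
    design = sm.add_constant(X[:, cols]) if cols else np.ones((n, 1))
    return sm.OLS(y, design).fit().aic

selected, remaining = [], list(range(X.shape[1]))
best_aic = fit_aic(selected)
while remaining:
    scores = {j: fit_aic(selected + [j]) for j in remaining}
    j_best = min(scores, key=scores.get)
    if scores[j_best] >= best_aic:      # stop when no candidate improves AIC
        break
    best_aic = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print("selected inputs:", selected, "AIC:", round(best_aic, 2))
```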
124

Statistical Modeling for Credit Ratings

Vana, Laura 01 August 2018 (has links) (PDF)
This thesis deals with the development, implementation and application of statistical modeling techniques that can be employed in the analysis of credit ratings. Credit ratings are one of the most widely used measures of credit risk and are relevant for a wide array of financial market participants, from investors, as part of their investment decision process, to regulators and legislators as a means of measuring and limiting risk. The majority of credit ratings are produced by the "Big Three" credit rating agencies Standard & Poor's, Moody's and Fitch. Especially in the light of the 2007-2009 financial crisis, these rating agencies have been strongly criticized for failing to assess risk accurately and for the lack of transparency in their rating methodology. However, they continue to play a powerful role as financial market participants and have a huge impact on the cost of funding. These points of criticism call for the development of modeling techniques that can (1) facilitate an understanding of the factors that drive the rating agencies' evaluations and (2) generate insights into the rating patterns that these agencies exhibit. This dissertation consists of three research articles. The first focuses on variable selection and assessment of variable importance in accounting-based models of credit risk. The credit risk measure employed in the study is derived from credit ratings assigned by the rating agencies Standard & Poor's and Moody's. To deal with the lack of theoretical foundation specific to this type of model, state-of-the-art statistical methods are employed. Different models are compared based on a predictive criterion, and model uncertainty is accounted for in a Bayesian setting. Parsimonious models are identified after applying the proposed techniques. The second paper proposes the class of multivariate ordinal regression models for the modeling of credit ratings. The model class is motivated by the fact that correlated ordinal data arise naturally in the context of credit ratings. From a methodological point of view, we extend existing model specifications in several directions by allowing, among others, for a flexible covariate-dependent correlation structure between the continuous variables underlying the ordinal credit ratings. The estimation of the proposed models is performed using composite likelihood methods. Insights into the heterogeneity among the "Big Three" are gained when applying this model class to the multiple credit ratings dataset. A comprehensive simulation study on the performance of the estimators is provided. The third research paper deals with the implementation and application of the model class introduced in the second article. In order to make the class of multivariate ordinal regression models more accessible, the R package mvord and the complementary paper included in this dissertation have been developed. The mvord package is available on the Comprehensive R Archive Network (CRAN) for free download and enhances the available ready-to-use statistical software for the analysis of correlated ordinal data. In creating the package, a strong emphasis has been placed on a user-friendly and flexible design, which allows end users to estimate sophisticated models from the implemented model class with little effort. The package is aimed at practitioners and researchers who deal with correlated ordinal data in various areas of application, ranging from credit risk to medicine or psychology.
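For readers without access to R, a much simplified analogue of the rating models described above can be sketched in Python. The example below fits a single-rater ordinal probit model with statsmodels on synthetic data; the multivariate, composite-likelihood machinery of the thesis is implemented in the R package mvord and is not reproduced here. The predictor names and rating cut-offs are assumptions for illustration only.

```python
# Simplified, univariate analogue of an ordinal credit-rating model.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(42)
n = 500
leverage = rng.normal(size=n)
profitability = rng.normal(size=n)
latent = -1.2 * leverage + 0.8 * profitability + rng.normal(size=n)

# Discretize the latent creditworthiness into three ordered rating classes.
rating = pd.Series(pd.cut(latent, bins=[-np.inf, -0.5, 0.5, np.inf],
                          labels=["speculative", "lower-medium", "investment"]))

X = pd.DataFrame({"leverage": leverage, "profitability": profitability})
model = OrderedModel(rating, X, distr="probit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```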
125

Classification et modélisation statistique intégrant des données cliniques et d’imagerie par résonance magnétique conventionnelle et avancée / Classification and statistical modeling based on clinical and conventional and advanced Magnetic Resonance Imaging data

Tozlu, Ceren 19 March 2018 (has links)
Stroke and multiple sclerosis are two of the most destructive neurological diseases of the central nervous system. Stroke is the second most common cause of death and the major cause of disability worldwide, whereas multiple sclerosis is the most common non-traumatic disabling neurological disease of adulthood. Magnetic resonance imaging is an important tool for distinguishing healthy from pathological brain tissue in diagnosis, monitoring disease evolution, and decision-making in the personalized treatment of patients with stroke or multiple sclerosis. Predicting disease evolution in patients with stroke or multiple sclerosis is a challenge for clinicians who are about to decide on an appropriate individual treatment. The etiology, pathophysiology, symptoms, and evolution of stroke and multiple sclerosis are highly different; therefore, the statistical methods used in this thesis for the two neurological diseases differ. The first aim was the identification of the tissue at risk of infarction in patients with stroke. For this purpose, classification methods (including machine learning methods) were applied to voxel-based imaging data measured at hospital admission in order to predict the infarction risk at one month, and the performances of the classification methods in identifying tissue at high risk of infarction were then compared. The second aim was to cluster patients with multiple sclerosis using an unsupervised method based on individual clinical and imaging trajectories plotted over five years. Clusters of trajectories would help identify patients who may undergo an important progression and thus treat them with more effective drugs, irrespective of the clinical subtypes. The third and final aim of this thesis was to develop a predictive model for the individual evolution of patients with multiple sclerosis based on demographic, clinical, and imaging data collected at study onset. The heterogeneity of disease evolution in patients with multiple sclerosis is an important challenge for clinicians who seek to predict the disease evolution and decide on an appropriate individual treatment. For this purpose, the latent class linear mixed model was used to predict disease evolution while accounting for individual variability and unobserved subgroup variability in multiple sclerosis.
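A minimal sketch of the first aim's voxel-wise classification setup, under assumed synthetic data and feature names (the thesis compares several classifiers on real perfusion and diffusion imaging), might look as follows in Python.

```python
# Illustrative sketch: voxel-wise classification of one-month infarction
# from imaging features measured at admission (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n_voxels = 5000
# Assumed per-voxel features, e.g. diffusion (ADC), perfusion delay (Tmax), CBF.
X = rng.normal(size=(n_voxels, 3))
logit = -1.0 * X[:, 0] + 1.5 * X[:, 1] - 0.5 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))   # 1 = infarcted at one month

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
risk = clf.predict_proba(X_te)[:, 1]                 # voxel-wise risk estimates
print("AUC:", round(roc_auc_score(y_te, risk), 3))
```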
126

Dataset selection for aggregate model implementation in predictive data mining

Lutu, P.E.N. (Patricia Elizabeth Nalwoga) 15 November 2010 (has links)
Data mining has become a commonly used method for the analysis of organisational data, for purposes of summarizing data in useful ways and identifying non-trivial patterns and relationships in the data. Given the large volumes of data that are collected by business, government, non-government and scientific research organisations, a major challenge for data mining researchers and practitioners is how to select relevant data for analysis in sufficient quantities in order to meet the objectives of a data mining task. This thesis addresses the problem of dataset selection for predictive data mining. Dataset selection was studied in the context of aggregate modeling for classification. The central argument of this thesis is that, for predictive data mining, it is possible to systematically select many dataset samples and employ approaches different from current practice to feature selection, training dataset selection, and model construction. When a large amount of information in a large dataset is utilised in the modeling process, the resulting models will have a high level of predictive performance and should be more reliable. Aggregate classification models, also known as ensemble classifiers, have been shown to provide a high level of predictive accuracy on small datasets. Such models are known to achieve a reduction in the bias and variance components of the prediction error of a model. The research for this thesis was aimed at the design of aggregate models and the selection of training datasets from large amounts of available data. The objectives for the model design and dataset selection were to reduce the bias and variance components of the prediction error for the aggregate models. Design science research was adopted as the paradigm for the research. Large datasets obtained from the UCI KDD Archive were used in the experiments. Two classification algorithms, See5 for classification tree modeling and K-Nearest Neighbour, were used in the experiments. The two methods of aggregate modeling that were studied are One-Vs-All (OVA) and positive-Vs-negative (pVn) modeling. While OVA is an existing method that has been used for small datasets, pVn is a new method of aggregate modeling proposed in this thesis. Methods for feature selection and training dataset selection from large datasets, for OVA and pVn aggregate modeling, were studied. The feature selection experiments revealed that the use of many samples, robust measures of correlation, and validation procedures results in the reliable selection of relevant features for classification. A new algorithm for feature subset search, based on the decision rule-based approach to heuristic search, was designed, and its performance was compared to two existing algorithms for feature subset search. The experimental results revealed that the new algorithm makes better decisions for feature subset search. The information provided by a confusion matrix was used as a basis for the design of OVA and pVn base models, which are then combined into one aggregate model. A new construct called a confusion graph was used in conjunction with new algorithms for the design of pVn base models. A new algorithm for combining base model predictions and resolving conflicting predictions was designed and implemented. Experiments to study the performance of the OVA and pVn aggregate models revealed that the aggregate models provide a high level of predictive accuracy compared to single models.
Finally, theoretical models to depict the relationships between the factors that influence feature selection and training dataset selection for aggregate models are proposed, based on the experimental results. / Thesis (PhD)--University of Pretoria, 2010. / Computer Science / unrestricted
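As an illustration of the One-Vs-All aggregation discussed above, the following Python sketch builds one binary base model per class and combines their predictions with scikit-learn; it uses synthetic data and a K-Nearest Neighbour base learner, and it does not implement the thesis's new pVn scheme or confusion-graph construction.

```python
# Sketch of One-Vs-All (OVA) aggregate modeling on synthetic multi-class data.
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# One binary base model per class; the wrapper combines their predictions.
ova = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=5)).fit(X_tr, y_tr)
pred = ova.predict(X_te)

print("accuracy:", round(accuracy_score(y_te, pred), 3))
print(confusion_matrix(y_te, pred))   # the kind of information used to design base models
```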
127

Predictive Modeling of Enrollment and Academic Success in Secondary Chemistry

Charnock, Nathan Lee 01 January 2016 (has links)
The aim of this study was to identify predictors of student enrollment and successful achievement in 10th-grade chemistry courses for a sample drawn from a single academic cohort in a single metropolitan school district in Florida. Predictors included, among others, letter grades for courses completed in academic classes at each grade level from sixth through 10th grade, as well as standardized scores on the Florida Comprehensive Assessment Test (FCAT) and demographic variables. The predictive models demonstrated that it is possible to identify student attributes that result in either increased or decreased odds of enrollment in chemistry courses. The logistic models identified subsets of students who could potentially be candidates for academic interventions, which may increase the likelihood of enrollment and successful achievement in a 10th-grade chemistry course. Predictors in this study included grades achieved for each school year in mathematics, English, history, and science coursework, as well as reported FCAT performance band scores for students from sixth through 10th grade. Demographics, socioeconomic status, special learning services, attendance rates, and number of suspensions were also considered. The results demonstrated that female students were more likely to enroll in and pass a chemistry course than their male peers. The results also demonstrated that prior science achievement (followed closely by mathematics achievement) was the strongest predictor of enrolling in and passing a chemistry course. Additional analysis demonstrated the relative stability of academic GPA per discipline from year to year; cumulative achievement was the best overall indicator of course enrollment and achievement.
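A hedged sketch of the kind of logistic model used in the study is given below, with synthetic data and assumed predictor names; exponentiated coefficients are reported as odds ratios, matching the study's framing of increased or decreased odds of enrollment.

```python
# Illustrative logistic model of chemistry enrollment (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1000
df = pd.DataFrame({
    "prior_science_gpa": rng.uniform(1.0, 4.0, n),   # assumed predictor names
    "prior_math_gpa": rng.uniform(1.0, 4.0, n),
    "female": rng.integers(0, 2, n),
})
logit = (-6.0 + 1.2 * df["prior_science_gpa"]
         + 0.9 * df["prior_math_gpa"] + 0.3 * df["female"])
df["enrolled"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = sm.add_constant(df[["prior_science_gpa", "prior_math_gpa", "female"]])
fit = sm.Logit(df["enrolled"], X).fit(disp=False)
print(np.exp(fit.params))   # odds ratios: >1 raises, <1 lowers the odds of enrollment
```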
128

Finding the Past in the Present: Modeling Prehistoric Occupation and Use of the Powder River Basin, Wyoming

Clark, Catherine Anne 01 January 2012 (has links)
In the Powder River Basin of Wyoming, our nation's interest in protecting its cultural heritage collides with the high demand for carbon fuels. "Clinker" deposits dot the basin. These distinctive buttes, created by the underground combustion of coal, are underlain by coal veins; they also provided the main lithic resources for prehistoric hunter-gatherers. These deposits signify both a likelihood of extractable carbon and high archaeological site density. Federal law requires that energy developers identify culturally significant sites before mining can begin. The research presented here explains the need for, and describes, a statistical tool with the potential to predict sites where carbon and cultural resources co-occur, thus streamlining the process of identifying important heritage sites to protect them from adverse impacts of energy development. The predictive model consists of two binary logistic regression models built from known archaeological sites in the Powder River Basin. The model as developed requires further refinement; the results are nevertheless applicable to future research in this and similar areas, as I discuss in my conclusion.
129

Predictive Quality Analytics

Salim A Semssar (11823407) 03 January 2022 (has links)
Quality drives customer satisfaction, improved business performance, and safer products. Reducing waste and variation is critical to the financial success of organizations. Today, it is common to see Lean and Six Sigma used as the two main strategies for improving quality. As advancements in information technologies enable the use of big data, defect reduction and continuous improvement philosophies will benefit and even prosper. Predictive Quality Analytics (PQA) is a framework in which risk assessment and machine learning technology can help detect anomalies in the entire ecosystem, not just in the manufacturing facility. PQA serves as an early warning system that directs resources to where help and mitigation actions are most needed. In a world where limited resources are the norm, focused actions on the significant few defect drivers can be the difference between success and failure.
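One possible building block of such an early-warning system, sketched here with synthetic process data and assumed measurement names, is an unsupervised anomaly detector that scores incoming observations and flags likely deviations for follow-up.

```python
# Conceptual sketch: IsolationForest as an early-warning anomaly detector.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# Assumed process measurements: temperature, pressure, cycle time.
normal = rng.normal(loc=[200.0, 5.0, 30.0], scale=[2.0, 0.1, 1.0], size=(500, 3))
drifted = rng.normal(loc=[210.0, 5.6, 36.0], scale=[2.0, 0.1, 1.0], size=(10, 3))

detector = IsolationForest(contamination=0.02, random_state=0).fit(normal)

scores = detector.decision_function(drifted)   # lower score = more anomalous
flags = detector.predict(drifted)              # -1 flags a likely anomaly
print(list(zip(flags, np.round(scores, 3))))
```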
130

Large Eddy Simulations of a Back-step Turbulent Flow and Preliminary Assessment of Machine Learning for Reduced Order Turbulence Model Development

Biswaranjan Pati (11205510) 30 July 2021 (has links)
Accuracy in turbulence modeling remains a hurdle to the widespread use of Computational Fluid Dynamics (CFD) as a tool for furthering fluid dynamics research. Meanwhile, computational power remains a significant concern for solving real-life wall-bounded flows, which exhibit a wide range of length and time scales. The tools for turbulence analysis at our disposal, in decreasing order of accuracy, include Direct Numerical Simulation (DNS), Large Eddy Simulation (LES), and Reynolds-Averaged Navier-Stokes (RANS) based models. While DNS and LES will remain exorbitantly expensive options for simulating high-Reynolds-number flows for the foreseeable future, RANS is and continues to be a viable option used in commercial and academic endeavors. In the first part of the present work, flow over the back-step test case was solved, and parametric studies for various quantities such as re-circulation length (X_r), coefficient of pressure (C_p), and coefficient of skin friction (C_f) are presented and validated against experimental results. The back-step setup was chosen as the test case because turbulence modeling of flow past a backward-facing step has been pivotal to better understanding separated flows. Turbulence modeling is done on the test case using RANS (k-ε and k-ω models) and LES, for different values of the Reynolds number (Re ∈ {2, 2.5, 3, 3.5} × 10^4) and expansion ratio (ER ∈ {1.5, 2, 2.5, 3}). The LES results show good agreement with experimental results, and the discrepancy between the RANS results and experimental data is highlighted. The results obtained in the first part reveal a pattern of under-prediction when RANS-based models are used to analyze canonical setups such as the backward-facing step. The LES results closely match the experimental data, as mentioned above, which makes them an excellent source of training data for the machine learning analysis outlined in the second part. The highlighted discrepancy and the inability of the RANS model to accurately predict significant flow properties create the need for a better model. The purpose of the second part of the present study is to make systematic efforts to minimize the error between flow properties from RANS modeling and experimental data, as seen in the first part. A machine learning model was constructed in the second part to predict the eddy viscosity parameter (μ_t) as a function of turbulent kinetic energy (TKE) and dissipation rate (ε) derived from LES data, effectively working as an ad hoc eddy-viscosity-based turbulence model. The machine learning model does not perform well over the flow domain as a whole, but a zonal analysis reveals better prediction of eddy viscosity than over the whole domain; among the zones, the area in the vicinity of the re-circulation zone gives the best result. The obtained results point towards the need for a zonal analysis for better performance of the machine learning model, which will enable us to improve RANS predictions by developing a reduced-order turbulence model.
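A rough sketch of the second-part workflow, with synthetic data standing in for the LES fields, is shown below: a small neural network regresses the eddy viscosity μ_t on turbulent kinetic energy k and dissipation rate ε, where the synthetic targets follow the standard k-ε relation μ_t = ρ C_μ k²/ε plus noise (an assumption used only to generate illustrative data).

```python
# Sketch: regress eddy viscosity on k and eps using synthetic "LES-like" data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
rho, c_mu = 1.0, 0.09
k = rng.uniform(1e-3, 1.0, n)          # turbulent kinetic energy
eps = rng.uniform(1e-2, 10.0, n)       # dissipation rate
# Noisy targets generated from the standard k-epsilon relation (assumption).
mu_t = rho * c_mu * k**2 / eps * (1.0 + 0.05 * rng.normal(size=n))

X = np.column_stack([k, eps])
X_tr, X_te, y_tr, y_te = train_test_split(X, mu_t, test_size=0.2, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out points:", round(model.score(X_te, y_te), 3))
```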
