31

Analysis of "Observer Effect" in Logbook Reporting Accuracy for U.S. Pelagic Longline Fishing Vessels in the Atlantic and Gulf of Mexico

Morrell, Thomas J 02 May 2019 (has links)
Commercial pelagic longline fishers in the U.S. Atlantic, Gulf of Mexico, and Caribbean are required to report all fishing interactions for each gear deployment to NOAA's Vessel Logbook Program at the Southeast Fisheries Science Center (SEFSC); these data are used to quantify bycatch, support conservation efforts, and avoid jeopardizing vulnerable species listed under the Endangered Species Act (ESA). To provide an accuracy check, the SEFSC's Pelagic Observer Program (POP) places professionally trained observers on longline vessels to produce a statistically reliable subset of longline fisheries data. A comparison of self-reported ("unobserved") datasets against observer-collected ("observed") datasets showed general consistency for most target species but non-reporting or under-reporting of a number of bycatch species and lesser-valued target species. These discrepancies in catch composition and abundance among targeted species, species of bycatch concern, and species of low economic value can inform decisions about increased fisheries regulation, stricter reporting requirements, or additional observer coverage.
32

Spatial and temporal population dynamics of yellow perch (Perca flavescens) in Lake Erie

Yu, Hao 19 August 2010 (has links)
Yellow perch (Perca flavescens) in Lake Erie support valuable commercial and recreational fisheries critical to the local economy and society. The study of yellow perch's temporal and spatial population dynamics is important for both stock assessment and fisheries management. I explore the spatial and temporal variation of the yellow perch population by analyzing fishery-independent surveys in Lake Erie. Model-based approaches were developed to estimate the relative abundance index, which reflects the temporal variation of the population, and design-based approaches were used to address the situation in which population density varies both spatially and temporally. I first used model-based approaches to explore the spatial and temporal variation of the population and to develop the needed relative abundance index. Generalized linear models (GLM), spatial generalized linear models (s-GLM), and generalized additive models (GAM) were compared by examining goodness-of-fit, reduction of spatial autocorrelation, and prediction errors from cross-validation; the relationship between yellow perch density and spatial and environmental factors was also studied. I found that GAM showed the best goodness-of-fit (as measured by AIC) and the lowest prediction errors, while s-GLM achieved the best reduction of spatial autocorrelation; both outperformed GLM for estimating the yellow perch relative abundance index. I then applied design-based approaches to study the spatial and temporal population dynamics through both practical data analysis and simulation. The approach currently used in Lake Erie is stratified random sampling (StRS). Traditional sampling designs (simple random sampling (SRS) and StRS) and adaptive designs (adaptive two-phase sampling (ATS), adaptive cluster sampling (ACS), and adaptive two-stage sequential sampling (ATSS)) were compared: in terms of accuracy and precision, ATS performed better than SRS, StRS, ACS, and ATSS for the yellow perch fishery-independent survey data in Lake Erie. Model-based approaches were further studied by including geostatistical models. The performance of GLM and GAM models and geostatistical models (spatial interpolation) was compared through a simulation study of the temporal and spatial variation of the population; this is the first time these two types of model-based approaches have been compared in fisheries. I found that the arithmetic mean (AM) method was preferred only when neither environmental factors nor spatial information on sampling locations was available. If the survey cannot cover the distribution area of the population, due to a biased design or a lack of sampling locations, GLMs and GAMs are preferable to spatial interpolation (SI); otherwise, SI is a good alternative for estimating the relative abundance index. SI has rarely been applied in fisheries. Different models may be recommended for different species or fisheries when estimating spatial-temporal dynamics, and the most appropriate survey design may likewise differ by species; however, the criteria and approaches used here for comparing model-based and design-based approaches can be applied to other species and fisheries. / Ph. D.
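As a rough illustration of the GLM-versus-GAM comparison described above, the following Python sketch fits both to simulated survey-style catch data and compares AIC. All variable names and the data-generating process are assumptions for illustration, and the statsmodels GAM API (statsmodels.gam.api) is assumed available; this is not the dissertation's code, which used actual Lake Erie survey data.

import numpy as np
import statsmodels.api as sm
from statsmodels.gam.api import GLMGam, BSplines

rng = np.random.default_rng(0)
n = 500
depth = rng.uniform(5, 60, n)          # sampling-site depth (m), assumed covariate
temp = rng.uniform(8, 24, n)           # bottom temperature (C), assumed covariate
year = rng.integers(0, 10, n)          # survey year index
# Nonlinear "true" density: dome-shaped in temperature, declining with depth
mu = np.exp(1.5 + 0.3 * year / 10 - 0.02 * depth - 0.05 * (temp - 16) ** 2)
catch = rng.poisson(mu)

# GLM: Poisson with log link, linear effects only
X = sm.add_constant(np.column_stack([year, depth, temp]))
glm = sm.GLM(catch, X, family=sm.families.Poisson()).fit()

# GAM: smooth terms for depth and temperature via B-splines
splines = BSplines(np.column_stack([depth, temp]), df=[6, 6], degree=[3, 3])
gam = GLMGam(catch, sm.add_constant(year), smoother=splines,
             family=sm.families.Poisson()).fit()

print("GLM AIC:", glm.aic, " GAM AIC:", gam.aic)

Here the GAM's smooth terms can capture the dome-shaped temperature effect that the linear GLM misses, mirroring the goodness-of-fit ranking reported above.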
33

Some Advanced Semiparametric Single-index Modeling for Spatially-Temporally Correlated Data

Mahmoud, Hamdy F. F. 09 October 2014 (has links)
Semiparametric modeling is a hybrid of parametric and nonparametric modeling in which some functional forms are known and others are unknown. In this dissertation, we make several contributions to semiparametric modeling based on the single index model, organized around three topics: first, a model that detects change points while simultaneously estimating the unknown function; second, two models for spatially correlated data; and third, two further models for spatially-temporally correlated data. For the first topic, we propose a single index change point model as a unified approach that simultaneously estimates the nonlinear relationship and the change points while adjusting for several other covariates. We estimate the unknown function nonparametrically using kernel smoothing, provide a permutation-based testing procedure to detect multiple change points, and establish the asymptotic properties of that procedure. The advantage of the approach is demonstrated using mortality data from Seoul, Korea, from January 2000 to December 2007. For the second topic, we propose two semiparametric single index models for spatially correlated data: one additively separates the nonparametric function from the spatially correlated random effects, while the other does not. We estimate the two models with algorithms based on the Markov chain expectation maximization (MCEM) algorithm. Simulation comparisons suggest that the semiparametric single index nonadditive model provides more accurate estimates of spatial correlation, and the approach is demonstrated using mortality data from six cities in Korea over the same period. The third topic proposes two semiparametric single index models for spatially and temporally correlated data. In the first, which we call the semiparametric spatio-temporal separable single index model (SSTS-SIM), the nonparametric function separates from the spatially and temporally correlated random effects; the second, the semiparametric nonseparable single index model (SSTN-SIM), separates the time random effects but not the spatially correlated ones. Two MCEM-based algorithms are introduced to simultaneously estimate parameters, spatial effects, and time effects. Applied to mortality data from six major cities in Korea, the results suggest that SSTN-SIM is more flexible than SSTS-SIM, since it can estimate a variety of nonparametric functions where SSTS-SIM enforces similar nonparametric curves, and it also provides better estimation and prediction. / Ph. D.
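For reference, the core single index model underlying all three topics takes the standard textbook form (a conventional formulation, not quoted from the dissertation):

    y_i = g(x_i^{\top} \theta) + \varepsilon_i, \qquad \|\theta\| = 1,

where the index vector \theta enters parametrically, the link g(\cdot) is an unknown smooth function (estimated above by kernel smoothing), and the norm constraint on \theta is the usual identifiability condition. The models described in the abstract extend this core with change points, additional covariates, and spatial or temporal random effects.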
34

Semiparametric Bayesian Approach using Weighted Dirichlet Process Mixture For Finance Statistical Models

Sun, Peng 07 March 2016 (has links)
The Dirichlet process mixture (DPM) has been widely used as a flexible prior in the nonparametric Bayesian literature, and the weighted Dirichlet process mixture (WDPM) can be viewed as an extension of the DPM that relaxes model distribution assumptions. However, the WDPM requires weight functions to be specified and can impose an extra computational burden. In this dissertation, we develop more efficient and flexible WDPM approaches under three research topics. The first is semiparametric cubic spline regression, in which we adopt a nonparametric prior for the error terms to automatically handle heterogeneity of measurement errors or an unknown mixture distribution; the second provides an innovative way to construct the weight function and establishes several desirable properties and the computational efficiency of this weight under a semiparametric stochastic volatility (SV) model; and the last develops a WDPM approach for the generalized autoregressive conditional heteroskedasticity (GARCH) model (an alternative to the SV model) and proposes a new model evaluation approach for GARCH that produces easier-to-interpret results than the canonical marginal likelihood approach. In the first topic, the response variable is modeled as the sum of three parts: a linear function of covariates that enter the model parametrically; an additive nonparametric component, in which covariates whose relationships to the response are unclear are included nonparametrically using Lancaster and Šalkauskas bases; and error terms whose means and variances are assumed to follow nonparametric priors. We therefore call the model a dual-semiparametric regression, since nonparametric ideas enter both the mean structure and the error terms. Instead of assuming that all error terms follow the same prior, as in the DPM, our WDPM provides multiple candidate priors for each observation to select among with certain probabilities. These probabilities (weights) are modeled from relevant predictive covariates using a Gaussian kernel, as sketched after this abstract. We propose several WDPMs using different distance-based weights, provide efficient Markov chain Monte Carlo (MCMC) algorithms, and compare our WDPMs to a parametric model and the DPM in terms of Bayes factors in simulation and empirical studies. In the second topic, we propose an innovative way to construct the weight function for the WDPM and apply it to the SV model, which is adopted for time series data in which the constant-variance assumption is violated. One essential issue is specifying the distribution of the conditional return. We assume a WDPM prior for the conditional return and propose a new way to model the weights; the approach has several advantages, including computational efficiency relative to weights constructed with a Gaussian kernel. We list six properties of the proposed weight function and prove them. Because of the additional Metropolis-Hastings steps introduced by the WDPM prior, we establish conditions that ensure uniform geometric ergodicity of the transition kernel in our MCMC. Owing to the presence of zero values in the asset price data, our SV model is semiparametric: we employ the WDPM prior for nonzero values and a parametric prior for zero values. In the third project, we develop the WDPM approach for GARCH-type models and compare different types of weight functions, including the innovative method proposed in the second topic.
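A minimal sketch of the Gaussian-kernel weight construction mentioned above (illustrative only; the function name, bandwidth choice, and scalar covariate are assumptions, and the dissertation's actual weights and their proposed alternatives are more elaborate):

import numpy as np

def gaussian_kernel_weights(z, centers, bandwidth=1.0):
    """Weight of each candidate prior for an observation with covariate z.

    z        : scalar predictive covariate for one observation
    centers  : covariate locations associated with the candidate priors
    bandwidth: kernel bandwidth controlling how fast weights decay
    """
    d2 = (z - np.asarray(centers)) ** 2
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum()   # normalize so the weights form a probability vector

print(gaussian_kernel_weights(0.3, centers=[0.0, 0.5, 1.0]))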
The GARCH model can be viewed as an alternative to SV for analyzing daily stock price data in which the constant-variance assumption does not hold. Whereas the response variable of our SV models is the transformed log return (based on a log-square transformation), GARCH models the log return itself. This means that, theoretically, stock returns can be predicted with GARCH models but not with SV models, because SV models ignore the sign of the log returns and provide predictive densities only for the squared log return. Motivated by this property, we propose a new model evaluation approach, called back testing return (BTR), specifically for GARCH. The BTR approach produces model evaluation results that are easier to interpret than the marginal likelihood, and it makes it straightforward to draw conclusions about model profitability. Since the BTR approach is applicable only to GARCH, we also illustrate how to properly calculate the marginal likelihood to compare GARCH and SV. Based on our MCMC algorithms and model evaluation approaches, we conducted a large number of model fittings to compare models in both simulation and empirical studies. / Ph. D.
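For readers unfamiliar with the model family, the following Python sketch simulates the standard GARCH(1,1) recursion, sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2 — the textbook form, not the semiparametric WDPM-GARCH developed in the dissertation; the parameter values are arbitrary illustrations.

import numpy as np

rng = np.random.default_rng(1)
omega, alpha, beta = 0.05, 0.08, 0.90   # assumed illustrative parameters
T = 1000
r = np.zeros(T)
sigma2 = np.full(T, omega / (1 - alpha - beta))  # start at unconditional variance
for t in range(1, T):
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
# Unlike the SV model described above, r_t itself (sign included) is modeled,
# which is why return prediction and back-testing are feasible for GARCH.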
35

Models to relate soil variables and basal area of forest species in an area of natural vegetation

Grego, Simone 08 October 2014 (has links)
The spatial pattern of occurrence of forest species and of their attributes, such as the basal area of trees, can provide information for understanding the structure of the plant community. Since environmental factors can influence both the spatial pattern of occurrence and the attributes of species in native forests, describing the relationship between environmental characteristics and the spatial patterns of forest species can help in understanding forest dynamics. The objective of this study is to assess statistical methods for identifying which soil attributes explain the variation in basal area of each selected tree species. Basal area was treated as the response variable, and the covariates were a large number of physical and chemical soil attributes measured at a grid of locations covering the study area. The methods reviewed and applied were multiple linear regression with stepwise model selection, generalized additive models, and regression trees. In a second stage of the analysis, spatial effects were added to the models to ascertain whether residual spatial patterns remained that were not captured by the covariates; simultaneous autoregressive, conditional autoregressive, and geostatistical models were considered. Given the large number of soil attributes, the analyses were conducted both with the original covariates and with factors identified in a preliminary factor analysis of the soil attributes. Model selection was used to identify the relevant soil attributes as well as the presence and best description of spatial patterns. The study area was the Assis Ecological Station, a conservation unit of the State of São Paulo, in permanent plots within the project "Diversity, Dynamics and Conservation in Forests of the State of São Paulo: 40 ha of Permanent Plots" of the FAPESP Biota program. The analyses reported here refer to the basal area of the species Copaifera langsdorffii, Vochysia tucanorum, and Xylopia aromatica. Results differ among the methods considered, reinforcing the recommendation to compare multiple modeling strategies. The covariates consistently associated with basal area were slope, altitude, aluminum saturation, and potassium, each relevant to at least two of the species. The results showed the presence of patterns in residual variability even after accounting for the effects of the covariates: the soil attributes only partially explain the variability of basal area, and there are spatial patterns not captured by these covariates.
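A hedged sketch of the kind of residual check implied above: fit basal area on soil covariates, then test the residuals for leftover spatial structure with Moran's I (synthetic data and names; not the thesis's code or its SAR/CAR/geostatistical machinery).

import numpy as np

rng = np.random.default_rng(2)
n = 200
coords = rng.uniform(0, 1, (n, 2))             # plot locations
slope_ = rng.normal(size=n)                    # assumed soil/terrain covariates
potassium = rng.normal(size=n)
basal_area = 1.0 + 0.5 * slope_ + 0.3 * potassium + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), slope_, potassium])
beta, *_ = np.linalg.lstsq(X, basal_area, rcond=None)
resid = basal_area - X @ beta

# Row-standardized inverse-distance spatial weights
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
W = 1.0 / (d + np.eye(n))                      # avoid division by zero on diagonal
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

z = resid - resid.mean()
moran_I = n / W.sum() * (z @ W @ z) / (z @ z)  # ~ -1/(n-1) under no autocorrelation
print("Moran's I of residuals:", moran_I)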
36

A Study on Applying Mass Appraisal Methods to the Assessment of Announced Land Current Values

蘇文賢 Unknown Date (has links)
The current Announced Land Current Value (ALCV) is assessed by traditional manual appraisal methods, which produce large errors and overly subjective results, falling short of the mass-appraisal goals of objectivity, speed, and precision. This study first uses land economic theory to examine the relationships among land market value, transaction price, and assessed value, clarifying commonly confused concepts. Through an assessment-sale ratio study, it develops tests of the gap between ALCV and market price and applies them to actual data from Tainan City: the average assessment-sale ratio falls in the interval 46.74%-48.52%, with slight vertical inequity. To improve the accuracy of the current ALCV, the research builds mass-appraisal models using the hedonic price method and the generalized additive model (GAM), grounded in urban economic theory and appraisal prior information. The empirical results show that location, relations with adjacent streets, road width, and zoning are the most influential determinants of land prices in Tainan City; in some years, a plattage effect (diseconomy of lot size) is also confirmed. The traditional hedonic price method requires the functional form to be specified in advance, and parameter estimates are biased if that form is wrong. The GAM combines the advantages of nonparametric and parametric regression: no functional form needs to be pre-specified, the results are easy to interpret, and the parametric rate of convergence is maintained. Smoothing yields more objective functional relationships, and the GAM outperforms the hedonic price method both in-sample and out-of-sample. The results show that both proposed models achieve the goals of speed and precision, moving the assessment-sale ratio toward 1 and roughly doubling current assessment efficiency; while equity is not much improved, neither model introduces serious vertical inequity. The GAM is superior to the hedonic price method, and with the rapid progress of computer technology its application to mass appraisal has become far more feasible and merits further study.
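The assessment-sale ratio computation at the heart of the ratio study can be sketched as follows (synthetic numbers chosen only to echo the reported ratio level; the 46.74%-48.52% interval comes from the thesis's Tainan City data, not from this sketch).

import numpy as np

rng = np.random.default_rng(3)
sale_price = rng.lognormal(mean=15, sigma=0.5, size=500)       # market prices
assessed = 0.47 * sale_price * rng.lognormal(0, 0.1, 500)      # ALCV-style values

ratio = assessed / sale_price
print("mean assessment-sale ratio:", ratio.mean())

# A common check for vertical inequity: regress the ratio on (log) price; a
# negative slope suggests high-value land is relatively under-assessed.
slope = np.polyfit(np.log(sale_price), ratio, 1)[0]
print("slope of ratio vs log price:", slope)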
37

An Application of Additive Models in Insurance: A Comparison of Models Relating Life Insurance Premium Income to Macroeconomic Indicators for the United States, Japan, Taiwan, the United Kingdom, and Germany

許光宏, Ellit G. Sheu Unknown Date (has links)
Linear models are prized for ease of computation and interpretation, but they require many strict assumptions, and post-hoc model diagnostics demand considerable effort. An additive model, by contrast, only requires that the component functions be specified and that the backfitting algorithm converge. Additive models retain the additivity and interpretability of linear models while improving estimation accuracy. Across the five insurance markets studied (the United States, Japan, Taiwan, the United Kingdom, and Germany), the coefficient of determination improved substantially (from 0.85 to 0.9957). The empirical findings are: (1) the additive models raise the level of statistical application, greatly increasing the explanatory power of the model variables and sharply reducing in-model MSE (mean squared error; see Tables 5-1 through 5-6); (2) they preserve the convenient interpretability of linear models; (3) they improve estimation: comparing actual and estimated 1991 premium income (Tables 5-3, 5-6, 5-9, 5-12, 5-15), the ratio of the linear model's error rate to the additive model's is about 2 for the United States, 12 for Japan, 4.55 for Taiwan, 2.95 for the United Kingdom, and 2.95 for Germany; and (4) the fitted functions are easily displayed graphically.
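Since the abstract leans on the backfitting algorithm, a toy version for an additive model y = f1(x1) + f2(x2) + error may help (the running-mean smoother and all names are illustrative assumptions, not the thesis's implementation).

import numpy as np

def smooth(x, r, window=21):
    """Crude running-mean smoother of residuals r against x."""
    order = np.argsort(x)
    kernel = np.ones(window) / window
    s = np.convolve(r[order], kernel, mode="same")
    out = np.empty_like(s)
    out[order] = s
    return out

rng = np.random.default_rng(4)
n = 400
x1, x2 = rng.uniform(-2, 2, n), rng.uniform(-2, 2, n)
y = np.sin(x1) + 0.5 * x2 ** 2 + rng.normal(scale=0.2, size=n)

alpha = y.mean()
f1, f2 = np.zeros(n), np.zeros(n)
for _ in range(20):                       # iterate to convergence in practice
    f1 = smooth(x1, y - alpha - f2)
    f1 -= f1.mean()                       # center each component for identifiability
    f2 = smooth(x2, y - alpha - f1)
    f2 -= f2.mean()
print("residual std:", np.std(y - alpha - f1 - f2))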
38

Electricity consumption in Swedish households: An analysis within the project "Improved energy statistics for settlements" for the Swedish Energy Agency

Nilsson, Josefine, Xie, Jing January 2012 (has links)
The Swedish Energy Agency conducted a project called "Improved energy statistics for settlements" to gain more knowledge about energy use in buildings. This report focuses on one subproject: measuring household electricity use at the appliance level. Various regression models are used to analyze the relationship between electricity usage and explanatory variables such as household background variables, household type, geographical location, the electricity consumption of individual appliances, and the number of appliances. The data comprise 389 households, most of them spread around the Mälardalen region, with a few measurements from households in Kiruna and Malmö. The conclusion of the thesis is that a household's background variables, house type, geographical location, and the number and type of its electrical appliances all contribute to its electricity consumption.
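A hedged sketch of the kind of regression described above (hypothetical column names and simulated data; the thesis's 389-household measurements are not reproduced here).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 200
df = pd.DataFrame({
    "house_type": rng.choice(["apartment", "rowhouse", "detached"], n),
    "region": rng.choice(["Malardalen", "Kiruna", "Malmo"], n),
    "n_devices": rng.integers(10, 40, n),
})
base = df["house_type"].map({"apartment": 2500, "rowhouse": 3500, "detached": 5000})
df["kwh"] = base + 60 * df["n_devices"] + rng.normal(0, 400, n)  # annual kWh

# OLS with categorical covariates, in the spirit of the models described above
fit = smf.ols("kwh ~ C(house_type) + C(region) + n_devices", data=df).fit()
print(fit.summary().tables[1])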
39

Automated construction of generalized additive neural networks for predictive data mining / Jan Valentine du Toit

Du Toit, Jan Valentine January 2006 (has links)
In this thesis Generalized Additive Neural Networks (GANNs) are studied in the context of predictive data mining. A GANN is a novel neural network implementation of a generalized additive model. Originally, GANNs were constructed interactively by considering partial residual plots, a methodology that involves subjective human judgment, is time-consuming, and can yield suboptimal models. The newly developed automated construction algorithm addresses these difficulties by performing model selection based on an objective model selection criterion; partial residual plots are consulted only after the best model is found, to gain insight into the relationships between inputs and the target. Models are organized in a search tree with a greedy search procedure that identifies good models in a relatively short time. The automated construction algorithm, implemented in the powerful SAS® language, is nontrivial, effective, and comparable to other model selection methodologies found in the literature. The implementation, called AutoGANN, has a simple, intuitive, and user-friendly interface. The AutoGANN system is further extended with an approximation to Bayesian model averaging, which accounts for uncertainty about the variables that must be included in the model and about the model structure. Model averaging uses in-sample model selection criteria and creates a combined model with better predictive ability than any single model. In the field of credit scoring, the standard theory of scorecard building is not tampered with, but a pre-processing step is introduced that exploits GANN models to arrive at a more accurate scorecard, one that discriminates better between good and bad applicants and achieves significant reductions in marginal and cumulative bad rates. The time it takes to develop a scorecard may be reduced by utilizing the automated construction algorithm. / Thesis (Ph.D. (Computer Science))--North-West University, Potchefstroom Campus, 2006.
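A rough sketch of the GANN idea — one small neural subnetwork per input, summed into an additive model, with a greedy, criterion-driven choice of each subnetwork's size. This illustrates the automated-construction concept using scikit-learn; it is not the SAS-based AutoGANN system, and the criterion and parameter count are simplified assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
n = 300
X = rng.uniform(-2, 2, (n, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.2, n)

def additive_aic(hidden_sizes, n_iter=5):
    """Backfit one MLP per input; hidden_sizes[j] = 0 means drop input j."""
    f = np.zeros((n, X.shape[1]))
    for _ in range(n_iter):
        for j, h in enumerate(hidden_sizes):
            if h == 0:
                continue
            partial = y - y.mean() - f.sum(axis=1) + f[:, j]  # partial residual
            net = MLPRegressor(hidden_layer_sizes=(h,), max_iter=2000,
                               random_state=0).fit(X[:, [j]], partial)
            f[:, j] = net.predict(X[:, [j]])
    resid = y - y.mean() - f.sum(axis=1)
    k = sum(3 * h + 1 for h in hidden_sizes)          # rough parameter count
    return n * np.log(np.mean(resid ** 2)) + 2 * k    # AIC-style criterion

# Greedy search over per-input architectures, smallest models first
for sizes in [(0, 0), (2, 0), (0, 2), (2, 2), (4, 4)]:
    print(sizes, round(additive_aic(sizes), 1))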
