21 |
Analysis of Healthcare Coverage Using Data Mining Techniques
Tekieh, Mohammad Hossein, 12 January 2012
This study explores healthcare coverage disparity through a quantitative analysis of a large dataset from the United States. One objective is to build supervised models, including decision trees and neural networks, to identify the factors that most influence healthcare coverage. We also use unsupervised modeling, including the K-Means clustering algorithm, to discover groups of people with healthcare coverage problems and inconsistencies.
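The abstract names K-Means but not its configuration. For readers unfamiliar with the algorithm, here is a minimal sketch of Lloyd's K-Means in plain Python. It is not the IBM SPSS Modeler implementation, and it assumes the survey variables have already been encoded as numeric (ideally standardized) feature tuples; the function name and parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly chosen records as initial centroids
    for _ in range(iters):
        # Assignment step: group each point with its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels
```

K-Means depends on the starting centroids, so in practice it is usually run from several seeds and the solution with the lowest within-cluster distance is kept.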
Our modeling is based on a dataset retrieved from the Medical Expenditure Panel Survey, which originally contained 98,175 records. After pre-processing (binning, cleaning, handling missing values, and balancing), the dataset contains 26,932 records and 23 variables. We build 50 classification models in IBM SPSS Modeler using decision trees and neural networks. The accuracy of the models ranges from 76% to 81%. The models can predict healthcare coverage for a new sample from its significant attributes. We show that the decision tree models achieve higher accuracy than the models based on neural networks. Having analyzed the results extensively, we also find the most influential factors in healthcare coverage to be access to care, age, family poverty level, and race/ethnicity.
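The abstract does not say how the classes were balanced. One common option, assumed here purely for illustration, is random undersampling: records of the larger classes are dropped at random until every class is as small as the smallest one. The function name, the record layout and the `insured` label in the test are hypothetical.

```python
import random
from collections import defaultdict


def undersample(records, label_key, seed=0):
    """Randomly drop records from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[record[label_key]].append(record)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)  # avoid long runs of a single class in the output
    return balanced
```

Undersampling discards data; oversampling the minority class is the usual alternative when records are scarce.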
|
23 |
Testing general rules in landscape ecology: Understanding the effects of landscape pattern on the avifauna of South East Queensland
Danielle Shanahan, Unknown Date
Human land use has a profound influence on wildlife populations: habitat loss can directly decrease population size and carrying capacity, and isolation of the remaining populations can increase their extinction probability. Landscape ecology as a discipline has worked towards creating general rules for the way species respond to landscape change. These rules include, for example, estimates of thresholds at which populations respond more severely to landscape-level variables, or general theories as to which species will be more susceptible to landscape change. The demand for these generalisations is driven by the need for inexpensive, rapid and effective methods to manage problems caused by landscape change. Whether general rules are accurate or useful elicits mixed responses from scientists and conservation managers; the most frequently cited reason is the empirical inconsistency in the way species respond to landscape change. In this thesis I suggest that general rules must be tested in an a priori fashion to directly assess their utility and assist in their translation from theory to practical tool. My primary aim is to test general rules in landscape ecology by creating a priori models based on ecological theories and existing species and landscape information. My secondary aim is to enhance the understanding of landscape-level habitat fragmentation problems for birds in South East Queensland, Australia. I address these aims in four main data chapters, summarised below; Chapter 1 is a broad introduction to the topic.
Chapter 2 asks the question: can general rules and threshold theory be used to predict bird species patch occupancy in a fragmented landscape? I create a simple decision tree model based on threshold theories in landscape ecology, and use it to predict the presence or absence of 17 forest bird species in a largely agricultural landscape.
This decision tree is broadly based on theoretical patch area and connectivity threshold estimates, and incorporates basic species-specific information (such as habitat suitability and mobility). I test this model using a presence/absence survey data set. Assessing the species for which the model did not work is revealing: I show that the accuracy of ‘present’ predictions is somewhat compromised for habitat specialist species, and the accuracy of ‘absent’ predictions for generalist species. By creating ‘optimal’ decision tree models for these species, I show that these inaccuracies are likely to arise from vegetation mapping problems, including the lack of a ‘habitat quality’ measure. The study therefore highlights the need for high-quality vegetation maps for effective planning. For the majority of species I achieve reasonable predictive success. This study provides hope that general rules have some predictive ability in landscape ecology, and highlights the value of testing models to assess why, and for which species, general rules may or may not work.
In Chapter 3, I assess the utility of basic ecological principles for predicting the relative value of vegetation patches for specific bird species, focusing on a highly altered urban landscape. I create a model based on the mechanisms expected to drive species abundance within urban landscapes, where most sensitive bird species are likely to have already been lost. The model states that a bird species will be more abundant in areas where the vegetation structure matches the species' foraging height requirements; however, this effect will be moderated by the landscape context of the patch. From this model I create an index to quantify and rank the predicted value of patches for 30 species of interest in unmanaged and revegetated urban sites in Brisbane, Australia. I test the model using bird abundance data and show that it achieves a reasonable level of predictive accuracy.
The model presented in this study is significant because it has relatively low complexity and limited data requirements, yet provides a means to assess how altering the landscape context and the vegetation structure within a patch may enhance the abundance of bird species of interest. With further development, its relative simplicity should make it easy for land managers to use.
In Chapter 4 I examine how landscape features influence spatial genetic relatedness patterns at a fine, within-population scale in bird species with different life-history traits. I argue that individual-level movement characteristics (particularly dispersal routes) in a variable landscape will drive these spatial genetic patterns, and I create an a priori model based on this theory to make more specific, quantifiable predictions of relatedness patterns. I use animal movement theory to deduce these movement characteristics (particularly the strength of avoidance of habitat boundaries) for species with different life-history traits, and apply the model to two closely related passerine bird species that co-occur in South East Queensland: the yellow-throated scrubwren, Sericornis citreogularis, a habitat specialist; and the white-browed scrubwren, Sericornis frontalis, a habitat generalist. I test these models using data on pairwise genetic distances between individuals of each species. The key outcome of this study is that the genetic data support my predictions that individual-level movement characteristics are a mechanistic driver of within-population spatial genetic patterns. For the habitat specialist, the genetic data supported a model incorporating a strong avoidance response to habitat boundaries; for the generalist, they supported a model with no response to habitat boundaries. This study takes a novel approach to individual-based genetics, making specific, quantifiable predictions of how a species may be affected by different landscape features.
This research could have significant implications for conservation management, particularly for understanding and managing population responses to a changing landscape and to the early stages of fragmentation.
In Chapter 5 I address whether urban revegetation is more successful when it is used to extend the area of existing vegetation or to enhance connectivity in the landscape. This study is novel because, instead of assessing the factors influencing the extinction of a species in a patch, I assess the factors influencing colonisation. Using bird survey data, I apply hierarchical partitioning and model selection approaches to determine the relative effect of connectivity and patch area on bird species richness and abundance in revegetated patches. The key finding was that connectivity provided the better model fit for bird species richness, whereas a model with total patch area and connectivity provided the better fit for mean bird abundance. My results suggest that the conservation goals of revegetation efforts, particularly in an urban landscape, must be considered when planning a revegetation program. Using revegetation to increase patch area may be the most effective approach for ensuring species persistence over time (i.e. abundance). However, to attract more species into an area, enhancing the total connected area in the landscape may be a better approach.
In this thesis I explicitly test general rules and theories in landscape ecology within a priori predictive models. Because of their generality, the models I develop are potentially suitable for application in other ecosystems. The process of synthesising these models in a simple form and testing them in a real landscape was revealing: I was able to examine where some general rules do not work, and also where they may not apply or need adjusting. I strove to create models that are easy to use and understand, particularly in Chapters 2 and 4, by trading off simplicity and accuracy.
The models are accurate enough to be arguably valuable tools for landscape managers. This is achieved without compromising their accessibility, so the research has the potential to bridge the gap between science and real-world utility.
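The Chapter 2 model is described only in outline. A minimal sketch of a threshold-style decision tree of that kind follows: suitability is checked first, then patch area against a specialist/generalist threshold, then connectivity against a mobility-dependent threshold. The threshold values, the connectivity index and the names `Species`, `Patch` and `predict_presence` are hypothetical, not the thesis's calibrated model. The second function reports the accuracy of 'present' and 'absent' predictions separately, which is how the abstract describes the evaluation.

```python
from dataclasses import dataclass


@dataclass
class Species:
    name: str
    habitat_specialist: bool
    high_mobility: bool


@dataclass
class Patch:
    area_ha: float
    connectivity: float  # hypothetical index, e.g. share of habitat within a buffer (0-1)
    suitable: bool       # whether the patch contains habitat suitable for the species


# Hypothetical thresholds for illustration only (not values from the thesis)
MIN_AREA_HA = {True: 50.0, False: 10.0}     # keyed by habitat_specialist
MIN_CONNECTIVITY = {True: 0.1, False: 0.3}  # keyed by high_mobility


def predict_presence(species, patch):
    """Threshold decision tree: suitability, then patch area, then connectivity."""
    if not patch.suitable:
        return False
    if patch.area_ha >= MIN_AREA_HA[species.habitat_specialist]:
        return True
    # A small patch may still be occupied if it is connected enough for this species
    return patch.connectivity >= MIN_CONNECTIVITY[species.high_mobility]


def prediction_accuracy(predicted, observed):
    """Return (accuracy of 'present' predictions, accuracy of 'absent' predictions)."""
    present = [o for p, o in zip(predicted, observed) if p]
    absent = [o for p, o in zip(predicted, observed) if not p]
    acc_present = sum(present) / len(present) if present else float("nan")
    acc_absent = sum(not o for o in absent) / len(absent) if absent else float("nan")
    return acc_present, acc_absent
```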
|
24 |
Tvorba predikčních modelů / Building predictive models
ZABLOUDIL, Jakub, January 2016
This master's thesis focuses on building predictive models whose fundamental task is to provide an early-warning system that gives information about potential enterprise bankruptcy. Its main aim is to create multivariate classification models using discriminant analysis and logistic regression. Emphasis is placed on their predictive accuracy, which is assessed over a period of three years before the bankruptcy declaration. Attempts are also made to optimize the classification thresholds in order to increase the initial accuracy. In addition, the thesis evaluates the classification reliability of several existing models and performs a profile analysis assessing the predictive ability of univariate ratios.
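The threshold optimization step can be sketched as follows, assuming a fitted logistic regression that outputs a bankruptcy probability per firm. The coefficients, scores and labels in the test are hypothetical, and choosing the cut-off that maximizes overall accuracy is only one possible criterion; the thesis may have used another.

```python
import math


def logistic_score(ratios, intercept, coefs):
    """Bankruptcy probability from a fitted logistic model: 1 / (1 + exp(-(b0 + b·x)))."""
    z = intercept + sum(b * x for b, x in zip(coefs, ratios))
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(scores, labels, threshold):
    """Share of firms classified correctly; score >= threshold means 'bankrupt' (label 1)."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores, labels):
    """Try each observed score as the cut-off and keep the most accurate one."""
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: accuracy(scores, labels, t))
    return best, accuracy(scores, labels, best)
```

Because the cut-off is tuned on the same firms it is evaluated on, accuracy measured this way is optimistic; a hold-out sample gives a fairer estimate.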
|
25 |
Využitelnost moderních metod hodnocení finanční situace podniku (ukazatele EVA, MVA a průměrné náklady kapitálu) / Usability of modern evaluation methods for the financial situation in the company (indicators EVA, MVA and the cost of capital)
MINARČÍKOVÁ, Jana, January 2016
The aim of this diploma thesis is to evaluate the applicability and usability of modern methods for evaluating a company's financial situation, focusing on the EVA and MVA indicators and the cost of capital. The theoretical part first defines the basic terms. The methodological part describes the individual calculation steps carried out to test the following hypotheses: 1. The hard-to-determine modern EVA and MVA indicators can be replaced by a different, simpler indicator with at least the same explanatory power. 2. The EVA indicator can predict the future development of a business as well as traditional prediction models can. These hypotheses were tested on a sample of 100 Czech firms in the construction industry. The data source was the Albertina database, purchased through grant GAJU 053/2016/S. The practical part presents the results and their interpretation, and the conclusion evaluates the individual hypotheses. The analysis showed that the EVA and MVA indicators cannot be replaced when evaluating a company's success in a given year, and that neither EVA, MVA nor the prediction models were able to predict the company's future development.
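For reference, the textbook definitions behind these indicators are WACC = E/V·r_E + D/V·r_D·(1 − t), EVA = NOPAT − WACC × invested capital, and MVA = market value − invested capital. The sketch below encodes them; the firm figures in the test are hypothetical, and the thesis's exact adjustments to the Albertina accounting data are not reproduced.

```python
def wacc(equity, debt, cost_equity, cost_debt, tax_rate):
    """Weighted average cost of capital, with the tax shield on interest."""
    total = equity + debt
    return equity / total * cost_equity + debt / total * cost_debt * (1.0 - tax_rate)


def eva(nopat, invested_capital, capital_cost):
    """Economic value added: NOPAT minus a charge for all capital employed."""
    return nopat - capital_cost * invested_capital


def mva(market_value, invested_capital):
    """Market value added: market value of the firm minus the capital invested in it."""
    return market_value - invested_capital
```

A positive EVA means the firm earned more than the cost of all the capital it used in that year, which is why the indicator is read as a single-period measure of success.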
|
26 |
Evaluating the Performance of Leadership in Energy and Environmental Design (LEED) Certified Facilities using Data-Driven Predictive Models for Energy and Occupant Satisfaction with Indoor Environmental Quality (IEQ)
January 2015
Given the importance of buildings as major consumers of resources worldwide, several organizations are working avidly to minimize the negative impacts of buildings. The U.S. Green Building Council's (USGBC) Leadership in Energy and Environmental Design (LEED) rating system is one such effort; it recognizes buildings that are designed to achieve superior performance in several areas, including energy consumption and indoor environmental quality (IEQ). The primary objectives of this study are to investigate the performance of LEED certified facilities in terms of energy consumption and occupant satisfaction with IEQ, and to introduce a framework for assessing the performance of LEED certified buildings.
This thesis attempts to achieve the research objectives by examining the LEED certified buildings on the Arizona State University (ASU) campus in Tempe, AZ, from two complementary perspectives: the Macro-level and the Micro-level. Heating, cooling, and electricity data were collected from the LEED-certified buildings on campus, and their energy use intensity was calculated in order to investigate the buildings' actual energy performance. Additionally, IEQ occupant satisfaction surveys were used to investigate users' satisfaction with the space layout, space furniture, thermal comfort, indoor air quality, lighting level, acoustic quality, water efficiency, cleanliness and maintenance of the facilities they occupy.
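Energy use intensity (EUI) is total annual site energy divided by gross floor area. A minimal sketch follows, assuming electricity metered in kWh, chilled water in ton-hours and heating in MMBtu (the abstract does not give the metering units); everything is converted to kBtu with the standard factors 3.412 kBtu/kWh, 12 kBtu/ton-hour and 1,000 kBtu/MMBtu. The benchmark comparison and all figures in the test are hypothetical.

```python
# Standard site-energy conversion factors to kBtu
KBTU_PER_KWH = 3.412       # electricity
KBTU_PER_TON_HOUR = 12.0   # chilled water (cooling)
KBTU_PER_MMBTU = 1000.0    # heating energy metered in MMBtu


def energy_use_intensity(electricity_kwh, cooling_ton_hours, heating_mmbtu, floor_area_ft2):
    """Annual site EUI in kBtu per square foot of gross floor area."""
    total_kbtu = (electricity_kwh * KBTU_PER_KWH
                  + cooling_ton_hours * KBTU_PER_TON_HOUR
                  + heating_mmbtu * KBTU_PER_MMBTU)
    return total_kbtu / floor_area_ft2


def percent_below_benchmark(eui, benchmark_eui):
    """Positive when the building uses less energy per square foot than the benchmark."""
    return 100.0 * (benchmark_eui - eui) / benchmark_eui
```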
From a Macro-level perspective, the results suggest that ASU LEED buildings consume less energy than their regional counterparts and exhibit higher occupant satisfaction than their national counterparts. The occupant satisfaction results are in line with the literature on LEED buildings, whereas the energy results add to the inconclusive body of knowledge on energy performance improvements linked to LEED certification. From a Micro-level perspective, the data analysis suggests an inconsistency between the LEED points earned in the Energy & Atmosphere and IEQ categories on the one hand, and the respective levels of energy consumption and occupant satisfaction on the other. This study thus shows how performance results vary when approached from different perspectives. This contribution highlights the need to consider Macro-level and Micro-level assessments in tandem, assessing LEED building performance from these two distinct but complementary perspectives to develop a more comprehensive understanding of actual building performance. / Master's Thesis, Engineering, 2015
|
27 |
Análise de sensibilidade e propagação de incerteza em modelos hidrossedimentológicos: contribuição à modelagem de bacias hidrográficas / Sensitivity analysis and uncertainty in hydrosedimentological models: contribution to modeling of watershed
Pereira, Luiz Henrique [UNESP], 28 October 2016
Previous issue date: 2016-10-28 / Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) / Reducing uncertainty in the results of dynamic models that estimate sediment production on hillslopes and sediment transport in the river channel becomes essential given the urgent need for territorial policies that minimize the risk of under- or over-exploitation of natural resources and that indicate water availability in watersheds. The application of models that simulate environmental processes has been greatly favored by advances in geotechnologies, especially Geographic Information Systems, which enable the extraction, processing, analysis and integration of geospatial data. However, little attention has been paid to analyzing and evaluating the factors responsible for the discrepancy between estimates and observations. Accordingly, the main objective of this work was to characterize the spatial variability of the uncertainty propagated by applying the hydrosedimentological models USLE, MUSLE and RUSLE (EUPS, MEUPS and REUPS in Portuguese), and to indicate its spatial correlation with the geomorphometric characteristics of the study area. The activities were developed within the theoretical scope of environmental systems modeling and based on geoprocessing and remote sensing techniques. The results show that parameter sensitivity is specific to each type of watershed modeled: factors C and P were the most sensitive for the Monjolo Grande basin, and factors C and LS were the most sensitive for the Ribeirão Jacutinga basin. The uncertainties are most pronounced in areas of predominantly sandy soils, and the degree of uncertainty in the model results is significantly correlated with geomorphological characteristics, especially in areas of concave hillslopes.
/ Reducing the uncertainties in the results of geospatial dynamic models, such as those that estimate sediment production on hillslopes and sediment transport in river channels, becomes essential given the current need to gather trustworthy quantitative information. In this sense, hydrosedimentological modeling contributes significantly to the landscape planning phase and is an effective part of agricultural land management. The application of geospatial modeling has benefited greatly from improvements in geotechnologies. However, it has often been applied without regard to the procedures and methods used to gather input data: differences in the spatial scale of analysis, the characteristics of the geographical area of interest, and the evaluation of the trustworthiness of the results are not taken into account. With these issues in view, this work aimed to characterize the spatial variability of the uncertainties propagated by applying the hydrosedimentological models USLE, MUSLE and RUSLE. By indicating their spatial correlation with the geomorphometric characteristics of the analyzed areas, it was possible to propose an objective criterion for selecting among the models based on each area's geomorphological characteristics, aiming to minimize statistical uncertainties and thus offer measures of trustworthiness for the final results. The activities were developed within the theoretical scope of environmental systems modeling and based on geoprocessing and remote sensing techniques. The results show that parameter sensitivity is specific to each type of watershed modeled: the C and P factors were the most sensitive for the Monjolo Grande river basin (sandy soil), and the C and LS factors were the most sensitive for the Jacutinga river basin (clay soil). The uncertainties are more prominent in areas where the soil is predominantly sandy.
There was a significant correlation between the level of uncertainty in the model results and geomorphological characteristics, especially in concave hillslope areas. / FAPESP: 2013/13885-0
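USLE, and RUSLE in revised form, estimate mean annual soil loss as the product A = R·K·LS·C·P. A minimal Monte Carlo sketch of uncertainty propagation for one grid cell follows, assuming each factor is independent and lognormally distributed around a hypothetical median with a hypothetical log-scale spread. Because ln A is then a sum, its variance equals the sum of the factor log-variances, so each factor's share of the output uncertainty can also be computed exactly. This is a generic illustration, not the procedure used in the thesis.

```python
import math
import random
import statistics

# Hypothetical factor medians and log-scale standard deviations for a single grid cell
FACTORS = {
    "R": (6500.0, 0.10),
    "K": (0.03, 0.15),
    "LS": (2.0, 0.30),
    "C": (0.05, 0.60),
    "P": (1.0, 0.20),
}


def usle(r, k, ls, c, p):
    """Mean annual soil loss A = R * K * LS * C * P (units follow those chosen for R and K)."""
    return r * k * ls * c * p


def variance_shares(factors):
    """Share of Var(ln A) due to each factor; for independent lognormal factors,
    Var(ln A) is the sum of the factor log-variances."""
    total = sum(sigma ** 2 for _, sigma in factors.values())
    return {name: sigma ** 2 / total for name, (_, sigma) in factors.items()}


def monte_carlo_ln_a(factors, n=20000, seed=1):
    """Propagate uncertainty by sampling every factor and recording ln A."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Dict order is R, K, LS, C, P, matching usle()'s parameter order
        values = [rng.lognormvariate(math.log(median), sigma) for median, sigma in factors.values()]
        samples.append(math.log(usle(*values)))
    return samples
```

With these made-up spreads the C factor dominates the output uncertainty; in practice the ranking depends entirely on the uncertainties assigned to each factor in a given watershed.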
|
28 |
Aboveground biomass of Atlantic Forest: modeling and strategies for carbon estimate / Biomassa acima do solo da Mata Atlântica: modelagem e estratégias para a estimativa de carbono
Michel Anderson Almeida Colmanetti, 23 May 2018
Current concern about the potential effect of CO2 on climate change has given tropical forest biomass importance as a carbon sink. However, the heterogeneity of natural ecosystems in the tropics has significant implications for biomass estimation. This study proposes different biomass models, using destructive sampling, for the highly diverse Atlantic Forest. Models from two approaches, generalized and species-specific, were fitted and their performance compared. For the generalized models, different covariates were proposed, including diameter at breast height (dbh), height to the crown base, wood specific gravity (wsg) and functional plant traits. The species-specific models were fitted by linear mixed-effects models (LME), using species as a random effect, and by ordinary least squares (OLS). The performance of all models and approaches was compared with existing models from the literature. Different biomass estimates at the stand and forest levels, and their implications for carbon quantification, were also examined. Additionally, two calibration methods for the individual tree-level biomass model were proposed, and different tree-selection strategies were tested. The primary results show that the species-specific model using LME performed better and can be used for the most abundant species, while models that include dbh, wsg and plant traits are suitable for less abundant species. In some cases, calibration using the LME method can serve as an alternative for species without a random effect presented here, making it a reasonable option for diverse tropical forests such as the Atlantic Forest. / Owing to current concern about the potential effect of CO2 on climate change, great importance has been attributed to tropical forest biomass as a carbon reservoir. However, the heterogeneity of natural ecosystems in the tropics has significant implications for estimating their biomass.
This study proposes different biomass models using destructive sampling for the Atlantic Forest, a highly diverse forest. Two modeling approaches, generalized and species-specific, were fitted and their performance compared. For the generalized models, different covariates were tested, using diameter at breast height (dbh), height to the crown base, wood basic density (wsg) and functional plant traits. The species-specific models were fitted by linear mixed-effects models (LME), using species as a random effect, and by ordinary least squares (OLS). The performance of the different models and approaches was compared with that of existing models from the literature. Different biomass estimates at the stand and forest levels were also examined, together with their implications for carbon quantification. In addition, two calibration methods for the individual tree-level biomass model were tested, varying the number of trees and the tree-selection strategies. Based on the results, the species-specific model using LME performed best and can be an alternative for the most abundant species, while the generalized model including dbh, wsg and functional plant traits proved suitable for less abundant species. In some cases, calibration using the LME method can be used as an alternative for species that have no specific equation, making it a reasonable option for highly diverse tropical forests such as the Atlantic Forest.
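The simplest generalized allometric model regresses ln(AGB) on ln(dbh) by ordinary least squares, giving AGB = a·dbh^b. The sketch below fits that form in plain Python and applies the Baskerville correction factor exp(s²/2) to reduce bias when back-transforming from the log scale. It omits wsg, functional traits and the species random effect of the LME models described above; the function names are hypothetical and the tree data in the test are synthetic.

```python
import math


def fit_power_law(dbh_cm, agb_kg):
    """OLS fit of ln(AGB) = b0 + b1 ln(dbh); returns (a, b, cf) for AGB = cf * a * dbh**b."""
    xs = [math.log(d) for d in dbh_cm]
    ys = [math.log(w) for w in agb_kg]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual variance on the log scale (n - 2 degrees of freedom)
    s2 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    cf = math.exp(s2 / 2.0)  # Baskerville correction for log back-transformation bias
    return math.exp(b0), b1, cf


def predict_agb(dbh_cm, a, b, cf=1.0):
    """Predicted aboveground biomass on the original scale."""
    return cf * a * dbh_cm ** b
```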
|
29 |
Otimização de parâmetros de interação do modelo UNIFAC-VISCO de misturas de interesse para a indústria de óleos essenciais / Optimization of interaction parameters for UNIFAC-VISCO model of mixtures interesting to essential oil industries
Camila Nardi Pinto, 27 February 2015
Determining the physical properties of essential oils is fundamental for their application in the food industry and also in equipment design. The large number of variables involved in the deterpenation process, such as temperature, pressure and composition, makes the use of predictive viscosity models necessary. This work aimed to obtain parameters for the predictive viscosity model UNIFAC-VISCO by applying the gradient descent optimization method to viscosity data of model systems that represent the phases that can form during deterpenation by liquid-liquid extraction of bergamot, lemon and mint essential oils, using a mixture of ethanol and water in different compositions as the solvent, at 25 °C. The experiment was divided into two configurations: in the first, the interaction parameters previously reported in the literature were kept fixed; in the second, all interaction parameters were adjusted. The model and the optimization method were implemented in MATLAB®. The optimization algorithm was run 10 times for each configuration, starting from different initial interaction-parameter matrices obtained by the Monte Carlo method. The results were compared with the study by Florido et al. (2014), which used a genetic algorithm as the optimization method. The first configuration achieved a mean relative deviation (DMR) of 1.366 and the second a DMR of 1.042. The gradient descent method performed better than the genetic algorithm for the first configuration (DMR 1.70). For the second configuration, the genetic algorithm achieved the better result (DMR 0.68).
The predictive capacity of the UNIFAC-VISCO model was evaluated for the eucalyptus essential oil system with the determined parameters, yielding DMR values of 17.191 and 3.711 for the first and second configurations, respectively. These DMR values were higher than those found by Florido et al. (2014) (3.56 and 1.83 for the first and second configurations, respectively). The parameters contributing most to the DMR are CH-CH3 and OH-H2O for the first and second configurations, respectively. The parameters involving the C group do not influence the DMR value and can be excluded from future analyses. / The determination of physical properties of essential oils is critical to their application in the food industry and also in equipment design. The large number of variables involved in the deterpenation process, such as temperature, pressure and composition, makes the use of predictive viscosity models necessary. This study aimed to obtain parameters for the predictive viscosity model UNIFAC-VISCO, using gradient descent as the optimization method, from viscosity data of model systems representing the phases that can form during deterpenation by liquid-liquid extraction of bergamot, lemon and mint essential oils, using aqueous ethanol in different compositions as the solvent at 25 °C. The work was divided into two configurations: in the first, the interaction parameters previously reported in the literature were kept fixed; in the second, all interaction parameters were adjusted. The model and the gradient descent method were implemented in MATLAB. The optimization algorithm was run 10 times for each configuration, starting from different arrays of initial interaction parameters obtained by the Monte Carlo method. The results were compared with the study by Florido et al. (2014), which used a genetic algorithm as the optimization method.
The first configuration yielded a mean relative deviation (DMR) of 1.366 and the second a DMR of 1.042. The gradient descent method performed better than the genetic algorithm for the first configuration (DMR 1.70); for the second configuration, the genetic algorithm achieved the better result (DMR 0.68). The predictive ability of the UNIFAC-VISCO model was evaluated for the eucalyptus essential oil system using the obtained parameters, giving DMR values of 17.191 and 3.711 for the first and second configurations, respectively. The parameters determined by the genetic algorithm gave lower DMR values for both configurations (3.56 and 1.83 for the first and second configuration, respectively). The parameters contributing most to the DMR are CH-CH3 and OH-H2O for the first and second configuration, respectively. The parameters involving the C group did not influence the DMR and may be excluded from further analysis.
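Implementing UNIFAC-VISCO itself is beyond a short example. As a stand-in, the sketch below fits the single binary interaction term G12 of the Grunberg-Nissan mixing rule (a different, much simpler viscosity model, used here only for illustration) by finite-difference gradient descent from several Monte Carlo starting points, and scores the fit with the mean relative deviation, assumed here to be DMR = (100/N) Σ|η_obs − η_calc|/η_obs. All function names are hypothetical, and the viscosities and true G12 in the test are made up.

```python
import math
import random


def dmr(observed, calculated):
    """Mean relative deviation in percent: (100 / N) * sum(|obs - calc| / obs)."""
    return 100.0 / len(observed) * sum(abs(o - c) / o for o, c in zip(observed, calculated))


def grunberg_nissan(x1, eta1, eta2, g12):
    """Binary-mixture viscosity, ln(eta) = x1 ln(eta1) + x2 ln(eta2) + x1 x2 G12 (stand-in model)."""
    x2 = 1.0 - x1
    return math.exp(x1 * math.log(eta1) + x2 * math.log(eta2) + x1 * x2 * g12)


def objective(g12, data, eta1, eta2):
    """Sum of squared relative errors over (x1, observed viscosity) pairs."""
    return sum(((grunberg_nissan(x1, eta1, eta2, g12) - obs) / obs) ** 2 for x1, obs in data)


def gradient_descent(g0, data, eta1, eta2, lr=0.2, steps=2000, h=1e-6):
    """Plain gradient descent using a central finite-difference derivative."""
    g = g0
    for _ in range(steps):
        grad = (objective(g + h, data, eta1, eta2) - objective(g - h, data, eta1, eta2)) / (2 * h)
        g -= lr * grad
    return g


def fit_multistart(data, eta1, eta2, n_starts=10, seed=0):
    """Run gradient descent from Monte Carlo starting points and keep the best result."""
    rng = random.Random(seed)
    best_g, best_f = None, math.inf
    for _ in range(n_starts):
        g = gradient_descent(rng.uniform(-3.0, 3.0), data, eta1, eta2)
        f = objective(g, data, eta1, eta2)
        if f < best_f:
            best_g, best_f = g, f
    return best_g
```

Multiple random starts matter more for UNIFAC-VISCO, whose many interaction parameters give a non-convex objective; with one parameter, as here, every start reaches the same minimum.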
|